PoFF - Orthology detection tool combining sequence similarity and synteny

Description

PoFF is an extension of Proteinortho that incorporates data on conserved synteny to detect orthologous relationships. For related species, this substantially improves the quality of the results without loss of performance.

Manual

View PoFF manual

Supplemental Material

Simulation pipeline

The source code of Proteinortho 5 with the PoFF extension and the adapted implementation of the FFadj-MCS algorithm can be found at the Proteinortho download page. Use -synteny to enable the extension in Proteinortho 5.

The tool to generate gene trees can be used as follows:
Example: ./grbt.x 10 3 out.tree .9 .2 0 .4 .1 .1 0 30
The parameters are, in order:

  1. number of species
  2. number of gene families
  3. output file
  4. probability of gene duplication
  5. probability of cluster duplication
  6. probability of genome duplication
  7. probability of gene loss
  8. probability used to resize gene families according to their size (as described in the paper)
  9. as for parameter 8
  10. set this to 1 to have more duplications at the root of the species tree
  11. noise (recommendation: at least 10)
The output file will contain the gene order, the species tree, and a gene tree and an orthology matrix for each gene family. Java is required to run the application.
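
Because grbt.x takes eleven purely positional arguments, it is easy to swap two of them by accident. The following is a minimal sketch of a Python wrapper that gives each position a name and rebuilds the example call from above. The class name, field names and the range check on the probabilities are assumptions made for illustration only; they are not part of grbt.x.

```python
from dataclasses import dataclass


@dataclass
class GrbtParams:
    """Named view of the 11 positional arguments of grbt.x.

    All field names are hypothetical; only their order follows the list above.
    """
    n_species: int        # parameter 1
    n_families: int       # parameter 2
    output_file: str      # parameter 3
    p_gene_dup: float     # parameter 4
    p_cluster_dup: float  # parameter 5
    p_genome_dup: float   # parameter 6
    p_gene_loss: float    # parameter 7
    resize_a: float       # parameter 8
    resize_b: float       # parameter 9
    more_root_dups: int   # parameter 10 (1 = more duplications at the root)
    noise: int            # parameter 11

    def argv(self, exe="./grbt.x"):
        """Return the argument list in the order grbt.x expects."""
        probs = {
            "p_gene_dup": self.p_gene_dup,
            "p_cluster_dup": self.p_cluster_dup,
            "p_genome_dup": self.p_genome_dup,
            "p_gene_loss": self.p_gene_loss,
            "resize_a": self.resize_a,
            "resize_b": self.resize_b,
        }
        # Assumption: parameters 4 to 9 are probabilities and lie in [0, 1].
        for name, value in probs.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        return [
            exe,
            str(self.n_species),
            str(self.n_families),
            self.output_file,
            *(f"{value:g}" for value in probs.values()),
            str(self.more_root_dups),
            str(self.noise),
        ]


# Rebuild the example call shown above.
example = GrbtParams(10, 3, "out.tree", 0.9, 0.2, 0, 0.4, 0.1, 0.1, 0, 30)
print(" ".join(example.argv()))
# ./grbt.x 10 3 out.tree 0.9 0.2 0 0.4 0.1 0.1 0 30
```

The list returned by argv() can be passed to subprocess.run as is, which avoids shell quoting issues with the output file name.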
The pipeline to generate simulated sequence and position data can be downloaded here. To start a simulation, run ./generate.sh <input file>. Example input files as well as the F50, F80d and F100 datasets are provided in the archive.

Simulated Data

FASTA and GFF files for the F50, F80d and F100 datasets.
Orthology matrices for the F50, F80d and F100 datasets (heavily compressed; we recommend decompressing them on the command line: bunzip2 m20_XXX.tsv.bz2).
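
If the decompressed matrices are too large to keep on disk, they can also be read directly from the .bz2 archive. The following is a minimal sketch using Python's standard bz2 and csv modules. It assumes only that the file is tab-separated text, as the .tsv extension suggests; the function name is hypothetical, and the meaning of the rows and columns is not interpreted here.

```python
import bz2
import csv


def iter_matrix_rows(path):
    """Yield each line of a bzip2-compressed TSV file as a list of strings.

    The file is decompressed on the fly, so it never has to be
    written to disk or loaded into memory as a whole.
    """
    with bz2.open(path, mode="rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data (an assumption
        # about the file contents, chosen to avoid accidental merging of fields).
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


# Usage (replace the path with one of the downloaded matrix files):
# for row in iter_matrix_rows("m20_XXX.tsv.bz2"):
#     print(len(row))
```

Reading this way is slower than a one-off bunzip2, so it is mainly useful when disk space is the limiting factor.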