PoFF - Orthology detection tool combining sequence similarity and synteny

Description

PoFF is an extension of Proteinortho that incorporates data on conserved synteny to detect orthologous relationships. This leads to a substantial improvement in data quality for related species without a loss of performance.

Manual

View PoFF manual

Supplemental Material

Simulation pipeline

The source code of Proteinortho 5 with the PoFF extension and the adapted implementation of the FFadj-MCS algorithm can be found at the Proteinortho download page. Use -synteny to enable the extension in Proteinortho 5.

The tool to generate gene trees can be used as follows:
Example: ./grbt.x 10 3 out.tree .9 .2 0 .4 .1 .1 0 30
The parameters are positional, in this order:

1. number of species
2. number of gene families
3. output file
4. probability of gene duplication
5. probability of cluster duplication
6. probability of genome duplication
7. probability of gene loss
8. probability used to resize gene families according to their size (as described in the paper)
9. a second resize probability, as in parameter 8
10. set to 1 to obtain more duplications at the root of the species tree (0 in the example above)
11. noise (recommended: at least 10)
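Because grbt.x reads these values by position, it is easy to swap two probabilities by accident. The following is a minimal Python sketch, assuming only the argument order listed above: it gives each position a name, checks that parameters 4 to 9 are valid probabilities, and builds the command line. The class GrbtParams and its field names are hypothetical and not part of the distributed tool.

```python
import subprocess
from dataclasses import dataclass, astuple


@dataclass
class GrbtParams:
    # Hypothetical names for the 11 positional arguments of grbt.x,
    # declared in the same order as the parameter list.
    n_species: int
    n_gene_families: int
    output_file: str
    p_gene_duplication: float
    p_cluster_duplication: float
    p_genome_duplication: float
    p_gene_loss: float
    p_resize_1: float
    p_resize_2: float
    root_duplications: int  # 1 = more duplications at the root of the species tree
    noise: int              # recommended: at least 10

    def __post_init__(self) -> None:
        # Parameters 4-9 are probabilities, so they must lie in [0, 1].
        probs = astuple(self)[3:9]
        if any(not 0.0 <= p <= 1.0 for p in probs):
            raise ValueError(f"probabilities must be in [0, 1], got {probs}")

    def argv(self, binary: str = "./grbt.x") -> list[str]:
        # astuple() follows field declaration order, i.e. the positional order.
        return [binary] + [str(v) for v in astuple(self)]

    def run(self, binary: str = "./grbt.x") -> None:
        # Invoke the generator; raises CalledProcessError on a non-zero exit.
        subprocess.run(self.argv(binary), check=True)
```

For instance, GrbtParams(10, 3, "out.tree", .9, .2, 0, .4, .1, .1, 0, 30).argv() reproduces the example invocation, except that Python writes .9 as 0.9; this sketch assumes grbt.x parses both forms the same way.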

The output file contains the gene order, the species tree, and a gene tree and an orthology matrix for each gene family. Java is required to run the application.

The pipeline to generate simulated sequence and position data can be downloaded here. To start a simulation, run ./generate.sh <input file>. Example input files, as well as the F50, F80d and F100 data sets, are provided in the archive.

Simulated Data

FASTA and GFF files for the F50, F80d and F100 data sets.
Orthology matrices for the F50, F80d and F100 data sets (very highly compressed; we recommend decompressing them on the command line: bunzip2 m20_XXX.tsv.bz2).
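By default, bunzip2 writes the full uncompressed file to disk. If you only need to scan a matrix, the following minimal Python sketch streams it directly with the standard-library bz2 module. It assumes the files are plain tab-separated text. Because the column layout is not described here, rows are returned as lists of strings. The function name read_orthology_matrix is hypothetical.

```python
import bz2
import csv
from collections.abc import Iterator


def read_orthology_matrix(path: str) -> Iterator[list[str]]:
    """Yield the rows of a bzip2-compressed TSV such as m20_XXX.tsv.bz2.

    Assumption: the file is tab-separated text; fields are left as raw strings.
    """
    # Text mode ("rt") decompresses on the fly, so the uncompressed
    # matrix is never written to disk.
    with bz2.open(path, mode="rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")
```

Since the function is a generator, rows are decoded one at a time, and memory use stays small even for large matrices.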