PoFF - Orthology detection tool combining sequence similarity and synteny
Description
PoFF is an extension of Proteinortho that uses conserved synteny, in addition to sequence similarity,
to detect orthologous relationships. For related species, this substantially improves the quality of the
predicted orthologs without loss of performance.
Manual
View PoFF manual
Supplemental Material
Simulation pipeline
The source code of Proteinortho 5 with the PoFF extension and the adapted implementation of the FFadj-MCS algorithm can be found
at the Proteinortho download page. Use -synteny to enable the extension in Proteinortho 5.
The tool to generate gene trees can be used as follows:
Example: ./grbt.x 10 3 out.tree .9 .2 0 .4 .1 .1 0 30
The parameters are positional and must be given in the following order:
- number of species
- number of gene families
- output file
- probability of gene duplication
- probability of cluster duplication
- probability of genome duplication
- probability of gene loss
- probability used to resize gene families according to their size (as described in the paper)
- second resizing probability (same role as the previous parameter)
- set this to 1 if you want more duplications at the root of the species tree
- noise (recommended: at least 10)
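Because all eleven arguments are positional, it is easy to swap two of them by accident. The following is a minimal Python sketch, not part of the PoFF distribution, that assembles the argument list in the order given above. The class and field names are hypothetical labels chosen for illustration; grbt.x itself only sees the positions. The range check on the four probabilities is an assumption about sensible input, not documented behaviour of the tool.

```python
from dataclasses import dataclass, astuple


@dataclass
class GrbtArgs:
    # Field names are illustrative labels; grbt.x only sees the positions.
    n_species: int
    n_families: int
    output_file: str
    p_gene_dup: float
    p_cluster_dup: float
    p_genome_dup: float
    p_gene_loss: float
    resize_a: float   # first resizing probability
    resize_b: float   # second resizing probability
    root_dup: int     # 1 = more duplications at the root of the species tree
    noise: int

    def argv(self, executable="./grbt.x"):
        """Return the argument list in the documented positional order."""
        for name in ("p_gene_dup", "p_cluster_dup", "p_genome_dup", "p_gene_loss"):
            value = getattr(self, name)
            # Assumed sanity check: a probability should lie in [0, 1].
            if not 0 <= value <= 1:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        # astuple() preserves the field order declared above.
        return [executable] + [str(v) for v in astuple(self)]


# Same values as the example call above.
args = GrbtArgs(10, 3, "out.tree", 0.9, 0.2, 0, 0.4, 0.1, 0.1, 0, 30)
print(" ".join(args.argv()))
```

The returned list can be passed directly to subprocess.run if the tool should be started from Python.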
The output file contains the gene order, the species tree, and a gene tree and an orthology matrix for each gene family. Java is required to run the application.
The pipeline to generate simulated sequence and position data can be downloaded here.
To start a simulation, run ./generate.sh followed by the path to an input file. Example input files as well as the F50, F80d and F100 data sets are provided in the archive.
Simulated Data
FASTA and GFF files for the F50, F80d and F100 data sets.
Orthology matrices for the F50, F80d and F100 data sets (very highly compressed; we recommend decompressing on the command line: bunzip2 m20_XXX.tsv.bz2).
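If the decompressed matrices are too large to keep on disk, they can also be read directly from the .bz2 archive. The following is a minimal Python sketch using only the standard library. It assumes nothing about the matrix layout beyond tab-separated text, and the function name iter_tsv_bz2 is hypothetical.

```python
import bz2


def iter_tsv_bz2(path):
    """Yield each line of a bzip2-compressed TSV file as a list of strings."""
    # Text mode decompresses on the fly, one line at a time,
    # so the full matrix is never held in memory or written to disk.
    with bz2.open(path, mode="rt") as handle:
        for line in handle:
            yield line.rstrip("\n").split("\t")


def count_rows(path):
    """Count the rows of the compressed file without unpacking it."""
    return sum(1 for _ in iter_tsv_bz2(path))
```

Calling count_rows on one of the downloaded m20_XXX.tsv.bz2 files (with XXX replaced by the data set name) gives a quick check that the download is complete and readable.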