Introduction
Creto is a program for estimating the evolutionary origination and decay rates of transcription factor binding sites (TFBS). Given a phylogenetic tree and the binding site numbers of the species at its tips, creto fits a maximum likelihood model that permits variable turnover rates in different parts of the species tree. This model can be used to detect changes in turnover rate as a proxy for differences in the selective pressures acting on TFBS in different clades.
Input
Creto reads the phylogenetic data for the calculation from an input file.
The data in the file must be marked with tags, each followed by the corresponding
data on a new line. Empty lines are allowed, and '#' can be used for comments.
The following tags are defined (see also the example input file in './example/input'):
- <COMMENTS>
  [Optional] Text after this tag (for example, information about the data set) is printed in the result files.
- <TREE>
  [Required] Phylogenetic tree with the binding site numbers of the leaves, in extended Newick grammar:
    <tree>        ::= <subtree> ";"
    <subtree>     ::= <leaf> | <internal>
    <leaf>        ::= <description>
    <internal>    ::= "(" <branchlist> ")" <description>
    <branchlist>  ::= <branch> | <branch> "," <branchlist>
    <branch>      ::= <subtree> <distance>
    <description> ::= <empty> | <name> | <name> "|" <bs_number>
    <name>        ::= <empty> | <string>
    <bs_number>   ::= <empty> | <int>
    <distance>    ::= <empty> | ":" <double>
  For example '((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5);'.
- <SCALE>
  [Optional] Double value by which all distances in the tree are multiplied.
- <START>
  [Optional] Assumed number of binding sites at the root node. If not given, the mean of all leaf binding site numbers is used.
- <ROOT>
  [Optional] If given, creto determines the parameters for the subtree rooted at the node with this name. Node names have the following syntax:
    <name> ::= <leaf> | "(" <name> "-" <name> ")"
  For example '((Human-Chimp)-Baboon)'.
- <ALTERNATIVES>
  [Optional] For each subtree given here, identified by the name of its root node, the program determines alternative turnover rates. Node names use the same syntax as for <ROOT>, for example '((Human-Chimp)-Baboon)'.
Call
The program is called as
  creto [OPTIONS] FILE
where FILE is the name of the input file containing the phylogenetic data (see Input) and OPTIONS are the program parameters, defined as follows:
Generic options:
--help display this help and exit
--version output version information and exit
-v, --verbose explain what is being done
Configuration:
Calculations:
-a, --alternatives detect this number of subtrees that are most likely
to have alternative rates and determine these
rates. [0]
-e, --equilibrium optimize root parameters under the condition that
they are in the equilibrium state.
-l, --lambda optimize only lambda for all alternative nodes.
-m, --mu optimize only mu for all alternative nodes.
Start values:
--bound fraction of the binding site number density on
both boundaries that is not considered by the
calculations. [1e-6]
--decay fraction of the binding site numbers that have existed
continuously since the root (for the determination
of the optimization start values). [0.5]
Optimization:
-d, --delta limit the number of rate value bisections in each
improvement step to the given value. [20]
-r, --repeats limit the number of parameter optimization rounds to
the given value.
--density minimal probability density.
Output:
-p, --plain write plain text results to this file.
[results.txt]
-P, --postscript write encapsulated PostScript results to this file.
[results.eps]
-s, --subtree print subtree attributes in the EPS file.
-u, --uniform print branches with uniform length in the EPS file.
Output
During the optimization, creto shows the current turnover rates together with
the corresponding likelihood. When the optimization is finished, the program
prints an overview of the input, the initialization values and the results, together
with characteristics of the given data such as the mean binding site numbers, the variance
of the binding site numbers, the ages of the clades and the variance/mean ratio. The turnover
rates, as well as the ratios of these rates, are given for each defined subtree with
alternative rates. This data is also written to a text file [default results.txt] and,
together with the tree, to a graphical file [default results.eps].
Downloads
Current release
- Creto Version 1.0 (March 2009, 43kb)
Installation
- Requirements: C++ Compiler, GNU Make
- Copy the source archive into the target directory.
- Unpack the source archive with
tar -xzf creto_1-0.src.tgz
- Change to the Creto directory
- Compile Creto with
make
- For an example run, call
./creto example/primate_HoxA_ERE.txt
Changes and old versions
- Creto Version 1.0
  First release, March 2009
Publications
- Wagner GP, Otto W, Lynch V, Stadler PF.
  A stochastic model for the evolution of transcription factor binding site abundance.
  J Theor Biol. 2007 Aug 7;247(3):544-53. Epub 2007 Mar 7.
- Otto W, Stadler PF, López-Giráldez F, Townsend JP, Lynch VJ, Wagner GP.
  Measuring Transcription Factor Binding Site Turnover: A Maximum Likelihood Approach using Phylogenies.
  Submitted to "Genome Biology and Evolution" (April 2009)
Contact
- Wolfgang Otto
