Creto is a program for determining the evolutionary origination
and decay rates of transcription factor binding sites (TFBS). Given a
phylogenetic tree and the binding site numbers of the species at the
tips, creto uses a maximum likelihood model that permits variable
turnover rates in different parts of the species tree. This model can be
used to detect changes in turnover rate as a proxy for differences in the
selective pressures acting on TFBS in different clades.
Creto reads the phylogenetic data for the calculation from an input file.
The data in the file must be marked with tags, each followed by the
corresponding data on a new line. Empty lines are allowed and '#' can be
used for comments. The following tags are defined (see also the example input
file in './example/input'):
-
<COMMENTS> [Optional]
Text after this tag (for example, information about the data set) is
printed in the result files.
-
<TREE> [Required]
Phylogenetic tree with the binding site numbers of the leaves in
extended Newick grammar:
<tree> ::= <subtree>";"
<subtree> ::= <leaf> | <internal>
<leaf> ::= <description>
<internal> ::= "("<branchlist>")"<description>
<branchlist> ::= <branch> | <branch>","<branchlist>
<branch> ::= <subtree><distance>
<description> ::= <empty> | <name> | <name>"|"<bs_number>
<name> ::= <empty> | <string>
<bs_number> ::= <empty> | <int>
<distance> ::= <empty> | ":"<double>
For example '((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5);'.
-
<SCALE> [Optional]
All distances in the tree are multiplied by the given double value.
-
<START> [Optional]
Assumed number of binding sites for the root node. If not given, the mean
of all leaf binding site numbers is used.
-
<ROOT> [Optional]
If given, creto determines the parameters for the
subtree rooted at the node with the given name.
Node names have the following syntax:
<name> ::= <leaf> | "("<name>"-"<name>")"
For example '((Human-Chimp)-Baboon)'.
-
<ALTERNATIVES> [Optional]
For each given subtree, defined by the name of its root
node in the tree, the program determines alternative turnover rates.
Node names again have the following syntax:
<name> ::= <leaf> | "("<name>"-"<name>")"
For example '((Human-Chimp)-Baboon)'.
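The input format described above can be sketched in Python as follows. This is a minimal illustration, not creto's own parser: the names Node, read_sections, parse_tree, leaves, default_start and node_name are hypothetical, and the SCALE value in the usage part is made up. It assumes that each tag stands alone on its line, that '#' comments out the rest of a line, that the tree is binary when node names are built, and that the fallback for <START> is the plain arithmetic mean. Tolerating a missing ';' at the end of the tree is also an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

DESC_RE = re.compile(r"([^(),:;|]*)(?:\|(\d*))?")  # <name> or <name>"|"<bs_number>
DIST_RE = re.compile(r":([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)")  # ":"<double>


@dataclass
class Node:
    name: str = ""
    sites: Optional[int] = None       # binding site number, if given
    distance: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)


def read_sections(text: str) -> Dict[str, str]:
    """Map each tag to its data (assumed: bare tag lines, '#' to end of line)."""
    sections: Dict[str, List[str]] = {}
    tag = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # empty lines are allowed
        m = re.fullmatch(r"<([A-Z]+)>", line)
        if m:
            tag = m.group(1)
            sections[tag] = []
        elif tag is not None:
            sections[tag].append(line)
    return {t: " ".join(lines) for t, lines in sections.items()}


def parse_tree(text: str) -> Node:
    """Recursive-descent parser following the extended Newick grammar."""
    s = "".join(text.split())  # whitespace is dropped before parsing
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":  # <internal>
            pos += 1
            node.children.append(branch())
            while peek() == ",":
                pos += 1
                node.children.append(branch())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        m = DESC_RE.match(s, pos)  # <description>, may be empty
        node.name = m.group(1)
        if m.group(2):
            node.sites = int(m.group(2))
        pos = m.end()
        return node

    def branch() -> Node:
        nonlocal pos
        node = subtree()
        m = DIST_RE.match(s, pos)  # optional <distance>
        if m:
            node.distance = float(m.group(1))
            pos = m.end()
        return node

    root = subtree()
    if peek() == ";":  # a missing terminator is tolerated (assumption)
        pos += 1
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(node: Node) -> Iterator[Node]:
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def default_start(root: Node) -> float:
    """Mean leaf binding site number, the documented fallback for <START>."""
    counts = [leaf.sites for leaf in leaves(root) if leaf.sites is not None]
    return sum(counts) / len(counts)


def node_name(node: Node) -> str:
    """Name in the <ROOT>/<ALTERNATIVES> syntax; assumes a binary tree."""
    if not node.children:
        return node.name
    left, right = node.children
    return f"({node_name(left)}-{node_name(right)})"


sections = read_sections("""
# toy data set; the SCALE value is illustrative only
<TREE>
((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5);
<SCALE>
0.01
""")
tree = parse_tree(sections["TREE"])
print(node_name(tree))
print(default_start(tree))
```

For the example tree, node_name returns '((Human-Chimp)-Baboon)', which is the form expected after <ROOT> and <ALTERNATIVES>. Whether creto rounds the mean used for <START> is not stated here, so the sketch leaves it unrounded.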
The program is called by
creto [OPTIONS] FILE
FILE is the name of the input file that contains the phylogenetic data (see
Input).
OPTIONS are parameters for the program, defined as follows:
Generic options:
--help display this help and exit
--version output version information and exit
-v, --verbose explain what is being done
Configuration:
Calculations:
-a, --alternatives detect this number of subtrees that are most likely
to have alternative rates and determine these
rates. [0]
-e, --equilibrium optimize root parameters under the condition that
they are in the equilibrium state.
-l, --lambda optimize only lambda and use root mu for all alternative
nodes.
-m, --mu optimize only mu and use root lambda for all alternative
nodes.
-t, --test use test state.
Start values:
--bound fraction of the binding site number density on
both boundaries that is not considered by
the calculations. [1e-6]
--decay fraction of the binding sites that have existed
continuously since the root (for the determination
of the optimization start values). [0.5]
Optimization:
-d, --delta limit number of rate value bisections in each improvement
step to given value. [20]
-r, --repeats limit rounds of parameter optimizations to given
value.
--density minimal probability density.
Output:
-p, --plain write plain text results in this file.
[results.txt]
-P, --postscript write encapsulated postscript results in this file.
[results.eps]
-f, --figure don't print file name and comments in eps file.
-s, --subtree print subtree attributes in plain and eps file.
-u, --uniform print branches with uniform length in eps file.
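A call using some of the options above could be assembled as in the following Python sketch. The helper creto_command and the output file names are hypothetical. The sketch assumes that options with a value take it as a separate argument (for example '--alternatives 2'), which is not confirmed by the listing above, and that creto is on the PATH when the command is actually run.

```python
import shlex
from typing import List, Optional


def creto_command(input_file: str,
                  alternatives: Optional[int] = None,
                  plain: Optional[str] = None,
                  postscript: Optional[str] = None,
                  verbose: bool = False,
                  equilibrium: bool = False) -> List[str]:
    """Build an argument list of the form: creto [OPTIONS] FILE."""
    cmd = ["creto"]
    if verbose:
        cmd.append("--verbose")
    if equilibrium:
        cmd.append("--equilibrium")
    if alternatives is not None:
        # assumption: the value follows the option as its own argument
        cmd += ["--alternatives", str(alternatives)]
    if plain is not None:
        cmd += ["--plain", plain]
    if postscript is not None:
        cmd += ["--postscript", postscript]
    cmd.append(input_file)  # FILE comes last
    return cmd


cmd = creto_command("example/input", alternatives=2,
                    plain="run1.txt", postscript="run1.eps", verbose=True)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Building the call as a list rather than a single string avoids shell quoting problems when file names contain spaces.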
During the optimization, creto shows the current turnover rates together with
the corresponding likelihood. When the optimization is finished, the program
prints an overview of the input, the initialization values and the results, together
with characteristics of the given data such as the mean binding site numbers, the variance
of the binding site numbers, the ages of clades and the variance/mean ratio. The turnover
rates as well as the ratios of these rates are given for each defined subtree with
alternative rates. This data is also written to a text file [default results.txt] and,
together with the tree, to a graphical file [default results.eps].
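The data characteristics mentioned above (mean, variance and variance/mean ratio of the binding site numbers) can be reproduced by hand for a small data set. The following Python sketch uses the leaf counts of the example tree. The function name dispersion_summary is hypothetical, and the use of the sample variance (divisor n - 1) is an assumption, since it is not stated here which variance estimator creto reports.

```python
from statistics import mean, variance
from typing import Dict, Sequence


def dispersion_summary(counts: Sequence[int]) -> Dict[str, float]:
    """Mean, variance and variance/mean ratio of binding site numbers."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1); an assumption
    return {"mean": m, "variance": v, "variance/mean": v / m}


# Leaf counts of Human, Chimp and Baboon from the example tree
summary = dispersion_summary([12, 14, 11])
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```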