Introduction

Creto is a program for determining the evolutionary origination and decay rates of transcription factor binding sites (TFBS). Given a phylogenetic tree and the number of binding sites in each species at the tips, creto fits a maximum likelihood model that permits different turnover rates in different parts of the species tree. This model can be used to detect changes in turnover rate as a proxy for differences in the selective pressures acting on TFBS in different clades.

Input

Creto reads the phylogenetic data for the calculation from an input file. Each piece of data in the file is marked with a tag, followed by the corresponding data on a new line. Empty lines are allowed and '#' can be used for comments. The following tags are defined (see also the example input file in './example/input'):
  • <COMMENTS> [Optional]
    Text after this tag (for example, information about the data set) is printed in the result files.
  • <TREE> [Required]
    Phylogenetic tree with the binding site numbers of the leaves in extended Newick grammar:
            <tree>        ::= <subtree>";"
            <subtree>     ::= <leaf> | <internal>
            <leaf>        ::= <description>
            <internal>    ::= "("<branchlist>")"<description>
            <branchlist>  ::= <branch> | <branch>","<branchlist>
            <branch>      ::= <subtree><distance>
            <description> ::= <empty> | <name> | <name>"|"<bs_number>
            <name>        ::= <empty> | <string>
            <bs_number>   ::= <empty> | <int>
            <distance>    ::= <empty> | ":"<double>
          
    For example '((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5);'.
  • <SCALE> [Optional]
    All distances in the tree are multiplied by the given double value.
  • <START> [Optional]
    Assumed number of binding sites for the root node. If not given, the mean of all leaf binding site numbers is used.
  • <ROOT> [Optional]
    If given, creto determines the parameters for the subtree rooted at the node with the given name. Node names have the following syntax:
    <name> ::= <leaf> | "("<name>"-"<name>")"
    For example '((Human-Chimp)-Baboon)'.
  • <ALTERNATIVES> [Optional]
    For each given subtree, defined by the name of its root node in the tree, the program determines alternative turnover rates. Node names again have the following syntax:
    <name> ::= <leaf> | "("<name>"-"<name>")"
    For example '((Human-Chimp)-Baboon)'.
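
The extended Newick grammar and the node-name syntax above can be illustrated with a small parser. This is a minimal sketch in Python, not creto's own parser (creto is compiled from C++ sources). The names Node, parse_tree, node_name and leaf_sites are hypothetical and exist only for this illustration. It assumes that whitespace is not significant, that a trailing ';' may be present or absent, and that the name of an internal node is built from the names of its children joined with '-'.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str = ""
    sites: Optional[int] = None       # <bs_number>, if given
    distance: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)


# <description> then <distance>: name, optional "|int", optional ":double"
_LABEL = re.compile(r"([^|:,();]*)(?:\|(\d*))?(?::([-+0-9.eE]+))?")


def _subtree(text: str, pos: int) -> Tuple[Node, int]:
    """Parse one <subtree> plus its <distance>, starting at index pos."""
    children = []
    if pos < len(text) and text[pos] == "(":
        while True:
            child, pos = _subtree(text, pos + 1)  # step over "(" or ","
            children.append(child)
            if pos >= len(text) or text[pos] not in ",)":
                raise ValueError(f"expected ',' or ')' at position {pos}")
            if text[pos] == ")":
                pos += 1
                break
    match = _LABEL.match(text, pos)  # always matches, possibly the empty string
    name, sites, dist = match.groups()
    node = Node(name, int(sites) if sites else None,
                float(dist) if dist else None, children)
    return node, match.end()


def parse_tree(text: str) -> Node:
    """Parse a whole <tree>; the closing ';' is accepted but not required here."""
    text = "".join(text.split())  # assumption: whitespace carries no meaning
    node, pos = _subtree(text, 0)
    if text[pos:] not in ("", ";"):
        raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
    return node


def node_name(node: Node) -> str:
    """Build a node name in the syntax used by <ROOT> and <ALTERNATIVES>."""
    if not node.children:
        return node.name
    return "(" + "-".join(node_name(c) for c in node.children) + ")"


def leaf_sites(node: Node) -> List[int]:
    """Collect the binding site numbers given at the leaves, left to right."""
    if not node.children:
        return [] if node.sites is None else [node.sites]
    return [s for child in node.children for s in leaf_sites(child)]


tree = parse_tree("((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5)")
print(node_name(tree))   # ((Human-Chimp)-Baboon)
print(leaf_sites(tree))  # [12, 14, 11]
```

For this tree the mean of the leaf binding site numbers is 37/3, roughly 12.33, which is the value the <START> description says is used for the root when <START> is not given.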

Call

The program is called with
creto [OPTIONS] FILE
where FILE is the name of the input file that contains the phylogenetic data (see Input). OPTIONS are parameters of the program, defined as follows:
Generic options:
  --help                      display this help and exit
  --version                   output version information and exit
  -v, --verbose               explain what is being done

Configuration:
 Calculations:
  -a, --alternatives          detect this number of subtrees that are most likely
                                to have alternative rates and determine these
                                rates. [0]
  -e, --equilibrium           optimize root parameters under the condition that
                                they are in the equilibrium state.
  -l, --lambda                optimize only lambda and use root mu for all alternative
                                nodes.
  -m, --mu                    optimize only mu and use root lambda for all alternative
                                nodes.
  -t, --test                  use test state.
 Start values:
  --bound                     fraction of the binding site number density on
                                both boundaries that is not considered by the
                                calculations. [1e-6]
  --decay                     fraction of the binding sites that have existed
                                continuously since the root (for the determination
                                of the optimization start values). [0.5]
 Optimization:
  -d, --delta                 limit number of rate value bisections in each improvement
                                step to given value. [20]
  -r, --repeats               limit rounds of parameter optimizations to given
                                value.
  --density                   minimal probability density.
 Output:
  -p, --plain                 write plain text results in this file.
                                [results.txt]
  -P, --postscript            write encapsulated postscript results in this file.
                                [results.eps]
  -f, --figure                don't print file name and comments in eps file.
  -s, --subtree               print subtree attributes in plain and eps file.
  -u, --uniform               print branches with uniform length in eps file.
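
To show how the input format and the call fit together, the following Python sketch renders an input file and assembles a command line. It is an illustration under stated assumptions, not part of creto: the helper names render_input and build_command, the file name my_input.txt, the comment text and the <START> value 12 are all hypothetical. It also assumes that tags are written literally with their angle brackets. Only switches that take no value in the option list above are used, because the list does not show how option values are passed.

```python
import shlex
from typing import Dict, List


def render_input(sections: Dict[str, str]) -> str:
    """Render tag/data pairs: each tag on its own line, its data on the next line."""
    lines = ["# example input (lines starting with '#' are comments)"]
    for tag, data in sections.items():
        lines.append(f"<{tag}>")
        lines.append(data)
    return "\n".join(lines) + "\n"


def build_command(input_file: str, flags: List[str]) -> List[str]:
    """Assemble 'creto [OPTIONS] FILE' as an argument list."""
    return ["./creto", *flags, input_file]


text = render_input({
    "COMMENTS": "Small primate example",                          # hypothetical text
    "TREE": "((Human|12:6.6,Chimp|14:6.6):23.9,Baboon|11:30.5);",
    "START": "12",                                                # hypothetical value
})
print(text, end="")

# -v: verbose, -e: equilibrium condition for the root, -s: print subtree attributes
command = build_command("my_input.txt", ["-v", "-e", "-s"])
print(shlex.join(command))  # ./creto -v -e -s my_input.txt
```

The argument list returned by build_command can be passed to subprocess.run once the rendered text has been saved under the chosen file name.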
  

Output

During the optimization, creto shows the current turnover rates together with the corresponding likelihood. When the optimization is finished, the program prints an overview of the input, the initialization values and the results, together with characteristics of the given data such as the mean and variance of the binding site numbers, the ages of the clades and the variance/mean ratio. The turnover rates as well as the ratios of these rates are given for each defined subtree with alternative rates. This data is also written to a text file [default results.txt] and, together with the tree, to a graphical file [default results.eps].
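
The data characteristics mentioned above (mean, variance and variance/mean ratio of the binding site numbers) can be reproduced by hand for a small data set. The following Python sketch does so for the leaf numbers of the example tree in the Input section. The function name dispersion_summary is hypothetical, and the use of the population variance is an assumption: the text does not say whether creto reports the population or the sample variance.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def dispersion_summary(counts: Sequence[int]) -> Dict[str, float]:
    """Mean, variance and variance/mean ratio of binding site numbers."""
    m = mean(counts)
    v = pvariance(counts)  # assumption: population variance (divide by n)
    return {"mean": m, "variance": v, "variance/mean": v / m}


# Leaf binding site numbers for Human, Chimp and Baboon from the example tree
summary = dispersion_summary([12, 14, 11])
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

For these three numbers the mean is 37/3 (about 12.33), the population variance is 14/9 (about 1.56), and the variance/mean ratio is 14/111 (about 0.13).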

Downloads

Current release

Installation

  • Requirements: C++ compiler, GNU Make
  • Copy the source file into the target directory.
  • Unpack the source file with
    tar -xzf creto_1-0.src.tgz
  • Change to the Creto directory.
  • Compile Creto with
    make
  • For an example, call
    ./creto example/primate_HoxA_ERE.txt

Changes and old versions

Publications

Contact

  • Wolfgang Otto