#WagnerTest.pl version 0.0
#   author: Sonja J. Prohaska
#institute: Bioinformatics, Institute for Computer Science
#  address: Haertelstr. 16-18, 04107 Leipzig, Germany
#     date: 16.01.2005
# language: perl


Please report bugs or problems to sonja@bioinf.uni-leipzig.de


Documentation
=============

The program takes an alignment file and a tree file and
computes the relative modification rates along branches
as described in Wagner et al. (2004).

Make the Perl program executable (chmod +x WagnerTest.pl) and

(1) call the program without parameters to get usage information:

  usage: WagnerTest.pl [OPTIONS] ALIGNMENTFILE TREEFILE
options: -d          debug
         -cw         default: alntype is clustalw
         -mfa        alntype set from clustalw to multifasta
         -lc=ignore  default: treat lower case letters as unaligned (dialign2)
         -lc=uc      treat lower case letters like upper case letters
         -r=         number of bootstrap replicates, default: 0
                     (each bootstrap step will take as long as the actual run)
         -bootsize=  default: double the mean block length


ALIGNMENTFILE
-------------

should be in either of the following formats:
- multifasta
- clustalw (or clustalw like format of dialign2)

The program cannot handle files with several multifasta alignments in one file.

CAUTION: the program does not check that the input file conforms to these formats
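
Since the program performs no format checks, it can help to validate a
multifasta alignment before running it. The following is a minimal sketch in
Python (not part of WagnerTest.pl); the function names read_multifasta and
check_alignment are hypothetical, and the sketch assumes that the first word
after '>' is the sequence name and that all rows of an alignment must have the
same length.

```python
def read_multifasta(text):
    """Parse multifasta text into an ordered {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a sequence name")
            name = fields[0]  # assumption: first word is the name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_alignment(records):
    """Return the alignment length, or raise if rows differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop()


# Made-up two-row alignment, wrapped over two lines per sequence.
example = ">Aa\nACGT-ac\nGG\n>Bb\nACGTTAC\nGG\n"
alignment = read_multifasta(example)
print(check_alignment(alignment))  # prints 9
```

A file that passes this check is not guaranteed to be accepted by the program;
the check only catches the most common formatting slips.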


TREEFILE
--------

very verbose format:

(((Aa[0]:0.2,Bb[0]:0.2)[5]:0.2,(C[0]:0.3,D[0]:0.3)[7]:0.1)[10]:0.6,E[0]:1.0)[15]
   |  |   | |           |
   |  |   | |           divergence time of Aa and Bb
   |  |   | Aa and Bb are sister groups
   |  |   branch length of Aa (not used)
   |  time of leaf Aa, always recent (0)
   species Aa

most simple format:

(((A,B),(C,D)),E)

other examples:

(((A:0.2,B:0.2):0.2,(C:0.3,D:0.3):0.1):0.6,E:1.0)
(((A,B)[5],(C[0],D[0])[7])[10],E)[15]
(E[0],((D,C)[7],(A[0],B[0]))[10])[15]
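
To illustrate how the annotations in these formats fit together, here is a
minimal sketch of a parser in Python. It is not the parser used by
WagnerTest.pl; the names parse_tree and leaves are hypothetical. It assumes
that every node is either a species name or a parenthesized, comma-separated
list of nodes, optionally followed by a time in square brackets and then by a
branch length after a colon.

```python
def parse_tree(text):
    """Parse a tree string into nested dicts.

    Each node is {"name", "time", "length", "children"}; "time" comes from
    an optional [..] annotation and "length" from an optional :.. annotation.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        children, name = [], None
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        else:
            name = read_until("[]:,()")
            if not name:
                raise ValueError(f"expected a species name at position {pos}")
        time = length = None
        if pos < len(text) and text[pos] == "[":
            pos += 1  # consume "["
            time = float(read_until("]"))
            if pos >= len(text):
                raise ValueError("missing ']'")
            pos += 1  # consume "]"
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",()[]"))
        return {"name": name, "time": time, "length": length,
                "children": children}

    root = node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(node):
    """List the species names below a node, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


verbose = ("(((Aa[0]:0.2,Bb[0]:0.2)[5]:0.2,(C[0]:0.3,D[0]:0.3)[7]:0.1)"
           "[10]:0.6,E[0]:1.0)[15]")
tree = parse_tree(verbose)
print(leaves(tree))                               # ['Aa', 'Bb', 'C', 'D', 'E']
print(tree["children"][0]["children"][0]["time"])  # 5.0, divergence of Aa and Bb
```

In the most simple format all times and branch lengths are simply absent
(None in this sketch), and a string with unbalanced parentheses raises a
ValueError.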


OPTIONS
-------

-cw          alignment format is clustalw

-mfa         alignment format is multifasta

-lc=ignore   lower case letters are treated as unaligned
             and therefore replaced by gaps

-lc=uc       lower case letters are treated as if they were upper case

-d           debug, writes out more information
             about the tree and the alignment read

-r=100       number of bootstrap replicates; by sampling blocks of length
             'bootsize' from the input alignment, the program generates
             'r' (e.g. 100) randomized alignments of the same length
             (analysis of each randomized alignment takes as much time
             as a run with -r=0)

             default: 0

-bootsize=   size of the blocks that are sampled from the input alignment
             to build the randomized alignments

             default: double the mean block length
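
The block sampling behind -r and -bootsize can be sketched as follows. This
is an illustration in Python, not the code of WagnerTest.pl: the function
name block_bootstrap is hypothetical, and the sketch assumes that block start
positions are drawn uniformly with replacement and that the last block is
trimmed so that the randomized alignment has the same length as the input.

```python
import random


def block_bootstrap(alignment, block_size, rng):
    """Resample alignment columns in contiguous blocks.

    alignment: {name: aligned sequence}, all rows of equal length.
    Returns a randomized alignment of the same length.
    """
    length = len(next(iter(alignment.values())))
    if not 1 <= block_size <= length:
        raise ValueError("block_size must be between 1 and the alignment length")
    columns = []
    while len(columns) < length:
        # assumption: uniform random start, sampling with replacement
        start = rng.randrange(length - block_size + 1)
        columns.extend(range(start, start + block_size))
    columns = columns[:length]  # trim the last block to the original length
    # The same columns are used for every row, so columns stay intact.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


rng = random.Random(1)  # fixed seed, only to make the sketch repeatable
aln = {"A": "ACGTACGTAC", "B": "ACGTTCGTAA"}  # made-up toy alignment
replicates = [block_bootstrap(aln, 4, rng) for _ in range(3)]
for rep in replicates:
    print(rep["A"], rep["B"])
```

Sampling whole blocks rather than single columns keeps neighbouring columns
together, which matters when conserved characters occur in contiguous blocks.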

PERFORMANCE
-----------

The program is not very fast (it is written in Perl).
It takes about 2 min for 8 species and an alignment of length ~52000 nt.
A progress report is written to STDERR:

1 of 2 quartets...
2 of 2 quartets...
replicate 1 of 100:
1 of 2 quartets...
2 of 2 quartets...
replicate 2 of 100:
1 of 2 quartets...

OUTPUT
------

The output is written to STDOUT.

Lines starting with '#' are comments, headers, or separators.

The output is a list of all quartets (O X A B), where X is an outgroup to (A,B)
and O is an outgroup to (X,lca(A,B)).
Here lca(A,B) denotes the last common ancestor of A and B.

T2    ... divergence time of A and B (taken from the treefile)
c(XA) ... number of characters shared between O, X and A
c(XB) ... number of characters shared between O, X and B
c(AB) ... number of characters shared between O, X, A and B
u     ... number of characters shared between O and (X or A or B)
q     ... estimated number of characters at the last common ancestor of X, A and B
var   ... variance of the exponential process
block_len mean length of contiguous blocks of characters (footprints)
          conserved in O and X
s     ... standard deviation of the block length
z     ... test statistic
z'    ... test statistic rescaled by the block_len
          positive z' values indicate increased rate along lca(A,B) to B
          negative z' values indicate decreased rate along lca(A,B) to B

          10% Level  z'>= 1.64
           5% Level  z'>= 1.96
           1% Level  z'>= 2.57

la2T2 ... lambda_2 * T_2
laT2  ... lambda * T_2
la2   ... lambda_2 (modification rate along the branch from lca(A,B) to B)
          only computed if T2 is given
la    ... lambda (modification rate along the branch from lca(A,B) to A,
          and all other branches)
          only computed if T2 is given
violated! ... bad news: the constraint q/u >= 1 is violated
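
The significance thresholds listed for z' above agree, up to truncation, with
the two-sided quantiles of the standard normal distribution. The following
Python sketch reproduces them under the assumption that z' is approximately
standard normal when the rates are equal; the function names are hypothetical
and the code is not part of WagnerTest.pl.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """|z| threshold of a two-sided test at significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def significance_level(z, levels=(0.01, 0.05, 0.10)):
    """Return the smallest listed level at which |z| is significant, else None."""
    for alpha in sorted(levels):
        if abs(z) >= two_sided_critical_value(alpha):
            return alpha
    return None


for alpha in (0.10, 0.05, 0.01):
    print(f"{alpha:.0%} level: |z'| >= {two_sided_critical_value(alpha):.3f}")
# prints:
# 10% level: |z'| >= 1.645
# 5% level: |z'| >= 1.960
# 1% level: |z'| >= 2.576
```

Under this reading, a z' of -2.0 would be significant at the 5% level and
would point to a decreased rate along lca(A,B) to B.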

The output can be further summarized with the program WagnerTest_plot.pl:

WagnerTest_plot.pl [OPTIONS] WAGNERTEST-OUTFILE TREEFILE

For each quartet, it calculates the mean z' (or z) value and its standard
deviation over the 'r' sampled alignments.

The output has seven columns:

O-X                 ... outgroups O and X joined with a '-' character
A                   ... species A
B                   ... species B
data->z(')          ... result from the data (should be close to the mean
                        from the distribution of sampled alignments)
sampling_mean->z(') ... mean z' (or z) value for the sampled alignments
sampling_std->z(')  ... standard deviation for the sampled alignments
replicates          ... number of replicates/sampled alignments
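
The per-quartet summary in the last three columns can be sketched as follows.
This Python illustration is not the code of WagnerTest_plot.pl: it does not
read the real output file, the function name summarize and the tuple layout
are hypothetical, the z' values are made up, and the use of the sample
standard deviation (n-1 in the denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(replicate_rows):
    """Group z' values by quartet and return (mean, std, count) per quartet.

    replicate_rows: iterable of (O, X, A, B, z_prime) tuples, one per
    quartet per sampled alignment.
    """
    by_quartet = defaultdict(list)
    for o, x, a, b, z in replicate_rows:
        by_quartet[(f"{o}-{x}", a, b)].append(z)
    summary = {}
    for key, values in by_quartet.items():
        # stdev needs at least two values; report 0.0 for a single replicate
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), spread, len(values))
    return summary


rows = [  # made-up values for illustration only
    ("E", "C", "A", "B", 1.0),
    ("E", "C", "A", "B", 2.0),
    ("E", "C", "A", "B", 3.0),
    ("E", "D", "A", "B", -0.5),
]
for (ox, a, b), (m, s, n) in summarize(rows).items():
    print(ox, a, b, f"{m:.2f}", f"{s:.2f}", n)
# prints:
# E-C A B 2.00 1.00 3
# E-D A B -0.50 0.00 1
```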


For interpretation of the results see:

Reference
=========

Divergence of Conserved Non-Coding Sequences:
Rate Estimates and Relative Rate Tests
Günter P. Wagner, Claudia Fried, Sonja J. Prohaska, Peter F. Stadler
Mol. Biol. Evol. 21: 2116-2121 (2004)

download at: http://www.bioinf.uni-leipzig.de/Publications/published.html