#WagnerTest.pl version 0.0
# author:    Sonja J. Prohaska
# institute: Bioinformatics, Institute for Computer Science
# address:   Haertelstr. 16-18, 04107 Leipzig, Germany
# date:      16.01.2005
# language:  perl

Please report bugs or problems to sonja@bioinf.uni-leipzig.de

Documentation
=============

The program takes an alignment and a tree file and computes the relative
modification rates along branches as described in Wagner et al 2004.

Make the perl program executable (chmod +x WagnerTest.pl), then call the
program without parameters to get usage information:

usage: WagnerTest.pl [OPTIONS] ALIGNMENTFILE TREEFILE
options:
  -d           debug
  -cw          default: alntype is clustalw
  -mfa         alntype set from clustalw to multifasta
  -lc=ignore   default: treat lower case letters as unaligned (dialign2)
  -lc=uc       treat lower case letters like upper case letters
  -r=          number of bootstrap replicates, default: 0
               (each bootstrap step will take as long as the actual run)
  -bootsize=   default: double the mean block length

ALIGNMENTFILE
-------------
should be in either of the following formats:
  - multifasta
  - clustalw (or clustalw like format of dialign2)
The program cannot handle files with several multifasta alignments in one file.
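
Because the alignment file has to hold exactly one alignment in a known
format, it can help to sanity-check a multifasta file before a long run.
The following Python sketch is not part of WagnerTest.pl. The function names
and the three checks (at least four sequences, unique names, equal row
lengths) are assumptions chosen for illustration, based only on the fact that
the program works on quartets of aligned sequences.

```python
def read_multifasta(text):
    """Parse multifasta text into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records):
    """Return a list of problems; an empty list means the basic checks passed.

    All three checks are illustrative assumptions, not rules taken from
    WagnerTest.pl itself.
    """
    problems = []
    names = [name for name, _ in records]
    if len(records) < 4:
        problems.append("fewer than 4 sequences (a quartet needs 4)")
    if len(set(names)) != len(names):
        problems.append("duplicate names (several alignments in one file?)")
    lengths = sorted({len(seq) for _, seq in records})
    if len(lengths) > 1:
        problems.append("rows differ in length: %s" % lengths)
    return problems


if __name__ == "__main__":
    example = ">A\nAC-T\n>B\nACGT\n>C\nAC\nGT\n>D\nacgt\n"
    print(check_alignment(read_multifasta(example)))
```

Gap and lower case handling are deliberately left out, since they depend on
the -lc option.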
CAUTION: there are no checks that these formats are correct

TREEFILE
--------
very verbose format:

  (((Aa[0]:0.2,Bb[0]:0.2)[5]:0.2,(C[0]:0.3,D[0]:0.3)[7]:0.1)[10]:0.6,E[0]:1.0)[15]
     |  |   |           | |
     |  |   |           | divergence time of Aa and Bb
     |  |   |           Aa and Bb are sister groups
     |  |   branch length of Aa (not used)
     |  time of leaf Aa, always recent (0)
     species Aa

simplest format:

  (((A,B),(C,D)),E)

other examples:

  (((A:0.2,B:0.2):0.2,(C:0.3,D:0.3):0.1):0.6,E:1.0)
  (((A,B)[5],(C[0],D[0])[7])[10],E)[15]
  (E[0],((D,C)[7],(A[0],B[0]))[10])[15]

OPTIONS
-------
-cw          alignment format is clustalw
-mfa         alignment format is multifasta
-lc=ignore   lower case letters are treated as unaligned and are therefore
             replaced by gaps
-lc=uc       lower case letters are treated as if they were upper case
-d           debug, writes out more information about the tree and the
             alignment read
-r=100       sampling blocks of length 'bootsize' from the input alignment,
             the program generates 'r' (e.g. 100) randomized alignments with
             the same length (analysis of each randomized alignment will
             take as much time as a run with -r=0)
             default: 0
-bootsize=   is the size of the block for sampling randomized alignments
             from the input alignment
             default: double the mean block length

PERFORMANCE
-----------
The program is not fast (it is written in perl). It takes about 2 min for
8 species and an alignment of length ~52000 nt.
A progress report is written to STDERR:

  1 of 2 quartets...
  2 of 2 quartets...
  replicate 1 of 100:
  1 of 2 quartets...
  2 of 2 quartets...
  replicate 2 of 100:
  1 of 2 quartets...

OUTPUT
------
The output is written to STDOUT. Lines starting with '#' are comments,
headers, or separators.
It is a list of all quartets (O X A B) where X is an outgroup to (A,B) and
O is an outgroup to (X,lca(A,B)). Here lca(A,B) denotes the last common
ancestor of A and B.

T2         ... divergence time of A and B (taken from the treefile)
c(XA)      ... number of characters shared between O, X and A
c(XB)      ...
               number of characters shared between O, X and B
c(AB)      ... number of characters shared between O, X, A and B
u          ... number of characters shared between O and (X or A or B)
q          ... estimated number of characters at the last common ancestor
               of X, A and B
var        ... variance of the exponential process
block_len  ... mean length of contiguous blocks of characters (footprints)
               conserved in O and X
s          ... standard deviation from the mean block length
z          ... test statistic
z'         ... test statistic rescaled by the block_len
               positive z' values indicate increased rate along lca(A,B) to B
               negative z' values indicate decreased rate along lca(A,B) to B
               10% Level z'>= 1.64
                5% Level z'>= 1.96
                1% Level z'>= 2.57
la2T2      ... lambda_2 * T_2
laT2       ... lambda * T_2
la2        ... lambda_2 (modification rate along the branch from lca(A,B)
               to B), only computed if T2 is given
la         ... lambda (modification rate along the branch from lca(A,B)
               to A, and all other branches), only computed if T2 is given
violated!  ... bad news! Constraint q/u >= 1 violated.

The output can be further summarized with the program WagnerTest_plot.pl:

  WagnerTest_plot.pl [OPTIONS] WAGNERTEST-OUTFILE TREEFILE

It calculates the mean z' (or z) value and its standard deviation for the
'r' sampled alignments for each quartet.
The output has seven columns:

O-X                  ... outgroups O and X joined with a '-' character
A                    ... species A
B                    ... species B
data->z(')           ... result from the data (should be close to the mean
                         from the distribution of sampled alignments)
sampling_mean->z(')  ... mean z' (or z) value for the sampled alignments
sampling_std->z(')   ... standard deviation for the sampled alignments
replicates           ... number of replicates/sampled alignments

For interpretation of the results see:

Reference
=========

Divergence of Conserved Non-Coding Sequences: Rate Estimates and Relative
Rate Tests
Günter P. Wagner, Claudia Fried, Sonja J. Prohaska, Peter F. Stadler
Mol.Biol.Evol.
21: 2116-2121 (2004)

download at: http://www.bioinf.uni-leipzig.de/Publications/published.html
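
As an illustration of the z' levels and of the per-quartet replicate summary
described under OUTPUT, here is a short Python sketch. It is not taken from
WagnerTest.pl or WagnerTest_plot.pl. The function names are hypothetical, the
cutoffs are the ones listed above, and two points are assumptions: the
cutoffs are applied to the absolute value of z' so that decreased rates are
judged like increased ones, and the standard deviation is the sample
standard deviation (n - 1).

```python
import statistics

# (level in percent, cutoff) as listed in the OUTPUT section, strictest first.
LEVELS = [(1, 2.57), (5, 1.96), (10, 1.64)]


def significance_level(z):
    """Return the strictest listed level (in percent) reached by |z|, or None.

    Assumption: the cutoffs are applied to the absolute value of z.
    """
    for level, cutoff in LEVELS:
        if abs(z) >= cutoff:
            return level
    return None


def direction(z):
    """Describe the rate along lca(A,B) to B, following the sign of z'."""
    if z > 0:
        return "increased"
    if z < 0:
        return "decreased"
    return "no difference"


def summarize(replicate_z):
    """Return (mean, standard deviation, count) for one quartet's replicates.

    Assumption: sample standard deviation (n - 1); the real script may
    normalize differently.
    """
    n = len(replicate_z)
    mean = statistics.mean(replicate_z)
    std = statistics.stdev(replicate_z) if n > 1 else float("nan")
    return mean, std, n


if __name__ == "__main__":
    z_data = 2.1                      # hypothetical z' from the real data
    z_reps = [1.8, 2.4, 2.0, 2.3]     # hypothetical z' from sampled alignments
    print(direction(z_data), significance_level(z_data), summarize(z_reps))
```

Comparing the data value with the replicate mean, as the data->z(') column
suggests, is left to the reader.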