This is the documentation to aln_order_v2.0.pl version 2.0.0 . usage: aln_order_v2.0.pl [OPTIONS] [MULTIFASTA SCORINGMATRIX | -dist=DISTANCEMATRIX] options: -d debug -dmo distance matrix (only) -msto minimum spanning tree (only) -mhpo minimum haletonian path (only) -rr= calculate path for x *r*epetitions of *r*andom paths This program can handle two different types of input: A) a multi-fasta sequence alignment (I) AND a scoring matrix (II) (from which it calculates the distance matrix itself) or B) a distance matrix (III) With default parameter settings it computes * the distance matrix (if it was not part of the input) (III) * the minimum weight spanning tree (IV) * and the minimum weigth hamiltonian path(s) (V) If -rr=[INTEGER] is specified and the minimum hamiltoniam path is going to be computed, -rr=[INTEGER] causes computation of [INTEGER] random pathes and their lengths. (Va) Examples (for a protein sequence alignment): (I) multi-fasta sequence alignment: > hox1 PNAIRTNFTTKQLTELEKEFHFNKYLTRARRVEIAATLQLNETQVKIWFQNRRMKQKKRE > hox2 SRRLRTAYTNTQLLELEKEFHFNKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQT > hox3 SKRARTAYTSAQLVELEKEFHFNRYLCRPRRVEMANLLNLTERQIKIWFQNRRMKYKKDQ > hox4 PKRSRTAYTRQQVLELEKEFHYNRYLTRRRRIEIAHTLCLSERQIKIWFQNRRMKWKKDH > hox5 GKRARTAYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLSERQIKIWFQNRRMKWKKDN > hox6 RRRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIANALCLTERQIKIWFQNRRMKWKKEN > hox7 RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEH > hox8 RRRGRQTYSRYQTLELEKEFLFNPYLTRKRRIEVSHALGLTERQVKIWFQNRRMKWKKEN > hox9 TRKKRCPYTKYQTLELEKEFLFNMYLTRDRRYEVARLLNLTERQVKIWFQNRRMKMKKMN > hox10 TRKKRCPYTKHQTLELEKEFLFNMYLTRERRLEISKSVNLTDRQVKIWFQNRRMKLKKMN > hox11 TRKKRCPYTKYQIRELEREFFFNVYINKEKRLQVSRNLNLTDRQVKIWFQNRRMKEKKLN > hox12 SRKKRKPYTKQQIAELENEFLVNEFINRQKRKELSNRLNLSDQQVKIWFQNRRMKKKRVV > hox13 GRKKRVPYTKLQLKELENEYAINKFINKDKRRRISATTNLSERQVTIWFQNRRVKEKKVV (II) scoring matrix: Notice: the comment line is mandatory! # A C D E F G H I K L M N P Q R S T V W Y - A 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 D 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 E 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 F 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 G 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 H 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 K 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 L 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 N 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 P 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 Q 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 R 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 S 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 T 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 V 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 W 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 Y 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 - 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 (III) distance matrix: Notice: the comment line is mandatory! # hox2 hox5 hox13 hox8 hox10 hox3 hox7 hox6 hox12 hox11 hox9 hox4 hox1 hox2 0 20 32 23 24 15 21 20 30 30 21 21 20 hox5 20 0 33 15 22 15 7 8 32 29 19 7 22 hox13 32 33 0 34 28 34 35 34 23 24 28 33 32 hox8 23 15 34 0 19 24 10 9 30 25 16 20 26 hox10 24 22 28 19 0 25 22 20 24 17 10 25 26 hox3 15 15 34 24 25 0 18 18 30 29 21 17 22 hox7 21 7 35 10 22 18 0 3 33 29 19 10 23 hox6 20 8 34 9 20 18 3 0 31 27 17 13 23 hox12 30 32 23 30 24 30 33 31 0 22 25 31 33 hox11 30 29 24 25 17 29 29 27 22 0 17 32 32 hox9 21 19 28 16 10 21 19 17 25 17 0 23 24 hox4 21 7 33 20 25 17 10 13 31 32 23 0 21 hox1 20 22 32 26 26 22 23 23 33 32 24 21 0 (IV) minimum weigth spanning tree (mst): # minimum spanning tree (MST): # hox6 hox7 [3] # hox5 hox7 [7] # hox5 hox4 [7] # hox8 hox6 [9] # hox10 hox9 [10] # hox5 hox3 [15] # hox2 hox3 [15] # hox8 hox9 [16] # hox10 hox11 [17] # hox2 hox1 [20] # hox12 hox11 [22] # hox13 hox12 [23] MST [164] # ----- The mst is a list of distances (fourth column in square brackets) between sequence one (second column) and sequence two (third column). It runs through all sequences, and conntects them with minimum weigth, given in square brackets after "MST". The minimum hamiltonian path is at minimum as short as the sum over all weigthes in the minimum weigth spanning tree. (V) minimum weigth hamiltonian path(s): # minimum hamiltonian path(s) (MHP): MHP hox13 hox12 hox11 hox10 hox9 hox8 hox6 hox7 hox5 hox4 hox3 hox2 hox1 [166] MHP hox1 hox2 hox3 hox4 hox5 hox7 hox6 hox8 hox9 hox10 hox11 hox12 hox13 [166] ----- The minimum hamiltonian path is a list of sequences, that lists each sequence exactly once. The order represents the path. The number in square brackets is the length of the hamiltonian path. Several (but not all!) pathes with the minimum length are listed. There should be at least 2 paths with start at A and end at B and the other way round. The algorithm computes at least >one< exact solution. (It doesn't use a heuristic.) Other solutions with minimum length might exist. The problem is NP-hard, the complexity of the algorithm is about O(n^3 x 2^n). This causes a RAPID increase in runtime with the number of sequences (n). Running the program on small datasets is strongly recommanded. (Va) length of random paths: If -rr=[INTEGER] is specified [INTEGER] random pathes and their lengths will be computed. # length of 5 random paths (RR): MHPRR hox11 hox13 hox9 hox1 hox2 hox7 hox4 hox5 hox3 hox6 hox8 hox12 hox10 [230] MHPRR hox7 hox13 hox2 hox4 hox3 hox5 hox12 hox9 hox10 hox1 hox11 hox6 hox8 [281] MHPRR hox10 hox9 hox3 hox11 hox8 hox12 hox6 hox5 hox13 hox2 hox1 hox4 hox7 [270] MHPRR hox12 hox2 hox9 hox10 hox11 hox8 hox1 hox7 hox4 hox5 hox3 hox13 hox6 [252] MHPRR hox8 hox5 hox1 hox13 hox9 hox4 hox3 hox11 hox6 hox12 hox7 hox2 hox10 [302] Calculation of the the HMP is so time consuming that computation of even 1000 replicates for random pathes is negectable concering the runtime. The program has been tested on Linux, Fedora core 7, with perl, v5.8.8 built for x86_64-linux. It should be running on other operating systems as well.