CODE2ALN - the tutorial

A short and simple guide to using code2aln for the first time.

Code2aln produces multiple nucleic acid alignments using information on coding and non-coding regions as part of the scoring function. It is very easy to use.
The source code comes with an example data set that can be processed as a test case. You can compare your results with the reference results distributed with the source code.

1. If you have not done so already, first compile the code: change to the directory containing the source files and type

make

This produces the executable file code2aln.

2. Code2aln takes the input nucleic acid sequence files as command-line arguments (see the manual page code2aln.1). The example data set is in Pearson (FASTA) format. Type

code2aln test.seq

to start the alignment of the test data set. Please wait a few minutes until the alignment is done.

3. The program writes some information to stdout about the scores of the individual partial (pairwise and cluster) alignments. Compare this output with the contents of the file stdout.txt in the subdirectory ./Output. There should be no difference.

4. Code2aln detects all theoretically possible open reading frames that have a minimum length of 300 or one fifteenth of the sequence length. When you align sequences from GenBank files, it also extracts all information about open reading frames and exons from those files. The arrangement of the coding regions is written as PostScript output to the file ORF.ps. The file ORF.ps in the subdirectory ./Output should show the same arrangement (only the date of generation differs).

5. Information about the aligned sequences and their coding regions is also written to the file info.ral. The contents should be the same as in the file ./Output/info.ral.

6. The file cluster.txt gives a representation of the guide tree that determines the order of the profile alignments. The file should be the same as the example in ./Output/cluster.txt.

7. The final alignment is written to the file aln.aln. This file should be the same as the example in ./Output/aln.aln, except for the first line, which shows the date of generation.
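
The file comparisons in steps 5 to 7 can also be scripted. The following is only a minimal sketch in Python and is not part of code2aln: the function names same_content and check_outputs are made up for this example, and it assumes you ran code2aln in the directory that contains the ./Output subdirectory. It skips the first line of aln.aln because that line carries the date of generation. stdout.txt and ORF.ps are left out, since the first requires you to redirect the program output to a file yourself and the second contains a date at a position not described here.

```python
from pathlib import Path


def same_content(produced, reference, skip_lines=0):
    """Return True if both files match after dropping the first skip_lines lines."""
    a = Path(produced).read_text(errors="replace").splitlines()[skip_lines:]
    b = Path(reference).read_text(errors="replace").splitlines()[skip_lines:]
    return a == b


def check_outputs(run_dir=".", ref_dir="Output"):
    """Compare the files named in steps 5 to 7 with their reference copies."""
    # Value = number of leading lines to ignore (the dated first line of aln.aln).
    checks = {"info.ral": 0, "cluster.txt": 0, "aln.aln": 1}
    return {
        name: same_content(Path(run_dir) / name, Path(ref_dir) / name, skip)
        for name, skip in checks.items()
    }


if __name__ == "__main__":
    for name, ok in check_outputs().items():
        print(name, "matches" if ok else "DIFFERS")
```

A file reported as DIFFERS can then be inspected by hand, for example with diff.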

You might compare the resulting alignment with the results of other alignment programs such as clustalw. You should see that code2aln produces fewer gaps, and that a much higher fraction of its gaps have a length divisible by 3. This shows the strong tendency of code2aln not to disrupt codons within coding regions.
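
To put numbers on such a comparison, you can count the gaps in each alignment. The following Python sketch is an illustration only and not part of code2aln. It assumes that a "gap" means a run of consecutive "-" characters within one aligned row (runs at the start or end of a row are counted too), and that you have already extracted the aligned rows as plain strings, since the layout of aln.aln is not described in this tutorial. The function name gap_stats and the example rows are made up.

```python
import re


def gap_stats(aligned_rows, gap_char="-"):
    """Return (number of gaps, gaps with length divisible by 3, their fraction)."""
    pattern = re.compile(re.escape(gap_char) + "+")
    # Length of every run of gap characters, collected over all rows.
    lengths = [len(m.group()) for row in aligned_rows for m in pattern.finditer(row)]
    total = len(lengths)
    in_frame = sum(1 for n in lengths if n % 3 == 0)
    fraction = in_frame / total if total else 0.0
    return total, in_frame, fraction


if __name__ == "__main__":
    # Made-up rows for illustration; replace with rows read from your alignment.
    rows = ["ATG---GCTTAA", "ATGAAAGC-TAA", "ATG------TAA"]
    total, in_frame, fraction = gap_stats(rows)
    print(total, "gaps,", in_frame, "with length divisible by 3")
```

Running the same function on the rows of both alignments gives two sets of figures that can be compared directly.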

You can use various input file formats, one or more sequences per file, and 18 types of codon tables. See the manual page for further information.

Happy aligning ;)