Manpage of codaln

codaln

Section: Multiple Nucleic Acid Sequence Alignments (1.0)
Updated: man page version 1.0
 

NAME

codaln - multiple nucleic acid sequence alignment that uses coding-region information for scoring

SYNOPSIS

codaln [-h] [-l orf-length] [-e] [-f filename] [-a[1|2|3] factor] [-n[0|1|2|3] factor] [-c CTN] [-g[o|x] factor] [-m[0|1] factor] [-S[f|i] filename] [-G[f|i] filename] datafiles ...  

DESCRIPTION

codaln produces multiple nucleic acid alignments and uses information on coding and non-coding regions as part of the scoring function. Where the input sequences are coding, the nucleic acid sequences are usually more divergent than the protein sequences they encode; scoring the translated amino acid sequences alongside the nucleotides counteracts this problem.

codaln reads the input nucleic acid sequence files (datafiles) given as arguments. Accepted input formats are Pearson (FASTA) format, GenBank format, ViennaRNA format, and unformatted sequence data on one or more lines. Formats can be mixed freely within one program run; codaln detects the type of each input file automatically and handles it accordingly. Sequences in any of these formats can be supplied as separate files or merged into one file.

One or more codon tables (CTN) can be defined for individual sequences or groups of sequences. The default is the universal genetic code. Entering 'codaln -h' without further options or input files displays a list of the available codon tables (currently 18, several of them for mitochondrial genomes). The codon tables are needed to search for start and stop codons and to translate the detected open reading frames.

Each codon table (-c CTN) applies to all input sequences that follow it on the command line, until the next codon table entry. codaln detects all theoretically possible open reading frames that reach a minimum length, which can be set with -l orf-length. The default is 300 nucleotides or one fifteenth of the sequence length. Exons and fragmented coding regions are joined and translated, and the resulting amino acid sequences are used for scoring alongside the nucleic acid sequences, exactly as if the coding region were contiguous.
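
The open reading frame search described above can be illustrated with a minimal sketch. It is not codaln's implementation: it assumes the universal code (start codon ATG; stop codons TAA, TAG, TGA), scans the forward strand only, and numbers frames 1 to 3 by offset from the first nucleotide. The function name find_orfs and its parameters are hypothetical.

```python
# Illustrative sketch only; codaln's real ORF detection may differ
# (other start codons, other codon tables, joined exons).
STOPS = {"TAA", "TAG", "TGA"}  # universal genetic code (assumption)


def find_orfs(seq, min_len=300):
    """Return (start, end, frame) for each ORF of at least min_len nucleotides.

    start is 0-based, end is exclusive and includes the stop codon,
    frame is 1, 2 or 3 (offset of the frame plus one).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, offset + 1))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
    print(find_orfs(demo, min_len=18))  # [(2, 20, 3)]
```

Raising min_len above the length of a candidate drops it from the result, which mirrors the effect of -l orf-length.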

All pairwise alignments are computed using all scoring parameters. Amino acid scores are weighted against each other as defined by the widely used BLOSUM62 matrix. To balance them against the default nucleotide scoring values, all BLOSUM62 values are multiplied by 50 (see the OPTIONS section for changing the nucleotide scoring values). Each scoring parameter can be weighted by the user with a factor between 0.01 and the largest 32-bit (single-precision) IEEE 754 floating-point value, or set to 0 (no relevance for scoring).
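
The scaling and the permitted factor range can be sketched as follows. The multiplier 50 and the range come from the text above; that the user factor is applied as a simple product is an assumption, and the names check_factor and weighted_aa_score are hypothetical. The BLOSUM62 entries used in the example (alanine/alanine = 4, tryptophan/tryptophan = 11) are taken from the standard matrix.

```python
import struct

# Largest finite IEEE 754 single-precision value (about 3.4028235e38).
FLT_MAX = struct.unpack("<f", bytes.fromhex("ffff7f7f"))[0]
BLOSUM_SCALE = 50  # multiplier stated in the text


def check_factor(factor):
    """Return factor as float if it is 0 or lies in [0.01, FLT_MAX]."""
    if factor == 0 or 0.01 <= factor <= FLT_MAX:
        return float(factor)
    raise ValueError("factor must be 0 or between 0.01 and FLT_MAX")


def weighted_aa_score(blosum_value, factor=1.0):
    """Assumed combination: BLOSUM62 value * 50 * user factor."""
    return blosum_value * BLOSUM_SCALE * check_factor(factor)


if __name__ == "__main__":
    print(weighted_aa_score(4))        # 200.0
    print(weighted_aa_score(11, 0.5))  # 275.0
    print(weighted_aa_score(4, 0))     # 0.0
```

With the default factor, an alanine match scores 200, which puts it on a scale comparable to the default nucleotide match value of 1000 listed in the OPTIONS section.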

A guide tree is then built, which defines the order of the profile alignments. A textual representation of this guide tree is written to the output file cluster.ral. Further output files are ORF.ps, a PostScript display of the open reading frames and exons that were read from the input or detected, and info.ral, which contains the same information in text format.

The coding regions that are used for scoring can be automatically defined, user defined, modified, or eliminated. The file edit.ral can be modified, saved under a different name, and used as an input file (-f filename) for a further program run, in which the user-defined settings override the initial coding region settings. In that run the file ORF_edit.ps is created, a PostScript display of the coding region settings after editing.

The profile alignments follow the guide tree and use all scoring parameter settings, in the same way as the pairwise alignments. Finally, the resulting multiple nucleic acid sequence alignment is written to the output file aln.aln, which is compatible with the widely known clustalw format (see the clustalw literature for more information).

A further output file, checklist.ral, gives two counts for every position in the alignment. Both count the gaps that this position is part of: the first counts gaps whose length CANNOT be divided by three, the second counts gaps whose length CAN be divided by three. This file can be used as input for statistics and charts that show the effect of scoring that respects coding regions: in a successful alignment with adequate scoring parameters, the ratio of the second count to the first should rise significantly in coding regions.
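
The two per-position counts can be recomputed from any alignment, which may help when interpreting checklist.ral. The sketch below assumes that a gap is a maximal run of '-' characters within one aligned sequence; the function name gap_counts is hypothetical, and the exact layout of checklist.ral is not specified here, so the sketch returns tuples instead of writing that file.

```python
import re


def gap_counts(alignment):
    """Count, per column, the gaps covering that column.

    alignment is a list of equal-length aligned sequences using '-'.
    Returns a list of (not_divisible_by_3, divisible_by_3) tuples.
    """
    width = len(alignment[0])
    counts = [[0, 0] for _ in range(width)]
    for row in alignment:
        for gap in re.finditer(r"-+", row):
            length = gap.end() - gap.start()
            slot = 1 if length % 3 == 0 else 0
            for col in range(gap.start(), gap.end()):
                counts[col][slot] += 1
    return [tuple(c) for c in counts]


if __name__ == "__main__":
    aln = ["ATG---CC",
           "AT-GGGCC",
           "ATGGGGCC"]
    for pos, pair in enumerate(gap_counts(aln), start=1):
        print(pos, *pair)
```

In this toy alignment, column 3 is covered by one gap of length 1 (first count) and columns 4 to 6 by one gap of length 3 (second count).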

 

OPTIONS

-h displays a short syntax summary, a list of organisms with non-canonical codon tables that can be used, and a reminder to read this man page.

-l orf-length defines the minimum length orf-length that an automatically detected open reading frame has to reach in order to be relevant for scoring.

-e causes the program to exit after processing the input files, detecting coding regions, and presenting the user information output. This is usually used to obtain the file edit.ral as a template for modifying the list of sequence parts that are treated as coding regions during scoring. The modified file edit.ral has to be renamed after editing.

-f filename reads the file filename, a modified edit.ral type file, at program start and uses it to override the initial coding region settings made by codaln. If modified coding region settings are used, a further PostScript output (ORF_edit.ps) shows the new coding region arrangement graphically.

-a[1|2|3] factor The floating-point value factor weights the scoring terms of amino acid alignments in coding regions in the first (-a1), second (-a2), or third (-a3) reading frame. It can be set between 0.01 and the largest 32-bit (single-precision) IEEE 754 floating-point value, or to 0. In practice, factors above a comparatively low value such as 1000.00 are unlikely to be useful. Information about reading frame types can be taken from ORF.ps and ORF_edit.ps.

-n[0|1|2|3] factor The floating-point value factor weights the scoring terms of nucleic acid alignments in coding regions in the first (-n1), second (-n2), or third (-n3) reading frame, or in non-coding regions (-n0). It can be set between 0.01 and the largest 32-bit (single-precision) IEEE 754 floating-point value, or to 0. In practice, factors above a comparatively low value such as 1000.00 are unlikely to be useful. Information about reading frame types can be taken from ORF.ps and ORF_edit.ps.

-c CTN defines a codon table that is used for detecting coding regions and translating them into amino acid sequences for scoring. Each codon table applies to the input sequence data that follow this CTN on the command line, until the next CTN is defined. The following codon tables (CTN) are available:
    
      univ: universal genetic code (default)
      acet: Acetabularia
      ccyl: Candida cylindracea
      tepa: Tetrahymena, Paramecium,
            Oxytricha, Stylonychia, Glaucoma
      eupl: Euplotes
      mlut: Micrococcus luteus
      mysp: Mycoplasma, Spiroplasma
   mitocan: canonical mitochondrial code
   mitovrt: Vertebrates - mitochondrial code
   mitoart: Arthropods - mitochondrial code
   mitoech: Echinoderms - mitochondrial code
   mitomol: Molluscs - mitochondrial code
   mitoasc: Ascidians - mitochondrial code
   mitonem: Nematodes - mitochondrial code
   mitopla: Platyhelminths - mitochondrial code
   mitoyea: Yeasts - mitochondrial code
   mitoeua: Euascomycetes - mitochondrial code
   mitopro: Protozoans - mitochondrial code
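
The positional meaning of -c can be made concrete with a small sketch that walks an argument list and records which codon table is in effect for each data file. This models the documented behaviour only; it is not codaln's argument parser, it ignores all other options, and the function name and file names are hypothetical.

```python
def assign_codon_tables(args, default="univ"):
    """Pair each data file with the codon table active at its position."""
    table = default
    assignment = []
    it = iter(args)
    for arg in it:
        if arg == "-c":
            table = next(it, None)
            if table is None:
                raise ValueError("-c requires a codon table name")
        else:
            assignment.append((arg, table))
    return assignment


if __name__ == "__main__":
    argv = ["a.fa", "-c", "mitovrt", "b.fa", "c.fa", "-c", "univ", "d.fa"]
    for name, table in assign_codon_tables(argv):
        print(name, table)
    # a.fa univ
    # b.fa mitovrt
    # c.fa mitovrt
    # d.fa univ
```

Files that precede the first -c keep the default universal code; a later -c replaces the table for all files after it.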

-g[o|x] factor The floating-point value factor weights the gap open (-go) or gap extension (-gx) penalty. It can be set between 0.01 and the largest 32-bit (single-precision) IEEE 754 floating-point value, or to 0. In practice, factors above a comparatively low value such as 1000.00 are unlikely to be useful.

-m[0|1] factor The floating-point value factor weights the scoring terms of match (-m1) or mismatch (-m0) states in alignments of nucleic acid sequences. It can be set between 0.01 and the largest 32-bit (single-precision) IEEE 754 floating-point value, or to 0. In practice, factors above a comparatively low value such as 1000.00 are unlikely to be useful.

-Sf filename reads the file filename at program start. The file is written by the user and consists of FIVE lines with FIVE integer entries per line, which are then used as the nucleotide scoring matrix. The order of the entries is crucial and corresponds to the nucleotide pairs shown below. The numbers are the default entries; the row and column labels are shown for orientation only and MUST NOT be written into the file:


         A    U/T   G    C    else 
   A     1000 300   300  300  300
   U/T   300  1000  300  300  300
   G     300  300   1000 300  300
   C     300  300   300  1000 300
   else  300  300   300  300  300
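
A matrix file for -Sf can be generated instead of typed by hand. The sketch below writes the default values in the row and column order shown above (A, U/T, G, C, else). It assumes that entries on a line are separated by single spaces, which the text does not state explicitly; the function names and the output file name are hypothetical.

```python
DEFAULT_MATRIX = [
    [1000, 300, 300, 300, 300],   # A
    [300, 1000, 300, 300, 300],   # U/T
    [300, 300, 1000, 300, 300],   # G
    [300, 300, 300, 1000, 300],   # C
    [300, 300, 300, 300, 300],    # else
]


def format_matrix(matrix):
    """Return the 5x5 integer matrix as five lines of five numbers."""
    if len(matrix) != 5 or any(len(row) != 5 for row in matrix):
        raise ValueError("matrix must have 5 rows of 5 entries")
    lines = (" ".join(str(int(v)) for v in row) for row in matrix)
    return "\n".join(lines) + "\n"


def write_matrix(path, matrix=DEFAULT_MATRIX):
    """Write the matrix without any row or column labels."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_matrix(matrix))


if __name__ == "__main__":
    write_matrix("my_scores.txt")  # hypothetical file name
    print(format_matrix(DEFAULT_MATRIX), end="")
```

The resulting file would then be passed as -Sf my_scores.txt.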

-Si starts an interactive input process after codaln is started that asks for the score of each pair of nucleotides. This prevents fast command-line operation, but suits users who are not confident in producing an error-free ASCII text file.

-Gf filename reads the file filename at program start. The file is written by the user and consists of one line containing two integers: the gap open penalty first, the gap extension penalty second.

-Gi starts an interactive input process after codaln is started that asks for each gap penalty. This prevents fast command-line operation, but suits users who are not confident in producing an error-free ASCII text file.
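
As with the scoring matrix, the one-line gap penalty file can be produced by a short helper. The sketch assumes a single space between the two integers; the penalty values in the example are placeholders and not codaln's defaults, and the function name is hypothetical.

```python
def format_gap_file(gap_open, gap_extension):
    """Return one line: gap open penalty, then gap extension penalty."""
    for value in (gap_open, gap_extension):
        if not isinstance(value, int):
            raise TypeError("gap penalties must be integers")
    return f"{gap_open} {gap_extension}\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(format_gap_file(1500, 500), end="")  # 1500 500
```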

 

VERSION

This man page documents version 1.0 of codaln.  

COMMENT

codaln is the successor of version 1.1 of code2aln.  

AUTHOR

Roman R. Stocsits  

BUGS

Comments and bug reports should be sent to roman@bioinf.uni-leipzig.de or to roman@tbi.univie.ac.at.


 


This document was created by man2html, using the manual pages.
Time: 13:20:52 GMT, September 27, 2004