Bioinformatics Preprint 05-004
Title:
Multiple Sequence Alignments of Partially Coding Nucleic Acid Sequences
Author(s):
Roman R. Stocsits,
Ivo L. Hofacker,
Claudia Fried,
Peter F. Stadler
Submitted
Abstract:
Background: High-quality alignments of RNA and DNA sequences
are an important prerequisite for the comparative analysis of genomic
sequence data. Because the genetic code is redundant, however, nucleic acid
sequences are much more heterogeneous than the protein sequences they
encode. It is therefore desirable to make use of the amino acid sequence
when aligning coding nucleic acid sequences. In many cases, however, only
part of the sequence of interest is translated. Conversely, overlapping
reading frames may encode multiple alternative proteins, possibly with
interspersed non-coding parts. RNA virus genomes are prominent examples.
Methods: The standard scoring scheme for nucleic acid alignments can
be extended to simultaneously incorporate information on translation
products in one or more reading frames.
Results: We present codaln, a multiple alignment tool that
implements a combined nucleic acid plus amino acid scoring model for
pairwise and progressive multiple alignments and allows arbitrary
weighting of almost all scoring parameters. The resource requirements of
codaln are comparable to those of standard tools such as
ClustalW.
Conclusions: We demonstrate the applicability of codaln to
various biologically relevant types of sequences (bacteriophage Levivirus
and vertebrate Hox clusters) and show that combining nucleic acid
and amino acid sequence information leads to improved alignments. These, in
turn, increase the performance of analysis tools that depend strictly on
good input alignments, such as methods for detecting conserved RNA
secondary structure elements.
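
The combined scoring idea outlined under Methods can be illustrated with a
minimal sketch. It is not taken from codaln: the function names, the
match/mismatch values, the weight, and the representation of coding regions
as sets of codon start positions are all assumptions made for illustration.
The sketch adds a weighted amino acid term to the nucleotide score only
where both aligned positions start a codon in an annotated reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def nucleotide_score(a, b, match=1.0, mismatch=-1.0):
    """Plain nucleotide term (values are illustrative assumptions)."""
    return match if a == b else mismatch


def amino_score(codon_a, codon_b, same=2.0, different=-1.0):
    """Amino acid term: compare the residues encoded by two codons.

    Incomplete or unknown codons contribute nothing. A real tool would
    more likely use a substitution matrix; identity is used here only
    to keep the sketch short.
    """
    aa_a = CODON_TABLE.get(codon_a)
    aa_b = CODON_TABLE.get(codon_b)
    if aa_a is None or aa_b is None:
        return 0.0
    return same if aa_a == aa_b else different


def combined_score(seq_a, i, seq_b, j, codon_starts_a, codon_starts_b,
                   weight=0.5):
    """Score aligning seq_a[i] with seq_b[j].

    codon_starts_a / codon_starts_b are hypothetical annotations: sets of
    positions at which a codon begins in some reading frame. Non-coding
    positions fall back to the nucleotide term alone.
    """
    score = nucleotide_score(seq_a[i], seq_b[j])
    if i in codon_starts_a and j in codon_starts_b:
        score += weight * amino_score(seq_a[i:i + 3], seq_b[j:j + 3])
    return score
```

Because coding regions are passed as sets of codon start positions, the same
sketch covers partially coding sequences (positions outside the sets) and,
in principle, several reading frames (start positions from more than one
frame in the same set).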
Availability:
The source code and documentation may be downloaded from
http://www.bioinf.uni-leipzig.de/Software/codaln/ and
http://www.tbi.univie.ac.at/~roman/Codaln/.
Keywords:
multiple sequence alignment, partially coding sequences, overlapping
reading frames, RNA virus genomes, Hox genes.
Last modified: 2004-03-28 19:56:33 studla