Multiple sequence alignment with user-defined constraints

Burkhard Morgenstern, Sonja J. Prohaska, Nadine Werner, Jan Weyer-Menkhoff, Isabelle Schneider, Amarendran R. Subramanian, Peter F. Stadler

GCB 2004 (Bielefeld)

In many situations, automated multiple-alignment programs cannot correctly align families of nucleic acid or protein sequences. Difficult cases include not only distantly related sequences but also tandem duplications, independent of their evolutionary age. Frequently, additional biological information is available that establishes homologies in at least parts of the sequences, based on structural or functional considerations. In the present paper, we describe a semi-automatic approach to multiple sequence alignment in which the user can explicitly specify parts of the sequences that are biologically related to each other. Our software uses these sites as anchor points and creates a multiple alignment that respects the user-defined constraints and hence should be biologically more plausible than alignments produced by fully automated procedures. We apply our approach to genomic sequences adjacent to the Hox genes. As a by-product, we obtain not only useful insights for the further development of alignment algorithms, but also an improved approach to phylogenetic footprinting.

multiple sequence alignments, anchored alignments, Hox gene clusters, phylogenetic footprinting

