Bioinformatics Preprint 03-021

**Download:**

[PostScript]
[PDF]

**Titel:**

**
The Footprint Sorting Problem
**

**Author(s):**

Claudia Fried,
Wim Hordijk,
Sonja J. Prohaska,
Claus R. Stadler,
Peter F. Stadler

**Submitted to:**

*MCC 2003*

**Abstract:**

Phylogenetic footprints are short pieces of non-coding DNA sequence in the
vicinity of a gene that are conserved between evolutionary distant species.
A seemingly simple problem is to sort footprints in their order along the
genomes. It is complicated by the fact that not all footprints are
co-linear: they may cross each other. The problem thus becomes the
identification of the crossing footprints, the sorting of the remaining
co-linear cliques, and finally the insertion of the non-colinear ones at
``reasonable'' positions. We show that solving the footprint sorting
problem requires the solution of the ``Minimum Weight Vertex Feedback Set
Problem'', which is known to be NP-complete and APX-hard. Nevertheless good
approximations can be obtained for datasets of interest. The remaining
steps of the sorting process are straight forward: computation of the
transitive closure of an acyclic graph, linear extension of the resulting
partial order, and finally sorting w.r.t.\ the linear
extension. Alternatively, the footprint sorting problem can be rephrased as
a combinatorial optimization problem for which approximate solutions can be
obtained by means of general purpose heuristics. Footprint sortings
obtained with different methods can be compared using a version of multiple
sequence alignment that allows the identification of unambiguously ordered
sub-lists.

**Alternative Numbers:**

Return to 2003 working papers list.

Last modified: 2003-07-23 12:23:25 studla