M. Clausen, A. Mosig, Approximately Matching Polygonal Curves
with Respect to the Frechet Distance,
Accepted for Special
Issue on the 19th Europ. Workshop on Comp. Geom. of
Computational Geometry: Theory and Applications., 2004.
Motivated by problem settings such as the determination of motifs in
proteins or molecular docking, we present a generic framework for
finding geometric similarities between two molecular shapes. Our
approach is based on minimizing a distance between the two given
shapes, where a problem-specific distance function can be chosen from
a certain class of distance measures, the so-called relational
distance measures.
The setting we investigate is as follows: we are given two molecules,
modeled as point sets (or, in some cases, as point sequences) P and
Q in R^3, where each point represents a chemical entity
such as a single atom or an amino acid of a protein. Furthermore, we
are given a distance measure $\d$ between point sets such that
d(P,Q) measures the resemblance of two molecules in a
fixed spatial position, with values of d(P,Q) close to
zero indicating large resemblance; the resemblance usually changes
when one of the molecules, say Q, is rotated or translated
(i.e.,~transformed by a rigid motion g). In this setting, many
typical pattern matching problems involving molecular structures fit
into one of the two following problem settings:
Global resemblance between P and Q: It is our goal to find a
transformation $g$ that minimizes the distance between P and Q,
i.e., arg min_g d(P,gQ), with g ranging over the
set of all rigid motions in three dimensions and gQ denoting Q
transformed by g.
Largest common substructures of P and Q: Given a fault
tolerance eps>0, we want to determine largest possible
substructures P' of P and Q' of Q such that
d(P',gQ')<=eps for some transformation g.
In typical application scenarios, we also have physical or chemical
features attached to the points in P and Q that can be taken into
account by the distance measure d. Based on algebraic
considerations, our algorithms for determining global resemblance and
common substructures work for any distance measure that is a
relational distance measure, and yield solutions that are
within a guaranteed quality of approximation. Relational distance
measures encompass distance measures such as a discrete version of the
Frechet distance, which is a suggestive distance measure for
geometrically aligning protein backbones and determining motifs, as
well as the Hausdorff and the bottleneck distance, which are
well-suited for applications involving surface interactions between
molecules.