92-02-01
Statistics of RNA Secondary Structures
Walter Fontana, Danielle A. M. Konings,
Peter F. Stadler, Peter Schuster
A statistical reference for RNA secondary structures with minimum free
energies is computed by folding large ensembles of random RNA
sequences. Four nucleotide alphabets are used: two binary alphabets,
AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA
secondary structures are made of structural elements, such as stacks,
loops, joints and free ends. Statistical properties of these elements
are computed for small RNA molecules of chain lengths up to $100$.
These structure statistics depend strongly on the
particular alphabet chosen. The statistical reference is compared
with the data derived from natural RNA molecules with similar base
frequencies.
Secondary structures are represented as trees. Tree editing provides
a quantitative measure for the distance, $d_t$, between
two structures. We compute a structure density surface as the
conditional probability of two structures having distance $t$
given that their sequences have distance $h$. This surface
indicates that the vast majority of possible minimum free energy
secondary structures occur within a fairly small neighbourhood of any
typical (random) sequence.
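As an illustration of the structure density surface, the following is a minimal sketch of how the conditional probability of a structure distance $t$ given a sequence distance $h$ could be estimated from sampled pairs. It assumes that $h$ is the Hamming distance between sequences of equal length and that the tree distance $t$ has already been computed by some tree-editing routine that is not shown here. The function names `hamming` and `density_surface` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def density_surface(pairs):
    """Estimate P(t | h) from an iterable of (h, t) samples.

    h is a sequence distance and t a structure distance (assumed to be
    precomputed). Returns a nested dict: surface[h][t] = relative
    frequency of t among all samples with that h.
    """
    counts = defaultdict(Counter)
    for h, t in pairs:
        counts[h][t] += 1

    surface = {}
    for h, row in counts.items():
        total = sum(row.values())
        surface[h] = {t: n / total for t, n in row.items()}
    return surface
```

Each row of the returned table sums to one, so a row can be read as the distribution of structure distances at a fixed sequence distance.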
Correlation lengths for secondary structures in their tree
representations are computed from probability densities. They are
appropriate measures for the complexity of the
sequence-structure relation. The correlation length also provides a
quantitative estimate for the mean sensitivity of structures to
point mutations.
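A correlation length of this kind can be sketched as follows. The abstract does not state the definition used, so this example assumes a common one: an autocorrelation $\rho(h) = 1 - \langle d_t^2(h) \rangle / \langle d_t^2 \rangle$, where the numerator is the mean squared structure distance at sequence distance $h$ and the denominator is the same quantity for random pairs, together with an exponential decay $\rho(h) \approx e^{-h/\ell}$. The function names and the fitting procedure are illustrative assumptions, not the authors' method.

```python
import math


def autocorrelation(mean_sq_by_h, mean_sq_random):
    """rho(h) = 1 - <d^2(h)> / <d^2> for each sequence distance h.

    mean_sq_by_h maps h to the mean squared structure distance at that h;
    mean_sq_random is the mean squared distance between random pairs.
    (Assumed definition, see the lead-in.)
    """
    return {h: 1.0 - v / mean_sq_random for h, v in mean_sq_by_h.items()}


def correlation_length(rho):
    """Fit ln rho(h) = -h / ell through the origin and return ell.

    Only points with h > 0 and rho(h) > 0 are used, since the logarithm
    is undefined otherwise.
    """
    points = [(h, math.log(r)) for h, r in rho.items() if h > 0 and r > 0]
    if not points:
        raise ValueError("no usable points with h > 0 and rho > 0")
    # Least-squares slope for a line constrained to pass through (0, 0).
    slope = sum(h * y for h, y in points) / sum(h * h for h, _ in points)
    if slope >= 0:
        raise ValueError("autocorrelation does not decay")
    return -1.0 / slope
```

Under this assumed definition, a short correlation length means that structures decorrelate after only a few mutations, which is the sense in which it estimates sensitivity to point mutations.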
Keywords: Combinatory map - Correlation length - Landscape -
Random RNA sequence - RNA secondary structure -
Shape space covering - Tree editing