92-02-01

Statistics of RNA Secondary Structures

Walter Fontana, Danielle A. M. Konings, Peter F. Stadler, Peter Schuster

A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: the two binary alphabets AU and GC, the biophysical alphabet AUGC, and the synthetic alphabet GCXK. RNA secondary structures are made of structural elements such as stacks, loops, joints and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The resulting structure statistics depend strongly on the alphabet chosen. The statistical reference is compared with data derived from natural RNA molecules with similar base frequencies.
Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance, d_t, between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighbourhood of any typical random sequence.
Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations.
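
As an illustration only, the structure density surface described above can be estimated from sampled pairs as a table of conditional frequencies. The following is a minimal sketch, not the authors' implementation. The function names `hamming` and `density_surface` are hypothetical, and using the Hamming distance as the sequence distance h is an assumption. The structure distances t are assumed to come from a folding program and a tree-editing routine that are not shown here.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ.

    Assumption: this serves as the sequence distance h.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def density_surface(pairs):
    """Estimate P(t | h) from sampled (h, t) pairs.

    pairs: iterable of (h, t), where h is the distance between two
           sequences and t is the distance between their structures
           (computed elsewhere, e.g. by tree editing).
    Returns a nested dict {h: {t: probability}}; each row sums to 1.
    """
    counts = defaultdict(Counter)
    for h, t in pairs:
        counts[h][t] += 1
    return {
        h: {t: n / sum(row.values()) for t, n in row.items()}
        for h, row in counts.items()
    }
```

Each row of the returned table is the empirical distribution of structure distances at a fixed sequence distance. Made-up sample pairs such as `[(1, 0), (1, 0), (1, 2), (2, 4)]` are enough to exercise the function.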

Keywords: Combinatory map - Correlation length - Landscape - Random RNA sequence - RNA secondary structure - Shape space covering - Tree editing
