Inst. f. Informatik   
Uni Leipzig

Bioinformatics Preprint 06-010


Structured RNAs in the ENCODE Selected Regions of the Human Genome

Stefan Washietl, Jakob S. Pedersen, Jan O. Korbel, Claudia Fried, Andreas R. Gruber, Jörg Hackermüller, Jana Hertel, Manja Lindemeyer, Kristin Missal, Andrea Tanzer, Catherine Ucla, Stylianos E. Antonarakis, Alexandre Reymond, France Denoeud, Julien Lagarde, Jorg Drenkow, Philipp Kapranov, Thomas R. Gingeras, Michael Snyder, Mark B. Gerstein, Ivo L. Hofacker, Peter F. Stadler


Functional RNA structures play an important role both in non-coding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE selected regions of the human genome. Since structural RNAs in general lack characteristic signals in their primary sequence, comparative approaches that evaluate the evolutionary conservation of structures are most promising. The deeply sequenced ENCODE regions therefore provide an ideal data set for these methods. We have used three recently introduced programs based on either phylogenetic stochastic context-free grammars (EvoFold) or energy-directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to about 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. In comparison with the GENCODE annotation, our data point to new functional structural RNAs in all genomic contexts, with a slightly increased density of predictions in 3'UTRs. While we estimate a significant false discovery rate of about 50-70% in this screen, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Over 700 predictions overlap with non-protein-coding transcripts detected by oligonucleotide tiling arrays. 570 RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e. conserved elements). We present a small set of manually selected novel non-coding transcripts that are supported by most of the above criteria, and for which transcription was verified by 5'-RACE analysis.
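
The abstract reports how many loci are predicted by more than one method. As a minimal sketch of that kind of comparison, the following Python counts predictions from one set that overlap a prediction from another set. The abstract does not state how overlap was defined in the study, so the "at least one shared base" rule, the function name count_overlapping, and all coordinates below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def count_overlapping(a: List[Interval], b: List[Interval]) -> int:
    """Count intervals in `a` that share at least one base with an interval in `b`."""
    # Group the second set by chromosome so only same-chromosome pairs are compared.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))

    count = 0
    for chrom, start, end in a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            count += 1
    return count


# Toy coordinates, invented for illustration; not data from the study.
rnaz_hits = [("chr7", 100, 220), ("chr7", 500, 620), ("chr21", 40, 160)]
evofold_hits = [("chr7", 200, 260), ("chr21", 300, 350)]

shared = count_overlapping(rnaz_hits, evofold_hits)
print(f"{shared} of {len(rnaz_hits)} toy RNAz hits overlap a toy EvoFold hit")
```

The quadratic scan per chromosome keeps the sketch short. For genome-scale prediction sets, a sorted sweep or an interval tree would be the more usual choice.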

Keywords: Functional RNA; conserved RNA secondary structure; comparative genomics

Last modified: 2006-03-08 09:22 jana