Publications - Published papers

Please find below the publications of our group. Currently, we list 496 papers. Some of the publications were written in collaboration with the group of Sonja Prohaska and are also listed in the publication list of her group. Access to published papers (access) is restricted to our local network and selected collaborators. If you have problems accessing electronic information, please let us know.

© NOTICE: All papers are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.

Computational discovery of human coding and non-coding transcripts with conserved splice sites

Dominic Rose, Michael Hiller, Katharina Schutt, Jörg Hackermüller, Rolf Backofen, Peter F. Stadler

Download


PREPRINT 10-010: [ PDF ]  [ Supplement ]
[ Publisher's page ]

Status:


Bioinformatics

Abstract


Motivation: Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs but do not encode proteins. Most lncRNAs are under lower sequence constraints than protein-coding genes and lack conserved secondary structures, making it hard to predict them computationally.

Results: We introduce an approach to predict spliced lncRNAs in vertebrate genomes combining comparative genomics and machine learning. It is based on detecting signatures of characteristic splice site evolution in vertebrate whole genome alignments. First, we predict individual splice sites, then assemble compatible sites into exon candidates, and finally predict multi-exon transcripts. Using a novel method to evaluate typical splice site substitution patterns that explicitly takes the species phylogeny into account, we show that individual splice sites can be accurately predicted. Since our approach relies only on predicted splice sites, it can uncover both coding and non-coding exons. We show that our predicted exons and partial transcripts are mostly non-coding and lack conserved secondary structures. These exons are of particular interest, since existing computational approaches cannot detect them. Transcriptome sequencing data indicate tissue-specific expression patterns of predicted exons and there is evidence that increasing sequencing depth and breadth will validate additional predictions. We also found a significant enrichment of predicted exons that form multi-exon transcript parts, and we experimentally validate such a novel multi-exon gene. Overall, we obtain 336 novel multi-exon transcript predictions from human intergenic regions. Our results indicate the existence of novel human transcripts that are conserved in evolution and our approach contributes to the completion of the human transcript catalog.

Availability and Implementation: A Perl implementation of the tree-based log-odds scoring is available online (see supplement).
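The step of assembling compatible splice sites into exon candidates can be illustrated with a minimal Python sketch. It is not the authors' implementation (which is in Perl and described in the supplement). The `SpliceSite` type, the `exon_candidates` function, the length bounds, the score threshold, and the rule that "compatible" means an acceptor followed by a downstream donor on the same strand are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpliceSite:
    pos: int      # genomic coordinate on the plus strand (hypothetical)
    kind: str     # "acceptor" (3' splice site) or "donor" (5' splice site)
    score: float  # assumed log-odds score of the site prediction


def exon_candidates(
    sites: List[SpliceSite],
    min_len: int = 30,       # assumed lower bound on exon length
    max_len: int = 500,      # assumed upper bound on exon length
    min_score: float = 0.0,  # assumed cutoff for keeping a predicted site
) -> List[Tuple[int, int, float]]:
    """Pair each acceptor with downstream donors to form exon candidates.

    Returns (start, end, combined_score) tuples, where an internal exon
    starts at an acceptor site and ends at a donor site.
    """
    kept = sorted((s for s in sites if s.score >= min_score), key=lambda s: s.pos)
    acceptors = [s for s in kept if s.kind == "acceptor"]
    donors = [s for s in kept if s.kind == "donor"]

    exons = []
    for a in acceptors:
        for d in donors:
            length = d.pos - a.pos
            # Keep only pairs whose distance is a plausible exon length.
            if min_len <= length <= max_len:
                exons.append((a.pos, d.pos, a.score + d.score))
    return exons
```

Under these assumptions, chaining candidates whose donor is followed by a later candidate's acceptor would give multi-exon transcript parts; the actual compatibility criteria used in the paper may differ.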

Keywords


Splicing, splice site prediction, long non-coding RNA, lncRNA, log-odds substitution scores, human genome, ncRNA

Note


doi: 10.1093/bioinformatics/btr314