Publications - Supplemental material

Below you will find supplemental material corresponding to publications of our group. We currently list 132 supplements. If you have problems accessing the electronic information, please let us know.

© NOTICE: All documents are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.

This supplement is also available at http://www.bioinf.uni-leipzig.de/publications/supplements/22-003
You may use this URL to cite or link to us.

BIOINF 22-003: Tailored machine learning models for functional RNA detection in genome wide screens

Christopher Klapproth, Siegfried Zötzsche, Felix Kühnl, Jörg Fallmann, Peter F. Stadler, Sven Findeiß



This section lists the input and output of the test sets used. Test set 1 consists of alignments of known, highly conserved noncoding RNAs and corresponding control sets, which were generated by selecting random noncoding genomic alignments, by SISSIz simulation, and by shuffling with the rnazRandomizeAln.pl tool. Test set 2 is built from a 27-way multiple genome alignment (FlyBase v2, last accessed 01.05.22) that was cut into overlapping windows using the rnazWindow.pl tool with the parameter --slide=40.

Test set      Folder
Test set 1    Test_Set_1_RNAz
Test set 2    Test_Set_2_Drosophila
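
The windowing step used for test set 2 can be illustrated with a minimal Python sketch. It is not the rnazWindow.pl implementation: only the slide of 40 columns comes from the description above, while the window length of 120 columns, the function names, and the dictionary representation of an alignment are assumptions made for illustration. The real tool also applies additional filtering that is omitted here.

```python
def window_starts(aln_length, window=120, slide=40):
    """Return (start, end) column ranges, 0-based and end-exclusive.

    window=120 is an assumed value; slide=40 matches --slide=40.
    Trailing columns that do not fill a whole window are dropped
    in this simplified sketch.
    """
    if aln_length <= window:
        return [(0, aln_length)]
    spans = []
    start = 0
    while start + window <= aln_length:
        spans.append((start, start + window))
        start += slide
    return spans


def slice_alignment(rows, window=120, slide=40):
    """Cut an alignment given as {sequence name: aligned string} into windows."""
    length = len(next(iter(rows.values())))
    return [
        {name: seq[s:e] for name, seq in rows.items()}
        for s, e in window_starts(length, window, slide)
    ]


if __name__ == "__main__":
    # Toy alignment with hypothetical sequence names.
    toy = {"species_a": "A" * 200, "species_b": "C" * 200}
    for i, win in enumerate(slice_alignment(toy)):
        print(i, len(win["species_a"]))
```

With a slide smaller than the window length, neighbouring windows share columns, so a conserved element that is split by one window boundary is likely to be fully contained in an adjacent window.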


Training data, provided as ClustalW alignments, that were selected for training the experimental models later used for evaluation and prediction on Drosophila genomic data.

Training set                    Folder
Training set, noncoding RNA     ncRNA_training_alignments
Training set, protein coding    Protein_training_alignments


Test data for the acceptance rates of the structural conservation filter, together with training and test data for the z-score SVR.

Data set                                   Folder
Structural conservation filter test set    Acceptance_rate_test
z-score SVR raw training data              Trainingdata
z-score SVR test data                      Testdata


Model files
ncRNA_model
coding_model
three_way_model
ncRNA_trainingdata
trainingdata_protein


UCSC TrackHub
This public hub provides detailed data for the analysis of two-way classification approaches, i.e. RNAz 2.0 and Svhip, for identifying conserved non-coding RNA elements in Drosophila melanogaster. Detailed descriptions of the individual tracks are provided within the hub. Please go to https://genome-euro.ucsc.edu/cgi-bin/hgHubConnect, copy the link http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/22-003/SvhipDmelHub/hub.txt into the URL text field, and click "Add Hub". You will be forwarded directly to the UCSC Genome Browser, which loads the D. melanogaster assembly dm6. Enter chr3R:17,645,050-17,647,096 in the "Position/Search Term" text field and click "GO". You should see something similar to the picture below.

UCSC screen shot of chr3R:17,645,050-17,647,096
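
As an alternative to the manual steps above, a direct browser link can be assembled programmatically. The following Python sketch is an assumption-laden illustration, not part of the supplement: it assumes that the hgTracks endpoint of the UCSC Genome Browser accepts the query parameters db, hubUrl, and position, and the function name browser_link is hypothetical. Only the hub URL, the assembly dm6, and the region are taken from the text above.

```python
from urllib.parse import urlencode

# Hub URL as given in the supplement.
HUB_URL = (
    "http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/"
    "22-003/SvhipDmelHub/hub.txt"
)

# Assumed endpoint on the European UCSC mirror.
HGTRACKS = "https://genome-euro.ucsc.edu/cgi-bin/hgTracks"


def browser_link(position="chr3R:17645050-17647096", db="dm6"):
    """Build a link that loads the hub and jumps to the given region.

    The parameter names db, hubUrl and position are assumptions about
    the browser's URL interface; check the UCSC documentation before
    relying on them.
    """
    query = urlencode({"db": db, "hubUrl": HUB_URL, "position": position})
    return f"{HGTRACKS}?{query}"


if __name__ == "__main__":
    print(browser_link())
```

The position is written without thousands separators here so that the value does not depend on how the browser parses commas in a URL.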

Utilized annotations and predictions

The underlying data displayed in the UCSC track hub are provided in BigBed format, i.e. an indexed binary format. Using the UCSC bigBedToBed tool, available at http://hgdownload.soe.ucsc.edu/admin/exe/, these files can easily be converted to human-readable Bed.

FlyBase C/D box snoRNA annotation
FlyBase H/ACA box snoRNA annotation
FlyBase miRNA annotation
FlyBase tRNA annotation
RNAcentral snoRNA annotation
RNAcentral miRNA annotation
RNAcentral tRNA annotation
Sliced raw MAF input
Sliced merged MAF input
RNAz predictions on raw MAF input
RNAz predictions on merged MAF input
Svhip predictions on raw MAF input
Svhip predictions on merged MAF input
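
The BigBed files listed above can be converted and inspected with a few lines of Python. This is a minimal sketch under stated assumptions: it presumes that bigBedToBed is installed and on the PATH and is called as "bigBedToBed input output", that the resulting Bed file is tab-separated with the standard leading columns, and all function names and file names are hypothetical.

```python
import subprocess


def bigbed_to_bed(bigbed_path, bed_path, tool="bigBedToBed"):
    """Convert a BigBed file to Bed by calling the UCSC command-line tool."""
    subprocess.run([tool, bigbed_path, bed_path], check=True)


def parse_bed_line(line):
    """Parse one tab-separated Bed line into a dictionary.

    Bed coordinates are 0-based with an exclusive end. Only the first
    six standard columns are read; any further columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    record = {"chrom": fields[0], "start": int(fields[1]), "end": int(fields[2])}
    for key, idx in (("name", 3), ("score", 4), ("strand", 5)):
        if len(fields) > idx:
            record[key] = fields[idx]
    return record


def overlapping(records, chrom, start, end):
    """Return the records that overlap the half-open interval [start, end)."""
    return [
        r for r in records
        if r["chrom"] == chrom and r["start"] < end and r["end"] > start
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace them with a downloaded track.
    bigbed_to_bed("predictions.bb", "predictions.bed")
    with open("predictions.bed") as handle:
        records = [parse_bed_line(l) for l in handle if l.strip()]
    print(len(overlapping(records, "chr3R", 17645050, 17647096)))
```

The overlap query uses the example region from the track hub description, so the number of reported entries can be compared with what is visible in the browser.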