Publications - Supplemental material

Please find below supplemental material corresponding to publications of our group. Currently, we list 118 supplements. If you have problems accessing electronic information, please let us know.

© NOTICE: All documents are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.

This supplement is also available at http://www.bioinf.uni-leipzig.de/publications/supplements/17-001
You may use this URL to cite or link to us.

BIOINF 17-001: The Fungi snoRNAome

Sebastian Canzler, Peter F. Stadler, Jana Hertel




S1: Taxonomic Tree and Genome Information

An NCBI-based taxonomic classification of all 147 fungal organisms that were used in our publication can be found here:
  • Taxonomic Tree of Fungi, except Pezizomycotina [pdf][newick]
  • Taxonomic Tree of Pezizomycotina [pdf][newick]


Machine-readable information about the fungal organisms, such as genome sources and 3-letter abbreviations, is given here:
Genome Information


S2: Experimentally detected snoRNAs

The analysis of fungal snoRNAs was mainly based on five surveys reporting experimentally detected snoRNAs from different organisms: N.crassa (Liu et al.), A.fumigatus (Jöchl et al.), C.albicans (Mitrovich et al.), S.cerevisiae (Piekna-Przybylska et al.), and S.pombe (Li et al.). An overview of the retrieved snoRNAs and the corresponding publications is given in the table below. Although the survey by Jöchl et al. originally covered box C/D snoRNAs only, sequence AM921943 was treated as a box H/ACA snoRNA rather than a box C/D snoRNA, since it shows two separate, perfect hairpins and convincing box motifs while clearly lacking the characteristics of box C/D snoRNAs. In addition, the sequences AM921919 and AM921934 are treated as the same snoRNA in this work, which reduces the number of box C/D snoRNAs used from this publication to 25. Both sequences map to exactly the same genomic location, although AM921934 contains three point mutations with respect to AM921919. All snoRNA sets were taken from their corresponding publications, except for the budding yeast sequences, which were downloaded from the UMass database.

organism box C/D snoRNAs box H/ACA snoRNAs publication



S3: Mapping of experimentally detected snoRNAs

The following tables display the mapping of previously experimentally verified snoRNAs of the five fungi S.pombe, S.cerevisiae, C.albicans, A.fumigatus, and N.crassa, as automatically detected by the snoStrip pipeline.

  • Mapping of experimentally detected box C/D snoRNAs [html] [csv]
  • Mapping of experimentally detected box H/ACA snoRNAs [html] [csv]



S4: Target RNA alignments

Table S4. Number of target RNA sequences gathered for fungal organisms. The 'target alignment' column gives the number of target sequences in the respective alignment, since not all single sequence targets were included in the alignments.
target RNA    single sequence targets    target alignment
25S rRNA      114 [fasta]                87 [aln]
18S rRNA      130 [fasta]                121 [aln]
5.8S rRNA     134 [fasta]                134 [aln]
U1 snRNA      98 [fasta]                 98 [aln]
U2 snRNA      133 [fasta]                133 [aln]
U4 snRNA      132 [fasta]                132 [aln]
U5 snRNA      118 [fasta]                118 [aln]
U6 snRNA      143 [fasta]                143 [aln]
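
As a small worked example, the difference between the two columns of Table S4 can be computed per target RNA. The sketch below only restates the counts from the table; the dictionary layout and the function name `excluded_from_alignment` are illustrative assumptions and not part of the original pipeline.

```python
# Counts copied from Table S4:
# target RNA -> (single sequence targets, sequences in the alignment).
TABLE_S4 = {
    "25S rRNA": (114, 87),
    "18S rRNA": (130, 121),
    "5.8S rRNA": (134, 134),
    "U1 snRNA": (98, 98),
    "U2 snRNA": (133, 133),
    "U4 snRNA": (132, 132),
    "U5 snRNA": (118, 118),
    "U6 snRNA": (143, 143),
}


def excluded_from_alignment(table):
    """Return {target RNA: number of single sequences absent from the alignment}."""
    return {name: single - aligned for name, (single, aligned) in table.items()}


if __name__ == "__main__":
    for name, missing in excluded_from_alignment(TABLE_S4).items():
        print(f"{name}: {missing} sequence(s) not in the alignment")
```

According to these counts, only the 25S and 18S rRNA alignments contain fewer sequences than were gathered as single sequence targets.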




S5: SnoRNA sequences and alignments

In this section we provide family-specific snoRNA sequences and alignments (in Clustal or Stockholm format). You can either download all box C/D or box H/ACA families at once (list below) or download the sequence information of an individual family of your choice (tables below).
  • box C/D snoRNA families     [fa] [aln] [stk]
  • box H/ACA snoRNA families [fa] [aln] [stk]
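
For readers who want to process the Stockholm files without additional libraries, the following is a minimal sketch of a reader for the basic Stockholm layout (a `# STOCKHOLM 1.0` header, `name sequence` rows, `#=` markup lines, and a closing `//`). The function name `parse_stockholm` and the example records are hypothetical; the actual files may carry annotation that this sketch simply skips.

```python
def parse_stockholm(text):
    """Return {sequence name: aligned sequence} from a Stockholm-format string.

    Rows with the same name are concatenated, so interleaved blocks are
    supported. Markup lines (starting with '#') and the '//' terminator
    are ignored.
    """
    sequences = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # a name without sequence data; nothing to store
        name, chunk = parts[0], "".join(parts[1:])
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Hypothetical two-sequence alignment, for illustration only.
EXAMPLE = """# STOCKHOLM 1.0
#=GF ID hypothetical_family
seqA  AUGAUGA-CUGA
seqB  AUGAUGAACUGA
#=GC SS_cons ............
//
"""

if __name__ == "__main__":
    for name, seq in parse_stockholm(EXAMPLE).items():
        print(name, len(seq), seq)
```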



Sequences and alignments of box C/D snoRNA families
SnoRNA family S.cerevisiae name Fasta files Alignment (clustal format) Alignment (stockholm format)



Sequences and alignments of box H/ACA snoRNA families
SnoRNA family S.cerevisiae name Fasta files Alignment (clustal format) Alignment (stockholm format)



S6: General Characteristics of Fungal snoRNAs



A) Box Motif Analysis
  1. Box C/D snoRNAs
    • Box motif of box C [png] [eps]
    • Box motif of box D [png] [eps]
    • Box motif of box C' [png] [eps]
    • Box motif of box D' [png] [eps]
  2. Box H/ACA snoRNAs
    • Box motif of box H [png] [eps]
    • Box motif of box ACA [png] [eps]
[Figure S6.1 panels: Box C, Box D, Box C', Box D' (box C/D snoRNAs); Box H, Box ACA (box H/ACA snoRNAs)]

Figure S6.1 Sequence logos of snoRNA-specific box motifs. Box motifs were extracted from all snoStrip-annotated box C/D snoRNAs (5593 sequences) and box H/ACA snoRNAs (2331 sequences). Pictures were generated with WebLogo.


B) Sequence Lengths and Distances between Boxes

Both major snoRNA classes, box C/D and box H/ACA, are clearly distinguishable by their distinct sequence lengths. In accordance with the published canonical length distribution, 90% of the novel snoStrip-annotated box C/D snoRNAs are 80nts to 135nts in length. The median length is 93nts, see Figure S6.2 (C/D snoRNAs). Family CD_53 is the only exception, since its members have lengths between 200 and 300nts. Crucial features are the distances between box C and the potential box D' as well as between box C' and box D, since these stretches harbor the target binding sites and therefore need to be sufficiently long. For box C/D' distances, the minimal gap is 11nts and the median is 24nts. The gap between box C' and box D tends to be smaller: the shortest distance is 9nts and the median is 22nts. The distance between the two prime boxes is not known to be of significant relevance. The only requirement is a minimal distance of at least 2nts to form another kink-turn motif with the aid of snoRNP-associated proteins; larger distances do not pose a problem. Within the novel fungal snoRNAs, the shortest distance is 3nts, while 80% of all sequences with annotated prime boxes have gaps of 6 to 31 nucleotides.

In contrast to box C/D snoRNAs, box H/ACA snoRNAs are considerably longer. Their median sequence length is 188nts, see Figure S6.2 (H/ACA snoRNAs). The shortest sequence annotated by snoStrip is 115nts long, while 90% of all sequences are between 148 and 266nts long. No significant difference can be observed between the two hairpins: the median lengths are 85nts and 79nts for hairpin 1 (HP1) and hairpin 2 (HP2), respectively. Only the length distribution of HP2 sequences is slightly tighter than that of HP1. Extraordinarily long snoRNAs are found in families HACA_36 (snR86) and HACA_41 (snR84), with lengths of ∼1000nts and ∼600nts, respectively. Family HACA_12 (snR30), which is ∼600nts long, has an exceptional secondary structure with extensively enlarged 5' hairpins and hinge regions, where the latter is also able to form a so-called internal hairpin [Fayet-Lebaron et al. 2009].
  • Length Analysis of box C/D snoRNAs [png] [eps]
  • Length Analysis of box H/ACA snoRNAs [png] [eps]
[Figure S6.2 panels: box C/D snoRNAs; box H/ACA snoRNAs]

Figure S6.2 Length distributions of snoRNA sequences and distances between characteristic box C/D motifs. For H/ACA snoRNAs, the hairpin lengths are depicted. For better visibility, extraordinarily long families such as Nc_CD_53 (CD_53), snR30 (HACA_12), and the Saccharomycetes-specific families snR86 (HACA_36) and snR84 (HACA_41) were excluded from these boxplots.
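
Summary values of the kind quoted in this section (minimum, median, and a length range containing 90% of the sequences) can be derived from a list of sequence lengths as sketched below. This is a minimal illustration under two assumptions: that the 90% range is the central interval between the 5th and 95th percentiles, and that percentiles are interpolated as in Python's `statistics.quantiles`. The text does not state how the reported ranges were computed, and the function name `length_summary` and the example lengths are hypothetical.

```python
import statistics


def length_summary(lengths):
    """Return minimum, median, and the 5th/95th percentile of the lengths.

    The interval [p05, p95] covers the central 90% of the values
    (assumed definition; the source does not specify one).
    """
    ordered = sorted(lengths)
    # 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p05": cuts[4],
        "p95": cuts[94],
    }


if __name__ == "__main__":
    # Hypothetical box C/D snoRNA lengths in nucleotides, not real data.
    example_lengths = [78, 82, 85, 88, 90, 93, 93, 96, 101, 110, 124, 140]
    print(length_summary(example_lengths))
```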


S7: Phylogenetic Heatmaps

The phylogenetic distribution of box C/D and box H/ACA snoRNA families is depicted in two separate heatmaps. Therein, the number of snoRNA sequences belonging to a particular organism and family is color-encoded. Both images share the same structure: each column represents a specific snoRNA family, while each row represents a certain organism or genus. The NCBI-derived taxonomic classification is shown on the left-hand side. SnoRNA families that appear to be lineage-specific are shown in red boxes. The figure for box C/D snoRNAs is also shown in the paper as Figure 2 in the Results section.
  • Phylogenetic heatmap of box C/D snoRNA families [png] [eps]
  • Phylogenetic heatmap of box H/ACA snoRNA families [png] [eps]

Figure S7.1 Heatmap of snoStrip-detected box C/D snoRNAs. Each column represents a specific snoRNA family, while each row represents either a certain species or genus. A taxonomic classification is shown on the left-hand side. The number of snoRNAs detected in a specific species and snoRNA family is encoded in a blue color scheme. Lineage-specific families are boxed (A: Saccharomycotina, B: Pezizomycotina, C: Sordariomycetes).

Figure S7.2 Heatmap of snoStrip-detected box H/ACA snoRNAs. Each column represents a specific snoRNA family, while each row represents either a certain species or genus. A taxonomic classification is shown on the left-hand side. The number of snoRNAs detected in a specific species and snoRNA family is encoded in a blue color scheme. Lineage-specific families are boxed (A: Schizosaccharomycotina, B: Saccharomycotina, C: Pezizomycotina).



S8: Evolutionary Events in snoRNA History

In the following, a general analysis of evolutionary innovation and deletion events at the sequence and family level is presented. To precisely determine the evolutionary events leading to innovations and losses, an adapted version of the ePope tool (Hertel and Stadler) was applied. The following figures show two different representations of evolutionary events mapped to the NCBI taxonomic tree. The first shows absolute events at the root of major fungal clades up to the level of families and orders. The second shows relative innovation and deletion events mapped to the pre-ordered nodes of the taxonomic tree up to species level. The latter is also shown in the original paper as Figure 3 in the Results section.
  • Absolute innovation and deletion events [png] [eps]
  • Relative innovation and deletion events [png] [eps]

Figure S8.1 Absolute innovation and deletion events of snoRNAs during fungal evolution.


Figure S8.2 Relative number of gains and losses of entire snoRNA families during fungal evolution. The relative gain is the number of gained snoRNA families compared to the observed number of snoRNA families. The relative loss describes the number of lost snoRNA families compared to the number of snoRNA families in the parent node of the phylogenetic tree.
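
The two ratios defined in the caption of Figure S8.2 can be written down as a short sketch. It assumes that the set of snoRNA families present at a node and at its parent is already known, and it derives gains and losses by plain set difference; this is a simplification for illustration and not the ePope-based reconstruction used in the paper. The function name and the family identifiers are hypothetical.

```python
def relative_gain_and_loss(node_families, parent_families):
    """Return (relative gain, relative loss) for one node of the tree.

    relative gain = gained families / families observed at the node
    relative loss = lost families   / families present in the parent node
    """
    node = set(node_families)
    parent = set(parent_families)
    gained = node - parent   # present at the node, absent in the parent
    lost = parent - node     # present in the parent, absent at the node
    gain = len(gained) / len(node) if node else 0.0
    loss = len(lost) / len(parent) if parent else 0.0
    return gain, loss


if __name__ == "__main__":
    # Hypothetical family sets, for illustration only.
    parent = {"CD_1", "CD_2", "CD_3", "CD_4"}
    node = {"CD_1", "CD_2", "CD_9"}
    print(relative_gain_and_loss(node, parent))
```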



S9: Target Switches

This section deals with two 'snoRNA clans', each of which comprises more than one previously annotated snoRNA family and whose evolutionary history is coupled through a series of target switches and major rearrangements. Please also see the 'Target switches' paragraph in the Results section of the original paper.

Evolutionary history of snoRNA cluster CD_5

Since the evolutionary history of snoRNA clan CD_5 is discussed in great detail in the paper, we only provide here the more detailed figures summarizing the evolutionary events, similar to Figure 6 and Figure 7 in the paper.
  • Potential evolutionary History of snoRNA cluster CD_5 [png] [eps]
  • Evolutionary insights into a snoRNA cluster harboring members of the CD_5 snoRNA clan [png] [eps]

Figure S9.1 Potential evolutionary history of snoRNA clan CD_5 involving four different modification sites on the LSU rRNA. Gain/loss events are displayed with arrows, while potential rearrangements are shown with red stars. ⊤ 25S-1866 is solely found in Pichia. ∓ Only putative since LSU sequences are missing, but snoRNAs show convincing ASE conservation. ⊥ Only putative since no LSU sequence is present, but snoRNAs show convincing ASE conservation for three modifications.


Evolutionary history of snoRNA cluster CD_19

A similar evolutionary history can be reconstructed for the snoRNA clan CD_19, which includes the budding yeast snoRNAs snR52 and snR56 as well as three Neurospora sequences (Nc_CD_19, Nc_CD_41, and Nc_CD_42). The RNA molecules of this snoRNA clan are known to guide two SSU methylations, 18S-462 (S.cerevisiae 18S-420, D target) and 18S-1580 (18S-1428, D' target), and two LSU methylations, 25S-2574 (25S-1508, D target) and 25S-4143 (25S-2921, D' target). A potential evolutionary history is depicted in Figure S9.2.

All four modification sites can be considered ancient since they map to known methylated positions in human small and large subunit rRNAs. However, a potential ancient state at the root of fungi involves only the two SSU modifications. The two LSU methylations at 25S-2574 and 25S-4143 are exclusively found in Pezizomycotina and Saccharomycotina, respectively. Thus, they were more likely reinvented in these lineages than lost in all others. Both SSU sites are present in nearly all analyzed fungi, with the exception of lineages where only a few species are present, e.g., Tremellomycetes or Blastocladiomycota. A noteworthy observation is the putative duplication of the target interaction for position 18S-1580 at the root of Pezizomycotina. It seems that the duplicated interaction is inserted into a new single guide snoRNA. In Eurotiomycetes, on the other hand, this antisense element is relocated into the formerly single guide sequence that targets the other 18S position of this snoRNA clan. Other double guide snoRNAs can be seen in Saccharomycotina, combining the ancient target 18S-462 with the presumably reinvented 25S-4143.

A similar behaviour is detected in several Pezizomycotina species, where novel double guide sequences incorporate target binding capabilities for 18S-1580 and 25S-2574. A further target switch is observed in Pichia membranifaciens, where the species-specific duplication of 18S-462 is inserted as D target in the snoRNA guiding 18S-1580.

In all but one of the species capable of guiding methylation at 18S-462, a second target at position 18S-602 is additionally predicted for the same snoRNA ASE. The additional interaction is marginally weaker than the annotated one but still rather exceptional, raising the question of whether both positions might be modified by a single antisense element.

Target 18S-462 also seems to be subject to yet another reinvention, since it is also predicted as a potential D' target (!) in family CD_42. This family is exclusively found in Pezizomycotina and is predicted to contain a highly conserved D target guiding 25S-2979 (25S-1856, ICI: 1.26). In Dothideomycetes, Eurotiomycetes, and Leotiomycetes, an additional D' target site capable of targeting 18S-462 is found with an ICI score of 0.60 and a mean mfe of -14.57 kcal/mol. It is quite remarkable that this modification seems to be guided by two different snoRNA families in which the ASEs are located at different sites.
  • Potential evolutionary history of snoRNA cluster CD_19 [png] [eps]

Figure S9.2 Potential evolutionary history of snoRNA clan CD_19 involving four different modification sites on the LSU and SSU rRNA. Gain/loss events are displayed with arrows, while potential rearrangements are shown with red stars. ∓ Targets for 25S-2574 are putative since LSU sequences are missing in these species, but the snoRNAs show convincing ASE conservation.





S10: Comparison to the Rfam database

In total, 18 snoRNA families each comprise sequences of two different Rfam models. To investigate and validate the conflations made by snoStrip, we ran CMcompare to compare the Rfam snoRNA models.



For each Rfam snoRNA family, we used CMcompare to calculate pairwise scores to the models that are merged by snoStrip. In the figures below, we plotted the resulting z-score distribution to distinguish between models that are truly merged by snoStrip and all remaining models.

Figure S10.1 Comparison of all snoRNA Rfam models against both models of each merged box C/D Rfam pair.


Figure S10.2 Comparison of all snoRNA Rfam models against both models of each merged box H/ACA Rfam pair.
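
To illustrate how raw pairwise scores can be turned into z-scores, the sketch below standardizes a set of scores against their own mean and standard deviation. It assumes the scores have already been extracted from the CMcompare output into a dictionary; the exact normalization used for the figures is not specified here, so the use of the population standard deviation, the function name `z_scores`, and the model identifiers are assumptions.

```python
import statistics


def z_scores(scores):
    """Return {model: (score - mean) / standard deviation} for all models.

    `scores` maps a model name to its pairwise score against one
    reference model. The population standard deviation is used
    (an assumption of this sketch).
    """
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return {model: (score - mean) / sd for model, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical scores of four models against one reference model.
    example = {"model_A": 35.0, "model_B": 4.0, "model_C": 6.5, "model_D": 5.0}
    for model, z in z_scores(example).items():
        print(f"{model}: z = {z:.2f}")
```

In such a setting, a model pair merged by snoStrip would be expected to stand out with a high z-score relative to the remaining models.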



S11: Ribosome profiling

To verify our snoRNA annotation, we cross-checked it against available Ribo-seq data of four different fungal organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida albicans, and Ajellomyces capsulatus.
Sequencing data from ribosome profiling experiments contains not only ribosome-protected mRNAs but also ncRNAs protected by non-ribosomal proteins, such as tRNAs, snRNAs, or snoRNAs. Furthermore, the read distributions of mRNAs and ncRNAs differ fundamentally. While mRNAs show a fairly uniform read distribution with a visible 3nt periodicity, reflecting the triplet genetic code, ncRNAs show a rather narrow read distribution covering only those regions that were protected by proteins against RNase digestion, which is an essential part of Ribo-seq library preparation.
  • Saccharomyces cerevisiae [html] [csv]
  • Schizosaccharomyces pombe [html] [csv]
  • Candida albicans [html] [csv]
  • Ajellomyces capsulatus [html] [csv]
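
The 3nt periodicity mentioned above can be checked with a simple frame count: for reads mapped to a transcript, the 5' end positions are binned by their offset modulo 3 relative to a reference position. The sketch below is a hypothetical illustration and not the analysis performed for this supplement; the function name `frame_fractions`, the choice of the read 5' end, and the example coordinates are assumptions.

```python
from collections import Counter


def frame_fractions(read_starts, reference_start):
    """Return the fraction of read 5' ends in each of the three frames.

    A strong enrichment in one frame indicates 3nt periodicity, as
    expected for ribosome-protected mRNA fragments. Protein-protected
    ncRNA fragments are not expected to show such a pattern.
    """
    counts = Counter((pos - reference_start) % 3 for pos in read_starts)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]


if __name__ == "__main__":
    # Hypothetical read start coordinates on a transcript starting at 100.
    mrna_like = [100, 103, 106, 109, 112, 113, 115, 118]
    print(frame_fractions(mrna_like, reference_start=100))
```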


S12: Single guide box C/D snoRNAs (D Target)


  • Box C/D snoRNAs with a conserved D target region [html] [tex] [csv]




S13: Single guide box C/D snoRNAs (D' Target)


  • Box C/D snoRNAs with a conserved D' target region [html] [tex] [csv]




S14: Single guide box C/D snoRNAs with additional lineage specific targets


  • Box C/D snoRNAs with additional lineage specific targets [html] [tex] [csv]




S15: Double guide box C/D snoRNAs


  • Box C/D snoRNAs with two conserved target regions [html] [tex] [csv]




S16: Orphan box C/D snoRNAs


  • SnoRNAs originally denoted as orphan [html] [tex] [csv]




S17: Target switches between box C/D snoRNAs


  • Target switches between different box C/D snoRNA families [html] [tex] [csv]




S18: Single guide box H/ACA snoRNAs (HP1 target)


  • Box H/ACA snoRNAs with a conserved target region in hairpin 1 [html] [tex] [csv]




S19: Single guide box H/ACA snoRNAs (HP2 target)


  • Box H/ACA snoRNAs with a conserved target region in hairpin 2 [html] [tex] [csv]




S20: Single guide box H/ACA snoRNAs with additional lineage specific targets


  • Box H/ACA snoRNAs with additional lineage specific targets [html] [tex] [csv]




S21: Double guide box H/ACA snoRNAs


  • Box H/ACA snoRNAs with two conserved target regions [html] [tex] [csv]




S22: Orphan box H/ACA snoRNAs


  • SnoRNAs originally denoted as orphan [html] [tex] [csv]




S23: Target switches between box H/ACA snoRNAs


  • Target switches between different box H/ACA snoRNA families [html] [tex] [csv]