The 3-dimensional (3D) structure of the genome is important for many cellular processes. In this paper, we study the problem of reconstructing the 3D structure of chromosomes from Hi-C data of diploid organisms, which poses additional challenges compared to the better-studied haploid setting. With the help of techniques from algebraic geometry, we prove that a small amount of phased data is sufficient to ensure finite identifiability, both for noiseless and noisy data. In light of these results, we propose a new 3D reconstruction method based on semidefinite programming, paired with numerical algebraic geometry and local optimization. The performance of this method is tested on several simulated datasets under different noise levels and with different amounts of phased data. We also apply it to a real dataset from mouse X chromosomes and recover previously known structural features.
The computation of reliable, chemically correct atom maps from educt/product pairs has turned out to be a difficult problem in cheminformatics because the chemically correct solution is not necessarily an optimal solution for combinatorial formulations such as maximum common subgraph problems. As a consequence, competing models have been devised and compared in extensive benchmarking studies. However, because of isomorphisms among products and educts, it is not immediately obvious when two atom maps for a given educt/product pair are the same. We formalize here the equivalence of atom maps and show that equivalence of atom maps is in turn equivalent to the isomorphism of labeled auxiliary graphs. In particular, we demonstrate that Fujita's Imaginary Transition State can be used for this purpose. Numerical experiments demonstrate the practical feasibility of this approach. Generalizations to the equivalence of subgraph matches, double pushout graph transformation rules, and mechanisms of multi-step reactions are discussed briefly.
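The reduction of atom-map equivalence to isomorphism of labeled auxiliary graphs can be illustrated with a small sketch. It is not the authors' implementation: the function names `its_graph` and `isomorphic`, the dictionary-based molecule encoding, and the toy reactions are all assumptions made for illustration, and the auxiliary graph is a simplified stand-in for the Imaginary Transition State in which each edge carries the pair (bond order in educt, bond order in product). The isomorphism test is brute force and only suitable for very small examples.

```python
from itertools import permutations

def its_graph(educt_bonds, product_bonds, atom_map):
    """Overlay product bonds onto educt atoms via the atom map (hypothetical helper).

    Bonds are dicts {frozenset({u, v}): bond_order}; atom_map sends educt atoms
    to product atoms. Each edge label is (order_in_educt, order_in_product).
    """
    inverse = {p: e for e, p in atom_map.items()}
    edges = {bond: (order, 0) for bond, order in educt_bonds.items()}
    for bond, order in product_bonds.items():
        u, v = bond
        key = frozenset((inverse[u], inverse[v]))
        before, _ = edges.get(key, (0, 0))
        edges[key] = (before, order)
    return edges

def isomorphic(elements, edges_a, edges_b):
    """Brute-force isomorphism preserving atom labels and edge labels."""
    atoms = sorted(elements)
    if len(edges_a) != len(edges_b):
        return False
    for perm in permutations(atoms):
        phi = dict(zip(atoms, perm))
        if any(elements[a] != elements[phi[a]] for a in atoms):
            continue  # permutation does not preserve element labels
        mapped = {frozenset(phi[x] for x in bond): label
                  for bond, label in edges_a.items()}
        if mapped == edges_b:
            return True
    return False

# Toy reaction 1: H2 + Cl2 -> 2 HCl. The two maps differ only by swapping the H atoms.
elements_1 = {1: "H", 2: "H", 3: "Cl", 4: "Cl"}
educt_1 = {frozenset((1, 2)): 1, frozenset((3, 4)): 1}
product_1 = {frozenset(("a", "b")): 1, frozenset(("c", "d")): 1}
map_1a = {1: "a", 2: "c", 3: "b", 4: "d"}
map_1b = {1: "c", 2: "a", 3: "b", 4: "d"}
equivalent_1 = isomorphic(elements_1,
                          its_graph(educt_1, product_1, map_1a),
                          its_graph(educt_1, product_1, map_1b))

# Toy "reaction" 2: a C-C-O chain mapped onto itself, identically or with the carbons swapped.
elements_2 = {1: "C", 2: "C", 3: "O"}
educt_2 = {frozenset((1, 2)): 1, frozenset((2, 3)): 1}
product_2 = {frozenset(("a", "b")): 1, frozenset(("b", "c")): 1}
map_2a = {1: "a", 2: "b", 3: "c"}
map_2b = {1: "b", 2: "a", 3: "c"}
equivalent_2 = isomorphic(elements_2,
                          its_graph(educt_2, product_2, map_2a),
                          its_graph(educt_2, product_2, map_2b))
```

In the first toy reaction the two maps yield isomorphic labeled graphs, so they count as the same atom map under this criterion. In the second, swapping the carbons implies a broken and a newly formed C-O bond, so the labeled graphs differ and the maps are not equivalent.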
Circular RNAs (circRNAs) are a regulatory RNA class. While cancer-driving functions have been identified for single circRNAs, how they modulate gene expression in cancer is not well understood. We investigate circRNA expression in the pediatric malignancy neuroblastoma through deep whole-transcriptome sequencing in 104 primary neuroblastomas covering all risk groups. We demonstrate that MYCN amplification, which defines a subset of high-risk cases, causes globally suppressed circRNA biogenesis directly dependent on the DHX9 RNA helicase. We detect similar mechanisms shaping circRNA expression in the pediatric cancer medulloblastoma, implying a general MYCN effect. Comparisons to other cancers identify 25 circRNAs that are specifically upregulated in neuroblastoma, including circARID1A. Transcribed from the ARID1A tumor suppressor gene, circARID1A promotes cell growth and survival, mediated by direct interaction with the KHSRP RNA-binding protein. Our study highlights the importance of MYCN regulating circRNAs in cancer and identifies molecular mechanisms that explain their contribution to neuroblastoma pathogenesis.
Structural analysis of RNA is an important and versatile tool to investigate the function of this type of molecule in the cell as well as in vitro. Several robust and reliable procedures are available, relying on chemical modification inducing RT stops or nucleotide misincorporations during reverse transcription. Others are based on cleavage reactions and RT stop signals. However, these methods address only one side of the RT stop or misincorporation position. Here, we describe Led-Seq, a new approach based on lead-induced cleavage of unpaired RNA positions, where both resulting cleavage products are investigated. The RNA fragments carrying 2′,3′-cyclic phosphate or 5′-OH ends are selectively ligated to oligonucleotide adapters by specific RNA ligases. In a deep sequencing analysis, the cleavage sites are identified as ligation positions, avoiding possible false-positive signals based on premature RT stops. With a benchmark set of transcripts in Escherichia coli, we show that Led-Seq is an improved and reliable approach based on metal ion-induced phosphodiester hydrolysis to investigate RNA structures in vivo.
Background: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. Results: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial-time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
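Two of the structural properties stated above, that the three graphs form an edge partition of the complete graph and that LDT and PDT graphs are cographs, can be checked on small inputs with a short sketch. It relies on the standard fact that a graph is a cograph exactly when it contains no induced path on four vertices (P4). The function names and the toy edge lists are hypothetical, the P4 search is brute force, and passing these checks is only a necessary condition: it does not show that an explaining scenario exists and does not replace the triple-based characterization.

```python
from itertools import combinations, permutations

def has_induced_p4(vertices, edges):
    """Return True if some four vertices induce a path a-b-c-d (brute force)."""
    adjacent = {frozenset(e) for e in edges}

    def linked(u, v):
        return frozenset((u, v)) in adjacent

    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            if (linked(a, b) and linked(b, c) and linked(c, d)
                    and not linked(a, c) and not linked(a, d)
                    and not linked(b, d)):
                return True
    return False

def is_cograph(vertices, edges):
    """A graph is a cograph exactly when it has no induced P4."""
    return not has_induced_p4(vertices, edges)

def is_edge_partition(vertices, *edge_lists):
    """Check that the edge lists are pairwise disjoint and cover the complete graph."""
    seen = [frozenset(e) for edges in edge_lists for e in edges]
    complete = {frozenset(pair) for pair in combinations(vertices, 2)}
    return len(seen) == len(set(seen)) and set(seen) == complete

# Toy input (hypothetical, not derived from any real gene family).
genes = [1, 2, 3, 4]
ldt_edges = [(1, 2)]
pdt_edges = [(3, 4)]
edt_edges = [(1, 3), (1, 4), (2, 3), (2, 4)]

partition_ok = is_edge_partition(genes, ldt_edges, edt_edges, pdt_edges)
ldt_ok = is_cograph(genes, ldt_edges)
pdt_ok = is_cograph(genes, pdt_edges)
```

The brute-force search examines every set of four vertices, so it is meant for illustration only; linear-time cograph recognition algorithms exist for larger inputs.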