3D Genome Reconstruction from Partially Phased Hi-C Data
2024: Diego Cifuentes; Jan Draisma; Oskar Henriksson; Annachiara Korchmaros; Kaie Kubjas
The 3-dimensional (3D) structure of the genome is of significant importance for many cellular processes. In this paper, we study the problem of reconstructing the 3D structure of chromosomes from Hi-C data of diploid organisms, which poses additional challenges compared to the better-studied haploid setting. With the help of techniques from algebraic geometry, we prove that a small amount of phased data is sufficient to ensure finite identifiability, both for noiseless and noisy data. In the light of these results, we propose a new 3D reconstruction method based on semidefinite programming, paired with numerical algebraic geometry and local optimization. The performance of this method is tested on several simulated datasets under different noise levels and with different amounts of phased data. We also apply it to a real dataset from mouse X chromosomes, and we are then able to recover previously known structural features.
Keywords: 3D genome organization, Diploid organisms, Hi-C, Applied algebraic geometry, Numerical algebraic geometry, Mathematics Subject Classification, 92E10, 92-08, 13P25, 14P05, 65H14, 90C90
Aberrant Mitochondrial tRNA Genes Appear Frequently in Animal Evolution
2024: Iuliia Ozerova; Jörg Fallmann; Mario Mörl; Matthias Bernt; Sonja J. Prohaska; Peter Florian Stadler
Mitochondrial tRNAs have acquired a diverse portfolio of aberrant structures throughout metazoan evolution. With the availability of more than 12,500 mitogenome sequences, it is essential to compile a comprehensive overview of the pattern changes with regard to mitochondrial tRNA repertoire and structural variations. This, of course, requires reanalysis of the sequence data of more than 250,000 mitochondrial tRNAs with a uniform workflow. Here, we report our results on the complete reannotation of all mitogenomes available in the RefSeq database by September 2022 using mitos2. Based on the individual cases of mitochondrial tRNA variants reported throughout the literature, our data pinpoint the respective hotspots of change, i.e. Acanthocephala (Lophotrochozoa), Nematoda, Acariformes, and Araneae (Arthropoda). Less dramatic deviations of mitochondrial tRNAs from the norm are observed throughout many other clades. Loss of arms in animal mitochondrial tRNA clearly is a phenomenon that occurred independently many times, not limited to a small number of specific clades. The summary data here provide a starting point for systematic investigations into the detailed evolutionary processes of structural reduction and loss of mitochondrial tRNAs as well as a resource for further improvements of annotation workflows for mitochondrial tRNA annotation.
Journal: Genome Biol. Evol.; DOI: 10.1093/gbe/evae232
Are the chemical families still there? Formal structure of similarity of elements and its thermochemical domain
2024: Eugenio Llanos Ballestas; Wilmer Leal; Andrés Bernal; Jürgen Jost; Peter Florian Stadler
The periodic table organizes chemical elements into families based on their similarity, now understood through Quantum Mechanics. However, these families were inferred from limited compounds in the nineteenth century. Since then, the number of compounds has exponentially grown, leading to the discovery of new types unknown to pioneers. This situation prompts the question of whether these families can still be discerned from the amassed data, or if their recognition is confined to specific thermochemical domains. To address this inquiry, we conducted a comprehensive exploration by comparing formulae (1771–2015) as a proxy for chemical similarity. Our findings reveal that stoichiometry not only captures a significant portion of the trends observed within families but also unveils other intriguing features of the formal structure of chemical similarity. These patterns approach equivalence classes independent of thermochemical context and demonstrate high resilience. Temporal analysis demonstrates that, since approximately 1980, similarity is diminishing due to an increasing production of unique formulae for nearly all elements. Nevertheless, chemical families endure over time and they stand out as the most robust similarity patterns. Our analysis offers compelling evidence that any study will reach the same conclusions, provided there is a sufficient diversity in input compound data.
Journal: Proc. Roy. Soc. A; DOI: 10.1098/rspa.2024.0165
BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification
2024: Anderson P. Avila Santos; Breno L. S. de Almeida; Robson P. Bonidia; Peter Florian Stadler; Polanca Štefanič; Ines Mandic-Mulec; Ulisses Nunes da Rocha; Danilo S. Sanches; André Carlos Ponce de Leon Ferreira de Carvalho
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.
Journal: RNA Biology; DOI: 10.1080/15476286.2024.2329451
Cavity approach for the approximation of spectral density of graphs with heterogeneous structures
2024: Grover E. C. Guzman; Peter Florian Stadler; André Fujita
Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods to analyze empirical graphs were proposed based on the graph's spectral density. However, their running time is cubic in the number of vertices, precluding direct application to large instances. Thus, efficient algorithms to calculate the spectral density become necessary. For sparse graphs, the cavity method can efficiently approximate the spectral density of locally treelike undirected and directed graphs. However, it does not apply to most empirical graphs because they have heterogeneous structures. Thus, we propose methods for undirected and directed graphs with heterogeneous structures using a new vertex's neighborhood definition and the cavity approach. Our methods' time and space complexities are O(|E|h_{max}^{3}t) and O(|E|h_{max}^{2}t), respectively, where |E| is the number of edges, h_{max} is the size of the largest local neighborhood of a vertex, and t is the number of iterations required for convergence. We demonstrate the practical efficacy by estimating the spectral density of simulated and real-world undirected and directed graphs.
Journal: Physical Review E; DOI: 10.1103/PhysRevE.109.034303
Chemically inspired Erdős-Rényi oriented hypergraphs
2024: Angel Garcia-Chung; Marisol Bermúdez-Montaña; Peter Florian Stadler; Jürgen Jost; Guillermo Restrepo
High-order structures have been recognised as suitable models for systems going beyond the binary relationships for which graph models are appropriate. Despite their importance and surge in research on these structures, their random cases have only recently become subjects of interest. One of these high-order structures is the oriented hypergraph, which relates couples of subsets of an arbitrary number of vertices. Here we develop the Erdős-Rényi model for oriented hypergraphs, which corresponds to the random realisation of oriented hyperedges of the complete oriented hypergraph. A particular feature of random oriented hypergraphs is that the ratio between their expected number of oriented hyperedges and their expected degree or size is 3/2 for a large number of vertices. We highlight the suitability of oriented hypergraphs for modelling large collections of chemical reactions and the importance of random oriented hypergraphs to analyse the unfolding of chemistry.
Journal: J. Math. Chem.; DOI: 10.1007/s10910-024-01595-8
Comparative RNA genomics
2024: Rolf Backofen; Jan Gorodkin; Ivo L. Hofacker; Peter Florian Stadler
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecule functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Journal: Methods in Molecular Biology; DOI: 10.1007/978-1-0716-3838-5_12
Comprehensive survey of conserved RNA secondary structures in full-genome alignment of Hepatitis C Virus
2024: Sandra Triebel; Kevin Lamkiewicz; Nancy Ontiveros; Blake A Sweeney; Peter Florian Stadler; Anton I Petrov; Manja Marz
Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database based on k-mer distributions and dimension reduction and adding RefSeq sequences. We include annotations of previously recognized features for easy comparison to other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality on the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.
Journal: Sci. Rep.; DOI: 10.1038/s41598-024-62897-0
Core potentials: The consensus segmentation conjecture
2024: Anahy Santiago Argüllo; Guillaume E. Scholz; Peter Florian Stadler
Segmentations are partitions of an ordered set into non-overlapping intervals. The Consensus Segmentation or Segmentation Aggregation problem is a special case of the median problems with applications in time series analysis and computational biology. A wide range of dissimilarity measures for segmentations can be expressed in terms of potentials, a special type of set-functions. In this contribution, we shed more light on the properties of potentials, and how such properties affect the solutions of the Consensus Segmentation problem. In particular, we disprove a conjecture stated in 2021, and we provide further insights into the theoretical foundations of the problem.
Journal: Math. Comput. Sci.; DOI: 10.1007/s11786-024-00593-y
Crossover Operators for Molecular Graphs with an Application to Virtual Drug Screening
2024: Nico Domschke; Bruno J. Schmidt; Thomas Gatter; Richard Golnik; Paul Eisenhuth; Fabian Liessmann; Jens Meiler; Peter Florian Stadler
Genetic Algorithms are a powerful method to solve optimization problems with complex cost functions over vast search spaces that rely in particular on recombining parts of previous solutions. Crossover operators play a crucial role in this context. Here, we describe a large class of these operators designed for searching over spaces of graphs. These operators are based on introducing small cuts into graphs and rejoining the resulting induced subgraphs of two parents. This form of cut-and-join crossover can be restricted in a consistent way to preserve local properties such as vertex-degrees (valency), or bond-orders, as well as global properties such as graph-theoretic planarity. In contrast to crossover on strings, cut-and-join crossover on graphs is powerful enough to ergodically explore chemical space even in the absence of mutation operators. Extensive benchmarking shows that the offspring of molecular graphs are again plausible molecules with high probability, while at the same time crossover drastically increases the diversity compared to initial molecule libraries. Moreover, desirable properties such as favorable indices of synthesizability are preserved with sufficient frequency that candidate offspring can be filtered efficiently for such properties. As an application we utilized the cut-and-join crossover in REvoLd, a GA-based system for computer-aided drug design. In optimization runs searching for ligands binding to four different target proteins we consistently found candidate molecules with binding constants exceeding the best known binders as well as candidates found in make-on-demand libraries. Taken together, cut-and-join crossover operators constitute a mathematically simple and well-characterized approach to recombination of molecules that performed very well in real-life CADD tasks.
Journal: ChemRxiv; DOI: 10.26434/chemrxiv-2024-41295