Proseminars at our group

If you plan to give a talk as part of a proseminar, please register beforehand (registration is non-binding): Registration form for proseminars

Length of talk: 30 min, followed by a discussion

Registered proseminars

Iman Gharib, 28.01.2013, Summer semester 2013

A subgraph isomorphism algorithm and its application to biochemical data
Background: Graphs can represent biological networks at the molecular, protein, or species level. An important query
is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and
the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing
algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible.
Results: We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly
reduce the search space without using any complex pruning rules or domain reduction procedures. We compare
our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++
implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and
interaction networks data. We show a significant reduction in the running time of our approach compared with
these other excellent methods and show that our algorithm scales well as memory demands increase.
Conclusions: Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a
comprehensive comparison of different software approaches to subgraph isomorphism highlighting their
weaknesses and strengths. This will help researchers make a rational choice among methods depending on their
application. We also distribute an open-source package including our system and our own C++ implementation of
FocusSearch together with all the used datasets. In future work, our findings may
be extended to approximate subgraph isomorphism algorithms.

References
McKay B: Practical graph isomorphism. Congressus Numerantium 1981, 30:45-87.
Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A: A subgraph isomorphism algorithm and its application to biochemical data.
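
To make the query in the abstract concrete (find all matches of a pattern graph in a target graph), here is a minimal backtracking sketch. It is not the authors' algorithm, nor VFlib, LAD or FocusSearch. The function name, the adjacency-dict representation and the simple degree-based node ordering are assumptions chosen for illustration, and it finds non-induced matches (every pattern edge must exist in the target; extra target edges are allowed).

```python
def find_subgraph_matches(pattern, target):
    """Enumerate all injective mappings pattern -> target that preserve pattern edges.

    Both graphs are undirected adjacency dicts: node -> set of neighbours.
    """
    # Illustrative static ordering: try high-degree pattern nodes first.
    order = sorted(pattern, key=lambda u: -len(pattern[u]))
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        u = order[len(mapping)]
        used = set(mapping.values())
        for v in target:
            # Skip target nodes already in use or with too few neighbours.
            if v in used or len(target[v]) < len(pattern[u]):
                continue
            # Every already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return matches


# A triangle pattern searched in a square with one diagonal.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
square_with_diagonal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
matches = find_subgraph_matches(triangle, square_with_diagonal)
```

Failed partial mappings are abandoned as soon as one edge check fails, which is the "eliminate unsuccessful mappings early" idea in its simplest form. The worst case remains exponential.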

Georg Mühlenberg, 07.02.2013, Winter semester 2012/2013

Phylogeographic characterization of tick-borne encephalitis virus from patients, rodents and ticks in Slovenia.
Abstract

Tick-borne encephalitis virus (TBEV) is the most important arboviral agent causing infections of the central nervous system in central Europe. Previous studies have shown that TBEV exhibits pronounced genetic variability, which is often correlated to the geographical origin of TBEV. Genetic variability of TBEV has previously been studied predominantly in rodents and ticks, while information about the variability in patients is scarce. In order to understand the molecular relationships of TBEV between natural hosts, vectors and humans, as well as correlation between phylogenetic and geographical clustering, sequences of TBEV E and NS5 protein genes, were obtained by direct sequencing of RT-PCR products from TBE-confirmed patients as well as from rodents and ticks collected from TBE-endemic regions in Slovenia. A total of 27 partial E protein gene sequences representing 15 human, 4 rodent and 8 tick samples and 30 partial NS5 protein gene sequences representing 17 human, 5 rodent and 8 tick samples were obtained. The complete genome sequence of TBEV strain Ljubljana I was simultaneously obtained. Phylogenetic analysis of the E and NS5 protein gene sequences revealed a high degree of TBEV variability in patients, ticks and rodents. Furthermore, an evident correlation between geographical and phylogenetic clustering was shown that was independent of the TBEV host. Moreover, we show the presence of a possible recombination event in the TBEV genome obtained from a patient sample, which was supported with multiple recombination event detection methods. This is the first study that simultaneously analyzed the genetic relationships of directly sequenced TBEV samples from patients, ticks and rodents and provides the largest set of patient-derived TBEV sequences up to date. In addition, we have confirmed the geographical clustering of TBEV sequences in Slovenia and have provided evidence of a possible recombination event in the TBEV genome, obtained from a patient.

http://www.ncbi.nlm.nih.gov/pubmed/23185257

The paper is by

Fajs L, Durmiši E, Knap N, Strle F, Avšič-Županc T.
Source

Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia

Katharina Hoessel, 07.02.2013, Winter semester 2012/2013

The GEM mapper: fast, accurate and versatile alignment by filtration
Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy. The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision (performing fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (being several times faster than comparable state-of-the-art tools).

Marco-Sola, S., Sammeth, M., Guigo, R., & Ribeca, P. (2012). The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Meth, advance online publication. doi:10.1038/nmeth.2221
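
As a rough illustration of "string matching by filtration", the sketch below uses the classic pigeonhole filter: if a pattern matches with at most k mismatches, at least one of k+1 disjoint pieces of it must match exactly. This is a textbook simplification and not the GEM mapper's actual implementation. It handles substitutions only (no gaps), and the function name and parameters are made up for this example.

```python
def filtration_search(text, pattern, max_mismatches):
    """Return start positions where pattern occurs in text with at most
    max_mismatches substitutions (Hamming distance), using a pigeonhole filter."""
    k = max_mismatches
    m = len(pattern)
    if m < k + 1:
        raise ValueError("pattern must be longer than the number of allowed mismatches")
    # Split the pattern into k + 1 contiguous, non-empty pieces.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    candidates = set()
    for a, b in zip(bounds, bounds[1:]):
        piece = pattern[a:b]
        pos = text.find(piece)
        while pos != -1:
            start = pos - a  # where the whole pattern would begin
            if 0 <= start <= len(text) - m:
                candidates.add(start)
            pos = text.find(piece, pos + 1)
    # Verification step: check only the candidate positions in full.
    hits = []
    for start in sorted(candidates):
        mismatches = sum(x != y for x, y in zip(text[start:start + m], pattern))
        if mismatches <= k:
            hits.append(start)
    return hits


hits = filtration_search("ACGTACGTTAGC", "ACGA", 1)
```

The filter is exhaustive for the given mismatch budget, so no valid match is lost. The saving comes from verifying only positions that share an exact piece with the pattern.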

Marina Chkolnikov, 07.02.2013, Winter semester 2012/2013

What do molecules do when we are not looking? State sequence analysis for stochastic chemical systems.
Many biomolecular systems depend on orderly sequences of chemical transformations or reactions. Yet, the dynamics of single molecules or small-copy-number molecular systems are significantly stochastic. Here, we propose state sequence analysis--a new approach for predicting or visualizing the behaviour of stochastic molecular systems by computing maximum probability state sequences, based on initial conditions or boundary conditions. We demonstrate this approach by analysing the acquisition of drug-resistance mutations in the human immunodeficiency virus genome, which depends on rare events occurring on the time scale of years, and the stochastic opening and closing behaviour of a single sodium ion channel, which occurs on the time scale of milliseconds. In both cases, we find that our approach yields novel insights into the stochastic dynamical behaviour of these systems, including insights that are not correctly reproduced in standard time-discretization approaches to trajectory analysis.

Georg Muehlenberg, 28.01.2013, Winter semester 2012/2013

TBEV Paper
The paper evaluates a study from Slovakia.
An abstract will be submitted later.

Tariq Yousef, 07.02.2013, Winter semester 2012/2013

Clustering algorithms in bioinformatics
Clustering is important in many fields, including bioinformatics. One example is finding groups of genes that perform similar functions or are involved in the same biological processes.

My talk will focus on the following points:

- What is clustering?
- K-means algorithm
- Graph-based Clustering
- Hierarchical Clustering
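
For the k-means item in the list above, a minimal sketch of Lloyd's algorithm may help. The function name, the optional `init` parameter and the toy data are assumptions made for this example. Real analyses would normally rely on an established library.

```python
import random


def kmeans(points, k, init=None, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centers, labels). `init` may supply k starting centres;
    otherwise k points are drawn at random using `seed`.
    """
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2, init=[(0, 0), (0, 1)])
```

Even though both starting centres lie in the same group here, the two steps separate the groups after a few rounds. In general the result depends on the initialisation.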

http://bix.ucsd.edu/bioalgorithms/presentations/Ch10_Clustering.pdf
http://lib.bioinfo.pl/courses/view/491

Lisa Duchstein, 07.02.2013, Winter semester 2012/2013

Alternative Protein-Protein Interfaces Are Frequent Exceptions
The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, measuring the same interacting protein pair again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12–29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to the extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, a substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.

Lieselotte Erber, 07.02.2013, Winter semester 2012/2013

High-throughput analysis of epistasis in genome-wide association studies with BiForce
Abstract
MOTIVATION

Gene-gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.
RESULTS

We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case-control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, 4500 individuals) and two disease traits in another (340 000 SNPs, 1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.
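
The abstract mentions Boolean bitwise operations and Bonferroni-corrected thresholds for a full pairwise scan. The sketch below shows one plausible way such counting can work; it is not BiForce's code. Genotypes coded 0/1/2, the bitmask layout and all function names are assumptions for illustration.

```python
def genotype_bitmasks(genotypes):
    """Encode one SNP as three integers: bit i of masks[g] is set
    if individual i carries genotype g (0, 1 or 2)."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pair_contingency(snp_a, snp_b):
    """3x3 table of joint genotype counts for two SNPs,
    obtained with bitwise AND plus a population count."""
    ma, mb = genotype_bitmasks(snp_a), genotype_bitmasks(snp_b)
    return [[bin(ma[i] & mb[j]).count("1") for j in range(3)] for i in range(3)]


def bonferroni_threshold(n_snps, alpha=0.05):
    """Per-test significance level when all SNP pairs are tested."""
    n_pairs = n_snps * (n_snps - 1) // 2
    return alpha / n_pairs


table = pair_contingency([0, 1, 2, 0, 1, 2], [0, 0, 1, 1, 2, 2])
```

In a real scan the bitmasks would be built once per SNP and reused for every pair, so each cell of the table costs one AND and one bit count, regardless of how the test statistic is computed afterwards.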


Sabina Kanton, 28.01.2013, Winter semester 2012/2013

REvolver: modeling sequence evolution under domain constraints
Simulating the change of protein sequences over time in a biologically realistic way is fundamental for a broad range of studies with a focus on evolution. It is thus problematic that simulators typically evolve individual sites of a sequence identically and independently. More realistic simulations are possible; however, they are often prohibited by limited knowledge concerning site-specific evolutionary constraints or functional dependencies between amino acids. As a consequence, a protein's functional and structural characteristics are rapidly lost in the course of simulated evolution. Here, we present REvolver, a program that simulates protein sequence alteration such that evolutionarily stable sequence characteristics, like functional domains, are maintained. For this purpose, REvolver recruits profile hidden Markov models (pHMMs) for parameterizing site-specific models of sequence evolution in an automated fashion. pHMMs derived from alignments of homologous proteins or protein domains capture information regarding which sequence sites remained conserved over time and where in a sequence insertions or deletions are more likely to occur. Thus, they describe constraints on the evolutionary process acting on these sequences. To demonstrate the performance of REvolver as well as its applicability in large-scale simulation studies, we evolved the entire human proteome up to 1.5 expected substitutions per site. Simultaneously, we analyzed the preservation of Pfam and SMART domains in the simulated sequences over time. REvolver preserved 92 percent of the Pfam domains originally present in the human sequences. This value drops to 15 percent when traditional models of amino acid sequence evolution are used. Thus, REvolver represents a significant advance toward a realistic simulation of protein sequence evolution on a proteome-wide scale. Further, REvolver facilitates the simulation of a protein family with a user-defined domain architecture at the root.

Henrike Indrischek, 28.01.2013, Winter semester 2012/2013

Prediction of Cell Penetrating Peptides by Support Vector Machines
Cell penetrating peptides (CPPs) are those peptides that can traverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. All of the synthesized peptides predicted to be CPPs were shown to be penetrating.

Maggie Moosig, 16.01.2013, Winter semester 2012/2013

xxx
xxx

xxx

Toni Foerster, 28.01.2013, Winter semester 2012/2013

Detection of differentially expressed segments in tiling array data
Motivation: Tiling arrays have been a mainstay of unbiased
genome-wide transcriptomics over the last decade. Currently
available approaches to identify expressed or differentially expressed
segments in tiling array data are limited in the recovery of the
underlying gene structures and require several parameters that are
intensity-related or partly dataset-specific.
Results: We have developed TileShuffle, a statistical approach
that identifies transcribed and differentially expressed segments
as significant differences from the background distribution while
considering sequence-specific affinity biases and cross-hybridization.
It avoids dataset-specific parameters in order to provide better
comparability of different tiling array datasets, based on different
technologies or array designs. TileShuffle detects highly and
differentially expressed segments in biological data with significantly
lower false discovery rates under equal sensitivities than commonly
used methods. Also, it is clearly superior in the recovery of exon-intron
structures. It further provides window z-scores as a normalized
and robust measure for visual inspection.

Loreen Knoebel, 16.01.2013, Winter semester 2012/2013

Exploiting Gene Families for Phylogenomic Analysis of Myzostomid Transcriptome Data
- Placing the Myzostomida in the metazoan tree of life has been rather difficult with previous methods
- New bioinformatic methods and high-throughput sequencing can help
- Methods: Perl scripts, freely available software, multiple sequence alignments, gene tree parsimony, maximum likelihood trees

Franziska Hopfe, 16.01.2013, Winter semester 2012/2013

"Seven new dolphin mitochondrial genomes and a time-calibrated phylogeny of whales" and "Cytochrome b and Bayesian"
Using 7 newly sequenced mitochondrial genomes of dolphins and a molecular clock, the phylogeny of whales was reconstructed. The data were analyzed with Bayesian methods, and the molecular clock shed light on the radiation of various whale families. It was shown that Tursiops and Stenella are not monophyletic.
With the help of a mitochondrial genome, Odontoceti could be shown for the first time to form a group of its own (monophyletic). The cytochrome b data of 66 whale taxa and 24 outgroups were analyzed with Bayesian methods. The paper makes clear how important it is to include outgroups.

Ronny Richter, 16.07.2012, Summer semester 2012

Modeling population connectivity by ocean currents, a graph-theoretic approach for marine conservation
Abstract: The dispersal of individuals among
marine populations is of great importance to metapopulation
dynamics, population persistence, and
species expansion. Understanding this connectivity
between distant populations is key to their effective
conservation and management. For many marine
species, population connectivity is determined largely
by ocean currents transporting larvae and juveniles
between distant patches of suitable habitat. Recent
work has focused on the biophysics of marine larval
dispersal and its importance to population dynamics,
although few studies have evaluated the spatial and
temporal patterns of this potential dispersal. Here, we
show how an Eulerian advection–diffusion approach
can be used to model the dispersal of coral larvae
between reefs throughout the Tropical Pacific. We
illustrate how this connectivity can be analyzed using
graph theory—an effective approach for exploring
patterns in spatial connections, as well as for
determining the importance of each site and pathway
to local and regional connectivity. Results indicate
that the scale (average distance) of dispersal in the
Pacific is on the order of 50–150 km, consistent with
recent studies in the Caribbean (Cowen, et al. 2006).
Patterns in the dispersal graphs highlight pathways
for larval dispersal along major ocean currents and
through island chains. A series of critical island
‘stepping stones’ are discovered providing potential
pathways across the equatorial currents and connecting
distant island groups. Patterns in these dispersal
graphs highlight possible pathways for species
expansions, reveal connected upstream/downstream
populations, and suggest areas that might be prioritized
for marine conservation efforts.

Landscape Ecol (2008) 23:19–36

References
Andrews JC, Gay S, Sammarco PW (1988) Influence of circulation on self-seeding patterns at Helix Reef-Great Barrier Reef. In: Proceedings of the 6th Int. Coral Reef Symposium, Townsville, Australia vol 2, pp 469–474
Barber PH, Palumbi SR, Erdmann MV et al (2002) Sharp genetic breaks among populations of Haptosquilla pulchella (Stomatopoda) indicate limits to larval transport: patterns, causes, and consequences. Mol Ecol 11:659–674
Benzie JAH (1999) Genetic structure of coral reef organisms: ghosts of dispersal past. Am Zool 39:131–145
Benzie JAH, Williams ST (1997) Genetic structure of giant clam (Tridacna maxima) populations in the west Pacific is not consistent with dispersal by present-day ocean currents. Evolution 51:768–783
Botsford LW, Hastings A, Gaines SD (2001) Dependence of sustainability on the configuration of marine reserves and larval dispersal distance. Ecol Lett 4:144–150
Botsford LW, Micheli F, Hastings A (2003) Principles for the design of marine reserves. Ecol Appl 13:S25–S31
Calabrese JM, Fagan WF (2004) A comparison-shopper’s guide to connectivity metrics. Front Ecol Environ 2:529–536
Cantwell MD, Forman RTT (1993) Landscape graphs: ecological modeling with graph-theory to detect configurations common to diverse landscapes. Landsc Ecol 8:239–255
Clark JS, Silman M, Kern R et al (1999) Seed dispersal near and far: patterns across temperate and tropical forests. Ecology 80:1475–1494
Connolly SR, Bellwood DR, Hughes TP (2003) Indo-Pacific biodiversity of coral reefs: deviations from a mid-domain model. Ecology 84:2178–2190
Connolly SR, Hughes TP, Bellwood DR et al (2005) Community structure of corals and reef fishes at multiple scales. Science 309:1363–1365
Cowen RK, Lwiza KMM, Sponaugle S et al (2000) Connectivity of marine populations: open or closed? Science 287:857–859
Cowen RK, Paris CB, Olson D et al (2003) The role of long distance dispersal versus local retention in replenishing marine populations. Gulf Caribb Res 14:129–137
Cowen RK, Paris CB, Srinivasan A (2006) Scaling of connectivity in marine populations. Science 311:522–527
de Queiroz A (2005) The resurrection of oceanic dispersal in historical biogeography. Trends Ecol Evol 20:68–73
Dijkstra EW (1959) A note on two problems in connection with graphs. Numer Math 1:269–271
Dunbar RB, Wellington GM, Colgan MW et al (1994) Eastern Pacific sea-surface temperature since 1600-AD: the Delta-O18 record of climate variability in Galapagos corals. Paleoceanography 9:291–315
Dunne JA, Williams RJ, Martinez ND (2002) Food-web structure and network theory: the role of connectance and size. Proc Natl Acad Sci USA 99:12917–12922
Dyer RJ, Nason JD (2004) Population graphs: the graph theoretic shape of genetic structure. Mol Ecol 13:1713–1727
Fahrig L, Merriam G (1985) Habitat patch connectivity and population survival. Ecology 66:1762–1768
Freeman LC (1979) Centrality in social networks conceptual clarification. Soc Netw 1:215–239
Gaines SD, Gaylord B, Largier JL (2003) Avoiding current oversights in marine reserve design. Ecol Appl 13:S32–S46
Gaines SD, Lafferty KD (1995) Modeling the dynamics of marine species: the importance of incorporating larval dispersal. In: McEdward LR (ed) Ecology of marine invertebrate larvae. CRC Press, Boca Raton, pp 389–412
Gastner MT, Newman MEJ (2006) The spatial structure of networks. Eur Phys J B 49:247–252
Gay SL, Andrews JC (1994) The effects of recruitment strategies on coral larvae settlement distributions at Helix Reef. In: Sammarco PW, Heron ML (eds) The bio-physics of marine larval dispersal. American Geophysical Union, Washington, DC, pp 73–88
Gaylord B, Gaines SD (2000) Temperature or transport? Range limits in marine species mediated solely by flow. Am Nat 155:769–789
Gerber LR, Botsford LW, Hastings A et al (2003) Population models for marine reserve design: a retrospective and prospective synthesis. Ecol Appl 13:S47–S64
Gilg MR, Hilbish TJ (2003) The geography of marine larval dispersal: coupling genetics with fine-scale physical oceanography. Ecology 84:2989–2998
Glynn PW, Ault JS (2000) A biogeographic analysis and review of the far eastern Pacific coral reef region. Coral Reefs 19:1–23
Grantham BA, Eckert GL, Shanks AL (2003) Dispersal potential of marine invertebrates in diverse habitats. Ecol Appl 13:S108–S116
Guichard F, Levin SA, Hastings A et al (2004) Toward a dynamic metacommunity approach to marine reserve theory. Bioscience 54:1003–1011
Halpern BS, Warner RR (2002) Marine reserves have rapid and lasting effects. Ecol Lett 5:361–366
Hare JA, Quinlan JA, Werner FE et al (

Marcel Kansy, 13.07.2012, Summer semester 2012

Detecting breakdown points in metabolic networks
Background: A complex network of biochemical reactions present in an organism generates various biological moieties necessary for its survival. It is seen that biological systems are robust to genetic and environmental changes at all levels of organization. Functions of various organisms are sustained against mutational changes by using alternative pathways. It is also seen that if any one of the paths for production of the same metabolite is hampered, an alternate path tries to overcome this defect and helps in combating the damage.
Methodology: A physical, chemical or genetic change in any precursor substrate of a biochemical reaction may damage the production of the ultimate product. We employ a quantitative approach for simulating this phenomenon of causing a physical change in the biochemical reactions by performing external perturbations to 12 metabolic pathways under carbohydrate metabolism in Saccharomyces cerevisiae as well as 14 metabolic pathways under carbohydrate metabolism in Homo sapiens. Here, we investigate the relationship between structure and degree of compatibility of metabolites against external perturbations, i.e., robustness. Robustness can also be further used to identify the extent to which a metabolic pathway can resist a mutation event. Biological networks with a certain connectivity distribution may be very resilient to a particular attack but not to another. The goal of this work is to determine the exact boundary of network breakdown due to both random and targeted attack, thereby analyzing its robustness. We also find that compared to various non-standard models, metabolic networks are exceptionally robust. Here, we report the use of a ‘Resilience-based’ score for enumerating the concept of ‘network-breakdown’. We also use this approach for analyzing metabolite essentiality, providing insight into cellular robustness that can be further used for future drug development.
Results: We have investigated the behavior of metabolic pathways under carbohydrate metabolism in S. cerevisiae and H. sapiens against random and targeted attack. Both random and targeted resilience were calculated by formulating a measure that we termed the ‘Resilience score’.
Datasets of metabolites were collected for 12 metabolic pathways belonging to carbohydrate metabolism in S. cerevisiae and 14 metabolic pathways belonging to carbohydrate metabolism in H. sapiens from the Kyoto Encyclopedia of Genes and Genomes (KEGG).
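
The contrast between random and targeted attack can be illustrated with a small simulation. This sketch does not reproduce the paper's ‘Resilience score’, whose definition is not given here. Instead it tracks a common stand-in, the size of the largest connected component as nodes are removed. All function names are hypothetical, and the targeted order uses the initial degrees without recomputing them.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def attack_curve(adj, order):
    """Largest-component size after removing 0, 1, 2, ... nodes in the given order."""
    removed = set()
    curve = [largest_component_size(adj, removed)]
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(adj, removed))
    return curve


def targeted_order(adj):
    """Targeted attack: highest initial degree first."""
    return sorted(adj, key=lambda n: -len(adj[n]))


def random_order(adj, seed=0):
    """Random attack: nodes in shuffled order."""
    nodes = list(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


# Toy network: one hub metabolite connected to four others.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
targeted = attack_curve(star, targeted_order(star))
randomized = attack_curve(star, random_order(star))
```

On the toy star network a single targeted removal of the hub already fragments everything, which is the kind of breakdown point the abstract refers to. A random removal usually hits a leaf first and leaves the rest connected.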

Christoph Krell, 29.06.2012, Summer semester 2012

Matching Index of Uncertain Graph: Concept and Algorithm
In practical applications of graph theory, uncertain factors often appear in graphs.
This paper employs uncertainty theory to deal with such factors in uncertain graphs.
The matching index and the perfect matching index of an uncertain graph are proposed,
and some properties of the matching index are discussed. Furthermore, we give an
algorithm to calculate the matching index of an uncertain graph.

http://www.orsc.edu.cn/online/120602.pdf

Konrad Abicht, 29.06.2012, Summer semester 2012

Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society [1–3]. One crucial step when studying the structure and dynamics of networks is to identify communities [4,5]: groups of related nodes that correspond to functional subunits such as protein complexes [6,7] or social spheres [8–10]. Communities in networks often overlap [9,10] such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure [11–13]. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein–protein interaction [6,7,14] and metabolic networks [11,15,16], and show that a large social network [10,17,18] contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.

Source: Yong-Yeol Ahn, James P. Bagrow & Sune Lehmann, Link communities reveal multiscale complexity in networks

Tony Mey, 13.07.2012, Summer semester 2012

Network Centrality in the Human Functional Connectome
The network architecture of functional connectivity within the human
brain connectome is poorly understood at the voxel level. Here, using
resting state functional magnetic resonance imaging data from 1003
healthy adults, we investigate a broad array of network centrality
measures to provide novel insights into connectivity within the
whole-brain functional network (i.e., the functional connectome). We
first assemble and visualize the voxel-wise (4 mm) functional
connectome as a functional network. We then demonstrate that each
centrality measure captures different aspects of connectivity,
highlighting the importance of considering both global and local
connectivity properties of the functional connectome. Beyond
“detecting functional hubs,” we treat centrality as measures of
functional connectivity within the brain connectome and demonstrate
their reliability and phenotypic correlates (i.e., age and sex).
Specifically, our analyses reveal age-related decreases in degree
centrality, but not eigenvector centrality, within precuneus and
posterior cingulate regions. This implies that while local (or direct)
connectivity decreases with age, connections with hub-like regions
within the brain remain stable with age at a global level. In sum, these
findings demonstrate the nonredundancy of various centrality
measures and raise questions regarding their underlying physiological
mechanisms that may be relevant to the study of neurodegenerative
and psychiatric disorders.

http://www.medlive.cn/uploadfile/2011/1011/20111011044407894.pdf
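
The abstract contrasts degree centrality (local, direct connectivity) with eigenvector centrality (connections to hub-like nodes). The sketch below computes both on a toy graph to show that they can disagree. It is unrelated to the study's voxel-wise pipeline. The function names and the example graph are assumptions, and the power iteration uses A + I, a standard trick that leaves the leading eigenvector unchanged while avoiding oscillation.

```python
def degree_centrality(adj):
    """Number of direct neighbours of each node."""
    return {n: len(adj[n]) for n in adj}


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Leading eigenvector of the adjacency matrix by power iteration.

    Assumes a connected, undirected graph. Scores are scaled so the maximum is 1.
    """
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # One step of x <- (A + I) x.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in adj) < tol:
            return new
        score = new
    return score


# Triangle A-B-C with a tail C-D-E: A and D both have two neighbours.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
deg = degree_centrality(graph)
eig = eigenvector_centrality(graph)
```

Nodes A and D have the same degree, but A scores higher on eigenvector centrality because its second neighbour B is itself well connected, whereas D's second neighbour E is a dead end.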

Didier Cherix, 12.07.2012, Summer semester 2012

Characterization of the anterior cingulate
To study changes in the brain of patients at increased risk of schizophrenia, images were acquired using fMRI. Graphs are constructed from the images and several centrality measures are computed in order to identify significant differences.

http://dx.doi.org/10.1016/j.neuroimage.2011.02.012

Lisa Falkowski, 29.06.2012, Summer semester 2012

New insights into RNA secondary structure in the alternative splicing of pre-mRNAs.
Alternative splicing is an important mechanism in generating proteomic diversity, and RNA secondary structure is an important element in splicing regulation. The use of high-throughput sequencing and other approaches has increased the number of known pre-mRNA secondary structures by several orders of magnitude, and we now have new insights into the role of RNA secondary structure in alternative splicing and the mechanisms involved (e.g., physical competition, long-range RNA pairing, the structural splicing code, and co-transcriptional splicing). Furthermore, an RNA pairing-based mechanism ensures the selection of only one of several available exons (e.g., Dscam splicing). Here we review several recent discoveries related to the role of RNA secondary structure in alternative splicing and the underlying mechanisms.

Katharina Theuerkorn, 29.06.2012, Sommersemester 2012

Exploring hierarchical and overlapping modular structure in the yeast protein interaction network
Background: Developing effective strategies to reveal modular structures in protein interaction networks is crucial
for better understanding of molecular mechanisms of underlying biological processes. In this paper, we propose a
new density-based algorithm (ADHOC) for clustering vertices of a protein interaction network using a novel
subgraph density measurement.
Results: By statistically evaluating several independent criteria, we found that ADHOC could significantly improve
the outcome as compared with five previously reported density-dependent methods. We further applied ADHOC
to investigate the hierarchical and overlapping modular structure in the yeast PPI network. Our method could
effectively detect both protein modules and the overlaps between them, and thus greatly promote the precise
prediction of protein functions. Moreover, by further assaying the intermodule layer of the yeast PPI network, we
classified hubs into two types, module hubs and inter-module hubs. Each type presents distinct characteristics both
in network topology and biological functions, which could contribute to a better understanding of the relationship
between network architecture and biological implications.
Conclusions: Our proposed algorithm based on the novel subgraph density measurement makes it possible to
more precisely detect hierarchical and overlapping modular structures in protein interaction networks. In addition,
our method also shows strong robustness against noise in the network, which is critical for analyzing such
high-noise networks.

Sascha Ludwig, 29.06.2012, Sommersemester 2012

Inferring Boolean network structure via correlation
Motivation: Accurate, context-specific regulation of gene
expression is essential for all organisms. Accordingly, it is very
important to understand the complex relations within cellular gene
regulatory networks. Boolean models are a tool to describe and analyze
the behavior of such networks. The reconstruction of a
Boolean network from biological data requires identification of
dependencies within the network. This task becomes increasingly
computationally demanding with large amounts of data created by
recent high-throughput technologies. Thus, we developed a method
that is especially suited for network structure reconstruction from
large-scale data. In our approach, we took advantage of the fact that
a specific transcription factor often will consistently either activate
or inhibit a specific target gene, and this kind of regulatory behavior
can be modeled using monotone functions.
Results: To detect regulatory dependencies in a network, we
examined how the expression of different genes correlates to
successive network states. For this purpose, we used Pearson
correlation as an elementary correlation measure. Given a Boolean
network containing only monotone Boolean functions, we prove that
the correlation of successive states can identify the dependencies
in the network. This method not only finds dependencies in
randomly created artificial networks to a very high percentage, but also
reconstructs large fractions of both a published Escherichia coli
regulatory network from simulated data and a yeast cell cycle
network from real microarray data.
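
The idea of reading dependencies from the correlation of successive states can be sketched as follows. This is a simplified illustration, not the published method: the three-gene network in `RULES` is hypothetical, and instead of a measured time series the sketch enumerates every network state once and correlates it with its successor.

```python
from itertools import product
from math import sqrt

# Hypothetical three-gene network; every update rule is monotone in each input.
RULES = {
    "A": lambda s: s["B"] and s["C"],  # A is activated by B and C together
    "B": lambda s: not s["A"],         # B is inhibited by A
    "C": lambda s: s["A"] or s["B"],   # C is activated by A or by B
}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    den = sqrt((n * sum(x * x for x in xs) - sx * sx)
               * (n * sum(y * y for y in ys) - sy * sy))
    return num / den if den else 0.0


def infer_edges(rules, threshold=1e-9):
    """Correlate each gene at time t with each gene at time t + 1."""
    genes = sorted(rules)
    states = [dict(zip(genes, bits))
              for bits in product((0, 1), repeat=len(genes))]
    successors = [{g: int(bool(rules[g](s))) for g in genes} for s in states]
    edges = {}
    for source in genes:
        for target in genes:
            r = pearson([s[source] for s in states],
                        [t[target] for t in successors])
            if abs(r) > threshold:
                edges[(source, target)] = "activates" if r > 0 else "inhibits"
    return edges


if __name__ == "__main__":
    for (source, target), kind in sorted(infer_edges(RULES).items()):
        print(source, kind, target)
```

The sign of the correlation distinguishes activation from inhibition, which is only meaningful because the rules are monotone. With real data the states are not sampled uniformly, so a noise-aware threshold would be needed instead of the near-zero cutoff assumed here.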

Preferred presentation date: Monday, 09.07.

Jan Rüdiger, 29.06.2012, Sommersemester 2012

Protein Docking by the Interface Structure Similarity: How Much Structure Is Needed?
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes did not increase significantly for higher-accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031349

Fabian Externbrink, 29.06.2012, Sommersemester 2012

Efficient RNA pairwise structure comparison by SETTER method.
Motivation: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their 3D structures. While a structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent a much bigger challenge. However, the growth of the number of large RNAs deposited in the PDB database calls for the development of fast and accurate methods for analyzing their structures, as well as for rapid similarity searches in databases.
Results: In this article a novel algorithm for an RNA structural comparison SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) is introduced. SETTER utilizes a pairwise comparison method based on 3D similarity of the so-called generalized secondary structure units (GSSU). For each pair of structures, SETTER produces a distance score and an indication of its statistical significance. SETTER can be used both for the structural alignments of structures that are already known to be homologous, as well as for 3D structure similarity searches and functional annotation. The algorithm presented is both accurate and fast and does not impose limits on the size of aligned RNA structures.

Tobias Mede, 29.06.2012, Sommersemester 2012

GraphClust: alignment-free structural clustering of local RNA secondary structures
ABSTRACT
Motivation: Clustering according to sequence–structure similarity
has now become a generally accepted scheme for ncRNA
annotation. Its application to complete genomic sequences as well
as whole transcriptomes is therefore desirable but hindered by
extremely high computational costs.
Results: We present a novel linear-time, alignment-free method
for comparing and clustering RNAs according to sequence and
structure. The approach scales to datasets of hundreds of thousands
of sequences. The quality of the retrieved clusters has been
benchmarked against known ncRNA datasets and is comparable
to state-of-the-art sequence–structure methods although achieving
speedups of several orders of magnitude. A selection of applications
aiming at the detection of novel structural ncRNAs are presented.
Exemplarily, we predicted local structural elements specific to
lincRNAs likely functionally associating involved transcripts to vital
processes of the human nervous system. In total, we predicted 349
local structural RNA elements.
Availability: The GraphClust pipeline is available on request.
Contact: backofen@informatik.uni-freiburg.de
Supplementary information: Supplementary data are available at
Bioinformatics online.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371856/pdf/bts224.pdf

Francine Klausnitzer, 29.06.2012, Sommersemester 2012

A new protein-ligand binding sites prediction method based on the integration of protein sequence conservation informati
Background: Prediction of protein-ligand binding sites is an important issue for protein function annotation and
structure-based drug design. Although many computational methods for ligand-binding prediction have
been developed, there is still a demand to improve prediction accuracy and efficiency. In addition, most of
these methods are purely geometry-based; if prediction could be improved by
integrating physicochemical or sequence properties of protein-ligand binding, this might also help to
address the biological questions in such studies.
Results: In our study, in order to investigate the contribution of sequence conservation in binding sites prediction
and to make up the insufficiencies in purely geometry based methods, a simple yet efficient protein-binding sites
prediction algorithm is presented, based on the geometry-based cavity identification integrated with sequence
conservation information. Our method was compared with the other three classical tools: PocketPicker, SURFNET,
and PASS, and evaluated on an existing comprehensive dataset of 210 non-redundant protein-ligand complexes.
The results demonstrate that our approach correctly predicted the binding sites in 59% and 75% of cases among
the TOP1 candidates and TOP3 candidates in the ranking list, respectively, which is better than
SURFNET and PASS and generally slightly better than PocketPicker.
Conclusions: Our work has successfully indicated the importance of the sequence conservation information in
binding sites prediction as well as provided a more accurate way for binding sites identification.

Lisa Falkowski, 03.02.2012, Wintersemester 2011/2012

Histone exchange and histone modifications during transcription and aging.
The organization of the eukaryotic genome into chromatin enables DNA to fit inside the nucleus while also regulating the access of proteins to the DNA to facilitate genomic functions such as transcription, replication and repair. The basic repeating unit of chromatin is the nucleosome, which includes 147bp of DNA wrapped 1.65 times around an octamer of core histone proteins comprising two molecules each of H2A, H2B, H3 and H4 [1]. Each nucleosome is a highly stable unit, being maintained by over 120 direct protein-DNA interactions and several hundred water mediated ones [1]. Accordingly, there is considerable interest in understanding how processive enzymes such as RNA polymerases manage to pass along the coding regions of our genes that are tightly packaged into arrays of nucleosomes. Here we present the current mechanistic understanding of this process and the evidence for profound changes in chromatin dynamics during aging. This article is part of a Special Issue entitled: Histone chaperones and Chromatin assembly.

Stefanie Heidenreich, 03.02.2012, Wintersemester 2011/2012

Histone methylation makes its mark on longevity
How long organisms live is not entirely written in their
genes. Recent findings reveal that epigenetic factors that
regulate histone methylation, a type of chromatin modification,
can affect lifespan. The reversible nature of
chromatin modifications suggests that therapeutic targeting
of chromatin regulators could be used to extend
lifespan and healthspan. This review describes the epigenetic
regulation of lifespan in diverse model organisms,
focusing on the role and mode of action of
chromatin regulators that affect two epigenetic marks,
trimethylated lysine 4 of histone H3 (H3K4me3) and
trimethylated lysine 27 of histone H3 (H3K27me3), in
longevity.

Greer, E.L. et al. (2010) Members of the H3K4 trimethylation complex
regulate lifespan in a germline-dependent manner in C. elegans.
Nature 466, 383–387

Kenyon, C.J. (2010) The genetics of ageing. Nature 464, 504–512

Linda Arnold, 03.02.2012, Wintersemester 2011/2012

On the Connection between RNAi and Heterochromatin at Centromeres
RNA interference (RNAi) is a conserved silencing mechanism whereby double-strand RNA induces specific down-regulation
of homologous sequences. In the fission yeast Schizosaccharomyces pombe, centromeric heterochromatin assembly is an
RNAi-dependent process. Noncoding RNAs transcribed from pericentromeric repeat sequences are processed into short interfering
RNAs (siRNAs) that direct the Argonaute-containing RNA-induced transcriptional silencing (RITS) effector complex
to homologous nascent transcripts. RITS is required for H3K9 methylation by the histone methyltransferase (HMT) Clr4;
conversely, H3K9 methylation can attract RITS to chromatin via binding of the chromodomain protein Chp1. This codependency
has hampered dissection of the order of events and mechanisms of cross talk between the RNAi and chromatin modification
machineries. To tackle this problem, we have developed systems that reconstitute heterochromatin at a euchromatic
locus, using either hairpin triggers or DNA-tethered chromatin-modifying complexes. These systems reveal that RNAi is sufficient
to promote heterochromatin assembly in cis and that direct recruitment of the HMT Clr4 can bypass the role of RNAi
in heterochromatin assembly. We have also characterized a new pathway component, Stc1, that translates the RNAi signal
into chromatin marks. We discuss the implications of these findings for our understanding of the mechanism and function of
RNAi-directed heterochromatin assembly at centromeres.

Henrike Indrischek, 30.01.2012, Wintersemester 2011/2012

Genomic characterization reveals a simple histone H4 acetylation code
The histone code hypothesis holds that covalent posttranslational modifications of histone tails are interpreted by the cell to yield a rich combinatorial transcriptional output. This hypothesis has been the subject of active debate in the literature. Here, we investigated the combinatorial complexity of the acetylation code at the four lysine residues of the histone H4 tail in budding yeast. We constructed yeast strains carrying all 15 possible combinations of mutations among lysines 5, 8, 12, and 16 to arginine in the histone H4 tail, mimicking positively charged, unacetylated lysine states, and characterized the resulting genome-wide changes in gene expression by using DNA microarrays. Only the lysine 16 mutation had specific transcriptional consequences independent of the mutational state of the other lysines (affecting approximately 100 genes). In contrast, for lysines 5, 8, and 12, expression changes were due to nonspecific, cumulative effects seen as increased transcription correlating with an increase in the total number of mutations (affecting approximately 1,200 genes). Thus, acetylation of histone H4 is interpreted by two mechanisms: a specific mechanism for lysine 16 and a nonspecific, cumulative mechanism for lysines 5, 8, and 12.

Steven Henikoff: Histone modifications: Combinatorial complexity or cumulative simplicity?

Juliane Meißner, 03.02.2012, Wintersemester 2011/2012

Genome Digging: Insight into the Mitochondrial Genome of Homo
Abstract
Background: A fraction of the Neanderthal mitochondrial genome sequence has a similarity with a 5,839-bp nuclear DNA
sequence of mitochondrial origin (numt) on the human chromosome 1. This fact has never been interpreted. Although this
phenomenon may be attributed to contamination and mosaic assembly of Neanderthal mtDNA from short sequencing
reads, we explain the mysterious similarity by integration of this numt (mtAncestor-1) into the nuclear genome of the
common ancestor of Neanderthals and modern humans not long before their reproductive split.
Principal Findings: Exploiting bioinformatics, we uncovered an additional numt (mtAncestor-2) with a high similarity to the
Neanderthal mtDNA and indicated that both numts represent almost identical replicas of the mtDNA sequences ancestral to
the mitochondrial genomes of Neanderthals and modern humans. In the proteins, encoded by mtDNA, the majority of
amino acids distinguishing chimpanzees from humans and Neanderthals were acquired by the ancestral hominins. The
overall rate of nonsynonymous evolution in Neanderthal mitochondrial protein-coding genes is not higher than in other
lineages. The model incorporating the ancestral hominin mtDNA sequences estimates the average divergence age of the
mtDNAs of Neanderthals and modern humans to be 450,000–485,000 years. The mtAncestor-1 and mtAncestor-2 sequences
were incorporated into the nuclear genome approximately 620,000 years and 2,885,000 years ago, respectively.
Conclusions: This study provides the first insight into the evolution of the mitochondrial DNA in hominins ancestral to
Neanderthals and humans. We hypothesize that mtAncestor-1 and mtAncestor-2 are likely to be molecular fossils of the
mtDNAs of Homo heidelbergensis and a stem Homo lineage. The dN/dS dynamics suggests that the effective population size
of extinct hominins was low. However, the hominin lineage ancestral to humans, Neanderthals and H. heidelbergensis, had a
larger effective population size and possessed genetic diversity comparable with those of chimpanzee and gorilla.

Vera Lede, 03.02.2012, Wintersemester 2011/2012

Evolutionary Origins of Transcription Factor Binding Site Clusters
Abstract
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple
binding sites for the same transcription factor. Such “homotypic site clustering” has been hypothesized as arising out of
functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite
enhancers are common because they are favored by evolutionary sampling of the genotype–phenotype landscape. To test
this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer
evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as
weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler
genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more
readily reached by the simulated evolutionary process. We show that there are more ways to “build” a fit genotype with
many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims
are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and
their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes
likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape,
that is, an “evolutionary signature” in enhancer sequences. Finally, we investigated potential effects of other factors, such as
rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic
site clustering.
Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. This
work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be
evaluated, and cautions against “evolutionary mirages” present in common features of genomic sequence. The quantitative
framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their
composition and evolution.

Christian Sonnendecker, 30.01.2012, Wintersemester 2011/2012

Computational approaches toward the design of pools for the in vitro selection of complex aptamers
It is well known that using random RNA/DNA sequences for SELEX experiments will generally yield low-complexity structures. Early experimental results suggest that having a structurally diverse library, which, for instance, includes high-order junctions, may prove useful in finding new functional motifs. Here, we develop two computational methods to generate sequences that exhibit higher structural complexity and can be used to increase the overall structural diversity of initial pools for in vitro selection experiments. Random Filtering selectively increases the number of five-way junctions in RNA/DNA pools, and Genetic Filtering designs RNA/DNA pools to a specified structure distribution, whether uniform or otherwise. We show that using our computationally designed DNA pool greatly improves access to highly complex sequence structures for SELEX experiments (without losing our ability to select for common one-way and two-way junction sequences).

Luo X, McKeague M, Pitre S, Dumontier M, Green J, Golshani A, Derosa MC, Dehne F. Computational approaches toward the design of pools for the in vitro selection of complex aptamers. RNA. 2010 Nov;16(11):2252-62. Epub 2010 Sep 24. PMID: 20870801

Caroline Wilde, 30.01.2012, Wintersemester 2011/2012

Defining an epigenetic code
The nucleosome surface is decorated with an array of enzyme-catalysed modifications on histone tails. These modifications have well-defined roles in a variety of ongoing chromatin functions, often by acting as receptors for non-histone proteins, but their longer-term effects are less clear. Here, an attempt is made to define how histone modifications operate as part of a predictive and heritable epigenetic code that specifies patterns of gene expression through differentiation and development.

Falko Altenkirch, 03.02.2012, Wintersemester 2011/2012

De novo assembly of human genomes with massively parallel short read sequencing
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
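
The N50 values quoted above summarise how contiguous an assembly is. A minimal sketch of the usual definition follows (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases); the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths in kb
    print(n50(contigs))
```

For the example lengths the total is 290, and the two longest contigs (80 + 70 = 150) already cover more than half of it, so the N50 is 70.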

Sabina Kanton, 03.02.2012, Wintersemester 2011/2012

An enhanced RNA alignment benchmark for sequence alignment programs
Background

The performance of alignment programs is traditionally tested on sets of protein sequences, of which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone – the similarity range where alignment quality drops drastically – starts at 60 percent for RNAs in comparison to 20 percent for proteins. In this study we enhance the previous benchmark.

Results

The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid unintended impact by using only a few families. The size of sets varies from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance order of parameters – like nucleotide substitution matrices and gap-costs – as well as of programs is rated by rank tests.

Conclusion

Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is with an average pairwise sequence identity (APSI) above 75 percent. Parameters for gap-open and gap-extension have a large influence on alignment quality lower than APSI 75 percent; optimal parameter combinations are shown for several programs. The use of different 4 × 4 substitution matrices improved program performance only in some cases. The performance of iterative programs drastically increases with increasing sequence numbers and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce alignments of high quality down to APSI higher than 55 percent; at lower APSI the use of sequence+structure alignment programs is recommended.
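
As an illustration of the APSI measure used above, the following sketch computes it from an existing multiple alignment. The convention for gaps is an assumption made here (columns where both sequences have a gap are skipped, all other columns count); the benchmark itself may define identity differently, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical positions between two aligned sequences.

    Assumption: columns in which both sequences carry a gap are ignored;
    a residue aligned to a gap counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def apsi(alignment):
    """Average pairwise sequence identity of an alignment, in percent."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["ACGU", "ACGU", "ACGA"]  # made-up aligned RNA sequences
    print(round(apsi(toy_alignment), 2))
```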

Toni Förster, 30.01.2012, Wintersemester 2011/2012

Metabolic flux analysis
One of the ultimate goals of systems biology
research is to obtain a comprehensive understanding of the
control mechanisms of complex cellular metabolisms. Metabolic
Flux Analysis (MFA) is an important method for the
quantitative estimation of intracellular metabolic flows through
metabolic pathways and the elucidation of cellular physiology.
The primary challenge in the use of MFA is that many biological
networks are underdetermined systems; it is therefore difficult
to narrow down the solution space from the stoichiometric
constraints alone. In this tutorial, we present an overview of Flux
Balance Analysis (FBA) and 13C-Metabolic Flux Analysis (13C-MFA),
both of which are frequently used to solve such underdetermined
systems, and we demonstrate FBA and 13C-MFA using the genome-scale model and the central carbon metabolism model, respectively. Furthermore, because such comprehensive study of intracellular fluxes is inherently complex, we subsequently introduce various pathway mapping and visualization tools to facilitate understanding of these data in the context of the pathways.

Yoshihiro Toya, Nobuaki Kono, Kazuharu Arakawa and Masaru Tomita (2011) Metabolic Flux Analysis and Visualization. Journal of proteome research 10: 3313-3323

Fabian Externbrink, 03.02.2012, Wintersemester 2011/2012

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

Ying-Chi Lin, 03.02.2012, Wintersemester 2011/2012

Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity
Identification of transcription factor binding sites is necessary for deciphering gene regulatory networks.
Several new methods provide extensive data about the specificity of transcription factors but most methods for analyzing these data to obtain specificity models are limited in scope by, for example, assuming additive interactions or are inefficient in their exploration of more complex models. This article describes an approach—encoding of DNA sequences as the vertices of a regular simplex—that allows simultaneous direct comparison of simple and complex models, with higher-order parameters fit to the residuals of lower-order models. In addition to providing an efficient assessment of all model parameters, this approach can yield valuable insight into the mechanism of binding by highlighting features that are critical to accurate models.
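
The simplex encoding can be made concrete with a small sketch. The coordinates in `VERTEX` are one possible choice of tetrahedron vertices and are an assumption of this example, not taken from the text above; the function names are likewise hypothetical. Each base becomes a point in three dimensions, so a sequence of length L becomes a vector with 3L entries.

```python
# Assumed coordinates: the four bases sit on the vertices of a regular
# tetrahedron centred on the origin, so no base is closer to one
# alternative than to another.
VERTEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def encode(sequence):
    """Concatenate the simplex coordinates of every base in the sequence."""
    encoded = []
    for base in sequence.upper():
        encoded.extend(VERTEX[base])
    return encoded


def squared_distance(base_a, base_b):
    """Squared Euclidean distance between the encodings of two bases."""
    return sum((p - q) ** 2 for p, q in zip(VERTEX[base_a], VERTEX[base_b]))


if __name__ == "__main__":
    print(encode("ACGT"))
    print({pair: squared_distance(*pair) for pair in ("AC", "AG", "AT", "CG", "CT", "GT")})
```

With this encoding three numbers per position suffice for four bases, and an additive (mononucleotide) model is a linear function of the encoded vector. Higher-order terms would be built from products of coordinates at different positions, which this sketch does not attempt.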

Gary D. Stormo (2011) Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity. Genetics 187(4): 1219-1224.

Jan Engelhardt, 04.07.2011, Sommersemester 2011

The Role of RNA Sequence and Structure in RNA-Protein Interactions.
We investigate the sequence and structural properties of RNA-protein interaction sites in 211 RNA-protein chain pairs, the largest set of RNA-protein complexes analyzed to date. Statistical analysis confirms and extends earlier analyses made on smaller data sets. There are 24.6% of hydrogen bonds between RNA and protein that are nucleobase specific, indicating the importance of both nucleobase-specific and -nonspecific interactions. While there is no significant difference between RNA base frequencies in protein-binding and non-binding regions, distinct preferences for RNA bases, RNA structural states, protein residues, and protein secondary structure emerge when nucleobase-specific and -nonspecific interactions are considered separately. Guanine nucleobase and unpaired RNA structural states are significantly preferred in nucleobase-specific interactions; however, nonspecific interactions disfavor guanine, while still favoring unpaired RNA structural states. The opposite preferences of nucleobase-specific and -nonspecific interactions for guanine may explain discrepancies between earlier studies with regard to base preferences in RNA-protein interaction regions. Preferences for amino acid residues differ significantly between nucleobase-specific and -nonspecific interactions, with nonspecific interactions showing the expected bias towards positively charged residues. Irregular protein structures are strongly favored in interactions with the protein backbone, whereas there is little preference for specific protein secondary structure in either nucleobase-specific interaction or -nonspecific interaction. Overall, this study shows strong preferences for both RNA bases and RNA structural states in protein-RNA interactions, indicating their mutual importance in protein recognition.

PMID: 21514302

Christoph Kaempf, 08.07.2011, Sommersemester 2011

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.
MOTIVATION:

Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.
RESULTS:

Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozens of eukaryotic genomes.

Markus Mueller, 08.07.2011, Sommersemester 2011

Sequence assembly
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational
assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome
projects as well as for the evolving high-throughput technologies and plays an important role in processing the information
generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly
programs. We describe the basic principles of computational assembly along with the main concerns, such as
repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences.

Stefan Schaffer, 08.07.2011, Sommersemester 2011

Colonization Process of the Brazilian Common Vesper Mouse, Calomys expulsus (Cricetidae, Sigmodontinae): A Biogeographic
Riverine barriers have been associated with genetic diversification and speciation of several taxa. The Rio São Francisco is one
of the largest rivers in South America, representing the third largest river basin in Brazil and operating as a geographic
barrier to gene flow of different taxa. To evaluate the influence of the Rio São Francisco in the speciation of small rodents,
we investigated the genetic structure of Calomys expulsus with phylogenetic and network analyses of cytochrome b DNA. Our
results suggested that C. expulsus can be divided into 3 subpopulations, 2 on the left and another one on the right bank of
this river. The time of divergence of these subpopulations, using a Bayesian framework, suggested colonization from the
south to the north/northeast. Spatial analysis using a clustering method and the Monmonier’s algorithm suggested that the
Rio São Francisco is a biogeographic barrier to gene flow and indicated that this river may play a role in the incipient
speciation process of these subpopulations.

Annegret Grimm, 04.07.2011, Sommersemester 2011

Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning
Abstract. Although habitat networks show promise for conservation planning at regional scales, their spatiotemporal dynamics have not been well studied, especially in climate-sensitive landscapes. Here I use satellite remote sensing to compile wetland habitat networks from the Prairie Pothole Region (PPR) of North America. An ensemble of networks assembled across a hydrologic gradient from deluge to drought and a range of representative dispersal distances exhibits power-law scaling of important topological parameters. Prairie wetland networks are “meso-worlds” with mean topological distance increasing faster with network size than small-world networks, but slower than a regular lattice (or “large world”). This scaling implies rapid dispersal through wetland networks without some of the risks associated with “small worlds” (e.g., extremely rapid propagation of disease or disturbance). Retrospective analysis of wetland networks establishes a climatic envelope for landscape connectivity in the PPR, where I show that a changing climate might severely impact metapopulation viability and restrict long-distance dispersal and range shifts. More generally, this study demonstrates an efficient approach to conservation planning at a level of abstraction addressing key drivers of the global biodiversity crisis: habitat fragmentation and climatic change.

Christopher K. Wright (2010): Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning. Ecology 91(7): 1924–1930.
Dean L. Urban, Emily S. Minor, Eric A. Treml and Robert S. Schick (2009): Graph models of habitat mosaics. Ecology Letters 12: 260–273.
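
Power-law scaling of a topological parameter y with network size n, y ≈ a·n^β, shows up as a straight line on log-log axes, so the exponent β can be estimated by ordinary least squares on the logarithms. The following is only a minimal sketch with synthetic numbers; the function name `fit_power_law` is hypothetical, and this is not the analysis used in Wright (2010).

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of log(values) = log(a) + beta * log(sizes)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                        # slope on log-log axes = exponent
    a = math.exp(mean_y - beta * mean_x)    # intercept gives the prefactor
    return a, beta

# Synthetic data generated from y = 2 * n**0.3 (illustrative only, not real wetland data).
sizes = [10, 100, 1000, 10000]
values = [2 * n ** 0.3 for n in sizes]
a, beta = fit_power_law(sizes, values)
print(round(a, 3), round(beta, 3))
```

With real network data the points scatter around the line, so the fitted exponent should be reported together with some measure of fit quality.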

Belinda Kahnt, 04.07.2011, Sommersemester 2011

The social network structure of a wild meerkat population: 2. intragroup interactions
- study of network structure of three interaction forms: grooming, dominance interactions, foraging competition in 8 meerkat pop.
- investigation of:
  A) variation of network structure between groups
  B) relationship between networks for different interaction forms
  C) influence of group attributes (size, sex ratio), individual attributes (tenure of dominants) and ecological factors (ectoparasite load) on network structure
- results: measures of network structure vary between groups and between interaction forms within a group
- ecological factors, group and individual attributes change network structure



Alice De Mauro, 08.07.2011, Sommersemester 2011

Extending pathways and processes using molecular interaction networks to analyse cancer genome data
Background

Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the signalling canonical proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways.
Results

We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyse their enrichment in pancreatic mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes.
Conclusions

The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.

http://www.biomedcentral.com/1471-2105/11/597

Michael Siebauer, 17.07.2009, Sommersemester 2009

Protein folding simulations: a review

Protein folding simulations: from coarse-grained model to all-atom model.
PMID: 19472192

Tobias Mede, 17.07.2009, Sommersemester 2009

Domain-oriented edge-based alignment of protein interaction networks
ABSTRACT
Motivation: Recent advances in high-throughput experimental
techniques have yielded a large amount of data on protein–protein
interactions (PPIs). Since these interactions can be organized into
networks, and since separate PPI networks can be constructed for
different species, a natural research direction is the comparative
analysis of such networks across species in order to detect
conserved functional modules. This is the task of network alignment.
Results: Most conventional network alignment algorithms adopt a
node-then-edge-alignment paradigm: they first identify homologous
proteins across networks and then consider interactions among
them to construct network alignments. In this study, we propose
an alternative direct-edge-alignment paradigm. Specifically, instead
of explicit identification of homologous proteins, we directly infer
plausibly alignable PPIs across species by comparing conservation
of their constituent domain interactions. We apply our approach to
detect conserved protein complexes in yeast–fly and yeast–worm
PPI networks, and show that our approach outperforms two recent
approaches in most alignment performance metrics.

Arli Parikesit, 13.07.2009, Sommersemester 2009

Functional protein divergence in the evolution of Homo sapiens
Background: Protein-coding regions in a genome evolve by sequence divergence and gene gain and loss, altering the gene content of the organism. However, it is not well understood how this has given rise to the enormous diversity of metazoa present today.
Results: To obtain a global view of human genomic evolution, we quantify the divergence of proteins by functional category at different evolutionary distances from human.
Conclusion: This analysis highlights some general systems-level characteristics of human evolution: regulatory processes, such as signal transducers, transcription factors and receptors, have a high degree of plasticity, while core processes, such as metabolism, transport and protein synthesis, are largely conserved. Additionally, this study reveals a dynamic picture of selective forces at short, medium and long evolutionary timescales. Certain functional categories, such as [...]

Thomas Efer, Sommersemester 2009

Nature article: "A simple rule for the evolution of cooperation on graphs"
"A fundamental aspect of all biological systems is cooperation. Cooperative interactions are required for many levels of biological organization ranging from single cells to groups of animals. Human society is based to a large extent on mechanisms that promote cooperation. It is well known that in unstructured populations, natural selection favors defectors over cooperators. There is much current interest, however, for studying evolutionary games in structured populations and on graphs. These efforts recognize the fact that who-meets-whom is not random, but determined by spatial relationships or social networks. Here we describe a surprisingly simple rule, which is a good approximation for all graphs that we have analyzed, including cycles, spatial lattices, random regular graphs, random graphs and scale-free networks: natural selection favors cooperation, if the benefit of the altruistic act, b, divided by the cost, c, exceeds the average number of neighbors, k. Therefore, cooperation can evolve as a consequence of [...]

[OHLE06] Hisashi Ohtsuki, Christoph Hauert, Erez Lieberman and Martin A. Nowak: A simple rule for the evolution of cooperation on graphs. Nature. 2006 May 25; 441(7092): 502–505. doi: 10.1038/nature04605
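
The rule quoted in the abstract (cooperation is favored if b/c exceeds k) can be written as a one-line check. The following is a minimal sketch, not code from the paper; the function name `cooperation_favored` and the example values are illustrative assumptions.

```python
def cooperation_favored(benefit: float, cost: float, avg_neighbors: float) -> bool:
    """Return True if b/c > k, the condition stated in the abstract."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost > avg_neighbors

# Illustrative values only (not taken from the paper).
print(cooperation_favored(benefit=5.0, cost=1.0, avg_neighbors=4))  # b/c = 5 exceeds k = 4
print(cooperation_favored(benefit=3.0, cost=1.0, avg_neighbors=4))  # b/c = 3 does not exceed k = 4
```

Read this way, the rule suggests that sparsely connected populations (small k) need a smaller benefit-to-cost ratio for cooperation to be favored than densely connected ones.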

Daniel Himmelbach, 17.07.2009, Sommersemester 2009

A practical method for exact computation of subtree prune and regraft distance
Motivation: Subtree prune and regraft (SPR) is one kind of tree rearrangements that has seen applications in solving several computational biology problems. The minimum number of rooted SPR (rSPR) operations needed to transform one rooted binary tree to another is called the rSPR distance between the two trees.
Computing the rSPR distance has been actively studied in recent years. Currently, there is a lack of practical software tools for computing the rSPR distance for relatively large trees with large rSPR distance.
Results: In this article, we present a simple and practical method that computes the exact rSPR distance with integer linear programming.
By applying this new method to several simulated and real biological datasets, we show that our new method outperforms existing software tools in terms of accuracy and efficiency. Our experimental results indicate that our method can compute the exact rSPR distance for many large trees with large rSPR distance.

Baroni,M. et al. (2005) Bounding the number of hybridisation events for a consistent evolutionary history. J. Math. Biol., 51, 171-182.
Bordewich,M. and Semple,C. (2004) On the computational complexity of the rooted subtree prune and regraft distance. Ann. Combinatorics, 8, 409-423.
Hein,J. et al. (1996) On the complexity of comparing evolutionary trees. Discrete Appl. Math., 71, 153-169.
Rodrigues,E.M. et al. (2001) Some approximation res [...]

Franziska Kutzera, Sommersemester 2009

Evolutionary construction of Multiple Graph Alignments for the Structural Analysis of Biomolecules
The concept of multiple graph alignment has recently been
introduced as a novel method for the structural analysis of
biomolecules. Using approximate graph matching techniques, this
method enables the robust identification of approximately conserved
patterns in biologically related structures. In particular, multiple graph
alignment enables the characterization of functional protein families
independent of sequence or fold homology. This paper first recalls the
concept of multiple graph alignment and then addresses the problem
of computing optimal alignments from an algorithmic point of view.
In this regard, a method from the field of evolutionary algorithms is
proposed and empirically compared to a hitherto existing heuristic
approach. Empirically, it is shown that the former yields significantly
better results than the latter, albeit at the cost of an increased runtime.

Bartz-Beielstein, T. (2006). Experimental research in evolutionary computation: The new experimentalism. Springer.
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004). The Pfam protein families database. Nucleic Acids Research, 32, 138-141.
Berg, J. and Lässig, M. (2004). Local graph alignment and motif search in biological networks. Proceeding

Thomas Hofmann, Sommersemester 2009

Perrodou, et al: A new protein linear motif benchmark for multiple sequence alignment software
Background:
Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.

Results:
We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments, however little correlation was observed between LM and overall alignment quality for individual alignment test cases.

Conclusion:
We have shown that none of the programs currently available is capable of reliabl

tbd

Michael Siebauer, 02.02.2009, Wintersemester 2008/2009

Modification of the Sankoff algorithm for homology search
A modification of the Sankoff algorithm is presented that enables fast and memory-efficient computation of a semiglobal sequence/structure alignment.

Modifikation des Sankoff Algorithmus zur Homologiesuche (bachelor's thesis)
Variations on RNA Folding and Alignment – Lessons from Benasque

Inferring Noncoding RNA families and classes by means of Genome-Scale Structure-Based Clustering

Alignment of RNA base pairing probability matrices

Prediction of locally stable RNA secondary structures for genome-wide surveys

Secondary Structure Prediction for Aligned RNA Sequences

Christoph Kämpf, Wintersemester 2008/2009

Combining statistical alignment and phylogenetic footprinting to detect regulatory elements
See the paper of the same name.

Jan Engelhardt, 02.02.2009, Wintersemester 2008/2009

Something about smyRNAs and slRNAs
Interaction of smyRNAs and slRNAs

To be supplied later.

Maria Herberg, 02.02.2009, Wintersemester 2008/2009

Modelling Protein Interaction Networks - Age-Dependent Evolution of the Yeast Protein Interaction
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties—such as scale-free topology, hierarchical modularity, and dissortativity—have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution. Kim WK, Marcotte EM (2008)

Kim WK, Marcotte EM (2008) Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence. PLoS Comput Biol 4(11);
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5;
Kim WK, Henschel A, Winter C, Schroeder M (2006) The many faces of protein–protein interactions: A compendium of interface geometry. PLoS Comput Biol 2(9);
Newman ME, Girvan M (2004) Finding an

Daniel Exner, 11.07.2008, Sommersemester 2008

Intelligent RNA compression: a metric for secondary structure complexity?
Treating combined RNA sequence and structure information as plain text and compressing it with standard methods is shown to be inferior to a specialized approach.
In addition, the algorithm provides implicit information about the complexity of the secondary structure.

doi:10.1186/1471-2105-9-176
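
The "plain text plus standard compression" baseline from this entry could look roughly like the sketch below. It uses Python's standard `zlib` module; the example sequence, its dot-bracket structure and the helper name `compressed_size` are made up for illustration, and the specialized method from the paper is not reproduced here.

```python
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed ASCII encoding of text."""
    return len(zlib.compress(text.encode("ascii"), 9))

# Made-up example: a repeated hairpin sequence and its dot-bracket structure.
sequence = "GGGAAACCC" * 20
structure = "(((...)))" * 20

# Baseline: treat sequence and structure together as one plain-text record.
record = sequence + "\n" + structure
print(len(record), compressed_size(record))
```

The compressed size of such a record gives a crude, general-purpose complexity estimate against which a structure-aware compressor can be compared.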

Mandy Fuchs, 11.07.2008, Sommersemester 2008

Computational prediction of host-pathogen protein-protein interactions
Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins.
The authors present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins.

Computational prediction of host-pathogen protein-protein interactions, Matthew D. Dyer, T. M. Murali and Bruno W. Sobral, Bioinformatics 2007

Sebastian Bartschat, 01.02.2008, Wintersemester 2007/2008

Comparative structure prediction of RNA molecules using a non-Sankoff approach
The talk is essentially about the tool RNAspa and the algorithm behind it.
That algorithm is then compared with the algorithm behind RNAcast.

RNAspa:
http://www.biomedcentral.com/1471-2105/8/366

RNAcast:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/17/3516

Mandy Fuchs, 09.07.2007, Sommersemester 2007

Structural Alignment of Two RNA Sequences with Lagrangian Relaxation
RNA is generally a single-stranded molecule where the bases form hydrogen bonds within the same molecule leading to structure formation. In comparing different homologous RNA molecules it is usually not sufficient to consider only the primary sequence, but it is important to consider both the sequence and the structure of the molecules. Traditional alignment algorithms can only account for the sequence of bases, but not for the base pairings. Considering the structure leads to significant computational problems because of the dependencies introduced by the base pairings and the presence of pseudoknots. In this paper we address the problem of optimally aligning two given RNA sequences either with or without known structure (allowing for pseudoknots). We phrase the problem as an integer linear program and then solve it using Lagrangian relaxation. In our computational experiments we could align large problem instances—18S and 23S ribosomal RNA with up to 1500 bases within minutes while preserving pseudoknots.

Jakob Muehmel, Sommersemester 2007

Multiple alignment by sequence annealing

Andrej Aderhold, Sommersemester 2007

Inference of miRNA targets using evolutionary conservation and pathway analysis
BACKGROUND: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. RESULTS: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3

Sebastian Bartschat, Sommersemester 2007

Discovering structural motifs using a structural alphabet: Application to magnesium binding sites

http://www.biomedcentral.com/1471-2105/8/106

Lydia Steiner, Sommersemester 2007

Ontology development for biological systems: immunology

http://bioinformatics.oxfordjournals.org/cgi/reprint/23/7/913?maxtoshow=&HITS=80&hits=80&RESULTFORMAT=1&title=gene%20ontology%20owl&andorexacttitle=or&titleabstract=gene%20ontology%20owl&andorexacttitleabs=or&fulltext=gene%20ontology%20owl&andorexactfulltext=or&searchid=1&FIRSTINDEX=0&sortspec=date&resourcetype=HWCIT

Marcus Lechner, Sommersemester 2007

Understanding and using the meaning of statements in a bio-ontology

http://www.biomedcentral.com/content/pdf/1471-2105-8-57.pdf

Christian Arnold, Sommersemester 2007

BranchClust: a phylogenetic algorithm for selecting gene families

Background:
Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The first are easy to automate but usually they do not distinguish between paralogs and orthologs or have restriction on the number of taxa. Phylogenetic methods often are based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in case of prokaryotes, is that the species tree is often unknown, and that from the analyses of single gene families the branching order between related organisms frequently is unresolved.

Results:
Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at http://bioinformatics.org/branchclust.

Conclusion:
BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.

http://www.biomedcentral.com/1471-2105/8/120
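
The Reciprocal Best BLAST hit baseline mentioned in the conclusion pairs two genes when each is the other's top-scoring hit. The sketch below illustrates that baseline only, with made-up gene names and scores standing in for BLAST output; the function names are hypothetical and this is not the BranchClust algorithm itself, which works on phylogenetic trees.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score.
    On ties, the first pair encountered wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical scores for two genomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 150, ("b2", "a3"): 118, ("b2", "a2"): 91}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, b2 is the best hit of both a2 and a3, but b2's own best hit is a1, so neither pair is reported. Cases like this, where paralogs compete for the same partner, are where a pairwise best-hit criterion discards candidates that a tree-based method may still be able to place.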

Christoph Theunert, Sommersemester 2007

miRNA - targetScan

Christian Otto, Sommersemester 2007

Comparing sequences without using alignments: application to HIV/SIV subtyping

http://www.biomedcentral.com/1471-2105/8/1/abstract

Julian Joeris, Sommersemester 2007

ILP - Integer Linear Programming

Michael Siebauer, Sommersemester 2007

Identifying bacterial genes and endosymbiont DNA with Glimmer
The authors have developed a module (or rather, an approach) for filtering bacterial genes out of the host organism's genome. Apparently there are intracellular bacteria whose genome gets mixed with the host genome during sequencing. The open-source program "Glimmer" can separate the two genomes again by means of trained hidden Markov models.

Bioinformatics 2007 23(6):673-679
doi:10.1093/bioinformatics/btm009
http://bioinformatics.oxfordjournals.org/cgi/content/full/23/6/673