Proseminars in our group

If you plan to give a talk as a proseminar, please register beforehand (registration is non-binding): Registration-Form Proseminars
Length of talk: 30 min + discussion afterwards
Registered proseminars
Lisa Falkowski, 03.02.2012, Winter semester 2011/2012
Histone exchange and histone modifications during transcription and aging.
The organization of the eukaryotic genome into chromatin enables DNA to fit inside the nucleus while also regulating the access of proteins to the DNA to facilitate genomic functions such as transcription, replication and repair. The basic repeating unit of chromatin is the nucleosome, which includes 147 bp of DNA wrapped 1.65 times around an octamer of core histone proteins comprising two molecules each of H2A, H2B, H3 and H4 [1]. Each nucleosome is a highly stable unit, maintained by over 120 direct protein-DNA interactions and several hundred water-mediated ones [1]. Accordingly, there is considerable interest in understanding how processive enzymes such as RNA polymerases manage to traverse the coding regions of our genes, which are tightly packaged into arrays of nucleosomes. Here we present the current mechanistic understanding of this process and the evidence for profound changes in chromatin dynamics during aging. This article is part of a Special Issue entitled: Histone chaperones and Chromatin assembly.
Stefanie Heidenreich, 03.02.2012, Winter semester 2011/2012
Histone methylation makes its mark on longevity
How long organisms live is not entirely written in their genes. Recent findings reveal that epigenetic factors that regulate histone methylation, a type of chromatin modification, can affect lifespan. The reversible nature of chromatin modifications suggests that therapeutic targeting of chromatin regulators could be used to extend lifespan and healthspan. This review describes the epigenetic regulation of lifespan in diverse model organisms, focusing on the role and mode of action of chromatin regulators that affect two epigenetic marks, trimethylated lysine 4 of histone H3 (H3K4me3) and trimethylated lysine 27 of histone H3 (H3K27me3), in longevity.
Greer, E.L. et al. (2010) Members of the H3K4 trimethylation complex regulate lifespan in a germline-dependent manner in C. elegans. Nature 466, 383–387
Kenyon, C.J. (2010) The genetics of ageing. Nature 464, 504–512
Linda Arnold, 03.02.2012, Winter semester 2011/2012
On the Connection between RNAi and Heterochromatin at Centromeres
RNA interference (RNAi) is a conserved silencing mechanism whereby double-stranded RNA induces specific down-regulation of homologous sequences. In the fission yeast Schizosaccharomyces pombe, centromeric heterochromatin assembly is an RNAi-dependent process. Noncoding RNAs transcribed from pericentromeric repeat sequences are processed into short interfering RNAs (siRNAs) that direct the Argonaute-containing RNA-induced transcriptional silencing (RITS) effector complex to homologous nascent transcripts. RITS is required for H3K9 methylation by the histone methyltransferase (HMT) Clr4; conversely, H3K9 methylation can attract RITS to chromatin via binding of the chromodomain protein Chp1. This codependency has hampered dissection of the order of events and of the mechanisms of crosstalk between the RNAi and chromatin modification machineries. To tackle this problem, we have developed systems that reconstitute heterochromatin at a euchromatic locus, using either hairpin triggers or DNA-tethered chromatin-modifying complexes. These systems reveal that RNAi is sufficient to promote heterochromatin assembly in cis and that direct recruitment of the HMT Clr4 can bypass the role of RNAi in heterochromatin assembly. We have also characterized a new pathway component, Stc1, that translates the RNAi signal into chromatin marks. We discuss the implications of these findings for our understanding of the mechanism and function of RNAi-directed heterochromatin assembly at centromeres.
Henrike Indrischek, 30.01.2012, Winter semester 2011/2012
Genomic characterization reveals a simple histone H4 acetylation code
The histone code hypothesis holds that covalent posttranslational modifications of histone tails are interpreted by the cell to yield a rich combinatorial transcriptional output. This hypothesis has been the subject of active debate in the literature. Here, we investigated the combinatorial complexity of the acetylation code at the four lysine residues of the histone H4 tail in budding yeast. We constructed yeast strains carrying all 15 possible combinations of mutations among lysines 5, 8, 12, and 16 to arginine in the histone H4 tail, mimicking positively charged, unacetylated lysine states, and characterized the resulting genome-wide changes in gene expression by using DNA microarrays. Only the lysine 16 mutation had specific transcriptional consequences independent of the mutational state of the other lysines (affecting approximately 100 genes). In contrast, for lysines 5, 8, and 12, expression changes were due to nonspecific, cumulative effects seen as increased transcription correlating with an increase in the total number of mutations (affecting approximately 1,200 genes). Thus, acetylation of histone H4 is interpreted by two mechanisms: a specific mechanism for lysine 16 and a nonspecific, cumulative mechanism for lysines 5, 8, and 12.
Steven Henikoff: Histone modifications: Combinatorial complexity or cumulative simplicity?
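The count of 15 strains follows from the four tail lysines: each of K5, K8, K12 and K16 is either mutated to arginine or not, giving 2^4 = 16 states, minus the unmutated wild type. A minimal sketch that enumerates these genotypes; the label format is a hypothetical naming scheme, not necessarily the one used in the study.

```python
from itertools import combinations

# Lysine positions in the histone H4 tail studied in the abstract.
H4_TAIL_LYSINES = (5, 8, 12, 16)


def mutant_combinations(lysines=H4_TAIL_LYSINES):
    """Return every non-empty subset of lysines mutated to arginine (K->R)."""
    combos = []
    for size in range(1, len(lysines) + 1):
        combos.extend(combinations(lysines, size))
    return combos


def genotype_label(combo):
    """Hypothetical label, e.g. (5, 16) -> 'K5R K16R'."""
    return " ".join(f"K{k}R" for k in combo)


for combo in mutant_combinations():
    print(genotype_label(combo))
```

Half of the 16 states (8) carry the K16R mutation, which is what allows the lysine 16 effect to be tested independently of the mutational state of the other three lysines.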
Juliane Meißner, 03.02.2012, Winter semester 2011/2012
Genome Digging: Insight into the Mitochondrial Genome of Homo
Abstract
Background: A fraction of the Neanderthal mitochondrial genome sequence is similar to a 5,839-bp nuclear DNA sequence of mitochondrial origin (numt) on human chromosome 1. This similarity has never been interpreted. Although this phenomenon may be attributed to contamination and mosaic assembly of Neanderthal mtDNA from short sequencing reads, we explain the mysterious similarity by integration of this numt (mtAncestor-1) into the nuclear genome of the common ancestor of Neanderthals and modern humans not long before their reproductive split.
Principal Findings: Exploiting bioinformatics, we uncovered an additional numt (mtAncestor-2) with high similarity to the Neanderthal mtDNA and indicated that both numts represent almost identical replicas of the mtDNA sequences ancestral to the mitochondrial genomes of Neanderthals and modern humans. In the proteins encoded by mtDNA, the majority of amino acids distinguishing chimpanzees from humans and Neanderthals were acquired by the ancestral hominins. The overall rate of nonsynonymous evolution in Neanderthal mitochondrial protein-coding genes is not higher than in other lineages. The model incorporating the ancestral hominin mtDNA sequences estimates the average divergence age of the mtDNAs of Neanderthals and modern humans to be 450,000–485,000 years. The mtAncestor-1 and mtAncestor-2 sequences were incorporated into the nuclear genome approximately 620,000 years and 2,885,000 years ago, respectively.
Conclusions: This study provides the first insight into the evolution of the mitochondrial DNA in hominins ancestral to Neanderthals and humans. We hypothesize that mtAncestor-1 and mtAncestor-2 are likely to be molecular fossils of the mtDNAs of Homo heidelbergensis and a stem Homo lineage. The dN/dS dynamics suggest that the effective population size of extinct hominins was low. However, the hominin lineage ancestral to humans, Neanderthals and H. heidelbergensis had a larger effective population size and possessed genetic diversity comparable to that of chimpanzees and gorillas.
Vera Lede, 03.02.2012, Winter semester 2011/2012
Evolutionary Origins of Transcription Factor Binding Site Clusters
Abstract
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such “homotypic site clustering” has been hypothesized to arise from functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon: multisite enhancers are common because they are favored by evolutionary sampling of the genotype–phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to “build” a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an “evolutionary signature” in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering.
Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against “evolutionary mirages” present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.
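The claim that many weak sites can do the job of a few strong ones can be made concrete with the simplest thermodynamic picture of an enhancer. The sketch below assumes independent, non-interacting sites, each with statistical weight [TF]·exp(-E/kT) and occupancy w/(1+w); the energies and concentrations are hypothetical inputs. This is a deliberately simplified stand-in, not the paper's actual model, which is richer.

```python
import math


def site_weight(tf_conc, energy, kt=1.0):
    """Statistical weight of one site: [TF] * exp(-E / kT).

    `energy` is a hypothetical binding energy relative to the consensus site
    (0 = strongest site); weaker sites have larger energies.
    """
    return tf_conc * math.exp(-energy / kt)


def site_occupancy(weight):
    """Probability that a single independent site is bound by the TF."""
    return weight / (1.0 + weight)


def expected_bound(weights):
    """Expected number of bound TFs on an enhancer of independent sites."""
    return sum(site_occupancy(w) for w in weights)


strong = [site_weight(4.0, 0.0)]              # one strong site, weight 4
weak = [site_weight(1.0, math.log(4))] * 4    # four weak sites, weight 0.25 each
print(expected_bound(strong), expected_bound(weak))
```

Both genotypes give an expected occupancy of about 0.8, so under this toy model they would be equally fit if fitness depended only on occupancy; there are simply many more sequences that realize the four-weak-site configuration.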
Christian Sonnendecker, 30.01.2012, Winter semester 2011/2012
Computational approaches toward the design of pools for the in vitro selection of complex aptamers
It is well known that using random RNA/DNA sequences for SELEX experiments will generally yield low-complexity structures. Early experimental results suggest that having a structurally diverse library, which, for instance, includes high-order junctions, may prove useful in finding new functional motifs. Here, we develop two computational methods to generate sequences that exhibit higher structural complexity and can be used to increase the overall structural diversity of initial pools for in vitro selection experiments. Random Filtering selectively increases the number of five-way junctions in RNA/DNA pools, and Genetic Filtering designs RNA/DNA pools to a specified structure distribution, whether uniform or otherwise. We show that using our computationally designed DNA pool greatly improves access to highly complex sequence structures for SELEX experiments (without losing our ability to select for common one-way and two-way junction sequences).
Computational approaches toward the design of pools for the in vitro selection of complex aptamers.
Luo X, McKeague M, Pitre S, Dumontier M, Green J, Golshani A, Derosa MC, Dehne F.
RNA. 2010 Nov;16(11):2252-62. Epub 2010 Sep 24.
PMID: 20870801
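Random Filtering keeps only those random sequences whose predicted structure contains a five-way junction. The sketch below shows just the filtering test. It assumes each candidate already comes with a predicted dot-bracket structure, for example from an MFE folding program, which is not shown. It also assumes a junction-counting convention: a loop closed by one base pair that directly encloses k further pairs is a (k+1)-way junction, so hairpins are one-way and, here, stacked pairs count as two-way. This is not the authors' code; `random_filter` is a hypothetical helper.

```python
def loop_degrees(structure):
    """Return the junction degree of every loop in a dot-bracket string.

    A loop closed by base pair (i, j) that directly encloses k pairs gets
    degree k + 1 (hairpin = 1, internal loop/stack = 2, multiloop >= 3).
    """
    stack = []    # entries: [opening position, number of directly enclosed pairs]
    degrees = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append([pos, 0])
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            _, enclosed = stack.pop()
            degrees.append(1 + enclosed)
            if stack:                 # this pair is enclosed by the outer one
                stack[-1][1] += 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return degrees


def has_n_way_junction(structure, n=5):
    return n in loop_degrees(structure)


def random_filter(candidates, n=5):
    """Keep sequences whose (pre-computed) structure has an n-way junction."""
    return [seq for seq, struct in candidates if has_n_way_junction(struct, n)]
```

Because each sequence only needs one cheap structural test, this filter can screen large numbers of random candidates once their folds are available.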
Caroline Wilde, 30.01.2012, Winter semester 2011/2012
Defining an epigenetic code
The nucleosome surface is decorated with an array of enzyme-catalysed modifications on histone tails. These modifications have well-defined roles in a variety of ongoing chromatin functions, often by acting as receptors for non-histone proteins, but their longer-term effects are less clear. Here, an attempt is made to define how histone modifications operate as part of a predictive and heritable epigenetic code that specifies patterns of gene expression through differentiation and development.
Falko Altenkirch, 03.02.2012, Winter semester 2011/2012
De novo assembly of human genomes with massively parallel short read sequencing
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads are very short, which makes de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
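The N50 values quoted above summarize contiguity. Under the usual definition, N50 is the length L such that contigs (or scaffolds) of length L or longer together cover at least half of the total assembled length. A minimal sketch of that definition:

```python
def n50(lengths):
    """Smallest length L such that pieces >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached half of the assembly size
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

The same function applies to contig lengths and to scaffold lengths. Scaffolds join contigs across gaps, which is why the scaffold N50 values are much larger than the contig N50 values.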
Sabina Kanton, 03.02.2012, Winter semester 2011/2012
An enhanced RNA alignment benchmark for sequence alignment programs
Background
The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as the first published RNA alignment benchmark demonstrated. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60 percent for RNAs compared with 20 percent for proteins. In this study we enhance the previous benchmark.
Results
The RNA sequence sets in the benchmark database are taken from a larger number of RNA families to avoid bias from using only a few families. Set sizes range from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance ranking of parameters (such as nucleotide substitution matrices and gap costs) and of programs is assessed with rank tests.
Conclusion
Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is, with an average pairwise sequence identity (APSI) above 75 percent. Below 75 percent APSI, the gap-open and gap-extension parameters have a large influence on alignment quality; optimal parameter combinations are shown for several programs. Using different 4 × 4 substitution matrices improved program performance only in some cases. The performance of iterative programs increases drastically with increasing sequence number and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce high-quality alignments as long as the APSI is above 55 percent; at lower APSI, sequence+structure alignment programs are recommended.
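The 75 and 55 percent thresholds refer to APSI, the mean identity over all sequence pairs of an alignment. Definitions of pairwise identity vary. This sketch assumes one common convention: count identical residues over the columns where at least one of the two sequences has a residue, and skip gap-gap columns. The benchmark's exact definition may differ.

```python
from itertools import combinations


def pairwise_identity(row1, row2, gap="-"):
    """Fraction of identical residues, ignoring columns that are gaps in both rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    columns = identical = 0
    for x, y in zip(row1, row2):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:          # a gap opposite a residue never counts as identical
            identical += 1
    return identical / columns if columns else 0.0


def apsi(alignment):
    """Average pairwise sequence identity over all pairs of aligned rows."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


print(apsi(["ACGU", "ACGA", "AC-U"]))  # mean of 0.75, 0.75 and 0.5
```

For the three rows above the pairwise identities are 0.75, 0.75 and 0.5, giving an APSI of about 0.67. By the benchmark's conclusion, this set would fall in the range where gap parameters matter.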
Toni Förster, 30.01.2012, Winter semester 2011/2012
Metabolic flux analysis
One of the ultimate goals of systems biology research is to obtain a comprehensive understanding of the control mechanisms of complex cellular metabolism. Metabolic Flux Analysis (MFA) is an important method for the quantitative estimation of intracellular metabolic flows through metabolic pathways and for the elucidation of cellular physiology. The primary challenge in using MFA is that many biological networks are underdetermined systems; it is therefore difficult to narrow down the solution space from the stoichiometric constraints alone. In this tutorial, we present an overview of Flux Balance Analysis (FBA) and 13C-Metabolic Flux Analysis (13C-MFA), both of which are frequently used to solve such underdetermined systems, and we demonstrate FBA and 13C-MFA using a genome-scale model and a central carbon metabolism model, respectively. Furthermore, because such a comprehensive study of intracellular fluxes is inherently complex, we subsequently introduce various pathway mapping and visualization tools to facilitate understanding of these data in the context of the pathways.
Yoshihiro Toya, Nobuaki Kono, Kazuharu Arakawa and Masaru Tomita (2011) Metabolic Flux Analysis and Visualization. Journal of Proteome Research 10: 3313-3323
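Why stoichiometry alone underdetermines fluxes can be seen on a small example. At steady state the flux vector v must satisfy S·v = 0, which leaves (number of reactions − rank S) fluxes free. The sketch below uses a hypothetical six-reaction toy network, not a model from the tutorial, and computes that number exactly with rational arithmetic. Resolving the remaining freedom is what FBA (an objective plus bounds, solved as a linear program, not shown) or 13C labeling data are for.

```python
from fractions import Fraction

# Hypothetical toy network: v1: -> A, v2: A -> B, v3: A -> C,
# v4: B -> D, v5: C -> D, v6: D ->
# Rows = internal metabolites A, B, C, D; columns = reactions v1..v6.
S = [
    [1, -1, -1, 0, 0, 0],   # A
    [0, 1, 0, -1, 0, 0],    # B
    [0, 0, 1, 0, -1, 0],    # C
    [0, 0, 0, 1, 1, -1],    # D
]


def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def flux_degrees_of_freedom(stoich):
    """Number of fluxes left undetermined by the steady-state constraint S v = 0."""
    return len(stoich[0]) - matrix_rank(stoich)


def is_steady_state(stoich, fluxes):
    """True if every internal metabolite is balanced (S v = 0)."""
    return all(sum(c * v for c, v in zip(row, fluxes)) == 0 for row in stoich)


print(flux_degrees_of_freedom(S))  # prints 2
```

With the uptake v1 fixed at 10, both (10, 7, 3, 7, 3, 10) and (10, 2, 8, 2, 8, 10) balance every metabolite. The split between the A→B and A→C branches is exactly the kind of quantity that stoichiometry cannot decide on its own.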
Fabian Externbrink, 03.02.2012, Winter semester 2011/2012
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
A multiple sequence alignment program, MAFFT, has been developed. Its CPU time is drastically reduced compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of the volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments, even for sequences having large insertions or extensions as well as for distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced compared with CLUSTALW, with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
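The FFT step finds promising offsets between two sequences by correlating their numeric property profiles at every offset at once. The sketch below computes the correlation c(k) = Σ a[n]·b[n−k] for all offsets with a hand-written radix-2 FFT and ranks offsets by the sum of a volume and a polarity correlation. Assumptions: the residue property values are supplied by the caller as plain numbers (MAFFT's actual normalized tables are not reproduced here), and MAFFT's later steps (segment extraction, dynamic programming) are omitted. `best_offsets` is a hypothetical helper.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.

    The inverse transform is returned unscaled (the caller divides by n).
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlation(a, b):
    """Return {k: sum_n a[n] * b[n - k]} for all offsets k, via FFT convolution.

    Offset k compares residue n of profile a with residue n - k of profile b.
    """
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    fa = fft([complex(x) for x in a] + [0j] * (n - len(a)))
    fb = fft([complex(x) for x in reversed(b)] + [0j] * (n - len(b)))
    conv = fft([x * y for x, y in zip(fa, fb)], inverse=True)
    offsets = range(-(len(b) - 1), len(a))
    return {k: conv[i].real / n for i, k in enumerate(offsets)}


def best_offsets(vol_a, pol_a, vol_b, pol_b, top=3):
    """Rank offsets by the summed volume and polarity correlations."""
    cv = correlation(vol_a, vol_b)
    cp = correlation(pol_a, pol_b)
    total = {k: cv[k] + cp[k] for k in cv}
    return sorted(total, key=total.get, reverse=True)[:top]
```

Computing all offsets this way takes O(N log N) time instead of the O(N²) of a direct sum, which is where the speed-up for detecting homologous regions comes from.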
Ying-Chi Lin, 03.02.2012, Winter semester 2011/2012
Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity
Identification of transcription factor binding sites is necessary for deciphering gene regulatory networks. Several new methods provide extensive data about the specificity of transcription factors, but most methods for analyzing these data to obtain specificity models are limited in scope, for example by assuming additive interactions, or are inefficient in their exploration of more complex models. This article describes an approach, encoding DNA sequences as the vertices of a regular simplex, that allows simultaneous direct comparison of simple and complex models, with higher-order parameters fit to the residuals of lower-order models. In addition to providing an efficient assessment of all model parameters, this approach can yield valuable insight into the mechanism of binding by highlighting features that are critical to accurate models.
Gary D. Stormo (2011) Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity. Genetics 187(4): 1219-1224.
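For four bases the regular simplex is a tetrahedron, so each position needs only three coordinates. This matches the three independent parameters per position of a mononucleotide model, with no redundant fourth column. The sketch below uses the tetrahedron vertices (±1, ±1, ±1) with an even number of minus signs. The base-to-vertex assignment is arbitrary and assumed here, and building pairwise (dinucleotide) features as products of coordinates (9 per position pair) is one natural choice, not necessarily the paper's exact construction. Fitting the models to binding data is not shown.

```python
from itertools import combinations

# Assumed (arbitrary) assignment of bases to regular-tetrahedron vertices.
SIMPLEX = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def encode_mononucleotide(seq):
    """Three coordinates per position: the additive (independent-site) features."""
    features = []
    for base in seq.upper():
        features.extend(SIMPLEX[base])
    return features


def encode_pairwise(seq):
    """Nine products of coordinates for every pair of positions (higher-order terms)."""
    vectors = [SIMPLEX[base] for base in seq.upper()]
    features = []
    for i, j in combinations(range(len(vectors)), 2):
        features.extend(x * y for x in vectors[i] for y in vectors[j])
    return features


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


print(len(encode_mononucleotide("ACGT")), len(encode_pairwise("ACGT")))  # prints 12 54
```

All four vertices are equidistant, with a dot product of -1 between any two and a sum of zero, so no base is privileged by the encoding. Pairwise features can therefore be added on top of the additive ones without reparameterizing.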
Jan Engelhardt, 04.07.2011, Summer semester 2011
The Role of RNA Sequence and Structure in RNA-Protein Interactions.
We investigate the sequence and structural properties of RNA-protein interaction sites in 211 RNA-protein chain pairs, the largest set of RNA-protein complexes analyzed to date. Statistical analysis confirms and extends earlier analyses made on smaller data sets. Of the hydrogen bonds between RNA and protein, 24.6% are nucleobase specific, indicating the importance of both nucleobase-specific and -nonspecific interactions. While there is no significant difference between RNA base frequencies in protein-binding and non-binding regions, distinct preferences for RNA bases, RNA structural states, protein residues, and protein secondary structure emerge when nucleobase-specific and -nonspecific interactions are considered separately. Guanine nucleobases and unpaired RNA structural states are significantly preferred in nucleobase-specific interactions; however, nonspecific interactions disfavor guanine while still favoring unpaired RNA structural states. The opposite preferences of nucleobase-specific and -nonspecific interactions for guanine may explain discrepancies between earlier studies with regard to base preferences in RNA-protein interaction regions. Preferences for amino acid residues differ significantly between nucleobase-specific and -nonspecific interactions, with nonspecific interactions showing the expected bias towards positively charged residues. Irregular protein structures are strongly favored in interactions with the protein backbone, whereas there is little preference for specific protein secondary structure in either nucleobase-specific or -nonspecific interactions. Overall, this study shows strong preferences for both RNA bases and RNA structural states in protein-RNA interactions, indicating their mutual importance in protein recognition.
PMID: 21514302
Christoph Kaempf, 08.07.2011, Summer semester 2011
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.
MOTIVATION:
Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.
RESULTS:
Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes.
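The general idea of greedy, graph-based alignment of gene lists can be sketched as follows. This is not the authors' algorithm: it omits their conflict-resolution heuristics and simply rejects any homology edge that would put two genes of the same list into one column, or make the column order contradict a list's gene order (a cycle in the column graph). Genes are addressed as hypothetical (list_index, position) pairs, and homology edge scores are assumed to be given.

```python
from collections import defaultdict, deque


def _find(parent, x):
    # Union-find lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def _column_order(gene_lists, parent):
    """Column roots in a consistent left-to-right order, or None if inconsistent."""
    members = {}
    for gene in list(parent):
        lists_in_col = members.setdefault(_find(parent, gene), set())
        if gene[0] in lists_in_col:
            return None  # two genes of the same list in one column
        lists_in_col.add(gene[0])
    succ, indeg = defaultdict(set), dict.fromkeys(members, 0)
    for i, genes in enumerate(gene_lists):
        for p in range(len(genes) - 1):
            a, b = _find(parent, (i, p)), _find(parent, (i, p + 1))
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1
    # Kahn's topological sort; leftover nodes indicate a cycle (order conflict)
    queue = deque(r for r, d in indeg.items() if d == 0)
    order = []
    while queue:
        r = queue.popleft()
        order.append(r)
        for s in succ[r]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return order if len(order) == len(indeg) else None


def greedy_align(gene_lists, edges):
    """Accept homology edges (score, gene_u, gene_v) best-first if they stay consistent."""
    parent = {(i, p): (i, p) for i, g in enumerate(gene_lists) for p in range(len(g))}
    for _, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        trial = dict(parent)
        ru, rv = _find(trial, u), _find(trial, v)
        if ru == rv:
            continue
        trial[ru] = rv  # tentatively merge the two columns
        if _column_order(gene_lists, trial) is not None:
            parent = trial
    columns = defaultdict(dict)
    for (i, p) in list(parent):
        columns[_find(parent, (i, p))][i] = gene_lists[i][p]
    return [columns[r] for r in _column_order(gene_lists, parent)]


lists = [["a", "b", "c"], ["a", "c"]]
edges = [(1.0, (0, 0), (1, 0)), (1.0, (0, 2), (1, 1))]
print(greedy_align(lists, edges))  # [{0: 'a', 1: 'a'}, {0: 'b'}, {0: 'c', 1: 'c'}]
```

Each column maps a list index to the gene placed there. Crossing edges caused by a rearrangement (for example lists `["x", "y"]` and `["y", "x"]`) cannot both be accepted; here the lower-scoring one is dropped, whereas the published method resolves such conflicts with dedicated heuristics.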
Markus Mueller, 08.07.2011, Summer semester 2011: Sequence assembly
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational
assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome
projects as well as for the evolving high-throughput technologies, and plays an important role in processing the information
generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly
programs. We describe the basic principles of computational assembly along with the main concerns, such as
repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences.
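The basic overlap principle behind many assemblers can be illustrated with a naive greedy merger: repeatedly join the two fragments with the longest suffix-prefix overlap. This is a toy sketch, not any particular program from the overview; it ignores sequencing errors and reverse complements, and repeats longer than the reads can make it misassemble, which is exactly the concern about repetitive sequences raised above. The reads are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the pair of contigs with the largest overlap until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```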
Stefan Schaffer, 08.07.2011, Summer semester 2011: Colonization Process of the Brazilian Common Vesper Mouse, Calomys expulsus (Cricetidae, Sigmodontinae): A Biogeographic
Riverine barriers have been associated with genetic diversification and speciation of several taxa. The Rio São Francisco is one
of the largest rivers in South America, representing the third largest river basin in Brazil and operating as a geographic
barrier to gene flow of different taxa. To evaluate the influence of the Rio São Francisco on the speciation of small rodents,
we investigated the genetic structure of Calomys expulsus with phylogenetic and network analyses of cytochrome b DNA. Our
results suggested that C. expulsus can be divided into 3 subpopulations, 2 on the left and one on the right bank of
this river. Divergence times of these subpopulations, estimated in a Bayesian framework, suggested colonization from the
south to the north/northeast. Spatial analysis using a clustering method and Monmonier’s algorithm suggested that the
Rio São Francisco is a biogeographic barrier to gene flow and indicated that this river may play a role in the incipient
speciation process of these subpopulations.
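Phylogenetic and network analyses of aligned cytochrome b sequences typically start from pairwise distances. A minimal sketch of the uncorrected p-distance and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) follows; the abstract does not say which substitution model the authors used, so Jukes-Cantor is an illustrative choice only, and the sequences are invented.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences (gapped sites skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p reaches saturation (0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTTC")
print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```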
Annegret Grimm, 04.07.2011, Summer semester 2011: Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning
Abstract. Although habitat networks show promise for conservation planning at regional scales, their spatiotemporal dynamics have not been well studied, especially in climate-sensitive landscapes. Here I use satellite remote sensing to compile wetland habitat networks from the Prairie Pothole Region (PPR) of North America. An ensemble of networks assembled across a hydrologic gradient from deluge to drought and a range of representative dispersal distances exhibits power-law scaling of important topological parameters. Prairie wetland networks are “meso-worlds”, with mean topological distance increasing faster with network size than in small-world networks, but slower than in a regular lattice (or “large world”). This scaling implies rapid dispersal through wetland networks without some of the risks associated with “small worlds” (e.g., extremely rapid propagation of disease or disturbance). Retrospective analysis of wetland networks establishes a climatic envelope for landscape connectivity in the PPR, where I show that a changing climate might severely impact metapopulation viability and restrict long-distance dispersal and range shifts. More generally, this study demonstrates an efficient approach to conservation planning at a level of abstraction addressing key drivers of the global biodiversity crisis: habitat fragmentation and climatic change.
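The network quantities in this abstract can be illustrated with a small sketch: link habitat patches that lie within a dispersal distance, compute the mean topological distance (shortest path length in edges) with breadth-first search, and estimate a power-law exponent by least squares on log-log data. The patch coordinates are hypothetical, mean distance is taken over connected pairs only (the abstract does not state how disconnected pairs were handled), and log-log regression is a simple, not the only, way to fit scaling.

```python
import math
from collections import deque


def build_network(patches, max_dist):
    """Connect patches (x, y) whose Euclidean distance is within the dispersal distance."""
    n = len(patches)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(patches[i], patches[j]) <= max_dist:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def mean_topological_distance(adj):
    """Mean shortest-path length (in edges) over all connected node pairs, via BFS."""
    total = pairs = 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target > source:  # count each unordered pair once
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log a + b log x; returns (a, b) for y = a * x**b."""
    lx, ly = [math.log(x) for x in xs], [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    return math.exp(my - b * mx), b


patches = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(mean_topological_distance(build_network(patches, 1.0)))  # 1.6666666666666667
```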
Christopher K. Wright (2010): Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning. Ecology, 91(7): 1924–1930.
Dean L. Urban, Emily S. Minor, Eric A. Treml and Robert S. Schick (2009): Graph models of habitat mosaics. Ecology Letters 12: 260–273.
Belinda Kahnt, 04.07.2011, Summer semester 2011: The social network structure of a wild meerkat population: 2. Intragroup interactions
- study of network structure of three interaction forms (grooming, dominance interactions, foraging competition) in 8 meerkat pop.
- investigation of:
A) variation of network structure between groups
B) relationship between networks for different interaction forms
C) influence of group attributes (size, sex ratio), individual attributes (tenure of dominants) and ecological factors (ectoparasite load) on network structure
- results: measures of network structure vary between groups and between interaction forms within a group
- ecological factors, group attributes and individual attributes influence network structure
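The notes mention measures of network structure without naming them. As a hedged illustration, two common measures for a directed, weighted interaction matrix (for example grooming counts within one group) are density and per-individual in/out strength; whether the original study used exactly these measures is an assumption, and the matrix is made up.

```python
def network_measures(matrix):
    """Density and in/out strength of a directed, weighted interaction matrix.

    matrix[i][j] is the number of interactions individual i directed at
    individual j; the diagonal is ignored.
    """
    n = len(matrix)
    possible = n * (n - 1)
    ties = sum(1 for i in range(n) for j in range(n) if i != j and matrix[i][j] > 0)
    density = ties / possible if possible else 0.0
    out_strength = [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    in_strength = [sum(matrix[i][j] for i in range(n) if i != j) for j in range(n)]
    return density, out_strength, in_strength


grooming = [[0, 3, 0], [1, 0, 0], [0, 2, 0]]  # hypothetical counts for three meerkats
print(network_measures(grooming))  # (0.5, [3, 1, 2], [1, 5, 0])
```

Comparing such values between groups, or between the grooming, dominance and foraging-competition matrices of one group, corresponds to points A) and B) above.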
Alice De Mauro, 08.07.2011, Summer semester 2011: Extending pathways and processes using molecular interaction networks to analyse cancer genome data
Background
Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the canonical signalling proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways.
Results
We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components and/or as regulators of the communication between the different cellular processes. Finally, the extended pathways and processes are analysed for enrichment in genes mutated in pancreatic cancer. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes.
Conclusions
The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.
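The two steps described above, extending a protein set with densely connected interaction partners and testing the extended set for enrichment in mutated genes, can be sketched as follows. The connectivity thresholds (`min_links`, `min_fraction`) are hypothetical stand-ins for the paper's own criterion, and the one-sided hypergeometric test is a common choice for enrichment, not necessarily the authors' statistic.

```python
from math import comb


def extend_pathway(pathway, ppi_edges, min_links=2, min_fraction=0.5):
    """Return the pathway plus non-members densely connected to it in a PPI network.

    A protein outside the pathway is added if it has at least `min_links`
    interactions with pathway members and at least `min_fraction` of all its
    interactions go to pathway members (illustrative thresholds).
    """
    pathway = set(pathway)
    neighbours = {}
    for a, b in ppi_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    added = {
        protein
        for protein, nbrs in neighbours.items()
        if protein not in pathway
        and len(nbrs & pathway) >= min_links
        and len(nbrs & pathway) / len(nbrs) >= min_fraction
    }
    return pathway | added


def enrichment_p_value(total, mutated, set_size, observed):
    """One-sided hypergeometric P(X >= observed): at least `observed` mutated genes
    in a set of `set_size` genes drawn from `total` genes, `mutated` of them mutated."""
    upper = min(mutated, set_size)
    hits = sum(comb(mutated, k) * comb(total - mutated, set_size - k)
               for k in range(observed, upper + 1))
    return hits / comb(total, set_size)


edges = [("A", "X"), ("B", "X"), ("X", "Y"), ("A", "Z"), ("Z", "W"), ("Z", "V")]
print(sorted(extend_pathway({"A", "B", "C"}, edges)))  # ['A', 'B', 'C', 'X']
print(enrichment_p_value(total=10, mutated=2, set_size=2, observed=2))
```

Here X joins the pathway because two of its three partners are members, while Z has only one link into the set.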
http://www.biomedcentral.com/1471-2105/11/597
Michael Siebauer, 17.07.2009, Summer semester 2009: Protein folding simulations (review)
Protein folding simulations: from coarse-grained model to all-atom model.
PMID: 19472192
Tobias Mede, 17.07.2009, Summer semester 2009: Domain-oriented edge-based alignment of protein interaction networks
ABSTRACT
Motivation: Recent advances in high-throughput experimental
techniques have yielded a large amount of data on protein–protein
interactions (PPIs). Since these interactions can be organized into
networks, and since separate PPI networks can be constructed for
different species, a natural research direction is the comparative
analysis of such networks across species in order to detect
conserved functional modules. This is the task of network alignment.
Results: Most conventional network alignment algorithms adopt a
node-then-edge-alignment paradigm: they first identify homologous
proteins across networks and then consider interactions among
them to construct network alignments. In this study, we propose
an alternative direct-edge-alignment paradigm. Specifically, instead
of explicit identification of homologous proteins, we directly infer
plausibly alignable PPIs across species by comparing conservation
of their constituent domain interactions. We apply our approach to
detect conserved protein complexes in yeast–fly and yeast–worm
PPI networks, and show that our approach outperforms two recent
approaches in most alignment performance metrics.
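The direct-edge-alignment idea can be illustrated by comparing PPIs across species through the domain pairs that could mediate them, without first mapping proteins to each other. In this hypothetical simplification, every pair of domains from the two partners counts as a candidate domain interaction, and two PPIs are matched when the Jaccard similarity of their domain-interaction sets reaches a threshold; the published method's scoring of domain-interaction conservation is more elaborate. Protein and domain names are made up.

```python
from itertools import product


def domain_interactions(edge, domains):
    """Unordered domain pairs that could mediate the PPI edge (u, v)."""
    u, v = edge
    return {frozenset((d1, d2)) for d1, d2 in product(domains[u], domains[v])}


def align_edges(edges_a, domains_a, edges_b, domains_b, min_score=0.5):
    """Match PPIs of two species by Jaccard similarity of their domain-interaction sets."""
    matches = []
    for ea in edges_a:
        da = domain_interactions(ea, domains_a)
        for eb in edges_b:
            db = domain_interactions(eb, domains_b)
            union = da | db
            score = len(da & db) / len(union) if union else 0.0
            if score >= min_score:
                matches.append((ea, eb, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


domains_a = {"p1": {"SH3"}, "p2": {"PxxP"}, "p3": {"Kinase"}}
domains_b = {"q1": {"SH3"}, "q2": {"PxxP", "WW"}}
print(align_edges([("p1", "p2"), ("p1", "p3")], domains_a, [("q1", "q2")], domains_b))
# [(('p1', 'p2'), ('q1', 'q2'), 0.5)]
```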
Arli Parikesit, 13.07.2009, Summer semester 2009: Functional protein divergence in the evolution of Homo sapiens
Background: Protein-coding regions in a genome evolve by sequence divergence and gene gain and loss, altering the gene content of the organism. However, it is not well understood how this has given rise to the enormous diversity of metazoa present today.
Results: To obtain a global view of human genomic evolution, we quantify the divergence of proteins by functional category at different evolutionary distances from human.
Conclusion: This analysis highlights some general systems-level characteristics of human evolution: regulatory processes, such as signal transducers, transcription factors and receptors, have a high degree of plasticity, while core processes, such as metabolism, transport and protein synthesis, are largely conserved. Additionally, this study reveals a dynamic picture of selective forces at short, medium and long evolutionary timescales. Certain functional categories, such as [...]
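Quantifying divergence "by functional category" amounts to grouping per-protein comparisons by category and summarising them. The sketch below uses two hypothetical summaries, the fraction of proteins with a detectable ortholog and the mean sequence identity of those orthologs; the abstract does not specify the measures used, and the records are invented.

```python
from collections import defaultdict


def divergence_by_category(records):
    """Summarise divergence per functional category.

    records: iterable of (category, identity), where identity is the fractional
    sequence identity to the ortholog in the compared species, or None if no
    ortholog was found.
    """
    grouped = defaultdict(list)
    for category, identity in records:
        grouped[category].append(identity)
    summary = {}
    for category, values in grouped.items():
        found = [v for v in values if v is not None]
        summary[category] = {
            "ortholog_fraction": len(found) / len(values),
            "mean_identity": sum(found) / len(found) if found else None,
        }
    return summary


records = [("metabolism", 0.9), ("metabolism", 0.8), ("receptor", 0.4), ("receptor", None)]
print(divergence_by_category(records))
```

Repeating this for species at increasing evolutionary distance from human would give the per-category divergence profile the abstract refers to.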
Thomas Efer, Summer semester 2009: Nature article: "A simple rule for the evolution of cooperation on graphs"
"A fundamental aspect of all biological systems is cooperation. Cooperative interactions are required for many levels of biological organization ranging from single cells to groups of animals. Human society is based to a large extent on mechanisms that promote cooperation. It is well known that in unstructured populations, natural selection favors defectors over cooperators. There is much current interest, however, for studying evolutionary games in structured populations and on graphs. These efforts recognize the fact that who-meets-whom is not random, but determined by spatial relationships or social networks. Here we describe a surprisingly simple rule, which is a good approximation for all graphs that we have analyzed, including cycles, spatial lattices, random regular graphs, random graphs and scale-free networks: natural selection favors cooperation, if the benefit of the altruistic act, b, divided by the cost, c, exceeds the average number of neighbors, k. Therefore, cooperation can evolve as a consequence of [...]
[OHLE06] Hisashi Ohtsuki, Christoph Hauert, Erez Lieberman and Martin A. Nowak: A simple rule for the evolution of cooperation on graphs. Nature. 2006 May 25; 441(7092): 502–505. doi: 10.1038/nature04605
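The rule quoted above compares the benefit-to-cost ratio b/c with the average number of neighbours k. A minimal sketch that computes k for a graph given as an adjacency dictionary and applies the rule; note that the abstract itself calls the rule an approximation, so this check is only as good as that approximation.

```python
def average_degree(adjacency):
    """Mean number of neighbours k of an undirected graph {node: set of neighbours}."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def cooperation_favoured(b, c, adjacency):
    """Approximate rule from the abstract: selection favours cooperation if b / c > k."""
    return b / c > average_degree(adjacency)


ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}  # cycle graph, k = 2
print(cooperation_favoured(3, 1, ring), cooperation_favoured(2, 1, ring))  # True False
```

The strict inequality matters: on a cycle (k = 2) a benefit exactly twice the cost does not suffice.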
Daniel Himmelbach, 17.07.2009, Summer semester 2009: A practical method for exact computation of subtree prune and regraft distance
Motivation: Subtree prune and regraft (SPR) is one kind of tree rearrangement that has seen applications in solving several computational biology problems. The minimum number of rooted SPR (rSPR) operations needed to transform one rooted binary tree into another is called the rSPR distance between the two trees.
Computing the rSPR distance has been actively studied in recent years. Currently, there is a lack of practical software tools for computing the rSPR distance for relatively large trees with large rSPR distance.
Results: In this article, we present a simple and practical method that computes the exact rSPR distance with integer linear programming.
By applying this new method to several simulated and real biological datasets, we show that our new method outperforms existing software tools in terms of accuracy and efficiency. Our experimental results indicate that our method can compute the exact rSPR distance for many large trees with large rSPR distance.
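Exact rSPR methods of this kind typically build on a classical characterization by Bordewich and Semple, which the abstract does not state explicitly: after adjoining an extra root leaf ρ to both trees, the rSPR distance equals the number of components of a maximum agreement forest (an agreement forest with the fewest components) minus one. An integer linear program can then search over candidate forests.

```latex
d_{\mathrm{rSPR}}(T_1, T_2) = m(T_1, T_2) - 1,
\qquad m(T_1, T_2) = \min_{F \text{ agreement forest of } T_1, T_2} |F|
```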
Franziska Kutzera, Summer semester 2009: Evolutionary construction of Multiple Graph Alignments for the Structural Analysis of Biomolecules
The concept of multiple graph alignment has recently been
introduced as a novel method for the structural analysis of
biomolecules. Using approximate graph matching techniques, this
method enables the robust identification of approximately conserved
patterns in biologically related structures. In particular, multiple graph
alignment enables the characterization of functional protein families
independent of sequence or fold homology. This paper first recalls the
concept of multiple graph alignment and then addresses the problem
of computing optimal alignments from an algorithmic point of view.
In this regard, a method from the field of evolutionary algorithms is
proposed and empirically compared to a hitherto existing heuristic
approach. Empirically, it is shown that the former yields significantly
better results than the latter, albeit at the cost of an increased runtime.
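The evolutionary-algorithm idea can be illustrated on the much simpler pairwise case: a (1+1) evolutionary algorithm mutates a one-to-one node mapping between two equally sized graphs by swapping two targets and keeps the child if it conserves at least as many edges. This is a hypothetical toy, not the method from the paper, which aligns multiple labelled graphs with a different objective and operators.

```python
import random


def conserved_edges(mapping, edges_a, edges_b):
    """Number of edges of graph A that the mapping places onto edges of graph B."""
    eb = {frozenset(e) for e in edges_b}
    return sum(frozenset((mapping[u], mapping[v])) in eb for u, v in edges_a)


def evolve_alignment(nodes_a, nodes_b, edges_a, edges_b, generations=500, seed=0):
    """(1+1)-EA over bijections nodes_a -> nodes_b; needs at least two nodes per graph."""
    if len(nodes_a) != len(nodes_b):
        raise ValueError("this sketch assumes graphs of equal size")
    rng = random.Random(seed)
    best = dict(zip(nodes_a, nodes_b))  # start from the order given
    best_score = conserved_edges(best, edges_a, edges_b)
    for _ in range(generations):
        u, v = rng.sample(list(nodes_a), 2)
        child = dict(best)
        child[u], child[v] = child[v], child[u]  # mutation: swap two targets
        score = conserved_edges(child, edges_a, edges_b)
        if score >= best_score:  # accept equal scores to drift across plateaus
            best, best_score = child, score
    return best, best_score


mapping, score = evolve_alignment([1, 2, 3, 4], ["a", "b", "c", "d"],
                                  [(1, 2), (2, 3), (3, 4)], [("a", "c"), ("c", "b"), ("b", "d")])
print(mapping, score)
```

As with the trade-off reported above, more generations cost more runtime but can only improve (never worsen) the returned score.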
Bartz-Beielstein, T. (2006). Experimental research in evolutionary computation: The new experimentalism. Springer.
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004). The Pfam protein families database. Nucleic Acids Research, 32, 138-141.
Berg, J. and Lässig, M. (2004). Local graph alignment and motif search in biological networks. Proceeding
Thomas Hofmann, Summer semester 2009: A new protein linear motif benchmark for multiple sequence alignment software (Perrodou et al.)
__Background__:
Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
__Results__:
We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
__Conclusion__:
We have shown that none of the programs currently available is capable of reliabl
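The percentages above depend on how a motif counts as "correctly aligned". One common way to score an alignment against a reference is the fraction of reference residue pairs it reproduces. The sketch below restricts that pair score to the motif region.

The exact measure used in the benchmark may differ. The function names and the motif coordinate convention (0-based, end-exclusive, ungapped positions per sequence) are assumptions.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """Residue pairs (seq_a, pos_a, seq_b, pos_b) placed in the same column.

    `alignment` is a list of equal-length gapped strings; positions are 0-based and ungapped."""
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        positions = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                positions.append(None)
            else:
                positions.append(counters[s])
                counters[s] += 1
        for a, b in combinations(range(len(alignment)), 2):
            if positions[a] is not None and positions[b] is not None:
                pairs.add((a, positions[a], b, positions[b]))
    return pairs

def motif_pair_score(reference, test, motif):
    """Fraction of reference residue pairs inside the motif that `test` reproduces.

    `motif` maps sequence index -> (start, end), 0-based, end-exclusive, ungapped."""
    def in_motif(seq, pos):
        start, end = motif[seq]
        return start <= pos < end

    ref = {p for p in aligned_pairs(reference) if in_motif(p[0], p[1]) and in_motif(p[2], p[3])}
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)

reference = ["AKRSPLE", "-KRSPLE"]
motif = {0: (1, 5), 1: (0, 4)}  # hypothetical motif "KRSP" in both sequences
print(motif_pair_score(reference, ["AKR-SPLE", "-KRSP-LE"], motif))  # 0.5
```
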
tbd
Michael Siebauer, 02.02.2009, Winter semester 2008/2009: Modification of the Sankoff algorithm for homology search
This talk presents a modification of the Sankoff algorithm that allows fast and memory-efficient computation of a semi-global sequence/structure alignment.
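The sequence-only core of a semi-global alignment can be sketched as follows. The query must be aligned completely, while leading and trailing parts of the target are skipped free of charge, which is what a homology search in a longer sequence needs. Only two rows of the dynamic-programming matrix are kept, so memory grows linearly with the target length.

This is a plain sequence alignment under assumed scores (match 2, mismatch -1, linear gap -2). It ignores RNA structure and is not the modified Sankoff algorithm from the talk.

```python
def semiglobal_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best score aligning all of `query` to any substring of `target` (linear gap costs).

    Only two DP rows are stored, so memory is O(len(target))."""
    prev = [0] * (len(target) + 1)  # leading target characters can be skipped for free
    for i in range(1, len(query) + 1):
        cur = [prev[0] + gap] + [0] * len(target)  # query prefix against nothing: all gaps
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(prev)  # trailing target characters can be skipped for free

print(semiglobal_score("ACGT", "TTACGTTT"))  # 8: four matches, flanks skipped
```
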
Modification of the Sankoff algorithm for homology search – Bachelor's thesis
Variations on RNA Folding and Alignment – Lessons from Benasque
Inferring Noncoding RNA families and classes by means of Genome-Scale Structure-Based Clustering
Alignment of RNA base pairing probability matrices
Prediction of locally stable RNA secondary structures for genome-wide surveys
Secondary Structure Prediction for Aligned RNA Sequences
Christoph Kämpf, Winter semester 2008/2009: Combining statistical alignment and phylogenetic footprinting to detect regulatory elements
See the paper of the same name.
Jan Engelhardt, 02.02.2009, Winter semester 2008/2009: Something about smyRNAs and slRNAs
Interaction of smyRNAs and slRNAs
To be provided later.
Maria Herberg, 02.02.2009, Winter semester 2008/2009: Modelling Protein Interaction Networks – Age-Dependent Evolution of the Yeast Protein Interaction Network
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties (such as scale-free topology, hierarchical modularity, and dissortativity) have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution.
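To make the "gene duplication and divergence" model mentioned in the abstract concrete, here is a minimal generator for it. It comes with the kind of age-dependent statistic the paper looks at.

The following are illustrative assumptions: the retention and parent-link probabilities, the two-node seed network and the split into an older and a younger half. This is one of the canonical models the abstract argues against, not the crystal growth model it proposes.

```python
import random

def duplication_divergence(n, retain=0.4, link_parent=0.1, seed=0):
    """Grow an undirected network to n >= 2 nodes by gene duplication and divergence.

    Node ids are insertion order, so a smaller id means an older protein."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: a single interacting pair
    for new in range(2, n):
        parent = rng.randrange(new)  # the gene that gets duplicated
        adj[new] = set()
        for neighbour in adj[parent]:
            if rng.random() < retain:  # divergence: each inherited interaction survives or is lost
                adj[new].add(neighbour)
                adj[neighbour].add(new)
        if rng.random() < link_parent:  # optional interaction between duplicate and original
            adj[new].add(parent)
            adj[parent].add(new)
    return adj

def mean_degree_by_age(adj):
    """Mean number of interactions in the older and in the younger half of the proteins."""
    ids = sorted(adj)
    half = len(ids) // 2

    def mean_degree(group):
        return sum(len(adj[v]) for v in group) / len(group)

    return mean_degree(ids[:half]), mean_degree(ids[half:])

network = duplication_divergence(500, seed=1)
print(mean_degree_by_age(network))
```
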
Kim WK, Marcotte EM (2008) Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence. PLoS Comput Biol 4(11);
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5;
Kim WK, Henschel A, Winter C, Schroeder M (2006) The many faces of protein–protein interactions: A compendium of interface geometry. PLoS Comput Biol 2(9);
Newman ME, Girvan M (2004) Finding an
Daniel Exner, 11.07.2008, Summer semester 2008: Intelligent RNA compression: a metric for secondary structure complexity?
Treating combined RNA sequence and structure information as plain text and compressing it with standard methods is shown to be inferior to a more differentiated approach.
In addition, the algorithm implicitly provides information about the complexity of the secondary structure.
doi:10.1186/1471-2105-9-176
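One way to see why compression can double as a complexity measure is to compare how well a regular and an irregular dot-bracket string compress.

The sketch below uses Python's general-purpose zlib compressor. That is exactly the kind of standard method the talk argues is inferior, so the sketch only illustrates the principle. The function names and the use of the compression ratio as a complexity proxy are assumptions, not the algorithm from the paper.

```python
import random
import zlib

def compression_ratio(text):
    """Compressed size / raw size; lower means more regular, i.e. less complex, text."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

def shuffled(text, seed=0):
    """Same characters in random order: identical composition, regularity destroyed."""
    return "".join(random.Random(seed).sample(text, len(text)))

regular = ".(((...)))." * 40  # a repeated hairpin in dot-bracket notation
print(compression_ratio(regular), compression_ratio(shuffled(regular)))
```

Shuffling keeps the symbol counts fixed, so any difference in the ratio comes from order, not from composition.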
Mandy Fuchs, 11.07.2008, Summer semester 2008: Computational prediction of host-pathogen protein-protein interactions
Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins.
The authors present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins.
Computational prediction of host-pathogen protein-protein interactions, Matthew D. Dyer, T. M. Murali and Bruno W. Sobral, Bioinformatics 2007
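A heavily simplified version of the domain-based idea works in two steps. First, collect the domain pairs seen in known intra-species interactions. Then predict a host-pathogen interaction whenever the two proteins carry such a pair.

The method in the paper may score this evidence differently. The all-or-nothing rule, the data layout, the toy domain names and the function names here are assumptions for illustration.

```python
from itertools import product

def learn_domain_pairs(ppis, domains):
    """Unordered domain pairs that co-occur in known intra-species interactions."""
    pairs = set()
    for a, b in ppis:
        for da, db in product(domains.get(a, ()), domains.get(b, ())):
            pairs.add(frozenset((da, db)))
    return pairs

def predict_host_pathogen(hosts, pathogens, domains, known_pairs):
    """Predict (host, pathogen) pairs that carry at least one known interacting domain pair."""
    return sorted(
        (h, p)
        for h in hosts
        for p in pathogens
        if any(frozenset((dh, dp)) in known_pairs
               for dh, dp in product(domains.get(h, ()), domains.get(p, ())))
    )

# Hypothetical toy data: protein -> list of domains.
domains = {"Y1": ["DomA"], "Y2": ["DomB"], "H1": ["DomA"], "H2": ["DomC"], "P1": ["DomB"]}
known = learn_domain_pairs([("Y1", "Y2")], domains)
print(predict_host_pathogen(["H1", "H2"], ["P1"], domains, known))  # [('H1', 'P1')]
```
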
Sebastian Bartschat, 01.02.2008, Winter semester 2007/2008: Comparative structure prediction of RNA molecules using a non-Sankoff approach
The talk essentially covers the tool RNAspa, or more precisely the algorithm behind it.
This algorithm is then compared with the algorithm behind RNAcast.
RNAspa:
http://www.biomedcentral.com/1471-2105/8/366
RNAcast:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/17/3516
Mandy Fuchs, 09.07.2007, Summer semester 2007: Structural Alignment of Two RNA Sequences with Lagrangian Relaxation
RNA is generally a single-stranded molecule where the bases form hydrogen bonds within the same molecule leading to structure formation. In comparing different homologous RNA molecules it is usually not sufficient to consider only the primary sequence, but it is important to consider both the sequence and the structure of the molecules. Traditional alignment algorithms can only account for the sequence of bases, but not for the base pairings. Considering the structure leads to significant computational problems because of the dependencies introduced by the base pairings and the presence of pseudoknots. In this paper we address the problem of optimally aligning two given RNA sequences either with or without known structure (allowing for pseudoknots). We phrase the problem as an integer linear program and then solve it using Lagrangian relaxation. In our computational experiments we could align large problem instances—18S and 23S ribosomal RNA with up to 1500 bases within minutes while preserving pseudoknots.
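Lagrangian relaxation itself can be illustrated on a much smaller problem than RNA alignment. The sketch below relaxes the single capacity constraint of a toy 0/1 problem and solves the relaxed problem by inspection. It then improves the multiplier by subgradient steps, and every relaxed solution gives an upper bound on the true optimum.

The toy problem, the step-size rule step0/k and the function name are assumptions. The paper applies the technique to its integer linear program for structural alignment, which is not reproduced here.

```python
def lagrangian_bound(values, weights, capacity, iterations=200, step0=1.0):
    """Upper bound for max sum(v*x) s.t. sum(w*x) <= capacity, x in {0,1}^n.

    The capacity constraint is moved into the objective with a multiplier lam >= 0."""
    lam, best = 0.0, float("inf")
    for k in range(1, iterations + 1):
        # The relaxed problem decomposes: take item i iff its reduced value is positive.
        x = [1 if v - lam * w > 0 else 0 for v, w in zip(values, weights)]
        bound = sum((v - lam * w) * xi for v, w, xi in zip(values, weights, x)) + lam * capacity
        best = min(best, bound)  # every lam >= 0 yields a valid upper bound
        subgradient = capacity - sum(w * xi for w, xi in zip(weights, x))
        lam = max(0.0, lam - (step0 / k) * subgradient)  # subgradient step, kept non-negative
    return best

print(lagrangian_bound([10, 7, 5, 3], [5, 4, 3, 2], capacity=7))  # >= 13, the true optimum
```
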
Andrej Aderhold, Summer semester 2007: Inference of miRNA targets using evolutionary conservation and pathway analysis
BACKGROUND: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. RESULTS: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3
Sebastian Bartschat, Summer semester 2007: Discovering structural motifs using a structural alphabet: Application to magnesium binding sites
http://www.biomedcentral.com/1471-2105/8/106
Lydia Steiner, Summer semester 2007: Ontology development for biological systems: immunology
http://bioinformatics.oxfordjournals.org/cgi/reprint/23/7/913?maxtoshow=&HITS=80&hits=80&RESULTFORMAT=1&title=gene%20ontology%20owl&andorexacttitle=or&titleabstract=gene%20ontology%20owl&andorexacttitleabs=or&fulltext=gene%20ontology%20owl&andorexactfulltext=or&searchid=1&FIRSTINDEX=0&sortspec=date&resourcetype=HWCIT
Marcus Lechner, Summer semester 2007: Understanding and using the meaning of statements in a bio-ontology
http://www.biomedcentral.com/content/pdf/1471-2105-8-57.pdf
Christian Arnold, Summer semester 2007: BranchClust: a phylogenetic algorithm for selecting gene families
Background:
Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The first are easy to automate but usually they do not distinguish between paralogs and orthologs or have restriction on the number of taxa. Phylogenetic methods often are based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in case of prokaryotes, is that the species tree is often unknown, and that from the analyses of single gene families the branching order between related organisms frequently is unresolved.
Results:
Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at http://bioinformatics.org/branchclust.
Conclusion:
BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.
http://www.biomedcentral.com/1471-2105/8/120
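The idea of reading gene families off a tree can be sketched in a few lines. The example below walks a rooted tree given as nested tuples and reports the largest clades in which every taxon occurs at most once. Clades that contain all taxa are flagged as complete.

This is a simplified illustration, not the BranchClust algorithm. The leaf naming convention "taxon_gene", the tuple representation and the function names are assumptions, and in- and outparalogs are not distinguished.

```python
def leaves(tree):
    """Leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def taxon(leaf):
    return leaf.split("_", 1)[0]  # assumed label format "taxon_gene"

def candidate_families(tree, all_taxa):
    """Largest clades whose genes come from distinct taxa, each flagged as complete or not."""
    genes = leaves(tree)
    taxa = [taxon(g) for g in genes]
    if len(taxa) == len(set(taxa)):  # no taxon occurs twice: stop and report this clade
        return [(sorted(genes), set(taxa) == set(all_taxa))]
    families = []
    for child in tree:  # a single leaf never reaches this point, so tree is a tuple here
        families.extend(candidate_families(child, all_taxa))
    return families

tree = ((("A_1", "B_1"), "C_1"), (("A_2", "B_2"), "A_3"))
for genes, complete in candidate_families(tree, {"A", "B", "C"}):
    print(genes, "complete" if complete else "incomplete")
```
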
Christian Otto, Summer semester 2007: Comparing sequences without using alignments: application to HIV/SIV subtyping
http://www.biomedcentral.com/1471-2105/8/1/abstract
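A widely used alignment-free approach compares sequences by their k-mer counts. The sketch below computes one minus the cosine similarity of two k-mer count vectors.

It is a generic example of alignment-free comparison, not necessarily the method of the paper. The choice of k = 3, the distance formula and the function names are assumptions.

```python
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=3):
    """Counts of all overlapping substrings of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(a, b, k=3):
    """1 - cosine similarity of k-mer counts (0 = proportional profiles, 1 = no shared k-mer)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    dot = sum(count * pb[word] for word, count in pa.items())
    norm = sqrt(sum(c * c for c in pa.values())) * sqrt(sum(c * c for c in pb.values()))
    return 1.0 - dot / norm if norm else 1.0

print(kmer_distance("ACGTACGTAC", "ACGTACGAAC"), kmer_distance("ACGTACGTAC", "TTTTGGGGCC"))
```

No alignment is computed at any point, so the cost is linear in the sequence lengths.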
Michael Siebauer, Summer semester 2007: Identifying bacterial genes and endosymbiont DNA with Glimmer
The authors developed a module (or rather, an approach) for filtering bacterial genes out of an organism's genome. Some bacteria apparently live inside host cells, so their genome ends up mixed with the host genome during sequencing. The open-source program "Glimmer" can separate the two genomes again using trained interpolated Markov models.
Bioinformatics 2007 23(6):673-679
doi:10.1093/bioinformatics/btm009
http://bioinformatics.oxfordjournals.org/cgi/content/full/23/6/673
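The classification step can be sketched with two Markov chains, one trained on host DNA and one on symbiont DNA. A fragment is assigned to whichever model gives it the higher log-likelihood.

Glimmer itself uses interpolated Markov models, which mix several context lengths. The fixed context length k = 2, the pseudocount, the toy training data and the class and function names below are simplifying assumptions.

```python
from collections import defaultdict
from math import log

class MarkovChain:
    """Fixed-order Markov chain over DNA with additive (pseudocount) smoothing."""

    def __init__(self, k=2, alphabet="ACGT", pseudocount=1.0):
        self.k, self.alphabet, self.pseudocount = k, alphabet, pseudocount
        self.counts = defaultdict(lambda: defaultdict(float))  # context -> next base -> count

    def train(self, sequences):
        for s in sequences:
            for i in range(self.k, len(s)):
                self.counts[s[i - self.k:i]][s[i]] += 1
        return self

    def log_likelihood(self, s):
        total = 0.0
        for i in range(self.k, len(s)):
            context = self.counts.get(s[i - self.k:i], {})
            n = sum(context.values())
            p = (context.get(s[i], 0.0) + self.pseudocount) / (n + self.pseudocount * len(self.alphabet))
            total += log(p)
        return total

def classify(fragment, host_model, symbiont_model):
    """Assign a fragment to the model under which it is more likely."""
    if symbiont_model.log_likelihood(fragment) > host_model.log_likelihood(fragment):
        return "symbiont"
    return "host"

# Hypothetical training data: a GC-rich host and an AT-rich endosymbiont.
host = MarkovChain().train(["GCGCGGCCGCGCGGCC" * 5])
symbiont = MarkovChain().train(["ATATTAATATATTTAA" * 5])
print(classify("ATTATAATAT", host, symbiont))  # symbiont
```
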