Dr. Lydia Müller

PostDoc

Research Interests

Bioinformatics

Bioinformatics, in particular the development of data analysis methods, has held my attention since I started studying computer science. The possibility of gaining insight into large data sets with computational methods was and is fascinating to me. My research interests have therefore always included aspects of data analysis such as statistical methods and, since last year, visualization techniques as well. I discovered that many biological data sets are publicly available and that many projects aim to produce even more data. I therefore became interested in analyzing such data and connected this with my interest in epigenetics during my PhD. I continued and extended this research during my PostDoc.

Epigenetics

Epigenetics is a relatively new research area that benefits greatly from new sequencing techniques and ChIP. With the data produced by combining both techniques, scientists can identify the genome-wide distribution of epigenetic marks and DNA-bound proteins and infer the function of these marks as well as regulatory mechanisms. Furthermore, connections to transcriptome data are possible using RNA-seq techniques.

In my research, I am particularly interested in quantitatively analyzing such high-throughput data to find out more about how patterns of epigenetic marks change during differentiation and development. Furthermore, I think that the analysis of such data allows conclusions and predictions about the regulatory mechanisms that initiate and control both processes. A global understanding of epigenetic regulation may help to understand the nature of diseases associated with environmental factors and dysregulation during development.

My current research additionally includes the modeling of epigenetic processes. I am particularly interested in epigenetic memory and cell fate. Current knowledge of epigenetic mechanisms already shows the importance of epigenetic regulation during cell development, and data analysis shows that cell fate and identity strongly depend on the ability of the cell to retain epigenetic patterns throughout cell division, the so-called epigenetic memory. However, knowledge about the detailed mechanisms and their evolutionary conservation is currently lacking. Furthermore, a detailed analysis of the stability and dependencies of epigenetic memory is not yet possible. With my model I tackle the last two questions.


Linguistics

There are several similarities between linguistics and bioinformatics. For example, both fields analyze strings. In bioinformatics, these strings are relatively long, are composed of only a few different characters, and have a structure that is hard to determine. In linguistics, the strings consist of many different characters and are, compared to biological sequences, very short and highly structured.

Unlike in linguistics, there is a strong interaction between biologists and computer scientists. As a result, several methods exist for tasks like comparing sequences, finding homologous sequences, or reconstructing trees. For the same tasks in linguistics, only a few programs exist and a lot of work is done by hand. In my diploma thesis project, I tried to apply bioinformatics methods to linguistic data to find cognates in data sets with at least 1000 words in at least 3 languages. After my diploma thesis, I continued with this project to optimize and improve the models and algorithms for language data. More importantly, there are several problems where linguistics and computer science/bioinformatics can benefit from cooperation. Thus, I am also interested in the field of computational humanities and especially computational linguistics.


Epigenetics and Bioinformatics

Epigenetics was the research topic of my PhD thesis. The main focus lay on understanding the dynamics of epigenetic states and mechanisms, especially those related to aging and differentiation. I continued working in the field of epigenetics during my PostDoc. I am still analyzing epigenetic states and their dynamics but have extended my research to the modeling of epigenetic memory and to the protein components responsible for writing, erasing, and reading histone modifications.

Currently, it is unknown which specific modifications are required to define a particular cell type and how much variation or fluctuation in histone and DNA modifications a cell can compensate for without changing its epigenetic state. Thus, one main problem is to find comparison criteria for epigenetic states. These criteria may be found by comparing the epigenetic states of different cell types in different environments.

The chromatin of a cell is a dynamic system that changes slightly over time but stays in a stable state. Modeling its behavior enables us to test different hypotheses, such as histone distribution strategies during cell division or reprogramming strategies, and to find out how likely these hypotheses are. Furthermore, by modeling epigenetic memory, I tackle the questions of how many different stable epigenetic states may exist at the same time, how much noise such a system can tolerate, and which components of the cell play a key role in epigenetic regulation and cell identity.

During my PhD, I was part of an interdisciplinary project combining data analysis and modeling. Our model describes histone modifications on the basis of interaction complexes which may bind to DNA and/or histones to write their modifications. Demodification of the histones occurs at a constant rate, independent of any interaction complex. We fit our model to the Polycomb (PcG)/Trithorax (Trx) system, in which PcG writes H3K27me3 (a repressing mark) and Trx writes H3K4me3 (an activating mark). Demodification occurs during cell division, where the modifications are diluted (modified histones are randomly distributed to both daughter cells). With this model, we can simulate proliferation-induced cell differentiation.
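The dilution of marks at cell division can be illustrated with a small toy simulation. This is only a minimal sketch and not the published model: the function names, the parameter values, and the simple positive-feedback writer (a stand-in for binding of an interaction complex) are assumptions made for illustration.

```python
import random


def demodify(marks, rate, rng):
    """Remove each existing mark independently with constant probability `rate`."""
    return [m and rng.random() >= rate for m in marks]


def write_marks(marks, base, feedback, rng):
    """Toy writer (assumption): the chance to modify a nucleosome grows with
    the fraction of nucleosomes that are already modified."""
    p = min(1.0, base + feedback * sum(marks) / len(marks))
    return [m or rng.random() < p for m in marks]


def divide(marks, rng):
    """Dilution at cell division: this daughter keeps each parental histone
    with probability 0.5; otherwise the slot gets a new, unmodified histone."""
    return [m and rng.random() < 0.5 for m in marks]


def simulate(n_nucleosomes=100, cycles=10, steps_per_cycle=20,
             rate=0.02, base=0.01, feedback=0.2, seed=0):
    """Return the modified fraction measured just before each division."""
    rng = random.Random(seed)
    marks = [True] * n_nucleosomes  # start from a fully modified region
    fractions = []
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            marks = write_marks(demodify(marks, rate, rng), base, feedback, rng)
        fractions.append(sum(marks) / n_nucleosomes)
        marks = divide(marks, rng)
    return fractions
```

Calling simulate() with different values for steps_per_cycle mimics faster or slower proliferation: the fewer writing steps there are between divisions, the less time the toy writer has to restore the diluted marks.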

I am working on a model for epigenetic memory that allows simulations of cell division and analysis of the stability of epigenetic states under a variety of conditions. I based the model on knowledge gathered in a literature review on epigenetic inheritance and regulation. Simulations with or without particular components of the regulation system of histone modifications can be performed, enabling conclusions about the importance of those components or about alternative regulation mechanisms. Furthermore, studying the requirements, accuracy, and natural limitations of epigenetic regulation due to cell division is possible and is one aim of my research.

The availability of large data sets measuring epigenetic modifications and transcriptomic data motivates me to perform analyses that support models of epigenetics and also generate testable hypotheses about epigenetic patterns and mechanisms of regulation. In my opinion, such large data sets allow us to gain insights into epigenetic regulation.

The increasing number of high-throughput data sets of different kinds requires the development of new methods to process the data sets in a meaningful manner. This motivated me to develop a new peak caller for ChIP-seq data which makes use of replicates. It is robust against noise and can handle replicates from different batches or laboratories. Furthermore, it allows quality control during the peak calling process. I designed a benchmark data set which allows us to show that our method is substantially better than existing methods. Using data sets from the Roadmap Epigenomics Project, my co-authors and I could show that the peak calls of our new peak caller make more sense from a biological point of view.

I take part in a collaboration with the Image and Signal Processing group. With this collaboration, we aim to develop exploration and visualization tools for large chromatin data sets. A first approach enables us to compare different modifications in different cell types without prior knowledge of the data itself. It therefore helps to formulate hypotheses. We continued this collaboration with the help of several master's students and one PhD student. As a result, we recently submitted our second joint paper to BioVis 2016.

Histone modifications and simulations are not the only sources of information on epigenetic regulation. Long non-coding RNAs (lncRNAs) also play an important role in the regulation of developmental processes. It has been shown that many of them interact with chromatin modifiers and guide them to specific positions in the genome by complementary binding. Not much is known about the interaction of lncRNAs and chromatin modifiers. A female start-up fund was granted to me to investigate the binding sites of PRC2 with lncRNAs. The main aims are (1) to find binding motifs for PRC2 and (2) to develop a method to detect binding sites based on genomic conservation, structure predictions, and CLIP-seq data.

I am involved in a project on genome evolution. The main aim here is to enable the comparison of genome-associated data (such as transcriptomic or epigenetic data) across different species. The comparison is based on a constructed supergenome onto which the different annotations and data sets of the species are mapped.

Linguistics

While writing my diploma thesis, I explored the similarities between linguistics, computer science, and bioinformatics. There are many parallels in the methods of both fields, mainly emerging from the fact that in both fields the main objects, either language or DNA/RNA/protein sequences, are represented as strings. While the details differ greatly, the basic algorithms and ideas are largely the same.

In my diploma thesis, I showed that bioinformatics methods such as pairwise and multiple alignments, phylogenetic algorithms, and clustering methods can be used to find words which originate from the same ancestral word. I based my pipeline on the workflow known as the comparative method in historical linguistics. Much of this workflow is very similar to the typical way of discovering homologous sequences in bioinformatics. Thus, I could use some strategies originating from homology detection.
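To illustrate how a pairwise alignment carries over from sequences to words, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score on two orthographic word forms. It is not the thesis pipeline: the unit scores for match, mismatch, and gap, the normalization by word length, and the function names are assumptions chosen for illustration, and real cognate detection would rather work on phonetic transcriptions with a learned scoring scheme.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two words by dynamic programming,
    keeping only the previous row of the score matrix."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def normalized_similarity(a, b):
    """Alignment score divided by the longer word length (hypothetical measure)."""
    if not a and not b:
        return 1.0
    return align_score(a, b) / max(len(a), len(b))
```

For example, align_score("night", "nacht") compares an English and a German word form of the same meaning. Word pairs whose normalized similarity exceeds a chosen threshold could then be passed on as cognate candidates to a clustering step.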

Currently, we are working on a more sophisticated approach using bigrams to increase sensitivity and accuracy. We collaborate with Dr. Christian Höner zu Siederdissen at the TBI, Vienna, Austria, who provides a framework for fast implementations of grammar products and supports us in building alignment programs for bigram alignments. Furthermore, a collaboration with Prof. Tanmoy Bhattacharya at the SFI, Santa Fe, NM, USA is planned, focusing on cognate detection and proto-language reconstruction.

Since large data sets are rare in historical linguistics, I am furthermore interested in generating such data sets using computational approaches. Such a large data set would not only provide more statistical power for the detection of cognates and for proto-language reconstruction but also enable reliable loan word detection. It will likely be possible to determine the strata of the detected loan words and thus enable large-scale analyses of contacts between languages.

Curriculum Vitae

I studied computer science from October 2004 to September 2009 at the University of Leipzig. During my studies, I was a student assistant in the Natural Language Processing group and in the Bioinformatics group, both at the University of Leipzig. I also visited the Max Planck Institute for Mathematics in the Sciences for an internship.

After finishing my studies in 2009, I became a PhD student in the Bioinformatics Group and at the IZBI. With the creation of the Junior Professorship for Computational EvoDevo, I belonged to this group and was funded by the MAGE Project, which is affiliated with the IZBI. My PhD thesis was about epigenetic regulation and aging. Nevertheless, I also continued to work on computational linguistics.

In June 2013, I became a PostDoc at the Bioinformatics group, Junior Professorship for Computational EvoDevo, IZBI, and Wisconsin Institute for Discovery. I worked on the "Origin of regulation" project funded by the Templeton Foundation.

From January 2015 to August 2016, I was a PostDoc at the Bioinformatics group and analyzed RNA-Seq data. Additionally, I still had a strong interest in exploring epigenetic data sets to understand the underlying dynamics and regulatory mechanisms. I was furthermore interested in the development of methods to analyze high-throughput sequencing data. I kept working on the evolution of natural languages.

Since April 2016, I have supervised a PhD student working on the interaction of lncRNAs with chromatin modifiers. This research is possible thanks to a female start-up grant awarded to me.

In August 2016, I switched to the Natural Language Processing Group at the University of Leipzig. As part of the CLARIN-D Project, I am re-engineering the ASV Toolbox, a modular collection of NLP tools which will be available online. It provides a tool set that allows researchers to execute NLP tasks on written language records. Since it is available online, it also allows students to explore NLP tasks and to understand the algorithms used in the field by playing with their own data. Apart from the ASV Toolbox, I work on disentangling the history of languages and words computationally, and I also keep track of bioinformatics methods for genome evolution and the analysis of high-throughput sequencing data.

Publications and Conferences

Publications

Design specifications for cellular regulation
David C. Krakauer, Lydia Müller, Sonja J. Prohaska, Peter F. Stadler
Theory in Biosciences
Article Preprint
Sierra platinum: a fast and robust peak-caller for replicated ChIP-seq experiments with visual quality-control and -steering
Lydia Müller, Daniel Gerighausen, Mariam Farman, Dirk Zeckzer
2016
BMC Bioinformatics
Article
Dual RNA-seq unveils noncoding RNA functions in host-pathogen interactions
Alexander J. Westermann, Konrad U. Förster, Fabian Amman, Lars Barquist, Yanjie Chao, Leon N. Schulte, Lydia Müller, Richard Reinhardt, Peter F. Stadler, Jörg Vogel
2016
Nature
Article
The p53-p21-DREAM-CDE/CHR pathway regulates G2/M cell cycle genes
Martin Fischer, Marianne Quaas, Lydia Steiner, Kurt Engeland
2015
Nucl. Acids Res.
Article
The ancestor of modern Holozoa acquired the CCA-adding enzyme from Alphaproteobacteria by horizontal gene transfer
Heike Betat, Tobias Mede, Sandy Tretbar, Lydia Steiner, Peter F. Stadler, Mario Mörl, Sonja J. Prohaska
2015
Nucl. Acids Res.
Article
The transcription factor p53: Not a repressor, solely an activator
Martin Fischer, Lydia Steiner, Kurt Engeland
2014
Cell Cycle
Article
Analyzing Chromatin Using Tiled Binned Scatterplot Matrices
Dirk Zeckzer, Daniel Gerighausen, Lydia Steiner, Sonja J. Prohaska
2014
BioVis 2014 conference in Boston, USA
Article
The Dynamic Epigenome --- Analysis of the Distribution of Histone Modifications
Lydia Steiner
2013
Dissertation published on Qucosa
Pitfalls of Ascertainment Biases in Genome Annotations --- Computing Comparable Protein Domain Distributions in Eukarya
Arli A. Parikesit, Lydia Steiner, Peter F. Stadler, Sonja J. Prohaska
2013
Malaysian Journal of Fundamental and Applied Sciences
Article
Transcriptional regulation by histone modifications: towards a theory of chromatin re-organization during stem cell differentiation
Hans Binder, Lydia Steiner, Thimo Rohlf, Sonja Prohaska, Jörg Galle
2013
Physical Biology
Abstract
A Global Genome Segmentation Method for Exploration of Epigenetic Patterns
Lydia Steiner, Lydia Hopp, Henry Wirth, Jörg Galle, Hans Binder, Sonja J. Prohaska, Thimo Rohlf
2012
PLoS ONE
Article
Modeling the dynamic epigenome: from histone modifications towards self-organizing chromatin
Thimo Rohlf, Lydia Steiner, Jens Przybilla, Sonja Prohaska, Hans Binder, Jörg Galle
2012
Epigenomics
PUBMED
A Pipeline for Computational Historical Linguistics
Lydia Steiner, Peter F. Stadler, Michael Cysouw
2011
Language Dynamics and Change
Abstract
Proteinortho: Detection of (Co-)Orthologs in Large-Scale Analysis
Marcus Lechner, Sven Findeiß, Lydia Steiner, Manja Marz, Peter F. Stadler, Sonja J. Prohaska
2011
BMC Bioinformatics
Abstract

Conferences and Seminars

Symposium Environmental Genomics in Aquatic Systems: Current State and Future Perspectives
23.09.2016, Limnological Institute, University of Konstanz
Invited Talk: Studying Host-Pathogen-Interactions using dual RNA-seq
SFI Working Group: Lexical Semantic Networks and Language Change
17.03.2016-18.03.2016
Talk about distributional semantics across language borders.
SFI Working Group: THE LOGIC AND DYNAMICS OF EPIGENETIC REGULATION
17.03.2014-21.03.2014
Talk about our model for epigenetic memory
SFI Working Group: HISTORICAL LINGUISTICS: PROCESSES, INFERENCE, AND RECONSTRUCTION
13.03.2014-14.03.2014
Talk about the ongoing research on historical linguistics in bioinformatics group
Epigenetics Europe 2011
08.09.11-09.09.11, Hotel Holiday Inn Munich City Centre, Munich
Poster: Visualizing the Dynamic Epigenome
Lydia Steiner, Thimo Rohlf, Jörg Galle, Hans Binder, Lydia Hopp, Henry Wirth, Sonja Prohaska
Abstract Poster
Fourth Weißenburg Symposium "Epigenetics and the Control of Gene Expression"
20.06.11 - 22.06.11, Kulturzentrum Karmeliterkirche, Weißenburg (Bayern)
CITEC Workshop on Evolution of Human Language
28.04.2011-29.04.2011, Center of Excellence Cognitive Interaction Technology, University of Bielefeld, Bielefeld
Talk: A Pipeline for Computational Historical Linguistics
Lydia Steiner, Peter F. Stadler, Michael Cysouw
Abstract see Article with same title
14th Annual Workshop on American Indigenous Languages
15.04.2011-16.04.2011, Student Resources Building, UCSB, Santa Barbara, CA
Talk: Assisted reconstruction: The cases of Panoan and Mataco-Guiacuruan
Lydia Steiner, Michael Cysouw
Abstract
26th TBI Winterseminar in Bled, 5th Annual Meeting of the Bompfünewerer Consortium
13.02.2011-20.02.2011, Bled, Slovenia
Talk: Identify Homologous Words
Lydia Steiner
8. Herbstseminar 2010 Vysoka Lipa (Decin)
05.10.2010 - 10.10.2010, Vysoka Lipa, Decin
Talk: Tracing Histone Modifications
Lydia Steiner
INRIA-IZBI-Workshop 2010
01.09.2011, IZBI, Leipzig
Talk: Models of Epigenetic Regulation: Histone Modifications - part I
Lydia Steiner
Transcription, chromatin structure and DNA repair in development and differentiation
07.07.2010-10.07.2010, Zeche Zollverein, Essen
Poster: Novel findings on the genome-wide correlation of chromatin marks and CpGs
Lydia Steiner, Sonja Prohaska, Jörg Galle
Poster, poster advertisement
25th TBI Winterseminar in Bled, 4th Annual Meeting of the Bompfünewerer Consortium
14.02.2010-21.02.2010, Bled, Slovenia
Talk: An Example for Chromatin Regulation
Lydia Steiner
7. Herbstseminar 2009 Vysoka Lipa (Decin)
21.10.2009 - 25.10.2009, Vysoka Lipa, Decin
Talk: www - world wide words
Lydia Steiner
6. Herbstseminar 2008 Studeny
31.10.2008 - 04.11.2008, Ceska Kamenice
Talk: something about languages
Lydia Steiner

Contact

Lydia Müller

Natural Language Processing Group
Department of Computer Science
University of Leipzig

phone: +49 (0)341 97 32315
email:

are-lydia-xya34[at]ddks-bioinf.uni-leipzig.de

address
Automatische Sprachverarbeitung
Institut für Informatik
Universität Leipzig
Augustusplatz 10
04109 Leipzig

postal address
Automatische Sprachverarbeitung
Institut für Informatik
Universität Leipzig
PF 100920
04009 Leipzig