Research Interests


Bioinformatics, in particular the development of data analysis methods, has held my attention since I began studying computer science. The possibility of gaining insight into large data sets with computational methods was and still is fascinating to me. My research interests have therefore always included aspects of data analysis such as statistical methods and, since last year, visualization techniques. I discovered that many biological data sets are publicly available and that many projects aim to produce even more data. I therefore became interested in the analysis of such data and connected it with my interest in epigenetics during my PhD. I continued and extended this research during my PostDoc.


Epigenetics is a relatively new research area that benefits greatly from new sequencing techniques and ChIP. The data produced by combining both techniques enable scientists to identify the genome-wide distribution of epigenetic marks and DNA-bound proteins and to infer the function of these marks as well as regulatory mechanisms. Furthermore, connections to transcriptome data are possible using RNA-seq techniques.

In my research, I am particularly interested in the quantitative analysis of such high-throughput data to learn more about how patterns of epigenetic marks change during differentiation and development. Furthermore, I think that the analysis of such data allows conclusions and predictions about the regulatory mechanisms that initiate and control both processes. A global understanding of epigenetic regulation may help to understand the nature of diseases associated with environmental factors and dysregulation during development.

My current research additionally includes the modeling of epigenetic processes. I am particularly interested in epigenetic memory and cell fate. Current knowledge of epigenetic mechanisms already shows the importance of epigenetic regulation during cell development, and data analysis shows that cell fate and identity strongly depend on the ability of the cell to retain epigenetic patterns throughout cell division, the so-called epigenetic memory. However, knowledge about the detailed mechanisms and their evolutionary conservation is currently lacking. Furthermore, an analysis of the stability and dependencies of epigenetic memory is currently not possible in detail. With my model, I tackle the last two questions.



There are several similarities between Linguistics and Bioinformatics. For example, both fields analyze strings. In Bioinformatics, these strings are relatively long, are composed of only a few different characters, and their structure is hard to determine. In Linguistics, by contrast, the strings consist of many different characters and, compared to biological sequences, are very short and highly structured.

In contrast to Linguistics, Bioinformatics is characterized by a strong interaction between biologists and computer scientists. As a result, several methods exist for tasks such as comparing sequences, finding homologous sequences, or reconstructing trees. For the same tasks in Linguistics, only a few programs exist and a lot of work is done by hand. In my diploma thesis project, I tried to apply bioinformatics methods to linguistic data to find cognates in data sets with at least 1000 words in at least 3 languages. After my diploma thesis, I continued this project to optimize and improve the models and algorithms for language data. More importantly, there are several problems where Linguistics and Computer Science/Bioinformatics can benefit from cooperation. Thus, I am also interested in the field of computational Humanities and especially computational Linguistics.


Epigenetics and Bioinformatics

Epigenetics was the research topic of my PhD thesis. The main focus lay on understanding the dynamics of epigenetic states and mechanisms, especially those related to aging and differentiation. I continued working in the field of epigenetics during my PostDoc. I am still analyzing epigenetic states and their dynamics but have extended my research to the modeling of epigenetic memory and to the protein components responsible for writing, erasing, and reading histone modifications.

Currently, it is unknown which specific modifications are required to define a particular cell type and how much variation or fluctuation of histone and DNA modifications a cell can compensate for without changing its epigenetic state. Thus, one main problem is to find comparison criteria for epigenetic states. These criteria may be found by comparing the epigenetic states of different cell types in different environments.

The chromatin of a cell is a dynamic system that changes slightly over time but remains in a stable state. Modeling its behavior enables us to test different hypotheses, such as histone distribution strategies during cell division or reprogramming strategies, and to find out how likely these hypotheses are. Furthermore, by modeling epigenetic memory, I tackle the questions of how many different stable epigenetic states may exist at the same time, how much noise such a system can tolerate, and which components of the cell play a key role in epigenetic regulation and cell identity.

During my PhD, I was part of an interdisciplinary project combining data analysis and modeling. Our model describes histone modifications on the basis of interaction complexes which may bind to DNA and/or histones and write modifications there. Demodification of the histones occurs at a constant rate, independent of any interaction complex. We fit our model to the Polycomb (PcG)/Trithorax (Trx) system, in which PcG writes H3K27me3 (a repressing mark) and Trx writes H3K4me3 (an activating mark). Modifications are also lost during cell division, where they are diluted because modified histones are randomly distributed to both daughter cells. With this model, we can simulate proliferation-induced cell differentiation.
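The ingredients named above (writing by a bound complex, demodification at a constant rate, and dilution at cell division) can be illustrated with a small toy simulation. This is only a minimal sketch under simplifying assumptions and not the published model: it tracks a single mark on a row of nucleosomes, it assumes that the writing probability grows with the fraction of already modified nucleosomes, and all function names, parameters, and rate values are hypothetical.

```python
import random


def divide(state, rng):
    """Cell division: each modified histone stays in this daughter with probability 1/2."""
    return [modified and rng.random() < 0.5 for modified in state]


def simulate(n=100, steps_per_cycle=50, cycles=5,
             write_rate=0.2, erase_rate=0.02, seed=0):
    """Return the modified fraction at the end of each cell cycle (before division)."""
    rng = random.Random(seed)
    state = [True] * n  # start from a fully modified domain
    history = []
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            frac = sum(state) / n
            new_state = []
            for modified in state:
                if modified:
                    # demodification at a constant rate, independent of any complex
                    new_state.append(rng.random() >= erase_rate)
                else:
                    # writing; ASSUMPTION: recruitment scales with existing marks
                    new_state.append(rng.random() < write_rate * frac)
            state = new_state
        history.append(sum(state) / n)
        state = divide(state, rng)  # dilution of the marks
    return history


if __name__ == "__main__":
    print(simulate())
```

Varying `write_rate`, `erase_rate`, or `steps_per_cycle` in such a sketch shows whether a modified domain recovers between divisions or is gradually diluted away, which is the kind of question the model is meant to address.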

I am working on a model for epigenetic memory that allows simulations of cell division and analysis of the stability of epigenetic states under a variety of conditions. I based the model on knowledge gathered in a literature review on epigenetic inheritance and regulation. Simulations with or without particular components of the regulatory system of histone modifications can be performed, enabling conclusions on the importance of those components or on alternative regulatory mechanisms. Furthermore, studying the requirements, accuracy, and natural limitations of epigenetic regulation due to cell division is possible and is one aim of my research.

The availability of large data sets measuring epigenetic modifications and transcriptomes motivates me to perform analyses that support models of epigenetics and also generate testable hypotheses about epigenetic patterns and mechanisms of regulation. In my opinion, such large data sets allow us to gain insights into epigenetic regulation.

The increasing number of high-throughput data sets of different kinds requires the development of new methods to process them in a meaningful manner. This motivated me to develop a new peak caller for ChIP-seq data which makes use of replicates. It is robust against noise and can handle replicates from different batches or laboratories. Furthermore, it allows quality control during the peak calling process. I designed a benchmark data set which allows us to show that our method is substantially better than existing methods. Using data sets from the Roadmap Epigenomics Project, my co-authors and I could show that the peak calls of our new peak caller make more sense from a biological point of view.
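To illustrate what making use of replicates can mean in practice, the following minimal sketch pools per-window evidence from several replicates with the weighted inverse normal (Stouffer) method. It is a generic illustration only and not the implementation of the peak caller: it assumes that one-sided p-values per genomic window are already available for each replicate, and the function names, the weights, and the threshold `alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combine_pvalues(pvalues, weights=None):
    """Combine one-sided p-values with the weighted inverse normal method."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    # clip so that inv_cdf never sees exactly 0 or 1
    clipped = [min(max(p, 1e-300), 1.0 - 1e-16) for p in pvalues]
    # small p-values map to large positive z-scores
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in clipped]
    z = sum(w * s for w, s in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    return _STD_NORMAL.cdf(-z)


def call_peaks(windows, weights=None, alpha=1e-5):
    """Return indices of windows whose combined p-value is below alpha.

    `windows` is a list with one entry per genomic window; each entry holds
    the p-values of that window in all replicates.
    """
    return [i for i, pvals in enumerate(windows)
            if combine_pvalues(pvals, weights) < alpha]


if __name__ == "__main__":
    example = [[1e-6, 1e-4, 1e-5], [0.4, 0.6, 0.5], [1e-8, 0.9, 0.8]]
    print(call_peaks(example))
```

Weights offer one simple way to down-weight a noisy replicate, and a window supported by only a single replicate is not automatically called, which is the kind of robustness replicates are meant to provide.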

I take part in a collaboration with the Image and Signal Processing group. With this collaboration, we aim to develop exploration and visualization tools for large chromatin data sets. A first approach enables us to compare different modifications in different cell types without prior knowledge of the data itself. It therefore helps to formulate hypotheses. We continued this collaboration with the help of several master's students and one PhD student. As a result, we recently submitted our second joint paper to BioVis 2016.

Histone modifications and simulations are not the only sources of information on epigenetic regulation. Long non-coding RNAs (lncRNAs) also play an important role in the regulation of developmental processes. It has been shown that many of them interact with chromatin modifiers and guide them to specific positions in the genome by complementary binding. Not much is known about the interaction of lncRNAs and chromatin modifiers. A start-up grant for female researchers was awarded to me to investigate the binding sites of PRC2 with lncRNAs. The main aims are (1) to find binding motifs for PRC2 and (2) to develop a method to detect binding sites based on genomic conservation, structure predictions, and CLIP-seq data.

I am involved in a project on genome evolution. The main aim here is to enable the comparison of genome-associated data (such as transcriptomic or epigenetic data) from different species. The comparison is based on a constructed supergenome onto which the annotations and data sets of the different species are mapped.


While writing my diploma thesis, I explored the similarities between linguistics and computer science/bioinformatics. There are many parallels in the methods of both fields, mainly emerging from the fact that in both fields the main objects, either language or DNA/RNA/protein sequences, are represented as strings. While the details differ greatly, the basic algorithms and ideas are largely the same.

In my diploma thesis, I showed that bioinformatics methods such as pairwise and multiple alignments, phylogenetic algorithms, and clustering methods can be used to find words which originate from the same ancestral word. I based my pipeline on the workflow known as the comparative method in historical linguistics. Much of this workflow is very similar to the typical way of discovering homologous sequences in bioinformatics. Thus, I could use some strategies originating from homology detection.
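As an illustration of how a pairwise alignment from bioinformatics carries over to words, the following minimal sketch computes a global alignment score (Needleman-Wunsch) between two words. It is not the pipeline from the thesis: the simple match/mismatch/gap scores on plain letters are an assumption made for brevity, whereas a realistic cognate detection would score sound correspondences, and the function names are hypothetical.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch) of two words."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Alignment score divided by the length of the longer word (1.0 = identical)."""
    longest = max(len(a), len(b))
    return align_score(a, b) / longest if longest else 1.0


if __name__ == "__main__":
    for pair in [("night", "nacht"), ("water", "wasser"), ("night", "water")]:
        print(pair, similarity(*pair))
```

Word pairs with a high similarity are candidates for cognates; clustering such pairwise scores across languages is then analogous to grouping homologous sequences.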

Currently, we are working on a more sophisticated approach using bigrams to increase sensitivity and accuracy. We collaborate with Dr. Christian Höner zu Siederdissen at the TBI, Vienna, Austria, who provides a framework for fast implementations of grammar products and supports us in building alignment programs for bigram alignments. Furthermore, a collaboration with Prof. Tanmoy Bhattacharya at the SFI, Santa Fe, NM, USA, is planned, focusing on cognate detection and proto-language reconstruction.

Since large data sets are rare in historical linguistics, I am furthermore interested in generating such data sets using computational approaches. Such a large data set would not only provide more statistical power for the detection of cognates and proto-language reconstruction but also enable reliable loan word detection. It will likely be possible to determine the strata of the detected loan words and thus enable large-scale analyses of contacts between languages.

Curriculum Vitae

I studied computer science at the University of Leipzig from October 2004 to September 2009. During my studies, I was a student assistant in the Natural Language Processing group and in the Bioinformatics group, both at the University of Leipzig. I also visited the Max Planck Institute for Mathematics in the Sciences for an internship.

After completing my studies in 2009, I became a PhD student in the Bioinformatics Group and the IZBI. With the creation of the Junior Professorship for Computational EvoDevo, I joined this group and was funded by the MAGE Project, which is affiliated with the IZBI. My PhD thesis topic was epigenetic regulation and aging. Nevertheless, I also continued to work on computational linguistics.

In June 2013, I became a PostDoc at the Bioinformatics group, Junior Professorship for Computational EvoDevo, IZBI, and Wisconsin Institute for Discovery. I worked on the "Origin of regulation" project funded by the Templeton Foundation.

From January 2015 to August 2016, I was a PostDoc at the Bioinformatics group and analyzed RNA-Seq data. Additionally, I remained highly interested in exploring epigenetic data sets to understand the underlying dynamics and regulatory mechanisms. I was furthermore interested in the development of methods to analyze high-throughput sequencing data. I also kept working on the evolution of natural languages.

Since April 2016, I have been supervising a PhD student working on the interaction of lncRNAs with chromatin modifiers. This research is made possible by a start-up grant for female researchers awarded to me.

In August 2016, I moved to the Natural Language Processing Group at the University of Leipzig. As part of the CLARIN-D Project, I am re-engineering the ASV Toolbox, a modular collection of NLP tools which will be available online. It provides a tool set that allows researchers to execute NLP tasks on written language records. Since it is available online, it also allows students to explore NLP tasks and to understand the algorithms used in the field by playing with their own data. Apart from the ASV Toolbox, I work on disentangling the history of languages and words computationally, and I also keep track of bioinformatics methods for genome evolution and the analysis of high-throughput sequencing data.

Publications and Conferences

  • Design specifications for cellular regulation
    David C. Krakauer, Lydia Müller, Sonja J. Prohaska, Peter F. Stadler
    Theory in Biosciences
  • Sierra platinum: a fast and robust peak-caller for replicated ChIP-seq experiments with visual quality-control and -steering
    Lydia Müller, Daniel Gerighausen, Mariam Farman, Dirk Zeckzer
    BMC Bioinformatics
  • Dual RNA-seq unveils noncoding RNA functions in host-pathogen interactions
    Alexander J. Westermann, Konrad U. Förster, Fabian Amman, Lars Barquist, Yanjie Chao, Leon N. Schulte, Lydia Müller, Richard Reinhardt, Peter F. Stadler, Jörg Vogel
  • The p53-p21-DREAM-CDE/CHR pathway regulates G2/M cell cycle genes
    Martin Fischer, Marianne Quaas, Lydia Steiner, Kurt Engeland
    Nucl. Acids Res.
  • The ancestor of modern Holozoa acquired the CCA-adding enzyme from Alphaproteobacteria by horizontal gene transfer
    Heike Betat, Tobias Mede, Sandy Tretbar, Lydia Steiner, Peter F. Stadler, Mario Mörl, Sonja J. Prohaska
    Nucl. Acids Res.
  • The transcription factor p53: Not a repressor, solely an activator
    Martin Fischer, Lydia Steiner, Kurt Engeland
    Cell Cycle
  • Analyzing Chromatin Using Tiled Binned Scatterplot Matrices
    Dirk Zeckzer, Daniel Gerighausen, Lydia Steiner, Sonja J. Prohaska
    BioVis 2014 conference in Boston, USA
  • The Dynamic Epigenome --- Analysis of the Distribution of Histone Modifications
    Lydia Steiner
    Dissertation published on Qucosa
  • Pitfalls of Ascertainment Biases in Genome Annotations --- Computing Comparable Protein Domain Distributions in Eukarya
    Arli A. Parikesit, Lydia Steiner, Peter F. Stadler, Sonja J. Prohaska
    Malaysian Journal of Fundamental and Applied Sciences
  • Transcriptional regulation by histone modifications: towards a theory of chromatin re-organization during stem cell differentiation
    Hans Binder, Lydia Steiner, Thimo Rohlf, Sonja Prohaska, Jörg Galle
    Physical Biology
  • A Global Genome Segmentation Method for Exploration of Epigenetic Patterns
    Lydia Steiner, Lydia Hopp, Henry Wirth, Jörg Galle, Hans Binder, Sonja J. Prohaska, Thimo Rohlf
  • Modeling the dynamic epigenome: from histone modifications towards self-organizing chromatin
    Thimo Rohlf, Lydia Steiner, Jens Przybilla, Sonja Prohaska, Hans Binder, Jörg Galle
    2012, Epigenomics
  • A Pipeline for Computational Historical Linguistics
    Lydia Steiner, Peter F. Stadler, Michael Cysouw
    2011, Language Dynamics and Change
  • Proteinortho: Detection of (Co-)Orthologs in Large-Scale Analysis
    Marcus Lechner, Sven Findeiß, Lydia Steiner, Manja Marz, Peter F. Stadler, Sonja J. Prohaska
    2011, BMC Bioinformatics

Conferences and Seminars

  • Symposium Environmental Genomics in Aquatic Systems: Current State and Future Perspectives
    23.09.2016, Limnological Institute, University of Konstanz
    Invited Talk: Studying Host-Pathogen-Interactions using dual RNA-seq
  • SFI Working Group: Lexical Semantic Networks and Language Change
    Talk about distributional semantics across language borders.
    Talk about our model for epigenetic memory
    Talk about the ongoing research on historical linguistics in bioinformatics group
  • Epigenetics Europe 2011
    08.09.11-09.09.11, Hotel Holiday Inn Munich City Centre, Munich
    Poster: Visualizing the Dynamic Epigenome
    Lydia Steiner, Thimo Rohlf, Jörg Galle, Hans Binder, Lydia Hopp, Henry Wirth, Sonja Prohaska
  • Fourth Weißenburg Symposium "Epigenetics and the Control of Gene Expression"
    20.06.11 - 22.06.11, Kulturzentrum Karmeliterkirche, Weißenburg (Bayern)
  • CITEC Workshop on Evolution of Human Language
    28.04.2011-29.04.2011, Center of Excellence Cognitive Interaction Technology, University of Bielefeld, Bielefeld
    Talk: A Pipeline for Computational Historical Linguistics
    Lydia Steiner, Peter F. Stadler, Michael Cysouw
    Abstract: see the article with the same title
  • 14th Annual Workshop on American Indigenous Languages
    15.04.2011-16.04.2011, Student Resources Building, UCSB, Santa Barbara, CA
    Talk: Assisted reconstruction: The cases of Panoan and Mataco-Guiacuruan
    Lydia Steiner, Michael Cysouw
  • 26th TBI Winterseminar in Bled, 5th Annual Meeting of the Bompfünewerer Consortium
    13.02.2011-20.02.2011, Bled, Slovenia
    Talk: Identify Homologous Words
    Lydia Steiner
  • 8. Herbstseminar 2010 Vysoka Lipa (Decin)
    05.10.2010 - 10.10.2010, Vysoka Lipa, Decin
    Talk: Tracing Histone Modifications
    Lydia Steiner
  • INRIA-IZBI-Workshop 2010
    01.09.2011, IZBI, Leipzig
    Talk: Models of Epigenetic Regulation: Histone Modifications - part I
    Lydia Steiner
  • Transcription, chromatin structure and DNA repair in development and differentiation
    07.07.2010-10.07.2010, Zeche Zollverein, Essen
    Poster: Novel findings on the genome-wide correlation of chromatin marks and CpGs
    Lydia Steiner, Sonja Prohaska, Jörg Galle
  • 25th TBI Winterseminar in Bled, 4th Annual Meeting of the Bompfünewerer Consortium
    14.02.2010-21.02.2010, Bled, Slovenia
    Talk: An Example for Chromatin Regulation
    Lydia Steiner
  • 7. Herbstseminar 2009 Vysoka Lipa (Decin)
    21.10.2009 - 25.10.2009, Vysoka Lipa, Decin
    Talk: www - world wide words
    Lydia Steiner
  • 6. Herbstseminar 2008 Studeny
    31.10.2008 - 04.11.2008, Ceska Kamenice
    Talk: something about languages
    Lydia Steiner


Lydia Müller

Natural Language Processing Group
Department of Computer Science
University of Leipzig

phone: +49 (0)341 97 32315

Automatische Sprachverarbeitung
Institut für Informatik
Universität Leipzig
Augustusplatz 10
04109 Leipzig

postal address
Automatische Sprachverarbeitung
Institut für Informatik
Universität Leipzig
PF 100920
04009 Leipzig