Research Interests
Bioinformatics
Bioinformatics, in particular the development of data analysis methods, has held my attention since I began studying computer science. The possibility of gaining insight into large data sets with computational methods was and remains fascinating to me. My research interests have therefore always included aspects of data analysis such as statistical methods and, since last year, also visualization techniques. I discovered that many biological data sets are publicly available and that many projects aim to produce even more data. I therefore became interested in analyzing such data and connected this with my interest in epigenetics during my PhD. I continued and extended this research during my PostDoc.
Epigenetics
Epigenetics is a relatively new research area that benefits greatly from new sequencing techniques and chromatin immunoprecipitation (ChIP). The data produced by combining both techniques enable scientists to identify the genome-wide distribution of epigenetic marks and DNA-bound proteins and to infer the function of these marks as well as regulatory mechanisms. Furthermore, RNA-seq makes it possible to connect these data to transcriptome data.
In my research, I am particularly interested in the quantitative analysis of such high-throughput data to learn more about how patterns of epigenetic marks change during differentiation and development. I also think that analyzing such data allows conclusions and predictions about the regulatory mechanisms that initiate and control both processes. A global understanding of epigenetic regulation may help to understand the nature of diseases associated with environmental factors and dysregulation during development.
My current research additionally includes the modeling of epigenetic processes. I am particularly interested in epigenetic memory and cell fate. Current knowledge of epigenetic mechanisms already shows the importance of epigenetic regulation during cell development, and data analysis shows that cell fate and identity strongly depend on the ability of the cell to retain epigenetic patterns throughout cell division, that is, on epigenetic memory. However, knowledge about the detailed mechanisms and their evolutionary conservation is currently lacking. Furthermore, a detailed analysis of the stability and dependencies of epigenetic memory is not yet possible. With my model, I tackle the last two questions.
Linguistics
There are several similarities between linguistics and bioinformatics. For example, both fields analyze strings. In bioinformatics, these strings are relatively long, are composed of only a few different characters, and have a structure that is hard to determine. In linguistics, the strings consist of many different characters and are, compared to biological sequences, very short and highly structured.
In biology, unlike in linguistics, there is strong interaction with computer scientists. As a result, several methods exist for tasks such as comparing sequences, finding homologous sequences, or reconstructing trees. For the same tasks in linguistics, only a few programs exist and much of the work is done by hand. In my diploma thesis project, I tried to apply bioinformatics methods to linguistic data to find cognates in data sets with at least 1000 words in at least 3 languages. Since my diploma thesis, I have continued this project to optimize and improve the models and algorithms for language data. More importantly, there are several problems where linguistics and computer science/bioinformatics can benefit from cooperation. Thus, I am also interested in the field of computational humanities and especially computational linguistics.
Epigenetics and Bioinformatics
Epigenetics was the research topic of my PhD thesis. The main focus lay on understanding the dynamics of epigenetic states and mechanisms, especially those related to aging and differentiation. I continued working in the field of epigenetics during my PostDoc. I am still analyzing epigenetic states and their dynamics but have extended my research to the modeling of epigenetic memory and to the protein components responsible for writing, erasing, and reading histone modifications.
Currently, it is unknown which specific modifications are required to define a particular cell type and how much variation or fluctuation in histone and DNA modifications a cell can compensate for without changing its epigenetic state. Thus, one main problem is to find comparison criteria for epigenetic states. These criteria may be found by comparing the epigenetic states of different cell types in different environments.
The chromatin of a cell is a dynamic system that changes slightly over time but remains in a stable state. Modeling its behavior enables us to test different hypotheses, such as histone distribution strategies during cell division or reprogramming strategies, and to find out how likely these hypotheses are. Furthermore, by modeling epigenetic memory, I tackle the questions of how many different stable epigenetic states may exist at the same time, how much noise such a system can tolerate, and which components of the cell play a key role in epigenetic regulation and cell identity.
During my PhD, I was part of an interdisciplinary project combining data analysis and modeling. Our model describes histone modifications on the basis of interaction complexes, which may bind to DNA and/or histones and write modifications there. Demodification of the histones occurs at a constant rate independent of an interaction complex. We fitted our model to the Polycomb (PcG)/Trithorax (Trx) system, in which PcG writes H3K27me3 (a repressing mark) and Trx writes H3K4me3 (an activating mark). Demodification also occurs during cell division, where the modifications are diluted (modified histones are randomly distributed to both daughter cells). With this model, we can simulate proliferation-induced cell differentiation.
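The dilution and rewriting steps described above can be illustrated with a small toy simulation. This is only a sketch under strong simplifying assumptions and not the published model: it tracks a single mark on a row of nucleosomes, and the function names (dilute, cell_cycle, simulate) as well as the parameters write_rate, feedback, and erase_rate are hypothetical choices made for illustration.

```python
import random

def dilute(state, rng):
    """Cell division: each modified parental histone ends up in this daughter
    with probability 1/2; otherwise the position receives a new, unmodified
    histone. Unmodified positions stay unmodified."""
    return [m and rng.random() < 0.5 for m in state]

def cell_cycle(state, rng, steps, write_rate, feedback, erase_rate):
    """Between divisions: unmodified nucleosomes are modified with a
    probability that grows with the fraction of already modified nucleosomes
    (an assumed stand-in for recruitment of the writing complex), and modified
    nucleosomes lose their mark at a constant rate."""
    state = list(state)
    for _ in range(steps):
        fraction = sum(state) / len(state)
        p_write = min(1.0, write_rate + feedback * fraction)
        state = [
            (rng.random() >= erase_rate) if m else (rng.random() < p_write)
            for m in state
        ]
    return state

def simulate(n=200, generations=10, steps=20,
             write_rate=0.0, feedback=0.3, erase_rate=0.01, seed=0):
    """Start from a fully modified region and return the fraction of modified
    nucleosomes at the end of each generation."""
    rng = random.Random(seed)
    state = [True] * n
    fractions = []
    for _ in range(generations):
        state = dilute(state, rng)
        state = cell_cycle(state, rng, steps, write_rate, feedback, erase_rate)
        fractions.append(sum(state) / n)
    return fractions

if __name__ == "__main__":
    # Compare a region with and without positive feedback in the writing step.
    print("with feedback:   ", simulate(feedback=0.3))
    print("without feedback:", simulate(feedback=0.0))
```

With feedback set to zero and no basal writing, the mark can only be lost, so the modified fraction never increases from one generation to the next. Whether a given feedback strength is enough to restore the mark after each division depends on all parameters and is exactly the kind of question such a simulation is meant to explore.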
I am working on a model for epigenetic memory that allows simulations of cell division and analysis of the stability of epigenetic states under a variety of conditions. I based the model on knowledge gathered in a literature review on epigenetic inheritance and regulation. Simulations with or without particular components of the regulatory system of histone modifications can be performed, enabling conclusions about the importance of those components or about alternative regulatory mechanisms. Furthermore, the model makes it possible to study the requirements, accuracy, and natural limitations of epigenetic regulation due to cell division, which is one aim of my research.
The availability of large data sets measuring epigenetic modifications and transcriptomic data motivates me to perform analyses that support models of epigenetics and also generate testable hypotheses about epigenetic patterns and mechanisms of regulation. In my opinion, such large data sets allow us to gain insights into epigenetic regulation.
The increasing number of high-throughput data sets of different kinds requires the development of new methods to process them in a meaningful manner. This motivated me to develop a new peak caller for ChIP-seq data that makes use of replicates. It is robust against noise and can handle replicates from different batches or laboratories. Furthermore, it allows quality control during the peak calling process. I designed a benchmark data set which allows us to show that our method is substantially better than existing methods. Using data sets from the Roadmap Epigenomics Project, my co-authors and I could show that the peak calls of our new peak caller make more sense from a biological point of view.
I take part in a collaboration with the Image and Signal Processing group. With this collaboration, we aim to develop exploration and visualization tools for large chromatin data sets. A first approach enables us to compare different modifications in different cell types without prior knowledge of the data itself. It therefore helps to formulate hypotheses. We continued this collaboration with the help of several master's students and one PhD student. As a result, we recently submitted our second joint paper to BioVis 2016.
Histone modifications and simulations are not the only sources of information on epigenetic regulation. Long non-coding RNAs (lncRNAs) also play an important role in the regulation of developmental processes. It has been shown that many of them interact with chromatin modifiers and guide them to specific positions in the genome by complementary binding. Not much is known about the interaction of lncRNAs and chromatin modifiers. A female start-up fund was granted to me to investigate the binding sites of PRC2 with lncRNAs. The main aims are (1) to find binding motifs for PRC2 and (2) to develop a method to detect binding sites based on genomic conservation, structure predictions, and CLIP-seq data.
I am involved in a project on genome evolution. The main aim is to enable the comparison of genome-associated data (such as transcriptomic or epigenetic data) across different species. The comparison is based on a constructed supergenome onto which the annotations and data sets of the different species are mapped.
Linguistics
While writing my diploma thesis, I explored the similarities between linguistics, computer science, and bioinformatics. There are many parallels in the methods of linguistics and bioinformatics, mainly emerging from the fact that in both fields the main objects, either language or DNA/RNA/protein sequences, are represented as strings. While the details differ greatly, the basic algorithms and ideas are largely the same.
In my diploma thesis, I showed that bioinformatics methods such as pairwise and multiple alignments, phylogenetic algorithms, and clustering methods can be used to find words that originate from the same ancestral word. I based my pipeline on the workflow known as the comparative method in historical linguistics. Much of this workflow is very similar to the typical way of discovering homologous sequences in bioinformatics. Thus, I could use some strategies originating from homology detection.
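To make the analogy to sequence comparison concrete, the following sketch aligns two words with the standard Needleman-Wunsch global alignment algorithm. It is a minimal illustration and not the pipeline from the thesis: the function name align and the scoring values (match, mismatch, gap) are assumptions chosen for simplicity, whereas a realistic scoring scheme for sounds would have to be linguistically informed.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b (Needleman-Wunsch).
    Returns (score, aligned_a, aligned_b), with '-' marking gaps."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    # German "nacht" and English "night" share three of five characters.
    print(align("nacht", "night"))
```

In a cognate detection setting, such pairwise scores would be computed for word pairs with the same meaning across languages and then passed on to clustering, in the same way that pairwise similarity scores feed homology detection in bioinformatics.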
Currently, we are working on a more sophisticated approach using bigrams to increase sensitivity and accuracy. We collaborate with Dr. Christian Höner zu Siederdissen at the TBI, Vienna, Austria, who provides a framework for fast implementations of grammar products and supports us in building alignment programs for bigram alignments. Furthermore, a collaboration with Prof. Tanmoy Bhattacharya at the SFI, Santa Fe, NM, USA, is planned, focusing on cognate detection and proto-language reconstruction.
Since large data sets are rare in historical linguistics, I am furthermore interested in generating such data sets using computational approaches. Such a large data set would not only provide more statistical power for the detection of cognates and proto-language reconstruction but also enable reliable loan word detection. It will likely be possible to determine the strata of the detected loan words and thus enable large-scale analyses of contacts between languages.
Curriculum Vitae
I studied computer science at the University of Leipzig from October 2004 to September 2009. During my studies, I was a student assistant in the Natural Language Processing group and in the Bioinformatics group, both at the University of Leipzig. I also visited the Max Planck Institute for Mathematics in the Sciences for an internship.
After completing my studies in 2009, I became a PhD student in the Bioinformatics Group and at the IZBI. With the creation of the Junior Professorship for Computational EvoDevo, I joined this group and was funded by the MAGE Project, which is affiliated with the IZBI. My PhD thesis was about epigenetic regulation and aging. Nevertheless, I also continued to work on computational linguistics.
In June 2013, I became a PostDoc at the Bioinformatics group, Junior Professorship for Computational EvoDevo, IZBI, and the Wisconsin Institute for Discovery. I worked on the project "Origin of regulation", funded by the Templeton Foundation.
From January 2015 to August 2016, I was a PostDoc in the Bioinformatics group and analyzed RNA-Seq data. Additionally, I remained highly interested in exploring epigenetic data sets to understand the underlying dynamics and regulatory mechanisms. I was furthermore interested in the development of methods to analyze high-throughput sequencing data. I also kept working on the evolution of natural languages.
Since April 2016, I have been supervising a PhD student working on the interaction of lncRNAs with chromatin modifiers. This research is made possible by a female start-up fund granted to me.
In August 2016, I moved to the Natural Language Processing Group at the University of Leipzig. As part of the CLARIN-D Project, I am re-engineering the ASV Toolbox, a modular collection of NLP tools which will be available online. It provides researchers with a tool set for executing NLP tasks on written language records. Since it will be available online, it also allows students to explore NLP tasks and to understand the algorithms used in the field by playing with their own data. Apart from the ASV Toolbox, I work on disentangling the history of languages and words computationally, and I also keep track of bioinformatics methods for genome evolution and the analysis of high-throughput sequencing data.
Publications and Conferences
Publications
- Design specifications for cellular regulation. David C. Krakauer, Lydia Müller, Sonja J. Prohaska, Peter F. Stadler. Theory in Biosciences.
- Sierra platinum: a fast and robust peak-caller for replicated ChIP-seq experiments with visual quality-control and -steering. Lydia Müller, Daniel Gerighausen, Mariam Farman, Dirk Zeckzer. 2016. BMC Bioinformatics.
- Dual RNA-seq unveils noncoding RNA functions in host-pathogen interactions. Alexander J. Westermann, Konrad U. Förster, Fabian Amman, Lars Barquist, Yanjie Chao, Leon N. Schulte, Lydia Müller, Richard Reinhardt, Peter F. Stadler, Jörg Vogel. 2016. Nature.
- The p53-p21-DREAM-CDE/CHR pathway regulates G2/M cell cycle genes. Martin Fischer, Marianne Quaas, Lydia Steiner, Kurt Engeland. 2015. Nucl. Acids Res.
- The ancestor of modern Holozoa acquired the CCA-adding enzyme from Alphaproteobacteria by horizontal gene transfer. Heike Betat, Tobias Mede, Sandy Tretbar, Lydia Steiner, Peter F. Stadler, Mario Mörl, Sonja J. Prohaska. 2015. Nucl. Acids Res.
- The transcription factor p53: Not a repressor, solely an activator. Martin Fischer, Lydia Steiner, Kurt Engeland. 2014. Cell Cycle.
- Analyzing Chromatin Using Tiled Binned Scatterplot Matrices. Dirk Zeckzer, Daniel Gerighausen, Lydia Steiner, Sonja J. Prohaska. 2014. BioVis 2014 conference in Boston, USA.
- The Dynamic Epigenome --- Analysis of the Distribution of Histone Modifications. Lydia Steiner. 2013. Dissertation published on Qucosa.
- Pitfalls of Ascertainment Biases in Genome Annotations --- Computing Comparable Protein Domain Distributions in Eukarya. Arli A. Parikesit, Lydia Steiner, Peter F. Stadler, Sonja J. Prohaska. 2013. Malaysian Journal of Fundamental and Applied Sciences.
- Transcriptional regulation by histone modifications: towards a theory of chromatin re-organization during stem cell differentiation. Hans Binder, Lydia Steiner, Thimo Rohlf, Sonja Prohaska, Jörg Galle. 2013. Physical Biology.
- A Global Genome Segmentation Method for Exploration of Epigenetic Patterns. Lydia Steiner, Lydia Hopp, Henry Wirth, Jörg Galle, Hans Binder, Sonja J. Prohaska, Thimo Rohlf. 2012. PLoS ONE.
- Modeling the dynamic epigenome: from histone modifications towards self-organizing chromatin. Thimo Rohlf, Lydia Steiner, Jens Przybilla, Sonja Prohaska, Hans Binder, Jörg Galle. 2012. Epigenomics.
- A Pipeline for Computational Historical Linguistics. Lydia Steiner, Peter F. Stadler, Michael Cysouw. 2011. Language Dynamics and Change.
- Proteinortho: Detection of (Co-)Orthologs in Large-Scale Analysis. Marcus Lechner, Sven Findeiß, Lydia Steiner, Manja Marz, Peter F. Stadler, Sonja J. Prohaska. 2011. BMC Bioinformatics.
Conferences and Seminars
- Symposium Environmental Genomics in Aquatic Systems: Current State and Future Perspectives. 23.09.2016, Limnological Institute, University of Konstanz. Invited Talk: Studying Host-Pathogen-Interactions using dual RNA-seq.
- SFI Working Group: Lexical Semantic Networks and Language Change. 17.03.2016-18.03.2016. Talk about distributional semantics across language borders.
- SFI Working Group: The Logic and Dynamics of Epigenetic Regulation. 17.03.2014-21.03.2014. Talk about our model for epigenetic memory.
- SFI Working Group: Historical Linguistics: Processes, Inference, and Reconstruction. 13.03.2014-14.03.2014. Talk about the ongoing research on historical linguistics in the bioinformatics group.
- Epigenetics Europe 2011. 08.09.11-09.09.11, Hotel Holiday Inn Munich City Centre, Munich. Poster: Visualizing the Dynamic Epigenome. Lydia Steiner, Thimo Rohlf, Jörg Galle, Hans Binder, Lydia Hopp, Henry Wirth, Sonja Prohaska.
- Fourth Weißenburg Symposium "Epigenetics and the Control of Gene Expression". 20.06.11-22.06.11, Kulturzentrum Karmeliterkirche, Weißenburg (Bayern).
- CITEC Workshop on Evolution of Human Language. 28.04.2011-29.04.2011, Center of Excellence Cognitive Interaction Technology, University of Bielefeld, Bielefeld. Talk: A Pipeline for Computational Historical Linguistics. Lydia Steiner, Peter F. Stadler, Michael Cysouw. Abstract: see the article with the same title.
- 14th Annual Workshop on American Indigenous Languages. 15.04.2011-16.04.2011, Student Resources Building, UCSB, Santa Barbara, CA. Talk: Assisted reconstruction: The cases of Panoan and Mataco-Guiacuruan. Lydia Steiner, Michael Cysouw.
- 26th TBI Winterseminar in Bled, 5th Annual Meeting of the Bompfünewerer Consortium. 13.02.2011-20.02.2011, Bled, Slovenia. Talk: Identify Homologous Words. Lydia Steiner.
- 8. Herbstseminar 2010 Vysoka Lipa (Decin). 05.10.2010-10.10.2010, Vysoka Lipa, Decin. Talk: Tracing Histone Modifications. Lydia Steiner.
- INRIA-IZBI-Workshop 2010. 01.09.2011, IZBI, Leipzig. Talk: Models of Epigenetic Regulation: Histone Modifications - part I. Lydia Steiner.
- Transcription, chromatin structure and DNA repair in development and differentiation. 07.07.2010-10.07.2010, Zeche Zollverein, Essen. Poster: Novel findings on the genome-wide correlation of chromatin marks and CpGs. Lydia Steiner, Sonja Prohaska, Jörg Galle.
- 25th TBI Winterseminar in Bled, 4th Annual Meeting of the Bompfünewerer Consortium. 14.02.2010-21.02.2010, Bled, Slovenia. Talk: An Example for Chromatin Regulation. Lydia Steiner.
- 7. Herbstseminar 2009 Vysoka Lipa (Decin). 21.10.2009-25.10.2009, Vysoka Lipa, Decin. Talk: www - world wide words. Lydia Steiner.
- 6. Herbstseminar 2008 Studeny. 31.10.2008-04.11.2008, Ceska Kamenice. Talk: something about languages. Lydia Steiner.