BAT - Bisulfite Analysis Toolkit

Introduction

Cytosine DNA methylation is a biochemical process that has been shown to play an important role in gene expression and cell differentiation. Recently, a number of whole-genome bisulfite sequencing (WGBS) and targeted bisulfite sequencing (e.g., RRBS) protocols have made it possible to capture this major epigenetic modification precisely and accurately.

Here, a modular bisulfite analysis toolkit (BAT) is introduced. It covers the major tasks in analyzing bisulfite sequencing data: mapping, extraction of the methylation information (referred to as methylation calling), differential methylation analysis, and downstream analyses such as integrating the methylation data with annotation and gene expression data. Each part of this workflow is modular and can easily be customized or extended with other bisulfite- or NGS-related tools. It can also be used as is, with the added benefit that the BAT modules generate many graphics automatically.

Modules

Mapping

The first module comprises read mapping, including pre- and postprocessing. This covers conversion to BAM format, mapping statistics, and merging of multiple mapping runs.

Calling

The second module covers the extraction of methylation information from the alignments, filtering for positions of interest, e.g. CG context, and conversion of methylation information for visualisation.
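As a rough illustration of what methylation calling and context filtering mean, the sketch below computes a per-position methylation rate and keeps only cytosines in CG context. It is not BAT's implementation: the function names, the toy reference sequence, and the read counts are all hypothetical. It relies only on the general principle of bisulfite sequencing, namely that unmethylated cytosines are read as T while methylated cytosines are still read as C.

```python
def methylation_rate(c_count, t_count):
    """Fraction of reads that support methylation at one cytosine.

    c_count: reads showing C (unconverted, i.e. methylated)
    t_count: reads showing T (converted, i.e. unmethylated)
    Returns None when the position is not covered.
    """
    covered = c_count + t_count
    if covered == 0:
        return None
    return c_count / covered


def cg_positions(reference):
    """0-based forward-strand positions of cytosines followed by a G."""
    seq = reference.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical toy data: a short reference and (C, T) read counts per position.
reference = "ACGTTCGACCG"
counts = {1: (8, 2), 5: (0, 10), 8: (3, 3), 9: (9, 1)}

# Keep only covered positions in CG context; position 8 is a CC and is dropped.
rates = {
    pos: methylation_rate(*counts[pos])
    for pos in cg_positions(reference)
    if pos in counts
}
print(rates)
```

A real caller also has to handle the reverse strand, base qualities, and sequencing errors, which this sketch leaves out.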

Analysis

The third module performs basic analyses, ranging from a single sample to multiple samples organized in two groups.

DMRs

Finally, the calling of differentially methylated regions (DMRs) is covered by the fourth module. Basic statistics of the DMRs are provided and, if gene expression data are available, DMRs whose methylation correlates with gene expression can be identified.
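To make the idea of correlating DMRs with expression concrete, the following sketch computes a Pearson correlation between the mean methylation of one DMR and the expression of one gene across samples. The choice of Pearson correlation is an assumption made for illustration, not a statement about the statistic BAT uses, and the function name and all values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists.

    Returns None if either list has no variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for eight samples (two groups of four):
# mean methylation rate of one DMR and expression of a nearby gene.
dmr_methylation = [0.10, 0.20, 0.30, 0.40, 0.70, 0.80, 0.85, 0.90]
gene_expression = [9.1, 8.7, 8.9, 8.0, 4.2, 3.9, 3.5, 3.1]

r = pearson(dmr_methylation, gene_expression)
print(f"Pearson r = {r:.3f}")
```

In this made-up example, higher methylation goes along with lower expression, so the coefficient is negative. A rank-based measure such as Spearman correlation would be a reasonable alternative when the relationship is not linear.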

Example data

The example data comprise the raw reads of one sample, plus the already called but unfiltered methylation data of that sample and seven further samples. The samples belong to two groups of four samples each. The data are adapted from a recent lymphoma publication (link). The unmapped sample consists of two sequencing runs. These reads can be mapped to a reduced genome and merged prior to methylation calling. In addition to the raw and called methylation data, a reduced reference genome, some gene annotations, and gene expression data are provided. This enables you to run the entire toolkit on a small example region.

The example page shows the tool calls in a basic form. For each call it lists all output files and, where plots are produced, presents them.

A run-script that runs all analyses shown on the example page, together with the input data, can be downloaded here (for further information see the example page). As the run-script refers to a recommended folder structure, the input data including the folder structure can also be downloaded here (985 MB).

In addition, all BAT scripts can be downloaded one by one here.

Docker

If you prefer not to install all dependencies, you can use the BAT Docker image. Dependencies and scripts are already installed; simply pull the image. To test it, download the input data including the folder structure here (985 MB), start the Docker image, and execute the run-script. For a quick start, have a look here.

License

This software is published under the terms of the MIT License.

Copyright (c) 2016 Helene Kretzmer, Christian Otto, Steve Hoffmann

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
