Example data set
For a quick start with BAT and its modules, we have assembled a small example dataset, adapted from data used in a recent lymphoma publication (link). It is a subset of a paired-end human WGBS dataset comprising 8 samples (S1-S8), each with two sequencing runs. The samples are split into two groups (control: S1-S4, case: S5-S8). Either start with the minimal example data set, or use the run script and additional files to test BAT.
Minimum input data
- Download the dataset (BAT_example_input.tar.gz, 54 MB) and extract it:
$ tar xvf BAT_example_input.tar.gz
This minimum example data set comprises the raw reads of one sample, together with the called, but not yet filtered, methylation data of that sample and 7 further samples. The samples belong to two groups of four samples each. The unmapped sample consists of two sequencing runs. These reads can be mapped to a reduced genome and merged prior to methylation calling. Because both the raw reads and the called methylation data are provided, you can run the entire toolkit on a small example region.
The example pages show the tool calls in a basic form, list all output files and, where plots are produced, present them.
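Before extracting, it can help to look at what the archive contains. The following is a minimal sketch, assuming the archive has been downloaded into the current directory; the helper name `list_archive` is hypothetical and not part of BAT.

```shell
# Hypothetical helper (not part of BAT): list the contents of a gzipped
# tar archive without extracting it (t = list, z = gzip, f = archive file).
list_archive() {
    tar tzf "$1"
}

# Assumption: the download sits in the current directory.
# Show only the first entries to get an impression of the layout.
if [ -f BAT_example_input.tar.gz ]; then
    list_archive BAT_example_input.tar.gz | head -n 20
fi
```

If the listing looks as expected, extract the archive with the `tar xvf` command given above.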
Extended input data
We recommend downloading the entire BAT example directory (985 MB),
since it provides a variety of additional files needed to run all BAT
tools, e.g., a reduced reference genome, some gene annotations and
gene expression data. The directory
BAT_example_structure contains a basic folder structure:
- raw: two lanes of paired-end data
- mapped: output folder for mapped data
- called: gzipped vcf files of all samples
- data: output folder for filtered methylation files for all samples
- annotation: folder for annotation-dependent analysis
- DMRs: output folder for DMR-dependent analysis
- genomes/hg19: reduced hg19 genome fasta, annotation of some TFBS, reduced gene annotation and the chromosome size file
- expression: gene expression files for all samples
- circos: circos-dependent data and output folder for the circos plot
Extract the example directory using
$ tar xvf BAT_example_structure.tar.gz
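After extraction, a quick check can confirm that the folders described above are in place. This is only a sketch: it assumes the archive unpacks to a directory named BAT_example_structure with the listed folders directly inside it, and the function name `check_bat_structure` is hypothetical and not part of BAT.

```shell
# Hypothetical helper (not part of BAT): report which of the folders
# described above are missing below the given base directory.
# Assumption: the folders sit directly inside BAT_example_structure.
check_bat_structure() {
    base="${1:-BAT_example_structure}"
    missing=0
    for d in raw mapped called data annotation DMRs genomes/hg19 expression circos; do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $base/$d"
            missing=1
        fi
    done
    # Return 0 if every folder exists, 1 otherwise
    return "$missing"
}

check_bat_structure BAT_example_structure \
    || echo "directory layout differs from the description"
```

If a folder is reported as missing, the archive may have unpacked to a different location or the layout may differ from the one assumed here.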
Using the example data, given the directory structure and provided files described above, the following scripts can be tested.
For each script, a link to a more detailed explanation (including a description of all parameters), the example run command, the output, and a short glimpse of the output files and plots are provided.
All calls for running the example data are given in the run script, which relies on the given directory structure.