Groups

From single-sample methylation analysis to the comparison of two groups, each comprising multiple samples.


BAT_summarize

BAT_summarize merges multiple samples from two groups into files that share a coherent set of cytosine positions. This set is used for all downstream analyses, such as calling differentially methylated regions (DMRs), annotation-independent and annotation-dependent analyses, or data integration (e.g. with histone modifications, transcription factor binding sites, or expression data). The coherent set of cytosines (referred to as filtered positions) is determined by user-defined thresholds on the maximum number of samples per group that may have a missing value at a position. Only positions covered by a sufficiently large number of samples in each group, and hence with an accurate estimate of the biological variance in methylation, are included in the set of filtered positions. In addition to a single summary file containing the methylation rates of all samples at filtered positions, BAT_summarize produces the input file for the DMR caller metilene, two bedGraph files (one per group) with mean methylation rates, a bedGraph file with the difference in mean methylation rates between both groups (group1 - group2), and bedGraph as well as bigWig files for each sample.
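The filtering rule and the per-group statistics described above can be sketched in Python. This is a minimal illustration, not BAT_summarize's actual implementation: it assumes each sample has already been parsed into a dictionary from (chromosome, position) to a methylation rate in [0, 1], treats a position absent from a sample as a missing value, and the function name `summarize` is hypothetical. The `mis1`/`mis2` parameters mirror the --mis1/--mis2 options.

```python
from statistics import mean

# Hypothetical in-memory representation of one sample:
# (chromosome, position) -> methylation rate; absent keys are missing values.
Sample = dict[tuple[str, int], float]


def summarize(group1: list[Sample], group2: list[Sample], mis1: int = 0, mis2: int = 0):
    """Return sorted rows (chrom, pos, mean1, mean2, mean1 - mean2) at filtered positions."""
    positions = set()
    for sample in group1 + group2:
        positions.update(sample)

    rows = []
    for key in sorted(positions):
        rates1 = [s[key] for s in group1 if key in s]
        rates2 = [s[key] for s in group2 if key in s]
        # Keep a position only if each group has at most misN samples with a missing value.
        if len(group1) - len(rates1) > mis1 or len(group2) - len(rates2) > mis2:
            continue
        # A mean needs at least one observed value per group.
        if not rates1 or not rates2:
            continue
        m1, m2 = mean(rates1), mean(rates2)
        rows.append((key[0], key[1], m1, m2, m1 - m2))
    return rows
```

With both thresholds at 0 (the documented defaults), a position that is missing in any sample is dropped; setting `mis1=1` tolerates one sample without a value in group 1.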

For human datasets, BAT_summarize can also automatically generate a Circos plot, i.e., a genome-wide binned methylation heatmap of all samples. For other organisms, the Circos configuration files may need to be adjusted accordingly; in this case, please consult the documentation on the Circos website.
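The binning behind such a heatmap can be illustrated with a small Python sketch. It only assumes how binned rates might be computed: BAT defines its bins through the "bin" BED files in the Circos folder (see --cir), whereas the hypothetical `bin_rates` function below uses fixed-size windows with an arbitrarily chosen default of 1 Mb and averages the rates of a single sample per bin.

```python
from collections import defaultdict
from statistics import mean


def bin_rates(rows, bin_size=1_000_000):
    """Average the methylation rates of one sample in fixed-size genomic bins.

    rows: iterable of (chrom, pos, rate) tuples with 0-based positions.
    Returns a dict {(chrom, bin_start): mean rate}, ordered by chrom and bin start.
    """
    values = defaultdict(list)
    for chrom, pos, rate in rows:
        bin_start = (pos // bin_size) * bin_size  # 0-based start of the enclosing bin
        values[(chrom, bin_start)].append(rate)
    # Only bins containing at least one position appear in the result.
    return {key: mean(rates) for key, rates in sorted(values.items())}
```

Running this once per sample yields one row of the heatmap per sample, with one cell per bin.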

Basic usage

BAT_summarize  --in1 <list> --in2 <list> --out <prefix> --cs <file>

Output files

File Description
prefix_mean_group1.bedgraph BedGraph and BigWig file of mean methylation rates in group 1 at filtered positions.
prefix_mean_group2.bedgraph BedGraph and BigWig file of mean methylation rates in group 2 at filtered positions.
prefix_diff_group1_group2.bedgraph BedGraph and BigWig file of difference in mean methylation rates between group 1 and group 2 at filtered positions (group1 - group2).
prefix_summary_group1_group2.bedgraph BedGraph file of methylation rates of group 1 and group 2 at filtered positions.
prefix_metilene_group1_group2.bedgraph Input file for DMR caller metilene containing methylation rates of group 1 and group 2 at filtered positions.
prefix_sample.bedgraph BedGraph file for each sample containing the methylation rates at filtered positions.
prefix_sample.bw BigWig file for each sample containing the methylation rates at filtered positions.
circos.png Circos plot (PNG and SVG) illustrating the methylation rates of all samples as a genome-wide methylation heatmap.

Input/Output options

Option Description
--in1 Comma-separated list of bedGraph input filenames of group 1.
--in2 Comma-separated list of bedGraph input filenames of group 2.
--out Prefix for output files.

Other options

Option Description
--cs chrom.sizes file of the corresponding reference genome.
--groups Comma-separated list of group identifiers, one per group (default: g1,g2).
--mis String indicating how to encode missing values (default: NA).
--mis1 Maximum number of samples in group 1 that may have a missing value at a position; positions exceeding this number are excluded (default: 0).
--mis2 Maximum number of samples in group 2 that may have a missing value at a position; positions exceeding this number are excluded (default: 0).
--h1 Comma-separated list of sample identifiers of group 1 (default: prefix of bedGraph input files of group 1).
--h2 Comma-separated list of sample identifiers of group 2 (default: prefix of bedGraph input files of group 2).
--cir Path to the Circos folder. If defined, a Circos plot (i.e., a genome-wide methylation heatmap of all samples) is generated. The folder must contain the "bin" BED files.

External tools

Option Description
-c Path to Circos executable. Required if Circos executable is not in PATH. For installation, manual or problems please go to the Circos website.
-b Path to bedtools executable. Required if bedtools executable is not in PATH. For installation, manual or problems please go to the bedtools website.
--bgbw Path to UCSCtools' bedGraphToBigWig executable. Required if bedGraphToBigWig executable is not in PATH. For installation, manual or problems please go to the UCSCtools website.



BAT_overview

To get an annotation-independent overview of the methylomes of the two conditions, you can use BAT_overview. It is essentially an R wrapper that automatically generates the following overview statistics. A boxplot of the genome-wide average methylation level of each sample in a group, together with a dendrogram showing the hierarchical clustering of the samples' methylation rates, helps to inspect the variance in methylation within and between groups and to detect possible outlier samples. The distribution of position-wise mean methylation rates in each group is depicted as a barplot over ranges of methylation levels, e.g. to detect overall shifts in the abundance of lowly, partially, or highly methylated Cs between the two groups. For a direct comparison of the groups at each position, a smoothed scatter plot is generated in which the position-wise mean methylation rates of both groups are plotted against each other. Finally, a histogram of the difference in mean methylation rates between the groups is generated.
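The numbers behind three of these plots can be sketched in Python. This is an illustrative sketch, not BAT_overview's R code: the function names are hypothetical, missing values are represented as `None`, rates are assumed to lie in [0, 1], and the cut-offs of 0.2 and 0.8 for lowly/highly methylated positions as well as the histogram bin count are assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean


def sample_averages(columns):
    """Genome-wide average methylation per sample, ignoring missing values (None)."""
    return {name: mean(v for v in vals if v is not None) for name, vals in columns.items()}


def methylation_classes(group_means, low=0.2, high=0.8):
    """Count positions as lowly (< low), partially, or highly (> high) methylated."""
    counts = Counter()
    for m in group_means:
        if m < low:
            counts["low"] += 1
        elif m > high:
            counts["high"] += 1
        else:
            counts["partial"] += 1
    return counts


def diff_histogram(means1, means2, n_bins=10):
    """Histogram of per-position differences (group1 - group2) over [-1, 1]."""
    counts = [0] * n_bins
    width = 2 / n_bins
    for a, b in zip(means1, means2):
        # Shift the difference into [0, 2]; a difference of exactly 1 goes into the last bin.
        idx = min(int((a - b + 1) / width), n_bins - 1)
        counts[idx] += 1
    return counts
```

Comparing `methylation_classes` for the means of group 1 and group 2 corresponds to the barplot comparison described above.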

Basic usage

Rscript BAT_overview  -i <file> -o <file> [--groups <list>]

Output file

File Description
output.pdf PDF file with basic overview plots.

Input/Output options

Option Description
-i Input file (summary file produced by BAT_summarize) with methylation rates of all samples in both groups.
-o Prefix of output file (PDF).
--groups Identifiers for the first and second group, separated by "," (default: g1,g2). Column names must start with the group identifier.
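Because samples are assigned to groups by column-name prefix, the matching can be sketched as below. This is a hypothetical Python helper, not part of BAT; the example header layout (coordinate columns followed by one column per sample) is an assumption.

```python
def split_by_group(header, groups="g1,g2"):
    """Map each group identifier to the indices of its sample columns.

    header: column names of the summary file (assumed layout, e.g.
            ["chr", "pos", "g1_s1", "g2_s1"]).
    groups: comma-separated identifiers, as passed to --groups.
    A column is assigned to the first identifier it starts with; columns that
    match no identifier (e.g. coordinates) are ignored.
    """
    identifiers = groups.split(",")
    assignment = {g: [] for g in identifiers}
    for index, name in enumerate(header):
        for g in identifiers:
            if name.startswith(g):
                assignment[g].append(index)
                break
    return assignment
```

Under this prefix rule, identifiers such as "g1" and "g10" would be ambiguous, so identifiers that are not prefixes of one another are the safer choice.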

Other option

Option Description
--miss String indicating how missing values are encoded (default: NA).



BAT_annotation

BAT_annotation provides an easy way to inspect the methylation of a set of annotation items. For example, these annotation items could be DMRs (possibly subdivided into hyper- and hypomethylated ones), transcription factor binding sites, CpG islands/shores/shelves, or protein-coding/non-protein-coding genes.

It reports the methylation rate of each annotation item per sample and the average methylation rate per group in a bedGraph-like file. Moreover, several visualizations are generated automatically, including the distribution of annotation item lengths (in Cs and nucleotides), boxplots of the methylation rates of all annotation items per sample or per group, and heatmaps of methylation rates with hierarchical clustering of samples and annotation items.
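The per-item averages and the two length measures can be sketched in Python as follows. This is a simplified illustration, not BAT_annotation's implementation (which uses bedtools): the function `annotate` is hypothetical, coordinates are assumed to be 0-based and half-open as in BED, missing values are `None`, and the linear scan over all positions is chosen for clarity rather than speed.

```python
from statistics import mean


def annotate(items, positions, rates):
    """Average methylation and length for each annotation item.

    items: list of (chrom, start, end, item_id), 0-based half-open coordinates.
    positions: list of (chrom, pos) for filtered cytosines (0-based).
    rates: dict sample -> list of rates aligned with positions (None = missing).
    Returns {item_id: {"length_nt": ..., "length_c": ..., sample: mean or None}}.
    """
    result = {}
    for chrom, start, end, item_id in items:
        # Indices of filtered positions that fall inside the item.
        idx = [i for i, (c, p) in enumerate(positions) if c == chrom and start <= p < end]
        entry = {"length_nt": end - start, "length_c": len(idx)}
        for sample, values in rates.items():
            observed = [values[i] for i in idx if values[i] is not None]
            # Items without any observed C get None instead of a mean.
            entry[sample] = mean(observed) if observed else None
        result[item_id] = entry
    return result
```

Group averages could then be derived from the per-sample values, although how BAT combines them is not specified here.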

Basic usage

BAT_annotation -b <file> -i <file> --groups <list> -o <file>

Output file

File Description
output.txt Text file containing the average methylation rate of each annotation item, given for each sample and as group means.
output.pdf PDF file with annotation item overview plots, i.e. length of annotation items, average methylation per sample in annotation items, heatmaps of average methylation rates.

Input/Output options

Option Description
-i Name of input file (summary file produced by BAT_summarize) with methylation rates of all samples in both groups.
--groups Identifiers for the first and second group, separated by "," (default: g1,g2). Column names must start with the group identifier.
-b BedGraph file containing annotation of regions, e.g. TFBS, hypo/hypermethylated regions, genes, CpG islands/shores. Format: chr <tab> start <tab> end <tab> unique_annotation_identifier <tab> group_label.
-o Prefix of output files (default: current directory/annotation).
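Reading the -b file can be sketched in Python. This sketch assumes exactly the five tab-separated columns listed above and integer coordinates. The `Annotation` type and the `parse_annotation` function are hypothetical, and the checks for unique identifiers and start < end are illustrative rather than documented BAT_annotation behavior.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    chrom: str
    start: int
    end: int
    item_id: str
    group_label: str


def parse_annotation(lines):
    """Parse tab-separated lines: chr, start, end, unique identifier, group label."""
    seen = set()
    items = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        chrom, start, end, item_id, label = fields
        if int(start) >= int(end):
            raise ValueError(f"line {lineno}: start must be smaller than end")
        if item_id in seen:
            raise ValueError(f"line {lineno}: duplicate identifier {item_id!r}")
        seen.add(item_id)
        items.append(Annotation(chrom, int(start), int(end), item_id, label))
    return items
```

The group label (e.g. "hyper" or "hypo") allows the annotation items to be split into subsets, such as the hyper- and hypomethylated DMRs mentioned above.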

External tools

Option Description
--bedtools Path to bedtools executable. Required if bedtools executable is not in PATH. For installation, manual or problems please go to the bedtools website.
-R Path to R executable. Required if R executable is not in PATH. For installation, manual or problems please go to the R website.
