Calling

Extraction of methylation information for each cytosine including subsequent filtering.


BAT_calling

BAT_calling extracts the methylation information at all cytosines from the bisulfite read alignments produced by BAT_mapping. It requires only the reference genome in FastA format and the sorted read alignments in BAM or (gzip'ed) SAM format. Methylation calling is performed with haarz of the segemehl suite; if the read alignments are given as BAM, they are first converted to gzip'ed SAM format and indexed. The position-wise methylation information is reported in a novel methylation VCF format (gzip'ed). Therein, the INFO fields give the cytosine strand (CS) and its sequence context (CC, e.g., CC=CG). The FORMAT fields give the methylation mapping coverage (MDP), the detailed nucleotide composition at the position (MDP3), and the estimated methylation rate (MR). Methylation rates are estimated as #C/(#C+#T), where #C and #T are the numbers of read alignments with a cytosine (= unconverted, methylated) and a thymine (= converted, unmethylated) at this position, respectively.
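The rate estimate #C/(#C+#T) can be written out as a small worked example. This is a minimal sketch and not BAT_calling's own code; the function name `methylation_rate` and the choice to return `None` for uncovered positions are assumptions made for illustration.

```python
from typing import Optional


def methylation_rate(n_c: int, n_t: int) -> Optional[float]:
    """Estimate the methylation rate as #C / (#C + #T).

    n_c: number of reads showing C (unconverted, methylated)
    n_t: number of reads showing T (converted, unmethylated)
    Returns None when no read covers the position (assumption: how BAT
    handles uncovered positions is not described in this section).
    """
    if n_c < 0 or n_t < 0:
        raise ValueError("read counts must be non-negative")
    total = n_c + n_t
    if total == 0:
        return None
    return n_c / total


# 15 methylated and 5 unmethylated reads give 15 / 20
print(methylation_rate(15, 5))  # 0.75
```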

Basic usage

BAT_calling -d <file> -q <file>

Output files

File                Description
input.sam.gz        Gzip'ed and indexed SAM file containing all read alignments (written only if not already present).
input.vcf.gz        VCF file containing the cytosine methylation information used for further analyses.
prefix.calling.log  Log file.
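For downstream scripting, the CS and CC tags and the per-sample MDP and MR values can be pulled out of the gzip'ed VCF with a few lines of Python. The sketch below relies only on the generic VCF column layout (INFO in column 8, FORMAT in column 9, samples from column 10). The function names and the example record are made up for illustration and are not real BAT_calling output; MDP3 is left out because its encoding is not described here.

```python
import gzip
from typing import Any, Dict, Iterator


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Split one VCF data line into position, INFO tags and sample values."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, info, fmt = cols[0], int(cols[1]), cols[7], cols[8]

    # INFO: semicolon-separated KEY=VALUE pairs; flags without '=' map to True
    info_tags: Dict[str, Any] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_tags[key] = value if sep else True

    # FORMAT: colon-separated keys, matched against each sample column
    keys = fmt.split(":")
    samples = [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]
    return {"chrom": chrom, "pos": pos, "info": info_tags, "samples": samples}


def read_vcf(path: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed records from a gzip'ed VCF file, skipping header lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield parse_vcf_line(line)


# Hypothetical record for illustration only
example = "chr1\t10469\t.\tC\t.\t.\t.\tCS=+;CC=CG\tMDP:MR\t20:0.75"
record = parse_vcf_line(example)
print(record["info"]["CC"], record["samples"][0]["MR"])  # CG 0.75
```

Values are kept as strings so that the caller decides how to convert them.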

Input/Output options

Option  Description
-d      Filename of reference genome FastA.
-q      Filename of input BAM or (gzip'ed) SAM file containing the read alignments.
-o      Path for output files (default: path of input file).

External tools

Option      Description
--haarz     Path to haarz executable. Required if the haarz executable is not in PATH. For installation, the manual, or troubleshooting, see the segemehl (haarz) website.
--samtools  Path to samtools executable. Required if the samtools executable is not in PATH. For installation, the manual, or troubleshooting, see the samtools website.
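When BAT_calling is driven from a pipeline script, the options listed above can be assembled into an argument list. The following is a minimal sketch: the helper `build_calling_command` and the file names are hypothetical, and only the flags documented in this section are used.

```python
import shlex
from typing import List, Optional


def build_calling_command(
    genome: str,
    alignments: str,
    out_dir: Optional[str] = None,
    haarz: Optional[str] = None,
    samtools: Optional[str] = None,
) -> List[str]:
    """Assemble a BAT_calling argument list from the documented options."""
    cmd = ["BAT_calling", "-d", genome, "-q", alignments]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    # Tool paths are only needed when the executables are not in PATH
    if haarz is not None:
        cmd += ["--haarz", haarz]
    if samtools is not None:
        cmd += ["--samtools", samtools]
    return cmd


# File names are placeholders
cmd = build_calling_command("genome.fa", "sample.bam", out_dir="calls")
print(shlex.join(cmd))  # BAT_calling -d genome.fa -q sample.bam -o calls
```

On a system where BAT is installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.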



BAT_filter_vcf

BAT_filter_vcf filters methylation information in VCF format by several criteria (e.g., genomic context, bisulfite mapping coverage, methylation rate). In addition to the VCF output file containing only the filtered positions, BAT_filter_vcf reports the methylation rate of each filtered cytosine as a bedGraph file and automatically generates a PDF file with plots of the methylation rate and coverage distributions, shown separately for all positions and for the filtered positions only. Note that if only the input VCF file is provided without any filtering parameters, BAT_filter_vcf simply produces the bedGraph and PDF files.

Basic usage

BAT_filter_vcf --vcf <file> --out <prefix>

Output files

File             Description
prefix.vcf.gz    Gzip'ed VCF file containing only positions passing the filtering criteria (if defined).
prefix.bedgraph  BedGraph file of methylation rates at positions passing the filtering criteria (if defined).
prefix.pdf       PDF file containing plots of coverage and methylation rate distributions over all positions and positions passing the filtering criteria (if defined).
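To illustrate what a per-cytosine bedGraph track looks like, the sketch below converts positions and rates into bedGraph lines. It follows the general bedGraph convention of 0-based, half-open intervals, whereas VCF positions are 1-based. Whether BAT_filter_vcf writes rates as fractions or percentages is not stated in this section, so the values and the helper name `bedgraph_lines` are assumptions.

```python
from typing import Iterable, Iterator, Tuple


def bedgraph_lines(sites: Iterable[Tuple[str, int, float]]) -> Iterator[str]:
    """Convert (chrom, 1-based position, rate) tuples to bedGraph lines."""
    for chrom, pos, rate in sites:
        # A single base at 1-based position p spans the interval [p - 1, p)
        yield f"{chrom}\t{pos - 1}\t{pos}\t{rate:g}"


# Hypothetical positions and rates
for line in bedgraph_lines([("chr1", 10469, 0.75), ("chr1", 10471, 1.0)]):
    print(line)
```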

Input/Output options

Option  Description
--vcf   Filename of gzip'ed VCF file produced by BAT_calling that contains the cytosine-wise methylation information.
--out   Prefix of output files (i.e., gzip'ed VCF file, bedGraph file, and PDF file).

Filtering options

Option     Description
--context  Comma-separated list of genomic contexts (e.g., CG).
--MDP_min  Minimum number of reads (i.e., bisulfite mapping coverage) per sample.
--MDP_max  Maximum number of reads (i.e., bisulfite mapping coverage) per sample.
--MR_min   Minimum methylation rate.
--MR_max   Maximum methylation rate.
--MR       Specifies whether the MR filter is applied to all samples or only to the mean methylation rate/difference in methylation rate. Only relevant for VCF files containing multiple samples.
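The filtering criteria listed above can be pictured as a per-position predicate. The following is a minimal single-sample sketch and not BAT_filter_vcf's implementation: the `Site` class and the `passes` function are hypothetical, the bounds are assumed to be inclusive, and the multi-sample behaviour controlled by --MR is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Site:
    context: str  # sequence context, e.g. "CG"
    mdp: int      # bisulfite mapping coverage
    mr: float     # methylation rate


def passes(
    site: Site,
    contexts: Optional[Sequence[str]] = None,
    mdp_min: Optional[int] = None,
    mdp_max: Optional[int] = None,
    mr_min: Optional[float] = None,
    mr_max: Optional[float] = None,
) -> bool:
    """Return True if the site satisfies every criterion that is set."""
    if contexts is not None and site.context not in contexts:
        return False
    # Assumption: minimum and maximum bounds are inclusive
    if mdp_min is not None and site.mdp < mdp_min:
        return False
    if mdp_max is not None and site.mdp > mdp_max:
        return False
    if mr_min is not None and site.mr < mr_min:
        return False
    if mr_max is not None and site.mr > mr_max:
        return False
    return True


sites = [Site("CG", 12, 0.9), Site("CG", 4, 0.9), Site("CHH", 30, 0.1)]
kept = [s for s in sites if passes(s, contexts=["CG"], mdp_min=10)]
print(len(kept))  # 1
```

Leaving every criterion unset keeps all positions, which mirrors the documented case of running the tool without filtering parameters.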

External tools

Option  Description
-R      Path to R executable. Required if the R executable is not in PATH. For installation, the manual, or troubleshooting, see the R website.
