TileReadCel {TileShuffle}    R Documentation

TileReadCel

Description

Reads tiling array data from an Affymetrix BPMAP file and CEL file(s).

Usage

TileReadCel(cel.filename, cel2.filename, celinc.filename,
    bpmap.filename, minhits=8000, group="Hs", pmonly=TRUE,
    mmonly=FALSE, normalize=TRUE, mod.tstat=FALSE, gc=TRUE,
    matchscore=FALSE, verbose=FALSE)

Arguments

cel.filename A vector of one or more filenames of Affymetrix CEL files (as character) that contain the probe intensities in the first cellular condition. Note that replicates are simply defined by more than one filename and used according to mod.tstat.
cel2.filename A vector of one or more filenames of Affymetrix CEL files (as character) that contain the probe intensities in the second cellular condition. Note that replicates are simply defined by more than one filename and used according to mod.tstat.
celinc.filename A vector of one or more filenames of Affymetrix CEL files (as character) containing probe intensities that should be included in the normalization. This may be desirable in case tiling array data of more than two different cellular states is available and multiple transitions between them are being analyzed.
bpmap.filename Filename of Affymetrix binary probe mapping (BPMAP) file (as a character), which is a binary file containing information on the location of each probe in the reference sequence. Moreover, it stores the probe sequences that are necessary to calculate the GC content.
minhits Minimal number of hits a BPMAP entry must have to be considered in the further analysis. For historical reasons, the BPMAP file contains several entries with only around a thousand probes assigned that might overlap with the larger entries or with entries on other tiling arrays. For the Affymetrix tiling array 1.0R, a value of 8000 is recommended.
group Group name, given as the organism abbreviation, used to select only the matching entries in the BPMAP file and hence disregard entries such as TIGR, Affymetrix, or bacterial controls.
pmonly Indicates whether only intensities of perfect match (PM) probes on the Affymetrix tiling array are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM. Because mismatch (MM) probes can show higher intensities than their PM probes, it is recommended to enable this parameter.
mmonly Indicates whether only intensities of mismatch (MM) probes on the Affymetrix tiling array are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM. This option is mutually exclusive with the pmonly parameter and is only recommended for investigating the behaviour of mismatch probes on a tiling array, not for common (differential) expression analyses.
normalize Indicates whether the probe intensities of the given CEL files in cel.filename, cel2.filename, and celinc.filename are normalized by use of full-quantile normalization. The normalization is recommended if replicates are available or a differential analysis is executed and, hence, the transition between cellular states is analyzed. Note that both PM and MM probes are included in the normalization regardless of the pmonly or mmonly parameter.
mod.tstat Indicates the use of replicate information. If TRUE, the score is the value of the moderated t-statistic (see eBayes of the limma package for further details). Otherwise, the median probe log2-intensity among the given replicates or the median of all pairwise log2-fold changes between both states is used as the estimate of the probe differential score. Note that the moderated t-statistic can only be used if replicate information is available.
gc Indicates whether the GC content of the probe sequences is calculated. It is defined as the fraction of Gs and Cs in the probe sequence.
matchscore Indicates whether the match score is read. The match score is defined as the number of perfect matches of the probe sequence per megabase of the genomic sequence. This information needs to be set accordingly in the BPMAP file. Otherwise, all probes contain only the default value.
verbose Indicates whether progress information is printed.
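
The quantities behind the gc, normalize, and mod.tstat=FALSE options can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper names gc_fraction, quantile_normalize, and median_pairwise_lfc are hypothetical, and the direction of the fold change (second condition relative to first) is an assumption, since it is not stated above.

```r
## Hypothetical helpers for illustration only; none are part of TileShuffle.

## GC content: fraction of G and C bases in each probe sequence
gc_fraction <- function(seqs) {
  seqs <- toupper(seqs)
  n_gc <- nchar(gsub("[^GC]", "", seqs))  # keep only G and C, then count
  n_gc / nchar(seqs)
}

## Full-quantile normalization of a probes x arrays intensity matrix:
## every array is forced onto the same empirical distribution, namely
## the mean of the sorted intensities across arrays.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

## Median of all pairwise log2-fold changes between two conditions.
## state1, state2: probes x replicates matrices of raw intensities.
## ASSUMPTION: fold change is taken as state2 relative to state1.
median_pairwise_lfc <- function(state1, state2) {
  l1 <- log2(state1)
  l2 <- log2(state2)
  pairs <- expand.grid(i = seq_len(ncol(l1)), j = seq_len(ncol(l2)))
  diffs <- l2[, pairs$j, drop = FALSE] - l1[, pairs$i, drop = FALSE]
  apply(diffs, 1, median)
}

## toy usage
gc_fraction(c("GATTACA", "GGCC"))
x <- cbind(c(5, 2, 3, 4), c(4, 1, 4, 2), c(3, 4, 6, 8))
qn <- quantile_normalize(x)
s1 <- matrix(c(1, 2, 1, 2), nrow = 2)  # 2 probes, 2 replicates
s2 <- matrix(c(4, 2, 4, 2), nrow = 2)
lfc <- median_pairwise_lfc(s1, s2)
```

After quantile normalization every column of qn holds the same set of values, which is what makes replicates and conditions directly comparable before a score is computed.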

Details

Reads tiling array data from an Affymetrix BPMAP (binary probe mapping) file and the Affymetrix CEL file(s) created by the Affymetrix pipeline. The CEL file stores the results of the intensity calculations on the pixel values of the DAT file. These include, among others, an intensity value, the standard deviation of the intensity, and the coordinates of each PM/MM probe on the tiling array. This coordinate information can be merged with the Affymetrix BPMAP file, which holds further information on the probes: the genomic location of each probe in terms of the reference sequence and the probe center position, the probe sequence used for the GC calculation, and the corresponding probe sequence length. The function generates a data.frame comprising all probe data required for the subsequent shuffling analysis.
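
The merge of CEL coordinates with BPMAP annotation described here can be pictured with toy data.frames. This is only a sketch under stated assumptions: real BPMAP and CEL files are binary and need a dedicated reader, and all column names and values below are made up for illustration rather than taken from the package.

```r
## Toy stand-in for BPMAP content: genomic location, array coordinates,
## and probe sequence (hypothetical columns, shortened sequences).
bpmap <- data.frame(
  seqname  = c("Sc:chr1", "Sc:chr1", "Sc:chr1"),
  position = c(100L, 135L, 170L),
  x        = c(0L, 1L, 2L),
  y        = c(0L, 0L, 1L),
  sequence = c("ACGTACGT", "GGCCGGAA", "ATATATGC"),
  stringsAsFactors = FALSE
)

## Toy stand-in for CEL content: one intensity per array coordinate.
## The feature at (3, 3) has no BPMAP entry.
cel <- data.frame(
  x         = c(2L, 0L, 1L, 3L),
  y         = c(1L, 0L, 0L, 3L),
  intensity = c(300, 100, 200, 999)
)

## Join on the array coordinates; the inner join drops features
## without a BPMAP annotation.
probes <- merge(bpmap, cel, by = c("x", "y"))
probes$length <- nchar(probes$sequence)
probes <- probes[order(probes$seqname, probes$position), ]
probes
```

The result has one row per annotated probe, ordered by genomic position, which is the general shape of table the later shuffling analysis works on.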

Value

Returns a data.frame containing information on all probes, i.e., the full name of the reference sequence (organism abbreviation and chromosome name), the chromosome name, the probe center position, the length of the probe sequence, the GC content of the probe sequence, the match score (if matchscore is enabled), and the probe score.

Examples

## Note that the following example only executes if the external data
## of the Starr R package is available which includes an artificial
## Affymetrix BPMAP file and corresponding CEL files.
path <- system.file("extdata", package = "Starr")
if (path != "") {
    ## define Affymetrix BPMAP file for probe mapping
    bpmap.filename <- file.path(path, "Scerevisiae_tlg_chr1.bpmap")
    ## define Affymetrix CEL files
    ## here: file of control experiment (wt)
    wt.filename <- file.path(path, "wt_IP_chr1.cel")
    stopifnot(file.exists(bpmap.filename) && file.exists(wt.filename))

    ## read CEL file and return data.frame
    ## with information such as genomic localization
    ## of probes, GC content of probe sequences,
    ## and probe intensities.
    ## Note that group is '' (blank) for old Affy chr21/22 arrays
    ## while it is "Hs" for other Human, "Mm" for Mouse, or "Dm"
    ## for Drosophila tiling array platforms.
    wt.cel <- TileReadCel(cel.filename=wt.filename,
                          bpmap.filename=bpmap.filename,
                          group="", gc=TRUE, verbose=FALSE)

    ## get an overview of the returned data.frame
    str(wt.cel)

    ## investigate the data,
    ## e.g. plot the density of intensities
    pdf(file="wt_int_density.pdf")
    plot(density(wt.cel$intensity), main="", xlab="Intensity")
    dev.off()
    ## or the GC bias with three GC content bins
    pdf(file="wt_gc_bias.pdf")
    boxplot(wt.cel$intensity ~ cut(wt.cel$gc,
                                   breaks=c(0,0.36,0.52,Inf), right=FALSE),
            xlab="GC content", ylab="Intensity")
    dev.off()
    ## cleanup
    rm(bpmap.filename, wt.filename, wt.cel)
}
rm(path)

[Package TileShuffle version 0.2.0 Index]