TileReadCel {TileShuffle} | R Documentation |
Reads tiling array data from an Affymetrix BPMAP file and CEL file(s).
TileReadCel(cel.filename, cel2.filename, celinc.filename, bpmap.filename, minhits=8000, group="Hs", pmonly=TRUE, mmonly=FALSE, normalize=TRUE, mod.tstat=FALSE, gc=TRUE, matchscore=FALSE, verbose=FALSE)
cel.filename |
A vector of one or more filenames of Affymetrix
CEL files (as character) that contain the probe intensities in
the first cellular condition. Replicates are specified by passing
more than one filename and are used according to mod.tstat. |
cel2.filename |
A vector of one or more filenames of Affymetrix
CEL files (as character) that contain the probe intensities in
the second cellular condition. Replicates are specified by passing
more than one filename and are used according to
mod.tstat. |
celinc.filename |
A vector of one or more filenames of
Affymetrix CEL files (as character) containing probe
intensities that should be included in the normalization. This may
be desirable when tiling array data for more than two different
cellular states are available and multiple transitions between them
are being analyzed. |
bpmap.filename |
Filename of the Affymetrix binary probe mapping (BPMAP)
file (as character), which is a binary file containing
information on the location of each probe in the reference sequence.
It also stores the probe sequences that are necessary to
calculate the GC content. |
minhits |
Minimum number of hits a BPMAP entry must have to be considered in the further analysis. For historical reasons, the BPMAP file contains several entries with only around a thousand probes assigned, which might overlap with the larger entries or with entries on other tiling arrays. For the Affymetrix tiling array 1.0R, a value of 8000 is recommended. |
group |
Group name, given as the organism abbreviation (e.g. "Hs"). Only BPMAP entries belonging to this group are considered, so that entries such as TIGR, Affymetrix, or bacterial controls are disregarded. |
pmonly |
Indicates whether only intensities of perfect match (PM)
probes on the Affymetrix tiling array are incorporated in the probe
intensity estimation. If neither pmonly nor mmonly is
set to TRUE, the specific hybridization effect of a probe is
estimated by taking PM-MM. Because mismatch (MM) probes can show
higher intensities than their PM probes, it is
recommended to enable this parameter. |
mmonly |
Indicates whether only intensities of mismatch (MM) probes
on the Affymetrix tiling array are incorporated in the probe
intensity estimation. If neither pmonly nor mmonly is
set to TRUE, the specific hybridization effect of a probe is
estimated by taking PM-MM. This option is mutually exclusive
with the pmonly parameter and is only recommended for
investigating the behaviour of mismatch probes on a tiling array,
not for common (differential) expression analyses. |
normalize |
Indicates whether the probe intensities of the CEL
files given in cel.filename, cel2.filename, and
celinc.filename are normalized using full-quantile
normalization. Normalization is recommended if replicates are
available or if a differential analysis is performed, i.e., the
transition between cellular states is analyzed. Note that both PM
and MM probes are included in the normalization regardless of the
pmonly or mmonly parameter. |
mod.tstat |
Indicates the use of replicate information. If TRUE,
the score is the value of the moderated t-statistic (see
eBayes of the limma package for further details).
Otherwise, the median probe log2-intensity among the given
replicates or the median of all pairwise log2-fold changes
between both states is used as the estimate of the probe
differential score. Note that the moderated t-statistic can only be
used if replicate information is available. |
gc |
Indicates whether the GC content of the probe sequences is calculated. It is defined as the fraction of G and C bases in the probe sequence. |
matchscore |
Indicates whether the match score is read. The match score is defined as the number of perfect matches of the probe sequence per megabase of the genomic sequence. This information needs to be set accordingly in the BPMAP file; otherwise, all probes contain only the default value. |
verbose |
Indicates whether progress information is printed. |
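The normalize argument refers to full-quantile normalization across all given CEL files. The following is a minimal base R sketch of that technique on toy values, not the package's internal code. The function name quantile_normalize, the toy matrix, and the tie-breaking rule are assumptions made for illustration.

```r
# Sketch of full-quantile normalization (illustrative only).
# Rows are probes, columns are arrays (one per CEL file); values are toy data.
quantile_normalize <- function(x) {
  # Rank of every value within its own column
  # (ties broken by order of appearance, an assumption of this sketch)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: row-wise mean of the column-wise sorted values
  reference <- rowMeans(apply(x, 2, sort))
  # Replace every value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

intensities <- cbind(array1 = c(5, 2, 3, 4),
                     array2 = c(4, 1, 4, 2),
                     array3 = c(3, 4, 6, 8))
normalized <- quantile_normalize(intensities)

# After normalization every column holds the same set of values
print(apply(normalized, 2, sort))
```

In this sketch all columns are treated alike, which mirrors the note above that PM and MM probes both enter the normalization regardless of pmonly or mmonly.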
Reads tiling array data from an Affymetrix BPMAP (binary probe
mapping) file and the Affymetrix CEL file(s) created by the Affymetrix
pipeline. The CEL file stores the results of the intensity calculations on
the pixel values of the DAT file. This includes, among others, an intensity
value, the standard deviation of the intensity, and the coordinates of each
PM/MM probe on the tiling array. This coordinate information can be merged
with the Affymetrix BPMAP file, which comprises further information on the
probes: the genomic location of each probe in terms of the
reference sequence and the probe center position, the probe sequence for the GC
calculation, and the corresponding probe sequence length. The method
generates a data.frame comprising all probe data that
are necessary for the subsequent shuffling analysis.
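The merging step described above can be pictured as a join on the array coordinates, followed by the GC calculation on the probe sequences. The sketch below uses small hand-made data frames instead of real CEL and BPMAP files; all column names (x, y, seq, position, probe) and the helper gc_content are hypothetical and are not taken from the package.

```r
# Toy stand-ins for the two inputs; all column names are hypothetical.
cel <- data.frame(x = c(0, 1, 2), y = c(0, 0, 0),
                  intensity = c(120, 340, 95))
bpmap <- data.frame(x = c(2, 0, 1), y = c(0, 0, 0),
                    seq = c("chr1", "chr1", "chr1"),
                    position = c(250, 100, 175),
                    probe = c("ACGTTGCA", "GGGCCCAT", "ATATATAT"),
                    stringsAsFactors = FALSE)

# Join intensities and probe annotation on the array coordinates
probes <- merge(bpmap, cel, by = c("x", "y"))

# GC content: fraction of G and C bases in each probe sequence
gc_content <- function(s) {
  bases <- strsplit(toupper(s), "")
  vapply(bases, function(b) mean(b %in% c("G", "C")), numeric(1))
}
probes$gc <- gc_content(probes$probe)
probes$length <- nchar(probes$probe)

# Order by reference sequence and genomic position
probes <- probes[order(probes$seq, probes$position), ]
print(probes)
```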
Returns a data.frame containing information on all probes,
i.e., the full name of the reference sequence (organism abbreviation
and chromosome name), the chromosome name, the probe center position,
the length of the probe sequence, the GC content of the probe
sequence, the match score (if matchscore is enabled), and
the probe score.
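When mod.tstat is FALSE, the probe score is described as either the median log2-intensity among replicates or the median of all pairwise log2-fold changes between both states. A minimal base R sketch of these two summaries on toy data follows. The function names, the toy matrices, and the direction of the fold change (first state minus second state) are assumptions of this sketch, not documented behaviour of the package.

```r
# Toy normalized intensities: rows are probes, columns are replicates.
state1 <- cbind(rep1 = c(100, 400, 50), rep2 = c(200, 400, 100))
state2 <- cbind(rep1 = c(400, 400, 25), rep2 = c(800, 200, 50))

# One condition: median log2-intensity across replicates
median_log2 <- function(x) apply(log2(x), 1, median)

# Two conditions: median over all pairwise log2 fold changes, pairing
# every replicate of state 1 with every replicate of state 2.
# The direction (state 1 minus state 2) is an assumption of this sketch.
median_pairwise_log2fc <- function(a, b) {
  pairs <- expand.grid(i = seq_len(ncol(a)), j = seq_len(ncol(b)))
  fc <- log2(a[, pairs$i, drop = FALSE]) - log2(b[, pairs$j, drop = FALSE])
  apply(fc, 1, median)
}

score_single <- median_log2(state1)
score_diff <- median_pairwise_log2fc(state1, state2)
print(score_single)
print(score_diff)
```

With two replicates per state, each probe has four pairwise fold changes, and the median of those four values is reported.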
## Note that the following example only executes if the external data
## of the Starr R package is available which includes an artificial
## Affymetrix BPMAP file and corresponding CEL files.
path <- system.file("extdata", package = "Starr")
if (path != ""){
  ## define Affymetrix BPMAP file for probe mapping
  bpmap.filename <- file.path(path, "Scerevisiae_tlg_chr1.bpmap")

  ## define Affymetrix CEL files
  ## here: file of control experiment (wt)
  wt.filename <- file.path(path, "wt_IP_chr1.cel")
  stopifnot(file.exists(bpmap.filename) && file.exists(wt.filename))

  ## read CEL file and return data.frame
  ## with information such as genomic localization
  ## of probes, GC content of probe sequences,
  ## and probe intensities.
  ## Note that group is '' (blank) for old Affy chr21/22 arrays
  ## while it is "Hs" for other Human, "Mm" for Mouse, or "Dm"
  ## for Drosophila tiling array platforms.
  wt.cel <- TileReadCel(cel.filename=wt.filename,
                        bpmap.filename=bpmap.filename,
                        group="", gc=TRUE, verbose=FALSE)

  ## getting an overview on the reported data.frame
  str(wt.cel)

  ## investigating data
  ## e.g. plot density of intensities
  pdf(file="wt_int_density.pdf")
  plot(density(wt.cel$intensity), main="", xlab="Intensity")
  dev.off()

  ## or GC bias with three GC content bins
  pdf(file="wt_gc_bias.pdf")
  boxplot(wt.cel$intensity ~ cut(wt.cel$gc,
          breaks=c(0,0.36,0.52,Inf), right=FALSE),
          xlab="GC content", ylab="Intensity")
  dev.off()

  ## cleanup
  rm(bpmap.filename, wt.filename, wt.cel)
}
rm(path)