TileReadCustom {TileShuffle}    R Documentation

TileReadCustom

Description

Reads tiling array data from custom-formatted files.

Usage

TileReadCustom(custom.filename, custom2.filename, custominc.filename,
    pmonly=TRUE, mmonly=FALSE, normalize=TRUE, gc=TRUE, verbose=FALSE)

Arguments

custom.filename A vector of one or more filenames (as character) of custom-formatted files that contain the probe intensities in the first cellular condition and are formatted as described in Details. Replicates are defined simply by giving more than one filename. If custom2.filename is set to NULL, the median probe log2-intensity among the given replicates is reported as the estimate of the probe intensity. If custom2.filename is given, the median of all possible pairwise log2-fold changes of a probe between files from the first cellular condition (custom.filename) and files from the second cellular condition (custom2.filename) is reported as the estimate of the probe intensity change.
custom2.filename A vector of one or more filenames (as character) of custom-formatted files that contain the probe intensities in the second cellular condition and are formatted as described in Details. Replicates are defined simply by giving more than one filename. This parameter is only required for differential expression analysis, where the median of all possible pairwise log2-fold changes of a probe between files from the first cellular condition (custom.filename) and files from the second cellular condition (custom2.filename) is reported as the estimate of the probe intensity change. Otherwise, this parameter must be set to NULL.
custominc.filename A vector of one or more filenames (as character) of custom-formatted files containing probe intensities that should be included in the normalization. This may be desirable if tiling array data from more than two different cellular states are available and multiple transitions between them are being analyzed. In the analysis of any one of these transitions, the files corresponding to the remaining cellular states, i.e., those not given by custom.filename or custom2.filename, may be passed as custominc.filename. The full-quantile normalization is then always done on the entire set of available intensity data, so the log2-fold changes of the different analyzed transitions are comparable and not biased by the normalization procedure. This parameter has no effect if normalization is disabled.
pmonly Indicates whether only intensities of perfect match (PM) probes on the tiling array are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM.
mmonly Indicates whether only intensities of mismatch (MM) probes are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM. This option is mutually exclusive with the pmonly parameter and is only recommended for investigating the behaviour of mismatch probes but not in common (differential) expression analysis.
normalize Indicates whether the probe intensities of the files given in custom.filename, custom2.filename, and custominc.filename are normalized by full-quantile normalization. Normalization is recommended if replicates are available or if a differential analysis is performed, i.e., the transition between cellular states is analyzed. Note that PM probe intensities are not included in the normalization if mmonly is set to TRUE, and MM probe intensities are not included if pmonly is set to TRUE.
gc Indicates whether the GC content of the probe sequences is calculated. It is defined as the fraction of Gs and Cs in the probe sequence. The probe sequences may be set arbitrarily if gc is disabled.
verbose Indicates whether progress information is printed.
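
The estimates described for custom.filename, custom2.filename, and normalize can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper names quantile_normalize and pairwise_lfc, the toy intensities, and the sign convention of the fold change (second condition relative to the first) are all assumptions made for illustration only.

```r
## Toy PM intensities (non-log scale): rows are probes, columns are replicates.
## All values are made up for illustration.
cond1 <- cbind(rep1 = c(100, 200, 400), rep2 = c(120, 180, 500))
cond2 <- cbind(rep1 = c(200, 200, 100), rep2 = c(260, 150, 130))

## Hypothetical helper: full-quantile normalization over all columns.
## Every column is mapped onto the same reference distribution, namely
## the mean of the k-th smallest values across columns.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  ref <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

## Normalize all files together, then split by condition again.
all_norm <- quantile_normalize(cbind(cond1, cond2))
n1 <- all_norm[, 1:2, drop = FALSE]
n2 <- all_norm[, 3:4, drop = FALSE]

## Single condition: median log2 intensity per probe over the replicates.
single <- apply(log2(n1), 1, median)

## Hypothetical helper: median over all pairwise log2-fold changes.
## ASSUMPTION: the change is taken as log2(second) - log2(first).
pairwise_lfc <- function(a, b) {
  pairs <- expand.grid(i = seq_len(ncol(a)), j = seq_len(ncol(b)))
  lfc <- log2(b[, pairs$j, drop = FALSE]) - log2(a[, pairs$i, drop = FALSE])
  apply(lfc, 1, median)
}

fold_change <- pairwise_lfc(n1, n2)
```

With two replicates per condition there are four pairwise comparisons per probe, and the median over them is less sensitive to a single outlying replicate than a ratio of means would be.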

Details

Reads tiling array data from custom-formatted files that may be created from any tiling array platform. Each file contains probe information in tab-separated columns. Except for comment lines (indicated by '#' at the beginning) and empty lines, each line must contain the following columns: a probe identifier that is unique within each file and consistent among the given files in terms of probe coordinates, the name of the reference sequence, the start and end position of the probe on the reference sequence (both 0-based, as integer), the intensity value of the PM probe (on non-log scale), the intensity value of the MM probe (on non-log scale), and the probe sequence, which is used to calculate the GC content. The data must not contain 'NA' values. The probe sequences may be set arbitrarily if gc is disabled. The method generates a data.frame comprising all probe data that are required for the subsequent shuffling analysis.
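
As a rough illustration of the file layout described above, the following base R sketch writes a two-probe file and parses it again. It does not call TileReadCustom; the column names, the probe values, the helper gc_content, and the treatment of the end position as inclusive are assumptions chosen for this example.

```r
## Lines of a tiny custom-formatted file: tab-separated, no header line,
## with one comment line and one empty line that a reader should skip.
## ASSUMPTION: the end position is treated as inclusive (25-mer: 0..24).
toy <- c(
  "# id\treference\tstart\tend\tPM\tMM\tsequence",
  "p1\tchr1\t0\t24\t812\t230\tACGTACGTACGTACGTACGTACGTA",
  "",
  "p2\tchr1\t30\t54\t1540\t310\tGGGCCCGGGCCCGGGCCCGGGCCCG"
)
toy.filename <- tempfile(fileext = ".txt")
writeLines(toy, toy.filename)

## Parse the file; the column names are hypothetical labels.
probes <- read.delim(
  toy.filename, header = FALSE, comment.char = "#",
  col.names = c("id", "reference", "start", "end", "pm", "mm", "sequence"),
  colClasses = c("character", "character", "integer", "integer",
                 "numeric", "numeric", "character")
)
unlink(toy.filename)

## The format forbids NA values, so check for them early.
stopifnot(!anyNA(probes))

## Hypothetical helper: GC content as the fraction of Gs and Cs.
gc_content <- function(s) {
  s <- toupper(s)
  nchar(gsub("[^GC]", "", s)) / nchar(s)
}
probes$gc <- gc_content(probes$sequence)

## PM-MM, as used when neither pmonly nor mmonly is TRUE.
probes$pm_minus_mm <- probes$pm - probes$mm
```

Setting colClasses explicitly keeps identifiers and sequences as character strings, which avoids a column that happens to contain only "T" being read as logical.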

Value

Returns a data.frame containing information on all probes, i.e., the name of the reference sequence, the chromosome name (here: both are equal), the probe center position, the length of the probe sequence, the GC content of the probe sequence, the match score (if matchscore is enabled), and the log2 probe intensity or the probe log2-fold change.

Examples

## This example requires the custom-formatted files
## in the extdata folder of this package. Otherwise,
## it aborts with an error.
path <- system.file("extdata", package = "TileShuffle")
stopifnot(path != "")
## define filename to custom-formatted file
custom.filename <- file.path(path, "custom.txt")
stopifnot(file.exists(custom.filename))

## read custom-formatted file and return data.frame
## with information such as genomic localization
## of probes, GC content of probe sequences, and
## probe intensities.
custom <- TileReadCustom(custom.filename=custom.filename,
    pmonly=TRUE, gc=TRUE, verbose=FALSE)

## getting an overview on the reported data.frame
str(custom)

## investigating data
## e.g. plot density of intensities
pdf(file="custom_int_density.pdf")
plot(density(custom$intensity), main="", xlab="Intensity")
dev.off()
## or GC bias with three GC content bins
pdf(file="custom_gc_bias.pdf")
boxplot(custom$intensity ~ cut(custom$gc,
    breaks=c(0, 0.36, 0.52, Inf), right=FALSE),
    xlab="GC content", ylab="Intensity")
dev.off()
## cleanup
rm(path, custom.filename, custom)

[Package TileShuffle version 0.1.0 Index]