TileReadCustom {TileShuffle}    R Documentation

TileReadCustom

Description

Reads tiling array data from custom-formatted files.

Usage

TileReadCustom(custom.filename, custom2.filename, custominc.filename,
    pmonly=TRUE, mmonly=FALSE, normalize=TRUE, mod.tstat=FALSE,
    gc=TRUE, verbose=FALSE)

Arguments

custom.filename A vector of one or more filenames of custom-formatted files (as character) that contain the probe intensities in the first cellular condition and are formatted as described in Details. Replicates are defined simply by passing more than one filename and are used according to mod.tstat.
custom2.filename A vector of one or more filenames of custom-formatted files (as character) that contain the probe intensities in the second cellular condition and are formatted as described in Details. Replicates are defined simply by passing more than one filename and are used according to mod.tstat.
custominc.filename A vector of one or more filenames of custom-formatted files (as character) containing probe intensities that should be included in the normalization. This may be desirable in case tiling array data of more than two different cellular states is available and multiple transitions between them are being analyzed.
pmonly Indicates whether only intensities of perfect match (PM) probes on the tiling array are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM.
mmonly Indicates whether only intensities of mismatch (MM) probes are incorporated in the probe intensity estimation. If neither pmonly nor mmonly is set to TRUE, the specific hybridization effect of a probe is estimated by taking PM-MM. This option is mutually exclusive with the pmonly parameter and is only recommended for investigating the behaviour of mismatch probes but not in common (differential) expression analysis.
normalize Indicates whether the probe intensities of the files given in custom.filename, custom2.filename, and custominc.filename are normalized using full-quantile normalization. Normalization is recommended if replicates are available or if a differential analysis is performed, i.e., the transition between cellular states is analyzed. Note that PM and MM probe intensities are not included in the normalization if mmonly or pmonly, respectively, is set to TRUE.
mod.tstat Indicates how replicate information is used. If TRUE, the score is the value of the moderated t-statistic (see eBayes in the limma package for further details). Otherwise, the median probe log2-intensity among the given replicates or the median of all pairwise log2-fold changes between both states is used as the estimate of the probe differential score. Note that the moderated t-statistic can only be used if replicate information is available.
gc Indicates whether the GC content of the probe sequences is calculated. It is defined as the fraction of Gs and Cs in the probe sequence. The probe sequences may be set arbitrarily if gc is disabled.
verbose Indicates whether progress information is printed.
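
The normalize and mod.tstat=FALSE behaviour described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy intensities, the helper quantile_normalize, the column names, and the direction of the fold change (state 1 minus state 2) are all assumptions made for illustration.

```r
## Hypothetical toy data: rows are probes, columns are arrays
## (two replicates of state 1, one array of state 2; non-log scale).
x <- cbind(a1 = c(5, 2, 3, 4),
           a2 = c(4, 1, 4, 2),
           b1 = c(3, 4, 6, 8))

## Illustrative full-quantile normalization (helper is NOT part of TileShuffle):
## every array is forced onto the same distribution, namely the mean of the
## sorted values across arrays. Ties are broken by order of appearance here;
## dedicated implementations may treat ties differently.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}
xn <- quantile_normalize(x)

## Probe score without moderated t-statistic: median of all pairwise
## log2-fold changes between the two states (direction is an assumption).
l1 <- log2(xn[, c("a1", "a2"), drop = FALSE])
l2 <- log2(xn[, "b1", drop = FALSE])
pairs <- expand.grid(i = seq_len(ncol(l1)), j = seq_len(ncol(l2)))
lfc   <- mapply(function(i, j) l1[, i] - l2[, j], pairs$i, pairs$j)
score <- apply(lfc, 1, median)
```

After normalization, every column of xn contains the same set of values in a different order, which is the defining property of full-quantile normalization.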

Details

Reads tiling array data from custom-formatted files that may be created from any tiling array platform. Each file contains probe information in tab-separated columns. Except for comment lines (starting with '#') and empty lines, each line must contain the following columns: a probe identifier that is unique within each file and consistent among the given files in terms of probe coordinates, the name of the reference sequence, the start and end position of the probe on the reference sequence (both 0-based, as integers), the intensity value of the PM probe (on non-log scale), the intensity value of the MM probe (on non-log scale), and the probe sequence, which is used to calculate the GC content. The data must not contain 'NA' values. The probe sequences may be set arbitrarily if gc is disabled. The function generates a data.frame comprising all probe data required for the subsequent shuffling analysis.
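
As a rough illustration of the file layout described above, the following base R sketch writes a small file and reads it back while checking the stated requirements. All probe values are invented, and the column names are assumptions chosen for readability; the format itself defines no header line.

```r
## Write a small, hypothetical custom-formatted file (all values invented).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# id\tseqname\tstart\tend\tPM\tMM\tsequence",
  "p1\tchr1\t0\t24\t812\t120\tACGTACGTACGTACGTACGTACGTA",
  "",
  "p2\tchr1\t10\t34\t1540\t310\tGGCCGGCCATATATGGCCGGCCATA",
  "p3\tchr2\t5\t29\t96\t88\tATATATATATATATATATATATATA"
), tmp)

## Illustrative column names; comment lines and empty lines are skipped.
cols <- c("id", "seqname", "start", "end", "pm", "mm", "sequence")
probes <- read.delim(tmp, header = FALSE, comment.char = "#",
                     blank.lines.skip = TRUE, col.names = cols,
                     stringsAsFactors = FALSE)

## Checks mirroring the requirements: no NA values, unique probe
## identifiers, numeric coordinates and intensities.
stopifnot(!any(is.na(probes)),
          !anyDuplicated(probes$id),
          is.numeric(probes$start), is.numeric(probes$end),
          is.numeric(probes$pm), is.numeric(probes$mm))
unlink(tmp)
```

Running such checks before calling TileReadCustom can help locate formatting problems in the input files early.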

Value

Returns a data.frame containing information on all probes, i.e., the name of the reference sequence, the chromosome name (here: both are equal), the probe center position, the length of the probe sequence, the GC content of the probe sequence, the match score (if matchscore is enabled), and the log2 probe score.
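
To make the per-probe quantities listed above more concrete, here is a minimal sketch that derives them from already parsed probe data. The column names of both data frames, the helper gc_content, and the definition of the center as the midpoint of start and end are assumptions; the data.frame actually returned by TileReadCustom may be named and computed differently.

```r
## Hypothetical parsed probes (column names are illustrative only).
probes <- data.frame(
  seqname  = c("chr1", "chr1"),
  start    = c(0L, 10L),
  end      = c(24L, 34L),
  pm       = c(812, 1540),
  sequence = c("ACGTACGTACGTACGTACGTACGTA", "GGCCGGCCATATATGGCCGGCCATA"),
  stringsAsFactors = FALSE
)

## GC content: fraction of G and C characters in each probe sequence.
gc_content <- function(s) {
  s <- toupper(s)
  nchar(gsub("[^GC]", "", s)) / nchar(s)
}

res <- data.frame(
  seqname = probes$seqname,
  chr     = probes$seqname,                   # both are equal here
  center  = (probes$start + probes$end) / 2,  # assumed: midpoint of start/end
  length  = nchar(probes$sequence),           # length of the probe sequence
  gc      = gc_content(probes$sequence),
  score   = log2(probes$pm),                  # PM only, single file
  stringsAsFactors = FALSE
)
str(res)
```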

Examples

## This example requires the custom-formatted files
## in the extdata folder of this package. Otherwise,
## it aborts with an error.
path <- system.file("extdata", package = "TileShuffle")
stopifnot(path != "")
## define filename to custom-formatted file
custom.filename <- file.path(path, "custom.txt")
stopifnot(file.exists(custom.filename))

## read custom-formatted file and return data.frame
## with information such as genomic localization
## of probes, GC content of probe sequences, and
## probe intensities.
custom <- TileReadCustom(custom.filename=custom.filename,
    pmonly=TRUE, gc=TRUE, verbose=FALSE)

## getting an overview on the reported data.frame
str(custom)

## investigating data
## e.g. plot density of intensities
pdf(file="custom_int_density.pdf")
plot(density(custom$intensity), main="", xlab="Intensity")
dev.off()
## or GC bias with three GC content bins
pdf(file="custom_gc_bias.pdf")
boxplot(custom$intensity ~ cut(custom$gc,
    breaks=c(0, 0.36, 0.52, Inf), right=FALSE),
    xlab="GC content", ylab="Intensity")
dev.off()
## cleanup
rm(path, custom.filename, custom)

[Package TileShuffle version 0.2.0 Index]