TileAnalysis {TileShuffle} | R Documentation |
The statistical analysis of tiling array expression data.
TileAnalysis(data, noofperms, winsize, qvalue, gcmode="fixed", gcnum, score.function="trimmed", randomize=FALSE, diff, diff.variant="B", regions.filename, zscore=TRUE, verbose=FALSE)
data |
A data.frame containing information on all probes,
i.e., the full name of the reference sequence (organism abbreviation
and chromosome name), the chromosome name, the probe center position,
the length of the probe sequence, the GC content of the probe
sequence, and the probe log2-intensity or log2-fold
change. This data.frame is returned by the functions
TileReadCel, TileReadSignal, or
TileReadCustom. |
noofperms |
Number of permutations used to sample the background
distribution. With a higher number of permutations, the statistical
significance of windows can be assessed more precisely, in particular
with more restrictive significance thresholds, i.e., low values of
the qvalue parameter. Moreover, in the case of differential
analysis (diff is enabled), the number of permutations should
be increased to sample the background distribution more accurately.
In general, it is recommended to set noofperms to 10000 for
expression analyses and to 100000 for differential expression
analyses. |
winsize |
Maximal width of the windows that are being statistically
assessed. The width is defined as the difference in the genomic
center positions of the first and last enclosed probe. Due to
gaps in the otherwise mostly uniform distribution of probes over the
genomic sequence, the analyzed windows may be considerably shorter
than the defined winsize or may even consist of only one
probe. |
qvalue |
Maximal permitted q-value that is applied in the statistical
analysis. Hence, windows with a q-value above the given value will
not be included in the returned data.frame. |
gcmode |
Mode of GC content binning. If gcmode is set to
"fixed", the assignment of probes to bins is predefined,
based on the effect of GC content on probe intensities on the
Affymetrix tiling array 1.0R platform. In this case, only the
following values of gcnum are permitted: 1, 2, 3, or 4. If
gcmode is set to "automatic", the binning is derived
solely from the distribution of the GC content of the probes in order
to obtain GC content bins that are optimally balanced in terms of
their sizes. |
gcnum |
Number of GC content bins. Probes within each bin have a
similar expected sequence-specific affinity, and each bin is
permuted independently of the others. Accordingly, intensities
of probes that belong to different affinity bins must not be
interchanged. Due to the trade-off between reducing the
sequence-specific effect and maintaining sufficiently large
permutation bins, three GC content bins are recommended for
expression analysis. If gcmode is set to "fixed", the
assignment of probes to bins is predefined, based on the effect of
GC content on probe intensities on the Affymetrix tiling
array 1.0R platform. In this case, only the following values are
permitted: 1, 2, 3, or 4. If gcmode is set to "automatic",
the binning is derived solely from the distribution of the
GC content of the probes in order to obtain GC content bins that are
optimally balanced in terms of their sizes. Note that
gcnum is set to one in the case of differential expression
analysis (diff enabled), since sequence-specific effects cancel
out and affinity binning becomes unnecessary. |
score.function |
Function used to calculate window scores over the
log2-intensities or log2-fold changes of the
corresponding probes, i.e., the arithmetic mean
(score.function = "mean"), the arithmetic mean trimmed by the
minimal and maximal value (score.function = "trimmed"),
or the median (score.function = "median"). Note that this
definition of the trimmed mean differs from the common one, which
trims by given percentile ranges. Moreover, the trimmed mean
can differ from the mean only for windows that contain
more than two probes. The latter two scoring functions are
recommended due to their higher robustness against outliers. However,
due to their higher computational cost, the running time increases when
selecting "trimmed" or "median". Note that the function is given as a
character string. |
randomize |
Indicates whether an additional permutation is applied prior to the calculation of the original window scores. This makes it possible to roughly estimate the false positive rate: under the assumption of mostly unexpressed probes, no window over permuted intensities is expected to differ significantly from the background distribution. |
diff |
Indicates whether differential expression analysis is applied. |
diff.variant |
The variants of the differential expression analysis
differ in score calculation, in the permutation procedure, and
in their assignment of statistical significance to windows.
diff.variant A is similar to the normal expression analysis,
but two-tailed p-values are estimated to account for both directions
of regulation, up and down. The multiple testing correction is then
adjusted to account for these additional comparisons.
diff.variant B assumes that entire windows are either up-
or down-regulated between conditions. The presumed direction of
regulation is initially assigned to each window on the basis of its
score. Subsequently, all converse probes, i.e., probes with negative
log2-fold change within positive windows or vice versa, are
ignored and neither permuted nor incorporated into the score
calculation. Consequently, positive and negative windows are compared
to different background distributions. The p-value estimation and
correction are performed in the same way as in the normal expression
analysis. Both variants produce fairly similar results, but
variant B performs slightly better and is hence
recommended. |
regions.filename |
Filename of a BED-formatted file that contains the regions
to which the (differential) expression analysis should be limited.
Hence, only windows entirely enclosed in the union of these regions
are statistically evaluated. Commonly, this parameter is used to
identify highly and differentially expressed (highdiff)
regions by restricting the differential expression analysis
(diff enabled) to regions identified as highly expressed in
either one of the corresponding cellular conditions. |
zscore |
Indicates whether z-scores, i.e., normalized window scores,
should be calculated using the sampled background. More
precisely, the window z-score z is calculated as
z = \frac{x - μ}{σ}, where x is the window score
and μ and σ are the mean and standard deviation
of the permuted window scores, respectively. Note that negative
z-scores indicate down-regulated and positive z-scores indicate
up-regulated regions, since z-scores are bounded by zero from below in
case of up-regulation and from above in case of down-regulation. Hence,
in the common expression analysis (diff is FALSE), the z-score
cannot be negative. |
verbose |
Indicates whether information on progress is printed. |
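The three score.function options and the z-score normalization described in the argument list can be illustrated with a small sketch. This is not the package's internal code: the helper names window_score and window_zscore are hypothetical, and the sketch assumes that "trimmed" removes exactly one minimal and one maximal value and falls back to the plain mean for windows with fewer than three probes.

```r
# Hypothetical helper (not part of TileShuffle): score one window of
# probe values following the documented score.function options.
window_score <- function(x, score.function = c("trimmed", "mean", "median")) {
  score.function <- match.arg(score.function)
  switch(score.function,
    mean    = mean(x),
    median  = stats::median(x),
    trimmed = {
      # Assumption: drop exactly one minimum and one maximum. With fewer
      # than three probes nothing is trimmed, so the result equals the mean.
      if (length(x) < 3) {
        mean(x)
      } else {
        s <- sort(x)
        mean(s[-c(1, length(s))])
      }
    }
  )
}

# Hypothetical helper: z-score of a window score x against the scores
# obtained from permuted data, i.e. (x - mean) / standard deviation.
window_zscore <- function(x, permuted_scores) {
  (x - mean(permuted_scores)) / stats::sd(permuted_scores)
}

probe_values <- c(1, 2, 3, 100)          # one outlier
window_score(probe_values, "mean")       # 26.5
window_score(probe_values, "trimmed")    # 2.5
window_score(probe_values, "median")     # 2.5
window_zscore(3, c(1, 2, 3))             # 1
```

The single outlier dominates the mean but has no influence on the trimmed mean or the median, which is the robustness argument made for those two options.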
Executes the statistical analysis of tiling array expression data that
identifies (differential) expression as significant changes from the
background distribution while considering sequence-specific affinities as
well as cross-hybridization. This method returns a list with two
data.frames: one containing information on the analyzed windows,
including their estimated z-scores if zscore is enabled, and the other
comprising the significantly (differentially) expressed segments.
Returns a list with two data.frames: one containing
the analyzed windows including the calculated z-score, and the other
comprising the significantly expressed segments. The first
list entry is NULL if zscore is FALSE.
Otherwise, the entry holds the z-score data.frame containing the
name of the reference sequence, start and end position, description,
estimated z-score, and `+' as strand for each analyzed window. The
description is an underscore-delimited string of the number of
covered probes, the average GC content of their sequences, the window
q-value multiplied by 100, and the window score calculated with the
given score.function on the probe scores. The second
list entry is a data.frame that comprises the
significantly (differentially) expressed segments, including the name of
the reference sequence, start and end position, description, a score that
is uniformly set to `10', and `+' as strand. The description is an
underscore-delimited string of the name of the reference sequence,
the start and end position, the number of probes covered by the segment,
the average GC content of their sequences, the minimal q-value of the
windows that were merged into the segment (multiplied by 100), and
the segment score calculated with the given score.function
on the covered probes.
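Because the description column packs several values into one underscore-delimited string, it usually has to be split before further use. The sketch below does this for the window description format (number of probes, average GC content, q-value multiplied by 100, window score). The example strings are invented, and the exact number formatting produced by TileAnalysis is an assumption.

```r
# Invented description strings in the window format described above:
# <probes>_<average GC>_<q-value * 100>_<window score>
desc <- c("5_11.2_0.35_9.87", "3_9.0_4.1_7.02")

# Split on the underscore and bind the pieces into a character matrix.
fields <- do.call(rbind, strsplit(desc, "_", fixed = TRUE))

parsed <- data.frame(
  probes = as.integer(fields[, 1]),
  gc     = as.numeric(fields[, 2]),
  qvalue = as.numeric(fields[, 3]) / 100,  # undo the multiplication by 100
  score  = as.numeric(fields[, 4])
)
parsed
```

Segment descriptions start with the reference sequence name, which may itself contain underscores, so for those it is safer to take the last four fields from the right-hand end of the split instead of counting from the left.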