Settings
This section describes the global options that you can set in order to change the default behavior of the aroma framework.
Querying and modifying settings
All settings specific to the aroma packages are stored in the R list
object aromaSettings. An overview of the current settings can be
obtained as:
> str(as.list(aromaSettings))
List of 4
 $ memory:List of 2
  ..$ ram             : num 1
  ..$ gcArrayFrequency: num 50
 $ rules :List of 1
  ..$ allowAsciiCdfs: logi FALSE
 $ output:List of 2
  ..$ checksum           : logi FALSE
  ..$ timestampsThreshold: num 500
 $ models:List of 1
  ..$ RmaPlm:List of 2
  .. ..$ medianPolishThreshold: num [1:2] 500 6
  .. ..$ skipThreshold        : num [1:2] 5000 1
A particular setting in this list structure is addressed by a path, much
like a file on a file system, e.g. "memory/ram". For instance,
value <- getOption(aromaSettings, "memory/ram")
will retrieve the current setting and
setOption(aromaSettings, "memory/ram", newValue)
will change the same setting.
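The path-style addressing can be illustrated with a minimal sketch in base R. The helpers getByPath() and setByPath() below are hypothetical and written only for this illustration; they are not part of the aroma packages and do not show how aromaSettings is implemented.

```r
# A plain nested list that mimics part of the structure shown above.
settings <- list(
  memory = list(ram = 1, gcArrayFrequency = 50),
  rules  = list(allowAsciiCdfs = FALSE)
)

# Hypothetical helper: look up a value by an "a/b/c"-style path.
# Returns NULL if any component of the path does not exist.
getByPath <- function(x, path) {
  keys <- strsplit(path, "/", fixed = TRUE)[[1]]
  for (key in keys) {
    if (!is.list(x) || is.null(x[[key]])) return(NULL)
    x <- x[[key]]
  }
  x
}

# Hypothetical helper: return a copy of 'x' with the value at 'path' set,
# creating intermediate lists as needed.
setByPath <- function(x, path, value) {
  keys <- strsplit(path, "/", fixed = TRUE)[[1]]
  if (length(keys) == 1L) {
    x[[keys]] <- value
  } else {
    child <- x[[keys[1]]]
    if (is.null(child)) child <- list()
    x[[keys[1]]] <- setByPath(child, paste(keys[-1], collapse = "/"), value)
  }
  x
}

ram <- getByPath(settings, "memory/ram")
settings2 <- setByPath(settings, "memory/ram", 0.5)
```

Note that setByPath() returns a modified copy, so the original list is left unchanged.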
Saving settings
After changing some of the aroma settings, you can save them to disk
(default ~/.aromaSettings) so that they are loaded automatically the
next time an aroma.* package is loaded. To do this, call:
saveAnywhere(aromaSettings)
Available Settings
Memory-related settings
memory/ram
Value: A positive double.
Default: 1.0
Applies to: Methods processing data in chunks of cells or units, e.g. probe-level summarization.
Description: A scale factor controlling the size of each chunk read into memory and processed in each iteration. On systems with a very limited amount of memory, it may be set to a value smaller than 1.0. On systems with a lot of memory, it may be set to a value greater than 1.0 to allow more data to be processed in each chunk, which may decrease the relative overhead from file I/O.
See also: How to 'Improve processing time'.
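To make the effect of the scale factor concrete, here is a minimal sketch of how such a factor could translate into chunk sizes. The function chunkIndices() and the constant baseUnitsPerChunk are assumptions made for this illustration; the actual chunk sizes used by the aroma methods are not stated here.

```r
# Hypothetical illustration: split unit indices into chunks whose size is
# scaled by 'ram'. 'baseUnitsPerChunk' is an assumed constant, not an
# aroma default.
chunkIndices <- function(nbrOfUnits, ram = 1.0, baseUnitsPerChunk = 1000L) {
  stopifnot(ram > 0)
  # Never let the chunk size drop below one unit.
  unitsPerChunk <- max(1L, as.integer(ram * baseUnitsPerChunk))
  units <- seq_len(nbrOfUnits)
  split(units, ceiling(units / unitsPerChunk))
}

nDefault <- length(chunkIndices(10000, ram = 1.0))
nSmall   <- length(chunkIndices(10000, ram = 0.5))
nLarge   <- length(chunkIndices(10000, ram = 2.0))
```

Halving the factor doubles the number of chunks (and hence iterations), while doubling it halves them, which is where the reduced relative I/O overhead would come from.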
memory/gcArrayFrequency
Value: A positive integer.
Default: 50
Applies to: Methods processing data in chunks.
Description: When data is processed in chunks, temporary variables are allocated and discarded. The built-in garbage collector (GC) of the R engine automatically cleans up after this when memory is needed. However, the memory may still become too fragmented, in which case one may wish to take a cautious approach and clean up more frequently. This setting specifies how many iterations are done before the GC is called.
Warning: This setting will be deprecated at some stage. /HB 2009-12-04
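The idea of calling the garbage collector every N iterations can be sketched as follows. The function processInChunks() is hypothetical and only illustrates the pattern; it is not the loop used inside the aroma packages.

```r
# Hypothetical illustration: apply 'fcn' to each chunk and explicitly call
# the garbage collector after every 'gcFrequency' iterations.
processInChunks <- function(chunks, fcn, gcFrequency = 50L) {
  results <- vector("list", length(chunks))
  nbrOfGcCalls <- 0L
  for (kk in seq_along(chunks)) {
    results[[kk]] <- fcn(chunks[[kk]])
    if (kk %% gcFrequency == 0L) {
      gc()  # explicit cleanup; R would otherwise collect only when needed
      nbrOfGcCalls <- nbrOfGcCalls + 1L
    }
  }
  list(results = results, nbrOfGcCalls = nbrOfGcCalls)
}

out <- processInChunks(as.list(1:120), function(x) x^2, gcFrequency = 50L)
```

A smaller frequency value means more frequent cleanups at the cost of extra time spent in gc().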
Statistical analysis settings
models/RmaPlm/medianPolishThreshold
Value: Two positive integers c(nbrOfCells, nbrOfArrays)
Default: c(500, 6)
Applies to: Fitting an RmaPlm model.
Description: This setting specifies when the median polish estimator is
used instead of the robust linear model estimator. The median polish is
forced to be used if the number of arrays analyzed is (strictly) greater
than nbrOfArrays and the number of cells in the probeset (unit group)
is (strictly) greater than nbrOfCells.
Motivation: When using robust linear model estimators (the default) for
RmaPlm, the fitting time of a probeset will grow exponentially with
the number of samples. It will also grow, but not as dramatically, with
the number of cells in the probeset. When the number of samples is very
large, this will be too expensive. An alternative is then to use the
median polish estimator instead, whose processing time grows linearly.
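For readers unfamiliar with median polish, the following sketch fits a toy probeset with medpolish() from base R's stats package. This only illustrates the estimator on a cells-by-arrays matrix; it is an assumption that this resembles what RmaPlm does internally, and the toy data are made up.

```r
# Toy probeset on the log2 scale: 3 cells (rows) x 4 arrays (columns),
# constructed to be exactly additive:
#   y[k, i] = cellEffect[k] + arrayEffect[i]
cellEffect  <- c(-1, 0, 1)
arrayEffect <- c(8, 9, 10, 12)
y <- outer(cellEffect, arrayEffect, FUN = "+")

# Median polish alternately sweeps out row and column medians.
fit <- medpolish(y, trace.iter = FALSE)

# Per-array summaries: overall level plus the column (array) effects.
chipEffects <- fit$overall + fit$col
```

Because the toy data are exactly additive and the cell effects have median zero, the per-array summaries reproduce arrayEffect and all residuals are zero.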
models/RmaPlm/skipThreshold
Value: Two positive integers c(nbrOfCells, nbrOfArrays)
Default: c(5000, 1)
Applies to: Fitting an RmaPlm model.
Description: This setting specifies when a probeset is skipped. A
probeset (unit group) is not fitted if the number of arrays analyzed is
(strictly) greater than nbrOfArrays and the number of cells in the unit
is (strictly) greater than nbrOfCells. When a probeset is skipped, the
parameter estimates are set to NA.
Motivation: For some CDFs there exist probesets with an extremely large number of cells, which take a long time to fit. Such probesets often have no biological meaning, e.g. they contain cells that did not map to the genome or that map to multiple places. This setting provides a convenient way to skip such probesets.
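The two RmaPlm thresholds can be read together as a small decision rule. The sketch below is a hypothetical restatement of the descriptions above, not the aroma source code; in particular, checking the skip rule before the median polish rule is an assumption.

```r
# Hypothetical illustration of how the two thresholds select a fitting
# strategy for one probeset. Both comparisons are strict, as described.
chooseFit <- function(nbrOfCells, nbrOfArrays,
                      medianPolishThreshold = c(500, 6),
                      skipThreshold = c(5000, 1)) {
  # A threshold c(cells, arrays) is exceeded only if BOTH counts are
  # strictly greater than their limits.
  exceeds <- function(threshold) {
    nbrOfCells > threshold[1] && nbrOfArrays > threshold[2]
  }
  if (exceeds(skipThreshold)) {
    "skip"               # estimates would be set to NA
  } else if (exceeds(medianPolishThreshold)) {
    "medianPolish"
  } else {
    "robustLinearModel"  # the default estimator
  }
}

chooseFit(nbrOfCells = 20,   nbrOfArrays = 100)
chooseFit(nbrOfCells = 600,  nbrOfArrays = 10)
chooseFit(nbrOfCells = 6000, nbrOfArrays = 2)
```

With the default thresholds, a typical small probeset is always fitted with the robust linear model regardless of the number of arrays, because its cell count never exceeds 500.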
Rule settings
rules/allowAsciiCdfs
Value: A logical value (TRUE or FALSE).
Default: FALSE
Applies to: Using/setting a CDF of an AffymetrixCelSet.
Description: This setting is used to prevent the usage of ASCII CDFs,
because they are really slow to work with and the memory overhead is
large. When it is FALSE (default), only binary CDFs are accepted and
an error will be thrown if an ASCII CDF is used. If TRUE, ASCII CDFs
are accepted.
Comment: Do not use ASCII CDFs unless really necessary. Instead,
convert existing ASCII CDFs into binary ones.
Display output settings
output/checksum
Value: A logical value (TRUE or FALSE).
Default: FALSE
Description: NOT IMPLEMENTED
output/path
Value: A logical value (TRUE or FALSE).
Default: TRUE
Description: NOT IMPLEMENTED
output/ram
Value: A logical value (TRUE or FALSE).
Default: TRUE
Description: NOT IMPLEMENTED
output/timestampsThreshold
Value: An integer (including Inf).
Default: 500
Applies to: The print() output of an AffymetrixCelSet.
Description: When calling print() on an AffymetrixCelSet, the range
of time stamps of all CEL files is reported. This requires that the
header of each CEL file is queried, which might take a lot of time if
the data set is large. This setting allows you to specify the maximum
number of arrays for which the time stamp range should be reported. If
the data set contains more arrays, the time stamps are neither queried
nor reported, which will be much faster for large data sets.
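The thresholding behavior can be sketched as below. The function reportTimestampRange(), the reader function passed to it, and the fake time stamps are all hypothetical; the sketch only illustrates that no file header is read when the number of arrays exceeds the threshold.

```r
# Hypothetical illustration: report the time stamp range only if the number
# of files does not exceed 'threshold'; otherwise skip all header reads.
reportTimestampRange <- function(files, readTimestamp, threshold = 500) {
  if (length(files) > threshold) return(NULL)
  stamps <- do.call(c, lapply(files, readTimestamp))
  range(stamps)
}

# Made-up time stamps standing in for values read from CEL file headers.
fakeStamps <- c(a.CEL = "2009-12-04", b.CEL = "2009-11-30", c.CEL = "2009-12-01")
readTimestamp <- function(file) as.Date(fakeStamps[[file]])

res     <- reportTimestampRange(names(fakeStamps), readTimestamp)
skipped <- reportTimestampRange(names(fakeStamps), readTimestamp, threshold = 2)
always  <- reportTimestampRange(names(fakeStamps), readTimestamp, threshold = Inf)
```

Setting the threshold to Inf means the range is always reported, however large the data set is.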
User profile settings
user/initials
Value: A character string.
Default: NULL
user/fullname
Value: A character string.
Default: NULL
user/email
Value: A character string.
Default: NULL
Beta-feature settings
devel/dropRootPathTags
Value: A logical value (TRUE or FALSE).
Default: FALSE
Description: If TRUE, sibling root paths are recognized, otherwise
ignored. For more details, see 'How data files and data sets are
located'.