Block: doCRMAv2() / doASCRMAv2()
Author: Henrik Bengtsson
Created on: 2010-05-17
Last updated on: 2012-06-18
The CRMA v2 (Bengtsson, Wirapati, and Speed, 2009) is the recommended
method for estimating full-resolution copy numbers (CN) from all
Affymetrix SNP and CN chip types, including custom-made ones. CRMA v2 can
be done either by doing each step explicitly as in the vignettes, or
using the following doCRMAv2()
and doASCRMAv2()
methods.
Notes:
- If this is your first analysis within the aroma project, please make sure to first read the 'Setup' and 'Definition' pages. This will explain the importance of following a well defined directory structure and file names. Understanding this is important and will save you a lot of time.
- If this is your first analysis with a given chip type, please visit the 'Setup' section of the CRMA v2 vignette to make sure that your setup contains all the required annotation files.
Usage:
ds <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_6")
ds <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_6,Full")
ds <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_5")
ds <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_5,Full,r2")
ds <- doCRMAv2("HapMap270,testSet", chipType="Mapping250K_Nsp", plm="RmaCnPlm")
ds <- doCRMAv2("HapMap270,testSet", chipType="Mapping50K_Hind240", plm="RmaCnPlm"
ds <- doCRMAv2("GSE8605", chipType="Mapping10K_Xba142", plm="RmaCnPlm")
Argument 'verbose': Since CRMA v2 will take minutes or hours, we
suggest that you add argument verbose=-10
to see some verbose output
while processing the data set.
Argument 'plm': Note how we specify plm="RmaCnPlm"
for the
10K-500K platforms. We do this in order to control for probe-affinity
effects, which are possible to estimate for those chip types.
Probe-affinity effects are not possible to estimate in the newer chip
types (e.g. GWS) because there all replicated probes are
technical/identical replicates. See the CRMA v2 paper for more details.
Single- vs multi-array method: When fitting the CRMA v2 model with
plm="RmaCnPlm"
, the method is a multi-array method, whereas with the
default (plm="AvgCnPlm"
), then method is a single-array method. In the
latter case the results for one array will be independent of the other
arrays in the same data set. This make is possible to process a subsets
of the arrays and postponing the others for later and still get the same
results. See the CRMA v2 paper for more details.
Process a subset of the arrays: In case the single-array method
(plm="AvgCnPlm"
) is used it is possible to process a subset of the
arrays without having to worry about what arrays to including in the
batch. The results will be the same regardless. For instance, the
following two cases will give identical results:
dsA <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_6")
dsA <- dsA[c(6,1,3:4)]
dsB <- doCRMAv2("HapMap270,testSet", chipType="GenomeWideSNP_6", arrays=c(6,1,3:4))
Note, this is only the case if plm="AvgCnPlm"
.
Allele-specific copy-number (ASCN) estimates: By adding argument
combineAlleles=FALSE
to the above, allele-specific CNs will be
estimated (for SNPs). The default is to estimate total CNs
(combineAlleles=TRUE
). Alternatively, use doASCRMAv2()
, which is
short for doCRMAv2(..., combineAlleles=FALSE)
. The estimated total CNs
will differ slightly when using the two alternatives and the CRMA v2
method has been optimized for total CNs (as in CRMA v2 paper). However,
if you plan to do ASCN analysis, we recommend to use
combineAlleles=FALSE
from the beginning, because the statistical
performance is almost as good while you will save lots of time only
processing the data once.
References
[1] H. Bengtsson, P. Wirapati, and T. P. Speed. "A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6". Eng. In: Bioinformatics (Oxford, England) 25.17 (Sep. 2009), pp. 2149-56. ISSN: 1367-4811. DOI: 10.1093/bioinformatics/btp371. PMID: 19535535.