Resource: Affymetrix microarray data sets

Data sets from Affymetrix Inc.

Affymetrix: Genome-Wide Human SNP Array 5.0 Sample Data Set
Chip type(s): GenomeWideSNP_5
Description: 30 (10 trios) HapMap CEU samples. Among these, 4 are replicated 4 times, and one is replicated 3 times. In total 25+4*4+1*3 = 44 hybridizations.
URL: http://www.affymetrix.com/support/technical/sample_data/genomewide_snp5_data.affx
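The hybridization count quoted in the description above (25+4*4+1*3 = 44) can be re-derived with a small tally. This is only an illustrative sketch; the function name `count_hybridizations` and its argument layout are hypothetical and not part of any Affymetrix or aroma tool.

```python
def count_hybridizations(n_samples, replicates):
    """Total number of hybridizations for a data set.

    n_samples  -- number of distinct samples
    replicates -- dict mapping "times hybridized" -> "number of samples
                  hybridized that many times"; every sample not listed
                  here is assumed to be hybridized exactly once.
    """
    n_replicated = sum(replicates.values())
    if n_replicated > n_samples:
        raise ValueError("more replicated samples than samples")
    n_single = n_samples - n_replicated
    return n_single + sum(times * count for times, count in replicates.items())


# GenomeWideSNP_5 sample data: 30 samples, 4 hybridized 4 times, 1 hybridized 3 times
total = count_hybridizations(30, {4: 4, 3: 1})
print(total)  # 25 + 4*4 + 1*3 = 44
```

The same helper applies to any entry on this page that reports replicated samples, as long as the unlisted samples are hybridized once.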

Affymetrix: Genome-Wide Human SNP Array 6.0 Sample Data Set
Chip type(s): GenomeWideSNP_6
Description: 270 HapMap samples data set [270 hybridizations], and Chromosome X Titration data set (1X, 2X, 3X, 4X, 5X) of five replicates [5x5=25 hybridizations]. Note: According to some people, some 5X and 4X have been "swapped", cf. http://dchip.forum5.com/viewtopic.php?t=179&mforum=dchip.
Note: This data set needs to be ordered from Affymetrix and comes on three DVDs.
URL: http://www.affymetrix.com/support/technical/sample_data/genomewide_snp6_data.affx

Affymetrix: Human Exon 1.0 ST Array Data Set
Chip type(s): HuEx-1_0-st-v2
Description: Tissue/Mixture.
URL: http://www.affymetrix.com/support/technical/sample_data/hugene_1_0_array_data.affx

Affymetrix: Human Gene 1.0 ST Array Data Set
Chip type(s): HuGene-1_0-st-v1
Description: Tissues.
URL: http://www.affymetrix.com/support/technical/sample_data/hugene_1_0_array_data.affx

Affymetrix: Mouse Gene 1.0 ST Array Data Set
Chip type(s):
Description: Tissues.
URL: http://www.affymetrix.com/support/technical/sample_data/hugene_1_0_array_data.affx

Affymetrix: Rat Gene 1.0 ST Array Data Set
Chip type(s):
Description: Tissues.
URL: http://www.affymetrix.com/support/technical/sample_data/hugene_1_0_array_data.affx

Affymetrix: Human Genome U133 Plus 2.0 Array Data Set
Chip type(s): HG-U133_Plus_2
Description: Tissue/Mixture.
URL: http://www.affymetrix.com/support/technical/sample_data/hugene_1_0_array_data.affx

Affymetrix: Mapping10K_Xba131/Mapping10K_Xba142
Chip type(s): Mapping10K_Xba131, Mapping10K_Xba142
Description: CHP files (only) for 5 CEU trios (=15 individuals), 4 Africans, 2 Asians, 8 Caucasians.
URL: https://www.affymetrix.com/analysis/10ksample.affx

Affymetrix: Mapping 100k HapMap Trio Dataset
Chip type(s): Mapping50K_Xba240, Mapping50K_Hind240
Description: Affymetrix - 30 CEU trios (= 30x3x2=180 hybridizations)
URL: http://www.affymetrix.com/support/technical/sample_data/hapmap_trio_data.affx

Affymetrix: 14 Human Mitochondrial Resequencing samples
Chip type(s): Mitochip_2
Description: Affymetrix. This dataset contains 14 Mitov2 array CEL files. The probe array files represent 3 sets of CEPH trios including 2 sets of technical replicates and 2 human mito standard samples.
URL: http://www.affymetrix.com/support/datasets.affx

Affymetrix: 270 Mapping 100K HapMap samples
Chip type(s): Mapping50K_Xba240, Mapping50K_Hind240
Description: HapMap. All 270 HapMap (CEU, CHB+JPT, YRI) 100K samples.
URL: http://www.affymetrix.com/support/technical/sample_data/500k_data.affx

Affymetrix: X Chromosome titration set for 100K
Chip type(s): Mapping50K_Xba240, Mapping50K_Hind240
Description: Affymetrix - 1X, 2X, 3X, 4X, 5X titration set.
URL: http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx#1_2

Affymetrix: 100K breast cancer cell lines
Chip type(s): Mapping50K_Xba240, Mapping50K_Hind240
Description: Affymetrix - 3 human breast cancer cell lines (SKBR3, MCF7, & ZR75-30), and reference DNA (Ref103).
URL: http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx#1_2

Affymetrix: 500K Breast cancer cell lines
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: Affymetrix - 3 human breast cancer cell lines (SKBR3, MCF7, & ZR75-30), and reference DNA (Ref103).
URL: http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx

Affymetrix: 48 Mapping 500k HapMap Trio Dataset
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: Affymetrix, 2005. 48 CEU samples in 16 trios (=16x3x2=96 Nsp+Sty 500K hybridizations).
URL: http://www.affymetrix.com/support/technical/sample_data/500k_data.affx

Affymetrix: 270 Mapping 500k HapMap samples
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: HapMap, 2006. All 270 HapMap (CEU, CHB+JPT, YRI) 500K samples.
URL: http://www.affymetrix.com/support/technical/sample_data/500k_data.affx

Affymetrix: Sample Data Sets for Copy Number Analysis
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: 9 Tumor/Normal pairs derived from human cancer cell lines (=9x2 samples = 18x2 hybridizations), and X Chromosome titration set (3X, 4X, and 5X) with 4 replicates (=3x4x2 hybridizations). 2006.
Note: The data sets are split into multi-part ZIP archives. See the page 'Affymetrix multi-part DTT/ZIP archives' on how to deal with such files.
URL: http://www.affymetrix.com/support/technical/sample_data/copy_number_data.affx

Affymetrix: Cytogenetics Sample Data
Chip type(s): GenomeWideSNP_6
Description: 10 GWS6 CEL files of samples with UPD on chromosome 15 (two), DMD-del Xp21.1, Williams Syndrome, Mosaic Trisomy, Turner Mosaic, Trisomy 13, Smith-Magenis, Angelman/Prader-Willi, and Normal Male Sample.
URL: http://www.affymetrix.com/support/technical/sample_data/gtc_cytogenetic_data.affx

Data sets from the HapMap Consortium

HapMap: HapMap Portal
URL: http://www.hapmap.org/

HapMap: 270 Affymetrix Mapping 100K HapMap samples
Description: All 270 HapMap Phase II (CEU, CHB+JPT, YRI) 100K samples.
URL: ftp://ftp.ncbi.nlm.nih.gov/hapmap/raw_data/affy100k/

HapMap: 270 Affymetrix Mapping 500K HapMap samples
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: All 270 HapMap Phase II (CEU, CHB+JPT, YRI) 500K samples.
URL: ftp://ftp.ncbi.nlm.nih.gov/hapmap/raw_data/affy500k/

HapMap: 1200+ Affymetrix GenomeWideSNP_6 HapMap samples
Chip type(s): GenomeWideSNP_6
Description: 1200+ HapMap Phase III (many different populations) GWS6 samples.
URL: ftp://ftp.ncbi.nlm.nih.gov/hapmap/raw_data/hapmap3_affy6.0

Data sets from EBI ArrayExpress

ArrayExpress: ArrayExpress Portal
URL: http://www.ebi.ac.uk/arrayexpress/

ArrayExpress: Data set E-TABM-185
Chip type: HG-U133A
Description: ArrayExpress, EMBL-EBI, March 2007. 5896 (sic!) HG-U133A CEL files.
URL: http://www.ebi.ac.uk/arrayexpress/#ae-browse/q=E-TABM-185

ArrayExpress: Data set E-MEXP-1481
Chip type: Hs_PromPR_v02
Title: Methylation profiling of human normal and cancer tissues to identify long range epigenetic aberrations.
Description: "...25 hybridizations, using 25 samples of species [Homo sapiens], using 25 arrays of Affymetrix GeneChip Human Promoter 1.0R Array [Hs_PromPR_v02], producing 25 raw data files..."
URL: http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=E-MEXP-1481

Data sets from the Public Expression Profiling Resource (PEPR)

PEPR: PEPR Portal
URL: http://pepr.cnmcresearch.org/

Data sets from NCBI's Gene Expression Omnibus (GEO)

GEO: Data Set Portal

GSE5258: Connectivity Map dataset (build01)
Chip type(s): HG-U133A
Description: "A reference collection of genome-wide transcriptional expression data for bioactive small molecules. [...] The current collection (build01) contains data for 164 distinct small molecules applied to freely cycling human cell lines, represented by 453 individual treatment and matched vehicle control pairs.". In total 564 samples.
Reference(s): Lamb J et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 2006.
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5258

GSE9222: Structural Variation of Chromosomes in Autism Spectrum Disorder.
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty
Description: Mapping250K_{Nsp|Sty} CEL files for DNA derived from blood or lymphoblasts. In total, 426 unrelated probands were analyzed along with 232 parents (116 trios). ...
Reference(s): Marshall et al. Structural Variation of Chromosomes in Autism Spectrum Disorder, Am J Hum Genet, Feb 2008.
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9222

GSE12019: Fine-scale mapping of copy-number alterations with massively parallel sequencing
Chip type(s): Mapping250K_Sty
Description: "In order to benchmark the reproducibility of Affymetrix 238K Sty arrays for detecting copy-number alterations, we performed replicate hybridizations of 3 tumor cell lines and 2 paired normal cell lines obtained from the American Type Culture Collection (ATCC). [...] 77 replicates of HCC1143 (breast ductal carcinoma), 69 replicates of HCC1143BL (matched normal), 42 replicates of HCC1954 (breast ductal carcinoma), 36 replicates of HCC1954BL (matched normal), 1 replicate of NCI-H2347 (lung adenocarcinoma)"
Reference(s): Chiang, D. Y.; Getz, G.; Jaffe, D. B.; O'Kelly, M. J. T.; Zhao, X.; Carter, S. L.; Russ, C.; Nusbaum, C.; Meyerson, M. & Lander, E. S. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nature Methods, 2009, 6, 99-103
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12019

GSE13021: Copy number analysis of human glioblastoma multiforme
Chip type(s): Mapping250K_Nsp
Description: "Copy number analysis of human GBM samples were performed, and a high frequency of deletions of the PTPRD gene on chromosome 9p23-24.1 were identified. [...] Genomic DNA from 58 GBM tumor samples were hybridized to Affymetrix 250K NspI Gene Chip Arrays and analyzed by dChip using the hg17 genome assembly."
Reference(s): Solomon DA, Kim JS, Cronin JC, Sibenaller Z et al. Mutational inactivation of PTPRD in glioblastoma multiforme and malignant melanoma. Cancer Res 2008 Dec 15;68(24):10300-6.
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13021

GSE13372: High-resolution mapping of copy-number alterations with massively parallel sequencing
Chip type(s): GenomeWideSNP_6
Description: "In order to benchmark the reproducibility of Affymetrix Genome-Wide Human SNP Array 6.0 for detecting copy-number alterations, we performed replicate hybridizations of 3 tumor cell lines and 2 paired normal cell lines obtained from the American Type Culture Collection (ATCC). [...] 21 replicates of HCC1143 (breast ductal carcinoma), 21 replicates of HCC1143BL (matched normal), 13 replicates of HCC1954 (breast ductal carcinoma), 11 replicates of HCC1954BL (matched normal), 1 replicate of NCI-H2347 (lung adenocarcinoma), 1 replicate of NCI-H2347BL (matched normal)"
Reference(s): Chiang, D. Y.; Getz, G.; Jaffe, D. B.; O'Kelly, M. J. T.; Zhao, X.; Carter, S. L.; Russ, C.; Nusbaum, C.; Meyerson, M. & Lander, E. S. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nature Methods, 2009, 6, 99-103
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13372
URL: http://www.broad.mit.edu/cancer/pub/solexa_copy_numbers/

GSE14996: Multisampled Lethal Metastatic Prostate Cancer Copy Number Analysis
Chip type(s): GenomeWideSNP_6
Description: "Purified cancer DNA from frozen metastatic cancer tissue obtained at autopsy studied by Affymetrix Genome-Wide Human SNP (single nucleotide polymorphism) Array 6.0 analysis (Affy6)."
Reference(s): Liu W, Laitinen S, Khan S, Vihinen M et al. Copy number analysis indicates monoclonal origin of lethal metastatic prostate cancer. Nat Med 2009 May;15(5):559-65.
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14996

GSE16619: Identification of novel gene amplification events in breast cancer
Chip type(s): Mapping250K_Nsp, Mapping250K_Sty, GenomeWideSNP_5
Description: To identify novel gene amplification events that may contribute to breast cancer progression, we examined copy number variation in 161 primary breast cancer samples using the Affymetrix 250K_Nsp and 250K_Sty microarrays or the Affymetrix SNP5.0 microarray.
Reference(s): Kadota M, Sato M, Duncan B, Ooshima A et al. Identification of novel gene amplifications in breast cancer and coexistence of gene amplification with an activating mutation of PIK3CA. Cancer Res 2009 Sep 15;69(18):7357-65. PMID: 19706770.
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16619

GSE20584: The mutation spectrum revealed by paired genome sequences from a lung cancer patient
Chip type(s): GenomeWideSNP_6
Description: One lung tumor and its adjacent normal were profiled for copy-number alterations with the high-resolution Affymetrix SNP6.0 Array.
Reference(s): Lee W, Jiang Z, Liu J, Haverty PM et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010 May 27;465(7297):473-7. PMID: 20505728
URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20584
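
All GEO entries above share one URL pattern, `acc.cgi?acc=<accession>`. Below is a minimal sketch for building such links from a list of series accessions. It assumes the pattern shown on this page still resolves, and the helper name `geo_series_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL copied from the GEO entries listed on this page.
GEO_ACC_BASE = "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_series_url(accession):
    """Return the GEO landing-page URL for a series accession such as 'GSE5258'."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError("not a GEO series accession: %r" % (accession,))
    return GEO_ACC_BASE + "?" + urlencode({"acc": accession})


# A few of the series listed above
for acc in ["GSE5258", "GSE9222", "GSE13372"]:
    print(geo_series_url(acc))
```

The check on the `GSE` prefix is deliberately strict so that accessions from other repositories (for example ArrayExpress IDs such as E-TABM-185) are rejected rather than turned into broken links.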

Data sets from The Cancer Genome Atlas (TCGA) project

TCGA: TCGA Data Portal
Description: The Cancer Genome Atlas (TCGA) project.
URL: http://tcga-data.nci.nih.gov/tcga/homepage.htm (Data Access Matrix)
URL: http://tcga-data.nci.nih.gov/tcga/findArchives.htm (Search by Archives; old approach)

Data sets from The Cancer Genome Project (Sanger Institute)

URL: http://www.sanger.ac.uk/genetics/CGP/

CGP: CGP Data Archive
Chip type(s): GenomeWideSNP_6
Description: The Cancer Genome Project data repository.
URL: http://www.sanger.ac.uk/genetics/CGP/Archive/

Data sets from The Broad Institute

Broad Institute et al.: Broad-Novartis Cancer Cell Line Encyclopedia (CCLE)
Chip type(s): GenomeWideSNP_6, HG-U133_Plus_2, OncoMap mutation data, Hybrid capture sequencing
Description: "The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute, and the Novartis Institutes for Biomedical Research and its Genomics Institute of the Novartis Research Foundation to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models, to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis and visualization for about 1000 cell lines."
URL: http://www.broadinstitute.org/ccle/

Broad Institute: GenomeWideSNP_6 test samples for Birdsuite
Chip type(s): GenomeWideSNP_6
URL: http://www.broad.mit.edu/mpg/birdsuite/download.html
Description: 3 male and 3 female GenomeWideSNP_6 CEL files (part of the "Test input files").

Broad Institute: Genotyping the 270 HapMap samples for GAIN
URL: http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000049
Description: 270 HapMap samples data set [270 hybridizations] - No more details.

Broad Institute: Assessing the significance of chromosomal aberrations in cancer
Chip type(s): Mapping50K_Hind240, Mapping50K_Xba240
Description: Brain Cancer, SNP/CN Analysis. 154 glioma and 33 normal samples each hybridized on the 100K chip set totalling (154+33)*2=374 hybridizations/CEL files. Also available at GEO as GSE9635.
Reference: Beroukhim et al. 2007, Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma, PNAS, December 2007.
URL: http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=162&p=t

Broad Institute et al.: Characterizing the cancer genome in lung adenocarcinoma
Chip type(s): Mapping250K_Sty
Description: 2*384=768 Affymetrix Mapping250K_Sty CEL files for 384 tumor/normal sample pairs for Weir et al. (2007).
Reference: Weir et al. Characterizing the cancer genome in lung adenocarcinoma. Nature, December 2007, 450, 893-898.
URL: http://www.broad.mit.edu/cancer/pub/tsp/

Broad Institute et al.: Connectivity Map (02)
Chip type(s): HG-U133A
Description: 6100 Affymetrix HG-U133A CEL files.
Reference: (1) Lamb et al., The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease, Science, 2006. (2) J. Lamb, The Connectivity Map: a new tool for biomedical research, Nat Rev Cancer, 2007.
URL: http://www.broadinstitute.org/cmap/

Data sets from NIH's REMBRANDT

"REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT) is a robust bioinformatics knowledgebase framework that leverages data warehousing technology to host and integrate clinical and functional genomics data from clinical trials involving patients suffering from Gliomas. The knowledge framework will provide researchers with the ability to perform ad hoc querying and reporting across multiple data domains, such as Gene Expression, Chromosomal aberrations and Clinical data."
URL: https://caintegrator.nci.nih.gov/rembrandt/

Data sets from elsewhere

GSK Cancer Cell Line Genomic Profiling Data
Chip types: Mapping250K_Nsp, Mapping250K_Sty, HG-U133_Plus_2
Description: "GlaxoSmithKline (GSK) has released the genomic profiling data for over 300 cancer cell lines via the National Cancer Institute's cancer Bioinformatics Grid (caBIG). Cancer cell lines can be manipulated in the laboratory and have been used extensively by GSK in the discovery and development of novel cancer therapeutics."
Data files: CEL files for a total of 676 Affymetrix Mapping250K_Nsp or Mapping250K_Sty hybridizations and 950 HG-U133_Plus_2 hybridizations. MAS5 summaries are also available for the expression arrays.
URL: https://cabig.nci.nih.gov/tools/caArray_GSKdata/

Harvard: The Meyerson Lab
Chip types: ax13339 ("Early Access 10K")
Description: 58 paired CEL and *.txt genotype files for the ax13339 ("Early Access 10K") used in Zhao X et al. (2004).
References: Zhao X et al., An Integrated View of Copy Number and Allelic Alterations in the Cancer Genome Using Single Nucleotide Polymorphism Arrays, Cancer Research, 64, 3060-3071, May 2004.
URL: http://research.dfci.harvard.edu/meyersonlab/snp/snp.htm (under 'Data download')

Personal Genome Project (PGP)
Description: Currently there are 10 individuals on whom PGP is doing SNP, CN, and exon analysis. It is not clear whether only the results or also the raw data are or will be available. It looks like they will publish many more "personal genomes" later.
URL: http://www.personalgenomes.org/public/

Miscellaneous

ACTuDB: a database for the integrated analysis of array-CGH and clinical data for tumors
Author(s): The Curie Institute - Bioinformatics Unit
URL: http://bioinfo-out.curie.fr/actudb/ (see the 'Content' section)
Description: Contains a compiled list of publicly available copy-number data sets. Some of the data sets also have coupled expression data. GPL6801 or GPL8226