HuEx-1_0-st-v2: Affymetrix transcript clusters definitions

Author: Elizabeth Purdom, Mark Robinson, Ken Simpson
Created on: 2007-12-03
Last updated: 2012-08-30

Affymetrix provides a clustering of the exon probesets into transcript clusters, which are meant to roughly correspond to genes (see Affymetrix documentation). An advantage of these groupings is that each exon and each probe is mapped to exactly one transcript cluster. For exon array analysis, we would like a special CDF that maps transcript cluster IDs to exon IDs. Affymetrix does not provide a CDF for these mappings, but Ken Simpson created such CDFs based on the design-time annotation. However, Affymetrix updates these definitions quarterly, so the design-time definitions no longer correspond to the definitions that users of Affymetrix's software would get. Elizabeth Purdom has created updated CDFs based on the Affymetrix annotation and plans to keep them updated.

Note that Affymetrix classifies each probeset (exon) according to its reliability, based on which annotation supports it. The classifications are "core", "extended", "full", "free", and "ambiguous", with "core" being the most reliable. To restrict an analysis to the probesets of one classification, use the corresponding CDF. Note that if you are going to switch back and forth between CDFs, you should probably define a tag in your call so that results from one CDF do not overwrite results from another. See Page 'Fullnames, names and tags of directories and files' and thread 'tag' (Dec 5, 2007). Because there will be future revisions, the string 'Rx' is added to the end of the CDF name to indicate the revision; it gives a shorter way of identifying the CDF than the date, for example in tagging.

NetAffx Nov. 12, 2007 (R3)

Author: Elizabeth Purdom, UC Berkeley.
Created on: 2007-11-12

NetAffx Sept. 14, 2007 (R2)

Author: Elizabeth Purdom, UC Berkeley.
Created on: 2007-09-14

"I used the file 'HuEx-1_0-st-v2.na23.hg18.probeset.csv' to map the probesets defined in Affymetrix's default CDF into transcript clusters. Note that I did not use definitions at the probe level, so if probes are mapped to different probesets in the updated annotation, I would have missed this (I do not think this is the case). One probeset in the current annotation is not contained in Affymetrix's default CDF, otherwise all probesets in the annotation file matched a probeset in the Affymetrix default CDF; I have not investigated this one probeset yet, though I think that it was also the case for Ken's conversion based on notes I received from him." /Elizabeth Purdom (2007-12-03).
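The mapping step described above can be sketched roughly as follows. This is only an illustration in Python, not the code that was used to build the CDFs. It assumes the annotation CSV has columns named `probeset_id`, `transcript_cluster_id`, and `level`, and that any leading comment lines start with `#`; check these against the actual file before relying on them. The function name and the sample rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def group_probesets_by_transcript_cluster(lines, levels=None):
    """Return {transcript_cluster_id: [probeset_id, ...]}.

    `lines` is any iterable of text lines from the annotation CSV.
    The column names used here are assumptions about the file layout.
    If `levels` is given (e.g. {"core"}), only probesets whose `level`
    is in that set are kept.
    """
    # Drop comment lines (assumed to start with '#') before the header row.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data_lines)

    clusters = defaultdict(list)
    for row in reader:
        if levels is not None and row["level"] not in levels:
            continue
        clusters[row["transcript_cluster_id"]].append(row["probeset_id"])
    return dict(clusters)


# Made-up rows, for illustration only (not real annotation data).
SAMPLE = """\
#%comment=illustrative data only
probeset_id,transcript_cluster_id,level
1001,9001,core
1002,9001,extended
1003,9002,core
"""

all_clusters = group_probesets_by_transcript_cluster(io.StringIO(SAMPLE))
core_clusters = group_probesets_by_transcript_cluster(
    io.StringIO(SAMPLE), levels={"core"}
)
```

For a real file, one would pass an open file handle instead of the `io.StringIO` sample. Note that, like the procedure in the quote, this works at the probeset level only and does not check probe-level definitions.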

Download all: HuEx-1_0-st-v2,R2,A20070914,EP.zip

Note: Above files are hosted by Berkeley Cancer Genome Center.

Design-time annotation

Author: Ken Simpson, WEHI.
Created on: 2007-??-??

"I used the file 'HuEx-1_0-st-v2.r2.dt1.hg18.core.mps', downloaded from the Affymetrix web site, to create a list of transcript-exon mappings. This was used to pull out the cell indices corresponding to each transcript, and new CDFs." /Ken Simpson, 2007-02-21.

"Note that these annotations have some problems. In particular, there are a set of probes that are each mapped to multiple probesets. This is not the case for the NetAffx files above so far. The source for this problem has not been discovered. See thread 'My problem with the residuals -- solved??' (October 18, 2007) for the hazards this will create. If there is a desire for CDFs corresponding to the design-time annotation rather than the updated annotation above, please contact me and I might see if I can correct this problem." /Elizabeth Purdom (2007-12-03).
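One way to check a set of definitions for the problem described above is to look for probes (cell indices) that appear in more than one probeset. The following is a minimal Python sketch of that check, assuming the probeset-to-cell-index mapping has already been extracted from the CDF into a dictionary; the function name and example data are hypothetical.

```python
from collections import defaultdict


def find_multiply_mapped_probes(probeset_cells):
    """Return {cell_index: [probeset_id, ...]} for shared probes.

    `probeset_cells` maps each probeset ID to the cell indices of its
    probes. Only cell indices that occur in more than one probeset are
    returned, each with the sorted list of probesets that contain it.
    """
    owners = defaultdict(set)
    for probeset_id, cells in probeset_cells.items():
        for cell in cells:
            owners[cell].add(probeset_id)
    return {cell: sorted(ids) for cell, ids in owners.items() if len(ids) > 1}


# Made-up example: cell index 3 belongs to both psA and psB.
example = {"psA": [1, 2, 3], "psB": [3, 4], "psC": [5]}
shared = find_multiply_mapped_probes(example)
```

An empty result means every probe belongs to exactly one probeset, which is what the quote reports for the NetAffx-based files so far.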