Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles
Open Access
- 24 February 2010
- journal article
- Published by Springer Science and Business Media LLC in BMC Genomics
- Vol. 11 (1), 1-14
- https://doi.org/10.1186/1471-2164-11-134
Abstract
Background: Microarray technology is a popular means of producing whole genome transcriptional profiles; however, the high cost and scarcity of mRNA have led many studies to be conducted based on the analysis of single samples. We exploit the design of the Illumina platform, specifically multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study.
Results: A clear batch-specific bias was detected in the measured expressions of both the UHRR and clinical samples. This bias was found to persist following standard microarray normalisation techniques. However, when mean-centering or empirical Bayes batch-correction methods (ComBat) were applied to the data, inter-batch variation in the UHRR and clinical samples was greatly reduced. Batch-correction using ComBat improved the correlation between replicate UHRR samples by two orders of magnitude (from a range of 0.9833-0.9991 to a range of 0.9997-0.9999) and increased the consistency of the gene-lists from the duplicate clinical samples, from 11.6% in quantile normalised data to 66.4% in batch-corrected data. The use of UHRR as an inter-batch calibrator provided a small additional benefit when used in conjunction with ComBat, further increasing the agreement between the two gene-lists, up to 74.1%.
Conclusion: In the interests of practicalities and cost, these results suggest that single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators appreciate the propensity for such variation in the design stages of a microarray experiment and that the use of suitable correction methods become routine during the statistical analysis of the data.
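Of the two correction approaches named in the abstract, mean-centering is the simpler to illustrate. The following is a minimal sketch under stated assumptions, not the authors' pipeline and not ComBat itself: it assumes log-scale expression values in a genes-by-samples array, one batch label per sample, and synthetic data generated purely for illustration. The function name `mean_center_by_batch` is hypothetical.

```python
import numpy as np


def mean_center_by_batch(expr, batches):
    """Subtract each gene's within-batch mean from its values in that batch.

    expr    : (genes x samples) array of log-scale expression values (assumed)
    batches : one batch label per sample (assumed)
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    corrected = np.empty_like(expr)
    for b in np.unique(batches):
        cols = batches == b
        gene_means = expr[:, cols].mean(axis=1, keepdims=True)
        corrected[:, cols] = expr[:, cols] - gene_means
    return corrected


# Synthetic illustration only: 200 genes, two batches of three samples each.
rng = np.random.default_rng(0)
n_genes = 200
batches = np.array(["A", "A", "A", "B", "B", "B"])
signal = rng.normal(8.0, 1.0, size=(n_genes, 1))     # shared per-gene level
noise = rng.normal(0.0, 0.05, size=(n_genes, 6))     # small technical noise
offset_b = rng.normal(0.0, 0.5, size=(n_genes, 1))   # gene-specific bias in batch B

expr = signal + noise
expr[:, 3:] += offset_b

corrected = mean_center_by_batch(expr, batches)

is_a = batches == "A"
is_b = batches == "B"
# Mean absolute per-gene gap between the two batch means, before and after.
gap_before = np.abs(expr[:, is_b].mean(axis=1) - expr[:, is_a].mean(axis=1)).mean()
gap_after = np.abs(corrected[:, is_b].mean(axis=1) - corrected[:, is_a].mean(axis=1)).mean()
print(f"mean |batch gap| before: {gap_before:.3f}, after: {gap_after:.3g}")
```

Mean-centering removes every per-gene difference between batch means, so it is only appropriate when the batches are expected to have comparable biological composition. Empirical Bayes methods such as ComBat additionally shrink the per-batch estimates across genes, which this sketch does not attempt.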