Significance and statistical errors in the analysis of DNA microarray data
Open Access
- 16 September 2002
- journal article
- research article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 99 (20), 12975-12978
- https://doi.org/10.1073/pnas.162468199
Abstract
DNA microarrays are important devices for high-throughput measurements of gene expression, but no rational foundation has been established for understanding the sources of within-chip statistical error. We designed a specialized chip and protocol to investigate the distribution and magnitude of within-chip errors and discovered that, as expected on theoretical grounds, measurement errors follow a Lorentzian-like distribution, which explains the widely observed but unexplained poor reproducibility of microarray data. Using this chip, we examined a data set of repeated measurements to extract estimates of the distribution and magnitude of statistical errors in DNA microarray measurements. Using the common “ratio of medians” method, we find that the measurements follow a Lorentzian-like distribution, which is problematic for subsequent analysis. We show that an analysis method dubbed “median of ratios” yields a more Gaussian-like distribution of errors. Finally, we show that the bootstrap algorithm can be used to extract the best estimates of the error in the measurement. Quantifying the statistical error in such measurements has important applications for estimating significance levels, clustering algorithms, and process optimization.
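The two estimators named in the abstract, and a bootstrap estimate of their error, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the paired intensities of the two channels are available as equal-length lists of positive numbers, and the function names, the default of 1000 resamples, and the fixed seed are all hypothetical choices made for the example.

```python
import random
import statistics


def ratio_of_medians(red, green):
    """Median of one channel divided by the median of the other channel."""
    return statistics.median(red) / statistics.median(green)


def median_of_ratios(red, green):
    """Median of the element-by-element ratios of the two channels."""
    return statistics.median(r / g for r, g in zip(red, green))


def bootstrap_standard_error(red, green, estimator, n_resamples=1000, seed=0):
    """Bootstrap standard error of `estimator` (illustrative defaults).

    Pairs (red[i], green[i]) are resampled together, with replacement,
    so that any correlation between the two channels is preserved.
    """
    rng = random.Random(seed)
    n = len(red)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            estimator([red[i] for i in idx], [green[i] for i in idx])
        )
    # The spread of the resampled estimates approximates the standard error.
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Made-up intensities, used only to show that the estimators can differ.
    red = [2.0, 4.0, 9.0]
    green = [1.0, 4.0, 3.0]
    print(ratio_of_medians(red, green))
    print(median_of_ratios(red, green))
    print(bootstrap_standard_error(red, green, median_of_ratios))
```

Resampling the pairs jointly rather than each channel separately is a design choice of this sketch. It matters when the two channels are correlated, which is the situation in which the ratio of two noisy quantities develops heavy, Lorentzian-like tails.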