Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes
Open Access
- 5 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (13), 1662-1668
- https://doi.org/10.1093/bioinformatics/btp295
Abstract
Motivation: According to current consistency metrics such as the percentage of overlapping genes (POG), lists of differentially expressed genes (DEGs) detected in different microarray studies of the same complex disease are often highly inconsistent. This irreproducibility problem also exists in other high-throughput post-genomic areas such as proteomics and metabolomics. A complex disease is often characterized by many coordinated molecular changes, which should be taken into account when evaluating the reproducibility of discovery lists from different studies.

Results: We proposed two metrics, percentage of overlapping genes-related (POGR) and normalized POGR (nPOGR), to evaluate the consistency between two DEG lists for a complex disease by considering correlated molecular changes rather than only counting the genes shared by the lists. Using microarray datasets for three diseases, we showed that although the POG scores between DEG lists from different studies of each disease are extremely low, the POGR and nPOGR scores can be rather high. This suggests that apparently inconsistent DEG lists may be highly reproducible, in the sense that they are significantly correlated. Evaluating different discovery results for a disease with the POGR and nPOGR scores should therefore reduce the uncertainty of microarray studies. The proposed metrics could also be applicable in many other high-throughput post-genomic areas.

Contact: guoz@ems.hrbmu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
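The POG score criticized above is a plain, directional set overlap, while the idea behind POGR is to also credit genes in one list that are related to genes in the other list. The sketch below illustrates both ideas under assumptions that are ours, not the paper's: gene relations are supplied as a hypothetical iterable `related_pairs` (for example, significantly co-expressed gene pairs), the direction of regulation is ignored, and the normalization uses a kappa-style chance correction with a permutation-based expected score. The helper names `pog`, `pogr`, `expected_by_sampling` and `chance_normalized` are illustrative only and should not be read as the exact POGR and nPOGR definitions from the article.

```python
import random


def pog(list_a, list_b):
    """Percentage of overlapping genes, as a fraction: |A & B| / |A|.

    The score is directional: pog(A, B) and pog(B, A) differ when the
    lists have different lengths.
    """
    a, b = set(list_a), set(list_b)
    if not a:
        raise ValueError("list_a must contain at least one gene")
    return len(a & b) / len(a)


def pogr(list_a, list_b, related_pairs):
    """Illustrative POGR-style score (assumed definition, not the paper's).

    A gene in A counts as reproduced if it is also in B, or if it is related
    to at least one gene in B according to `related_pairs`, a hypothetical
    iterable of undirected (gene, gene) relations such as co-expressed pairs.
    Direction of regulation is ignored in this sketch.
    """
    a, b = set(list_a), set(list_b)
    if not a:
        raise ValueError("list_a must contain at least one gene")
    neighbors = {}
    for g1, g2 in related_pairs:
        neighbors.setdefault(g1, set()).add(g2)
        neighbors.setdefault(g2, set()).add(g1)
    hits = sum(1 for g in a if g in b or neighbors.get(g, set()) & b)
    return hits / len(a)


def expected_by_sampling(list_a, list_b, related_pairs, background,
                         n_iter=1000, seed=0):
    """Monte Carlo mean of pogr() when B is replaced by random gene lists
    of the same size drawn from `background` (an assumed null model)."""
    rng = random.Random(seed)
    pool = sorted(set(background))  # sorted so a fixed seed is reproducible
    size = len(set(list_b))
    total = 0.0
    for _ in range(n_iter):
        total += pogr(list_a, rng.sample(pool, size), related_pairs)
    return total / n_iter


def chance_normalized(observed, expected):
    """Kappa-style chance correction: (observed - expected) / (1 - expected)."""
    if expected >= 1.0:
        raise ValueError("expected score must be below 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    deg_a = ["G1", "G2", "G3", "G4"]
    deg_b = ["G1", "G5", "G6"]
    relations = [("G2", "G5"), ("G3", "G7")]  # hypothetical relations
    print(pog(deg_a, deg_b))              # 0.25: only G1 is shared
    print(pogr(deg_a, deg_b, relations))  # 0.5: G1 shared, G2 related to G5
    exp = expected_by_sampling(deg_a, deg_b, relations,
                               background=deg_a + deg_b + ["G7", "G8"])
    print(chance_normalized(pogr(deg_a, deg_b, relations), exp))
```

In this toy example the two lists share only one gene, so POG is low, but crediting the assumed G2-G5 relation doubles the score. Keeping the size of the random lists fixed in the null model means the normalized score reflects agreement beyond what lists of that length would reach by chance; a faithful reimplementation would need the article's own relation criteria and normalization.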