Correlation test to assess low-level processing of high-density oligonucleotide microarray data
Open Access
- 31 March 2005
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 6 (1), 80
- https://doi.org/10.1186/1471-2105-6-80
Abstract
There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but there is no method to assess whether a particular technique is appropriate for a specific data set without reference to external data. We analyzed coregulation between genes, measured as statistical correlation, in order to detect insufficient normalization between arrays. In a large collection of genes, a random pair of genes should have zero correlation on average, which allows a correlation test. For all data sets that we evaluated, and for the three most commonly used low-level processing procedures (MAS5, RMA and MBEI), the housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation for absent genes. We also found that a second round of normalization on the probe set level improved normalization significantly throughout. Previous evaluation of low-level processing in the literature has been limited to artificial spike-in and mixture data sets. In the absence of a known gold standard, the correlation criterion allows us to assess the appropriateness of low-level processing of a specific data set and the success of normalization for subsets of genes.
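The idea behind the correlation criterion can be illustrated with a minimal sketch. It is not the authors' exact test statistic: the function names, the random-pair sampling scheme, the z-score against zero and the simulated data are all assumptions made for illustration. The sketch draws random gene pairs from an expression matrix (rows are probe sets, columns are arrays), computes their Pearson correlations across arrays, and checks whether the mean correlation departs from zero. A shared array-specific offset, as left behind by insufficient normalization between arrays, pushes the mean correlation above zero.

```python
import numpy as np


def random_pair_correlations(expr, n_pairs=2000, seed=0):
    """Pearson correlations across arrays for randomly drawn gene pairs.

    expr: 2-D array, rows = genes (probe sets), columns = arrays.
    Assumes every gene varies across arrays (nonzero standard deviation).
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    i = rng.integers(0, n_genes, size=n_pairs)
    j = rng.integers(0, n_genes, size=n_pairs)
    keep = i != j  # discard pairs that picked the same gene twice
    i, j = i[keep], j[keep]
    # Standardize each gene so that the correlation is a mean of products.
    z = expr - expr.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)
    return (z[i] * z[j]).mean(axis=1)


def mean_correlation_z(corrs):
    """Approximate z-score of the mean pair correlation against zero.

    Treats the sampled pairs as independent, which is only an approximation.
    """
    return corrs.mean() / (corrs.std(ddof=1) / np.sqrt(corrs.size))


# Simulated illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 20))           # independent genes, 20 arrays
array_effect = rng.normal(size=(1, 20))      # one offset per array
shifted = clean + array_effect               # offset shared by all genes

corr_clean = random_pair_correlations(clean)
corr_shifted = random_pair_correlations(shifted)
z_clean = mean_correlation_z(corr_clean)
z_shifted = mean_correlation_z(corr_shifted)
print(f"clean:   mean r = {corr_clean.mean():.3f}, z = {z_clean:.1f}")
print(f"shifted: mean r = {corr_shifted.mean():.3f}, z = {z_shifted:.1f}")
```

In this sketch a large positive z-score for the shifted matrix signals residual array effects, while the clean matrix should stay close to zero. Restricting the rows of `expr` to a subset, such as genes called absent, would apply the same check to that subset.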