Quality Assessment for Short Oligonucleotide Microarray Data

Abstract
The quality of microarray gene expression data has emerged as a new research topic. As in other areas, microarray quality is assessed by comparing suitable numerical summaries across microarrays, so that outliers and trends can be visualized and poor-quality arrays or variable-quality sets of arrays can be identified. Because each array comprises tens or hundreds of thousands of measurements, the challenge is to find numerical summaries that support accurate quality calls. Toward this end, several new quality measures are introduced based on probe-level and probeset-level information, all obtained as a byproduct of the low-level analysis algorithms RMA/fitPLM for Affymetrix GeneChips. Quality landscapes spatially localize chip or hybridization problems. Numerical chip quality measures are derived from the distributions of normalized unscaled standard errors (NUSE) and relative log expressions (RLE). The quality of chip batches is assessed by residual scale factors. These quality assessment measures are demonstrated on a variety of data sets, including spike-in experiments, small lab experiments, and multisite studies, and are compared with Affymetrix's individual chip quality report.
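
As a rough illustration of the two distribution-based measures named above, the following Python sketch computes relative log expressions and normalized unscaled standard errors under their commonly used definitions, which are assumed here rather than taken from this abstract: each probeset's value on an array is compared with that probeset's median across all arrays, by subtraction for log expressions and by division for standard errors. The function names and the toy numbers are hypothetical. In practice the inputs would be the probeset-level estimates and standard errors produced by an RMA/fitPLM-style fit.

```python
from statistics import median

def relative_log_expression(log_expr):
    """Subtract each probeset's median (across arrays) from its log expressions.

    log_expr: list of rows, one per probeset; each row holds one log2
    expression value per array.
    """
    return [[value - median(row) for value in row] for row in log_expr]

def normalized_unscaled_se(std_err):
    """Divide each probeset's standard errors by their median across arrays.

    std_err: list of rows, one per probeset; each row holds one standard
    error estimate per array.
    """
    return [[se / median(row) for se in row] for row in std_err]

def column_medians(matrix):
    """Summarize each array (column) by the median over probesets."""
    return [median(column) for column in zip(*matrix)]

# Toy data (made up for illustration): 3 probesets measured on 3 arrays.
# The third array is deliberately shifted upward and noisier.
log_expr = [
    [8.0, 8.1, 9.0],
    [5.0, 5.2, 6.1],
    [10.0, 9.9, 11.0],
]
std_err = [
    [0.10, 0.10, 0.20],
    [0.05, 0.05, 0.10],
    [0.20, 0.20, 0.40],
]

rle_medians = column_medians(relative_log_expression(log_expr))
nuse_medians = column_medians(normalized_unscaled_se(std_err))
```

For a good-quality array one would expect a median relative log expression near 0 and a median normalized unscaled standard error near 1. In the toy data the third array departs from both, which is the kind of pattern these summaries are meant to expose when compared across arrays.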