A new method for class prediction based on signed-rank algorithms applied to Affymetrix® microarray experiments

Open Access

11 January 2008

journal article
research article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 9 (1), 16
https://doi.org/10.1186/1471-2105-9-16

Abstract

The large amount of data generated by DNA chips provides a powerful basis for classifying various pathologies. However, the constant evolution of microarray technology makes it difficult to combine data from different chip types for class prediction in limited sample populations. Affymetrix® technology provides both a quantitative fluorescence signal and a decision (detection call: absent or present) based on signed-rank algorithms applied to several hybridization repeats of each gene, with per-chip normalization.

We developed a new method for predicting class membership that uses only the detection calls from recent Affymetrix chip types. Biological data were obtained by hybridization of purified normal B cells and of cells from three independent groups of multiple myeloma (MM) patients on U133A, U133B and U133Plus 2.0 microarrays. A call-based data reduction step first filtered out probe sets that do not discriminate between classes. The resulting gene list was then reduced to a predictor, with correction for multiple testing, by iteratively deleting probe sets so as to sequentially improve inter-class comparisons and their significance. The error rate of the method was determined using leave-one-out and 5-fold cross-validation. The method was successfully applied to (i) build a sex predictor from the normal donor group that classified gender without error in all patient groups except for male MM samples with a Y chromosome deletion, (ii) predict the immunoglobulin light and heavy chains expressed by the malignant myeloma clones of the validation group, and (iii) predict sex and light and heavy chain type for every new patient. Finally, the method was shown to be powerful when compared with the popular classification method Prediction Analysis of Microarray (PAM).

This normalization-free method is routinely used for quality control and for correcting collection errors in patient reports to clinicians.
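To illustrate the kind of signed-rank decision the method builds on, the Python sketch below computes a detection call for one probe set from its perfect-match (PM) and mismatch (MM) probe intensities. It assumes the defaults we recall from Affymetrix's published MAS5/GCOS algorithm description: a discrimination score R = (PM - MM)/(PM + MM) per probe pair, a threshold tau = 0.015, a one-sided Wilcoxon signed-rank test of R - tau, and calls of Present if p < 0.04, Marginal if 0.04 <= p < 0.06, and Absent otherwise. It is a simplified sketch, not the vendor implementation: it ignores saturated probes and uses an exact tie-corrected permutation p-value, which may differ from the software's own computation.

```python
# Simplified sketch of an Affymetrix-style detection call (assumed MAS5 defaults).

TAU = 0.015     # assumed default discrimination threshold
ALPHA1 = 0.04   # assumed Present/Marginal cut-off
ALPHA2 = 0.06   # assumed Marginal/Absent cut-off


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_pvalue(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: median(diffs) > 0."""
    nonzero = [d for d in diffs if d != 0]  # zero differences carry no sign
    if not nonzero:
        return 1.0
    ranks = average_ranks([abs(d) for d in nonzero])
    doubled = [int(round(2 * r)) for r in ranks]  # average ranks are multiples of 0.5
    observed = sum(r for r, d in zip(doubled, nonzero) if d > 0)
    # counts[s] = number of sign assignments whose positive (doubled) rank sum is s
    counts = [0] * (sum(doubled) + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** len(doubled)


def detection_call(pm, mm, tau=TAU, alpha1=ALPHA1, alpha2=ALPHA2):
    """Return ('P' | 'M' | 'A', p-value) for one probe set's PM/MM intensity pairs."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]  # assumes positive intensities
    pvalue = signed_rank_pvalue([r - tau for r in scores])
    if pvalue < alpha1:
        return "P", pvalue
    if pvalue < alpha2:
        return "M", pvalue
    return "A", pvalue


if __name__ == "__main__":
    # Hypothetical intensities for an 11-probe-pair probe set.
    print(detection_call([1000] * 11, [100] * 11))  # strong PM signal
    print(detection_call([100] * 11, [100] * 11))   # PM indistinguishable from MM
```

Because the call depends only on the ranks of within-chip PM/MM comparisons, it is insensitive to chip-wide intensity scaling, which is consistent with the abstract's description of the approach as normalization-free.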
The method can easily be extended to multiclass prediction suited to clinical groups, and it looks particularly promising, through international cooperative projects such as the US FDA Microarray Quality Control project (MAQC), as a predictive classifier for diagnosis, prognosis and response to treatment. It can also serve as a powerful tool to mine published data generated on Affymetrix systems and, more generally, to classify samples with binary feature values.
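Because the predictor operates on binary Present/Absent calls, any classifier for binary features can illustrate the overall workflow of filtering, training and cross-validated error estimation. The sketch below is not the authors' predictor. It substitutes a Bernoulli naive Bayes classifier, replaces their call-based data reduction and iterative deletion with multiple-testing correction with a hypothetical frequency-difference filter (`min_diff`), and estimates the error rate by leave-one-out cross-validation. Calls are encoded as 1 (Present) and 0 (Absent), and the toy data are invented for illustration.

```python
import math
from collections import defaultdict


def present_frequencies(samples, labels):
    """Per class, Laplace-smoothed fraction of samples called Present at each probe set."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    freqs = {y: [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
             for y, rows in by_class.items()}
    counts = {y: len(rows) for y, rows in by_class.items()}
    return freqs, counts


def train(samples, labels, min_diff=0.5):
    """Fit a Bernoulli naive Bayes model on probe sets kept by a hypothetical filter."""
    freqs, counts = present_frequencies(samples, labels)
    classes = sorted(freqs)
    # Hypothetical filter: keep probe sets whose Present-call frequency
    # differs by at least min_diff between two classes.
    keep = [j for j in range(len(samples[0]))
            if max(freqs[c][j] for c in classes)
            - min(freqs[c][j] for c in classes) >= min_diff]
    priors = {c: counts[c] / len(labels) for c in classes}
    return classes, keep, freqs, priors


def predict(model, x):
    """Return the class with the highest log-posterior for the call vector x."""
    classes, keep, freqs, priors = model
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(priors[c])
        for j in keep:
            p = freqs[c][j]
            score += math.log(p if x[j] else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def leave_one_out_error(samples, labels, min_diff=0.5):
    """Fraction of samples misclassified when each one is held out in turn."""
    errors = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], min_diff)
        if predict(model, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


# Hypothetical toy calls for 4 probe sets: a female-specific marker, a
# male-specific marker, a ubiquitous gene and an uninformative one.
TOY_CALLS = [[1, 0, 1, 0], [1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1],
             [0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 0]]
TOY_LABELS = ["F"] * 4 + ["M"] * 4

if __name__ == "__main__":
    print("LOO error rate:", leave_one_out_error(TOY_CALLS, TOY_LABELS))
    print("kept probe sets:", train(TOY_CALLS, TOY_LABELS)[1])
```

Because `train` re-runs the probe-set filter inside every leave-one-out fold, the held-out sample never influences which probe sets are kept. This avoids the optimistic bias that arises when features are selected on the full data set before cross-validation.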
