Abstract
This paper first provides a critical review of existing methods for estimating the prediction error in classifying microarray data, where the number of genes greatly exceeds the number of specimens. Special attention is given to bootstrap-related methods. When the sample size n is small, we find that all of the reviewed methods suffer from either substantial bias or substantial variability. We introduce a repeated leave-one-out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate of the prediction error. Even with small samples, it does not suffer from the large upward bias seen with the leave-one-out bootstrap and the 0.632+ bootstrap, nor from the large variability of leave-one-out cross-validation in microarray applications. Copyright © 2007 John Wiley & Sons, Ltd.
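To make the two ideas summarized above concrete, the following is a minimal sketch, not the authors' implementation. It assumes ln denotes a learning set of size ℓ·n for a fraction ℓ of the sample size, uses a nearest-neighbour classifier and B = 50 bootstrap replicates purely for illustration, and assumes an inverse-power learning curve; the paper's actual learning-curve form, classifiers, and replicate counts are specified in the body of the article, not here.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.neighbors import KNeighborsClassifier


def rloob_error(X, y, ell, B=50, make_clf=lambda: KNeighborsClassifier(n_neighbors=1)):
    """Repeated leave-one-out bootstrap (RLOOB) sketch.

    For each specimen i, draw B bootstrap learning sets of size ell*n from
    the remaining n-1 specimens, train a classifier on each, and record the
    fraction of times specimen i is misclassified; average over specimens.
    """
    n = len(y)
    size = max(1, int(round(ell * n)))
    rng = np.random.default_rng(0)
    errors = []
    for i in range(n):
        pool = np.delete(np.arange(n), i)            # leave specimen i out
        miss = 0
        for _ in range(B):
            idx = rng.choice(pool, size=size, replace=True)   # bootstrap learning set
            clf = make_clf().fit(X[idx], y[idx])
            miss += int(clf.predict(X[i:i + 1])[0] != y[i])
        errors.append(miss / B)
    return float(np.mean(errors))


def abs_error(X, y, ells=(0.4, 0.6, 0.8, 1.0)):
    """Adjusted bootstrap (ABS) sketch: fit a learning curve to RLOOB
    estimates at several learning-set sizes and read it off at size n.
    The inverse-power form below is an assumption for illustration."""
    n = len(y)
    sizes = np.array(ells) * n                       # learning-set sizes ell*n
    estimates = np.array([rloob_error(X, y, ell) for ell in ells])
    curve = lambda m, a, b, c: a + b * m ** (-c)     # assumed learning-curve form
    (a, b, c), _ = curve_fit(curve, sizes, estimates, p0=[0.1, 1.0, 0.5], maxfev=10000)
    return float(curve(n, a, b, c))                  # adjusted estimate at size n
```

As a usage note, `abs_error(X, y)` would be called on an n-by-p expression matrix `X` and class labels `y`; the learning-curve fit is what adjusts the pessimism of RLOOB estimates computed from learning sets smaller than n.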