A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification
- 11 July 2007
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 26 (29), 5320-5334
- https://doi.org/10.1002/sim.2968
Abstract
This paper first provides a critical review of some existing methods for estimating the prediction error in classifying microarray data where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap‐related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave‐one‐out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate for the prediction error. Even with small samples, it does not suffer from large upward bias as do the leave‐one‐out bootstrap and the 0.632+ bootstrap, and it does not suffer from large variability as does the leave‐one‐out cross‐validation in microarray applications. Copyright © 2007 John Wiley & Sons, Ltd.
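The RLOOB idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier, the function names, the `n_boot` and `seed` parameters, and the toy data are all assumptions made for illustration. The argument `learn_size` stands in for the bootstrap learning set size that the abstract writes as ln. For each specimen, learning sets are drawn with replacement from the remaining specimens, a classifier is trained on each, and the left-out specimen is predicted.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Hypothetical stand-in classifier: return the label whose class mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab], x))


def rloob_error(xs, ys, learn_size, n_boot=50, seed=0):
    """Sketch of a repeated leave-one-out bootstrap error estimate.

    For each specimen i, draw n_boot learning sets of size learn_size
    (with replacement) from the other specimens, predict specimen i from
    each, and average the misclassification rate over specimens.
    """
    rng = random.Random(seed)
    n = len(xs)
    per_specimen = []
    for i in range(n):
        others = [j for j in range(n) if j != i]  # specimen i is never in its own learning set
        wrong = 0
        for _ in range(n_boot):
            idx = [rng.choice(others) for _ in range(learn_size)]
            pred = nearest_centroid_predict(
                [xs[j] for j in idx], [ys[j] for j in idx], xs[i]
            )
            wrong += pred != ys[i]
        per_specimen.append(wrong / n_boot)
    return sum(per_specimen) / n


# Toy data (made up): two well-separated classes, two features each.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
ys = [0, 0, 0, 1, 1, 1]

# RLOOB estimates at several learning set sizes.
errors = {m: rloob_error(xs, ys, m) for m in (3, 4, 5)}
```

The ABS method would then fit a learning curve to estimates such as those in `errors`. The abstract does not state the form of that curve, so the fitting step is left out of this sketch.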
This publication has 17 references indexed in Scilit:
- Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting. JNCI Journal of the National Cancer Institute, 2007
- Gene expression patterns for doxorubicin (Adriamycin) and cyclophosphamide (Cytoxan) (AC) response and resistance. Breast Cancer Research and Treatment, 2005
- Effectiveness of Gene Expression Profiling for Response Prediction of Rectal Adenocarcinomas to Preoperative Chemoradiotherapy. Journal of Clinical Oncology, 2005
- Estimating misclassification error with small samples via bootstrap cross-validation. Bioinformatics, 2005
- Estimating Dataset Size Requirements for Classifying DNA Microarray Data. Journal of Computational Biology, 2003
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association, 2002
- Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association, 1997
- R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics, 1996
- A Graph-Dynamic Model of the Power Law of Practice and the Problem-Solving Fan-Effect. Science, 1988
- Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation. Journal of the American Statistical Association, 1983