Microarray gene expression data with linked survival phenotypes: diffuse large-B-cell lymphoma revisited
Open Access
- 11 November 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Biostatistics
- Vol. 7 (2), 268-285
- https://doi.org/10.1093/biostatistics/kxj006
Abstract
Diffuse large-B-cell lymphoma (DLBCL) is an aggressive malignancy of mature B lymphocytes and is the most common type of lymphoma in adults. While treatment advances have been substantial in what was formerly a fatal disease, fewer than 50% of patients achieve lasting remission. In an effort to predict treatment success and explain disease heterogeneity, clinical features have been employed for prognostic purposes, but have yielded only modest predictive performance. This has spawned a series of high-profile microarray-based gene expression studies of DLBCL, in the hope that molecular-level information could be used to refine prognosis. The intent of this paper is to reevaluate these microarray-based prognostic assessments and to extend the statistical methodology that has been used in this context. Methodological challenges arise in using patients' gene expression profiles to predict survival endpoints on account of the large number of genes and their complex interdependence. We initially focus on the Lymphochip data and analysis of Rosenwald et al. (2002). After describing relationships between the analyses performed and gene harvesting (Hastie et al., 2001a), we argue for the utility of penalized approaches, in particular least angle regression-least absolute shrinkage and selection operator (Efron et al., 2004). While these techniques have been extended to the proportional hazards/partial likelihood framework, the resultant algorithms are computationally burdensome. We develop residual-based approximations that eliminate this burden yet perform similarly. Comparisons of predictive accuracy across both methods and studies are effected using time-dependent receiver operating characteristic curves. These indicate that gene expression data, in turn, delivers only modest predictions of posttherapy DLBCL survival. We conclude by outlining possibilities for further work.
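The abstract does not spell out the residual-based approximation, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes that residuals from a covariate-free survival model (martingale residuals from the Nelson-Aalen estimate, converted to deviance residuals) serve as an ordinary continuous response, to which a lasso fit is then applied. Plain cyclic coordinate descent is swapped in for least angle regression to keep the example dependency-free. All function names, the simulated data, and the penalty value are hypothetical.

```python
import numpy as np

def null_martingale_residuals(time, event):
    """Martingale residuals event_i - H(t_i) from a model with no covariates,
    where H is the Nelson-Aalen cumulative hazard estimate."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    uniq = np.unique(time[event == 1])                       # distinct event times
    d = np.array([np.sum((time == t) & (event == 1)) for t in uniq])
    n_at_risk = np.array([np.sum(time >= t) for t in uniq])
    increments = d / n_at_risk
    H = np.array([increments[uniq <= t].sum() for t in time])
    return event - H

def deviance_residuals(mart, event):
    """Deviance transform of martingale residuals (more symmetric)."""
    event = np.asarray(event, dtype=float)
    # event * log(event - mart); the term is 0 for censored subjects
    inner = np.where(event == 1, event - mart, 1.0)
    log_term = np.where(event == 1, np.log(inner), 0.0)
    return np.sign(mart) * np.sqrt(np.maximum(-2.0 * (mart + log_term), 0.0))

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate
    descent. X and y are assumed to be centred already."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                                         # y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Simulated stand-in for expression data: 60 patients, 200 "genes",
# with only gene 0 affecting the hazard (purely illustrative).
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
true_time = rng.exponential(scale=1.0 / np.exp(X[:, 0]))
cens_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(true_time, cens_time)
event = (true_time <= cens_time).astype(float)

mart = null_martingale_residuals(time, event)
dev = deviance_residuals(mart, event)

Xc = X - X.mean(axis=0)
yc = dev - dev.mean()
lam_max = np.max(np.abs(Xc.T @ yc)) / n                      # smallest lam giving beta = 0
beta = lasso_cd(Xc, yc, lam=0.5 * lam_max)
selected = np.flatnonzero(beta)                              # indices of retained genes
```

In this sketch a positive coefficient marks a gene whose higher expression goes with more events than the null model expects. The appeal of such an approximation is that the survival model is fitted once, without covariates, after which the high-dimensional selection step is an ordinary least-squares lasso problem.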
This publication has 42 references indexed in Scilit:
- Tight Clustering: A Resampling‐Based Approach for Identifying Stable and Tight Patterns in Data. Biometrics, 2005
- Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proceedings of the National Academy of Sciences of the United States of America, 2004
- Prediction of Survival in Diffuse Large-B-Cell Lymphoma Based on the Expression of Six Genes. New England Journal of Medicine, 2004
- Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data. PLoS Biology, 2004
- Regression Approaches for Microarray Data Analysis. Journal of Computational Biology, 2003
- Estimating the Number of Clusters in a Data Set Via the Gap Statistic. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2001
- Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 2001
- Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 2000
- On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association, 1998
- Multivariate Adaptive Regression Splines. The Annals of Statistics, 1991