Gradient lasso for Cox proportional hazards model

Open Access

15 May 2009

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (14), 1775-1781
https://doi.org/10.1093/bioinformatics/btp322

Abstract

Motivation: There has been increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death), or its distribution, in terms of the expression data of a subset of genes. Because gene expression data are high-dimensional, however, collinearity is a serious problem when fitting a prediction model such as Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because model fitting requires high-dimensional matrix inversions. We propose to implement penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which yields faster convergence to the global optimum than do other algorithms. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool in developing a prediction model based on high-dimensional covariates, including gene expression data.

Results: Results from simulation studies showed that the prediction model by gradient lasso recovers the prognostic genes. Results from diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset also indicate that our method is very competitive with popular existing methods by Park and Hastie and by Goeman in computational time, prediction and selectivity.

Availability: R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.

Contact: park463@uos.ac.kr
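To make the idea of fitting an L1-constrained Cox model without matrix inversion more concrete, the following is a minimal Python sketch. It is not the authors' gradient lasso algorithm and not the glcoxph package: it uses a plain Frank-Wolfe style update on the L1 ball as a simplified stand-in that, like the approach described in the abstract, needs only first-order gradient information. The function names, the step-size grid and the assumption of no tied event times are all illustrative choices made for this sketch.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model and its gradient.

    Assumes no tied event times (illustrative simplification).
    X: (n, p) covariates, time: (n,) follow-up times,
    event: (n,) array of 1.0 (event observed) or 0.0 (censored).
    """
    eta = X @ beta
    # at_risk[i, j] is 1 when subject j is still at risk at time[i].
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    shift = eta.max()                      # stabilise the exponentials
    w = np.exp(eta - shift)
    denom = at_risk @ w                    # risk-set sums, shape (n,)
    log_denom = np.log(denom) + shift
    loss = -np.sum(event * (eta - log_denom))
    # Risk-set weighted mean of the covariates for every subject.
    weighted_mean = (at_risk @ (w[:, None] * X)) / denom[:, None]
    grad = -(event[:, None] * (X - weighted_mean)).sum(axis=0)
    return loss, grad


def l1_constrained_cox(X, time, event, lam, n_iter=100):
    """Hypothetical helper: minimise the loss subject to ||beta||_1 <= lam.

    Frank-Wolfe style stand-in, not the published gradient lasso:
    each iteration moves toward the vertex of the L1 ball selected by
    the largest gradient coordinate, so no matrix is ever inverted.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    # Candidate step sizes: 0 (stay put) and 1, 1/2, ..., 2**-10.
    steps = np.concatenate(([0.0], 2.0 ** -np.arange(0, 11)))
    for _ in range(n_iter):
        _, grad = cox_neg_log_partial_likelihood(beta, X, time, event)
        j = int(np.argmax(np.abs(grad)))
        vertex = np.zeros(p)
        vertex[j] = -lam * np.sign(grad[j])
        # Crude line search over the grid; including 0 means the loss
        # can never increase from one iteration to the next.
        candidates = [(1.0 - s) * beta + s * vertex for s in steps]
        losses = [
            cox_neg_log_partial_likelihood(b, X, time, event)[0]
            for b in candidates
        ]
        beta = candidates[int(np.argmin(losses))]
    return beta
```

Every iterate is a convex combination of points inside the L1 ball, so the constraint holds throughout, and coordinates that are never selected stay exactly zero, which is what gives the fit its variable-selection behaviour. The bound lam plays the role of the lasso tuning parameter and would in practice be chosen by cross-validation.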
