k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction
Open Access
- 30 July 2010
- journal article
- research article
- Published by Springer Science and Business Media LLC in The Pharmacogenomics Journal
- Vol. 10 (4), 292-309
- https://doi.org/10.1038/tpj.2010.56
Abstract
In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463 320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors.
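The abstract lists six factors that were varied to build the KNN models: feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. The sketch below shows, under stated assumptions, where each of these knobs enters a single KNN prediction. It is not the authors' protocol. The feature ranking (absolute difference of class means) is a simple stand-in for whatever ranking method a real analysis would use, the two metrics (Euclidean and 1 minus Pearson correlation) are common illustrative choices, and all function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation; returns 1.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0
    return 1.0 - cov / (sa * sb)


def rank_features(X, y):
    """Hypothetical ranking: |mean(class 1) - mean(class 0)| per feature, largest first.
    A stand-in for a real ranking method such as a t-test."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def knn_score(X, y, x, k, metric, weighted):
    """Share of (optionally inverse-distance weighted) votes from the k nearest neighbors for class 1."""
    neighbors = sorted((metric(row, x), label) for row, label in zip(X, y))[:k]
    total = positive = 0.0
    for dist, label in neighbors:
        w = 1.0 / (dist + 1e-9) if weighted else 1.0  # weighted: closer neighbors count more
        total += w
        if label == 1:
            positive += w
    return positive / total


def knn_predict(X, y, x, n_features, k, metric=euclidean, weighted=False, threshold=0.5):
    """Rank features on the training set, keep the top n_features, vote, then apply the threshold."""
    top = rank_features(X, y)[:n_features]
    Xs = [[row[j] for j in top] for row in X]
    xs = [x[j] for j in top]
    score = knn_score(Xs, y, xs, k, metric, weighted)
    return (1 if score >= threshold else 0), score


# Toy training set: 4 samples x 3 genes, binary end point (hypothetical values).
X = [[0.0, 5.0, 1.0], [0.2, 4.0, 1.1], [1.0, 5.0, 0.9], [1.2, 4.5, 1.0]]
y = [0, 0, 1, 1]
query = [1.1, 4.8, 1.0]

label, score = knn_predict(X, y, query, n_features=1, k=3)  # 2 of 3 votes for class 1
strict_label, _ = knn_predict(X, y, query, n_features=1, k=3, threshold=0.7)
```

With one feature and k = 3, two of the three nearest neighbors belong to class 1, so the unweighted score is 2/3: the sample is called class 1 at a 0.5 threshold but class 0 at 0.7. This is a small illustration of how the decision threshold alone can change a call, one of the kinds of factor the study examined across its model grid.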