k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction

Open Access

30 July 2010

journal article
research article
Published by Springer Science and Business Media LLC in The Pharmacogenomics Journal

Vol. 10 (4), 292-309
https://doi.org/10.1038/tpj.2010.56

Abstract

In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463,320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors.
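
To make the six varied factors concrete, the following is a minimal sketch of how such a grid of KNN models could be enumerated. It is not the authors' protocol: the function names (`rank_features`, `distance`, `knn_predict`), the mean-difference ranking, the two distance metrics, the inverse-distance weighting, the toy data and every parameter value are assumptions chosen only for illustration.

```python
import math
from itertools import product

def rank_features(X, y):
    """Rank feature indices by absolute difference of class means.
    Hypothetical ranking method, used here only as a stand-in."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    scores = [abs(mean(pos, j) - mean(neg, j)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def distance(a, b, metric):
    """Distance between two equal-length vectors under the chosen metric."""
    if metric == "euclidean":
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    if metric == "manhattan":
        return sum(abs(ai - bi) for ai, bi in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X, y, x, features, k, metric, weighted, threshold):
    """Classify x as 1 if the (optionally distance-weighted) share of
    positive votes among its k nearest neighbors reaches the threshold."""
    neighbors = sorted(
        (distance([row[j] for j in features], [x[j] for j in features], metric), label)
        for row, label in zip(X, y)
    )[:k]
    # Inverse-distance weights are one possible weighting; an assumption here.
    weights = [1.0 / (d + 1e-9) if weighted else 1.0 for d, _ in neighbors]
    positive = sum(w for w, (_, label) in zip(weights, neighbors) if label == 1)
    return 1 if positive / sum(weights) >= threshold else 0

# Toy data (invented): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 3.0], [0.9, 5.0], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
query = [0.95, 2.5]

ranked = rank_features(X, y)

# Each grid entry is (n_features, metric, k, weighted, threshold).
grid = list(product([1, 2], ["euclidean", "manhattan"], [1, 3], [False, True], [0.3, 0.5]))

predictions = {
    cfg: knn_predict(X, y, query, ranked[:cfg[0]], cfg[2], cfg[1], cfg[3], cfg[4])
    for cfg in grid
}
print(f"{len(grid)} model configurations evaluated")
```

Because every combination of settings yields a separate model, the count grows multiplicatively with each factor, which is how a modest number of options per factor can lead to a very large model collection across several end points.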

