Discrepancies in metabolomic biomarker identification from patient-derived lung cancer revealed by combined variation in data pre-treatment and imputation methods
- 27 March 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Metabolomics
- Vol. 17 (4), 1-13
- https://doi.org/10.1007/s11306-021-01787-2
Abstract
Introduction: The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage has been pursued as a “holy grail” of modern oncology, relying on the metabolic dysfunction that characterizes cancer progression. In spite of the evaluation of many candidate biomarkers, however, determination of a consistent set with practical clinical utility has proven elusive.
Objective: In this study, we systematically examine the combined role of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and their identification of potential biomarkers.
Methods: Uniquely, we are able to systematically evaluate both unsupervised and supervised methods with a metabolomic data set obtained from patient-derived lung cancer core biopsies with true missing values. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.
Results: The combined choice of pre-treatment and imputation methods is critical in the definition of candidate biomarkers, with deficient or inappropriate selection of these methods leading to inconsistent results, and with important biomarkers either being overlooked or reported as a false positive. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.
Conclusion: The combined choice of pre-treatment and imputation methods may need careful evaluation prior to metabolomic data analysis of human tumors, in order to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
Funding Information
- National Cancer Institute (R15CA203605)
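The combined evaluation described in the abstract (each pre-treatment paired with each imputation, followed by a multivariate analysis) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not enumerate the eight pre-treatment or ten imputation methods, so the three of each shown here, the function names, the synthetic data, and the use of PCA as the unsupervised method are all assumptions made for illustration. Imputation is applied after pre-treatment, following the order mentioned in the Results.

```python
import itertools
import numpy as np

# Hypothetical, NaN-aware pre-treatment functions (applied column-wise to metabolites).
def log_transform(X):
    return np.log(X)

def autoscale(X):
    # Mean-center and divide by the standard deviation.
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)

def pareto_scale(X):
    # Mean-center and divide by the square root of the standard deviation.
    return (X - np.nanmean(X, axis=0)) / np.sqrt(np.nanstd(X, axis=0, ddof=1))

# Hypothetical imputation functions: fill missing entries with a column statistic.
def impute_mean(X):
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def impute_median(X):
    return np.where(np.isnan(X), np.nanmedian(X, axis=0), X)

def impute_min(X):
    return np.where(np.isnan(X), np.nanmin(X, axis=0), X)

PRETREATMENTS = {"log": log_transform, "auto": autoscale, "pareto": pareto_scale}
IMPUTATIONS = {"mean": impute_mean, "median": impute_median, "min": impute_min}

def top_pc1_features(X, k=3):
    """Return the indices of the k features with the largest absolute PC1 loadings."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return set(np.argsort(-np.abs(Vt[0]))[:k].tolist())

def run_grid(X, k=3):
    """Apply every pre-treatment/imputation pair and record the top PC1 features."""
    results = {}
    for (p_name, pre), (i_name, imp) in itertools.product(
        PRETREATMENTS.items(), IMPUTATIONS.items()
    ):
        results[(p_name, i_name)] = top_pc1_features(imp(pre(X)), k)
    return results

def make_demo_data():
    """Synthetic positive-valued data (20 samples x 6 features) with a few missing values."""
    rng = np.random.default_rng(0)
    X = rng.lognormal(size=(20, 6))
    for row, col in [(0, 1), (3, 4), (7, 1), (10, 5)]:
        X[row, col] = np.nan
    return X

if __name__ == "__main__":
    for combo, features in run_grid(make_demo_data()).items():
        print(combo, sorted(features))
```

Comparing the feature sets across combinations gives a rough sense of how sensitive a candidate list is to these choices. A real analysis would use the study's actual methods and a supervised model alongside the unsupervised one.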