Discrepancies in metabolomic biomarker identification from patient-derived lung cancer revealed by combined variation in data pre-treatment and imputation methods

27 March 2021

journal article
research article
Published by Springer Science and Business Media LLC in Metabolomics

Vol. 17 (4), 1-13
https://doi.org/10.1007/s11306-021-01787-2

Abstract

Introduction: The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage has been pursued as a “holy grail” of modern oncology, an approach that relies on the metabolic dysfunction that characterizes cancer progression. Although many candidate biomarkers have been evaluated, a consistent set with practical clinical utility has proven elusive.

Objective: In this study, we systematically examine the combined role of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and on their identification of potential biomarkers.

Methods: Uniquely, we systematically evaluate both unsupervised and supervised methods using a metabolomic data set, obtained from patient-derived lung cancer core biopsies, that contains true missing values. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.

Results: The combined choice of pre-treatment and imputation methods is critical in the definition of candidate biomarkers: deficient or inappropriate selection of these methods leads to inconsistent results, with important biomarkers either overlooked or reported as false positives. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.

Conclusion: The combined choice of pre-treatment and imputation methods may need careful evaluation before metabolomic data analysis of human tumors, to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
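The central claim, that the combined choice of pre-treatment and imputation can change which metabolites look like biomarkers, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the six-sample, two-metabolite matrix and the group labels are hypothetical toy data, only two pre-treatments (none and log10) and two simple imputation methods (half-minimum and mean) are shown rather than the study's eight and ten, and an absolute Welch t-statistic is used as a univariate stand-in for the study's unsupervised and supervised multivariate methods. Following the order described above, the transformation is applied first and imputation afterwards.

```python
import math
import statistics
from itertools import product

# Hypothetical toy data (not from the study): rows are samples, columns are
# metabolites m0 and m1. None marks a missing value, e.g. below detection.
DATA = [
    [10.0, None],
    [12.0, 2.0],
    [14.0, 4.0],
    [20.0, 8.0],
    [24.0, 9.0],
    [28.0, 10.0],
]
GROUPS = ["A", "A", "A", "B", "B", "B"]  # hypothetical class labels


def no_transform(matrix):
    """Return an unchanged copy of the matrix."""
    return [list(row) for row in matrix]


def log10_transform(matrix):
    """Log10-transform observed values; missing values stay missing."""
    return [[None if v is None else math.log10(v) for v in row] for row in matrix]


def impute(matrix, method):
    """Fill missing values column by column from that column's observed values."""
    fills = []
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        if method == "half_min":
            fills.append(min(observed) / 2)
        elif method == "mean":
            fills.append(statistics.mean(observed))
        else:
            raise ValueError(f"unknown imputation method: {method}")
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups (sample variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / se


def rank_features(matrix, groups):
    """Return column indices ordered from most to least discriminating."""
    scores = []
    for j in range(len(matrix[0])):
        a = [row[j] for row, g in zip(matrix, groups) if g == "A"]
        b = [row[j] for row, g in zip(matrix, groups) if g == "B"]
        scores.append(welch_t(a, b))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


PRETREATMENTS = {"none": no_transform, "log10": log10_transform}
IMPUTATIONS = ["half_min", "mean"]


def run_grid(data, groups):
    """Pre-treat, then impute, then rank features, for every combination."""
    results = {}
    for (p_name, p_func), i_name in product(PRETREATMENTS.items(), IMPUTATIONS):
        filled = impute(p_func(data), i_name)
        results[(p_name, i_name)] = rank_features(filled, groups)
    return results


if __name__ == "__main__":
    for combo, ranking in run_grid(DATA, GROUPS).items():
        print(combo, "top feature: m%d" % ranking[0])
```

With this toy matrix, only the untransformed half-minimum combination ranks m1 first; the other three combinations rank m0 first, so the same data yield a different top candidate depending on the combined choice. Note that in this sketch the half-minimum is computed in log space after transformation, which is not the same as halving in raw space and then transforming; that ordering is itself one of the choices the abstract argues should be evaluated.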

Funding Information

National Cancer Institute (R15CA203605)
