Improved metabolomic data-based prediction of depressive symptoms using nonlinear machine learning with feature selection

Open Access

19 May 2020

journal article
research article
Published by Springer Science and Business Media LLC in Translational Psychiatry

Vol. 10 (1), 1-12
https://doi.org/10.1038/s41398-020-0831-9

Abstract

To address major limitations in algorithms for the metabolite-based prediction of psychiatric phenotypes, a novel prediction model for depressive symptoms based on nonlinear feature selection machine learning, the Hilbert–Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) algorithm, was developed and applied to a metabolomic dataset with the largest sample size to date. In total, 897 population-based subjects were recruited from the communities affected by the Great East Japan Earthquake; 306 metabolite features (37 metabolites identified by nuclear magnetic resonance measurements and 269 characterized metabolites based on the intensities from mass spectrometry) were used to build prediction models for depressive symptoms as evaluated by the Center for Epidemiologic Studies-Depression Scale (CES-D). Nested fivefold cross-validation was used to develop and evaluate the prediction models. The HSIC Lasso-based prediction model showed better predictive power than the other prediction models, including Lasso, support vector machine, partial least squares, random forest, and neural network. L-Leucine, 3-hydroxyisobutyrate, and gamma-linolenyl carnitine frequently contributed to the prediction. We have demonstrated that the HSIC Lasso-based prediction model, which integrates nonlinear feature selection, improved predictive power for depressive symptoms based on metabolome data and identified risk metabolites based on nonlinear statistics in the Japanese population. Further studies should apply HSIC Lasso-based prediction models to populations of different ethnicities to investigate the generality of each risk metabolite for predicting depressive symptoms.
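The nonlinear dependence measure that gives HSIC Lasso its name can be illustrated with a minimal sketch. The code below is not the authors' implementation and does not perform the Lasso step; it only computes a biased empirical HSIC estimate with Gaussian kernels on synthetic one-dimensional data. The function names, the kernel width of 1.0, the 1/n^2 normalization (some references use 1/(n-1)^2), and the toy data are all assumptions made for illustration. It shows why a kernel-based criterion can flag a feature that a linear correlation would miss.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * sigma^2)) for 1-D data."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2 (illustrative normalization)."""
    n = len(x)
    kc = center(gaussian_gram(x, sigma_x))
    lc = center(gaussian_gram(y, sigma_y))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


def pearson(x, y):
    """Plain Pearson correlation coefficient, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic data (hypothetical, not from the study).
n = 80
x = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]   # symmetric grid on [-2, 2]
y_quadratic = [v ** 2 for v in x]                  # nonlinear function of x
rng = random.Random(0)
y_noise = [rng.gauss(0.0, 1.0) for _ in x]         # generated independently of x

print("Pearson(x, x^2):  ", pearson(x, y_quadratic))
print("HSIC(x, x^2):     ", hsic(x, y_quadratic))
print("HSIC(x, noise):   ", hsic(x, y_noise))
```

On this toy data the Pearson correlation between x and x^2 is essentially zero because the grid is symmetric, while the HSIC estimate for the quadratic relationship is larger than the estimate for the independently generated noise. HSIC Lasso builds on this kind of kernel-based dependence score to select features; the selection step itself is outside the scope of this sketch.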
