Decision tree supported substructure prediction of metabolites from GC-MS profiles
Open Access
- 16 February 2010
- journal article
- research article
- Published by Springer Science and Business Media LLC in Metabolomics
- Vol. 6 (2), 322-333
- https://doi.org/10.1007/s11306-010-0198-7
Abstract
Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information, together with data on already identified metabolites and reference substances, have been archived in the GMD. Structural feature extraction was applied to subdivide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures from mass spectral features and RI information is demonstrated to result in highly sensitive and specific detection of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
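The idea of predicting whether a substructure is present from spectral features and RI, and then reporting sensitivity and specificity, can be illustrated with a small sketch. This is not the GMD implementation: the feature layout (two relative intensities plus a retention index), the toy values, the labels, and all function names are assumptions made for illustration only, and the tree is a plain Gini-based binary tree written with the Python standard library.

```python
# Toy sketch only: the feature layout, values, and labels are invented for
# illustration and are not taken from the GMD or the paper.

def gini(labels):
    """Gini impurity of a list of boolean class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # skip degenerate splits
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a tree recursively; leaves are booleans, inner nodes are tuples."""
    majority = 2 * sum(labels) >= len(labels)
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature separates the remaining rows
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f, t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the boolean stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)  # assumes both classes are present


# Hypothetical columns: [relative intensity A, relative intensity B, retention index]
train_rows = [
    [0.90, 0.10, 1200.0], [0.80, 0.20, 1350.0], [0.85, 0.05, 1900.0],
    [0.10, 0.70, 1250.0], [0.20, 0.90, 1800.0], [0.15, 0.60, 1500.0],
]
train_labels = [True, True, True, False, False, False]  # substructure present?

test_rows = [[0.70, 0.30, 1400.0], [0.30, 0.80, 1600.0], [0.40, 0.20, 1300.0]]
test_labels = [True, False, True]

tree = build_tree(train_rows, train_labels)
predictions = [predict(tree, r) for r in test_rows]
sens, spec = sensitivity_specificity(test_labels, predictions)
print("tree:", tree)
print("sensitivity:", sens, "specificity:", spec)
```

In this sketch one such tree would be trained per target substructure, so each tree answers a single yes/no question. Because the tree is just nested tuples, it can be printed and read directly, which mirrors the abstract's point that the DTs are open to inspection.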