NMRFinder: a novel method for 1D 1H-NMR metabolite annotation

Abstract
Introduction Methods for the automated and accurate identification of metabolites in 1D 1H-NMR samples are crucial, but this remains an unsolved problem. Most available tools focus on metabolite quantification, which limits the number of metabolites that can be identified. In addition, most use only reference spectra obtained under the same specific conditions as the target sample, which limits the use of available knowledge.
Objectives The main goal of this work was to develop novel methods to perform metabolite annotation from 1D 1H-NMR peaks with enhanced reliability, to aid users in metabolite identification. An essential step was to construct a vast and up-to-date library of reference 1D 1H-NMR peak lists collected under distinct experimental conditions.
Methods Three different algorithms were evaluated for their capacity to correctly annotate metabolites present in both synthetic and real samples, and were compared to publicly available tools. The best proposed method was evaluated in a range of scenarios, including missing references, missing peaks and peak shifts, to assess its annotation accuracy, precision and recall.
Results We gathered 1816 peak lists for 1387 different metabolites from several sources across different conditions for our reference library. A new method, NMRFinder, is proposed that allows matching 1D 1H-NMR samples with all the reference peak lists in the library, regardless of acquisition conditions. Metabolites are scored according to the number of their peaks that match the sample, how unique their peaks are in the library, and how close the spectrum acquisition conditions are to those of the sample. Results show a true positive rate of 0.984 when analysing computationally created samples, while 71.8% of the metabolites were annotated when analysing samples from previously identified public datasets.
Conclusion NMRFinder performs metabolite annotation reliably and outperforms previous methods, making it valuable in helping users to ultimately identify metabolites. It is implemented in the R package specmine.
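The three scoring criteria named in the Results (matched peaks, peak uniqueness in the library, and closeness of acquisition conditions) can be illustrated with a minimal Python sketch. This is not the NMRFinder algorithm or the specmine API: the `Reference` structure, the 0.03 ppm matching tolerance, the form of `condition_similarity`, the equal weighting of the three terms, and the choice to keep the best-scoring peak list per metabolite are all assumptions made only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reference:
    """One reference peak list (hypothetical structure, not the specmine format)."""
    metabolite: str
    peaks: tuple          # chemical shifts in ppm
    ph: float             # acquisition pH
    frequency_mhz: float  # spectrometer frequency in MHz


def has_match(peak, peaks, tol):
    """True if any entry of `peaks` lies within `tol` ppm of `peak`."""
    return any(abs(peak - other) <= tol for other in peaks)


def uniqueness(peak, library, tol):
    """Inverse of the number of library references that contain this peak."""
    hits = sum(1 for ref in library if has_match(peak, ref.peaks, tol))
    return 1.0 / hits if hits else 0.0


def condition_similarity(ref, sample_ph, sample_freq):
    """Assumed closeness measure in (0, 1]; 1.0 means identical conditions."""
    ph_term = 1.0 / (1.0 + abs(ref.ph - sample_ph))
    freq_term = 1.0 if ref.frequency_mhz == sample_freq else 0.5
    return (ph_term + freq_term) / 2.0


def score(ref, sample_peaks, library, sample_ph, sample_freq, tol=0.03):
    """Equal-weight average of the three criteria (the weighting is an assumption)."""
    matched = [p for p in ref.peaks if has_match(p, sample_peaks, tol)]
    if not matched:
        return 0.0
    match_fraction = len(matched) / len(ref.peaks)
    unique_term = mean(uniqueness(p, library, tol) for p in matched)
    cond_term = condition_similarity(ref, sample_ph, sample_freq)
    return (match_fraction + unique_term + cond_term) / 3.0


def rank(sample_peaks, library, sample_ph, sample_freq, tol=0.03):
    """Score every reference and keep the best score per metabolite."""
    best = {}
    for ref in library:
        s = score(ref, sample_peaks, library, sample_ph, sample_freq, tol)
        best[ref.metabolite] = max(s, best.get(ref.metabolite, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

With a two-entry library in which metabolite A has peaks at 1.0 and 2.0 ppm and metabolite B has peaks at 1.0 and 3.5 ppm, a sample with peaks at 1.0 and 2.0 ppm acquired under the same conditions ranks A above B: A matches both of its peaks and one of them is unique in the library, whereas B matches only the shared 1.0 ppm peak.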
Funding Information
  • Fundação para a Ciência e a Tecnologia (SFRH/BD/138951/2018)