PREGO: A Literature and Data-Mining Resource to Associate Microorganisms, Biological Processes, and Environment Types
Open Access
- 26 January 2022
- journal article
- research article
- Published by MDPI AG in Microorganisms
- Vol. 10 (2), 293
- https://doi.org/10.3390/microorganisms10020293
Abstract
To elucidate ecosystem functioning, it is fundamental to recognize what processes occur in which environments (where) and which microorganisms carry them out (who). Here, we present PREGO, a one-stop-shop knowledge base providing such associations. PREGO combines text mining and data integration techniques to mine such what-where-who associations from data and metadata scattered in the scientific literature and in public omics repositories. Microorganisms, biological processes, and environment types are identified and mapped to ontology terms from established community resources. Co-mentions in text and co-occurrences in metagenomics data and metadata are analyzed to extract associations, and a scoring scheme assigns a level of confidence to each of them. The PREGO knowledge base contains associations for 364,508 microbial taxa, 1090 environmental types, 15,091 biological processes, and 7971 molecular functions, with a total of almost 58 million associations. These associations are available through a web portal, an Application Programming Interface (API), and bulk download. By exploring environments and/or processes associated with each other or with microbes, PREGO aims to assist researchers in the design and interpretation of experiments and their results. To demonstrate PREGO’s capabilities, a thorough presentation of its web interface is given along with a meta-analysis of experimental results from a lagoon-sediment study of sulfur-cycle-related microbes.
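The co-mention analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify PREGO's scoring scheme, so the observed-over-expected ratio below is an assumption chosen for illustration only, and the function name `comention_scores`, the input layout, and the term labels are all hypothetical.

```python
from collections import Counter
from itertools import product


def comention_scores(docs):
    """Score taxon-process pairs by how often they are mentioned together.

    docs: list of dicts, each with a 'taxa' and a 'processes' collection of
    term identifiers found in one document (hypothetical input layout).
    Returns {(taxon, process): observed / expected co-mention count}, where
    the expected count assumes the two terms occur independently.
    This ratio is an illustrative assumption, not PREGO's published scheme.
    """
    n_docs = len(docs)
    taxon_counts = Counter()
    process_counts = Counter()
    pair_counts = Counter()

    for doc in docs:
        taxa = set(doc["taxa"])
        processes = set(doc["processes"])
        taxon_counts.update(taxa)
        process_counts.update(processes)
        # Every taxon-process combination in the same document is a co-mention.
        pair_counts.update(product(taxa, processes))

    scores = {}
    for (taxon, process), observed in pair_counts.items():
        expected = taxon_counts[taxon] * process_counts[process] / n_docs
        scores[(taxon, process)] = observed / expected
    return scores


# Toy corpus with placeholder labels (not real ontology identifiers).
example_docs = [
    {"taxa": {"taxon_A"}, "processes": {"process_X"}},
    {"taxa": {"taxon_A"}, "processes": {"process_X"}},
    {"taxa": {"taxon_B"}, "processes": {"process_Y"}},
    {"taxa": {"taxon_A", "taxon_B"}, "processes": {"process_Y"}},
]
example_scores = comention_scores(example_docs)
```

A ratio above 1 means the pair is mentioned together more often than independence would predict. A production scheme would also need to handle corpus-size effects and combine evidence from several sources, which this sketch leaves out.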
Funding Information
- Hellenic Foundation for Research and Innovation (241)