Supervised machine learning is superior to indicator value inference in monitoring the environmental impacts of salmon aquaculture using eDNA metabarcodes
Open Access
- 13 April 2020
- journal article
- research article
- Published by Wiley in Molecular Ecology
- Vol. 30 (13), 2988-3006
- https://doi.org/10.1111/mec.15434
Abstract
Increasing anthropogenic impact and global change effects on natural ecosystems have prompted the development of less expensive and more efficient bioassessment methodologies. One promising approach is the integration of DNA metabarcoding in environmental monitoring. A critical step in this process is the inference of ecological quality (EQ) status from identified molecular bioindicator signatures that mirror environmental classification based on standard macroinvertebrate surveys. The most promising approaches to infer EQ from biotic indices (BI) are supervised machine learning (SML) and the calculation of indicator values (IndVal). In this study we compared the performance of both approaches using DNA metabarcodes of bacteria and ciliates as bioindicators obtained from 152 samples collected from seven Norwegian salmon farms. Results from standard macroinvertebrate monitoring of the same samples were used as a reference to compare the accuracy of both approaches. First, SML outperformed the IndVal approach in inferring EQ from eDNA metabarcodes. The Random Forest (RF) algorithm appeared to be less sensitive to noisy data (a typical feature of massive environmental sequence data sets) and to uneven data coverage across EQ classes (a typical feature of environmental compliance monitoring schemes) than a widely used method to infer IndVals for the calculation of a BI. Second, bacterial eDNA metabarcodes allowed for a more accurate EQ assessment than ciliate eDNA metabarcodes. For the implementation of DNA metabarcoding into routine monitoring programs to assess ecological quality around salmon aquaculture cages, we therefore recommend bacterial DNA metabarcodes in combination with SML to classify EQ categories based on molecular signatures.
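To illustrate what an indicator value is, the following is a minimal sketch of the classic formulation (specificity times fidelity, scaled to 100). It is an assumption that this textbook variant is close to the "widely used method" the abstract refers to; the study's actual IndVal procedure may differ. The function name `indval`, the taxon names, the read counts, and the two EQ labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def indval(abundance, groups):
    """Classic indicator value (specificity x fidelity x 100) per taxon and group.

    abundance: dict mapping taxon -> list of read counts, one per sample
    groups:    list of group labels (e.g. EQ classes), one per sample
    """
    # Indices of the samples that belong to each group
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)

    result = {}
    for taxon, counts in abundance.items():
        # Mean abundance of the taxon within each group
        means = {g: sum(counts[i] for i in idxs) / len(idxs)
                 for g, idxs in members.items()}
        total = sum(means.values())
        result[taxon] = {}
        for g, idxs in members.items():
            # Specificity: share of the taxon's mean abundance found in group g
            specificity = means[g] / total if total > 0 else 0.0
            # Fidelity: fraction of samples in group g where the taxon occurs
            fidelity = sum(1 for i in idxs if counts[i] > 0) / len(idxs)
            result[taxon][g] = 100.0 * specificity * fidelity
    return result


# Toy, made-up data (not from the study): two taxa, four samples, two EQ classes
toy_counts = {"taxon_a": [10, 30, 0, 0], "taxon_b": [5, 0, 5, 10]}
toy_groups = ["good", "good", "bad", "bad"]
scores = indval(toy_counts, toy_groups)
```

In this toy example `taxon_a` occurs only in the "good" samples and in all of them, so it receives the maximum score for that class, whereas `taxon_b` is spread over both classes and scores lower in each. Because the score depends on group means and presence fractions, classes represented by very few samples can yield unstable values, which is consistent with the abstract's remark about uneven data coverage across EQ classes.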
Funding Information
- Deutsche Forschungsgemeinschaft (STO414/15‐1)
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (31003A_179125)