Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

Open Access

16 December 2018

journal article
research article
Published by Springer Science and Business Media LLC in Microbiome

Vol. 6 (1), 1-14
https://doi.org/10.1186/s40168-018-0605-2

Abstract

Background: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants: DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure to identify contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples, and they are often found in negative controls.

Results: Decontam classified amplicon sequence variants (ASVs) in a human oral dataset in a manner consistent with prior microscopic observations of the microbial taxa inhabiting that environment and with previous reports of contaminant taxa. In metagenomic and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. Applying decontam to two recently published datasets corroborated and extended their conclusions: that little evidence existed for an indigenous placenta microbiome, and that some low-frequency taxa seemingly associated with preterm birth were contaminants.

Conclusions: Decontam improves the quality of metagenomic and marker-gene sequencing data by identifying and removing contaminant DNA sequences. It integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
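
The two patterns described in the abstract can be made concrete with a small, stdlib-only Python sketch. It is not decontam's implementation (decontam is an R package, and its functions and exact statistics are not reproduced here). It assumes, as a simple model, that a contaminant adds a roughly fixed amount of DNA to every sample, so its relative frequency f scales as 1/C with total DNA concentration C, whereas a genuine community member's frequency does not depend on C. The function names `frequency_score` and `prevalence_score`, the residual-ratio score, and the one-sided two-proportion z-test used for the prevalence comparison are hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence


def frequency_score(freqs: Sequence[float], concs: Sequence[float]) -> Optional[float]:
    """Compare two log-scale models of one feature's relative frequency across samples.

    Contaminant model:     log f = a - log C  (f proportional to 1/C)
    Non-contaminant model: log f = b          (f independent of C)

    Returns SS1 / (SS0 + SS1), where SS1 and SS0 are the residual sums of squares
    of the contaminant and non-contaminant fits. Scores near 0 favour the
    contaminant model. This ratio is a hypothetical illustrative score, not
    decontam's statistic. Concentrations are assumed to be positive.
    """
    # Only samples in which the feature was observed give a finite log-frequency.
    pts = [(math.log(f), math.log(c)) for f, c in zip(freqs, concs) if f > 0]
    if len(pts) < 2:
        return None  # too few observations to compare the models
    # Contaminant model: slope fixed at -1, so the least-squares intercept is mean(log f + log C).
    y1 = [lf + lc for lf, lc in pts]
    a = sum(y1) / len(y1)
    ss1 = sum((v - a) ** 2 for v in y1)
    # Non-contaminant model: intercept only, fitted by mean(log f).
    y0 = [lf for lf, _ in pts]
    b = sum(y0) / len(y0)
    ss0 = sum((v - b) ** 2 for v in y0)
    total = ss0 + ss1
    return 0.5 if total == 0 else ss1 / total


def prevalence_score(in_samples: Sequence[bool], in_controls: Sequence[bool]) -> float:
    """One-sided two-proportion z-test: is the feature more prevalent in negative controls?

    Returns P(Z >= z); small values suggest a contaminant. Both groups are assumed
    non-empty. For a 2x2 presence/absence table, z squared equals Pearson's
    chi-square statistic without continuity correction.
    """
    ns, nc = len(in_samples), len(in_controls)
    ks, kc = sum(in_samples), sum(in_controls)
    pooled = (ks + kc) / (ns + nc)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ns + 1 / nc))
    if se == 0:
        return 0.5  # present everywhere or nowhere: no evidence either way
    z = (kc / nc - ks / ns) / se
    # Upper-tail standard normal probability via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    concs = [1.0, 2.0, 4.0, 8.0]              # hypothetical DNA concentrations
    contaminant = [0.08, 0.04, 0.02, 0.01]    # frequency halves as concentration doubles
    resident = [0.05, 0.05, 0.05, 0.05]       # frequency unrelated to concentration
    print("frequency scores:", frequency_score(contaminant, concs), frequency_score(resident, concs))

    samples = [True, True] + [False] * 8      # present in 2 of 10 true samples
    controls = [True] * 5                     # present in all 5 negative controls
    print("prevalence score:", prevalence_score(samples, controls))
```

Fitting on the log scale turns f ∝ 1/C into a line with slope fixed at -1, so both fits reduce to simple means. Any cutoff for calling a feature a contaminant from these scores would be a study-specific choice.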

Funding Information

National Institute of Dental and Craniofacial Research (R01 DE023113)
National Institute of Allergy and Infectious Diseases (R01 AI112401)

This publication has 60 references indexed in Scilit.

Cited by 1728 articles