DADA2: High-resolution sample inference from Illumina amplicon data

Abstract
We present DADA2, an open-source software package for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers microbial sample sequences exactly, resolving differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.