Effects of OTU Clustering and PCR Artifacts on Microbial Diversity Estimates
- 12 December 2012
- journal article
- Published by Springer Science and Business Media LLC in Microbial Ecology
- Vol. 65 (3), 709-719
- https://doi.org/10.1007/s00248-012-0145-4
Abstract
Next-generation sequencing has increased the coverage of microbial diversity surveys by orders of magnitude, but differentiating artifacts from rare environmental sequences remains a challenge. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) organizes sequence data into groups of 97 % identity, helping to reduce data volumes and avoid analyzing sequencing artifacts by grouping them with real sequences. Here, we analyze sequence abundance distributions across environmental samples and show that 16S rRNA sequences of >99 % identity can represent functionally distinct microorganisms, rendering OTU clustering problematic when the goal is an accurate analysis of organism distribution. Strict postsequencing quality control (QC) filters eliminated the most prevalent artifacts without clustering. Further experiments proved that DNA polymerase errors in polymerase chain reaction (PCR) generate a significant number of substitution errors, most of which pass QC filters. Based on our findings, we recommend minimizing the number of PCR cycles in DNA library preparation and applying strict postsequencing QC filters to reduce the most prevalent artifacts while maintaining a high level of accuracy in diversity estimates. We further recommend correlating rare and abundant sequences across environmental samples, rather than clustering into OTUs, to identify remaining sequence artifacts without losing the resolution afforded by high-throughput sequencing.
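The final recommendation, correlating rare and abundant sequences across samples, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the one-mismatch neighborhood, the abundance cutoff, and the correlation threshold are all assumptions chosen for illustration, and the counts are made-up toy data. The idea shown is that a rare sequence differing from an abundant one by a single substitution, and rising and falling with it across samples, is a plausible PCR or sequencing artifact, whereas a rare sequence with an independent distribution may be a real organism.

```python
from math import sqrt


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def flag_likely_artifacts(counts, abundant_min=100, max_mismatches=1, min_r=0.9):
    """Map each suspect rare sequence to the abundant sequence it tracks.

    counts: dict of sequence -> list of read counts, one entry per sample.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    totals = {seq: sum(row) for seq, row in counts.items()}
    abundant = [seq for seq, total in totals.items() if total >= abundant_min]
    flagged = {}
    for seq, row in counts.items():
        if totals[seq] >= abundant_min:
            continue  # only rare sequences are candidates
        for parent in abundant:
            if len(parent) != len(seq):
                continue  # this sketch compares equal-length reads only
            close = hamming(seq, parent) <= max_mismatches
            if close and pearson(row, counts[parent]) >= min_r:
                flagged[seq] = parent
                break
    return flagged


# Toy abundance table: three sequences counted in four samples.
counts = {
    "ACGTACGT": [500, 50, 300, 10],  # abundant sequence
    "ACGTACGA": [5, 1, 3, 0],        # rare, one mismatch, same distribution
    "ACGTTCGT": [0, 9, 0, 12],       # rare, one mismatch, different distribution
}
print(flag_likely_artifacts(counts))
```

In this toy table only the second sequence is flagged, because the third has the opposite distribution to its abundant neighbor and is therefore retained as a candidate real sequence despite being >99 % similar in spirit. A real analysis would need many more samples for the correlation to be meaningful.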