Clustering 16S rRNA for OTU prediction: A similarity based method

Abstract
To study the phylogeny and taxonomy of samples from complex environments Next-generation sequencing (NGS)-based 16S rRNA sequencing , which has been successfully used jointly with the PCR amplification and NGS technology. First step for many downstream analyses is clustering 16S rRNA sequences into operational taxonomic units (OTUs). Heuristic clustering is one of the most widely employed approaches for generating OTUs in which one or more seed sequences to represent each cluster are selected. In this work we chose five random seeds for each cluster from a genes library, and we present a novel distance measure to cluster bacteria in the sample. Artificially created sets of 16S rRNA genes selected from databases are successfully clustered with more than %98 accuracy, sensitivity, and specificity.