Statistical methods for characterizing diversity of microbial communities by analysis of terminal restriction fragment length polymorphisms of 16S rRNA genes

Abstract
The analysis of terminal restriction fragment length polymorphisms (T-RFLP) of 16S rRNA genes has proven to be a facile means to compare microbial communities and presumptively identify abundant members. The method provides data that can be used to compare different communities based on similarity or distance measures. Once communities have been clustered into groups, clone libraries can be prepared from sample(s) that are representative of each group in order to determine the phylogeny of the numerically abundant populations in a community. In this paper, methods are introduced for the statistical analysis of T-RFLP data, including objective methods for (i) determining a baseline so that 'true' peaks in electropherograms can be identified; (ii) comparing electropherograms and binning fragments of similar size; (iii) clustering profiles to identify communities that are similar to one another; and (iv) selecting samples that are representative of a cluster and can be used to construct 16S rRNA gene clone libraries. The methods for data analysis were tested using simulated data with assumptions and parameters that corresponded to actual data. The simulation results showed that these methods recover the true microbial community structure generated under the assumptions made. Software for implementing these methods is available at http://www.ibest.uidaho.edu/tools/trflp_stats/index.php.