Predicting the molecular complexity of sequencing libraries

Abstract
A statistical method and software yield accurate predictions of sequencing library complexity on the basis of initial shallow sequencing surveys, allowing robust estimates of how deep to sequence for adequate coverage. Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.