Dynamic Programming Alignment Accuracy

Abstract
Algorithms for generating alignments of biological sequences have inherent statistical limitations when it comes to the accuracy of the alignments they produce. Using simulations, we measure the accuracy of the standard global dynamic programming method and show that it can be reasonably well modelled by an "edge wander" approximation to the distribution of the optimal scoring path around the correct path in the vicinity of a gap. We also give a table from which accuracy values can be predicted for commonly used scoring schemes and sequence divergences (the PAM and BLOSUM series). Finally we describe how to calculate the expected accuracy of a given alignment, and show how this can be used to construct an optimal accuracy alignment algorithm which generates significantly more accurate alignments than standard dynamic programming methods in simulated experiments.