Comparison of three algorithms for the assignment of secondary structure in proteins: the advantages of a consensus assignment

Abstract
Accurate assignments of secondary structures in proteins are crucial for a useful comparison with theoretical predictions. Three major programs that automatically determine the location of helices and strands are used for this purpose: DSSP, P-Curve and Define. Their results have been compared for a non-redundant database of 154 proteins. On a residue-by-residue basis, the percentage match score among the three methods is only 63%. While these methods agree on the overall number of residues in each of the three states (helix, strand or coil), they differ on the number of helices or strands, thus implying a wide discrepancy in the length of assigned structural elements. Moreover, the length distribution of helices and strands points to the existence of artefacts inherent to each assignment algorithm. To overcome these difficulties, a consensus assignment is proposed in which each residue is assigned to the state determined by at least two of the three methods. With this assignment the artefacts of each algorithm are attenuated. The residues assigned to the same state by all three methods are better predicted than the others. This assignment will thus be useful for analysing the success rate of prediction methods more accurately.
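
The two-out-of-three rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each program's output has already been reduced to a string of one-letter states per residue (H for helix, E for strand, C for coil), that the three strings cover the same residues, and that a residue on which all three methods disagree falls back to coil; the abstract does not specify that last case. The function names and the reading of the match score as the share of residues on which all three methods agree are likewise assumptions made for illustration.

```python
from collections import Counter


def consensus(dssp, pcurve, define, fallback="C"):
    """Assign each residue the state chosen by at least two of three methods.

    The `fallback` state for a three-way disagreement is an assumption;
    the abstract does not say how that case is handled.
    """
    if not (len(dssp) == len(pcurve) == len(define)):
        raise ValueError("assignments must cover the same residues")
    result = []
    for states in zip(dssp, pcurve, define):
        # most_common(1) returns the most frequent state and its vote count
        state, votes = Counter(states).most_common(1)[0]
        result.append(state if votes >= 2 else fallback)
    return "".join(result)


def three_way_match(dssp, pcurve, define):
    """Percentage of residues on which all three methods agree (assumed definition)."""
    agree = sum(a == b == c for a, b, c in zip(dssp, pcurve, define))
    return 100.0 * agree / len(dssp)


# Invented toy assignments, not data from the paper
dssp = "CHHHHCCEEEC"
pcurve = "CCHHHHCEEEC"
define = "HHHHCCCCEEE"

print(consensus(dssp, pcurve, define))        # CHHHHCCEEEC
print(round(three_way_match(dssp, pcurve, define), 1))  # 45.5
```

In this toy example the three methods place the ends of the helix and the strand differently, so only 5 of 11 residues agree across all three, while the majority vote still yields a single helix and a single strand.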