Exhaustive Matching of the Entire Protein Sequence Database

Abstract
The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. The methods developed are general and can be used to manage sequence data generated by major genome sequencing projects. The alignments made possible by the exhaustive matching are the starting point for successful de novo prediction of the folded structures of proteins, for reconstructing sequences of ancient proteins and metabolisms in ancient organisms, and for obtaining new perspectives in structural biochemistry.