Reconstructing history with amino acid sequences

Abstract
The main goal of the protein evolutionist is the reconstruction of past events leading to the structures of contemporary proteins. The common strategy is to align amino acid sequences and infer common ancestry from the alignments. The rate of change of amino acid sequence varies greatly from protein to protein, and this naturally affects how far back a given protein's ancestry can be traced. Happily, the rate of change of many proteins is slow enough that very ancient events can be inferred. Many mainstream metabolic enzymes, for example, are 40–50% identical in prokaryotes and eukaryotes, groups that diverged from a common ancestor more than 1.5 billion years ago. Moreover, some eukaryotic proteins like actin and tubulin change so slowly that they are seldom less than 60% identical, whatever the source organism. As it happens, prokaryotic counterparts for many eukaryotic cytoskeletal proteins are unknown. A recent exception involves the finding that a heat shock protein cognate is a relative of actin. The gene duplication that gave rise to these two proteins must have been an ancient event. The more recent invention of other proteins whose distribution is restricted to one or the other of the major kingdoms may be easier to trace. Among the factors that can confound the reconstruction of events, however, are occasional horizontal gene transfers and exon shuffling. The latter has led to a number of mosaic proteins, many of which contain various combinations of a relatively small set of modules like the epidermal growth factor domain.
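The percent identity figures quoted above come from comparing aligned sequences column by column. As a minimal sketch, assuming the two sequences are already aligned, that "-" marks a gap, and that gapped columns are left out of the count (one common convention among several), the calculation might look like the following. The function name `percent_identity` and the two short fragments are hypothetical and for illustration only; they are not real protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are the same

    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up aligned fragments, for illustration only.
seq_x = "MKT-AYIAKQR"
seq_y = "MKSLAYLAK-R"

print(f"{percent_identity(seq_x, seq_y):.1f}% identical")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when the same denominator is used.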
Funding Information
  • National Institutes of Health