Evolution of a family of N-acetylglucosamine binding proteins containing the disulfide-rich domain of wheat germ agglutinin

Abstract
A disulfide-rich domain, first identified in wheat germ agglutinin (WGA), has now been identified in the amino acid and DNA sequences of a large number of other chitin-binding proteins. This 43-residue domain includes eight disulfide-linked cysteines and has been implicated in the binding of N-acetylglucosamine and its polymers. This study used 12 complementary DNA sequences and 1 amino acid sequence of proteins with one, two, and four copies of this domain to infer a 44-amino acid residue ancestor sequence for this domain, and to derive an evolutionary tree relating these domains in the different proteins. The tree relating these single-domain sequences is divided into two major branches. One branch consists of the multidomain dimeric lectins, which we have earlier suggested arose by duplication of a single copy of the disulfide-rich domain. The other branch consists of the monomeric chitinases and wound-inducible proteins, which have a single copy of the domain fused to a larger polypeptide. Reference to the three-dimensional structure of WGA and its saccharide complexes shows that the saccharide-binding residues as well as cysteine and glycine residues are conserved among all available sequences. In contrast, many residues at the dimer interface of the domains of WGA are not conserved in those proteins with a single domain, implying that the aggregation state of the domains in these proteins differs from that of the grass lectins. Also, the base compositions of the four-domain and one-domain branches of the tree differ, indicating distinct selective pressures at the level of both protein structure and the gene or its transcript.