Cloning of pea storage protein genes

Abstract
Vicilin and legumin are the major storage proteins of Pisum sativum. Complementary DNAs (cDNAs) have been produced from poly(A)+ mRNA isolated from developing seeds and specific storage protein cDNAs cloned into pBR322. The amino acid sequences predicted from the cDNA sequences have been compared with the actual amino acid sequences derived from the purified protein subunits. These comparisons have confirmed that the legumin α and β subunits as initially synthesized are covalently joined together and that a small peptide is subsequently removed by endoproteolysis to give the disulphide-linked subunits of the mature seed legumins. Similar comparisons between the amino acid sequences predicted from vicilin cDNA clones and the amino acid sequences determined on the isolated subunits have shown that some of the 50000 Mr type subunits are subsequently cleaved to give three subunits as products, i.e. polypeptides of 19000 Mr (α), 13500 Mr (β) and 12500 Mr, or 16000 Mr if glycosylated (γ). In addition to these three subunits, cleavage at only one or other of the two potential cleavage sites results in a 33000 Mr polypeptide (α + β) and a 31000 Mr polypeptide tentatively identified as β + γ. The presence of the sequence Lys-Glu-Asn leads to cleavage on the carboxy side of Asn at the β:γ cleavage site, whereas the sequence Gly-Leu-Arg does not lead to cleavage. Comparable sequence data for the α:β processing site do not exist. Comparisons of the cDNA and amino acid sequences disclose the presence of a 15 or 16 amino acid residue vicilin leader sequence as well as a 12 amino acid residue C-terminal peptide which is also removed. The codon usage of the messenger RNAs for the storage proteins is similar to that of other plant proteins and differs somewhat from that of animal messenger RNAs.
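As a rough arithmetic check on the vicilin cleavage products described above, the Mr values quoted for the α, β and γ fragments can be summed for each partial cleavage outcome. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and apparent Mr values need not be strictly additive, so the sums are expected to approximate rather than reproduce the reported 33000, 31000 and 50000 Mr figures.

```python
# Apparent Mr values quoted in the text for the vicilin cleavage fragments.
ALPHA = 19000
BETA = 13500
GAMMA = 12500
GAMMA_GLYCOSYLATED = 16000


def partial_products(alpha, beta, gamma):
    """Sum fragment Mr values for each way the precursor can stay joined.

    Hypothetical helper for illustration; it performs simple addition only.
    """
    return {
        "alpha+beta": alpha + beta,                # cut only at the beta:gamma site
        "beta+gamma": beta + gamma,                # cut only at the alpha:beta site
        "alpha+beta+gamma": alpha + beta + gamma,  # neither site cut
    }


unglycosylated = partial_products(ALPHA, BETA, GAMMA)
glycosylated = partial_products(ALPHA, BETA, GAMMA_GLYCOSYLATED)

print("unglycosylated gamma:", unglycosylated)
print("glycosylated gamma:  ", glycosylated)
```

The α + β sum (32500) lies close to the reported 33000 Mr polypeptide. The β + γ sums (26000, or 29500 with glycosylated γ) fall below the reported 31000 Mr, which is consistent with that assignment being described only as tentative.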
Complementary DNAs for specific storage proteins, when used to probe different restriction enzyme digests of pea genomic DNA, reveal the presence of a small number of legumin and vicilin coding sequences (two to five for legumin and three to seven for vicilin) that occur as single copies, except for one vicilin sequence present in two to three copies. Genetic mapping experiments using whole plants locate both the main legumin and the vicilin genes on chromosome 7. The main legumin subunits are coded by genes at a single Mendelian locus, Lg-1, on the short arm of chromosome 7 very close to the rub locus, and the vicilin gene is located 16 map units away, close to the r locus. Gene libraries prepared with size-fractionated partial restriction enzymic digests of pea genomic DNA ligated into both phage λ L47 and phage λ gt wes have led to the isolation of at least three similar but different legumin genomic sequences. Comparison of the λ and cDNA legumin clones suggests the presence of at least one intron in the former.
Legumes in general contain two major seed storage protein types, vicilin and legumin (Derbyshire et al. 1976). Seeds of Pisum sativum (L.) have significant amounts of both proteins and, since a considerable body of knowledge exists about pea physiology and genetics, this species is a good choice for the study of storage protein genes. Peas are also one of the world's major legume crops. Since the storage proteins are found only in the tissues of the developing seed (Millerd 1975), and then only in significant amounts during the middle and late stages of development, it was suspected from the outset that the genes responsible for the storage proteins would belong to the class of developmentally regulated genes, i.e. those that are switched on only in specific tissues over restricted periods of time.