Anatomy of protein pockets and cavities: Measurement of binding site geometry and implications for ligand design

Abstract
Identification and size characterization of surface pockets and occluded cavities are initial steps in protein structure‐based ligand design. A new program, CAST, for automatically locating and measuring protein pockets and cavities, is based on precise computational geometry methods, including alpha shape and discrete flow theory. CAST identifies and measures pockets and pocket mouth openings, as well as cavities. The program specifies the atoms lining pockets, pocket openings, and buried cavities; the volume and area of pockets and cavities; and the area and circumference of mouth openings. CAST analysis of over 100 proteins has been carried out; proteins examined include a set of 51 monomeric enzyme‐ligand structures, several elastase‐inhibitor complexes, the FK506 binding protein, 30 HIV‐1 protease‐inhibitor complexes, and a number of small and large protein inhibitors. Medium‐sized globular proteins typically have 10‐20 pockets/cavities. Most often, binding sites are pockets with 1‐2 mouth openings; much less frequently they are cavities. Ligand binding pockets vary widely in size, most within the range 10²‐10³ Å³. Statistical analysis reveals that the number of pockets and cavities is correlated with protein size, but there is no correlation between the size of the protein and the size of binding sites. Most frequently, the largest pocket/cavity is the active site, but there are a number of instructive exceptions. Ligand volume and binding site volume are somewhat correlated when binding site volume is < 700 Å³, but the ligand seldom occupies the entire site. Auxiliary pockets near the active site have been suggested as additional binding surface for designed ligands (Mattos C et al., 1994, Nat Struct Biol 1:55‐58). Analysis of elastase‐inhibitor complexes suggests that CAST can identify ancillary pockets suitable for recruitment in ligand design strategies.
Analysis of the FK506 binding protein, and of compounds developed in SAR by NMR (Shuker SB et al., 1996, Science 274:1531‐1534), indicates that CAST pocket computation may provide a priori identification of target proteins for linked‐fragment design. CAST analysis of 30 HIV‐1 protease‐inhibitor complexes shows that the flexible active site pocket can vary over a range of 853‐1,566 Å³, and that there are two pockets near or adjoining the active site that may be recruited for ligand design.