Statistics of Knots, Geometry of Conformations, and Evolution of Proteins

Abstract
Like shoelaces, the backbones of proteins may get entangled and form knots. However, only a few knots in native proteins have been identified so far. To assess the rarity of knots in proteins more quantitatively, we make an explicit comparison between the knotting probabilities in native proteins and in random compact loops. We identify knots in proteins statistically, applying the mathematics of knot invariants to the loops obtained by complementing the protein backbone with an ensemble of random closures, and assigning a certain knot type to a given protein if and only if this knot dominates the closure statistics (which tells us that the knot is determined by the protein and not by a particular method of closure). We also examine the local fractal or geometrical properties of proteins via computational measurements of the end-to-end distance and the degree of interpenetration of their subchains. Although we did identify some rather complex knots, we show that native conformations of proteins have statistically fewer knots than random compact loops, and that the local geometrical properties, such as the crumpled character of the conformations at a certain range of scales, are consistent with the rarity of knots. From these results, we may conclude that the known “protein universe” (set of native conformations) avoids knots. However, the precise reason for this is unknown: for instance, it is unclear whether knots were removed by evolution because of their unfavorable effect on protein folding or function, or because of some other unidentified property of protein evolution.

Proteins in their native state are compact structures consisting of long chains of amino-acid residues. As such, a protein chain might be expected to become entangled or to tie itself into a complex knot. However, researchers have found only a handful of complex knots in native proteins. Lua and Grosberg present what they believe to be the first quantitative study of the statistics of knots in proteins.
Although they found some rather complex knots, including one knot with five crossings in a modest-size protein of only 229 amino acids (ubiquitin hydrolase), comparison of the knot abundance in proteins and in compact random strings on a lattice indicates extreme nonrandomness of protein conformations in this respect. They also study the statistics of the geometrical behavior of parts of protein chains. They find that these parts, on the scale of about 20–30 residues, have a strong nonrandom tendency to crumple back on themselves, and that the segregation of the parts on this scale is also far in excess of random, while on a larger scale the geometry of conformations is statistically close to random. These geometrical features are consistent with the statistical rarity of knots. From these findings, the authors conclude that the “protein universe” avoids knots.
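
The two measurements described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `dominant_knot` and `mean_end_to_end` are hypothetical, and so is the 0.5 threshold: the text says only that a knot must "dominate" the closure statistics, so a simple majority is assumed here. Determining the knot type of each closed loop with a knot invariant is outside the scope of the sketch, and the labels (such as "0_1" for the unknot and "3_1" for the trefoil) are taken as given.

```python
import math
from collections import Counter


def dominant_knot(closure_knots, threshold=0.5):
    """Return the knot label that dominates an ensemble of random closures.

    closure_knots: one knot label per random closure of the same backbone.
    threshold: assumed fraction a label must exceed to count as dominant.
    Returns None when no label exceeds the threshold, i.e. the outcome
    depends on the closure rather than on the protein.
    """
    if not closure_knots:
        return None
    label, count = Counter(closure_knots).most_common(1)[0]
    return label if count / len(closure_knots) > threshold else None


def mean_end_to_end(coords, length):
    """Mean end-to-end distance over all subchains spanning `length` bonds.

    coords: sequence of (x, y, z) positions, one per residue.
    """
    n = len(coords)
    if not 0 < length < n:
        raise ValueError("length must be between 1 and len(coords) - 1")
    distances = [
        math.dist(coords[i], coords[i + length]) for i in range(n - length)
    ]
    return sum(distances) / len(distances)
```

Evaluating `mean_end_to_end` over a range of `length` values and comparing the resulting curve with the same curve for random compact chains is one way to look for a scale at which subchains are more crumpled than random, such as the 20–30 residue scale mentioned above.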