Mapping intact protein isoforms in discovery mode using top-down proteomics

Abstract
Conventional 'bottom-up' proteomics, in which mass spectrometry is used to analyse peptide mixtures made by tryptic digestion of target proteins, is a powerful way of characterizing complex proteomes. However, the technique has limitations when considering different protein isoforms and combinations of post-translational modifications. The 'top-down' approach is generally thought to be impractical because of the limitations of mass spectrometry and difficulties with automation. A new top-down system presented here avoids these problems by using a four-dimensional separation system that achieves greater proteome coverage than conventional methods. A proof-of-principle experiment shows that the method is capable of identifying previously undetected isoforms and isoform-specific post-translational modifications caused by cellular senescence.
A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large-scale proteome analysis1 has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry2. This ‘bottom-up’ process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous2 characterization of alternative splice forms, diverse modifications (for example, acetylation and methylation) and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species3. ‘Top-down’ interrogation of whole proteins can overcome these problems for individual proteins4,5, but has not been achieved on a proteome scale owing to the lack of intact protein fractionation methods that are well integrated with tandem mass spectrometry.
Here we show, using a new four-dimensional separation system, identification of 1,043 gene products from human cells that are dispersed into more than 3,000 protein species created by post-translational modification (PTM), RNA splicing and proteolysis. The overall system produced greater than 20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kDa and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply modified species in response to accelerated cellular ageing (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database6, the data provide precise correlations to individual genes and proof-of-concept for large-scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research7.