How many membrane proteins are there?

Abstract
A basic problem in functional genomics is predicting the subcellular location of proteins that are deduced from gene and genome sequencing. In particular, one would like to be able to readily distinguish proteins that are soluble from those that are inserted in a membrane. Traditional methods of distinguishing between these two locations have relied on extensive, time-consuming biochemical studies. The alternative approach has been to make inferences based on a search of the amino acid sequences of presumed gene products for stretches of hydrophobic amino acids. This numerical, sequence-based approach is usually seen as a first approximation pending more reliable biochemical data. The recent availability of large and complete sequence data sets for several organisms allows us to determine just how accurate such a numerical approach can be, and to attempt to minimize and quantify the error involved. We have optimized a statistical approach to determining protein location. Using this approach, we find that surprisingly few proteins are misallocated by the numerical method. We also examine the biological implications of the success of this technique.
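To make the idea of searching a sequence for hydrophobic stretches concrete, the following is a minimal sketch of a sliding-window hydropathy scan. It is not the optimized statistical method described in this paper. It assumes the standard Kyte-Doolittle hydropathy scale, and the function names (`peak_hydropathy`, `looks_membrane`), the 19-residue window, and the 1.6 cutoff are illustrative choices rather than values taken from this work.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peak_hydropathy(seq, window=19):
    """Return the highest mean hydropathy over any `window` consecutive residues.

    Returns None if the sequence is shorter than the window.
    Residues not in the scale (e.g. 'X') are scored as 0.0 (an assumption).
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return None
    total = sum(scores[:window])  # sum over the first window
    best = total
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        best = max(best, total)
    return best / window


def looks_membrane(seq, window=19, threshold=1.6):
    """Hypothetical classifier: True if the peak window mean reaches the threshold."""
    peak = peak_hydropathy(seq, window)
    return peak is not None and peak >= threshold
```

A sequence would be called a candidate membrane protein when `looks_membrane(seq)` returns `True`. Any real use would need the window and threshold to be calibrated against proteins of known location, which is the kind of error quantification the abstract describes.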
Funding Information
  • National Institutes of Health (GM54160)