Abstract
We have recognized about ten distinct forms of strongly basic hexapeptides, each containing at least four arginines and lysines, that are characteristic of nuclear proteins across all eukaryotic species, including yeast, plants, flies and mammals. These basic hexapeptides are considered to be different versions of a core nuclear localization signal (NLS). Core NLSs are present in nearly all nuclear proteins and absent from nearly all “nonassociated” cytoplasmic proteins that have been investigated. We suggest that the few (∼10%) protein factors lacking a typical NLS core peptide may enter the nucleus via strong cross-complexation with protein factor partners that possess a core NLS. Those cytoplasmic proteins found to possess an NLS-like peptide are either tightly associated with cell membrane proteins or are integral components of large cytoplasmic protein complexes. On the other hand, some versions of core NLSs are found in many cell membrane proteins and secreted proteins. We hypothesize that in these cases the N-terminal hydrophobic signal peptide of extracellular proteins and the internal hydrophobic domains of transmembrane proteins are stronger determinants of subcellular localization. The position of core NLSs among homologous nuclear proteins may or may not be conserved; however, if a core NLS is lost from a homologous site, it appears elsewhere in the protein. This search provides a set of rules for understanding the nature of core nuclear localization signals: (1) core NLSs are proposed to consist most frequently of a hexapeptide with four arginines and lysines; (2) aspartic and glutamic acid residues, as well as bulky amino acids (F, Y, W), should not be present in this hexapeptide; (3) acidic residues and the α-helix-breaking residues proline or glycine are frequently found in the region flanking this hexapeptide; (4) hydrophobic residues ought not to be present in the core NLS flanking region, allowing the NLS to be exposed on the protein.
In this study we attempt to classify putative core NLSs from a wealth of nuclear protein transcription factors from diverse species into several categories, and we propose additional core NLS structures yet to be experimentally verified.
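As an illustration only, rules (1) and (2) above can be read as a simple sliding-window scan over a protein sequence. The sketch below is not the procedure used in this study: the function name `find_core_nls_candidates`, the 0-based positions, the treatment of rule (2) as a strict exclusion, and the toy sequence are all assumptions made for the example. The flanking-region rules (3) and (4) are not implemented.

```python
# Minimal sketch of rules (1) and (2): report every hexapeptide that has
# at least four K/R residues and no D, E, F, Y or W.
# Assumptions (not from the study): the function name, 0-based positions,
# and reading rule (2) as a hard exclusion rather than a tendency.

BASIC = set("KR")          # arginine and lysine
EXCLUDED = set("DEFYW")    # acidic and bulky residues named in rule (2)
WINDOW = 6                 # hexapeptide length
MIN_BASIC = 4              # minimum number of K/R per hexapeptide


def find_core_nls_candidates(sequence):
    """Return (position, hexapeptide) pairs for candidate core NLSs."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - WINDOW + 1):
        hexapeptide = seq[start:start + WINDOW]
        n_basic = sum(1 for residue in hexapeptide if residue in BASIC)
        has_excluded = any(residue in EXCLUDED for residue in hexapeptide)
        if n_basic >= MIN_BASIC and not has_excluded:
            hits.append((start, hexapeptide))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to exercise the scan.
    for position, peptide in find_core_nls_candidates("MAGPKKKRKVEDLF"):
        print(position, peptide)
```

Overlapping windows are reported separately, so a single basic stretch usually yields several adjacent hits; merging them into one candidate region would be a natural refinement before applying any flanking-region criteria.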