Published: 3 July 2021
Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons for ~400 novel putative ORFs with high sensitivity. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs.
IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the location of translating ribosomes on RNA using a technique known as ribosome profiling. Discovery of translated sORFs using ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.
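The idea of combining start-codon and stop-codon ribosome occupancy can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `pair_start_stop`, the representation of peaks as 0-based forward-strand coordinates, and the restriction to a single strand are all assumptions made for illustration. The sketch keeps an ORF only when its start codon and its first in-frame stop codon both carry a peak, and when it encodes fewer than 51 amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def pair_start_stop(seq, start_peaks, stop_peaks, max_aa=50):
    """Return (start, stop, n_aa) tuples for ORFs supported at both ends.

    seq         -- forward-strand DNA sequence (hypothetical input)
    start_peaks -- 0-based positions of start codons with a ribosome peak
    stop_peaks  -- 0-based positions of stop codons with a ribosome peak
    max_aa      -- keep only ORFs encoding at most this many amino acids
    """
    stop_peaks = set(stop_peaks)
    orfs = []
    for start in sorted(start_peaks):
        # Walk in-frame codons downstream of the start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # Codons from the start codon up to, but excluding, the stop.
                n_aa = (pos - start) // 3
                if pos in stop_peaks and n_aa <= max_aa:
                    orfs.append((start, pos, n_aa))
                break  # only the first in-frame stop terminates the ORF
    return orfs

if __name__ == "__main__":
    toy = "CCATGAAATTTTAAGG"  # ATG at position 2, TAA at position 11
    print(pair_start_stop(toy, start_peaks=[2], stop_peaks=[11]))
```

Requiring agreement between two independent signals is a simple way to reduce false positives; a real analysis would also need to handle the reverse strand, peak-calling thresholds, and alternative start codons.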
Published: 26 March 2021
Upstream open reading frames (uORFs) are short ORFs found in the 5′-UTRs of many eukaryotic transcripts and can influence the translation of protein-coding main ORFs (mORFs). Recent genome-wide ribosome profiling studies have revealed that thousands of uORFs initiate translation at non-AUG start codons. However, the physiological significance of these non-AUG uORFs has so far been demonstrated for only a few of them. It is conceivable that physiologically important non-AUG uORFs are evolutionarily conserved across species. In this study, using a combination of bioinformatics and experimental approaches, we searched the Arabidopsis genome for non-AUG-initiated uORFs with conserved sequences that control the expression of the mORF-encoded proteins. As a result, we identified four novel regulatory non-AUG uORFs. Among these, two exerted repressive effects on mORF expression in an amino acid sequence-dependent manner. These two non-AUG uORFs are likely to encode regulatory peptides that cause ribosome stalling, thereby enhancing their repressive effects. In contrast, one of the identified regulatory non-AUG uORFs promoted mORF expression by alleviating the inhibitory effect of a downstream AUG-initiated uORF. These findings provide insights into the mechanisms that enable non-AUG uORFs to play regulatory roles despite their low translation initiation efficiencies.
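As a rough illustration of what a candidate non-AUG uORF is at the sequence level, the following Python sketch scans a 5′-UTR for near-cognate start codons (codons differing from AUG at exactly one position, written here in the DNA alphabet) followed by an in-frame stop codon. The function names and the `min_codons` parameter are hypothetical, and the sketch deliberately omits the conservation analysis and experimental validation described in the abstract; it also reports only uORFs that terminate within the supplied UTR sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def near_cognate_codons(start="ATG"):
    """Return the nine codons that differ from `start` at exactly one position."""
    return {
        start[:i] + base + start[i + 1:]
        for i in range(3)
        for base in "ACGT"
        if base != start[i]
    }

def find_non_aug_uorfs(utr5, min_codons=2):
    """Return (start, stop, start_codon) for candidate non-AUG uORFs in a 5' UTR.

    utr5       -- 5' UTR sequence as DNA (hypothetical input)
    min_codons -- minimum number of sense codons, counting the start codon
    """
    starts = near_cognate_codons()
    hits = []
    for i in range(len(utr5) - 2):
        codon = utr5[i:i + 3]
        if codon not in starts:
            continue
        # Follow the reading frame until the first in-frame stop codon.
        for j in range(i + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:
                    hits.append((i, j, codon))
                break
    return hits

if __name__ == "__main__":
    print(find_non_aug_uorfs("GGCTGAAATAAGG"))
```

A scan like this yields many candidates by chance alone, which is consistent with the abstract's emphasis on sequence conservation as a filter for physiologically relevant uORFs.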