Computational identification of noncoding RNAs in E. coli by comparative genomics

1 September 2001

journal article
research article
Published by Elsevier BV in Current Biology

Vol. 11 (17), 1369-1373
https://doi.org/10.1016/s0960-9822(01)00401-8

Abstract

Some genes produce noncoding transcripts that function directly as structural, regulatory, or even catalytic RNAs [1 Eddy S.R. Noncoding RNA genes. Curr Opin Genet Dev. 1999; 9: 695-699; 2 Wassarman K.M., Zhang A., Storz G. Small RNAs in Escherichia coli. Trends Microbiol. 2000; 7: 37-45]. Unlike protein-coding genes, which can be detected as open reading frames with distinctive statistical biases, noncoding RNA (ncRNA) gene sequences have no obvious inherent statistical biases [3 Rivas E., Eddy S.R. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics. 2000; 16: 583-605]. Thus, genome sequence analyses reveal novel protein-coding genes, but any novel ncRNA genes remain invisible. Here, we describe a computational comparative genomic screen for ncRNA genes. The key idea is to distinguish conserved RNA secondary structures from a background of other conserved sequences using probabilistic models of expected mutational patterns in pairwise sequence alignments. We report the first whole-genome screen for ncRNA genes done with this method, in which we applied it to the “intergenic” spacers of Escherichia coli using comparative sequence data from four related bacteria. Starting from >23,000 conserved interspecies pairwise alignments, the screen predicted 275 candidate structural RNA loci. A sample of 49 candidate loci was assayed experimentally. At least 11 loci expressed small, apparently noncoding RNA transcripts of unknown function. Our computational approach may be used to discover structural ncRNA genes in any genome for which appropriate comparative genome sequence data are available.
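The "expected mutational patterns" mentioned in the abstract can be made concrete with a small toy example. The Python sketch below is not the authors' software and does no probabilistic scoring; it only counts two signatures that such models would plausibly reward: compensatory changes that preserve base pairing (suggesting a conserved RNA structure) and mismatches concentrated at third codon positions (suggesting a coding region). The function names, the example sequences, and the list of paired columns are all hypothetical and chosen for illustration; a real screen would not be handed the paired columns in advance.

```python
# Toy illustration only: not the authors' method or scoring scheme.
# Canonical pairs, including G-U wobble, that can form an RNA helix.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_pairs(seq_a, seq_b, paired_columns):
    """Count paired columns where both positions changed but pairing is kept.

    paired_columns is a hypothetical, user-supplied list of (i, j) alignment
    columns assumed to base-pair.
    """
    count = 0
    for i, j in paired_columns:
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        both_pair = (seq_a[i], seq_a[j]) in PAIRS and (seq_b[i], seq_b[j]) in PAIRS
        if both_changed and both_pair:
            count += 1
    return count


def mismatches_by_codon_position(seq_a, seq_b, frame=0):
    """Count mismatches at codon positions 1, 2, 3 for an assumed reading frame."""
    counts = [0, 0, 0]
    for col in range(frame, len(seq_a)):
        if seq_a[col] != seq_b[col]:
            counts[(col - frame) % 3] += 1
    return counts


# Hypothetical ungapped alignment of a 4-bp hairpin stem with a 4-nt loop.
hairpin_a = "GGCAUUCGUGCC"
hairpin_b = "GAGAUUCGUCUC"
stem = [(0, 11), (1, 10), (2, 9), (3, 8)]

# Hypothetical ungapped alignment that differs only at third codon positions.
coding_a = "AUGGCUAAAGGU"
coding_b = "AUGGCCAAGGGC"

print(compensatory_pairs(hairpin_a, hairpin_b, stem))     # 2
print(mismatches_by_codon_position(hairpin_a, hairpin_b)) # [1, 2, 1]
print(mismatches_by_codon_position(coding_a, coding_b))   # [0, 0, 3]
```

In the hairpin alignment, two stem pairs change on both sides while staying paired, whereas the coding-like alignment varies only at every third column. A probabilistic screen of the kind the abstract describes would weigh such patterns against a background model of conserved sequence rather than rely on raw counts.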

This publication has 28 references indexed in Scilit:

Identification of novel small RNAs using comparative genomics and microarrays. Genes & Development, 2001
PREFACE. Clinics in Liver Disease, 2001
Regulation of RpoS by a novel small RNA: the characterization of RprA. Molecular Microbiology, 2001
Human-mouse genome comparisons to locate regulatory sites. Nature Genetics, 2000
An RNA thermometer. Genes & Development, 1999
A Computational Screen for Methylation Guide snoRNAs in Yeast. Science, 1999
Small RNAs in Escherichia coli. Trends in Microbiology, 1999
The Complete Genome Sequence of Escherichia coli K-12. Science, 1997
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
Amino acid substitution matrices from an information theoretic perspective. Journal of Molecular Biology, 1991

Cited by 327 articles