Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

Open Access

21 April 2006

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 2 (4), e33
https://doi.org/10.1371/journal.pcbi.0020033

Abstract

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.

Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics.
The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript autoregulation.
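
The "characteristic substitution pattern" of a conserved structure can be illustrated with a toy example: when two alignment columns base-pair, a substitution in one column tends to be compensated in the other, so several different pair types appear while pairing is preserved. The Python sketch below only counts such pairs. It is not the authors' phylogenetic stochastic context-free grammar method, which is a probabilistic model over a phylogenetic tree. The function name `pair_support`, the example columns, and the choice to count G-U wobble pairs as paired are all assumptions made for illustration.

```python
# Watson-Crick and G-U wobble pairs (assumption: wobble pairs count as paired).
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def pair_support(col_i, col_j):
    """Summarize base-pairing evidence for two alignment columns.

    col_i and col_j hold one character per species, in the same species
    order; '-' marks a gap. DNA 'T' is read as RNA 'U'.
    """
    observed = []
    for a, b in zip(col_i.upper(), col_j.upper()):
        if a == "-" or b == "-":
            continue  # skip species with a gap in either column
        observed.append((a.replace("T", "U"), b.replace("T", "U")))
    paired = [p for p in observed if p in CANONICAL_PAIRS]
    return {
        "species": len(observed),           # species with both bases present
        "paired": len(paired),              # species whose bases can pair
        "distinct_pairs": len(set(paired)), # >1 suggests compensatory change
    }


# Hypothetical columns from an eight-species alignment.
stem = pair_support("GGAAGCUG", "CCUUCGAC")      # pairing kept, pair types vary
non_stem = pair_support("GGAAGCUG", "CCCCCCCC")  # one column changes alone
print(stem)
print(non_stem)
```

In the first call every species retains a valid pair even though the bases differ, which is the kind of signal a structure-aware model can reward. In the second, substitutions in one column are not matched in the other, so pairing is lost in half of the species.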

Cited by 411 articles