Identification and Classification of Conserved RNA Secondary Structures in the Human Genome
Open Access
- 21 April 2006
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 2 (4), e33
- https://doi.org/10.1371/journal.pcbi.0020033
Abstract
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics.
The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript autoregulation.
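The "characteristic substitution pattern" left by structural conservation can be illustrated with a toy example. The sketch below is not the phylogenetic stochastic context-free grammar described in the abstract; it is a much simpler covariation count, and the alignment, the function name `pair_support`, and the helper `column` are all hypothetical. It only shows the underlying idea: two alignment columns that form a conserved base pair tend to change together, so the pair is preserved even when the individual bases differ between species.

```python
# Illustrative sketch only: a simple covariation count over alignment columns,
# NOT the phylogenetic SCFG method used in the paper.

# Watson-Crick pairs plus G-U wobble pairs, which RNA helices tolerate.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(col_i, col_j):
    """Summarize how well two alignment columns support a conserved base pair.

    Returns (fraction of ungapped sequences whose bases can pair,
             number of distinct pairing combinations observed).
    More than one distinct combination indicates compensatory substitutions.
    """
    observed = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    if not observed:
        return 0.0, 0
    pairing = [p for p in observed if p in PAIRS]
    return len(pairing) / len(observed), len(set(pairing))


def column(alignment, k):
    """Extract column k from a list of equal-length aligned sequences."""
    return [seq[k] for seq in alignment]


# Hypothetical toy alignment of a small hairpin: a 3-bp stem
# (positions 0-9, 1-8, 2-7) around a 4-nt loop (positions 3-6).
alignment = [
    "GGCAUUCGCC",
    "GACAUUCGUC",  # G-C replaced by A-U at positions 1 and 8
    "GGGAUUCCCC",  # C-G replaced by G-C at positions 2 and 7
    "GGCAUUCGCC",
]

stem_1_8 = pair_support(column(alignment, 1), column(alignment, 8))
stem_2_7 = pair_support(column(alignment, 2), column(alignment, 7))
stem_0_9 = pair_support(column(alignment, 0), column(alignment, 9))
loop_3_6 = pair_support(column(alignment, 3), column(alignment, 6))

for name, result in [("1-8", stem_1_8), ("2-7", stem_2_7),
                     ("0-9", stem_0_9), ("3-6", loop_3_6)]:
    print(name, result)
```

In this toy data, the column pairs 1-8 and 2-7 can pair in every sequence while using two different base combinations, which is the kind of compensatory change a comparative method can exploit. A real screen would also weigh these substitutions against the phylogenetic tree relating the species, which this sketch ignores.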