Recognition of Unknown Conserved Alternatively Spliced Exons

Open Access

8 July 2005

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 1 (2), e15-22
https://doi.org/10.1371/journal.pcbi.0010015

Abstract

The split structure of most mammalian protein-coding genes allows multiple different mRNA and protein isoforms to be produced from a single gene locus through the process of alternative splicing (AS). We propose a computational approach called UNCOVER based on a pair hidden Markov model to discover conserved coding exonic sequences subject to AS that have so far gone undetected. Applying UNCOVER to orthologous introns of known human and mouse genes predicts skipped exons or retained introns present in both species, while discriminating them from conserved noncoding sequences. The accuracy of the model is evaluated on a curated set of genes with known conserved AS events. The prediction of skipped exons in the ~1% of the human genome represented by the ENCODE regions leads to more than 50 new exon candidates. Five novel predicted AS exons were validated by RT-PCR and sequencing analysis of 15 introns with strong UNCOVER predictions and lacking EST evidence. These results imply that a considerable number of conserved exonic sequences and associated isoforms are still completely missing from the current annotation of known genes. UNCOVER also identifies a small number of candidates for conserved intron retention.

Alternative splicing is a process in which more than one protein variant can be produced from one gene: specific parts of the mRNA precursor are included or excluded during the processing into the mature transcript. It is very prevalent in mammalian genomes, and variants are often specific for particular cell types, developmental states, or environmental changes. The identification of such variants has until recently relied solely on the sequencing and comparison of expressed sequence tags (ESTs), but the number of available ESTs is not large enough to cover all variants under all conditions. Ohler et al. have now devised a comparative genomics algorithm based on a pair hidden Markov model, which identifies parts of genes that are alternatively spliced and have not been observed in ESTs. Starting from known annotated genes conserved in human and mouse, they scan corresponding intron pairs of these genes to identify conserved sequences that match the model. Experimental validation of a number of new predictions shows that the approach can successfully uncover splice variants that are as yet unknown and not part of the large libraries of ESTs. Together with recently proposed complementary computational methods, this approach helps us to complete our knowledge about the transcript diversity created by alternative splicing.
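To make the notion of a pair hidden Markov model more concrete, the following is a minimal sketch of Viterbi scoring in a generic three-state pair HMM (match, gap in one sequence, gap in the other). It is not the UNCOVER model: the abstract does not give UNCOVER's states or parameters, so the function name `pair_hmm_viterbi`, the state layout, and all probabilities (`delta`, `epsilon`, `p_same`) are illustrative assumptions. It only shows how such a model assigns a higher score to a well-conserved sequence pair than to a diverged one.

```python
import math

NEG_INF = float("-inf")


def pair_hmm_viterbi(a, b, delta=0.1, epsilon=0.3, p_same=0.2):
    """Best-path log probability of jointly emitting DNA sequences a and b.

    Hypothetical toy parameters (not taken from the paper):
      delta   - probability of opening a gap from the match state
      epsilon - probability of extending a gap
      p_same  - probability of emitting one specific identical base pair
    """
    p_diff = (1.0 - 4 * p_same) / 12  # 16 pair emissions sum to 1
    q = 0.25                          # single-base emission in gap states
    log = math.log
    n, m = len(a), len(b)

    # v_[i][j]: best log probability of emitting a[:i] and b[:j],
    # ending in state M (aligned pair), X (base of a only), Y (base of b only).
    vM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 0.0  # the begin state is treated like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                emit = log(p_same if a[i - 1] == b[j - 1] else p_diff)
                vM[i][j] = emit + max(
                    log(1 - 2 * delta) + vM[i - 1][j - 1],
                    log(1 - epsilon) + vX[i - 1][j - 1],
                    log(1 - epsilon) + vY[i - 1][j - 1],
                )
            if i > 0:
                vX[i][j] = log(q) + max(
                    log(delta) + vM[i - 1][j],
                    log(epsilon) + vX[i - 1][j],
                )
            if j > 0:
                vY[i][j] = log(q) + max(
                    log(delta) + vM[i][j - 1],
                    log(epsilon) + vY[i][j - 1],
                )
    return max(vM[n][m], vX[n][m], vY[n][m])


conserved = pair_hmm_viterbi("ACGT", "ACGT")
diverged = pair_hmm_viterbi("ACGT", "TGCA")
```

Under these assumed parameters, `conserved` is larger than `diverged`. A model for the task described above would need additional states, for example for codon structure and splice sites, to separate conserved coding exons from conserved noncoding sequence.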
