In silico prediction of splice-altering single nucleotide variants in the human genome
Open Access
- 21 November 2014
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 42 (22), 13534-13544
- https://doi.org/10.1093/nar/gku1206
Abstract
In silico tools have been developed to predict variants that may affect pre-mRNA splicing. The major limitation in applying these tools to basic research and clinical practice is that their output is difficult to interpret: most tools only predict potential splice sites in a given DNA sequence, without measuring the change in splicing signal caused by a variant. Another limitation is the lack of large-scale studies evaluating these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic (ROC) analysis. The Position Weight Matrix (PWM) model and MaxEntScan outperformed the other methods. We then used two ensemble learning methods, adaptive boosting and random forests, to build models that combine the strengths of the individual methods. Both models further improved prediction and output directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer (COSMIC) database. The analysis showed that predicted splice-altering scSNVs are enriched among recurrent scSNVs and in known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a genome-wide resource for identifying splice-altering scSNVs discovered in large-scale sequencing studies.
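The abstract rests on two ideas: scoring how much a variant weakens a splice-site signal (for example with a PWM model), and comparing scoring tools by how well their scores separate splice-altering from neutral scSNVs (ROC analysis). The Python sketch below illustrates both under loudly stated assumptions. The 3-position donor matrix `TOY_DONOR_PWM` is hypothetical and not the matrix used in the paper, the background is assumed uniform, and the variant score is assumed to be simply the drop in log2-odds score from the reference to the alternate allele. AUC is computed with the Mann-Whitney formulation (the probability that a random positive outscores a random negative). The ensemble models (adaptive boosting, random forests) are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL toy position weight matrix for a 3-position donor window.
# Each entry is the assumed nucleotide frequency at that position; these
# values are illustrative only and are NOT taken from the paper.
Pwm = List[Dict[str, float]]

TOY_DONOR_PWM: Pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
]

# Assumption: uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pwm_score(seq: str, pwm: Pwm) -> float:
    """Log2-odds score of seq under the PWM relative to the background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(math.log2(col[base] / BACKGROUND[base]) for col, base in zip(pwm, seq))


def delta_score(ref: str, alt: str, pwm: Pwm) -> float:
    """Assumed variant score: loss of splice-site signal from ref to alt.

    Positive values mean the alternate allele weakens the site.
    """
    return pwm_score(ref, pwm) - pwm_score(alt, pwm)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as 0.5).

    labels: 1 = splice-altering (positive), 0 = neutral (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `delta_score("GTA", "GCA", TOY_DONOR_PWM)` equals log2(0.7 / 0.1), about 2.81, because only the second position changes and the shared positions cancel. The pairwise AUC loop is quadratic in the number of variants; that is fine for a sketch, while a rank-based computation scales better for thousands of scSNVs.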