Predicting the Functional Effect of Amino Acid Substitutions and Indels
Open Access
- 8 October 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 7 (10), e46688
- https://doi.org/10.1371/journal.pone.0046688
Abstract
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
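The idea of an alignment-based score, as described in the abstract, can be illustrated with a small sketch: align the query to a homolog, apply the variation, align again, and take the difference. This is only a toy illustration, not the PROVEAN implementation. The function names, the toy sequences, and the match/mismatch/gap scoring values are all assumptions chosen for readability; the abstract does not specify the substitution matrix, the gap penalties, or how scores from multiple homologs are combined.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not PROVEAN's parameters.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def apply_variation(seq: str, start: int, end: int, replacement: str) -> str:
    """Replace seq[start:end] (0-based, end exclusive) with `replacement`.

    Covers substitutions, in-frame deletions (empty replacement), and
    in-frame insertions (start == end).
    """
    return seq[:start] + replacement + seq[end:]


def delta_score(query: str, variant: str, homolog: str) -> int:
    """Change in similarity to the homolog caused by the variation."""
    return alignment_score(variant, homolog) - alignment_score(query, homolog)


# Hypothetical toy sequences; the homolog is identical to the query here.
query = "MKTAYIAKQR"
homolog = "MKTAYIAKQR"

substituted = apply_variation(query, 3, 4, "W")  # A4W substitution
deleted = apply_variation(query, 3, 5, "")       # in-frame deletion of "AY"

print(delta_score(query, substituted, homolog))  # -2
print(delta_score(query, deleted, homolog))      # -6
```

In this sketch a more negative delta means the variant is less similar to the homolog than the unmodified query was. Because the same function handles substitutions, insertions, and deletions, it shows why a score of this kind is not tied to a single alignment column the way per-position conservation measures are.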