Performance of mutation pathogenicity prediction methods on missense variants

Open Access

18 January 2011

journal article
informatics
Published by Hindawi Limited in Human Mutation

Vol. 32 (4), 358-368
https://doi.org/10.1002/humu.21445

Abstract

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs identified in the human genome is growing rapidly, but attaining experimental knowledge about the possible disease association of variants is laborious and time-consuming. Several computational methods have been developed for the classification of SNPs according to their predicted pathogenicity. In this study, we have evaluated the performance of nine widely used pathogenicity prediction methods available on the Internet. The evaluated methods were MutPred, nsSNPAnalyzer, Panther, PhD-SNP, PolyPhen, PolyPhen2, SIFT, SNAP, and SNPs&GO. The methods were tested with a set of over 40,000 pathogenic and neutral variants. We also assessed whether the type of original or substituting amino acid residue, the structural class of the protein, or the structural environment of the amino acid substitution, had an effect on the prediction performance. The performances of the programs ranged from poor (MCC 0.19) to reasonably good (MCC 0.65), and the results from the programs correlated poorly. The overall best performing methods in this study were SNPs&GO and MutPred, with accuracies reaching 0.82 and 0.81, respectively. Hum Mutat 32:1–11, 2011.
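The abstract reports performance as accuracy and as the Matthews correlation coefficient (MCC). As a minimal sketch of how these two measures are commonly computed from a binary confusion matrix (pathogenic = positive, neutral = negative), the following may help. The function names and the counts in the usage lines are hypothetical illustrations, not values or code from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all variants that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction). Returns 0.0 when any marginal
    total is zero, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_accuracy = accuracy(tp=40, tn=40, fp=10, fn=10)
example_mcc = mcc(tp=40, tn=40, fp=10, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it is less easily inflated when the pathogenic and neutral classes differ greatly in size.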

Funding Information

The Tampere Graduate School in Biomedicine and Biotechnology
The Sigrid Jusélius Foundation
The Academy of Finland
The Medical Research Fund of Tampere University Hospital
