DANN: a deep learning approach for annotating the pathogenicity of genetic variants

Open Access

22 October 2014

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 31 (5), 761-763
https://doi.org/10.1093/bioinformatics/btu703

Abstract

Summary: Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, a linear kernel SVM cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture (CUDA)-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD’s SVM methodology.

Availability and implementation: All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/.

Contact: xhx@ics.uci.edu
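
The Summary names dropout and momentum training as the techniques used when training the DNN. The following is a minimal NumPy sketch of those two ideas on a toy XOR problem, which a linear classifier cannot separate. It is not the DANN implementation: the function names (dropout, momentum_step, train_xor, predict), the single tanh hidden layer, the XOR data, and every hyperparameter value are assumptions chosen only for illustration.

```python
import numpy as np


def dropout(h, rate, rng, train=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not train or rate == 0.0:
        return h, np.ones_like(h)
    mask = (rng.random(h.shape) >= rate) / (1.0 - rate)
    return h * mask, mask


def momentum_step(param, grad, velocity, lr, mu):
    """Classical momentum: v <- mu * v - lr * grad, then param <- param + v."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity


def predict(params, X):
    """Forward pass without dropout (evaluation mode)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))


def train_xor(epochs=2000, hidden=8, rate=0.1, lr=0.1, mu=0.9, seed=0):
    """Train a one-hidden-layer network on XOR (illustrative values only)."""
    rng = np.random.default_rng(seed)
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [1.0], [0.0]])
    params = [
        rng.normal(0.0, 1.0, (2, hidden)),  # W1
        np.zeros(hidden),                   # b1
        rng.normal(0.0, 1.0, (hidden, 1)),  # W2
        np.zeros(1),                        # b2
    ]
    velocities = [np.zeros_like(p) for p in params]
    for _ in range(epochs):
        W1, b1, W2, b2 = params
        h = np.tanh(X @ W1 + b1)                # non-linear hidden layer
        hd, mask = dropout(h, rate, rng)        # dropout on hidden units
        p = 1.0 / (1.0 + np.exp(-(hd @ W2 + b2)))
        dz2 = (p - y) / len(X)                  # grad of mean cross-entropy w.r.t. logits
        dW2 = hd.T @ dz2
        db2 = dz2.sum(axis=0)
        dh = (dz2 @ W2.T) * mask * (1.0 - h ** 2)  # back through dropout and tanh
        dW1 = X.T @ dh
        db1 = dh.sum(axis=0)
        grads = [dW1, db1, dW2, db2]
        for i in range(len(params)):
            params[i], velocities[i] = momentum_step(
                params[i], grads[i], velocities[i], lr, mu
            )
    return params
```

Calling train_xor() and then predict(params, X) on the four XOR inputs returns one probability per input. How closely these approach the 0/1 labels depends on the seed and the illustrative hyperparameters, so no particular accuracy is claimed here.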
