A general framework for estimating the relative pathogenicity of human genetic variants
Open Access
- 2 February 2014
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Genetics
- Vol. 46 (3), 310-315
- https://doi.org/10.1038/ng.2892
Abstract
Jay Shendure, Greg Cooper and colleagues report a framework for annotation of genetic variation, Combined Annotation–Dependent Depletion (CADD), integrating diverse annotations into a single C score. They show that C scores correlate with annotations of functionality, pathogenicity and experimentally measured regulatory effects.

Current methods for annotating and interpreting human genetic variation tend to exploit a single information type (for example, conservation) and/or are restricted in scope (for example, to missense changes). Here we describe Combined Annotation–Dependent Depletion (CADD), a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants. We precompute C scores for all 8.6 billion possible human single-nucleotide variants and enable scoring of short insertions-deletions. C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
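The training setup described in the abstract (a support vector machine separating observed human-derived alleles from simulated variants on the basis of their annotations) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the two annotation features and their Gaussian distributions are invented, the data set is 400 variants rather than 29.4 million, the labels (-1 for observed, +1 for simulated) are an arbitrary convention, and the linear SVM is trained with a simple Pegasos-style stochastic subgradient method chosen only because it needs no external libraries.

```python
import random


def make_variants(n, mean, rng):
    # Hypothetical toy data: two made-up annotation values per variant,
    # drawn from a unit-variance Gaussian centred on `mean`.
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


def score(w, x):
    # Linear decision value; the last weight acts as the bias term.
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    # Pegasos-style stochastic subgradient descent on the regularised
    # hinge loss (illustrative hyperparameters, not taken from the paper).
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = y[i] * score(w, X[i]) < 1.0
            # Shrink the weights (regularisation step).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Move towards the example when its margin is below 1.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i] + [1.0])]
    return w


rng = random.Random(42)
observed = make_variants(200, -1.5, rng)   # stand-in for observed alleles
simulated = make_variants(200, +1.5, rng)  # stand-in for simulated variants
X = observed + simulated
y = [-1] * len(observed) + [+1] * len(simulated)

w = train_linear_svm(X, y)
accuracy = sum((score(w, x) > 0) == (label > 0) for x, label in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

In this sketch the raw decision value from `score` plays the role of a single combined measure per variant: with the label convention used here, a higher value means the variant's annotations look more like those of the simulated set than of the observed set.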