A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff
- 1 April 2012
- journal article
- Published by Taylor & Francis Ltd in Fly
- Vol. 6 (2), 80-92
- https://doi.org/10.4161/fly.19695
Abstract
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5′UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
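The coding-effect categories named in the abstract can be illustrated with a minimal Python sketch. This is not SnpEff's implementation: the function name `classify_codon_change`, the `is_first_codon` parameter, and the effect labels are hypothetical, chosen only for illustration, and the sketch assumes the standard genetic code and a single in-frame codon substitution (no frame shifts, no start gains in UTRs).

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon, is_first_codon=False):
    """Classify a single-codon substitution (hypothetical labels, not SnpEff's)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    # Losing the ATG at the first codon position removes the start codon.
    if is_first_codon and ref_codon == "ATG" and alt_codon != "ATG":
        return "start_lost"
    # A sense codon turning into a stop codon truncates the protein.
    if ref_aa != "*" and alt_aa == "*":
        return "stop_gained"
    # A stop codon turning into a sense codon extends the protein.
    if ref_aa == "*" and alt_aa != "*":
        return "stop_lost"
    # Same amino acid (or stop to stop): the protein sequence is unchanged.
    if ref_aa == alt_aa:
        return "synonymous"
    return "non_synonymous"


# N/S ratio from the counts reported in the abstract.
n_s_ratio = 4467 / 15842

print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
print(round(n_s_ratio, 2))
```

The order of the checks matters: start and stop changes are tested before the generic amino acid comparison, so a stop-to-stop substitution such as TAA to TGA falls through to the synonymous case.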