vipR: variant identification in pooled DNA using R

Open Access

Research article published 14 June 2011 by Oxford University Press (OUP) in Bioinformatics, Vol. 27 (13), i77-i84
https://doi.org/10.1093/bioinformatics/btr205

Abstract

Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants that cause susceptibility to complex diseases. Unfortunately, preparing samples for a large number of individuals is still very cost- and labor-intensive. Recent screens for rare sequence variants have therefore been carried out on pooled DNA samples, in which equimolar amounts of DNA from multiple individuals are mixed before sequencing with HTS. The resulting sequence data, however, pose a bioinformatics challenge: sequencing errors must be distinguished from real sequence variants that are present at low frequency in the DNA pool.

Results: Our method, vipR, uses data from multiple DNA pools to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of trying to discriminate sequence variants from sequencing errors directly, vipR uses the Skellam distribution to identify sequence positions whose minor allele frequencies differ significantly between at least two DNA pools. The performance of vipR was compared with that of three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance was evaluated on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity.

Availability: The code of vipR is freely available at http://sourceforge.net/projects/htsvipr/

Contact: altmann@mpipsykl.mpg.de
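The core statistical idea can be sketched in a few lines. A Skellam random variable is the difference of two independent Poisson counts. So if the minor-allele read counts at one position in two pools are modeled as Poisson, their difference under the null hypothesis of equal minor-allele frequency follows a Skellam distribution. The Python below is a minimal sketch under stated assumptions and is not vipR's R implementation. The null rate is pooled from both pools and scaled by each pool's coverage. The two-sided p-value doubles the smaller tail. A position is flagged if any pair of pools is significant after a Bonferroni correction over pairs. The function names (`skellam_pvalue`, `flag_position`, etc.) and the example counts are hypothetical and illustrative, not data from the study.

```python
import math
from itertools import combinations


def poisson_pmf(mu, n_max):
    """Poisson(mu) probabilities for k = 0..n_max, computed in log space."""
    return [math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
            for k in range(n_max + 1)]


def skellam_cdf(d, mu1, mu2):
    """P(K1 - K2 <= d) for independent K1 ~ Poisson(mu1), K2 ~ Poisson(mu2)."""
    # Truncate each Poisson far into its upper tail; the dropped mass is negligible.
    n1 = int(mu1 + 12 * math.sqrt(mu1) + 30)
    n2 = int(mu2 + 12 * math.sqrt(mu2) + 30)
    p1, p2 = poisson_pmf(mu1, n1), poisson_pmf(mu2, n2)
    cdf1, acc = [], 0.0
    for p in p1:
        acc += p
        cdf1.append(acc)
    # Condition on K2 = j: K1 - K2 <= d  <=>  K1 <= j + d.
    total = 0.0
    for j, pj in enumerate(p2):
        x = j + d
        if x >= 0:
            total += pj * cdf1[min(x, n1)]
    return min(total, 1.0)


def skellam_pvalue(k1, n1, k2, n2):
    """Two-sided p-value for equal minor-allele rates in two pools.

    k1, k2: minor-allele read counts; n1, n2: read coverage at the position.
    Assumption of this sketch: counts are Poisson with a pooled null rate.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("coverage must be positive")
    if k1 + k2 == 0:
        return 1.0  # no minor alleles in either pool: nothing to test
    rate = (k1 + k2) / (n1 + n2)   # pooled rate under the null hypothesis
    mu1, mu2 = n1 * rate, n2 * rate
    d = k1 - k2
    lower = skellam_cdf(d, mu1, mu2)            # P(D <= d)
    upper = 1.0 - skellam_cdf(d - 1, mu1, mu2)  # P(D >= d)
    return min(1.0, 2.0 * min(lower, upper))


def flag_position(minor_counts, coverages, alpha=0.05):
    """Flag a position if any pair of pools differs (Bonferroni over pairs)."""
    pairs = list(combinations(range(len(minor_counts)), 2))
    for i, j in pairs:
        p = skellam_pvalue(minor_counts[i], coverages[i],
                           minor_counts[j], coverages[j])
        if p < alpha / len(pairs):
            return True
    return False


if __name__ == "__main__":
    cov = [1000, 1000, 1000, 1000]
    print(flag_position([5, 7, 6, 4], cov))   # False: consistent with shared noise
    print(flag_position([5, 40, 6, 4], cov))  # True: one pool has excess minor alleles
```

In this sketch, four pools at 1000x coverage with scattered low counts such as [5, 7, 6, 4] are consistent with a common error rate and are not flagged. By contrast, [5, 40, 6, 4] is flagged because one pool carries far more minor-allele reads than the others. Because the test compares pools at the same position, a position-specific error rate shared by all pools does not by itself produce a significant difference. This matches the motivation for using multiple pools rather than modeling sequencing errors directly.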
