A novel post hoc method for detecting index switching finds no evidence for increased switching on the Illumina HiSeq X
- 20 September 2017
- journal article
- research article
- Published by Wiley in Molecular Ecology Resources
- Vol. 18 (1), 169-175
- https://doi.org/10.1111/1755-0998.12713
Abstract
High-throughput sequencing using the Illumina HiSeq platform is a pervasive and critical molecular ecology resource, and has provided the data underlying many recent advances. A recent study has suggested that “index switching,” where reads are misattributed to the wrong sample, may be higher in new versions of the HiSeq platform. This has the potential to invalidate both published and in-progress work across the field. Here, we test for evidence of index switching in an exemplar whole-genome shotgun data set sequenced on both the Illumina HiSeq 2500, which should not have the problem, and the Illumina HiSeq X, which may. We leverage unbalanced heterozygotes, which may be produced by index switching, and ask whether the undersequenced allele is more likely to be found in other samples in the same lane than expected based on the allele frequency. Although we validate the sensitivity of this method using simulations, we find that neither the HiSeq 2500 nor the HiSeq X has evidence of index switching. This suggests that, thankfully, index switching may not be a ubiquitous problem in HiSeq X sequence data. Lastly, we provide scripts for applying our method so that index switching can be tested for in other data sets.
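The core comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' published script; it is a simplified reading of the idea, and every name in it (`Site`, `prob_allele_in_lane`, `switching_excess`) is hypothetical. It assumes diploid samples, Hardy-Weinberg proportions, independent sites, and an allele frequency estimated from samples outside the lane. For each unbalanced heterozygous site, it compares whether the undersequenced allele was actually seen in lane-mates against the probability of seeing it by chance.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Site:
    """One unbalanced heterozygous site in a focal sample (hypothetical record)."""
    minor_allele_freq: float   # population frequency of the undersequenced allele
    seen_in_lane_mates: bool   # was that allele observed in any other sample in the lane?


def prob_allele_in_lane(freq: float, n_other_samples: int) -> float:
    """Chance that at least one of n diploid lane-mates carries the allele.

    Assumes Hardy-Weinberg proportions: each of the 2n allele copies is an
    independent draw that carries the allele with probability `freq`.
    """
    return 1.0 - (1.0 - freq) ** (2 * n_other_samples)


def switching_excess(sites: list[Site], n_other_samples: int) -> tuple[int, float, float]:
    """Return (observed, expected, z) for the lane-mate sharing count.

    A large positive z means the undersequenced alleles turn up in lane-mates
    more often than allele frequency alone predicts, which is the pattern
    index switching would be expected to produce.
    """
    probs = [prob_allele_in_lane(s.minor_allele_freq, n_other_samples) for s in sites]
    observed = sum(s.seen_in_lane_mates for s in sites)
    expected = sum(probs)
    # Variance of a sum of independent Bernoulli trials with unequal probabilities.
    variance = sum(p * (1.0 - p) for p in probs)
    z = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return observed, expected, z
```

A call such as `switching_excess(sites, n_other_samples=7)` would be run once per lane and the resulting scores compared between platforms. The normal approximation behind the z-score is a convenience chosen for this sketch; the paper validates its own method by simulation, and a simulation-based null would be the more faithful choice.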