A novel post hoc method for detecting index switching finds no evidence for increased switching on the Illumina HiSeq X

20 September 2017

journal article
research article
Published by Wiley in Molecular Ecology Resources

Vol. 18 (1), 169-175
https://doi.org/10.1111/1755-0998.12713

Abstract

High-throughput sequencing using the Illumina HiSeq platform is a pervasive and critical molecular ecology resource, and has provided the data underlying many recent advances. A recent study has suggested that “index switching,” where reads are misattributed to the wrong sample, may be higher in new versions of the HiSeq platform. This has the potential to invalidate both published and in-progress work across the field. Here, we test for evidence of index switching in an exemplar whole-genome shotgun data set sequenced on both the Illumina HiSeq 2500, which should not have the problem, and the Illumina HiSeq X, which may. We leverage unbalanced heterozygotes, which may be produced by index switching, and ask whether the undersequenced allele is more likely to be found in other samples in the same lane than expected based on the allele frequency. Although we validate the sensitivity of this method using simulations, we find that neither the HiSeq 2500 nor the HiSeq X has evidence of index switching. This suggests that, thankfully, index switching may not be a ubiquitous problem in HiSeq X sequence data. Lastly, we provide scripts for applying our method so that index switching can be tested for in other data sets.
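The test described in the abstract can be illustrated with a minimal sketch. This is not the authors' published script. The `Site` record, the 0.2 read-fraction cutoff for calling a heterozygote "unbalanced", and the Hardy-Weinberg expectation for allele presence are all assumptions made for illustration. The idea is to count how often the undersequenced allele of an unbalanced heterozygote is carried by another sample in the same lane, and to compare that count with what the allele frequency alone would predict.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Site:
    """One heterozygous call in a focal sample (hypothetical record layout)."""
    minor_reads: int     # reads supporting the undersequenced allele
    total_reads: int     # total reads covering the site in the focal sample
    allele_freq: float   # frequency of the undersequenced allele, ideally
                         # estimated from samples outside the lane
    seen_in_lane: bool   # True if any other sample in the lane carries the allele


def is_unbalanced(site, max_fraction=0.2):
    """Flag heterozygotes whose minor allele is badly undersequenced.

    The 0.2 cutoff is an illustrative assumption, not a value from the paper.
    """
    if site.total_reads == 0:
        return False
    fraction = site.minor_reads / site.total_reads
    return 0 < fraction <= max_fraction


def expected_presence(allele_freq, n_other_samples):
    """Probability that at least one of n diploid lane-mates carries the allele.

    Assumes Hardy-Weinberg proportions and independent chromosomes.
    """
    return 1.0 - (1.0 - allele_freq) ** (2 * n_other_samples)


def switching_excess(sites, n_other_samples, max_fraction=0.2):
    """Return (observed, expected, z) for allele sharing with lane-mates.

    A large positive z means undersequenced alleles appear in lane-mates more
    often than allele frequency predicts, which is the pattern index
    switching would produce.
    """
    unbalanced = [s for s in sites if is_unbalanced(s, max_fraction)]
    observed = sum(s.seen_in_lane for s in unbalanced)
    probs = [expected_presence(s.allele_freq, n_other_samples) for s in unbalanced]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)  # sum of Bernoulli variances
    z = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return observed, expected, z


if __name__ == "__main__":
    demo = [
        Site(minor_reads=1, total_reads=20, allele_freq=0.1, seen_in_lane=True),
        Site(minor_reads=10, total_reads=20, allele_freq=0.1, seen_in_lane=True),
        Site(minor_reads=2, total_reads=20, allele_freq=0.5, seen_in_lane=False),
    ]
    print(switching_excess(demo, n_other_samples=1))
```

Running the same calculation separately on lanes from each platform would give a per-platform comparison. A real analysis would also need to handle genotyping error and relatedness among samples, which this sketch ignores.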
