Detection of structural variants and indels within exome data

Abstract
We report Splitread, an algorithm that uses a split-read strategy to detect structural variants and small insertions and deletions (indels) from 1 base pair (bp) to 1 Mbp in whole-exome and whole-genome sequence data sets. Splitread uses one end–anchored placements (read pairs in which only one end maps to the reference) to cluster the mappings of subsequences of the unanchored ends, identifying the size, content and location of variants with high specificity and sensitivity. It maps breakpoints at single-base-pair resolution, even in low-complexity regions. The algorithm discovers indels, structural variants, de novo events and novel, copy number–polymorphic processed pseudogenes missed by other methods.
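To make the split-read idea concrete, the following is a minimal Python sketch. It is not Splitread's implementation, and it covers only deletions found by exact string matching. Each unanchored read is assumed to come with the reference position of its anchored mate (`anchor_pos`), and only a reference window around that position is searched (`max_dist` is a hypothetical parameter). The read is split at every point, and a deletion is reported when the prefix maps uniquely and the suffix maps uniquely further downstream. Calls from different reads are then clustered by identical breakpoint and length and kept if at least `min_support` reads agree. All function and parameter names are illustrative. Ambiguous placements are simply discarded, so, unlike the behavior claimed above for low-complexity regions, this sketch gives up on repetitive sequence, and the window limits deletion size to less than its width (about `2 * max_dist`).

```python
from collections import Counter


def find_all(text: str, pattern: str) -> list[int]:
    """Return every start position of `pattern` in `text`, overlaps included."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def split_read_deletions(reference: str, read: str, anchor_pos: int,
                         max_dist: int = 500,
                         min_part: int = 5) -> set[tuple[int, int]]:
    """Candidate (breakpoint, deletion_length) calls from one unanchored read.

    Hypothetical sketch: only the reference window within `max_dist` of the
    anchored mate is searched. At each split point the prefix must map uniquely
    in the window and the suffix uniquely downstream of it; a positive gap
    between the two placements is reported as a candidate deletion.
    """
    lo = max(0, anchor_pos - max_dist)
    window = reference[lo:anchor_pos + max_dist]
    calls = set()
    for k in range(min_part, len(read) - min_part + 1):
        prefix, suffix = read[:k], read[k:]
        p_hits = find_all(window, prefix)
        if len(p_hits) != 1:  # skip ambiguous or unmapped prefixes
            continue
        prefix_end = p_hits[0] + k
        s_hits = [h for h in find_all(window, suffix) if h >= prefix_end]
        if len(s_hits) != 1:  # suffix must map uniquely downstream
            continue
        gap = s_hits[0] - prefix_end
        if gap > 0:
            calls.add((lo + prefix_end, gap))  # report in reference coordinates
    return calls


def cluster_deletions(reference: str, reads: list[tuple[str, int]],
                      min_support: int = 2) -> dict[tuple[int, int], int]:
    """Count reads supporting each identical call; keep well-supported calls."""
    support = Counter()
    for read, anchor_pos in reads:
        support.update(split_read_deletions(reference, read, anchor_pos))
    return {call: n for call, n in support.items() if n >= min_support}
```

Each read is passed as a `(sequence, anchor_pos)` pair. Microhomology at a junction can yield several calls with the same length but shifted breakpoints; a real caller would need to merge or left-normalize these.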