Optimizing an ion semiconductor sequencing data analysis method to identify somatic mutations in the genomes of cancer cells in clinical tissue samples

Abstract
Identification of causal genomic alterations is an indispensable step in the implementation of personalized cancer medicine. Analytical methods play a central role in identifying such alterations because of the vast amount of data produced by next-generation sequencers. Most analytical techniques are designed for the Illumina platform and are therefore suboptimal for analyzing datasets generated by whole exome sequencing (WES) using the Ion Proton System. Accurate identification of somatic mutations requires characterization of the platform-dependent error profiles and genomic properties that affect the accuracy of sequence data, as well as platform-oriented optimization of the pipeline. We therefore used the Ion Proton System to perform WES of DNA isolated from tumor and matched control tissues of 1,058 patients with cancer who were treated at the Shizuoka Cancer Center Hospital. Among the initially identified candidate somatic single-nucleotide variants (SNVs), 10,279 were validated by manual inspection of the WES data followed by Sanger sequencing. These validated SNVs served as an objective standard for determining an optimum cutoff value with which to improve the pipeline. Using the optimized pipeline, 189,381 SNVs were identified in 1,101 samples. The analytical technique presented here is a useful resource for conducting clinical WES, particularly with semiconductor-based sequencing technology.
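As a rough illustration of how a validated set of SNVs could serve as a standard for choosing a cutoff, the Python sketch below scans candidate thresholds and keeps the one that best separates validated calls from unvalidated ones. The per-variant quality score, the F1 selection criterion, the function name best_cutoff, and the toy values are all assumptions made for illustration; the abstract does not state which metric or criterion the pipeline actually uses.

```python
from typing import List, Tuple


def best_cutoff(scores: List[float], validated: List[bool]) -> Tuple[float, float]:
    """Return (cutoff, F1) maximizing F1 when calls with score >= cutoff are kept.

    scores    -- hypothetical per-candidate quality scores (an assumption)
    validated -- True if the candidate SNV was confirmed, e.g. by Sanger sequencing
    """
    if len(scores) != len(validated):
        raise ValueError("scores and validated must have the same length")

    total_true = sum(validated)
    best = (float("inf"), 0.0)  # no calls kept, F1 of zero

    # Every distinct observed score is a candidate cutoff.
    for cutoff in sorted(set(scores)):
        kept = [v for s, v in zip(scores, validated) if s >= cutoff]
        true_positives = sum(kept)
        if true_positives == 0:
            continue
        precision = true_positives / len(kept)
        recall = true_positives / total_true
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (cutoff, f1)
    return best


# Toy data only; these are not values from the study.
toy_scores = [5, 10, 20, 30, 40, 50]
toy_validated = [False, False, True, False, True, True]
cutoff, f1 = best_cutoff(toy_scores, toy_validated)
```

F1 is used here only because it balances false positives against missed mutations in a single number; a clinical pipeline might instead fix a minimum sensitivity or precision and choose the cutoff that satisfies it.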