Misannotated Multi-Nucleotide Variants in Public Cancer Genomics Datasets Lead to Inaccurate Mutation Calls with Significant Implications
Open Access
- 15 January 2021
- journal article
- research article
- Published by American Association for Cancer Research (AACR) in Cancer Research
- Vol. 81 (2), 282-288
- https://doi.org/10.1158/0008-5472.CAN-20-2151
Abstract
Although next-generation sequencing (NGS) is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a single multinucleotide variant (MNV). With this approach, the amino acid change inferred from an individual SNV within a codon can differ from the amino acid change produced by the MNV that results from combining the SNVs, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from The Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 of these studies, whereas they were correctly annotated in 15 more recent studies. At the BRAF V600 locus, the most common example of an MNV, several public datasets reported separate BRAF V600E and BRAF V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution to merge SNVs into MNVs. Our custom script used the phasing information from the SNV VCF and determined whether SNVs were at the same codon and needed to be merged into an MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate the step of merging MNVs as a best practice in their pipelines.
Significance: Identification of incorrect mutation calls in TCGA, including clinically relevant BRAF V600 and KRAS G12, will influence research and potentially clinical decisions.
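The merging step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script: the `Snv` record, its `phase_id` field (standing in for whatever phasing tag the caller writes to the VCF), the `annotate` function, and the toy coding sequence are all hypothetical, and positions are given on the coding strand rather than in genomic coordinates. The sketch merges SNVs only when they share both a codon and a phase identifier, then translates the resulting codon, using the BRAF V600 case (GTG to AAG) as the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


@dataclass(frozen=True)
class Snv:
    """Hypothetical SNV record on the coding strand (not a real VCF schema)."""
    cds_pos: int             # 0-based position within the coding sequence
    ref: str                 # reference base
    alt: str                 # alternate base
    phase_id: Optional[str]  # assumed phase tag; None means unphased


def annotate(cds, snvs):
    """Return protein changes, merging same-codon SNVs that share a phase_id."""
    groups = {}
    for i, snv in enumerate(snvs):
        codon_idx = snv.cds_pos // 3
        # Unphased SNVs get a unique key so they are never merged.
        phase = snv.phase_id if snv.phase_id is not None else f"unphased-{i}"
        groups.setdefault((codon_idx, phase), []).append(snv)

    calls = []
    for (codon_idx, _), members in groups.items():
        ref_codon = cds[codon_idx * 3: codon_idx * 3 + 3]
        alt_codon = list(ref_codon)
        for snv in members:
            offset = snv.cds_pos % 3
            if ref_codon[offset] != snv.ref:
                raise ValueError(f"reference mismatch at CDS position {snv.cds_pos}")
            alt_codon[offset] = snv.alt
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE["".join(alt_codon)]
        calls.append(f"{ref_aa}{codon_idx + 1}{alt_aa}")
    return calls


# Toy coding sequence: 599 filler codons, then GTG (valine) as codon 600.
toy_cds = "GCT" * 599 + "GTG"

# Two SNVs in codon 600: G>A at the first base and T>A at the second base.
in_phase = [Snv(1797, "G", "A", "p1"), Snv(1798, "T", "A", "p1")]
out_of_phase = [Snv(1797, "G", "A", "p1"), Snv(1798, "T", "A", "p2")]

print(annotate(toy_cds, in_phase))      # ['V600K']
print(annotate(toy_cds, out_of_phase))  # ['V600M', 'V600E']
```

Grouping on the pair of codon index and phase identifier reflects the two conditions named in the abstract: SNVs on different haplotypes, or with no phasing evidence, are annotated separately, while in-phase SNVs in the same codon yield one merged call.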
Funding Information
- Bristol-Myers Squibb (NA)