A machine learning approach to optimizing cell-free DNA sequencing panels: with an application to prostate cancer
Open Access
- 28 August 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Cancer
- Vol. 20 (1), 1-9
- https://doi.org/10.1186/s12885-020-07318-x
Abstract
The use of cell-free DNA (cfDNA) as a biomarker in cancer is challenging because malignancies are genetically heterogeneous and tumor-derived molecules are rare. Here we describe and demonstrate a novel machine-learning-guided panel design strategy for improving the detection of tumor variants in cfDNA. Using this approach, we first generated a model to classify and score candidate variants for inclusion on a prostate cancer targeted sequencing panel. We then used this panel to screen tumor variants from prostate cancer patients with localized disease in both in silico and hybrid capture settings. Whole-genome sequencing (WGS) data from 550 prostate tumors were analyzed to build a targeted sequencing panel of single-point and small (< 200 bp) indel mutations, which was subsequently screened in silico against prostate tumor sequences from 5 patients to assess performance against commonly used alternative panel designs. The panel's ability to detect tumor-derived cfDNA variants was then assessed using prospectively collected cfDNA and tumor foci from a test set of 18 prostate cancer patients with localized disease undergoing radical prostatectomy. The top candidates identified by the panel generated from this approach were mutations in known driver genes (e.g. HRAS) and in prostate cancer-related transcription factor binding sites (e.g. MYC, AR). The panel outperformed two commonly used designs in detecting somatic mutations found in the cfDNA of 5 prostate cancer patients when analyzed in an in silico setting. Additionally, hybrid capture and 2500X sequencing of cfDNA molecules using the panel detected tumor variants in all 18 patients of the test set, and in 15 of the 18 patients the detected variants were found in multiple foci. Machine learning-prioritized targeted sequencing panels may prove useful for broad and sensitive variant detection in the cfDNA of heterogeneous diseases.
This strategy has implications for disease detection and monitoring when applied to the cfDNA isolated from prostate cancer patients.
Funding Information
- National Institutes of Health (CA088164, CA201358)