CancerSiamese: one-shot learning for predicting primary and metastatic tumor types unseen during model training
Open Access
- 12 May 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 22 (1), 1-17
- https://doi.org/10.1186/s12859-021-04157-w
Abstract
State-of-the-art deep-learning-based cancer type prediction can only predict cancer types whose samples are available during training, and it commonly relies on large sample sizes. In this paper, we consider how to use existing training samples to predict cancer types unseen during training. We hypothesize that there exists a set of type-agnostic expression representations that define the similarity or dissimilarity between samples of the same or different types, and we propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function. We trained CancerSiamese for cancer type prediction of primary and metastatic tumors using samples from The Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was used to facilitate the training of the CancerSiamese models. CancerSiamese was tested on different N-way predictions and yielded average accuracy improvements of 8% and 4% over the benchmark 1-nearest neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied guided gradient saliency maps and feature selection to CancerSiamese to examine the top 100 and 200 marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer-related functions across primary and metastatic tumors. This work demonstrates, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. It could therefore inspire new and ingenious applications of one-shot and few-shot learning for improving cancer diagnosis and prognosis, as well as our understanding of cancer.
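To make the N-way one-shot setup concrete, the sketch below shows how a Siamese-style model can label a query profile with a type it never saw in training: the query is compared against exactly one support profile per candidate type, and the most similar one wins. This is a minimal illustration under stated assumptions, not the CancerSiamese implementation. The `embed` function is a hypothetical single linear layer with ReLU standing in for the convolutional branches (both branches share its weights), the `exp(-L1)` similarity is an assumed choice rather than the paper's similarity function, training is not shown, and the toy numbers are made up. The 1-NN baseline compares raw profiles by Euclidean distance.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
SupportSet = Dict[str, Vector]  # one example profile per candidate type (one-shot)


def embed(x: Vector, weights: List[List[float]]) -> List[float]:
    """Hypothetical shared encoder: a single linear layer followed by ReLU.
    It stands in for the convolutional branch; both branches of the Siamese
    pair call this function with the SAME weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_similarity(query: Vector, support: Vector, weights: List[List[float]]) -> float:
    """Assumed similarity function: exp(-L1 distance) between the two embeddings,
    so identical embeddings score 1.0 and the score decays toward 0."""
    eq, es = embed(query, weights), embed(support, weights)
    return math.exp(-sum(abs(a - b) for a, b in zip(eq, es)))


def siamese_predict(query: Vector, support_set: SupportSet, weights: List[List[float]]) -> str:
    """N-way one-shot prediction: pick the type whose single support sample
    is most similar to the query."""
    return max(support_set, key=lambda label: siamese_similarity(query, support_set[label], weights))


def one_nn_predict(query: Vector, support_set: SupportSet) -> str:
    """Baseline 1-NN: nearest support sample by Euclidean distance on raw profiles."""
    return min(support_set, key=lambda label: math.dist(query, support_set[label]))


Episode = Tuple[Vector, str, SupportSet]  # (query profile, true type, support set)


def accuracy(episodes: List[Episode], predict: Callable[[Vector, SupportSet], str]) -> float:
    """Fraction of N-way episodes in which the predicted type matches the true type."""
    hits = sum(predict(query, support) == truth for query, truth, support in episodes)
    return hits / len(episodes)


# Toy 2-way episode with 3 "genes" (made-up numbers, not data from the paper).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]  # hypothetical trained encoder that ignores gene 3
support = {"type_A": [1.0, 0.0, 0.0], "type_B": [0.0, 1.0, 5.0]}
query = [0.9, 0.1, 5.0]  # truly type_A, but gene 3 carries a large nuisance value
episodes = [(query, "type_A", support)]

print(siamese_predict(query, support, W))  # type_A
print(one_nn_predict(query, support))      # type_B
print(accuracy(episodes, lambda q, s: siamese_predict(q, s, W)))  # 1.0
print(accuracy(episodes, one_nn_predict))  # 0.0
```

In this toy episode the learned embedding discards the uninformative third gene, so the Siamese comparison picks the correct type while raw-profile 1-NN is pulled toward the wrong one. Averaging `accuracy` over many randomly drawn N-way episodes is one plausible way to compare the two predictors as N grows.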
Funding Information
- National Institutes of Health (P30CA54174, 1UL1RR025767-01, K99CA248944)
- CPRIT (RP190346, RP160732)