ISOLATE: a computational strategy for identifying the primary origin of cancers using high-throughput sequencing

Open Access

19 June 2009

journal article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 25 (21), 2882-2889
https://doi.org/10.1093/bioinformatics/btp378

Abstract

Motivation: One of the most deadly cancer diagnoses is the carcinoma of unknown primary origin. Without the knowledge of the site of origin, treatment regimens are limited in their specificity and result in high mortality rates. Though supervised classification methods have been developed to predict the site of origin based on gene expression data, they require large numbers of previously classified tumors for training, in part because they do not account for sample heterogeneity, which limits their application to well-studied cancers.

Results: We present ISOLATE, a new statistical method that simultaneously predicts the primary site of origin of cancers and addresses sample heterogeneity, while taking advantage of new high-throughput sequencing technology that promises to bring higher accuracy and reproducibility to gene expression profiling experiments. ISOLATE makes predictions de novo, without having seen any training expression profiles of cancers with identified origin. Compared with previous methods, ISOLATE is able to predict the primary site of origin, de-convolve and remove the effect of sample heterogeneity and identify differentially expressed genes with higher accuracy, across both synthetic and clinical datasets. Methods such as ISOLATE are invaluable tools for clinicians faced with carcinomas of unknown primary origin.

Availability: ISOLATE is available for download at: http://morrislab.med.utoronto.ca/software

Contact: gerald.quon@utoronto.ca; quaid.morris@utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

Cited by 48 articles