Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees

Open Access

19 March 2013

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 14 (1), 100
https://doi.org/10.1186/1471-2105-14-100

Abstract

Background: Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms, artificial neural networks (ANN), decision trees (DT) and logistic regression (LR), and two composite models, DT-ANN and DT-LR. Four breast cancer microarray datasets from the Gene Expression Omnibus were pooled to predict five-year breast cancer relapse. After data compilation, the pooled set comprised 757 subjects, 5 clinical variables and 13,452 genetic variables. The bootstrap method, the Mann-Whitney U test and 20-fold cross-validation were used to identify candidate genes with the 100 most significant p-values. The predictive power of the DT, LR and ANN models was assessed using accuracy and the area under the ROC curve. The associated genes were evaluated using Cox regression.

Results: The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and the best extrapolation. In Cox regression, the 21 most-associated genes, as determined by integrating the models, were associated with a 3.53-fold (95% CI: 2.24-5.58) increased risk of five-year breast cancer recurrence…

Conclusions: The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists who understand the gene expression profiles associated with breast cancer recurrence can offer this genetic information to patients.
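The abstract names two computations that are easy to illustrate in a few lines: ranking genes by Mann-Whitney U test p-value, and scoring a model by the area under the ROC curve. The following is a minimal sketch, not the authors' implementation. It assumes a normal approximation without tie correction for the p-value, omits the bootstrap and cross-validation steps, and uses hypothetical names (`top_genes`, `g_signal`, `g_noise`) and made-up toy values. It relies on the standard identity that the AUC equals U divided by the number of positive-negative pairs.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U statistic for x, two-sided p-value).

    Assumption: normal approximation with no tie correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


def top_genes(expr, relapsed, k):
    """Rank genes by p-value (relapsed vs. non-relapsed) and keep the k smallest.

    expr maps a gene name to one expression value per subject;
    relapsed holds one boolean per subject.
    """
    scored = []
    for gene, values in expr.items():
        pos = [v for v, r in zip(values, relapsed) if r]
        neg = [v for v, r in zip(values, relapsed) if not r]
        _, p = mann_whitney(pos, neg)
        scored.append((p, gene))
    scored.sort()
    return [gene for _, gene in scored[:k]]


def auc(scores, labels):
    """Area under the ROC curve, computed as U / (n_pos * n_neg)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    u, _ = mann_whitney(pos, neg)
    return u / (len(pos) * len(neg))


# Hypothetical toy data: six subjects, the first three relapsed.
relapsed = [True, True, True, False, False, False]
expr = {
    "g_signal": [5, 6, 7, 0, 1, 2],  # separates the two groups
    "g_noise": [1, 5, 2, 6, 0, 7],   # does not
}
print(top_genes(expr, relapsed, k=1))
print(auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))
```

In the study itself the ranking would cover 13,452 genes and keep 100, and the AUC would be computed on held-out predictions from each model rather than on hand-written scores.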

Cited by 33 articles