Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

Open Access

16 April 2013

journal article
research article
Published by Hindawi Limited in Computational and Mathematical Methods in Medicine

Vol. 2013, 1-14
https://doi.org/10.1155/2013/798189

Abstract

This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition for extracting informative features embedded in the data in order to predict phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are actively being proposed in statistical machine learning. We focus on the mutual coherence, that is, the absolute value of the Pearson correlation between two genes, and describe the distribution of these correlations for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
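
As an illustration of the quantity the abstract refers to, the following minimal sketch computes the absolute Pearson correlation for every pair of genes and summarizes it by its maximum. It is not the authors' code: the function names, the toy expression values, and the choice to take the maximum over pairs (borrowed from the usual definition of mutual coherence) are all assumptions made for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def abs_correlations(genes):
    """Absolute Pearson correlation for every unordered pair of genes.

    `genes` maps a gene name to its expression values across subjects.
    """
    return {
        (g, h): abs(pearson(genes[g], genes[h]))
        for g, h in combinations(genes, 2)
    }


def mutual_coherence(genes):
    """Largest absolute pairwise correlation (assumed summary statistic)."""
    return max(abs_correlations(genes).values())


# Hypothetical toy data: 3 genes measured on 4 subjects.
toy_genes = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [1.0, -1.0, -1.0, 1.0],
    "gene_c": [8.0, 6.0, 4.0, 2.0],
}

pairwise = abs_correlations(toy_genes)
coherence = mutual_coherence(toy_genes)
```

Here `gene_a` and `gene_c` are perfectly anti-correlated, so their absolute correlation dominates the summary even though `gene_b` is uncorrelated with both. In real microarray data the same computation would be applied separately to the selected gene set and to the full gene set in order to compare the two distributions.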
