MINIMUM REDUNDANCY FEATURE SELECTION FROM MICROARRAY GENE EXPRESSION DATA

1 April 2005

journal article
research article
Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology

Vol. 03 (02), 185-205
https://doi.org/10.1142/s0219720005001004

Abstract

Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy-maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naïve Bayes, linear discriminant analysis, logistic regression, and support vector machines. Supplementary: The top 60 MRMR genes for each of the datasets are listed in . More information related to MRMR methods can be found at .
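The contrast the abstract draws between relevance-only ranking and MRMR can be illustrated with a minimal sketch. It is not taken from the abstract and rests on stated assumptions: expression values are already discretized, relevance and redundancy are both measured by mutual information, and each greedy step maximizes relevance minus mean redundancy. The names `mutual_information` and `mrmr_select` and the toy genes `g1`, `g2`, `g3` are hypothetical and used only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names from a dict of name -> discrete values.

    Assumed criterion: relevance to the labels minus the mean redundancy
    with the features already selected.
    """
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    # Start from the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name):
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (made up): g2 duplicates g1, g3 is weakly relevant but not redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 0],
    "g2": [0, 0, 0, 0, 1, 1, 1, 0],
    "g3": [0, 0, 1, 1, 0, 1, 1, 1],
}
picked = mrmr_select(features, labels, 2)
```

On this toy input, ranking by relevance alone would place the duplicate `g2` right after `g1`, whereas the greedy criterion above passes over it in favor of `g3`, because `g2` adds no information beyond what `g1` already carries.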

Cited by 1695 articles