Improving molecular cancer class discovery through sparse non-negative matrix factorization

Open Access

8 September 2005

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 21 (21), 3970-3975
https://doi.org/10.1093/bioinformatics/bti653

Abstract

Motivation: Identifying cancer classes or subclasses with similar morphological appearances is a challenging problem with important implications for cancer diagnosis and treatment. Clustering based on gene-expression data has been shown to be a powerful method for cancer class discovery. Non-negative matrix factorization (NMF) is one such method and has been shown to be advantageous over other clustering techniques, such as hierarchical clustering or self-organizing maps. In this paper, we investigate the benefit of explicitly enforcing sparseness in the factorization process.

Results: We report an improved unsupervised method for cancer classification from gene-expression profiles using sparse non-negative matrix factorization. We demonstrate the improvement by direct comparison with classic NMF on three well-studied datasets. In addition, we illustrate how to identify a small subset of co-expressed genes that may be directly involved in cancer.

Contact: g1m1c1@receptor.med.harvard.edu, ygao@receptor.med.harvard.edu

Supplementary information: http://arep.med.harvard.edu/snmf/supplement.htm
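The abstract describes the method only at a high level. As a rough, dependency-free illustration, the following sketch assumes one common sparse NMF formulation: squared reconstruction error plus an L1 penalty on the coefficient matrix H, with the columns of W (the metagenes) rescaled to unit norm, solved by multiplicative updates. Samples are then clustered by assigning each one to the metagene with the largest coefficient, and the top-weighted genes of a metagene are listed. The paper's exact objective, update rules, and settings may differ; the function names and the parameters `lam`, `iters`, and `seed` are hypothetical, and the matrix `V` is made-up toy data, not one of the paper's datasets.

```python
import math
import random


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python matrix product A @ B (matrices as lists of rows)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def sparse_nmf(V, k, lam=0.1, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (genes x samples) as W (genes x k) @ H (k x samples).

    Assumed objective: 0.5 * ||V - WH||_F^2 + lam * sum(H). Columns of W are
    rescaled to unit L2 norm so the penalty cannot be dodged by shrinking H
    and inflating W. Larger lam pushes the columns of H toward sparsity.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H update: the L1 penalty shows up as +lam in the denominator.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + lam + eps) for j in range(m)]
             for i in range(k)]
        # W update: standard multiplicative rule for squared error.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
        # Rescale each metagene to unit norm and compensate in H,
        # so the product WH is unchanged by this step.
        for c in range(k):
            norm = math.sqrt(sum(W[i][c] ** 2 for i in range(n)))
            if norm > 0:
                for i in range(n):
                    W[i][c] /= norm
                H[c] = [h * norm for h in H[c]]
    return W, H


def cluster_samples(H):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    return [max(range(len(H)), key=lambda i: H[i][j]) for j in range(len(H[0]))]


def top_genes(W, component, n_top):
    """Indices of the n_top genes with the largest weights in one metagene."""
    return sorted(range(len(W)), key=lambda g: W[g][component], reverse=True)[:n_top]


# Toy data (made up): 6 genes x 6 samples with two obvious sample groups.
V = [[5, 4, 5, 0, 1, 0],
     [4, 5, 5, 1, 0, 0],
     [5, 4, 4, 0, 0, 1],
     [0, 1, 0, 5, 4, 5],
     [1, 0, 0, 4, 5, 5],
     [0, 0, 1, 5, 5, 4]]

W, H = sparse_nmf(V, k=2)
labels = cluster_samples(H)
```

On this toy matrix, samples 0 to 2 and samples 3 to 5 should land in different clusters, and `top_genes(W, labels[0], 3)` should return genes 0 to 2. Picking the highest-weighted genes of a metagene is only a loose analogue of the gene-subset selection mentioned in the abstract. Plain lists keep the sketch self-contained; a practical implementation would use an array library.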
