Supervised analysis when the number of candidate features (p) greatly exceeds the number of cases (n)

1 December 2003

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter

Vol. 5 (2), 31-36
https://doi.org/10.1145/980972.980978

Abstract

New genomic and proteomic technologies provide measurements of thousands of features for each case. This creates the potential for enhanced discovery, but also for false discovery. Most statistical and machine learning procedures were not developed for the p>>n setting, and the literature of DNA microarray studies contains many examples of misuse of analytic and computational methods such as cross-validation. This paper highlights some of the key aspects of p>>n problems for identifying informative features and developing accurate classifiers.
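A frequently cited form of the cross-validation misuse mentioned above is selecting features on the full data set and then cross-validating only the classifier. Every held-out case then influences which features are chosen. The sketch below contrasts that with re-selecting features inside each fold, on pure-noise data where no classifier can do better than 50% error. It uses only the Python standard library. The function names, the mean-difference filter, the nearest-centroid classifier and the sizes (n = 30, p = 500, k = 10) are illustrative assumptions, not the procedures used in the paper.

```python
import random


def make_null_data(n, p, seed=0):
    """Pure-noise data (illustrative assumption): every feature is independent
    of the label, so the best achievable error rate is 50%."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # two balanced classes, labels 0 and 1
    return X, y


def class_means(X, y, feats):
    """Mean of each listed feature within class 0 and within class 1."""
    means = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        means[c] = [sum(row[j] for row in rows) / len(rows) for j in feats]
    return means


def select_features(X, y, k):
    """Hypothetical univariate filter: indices of the k features with the
    largest absolute difference in class means."""
    all_feats = range(len(X[0]))
    means = class_means(X, y, all_feats)
    scores = [abs(m1 - m0) for m0, m1 in zip(means[0], means[1])]
    return sorted(all_feats, key=lambda j: scores[j], reverse=True)[:k]


def predict(X_train, y_train, x, feats):
    """Nearest-centroid rule restricted to the selected features."""
    means = class_means(X_train, y_train, feats)

    def sq_dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, means[c]))

    return min((0, 1), key=sq_dist)


def loocv_error(X, y, k, select_inside):
    """Leave-one-out error rate.

    select_inside=True re-runs feature selection on each training fold.
    select_inside=False selects once on all n cases, so the held-out case
    has already influenced which features are used (selection bias).
    """
    n = len(X)
    fixed_feats = None if select_inside else select_features(X, y, k)
    errors = 0
    for i in range(n):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_train, y_train, k) if select_inside else fixed_feats
        if predict(X_train, y_train, X[i], feats) != y[i]:
            errors += 1
    return errors / n


if __name__ == "__main__":
    X, y = make_null_data(n=30, p=500, seed=1)
    print("selection outside CV:", loocv_error(X, y, k=10, select_inside=False))
    print("selection inside CV: ", loocv_error(X, y, k=10, select_inside=True))
```

On noise data like this, the "outside" estimate typically comes out far below 50%, while the "inside" estimate stays near 50% (leave-one-out on balanced classes can even land somewhat above it). Only the second figure reflects how the classifier would perform on new cases.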
