On Classification with Incomplete Data
- 22 January 2007
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 29 (3), 427-436
- https://doi.org/10.1109/tpami.2007.52
Abstract
We address the incomplete-data problem, in which the feature vectors to be classified are missing some of their data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation of the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both expectation-maximization (EM) and variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm utilizes all available data: both incomplete and complete, as well as labeled and unlabeled. Experimental results of the proposed classification algorithms are shown.
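The central idea of the abstract is to classify an incomplete vector without imputing it. Instead, the logistic classifier is averaged over the conditional density of the missing features given the observed ones. The following is a minimal sketch of that idea under several illustrative assumptions, which are not necessarily the paper's exact formulation. The GMM components have diagonal covariances, so the observed features only re-weight the components; with full covariances, the conditional means and covariances would also shift. Missing features are marked as `None`. The logistic-Gaussian integral uses the probit-style approximation ∫σ(a) N(a | μ, s²) da ≈ σ(μ / √(1 + πs²/8)). The function names `prob_positive`, `sigmoid`, and `log_normal_diag` are hypothetical. GMM fitting (EM or VB-EM) and the semisupervised graph regularization are not shown; the mixture parameters are taken as given.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_normal_diag(x, mean, var):
    """log N(x | mean, diag(var)) over the given coordinates (0.0 if empty)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def prob_positive(x, w, b, weights, means, variances):
    """Approximate P(y=1 | observed part of x) for p(y=1|x) = sigmoid(w.x + b).

    Missing entries of x are None. The missing features are integrated out
    against p(x_missing | x_observed) from a GMM.
    ASSUMPTION: diagonal covariances (the paper may use full covariances).
    ASSUMPTION: probit-style approximation of the logistic-Gaussian integral.
    """
    obs = [i for i, xi in enumerate(x) if xi is not None]
    mis = [i for i, xi in enumerate(x) if xi is None]

    # Conditional mixture weights: omega_k ∝ pi_k * N(x_o | mu_k,o, Sigma_k,oo).
    log_w = [
        math.log(pi_k) + log_normal_diag([x[i] for i in obs],
                                         [mu[i] for i in obs],
                                         [var[i] for i in obs])
        for pi_k, mu, var in zip(weights, means, variances)
    ]
    top = max(log_w)  # subtract the max before exponentiating (log-sum-exp trick)
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    omega = [u / total for u in unnorm]

    # Under each component, a = w.x + b is Gaussian in the missing features.
    known = b + sum(w[i] * x[i] for i in obs)
    p = 0.0
    for om, mu, var in zip(omega, means, variances):
        mean_a = known + sum(w[i] * mu[i] for i in mis)   # E[a | component k]
        var_a = sum(w[i] ** 2 * var[i] for i in mis)     # Var[a | component k]
        # Approximate E[sigmoid(a)] by sigmoid(mean / sqrt(1 + pi * var / 8)).
        p += om * sigmoid(mean_a / math.sqrt(1.0 + math.pi * var_a / 8.0))
    return p
```

As a usage example, take two equally weighted components centred at (-3, -3) and (3, 3) with unit variances, and a classifier that looks only at the second feature (`w = [0.0, 1.0]`, `b = 0.0`). Calling `prob_positive([3.0, None], ...)` returns roughly 0.93. Plugging the conditional mean 3.0 into the classifier instead would give σ(3) ≈ 0.95. Integrating over the uncertainty in the missing feature moderates the prediction toward 0.5, which single imputation cannot do. When no feature is missing, the function reduces to the ordinary σ(w·x + b).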