Unobserved classes and extra variables in high-dimensional discriminant analysis

Open Access

1 March 2022

journal article
research article
Published by Springer Science and Business Media LLC in Advances in Data Analysis and Classification

Vol. 16 (1), 55-92
https://doi.org/10.1007/s11634-021-00474-3

Abstract

In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the units in the test data may be measured on additional variables recorded after the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for high-dimensional data. A simulation study and an artificial experiment on the classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.

Keywords

62H30

Funding Information

Science Foundation Ireland (SFI/12/RC/2289_P2)
Agence Nationale de la Recherche (ANR-19-P3IA-0002)
