Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications
Open Access
- 1 March 2010
- journal article
- research article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 4 (1), 396-421
- https://doi.org/10.1214/09-aoas279
Abstract
Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity data sets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity data sets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
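The semi-supervised fitting step mentioned in the abstract can be illustrated with a small EM sketch for a Gaussian discriminant model. Labeled observations keep fixed class memberships, and unlabeled observations contribute through their posterior class probabilities. The abstract does not specify the covariance model, and this sketch does not attempt the variable selection or headlong search. The shared diagonal covariance and the function names (`m_step`, `class_log_joint`, `semi_supervised_em`) are assumptions made for illustration only. This is not the authors' implementation.

```python
import numpy as np


def log_gauss_diag(X, mean, var):
    """Row-wise log density of X under N(mean, diag(var))."""
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var))
                   + np.sum((X - mean) ** 2 / var, axis=1))


def m_step(X, Z):
    """Weighted mixing proportions, class means and one shared diagonal variance.

    Assumption: a common diagonal covariance for all classes (illustrative only).
    """
    nk = Z.sum(axis=0)
    pi = nk / nk.sum()
    means = (Z.T @ X) / nk[:, None]
    sq = np.zeros(X.shape[1])
    for k in range(Z.shape[1]):
        sq += Z[:, k] @ (X - means[k]) ** 2  # membership-weighted squared residuals
    return pi, means, sq / nk.sum()


def class_log_joint(X, pi, means, var):
    """n x K matrix with entries log(pi_k) + log f_k(x_i)."""
    return np.column_stack([np.log(pi[k]) + log_gauss_diag(X, means[k], var)
                            for k in range(len(pi))])


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, max_iter=200, tol=1e-8):
    """Hypothetical helper: EM using labeled and unlabeled rows together."""
    Z_lab = np.eye(n_classes)[y_lab]          # labeled memberships stay fixed
    X_all = np.vstack([X_lab, X_unl])
    pi, means, var = m_step(X_lab, Z_lab)     # initialise from labeled data only
    prev = -np.inf
    for _ in range(max_iter):
        # E-step: posterior class probabilities for the unlabeled rows only
        L = class_log_joint(X_unl, pi, means, var)
        m = L.max(axis=1, keepdims=True)
        lse = m[:, 0] + np.log(np.exp(L - m).sum(axis=1))
        Z_unl = np.exp(L - lse[:, None])
        # observed-data log-likelihood under the current parameters
        ll = (class_log_joint(X_lab, pi, means, var)[np.arange(len(y_lab)), y_lab].sum()
              + lse.sum())
        # M-step: re-estimate from labeled and unlabeled rows combined
        pi, means, var = m_step(X_all, np.vstack([Z_lab, Z_unl]))
        if ll - prev < tol:
            break
        prev = ll
    # classify unlabeled rows with the final parameters (MAP rule)
    labels = class_log_joint(X_unl, pi, means, var).argmax(axis=1)
    return pi, means, var, labels
```

Because the labeled rows anchor each component to a known class, the fitted components cannot swap labels, and the unlabeled rows still sharpen the parameter estimates. This is the practical appeal of fitting a discriminant model in a semi-supervised way when labeled samples are scarce.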