Examination of Criteria for Local Model Principal Component Regression

Abstract
In analytical chemistry, principal component regression (PCR) is widely used for calibration and prediction. The motivation behind PCR is to retain factors that carry predictive information and discard those associated with noise. The classical approach, referred to as top-down selection, takes factors sequentially in order of decreasing singular value, and the same factors are used for all future unknown samples; i.e., a global model is formed. The number of factors is often determined by cross-validation on the calibration samples or with an external validation set. Alternatively, a model developed specifically for an unknown sample, i.e., a local or sample-dependent model, could offer improved accuracy. The idea behind sample-dependent PCR is that factors associated with small singular values, which are not included in a top-down PCR model, can still contain relevant predictive information. This paper shows that local models generated by selecting factors on a sample-by-sample basis often reduce prediction errors compared with the global top-down model. However, evidence is also provided that supports the use of global top-down models. Several criteria for selecting factors on a sample-dependent basis are proposed and examined. The observations and conclusions presented are based on two near-infrared (NIR) data sets.
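The contrast between a global top-down model and a sample-dependent model comes down to which singular-vector indices enter the regression vector b = V_S diag(1/s_S) U_S^T y. The following is a minimal NumPy sketch of that idea, assuming mean-centered calibration data and the standard SVD form of PCR. The names fit_pcr, predict and top_down, the synthetic data, and the subset [0, 1, 4] are hypothetical illustrations. They are not the selection criteria examined in this paper.

```python
import numpy as np

def fit_pcr(X, y, factors):
    """Fit PCR on mean-centered data using the singular-vector indices in `factors`.

    Returns (b, x_mean, y_mean) so that y_hat = y_mean + (x - x_mean) @ b.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Thin SVD: Xc = U @ diag(s) @ Vt, with singular values sorted largest first.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    idx = np.asarray(factors, dtype=int)
    # b = V_S diag(1/s_S) U_S^T y for the chosen factor subset S.
    b = Vt[idx].T @ ((U[:, idx].T @ yc) / s[idx])
    return b, x_mean, y_mean

def predict(model, x):
    """Predict the analyte value for one sample spectrum x."""
    b, x_mean, y_mean = model
    return y_mean + (x - x_mean) @ b

def top_down(k):
    """Top-down (global) selection: the k factors with the largest singular values."""
    return list(range(k))

# Synthetic stand-in for calibration spectra (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3, 0.0]) + 0.01 * rng.normal(size=20)
x_new = rng.normal(size=6)

# Global model: the same first-k factors are used for every unknown sample.
global_model = fit_pcr(X, y, top_down(3))
# Local model: a hypothetical per-sample subset that includes a
# small-singular-value factor (index 4) skipped by top-down selection.
local_model = fit_pcr(X, y, [0, 1, 4])
print(predict(global_model, x_new), predict(local_model, x_new))
```

Because fit_pcr accepts any index subset, a sample-dependent method only has to decide which indices to pass for each unknown sample. The calibration SVD itself is shared by both kinds of model.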