Input-dependent estimation of generalization error under covariate shift
- 1 January 2005
- journal article
- research article
- Published by Walter de Gruyter GmbH in Statistics & Risk Modeling
- Vol. 23 (4/2005), 249-279
- https://doi.org/10.1524/stnd.2005.23.4.249
Abstract
A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is violated in, e.g., interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption, known as covariate shift, causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike's information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when the training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable, and asymptotically unbiased in general. We also show that, in addition to unbiasedness, the proposed estimator can accurately estimate the difference of the generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.
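The abstract does not spell out the proposed estimator, but the bias it describes can be illustrated with a standard importance-weighting sketch: under covariate shift, the naive training error is a biased estimate of the test-distribution error, while reweighting each training loss by the density ratio p_test(x)/p_train(x) corrects the bias in expectation. The example below is a hedged illustration of this general idea, not the paper's actual method; the target function, the Gaussian train/test densities, the misspecified linear model, and all sample sizes are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed target function and noise level (illustrative only)
f = np.sinc
sigma = 0.1

# Covariate shift: training inputs ~ N(0, 1), test inputs ~ N(1, 0.5^2)
def p_train(x):
    return np.exp(-x**2 / 2.0) / np.sqrt(2 * np.pi)

def p_test(x):
    return np.exp(-(x - 1.0)**2 / (2 * 0.25)) / np.sqrt(2 * np.pi * 0.25)

x_tr = rng.normal(0.0, 1.0, 200)
y_tr = f(x_tr) + sigma * rng.normal(size=200)

# Fit a (misspecified) linear model by least squares
A = np.vstack([np.ones_like(x_tr), x_tr]).T
w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
pred = lambda x: w[0] + w[1] * x

# Naive training-error estimate of the generalization error
# (biased when train and test input distributions differ)
naive = np.mean((pred(x_tr) - y_tr)**2)

# Importance-weighted estimate: reweight losses by p_test/p_train
ratio = p_test(x_tr) / p_train(x_tr)
weighted = np.mean(ratio * (pred(x_tr) - y_tr)**2)

# Monte Carlo ground truth on the test distribution
x_te = rng.normal(1.0, 0.5, 100_000)
y_te = f(x_te) + sigma * rng.normal(size=100_000)
true_err = np.mean((pred(x_te) - y_te)**2)

print(f"naive: {naive:.3f}  weighted: {weighted:.3f}  true: {true_err:.3f}")
```

With these particular densities the importance weights are bounded, which keeps the variance of the weighted estimate under control; with heavier shift the weights can explode, which is one motivation for the regularized and subspace-based criteria cited by the paper.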