Kernel Regression Estimation Using Repeated Measurements Data

Abstract
The estimation of growth curves has been studied extensively in parametric situations. Here we consider the nonparametric estimation of an average growth curve. Suppose that there are observations from several experimental units, each following the regression model $y(x_j) = f(x_j) + \varepsilon_j$ $(j = 1, \ldots, n)$, where $\varepsilon_1, \ldots, \varepsilon_n$ are correlated zero-mean errors and $0 \le x_1 < \cdots < x_n \le 1$ are fixed constants. We study some of the properties of a kernel estimator of $f(x)$. Asymptotic and finite-sample results concerning the mean squared error of the estimator are obtained. In particular, the influence of correlation on the bandwidth minimizing mean squared error is discussed. A data-based method for selecting the bandwidth is illustrated in a data analysis. Most previous research on kernel regression estimators has involved uncorrelated errors. We investigate how dependence of the errors changes the behavior of a kernel estimator. Our theorems concerning the asymptotic mean squared error show that the estimator cannot be consistent unless the number of experimental units tends to infinity. This contrasts with the results for uncorrelated errors, where the estimator is consistent when the number of distinct x values tends to infinity. Finite-sample results indicate that the optimum bandwidth when the errors are correlated can be either larger or smaller than the optimum bandwidth with uncorrelated errors. If the number of x values is not large and/or the errors are highly positively correlated, the optimum bandwidth tends to be smaller than when the errors are uncorrelated. This is contrary to existing examples wherein serially correlated errors require larger than usual bandwidths. In our data analysis, we choose the bandwidth that minimizes an estimate of the mean average squared error while taking into account the presence of correlated errors. Using the same data we show that ignoring correlation leads to an oversmoothed kernel estimate.
An analytic result illustrates that this phenomenon is not necessarily an anomaly of the data.
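
The setting described above can be illustrated with a minimal sketch. It is not the estimator studied in the paper: it assumes a Nadaraya-Watson form with an Epanechnikov kernel, applied to the pointwise average of the curves from the experimental units. The function names (`epanechnikov`, `kernel_mean_curve`), the AR(1)-type error covariance, the true curve, and all numerical settings are assumptions chosen only for illustration.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed kernel choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_mean_curve(x, Y, grid, bandwidth):
    """Kernel-smooth the average of m curves observed at common design points.

    x         : (n,) fixed design points in [0, 1]
    Y         : (m, n) responses, one row per experimental unit
    grid      : (g,) points at which f is estimated
    bandwidth : smoothing parameter; must be large enough that every grid
                point has at least one design point within one bandwidth
    """
    ybar = Y.mean(axis=0)                           # average over units at each x_j
    u = (grid[:, None] - x[None, :]) / bandwidth    # scaled distances, shape (g, n)
    w = epanechnikov(u)
    return (w @ ybar) / w.sum(axis=1)               # Nadaraya-Watson weighted average


# Simulated repeated measurements with within-unit correlated errors.
rng = np.random.default_rng(0)
n, m = 20, 10                                       # design points, experimental units
x = (np.arange(1, n + 1) - 0.5) / n                 # equally spaced points in (0, 1)
f_true = np.sin(2 * np.pi * x)                      # hypothetical growth curve

rho = 0.5                                           # assumed AR(1)-type correlation
idx = np.arange(n)
cov = rho ** np.abs(np.subtract.outer(idx, idx))    # Cov(eps_i, eps_j) = rho^|i-j|
chol = np.linalg.cholesky(cov)
errors = rng.standard_normal((m, n)) @ chol.T       # rows are independent units
Y = f_true + errors

f_hat = kernel_mean_curve(x, Y, x, bandwidth=0.15)
```

In a sketch like this, the errors are correlated within a unit but independent across units, so averaging over more units reduces the variance of the averaged curve, whereas adding design points alone does not remove the correlated noise. The bandwidth here is fixed by hand; a data-based choice that accounts for the correlation would replace the value 0.15.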