On bandwidth choice for density estimation with dependent data

Abstract
We address the empirical bandwidth choice problem in cases where the range of dependence may be virtually arbitrarily long. Assuming that the observed data derive from an unknown function of a Gaussian process, it is argued that, unlike more traditional contexts of statistical inference, in density estimation there is no clear role for the classical distinction between short- and long-range dependence. Indeed, the "boundaries" that separate different modes of behaviour for optimal bandwidths and mean squared errors are determined more by kernel order than by traditional notions of strength of dependence, for example, by whether or not the sum of the covariances converges. We provide surprising evidence that, even for some strongly dependent data sequences, the asymptotically optimal bandwidth for independent data is a good choice. A plug-in empirical bandwidth selector based on this observation is suggested. We determine the properties of this choice for a wide range of different strengths of dependence. Properties of cross-validation are also addressed.