Population Total Estimation in a Complex Survey by Nonparametric Model Calibration Using Penalty Function Method with Auxiliary Information Known at Cluster Levels

Abstract
Nonparametric methods are a rich class of statistical tools that have gained acceptance in most areas of statistics. Researchers have used them to impute missing values in sample surveys when auxiliary variables are available. They are often preferred to parametric methods because they make it possible to analyze data, estimate trends and conduct inference without fully specifying a parametric model for the data. This study therefore presents a new approach for complex surveys: nonparametric imputation of missing values using both penalized splines and neural networks. More precisely, the study uses a neural network and penalized splines to estimate the functional relationship between the survey variable and the auxiliary variables. The complex survey data were sampled through a cluster-strata design, in which the population is divided into clusters that are in turn subdivided into strata. Once the missing values have been imputed, the study performs model calibration with auxiliary information assumed to be completely available at the cluster level. The reasoning behind model calibration is that if the calibration constraints are satisfied by the auxiliary variable, then the fitted values of the variable of interest are expected to satisfy those constraints too. The population total estimators are derived by treating the calibration problems at the cluster level as optimization problems and solving them by the method of penalty functions. A Monte Carlo simulation was run to assess the finite-sample performance of the estimators under complex survey data, and their efficiency was assessed using the mean squared error (MSE) criterion. The penalized spline and neural network model calibration estimators were compared with the Horvitz-Thompson estimator.
From the results, the performance of the two nonparametric estimators appears close to that of the Horvitz-Thompson estimator, and both are unbiased and consistent.
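
As a rough illustration of the benchmark and the evaluation criterion named above, the following minimal sketch computes a Horvitz-Thompson total and estimates its bias and MSE by Monte Carlo simulation. It is not the study's simulation: it assumes simple random sampling without replacement (so every inclusion probability is n/N) rather than the cluster-strata design, and the function names `horvitz_thompson_total` and `monte_carlo_bias_mse` are hypothetical.

```python
import random


def horvitz_thompson_total(y_sample, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    return sum(y / p for y, p in zip(y_sample, inclusion_probs))


def monte_carlo_bias_mse(population, n, replications=2000, seed=0):
    """Estimate the bias and MSE of the Horvitz-Thompson total by simulation.

    Assumption (not from the study): simple random sampling without
    replacement, so each unit has inclusion probability n / N.
    """
    rng = random.Random(seed)
    N = len(population)
    true_total = sum(population)
    pi = n / N  # common inclusion probability under this assumed design

    estimates = []
    for _ in range(replications):
        sample = rng.sample(population, n)  # draw n units without replacement
        estimates.append(horvitz_thompson_total(sample, [pi] * n))

    bias = sum(estimates) / replications - true_total
    mse = sum((e - true_total) ** 2 for e in estimates) / replications
    return bias, mse


if __name__ == "__main__":
    # Hypothetical toy population, used only to exercise the functions.
    toy_population = [float(i) for i in range(1, 11)]
    bias, mse = monte_carlo_bias_mse(toy_population, n=5)
    print(f"estimated bias = {bias:.3f}, estimated MSE = {mse:.3f}")
```

The same loop could be used to compare any competing total estimator against the Horvitz-Thompson benchmark by computing its MSE over the same replicated samples.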