Comparing methods for handling missing values in food-frequency questionnaires and proposing k nearest neighbours imputation: effects on dietary intake in the Norwegian Women and Cancer study (NOWAC)
- 1 April 2008
- journal article
- research article
- Published by Cambridge University Press (CUP) in Public Health Nutrition
- Vol. 11 (4), 361-370
- https://doi.org/10.1017/s1368980007000365
Abstract
Objective: To investigate item non-response in a postal food-frequency questionnaire (FFQ), and to assess the effect of substituting/imputing missing values on dietary intake levels in the Norwegian Women and Cancer study (NOWAC). We have adapted and probably for the first time applied k nearest neighbours (KNN) imputation to FFQ data.
Design: Data from a recent reproducibility study were used. The FFQ was mailed twice (test–retest) about 3 months apart to the same subjects. Missing responses in the test FFQ were imputed using the null value (frequencies = null, amount = smallest), the sample mode, the sample median, KNN, and retest values.
Setting: A methodological substudy of NOWAC, a national population-based cohort.
Subjects: A random sample of 2000 women aged 46–75 years was drawn from the cohort in 2002 (response 75%). The imputation methods were compared for 1430 women who completed at least 50% of the test FFQ.
Results: We imputed 16% missing values in the overall test data matrix. Compared to null value imputation, the largest differences in estimated dietary intake were seen for KNN, and for food items with a high proportion of missing. Imputation with retest values increased total energy intake, indicating that not all missing values are caused by respondents failing to specify no consumption, and that null value imputation may lead to underestimation and misclassification.
Conclusion: Missing values in FFQs present a methodological challenge. We encourage the application and evaluation of newer imputation methods, including KNN, which may reduce imputation errors and give more accurate intake estimates.
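The imputation strategies named in the Design section can be illustrated with a minimal Python sketch. The abstract does not describe how the authors adapted KNN to FFQ data, so everything below is an assumption made for illustration: the function names, the toy frequency matrix `ffq`, the root-mean-square distance over jointly answered items, the choice of k = 2, and the use of the neighbours' median as the imputed value. It is not the authors' implementation.

```python
from statistics import median, mode

def impute_constant(rows, value=0):
    """Null value imputation: every missing answer becomes `value`."""
    return [[value if x is None else x for x in row] for row in rows]

def impute_column_stat(rows, stat):
    """Fill each item with a sample statistic (e.g. median or mode) of its
    observed answers. Assumes every item has at least one observed answer."""
    n_items = len(rows[0])
    fill = [stat([r[j] for r in rows if r[j] is not None]) for j in range(n_items)]
    return [[fill[j] if x is None else x for j, x in enumerate(row)] for row in rows]

def distance(a, b):
    """Root-mean-square difference over items both respondents answered
    (an assumed metric, not taken from the paper)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

def impute_knn(rows, k=2):
    """Fill each missing answer with the median answer of the k most similar
    respondents who did answer that item."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if x is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != float("inf")]
            if nearest:  # leave the gap if no comparable donor exists
                result[i][j] = median(nearest)
    return result

# Hypothetical frequency codes: rows are respondents, columns are food items,
# None marks an item left blank.
ffq = [
    [0, 2, 4, 1],
    [0, 2, None, 1],
    [3, 0, 1, 4],
    [3, 0, 1, None],
    [0, 3, 4, 1],
    [3, 1, 1, 4],
]

by_null = impute_constant(ffq)
by_median = impute_column_stat(ffq, median)
by_mode = impute_column_stat(ffq, mode)
by_knn = impute_knn(ffq, k=2)
```

In this toy matrix the second respondent answers like the first and fifth, so the KNN sketch borrows their high frequency for the blank item, whereas null value and median imputation assign a low one. That is the kind of divergence the Results section reports between KNN and null value imputation, although the toy data say nothing about which estimate is closer to true intake.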