Software cost estimation with incomplete data
- 1 October 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. 27 (10), 890-908
- https://doi.org/10.1109/32.962560
Abstract
The construction of software cost estimation models remains an active topic of research. The basic premise of cost modeling is that a historical database of software project cost data can be used to develop a quantitative model to predict the cost of future projects. One of the difficulties faced by workers in this area is that many of these historical databases contain substantial amounts of missing data. Thus far, the common practice has been to ignore observations with missing data. In principle, such a practice can lead to gross biases and may be detrimental to the accuracy of cost estimation models. We describe an extensive simulation where we evaluate different techniques for dealing with missing data in the context of software cost modeling. Three techniques are evaluated: listwise deletion, mean imputation, and eight different types of hot-deck imputation. Our results indicate that all the missing data techniques perform well with small biases and high precision. This suggests that the simplest technique, listwise deletion, is a reasonable choice. However, this will not necessarily provide the best performance. Consistent best performance (minimal bias and highest precision) can be obtained by using hot-deck imputation with Euclidean distance and a z-score standardization.
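The three families of techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code. The function names, the toy project matrix, the single-nearest-donor rule, and the choice to compute z-score statistics from complete cases only are all assumptions made for illustration. The hot-deck function also assumes that at least one complete row exists to act as a donor.

```python
import numpy as np


def listwise_deletion(X):
    """Drop every row (project) that has at least one missing value."""
    return X[~np.isnan(X).any(axis=1)]


def mean_imputation(X):
    """Replace each missing value with the mean of the observed values in its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def hot_deck_imputation(X):
    """Fill each incomplete row from its nearest complete row (the donor).

    Distance is Euclidean on z-scored columns, restricted to the columns
    that are observed in the incomplete (recipient) row.
    Assumption: z-score statistics come from the complete rows only.
    """
    X = X.astype(float)  # astype returns a copy, so the input is not modified
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]

    mu = donors.mean(axis=0)
    sigma = donors.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    donors_z = (donors - mu) / sigma

    out = X.copy()
    for i in np.where(~complete)[0]:
        observed = ~np.isnan(X[i])
        row_z = (X[i, observed] - mu[observed]) / sigma[observed]
        dist = np.sqrt(((donors_z[:, observed] - row_z) ** 2).sum(axis=1))
        donor = donors[np.argmin(dist)]
        out[i, ~observed] = donor[~observed]
    return out


# Hypothetical toy data: rows are projects, columns are arbitrary
# cost drivers (for example size, team size, effort). Values are invented.
projects = np.array([
    [10.0, 2.0, 100.0],
    [12.0, 3.0, 130.0],
    [50.0, 9.0, 600.0],
    [11.0, np.nan, 110.0],
    [48.0, 8.0, np.nan],
])

deleted = listwise_deletion(projects)
mean_filled = mean_imputation(projects)
hot_deck_filled = hot_deck_imputation(projects)
```

In this sketch, listwise deletion shrinks the data set, mean imputation keeps every project but pulls imputed values toward the column average, and hot-deck imputation copies values from the most similar complete project. The abstract reports eight hot-deck variants; only one plausible variant is shown here.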