Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction

12 April 2017

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering

Vol. 44 (5), 412-428
https://doi.org/10.1109/tse.2017.2693980

Abstract

Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold; for example, because system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving QT and OPENSTACK systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield consistent over- or underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in importance.
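
For readers unfamiliar with the two scores named in finding (a), the following is a minimal sketch, not the authors' evaluation code. It computes AUC (how well predicted risk ranks fix-inducing changes above other changes; higher is better) and the Brier score (mean squared error of the predicted probabilities; lower is better) in plain Python. The function names and the toy predictions are hypothetical and only illustrate how both scores can worsen when one fixed model is applied to changes recorded long after training.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def auc(probs, labels):
    """Chance that a random fix-inducing change outranks a random clean one.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = fix-inducing change, 0 = clean change.
labels = [1, 0, 1, 0]
# Hypothetical predicted risks from the same trained model on two test periods.
soon_after = [0.9, 0.2, 0.7, 0.4]   # changes made shortly after training
year_later = [0.6, 0.7, 0.4, 0.3]   # changes made about one year later

for name, probs in [("soon after", soon_after), ("one year later", year_later)]:
    print(name, "AUC:", auc(probs, labels), "Brier:", round(brier_score(probs, labels), 3))
```

On this toy data the AUC falls from 1.0 to 0.5 and the Brier score rises from about 0.075 to about 0.275, which is the direction of degradation the abstract reports; the magnitudes here are invented and say nothing about the study's actual results.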

Funding Information

Natural Sciences and Engineering Research Council of Canada (NSERC)
JSPS KAKENHI (15H05306)

Cited by 128 articles