The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models

1 May 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, 812-823
https://doi.org/10.1109/icse.2015.93

Abstract

The reliability of a prediction model depends on the quality of the data from which it was trained. Therefore, defect prediction models may be unreliable if they are trained using noisy data. Recent research suggests that randomly injected noise that changes the classification (label) of software modules from defective to clean (and vice versa) can impact the performance of defect models. Yet, in reality, incorrectly labelled (i.e., mislabelled) issue reports are likely non-random. In this paper, we study whether mislabelling is random, and the impact that realistic mislabelling has on the performance and interpretation of defect models. Through a case study of 3,931 manually curated issue reports from the Apache Jackrabbit and Lucene systems, we find that: (1) issue report mislabelling is not random; (2) precision is rarely impacted by mislabelled issue reports, suggesting that practitioners can rely on the accuracy of modules labelled as defective by models that are trained using noisy data; (3) however, models trained on noisy data typically achieve 56%-68% of the recall of models trained on clean data; and (4) only the metrics in the top influence rank of our defect models are robust to the noise introduced by mislabelling, suggesting that the less influential metrics of models that are trained on noisy data should not be interpreted or used to make decisions.
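
Findings (2) and (3) in the abstract are stated in terms of precision and recall. As a minimal sketch of how those two measures, and a "recall relative to the clean-data model" ratio, could be computed, the snippet below uses small hypothetical label vectors. The function name `precision_recall`, the variable names, and all values are illustrative assumptions; none of them come from the paper or its data.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels, where 1 means defective."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # defective, flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # defective, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth for eight modules (not data from the study).
actual = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions of a model trained on clean labels.
clean_pred = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions of a model trained on mislabelled data:
# it still flags only truly defective modules, but misses half of them.
noisy_pred = [1, 1, 0, 0, 0, 0, 0, 0]

clean_precision, clean_recall = precision_recall(actual, clean_pred)
noisy_precision, noisy_recall = precision_recall(actual, noisy_pred)

# Recall of the noisy-data model as a fraction of the clean-data model's recall.
relative_recall = noisy_recall / clean_recall

print(f"clean: precision={clean_precision:.2f}, recall={clean_recall:.2f}")
print(f"noisy: precision={noisy_precision:.2f}, recall={noisy_recall:.2f}")
print(f"relative recall: {relative_recall:.2f}")
```

In this toy case precision is unchanged while recall drops, which is the qualitative pattern the abstract describes; the specific numbers here are invented for illustration and are not the study's results.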

Cited by 97 articles