The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models
- 1 May 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, pp. 812-823
- https://doi.org/10.1109/icse.2015.93
Abstract
The reliability of a prediction model depends on the quality of the data from which it was trained. Therefore, defect prediction models may be unreliable if they are trained using noisy data. Recent research suggests that randomly-injected noise that changes the classification (label) of software modules from defective to clean (and vice versa) can impact the performance of defect models. Yet, in reality, incorrectly labelled (i.e., mislabelled) issue reports are likely non-random. In this paper, we study whether mislabelling is random, and the impact that realistic mislabelling has on the performance and interpretation of defect models. Through a case study of 3,931 manually-curated issue reports from the Apache Jackrabbit and Lucene systems, we find that: (1) issue report mislabelling is not random; (2) precision is rarely impacted by mislabelled issue reports, suggesting that practitioners can rely on the accuracy of modules labelled as defective by models that are trained using noisy data; (3) however, models trained on noisy data typically achieve 56%-68% of the recall of models trained on clean data; and (4) only the metrics in the top influence rank of our defect models are robust to the noise introduced by mislabelling, suggesting that the less influential metrics of models that are trained on noisy data should not be interpreted or used to make decisions.
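The following sketch is not the authors' code; it is a minimal illustration of the kind of comparison the abstract describes: training a defect prediction model on mislabelled versus clean training data and comparing precision and recall against cleanly-labelled test data. The synthetic dataset, the 30% mislabelling rate, and the random forest classifier are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: compare a defect model trained on noisy labels
# against one trained on clean labels. All data and parameters are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)

# Synthetic "software modules": features stand in for code/process metrics,
# the positive class stands in for defective modules.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Mislabel 30% of the defective training modules as clean (an assumed rate),
# mimicking defect-fixing changes whose issue reports were mislabelled.
y_noisy = y_train.copy()
defective_idx = np.flatnonzero(y_train == 1)
flip = rng.choice(defective_idx, size=int(0.3 * len(defective_idx)), replace=False)
y_noisy[flip] = 0

# Train on clean vs. noisy labels, evaluate against the clean test labels.
for name, labels in [("clean", y_train), ("noisy", y_noisy)]:
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, labels)
    pred = model.predict(X_test)
    print(f"{name:5s}  precision={precision_score(y_test, pred):.2f}  "
          f"recall={recall_score(y_test, pred):.2f}")
```

Under such a setup, one would typically expect precision to stay fairly stable while recall drops for the noisily-trained model, which is the pattern findings (2) and (3) of the abstract report for realistic mislabelling.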