It's not a bug, it's a feature: How misclassification impacts bug prediction
- 1 May 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified: rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: on average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
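To illustrate the bias the abstract describes, the following sketch shows how misclassified issue reports propagate into file-level defect labels. The reports, files, and numbers here are invented for illustration; only the mechanism (linking "BUG"-labeled reports to the files their fixing commits touched) follows the paper's setting.

```python
# Illustrative sketch (not the paper's dataset): each entry links an
# issue report, its tracker label, its true nature after manual
# validation, and the files touched by the fixing commit.
fixes = [
    ("issue-1", "BUG", True,  ["parser.c"]),            # real bug
    ("issue-2", "BUG", False, ["ui.c", "help.txt"]),    # actually a feature request
    ("issue-3", "BUG", True,  ["parser.c", "lexer.c"]), # real bug
    ("issue-4", "BUG", False, ["docs.md"]),             # documentation update
]

# Labels as an unvalidated bug prediction dataset would see them:
marked_defective = {f for _, label, _, files in fixes
                    if label == "BUG" for f in files}

# Ground truth after manual validation of each report:
truly_defective = {f for _, _, is_bug, files in fixes
                   if is_bug for f in files}

false_positives = marked_defective - truly_defective
ratio = len(false_positives) / len(marked_defective)
print(f"{ratio:.0%} of files marked defective never had a bug")
```

In this toy example, three of the five files marked defective (ui.c, help.txt, docs.md) never contained a bug; the paper reports an average of 39% such false positives across real projects.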