ReLink
- 9 September 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Software defect information, including links between bugs and committed changes, plays an important role in software maintenance tasks such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that there are many missing links due to the absence of bug references in change logs. They also found that the missing links lead to biased defect information, which in turn affects defect prediction performance. We manually inspected the explicit links, which have explicit bug IDs in change logs, and observed that the links exhibit certain features. Based on our observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria of features from explicit links to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieve 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found that the results of ReLink yield significantly better accuracy than those of traditional heuristics. © 2011 ACM
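The traditional heuristic mentioned in the abstract, searching change logs for keywords and bug IDs, can be illustrated with a minimal Python sketch. It is not ReLink itself, nor the exact heuristic evaluated in the paper. The regular expression, the function name `explicit_links`, and the sample commit messages are assumptions made only for illustration.

```python
import re

# Hypothetical pattern: a fix-related keyword followed by a numeric ID.
# Real mining tools use project-specific keyword lists and ID formats.
BUG_REF = re.compile(r"\b(?:bug|fix(?:e[sd])?|issue)\b[\s:#]*(\d+)", re.IGNORECASE)


def explicit_links(change_logs, known_bug_ids):
    """Return (commit_id, bug_id) pairs whose log message names a known bug."""
    links = set()
    for commit_id, message in change_logs.items():
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            # Keep only IDs that exist in the bug tracker, to reduce false links.
            if bug_id in known_bug_ids:
                links.add((commit_id, bug_id))
    return links


# Invented sample data: r102 may well be a bug fix, but its log has no bug
# reference, so a keyword heuristic cannot link it (a "missing link").
logs = {
    "r101": "Fixed bug #4521: null pointer in parser",
    "r102": "Refactor parser error handling",
    "r103": "fixes 9999, see discussion",
}
print(sorted(explicit_links(logs, {4521, 4522})))
```

In this sketch a commit is linked only when its message carries an explicit, valid bug ID, which is why such heuristics can reach high precision while missing fixes whose logs omit the reference.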