Source Code Retrieval for Bug Localization Using Latent Dirichlet Allocation
- 1 October 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 155-164
- https://doi.org/10.1109/wcre.2008.33
Abstract
In bug localization, a developer uses information about a bug to locate the portion of the source code to modify to correct the bug. Developers expend considerable effort performing this task. Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI); however, latent Dirichlet allocation (LDA), a modular and extensible IR model, has significant advantages over both LSI and probabilistic LSI (pLSI). In this paper we present an LDA-based static technique for automating bug localization. We describe the implementation of our technique and three case studies that measure its effectiveness. For two of the case studies we directly compare our results to those from similar studies performed using LSI. The results demonstrate our LDA-based technique performs at least as well as the LSI-based techniques for all bugs and performs better, often significantly so, than the LSI-based techniques for most bugs.
This publication has 23 references indexed in Scilit:
- Mining concepts from code with probabilistic topic models. Published by Association for Computing Machinery (ACM), 2007
- Feature location via information retrieval based filtering of a single scenario execution trace. Published by Association for Computing Machinery (ACM), 2007
- Semantic clustering: Identifying topics in source code. Information and Software Technology, 2007
- Working Session: Information Retrieval Based Approaches in Software Evolution. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- LDA-based document models for ad-hoc retrieval. Published by Association for Computing Machinery (ACM), 2006
- Leveraged Quality Assessment using Information Retrieval Techniques. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Enriching Reverse Engineering with Semantic Clustering. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- A Linguistic Analysis of How People Describe Software Problems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Static Techniques for Concept Location in Object-Oriented Code. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Probabilistic latent semantic indexing. Published by Association for Computing Machinery (ACM), 1999