Two-Level Information-Retrieval-Based Model for Bug Localization Based on Bug Reports
Abstract
1. Introduction
- A two-level information-retrieval-based model for bug localization based on bug reports is proposed.
- A new approach for class-level bug localization at the first level of the proposed model is presented.
- A new approach for method-level bug localization at the second level of the proposed model is presented.
- The proposed model is evaluated by using a publicly available dataset for the class level and the method level.
- The proposed model is compared against several state-of-the-art class-level bug localization models, and the comparison shows that it outperforms them.
2. The Proposed Two-Level IR-Based Bug Localization Model
2.1. Parsing and Pre-Processing
- Stack traces are extracted from the description of the bug report. To distinguish stack frames from the rest of the description, a regular expression of the form ‘at package_name.class_name.method_name (file_name.java:line_number | Native Method | Unknown Source)’ is matched against the description. In each matched frame, package_name, class_name, and method_name give the package, class, and method that were executing when the bug was reported [4].
- Then, a part-of-speech (POS) tagger is used to extract the verbs and nouns, and the extracted elements are tokenized.
- Additionally, elements are normalized according to their type. For example, source files contain camel-case identifiers, which are split into separate tokens: “getAbsolutePath” is split into “get”, “Absolute”, and “Path”. Both the split tokens and the full identifier name are kept, because the full identifier is often present in the bug report and keeping it has been shown to be effective [9].
- Furthermore, unwanted tokens such as programming language keywords, stop words, and punctuation are removed.
- Finally, tokens are stemmed and converted into their root words.
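The pre-processing steps above can be illustrated with a minimal Python sketch. It covers only stack frame extraction, camel-case splitting, and removal of unwanted tokens; POS tagging and stemming are left out because they would normally rely on an NLP library. The function names (`extract_stack_frames`, `split_identifier`, `tokenize`), the exact regular expressions, and the tiny keyword and stop-word sets are assumptions made for illustration, not the authors' implementation.

```python
import re

# Assumed pattern for frames of the form quoted in the text:
# at package.Class.method(File.java:123 | Native Method | Unknown Source)
FRAME = re.compile(
    r"\bat\s+([\w$.<>]+)\s*\(([\w$]+\.java:\d+|Native Method|Unknown Source)\)"
)

# Splits camel-case identifiers, keeping acronym runs such as "XML" together.
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

# Illustrative (deliberately incomplete) sets of unwanted tokens.
JAVA_KEYWORDS = {"public", "private", "static", "void", "class", "return", "new"}
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "when"}
UNWANTED = JAVA_KEYWORDS | STOP_WORDS


def extract_stack_frames(description):
    """Return package, class, method, and location for each stack frame found."""
    frames = []
    for qualified_name, location in FRAME.findall(description):
        parts = qualified_name.split(".")
        if len(parts) < 3:  # need at least package.class.method
            continue
        frames.append({
            "package": ".".join(parts[:-2]),
            "class": parts[-2],
            "method": parts[-1],
            "location": location,
        })
    return frames


def split_identifier(identifier):
    """Split a camel-case identifier and also keep the full identifier."""
    parts = CAMEL.findall(identifier)
    return parts + [identifier] if len(parts) > 1 else [identifier]


def tokenize(text):
    """Drop punctuation, split identifiers, and remove unwanted tokens."""
    tokens = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]*", text):
        for token in split_identifier(word):
            if token.lower() not in UNWANTED:
                tokens.append(token)
    return tokens


if __name__ == "__main__":
    report = (
        "NullPointerException when weaving\n"
        "  at org.aspectj.weaver.World.resolve(World.java:123)\n"
        "  at java.lang.reflect.Method.invoke(Native Method)\n"
    )
    print(extract_stack_frames(report))
    print(split_identifier("getAbsolutePath"))
    print(tokenize("The getAbsolutePath method returns null"))
```

Keeping the full identifier next to its parts follows the text above: a bug report that mentions `getAbsolutePath` verbatim can still match the source token exactly.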
2.2. The Proposed Class-Level Bug Localization Approach
2.2.1. Class-Level Feature Scoring Phase
- Textual similarity
- Semantic similarity
- Token matching
- Stack trace
- Previously fixed bug reports
- API-enriched lexical similarity
- Bug-fixing recency
- Bug-fixing frequency
2.2.2. Class-Level Final Score and Ranking Phase
2.3. The Proposed Method-Level Bug Localization Approach
2.3.1. Method-Level Feature Scoring Phase
- Semantic similarity
- Stack trace
2.3.2. Method-Level Final Score and Ranking Phase
3. Experiment Details
3.1. Dataset
3.2. Evaluation Metrics
- Top N rank
- Mean Average Precision (MAP)
- Mean Reciprocal Rank (MRR)
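The three metrics listed above can be sketched in a few lines of Python using their standard information-retrieval definitions. This is a minimal illustration, not the authors' evaluation code: the function names and the toy file names are hypothetical, and average precision is assumed to be normalized by the number of truly buggy items for each bug report.

```python
def top_n(ranked_lists, relevant_sets, n):
    """Count bug reports with at least one buggy item in the first n results."""
    return sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:n])
    )


def reciprocal_rank(ranked, relevant):
    """1 / position of the first buggy item, or 0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each position holding a buggy item."""
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: reciprocal rank averaged over all bug reports."""
    scores = [reciprocal_rank(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(scores) / len(scores)


def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: average precision averaged over all bug reports."""
    scores = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy bug reports with hypothetical file names.
    ranked_lists = [
        ["A.java", "B.java", "C.java", "D.java"],
        ["X.java", "Y.java", "Z.java"],
    ]
    relevant_sets = [{"B.java", "D.java"}, {"X.java"}]
    print(top_n(ranked_lists, relevant_sets, 1))                # 1
    print(mean_reciprocal_rank(ranked_lists, relevant_sets))    # 0.75
    print(mean_average_precision(ranked_lists, relevant_sets))  # 0.75
```

In the toy data, the first report has buggy files at positions 2 and 4, giving a reciprocal rank of 1/2 and an average precision of (1/2 + 2/4) / 2 = 0.5; the second report is localized at position 1, so both values are 1.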
4. Results and Discussion
4.1. Class-Level Bug Localization
4.2. Method-Level Bug Localization
5. Related Work
5.1. IR-Based Algorithms
5.2. ML/DL with IR-Based Approaches
5.3. Optimization Algorithms with IR-Based Approaches
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] IEEE Std 610.12-1990; IEEE Standard Glossary of Software Engineering Terminology. IEEE: Piscataway, NJ, USA, 1990.
- [2] Erfani Joorabchi, M.; Mirzaaghaei, M.; Mesbah, A. Works for me! Characterizing non-reproducible bug reports. In Proceedings of the 11th Working Conference on Mining Software Repositories, Hyderabad, India, 31 May–1 June 2014.
- [3] Breu, S.; Premraj, R.; Sillito, J.; Zimmermann, T. Information needs in bug reports. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, Savannah, GA, USA, 6–10 February 2010.
- [4] Gharibi, R.; Rasekh, A.H.; Sadreddini, M.H.; Fakhrahmad, S.M. Leveraging textual properties of bug reports to localize relevant source files. Inf. Process. Manag. 2018, 54, 1058–1076.
- [5] Wang, S.; Lo, D. AmaLgam+: Composing rich information sources for accurate bug localization. J. Softw. Evol. Process 2016, 28, 921–942.
- [6] Wang, S.; Lo, D. Version history, similar report, and structure: Putting them together for improved bug localization. In Proceedings of the 22nd International Conference on Program Comprehension, Hyderabad, India, 2–3 June 2014.
- [7] Manning, C.D.; Raghavan, P.; Schutze, H. An Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
- [8] Ye, X.; Bunescu, R.; Liu, C. Learning to rank relevant files for bug reports using domain knowledge. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, Hong Kong, China, 16–22 November 2014.
- [9] Saha, R.K.; Lease, M.; Khurshid, S.; Perry, D.E. Improving bug localization using structured information retrieval. In Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), Silicon Valley, CA, USA, 11–15 November 2013.
- [10] Fejzer, M.; Narebski, J.; Przymus, P.; Stencel, K. Tracking buggy files: New efficient adaptive bug localization algorithm. IEEE Trans. Softw. Eng. 2022, 48, 2557–2569.
- [11] Seyam, A.A.; Hamdy, A.; Farhan, M.S. Code complexity and version history for enhancing hybrid bug localization. IEEE Access 2021, 9, 61101–61113.
- [12] Wong, C.-P.; Xiong, Y.; Zhang, H.; Hao, D.; Zhang, L.; Mei, H. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, 29 September–3 October 2014.
- [13] Zhou, Y.; Tong, Y.; Chen, T.; Han, J. Augmenting bug localization with part-of-speech and invocation. Int. J. Softw. Eng. Knowl. Eng. 2017, 27, 925–949.
- [14] Youm, K.C.; Ahn, J.; Lee, E. Improved bug localization based on code change histories and bug reports. Inf. Softw. Technol. 2017, 82, 177–192.
- [15] Zhou, J.; Zhang, H.; Lo, D. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012.
- [16] word2vec | Text. TensorFlow. Available online: https://www.tensorflow.org/text/tutorials/word2vec (accessed on 3 September 2023).
- [17] Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. Available online: https://nlp.stanford.edu/projects/glove (accessed on 3 September 2023).
- [18] Sabor, K.K. Automatic Bug Triaging Techniques Using Machine Learning and Stack Traces. Ph.D. Thesis, Concordia University, Montreal, QC, Canada, 2019.
- [19] Murphy-Hill, E.; Zimmermann, T.; Bird, C.; Nagappan, N. The design of bug fixes. In Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, 18–26 May 2013.
- [20] Rahman, F.; Devanbu, P. How, and why, process metrics are better. In Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, 18–26 May 2013.
- [21] Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019.
- [22] Almhana, R.; Kessentini, M.; Mkaouer, W. Method-level bug localization using hybrid multi-objective search. Inf. Softw. Technol. 2021, 131, 106474.
- [23] Kiczales, G.; Hilsdale, E. Aspect-oriented programming. ACM SIGSOFT Softw. Eng. Notes 2001, 26, 313.
- [24] Niu, F.; Assunção, W.K.; Huang, L.; Mayr-Dorn, C.; Ge, J.; Luo, B.; Egyed, A. RAT: A refactoring-aware traceability model for bug localization. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023.
- [25] Chen, A.R.; Chen, T.-H.; Wang, S. Pathidea: Improving information retrieval-based bug localization by re-constructing execution paths using logs. IEEE Trans. Softw. Eng. 2022, 48, 2905–2919.
- [26] Kim, D.; Tao, Y.; Kim, S.; Zeller, A. Where should we fix this bug? A two-phase recommendation model. IEEE Trans. Softw. Eng. 2013, 39, 1597–1610.
- [27] Hugging Face—The AI Community Building the Future. Available online: https://huggingface.co/ (accessed on 10 October 2023).
- [28] Sentence-Transformers/All-MiniLM-L6-v2, Hugging Face. Available online: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 (accessed on 3 October 2023).
- [29] Alsaedi, S.A.; Noaman, A.Y.; Gad-Elrab, A.A.; Eassa, F.E. Nature-based prediction model of bug reports based on ensemble machine learning model. IEEE Access 2023, 11, 63916–63931.
- [30] A Quick Guide to Learning to Rank Models. Available online: https://practicaldatascience.co.uk/machine-learning/a-quick-guide-to-learning-to-rank-models (accessed on 6 October 2023).
- [31] Lam, A.N.; Nguyen, A.T.; Nguyen, H.A.; Nguyen, T.N. Combining deep learning with information retrieval to localize buggy files for bug reports. In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA, 9–13 November 2015.
- [32] Huo, X.; Li, M.; Zhou, Z.-H. Learning unified features from natural and programming languages for locating buggy source code. IJCAI 2016, 16, 1606–1612.
- [33] Lam, A.N.; Nguyen, A.T.; Nguyen, H.A.; Nguyen, T.N. Bug localization with combination of deep learning and information retrieval. In Proceedings of the 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), Buenos Aires, Argentina, 22–23 May 2017.
- [34] Xiao, Y.; Keung, J.; Mi, Q.; Bennin, K.E. Improving bug localization with an enhanced convolutional neural network. In Proceedings of the 2017 24th Asia-Pacific Software Engineering Conference (APSEC), Nanjing, China, 4–8 December 2017; pp. 338–347.
- [35] Yang, G.; Lee, B. Utilizing topic-based similar commit information and CNN-LSTM algorithm for bug localization. Symmetry 2021, 13, 406.
- [36] Almhana, R.; Mkaouer, W.; Kessentini, M.; Ouni, A. Recommending relevant classes for bug reports using multi-objective search. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, 3–7 September 2016.
- [37] Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
Project | Period | Number of Mapped Bug Reports | Files Per Bug Report (Maximum) | Files Per Bug Report (Median) | Files Per Bug Report (Minimum) | Java Files in Different Versions of the Project Source Package (Maximum) | Java Files in Different Versions of the Project Source Package (Median) | Java Files in Different Versions of the Project Source Package (Minimum) | Number of API Entries |
---|---|---|---|---|---|---|---|---|---|
AspectJ | 13 March 2002 to 10 January 2014 | 593 | 87 | 2 | 1 | 6879 | 4439 | 2086 | 54 |
AspectJ Dataset | Top 1 | Top 5 | Top 10 |
---|---|---|---|
Correctly localized classes | 176 | 296 | 354 |
Percentage | 29.6795% | 49.9156% | 59.6964% |
Algorithm | MAP | MRR |
---|---|---|
Proposed model | 0.267 | 0.391 |
Gharibi et al. [4] | 0.2659 | 0.389 |
Learning to rank | 0.25 | 0.33 |
BugLocator | 0.22 | 0.32 |
VSM | 0.12 | 0.16 |
Usual Suspects | 0.16 | 0.25 |
Metric | Textual Similarity | Token Matching | Previously Fixed Bug Reports | Semantic Similarity | Stack Trace | API-Enriched Lexical Similarity | Bug-Fixing Recency | Bug-Fixing Frequency |
---|---|---|---|---|---|---|---|---|
MAP | 0.1585 | 0.2636 | 0.2170 | 0.2571 | 0.2596 | 0.2668 | 0.2670 | 0.2659 |
MRR | 0.2600 | 0.3896 | 0.3218 | 0.3801 | 0.3816 | 0.3893 | 0.3911 | 0.3894 |
Top 1 | 98 | 174 | 143 | 171 | 169 | 173 | 175 | 174 |
Top 5 | 216 | 300 | 242 | 290 | 296 | 297 | 297 | 297 |
Top 10 | 264 | 356 | 292 | 345 | 348 | 352 | 351 | 354 |
Percent of Top 1 | 16.52% | 29.34% | 24.11% | 28.83% | 28.49% | 29.17% | 29.51% | 29.34% |
Percent of Top 5 | 36.42% | 50.59% | 40.80% | 48.90% | 49.91% | 50.08% | 50.08% | 50.08% |
Percent of Top 10 | 44.51% | 60.03% | 49.24% | 58.17% | 58.68% | 59.35% | 59.19% | 59.69% |
Evaluation Metric | All-MiniLM-L6-v2 | Multi-qa-MiniLM-L6-cos-v1 | All-mpnet-base-v2 | All-MiniLM-L12-v2 |
---|---|---|---|---|
Top 100 | 136 | 139 | 133 | 130 |
Top 50 | 70 | 83 | 85 | 64 |
Top 20 | 36 | 36 | 31 | 28 |
Top 10 | 17 | 18 | 12 | 17 |
Top 5 | 9 | 6 | 8 | 10 |
Model | IR Method | Similarity Between Bug Report and Source Files | Structured Information of Source Code | Previously Fixed Bug Reports | Stack Trace | Version History | POS Tagging | Call Graph | Semantic Similarity | Code Complexity | Reporter Information | Evaluation Metrics |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BugLocator [15] | rVSM | ✓ | - | ✓ | - | - | - | - | - | - | - | Top N, MAP, and MRR |
BLUiR [9] | TF.IDF | ✓ | ✓ | - | - | - | - | - | - | - | - | Top N, MAP, and MRR |
AmaLgam [6] | VSM | ✓ | ✓ | ✓ | - | ✓ | - | - | - | - | - | Top N, MAP, and MRR |
AmaLgam+ [5] | VSM | ✓ | ✓ | ✓ | ✓ | ✓ | - | - | - | - | ✓ | Top N, MAP, and MRR |
[13] | rVSM | ✓ | ✓ | - | - | - | ✓ | ✓ | - | - | - | Top N, MAP, and MRR |
BLIA [14] | rVSM | ✓ | ✓ | ✓ | ✓ | ✓ | - | - | - | - | - | Top N, MAP, and MRR |
[4] | rVSM | ✓ | ✓ | ✓ | ✓ | - | ✓ | - | ✓ | - | - | Top N, MAP, and MRR |
BRTracer [12] | rVSM | ✓ | - | ✓ | ✓ | - | - | - | - | - | - | Top N, MAP, and MRR |
HBL [11] | rVSM | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ | ✓ | - | Top N, MAP, and MRR |
Reference | Model | ML/DL Algorithm | Features and Methods | Evaluation Metrics |
---|---|---|---|---|
Ye et al. [8] | Learning To Rank | Learning To Rank | | Accuracy, MAP, and MRR |
Lam et al. [31] | HyLoc | DNN | | Top-ranked accuracy, MAP, and MRR |
Huo et al. [32] | NP-CNN | CNN-based deep neural network | | AUC, MAP, and Top k rank |
Lam et al. [33] | DNNLOC | DNN | | Top-ranked accuracy, MAP, and MRR |
Xiao et al. [34] | DeepLocator | Enhanced CNN | | F-measure (F), Precision rate (P), Recall rate (R), and MAP |
Yang and Lee [35] | - | CNN-LSTM | | F-measure |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alsaedi, S.; Gad-Elrab, A.A.A.; Noaman, A.; Eassa, F. Two-Level Information-Retrieval-Based Model for Bug Localization Based on Bug Reports. Electronics 2024, 13, 321. https://doi.org/10.3390/electronics13020321