A Comprehensive Taxonomy for Prediction Models in Software Engineering
Abstract
1. Introduction
- We build a comprehensive taxonomy of prediction models applied to software engineering. The taxonomy covers 136 primary papers from top conference proceedings and journals published in the last decade.
- We group the 136 papers into 11 main research topics. Based on them, we identify several major challenges and promising directions for applying prediction models to software engineering tasks.
2. Basis of Prediction Models
2.1. Overview
- Datasets. Datasets are the input to prediction models. They take various forms (such as code and bug reports), and different software engineering tasks usually have datasets with different properties (such as scale, distribution, and bias). Different prediction models are therefore needed to fit different datasets well.
- Features. A dataset may contain many or few features. Features play a crucial role in building prediction models: a good feature set can yield a model with very good performance, while a weak feature set may lead to a useless one.
- Algorithms. Prediction models differ mainly in their underlying algorithms. Many algorithms are available, and different algorithms may suit different software engineering tasks. Section 2.2 introduces several common algorithms used in prediction models in detail.
- Evaluation Metrics. Once prediction models output their results, we use metrics to evaluate their effectiveness so that we can pick the best model for a specific software engineering task. As with algorithms, many evaluation metrics exist, and different metrics may suit different tasks. Section 2.3 introduces several widely used evaluation metrics in detail.
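The four components above can be illustrated with a minimal sketch. It is not taken from any surveyed paper: the toy dataset, the two features (lines of code and number of past changes), and the helper names `knn_predict` and `f1_score` are all hypothetical, and k-nearest neighbors stands in for whichever algorithm a given task would actually use.

```python
from collections import Counter
from math import dist

# Dataset (hypothetical): each item is (features, label).
# Features (assumed for illustration): (lines_of_code, past_changes).
# Label: 1 if the module was defective, 0 otherwise.
TRAIN = [
    ((120.0, 9.0), 1),
    ((150.0, 12.0), 1),
    ((140.0, 10.0), 1),
    ((30.0, 1.0), 0),
    ((45.0, 2.0), 0),
    ((25.0, 0.0), 0),
]
TEST = [
    ((130.0, 11.0), 1),
    ((35.0, 1.0), 0),
    ((40.0, 3.0), 0),
    ((60.0, 8.0), 1),
]


def knn_predict(train, x, k=3):
    """Algorithm: majority vote among the k training points closest to x."""
    neighbors = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred):
    """Evaluation metric: harmonic mean of precision and recall for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


y_true = [label for _, label in TEST]
predictions = [knn_predict(TRAIN, features) for features, _ in TEST]
score = f1_score(y_true, predictions)

print("predictions:", predictions)
print("F1-score:", round(score, 3))
```

In this toy run, one defective test module is small enough to sit closer to the non-defective training modules, so it is missed. This shows how the choice of features limits what any algorithm can learn.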
2.2. Common Algorithms
2.2.1. Naive Bayes
2.2.2. Random Forest
2.2.3. Logistic Regression
2.2.4. Support Vector Machine
2.2.5. K-Nearest Neighbors
2.3. Evaluation Metrics
2.3.1. F1-Score
2.3.2. AUC
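As a rough illustration of this metric, AUC (area under the ROC curve) can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pair counting. The function name `auc`, the labels, and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than efficiency.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: a higher score means "more likely positive".
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc_value = auc(labels, scores)
print("AUC:", round(auc_value, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. Unlike F1-score, this computation does not depend on a classification threshold.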
3. Research Methods
3.1. Paper Sources and Search Strategy
3.2. Statistics of Selected Papers
4. Coding Aid
4.1. Code Development
4.2. Code Review
4.3. Code Evaluation
4.4. Code APIs
5. Defect Prediction
5.1. Framework
5.2. Datasets
5.3. Features
5.4. Algorithms
5.5. Cross-Project Defect Prediction
6. Software Management
6.1. Software Requirement Engineering
6.2. Software Development Process
6.3. Software Cycle Time
6.4. Software Evolution
7. Software Quality
7.1. Software Reliability Prediction
7.2. Software Vulnerability Prediction
7.3. Malware Detection
8. Software Performance Prediction
8.1. White-Box Models
8.2. Black-Box Models
8.3. Performance-Related Analysis
9. Effort Estimation
9.1. Software Effort Estimation
9.2. Web Effort Estimation
10. Software Testing
10.1. Test Case Quality
10.2. Test Application
11. Program Analysis
11.1. Dynamic Analysis
11.2. Static Analysis
12. Traceability
12.1. Software Traceability
12.2. Traceability Quality
13. Bug Report Management
13.1. Bug Report Quality
13.2. Bug Report Assignment and Categorization
13.3. Bug Fix Related Task
14. Developers and Users
14.1. Developer Related Task
14.2. User Related Task
15. Discussion
15.1. RQ1: What Features Are Appropriate to Build Prediction Models in Software Engineering Research?
15.2. RQ2: What Datasets Are Appropriate to Build Prediction Models in Software Engineering Research?
15.3. RQ3: What Prediction Models Are Appropriate in Different Software Engineering Research?
15.4. Threats to Validity
16. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
- Runeson, P.; Alexandersson, M.; Nyholm, O. Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07), Minneapolis, MN, USA, 20–26 May 2007; pp. 499–510. [Google Scholar]
- Wang, X.; Zhang, L.; Xie, T.; Anvik, J.; Sun, J. An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th International Conference on Software Engineering, Leipzig, Germany, 10–18 May 2008; pp. 461–470. [Google Scholar]
- Sun, C.; Lo, D.; Wang, X.; Jiang, J.; Khoo, S.C. A discriminative model approach for accurate duplicate bug report retrieval. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1, Cape Town, South Africa, 1–8 May 2010; pp. 45–54. [Google Scholar]
- Sun, C.; Lo, D.; Khoo, S.C.; Jiang, J. Towards more accurate retrieval of duplicate bug reports. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, Lawrence, KS, USA, 6–10 November 2011; pp. 253–262. [Google Scholar]
- Liu, K.; Tan, H.B.K.; Chandramohan, M. Has this bug been reported? In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software E ngineering, Washington, DC, USA, 11–16 November 2012; p. 28. [Google Scholar]
- Lo, D.; Jiang, L.; Budi, A. Active refinement of clone anomaly reports. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 397–407. [Google Scholar]
- Anvik, J.; Hiew, L.; Murphy, G.C. Who should fix this bug? In Proceedings of the 28th International Conference on Software Engineering, Shanghai, China, 20–28 May 2006; pp. 361–370. [Google Scholar]
- Jeong, G.; Kim, S.; Zimmermann, T. Improving bug triage with bug tossing graphs. In Proceedings of the the 7TH joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, Leuven, Belgium, 23–28 August 2009; pp. 111–120. [Google Scholar]
- Zimmermann, T.; Nagappan, N.; Guo, P.J.; Murphy, B. Characterizing and predicting which bugs get reopened. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 1074–1083. [Google Scholar]
- Xia, X.; Lo, D.; Shihab, E.; Wang, X.; Zhou, B. Automatic, high accuracy prediction of reopened bugs. Autom. Softw. Eng. 2015, 22, 75–109. [Google Scholar] [CrossRef]
- Xuan, J.; Jiang, H.; Ren, Z.; Zou, W. Developer prioritization in bug repositories. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 25–35. [Google Scholar]
- Kim, D.; Tao, Y.; Kim, S.; Zeller, A. Where should we fix this bug? a two-phase recommendation model. IEEE Trans. Softw. Eng. 2013, 39, 1597–1610. [Google Scholar]
- Zhang, H.; Gong, L.; Versteeg, S. Predicting bug-fixing time: An empirical study of commercial software projects. In Proceedings of the 2013 International Conference on Software Engineering, San Francisco, CA, USA, 18–26 May 2013; pp. 1042–1051. [Google Scholar]
- Guo, P.J.; Zimmermann, T.; Nagappan, N.; Murphy, B. Characterizing and predicting which bugs get fixed: An empirical study of Microsoft Windows. In Proceedings of the 2010 ACM/IEEE 32nd International Conference on Software Engineering, Cape Town, South Africa, 1–8 May 2010; Volume 1, pp. 495–504. [Google Scholar]
- Meneely, A.; Williams, L.; Snipes, W.; Osborne, J. Predicting failures with developer networks and social network analysis. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Lausanne, Switzerland, 9–14 November 2008; pp. 13–23. [Google Scholar]
- Meneely, A.; Rotella, P.; Williams, L. Does adding manpower also affect quality?: An empirical, longitudinal analysis. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, Lyngby, Denmark, 5–9 September 2011; pp. 81–90. [Google Scholar]
- Canfora, G.; Di Penta, M.; Oliveto, R.; Panichella, S. Who is going to mentor newcomers in open source projects? In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, Washington, DC, USA, 11–16 November 2012; p. 44. [Google Scholar]
- Fritz, T.; Begel, A.; Müller, S.C.; Yigit-Elliott, S.; Züger, M. Using psycho-physiological measures to assess task difficulty in software development. In Proceedings of the 36th International Conference on Software Engineering, Hyderabad, India, 31 May–7 June 2014; pp. 402–413. [Google Scholar]
- Müller, S.C.; Fritz, T. Stuck and frustrated or in flow and happy: Sensing developers’ emotions and progress. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, Firenze/Florence, Italy, 23–30 August 2015; pp. 688–699. [Google Scholar]
- Müller, S.C.; Fritz, T. Using (bio) metrics to predict code quality online. In Proceedings of the 38th International Conference on Software Engineering, Austin, Texas, USA, 14–22 May 2016; pp. 452–463. [Google Scholar]
- Bacchelli, A.; Dal Sasso, T.; D’Ambros, M.; Lanza, M. Content classification of development emails. In Proceedings of the 34th International Conference on Software Engineering, Zurich, Switzerland, 2–9 June 2012; pp. 375–385. [Google Scholar]
- Di Sorbo, A.; Panichella, S.; Visaggio, C.A.; Di Penta, M.; Canfora, G.; Gall, H.C. Development emails content analyzer: Intention mining in developer discussions (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA, 9–13 November 2015; pp. 12–23. [Google Scholar]
- Zhou, M.; Mockus, A. Who will stay in the floss community? Modeling participant’s initial behavior. IEEE Trans. Softw. Eng. 2015, 41, 82–99. [Google Scholar] [CrossRef]
- Murukannaiah, P.K.; Singh, M.P. Platys: An active learning framework for place-aware application development and its evaluation. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2015, 24, 19. [Google Scholar] [CrossRef]
Actual Class | Predicted Positive | Predicted Negative |
---|---|---|
Truly Positive | TP | FN |
Truly Negative | FP | TN |
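The four cells of the confusion matrix above are the inputs to most classification metrics. The following is a minimal sketch, assuming the standard textbook definitions of precision, recall, F1-score and accuracy; the function name `classification_metrics` and the example counts are hypothetical and are not taken from any of the surveyed papers.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Derive common metrics from the four confusion-matrix cells.

    tp: truly positive, predicted positive
    fn: truly positive, predicted negative
    fp: truly negative, predicted positive
    tn: truly negative, predicted negative
    """
    total = tp + fn + fp + tn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall are reported separately here because on imbalanced data, such as defect datasets where most modules are clean, accuracy alone can look high even when few positive instances are found.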
Target-Based | Technique-Based |
---|---|
predict | support vector machine |
classif | decision tree |
identif | Bayes |
detect | machine learning |
| | regression |
| | random forest |
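The keywords in the table above are mostly stems (for example, "classif" covers both "classify" and "classification"), so a case-insensitive substring match is one plausible way to apply them. The sketch below assumes that a paper becomes a candidate when its title contains at least one keyword from either column; that combination rule, the function names and the choice of sample titles are assumptions for illustration, not the authors' documented procedure.

```python
from typing import List

# Keyword stems copied from the table above, lower-cased for matching.
TARGET_KEYWORDS = ["predict", "classif", "identif", "detect"]
TECHNIQUE_KEYWORDS = [
    "support vector machine",
    "decision tree",
    "bayes",
    "machine learning",
    "regression",
    "random forest",
]


def matched_keywords(title: str) -> List[str]:
    """Return every keyword stem that occurs in the title (case-insensitive)."""
    lowered = title.lower()
    return [kw for kw in TARGET_KEYWORDS + TECHNIQUE_KEYWORDS if kw in lowered]


def is_candidate(title: str) -> bool:
    """Assumed rule: a title is a candidate if any keyword matches."""
    return bool(matched_keywords(title))


# Sample titles used purely as illustration.
titles = [
    "Predicting bug-fixing time: An empirical study of commercial software projects",
    "Content classification of development emails",
    "Who should fix this bug?",
]
for title in titles:
    print(is_candidate(title), matched_keywords(title), title)
```

The third title shows the limit of keyword filtering: a paper can use a prediction model without any of the stems appearing in its title, so a manual pass over abstracts would still be needed under this assumed rule.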
Venue | Number of Papers |
---|---|
ICSE | 49 |
FSE | 29 |
ASE | 24 |
TSE | 29 |
TOSEM | 5 |
Author | Number of Papers |
---|---|
Sunghun Kim | 12 |
Ahmed E Hassan | 8 |
Thomas Zimmermann | 6 |
Premkumar Devanbu | 6 |
Hongyu Zhang | 5 |
Tim Menzies | 5 |
David Lo | 4 |
Foyzur Rahman | 4 |
Jaechang Nam | 4 |
Norbert Siegmund | 4 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, X.; Liu, J.; Zhang, D. A Comprehensive Taxonomy for Prediction Models in Software Engineering. Information 2023, 14, 111. https://doi.org/10.3390/info14020111