A Deep Ensemble Learning Method for Effort-Aware Just-In-Time Defect Prediction
Abstract
1. Introduction
- We propose a state-of-the-art fusion methodology for effort-aware just-in-time (JIT) defect prediction that combines a deep neural network, a random forest, and an XGBoost classifier.
- The proposed method achieves strong results on a publicly available benchmark dataset.
- For comparison, we also report results for the individual Random Forest and XGBoost classifiers.
- We propose a reinforcement learning strategy that lets the model keep learning as the data grows.
- A detailed comparison shows that the proposed model achieves state-of-the-art results.
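The fusion of the three base classifiers described above can be sketched as weighted soft voting over their predicted defect probabilities. This is an illustrative sketch, not the paper's exact configuration: the function name, the equal default weights, and the 0.5 decision threshold are all assumptions.

```python
# Hypothetical sketch of the fusion step: each base model (DNN, random
# forest, XGBoost) outputs a defect probability per change, and the
# ensemble averages them with tunable weights before thresholding.

def fuse_predictions(probs_by_model, weights=None):
    """Weighted soft voting over per-model defect probabilities."""
    models = list(probs_by_model)
    if weights is None:
        weights = {m: 1.0 for m in models}  # equal weights by default
    total = sum(weights[m] for m in models)
    n = len(next(iter(probs_by_model.values())))
    fused = []
    for i in range(n):
        # Weighted average of the i-th change's defect probability.
        score = sum(weights[m] * probs_by_model[m][i] for m in models) / total
        fused.append(score)
    # Threshold the fused score to obtain a defect / clean label.
    return [1 if p >= 0.5 else 0 for p in fused], fused

labels, scores = fuse_predictions({
    "dnn":     [0.9, 0.2, 0.6],
    "rf":      [0.8, 0.1, 0.4],
    "xgboost": [0.7, 0.3, 0.5],
})
```

Soft voting keeps each model's confidence, so a strongly confident model can outvote two weakly confident ones; hard (majority) voting would discard that information.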
2. Literature Review
3. Proposed Methodology
3.1. Deep Neural Network Inspired by DeepCrowd
3.2. XGBoost Classifier
3.3. Random Forest
3.4. Fusion/Ensemble of Classifiers
3.5. Reinforcement Learning for Hyperparameters Optimization
- Shift from machine-dependent features to more reliable ones, retaining only the features most correlated with the prediction target.
- Conduct extensive alpha and beta testing, fine-tune the models, and select the optimal hyperparameter values.
- Compare the current predictions with the real-time ground-truth values to drive a reward/punishment mechanism (reinforcement learning).
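The reward/punishment idea in the last step above can be sketched as a simple bandit-style loop: candidate hyperparameter settings act as actions, and each is rewarded when its predictions match the later-observed labels. Everything here is an illustrative assumption, including the class and method names, the epsilon-greedy selection, and the ±1 reward scheme.

```python
import random

def update_value(q, reward, lr=0.1):
    # Incremental value update: move the estimate toward the reward.
    return q + lr * (reward - q)

class HyperparamTuner:
    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.q = {c: 0.0 for c in candidates}  # value estimate per setting
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        # Epsilon-greedy: mostly exploit the best setting, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def feedback(self, candidate, predicted, actual):
        # Reward a correct prediction, punish a wrong one.
        reward = 1.0 if predicted == actual else -1.0
        self.q[candidate] = update_value(self.q[candidate], reward)

tuner = HyperparamTuner(["lr=0.01", "lr=0.1"])
tuner.feedback("lr=0.1", predicted=1, actual=1)  # correct prediction rewarded
```

As data grows, the loop keeps accumulating feedback, so the preferred setting can shift over time rather than being fixed once at training.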
3.6. Rainbow Technique
- To utilize a deep Q-network (DQN), which computes the Q-value with the help of a neural network, in order to minimize the objective function. A DQN is trained in the same way as an ordinary deep neural network.
- To handle the overestimation problem, double Q-learning is utilized.
- The prioritized replay algorithm gives the highest sampling priority to the transitions responsible for the largest Q-loss in recent iterations.
- The dueling network architecture, which has two streams, is employed: one stream estimates the state value and the other estimates the action advantage.
- Instead of relying on only the next single step, we utilized multi-step learning, which computes returns over N steps.
- Instead of learning only an average, we utilized distributional RL, which approximates the full distribution of Q-values.
- Finally, we exploited noisy nets, which inject learnable noise into the network layers to improve exploration.
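The double Q-learning component listed above can be sketched in a few lines: the online network picks the best next action, while the target network evaluates it, which curbs the overestimation that plagues plain Q-learning. Plain dicts stand in for the two networks here; all names and numbers are illustrative assumptions.

```python
def double_q_target(reward, next_state, online_q, target_q,
                    gamma=0.99, done=False):
    """Double Q-learning target: select with the online net,
    evaluate with the target net."""
    if done:
        return reward
    # Action selection uses the online network's estimates...
    best_action = max(online_q[next_state], key=online_q[next_state].get)
    # ...but the value of that action comes from the target network.
    return reward + gamma * target_q[next_state][best_action]

online_q = {"s1": {"a": 2.0, "b": 5.0}}   # online net overestimates "b"
target_q = {"s1": {"a": 1.8, "b": 3.0}}   # target net gives a sober estimate
target = double_q_target(reward=1.0, next_state="s1",
                         online_q=online_q, target_q=target_q)
```

With a single network, the same (overestimated) value of "b" would be both selected and used, giving 1.0 + 0.99 × 5.0; decoupling selection from evaluation yields the lower, less biased target 1.0 + 0.99 × 3.0.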
4. Experimental Evaluation
4.1. Dataset
4.2. Results and Discussion
4.3. Discussion
5. Conclusions and Future Work
Funding
Conflicts of Interest
References
Project | Period | No. Changes | Avg LOC/File | Avg LOC/Change | No. Modified Files/Change | No. Changes/Day | Max No. Dev/File | Avg. No. Dev/File
---|---|---|---|---|---|---|---|---
Bugzilla | 08/1998–12/2006 | 4620 | 389.8 | 37.5 | 2.3 | 1.5 | 37 | 8.4 |
Columba | 11/2002–7/2006 | 4455 | 125.0 | 149.4 | 6.2 | 3.3 | 10 | 1.6 |
Eclipse JDT | 5/2001–12/2007 | 35,386 | 260.1 | 71.4 | 4.3 | 14.7 | 19 | 4.0 |
Eclipse Platform | 5/2001–12/2007 | 64,250 | 231.6 | 72.2 | 4.3 | 26.7 | 28 | 2.8 |
Mozilla | 1/2000–12/2006 | 98,275 | 360.2 | 106.5 | 5.3 | 38.9 | 155 | 6.4
PostgreSQL | 7/1996–5/2010 | 20,431 | 563.0 | 101.3 | 4.5 | 4.0 | 20 | 4.0 |
OSS-Median | – | 27,909 | 310.1 | 86.7 | 4.4 | 9.4 | 24 | 4.0 |
C-1 | 10/2000–12/2009 | 4096 | – | 16.4 | 2.0 | 1.2 | - | - |
C-2 | 10/2000–12/2009 | 9277 | – | 19.2 | 2.4 | 2.8 | - | - |
C-3 | 7/2000–12/2009 | 3586 | – | 16.6 | 2.0 | 1.3 | - | - |
C-4 | 12/2003–12/2009 | 5182 | – | 12.9 | 1.8 | 2.4 | - | - |
C-5 | 10/1982–12/1995 | 10,961 | 303.0 | 39.0 | 4.8 | 2.3 | - | - |
Com-Median | – | 5182 | – | 16.6 | 2.0 | 2.3 | - | - |
Model | Acc | P-score | F1-Score | Sensitivity | Specificity | Acc2 | P-score2 | F1-score2 | Sensitivity2 | Specificity2 |
---|---|---|---|---|---|---|---|---|---|---
LR | 0.6550 | 0.6714 | 0.6714 | 0.6368 | 0.6714 | 0.6303 | 0.4237 | 0.4911 | 0.7805 | 0.4237 |
SVM linear | 0.6550 | 0.6551 | 0.6877 | 0.6547 | 0.6551 | 0.5833 | 0.3897 | 0.4768 | 0.7727 | 0.3897 |
SVM RBF | 0.4725 | 0.4000 | 0.0186 | 0.4734 | 0.4000 | 0.6942 | 0.4000 | 0.0029 | 0.6946 | 0.4000 |
Naïve-B | 0.5775 | 0.5597 | 0.6943 | 0.6842 | 0.5597 | 0.6828 | 0.4726 | 0.3885 | 0.7398 | 0.4726 |
J48 | 0.6600 | 0.6813 | 0.6714 | 0.6377 | 0.6813 | 0.6823 | 0.4571 | 0.2893 | 0.7194 | 0.4571 |
RF | 0.7100 | 0.7701 | 0.6979 | 0.6637 | 0.7701 | 0.6754 | 0.4519 | 0.3561 | 0.7308 | 0.4519 |
AdaBoost | 0.6900 | 0.7171 | 0.6633 | 0.6960 | 0.6633 | 0.6808 | 0.4760 | 0.4605 | 0.7628 | 0.4760 |
XGBoost | 0.7000 | 0.7393 | 0.6984 | 0.6650 | 0.7393 | 0.6823 | 0.4571 | 0.2893 | 0.7194 | 0.4571 |
DNN | 0.6275 | 0.6177 | 0.6823 | 0.6453 | 0.6177 | 0.5322 | 0.3289 | 0.4001 | 0.7156 | 0.3289 |
K-Means | 0.4775 | 1.000 | 0.0094 | 0.4761 | 1.0 | 0.6962 | 0.5178 | 0.1463 | 0.7057 | 0.5178 |
Dataset | Accuracy | P Score | F1 Score | Sensitivity | Specificity
---|---|---|---|---|---
Sample Dataset | 0.7739 | 0.7842 | 0.7546 | 0.7798 | 0.7842 |
Different Dataset | 0.8185 | 0.7993 | 0.5860 | 0.8162 | 0.7993 |
© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Albahli, S. A Deep Ensemble Learning Method for Effort-Aware Just-In-Time Defect Prediction. Future Internet 2019, 11, 246. https://doi.org/10.3390/fi11120246