Fine-Tuned RoBERTa Model for Bug Detection in Mobile Games: A Comprehensive Approach
Abstract
1. Introduction
- Dataset development: We develop a dataset with binary and multi-class labels for bug detection in user reviews, establish guidelines for dataset annotation, and evaluate existing datasets to suggest enhancements.
- Understanding of bug detection in user reviews: We examine the linguistic nuances of user reviews and model bug reports as distinct categories to inform the development of our dataset and classification tasks.
- Text classification: We explore bug detection as a binary and a multi-class text classification (TC) task, which is a relatively new approach. The binary task decides whether a review reports a bug. If it does, the multi-class task assigns the bug to a specific category, such as network, graphical, or performance issues.
- Benchmarking: We conduct various experiments on learning approaches, offering a benchmark for future research on bug detection tasks.
- Performance improvements: The proposed model (RoBERTa) achieved a 96% cross-validation score in binary classification and a 92% cross-validation score in multi-class classification. These correspond to relative improvements of 5.49% and 8.24%, respectively, over the traditional machine learning baseline (LR: 91% in binary and 85% in multi-class classification).
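The improvement figures in the last contribution are consistent with gains measured relative to the LR baseline rather than with percentage-point differences. A minimal arithmetic check follows; the function name `relative_improvement` is an illustrative assumption, not something defined in the paper.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the gain of new_score over baseline_score as a percentage of the baseline."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Cross-validation scores reported in the contribution list (in percent):
# RoBERTa versus the LR baseline.
binary_gain = relative_improvement(96.0, 91.0)
multiclass_gain = relative_improvement(92.0, 85.0)

print(f"binary: {binary_gain:.2f}%")           # binary: 5.49%
print(f"multi-class: {multiclass_gain:.2f}%")  # multi-class: 8.24%
```

In percentage points, the same gaps are 5 and 7, so the reported 5.49% and 8.24% only match when the difference is divided by the baseline score.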
2. Literature Review
3. Materials and Methods
- Data Collection: The first stage involved collecting 10,000 user reviews from the Google Play Store and the App Store.
- Data Pre-processing: The second stage entailed pre-processing the data to remove noise from the dataset.
- Data Labeling: The third stage comprised labeling the data into binary and multi-class categories.
- Application of Models: The fourth stage entailed applying machine learning, deep learning, and transfer learning models (such as SVM, ANN, LSTM, and pre-trained BERT models) to predict the binary and multi-class categories.
- Model Evaluation: In the fifth stage, the predictive models were evaluated using four metrics: accuracy, precision, recall, and macro F1-score. These metrics indicate how effectively each model predicts the correct class.
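The two labeling levels used in these stages can be read as a cascade: a binary decision first, followed by a bug-type decision only for reviews flagged as bug reports. The sketch below illustrates that control flow only. The keyword lists and the functions `toy_is_bug`, `toy_bug_type`, and `classify_review` are hypothetical stand-ins for trained models and are not taken from the paper.

```python
from typing import Callable, Optional

# Hypothetical keyword lists standing in for trained classifiers (assumption).
BUG_KEYWORDS = {
    "network": ("disconnect", "server", "connection"),
    "graphical": ("glitch", "texture", "flicker"),
    "performance": ("lag", "frame rate", "crash"),
}


def toy_is_bug(review: str) -> bool:
    """Stage 1 stand-in: flag a review as a bug report if any keyword appears."""
    text = review.lower()
    return any(word in text for words in BUG_KEYWORDS.values() for word in words)


def toy_bug_type(review: str) -> str:
    """Stage 2 stand-in: return the category with the most keyword hits."""
    text = review.lower()
    hits = {
        label: sum(word in text for word in words)
        for label, words in BUG_KEYWORDS.items()
    }
    return max(hits, key=hits.get)


def classify_review(
    review: str,
    is_bug: Callable[[str], bool] = toy_is_bug,
    bug_type: Callable[[str], str] = toy_bug_type,
) -> Optional[str]:
    """Run the binary stage first; only bug reports reach the multi-class stage."""
    if not is_bug(review):
        return None
    return bug_type(review)


print(classify_review("Frequent lag and frame rate drops during intense moments"))
print(classify_review("Love the new levels, great soundtrack"))
```

Because both stages are passed in as callables, the keyword stand-ins could be swapped for any fitted model that exposes the same interface.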
3.1. Construction of Dataset
3.2. Data Pre-Processing
3.3. Data Labeling
3.4. Application of Models: Training and Testing Phase
3.5. Model Evaluation Phase
- TP: True Positive.
- FP: False Positive.
- FN: False Negative.
- Cross-validation estimate: Average cross-validation error over the k folds.
- Fold score: Score computed on the i-th validation set.
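The quantities listed above combine into the standard definitions of precision, recall, F1, macro F1, and a k-fold average. The sketch below uses those textbook formulas; the function names and the per-class counts are illustrative assumptions, not values from the paper.

```python
from statistics import mean


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_f1(counts_per_class: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return mean(f1_score(*counts) for counts in counts_per_class.values())


def cross_validation_score(fold_scores: list) -> float:
    """Average of the scores computed on each of the k validation folds."""
    return mean(fold_scores)


# Hypothetical (TP, FP, FN) counts per bug category, for illustration only.
counts = {"network": (8, 2, 2), "graphical": (6, 2, 4), "performance": (9, 1, 3)}
print(f"macro F1: {macro_f1(counts):.4f}")
print(f"3-fold average: {cross_validation_score([0.95, 0.96, 0.97]):.4f}")
```

Macro averaging gives every class the same weight regardless of its frequency, which matters when bug categories are imbalanced.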
4. Experimental Results
4.1. Results for Machine Learning
4.2. Results for Deep Learning
4.3. Transformer Results
4.4. Error Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Original game player review: |
Fun game, but frequent lag and frame rate drops make it hard to enjoy, especially during intense moments.! |
Game player review after removing unnecessary characters and converting the text to lowercase: |
fun game but frequent lag and frame rate drops make it hard to enjoy especially during intense moments |
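The cleaning step shown in this example can be approximated with a lowercase conversion followed by removal of punctuation. The sketch below is one way to reproduce the cleaned review; the function name `clean_review` and the choice to keep only ASCII letters, digits, and spaces are assumptions, since the exact rules used by the authors are not stated here.

```python
import re


def clean_review(text: str) -> str:
    """Lowercase the review, drop punctuation and symbols, and collapse whitespace."""
    lowered = text.lower()
    # Assumption: anything other than a-z, 0-9, and whitespace counts as noise.
    no_symbols = re.sub(r"[^a-z0-9\s]", "", lowered)
    return " ".join(no_symbols.split())


raw = (
    "Fun game, but frequent lag and frame rate drops make it hard to enjoy, "
    "especially during intense moments.!"
)
print(clean_review(raw))
```

Applied to the original review above, this yields the same cleaned text as the second row of the example.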
Hyperparameter | Grid Search Values |
---|---|
Learning Rate | , , , , |
Epoch | 3, 9, 20, 25 |
Batch Size | 8, 32, 64, 128 |
Weight Decay | 0.01–0.1 |
Hidden Dropout | 0.02, 0.1 |
Warm-up Steps | 0.03–0.1 |
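A grid search evaluates every combination of the candidate values in the table. The sketch below only enumerates such combinations. It makes two assumptions: the learning rate is left out because its candidate values are not legible in the table, and the ranges for weight decay and warm-up are represented by their two endpoints. The names `search_space` and `iter_grid` are illustrative.

```python
from itertools import product

search_space = {
    "epochs": [3, 9, 20, 25],
    "batch_size": [8, 32, 64, 128],
    "hidden_dropout": [0.02, 0.1],
    "weight_decay": [0.01, 0.1],  # assumption: endpoints of the 0.01–0.1 range
    "warmup": [0.03, 0.1],        # assumption: endpoints of the 0.03–0.1 range
}


def iter_grid(space: dict):
    """Yield one configuration dict per combination of candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_grid(search_space))
print(len(configs))  # 4 * 4 * 2 * 2 * 2 = 128 combinations
print(configs[0])
```

Each configuration would then be used to train and validate one model, and the best-scoring configuration kept. The count grows multiplicatively with every added hyperparameter, so including several learning rates would multiply the 128 runs accordingly.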
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Usman, M.; Ahmad, M.; Ullah, F.; Muzamil, M.; Hamza, A.; Jalal, M.; Gelbukh, A. Fine-Tuned RoBERTa Model for Bug Detection in Mobile Games: A Comprehensive Approach. Computers 2025, 14, 113. https://doi.org/10.3390/computers14040113