Violence-YOLO: Enhanced GELAN Algorithm for Violence Detection
Abstract
1. Introduction
- The simple, parameter-free attention module (SimAM) is integrated into the neck of GELAN to highlight the regions of the scene that deserve attention.
- GhostNet and RepGhostNet modules are integrated into the YOLOv9 network model to form two new modules, RepNCSPELAN4_GB and RepNCSPELAN4_RGB, which are used in the backbone and neck networks. In addition, the shallow ordinary convolution in the backbone network is replaced with GhostConv, reducing computational complexity.
- DySample, a lightweight universal upsampling operator, replaces the traditional nearest-neighbor interpolation upsampling module, reducing the loss of feature information during upsampling.
- The Focaler-IoU loss is adopted so that training can focus on different regression samples, mitigating the neglect of simple or difficult samples and improving training accuracy.
- Violence-YOLO, a violent behavior detection algorithm based on YOLOv9, is proposed. On our customized dataset, its mean average precision (mAP@0.5) reaches 92.6%, an improvement of 0.9 percentage points over the GELAN-C baseline. Additionally, the computational load and the model size are reduced by 12.3% and 12.4%, respectively.
2. Related Work
2.1. 3D-CNN
2.2. CNN-RNN
2.3. YOLO
3. Materials and Methods
3.1. Overview of Violence-YOLO
3.2. Lightweight Modules
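The contribution list states that the shallow ordinary convolution in the backbone is replaced with GhostConv to reduce computational complexity. The arithmetic behind that saving can be sketched as follows. This is a minimal sketch rather than the authors' implementation: it assumes a Ghost-style layer in which half of the output channels come from an ordinary convolution and the other half from a depthwise convolution with an assumed 5 x 5 kernel. The function names and the channel sizes are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free 2D convolution with a k x k kernel."""
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weight count of a Ghost-style convolution (assumed ratio of 2).

    Half of the output channels are produced by an ordinary convolution.
    The other half are generated from them by a depthwise convolution.
    cheap_k is an assumed kernel size for that depthwise step.
    """
    hidden = c_out // 2
    primary = conv_params(c_in, hidden, k)
    cheap = conv_params(hidden, hidden, cheap_k, groups=hidden)
    return primary + cheap


# Hypothetical shallow layer: 64 -> 128 channels with a 3 x 3 kernel.
standard = conv_params(64, 128, 3)
ghost = ghost_conv_params(64, 128, 3)
print(f"standard: {standard}, ghost: {ghost}, ratio: {ghost / standard:.3f}")
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the ordinary convolution, because the depthwise step adds only a small number of weights per channel.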
3.3. Upsampling Module
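The contribution list contrasts DySample with nearest-neighbor interpolation upsampling. The difference between copying pixels and sampling at shifted positions can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the DySample operator itself: DySample predicts its sampling offsets from the input features with a learned layer, whereas here the offsets are passed in as a hypothetical argument, and a single-channel 2D list stands in for a feature map.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D list at fractional (y, x), clamped to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def nearest_upsample(feat, scale=2):
    """Nearest-neighbor upsampling: every input pixel is copied scale x scale times."""
    h, w = len(feat), len(feat[0])
    return [[feat[i // scale][j // scale] for j in range(w * scale)]
            for i in range(h * scale)]


def point_sample_upsample(feat, offsets=None, scale=2):
    """Upsample by sampling the input at (regular grid position + offset).

    offsets[i][j] is a (dy, dx) pair in input-pixel units. It is a
    hypothetical stand-in for offsets that a learned layer would predict.
    """
    h, w = len(feat), len(feat[0])
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            # Centre of output pixel (i, j) expressed in input coordinates.
            y = (i + 0.5) / scale - 0.5
            x = (j + 0.5) / scale - 0.5
            dy, dx = offsets[i][j] if offsets is not None else (0.0, 0.0)
            row.append(bilinear_sample(feat, y + dy, x + dx))
        out.append(row)
    return out


feat = [[0.0, 1.0], [2.0, 3.0]]
nearest = nearest_upsample(feat)
sampled = point_sample_upsample(feat)
print(nearest)
print(sampled)
```

With zero offsets the second function reduces to plain bilinear resampling. Non-zero offsets let each output pixel draw from a different input location, which is the degree of freedom a learned sampler can exploit.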
3.4. Attention Mechanism
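The contribution list describes SimAM as a simple, parameter-free attention module placed in the neck of GELAN. A minimal sketch of the SimAM weighting for a single channel is given below, assuming the commonly used formulation in which each activation is scaled by a sigmoid of its inverse energy. The function name, the list-of-lists representation, and the value of `e_lambda` are illustrative assumptions, and the sketch is not the authors' code.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel given as a 2D list.

    No learnable parameters are involved: the weight of each activation
    depends only on how far it lies from the channel mean.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * weight)
        out.append(out_row)
    return out


# Hypothetical 3 x 3 channel with one activation that stands out.
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
out = simam_channel(channel)
print(out)
```

In this toy channel the central activation receives a larger weight than the surrounding ones, which is the behavior the module relies on to emphasize distinctive regions.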
3.5. Loss Function
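The contribution list states that the Focaler-IoU loss lets training focus on different regression samples. A minimal sketch of the idea follows, assuming the piecewise-linear remapping of IoU onto an interval [d, u] described for Focaler-IoU. The thresholds `d` and `u` used here are illustrative values rather than the settings used in this work, and the helper names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly remap IoU from [d, u] to [0, 1], clipping values outside.

    IoU below d maps to 0 and IoU above u maps to 1, so the choice of
    [d, u] decides which range of samples contributes gradient.
    """
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    """Loss = 1 - remapped IoU between a predicted and a target box."""
    return 1.0 - focaler_iou(box_iou(pred, target), d, u)


# Hypothetical boxes that overlap in a 1 x 1 square.
pred = (0.0, 0.0, 2.0, 2.0)
target = (1.0, 1.0, 3.0, 3.0)
print(box_iou(pred, target))
print(focaler_iou_loss(pred, target))
```

Raising `d` discards low-IoU (difficult) samples from the regression signal, while lowering `u` saturates high-IoU (simple) samples earlier.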
4. Experimental Design and Analysis of Results
4.1. Data Set
4.1.1. Violence Data Set
4.1.2. Pascal VOC
4.2. Experimental Environment and Parameter Setting
4.3. Evaluation Metrics
4.4. Impact of Lightweight Modules on Algorithm Performance
4.5. Impact of Different Attention Mechanisms on Algorithm Performance
4.6. Ablation Experiments
4.7. Comparative Experiments
4.8. Visualization and Analysis
4.9. Misjudgment and Ethical Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Module | Algorithms | mAP50/% | mAP50-95/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|---|
Baseline | GELAN-C | 91.7 | 74.9 | 25.2 | 101.8 | 47.6 |
GhostConv | One | 92.5 | 75.3 | 25.19 | 100 | 48.5 |
GhostConv | All | 92.3 | 75.2 | 25.19 | 99.8 | 46.7 |
RepNCSPELAN4_GB | BackBone | 91.7 | 74.6 | 23.9 | 96.1 | 49.2 |
RepNCSPELAN4_GB | All | 91.5 | 74.6 | 22.1 | 90.4 | 49.5 |
RepNCSPELAN4_RGB | BackBone | 91.8 | 74.4 | 24 | 96.6 | 47.3 |
RepNCSPELAN4_RGB | All | 91.5 | 73.9 | 22.3 | 91.3 | 44.4 |
RepNCSPELAN4_RGB | BackBone_RRG2 | 91.9 | 74.9 | 25 | 98.7 | 43.3 |
RepNCSPELAN4_RGB | BackBone_RRG3 | 91.5 | 74.1 | 24.6 | 97.0 | 42.9 |
GhostConv+RepNCSPELAN4_GB+RepNCSPELAN4_RGB | Ours | 91.6 | 74.2 | 22.1 | 88.9 | 48.3 |

Models | mAP50/% | mAP50-95/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|
GD-GELAN+SimAM | 91.9 | 74.6 | 22.1 | 88.9 | 48.5 |
GD-GELAN+ECA | 91.5 | 73.9 | 22.1 | 88.9 | 49.7 |
GD-GELAN+SKA | 92.3 | 75.2 | 44.2 | 159.5 | 37.3 |
GD-GELAN+FocalModulation | 91.0 | 73.0 | 23.2 | 92.9 | 46.3 |
GD-GELAN+SE | 91.8 | 75 | 22.1 | 88.9 | 49 |

Baseline | Ghost Lightweight Modules | DySample | SimAM | Focaler-IoU | mAP50/% | mAP50-95/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|---|---|---|---|
GELAN-C | | | | | 91.7 | 74.9 | 25.2 | 101.8 | 47.6 |
GELAN-C | ✓ | | | | 91.6 | 74.2 | 22.1 | 88.9 | 48.0 |
GELAN-C | ✓ | ✓ | | | 91.8 | 74.3 | 22.1 | 88.9 | 48.0 |
GELAN-C | ✓ | ✓ | ✓ | | 91.9 | 74.6 | 22.1 | 88.9 | 48.5 |
GELAN-C | ✓ | ✓ | ✓ | ✓ | 92.6 | 75 | 22.1 | 88.9 | 47.8 |
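
The headline figures in the contribution list can be cross-checked against the first and last rows of the table above. The short calculation below is only an arithmetic sketch on the rounded table values. The dictionary keys and the helper name are hypothetical. The source does not state how its quoted reductions of 12.3% and 12.4% were computed, so small differences may come from unrounded values or from a different measured quantity such as model file size.

```python
def relative_change(baseline: float, value: float) -> float:
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Values copied from the GELAN-C row and the final row of the table.
baseline = {"mAP50": 91.7, "params_M": 25.2, "flops_G": 101.8}
ours = {"mAP50": 92.6, "params_M": 22.1, "flops_G": 88.9}

# mAP is already a percentage, so the gain is reported in percentage points.
map_gain_points = ours["mAP50"] - baseline["mAP50"]
params_change = relative_change(baseline["params_M"], ours["params_M"])
flops_change = relative_change(baseline["flops_G"], ours["flops_G"])

print(f"mAP50 gain: {map_gain_points:.1f} points")
print(f"Parameters: {params_change:.1f}%")
print(f"FLOPs: {flops_change:.1f}%")
```

On the rounded values this gives a gain of 0.9 points in mAP50, with parameters about 12.3% lower and FLOPs about 12.7% lower than the baseline.
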

Models | mAP50/% | mAP50-95/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|
YOLOv8l | 84.7 | 69.3 | 43.6 | 164.8 | 42.6 |
YOLOv8m-World | 83.5 | 66.3 | 29 | 89.9 | 57.8 |
YOLOv5m | 82.4 | 62.8 | 25 | 64 | 73.5 |
YOLOv3-tiny | 77.6 | 53.2 | 12.1 | 18.9 | 192.3 |
RT-DETR-L | 90.4 | 75.4 | 31.9 | 108 | 63.3 |
GELAN-C | 91.7 | 74.9 | 25.2 | 101.8 | 47.6 |
YOLOv9-C | 90.6 | 70.6 | 50.7 | 236.6 | 24.8 |
Ours | 92.6 | 75.0 | 22.1 | 88.9 | 47.8 |

Models | P/% | mAP50/% | mAP50-95/% | Parameters/M | FLOPs/G | FPS |
---|---|---|---|---|---|---|
YOLOv8l | 76.2 | 79.4 | 61.7 | 43.6 | 164.9 | 55.8 |
YOLOv8m-World | 81.8 | 78.4 | 60.2 | 29 | 99.4 | 74.6 |
YOLOv5m | 75.1 | 77.5 | 58.1 | 25.1 | 64 | 95.2 |
GELAN-C | 77.3 | 80.7 | 62.6 | 25.2 | 101.9 | 62.5 |
Ours | 78.3 | 80.4 | 62.6 | 22.2 | 89.0 | 63.7 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, W.; Zhu, D.; Deng, R.; Yung, K.; Ip, A.W.H. Violence-YOLO: Enhanced GELAN Algorithm for Violence Detection. Appl. Sci. 2024, 14, 6712. https://doi.org/10.3390/app14156712