A Study of Classroom Behavior Recognition Incorporating Super-Resolution and Target Detection
Abstract
1. Introduction
2. Related Work
2.1. Human Body Posture Estimation Recognition
2.2. Recognition Based on Human-Object Interactions
- SRGAN (Super-Resolution Generative Adversarial Network) is used to reconstruct a high-resolution version of the original image, which enriches the detail of small targets, strengthens the spatial features of human–object interactions, and improves recognition accuracy;
- The variable kernel convolution AKConv is added to the Backbone module of the target detection algorithm YOLOv8s. AKConv adjusts the initial regular sampling pattern of the network, changing the shape and size of the sampled positions as needed, so that the network can adapt to different datasets and detect more targets;
- The LSKA attention mechanism is integrated into the SPPF of the YOLOv8s Backbone module. It expands the receptive field and captures wider contextual information, which significantly improves the multi-scale feature aggregation capability of the SPPF module and makes the network focus more on target-related features, which in turn improves detection accuracy;
- The CBAM attention mechanism is introduced to process the input features along both the channel and spatial dimensions, helping the model focus on the important features of human–object interactions and suppress irrelevant background information.
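As a sketch of what processing features "through both channel and spatial dimensions" means in the last item, the standard CBAM formulation can be written as follows. The symbols ($F$, $M_c$, $M_s$, $\sigma$, $f^{7\times 7}$) are not taken from this paper; they follow the usual description of the module, and the authors' implementation may differ in detail.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

Here $F \in \mathbb{R}^{C \times H \times W}$ is the input feature map, $\sigma$ is the sigmoid function, and $\otimes$ is element-wise multiplication with broadcasting. The channel map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ pools over the spatial dimensions and passes both results through a shared MLP; the spatial map $M_s \in \mathbb{R}^{1 \times H \times W}$ pools along the channel axis, concatenates the two results, and applies a $7 \times 7$ convolution.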
3. Recognition Network Model Used in This Study
3.1. System Architecture
3.2. Super-Resolution Generative Adversarial Networks
3.3. Improved YOLOv8s Network
3.3.1. YOLOv8s Network
3.3.2. YOLOv8s Network with Variable Kernel Convolution
3.3.3. Spatial Pyramid Pooling with Attention (SPPF_LSKA)
3.3.4. Improved Feature Fusion Section
4. Experimental Results
4.1. Dataset
4.2. Experimental Environment and Configuration
4.3. Indicators for Evaluation
4.4. Experimental Results and Analysis
4.4.1. Variable Kernel Convolution Based Ablation Experiments
4.4.2. Ablation Experiments Based on LSKA Localization
4.4.3. Improved Feature-Fusion-Based Partial Ablation Experiments
4.4.4. Comparative Experiments with Different Models
4.4.5. Results and Analysis of This Experiment
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Name | Parameter |
---|---|
CPU | 11th Gen Intel(R) Core(TM) i5-11400F @ 2.60 GHz 2.59 GHz (Intel, Santa Clara, CA, USA) |
GPU | NVIDIA GeForce RTX 3050 (Nvidia, Santa Clara, CA, USA) |
Memory | 16 GB |
Operating System | Windows 11 |
PyCharm | 2020.1 x64 |
Python | 3.9.16 |
Framework | PyTorch 1.12.1 + cu113 |
CUDA | Version: 12.0 |
AKConv Position | Precision (%) | Recall (%) | F1 (%) | mAP50 (%) | mAP50-90 (%) |
---|---|---|---|---|---|
YOLOv8s (baseline) | 85.65 | 84.78 | 85.35 | 87.74 | 67.55 |
Backbone | 88.78 | 87.42 | 87.50 | 89.86 | 70.56 |
SPPF | 86.11 | 86.31 | 87.14 | 89.52 | 67.67 |
Neck | 85.68 | 85.16 | 87.06 | 89.24 | 69.17 |
Backbone+Neck | 84.45 | 85.21 | 86.12 | 88.12 | 66.78 |
Backbone+SPPF | 85.98 | 84.88 | 86.12 | 88.45 | 68.12 |
LSKA Position | Precision (%) | Recall (%) | F1 (%) | mAP50 (%) | mAP50-90 (%) |
---|---|---|---|---|---|
YOLOv8s (baseline) | 85.65 | 84.78 | 85.35 | 87.74 | 67.55 |
Conv_LSKA | 86.25 | 85.21 | 86.35 | 87.98 | 68.12 |
MaxPool2d_LSKA | 89.46 | 88.65 | 87.43 | 90.15 | 69.97 |
Concat_LSKA | 87.21 | 86.98 | 87.25 | 88.25 | 68.14 |
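The column headers of the ablation tables above are assumed here to follow the standard object-detection definitions; the paper's own formulas are not reproduced in this section. With $TP$, $FP$, and $FN$ the numbers of true positives, false positives, and false negatives, and $N$ the number of behavior classes, these definitions are:

```latex
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP + FP}, &
\mathrm{Recall} &= \frac{TP}{TP + FN}, \\
F1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, &
AP &= \int_0^1 P(R)\, dR, \\
\mathrm{mAP} &= \frac{1}{N} \sum_{i=1}^{N} AP_i.
\end{aligned}
```

Under these definitions, mAP50 counts a detection as correct when its IoU with the ground-truth box is at least 0.5. If the quantities are averaged per class, a reported F1 need not equal the harmonic mean of the reported aggregate precision and recall.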
Attention | Precision (%) | GPU_Mem (G) | Params (M) | FLOPs (B) |
---|---|---|---|---|
YOLOv8s (baseline) | 85.65 | 4.1 | 11.2 | 28.6 |
GAM | 89.45 | 4.9 | 19.8 | 32.8 |
ECA | 91.30 | 5.4 | 25.2 | 56.9 |
SENet | 84.31 | 4.5 | 15.2 | 30.2 |
ShuffleAttention | 88.90 | 4.7 | 17.8 | 34.5 |
CBAM | 90.12 | 4.6 | 17.89 | 31.54 |
Classroom Behavior | Our YOLOv8s | Faster R-CNN | OpenPose | YOLOv5 | YOLOv7 |
---|---|---|---|---|---|
Hand-raising | 0.995 | 0.842 | 0.836 | 0.884 | 0.841 |
Reading | 0.920 | 0.839 | 0.819 | 0.888 | 0.802 |
Writing | 0.925 | 0.814 | 0.789 | 0.892 | 0.782 |
Using phone | 0.962 | 0.959 | 0.958 | 0.961 | 0.966 |
Bowing the head | 0.894 | 0.901 | 0.888 | 0.842 | 0.905 |
Leaning over the table | 0.991 | 0.984 | 0.983 | 0.980 | 0.988 |
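One way to read the comparison table above is to average each model's column and to find the top-scoring model per behavior. The following is a minimal sketch of that arithmetic using only the values in the table. The table does not name its metric, so the numbers are treated simply as per-behavior scores, the mean is unweighted, and the helper names `mean_per_model` and `best_per_behavior` are illustrative, not from the paper.

```python
from statistics import mean

# Per-behavior scores copied from the comparison table above.
# Column order matches the table header.
MODELS = ["Our YOLOv8s", "Faster R-CNN", "OpenPose", "YOLOv5", "YOLOv7"]
SCORES = {
    "Hand-raising": [0.995, 0.842, 0.836, 0.884, 0.841],
    "Reading": [0.920, 0.839, 0.819, 0.888, 0.802],
    "Writing": [0.925, 0.814, 0.789, 0.892, 0.782],
    "Using phone": [0.962, 0.959, 0.958, 0.961, 0.966],
    "Bowing the head": [0.894, 0.901, 0.888, 0.842, 0.905],
    "Leaning over the table": [0.991, 0.984, 0.983, 0.980, 0.988],
}


def mean_per_model(scores, models):
    """Unweighted mean of each model's column over all behaviors."""
    return {
        name: mean(row[i] for row in scores.values())
        for i, name in enumerate(models)
    }


def best_per_behavior(scores, models):
    """Name of the highest-scoring model for each behavior."""
    return {
        behavior: models[max(range(len(models)), key=row.__getitem__)]
        for behavior, row in scores.items()
    }


if __name__ == "__main__":
    for name, value in mean_per_model(SCORES, MODELS).items():
        print(f"{name:<14s} {value:.3f}")
    for behavior, name in best_per_behavior(SCORES, MODELS).items():
        print(f"{behavior:<24s} {name}")
```

Under these assumptions, the improved YOLOv8s has the highest unweighted mean across the six behaviors, while YOLOv7 scores highest on "Using phone" and "Bowing the head".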
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, X.; Nie, J.; Wei, S.; Zhu, G.; Dai, W.; Yang, C. A Study of Classroom Behavior Recognition Incorporating Super-Resolution and Target Detection. Sensors 2024, 24, 5640. https://doi.org/10.3390/s24175640