Accurately Estimate and Analyze Human Postures in Classroom Environments
Abstract
1. Introduction
- We propose a feature-enhancement method and integrate it with ECANet to improve the efficiency of feature extraction. By strengthening the module's local cross-channel interaction, the resulting ECAv2Net achieves a significant performance gain while using only a minimal number of parameters (a sketch of the underlying ECA mechanism follows this list).
- By embedding the ECAv2Net module into the HRNet backbone, we introduce the ECAv2-HRNet model for human posture estimation. The model converges faster without adding parameters and achieves higher accuracy than ECA-HRNet.
- To estimate the postures of large groups of learners in a smart classroom, we built the GUET CLASS PICTURE dataset. It contains nearly 10,000 annotated human postures and focuses on whether a learner is head-up or head-down, as well as on turning and slouching movements. Each person is annotated with 11 keypoints: the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, and left and right wrists.
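To make the attention mechanism behind the first contribution concrete, the following is a minimal PyTorch sketch of a standard ECANet channel-attention block, assuming PyTorch as the framework. The specific modification that turns it into ECAv2Net (the strengthened local cross-channel interaction) is not spelled out in this outline, so only the baseline ECA mechanism is shown; `ECALayer` and the example tensor shape are illustrative choices, not the authors' code.

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Sketch of a standard ECANet channel-attention block.

    ECAv2Net strengthens the local cross-channel interaction of this module;
    that modification is not reproduced here.
    """

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive 1D kernel size: wider channel dimensions get a wider
        # local cross-channel interaction window (always odd).
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.avg_pool(x)                   # (B, C, 1, 1): per-channel descriptor
        y = self.conv(y.view(b, 1, c))         # 1D conv across channels: local interaction
        y = self.sigmoid(y).view(b, c, 1, 1)   # channel attention weights in (0, 1)
        return x * y                           # reweight the input feature maps


if __name__ == "__main__":
    feats = torch.randn(2, 256, 48, 64)        # e.g. one backbone branch output (illustrative shape)
    print(ECALayer(256)(feats).shape)          # torch.Size([2, 256, 48, 64])
```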
2. Related Work
3. ECAv2-HRNet Architecture
3.1. The ECANet Attention Mechanism
3.2. ECAv2Net Attention Mechanism
3.3. Feature Representation for Varying Depths of Convolutional Layers
3.4. ECAv2Net Embedding in HRNet
4. Experimental Datasets
4.1. COCO Dataset
4.2. GUET CLASS PICTURE Dataset
5. Experiments and Results
5.1. Validation on COCO Dataset
5.1.1. Experimental Environment and Results on COCO Dataset
5.1.2. Ablation Experiment on COCO Dataset
5.2. Validation on GUET CLASS PICTURE Dataset
5.2.1. Experimental Environment and Results
5.2.2. Ablation Experiment on GUET CLASS PICTURE Dataset
5.3. Modeling Applications and Data Analysis
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
Method | Params (M) | GFLOPs | mAP/% | AP50/% | AP75/% | APM/% | APL/% | mAR/% |
---|---|---|---|---|---|---|---|---|
ResNet50 | 34 | 4.04 | 72.23 | 92.44 | 80.37 | 69.28 | 76.69 | 75.4 |
ResNet101 | 53 | 7.69 | 72.96 | 92.48 | 81.31 | 70.17 | 77.1 | 76.1 |
ResNet152 | 67.51 | 10.49 | 72.21 | 92.49 | 80.31 | 69.24 | 76.63 | 75.35 |
HRNet | 28.54 | 7.69 | 73.43 | 92.27 | 80.9 | 71.25 | 77.04 | 77.01 |
ECA-HRNet | 28.54 | 7.69 | 74.37 | 92.17 | 81.89 | 72.29 | 78.02 | 77.93 |
ECAv2-HRNet | 28.54 | 7.69 | 75.7 | 93.43 | 83.35 | 73.27 | 79.47 | 78.56 |
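For readers interpreting the columns above: the values follow the standard COCO keypoint-evaluation protocol based on Object Keypoint Similarity (OKS). mAP averages AP over OKS thresholds from 0.50 to 0.95 in steps of 0.05, AP50 and AP75 use single thresholds of 0.50 and 0.75, APM and APL restrict evaluation to medium and large person instances, and mAR is the corresponding mean recall. OKS itself is defined as

$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-d_i^{2} / (2 s^{2} k_i^{2})\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)},$$

where $d_i$ is the distance between the predicted and ground-truth keypoint $i$, $s$ is the object scale, $k_i$ is a per-keypoint falloff constant, and $v_i$ is the visibility flag.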
Method | mAP/% | mAR/% |
---|---|---|
ResNet50 | 71.54 | 77.28 |
ResNet101 | 72.5 | 78.47 |
ResNet152 | 70.2 | 75.5 |
HRNet | 72.8 | 78.3 |
ECA-HRNet | 73.4 | 78.9 |
ECAv2-HRNet | 74.42 | 79.21 |
Stages with ECAv2Net | mAP/% | AP50/% | AP75/% | APM/% | APL/% | mAR/% |
---|---|---|---|---|---|---|
1, 2 | 73.93 | 92.54 | 81.49 | 71.25 | 78.01 | 76.85 |
1, 2, 3 | 74.02 | 92.54 | 81.45 | 71.47 | 78.2 | 77.1 |
1, 2, 3, 4 | 73.74 | 92.54 | 81.32 | 70.79 | 78.39 | 76.77 |
2, 3, 4 | 74.25 | 92.53 | 81.6 | 71.75 | 78.45 | 77.17 |
4 | 74.28 | 92.55 | 81.61 | 71.58 | 78.5 | 77.23 |
3, 4 | 75.7 | 93.43 | 83.35 | 73.27 | 79.47 | 78.56 |
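The ablation above varies which HRNet stages receive the ECAv2Net module, and inserting it only in stages 3 and 4 performs best. Below is a minimal, hypothetical sketch of how such stage-wise insertion could be configured; `ATTENTION_STAGES`, `maybe_wrap_with_attention`, and the `eca_sketch` import (referring to the ECA-style block sketched earlier) are illustrative assumptions, not the authors' implementation.

```python
import torch.nn as nn

from eca_sketch import ECALayer  # hypothetical module holding the ECA-style sketch shown earlier

# Stage indices that receive channel attention; {3, 4} mirrors the best-performing row ("3, 4").
ATTENTION_STAGES = {3, 4}

def maybe_wrap_with_attention(stage_idx: int, branch_channels: int) -> nn.Module:
    """Return an attention block for the selected stages, identity otherwise."""
    if stage_idx in ATTENTION_STAGES:
        return ECALayer(branch_channels)
    return nn.Identity()
```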
Method | Params (M) | GFLOPs | mAP/% | AP50/% | AP75/% | APM/% | APL/% | mAR/% |
---|---|---|---|---|---|---|---|---|
ResNet50 | 34 | 4.04 | 95.78 | 98.95 | 98.85 | 95.29 | 96.1 | 97.48 |
ResNet101 | 53 | 7.69 | 96.13 | 98.84 | 98.83 | 95.35 | 96.47 | 97.80 |
SE-ResNet50 | 36.53 | 4.05 | 96.35 | 98.93 | 98.92 | 95.88 | 96.17 | 98.07 |
SE-ResNet101 | 57.77 | 7.7 | 95.5 | 98.94 | 98.94 | 94.72 | 95.82 | 97.36 |
S2E-ResNet50 | 36.53 | 4.05 | 95.57 | 98.94 | 98.93 | 95.14 | 95.88 | 97.36 |
S2E-ResNet101 | 57.77 | 7.7 | 96.04 | 98.95 | 98.94 | 94.67 | 96.46 | 97.52 |
ECA-ResNet50 | 34 | 4.04 | 96.35 | 98.94 | 98.93 | 96.03 | 96.66 | 98.04 |
ECA-ResNet101 | 52.99 | 7.69 | 96.27 | 98.94 | 98.92 | 95.47 | 96.66 | 97.89 |
CBAM-ResNet50 | 34.52 | 4.05 | 96.18 | 98.94 | 98.94 | 95.45 | 96.62 | 97.90 |
CBAM-ResNet101 | 53.52 | 7.7 | 96.14 | 98.93 | 98.93 | 95.79 | 96.35 | 97.87 |
Omnipose | 30.56 | 7.91 | 96.52 | 98.97 | 98.96 | 96.15 | 96.7 | 98.06 |
UHRNet | 28.57 | 7.69 | 91.44 | 98.98 | 97.65 | 90.96 | 91.95 | 94.16 |
HRNet | 28.54 | 7.69 | 94.89 | 98.95 | 98.92 | 94.14 | 95.39 | 96.91 |
S2E-HRNet | 28.76 | 7.69 | 95.66 | 98.93 | 98.93 | 95.39 | 95.98 | 97.46 |
SE-HRNet | 28.76 | 7.69 | 93.89 | 98.95 | 98.89 | 92.63 | 94.64 | 96.08 |
ECA-HRNet | 28.54 | 7.69 | 95.62 | 98.95 | 98.93 | 94.87 | 95.97 | 97.42 |
ECAv2-HRNet | 28.54 | 7.69 | 96.71 | 98.93 | 98.92 | 96.44 | 96.87 | 98.27 |