GDE-Pose: A Real-Time Adaptive Compression and Multi-Scale Dynamic Feature Fusion Approach for Pose Estimation
Abstract
1. Introduction
2. Related Work
3. Method
3.1. C3k2_Ghost
3.2. C3k2_DFFM
3.3. ECA_Head
4. Experiment
4.1. Datasets and Evaluation Metrics
4.2. Experimental Setup
4.3. Ablation and Comparative Experiments
5. Results
6. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Murphy-Chutorian, E.; Trivedi, M.M. Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 607–626.
- Stenum, J.; Cherry-Allen, K.M.; Pyles, C.O.; Reetzke, R.D.; Vignos, M.F.; Roemmich, R.T. Applications of pose estimation in human health and performance across the lifespan. Sensors 2021, 21, 7315.
- Li, Z.; Xue, M.; Cui, Y.; Liu, B.; Fu, R.; Chen, H.; Ju, F. Lightweight 2D Human Pose Estimation Based on Joint Channel Coordinate Attention Mechanism. Electronics 2023, 13, 143.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
- Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275.
- Yang, W.; Jiang, M.; Fang, X.; Shi, X.; Guo, Y.; Al-qaness, M.A. A high-precision and efficient method for badminton action detection in sports using You Only Look Once with Hourglass Network. Eng. Appl. Artif. Intell. 2024, 137, 109177.
- Sinha, D.; El-Sharkawy, M. Thin MobileNet: An enhanced MobileNet architecture. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 10–12 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 0280–0285.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 2022, 71, 3507014.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 20 October 2024).
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Caesar, H.; Uijlings, J.; Ferrari, V. COCO-Stuff: Thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1209–1218.
- Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190.
- Research Team. YOLO-NAS by Deci Achieves State-of-the-Art Performance on Object Detection Using Neural Architecture Search. 2023. Available online: https://deci.ai/blog/yolo-nas-object-detection-foundation-model/ (accessed on 12 May 2023).
- Martínez, G.H. OpenPose: Whole-Body Pose Estimation. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2019.
- He, R.; Wang, X.; Chen, H.; Liu, C. VHR-BirdPose: Vision Transformer-Based HRNet for Bird Pose Estimation with Attention Mechanism. Electronics 2023, 12, 3643.
- Bao, W.; Ma, Z.; Liang, D.; Yang, X.; Niu, T. Pose ResNet: 3D human pose estimation based on self-supervision. Sensors 2023, 23, 3057.
- Fang, H.S.; Li, J.; Tang, H.; Xu, C.; Zhu, H.; Xiu, Y.; Li, Y.L.; Lu, C. AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 7157–7173.
| Experiment No. | Replacement Module | AP (%) | mAP (%) | AR (%) | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|
| 1 | None | 74.5 | 70.3 | 72.0 | 8.2 | 16.5 | 30 |
| 2 | Post Initial Conv Layer | 73.6 | 69.3 | 71.6 | 6.4 | 13.9 | 32 |
| 3 | Main Feature Extraction Stage | 73.8 | 69.5 | 71.8 | 6.3 | 13.8 | 31 |
| 4 | Downsampling Module | 73.7 | 69.4 | 71.7 | 6.5 | 13.8 | 32 |
| 5 | Final Aggregation Layer | 73.8 | 69.5 | 71.8 | 6.6 | 14.0 | 31 |
| 6 | Post Initial Conv Layer + Main Feature Extraction Stage | 73.5 | 69.1 | 71.5 | 6.2 | 13.6 | 34 |
| 7 | Main Feature Extraction Stage + Downsampling Module | 73.6 | 69.3 | 71.6 | 6.3 | 13.7 | 33 |
| 8 | Post Initial Conv Layer + Main Feature Extraction + Downsampling | 73.4 | 69.0 | 71.4 | 6.1 | 13.5 | 35 |
| 9 | All Modules (Complete Replacement) | 74.1 | 69.7 | 71.8 | 6.5 | 14.0 | 33 |
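The placement ablation above swaps standard convolution blocks for Ghost-based blocks at different backbone positions. For readers unfamiliar with the Ghost design, the sketch below is a minimal Ghost module in the spirit of GhostNet [4]: a thin primary convolution produces the intrinsic features, and cheap depthwise operations synthesize the remaining "ghost" channels. This is an illustrative PyTorch reconstruction of the published GhostNet building block, not the paper's exact C3k2_Ghost code; names such as GhostModule, ratio, and dw_size are our own.

```python
import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    """Minimal Ghost module sketch after GhostNet [4] (illustrative, not the paper's code)."""

    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2, dw_size: int = 3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)      # intrinsic channels from the primary conv
        cheap_ch = init_ch * (ratio - 1)         # "ghost" channels from cheap depthwise ops
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),  # depthwise: one cheap map per intrinsic map
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )
        self.out_ch = out_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        # Concatenate intrinsic and ghost features, then trim to the requested width.
        return torch.cat([y, self.cheap(y)], dim=1)[:, :self.out_ch]
```

With ratio = 2, half of the output channels come from the cheap depthwise path rather than a full convolution, which is the source of the parameter and FLOP reductions seen in rows 2–9 of the table.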
| Integration Method | Accuracy (mAP@0.5) | Inference Speed (FPS) | Model Size (MB) | FLOPs (G) |
|---|---|---|---|---|
| DFFM in Each Branch | 74.8 | 28 | 7.9 | 15.8 |
| DFFM Before Feature Concatenation | 75.0 | 31 | 6.7 | 14.1 |
| DFFM as Post-Processing in C2f Module | 74.9 | 28 | 8.1 | 16.2 |
| Original C3k2 Module (No DFFM) | 74.5 | 30 | 8.2 | 16.5 |
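The integration study above finds that applying DFFM once, just before feature concatenation, gives the best accuracy/speed trade-off. The paper's DFFM definition lives in Section 3.2 and is not reproduced here; purely as an illustration of "dynamic multi-scale fusion before concatenation", the hypothetical sketch below blends resized feature maps with input-dependent softmax weights. Every name in it (DynamicFusion, gate, num_scales) is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFusion(nn.Module):
    """Hypothetical stand-in for a dynamic feature fusion step (not the paper's DFFM):
    aligns multi-scale maps to one resolution, then blends them with weights
    predicted from global context, so the mixture adapts to each input."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global context per channel
            nn.Conv2d(channels * num_scales, num_scales, 1),  # one logit per scale
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: list of (N, C, Hi, Wi) maps with equal C; align to the first scale.
        size = feats[0].shape[-2:]
        aligned = [feats[0]] + [
            F.interpolate(f, size=size, mode="nearest") for f in feats[1:]
        ]
        # (N, num_scales, 1, 1) softmax weights, one per scale.
        w = torch.softmax(self.gate(torch.cat(aligned, dim=1)), dim=1)
        return sum(w[:, i:i + 1] * aligned[i] for i in range(len(aligned)))
```

Running the gate once at a shared resolution, as in row 2 of the table, avoids repeating the fusion cost inside every branch, which is consistent with that row's higher FPS.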
| Model | AP (%) | mAP (%) | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|
| YOLOv8-Pose [16] | 75.0 | 70.5 | 50 | 28.5 | 28 |
| YOLO-NAS-Pose [17] | 75.8 | 71.0 | 52 | 30.0 | 29 |
| OpenPose [18] | 74.2 | 69.5 | 200 | 75.0 | 12 |
| HRNet-Pose [19] | 75.5 | 70.8 | 250 | 95.0 | 10 |
| PoseResNet [20] | 73.5 | 69.0 | 60 | 32.0 | 30 |
| AlphaPose [21] | 73.8 | 69.5 | 150 | 55.0 | 20 |
| YOLO11-pose | 74.5 | 70.3 | 8.2 | 16.5 | 30 |
| GDE-pose (Ours) | 77.3 | 72.6 | 6.8 | 14.5 | 31 |
| Model Configuration | AP (%) | mAP (%) | AR (%) | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|
| Baseline (YOLO11-pose) | 74.5 | 70.3 | 72.0 | 8.2 | 16.5 | 30 |
| Baseline + C3k2_Ghost | 73.8 | 69.5 | 71.8 | 6.3 | 13.8 | 33 |
| Baseline + C3k2_DFFM | 75.0 | 71.0 | 72.5 | 6.7 | 14.1 | 31 |
| Baseline + ECA_Head | 74.6 | 70.6 | 72.3 | 6.8 | 14.4 | 29 |
| Baseline + C3k2_Ghost + C3k2_DFFM | 74.9 | 70.8 | 72.7 | 6.5 | 14.0 | 32 |
| Baseline + C3k2_Ghost + ECA_Head | 74.3 | 69.8 | 72.0 | 6.6 | 14.2 | 30 |
| Baseline + C3k2_DFFM + ECA_Head | 75.5 | 71.2 | 73.0 | 6.8 | 14.3 | 28 |
| Baseline + C3k2_Ghost + C3k2_DFFM + ECA_Head (Ours) | 77.3 | 72.6 | 74.5 | 6.8 | 14.5 | 31 |
| Baseline + MobileNetV3-Small | 72.6 | 66.9 | 70.7 | 6.7 | 15.2 | 30 |
| Baseline + ShuffleNetV2 0.5× | 74.9 | 68.7 | 71.2 | 6.5 | 14.9 | 28 |
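The ECA_Head rows above add Efficient Channel Attention [5] to the detection head. As a reference point, the sketch below is the standard ECA block: global average pooling followed by a 1D convolution whose kernel size is derived from the channel count, so cross-channel interaction costs only a handful of parameters. This follows the published ECA-Net formulation; the class name and defaults are ours, and the paper's head-level wiring may differ.

```python
import math

import torch
import torch.nn as nn


class ECA(nn.Module):
    """Standard ECA block after ECA-Net [5]; class name and defaults are ours."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size: k grows with log2(C) and is forced odd.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                                   # (N, C, 1, 1) channel descriptor
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # 1D conv across the channel axis
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # per-channel gates in (0, 1)
        return x * y                                       # reweight the input channels
```

For a 256-channel head the adaptive rule yields k = 5, so the entire attention step adds only five weights, which matches the small parameter and FLOP deltas between the ECA_Head rows and their counterparts in the table.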