Noise-Robust 3D Pose Estimation Using Appearance Similarity Based on the Distributed Multiple Views
Abstract
1. Introduction
2. Related Work
2.1. 3D Human Pose Estimation
2.2. Edge AI-Based 3D Human Pose Estimation
3. System Architecture
4. Proposed Framework
4.1. Edge Device
4.1.1. 2D Human Pose Detection
4.1.2. Appearance Feature Extraction
4.2. Central Server
4.2.1. Geometrical Affinity
Algorithm 1: Grouping 2D poses with geometrical affinity
initialize …
for camera 2 to … do
    for pose 1 to … do
        for hypothesis 1 to … do
            …
        end
    end
    …
    for … where … = 1 do
        if … then
            …
        else
            …
        end
    end
end
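The loop structure of Algorithm 1 (per camera, per pose, per hypothesis, followed by a thresholded branch over the selected pairs) can be sketched in Python as follows. The expressions of the listing did not survive extraction, so this is only an illustration under stated assumptions: the `affinity` callable, the greedy one-to-one assignment, the `threshold` parameter, and the rule that unmatched poses open new hypotheses are all hypothetical choices, not the authors' definitions.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_assignment(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Pick one-to-one (pose, hypothesis) index pairs by decreasing score.

    Assumption: a greedy matcher stands in for whatever assignment step
    the original algorithm uses.
    """
    cells = sorted(
        ((s, p, h) for p, row in enumerate(scores) for h, s in enumerate(row)),
        key=lambda cell: cell[0],
        reverse=True,
    )
    used_poses, used_hyps, pairs = set(), set(), []
    for _, p, h in cells:
        if p not in used_poses and h not in used_hyps:
            used_poses.add(p)
            used_hyps.add(h)
            pairs.append((p, h))
    return pairs


def group_poses(
    poses_per_camera: Sequence[Sequence[object]],
    affinity: Callable[[object, list], float],
    threshold: float,
) -> List[list]:
    """Group per-camera 2D poses into person hypotheses."""
    # initialize: each pose from the first camera starts its own hypothesis
    hypotheses = [[pose] for pose in poses_per_camera[0]]
    for poses in poses_per_camera[1:]:  # for camera 2 to the last camera
        # affinity of every pose against every existing hypothesis
        scores = [[affinity(pose, hyp) for hyp in hypotheses] for pose in poses]
        pairs = greedy_assignment(scores)
        matched = set()
        created = []
        for p, h in pairs:
            matched.add(p)
            if scores[p][h] >= threshold:
                hypotheses[h].append(poses[p])  # accept the match
            else:
                created.append([poses[p]])      # too dissimilar: new person
        # assumption: poses left without a partner also open new hypotheses
        for p, pose in enumerate(poses):
            if p not in matched:
                created.append([pose])
        hypotheses.extend(created)
    return hypotheses


def toy_affinity(pose: float, hyp: List[float]) -> float:
    """Toy stand-in: a pose is one number, affinity decays with distance."""
    return 1.0 / (1.0 + abs(pose - sum(hyp) / len(hyp)))


groups = group_poses([[0.0, 10.0], [10.2, 0.1], [5.0]], toy_affinity, 0.5)
print(groups)
```

In a real system the poses would be 2D joint sets and the affinity would be derived from the camera geometry; the toy scalar poses only keep the example runnable.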
4.2.2. Appearance Similarity
Algorithm 2: Grouping 2D poses with appearance similarity
initialize …
for camera 2 to … do
    for pose 1 to … do
        for hypothesis 1 to … do
            …
        end
    end
    …
    for … where … = 1 do
        if … then
            …
            …
        else
            …
            …
        end
    end
end
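Algorithm 2 has the same loop structure as Algorithm 1, with two statements in each branch. One plausible reading, offered here only as an assumption, is that each branch both updates the grouping and updates a stored appearance feature. The minimal Python sketch below follows that reading. Cosine similarity, the running-mean feature update, the greedy matching, and the `threshold` value are hypothetical choices; the source does not confirm any of them.

```python
import math
from typing import List, Sequence, Tuple

Detection = Tuple[str, Sequence[float]]  # (pose id, appearance feature); hypothetical


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class Hypothesis:
    """One person hypothesis: member pose ids plus a mean appearance feature."""

    def __init__(self, pose_id: str, feature: Sequence[float]) -> None:
        self.members = [pose_id]
        self.feature = list(feature)

    def add(self, pose_id: str, feature: Sequence[float]) -> None:
        n = len(self.members)
        # assumption: the stored feature is the running mean over members
        self.feature = [(n * f + g) / (n + 1) for f, g in zip(self.feature, feature)]
        self.members.append(pose_id)


def group_by_appearance(
    detections_per_camera: Sequence[Sequence[Detection]], threshold: float
) -> List[Hypothesis]:
    # initialize: each detection of the first camera starts a hypothesis
    hypotheses = [Hypothesis(pid, feat) for pid, feat in detections_per_camera[0]]
    for detections in detections_per_camera[1:]:  # for camera 2 to the last camera
        candidates = sorted(
            (
                (cosine_similarity(feat, hyp.feature), p, h)
                for p, (_, feat) in enumerate(detections)
                for h, hyp in enumerate(hypotheses)
            ),
            key=lambda cell: cell[0],
            reverse=True,
        )
        used_poses, used_hyps, created = set(), set(), []
        for score, p, h in candidates:  # greedy one-to-one matching
            if p in used_poses or h in used_hyps:
                continue
            used_poses.add(p)
            used_hyps.add(h)
            pid, feat = detections[p]
            if score >= threshold:
                hypotheses[h].add(pid, feat)          # join group, update feature
            else:
                created.append(Hypothesis(pid, feat))  # new group, store feature
        for p, (pid, feat) in enumerate(detections):
            if p not in used_poses:  # assumption: leftovers open new hypotheses
                created.append(Hypothesis(pid, feat))
        hypotheses.extend(created)
    return hypotheses


views = [
    [("c1p1", [1.0, 0.0]), ("c1p2", [0.0, 1.0])],
    [("c2p1", [0.1, 0.9]), ("c2p2", [0.9, 0.1])],
    [("c3p1", [-1.0, 0.0])],
]
result = group_by_appearance(views, threshold=0.8)
print([hyp.members for hyp in result])
```

Because the comparison uses only feature vectors, a grouping of this kind does not depend on camera calibration, which is consistent with the noise-robustness motivation stated in the title.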
4.2.3. Multi-View Triangulation and Calibration Self-Correction
5. Experimental Results
5.1. Simulation Environment
5.2. Quantitative Results (PCP Score)
5.2.1. Campus Dataset
5.2.2. Shelf Dataset
5.3. Qualitative Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hwang, T.; Kim, M. Noise-Robust 3D Pose Estimation Using Appearance Similarity Based on the Distributed Multiple Views. Sensors 2024, 24, 5645. https://doi.org/10.3390/s24175645