A Prior Knowledge-Guided Graph Convolutional Neural Network for Human Action Recognition in Solar Panel Installation Process
Abstract
:1. Introduction and Related Works
1.1. Feature-Based Methods
1.2. Deep Learning-Based Methods
1.3. Contributions of This Paper
- There could be standard actions for the operations;
- The overall pose may not vary significantly with different operations, but the distinction is reflected in the details of the hands or other parts;
- The amount of data available for training is relatively small and narrow.
- A method based on training set statistics is proposed to achieve the correlation representation of different human joint points for a specific action, which can be used as a priori knowledge of the action;
- A graph convolution neural network model using the aforementioned prior knowledge to train attention mechanisms is proposed, which improves the training and recognition effect of the model for specific actions.
2. Method
2.1. System Overview
2.2. Joint Modeling
2.3. Prior Knowledge Extraction
2.4. Graph Convolution Networks
- The basic pose feature introduced in Section 2.1;
- The features of and introduced in Section 2.1;
- The Euclidean distances from the body’s center of gravity to the joint.
3. Experiment Results
3.1. Datasets of the Action of Solar Panel Installation
3.2. Experiment Setup
4. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, J.; Wang, P.; Gao, R.X. Hybrid machine learning for human action recognition and prediction in assembly. Robot. Comput. Integr. Manuf. 2021, 72, 102184. [Google Scholar] [CrossRef]
- Inkulu, A.K.; Raju Bahubalendruni, M.V.A.; Dara, A.; SankaranarayanaSamy, K. Challenges and opportunities in human robot collaboration context of Industry 4.0—A state of the art review. Ind. Robot 2022, 49, 226–239. [Google Scholar] [CrossRef]
- Garcia, P.P.; Santos, T.G.; Machado, M.A.; Mendes, N. Deep Learning Framework for Controlling Work Sequence in Collaborative Human–Robot Assembly Processes. Sensors 2023, 23, 553. [Google Scholar] [CrossRef]
- Chen, C.; Wang, T.; Li, D.; Hong, J. Repetitive assembly action recognition based on object detection and pose estimation. J. Manuf. Syst. 2020, 55, 325–333. [Google Scholar] [CrossRef]
- Ramanathan, M.; Yau, W.-Y.; Teoh, E.K. Human action recognition with video data: Research and evaluation challenges. IEEE Trans. Hum. Mach. Syst. 2014, 44, 650–663. [Google Scholar] [CrossRef]
- Gupta, N.; Gupta, S.K.; Pathak, R.K.; Jain, V.; Rashidi, P.; Suri, J.S. Human activity recognition in artificial intelligence framework: A narrative review. Artif. Intell. Rev. 2022, 55, 4755–4808. [Google Scholar] [CrossRef]
- Zhou, X.; Liang, W.; Wang, K.I.-K.; Wang, H.; Yang, L.T.; Jin, Q. Deep-Learning-Enhanced Human Activity Recognition for Internet of Healthcare Things. IEEE Internet Things J. 2020, 7, 6429–6438. [Google Scholar] [CrossRef]
- Mendes, N. Surface Electromyography Signal Recognition Based on Deep Learning for Human-Robot Interaction and Collaboration. J. Intell. Robot. Syst. 2022, 105, 42. [Google Scholar] [CrossRef]
- Yao, L.; Sheng, Q.Z.; Li, X.; Gu, T.; Tan, M.; Wang, X.; Wang, S.; Ruan, W. Compressive Representation for Device-Free Activity Recognition with Passive RFID Signal Strength. IEEE Trans. Mob. Comput. 2017, 17, 293–306. [Google Scholar] [CrossRef]
- Zhang, H.; Parker, L.E. CoDe4D: Color-Depth Local Spatio-Temporal Features for Human Activity Recognition From RGB-D Videos. IEEE Trans. Circuits Syst. Video Technol. 2014, 26, 541–555. [Google Scholar] [CrossRef]
- Chen, C.; Jafari, R.; Kehtarnavaz, N. A survey of depth and inertial sensor fusion for human action recognition. Multimedia Tools Appl. 2017, 76, 4405–4425. [Google Scholar] [CrossRef]
- Bobick, A.; Davis, J. The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 257–267. [Google Scholar] [CrossRef] [Green Version]
- Weinland, D.; Ronfard, R.; Boyer, E. Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 2006, 104, 249–257. [Google Scholar] [CrossRef] [Green Version]
- Laptev, I. On space-time interest points. Int. J. Comput. Vis. 2005, 64, 107–123. [Google Scholar] [CrossRef]
- Dollár, P.; Rabaud, V.; Cottrell, G.; Belongie, S. Behavior recognition via sparse spatio-temporal features. In Proceedings of the IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, 15–16 October 2005; pp. 65–72. [Google Scholar] [CrossRef] [Green Version]
- Laptev, I.; Marszalek, M.; Schmid, C.; Rozenfeld, B. Learning realistic human actions from movies. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar] [CrossRef] [Green Version]
- Klaser, A.; Marszałek, M.; Schmid, C. A spatio-temporal descriptor based on 3d-gradients. In Proceedings of the 19th British Machine Vision Conference, Leeds, UK, 1–4 September 2008; pp. 275:1–275:10. [Google Scholar]
- Wang, H.; Kläser, A.; Schmid, C.; Liu, C.-L. Action Recognition by Dense Trajectories. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3169–3176. [Google Scholar] [CrossRef] [Green Version]
- Sun, J.; Wu, X.; Yan, S.; Cheong, L.-F.; Chua, T.-S.; Li, J. Hierarchical spatio-temporal context modeling for action recognition. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 2004–2011. [Google Scholar] [CrossRef]
- Zelnik-Manor, L.; Irani, M. Statistical analysis of dynamic actions. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1530–1535. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; MIT Press: Cambridge, MA, USA, 2014; Volume 27, pp. 1345–1367. [Google Scholar]
- Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Li, F.-F. Large-scale video classification with convolutional neural networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1725–1732. [Google Scholar] [CrossRef] [Green Version]
- Zhang, B.; Wang, L.; Wang, Z.; Qiao, Y.; Wang, H. Real-time action recognition with enhanced motion vector CNNs. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2718–2726. [Google Scholar] [CrossRef] [Green Version]
- Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2625–2634. [Google Scholar] [CrossRef] [Green Version]
- Sharma, S.; Kiros, R.; Salakhutdinov, R. Action recognition using visual attention. arXiv 2015, arXiv:1511.04119. [Google Scholar]
- Wu, Z.; Wang, X.; Jiang, Y.-G.; Ye, H.; Xue, X. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 461–470. [Google Scholar] [CrossRef] [Green Version]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar]
- Varol, G.; Laptev, I.; Schmid, C. Long-term temporal convolutions for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1510–1517. [Google Scholar] [CrossRef] [Green Version]
- Qiu, Z.; Yao, T.; Mei, T. Learning spatio-temporal representation with pseudo-3d residual networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4489–4497. [Google Scholar] [CrossRef] [Green Version]
- Zhou, Y.; Sun, X.; Luo, C.; Zha, Z.; Zeng, W. Spatiotemporal fusion in 3d CNNs: A probabilistic view. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1725–1732. [Google Scholar] [CrossRef]
- Kim, J.; Cha, S.; Wee, D.; Bae, S.; Kim, J. Regularization on spatio-temporally smoothed feature for action recognition. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12103–12112. [Google Scholar] [CrossRef]
- Wang, L.; Huynh, D.Q.; Koniusz, P. A comparative review of recent kinect-based action recognition algorithms. IEEE Trans. Image Process. 2019, 29, 15–28. [Google Scholar] [CrossRef] [Green Version]
- Mehta, D.; Sridhar, S.; Sotnychenko, O.; Rhodin, H.; Shafiei, M.; Seidel, H.-P.; Xu, W.; Casas, D.; Theobalt, C. Vnect: Real-time 3d human pose estimation with a single rgb camera. ACM Trans. Graph. 2017, 36, 1–14. [Google Scholar] [CrossRef]
- Pouyanfar, S.; Sadiq, S.; Yan, Y.; Tian, H.; Tao, Y.; Reyes, M.P.; Shyu, M.-L.; Chen, S.-C.; Iyengar, S.S. A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Cao, Z.; Simon, T.; Wei, S.-E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299. [Google Scholar] [CrossRef] [Green Version]
- Girshick, R.; Radosavovic, I.; Gkioxari, G.; Dollár, P.; He, K. Detectron. 2018. Available online: https://github.com/facebookresearch/detectron (accessed on 20 June 2023).
- Yuxin, W.; Alexander, K.; Francisco, M.; Wan-Yen, L.; Ross, G. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 20 June 2023).
- Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1110–1118. [Google Scholar] [CrossRef] [Green Version]
- Si, C.; Chen, W.; Wang, W.; Wang, L.; Tan, T. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In Proceedings of the 2019 IEEE International Conference on Computer Vision, Long Beach, CA, USA, 15–20 June 2019; pp. 1227–1236. [Google Scholar] [CrossRef] [Green Version]
- Hou, Y.; Li, Z.; Wang, P.; Li, W. Skeleton optical spectra-based action recognition using convolutional neural networks. IEEE Trans. Circuits Syst. Video Technol. 2016, 28, 807–811. [Google Scholar] [CrossRef]
- Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. SlowFast Networks for Video Recognition. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6201–6210. [Google Scholar] [CrossRef] [Green Version]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-based action recognition with directed graph neural networks. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7912–7921. [Google Scholar] [CrossRef]
- Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Zhang, Y.; Wu, B.; Li, W.; Duan, L.; Gan, C. Gan, STST: Spatial-temporal specialized transformer for skeleton-based action recognition. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 3229–3237. [Google Scholar] [CrossRef]
- Ahmad, T.; Mao, H.; Lin, L.; Tang, G. Action recognition using attention-joints graph convolutional neural networks. IEEE Access 2019, 8, 305–313. [Google Scholar] [CrossRef]
- Chen, Y.; Ma, G.; Yuan, C.; Li, B.; Zhang, H.; Wang, F.; Hu, W. Graph convolutional network with structure pooling and joint-wise channel attention for action recognition. Pattern Recognit. 2020, 103, 107321. [Google Scholar] [CrossRef]
- Tunga, A.; Nuthalapati, S.V.; Wachs, J. Pose-based Sign Language Recognition using GCN and BERT. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikola, HI, USA, 5–9 January 2021; pp. 31–40. [Google Scholar] [CrossRef]
- Shahroudy, A.; Liu, J.; Ng, T.-T.; Wang, G. NTU RGB+D: A large scale dataset for 3d human activity analysis. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1010–1019. [Google Scholar] [CrossRef] [Green Version]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Sahoo, S.P.; Modalavalasa, S.; Ari, S. A sequential learning framework to handle occlusion in human action recognition with video acquisition sensors. Digit. Signal Process. 2022, 131, 103763. [Google Scholar] [CrossRef]
Body Parts | Joints |
---|---|
Torso | 0,2,5,8,11 |
Head | 0,1 |
Right hand | 2,3,4 |
Left hand | 5,6,7 |
Right leg | 11,12,13 |
Left leg | 8,9,10 |
Actions | Joints Marked as ‘1′ (Detected Using Our Method) | Joints Marked as ‘1′ (Human-Designed) |
---|---|---|
‘pendulum’ | 2,3,4;5,6,7;0,2,5,8,11; | 2,3,4;5,6,7; |
‘stay wire’ | 2,3,4;5,6,7;0,1; | 2,3,4;5,6,7; |
‘threading’ | 2,3,4;5,6,7;0,1;0,2,5,8,11 | 2,3,4;5,6,7; |
‘tie’ | 2,3,4;5,6,7;0,2,5,8,11 | 2,3,4;5,6,7; |
Action | Slowfast [41] | ST-GCN [43] | Our Method * | Our Method ** | ||||
---|---|---|---|---|---|---|---|---|
Top1 acc | Top2 acc | Top1 acc | Top2 acc | Top1 acc | Top2 acc | Top1 acc | Top2 acc | |
pendulum | 0.67 | 0.88 | 0.76 | 0.89 | 0.73 | 0.88 | 0.75 | 0.92 |
stay wire | 0.96 | 0.98 | 0.93 | 0.98 | 0.92 | 0.98 | 0.96 | 0.99 |
threading | 0.90 | 1.00 | 0.92 | 0.99 | 0.89 | 0.99 | 0.91 | 0.99 |
tie | 0.28 | 0.66 | 0.60 | 0.78 | 0.70 | 0.78 | 0.79 | 0.85 |
Total (%) | 70.25 | 88.00 | 80.25 | 91.00 | 81.00 | 90.75 | 85.25 | 93.75 |
Process Step | Skeleton Detection Process | Feature Extraction Process | GCN Prediction Process | Total |
---|---|---|---|---|
Average Time | 83 ms | 16 ms | 335 ms | 434 ms |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, J.; Zhu, Y.; Wang, C.; Li, J.; Zhu, X. A Prior Knowledge-Guided Graph Convolutional Neural Network for Human Action Recognition in Solar Panel Installation Process. Appl. Sci. 2023, 13, 8608. https://doi.org/10.3390/app13158608
Wu J, Zhu Y, Wang C, Li J, Zhu X. A Prior Knowledge-Guided Graph Convolutional Neural Network for Human Action Recognition in Solar Panel Installation Process. Applied Sciences. 2023; 13(15):8608. https://doi.org/10.3390/app13158608
Chicago/Turabian StyleWu, Jin, Yaqiao Zhu, Chunguang Wang, Jinfu Li, and Xuehong Zhu. 2023. "A Prior Knowledge-Guided Graph Convolutional Neural Network for Human Action Recognition in Solar Panel Installation Process" Applied Sciences 13, no. 15: 8608. https://doi.org/10.3390/app13158608
APA StyleWu, J., Zhu, Y., Wang, C., Li, J., & Zhu, X. (2023). A Prior Knowledge-Guided Graph Convolutional Neural Network for Human Action Recognition in Solar Panel Installation Process. Applied Sciences, 13(15), 8608. https://doi.org/10.3390/app13158608