Human Interaction Recognition Based on Whole-Individual Detection
Abstract
1. Introduction
1.1. Related Work
1.2. Contribution
2. Proposed Method
2.1. Motion Video Individual Detection
2.2. Video Downsampling with Time-Phased Features
2.3. Human Interaction Feature Extraction Based on Parallel Multi-Feature Network
2.4. Whole-Individual Detection Based on Decision-Level Fusion
3. Results
3.1. Experimental Platform and Experimental Data
3.2. Analysis of Results
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Qi, J.; Jiang, G.; Li, G.; Sun, Y.; Tao, B. Intelligent Human-Computer Interaction Based on Surface EMG Gesture Recognition. IEEE Access 2019, 7, 61378–61387. [Google Scholar] [CrossRef]
- Ahmed, M.U.; Kim, Y.H.; Kim, J.W.; Bashar, M.R.; Rhee, P.K. Two-Person Interaction Recognition Based on Effective Hybrid Learning. KSII Trans. Internet Inf. Syst. 2019, 13, 751–770. [Google Scholar]
- Chinimilli, P.T.; Redkar, S.; Sugar, T. A Two-Dimensional Feature Space-Based Approach for Human Locomotion Recognition. IEEE Sens. J. 2019, 19, 4271–4282. [Google Scholar] [CrossRef]
- Phyo, C.N.; Zin, T.T.; Tin, P. Deep Learning for Recognizing Human Activities Using Motions of Skeletal Joints. IEEE Trans. Consum. Electron. 2019, 65, 243–252. [Google Scholar] [CrossRef]
- Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Qi, H.; Fang, K.; Wu, X.; Xu, L.; Lang, Q. Human activity recognition method based on molecular attributes. Int. J. Distrib. Sens. Netw. 2019. [Google Scholar] [CrossRef]
- Sanzari, M.; Ntouskos, V.; Pirri, F. Discovery and recognition of motion primitives in human activities. PLoS ONE 2019, 14, e0214499. [Google Scholar] [CrossRef] [Green Version]
- An, F. Human Action Recognition Algorithm Based on Adaptive Initialization of Deep Learning Model Parameters and Support Vector Machine. IEEE Access 2018, 6, 59405–59421. [Google Scholar] [CrossRef]
- McColl, D.; Jiang, C.; Nejat, G. Classifying a Person’s Degree of Accessibility from Natural Body Language During Social Human–Robot Interactions. IEEE Trans. Cybern. 2017, 47, 524–538. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Cao, J.; Liu, J.; Zhao, Z. Design of human-computer interaction control system based on hand-gesture recognition. In Proceedings of the 2017 32nd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Hefei, China, 19–21 May 2017; pp. 143–147. [Google Scholar]
- Lakomkin, E.; Zamani, M.A.; Weber, C.; Magg, S.; Wermter, S. On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 854–860. [Google Scholar] [CrossRef] [Green Version]
- Böck, R. Recognition of Human Movement Patterns during a Human-Agent Interaction. In Proceedings of the 4th International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (MA3HMI’18), New York, NY, USA, 16 October 2018; pp. 33–37. [Google Scholar] [CrossRef]
- Lou, X.; Yu, Z.; Wang, Z.; Zhang, K.; Guo, B. Gesture-Radar: Enabling Natural Human-Computer Interactions with Radar-Based Adaptive and Robust Arm Gesture Recognition. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; pp. 4291–4297. [Google Scholar] [CrossRef]
- Faria, D.R.; Vieira, M.; Faria, F.C.C.; Premebida, C. Affective facial expressions recognition for human-robot interaction. In Proceedings of the 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Lisbon, Portugal, 28 August–1 September 2017; pp. 805–810. [Google Scholar] [CrossRef] [Green Version]
- Käse, N.; Babaee, M.; Rigoll, G. Multi-view human activity recognition using motion frequency. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3963–3967. [Google Scholar] [CrossRef]
- Jaouedi, N.; Boujnah, N.; Htiwich, O.; Bouhlel, M.S. Human action recognition to human behavior analysis. In Proceedings of the 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Hammamet, Tunisia, 18–20 December 2016; pp. 263–266. [Google Scholar] [CrossRef]
- Silambarasi, R.; Sahoo, S.P.; Ari, S. 3D spatial-temporal view based motion tracing in human action recognition. In Proceedings of the 2017 International Conference on Communication and Signal Processing (ICCSP), Wuhan, China, 17–19 March 2017; pp. 1833–1837. [Google Scholar] [CrossRef]
- Tozadore, D.; Ranieri, C.; Nardari, G.; Guizilini, V.; Romero, R. Effects of Emotion Grouping for Recognition in Human-Robot Interactions. In Proceedings of the 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), Sao Paulo, Brazil, 22–25 October 2018; pp. 438–443. [Google Scholar] [CrossRef]
- Liu, B.; Cai, H.; Ji, X.; Liu, H. Human-human interaction recognition based on spatial and motion trend feature. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 4547–4551. [Google Scholar] [CrossRef] [Green Version]
- Wang, H.; Wang, L. Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3633–3642. [Google Scholar] [CrossRef] [Green Version]
- Zhao, Y.; Xiong, Y.; Lin, D. Trajectory convolution for action recognition. Adv. Neural Inf. Process. Syst. 2018, 31, 2205–2216. [Google Scholar]
- Chiang, T.; Fan, C. 3D Depth Information Based 2D Low-Complexity Hand Posture and Gesture Recognition Design for Human Computer Interactions. In Proceedings of the 2018 3rd International Conference on Computer and Communication Systems (ICCCS), Nagoya, Japan, 27–30 April 2018; pp. 233–238. [Google Scholar] [CrossRef]
- Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; Paluri, M. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Vox, J.P.; Wallhoff, F. Preprocessing and Normalization of 3D-Skeleton-Data for Human Motion Recognition. In Proceedings of the 2018 IEEE Life Sciences Conference (LSC), Montreal, QC, Canada, 28–30 October 2018; pp. 279–282. [Google Scholar] [CrossRef]
- Phyo, C.N.; Zin, T.T.; Tin, P. Skeleton motion history based human action recognition using deep learning. In Proceedings of the 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE), Nagoya, Japan, 24–27 October 2017; pp. 1–2. [Google Scholar] [CrossRef]
- Chen, Y.; Kalantidis, Y.; Li, J.; Yan, S.; Feng, J. Multi-fiber networks for video recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Li, N.J.; Cheng, X.; Guo, H.Y.; Wu, Z.Y. Recognizing human interactions by genetic algorithm-based random forest spatio-temporal correlation. Pattern Anal. Appl. 2016, 19, 267–282. [Google Scholar] [CrossRef]
- Huang, F.F.; Cao, J.T.; Ji, X.F. Two-person interactive motion recognition algorithm based on multi-channel information fusion. Comput. Technol. Dev. 2016, 26, 58–62. [Google Scholar]
- Guo, P.; Miao, Z.; Zhang, X.; Shen, Y.; Wang, S. Coupled Observation Decomposed Hidden Markov Model for Multiperson Activity Recognition. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1306–1320. [Google Scholar] [CrossRef]
- Ji, X.F.; Wang, C.H.; Wang, Y.Y. A two-dimensional interactive motion recognition method based on hierarchical structure. J. Intell. Syst. 2015, 10, 893–900. [Google Scholar]
- Vahdat, A.; Gao, B.; Ranjbar, M.; Mori, G. A discriminative key pose sequence model for recognizing human interactions. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 1729–1736. [Google Scholar] [CrossRef] [Green Version]
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar] [CrossRef] [Green Version]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Kalman, R. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef] [Green Version]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar] [CrossRef] [Green Version]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef] [Green Version]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef] [Green Version]
- Ryoo, M.S.; Chen, C.C.; Aggarwal, J.K.; Roy-Chowdhury, A. An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010. In Recognizing Patterns in Signals, Speech, Images and Videos. ICPR 2010. Lecture Notes in Computer Science; Ünay, D., Çataltepe, Z., Aksoy, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6388. Available online: http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html (accessed on 26 August 2010).
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Action Classes from Videos in the Wild. CRCV-TR-12-01. November 2012. Available online: http://crcv.ucf.edu/data/ (accessed on 1 November 2012).
- Mahmood, M.; Jalal, A.; Sidduqi, M.A. Robust Spatio-Temporal Features for Human Interaction Recognition via Artificial Neural Network. In Proceedings of the 2018 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, 17–19 December 2018; pp. 218–223. [Google Scholar] [CrossRef]
- Kong, Y.; Jia, Y.; Fu, Y. Learning Human Interaction by Interactive Phrases. In Computer Vision—ECCV 2012. ECCV 2012. Lecture Notes in Computer Science; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7572. [Google Scholar]
- Shariat, S.; Pavlovic, V. A New Adaptive Segmental Matching Measure for Human Activity Recognition. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 3583–3590. [Google Scholar] [CrossRef]
- Guo, Z.; Wang, X.; Wang, B.; Xie, Z. A Novel 3D Gradient LBP Descriptor for Action Recognition. IEICE Trans. Inf. Syst. 2017, 100, 1388–1392. [Google Scholar] [CrossRef] [Green Version]
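Three of the components cited above (Dalal–Triggs HOG features, the linear SVM, and the Kalman filter) form the standard person-detection-and-tracking front end of the kind Section 2.1 names. The paper's own detector is not reproduced in this excerpt; the sketch below is a minimal OpenCV stand-in, assuming the stock HOG + SVM people detector and one constant-velocity Kalman filter per person. The input file name and the naive one-filter-per-detection association are illustrative only.

```python
import cv2
import numpy as np

# Stock HOG + linear-SVM pedestrian detector (Dalal-Triggs features).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def make_kalman(x, y):
    """Constant-velocity Kalman filter: state [x, y, vx, vy], measurement [x, y]."""
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
    kf.statePost = np.array([[x], [y], [0], [0]], np.float32)
    return kf

cap = cv2.VideoCapture("interaction_clip.avi")   # hypothetical input clip
trackers = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    centres = [(x + w / 2, y + h / 2) for (x, y, w, h) in boxes]
    # Naive association: one filter per detection in order of appearance
    # (a real tracker would match detections to filters by distance).
    while len(trackers) < len(centres):
        cx, cy = centres[len(trackers)]
        trackers.append(make_kalman(cx, cy))
    for kf, (cx, cy) in zip(trackers, centres):
        kf.predict()
        kf.correct(np.array([[cx], [cy]], np.float32))
```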
| Sampling Method | Inception (%) | ResNet (%) | Multi-Feature Fusion (%) |
|---|---|---|---|
| Equal-interval sampling | 66.7 | 72.2 | 83.3 |
| Gaussian-model-based downsampling | 69.4 | 77.8 | 86.1 |
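The table above pits plain equal-interval sampling against the Gaussian-model downsampling of Section 2.2. The paper's exact time-phased model is not reproduced in this excerpt; the sketch below is a minimal stand-in, assuming the Gaussian simply concentrates the sampled frames near the temporal centre of the clip, with `sigma_ratio` an illustrative parameter.

```python
import numpy as np

def equal_interval_sample(num_frames: int, k: int) -> np.ndarray:
    """Pick k frame indices spaced evenly across the clip."""
    return np.linspace(0, num_frames - 1, k).round().astype(int)

def gaussian_downsample(num_frames: int, k: int, sigma_ratio: float = 0.25) -> np.ndarray:
    """Pick k frame indices whose density follows a Gaussian over the timeline,
    concentrating samples near the temporal centre of the clip."""
    t = np.arange(num_frames)
    mu = (num_frames - 1) / 2.0          # assumption: Gaussian centred mid-clip
    sigma = sigma_ratio * num_frames
    w = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    cdf = np.cumsum(w) / w.sum()         # inverse-CDF sampling of the weights
    targets = (np.arange(k) + 0.5) / k
    return np.searchsorted(cdf, targets)

print(equal_interval_sample(120, 8))   # [  0  17  34  51  68  85 102 119]
print(gaussian_downsample(120, 8))     # indices clustered around frame 60
```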
| Recognition Method | UT Set 1 Recognition Accuracy (%) | UT Set 2 Recognition Accuracy (%) | UCF101 Interaction Recognition Accuracy (%) |
|---|---|---|---|
| Individual (left) | 83.3 | 77.8 | 75.6 |
| Individual (right) | 75.0 | 72.2 | 76.8 |
| Whole | 86.1 | 83.3 | 81.8 |
| Decision-level fusion (this paper) | 91.7 | 86.1 | 85.4 |
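In the table above, fusing the two individual streams with the whole-body stream beats any single stream on all three sets. The exact fusion rule is not given in this excerpt; the sketch below assumes a simple weighted sum of per-stream class posteriors, with illustrative weights that favour the whole-body stream.

```python
import numpy as np

def fuse_decisions(p_left, p_right, p_whole, weights=(0.25, 0.25, 0.5)):
    """Decision-level fusion: combine per-stream class posteriors with a
    weighted sum and return the fused label. Weights are illustrative."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()   # normalise so the fused scores remain a distribution
    fused = (w[0] * np.asarray(p_left)
             + w[1] * np.asarray(p_right)
             + w[2] * np.asarray(p_whole))
    return int(np.argmax(fused)), fused

# Toy 3-class example: streams disagree, the heavier whole-body vote wins.
label, scores = fuse_decisions([0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.7, 0.2])
print(label, scores.round(3))   # -> 1 [0.25 0.55 0.2]
```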
| Recognition Method | UT Set 1 Recognition Accuracy (%) | UT Set 2 Recognition Accuracy (%) | UCF101 Interaction Recognition Accuracy (%) |
|---|---|---|---|
| HSI color space model [30] | 81.70 | – | – |
| Interactive phrases [42] | 88.33 | – | – |
| Detection alignment model [43] | 91.57 | – | – |
| Novel 3D gradient LBP descriptor [44] | 91.42 | – | – |
| Method of this paper | 91.70 | 86.10 | 85.43 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ye, Q.; Zhong, H.; Qu, C.; Zhang, Y. Human Interaction Recognition Based on Whole-Individual Detection. Sensors 2020, 20, 2346. https://doi.org/10.3390/s20082346