Article
Peer-Review Record

Action Recognition Based on the Fusion of Graph Convolutional Networks with High Order Features

Appl. Sci. 2020, 10(4), 1482; https://doi.org/10.3390/app10041482
by Jiuqing Dong 1,†, Yongbin Gao 1,*, Hyo Jong Lee 2, Heng Zhou 1, Yifan Yao 1, Zhijun Fang 1 and Bo Huang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 23 December 2019 / Revised: 9 February 2020 / Accepted: 18 February 2020 / Published: 21 February 2020
(This article belongs to the Special Issue Deep Learning-Based Action Recognition)

Round 1

Reviewer 1 Report

The authors of the paper present a model that uses higher-order features, such as those derived from the velocity and acceleration of skeletal parts and joints, to enhance the performance of graph convolutional neural networks on the task of skeleton-based action recognition. However, this reviewer has some major and minor concerns regarding this paper. This reviewer recommends acceptance for publication after the authors majorly revise and modify the paper to address all of these concerns, which are listed below.

Minor Concerns:

Please revise the English language in this paper. There are several major and minor English language errors that cause a severe impediment to reading the paper.

Major Concerns:

This paper can easily be cut down to 1/2 or 2/3 of its current length. The authors repeat multiple points throughout the paper and also engage in needless distractions in the narrative.

Author Response

Dear reviewers,

We would like to express our deep gratitude for your great review efforts and valuable comments, which have substantially helped us improve the revision. This response letter includes the amendments incorporated into the revision and our responses to the reviewers' comments. We have done our best to comply with all of the reviewers' comments. The black font indicates the reviewers' comments, and the red font indicates our replies.

Response to Reviewer 1 Comments

Point 1: The authors of the paper present a model that uses higher-order features, such as those derived from the velocity and acceleration of skeletal parts and joints, to enhance the performance of graph convolutional neural networks on the task of skeleton-based action recognition. However, this reviewer has some major and minor concerns regarding this paper. This reviewer recommends acceptance for publication after the authors majorly revise and modify the paper to address all of these concerns.

Response 1: Thank you for your comment. In our manuscript, we mainly use high-order information to enhance the temporal and spatial features in the original features. Experimental results show that our method outperforms other methods. We have reorganized the revised manuscript accordingly, adding and deleting paragraphs where appropriate, and fixed grammar and formatting errors.

Point 2: Minor Concerns: Please revise the English language in this paper. There are several major and minor English language errors that cause a severe impediment to reading the paper.

Response 2: Thank you for your suggestion. We have fixed at least 20 grammar errors, spelling mistakes, and formatting errors; these corrections are marked in blue.

Point 3: Major Concerns: This paper can easily be cut down to 1/2 or 2/3 of its current length. The authors repeat multiple points throughout the paper and also engage in needless distractions in the narrative.

Response 3: Thank you for your review. We have removed the redundant narrative; the body of the manuscript has been cut down to about 3/4 of its original length, and the article is now more concise. We believe the remaining content is necessary. Specifically, we deleted duplicated sentences in Sec. 1 and Sec. 2 (Pages 1-3), removed Figure 4 and Table 1, shortened the introduction of the datasets in Sec. 4, and removed the Abbreviations section.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors presented an action recognition method by extending spatial-temporal features.

However, the contribution is quite low because it is just an extended version of the baseline method.

Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12026–12035.

It is almost the same as the baseline method except for velocity, relative distance, and acceleration.

Other comments:

line 151: the graph -> The graph
line 204: weighted average fusion -> too ambiguous
line 287: Js, Bs, JVs, BVs, RDs  -> use full name
Table 2, Table 3: same caption
Table 4, Table 5: same caption
line 305: 4.5 Compare -> Comparison
Table 7: why not use  2s-AGCN and AGC-LSTM for comparison


Author Response

Dear reviewers,

We would like to express our deep gratitude for your great review efforts and valuable comments, which have substantially helped us improve the revision. This response letter includes the amendments incorporated into the revision and our responses to the reviewers' comments. We have done our best to comply with all of the reviewers' comments. The black font indicates the reviewers' comments, and the red font indicates our replies.

Point 1: The authors presented an action recognition method by extending spatial-temporal features. However, the contribution is quite low because it is just an extended version of the baseline method: "Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12026–12035." It is almost the same as the baseline method except for velocity, relative distance, and acceleration.

Response 1: Thank you for your comment. As the reviewer pointed out, the backbone network is similar to the baseline method. However, the proposed high-order features, such as velocity, relative distance, and acceleration, are highly effective for action recognition and highlight the importance of spatial and temporal relationships. Although the proposed features are computationally simple, they improve accuracy significantly, and the proposed method achieves state-of-the-art performance.

Point 2: line 151: the graph -> The graph

Response 2: Thank you for pointing out this error. We have corrected it, along with other grammar errors, spelling mistakes, and formatting errors; these corrections are marked in blue.

Point 3: line 204: weighted average fusion -> too ambiguous

Response 3: Thank you for your review. We believe that the information contained in the joints, bones, and relative distance is the most fundamental and important, so these features are assigned larger weights. The velocity and acceleration information are auxiliary features that strengthen the temporal relationship, so they are assigned smaller weights. We have added a paragraph explaining the weighted summation method in the revised version (Page 7).

Point 4: line 287: Js, Bs, JVs, BVs, RDs -> use full name

Response 4: Thank you for your suggestion. We have replaced the abbreviations with their full names accordingly.

Point 5: Table 2, Table 3: same caption

Response 5: Thank you for pointing out the caption issue for Table 1 and Table 2. The captions of these two tables are extremely similar, differing by only one word: "cross-subject" versus "cross-view", the two evaluation criteria of the NTU-RGBD dataset. We have updated the manuscript and repeatedly confirmed that the caption of each table is correct. Note: we deleted an unnecessary table, so the table numbering has changed.

Point 6: Table 4, Table 5: same caption

Response 6: Thank you for pointing out the caption issue for Table 3 and Table 4. This concern is the same as the previous one: the two captions differ by only one word, "cross-subject" versus "cross-setup", the two evaluation criteria of the NTU-RGBD-120 dataset, and are easily confused. We have fixed the manuscript and repeatedly confirmed that the caption of each table is correct. Note: we deleted an unnecessary table, so the table numbering has changed.

Point 7: line 305: 4.5 Compare -> Comparison

Response 7: Thank you for pointing out this grammar error. We have corrected it, along with at least 20 other language errors; these corrections are marked in blue.

Point 8: Table 7: why not use 2s-AGCN and AGC-LSTM for comparison

Response 8: Thank you for your question. As stated on Page 9, since the NTU-RGBD-120 [1] dataset was released in 2019, no results have been reported on this dataset for 2s-AGCN [2] or AGC-LSTM [3], and it would be nontrivial, and possibly inaccurate, for us to repeat the experiments of these methods on NTU-RGBD-120. Therefore, we only cited the results of the relevant methods mentioned in [1]. However, we do compare with 2s-AGCN and AGC-LSTM on the NTU-RGBD dataset, on which the proposed method achieves better performance.

References

[1] Liu, J.; Shahroudy, A.; Perez, M.L.; Wang, G.; Duan, L.Y.; Chichung, A.K. NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[2] Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12026–12035.
[3] Si, C.; Chen, W.; Wang, W.; Wang, L.; Tan, T. An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1227–1236.
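For readers unfamiliar with the high-order features and the weighted fusion discussed in the responses above, a minimal sketch follows. Velocity is computed as the first-order temporal difference of joint coordinates, acceleration as the second-order difference, and the per-stream classification scores are combined by a weighted sum. The array shapes, zero-padding convention, weight values, and function names are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def high_order_features(joints):
    """joints: (T, J, 3) array of 3D joint coordinates over T frames.
    Returns velocity and acceleration as temporal differences,
    zero-padded at the start to keep the original length T
    (an assumed convention)."""
    velocity = np.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]          # first-order difference
    acceleration = np.zeros_like(joints)
    acceleration[1:] = velocity[1:] - velocity[:-1]  # second-order difference
    return velocity, acceleration

def weighted_fusion(scores, weights):
    """Weighted sum of per-stream class scores, in the spirit of Response 3:
    larger weights for the joint/bone/relative-distance streams, smaller
    weights for the auxiliary velocity/acceleration streams."""
    return sum(w * s for w, s in zip(weights, scores))

# Toy usage with illustrative data and weights.
T, J = 4, 2
joints = np.arange(T * J * 3, dtype=float).reshape(T, J, 3)
vel, acc = high_order_features(joints)
streams = [np.array([0.2, 0.8]), np.array([0.3, 0.7]), np.array([0.5, 0.5])]
fused = weighted_fusion(streams, [0.5, 0.3, 0.2])
```

Here each toy frame increases every coordinate by 6, so the velocity is a constant 6 after the padded first frame and the acceleration vanishes from the third frame onward; the fused scores are simply the weight-blended stream scores.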

Round 2

Reviewer 1 Report

While the paper can still be improved in multiple ways, this reviewer is fine if the editor wishes to accept the paper for publication.

Reviewer 2 Report

I agree with the authors' rebuttal.

The idea of feature extension is simple but it can improve the action recognition performance.
