Peer-Review Record

TUHAD: Taekwondo Unit Technique Human Action Dataset with Key Frame-Based CNN Action Recognition

Sensors 2020, 20(17), 4871; https://doi.org/10.3390/s20174871
by Jinkue Lee and Hoeryong Jung *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 August 2020 / Revised: 25 August 2020 / Accepted: 26 August 2020 / Published: 28 August 2020
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition)

Round 1

Reviewer 1 Report

 

This paper provides two main contributions:

  • The description of a new dataset containing multi-modal image sequences of Poomsae, with more than 1900 actions from 10 experts.
  • The authors propose an action recognition system using a CNN applied to key-frames.

 

I think the paper is interesting and it is well presented. I have some comments in order to improve it:

  • Figure 2: is the reference wall the same for both cameras? I do not understand this distribution very well. I’d suggest including both cameras in the same map/figure to show their relative position.
  • Figure 4: T is associated with every subject, isn’t it?
  • Figure 6: optimize the space by reducing the background area shown in the four figures.
  • Tables 1 and 2 are a bit repetitive. My suggestion would be to state that every subject repeats the same action 12 times and then note the exceptions: subjects that repeated 0 or 16 times.
  • I’d suggest expanding the description of the labelling process. How many people labelled the data? Was there any agreement protocol?
  • Section 2.2.2: I’d appreciate more details about the CNN. I’d suggest including a table with all the configuration options, including the number of parameters.
  • Did you do a subject-wise cross-validation, training and testing the system with different subjects? Or did you include all recordings and randomly separate 90% for training?
  • I’d like to know the influence of the background/environment. Did you run experiments training and testing with different environments?
  • Table 4: use the same number of decimals; one decimal would be enough.
  • Figure 10: the chart is a bit confusing: a line graph suggests a relationship/evolution between consecutive modalities when there is none. Perhaps a bar graph would be better? The same applies to Figure 13.
  • In all figures, I’d suggest considering only 1 decimal for all accuracies.
  • About the key-frames, how does the system know when the action starts or ends? Is the time segmentation provided to the system? This is an important limitation that should be commented in the conclusions.
  • At the end, it would be interesting to provide accuracy numbers combining both side views.
  • It would be interesting to provide some numbers comparing different action recognition systems. This would be the way to demonstrate that the proposed method outperforms previous studies.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper proposes an interesting dataset for action recognition that focuses on Taekwondo activities. To the best of my knowledge, there are no other representative datasets on Taekwondo activities.

The dataset is well designed and contains RGB, depth, and IR modalities. Ten Taekwondo experts were invited to perform the actions.

I have some minor suggestions:

(1) The authors need to analyse why the accuracy of some actions is much higher than that of others.

(2) The motion information is not used for action recognition. Will this degrade the performance?

(3) The following recent works on action recognition are relevant to this paper and need to be discussed in Section 1: Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates; Skeleton-Based Online Action Prediction Using Scale Selection Network.

(4) If possible, the authors are suggested to add a table on existing sport action-based datasets and compare this dataset with them.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

In this paper, a Taekwondo Unit-technique Human Action Dataset (TUHAD), consisting of an arranged multi-modal image sequence of Poomsae actions, is proposed together with a key frame-based convolutional neural network architecture.

The study includes an adequate state of the art containing other studies on vision-based Taekwondo action recognition, such as [28], [36], [37] or [38]. However, there is no quantitative comparison in terms of performance and computation time with respect to the aforementioned works; this comparison must be included in order to validate the proposal with respect to the state of the art.

The level of the use of the English language is unsatisfactory to be included in this publication. Authors whose primary language is not English are advised to seek help in the preparation of the paper. 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I think the authors have properly addressed my comments and the paper can be accepted.

I have several minor aspects:

1.- Regarding the cross-validation strategy (previous comment 1.8): I’d suggest including in the paper the explanation given to the reviewer, noting that the division was done at the example level, with samples from the same subjects appearing in all subsets.
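The distinction at issue here, between an example-level split and the subject-wise split asked about in Round 1, can be sketched as follows. This is an illustrative sketch only, not the authors' code; the counts (10 subjects, 12 clips each) are hypothetical placeholders.

```python
import random

# Illustrative sketch (not the authors' code): contrast an example-level
# split, where each subject's samples can land in both train and test,
# with a subject-wise split, where whole subjects are held out.
samples = [{"subject": s, "clip": c} for s in range(10) for c in range(12)]

# Example-level split (as described in the response): shuffle all samples
# and cut 90% / 10%, so each subject typically contributes clips to both
# the training and the test subsets.
random.seed(0)
shuffled = samples[:]
random.shuffle(shuffled)
cut = int(0.9 * len(shuffled))
train_ex, test_ex = shuffled[:cut], shuffled[cut:]

# Subject-wise split (the alternative raised in Round 1): hold out entire
# subjects, so no test subject is ever seen during training.
held_out = {8, 9}
train_sw = [x for x in samples if x["subject"] not in held_out]
test_sw = [x for x in samples if x["subject"] in held_out]

# By construction, the subject-wise train/test sets share no subjects.
assert not ({x["subject"] for x in train_sw} & {x["subject"] for x in test_sw})
print(len(train_ex), len(test_ex), len(train_sw), len(test_sw))
```

A subject-wise split is generally the stricter test of generalization to unseen performers, which is why the reviewer asks that the example-level choice be stated explicitly in the paper.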

2.- Line 362: should the reference to Table 8 be a reference to Table 9?

3.- Regarding Table 8 (multi-view), I think that including the pseudo-multi-view results is a bit confusing, because it is very strange that combining both views gives worse results. My suggestion would be to explain that it was not possible to provide multi-view results because the two views are not synchronized, and to remove the pseudo-multi-view results.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Once the proposed changes have been implemented, the paper can be accepted for publication.

Author Response

[Comment 3.1]

Once the proposed changes have been implemented, the paper can be accepted for publication.

 

[Response 3.1]

Thank you for accepting the proposed revisions. The revisions proposed in the response have been fully implemented in the revised manuscript.
