Peer-Review Record

Semi-CNN Architecture for Effective Spatio-Temporal Learning in Action Recognition

Appl. Sci. 2020, 10(2), 557; https://doi.org/10.3390/app10020557
by Mei Chee Leong 1, Dilip K. Prasad 2,*, Yong Tsui Lee 3 and Feng Lin 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 December 2019 / Revised: 9 January 2020 / Accepted: 10 January 2020 / Published: 12 January 2020
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Action recognition has always been a topic of interest to the computer vision community, and increasing the accuracy of action classification has been considered a challenging task. This paper proposes an interesting 'semi-CNN' architecture that combines 1D, 2D, and 3D convolutions, improving the action classification results compared to systems using pure 3D CNN-based architectures, reducing the number of parameters to optimize, and lowering overfitting.

The results (both qualitative and quantitative) presented in the paper support the claims, and I believe this is an interesting architectural modification over 3D CNN-based action classifiers.

However, it would have been interesting to see:

- The results of the proposed algorithm using other datasets.
- More experiments for fine-tuning the network parameters, as mentioned in the conclusion.

Author Response

Response to reviewer 1

Action recognition has always been a topic of interest to the computer vision community, and increasing the accuracy of action classification has been considered a challenging task. This paper proposes an interesting 'semi-CNN' architecture that combines 1D, 2D, and 3D convolutions, improving the action classification results compared to systems using pure 3D CNN-based architectures, reducing the number of parameters to optimize, and lowering overfitting.

The results (both qualitative and quantitative) presented in the paper support the claims, and I believe this is an interesting architectural modification over 3D CNN-based action classifiers.

Response: Thank you for appreciating the focus of our work and the results.

However, it would have been interesting to see:

- The results of the proposed algorithm using other datasets.
- More experiments for fine-tuning the network parameters, as mentioned in the conclusion.

Response: Thank you for the useful suggestion to improve our work. To address the request for more experiments, we have added a new sub-section, 4.5 "Experiment on Discontinued Motion". This section evaluates the same architectures on video input with discontinued motion changes and presents the results in Table 4. Semi-CNN consistently achieved a 3% improvement in performance across all the models, while the performance of some of the 3D-CNN models deteriorated.

Reviewer 2 Report

Action recognition is an important topic and is useful in many application contexts.

The paper is well written and has a clear structure.

The important related work (3D CNN, two-streams CNNs, spatio-temporal fusion) is referenced in the paper.

The authors propose an efficient network architecture called Semi-CNN that combines 2D CNN spatial layers and 3D CNN spatio-temporal layers, utilizing three different backbones (VGG, ResNet, DenseNet).

The experiments and evaluation on the validation dataset are well motivated and sound.
They show a clear advantage of the Semi-CNN architecture with respect to the state of the art, as the number of model parameters could be reduced significantly.


Overall, this paper is a good contribution.

Author Response

We thank the reviewer for their time and effort in understanding our work and for appreciating it.
