Open Access Article

Peer-Review Record

Deep Learning for Activity Recognition Using Audio and Video

Electronics 2022, 11(5), 782; https://doi.org/10.3390/electronics11050782

by Francisco Reinolds¹, Cristiana Neto²,³ and José Machado²,³,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 31 January 2022 / Revised: 22 February 2022 / Accepted: 25 February 2022 / Published: 3 March 2022

(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence and Edge Computing Applications)

Round 1

Reviewer 1 Report

The author has presented an object classification model that incorporates an ensemble classifier combining ResNet and a CNN to enhance the precision of audio/video classification, using 500 collected sample videos. Moreover, the proposed approach minimizes classification errors through the use of pre-processed data and Exploding Gradients to enhance features to accelerate intrusion detection. Overall, the results indicate that the precision of the proposed model on training and testing data in audio/video classification analysis is higher than that of the SVC. However, I have some suggestions, as follows:

1. Some of the references are out of date.

2. In the experiment, the author has used only the Hockey and city walking datasets. It seems that there are other violent street actions, such as political violence, assault, arson, street blockades, sabotage, and property destruction. Thus, the author should use an up-to-date general dataset to re-run the experiments with the proposed model.

3. The performance of the proposed model needs to be compared with that of existing schemes.

4. In the current version, some crucial descriptive diagrams are missing, such as the system model architecture, the error convergence diagram, and the performance comparison metrics.

5. The author claims to have used a Temporal-Robust features model and a Bag of Words algorithm, but there is no clear presentation of the scheme or of how the model has been modified.

6. The author should include a pipeline flowchart of the proposed model to provide a clearer understanding.

7. What is the novel contribution of the proposed system with respect to the following work?

Author Response

Thank you very much for your comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors are recommended to revise the abstract to highlight their academic contributions and the methodology used in the study.
The authors are recommended to polish the manuscript language to correct grammar and typographical mistakes. E.g., the letters "S" and "A" in the statement “This model leverages technology being used in State of the Art methods…” should be lower case.
The authors are recommended to delete lines 75 to 78, which do not provide sufficient information to readers.
Please provide more details for the parameter settings in your study.
The following studies are recommended to be properly cited in the study: [1] Video-Based Detection Infrastructure Enhancement for Automated Ship Recognition and Behavior Analysis, Journal of Advanced Transportation, vol. 2020, pp. 1-12, 2020. [2] A Survey on Human Behavior Recognition Using Smartphone-Based Ultrasonic Signal, IEEE Access, vol. 7, pp. 100581-100604, 2019.

Author Response

Thank you very much for your comments.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

My comments have been addressed.
