Deep Learning for Activity Recognition Using Audio and Video
Round 1
Reviewer 1 Report
The author has presented an object classification model that incorporates an ensemble classifier combining ResNet and CNN to improve the precision of audio/video classification, using a collection of 500 sample videos. Moreover, the proposed approach reduces classification errors by using pre-processed data and exploding gradients to enhance features and accelerate intrusion detection. Overall, the results indicate that the precision of the proposed model on training and testing data in audio/video classification is higher than that of the SVC. However, I have the following suggestions:
1. Some of the references are out of date.
2. In the experiments, the author used only the Hockey and City Walking datasets. However, there are many other violent street actions, such as political violence, assault, arson, street blockades, sabotage, and property destruction. The author should therefore use up-to-date general datasets to re-evaluate the proposed model.
3. The performance of the proposed model should be compared with that of existing schemes.
4. The current version is missing some crucial diagrams, such as the system model architecture, an error convergence diagram, and performance comparison metrics.
5. The author claims to have used a Temporal-Robust Features model and a Bag of Words algorithm, but the manuscript does not clearly present the scheme or explain how the model was modified.
6. The author should include a pipeline flowchart of the proposed model to make it easier to understand.
7. What is the novel contribution of the proposed system with respect to the following work?
Author Response
Thank you very much for your comments.
Author Response File: Author Response.pdf
Reviewer 2 Report
- The authors are advised to revise the abstract to highlight their academic contributions and the methodology used in the study.
- The authors are advised to polish the manuscript's language to correct grammatical and typographical errors. For example, the letters "S" and "A" in the statement "This model leverages technology being used in State of the Art methods…" should be lowercase.
- The authors are advised to delete lines 75 to 78, which do not provide sufficient information to readers.
- Please provide more details about the parameter settings used in the study.
- The following studies should be properly cited in the manuscript: [1] Video-Based Detection Infrastructure Enhancement for Automated Ship Recognition and Behavior Analysis, Journal of Advanced Transportation, vol. 2020, pp. 1-12, 2020. [2] A Survey on Human Behavior Recognition Using Smartphone-Based Ultrasonic Signal, IEEE Access, vol. 7, pp. 100581-100604, 2019.
Author Response
Thank you very much for your comments.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
My comments have been addressed.