Article
Peer-Review Record

MgMViT: Multi-Granularity and Multi-Scale Vision Transformer for Efficient Action Recognition

Electronics 2024, 13(5), 948; https://doi.org/10.3390/electronics13050948
by Hua Huo * and Bingjie Li
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 30 January 2024 / Revised: 25 February 2024 / Accepted: 26 February 2024 / Published: 29 February 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Congratulations, this is an interesting investigation; however...

It would be useful to describe in detail the metrics used to determine the performance of the model.

Author Response

We gratefully thank the editor and the reviewers for their comments and suggestions, and for their great efforts in helping us improve the manuscript. We have tried our best to address all the comments and hope that the revised manuscript now meets the standard of the journal. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents MgMViT, a novel transformer-based architecture for efficient video action recognition. A multi-granularity and multi-scale fusion approach is proposed, which markedly reduces the number of model parameters and improves efficiency whilst maintaining competitive accuracy on the Kinetics-400 and Something-Something-v2 datasets. The work is of decent interest to the video understanding community. The paper is well structured, with good attention paid to implementation details.

A major concern is that the experiments are insufficient. Swin Transformer, Video Transformer Network, and MoViNet are reviewed but not included in the experimental comparison, even though the latter two are designed for efficiency.

This proposal is heavily inspired by MViTv2, and many of that work's experimental results are used for reference. It would be helpful if the authors could confirm that they use identical training and experimental settings.

Since this paper proposes a plug-and-play transformer module, it would help to demonstrate its effect on other tasks, such as video keyframe retrieval and video captioning.

 

Comments on the Quality of English Language

Page 2 Line 73: "which can not compromise.." does not make sense.

Abbreviations are not used consistently, e.g., VTN is not defined at its first occurrence.

Author Response

We gratefully thank the editor and the reviewers for their comments and suggestions, and for their great efforts in helping us improve the manuscript. We have tried our best to address all the comments and hope that the revised manuscript now meets the standard of the journal. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The article seems interesting and addresses an important issue, but I have some comments:

1. It is unclear whether the comparative analysis with the studies of other authors is appropriate. Looking at the tables of experimental results, it becomes evident that, in certain aspects, the proposed solution is not optimal, with the Top-5 accuracy being among the lowest (Table 4). Nonetheless, it should be acknowledged that the objectives and performance metrics of other authors' work may differ.

2. The graphical results of the experiments appear weak and redundant, as they largely overlap with the data presented in the tables. Therefore, additional experiments are necessary.

3. Figures 1 to 3 are very abstract; they therefore require more detailed and specific information relevant to this study.

 

4. Is it possible to consider more metrics than just Top-1 and Top-5?

5. A discussion section would also be very useful.

Author Response

We gratefully thank the editor and the reviewers for their comments and suggestions, and for their great efforts in helping us improve the manuscript. We have tried our best to address all the comments and hope that the revised manuscript now meets the standard of the journal. Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for the explanations and corrections made to the article.
