Article
Peer-Review Record

Background Invariant Faster Motion Modeling for Drone Action Recognition

by Ketan Kotecha 1,*, Deepak Garg 2, Balmukund Mishra 2, Pratik Narang 3 and Vipul Kumar Mishra 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 25 July 2021 / Revised: 25 August 2021 / Accepted: 25 August 2021 / Published: 31 August 2021

Round 1

Reviewer 1 Report

Research on the use of AI for the purpose of recognizing the image obtained from a drone is valuable and very necessary. They already prove the usefulness of this technology. The authors focused on adapting the image analysis technology to the quality and specificity of the material obtained from the drone, which gave impressive results.

Taking into account the labor-intensive nature of the development and implementation of the technology developed by the authors, it is necessary to consider whether the development of appropriate flight performance standards will allow for obtaining data whose quality will be similar to that obtained from the ground.

Comments for author File: Comments.pdf

Author Response

Reviewer-1

Comment 1:

Research on the use of AI for the purpose of recognizing the image obtained from a drone is valuable and very necessary. They already prove the usefulness of this technology. The authors focused on adapting the image analysis technology to the quality and specificity of the material obtained from the drone, which gave impressive results.

Response: Thank you for this appreciation.

Comment 2:

Taking into account the labor-intensive nature of the development and implementation of the technology developed by the authors, it is necessary to consider whether the development of appropriate flight performance standards will allow for obtaining data whose quality will be similar to that obtained from the ground.

Response: Thank you for this suggestion. We had experimented with such variations, but they were not included in the previous version of the paper. We have added a paragraph describing the effect of drone flying height and other drone-related parameters; it can be seen at line 498, in the Discussion section of the paper.

 

Reviewer-1 additional comments and responses

Comment 1:

Please correct “Aerial and drone surveillance29can”

Response: Corrected; the "29" is not required here, so we removed it. The change can be seen at line 38.

Comment 2:

Please check “EPE loss1”

Response: We have added the full form of EPE loss; it should now make sense to the reader.
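For reference, in the optical-flow literature EPE typically stands for end-point error; a common formulation (a sketch for the reader, since the paper's exact definition is not quoted here) is

\mathrm{EPE} = \frac{1}{N} \sum_{i=1}^{N} \lVert \hat{u}_i - u_i \rVert_2,

where \hat{u}_i and u_i are the predicted and ground-truth flow vectors at pixel i, and N is the number of pixels.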

Comment 3:

Line 154 - Please change the layout of Figure 1. The current one is uncomfortable for the reader.

Response: We have slightly changed the layout and orientation of Figure 1 for clarity.

Comment 4:

Line 280 - Please check the subsection numbering

Response: Subsection 4.2 was incorrectly numbered as 4.1; we have updated the number and heading. The change can be seen at line 288.

Comment 5:

Line 466 - Please check the figure numbering

Response: The numbering of Figures 3 and 4 has been updated; they were previously Figures 4 and 5, which was out of order.

Comment 6:

Line 471 - Please check the figure numbering

Response: Same as above: the numbering of Figures 3 and 4 has been corrected.

 

Author Response File: Author Response.docx

Reviewer 2 Report

The authors highlighted a novel architecture that eliminates the dependency on optical flow. It is good work; however, there are a few comments to improve the paper:

1. Activity recognition is mainly discussed in "Multi-user activity recognition: Challenges and opportunities"; therefore, I suggest the authors read this paper and use it to discuss the concept in more detail.

2. I suggest the authors create a table comparing the existing work in the literature section.

 

Author Response

Reviewer-2

Comment 1:

The authors highlighted a novel architecture that eliminates the dependency on optical flow. It is good work; however, there are a few comments to improve the paper.

Response: Thank you for this appreciation.

Comment 2:

Activity recognition is mainly discussed in "Multi-user activity recognition: Challenges and opportunities"; therefore, I suggest the authors read this paper and use it to discuss the concept in more detail.

Response: Thank you for this suggestion. This paper is a well-organized summary of the various challenges and opportunities in multi-human action recognition within the same video frame. We read it and added lines to our introduction and literature sections wherever required. The added lines are:

1. "Various approaches for multi-human action recognition are summarized in [56], with a detailed discussion of the opportunities and challenges." (at line 47)

2. "However, applying these models when multiple humans are present in the same scene, especially in an unstructured environment, is complicated; article [56] gives a detailed discussion of the complexity and opportunities of this task." (at line 99)

Comment 3:

I suggest the authors create a table comparing the existing work in the literature section.

 

Response: Thank you for this suggestion. We have added a table to the literature section that compares key research and datasets proposed in the literature for drone-based surveillance. The change can be seen in Table 1 on page 3 of the paper.

Author Response File: Author Response.docx

Reviewer 3 Report

Do not use first person such as we, us, etc.

Why is the whole of Section 4.1 in italics? Also 5.1 - perhaps the italic selection was applied to the whole paragraph instead of just the heading.

In the discussion and the conclusion (these two could actually be joined), it would be nice to have more discussion: not only a summary of the paper, but also of the results and a comparison of the performance. What were the failure rates and errors in the different model observations? Some statistical analysis would help. How does this compare to systems and models not mentioned in this paper? These are some ideas of things to discuss and add.

Author Response

Reviewer-3

Comment 1:

Do not use first person such as we, us, etc.

Response: Thank you for this suggestion. We have reviewed the paper and corrected all mistakes of this type.

For example:

"Therefore, we come up with a novel architecture proposed in this paper." → "Therefore, this paper proposes a novel architecture."

Comment 2:

Why is the whole of Section 4.1 in italics? Also 5.1 - perhaps the italic selection was applied to the whole paragraph instead of just the heading.

Response: Thank you for your suggestion. We have corrected the subheadings, which were previously italic, to non-italic throughout the paper.

Comment 3:

In the discussion and the conclusion (these two could actually be joined), it would be nice to have more discussion: not only a summary of the paper, but also of the results and a comparison of the performance. What were the failure rates and errors in the different model observations? Some statistical analysis would help. How does this compare to systems and models not mentioned in this paper? These are some ideas of things to discuss and add.

Response: We have added a paragraph on the failure cases and limitations of the proposed approach; it can be seen in the paper from lines 491 to 497. The added paragraph is: "However, these performances have certain limitations: as the height of the drone increases, the human features change, especially in footage captured from directly above. These experiments were performed on drone footage captured with our DJI Mavic Pro drone, whose camera records at 60 frames per second. Images and videos were captured from heights of 10 to 50 meters. Some important failure cases arose when we tested on the aerial images of the Okutama dataset, since those images were captured from heights of more than 60 meters."

 

Author Response File: Author Response.docx

Reviewer 4 Report

Strengths

  • Clear motivation
  • Good literature review
  • Good experimental setup and results

Weaknesses – presentation issues (see detailed comments)

  • Problems with formal notation (see detailed comments)
  • Suggestion – create a formal notation section and include all the definitions.
  • Section and subsection headers are improperly capitalized
  • Improper capitalization of words in the middle of the sentence (ex: Besides, For in line 435)
  • Missing words, incomplete or incorrect statements

Other questions:

  • Line 82 A unique five-class-action dataset for aerial drone surveillance was introduced, with each image containing a single set of actions. Suggestion: give the number, 480 clips having 14,400 frames. Also, give the distribution of the number of objects in each class. Will this dataset be made public?
  • Line 190 The dataset contains a total of 480 clips having 14,400 frames. Did you manually label these frames? If yes, this should be mentioned.

Abstract: The two issues should be clearly identified: i) the time-consuming aspect and ii) the lack of diverse action recognition models.

Presentation problems (Detailed Comments)

Line 14 - Most state-of-the-art methods heavily rely on optical flow for motion modeling and representation, while it is time-consuming (and is time consuming?)

Line 38 - Aerial and drone surveillance29?

Line 56 – What is EPE loss (full form)?

Line 61 – Please give citations.

Line 101 Which enable the models to lean the features  (incorrect language)

Line 117 where both 2d and 3d features 

Line 144 – Full form of YOLO

Line 145 - A separate module for detecting the human body and its extension with other models for the detection of action are used in [50].

Line 182 dataset preprocessing  (Capitalization)

Line 188- Human walking and human standing  (incorrect Capitalization)

Line 193 Table1, line 208 Figure1, equation8 (spacing problem) – You should check for this problem throughout the paper since it is repeated in many places.

Line 211 - each video clip Xi = [c ∗ l ∗ h ∗ w]. Where I = f(x, y) -- This statement does not appear to be correct. There is no I in the clip. Does I represent the image? Also X_i (this should be a subscript). What is i? Is i between 1 and 480?

The mathematical formulation of the video clip and its frames should be rewritten.
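One consistent reconstruction (an illustrative assumption, not necessarily the authors' intended definition) would be

X_i \in \mathbb{R}^{c \times l \times h \times w}, \qquad i = 1, \dots, 480,

where c is the number of channels, l the number of frames per clip, and h \times w the spatial resolution, with individual frames written I_t(x, y) for t = 1, \dots, l. The same symbol X_i should then replace V_i in Equation 6 for consistency.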

Line 269 – Please give equations for calculating localization loss and classification loss used in this paper with appropriate citation (if necessary).
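For illustration, a typical detector pairs a smooth-L1 localization term with a cross-entropy classification term (standard SSD-style losses, given here as an example rather than the paper's exact formulation):

L_{\mathrm{loc}} = \sum_{i \in \mathrm{Pos}} \mathrm{smooth}_{L_1}\!\left(\hat{b}_i - b_i\right), \qquad L_{\mathrm{cls}} = -\sum_{i} \log p_i(c_i),

where \hat{b}_i and b_i are the predicted and ground-truth box offsets, and p_i(c_i) is the predicted probability of the true class c_i.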

Equation 5, line 302 – There is no S in the equation. Also, why is r starting from 2? There should be a remark.

Line 359 – You mean operational architecture?

Line 366 Mean square error (mse)

Equation 6 – In line 211, Xi was used for a video clip; here Vi is used. All of these notations should be consistent.

Author Response

Reviewer-4

Comment 1:

Create a formal notation section and include all the definitions.

Section and subsection headers are improperly capitalized

 

Response: Thank you for the suggestion. We have corrected the issues related to section and subsection headers throughout the paper; the changes can be seen in the revised paper.

Comment 2:

Improper capitalization of words in the middle of the sentence (ex: Besides, For in line 435)

Missing words, incomplete or incorrect statements.

 

Response: Thank you for this suggestion. We have corrected the issue; however, in some places capitalization is required because a new sentence begins there.

Comment 3:

Line 82 A unique five-class-action dataset for aerial drone surveillance was introduced, with each image containing a single set of actions. Suggestion: Give the number of 480 clips having 14,400 frames. Also, give the distribution of the number of objects in each class. Will this dataset be made public?

 

Response: Thank you for the suggestion. We have added one line describing the total number of video clips in the proposed dataset; it can be seen at line 84. The other details of this dataset are described in the dataset section. The added line is:

 

"A unique five-class-action dataset for aerial drone surveillance was introduced, with approximately 480 video clips distributed across five different classes."

 

Yes, this dataset will be made public; we are planning to create a website where all our datasets, mostly captured from drones, will be available.

 

Comment 4:

Line 190 The dataset contains a total of 480 clips having 14,400 frames. Did you manually label these frames? If yes, this should be mentioned.

 

Response: We have incorporated the suggestion: we manually labeled this dataset into five different classes, and this is now stated explicitly. The change can be seen at line 197. The revised text is:

 

"The dataset contains a total of 480 clips having 14,400 frames. The dataset is manually labeled into five different classes, and the number of videos is equally divided among the five classes."

 

Author Response File: Author Response.docx
