Article
Peer-Review Record

Extracting Objects’ Spatial–Temporal Information Based on Surveillance Videos and the Digital Surface Model

ISPRS Int. J. Geo-Inf. 2022, 11(2), 103; https://doi.org/10.3390/ijgi11020103
by Shijing Han 1,2, Xiaorui Dong 1, Xiangyang Hao 1,* and Shufeng Miao 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 November 2021 / Revised: 12 January 2022 / Accepted: 30 January 2022 / Published: 2 February 2022

Round 1

Reviewer 1 Report

In this manuscript, the authors propose a fusion framework of 3D geographic information and dynamic objects in surveillance video.

After reading the introduction, I was not convinced of why the image information needs to be intersected with the DSM. Perhaps the authors could provide more examples to clearly demonstrate the relevance of that part.

The study area used for the experiments does not seem complex enough in terms of topography, as the 3D scene appears to be very flat. This neither helps demonstrate the usefulness of this work nor shows that the proposed approach actually works in complex 3D terrain. It would be interesting to see the proposed approach applied to a more complex 3D scene.
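
As a rough illustration of this point (not the authors' implementation), the sketch below intersects a single viewing ray with two surfaces, a flat plane and a hypothetical uniform slope, to show how relief shifts the recovered ground position. The function name `intersect_ray_with_surface`, the camera pose, and both height functions are assumptions made for this example; a real DSM would be a raster sampled by interpolation.

```python
import math


def intersect_ray_with_surface(origin, direction, height_at,
                               step=0.05, max_range=500.0):
    """March along a viewing ray until it drops below the surface,
    then refine the crossing point by bisection.

    origin, direction: (x, y, z) tuples in a local metric frame (assumed).
    height_at: callable (x, y) -> surface height; stands in for a DSM lookup.
    Returns the (x, y, z) hit point, or None if nothing is hit in range.
    """
    ox, oy, oz = origin
    norm = math.sqrt(sum(c * c for c in direction))
    dx, dy, dz = (c / norm for c in direction)

    def clearance(t):
        # Height of the ray above the surface at distance t along the ray.
        return (oz + t * dz) - height_at(ox + t * dx, oy + t * dy)

    if clearance(0.0) <= 0.0:
        return None  # camera is at or below the surface

    t_prev, t = 0.0, step
    while t <= max_range:
        if clearance(t) <= 0.0:
            lo, hi = t_prev, t
            for _ in range(40):  # bisection between the last two samples
                mid = 0.5 * (lo + hi)
                if clearance(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            t_hit = 0.5 * (lo + hi)
            x, y = ox + t_hit * dx, oy + t_hit * dy
            return (x, y, height_at(x, y))
        t_prev = t
        t += step
    return None


# Hypothetical camera 10 m above the origin, looking forward and down.
camera = (0.0, 0.0, 10.0)
ray = (1.0, 0.0, -0.5)

flat_hit = intersect_ray_with_surface(camera, ray, lambda x, y: 0.0)
slope_hit = intersect_ray_with_surface(camera, ray, lambda x, y: 0.25 * x)

print("flat plane hit: ", flat_hit)
print("sloped DSM hit: ", slope_hit)
```

Under these assumed values, the same pixel ray reaches the flat plane at x = 20 m but meets the slope at about x = 13.3 m, so a flat-ground assumption would misplace the object by several metres. An experiment on terrain with real relief would exercise exactly this difference.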

In the related work chapter, relevant references are missing. For instance, the methods for video geo-spatialization are not supported by any reference. Also, in line 99, the authors mention "...several disadvantages..."; I would rather call them challenges.

The sentence starting in line 166 "Wang [31] proposed a shared model ..." is not clear. Please rewrite.

In Section 4 (Principles and Methods), a short paragraph introducing the section is missing. That paragraph should briefly describe how the subsections fit together in the implemented methodology, and how that connects, for instance, with Figure 1.

In the Experimental Analysis section, line 349, the authors mention that "The data used in this paper are translated and rotated." but do not explain why they performed those operations. Could you clarify this?

In line 355, a reference seems to be missing when the authors mention that "to calibrate this camera by Zhang Shengyou calibration." Please provide a reference for the calibration method used.

Figure 11, on page 14, is missing.

Some other language mistakes and typos:

- In the abstract, the beginning of the second sentence "To overcome the limitations..." should be replaced by "To overcome such limitations..."

- The expression "and so on." at the end of the abstract doesn't add any value and should be avoided.

- The paper should be proofread by a native English speaker, as it contains several grammatical mistakes and other typos.

- The sentences should be in the past tense, since they describe experiments and results that were carried out and obtained earlier.

- Change "2ed" to "2nd" throughout the manuscript (there are several occurrences).

- Line 178 "data sets" should be replaced by "datasets"

- Line 212: the word "position" is duplicated

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a framework that enables the extraction of 3D geographic information about moving objects from surveillance video streams. The framework relies on well-established computer vision methods such as camera calibration and deep learning-based object detection. All the relevant steps are thoroughly described, and several experiments are conducted whose results are promising. Because the presented research relies on existing methods, the main contribution is the specification of the proposed framework, which is unfortunately not properly described. In my opinion, the manuscript needs a serious major revision to address this issue. I'm not a native English speaker, so I don't like to judge language, but it seems to me that the language also needs significant improvement. Here is a list of points that I noted and that need to be addressed in the revised version. The list is not exhaustive: most of the comments concern the abstract, and it should be used as a guideline for revising the entire manuscript.

1) Consider replacing the abbreviation DSM in the title, since it is not clear to a wider audience. Maybe you can replace it with "surface model" or "terrain model".

2) Line 16: Consider changing "dynamic objects in surveillance video" to "dynamic objects extracted from surveillance video". Maybe you should use "moving objects" instead of "dynamic objects".

3) Line 17: "It is a general framework" -> "We propose a general framework". 

4) Lines 17-19: These two sentences are not clear to me, please consider rephrasing them. E.g. the framework does not rely on specific algorithms and models for object extraction. "Research methods" are something you use in research, and not something you propose in the paper.

5) Lines 19-22: You present results without first properly introducing the method. Do these values correspond to the object detection task? You previously said that the framework is not limited to certain algorithms and models. How did you get those results? Avoid using abbreviations in the abstract.

6) Lines 22-23: What idea? 

7) Line 179: The first sentence seems like a title?

8) Line 436: There is a blank space where Figure 11 should be placed.

9) Figure 15: You stated that your method relies on a DSM (I would prefer the term DTM), i.e., it is not limited to a flat surface as is, e.g., the method described by Xie et al. [7, 8], but it seems that your experiments and visualization are also done on a flat surface.

10) Lines 562-563: What significance? You should explain (beginning with the abstract and ending in the conclusions) what the gains of the proposed method are. Why is it important to acquire the coordinates of moving objects from the video stream, and how could they be used?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The surveillance video and SM is an important aspect not only in data processing but also in security. In this paper, a fusion framework of 3D geographic information and objects is described. The paper is interesting and the topic is worth investigating, but the term "framework" implies that some library implementing the proposal will be provided. Is it published somewhere, for instance on GitHub?

Moreover, discuss the need for your proposal and the current state of research in terms of security; for instance, discuss the agent architecture of an intelligent medical system with FL and blockchain. Such ideas would be interesting because your proposal can be applied in many areas.
The proposal is described, but some pseudocode should be added to make the implementation easier to understand.

Is it possible to use your solution in real-time? Can you show time/computational complexities?

The proposal uses YOLOv5, but did you freeze any layers?

A comparison with the state of the art should also be added to show the real impact of the proposal relative to current methods.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors addressed all the issues I had, so my advice is to accept the paper.

Reviewer 3 Report

Accept

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

