Communication
Peer-Review Record

Self-Supervised Monocular Depth Estimation Using Global and Local Mixed Multi-Scale Feature Enhancement Network for Low-Altitude UAV Remote Sensing

Remote Sens. 2023, 15(13), 3275; https://doi.org/10.3390/rs15133275
by Rong Chang 1, Kailong Yu 2,* and Yang Yang 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 18 April 2023 / Revised: 20 June 2023 / Accepted: 24 June 2023 / Published: 26 June 2023
(This article belongs to the Special Issue Drone Remote Sensing II)

Round 1

Reviewer 1 Report

The paper addresses the task of depth estimation from the perspective of UAVs flying at low altitudes. Notably, the authors argue this is more challenging than the more usual indoor or automotive scenarios due to the varying scale and non-uniform distribution of depth. For this reason, the authors propose a multi-scale depth decoder and a novel attention module inspired by the Squeeze-and-Excitation approach. The method is validated on the UAVid dataset with different metrics, showing that it is able to outperform some state-of-the-art approaches by a fair margin.
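For context, the Squeeze-and-Excitation mechanism the reviewer refers to can be sketched as follows. This is an illustrative NumPy toy (the function and weight names are hypothetical), not the authors' actual attention module:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation gating on a feature map x of shape (C, H, W).

    Squeeze: global average pooling over spatial dims -> (C,).
    Excitation: two fully connected layers (reduce to C//r, restore to C)
    with ReLU and sigmoid, producing per-channel gates in (0, 1).
    Scale: reweight each channel of x by its gate.
    """
    s = x.mean(axis=(1, 2))                    # squeeze: (C,)
    h = np.maximum(0.0, w1 @ s + b1)           # FC reduce + ReLU: (C//r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # FC restore + sigmoid: (C,)
    return x * g[:, None, None]                # scale channels
```

The "squeeze" collapses spatial information into per-channel statistics, and the "excitation" learns channel-wise gates that rescale the feature map, letting the network emphasize informative channels.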

There are a handful of points for criticism. First, the related-work review should be improved, especially by considering more recent unsupervised/self-supervised monocular depth estimation methods. Similarly, while the method from Godard et al. is a milestone in this area, there are more advanced models and techniques that could have been considered for the validation, but also for possibly building a more performant network, e.g., Vision Transformers (https://doi.org/10.1109/TIP.2022.3167307). The authors could explain why these have not been taken into consideration.

Regarding the methodology description, the structure of the networks lacks a precise explanation of the number of channels in each stage. Concerning the GD loss, the authors should acknowledge that previous papers have already experimented with second-order gradients and other techniques for introducing the smoothness prior.
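To illustrate the kind of smoothness prior under discussion, here is a minimal, hypothetical NumPy sketch of an edge-aware smoothness term with first- or second-order disparity gradients, in the spirit of Godard et al. (not the paper's actual GD loss):

```python
import numpy as np

def smoothness_loss(disp, img, order=1):
    """Edge-aware smoothness prior on a disparity map.

    Penalizes disparity gradients (first or second order), down-weighted
    where the image itself has strong gradients (edges), so depth is
    encouraged to be smooth except at image discontinuities.
    disp: (H, W) disparity; img: (H, W) grayscale image.
    """
    gx = lambda a: a[:, 1:] - a[:, :-1]   # horizontal finite difference
    gy = lambda a: a[1:, :] - a[:-1, :]   # vertical finite difference
    dx, dy = gx(disp), gy(disp)
    if order == 2:
        dx, dy = gx(dx), gy(dy)           # second-order gradients
    wx = np.exp(-np.abs(gx(img)))         # low weight at image edges
    wy = np.exp(-np.abs(gy(img)))
    if order == 2:
        wx = wx[:, :-1]                   # crop weights to gradient size
        wy = wy[:-1, :]
    return np.mean(np.abs(dx) * wx) + np.mean(np.abs(dy) * wy)
```

A second-order variant penalizes curvature rather than slope, so linearly sloping surfaces (common in oblique UAV views) incur no penalty, which is one motivation reviewers cite for it.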

Overall, the method shows improvement on the selected benchmark, but doubts remain that more recent methods not considered by the study may already have solved the investigated challenges.

There are only a few sentences that should be corrected, e.g., lines 140 and 354. The authors should perform only a quick review of the English style.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

I would recommend replacing the acronym UAV with the acronym UAS (Unmanned Aerial System) when you refer to the whole system (UAV and sensor) - lines 1, 6.

Please capitalize the first letters of "Unmanned Aerial Vehicles" - lines 19, 75, 148, 283. It is also unnecessary to repeat the full expansion once the acronym has been defined at its first mention in your manuscript.

Please use ":" or another way to separate the method name from the sentence in lines 309 and 318.

In line 3, you mention traditional depth estimation methods; perhaps you could name a few of them, in a line or two.


I recommend minor changes to the grammar and quality of the English language and minor text editing.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The analysis of single images is an important question for many problems. The possibility of reconstructing secondary features from the analysis of one image is a very urgent problem, especially in the areas of automated control and remote decision making. Possible areas of application include: remote sensing and mapping; autonomous driving and UAV control systems; security systems; entertainment and games; etc. Solving the problem of recovering volume data and reconstructing spatial data is complex, and its automation and intellectualization is an important area of research.

The strong point of the article is that the authors use deep learning to estimate depth from a single low-altitude image. This approach has a wide range of applications. The weak side of the work is that it shows excellent results only on well-lit data captured during summer over urban areas.

I have the following problems with this paper:

- A combined criterion for assessing the Loss function has been introduced in the work. This criterion includes many parameters, each of which affects the result. One of the parameters is an attention map, the training of which is also a separate task. The paper describes the application of the trained function, with carefully matched parameters, to the elements of the UAVid2020 dataset. The choice of criteria is not justified, although the result obtained is high. It is possible that excluding one parameter would not worsen the result, and might even improve it on some input data.

- The UAVid2020 dataset was used for the research. The work would be stronger if you showed the possibility of processing real data in other shooting conditions (sunset, rain, winter, fog). Demonstrating the versatility of your approach would expand its scope.

In my opinion, after some small changes, the paper can be recommended for acceptance.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have entirely addressed the issues raised in the first review.
