Joint Soft–Hard Attention for Self-Supervised Monocular Depth Estimation
Round 1
Reviewer 1 Report
I found this work robust and of interest. I have just a few minor concerns, such as double-checking the formulation of the equations in Sec. 3.2.1, Spatial Attention.
I would also ask the authors to clarify the interaction between the soft and hard attention strategies, which at times goes a little out of focus.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper presents a method for improving upon the current state of the art in self-supervised monocular depth estimation. The authors employ a soft attention module between the network encoder and decoder and a hard attention strategy for multi-scale fusion.
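For orientation, a minimal sketch of what such a design might look like is given below, written in PyTorch style. This is not the authors' implementation: the module structure, tensor shapes, and the per-pixel confidence scores used for hard selection are all illustrative assumptions inferred from the description above (and from the SA/CA components mentioned in the Table 4 ablation).

```python
# Illustrative sketch only -- not the paper's code. Assumes encoder features
# of shape (B, C, H, W) and per-scale confidence maps for hard fusion.
import torch
import torch.nn as nn


class SoftAttention(nn.Module):
    """Channel + spatial soft attention reweighting encoder features
    before they are passed to the decoder (assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze-and-excitation style gating.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 2D gate computed from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # reweight channels
        pooled = torch.cat(
            [x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1
        )
        return x * self.spatial_gate(pooled)  # reweight spatial locations


def hard_attention_fusion(disparities: torch.Tensor,
                          scores: torch.Tensor) -> torch.Tensor:
    """Per-pixel hard selection among multi-scale disparity maps.

    disparities: (B, S, H, W) predictions upsampled to a common resolution.
    scores:      (B, S, H, W) per-scale confidence (assumed to be available).
    """
    index = scores.argmax(dim=1, keepdim=True)  # pick exactly one scale per pixel
    return torch.gather(disparities, 1, index).squeeze(1)  # (B, H, W)
```

The key contrast the sketch tries to make concrete: soft attention produces continuous weights in [0, 1] that rescale features, whereas hard attention makes a discrete, winner-take-all choice among the multi-scale outputs.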
The reported results show an improvement for monocular depth estimation. However, the improvement is rather small, and Table 4 shows that the hard attention strategy improves only marginally upon the Baseline + SA + CA. In fact, the improvement may be small enough to be attributed to different random initializations.
The qualitative results do show an improvement in some scenarios, but it is hard to judge based on those alone.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The authors propose a self-supervised monocular depth estimation approach based on soft and hard attention, integrating attention into the model architecture. The methods have been properly described and implemented, and diagrams have also been provided.
The proposed approach was benchmarked against other SOTA approaches, showing very promising results on a number of benchmark datasets.
Comments:
1. line 29: more references are needed.
2. line 47: what is meant by "A few special pixels"?
3. Provide a more detailed discussion of the proposed method's performance with respect to the other methods appearing in Table 3. The results appear to be mixed, so what is the advantage of this approach compared to the others?
4. Please provide some information on training and inference run times, as well as the number of parameters of the end-to-end architecture.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Ok