Review
Peer-Review Record

Attention Mechanism Used in Monocular Depth Estimation: An Overview

Appl. Sci. 2023, 13(17), 9940; https://doi.org/10.3390/app13179940
by Yundong Li *, Xiaokun Wei and Hanlu Fan
Reviewer 1: Anonymous
Reviewer 3:
Submission received: 11 July 2023 / Revised: 18 August 2023 / Accepted: 25 August 2023 / Published: 2 September 2023

Round 1

Reviewer 1 Report

Monocular depth estimation is a fundamental task in computer vision. This paper presents a comprehensive review of attention-based methods, which may serve as a guideline for new researchers in this area. However, the reviewer has the following comments that the authors may wish to consider.

1. Page 4, Section 2.2, "Channel cross-attention" and "Spatial cross-attention": please provide references for these two types of attention.

2. Page 5, Section 3.1, line 203: the authors should clarify the rationale for classifying methods into these four types.

3. Lines 205, 208, 211, and 214: Fig. 1, …, Fig. 4 → Fig. 5, …, Fig. 8?

4. The logical coherence of this paper can be improved. In its current version, the paper pays more attention to the coverage of references while neglecting their relevance. The text provides a substantial number of references and discusses the characteristics of each method, but it goes no further than that. The authors are encouraged to focus more on the logical coherence and causal relationships within the topic.

5. Page 17, line 512: is this the title of Section 4? Section 4 is missing.

6. Section 4 needs more detail. For example, why is Method A more effective than Method B? What are the underlying principles and their significance? Simply listing a set of data is not meaningful.

7. In a review paper, a key part (probably the most important one) is the discussion of challenges and future directions, i.e., Section 5 of this paper. However, this part lacks detail. The authors are strongly encouraged to enrich this section.

8. The authors should thoroughly check the whole paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper provides an in-depth investigation of attention mechanisms in monocular depth estimation (MDE), a fundamental task in computer vision with applications in virtual reality, 3D reconstruction, and robotic navigation. The authors highlight the significant progress made by convolutional neural network (CNN)-based methods compared to traditional visual cue-based approaches. However, they also address the limitations of CNNs, specifically the degradation in MDE performance caused by their local receptive fields. To close this gap, attention mechanisms have been introduced to model long-range dependencies. The paper offers a comprehensive survey of attention mechanisms in the context of MDE. By categorizing recent attention-related works into CNN-based, Transformer-based, and hybrid (CNN-Transformer-based) approaches, the authors provide a clear understanding of how attention affects the extraction of global features in MDE. The contribution is fair. The paper is well written, although some typos remain; it should be revised again.

However, the literature review should be substantially expanded. More should be explained about CNNs and AI during COVID-19. For example, the following papers should be added and discussed mathematically: COVID-19 and human development: An approach for classification of HDI with deep CNN. Biomed. Signal Process. Control. 81: 104499 (2023); A Digital Human Emotion Modeling Application Using Metaverse Technology in the Post-COVID-19 Era. HCI (19) 2023: 480-489; Feature selection of pre-trained shallow CNN using the QLESCA optimizer: COVID-19 detection as a case study. Appl. Intell. 53(15): 18630-18652 (2023); Roles of Artificial Intelligence and Extended Reality Development in the Post-COVID-19 Era. HCI (41) 2021: 445-454. Also, to provide a more concrete evaluation of the performance comparison, the paper should clearly specify the evaluation metrics used to assess the attention-based MDE methods. Standard metrics, such as mean absolute error (MAE) and root mean squared error (RMSE), are commonly used in depth estimation tasks and could be included for clarity.

In conclusion, the paper makes a good contribution to the field. It could be strengthened by a more thorough review of related work. With these improvements, the paper will have a broader impact and be more useful to the research community. Therefore, I recommend that this paper be accepted after appropriate revisions. Otherwise, I am afraid I would have to reject it.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The main contributions of the paper should be stated at the end of the Introduction section.

In Fig. 1, the input should be specified. It would be better to show the output after each sub-block in Fig. 1; this would make the figure easier for non-expert readers to understand.

In Fig. 2 and Fig. 3, the authors have used different shapes/dimensions for MaxPool and AvgPool. Is there a specific reason for this? If so, please specify it.

In Fig. 5, Fig. 6, Fig. 7, and Fig. 8, the block dimensions should be given.

Section 4 is missing. The authors should cross-check the manuscript before submitting it.

The metrics reported in Table 2 and Table 3 differ from those defined in Eqs. (7) to (9). Please also explain their significance in the discussion section.

Check the reference format carefully.

Section 4 is missing. Please check before submitting.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The revised manuscript is slightly better. However, some points have not yet been addressed. For instance, the literature review should be revised again. More should be explained about CNNs and AI during COVID-19. Thus, all of the following papers should be added and discussed mathematically: COVID-19 and human development: An approach for classification of HDI with deep CNN. Biomed. Signal Process. Control. 81: 104499 (2023); A Digital Human Emotion Modeling Application Using Metaverse Technology in the Post-COVID-19 Era. HCI (19) 2023: 480-489; Feature selection of pre-trained shallow CNN using the QLESCA optimizer: COVID-19 detection as a case study. Appl. Intell. 53(15): 18630-18652 (2023); Roles of Artificial Intelligence and Extended Reality Development in the Post-COVID-19 Era. HCI (41) 2021: 445-454. Although some have been added, many of these papers are still not included and discussed. Therefore, the paper should be improved again, including all the other issues I mentioned previously.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript is substantially improved. I don't have any further comments. The manuscript may be accepted in its current form.

Author Response

Please see the attachment.

Author Response File: Author Response.docx