Peer-Review Record

XANet: An Efficient Remote Sensing Image Segmentation Model Using Element-Wise Attention Enhancement and Multi-Scale Attention Fusion

Remote Sens. 2023, 15(1), 236; https://doi.org/10.3390/rs15010236
by Chenbin Liang 1,2,3, Baihua Xiao 2, Bo Cheng 4 and Yunyun Dong 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 25 November 2022 / Revised: 26 December 2022 / Accepted: 27 December 2022 / Published: 31 December 2022
(This article belongs to the Special Issue Deep Learning for Remote Sensing Image Classification II)

Round 1

Reviewer 1 Report

This paper presents a model, named XANet, for the task of image segmentation in remote sensing images. The main contributions are two attention modules, ARM and AFM: one in the encoder for channel and spatial attention, and one in the decoder for fusion. Various experiments are conducted on remote sensing image datasets, showing superior performance compared with previous methods. Ablation studies are conducted on both backbones and various attention modules, which show the effectiveness of ARM and AFM.

The paper is very well written, not only in language but also in organization. The proposed components show improvements on remote sensing images. I am positive towards the paper.

The designs should reflect remote sensing scenarios. Currently, ARM and AFM look like two general attention modules. If they are, they should also perform well on general segmentation tasks, and experiments should be conducted to verify this; if not, the connections of ARM and AFM to remote sensing images should be discussed.

The motivation for adopting attention in the decoder is not clearly discussed; at least, not as clearly as the discussion of the importance and issues of attention in the encoder.

It would be better to add some description to the figure captions, so that the figures become stand-alone and more readable without referring to the main text.

In the ablation study, various attention modules are selected for comparison. On the one hand, it is good to show the advantages of the designs; on the other hand, the selected methods are designed for general tasks rather than remote sensing images, which is a bit unfair to them. Conversely, if ARM and AFM are not specially designed for remote sensing images, then their advantages should also be demonstrated on other tasks, which is absent from the experiments.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper proposes an attention-based network, termed XANet, for remote sensing semantic segmentation. XANet has two key components: ARM and AFM. ARM is developed to generate a 3D attention map for feature enhancement along the spatial and channel dimensions. AFM is designed for sufficient spatial and channel fusion of multi-scale features. Extensive experiments are performed on the ISPRS and GID datasets, and XANet achieves superior results to classical semantic segmentation models. Overall, this paper is well written and easy to follow. However, some concerns should be resolved before final acceptance:
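
For readers unfamiliar with the idea, the sketch below shows one generic way a "3D" (channel times spatial) attention map can be formed, by broadcasting a per-channel vector against a per-pixel map and rescaling the input features. This is an illustrative PyTorch sketch only, not the authors' actual ARM; the module and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel x spatial attention; NOT the paper's ARM."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: squeeze spatial dims, then excite channels (SE-style).
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial branch: compress channels into a single H x W map.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel_att = self.channel_mlp(x)    # (N, C, 1, 1)
        spatial_att = self.spatial_conv(x)   # (N, 1, H, W)
        # Broadcasting the two maps yields a full C x H x W attention tensor.
        att_3d = torch.sigmoid(channel_att * spatial_att)
        return x * att_3d

# Usage: enhance an encoder feature map without changing its shape.
feat = torch.randn(2, 64, 32, 32)
enhanced = ChannelSpatialAttention(64)(feat)  # (2, 64, 32, 32)
```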

1. The authors aim to develop an efficient model for high-resolution remote sensing semantic segmentation. However, both the FLOPs evaluations and the performance comparisons are performed at a resolution of 256x256, which is far smaller than the original image sizes of the ISPRS and GID datasets. Thus, we cannot see the efficiency and effectiveness of the proposed method from the experimental results. Please give explanations (a sketch of how FLOPs scale with resolution follows these comments). Besides, most visual results in this paper have a square spatial size. Why are the spatial sizes of the visual results in Fig. 4 rectangular?

2. Table 2 only presents performance comparisons with classical semantic segmentation models, while remote sensing segmentation methods are not involved. Thus, we cannot see the effectiveness and efficiency of XANet from Table 2, which is in contrast to the original motivation of this paper.

3. The literature review is far from comprehensive. The leading and recent semantic segmentation methods should be discussed in Sec. 2, for instance, "Volumetric memory network for interactive medical image segmentation" and "Rethinking Semantic Segmentation: A Prototype View". In addition, neural attention has been widely exploited for remote sensing analysis, e.g., "Contextual Transformation Network for Lightweight Remote Sensing Image Super-Resolution".

4. A comma or period should be added at the end of each equation. Please unify the style of all tables to highlight the best and second-place results.
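
Regarding comment 1, the snippet below illustrates why the evaluation resolution matters: for a fully convolutional model, MAC/FLOP counts grow roughly quadratically with the input side length, so a 256x256 figure can substantially understate the cost on full-size tiles. This is a hypothetical sketch using the third-party thop package and a torchvision stand-in model, not XANet itself.

```python
import torch
from thop import profile  # pip install thop
from torchvision.models.segmentation import fcn_resnet50

# Stand-in fully convolutional model; XANet itself is not reproduced here.
model = fcn_resnet50(num_classes=6).eval()

for size in (256, 512, 1024):
    x = torch.randn(1, 3, size, size)
    macs, params = profile(model, inputs=(x,), verbose=False)
    # MACs scale with spatial area, so each doubling of the side
    # length multiplies the count by roughly four.
    print(f"{size}x{size}: {macs / 1e9:.1f} GMACs, {params / 1e6:.1f} M params")
```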

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

  1. Tables 2 and 3 could highlight the strongest values to emphasize the comparison, as done in Tables 4, 5, 6, and 7.
  2. The authors can plot bar graphs comparing the different parameters, as shown in the following, for better visualization of the results:
    https://www.mdpi.com/2504-446X/6/12/406
  3. Can you provide details of the GPUs and RAM used?
  4. Why is the Adam optimizer used? The authors can add a line explaining this in their manuscript.
  5. The authors are suggested to include training/validation loss curves for the presented model.
  6. Are there other methods for model interpretability, besides Grad-CAM, for visually explaining deep learning models? If yes, what is the reason for selecting Grad-CAM? (A minimal Grad-CAM sketch follows this list.)
  7. The limitations and future scope need to be elaborated further.
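
For context on comment 6: Grad-CAM weights a chosen feature map by its gradients, global-average-pooled over space, and needs only one backward pass; common alternatives include Score-CAM, occlusion analysis, and integrated gradients. The sketch below is a minimal, hypothetical PyTorch rendition for a dense segmentation output; model and target_layer are placeholders, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    feats, grads = [], []
    # Hooks capture the target layer's activations and their gradients.
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(x)                    # assume (N, num_classes, H, W)
    score = logits[:, class_idx].sum()   # reduce the class map to a scalar
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    # Channel weights = gradients global-average-pooled over space.
    weights = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats[0]).sum(dim=1))           # (N, h, w)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
```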

Please refer to the attachment for additional comments

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The revision has addressed my concerns. 
