Article
Peer-Review Record

TRQ3DNet: A 3D Quasi-Recurrent and Transformer Based Network for Hyperspectral Image Denoising

Remote Sens. 2022, 14(18), 4598; https://doi.org/10.3390/rs14184598
by Li Pang 1,†, Weizhen Gu 2,† and Xiangyong Cao 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 16 August 2022 / Revised: 4 September 2022 / Accepted: 9 September 2022 / Published: 14 September 2022
(This article belongs to the Special Issue Machine Vision and Advanced Image Processing in Remote Sensing)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

In the paper, a network for hyperspectral image denoising is presented. The revision improved the quality of the paper. Further comments:
1.    Please discuss the flexibility of the placement of the BI bridge (Section 4.3). How can the performance differences between the structures be explained? Case (c) is better in most scenarios. What happened here?
2.    Please add a section on the limitations of the approach.
3.    The link to the source code should also be present in the Data Availability section.
4.    The paper still requires proofreading (there is often a missing space before citation brackets, e.g., “Blocks[39]” -> “Blocks [39]”, and serial commas are inconsistent, e.g., “QRU3D, Uformer and TRQ3DNet” -> “QRU3D, Uformer, and TRQ3DNet”).

Author Response

The response to Reviewer 1 is in the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

· Numerical results/improvements against the SOTA should be mentioned in the abstract.

· The significance of the article's results is not clear. How can the results be applied in practice? What relation does image denoising have to hyperspectral images? How does noise influence hyperspectral image results?

· The Introduction section does clearly indicate the research problem and the contribution of the study; in addition, it is suggested to add to the Discussion section a description of the academic and practical implications of the results obtained from the study. If this further improvement is made, I believe the quality and impact of this paper can be greatly improved.

· An extension of the DL literature would be appreciated: DOI: 10.3390/electronics11091328 should be cited alongside deep learning.

· Figures 10 and 11 should be explained in more detail.

· In the conclusion, add more detailed factual information, and the future directions should be clear and distinct.

Author Response

The response to Reviewer 2 is in the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

In this work, the authors propose a combined CNN and Transformer architecture for HSI denoising with a bidirectional integration bridge between them for better preserving information circulation. The idea is good, and the experimental results are impressive. I have the following suggestions to further improve the manuscript.

1. The motivation for combining the CNN and the Transformer needs to be further clarified. Both of them are described as performing global feature extraction.

2. I am curious why TR performs worse than the other methods in Table 7. The Transformer has been extensively studied and has been demonstrated to achieve better performance. Is it due to the limited number of parameters in the Transformer?

3. In Fig. 10, there seem to be obvious striping artifacts in the features. A more detailed explanation is required.

4. The comprehensive survey ''Image Restoration for Remote Sensing: Overview and Toolbox'' is a good supplement to this work. The authors may consider introducing this work in the related work for newcomers.

Author Response

The response to Reviewer 3 is in the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report (New Reviewer)

Thanks for the effort of the authors. I have no more questions.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Your method of using three branches was very well described in the article. However, the differences between your method (TRQ3DNet) and the competing method (QRNN3D) are too small to claim that your method is superior. In fact, viewing the data presented, I have to question whether your method is worth the extra step when one could use the QRNN3D method and get very similar results. The data presented also makes me question whether the global context data adds any significant value. Because of this, I recommend more work on this article before publishing.

I also recommend the following updates:

Section 3.1 Experimental Setup: 

I recommend considering using more real data for training, validation, and testing.

I also recommend considering more complex validation methods such as cross-validation, random subsampling, and/or bootstrapping.

Even considering general validation methods, you are using very few runs to complete your validation (ref: 5 out of 201).

Lines 219-223: Spell out the first use of PSNR, SSIM, and SAM so that the reader does not have to go read the references.

Table 2: This table shows that the QRNN3D method is very close to your method. If you consider run time, QRNN3D may be the better choice. Thus, you must show the run times of QRNN3D and TRQ3DNet to prove your method is significantly better. This same comment applies to Table 3, Figure 6, Table 4, and Table 5.

Figure 6: You have used the term "ours" as the heading for your method up to Figure 6, but at Figure 6 you change it to TRQ3D. I suggest picking one heading and using it throughout the document.

At the Future Work section: I recommend more work using more real data with real noise. Investigate the use of better validation methods. Study the run time of each of the methods that are close in performance.
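For reference, two of the metrics mentioned in the comments above can be computed directly; the following is a minimal NumPy sketch (function and variable names are illustrative, not the paper's code). PSNR is the peak signal-to-noise ratio and SAM is the spectral angle mapper; SSIM (structural similarity) is more involved and is typically taken from a library such as scikit-image.

```python
import numpy as np

def psnr(clean, noisy, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images or cubes."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def sam(clean, noisy, eps=1e-8):
    """Mean spectral angle mapper (radians) for HSI cubes shaped
    (bands, height, width): the angle between per-pixel spectra."""
    a = clean.reshape(clean.shape[0], -1)   # (bands, pixels)
    b = noisy.reshape(noisy.shape[0], -1)
    cos = np.sum(a * b, axis=0) / (
        np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```

A uniform error of 0.1 on a unit-range image gives an MSE of 0.01 and hence a PSNR of 20 dB, and two proportional spectra give a SAM of (near) zero, which makes these two functions easy to sanity-check.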


Reviewer 2 Report

In the paper, a model that combines a convolutional neural network with a transformer network is proposed for hyperspectral image denoising. Comments:
1.    It is written in the Abstract that one branch “is built by stacking the 3D quasi-recurrent blocks.” However, the stacking is not mentioned in the later parts of the paper, and it seems that only one 3D quasi-recurrent block is used in the model.
2.    Many sub-solutions in the model are not discussed; they are just described as is. Are they introduced in this paper or taken from approaches in the literature? For example, why is the GELU used as the activation function? What does it mean that “to better fuse the feature of two parts… a 3D convolution”… with squeezing is used (page 6, lines 168+)? What was the alternative? Please share the findings.
3.    It is claimed that “We develop a bidirectional integration bridge (BI bridge) for better preserving the image feature information.” The better preservation of feature information is not explicitly shown in the paper. Such features should be elaborated and visualized. How do we know it is better preserved? What does preservation mean in this case?
4.    The results shown in Table 4 are not discussed. From the previous results, it can be seen that the proposed model highly overfits the data, and its usability in practice would be worse than that of the compared methods. This is because, in the test that promotes dataset independence, other approaches significantly outperform the method. Unfortunately, the training on the ICVL and Pavia Centre datasets (the "Ours-S" and "Ours-F" cases) cannot lead to meaningful conclusions, as the method is tested on trained samples. This should not be reported in the paper. This observation leads to another question: whether the experimental protocols are correct. The division of data samples into training and testing subsets should be revealed. Were they disjoint? Why were only 100 out of 200 used? How were they selected? Was it arbitrary? Please note that placing the same image in both subsets, even with different noise severity, still makes the testing easier for the network, and the result is completely irrelevant. The cases reported in Section 3.2 should be clarified. Also, the way the state-of-the-art approaches were trained should be revealed. Are they pretrained? Can they be fairly compared?
5.    The experimental setup is unclear and requires explanation (lines 226-246). Why are the first 30 epochs spent training on one noise severity, after which the model is trained differently? This should be supported by the literature or thoroughly justified. What would happen if we took some images, corrupted them with noise, and used them for training without any stages? Can it be done? How would it be trained in practice? Were the other models trained in the same way? Why? Please clarify.
6.    The simulation of real-world conditions with synthetic noise should be discussed (page 8, lines 225+). If it is performed correctly, the trained model will perform well on real datasets.
7.    The methods should be coherently compared in the tests. Some of them are tested in only one scenario. How can we draw conclusions about their overall performance?
8.    Section 3.3: Can a no-reference quality assessment method be used for the real HSI denoising cases to support the perceptual observations?
9.    The paper should contain a link to the source code of the method to ensure the repeatability of the presented results.
10.    The paper should be carefully proofread. The number of grammar errors and typos is overwhelming (e.g., a missing space before each citation bracket, or “we final take”). What does it mean that “we compare the evaluation comparison” (page 12, line 308)?
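To illustrate comment 6, synthetic corruption of the kind commonly used in HSI denoising benchmarks (i.i.d. Gaussian noise on all bands plus additive column stripes on a few bands) can be sketched as follows. This is a generic example, not the paper's exact protocol; the function name and all parameter values are illustrative.

```python
import numpy as np

def add_mixed_noise(cube, sigma=0.1, stripe_bands=2, stripe_frac=0.3, seed=None):
    """Corrupt an HSI cube shaped (bands, height, width) with Gaussian noise
    on all bands and column stripes on a few randomly chosen bands.
    A sketch of common benchmark-style corruption, not a specific protocol."""
    rng = np.random.default_rng(seed)
    # Gaussian noise over the whole cube.
    noisy = cube + rng.normal(0.0, sigma, cube.shape)
    # Additive stripes: pick bands, then a fraction of their columns,
    # and shift each chosen column by a constant offset.
    bands = rng.choice(cube.shape[0], size=stripe_bands, replace=False)
    n_cols = max(1, int(stripe_frac * cube.shape[2]))
    for b in bands:
        cols = rng.choice(cube.shape[2], size=n_cols, replace=False)
        noisy[b, :, cols] += rng.uniform(-0.25, 0.25, size=n_cols)[:, None]
    return noisy
```

As the reviewer notes, whether a model trained on such simulated degradations transfers to real sensors depends on how faithfully the simulation matches real noise statistics.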
