Peer-Review Record

Dense Residual Transformer for Image Denoising

Electronics 2022, 11(3), 418; https://doi.org/10.3390/electronics11030418
by Chao Yao 1,†, Shuo Jin 2,†, Meiqin Liu 2,* and Xiaojuan Ban 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 December 2021 / Revised: 25 January 2022 / Accepted: 27 January 2022 / Published: 29 January 2022
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)

Round 1

Reviewer 1 Report

The authors have proposed an end-to-end Transformer-based network for image denoising. As mentioned in the paper, Transformers are gradually replacing CNNs in computer vision, offering improved efficiency. The results show improved performance compared with existing methods. For these reasons, this paper will be of interest to readers of this journal.

The paper is well organized and well written, with the necessary figures and tables. Still, the following minor corrections/changes need to be considered.

  1. The quality metric SSIM is not explained anywhere in the manuscript. Please explain.
  2. What is the reason for adding synthetic images with AWGN rather than other noise types?
  3. Line 295: "...DenSformer is not suitable for images with huge size and unknown degradation." How can DenSformer improve its performance if the degradation is known?
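(For readers unfamiliar with the two points above, the following is a minimal numpy sketch of AWGN corruption and a single-window simplification of SSIM. The window handling, constants, and the toy gradient image are illustrative assumptions, not taken from the manuscript; practical SSIM averages the same statistics over small sliding windows.)

```python
import numpy as np

def add_awgn(img, sigma, rng=None):
    """Corrupt an image in [0, 255] with additive white Gaussian noise (AWGN)."""
    rng = rng or np.random.default_rng(0)
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 255.0)

def ssim_global(x, y, data_range=255.0):
    """SSIM computed over the whole image as one window.

    Combines luminance (means), contrast (variances), and
    structure (covariance); identical inputs give exactly 1.0.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

clean = np.tile(np.linspace(0, 255, 64), (64, 1))  # toy gradient "image"
noisy = add_awgn(clean, sigma=25)                  # sigma = 25, a common AWGN benchmark level
print(round(ssim_global(clean, clean), 3))         # identical images -> 1.0
print(ssim_global(clean, noisy) < 1.0)             # noise lowers SSIM -> True
```

AWGN is a standard choice for synthetic denoising benchmarks precisely because it is fully characterized by a single parameter (sigma), which makes training and comparison across methods reproducible.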

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

I reviewed the paper thoroughly, and my opinion is that the paper can become publishable after the minor revisions listed below:

  1. The abstract should provide a summary of quantitative results based on comparisons with state-of-the-art methods.
  2. I do not think Eq. 1 is needed to define the noisy image; stating the definition in words would be enough.
  3. The location of Figure 1 is not appropriate, and the figure is not referenced in the text.
  4. Line 73: the sentence needs revision for clarity.
  5. Figure 2b is not a complex block structure, so it can be combined with Figure 2a.
  6. Line 144: why the L1 loss function? A reference showing better performance, or a quantitative justification for its use, is needed.
  7. Figure 4 replicates information already presented in Figure 2.
  8. Line 248: why the ADAM optimizer? A reference showing better performance, or a quantitative justification for its use, is needed.
  9. As the training and ablation studies are performed on synthetically noised images, in which the added noise follows a regular pattern, I am mostly interested in real noisy images; it would therefore be better to have more visuals in that section (Section 4.3).
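(On point 6: one common argument for L1 over L2 in image restoration is its lower sensitivity to outlier pixels. The toy numpy comparison below is not from the manuscript; the error values are arbitrary assumptions chosen only to show the linear vs. quadratic growth.)

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error (L1)."""
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    """Mean squared error (L2)."""
    return np.mean((pred - target) ** 2)

target = np.zeros(100)
pred = np.full(100, 0.1)   # small uniform error on every pixel
pred_out = pred.copy()
pred_out[0] = 10.0         # a single outlier pixel

# L1 grows linearly with the outlier; L2 grows quadratically,
# so a few bad pixels can dominate an L2-trained model's gradient.
print(l1_loss(pred, target), l1_loss(pred_out, target))  # 0.1 vs 0.199
print(l2_loss(pred, target), l2_loss(pred_out, target))  # 0.01 vs 1.0099
```

A similar empirical justification (e.g., a table comparing L1 and L2 training on the same architecture, or a citation to prior loss-function studies in restoration) would address the comment.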

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This article introduces a dense residual Transformer for solving the low-level image denoising problem. At the heart of the model is a Sformer block, which is developed based on the LeWin Transformer. The entire model shows promising performance, with even fewer parameters.

Overall, the article is well written, and the methodology is well motivated and reasonable. The results also demonstrate the effectiveness of the model. However, the article can be further improved by addressing some minor issues:

[1] It is not clear to me why LeWin is used to build the Sformer block. Why not use the vanilla attention module? More (theoretical or experimental) analysis should be given on this point.

[2] The details of the preprocessing module in Eq. 2 are not given.

[3] Some important related work is missing. For example, Detail-Preserving Transformer for Light Field Image Super-Resolution also proposes a Transformer architecture for a low-level vision problem. Please discuss it in the related work and clarify the differences.

[4] For the ablation study, I suggest investigating the effect of the number of ETransformer blocks used in the Sformer.
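(On point [1]: the usual motivation for window-based attention such as LeWin is cost. The back-of-the-envelope FLOP count below is an illustrative sketch, not an analysis of the authors' model; the image size, channel dimension, and window size are assumed values.)

```python
def attn_flops_global(h, w, d):
    """Rough FLOPs for vanilla self-attention over all h*w tokens:
    QK^T plus attention-weighted V, each ~n^2 * d multiply-adds."""
    n = h * w
    return 2 * n * n * d

def attn_flops_window(h, w, d, m):
    """Rough FLOPs for non-overlapping m x m window attention:
    the same quadratic cost, but only within each window."""
    n_windows = (h // m) * (w // m)
    per_window = 2 * (m * m) ** 2 * d
    return n_windows * per_window

h = w = 128  # assumed feature-map size
d = 32       # assumed channel dimension
m = 8        # assumed window size
print(attn_flops_global(h, w, d) // attn_flops_window(h, w, d, m))  # -> 256
```

The ratio works out to n / m^2 (here 16384 / 64 = 256), i.e., windowed attention scales linearly rather than quadratically in the number of tokens; an explicit statement of this trade-off, plus an ablation against vanilla attention, would answer the comment.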

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The revision has addressed all my concerns.
