GSA-SiamNet: A Siamese Network with Gradient-Based Spatial Attention for Pan-Sharpening of Multi-Spectral Images
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I think it is a well-written paper. The proposed architecture is clearly explained, and alternative methods with references are included for every step. This new proposal clearly outperforms other similar methods (especially with high-frequency content) and has enough novelty to be considered for publication.
I have some minor questions that could be addressed in the paper.
- The values used for some configuration parameters should be commented on. For example, the negative slope of the Leaky ReLU used in the two-stream feature extraction module, or the value of lambda used to balance the terms of the loss function.
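To make the reviewer's two examples concrete, here is a minimal sketch of the kind of parameters in question. The slope 0.2 and weight 0.1 are illustrative placeholders, not values from the paper, and the loss-term names are hypothetical:

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    """Leaky ReLU: positives pass through, negatives are scaled by the slope."""
    return np.where(x >= 0, x, negative_slope * x)

def combined_loss(spectral_loss, spatial_loss, lam=0.1):
    """Weighted sum of two loss terms, balanced by lambda."""
    return spectral_loss + lam * spatial_loss

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(leaky_relu(x))             # [-0.4 -0.1  0.   1. ]
print(combined_loss(0.8, 0.4))   # ~0.84
```

The reviewer's point is simply that such values should be justified in the text (e.g., chosen by validation-set search or taken from prior work), since both the slope and lambda affect training behavior.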
- For the reduced-resolution validation, after the Gaussian blur kernel, which down-sampling filter is used (no filter, bilinear, bicubic, etc.)? How would this affect the training process? What are your insights about skipping the Gaussian blur kernel and performing the down-sampling directly with a filter?
- Related to the above, has anti-aliasing been considered when down-sampling the input images?
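The two bullets above contrast blur-then-decimate against direct decimation. A minimal sketch of why the Gaussian blur matters (the sigma, factor, and test pattern are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_then_decimate(img, factor=4, sigma=1.0):
    """Gaussian blur acts as an anti-aliasing low-pass before decimation."""
    blurred = gaussian_filter(img, sigma=sigma)
    return blurred[::factor, ::factor]

def naive_decimate(img, factor=4):
    """Direct decimation with no filter: high frequencies alias."""
    return img[::factor, ::factor]

# High-frequency checkerboard: sampling every 4th pixel lands only on
# even (i+j) positions, so the naive version collapses to a constant
# (aliasing), while the blurred version keeps an averaged signal.
img = (np.indices((32, 32)).sum(axis=0) % 2) * 1.0
print(naive_decimate(img).std())        # 0.0 -- structure aliased away
print(blur_then_decimate(img).std())    # small but non-zero
```

This is why the choice of down-sampling filter (or the absence of one) is worth stating explicitly: it changes the spectral content of the reduced-resolution training pairs.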
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
The introduction and state-of-the-art review are rather complete. Some indication of the efficiency of each solution would be appreciated: what are the advantages and drawbacks of each solution, and which is the best?
The study is well conducted, with a clear description of the approach. In §3.2, the inference time is missing. For processing that is operationally applied to many very large images, the computational cost is a very important criterion. It has to be taken into account in the comparison of the different methods.
On the quantitative criteria there is no doubt: the proposed method is the best among all tested methods. Visually, however, in Figures 6 and 7, this is not the case. The details on the buildings' roofs (especially in Figure 7) are better preserved by BDSD and AWLP, even though their quantitative scores are much worse. It is well known that small details are very difficult to quantify with such criteria. I would recommend discussing this limitation and highlighting the fact that spectral distortion is the main drawback of the BDSD and AWLP methods, even if their preservation of tiny details is better. The areas chosen for these two images are rather poor in color content, so the spectral distortion is not visible. If you could switch to another area with more colored objects (trees, colored vehicles, ...), it would be better.
In Figure 8, the zoomed extract is not very well chosen, since it contains few high-resolution details. It would be worth choosing another area with pylons or traces in the snow, and the zoomed extract should be larger and less zoomed. I suspect we would then observe the same thing as I noted for Figures 6 and 7.
Figure 9: same remarks. A zoomed extract of an area close to the terminal, with objects such as vehicles or the plane at the top right, would be better for the comparison. Is this image really at 31 cm resolution (it seems much coarser)?
One typo in line 298: "same".
Author Response
Please see the attachment.
Author Response File: Author Response.docx