Landsat-8 to Sentinel-2 Satellite Imagery Super-Resolution-Based Multiscale Dilated Transformer Generative Adversarial Networks
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors used Landsat-8 and Sentinel-2 (S2) data to develop a Dilated Transformer Generative Adversarial Network method to improve the spatial resolution of the data. The manuscript covers an important and necessary topic in generating a long-time series of super-resolution imagery, especially for Landsat archive coverage before 2000. However, this paper needs to provide more critical information about the spectral accuracy of all data generated explicitly for reproducibility.
Please see specific comments below:
1) Abstract must be rephrased to accommodate the results description better. It is unclear how the proposed method improved the super-resolution images and the LULC maps.
2) Does LR mean low-resolution?
3) Generative Adversarial Networks must be included in the title of this piece.
4) The new paradigm in RS is cloud computing, so it is not clear why a higher quality of visual perception is critical here. Please rewrite lines 111 and 112.
5) The manuscript's primary goal is not clearly stated at the end of the Introduction. What do the authors mean by LUCC accuracy? Improving the spatial resolution per se does not imply better LUCC thematic accuracy.
6) Figure 1 is difficult to read. The letters are too small.
7) Figure 2: are the authors using the data cube concept? If yes, it has to be better described in the piece's Introduction.
8) All Figures must be improved for better reading.
9) The location of the selected images must be better described. Where are they located? In which biome? How may the land types, the relief, and the atmospheric conditions influence the results? Which seasonal period does April-June fall in? Are the images from the dry or the rainy season?
10) How can the preprocessing method described in line 288 be better than the one processed by USGS and ESA?
11) In line 290, which spectral bands did the authors crop?
12) Please explain the land use types the authors used in lines 302 and 303.
13) I suggest splitting the Results section from the Discussion section.
14) I cannot see any improvement in the visual results in Figure 9.
15) How may the preprocessing stage the authors perform impact the results presented in Figure 10?
16) Line 420: how close?
17) I am not convinced that the higher MSE in infrared bands is derived from the grayscale distribution. How did the land types and the seasonal conditions affect these results?
18) There is no Discussion in this piece. The authors limited themselves to describing the Results only.
19) Figure 14 must be better described. How did the authors assume the results are improved? In which context?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript proposes a new approach to improve the spatial resolution and land use/cover classification accuracy of multispectral remote sensing images. The background and process of this research are described in detail, and the work is innovative. I have just a few comments:
1. Figure 1: I suggest increasing its size.
2. Check the grammar and consider shortening sentences that are too long.
3. Suggest sharing the code.
4. Please check references and correct them according to the journal format uniformly.
Comments on the Quality of English Language
Minor editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposed a Generative Adversarial Network that combines Convolutional Neural Networks with Transformers to improve the spatial resolution of multispectral remote sensing images and enhance the accuracy of Land Use/Cover classification maps. The research was divided into two stages: image super-resolution and LUCC. In the super-resolution stage, Transformers were used to enhance the model’s ability to learn local and global features. Multi-scale information and dilated convolutions were introduced to improve computational efficiency. In the LUCC stage, a pre-trained model was used to generate high-resolution images from Landsat 8 data to demonstrate that using SR images improves the accuracy of LUCC maps.
The manuscript comprises a 2-page introduction, 8.5 pages of methods, 6 pages of results and discussion, and a 0.5-page conclusion, plus references. Altogether, 76 references are cited. The introduction is well written and gives a quick overview of the applicability of deep learning and super-resolution data. It is well documented with references.
The materials and methods section is written in a very technical style, but this is due to the nature of the topic. Figure 6 should incorporate an inset overview map to indicate where the test site is located.
The results section needs minor formatting changes concerning figure captions. In the opinion of the reviewer, these should be extended and self-explanatory. Furthermore, the number of figures should be reduced and/or some figures moved to the appendix (e.g. figures 10 and 13).
Some recommendations follow, listed by line throughout the manuscript.
p. 2 line 35: Space should be introduced between the word and the reference.
p. 2 line 41: In the opinion of the reviewer, a reference should be given for the argument that high computational complexity and a priori knowledge are required, with deficits in high-frequency details.
p. 2 line 56: This sentence should be rephrased to clearly indicate in terms of which parameter or criterion the results are better.
p. 2 line 69: Space should be introduced between the word and the reference. Thoroughly check the whole manuscript.
p. 2 line 69: Needs linguistic improvement. It is not clear to the reviewer what is meant here.
p. 3 line 89: Needs linguistic improvement. It is not clear to the reviewer what is meant here.
p. 3 line 131: Needs linguistic improvement. It is not clear to the reviewer what is meant here.
p. 3 line 135: Please check the citation of Roy et al. It does not seem to fit the rest of the citation style.
p. 9 line 285: Space should be introduced between figure 6 and the text.
p. 14 figure 9: Figure caption should be more comprehensive and self-explanatory.
p. 15 figure 10: Figure caption should be more comprehensive and self-explanatory.
Comments on the Quality of English Language
Concerning linguistic style, the manuscript is at a moderate level, and several major modifications are needed. Throughout the manuscript, the spelling of super resolution should be consistent and the formatting of references should be checked. Make sure each reference is separated by a space from the preceding word (e.g. line 35).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This paper describes a new super-resolution method relying on image transformers. It uses an adversarial encoder-decoder strategy.
The paper is generally clear and results are convincing.
To be improved:
- The way the different losses are combined must be described in more detail (weighted sum?)
- The dataset could be described in slightly more detail (number of images, different dates/seasons?)
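For illustration only, the "weighted sum" mentioned in the first point could look like the minimal sketch below. The loss names, values, and weights are hypothetical placeholders; this report does not state which loss terms or weights the manuscript actually uses.

```python
def combine_losses(losses, weights):
    """Return the weighted sum of named loss terms.

    `losses` and `weights` are dicts keyed by loss name. Every loss
    must have a weight, otherwise a KeyError is raised.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical loss values and weights, chosen only to show the arithmetic.
example_losses = {"pixel": 0.5, "perceptual": 0.25, "adversarial": 2.0}
example_weights = {"pixel": 1.0, "perceptual": 2.0, "adversarial": 0.5}

total = combine_losses(example_losses, example_weights)
print(total)  # 1.0*0.5 + 2.0*0.25 + 0.5*2.0
```

Reporting the weights alongside each term in this form would let readers reproduce the training objective.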
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
All my comments were adequately accommodated.