Article
Peer-Review Record

Deep Learning Classification by ResNet-18 Based on the Real Spectral Dataset from Multispectral Remote Sensing Images

Remote Sens. 2022, 14(19), 4883; https://doi.org/10.3390/rs14194883
by Yi Zhao 1, Xinchang Zhang 1,*, Weiming Feng 1 and Jianhui Xu 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 August 2022 / Revised: 6 September 2022 / Accepted: 27 September 2022 / Published: 30 September 2022

Round 1

Reviewer 1 Report

This paper intends to provide a deep learning based solution for the classification of Landsat 8 data. Please find my comments below.

 

Introduction

It would be very appropriate to mention the purpose of your efforts: what goal does the classification of L8 data serve? Since higher-spatial-resolution data (S2) are available, a justification would be welcome. Do you plan to work with time series?

Line 48-50: While reading the beginning of the introduction I found a major source of confusion: "Multi-spectral remote sensing images range between remote sensing images with high spatial resolution and hyperspectral remote sensing images. Compared with hyperspectral remote sensing images, multispectral remote sensing images are important data sources for sequence space analysis, have higher spatial resolution, and are usually multisources with long time series"

This phrase is very confusing, mixing the spatial and spectral resolution of the Earth observation data.

 

 

Section 2

Subsection 2.1.1 - How does the WV data from December, used for validation, capture the vegetation fraction?

Line 139 - why are the pure pixels easy to obtain for this study area?

 

 

 

Subsection 2.1.2

Table 1 should not be split.

 

 

 

Section 2.2.1.

The methodology/workflow clearly needs a staggered representation of the stages to be taken. Figure 2 should be refined to highlight the processing steps.

 

It is not clear how the spectral information of S2's pure pixels will improve the spectral information of the pure pixels in L8. How are these pure pixels selected from both data types?

 

After the preprocessing stage, I understand that OTSU segmentation was applied to the MNDWI to remove the water pixels from both L8 and S2. Where does this step appear in Figure 3, which pictures Reflectance Replacement in a section describing Reflectance Enhancement? Please use the same terms for consistency.

 

Line 192 - What method has been used for resampling the data to 10 m? How does resampling the lower-resolution S2 bands to 10 m impact your approach?

 

Please align formulas 2, 3 and 4 with the first one.

See Figure 4, the flow chart for the post-processing model - it is accuracy assessment in the end.

 

Please change the name of this sub-subsection - 2.2.1.2 Water - and clarify in the general framework when this water pixel removal is performed.

 

 

Section. 2.2.2.2 Data Set Establishment

Line 284-284 - how did the authors establish the ratio between training and validation? Does a change of this ratio impact the results?

 

Line 292-304 - I urge the authors to consider the use of a table to highlight the 4 categories of experiments.

 

2.2.3 Deep Learning Model Establishment

Please pay attention to the English phrasing/grammar - "In reference of the ResNet18 model," - and, by the way, cite the reference here.

Align formula (10)

 

Section 3 Results

Please pay attention to the paper's format; Figure 8 has its caption on another page, which is very annoying.

Section 3.3 Data Sets

Table 2 says nothing; please consider the use of percentages.

 

Section 3.4 Results of Resnets18

Please consider, again, renaming some sections - the results are not "of Resnets18", generally speaking.

Why did you choose this network? Have you performed other experiments?

Line 370- please mention these differences. 

 

 

Author Response

Dear Reviewer,

 

Thank you very much for reviewing our article; your valuable suggestions have helped us revise it into a more academic version. Your ideas and suggestions are a valuable asset to our academic work. We have reworked this article with further language improvements, and our responses and modifications follow:

 

Introduction

It would be very appropriate to mention the purpose of your efforts: what goal does the classification of L8 data serve? Since higher-spatial-resolution data (S2) are available, a justification would be welcome. Do you plan to work with time series?

Line 48-50: While reading the beginning of the introduction I found a major source of confusion: "Multi-spectral remote sensing images range between remote sensing images with high spatial resolution and hyperspectral remote sensing images. Compared with hyperspectral remote sensing images, multispectral remote sensing images are important data sources for sequence space analysis, have higher spatial resolution, and are usually multisources with long time series"

This phrase is very confusing, mixing the spatial and spectral resolution of the Earth observation data.

In some of our research work, we often use L8 images as a long-term data source, and in this paper we take L8 images as the target to support subsequent research on long-term image classification.

We have revised the manuscript, and the mentioned part is revised as:

Remote sensing images have two important attributes: spatial resolution and spectral resolution. The spatial resolution of multispectral remote sensing images is generally higher than that of hyperspectral remote sensing images, and their spectral resolution is generally higher than that of high-spatial-resolution remote sensing images. In addition, multispectral remote sensing images, especially the Landsat series, provide long-term image records; the data are relatively easy to obtain, and the theoretical basis of data preprocessing is solid. Therefore, images from this series are often used as data sources for long-time-series analysis and research.

 

Section 2

Subsection 2.1.1 - How does the WV data from December, used for validation, capture the vegetation fraction?

Line 139 - why are the pure pixels easy to obtain for this study area?

We have revised the manuscript, and the mentioned part is revised as:

In order to verify the reliability of LSMA, 172 rectangular sample areas were randomly selected in the study area for vectorization to extract vegetation, soil and impervious surface in each sample area, and the area ratio of each component in each sample area was then calculated as the reference data.

 

The study area comprises the six main urban areas (113°8′ E~113°37′ E,23°26′ N~23°2′ N) of Guangzhou City, Guangdong Province: Yuexiu District, Haizhu District, Tianhe District, Baiyun District, Huangpu District and Liwan District. The study area location is shown in Figure 1. The main urban area contains typical vegetation, water, bare soil, and urban ISs (roads, buildings, and residential and commercial lands with different densities). Furthermore, in the study area, there are large forest areas with more pure pixels of vegetation in Baiyun District and Huangpu District. At the same time, Baiyun District also contains a large amount of farmland, with more pure pixels of soil. Tianhe District and Yuexiu District have large built-up areas with more pure pixels of impervious surface. Therefore, pure pixels are easier to obtain, which is conducive to the extraction of spectral features of different pixels.

 

Subsection 2.1.2

Table 1 should not be split.

We have revised the manuscript, and the mentioned part is revised as:

 

 

Section 2.2.1.

The methodology/workflow clearly needs a staggered representation of the stages to be taken. Figure 2 should be refined to highlight the processing steps.

We have revised the manuscript, and the mentioned part is revised as:

 

It is not clear how the spectral information of S2's pure pixels will improve the spectral information of the pure pixels in L8. How are these pure pixels selected from both data types?

In the article, the following passage describes how the spectral information of S2's pure pixels improves the spectral information of the pure pixels in L8:

In this study, Sentinel 2A MSI with higher spatial resolution (10 m) was used to select pixels with relatively high purity and extract the spectral reflectance of these pixels. According to the band correspondence presented in Table 1, the spectral reflectance of the "pure pixels" selected from Landsat 8 OLI was replaced with the spectral reflectance of the "pure pixels" selected from Sentinel 2A MSI, and a spectral curve closer to that of the actual ground object pixel was obtained. This curve provides a reliable data basis for the calculation of mixed-pixel spectral unmixing, thereby improving the extraction accuracy of spectral unmixing. The specific data processing is shown in Figure 3:
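As a minimal sketch of this replacement step (the function name, array layout, and band correspondence below are hypothetical illustrations; the real correspondence is given in Table 1 of the paper):

```python
import numpy as np

def replace_pure_pixel_spectra(l8_spectra, s2_spectra, band_map):
    """Replace the reflectance of L8 pure pixels, band by band, with the
    mean reflectance of the matching S2 pure pixels.

    l8_spectra: (n_l8_pixels, n_bands) reflectances from Landsat 8 OLI
    s2_spectra: (n_s2_pixels, n_bands) reflectances from Sentinel 2A MSI
    band_map:   {l8_band_index: s2_band_index}, the Table 1 correspondence
    """
    enhanced = l8_spectra.copy()
    for l8_band, s2_band in band_map.items():
        # spectrum closer to the actual ground object: use the S2 mean
        enhanced[:, l8_band] = s2_spectra[:, s2_band].mean()
    return enhanced
```

The original L8 array is left untouched; the enhanced copy then feeds the LSMA endmember spectra.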

 

After the preprocessing stage, I understand that OTSU segmentation was applied to the MNDWI to remove the water pixels from both L8 and S2. Where does this step appear in Figure 3, which pictures Reflectance Replacement in a section describing Reflectance Enhancement? Please use the same terms for consistency.

We have revised the manuscript, and the mentioned part is revised as:

 

Line 192 - What method has been used for resampling the data to 10 m? How does resampling the lower-resolution S2 bands to 10 m impact your approach?

We resampled the S2 images using a simple and feasible method and did not examine the effect of the resampling method on the experimental results. In future work, we will focus on this problem and conduct experiments and discussion.

And we have revised the manuscript, and the mentioned part is revised as:

Additionally, B11 and B12 were resampled to a 10 m spatial resolution in the Sentinel Application Platform (SNAP) using the nearest-neighbor method.
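Nearest-neighbor resampling from 20 m to 10 m simply duplicates each source pixel; ignoring SNAP's geolocation handling, the effect can be illustrated as:

```python
import numpy as np

def resample_nearest(band_20m, factor=2):
    """Nearest-neighbor upsampling of a 20 m band to 10 m: each source
    pixel is duplicated into a factor-by-factor block. A sketch of the
    effect of SNAP's nearest-neighbor resampling, not its implementation."""
    return np.repeat(np.repeat(band_20m, factor, axis=0), factor, axis=1)
```

Because values are only duplicated, no new reflectances are invented, which is why the choice matters less for spectral extraction than an interpolating method would.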

 

 

Please align formulas 2, 3 and 4 with the first one.

We have revised the manuscript, and the mentioned part is revised as:

 

See Figure 4. Flow chart for post-processing model - it is accuracy assessment in the end.

We have revised the manuscript, and the mentioned part is revised as:

 

Please change the name of this sub sub section - 2.2.1.2 Water- and clarify in the general framework when this water pixel removal is performed.

We have revised the manuscript, and the mentioned part is revised as:

2.2.1.2 Water Pixels Spectral Reflectance Extraction

 

Section. 2.2.2.2 Data Set Establishment

Line 284-284 - how did the authors establish the ratio between training and validation? Does a change of this ratio impact the results?

Because the dataset in this work is not very large, we split it using the traditional machine learning division approach. During the experiments, we found that as long as the number of training samples is guaranteed (more than 75% of the dataset), the size of the test set has no effect on the correctness of the findings.
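The split described above (roughly 75% training) can be sketched as follows; the exact ratio, shuffling scheme, and seed are assumptions:

```python
import numpy as np

def split_dataset(samples, labels, train_ratio=0.75, seed=42):
    """Shuffle and split the spectral dataset into training and validation
    subsets, keeping the training share at or above 75% as in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))  # random order, reproducible
    cut = int(len(samples) * train_ratio)
    return (samples[idx[:cut]], labels[idx[:cut]],
            samples[idx[cut:]], labels[idx[cut:]])
```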

 

Line 292-304 - I urge the authors to consider the use of a table to highlight the 4 categories of experiments.

We have revised the manuscript, and the mentioned part is revised as:

Experiments    Threshold I    Threshold II (High IS/Low IS/Soil/Water/Vegetation)
Experiment 1   0.75           999/999/999/999
Experiment 2   0.75           1.2/1.0/0.5/1.3/0.3
Experiment 3   0.85           999/999/999/999
Experiment 4   0.85           1.2/1.0/0.5/1.3/0.3
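Reading the table above as a purity screen (our assumed interpretation: Threshold I is the minimum dominant-component fraction; the per-class Threshold II values and the 999 "disabled" convention are omitted from this sketch), the sample selection could look like:

```python
import numpy as np

def select_samples(fractions, threshold1):
    """Screen pixels for the spectral dataset using Threshold I only.
    A pixel is kept when its dominant LSMA component fraction exceeds
    Threshold I (assumed rule, not the authors' exact code).

    fractions: (n_pixels, n_classes) component fractions per pixel
    Returns a boolean keep-mask and the dominant-class label per pixel.
    """
    labels = fractions.argmax(axis=1)          # dominant component
    mask = fractions.max(axis=1) > threshold1  # purity screen
    return mask, labels
```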

 

  

2.2.3 Deep Learning Model Establishment

Please pay attention to the English phrasing/grammar - "In reference of the ResNet18 model," - and, by the way, cite the reference here.

We have revised the manuscript, and the mentioned part is revised as:

In reference to the ResNet-18 model [55],

 

Align formula 10

We have revised the manuscript, and the mentioned part is revised as:

 

Section 3 Results

Please pay attention to the paper's format; Figure 8 has its caption on another page, which is very annoying.

We have revised the manuscript, and the mentioned part is revised as:

 

Section 3.3 Data Sets

Table 2 says nothing, please consider the use of percents.

We have removed table 2.

 

Section 3.4 Results of Resnets18

Please consider, again, renaming some sections - the results are not "of Resnets18", generally speaking.

We have revised the manuscript, and the mentioned part is revised as:

3.3 Results of Classification

 

Line 370- please mention these differences. 

We have revised the manuscript, and the mentioned part is revised as:

In Experiment 1, some water pixels are mistaken as IS pixels. Experiments 2 and 3 also have such problems. Experiment 4 has the best classification results, and the distribution of various ground objects is close to the actual situation.

 

Why did you choose this network?

We have revised the manuscript, and the mentioned part is revised as:

When ResNet-18 was initially presented, the model was used for image recognition [54]. Since then, it has been widely used in remote sensing image semantic segmentation and change detection. Both areas involve similarly complicated 2-D planar remote sensing image tasks, whereas this research only concerns spectral curve regression fitting, which is comparatively straightforward compared with 2-D planar remote sensing images. The phenomena of "same object, different spectrum" and "different objects, same spectrum" occur because of the minor variation in spectral curves between distinct features, making it difficult for deep learning networks to fit the parameters and prone to misfitting. The residual block structure in the ResNet-18 network can increase the depth of the model network and allow the model to learn deeper characteristics of the spectral curves, while avoiding the instability and accuracy loss caused by increasing the depth of the network.
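The residual-block idea described above can be sketched for a 1-D spectral-curve input (a hypothetical PyTorch sketch of the skip-connection mechanism, not the authors' exact architecture):

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """A ResNet-style residual block over a 1-D spectral curve."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The block learns a residual F(x) and outputs F(x) + x; the
        # identity shortcut keeps deeper networks stable and trainable.
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)
```

Stacking such blocks deepens the network without the degradation a plain stack of convolutions would suffer.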

 

Kind regards,

Yi Zhao

Tuesday, September 6, 2022

Author Response File: Author Response.docx

Reviewer 2 Report

In general, the idea of this paper is interesting. However, there are also some issues with the experiments and the expression of this paper.

(1) The title of this paper should point out which type of deep learning network is used for classification.

(2) The following expression, i.e., "Owing to the limitation of spatial resolution and spectral resolution, deep learning methods are rarely used for the classification of multispectral remote sensing images", is not true. Many deep learning networks were designed for MS image classification.

(3) In the experiments, the proposed method should be compared with the state-of-the-art methods.

(4) In the abstract, it says: "a deep learning classification model, Resnet 18, based on the pixel real spectral information, was constructed to classify Landsat 8 OLI images". However, this model was used to classify WorldView-2 MS images in the experiments. Is there something wrong with the expression?

Author Response

Dear Reviewer,

 

Thank you very much for reviewing our article; your valuable suggestions have helped us revise it into a more academic version. Your ideas and suggestions are a valuable asset to our academic work. We have reworked this article with language improvements, and our responses and modifications follow:

 

(1) The title of this paper should point out which type of deep learning network is used for classification.

We have revised the title as:

Deep learning classification by ResNet 18 based on the real spectral dataset from multispectral remote sensing images

 

(2) The following expression, i.e., "Owing to the limitation of spatial resolution and spectral resolution, deep learning methods are rarely used for the classification of multispectral remote sensing images", is not true. Many deep learning networks were designed for MS image classification.

We have revised the manuscript, and the mentioned part is revised as:

Some scholars have also attempted deep learning classification of remote sensing images with medium spatial resolution; for example, Karra et al. pointed out that the resolution of current global LULC products is low, so they used deep learning technology to generate global LULC products with a spatial resolution of 10 m [38]. Other scholars have also tried to classify Landsat images with 15 m and 30 m resolution using deep learning methods [39-41].

 

(3) In the experiments, the proposed method should be compared with the state-of-the-art methods.

This article aims to provide new ideas and methods for the deep learning classification of multispectral remote sensing images based on the spectral information from the images. At present, among the deep learning classification methods for multispectral remote sensing images, there are few experimental cases of methods that use the spectral information of the images for classification. We designed four experiments to verify the effectiveness of the method proposed in this paper.

 

(4) In the abstract, it says: "a deep learning classification model, Resnet 18, based on the pixel real spectral information, was constructed to classify Landsat 8 OLI images". However, this model was used to classify WorldView-2 MS images in the experiments. Is there something wrong with the expression?

In this paper, we adopt the ResNet-18 model to classify L8 multispectral remote sensing images. We have revised the manuscript, and the mentioned part is revised as:

In this study, the Landsat 8 OLI multispectral remote sensing image acquired on October 23, 2017 was selected as the main research data for classification.

 

 

Kind regards,

Yi Zhao

Tuesday, September 6, 2022

Author Response File: Author Response.docx

Reviewer 3 Report

The authors of this paper used linear spectral mixture analysis and the spectral index method to extract the impervious surfaces, soil, vegetation, and water. Next, they prepared a spectral dataset of multispectral image pixels using threshold screening. Then, ResNet-18 was used to classify the Landsat 8 OLI images.

The idea is interesting; however, I have several concerns and questions about the work, given below:

A visual representation of the proposed method is very important to convey the actual theme for the better understanding of readers. However, the introduction lacks a visual representation of the generic flow of the method.

Also, the main contributions of this method are not clear. The authors are advised to highlight the contributions of the method in bullets, after a paragraph stating the challenges and problems in the existing linear spectral mixture analysis and spectral index methods and describing how the proposed method copes with them.

This article focuses on deep learning in the spectral classification of multispectral remote sensing images and ground objects. The authors can refer to some classification literature, such as 10.1109/TII.2021.3116377 and https://doi.org/10.1002/int.22537, in the manuscript.

The scientific language of the paper is very weak. The authors need to draw on the thought and expression of existing Remote Sensing papers and reflect them in the paper accordingly.

The authors are suggested to add some qualitative results to assess the method's performance.

Author Response

Dear Reviewer,

 

Thank you very much for reviewing our article; your valuable suggestions have helped us revise it into a more academic version. Your ideas and suggestions are a valuable asset to our academic work. We have reworked this article with language improvements, and our responses and modifications follow:

 

A visual representation of the proposed method is very important to convey the actual theme for the better understanding of readers. However, the introduction lacks a visual representation of the generic flow of the method.

Also, the main contributions of this method are not clear. The authors are advised to highlight the contributions of the method in bullets, after a paragraph stating the challenges and problems in the existing linear spectral mixture analysis and spectral index methods and describing how the proposed method copes with them.

We have revised the manuscript, and the mentioned part is revised as:

Spectral index and spectral unmixing methods are commonly used for classifying multispectral remote sensing images. However, the use of the spectral index to classify ground objects necessitates the determination of a specific threshold. Furthermore, the spectral unmixing method often suffers from errors in the selection of pure pixels and from the phenomenon of the same ground objects having different spectra and different ground objects having the same spectra, resulting in large errors. This often necessitates the culling of water pixels, because the spectral reflectance of water pixels is close to that of the low-albedo impervious surface (IS) (ISs are divided into high-albedo ISs and low-albedo ISs).

In this study, MNDWI is used to remove water pixels before calculating the LSMA. The spectral information of the pure pixels selected from the Landsat 8 image is then improved using the spectral information of the pure pixels selected from the Sentinel 2A image. Based on the improved spectral information, LSMA is used to calculate the fractions of vegetation, soil, high-albedo IS, and low-albedo IS; meanwhile, the MNDWI images are used to extract water pixels. Through threshold extraction on the vegetation fraction (VF), soil fraction (SF), IS fraction (ISF), and MNDWI, a real spectral dataset of ground objects for deep learning is established to ease the workload of training sample selection. The ResNet-18 model is applied to Landsat 8 OLI image classification to address the inability of LSMA to distinguish ground object classes with similar spectral features, and to alleviate the strong subjectivity of the spectral index threshold method. New ideas and a theoretical basis for the classification of ground objects in multispectral remote sensing images are provided.
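The MNDWI water screen mentioned above could be sketched as follows (a minimal illustration with a toy Otsu implementation; band names and thresholding details are assumptions, not the authors' code):

```python
import numpy as np

def mndwi(green, swir):
    """Modified NDWI: (Green - SWIR) / (Green + SWIR); water has high MNDWI."""
    return (green - swir) / (green + swir + 1e-12)

def otsu_threshold(values, bins=256):
    """Minimal Otsu's method: pick the threshold maximizing the
    between-class variance of the value histogram."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()  # class weights
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:i] * centers[:i]).sum() / w0  # class means
        m1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[i]
    return best_t
```

Pixels with MNDWI above the Otsu threshold would be masked out as water before the LSMA fraction calculation.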

 

This article focuses on deep learning in the spectral classification of multispectral remote sensing images and ground objects. The authors can refer to some classification literature, such as 10.1109/TII.2021.3116377 and https://doi.org/10.1002/int.22537, in the manuscript.

The scientific language of the paper is very weak. The authors need to draw on the thought and expression of existing Remote Sensing papers and reflect them in the paper accordingly.

We have revised the entire article again with language improvements, especially the introduction. Some related references and research results have been added to the revised manuscript, and the research method of this paper is motivated by an analysis of traditional methods.

 

Authors are suggested to add some qualitative results to assess the method performance.

We present a visual representation of the classification results in the Results and Discussion sections, including classification results for the entire study area and partial detail maps from the classification results.

I'm not sure if I fully understand your suggestion; please let us know if this answer is inappropriate.

 

Kind regards,

Yi Zhao

Tuesday, September 6, 2022

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I have no suggestions at this point.
