Article
Peer-Review Record

Three-Dimensional Convolutional Neural Network on Multi-Temporal Synthetic Aperture Radar Images for Urban Flood Potential Mapping in Jakarta

Appl. Sci. 2022, 12(3), 1679; https://doi.org/10.3390/app12031679
by Indra Riyanto 1, Mia Rizkinia 1, Rahmat Arief 2 and Dodi Sudiana 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 December 2021 / Revised: 29 January 2022 / Accepted: 31 January 2022 / Published: 6 February 2022
(This article belongs to the Special Issue Sustainable Agriculture and Advances of Remote Sensing)

Round 1

Reviewer 1 Report

 

This manuscript proposes mapping flood-prone areas using machine learning, classifying the areas with the 3D CNN method. The main concerns are:

  • CNN is divided in two big steps. Feature Learning and Classification. What happens in each step for your application?
  • Why do you prefer Convolutional Neural networks (CNN) over Artificial Neural networks (ANN) for this application?
  • Explain the significance of the RELU Activation function in your Convolution Neural Network.
  • Explain more details about Pooling Layer in your CNN?
  • Explain the terms “Valid Padding” and “Same Padding” in your CNN.
  • What are the different types of Pooling? Explain their characteristics.
  • What is the role of the Fully Connected (FC) Layer in your CNN?
  • Briefly explain the two major steps of CNN, i.e., Feature Learning and Classification.
  • Explain the significance of “Parameter Sharing” and “Sparsity of connections” in CNN.
  • More evaluation metrics, such as R2 and RMSE, are necessary for validation.

Author Response

Response 1: In the Feature Learning (Training) phase, training data are provided to the model so that it learns the known flood data; in the Classification (Testing) phase, the system is presented with other data to recognize whether floods are present in the images. We explained this in the 3rd paragraph of Section 2.5. Proposed Method.

Response 2: The CNN offers unsupervised classification, compared with a semi-supervised ANN. We added an explanation in the 7th paragraph of Section 1. Introduction and the 1st paragraph of Section 2.4. Convolutional Neural Network (CNN).

Response 3: The ReLU plays a significant part in this phase because flooded areas tend to change values: dry land changing to water surface and then back to dry land can produce a negative value. The ReLU rectifies this problem and prevents negative output neurons from contributing to the network. The Classification stage presents the system with other data to recognize whether flood features are present in the images, using feedback from the results of the Training stage. We explained this in the 3rd paragraph of Section 2.5. Proposed Method.
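As a minimal, hypothetical illustration (not the authors' code), the rectification can be written directly in NumPy; the `change` vector below is an assumed example of per-pixel backscatter change across acquisitions:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return np.maximum(0.0, x)

# Hypothetical backscatter-change values for one pixel across acquisition dates;
# a negative value could arise when water returns to dry land.
change = np.array([0.8, -0.5, 0.3, -0.1])
print(relu(change))  # [0.8 0.  0.3 0. ] -- negative activations no longer propagate
```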

Response 4: The images are downsampled by a pooling layer that summarizes the features present in them. In this model, the pooling layer uses max pooling, which keeps the most dominant value in each sample window. We added this in the 2nd paragraph of Section 2.5. Proposed Method.

Response 5: The problem with sampling a matrix (or an image) in a convolutional layer is that pixels at and near the edge are sampled less often than pixels farther from the edge, which sometimes results in sampling inaccuracy. To prevent this, the input to the kernel filter is padded with extra rows and columns so that more information can be collected from the edge pixels. For 2-dimensional data, there are two types of padding: same padding and valid padding. Same padding keeps the output the same size as the original matrix; essentially, it resamples the image. Valid padding considers all pixels as valid so that their values are used by the model. This is useful for keeping the information from corner pixels, which a simple model would otherwise treat as less important because they are sampled less often than other pixels. We added this explanation in the 3rd paragraph of Section 2.4.
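For illustration only, and assuming the Keras API rather than the authors' implementation, the effect of the two padding modes on the output size can be checked directly; the 32x32 single-band input below is hypothetical:

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 32, 32, 1).astype("float32")  # one 32x32 single-band image

# "same" padding: zero-pads the border so the output keeps the 32x32 size.
same_out = layers.Conv2D(8, kernel_size=3, padding="same")(x)
# "valid" padding: no padding, so the 3x3 kernel shrinks the output to 30x30.
valid_out = layers.Conv2D(8, kernel_size=3, padding="valid")(x)

print(same_out.shape)   # (1, 32, 32, 8)
print(valid_out.shape)  # (1, 30, 30, 8)
```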

Response 6: The function of the pooling layer is to reduce the size of an image by downsampling it and summarizing the features. The common pooling methods are average pooling, which summarizes the average presence of a feature, and maximum pooling, which summarizes the strongest feature [46]. Average pooling produces smooth features, which is useful for extracting the most representative value, such as the color of a surface, where a small variation in isolated points within a region does not affect the overall value. On the other hand, max pooling extracts high-contrast data such as edges or points. We explained this in the 2nd paragraph of Section 2.4.
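As a self-contained sketch (illustrative only, with a made-up 4x4 feature map), the two pooling types can be compared on non-overlapping 2x2 windows:

```python
import numpy as np

def pool2x2(feature_map, reducer):
    """Downsample a 2D feature map with non-overlapping 2x2 windows."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 8, 1, 1],
                 [0, 2, 9, 5],
                 [1, 1, 6, 4]], dtype=float)

print(pool2x2(fmap, np.max))   # [[8. 2.] [2. 9.]] -- keeps the strongest response per window
print(pool2x2(fmap, np.mean))  # [[4. 1.] [1. 6.]] -- keeps the smoothed response per window
```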

Response 7: We provided the explanation of the role of the FC layer, as well as the pooling layers, in the CNN in Section 2.5. Proposed Method.

Response 8: Same as our response to Point 1.

Response 9: Unlike other neural networks, where every neuron is fully connected to every neuron of the next layer, a CNN disregards zero-valued parameters and makes fewer connections between layers. The non-zero parameters can be shared by more than one connection in the layer to reduce the number of connections. This characteristic is useful for recognizing features. This is explained in the 1st paragraph of Section 2.4.
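To make parameter sharing concrete, a hypothetical comparison (not taken from the manuscript) of a fully connected layer and a 3x3 convolution over the same 64x64 input shows how few weights the shared kernel needs:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Fully connected: one weight per input pixel per output neuron.
dense = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Flatten(),
    layers.Dense(64),
])

# Convolutional: one shared 3x3 kernel per output channel, reused at every position.
conv = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Conv2D(64, kernel_size=3),
])

print(dense.count_params())  # 262,208 weights (4096 inputs x 64 neurons + biases)
print(conv.count_params())   # 640 weights (3x3x1 kernel x 64 filters + biases)
```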

Response 10: We provided RMSE values in Tables 2 and 3 in Section 3. Results and Discussion. The Root Mean Square Error (RMSE) for the 70/30 and 80/20 splits is around 0.28, while for 90/10 it is lower at 0.2, which is consistent with the higher accuracy. For 150 epochs, the accuracies are 0.672, 0.692, and 0.674 with RMSE of 0.288, 0.314, and 0.296 for the 70/30, 80/20, and 90/10 data splits, respectively. The most significant increase in accuracy is for 150 epochs with a 90/10 split of testing and validation data, which shows an increase of 0.045 to an accuracy of 0.719; the lowest RMSE is achieved by the 70/30 split with 100 epochs, with the RMSE dropping from 0.284 to 0.024.
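For reference, the RMSE reported above follows the standard definition; a minimal NumPy sketch with hypothetical labels and predictions is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between target and predicted values."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical example: binary flood labels vs. predicted flood probabilities.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
print(rmse(y_true, y_pred))  # ~0.32 for this toy example
```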

Author Response File: Author Response.docx

Reviewer 2 Report

Please see my comments in the attached file. Thank you.

Comments for author File: Comments.docx

Author Response

The suggested rewriting has been incorporated into the text for the Introduction (lines 101-110) and the Conclusions (lines 424-434).

Introduction (Lines 100-110)

The objective of this work is to investigate the mapping of flood potential in Jakarta and nearby coastal areas using a 3D CNN on co-polarized (VV) and cross-polarized (VH) Sentinel-1A SAR images. The SAR dataset was collected using Google Earth Engine and consists of Sentinel-1A VV and VH images acquired between November 2019 and December 2020 as RGB composite (VV and VH) TIF images. The images are then preprocessed into grayscale images to be converted into a vector data format. The January 2, 2020 images were also sampled as flood and non-flood target sub-images, along with the corresponding locations from the other images, to capture the multitemporal value changes of the flooded locations and the consistency of the non-flooded locations. The CNN training is performed with training/test ratios of 70/30, 80/20, and 90/10, with the number of epochs varied between 100 and 160, to obtain the combination with the highest accuracy and the shortest processing time.
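A hedged sketch of how such a grid of split ratios and epoch counts could be evaluated is given below; the array shapes, the data, and the `build_3d_cnn` helper are all hypothetical placeholders rather than the authors' pipeline:

```python
import time
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholders: `samples` are multi-temporal SAR sub-images
# with shape (N, time, height, width, band), `labels` are flood / non-flood targets.
samples = np.random.rand(200, 14, 16, 16, 1).astype("float32")
labels = np.random.randint(0, 2, size=200)

def build_3d_cnn():
    """Assumed toy 3D CNN; the real architecture is described in the manuscript."""
    from tensorflow.keras import layers, models
    return models.Sequential([
        layers.Input(shape=(14, 16, 16, 1)),
        layers.Conv3D(8, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(1, activation="sigmoid"),
    ])

for test_fraction in (0.30, 0.20, 0.10):   # 70/30, 80/20, 90/10 splits
    for epochs in (100, 150):              # epoch counts to compare
        x_tr, x_te, y_tr, y_te = train_test_split(samples, labels,
                                                   test_size=test_fraction)
        model = build_3d_cnn()
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        start = time.time()
        model.fit(x_tr, y_tr, epochs=epochs, verbose=0)
        _, acc = model.evaluate(x_te, y_te, verbose=0)
        print(f"split {1 - test_fraction:.0%}/{test_fraction:.0%}, "
              f"{epochs} epochs: acc={acc:.3f}, {time.time() - start:.0f}s")
```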

 CONCLUSIONS (Lines 424-434)

In this study, an application of a 3D Convolutional Neural Network to flood mapping is presented. The overfitting problem is minimized by the deactivation factor, which reduces the number of neurons and simplifies the connections. The result of the research is that the 3D-CNN method enables the analysis of multi-temporal images for flood detection and classification instead of using multiple image pairs with multiple classification levels. For the three combinations of training/test data splits, the highest overall accuracy of 0.72 was achieved with a 90/10 split and 150 epochs in 302 minutes. In terms of computation time, the best performance is achieved with an 80/20 split and 150 epochs, with an accuracy of 0.71 in 283 minutes. Further tests with epoch counts other than 150 showed that accuracy gradually decreases with a 90/10 split, but with a lower training function, the accuracy improves as the number of epochs increases.

Author Response File: Author Response.docx

Reviewer 3 Report

Flooding in urban areas is a significant disaster that must be correctly mitigated because of the large number of affected people, the material losses, the hampered economic activity, and flood-related diseases. The authors used a 3D convolutional neural network on multi-temporal synthetic aperture radar images for urban flood potential mapping in Jakarta. Before it can be published in this journal, the following points should be improved.

(1) From the title, the context is flood mapping, but I cannot see the final flood map in the results. Please add the final flood map to the results.

(2) Innovation. In this version, I cannot see the innovation; it reads more like a validation of an existing remote sensing classification method. Please revise the Introduction and add more details about the innovation of this research.

(3) Please add more references in the first paragraph.

(4) There are too many words about previous image classification methods in Sections 2.2-2.5; please move them to the Introduction. Why you selected the 3D-CNN should be explained in the Introduction, rather than just listing the previous methods.

(5) Please use RMSE or other indicators to explain the performance of the method used in this research. Presenting computational efficiency alone is not enough.

Comments for author File: Comments.pdf

Author Response

Response 1: The resulting map is provided as Figure 12 in Section 3. Results and Discussion.

Response 2: The reason for choosing the 3D-CNN was added in the 8th paragraph of Section 1. Introduction. We proposed a 3-D classification that combines the 2-D spatial image and the 1-D multi-temporal dimension into a single convolution.
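As an illustrative sketch only (hypothetical dimensions, not the manuscript's configuration), a single 3-D convolution slides one kernel jointly over the temporal axis and the two spatial axes of the image stack:

```python
import numpy as np
from tensorflow.keras import layers

# Hypothetical stack: 1 sample, 14 acquisition dates, 32x32 pixels, 1 SAR band.
stack = np.random.rand(1, 14, 32, 32, 1).astype("float32")

# One 3x3x3 kernel covers 3 dates and a 3x3 pixel window at a time, so spatial
# texture and temporal change are learned in a single convolution.
conv3d = layers.Conv3D(filters=8, kernel_size=(3, 3, 3), activation="relu")
print(conv3d(stack).shape)  # (1, 12, 30, 30, 8)
```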

Response 3: We added references [1] and [2], the government reports on the recent flood.

Response 4: We moved previous methods from Sections 2.2-2.5 to Section 1. Introduction. We also explained why we chose the 3D-CNN in the 8th paragraph of Section 1.

Response 5: We provided RMSE values in Tables 2 and 3 in Section 3. Results and Discussion.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

All comments have been answered correctly. It can be accepted.

Author Response

Comment: All comments have been answered correctly. It can be accepted.

Response: Thank you for accepting the revision
