1. Introduction
Natural hazards such as floods, landslides, and typhoons pose severe threats to people's lives and property. Among them, floods are the most frequent, widespread, and deadly, affecting more people globally each year than any other disaster [1,2,3]. Floods not only cause injuries, deaths, loss of livelihoods, infrastructure damage, and asset losses, but can also have direct or indirect effects on health [4]. In the past decade, 2850 natural hazard disasters occurred globally, of which 1298 were floods, accounting for approximately 46% of the total [1]. In 2019 alone, 127 floods occurred worldwide, affecting 69 countries, causing 1586 deaths, and displacing more than 10 million people [2]. In the future, floods may become an even more frequent threat to human society due to sea-level rise, climate change, and urbanization [5,6].
An effective way to reduce flood losses is to enhance our capacity for flood risk mitigation and response. In recent years, timely and accurate flood detection products derived from satellite remote sensing imagery have become effective tools for flood disaster response, benefiting city and infrastructure planners, risk managers, disaster emergency response agencies, and property insurance companies worldwide [7,8,9,10,11]. However, efficiently distinguishing permanent water from temporary water in flood disasters remains a challenge.
In recent years, researchers have devoted substantial effort to flood detection based on satellite remote sensing images. The most commonly used approach for distinguishing water from non-water areas is threshold-based splitting [12,13,14,15]. However, the optimal threshold depends on the geographical area, acquisition time, and atmospheric conditions of the imagery, which greatly limits the generalization ability of such methods. The European Space Agency (ESA) developed the series of Sentinel missions to provide freely available datasets, including Synthetic Aperture Radar (SAR) data from the Sentinel-1 sensor and optical data from the Sentinel-2 sensor. Optical and SAR imagery have complementary advantages for flood information extraction [5,14,16,17,18], and combining SAR images with optical images for more accurate flood mapping is therefore of great interest to researchers [16,19,20,21,22]. Although data fusion improves the accuracy of flood extraction, distinguishing permanent water from temporary water in flood disasters remains very challenging. Separating temporary flood water from permanent water has mainly relied on multi-temporal change detection methods [23,24,25,26,27,28]. This type of approach requires at least one pair of remote sensing scenes acquired before and after a flood event. Although multi-temporal change detection can detect temporary water in flood events well, it is greatly limited by its mandatory requirement for pre-disaster satellite imagery.
Deep learning methods, represented by convolutional neural networks, have proven effective in the field of flood damage assessment, and related research has grown rapidly since 2017 [29]. However, most algorithms focus on affected buildings in flood events [22,30], with very few examples of flood water detection. The latest research examines deep learning algorithms for enhancing flood water detection [31,32]. Early research focused on the extraction of surface water [33,34,35]. Furkan et al. proposed a deep-learning-based approach for surface water mapping from Landsat imagery; the results demonstrated that the deep learning method outperformed both the traditional thresholding approach and a Multi-Layer Perceptron model. The characteristics of flood water in satellite imagery differ from those of ordinary surface water, which increases the difficulty of flood extraction. Maryam et al. developed a semantic segmentation method for extracting flood boundaries from UAV imagery; this semantic segmentation-based flood extraction method was further applied to identify flood inundation caused by mounting destruction [36], and the experimental results validated its efficiency and effectiveness. Muñoz et al. [37] combined multispectral Landsat imagery with dual-polarized synthetic aperture radar imagery to evaluate the performance of integrating a convolutional neural network with a data fusion framework for compound flood mapping; its usefulness was verified by comparison with other methods. These studies show that deep learning algorithms play an important role in enhancing flood classification. However, research in this field is still in its infancy due to the lack of high-quality, large-scale flood-annotated satellite datasets.
Recent developments in earth observation have contributed a series of open-source, large-scale, disaster-related satellite imagery datasets, which has greatly spurred the use of deep learning algorithms for disaster mapping from satellite imagery. For building damage classification, the xBD dataset provides researchers worldwide with large-scale satellite imagery collected from multiple disaster types with four-category damage-level labels, and the research spawned by this public dataset has verified the great potential of deep learning for building damage recognition [38,39]. For flooded building damage assessment in hurricane events, FloodNet provides a high-resolution UAV image dataset serving the same purpose [40]. The recently released large-scale open-source Sen1Floods11 dataset [5] is boosting research on deep learning algorithms for water type detection in flood disasters [41]. For water type detection in flood disaster events, Sen1Floods11 should take on a similar role; unfortunately, only one preliminary work has been conducted so far.
With the purpose of developing an efficient benchmark algorithm for distinguishing permanent water from temporary water in flood disasters based on the Sen1Floods11 dataset, and thereby boosting research in this area, the contributions and originality of this research are as follows.
Effectiveness: To the best of our knowledge, our proposed algorithm achieves the highest accuracy reported on the Sen1Floods11 dataset so far.
Convenience: All Sentinel-1 and Sentinel-2 imagery used by the model is post-flood imagery, which greatly reduces the reliance on pre-flood satellite data.
Refinement: We introduced a salient object detection algorithm to improve the convolutional neural network classifier; in addition, a multi-scale loss and data augmentation were adopted to improve the model's accuracy.
Robustness: The robustness of our proposed algorithm was verified on a new Bolivia flood dataset.
2. Sen1Floods11 Dataset
We utilize the Flood Event Data in the Sen1Floods11 dataset [5] to train, validate, and test deep learning flood algorithms. The dataset provides raw Sentinel-1 SAR images (IW mode, GRD product), Sentinel-2 MSI Level-1C images, and classified permanent water and flood water. It contains 4831 non-overlapping 512 × 512 tiles from 11 flood events and supports flood mapping at the global scale, covering 120,406 square kilometers and spanning 14 biomes, 357 ecological regions, and six continents. Locations of the flood events are shown in Figure 1.
For each selected flood event, the time interval between the acquisition of the Sentinel-1 and Sentinel-2 imagery does not exceed two days. The Sentinel-1 imagery contains two bands, VV and VH, holding backscatter values; the Sentinel-2 imagery includes 13 bands, all holding top-of-atmosphere (TOA) reflectance values. The imagery is projected to the WGS-84 coordinate system. Because the ground resolution differs between bands, all bands are resampled to a common 10 m resolution to enable image fusion. Each band is visualized in Figure 2.
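As an illustration of this preprocessing step, a minimal sketch of resampling and stacking the two sensors is given below. The file names, the use of rasterio, and the bilinear resampling choice are our own assumptions for demonstration, not part of the Sen1Floods11 distribution.

```python
# Minimal sketch: resample all bands to the common 10 m grid and stack
# Sentinel-1 with Sentinel-2 for fusion. "S1.tif" / "S2.tif" are
# hypothetical co-registered tile names, assumed for illustration.
import numpy as np
import rasterio
from rasterio.enums import Resampling

TARGET_SIZE = 512  # 512 x 512 tiles at 10 m ground resolution

with rasterio.open("S1.tif") as s1:
    # Sentinel-1: band 1 = VV, band 2 = VH backscatter.
    vv_vh = s1.read(out_shape=(2, TARGET_SIZE, TARGET_SIZE),
                    resampling=Resampling.bilinear)

with rasterio.open("S2.tif") as s2:
    # Sentinel-2: 13 TOA reflectance bands at 10/20/60 m native
    # resolution, all resampled here to the 10 m grid.
    msi = s2.read(out_shape=(s2.count, TARGET_SIZE, TARGET_SIZE),
                  resampling=Resampling.bilinear)

# Channel-wise stack used as a fused network input (2 + 13 = 15 bands).
fused = np.concatenate([vv_vh.astype("float32"),
                        msi.astype("float32")], axis=0)
print(fused.shape)  # (15, 512, 512)
```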
Due to the high cost of hand labeling, 4370 tiles are not hand-labeled; their annotations were generated automatically by Sentinel-1 and Sentinel-2 flood classification algorithms, and they serve as weakly supervised training data. The remaining 446 tiles were manually annotated by trained remote sensing analysts for high-quality model training, validation, and testing. The weakly supervised data contain two types of surface water labels: one produced by histogram thresholding of the Sentinel-1 image, and the other generated from the Sentinel-2 image by thresholding the Normalized Difference Vegetation Index (NDVI) and the Modified Normalized Difference Water Index (MNDWI). All cloud and cloud shadow pixels were masked and excluded from training and accuracy assessment. The hand labels comprise all-water labels and permanent-water labels. For the all-water labels, analysts used Google Earth Engine to correct the automated labels, removing uncertain areas and adding missed water, guided by the Sentinel-1 VH band, two false-color composites from Sentinel-2, and the reference water classification from Sentinel-2. For the permanent-water labels, using the JRC (European Commission Joint Research Centre) surface water dataset, Bonafilia et al. [5] labeled pixels detected as water at both the beginning (1984) and end (2018) of that dataset as permanent water; pixels never observed as water during this period are treated as non-water, and the remaining pixels are masked. Examples of water labels are visualized in Figure 3.
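For intuition, the Sentinel-2 side of this weak labeling might look like the sketch below. The band choice (B3/B11) and the zero threshold are illustrative assumptions, not the exact values used to build the dataset.

```python
import numpy as np

def mndwi_weak_label(green: np.ndarray, swir: np.ndarray,
                     threshold: float = 0.0) -> np.ndarray:
    """Weak water mask from MNDWI = (Green - SWIR) / (Green + SWIR).

    `green` and `swir` are TOA reflectance arrays (e.g., Sentinel-2
    B3 and B11 resampled to 10 m); the threshold value is assumed.
    """
    mndwi = (green - swir) / (green + swir + 1e-6)  # avoid divide-by-zero
    return (mndwi > threshold).astype(np.uint8)     # 1 = water, 0 = non-water
```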
Like most existing studies [42], the Sen1Floods11 dataset shows a highly imbalanced distribution between flooded and unflooded areas. As shown in Table 1, for the all-water labels, non-water pixels outnumber water pixels by a factor of about eight; for the permanent-water labels, the imbalance is even more extreme, with non-water pixels outnumbering water pixels by a factor of about 32.
The dataset is split into three parts: a training set, a validation set, and a test set. All 4370 automatically labeled images are used as the weakly supervised training set. The hand-labeled data are first randomly split into training, validation, and testing subsets in the proportion 6:2:2. To test the model's ability to predict unknown flood events, all hand-labeled data from the Bolivia flood event are held out as a distinct test set; the remaining hand-labeled data in the three subsets form the final training, validation, and testing sets, respectively. Correspondingly, all Bolivia data are also excluded from the weakly supervised training set and do not participate in model training. The overall composition of the dataset is shown in Table 2.
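A schematic of this split with the Bolivia hold-out might look as follows; the tile bookkeeping and the event-name convention are our own assumptions, chosen only to make the procedure concrete.

```python
import random

def split_hand_labeled(tiles, holdout_event="Bolivia", seed=0):
    """Split hand-labeled tiles 6:2:2, holding out one event entirely.

    `tiles` is a hypothetical list of (event_name, tile_id) pairs.
    Holding out the event before the random split yields the same
    final sets as splitting first and then removing the event.
    """
    holdout = [t for t in tiles if t[0] == holdout_event]  # distinct test set
    rest = [t for t in tiles if t[0] != holdout_event]
    rng = random.Random(seed)
    rng.shuffle(rest)
    n = len(rest)
    train = rest[: int(0.6 * n)]
    val = rest[int(0.6 * n): int(0.8 * n)]
    test = rest[int(0.8 * n):]
    return train, val, test, holdout
```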
5. Discussion
Data augmentation can improve the generalization ability of the model, especially when training data are scarce. Since the Bolivia test set consists entirely of images from an unseen flood event, the gains on it indicate that the model's generalization ability was improved. Focal loss [49] is designed to handle the extreme imbalance between water and non-water pixels, and between difficult and easy pixels, during training. The ablation results show the effectiveness of focal loss in dealing with this sample imbalance problem. Optical imagery contains multispectral surface reflectance information, which is widely used in water indices and thresholding methods. Image fusion aims to use optical data to assist SAR-based prediction. Our experimental results demonstrate that optical imagery provides useful supplementary information for water segmentation.
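A minimal PyTorch sketch of the binary focal loss [49] is shown below; the default alpha/gamma values are those commonly reported by Lin et al., not necessarily the values tuned in our experiments.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    Down-weights easy, well-classified pixels so that training focuses
    on hard (often minority-class water) pixels. `targets` is a float
    tensor of 0/1 labels with the same shape as `logits`.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)       # prob. of true class
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()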
However, the all water and temporary water tasks have poor mIoU scores, for two reasons. The first lies in training: the all water and temporary water images have more water pixels and fewer non-water pixels, and this difference in sample size leads to differences in learning. The omission and commission errors show that the all water and temporary water tasks perform better than permanent water on water pixels but worse on non-water pixels; since non-water pixels dominate our data overall, the poorer prediction of non-water pixels leads to worse overall results. The second reason is a difference in image characteristics: the all water and temporary water images contain more small tributaries and scattered patches from newly flooded areas, which are usually more challenging to identify.
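For reference, the omission and commission errors and mIoU discussed here can be read off the binary confusion matrix; the small sketch below is our own formulation of these standard definitions.

```python
import numpy as np

def water_metrics(pred: np.ndarray, label: np.ndarray):
    """mIoU plus omission/commission for the water class.

    `pred` and `label` are binary masks (1 = water); masked/ignored
    pixels are assumed to have been removed beforehand.
    """
    tp = np.sum((pred == 1) & (label == 1))
    fp = np.sum((pred == 1) & (label == 0))
    fn = np.sum((pred == 0) & (label == 1))
    tn = np.sum((pred == 0) & (label == 0))
    iou_water = tp / (tp + fp + fn + 1e-9)
    iou_nonwater = tn / (tn + fp + fn + 1e-9)
    omission = fn / (tp + fn + 1e-9)     # water pixels that were missed
    commission = fp / (tp + fp + 1e-9)   # false water detections
    miou = (iou_water + iou_nonwater) / 2.0
    return miou, omission, commission
```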
With the help of the hybrid loss, our model pays more attention to boundary pixels and increases the confidence of its predictions. As a result, our method not only produces richer details and sharper boundaries but also separates water from non-water pixels with a larger probability gap. The strong feature extraction ability of the deep learning model enables it to handle challenging scenes.
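BASNet-style hybrid losses combine a pixel-level BCE term, a patch-level SSIM term, and a map-level IoU term. The sketch below shows the BCE and IoU parts only (the SSIM term is omitted for brevity); it is our own simplified reading, not the exact training code.

```python
import torch
import torch.nn.functional as F

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Soft IoU loss over (N, C, H, W) probability maps (map level)."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = (pred + target - pred * target).sum(dim=(1, 2, 3))
    return (1.0 - inter / (union + 1e-6)).mean()

def hybrid_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """BCE (pixel level) + soft IoU (map level). BASNet additionally
    uses an SSIM term (patch level) that sharpens boundaries."""
    pred = torch.sigmoid(logits)
    return F.binary_cross_entropy_with_logits(logits, target) + iou_loss(pred, target)
```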
6. Conclusions
In this paper, we developed an efficient model for detecting permanent water and temporary water in flood disasters by fusing Sentinel-1 and Sentinel-2 imagery with a deep learning algorithm, using the benchmark Sen1Floods11 dataset. The BASNet network adopted in this work captures both large-scale and fine structural features. Combined with focal loss, our model achieved state-of-the-art accuracy in identifying hard boundary pixels. The model's performance was further improved by fusing multi-source information, and the ablation study verified the effectiveness of each improvement. The comparison experiments demonstrated that the implemented method detects permanent and temporary flood water more accurately than other methods. The proposed model also performed well on the unseen Bolivia test set, which verifies its robustness. Thanks to the network architecture's modularity, it can be easily adapted to data from other sensors. Finally, the method requires no prior knowledge, additional data pre-processing, or multi-temporal data, which significantly reduces its complexity and increases the degree of automation.
Ongoing and future work focuses on training water segmentation models on high-spatial-resolution remote sensing imagery, which has more complex background information, objects with larger scale variation, and more unbalanced pixel classes [58]. More sophisticated modules are required to extract and fuse this richer image information. In addition, existing pre-trained neural networks are all based on RGB images, and applying them directly to remote sensing images may reduce the efficiency of transfer learning due to differences in data distribution. McKay et al. [61] dealt with this problem by discarding deep feature layers. Qin et al. [60] designed a network that can be trained from scratch, but this lighter network may degrade performance. Although we dramatically improved the results of flood mapping, much work remains to be done.