1. Introduction
Lakes are an important component of terrestrial ecosystems and participate in the water cycle of the ecosystem. They are also water resources and flood control reservoirs, and so play a role in maintaining the ecological balance of river basins and providing water for residents’ domestic use. Lake area is one of the most important parameters of lake morphology, and its changes reflect the local climate and environment, affecting human production and life. Synthetic Aperture Radar (SAR) can work all day and under all weather conditions, and so can provide data for near-real-time monitoring of terrestrial water bodies. Therefore, the use of SAR for surface water monitoring is of great use in the management of water resources and the prevention of flood disasters.
Water identification methods based on SAR can generally be divided into two broad categories: unsupervised classification [1,2,3,4,5,6] and supervised classification [7,8,9,10,11] (including semisupervised classification [12]). Unsupervised methods with adaptive threshold segmentation can achieve fast identification of water bodies, but usually supervised learning methods give better results [13]. Traditional supervised learning extracts multiple features through algorithms such as Gray Level Co-Occurrence Matrix [8,14] and inputs them into classical machine learning algorithms such as random forest. Such methods require the manual selection of features and tuning of parameters, resulting in huge computational and memory costs.
As a branch of machine learning, deep learning algorithms have been widely studied and applied in remote sensing in recent years due to their efficient image feature identification capabilities [15,16,17,18]. Since image-level segmentation models such as FCN [19] and UNet [20] were first proposed, more and more researchers have applied convolutional neural networks to SAR image water identification and monitoring [21,22,23,24].
In order to obtain a higher-accuracy water system map, many researchers have improved the SAR water body identification method. Xu [25] added an attention mechanism to UNet skip connections, and Li [26] built PA-UNet by adding the Spatial Pyramid Pooling (SPP) module on this basis. Ren [27] introduced ResNet-34 and the dual attention mechanism into the encoding process of U-Net. From the experimental results, the use of attention mechanisms and SPP can better perceive global information and reduce large-area wrong segmentation, but the identification ability for narrow waters is weak. Dang [28] incorporated Multiscale Dilated Convolution (MSDC) and Multikernel Max Pooling (MKMP) modules into ResNet, focusing on multiscale and multishape features. Chen et al. [29] proposed a multiscale deep convolutional network, which extracted and generated high-level features through the Multiscale Spatial Feature (MSF) module and the Multilevel and Selective Attention Network (MLSAN), improving the output results based on weighting measures. In these studies, multiscale convolution was used to extract deep-level information of feature maps, but the depth of networks increased, and there were still many false alarms. In addition, some scholars obtained more refined output results at the expense of high computational consumption by paralleling or connecting multiple neural networks in series. Nemni et al. [30] used an XNet that integrates two UNet-like encoding–decoding processes to avoid the loss of water features caused by multiple downsampling encoding. Kim et al. [31] used a parallel-structured HRNet network to simultaneously extract features of different resolutions. Bai [32] designed BASNet to identify water. The network consists of a Wide-UNet-like network and a Residual Refinement Module (RRM). The first encoding–decoding process obtains the probability map of water body segmentation, and the second learns how the segmentation output differs from the ground truth.
The above studies have improved the application of deep learning to SAR data for water identification, but some deficiencies remain: (1) the identification accuracy of narrow waters is not high; (2) the false alarm problem caused by radar noise and mountain shadows is not well handled; (3) high-accuracy networks are too complicated and consume lots of computational resources, making it difficult to achieve efficient segmentation.
To address these issues, a water body identification method based on Attention-UNet3+ is proposed in this paper. The Attention-UNet3+ is improved from U-Net. It combines full-scale skip connections, an attention mechanism, and deep supervision. Full-scale skip connections were proposed in 2020 [33]. They have been used for image segmentation [34,35] and object detection [36,37]. On this basis, the channel attention mechanism was used to organize the connected multichannel information to obtain better prediction results [38,39]. However, due to the existence of multiple skip connections, the memory usage of the model during training is relatively high. Adding an attention module that requires lots of resources may reduce the transfer effect of full-scale skip connections. To tackle this problem, this paper combines a low-resource-demanding spatial attention model with full-scale skip connections.
The main contributions of the research are as follows:
(1) For the problem of low identification accuracy in narrow waters, full-scale skip connections are introduced. This connection transfers and utilizes different scale features in the decoding process, integrating low-level details and high-level semantics in the feature map, which helps the network to extract features in narrow waters.
(2) The spatial attention mechanism is used to suppress false alarms in water identification. The mechanism generates a spatial attention coefficient matrix, determines the focus information of the feature map, performs feature sorting and fusion, and suppresses the background irrelevant to water body identification.
(3) Considering the high computational cost and low efficiency of the current high-precision deep learning models, a deep supervision module is added to the model. The staged output of the decoder is used to improve the model efficiency, which enables the model to have fast segmentation capabilities.
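The spatial attention coefficient matrix in contribution (2) can be illustrated with a minimal NumPy sketch. The exact gate used by Attention-UNet3+ is defined in Section 2, which is not reproduced here, so the sketch assumes an additive attention gate in the style of Attention U-Net; the function name, the weight shapes, and the requirement that both inputs already share one spatial size are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_gate(skip, gate, w_skip, w_gate, w_psi):
    """Hypothetical additive attention gate (not necessarily the paper's design).

    skip:   (H, W, C_s) skip-connection feature map
    gate:   (H, W, C_g) gating feature map, assumed already resized to H x W
    w_skip: (C_s, C_int) weights of a 1x1 convolution applied to skip
    w_gate: (C_g, C_int) weights of a 1x1 convolution applied to gate
    w_psi:  (C_int,)     weights of a 1x1 convolution producing one channel
    """
    # A 1x1 convolution is a per-pixel matrix product over the channel axis.
    mixed = np.maximum(skip @ w_skip + gate @ w_gate, 0.0)  # ReLU
    # One coefficient in (0, 1) per pixel: the spatial attention matrix.
    alpha = sigmoid(mixed @ w_psi)
    # Rescale every channel of the skip features by the per-pixel coefficient.
    return skip * alpha[..., None], alpha
```

Pixels with coefficients near zero are suppressed before the features are fused, which is the mechanism by which background such as mountain shadow could be attenuated. Because the gate stores only one coefficient per pixel, its memory cost is small compared with the 320-channel decoder features.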
The rest of this article is organized as follows. Section 2 introduces our proposed method, including a detailed introduction of Attention-UNet3+. Section 3 presents the study area and data used. Section 4 gives the experimental results of the model and the multitemporal water analysis of the study area. Section 5 and Section 6 contain a discussion of the experimental results and a summary of the article, respectively.
4. Results and Analysis
Using an Intel(R) Core(TM) i7-11800H CPU at 2.30 GHz, 16 GB RAM, an RTX 3070 GPU, and the TensorFlow GPU 2.5.0 framework with Python 3.8 as the experimental environment, the rationality and validity of Attention-UNet3+ were verified. Three experiments were designed: (1) Attention-UNet3+ was compared with the commonly used water identification models, to confirm its superiority; (2) the stage outputs of deep supervision were tested for their ability to improve water identification efficiency while preserving the overall classification quality; (3) multitemporal SAR data were used to periodically monitor the water in the Poyang Lake area, and to examine the ability to identify large-scale multitemporal waters.
The testing and validation of the models were carried out in four validation areas with different river morphology and different surface environments in the Poyang Lake area in 2021. The expert visual interpretation results were used as the ground truth. The SAR images and the corresponding labels are shown in Figure 11. Val. 1 is the small mountain river. Val. 2 is the plain lake surface in the wet season. Val. 3 is the plain lake surface in the dry season. Val. 4 is the mountain branch lake.
4.1. Comparison of Different Models
Water bodies usually have obvious scattering characteristics in SAR images, but high-accuracy water identification also encounters some difficulties. The main reasons are incomplete identification of small-scale rivers caused by diverse shapes of water bodies and false detection caused by objects with low backscattering coefficients such as mountain shadows and tidal flats. To address these difficulties, full-scale skip connections and the attention mechanism are used to improve the performance of water detection. To quantitatively and qualitatively evaluate the impact of the proposed strategy on model performance, six sets of experiments were designed, as shown below. In the table and figure, TS is the threshold segmentation. Att refers to the network model with the attention mechanism; 3+ represents the full-scale skip connections. The training strategies and samples of other networks such as SegNet [47] and Deeplabv3+ [48] are the same as those of Att-UNet3+.
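For readers unfamiliar with the TS baseline, the following sketch shows one common form of threshold segmentation on a backscatter image. This section does not say which thresholding rule was used, so Otsu's method is assumed here purely for illustration; the function name and the bin count are hypothetical.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist.astype(float) / hist.sum()
    w0 = np.cumsum(p)              # probability of the darker class
    w1 = 1.0 - w0                  # probability of the brighter class
    mu = np.cumsum(p * centers)    # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    # Bins where one class is empty give 0/0 or x/0; treat them as zero.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return float(centers[np.argmax(between)])

def threshold_water_mask(backscatter_db):
    """Label pixels darker than the Otsu threshold as water."""
    return backscatter_db < otsu_threshold(backscatter_db)
```

A single global threshold of this kind cannot separate water from other dark surfaces such as mountain shadow or exposed tidal flats, which is consistent with the false detections described above.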
From the quantitative analysis of the results in Table 3, Table 4 and Table 5, it can be seen that the results of Att-UNet3+ are better than those of other methods, and the results are also the best in the UNet framework. Taking IOU as an example, the attention mechanism improved the accuracy of Att-UNet by 4.76% compared with UNet, and the full-scale connections allowed Att-UNet3+ to improve the accuracy of Att-UNet by 2.78%. Adding the attention mechanism and full-scale connections to the UNet framework to form a full-scale attention gate mechanism effectively improved the water identification accuracy.
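The IOU and Kappa scores compared above can be computed from a predicted mask and a ground-truth mask as sketched below. The paper's own metric definitions are given outside this section, so the standard binary definitions are assumed; the function name is hypothetical.

```python
import numpy as np

def iou_and_kappa(pred, truth):
    """Intersection over union of the water class and Cohen's kappa."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # water predicted as water
    fp = int(np.sum(pred & ~truth))    # background predicted as water
    fn = int(np.sum(~pred & truth))    # water predicted as background
    tn = int(np.sum(~pred & ~truth))   # background predicted as background
    n = tp + fp + fn + tn
    iou = tp / (tp + fp + fn)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return iou, kappa
```

The sketch assumes that both classes occur in the masks; a tile without any water in either mask would make the IOU denominator zero and needs separate handling.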
Figure 12 shows the results of water identification from different methods in different validation samples. It can be seen from the figure that the results of Val. 2 were relatively good, while the results of Val. 1, 3, and 4 were relatively poor. Because the water in Val. 2 was in a flat area, the features were more obvious, and the shadow of the hill was less obvious. Therefore, it was easy to distinguish water from other ground objects. For Val. 1, the water was all small water bodies, and the river curvature was high, resulting in the omission of water and low accuracy. For Val. 3, this area was the center of Poyang Lake during the dry season. The backscattering coefficient of the exposed tidal flats was relatively low and close to that of water, which led to the misclassification of water. For Val. 4, there were mountains and many tributaries of the lake, which made it prone to misclassification and loss of the edge. Moreover, because the mountain shadow presented a low-scattering area similar to the water, it was prone to misdetection, resulting in poor identification accuracy.
From the analysis of Figure 12 and Table 3, Table 4 and Table 5, it can be seen by comparing Val. 1, 3, and 4 that Att-UNet3+ was superior to the other water identification models. Att-UNet3+ alleviated the misclassification of water, incomplete identification of narrow rivers, and false alarms caused by mountain shadows. In summary, the attention mechanism and full-scale skip connections improved the network performance to varying degrees, which enabled Attention-UNet3+ to achieve high-accuracy water identification.
4.2. Stage Output Results of Deep Supervision
In practice, it is usually necessary to monitor water bodies over a large area and make large-scale water body thematic maps. Such applications place less weight on the accuracy of small river identification than on obtaining monitoring results quickly. In this regard, the model of this paper introduces a deep supervision module, which enables the model decoder to output in stages. By pruning the model and removing irrelevant decoder paths, the efficiency of the model can be improved and its computational requirements reduced, which enables the model to segment quickly.
The parameter quantity, accuracy, and prediction time for processing a 2320 × 2320 pixel image at each stage output of the network are shown in Table 6 and Figure 13 to demonstrate the effects of stage outputs. It can be seen that, compared with Out5, Out4 led to a 52.02% reduction in prediction time, a 10.07% reduction in parameters, and a 3.48% reduction in IOU.
In terms of other models, SegNet and Deeplabv3+ were selected for comparison. Table 6 lists the results, and Figure 13 demonstrates the prediction time and IOU of deep supervision and the results of SegNet and Deeplabv3+. It can be seen that the Out4 has about 25% higher prediction efficiency with relatively better accuracy. The reason that stage output can significantly improve efficiency is that the number of channels of the feature tensors does not change with the decoding process, but the size of the feature tensors increases. When the amount of feature information reaches a certain level, it may lead to much computational consumption. Therefore, if we choose one of the stage outputs as the model result, it can reach a tradeoff between speed and acceptable accuracy.
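The argument that a constant channel count combined with growing spatial size drives the cost can be made concrete with a rough count of multiply-accumulate operations. The sketch below assumes one 3 × 3 convolution with 320 input and 320 output channels per decoder stage and spatial sizes of 1/16, 1/8, 1/4, 1/2, and 1/1 of the 2320 × 2320 input. The layer layout and the mapping of Out1 to Out5 onto these resolutions are not stated in this section and are assumptions.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates of one stride-1, same-padded k x k convolution."""
    return height * width * c_in * c_out * kernel * kernel

def stage_costs(side=2320, channels=320, downsample_factors=(16, 8, 4, 2, 1)):
    """Cost of one fixed-width convolution at each assumed decoder resolution."""
    costs = {}
    for factor in downsample_factors:
        s = side // factor
        costs[factor] = conv_macs(s, s, channels, channels)
    return costs

costs = stage_costs()
full = costs[1]
for factor, macs in costs.items():
    print(f"1/{factor} resolution: {macs / full:.4f} of the full-resolution cost")
```

Under these assumptions each halving of the resolution cuts the cost of a same-width convolution by a factor of four, so skipping the last decoder stage removes the single most expensive block. The measured 52.02% reduction in prediction time also includes the encoder and other overheads, so it is not expected to match this idealized ratio.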
As can be seen from Figure 14, using a certain stage output can significantly reduce the segmentation time of the model and the number of computing resources occupied, but also reduces the segmentation accuracy. The decrease in accuracy, in this case, is mainly manifested in the identification of small-scale water and the accurate restoration of water boundaries. It has little effect on the identification of macro-scale water bodies. Therefore, in the case of allowing a certain classification error, the stage output with low model complexity can be selected as the output classification result, so as to achieve the purpose of improving the water identification efficiency.
4.3. Multitemporal Analysis of Poyang Lake
The proposed method was applied to Sentinel-1 SAR data to monitor the changes in the water body area of Poyang Lake in 2021. As Figure 15 shows, the area of Poyang Lake in 2021 varied greatly. It expanded from 1988.35 km² in February to 4408.31 km² in June. After its peak from June to September, the lake shrank to 1846.79 km² in December. The water body changes in the local area of Poyang Lake also have certain characteristics: The region of Poyang Lake with the largest area change was the central part, while the north and south contained mostly unchanged water bodies. During the dry season in spring and winter, Poyang Lake shrank significantly, leaving only a few main streams in the lake center and many discontinuities in the basin. During the wet season in summer, the center of the lake expanded along the southeast–northwest direction, with the width of the rivers increasing. The tributaries became more complete, with fewer interruptions.
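Lake areas such as those quoted above follow from counting water pixels in the predicted mask and multiplying by the ground area of one pixel. The sketch below assumes a 10 m × 10 m pixel spacing, the nominal spacing of Sentinel-1 IW GRD high-resolution products; the grid actually used after preprocessing is not stated in this section, and the function name is hypothetical.

```python
import numpy as np

def water_area_km2(mask, pixel_size_m=(10.0, 10.0)):
    """Area of the water class in km^2, given a binary mask (nonzero = water)."""
    n_water = int(np.count_nonzero(mask))
    pixel_area_m2 = pixel_size_m[0] * pixel_size_m[1]
    return n_water * pixel_area_m2 / 1e6  # 1 km^2 = 1e6 m^2
```

For example, a mask of 100 × 100 water pixels at this spacing covers 1 km². The calculation assumes a projected grid in which every pixel has the same ground area.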
In Figure 16, the blue histogram is the water area of Poyang Lake identified by the proposed model, and the broken line is the water level monitored by the hydrological station. It can be seen that the water area of Poyang Lake varied greatly, and the area in the wet season was about 2.39 times that in the dry season. There was a good correlation between the water level data of hydrological stations and the area of Poyang Lake. The Pearson coefficients were 0.9674, 0.9498, and 0.9811, which indicates that the lake area and the water level rose and fell almost together. This shows that the method in this paper has multitemporal water identification ability, and so can realize high-accuracy monitoring of the dry–wet season cycle of Poyang Lake.
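The two statistics in this paragraph can be reproduced as follows. The wet-to-dry ratio uses the June and December areas reported in Section 4.3. The series passed to the Pearson function are hypothetical placeholders, since the monthly areas and station water levels behind Figure 16 are not listed in the text.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Areas reported in the text (km^2): June maximum and December minimum.
wet_dry_ratio = 4408.31 / 1846.79

# Hypothetical placeholder series, NOT the measured data behind Figure 16.
area_demo_km2 = [2000.0, 3000.0, 4400.0, 1850.0]
level_demo_m = [9.0, 13.5, 18.0, 8.5]
r_demo = pearson(area_demo_km2, level_demo_m)
```

The ratio rounds to 2.39, matching the figure quoted above. A coefficient near 1 means that the area and the water level move together almost linearly, but with only a handful of acquisition dates per station it should be read as an indication of agreement rather than a precise estimate.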
5. Discussion
The experimental results in this paper show that Attention-UNet3+ has good application potential in Sentinel-1 image water mapping, with results better than those of UNet, Deeplabv3+, and other deep semantic segmentation networks. The validation results show that Attention-UNet3+ was about 7.54% and 4.87% better than UNet in terms of IOU and Kappa. The improvements used in this paper mainly included the attention mechanism and full-scale skip connections. The attention mechanism was applied to the network for information enhancement. It fully captured feature information without wasting resources and improved the effectiveness of full-scale skip connections to transfer features.
The model in this paper fixed the number of convolution kernels in the decoding process to 320, which reduced information redundancy. However, the full-scale skip connections and the operation of normalizing all features to one scale before connection require a lot of computation. In this regard, deep supervision is proposed in this paper—that is, without going through all the decoding layers, convolution and upsampling were used to complete the output of the segmentation results in the middle of decoding, to improve the segmentation efficiency.
Secondly, comparative experiments showed that deep neural networks for SAR image segmentation had a better effect than traditional threshold segmentation. They were more universal and stable and had a better identification effect on multiple time phases. However, deep learning is prone to overfitting in the middle of training, resulting in a significant drop in accuracy. In this regard, we used the early stopping method for training. During training, the model parameters output by each epoch were used to verify the accuracy of the test samples. If the model test accuracy was not improved after multiple epochs, training was stopped. The model that performed best in the test dataset was used as the result of training.
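The early stopping procedure described above can be sketched in a framework-independent way. The callables `train_one_epoch` and `evaluate`, the patience value, and the epoch limit are hypothetical placeholders; the paper does not report the settings it used.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=10):
    """Stop when the evaluation score has not improved for `patience` epochs.

    train_one_epoch(epoch) -> model state after that epoch (hypothetical callable)
    evaluate(state)        -> accuracy on held-out samples, higher is better
    Returns the best state, its score, and the epoch at which it was reached.
    """
    best_state, best_score, best_epoch = None, float("-inf"), -1
    epochs_without_gain = 0
    for epoch in range(max_epochs):
        state = train_one_epoch(epoch)
        score = evaluate(state)
        if score > best_score:
            # New best model: remember it and reset the patience counter.
            best_state, best_score, best_epoch = state, score, epoch
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return best_state, best_score, best_epoch
```

In a TensorFlow workflow such as the one used here, similar behaviour is available through the `tf.keras.callbacks.EarlyStopping` callback with its `patience` and `restore_best_weights` arguments. Because the procedure as described selects the model on the test samples, the scores reported for those samples are likely to be somewhat optimistic; a separate validation split avoids this.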
Figure 17 shows the identification results of various models for mountain and narrow rivers. It can be seen from the results that Attention-UNet3+ effectively alleviated the false alarm problem of mountain shadows and the incomplete river identification of UNet, Deeplabv3+, and other networks in SAR image water identification. Some studies used object-oriented methods to improve this problem, but they relied on selected features and subjective empirical knowledge to establish segmentation parameters. Using the semantic segmentation network can preserve the detailed information of the edge and complete the identification of water at different scales. Moreover, recent deep learning research has shown that trained models can be extended to other regions and other data sources for water identification through transfer learning. Therefore, Attention-UNet3+ is more suitable for water identification.
The method used in this paper performed well for water monitoring, but there are still some limitations: (1) Metal objects such as buildings and ships may cause obvious angular reflections. These lead to some bright spots appearing in SAR images, which affect the identification of water body boundaries. In this regard, multisource SAR data can be integrated, such as the combination of Sentinel-1 and GF-3, to improve the mapping of water; (2) Since the resolution of the GRD product in Sentinel-1 IW mode is 20 × 22 m, many small-scale rivers with a width of less than or close to 20 m present too few pixels in the image. Their mixed pixel characteristics are between water and soil, causing some interference with the classification.
6. Conclusions
The Attention-UNet3+ model proposed in this paper can extract and utilize the full-scale features of the input images through the encoding–decoding structure. It performs well in the water monitoring of Sentinel-1 SAR images. The proposed method has the following characteristics for water body identification from SAR images: (1) The full-scale skip connections added in the decoding process can combine the features mapped by different scale features to complete water body monitoring at different scales; (2) The spatial attention gate mechanism can strengthen the identification of target features, suppress the interference of background information, and improve the accuracy and robustness of the segmentation algorithm. It combines well with the full-scale skip connections; (3) The deep supervision module is used to improve the segmentation efficiency at a small performance cost.
Taking the Poyang Lake area as the experimental research area, the results of comparative experiments show that Attention-UNet3+ can better obtain the characteristics of water bodies with different shapes. Its average IOU and Kappa values are 0.9502 and 0.9698, respectively. Multitemporal experiments demonstrate the water monitoring capability of this method on a macro scale. The Pearson coefficients of the identified Poyang Lake area and the water level are above 0.9, which indicates a high correlation.
There are still potential improvements to the experiment that need to be carried out in future research: (1) Multisource and multifeature remote sensing data fusion can further improve the accuracy of water monitoring; (2) The model proposed in this paper has a certain generalization ability. Therefore, the ability of this method to monitor water in other regions through transfer learning can be tested and analyzed.