With ongoing industrialization and urbanization, compounded by climate change, surface water bodies are changing significantly, and many regions face drought and water scarcity. Water conservation has therefore become an urgent priority. For instance, in 2022, Poyang Lake, the largest freshwater lake in China, experienced a phenomenon known as “dry lake”. This phenomenon threatened the survival of organisms inhabiting the lake and attracted widespread attention domestically and internationally.
Alterations in the size of water bodies result in corresponding changes in the shoreline. These dynamic shoreline changes directly affect marine resources and the ecological environment in coastal areas, potentially leading to ecological degradation. They also affect human economic activities and transportation [1] and bring a range of associated problems, including the degradation of ecosystem services, heightened environmental pollution, scarcity of biological resources, and more frequent climate and marine disasters [2,3,4,5,6]. Rapid population growth and coastal development are the primary driving forces of marine habitat degradation. For instance, the addition of concrete structures such as seawalls, jetties, and groins can accelerate the erosion and loss of beaches and tidal wetlands [7]. A related study indicates that the area of Chongming Island increased markedly, accompanied by a notable expansion of built-up areas, from 1979 to 2009. Most of the newly added area was taken up by urban and rural settlement development, wetland, and water bodies, and the temporal evolution of the built-up areas is highly correlated with population change [8]. Studying the dynamic changes of water bodies therefore supports the efficient use of marine resources and the development of ecological civilization, and provides a reference for the sound management of marine resources in China.
Two main methods are used for studying dynamic changes in water bodies using remote sensing images: visual and automatic interpretation. Visual interpretation, although highly accurate with continuous extraction results, is limited to small areas and is subject to operator bias, making this method less efficient. By contrast, automatic interpretation offers timeliness and reusability, making this method the preferred approach for identifying water boundaries and extracting remote sensing information about water bodies [9]. This method enables fast and efficient processing of large remote sensing image datasets, making automatic interpretation the primary focus of current research. Scholars worldwide have extensively studied automatic interpretation methods based on remote sensing images, primarily employing threshold segmentation and remote sensing image classification techniques to extract water body information.
1.1. Water Body Extraction Based on the Threshold Segmentation Method
The threshold segmentation method, commonly used to extract water bodies from images, selects a threshold value for a band or band combination to separate water from non-water pixels. Whereas image classification methods generally improve recognition accuracy by refining the structure of the neural network, most threshold segmentation studies improve the segmentation by modifying and optimizing the bands used in the water body index. One widely used index is the normalized difference water index (NDWI), initially proposed by McFeeters [10]. NDWI is a modification of the normalized difference vegetation index (NDVI) and has gained widespread adoption. For instance, Ghosh et al. [11] used Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper images and applied the modified normalized difference water index (MNDWI) algorithm to separate water from land, allowing the analysis of spatial and temporal variations in the coastal zone of Hatia Island over a specific period. In another study, Yang et al. [12] substituted the green band in NDWI with the first principal component of selected principal component analysis transforms and the blue band (wavelengths ranging from 450 to 520 nm) of ZY-3 multispectral image data. They developed two water indices called the new normalized difference water index 1 (NNDWI1) and the new normalized difference water index 2 (NNDWI2). NNDWI1 is sensitive to turbid water bodies, and NNDWI2 is sensitive to water bodies whose spectral information is affected by vegetation. Additionally, they employed a shadow detection technique to remove shadows in small areas of the segmentation results. This approach achieved high accuracy in water edge detection and remained relatively stable under changes in the threshold value, resulting in the precise extraction of water bodies.
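The index-plus-threshold idea described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in any of the cited studies: it assumes the standard definitions NDWI = (Green − NIR)/(Green + NIR) and MNDWI = (Green − SWIR)/(Green + SWIR), a conventional threshold of 0, and small hypothetical reflectance patches. The function names and the sample values are assumptions made for this example.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def water_index(band_x, band_y):
    """Apply the normalized difference pixel by pixel to two equally sized bands."""
    return [
        [normalized_difference(x, y) for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(band_x, band_y)
    ]


def threshold_mask(index, threshold=0.0):
    """Label a pixel as water (1) when its index value exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in index]


# Hypothetical 2x3 reflectance patches (illustrative values, not real data).
green = [[0.10, 0.09, 0.08], [0.11, 0.07, 0.06]]
nir = [[0.03, 0.04, 0.30], [0.02, 0.35, 0.40]]
swir = [[0.01, 0.02, 0.25], [0.01, 0.28, 0.30]]

ndwi = water_index(green, nir)    # NDWI  = (Green - NIR) / (Green + NIR)
mndwi = water_index(green, swir)  # MNDWI = (Green - SWIR) / (Green + SWIR)

ndwi_mask = threshold_mask(ndwi)    # threshold of 0 is an assumed default
mndwi_mask = threshold_mask(mndwi)
```

Because both indices share the same normalized-difference form, swapping one band is enough to move from NDWI to MNDWI; the variants discussed above differ mainly in which bands or components enter that ratio.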
Feyisa et al. [13] introduced a novel automated water extraction index (AWEI) to mitigate environmental noise effects and enhance the classification accuracy of water bodies. They successfully applied this method to Landsat 5 TM images from various study areas in Denmark, Switzerland, Ethiopia, South Africa, and New Zealand, achieving improved accuracy and robustness in water body extraction. Wang et al. [14] utilized the NDWI and MNDWI to analyze the spatial and temporal variation characteristics of the Ningbo coastline. Viaña-Borja et al. [15] proposed an automatic method for water body index-based shoreline detection in multispectral Landsat images. Their approach performed well in automatically detecting shorelines, achieving a minimum deviation value of −0.91 m and standard deviations ranging from ±4.7 to ±7.29 m in some cases. Luo et al. [16] devised an automated method for water body extraction by combining segmentation and classification on the whole and local scales, respectively. They fused hierarchical knowledge of water extraction and set up an iterative algorithm that gradually approaches the optimal margin of the water body. While this iterative approach improved the results, the segmentation and buffering methods were noted to be not entirely sufficient. Xu [17] modified the NDWI to enhance the features of open water bodies in remote sensing images. This modification enables rapid and accurate differentiation between water and non-water features. However, it only detects the presence of water and cannot distinguish between rivers, lakes, or other types of water bodies. In response to the limitations of previous water body extraction methods that rely on single models or solely on spectral information, Qiao et al. [18] proposed an adaptive water body extraction method. Their approach combines the NDWI, histogram threshold segmentation, and region growing to accurately extract water body information.
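To illustrate how a growth step can complement a threshold, the following sketch grows water regions outward from confident seed pixels. It is a generic 4-connected region-growing routine written for illustration and is not the algorithm of Qiao et al.; the function name `region_grow`, the two thresholds, and the sample index values are all assumptions.

```python
from collections import deque


def region_grow(index, seed_threshold, grow_threshold):
    """Grow water regions from confident seeds into 4-connected neighbours.

    Seeds are pixels whose index exceeds seed_threshold; a neighbour joins the
    region when its index exceeds the (lower) grow_threshold.
    """
    rows, cols = len(index), len(index[0])
    mask = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if index[r][c] > seed_threshold:
                mask[r][c] = 1
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 0:
                if index[nr][nc] > grow_threshold:
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask


# Hypothetical water-index patch: one confident water pixel (0.6), weaker
# water-like pixels next to it, and an isolated weak response (top right).
index = [
    [0.6, 0.2, -0.3, 0.2],
    [0.1, -0.4, -0.5, -0.4],
]
grown = region_grow(index, seed_threshold=0.5, grow_threshold=0.0)
```

In this toy patch the weak pixels adjacent to the seed are accepted, while the isolated weak pixel in the top-right corner is not, which a single global threshold of 0 could not achieve.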
Thresholds are typically set as constant values, but ambient noise, such as shadows, forests, built-up areas, snow, and clouds, makes it difficult to apply one fixed threshold across different environmental regions. Acharya et al. [19] evaluated the effectiveness of commonly used water body-related indices, including NDVI, NDWI, MNDWI, and AWEI, in extracting water bodies from a Landsat 8 scene in Nepal. They found that none of the methods could accurately extract the entire surface water in the scene. This finding highlights the limitations of using fixed thresholds for water body extraction in the presence of environmental noise.
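One common way to avoid a constant threshold is to derive it from the histogram of each scene. The sketch below uses Otsu's method, which is named here as an illustrative choice and is not taken from the studies cited above; the function name `otsu_threshold`, the bin count, and the sample index values are assumptions.

```python
def otsu_threshold(values, bins=64):
    """Return the bin edge that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))

    best_threshold, best_variance = lo, -1.0
    weight_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        weight_low += hist[i]
        sum_low += hist[i] * centers[i]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = lo + (i + 1) * width
    return best_threshold


# Hypothetical turbid scene: water responds weakly, so every index value is
# negative and a fixed threshold of 0 would label all pixels as land.
scene = [-0.50, -0.45, -0.40, -0.15, -0.10, -0.05]
threshold = otsu_threshold(scene)
water = [1 if v > threshold else 0 for v in scene]
```

Here the derived threshold falls in the gap between the two clusters of values, so the three weaker-but-distinct responses are still separated from the background.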
1.2. Water Body Extraction Based on the Image Classification Method
Unlike threshold segmentation, which requires choosing a threshold, remote sensing image classification methods usually classify an image from features extracted from it. They include traditional machine learning and deep learning, a subset of machine learning that excels at processing unstructured data.
Machine learning methods have diverse applications in remote sensing image classification, including road extraction, urban functional area extraction, and target detection. Researchers have conducted numerous studies using machine learning methods to extract information on water bodies. For example, Zeinali et al. [20] studied coastline changes in Chabahar Bay from 1972 to 2014 using the maximum likelihood classification method to extract water bodies from remote sensing images. Possa et al. [21] utilized a support vector machine (SVM) classifier with a 95% probability threshold to separate water from land in small reservoirs and estimated partial water areas using probabilistic maps. Zhang et al. [22] employed the decision tree classification method to extract water body information from Operational Land Imager remote sensing data, reporting higher accuracy than automatic computer classification methods.
Although the methods above can extract water body information, they have algorithmic limitations. Threshold-based image segmentation methods are sensitive to image noise, resulting in low extraction accuracy when the grayscale difference between water and its surroundings is small. Thus, additional methods are necessary to meet extraction requirements. Traditional pixel-based machine learning classification methods can yield low classification accuracy and incomplete classification edges, particularly when dealing with complex spectra and fine structures. These problems show that, for high-resolution optical remote sensing images with few bands and with easily confused interclass and intraclass feature differences, the key is to represent and learn features efficiently.
Deep learning methods based on neural networks offer advantages over threshold segmentation and traditional machine learning methods. They effectively extract spectral and spatial features through convolutional operations, yielding improved results for water extraction from remote sensing images. The UNet neural network model, proposed by Ronneberger et al. [23] in 2015, is a fully convolutional network for semantic image segmentation. Compared with other methods, UNet performs better even with smaller datasets and produces more accurate segmentation, thereby providing new insights for water body extraction from remote sensing images. In 2018, Li et al. [24] proposed DeepUNet, an improved version of UNet, to enhance the accuracy of convolutional neural networks in recognizing water bodies in remotely sensed images. Additionally, Kim et al. [25] employed UNet with a pyramidal pooling layer and achieved a pixel accuracy of 87.61% and an intersection over union (IoU) of 79.52% for four target types, namely buildings, roads, water, and background, using a training dataset of 72,400 samples and a test dataset of 9600 samples.
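For readers unfamiliar with the two metrics quoted above, the following sketch computes pixel accuracy and the IoU of a single class on small label maps. It uses the standard definitions (fraction of correctly labelled pixels; intersection divided by union of predicted and reference pixels of a class); the function names and the toy masks are assumptions for illustration.

```python
def _pairs(pred, truth):
    """Yield (predicted, reference) labels pixel by pixel."""
    for pred_row, truth_row in zip(pred, truth):
        yield from zip(pred_row, truth_row)


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label."""
    pairs = list(_pairs(pred, truth))
    return sum(p == t for p, t in pairs) / len(pairs)


def iou(pred, truth, label=1):
    """Intersection over union for one class label."""
    pairs = list(_pairs(pred, truth))
    intersection = sum(p == label and t == label for p, t in pairs)
    union = sum(p == label or t == label for p, t in pairs)
    return intersection / union if union else float("nan")


# Hypothetical 2x4 masks: 1 = water, 0 = background.
truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
pred = [[1, 1, 1, 0], [1, 0, 0, 0]]

accuracy = pixel_accuracy(pred, truth)  # 6 of 8 pixels agree
water_iou = iou(pred, truth, label=1)   # 3 shared pixels over a union of 5
```

For a multi-class result, the same `iou` function can be evaluated per label and averaged to give a mean IoU; IoU penalizes both missed and spurious pixels, which is why it is usually lower than pixel accuracy.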
In 2022, An and Rui [26] introduced an Improved Lightweight U-Net specifically designed for remote sensing images. This model reduced the number of downsampling layers to accommodate the low resolution of remote sensing images. Additionally, the bottleneck structure was updated to ensure the effective transmission of water feature information within the model. Zhang et al. [27] proposed a hybrid MixFormer architecture called MU-Net by embedding the MixFormer block into UNet. The combination of CNN and MixFormer models both the local spatial detail and the global contextual information of the image, improving the ability of the network to capture semantic features of the water body. MU-Net exhibits higher pixel precision and more accurate positional information in identifying water bodies. However, data annotation for this method is very laborious and time-consuming, whereas semi-supervised learning can achieve similar performance using only a small amount of labeled data. Li et al. [28] constructed a robust water extraction network called PA-UNet by introducing an attention block and a pyramid module into UNet. When this method was applied to Sentinel-1 data for water extraction in Dongting Lake and Poyang Lake, the results showed that the proposed framework achieves high-precision, highly efficient automatic water extraction. However, in cold regions rivers freeze in winter, and the backscattering intensity of frozen rivers in SAR images is high, so SAR images are not suitable for water extraction during the river icing period.
Most existing studies applying the UNet network for water extraction from remote sensing images focused on clear water bodies in small regional rivers and lakes. However, they do not consider water margins with sediment accumulation and turbid water bodies, such as beach islands and sea inlets. Additionally, the repeated use of maximum pooling operations in the UNet architecture can result in resolution reduction and potential loss of spatial information [29].
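The loss of spatial information caused by pooling can be seen on a toy example. The sketch below applies a single 2×2 max pooling step with stride 2 to two hypothetical binary maps whose water edge differs by one pixel; the function name and the maps are assumptions for illustration and do not reproduce any particular network.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a map with even height and width."""
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]


# Hypothetical 4x4 binary maps (1 = water). The shoreline in edge_b lies one
# column further to the right than in edge_a.
edge_a = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
edge_b = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

pooled_a = max_pool_2x2(edge_a)
pooled_b = max_pool_2x2(edge_b)
```

Both inputs pool to the same 2×2 map, so after this step the network can no longer tell where within each 2×2 block the shoreline lay; repeated pooling compounds the effect, which is why fine water margins are hard to recover from deep features alone.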
The traditional UNet network loses spatial information through repeated convolution and pooling and is not effective in extracting sandy and silty shores. To address this problem, this research modifies the structure of the UNet neural network and applies it to the extraction of turbid water bodies in remote sensing images, mitigating the loss of spatial information caused by pooling operations and enhancing extraction accuracy. Highly precise extraction of water bodies is achieved by enhancing low-dimensional feature information. The improved UNet model combines the two-layer feature mapping from the encoding path with the corresponding decoding path, effectively capturing fine-grained details. The dynamic changes in water bodies in Chongming Island and its surrounding areas from 2013 to 2022 are extracted and analyzed using this enhanced UNet model.