Article

Detection of Black and Odorous Water in Gaofen-2 Remote Sensing Images Using the Modified DeepLabv3+ Model

1 School of Computer and Control Engineering, Yantai University, Yantai 264005, China
2 School of Information Science and Engineering, University of Jinan, Jinan 250024, China
3 School of Information Science and Technology, Hainan Normal University, Haikou 571158, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(1), 92; https://doi.org/10.3390/su16010092
Submission received: 3 November 2023 / Revised: 5 December 2023 / Accepted: 15 December 2023 / Published: 21 December 2023
(This article belongs to the Special Issue Remote Sensing and Image Processing in Environmental Field)

Abstract

The detection of black and odorous water using remote sensing technology has become an effective method. High-resolution remote sensing images can capture target features better than low-resolution images. However, high-resolution images also introduce complex background details and intricate textures, which hinder accurate feature extraction. In this paper, based on remote sensing images acquired by the Gaofen-2 satellite, we propose a Modified DeepLabv3+ model to detect black and odorous water. To reduce the complexity of the encoder part of the model, Modified DeepLabv3+ incorporates a lightweight MobileNetV2 network. A convolutional attention module was introduced to improve the focus on the features of black and odorous water. Then, a fuzzy block was crafted to reduce the uncertainty of the raw data. Additionally, a new loss function was formulated to solve the problem of category imbalance. A series of experiments were conducted on both the remote sensing images for black and odorous water detection (RSBD) dataset and the water pollution dataset, demonstrating that the Modified DeepLabv3+ model outperforms other commonly used semantic segmentation networks. It effectively captures detailed information and reduces image segmentation errors. In addition, to better identify black and odorous water and enrich the spectral information of the image, we generated derived bands using black and odorous water indices. These derived bands were fused with the original images to construct the RSBD-II dataset. The experimental results show that adding a black and odorous water feature index achieves a better detection effect.

1. Introduction

Black and odorous water (BOW) has two manifestations, abnormal water color and an unpleasant odor, which may be caused by a variety of different biochemical reactions [1]. Under the influence of rapid urbanization and industrialization, sediments in water bodies release heavy metal ions such as Fe2+ and Mn2+, which react with sulfur-containing organic matter to form metal sulfides (FeS, MnS), leading to the blackening of water bodies. The odor of BOW is usually caused by factors such as the decomposition of organic matter, contaminants, and a lack of oxygen. Over time, the blackening and odorization of water bodies become gradually apparent [2,3,4]. BOW is not only harmful to aquatic organisms but also has a serious impact on human health and socio-economic development. Accurate detection of BOW is therefore especially important for water body management. Currently, the issue of BOW is becoming increasingly prevalent and observable in most Chinese cities [5,6,7]. Consequently, the government has extended substantial support toward the treatment of BOW, considering it an integral component of enhancing urban environments.
Since the 1930s, scholars have been studying BOW [8], primarily focusing on its formation mechanisms [9], the causes of BOW [10,11], black and odor indices and models [12], and the management of BOW [13]. Traditional BOW monitoring methods mainly rely on fieldwork, which involves collecting samples and conducting on-site observations. However, this approach has limited coverage of large monitoring areas. It is not only time-consuming and labor-intensive but also prone to omission and misidentification [14,15]. Compared to traditional methods, remote sensing technology has advantages such as low cost, high efficiency, and strong spatial continuity, and has gradually become an effective technique for extracting BOW features. Through remote sensing technology, we can obtain extensive water body information, providing new technical means for the identification and treatment of urban BOW [16,17,18].
With the continuous development of high-resolution remote sensing satellites, many researchers have conducted BOW monitoring, classification, and identification experiments by means of remote sensing [19,20]. Duan et al. [2] analyzed the optical properties of black and normal water masses in Taihu Lake using remote sensing data. Kutser et al. [21] used remote sensing to monitor extremely dark lakes at high latitudes and in the Arctic region. In addition, many experts and scholars analyzed the spectral features of BOW and constructed various remote sensing identification algorithms by selecting characteristic bands and band combinations [22,23]. Between the green band and the red band, the reflectance of general water bodies generally changes rapidly, while that of BOW changes less noticeably. Yao et al. [24] proposed an improved black and odorous water index (BOI) model by analyzing the spectral and water quality parameters of water bodies. Wen et al. [25] constructed a chromaticity-based algorithm and a normalized difference black-odorous water index (NDBWI) algorithm for identifying urban BOW. However, threshold-based methods require constant adjustment of the thresholds and have poor applicability. Additionally, these methods cannot fully utilize the spatial contextual information of the images, resulting in the loss of useful image information. Some scholars have also proposed using the unique chromaticity index of BOW for identification [26], but the color of BOW is not fixed under varying external conditions, so the chromaticity index method is prone to misclassification. Other scholars graded water bodies into multiple levels based on human sensory assessment of the characteristic odor of BOW [27,28]. However, because individual olfaction is subjective, this method has not yet established a uniform assessment standard. These methods manually extract the underlying features of BOW and neglect the surrounding environment, resulting in weak generalization capabilities. Therefore, higher-performance techniques are urgently required to improve the efficiency and accuracy of BOW extraction.
Nowadays, deep learning, especially convolutional neural networks (CNNs), is widely used in semantic segmentation and has become one of the mainstream methods for solving the semantic segmentation problem [29,30]. Badrinarayanan et al. [31] proposed the SegNet model, which utilizes an encoder–decoder architecture. The encoder is responsible for extracting image features, while the decoder restores the feature map to the original image size and classifies each pixel. Ronneberger et al. [32] proposed the U-Net model, which has a U-shaped network structure and also uses an encoder–decoder structure. It introduces skip connections to retain more low-level and high-level semantic feature information. Li et al. [33] proposed a multi-attention network called MANet, which combines multiple effective attention modules to extract contextual information of the target features and achieves excellent performance in segmentation tasks. Google has proposed the DeepLab series of models [34,35,36,37], among which the DeepLabv3+ model is commonly used for remote sensing image classification. Built on the foundation of atrous spatial pyramid pooling (ASPP) and dilated convolutions, the DeepLabv3+ model constructs an encoder–decoder structure that fuses low-level and high-level features, enhancing the segmentation accuracy of objects. Zhang et al. [38] proposed the Ice-DeepLab network model, which improves the segmentation accuracy of ice features at different scales by combining attention modules and an enhanced decoder structure. However, the DeepLabv3+ network has a large memory overhead and slow computational speed. The multi-scale connection of feature maps during the up-sampling process is not thorough enough, leading to insufficient refinement in the segmentation of small objects.
CNNs, as remarkable deep learning models, are increasingly used to detect BOW. Based on Beijing No. 2 data, Wang et al. [39] used a CNN model to detect BOW with an overall accuracy of about 90%. Shao et al. [40] introduced an attention mechanism module into the CNN model for extracting information on BOW. Zheng et al. [41] proposed the fully convolutional adversarial network (FANet), which introduces a larger receptive field and adversarial learning to achieve end-to-end pixel segmentation of BOW. However, the formation mechanism of BOW is complex, and its morphology, color, texture, and other features may be diverse and variable. In addition, remote sensing data are prone to information loss, resulting in imagery that often exhibits uncertainty. The uncertainty of remote sensing images poses a great challenge to BOW segmentation tasks. Research shows that fuzzy logic can effectively deal with uncertain and imprecise information and reduce the ambiguity in images [42,43]. It represents the uncertainty degree of each pixel by a membership function and reduces the blurriness of the image by multiplying the uncertainty map with the input image. Price et al. [44] integrated fuzzy strategies into deep learning architectures to exploit the powerful aggregation of fuzzy logic. Ma et al. [45] introduced a conditional random field module into a fuzzy deep learning network for the segmentation of high-resolution remote sensing images. Zhao et al. [46] combined fuzzy units with traditional convolutional units to address the inherent uncertainty of remote sensing images and achieve better boundary segmentation. In most of these studies, the logical operator “AND” is applied to the membership function, and other operators are not explored. Nan et al. [47] exploited attention mechanisms and fuzzy logic for airway segmentation, proposing a channel-specific fuzzy attention layer to solve the problem of feature heterogeneity across channels. Therefore, we developed a new deep learning model with fuzzy logic for BOW extraction.
In summary, we propose the Modified DeepLabv3+ model with fuzzy logic. The improvements are as follows: (1) using the lightweight MobileNetV2 network to improve the efficiency of model detection; (2) introducing an attention module to focus on target features; (3) integrating deep learning with fuzzy logic to reduce the uncertainty in the raw data; (4) adding multiple skip connections to integrate more high-level feature information; (5) designing an improved cross-entropy loss function to address the imbalance between BOW features and other features. Compared to previous methods, Modified DeepLabv3+ performed best on different datasets and had the lowest number of parameters. In addition, to address the issue of similar features between BOW and non-BOW in remote sensing images, we selected the BOI index and the NDBWI index to perform band calculations on the images, generating derived bands. Subsequently, these derived bands were integrated into the original remote sensing images, creating the RSBD-II dataset, which includes the RSBD dataset [48], the RSBD_BOI dataset, and the RSBD_NDBWI dataset. The experimental results on the RSBD-II dataset show that fusing NDBWI features can better detect the target features and obtain higher segmentation accuracy. The combination of deep learning and water quality parameters contributes to improving the efficiency and accuracy of detection, providing a powerful tool for better protecting water resources and enhancing water quality.
The rest of this paper is structured as follows. The structure of the proposed model is described in Section 2. Section 3 describes the used datasets, experimental details, and evaluation metrics. Section 4 presents the experimental results to demonstrate the validity of the model. The discussion of these results is presented in Section 5. Finally, the conclusions are described in Section 6.

2. Methodology

2.1. Original DeepLabv3+ Model

DeepLabv3+ [37] is one of the most commonly used architectures for semantic segmentation tasks. It consists of an encoder–decoder structure, a deep convolutional neural network, and an ASPP module. The deep convolutional neural network performs feature extraction using ResNet [49] or Xception [50] to obtain high-level and low-level semantic features. High-level semantic features enter the ASPP module and pass through four atrous convolutional layers with different dilation rates and one pooling layer, respectively, to obtain five feature maps. Multi-scale information is obtained by fusing the outputs of the five feature maps. Low-level semantic feature information is first processed through a 1 × 1 convolution operation and then fused with the multi-scale features after a four-fold bilinear up-sampling operation. Finally, the fused features are refined through a 3 × 3 convolution to optimize the target edge segmentation effect, followed by another four-fold bilinear up-sampling operation to restore the original size. The DeepLabv3+ model has gained significant popularity and recognition in the computer vision field due to its remarkable performance. It has been extensively applied to a variety of tasks, including water body recognition, flood extraction, and sea ice segmentation [38,51,52].
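To make the ASPP structure concrete, the following is a minimal PyTorch sketch of an ASPP module with four atrous branches and an image-level pooling branch. The dilation rates (1, 6, 12, 18) and the 256-channel width follow the original DeepLabv3+ paper and are assumptions here, not values confirmed by this article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: four parallel atrous convolutions
    plus an image-level pooling branch, fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in rates:
            k = 1 if r == 1 else 3          # 1x1 conv for rate 1, else 3x3 atrous
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=0 if r == 1 else r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        # Image-level feature: global average pooling + 1x1 convolution
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        # Concatenate the five feature maps and fuse the multi-scale information
        return self.project(torch.cat(feats + [pooled], dim=1))
```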

2.2. Architecture of the Modified DeepLabv3+ Model

In the field of BOW image segmentation, a significant challenge lies in the diversity of water body shapes in different image scenes and their distribution in narrow areas. The ASPP module of the DeepLabv3+ model can capture multi-scale targets and is therefore suitable for extracting BOW features of different sizes. However, the original decoder branch of the model only utilizes 1/4 of the feature maps to refine the segmentation results, failing to fully leverage other crucial feature information. Additionally, the deep convolutional neural network in the encoder part introduces a considerable computational burden. Furthermore, remote sensing images often contain complex mixtures of elements and noise, adding extra uncertainty to the water body segmentation process. It is noteworthy that the DeepLabv3+ model does not address these issues. Therefore, this paper proposes the Modified DeepLabv3+ model. Built upon the foundation of the original DeepLabv3+, this model undergoes a series of innovative improvements aimed at more efficiently addressing the segmentation challenges of BOW while significantly reducing computational overhead.
Figure 1 illustrates the architecture of Modified DeepLabv3+ for BOW segmentation. As shown in Figure 1, the network maintains the encoder–decoder structure of the original DeepLabv3+ model, and the main modifications are highlighted with red dashed lines. The encoder part uses the lightweight MobileNetV2 network [53] as the backbone to obtain high-level and low-level semantic features, reducing the number of parameters and speeding up convergence while maintaining accuracy. The convolutional attention module (CBAM) [54] is added before the ASPP to capture channel and spatial feature information. Given the uncertainty and complexity of water body information, a fuzzy block is specifically designed. This block effectively reduces data uncertainty and significantly improves the accuracy of segmenting small targets and boundary regions. The decoder part consists of up-sampling and convolution modules. To enhance the fusion of high-level and low-level semantic feature information, we incorporated additional skip connections, which allow the network to acquire more precise and distinct boundary information.

2.3. Convolutional Attention Module

To address the poor segmentation of small, irregularly shaped BOW regions, CBAM was introduced to optimize and improve the network. CBAM can highlight important feature information and suppress redundant feature information, thus improving the overall learning ability and generalization of the model. It contains a channel attention module (CAM) and a spatial attention module (SAM). CAM enables the model to adaptively select which channels are more important for a particular task by weighting the channel dimension of the feature map, improving the quality and robustness of the feature representation. SAM enables the model to adaptively select which spatial locations are more important for a particular task by weighting the spatial dimensions of the feature map. As shown in Figure 2, CBAM first applies CAM to the input feature map and then multiplies the output of CAM with the original feature map to obtain a channel-weighted feature map. Subsequently, SAM takes this weighted feature map as input and applies spatial attention operations to produce the final feature map.
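The PyTorch sketch below shows this sequential CAM-then-SAM arrangement. It follows the standard CBAM design [54]; the reduction ratio of 16 and the 7 × 7 spatial kernel come from the original CBAM paper and are assumptions here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: squeeze spatial dims by avg- and max-pooling, then weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)          # per-channel weights

class SpatialAttention(nn.Module):
    """SAM: squeeze the channel dim, then weight spatial locations."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention first, then spatial attention, as in Figure 2."""
    def __init__(self, channels):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()

    def forward(self, x):
        x = x * self.cam(x)        # channel-weighted feature map
        return x * self.sam(x)     # spatially-weighted feature map
```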

2.4. Fuzzy Block

Due to the interaction of complex natural environments with remote sensing waveforms, remote sensing images are prone to losing some information during imaging. This causes the uncertainty of remote sensing images, which is mainly reflected in two aspects: similar features may have different grayscale values and different types of features may have similar grayscale values, i.e., intra-class heterogeneity and inter-class similarity. Moreover, a single image pixel may contain different types of ground features [42]. Deep learning is a machine learning approach that can automatically learn features and patterns and can handle large amounts of data and complex tasks. By combining deep learning and fuzzy logic into hybrid models, the models can handle both fuzziness and complex features, thus improving the accuracy and robustness of machine learning [55]. Fuzzy logic is utilized to generate fuzzy representations with the goal of reducing uncertainty in the original data. In this case, the Gaussian membership function (GMF) has the symmetry and flexibility to represent the data effectively. Research has shown that combining fuzzy logic and deep learning is effective for data representation [56,57,58,59]. Therefore, we combined fuzzy logic and deep learning to propose a fuzzy block using trainable GMFs. This integration aims to assist the segmentation network in focusing on pertinent regions while reducing uncertainty and variability in the data representation.
The structure of the fuzzy block is shown in Figure 3. Assume that $F \in \mathbb{R}^{C \times H \times W}$ is the input feature map, where $H$ and $W$ are the height and width of the feature map, respectively, and $C$ is the number of channels. Owing to the smoothness and simplicity of the GMF, a learnable GMF is used to specify the fuzzy sets. The memberships represent the uncertainty in the image, with each feature point $\alpha \in \mathbb{R}^{H \times W}$ in a particular channel filtered by $m$ GMFs. The formula for the GMF is as follows:

$$f_{i,j}(X, \mu, \sigma) = e^{-\frac{(X_j - \mu_{i,j})^2}{2\sigma_{i,j}^2}}$$

where $i = 1, \ldots, m$ and $j = 1, \ldots, C$; $X_j$ denotes the coordinates of feature point $\alpha$ in the $j$th channel; and $\mu_{i,j}$ and $\sigma_{i,j}$ represent the mean and standard deviation of the $i$th GMF in channel $j$, respectively. To let the membership function learn the importance of the target feature representation, the commonly used operator ‘AND’ is replaced with ‘OR’. The function suppresses irrelevant features by applying ‘OR’ across all memberships of feature point $\alpha$. For two fuzzy sets $\bar{A}$ and $\bar{B}$, the union is defined by the following equation:

$$f_{\bar{A} \cup \bar{B}}(y) = f_{\bar{A}}(y) \vee f_{\bar{B}}(y), \quad y \in U$$

where $U$ is the universe of discourse and $y$ is an element of $U$. To implement the logical operator ‘OR’, the union is computed as follows:

$$f_{\bar{A} \cup \bar{B}}(y) = \max\left(f_{\bar{A}}(y), f_{\bar{B}}(y)\right)$$

Therefore, combining the GMF with the max-based ‘OR’ operator above, the membership $f_j(X, \mu, \sigma)$ of the $j$th channel is obtained as shown in the following equation:

$$f_j(X, \mu, \sigma) = \bigvee_{i=1}^{m} e^{-\frac{(X_j - \mu_{i,j})^2}{2\sigma_{i,j}^2}} = \max_i\left(e^{-\frac{(X_j - \mu_{i,j})^2}{2\sigma_{i,j}^2}}\right)$$

where $\vee$ denotes the ‘OR’ operation.
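As an illustration, here is a minimal PyTorch sketch of such a fuzzy block: each channel carries $m$ trainable GMFs, the per-pixel membership is the maximum (‘OR’) over the $m$ responses, and the membership map re-weights the input features. The multiplicative fusion and the log-parameterized standard deviation are implementation assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class FuzzyBlock(nn.Module):
    """Trainable Gaussian membership functions with max ('OR') aggregation."""
    def __init__(self, channels, m=4):
        super().__init__()
        # One (mean, std) pair per GMF per channel, learned end to end.
        self.mu = nn.Parameter(torch.randn(m, channels))
        self.log_sigma = nn.Parameter(torch.zeros(m, channels))  # keeps sigma > 0

    def forward(self, x):                              # x: (B, C, H, W)
        mu = self.mu[None, :, :, None, None]            # (1, m, C, 1, 1)
        sigma = self.log_sigma.exp()[None, :, :, None, None]
        # Gaussian membership of every pixel under each of the m GMFs
        member = torch.exp(-(x.unsqueeze(1) - mu) ** 2 / (2 * sigma ** 2))
        member = member.amax(dim=1)                     # 'OR' = max over m GMFs
        return x * member                               # suppress uncertain responses
```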

2.5. Multi-Layer Skip Connections

The original DeepLabv3+ model fuses two parts of feature information in the decoder branch: the multi-scale feature information obtained after the ASPP module and the low-level semantic feature information obtained from the backbone network. There is only one connection between the encoder and the decoder. This neglects other high-level feature information, making it challenging to extract small-scale BOW features and obtain clear boundary information, ultimately resulting in unclear object boundaries in the final segmentation map. Between the encoder and decoder sections, we therefore added three skip connections carrying feature maps with 24, 64, and 160 channels, respectively. Feature map fusion was then performed by a 1 × 1 convolution operation. The information between the encoder and decoder is thus more closely connected, which increases the clarity and feature detail of the BOW contour information. Finally, we recovered the original resolution through a 3 × 3 convolution and a four-fold bilinear up-sampling operation.
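A minimal sketch of such a fusion step is shown below: the three encoder stages (24, 64, and 160 channels, matching the counts above) are each reduced by a 1 × 1 convolution, resized to the decoder resolution, and concatenated with the decoder feature map. The reduced width of 48 channels per branch is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSkipFusion(nn.Module):
    """Fuse three encoder feature maps into the decoder via 1x1 convolutions."""
    def __init__(self, skip_chs=(24, 64, 160), reduced=48):
        super().__init__()
        self.reducers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, reduced, 1, bias=False),
                          nn.BatchNorm2d(reduced), nn.ReLU(inplace=True))
            for c in skip_chs)

    def forward(self, decoder_feat, skips):   # skips: list of three encoder tensors
        h, w = decoder_feat.shape[2:]
        outs = [decoder_feat]
        for reduce, s in zip(self.reducers, skips):
            s = reduce(s)                      # channel reduction per skip branch
            outs.append(F.interpolate(s, size=(h, w), mode='bilinear',
                                      align_corners=False))
        return torch.cat(outs, dim=1)          # fused map for the 3x3 refinement
```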

2.6. Loss Function

The traditional cross-entropy loss function assigns the same weight to all classes. When class imbalance exists, the training process tends to be biased toward classes with more pixels, making it difficult to effectively learn the features of classes with fewer pixels. This results in the network predicting the rich classes more frequently than the scarce classes [60]. Therefore, we designed the median balancing loss (MBL) function, which assigns a weight to each class based on the median frequency in the training set. Classes with fewer samples receive higher weights, while classes with more samples receive lower weights. By using the MBL function, the Modified DeepLabv3+ model can effectively address the problem caused by class imbalance and improve its ability to learn features from minority classes. The weights are calculated using the median frequency balancing (MFB) method [61]. Let $P = \{1, 2, \ldots, C\}$ denote the set of $C$ categories, and let $f_i$ denote the frequency of the $i$th category, i.e., the ratio of the number of pixels in the $i$th category to the total number of pixels. $\mathrm{MFB}(\cdot)$ computes the median of the category frequencies, and the weight of category $i$ is given by the following equation:

$$w_i = \frac{\mathrm{MFB}(\{f_1, f_2, \ldots, f_C\})}{f_i}$$

The MBL function is given by the following equation:

$$L = -\sum_{i=1}^{C} w_i \times x_i \times \log_{10}(s(x_i))$$

where $L$ denotes the MBL function, $w_i$ is the weight of the $i$th class calculated by the MFB method, $x_i$ denotes the target label of the $i$th class, and $s(x_i)$ denotes the softmax output on the corresponding dimension of the $i$th class.
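A minimal PyTorch sketch of the MBL idea follows: MFB weights are computed from training-set pixel counts and passed to a weighted cross-entropy. Note that PyTorch's CrossEntropyLoss uses the natural logarithm rather than log10, which differs only by a constant factor, and the pixel counts shown in the usage example are hypothetical.

```python
import torch
import torch.nn as nn

def mfb_weights(pixel_counts):
    """Median frequency balancing: w_i = median(f) / f_i, where f_i is the
    pixel frequency of class i over the training set."""
    freq = pixel_counts / pixel_counts.sum()
    return freq.median() / freq

class MedianBalancingLoss(nn.Module):
    """Cross-entropy weighted by MFB class weights (sketch of the MBL function)."""
    def __init__(self, pixel_counts):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(weight=mfb_weights(pixel_counts))

    def forward(self, logits, target):   # logits: (B, C, H, W), target: (B, H, W)
        return self.ce(logits, target)

# Usage: counts estimated from the training masks (values are hypothetical)
counts = torch.tensor([9.5e7, 5.0e5])    # 'others' pixels vs. BOW pixels
criterion = MedianBalancingLoss(counts)
```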

3. Experiment Setup

3.1. Dataset Description

To validate the performance and generalization of the Modified DeepLabv3+, a series of experiments were conducted on the RSBD dataset and the water pollution dataset.

3.1.1. RSBD Dataset

Currently, there is no suitable high-resolution BOW dataset for the Yellow and Bohai Sea region. Since Yantai City has a dense river network and the BOW phenomenon is serious, the RSBD dataset was constructed in [48] to detect BOW, with the Laishan and Muping districts of Yantai City as the study area. We selected 10 Gaofen-2 (GF-2) remote sensing images as data sources. The GF-2 satellite carries two high-resolution cameras: a panchromatic camera with a resolution of 1 m and a multi-spectral camera with a resolution of 4 m. Its sub-meter spatial resolution and accurate positioning capability allow the location of BOW to be detected well [62,63,64]. The GF-2 satellite images can be downloaded from the China Resources Satellite Data Application Center (https://www.cresda.com, accessed on 3 November 2023).
The RSBD dataset, focusing on the area of Yantai, China, consists of 1645 annotated images with the size of 256 × 256 pixels acquired by the GF-2. Each image contains three channels of red–green–blue. Here, 70% of them were randomly assigned to the training set and 30% were assigned to the testing set, which are used for model training and testing, respectively. Finally, 1155 training images and 490 testing images were obtained in two categories: the BOW and the others category. Figure 4 shows a partial example of the RSBD dataset.

3.1.2. Water Pollution Dataset

The water pollution dataset (https://aistudio.baidu.com/datasetdetail/42522, accessed on 3 November 2023) originally contained 226 images with two categories: the polluted water category and the others category. Due to GPU memory limitations, the image size was set to 256 × 256 pixels. Through data augmentation operations, including brightness enhancement, color enhancement, contrast enhancement, horizontal flipping, and vertical flipping, 1356 images were finally obtained. For model training and evaluation, we selected 1084 training images, while the remaining 272 images were allocated for testing. Figure 5 shows a partial example of the water pollution dataset.
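The five augmentations, kept alongside each original image, account for the growth from 226 to 1356 images (226 × 6). A sketch of such a pipeline using torchvision is shown below; the enhancement factors (1.3) are illustrative assumptions, and photometric changes deliberately leave the mask untouched while flips transform image and mask together.

```python
from torchvision.transforms import functional as TF

def augment(image, mask):
    """Return the original image/mask pair plus five augmented variants.
    `image` and `mask` are PIL Images (or tensors) of the same size."""
    pairs = [(image, mask)]
    pairs.append((TF.adjust_brightness(image, 1.3), mask))   # brightness enhancement
    pairs.append((TF.adjust_saturation(image, 1.3), mask))   # color enhancement
    pairs.append((TF.adjust_contrast(image, 1.3), mask))     # contrast enhancement
    pairs.append((TF.hflip(image), TF.hflip(mask)))          # horizontal flip
    pairs.append((TF.vflip(image), TF.vflip(mask)))          # vertical flip
    return pairs
```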

3.2. Implementation Details

The experiments were conducted with the PyTorch framework on a Linux operating system, with CUDA version 11.6, Python 3.8.13, and an NVIDIA GeForce RTX 3090 graphics card. The learning rate was set to $10^{-4}$, Adam was chosen as the optimizer, the batch size was 4, and the weight decay, momentum, and power were set to $2 \times 10^{-5}$, 0.99, and 0.9, respectively. After training for 1000 epochs, the loss value gradually converged, so the iteration count was fixed at 1000.
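A sketch of this configuration is given below. The paper does not state how the "power" parameter enters training; a poly learning-rate schedule with power 0.9 is the common DeepLab convention and is assumed here, and the stand-in model is purely illustrative.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, 1)   # stand-in for the Modified DeepLabv3+ network

# Adam with the hyperparameters of Section 3.2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=2e-5)

def poly_lr(base_lr, epoch, max_epoch=1000, power=0.9):
    """Poly schedule (assumed): decay the learning rate toward zero."""
    return base_lr * (1 - epoch / max_epoch) ** power

for epoch in range(1000):
    for group in optimizer.param_groups:
        group['lr'] = poly_lr(1e-4, epoch)
    # ... forward pass, MBL loss, backward pass, optimizer.step() ...
```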

3.3. Evaluation Metrics

Four evaluation metrics were adopted to assess the performance of the proposed model: F1-score, mean intersection over union (MIoU), overall accuracy (OA), and water intersection over union (WIoU). F1-score is the harmonic mean of precision and recall. Precision is the percentage of samples predicted as positive that are actually positive. Recall is the percentage of actually positive samples that are successfully predicted as positive by the model. MIoU denotes the mean value of the intersection over union across all categories. OA denotes the percentage of correctly classified pixels among all pixels. WIoU denotes the intersection over union between the predicted and actual water body pixels. Their equations are as follows:
$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$MIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP}{TP + FP + FN}$$

$$OA = \frac{TP + TN}{TP + TN + FP + FN}$$

$$WIoU = \frac{TP}{TP + FP + FN}$$
where k denotes the class of the label, and the confusion matrix between the BOW and others is calculated, including true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
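For the two-class case considered here (BOW vs. others), these metrics reduce to simple confusion-matrix arithmetic, sketched below with NumPy; the label encoding (0 = others, 1 = BOW) is an assumption.

```python
import numpy as np

def metrics(pred, gt, water=1):
    """Compute F1-score, MIoU, OA, and WIoU from integer label maps."""
    tp = np.sum((pred == water) & (gt == water))
    tn = np.sum((pred != water) & (gt != water))
    fp = np.sum((pred == water) & (gt != water))
    fn = np.sum((pred != water) & (gt == water))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    wiou = tp / (tp + fp + fn)                 # IoU of the water (BOW) class
    iou_bg = tn / (tn + fn + fp)               # IoU of the 'others' class
    miou = (wiou + iou_bg) / 2                 # mean IoU over the two classes
    oa = (tp + tn) / (tp + tn + fp + fn)
    return f1, miou, oa, wiou
```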

4. Experiment Results

To evaluate the performance of Modified DeepLabv3+, experiments were conducted on the RSBD dataset and the water pollution dataset. We chose four common semantic segmentation models for comparison: SegNet [31], U-Net [32], MANet [33], and DeepLabv3+ [37]. SegNet and U-Net are widely used semantic segmentation models, often applied to remote sensing images. MANet introduces multiple attention modules to extract target characterization information. Our model builds on DeepLabv3+ and incorporates fuzzy logic to improve segmentation accuracy. Meanwhile, the Modified DeepLabv3+ model incorporates several key improvements, including CBAM, the fuzzy block, multi-layer skip connections, and the MBL function. To evaluate the effectiveness of these improvements, a series of ablation experiments was designed. All models were tested using the same hyperparameters, training set, and test set. In addition, to further enhance the detection performance for BOW, a new dataset was created by fusing feature indices as additional bands, and experiments were conducted on it to verify the effectiveness of the fused features.

4.1. Comparative Experiments

The quantitative results on the RSBD dataset are shown in Table 1. Modified DeepLabv3+ outperformed the other four segmentation models on F1-score, MIoU, and WIoU. SegNet performed the worst. The U-Net model fused high-level and low-level features to alleviate feature loss, but its predictions still contained classification errors. MANet used multiple attention mechanisms to improve the attention to the BOW region. The DeepLabv3+ model used different dilation rates to obtain more contextual information, and its performance metrics were significantly better than those of MANet. Modified DeepLabv3+ was built on the basis of the DeepLabv3+ model, and its F1-score, MIoU, and WIoU improved by 3.09%, 1.96%, and 4.07%, respectively, proving that the improvements to DeepLabv3+ increased the recognition accuracy of BOW. However, Modified DeepLabv3+ showed a 0.11% decrease in OA compared to DeepLabv3+. This is because the Modified DeepLabv3+ model was better at recognizing the minority category, which may have come at the expense of some correct predictions for the majority category, slightly reducing the overall accuracy. The F1-score integrates precision and recall and is more suitable for evaluating model performance on unbalanced datasets; here, the F1-score reached 78.47%. The Modified DeepLabv3+ model also had a smaller number of parameters than the DeepLabv3+ model, only 23.3 MB, owing to the use of the MobileNetV2 backbone, which significantly reduced the parameter count.
The qualitative results are shown in Figure 6. SegNet had many obvious mis-segmentations, lacking much important information. U-Net predicted the edges of BOW but was prone to misclassify other pixels as BOW pixels. MANet and DeepLabv3+ were able to segment a more complete region, but poorly identified regions with unclear edge information. Modified DeepLabv3+ achieved some improvement in the BOW segmentation effect, closer to the true value, rarely dividing other pixels into BOW, and the edge detail part was also better segmented.
The quantitative results on the water pollution dataset are shown in Table 2. Compared with SegNet, U-Net, MANet, and DeepLabv3+, the F1-score of the Modified DeepLabv3+ model improved by 3.93%, 3.90%, 1.81%, and 1.05%, respectively; MIoU improved by 3.53%, 3.49%, 1.61%, and 0.90%, respectively; OA improved by 0.60%, 0.57%, 0.20%, and 0.07%, respectively; and WIoU improved by 6.44%, 6.39%, 3.02%, and 1.77%, respectively. In addition, compared with DeepLabv3+, the number of parameters changed the most, with a reduction of 202.7 MB, proving that our proposed Modified DeepLabv3+ model greatly reduces the model size while maintaining accuracy.
The qualitative results of the experiments are shown in Figure 7. SegNet, U-Net, MANet, and DeepLabv3+ perform poorly in segmenting polluted water bodies, as they struggle to accurately represent water body types and misclassify polluted water as other categories. In contrast, the Modified DeepLabv3+ model can more accurately capture the position and shape of polluted water bodies, resulting in clearer boundaries between polluted water and the background. As seen in the first row, SegNet, U-Net, MANet, and DeepLabv3+ all fail to fully segment polluted water bodies, leading to instances of omission. However, the Modified DeepLabv3+ model can precisely identify the edge information of water bodies, making more accurate predictions. Comparing the second, third, and fourth rows, the Modified DeepLabv3+ model outperforms SegNet, U-Net, MANet, and DeepLabv3+ in terms of segmentation performance. As shown in the fifth row, SegNet and U-Net exhibit the poorest segmentation results, with occurrences of misclassification and discontinuous water body segmentation.

4.2. Ablation Experiments

We comprehensively analyzed the effectiveness of each improvement to DeepLabv3+. Extensive experimental validations and quantitative comparisons were conducted to assess the contributions of CBAM, the fuzzy block, and the multi-layer skip connections to the BOW segmentation results. CBAM was placed after the ASPP module to enhance the ability of the Modified DeepLabv3+ model to focus on target feature information. The fuzzy block was designed to remove the ambiguity and uncertainty of edge information. The multi-layer skip connections integrate more high-level semantic feature information. The ablation results are shown in Table 3. After removing CBAM, F1-score, MIoU, OA, and WIoU decreased by 1.63%, 1.19%, 0.22%, and 2.16%, respectively, because the model then easily identified the background as BOW. After removing the fuzzy block, the uncertainty between the edges reduced segmentation accuracy: WIoU decreased most significantly, by 2.46%, and F1-score, MIoU, and OA decreased by 1.85%, 1.33%, and 0.21%, respectively. Removing the fuzzy block also produced the largest decrease in parameters, 0.04 MB, which means that the fuzzy block introduces some parameters while improving performance, but at an acceptable cost. After removing the multi-layer skip connections, F1-score, MIoU, OA, and WIoU decreased by 0.59%, 0.40%, 0.02%, and 0.79%, respectively. Similar to the results on the RSBD dataset, the ablation experiments on the water pollution dataset also demonstrated improved performance in terms of F1-score, MIoU, OA, and WIoU, with no significant change in the number of parameters. The ablation experiments demonstrate that CBAM, the fuzzy block, and the multi-layer skip connections are effective for polluted water segmentation, and that these improvements do not introduce excessive parameters.
To provide a more intuitive verification of the effectiveness of the proposed modules, ablation experiments were conducted in the RSBD dataset and the water pollution dataset. As shown in Figure 8 and Figure 9, when CBAM is removed, the model loses a significant amount of water information, leading to a decrease in segmentation accuracy. Removing the fuzzy block makes the boundary information of other classes and water less clear. If the multi-level skip connections are removed, some low-level semantic feature information is lost, which can lead to discontinuous information on the water body. The ablation experiment results of the RSBD dataset and the water pollution dataset indicate that the proposed modules play a crucial role in the model and are essential for preserving critical water boundary information.
In addition, to verify the impact of the proposed loss function on model performance, we conducted comparative experiments on the traditional cross-entropy loss function and the MBL function. We defined $l_1$ as the traditional cross-entropy loss function and $l_2$ as the MBL function. The experimental results are shown in Table 4. Modified DeepLabv3+ obtained better results using the MBL function, which sets a corresponding weight parameter for each class and balances the positive and negative samples, leading to more accurate BOW detection. As shown in Table 4, F1-score, MIoU, and WIoU improved by 1.79%, 1.12%, and 2.38%, respectively, over the cross-entropy loss function.
Overall, both the quantitative and qualitative results of the comparison and ablation experiments show that the improvements to the DeepLabv3+ model can effectively improve the extraction accuracy of BOW.

4.3. Experiments of RSBD-II Dataset

By analyzing the differences in the spectral curve characteristics between general water bodies and BOW, BOW can be effectively detected. Compared to general water bodies, whose reflectance values change rapidly, the reflectance values of BOW change less significantly. In GF-2 imagery, the green and red bands corresponding to this spectral range are well suited to capturing the spectral characteristics of BOW [20,65]. According to this difference, Yao et al. [24] selected the reflection difference between the green and red bands as the numerator, and used the sum of the red, green, and blue bands as the denominator to propose the BOI index. The formula is as follows:
$$BOI = \frac{R_{rs}(G) - R_{rs}(R)}{R_{rs}(R) + R_{rs}(G) + R_{rs}(B)}$$

where $R_{rs}(G)$, $R_{rs}(R)$, and $R_{rs}(B)$ denote the remote sensing reflectance of the GF-2 image in the green, red, and blue bands, respectively. Wen et al. [25] proposed the NDBWI index by analyzing the spectral characteristics of BOW. The formula is as follows:

$$NDBWI = \frac{R_{rs}(G) - R_{rs}(R)}{R_{rs}(G) + R_{rs}(R)}$$
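Both indices are simple per-pixel band operations, so generating a derived band and stacking it onto an image is straightforward; a NumPy sketch is shown below, where the small epsilon guarding against division by zero is an implementation assumption.

```python
import numpy as np

def boi(g, r, b):
    """Black and odorous water index of Yao et al. [24]."""
    return (g - r) / (r + g + b + 1e-8)

def ndbwi(g, r):
    """Normalized difference black-odorous water index of Wen et al. [25]."""
    return (g - r) / (g + r + 1e-8)

def add_index_band(rgb, index='NDBWI'):
    """Stack a derived index band onto an (H, W, 3) reflectance image,
    producing a 4-band sample in the style of RSBD_BOI / RSBD_NDBWI."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    band = boi(g, r, b) if index == 'BOI' else ndbwi(g, r)
    return np.dstack([rgb, band])          # shape (H, W, 4)
```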
To enrich the band information of the remote sensing images and verify the generalization ability of the Modified DeepLabv3+ model in identifying BOW, the characteristic band indices were used to expand the RSBD dataset. The features obtained from the BOI index and the NDBWI index serve as supplementary information for the sample set. Consequently, we constructed the RSBD-II dataset, which includes the original RSBD dataset, the RSBD_BOI dataset, and the RSBD_NDBWI dataset. The RSBD_BOI dataset comprises the red, green, blue, and BOI bands, while the RSBD_NDBWI dataset consists of the red, green, blue, and NDBWI bands. This expansion provides additional information for analysis and research. We trained SegNet, U-Net, MANet, DeepLabv3+, and Modified DeepLabv3+ on the RSBD-II dataset, respectively, to compare and analyze the BOW extraction results.
Table 5 shows the experimental results of different networks on the RSBD_BOI dataset. As shown in Table 5, SegNet showed the lowest OA value; while both U-Net and MANet were more effective than SegNet, there were still some misclassifications. Modified DeepLabv3+ reached the highest value in all metrics, and compared to DeepLabv3+, F1-score improved by 1.61%, MIoU improved by 1.13%, OA improved by 0.09%, and WIoU improved by 2.19%. The experimental results of the RSBD_NDBWI dataset are represented in Table 6. Modified DeepLabv3+ reached the highest among all evaluation metrics: 80.79% for F1-score, 82.98% for MIoU, 98.25% for OA, and 67.77% for WIoU.
Figure 10 shows the F1-score, MIoU, OA, and WIoU values of each model on the RSBD-II datasets. The Modified DeepLabv3+ model achieved the best results on all three datasets, validating the generalization capability of our proposed model. The extraction results on the datasets with added index bands outperformed those on the original dataset, which was especially evident for WIoU. Compared to the RSBD dataset, the WIoU on the RSBD_BOI dataset increased by 1.59%, and the WIoU on the RSBD_NDBWI dataset increased by 3.21%. These results show that adding BOW index features significantly improves the segmentation accuracy of Modified DeepLabv3+. The OA value on the RSBD_BOI dataset was slightly lower than that on the RSBD dataset; however, the OA value on the RSBD_NDBWI dataset was 98.25%, which was 0.74% higher than that on the original dataset. This indicates that adding NDBWI features is more helpful for the network in extracting BOW.

5. Discussion

Through four evaluation metrics and visual comparisons, the extraction accuracy of water bodies in the test dataset was assessed. The results indicate that the proposed Modified DeepLabv3+ model outperforms the SegNet, U-Net, MANet, and DeepLabv3+ models in segmentation effectiveness. The different performances may be related to the structures of these CNNs. SegNet, during down-sampling, extracts main features by reducing the image size, losing some detailed information. The subsequent up-sampling aims to restore resolution, but the previously lost details cannot be fully recovered, possibly causing difficulties in identifying subtle water body features. U-Net combines too many low-level features extracted by shallow convolutional layers, and these low-level features may lead to the incorrect identification of noise with spectral features similar to water bodies. MANet introduces multiple attention modules, increasing computational overhead, and cannot address the uncertainty issues in remote sensing images. DeepLabv3+ uses the ASPP pyramid to extract multi-scale features and a decoder to restore the resolution of the feature maps. However, in this study, DeepLabv3+ performed poorly, possibly due to its complex structure: while it may be suitable for pixel-level segmentation in complex scenes, it tends to overfit in water body extraction. To address these issues, we designed the Modified DeepLabv3+ model. This model adopts the lightweight MobileNetV2 network as the backbone, reducing parameters and accelerating convergence. Channel and spatial features are captured by introducing CBAM, and the uncertainty and complexity of remote sensing information are addressed by introducing the fuzzy block. A new loss function was designed to tackle the issue of imbalanced sample data, and additional skip connections were added to obtain more deep-level feature information. Specifically, on the RSBD dataset, the Modified DeepLabv3+ model outperformed SegNet by 6.50% in F1-score, U-Net by 5.96%, MANet by 4.95%, and DeepLabv3+ by 3.09%. However, compared to DeepLabv3+, the OA of Modified DeepLabv3+ decreased by 0.11%. This is because the Modified DeepLabv3+ model focuses more on the extraction of the minority class (BOW), potentially sacrificing some correct predictions of the majority class (background), slightly reducing the overall accuracy. When dealing with complex image segmentation tasks, this trade-off becomes particularly pronounced, emphasizing the need to balance and coordinate model adjustment strategies. In summary, the slight decrease in OA highlights a key challenge in model optimization: how to enhance recognition capability for specific categories while preserving or improving OA for the remaining categories.
As shown in Figure 11, testing experiments on untrained data demonstrate that the Modified DeepLabv3+ model performs well in visually clear scenarios. Our consistent predictions on samples from various times validate its robust generalization ability in processing untrained data. Future research can consider expanding the dataset range, including data from various satellites, and extending its coverage to other provinces and cities to enhance the model’s stability in identifying different regions and image types. Additionally, our RSBD-II dataset directly integrates BOI and NDBWI band information. Compared to images containing only RGB information, those incorporating BOW bands provide additional useful information, enabling more accurate water body segmentation. In future work, we can explore building new deep learning models from the perspective of inherent optical properties and water quality parameters to improve the identification accuracy of BOW. These improvements will aid the model’s performance in practical applications, especially in identifying water body characteristics in different regions and image data across various times.

6. Conclusions

BOW mostly occurs in small water channels and rivers, and it is difficult to correctly distinguish BOW from non-BOW using traditional recognition methods. To improve recognition accuracy, we proposed an improved BOW detection model called Modified DeepLabv3+. The model adopts CBAM to improve the focus on important features, integrates deep learning with fuzzy logic to deal with the uncertainty of edge information, and combines multiple skip connections to retain high-level semantic feature information. In addition, the MBL function is designed to solve the imbalance between the BOW class and the other classes. We conducted a comparative analysis of Modified DeepLabv3+ on the proposed RSBD dataset and the water pollution dataset. The experimental results showed that the Modified DeepLabv3+ model outperforms the other common models in segmentation accuracy and performs better in detail segmentation, validating the necessity of the improved components for obtaining the best BOW extraction results. In addition, to further enhance the model's BOW detection capability, we incorporated the BOI band and the NDBWI band as inputs. The experimental results showed that images fused with the NDBWI band yielded a better extraction effect, improved segmentation accuracy, and more precise attention to target areas, underscoring the outstanding performance of the NDBWI index in enhancing model performance and segmentation capability. In conclusion, this research underscores the significant potential of deep learning-based BOW detection and paves the way for further innovations in environmental monitoring, advocating the integration of advanced technologies such as deep learning to meet global sustainability challenges more effectively.
The RSBD dataset used in this study only covers BOW in the city of Yantai and may not be applicable to other scenarios. To address this limitation, future research should integrate more satellite and regional data to supplement and enhance the existing dataset, ensuring the comprehensiveness and reliability of the results. Additionally, while the BOW indices used in this study are effective, other indices may contribute to a more holistic evaluation. In the future, we will continue to explore the organic combination of deep learning with the inherent optical properties and water quality parameters of water bodies so as to further improve the recognition accuracy of BOW.

Author Contributions

Conceptualization, P.W. and H.X.; methodology, J.H.; software, J.X.; validation, J.H. and J.X.; formal analysis, J.H., W.Y., P.W. and H.X.; data curation, J.X.; writing—original draft preparation, J.H.; writing—review and editing, J.H., J.X., W.Y., P.W. and H.X.; visualization, J.H.; supervision, J.X.; project administration, J.X.; funding acquisition, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62072391 and Grant 62066013.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The RSBD-II dataset contains the RSBD dataset, the RSBD_BOI dataset, and the RSBD_NDBWI dataset, which can be downloaded from the website (https://github.com/xianjiangzhuangnai/rsbd_2, accessed on 3 November 2023).

Acknowledgments

We appreciate the China Centre for Resources Satellite Data and Application providing the GF-2 images.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, X.; Wang, Y.G.; Sun, C.H.; Pan, T. Formation mechanism and assessment method for urban black and odorous water body: A review. Ying Yong Sheng Tai Xue Bao 2016, 27, 1331–1340. [Google Scholar] [PubMed]
  2. Duan, H.; Ma, R.; Loiselle, S.A.; Shen, Q.; Yin, H.; Zhang, Y. Optical characterization of black water blooms in eutrophic waters. Sci. Total Environ. 2014, 482, 174–183. [Google Scholar] [CrossRef] [PubMed]
  3. Cao, J.; Sun, Q.; Zhao, D.; Xu, M.; Shen, Q.; Wang, D.; Wang, Y.; Ding, S. A critical review of the appearance of black-odorous waterbodies in China and treatment methods. J. Hazard. Mater. 2020, 385, 121511. [Google Scholar] [CrossRef] [PubMed]
  4. Liang, Z.; Fang, W.; Luo, Y.; Lu, Q.; Juneau, P.; He, Z.; Wang, S. Mechanistic insights into organic carbon-driven water blackening and odorization of urban rivers. J. Hazard. Mater. 2021, 405, 124663. [Google Scholar] [CrossRef] [PubMed]
  5. Peng, Y.; Liankui, X.U.; Baoling, K.E.; Fei, S.; Hongjie, G. Treatment and ecological restoration of black and odorous water body in Yueya Lake in Nanjing City. J. Environ. Eng. Technol. 2020, 10, 696–701. [Google Scholar]
  6. Wang, L.; Yu, L.; Xiong, Y.; Li, Z.; Geng, J. Study on the Governance of Black-odor Water in Chinese Cities. J. Clean. Prod. 2021, 308, 127290. [Google Scholar] [CrossRef]
  7. Chen, G.; Luo, J.; Zhang, C.; Jiang, L.; Tian, L.; Chen, G. Characteristics and influencing factors of spatial differentiation of urban black and odorous waters in China. Sustainability 2018, 10, 4747. [Google Scholar] [CrossRef]
  8. Müezzinoğlu, A.; Sponza, D.; Ken, I.K.; Alparslan, N.; Re, N.Z. Hydrogen Sulfide and Odor Control in İzmir Bay. Water Air Soil Pollut. 2000, 123, 245–257. [Google Scholar] [CrossRef]
  9. Chen, J.; Xie, P.; Ma, Z.; Niu, Y.; Tao, M.; Deng, X.; Wang, Q. A systematic study on spatial and seasonal patterns of eight taste and odor compounds with relation to various biotic and abiotic parameters in Gonghu Bay of Lake Taihu. Sci. Total Environ. 2010, 409, 314–325. [Google Scholar] [CrossRef]
  10. Watts, S.F. The mass budgets of carbonyl sulfide, dimethyl sulfide, carbon disulfide and hydrogen sulfide. Atmos. Environ. 2000, 34, 761–779. [Google Scholar] [CrossRef]
  11. Zhang, X.; Ren, Y.; Zhu, X.; Pan, H.; Yao, H.; Wang, J.; Liu, M.; He, M. Driving Factors for Black-Odor-Related Microorganisms and Potential Self-Remediation Strategies. Sustainability 2022, 15, 521. [Google Scholar] [CrossRef]
  12. Sugiura, N.; Utsumi, M.; Wei, B.; Iwami, N.; Okano, K.; Kawauchi, Y.; Maekawa, T. Assessment for the complicated occurrence of nuisance odours from phytoplankton and environmental factors in a eutrophic lake. Lakes Reserv. Res. Manag. 2004, 9, 195–201. [Google Scholar] [CrossRef]
  13. Xu, M.; Yao, R.H.; Song, L.L. Primary exploration of general plan of the urban black-odor river treatment in China. Chin. J. Environ. Manag. 2015, 7, 74–78. [Google Scholar] [CrossRef] [PubMed]
  14. Lu, G.-H.; Ma, Q.; Zhang, J.-H. Analysis of black water aggregation in Taihu Lake. Water Sci. Eng. 2011, 4, 374–385. [Google Scholar]
  15. Liu, C.; Hu, Z.; Hao, X.; Bai, Y. Progress in the development of black-odour prediction models for urban rivers. J. East China Norm. Univ. (Nat. Sci.) 2011, 1, 43–54. [Google Scholar]
  16. Zhao, J.; Hu, C.; Lapointe, B.; Melo, N.; Johns, E.M.; Smith, R.H. Satellite-observed black water events off Southwest Florida: Implications for coral reef health in the Florida Keys National Marine Sanctuary. Remote Sens. 2013, 5, 415–431. [Google Scholar] [CrossRef]
  17. Wei, L.; Huang, C.; Wang, Z.; Wang, Z.; Zhou, X.; Cao, L. Monitoring of urban black-odor water based on Nemerow index and gradient boosting decision tree regression using UAV-borne hyperspectral imagery. Remote Sens. 2019, 11, 2402. [Google Scholar] [CrossRef]
  18. Shen, Q.; Zhu, L.; Cao, H. Remote sensing monitoring and screening for urban black and odorous water body: A review. Chin. J. Appl. Ecol. 2017, 28, 3433–3439. [Google Scholar]
  19. Wang, Q.; Xu, J.; Chen, Y.; Li, J.; Wang, X. Influence of the varied spatial resolution of remote sensing images on urban and rural residential information extraction. Resour. Sci. 2012, 34, 159–165. [Google Scholar]
  20. Yu, Z.; Huang, Q.; Peng, X.; Liu, H.; Ai, Q.; Zhou, B.; Yuan, X.; Fang, M.; Wang, B. Comparative Study on Recognition Models of Black-Odorous Water in Hangzhou Based on GF-2 Satellite Data. Sensors 2022, 22, 4593. [Google Scholar] [CrossRef]
  21. Kutser, T.; Paavel, B.; Verpoorter, C.; Ligi, M.; Soomets, T.; Toming, K.; Casal, G. Remote sensing of black lakes and using 810 nm reflectance peak for retrieving water quality parameters of optically complex waters. Remote Sens. 2016, 8, 497. [Google Scholar] [CrossRef]
  22. Yuwen, J.; Ning, Z.; Yanyan, Z.; Rijuan, H.; Zhe, Z. Research on remote sensing monitoring of urban black and odorous water. Bull. Surv. Mapp. 2019, 2019, 98–104. [Google Scholar]
  23. Shihong, W. Research progress of remote sensing monitoring key technologies for urban black and odorous water bodies. Chin. J. Environ. Eng. 2019, 13, 1261–1271. [Google Scholar]
  24. Yao, Y.; Shen, Q.; Zhu, L.; Gao, H.; Cao, H.; Han, H.; Sun, J.; Li, J. Remote sensing identification of urban black-odor water bodies in Shenyang city based on GF-2 image. J. Remote Sens. 2019, 23, 230–242. [Google Scholar] [CrossRef]
  25. Wen, S.; Wang, Q.; Li, Y.; Zhu, L.; Lü, H.; Lei, S.; Ding, X.; Miao, S. Remote sensing identification of urban black-odor water bodies based on high-resolution images: A case study in Nanjing. Environ. Sci. 2018, 39, 57–67. [Google Scholar]
  26. Shen, Q.; Yao, Y.; Li, J.; Zhang, F.; Wang, S.; Wu, Y.; Ye, H.; Zhang, B. A CIE Color Purity Algorithm to Detect Black and Odorous Water in Urban Rivers Using High-Resolution Multispectral Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6577–6590. [Google Scholar] [CrossRef]
  27. Yuqiao, F. Study of the Phenomena of Water Blackening and Stink on Suzhou Creek. Shanghai Environ. Sci. 1993, 1993, 21–26. [Google Scholar]
  28. Ji-Zhoua, L.I.; Chenga, N.; Hao-Rana, J.I.; Wana, J.; Xu-Yina, Y. Malodorous circumstances assessment of representative river water in Nanjing city. J. Xuzhou Inst. Technol. (Nat. Sci. Ed.) 2013, 28, 53–56. [Google Scholar]
  29. Pan, B.; Yu, H.; Cheng, H.; Du, S.; Cai, S.; Zhao, M.; Du, J.; Xie, F. A CNN–LSTM Machine-Learning Method for Estimating Particulate Organic Carbon from Remote Sensing in Lakes. Sustainability 2023, 15, 13043. [Google Scholar] [CrossRef]
  30. Abasi, A.K.; Makhadmeh, S.N.; Alomari, O.A.; Tubishat, M.; Mohammed, H.J. Enhancing Rice Leaf Disease Classification: A Customized Convolutional Neural Network Approach. Sustainability 2023, 15, 15039. [Google Scholar] [CrossRef]
  31. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  33. Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Su, J.; Wang, L.; Atkinson, P.M. Multiattention network for semantic segmentation of fine-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13. [Google Scholar] [CrossRef]
  34. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
  35. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  36. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  37. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  38. Zhang, C.; Chen, X.; Ji, S. Semantic image segmentation for sea ice parameters recognition using deep convolutional neural networks. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102885. [Google Scholar] [CrossRef]
  39. Wang, Y.; Yao, J.; Yang, P.; Zhang, Y.; Sun, Y.; Cui, N. Dynamic remote sensing monitoring and its influence factors analysis for urban black and odorous water body management and treatment in Beijing, China. Chin. J. Environ. Eng. 2022, 16, 3092–3101. [Google Scholar]
  40. Shao, H.; Ding, F.; Yang, J.; Zheng, Z. Model of Extracting Remotely-sensed Information of Black and Odorous Water Based on Deep Learning. J. Yangtze River Sci. Res. Inst. 2022, 39, 156–162. [Google Scholar]
  41. Zheng, G.; Zhao, Y.; Pan, Z.; Chen, Z.; Qiu, Z.; Zheng, T. FANet: A deep learning framework for black and odorous water extraction. Eur. J. Remote Sens. 2023, 56, 2234077. [Google Scholar] [CrossRef]
  42. Rocchini, D.; Foody, G.M.; Nagendra, H.; Ricotta, C.; Anand, M.; He, K.S.; Amici, V.; Kleinschmit, B.; Förster, M.; Schmidtlein, S. Uncertainty in ecosystem mapping by remote sensing. Comput. Geosci. 2013, 50, 128–135. [Google Scholar] [CrossRef]
  43. Guo, J.; Huo, H.; Peng, G. An interval number distance- and ranking-based method for remotely sensed image fuzzy clustering. Int. J. Remote Sens. 2018, 39, 8591–8614. [Google Scholar] [CrossRef]
  44. Price, S.R.; Price, S.R.; Anderson, D.T. Introducing fuzzy layers for deep learning. In Proceedings of the IEEE International Conference on Fuzzy Systems, New Orleans, LA, USA, 23–26 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
  45. Ma, X.; Xu, J.; Chong, Q.; Ou, S.; Xing, H.; Ni, M. FCUnet: Refined remote sensing image segmentation method based on a fuzzy deep learning conditional random field network. IET Image Process. 2023, 17, 3616–3629. [Google Scholar] [CrossRef]
  46. Zhao, T.; Xu, J.; Chen, R.; Ma, X. Remote sensing image segmentation based on the fuzzy deep convolutional neural network. Int. J. Remote Sens. 2021, 42, 6264–6283. [Google Scholar] [CrossRef]
  47. Nan, Y.; Del Ser, J.; Tang, Z.; Tang, P.; Xing, X.; Fang, Y.; Herrera, F.; Pedrycz, W.; Walsh, S.; Yang, G. Fuzzy attention neural network to tackle discontinuity in airway segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–14. [Google Scholar] [CrossRef] [PubMed]
  48. Huang, J.; Xu, J.; Chong, Q.; Li, Z. Black and Odorous Water Detection of Remote Sensing Images Based on Improved Deep Learning. Can. J. Remote Sens. 2023, 49, 2237591. [Google Scholar] [CrossRef]
  49. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  50. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  51. Su, H.; Peng, Y.; Xu, C.; Feng, A.; Liu, T. Using improved DeepLabv3+ network integrated with normalized difference water index to extract water bodies in Sentinel-2A urban remote sensing images. J. Appl. Remote Sens. 2021, 15, 018504. [Google Scholar] [CrossRef]
  52. Lv, S.; Meng, L.; Edwing, D.; Xue, S.; Geng, X.; Yan, X.-H. High-Performance Segmentation for Flood Mapping of HISEA-1 SAR Remote Sensing Images. Remote Sens. 2022, 14, 5504. [Google Scholar] [CrossRef]
  53. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  54. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  55. Zheng, Y.; Xu, Z.; Wang, X. The fusion of deep learning and fuzzy systems: A state-of-the-art survey. IEEE Trans. Fuzzy Syst. 2021, 30, 2783–2799. [Google Scholar] [CrossRef]
  56. Deng, Y.; Ren, Z.; Kong, Y.; Bao, F.; Dai, Q. A hierarchical fused fuzzy deep neural network for data classification. IEEE Trans. Fuzzy Syst. 2016, 25, 1006–1012. [Google Scholar] [CrossRef]
  57. Shen, T.; Wang, J.; Gou, C.; Wang, F.-Y. Hierarchical fused model with deep learning and type-2 fuzzy learning for breast cancer diagnosis. IEEE Trans. Fuzzy Syst. 2020, 28, 3204–3218. [Google Scholar] [CrossRef]
  58. Qu, T.; Xu, J.; Chong, Q.; Liu, Z.; Yan, W.; Wang, X.; Song, Y.; Ni, M. Fuzzy neighbourhood neural network for high-resolution remote sensing image segmentation. Eur. J. Remote Sens. 2023, 56, 2174706. [Google Scholar] [CrossRef]
  59. Chong, Q.; Xu, J.; Jia, F.; Liu, Z.; Yan, W.; Wang, X.; Song, Y. A multiscale fuzzy dual-domain attention network for urban remote sensing image segmentation. Int. J. Remote Sens. 2022, 43, 5480–5501. [Google Scholar] [CrossRef]
  60. Nambiar, K.G.; Morgenshtern, V.I.; Hochreuther, P.; Seehaus, T.; Braun, M.H. A Self-Trained Model for Cloud, Shadow and Snow Detection in Sentinel-2 Images of Snow- and Ice-Covered Regions. Remote Sens. 2022, 14, 1825. [Google Scholar] [CrossRef]
  61. Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
  62. Wei, C.; Zheng, Q.; Shang, Y.; Zhang, X.; Yin, J.; Shen, Z. Black and Odorous Water Monitoring by Using GF Series Remote Sensing Data. In Proceedings of the International Conference on Agro-Geoinformatics, Shenzhen, China, 26–29 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  63. Sun, W.; Yang, G.; Chen, C.; Chang, M.; Huang, K.; Meng, X.; Liu, L. Development status and literature analysis of China’s earth observation remote sensing satellites. J. Remote Sens. 2020, 24, 479–510. [Google Scholar] [CrossRef]
  64. Ruan, Y.; Zhang, X.; Liao, X.; Ruan, B.; Wang, C.; Jiang, X. Automatic Plastic Greenhouse Extraction from Gaofen-2 Satellite Images with Fully Convolution Networks and Image Enhanced Index. Sustainability 2023, 15, 16537. [Google Scholar] [CrossRef]
  65. Liu, B.; Xi, H.; Li, T.; Borthwick, A.G. Black-odorous water bodies annual dynamics in the context of climate change adaptation in Guangzhou City, China. J. Clean. Prod. 2023, 415, 137781. [Google Scholar] [CrossRef]
Figure 1. Structure diagram of the Modified DeepLabv3+ model.
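To make the pipeline of Figure 1 concrete, the following is a minimal PyTorch sketch of a DeepLabv3+-style network with a MobileNetV2 encoder and a decoder skip connection. It is a sketch under stated assumptions, not the authors' released implementation: the backbone cut point, the channel widths, and the single-branch stand-in for the ASPP module are ours, and the CBAM and fuzzy blocks of Figures 2 and 3 (sketched separately below) would be inserted along this pipeline.

```python
# Hedged sketch of a Modified DeepLabv3+-style network; assumptions noted inline.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class ModifiedDeepLabv3Plus(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        feats = mobilenet_v2(weights=None).features
        self.low = feats[:4]   # early stages -> low-level features (24 channels, assumed cut point)
        self.high = feats[4:]  # remaining stages -> high-level features (1280 channels)
        self.context = nn.Sequential(  # reduced single-branch stand-in for the ASPP module
            nn.Conv2d(1280, 256, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(inplace=True))
        self.low_proj = nn.Sequential(  # project low-level features for the skip connection
            nn.Conv2d(24, 48, 1, bias=False), nn.BatchNorm2d(48), nn.ReLU(inplace=True))
        self.head = nn.Sequential(
            nn.Conv2d(256 + 48, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1))

    def forward(self, x):
        size = x.shape[-2:]
        low = self.low(x)                      # skip-connection source
        high = self.context(self.high(low))
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        out = self.head(torch.cat([high, self.low_proj(low)], dim=1))
        return F.interpolate(out, size=size, mode="bilinear", align_corners=False)
```

Note that the stock MobileNetV2 stem expects three input channels; a four-band GF-2 patch would require widening the first convolution accordingly.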
Figure 2. Structure diagram of the convolutional attention module.
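CBAM is a published module [54], so its structure can be stated concretely: channel attention (spatial average and max pooling, a shared MLP, sigmoid gating) followed by spatial attention (channel-wise average and max maps, a 7 × 7 convolution, sigmoid gating). A compact PyTorch sketch follows; the reduction ratio and kernel size are the defaults from [54], not values confirmed in this paper.

```python
# Sketch of CBAM [54]: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: pool spatially (avg and max), share one MLP, gate channels.
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3), keepdim=True))
                             + self.mlp(x.amax(dim=(2, 3), keepdim=True)))
        x = x * gate
        # Spatial attention: pool across channels, convolve, gate locations.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```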
Figure 3. Structure diagram of the fuzzy block. Different colored rectangles represent different channels.
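The fuzzy block itself is a contribution of this paper, so the snippet below is only a hedged illustration of the general recipe used by fuzzy deep models [44,45,46]: pass each channel through a small bank of learnable Gaussian membership functions and fuse the resulting membership degrees back into the crisp features. The number of membership functions, the mean aggregation, and the residual fusion are illustrative assumptions rather than the authors' design.

```python
# Hedged sketch of a fuzzification block with learnable Gaussian memberships.
import torch
import torch.nn as nn

class FuzzyBlock(nn.Module):
    def __init__(self, channels: int, n_memberships: int = 3):
        super().__init__()
        # One (mean, spread) pair per channel per membership function.
        self.mu = nn.Parameter(torch.randn(1, channels, n_memberships, 1, 1))
        self.log_sigma = nn.Parameter(torch.zeros(1, channels, n_memberships, 1, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        xe = x.unsqueeze(2)                    # (N, C, 1, H, W) for broadcasting
        sigma = self.log_sigma.exp()
        member = torch.exp(-((xe - self.mu) ** 2) / (2 * sigma ** 2))  # membership degrees
        fuzzy = member.mean(dim=2)             # aggregate degrees back to (N, C, H, W)
        return x + self.fuse(fuzzy)            # residual fusion of crisp and fuzzy features
```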
Figure 4. Typical images and their corresponding annotation images of the RSBD dataset.
Figure 5. Typical images and their corresponding annotation images of the water pollution dataset.
Figure 6. Visualization results of different models on the RSBD dataset. (a) Original image; (b) ground truth; (c) SegNet; (d) U-Net; (e) MANet; (f) DeepLabv3+; (g) Modified DeepLabv3+. Red boxes mark missed detections; blue boxes mark false detections.
Figure 7. Visualization results of different models on the water pollution dataset. (a) Original image; (b) ground truth; (c) SegNet; (d) U-Net; (e) MANet; (f) DeepLabv3+; (g) Modified DeepLabv3+. Red boxes mark missed detections; blue boxes mark false detections.
Figure 8. Qualitative comparison of the ablation experiments on the RSBD dataset. (a) Original image; (b) ground truth; (c) Modified DeepLabv3+ (without CBAM); (d) Modified DeepLabv3+ (without fuzzy block); (e) Modified DeepLabv3+ (without skip); (f) Modified DeepLabv3+. Red boxes mark missed detections; blue boxes mark false detections.
Figure 9. Qualitative comparison of the ablation experiments on the water pollution dataset. (a) Original image; (b) ground truth; (c) Modified DeepLabv3+ (without CBAM); (d) Modified DeepLabv3+ (without fuzzy block); (e) Modified DeepLabv3+ (without skip); (f) Modified DeepLabv3+. Red boxes mark missed detections; blue boxes mark false detections.
Figure 10. Comparison of (a) F1-score, (b) MIoU, (c) OA, and (d) WIoU for different segmentation models on the RSBD, RSBD_BOI, and RSBD_NDBWI datasets.
Figure 11. Visualization results of different models on untrained black and odorous water images. The first and second rows show black and odorous water in the city of Yantai in 2018; the third and fourth rows show it in 2019. (a) Original image; (b) ground truth; (c) SegNet; (d) U-Net; (e) DeepLabv3+; (f) MANet; (g) Modified DeepLabv3+.
Table 1. Quantitative comparison of Modified DeepLabv3+ with common semantic segmentation networks on the RSBD dataset. The best result in each column is marked with an asterisk (*).

Method              | Parameters (MB) | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
SegNet              | 112             | 71.97        | 76.75    | 97.37   | 56.21
U-Net               | 51.1            | 72.51        | 77.09    | 97.39   | 56.88
MANet               | 137             | 73.52        | 77.78    | 97.52   | 58.13
DeepLabv3+          | 226             | 75.38        | 79.01    | 97.62 * | 60.49
Modified DeepLabv3+ | 23.3            | 78.47 *      | 80.97 *  | 97.51   | 64.56 *
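For reference, all four metrics reported in Tables 1–6 can be computed from a binary confusion matrix as sketched below. F1-score, MIoU, and OA follow their standard definitions; WIoU is read here as the IoU of the black and odorous water class alone, which is our interpretation rather than a definition stated in this excerpt.

```python
# Metrics for binary BOW segmentation from the confusion matrix (WIoU reading assumed).
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred, gt: arrays of 0 (background) and 1 (black and odorous water)."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    oa = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)       # F1-score of the BOW class
    wiou = tp / (tp + fp + fn)             # IoU of the BOW class ("WIoU", assumed)
    bg_iou = tn / (tn + fp + fn)           # IoU of the background class
    return {"F1": f1, "MIoU": (wiou + bg_iou) / 2, "OA": oa, "WIoU": wiou}
```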
Table 2. Quantitative comparison of Modified DeepLabv3+ with common semantic segmentation networks on the water pollution dataset. The best result in each column is marked with an asterisk (*).

Method              | Parameters (MB) | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
SegNet              | 112             | 87.56        | 87.52    | 97.43   | 77.87
U-Net               | 51.1            | 87.59        | 87.56    | 97.46   | 77.92
MANet               | 137             | 89.68        | 89.44    | 97.83   | 81.29
DeepLabv3+          | 226             | 90.44        | 90.15    | 97.96   | 82.54
Modified DeepLabv3+ | 23.3            | 91.49 *      | 91.05 *  | 98.03 * | 84.31 *
Table 3. Ablation experiments on the convolutional attention module (CBAM), the fuzzy block, and the multi-layer skip connections, on the RSBD and water pollution datasets. The best result in each column, within each dataset, is marked with an asterisk (*).

Dataset                 | Method                                     | Parameters (MB) | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
RSBD dataset            | Modified DeepLabv3+                        | 23.3            | 78.47 *      | 80.97 *  | 97.51 * | 64.56 *
RSBD dataset            | Modified DeepLabv3+ (without CBAM)         | 23.0            | 76.84        | 79.78    | 97.29   | 62.40
RSBD dataset            | Modified DeepLabv3+ (without fuzzy block)  | 22.9            | 76.62        | 79.64    | 97.30   | 62.10
RSBD dataset            | Modified DeepLabv3+ (without skip)         | 23.2            | 77.88        | 80.57    | 97.49   | 63.77
Water pollution dataset | Modified DeepLabv3+                        | 23.3            | 91.49 *      | 91.05 *  | 98.03 * | 84.31 *
Water pollution dataset | Modified DeepLabv3+ (without CBAM)         | 23.0            | 90.99        | 90.57    | 97.91   | 83.47
Water pollution dataset | Modified DeepLabv3+ (without fuzzy block)  | 22.9            | 90.57        | 90.13    | 97.77   | 82.77
Water pollution dataset | Modified DeepLabv3+ (without skip)         | 23.2            | 88.75        | 88.36    | 97.27   | 79.77
Table 4. Comparison of different loss functions. The best result in each column is marked with an asterisk (*).

Method | Parameters (MB) | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
l1     | 23.3            | 76.68        | 79.85    | 97.62 * | 62.18
l2     | 23.3            | 78.47 *      | 80.97 *  | 97.51   | 64.56 *
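The l2 row matches the configuration reported for Modified DeepLabv3+ in Table 1, so Table 4 isolates the contribution of the reformulated loss under identical model capacity. The exact formulation of l2 is given in the methods section of the paper; as a generic, hedged stand-in, the sketch below combines class-weighted cross-entropy with a Dice term, a common recipe against the category imbalance between background and BOW pixels. The class weight and the equal mixing coefficients are illustrative assumptions.

```python
# Hedged stand-in for an imbalance-aware segmentation loss (not the paper's exact l2).
import torch
import torch.nn.functional as F

def imbalance_aware_loss(logits: torch.Tensor, target: torch.Tensor,
                         bow_weight: float = 5.0, eps: float = 1e-6) -> torch.Tensor:
    """logits: (N, 2, H, W); target: (N, H, W) with 0 = background, 1 = BOW."""
    # Weighted cross-entropy: up-weight the rare BOW class (weight is an assumption).
    w = torch.tensor([1.0, bow_weight], device=logits.device)
    ce = F.cross_entropy(logits, target, weight=w)
    # Dice term on the BOW probability map, robust to class imbalance.
    p = logits.softmax(dim=1)[:, 1]
    t = (target == 1).float()
    dice = 1 - (2 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
    return 0.5 * ce + 0.5 * dice  # equal mix is an assumption
```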
Table 5. Quantitative comparison of Modified DeepLabv3+ with common deep learning methods on the RSBD_BOI dataset. The best result in each column is marked with an asterisk (*).

Models              | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
SegNet              | 70.02        | 75.25    | 96.76   | 53.87
U-Net               | 75.79        | 79.13    | 97.36   | 61.01
MANet               | 77.45        | 80.33    | 97.57 * | 63.20
DeepLabv3+          | 78.02        | 80.56    | 97.29   | 63.96
Modified DeepLabv3+ | 79.63 *      | 81.69 *  | 97.38   | 66.15 *
Table 6. Quantitative comparison of Modified DeepLabv3+ with common deep learning methods on the RSBD_NDBWI dataset. The best result in each column is marked with an asterisk (*).

Models              | F1-Score (%) | MIoU (%) | OA (%)  | WIoU (%)
SegNet              | 72.30        | 77.26    | 97.95   | 56.62
U-Net               | 74.27        | 78.54    | 98.06   | 59.07
DeepLabv3+          | 76.85        | 80.25    | 98.15   | 62.41
MANet               | 78.11        | 81.13    | 98.24   | 64.08
Modified DeepLabv3+ | 80.79 *      | 82.98 *  | 98.25 * | 67.77 *
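Tables 5 and 6 evaluate the same networks after the GF-2 imagery is augmented with a derived band computed from a black and odorous water index (BOI and NDBWI, respectively; indices of this kind are developed in [24,25,26]). The mechanics of building such an index-augmented sample are sketched below: compute a band-ratio feature and stack it onto the original channels. The normalized green-red difference is purely illustrative; the actual BOI and NDBWI formulas come from the cited papers, not from this sketch.

```python
# Hedged sketch of fusing a derived index band with the original GF-2 channels.
import numpy as np

def add_index_band(img: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """img: (4, H, W) reflectance, assumed band order blue, green, red, NIR."""
    green, red = img[1].astype(np.float64), img[2].astype(np.float64)
    index = (green - red) / (green + red + eps)        # illustrative derived band
    return np.concatenate([img.astype(np.float64), index[None]], axis=0)  # (5, H, W)
```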
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
