Article

Inversion of Chlorophyll-a Concentration in Wuliangsu Lake Based on OGolden-DBO-XGBoost

School of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010011, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4798; https://doi.org/10.3390/app14114798
Submission received: 22 April 2024 / Revised: 29 May 2024 / Accepted: 29 May 2024 / Published: 1 June 2024
(This article belongs to the Section Environmental Sciences)

Abstract

Chlorophyll-a (Chl-a) concentration is an important indicator for assessing the ecological health of water bodies. In this paper, Wuliangsu Lake is taken as the study area, and Sentinel-2 remote-sensing images are combined with measured Chl-a concentration data. The XGBoost model is optimized with a dung beetle optimization algorithm improved by a hybrid strategy (OGolden-DBO), yielding an OGolden-DBO-XGBoost Chl-a concentration inversion model. On the training and test sets, the OGolden-DBO-XGBoost model achieved coefficients of determination (R2) of 0.8936 and 0.8850, root mean squared errors (RMSEs) of 3.1353 and 2.9659 μg/L, and mean absolute errors (MAEs) of 1.8918 and 2.4282 μg/L, respectively. The model performed well and provides strong support for monitoring Chl-a concentration in Wuliangsu Lake.

1. Introduction

Lake wetlands are ecosystems with multiple ecological functions and are crucial to the sustainable development of both people and the environment. In recent years, global warming, human activities, and irrational exploitation of natural resources [1] have caused a series of problems in China's lakes and wetlands, such as shrinking areas, fragile aquatic ecosystems, and deteriorating water quality. Chl-a is an important carrier of photosynthesis in algae, and its concentration is an important indicator of algal abundance and eutrophication in water bodies [2,3]; therefore, the inversion of Chl-a concentration is of great significance for monitoring water quality and protecting lake wetlands.
With the rapid development of satellite remote-sensing technology, satellite remote-sensing data have been widely used in water quality monitoring because of their high accuracy, multiple spectral bands, and integration of imagery with spatial mapping [4]. This has greatly improved the estimation accuracy of water quality parameters and contributes to the ecological protection and sustainable development of water bodies [5]. For example, Liu et al. [6] used the red-peak envelope area (PEA) method to invert Chl-a concentration in the Pearl River from Sentinel-3 satellite data; compared with algorithms such as the reflectance peak algorithm and the fluorescence baseline height method, the PEA method performed best, with an R2 of 0.74 and an RMSE of 0.12 mg/m3. Feng Tianshi et al. [7] used hyperspectral imagery from the Zhuhai-1 satellite constellation to build a three-band model for inverting the Chl-a concentration of Chaohu Lake. They found that the three-band model $[R_{rs}(700\,\mathrm{nm})^{-1} - R_{rs}(670\,\mathrm{nm})^{-1}] \times R_{rs}(746\,\mathrm{nm})$, constructed from bands 14, 16, and 19 of a Zhuhai-1 OHS-2A image, achieved high accuracy, with a relative error of 19.97% and an RMSE of 10.85 mg/m3. However, traditional Chl-a inversion algorithms depend strongly on the quality of spectral data, are easily affected by the concentrations of other substances in the water, and perform poorly in optically complex waters. With the development of artificial intelligence, algorithms such as neural networks, support vector machines, and random forests have been widely used to solve nonlinear problems in water environments [8]. Many researchers have studied the inversion of Chl-a concentration in marine or inland waters by combining remote-sensing images with AI-based algorithms. For example, Xu Pengfei et al. [9] constructed a Chl-a inversion model for the clear waters of Qiandao Lake based on Gaofen-1 satellite images and a neural network; compared with other conventional methods, the neural network performed better for clear inland waters with low Chl-a content. Cao et al. [10] used an extreme gradient boosting tree model with Landsat-8 data to estimate the Chl-a concentration of eight lakes in eastern China, proposing a machine-learning algorithm for retrieving Chl-a in turbid lakes from broadband instruments such as the Operational Land Imager (OLI). Tingting Xie et al. [11] used GF-1 WFV data to build multiple regression, BP neural network, and random forest models of Chl-a concentration in the lower reaches of the Minjiang River and found that the random forest model had the highest inversion accuracy. Chen et al. [12] combined a genetic algorithm with an artificial neural network to invert Chl-a concentration and compared it with the three-band model; the machine-learning model generally outperformed the three-band model on three different remote-sensing datasets.
In summary, artificial intelligence algorithms have shown great potential for inverting Chl-a concentration in water bodies, but they still have limitations, such as demanding parameter tuning, a tendency to overfit, and high computational cost. To address these problems, this paper proposes an improved OGolden-DBO-XGBoost model. The model improves the dung beetle optimization algorithm (DBO) by introducing opposition-based learning (Obl) and the golden sine algorithm (Gold-SA), which enhance the global search capability and convergence speed of DBO so that it can find better XGBoost parameters. This helps to mitigate overfitting while also improving computational efficiency. Wuliangsu Lake is a river-trace lake formed by the diversion of the Yellow River [13]. It is the largest lake wetland in the Yellow River Basin and has important ecological, economic, and scientific value [14]. In earlier years, human activities and climate change intensified eutrophication and degraded water quality in Wuliangsu Lake [15,16], seriously affecting the economy and ecological environment of the surrounding areas. In recent years, as an important part of the Inner Mongolia Autonomous Region's "One Lake and Two Seas", Wuliangsu Lake has received close attention for environmental protection [17], and its water quality and ecological environment have improved; however, continued water quality monitoring is still needed to ensure that its ecological environment keeps improving. Therefore, based on measured Chl-a concentration data and Sentinel-2 remote-sensing images, this paper builds a Chl-a concentration inversion model for Wuliangsu Lake and uses OGolden-DBO to optimize the XGBoost parameters. The main contributions of this paper are as follows:
  • Because the nutrient status of Wuliangsu Lake varies seasonally, this paper investigates the influence of monthly features on the inversion of Chl-a concentration in Wuliangsu Lake.
  • Obl is introduced into the population initialization of the dung beetle optimization algorithm, which increases the diversity of the initial population and improves the global search ability of the algorithm.
  • Gold-SA is incorporated into the dancing strategy of DBO, which promotes information exchange between individuals and the best individual and improves the local search ability of the algorithm.

2. Materials and Methods

2.1. Study Area

Wuliangsu Lake (40°47′–41°03′ N, 108°43′–108°57′ E) is a large multifunctional lake located in Ulatqian Banner, Bayannur City, Inner Mongolia [18], and is the largest wetland at its latitude worldwide [19]. It has a total area of about 325.31 km², an average water depth of 2.16 m, and a water storage capacity of about 2.5–3 × 10⁸ m³ [20]. Wuliangsu Lake is also an important habitat for birds and fish: it is home to more than 20 species of fish and nearly 200 species of birds, including 12 species under national first- and second-class protection and 48 species of migratory birds protected under the China–Japan Migratory Bird Agreement. Figure 1 shows the geographic location of Wuliangsu Lake and the sampling sites.

2.2. Data Sets

2.2.1. Measured Chl-a Concentration Data

The data used in this paper were provided by the national research team on "River and Lake Wetland Water Environment Protection and Restoration" at Inner Mongolia Agricultural University, which has long conducted scientific research on Wuliangsu Lake. The data were collected at 19 sampling sites in Wuliangsu Lake. Every year, the lake begins to freeze between the end of October and the first half of November and thaws in March of the following year, so the freezing period lasts about five months [21,22]. Sampling was therefore concentrated from June to September, at fixed times in the middle and at the end of each month. Samples were collected by boat, from the water surface down to a depth of 0.5 m. From 2015 to 2018, 92 Chl-a concentration measurements were obtained. The internationally recognized Chl-a concentration grading scale in Table 1 was used to assess the degree of eutrophication at the sampling sites, and the results are shown in Table 2. All samples were divided into a training set and a test set at a ratio of 8:2: 73 measurements were used to train the model, and the remaining 19 were used to test and evaluate its performance.
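The 8:2 split can be illustrated with a short Python sketch. The text does not state whether the samples were shuffled at random, stratified by month, or split by date, so the seeded random shuffle below (the function `split_80_20` and the seed are hypothetical) is only one possible reading. Rounding the test size up reproduces the reported 73/19 division of the 92 measurements.

```python
import math
import random

def split_80_20(samples, test_ratio=0.2, seed=42):
    """Randomly split samples into (train, test) lists.

    Assumption: a seeded random shuffle; the paper does not describe how
    its 8:2 split was drawn. The test size is rounded up.
    """
    n_test = math.ceil(len(samples) * test_ratio)
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

# 92 placeholder IDs standing in for the measured Chl-a records
sample_ids = list(range(92))
train, test = split_80_20(sample_ids)
print(len(train), len(test))  # 73 19
```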

2.2.2. Remote-Sensing Data

The Sentinel-2A and Sentinel-2B satellites of the European Space Agency's Sentinel series are mainly used to monitor agriculture, forestry, and pollution in lakes and coastal waters, among other applications [23]. Each Sentinel-2 satellite carries the same multispectral imager, which covers the visible, near-infrared, and short-wave infrared wavelengths; Table 3 lists the available bands. The Sentinel-2 satellites systematically image land and coastal waters between latitudes 56° S and 84° N. Together, the two satellites provide a revisit period of five days; this high revisit frequency improves the accuracy and efficiency of experimental data acquisition [24].
Images were selected that were synchronous or quasi-synchronous with the sampling dates of the measured Chl-a concentrations and had less than 20% cloud cover over the study area. The images were downloaded from the ESA Copernicus Data Centre (https://scihub.copernicus.eu/dhus/#/home, accessed on 30 July 2021).

2.3. Data Preprocessing

2.3.1. Remote-Sensing Image Data Preprocessing

The remote-sensing images downloaded for this paper are Level-1C products. They were atmospherically corrected with Sen2Cor (V2.9), the Sentinel-2 processing software provided by ESA, to produce Level-2A images. Sen2Cor does not process band 10 because it is a cirrus band, so the output contains the remaining 12 bands. The Level-2A images were then resampled using nearest neighbor interpolation, and the modified normalized difference water index (MNDWI) was used to extract the water area, yielding the final experimental data. The remote-sensing image preprocessing workflow is shown in Figure 2.
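The resampling and water-extraction steps can be sketched as follows. MNDWI is computed as (Green − SWIR)/(Green + SWIR); the choice of Sentinel-2 band 3 (green, 10 m) and band 11 (SWIR, 20 m), the 10 m target resolution, and the threshold of 0 are assumptions, since the text does not specify them. The helper names are hypothetical, and plain nested lists stand in for raster arrays.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling: each pixel becomes a factor x factor block."""
    out = []
    for row in grid:
        expanded = [value for value in row for _ in range(factor)]
        out.extend([list(expanded) for _ in range(factor)])
    return out

def mndwi_mask(green, swir, threshold=0.0):
    """Return (MNDWI grid, water mask) with MNDWI = (Green - SWIR) / (Green + SWIR)."""
    index, mask = [], []
    for g_row, s_row in zip(green, swir):
        i_row = [(g - s) / (g + s) if (g + s) != 0 else 0.0 for g, s in zip(g_row, s_row)]
        index.append(i_row)
        mask.append([v > threshold for v in i_row])   # True = water pixel
    return index, mask

# Placeholder reflectances: band 3 at 10 m and band 11 at 20 m
green = [[0.06, 0.05, 0.09, 0.10],
         [0.07, 0.06, 0.08, 0.11]]
swir_10m = upsample_nearest([[0.01, 0.25]], 2)   # 20 m -> 10 m grid
index, mask = mndwi_mask(green, swir_10m)
print(mask)  # [[True, True, False, False], [True, True, False, False]]
```

Water reflects more green than short-wave infrared light, so positive MNDWI values mark water; in practice the threshold is often tuned per scene.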

2.3.2. Monthly Feature

Studies of water quality in Wuliangsu Lake by Chinese researchers have shown that its degree of eutrophication has clear seasonal characteristics [25,26], so the sampling month was also used as a feature. The month of each sample (June, July, August, or September) is encoded with binary values: a value of 1 indicates that the sample was collected in that month, and 0 indicates that it was not. The final feature structure of the data is shown in Table 4.
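The month encoding amounts to four binary indicator columns. In this sketch, the column order (band values first, then the June to September flags), the function names, and the band values and date in the usage line are hypothetical placeholders, since Table 4 is not reproduced here.

```python
from datetime import date

# Month flags in the order given in the text: June, July, August, September
SAMPLING_MONTHS = (6, 7, 8, 9)

def month_features(sample_date):
    """Encode the sampling month as four 0/1 flags (June..September)."""
    if sample_date.month not in SAMPLING_MONTHS:
        raise ValueError("samples are expected to fall between June and September")
    return [1 if sample_date.month == m else 0 for m in SAMPLING_MONTHS]

def build_feature_row(band_reflectances, sample_date):
    """Concatenate band reflectances with the month flags (hypothetical layout)."""
    return list(band_reflectances) + month_features(sample_date)

row = build_feature_row([0.031, 0.045, 0.052], date(2017, 8, 15))
print(row)  # [0.031, 0.045, 0.052, 0, 0, 1, 0]
```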

3. Methods

Extreme gradient boosting (XGBoost) was first proposed by Chen et al. [27] in 2016 as an enhancement of the gradient boosting decision tree (GBDT) [28]. Compared with GBDT, XGBoost learns considerably faster and more efficiently. In this paper, an XGBoost model is established to invert the Chl-a concentration in Wuliangsu Lake. However, XGBoost has many parameters, and tuning them is complicated. OGolden-DBO is therefore proposed to optimize the XGBoost parameters and obtain an effective Chl-a concentration inversion model. The XGBoost parameters tuned in this paper are shown in Table 5.
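Because OGolden-DBO searches a continuous space, each dung beetle position has to be decoded into XGBoost hyperparameters before a model can be trained. The sketch below shows one way to do this. The parameter names are real `xgboost.XGBRegressor` keyword arguments, but the selection and the bounds are hypothetical placeholders for Table 5, and integer parameters are simply rounded after clipping.

```python
# Hypothetical search space (name, lower bound, upper bound, type); the
# parameters actually tuned in the paper are listed in Table 5.
SEARCH_SPACE = [
    ("n_estimators", 50, 500, int),
    ("max_depth", 2, 10, int),
    ("learning_rate", 0.01, 0.3, float),
    ("subsample", 0.5, 1.0, float),
]

def decode_position(position):
    """Map one dung beetle position (a list of floats) to XGBoost keyword arguments."""
    params = {}
    for value, (name, low, high, kind) in zip(position, SEARCH_SPACE):
        clipped = min(max(value, low), high)      # keep the coordinate inside its bounds
        params[name] = int(round(clipped)) if kind is int else clipped
    return params

params = decode_position([123.6, 11.2, 0.05, 0.8])
print(params)
# {'n_estimators': 124, 'max_depth': 10, 'learning_rate': 0.05, 'subsample': 0.8}
```

With xgboost installed, the decoded dictionary could be passed as `xgboost.XGBRegressor(**params)`, and the fitness of the position taken as a validation error of that model.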

3.1. Dung Beetle Optimization Algorithm

The dung beetle optimization (DBO) algorithm [29] is a swarm intelligence algorithm proposed by Xue in 2023. It searches for the optimal solution by simulating the ball-rolling, breeding, foraging, and stealing behaviors of dung beetles to update their positions. Because this work focuses on improving the strategies in DBO, only the position update of the ball-rolling dung beetle is briefly described below.
Dung beetles rolling dung balls need to navigate with the help of celestial cues (sun, light source intensity, and wind direction, etc.) in order to keep the dung balls rolling along a straight line. The position update strategy is shown in Equation (1).
$$X_i^n(t+1) = X_i^n(t) + \alpha \times k \times X_i^n(t-1) + b \times \Delta x \tag{1}$$
$$\Delta x = \left| X_i^n(t) - W^n(t) \right| \tag{2}$$
where $t$ denotes the current iteration number, the superscript $n$ indicates a position in the $n$-dimensional search space, $X_i^n(t)$ denotes the position of the $i$th dung beetle at iteration $t$, $k \in (0, 0.2]$ denotes the deflection coefficient, $b$ is a constant in the range (0, 1), and $\alpha$ is a natural coefficient taking the value 1 or −1. When $\alpha = -1$, the dung beetle deviates from its original direction because of natural factors, while $\alpha = 1$ means that there is no deviation. $W^n(t)$ denotes the global worst position at iteration $t$, and $\Delta x$ simulates the change in light intensity.
When the dung beetle encounters an obstacle and is unable to move forward, it simulates dance behavior by using the tangent function to obtain a new rolling direction. The position update strategy is shown in Equation (3).
$$X_i^n(t+1) = X_i^n(t) + \tan\theta \left| X_i^n(t) - X_i^n(t-1) \right| \tag{3}$$
where θ is a random number uniformly distributed in the range [0,π], and the dung beetle position is not updated when θ = 0, π/2, π.
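Equations (1) to (3) translate directly into per-dimension updates. In this minimal Python sketch, positions are plain lists, the defaults k = 0.1 and b = 0.3 are assumed values, and α is set to −1 with a hypothetical probability `lam`, because the text only states that α takes the value 1 or −1.

```python
import math
import random

def roll_update(x_t, x_prev, worst, k=0.1, b=0.3, lam=0.1, rng=random):
    """Ball-rolling update of Equations (1)-(2), applied per dimension."""
    alpha = 1 if rng.random() > lam else -1   # natural coefficient: -1 means deviation
    return [xi + alpha * k * xp + b * abs(xi - w)
            for xi, xp, w in zip(x_t, x_prev, worst)]

def dance_update(x_t, x_prev, rng=random):
    """Obstacle (dancing) update of Equation (3); theta is drawn from [0, pi]."""
    theta = rng.uniform(0.0, math.pi)
    # As in the text, the position is left unchanged at theta = 0, pi/2, pi
    if any(math.isclose(theta, a) for a in (0.0, math.pi / 2, math.pi)):
        return list(x_t)
    return [xi + math.tan(theta) * abs(xi - xp) for xi, xp in zip(x_t, x_prev)]

rng = random.Random(7)
x_prev, x_t, worst = [1.0, 2.0], [1.5, 1.0], [4.0, -3.0]   # hypothetical positions
print(roll_update(x_t, x_prev, worst, rng=rng))
print(dance_update(x_t, x_prev, rng=rng))
```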

3.2. Improved Dung Beetle Optimization Algorithm

DBO initializes its population randomly, which slows convergence, weakens global search, and makes the algorithm prone to falling into local optima. Therefore, this paper introduces Obl to diversify the initial population and improve the global search capability of the algorithm. In addition, when a ball-rolling dung beetle encounters an obstacle, it uses the tangent-based dancing strategy of Equation (3), which has poor global search capability and involves no information exchange between individuals. Introducing Gold-SA achieves a better balance between global search and local exploitation and strengthens communication between individuals and the best individual.

3.2.1. Opposition-Based-Learning Strategy to Improve Population Initialization

Opposition-based learning (Obl) [30] was proposed by Tizhoosh in 2005 and has been successfully applied in population-based optimization algorithms such as the genetic algorithm (GA), ant colony optimization (ACO), and biogeography-based optimization (BBO). For a current candidate solution, Obl computes its inverse solution, evaluates both, and keeps the better one as the final solution. The inverse solution is defined as follows.
Suppose the candidate solution $P = (X_1, X_2, \ldots, X_n)$ is a point in $n$-dimensional space. Its inverse solution is $\bar{P} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n)$, where each $\bar{X}_i$ is calculated as in Equation (4):
$$\bar{X}_i = \mu \left( Ub_i + Lb_i \right) - X_i, \quad i = 1, 2, \ldots, n \tag{4}$$
where $\mu$ is a random number uniformly distributed in (0, 1), and $Ub_i$ and $Lb_i$ are the upper and lower bounds of $X_i$. After the initial population is generated, the inverse solution of each individual is computed with Equation (4). Let $Fit(\cdot)$ denote the fitness function of the minimization problem: if $Fit(P_i) > Fit(\bar{P}_i)$, then $\bar{P}_i$ is selected as the final solution; otherwise, $P_i$ is selected.
Figure 3 displays the population initialization flowchart for the improved DBO using the Obl strategy. By using Obl to generate the reverse population, the starting population’s search range is expanded, the algorithm’s global search capability is strengthened, and the initial population’s fitness is improved. These improvements hasten the algorithm’s convergence and shorten its computation time.
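The Obl initialization of Equation (4) can be sketched as below. A sphere function stands in for the real fitness (for example, a validation error of the XGBoost model), and inverse points are clipped to the bounds because μ(Ub_i + Lb_i) − X_i can leave the search range; both choices, and the name `obl_init`, are assumptions rather than details from the text.

```python
import random

def sphere(x):
    """Toy minimization objective standing in for the model's fitness function."""
    return sum(v * v for v in x)

def obl_init(pop_size, lb, ub, fitness, rng=random):
    """Opposition-based initialization following Equation (4).

    Each random individual P_i is compared with its inverse
    mu * (ub + lb) - P_i, and the fitter (lower-fitness) one is kept.
    """
    population = []
    for _ in range(pop_size):
        p = [rng.uniform(l, u) for l, u in zip(lb, ub)]
        mu = rng.random()
        # Inverse point, clipped to the bounds (an assumption, not stated in the text)
        p_bar = [min(max(mu * (u + l) - x, l), u) for x, l, u in zip(p, lb, ub)]
        population.append(p_bar if fitness(p) > fitness(p_bar) else p)
    return population

pop = obl_init(30, [-5.0, -5.0], [5.0, 5.0], sphere, rng=random.Random(1))
print(len(pop))  # 30
```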

3.2.2. Golden Sine Algorithm Update Location

Tanyildizi et al. [31] proposed the golden sine algorithm (Gold-SA) in 2017, a meta-heuristic that uses the sine function for iterative optimization search. Because of the relationship between the sine function and the unit circle, traversing all values of the sine function visits every point on the unit circle. In addition, the golden section number is introduced to narrow the solution space as positions are updated, so that only the regions likely to yield good solutions are scanned. This significantly speeds up the search and improves the balance between global and local search. The solution update of Gold-SA is given by Equations (5) and (6):
$$X_i^n(t+1) = X_i^n(t)\left|\sin r_1\right| + r_2 \sin r_1 \, S \tag{5}$$
$$S = \left| x_1 B^n(t) - x_2 X_i^n(t) \right| \tag{6}$$
where $r_1$ is a random number uniformly distributed in [0, 2π], which determines the distance the individual moves in the next iteration, and $r_2$ is a random number uniformly distributed in [0, 2π], which determines the direction of the individual's position update in the next iteration. $B^n(t)$ is the global best position at iteration $t$, and $x_1$ and $x_2$ are coefficients obtained from the golden section number. These coefficients narrow the search space so that individuals gradually converge toward the optimum, which ensures the convergence of the algorithm; they are calculated with Equations (7)–(9):
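$$\tau = \frac{\sqrt{5} - 1}{2} \tag{7}$$
$$x_1 = -\pi \tau + \pi \left(1 - \tau\right) \tag{8}$$
$$x_2 = -\pi \left(1 - \tau\right) + \pi \tau \tag{9}$$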
The Gold-SA is introduced to improve the ability of global search and local exploitation of the DBO algorithm. In the rolling dung beetle behavioral update strategy, a random number r between 0 and 1, and a constant rate between 0 and 1, indicating the probability of encountering an obstacle, are generated. When r > rate, the dung beetle encounters an obstacle and the dung beetle’s position is updated using Gold-SA (Equations (5) and (6)), and when r < rate, the update is performed according to the original strategy (Equations (1) and (2)). The improved rolling behavior update strategy is shown in Figure 4.
The improved dung beetle rolling behavior position update strategy is shown in Equation (10):
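$$X_i^n(t+1) = \begin{cases} X_i^n(t) + \alpha \times k \times X_i^n(t-1) + b \times \Delta x, & r \le \mathit{rate} \\ X_i^n(t)\left|\sin r_1\right| + r_2 \sin r_1 \left| x_1 B^n(t) - x_2 X_i^n(t) \right|, & r > \mathit{rate} \end{cases} \tag{10}$$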
where $r$ is a random number uniformly distributed in (0, 1) and $\mathit{rate}$ is a fixed value in (0, 1). When $r \le \mathit{rate}$, the dung beetle rolls toward its goal; when $r > \mathit{rate}$, it encounters an obstacle and its position is updated iteratively by Gold-SA.
By introducing the golden sine algorithm, the improved DBO replaces the position update of the dancing behavior, so that each dung beetle exchanges information with the current best individual $B^n(t)$, which promotes information sharing within the population.
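The golden sine update of Equations (5) to (9) and the switching rule of Equation (10) can be combined into a single rolling step. This is a sketch under stated assumptions: `rate` = 0.8, k = 0.1, b = 0.3, and the α-flip probability `lam` are hypothetical placeholders for the values in Table 6, and r2 is drawn from [0, 2π] as written in the text.

```python
import math
import random

TAU = (math.sqrt(5) - 1) / 2                  # golden section number, Eq. (7)
X1 = -math.pi * TAU + math.pi * (1 - TAU)     # Eq. (8)
X2 = -math.pi * (1 - TAU) + math.pi * TAU     # Eq. (9)

def golden_sine_update(x_t, best, rng=random):
    """Gold-SA position update, Equations (5)-(6), applied per dimension."""
    r1 = rng.uniform(0.0, 2 * math.pi)        # controls the moving distance
    r2 = rng.uniform(0.0, 2 * math.pi)        # controls the update direction
    return [xi * abs(math.sin(r1)) + r2 * math.sin(r1) * abs(X1 * bi - X2 * xi)
            for xi, bi in zip(x_t, best)]

def improved_roll(x_t, x_prev, worst, best, rate=0.8, k=0.1, b=0.3, lam=0.1, rng=random):
    """Improved rolling behavior, Equation (10)."""
    if rng.random() <= rate:                  # r <= rate: goal-directed rolling, Eqs. (1)-(2)
        alpha = 1 if rng.random() > lam else -1
        return [xi + alpha * k * xp + b * abs(xi - w)
                for xi, xp, w in zip(x_t, x_prev, worst)]
    return golden_sine_update(x_t, best, rng)  # r > rate: obstacle, use Gold-SA

# Hypothetical 2-D example
new_x = improved_roll([1.0, -2.0], [0.5, -1.5], [3.0, 4.0], [0.0, 0.0], rng=random.Random(3))
print(new_x)
```

Note that with these definitions x1 = −x2 ≈ −0.742, so the Gold-SA term pulls each coordinate relative to a scaled copy of the best position.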

3.2.3. OGolden-DBO-XGBoost Chl-a Concentration Inversion Model

Figure 5 shows the flowchart of the OGolden-DBO-XGBoost Chl-a concentration inversion model. Using OGolden-DBO to adjust the model's parameters alleviates the difficulty of tuning the many parameters of XGBoost.
The specific steps of the OGolden-DBO-XGBoost Chl-a concentration inversion model are as follows:
(1) Input the remote-sensing image data of Wuliangsu Lake and the measured Chl-a concentration data.
(2) Build the XGBoost inversion model.
(3) Initialize the dung beetle population and the algorithm parameters.
(4) Following Figure 3, compute the inverse population of the dung beetle population and select the better individuals from the population and its inverse to form a new population.
(5) Update the positions of the dung beetle population using the improved rolling behavior (Equation (10)) and the breeding, foraging, and stealing behaviors, and then calculate the fitness of the population.
(6) Update the position and fitness of the best dung beetle.
(7) Check whether the stopping condition is satisfied; if so, go to step (8); otherwise, return to step (5).
(8) Output the position of the best dung beetle (the best parameters of the XGBoost Chl-a concentration inversion model).
(9) Train the XGBoost Chl-a concentration inversion model with the best parameters obtained by OGolden-DBO.
(10) Obtain the Chl-a concentration inversion results for Wuliangsu Lake and calculate the evaluation metrics R2, RMSE, and MAE to assess model performance.
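Steps (3) to (8) can be illustrated with a compact, self-contained loop. This is a simplified sketch, not the authors' implementation: only the improved rolling behavior of Equation (10) is modelled (the breeding, foraging, and stealing behaviors are omitted), a candidate replaces its parent only if it is fitter, all parameter values are hypothetical, and a toy sphere function stands in for the XGBoost validation error. In the full pipeline, the returned best position would be decoded into XGBoost parameters and used to train the final model in steps (9) and (10).

```python
import math
import random

def sphere(x):
    """Toy objective standing in for the validation error of an XGBoost model."""
    return sum(v * v for v in x)

def clip(x, lb, ub):
    """Keep every coordinate of x inside [lb, ub]."""
    return [min(max(v, l), u) for v, l, u in zip(x, lb, ub)]

def ogolden_dbo(fitness, lb, ub, pop_size=20, iters=50, rate=0.8, k=0.1, b=0.3, seed=0):
    """Simplified OGolden-DBO loop covering steps (3)-(8); rolling behavior only."""
    rng = random.Random(seed)
    tau = (math.sqrt(5) - 1) / 2                      # Eq. (7)
    x1 = -math.pi * tau + math.pi * (1 - tau)         # Eq. (8)
    x2 = -math.pi * (1 - tau) + math.pi * tau         # Eq. (9)

    # Steps (3)-(4): random individuals, each replaced by its inverse if that is fitter
    pop = []
    for _ in range(pop_size):
        p = [rng.uniform(l, u) for l, u in zip(lb, ub)]
        mu = rng.random()
        p_bar = clip([mu * (u + l) - v for v, l, u in zip(p, lb, ub)], lb, ub)
        pop.append(p_bar if fitness(p) > fitness(p_bar) else p)
    prev = [list(p) for p in pop]                     # X(t-1) for every individual
    fit = [fitness(p) for p in pop]
    best_i = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = list(pop[best_i]), fit[best_i]
    history = [best_fit]

    for _ in range(iters):                            # step (7): fixed iteration budget
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):                     # step (5): improved rolling, Eq. (10)
            x = pop[i]
            if rng.random() <= rate:                  # goal-directed rolling, Eqs. (1)-(2)
                alpha = 1 if rng.random() > 0.1 else -1
                cand = [v + alpha * k * xp + b * abs(v - w)
                        for v, xp, w in zip(x, prev[i], worst)]
            else:                                     # obstacle: Gold-SA, Eqs. (5)-(6)
                r1 = rng.uniform(0.0, 2 * math.pi)
                r2 = rng.uniform(0.0, 2 * math.pi)
                cand = [v * abs(math.sin(r1)) + r2 * math.sin(r1) * abs(x1 * bv - x2 * v)
                        for v, bv in zip(x, best)]
            cand = clip(cand, lb, ub)
            cand_fit = fitness(cand)
            if cand_fit < fit[i]:                     # greedy acceptance (a simplification)
                prev[i], pop[i], fit[i] = x, cand, cand_fit
            if cand_fit < best_fit:                   # step (6): update the best beetle
                best, best_fit = list(cand), cand_fit
        history.append(best_fit)
    return best, best_fit, history                    # step (8): best position found

best, best_fit, history = ogolden_dbo(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0])
print(best_fit <= history[0])  # True
```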

3.3. Model Evaluation Metrics

This paper uses the mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R2) to evaluate the accuracy of the model. MAE and RMSE both measure the difference between the predicted and measured values. MAE is relatively insensitive to outliers, whereas RMSE, because it squares the errors, is more sensitive to large errors. Smaller MAE and RMSE values indicate smaller errors and better inversion performance. R2 directly reflects how well the model fits the data; it usually lies between 0 and 1, and a value closer to 1 indicates better inversion performance. MAE, RMSE, and R2 are calculated with Equations (11)–(13):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{11}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{12}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(\bar{y} - y_i\right)^2} \tag{13}$$
where $n$ denotes the number of samples, $y_i$ the measured Chl-a concentration, $\hat{y}_i$ the predicted Chl-a concentration, and $\bar{y}$ the mean of the measured Chl-a concentrations.
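Equations (11) to (13) can be implemented in a few lines of plain Python. The measured and predicted values in the usage line are made-up numbers chosen so that the results are easy to check by hand; they are not data from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Equation (11)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, Equation (12)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (13)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((mean - y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

y = [2.0, 4.0, 6.0, 8.0]   # hypothetical measured Chl-a (μg/L)
p = [3.0, 4.0, 5.0, 8.0]   # hypothetical predictions
print(mae(y, p), round(rmse(y, p), 4), round(r2(y, p), 4))  # 0.5 0.7071 0.9
```

R2 is not bounded below by 0: on a test set, a model that predicts worse than the mean of the measurements yields a negative value.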

4. Results and Discussion

4.1. Model Parameter Setting

Parameters have a crucial impact on the effectiveness and stability of the model. The parameter settings for the DBO algorithm and the OGolden-DBO algorithm are shown in Table 6.

4.2. Analysis of the Effect of Monthly Feature

To verify the effect of month features on the inversion of Chl-a concentration in Wuliangsu Lake, this paper builds XGBoost, support vector regression (SVR), and linear regression models with and without month features for comparison; the results are summarized in Table 7. With month features, XGBoost and SVR achieve R2 values above 0.7 on both the training and test sets, RMSEs below 5 μg/L, and MAEs below 4 μg/L. Compared with the models without month features, XGBoost, SVR, and linear regression with month features increase R2 by roughly 18–80%, decrease RMSE by 13–34%, and decrease MAE by roughly 3–37%. These results show that month features substantially improve the inversion accuracy of all three models, especially XGBoost, which outperforms SVR and linear regression on both the training and test sets, with R2 values of 0.8183 and 0.7711, RMSEs of 4.0982 and 4.1845 μg/L, and MAEs of 2.5775 and 3.3462 μg/L, respectively. Figure 6 shows how well the three models fit the measured Chl-a concentrations in Wuliangsu Lake with and without month features; the straight line is the 1:1 line, which represents the ideal case in which the inverted value exactly equals the measured value. The inverted Chl-a concentrations of the M-XGBoost model (XGBoost with month features) differ least from the measured values and lie closest to the 1:1 line, indicating the best inversion performance. Based on this finding, the month feature was included in all subsequent inversion experiments to improve model accuracy.

4.3. Intelligent Optimization Algorithm to Optimize XGBoost Parameters

Because XGBoost has many parameters, this paper uses intelligent optimization algorithms to tune them, including the sparrow search algorithm [32], the whale optimization algorithm [33], and the dung beetle optimization algorithm. The accuracy of each model for inverting Chl-a concentration in Wuliangsu Lake with month features was analyzed, and the results are shown in Table 8. After optimization, the XGBoost models achieve R2 values above 0.83 on both the training and test sets, indicating good fitting and generalization abilities. In addition, all RMSEs are below 4.0 μg/L and all MAEs below 2.8 μg/L, so the deviation between the inverted and measured values is small. Among them, the DBO-XGBoost model achieves R2 values of 0.8578 and 0.8490, RMSEs of 3.6259 and 3.3987 μg/L, and MAEs of 2.1346 and 2.7216 μg/L on the training and test sets, respectively. Figure 7 shows the degree of fit of the four models. The Chl-a concentrations predicted by the DBO-XGBoost model lie closest to the 1:1 line and vary least from the measured values, suggesting that it has the best inversion accuracy. In summary, for Chl-a concentration inversion in Wuliangsu Lake, tuning the XGBoost parameters with DBO can effectively improve the inversion accuracy of the model and reduce the inversion error.

4.4. Comparison and Analysis of Improvement Strategies

In order to verify the performance of OGolden-DBO-XGBoost in the inversion of Chl-a concentration with the introduction of the month feature, this paper compares the performance of DBO-XGBoost, Obl-DBO-XGBoost, Golden-DBO-XGBoost, and OGolden-DBO-XGBoost.
As Table 9 shows, the XGBoost models improved by the different strategies perform well on both the training and test sets. On the test set, the Obl-DBO-XGBoost model achieves an R2 of 0.8618, with the RMSE decreasing to 3.2515 µg/L and the MAE to 2.6637 µg/L, and the Golden-DBO-XGBoost model achieves an R2 of 0.8602, an RMSE of 3.2708 µg/L, and an MAE of 2.5824 µg/L. This indicates that either single improvement strategy (Obl or Gold-SA) improves the model's fitting and generalization abilities. The OGolden-DBO-XGBoost model, which combines both strategies, has the best inversion accuracy, with R2 values of 0.8936 and 0.8850, RMSEs of 3.1353 and 2.9659 µg/L, and MAEs of 1.8918 and 2.4282 µg/L on the training and test sets, respectively, and it outperforms all the other models compared. In addition, Figure 8 shows that the Chl-a concentrations inverted by the OGolden-DBO-XGBoost model differ least from the measured values and lie closest to the 1:1 line, indicating the best inversion performance.
In summary, the OGolden-DBO-XGBoost model, improved by these two strategies, performed well in inverting Chl-a concentration in Wuliangsu Lake. It outperformed the other models in terms of R2, RMSE, and MAE on both the training and test sets, with smaller errors between the predicted and measured values, demonstrating its superior ability to invert Chl-a concentration in Wuliangsu Lake and confirming the effectiveness of the improved DBO.
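To put the reported test-set differences in perspective, the relative change of each metric can be computed directly from the values given in the text for DBO-XGBoost and the three improved variants. These are relative differences on the reported numbers only and do not constitute a test of statistical significance.

```python
# Test-set metrics as reported in the text (Tables 8 and 9)
reported = {
    "DBO-XGBoost": {"R2": 0.8490, "RMSE": 3.3987, "MAE": 2.7216},
    "Obl-DBO-XGBoost": {"R2": 0.8618, "RMSE": 3.2515, "MAE": 2.6637},
    "Golden-DBO-XGBoost": {"R2": 0.8602, "RMSE": 3.2708, "MAE": 2.5824},
    "OGolden-DBO-XGBoost": {"R2": 0.8850, "RMSE": 2.9659, "MAE": 2.4282},
}

def relative_change(baseline, model):
    """Percentage change of each metric relative to the baseline model."""
    return {m: 100 * (reported[model][m] - reported[baseline][m]) / reported[baseline][m]
            for m in ("R2", "RMSE", "MAE")}

change = relative_change("DBO-XGBoost", "OGolden-DBO-XGBoost")
print({m: round(v, 1) for m, v in change.items()})
# {'R2': 4.2, 'RMSE': -12.7, 'MAE': -10.8}
```

In other words, relative to DBO-XGBoost, the combined strategy raises test-set R2 by about 4% and lowers RMSE and MAE by about 13% and 11%.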

4.5. Spatial and Temporal Distribution of Chl-a Concentration in the Study Area

The above experiments show that the proposed OGolden-DBO-XGBoost model performs best for Chl-a concentration inversion in Wuliangsu Lake. This model was used to obtain Chl-a concentration inversion results for Wuliangsu Lake, and Chl-a concentration distribution maps were plotted, as shown in Figure 9. The figure shows clear temporal variation in the Chl-a concentration of Wuliangsu Lake. In June and July, the relatively low water temperature in this northern region is unfavorable for the growth of algae and other plankton, so the overall Chl-a concentration in June 2016 and in June and July 2017 was low, mainly in the range of 1–11 µg/L. In August and September, Wuliangsu Lake receives farmland drainage from the river-loop irrigation area, which discharges large amounts of phosphorus and other nutrients into the lake; these nutrients support algal growth and lead to a sharp increase in Chl-a concentration. As Figure 9 shows, the Chl-a concentration was higher in September 2015 and in August and September 2017, generally exceeding 10 µg/L and peaking in August, with some areas exceeding 30 µg/L. Spatially, the Chl-a concentration in the northern part of Wuliangsu Lake was generally higher than that in the southern part, and in the August and September 2017 distribution maps the central region had the highest Chl-a concentration. This is because the ditches discharging agricultural irrigation wastewater are located in the central part of Wuliangsu Lake, a large number of reeds are distributed in the northern region [34], and pollutants accumulate there, resulting in higher Chl-a concentrations and more severe eutrophication in the north than in the south.
These findings are highly consistent with those of other scholars [35,36].

5. Conclusions

In this paper, Sentinel-2 remote-sensing image data were used to invert the Chl-a concentration in Wuliangsu Lake, with month information included as an input feature, and the XGBoost model was optimized with OGolden-DBO. The results show that, compared with the other optimization algorithms tested, the OGolden-DBO-optimized XGBoost model has the best inversion performance and can accurately invert the Chl-a concentration in Wuliangsu Lake.
A comprehensive analysis of the results indicates that, under the current system of agricultural development, farming is decentralized and farmers are gradually replacing farmyard manure with chemical fertilizers. This produces diverse agricultural non-point-source pollution with irregular, randomly distributed emission points. Such pollution causes the Chl-a concentration in Wuliangsu Lake to rise sharply in August and September, intensifying eutrophication. Climate change also affects water quality. The OGolden-DBO-XGBoost model can help address the water quality issues facing Wuliangsu Lake: it can provide timely and accurate monitoring and early warning of water quality changes, support water quality management and the public release of water quality testing data, help raise public awareness of the need to protect Wuliangsu Lake, and provide scientific data to support environmental agencies and the government in developing effective agricultural and water resource policies. In summary, the OGolden-DBO-XGBoost model can accurately invert the Chl-a concentration and is a useful tool for maintaining the ecological balance of the lake and promoting its sustainable development.
However, the amount of training data was limited by weather conditions, satellite revisit frequency, and other factors. In future work, more data should be collected to further improve the accuracy and reliability of the model. In addition, possible correlations among the spectral bands increase the complexity of the model, so future studies will select specific bands to reduce model complexity and predict the Chl-a concentration in Wuliangsu Lake more accurately.
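To illustrate the band-correlation issue raised above, the following Python sketch screens band pairs by their Pearson correlation coefficient. It is a minimal example under stated assumptions, not the procedure used in this study: the band names, the sample values, and the 0.95 threshold are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant band")
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(
    bands: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return (band, band, r) for every pair whose |r| reaches the threshold."""
    pairs = []
    for (name_a, values_a), (name_b, values_b) in combinations(bands.items(), 2):
        r = pearson(values_a, values_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical reflectance samples for three bands at four sampling points.
samples = {
    "B02": [1.0, 2.0, 3.0, 4.0],
    "B03": [2.0, 4.0, 6.0, 8.1],
    "B11": [4.0, 1.0, 3.0, 2.0],
}
print(correlated_pairs(samples))
```

One band from each flagged pair could then be dropped before retraining; which band to keep is a modelling choice that this sketch does not address.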

Author Contributions

Conceptualization, H.Z.; methodology, H.Z.; software, H.Z.; validation, H.Z.; formal analysis, H.Z.; data curation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, X.F. and H.L.; visualization, H.Z.; supervision, X.F. and H.L.; project administration, X.F.; funding acquisition, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62041211, 61962047), the Inner Mongolia Autonomous Region Science and Technology Major Special Project (2021ZD0004), the Science and Technology Programme of Inner Mongolia Autonomous Region (2022YFHH0070), the Program for Innovative Research Team in Universities of Inner Mongolia Autonomous Region (NMGIRT2313), the Basic Research Operation Funds for Universities under Inner Mongolia Autonomous Region (BR22-14-05), and the Collaborative Innovation Project of Universities and Institutes in Hohhot (XTCX2023-20, XTCX2023-24).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hou, X.; Feng, L.; Dai, Y.; Hu, C.; Gibson, L.; Tang, J.; Lee, Z.; Wang, Y.; Cai, X.; Liu, J.; et al. Global mapping reveals increase in lacustrine algal blooms over the past decade. Nat. Geosci. 2022, 15, 130–134.
  2. Yan, Y.; Yueyue, W.; Duxian, F.; Jie, R.; Ying, W. Bacterial diversity and influencing factors of surface sediments in Baiyangdian. J. Environ. Eng. 2021, 15, 1121–1130.
  3. Xue, G.; Shuailong, W.; Peirong, S.; Chutian, X.; Dapeng, L.; Yong, H. Phosphorus occurrence characteristics and environmental significance in the grass-algae lake area of Taihu Lake. Environ. Sci. 2019, 40, 5358–5366.
  4. Ma, Y.; Song, K.; Wen, Z.; Liu, G.; Shang, Y.; Lyu, L.; Du, J.; Yang, Q.; Li, S.; Tao, H.; et al. Remote Sensing of Turbidity for Lakes in Northeast China Using Sentinel-2 Images With Machine Learning Algorithms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9132–9146.
  5. Wang, C.; Jiang, W.; Deng, Y.; Ling, Z.; Deng, Y. Long Time Series Water Extent Analysis for SDG 6.6.1 Based on the GEE Platform: A Case Study of Dongting Lake. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 490–503.
  6. Liu, F.; Tang, S. Evaluation of Red-Peak Algorithms for Chlorophyll Measurement in the Pearl River Estuary. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8928–8936.
  7. Tian-shi, F.; Zhi-guo, P.; Wei, J. Remote Sensing Retrieval of Chlorophyll-a Concentration in Lake Chaohu Based on Zhuhai-1 Hyperspectral Satellite. Spectrosc. Spectr. Anal. 2022, 42, 2642–2648.
  8. Zolfaghari, K.; Pahlevan, N.; Binding, C.; Gurlin, D.; Simis, S.G.H.; Verdú, A.R.; Li, L.; Crawford, C.J.; Vanderwoude, A.; Errera, R.; et al. Impact of Spectral Resolution on Quantifying Cyanobacteria in Lakes and Reservoirs: A Machine-Learning Assessment. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–20.
  9. Pengfei, X.; Qian, C.; Pingbin, J. Research on remote sensing retrieval of chlorophyll a in clean water bodies of Qiandao Lake based on neural network model. Yangtze River Basin Resour. Environ. 2021, 30, 1670–1679.
  10. Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, K. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 2020, 248, 111974.
  11. Tingting, X.; Yunzhi, C.; Wenfang, L.; Xiaoqin, W. For GF-1 Research on chlorophyll a inversion model of WFV data in the lower reaches of Minjiang River. J. Environ. Sci. 2019, 39, 4276–4283.
  12. Chen, J.; Chen, S.; Fu, R.; Wang, C.; Li, D.; Peng, Y.; Wang, L.; Jiang, H.; Zheng, Q. Remote Sensing Estimation of Chlorophyll-A in Case-II Waters of Coastal Areas: Three-Band Model Versus Genetic Algorithm–Artificial Neural Networks Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3640–3658.
  13. Wang, Y.S.; Rong, N.; Hou, J. Characteristics and Influencing Factors Analysis of Water Exchange in the Wuliangsuhai Lake. Water Resour. Power 2021, 39, 39–42, 88.
  14. Liu, H.Y.; Lu, J.P.; Zhao, S.N.; Shi, X.H.; Sun, B.; Zhang, X.J.; Shi, Z.Y.; Mi, J.H. Water environment change trend and ecological water replenishment of Lake Wuliangsuhai and other key driving factors analysis based on long time series (2011–2020). J. Lake Sci. 2023, 35, 1939–1948.
  15. Shi, R.; Zhao, J.; Shi, W.; Song, S.; Wang, C. Comprehensive Assessment of Water Quality and Pollution Source Apportionment in Wuliangsuhai Lake, Inner Mongolia, China. Int. J. Environ. Res. Public Health 2020, 17, 54.
  16. Wei, X.; Wen, Y.; Wang, Z. Water quality analysis and pollution assessment of Wuliangsuhai drainage ditch into the lake. Proc. Yangtze River Acad. Sci. 2023, 40, 63–69.
  17. Yue, C.P.; Li, X.; Bao, L.S.; Wei, J.T. Using Remote Sensing to Estimate Seasonal Variation in Phytoplankton Biomasses in the Lake Wuliangsuhai. J. Irrig. Drain. 2020, 39, 122–128.
  18. Yunfan, L.; Caixia, L.; Xiang, J.; Jing, W.; Xiaoli, Z.; Xiaoli, M.; Ruoning, Z.; Dong, W. Spatiotemporal Changes and Causes of Ecological Vulnerability in Ulansuhai Basin. J. Geo-Inf. Sci. 2023, 25, 2039–2054.
  19. Feifei, L.; Yong, Z.; Xue, L.; Cheng, C. Discussion on Wetland Protection and Restoration Along Wuliangsuhai Basin. For. Resour. Manag. 2019, 5, 23–27, 67.
  20. Song, S.; Li, C.; Shi, X.; Zhao, S.; Tian, W.; Li, Z.; Bai, Y.; Cao, X.; Wang, Q.; Huotari, J.; et al. Under-ice metabolism in a shallow lake in a cold and arid climate. Freshw. Biol. 2019, 64, 1710–1720.
  21. Shi, X.; Yu, H.; Zhao, S.; Sun, B.; Liu, Y.; Huo, J.; Wang, S.; Wang, J.; Wu, Y.; Wang, Y.; et al. Impacts of environmental factors on Chlorophyll-a in lakes in cold and arid regions: A 10-year study of Wuliangsuhai Lake, China. Ecol. Indic. 2023, 148, 110133.
  22. Yu, H.; Shi, X.; Zhao, S.; Sun, B.; Liu, Y.; Arvola, L.; Li, G.; Wang, Y.; Pan, X.; Wu, R.; et al. Primary productivity of phytoplankton and its influencing factors in cold and arid regions: A case study of Wuliangsuhai Lake, China. Ecol. Indic. 2022, 144, 109545.
  23. European Space Agency. Sentinel-2 User Handbook; European Space Agency: Noordwijk, The Netherlands, 2021.
  24. Sun, R.; Wang, J.; Cheng, Q.; Mao, Y.; Ochieng, W.Y. A new IMU-aided multiple GNSS fault detection and exclusion algorithm for integrated navigation in urban environments. GPS Solut. 2021, 25, 147.
  25. Du, D.D.; Li, C.Y.; Shi, X.H.; Zhao, S.; Quan, D.; Yang, Z. Seasonal changes of nutritional status of lake Wuliangsuhai. J. Arid Land Resour. Environ. 2019, 33, 186–192.
  26. Yu, H.; Shi, X.; Wang, S.; Zhao, S.; Sun, B.; Liu, Y.; Yang, Z. Trophic status of a shallow lake in Inner Mongolia: Long-term, seasonal, and spatial variation. Ecol. Indic. 2023, 156, 111167.
  27. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  28. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  29. Xue, J.; Shen, B. Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. J. Supercomput. 2023, 79, 7305–7336.
  30. Tizhoosh, H.R. Opposition-Based Learning: A New Scheme for Machine Intelligence. In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), Vienna, Austria, 28–30 November 2005; pp. 695–701.
  31. Tanyildizi, E.; Demir, G. Golden Sine Algorithm: A Novel Math-Inspired Algorithm. Adv. Electr. Comput. Eng. 2017, 17, 71.
  32. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
  33. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  34. Zhang, Q.; Shi, X.; Zhao, S. Analysis of Pollution Sources and Evaluation of Water Quality Changes of Ulan Suhai Lake during Frozen and Non-frozen Periods from 2016 to 2021. Wetl. Sci. 2022, 20, 829–837.
  35. Hu, H.; Fu, X.; Li, H.; Wang, F.; Duan, W.; Zhang, L.; Liu, M. Prediction of lake chlorophyll concentration using the BP neural network and Sentinel-2 images based on time features. Water Sci. Technol. 2023, 87, 539–554.
  36. Jiang, X.; Li, C.; Shi, X.; Sun, B.; Zhao, S.; Sun, C. Spatial and temporal distribution of chlorophyll-a concentration and its relationships with environmental factors in Lake Ulansuhai. Ecol. Environ. Sci. 2019, 28, 964–973.
Figure 1. Geographic location of Wuliangsu Lake and sampling locations.
Figure 2. Flow chart of remote-sensing image data preprocessing.
Figure 3. Flowchart of improved DBO with the Obl strategy.
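Opposition-based learning [30] evaluates each candidate together with its "opposite" point in the search box. This section does not give the exact OBL formulation used in OGolden-DBO, so the Python sketch below assumes the classic form x_opp = lb + ub − x applied at population initialization, keeping the best pop_size points from the combined set (minimization). The function names and the sphere objective are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def opposite_point(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> Vector:
    """Opposite of x inside the box [lower, upper]: x_opp = lower + upper - x."""
    return [lo + hi - v for v, lo, hi in zip(x, lower, upper)]


def obl_initialize(
    pop_size: int,
    lower: Sequence[float],
    upper: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
    rng: random.Random,
) -> List[Vector]:
    """Draw pop_size random points plus their opposites, then keep the
    pop_size points with the lowest fitness value."""
    candidates: List[Vector] = []
    for _ in range(pop_size):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        candidates.append(x)
        candidates.append(opposite_point(x, lower, upper))
    candidates.sort(key=fitness)
    return candidates[:pop_size]


def sphere(x: Sequence[float]) -> float:
    """Toy objective for illustration: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


population = obl_initialize(5, [-5.0, -5.0], [5.0, 5.0], sphere, random.Random(0))
for individual in population:
    print([round(v, 3) for v in individual], round(sphere(individual), 3))
```

Because the opposite of a point inside the box also lies inside the box, no extra clipping is needed at this step.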
Figure 4. Flowchart of the improved DBO algorithm with Gold-SA.
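The golden sine algorithm (Gold-SA) [31] moves a position using the sine function and two golden-section coefficients. The Python sketch below shows the standard Gold-SA position update, assuming minimization, a known best position, and the usual initial interval a = −π, b = π. The narrowing of [a, b] across iterations and the way OGolden-DBO inserts this step into DBO are not shown, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Golden-section coefficient tau = (sqrt(5) - 1) / 2, roughly 0.618.
TAU = (math.sqrt(5.0) - 1.0) / 2.0


def golden_coefficients(a: float = -math.pi, b: float = math.pi) -> Tuple[float, float]:
    """Golden-section coefficients x1 and x2 for the interval [a, b]."""
    x1 = a * (1.0 - TAU) + b * TAU
    x2 = a * TAU + b * (1.0 - TAU)
    return x1, x2


def golden_sine_update(
    x: Sequence[float],
    best: Sequence[float],
    rng: random.Random,
    a: float = -math.pi,
    b: float = math.pi,
) -> List[float]:
    """One Gold-SA move of position x relative to the best position:

        x_new = x * |sin(r1)| - r2 * sin(r1) * |x1 * best - x2 * x|

    with r1 ~ U[0, 2*pi] and r2 ~ U[0, pi], drawn once per individual here.
    """
    x1, x2 = golden_coefficients(a, b)
    r1 = rng.uniform(0.0, 2.0 * math.pi)
    r2 = rng.uniform(0.0, math.pi)
    return [
        xi * abs(math.sin(r1)) - r2 * math.sin(r1) * abs(x1 * bi - x2 * xi)
        for xi, bi in zip(x, best)
    ]


print(golden_sine_update([2.0, -1.0], [0.5, 0.0], random.Random(1)))
```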
Figure 5. Flowchart of the OGolden-DBO-XGBoost Chl-a concentration inversion model.
Figure 6. Prediction results of the model for (a) M-XGBoost, (b) XGBoost, (c) M-SVR, (d) SVR, (e) M-Linear regression, and (f) Linear regression.
Figure 7. Prediction results of the model for (a) XGBoost, (b) SSA-XGBoost, (c) WOA-XGBoost, and (d) DBO-XGBoost.
Figure 8. Prediction results of the model for (a) DBO-XGBoost, (b) Obl-DBO-XGBoost, (c) Golden-DBO-XGBoost, and (d) OGolden-DBO-XGBoost.
Figure 9. Distribution of Chl-a concentration in Wuliangsu Lake.
Table 1. Chl-a concentration hierarchy.
| Standard Grade | Chl-a (μg/L) | Nutritional Level |
|---|---|---|
| I | <1.6 | Light nutrition |
| II | 1.6–10 | Middle nutrition |
| III | 10.0–26 | Light eutrophication |
| IV | 26.0–64 | Middle eutrophication |
| V | 64.0–160 | Severe eutrophication |
| Shoddy V | >160 | Extreme eutrophication |
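The grading in Table 1 is a simple threshold lookup. The Python sketch below encodes it, assuming that each range includes its lower bound and excludes its upper bound (the table does not say how boundary values are assigned). The check values are the measured concentrations listed in Table 2.

```python
from typing import Tuple

# Upper bounds (μg/L), grades, and nutritional levels transcribed from Table 1.
# Assumption: a value equal to a bound falls into the higher grade.
GRADES = [
    (1.6, "I", "Light nutrition"),
    (10.0, "II", "Middle nutrition"),
    (26.0, "III", "Light eutrophication"),
    (64.0, "IV", "Middle eutrophication"),
    (160.0, "V", "Severe eutrophication"),
]


def classify_chla(chla_ug_per_l: float) -> Tuple[str, str]:
    """Return (grade, nutritional level) for a Chl-a concentration in μg/L."""
    if chla_ug_per_l < 0:
        raise ValueError("Chl-a concentration cannot be negative")
    for upper, grade, level in GRADES:
        if chla_ug_per_l < upper:
            return grade, level
    return "Shoddy V", "Extreme eutrophication"


# Measured concentrations from Table 2.
for value in (13.958, 4.704, 3.214, 30.283, 22.587, 7.4513):
    print(value, classify_chla(value))
```

Applied to the Table 2 samples, this lookup reproduces the nutritional levels reported there.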
Table 2. Selected measured Chl-a concentrations.
| Measured Data | Remote-Sensing Image Data | Sampling Point | Measured Chl-a Concentration (μg/L) | Nutritional Level |
|---|---|---|---|---|
| 2015.09.25 | 2015.09.25 | O10 | 13.958 | Light eutrophication |
| 2016.06.20 | 2016.06.21 | S6 | 4.704 | Middle nutrition |
| 2017.06.25 | 2017.06.26 | R7 | 3.214 | Middle nutrition |
| 2017.08.29 | 2017.08.30 | P9 | 30.283 | Middle eutrophication |
| 2017.09.25 | 2017.09.24 | Q10 | 22.587 | Light eutrophication |
| 2018.07.26 | 2018.07.26 | M14 | 7.4513 | Middle nutrition |
Table 3. Sentinel-2 sensor spectral characteristic information.
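| Number | Sentinel-2A Central Wavelength (nm) | Sentinel-2A Bandwidth (nm) | Sentinel-2B Central Wavelength (nm) | Sentinel-2B Bandwidth (nm) | Spatial Resolution (m) |
|---|---|---|---|---|---|
| Band 1 | 443.9 | 27 | 442.3 | 45 | 60 |
| Band 2 | 496.6 | 98 | 492.1 | 98 | 10 |
| Band 3 | 560.0 | 45 | 559.0 | 46 | 10 |
| Band 4 | 664.5 | 38 | 665.0 | 39 | 10 |
| Band 5 | 703.9 | 19 | 703.8 | 20 | 20 |
| Band 6 | 740.2 | 18 | 739.1 | 18 | 20 |
| Band 7 | 782.5 | 28 | 779.7 | 28 | 20 |
| Band 8 | 835.1 | 145 | 833.0 | 133 | 10 |
| Band 8A | 864.8 | 33 | 864.0 | 32 | 20 |
| Band 9 | 945.0 | 26 | 943.2 | 27 | 60 |
| Band 10 | 1373.5 | 75 | 1376.9 | 76 | 60 |
| Band 11 | 1613.7 | 143 | 1610.4 | 141 | 20 |
| Band 12 | 2202.4 | 242 | 2185.7 | 238 | 20 |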
Table 4. Data feature structure.
NumberJuneJulyAugustSeptemberB01B02B12Measured Chl-a
Concentration (μg/L)
Data10001611524012.018
Data210001761656010.095
Data3001015825419720.012
Data40100120213533.6673
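Table 4 indicates that each sample combines a one-hot encoding of the sampling month (June to September) with the Sentinel-2 band values and the measured Chl-a concentration as the target. The Python sketch below assembles one feature row under that reading. The full band list elided in Table 4 is not reproduced, so the band values passed in are hypothetical.

```python
from typing import List, Sequence

# Months covered by the samples; the one-hot columns follow Table 4's order.
MONTHS = (6, 7, 8, 9)  # June, July, August, September


def month_one_hot(month: int) -> List[int]:
    """Encode a sampling month as a 4-element one-hot vector (June..September)."""
    if month not in MONTHS:
        raise ValueError(f"month {month} is outside the June-September sampling window")
    return [1 if month == m else 0 for m in MONTHS]


def build_feature_row(month: int, band_values: Sequence[float]) -> List[float]:
    """Concatenate the month one-hot columns with the band values (B01 ... B12)."""
    return [float(v) for v in month_one_hot(month)] + [float(v) for v in band_values]


# Hypothetical band values for an August sample, for illustration only.
row = build_feature_row(8, [1021.0, 987.0, 612.0])
print(row)
```

One-hot encoding keeps the months as unordered categories, so the model is not told that September is "larger" than June.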
Table 5. Parameters of XGBoost and meaning.
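| Parameters | Meaning |
|---|---|
| n_estimators | Number of boosting iterations (trees) |
| max_depth | Maximum tree depth |
| learning_rate | Learning rate |
| gamma | Coefficient of the number of leaf nodes (minimum loss reduction required to split a node) |
| reg_alpha | L1 regularization coefficient |
| reg_lambda | L2 regularization coefficient |
| min_child_weight | Minimum sum of sample weights (Hessian) required in a child node |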
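Population-based optimizers such as DBO search over continuous position vectors, so each position must be mapped onto the XGBoost parameters in Table 5 before a model can be trained and scored. The Python sketch below shows one common way to do this by clipping to bounds and rounding integer parameters. The search bounds and the function name are hypothetical, since the bounds used in the article are not given in this section.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical search ranges for the seven XGBoost parameters in Table 5.
SEARCH_SPACE: List[Tuple[str, float, float, bool]] = [
    # (name, lower bound, upper bound, is_integer)
    ("n_estimators", 50, 500, True),
    ("max_depth", 2, 10, True),
    ("learning_rate", 0.01, 0.3, False),
    ("gamma", 0.0, 5.0, False),
    ("reg_alpha", 0.0, 5.0, False),
    ("reg_lambda", 0.0, 5.0, False),
    ("min_child_weight", 1.0, 10.0, False),
]


def decode_position(position: Sequence[float]) -> Dict[str, Union[int, float]]:
    """Map one optimizer position (one value per parameter) to keyword arguments.

    Each value is clipped to its bounds; integer-valued parameters are rounded.
    """
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position length must match the search space")
    params: Dict[str, Union[int, float]] = {}
    for value, (name, low, high, is_int) in zip(position, SEARCH_SPACE):
        clipped = min(max(value, low), high)
        params[name] = int(round(clipped)) if is_int else float(clipped)
    return params


params = decode_position([123.6, 11.2, 0.05, -1.0, 0.5, 1.0, 3.0])
print(params)
```

The keys match the keyword names accepted by `xgboost.XGBRegressor`, so the dictionary could be passed as `XGBRegressor(**params)`. Scoring each position by, for example, validation RMSE is likewise an assumption here.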
Table 6. DBO algorithm and OGolden-DBO algorithm parameter settings.
| Parameter | DBO | OGolden-DBO |
|---|---|---|
| Maximum number of iterations | 10 | 10 |
| Population size | 50 | 50 |
| b | 0.4 | 0.4 |
| k | 0.2 | 0.2 |
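The constants b and k in Table 6 belong to the ball-rolling behaviour of the original dung beetle optimizer [29]. In that behaviour, a beetle's new position depends on its current position, its previous position, and its distance to the global worst position. The Python sketch below covers that single update rule only. The other DBO behaviours (dancing, breeding, foraging, stealing) and the OGolden-DBO modifications are omitted, the uniform draw of alpha is a simplifying assumption, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence

# Table 6 values, shared by DBO and OGolden-DBO.
B = 0.4  # constant b, weighting the distance to the global worst position
K = 0.2  # deflection coefficient k


def rolling_update(
    x_t: Sequence[float],
    x_prev: Sequence[float],
    worst: Sequence[float],
    alpha: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Ball-rolling move of one dung beetle, applied per dimension:

        x(t+1) = x(t) + alpha * k * x(t-1) + b * |x(t) - x_worst|

    alpha in {-1, +1} models deviation from the original direction; when it is
    not supplied, it is drawn uniformly at random (an assumption of this sketch).
    """
    if alpha is None:
        alpha = (rng or random.Random()).choice((-1, 1))
    if alpha not in (-1, 1):
        raise ValueError("alpha must be -1 or +1")
    return [
        xt + alpha * K * xp + B * abs(xt - xw)
        for xt, xp, xw in zip(x_t, x_prev, worst)
    ]


print(rolling_update([1.0, 2.0], [0.5, 1.0], [3.0, 0.0], alpha=1))
print(rolling_update([1.0, 2.0], [0.5, 1.0], [3.0, 0.0], rng=random.Random(42)))
```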
Table 7. Chl-a concentration inversion results from the XGBoost model.
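| Model | Monthly Feature | Training R² | Training RMSE (μg/L) | Training MAE (μg/L) | Test R² | Test RMSE (μg/L) | Test MAE (μg/L) |
|---|---|---|---|---|---|---|---|
| XGBoost | Yes | 0.8183 | 4.0982 | 2.5775 | 0.7711 | 4.1845 | 3.3462 |
| XGBoost | No | 0.5799 | 6.2315 | 4.1351 | 0.4783 | 6.3177 | 4.5693 |
| SVR | Yes | 0.7465 | 4.8409 | 2.6262 | 0.7434 | 4.4309 | 3.6722 |
| SVR | No | 0.6331 | 5.8235 | 3.1678 | 0.4116 | 6.7095 | 4.9062 |
| Linear regression | Yes | 0.6903 | 5.3507 | 3.7161 | 0.6370 | 5.2700 | 4.5299 |
| Linear regression | No | 0.4416 | 7.1845 | 4.9852 | 0.51368 | 6.0999 | 4.7047 |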
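Tables 7–9 report R², RMSE, and MAE. The Python sketch below computes them with their usual definitions (R² = 1 − SS_res/SS_tot, RMSE as the root of the mean squared error, MAE as the mean absolute error). This section does not restate the article's formulas, so treat the exact definitions as an assumption; the example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (R², RMSE, MAE) for measured vs. predicted Chl-a values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical measured and predicted concentrations (μg/L).
print(regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0]))
```

RMSE and MAE are in the same unit as the target (μg/L), while R² is dimensionless, which is why the tables list units only for the error metrics.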
Table 8. Inversion results of the XGBoost model after optimization by different intelligent optimization algorithms.
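| Model | Training R² | Training RMSE (μg/L) | Training MAE (μg/L) | Test R² | Test RMSE (μg/L) | Test MAE (μg/L) |
|---|---|---|---|---|---|---|
| XGBoost | 0.8183 | 4.0982 | 2.5775 | 0.7711 | 4.1845 | 3.3462 |
| SSA-XGBoost | 0.8515 | 3.7048 | 2.2391 | 0.8396 | 3.5027 | 2.7974 |
| WOA-XGBoost | 0.8445 | 3.7912 | 2.1312 | 0.8434 | 3.4611 | 2.7737 |
| DBO-XGBoost | 0.8578 | 3.6259 | 2.1346 | 0.8490 | 3.3987 | 2.7216 |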
Table 9. Inversion results of DBO-optimized XGBoost with different improvement strategies.
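| Model | Training R² | Training RMSE (μg/L) | Training MAE (μg/L) | Test R² | Test RMSE (μg/L) | Test MAE (μg/L) |
|---|---|---|---|---|---|---|
| DBO-XGBoost | 0.8578 | 3.6259 | 2.1346 | 0.8490 | 3.3987 | 2.7216 |
| Obl-DBO-XGBoost | 0.8679 | 3.4949 | 1.9804 | 0.8618 | 3.2515 | 2.6637 |
| Golden-DBO-XGBoost | 0.8645 | 3.5395 | 2.0696 | 0.8602 | 3.2708 | 2.5824 |
| OGolden-DBO-XGBoost | 0.8936 | 3.1353 | 1.8918 | 0.8850 | 2.9659 | 2.4282 |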
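The gains in Tables 8 and 9 can also be expressed as relative error reductions. The short worked example below uses the test-set RMSE values reported in those tables; the helper name is hypothetical.

```python
from typing import Dict

# Test-set RMSE values (μg/L) transcribed from Tables 8 and 9.
TEST_RMSE: Dict[str, float] = {
    "XGBoost": 4.1845,
    "DBO-XGBoost": 3.3987,
    "OGolden-DBO-XGBoost": 2.9659,
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which an error metric (e.g. RMSE) drops from baseline to improved."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


for reference in ("XGBoost", "DBO-XGBoost"):
    pct = relative_reduction(TEST_RMSE[reference], TEST_RMSE["OGolden-DBO-XGBoost"])
    print(f"OGolden-DBO-XGBoost vs {reference}: test RMSE reduced by {pct:.1f}%")
```

With these values, the test RMSE of OGolden-DBO-XGBoost is about 29.1% lower than that of plain XGBoost and about 12.7% lower than that of DBO-XGBoost.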

Share and Cite

MDPI and ACS Style

Zhou, H.; Fu, X.; Li, H. Inversion of Chlorophyll-a Concentration in Wuliangsu Lake Based on OGolden-DBO-XGBoost. Appl. Sci. 2024, 14, 4798. https://doi.org/10.3390/app14114798

AMA Style

Zhou H, Fu X, Li H. Inversion of Chlorophyll-a Concentration in Wuliangsu Lake Based on OGolden-DBO-XGBoost. Applied Sciences. 2024; 14(11):4798. https://doi.org/10.3390/app14114798

Chicago/Turabian Style

Zhou, Hao, Xueliang Fu, and Honghui Li. 2024. "Inversion of Chlorophyll-a Concentration in Wuliangsu Lake Based on OGolden-DBO-XGBoost" Applied Sciences 14, no. 11: 4798. https://doi.org/10.3390/app14114798

