An Ensemble Machine Learning Model to Estimate Urban Water Quality Parameters Using Unmanned Aerial Vehicle Multispectral Imagery

Lei, Xiangdong; Jiang, Jie; Deng, Zifeng; Wu, Di; Wang, Fangyi; Lai, Chengguang; Wang, Zhaoli; Chen, Xiaohong

doi:10.3390/rs16122246


by Xiangdong Lei ¹, Jie Jiang ¹, Zifeng Deng ¹, Di Wu ¹, Fangyi Wang ¹, Chengguang Lai ¹,², Zhaoli Wang ¹,²,* and Xiaohong Chen ³

¹ School of Civil Engineering and Transportation, State Key Laboratory of Subtropical Building and Urban Science, South China University of Technology, Guangzhou 510641, China
² Pazhou Lab, Guangzhou 510335, China
³ Center for Water Resources and Environment, Sun Yat-sen University, Guangzhou 510275, China
* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(12), 2246; https://doi.org/10.3390/rs16122246

Submission received: 1 May 2024 / Revised: 3 June 2024 / Accepted: 16 June 2024 / Published: 20 June 2024

(This article belongs to the Special Issue Remote Sensing in Natural Resource and Water Environment II)


Abstract

Urban reservoirs contribute significantly to human survival and ecological balance. Machine learning-based remote sensing techniques for monitoring water quality parameters (WQPs) have gained increasing prominence in recent years. However, these techniques still face challenges such as inadequate band selection, weak machine learning model performance, and the limited retrieval of non-optically active parameters (NOAPs). This study focuses on an urban reservoir, using unmanned aerial vehicle (UAV) multispectral remote sensing and ensemble machine learning (EML) methods to monitor optically active parameters (OAPs, including Chla and SD) and NOAPs (including COD_Mn, TN, and TP) and to explore the spatial and temporal variations of WQPs. A Feature Combination and Genetic Algorithm (FC-GA) framework is developed for feature band selection, along with two EML model frameworks for WQP estimation. The results indicate that FC-GA outperforms popular methods such as the Pearson correlation coefficient and recursive feature elimination, achieving higher performance without multicollinearity among the selected bands. The EML model shows superior estimation capability for Chla, SD, COD_Mn, and TP, with an R² of 0.72–0.86 and an MRE of 7.57–42.06%. Notably, the EML model estimates OAPs (MRE ≤ 19.35%) more accurately than NOAPs (MRE ≤ 42.06%). Furthermore, the spatial and temporal distributions of WQPs reveal nitrogen and phosphorus nutrient pollution, caused by human activities, at the upstream head and downstream tail of the reservoir. TP, TN, and Chla are lower in the dry season than in the rainy season, whereas clarity and COD_Mn are higher in the dry season. This study proposes a novel approach to water quality monitoring that can help identify potential pollution sources and support ecological management.

Keywords:

UAV remote sensing; optically and non-optically active parameters; genetic algorithm; ensemble machine learning

1. Introduction

Freshwater is critical for local, regional, and global biodiversity [1,2]; ecosystem productivity [3,4]; and human well-being [5,6]. However, inland waters, including urban reservoirs, face severe pollution and threats due to human activities such as industrialization and urbanization [7,8,9]. Urban reservoirs, situated in heavily urbanized areas, are particularly susceptible to untreated industrial, agricultural, aquacultural, and domestic wastewater discharge. In some cases, human activities result in eutrophication [10] and algal blooms [11,12], leading to reduced water clarity [13]. Therefore, monitoring the water quality of urban reservoirs, integral to inland water systems, is essential for preventing and managing water quality deterioration [14].

Conventional water quality monitoring relies on field sampling, laboratory analysis, or automated monitoring stations [15]. However, because these methods are labor-intensive, costly, and time-consuming, they are limited in providing real-time, comprehensive, wide-scale water quality monitoring [16]. Furthermore, field sampling measurements may not fully represent the water quality status of an entire water system [17]. Satellite remote sensing addresses these limitations by efficiently monitoring various water quality parameters (WQPs) across large water bodies, including chlorophyll-a (Chla) [18], total phosphorus (TP), total nitrogen (TN) [17], chemical oxygen demand, biochemical oxygen demand [19], and clarity (Secchi disk depth, SD) [20]. However, satellite remote sensing is unsuitable for the high-precision monitoring of small inland water bodies because of fixed revisit periods, low spatial resolution, and susceptibility to external environmental factors [21,22]. Unmanned aerial vehicle (UAV) remote sensing is a valuable alternative, offering flexibility, superior spatial resolution, and minimal environmental impact [23].

The principle of remote sensing water quality retrieval is to combine spectral information with measured WQP concentrations to construct retrieval models, which transform spectral information into the corresponding WQPs [24]. Methods for WQP estimation mainly include semi-analytical methods [25], empirical methods [26], and machine learning methods [27]. Semi-analytical methods require empirical and experimental knowledge to identify feature bands, which constrains their applicability and reduces their robustness. Empirical methods require large amounts of data and struggle to establish complex nonlinear relationships, making them less suitable for estimating non-optically active parameters (NOAPs) such as TP, TN, and COD_Mn. By contrast, machine learning (ML) methods excel at unraveling intricate relationships between independent variables and multiple dependent variables [17], providing new insights for remote sensing water quality retrieval and monitoring. In recent years, ML methods have increasingly been applied to estimating optically active parameters (OAPs) such as Chla and clarity. For instance, Cao et al. [28] and Werther et al. [29] used extreme gradient boosting trees and Bayesian probabilistic neural networks, respectively, for the quantitative retrieval of Chla in inland lakes. He et al. [30] proposed a novel, transferable hybrid deep learning-based recurrent model that accurately predicted global lake clarity from Landsat 8 OLI data. These efforts highlight the evolving role of ML methods in OAP estimation for water quality monitoring.

The estimation of NOAPs remains challenging because NOAPs are not optically active and have low signal-to-noise ratios at the sensed wavelengths [22,31]. ML methods have improved the ability to capture the relationships among OAPs, NOAPs, and remote sensing reflectance (R_rs) [17], which is promising for the retrieval and monitoring of NOAPs. For instance, Li et al. [31] proposed an ML method based on multispectral scale morphological combined features, which combine local and global spectral morphological features, providing a useful approach for retrieving chemical oxygen demand (COD_Mn), dissolved oxygen, TP, and NH₃-N. Qun’ou et al. [32] used 12 ML algorithms and UAV hyperspectral images to retrieve TN in the Miyun Reservoir and analyzed the spatial distribution of TN with Extra Trees regression. In short, ML methods have great potential for the remote sensing monitoring of both OAPs and NOAPs.

However, in the context of complex spectral-responsive water bodies, single ML algorithms often encounter challenges such as insufficient computational capability and susceptibility to overfitting. To address these limitations, ensemble machine learning (EML) methods have emerged as a promising approach. By integrating multiple individual models, EML methods can effectively broaden the search space, mitigate overfitting risks, and offer enhanced flexibility in handling complex nonlinear regression predictions [33]. Despite these advantages, the application of EML methods in remote sensing for WQP estimation remains relatively underexplored. Additionally, the inadequate selection of feature bands may hinder the performance of ML models by failing to identify the most informative bands.

In this study, multispectral imagery captured using UAVs is utilized to quantitatively monitor and retrieve five WQPs in an urban reservoir. These parameters include Chla and SD as OAPs, and TN, TP, and COD_Mn as NOAPs. The research framework is shown in Figure 1. The three main objectives are as follows: (1) to select the feature bands based on the feature combination and genetic algorithm (FC-GA); (2) to develop two types of EML algorithms for estimating WQPs; and (3) to use the EML model to retrieve WQPs and to explore the spatial and temporal distribution of WQPs in urban reservoirs. This study aims to provide novel insights and valuable references for the monitoring and management of water environments in small inland waters.

2. Materials and Methods

2.1. Study Area

The Longdong Reservoir, the largest reservoir in the Tianhe District, Guangzhou, China (Figure 2), is the source of the Chebei River, the main river in the district. Its surface area and storage volume are 28 × 10⁴ m² and 237.91 × 10⁴ m³, respectively. The area has a subtropical monsoon climate with an annual average precipitation of 1650 mm [34]; the rainy season runs from April to October, and the dry season from November to March of the following year. The reservoir is mainly used for water supply and irrigation in the Tianhe District, one of the most economically developed and populous districts in Guangzhou [35,36]. Regular and effective monitoring of the reservoir’s water quality is therefore necessary to ensure its sustainable use.

2.2. Method

2.2.1. Multispectral Data Collection and Preprocessing

In this study, high-spatial-resolution multispectral images of the Longdong Reservoir were captured using the DJI Phantom 4 Multispectral (P4M) (DJI, Shenzhen, Guangdong, China) UAV (Figure S1). The P4M UAV can simultaneously acquire high-resolution multispectral data including Blue (450 nm ± 16 nm), Green (560 nm ± 16 nm), Red (650 nm ± 16 nm), Red-edge (730 nm ± 16 nm), and Near-infrared (840 nm ± 26 nm) bands [37].

From 4 January 2022 to 11 June 2023, multispectral imagery was collected with the P4M UAV in six periods, five in the rainy season and one in the dry season (Table S1). To minimize the effects of specular reflection from the water surface [38] and of clouds [35] on the multispectral data, the six UAV flight missions were conducted under clear skies and low wind speeds (<5 m/s) between 9:00–11:00 and 14:00–16:00 local time, at altitudes of around 100 m [39]. To allow the subsequent mosaicking of the multispectral images, the forward (along-track) overlap and side overlap were set to 80% and 70%, respectively. Because of limited battery capacity, six flights were required to photograph the whole reservoir. Before each flight, images of a calibration panel were captured for the subsequent calculation of the R_rs of the multispectral images, and during the flight, water samples were collected at specific locations in the reservoir (Figure 2) with the help of GPS devices.
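The overlap settings determine how far apart exposures and flight lines are. As a worked illustration (the image footprint values are hypothetical, since they depend on the camera's field of view and the flight altitude), the exposure interval is the along-track footprint times (1 − forward overlap), and the line spacing is the across-track footprint times (1 − side overlap).

```python
def photo_spacing(footprint_along_m, footprint_across_m,
                  forward_overlap=0.80, side_overlap=0.70):
    """Return (distance between exposures, distance between flight lines) in metres.

    footprint_along_m / footprint_across_m: ground size of a single image along and
    across the flight direction. These are hypothetical inputs that depend on the
    camera field of view and the flight altitude.
    """
    if not (0.0 <= forward_overlap < 1.0 and 0.0 <= side_overlap < 1.0):
        raise ValueError("overlaps must be fractions in [0, 1)")
    trigger_distance = footprint_along_m * (1.0 - forward_overlap)
    line_spacing = footprint_across_m * (1.0 - side_overlap)
    return trigger_distance, line_spacing


# Hypothetical footprint of 80 m (along track) x 100 m (across track)
trigger, spacing = photo_spacing(80.0, 100.0)
print(f"expose every {trigger:.1f} m, fly lines {spacing:.1f} m apart")
```

Higher overlaps shrink both spacings, which is why a small battery budget translates into more flights per survey.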

Using gray standard reference panels and the internal physical parameters of the P4M camera [40], Pix4D Mapper 4.5.6 (https://pix4d.com/ (accessed on 16 June 2024)) was used for the image mosaicking, orthorectification, and radiometric calibration of the raw multispectral images. Because of the low flight altitude and the small size of the reservoir, the effects of atmospheric refraction and Earth curvature were neglected [41]. Thus, by processing the raw UAV multispectral images in Pix4D Mapper 4.5.6 with three gray standard panels (25%, 50%, and 75%) (Figure S1), an R_rs image of the entire reservoir in five bands was obtained [39]. In addition, to eliminate spectral differences caused by individual pixel noise, the multispectral images were uniformly resampled to a resolution of 0.1 m.
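Panel-based radiometric calibration is commonly done with the empirical line method: a linear relation between sensor digital numbers (DN) and known panel reflectance is fitted per band and then applied to every pixel. The sketch below illustrates that idea for the three panel levels mentioned above; it assumes a linear sensor response and is not a description of Pix4D's internal processing. The panel DN values and function names are hypothetical.

```python
import numpy as np


def empirical_line(panel_dn, panel_reflectance):
    """Fit reflectance = gain * DN + offset for one band from gray reference panels."""
    gain, offset = np.polyfit(np.asarray(panel_dn, dtype=float),
                              np.asarray(panel_reflectance, dtype=float), 1)
    return gain, offset


def calibrate_band(dn_image, gain, offset):
    """Convert one band of digital numbers to reflectance with the fitted line."""
    return gain * np.asarray(dn_image, dtype=float) + offset


# Hypothetical mean DNs of the 25%, 50%, and 75% panels in a single band
gain, offset = empirical_line([5200, 10100, 15050], [0.25, 0.50, 0.75])
water = calibrate_band([[900, 1100], [1000, 1200]], gain, offset)
```

Fitting three panels rather than one lets the offset absorb a dark-signal bias instead of forcing the line through the origin.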

Furthermore, the R_rs data at each sampling point were preprocessed by removing anomalous spectra inconsistent with those of normal water bodies [22]. The spectral statistics for each period are shown in Figure S2. In the Longdong Reservoir, most R_rs spectra show a distinct reflectance peak in the Green band and distinct troughs in the Red-edge and Near-infrared bands, consistent with the spectral trends reported by Chen et al. [42] and Cai et al. [43].

2.2.2. Water Sample Collection and Measurement

A total of 94 valid water samples were collected from the Longdong Reservoir: 40 on 7 April 2022, 30 on 31 July 2022, and 24 on 26 April 2023. In each sampling period, the designated sampling points were located with a GPS device so that the water quality data at each point could be matched to the R_rs of the UAV multispectral imagery. Water samples were collected 50 cm below the water surface at each sampling point, stored in 1 L polyethylene bottles at a low temperature (<4 °C) and protected from light, and analyzed in the laboratory within 12 h.
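Matching a GPS-located sampling point to the R_rs mosaic amounts to converting its projected coordinates into a pixel index. A minimal sketch, assuming the mosaic is north-up with a GDAL-style geotransform in a projected coordinate system (metres) and that the GPS fixes have already been projected into the same system; averaging a small window around the pixel is an optional assumption, and `half_window=0` returns the single pixel. Function names are hypothetical.

```python
import numpy as np


def world_to_pixel(x, y, geotransform):
    """Map projected coordinates to (row, col) in a north-up raster.

    geotransform follows the GDAL convention (x0, dx, 0, y0, 0, dy), where
    (x0, y0) is the top-left corner and dy is negative for north-up images.
    """
    x0, dx, _, y0, _, dy = geotransform
    col = int(np.floor((x - x0) / dx))
    row = int(np.floor((y - y0) / dy))
    return row, col


def sample_rrs(image, row, col, half_window=1):
    """Mean spectrum of a (2*half_window+1)^2 window; image has shape (bands, H, W)."""
    win = image[:, max(row - half_window, 0): row + half_window + 1,
                max(col - half_window, 0): col + half_window + 1]
    return np.nanmean(win.reshape(image.shape[0], -1), axis=1)
```

At 0.1 m resolution, a 3 × 3 window spans only 0.3 m, so it smooths pixel noise without mixing in distant water.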

COD_Mn, TN, and TP were determined by laboratory chemical analysis, whereas chlorophyll a (Chla) and clarity were measured in the field with an online self-cleaning chlorophyll sensor (LH-T615, Hangzhou, China) and a Secchi disk, respectively. Chinese national standards (GB11892-89 and GB11893-89), environmental protection standards (HJ636-2012 and HJ897-2017), and an industry standard (SL87-1994) [44] were applied to determine COD_Mn, TP, TN, Chla, and SD, respectively. The specific measurement methods for each water quality parameter are given in Table S2. After inaccurate values and outliers were removed through data screening and checking, the remaining data were used for analysis; the data used for the subsequent machine learning model training and validation are given in Table S3.

2.2.3. Oversampling of WQP Samples Based on SMOTE

Five datasets pairing WQPs with R_rs were generated from the five types of valid WQPs (Chla, SD, COD_Mn, TN, and TP) described in Section 2.2.2 and the corresponding R_rs described in Section 2.2.1. Because training and validation datasets drawn by simple random sampling may not adequately represent the data distribution of the whole dataset, stratified random sampling was used to divide each dataset, as follows:

(1): Divide the WQP range of one dataset into five equal intervals and label the samples accordingly as five categories (I, II, III, IV, and V);
(2): Based on these five categories, divide the whole dataset into a training dataset (referred to as T-I) and a validation dataset (referred to as V-I) at a ratio of 3:1 using stratified random sampling;
(3): Repeat the same procedure for the remaining four datasets.
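Steps (1) and (2) can be sketched as follows, reading "five equal parts according to the WQP ranges" as five equal-width intervals of each WQP (an interpretation, not confirmed by the text). Within each category, roughly one quarter of the samples go to validation, which yields the 3:1 ratio. Function names are hypothetical.

```python
import numpy as np


def equal_width_categories(values, n_classes=5):
    """Label samples 0..n_classes-1 (I..V) by equal-width intervals of the WQP range."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_classes + 1)
    # Only interior edges are passed, so the maximum value falls in the last class
    return np.digitize(values, edges[1:-1])


def stratified_split(labels, val_fraction=0.25, seed=0):
    """Return (train_idx, val_idx) with about val_fraction of each category in validation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(train_idx), np.sort(val_idx)
```

Because each category is split separately, even a sparsely populated category contributes to both T-I and V-I, which a single random split cannot guarantee.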

However, some of the five categories in each dataset may contain relatively few samples. Because the training datasets are small and unevenly distributed across categories, both feature band selection and machine learning model training may perform poorly [45]. Oversampling is therefore needed to balance each training dataset, and the synthetic minority oversampling technique (SMOTE) [46] is used to oversample each WQP. Importantly, only the training dataset is oversampled with SMOTE, to prevent information leakage from the validation data. For the training dataset of a given WQP, if the smallest of the five categories contains n samples, the number of nearest neighbors k is set to n − 1, and each category is then oversampled. This yields a synthetic training dataset (referred to as T-II) for that WQP; the same procedure is applied to the training datasets of the other four WQPs.
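A NumPy-only sketch of the oversampling step follows. It implements the standard SMOTE interpolation (each synthetic point lies on the segment between a sample and one of its k nearest neighbours within the same category) and sets k = n − 1 as described. Including the WQP value as a column of X, so that synthetic samples receive interpolated WQP values, and bringing every category up to the size of the largest one are assumptions; libraries such as imbalanced-learn provide a tested SMOTE implementation with a `k_neighbors` parameter. The function name is hypothetical.

```python
import numpy as np


def smote_balance(X, labels, k=None, seed=0):
    """Oversample every category up to the size of the largest one (SMOTE).

    X: (n_samples, n_columns) array, e.g. R_rs bands plus the WQP value.
    labels: category (I-V encoded as 0-4) of each sample.
    k: nearest neighbours; defaults to n_min - 1, the rule given in the text.
    """
    rng = np.random.default_rng(seed)
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    k = counts.min() - 1 if k is None else k
    if not 1 <= k < counts.min():
        raise ValueError("k must satisfy 1 <= k < size of the smallest category")
    X_parts, y_parts = [X], [labels]
    for c, n_c in zip(classes, counts):
        need = counts.max() - n_c
        if need == 0:
            continue
        Xc = X[labels == c]
        # k nearest neighbours of each sample within its own category
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :k]
        base = rng.integers(0, n_c, size=need)
        partner = neighbours[base, rng.integers(0, k, size=need)]
        gap = rng.random((need, 1))
        # Synthetic sample on the segment between a sample and one neighbour
        X_parts.append(Xc[base] + gap * (Xc[partner] - Xc[base]))
        y_parts.append(np.full(need, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Setting k = n − 1 is the largest value the smallest category can support, since a sample cannot be its own neighbour.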

2.2.4. Feature Band Selection Based on Feature Combination and Genetic Algorithm (FC-GA)

In this study, we propose FC-GA, a feature band selection method that uses a genetic algorithm and accounts for multicollinearity between feature bands. The T-II datasets generated in Section 2.2.3 were used for FC-GA feature engineering, and the procedure is shown in Figure 3.

First, the five original R_rs bands are transformed using reciprocals, logarithms, exponents, squares, and square roots. The bands are then combined using the four arithmetic operations and standardization methods [47,48] and matched with the WQPs at the corresponding locations, yielding combined band datasets for the five WQPs. Second, different machine learning algorithms are selected for each WQP, and a genetic algorithm performs a global search over all band features to obtain the primary band features of each algorithm for each WQP. Finally, because the primary band features may be multicollinear [49], further selection is performed using the variance inflation factor (VIF) to obtain the optimal band features. When the VIF of every selected feature is less than 10 [50], the features are considered free of multicollinearity and are taken as the final band features of the corresponding algorithm for each WQP.
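The transformation and combination step can be sketched for the single- and dual-band cases. The exact operator set is an assumption: the "standardization methods" are interpreted here as normalized-difference indices, columns containing non-finite values are dropped as a stand-in for outlier feature removal, and triple- and quad-band combinations are omitted, so the feature count differs from the paper's. Names are hypothetical.

```python
import itertools

import numpy as np

# Single-band transforms: identity plus the five transforms named in the text
TRANSFORMS = {
    "{}": lambda b: b,
    "1/{}": lambda b: 1.0 / b,
    "ln({})": np.log,
    "exp({})": np.exp,
    "{}^2": np.square,
    "sqrt({})": np.sqrt,
}


def single_features(rrs, band_names):
    """Every band under every transform: len(band_names) x 6 columns."""
    return {pattern.format(name): f(rrs[:, j])
            for j, name in enumerate(band_names)
            for pattern, f in TRANSFORMS.items()}


def dual_features(rrs, band_names):
    """Two-band combinations: +, -, *, normalized difference per pair; ratios both ways."""
    feats = {}
    for (i, a), (j, b) in itertools.permutations(enumerate(band_names), 2):
        x, y = rrs[:, i], rrs[:, j]
        if i < j:  # combinations symmetric (up to sign) are built once per pair
            feats[f"{a}+{b}"] = x + y
            feats[f"{a}-{b}"] = x - y
            feats[f"{a}*{b}"] = x * y
            feats[f"({a}-{b})/({a}+{b})"] = (x - y) / (x + y)
        feats[f"{a}/{b}"] = x / y
    return feats


def build_feature_table(rrs, band_names):
    """Stack all candidate features; drop columns containing inf or NaN."""
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        feats = {**single_features(rrs, band_names), **dual_features(rrs, band_names)}
    names = [n for n, v in feats.items() if np.all(np.isfinite(v))]
    return np.column_stack([feats[n] for n in names]), names
```

For five bands this produces 30 single-band and 60 dual-band candidates; the genetic algorithm then searches over this candidate table.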

The WQPs of T-II and V-I are then combined with the optimal band features to generate a synthetic training dataset (T-III) and a validation dataset (V-III), respectively.

The VIF is expressed as follows:

V I F_{i} = \frac{1}{1 - R_{i}^{2}}

(1)

where $V I F_{i}$ is the variance inflation factor of explanatory variable $i$, and $R_{i}^{2}$ is the coefficient of determination obtained by regressing variable $i$ on all the other features.

2.2.5. Ensemble Machine Learning (EML) Algorithm

Based on a review of the relevant literature, the popular ML algorithms selected in this study are Bayesian ridge regression (BRR) [50], k-nearest neighbor regression (NNR) [51], support vector regression (SVR), classification and regression tree (CART) [52], random forest (RF) [53], Light Gradient Boosting Machine (LightGBM) [54], and multilayer perceptron (MLP) [55].

EML algorithms are generally known to reduce the loss function and thus to improve predictive performance compared with individual ML algorithms [56]. An EML algorithm consists of two levels, termed level 0 and level 1. Typically, level 0 contains one or more ML models (base models), while level 1 contains a single ML model (the meta-model). The essence of the EML model is that the meta-model learns from the predictions of the base models. The meta-model can therefore learn the different types of WQP estimation errors made by the base models, which improves the performance of the EML model. In this study, the inputs of the EML model are the feature bands selected by FC-GA for each chosen ML model together with the corresponding WQPs, and the output is the WQP predicted by the EML model. We combine two or more ML algorithms to form an EML algorithm of one of two types: a regular EML algorithm (EML-1) and a revised EML algorithm [57] (EML-2).

The specific process of EML-1 model prediction for each WQP, using TN as an example, is as follows:

(a): The T-III dataset $D = \{(x_{i}, y_{i}), i = 1, 2, \dots, m\}$ is divided into five subsets; in turn, four are used for training and the remaining one for prediction, giving a 5-fold cross-validation.
(b): One type of base model is trained and used for prediction on the subsets from (a), and the prediction results $(a_{1}, a_{2}, a_{3}, a_{4}, a_{5})$ are obtained after five cycles. The five predictions are stacked vertically into one column vector $A_{1}$ with the same number of rows as $D$ (i.e., $m$).
(c): Steps (a) and (b) are repeated for all selected base models, giving one column vector $A_{i}$ per base model ($i = 1, 2, \dots, n$, where $n$ is the number of base models).
(d): Each base model is retrained on the entire T-III dataset and used to predict the V-III dataset, forming $B_{i}$ $(i = 1, 2, \dots, n)$, so that $(B_{1}, B_{2}, \dots, B_{n})$ has the same number of columns as $(A_{1}, A_{2}, \dots, A_{n})$.
(e): The meta-model is trained on $A_{i}$ with the observed TN of T-III as the target; inputting $B_{i}$ into the trained meta-model finally yields $C_{i}$, the predicted TN.
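Steps (a) to (e) amount to stacked generalization. The sketch below follows them with NumPy only: a closed-form ridge regressor and a k-nearest-neighbour regressor stand in for base models such as BRR and NNR, and an ordinary least-squares meta-model is an assumption (the combinations actually used are those in Table 2). Out-of-fold predictions are written back to their original rows, which is equivalent to stacking the fold results vertically. All function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept; returns a predictor."""
    Xa = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xa.shape[1])
    penalty[0, 0] = 0.0
    w = np.linalg.solve(Xa.T @ Xa + penalty, Xa.T @ y)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ w


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regression: mean target of the k closest training samples."""
    X, y = np.asarray(X, float), np.asarray(y, float)

    def predict(Z):
        d = np.linalg.norm(np.asarray(Z, float)[:, None, :] - X[None, :, :], axis=-1)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

    return predict


def fit_ols(X, y):
    """Ordinary least squares with intercept (used here as the meta-model)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ w


def eml1(X_train, y_train, X_val, base_fitters, meta_fitter=fit_ols, n_folds=5, seed=0):
    """Return (C, A, B): meta-model predictions for X_val plus the level-1 matrices."""
    m, n_base = len(X_train), len(base_fitters)
    folds = np.array_split(np.random.default_rng(seed).permutation(m), n_folds)  # (a)
    A = np.empty((m, n_base))
    B = np.empty((len(X_val), n_base))
    for j, fit in enumerate(base_fitters):                  # (c) every base model
        for held_out in folds:                              # (b) 5-fold loop
            train = np.setdiff1d(np.arange(m), held_out)
            A[held_out, j] = fit(X_train[train], y_train[train])(X_train[held_out])
        B[:, j] = fit(X_train, y_train)(X_val)              # (d) refit on all of T-III
    C = meta_fitter(A, y_train)(B)                          # (e) train on A, apply to B
    return C, A, B
```

Training the meta-model only on out-of-fold predictions keeps it from learning the base models' in-sample (overfitted) behaviour.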

EML-2 is a revised version of EML-1 and follows the same processing steps (a) to (d). Step (e) is modified as follows: the feature bands of T-III are combined with $A_{i}$ and input into the meta-model for training, and $B_{i}$, combined in the same way with the feature bands of V-III, is then predicted to obtain $C_{i}$, the predicted TN. The modeling frameworks of EML-1 and EML-2 are shown in Figure 4. The SMOTE, FC-GA, ML algorithm, and EML algorithm operations described above were all implemented in Python 3.6.
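The only change in EML-2 is the meta-model input. A minimal sketch of the modified step (e), assuming the level-1 matrices A and B have already been built as in steps (a) to (d) and again using a least-squares meta-model as a stand-in; the V-III feature bands are stacked with B at prediction time so that the meta-model sees the same columns it was trained on. `eml2_meta` is a hypothetical name.

```python
import numpy as np


def eml2_meta(X_train, A, y_train, X_val, B):
    """EML-2 step (e): meta-model input is [feature bands | base-model predictions]."""
    Z_train = np.column_stack([np.ones(len(A)), X_train, A])
    Z_val = np.column_stack([np.ones(len(B)), X_val, B])
    # Linear least-squares meta-model (an assumption; any regressor could be used)
    w, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
    return Z_val @ w
```

Passing the original features through lets the meta-model correct base-model errors that depend on the spectra themselves, at the cost of more meta-model parameters to fit from a small training set.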

2.2.6. Performance Evaluation

In this study, to evaluate the performance of the models for estimating WQPs, the coefficient of determination (R²), root mean square error (RMSE), and mean relative error (MRE) were used to compare the accuracy of individual algorithms for each WQP, while the absolute residual (AR) was used to analyze differences in accuracy within a single WQP. In addition, the relative error ($μ$) and MRE were used to compare the accuracy of each algorithm for OAPs and NOAPs. R², RMSE, MRE, AR, and $μ$ were calculated as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(2)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(3)

M R E = \frac{\sum_{i = 1}^{n} \frac{|{\hat{y}}_{i} - y_{i}|}{y_{i}}}{n} \times 100\%

(4)

A R = |{\hat{y}}_{i} - y_{i}|

(5)

μ = \frac{|{\hat{y}}_{i} - y_{i}|}{y_{i}}

(6)

where $y_{i}$ and ${\hat{y}}_{i}$ indicate the observed and predicted values of the WQP samples, respectively; $\bar{y}$ is the average of the observed values of the WQP samples; and $n$ is the number of samples.
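These indices can be computed directly with NumPy. The sketch below uses the residual-based form of R² (1 − SSE/SST); `share_within` returns the proportion of samples with μ ≤ 0.5 that is compared in Table 6. Function names are hypothetical.

```python
import numpy as np


def _arrays(y, y_hat):
    return np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)


def r2(y, y_hat):
    """Coefficient of determination, 1 - SSE/SST."""
    y, y_hat = _arrays(y, y_hat)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def rmse(y, y_hat):
    """Root mean square error, Eq. (3)."""
    y, y_hat = _arrays(y, y_hat)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


def absolute_residual(y, y_hat):
    """Per-sample absolute residual AR, Eq. (5)."""
    y, y_hat = _arrays(y, y_hat)
    return np.abs(y_hat - y)


def relative_error(y, y_hat):
    """Per-sample relative error mu, Eq. (6); observed values must be non-zero."""
    y, y_hat = _arrays(y, y_hat)
    return np.abs(y_hat - y) / y


def mre(y, y_hat):
    """Mean relative error in percent, Eq. (4)."""
    return float(np.mean(relative_error(y, y_hat)) * 100.0)


def share_within(y, y_hat, threshold=0.5):
    """Fraction of samples with mu <= threshold."""
    return float(np.mean(relative_error(y, y_hat) <= threshold))
```

Because μ divides by the observed value, MRE penalizes errors on low concentrations more heavily than RMSE does, which is one reason the two indices can rank models differently.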

3. Results

3.1. Statistical Analysis of Water Quality Samples

All 94 water samples in this study were collected from the reservoir: 30 belong to Grade III, 34 to Grade IV, 6 to Grade V, and 24 to inferior Grade V according to China’s Environmental Quality Standards for Surface Water (GB3838-2002 [8]) (Table S4). Descriptive statistics were computed for the whole, training, and validation datasets of each WQP. As shown in Table S5, the data range, quantile statistics, standard deviation, and coefficient of variation (CV) of the training and validation datasets of the five WQPs are similar to those of the whole dataset, suggesting that the training and validation datasets of each WQP are sufficiently representative for training and validating the machine learning models.

The number of samples in each category of the training dataset of each WQP before and after oversampling (Section 2.2.3) is given in Table S6, and the distributions of the five WQPs before and after oversampling are shown in Figure S3. After SMOTE oversampling, all five categories in every training dataset contain nearly equal numbers of samples, so the datasets can be used for feature band selection and machine learning model training.

3.2. Feature Band Selection Results

Transforming and combining the original five bands with the FC-GA method generates a total of 1120 feature bands, comprising 5 × 6 = 30 single-feature bands, 6 × 5 × 5 × 4/2 + 5 × 5 × 4/2 = 550 dual-feature bands, 6 × 2 × 5 × 4 × 3/2 = 360 triple-feature bands, and 6 × 5 × 4 × 3 × 2/2/2 = 180 quad-feature bands; 1080 feature bands remain after outlier feature bands are removed. Information on all feature bands is given in Table S7. For each machine learning algorithm with default parameter settings, a genetic algorithm performs the preliminary selection from all feature bands, using the five-fold cross-validated R² as the evaluation metric; the parameter settings of the genetic algorithm are listed in Table S8. The VIFs of the preliminarily selected feature bands are then calculated to evaluate the multicollinearity between them.
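The genetic-algorithm step can be illustrated with a compact binary-mask GA. This is a sketch, not the paper's configuration (which is given in Table S8): the operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism), population size, and mutation rate are assumptions, and an ordinary least-squares model stands in for the candidate ML algorithms so that the five-fold cross-validated R² fitness needs only NumPy. Function names are hypothetical.

```python
import numpy as np


def cv_r2(X, y, n_folds=5, seed=0):
    """Five-fold cross-validated R^2 of an OLS model (stand-in for the ML algorithm)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty(len(y))
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        Xa = np.column_stack([np.ones(len(train)), X[train]])
        w, *_ = np.linalg.lstsq(Xa, y[train], rcond=None)
        pred[held_out] = np.column_stack([np.ones(len(held_out)), X[held_out]]) @ w
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """Evolve binary feature masks; return (selected column indices, best fitness)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.2  # random initial masks

    def fitness(mask):
        return cv_r2(X[:, mask], y) if mask.any() else -np.inf

    for _ in range(n_gen):
        fit = np.array([fitness(mask) for mask in pop])
        elite = pop[np.argmax(fit)].copy()
        # Binary tournament selection
        i, j = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fit[i] >= fit[j])[:, None], pop[i], pop[j])
        # Uniform crossover with a neighbouring parent
        mates = np.roll(parents, 1, axis=0)
        children = np.where(rng.random(parents.shape) < 0.5, parents, mates)
        # Bit-flip mutation, then keep the best mask unchanged (elitism)
        children ^= rng.random(children.shape) < p_mut
        children[0] = elite
        pop = children

    fit = np.array([fitness(mask) for mask in pop])
    best = pop[np.argmax(fit)]
    return np.flatnonzero(best), float(fit.max())
```

A GA evaluates whole subsets rather than ranking features one at a time, so it can find combinations whose value only appears jointly; it does not, by itself, penalize redundant bands, which is why a VIF screen follows.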

If the largest VIF among the selected feature bands exceeds 10, that feature band is eliminated, the VIFs of the remaining feature bands are recalculated, and the process is repeated until all VIFs are below 10; the final selected feature band types are shown in Table S7.
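The VIF screening loop, computing Eq. (1) by regressing each feature band on the others and repeatedly dropping the band with the largest VIF until all VIFs are below 10, can be sketched as follows; function names are hypothetical.

```python
import numpy as np


def vif(X):
    """VIF_i = 1 / (1 - R_i^2), with R_i^2 from regressing column i on the others."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        target = X[:, i]
        others = np.column_stack([np.ones(len(X)), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[i] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly remove the feature with the largest VIF until all VIFs < threshold."""
    keep = list(range(np.shape(X)[1]))
    while len(keep) > 1:
        v = vif(np.asarray(X)[:, keep])
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        keep.pop(worst)
    return [names[k] for k in keep]
```

Removing one band at a time matters: dropping every band with VIF > 10 at once would discard both members of a collinear pair, whereas recomputing after each removal keeps one of them.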

Furthermore, to evaluate the performance of FC-GA, two popular methods, the Pearson correlation coefficient (PC) [58] and recursive feature elimination (RFE) [59], are selected for comparison. In this study, the T-III dataset of each WQP is used for feature selection, and the selection results are evaluated on the V-III dataset of the corresponding WQP. LightGBM with default parameters is used as the evaluation algorithm, R² and RMSE are used as the evaluation indices, and both PC and RFE are configured to select five optimal band features. Finally, the VIF values of the selected features are calculated, and their maximum (VIF_max) and minimum (VIF_min) are recorded to assess the multicollinearity among the selected features.
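For comparison, the PC baseline ranks features by their absolute Pearson correlation with the target and keeps the top five. A minimal NumPy sketch (function name hypothetical) makes clear why it can return collinear bands: each feature is scored independently, so two nearly identical bands that both correlate with the WQP are both kept.

```python
import numpy as np


def pearson_top_k(X, y, k=5):
    """Return (indices of the k features with the largest |r|, all r values)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))[:k], r
```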

The performance results of the three feature selection methods (Table 1) show that FC-GA performs at least as well as the other two methods for all WQPs except SD, for which RFE performs better (R² = 0.82, RMSE = 8.22 cm). The PC method performs poorly for all WQPs except TN, and FC-GA and RFE perform equally well for COD_Mn. In terms of RMSE, FC-GA improves on the other two methods by 8.15% to 9.57%, 5.67% to 21.20%, and 17.64% to 39.23% for Chla, TN, and TP, respectively. Notably, for every WQP, PC and RFE select features with VIF values greater than 10, whereas all VIF values of the features selected by FC-GA are below 10. In conclusion, when multicollinearity is taken into account, the FC-GA feature selection method proposed in this study performs better than the PC and RFE methods.

3.3. Analysis of ML Models and EML Model Results

3.3.1. Comparison of Different Models for Water Quality Estimation

Each EML model integrates several single ML models, as detailed in Table 2. To find the best estimation model, the performance of the seven ML models and two EML models is compared statistically (Table 3). Comparisons between the estimated and field-measured Chla, SD, COD_Mn, TN, and TP of the validation datasets, together with the accuracy metrics, are shown in Figure 5. The EML algorithms (MRE: 7.57% to 42.06%) perform better than the ML algorithms (MRE: 7.40% to 150.65%). The EML-1 model achieves the highest estimation accuracy for both Chla (R² = 0.86, RMSE = 2.02 mg m⁻³, MRE = 13.66%) and COD_Mn (R² = 0.74, RMSE = 0.23 mg L⁻¹, MRE = 7.57%). SD and TP achieve the highest accuracy with the EML-2 model, with R² of 0.90 and 0.83, RMSE of 6.14 cm and 0.08 mg L⁻¹, and MRE of 16.34% and 22.29%, respectively. For TN, however, the single NNR model is more accurate than the EML models, with R², RMSE, and MRE of 0.63, 0.18 mg L⁻¹, and 13.87%, respectively.

For each method (ML and EML), the best model is taken, and the absolute residuals between the estimated and field-measured values of the validation set are calculated to obtain the mean absolute residuals (Table 4) and identify the best method. The results show that EML has smaller residuals than ML for all WQPs except TN, for which the mean absolute residual of EML (0.13 mg L⁻¹) is slightly higher than that of ML (0.12 mg L⁻¹). For the other WQPs, EML improves the estimation accuracy by 10.53% to 56.94% compared with ML. Therefore, EML is the best method for estimating each WQP in this study except TN.

For the AR analysis, the validation samples are divided into three levels (high, medium, and low) according to their field-measured values, and the mean AR of each level is calculated. As shown in Table 5, the EML-1 model performs relatively poorly for high Chla values (mean AR = 1.68 mg m⁻³) and low COD_Mn values (mean AR = 0.26 mg L⁻¹), while the EML-2 model performs poorly for low SD values (mean AR = 6.89 cm) and high TP values (mean AR = 0.063 mg L⁻¹). For TN, the NNR model performs worse for high values (mean AR = 0.19 mg L⁻¹). Therefore, in this study, Chla, TP, and TN are estimated relatively poorly at high values, whereas SD and COD_Mn are estimated relatively poorly at low values.
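The text does not state how the high, medium, and low levels are delimited. The sketch below assumes tertiles of the observed values (an assumption) and reports the mean AR for each level; the function name is hypothetical.

```python
import numpy as np


def mean_ar_by_level(y, y_hat):
    """Mean absolute residual for low / medium / high observed values (tertiles assumed)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    q1, q2 = np.quantile(y, [1 / 3, 2 / 3])
    ar = np.abs(y_hat - y)
    levels = {"low": y <= q1, "medium": (y > q1) & (y <= q2), "high": y > q2}
    return {name: float(ar[mask].mean()) for name, mask in levels.items()}
```

Comparing the level means exposes systematic bias at the extremes, such as underestimation of high values, which a single overall RMSE hides.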

3.3.2. Comparative Analysis of OAP and NOAP Estimation

As shown in Table 6, the proportion of samples with μ ≤ 0.5 ranges mainly from 0.81 to 0.94 for OAPs and from 0.79 to 0.97 for NOAPs. The spread of this proportion is smaller for OAPs (0.13) than for NOAPs (0.18), indicating that OAPs are estimated more stably than NOAPs. For OAPs, the proportion of μ ≤ 0.5 achieved by the EML algorithm is 0.94, higher than that of the ML algorithms (0.81 to 0.92). Similarly, for NOAPs, the proportions of μ ≤ 0.5 obtained with the EML algorithm range mainly from 0.88 to 0.97, higher than those of the ML algorithms (0.79 to 0.95). The EML algorithm achieves the highest accuracy for both OAPs and NOAPs, with proportions of μ ≤ 0.5 of 0.94 and 0.97, respectively. Therefore, the EML algorithm outperforms the ML algorithms and is suitable for estimating both OAPs and NOAPs.

To compare the accuracies of the two types of WQPs, OAPs and NOAPs, the MREs of the WQPs estimated using the EML algorithm are summarized separately (Table 7). The MREs of OAPs and NOAPs mainly range from 13.66% to 19.35% and from 7.57% to 42.06%, respectively, indicating that OAPs are estimated more accurately than NOAPs overall. The best estimation accuracies of OAPs range from 13.66% to 14.87%, with Chla estimated most accurately (MRE ≤ 14.87%) and SD least accurately (MRE ≤ 19.35%). Meanwhile, the best estimation accuracies of NOAPs range from 7.57% to 7.74%, with COD_Mn estimated most accurately (MRE ≤ 7.74%) and TP least accurately (MRE ≤ 42.06%). By comparison, the best MRE of COD_Mn (7.57%) is lower than those of Chla, SD, and TP by 6.09 to 14.73 percentage points, and the best MRE of TN (13.74%) is lower than those of SD and TP by 2.60 to 8.56 percentage points. Therefore, the overall estimation accuracy of OAPs is higher than that of NOAPs, although the best accuracies achieved for individual NOAPs, notably COD_Mn, can exceed those of OAPs.

3.4. Retrieval Results of the Spatial and Temporal Distribution of WQPs

The best model is used to retrieve each WQP for each of the six periods in the Longdong Reservoir (Figure 6). The terrain of the Longdong Reservoir is high in the northeast and low in the southwest; in this study, the northeast part is termed the upstream, the southwest part the downstream, and the middle part the midstream. On 4 January 2022, Chla concentration was higher at the head of the upstream, where SD and COD_Mn were lower; TN and TP concentrations were also higher, except in the middle reaches. On 31 July 2022, Chla concentrations were lower in the upstream, as were SD and COD_Mn at the corresponding locations, while TP did not change significantly; TN concentrations were notably lower in the middle and upper reaches of the upstream. On 27 May 2023 and 11 June 2023, the spatial distributions of the WQPs were essentially the same: Chla concentrations were higher at the end of the downstream, where SD and TP were lower and TN was higher. In addition, the retrieval results in a small number of regions were affected by vegetation shadows and sunspots.
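Retrieving a WQP map amounts to treating each pixel's band vector as one sample. The sketch below assumes a five-band reflectance cube shaped (bands, rows, cols) and uses scikit-learn's RandomForestRegressor, trained on synthetic data, as a stand-in for whichever model was best for a given WQP; the function name `retrieve_map` and the mask convention (True = excluded pixel, e.g. shadow or sunspot) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def retrieve_map(model, cube, mask=None):
    """Predict a WQP for every pixel of a (bands, rows, cols) cube; excluded pixels become NaN."""
    n_bands, rows, cols = cube.shape
    pixels = cube.reshape(n_bands, -1).T           # (rows*cols, bands), row-major pixel order
    out = np.full(rows * cols, np.nan)
    valid = np.all(np.isfinite(pixels), axis=1)
    if mask is not None:
        valid &= ~mask.ravel()                     # hypothetical convention: True = exclude
    if valid.any():
        out[valid] = model.predict(pixels[valid])
    return out.reshape(rows, cols)

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 0.1, size=(200, 5))            # synthetic band values
y_train = 50 * X_train[:, 1] - 20 * X_train[:, 3] + 5     # synthetic target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

cube = rng.uniform(0.0, 0.1, size=(5, 4, 6))
mask = np.zeros((4, 6), dtype=bool)
mask[0, :2] = True                                        # pretend these are shadow pixels
wqp_map = retrieve_map(model, cube, mask)
print(wqp_map.shape, int(np.isnan(wqp_map).sum()))        # (4, 6) 2
```

Masked pixels are never passed to the model, so they remain NaN in the output map and drop out of any later NaN-aware statistics.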

To characterize temporal variations of water quality in the Longdong Reservoir, the mean and variance of each WQP retrieved with the best model are plotted (Figure 7) after vegetation shadows and sunspots are removed. The mean and variance of the retrieval results for each WQP in the rainy and dry seasons are also calculated (Table 8). From 26 April 2023 to 11 June 2023, Chla concentration increases, accompanied by increases in TN and COD_Mn concentrations and a decrease in clarity. From 7 April 2022 to 31 July 2022, Chla concentration gradually decreases, reaching its lowest value (6.76 ± 0.47 mg m⁻³), while COD_Mn concentration gradually increases, reaching its highest value (2.96 ± 0.05 mg L⁻¹). Over the same period, TN and TP concentrations decrease gradually, while clarity increases, reaching its highest value (63.99 ± 4.02 cm). Notably, the Longdong Reservoir is polluted by TP on 26 April 2023, when its water quality is degraded to Grade V, whereas in the other periods it is Grade III or better. According to Table 8, TP and Chla concentrations are lower in the dry season than in the rainy season, and TN concentration is slightly lower. Clarity is significantly higher in the dry season than in the rainy season, and COD_Mn concentration is slightly higher.
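Once removed shadow and sunspot pixels are set to NaN, the per-period and per-season summaries can be computed with NaN-aware NumPy reductions. In this sketch the 2 × 2 maps, the period names and the period-to-season labels are hypothetical (the paper defines its own rainy and dry seasons), pixels are pooled across periods within a season, and spread is reported as a standard deviation via `np.nanstd`; use `np.nanvar` if variance is wanted.

```python
import numpy as np

def period_stats(wqp_map):
    """Mean and standard deviation of one retrieved map, ignoring NaN (removed) pixels."""
    return float(np.nanmean(wqp_map)), float(np.nanstd(wqp_map))

def season_stats(maps, seasons):
    """Pool all valid pixels of the periods in each season, then summarize."""
    result = {}
    for season in sorted(set(seasons.values())):
        pixels = np.concatenate([maps[p].ravel() for p in maps if seasons[p] == season])
        result[season] = (float(np.nanmean(pixels)), float(np.nanstd(pixels)))
    return result

# Hypothetical maps for three periods; NaN marks a removed shadow/sunspot pixel.
maps = {
    "P1": np.array([[1.0, 3.0], [np.nan, 2.0]]),
    "P2": np.array([[4.0, 4.0], [6.0, 6.0]]),
    "P3": np.array([[2.0, np.nan], [2.0, 2.0]]),
}
seasons = {"P1": "dry", "P2": "rainy", "P3": "dry"}  # assumed season labels
print(period_stats(maps["P1"]))   # mean 2.0, std about 0.816
print(season_stats(maps, seasons))
```

Pooling weights each period by its number of valid pixels; averaging per-period means instead would weight every period equally.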

4. Discussion

4.1. Performance of the FC-GA Method

In this study, the training dataset (T-I) drawn from the original imbalanced dataset is oversampled using the SMOTE method, and mathematical transformations and band combinations are applied to the five original bands, generating the T-III training dataset used for feature band selection. Feature selection is performed with the FC-GA method, and the optimal bands are selected for each WQP (Table S9), providing informative features for the machine learning models. The features selected for a given WQP differ markedly between algorithms, possibly because the algorithms have different hypothesis spaces [60]. In addition, the genetic algorithm in FC-GA performs a random, bidirectional search [61], which further increases the variability of the selected features. Thus, although FC-GA globally selects the optimal bands for each algorithm and improves estimation accuracy, it cannot identify a single set of dominant feature bands for a specific WQP.
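The feature-combination step and the genetic-algorithm search can be sketched together. Everything here is illustrative: the combination formats (ratio, difference, normalized difference) are hypothetical examples rather than the paper's Table S7 list, the GA settings are arbitrary rather than those of Table S8, and fitness is the validation R² of an ordinary least-squares fit, used as a cheap stand-in for training each ML algorithm.

```python
import itertools
import numpy as np

def combine_bands(bands):
    """Append pairwise ratio, difference and normalized-difference features (hypothetical formats)."""
    feats, names = [bands], [f"B{i}" for i in range(bands.shape[1])]
    for i, j in itertools.combinations(range(bands.shape[1]), 2):
        bi, bj = bands[:, i], bands[:, j]
        feats += [(bi / bj)[:, None], (bi - bj)[:, None], ((bi - bj) / (bi + bj))[:, None]]
        names += [f"B{i}/B{j}", f"B{i}-B{j}", f"ND(B{i},B{j})"]
    return np.hstack(feats), names

def r2_fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation R^2 of a least-squares fit on the selected columns (stand-in model)."""
    if not mask.any():
        return -np.inf
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    resid = y_va - A_va @ coef
    return 1.0 - (resid @ resid) / np.sum((y_va - y_va.mean()) ** 2)

def ga_select(X_tr, y_tr, X_va, y_va, pop=30, gens=40, p_mut=0.05, max_feats=4, seed=0):
    """Binary-mask GA: tournament selection, uniform crossover, bit-flip mutation.

    Masks with more than `max_feats` features get -inf fitness. Settings are arbitrary.
    """
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    population = rng.random((pop, n)) < (max_feats / n)   # sparse random initial masks

    def fitness(m):
        return r2_fitness(m, X_tr, y_tr, X_va, y_va) if m.sum() <= max_feats else -np.inf

    best, best_fit = np.zeros(n, dtype=bool), -np.inf
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        if scores.max() > best_fit:                      # keep the best mask seen so far
            best_fit, best = scores.max(), population[scores.argmax()].copy()
        children = []
        for _ in range(pop):
            a, b = rng.integers(pop, size=2), rng.integers(pop, size=2)
            p1 = population[a[np.argmax(scores[a])]]     # winner of tournament a
            p2 = population[b[np.argmax(scores[b])]]     # winner of tournament b
            child = np.where(rng.random(n) < 0.5, p1, p2)
            child ^= rng.random(n) < p_mut               # flip a few bits
            children.append(child)
        population = np.array(children)
    return best, float(best_fit)

rng = np.random.default_rng(1)
bands = rng.uniform(0.01, 0.1, size=(120, 5))                 # synthetic five-band reflectance
X, names = combine_bands(bands)
y = 3.0 * bands[:, 1] / bands[:, 3] + rng.normal(0, 0.05, 120)  # synthetic WQP
mask, score = ga_select(X[:80], y[:80], X[80:], y[80:])
print([names[i] for i in np.flatnonzero(mask)], round(score, 3))
```

Because initialization, crossover and mutation are random, different seeds can return different but similarly scoring subsets, which mirrors the feature variability discussed above.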

The PC and RFE methods are used as baselines for comparison with FC-GA. FC-GA outperforms the other two methods by 8.15% to 9.57%, 5.67% to 21.20%, and 17.64% to 39.23% in feature selection for Chla, TN, and TP, respectively. More importantly, the features selected for each WQP by PC and RFE have VIF values greater than 10, whereas those selected by FC-GA all have VIF values below 10. RFE recursively removes a specified number of features, builds a model from the remaining features, assesses each feature's contribution, and ranks the features for selection; this greedy procedure is prone to local optima, which limits the quality of the selected feature subset. Although PC ensures a strong correlation between each selected feature and the WQP [42], the selected features are not optimal and show greater multicollinearity. In conclusion, when multicollinearity is taken into account, the FC-GA feature selection method proposed in this study performs better than the PC and RFE methods.
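The VIF criterion can be computed directly: VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing feature j on all other selected features with an intercept. This NumPy sketch uses that standard definition together with the conventional threshold of 10 mentioned in the text; the data are synthetic.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2_j), regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])  # other features + intercept
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
independent = np.column_stack([a, b])
collinear = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])
print(vif(independent).round(2))  # both close to 1
print(vif(collinear).round(1))    # first and third columns far above 10
```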

4.2. Performance of EML Models

Compared with the ML method, the EML method improves the estimation accuracy of all WQPs except TN by 10.53% to 56.94%. Zhou et al. [62] also showed that EML algorithms can improve water quality estimation by combining the strengths of individual ML algorithms. EML-2 provides good estimates of SD and TP, while EML-1 provides highly accurate estimates of Chla and COD_Mn. EML-2 has been applied relatively often in the remote sensing retrieval of WQPs [16,44], whereas EML-1 has seldom been applied, so it offers a new approach for the remote sensing retrieval of WQPs.

In estimating TN concentration, the EML algorithm performs marginally worse than the NNR algorithm among the ML methods, even though it combines diverse base models, an approach that Zhang et al. [48] and Rahman et al. [63] report enhances EML performance. Fu et al. [64] likewise found EML to be less accurate than single models, consistent with the TN results in this study. A possible explanation is an underperforming base model in layer 0 of the EML algorithm: when the meta-model in layer 1 is trained to maximize validation accuracy, the advantage of the best-performing base model can be diluted. EML algorithms therefore require not only high-performing base models but also diverse base-model types, so that the ensemble does not rely on any single model's information, in order to achieve accurate WQP estimation [65,66].
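The two-layer structure described here (base models in layer 0, a meta-model in layer 1) corresponds to what scikit-learn calls stacking. The sketch below uses `StackingRegressor` with k-nearest-neighbour, random-forest and MLP base models and a ridge meta-model on synthetic data; these choices are illustrative and are not the paper's EML-1 or EML-2 configurations. Comparing each refitted base model's score with the stack's score, and inspecting the meta-model weights, is one way to check whether a weak base model is dragging the ensemble down.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 4))                  # synthetic feature bands
y = 3 * X[:, 0] + 2 * X[:, 1] * X[:, 2] + np.sin(3 * X[:, 3]) + rng.normal(0, 0.1, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Layer 0: diverse base models (illustrative choices).
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=7))),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0))),
]
# Layer 1: a ridge meta-model fitted on 5-fold out-of-fold base-model predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5)
stack.fit(X_tr, y_tr)

print("stack R2:", round(stack.score(X_te, y_te), 3))
for name, model in stack.named_estimators_.items():      # base models refit on all training data
    print(name, "R2:", round(model.score(X_te, y_te), 3))
print("meta-model weights:", stack.final_estimator_.coef_.round(2))
```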

Furthermore, the best model performed poorly for high values of Chla, TP, and TN and for low values of SD and COD_Mn. This may be because few water quality samples have such extreme values, so the model estimates extreme values with large bias. Cao et al. [28] reached a similar conclusion. Therefore, field sampling should ensure that enough extreme values are collected to achieve higher WQP estimation accuracy.
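One way to check for the extreme-value bias described here is to split validation samples into quantile bins of the observed value and compute the error per bin. The sketch assumes quartile bins and MRE defined as mean(|predicted − observed| / observed) × 100. The data are synthetic: predictions are deliberately shrunk toward the mean, mimicking a model that rarely sees extremes during training.

```python
import numpy as np

def mre_by_quantile(observed, predicted, n_bins=4):
    """MRE (%) within quantile bins of the observed values, lowest bin first."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    edges = np.quantile(observed, np.linspace(0, 1, n_bins + 1))
    # Bin index per sample; the maximum value is clipped into the top bin.
    bins = np.clip(np.searchsorted(edges, observed, side="right") - 1, 0, n_bins - 1)
    rel_err = np.abs(predicted - observed) / np.abs(observed)
    return [float(rel_err[bins == b].mean() * 100) for b in range(n_bins)]

rng = np.random.default_rng(0)
obs = rng.lognormal(mean=3.0, sigma=0.5, size=400)   # synthetic, right-skewed WQP
pred = obs.mean() + 0.6 * (obs - obs.mean())         # predictions shrunk toward the mean
print([round(v, 1) for v in mre_by_quantile(obs, pred)])
```

With shrinkage of this kind, the outermost bins show larger errors than the inner bins, which is the pattern a sample set lacking extreme values tends to produce.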

The EML algorithm estimates OAPs and NOAPs more accurately (proportion of μ ≤ 0.5: 0.88 to 0.97) than the ML algorithms (0.79 to 0.95). This confirms from another perspective that integrating multiple single ML algorithms improves the accuracy of EML in estimating OAPs and NOAPs [67]. In addition, the overall estimation accuracy of the EML algorithm is higher for OAPs (MRE: 13.66% to 19.35%) than for NOAPs (MRE: 7.57% to 42.06%). On the one hand, OAPs such as Chla and SD likely have distinct inherent optical properties [20,68] and clear spectral signals, which favor their estimation from R_rs. On the other hand, NOAPs such as COD_Mn, TN, and TP have complex compositions and are optically insensitive [69], so richer spectral information [32,43] is probably needed to estimate them.

4.3. Temporal and Spatial Variations of Water Quality

In this study, water quality in the Longdong Reservoir is retrieved from UAV multispectral imagery, and WQP concentrations across the reservoir show clear spatial and temporal heterogeneity. Field investigation revealed an aquaculture facility near the upstream head of the reservoir and a college with more than 1000 people near the downstream tail. On 4 January 2022, Chla concentration was higher at the upstream head and clarity was reduced, owing to aquaculture wastewater generated by dredging and water exchange at the aquaculture facility. The water quality of the Longdong Reservoir is mainly and severely affected by nutrients (nitrogen and phosphorus), consistent with other studies [70,71]. On 27 May 2023 and 11 June 2023, the discharge of domestic sewage reduced clarity at the downstream tail and raised Chla concentration, and the water body was again mainly affected by severe nutrient pollution. Cai et al. [43] found that a river affected by domestic sewage discharge suffered from elevated COD, which differs from the results of this study. Rivers, being closer to residential areas than reservoirs, may be more strongly influenced by human activities and receive more domestic wastewater, and thus become enriched with organic matter. In either case, the water quality of inland water bodies is affected to some extent by domestic sewage discharge.

Notably, because of the vegetation surrounding the reservoir and the solar angle, vegetation shadows appear in a small number of marginal areas of the UAV multispectral images from 7 April 2022, 26 April 2023, and 11 June 2023. A small number of sunspots also appear in the central area of the images from 27 May 2023. Both distort R_rs in the affected pixels [72] and thus affect the estimation of WQPs.
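The text does not specify here how shadow and sunspot pixels were removed. A simple threshold rule is sketched as one possible approach, not the authors' method: flag pixels with unusually high NIR reflectance as glint (sunspots) and pixels with unusually low mean visible reflectance as shadow. The band order (blue, green, red, red edge, NIR), the threshold values and the function name are all hypothetical and would need tuning per scene.

```python
import numpy as np

def glint_shadow_mask(cube, nir_max=0.05, vis_min=0.005, nir_index=4, visible=(0, 1, 2)):
    """Boolean mask (True = exclude) for a (bands, rows, cols) reflectance cube.

    Hypothetical rule: glint where NIR > nir_max; shadow where mean visible reflectance < vis_min.
    """
    nir = cube[nir_index]
    vis = cube[list(visible)].mean(axis=0)
    return (nir > nir_max) | (vis < vis_min)

rng = np.random.default_rng(0)
cube = rng.uniform(0.01, 0.04, size=(5, 3, 3))   # synthetic water pixels
cube[4, 0, 0] = 0.2                              # bright NIR: simulated sunspot
cube[:3, 2, 2] = 0.001                           # dark visible bands: simulated shadow
mask = glint_shadow_mask(cube)
print(mask.astype(int))
```

Fixed thresholds keep clean scenes untouched; a percentile rule would always flag some pixels, even when no shadow or glint is present.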

More significantly, in July 2022, heavy rainfall increased the reservoir's area and storage volume, which strongly affected its self-purification, regulation, and ecological balance. From 7 April 2022 to 31 July 2022, although COD_Mn in the upstream was lower because of rain dilution, runoff from heavy rainfall carried a large amount of organic matter into the reservoir, raising its overall COD_Mn concentration [73]. Meanwhile, heavy rainfall affected phytoplankton photosynthesis and, together with the dilution effect [74], significantly reduced Chla concentration; TN and TP concentrations also tended to decrease, and clarity increased significantly.

On 26 April 2023, the water quality of the Longdong Reservoir may have been affected by both aquaculture activities and domestic sewage discharge. Under these two impacts, the reservoir was polluted by phosphorus and fell to inferior Grade V. From 26 April 2023 to 11 June 2023, COD_Mn and Chla concentrations increased and clarity decreased, probably because rising temperature reduced dissolved oxygen [75,76] and nitrogen promoted phytoplankton growth [69,77]. More broadly, during the rainy season, increasing rainfall, rising temperatures, and frequent human activities may carry nutrients (nitrogen and phosphorus) into the reservoir. This promotes phytoplankton growth in the water column, leading to higher Chla concentrations and lower clarity. During the dry season, lower temperatures, less rainfall, and weaker hydrodynamics reduce microbial activity and hence the decomposition of organic matter, which ultimately results in a higher organic matter content. At the same time, lower nutrient (nitrogen and phosphorus) levels result in lower Chla concentration and higher clarity. Wang et al. [10] found that, owing to human activities, the estuarine area of Lake Wuli had higher TN, TP, and Chla concentrations and lower clarity in the rainy season than in the dry season, consistent with the results of this study.
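The grades used here follow GB3838-2002 (Table S4). As an illustration for TP alone, the sketch maps a concentration to a grade using the standard's lake-and-reservoir limits as we understand them (0.01, 0.025, 0.05, 0.1 and 0.2 mg/L for Grades I to V); these values are an assumption to be checked against Table S4 before reuse. Concentrations above the Grade V limit are labelled inferior Grade V.

```python
import bisect

# Upper limits (mg/L) for Grades I..V; assumed GB3838-2002 TP limits for lakes and reservoirs.
TP_LIMITS_RESERVOIR = [0.01, 0.025, 0.05, 0.1, 0.2]
GRADES = ["I", "II", "III", "IV", "V"]

def grade(concentration, limits=TP_LIMITS_RESERVOIR):
    """Return the best grade whose upper limit the concentration does not exceed."""
    idx = bisect.bisect_left(limits, concentration)  # a value equal to a limit stays in that grade
    return GRADES[idx] if idx < len(GRADES) else "inferior V"

for tp in (0.008, 0.05, 0.12, 0.25):
    print(tp, grade(tp))
```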

4.4. Challenges and Opportunities

In this study, the FC-GA method and EML algorithm enabled UAV remote sensing to monitor both OAPs and NOAPs efficiently and, compared with satellite remote sensing methods [18,78], to identify potential pollution sources near the urban reservoir from the retrieved water quality maps. However, several challenges remain in applying UAV remote sensing to water quality monitoring. First, the narrow spectral range and low spectral resolution of the UAV multispectral camera may have caused bands important for WQPs to be missed. Second, the water quality samples are not sufficiently representative, which can lead to large bias in the WQP concentrations retrieved by the machine learning models. Owing to financial and time constraints, 94 samples were collected in three periods, covering the water quality grades relatively fully (Grade IV to inferior Grade V); however, the scarcity of samples with extreme WQP values led to larger bias when the ML models predicted extreme values. Third, the EML model is trained without the meteorological and hydrological factors that affect water quality, and its complex structure does not allow a physical interpretation of the important bands.

In future research, several improvements will be made. First, UAV hyperspectral cameras with a wider spectral range and higher spectral resolution will be used to acquire water body images with rich spectral information to improve the performance of ML models. Second, the spatial heterogeneity of water quality and the spectral variability across regions will be considered comprehensively, and more samples and spectral images of inland water bodies in different environments (including clean water, water at different eutrophication levels, black-odorous water, and so on) will be collected. Finally, machine learning models will be trained on remote sensing spectral data together with meteorological data, such as precipitation, water temperature, and wind speed and direction, and hydrological data, such as water level and flow rate.

5. Conclusions

In this study, UAV multispectral remote sensing and machine learning methods were used to effectively monitor the WQPs of an urban reservoir. To select the important bands for WQPs, the FC-GA band selection method was developed to perform a global search over all bands. The evaluation of band selection showed that FC-GA outperformed the other two methods by 8.15% to 39.23%. Furthermore, two frameworks of ensemble machine learning models were developed for water quality estimation in the urban reservoir. Compared with the ML algorithm, the EML model improved the estimation accuracy (AR) by 10.53% to 56.94% for all WQPs except TN. In addition, the EML model estimated OAPs (MRE ≤ 19.35%) more accurately than NOAPs (MRE ≤ 42.06%). Finally, we reconstructed the spatial and temporal distributions of WQPs in the urban reservoir using the best EML model. The results showed that the upstream head and downstream tail of the reservoir suffered from nitrogen–phosphorus nutrient pollution and nitrogen nutrient pollution, respectively. In addition, TP, TN, and Chla were lower in the dry season than in the rainy season, whereas clarity and COD_Mn were higher in the dry season. In future studies, we will collect more samples and spectral images of inland waters of different environmental types and add meteorological and hydrological data to enhance the generalization performance of the model. This study provides a new approach to water quality monitoring and management of inland waters.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs16122246/s1. Figure S1: Pictures of UAV and gray standard panel. (a) The P4M UAV and (b) three gray standard panels of 50%, 25%, and 75%, respectively; Figure S2: Spectral characteristics of six periods at the sampling site of Longdong Reservoir. The error bars represent the degree of dispersion of the spectra; Figure S3: Distribution of five water quality parameters before and after oversampling. (a)~(e) are the distributions of Chla, SD, COD_Mn, TN, TP before oversampling, and (f)~(j) are the distributions of Chla, SD, COD_Mn, TN, TP after oversampling, respectively; Table S1: Statistics of information collected from UAV multispectral images; Table S2: Determination method of water quality parameters; Table S3: Statistics on the number of available data for WQPs samples; Table S4: Standard values of the basic items for the Chinese Environmental Quality Standards for Surface Water (GB3838-2002) (units: mg/L); Table S5: Descriptive statistics for the whole dataset, training, and validation datasets for COD_Mn, TP, TN, Chla, and SD. N represents the number of data, and Min and Max represent the minimum and maximum values, respectively. Mean and Std represent the mean and standard deviation, respectively, and CV represents the coefficient of variation. 25%, 50%, and 75% represent the first quartile, the second quartile, and the third quartile, respectively; Table S6: Statistics on the size of the training dataset’s category data before and after oversampling; Table S7: All the combined band formats; Table S8: Configuration of the parameters of the genetic algorithm; Table S9: Statistics of the feature band selection results based on the FC-GA for each WQP.

Author Contributions

X.L.: Conceptualization, Investigation, Methodology, Software, Validation, Visualization, Formal Analysis, Writing—Original Draft, and Writing—Review and Editing. J.J.: Writing—Original Draft, Writing—Review and Editing. Z.D.: Writing—Original Draft and Writing—Review and Editing. D.W.: Investigation and Data Curation. F.W.: Investigation and Data Curation. C.L.: Writing—Review and Editing and Supervision. Z.W.: Conceptualization, Writing—Review and Editing, Supervision, Funding Acquisition, and Project Administration. X.C.: Conceptualization, Writing—Review and Editing, and Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the National Key R&D Program of China (2021YFC3001000), the National Natural Science Foundation of China (52209019, 52379010, U1911204), and the Natural Science Foundation of Guangdong Province (2022A1515240071, 2023B1515020087, 2022A1515010019).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Albert, J.S.; Destouni, G.; Duke-Sylvester, S.M.; Magurran, A.E.; Oberdorff, T.; Reis, R.E.; Winemiller, K.O.; Ripple, W.J. Scientists’ warning to humanity on the freshwater biodiversity crisis. Ambio 2021, 50, 85–94.
Faghihinia, M.; Xu, Y.; Liu, D.; Wu, N. Freshwater biodiversity at different habitats: Research hotspots with persistent and emerging themes. Ecol. Indic. 2021, 129, 107926.
Li, J.; Ianaiev, V.; Huff, A.; Zalusky, J.; Ozersky, T.; Katsev, S. Benthic invaders control the phosphorus cycle in the world’s largest freshwater ecosystem. Proc. Natl. Acad. Sci. USA 2021, 118, e2008223118.
Lai, C.; Sun, H.; Wu, X.; Li, J.; Wang, Z.; Tong, H.; Feng, J. Water availability may not constrain vegetation growth in Northern Hemisphere. Agric. Water Manag. 2024, 291, 108649.
Ma, T.; Sun, S.; Fu, G.; Hall, J.W.; Ni, Y.; He, L.; Yi, J.; Zhao, N.; Du, Y.; Pei, T.; et al. Pollution exacerbates China’s water scarcity and its regional inequality. Nat. Commun. 2020, 11, 650.
Messager, M.L.; Lehner, B.; Cockburn, C.; Lamouroux, N.; Pella, H.; Snelder, T.; Tockner, K.; Trautmann, T.; Watt, C.; Datry, T. Global prevalence of non-perennial rivers and streams. Nature 2021, 594, 391–397.
Giri, S. Water quality prospective in Twenty First Century: Status of water quality in major river basins, contemporary strategies and impediments: A review. Environ. Pollut. 2021, 271, 116332.
Huang, J.; Zhang, Y.; Bing, H.; Peng, J.; Dong, F.; Gao, J.; Arhonditsis, G.B. Characterizing the river water quality in China: Recent progress and on-going challenges. Water Res. 2021, 201, 117309.
Liao, Y.; Wang, Z.; Chen, X.; Lai, C. Fast simulation and prediction of urban pluvial floods using a deep convolutional neural network model. J. Hydrol. 2023, 624, 129945.
Wang, J.; Fu, Z.; Qiao, H.; Liu, F. Assessment of eutrophication and water quality in the estuarine area of Lake Wuli, Lake Taihu, China. Sci. Total Environ. 2019, 650, 1392–1402.
Xu, H.; Paerl, H.W.; Qin, B.; Zhu, G.; Hall, N.S.; Wu, Y. Determining Critical Nutrient Thresholds Needed to Control Harmful Cyanobacterial Blooms in Eutrophic Lake Taihu, China. Environ. Sci. Technol. 2015, 49, 1051–1059.
Ho, J.C.; Michalak, A.M.; Pahlevan, N. Widespread global increase in intense lake phytoplankton blooms since the 1980s. Nature 2019, 574, 667–670.
Song, K.; Liu, G.; Wang, Q.; Wen, Z.; Lyu, L.; Du, Y.; Sha, L.; Fang, C. Quantification of lake clarity in China using Landsat OLI imagery data. Remote Sens. Environ. 2020, 243, 111800.
Guan, Q.; Feng, L.; Hou, X.; Schurgers, G.; Zheng, Y.; Tang, J. Eutrophication changes in fifty large lakes on the Yangtze Plain of China derived from MERIS and OLCI observations. Remote Sens. Environ. 2020, 246, 111890.
Bownik, A.; Wlodkowic, D. Advances in real-time monitoring of water quality using automated analysis of animal behaviour. Sci. Total Environ. 2021, 789, 147796.
Zhu, X.; Guo, H.; Huang, J.J.; Tian, S.; Xu, W.; Mai, Y. An ensemble machine learning model for water quality estimation in coastal area based on remote sensing imagery. J. Environ. Manag. 2022, 323, 116187.
Guo, H.; Tian, S.; Huang, J.J.; Zhu, X.; Wang, B.; Zhang, Z. Performance of deep learning in mapping water quality of Lake Simcoe with long-term Landsat archive. ISPRS J. Photogramm. Remote Sens. 2022, 183, 451–469.
Saranathan, A.M.; Smith, B.; Pahlevan, N. Per-Pixel Uncertainty Quantification and Reporting for Satellite-Derived Chlorophyll-a Estimates via Mixture Density Networks. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–18.
Do, T.-N.; Nguyen, D.-M.T.; Ghimire, J.; Vu, K.-C.; Do Dang, L.-P.; Pham, S.-L.; Pham, V.-M. Assessing surface water pollution in Hanoi, Vietnam, using remote sensing and machine learning algorithms. Environ. Sci. Pollut. Res. 2023, 30, 82230–82247.
Maciel, D.A.; Faria Barbosa, C.C.; Leao de Moraes Novo, E.M.; Flores Junior, R.; Begliomini, F.N. Water clarity in Brazilian water assessed using Sentinel-2 and machine learning methods. ISPRS J. Photogramm. Remote Sens. 2021, 182, 134–152.
Shen, H.; Wu, J.; Cheng, Q.; Aihemaiti, M.; Zhang, C.; Li, Z. A Spatiotemporal Fusion Based Cloud Removal Method for Remote Sensing Images With Land Cover Changes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 862–874.
Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805.
Kwon, Y.S.; Pyo, J.; Kwon, Y.-H.; Duan, H.; Cho, K.H.; Park, Y. Drone-based hyperspectral remote sensing of cyanobacteria using vertical cumulative pigment concentration in a deep reservoir. Remote Sens. Environ. 2020, 236, 111517.
Sagan, V.; Peterson, K.T.; Maimaitijiang, M.; Sidike, P.; Sloan, J.; Greeling, B.A.; Maalouf, S.; Adams, C. Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Sci. Rev. 2020, 205, 103187.
Jiang, D.; Matsushita, B.; Pahlevan, N.; Gurlin, D.; Lehmann, M.K.; Fichot, C.G.; Schalles, J.; Loisel, H.; Binding, C.; Zhang, Y.; et al. Remotely estimating total suspended solids concentration in clear to extremely turbid waters using a novel semi-analytical method. Remote Sens. Environ. 2021, 258, 112386.
Li, S.; Chen, F.; Song, K.; Liu, G.; Tao, H.; Xu, S.; Wang, X.; Wang, Q.; Mu, G. Mapping the trophic state index of eastern lakes in China using an empirical model and Sentinel-2 imagery data. J. Hydrol. 2022, 608, 127613.
Lai, L.; Zhang, Y.; Cao, Z.; Liu, Z.; Yang, Q. Algal biomass mapping of eutrophic lakes using a machine learning approach with MODIS images. Sci. Total Environ. 2023, 880, 163357.
Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, K. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 2020, 248, 111974.
Werther, M.; Odermatt, D.; Simis, S.G.H.; Gurlin, D.; Lehmann, M.K.; Kutser, T.; Gupana, R.; Varley, A.; Hunter, P.D.; Tyler, A.N.; et al. A Bayesian approach for remote sensing of chlorophyll-a and associated retrieval uncertainty in oligotrophic and mesotrophic lakes. Remote Sens. Environ. 2022, 283, 113295.
He, Y.; Lu, Z.; Wang, W.; Zhang, D.; Zhang, Y.; Qin, B.; Shi, K.; Yang, X. Water clarity mapping of global lakes using a novel hybrid deep-learning-based recurrent model with Landsat OLI images. Water Res. 2022, 215, 118241.
Li, L.; Gu, M.; Gong, C.; Hu, Y.; Wang, X.; Yang, Z.; He, Z. An advanced remote sensing retrieval method for urban non-optically active water quality parameters: An example from Shanghai. Sci. Total Environ. 2023, 880, 163389.
Qun’ou, J.; Lidan, X.; Siyang, S.; Meilin, W.; Huijie, X. Retrieval model for total nitrogen concentration based on UAV hyper spectral remote sensing data and machine learning algorithms—A case study in the Miyun Reservoir, China. Ecol. Indic. 2021, 124, 107356.
Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.
Zeng, J.; Lin, G.; Huang, G. Evaluation of the cost-effectiveness of Green Infrastructure in climate change scenarios using TOPSIS. Urban For. Urban Green. 2021, 64, 127287.
Liu, C.; Xing, C.; Hu, Q.; Wang, S.; Zhao, S.; Gao, M. Stereoscopic hyperspectral remote sensing of the atmospheric environment: Innovation and prospects. Earth-Sci. Rev. 2022, 226, 103958.
Wang, X.; Liu, L.; Zhang, S.; Gao, C. Dynamic simulation and comprehensive evaluation of the water resources carrying capacity in Guangzhou city, China. Ecol. Indic. 2022, 135, 108528.
Shafiee, S.; Mroz, T.; Burud, I.; Lillemo, M. Evaluation of UAV multispectral cameras for yield and biomass prediction in wheat under different sun elevation angles and phenological stages. Comput. Electron. Agric. 2023, 210, 107874.
Yao, C.; Sun, Z.; Lu, S. Reducing BRDF Effects on the Estimation of Leaf Biochemical Parameters Using the Nonpolarized Reflectance Factor in the Hemispheric Space. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
Daniels, L.; Eeckhout, E.; Wieme, J.; Dejaegher, Y.; Audenaert, K.; Maes, W.H. Identifying the Optimal Radiometric Calibration Method for UAV-Based Multispectral Imaging. Remote Sens. 2023, 15, 2909.
Svensgaard, J.; Jensen, S.M.; Christensen, S.; Rasmussen, J. The importance of spectral correction of UAV-based phenotyping with RGB cameras. Field Crops Res. 2021, 269, 108177.
Tang, Y.; Pan, Y.; Zhang, L.; Yi, H.; Gu, Y.; Sun, W. Efficient Monitoring of Total Suspended Matter in Urban Water Based on UAV Multi-spectral Images. Water Resour. Manag. 2023, 37, 2143–2160.
Chen, B.; Mu, X.; Chen, P.; Wang, B.; Choi, J.; Park, H.; Xu, S.; Wu, Y.; Yang, H. Machine learning-based inversion of water quality parameters in typical reach of the urban river by UAV multispectral data. Ecol. Indic. 2021, 133, 108434.
Cai, J.; Meng, L.; Liu, H.; Chen, J.; Xing, Q. Estimating Chemical Oxygen Demand in estuarine urban rivers using unmanned aerial vehicle hyperspectral images. Ecol. Indic. 2022, 139, 108936.
Xiao, Y.; Guo, Y.; Yin, G.; Zhang, X.; Shi, Y.; Hao, F.; Fu, Y. UAV Multispectral Image-Based Urban River Water Quality Monitoring Using Stacked Ensemble Machine Learning Algorithms-A Case Study of the Zhanghe River, China. Remote Sens. 2022, 14, 3272.
Lemaitre, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2017, 18, 17.
Rivera, W.A.; Xanthopoulos, P. A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets. Expert Syst. Appl. 2016, 66, 124–135.
Xiong, J.; Lin, C.; Cao, Z.; Hu, M.; Xue, K.; Chen, X.; Ma, R. Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: Conventional or machine learning? Water Res. 2022, 215, 118213.
Zhang, J.; Fu, P.; Meng, F.; Yang, X.; Xu, J.; Cui, Y. Estimation algorithm for chlorophyll-a concentrations in water from hyperspectral images based on feature derivation and ensemble learning. Ecol. Inform. 2022, 71, 101783.
Rocha, A.D.; Groen, T.A.; Skidmore, A.K. Spatially-explicit modelling with support of hyperspectral data can improve prediction of plant traits. Remote Sens. Environ. 2019, 231, 111200.
Dormann, C.F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carre, G.; Garcia Marquez, J.R.; Gruber, B.; Lafourcade, B.; Leitao, P.J.; et al. Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 2013, 36, 27–46.
Martinez, F.; Pilar Frias, M.; Dolores Perez, M.; Jesus Rivera, A. A methodology for applying k-nearest neighbor to time series forecasting. Artif. Intell. Rev. 2019, 52, 2019–2037.
De′ath, G.; Fabricius, K.E. Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology 2000, 81, 3178–3192. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Bentejac, C.; Csorgo, A.; Martinez-Munoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
Hu, X.; Weng, Q. Estimating impervious surfaces from medium spatial resolution imagery using the self-organizing map and multi-layer perceptron neural networks. Remote Sens. Environ. 2009, 113, 2089–2102. [Google Scholar] [CrossRef]
Rajadurai, H.; Gandhi, U.D. A stacked ensemble learning model for intrusion detection in wireless network. Neural Comput. Appl. 2022, 34, 15387–15395. [Google Scholar] [CrossRef]
Wang, R. Significantly Improving the Prediction of Molecular Atomization Energies by an Ensemble of Machine Learning Algorithms and Rescanning Input Space: A Stacked Generalization Approach. J. Phys. Chem. C 2018, 122, 8868–8873. [Google Scholar] [CrossRef]
Jiang, F.; Kutia, M.; Ma, K.; Chen, S.; Long, J.; Sun, H. Estimating the aboveground biomass of coniferous forest in Northeast China using spectral variables, land surface temperature and soil moisture. Sci. Total Environ. 2021, 785, 147335. [Google Scholar] [CrossRef] [PubMed]
Kwon, S.; Seo, I.W.; Noh, H.; Kim, B. Hyperspectral retrievals of suspended sediment using cluster-based machine learning regression in shallow waters. Sci. Total Environ. 2022, 833, 155168. [Google Scholar] [CrossRef]
Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-shot Learning. Acm. Comput. Surv. 2020, 53, 1–34. [Google Scholar] [CrossRef]
Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens. 2020, 12, 2234. [Google Scholar] [CrossRef]
Zhou, X.; Liu, C.; Akbar, A.; Xue, Y.; Zhou, Y. Spectral and Spatial Feature Integrated Ensemble Learning Method for Grading Urban River Network Water Quality. Remote Sens. 2021, 13, 4591. [Google Scholar] [CrossRef]
Rahman, M.; Chen, N.; Elbeltagi, A.; Islam, M.M.; Alam, M.; Pourghasemi, H.R.; Tao, W.; Zhang, J.; Tian, S.; Faiz, H.; et al. Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. J. Environ. Manag. 2021, 295, 113086. [Google Scholar] [CrossRef] [PubMed]
Fu, B.; Lao, Z.; Liang, Y.; Sun, J.; He, X.; Deng, T.; He, W.; Fan, D.; Gao, E.; Hou, Q. Evaluating optically and non-optically active water quality and its response relationship to hydro-meteorology using multi-source data in Poyang Lake, China. Ecol. Indic. 2022, 145, 109675. [Google Scholar] [CrossRef]
Feng, L.; Zhang, Z.; Ma, Y.; Du, Q.; Williams, P.; Drewry, J.; Luck, B. Alfalfa Yield Prediction Using UAV-Based Hyperspectral Imagery and Ensemble Learning. Remote Sens. 2020, 12, 2028. [Google Scholar] [CrossRef]
Koopialipoor, M.; Asteris, P.G.; Mohammed, A.S.; Alexakis, D.E.; Mamou, A.; Armaghani, D.J. Introducing stacking machine learning approaches for the prediction of rock deformation. Transp. Geotech. 2022, 34, 100756. [Google Scholar] [CrossRef]
Werther, M.; Spyrakos, E.; Simis, S.G.H.; Odermatt, D.; Stelzer, K.; Krawczyk, H.; Berlage, O.; Hunter, P.; Tyler, A. Meta-classification of remote sensing reflectance to estimate trophic status of inland and nearshore waters. J. Photogramm. Remote Sens. 2021, 176, 109–126. [Google Scholar] [CrossRef]
Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Nguyen, H.; et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 2020, 240, 111604. [Google Scholar] [CrossRef]
Cao, J.; Hou, Z.-Y.; Li, Z.-K.; Zheng, B.-H.; Chu, Z.-S. Spatiotemporal dynamics of phytoplankton biomass and community succession for driving factors in a meso-eutrophic lake. J. Environ. Manag. 2023, 345, 118693. [Google Scholar] [CrossRef]
Zou, W.; Zhu, G.; Xu, H.; Zhu, M.; Qin, B.; Zhang, Y.; Bi, Y.; Liu, M.; Wu, T. Elucidating phytoplankton limiting factors in lakes and reservoirs of the Chinese Eastern Plains ecoregion. J. Environ. Manag. 2022, 318, 115542. [Google Scholar] [CrossRef]
Laura Sanchez, M.; Izaguirre, I.; Zagarese, H.; Romina Schiaffino, M.; Castro Berman, M.; Lagomarsino, L.; Chaparro, G.; Balina, S.; Solange Vera, M.; Spence Cheruvelil, K. Drivers of planktonic chlorophyll a in pampean shallow lakes. Ecol. Indic. 2023, 146, 109834. [Google Scholar] [CrossRef]
Zeng, C.; Richardson, M.; King, D.J. The impacts of environmental variables on water reflectance measured using a lightweight unmanned aerial vehicle (UAV)-based spectrometer system. J. Photogramm. Remote Sens. 2017, 130, 217–230. [Google Scholar] [CrossRef]
Li, X.; Huang, T.; Ma, W.; Sun, X.; Zhang, H. Effects of rainfall patterns on water quality in a stratified reservoir subject to eutrophication: Implications for management. Sci. Total Environ. 2015, 521, 27–36. [Google Scholar] [CrossRef] [PubMed]
Wang, D.; Li, Z.; Li, Z.; Ma, W.; Nie, X. Response of organic carbon in drainage ditch water to rainfall events in Zoige Basin in the Qinghai-Tibet Plateau. J. Hydrol. 2019, 579, 124187. [Google Scholar] [CrossRef]
Wang, P.; Chen, B.; Yuan, R.; Li, C.; Li, Y. Characteristics of aquatic bacterial community and the influencing factors in an urban river. Sci. Total Environ. 2016, 569, 382–389. [Google Scholar] [CrossRef]
Yeh, C.-S.; Wang, R.; Chang, W.-C.; Shih, Y.-H. Synthesis and characterization of stabilized oxygen-releasing CaO₂ nanoparticles for bioremediation. J. Environ. Manag. 2018, 212, 17–22. [Google Scholar] [CrossRef] [PubMed]
Liu, X.; Feng, J.; Wang, Y. Chlorophyll a predictability and relative importance of factors governing lake phytoplankton at different timescales. Sci. Total Environ. 2019, 648, 472–480. [Google Scholar] [CrossRef]
Bonelli, A.G.; Vantrepotte, V.; Jorge, D.S.F.; Demaria, J.; Jamet, C.; Dessailly, D.; Mangin, A.; d’Andon, O.F.; Kwiatkowska, E.; Loisel, H. Colored dissolved organic matter absorption at global scale from ocean color radiometry observation: Spatio-temporal variability and contribution to the absorption budget. Remote Sens. Environ. 2021, 265, 112637. [Google Scholar] [CrossRef]

Figure 1. Flowchart of the research framework. The study comprises four main parts: data collection and preprocessing, feature band selection, ensemble machine learning model development, and retrieval of the spatial and temporal distribution of WQPs.

Figure 2. The study area. (a,b) Location of the Longdong Reservoir. (c) Water quality sampling points in the Longdong Reservoir, marked in green; the background RGB image is composited from the Red, Green, and Blue UAV remote sensing bands.

Figure 3. Flowchart of feature band selection based on FC-GA. FC-GA consists of two parts: (1) feature combination, using arithmetic operations and random combinations of bands; and (2) band selection based on a genetic algorithm and the variance inflation factor (VIF).
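The arithmetic feature-combination step of FC-GA can be illustrated with a minimal Python sketch. It assumes per-pixel reflectance arrays for each band and uses sums, differences, ratios, and normalized differences of band pairs as the arithmetic operations; this operation set, the function name `combine_bands`, and the example reflectance values are hypothetical, and the random-combination and genetic-algorithm stages are not shown.

```python
import itertools

import numpy as np


def combine_bands(bands):
    """Return the single bands plus pairwise arithmetic combinations.

    bands: dict mapping a band name to a reflectance array (all the same shape).
    The operation set below (sum, difference, ratio, normalized difference)
    is an illustrative assumption, not the exact FC-GA operation set.
    """
    eps = 1e-12  # avoids division by zero in the ratio-type features
    features = dict(bands)  # the original bands remain candidates
    for (a, xa), (b, xb) in itertools.combinations(bands.items(), 2):
        features[f"{a}+{b}"] = xa + xb
        features[f"{a}-{b}"] = xa - xb
        features[f"{a}/{b}"] = xa / (xb + eps)
        features[f"({a}-{b})/({a}+{b})"] = (xa - xb) / (xa + xb + eps)
    return features


# Hypothetical reflectance values for two pixels
bands = {
    "Blue": np.array([0.04, 0.05]),
    "Green": np.array([0.06, 0.07]),
    "Red": np.array([0.03, 0.04]),
}
feats = combine_bands(bands)
print(len(feats))  # 3 bands + 3 band pairs x 4 operations = 15
```

The candidate pool grows quadratically with the number of bands, so a search over subsets (such as the genetic algorithm in Figure 3) is more practical than evaluating every combination exhaustively.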

Figure 4. Modeling frameworks of EML-1 and EML-2. (a,c) Training and prediction of EML-1, respectively; (b,d) training and prediction of EML-2, respectively. Unlike panel (a), panel (b) also feeds the level-0 training and testing datasets into the level-1 meta-model; likewise, unlike panel (c), panel (d) also feeds the new input data into the level-1 meta-model.
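The two stacking frameworks in Figure 4 can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation: ordinary least squares (`OLS`) and a k-nearest-neighbour regressor (`KNN`) stand in for the base and meta models listed in Table 2, level-0 out-of-fold predictions form the meta-model's training features, and EML-2 is read as also passing the original features to the level-1 meta-model (`passthrough=True`). All class and function names are hypothetical.

```python
import numpy as np


class OLS:
    """Ordinary least squares with an intercept (stand-in model)."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


class KNN:
    """k-nearest-neighbour regressor: mean target of the k closest training samples."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = np.asarray(X, float), np.asarray(y, float)
        return self

    def predict(self, X):
        # Pairwise Euclidean distances, shape (n_new, n_train)
        d = np.linalg.norm(np.asarray(X, float)[:, None, :] - self.X_[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, : self.k]
        return self.y_[nearest].mean(axis=1)


def stack_predict(base_factories, meta_factory, X, y, X_new,
                  n_folds=5, passthrough=False, seed=0):
    """Two-level stacking: passthrough=False ~ EML-1, passthrough=True ~ EML-2 (assumed)."""
    X, y, X_new = (np.asarray(a, float) for a in (X, y, X_new))
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), n_folds)
    # Level 0: out-of-fold predictions, so the meta-model never sees in-sample fits
    oof = np.zeros((len(X), len(base_factories)))
    for j, make in enumerate(base_factories):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
            model = make().fit(X[train_idx], y[train_idx])
            oof[test_idx, j] = model.predict(X[test_idx])
    # Base models refit on all samples supply level-0 features for new data
    new = np.column_stack([make().fit(X, y).predict(X_new) for make in base_factories])
    if passthrough:  # EML-2 reading: the meta-model also receives the original features
        oof, new = np.hstack([oof, X]), np.hstack([new, X_new])
    return meta_factory().fit(oof, y).predict(new)


# Synthetic, noise-free example data (hypothetical)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.5
X_new = rng.uniform(0, 1, size=(5, 2))
bases = [OLS, lambda: KNN(k=3)]
pred_eml1 = stack_predict(bases, OLS, X, y, X_new)
pred_eml2 = stack_predict(bases, OLS, X, y, X_new, passthrough=True)
```

Out-of-fold level-0 predictions are used so that the meta-model learns from predictions made on unseen samples rather than from in-sample fits. scikit-learn's `StackingRegressor` offers a comparable `passthrough` option for feeding the original features to the final estimator.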

Figure 5. Performance evaluation of EML-1, EML-2, BRR, SVR, NNR, CART, RF, LightGBM, and MLP for the five water quality parameters. Panels (a–e) correspond to Chla, SD, COD_Mn, TN, and TP, respectively. The black dotted line is the 1:1 (45°) line.

Figure 6. Maps of the retrieved WQP distributions. (A–E) Chla, SD, COD_Mn, TN, and TP, respectively; (a–f) retrieval results for the six periods, respectively. COD_Mn, TN, and TP are classified according to the grade ranges of China’s Environmental Quality Standards for Surface Water (GB3838-2002 [8]), whereas Chla and SD are divided into equal intervals spanning their respective ranges. The few gaps in the maps are caused by image mosaicking errors.
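The two classification schemes used for the Figure 6 maps can be sketched with `numpy.digitize`. Assumptions: five equal-width classes for Chla and SD (the number of classes is not stated in the caption), and grade limits supplied by the caller; the limits in the example are placeholders, not the GB3838-2002 values. The function names are hypothetical.

```python
import numpy as np


def equal_interval_classes(values, n_classes=5):
    """Class 1..n_classes from equal-width bins spanning [min, max] of the map values."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_classes + 1)
    # Only the interior edges are needed; right=True puts a value equal to an edge
    # in the lower class, and the maximum lands in the top class
    return np.digitize(values, edges[1:-1], right=True) + 1


def threshold_grades(values, upper_limits):
    """Grade 1 for value <= upper_limits[0]; grade i+1 for
    upper_limits[i-1] < value <= upper_limits[i]; above the last limit the
    grade is len(upper_limits) + 1."""
    return np.digitize(np.asarray(values, dtype=float),
                       np.asarray(upper_limits, dtype=float), right=True) + 1


# Hypothetical example values and limits (not taken from GB3838-2002)
eq_classes = equal_interval_classes([0, 1, 2, 3, 4, 5, 10], n_classes=5)
grades = threshold_grades([0.3, 0.5, 0.7, 2.0], upper_limits=[0.5, 1.0, 1.5])
print(eq_classes)  # [1 1 1 2 2 3 5]
print(grades)      # [1 1 2 4]
```

Using `right=True` treats each limit as an inclusive upper bound, which matches standards that define a grade as "not exceeding" a given concentration.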

Figure 7. Temporal trends of water quality over the six periods, retrieved with the best model. (a–e) Chla, SD, COD_Mn, TN, and TP, respectively; A–F on the horizontal axis denote the six periods: 4 January 2022, 7 April 2022, 31 July 2022, 26 April 2023, 27 May 2023, and 11 June 2023. Error bars indicate the dispersion of each water quality parameter.

Table 1. Performance of the three feature selection methods (FC-GA; PC, Pearson correlation coefficient; RFE, recursive feature elimination) for the five water quality parameters; “*” indicates a value greater than 100.

Evaluation Indicator	Method	Chla	SD	COD_Mn	TN	TP
R²	FC-GA	0.77	0.72	0.76	0.55	0.70
	PC	0.72	0.62	0.62	0.50	0.19
	RFE	0.73	0.82	0.76	0.28	0.56
RMSE	FC-GA	2.53	10.38	0.22	0.19	0.10
	PC	2.80	12.01	0.29	0.21	0.17
	RFE	2.75	8.22	0.22	0.25	0.13
VIF_min	FC-GA	1.17	2.61	3.88	1.62	1.28
	PC	62.49	51.05	*	67.23	*
	RFE	3.28	1.04	3.13	2.48	1.56
VIF_max	FC-GA	9.49	9.65	7.04	8.04	2.63
	PC	*	*	*	*	*
	RFE	36.65	56.94	22.07	11.16	17.62
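The VIF_min and VIF_max rows of Table 1 summarize multicollinearity among the selected bands. A minimal NumPy sketch of the standard computation, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing feature j on the remaining features, is given below together with a backward-elimination filter. The threshold of 10 is a common rule of thumb assumed here, not a value stated in this section, and the helper names are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column: VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on all other columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all VIFs < threshold."""
    names = list(names)
    X = np.asarray(X, dtype=float)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Hypothetical example: the third feature is an exact linear combination of the first two
rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([a, b, a + b])
X_kept, kept = drop_collinear(X, ["a", "b", "a+b"])
print(kept, vif(X_kept).round(2))
```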

Table 2. Base models and meta-model integrated in the EML model for each WQP.

WQPs	Base Models	Meta-Model
Chla	NNR, LightGBM, BRR	NNR
SD	SVR, NNR, MLP, BRR	SVR
COD_Mn	RF, NNR, CART	MLP
TN	NNR, LightGBM, RF, CART	MLP
TP	LightGBM	LightGBM

Table 3. Accuracy validation statistics of the nine algorithms for estimating WQPs.

WQPs	Evaluation Indicators	BRR	SVR	NNR	CART	RF	LightGBM	MLP	EML-1	EML-2
Chla	R²	0.74	0.78	0.81	0.72	0.73	0.78	0.66	0.86	0.83
	RMSE (mg m⁻³)	2.69	2.51	2.30	2.82	2.74	2.48	3.11	2.02	2.16
	MRE (%)	23.71	19.98	19.03	21.62	24.80	19.71	24.51	13.66	14.87
COD_Mn	R²	0.65	0.73	0.70	0.42	0.56	0.60	0.72	0.74	0.72
	RMSE (mg L⁻¹)	0.27	0.24	0.25	0.35	0.31	0.29	0.24	0.23	0.24
	MRE (%)	9.31	8.06	8.18	12.01	11.02	9.94	7.40	7.57	7.74
SD	R²	0.52	0.53	0.73	0.73	0.81	0.60	0.58	0.89	0.90
	RMSE (cm)	13.52	13.37	10.13	10.15	8.56	12.36	12.55	6.51	6.14
	MRE (%)	37.08	34.07	24.09	27.71	21.69	28.68	30.75	19.35	16.34
TN	R²	0.19	0.38	0.63	0.48	0.58	0.60	0.55	0.51	0.61
	RMSE (mg L⁻¹)	0.26	0.23	0.18	0.21	0.19	0.18	0.19	0.20	0.18
	MRE (%)	21.54	21.87	13.87	17.94	13.29	14.36	17.26	15.05	13.74
TP	R²	0.24	0.50	0.54	0.29	0.62	0.76	-1.05	0.74	0.83
	RMSE (mg L⁻¹)	0.16	0.13	0.13	0.16	0.12	0.09	0.27	0.10	0.08
	MRE (%)	87.40	150.65	38.27	39.74	32.03	47.90	127.16	42.06	22.29
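The R², RMSE, and MRE values in Table 3 can be computed from paired observations and estimates as sketched below. The R² form 1 − SS_res/SS_tot is consistent with the negative value reported for MLP on TP (−1.05), which a squared correlation coefficient could not produce; the MRE definition used here (mean of |estimate − observation| / observation, in percent) is an assumption, and Section 2.2.6 gives the authors' own definitions. The example data are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot; negative when the model
    fits worse than simply predicting the mean of the observations."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the parameter (e.g., mg L^-1)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mre(y_true, y_pred):
    """Mean relative error in percent (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)) * 100)


# Hypothetical observed and estimated values
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.0, 2.0, 3.0, 5.0]
print(r2_score(obs, est), rmse(obs, est), mre(obs, est))
```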

Table 4. Mean AR of the best-performing algorithm for each water quality parameter under each estimation method (ML and EML).

Methods	Chla	COD_Mn	SD	TN	TP
ML	1.86	0.19	10.80	0.12	0.056
EML	1.30	0.17	4.65	0.13	0.042

Table 5. AR statistics by class (low, medium, and high) for the best model for each WQP.

WQPs	Low	Medium	High
Chla	1.14	1.07	1.68
SD	6.89	2.76	4.31
COD_Mn	0.26	0.14	0.12
TN	0.09	0.10	0.19
TP	0.013	0.056	0.063

Table 6. Number of samples in each relative error (μ) class for OAPs and NOAPs under the different models, with the proportion of samples having μ ≤ 0.5.

WQPs	Model	0 ≤ μ ≤ 0.2 Very Good	0.2 < μ ≤ 0.3 Good	0.3 < μ ≤ 0.4 Middle	0.4 < μ ≤ 0.5 General	μ > 0.5 Poor	Proportion of μ ≤ 0.5
OAPs	MLP	19	5	3	2	7	0.81
	BRR	16	6	5	3	6	0.83
	CART	17	7	5	2	5	0.86
	NNR	21	5	5	2	3	0.92
	LightGBM	16	13	1	2	4	0.89
	RF	21	5	2	3	5	0.86
	SVR	18	9	2	3	4	0.89
	EML-1	27	4	2	1	2	0.94
	EML-2	27	3	3	1	2	0.94
NOAPs	MLP	37	5	6	2	8	0.86
	BRR	34	7	2	3	12	0.79
	CART	37	11	4	1	5	0.91
	NNR	42	6	2	2	6	0.90
	LightGBM	43	4	5	1	5	0.91
	RF	42	4	3	6	3	0.95
	SVR	33	4	6	3	12	0.79
	EML-1	44	2	3	2	7	0.88
	EML-2	46	3	4	3	2	0.97
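The class counts and the last column of Table 6 follow from a per-sample relative error μ = |estimate − observation| / observation. A minimal sketch, assuming this definition of μ (the helper names are hypothetical), bins μ with the class limits from the table header and reproduces the tabulated proportion for two rows of the table.

```python
import numpy as np

# Upper limits of the first four relative-error classes in Table 6
# (Very Good, Good, Middle, General); values above 0.5 are Poor.
CLASS_LIMITS = [0.2, 0.3, 0.4, 0.5]


def relative_error_counts(y_true, y_pred):
    """Count samples per relative-error class and the share with mu <= 0.5."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu = np.abs(y_pred - y_true) / np.abs(y_true)
    # right=True: mu == 0.2 falls in the first class, matching "0 <= mu <= 0.2"
    cls = np.digitize(mu, CLASS_LIMITS, right=True)
    counts = np.bincount(cls, minlength=len(CLASS_LIMITS) + 1)
    return counts.tolist(), float(np.mean(mu <= 0.5))


def proportion_from_counts(counts):
    """Share of samples in the first four classes (mu <= 0.5)."""
    return sum(counts[:4]) / sum(counts)


# Hypothetical observations and estimates, one per class
counts, share = relative_error_counts([1, 1, 1, 1, 1], [1.1, 1.25, 0.65, 1.45, 2.0])
print(counts, share)  # [1, 1, 1, 1, 1] 0.8
# Check against the EML-2 NOAP row of Table 6 (46, 3, 4, 3, 2)
print(round(proportion_from_counts([46, 3, 4, 3, 2]), 2))  # 0.97
```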

Table 7. MRE of the EML models for estimating each WQP.

Model	Chla (OAP)	SD (OAP)	COD_Mn (NOAP)	TN (NOAP)	TP (NOAP)
EML-1	13.66%	19.35%	7.57%	15.05%	42.06%
EML-2	14.87%	16.34%	7.74%	13.74%	22.29%

Table 8. Statistics of the retrieved WQPs in the Longdong Reservoir during the rainy and dry seasons. Units: Chla, mg m⁻³; SD, cm; COD_Mn, TN, and TP, mg L⁻¹.

Season	Chla mean	Chla std	SD mean	SD std	COD_Mn mean	COD_Mn std	TN mean	TN std	TP mean	TP std
Rainy	12.97	1.64	45.22	5.60	2.15	0.15	1.09	0.07	0.120	0.030
Dry	7.89	2.10	56.16	6.93	2.28	0.14	0.98	0.21	0.068	0.007

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lei, X.; Jiang, J.; Deng, Z.; Wu, D.; Wang, F.; Lai, C.; Wang, Z.; Chen, X. An Ensemble Machine Learning Model to Estimate Urban Water Quality Parameters Using Unmanned Aerial Vehicle Multispectral Imagery. Remote Sens. 2024, 16, 2246. https://doi.org/10.3390/rs16122246


