Regional Remote Sensing of Lake Water Transparency Based on Google Earth Engine: Performance of Empirical Algorithm and Machine Learning

Zeng, Weizhong; Xu, Ke; Cheng, Sihang; Zhao, Lei; Yang, Kun

doi:10.3390/app13064007


by Weizhong Zeng ¹, Ke Xu ¹, Sihang Cheng ¹, Lei Zhao ²,³,⁴,* and Kun Yang ²,⁴,*

¹ School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China

² Faculty of Geography, Yunnan Normal University, Kunming 650500, China

³ Yunnan Key Laboratory for Pollution Processes and Control of Plateau Lake-Watersheds, Kunming 650034, China

⁴ GIS Technology Engineering Research Centre for West-China Resources and Environment, Yunnan Normal University, Ministry Education, Kunming 650500, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(6), 4007; https://doi.org/10.3390/app13064007

Submission received: 21 January 2023 / Revised: 14 March 2023 / Accepted: 20 March 2023 / Published: 21 March 2023

(This article belongs to the Special Issue Lake Processes under Climate Change and Human Activities)


Abstract: Secchi depth (SD) is a valuable and feasible water quality indicator of lake eutrophication. The establishment of an automated system with efficient image processing and an algorithm suitable for the inversion of transparency in lake-rich regions could provide sufficient temporal and spatial information for lake management. Such a system is especially critical for lake-rich regions where in situ monitoring data are scarce. This study demonstrated the implementation of an atmospheric correction algorithm (ACOLITE algorithm) in conjunction with the Google Earth Engine platform to generate remote-sensing reflectance products of specific points efficiently. The study also evaluated the performance of algorithms for inverting lake SDs in the Yunnan Plateau lakes, one of the five lake districts in China, since in situ data are lacking for most of the lakes in the region. The in situ data from four lakes with large SD ranges and imagery from the Landsat Operational Land Imager were used to train and evaluate two types of algorithm: an empirical algorithm (stepwise regression) and machine learning (support vector machines (SVM) and multi-layer perceptron (MLP)). The results revealed that the retrieval accuracy of models with bands and band ratio combinations could be substantially improved compared with models with a single band or band combinations. A negative correlation was also observed between the length of the time window used to match in situ observations with satellite data and the model accuracy. This study found that the MLP model with sufficient training data was more suitable for transparency estimation of lakes belonging to the dataset; the SVM model was more suitable for transparency prediction outside the training set, regardless of the adequacy of the training data. This study provides a reference for monitoring lakes within the Yunnan region using remote sensing.

Keywords: ACOLITE algorithm; empirical regression; Landsat image; machine learning; Secchi depth; water clarity

1. Introduction

Water is essential for human survival. Water transparency is a key water quality indicator that regulates aquatic ecosystem processes and reflects watershed characteristics [1]; it is also a sentinel of environmental change and a useful tool for resource managers to monitor intra- and inter-annual changes in lakes. In general, the transparency of lake water is measured by manual sampling, and the Secchi depth (SD) is obtained by measuring the depth of a Secchi disk when the ship reaches the pre-arranged sampling point [2]. However, this method only applies to specific sampling points, which is insufficient for comprehensively and accurately estimating water transparency at each location of a lake. The solution is to increase the number of sampling points; however, this would require substantial human and material resources. Hence, manual sampling and measurement are not an effective approach for monitoring lake water transparency over a large area, and the results do not fully represent the transparency of the entire lake.

With the emergence of remote-sensing satellites, the use of image data to evaluate lake water quality has gained traction [3,4,5,6]. In addition, the Google Earth Engine (GEE) can process and analyze remote-sensing images on Google’s high-performance cluster servers, which overcomes the limitations of traditional image processing technologies [7,8]. However, because the atmospheric correction algorithm behind the standard products may not be specifically designed for the radiance typical of water, the direct use of Landsat 8 surface reflectance data extracted from GEE may lead to aquatic remote-sensing errors [9]. Therefore, the combination of the original images extracted from GEE and an atmospheric correction algorithm suited for water bodies provides a highly efficient method to calculate water quality for the ocean and inland water bodies [10].

Remote-sensing inversion methods for water bodies are mainly categorized as semi-analytical [11,12,13,14], empirical [15,16,17,18], and machine learning-based [19,20,21,22] methods. Semi-analytical methods use diffuse attenuation coefficients in the transparent window of water bodies in the visible domain to calculate SD. These methods involve complex parameterization processes and require hyperspectral resolution, which impedes their ability to determine the transparency of lake water [15,16,17,18,23]. Empirical methods establish the relationship between single bands (or band combinations) and lake water transparency with measured datasets by fitting different input parameters with water transparency data, and optimal regression functions are selected based on the goodness of fit to establish inversion models. These methods are simple and convenient and have achieved satisfactory results for several water bodies [16]. However, generating a unified water transparency model for different water body types from a limited dataset is challenging because band selection usually varies with the optical properties of the water environment [24]. Machine learning is another option that uses computing power to learn and identify potential patterns in datasets [19] and has been implemented successfully in the prediction of lake water quality and transparency [19,20,21,22,25]. For example, in 2021, Kim et al. used the multi-layer perceptron (MLP) method to achieve high-precision predictions of lake water quality [25], and Maciel et al. successfully predicted SD using the support vector machines (SVM) algorithm [21].

Owing to the complex optical characteristics and high geographic heterogeneity of lakes, it is difficult to ascertain an algorithm or a series of algorithms effective for the inversion of lake water quality globally [26,27,28]. Moreover, there is a scarcity of in situ monitoring data from lake-rich regions in China. Adopting publicly available satellite data and an automated system that includes an effective algorithm suited for the inversion of regional water quality can provide reliable and effective information for regional lake management. The Yunnan Plateau lakes form one of the five lake districts in China, with 31 lakes larger than 1 km² [29]. In situ data are absent for most of these lakes. Thus, developing a reliable and cost-effective approach based on the remote-sensing inversion of water quality is critical for the management of lakes in the Yunnan Plateau. Here, we selected Landsat Operational Land Imager (OLI) imagery as the remote-sensing data source because it provides sufficient multispectral channels centered in the red, green, and blue regions that are sensitive and effective for SD inversion [30,31,32]. Moreover, the longer time series of Landsat satellites compared with other satellites makes it possible to trace the water quality of lakes with missing in situ data. Additionally, because the Yunnan Plateau lakes are relatively small (64.5% are smaller than 10 km²), the spatial resolution of OLI imagery is more suitable than that of other moderate spatial resolution imagery, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Medium Resolution Imaging Spectrometer (MERIS).

The ACOLITE algorithm has been regarded as one of the mainstream atmospheric correction algorithms by scholars and is frequently used in related studies [33,34,35]. However, so far, there are still difficulties in combining ACOLITE with GEE, which means that the application of the ACOLITE algorithm requires the support of local image data. The combination of ACOLITE and GEE will bring significant application value for the online processing of image data.

The matching window between in situ data and satellite data is generally ±1 day, but due to the lack of in situ data of lakes and the mismatch between sampling time and satellite imaging time, researchers have to expand the matching time window to obtain more experimental data. Although expanding the matching time window will reduce the accuracy of the empirical model, high accuracy is still observed for the model within the ±7-day time window [16,36,37]. Due to the inconsistency of the time windows used in the relevant studies of machine-learning methods, it is difficult to determine an effective time window for the training of machine-learning models [19,20,21,22]. At the same time, due to the existence of long water exchange cycles and stable aquatic ecosystems in the Yunnan lakes, lake transparency will not change significantly in a short period of time, and it is worth discussing whether it is suitable to use a longer window.

To obtain remote-sensing transparency information of lakes more effectively, we (1) built a bridge between the ACOLITE algorithm and the GEE platform to achieve online processing of remote-sensing images using the ACOLITE method and, using four lakes from Yunnan Province, China, as an example, (2) evaluated the applicability of empirical algorithms and machine-learning algorithms for lakes in Yunnan Province under different time windows to obtain more suitable algorithms and time windows for lake management in Yunnan. We hope that the algorithmic model obtained in this study and the combination of GEE and the ACOLITE algorithm will in the future help realize the online retrieval of lake transparency distributions in the region and extend the supported time series back to the initial imaging time of the Landsat series satellites, using remote sensing to fill in the missing data from past water resources surveys.

2. Materials and Methods

2.1. Study Area

Four lakes (Dianchi, Yilong, Erhai, and Fuxian Lake) located in Yunnan Province, southern China, were selected as the study area in this study; their geographical locations are shown in Figure 1. Dianchi Lake and Yilong Lake are low-transparency lakes with mean SD values of 0.47 m and 0.38 m, respectively. Erhai Lake has medium transparency, with a mean SD value of 1.92 m. Fuxian Lake has high transparency, with a mean SD value of 5.59 m.

2.2. Data Acquisition

2.2.1. In Situ Data

Four datasets comprising in situ SD data were collected: Dianchi Lake (from 2015 to 2020), Yilong Lake (from 2010 to 2019), Fuxian Lake (from 2012 to 2017), and Erhai Lake (from 2012 to 2017). Sampling sites were spaced at least 1 km apart, with 10 points in Dianchi Lake, 18 points in Erhai Lake, 13 points in Fuxian Lake, and 3 points in Yilong Lake, with a sampling frequency of once a month. Table 1 presents the statistics of SD measured in the four lakes.

2.2.2. Satellite Data and Atmospheric Correction Algorithm

The GEE platform provides various Landsat 8 products, including surface reflectance, top-of-atmosphere (TOA) reflectance, and raw images. To avoid possible errors induced by an unsuitable atmospheric correction algorithm [9], raw image data were used in this study.

ACOLITE, which was developed by the Royal Belgian Institute of Natural Sciences (RBINS) and is particularly suitable for inland and marine water bodies, provides atmospheric correction and produces output data such as surface reflectance (rhos), surface reflectance for water pixels (rhow), and remote-sensing reflectance for water pixels (Rrs). The latest ACOLITE Python source code is available from GitHub (https://github.com/acolite/acolite, accessed on 21 March 2023). To implement the ACOLITE algorithm in the GEE platform, some revisions were required: (1) Screening: filtering the image collections containing sampling points according to the input time series and sampling points. (2) Atmospheric correction: converting the unmodifiable ee.Image format from the GEE to the modifiable .nc format, in which non-aqueous pixels and cloud pixels are masked and the TOA reflectance values are converted to Rrs values. (3) Cutting: obtaining the corresponding Rrs value from the .nc file according to the coordinates of the measurement points. (4) Reorganization: collecting the Rrs values of all measuring points and storing them in a csv file. These steps were executed using Google Colab, and the code is shared on GitHub (https://github.com/Zwz0003/github.git, accessed on 21 March 2023). A total of 1829 satellite data points were obtained based on the measurement points and time series (January 2013–August 2022).
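Steps (3) and (4) can be illustrated with a minimal sketch. It assumes that one atmospherically corrected band has already been read from the .nc file into plain Python lists on a regular latitude/longitude grid, with None marking masked pixels. The function names, the grid layout, and the nearest-pixel rule are hypothetical and are not taken from the authors' repository.

```python
import csv
import io

def nearest_index(axis, value):
    """Return the index of the grid coordinate closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))

def extract_rrs(lats, lons, rrs_grid, points):
    """Look up the Rrs of the pixel nearest to each sampling point (cutting).

    lats, lons : 1-D coordinate axes of the grid (assumed regular)
    rrs_grid   : rrs_grid[row][col]; None marks a masked land or cloud pixel
    points     : list of (site_id, lat, lon)
    """
    rows = []
    for site_id, lat, lon in points:
        r = nearest_index(lats, lat)
        c = nearest_index(lons, lon)
        value = rrs_grid[r][c]
        if value is not None:  # skip masked pixels
            rows.append({"site": site_id, "lat": lat, "lon": lon, "Rrs": value})
    return rows

def rows_to_csv(rows):
    """Serialize the collected rows to CSV text (reorganization)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["site", "lat", "lon", "Rrs"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In practice the lookup would be repeated for every band and every scene that passes the screening step, and the rows would be appended to a single csv file.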

2.3. Time Matching

Time matching between in situ observations and satellite data is another important problem for remote sensing of lake water quality [36]. Owing to the long water exchange cycle and stable aquatic conditions in highland lakes, lake transparency does not change drastically in a short period of time. To evaluate the impact of time matching on the inversion accuracy of the model and to obtain a reasonable time matching window, we extracted a series of datasets representing different time windows between in situ and satellite data: 1, 3, and 7 days. The SD statistics of the 1-day, 3-day, and 7-day time window matching data are shown in Table 2. These datasets were used separately for model training.
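The matching rule can be sketched as follows. This is a minimal illustration that assumes each in situ sample is paired with the closest overpass and kept only if the gap does not exceed the window. The function name and the tie-breaking behaviour are hypothetical, and the section does not state whether the windows are one-sided or symmetric.

```python
from datetime import date

def match_within_window(in_situ, overpasses, window_days):
    """Pair each in situ sample with the closest overpass within the window.

    in_situ     : list of (sample_date, sd_in_metres)
    overpasses  : list of satellite acquisition dates
    window_days : maximum absolute gap in days (assumed symmetric)
    """
    pairs = []
    for sample_date, sd in in_situ:
        best = min(
            overpasses,
            key=lambda d: abs((d - sample_date).days),
            default=None,
        )
        if best is not None and abs((best - sample_date).days) <= window_days:
            pairs.append((sample_date, best, sd))
    return pairs
```

Widening the window from 1 to 7 days can only keep or increase the number of matched pairs, which is the trade-off between sample size and temporal agreement examined in this study.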

2.4. Algorithm Implementation

2.4.1. Empirical Algorithm

Several studies have used empirical algorithms with single-band and band combinations (or dual-band and band combinations) [10,30,36,38,39,40]. The results of these studies show that using band combinations for remote-sensing inversion of lake water quality is better than using single bands [41]. Moreover, band ratios are helpful for eliminating the noise of the reflected signal from the bottom of a water body, and some Landsat image studies have shown that the partial band ratio is effective for the prediction of water transparency in many lakes. However, it is difficult to manually identify the appropriate bands and band ratios for water transparency prediction [10,30,36,38,39,40]. Therefore, stepwise regression was used in this study to identify bands and band ratios suitable for predicting regional water transparency.

Stepwise regression is a simple and effective method with few variables in the regression equation, and the most significant variables are retained. This method is effective in practice and achieves high prediction accuracy for water transparency [32]. Furthermore, if there is a relationship (i.e., multicollinearity) between variables, stepwise regression can correct the multicollinearity to a certain extent. All variables were used as input parameters, including specific band reflectivity, band ratios, and other forms of algebraic combination [36,42,43]. We cannot intuitively assess the validity of any variable to explain SD. Hence, we used stepwise regression to obtain the multivariate linear equation that can best describe SD.

Logarithmic transformation can provide more stable SD estimates than using measured SD values directly [36,44]. To improve the generalization ability of the regression model, two variables were randomly selected as the initial variables, and the Akaike information criterion (AIC) [45] was used to evaluate the model’s goodness-of-fit after introducing or deleting variables until the AIC value was no longer reduced significantly. The final results not only ensured an accurate interpretation of the data but also avoided overfitting.
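A simplified sketch of AIC-guided stepwise selection on ln(SD) is given below. It is not the authors' implementation: it starts from an intercept-only model rather than from two randomly chosen variables, it only adds variables (no deletion step), and the stopping threshold min_gain is a hypothetical parameter. The AIC is computed for a Gaussian linear model as n ln(RSS/n) + 2k. The small solver has no guard against collinear candidate columns.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_aic(columns, y):
    """Ordinary least squares with intercept; returns (coefficients, AIC)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2
              for i in range(n))
    aic = n * math.log(max(rss, 1e-12) / n) + 2 * k
    return beta, aic

def forward_stepwise(candidates, sd, min_gain=2.0):
    """Greedy forward selection on ln(SD); stop when AIC improves by < min_gain."""
    y = [math.log(v) for v in sd]
    selected = []
    _, best_aic = fit_aic([], y)
    while True:
        trials = []
        for name in candidates:
            if name not in selected:
                cols = [candidates[s] for s in selected + [name]]
                trials.append((fit_aic(cols, y)[1], name))
        if not trials:
            break
        aic, name = min(trials)
        if best_aic - aic < min_gain:
            break
        selected.append(name)
        best_aic = aic
    return selected, best_aic
```

Here candidates maps a variable name (a band or a band ratio) to its list of values, one per matched sample, and sd holds the corresponding measured Secchi depths.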

2.4.2. Machine-Learning Algorithms

SVM is a robust algorithm with high accuracy and minimal overfitting [46]. If the data can be described by a linear function, the model creates “spacing bands” on both sides of that function; samples falling inside the bands incur no penalty, so only the samples on or outside them affect the model. If the original data cannot be described by a linear function, the SVM maps them to a higher-dimensional space in which a linear relationship holds, and regression is performed there. Even if the data are linearly inseparable in the original feature space, SVM can run well, provided an appropriate kernel function is employed. This algorithm can solve high-dimensional problems (e.g., the problem of multiple input variables in this study) and deal with interactions between nonlinear features. When there is no linear relationship between input variables, the algorithm can still generate satisfactory results. Furthermore, only a fraction of the data is required to train a suitable model and improve the model’s generalization ability.
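For reference, the ε-insensitive formulation commonly used for support vector regression can be written as follows. The section does not state which SVM variant or kernel was used, so the symbols below (band half-width \varepsilon, penalty coefficient C, slack variables \xi_i and \xi_i^{*}, kernel K) are assumptions for illustration.

```latex
L_{\varepsilon}(y_i, f(x_i)) = \max\left(0, \; |y_i - f(x_i)| - \varepsilon\right)

\min_{w, b, \xi, \xi^{*}} \; \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\quad \text{s.t.} \quad
y_i - f(x_i) \le \varepsilon + \xi_i, \quad
f(x_i) - y_i \le \varepsilon + \xi_i^{*}, \quad
\xi_i, \xi_i^{*} \ge 0

f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b
```

Under this formulation a residual smaller than \varepsilon adds nothing to the objective, and the kernel K plays the role of the mapping to a higher-dimensional space.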

MLP is the first and simplest artificial neural network algorithm with a highly nonlinear global effect [47]. Training is an optimization process that uses gradient descent to iteratively update the model parameters until the error is sufficiently small or the maximum number of iterations is reached. Even if there is no linear relationship between the input variables and the target, the MLP algorithm can still generate a result that expresses the relationship between them. Owing to its high fault tolerance, the final fit is not substantially degraded even if some poor data are included. Since it has an associative memory function, the trained model can achieve suitable results when directly applied to similar areas. Given its strong adaptive and self-learning capacity, an appropriate model can be obtained through continuous adjustment, even when using poor data.

2.4.3. Model Structure

To thoroughly examine the performance of different algorithms and identify the best model for the regional remote sensing of SD, we performed a series of experiments considering two main variables: spectral variables and different datasets of the time window between in situ data and satellite data. The spectral variables include five bands of OLI (443, 483, 561, 655, and 865 nm), which were considered to be the most sensitive and effective for the inversion of SD and have been demonstrated to retrieve SD in some inland waters [30,48]. The spectral variables are divided into two groups: single band or band combination, which has five bands as candidates (group 1), and band combination with band ratio, which has 25 variables as candidates (group 2). Each group was trained and evaluated separately based on the time window datasets mentioned above (Table 2).
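The two groups of spectral variables can be sketched as below. The exact composition of the 25 candidates is not listed in this section; the sketch assumes they are the five bands plus the 20 ordered pairwise band ratios, which is one reading that yields 25. The naming scheme is hypothetical.

```python
from itertools import permutations

# Centre wavelengths (nm) of the five OLI bands named in the text.
BANDS_NM = (443, 483, 561, 655, 865)

def build_candidates(rrs):
    """Return (group 1, group 2) candidate variables for one sample.

    rrs : dict mapping band centre wavelength (nm) to Rrs
    Group 1 holds the five bands; group 2 adds every ordered ratio
    B_a / B_b with a != b (an assumption, see the text above).
    """
    group1 = {f"B{nm}": rrs[nm] for nm in BANDS_NM}
    ratios = {
        f"B{a}/B{b}": rrs[a] / rrs[b]
        for a, b in permutations(BANDS_NM, 2)
    }
    group2 = {**group1, **ratios}
    return group1, group2
```

Each model is then trained once per group and per time window dataset, which gives the grid of experiments described above.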

2.5. Algorithm Assessment

A hold-out strategy, wherein the dataset was randomly divided into a training (80%) and test set (20%), was adopted to evaluate the model performance. The training set was used to train the model, and the test set was used to estimate the generalization error of the final model when dealing with a real scenario. The mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R-squared) were used to assess the accuracy and precision of the model performance. A low MAE or RMSE value indicated high prediction accuracy of the model, and an R-squared value close to 1 indicated that the model was a suitable fit for the data. The calculation formulas used were as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}, \quad \mathrm{RMSE} \in [0, + \infty) \qquad (1)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |, \quad \mathrm{MAE} \in [0, + \infty) \qquad (2)

R\text{-squared} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \overline{y})}^{2}}, \quad R\text{-squared} \in [0, 1) \qquad (3)

where y_{i} is the measured value, \hat{y}_{i} is the predicted value, \overline{y} is the mean of the true values, n is the number of data pairs, and the subscript i denotes individual data points.
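The hold-out split and the three metrics of Equations (1)–(3) can be written compactly as follows. This is a minimal sketch; the function names and the fixed random seed are illustrative choices, not details given in the paper.

```python
import math
import random

def hold_out_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and return (training set, test set)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def rmse(y, y_hat):
    """Root mean square error, Equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error, Equation (2)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat):
    """Coefficient of determination, Equation (3)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

For measured values [1, 2, 3, 4] and predictions [1.5, 2, 2.5, 4], the residual sum of squares is 0.5 and the total sum of squares is 5, so MAE = 0.25, RMSE ≈ 0.354, and R-squared = 0.9.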

3. Results

3.1. Empirical Algorithm

The number of final parameters and the prediction accuracy of each stepwise regression model (Table 3, experiments No. 1–6) are shown in Supplementary Materials Table S1. The results showed (Figure 2) that the R-squared of the models increased with the number of parameters added to the model formula. However, after the number of parameters reached a certain level, adding further parameters no longer significantly improved the R-squared value, which implied that convergence was achieved. Comparing the R-squared of models trained on the same time window dataset (Figure 2a,b: lines with the same color) showed that the group of spectral variables using the bands and band ratio combination had a higher R-squared; that is, the spectral variables using the bands and band ratio combination can better explain the relationship between remote-sensing reflectance and lake SD. Moreover, as shown in Figure 2, the model with the 1-day time window training dataset performed better than the models with longer time window training datasets, and the R-squared value decreased as the time window of the training set increased. Table 4 shows the optimal solutions of the two groups of models with different spectral variables; the R-squared of spectral variable group 2 was significantly higher than that of spectral variable group 1.

Moreover, to identify whether a model trained with a shorter or longer time window dataset can produce accurate lake SD from spectral data beyond its training data, the 1-day trained model was applied to the 7-day time window spectral data, and the 7-day trained model was applied to the 1-day time window spectral data. The results (Table 5) illustrate that although the prediction accuracy of the 1-day model for the 7-day time window dataset was lower than that for the 1-day time window dataset, it was still at a high level; that is, the model still had suitable prediction accuracy for data outside the training set. When the model trained on the 7-day time window dataset was used to predict the 1-day time window data, the resulting R-squared was higher than that obtained on its own 7-day dataset; therefore, it was feasible to train the model using the 7-day time window data in case of insufficient data.

3.2. Machine-Learning Algorithms

In this study, MLP and SVM were used to verify the ability of machine learning to interpret the relationship between remote-sensing reflectance and lake SD. The results (Tables S1 and S2) showed that the MLP model (Table 3, experiments No. 7–12) had a higher R-squared than the empirical algorithmic models overall. This indicated that the MLP model performed better in interpreting the relationship between spectral reflectance and lake SD. The relationship between the time window and R-squared of the MLP and SVM models is shown in Figure 3. Both machine-learning models behaved similarly to the empirical algorithm model when the time window changed; that is, the accuracy of the model exhibited a decreasing trend as the time window increased. This decline was less pronounced when spectral variable group 2 was used.

The models with the highest R-squared, which are the optimal solutions for the different sets of spectral variables, are plotted in Figure 4. As shown in Figure 4, both machine-learning models have some scatter that deviates from the fitted line. In general, the scatter of the MLP model lies closer to the fitted line, and there was no significant difference between the scatter distributions of its two spectral variable groups. In contrast, the SVM using a single band or a combination of bands (blue dots) deviated more from the fitted line than the SVM using the combination of bands and band ratios (orange dots). Thus, similar to the empirical algorithm model, using spectral variable group 2 reduces the bias of the SVM model, whereas the choice of spectral variable group makes no significant difference for the MLP model. Therefore, the MLP model using a 1-day time window training dataset was the best machine-learning model for estimating regional lake SD.

Experiments were conducted to evaluate the performance of machine-learning models on datasets with time windows shorter or longer than that of their training data, in which the MLP and SVM models using spectral variable group 2 were selected and trained with the 1-day and 7-day time window training datasets, respectively. As shown in Table 6, the prediction accuracy of the MLP model decreased substantially and was particularly low when interpreting data beyond its own training dataset; however, it performed well for a shorter time window dataset. The SVM model maintained a high level of prediction accuracy when using remote-sensing reflectance data that did not exist in the training set for SD prediction, although the prediction accuracy decreased, and for the shorter time window dataset, the prediction accuracy remained stable. In conclusion, training the machine-learning algorithms on longer time window datasets was suitable for obtaining more stable prediction results.

4. Discussion

4.1. Uncertainty of SD Remote Sensing

The uncertainty of SD remote sensing was mainly induced by two factors: the dataset used for model training and the algorithms themselves. The uncertainty induced by the dataset was primarily due to the mismatch between the in situ data and satellite data. In 2002, Kloiber et al. found that the highest goodness of fit of the empirical algorithm was obtained when the in situ data were collected within ±1 day of satellite imaging, and that when the data were collected within ±7 days, the model accuracy was reduced but remained at a high level [36]. In 2021, Zhang et al. verified the relationship between the accuracy of the empirical model and the time window and concluded that there is a negative correlation between them, although a time window of 12 days can still maintain high accuracy [37]. Our results are consistent with these findings in that an increase in the training time window reduces the accuracy of the model (Tables S1 and S2). However, this effect was effectively mitigated by using band combinations and band ratio combinations, and there were even cases in the SVM model where the model accuracy did not change substantially as the time window increased (Figure 3). We believe that this phenomenon occurred because machine-learning models explain the relationship between band combinations, band ratio combinations, and SD more effectively.

Moreover, to verify whether the predictions of models trained with different time windows differ from each other, we conducted a two-sample t-test on the 1-day time window prediction results of each model. The results (Table 7) showed that there was no significant difference between the prediction results of the models trained with the 1-day and 7-day time window datasets. These results indicate that it is reasonable to use a 7-day time window dataset for training without significantly affecting the accuracy of the model.
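The test statistic behind such a comparison can be sketched as follows. The section does not state whether equal variances were assumed, so the sketch uses Welch's unequal-variance form as an assumption; the function name is hypothetical, and the p-value would be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses Welch's form, which does not assume equal variances, and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean a
    vb = variance(sample_b) / nb  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Here sample_a and sample_b would hold the SD predictions for the 1-day dataset produced by the models trained with the 1-day and 7-day windows, respectively; a small absolute t value corresponds to the "no significant difference" outcome reported in Table 7.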

The uncertainty of the algorithm arises from the random assignment of data to the training and test sets and parameters [49,50]. Random selection of the training and test sets can improve the generalization ability of the model, but it may also cause uncertainty in the training results [50]. It was difficult to obtain the same results with different training sets, despite the conditions remaining consistent. Moreover, different parameters (threshold of the stepwise regression model, number of hidden layers of the MLP, number of neurons per layer, penalty coefficients of the SVM, etc.) may result in different results for the same training set. A feasible strategy for improving the generalization ability and prediction accuracy of the model is to increase the number of tests. However, this strategy may also have certain uncertainties; for instance, we cannot exhaust all tests to determine whether the model is optimal. In this study, the MLP model with certain parameters (details are provided in Section 2.4. Algorithm implementation) and the spectral variable group 2 and 3-day time window training dataset was the best model obtained; however, as per our analysis, once the experimental conditions (dataset, model parameters, etc.) change, the optimal model may need to be re-determined.

4.2. Algorithm Comparison

Several studies have demonstrated that the accuracy of empirical algorithms applied to the inversion of lake SD can be improved by using band and band ratio combinations; however, the accuracy of empirical algorithms also has certain limitations [18,37,51,52]. For example, in 2021, Rubin et al. collected the R-squared values (range: 0.6–0.98) of 13 previously published non-machine-learning empirical algorithms, including single-band, band-logarithm, multi-band, and inter-band-ratio algorithms, and reported R-squared values of 0.6 (n = 120), 0.93 (n = 374), and 0.85 (n = 15,615) [19]. Additionally, in 2016, Olmanson et al. established a stepwise regression model with Landsat 8 OLI to explore the relationship between all bands (or band ratios) and SD in lakes in Minnesota [32]. The model yielded R-squared values in the range of 0.6–0.83, which was similar to the R-squared values reported by Rubin et al. [19]. In 2016, Kabiri et al. applied 17 combinations of the band/band ratios with the highest correlation to estimate SD, and the highest R-squared obtained was 0.866 [53]. In 2021, Deutsch et al. verified the relationship between the Landsat 8 blue/red bands and SD in Canada using an empirical algorithm with a maximum R-squared value of 0.65 [16]. In 2017, Alikas et al. applied partial band ratios for SD prediction in the Baltic Sea and inland waters of Northern Europe with an accuracy of 0.66–0.73 [24]. In our study, the R-squared values of the stepwise regression models were in the range of 0.57–0.86, which is similar to other studies [16,19,24,32,53]. Other researchers have explored the prediction of lake transparency using machine-learning algorithms, showing that such algorithms achieve higher accuracy more readily than traditional algorithms [19,20,21,22].

The combination of band and band ratio provided better estimates and higher R-squared for SD estimation in all models compared to a single band or band combinations, which is consistent with the findings of many studies [15,16,17,18,19,24,32,53]. With the band and band ratio combinations, the SVM model and the stepwise regression model performed similarly under a 1-day time window, and it can be speculated that the data exhibit a degree of linearity under this condition that is captured by both models. However, as the time window expands, the two models show different trends: the SD interpretation ability of the regression model gradually decreases, while the SVM model tends to stabilize, which we believe is because the regression model lacks the ability to identify linear relationships in a high-dimensional space. Additionally, the SVM performed better than the regression model in the face of multiple and complex data. Although the MLP performed well in all time windows, its accuracy decreased substantially compared to the SVM model and the regression model when estimating data beyond its own learning capability (using a 1-day time window for training and a 7-day time window for validation), which means that the model was not applicable to inferring the transparency of a wider region beyond the learning region.

When trained on the same spectral variables and the same training dataset, the machine-learning models estimated lake SD better than the traditional empirical algorithm: the SVM and MLP models had lower MAE and RMSE and higher R-squared values than the stepwise regression models (Tables S1 and S2). This indicates that machine-learning models explain the relationship between remotely sensed reflectance and lake SD with higher accuracy and a better fit than empirical models.
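
For readers who want to reproduce this kind of comparison, the three metrics can be computed as in the sketch below. It assumes that R-squared is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the function name and the sample values are hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return MAE, RMSE, and R-squared for paired SD values (in metres)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical measured and predicted Secchi depths.
metrics = regression_metrics([0.5, 1.0, 2.0, 4.0], [0.6, 0.9, 2.2, 3.8])
print(metrics)
```

A model with lower MAE and RMSE and higher R-squared on the same data is the better fit in the sense used above.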

In addition, although the empirical and SVM algorithms are not as accurate as the MLP algorithm, both produce acceptable accuracy on data not seen during training. More importantly, they are less constrained by the amount of observed data. Empirical regression and SVM models can therefore be considered useful candidates for lakes with limited field monitoring data. The MLP model predicts SD well; however, it requires a large in situ dataset for training and is better suited to large-scale lake datasets.

5. Conclusions

The purpose of this study was to obtain remote-sensing transparency information about lakes more effectively. Combining GEE with the ACOLITE algorithm made it possible to obtain accurate remote-sensing reflectance at measurement points online, quickly and efficiently. The effectiveness of the empirical, SVM, and MLP models for estimating the transparency of selected lakes in Yunnan was evaluated under different time windows. Combining bands and band ratios provided a stronger transparency interpretation capability than using only a single band or a combination of bands. Widening the time window could reduce the prediction accuracy of the models; however, combining bands and band ratios alleviated this effect. The MLP model was much more accurate than the regression and SVM models when predicting data it had been trained on, but not when the prediction target lay outside the training data. The empirical method was best trained with a 1-day time window, with which the model reached its highest accuracy and remained effective in predicting data outside the training data. The SVM method trained with a 7-day time window yielded a model that was more accurate, reliable, and stable than the empirical model on data outside the learning region, with results similar to those of the empirical model trained with a 1-day time window.

In general, we found that the MLP model better explains the relationship between remote-sensing reflectance and lake transparency when sufficient data are available and the prediction target is a lake to which the training data belong. The SVM model is expected to perform better when the target is a lake outside the study area.

Our work contributes to the use of remote sensing to monitor lakes in the Yunnan region. It has several limitations: remote-sensing reflectance extraction and transparency estimation were completed only at sample points; only some lakes in the Yunnan region were studied, so the results cannot fully represent applicability across the whole region; and, owing to the lack of field data, only field data from some time periods were used. Nevertheless, the combination of the ACOLITE algorithm and GEE provides a technical basis for online monitoring of regional lake transparency, and the model evaluation results provide an experimental basis for that next step. In the future, we will expand the temporal and spatial scales of the study and extend the transparency inversion from individual points to whole lakes, the whole of Yunnan Province, or an even larger scale. The supported time series will be extended from Landsat 8 back to the earliest date covered by the Landsat series of satellites. This work provides a framework and experimental basis for remote-sensing monitoring of lakes, and more broadly of water resources, in Yunnan Province.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app13064007/s1, Table S1: Statistics of the prediction effect of stepwise regression model on its own training dataset; Table S2: Statistics of the prediction effect of multi-layer perceptron model and support vector machines model on its own training dataset.

Author Contributions

Conceptualization, K.Y. and L.Z.; methodology, W.Z.; software, W.Z.; validation, W.Z., K.X. and S.C.; formal analysis, W.Z.; investigation, W.Z. and L.Z.; resources, L.Z.; data curation, W.Z.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z.; visualization, W.Z.; supervision, L.Z.; project administration, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (no. 41961019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SD	Secchi depth
ACOLITE	Atmospheric correction for OLI ‘lite’
SVM	Support vector machine
MLP	Multi-layer perceptron
SW	Stepwise regression
GEE	Google Earth Engine
Rrs	Remote-sensing reflectance for water pixels
TOA	Top-of-atmosphere reflectance

References

1. Rose, K.C.; Greb, S.R.; Diebel, M.; Turner, M.G. Annual precipitation regulates spatial and temporal drivers of lake water clarity. Ecol. Appl. 2017, 27, 632–643.
2. Fee, E.J.; Hecky, R.E.; Kasian, S.E.M.; Cruikshank, D.R. Effects of lake size, water clarity, and climatic variability on mixing depths in Canadian Shield lakes. Limnol. Oceanogr. 1996, 41, 912–920.
3. Miao, Q.; Liu, C.; Tan, X.H.; Liu, Z.Q.; Gao, Y. Eutrophication Assessment of Nansi Lake, China through Remote Sensing Technology. In Proceedings of the 1st International Conference on Future Computer and Communication (FCC 2009), Wuhan, China, 6–7 June 2009; pp. 28–31.
4. Cao, Q.; Yu, G.L.; Qiao, Z.Y. Application and recent progress of inland water monitoring using remote sensing techniques. Environ. Monit. Assess. 2023, 195, 125.
5. Dube, T.; Shekede, M.D.; Massari, C. Remote Sensing for Water Resources and Environmental Management. Remote Sens. 2023, 15, 18.
6. Cai, X.L.; Li, Y.M.; Lei, S.H.; Zeng, S.; Zhao, Z.L.; Lyu, H.; Dong, X.Z.; Li, J.D.; Wang, H.J.; Xu, J.; et al. A hybrid remote sensing approach for estimating chemical oxygen demand concentration in optically complex waters: A case study in inland lake waters in eastern China. Sci. Total Environ. 2023, 856, 158869.
7. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
8. Paul, A.; Vignesh, K.S.; Sood, A.; Bhaumik, S.; Singh, K.A.; Sethupathi, S.; Chanda, A. Suspended Particulate Matter Analysis of Pre and during Covid Lockdown Using Google Earth Engine Cloud Computing: A Case Study of Ukai Reservoir. Bull. Environ. Contam. Toxicol. 2023, 110, 7.
9. Pahlevan, N.; Lee, Z.; Wei, J.; Schaaf, C.B.; Schott, J.R.; Berk, A. On-orbit radiometric characterization of OLI (Landsat-8) for applications in aquatic remote sensing. Remote Sens. Environ. 2014, 154, 272–284.
10. Page, B.P.; Olmanson, L.G.; Mishra, D.R. A harmonized image processing workflow using Sentinel-2/MSI and Landsat-8/OLI for mapping water clarity in optically variable lake systems. Remote Sens. Environ. 2019, 231, 111284.
11. Lee, Z.P.; Shang, S.L.; Hu, C.M.; Du, K.P.; Weidemann, A.; Hou, W.L.; Lin, J.F.; Lin, G. Secchi disk depth: A new theory and mechanistic model for underwater visibility. Remote Sens. Environ. 2015, 169, 139–149.
12. Qin, Z.; Wen, Y.; Jiang, J.; Sun, Q. An improved algorithm for estimating the Secchi disk depth of inland waters across China based on Sentinel-2 MSI data. Environ. Sci. Pollut. Res. Int. 2023.
13. Tan, Z.Y.; Cao, Z.G.; Shen, M.; Chen, J.; Song, Q.J.; Duan, H.T. Remote Estimation of Water Clarity and Suspended Particulate Matter in Qinghai Lake from 2001 to 2020 Using MODIS Images. Remote Sens. 2022, 14, 3094.
14. Roy, S.; Ojha, S.R.; Reddy, N.N.; Samal, R.N.; Das, B.S. Suspended particulate matter and secchi disk depth in the Chilika Lagoon from in situ and remote sensing data: A modified semi-analytical approach. Int. J. Remote Sens. 2022, 43, 3628–3654.
15. Binding, C.E.; Greenberg, T.A.; Watson, S.B.; Rastin, S.; Gould, J. Long term water clarity changes in North America’s Great Lakes from multi-sensor satellite observations. Limnol. Oceanogr. 2015, 60, 1976–1995.
16. Deutsch, E.S.; Cardille, J.A.; Koll-Egyed, T.; Fortin, M.J. Landsat 8 Lake Water Clarity Empirical Algorithms: Large-Scale Calibration and Validation Using Government and Citizen Science Data from across Canada. Remote Sens. 2021, 13, 1257.
17. Deutsch, E.S.; Fortin, M.J.; Cardille, J.A. Assessing the current water clarity status of ~100,000 lakes across southern Canada: A remote sensing approach. Sci. Total Environ. 2022, 826, 153971.
18. Li, Y.; Shi, K.; Zhang, Y.; Zhu, G.; Zhang, Y.; Wu, Z.; Liu, M.; Guo, Y.; Li, N. Analysis of water clarity decrease in Xin’anjiang Reservoir, China, from 30-Year Landsat TM, ETM+, and OLI observations. J. Hydrol. 2020, 590, 125476.
19. Rubin, H.J.; Lutz, D.A.; Steele, B.G.; Cottingham, K.L.; Weathers, K.C.; Ducey, M.J.; Palace, M.; Johnson, K.M.; Chipman, J.W. Remote Sensing of Lake Water Clarity: Performance and Transferability of Both Historical Algorithms and Machine Learning. Remote Sens. 2021, 13, 1434.
20. Lee, C.C.; Barnes, B.B.; Sheridan, S.C.; Smith, E.T.; Hu, C.M.; Pirhalla, D.E.; Ransibrahmanakul, V.; Adams, R. Using machine learning to model and predict water clarity in the Great Lakes. J. Great Lakes Res. 2020, 46, 1501–1510.
21. Maciel, D.A.; Barbosa, C.C.F.; Novo, E.M.L.d.M.; Flores Júnior, R.; Begliomini, F.N. Water clarity in Brazilian water assessed using Sentinel-2 and machine learning methods. ISPRS J. Photogramm. Remote Sens. 2021, 182, 134–152.
22. He, Y.; Lu, Z.; Wang, W.J.; Zhang, D.; Zhang, Y.L.; Qin, B.Q.; Shi, K.; Yang, X.F. Water clarity mapping of global lakes using a novel hybrid deep-learning-based recurrent model with Landsat OLI images. Water Res. 2022, 215, 118241.
23. Cao, Z.; Duan, H.; Feng, L.; Ma, R.; Xue, K. Climate- and human-induced changes in suspended particulate matter over Lake Hongze on short and long timescales. Remote Sens. Environ. 2017, 192, 98–113.
24. Alikas, K.; Kratzer, S. Improved retrieval of Secchi depth for optically-complex waters using remote sensing data. Ecol. Indic. 2017, 77, 218–227.
25. Kim, J.; Seo, D.; Jang, M.; Kim, J. Augmentation of limited input data using an artificial neural network method to improve the accuracy of water quality modeling in a large lake. J. Hydrol. 2021, 602, 126817.
26. Li, S.; Song, K.; Wang, S.; Liu, G.; Wen, Z.; Shang, Y.; Lyu, L.; Chen, F.; Xu, S.; Tao, H.; et al. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 2021, 778, 146271.
27. Sagan, V.; Peterson, K.T.; Maimaitijiang, M.; Sidike, P.; Sloan, J.; Greeling, B.A.; Maalouf, S.; Adams, C. Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Sci. Rev. 2020, 205, 103187.
28. Xu, M.; Liu, H.; Beck, R.; Lekki, J.; Yang, B.; Shu, S.; Kang, E.L.; Anderson, R.; Johansen, R.; Emery, E.; et al. A spectral space partition guided ensemble method for retrieving chlorophyll-a concentration in inland waters from Sentinel-2A satellite imagery. J. Great Lakes Res. 2019, 45, 454–465.
29. Ma, R.; Yang, G.; Duan, H.; Jiang, J.; Wang, S.; Feng, X.; Li, A.; Kong, F.; Xue, B.; Wu, J.; et al. China’s lakes at present: Number, area and spatial distribution. Sci. China Earth Sci. 2010, 54, 283–289.
30. Matthews, M.W. A current review of empirical procedures of remote sensing in inland and near-coastal transitional waters. Int. J. Remote Sens. 2011, 32, 6855–6899.
31. Song, K.; Liu, G.; Wang, Q.; Wen, Z.; Lyu, L.; Du, Y.; Sha, L.; Fang, C. Quantification of lake clarity in China using Landsat OLI imagery data. Remote Sens. Environ. 2020, 243, 111800.
32. Olmanson, L.G.; Brezonik, P.L.; Finlay, J.C.; Bauer, M.E. Comparison of Landsat 8 and Landsat 7 for regional measurements of CDOM and water clarity in lakes. Remote Sens. Environ. 2016, 185, 119–128.
33. Rodríguez-López, L.; Duran-Llacer, I.; González-Rodríguez, L.; Cardenas, R.; Urrutia, R. Retrieving Water Turbidity in Araucanian Lakes (South-Central Chile) Based on Multispectral Landsat Imagery. Remote Sens. 2021, 13, 3133.
34. Renosh, P.R.; Doxaran, D.; Keukelaere, L.D.; Gossn, J.I. Evaluation of Atmospheric Correction Algorithms for Sentinel-2-MSI and Sentinel-3-OLCI in Highly Turbid Estuarine Waters. Remote Sens. 2020, 12, 1285.
35. Li, Q.; Jiang, L.L.; Chen, Y.L.; Wang, L.; Wang, L.X. Evaluation of seven atmospheric correction algorithms for OLCI images over the coastal waters of Qinhuangdao in Bohai Sea. Reg. Stud. Mar. Sci. 2022, 56, 102711.
36. Kloiber, S.M.; Brezonik, P.L.; Olmanson, L.G.; Bauer, M.E. A procedure for regional lake water clarity assessment using Landsat multispectral data. Remote Sens. Environ. 2002, 82, 38–47.
37. Zhang, Y.; Zhang, Y.; Shi, K.; Zhou, Y.; Li, N. Remote sensing estimation of water clarity for various lakes in China. Water Res. 2021, 192, 116844.
38. Deutsch, E.S.; Alameddine, I.; El-Fadel, M. Monitoring water quality in a hypereutrophic reservoir using Landsat ETM+ and OLI sensors: How transferable are the water quality algorithms? Environ. Monit. Assess. 2018, 190, 141.
39. Hicks, B.J.; Stichbury, G.A.; Brabyn, L.K.; Allan, M.G.; Ashraf, S. Hindcasting water clarity from Landsat satellite images of unmonitored shallow lakes in the Waikato region, New Zealand. Environ. Monit. Assess. 2013, 185, 7245–7261.
40. Olmanson, L.G.; Brezonik, P.L.; Bauer, M.E. Evaluation of medium to low resolution satellite imagery for regional lake water quality assessments. Water Resour. Res. 2011, 47, 1–14.
41. Yu, D.F.; Zhou, B.; Zhang, X.Q.; Xie, W.H.; Liu, E.X. Retrieval of Secchi disk depth in offshore marine areas based on simulated HICO from in situ hyperspectral data. In Proceedings of the International Conference on Intelligent Earth Observing and Applications (IEOAs), Guilin, China, 23–24 October 2015; Volume 9808.
42. Lathrop, R.C.; Carpenter, S.R.; Rudstam, L.G. Water clarity in Lake Mendota since 1900: Responses to differing levels of nutrients and herbivory. Can. J. Fish. Aquat. Sci. 1996, 53, 2250–2261.
43. Wu, G.; De Leeuw, J.; Skidmore, A.K.; Prins, H.H.T.; Liu, Y. Comparison of MODIS and Landsat TM5 images for mapping tempo–spatial dynamics of Secchi disk depths in Poyang Lake National Nature Reserve, China. Int. J. Remote Sens. 2008, 29, 2183–2198.
44. Olmanson, L.G.; Bauer, M.E.; Brezonik, P.L. A 20-year Landsat water clarity census of Minnesota’s 10,000 lakes. Remote Sens. Environ. 2008, 112, 4086–4097.
45. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
46. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
47. Zhao, Q.; Shang, Z. Deep learning and Its Development. J. Phys. Conf. Ser. 2021, 1948, 012023.
48. Cui, Y.H.; Yan, Z.N.; Wang, J.; Hao, S.; Liu, Y.C. Deep learning-based remote sensing estimation of water transparency in shallow lakes by combining Landsat 8 and Sentinel 2 images. Environ. Sci. Pollut. Res. 2021, 29, 4401–4413.
49. Caldeira, J. Deeply Uncertain: Comparing Methods of Uncertainty Quantification in Deep Learning Algorithms [Slides]; Fermi National Accelerator Lab. (FNAL): Batavia, IL, USA, 2020.
50. Hüllermeier, E.; Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Mach. Learn. 2021, 110, 457–506.
51. Kwon, Y.S.; Baek, S.H.; Lim, Y.K.; Pyo, J.; Ligaray, M.; Park, Y.; Cho, K.H. Monitoring Coastal Chlorophyll-a Concentrations in Coastal Areas Using Machine Learning Models. Water 2018, 10, 1020.
52. Nazeer, M.; Bilal, M.; Alsahli, M.M.M.; Shahzad, M.I.; Waqas, A. Evaluation of Empirical and Machine Learning Algorithms for Estimation of Coastal Water Quality Parameters. ISPRS Int. J. Geo-Inf. 2017, 6, 360.
53. Kabiri, K.; Moradi, M. Landsat-8 imagery to estimate clarity in near-shore coastal waters: Feasibility study—Chabahar Bay, Iran. Cont. Shelf Res. 2016, 125, 44–53.

Figure 1. The distribution of the lakes in this study in Yunnan Province, China. (a) Distribution map of rivers and lakes in China. The red box indicates Yunnan Province, where the study area is located. (b) Distribution map of rivers and lakes in Yunnan Province. Red boxes indicate the four lakes investigated in this study. (c) Geographical location of Erhai Lake. (d) Geographical location of Dianchi Lake and Fuxian Lake. (e) Geographical location of Yilong Lake.

Figure 2. The relationship between the number of parameters in the stepwise regression model and the prediction accuracy of the model. (a) The results of the stepwise regression model using a single band or band combination spectral variables with different training datasets. (b) The results of the stepwise regression model using band combination and band ratio combination spectral variables with different training datasets. The experiment number corresponds to Table 3. The blue line corresponds to the 1-day time window dataset, the orange line corresponds to the 3-day time window dataset, and the green line corresponds to the 7-day time window dataset.

Figure 3. The relationship between the time window and the model R-squared of the MLP and SVM models. The experiment number corresponds to Table 3. The blue line represents the MLP model using spectral variable group 1. The orange line represents the MLP model using spectral variable group 2. The green line represents the SVM model using spectral variable group 1. The pink line represents the SVM model using spectral variable group 2. Group 1 represents the spectral variables that use a single band or band combination. Group 2 represents the spectral variables that use a band combination and band ratio arrangement.

Figure 4. Scatter plot of the measured Secchi depth (SD) and predicted SD of the optimal solutions for different groups of spectral variables corresponding to MLP and SVM models. (a) The optimal solutions for the MLP model for different groups of spectral variables. (b) The optimal solutions for the SVM model for different groups of spectral variables. The experiment number corresponds to Table 3. The blue points represent models using spectral variables in a single band or band combination. The orange points represent models using spectral variables in band combination and band ratio combination.

Table 1. Statistics of the measured SD for the four lakes. CV (coefficient of variation) = Stdev/Mean.

	Dianchi Lake (n = 696)	Erhai Lake (n = 1186)	Fuxian Lake (n = 504)	Yilong Lake (n = 358)
Min (m)	0.14	0.53	3.00	0.08
Max (m)	1.70	4.90	9.00	2.00
Mean (m)	0.47	1.92	5.59	0.38
Median (m)	0.45	1.80	5.50	0.30
Stdev (m)	0.15	0.55	0.79	0.24
CV	0.33	0.29	0.14	0.65

Table 2. Secchi depth statistics of 1-day, 3-day, and 7-day time window matching data. Coefficient of variation (CV) = Stdev/Mean.

	1 Day (n = 112)	3 Days (n = 254)	7 Days (n = 432)
Min (m)	0.18	0.18	0.15
Max (m)	7.50	7.50	8.00
Mean (m)	2.01	1.88	2.02
Stdev (m)	2.09	1.86	1.97
CV	1.04	0.99	0.97
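
The time-window matching behind Table 2 can be sketched as follows. This is a minimal illustration that assumes a symmetric window (absolute difference in days between the in situ sampling date and the image acquisition date) and pairs each sample with its closest image; the function name, the tuple layout, and the dates are hypothetical, and the authors' exact matching rule may differ.

```python
from datetime import date


def match_within_window(in_situ, overpasses, window_days):
    """Pair each in situ sample with the closest overpass within the window.

    in_situ: list of (sample_date, secchi_depth_m) tuples.
    overpasses: list of image acquisition dates.
    window_days: maximum allowed absolute date difference (assumed symmetric).
    """
    matches = []
    for sample_date, sd in in_situ:
        candidates = [
            (abs((sample_date - img).days), img)
            for img in overpasses
            if abs((sample_date - img).days) <= window_days
        ]
        if candidates:
            gap, img = min(candidates)  # smallest gap wins
            matches.append((sample_date, img, gap, sd))
    return matches


# Hypothetical sampling dates, Secchi depths (m), and image dates.
samples = [(date(2020, 5, 1), 1.8), (date(2020, 5, 6), 0.4), (date(2020, 5, 20), 5.5)]
images = [date(2020, 5, 2), date(2020, 5, 18)]
counts = {w: len(match_within_window(samples, images, w)) for w in (1, 3, 7)}
print(counts)
```

As in Table 2, a wider window admits more matched pairs, at the cost of a larger possible gap between the field measurement and the image.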

Table 3. The experiments conducted in this study and their relevant details. Group 1 represents the spectral variables that use a single band or band combination. Group 2 represents the spectral variables that use band combination and band ratio.

Experiment Number	Model	Training Dataset (Time Window)	Spectral Group
NO.1	Stepwise regression	1-day	Group 1
NO.2	Stepwise regression	3-day	Group 1
NO.3	Stepwise regression	7-day	Group 1
NO.4	Stepwise regression	1-day	Group 2
NO.5	Stepwise regression	3-day	Group 2
NO.6	Stepwise regression	7-day	Group 2
NO.7	Multi-layer perceptron	1-day	Group 1
NO.8	Multi-layer perceptron	3-day	Group 1
NO.9	Multi-layer perceptron	7-day	Group 1
NO.10	Multi-layer perceptron	1-day	Group 2
NO.11	Multi-layer perceptron	3-day	Group 2
NO.12	Multi-layer perceptron	7-day	Group 2
NO.13	Support vector machines	1-day	Group 1
NO.14	Support vector machines	3-day	Group 1
NO.15	Support vector machines	7-day	Group 1
NO.16	Support vector machines	1-day	Group 2
NO.17	Support vector machines	3-day	Group 2
NO.18	Support vector machines	7-day	Group 2

Table 4. The optimal stepwise regression models for the two groups of spectral variables and their details. Group 1 represents the spectral variables that use a single band or band combination. Group 2 represents the spectral variables that use band combination and band ratio arrangement. The experiment number corresponds to Table 3.

Experiment Number	Model	Training Dataset (Time Window)	Spectral Group	Parameter	Coefficient	R-Squared
NO.1	Stepwise regression	1-day	Group 1	Intercept	4.73 × 10²	0.6291
				Rrs655	−3.55 × 10⁴
				Rrs443	2.08 × 10⁴
NO.4	Stepwise regression	1-day	Group 2	Intercept	4.05 × 10²	0.8641
				Rrs655	−6.87 × 10⁴
				Rrs655/Rrs483	−1.72 × 10⁷
				Rrs443	−1.28 × 10⁵
				Rrs443/Rrs561	−4.30 × 10⁶
				Rrs561	−8.54 × 10⁴
				Rrs561/Rrs655	7.94 × 10⁶
				Rrs483	2.57 × 10⁵
				Rrs483/Rrs443	2.97 × 10⁶
				Rrs655/Rrs443	1.17 × 10⁷

Table 5. Effect of the time window of the training set on the prediction results of the stepwise regression model (with spectral variable group 2 as an example). Group 2 represents the spectral variables that used a band combination and band ratio arrangement. The experiment number corresponds to Table 3.

Experiment Number	Model	Training Dataset (Time Window)	Spectral Group	Testing Dataset (Time Window)	R-Squared
NO.4	Stepwise regression	1-day	Group 2	1-day	0.8641
NO.4		1-day	Group 2	7-day	0.7537
NO.6		7-day	Group 2	1-day	0.8258
NO.6		7-day	Group 2	7-day	0.7826

Table 6. Influence of the time window of the training set on the prediction results of the machine-learning model (in the case of spectral variable group 2). Group 2 represents the spectral variables that use a band combination and band ratio combination. The experiment number corresponds to Table 3.

Experiment Number	Model	Training Dataset (Time Window)	Spectral Group	Testing Dataset (Time Window)	R-Squared
NO.10	Multi-layer perceptron	1-day	Group 2	1-day	0.988
NO.10		1-day	Group 2	7-day	0.5731
NO.12		7-day	Group 2	1-day	0.9812
NO.12		7-day	Group 2	7-day	0.9721
NO.16	Support vector machines	1-day	Group 2	1-day	0.865
NO.16		1-day	Group 2	7-day	0.7671
NO.18		7-day	Group 2	1-day	0.8686
NO.18		7-day	Group 2	7-day	0.869

Table 7. Difference in prediction results between models trained with the 1-day time window dataset and models trained with other time window datasets (using the 1-day time window data as the testing dataset). Group 1 represents the spectral variables that use a single band or band combination. Group 2 represents the spectral variables that use a band combination and band ratio arrangement. The 1- to 3-day, 1- to 7-day, and 3- to 7-day training datasets were obtained by subtracting one time window dataset from another among the 1-, 3-, and 7-day datasets. The p-value comes from the t-test; a p-value below 0.05 indicates a significant difference.

Model	Testing Dataset (Time Window)	Spectral Group	Training Dataset, Sample 1 (Time Window)	Training Dataset, Sample 2 (Time Window)	Statistic	p-Value
Stepwise regression	1-day	Group 1	1-day	3-day	0.9482	0.3441
		Group 1		7-day	0.3431	0.7318
		Group 1		1- to 3-day	0.6366	0.525
		Group 1		1- to 7-day	0.3412	0.7332
		Group 1		3- to 7-day	0.6184	0.5369
		Group 2		3-day	0.3879	0.6984
		Group 2		7-day	0.2418	0.8092
		Group 2		1- to 3-day	0.2995	0.7648
		Group 2		1- to 7-day	−0.0424	0.9662
		Group 2		3- to 7-day	0.2993	0.765
Multi-layer perceptron		Group 1		3-day	−0.1351	0.8926
		Group 1		7-day	0.1541	0.8776
		Group 1		1- to 3-day	0.022	0.9825
		Group 1		1- to 7-day	−0.5074	0.6124
		Group 1		3- to 7-day	−0.3291	0.7424
		Group 2		3-day	0.0783	0.9377
		Group 2		7-day	−0.0545	0.9566
		Group 2		1- to 3-day	−0.693	0.489
		Group 2		1- to 7-day	−0.3651	0.7154
		Group 2		3- to 7-day	0.2928	0.77
Support vector machines		Group 1		3-day	0.3482	0.728
		Group 1		7-day	0.5229	0.6016
		Group 1		1- to 3-day	1.5386	0.1253
		Group 1		1- to 7-day	0.358	0.7207
		Group 1		3- to 7-day	0.7433	0.4581
		Group 2		3-day	0.1303	0.8964
		Group 2		7-day	−0.773	0.4403
		Group 2		1- to 3-day	−1.3009	0.1946
		Group 2		1- to 7-day	−1.2284	0.2206
		Group 2		3- to 7-day	0.1303	0.8964
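
The statistic column in Table 7 comes from a t-test comparing two sets of predictions. The caption does not state which variant was used, so the sketch below assumes Welch's unequal-variance two-sample t-test purely for illustration; the function name and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    v1, v2 = variance(sample1), variance(sample2)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    if se2 == 0:
        raise ValueError("t is undefined when both samples have zero variance")
    t = (mean(sample1) - mean(sample2)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical SD predictions (m) from two models on the same test points.
pred_a = [0.5, 1.9, 5.4, 0.4]
pred_b = [0.6, 1.8, 5.6, 0.3]
t_stat, dof = welch_t(pred_a, pred_b)
print(round(t_stat, 4), round(dof, 2))
```

Converting the statistic and degrees of freedom into a p-value requires the Student t distribution, which the Python standard library does not provide. A statistic close to zero, as in most rows of Table 7, corresponds to a large p-value and therefore no significant difference between the two sets of predictions.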


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

