Estimation of the Water Level in the Ili River from Sentinel-2 Optical Data Using Ensemble Machine Learning
Abstract
1. Introduction
- We obtained state-of-the-art results for determining the water level in the Ili River from optical remote sensing data using machine learning methods.
- We compared the machine learning algorithms and found that ensemble machine learning methods (Random Forest, eXtreme Gradient Boosting (XGBoost) and LightGBM) demonstrated the best and most robust water level estimation results.
- We identified a set of input variables and corresponding feature engineering techniques that significantly improve the original result and yield a good value of the Nash–Sutcliffe model efficiency coefficient.
- For the proposed model, we identified the input parameters that keep it stable with respect to the volume and quality of the input data.
- Section 2 briefly describes the study area.
- The current state of the research area is discussed in Section 3.
- In Section 4, we describe the proposed method.
- Section 5 describes the results.
- Section 6 is devoted to discussion of the results.
- Finally, we refer to the limitations of the proposed approach and formulate the objectives of future research.
2. Study Area
3. Related Works
4. Method
- Formation of Sentinel-2 satellite imagery dataset.
- Preprocessing and feature engineering.
- Training and tuning of machine learning models.
- Evaluation of results using a specified set of quality metrics.
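A minimal sketch of this four-step pipeline in Python (assuming a CSV export of the dataset described in Section 4.1.2; the file name and column names are illustrative, not the authors' code):

```python
# Sketch of the four-step pipeline: load dataset, engineer features, train, evaluate.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score
from xgboost import XGBRegressor

df = pd.read_csv("ili_river_dataset.csv", parse_dates=["Date"])  # step 1: dataset

# Step 2: preprocessing / feature engineering (details in Section 5.1).
df["month"] = df["Date"].dt.month
df["year"] = df["Date"].dt.year
X = df.drop(columns=["Mean", "Date"])
y = df["Mean"]  # target: mean daily water level, cm

# Step 3: train a model (here XGBoost with default settings; tuning in Appendix B).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor().fit(X_train, y_train)

# Step 4: evaluate with the quality metrics used in the paper.
pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred), "NSE/R2:", r2_score(y_test, pred))
```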
4.1. Generation of the Dataset
4.1.1. Method of Data Preparation
4.1.2. Datasets
- Mean—average water level obtained from 8 a.m. and 12 p.m. measurements (target value).
- Date—date of measurements.
- pixelCount—number of pixels in the river mask.
- pixelCount_Clo—number of pixels in the image distorted by cloudiness.
Date | NDVI | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 | B11 | B12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 March 2017 | −0.133 | 0.1591 | 0.1368 | 0.1267 | 0.1327 | 0.1340 | 0.1113 | 0.1129 | 0.1027 | 0.0406 | 0.0026 | 0.0743 | 0.0538 |
29 March 2017 | 0.0188 | 0.6219 | 0.6118 | 0.5793 | 0.6180 | 0.6302 | 0.6395 | 0.6554 | 0.6413 | 0.4458 | 0.1065 | 0.4236 | 0.3761 |
5 April 2017 | 0.0077 | 0.1860 | 0.1634 | 0.1524 | 0.1649 | 0.1721 | 0.1707 | 0.1787 | 0.1674 | 0.0768 | 0.0312 | 0.1130 | 0.0937 |
28 April 2017 | −0.021 | 0.1931 | 0.1759 | 0.1694 | 0.1765 | 0.1801 | 0.1787 | 0.1894 | 0.1738 | 0.0753 | 0.0151 | 0.1373 | 0.1090 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … |
30 November 2021 | −0.306 | 0.167 | 0.14 | 0.119 | 0.104 | 0.097 | 0.069 | 0.066 | 0.06 | 0.033 | 0.002 | 0.034 | 0.024 |
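For the spectral features listed above, the following minimal sketch shows how area-averaged band values and NDVI could be computed over the river mask (the `bands` dictionary and `mask` array are assumptions for illustration, not the authors' code):

```python
# Sketch: area-averaged band reflectances over the river mask and NDVI,
# assuming `bands` is a dict of 2D reflectance arrays keyed "B1".."B12"
# and `mask` is a boolean river mask of the same shape.
import numpy as np

def band_features(bands: dict, mask: np.ndarray) -> dict:
    feats = {name: float(arr[mask].mean()) for name, arr in bands.items()}
    # NDVI from the area-averaged red (B4) and NIR (B8) reflectances
    feats["NDVI"] = (feats["B8"] - feats["B4"]) / (feats["B8"] + feats["B4"])
    feats["pixelCount"] = int(mask.sum())  # river surface area in pixels
    return feats
```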
4.2. Machine Learning Models
5. Results
5.1. Preprocessing and Feature Engineering
- Columns “pixelCount_Clo”, “month”, “year” were added (NSE = 0.79).
- The values for the year 2017 were removed, since many anomalous values were found (NSE = 0.81). The records were cleared of anomalous values as follows. First, the mean pixelCount value was calculated. Then, rows were removed in which the pixelCount value is high (more than kσ above its mean) while the water level is low (more than kσ below its mean), and vice versa, where σ is the standard deviation of the values and k is an empirical coefficient controlling the allowable spread of the data (the best NSE value was obtained at k = 0.5). A minimal sketch of this filter follows this list.
- The gradient of the total river surface area (the sign of the difference between the current and previous pixelCount values) and the season (1—spring, 2—summer, 3—autumn) were added (NSE = 0.87).
- pixelCount value for previous date was added (NSE = 0.881).
- Area-averaged values of the spectral bands were added (NSE = 0.892).
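A minimal sketch of the anomaly filter described above, assuming the dataset columns from Section 4.1.2 and the coefficient k defined in the list (an interpretation of the rule, not the authors' code):

```python
# Sketch of the anomaly filter: drop rows where pixelCount is high (more than
# k*sigma above its mean) while the water level is low (more than k*sigma
# below its mean), and vice versa. k = 0.5 gave the best NSE.
import pandas as pd

def drop_anomalies(df: pd.DataFrame, k: float = 0.5) -> pd.DataFrame:
    pc_mu, pc_sd = df["pixelCount"].mean(), df["pixelCount"].std()
    lv_mu, lv_sd = df["Mean"].mean(), df["Mean"].std()
    high_pc_low_lv = (df["pixelCount"] > pc_mu + k * pc_sd) & (df["Mean"] < lv_mu - k * lv_sd)
    low_pc_high_lv = (df["pixelCount"] < pc_mu - k * pc_sd) & (df["Mean"] > lv_mu + k * lv_sd)
    return df[~(high_pc_low_lv | low_pc_high_lv)]
```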
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 12.457 | 277.378 | 0.892 | 0.948 | 2.211 | 0.001 | 26.6537 |
RF | 13.093 | 313.232 | 0.876 | 0.939 | 2.447 | 0.001 | 63.1779 |
LR | 16.47 | 487.59 | 0.808 | 0.906 | 3.527 | 0.003 | 0.2334 |
Lasso | 15.898 | 459.063 | 0.819 | 0.911 | 3.27 | 0.003 | 0.2753 |
ElasticNet | 16.824 | 509.744 | 0.798 | 0.9 | 3.559 | 0.003 | 0.2084 |
LGBM | 13.316 | 316.898 | 0.875 | 0.938 | 2.524 | 0.002 | 15.0877 |
Ridge | 14.189 | 363.815 | 0.855 | 0.929 | 2.829 | 0.003 | 0.432 |
SVM | 14.767 | 406.513 | 0.839 | 0.92 | 3.482 | 0.003 | 1.1559 |
5.2. Robustness Analysis
- (a) Using the full dataset for the period from 2017 to 2021:
- 1. Without removal of anomalous values (277 records);
- 2. Removal of anomalous values at k = 1.0 (270 records);
- 3. Removal of anomalous values at k = 0.5 (270 records).
- (b) Using a reduced dataset for the period from 2018 to 2021:
- 4. Without removal of anomalous values (244 records);
- 5. Removal of anomalous values at k = 1.0 (244 records);
- 6. Removal of anomalous values at k = 0.5 (232 records);
- 7. Additionally, all records with water levels greater than 280 cm were deleted (164 records). A sketch enumerating these variants is given after this list.
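The seven dataset variants can be summarized in a minimal sketch that reuses the hypothetical drop_anomalies() filter from Section 5.1 (variable and column names are illustrative):

```python
# Sketch of the seven dataset variants used in the robustness analysis.
def dataset_variants(df):
    full = df                           # (a) 2017-2021
    reduced = df[df["year"] >= 2018]    # (b) 2018-2021
    return {
        1: full,
        2: drop_anomalies(full, k=1.0),
        3: drop_anomalies(full, k=0.5),
        4: reduced,
        5: drop_anomalies(reduced, k=1.0),
        6: drop_anomalies(reduced, k=0.5),
        7: drop_anomalies(reduced, k=0.5).query("Mean <= 280"),
    }
```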
Regression Model | MAE | NSE | R |
---|---|---|---|
XGB | 17.088 | 0.812 | 0.907 |
RF | 17.427 | 0.776 | 0.887 |
LR | 34.667 | 0.28 | 0.569 |
Lasso | 34.61 | 0.282 | 0.57 |
ElasticNet | 34.665 | 0.29 | 0.571 |
LGBM | 18.858 | 0.764 | 0.879 |
Ridge | 28.191 | 0.493 | 0.719 |
SVM | 28.941 | 0.415 | 0.687 |
MLP | 31.728 | 0.393 | 0.697 |
Regression Model | MAE | NSE | R |
---|---|---|---|
XGB | 14.039 | 0.827 | 0.914 |
RF | 14.256 | 0.808 | 0.904 |
LR | 30.52 | 0.223 | 0.529 |
Lasso | 30.43 | 0.231 | 0.534 |
ElasticNet | 30.598 | 0.241 | 0.532 |
LGBM | 15.499 | 0.788 | 0.891 |
Ridge | 23.385 | 0.516 | 0.735 |
SVM | 24.374 | 0.429 | 0.701 |
MLP | 19.075 | 0.642 | 0.831 |
6. Discussion
Regression Model | MAE | NSE | R |
---|---|---|---|
XGB | 37.219 | 0.766 | 0.88 |
RF | 38.169 | 0.756 | 0.876 |
LR | 53.537 | 0.528 | 0.746 |
Lasso | 54.367 | 0.515 | 0.738 |
ElasticNet | 62.273 | 0.431 | 0.688 |
LGBM | 40.808 | 0.72 | 0.854 |
Ridge | 47.68 | 0.586 | 0.779 |
SVM | 49.489 | 0.585 | 0.779 |
- Limited spatial resolution, making the method suitable only for relatively large rivers.
- Dependence on the state of the atmosphere: the water level cannot be determined under significant cloud cover.
- Limited applicability of the model to other regions with different riverbed morphology.
- The high accuracy of the method depends on expert delineation of the riverbed.
7. Conclusions
- To assess the applicability of the methods for other large rivers of South Kazakhstan.
- To evaluate the possibility of using SAR data to improve the accuracy of estimating the width of the river bed.
- To apply image processing methods based on deep learning models (e.g., convolutional neural networks), which will require a significantly larger input dataset.
- To investigate the relationship between the length of the virtual gauging station and the accuracy of the forecast.
- To tune the parameters and hyperparameters of the machine learning models more precisely using, for example, evolutionary programming.
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ANN | Artificial Neural Network |
BiLSTM | Bidirectional LSTM |
CNN | Convolution Neural Network |
DT | decision tree |
GPR | Gaussian process regression |
LSSVM | least squares support vector machine |
LSTM | long short-term memory |
MAE | Mean Absolute Error |
MARS | multivariate adaptive regression spline |
ML | Machine Learning |
MLP | multilayer perceptron |
MSE | Mean Squared Error |
NSE (R2) | Nash–Sutcliffe model efficiency coefficient |
RBF | Radial Basis Function Neural Network |
RF | Random Forest |
SHAP | Shapley Additive exPlanations |
SVM | Support Vector Machine |
SVR | Support Vector Regression |
XGB | eXtreme Gradient Boosting |
Appendix A
Sentinel-2 Bands | Sentinel-2A Central Wavelength (nm) | Sentinel-2A Bandwidth (nm) | Sentinel-2B Central Wavelength (nm) | Sentinel-2B Bandwidth (nm) | Spatial Resolution (m) |
---|---|---|---|---|---|
Band 1—Coastal aerosol | 442.7 | 21 | 442.2 | 21 | 60 |
Band 2—Blue | 492.4 | 66 | 492.1 | 66 | 10 |
Band 3—Green | 559.8 | 36 | 559.0 | 36 | 10 |
Band 4—Red | 664.6 | 31 | 664.9 | 31 | 10 |
Band 5—Vegetation red edge | 704.1 | 15 | 703.8 | 16 | 20 |
Band 6—Vegetation red edge | 740.5 | 15 | 739.1 | 15 | 20 |
Band 7—Vegetation red edge | 782.8 | 20 | 779.7 | 20 | 20 |
Band 8—NIR | 832.8 | 106 | 832.9 | 106 | 10 |
Band 8A—Narrow NIR | 864.7 | 21 | 864.0 | 22 | 20 |
Band 9—Water vapor | 945.1 | 20 | 943.2 | 21 | 60 |
Band 10—SWIR—Cirrus | 1373.5 | 31 | 1376.9 | 30 | 60 |
Band 11—SWIR | 1613.7 | 91 | 1610.4 | 94 | 20 |
Band 12—SWIR | 2202.4 | 175 | 2185.7 | 185 | 20 |
Appendix B. Tuning Parameters and Hyperparameters of Machine Learning Models
- (1)
- (2)
- In the second step, the hyperparameters were tuned using grid search (GridSearchCV()). The candidate algorithms were trained on a single split of the dataset into training and test subsets. Table A2 lists the tunable hyperparameters of the regression models and the best combinations found with GridSearchCV().
Regression Model | Model Parameters | Best Params |
---|---|---|
RF | ‘max_depth’: [32, 16, 8, 4, 2], ‘n_estimators’: [50, 100, 400, 1000], ‘max_features’: [4, 7, 14] | max_depth = 16, max_features = 4, n_estimators = 100 |
SVR | ‘kernel’: [‘linear’,’rbf’], ‘C’: [0.015, 0.03, 0.05, 0.025, 0.03, 1, 100, 1000, 2000, 3000], ‘gamma’: [0.01, 0.08, 0.1, 0.15, 0.2, 0.25, 1] | C = 1000, Gamma = 0.2, Kernel = ‘rbf’ |
LGBM | ‘learning_rate’: [0.01, 0.1, 0.25, 0.6, 0.7], ‘max_depth’: [32, 16, 8, 4, 2], ‘n_estimators’: [1000, 400, 50, 100], ‘min_child_samples’: [2, 10, 20, 50], ‘min_child_weight’: [0.0001, 0.001, 0.01, 0.1, 1,2] | learning_rate = 0.01, max_depth = 2, min_child_weight = 2, min_child_samples = 2, n_estimators = 1000 |
XGB | ‘gamma’: [0, 0.1, 0.2, 0.8, 3.2, 12.8, 25.6, 102.4, 200], ‘learning_rate’: [0.01, 0.1, 0.25, 0.6, 0.7], ‘max_depth’: [32, 16, 8, 4, 2], ‘n_estimators’: [50, 100, 400, 1000], ‘colsample_bytree’: [0.1, 0.2, 0.4], ‘min_child_weight’: [2, 4, 6] | gamma = 0.0, learning_rate = 0.25, max_depth = 8, n_estimators = 400, colsample_bytree = 0.1, min_child_weight = 2 |
- (3)
- All algorithms were then evaluated using 50 random splits into training and test sets. Algorithms with poor performance, as well as those whose execution time was far (several times) above average (some SVM-based regression models), were excluded. A sketch of steps (2) and (3) is given after this list.
- (4)
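As an illustration of steps (2) and (3), here is a minimal sketch assuming scikit-learn's GridSearchCV and ShuffleSplit, with the XGBoost grid from Table A2; X, y, X_train, and y_train are the feature matrix and target from the earlier sketches (illustrative names, not the authors' code):

```python
# Sketch of hyperparameter tuning (step 2) and repeated-split evaluation (step 3).
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from xgboost import XGBRegressor

param_grid = {  # grid values taken from Table A2 (the full grid is large and slow to search)
    "gamma": [0, 0.1, 0.2, 0.8, 3.2, 12.8, 25.6, 102.4, 200],
    "learning_rate": [0.01, 0.1, 0.25, 0.6, 0.7],
    "max_depth": [32, 16, 8, 4, 2],
    "n_estimators": [50, 100, 400, 1000],
    "colsample_bytree": [0.1, 0.2, 0.4],
    "min_child_weight": [2, 4, 6],
}
search = GridSearchCV(XGBRegressor(), param_grid, scoring="r2", n_jobs=-1)
search.fit(X_train, y_train)
print("best params:", search.best_params_)

# Step (3): 50 random train/test splits with the tuned model.
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
nse_scores = cross_val_score(search.best_estimator_, X, y, scoring="r2", cv=cv)
print("mean NSE:", nse_scores.mean(), "variance of NSE:", nse_scores.var())
```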
Appendix C. Results of Computational Experiments at Different Processing of Initial Data
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 15.024 | 425.157 | 0.845 | 0.924 | 3.881 | 0.002 | 28.9665 |
RF | 16.884 | 561.712 | 0.795 | 0.897 | 3.609 | 0.004 | 73.8551 |
LR | 21.171 | 846.091 | 0.691 | 0.842 | 5.241 | 0.007 | 0.2573 |
Lasso | 20.934 | 816.489 | 0.702 | 0.847 | 5.194 | 0.006 | 0.2942 |
ElasticNet | 22.077 | 846.867 | 0.692 | 0.842 | 5.312 | 0.005 | 0.2165 |
LGBM | 16.258 | 473.927 | 0.827 | 0.913 | 3.435 | 0.003 | 17.3745 |
Ridge | 17.877 | 631.565 | 0.767 | 0.885 | 5.53 | 0.015 | 0.4069 |
SVM | 18.049 | 657.847 | 0.759 | 0.879 | 6.382 | 0.012 | 1.1061 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 15.068 | 432.232 | 0.838 | 0.92 | 3.301 | 0.002 | 26.9609 |
RF | 16.46 | 540.478 | 0.792 | 0.894 | 3.876 | 0.004 | 73.702 |
LR | 20.741 | 816.747 | 0.687 | 0.84 | 6.301 | 0.007 | 0.2575 |
Lasso | 20.493 | 793.558 | 0.695 | 0.843 | 6.016 | 0.007 | 0.2713 |
ElasticNet | 21.513 | 813.917 | 0.688 | 0.838 | 5.291 | 0.005 | 0.2077 |
LGBM | 16.173 | 478.723 | 0.816 | 0.907 | 4.111 | 0.003 | 16.5896 |
Ridge | 17.654 | 639.832 | 0.752 | 0.879 | 5.602 | 0.018 | 0.4608 |
SVM | 17.887 | 670.769 | 0.741 | 0.871 | 5.114 | 0.016 | 1.1679 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 14.659 | 401.71 | 0.851 | 0.926 | 4.214 | 0.002 | 26.0439 |
RF | 16.24 | 514.742 | 0.797 | 0.899 | 3.697 | 0.005 | 70.7347 |
LR | 20.177 | 786.327 | 0.69 | 0.845 | 4.884 | 0.011 | 0.2214 |
Lasso | 19.696 | 765.638 | 0.699 | 0.847 | 4.352 | 0.009 | 0.2673 |
ElasticNet | 20.734 | 776.286 | 0.695 | 0.843 | 3.917 | 0.007 | 0.2154 |
LGBM | 16.097 | 477.845 | 0.812 | 0.905 | 3.957 | 0.004 | 16.6644 |
Ridge | 16.502 | 526.122 | 0.792 | 0.897 | 3.918 | 0.006 | 0.4448 |
SVM | 17.013 | 569.296 | 0.776 | 0.888 | 4.308 | 0.006 | 1.1549 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 13.485 | 334.454 | 0.867 | 0.935 | 3.172 | 0.001 | 27.979 |
RF | 14.91 | 413.625 | 0.837 | 0.919 | 4.017 | 0.003 | 68.1341 |
LR | 19.406 | 726.094 | 0.711 | 0.857 | 6.697 | 0.012 | 0.2553 |
Lasso | 18.865 | 688.109 | 0.726 | 0.863 | 6.663 | 0.011 | 0.356 |
ElasticNet | 19.008 | 678.672 | 0.731 | 0.863 | 6.495 | 0.008 | 0.2832 |
LGBM | 14.915 | 388.547 | 0.847 | 0.924 | 3.043 | 0.002 | 16.4032 |
Ridge | 15.794 | 473.48 | 0.812 | 0.909 | 4.2 | 0.005 | 0.4099 |
SVM | 16.517 | 509.542 | 0.8 | 0.9 | 5.605 | 0.004 | 1.0911 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 13.485 | 334.454 | 0.867 | 0.935 | 3.172 | 0.001 | 26.9818 |
RF | 14.908 | 413.881 | 0.837 | 0.919 | 4.145 | 0.003 | 66.2989 |
LR | 19.406 | 726.094 | 0.711 | 0.857 | 6.697 | 0.012 | 0.2304 |
Lasso | 18.865 | 688.109 | 0.726 | 0.863 | 6.663 | 0.011 | 0.3212 |
ElasticNet | 19.008 | 678.672 | 0.731 | 0.863 | 6.495 | 0.008 | 0.1995 |
LGBM | 14.915 | 388.547 | 0.847 | 0.924 | 3.043 | 0.002 | 16.1009 |
Ridge | 15.794 | 473.48 | 0.812 | 0.909 | 4.2 | 0.005 | 0.363 |
SVM | 16.517 | 509.542 | 0.8 | 0.9 | 5.605 | 0.004 | 1.126 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 12.457 | 277.378 | 0.892 | 0.948 | 2.211 | 0.001 | 26.6537 |
RF | 13.093 | 313.232 | 0.876 | 0.939 | 2.447 | 0.001 | 63.1779 |
LR | 16.47 | 487.59 | 0.808 | 0.906 | 3.527 | 0.003 | 0.2334 |
Lasso | 15.898 | 459.063 | 0.819 | 0.911 | 3.27 | 0.003 | 0.2753 |
ElasticNet | 16.824 | 509.744 | 0.798 | 0.9 | 3.559 | 0.003 | 0.2084 |
LGBM | 13.316 | 316.898 | 0.875 | 0.938 | 2.524 | 0.002 | 15.0877 |
Ridge | 14.189 | 363.815 | 0.855 | 0.929 | 2.829 | 0.003 | 0.432 |
SVM | 14.767 | 406.513 | 0.839 | 0.92 | 3.482 | 0.003 | 1.1559 |
Regression Model | MAE | MSE | NSE | R | Variance of MAE | Variance of NSE | Duration, sec. |
---|---|---|---|---|---|---|---|
XGB | 10.267 | 192.933 | 0.882 | 0.943 | 3.021 | 0.003 | 24.6675 |
RF | 10.253 | 196.141 | 0.879 | 0.942 | 2.84 | 0.004 | 54.0638 |
LR | 12.432 | 288.842 | 0.823 | 0.916 | 3.572 | 0.007 | 0.2274 |
Lasso | 12.663 | 274.545 | 0.832 | 0.92 | 2.764 | 0.005 | 0.2245 |
ElasticNet | 13.607 | 308.322 | 0.812 | 0.909 | 3.294 | 0.005 | 0.1985 |
LGBM | 11.423 | 232.186 | 0.858 | 0.931 | 3.157 | 0.004 | 12.1241 |
Ridge | 11.269 | 232.433 | 0.857 | 0.932 | 3.045 | 0.003 | 0.3597 |
SVM | 11.616 | 245.88 | 0.848 | 0.929 | 3.064 | 0.004 | 0.8737 |
Appendix D. The Difference between SHAP Value and Gini Impurity Index
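As a brief illustration of the two importance measures compared in this appendix, the following minimal sketch contrasts impurity-based (Gini) importances with SHAP values; it assumes the shap package and the feature matrix X and target y from the earlier sketches (not the authors' code):

```python
# Sketch: impurity-based (Gini) feature importances vs. mean absolute SHAP values
# for a fitted random forest regressor.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
gini_importance = dict(zip(X.columns, rf.feature_importances_))  # global, impurity-based

explainer = shap.TreeExplainer(rf)            # per-sample, per-feature attributions
shap_values = explainer.shap_values(X)        # shape: (n_samples, n_features)
mean_abs_shap = dict(zip(X.columns, np.abs(shap_values).mean(axis=0)))
```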
References
- Terekhov, A.; Abaev, N.; Lagutin, E. Satellite monitoring of the Sardobinsky reservoir in the Syrdarya River Basin (Uzbekistan) before and after the dam breach on May 1, 2020. Mod. Probl. Earth Remote Sens. Space 2020, 17, 255–260. [Google Scholar]
- In Kazakhstan, 268 Dams Were Recognized as Dangerous. Available online: https://vesti.kz/society/v-kazahstane-268-plotin-priznali-opasnyimi-44002 (accessed on 4 September 2023).
- Wang, X.; Yang, W. Water quality monitoring and evaluation using remote sensing techniques in China: A systematic review. Ecosyst. Health Sustain. 2019, 5, 47–56. [Google Scholar] [CrossRef]
- Kapalanga, T.S.; Hoko, Z.; Gumindoga, W.; Chikwiramakomo, L. Remote-sensing-based algorithms for water quality monitoring in Olushandja Dam, north-central Namibia. Water Supply 2021, 21, 1878–1894. [Google Scholar] [CrossRef]
- Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A review of remote sensing for water quality retrieval: Progress and challenges. Remote Sens. 2022, 14, 1770. [Google Scholar] [CrossRef]
- Tarpanelli, A.; Brocca, L.; Lacava, T.; Melone, F.; Moramarco, T.; Faruolo, M.; Pergola, N.; Tramutoli, V. Toward the estimation of river discharge variations using MODIS data in ungauged basins. Remote Sens. Environ. 2013, 136, 47–55. [Google Scholar] [CrossRef]
- Riggs, R.M.; Allen, G.H.; David, C.H.; Lin, P.; Pan, M.; Yang, X.; Gleason, C. RODEO: An algorithm and Google Earth Engine application for river discharge retrieval from Landsat. Environ. Model. Softw. 2022, 148, 105254. [Google Scholar] [CrossRef]
- Psomiadis, E.; Tomanis, L.; Kavvadias, A.; Soulis, K.X.; Charizopoulos, N.; Michas, S. Potential dam breach analysis and flood wave risk assessment using HEC-RAS and remote sensing data: A multicriteria approach. Water 2021, 13, 364. [Google Scholar] [CrossRef]
- Bhattacharya, B.; Mazzoleni, M.; Ugay, R. Flood inundation mapping of the sparsely gauged large-scale Brahmaputra Basin using remote sensing products. Remote Sens. 2019, 11, 501. [Google Scholar] [CrossRef]
- Zeng, Y.; Meng, X.; Zhang, Y.; Dai, W.; Fang, N.; Shi, Z. Estimation of the volume of sediment deposited behind check dams based on UAV remote sensing. J. Hydrol. 2022, 612, 128143. [Google Scholar] [CrossRef]
- Zou, W.; Zhou, Y.; Wang, S.; Wang, F.; Wang, L.; Zhao, Q.; Liu, W.; Zhu, J.; Xiong, Y.; Wang, Z. Using a single remote-sensing image to calculate the height of a landslide dam and the maximum volume of a lake. Nat. Hazards Earth Syst. Sci. 2022, 22, 2081–2097. [Google Scholar] [CrossRef]
- Silveira Kupssinskü, L.; Thomassim Guimarães, T.; Menezes de Souza, E.; Zanotta, D.; Roberto Veronez, M.; Gonzaga Jr, L.; Mauad, F.F. A method for chlorophyll-a and suspended solids prediction through remote sensing and machine learning. Sensors 2020, 20, 2125. [Google Scholar] [CrossRef] [PubMed]
- Xiao, Y.; Guo, Y.; Yin, G.; Zhang, X.; Shi, Y.; Hao, F.; Fu, Y. UAV multispectral image-based urban river water quality monitoring using stacked ensemble machine learning algorithms—A case study of the Zhanghe river, China. Remote Sens. 2022, 14, 3272. [Google Scholar] [CrossRef]
- Feng, C.; Zhang, H.; Wang, S.; Li, Y.; Wang, H.; Yan, F. Structural damage detection using deep convolutional neural network and transfer learning. KSCE J. Civ. Eng. 2019, 23, 4493–4502. [Google Scholar] [CrossRef]
- Mukhamedjanov, I.; Konstantinova, A.; Lupyan, E.; Umirzakov, G. Assessment of capabilities of satellite monitoring of the river discharge dynamics on the example of analyzing the Amudarya river condition. Mod. Probl. Remote Sens. Earth Space 2022, 1, 87. [Google Scholar]
- Xiong, J.; Guo, S.; Yin, J. Discharge estimation using integrated satellite data and hybrid model in the midstream Yangtze River. Remote Sens. 2021, 13, 2272. [Google Scholar] [CrossRef]
- Imentai, A.; Thevs, N.; Schmidt, S.; Nurtazin, S.; Salmurzauli, R. Vegetation, fauna, and biodiversity of the Ili Delta and southern Lake Balkhash—A review. J. Great Lakes Res. 2015, 41, 688–696. [Google Scholar] [CrossRef]
- Talipova, E.; Shrestha, S.; Alimkulov, S.; Nyssanbayeva, A.; Tursunova, A.; Isakan, G. Influence of climate change and anthropogenic factors on the Ili River basin streamflow, Kazakhstan. Arab. J. Geosci. 2021, 14, 1756. [Google Scholar] [CrossRef]
- Kogutenko, L.; Severskiy, I.; Shahgedanova, M.; Lin, B. Change in the Extent of Glaciers and Glacier Runoff in the Chinese Sector of the Ile River Basin between 1962 and 2012. Water 2019, 11, 1668. [Google Scholar] [CrossRef]
- Duskayev, K.; Myrzakhmetov, A.; Zhanabayeva, Z.; Klein, I. Features of the sediment runoff regime downstream the Ile river. J. Ecol. Eng. 2020, 21, 117–125. [Google Scholar] [CrossRef]
- Thevs, N.; Nurtazin, S.; Beckmann, V.; Salmyrzauli, R.; Khalil, A. Water consumption of agriculture and natural ecosystems along the Ili River in China and Kazakhstan. Water 2017, 9, 207. [Google Scholar] [CrossRef]
- Pueppke, S.G.; Zhang, Q.; Nurtazin, S.T. Irrigation in the Ili River basin of Central Asia: From ditches to dams and diversion. Water 2018, 10, 1650. [Google Scholar] [CrossRef]
- Pueppke, S.G.; Nurtazin, S.T.; Graham, N.A.; Qi, J. Central Asia’s Ili River ecosystem as a wicked problem: Unraveling complex interrelationships at the interface of water, energy, and food. Water 2018, 10, 541. [Google Scholar] [CrossRef]
- Li, Y.; Song, Y.; Fitzsimmons, K.E.; Chen, X.; Wang, Q.; Sun, H.; Zhang, Z. New evidence for the provenance and formation of loess deposits in the Ili River Basin, Arid Central Asia. Aeolian Res. 2018, 35, 1–8. [Google Scholar] [CrossRef]
- Jiao, W.; Chen, Y.; Li, W.; Zhu, C.; Li, Z. Estimation of net primary productivity and its driving factors in the Ili River Valley, China. J. Arid Land 2018, 10, 781–793. [Google Scholar] [CrossRef]
- Propastin, P.A. Simple model for monitoring Balkhash Lake water levels and Ili River discharges: Application of remote sensing. Lakes Reserv. Res. Manag. 2008, 13, 77–81. [Google Scholar] [CrossRef]
- Terekhov, A.; Pak, I.; Dolgikh, S. LANDSAT 5, 7, 8 and DEM data in the task of monitoring the hydrological regime of the Kapshagai reservoir on the Tekes River (Chinese part of the Ile River Basin). Mod. Probl. Remote Sens. Earth Space 2015, 12, 174–182. [Google Scholar]
- Ahmed, A.N.; Yafouz, A.; Birima, A.H.; Kisi, O.; Huang, Y.F.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Water level prediction using various machine learning algorithms: A case study of Durian Tunggal river, Malaysia. Eng. Appl. Comput. Fluid Mech. 2022, 16, 422–440. [Google Scholar] [CrossRef]
- Brakenridge, G.R.; Cohen, S.; Kettner, A.J.; De Groeve, T.; Nghiem, S.V.; Syvitski, J.P.; Fekete, B.M. Calibration of satellite measurements of river discharge using a global hydrology model. J. Hydrol. 2012, 475, 123–136. [Google Scholar] [CrossRef]
- Bustami, R.; Bessaih, N.; Bong, C.; Suhaili, S. Artificial Neural Network for Precipitation and Water Level Predictions of Bedup River. IAENG Int. J. Comput. Sci. 2007, 34, 2. [Google Scholar]
- Khan, M.; Hasan, F.; Panwar, S.; Chakrapani, G.J. Neural network model for discharge and water-level prediction for Ramganga River catchment of Ganga Basin, India. Hydrol. Sci. J. 2016, 61, 2084–2095. [Google Scholar] [CrossRef]
- Jung, S.; Lee, D.; Lee, K. Prediction of river water level using deep-learning open library. J. Korean Soc. Hazard Mitig. 2018, 18, 1–11. [Google Scholar] [CrossRef]
- Jung, S.; Cho, H.; Kim, J.; Lee, G. Prediction of water level in a tidal river using a deep-learning based LSTM model. J. Korea Water Resour. Assoc. 2018, 51, 1207–1216. [Google Scholar]
- Tao, H.; Al-Bedyry, N.K.; Khedher, K.M.; Shahid, S.; Yaseen, Z.M. River water level prediction in coastal catchment using hybridized relevance vector machine model with improved grasshopper optimization. J. Hydrol. 2021, 598, 126477. [Google Scholar] [CrossRef]
- Hussain, D.; Khan, A.A. Machine learning techniques for monthly river flow forecasting of Hunza River, Pakistan. Earth Sci. Inform. 2020, 13, 939–949. [Google Scholar] [CrossRef]
- Ditthakit, P.; Pinthong, S.; Salaeh, N.; Binnui, F.; Khwanchum, L.; Pham, Q.B. Using machine learning methods for supporting GR2M model in runoff estimation in an ungauged basin. Sci. Rep. 2021, 11, 19955. [Google Scholar] [CrossRef] [PubMed]
- Thanh, H.V.; Binh, D.V.; Kantoush, S.A.; Nourani, V.; Saber, M.; Lee, K.K.; Sumi, T. Reconstructing daily discharge in a megadelta using machine learning techniques. Water Resour. Res. 2022, 58, e2021WR031048. [Google Scholar] [CrossRef]
- Sahoo, D.P.; Sahoo, B.; Tiwari, M.K.; Behera, G.K. Integrated remote sensing and machine learning tools for estimating ecological flow regimes in tropical river reaches. J. Environ. Manag. 2022, 322, 116121. [Google Scholar] [CrossRef]
- Bjerklie, D.M.; Birkett, C.M.; Jones, J.W.; Carabajal, C.; Rover, J.A.; Fulton, J.W.; Garambois, P.-A. Satellite remote sensing estimation of river discharge: Application to the Yukon River Alaska. J. Hydrol. 2018, 561, 1000–1018. [Google Scholar] [CrossRef]
- Fok, H.S.; Chen, Y.; Zhou, L. Daily runoff and its potential error sources reconstructed using individual satellite hydrological variables at the basin upstream. Front. Earth Sci. 2022, 10, 821592. [Google Scholar] [CrossRef]
- Hirpa, F.A.; Hopson, T.M.; De Groeve, T.; Brakenridge, G.R.; Gebremichael, M.; Restrepo, P.J. Upstream satellite remote sensing for river discharge forecasting: Application to major rivers in South Asia. Remote Sens. Environ. 2013, 131, 140–151. [Google Scholar] [CrossRef]
- Koblinsky, C.J.; Clarke, R.T.; Brenner, A.; Frey, H. Measurement of River Level Variations with Satellite Altimetry; Wiley Online Library: Hoboken, NJ, USA, 1993. [Google Scholar]
- Tarpanelli, A.; Camici, S.; Nielsen, K.; Brocca, L.; Moramarco, T.; Benveniste, J. Potentials and limitations of Sentinel-3 for river discharge assessment. Adv. Space Res. 2021, 68, 593–606. [Google Scholar] [CrossRef]
- Jason-3 Altimetry Mission. Available online: https://www.eoportal.org/satellite-missions/jason-3#mission-capabilities (accessed on 4 September 2023).
- Lebedev, S.; Kostyanoy, A.; Popov, S. Satellite altimetry of the Barents Sea. Sovrem. Probl. Distantsionnogo Zondirovaniya Zemli Iz Kosmosa 2021, 12, 194–212. [Google Scholar] [CrossRef]
- Vittucci, C.; Guerriero, L.; Ferrazzoli, P.; Rahmoune, R.; Barraza, V.; Grings, F. River water level prediction using passive microwave signatures—A case study: The Bermejo Basin. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 3903–3914. [Google Scholar] [CrossRef]
- Verma, K.; Nair, A.S.; Jayaluxmi, I.; Karmakar, S.; Calmant, S. Satellite altimetry for Indian reservoirs. Water Sci. Eng. 2021, 14, 277–285. [Google Scholar] [CrossRef]
- Grimaldi, S.; Li, Y.; Pauwels, V.R.; Walker, J.P. Remote sensing-derived water extent and level to constrain hydraulic flood forecasting models: Opportunities and challenges. Surv. Geophys. 2016, 37, 977–1034. [Google Scholar] [CrossRef]
- Göttl, F.; Dettmering, D.; Müller, F.L.; Schwatke, C. Lake level estimation based on CryoSat-2 SAR altimetry and multi-looked waveform classification. Remote Sens. 2016, 8, 885. [Google Scholar] [CrossRef]
- Kleinherenbrink, M.; Naeije, M.; Slobbe, C.; Egido, A.; Smith, W. The performance of CryoSat-2 fully-focussed SAR for inland water-level estimation. Remote Sens. Environ. 2020, 237, 111589. [Google Scholar] [CrossRef]
- Ahmed, A.M.; Deo, R.C.; Ghahramani, A.; Feng, Q.; Raj, N.; Yin, Z.; Yang, L. New double decomposition deep learning methods for river water level forecasting. Sci. Total Environ. 2022, 831, 154722. [Google Scholar] [CrossRef]
- Mohsen, A.; Kovács, F.; Kiss, T. Remote Sensing of Sediment Discharge in Rivers Using Sentinel-2 Images and Machine-Learning Algorithms. Hydrology 2022, 9, 88. [Google Scholar] [CrossRef]
- Terekhov, A. Satellite monitoring of the river bed of the transboundary Ili River in the task of water discharge estimation. In Proceedings of the Sixteenth All-Russian Open Conference “Modern Problems of Remote Sensing of the Earth from Space”, Moscow, Russia, 12–16 November 2018; p. 115. [Google Scholar]
- Abayev, N.N.; Terekhov, A.G.; Sagatdinova, G.N.; Mukhamediev, R.I.; Amirgaliyev, E.N. Satellite monitoring of the river shoals of the transboundary Ili River (Central Asia) in the task of the water level estimation. Mod. Probl. Remote Sens. Earth Space 2023, 20, 170–181. [Google Scholar]
- Gizatullin, A.; Sharafutdinov, R. Distinctive features of modeling the zones of possible flooding during the passage of floods on the plain and mountainous territory. In Geoinformation Technologies in Projecting and Constructing the Corporate Information Systems; Springer: Ufa, Russia, 2010; pp. 154–160. [Google Scholar]
- McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432. [Google Scholar] [CrossRef]
- Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033. [Google Scholar] [CrossRef]
- Sentinel-2 Bands. Available online: https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/bands/ (accessed on 4 September 2023).
- Mukhamediev, R.I.; Popova, Y.; Kuchin, Y.; Zaitseva, E.; Kalimoldayev, A.; Symagulov, A.; Levashenko, V.; Abdoldina, F.; Gopejenko, V.; Yakunin, K. Review of Artificial Intelligence and Machine Learning Technologies: Classification, Restrictions, Opportunities and Challenges. Mathematics 2022, 10, 2552. [Google Scholar] [CrossRef]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Mukhamediev, R.I.; Merembayev, T.; Kuchin, Y.; Malakhov, D.; Zaitseva, E.; Levashenko, V.; Popova, Y.; Symagulov, A.; Sagatdinova, G.; Amirgaliyev, Y. Soil Salinity Estimation for South Kazakhstan Based on SAR Sentinel-1 and Landsat-8, 9 OLI Data with Machine Learning Models. Remote Sens. 2023, 15, 4269. [Google Scholar] [CrossRef]
- Yu, H.-F.; Huang, F.-L.; Lin, C.-J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach. Learn. 2011, 85, 41–75. [Google Scholar] [CrossRef]
- Santosa, F.; Symes, W.W. Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 1986, 7, 1307–1330. [Google Scholar] [CrossRef]
- Goncharsky, A.; Stepanov, V.; Tikhonov, A.; Yagola, A. Numerical Methods for the Solution of Ill-Posed Problems; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar]
- Hoerl, A.E.; Kennard, R.W. Ridge regression: Applications to nonorthogonal problems. Technometrics 1970, 12, 69–82. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157. [Google Scholar]
- Al Daoud, E. Comparison between XGBoost, LightGBM and CatBoost using a home credit dataset. Int. J. Comput. Inf. Eng. 2019, 13, 6–10. [Google Scholar]
- Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
- Galushkin, A.I. The Back Propagation Error Method and Russian Works on Neural Networks Theory. Inf. Technol. 2014, 7, 66–76. Available online: http://novtex.ru/IT/it2014/It714_web.pdf#page=66 (accessed on 23 November 2023).
- Mukhamediev, R.I.; Kuchin, Y.; Amirgaliyev, Y.; Yunicheva, N.; Muhamedijeva, E. Estimation of Filtration Properties of Host Rocks in Sandstone-Type Uranium Deposits Using Machine Learning Methods. IEEE Access 2022, 10, 18855–18872. [Google Scholar] [CrossRef]
- Mukhamediev, R.; Amirgaliyev, Y.; Kuchin, Y.; Aubakirov, M.; Terekhov, A.; Merembayev, T.; Yelis, M.; Zaitseva, E.; Levashenko, V.; Popova, Y. Operational Mapping of Salinization Areas in Agricultural Fields Using Machine Learning Models Based on Low-Altitude Multispectral Images. Drones 2023, 7, 357. [Google Scholar] [CrossRef]
- Kuchin, Y.; Mukhamediev, R.; Yunicheva, N.; Symagulov, A.; Abramov, K.; Mukhamedieva, E.; Zaitseva, E.; Levashenko, V. Application of Machine Learning Methods to Assess Filtration Properties of Host Rocks of Uranium Deposits in Kazakhstan. Appl. Sci. 2023, 13, 10958. [Google Scholar] [CrossRef]
- Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290. [Google Scholar] [CrossRef]
- Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 2007, 50, 885–900. [Google Scholar] [CrossRef]
- Borshch, S.V.; Simonov, Y.A.; Khristoforov, A.V.; Yumina, N.V. Forecasting the inflow into the Tsimlyansk Reservoir. Hydrometeorological studies and forecasts 2022, 4, 47–189. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 17301. [Google Scholar]
- Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable ai: A review of machine learning interpretability methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef] [PubMed]
- Raschka, S. MLxtend: Providing Machine Learning and Data Science Utilities and Extensions to Python’s Scientific Computing Stack. Open Source Softw. 2018, 3, 638. [Google Scholar] [CrossRef]
- MLxtend Documentation. Available online: https://rasbt.github.io/mlxtend/ (accessed on 3 May 2023).
Task | Study Area | Machine Learning Methods | Remote Sensing Data | Result | Ref. |
---|---|---|---|---|---|
1 | Ramganga River catchment of the Ganga Basin, India | ANN | - | Ac = 83.5% | [30,31] |
1 | Guam River and the Han River, South Korea | multilinear regression and LSTM | - | RMSE = 0.08 m | [32,33] |
1 | Catchment located in the east coast of tropical peninsular Malaysia | SVR | - | NSE = 0.986 | [28,34] |
1 | Murray River, Australia | CNN, LSTM, BiLSTM | MODIS | Ac = 98% | [51] |
2 | Lakes or reservoirs | - | satellite altimetry | accuracy in the range of 0.2 to 1.05 meters | [42,43,44] |
2 | Lakes or reservoirs | - | satellite SAR | decimeter accuracy | [49] |
2 | Ili River | - | Sentinel-2 | NSE = 0.74 | [54] |
3 | Hunza River, Pakistan | MLP, SVR, RF | - | NSE = 0.993 | [35] |
3 | Mekong River megadelta, Vietnam | RF, GPR, SVR, DT, LSSVM, MARS | - | MAE = 200 m3/s for dry month | [37] |
4 | Brahmani River basin, India | ANN, RF, SVR | Aqua-MODIS, Landsat | NSE > 0.85 | [38] |
4 | Midstream Yangtze River basin | LSTM, RF | - | NSE = 0.69 | [16] |
5 | Tisza and the Maros rivers, Hungary | RF and combined model | Sentinel-2 | NSE = 0.87 | [52] |
Mean | Date | pixelCount | pixelCount_Clo |
---|---|---|---|
235 | 1 March 2017 | 35,928 | 24,580.00 |
228.5 | 4 March 2017 | 12,125 | 6105.00 |
272 | 21 March 2017 | 31,851 | 41,113.00 |
294 | 10 April 2017 | 33,249 | 41,113.00 |
334.5 | 3 May 2017 | 34,572 | 0.00 |
… | … | … | … |
273 | 30 November 2021 | 35,983 | 0.00 |
Regression Model | Abbreviation | About Method | References |
---|---|---|---|
Linear regression | LR | Method based on a linear approach | [62]
Lasso regression | Lasso | Uses a regularization mechanism that not only reduces over-fitting but also aids feature selection | [63]
Ridge regression | Ridge | A regularization mechanism is used to prevent over-fitting | [64,65]
Elastic net | ElasticNet | Hybrid of ridge regression and lasso regularization | [66] |
XGBoost | XGB | Ensemble learning method based on the gradient boosted trees algorithm | [67] |
LightGBM | LGBM | Ensemble learning method based on the gradient boosted trees algorithm | [68,69,70] |
Random forest | RF | Ensemble learning method based on bagging technique | [71] |
Support vector machines | SVM | Method is based on the kernel technique | [72] |
Artificial neural network or multilayer perceptron | ANN or MLP | Feed forward neural network | [73,74] |
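A minimal sketch instantiating the compared regressors with scikit-learn, XGBoost, and LightGBM defaults (the tuned hyperparameters are listed in Appendix B):

```python
# Sketch: the regression models compared in the study, with default settings.
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

models = {
    "LR": LinearRegression(),
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "ElasticNet": ElasticNet(),
    "XGB": XGBRegressor(),
    "LGBM": LGBMRegressor(),
    "RF": RandomForestRegressor(),
    "SVM": SVR(),
    "MLP": MLPRegressor(),
}
```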
Evaluation Index | Equation |
---|---|
Mean Absolute Error | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$, where $n$ is the sample size, $y_i$ the observed value, and $\hat{y}_i$ the predicted value |
Mean Squared Error | $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ |
Nash–Sutcliffe model efficiency (or determination coefficient) | $\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of the observed values |
Linear correlation coefficient (or Pearson correlation coefficient) | $R = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2\,\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}}$ |
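A minimal sketch computing these four metrics with NumPy (array names are illustrative):

```python
# Sketch: MAE, MSE, NSE and Pearson R, where `y` holds observed water levels
# and `y_hat` the model predictions.
import numpy as np

def evaluate(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mae = np.mean(np.abs(y - y_hat))
    mse = np.mean((y - y_hat) ** 2)
    nse = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    r = np.corrcoef(y, y_hat)[0, 1]            # Pearson correlation coefficient
    return {"MAE": mae, "MSE": mse, "NSE": nse, "R": r}
```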