1. Introduction
Water is an essential resource for human survival on Earth. However, water quality deterioration is a common occurrence due to various anthropogenic activities, including the improper disposal of sewage and other waste materials, construction and poor agricultural practices [1, 2]. Water bodies can also be physically affected by natural factors such as the erosion of soil [3]. It is important to continuously monitor any deterioration in quality and plan for appropriate recovery mechanisms such as the use of aerators, linings, biological treatments and embankments. The most commonly used parameters for analyzing water quality include physico–chemical parameters such as pH, conductivity and turbidity. Samples are usually gathered manually and later tested in laboratories to measure water quality, which can be a tedious and time–consuming task. In Pakistan, as in most other countries, these traditional methods and tools are used for collecting and analyzing water samples [4, 5, 6]. Moreover, this approach requires human intervention and depends on the ready availability of data collection sites. Overall, this can delay the response to events, leading to a further deterioration in water quality. Traditionally, water quality estimation studies focus on predicting the water quality index (WQI) value, which is a multi–class classification problem. However, water quality indices are biased as they are developed for a specific place and use a limited number of parameters. Thus, such indices are not applicable to all water types as they are dependent on the core physico–chemical water parameters, the location and the frequency of data sampling. With the recent advancements in remote sensing technology, a more generic approach can be used for acquiring timely data and increasing coverage in assessing water quality for any drinking water reservoir [7, 8, 9, 10]. In remote sensing, water quality is monitored by measuring the parameters that change the spectral properties of water bodies upon their interaction with light. These are known as the optically active constituents of water. On the other hand, there also exist components that do not show any directly detectable signal but can be estimated because they are highly correlated with the detectable water quality parameters; these are referred to as the optically inactive parameters of water [11, 12]. However, remote sensing alone cannot assess water quality with precise and accurate results. Thus, modern techniques that combine remote sensing and AI for accurate and timely water quality forecasts can be a more useful approach [13, 14].
For multi–step forecasting, researchers have been looking for more suitable models because the state–of–the–art artificial neural networks (such as the MLP) consider each time point independently and discard much of the information in historical data when making a prediction at each time step [15, 16]. Here, deep–learning–based regression models have proven more effective than traditional machine learning models in solving complex regression problems such as multi–output multivariate time series forecasting [17]. The traditional models lack the ability to capture real–world dependencies, whereas deep neural networks such as recurrent neural network (RNN) and long short–term memory (LSTM) models can be very powerful in this regard [18, 19]. This is especially true for multi–output problems, where temporal dependencies need to be detected to make future forecasts, as in the case of weather forecasting [20].
The use of deep learning on remote sensing data for water quality parameter estimation is very limited. However, this study builds on the existing work on water quality estimation through remote sensing. Owing to the availability of imagery from various satellites, researchers have proposed different algorithms for estimating water quality parameters. These studies have used satellites including Landsat, Sentinel and MODIS. Most of the studies have focused on optically active parameters, such as Chl–a [21, 22], temperature [11], turbidity [23] and total suspended solids [24, 25]. The reflection characteristics of optically active variables have allowed researchers to estimate parameters using semi–empirical/semi–analytical methods. These methods are used to establish patterns between the band wavelengths and the water quality parameters and to derive formulas for parameter estimations. For example, turbidity is calculated using bands 2 to 5 [26] and wavelength bands of 645 nm and 859 nm [27] of Landsat 8 images. Chl–a is extracted from images of Sentinel–2A [28]. However, parameters with weak optical characteristics are also important for assessing the water environment. Such water quality parameters can be derived from the optically active parameters [29]. Optically inactive parameters are also retrieved through remote sensing [30]. Similarly, DO, an optically inactive parameter, is retrieved through regression methods that relate remote sensing data to field data based on the ratio of Bands 2 and 4 [31].
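As a rough illustration of the band–ratio regression idea described above, the following Python sketch fits a straight line between a ratio of two band reflectances and a field–measured parameter. All reflectance and DO values are synthetic, the helper names `band_ratio` and `fit_line` are hypothetical, and the resulting coefficients are not those of any cited study; the cited works may use different functional forms.

```python
def band_ratio(band_a, band_b):
    """Element-wise ratio of two reflectance series (band_b must be non-zero)."""
    return [a / b for a, b in zip(band_a, band_b)]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic reflectances for two bands at four sampling sites (illustrative only).
band2 = [0.10, 0.12, 0.15, 0.20]
band4 = [0.05, 0.04, 0.05, 0.05]

# Synthetic field DO values (mg/L), generated here as 1.5 * ratio + 4.0.
field_do = [7.0, 8.5, 8.5, 10.0]

ratios = band_ratio(band2, band4)
slope, intercept = fit_line(ratios, field_do)


def estimate_do(b2, b4):
    """Apply the fitted (synthetic) relation to a new pixel."""
    return slope * (b2 / b4) + intercept
```

Once fitted against field samples, a relation of this kind can be applied to every water pixel in a scene, which is what makes the approach attractive for wide–area monitoring.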
With the advent of artificial intelligence, machine learning is gradually being applied to remote sensing data. Water quality parameter estimation with machine learning has traditionally been carried out with models such as support vector machines (SVMs) [32]. Similarly, in [33], 12 water quality parameters including DO, EC, nitrate, nitrite, pH and turbidity were extracted from the Karun River and the water quality index (WQI) was estimated with an M5 Model Tree classifier that exhibited an RMSE of 1.412 and an MAE of 0.0274, in combination with the Gamma test technique, which was applied to the acquired data for data reduction purposes. An artificial neural network (ANN) model in combination with a linear regression model was used to extract total phosphorus and total nitrogen concentrations from Landsat 8 images [34]. Other regression–based models, including evolutionary polynomial regression, have been used to predict DO, biochemical oxygen demand (BOD) and chemical oxygen demand (COD) with nine independent variables, i.e., pH, turbidity, nitrite, nitrate nitrogen, phosphate, calcium, magnesium, sodium and EC, giving RMSE values of 4.417, 4.999 and 5.557 for DO, COD and BOD, respectively [35]. A deep neural network (DNN) with multiple hidden layers between the input and output layers was proposed and performed well in resolving complex problems with high accuracy [36]. Deep–learning–based regression models are very effective as compared to traditional models in solving complex regression problems such as the forecasting of water quality parameters. A CNN model was used to estimate the concentrations of phycocyanin and chl–a using airborne hyperspectral imagery [37]. In [38], deep–learning–based regression models were applied to remote sensing images of the Guanhe river in China to estimate optically inactive water quality parameters (zinc, the permanganate index, total nitrogen and total phosphorus) with a coefficient of determination (R²) greater than 0.6. A hybrid approach using a traditional model (ARIMA) and a neural network model was investigated for water quality time series prediction, resulting in RMSE values of 0.039, 0.063 and 0.051 for water temperature, boron and DO, respectively [39]. A regression convolutional neural network (RegCNN) was proposed for multi–step wastewater treatment prediction with an MSE of 0.05 [40].
The literature has revealed that, overall, the use of remote sensing techniques for the estimation of water quality parameters is a much faster and more economical method, with minor concerns regarding the accuracy of the parameters retrieved. In addition, the studies have discussed the importance of deep learning models in multi–step water quality forecasts. However, less work has been conducted on combining the two techniques for water quality monitoring. Thus, in this study, an approach utilizing both remote sensing and deep learning techniques applied to optically active and inactive water quality parameter estimation was investigated.
In this study, data were acquired for the stream network of the Rawal watershed. The Rawal watershed area consists of land as well as water streams. Hence, the stream network was extracted from the Rawal watershed using GIS tools. A digital elevation model (DEM) was created with Shuttle Radar Topography Mission (SRTM) data to extract the stream network. A total of eight water quality parameters were extracted from Landsat 8 (Collection 1 Level 1 (C1 L1)) images for the period from 2014 to 2021. Amongst these eight parameters, six were optically active and two were optically inactive. The optically active water quality parameters included “turbidity”, “total dissolved solids (TDS)”, “electric conductivity (EC)”, “chlorophyll–a (chl–a)”, “Secchi disk depth (SDD)” and “land surface temperature (LST)”. The optically inactive parameters were “pH” and “dissolved oxygen (DO)”. To estimate the future concentrations of the optically inactive parameter DO, DO was treated as the dependent (target) variable and the remaining seven parameters were taken as independent (predictor) variables. Similarly, EC was treated as the dependent variable, with the remaining seven parameters taken as independent variables. The estimation of the EC and DO concentrations was chosen as these parameters are crucial in monitoring water quality. EC and DO help to identify the level of impurities and the level of oxygen in the water bodies, which can help analyze the survival of fish and other aquatic organisms. In addition, to analyze the performance of deep learning models on multivariate multi–step forecasts, various deep learning models were evaluated, including a convolutional neural network (CNN), fully connected network (FCN), recurrent neural network (RNN), multi–layer perceptron (MLP) and five variants of LSTMs [41], namely vanilla, stacked, bidirectional, convolutional and CNN LSTMs. This study was limited to the satellite imagery collected for the years 2014 to 2021 that covered the Rawal watershed area. Moreover, one optically active and one optically inactive water quality parameter, i.e., EC and DO, were estimated for current and future events, using different water quality parameters with deep learning models. The study revealed that LSTMs demonstrated good performance in multi–step forecasting for both the optically active and the optically inactive parameter (EC and DO). The major contributions of this study are as follows:
The extraction of the stream network for the Rawal watershed from the SRTM DEM.
The extraction of a total of eight water quality parameters, six optically active and two optically inactive water quality parameters, by applying estimated band equations on Landsat 8 satellite imagery for the Rawal watershed stream network pertaining to the years 2014–2021.
The application of deep learning models for current and future multi–step forecasting of an optically active parameter, i.e., EC, and an optically inactive parameter, i.e., DO, using optically active/inactive water quality parameters. The analysis conducted using the deep learning models demonstrated the decline in water quality over the eight–year period and revealed that the factors that have contributed to the deterioration in water quality include seasonal variations and other environmental variables.
The value of using a remote sensing and machine learning approach was that it led to some important conclusions, including the identification of (i) the fact that the quality of water declined over the eight–year period, as well as (ii) the factors that contributed to this deterioration in water quality. In this study we aimed to find practical methods to analyze the factors affecting the water quality and to investigate the changes needed in the traditional water quality monitoring techniques for the betterment of society on a global scale. This will improve the socio–economic environment, which is dependent on an appropriate standard of water quality for its development, which may include activities such as agricultural operations. Therefore, the proposed solution can be used as a guideline for applications in other drinking water reservoirs besides the current study area. The hybrid deep learning and remote sensing approach can promote innovation in state–of–the–art water quality management and assessment techniques.
The paper is organized as follows. Section 2 covers the proposed methodology for the extraction of the optically active and inactive water quality parameters and discusses the application of deep learning models. The results of the deep learning models are elaborated in Section 3. In Section 4, the conclusions and future works in this area of research are presented.
3. Results and Discussion
The aim of this study was to explore the use of different deep learning models in current and multi–step parameter estimation for both optically active and inactive water quality parameters, i.e., EC and DO. The results and findings are discussed in detail in this section. The models were assessed in terms of three error metrics, i.e., the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). The RMSE and MAE both measure the error in the same units as the predicted variable. On the other hand, the MAPE indicates the error margin in the model forecast and is expressed as a percentage (%). Moreover, time series forecasting problems involve temporal dependencies. To preserve these dependencies, the data were divided at a fixed split point without shuffling. Hence, the training was performed on 0.6M samples kept in their original order. A sample of the features calculated from the Landsat 8 images for the year 2021 is depicted in Figure 5 and the last twenty samples are shown in Table 3. The results of the deep learning algorithms (the CNN, FCN, RNN, MLP and LSTM variants) are assessed and the performance of each model is compared on the basis of the lowest RMSE reached with the same number of epochs.
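The evaluation setup described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the 80% training fraction, the function names and the sample DO values are all assumptions made for the example.

```python
import math


def chronological_split(samples, train_fraction=0.8):
    """Split an ordered sequence at a fixed point, without shuffling.

    The 0.8 default is an assumed value, not the split used in the study.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_point = int(len(samples) * train_fraction)
    return samples[:split_point], samples[split_point:]


def rmse(actual, predicted):
    """Root mean square error, in the units of the predicted variable."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, in the units of the predicted variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (%); assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Ten time-ordered sample indices: the earliest eight train, the last two test.
train_part, test_part = chronological_split(list(range(10)))

# Hypothetical DO values in mg/L, for illustration only.
actual = [8.0, 7.5, 7.0, 8.5]
predicted = [7.8, 7.5, 7.4, 8.5]
scores = {
    "rmse": rmse(actual, predicted),
    "mae": mae(actual, predicted),
    "mape": mape(actual, predicted),
}
```

Because the split is positional, every test sample is later in time than every training sample, so the scores reflect forecasting on unseen future data rather than interpolation between shuffled neighbours.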
The regression time series problem was framed in the following two formulations:
Predict the DO and EC at the current time event (t) given the eight water quality features at the prior time steps, that is, a lag time period of three (t − 3, t − 2, t − 1).
Predict the DO and EC for the next three events (t + 1, t + 2, t + 3) based on the eight water quality features at the prior time steps with a lag time period of one (t − 1).
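The two formulations above amount to a sliding–window reframing of the time series into supervised samples. The Python sketch below shows one way this could be done; the helper `make_windows` and the toy data are hypothetical, and the sketch assumes that the forecast targets are the rows immediately following the lagged inputs, which is one reading of the indexing used in the formulations.

```python
def make_windows(rows, target_index, n_lags, n_ahead):
    """Turn a chronologically ordered table into supervised (X, y) samples.

    rows         : list of feature lists, one per time step, oldest first
    target_index : column of the variable to forecast (e.g. DO or EC)
    n_lags       : number of past time steps used as input
    n_ahead      : number of following time steps to predict
    """
    inputs, targets = [], []
    last_start = len(rows) - n_lags - n_ahead
    for start in range(last_start + 1):
        history = rows[start:start + n_lags]
        future = rows[start + n_lags:start + n_lags + n_ahead]
        inputs.append(history)
        targets.append([row[target_index] for row in future])
    return inputs, targets


# Toy series with two features per time step; column 1 plays the target role.
series = [[i, 10 * i] for i in range(6)]

# Formulation 1: three lagged steps in, one step out.
x_current, y_current = make_windows(series, target_index=1, n_lags=3, n_ahead=1)

# Formulation 2: one lagged step in, three steps out.
x_future, y_future = make_windows(series, target_index=1, n_lags=1, n_ahead=3)
```

Each input sample keeps all features for every lagged step, so a recurrent model receives an array of shape (samples, time steps, features), while the target holds only the forecast variable.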
Next, the results for both of these formulations are discussed. The LSTM variants showed exemplary performance as compared to the other deep learning models.
Predictions of current event parameters: For current predictions of the optically active and inactive parameters, the last three lag events (t − 3, t − 2, t − 1) were used to predict the current time event (t). Figure 6 displays the results for the current EC predictions. It can be seen that S–LSTM outperformed the other deep models, followed by Bi–LSTM, with RMSE values of 281.689 and 281.811 (µS/cm), respectively. Overall, the LSTM variants displayed a much better performance for the current time event prediction task. This shows that the LSTM–dominated variants outperformed the LSTM–integrated ones. On the other hand, the FCN and RNN models exhibited high RMSE values of up to 301 (µS/cm). Figure 7 displays the results for the current DO prediction task. The best results were achieved with V–LSTM and Conv–LSTM, with RMSE values of 0.197 and 0.198 (mg/L), respectively. Here, the LSTM variants showed a better performance when compared with other deep models, with V–LSTM giving a MAPE of only 0.109%. Similarly, for DO prediction, the RNN model demonstrated a high RMSE of 0.242 (mg/L).
Predictions of future event parameters: For multi–step forecasts, a lag time period of one (t − 1) was used to predict the next three events, i.e., t + 1, t + 2 and t + 3. For the future time event predictions of the optically active and inactive parameters, EC and DO, Bi–LSTM performed the best among the LSTM variants. For DO, V–LSTM and Bi–LSTM showed the minimum RMSE values of 0.2 and 0.199 (mg/L), respectively. Other variants, such as CNN–LSTM and Conv–LSTM, showed much better results than the other deep models for the multi–step forecasting of DO, as shown in Table 4. The RNN model exhibited a high RMSE of 0.238 (mg/L). For EC, the best results were also shown by two LSTM variants, i.e., S–LSTM and Bi–LSTM, with RMSE values of 281.93 and 281.741 (µS/cm), respectively, as seen in Table 5. For EC, FCN and CNN showed high RMSE values of 296.46 and 294.38 (µS/cm), respectively. Thus, for both current and future water quality forecasts, the LSTM variants showed much better results than the other deep models, with Bi–LSTM being the best performer among the LSTM variants.
Figure 8 and Figure 9 show a year–wise comparison of the Bi–LSTM model for DO and EC, respectively. The performance of the Bi–LSTM model was the best among the deep learning models. The actual and predicted values for both the DO and EC parameters for the years 2020 and 2021 can be seen. Figure 8 shows that, for each time step, the error margin for the DO predictions was very low. However, for EC, the forecasts for October through December 2020 were less accurate, as seen in Figure 9. This could be because EC varies between the summer and winter seasons. EC values in winter are generally lower than those in the summer season due to the high evaporation losses in summer and the increased drainage water inflow [61]. Moreover, the year–wise analysis showed a decline in the water quality over the eight–year period, as reflected in the declining observed concentrations of the EC and DO water quality variables. The decline in concentrations over the years can be attributed to seasonal variations and other environmental variables [62].
4. Conclusions
Rawal Lake is the main source of drinking water for the residents of Islamabad and Rawalpindi. However, the lake water is unfit for drinking as it receives untreated sewage and other wastewater due to the increase in population. Water quality assessments are carried out manually and in laboratories, which is time–consuming. Thus, using the advancements in remote sensing and other technologies, water quality monitoring tasks can be made simple and robust. In this study, eight water quality features for the years 2014 to 2021 were calculated using Landsat 8 images of the study area, the Rawal stream network, which was extracted from SRTM DEM data using hydrological GIS tools. Six optically active water quality parameters, including turbidity, Chl–a, SDD, TDS, EC and LST, and two optically inactive features, i.e., DO and pH, were taken as inputs to observe the water quality parameter estimations for current and future events.
The experiments were limited to predicting one optically active and one optically inactive water quality parameter, i.e., EC and DO, respectively. The multi–step water quality forecasts were made using different deep learning models, i.e., CNN, FCN, MLP, RNN and five variants of the LSTM model, which included LSTM–dominated and LSTM–integrated versions, namely vanilla, stacked, bi–directional, convolutional and a CNN LSTM hybrid. These models were then compared on the basis of the lowest RMSE achieved. The results showed that the LSTM variants displayed the best performance in the current and future multi–step parameter estimations for both optically active and inactive parameters, with the bi–directional LSTM emerging as the leading variant among them. Moreover, the performance of the LSTM–dominated variants was better when compared with the LSTM–integrated version for the observed problem.
The proposed approach, using the combination of remote sensing and machine learning, identified that the water quality declined over the eight–year period, as observed through the concentrations of the water quality variables. Moreover, the factors that contributed to this water quality deterioration include the concentrations of water quality variables that are affected by seasonal variations and other environmental variables. Thus, in the future, some additional water quality parameters can be used for multi–step water quality parameter estimations and forecasts. In addition, environmental variables, which may include air quality parameters, slope, soil type, and the geology and lithology of the study area, can be considered when examining water quality parameters.