Article

Deep Ensemble-Based Approach Using Randomized Low-Rank Approximation for Sustainable Groundwater Level Prediction

School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(5), 3210; https://doi.org/10.3390/app13053210
Submission received: 24 December 2022 / Revised: 10 January 2023 / Accepted: 13 January 2023 / Published: 2 March 2023

Abstract

Groundwater is the most abundant freshwater resource; agriculture, industrialization, and domestic water supplies rely on it, and its depletion leads to drought. Topographic elevation, aquifer properties, and geomorphology influence groundwater quality. As groundwater level (GWL) data are time series in nature, it is challenging to determine appropriate metrics and to evaluate groundwater levels accurately with little information loss. An effort has been made to forecast groundwater levels in India by developing a deep ensemble learning approach that uses a double-edged bi-directional long short-term memory (DEBi-LSTM) model, approximated with a randomized low-rank approximation (RLRA) algorithm and the variance inflation factor (VIF) to reduce information loss and preserve data consistency. With minimal computation time, the model outperformed existing state-of-the-art models with 96.1% accuracy. To ensure sustainable groundwater development, the proposed work is discussed in terms of its managerial implications. By applying the model, safe, critical, and semi-critical groundwater levels in the Indian states can be identified so that strategic plans can be developed.

Graphical Abstract

1. Introduction

Water is a renewable resource that sustains the ecosystem. At the same time, fresh water is a limited and costly resource. Water conservation encompasses the strategies, policies, and activities used to manage fresh water as a sustainable resource and to balance current and future human demand. A UNICEF report indicates that half of the world’s population will live in water-scarce areas by 2025 [1]. Water scarcity is inextricably linked to multiple stresses across a wide range of sectors, and water stress poses a global risk of unmet water demand. When natural sources cannot meet the essential demand for water, the functionality of the ecosystem is seriously affected. The extensive impact of water scarcity has been observed in several large Indian states, such as Punjab, Rajasthan, Haryana, Uttar Pradesh, Karnataka, Tamil Nadu, and Andhra Pradesh [2]. Groundwater is the most desirable resource in any socio-economic development: groundwater storage covers 90% of the necessary water demand in rural areas and 50% in urban areas [3]. Acute water demand affects industry, agriculture, health, and the economy, raising concern about the sustainability of groundwater resources. Groundwater depletion manifests as reduced streamflow, deteriorating water quality, and land subsidence. In 2003, NASA’s Gravity Recovery and Climate Experiment (GRACE) satellite began gathering data on GWL changes in south India; the study in [4] concluded that the GWL rose before 2009 but has since declined at a rate of 0.25 cm/month, with the changes recorded as two segments, from 2003 to 2009 and after 2009 [4]. Nevertheless, GWL simulations represent an integrated response to geological, topographic, and hydrological factors, which makes them challenging.
Over the past two decades, artificial intelligence (AI) models have been widely employed to address the drawbacks of conventional methods for GWL simulation. Due to the dynamic and heterogeneous nature of GWL, it is difficult to conduct simulations with high accuracy and comprehensiveness. Figure 1 depicts the state-wise GWL rise and fall (%) in India from 2010 to 2020, where 0 to 100 represents the rise of the waterbed level and 0 to −150 represents its fall. A rise is an increase in the availability of the water level under the land surface, and a fall is a decline of the water level from the saturated zone. This study abbreviates the Indian states according to the Indian template mentioned in [5].
Numerous studies have been undertaken regarding groundwater, such as groundwater quality investigations, surface water level estimation, observations of organic components in drinking water, groundwater contamination, and many more, using sundry simulation approaches. These approaches can be conceptual, experimental, or numerical models. Among these, groundwater level prediction has emerged as a pressing need of the time due to severe climate change. It has been observed that groundwater level predictions have been carried out using smart techniques such as machine learning, deep learning, and ensemble learning. Some of the related works are discussed below.

1.1. Review on Groundwater Level Prediction Using Machine Learning Technique

Knoll et al. [6] proposed a prediction model for identifying the nitrate level of aquatic factors by statistical and machine learning (ML) techniques and found that random forest (RF) produces the best performance, with a maximum R-value. Majumdar et al. [7] studied the impact of aquifer exhaustion by monitoring groundwater extraction using a multi-variate regression method and RF, obtaining a good R² value. Adiat et al. [8] predicted GWL using geoelectric parameters and explored the process with an artificial neural network (ANN), evaluated using the RMSE and the regression coefficient. Hussein et al. [9] compared various ML techniques and found that support-vector regression (SVR) produces good accuracy for measuring the availability of groundwater to mitigate the sustainability and scarcity problem. Banadkooki et al. [10] hybridized a neural network with the whale optimization algorithm for hyperparameter tuning using an adaptive neuro-fuzzy inference system (ANFIS).
Even though machine learning techniques show good prediction accuracy, they are limited to simulating relatively small amounts of data; as the data grow, prediction accuracy tends to degrade. Additionally, some machine learning techniques struggle to identify the best attributes to support the prediction process, where feature selection plays a major role. Furthermore, the computation time is comparatively high, as the training process expands with the data. To overcome these issues, researchers have directed their attention towards deep learning techniques for groundwater level prediction.

1.2. Review on Groundwater Level Prediction Using a Deep Learning Technique

In recent studies, deep learning (DL) techniques have shown extensive promise in the prediction field by refining the predictive parameters and by their ability to assess enormous amounts of data. Huang et al. [11] compared the performance of ML and DL techniques and found that long short-term memory (LSTM) produced better predictions for groundwater recharge. Chen et al. [12] studied groundwater potential against the high demand for renewable groundwater and presented better validation accuracy. Kochhar et al. [13] partitioned the available dataset into pre-monsoon, post-monsoon, and combined annual datasets for measuring the groundwater level with a seasonal auto-regressive integrated moving average (SARIMA) model and LSTM. Sun et al. [14] proposed a data-driven model for groundwater level prediction with practical significance using ARIMA and LSTM. Oyedele et al. [15] developed a DL framework tuned by a genetic algorithm to produce generalized predictions of daily cryptocurrency prices. Jimenez-Mesa et al. [16] designed a non-parametric framework to estimate the statistical significance of classification by combining an auto-encoder with a support-vector machine (SVM).
Even though deep learning has the ability to process huge amounts of data, its predictions depend on various influential parameters, which may affect the prediction accuracy of a particular classifier. Additionally, it suffers from high computation cost when the data are inconsistent, which forces careful attention to the data acquisition process. Furthermore, overfitting is a significant issue that must be addressed through better algorithm selection and efficient hyperparameter tuning. Thus, combining more than one classifier may improve the prediction process, which paves the way for the ensemble learning mechanism.

1.3. Review on Groundwater Level Prediction Using Ensemble Learning

Due to the growing demand for more accurate prediction of groundwater levels, ensemble models have provided the quality evaluation that is essential for the precise identification of influential parameters. Mosavi et al. [17] proposed an ensemble-based model using boosted regression trees (BRT) and random forest (RF) to estimate groundwater hardness. Yin et al. [18] proposed a machine learning-based ensemble model with ANN, SVM, and response surface regression to perform the weak-strong learners’ decision-making process. Jiang et al. [19] proposed a multi-model perturbation-based algorithm that perturbs the feature space using bootstrap sampling to provide a better solution. Mosavi et al. [20] designed boosting methods with GAMBoost and AdaBoost, and bagging methods with a classification and regression tree (CART) and RF. Lee et al. [21] proposed a DL-based ensemble model using long short-term memory (LSTM), gated recurrent units (GRU), and a multi-layer perceptron (MLP) for predicting the erythrocyte sedimentation rate. Li et al. [22] proposed a DL-based ensemble model using a recurrent neural network (RNN) and a fully connected neural network for analyzing future stock movements. Ngo et al. [23] developed an ML-based ensemble learning process using ANN, SVR, and M5 rules to predict energy consumption in buildings.
From the previous related work, the following gaps are identified, which motivates the identification of the problem statement to propose the research model. The identified gaps are as follows:
  • Machine learning methods struggle to handle enormous amounts of data in groundwater level prediction for huge geographical regions.
  • Deep learning techniques have high computation time, and it is difficult to identify the better algorithm with hyperparameters upon feature selection in groundwater level prediction.
  • Identifying the primary attributes for groundwater prediction is tedious, as every attribute has its own advantages and disadvantages that affect the generalization mechanism.
  • Data pre-processing techniques used in the previous research led to information loss and, in turn, reduced the prediction accuracy.
One of the most challenging tasks in prediction is understanding the data pattern. Even though smart techniques offer learning adaptability, selecting an algorithm and assessing the model’s potential depend on the evaluation of performance metrics with minimal information loss. As the groundwater level is characterized by various factors, such as geological, topographic, and hydrological properties, it is mandatory to pre-process the data to maintain data consistency. These considerations and limitations motivate us to identify the significant attributes with the variance inflation factor (VIF) via a multi-collinearity test. The reduced dataset is further approximated using randomized low-rank approximation (RLRA) to reduce the computation cost of the training process. Thus, the proposed model uses RLRA and VIF for data consistency, whereas a deep ensemble model (DEM) using double-edged bi-directional LSTM (DEBi-LSTM) provides improved prediction accuracy. The objectives of the paper are as follows:
  • To obtain attributes using multi-collinearity by applying the variance inflation factor (VIF) value;
  • To obtain the data approximation using the randomized low-rank approximation (RLRA) technique over the attribute selection;
  • To combine random forest as a base learner using a stacking mechanism and double-edged bi-directional LSTM as a meta-classifier to obtain the deep ensemble model (DEM);
  • To find better classification accuracy, the DEM was applied to both approximated and non-approximated reduced datasets;
  • To reduce the overfitting using the proposed DEM mechanism;
  • To check for minimal or no information loss during data approximation to maintain data authenticity;
  • To calculate the time taken to train and to test the data iteratively to assess the computation cost.
The article is organized as follows: Section 1 discusses the need for groundwater and the existing groundwater levels in the states of India, followed by the related research activities. Background fundamentals are discussed in Section 2. The proposed research methodology for groundwater level prediction is presented in Section 3, with the feature selection and approximation processes, whereas Section 4 explains the experimental analysis of the deep ensemble model. Section 5 presents the results and discussion. Section 6 describes the comparative analysis with existing research methods. Section 7 illustrates the managerial implications of the proposed groundwater prediction process. Section 8 concludes with future work.

2. Background Fundamentals

One of the important tasks in the prediction process is data collection and pre-processing. After data collection, analyzing and organizing the data into valuable insights improves the decision-making process. Pre-processing involves identifying the relationships between the attribute values of the collected data, and identifying the significant attributes helps reduce computation time and cost. Finding the correlation between the dependent and independent variables, and removing the predictors that contribute little to the prediction process, is a challenging task; it can be achieved with a multi-collinearity test. Thus, we utilized VIF in our model.

2.1. Variance Inflation Factor (VIF)

Predictive models deal with various influential parameters that improve the efficiency of the decision-making process. The variance inflation factor (VIF) determines the correlation between the independent variables, represented as $a_i$, $1 \le i \le n$. The VIF threshold $\tau$ is the cut-off value for attribute selection. Algorithm 1 shows the process of attribute selection using VIF; the process is illustrated with a sample information system in Table 1 and Example 1. The variance inflation factor for the $i$th attribute is given in Equation (1):
$\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$
Algorithm 1 Feature selection using variance inflation factors
     Parameters:
  • $[A]_{m \times n}$: input matrix of dimension m × n.
  • $a_i$: attribute $a_i \in [A]_{m \times n}$, where $1 \le i \le n$.
  • $R_i^2$: R-square for each attribute $a_i$.
  • VIF: variance inflation factor.
  • $\tau$: VIF threshold value (assumed based on the data pattern).
     Input: input matrix $[A]_{m \times n}$.
     Output: reduced dataset $[A_R]_{p \times q}$.
Procedure to calculate the VIF value:
  1: Load input matrix $[A]_{m \times n}$.
  2: $a_i \in [A]_{m \times n}$, where $i = 1, 2, \ldots, n$.
  3: Calculate the R-square $R_i^2$ for each attribute $a_i$.
  4: Calculate $\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$.
  5: if $\mathrm{VIF}_i \ge \tau$ then
  6:     Discard the corresponding attribute $a_i$.
  7:     $i = i + 1$.
  8:     Go to step 5.
  9: else
 10:     The corresponding attribute has no redundancy.
 11: end if
Example 1. A sample information system is given in Table 1, where the attributes are chlorine (Cl), represented as $a_1$; magnesium (Mg), $a_2$; manganese (Mn), $a_3$; iron (Fe), $a_4$; calcium (Ca), $a_5$; zinc (Zn), $a_6$; and copper (Cu), $a_7$.
Considering the attribute $a_1$ (chlorine), the VIF is calculated using Equation (1) as follows:
$R^2 = 0.860, \quad \mathrm{VIF}_{a_1} = \dfrac{1}{1 - 0.860} \approx 7.120$
Thus, a summary is given in Table 2.
From Table 2, it can be seen that the attributes $a_2$, $a_3$, $a_4$, $a_5$, and $a_6$ have VIF values below the considered threshold of $\tau = 5$. The information system is therefore reduced to the attributes $a_2$, $a_3$, $a_4$, $a_5$, $a_6$. The reduced information system was checked again with the VIF values, and no further attributes needed to be removed. The reduced information system is given in Table 3.
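The attribute-selection step of Algorithm 1 can be reproduced with standard tooling. The following Python sketch, assuming pandas and statsmodels are installed, applies an iterative VIF filter to the sample data of Table 1. The helper name reduce_by_vif and the elimination loop are our own illustrative choices, and the exact VIF numbers depend on how the underlying regression handles the intercept, so they may differ slightly from Table 2.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Sample information system of Table 1 (values in ppm).
data = pd.DataFrame({
    "a1_Cl": [0.22, 0.67, 0.22, 0.35, 0.70, 0.13, 0.80, 0.44, 0.10, 0.90],
    "a2_Mg": [31.3, 42.4, 43.1, 22.4, 52.8, 43.3, 29.1, 34.5, 37.2, 41.7],
    "a3_Mn": [35.3, 44.7, 39.4, 59.5, 79.8, 60.9, 99.2, 62.7, 34.4, 84.1],
    "a4_Fe": [19.98, 6.01, 8.89, 12.25, 3.20, 17.10, 8.52, 6.62, 16.21, 1.98],
    "a5_Ca": [38.2, 35.8, 17.0, 11.3, 32.5, 26.2, 7.1, 15.6, 15.3, 11.02],
    "a6_Zn": [2.67, 4.97, 3.11, 6.34, 4.81, 7.32, 2.64, 1.37, 2.22, 3.71],
    "a7_Cu": [2, 3, 2, 1, 4, 4, 3, 2, 1, 2],
})

def reduce_by_vif(df: pd.DataFrame, tau: float = 5.0) -> pd.DataFrame:
    """Drop the worst-VIF attribute until all remaining VIFs fall below tau."""
    df = df.copy()
    while df.shape[1] > 1:
        vifs = pd.Series(
            [variance_inflation_factor(df.values, i) for i in range(df.shape[1])],
            index=df.columns,
        )
        if vifs.max() < tau:
            break  # no remaining multi-collinearity above the threshold
        df = df.drop(columns=vifs.idxmax())
    return df

print(reduce_by_vif(data, tau=5.0).columns.tolist())
```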

2.2. Randomized Low-Rank Approximations

Data approximation plays a major role in turning complex calculations into less complicated ones. Most data models work better on small, less complex datasets, and an appropriate approximation model helps the decision-making process perform better on complex datasets. If the knowledge base has a huge number of attributes with wide ranges of data, the computational complexity can be reduced by finding the rank of the space. Thus, we utilized the randomized low-rank approximation (RLRA) technique in the research model [24,25]. The method of approximating a matrix by a matrix of relatively lower rank is known as low-rank approximation. The goal is to achieve a more compact representation of the original dataset with limited loss of information.
In this section, let us discuss the basics of the RLRA technique.
(i) 
Singular Value Decomposition (SVD)
Any real matrix of arbitrary size can be factored, or decomposed, into a product of orthonormal matrices and a diagonal matrix, as in Equation (2):
$[Y]_{p \times q} = U_{p \times p}\, \Sigma_{p \times q}\, V_{q \times q}^{T}$
where U and V both have orthonormal columns, which contain, respectively, the left and right singular vectors. The diagonal matrix $\Sigma$ contains the singular values corresponding to the singular vectors. We denote the ith singular value as $\sigma_i(Y)$ and order the singular values and their corresponding singular vectors in U and V such that $\Sigma = \mathrm{diag}(\sigma_1(Y), \sigma_2(Y), \ldots)$ and $\sigma_i(Y) \ge \sigma_{i+1}(Y)$ for all i. This is represented as Equation (3),
$Y = \sum_{i=1}^{\min\{p,q\}} \sigma_i\, u_i v_i^{T}$
which is a decomposition of Y into a sum of rank-one matrices. Here, $Y^{T}Y = V \Sigma^2 V^{T}$ and $YY^{T} = U \Sigma^2 U^{T}$; note that the eigenvectors of $Y^{T}Y$ are $\{v_i\}$ and the eigenvectors of $YY^{T}$ are $\{u_i\}$, each with eigenvalues $\lambda_i = \sigma_i^2$.
(ii) 
Orthogonal Projections
Assume we have an orthonormal basis for a linear subspace that has been stacked into a matrix: $Q = [q_1\; q_2\; \cdots\; q_k]$. Then $QQ^{T}$ is a projection matrix that, when applied to a matrix $A_R$, projects it orthogonally onto the subspace spanned by Q, which we denote as $\hat{A}_{Ap}$ in Equation (4):
$\hat{A}_{Ap} \equiv Q Q^{T} A_R$
(iii) 
Norms
Two matrix norms are commonly used to keep things simple and easy to compare: the Frobenius norm, Equation (5), and the $L_2$ norm, Equation (6).
The Frobenius norm is as follows:
$\|Y\|_F^2 = \sum_{i,j} Y(i,j)^2 = \operatorname{tr}(Y^{T}Y) = \sum_{i=1}^{\min\{p,q\}} \sigma_i(Y)^2$
The $L_2$ operator norm is as follows:
$\|Y\|_2 = \max_{\|v\|_2 = 1} \|Yv\|_2 = \sigma_1(Y)$
where, for a vector v, $\|v\|_2$ is the standard $L_2$ (Euclidean) norm, $\|v\|_2^2 = \sum_i v_i^2$.
(iv) 
Optimal Randomized Low-Rank Decomposition
The motivation for using the Frobenius and $L_2$ norms in the RLRA method is supported by the classical result of [26], which states that the best rank-k approximation to $Y = U \Sigma V^{T}$ is achieved by truncating the singular values (setting $\sigma_j(Y) = 0$ for $j > k$). There always exists a set of columns, and a multiplicative factor of those columns, that produces the optimal solution [27].
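A small numpy demonstration of this Eckart–Young result [26] follows; the matrix is random and purely illustrative. Truncating the SVD at rank k should leave an $L_2$ error of $\sigma_{k+1}$ and a squared Frobenius error equal to the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 5))          # illustrative matrix

U, s, Vt = np.linalg.svd(Y, full_matrices=False)

k = 3
Y_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # set sigma_j = 0 for j > k

# Best rank-k errors predicted by Eckart-Young:
print(np.linalg.norm(Y - Y_k, 2), s[k])                     # L2 error = sigma_{k+1}
print(np.linalg.norm(Y - Y_k, "fro")**2, np.sum(s[k:]**2))  # squared Frobenius error
```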
The attribute values of Table 3 are approximated by Algorithm 2 and presented in Table 4. The approximation by the RLRA algorithm achieves minimal information loss, which ensures that the authenticity of the data remains unchanged. Figure 2 visualizes the heatmaps of the reduced information system and the RLRA information system to show the color variation of the numerical data while preserving the data’s authenticity. Figure 3 shows the graphical similarity between the reduced information system and the information system approximated by the RLRA algorithm.
Algorithm 2 Approximation using RLRA
     Parameters:
  1. $[A_R]_{p \times q}$: input reduced matrix.
  2. $\rho([A_R]_{p \times q})$: rank of the input matrix.
  3. $k$: rank of the target approximated matrix.
  4. $\phi_q$: orthogonal basis.
  5. $s$: subsample.
  6. $[\mathrm{Prob}_{q_i}]$: probability of the orthogonal basis using the Frobenius–$L_2$ norm.
  7. $[Y]_{p \times s}$: resampled vector.
  8. $U_1, U_2, \ldots, U_k$: orthonormal basis of the decomposed resampled vector.
  9. $Q$: orthonormal basis using random projection.
  10. $\hat{A}_{Ap}$: low-rank approximation.
  11. $\rho(k)$: rank of the low-rank approximated vector.
     Input: input matrix $[A_R]_{p \times q}$.
     Output: reduced matrix $[\hat{A}_{Ap}]_{p \times q}$.
Procedure:
  1: Load input matrix $[A_R]_{p \times q}$ with p rows and q columns.
  2: Compute the rank of $[A_R]_{p \times q}$, given as $\rho([A_R]_{p \times q}) = r$.
  3: Let k be the rank of the approximated matrix.
  4: Let $\phi_q$ be an orthogonal basis for the subsamples s, where $s \in [A_R]_{p \times q}$, chosen for the Frobenius–$L_2$ norm with probability $[\mathrm{Prob}_{q_i}] = \|A_R(:, q_i)\|_{L_2}^2 / \|A_R\|_F^2$.
  5: Resample the vector $[Y]_{p \times s} = \max(\mathrm{Prob}_{q_i})$, where $1 \le i \le s$.
  6: Apply the resampled vector to the orthonormal basis for the best approximation: decompose $[Y]_{p \times s} = U \Sigma V^T$, where $U_k = U_1 \cdot U_2 \cdots U_k$ is the orthonormal basis obtained from the decomposition of the resampled vector.
  7: Compute $Q = [q_1 \cdot q_2 \cdots q_k]$, the top-k singular vectors of $[Y]$ via the SVD.
  8: Project onto the basis Q to obtain the low-rank approximation: $\hat{A}_{Ap} = Q \cdot Q^T \cdot A_R$, with rank $\rho(k)$.
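A minimal numpy sketch of Algorithm 2 is given below: columns of the reduced matrix are sampled with probability proportional to their squared $L_2$ norms (the Frobenius weighting of step 4), an orthonormal basis Q is formed from the top-k left singular vectors of the sampled block (steps 6–7), and $A_R$ is projected onto that basis (step 8). The sample size s, target rank k, and random seed are illustrative assumptions, and the relative-error check at the end quantifies the information loss discussed above.

```python
import numpy as np

def rlra(A_R: np.ndarray, k: int, s: int, seed: int = 0) -> np.ndarray:
    """Randomized low-rank approximation of A_R (sketch of Algorithm 2)."""
    rng = np.random.default_rng(seed)
    # Step 4: Prob_qi = ||A_R(:, qi)||_2^2 / ||A_R||_F^2
    probs = np.sum(A_R**2, axis=0) / np.sum(A_R**2)
    # Step 5: resample s columns into Y of shape (p, s)
    cols = rng.choice(A_R.shape[1], size=s, replace=True, p=probs)
    Y = A_R[:, cols]
    # Steps 6-7: orthonormal basis Q from the top-k left singular vectors of Y
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    Q = U[:, :k]
    # Step 8: project onto the basis, A_hat = Q Q^T A_R
    return Q @ Q.T @ A_R

# Illustrative run on a random stand-in for the reduced dataset:
A_R = np.random.default_rng(1).standard_normal((100, 18))
A_hat = rlra(A_R, k=3, s=10)
print(np.linalg.matrix_rank(A_hat))                       # 3
print(np.linalg.norm(A_R - A_hat) / np.linalg.norm(A_R))  # relative information loss
```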

2.3. Ensemble Learning

In general, ensemble learning builds on baseline classifiers combined by dependent or independent methods. In dependent methods, the result of one classifier affects the formation of the next classifier; boosting algorithms are examples [28,29,30]. In contrast, independent methods build each classifier separately on a subset of the data and combine their results [7,8,9,31]. Various ensemble techniques have been proposed that successfully improve predictive accuracy [17,18,20]. The final predictive model requires a proper combination of several learners, and the ways of combining these classifiers are divided into averaging and meta-learning. Simple ensemble techniques fall under averaging methods, whereas advanced ensemble techniques include stacking, blending, bagging, and boosting. As our proposed model aims to improve predictive accuracy, the stacking ensemble approach is discussed next.

2.3.1. Stacking Ensemble Learning

Stacking is a powerful ensemble learning mechanism. The base learners are stacked in parallel and combined with meta-learners to obtain better prediction accuracy. The general architecture of the stacking mechanism is given in Figure 4. It consists of two or more base-learner models and a meta-model combined to form a predictive model. The data are divided into folds and partitioned into training and testing sets. The base learners produce first-level predictions on the training data. The meta-model combines the first-level predictions, and the resulting prediction pattern is trained to obtain the final prediction accuracy.
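As a generic illustration of this mechanism, the scikit-learn sketch below stacks a random forest base learner under a meta-model trained on out-of-fold predictions. The logistic-regression meta-model and the synthetic data are placeholders, not the paper’s configuration (which uses DEBi-LSTM as the meta-classifier).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the groundwater records.
X, y = make_classification(n_samples=1000, n_features=18, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # placeholder meta-model
    cv=5,  # folds used to produce the first-level (Level 0) predictions
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```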

3. Proposed Research Methodology on Groundwater Level Prediction

3.1. Study Area Investigation and Data Pre-Processing

This paper investigates the groundwater level in India. Groundwater level data were collected from all districts of India from 2000 to 2021. As per the census of India, 593 districts were counted in the years 2001 to 2010, 640 districts from 2011 to 2020, and 773 districts in 2021. Thus, the data collected from 2000 to 2021 comprise 5930 + 6400 + 773 = 13,103 records. For a better understanding of the data pattern, India was divided into four divisions: East and Northeast India, encompassing the states of Arunachal Pradesh, Assam, Meghalaya, Nagaland, Manipur, Mizoram, Tripura, Sikkim, West Bengal, Jharkhand, and Bihar; Northwest India, including Uttar Pradesh, Uttarakhand, Haryana, Chandigarh, Delhi, Punjab, Himachal Pradesh, Jammu and Kashmir, and Rajasthan; Central India, including Madhya Pradesh, Gujarat, Dadra and Nagar Haveli, Daman and Diu, Goa, Maharashtra, and Chhattisgarh; and South India, including Andhra Pradesh, Telangana, Tamil Nadu, Puducherry, Karnataka, Kerala, Lakshadweep, and the Andaman and Nicobar Islands, as represented in Figure 5. Groundwater sampling was carried out from representative borewells in various districts of every state of India through web resources [32]. In consultation with domain experts, the variables that have a significant impact on GWL were listed in Table 5, and the data were grouped under three main factors, namely geological, topographic, and hydrological properties, as given in Figure 6. The collected data underwent pre-processing to check their consistency. Of the 13,103 records, 387 had missing data and 716 contained conflicting information, leading to their removal; thus, 12,000 records were retained for processing and 1103 were discarded. It is essential to identify the significant attributes that contribute most to the knowledge discovery process for groundwater levels. As the collected attributes are independent of each other, the inter-relations or inter-associations between them can be recognized using a multi-collinearity test with the help of variance inflation factors (VIF). Therefore, the following section deals with identifying significant attributes using the multi-collinearity test.

3.2. Feature Selection Using Multi Collinearity Test

This section identifies the significant attributes using a multi-collinearity test. For each attribute, the variance inflation factor (VIF) quantifies the multi-collinearity using Equation (1), where $R_i^2$ is the unadjusted coefficient of determination for the ith independent attribute regressed on the other attributes. The VIF threshold value $\tau$ was chosen from an understanding of the data pattern. For the collected data, with a dimension of 12,000 × 32, the VIF is calculated for each attribute, and the attributes that fail to contribute to the prediction process are discarded. The VIF calculation procedure is given in Algorithm 1, and the calculated VIF values for all 32 attributes are listed in Table 6. Figure 7 illustrates the attribute reduction process through which the better features are selected. The reduced sample dataset with 18 attributes is presented in Figure 8.

3.3. Attribute Approximation Using Randomized Low-Rank Approximation Method

After attribute selection based on the VIF number of each column, the rank of the reduced dataset was checked using MATLAB’s rank function, which identified the reduced dataset as having rank 18. The basics of RLRA are explained in Section 2.2 along with Algorithm 2. The sample dataset is depicted in Figure 8.
The real rank of the dataset was 18; approximation with the RLRA technique reduced it to rank 3. Figure 9 shows the approximated version of the sample dataset.
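The same rank check can be reproduced in numpy, with matrix_rank playing the role of MATLAB’s rank function; the file name below is hypothetical, and rlra refers to the sketch given in Section 2.2.

```python
import numpy as np

A_R = np.loadtxt("reduced_dataset.csv", delimiter=",")  # hypothetical file
print(np.linalg.matrix_rank(A_R))    # expected: 18 for the reduced dataset
A_hat = rlra(A_R, k=3, s=10)         # rlra sketch from Section 2.2
print(np.linalg.matrix_rank(A_hat))  # expected: 3 after RLRA
```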
To verify data authenticity, heat maps and similarity plots were created for both the original sample dataset and the approximated data, as shown in Figure 10 and Figure 11. The data distribution over the knowledge base is almost the same for both datasets; thus, data authenticity is maintained. The approximated and non-approximated datasets are then fed into the deep ensemble learning process.

3.4. Proposed Deep Ensemble Model

The main idea of the proposed ensemble method is to combine assorted base classifiers into different layers. The first layer (Level 0) trains a group of deep base learners on different partitions of the training dataset to form a so-called team. In the next layer, Level 1, a group of meta-classifiers is trained on the predictions of the Level 0 team. In the final layer, the predictions of the previous group of classifiers are combined by the meta-classifier to produce the results. Figure 12 shows the stacking mechanism of the proposed model, as discussed in Section 2.3.1.

3.4.1. Proposed Deep Ensemble Architecture

The proposed deep ensemble model uses random forest (RF) as the base learner and double-edged bi-directional LSTM (DEBi-LSTM) as the meta-classifier [33,34]. The reduced dataset from the randomized low-rank approximation is partitioned into training data of 7200 records (60%) and testing data of 4800 records (40%) and fed into the deep ensemble model. The overall learning architecture of the proposed ensemble is depicted in Figure 13. The architecture makes use of two levels of classifiers (1). Each training fold is fed into the random forest (RF) base learner (2). The output of the base learner in each fold is combined by the DEBi-LSTM meta-classifier, which produces the output. The proposed architecture is similar to a multi-layer perceptron, with Level 0 representing the input layer, Level 1 the hidden layer, and the final prediction the output layer. The meta-classifier (DEBi-LSTM) uses the ReLU activation function, taking its input from the base learner and producing the output in the final layer. The design of the proposed double-edged bi-directional LSTM is discussed next, before the training process.

3.4.2. Double-Edged Bi-Directional LSTM

The first bi-directional hidden layer is configured with 128 LSTM memory units and the ReLU activation function, with the return sequence set to true so that the output of the prior layer is passed to the subsequent layer. The first layer is narrowed by a dropout layer of 0.2. The second hidden layer is stacked with 64 LSTM memory units with the ReLU activation function, and its dropout layer is set to 0.1. Thereafter, two dense layers (of 32 and 3 units, respectively) are separated by a dropout layer of 0.2; a dense layer is the fully connected layer that follows this architecture to produce the prediction output. The design of the proposed DEBi-LSTM is represented in Figure 14.
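A Keras sketch of this design is shown below, assuming TensorFlow is available: two stacked bi-directional LSTM layers of 128 and 64 units with ReLU activations and dropouts of 0.2 and 0.1, followed by dense layers of 32 and 3 units separated by a 0.2 dropout. The input shape and the softmax output are illustrative assumptions; the learning rate and optimizer follow Table 7.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Input(shape=(12, 18)),  # hypothetical (timesteps, features) shape
    Bidirectional(LSTM(128, activation="relu", return_sequences=True)),
    Dropout(0.2),
    Bidirectional(LSTM(64, activation="relu")),
    Dropout(0.1),
    Dense(32, activation="relu"),
    Dropout(0.2),
    Dense(3, activation="softmax"),  # three GWL classes (safe/semi-critical/critical)
])
model.compile(optimizer=Adam(learning_rate=0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```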

3.4.3. Proposed Training Process

The training phase takes as input both the approximated (AP) data from RLRA and the non-approximated reduced (NAP) data, which are fed into the Level 0 base learner. Given the dataset $\mathrm{Input}_i^0$, the procedure selects n random samples of equal size from each dataset, and each sample is partitioned into training (60%) and testing (40%) data: $\mathrm{Input}_i^0 = (\mathrm{Train}_i^0, \mathrm{Test}_i^0)$. The Level 0 classifier is generated by applying RF to each $\mathrm{Train}_i^0$ with a learning rate $\lambda$; hence, the first fold with the $\lambda$ learning rate is called a $\lambda$-team. A collection of $\lambda$-teams is $GRP_i$, $1 \le i \le n$, represented as $GRP_i = \{RF_{i1}, RF_{i2}, \ldots, RF_{ik}\}$, where $i = 1$ to 5 and k indexes the $\lambda$-team. Each $\mathrm{Test}_i^0$ produces the prediction $F_{out}^0$ of sample n, which is fed as input to the next level, Level 1. Once Level 1 has received the previous layer’s predictions, the meta-classifier, double-edged bi-directional LSTM (DEBi-LSTM), is trained on $F_{out}^1$ as $\mathrm{Train}_i^1$. Subsequently, $\mathrm{Test}_i^1$ generates the data for the meta-classifier to produce the final output, $F_{out}^2$. The overall training process is illustrated in Algorithm 3.
Algorithm 3 Ensemble model for groundwater level prediction
     Parameters:
  1. $\mathrm{InputData}_{Ap,i}^0$: input matrix $[\hat{A}_{Ap}]_{p \times q}$ of the approximated data.
  2. $\mathrm{InputData}_{NAp,i}^0$: input matrix of the non-approximated data.
  3. $[\mathrm{Train}_{Ap}]_i^0$: training set of the approximated data for Level 0.
  4. $[\mathrm{Test}_{Ap}]_i^0$: testing set of the approximated data for Level 0.
  5. $Cl$: base-learner classifiers.
  6. $RF_{ij}$: random forest classifier for each fold of the training set.
  7. $GRP_i$: team of base-learner classifiers.
  8. $[[F_{out}]_{ij}^1]_{Ap}$: prediction of the Level 0 base-learner classifiers.
  9. $\mathrm{InputData}_{Ap,i}^1$: output matrix produced by Level 0, input to the Level 1 model.
  10. $\mathrm{DataPartition}_i^1$: partitioning of the Level 0 output into training and testing sets for Level 1.
  11. $\mathrm{DeepClf}$: deep classifier of the Level 1 model; here, DeepClf is DEBi-LSTM.
  12. $[[F_{out}]_{ij}^1]_{Ap}$: prediction of the Level 1 meta-classifier.
  13. $\mathrm{FinalModel}$: maximum value among the two input datasets.
  14. $\mathrm{BestFit}$: final prediction.
Procedure: input
  1: $\mathrm{InputData}\ [\hat{A}_{Ap}]_{p \times q} = [\mathrm{Train}_{Ap}]_i^0 \cup [\mathrm{Test}_{Ap}]_i^0$, $1 \le i \le n$, where p is the number of rows and q the number of columns of the input matrix.
Procedure: base classifier, Level 0
  2: $Cl = \{Cl_1, Cl_2, \ldots, Cl_k\}$ is the set of k base-classifier algorithms.
  3: for each $[\mathrm{Train}_{Ap}]_i^0$, $1 \le i \le n$:
  4:     for each $Cl_j \in Cl$, $1 \le j \le k$:
  5:         $RF_{ij} \leftarrow \mathrm{fit}(Cl_j, [\mathrm{Train}_{Ap}]_i^0)$.
  6:     $GRP_i \leftarrow [RF_{i1}, RF_{i2}, \ldots, RF_{ik}]$, $1 \le i \le n$.
  7: for each $RF_{ij} \in GRP_i$, $1 \le i \le n$, $1 \le j \le k$:
  8:     $[[F_{out}]_{ij}^1]_{Ap} \leftarrow$ prediction of $RF_{ij}([\hat{A}_{Ap}]_{p \times q}^0)$.
  9: $\mathrm{InputData}_{Ap,i}^1 \leftarrow \mathrm{stack}([F_{out}]_{i1}^1, [F_{out}]_{i2}^1, \ldots, [F_{out}]_{ik}^1)$, $1 \le i \le n$.
Procedure: meta-classifier, Level 1
  10: $\mathrm{DataPartition}_i^1 = [\mathrm{Train}_{Ap}]_i^1 \cup [\mathrm{Test}_{Ap}]_i^1$, $1 \le i \le n$.
  11: $\mathrm{DeepClf} = \{Dcl_1, Dcl_2, \ldots, Dcl_n\}$ is a set of n deep classifiers.
  12: for each $[\mathrm{Train}_{Ap}]_i^1$, $1 \le i \le n$:
  13:     $\mathrm{DeepClf}_j \leftarrow \mathrm{fit}(Dcl_i, [\mathrm{Train}_{Ap}]_i^1)$, $1 \le j \le n$.
Procedure: predictions
  14: for each $[\mathrm{Test}_{Ap}]_i^1$, $1 \le i \le n$:
  15:     for each $\mathrm{DeepClf}_j \in \mathrm{DeepClf}$:
  16:         $[[F_{out}]_{ij}^1]_{Ap} \leftarrow$ prediction of $\mathrm{DeepClf}_j([\hat{A}_{Ap}]^1)$.
  17: The same procedure is carried out for the non-approximated data to obtain $[[F_{out}]_{ij}^1]_{NAp}$.
Procedure: best-fit model
  18: $\mathrm{FinalModel} = \mathrm{max\_value}([[F_{out}]_{ij}^1]_{Ap}, [[F_{out}]_{ij}^1]_{NAp})$.
  19: Model $\leftarrow$ DeepClassifier.
  20: $\mathrm{BestFit} \leftarrow \mathrm{fit}(\mathrm{Model}, \mathrm{FinalModel})$.
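A condensed Python sketch of this two-level flow is given below; it is not the authors’ exact implementation. It assumes numpy, scikit-learn, and a compiled Keras meta-model debi_lstm whose input shape matches the stacked Level 0 class probabilities (one timestep of three values). The fold count, epochs, and batch size follow Section 4 and Table 7.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def train_dem(X, y, debi_lstm, n_folds=12, epochs=2, batch_size=32):
    """Level 0: one RF per fold; Level 1: DEBi-LSTM on stacked predictions."""
    team, fold_preds = [], np.zeros((len(X), 3))
    for tr_idx, te_idx in KFold(n_splits=n_folds, shuffle=True,
                                random_state=0).split(X):
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        rf.fit(X[tr_idx], y[tr_idx])                      # Level 0 base learner
        team.append(rf)
        fold_preds[te_idx] = rf.predict_proba(X[te_idx])  # out-of-fold F_out^0
    meta_X = fold_preds[:, None, :]  # add a timestep axis for the LSTM input
    debi_lstm.fit(meta_X, y, epochs=epochs, batch_size=batch_size)
    return team, debi_lstm
```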

4. Experimental Analysis According to Deep Ensemble Model

This section presents the experimental analysis. The base classifiers were implemented in Google Colab, and the meta-classifier was implemented with the scikit-learn Python library, which includes various machine learning algorithms. To evaluate the proposed deep ensemble approach, the experiment was conducted on the two reduced groundwater level datasets: the approximated dataset and the non-approximated dataset. To train the baseline classifiers in the teams, the training data must be divided using a data partitioning method. Our proposed model adopted the k-fold partitioning method, which randomly divides the training set into equal-size partitions. As the training data total 7200 objects, there were 12 partitions, each holding 600 training objects. A team of base learners (random forest) was applied to each split in Level 0. The predictions from the base learner were passed to the DEBi-LSTM model of Level 1 to obtain the final classification accuracy on both the approximated and non-approximated datasets. The empirical analysis of the proposed ensemble model with its hyperparameters is provided in Table 7 and Table 8.

Performance Analysis Using Evaluation Metrics

The performance of the proposed model was analyzed using the following evaluation metrics: accuracy $A_c$ (Equation (7)), precision $P_r$ (Equation (8)), recall $R_c$ (Equation (9)), and F1 score $F_s$ (Equation (10)). These metrics are built from four outcome counts: true correct classifications ($TC_c$), true non-correct classifications ($TNC_c$), false correct classifications ($FC_c$), and false non-correct classifications ($FNC_c$).
$A_c = \dfrac{TC_c + TNC_c}{TC_c + TNC_c + FC_c + FNC_c}$
$P_r = \dfrac{TC_c}{TC_c + FC_c}$
$R_c = \dfrac{TC_c}{TC_c + FNC_c}$
$F_s = \dfrac{2\, P_r R_c}{P_r + R_c}$
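These four formulas translate directly into code; the following sketch assumes scalar counts of the four outcome types and purely illustrative values.

```python
def evaluation_metrics(tcc: int, tncc: int, fcc: int, fncc: int):
    ac = (tcc + tncc) / (tcc + tncc + fcc + fncc)  # accuracy, Eq. (7)
    pr = tcc / (tcc + fcc)                         # precision, Eq. (8)
    rc = tcc / (tcc + fncc)                        # recall, Eq. (9)
    fs = 2 * pr * rc / (pr + rc)                   # F1 score, Eq. (10)
    return ac, pr, rc, fs

print(evaluation_metrics(tcc=80, tncc=10, fcc=5, fncc=5))  # illustrative counts
```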

5. Result and Discussion

The performance of the proposed DEM was verified by tuning hyper-parameters such as the batch size, number of epochs, number of memory units, dropout rate, learning rate, and optimizer, as listed in Table 7. By comparing the training performance across the choice options, the best hyper-parameters were identified. Table 8 shows the classification performance for the approximated and non-approximated datasets; each dataset was analyzed independently with the three batch sizes. At a batch size of 32, the approximated dataset reached the highest classification accuracy of 91.26%, whereas the non-approximated dataset reached 89.26%. Moreover, the 12-split configuration provided the best classification accuracy. This supports the argument that the approximated dataset is superior to the non-approximated dataset.
The training performance with the various batch sizes and iteration counts is shown in Table 9. With a batch size of 32, the approximated dataset (AP) requires the lowest number of iterations to achieve the best training performance with respect to $A_c$, $P_r$, $R_c$, and $F_s$ compared with the non-approximated dataset. Additionally, it is observed that the proposed DEM produces its best performance with a batch size of 32, as represented in Figure 15.
Based on the classification accuracy obtained using the approximated and non-approximated data, along with the evaluation of the performance analysis, we had the confidence to select the approximated dataset with a batch size of 32 to obtain the prediction accuracy. The obtained prediction accuracy is listed in Table 10.
From Table 10, batch size 32 produced 96.1% prediction accuracy for the approximated dataset, whereas the non-approximated dataset reached 93.76%; the 12-split configuration again provided the best accuracy. Thus, there is an increase in prediction accuracy of 2.34% for the approximated dataset. For the approximated dataset, the evaluation metrics $P_r$, $R_c$, and $F_s$ are 93.87, 85.14, and 89.29, respectively, as shown in Figure 16.

6. Comparative Analysis with Existing Research Methods

A comparative analysis was carried out between the proposed model and the LSTM model of [35], the bagging-ensemble model of [20], and the ensemble model of [18]. Table 11 justifies the choice of the proposed model and demonstrates the prediction analysis on the approximated dataset, where the proposed DEM achieved the highest $P_r = 93.87$, $R_c = 85.14$, and $F_s = 89.29$ compared with the existing deep learning and ensemble models. The ensemble model of [18] is designed with conventional machine learning techniques, so when the data size is huge and the model must work over varied data characteristics, its prediction performance is reduced. Between the LSTM [35] and the bagging ensemble [20], the bagging ensemble performs better, because the prediction process of the conventional LSTM is suppressed by the overfitting problem.
Figure 17 demonstrates a comparative analysis of accuracy with a split of 8. The proposed DEM achieved the comparatively highest accuracy, 96.1%, using the approximated dataset with a batch size of 32 among the existing benchmarked deep learning and ensemble models.

7. Managerial Implications of the Proposed Groundwater Prediction Process

The depletion of groundwater levels is a major concern for India, which is home to about 16% of the world’s population. Moreover, irrigation, domestic demand, and industrial use impose over-exploitation on the country’s groundwater [36]. This water crisis may lead to severe water stress in the near future. Recognizing this crisis, the study highlights some of the factors of concern, considered in terms of their parametric values:
  • Natural water rechargeability is average.
  • Its utility for irrigation, industrial, and domestic purposes is high.
  • Population growth is huge.
Erratic water rechargeability and increasing water demand endanger the states of Northwest India with water stress, as well as Gujarat, Madhya Pradesh, and Maharashtra in Central India and Telangana, Tamil Nadu, and Puducherry in South India. Table 12 shows the rainfall pattern of the past 8 years for these states [37]. These states are recorded as critical to semi-critical zones of highly stressed GWL. Using the proposed DEM process, this study has categorized the state-wise GWL of India, as represented in Figure 18. The GWL is strongly correlated with annual rainfall, and it is gradually reduced by the excessive needs of power generation, population demand, irrigation, and industrialization. The present study thus flags the states of India according to their predicted GWL. As natural water sources are dwindling due to various climatic changes, water usage must be reduced by a measured amount; then, the seasonal uncertainty of rainfall would affect the GWL less, while still helping to fulfill the essential demands of daily living. This effort will help prevent the decline of the water table in India.

8. Conclusions and Future Work

The reliable prediction and accurate estimation of groundwater level depletion, which refine the efficacy of water usage, lead to a better, more sustainable water resource management system. The prediction capability rests on the selection of substantial attributes. The attributes considered in this study depend on three contingency factors, geological, topographic, and hydrological properties, which represent the closeness of the attributes to groundwater sustainability. The weather system in tropical regions such as India is controlled by convection and radiation and involves a non-linear association with rainfall, which is the main source of groundwater. Thus, the salient features were identified from the collected data using the variance inflation factor. As significant features contribute better predictions only on consistent data, consistency was obtained by applying the randomized low-rank approximation technique to the reduced dataset. The consistent data were partitioned into training and testing sets and employed in the proposed deep ensemble model for the classification and prediction process. Both the approximated and non-approximated datasets were fed to the proposed model to check the classification accuracy, as discussed in Section 4. Section 5 analyzes the experimental results and shows that, with improved hyperparameter tuning of the meta-classifier, the approximated data yield a 2% increase in classification accuracy over the non-approximated data; similarly, the prediction accuracy increased by 2.34% for the approximated data. Additionally, the proposed deep ensemble model proved to have better classification and prediction capacities when compared against the benchmarked existing techniques in Section 6. Since groundwater levels are strongly dependent on rainfall, and annual rainfall may diminish or overflow under various climatic changes, fluctuations arise in the GWL. The accuracies obtained by the proposed model were therefore validated against the GWL through 2022, and the model predictions and the status in the government’s categorization were found to be almost the same, as shown in Figure 18, where groundwater levels are categorized as critical, semi-critical, or safe. The proposed model can be further improved with optimization techniques that tune the hyperparameters to reduce computation time and improve accuracy.

Author Contributions

All authors equally contributed to the manuscript. Conception and design, material preparation, data collection, and analysis were performed by T.M. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UNICEF       United Nations Children’s Fund
NASA         National Aeronautics and Space Administration
GRACE        Gravity Recovery and Climate Experiment
GWL          Groundwater level
AI           Artificial intelligence
RF           Random forest
ML           Machine learning
ANN          Artificial neural network
RMSE         Root-mean-square error
SVR          Support-vector regression
ANFIS        Adaptive neuro-fuzzy inference system
DL           Deep learning
LSTM         Long short-term memory
SARIMA       Seasonal autoregressive integrated moving average
SVM          Support-vector machine
BRT          Boosted regression trees
CART         Classification and regression tree
GRU          Gated recurrent units
MLP          Multi-layer perceptron
RNN          Recurrent neural network
RLRA         Randomized low-rank approximation
VIF          Variance inflation factor
DEM          Deep ensemble model
DEBi-LSTM    Double-edged bi-directional LSTM

References

  1. Water Scarcity. Available online: https://www.unicef.org/wash/water-scarcity (accessed on 20 May 2022).
  2. Space Applications Centre, ISRO. Desertification and Land Degradation Atlas of Selected Districts of India (Based on IRS LISS III data of 2011–13 and 2003–05); Space Applications Centre (ISRO): Ahmedabad, India, 2016; pp. 1–219.
  3. Chindarkar, N.; Grafton, R.Q. India’s depleting groundwater: When science meets policy. Asia Pac. Policy Stud. 2019, 6, 108–124.
  4. Rodell, M.; Houser, P.R.; Jambor, U.; Gottschalck, J.; Mitchell, K.; Meng, C.-J.; Arsenault, K.; Cosgrove, B.; Radakovich, J.; Bosilovich, M.; et al. The Global Land Data Assimilation System. Bull. Am. Meteorol. Soc. 2004, 85, 381–394.
  5. India State Map, List of States in India. Available online: https://www.whereig.com/india/states/ (accessed on 20 May 2022).
  6. Knoll, L.; Breuer, L.; Bach, M. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. 2019, 668, 1317–1327.
  7. Majumdar, S.; Smith, R.; Butler, J.J., Jr.; Lakshmi, V. Groundwater withdrawal prediction using integrated multitemporal remote sensing data sets and machine learning. Water Resour. Res. 2020, 56, e2020WR028059.
  8. Adiat, K.A.N.; Ajayi, O.F.; Akinlalu, A.A.; Tijani, I.B. Prediction of groundwater level in basement complex terrain using artificial neural network: A case of Ijebu-Jesa, southwestern Nigeria. Appl. Water Sci. 2020, 10, 1–14.
  9. Hussein, E.A.; Thron, C.; Ghaziasgar, M.; Bagula, A.; Vaccari, M. Groundwater prediction using machine-learning tools. Algorithms 2020, 13, 300.
  10. Banadkooki, F.B.; Ehteram, M.; Ahmed, A.N.; Teo, F.Y.; Fia, C.M.; Afan, H.A.; Sapitang, M.; Shafie, A.E. Enhancement of groundwater-level prediction using an integrated machine learning model optimized by whale algorithm. Nat. Resour. Res. 2020, 29, 3233–3252.
  11. Huang, X.; Gao, L.; Crosbie, R.S.; Zhang, N.; Fu, G.; Doble, R. Groundwater recharge prediction using linear regression, multi-layer perception network, and deep learning. Water 2019, 11, 1879.
  12. Chen, Y.; Chen, W.; Chandra Pal, S.; Saha, A.; Chowdhuri, I.; Adeli, B.; Janizadeh, S.; Dineva, A.A.; Wang, X.; Mosavi, A. Evaluation efficiency of hybrid deep learning algorithms with neural network decision tree and boosting methods for predicting groundwater potential. Geocarto Int. 2021, 37, 1–21.
  13. Kochhar, A.; Singh, H.; Sahoo, S.; Litoria, P.K.; Pateriya, B. Prediction and forecast of pre-monsoon and post-monsoon groundwater level: Using deep learning and statistical modelling. Model. Earth Syst. Environ. 2022, 8, 2317–2329.
  14. Sun, J.; Hu, L.; Li, D.; Sun, K.; Yang, Z. Data-driven models for accurate groundwater level prediction and their practical significance in groundwater management. J. Hydrol. 2022, 608, 127630.
  15. Oyedele, A.A.; Ajayi, A.O.; Oyedele, L.O.; Bello, S.A.; Jimoh, K.O. Performance evaluation of deep learning and boosted trees for cryptocurrency closing price prediction. Expert Syst. Appl. 2022, 213, 119233.
  16. Jiménez-Mesa, C.; Ramírez, J.; Suckling, J.; Vöglein, J.; Levin, J.; Górriz, J.M.; DIAN. Deep learning in current neuroimaging: A multivariate approach with power and type I error control but arguable generalization ability. arXiv 2021, arXiv:2103.16685.
  17. Mosavi, A.; Hosseini, F.S.; Choubin, B.; Abdolshahnejad, M.; Gharechaee, H.; Lahijanzadeh, A.; Dineva, A.A. Susceptibility prediction of groundwater hardness using ensemble machine learning models. Water 2020, 12, 2770.
  18. Yin, J.; Medellín-Azuara, J.; Escriva-Bou, A.; Liu, Z. Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change. Sci. Total Environ. 2021, 769, 144715.
  19. Jiang, F.; Yu, X.; Du, J.; Gong, D.; Zhang, Y.; Peng, Y. Ensemble learning based on approximate reducts and bootstrap sampling. Inf. Sci. 2021, 547, 797–813.
  20. Mosavi, A.; Hosseini, F.S.; Choubin, B.; Goodarzi, M.; Dineva, A.A.; Rafiei Sardooi, E. Ensemble boosting and bagging based machine learning models for groundwater potential prediction. Water Resour. Manag. 2021, 35, 23–37.
  21. Lee, J.; Hong, H.; Song, J.M.; Yeom, E. Neural network ensemble model for prediction of erythrocyte sedimentation rate (ESR) using partial least squares regression. Sci. Rep. 2022, 12, 1–13.
  22. Li, Y.; Pan, Y. A novel ensemble deep learning model for stock prediction based on stock prices and news. Int. J. Data Sci. Anal. 2022, 13, 139–149.
  23. Ngo, N.T.; Pham, A.D.; Truong, T.T.H.; Truong, N.S.; Huynh, N.T.; Pham, T.M. An ensemble machine learning model for enhancing the prediction accuracy of energy consumption in buildings. Arab. J. Sci. Eng. 2022, 47, 4105–4117.
  24. Kumar, N.K.; Schneider, J. Literature survey on low rank approximation of matrices. Linear Multilinear Algebra 2017, 65, 2212–2244.
  25. Sapp, B.J. Randomized Algorithms for Low-Rank Matrix Decomposition; Computer and Information Science, University of Pennsylvania: Philadelphia, PA, USA, 2011; pp. 1–43.
  26. Eckart, C.; Young, G. The approximation of one matrix by another of lower rank. Psychometrika 1936, 1, 211–218.
  27. Halko, N.; Martinsson, P.G.; Tropp, J.A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 2011, 53, 217–288.
  28. Mohammed, A.; Kora, R. An effective ensemble deep learning framework for text classification. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 8825–8837.
  29. Chen, W.; Lei, X.; Chakrabortty, R.; Pal, S.C.; Sahana, M.; Janizadeh, S. Evaluation of different boosting ensemble machine learning models and novel deep learning and boosting framework for head-cut gully erosion susceptibility. J. Environ. Manag. 2021, 284, 112015.
  30. Konstantinov, A.V.; Utkin, L.V. Interpretable machine learning with an ensemble of gradient boosting machines. Knowl.-Based Syst. 2021, 222, 106993.
  31. Ma, M.; Liu, C.; Wei, R.; Liang, B.; Dai, J. Predicting machine’s performance record using the stacked long short-term memory (LSTM) neural networks. J. Appl. Clin. Med. Phys. 2022, 23, e13558.
  32. Ground Water Data Access. Available online: http://cgwb.gov.in/GW-data-access.html (accessed on 20 May 2022).
  33. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  34. Hochreiter, S. Untersuchungen zu Dynamischen Neuronalen Netzen. Diploma Thesis, TU Munich, Munich, Germany, 1991.
  35. Park, C.; Chung, I.M. Evaluating the groundwater prediction using LSTM model. J. Korea Water Resour. Assoc. 2020, 53, 273–283.
  36. India Groundwater: A Valuable but Diminishing Resource. Available online: https://www.worldbank.org/en/news/feature/2012/03/06/india-groundwater-critical-diminishing (accessed on 20 May 2022).
  37. India Water Resources Information System. Available online: https://indiawris.gov.in/wris/#/groundWater (accessed on 20 May 2022).
Figure 1. State-wise groundwater level fluctuation (2010–2020).
Figure 2. Heatmap of reduced information system and approximated information system.
Figure 3. Similarity check against reduced information system and approximated information system.
Figure 4. Generalized architecture of ensemble learning process.
Figure 5. Four divisions of India.
Figure 6. Properties of input variables.
Figure 7. Attribute reduction.
Figure 8. Sample dataset.
Figure 9. Approximated dataset using RLRA.
Figure 10. Heat map for sample and approximated dataset.
Figure 11. Similarity between sample dataset and approximated dataset.
Figure 12. Stacking mechanism of proposed model.
Figure 13. Architecture of proposed ensemble model.
Figure 14. Design of proposed DEBi-LSTM.
Figure 15. Training performance with a batch size of 32.
Figure 16. Testing performance with a batch size of 32.
Figure 17. Comparative analysis with deep learning [35], benchmarked ensemble models [18,20], and the proposed model with the approximated dataset.
Figure 18. State-wise GWL of India as predicted by the DEM.
Table 1. Information system.

| Cl (a1) (ppm) | Mg (a2) (ppm) | Mn (a3) (ppm) | Fe (a4) (ppm) | Ca (a5) (ppm) | Zn (a6) (ppm) | Cu (a7) (ppm) |
|---|---|---|---|---|---|---|
| 0.22 | 31.3 | 35.3 | 19.98 | 38.2 | 2.67 | 2 |
| 0.67 | 42.4 | 44.7 | 6.01 | 35.8 | 4.97 | 3 |
| 0.22 | 43.1 | 39.4 | 8.89 | 17 | 3.11 | 2 |
| 0.35 | 22.4 | 59.5 | 12.25 | 11.3 | 6.34 | 1 |
| 0.7 | 52.8 | 79.8 | 3.23 | 2.5 | 4.81 | 4 |
| 0.13 | 43.3 | 60.9 | 17.12 | 6.2 | 7.32 | 4 |
| 0.8 | 29.1 | 99.2 | 8.52 | 7.1 | 2.64 | 3 |
| 0.44 | 34.5 | 62.7 | 6.62 | 15.6 | 1.37 | 2 |
| 0.1 | 37.2 | 34.4 | 16.21 | 15.3 | 2.22 | 1 |
| 0.9 | 41.7 | 84.1 | 1.98 | 11.02 | 3.71 | 2 |
Table 2. VIF numbers with the corresponding attribute values.

| Summary | a1 | a2 | a3 | a4 | a5 | a6 | a7 |
|---|---|---|---|---|---|---|---|
| Multiple R | 0.927 | 0.681 | 0.425 | 0.433 | 0.819 | 0.399 | 0.895 |
| R Square | 0.860 | 0.464 | 0.181 | 0.187 | 0.671 | 0.159 | 0.802 |
| Adjusted R Square | 0.438 | −0.429 | −0.638 | −0.300 | 0.561 | 0.039 | 0.207 |
| Standard Error | 0.224 | 10.686 | 4.369 | 6.059 | 6.135 | 1.926 | 1.006 |
| VIF value | 7.120 | 1.866 | 1.221 | 1.231 | 3.037 | 1.190 | 5.047 |
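The VIF row of Table 2 follows directly from the definition VIF_j = 1/(1 − R_j²), where R_j² is obtained by regressing attribute j on the remaining attributes. The minimal sketch below (our own helper, not code from the paper; an ordinary least-squares fit with an intercept is assumed) recomputes these values from the Table 1 data, up to rounding:

```python
import numpy as np

# Table 1 information system: columns a1..a7 (Cl, Mg, Mn, Fe, Ca, Zn, Cu).
X = np.array([
    [0.22, 31.3, 35.3, 19.98, 38.2, 2.67, 2],
    [0.67, 42.4, 44.7,  6.01, 35.8, 4.97, 3],
    [0.22, 43.1, 39.4,  8.89, 17.0, 3.11, 2],
    [0.35, 22.4, 59.5, 12.25, 11.3, 6.34, 1],
    [0.70, 52.8, 79.8,  3.23,  2.5, 4.81, 4],
    [0.13, 43.3, 60.9, 17.12,  6.2, 7.32, 4],
    [0.80, 29.1, 99.2,  8.52,  7.1, 2.64, 3],
    [0.44, 34.5, 62.7,  6.62, 15.6, 1.37, 2],
    [0.10, 37.2, 34.4, 16.21, 15.3, 2.22, 1],
    [0.90, 41.7, 84.1,  1.98, 11.02, 3.71, 2],
])

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing attribute j
    (plus an intercept) on the remaining attributes."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y, others = X[:, j], np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # OLS fit
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

print(np.round(vif(X), 3))  # should be close to the "VIF value" row of Table 2
```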
Table 3. Reduced information system.

| a2 | a3 | a4 | a5 | a6 |
|---|---|---|---|---|
| 31.3 | 35.3 | 19.98 | 38.2 | 2.67 |
| 42.4 | 44.7 | 6.01 | 35.8 | 4.97 |
| 43.1 | 39.4 | 8.89 | 17 | 3.11 |
| 22.4 | 59.5 | 12.25 | 11.3 | 6.34 |
| 52.8 | 79.8 | 3.23 | 2.5 | 4.81 |
| 43.3 | 60.9 | 17.12 | 6.2 | 7.32 |
| 29.1 | 99.2 | 8.52 | 7.1 | 2.64 |
| 34.5 | 62.7 | 6.62 | 15.6 | 1.37 |
| 37.2 | 34.4 | 16.21 | 15.3 | 2.22 |
| 41.7 | 84.1 | 1.98 | 11.02 | 3.71 |
Table 4. Approximate information system.

| a2 | a3 | a4 | a5 | a6 |
|---|---|---|---|---|
| 31.48 | 5.61 | 20.55 | 37.19 | 3.12 |
| 43.15 | 5.48 | 6.42 | 36.05 | 4.55 |
| 42.70 | 10.39 | 8.49 | 15.62 | 3.57 |
| 23.11 | 10.97 | 12.04 | 13.25 | 5.86 |
| 54.37 | 10.02 | 3.28 | 31.38 | 4.62 |
| 42.65 | 1.32 | 16.96 | 28.22 | 7.70 |
| 30.28 | 9.55 | 8.82 | 25.56 | 3.57 |
| 33.14 | 2.55 | 6.39 | 18.89 | 1.95 |
| 39.07 | 4.60 | 15.78 | 16.38 | 3.17 |
| 43.00 | 4.71 | 1.46 | 10.31 | 2.68 |
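The paper's exact RLRA variant is not reproduced here; the following is only a minimal sketch of a standard randomized low-rank approximation (a Gaussian sketch followed by QR and a truncated SVD, in the style of randomized SVD). The helper name, the oversampling amount, and the target rank are our assumptions. Applied to the reduced information system of Table 3, a routine of this kind produces an approximation of the sort shown in Table 4:

```python
import numpy as np

def randomized_low_rank(A, rank, n_oversample=5, seed=0):
    """Rank-`rank` approximation of A via a randomized range finder.

    1. Sketch the column space of A with a Gaussian test matrix.
    2. Orthonormalize the sketch (QR) to get a basis Q for range(A).
    3. SVD the small projected matrix B = Q^T A and truncate.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + n_oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for range(A)
    B = Q.T @ A                               # small (rank + p) x n matrix
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    # Reassemble only the leading `rank` components.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Usage sketch: with R the 10 x 5 matrix of Table 3,
# A_hat = randomized_low_rank(R, rank=2)   # rank 2 is an illustrative choice
```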
Table 5. Notation table.

| Contingency Factor | Attribute Name | Notation | Possible Range | Max Value |
|---|---|---|---|---|
| Geological | Depth of water | a2 | 0–4.21 | 4.21 |
| Geological | Extractable GW resource | a5 | 0.005–66.88 | 66.88 |
| Geological | Total GW extraction | a6 | 0.005–46.03 | 46.03 |
| Geological | GW availability | a7 | 0.002–21.53 | 21.53 |
| Geological | Aquatic vegetation land | a20 | 0–228,174 | 228,174 |
| Geological | Cultivable land | a21 | 1.5–25,503.7 | 25,503.7 |
| Geological | Nitrogen | a22 | 1–2.98 | 2.98 |
| Geological | Phosphorus | a23 | 1.02–2.54 | 2.54 |
| Geological | Potassium | a24 | 1.03–2.69 | 2.69 |
| Geological | Organic carbon | a25 | 1.12–2.99 | 2.99 |
| Geological | Boron | a26 | 1–2 | 2 |
| Geological | Copper | a27 | 1.38–2 | 2 |
| Geological | Iron | a28 | 1.34–2 | 2 |
| Geological | Manganese | a29 | 1.28–2 | 2 |
| Geological | Sulphur | a30 | 1–2 | 2 |
| Geological | Zinc | a31 | 1.38–1.99 | 1.99 |
| Geological | GWL decision | a32 | 1–3 | 3 |
| Topographic | Land degradation | a12 | 42–18,034,066 | 18,034,066 |
| Topographic | Wasteland | a13 | 81.27–175,697 | 175,697 |
| Topographic | Geo area | a16 | 3000–34,223,900 | 34,223,900 |
| Topographic | Forest ecosystem | a17 | 2591–84,147 | 84,147 |
| Topographic | Wetland | a18 | 350–3,474,950 | 3,474,950 |
| Hydrological | Rainfall | a1 | 351.8–4489.5 | 4489.5 |
| Hydrological | GW recharge | a3 | 0.01–72.2 | 72.2 |
| Hydrological | Natural GW discharge | a4 | 0.001–5.32 | 5.32 |
| Hydrological | GW temperature | a8 | 0–39 | 39 |
| Hydrological | GW dissolved O2 | a9 | 0–9.1 | 9.1 |
| Hydrological | GW pH | a10 | 0–9.7 | 9.7 |
| Hydrological | GW nitrate | a11 | 0–50 | 50 |
| Hydrological | Soil depth | a14 | 25–50 | 50 |
| Hydrological | Soil pH | a15 | 5–8.4 | 8.4 |
| Hydrological | Open water | a19 | 242–1,150,755 | 1,150,755 |
Table 6. VIF calculation for the 32 attributes.

| Attribute | VIF | Attribute | VIF | Attribute | VIF | Attribute | VIF |
|---|---|---|---|---|---|---|---|
| a1 | 1.2 | a9 | 2.99 | a17 | 3 | a25 | 1.01 |
| a2 | 3.8 | a10 | 2.48 | a18 | 10.2 | a26 | 1.63 |
| a3 | 1.6 | a11 | 1.34 | a19 | 9.25 | a27 | 7.89 |
| a4 | 1.04 | a12 | 9.7 | a20 | 2.25 | a28 | 4.75 |
| a5 | 8.27 | a13 | 1 | a21 | 2.46 | a29 | 4.35 |
| a6 | 7.78 | a14 | 2.22 | a22 | 3.59 | a30 | 8.66 |
| a7 | 5.07 | a15 | 1.79 | a23 | 3.55 | a31 | 3.32 |
| a8 | 1.35 | a16 | 9.93 | a24 | 5.12 | a32 | 1.2 |
Table 7. Choices and best choice of hyper-parameters.

| Hyper-Parameter | Choice Options | Best Choice |
|---|---|---|
| Batch size | 64, 32, 20 | 32 |
| Number of epochs | 2, 4, 6, 8 | 2 |
| Number of memory units | (256, 128), (128, 64), (64, 32), (32, 10) | (128, 64) |
| Dropout rate | 0.2, 0.1 | 0.1 |
| Learning rate | 0.01 | 0.01 |
| Optimizer | ADAM, RMSProp | ADAM |
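To make Table 7 concrete, the sketch below wires the best choices (memory units (128, 64), dropout 0.1, ADAM with learning rate 0.01) into a stacked bidirectional LSTM in Keras. The exact layer layout of the DEBi-LSTM is not reproduced; a two-layer Bi-LSTM with a three-class softmax head (safe/semi-critical/critical, matching the range of the GWL decision attribute a32 in Table 5) is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm(n_timesteps, n_features, n_classes=3):
    """Stacked Bi-LSTM using the best hyper-parameters from Table 7.
    The layer layout is an assumption, not the paper's exact DEBi-LSTM."""
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.1),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.1),
        # Three GWL classes; integer labels assumed to be shifted to 0..2.
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Training with model.fit(X_train, y_train, batch_size=32, epochs=2)
# would then match the best batch size and epoch count in Table 7.
```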
Table 8. Classification accuracy of approximated and non-approximated data on the DEM.

| Split | Approx. Batch-64 | Approx. Batch-32 | Approx. Batch-20 | Non-Approx. Batch-64 | Non-Approx. Batch-32 | Non-Approx. Batch-20 |
|---|---|---|---|---|---|---|
| 1 | 78.51 | 81.07 | 72.33 | 74.17 | 77.42 | 70.12 |
| 2 | 79 | 81.36 | 73.87 | 74.78 | 77.57 | 70.46 |
| 3 | 79.76 | 81.69 | 75.29 | 75.51 | 78.21 | 70.93 |
| 4 | 80.42 | 82.11 | 76.81 | 77.03 | 79.15 | 71.47 |
| 5 | 81.19 | 82.9 | 78.24 | 78.68 | 79.82 | 72.22 |
| 6 | 83.24 | 84.18 | 79.47 | 80.74 | 81.93 | 74.38 |
| 7 | 84.49 | 85.73 | 80.32 | 82.37 | 82.76 | 75 |
| 8 | 84.93 | 87.1 | 80.7 | 82.9 | 85.37 | 76.4 |
| 9 | 85.57 | 88.56 | 82.66 | 84.14 | 87.14 | 78.61 |
| 10 | 86.14 | 90.2 | 84.1 | 84.39 | 88.63 | 80.14 |
| 11 | 86.92 | 90.78 | 84.86 | 85.55 | 89.02 | 82.44 |
| 12 | 87.34 | 91.26 | 85.13 | 85.82 | 89.26 | 83.03 |
Table 9. Training iterations for the approximated (AP) and non-approximated (NAP) datasets.

| Batch Size | Iteration NAP | Iteration AP | Accuracy NAP | Accuracy AP | Precision NAP | Precision AP | Recall NAP | Recall AP | F1 NAP | F1 AP |
|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 210 | 160 | 69.2 | 76.9 | 33.33 | 43.75 | 44.45 | 34.37 | – | 46.43 |
| 32 | 190 | 130 | 89.26 | 91.26 | 82.49 | 91.67 | 62.5 | 97.22 | 71.12 | 94.36 |
| 20 | 250 | 140 | 78.4 | 84.6 | 56.25 | 83.33 | 59.72 | 84.72 | 50.2 | 77.7 |
Table 10. Prediction accuracy for the approximated and non-approximated datasets.

| Split | Approx. Batch-64 | Approx. Batch-32 | Approx. Batch-20 | Non-Approx. Batch-64 | Non-Approx. Batch-32 | Non-Approx. Batch-20 |
|---|---|---|---|---|---|---|
| 1 | 84.77 | 89.14 | 78.96 | 80.47 | 86.31 | 77.26 |
| 2 | 85.37 | 89.64 | 79.25 | 80.72 | 87.53 | 78.1 |
| 3 | 86.48 | 90.27 | 80.76 | 82.25 | 89.03 | 78.59 |
| 4 | 87.13 | 92.96 | 83.31 | 83.39 | 89.86 | 80.73 |
| 5 | 87.79 | 93.57 | 84.7 | 84.67 | 90.7 | 82.91 |
| 6 | 88.18 | 94.81 | 86.24 | 85.11 | 91.48 | 83.63 |
| 7 | 89.57 | 95.73 | 86.59 | 86.52 | 92.89 | 84.47 |
| 8 | 90.28 | 96.1 | 87.66 | 87.41 | 93.76 | 85.3 |
Table 11. Comparative analysis of deep learning, benchmarked ensemble models, and the proposed model (Ac = accuracy, Pr = precision, Rc = recall, Fs = F1-score).

| Split | DEM Ac | DEM Pr | DEM Rc | DEM Fs | LSTM [35] Ac | LSTM Pr | LSTM Rc | LSTM Fs | RF [20] Ac | RF Pr | RF Rc | RF Fs | Ens. [18] Ac | Ens. Pr | Ens. Rc | Ens. Fs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 89.14 | 87.76 | 80.23 | 83.83 | 78.1 | 79.91 | 39.6 | 52.96 | 77.10 | 82.33 | 44.61 | 57.87 | 75.3 | 80.47 | 37.18 | 50.86 |
| 2 | 89.64 | 88.36 | 80.71 | 84.36 | 77.5 | 81.47 | 43.32 | 56.56 | 78.60 | 82.71 | 42.13 | 55.82 | 76.6 | 81.13 | 37.29 | 51.10 |
| 3 | 90.27 | 88.9 | 81.4 | 84.9 | 80 | 81.87 | 41.92 | 55.45 | 81.60 | 83.11 | 43.47 | 57.08 | 79 | 81.36 | 40.83 | 54.37 |
| 4 | 92.96 | 90.32 | 82.17 | 86.05 | 80.9 | 82.27 | 47.66 | 60.36 | 85.80 | 83.43 | 53.32 | 65.06 | 79.7 | 81.78 | 46.51 | 59.30 |
| 5 | 93.57 | 91.75 | 83.52 | 87.44 | 83.4 | 82.45 | 48.96 | 61.44 | 88.80 | 83.97 | 60.44 | 70.29 | 83.9 | 82.25 | 45.77 | 58.81 |
| 6 | 94.81 | 92.5 | 84.04 | 88.07 | 84.4 | 82.78 | 54.88 | 66.00 | 89.70 | 84.57 | 60.72 | 70.69 | 84.7 | 82.6 | 54.31 | 65.53 |
| 7 | 95.73 | 93.47 | 84.67 | 88.85 | 86.4 | 83.05 | 54.8 | 66.03 | 90.40 | 85.69 | 66.34 | 74.78 | 85.5 | 82.83 | 57.31 | 67.75 |
| 8 | 96.1 | 93.87 | 85.14 | 89.29 | 89.5 | 83.12 | 56.54 | 67.30 | 90.60 | 85.82 | 70 | 77.11 | 87.9 | 84 | 55.48 | 66.82 |
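For completeness, the Ac/Pr/Rc/Fs columns in Tables 9–11 can be computed as in the sketch below. Macro-averaging over the three GWL classes is our assumption, since the paper does not state the averaging scheme; the helper name `report` is ours:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def report(y_true, y_pred):
    """Accuracy, precision, recall and F1 as percentages, matching the
    Ac/Pr/Rc/Fs columns of Table 11 (macro averaging assumed)."""
    return {
        "Ac": 100 * accuracy_score(y_true, y_pred),
        "Pr": 100 * precision_score(y_true, y_pred, average="macro"),
        "Rc": 100 * recall_score(y_true, y_pred, average="macro"),
        "Fs": 100 * f1_score(y_true, y_pred, average="macro"),
    }
```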
Table 12. Rainfall pattern of the past 8 years (2015–2022).

| State | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|---|---|---|
| UP | 154.45 | 205.43 | 216.29 | 211.98 | 383.38 | 188.40 | 251.54 | 135.17 |
| UK | 309.84 | 350.36 | 394.64 | 396.79 | 383.38 | 330.17 | 434.58 | 361.81 |
| HR | 123.77 | 102.13 | 117.62 | 129.82 | 100.00 | 120.03 | 170.22 | 158.17 |
| CH | 123.77 | 102.13 | 117.62 | 129.82 | 100.00 | 120.03 | 170.22 | 158.17 |
| DL | 171.09 | 126.36 | 128.12 | 135.99 | 124.86 | 121.37 | 243.08 | 131.62 |
| PB | 154.21 | 123.82 | 142.59 | 172.58 | 177.15 | 140.67 | 157.32 | 176.68 |
| HP | 315.99 | 250.79 | 310.89 | 294.30 | 302.41 | 230.62 | 244.42 | 290.55 |
| RJ | 134.29 | 141.63 | 120.68 | 102.63 | 163.60 | 109.21 | 141.57 | 161.17 |
| MP | 249.84 | 296.68 | 199.21 | 221.62 | 352.47 | 253.78 | 257.26 | 282.52 |
| GJ | 155.71 | 157.75 | 208.22 | 119.43 | 267.79 | 260.02 | 187.96 | 242.78 |
| MH | 219.88 | 309.17 | 274.66 | 242.42 | 371.34 | 293.31 | 324.17 | 308.70 |
| TG | 214.47 | 271.78 | 211.62 | 232.56 | 278.32 | 278.32 | 287.78 | 312.38 |
| TN | 310.20 | 136.38 | 120.27 | 146.16 | 234.06 | 234.06 | 352.22 | 156.83 |
| PY | 437.38 | 170.13 | 218.73 | 226.75 | 318.59 | 318.59 | 536.10 | 156.83 |