Prediction of Groundwater Quality Index Using Classification Techniques in Arid Environments

Derdour, Abdessamed; Abdo, Hazem Ghassan; Almohamad, Hussein; Alodah, Abdullah; Al Dughairi, Ahmed Abdullah; Ghoneim, Sherif S. M.; Ali, Enas

doi:10.3390/su15129687

¹ Artificial Intelligence Laboratory for Mechanical and Civil Structures and Soil, University Center of Naama, P.O. Box 66, Naama 45000, Algeria
² Laboratory for the Sustainable Management of Natural Resources in Arid and Semi-Arid Zones, University Center of Naama, P.O. Box 66, Naama 45000, Algeria
³ Geography Department, Faculty of Arts and Humanities, Tartous University, Tartous P.O. Box 2147, Syria
⁴ Department of Geography, College of Arabic Language and Social Studies, Qassim University, Buraydah 51452, Saudi Arabia
⁵ Department of Civil Engineering, College of Engineering, Qassim University, Buraydah 51452, Saudi Arabia
⁶ Department of Electrical Engineering, College of Engineering, Taif University, Taif 21944, Saudi Arabia
⁷ Faculty of Engineering and Technology, Future University in Egypt, New Cairo 11835, Egypt
* Author to whom correspondence should be addressed.

Sustainability 2023, 15(12), 9687; https://doi.org/10.3390/su15129687

Submission received: 23 May 2023 / Revised: 10 June 2023 / Accepted: 12 June 2023 / Published: 16 June 2023

(This article belongs to the Special Issue Water Quality Assessment and Pollution Analysis of Surface Water, Wastewater and Groundwater)

Abstract

Assessing water quality is crucial for improving global water resource management, particularly in arid regions. This study aims to assess and monitor the status of groundwater quality based on hydrochemical parameters and by using artificial intelligence (AI) approaches. The irrigation water quality index (IWQI) is predicted by using support vector machine (SVM) and k-nearest neighbors (KNN) classifiers in MATLAB’s Classification Learner toolbox. The classifiers are fed with the following hydrochemical input parameters: sodium adsorption ratio (SAR), electrical conductivity (EC), bicarbonate level (HCO₃), chloride concentration (Cl), and sodium concentration (Na). The proposed methods were used to assess the quality of groundwater extracted from the desert region of Adrar in Algeria. The collected groundwater samples showed that 9.64% of samples were of very good quality, 12.05% were of good quality, 21.08% were satisfactory, and 57.23% were considered unsuitable for irrigation. On the testing samples, the IWQI prediction accuracies of the classifiers with the standardized, normalized, and raw data were 100%, 100%, and 90%, respectively. The cubic SVM with the normalized data achieved the highest prediction accuracy for training and testing samples (94.2% and 100%, respectively). The findings of this work showed that the multiple regression model and machine learning could effectively assess water quality in desert zones for sustainable water management.

Keywords:

ground water; water quality; IWQI; artificial intelligence; support vector machine; k-nearest neighbors; environment

1. Introduction

Groundwater is a crucial resource for many different purposes, including drinking water, agriculture, and industrial uses [1,2]. Assessing and monitoring the quality of groundwater, however, has consistently been a major challenge that needs to be overcome to ensure the long-term sustainability of already depleted water resources. While some groundwater is a renewable resource that can be replenished through rainwater and snowmelt, it can be depleted if consumed faster than naturally recharged [3,4]. On the other hand, non-renewable groundwater resources that have been stored for thousands of years are finite and can be drained if overexploited. Farmers in arid areas usually irrigate their crops with groundwater more than other sources such as surface water. Therefore, improvements in water resource quality may significantly reduce irrigation treatment costs and increase agricultural yields [5,6]. However, groundwater quality in arid areas is a complex and dynamic issue [7]. Groundwater in arid regions is often degraded due to various factors, including drought, overexploitation, irrigation fertilizers, geology, wastewater discharge, and climate change. This can lead to declining water quantity and quality, with severe consequences for people’s health and livelihoods as a result [8,9].

Nevertheless, evaluating water quality involves several challenges, including massive sample collection, laboratory testing, and data processing. These are typically time-consuming operations with high costs in terms of equipment, chemical solutions, and human resources [10]. Monitoring and assessing groundwater quality regularly is crucial to ensure it remains safe for human consumption and irrigation uses [11]. It should be noted that laboratories in several countries have suffered from a lack of chemical analysis reagents due to the COVID-19 pandemic in the last three years [12]. Therefore, it is necessary to find cost-effective and time-efficient methods to assess water quality precisely and overcome the circumstances above. Various methods are available for this purpose, including water quality sampling and analysis, remote sensing, and modeling. Recently, there has been an increasing interest in artificial intelligence (AI) and its potential applications for water quality monitoring and management [10,13,14]. AI has been used for various water quality-related tasks, including data collection, analysis, and decision-making. Many AI technologies can be used for water quality applications [10,15]. The most common type of AI technology is machine learning, which can be used for data classification and prediction tasks [16]. Other types of AI technologies that have been applied to water quality include rule-based systems, evolutionary computation, and artificial neural networks. The use of AI techniques is prevalent in different water-related studies around the world, including random forests (RF), artificial neural networks (ANN), and support vector machines (SVM) [17,18,19,20,21,22]. Several studies have focused on assessing and monitoring groundwater quality in arid regions, particularly in the context of irrigation water quality management.
The availability and suitability of groundwater for irrigation and drinking purposes are critical factors for enhancing agricultural productivity in arid areas. Researchers have employed various artificial intelligence (AI) techniques to predict the irrigation water quality index (IWQI) based on specific input parameters [23,24]. Abdel-Fattah et al. [25] used a neural network to evaluate the appropriateness of quality of water for irrigation in Egypt. In the Algerian Illizi region, Mokhtar et al. [26] forecast irrigation water quality indices using machine learning models and regression analyses. Ahmed et al. [27] predicted the irrigation water quality index for irrigation purposes in Bangladesh by using ANN and SVR models. M’nassri et al. [28] estimated IWQI using ANN and multiple linear regression (MLR) models in Sidi El Hani in Tunisia. Haider et al. [29] proposed a hierarchical-based fuzzy technique to address the uncertainties associated with the absence of long observations and inaccurate measurements of groundwater data in Qassim, Saudi Arabia. These studies demonstrate the potential of AI in water quality assessment, providing valuable tools for stakeholders and decision-makers to evaluate groundwater suitability for irrigation and drinking purposes. AI technologies offer several potential benefits for water quality monitoring and management. First, AI can automate repetitive tasks such as data collection and analysis. As a result, it can free up staff time for other fieldwork or public outreach activities. Second, AI can provide decision support by generating recommendations or alerts based on data analysis. Finally, AI can help to better understand complex water quality problems by providing insights that would not be possible with traditional methods. Despite the potential benefits of using AI for water quality applications, some challenges need to be addressed. 
First, there is a lack of standardization among AI tools and methods, which makes it difficult to compare results across studies [16]. Second, using AI requires access to high-quality datasets, which may not be available in some areas [30].

To address these challenges, this research aims to assess and monitor groundwater quality in the arid region of Adrar, Algeria, utilizing AI algorithms to predict IWQI based on key input parameters and subsequently support effective water resource management decisions for irrigation and drinking purposes in the study area. This will make it easier to identify problems early and take steps to protect this vital resource. Artificial intelligence algorithms are employed to predict the irrigation water quality index (IWQI) of the study area from five input parameters: electrical conductivity (EC), sodium concentration (Na), sodium adsorption ratio (SAR), chloride concentration (Cl), and bicarbonate concentration (HCO₃). The output parameter is the IWQI. These parameters were computed based on the analysis of 166 samples collected from an arid desert. Using the study’s findings, farmers in arid areas can boost agricultural productivity through enhanced irrigation water quality management, and policymakers and stakeholders can make reasonable choices on water resource management. The implications of the proposed methods are two-fold. Firstly, for irrigation purposes, the IWQI provides a valuable tool for stakeholders and farmers to assess the suitability of groundwater for agricultural uses. By comparing the predicted IWQI with the recommended standards, decision-makers can determine whether the groundwater is suitable for irrigation or if additional treatment measures are necessary. This information is essential for optimizing crop production, minimizing the negative impact of poor water quality on agricultural yields, and ensuring sustainable water resource management. Secondly, in terms of drinking water, the predicted IWQI allows for an evaluation of the groundwater’s suitability for human consumption.
By comparing the calculated IWQI with the World Health Organization (WHO) drinking water standards, decision-makers can also assess the potential health risks associated with the consumption of the groundwater. This information is crucial for ensuring the provision of safe drinking water to communities, as it helps identify the need for appropriate treatment measures or the implementation of alternative water sources. By providing stakeholders and decision-makers with a reliable and efficient tool to evaluate groundwater quality, our study empowers them to make informed decisions regarding water resource management and safeguarding the health and well-being of communities reliant on groundwater for irrigation and drinking purposes.

2. Materials and Methods

2.1. Study Area

The investigated area is located in the southwestern part of Algeria, between longitudes 5°38′38″ W and 2°6′30″ E and latitudes 24°53′30″ N and 31°42′27″ N. It covers a total area of 297,790 km², which constitutes approximately 18% of the area of Algeria. Figure 1 illustrates the location of the study area. The study area belongs to the Algerian Sahara, one of the world’s driest and hottest areas [31]. The summers are long and hot, and the winters are short and warm. Adrar is characterized by scarce rainfall, with an annual average of about 15 mm, and an evaporation rate of about 4500 mm per year [32]. Temperatures in the summer are consistently high and can exceed 45 °C [33]. The study area often experiences a scorching, dusty southerly wind called the Sirocco in the summer [34]. During this time, the northern part of the country can be affected by this wind for as long as 40 days [21]. Geographically, Adrar is bounded by Erg Chech in the west, Tademaït in the east, the Occidental Erg in the north, and Tanezrouft in the south. This area comprises four natural Saharan regions: Gourara, Touat, Tidikelt, and Tanezrouft. In the study region, the hydrographic network is represented by Wadi Messaoud, which is the continuation of Wadi Saoura towards the north (the latter draining from the Saharan Atlas), and Oued Tillia and its tributaries, which drain the plateau of Tademaït towards the southeast at the level of Zaouïet Kounta from Baamer to Reggane. At the eastern end of the Touat depression, a dense hydrographic network of small distinct ravines drains the plateau of Tademaït. Adrar is mainly an agricultural region characterized by its traditional irrigation system, named “foggara” [35]. Hydrogeologically, the study area is part of the transboundary Northern Saharan Aquifer System (SASS) [36]. Many of the aquifer deposits are deeply buried, and their thickness can reach at least 2000 m [37]. In addition to siliciclastic sandstone, some parts of the aquifer are karstic and evaporitic [36].
Aquifers in this area tend to be highly productive. Over the centuries, foggaras (water galleries) have exploited the aquifer of Continental Intercalaire (CI) around its edges in the Sahara [38].

2.2. Data Collection

For this study, 166 water samples from the research area were provided by the national water resources agency (ANRH). The samples were provided from boreholes and the foggara system. The data for each sample consisted mainly of chemical elements represented by pH, cations such as magnesium (Mg), calcium (Ca), sodium (Na), and potassium (K), and anions such as chloride (Cl), sulfate (SO₄), and bicarbonate (HCO₃). Pollution indicators such as nitrate (NO₃) and other physical elements are represented by electrical conductivity (EC) and temperature (°C). The assessment of the suitability of groundwater in the region of Adrar for irrigation was established with the international standard provided by the Food and Agriculture Organization (FAO). Therefore, this database adequately represents groundwater quality in the study area. A summary of the collected data is given in Table 1.

2.3. Irrigation Water Quality Criteria

2.3.1. Suitability Indices for Irrigation

A great deal of variation exists in irrigation water’s quality depending on the type and quantity of its salts. Groundwater irrigation waters contain NaCl as their predominant salt [39]. Consequently, the sodium adsorption ratio (SAR) played an important role in determining the effects of the application of irrigation water on soil structural behavior in earlier research [40]. The USDA’s salinity lab defined SAR as [41]:

\mathrm{SAR} = \frac{\mathrm{Na}^{+}}{\sqrt{\frac{\mathrm{Ca}^{2+} + \mathrm{Mg}^{2+}}{2}}}

(1)

where the concentrations of the cations (Na⁺, Ca²⁺, and Mg²⁺) are expressed in milliequivalents per liter (mEq/L).
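As a quick illustration of Equation (1), the following minimal Python sketch computes SAR for a single sample. The function name and the sample concentrations are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the Adrar dataset.

```python
import math

def sodium_adsorption_ratio(na, ca, mg):
    """Equation (1): SAR from Na+, Ca2+ and Mg2+ concentrations, all in mEq/L."""
    if ca + mg <= 0:
        raise ValueError("Ca + Mg must be positive")
    return na / math.sqrt((ca + mg) / 2.0)

# Hypothetical sample (illustrative values only, in mEq/L)
sar_example = sodium_adsorption_ratio(na=10.0, ca=6.0, mg=2.0)
print(round(sar_example, 2))  # 10 / sqrt((6 + 2) / 2) = 10 / 2 = 5.0
```

Note that all three concentrations must be converted to mEq/L before the ratio is computed; mixing mg/L and mEq/L would give a meaningless value.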

2.3.2. Irrigation Water Quality Index (IWQI)

Depending on the crop pattern, soil type, and climate, irrigation quality requirements may vary from one field to another [42]. Hence, a spatially distributed assessment of individual quality parameters is possible through irrigation water quality mapping. GIS can therefore be used to visualize such maps and make comparative evaluations. Nowadays, groundwater has been assessed for its suitability for irrigation and drinking purposes using the Irrigation Water Quality Index (IWQI) in many regions worldwide [27,43]. For this study, the IWQI model, developed by Meireles et al. [44], was used to analyze the data. First, it was necessary to identify the most relevant irrigation parameters. Second, aggregation weights (wi) and quality measurement values (qi) were defined. According to the irrigation water quality characteristics required by the Food and Agriculture Organization (FAO) for agricultural uses, proposed by Ayers and Westcot [45], values of (qi) were calculated based on every chemical parameter, as shown in Table 2. Equation (2) is used in this model to calculate the irrigation water quality parameter (qi), which is determined by the tolerance limits of the parameters listed in Table 2:

q_{i} = q_{imax} - \frac{(x_{ij} - x_{inf}) \times q_{iamp}}{X_{amp}}

(2)

where q_i is the quality of each parameter, q_imax stands for the maximum value of q_i for every class, x_ij stands for every parameter’s observed value, x_inf represents the lower limit of the class to which the parameter belongs, q_iamp is the amplitude of the quality measurement class, and X_amp is the amplitude of the class. X_amp was evaluated based on the highest value determined in the analysis of the physicochemical properties of groundwater. According to Table 3, the parameter weights in the IWQI were calculated, as suggested by Meireles et al. [44]. Finally, the IWQI is calculated using Equation (3).

\mathrm{IWQI} = \sum_{i=1}^{n} q_{i} w_{i}

(3)
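Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class limits, quality amplitudes, quality values, and weights below are placeholders invented for the example, whereas the real values come from Tables 2 and 3.

```python
def quality_value(x_obs, x_inf, x_amp, q_max, q_amp):
    """Equation (2): q_i = q_imax - ((x_ij - x_inf) * q_iamp) / X_amp."""
    return q_max - ((x_obs - x_inf) * q_amp) / x_amp

def iwqi(q_values, weights):
    """Equation (3): weighted sum of the quality values q_i."""
    if len(q_values) != len(weights):
        raise ValueError("q_values and weights must have the same length")
    return sum(q * w for q, w in zip(q_values, weights))

# Hypothetical class for one parameter: lower limit 750, class amplitude 750,
# maximum q of 85 and q amplitude of 25 (placeholder numbers, not Table 2).
q_ec = quality_value(x_obs=1125.0, x_inf=750.0, x_amp=750.0, q_max=85.0, q_amp=25.0)

# Hypothetical q values for the five parameters and equal placeholder weights.
q_all = [q_ec, 60.0, 50.0, 40.0, 30.0]
w_all = [0.2, 0.2, 0.2, 0.2, 0.2]
index_example = iwqi(q_all, w_all)
print(q_ec, round(index_example, 2))
```

In this toy case the observed value sits halfway through its class, so q drops by half of the class amplitude (85 - 12.5 = 72.5), and the equal weights turn the index into a plain average of the five q values.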

2.4. Classification Learner

Using classification and prediction methods of machine learning can minimize time-consuming efforts by avoiding using many underlying calculations to predict a specific output [21]. When fed with reliable data, machine learning can predict appropriate categories (patterns), while an untrusted data source could inversely affect machine learning results. Therefore, it is vital to prepare the collected data and randomly divide them into two groups, one for training and the other for testing to assess the data quality. The latter group is essential to determine the accuracy of the constructed machine learning model. As soon as the machine learning algorithm is run on the input dataset, the model determines the outputs, so it is necessary to choose a model that is relevant to the task and the information presented and related to it [26,46]. Multiple models are suitable for many tasks, such as recognizing and processing images. Hence, the classification learner tool in Matlab is used to predict the irrigation water quality index (IWQI).

The input parameters to the classification learner toolbox in this study are electrical conductivity (EC), sodium concentration (Na), sodium adsorption ratio (SAR), chloride concentration (Cl), and bicarbonate concentration (HCO₃), while the output parameter is IWQI. These parameters were computed based on the analysis of 166 groundwater samples. Of these, 156 samples were used to train all classification learner models, and the remaining 10 samples were used as testing data to determine the accuracy of the models. The IWQI state was defined based on specified range limits, as illustrated in Table 4. Table 5 shows how the training and testing data are distributed across the IWQI states.

The classification learner toolbox was used with the trained data to select the best-performing classifier. Then, the raw, standardized, and normalized data were applied for all classifiers to investigate the best way to obtain a high-accuracy classifier. Some samples of the raw data of the five inputs to all classifiers are illustrated in Table 6.

Next, the data are standardized and normalized to enhance the classifiers’ accuracies. To standardize the data, the column mean was subtracted from each value, and the result was divided by the standard deviation of the column. The new parameter values (X_new) can be determined as follows:

X_{new} = \frac{X_{i} - \mu}{\sigma}

(4)

where μ is the mean and σ is the standard deviation of each of the five input variables of the training data. The data are normalized by dividing each value by the maximum value of the corresponding parameter. The transformation is therefore given by Equation (5):

X_{new} = \frac{X_{i}}{\max(X_{i})}

(5)
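The two transformations in Equations (4) and (5) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the sample standard deviation (n - 1 denominator) is used because the text does not say which estimator was applied, and the three EC values are hypothetical.

```python
import statistics

def standardize(column):
    """Equation (4): subtract the column mean, then divide by the standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)  # assumption: sample standard deviation (n - 1)
    return [(x - mu) / sigma for x in column]

def normalize_by_max(column):
    """Equation (5): divide each value by the column maximum."""
    peak = max(column)
    return [x / peak for x in column]

# Hypothetical EC column (illustrative values only)
ec = [620.0, 2475.0, 5920.0]
ec_standardized = standardize(ec)
ec_normalized = normalize_by_max(ec)
print(ec_standardized)
print(ec_normalized)
```

After standardization the column has zero mean and unit standard deviation, while max-normalization maps the largest value to 1 and keeps all positive values in the interval (0, 1].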

2.4.1. Support Vector Machine (SVM) Classifier

SVM is a machine learning tool that splits data into two classes via the hyperplane. The main objective of SVM is to reduce errors by customizing the hyperplane, which increases the tolerance limit. Since the optimization problem is convex rather than linear, SVM offers a unique solution compared with ANN models containing many local minima [26,47]. First, it must satisfy the maximum distance between points for each category. After that, the exact classification can happen. The hyperplane classifies all points outside its margin as different. Larger features make it difficult to categorize them. Good classification can occur with a large margin, as shown in Figure 2 [48,49].

The hyperplane mathematical representation in SVM is as follows:

W^{T} \cdot x = 0

(6)

where x and W are the vectors. The vector W refers to the weight vector. The training data can be simulated as:

\{(x_{1}, y_{1}), (x_{2}, y_{2}), (x_{3}, y_{3}), \dots, (x_{n}, y_{n})\} \in R^{n}

(7)

This means that our data can be represented by ordered pairs (x_n, y_n), where x_n refers to the features and y_n refers to the label of x_n. The classification function can be expressed as:

y = f (x) : R^{n} \to \{1, - 1\}

(8)

where the f(x) function will learn from the training data we feed it, and it will then be able to perform the classification process for future data or unseen data outside the original range of datasets. The training process is carried out to find the maximum amount of margin (M) that can be obtained. The margin is mathematically represented as follows:

M = \frac{2}{‖W‖}

(9)

The relationship between M and W is inverse. The main equation for the SVM, which this work is based on, takes the following form:

\min_{w, b} \frac{1}{2} \|w\|^{2} \quad \text{subject to} \quad y_{i} (w \cdot x_{i} + b) \geq 1 \quad \text{for any } i = 1, \dots, n

(10)
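To make Equations (9) and (10) concrete, the sketch below checks the constraint y_i (w · x_i + b) ≥ 1 and evaluates the margin M = 2/‖w‖ for a hand-picked hyperplane. It is only an illustration: the four two-dimensional points, the weight vectors, and the bias values are hypothetical, and no optimizer is run.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin(w):
    """Equation (9): M = 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, samples):
    """Constraint of Equation (10): y_i * (w . x_i + b) >= 1 for every sample."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in samples)

# Hypothetical, linearly separable toy data: (features, label in {1, -1})
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

w_ok, b_ok = (0.5, 0.5), -1.0      # feasible: the closest points sit exactly on the margin
w_bad, b_bad = (0.4, 0.4), -0.8    # smaller ||w||, but it violates the constraint

print(satisfies_constraints(w_ok, b_ok, samples), round(margin(w_ok), 3))
print(satisfies_constraints(w_bad, b_bad, samples), round(margin(w_bad), 3))
```

The second hyperplane would give a wider margin, but it is not admissible because some points fall inside it. This is the trade-off that the minimization in Equation (10) resolves: the smallest ‖w‖ among all hyperplanes that still satisfy every constraint.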

2.4.2. Weighted K-Nearest Neighbors (KNN) Classifier

The KNN algorithm is most commonly used as a supervised learning algorithm for classification and regression. It can also be used to resample datasets and to estimate missing values. The method uses the k closest neighbors (data points) to predict the class of a new data point. Unlike model-based algorithms, which learn weights from the training data, instance-based learning uses the whole set of training cases to predict the output for unseen data. Because it relies only on the number of points closest to a new point, the k-nearest neighbors method neglects much information. The steps of this method are summarized as follows:

1. The value of k, the number of neighbors, is chosen.
2. The distances between the new point and the points in the dataset are calculated.
3. The points are sorted by the distances calculated in the previous step, and the k nearest ones are selected.
4. The class of each selected neighbor is identified.
5. Finally, the class with the most neighbors is the predicted class for the new point.
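The steps above can be sketched in Python as follows. This is a minimal illustration rather than the toolbox implementation: the inverse-squared-distance weighting used for the weighted variant, the two-feature toy data, and the function name are all assumptions made for the example. In a weighted vote, closer neighbors count more than distant ones; in the plain vote, every neighbor counts once.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=3, weighted=True):
    """Classify `query` from `train`, a list of (features, label) pairs."""
    # Step 2: distance from the new point to every training point
    distances = [(math.dist(x, query), label) for x, label in train]
    # Step 3: sort by distance and keep the k nearest points
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: tally the classes of the neighbors and pick the winner
    votes = defaultdict(float)
    for d, label in neighbors:
        # Assumed weighting: inverse squared distance (small constant avoids division by zero)
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical training points with two scaled features each
train = [
    ((0.0, 0.0), "unsuitable"), ((0.1, 0.0), "unsuitable"),
    ((1.0, 1.0), "good"), ((0.9, 1.0), "good"), ((1.0, 0.9), "good"),
]
query = (0.2, 0.1)

print(knn_predict(train, query, k=5, weighted=False))  # majority vote over all five points
print(knn_predict(train, query, k=5, weighted=True))   # the two very close points dominate
```

With k = 5 the plain majority vote follows the three distant "good" points, whereas the weighted vote follows the two nearby "unsuitable" points, which shows why the choice of k and of the weighting scheme matters.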

Figure 3 shows an example with two classes, one represented by red hexagons and the other by green triangles. The new data point, represented by the question mark in the small circle, can be classified based on the number of red hexagons and green triangles around it. As the red hexagon class is dominant in the inner circle in this example, the new data point is assigned to the red hexagon class.

3. Results

3.1. Chemical Composition of the Study Area

The FAO’s standards for agricultural purposes proposed by Ayers and Westcot [45] are compared with all physicochemical parameters in this study. Groundwater in the study area shows significant differences in chemical composition. The values of pH ranged from 7.35 to 8.19, averaging 7.71. For the electrical conductivity (EC), the values varied from 620.00 µS/cm to 5920.00 µS/cm with an average of 2475.75 µS/cm, where the acceptable level of EC is 3000 µS/cm according to FAO guidelines [45]. Therefore, 80.72% of the electrical conductivity (EC) values in Adrar are within the acceptable limits for irrigation purposes. Groundwater in the research area contains calcium concentrations ranging from 1.07 to 15.10 mEq/L. Consequently, all samples are within the permissible range of the FAO recommendations, which set a maximum value of 20 mEq/L [45]. Magnesium values varied from 0.65 to 14.63 mEq/L with a mean value of 4.90 mEq/L, so about 50.6% of samples are within the standards stipulated by the FAO (<5 mEq/L) [45]. Sodium levels range from 1.52 to 38.70 mEq/L, within FAO standard limits (<40 mEq/L) [45]. About 97.59% of the potassium values of Adrar are within the acceptable limits for irrigation purposes stipulated by the FAO (<2 mEq/L) [45]. Throughout this study, the sulfate value ranged from 2.08 to 20.83 mEq/L. The FAO guidelines stipulate a maximum value of 20 mEq/L, which is met by 98.79% of samples. With a mean concentration of 0.77 mEq/L, the nitrate concentration in groundwater samples ranges from 0.12 to 3.02 mEq/L. About 98.19% of chloride values respect the FAO guidelines. The concentration of bicarbonate ranges from 0.75 to 4.35 mEq/L. Observations show that all samples are within the permissible limit of 10 mEq/L stipulated by the FAO [45].

3.2. Irrigation Water Quality Results

Calculated SAR values range from 1.01 to 13.71, with a mean and standard deviation of 5.45 and 1.95, respectively. Generally, water with a SAR value between 0 and 18 is classed as excellent or good for irrigation, which is the case for all samples in our study [50,51,52]. Results of the calculated IWQI for the region of Adrar are presented in Table 7 and Figure 4. The IWQI values varied widely, from 3.64 to 93.77, with an average of 41.81. According to Meireles et al. [44], the IWQI was divided into four categories: (i) excellent or very good, when IWQI is more than 70; (ii) good, when IWQI is between 55 and 70; (iii) satisfactory, when IWQI is between 40 and 55; and (iv) inappropriate or unsuitable, when IWQI is below 40. In our study area, 16 samples fell into the very good category, representing 9.64% of the total sample set. Approximately 12.05% of samples were deemed good, and approximately 21.08% were deemed satisfactory. In addition, 95 samples were categorized as unsuitable, accounting for 57.23% of all samples examined.
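The four IWQI categories can be expressed as a simple threshold function. The sketch below is illustrative only: the text does not say to which class the boundary values 40, 55, and 70 belong, so assigning 55 and 40 to the higher class and 70 to the good class is an assumption. The final two lines recompute the reported shares from the sample counts given in the text.

```python
def iwqi_class(value):
    """Map an IWQI value to its category (boundary handling is an assumption)."""
    if value > 70:
        return "very good"
    if value >= 55:
        return "good"
    if value >= 40:
        return "satisfactory"
    return "unsuitable"

# Minimum, mean, and maximum IWQI values reported for the study area
for v in (3.64, 41.81, 93.77):
    print(v, iwqi_class(v))

# Shares of the 166 samples for the two counts stated in the text
share_very_good = round(100 * 16 / 166, 2)
share_unsuitable = round(100 * 95 / 166, 2)
print(share_very_good, share_unsuitable)
```

The recomputed shares match the 9.64% and 57.23% quoted in the text.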

3.3. Artificial Intelligence

In this section, the classification learners’ prediction accuracy results are reported. First, the input and output data were loaded into the MATLAB workspace, and the Classification Learner app was then opened by entering its command in the Command Window. In a new session, the input and output data were selected from the workspace. The input parameters were designated as predictors, and the last column was kept as the output. A 10-fold cross-validation was selected to ensure a good training process leading to a stable classification model.
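For readers unfamiliar with 10-fold cross-validation, the following Python sketch shows how the 156 training samples could be partitioned. It is a generic illustration, not the toolbox's internal procedure: the plain random (unstratified) shuffle, the fixed seed, and the function name are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: simple random shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(156, k=10))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample is used for validation exactly once and for training in the other nine folds; with 156 samples, the folds contain 15 or 16 samples each. The reported training accuracy is then the accuracy accumulated over the ten held-out folds.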

When all of the classification learners were used with the raw, standardized, or normalized data, the SVM and KNN developed the highest prediction accuracy for the trained data, so the results of the SVM and KNN are reported here.

3.3.1. SVM Results for Standardized Data

The standardized data were obtained using Equation (4). For the standardized data, the cubic SVM gave the highest prediction accuracy of 92.9%. Figure 5 shows the distribution of correctly and incorrectly predicted samples as a scatter plot for the cubic SVM. In this plot, the name of the training file appears as input_av, with 156 observations. Five parameters are used as predictors. The incorrectly predicted points are marked with colored crosses (x). The codes 1, 2, 3, and 4 on the x and y axes in Figure 5 refer to the very good, good, satisfactory, and unsuitable IWQI states, respectively. Figure 6 illustrates the confusion matrix of the cubic SVM. As can be seen from Figure 6a, 14 observations exhibit very good IWQI. The cubic SVM classifier correctly predicts 13 of these 14 samples; one sample is incorrectly assigned to the good IWQI state. Correct predictions are highlighted in green, while incorrect predictions are highlighted in red. For the good IWQI state, the cubic SVM correctly predicts 15 of 18 samples; of the three misclassified samples, one is assigned to the satisfactory state and two to the very good state. Therefore, the prediction accuracy for the good IWQI state is 83%, as shown in Figure 6b. The highest prediction accuracy is for the unsuitable IWQI state, where the cubic SVM correctly predicts 91 of 92 samples, an accuracy of 99%. The total accuracy of the cubic SVM is 92.9% for all training samples.

A receiver operating characteristic (ROC) curve is shown in Figure 7. ROC plots show the current classifier performance with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. In Figure 7, the FPR of 0.01 means that 1% of the negative observations were incorrectly assigned to the positive class, while the TPR of 0.93 means that 93% of the positive observations were correctly classified as positive. A ROC curve that follows the 45° diagonal indicates a poor classification result, whereas a curve that rises along the y-axis and then runs along the top of the plot, forming a right angle at the top-left corner, indicates a perfect classification result. A classifier’s performance can be measured by its area under the curve (AUC), and performance increases with increasing AUC. It can be seen from Figure 7 that the AUC is reported as 100%, indicating that the classifier separates the classes almost perfectly.
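The TPR and FPR quoted above can be reproduced from one-vs-rest counts. The sketch below is a hedged illustration: it assumes that Figure 7 treats the very good state as the positive class, and that the only false positives are the two good samples that the cubic SVM assigned to the very good state on the standardized data. Neither assumption is stated explicitly in the text.

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from one-vs-rest counts."""
    return tp / (tp + fn), fp / (fp + tn)

# Very good class, standardized data: 13 of 14 samples correctly predicted.
# Assumed: 2 false positives among the 156 - 14 = 142 negative samples.
tpr, fpr = tpr_fpr(tp=13, fn=1, fp=2, tn=140)
print(round(tpr, 2), round(fpr, 2))  # 0.93 0.01
```

Under these assumptions, the counts give the same operating point (FPR 0.01, TPR 0.93) as the one described for the ROC plot.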

Finally, the results of all classifiers are presented in Table 8, explaining the accuracy of all classifiers with the trained (raw, standardized, and normalized) data. The results of applying all classifiers on the trained data indicated that the best performance for the raw data is with the linear support vector machine (SVM) where the prediction accuracy was 92.9%. High accuracy can be obtained through the cubic SVM (92.9%) when applying all classifiers with the standardized data. In addition, the cubic SVM (94.2%) can develop high accuracy for the normalized data. The weighted k-nearest neighbors (KNN) classifier generates the second-best prediction accuracy with raw, standardized, and normalized data of 92.3%, 92.3%, and 92.9%, respectively. Therefore, the two classifiers SVM and KNN are presented in the following section since they develop higher prediction accuracy for the trained data.

3.3.2. SVM Results for Normalized Data

The normalized data were obtained using Equation (5). For the normalized data, the cubic SVM gave the highest prediction accuracy of 94.2%. Figure 8 shows the confusion matrix of the cubic SVM with a prediction accuracy of 94.2%. Figure 8a shows that all 14 observations with very good IWQI are correctly predicted. For good IWQI, 15 of 18 samples were correctly predicted; two were incorrectly predicted as very good and one as satisfactory. The prediction accuracy for good IWQI was 83%, as presented in Figure 8b. For the satisfactory state, the prediction accuracy was 88%: 28 of 32 samples were correctly predicted, and the other four were incorrectly predicted as unsuitable. A total of 90 out of 92 samples were correctly predicted in the case of unsuitable IWQI, and two samples were incorrectly predicted as the satisfactory IWQI state.

An FPR of 0.01, which indicates that 1% of the observations were classified incorrectly, can be seen in Figure 9. The TPR is 1.0, indicating that the classifier correctly assigns 100% of the observations to the positive class. This figure shows a classifier that performs better due to the 100% AUC.

3.3.3. SVM Results for Raw Data

When applying all classifiers on the raw data of the inputs (EC, Na, SAR, Cl, HCO₃), the linear SVM developed the highest prediction accuracy of 92.9%. Figure 10 shows the distribution of the correct and incorrect samples with linear SVM. The incorrectly predicted samples appear as colored crosses. Figure 11 shows the confusion matrix of the linear SVM. The prediction accuracies for IWQI state were 79% (11/14), 83% (15/18), 91% (29/32), and 98% (90/92) for very good, good, satisfactory, and unsuitable IWQI, respectively. The inaccurate predictions were 21% (3/14), 17% (3/18), 9% (3/32), and 2% (2/92) for very good, good, satisfactory, and unsuitable IWQI, respectively. For example, for the satisfactory IWQI state, the linear SVM predicts 29 samples as the satisfactory IWQI state, one as the good IWQI state, and two as the unsuitable IWQI state.
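The per-class and overall accuracies quoted for the linear SVM follow directly from the correct and total counts given in the text, as this short Python check shows. Only the counts stated above are used; the dictionary layout is an illustrative choice.

```python
# Correctly predicted and total training samples per IWQI state (linear SVM, raw data)
correct = {"very good": 11, "good": 15, "satisfactory": 29, "unsuitable": 90}
total = {"very good": 14, "good": 18, "satisfactory": 32, "unsuitable": 92}

per_class = {state: round(100 * correct[state] / total[state]) for state in total}
overall = round(100 * sum(correct.values()) / sum(total.values()), 1)

print(per_class)  # {'very good': 79, 'good': 83, 'satisfactory': 91, 'unsuitable': 98}
print(overall)    # 92.9
```

The overall figure is the share of all 156 training samples that were classified correctly (145 of 156), not the mean of the four per-class accuracies, which is why the large unsuitable class dominates it.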

3.3.4. KNN Results for Normalized Data

The KNN classifier achieved 94.2% prediction accuracy with the data normalized as in Equation (2). Figure 12 shows the corresponding confusion matrix: the prediction accuracies were 100% (14/14), 83% (15/18), 81% (26/32), and 100% (92/92) for the very good, good, satisfactory, and unsuitable IWQI states, respectively. Figure 13 shows the positive and negative predictive values. According to this figure, class 3 (satisfactory) was predicted 29 times; 26 of these predictions were correct, giving a positive predictive value of 90% for the satisfactory IWQI state, while the remaining 10% were samples whose actual state was good (class 2).
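
The 90% figure for the satisfactory class can be checked from the counts quoted above, using the standard definition of the positive predictive value (TP: correct class-3 predictions; FP: samples of other classes predicted as class 3):

$$\mathrm{PPV}_3 = \frac{TP}{TP + FP} = \frac{26}{26 + 3} = \frac{26}{29} \approx 0.897$$

so roughly 10% of the class-3 predictions are incorrect.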

The constructed SVM models with standardized, normalized, and raw data were tested with ten samples to verify their accuracy. Table 9 shows the results of applying the classifiers to the different data. The prediction accuracies of the classifiers with the standardized, normalized, and raw data were 100%, 100%, and 90%, respectively. The cubic SVM with the normalized data achieved the highest prediction accuracy for the training and testing samples (94.2% and 100%, respectively).

4. Conclusions

Groundwater is an essential resource for drinking and irrigation in many parts of the world, particularly in arid regions. However, in many areas, groundwater is unsafe to drink and can reduce crop production because of high concentrations of contaminants such as industrial chemicals or agricultural pesticides. This study aimed to assess the water quality for irrigation purposes in the region of Adrar and to develop a classification model to predict the irrigation water quality index (IWQI) class. Additionally, the sodium adsorption ratio (SAR) was assessed to determine the effects of water on soil structural behavior. Analyzing irrigation water quality can improve agricultural productivity and prevent plant damage. The calculated SAR values showed that the groundwater samples in the study area belong to the “very good” and “good” classes. Based on the available data, most physicochemical parameters are within the Food and Agriculture Organization (FAO) criteria for agricultural purposes. The calculated IWQI in the study area ranged from 3.64 to 93.77, with an average of 41.81. Based on the IWQI results, 57.23% of the samples fall within the unsuitable category, mainly in the south and northeast parts of the study area. On the other hand, 12.05% of the samples were deemed good, and 21.08% were considered satisfactory. The remaining 9.64% fall within the “very good” category, which is dominant in the western parts of the study area.

Artificial intelligence was used to predict groundwater quality for irrigation in the study area. SVM with the normalized data emerged as the optimal model for predicting the IWQI, where the accuracy for training and testing samples was 94.2% and 100%, respectively. This study has successfully developed an accurate SVM model for IWQI in arid areas. By combining physicochemical data, SAR, IWQI, and GIS, we can comprehensively understand water quality and its governing mechanisms in the study area. In this study, the methodology used to summarize the monitoring data could be an efficient and valuable tool for reporting the data to decision-makers. According to our findings, artificial intelligence techniques can enhance groundwater quality management plans in Adrar. By predicting the irrigation water quality index (IWQI) using hydrochemical parameters and machine learning techniques, the study provides a means to evaluate the suitability of groundwater for irrigation and potentially drinking purposes. Furthermore, the model may be adopted in other desert regions where the costs of estimating several water quality variables are high and might be restrictive. Further improvements may also be achieved by including more hydrochemical parameters and applications in different climate regimes.

Author Contributions

Methodology, A.D., H.G.A. and H.A.; writing—original draft preparation, A.D., S.S.M.G., E.A., A.A., A.A.A.D. and H.G.A.; writing—review and editing, A.D., S.S.M.G., H.A., A.A., E.A. and H.G.A. All authors have read and agreed to the published version of the manuscript.

Funding

The researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available on request to the first author.

Acknowledgments

The authors would like to thank the editor and the four reviewers for their thoughtful and detailed comments on our paper.

Conflicts of Interest

The authors have no conflict of interest to declare.

Figure 1. Study area.

Figure 2. SVM algorithm indicates the margin separating two classes.

Figure 3. KNN classifier based on k-parts where the red hexagon and the green triangle represent class 1 and 2, respectively.

Figure 4. Geospatial distribution of IWQI in the study area.

Figure 5. Scatter plot of cubic SVM with standardized data.

Figure 6. Confusion matrices: (a) the number of correct and incorrect observations, (b) true positive rate/false negative rate for cubic SVM based on standardized data, where the green and red cells refer to the correct and incorrect prediction, respectively.

Figure 7. The ROC result of cubic SVM for standardized data.

Figure 8. Confusion matrices: (a) the number of correct and incorrect observations, (b) normalized cubic SVM true positive rate/false negative rate, where the green and red cells refer to the correct and incorrect prediction, respectively.

Figure 9. The ROC result of cubic SVM for normalized data.

Figure 10. Scatter plot of linear SVM with raw data.

Figure 11. Confusion matrices: (a) the number of correct and incorrect observations, (b) linear SVM true positives/false negatives based on raw data, where the green and red cells refer to the correct and incorrect prediction, respectively.

Figure 12. The confusion matrix of KNN with normalized data.

Figure 13. KNN confusion matrix for positive predictive values and negative predictive values. The green-colored cells represent the correct prediction while the red-colored cells represent incorrect prediction.

Table 1. Descriptive statistics of physicochemical parameters of irrigation water.

Parameter	Unit	Min	Max	Mean	SD
$\mathrm{Ca}^{2+}$	mEq/L	1.07	15.10	4.90	2.11
$\mathrm{Mg}^{2+}$	mEq/L	0.65	14.63	5.52	2.56
$\mathrm{Na}^{+}$	mEq/L	1.52	38.70	13.03	5.93
$\mathrm{K}^{+}$	mEq/L	0.15	5.26	0.65	0.47
$\mathrm{Cl}^{-}$	mEq/L	1.97	35.21	12.37	6.06
$\mathrm{SO}_4^{2-}$	mEq/L	2.08	20.83	8.35	3.59
$\mathrm{HCO}_3^{-}$	mEq/L	0.75	4.35	2.60	0.49
$\mathrm{NO}_3^{-}$	mEq/L	0.12	3.02	0.74	0.41
EC	µS/cm	620.00	5920.00	2475.75	927.00
pH	--	7.35	8.19	7.71	0.19
T	°C	21.3	24.8	23.4	2.62

Table 2. Limiting values for parameters used in quality assessments (q_i).

Parameters	Limiting Values
$q_i$	0–35	35–60	60–85	85–100
$\mathrm{HCO}_3^{-}$	$\mathrm{HCO}_3^{-} < 1$ or $\mathrm{HCO}_3^{-} \geq 8.5$	$4.5 \leq \mathrm{HCO}_3^{-} < 8.5$	$1.5 \leq \mathrm{HCO}_3^{-} < 4.5$	$1 \leq \mathrm{HCO}_3^{-} < 1.5$
EC ($\mu\mathrm{S}\,\mathrm{cm}^{-1}$)	$\mathrm{EC} < 750$ or $\mathrm{EC} \geq 3000$	$1500 \leq \mathrm{EC} < 3000$	$750 \leq \mathrm{EC} < 1500$	$200 \leq \mathrm{EC} < 750$
SAR	$\mathrm{SAR} < 2$ or $\mathrm{SAR} \geq 12$	$6 \leq \mathrm{SAR} < 12$	$3 \leq \mathrm{SAR} < 6$	$2 \leq \mathrm{SAR} < 3$
$\mathrm{Na}^{+}$	$\mathrm{Na} < 2$ or $\mathrm{Na} \geq 12$	$6 \leq \mathrm{Na} < 12$	$3 \leq \mathrm{Na} < 6$	$2 \leq \mathrm{Na} < 3$
$\mathrm{Cl}^{-}$	$\mathrm{Cl} < 1$ or $\mathrm{Cl} \geq 10$	$7 \leq \mathrm{Cl} < 10$	$4 \leq \mathrm{Cl} < 7$	$1 \leq \mathrm{Cl} < 4$

Table 3. Relative weights used to calculate IWQI.

Parameters	SAR	EC	Cl	Na	HCO₃	Total
$w_{i}$	0.189	0.211	0.194	0.204	0.202	1

Table 4. The state of irrigation water based on IWQI limits.

Range of IWQI	Irrigation Water State
70–100	Very good
55–70	Good
40–55	Satisfactory
0–40	Unsuitable

Table 5. Distribution of the collected data based on the irrigation water state.

Irrigation Water State	Very Good	Good	Satisfactory	Unsuitable	Total
Training	14	18	32	92	156
Testing	2	2	3	3	10
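
The class shares reported for the study area follow directly from the counts in Table 5. As a minimal illustrative sketch (the dictionary and function names are hypothetical, not part of the original workflow), the percentages can be recomputed by pooling the training and testing samples:

```python
# Sample counts per irrigation water state, taken from Table 5.
TRAINING = {"very good": 14, "good": 18, "satisfactory": 32, "unsuitable": 92}
TESTING = {"very good": 2, "good": 2, "satisfactory": 3, "unsuitable": 3}


def class_shares(*splits):
    """Percentage of all samples in each class, pooled over the given splits."""
    totals = {}
    for split in splits:
        for state, count in split.items():
            totals[state] = totals.get(state, 0) + count
    n = sum(totals.values())
    return {state: round(100 * count / n, 2) for state, count in totals.items()}


if __name__ == "__main__":
    for state, share in class_shares(TRAINING, TESTING).items():
        print(f"{state}: {share}%")
```

With 166 samples in total, this gives 9.64% very good, 12.05% good, 21.08% satisfactory, and 57.23% unsuitable.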

Table 6. Computed input parameters and IWQI state of some of the trained data samples.

EC	Na	HCO₃	Cl	SAR	IWQI	IWQI State
83.33	122.25	75.90	92.61	94.96	93.77	Very Good
88.55	119.71	70.00	92.61	94.58	93.08	Very Good
63.33	71.59	47.50	80.54	88.02	69.82	Good
69.67	72.68	43.61	71.39	87.97	68.81	Good
51.17	40.43	50.33	39.58	75.45	51.15	Satisfactory
43.67	30.29	58.00	27.84	77.02	47.07	Satisfactory
40.17	12.17	37.05	27.84	73.36	37.71	Unsuitable
36.67	1.30	62.62	27.84	60.99	37.58	Unsuitable
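
The IWQI column in Table 6 can be reproduced, to within rounding, from the five quality scores and the weights in Table 3. The sketch below assumes that the IWQI is the weighted sum of the scores and that a value lying exactly on a class boundary in Table 4 belongs to the higher class; both are labeled assumptions, and the function names are hypothetical.

```python
# Relative weights w_i from Table 3.
WEIGHTS = {"SAR": 0.189, "EC": 0.211, "Cl": 0.194, "Na": 0.204, "HCO3": 0.202}

# Lower bounds of the IWQI classes from Table 4, highest class first.
# Assumption: a value exactly on a boundary belongs to the higher class.
CLASS_LOWER_BOUNDS = [
    (70.0, "Very good"),
    (55.0, "Good"),
    (40.0, "Satisfactory"),
    (0.0, "Unsuitable"),
]


def iwqi(scores):
    """Weighted sum of the quality scores q_i (assumed form of the index)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def iwqi_state(value):
    """Map an IWQI value to its irrigation water state using Table 4."""
    for lower, state in CLASS_LOWER_BOUNDS:
        if value >= lower:
            return state
    return "Unsuitable"  # values below 0 are treated as unsuitable


if __name__ == "__main__":
    # First row of Table 6.
    row = {"EC": 83.33, "Na": 122.25, "HCO3": 75.90, "Cl": 92.61, "SAR": 94.96}
    value = iwqi(row)
    print(f"IWQI = {value:.2f}, state = {iwqi_state(value)}")
```

For the first row of Table 6 this returns an IWQI of about 93.77, which falls in the very good class, in agreement with the tabulated value.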

Table 7. IWQI results of the study area.

Name	IWQI	Type	Name	IWQI	Type	Name	IWQI	Type	Name	IWQI	Type
TAA	93.77	VG	SAID	52.25	St	RH	38.32	Us	SALI	30.78	Us
TAZ	93.08	VG	KDR	52.00	St	BKR	38.12	Us	TAAT	29.92	Us
TAB	84.95	VG	TAMT	51.15	St	HMM	38.00	Us	ALF2	29.67	Us
TIN	84.15	VG	MNSR	47.07	St	BHH	37.90	Us	ISSA2	29.42	Us
ASDI	83.41	VG	RSL	46.32	St	FNL2	37.81	Us	RGN	28.82	Us
TIL	81.16	VG	AMR	46.18	St	TILL	37.71	Us	CHM	28.59	Us
TIM	80.21	VG	TITAF	45.60	St	RBT	37.58	Us	FAT	28.49	Us
KSRH	73.22	VG	KBL	45.52	St	GHRT	37.10	Us	ABR2	27.80	Us
TIL	73.05	VG	TMT2	45.36	St	KSN	36.98	Us	CHRW	27.55	Us
ABD	72.27	VG	KNN	44.99	St	HDJD	36.82	Us	DRA	27.55	Us
TIL2	72.13	VG	YCF	44.74	St	BAAM	36.61	Us	AITM	27.12	Us
TIB	71.86	VG	ABN	44.35	St	TEBN	36.51	Us	IGOS	27.01	Us
OUF	71.04	VG	FNGL	43.75	St	SHL	36.44	Us	TIMA	26.66	Us
SNLGZ	70.95	VG	RTB	43.63	St	BOR	36.40	Us	LAGH	26.61	Us
AERO	70.51	VG	RCHD	43.28	St	WHB	36.12	Us	AJIR	26.53	Us
BGM	70.25	VG	MHD	42.96	St	GBR	35.81	Us	GDM	25.43	Us
TIL3	69.82	Gd	TIMI2	42.82	St	ZGL	35.57	Us	NOM	24.77	Us
TAA	68.81	Gd	TID	42.27	St	SYCF	35.44	Us	BKRI	24.36	Us
GUR	68.42	Gd	AZO	41.75	St	SHL	35.34	Us	SML	23.76	Us
MHD	66.80	Gd	TIMO	41.68	St	ABBO	35.13	Us	TKK	23.40	Us
AKBR	65.87	Gd	BSL	41.48	St	YHIA	34.91	Us	HJMH	21.67	Us
TZA	65.72	Gd	BHH	41.19	St	CHTB	34.90	Us	BRL	21.23	Us
ATAR	65.70	Gd	AIAN	41.18	St	FNFL	34.69	Us	TWT	21.11	Us
BARB	65.66	Gd	ADM	41.09	St	FTH	34.61	Us	AWM	20.96	Us
MAIZ	65.36	Gd	CHRF	40.57	St	LHMR	34.37	Us	TSFT	20.39	Us
SBAA	65.35	Gd	BNZT	40.56	St	TGH	34.35	Us	NEFS	19.06	Us
SLM	65.06	Gd	CHKH	40.53	St	ZKNT	34.23	Us	ZKKR	18.28	Us
TIL4	64.75	Gd	TNRT	40.25	St	MSS	34.05	Us	AZRF	16.21	Us
TIL5	64.53	Gd	TIMI3	40.10	St	TLB	34.04	Us	AMS	15.77	Us
LAA	64.23	Gd	MNC	39.90	Us	HFR	33.79	Us	CHRW	14.88	Us
TMR	63.96	Gd	ZGH	39.84	Us	TMR	33.60	Us	AWLF	14.65	Us
TYB	60.48	Gd	IKKIS	39.84	Us	NZA	33.54	Us	DGHA	14.65	Us
YAK	57.47	Gd	MHD	39.77	Us	KID	33.50	Us	TLAL	13.26	Us
KORT	57.13	Gd	AKR	39.67	Us	SMD	33.43	Us	ARR	11.94	Us
ISSA	56.55	Gd	TKN	39.60	Us	TIAF	33.29	Us	TSAM	10.34	Us
TIMI	55.89	Gd	AWLF	39.59	Us	ABB2	32.60	Us	HBL	9.68	Us
KABR	54.93	St	MSTR	39.47	Us	ABLI	32.54	Us	TLH	9.68	Us
MRG	54.43	St	ALM	39.14	Us	MCN	32.36	Us	MLK	8.73	Us
ARPT2	53.66	St	RKIA	38.91	Us	ZGLF	32.33	Us	LAAR	7.48	Us
BLKB	52.83	St	TRR	38.83	Us	SALI	32.04	Us	TRR	3.64	Us
BALI	52.78	St	MLD	38.63	Us	AGHIL	31.19	Us
TIT	52.76	St	TDM	38.46	Us	BGL	30.90	Us

VG: Very good, Gd: Good, St: Satisfactory, Us: Unsuitable.

Table 8. Accuracies of all classifiers.

	Training Data
Classifier	Raw Data	Standardized	Normalized
Cubic KNN	84.6	85.9	85.3
Fine KNN	89.1	89.1	91
Medium KNN	86.5	85.9	87.2
Cosine KNN	82.7	84	83.3
Coarse KNN	59	59	59
Weighted KNN	92.3	92.3	92.9
Coarse tree	89.1	86.5	89.1
Medium tree	86.5	84	87.2
Quadratic discriminant	88.5	85.3	88.5
Linear discriminant	88.5	88.5	89.7
Ensemble bagged trees	87.8	89.7	90.4
Ensemble boosted trees	59	59	59
Ensemble subspace KNN	86.5	85.3	88.5
Ensemble subspace discriminant	87.8	87.2	87.2
Linear SVM	92.9	91	92.3
Ensemble RUSBoosted trees	88.5	87.8	89.7
Fine Gaussian SVM	76.3	75.6	75.6
Cubic SVM	92.3	92.9	94.2
Medium Gaussian SVM	92.3	90.4	92.9
Quadratic SVM	92.3	89.1	91.7
Coarse Gaussian SVM	83.3	82.1	82.1

Table 9. Prediction of the SVM classifier with standardized, normalized, and raw data.

EC	Na	HCO₃	Cl	SAR	IWQI	Actual Water Type	1*	2*	3*
69.33	67.61	61.15	78.43	85.43	72.13	Very Good	Very Good	Very Good	Very Good
72.33	59.28	65.50	86.80	76.37	71.86	Very Good	Very Good	Very Good	Very Good
60.67	57.46	49.00	80.31	81.21	65.35	Good	Good	Good	Good
70.00	52.03	48.85	83.12	72.39	65.06	Good	Good	Good	Good
53.00	26.67	64.00	51.31	70.52	52.83	Satisfactory	Satisfactory	Satisfactory	Satisfactory
56.33	28.48	89.67	27.84	61.23	52.78	Satisfactory	Satisfactory	Satisfactory	Satisfactory
52.67	41.16	58.69	33.71	78.58	52.76	Satisfactory	Satisfactory	Satisfactory	Satisfactory
48.33	19.42	34.10	27.84	67.16	39.14	Unsuitable	Unsuitable	Unsuitable	Unsuitable
40.00	−13.19	70.00	27.84	72.07	38.91	Unsuitable	Unsuitable	Unsuitable	Satisfactory
44.67	12.17	42.95	27.84	67.98	38.83	Unsuitable	Unsuitable	Unsuitable	Unsuitable

1* Cubic SVM with standardized data, 2* Cubic SVM with normalized data, 3* Linear SVM with raw data.
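
The testing accuracies of the three models can be read off Table 9 by counting matches between the actual and predicted water types. A minimal sketch in Python, with the labels transcribed from the table and hypothetical variable and function names:

```python
# Actual water types of the ten testing samples, in the row order of Table 9.
ACTUAL = [
    "Very Good", "Very Good", "Good", "Good",
    "Satisfactory", "Satisfactory", "Satisfactory",
    "Unsuitable", "Unsuitable", "Unsuitable",
]

# Predicted water types per model (columns 1*, 2*, 3* of Table 9).
# Only the linear SVM on raw data misses one sample (row 9).
PREDICTED = {
    "cubic SVM, standardized": list(ACTUAL),
    "cubic SVM, normalized": list(ACTUAL),
    "linear SVM, raw": ACTUAL[:8] + ["Satisfactory", "Unsuitable"],
}


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


if __name__ == "__main__":
    for model, labels in PREDICTED.items():
        print(f"{model}: {accuracy(ACTUAL, labels):.0%}")
```

With only ten testing samples, a single misclassification changes the accuracy by ten percentage points, so these figures are best read alongside the training accuracies in Table 8.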

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
