**Impact of Rice Intensification and Urbanization on Surface Water Quality in An Giang Using a Statistical Approach**

#### **Huynh Vuong Thu Minh <sup>1</sup>, Ram Avtar <sup>2</sup>, Pankaj Kumar <sup>3</sup>, Kieu Ngoc Le <sup>1,4</sup>, Masaaki Kurasaki <sup>2</sup> and Tran Van Ty <sup>5,</sup>\***


Received: 26 May 2020; Accepted: 12 June 2020; Published: 15 June 2020

**Abstract:** Few studies have evaluated the impact of land use land cover (LULC) change on surface water quality in the Vietnamese Mekong Delta (VMD), one of the most productive agricultural deltas in the world. This study aims to evaluate water quality parameters inside full- and semi-dike systems and outside of the dike system during the wet and dry seasons in An Giang Province. Multivariate statistical analysis and the weighted arithmetic water quality index (WAWQI) were used to analyze 40 water samples in each season. The results show that the mean values of conductivity (EC), phosphate (PO4<sup>3−</sup>), ammonium (NH4<sup>+</sup>), chemical oxygen demand (COD), and potassium (K<sup>+</sup>) failed to meet the World Health Organization (WHO) and Vietnamese standards in both seasons. The NO2<sup>−</sup> concentration inside the triple and double rice cropping systems during the dry season exceeded the permissible limit of the Vietnamese standard. High concentrations of COD and NH4<sup>+</sup> were found in the urban area and the main river (Bassac River). The WAWQI showed that 97.5% and 95.0% of water samples fell into the bad and unsuitable for drinking categories, respectively. The main reason is the direct discharge of untreated wastewater from intensive rice cultivation and urban sewerage lines. The findings of this study are critically important for decision-makers designing timely mitigation or adaptation measures for water resource management in light of rapid global changes in An Giang and the VMD.

**Keywords:** triple-rice cropping system; full-dike; surface water quality; WAWQI; An Giang Province; the Vietnamese Mekong Delta

#### **1. Introduction**

Deltas around the world have played a vital role in food security and economic development. However, the rapid exploitation of natural resources and changes in land use land cover (LULC) have also caused severe environmental degradation, such as water quality deterioration, in many deltas in recent years [1–4]. High heavy metal concentrations and bacterial pathogen levels attributed to industrial and agricultural activities and to poor sanitation and hygiene were found in the Middle Nile Delta, Egypt [5]. Several studies have also reported unregulated urban expansion and animal husbandry and their impact on water quality deterioration in the Irrawaddy Delta, Myanmar [6,7]. Consequently, when this polluted water flows into the city during the monsoon, it causes several waterborne diseases such as cholera, gastroenteritis, and skin diseases [6,8,9]. Surface water pollution from organic pollutants, microbial contamination, pesticides, metals, etc. has been reported in the Mekong Delta Basin, in both the Cambodian (Phnom Penh) and Vietnamese (Chau Doc, Tan Chau, and Can Tho) parts [10–15].

The Mekong River Basin (MRB), a well-known transboundary river basin in Asia, has a natural area of 795,000 km<sup>2</sup> and a mean annual discharge of 14,500 m<sup>3</sup>/s [16–18]. The glaciers of the Himalaya mountains are the source of the international Mekong River, which flows through China, Myanmar, Thailand, Laos, Cambodia, and Vietnam and finally into the Pacific Ocean [18]. The lower Mekong Delta in Vietnam, located downstream in the MRB and accounting for 8% of the entire basin, is strongly influenced by tides, with seawater entering twice a day. Changes in water quality and quantity in the upstream region would directly affect the health of approximately 242 million people (2018 data) [19] who live along the lower Mekong River [18,20]. The upper region of the VMD receives 60% to 80% of its discharge from outside the VMD. An Giang Province is located in this region, between the two main rivers, the Mekong and the Bassac. The land of An Giang is therefore fertile owing to abundant water resources and fluvial sedimentation from the Mekong River. Consequently, An Giang has large agricultural areas dominated by rice production [21], but the province also suffers substantial damage from natural flooding every year from August to November during the Asian monsoon season [21–23].

The full- and semi-dike systems in An Giang have been built rapidly since the 1990s to prevent flooding and to grow rice for both food security and economic development [22,24,25]. The full-dike system and the hydraulic infrastructure were developed to protect the triple-rice cropping system as well as urban areas [21,25]. Local farmers can grow two or three rice crops per year inside the dike systems instead of a single rice crop per year as in the past [21]. Although the dike systems protect residential areas and increase the income of local farmers, their most critical disadvantage is the deterioration of surface water quality [21,22,25]. Water quality degradation may derive both from natural processes, such as rock–water interaction, ion exchange, groundwater–surface water interaction, and evapotranspiration, and from human activities, such as the discharge of untreated wastewater from point or nonpoint sources into natural water bodies [16,21,26].

Water demand for agriculture and aquaculture alone accounts for a significant portion of the total available water, resulting in a high volume of waste discharged from agriculture [27]. Although a few studies have reported the impact of land use on stream water quality [21,28,29], studies focusing on different types of dike development for agricultural intensification and their impacts on water quality remain scarce. Hence, the objective of this study is to assess the physicochemical properties of the surface water in An Giang Province using multivariate statistical analysis and the weighted arithmetic water quality index (WAWQI). The primary focus is to evaluate the impact of dike development on surface water quality in comparison with the remaining areas in An Giang. The hypothesis of this study is that water quality inside the full-dike systems is worse than outside them, and that water quality in the dry season is worse than in the wet season.

#### **2. Methodology**

#### *2.1. Study Area*

An Giang Province (10°12′ N to 10°57′ N and 104°46′ E to 105°35′ E) is located in the uppermost part of the VMD and shares a 104 km border with Cambodia in the northwest. An Giang is home to over 2.4 million people (2019) [30] and has a total area of 3536 km<sup>2</sup>, 70% of which is used for agricultural production. The region has two distinct seasons: dry and wet (monsoon). The wet season occurs between May and November each year, and the highest rainfall usually occurs at its end, from October to November (Figure 1). Although total annual rainfall in An Giang is low compared with the average rainfall of the VMD, the rainfall occurs at nearly the same time as the flooding season, leading to a risk of deep inundation. Thus, An Giang has had to build extensive dike systems (Figure 2) to increase agricultural production and to protect crops during the flooding season (July to November). Multi-dike protection systems have been built to protect residential areas from flooding and have mainly supported agricultural intensification since the early 1990s. In addition, hydropower plants built along the Mekong River and its branches have led to a change in the water regime (Figure 1). Between 1991 and 2015, the average discharge decreased in the wet season and increased in the dry season. The primary soil type is alluvial soil, accounting for 44.5% of all 37 different soil types present in the province. About 72% of the area is alluvial soil or land receiving a large sediment supply and is suitable for many kinds of crops. The dike systems and hydropower plants have reduced the amount of alluvium added to the region annually [31,32].

**Figure 1.** Average hourly discharge (Q) from 2006 to 2017 and average daily rainfall from 1991 to 2015 at Tan Chau Station in An Giang. The discharge shows a decreasing trend in the wet season and an increasing trend in the dry season. All data were collected from the Southern Regional Hydro-meteorological Center (SRHMC) in Vietnam [33].

**Figure 2.** Study area and water sampling sites in An Giang, the Mekong Delta in Vietnam.

#### *2.2. Collection of Water Samples and Analytical Methods*

Surface water samples were collected and analyzed in the wet and dry seasons inside the full- and semi-dike systems and outside of the dike system (on the main river and in the single-rice cropping system), as shown in Figure 3. The analytical data were processed using statistical tools and used to calculate water quality indicators. Finally, the results are discussed in terms of the spatio-temporal classification of water quality and the impact of the dike system on water quality parameters.

**Figure 3.** Flowchart for study methodology.

Each season, 40 surface water samples were taken from inside the full- and semi-dike systems and outside the dike system in An Giang (Figure 3). Sampling was carried out in both the dry season (22–28 April 2018) and the wet season (6–13 October 2018). Sample locations were recorded with geotagged photos using the global positioning system (GPS). Stratified random sampling was used to select the sampling sites: Cluster 1 includes ten samples outside of the dike system (6 in the main rivers and 4 in the single-rice cropping system), Cluster 2 includes ten samples inside the semi-dike system (3 in the forest and 7 in the double-rice cropping system), and Cluster 3 includes 20 samples inside the full-dike system (6 in the urban area and 14 in the triple-rice cropping system). After collection, water samples were brought to the laboratory in an ice chest and stored below 4 °C. The collected samples were analyzed for twelve water quality parameters: pH, EC, chloride (Cl<sup>−</sup>), nitrite (NO2<sup>−</sup>), nitrate (NO3<sup>−</sup>), NH4<sup>+</sup>, COD, PO4<sup>3−</sup>, sodium (Na<sup>+</sup>), calcium (Ca<sup>2+</sup>), magnesium (Mg<sup>2+</sup>), and K<sup>+</sup>. A HORIBA multi-parameter meter (Kyoto, Japan) with a precision of 1% and a handheld meter (Oaklom; Tokyo, Japan) were used for in situ analysis of pH, Cl<sup>−</sup>, and EC, while NO2<sup>−</sup>, NO3<sup>−</sup>, NH4<sup>+</sup>, COD, and PO4<sup>3−</sup> were measured using pack tests. Anions were analyzed by DIONEX ICS-90 ion chromatography with an error of <2%, while cations were analyzed by a Shimadzu mass spectrometer with a precision of <1% using duplicates. The historical meteorological data were collected from the Southern Regional Hydro-meteorological Center (SRHMC) [33].

#### *2.3. Statistical Analyses*

#### 2.3.1. Multivariate Statistical Analysis

Multivariate statistical analysis was carried out to obtain a better understanding of the processes governing water quality [34–40]. First, we conducted correlation analysis and discriminant analysis (DA) [41] to identify significant relationships among parameters and to discriminate among clusters in terms of water quality characteristics. Second, we used box plots to show differences among clusters in the dry and wet seasons. Finally, we used the WAWQI method to classify the water quality for human use. XLSTAT software version 2018 (Addinsoft SARL, Paris, France) and inverse distance weighting (IDW) interpolation were used to produce the plots and display the results [42–45].

We used the Spearman rank–order correlation to evaluate the relationships among parameters in each season, since most of the dataset had a non-normal distribution. Unlike the Pearson correlation, which assumes normally distributed data, the Spearman rank–order correlation does not require any distributional assumption [46,47]. Moreover, it identifies the correlation between related parameters together with its statistical significance, as reported in previous studies [45,48].
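The rank-based idea behind this coefficient can be illustrated with a minimal Python sketch. It is not the XLSTAT procedure used in this study: it only converts each variable to ranks (tied values share their mean rank) and then applies the Pearson formula to the ranks. The function names and the five EC and COD readings are hypothetical and serve only as an illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical readings at five sites (illustrative only, not study data).
ec = [210.0, 340.0, 280.0, 515.0, 460.0]
cod = [12.0, 19.0, 15.0, 40.0, 31.0]
rho = spearman_rho(ec, cod)
```

Because only the ordering of the values enters the calculation, a monotonic but non-linear relationship still yields a coefficient close to ±1, which is why the method suits non-normal water quality data.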

In this study, we used the DA technique to determine the most significant parameters among the 40 sampling sites as well as between the dry and wet seasons. DA has also been applied in various studies [48,49]. Standard, forward stepwise, and backward stepwise DA were applied, as previously documented [21,48,50]. Forward stepwise DA adds one parameter at each step, starting with the one giving the most significant improvement in fit, until no further change is found. Backward stepwise DA excludes one parameter at each step, starting with the least significant, until no significant change in the removal criteria occurs [51,52]. After the standard DA, the backward stepwise model helped to clarify which parameters are the most important [48,51].

#### 2.3.2. Weighted Arithmetic Water Quality Index (WAWQI) Model

The WAWQI is an index number that represents the overall quality of water and is a standard tool for the classification of water pollution (Figure 4). It reflects the composite influence of multiple water quality parameters [53] and is therefore an important indicator for the assessment and management of water resources. Here, all the selected water quality parameters are aggregated into one overall index, which is an effective way to express water quality [54].

**Figure 4.** Flowchart of the weighted arithmetic water quality index (WAWQI) model.

In this study, we chose the Horton method to calculate the WAWQI [21,35,54]. The drinking water standard was based on the permissible limits set by the WHO guidelines [55]. All variables were converted into sub-indices, namely quality ratings (Qi) and unit weights (Wi). The sub-indices were expressed on a single scale, and the water quality was classified accordingly. The WAWQI was estimated using Equation (1) [56]:

$$\text{WAWQI} = \frac{\sum_{i=1}^{n} Q_{i} W_{i}}{\sum_{i=1}^{n} W_{i}} \tag{1}$$

where

WAWQI is the weighted arithmetic water quality index;

Qi is the quality rating of the ith parameter,

$$Q_{i} = \frac{V_{i} - V_{di}}{S_{i} - V_{di}} \times 100$$

in which Vi is the measured value of the ith parameter at a sample location, Vdi is the ideal value of the ith parameter in pure water (7.0 for pH and 0 for all other parameters), and Si is the permissible limit of the ith parameter;

Wi is the unit weight of the ith parameter, Wi = K/Si, in which K is the proportionality constant,

$$K = \frac{1}{\sum_{i=1}^{n} (1/S_{i})}$$

Based on the ranges of WAWQI value, the corresponding status of water quality and their possible drinking use are summarized in Table 1.
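Equation (1) and the definitions of Qi, Wi, and K can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the spreadsheet or software used in this study: the function name `wawqi`, the three-parameter subset, and the limit values in `LIMITS` are hypothetical placeholders and are not the permissible limits applied in this paper.

```python
def wawqi(measured, limits, ideal=None):
    """Weighted arithmetic water quality index for one sample.

    measured: dict mapping parameter name -> measured value V_i
    limits:   dict mapping parameter name -> permissible limit S_i
    ideal:    dict mapping parameter name -> ideal value V_di (default 0)
    """
    ideal = ideal or {}
    # Proportionality constant K = 1 / sum(1 / S_i).
    k = 1.0 / sum(1.0 / s_i for s_i in limits.values())
    numerator = 0.0
    weight_sum = 0.0
    for name, s_i in limits.items():
        v_di = ideal.get(name, 0.0)
        q_i = (measured[name] - v_di) / (s_i - v_di) * 100.0  # quality rating Q_i
        w_i = k / s_i                                          # unit weight W_i
        numerator += q_i * w_i
        weight_sum += w_i
    return numerator / weight_sum


# Hypothetical limits and sample values (placeholders, not study data).
LIMITS = {"pH": 8.5, "COD": 10.0, "NH4": 0.5}
IDEAL = {"pH": 7.0}  # all other parameters default to an ideal value of 0
sample = {"pH": 7.4, "COD": 15.0, "NH4": 1.0}
index_value = wawqi(sample, LIMITS, IDEAL)
```

Because Wi is inversely proportional to Si, parameters with small permissible limits dominate the index. With K defined as above the unit weights sum to one, so the denominator of Equation (1) only acts as a safeguard. A sample sitting exactly at every permissible limit scores 100, and a sample at every ideal value scores 0.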


**Table 1.** Water quality classification for human consumption using the weighted arithmetic water quality index (WAWQI) [55].

#### **3. Results**

#### *3.1. Statistical Assessment Using Correlation*

The correlation matrices among the 12 water quality parameters in the dry and wet seasons are shown in Tables 2 and 3, respectively. Parameters showing weak correlation coefficients with the others in both seasons have likely been affected by multiple sources, such as agriculture, urbanization, and industry [13,21,57]. In the dry season, COD had a strong, statistically significant correlation with Mg<sup>2+</sup> (0.61) and EC (0.61) and a moderate positive correlation with PO4<sup>3−</sup> (0.49) and NH4<sup>+</sup> (0.461). In contrast, in the wet season, COD showed no correlation with PO4<sup>3−</sup> or Mg<sup>2+</sup> and only weak correlations with EC, pH, and NH4<sup>+</sup>. PO4<sup>3−</sup> had a weak correlation with EC and NH4<sup>+</sup> in both seasons and a very weak relationship with NO2<sup>−</sup> in the wet season only. On the other hand, NO3<sup>−</sup> had a strong correlation with NO2<sup>−</sup> but did not correlate with the other parameters in either season. During flooding, a large amount of water with a high COD concentration flows from the upper Mekong River into An Giang, as supported by a previous observation [13].

Interestingly, the physical parameters were more strongly correlated in the dry season than in the wet season. EC and pH were negatively correlated in the wet season and almost uncorrelated in the dry season. In the dry season, EC correlated with COD, NH4<sup>+</sup>, and PO4<sup>3−</sup>, while pH correlated only with NO2<sup>−</sup>. In the wet season, pH and EC had a moderate correlation with COD and NH4<sup>+</sup>; in addition, EC correlated with PO4<sup>3−</sup> and pH correlated with Mg<sup>2+</sup>. EC qualitatively reflects the status of inorganic pollution [58]. The significant correlation between EC and NH4<sup>+</sup> in both seasons points to the excessive breakdown or decomposition of organic matter and of animal and human waste. Nitrogen fixation is an indicator of anthropogenic input, namely excess fertilizer application in agricultural fields. The negative correlation between pH and EC during the wet season indicates a lower prevalence of cations and anions when the water becomes alkaline. The strong correlation between EC and COD in both seasons indicates high levels of organic pollutants, while the moderate association with PO4<sup>3−</sup> implies anthropogenic input. The strong association between NO2<sup>−</sup> and NO3<sup>−</sup> suggests a common source, likely agricultural runoff with high fertilizer input.

#### *3.2. Spatial Assessment of Water Quality Using DA*

DA was used to determine which water quality parameters discriminate between the two seasons. The DA result identifies three significant discriminant parameters between the dry and wet seasons: pH, Cl<sup>−</sup>, and Ca<sup>2+</sup> (Figure 5), all of which behaved differently in the two seasons. The pH measures the acidity of water and is the negative logarithm of the hydrogen-ion activity [59,60]. A pH value outside the range of 6.5 to 8.5 indicates contamination or pollution [61]. In addition, pH has a significant association with dissolved oxygen (DO) in freshwater: when the breakdown of organic matter exceeds synthesis, oxygen consumption increases. In this study, the pH values of 7.42 ± 0.63 (dry season) and 6.97 ± 1.06 (wet season) were neither highly alkaline nor highly acidic. The water is slightly alkaline in the dry season and slightly acidic in the wet season. This result also confirms that the fluctuations in the value of water quality parameters in the dry season are greater than those in the wet season. Furthermore, the concentrations of Cl<sup>−</sup> and Ca<sup>2+</sup> were relatively higher in the dry season than in the wet season. This seasonal difference is caused by relatively low river discharge and higher evapotranspiration. Although Cl<sup>−</sup> occurs naturally in water, a high Cl<sup>−</sup> level can increase the corrosiveness of water and, in combination with sodium, creates a salty taste.


**Table 2.** Correlation matrices in the dry season using Spearman rank–order.

Values in bold are different from 0 with a significance level at alpha = 0.05. Concentrations of conductivity (EC), phosphate (PO4<sup>3−</sup>), ammonium (NH4<sup>+</sup>), chemical oxygen demand (COD), nitrite (NO2<sup>−</sup>), nitrate (NO3<sup>−</sup>).

**Table 3.** Correlation matrices in the wet season using Spearman rank–order.


Values in bold are different from 0 with a significance level at alpha = 0.05.

**Figure 5.** Log-normal probability distribution of (**a**) pH, (**b**) Cl<sup>−</sup>, and (**c**) Ca<sup>2+</sup> during the dry (red line) and wet (green line) seasons.

The DA approach was also applied to identify which water quality parameters contribute most to discrimination in space, that is, among clusters, in the dry and wet seasons (Tables 4 and 5). The parameters that differ significantly among clusters are pH and the concentrations of NO2<sup>−</sup> and NO3<sup>−</sup> in the dry season, and the concentrations of Cl<sup>−</sup> and Mg<sup>2+</sup> in the wet season.
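For a single parameter, the lambda statistic of such a unidimensional test reduces to the ratio of the within-cluster sum of squares to the total sum of squares, and it is linked to the one-way ANOVA F statistic. The Python sketch below illustrates this relationship only. It is an assumption that this matches the computation performed by the software, and the function name and the three small groups of values are hypothetical.

```python
def wilks_lambda_univariate(groups):
    """Return (lambda, F) for one variable measured in several groups.

    lambda = SS_within / SS_total; values near 0 indicate that the group
    means differ strongly, values near 1 indicate little separation.
    """
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    lam = ss_within / ss_total
    # Equivalent one-way ANOVA F statistic with (k - 1, n - k) degrees of freedom.
    f_stat = (1.0 - lam) / lam * (n - k) / (k - 1)
    return lam, f_stat


# Hypothetical values of one parameter in three clusters (not study data).
clusters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
lam, f_stat = wilks_lambda_univariate(clusters)
```

The sketch does not guard against the degenerate case in which all values within every group are identical, where lambda is zero and F is undefined.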


**Table 4.** Unidimensional lambda test of the quality of water parameter equality in the dry season.

Note: Significance levels are denoted as follows: \*\* *p* < 0.01, \*\*\* *p* < 0.001.

**Table 5.** Unidimensional lambda test of the quality of water parameter equality in the wet season.


Note: Significance levels are denoted as follows: \*\* *p* < 0.01.

The discrimination of water pollutant levels among the clusters (Cluster 1: outside of the dike system, Cluster 2: inside the semi-dike system, and Cluster 3: inside the full-dike system) was evaluated and displayed for selected parameters in both seasons using box and whisker plots (Figures 6 and 7). In the dry season, pH and the concentrations of NO3<sup>−</sup> and NO2<sup>−</sup> were higher in Cluster 3 than in Clusters 1 and 2. In the wet season, the highest concentration of Mg<sup>2+</sup> was found in Cluster 2, followed by Cluster 3 and Cluster 1, while the concentration of Cl<sup>−</sup> was higher in Cluster 3 than in Clusters 1 and 2.

**Figure 6.** Water quality variables among the three clusters in the dry season. NO3<sup>−</sup>, NO2<sup>−</sup>, and pH were higher in Cluster 3 than in Clusters 1 and 2.

**Figure 7.** Water quality variables among the three clusters in the wet season. Mg<sup>2+</sup> was high in Cluster 2, while Cl<sup>−</sup> was high in Cluster 3.

#### *3.3. Water Quality Classification Using WAWQI*

Table 6 shows the range, mean, and standard deviation of the parameters, some of which exceeded the permissible limits for drinking water set by the WHO and the Vietnamese national standard in both seasons. Higher values of these parameters lead to an increase in WAWQI. Overall, EC, NO2<sup>−</sup>, NH4<sup>+</sup>, COD, PO4<sup>3−</sup>, and K<sup>+</sup> were above the permissible limits set by the WHO and Vietnamese standards. EC is a measure of the capacity of a solution to carry an electrical current by means of its ions [62]; thus, as the concentration of dissolved salts increases, the conductivity also increases. EC is also used to determine the suitability of water for irrigation and firefighting [61]. Both NO3<sup>−</sup> and NO2<sup>−</sup> are nitrogen-containing compounds that generally indicate contamination from pasture, decomposed vegetation, agricultural fertilizers, sewage, and rock–water interaction. NO3<sup>−</sup> is an essential nutrient in an ecosystem. Generally, water polluted by organic matter exhibits higher nitrate values. In this study, the mean concentration of nitrate was 0.34 mg/L in the dry season and 0.5 mg/L in the wet season, and nitrate at all sampling sites was below the permissible limits.

The mean Cl<sup>−</sup> values were 90 mg/L in the dry season and 20 mg/L in the wet season. Cl<sup>−</sup> in surface water may come from human activities, namely agricultural runoff and wastewater sources [61,63]. In this study, the high concentration of Cl<sup>−</sup> is also considered an indication of pollution by organic waste from irrigation drainage, septic tank effluent, animal feed, and landfill leachates [59,60]. It also indicates poor governance and insufficient infrastructure for managing wastewater from both agricultural fields and urbanized areas.

The WAWQI was calculated for the 40 sampling sites in both seasons. The calculation for sample Number 2 in the dry season is shown in Table 7 as an example.


**Table 6.** Standards for drinking water and relative weight of parameters.

Permissible limits for drinking water: \* WHO and \*\* Vietnamese standard. Measured values (Vi), standard values of water quality parameters (Si), corresponding ideal values (Vdi), quality rating of the ith parameter (Qi), and unit weights (Wi) for sampling.

**Table 7.** Weighted arithmetic water quality index (WAWQI) calculation for sampling Number 2 as an example in the dry season.


Measured values (Vi), standard values of water quality parameters (Si), corresponding ideal values (Vdi), quality rating of the ith parameter (Qi), and unit weights (Wi) for sampling.

The WAWQI is commonly used for the detection and evaluation of overall water pollution since it reflects the influence of different parameters on water quality, and it is a useful method for assessing the suitability of water for various beneficial uses. The WAWQI was analyzed for the two seasons, as shown in Appendix A. In the dry season, 70% of the water samples were unsuitable for drinking, 10% were very bad, 17.5% were bad, and only 2.5% were good. In the wet season, 60% of the water samples were unsuitable for drinking, 10% were very bad, 20% were bad, and 10% were good. In general, the surface water quality was better in the wet season than in the dry season.
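The percentages above follow from assigning each sample's WAWQI to a class and counting the samples per class. The Python sketch below shows one way to do this. The class boundaries in `ASSUMED_CLASSES` are an assumption based on cut-offs commonly used with the WAWQI and should be replaced by the ranges in Table 1. The function names and the four index values in the example are hypothetical.

```python
from collections import Counter

# ASSUMED class boundaries as (upper bound, label); replace with Table 1 values.
ASSUMED_CLASSES = [(25, "excellent"), (50, "good"), (75, "bad"), (100, "very bad")]


def classify(wawqi_value, classes=ASSUMED_CLASSES):
    """Return the label of the first class whose upper bound is not exceeded."""
    for upper, label in classes:
        if wawqi_value <= upper:
            return label
    return "unsuitable"  # anything above the last upper bound


def class_shares(values, classes=ASSUMED_CLASSES):
    """Percentage of samples falling into each class."""
    counts = Counter(classify(v, classes) for v in values)
    return {label: 100.0 * count / len(values) for label, count in counts.items()}


# Hypothetical index values for four samples (not study data).
shares = class_shares([34.0, 60.0, 90.0, 150.0])
```

With 40 samples per season, each sample contributes 2.5 percentage points, so every class share is a multiple of 2.5%.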

In addition, the WAWQI of both the wet and dry seasons was mapped using the IDW method to show its spatial distribution (Figure 8). Poor water quality (high WAWQI values) was located in the rice intensification areas. Some poor water quality was also found in tributaries of the Bassac River, which might be caused by water discharged from intensive rice cropping, tourism, and urban areas. In the area surrounded by the Mekong and Bassac Rivers in the northeast, the water quality was better. This may be due to the proper operation of the sluice-gate system and to alternatives to intensive rice cropping (instead of 3 crops per year, farmers have now shifted to 8 crops every 3 years or 5 crops every 2 years). Being surrounded by the two large rivers is also advantageous in that water exchange between the inside and outside of the dike systems may reduce pollution by dilution.
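IDW estimates the value at an unsampled location as a weighted mean of the sampled values, with weights that decrease with distance. The Python sketch below illustrates the principle only. The power parameter of 2, the use of planar coordinates, and the function name `idw` are assumptions, since the interpolation settings are not reported here, and the two sample points are hypothetical.

```python
from math import hypot


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples in planar coordinates.
    power:   distance exponent (2.0 is an assumed, commonly used default).
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, value in samples:
        distance = hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the query point coincides with a sampling site
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical sampling sites with WAWQI values (not study data).
sites = [(0.0, 0.0, 100.0), (2.0, 0.0, 300.0)]
midpoint_estimate = idw(1.0, 0.0, sites)
```

Because every estimate is a weighted average, interpolated values never fall outside the range of the sampled values, so isolated extreme samples appear as local peaks around their sampling sites on the map.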

**Figure 8.** Spatial distribution of weighted arithmetic water quality index (WAWQI) in the (**a**) dry and (**b**) wet seasons using inverse distance weighting (IDW) interpolation.

Overall, the WAWQI values in the wet season were more scattered among the sites than those in the dry season. For example, extremely high WAWQI values were found in the northwest and the southwest of An Giang, while good water quality was found in the southeast. Regions with high WAWQI were mainly found in the triple-rice system and the urban area inside the full-dike system and were linked with high concentrations of EC, NH4<sup>+</sup>, COD, NO2<sup>−</sup>, and PO4<sup>3−</sup>. In contrast, locations with low WAWQI mainly represent orchards located inside the full-dike system. Heavy rain in the wet season can dilute pollutant concentrations; therefore, water quality in this region is better in the wet season than in the dry season. A water quality "hotspot" in the southernmost part of An Giang Province was found in both the dry and wet seasons. This can be explained by the triple-rice cropping system inside the full-dike system at this location, which is linked with high concentrations of EC, COD, NO2<sup>−</sup>, and PO4<sup>3−</sup>.

#### **4. Discussion**

Water is a precious resource for various activities in An Giang. However, owing to the rapid growth of rice intensification, urbanization, and tourism, the water quality has decreased dramatically. This issue has been reported in various studies in the VMD in recent years [15,21]. Clarifying the seasonal change in water quality is important for evaluating the temporal variations of surface water pollution.

The results show that the concentrations of NH4<sup>+</sup>, COD, PO4<sup>3−</sup>, and K<sup>+</sup> were higher than the World Health Organization (WHO) and Vietnamese standards in both seasons. Figures 9 and 10 show the concentrations of COD and PO4<sup>3−</sup>, respectively, at the Tan Chau and Chau Doc stations, which are close to the Cambodian border. The COD concentrations showed an increasing trend from 1985 to 2011 at Tan Chau and until 2013 at Chau Doc station. Although the COD concentrations in Cambodia from 1996 to 2010 were higher than those in Vietnam, most of the COD values were below the permissible standard of Vietnam. From 2015 to 2017, COD exceeded the Vietnamese standard for domestic use. Linear regression analysis gives R<sup>2</sup> values of 0.47 and 0.39 for the Tan Chau and Chau Doc stations, respectively. The PO4<sup>3−</sup> concentrations from 1995 to 2005 (Figure 10) on the Cambodian side were below the Vietnamese standard, while those on the Vietnamese side fluctuated seasonally and were higher than the permissible standard of Vietnam for several years.
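The R<sup>2</sup> of a linear trend measures the share of the variance in concentration that is explained by a straight line fitted against time. A minimal Python sketch of an ordinary least-squares fit is given below. The function name and the two short series are hypothetical and are not the monitoring data from the stations.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = intercept + slope * x and its R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical series: a perfect trend and a noisy one (not station data).
years = [0.0, 1.0, 2.0, 3.0]
perfect = linear_fit_r2(years, [1.0, 3.0, 5.0, 7.0])
noisy = linear_fit_r2(years, [1.0, 2.0, 1.0, 2.0])
```

An R<sup>2</sup> between roughly 0.4 and 0.5 therefore means that a linear trend accounts for less than half of the variability in the series, with the remainder due to seasonal and other fluctuations.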

**Figure 9.** Temporal concentrations of chemical oxygen demand (COD) on the Vietnamese side (Tan Chau and Chau Doc stations) from 1985 to 2017 and on the Cambodian side (Phnom Penh Port and Kratie) from 1995 to 2010.

**Figure 10.** Temporal concentrations of PO4<sup>3−</sup> on the Vietnamese side (Tan Chau and Chau Doc stations) from 1985 to 2017 and on the Cambodian side (Phnom Penh Port and Kratie) from 1995 to 2005.

The results of this study show that pH, Cl<sup>−</sup>, and Ca<sup>2+</sup> were significant discriminant parameters between the two seasons. Cl<sup>−</sup> was chosen as an important indicator parameter since its values represent the degree of organic pollution, as mentioned above. The concentration of Cl<sup>−</sup> in the dry season was much higher than that in the wet season.

The classification of water quality in this study clearly shows that the water bodies in the study area are eutrophic and unsuitable for drinking. Most pollutant loads were relatively high in the dry season compared with the wet season, except for NH4<sup>+</sup> and COD. The anthropogenic pollutant load is relatively high, as indicated by the higher concentrations of PO4<sup>3−</sup>, NO2<sup>−</sup>, and NO3<sup>−</sup>. These results support the hypothesis that water quality is worse in the dry season than in the wet season.

Furthermore, high pH and high concentrations of NO3<sup>−</sup> and NO2<sup>−</sup> were detected in the water samples of Cluster 3, inside the full-dike system, in the dry season. Meanwhile, high Cl<sup>−</sup> and Mg<sup>2+</sup> were found in the water samples of Cluster 3 and Cluster 2, respectively. Minh et al. [21] also found high nitrite and nitrate inside the full-dike system in An Giang, where the triple-rice cropping system was dominant. The high mean Cl<sup>−</sup> concentration of 90 mg/L in the dry season inside the full-dike system was attributed to wastewater from the surrounding urban area and rice fields. Rivers typically have Cl<sup>−</sup> concentrations of less than 50 mg/L [64], and a high level of Cl<sup>−</sup> may have a negative impact on an ecosystem [64]. This may be an indicator of sewage pollution, for example from water softeners or sewage discharged from the city located inside the full-dike system. In summary, this also supports the hypothesis that water quality inside the full-dike system is worse than outside it.

The WAWQI for the 40 samples ranges from 34 to 1847 in the dry season and from 40 to 1584 in the wet season. Although the minimum WAWQI was lower in the dry season than in the wet season, the share of locations with good water quality was higher in the wet season (10%) than in the dry season (2.5%). The high WAWQI values at these stations were mainly due to the higher levels of EC, NH4<sup>+</sup>, and COD. Mapping the spatial distribution of water quality using WAWQI values helped to identify the factors and processes responsible for water quality evolution.

#### **5. Conclusions**

Overall, this study provides an approach for assessing surface water pollutant levels. Water quality in An Giang in the dry and wet seasons has deteriorated tremendously due to urban wastewater discharge and rice intensification in the past 30 years. During the flood season, water from the Upper Mekong River carries high concentrations of pollutants into An Giang. We found high NO3<sup>−</sup>, NO2<sup>−</sup> and Cl<sup>−</sup> concentrations inside the full-dike system, while high concentrations of COD and NH4<sup>+</sup> were found in the urban area and the main river (Bassac River). Most of the water samples in both dry and wet seasons were bad or unsuitable for drinking. Thus, the water in An Giang Province should be treated before being supplied for drinking or domestic use. Water quality observation stations along the border should be strengthened to provide a better understanding of the primary pollutant sources that have influenced the surface water quality during the flood season in An Giang as well as the entire VMD.

**Author Contributions:** Conceptualization—H.V.T.M., R.A., M.K. and T.V.T.; methodology—H.V.T.M., R.A., M.K., P.K., K.N.L. and T.V.T.; writing—original draft preparation, H.V.T.M., R.A., M.K., P.K., K.N.L. and T.V.T.; writing—review and editing, H.V.T.M., R.A., M.K., P.K., K.N.L. and T.V.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors thank the Vietnamese Ministry of Education and Training, Can Tho University, and Hokkaido University for supporting us to complete this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**




**Table A1.** *Cont.*

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Spatio-Temporal Analysis of Surface Water Quality in Mokopane Area, Limpopo, South Africa**

**Mmasabata Dolly Molekoa 1, Ram Avtar 1,2,\*, Pankaj Kumar 3, Huynh Vuong Thu Minh 4, Rajarshi Dasgupta 3, Brian Alan Johnson 3, Netrananda Sahu 5, Ram Lal Verma <sup>6</sup> and Ali P. Yunus 7,8**


**Abstract:** Considering the well-documented impacts of land-use change on water resources and the rapid land-use conversions occurring throughout Africa, in this study, we conducted a spatiotemporal analysis of surface water quality and its relation with the land use and land cover (LULC) pattern in Mokopane, Limpopo province of South Africa. Various physico-chemical parameters were analyzed for surface water samples collected from five sampling locations from 2016 to 2020. Time-series analysis of key surface water quality parameters was performed to identify the essential hydrological processes governing water quality. The analyzed water quality data were also used to calculate the heavy metal pollution index (HPI), heavy metal evaluation index (HEI) and weighted water quality index (WQI). In addition, the spatial trend of water quality was compared with LULC changes from 2015 to 2020. Results revealed that the concentration of most of the physico-chemical parameters in the water samples was beyond the World Health Organization (WHO) adopted permissible limit, except for a few parameters in some locations. Based on the calculated values of HPI and HEI, water quality samples were categorized as low to moderately polluted water bodies, whereas all water samples fell under the poor category (>100) and beyond based on the calculated WQI. In terms of the temporal trend, most of the sampling sites showed a deteriorating trend from 2016 to 2019. However, the year 2020 showed a slightly improving trend in water quality, which can be attributed to reduced human activities during the lockdown period imposed in response to COVID-19. Land use has a significant relationship with surface water quality, and it was evident that built-up land had a more significant negative impact on water quality than the other land use classes. Both natural processes (rock weathering) and anthropogenic activities (wastewater discharge, industrial activities etc.) were found to play a vital role in water quality evolution. This study suggests that continuous assessment and monitoring of the spatial and temporal variability of water quality in Limpopo is important for controlling pollution and ensuring health safety in the future.

**Keywords:** surface water quality; WQI; HPI; HEI

**Citation:** Molekoa, M.D.; Avtar, R.; Kumar, P.; Thu Minh, H.V.; Dasgupta, R.; Johnson, B.A.; Sahu, N.; Verma, R.L.; Yunus, A.P. Spatio-Temporal Analysis of Surface Water Quality in Mokopane Area, Limpopo, South Africa. *Water* **2021**, *13*, 220. https://doi.org/10.3390/w13020220

Received: 23 December 2020 Accepted: 14 January 2021 Published: 18 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Water is an essential resource to sustain life on the Earth. Different key drivers of global change *viz.* urbanization, population growth and extreme weather conditions induced by climate change are severely affecting this finite resource, both in terms of quantity and quality [1]. Indiscriminate exploitation of groundwater, resulting in the depletion of groundwater levels and consequently greater dependency on surface water resources, is occurring in many regions around the world. The diligent monitoring and analysis of surface water quality are essential for sustainable management and use of surface water resources [2,3]. It is also useful for assessing processes that govern hydro-geochemical evolution of water resources [4].

South Africa has a population of over 51 million people, of whom 60% live in urban environments [5]. Because of the uneven distribution of water resources, approximately 77% of South African people are dependent on surface water resources [5]. Approximately 40% of African people lack improved water supply and more than 60% have no access to improved sanitation facilities [6,7]. Surface water resources such as rivers, dams and streams are priceless and irreplaceable assets that provide important habitat and support recreation, economic growth and nature conservation [8]. Preserving and ensuring the sustainable use of surface water resources can contribute towards the implementation of Sustainable Development Goal 6 (SDG 6) [9]. Increasing population, economic growth, and changes in lifestyle increase the demand for fresh water, which amplifies the pressure on limited water resources [10]. Surface water resources are at risk of contamination because of rapid industrialization, urbanization, extensive agricultural activities, mining and population growth [5,11].

Among the different contaminants in water resources, heavy metal pollution is one of the most serious, and it poses threats to human life even at minor concentrations [12,13]. The major sources of heavy metal pollution in water are both natural (such as chemical weathering of minerals and soil leaching) and anthropogenic (such as industrial and domestic effluents, landfill leachate, urban stormwater runoff, mining activities, etc.). Several studies [14–16] have shown that heavy metal pollution of water can lead to various diseases such as tumors, head congestion, muscular edema etc. To evaluate the pollution load in water bodies, calculating the heavy metal pollution index is one of the most common approaches, as it can help to identify the source of heavy metals [17–19]. A study on the distribution of heavy metals [18] showed how human activities could affect aquatic ecosystems as a result of discharged wastes. According to [20,21], poorly planned industrialization and urbanization still exist in many developing countries and worsen environmental pollution. Untreated waste disposal from refineries and various industries worsens the water quality. Therefore, monitoring heavy metals in surface water is essential to ensure the safety of both animal and human health. Villanueva et al. [22] reported that increased effluent from industrial, urban and agricultural areas elevates heavy metal pollution in surface water bodies. Against this background, and in the absence of any significant work on surface water quality and the factors that determine it in the Mokopane area, Limpopo, South Africa, this study strives to quantify the spatio-temporal trend of different physico-chemical parameters and their relationship with the land use and land cover (LULC) pattern. 
In particular, the focus of this study is to quantify heavy metal pollution in the study area because of nearby mining activities as well as the absence of heavy metal pollution information in the Mokopane area.

#### **2. Study Area**

#### *2.1. Site Description*

The study area is located in Mokopane, Limpopo province of South Africa, approximately 250 km from Johannesburg city, at latitude 24.1944° S and longitude 29.0097° E (Figure 1). The total population of Mokopane was approximately 328,905 in 2016 [23]. It is one of the richest agricultural areas, producing wheat, cotton, maize, citrus fruits, etc. Mining industries have recently been introduced in the area. The mean annual maximum and minimum temperatures are 23.4 °C and 13 °C, respectively (Figure 2). The area has a steppe climate with a mean annual precipitation of 490 mm, most of which normally falls from December to April, with less rainfall during the winter season from June to September (Figure 2) [24]. The region is served mainly by four rivers, the Dithokeng, Mogalakwena Deep pool (Ngwaditse), Rooisloot and Dorps Rivers, which supply water for various domestic and irrigation purposes [25]. Sahu et al. [26] studied the impact of climatic variability on river streamflow; therefore, the study area's climatic data were analyzed to examine the rainfall and temperature patterns.

**Figure 1.** Study area map with sampling locations. The sampling sites are shown from upstream to downstream.

**Figure 2.** Monthly average rainfall and temperature pattern of the study area [23].

There are five sampling sites selected for this study—Mogalakwena Deep pool, Rooisloot downstream and upstream, Dithokeng dam and Dorpsrivier, as shown in Figure 1. Water samples for physico-chemical analysis were collected mid-stream directly into clean polyethylene bottles. The most socially and economically important site is the Mogalakwena Deep pool, because all of the other streams are flowing into it. Many local people rely on the Mogalakwena deep pool for their primary source of water, as well as for fishing. It was found that there is no water in Rooisloot upstream during the dry seasons, hence there is no water sampling done during that period.

#### *2.2. LULC Classification*

Land use/land cover (LULC) classification involves the extraction of thematic information about various landscape features from satellite data. Landsat-8 OLI data were acquired on 6 May 2019 from the USGS Earth Explorer [27] in order to produce a LULC map of the study area (Figure 3). LULC information is useful for the management and planning of land resources [28]. Various classification algorithms have been developed to classify satellite data. In this study, the widely used maximum likelihood classification (MLC) algorithm was applied using ENVI 5.2 software. We noticed some misclassification in the built-up area using the MLC algorithm; therefore, the built-up area was manually digitized to improve the accuracy of the LULC map. The study area was classified into five classes: agriculture, bare land, built-up, mountain/vegetation and water bodies (Figure 3). Results showed that most of the study area is covered by mountain/vegetation (48.7%), followed by agriculture (29.6%), built-up (19.8%), bare land (1.5%) and water bodies (0.36%). The study area is one of the richest agricultural areas, producing wheat, cotton, maize, citrus, etc., with water supplied from the surrounding river system.

**Figure 3.** Land use/land cover (LULC) map of the study area; the pie chart shows the percentage of each class.

#### **3. Methodology**

To gain insight into surface water quality, water samples were collected from five monitoring sites: Dithokeng River, Rooisloot upstream, Rooisloot downstream, Mogalakwena deep pool and Dorps River. Sampling locations were selected so that they represent a significant stretch of the rivers from upstream to downstream, as well as a range of distances from the Mogalakwena and Ivanplats platinum mines. To analyze spatio-temporal variation in river water quality, water samples were collected and analyzed four times a year (except for 2020) from March 2016 until November 2020 by the Environmental Department of Ivanplats mine [29] in South Africa. Twenty samples were collected from each monitoring point, except at the Dithokeng upstream and Rooisloot upstream sites because of non-accessibility and non-availability of water, respectively, during some sampling periods. Field measurements of pH, EC and temperature were made using an Orion Model Number, 01915. After in situ analysis, water samples were filtered through 0.20 μm Millipore filter paper and then collected in pre-rinsed uncontaminated polyethylene bottles. To prevent any fluctuation in the concentration of trace metals, the samples collected for major cation and trace metal analysis were acidified with 1% HNO3 to pH ~2. The concentration of HCO3<sup>−</sup> was analyzed by acid titration (using a Metrohm Multi-Dosimat), while the other anions Cl<sup>−</sup>, NO3<sup>−</sup>, SO4<sup>2−</sup>, and PO4<sup>3−</sup> were analyzed with a DIONEX ICS-90 ion chromatograph. Inductively coupled plasma-mass spectrometry (ICP-MS) was used to evaluate major cations and trace metals. A summary of the techniques used for water quality parameter analysis is shown in Table 1. After obtaining all the analyzed water quality data for the aforementioned period from the Environmental Department of Ivanplats mine, different techniques and software were used to deduce the factors responsible for spatio-temporal variation in the water quality. 
The heavy metal pollution index (HPI) and heavy metal evaluation index (HEI) were calculated to characterize the overall quality of the water with regard to heavy metals. In this study, the permissible limits are taken from WHO, 2009 [29].



#### *3.1. Heavy Metal Pollution Index (HPI) Calculation*

Metal pollution is one of the most significant problems in water bodies, causing serious health hazards to human beings. The HPI, based on the weighted arithmetic mean of water quality parameters, is a powerful technique for the assessment of water quality based on the heavy metal concentration and the effect of individual trace metals on human health [30,31]. The HPI model proposed by Mohan et al. [30] is given in Equation (1). Heavy metal concentrations were compared with the drinking water standards set by the WHO.

$$\text{HPI} = \frac{\sum_{i=1}^{n} Q_i \times W_i}{\sum_{i=1}^{n} W_i} \tag{1}$$

where *n* is the number of parameters considered and *i* denotes the *i*th parameter.
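As an illustration of Equation (1), the following minimal Python sketch computes the HPI as a weighted mean of sub-indices. It assumes the commonly used forms of the sub-index, *Qi* = 100 × |*Mi* − *Ii*|/(*Si* − *Ii*), and of the unit weight, *Wi* = 1/*Si*, where *Mi*, *Si* and *Ii* are the monitored, standard and ideal values of the *i*th metal. These forms, the function name `hpi` and the sample concentrations are assumptions made for illustration and are not taken from the tables of this study.

```python
def hpi(monitored, standard, ideal=None):
    """Heavy metal pollution index as a weighted mean of sub-indices (Equation (1)).

    monitored, standard, ideal: dicts mapping metal name -> concentration (same units).
    The sub-index and unit-weight formulas below are assumed forms, not confirmed
    by the source text.
    """
    if ideal is None:
        ideal = {metal: 0.0 for metal in monitored}  # assume an ideal value of zero
    weighted_sum = 0.0
    weight_total = 0.0
    for metal, m_i in monitored.items():
        s_i = standard[metal]
        i_i = ideal[metal]
        w_i = 1.0 / s_i                             # assumed unit weight W_i
        q_i = 100.0 * abs(m_i - i_i) / (s_i - i_i)  # assumed sub-index Q_i
        weighted_sum += q_i * w_i
        weight_total += w_i
    return weighted_sum / weight_total


# Hypothetical concentrations and standards (mg/L), for illustration only.
hpi_monitored = {"Fe": 0.15, "Mn": 0.2}
hpi_standard = {"Fe": 0.3, "Mn": 0.4}
hpi_value = hpi(hpi_monitored, hpi_standard)
```

Because both hypothetical metals sit at half of their standard, each sub-index is 50 and the weighted mean is also 50, regardless of the weights.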


#### *3.2. Heavy Metal Evaluation Index (HEI) Calculation*


We also calculated the HEI to interpret the water quality in response to the heavy metals and trace elements present in the water, as given in Equation (2).

$$\text{HEI} = \sum_{i=1}^{n} \frac{M_i}{S_i} \tag{2}$$

where *Mi* is the monitored value and *Si* is the standard value of the *i*th parameter.

The HEI classification is as follows: low heavy metal (less than 10), moderate heavy metal (between 10 and 20), and high heavy metal (more than 20).
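Equation (2) and the classification above can be sketched in a few lines of Python. The function names and the sample concentrations are hypothetical, and treating the boundaries 10 and 20 as part of the moderate class is an assumption, since the text does not say which class the boundary values belong to.

```python
def hei(monitored, standard):
    """Heavy metal evaluation index: sum of monitored/standard ratios (Equation (2))."""
    return sum(monitored[metal] / standard[metal] for metal in monitored)


def classify_hei(value):
    """Map an HEI value to the three classes described in the text.

    Assumption: the boundary values 10 and 20 are assigned to the moderate class.
    """
    if value < 10:
        return "low"
    if value <= 20:
        return "moderate"
    return "high"


# Hypothetical concentrations and standards (mg/L), for illustration only.
hei_monitored = {"Fe": 1.5, "Mn": 2.0, "Cr": 0.1}
hei_standard = {"Fe": 0.3, "Mn": 0.4, "Cr": 0.05}
hei_value = hei(hei_monitored, hei_standard)
hei_class = classify_hei(hei_value)
```

With these hypothetical inputs the three ratios are 5, 5 and 2, so the HEI is 12 and the sample falls in the moderate class.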

#### *3.3. Water Quality Index*

The water quality index (WQI) is one of the most effective and reliable tools for classifying water pollution levels and has been applied in various studies of both surface water and groundwater [18,24,25,31–35]. The following steps were taken in order to calculate the WQI:

1. Calculating relative weight: It was calculated using Equation (3).

$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \tag{3}$$

where *Wi* represents the relative weight of each parameter sampled, *wi* represents the weight of each parameter, and *n* represents the total number of parameters.

2. Calculating *Q* value: It was calculated using Equation (4).

$$Q_i = \frac{C_i \times 100}{S_i} \tag{4}$$

where *Qi* = quality rating, *Ci* = Concentration of each parameter (mg/L), and *Si* is derived from the WHO water quality standard.

3. Finally, the Water quality Index (WQI) was calculated using Equation (5).

$$\text{WQI} = \sum_{i=1}^{n} W_i \times Q_i \tag{5}$$

Water Quality assessment in terms of the WQI is shown in Table 2.
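The three steps in Equations (3)–(5) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `wqi`, the parameter weights, the standards and the concentrations are all hypothetical values chosen to make the arithmetic easy to follow.

```python
def wqi(concentrations, standards, weights):
    """Weighted water quality index following Equations (3)-(5).

    concentrations, standards: dicts mapping parameter -> value in mg/L.
    weights: dict mapping parameter -> assigned weight w_i.
    """
    total_weight = sum(weights[p] for p in concentrations)
    score = 0.0
    for p, c_i in concentrations.items():
        relative_weight = weights[p] / total_weight  # Equation (3)
        quality_rating = c_i * 100.0 / standards[p]  # Equation (4)
        score += relative_weight * quality_rating    # Equation (5)
    return score


# Hypothetical inputs, for illustration only.
wqi_concentrations = {"Cl": 500.0, "SO4": 125.0}
wqi_standards = {"Cl": 250.0, "SO4": 250.0}
wqi_weights = {"Cl": 3.0, "SO4": 1.0}
wqi_value = wqi(wqi_concentrations, wqi_standards, wqi_weights)
```

Here the relative weights are 0.75 and 0.25 and the quality ratings are 200 and 50, giving a WQI of 162.5. A parameter at exactly its standard contributes a rating of 100, which is why index values above 100 indicate exceedance on a weighted basis.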


**Table 2.** Water quality classification based on Water quality Index (WQI) values [24].

#### **4. Results and Discussion**

#### *4.1. General Water Chemistry*

A statistical summary of the analyzed river water quality is shown in Table 3. The pH values of the water samples varied from 6.63 to 9.43, with an average value of 8.12, indicating the alkaline nature of the water due to strong soil–water interaction along the flow course of the drainage system [35]. The electrical conductivity (EC) values varied from 91.19 to 2686.6 μS/cm, with an average value of 1022.17 μS/cm, indicating high ionic activity in the area. Furthermore, the arid/semiarid climate, with relatively low rainfall and high evaporation, supports high mineral concentration in the water bodies. In terms of ionic abundance, the order among cations was Na<sup>+</sup> > Mg<sup>2+</sup> > Ca<sup>2+</sup> > K<sup>+</sup>, whereas the order among anions was HCO3<sup>−</sup> > Cl<sup>−</sup> > SO4<sup>2−</sup> > PO4<sup>3−</sup> > NO3<sup>−</sup> > F<sup>−</sup>. The average milli-equivalent ratio of (Mg<sup>2+</sup> + Ca<sup>2+</sup>)/(Na<sup>+</sup> + K<sup>+</sup>) was found to be 1.23, indicating the dominance of carbonaceous weathering in the study area. The dominance of Na<sup>+</sup> in the water samples might be because of its conservative nature. The excess of both Mg<sup>2+</sup> and Ca<sup>2+</sup> can be explained by the presence of a common mineral source such as dolomite. The highest average concentration among the anions, that of HCO3<sup>−</sup>, is due to the weathering of the carbonaceous sandstones in the watershed and the weathering of carbonaceous minerals through runoff. The higher Cl<sup>−</sup> and SO4<sup>2−</sup> concentrations in the river water indicate anthropogenic inputs carried by surface runoff in the watershed area. In particular, higher concentrations of SO4<sup>2−</sup> can be due to leaching of organic matter and agricultural runoff carrying unused SO4<sup>2−</sup>. Sources of this organic matter range from landfill areas with piles of organic waste to organic matter-rich sediments present in the study area, such as peat or clay. 
The concentrations of PO4<sup>3−</sup>, NO3<sup>−</sup> and F<sup>−</sup> are not a concern, as they are well below the WHO permissible limits for all surface water samples. The time series of EC and Ti are shown in Figure 4a,b, respectively. The EC value tends to increase towards downstream. This is supported by higher values of major cations and anions, a strong indicator of inputs from both anthropogenic (runoff carrying pollutants) and natural sources (mineral weathering). Among the different trace metals, the concentration of Ti is of major concern for this study area, especially in Rooisloot Upstream as compared to Downstream from 2016–2017. Many animals such as cows and pigs were observed during field surveying, and grazing-led sedimentation can exacerbate the water quality deterioration. Ti is among the most abundant chemical elements in the earth's crust, ranking ninth of all the elements and second among the transition metals after iron [36–38]. Human activities are among the factors that cause Ti to enter water, especially in its nanoparticle form, and this affects aquatic life. The migration mobility of Ti is generally low.

To analyze the spatio-temporal variation of water quality, the time series of key water quality parameters are plotted in Figure 4. Spatially, EC increases from the upstream region towards the downstream region. This can be explained by the transportation and continuous accumulation of contaminants from different point and non-point sources along the stretch of the river. The spatial trend of Ti shows a different pattern: the concentration is higher in the upstream region and decreases towards the downstream region. Hence, after its release into the river body through surface runoff or leachate, the concentration gradually decreases because of the dilution effect. Temporally, the water quality parameters generally show higher concentrations during dry periods than during wet periods. The likely reason is that reduced river discharge in dry periods provides less dilution, so concentrations increase. In 2018, Mokopane received increased rainfall (Figure 5), which might have caused a sudden decrease in the Ti found in water. The high EC in Rooisloot Downstream could result from domestic effluents and from the influence of Rooisloot Upstream, which feeds this stream.

To further investigate the causes of water quality deterioration, land use land cover maps were prepared for the years 2015 and 2020, as shown in Figure 6. Built-up areas increased significantly, especially in the upstream region, at the expense of bare land and water bodies. This increase in built-up areas represents a source of both point and non-point water pollution. Based on the above findings, a conceptual diagram depicting the processes governing water quality evolution in the study area was developed, as shown in Figure 7.
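The milli-equivalent ratio of alkaline earth to alkali cations used in the discussion of ionic abundance can be computed from concentrations in mg/L by dividing each by its equivalent weight (atomic weight divided by ionic charge). The sketch below is a minimal illustration; the function names and the sample concentrations are hypothetical and do not come from Table 3.

```python
# Equivalent weight in mg per meq = atomic weight / ionic charge.
EQUIVALENT_WEIGHT = {
    "Na": 22.99 / 1,
    "K": 39.10 / 1,
    "Ca": 40.08 / 2,
    "Mg": 24.305 / 2,
}


def to_meq_per_l(ion, mg_per_l):
    """Convert a cation concentration from mg/L to meq/L."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def alkaline_earth_to_alkali_ratio(sample):
    """Return (Ca + Mg)/(Na + K) with all concentrations expressed in meq/L."""
    meq = {ion: to_meq_per_l(ion, c) for ion, c in sample.items()}
    return (meq["Ca"] + meq["Mg"]) / (meq["Na"] + meq["K"])


# Hypothetical sample (mg/L), chosen so that each ion is a whole number of meq/L.
cation_sample = {"Na": 45.98, "K": 39.10, "Ca": 40.08, "Mg": 24.305}
cation_ratio = alkaline_earth_to_alkali_ratio(cation_sample)
```

In this hypothetical sample Na, K, Ca and Mg correspond to 2, 1, 2 and 2 meq/L, so the ratio is 4/3. A ratio above 1, as with the reported average of 1.23, means Ca<sup>2+</sup> and Mg<sup>2+</sup> together outweigh Na<sup>+</sup> and K<sup>+</sup> on a charge basis.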


**Table 3.** Statistical summary for observed water quality parameters.

**Figure 4.** Time series concentration values for (**a**) EC and (**b**) Ti for water samples at five sampling locations.

**Figure 5.** Comparison of rainfall accumulation between 2017 and 2018.

**Figure 6.** Land-use map of the study area for year 2015 and 2020.

**Figure 7.** Conceptual diagram showing processes involved in water quality evolution.

#### *4.2. Heavy Metal Evaluation Index (HEI)*

To calculate the heavy metal indices, the unit weight of each individual metal was calculated first and then used as an input to calculate the heavy metal pollution index and the heavy metal evaluation index for the water samples in each sampling period. Results for heavy metal unit weight are shown in Table 4.

The results for HPI and HEI are shown in Tables 5 and 6, respectively. These values represent the cumulative value of the different heavy metals. Sampling locations in the upstream region, namely the Dithokeng upstream and Rooisloot upstream locations, had low concentrations of heavy metals, whereas both the Rooisloot downstream and Mogalakwena sites had moderate heavy metal content. Dorpsriver had low-to-moderate heavy metal content, which can be explained by the dilution of heavy metal concentrations by river discharge. The heavy metal contamination in the water samples can be related to the land use pattern shown in Figure 3. In this area, the spatial distribution of built-up areas, which are dominant on the southern side of the study area, is significantly correlated with the heavy metal composition of the water samples. Built-up areas may act as a non-point source of heavy metals through activities such as small-scale industries (leather, textile, etc.) and human settlements, where wastewater and effluent discharge bring heavy metals like Fe, Zn, Mn, etc. into the river water bodies. In addition, mining sand, natural factors such as rock weathering and other domestic effluents near the Rooisloot upstream region also increase the concentration of heavy metals like Ti, Cu, Cr, Ni, etc. [25]. An uncontrolled flow of sewage into the Dorps River (the downstream sampling location) was also observed during the field survey. Both HPI and HEI show lower values, especially for the year 2020. This can be attributed to reduced anthropogenic activities such as mining and industrial activities during the COVID-19-induced lockdown.


**Table 4.** Unit weight calculation of the heavy metal evaluation index (HEI).

**Table 5.** Heavy metal pollution index (HPI) calculation.



**Table 6.** Heavy metal evaluation index (HEI) calculation.

#### *4.3. Water Quality Index (WQI)*

The water quality index calculated for the four-year time period is shown in Table 7. Calculated WQI values ranged from 120.71 to 4643.71, which indicates that the water at all of these locations falls under the "very poor water" and "likely not suitable for drinking purposes" categories. The highest values were mainly found near the downstream end, i.e., Dorpsriver, which shows the cumulative effects of different contaminants along the river flow course. One of the major concerns regarding poor WQI is heavy metal contamination. Values for the year 2018 were relatively high because of high rainfall, which results in high sedimentation and ionic activity. The lower values for 2019 are only a result of incomplete input datasets for that year. The temporal variations of WQI showed that surface water quality at the five sampling sites did not improve much over the 2016–2019 period. All sampling sites were classified as "poor water quality" to "likely not suitable for drinking". The reason behind this was inefficient water resource management practices during that time. However, for the year 2020, water quality improved somewhat because of lower environmental disturbance during the COVID-19-induced lockdown period, as discussed earlier.

**Table 7.** Water quality index (WQI) results for the period 2016–2019.




#### **5. Conclusions and Recommendation**

This study strived to quantify spatio-temporal water quality in the Mokopane area of South Africa and to identify the processes governing water quality changes. The results indicated that the concentration of most physico-chemical species in the water samples exceeded permissible limits, except for a few parameters at a few locations. There was a trend of water quality deterioration towards the downstream, as contaminants accumulated with the river flow. The water quality of the streams worsened or remained unchanged over the four-year period. For example, the Dorps River and Dithokeng dam showed no significant change over the 2016–2019 period, as the water quality remained "likely not suitable for drinking". However, for the year 2020, water quality showed an improvement in terms of WQI, HPI and HEI owing to the suspension of human activities such as mining, industry and agriculture during the lockdown imposed in response to COVID-19. This means that, without proper management that ensures good water quality in these areas, the water will continue to be unfit for humans, animals and plants. In terms of spatio-temporal variation, pollutant concentrations showed an increasing trend from upstream to downstream as pollutants accumulated. Temporally, rainfall has a significant impact on water quality parameters through dilution and attenuation during wet and dry seasons, respectively. Land use has a significant relation with water quality, and we found that built-up areas had a negative impact on water quality in the study site. Both natural processes (rock weathering) and anthropogenic activities (household wastewater discharge and industrial, especially mining, activities etc.) played a major role in governing water quality. In the absence of any previous credible scientific study or reports, this study sheds light on issues regarding water resource management. 
The number of sampling locations and water samples in this study was limited by a lack of financial support. More detailed coverage of the river stretch, with more sampling locations for the time-series analysis of water quality data and analysis of both point and non-point sources of pollutants, is recommended for future study. A participatory approach to watershed management and to making land use climate-resilient might be investigated in the future to plan the most suitable adaptation and mitigation measures for water resource management. A comparative study of surface water quality in the study site and the nearby Doorndraai dam is necessary considering the impacts of LULC change.

**Author Contributions:** Conceptualization, M.D.M., R.A., P.K., H.V.T.M.; methodology, M.D.M., R.A., P.K., H.V.T.M.; investigation, M.D.M., R.A.; resources, M.D.M., R.A., R.L.V.; data curation, M.D.M., R.A., R.L.V., N.S.; writing, M.D.M., R.A., P.K., R.D., B.A.J., N.S., R.L.V., A.P.Y.; writing review and editing, M.D.M., R.A., P.K., R.D., B.A.J., N.S., R.L.V., A.P.Y.; supervision, R.A., P.K.; funding acquisition, M.D.M., R.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** The publication fee was supported by JICA-ABE research grant.

**Acknowledgments:** Mmasabata Dolly Molekoa extends gratitude to the JICA-ABE Initiative for providing a scholarship to study a master's course at Hokkaido University, Japan. The authors are thankful to the Environmental Department of Ivanplats mine for providing data for this research, to the Graduate School of Environmental Science (Hokkaido University) for facilities, and to the anonymous reviewers for their comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment**

**Purushottam Agrawal 1, Alok Sinha 1, Satish Kumar 2, Ankit Agarwal 3,4, Ashes Banerjee 5, Vasanta Govind Kumar Villuri 2, Chandra Sekhara Rao Annavarapu 6, Rajesh Dwivedi 7, Vijaya Vardhan Reddy Dera 8, Jitendra Sinha <sup>9</sup> and Srinivas Pasupuleti 10,\***


**Abstract:** Freshwater quality and quantity are some of the fundamental requirements for sustaining human life and civilization. The Water Quality Index (WQI) is the most extensively used parameter for determining water quality worldwide. However, the traditional approach for the calculation of the WQI is often complex and time consuming since it requires handling large data sets and involves the calculation of several subindices. We investigated the performance of artificial intelligence techniques, including particle swarm optimization (PSO), a naive Bayes classifier (NBC), and a support vector machine (SVM), for predicting the water quality index. We used an SVM and NBC for prediction, in conjunction with PSO for optimization. To validate the obtained results, groundwater quality parameters and their corresponding water quality indices were determined for water collected from the Pindrawan tank area in Chhattisgarh, India. Our results show that PSO–NBC provided a prediction accuracy of 92.8% for the WQI, whereas the PSO–SVM accuracy was 77.60%. The study's outcomes further suggest that ensemble machine learning (ML) algorithms can be used to estimate and predict the Water Quality Index with significant accuracy. Thus, the proposed framework can be directly used for the prediction of the WQI using the measured field parameters while saving significant time and effort.

**Keywords:** WQI; Pindrawan tank area; drinking water quality; artificial intelligence; particle swarm optimization; support vector machine; naive Bayes classifier

#### **1. Introduction**

A sufficient quantity and appropriate quality of freshwater are some of the fundamental requirements for sustaining human life and civilization. Indeed, the tremendous population growth and remarkable achievements in science and technology have increased groundwater utilization for domestic, industrial, and irrigation purposes manyfold throughout the world over the last few decades. Rapid urbanization, overexploitation, and unscientific waste disposal have also influenced the accessibility and quality of groundwater. Excessive population growth and rapid urbanization have driven the use of chemicals and pesticides for agricultural purposes, which often leach into and mix with the groundwater. As indicated by the World Health Organization (WHO), inappropriate or polluted water causes around 80% of all diseases in human beings. Furthermore, the quality of contaminated groundwater cannot be improved or re-established merely by preventing contamination at the source. Therefore, understanding and determining water quality is imperative in the study of water resources and environmental engineering.

**Citation:** Agrawal, P.; Sinha, A.; Kumar, S.; Agarwal, A.; Banerjee, A.; Villuri, V.G.K.; Annavarapu, C.S.R.; Dwivedi, R.; Dera, V.V.R.; Sinha, J.; et al. Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment. *Water* **2021**, *13*, 1172. https://doi.org/10.3390/w13091172

Academic Editors: Kwok-wing Chau and Pankaj Kumar

Received: 8 January 2021; Accepted: 20 April 2021; Published: 23 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Water quality essentially determines the usability of water from a source in terms of the nature and concentration of the impurities present in the sample [1]. As a combined effect of the continuous deterioration in water quality and quantity, approximately one billion people worldwide face a shortage of adequate and safe water supply. Because this number continues to grow, it is essential to monitor water quality for its efficient management and supply [2,3].

The most efficient method for classifying water quality is using the Water Quality Index (WQI). Water quality is often estimated based on water quality indices [4,5]. It is a tool that has been extensively utilized to assess the performance of water quality management approaches [6]. The approach and methodology used for calculating and interpreting water quality indices have evolved over the years [7–11]. The estimated values of water quality indices have been used to indicate water samples' suitability for day-to-day use. They can be utilized effectively in the execution of water quality overhauling programs.

The WQI's variables comprise biological oxygen demand (BOD), temperature, dissolved oxygen (DO), total suspended solids (TSSs), ammoniacal nitrogen (AN), chemical oxygen demand (COD), and pH [12]. Groundwater quality indices (GQIs) are usually forecasted by measuring the standard variables, such as magnesium (Mg<sup>2+</sup>), calcium (Ca<sup>2+</sup>), and nitrate (NO<sub>3</sub><sup>−</sup>) [13–15]. The value provided by the WQI is significant enough to help decision makers. However, estimating the WQI is not that simple because subindex calculations are done in the WQI equations themselves. Several methods are available in the literature for the computation of the WQI worldwide, e.g., United States National Sanitation Foundation Water Quality Index (NSFWQI), the British Columbia Water Quality Index (BCWQI), and the Canadian Water Quality Index (CWQI).

The WQI aims to convert the complicated water quality information into straightforward data that is readily useable by researchers and conveyable to people in general. The calculation process in the case of some approaches applied in several countries, including India [6,16], can be exceptionally intricate and time consuming. As a result, the process always contains the risk of attracting unintended miscalculations [17]. Thus, the limitations for the calculations of WQI are the following: (a) time consuming, (b) lengthy process, (c) complicated process, and (d) different equations are used for WQI calculations, hence there are inconsistencies. It may be obvious from the above discussion that no standard method is available for the WQI.

To overcome the above problems, a few scientists have proposed a nonphysical approach that can successfully predict WQI using machine learning (ML) and artificial intelligence (AI) [18–20]. After satisfactory training, an AI-based model can promptly produce a WQI value by eliminating the sub-index calculations. Interest in AI algorithms is increasing due to benefits that include nonlinear structures, the capability to capture complicated trends, the capability to manage huge datasets consisting of different data scales, and insensitivity to missing data. The forecasting capability of ML–AI algorithms greatly relies on the procedures and exactness of the data collection and analysis. The continuous evolution of computational ability has allowed researchers to use diverse arrangements of ML–AI models. Approaches such as artificial neural networks (ANNs) [17,21–26], adaptive neuro-fuzzy inference systems [27–31], and support vector machines (SVMs) [32] have been effectively applied to predict the quality of water worldwide. Abba et al. (2020) [33] describe in detail the ML–AI techniques that are used for WQI measurement. Most of these ML–AI algorithms can perform with a certain degree of accuracy and it is challenging to compare them based on their performance [25,34].

The AI techniques used in the present study sometimes require complex manual implementation, which reduces their practical effectiveness for water quality management personnel. Practitioners have a great interest in learning the codes so that they can be used for solving complex models like the one discussed above. A comprehensive comparison of such models' applications with the required software packages must be carried out to improve the accuracy of predictions and the suitability of the AI-based models. However, most data mining programs do not support extensive manipulation of several AI models; instead, the majority of them just support fundamental methods without optimization.

Our study also aimed to develop a user-friendly interface in MATLAB for practitioners that do not have a programming background. The recommended interface is based on a nature-inspired metaheuristic classification system that integrates particle swarm optimization (PSO), along with an SVM and NBC. The water quality was forecasted using fundamental AI techniques, which involved a particle swarm optimization (PSO) algorithm combined with support vector machines (SVMs) for prediction. The classification and predictive AI system investigated in the study was developed using four AI models (single), hybrid metaheuristic regression, and four ensembles (i.e., stacking, voting, bagging, and tiering). The baseline models encompassed single models by using two AI techniques: SVM and NBC, respectively. Subsequently, the ensemble models integrated the registered single models and utilized voting, bagging, tiering, and stacking methods. The goal of the present work was to propose a framework for flexible water quality modeling. The analytical technique had similar goals: the models' predictive accuracy and applicability. The framework will empower administrators and hydrologists to choose the best analytical tools for water management using AI techniques.

These models should be selected based on specific requirements. However, sometimes applying an ensemble model can significantly enhance the model accuracy and reduce the computational cost. In the present study, the combination of the PSO algorithm's applicability with an SVM and NBC was exploited. A framework was proposed for predicting the WQI in the Pindrawan tank area, Raipur region, Chhattisgarh, India.

#### **2. Study Area**

The Pindrawan tank command area was the area under study (Figure 1); it is situated within 81°45′–81°50′ E and 21°20′–21°25′ N in the upper Mahanadi River valley (southeastern part) and Raipur district of Chhattisgarh, India. A total of nine villages, namely, Pauni, Amlitalab, Khauna, Deogaon, Bangoli, Dhansuli, Kurra, Baraonda, and Nilja, come under the study area, which has a tropical wet and dry climate. The temperature in this part of India remains moderate throughout the year. The highest temperatures in the year are observed from March to June.

#### **3. Methodology**

#### *3.1. Data Collection and Water Quality Estimation*

The groundwater samples were collected in 2018 during the pre-monsoon period from hand pumps and bore wells (37 sites), which are extensively utilized for drinking in the Pindrawan tank area. The identification of the sampling points was performed using topographic sheets and GPS, and the maps were prepared using ArcGIS 10.1 (ESRI, California, USA). Topographic sheets were utilized to prepare the base map and recognize the general features of the area. GPS techniques were used to identify the geographic position of each sampling point. The collected groundwater samples were investigated for the concentration of different parameters, namely, electrical conductivity (EC), pH, total dissolved solids (TDSs), total hardness (TH), alkalinity, bicarbonate (HCO<sub>3</sub><sup>−</sup>), chloride (Cl<sup>−</sup>), sulfate (SO<sub>4</sub><sup>2−</sup>), nitrate (NO<sub>3</sub><sup>−</sup>), fluoride (F<sup>−</sup>), calcium (Ca<sup>2+</sup>), magnesium (Mg<sup>2+</sup>), sodium (Na<sup>+</sup>), potassium (K<sup>+</sup>), iron (Fe), and chromium (Cr<sup>2+</sup>), per the specification of the Federation and American Public Health Association (2005). The EC and pH of the collected samples were measured in the field using an EC and pH meter. Fluoride concentrations were analyzed based on the selective electrode method. TH, chloride, and alkalinity were measured using titrimetric methods. Heavy metals were measured using atomic absorption spectrometry, and prescribed safety measures were followed to avoid contamination.

The locations of the sampling stations are presented in Figure 1. The concentrations of the parameters were compared with the acceptable limits prescribed by BIS (2012) [35]. The permissible limits of potassium, bicarbonate, and sodium are reported in [36,37].

The WQI of the collected samples was calculated using the weighted arithmetic Water Quality Index (WQI) method [38–40]. The weights (*Wi*) that were assigned to each parameter according to their impact on the water quality are shown in Table 1.

Based on the corresponding WQI values, the quality of the groundwater for drinking purposes can be classified into five categories, as presented in Table 2.
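The weighted arithmetic calculation can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the method (relative weights obtained by dividing each weight by the sum of all weights, and a quality rating of 100 × concentration / standard) and class boundaries at 50, 100, 200, and 300; these are assumptions, because Tables 1 and 2 are not reproduced in this text. The parameter names, weights, and limits in the example are hypothetical and are not the study's values.

```python
def weighted_arithmetic_wqi(concentrations, weights, standards):
    """Return the WQI as the sum of relative weight times quality rating."""
    total_weight = sum(weights[name] for name in concentrations)
    wqi = 0.0
    for name, conc in concentrations.items():
        relative_weight = weights[name] / total_weight
        quality_rating = 100.0 * conc / standards[name]  # assumed rating form
        wqi += relative_weight * quality_rating
    return wqi


def classify_wqi(wqi):
    """Map a WQI value to one of five classes (assumed boundaries)."""
    if wqi < 50:
        return "excellent"
    if wqi < 100:
        return "good"
    if wqi < 200:
        return "poor"
    if wqi < 300:
        return "very poor"
    return "unfit for drinking"


# Hypothetical sample: concentrations, weights, and limits are illustrative only.
sample = {"TDS": 400.0, "chloride": 125.0, "nitrate": 45.0}
weights = {"TDS": 4, "chloride": 3, "nitrate": 5}
standards = {"TDS": 500.0, "chloride": 250.0, "nitrate": 45.0}

wqi = weighted_arithmetic_wqi(sample, weights, standards)
wqi_class = classify_wqi(wqi)
```

Keeping the index and the classification in separate functions allows the class boundaries to be changed without touching the index calculation.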


**Table 1.** Water quality parameters used when calculating the WQI.

**Table 2.** WQI classification based on the same WQI used by Ramakrishnaiah et al., 2009 [41].


#### *3.2. Utilization of AI for the Prediction of the WQI*

The present study utilized two powerful machine learning approaches for the estimation of the WQI classes by considering the parameter (variable) values as inputs. The 16 variables together formed a variable vector. The analysis was carried out using 1250 variable vectors (250 for each class), which were generated using PSO to cover the whole range of every class. Calibration was conducted using 1250 variable vectors (250 from each class) by applying tenfold cross-validation, and the assessment was done using 250 variable vectors (50 from every class).

#### 3.2.1. Classification and Prediction Using a PSO–SVM Approach Based on the Water Quality Index

The PSO approach is a powerful algorithm that can optimize different model parameters by imitating the behavior of a population. The approach was proposed by Eberhart and Kennedy in 1995 [42]. It has been efficiently used to solve a multitude of nonlinear problems in diverse fields, such as geology [43,44], landslide analysis [45,46], forest fire mapping [47], and flood modeling [48,49]. The algorithm is initialized with a population of arbitrarily selected solutions between the maximum and minimum range of the parameters. Several advantages of the PSO approach, including ease of implementation and convergence, fewer parameters, and the use of parallel computing, make it a more convenient choice than other available optimization techniques. The algorithm was developed based on the behavior of a school of fish or a flock of birds selecting the shortest path to a food source [50]. The algorithm improves the exchange of information between members of a population through an interactive learning process that helps the population arrive at a consistent solution. Each solution is considered a "bird", also known as a "particle", in the solution space. Such interactions between members of the population allow the algorithm to demonstrate robust search proficiency and adaptability to various problems. In PSO, particles (solutions) are initialized randomly, and the best particles are then found by updating the population over successive generations. In each generation, each particle is updated using two "best" values. The first is the best fitness value that the particle itself has obtained so far (the fitness parameters are also stored); this value is called the individual best (pbest). The other "best" value, which is tracked by the particle swarm optimizer, is the best value that has been obtained by any particle in the current population; this value is called the global best (gbest). The movement of the particles is controlled by pbest and gbest: whenever an improved position is found, the stored values are updated and continue to guide the movement of the flock. In the solution space [51], a particle is described primarily by two vectors, namely, velocity (Vi) and position (Xi) [52], which are updated using Equations (1) and (2), respectively. Figure 2 gives the PSO algorithm that is used for the particle optimization. The optimization of these two vectors in the dth dimension is performed through the following equations:

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_{1d} \left( pbest_{id}^{t} - x_{id}^{t} \right) + c_2 r_{2d} \left( gbest_{id}^{t} - x_{id}^{t} \right) \tag{1}$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{2}$$

where *w* is known as the inertia weight; its value determines how much of a particle's current velocity is carried over to the next step. The parameters *c*<sub>1</sub> (cognitive coefficient) and *c*<sub>2</sub> (social coefficient) are known as the acceleration factors; they represent a particle's self-reasoning capability and its ability to acquire information from the swarm's current global optimal solution, respectively. *r*<sub>1</sub> and *r*<sub>2</sub> are two independent random numbers in the range [0, 1] [53]. *pbest*<sub>*id*</sub><sup>*t*</sup> and *gbest*<sub>*id*</sub><sup>*t*</sup> are known as the local optimum (best-known position of particle *i*) and the global optimum (best position obtained by the swarm of all particles), respectively.

The coordinates attained by every individual particle in the solution space are recorded by the algorithm. The coordinates of the best solution (fitness value) that a particle has attained are called its local optimum (pbest), whereas the best solution attained by any particle in the vicinity of a specific particle is known as the global optimum (gbest). Although the particles in the PSO approach tend to move arbitrarily, the best achieved position of each particle (pbest) and the group's best position (gbest) have a significant influence over their movement.
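The velocity and position updates of Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' MATLAB implementation: the function name `pso_minimize`, the coefficient values (*w* = 0.7, *c*<sub>1</sub> = *c*<sub>2</sub> = 1.5), the clipping of positions to the variable bounds, and the use of a single swarm-wide gbest are all choices made for the example. The test function is a simple sum of squares that stands in for the WQI fitness function.

```python
import random


def pso_minimize(fitness, bounds, n_particles=50, n_generations=500,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds; velocities start at zero.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (1): inertia + cognitive pull + social pull.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Equation (2), followed by clipping to the allowed range.
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


# Stand-in fitness function (sum of squares), not the WQI function.
best, best_val = pso_minimize(lambda p: sum(c * c for c in p),
                              [(-5.0, 5.0)] * 2,
                              n_particles=20, n_generations=100)
```

The default `n_particles=50` and `n_generations=500` mirror the population size and maximum generation reported in the text; the example call uses smaller values only to keep it fast.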

In the present study, the PSO approach was utilized to produce the optimized values of the WQI, along with all of the 16 water quality variables, by considering the variables' lower and upper limits, as presented in Table 3. Based on the corresponding WQI values, the groundwater quality for drinking purposes was classified into five categories (Table 2). To achieve the optimized values of the WQI and water quality variables corresponding to the different classes of water quality, the WQI parameter was considered as the fitness function. The algorithm was set up with an initial population of 50 and processed up to a maximum generation of 500; therefore, a total of 50 × 500 = 25,000 optimized values were generated. The ranges of values for each variable used in the WQI function are presented in Table 3.

The procedure for generating the optimal variables' values was as follows:

Step 1—The fitness function was defined using the WQI function, initializing "50 as population" and "500 as the maximum generation."

Step 2—Each variable's maximum and minimum limits were set while using the WQI function according to Table 3.

Step 3—Every particle's movements were recorded in every generation in the vector form comprising the value of the WQI, together with the subsequent values of the 16 variables.

Step 4—The category (class) of each variable vector was obtained by considering its corresponding WQI, as presented in Table 2.

Step 5—A total of 250 variable vectors were selected from each category in such a manner that the entire range of the particular category was covered, as given in Table 2.
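Steps 4 and 5 can be illustrated with a short Python sketch. The text does not state how the 250 vectors per class were chosen, so the evenly spaced selection over the sorted WQI values used here is only one possible way to cover the whole range of a class; the function name `select_spread` and the record layout are hypothetical.

```python
def select_spread(records, per_class):
    """Pick up to `per_class` records per class, spread evenly over WQI.

    `records` is a list of (wqi, class_label, variable_vector) tuples.
    """
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[1], []).append(rec)

    selected = {}
    for label, recs in by_class.items():
        recs = sorted(recs, key=lambda r: r[0])  # order by WQI value
        if len(recs) <= per_class:
            selected[label] = recs
            continue
        # Evenly spaced indices from the lowest to the highest WQI in the class.
        step = (len(recs) - 1) / max(per_class - 1, 1)
        indices = [round(k * step) for k in range(per_class)]
        selected[label] = [recs[i] for i in indices]
    return selected


# Hypothetical records: 100 vectors in one class with WQI values 0..99.
demo_records = [(float(i), "excellent", [float(i)]) for i in range(100)]
demo_selection = select_spread(demo_records, per_class=5)
```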

In every generation, the population shifted from its current position to a new position and produced new fitness values. Every particle's movement in every generation was recorded in vector form containing the WQI value along with the corresponding variables' values. Every particle updated its fitness value (WQI) in each generation, and this value was stored in the database together with the related variables. In PSO, the population size (swarm) and the maximum number of iterations (generations) are chosen by the user. The flowchart for this work is shown in Figure 3. The classification of the WQI values was performed using a support vector machine and a naive Bayes classifier. Before proceeding with the classification, the dataset was normalized between 0 and 1 to enhance the accuracy. The variables' values in vector format were treated as a feature vector in the normalized dataset.
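The normalization to the range 0 to 1 can be sketched as a column-wise min-max scaling. The text does not name the scaling method, so min-max scaling is an assumption here, and the function name and the small input matrix are hypothetical.

```python
def min_max_normalize(vectors):
    """Scale each column of `vectors` to the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for vec in vectors:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(vec, lows, highs)
        ])
    return normalized


# Hypothetical two-variable vectors (e.g., two water quality parameters).
raw_vectors = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
scaled_vectors = min_max_normalize(raw_vectors)
```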


**Table 3.** Comparison of chemical parameters with prescribed standards.

**Figure 2.** Flowchart for the optimization of the particles.

**Figure 3.** Flowchart describing the workings of the PSO.

#### 3.2.2. Classification Using a Support Vector Machine

The SVM classifier [54] plays an essential and comprehensive role in classification due to its high accuracy and ability to deal with high-dimensional data. The simplest form of classification is binary classification, which separates two types of objects belonging to positive (+1) and negative (−1) classes. A support vector machine uses two concepts to distinguish between two classes: (1) the separation margin and (2) the kernel function.

The simple two-dimensional data can be classified by using a straight line. The points that fall above the line belong to one class, and the points that fall below the line belong to another class. The high-dimensional data can be classified by using the hyperplanes. However, in a binary classification, multiple planes can be drawn such that they separate the data into two classes. As such, which plane will be selected for the classification? In this case, the hyperplane that gives the maximum margin will be selected for classification. Therefore, we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized. The classification of the data with the best margin hyperplane is shown in Figure 4.

In Figure 4, there are two types of data points: filled and unfilled dots. Three planes exist, which are named H1, H2, and H3. H1 does not successfully classify the data points. Planes H2 and H3 are both capable of classifying data points, but H2 gives a smaller margin than plane H3.

This is why plane H3 is selected for the classification. Sometimes the data cannot be separated by a linear hyperplane because of the way it is distributed in the input space. In that case, we use a nonlinear separation for the classification. The SVM classifier can efficiently perform this nonlinear classification by using kernel functions. The nonlinear classification is presented in Figure 5. In Figure 5, there are two types of objects, as identified by the solid and hollow dots. The objects represented in this figure cannot be separated using a linear hyperplane; the support vector machine performs this task using kernel functions. The kernel function maps the data into a feature space in which they can be separated by a linear hyperplane.

In this work, the SVM classifier separates the individual water quality classes with hyperplanes by using the radial basis kernel (Gaussian) function [55–58]. The distance of a feature vector from the hyperplanes determines its probability of belonging to a specific class. The normalized dataset and the class labels were used as inputs in the present study. The dataset was randomly divided 80:20, where 80% of the dataset was used for training purposes using tenfold cross-validation. In the tenfold cross-validation, the entire dataset was divided randomly into ten equal-sized subsamples. In each round, a single subsample was used for testing and the remaining nine subsamples were used for training. This process was repeated ten times until each of the 10 subsamples was used exactly once for testing purposes. The remaining 20% of the dataset was used for testing and validation purposes.
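Two ingredients of this setup, the Gaussian (radial basis) kernel and the tenfold split, can be sketched in plain Python. This is an illustration only and does not train an SVM; the kernel width `gamma`, the random seed, and the function names are assumptions made for the example.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example: 1000 training vectors split into ten folds of 100 each.
splits = list(k_fold_indices(1000, k=10))
```

In each of the ten rounds, a classifier would be fitted on `train` and scored on `test`, and the ten scores would then be averaged.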

**Figure 5.** Use of the kernel function in an SVM.

#### 3.2.3. Classification Using Naive Bayes Classifier

Naive Bayes classifiers are a family of algorithms based on Bayes' theorem that share a common principle, i.e., every pair of features being classified is independent of each other. The fundamental naive Bayes assumption is that every feature makes an independent and equal contribution to the outcome. A naive Bayes classifier is a probabilistic machine learning model that is used for a classification task. The crux of the classifier is based on Bayes' theorem:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}\tag{3}$$

By using Equation (3), the probability of event *A* happening can be measured by considering that event *B* has occurred. Here *A* is the hypothesis and *B* is the evidence. One assumption that is considered here is that all features are independent/autonomous, which means the presence of one particular feature does not affect the other. Hence it is called naive. Before the PSO–NBC analysis, the dataset was normalized to enhance the performance of the model. A total of 80% of the dataset was used to train the algorithm, whereas 20% of the dataset was used to study the algorithm's prediction accuracy. In this work, continuous values that were associated with each feature were assumed to be distributed according to a Gaussian/normal distribution.
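A Gaussian naive Bayes classifier of the kind described above can be sketched in plain Python. The sketch applies Equation (3) in log form, with each feature modeled by a class-specific normal distribution; the function names, the small variance floor of 1e-9, and the toy data are assumptions for illustration and are not the study's data or code.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature means, and variances for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            # Log of the normal density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy normalized feature vectors with hypothetical class labels.
X_toy = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10],
         [0.80, 0.90], [0.85, 0.80], [0.90, 0.95]]
y_toy = ["good", "good", "good", "poor", "poor", "poor"]
nb_model = fit_gaussian_nb(X_toy, y_toy)
```

The denominator *P*(*B*) in Equation (3) is the same for every class, so it can be dropped when only the most probable class is needed.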

#### **4. Results and Discussion**

#### *4.1. Water Quality Index (WQI) Analysis of the Field-Based Samples*

The concentration, distribution, and impact of different physicochemical parameters observed from water samples collected from the Pindrawan tank area are discussed in this section. The ranges of concentrations observed for various parameters and the percentages of total samples exceeding the prescribed limit are presented in Table 3, along with their undesirable effect on groundwater quality and human physiology. This section provides an overview of the spatial distribution of the physicochemical parameters that were measured in the Pindrawan tank area; a more detailed description is provided in Figures A1–A15 in Appendix A.

Out of 37 samples, 32.43% of the samples had excellent water quality, 43.24% of the samples had good water quality, 21.62% of the samples had poor water quality, and 2.71% of the samples had very poor water quality. This may be due to the heavy concentrations of metals, such as Pb and Cr, due to nearby industries, which involve mining activities, thermal power plants, etc. The areas corresponding to these WQI values are presented in Figure 6.

Figure 7 represents a correlation plot between the WQI and the parameters observed from the study area's water samples. The correlation between the independent parameters can be neglected in the plot since these plots are mostly empirically based on specific values. In decreasing order, the influence of different parameters can be presented as chromium, sodium, fluoride, potassium, chloride, conductivity, total dissolved solids, alkalinity, bicarbonate, and pH. Contributions from the rest of the parameters on the overall water quality were much less compared to these parameters. Through observing Figure 7, it can be concluded that water quality for drinking was susceptible to heavy metal concentrations, such as chromium.

**Figure 6.** Spatial distribution of the WQI.

**Figure 7.** Correlation plot between various groundwater quality parameters.

Based on the WQI, the sample area's drinking water quality was divided into four categories. No sample was observed to be unsuitable for drinking based on the analysis. Very poor water quality was observed from the Raikheda pond area due to a very high chromium concentration. Poor water quality was observed in significant parts of the Deogaon, Dhansuli, Bangoli, Amlitalab, and Khauna villages. Most areas of all the villages had good water quality. Excellent water quality was observed in the Saragaon, Nilja, Dhansuli, Bangoli, Khauna, Baronda, and Pauni areas. The observed water qualities may suggest that most of the study area's water quality is satisfactory and there is no immediate danger for the population. However, the values of certain parameters, such as the chromium concentration, total hardness, and total dissolved solids, were alarmingly high for many areas and could become worse. This may significantly influence the present scenario of the water quality in the study area under consideration. Therefore, concerned authorities should note the situation and plan proper steps for maintaining or improving the current situation of the drinking water quality in the study area.

Furthermore, the averages and ranges of the values of different parameters corresponding to water quality are presented in a boxplot format in Figure 8a–p. The concentrations of some parameters, such as alkalinity, chloride, conductivity, chromium, iron, bicarbonate, sodium, and TDSs, were found to be directly proportional to the WQI of the study area and to have a much more significant impact on it. These are, therefore, the parameters that have to be addressed first when aiming to improve the water quality for the specific study area. The influences presented in Figure 8a–p are the combined effect of the concentration of each parameter and the relative weight of each parameter. Therefore, even if a parameter's relative weight is small, it could make a significant impact if it had a very high concentration. However, these plots are strictly applicable to the present study area and no inference should be derived from these plots for any other samples. The boxplots and correlation plots can be extremely useful for conveying a detailed picture regarding the water quality of the study area and the influence of different parameters on the water quality.


**Figure 8.** Ranges of various parameters corresponding to the water quality: (**a**) alkalinity, (**b**) calcium, (**c**) chloride, (**d**) conductivity, (**e**) chromium, (**f**) iron, (**g**) fluoride, (**h**) bicarbonate, (**i**) potassium, (**j**) magnesium, (**k**) sodium, (**l**) nitrate, (**m**) sulfate, (**n**) TDSs, (**o**) total hardness, and (**p**) pH.

#### *4.2. Result from the PSO–SVM Study*

The performance of the model is presented using the confusion matrix in Figure 9a. The confusion matrix is used to explain the model's classification and overall performance on the testing datasets whose original labels are known. The instances in a predicted class and actual class are represented in every row and each column respectively (or vice versa). In Figure 9a, the rows from the top to the bottom correspond to the excellent, good, poor, very poor, and unfit for drinking water qualities, respectively, as predicted using the SVM classifier.

Furthermore, the columns from left to right follow a similar arrangement of the target class (actual classifications based on the WQI values). Each column related to these classes had 50 variable vectors (water quality class from excellent to unfit for drinking), totaling 250 variable vectors. In the first row, 50 variable vectors are presented, indicating 50 excellent water class WQIs, where the system predicted them all as being in the excellent category. Similarly, in the second, third, fourth, and fifth rows, a sum of 61, 54, 69, and 16 variable vectors are presented, respectively. The result indicates that the algorithm predicted 61 samples as good quality, 54 as poor quality, 69 as very poor quality, and 16 as unfit for the drinking category. The prediction accuracies corresponding to each class are also presented in the last column from the left-hand side. The overall accuracy of the algorithm was found to be 77.60%. Furthermore, a difference between the classifications based on actual values of the WQI and the predicted classification based on the SVM classifier is presented in Figure 9b.
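The overall accuracy is the number of correctly classified vectors (the diagonal of the confusion matrix) divided by the total number of vectors. The Python sketch below shows this calculation; the three-class matrix is hypothetical and is not the matrix in Figure 9a, and the row and column orientation (rows as predicted classes) follows the description above.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def row_accuracies(confusion):
    """Share of each predicted class (row) that matches the actual class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Hypothetical 3-class matrix: rows are predicted classes, columns are actual classes.
toy_confusion = [
    [50, 5, 0],
    [0, 40, 10],
    [0, 5, 40],
]
toy_accuracy = overall_accuracy(toy_confusion)
toy_row_accuracies = row_accuracies(toy_confusion)
```

For the 250 test vectors used here, an overall accuracy of 77.60% corresponds to 194 correctly classified vectors (0.776 × 250 = 194).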

**Figure 9.** Comparison between the predicted class and target class using the SVM approach: (**a**) confusion matrix and (**b**) column plots.

#### *4.3. Discussion of the PSO–NBC Approach*

The PSO–NBC study was carried out by considering the same dataset as in the PSO–SVM approach. The test accuracy is discussed using the confusion matrix presented in Figure 10a. The rows and columns marked as 1 to 5 indicate the excellent (1), good (2), poor (3), very poor (4), and unfit for drinking (5) water qualities. The 51 variable vectors in the first row indicate that the algorithm identified 51 variable vectors as excellent water quality when there were 50 actual excellent water categories (1 more due to misclassification). Similarly, in the second (50 variable vectors of good water quality), third (50 variable vectors of poor water quality), fourth (50 variable vectors of very poor water quality), and fifth rows (50 variable vectors of unfit for drinking water quality), the algorithm placed 57 (good water quality), 46 (poor water quality), 51 (very poor water quality), and 45 (unfit water quality) variable vectors. The prediction accuracy of the algorithm corresponding to each class is presented in the sixth column. The total accuracy of the algorithm was observed to be 92.80%.

The comparisons of the model-predicted outcomes against the actual WQI values are graphically represented in Figure 10b.

**Figure 10.** Comparison between the predicted class and target class using the NBC approach: (**a**) confusion matrix and (**b**) column plots.

#### *4.4. Comparison between the PSO–SVM and PSO–NBC Approaches*

The performances of the PSO–SVM and PSO–NBC approaches used in the present study are presented in Figure 11.

**Figure 11.** Comparison of the predicted outcomes using the PSO–SVM and PSO–NBC approaches.

The figure indicates that the PSO–SVM algorithm predicted some classes (excellent and poor water categories) with significant accuracy; however, significant deviations were observed in the model's performance for the other categories. On the other hand, the prediction accuracies of PSO–NBC were much higher for all the classes and did not distinctly deviate for any specific categories. Therefore, a naive Bayes classifier aided by particle swarm optimization can be efficiently used to construct a machine learning model to classify water for drinking purposes.

#### **5. Conclusions**

The process of WQI estimation is often associated with handling large quantities of identical data. This can create significant confusion during the calculation process and make decision making difficult. A machine-learning-based predictive model can assemble the necessary information and predict the groundwater quality with significant accuracies. This study aimed to utilize modern machine learning techniques for the prediction of water quality for drinking. The groundwater samples collected from parts of the Pindrawan tank command area were used for testing and validation of the developed model. The collected samples were tested for different parameters of water quality and the subsequent values of WQI were computed. Conclusions derived from the present work are as follows:


The general outcomes from the present research indicate the benefits of using ensemble machine learning techniques, where outcomes from several different algorithms can be combined and used to achieve predictions with enhanced accuracies. Finally, with the help of a user interface, the algorithm developed in the present study can be used for water quality estimation in different regions across the globe.

The classification in the present study was carried out using the synthetic dataset that was generated using particle swarm optimization. However, the developed approach can be further improved if more real data are available. Therefore, the authors suggest using a larger field dataset to obtain better accuracy, though this is often a difficult undertaking given the painstaking process of sample collection and laboratory analysis for all the water quality parameters. The developed algorithm can be further improved by studying its performance and fine-tuning it with different input parameters.

**Author Contributions:** Conceptualization, P.A., A.S. and S.P.; data curation, P.A.; formal analysis, P.A. and S.K.; investigation, P.A., S.K. and A.B.; methodology, P.A., A.S., S.K., A.A., A.B. and S.P.; project administration, A.S., V.G.K.V. and S.P.; resources, P.A., A.A. and J.S.; software, P.A., A.B., C.S.R.A. and R.D.; supervision, A.S., J.S. and S.P.; validation, P.A., A.B., C.S.R.A. and R.D.; visualization, A.S., A.A., V.V.R.D. and S.P.; writing—original draft, P.A., A.S., A.B., C.S.R.A. and J.S.; writing—review and editing, A.S., S.K., A.A., V.G.K.V., V.V.R.D., J.S. and S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data may be obtained from the authors upon request.

**Acknowledgments:** The authors would like to sincerely thank the Indian Institute of Technology (Indian School of Mines) authorities, Dhanbad, for extending their support and allowing the use of facilities from various engineering departments, i.e., Environmental Science and Engineering, Civil Engineering, Mining Engineering, and Computer Science Engineering Departments. The authors would also like to acknowledge the support received and facilities used from the Water Resources Department, Government of Chhattisgarh, and Indira Gandhi Krishi Vishwavidyalaya, Raipur, in carrying out this research work. A.A. acknowledges the infrastructural support provided by the Indian Institute of Technology Roorkee.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** Locations used for the groundwater samples.

**Figure A1.** Spatial distribution of EC.

**Figure A2.** Spatial distribution of pH.

**Figure A3.** Spatial distribution of potassium.

**Figure A4.** Spatial distribution of chloride.

**Figure A5.** Spatial distribution of iron.

**Figure A6.** Spatial distribution of magnesium.

**Figure A7.** Spatial distribution of calcium.

**Figure A8.** Spatial distribution of SO4.

**Figure A9.** Spatial distribution of HCO3.

**Figure A10.** Spatial distribution of HNO3.

**Figure A11.** Spatial distribution of fluoride.

**Figure A12.** Spatial distribution of alkalinity.

**Figure A13.** Spatial distribution of TDSs.

**Figure A14.** Spatial distribution of Cr.

**Figure A15.** Spatial distribution of TH.

**Figure A16.** Flowchart of the procedure followed in the study.

#### **References**


## *Article* **Socio-Hydrological Approach to Explore Groundwater–Human Wellbeing Nexus: Case Study from Sundarbans, India**

**Soham Halder 1, Pankaj Kumar 2,\*, Kousik Das 1, Rajarshi Dasgupta <sup>2</sup> and Abhijit Mukherjee 1,3,4**

	- Kharagpur 721302, India

**Abstract:** Coastal regions are home to a rapidly growing population. In spite of rich biodiversity, coastal ecosystems are extremely vulnerable due to hydroclimatic factors with probable impact on socio-economy. Over the last few decades, researchers and policymakers have been drawn to the existing water demand–resource relationship to predict its future trends and prioritize better water resource management options. Water Evaluation And Planning (WEAP) serves the purpose of modeling diverse aspects of decision analysis using water algorithm equations for proper planning of water resource management. In this study, future groundwater demand (domestic, agricultural, and livestock sector) in the fragile Sundarbans ecosystem was estimated considering different human population growth rates (high, low, and current) for 2011–2050. The results showed that the sustainability of coastal aquifer-dependent rural livelihood is expected to face great danger in the near future. The total groundwater demand is expected to rise by approximately 17% at the current growth rate in the study area to fulfill the domestic and agricultural requirement, while this value goes up to around 35% for a higher growth rate and around 4% for a lower growth rate. The impact of increasing groundwater demand was analyzed further to identify any socioeconomic shifts in this region.

**Keywords:** groundwater demand; Sundarbans; agriculture; WEAP; vulnerability; sensitivity loop; water–human wellbeing nexus

#### **1. Introduction**

The history of humankind can be scripted with regard to the human and water interactions and interrelationship [1]. Water is an indispensable natural resource. However, sustainable management of water resources requires a proper understanding of interactions between human and hydrological parameters [2]. Almost 70% of global water (both surface and groundwater) is utilized in the agricultural sector, making it the largest consumer of water resources worldwide [3]. According to the water bulletin published by Circle of Blue [4], nearly 80% of Indian livelihoods directly or indirectly rely on groundwater. Furthermore, rapid population expansion entails intensification of the food manufacturing sector, demanding a constant water supply. Rockstrom et al. (2001) defined a country as water-stressed where per capita per year (pcpy) water availability is less than 1700 m<sup>3</sup> [5]. As stated in the Ministry of Jal Shakti report [6], the annual water availability per capita was 1545 m<sup>3</sup> in 2011, which may further reduce to 1486 m<sup>3</sup> in 2021, signifying India as a water-stressed country. According to the study conducted by Vorosmarty et al. [7], nearly 2.4 billion people, which is 1/3rd of the world population, are residing in water-stressed countries, whereas this value might increase to 2/3rd by the end of 2025. Moreover, the impact of climate change is intensifying over South Asian countries with adverse effects on the agricultural sector, especially in coastal regions. Water resources, agriculture, human health status, and the overall ecosystem are in ever-increasing stress due to inconsistent rainfall, rapid temperature increases, and the rise in sea level along with frequent intense extreme climatic events such as cyclonic storms in coastal regions of India [8]. Global climate change and increased demand for economic expansion with population growth at full tilt have led to the disproportionate exploitation of water resources causing declining water availability per capita specifically in developing countries such as India [9].

**Citation:** Halder, S.; Kumar, P.; Das, K.; Dasgupta, R.; Mukherjee, A. Socio-Hydrological Approach to Explore Groundwater–Human Wellbeing Nexus: Case Study from Sundarbans, India. *Water* **2021**, *13*, 1635. https://doi.org/10.3390/w13121635

Academic Editor: Alexander Yakirevich

Received: 7 May 2021; Accepted: 8 June 2021; Published: 10 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In the recent past, deserving emphasis was given to the term "vulnerability" by the scientific community. Vulnerability is described as the sensitivity of the system to any stressor variable, which is also dependent on the state, rate, and duration of exposure to the variable as well as capability to adapt to the stressful situation [10]. The lower part of the Gangetic delta comprising Sundarbans, the largest mangrove forest in the world, is a well-recognized hotspot of climate change [11]. The entire Sundarbans region is extremely fragile due to the low-lying coastal plain, and in addition to that, excessive reliance on rainfed agro-economy affects the remunerative security of farmers as a result of inconsistent rainfall driven by climatic change, which has further worsened rural livelihood [12,13]. The sustainability of livelihood in Sundarbans and associated coastal zones is facing great insecurity due to over-dependence on the agricultural sector [14]. Additionally, increasing soil salinity due to proximity to the Bay of Bengal and frequent cyclonic storms (such as Aila and super cyclone Amphan in the recent past) along with the increasing prevalence of harmful pests and disease because of elevated temperature and humid conditions further deteriorates agricultural output [15]. Saline river water and brackish subsurface groundwater have led the residents of the Sundarbans region to depend on fresh groundwater abstracted from the confined aquifer at 160 to 300 m below ground level. The drinking water requirement of nearly 4.5 million people residing in this region is fulfilled by deep groundwater [16], whereas merely 32% of total households receive water from the piped groundwater facility [17], intensifying the hardship of local people to obtain one of the most important natural resources—water.

Socio-hydrology or hydro-sociology is a broad domain that combines both socioeconomic aspects and environmental features of hydrology, concentrating on basic scientific principles of interrelationship, feedback mechanisms consisting of two separate loops of socio-economy and community sensitivity loops, and evolving human behavior. The sensitivity loop, a decisive component, depends solely on the community behavior and water management decisions which are driven by a community's social and environmental values and local action translating into direct or indirect impacts on any marginal change in water variables. Naturally, the sensitivity loop determines how behavioral response will impact future available water quality as well as quantity [2]. Various tools are available to assess water resource management coupled with socio-economic parameters. Some of the widely used models are mentioned below. The Spatial Agro Hydro Salinity Model or SAHYSMOD (developed by International Institute for Land Reclamation and Improvement, Wageningen, Netherlands) is used to incorporate hydrological and physical consequences along with social and economic factors to understand the issues and approach to the sustainable development of any basin [18]. The ModSim, also named as Modular Simulator (developed by Colorado State University, Colorado, USA), is widely used as a decision support system (DSS) for both short and long-term planning in any river basin to develop advanced policy-making for better water allocation strategy [19]. Among many, Water Evaluation And Planning (WEAP) (developed by Stockholm Environment Institute's U.S. Center, Massachusetts) modeling has emerged as one of the best platforms in many catchments to understand and implement water management strategies under varying climatic and socio-economic scenarios [20,21]. A major advantage of WEAP is that it simulates the current water situation, evaluates water quantitatively, and, on that basis, supports the design of management scenarios for water demand and supply problems.

WEAP provides predefined objects and procedures to address management problems of any stream, reservoir, watershed, or canal through a scenario-based approach [22]. Previously, the WEAP model has been successfully used to supervise the transboundary water resource-related issues associated with political conflicts in the Jordan River basin [23]. Ospina et al. (2011) [24] analyzed the adaptive strategies against increasing climate change to understand an effective supply–demand scenario in the Sinu-Caribe River basin, located in Colombia. In India, one major case study (Polavaram project) was conducted by Bharati et al. (2008) [25] to estimate the water availability–demand conflict from Godavari River to Krishna River. Nevertheless, there are no such studies on the hydrological impact on the socio-economy in the highly vulnerable Sundarbans region, where major risk factors are increasing groundwater salinity and depleting groundwater storage [26].

This study aimed to identify the basic drivers of the "sensitivity loop", the most important feedback loop in socio-hydrology [2], to understand the human behavior and management policies under a hydrological context in the Sundarbans region. This is one of the first studies to estimate the groundwater demand and its impact on socio-economic parameters with the application of the WEAP model in the Sundarbans region. Under this model, each scenario describes compatible plots of how a system might evolve in the future (within a particular period) under the specific socio-economic condition. Scenario analysis acts as an effective tool for developing proper policy amidst uncertainties [27,28].

More precisely, the research objectives are:


The integrated hydro-socioeconomic approach is applied to provide scientific evidence to guide policy-makers in formulating better water management practices. Throughout the manuscript, "water" refers to groundwater and the agricultural crop refers to paddy (rice).

#### **2. Study Area**

#### *2.1. Description of the Study Area*

Combined depositional activities of the world's largest river network, namely the Ganges–Brahmaputra–Meghna (GBM), shaped the Sundarbans delta, covering an area of nearly one million hectares, shared among India (38%) and Bangladesh (62%) [29]. Geographically, the Indian Sundarbans region lies within 21°30′ to 22°40′ N latitude and 88°05′ to 89°55′ E longitude and is bounded by the Dampier and Hodges line from the northern side, the Hoogli River (a distributary of the river Ganges and a part of the Bhagirathi–Hoogli river channel) from the west, and Bangladesh from the east, while the Bay of Bengal demarcates the southern boundary. The Sundarbans deltaic complex is characterized by extensive sand flats, beaches, coastal dunes, and estuaries, while creeks, mudflats, and mangrove swamps are other important morphotypes [30,31]. The warm-humid climate is prevalent throughout the hydrological year with an average annual precipitation of around 1600 mm in the study area [26]. Lithologically, the study area is categorized into three zones: (i) upper shallow aquifer with a depth range up to 60 mbgl; (ii) the main aquifer (up to a depth of 140–150 mbgl), which is characterized by semi-confined to locally unconfined comprising medium to coarse grained sand with alternate layers of gravel and varying in thickness up to 75 m in the southern part; and (iii) the aquifer beyond 150 m is regarded as a deeper aquifer which is separated from the main aquifer by a clay layer which consists of medium to coarse sands with inter-beds of clay and silt [32–34]. The Indian Sundarbans Biosphere Reserve comprises around 9630 sq. km., among which 5367 sq. km. of forest area is cleared for habitation extending over 19 community development or C.D. blocks (13 blocks from South 24 Parganas and 6 from North 24 Parganas) [31,35]. The present study was conducted specifically in thirteen blocks of South 24 Parganas, which are further divided into three zones, i.e., Zone I (Patharpratima, Kakdwip, Namkhana, and Sagar), Zone II (Canning I, Canning II, Basanti, Gosaba), and Zone III (Jaynagar I, Jaynagar II, Mathurapur I, Mathurapur II, Kultali) (Figure 1). The southern region (Zone I in this study) is famous for its tourist attraction, which is mainly divided into three types, viz., religious tourism, beach tourism, and wildlife tourism. Sagar Island is well known as a Hindu pilgrimage site, attracting millions annually. Beach tourism is mostly prevalent in the southern beaches of the Namkhana community development block. Although they are elusive, tigers are the major attraction of wildlife tourism [36]. According to Mitra et al. (2009) [37], water quality in the Sundarbans region has been deteriorating over the last few decades due to the multifarious impact of climate change and anthropogenic activities. Furthermore, saline intrusion is evident from ionic ratios and salinity content of groundwater [26].

**Figure 1.** Location map of the study area, the Indian Sundarbans delta. The brown color signifies Zone I, the blue color designates Zone II and the green color represents Zone III.

#### *2.2. Livelihood of Indian Sundarbans Region*

Agriculture is the primary occupation of the inhabitants of Indian Sundarbans. Sánchez-Triana et al. (2014) [38] reported that among the total working population, 23.6% are cultivators while 36.1% are agricultural laborers, so that around 60% of the total working population is solely dependent on the agricultural sector. However, the indirect dependence on agriculture may well reach over 90%, which also supports the backbone of an agrarian economy. A large proportion of the inhabited area, measuring about 2648 sq. km, is dedicated to agriculture [39]. Due to inefficient irrigation amenities in non-monsoonal months, the mono-crop rainfed paddy (Aman rice) is cultivated during the rainy season (July–September). Farming sites are dominated in marginal areas. The second-largest occupation is fisheries and aquaculture [38]. Extensive traditional fishing is reported in Frazergunj, Sagar, Bakkhali, and Kalisthan [40]. Apart from these, rural livelihood is largely dependent on mangrove forest-derived products such as honey and timber. Eco-tourism is also promoted in Indian Sundarbans, which serves both remunerative and conservative purposes [41].

#### **3. Required Dataset**

Block-level census data (human) were obtained from the Office of the Registrar General and Census Commissioner, India, Ministry of Home Affairs, Government of India, Census Digital Library [42]. The livestock population was retrieved from the statistical handbook of South 24 Parganas under the Department of Planning and Statistics, Govt. of West Bengal [43]. The daily per capita groundwater (both from the deep aquifer for drinking purposes and shallow aquifers for household works) requirement was set at 40 L (14.5 cubic meters annually per capita) for humans, while it was kept at 10 L (5 cubic meters annually per capita) for livestock [44]. The groundwater level data were retrieved from the Central Groundwater Board Annual Report [45]. The agricultural statistics were extracted from the official website of the Directorate of Economics and Statistics under the supervision of the Department of Agriculture, Cooperation and Farmers Welfare [46]. The inland and marine fisheries data were retrieved from the Handbook of Fisheries Statistics (many editions) [47]. Another socio-economic parameter (literacy rate) was retrieved from census data and the Department of Planning and Statistics (Govt. of West Bengal) [42]. The entire dataset in the reference scenario is provided in Table 1.

**Table 1.** List of parameters considered for this study.


#### **4. Methodology and Model Set Up**

The WEAP model was applied in this study due to its robustness and utility based on data availability as it is useful in performing both aggregated and disaggregated forms of water management analysis in multiple sites [28]. To estimate the groundwater demand and unmet demand which indicates the water shortage, the study area (three zones comprising 13 blocks) was divided into three demand sectors: domestic, agriculture, and livestock. The primary intention was to incorporate the management problems regarding the availability, consumption, and conservation of water resources for comprehensive policy development. The WEAP model was used because it implements a coordinated approach to simulate the present and future scenarios with the consequent optimization of water resources, ultimately aligning them into policy construction [28,48,49]. The water resource system is represented by both hydrological and water quality modules within the WEAP model. Further, these modules can be represented by various elements such as administrative boundaries, river networks, groundwater, reservoir withdrawals, ecosystem requirements, wastewater treatment facilities, etc., depending on our research objectives [28]. The methodology flowchart is provided in Figure 2.

**Figure 2.** Flowchart of methodology (Blue color: Secondary data, Pink color: Hydrological data analysis in WEAP, Yellow color: Socio-economic parameter analysis, Grey color: Incorporation of results, Green color: Conclusion).

The year 2011 was considered as the base year. The simulation on the socio-economic scenario (e.g., human population growth, water consumption rate) continued until 2050, which was specified as the end year of modeling. Water quantity is calculated for each node and link on a regular (monthly or annually) interval in the selected system. The overall groundwater demand for domestic purposes is a function of human population size and the rate of utilization. Two fundamental parameters, annual activity level and annual water use rate, were defined for the demand sectors. The key assumptions made for these two parameters for the human population, agricultural status, and livestock population are given in Table 2.


**Table 2.** Table for initial parameters to set up model.

According to the specified demand priorities, WEAP allocates the available water to satisfy the different demand sites [27,28]. In this study, the domestic and agricultural demand sites are given a priority value of one as these are the major groundwater consuming sites, whereas livestock water demand is described with a priority value of two for lesser consumption. The description of the hydrological model setup is provided in Figure 3. The annual activity level for domestic, agriculture, and livestock defines the total human population, land irrigated by groundwater, and livestock population, respectively, in 2011 for three zones. The domestic groundwater utilization in the study area was determined by five Focused Group Discussions (FGD) involving cultivators conducted during February, 2020 in Zone I of the study area [50]. The consumption rate or the amount of groundwater that is not returned to the system is estimated as 25%, 80%, and 25% for the respective annual activity levels [44,51,52]. The human and livestock population were presumed to increase over the years (2011–2050). The annual rural population growth rate was estimated from the previous census data, which was 1.44%, 1.93%, and 1.73%, respectively, for Zone I, II, and III. For the simulations, a minimum growth rate of 1% and maximum rate of 2.5% were determined (according to various growth simulation model outputs reported in macrotrends [53] and several print media [54]). This was further confirmed by the rural population growth rate according to World Bank data [55]. The livestock population increase rate (1%) was retrieved from Vikashpedia [56].
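The priority rule described above can be illustrated with a minimal sketch. It is a simplified stand-in and not WEAP's actual allocation routine: sites are served in order of priority value, and sites sharing a priority are assumed to receive the same fraction of their demand. The function name `allocate_by_priority`, the supply figure, and the demand values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def allocate_by_priority(supply, demands, priorities):
    """Serve demand sites in order of priority value (1 is served first).

    Sites sharing a priority receive the same fraction of their demand.
    This is an illustrative simplification, not WEAP's own algorithm.
    """
    groups = defaultdict(list)
    for site, prio in priorities.items():
        groups[prio].append(site)

    allocation = {}
    remaining = supply
    for prio in sorted(groups):
        sites = groups[prio]
        group_demand = sum(demands[s] for s in sites)
        # Fraction of this priority group's demand that can still be met
        share = min(1.0, remaining / group_demand) if group_demand > 0 else 0.0
        for s in sites:
            allocation[s] = demands[s] * share
        remaining -= group_demand * share
    return allocation


# Hypothetical demands and supply (mcm), for illustration only
demands = {"domestic": 15.0, "agriculture": 80.0, "livestock": 7.0}
priorities = {"domestic": 1, "agriculture": 1, "livestock": 2}
alloc = allocate_by_priority(100.0, demands, priorities)
unmet = {site: demands[site] - alloc[site] for site in demands}
print(alloc, unmet)
```

With these hypothetical numbers the two priority-one sites are fully supplied and the shortfall falls entirely on livestock, which is the behavior the priority values are meant to produce.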

**Figure 3.** Hydrological simulation model in the study area (Sundarbans).

The groundwater use rate for agricultural purposes was assumed to decrease due to certain improvements in technologies and adaptation measures. The rainwater harvesting potential for the Sundarbans region was already estimated by Bhadra et al. (2018) [35]. Along with these data, the seasonal effective agricultural land area was estimated (District statistical abstract, many editions); further groundwater requirement for these areas was calculated [57]. Additionally, the potentially stored rainwater amount was subtracted from the total water demand. It was estimated that technological improvements for agriculture as well as awareness among communities in the Sundarbans region might cause a decrement in groundwater demand by 0.2 cubic meters with an interval of 5 years. The agricultural land area was considered constant throughout the study period as only the impact of varying human growth rates on water demand was studied. The overall activity was multiplied by the rate of water use to calculate the water consumption. The activity level was adopted for socio-economic demand analysis. The annual water use rate defines the mean water consumption rate per unit of activity. The total groundwater demand was calculated by multiplying the activity level and the rate of water consumption. It is worth mentioning that the annual water use rate does not indicate the total amount used, rather this represents the average annual water consumption per unit of activity.

$$\mathbf{D} = \mathbf{A} \times \mathbf{C} \tag{1}$$

where: D = Groundwater demand at each node;

A = Activity level;

C = Water consumption rate at each node.
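As a brief illustration of Equation (1), the sketch below multiplies an activity level by an annual water use rate for a single demand node. The per capita rate of 14.5 cubic meters per year is the value stated in Section 3; the population of one million is a hypothetical round number, and the function name `groundwater_demand` is introduced here only for illustration.

```python
def groundwater_demand(activity_level, water_use_rate):
    """Equation (1): D = A x C for a single demand node."""
    return activity_level * water_use_rate


# Hypothetical domestic node: one million people (assumed value),
# each using 14.5 cubic meters of groundwater per year
people = 1_000_000
rate_m3_per_person = 14.5

demand_m3 = groundwater_demand(people, rate_m3_per_person)
demand_mcm = demand_m3 / 1e6  # convert to million cubic meters (mcm)
print(f"Annual domestic demand: {demand_mcm:.2f} mcm")
```

The same function applies to the agricultural and livestock sectors once the activity level is expressed in hectares irrigated by groundwater or in number of animals.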

The reference scenario is described as the ongoing status of the system without any futuristic strategy and policy management. This is also helpful in distinguishing the demand sites and where more focus is to be given [28,58]. Prior to the simulations, the demand priorities were defined to specify the importance of groundwater requirement at the demand sites and to ensure that demands are met properly [28,48]. Therefore, the three scenarios selected for this study are:


#### **5. Results**

#### *5.1. Decadal Trend of Census Data*

According to the Indian Census, the working population is divided into two major groups, viz., main workers, who work a substantial portion (more than 180 days) of any year, and marginal workers, who work less than 180 days in a year. Furthermore, the main worker population is divided into four classes, namely, agricultural cultivators, agricultural laborers, household industry workers, and other workers (Indian Census, 2011). For simplicity, cultivators and agricultural laborers are considered as one group "Agriculture-dependent population". Block-wise census data for two decades (1991–2011) show a hike in marginal worker percentage compared to main workers. A steady decline in the total main worker population (26.09% in 1991 and 22.9% in 2011) was observed in the study area. The male main worker population in the study area has decreased from 47.63% to 34.85% during 1991–2011, while the opposite trend was observed for the female main worker population with an increasing trend (2.76% in 1991 and 5.44% in 2011 of total population). The male marginal worker population in the study area has increased from 1.56% to 18.72% during 1991–2011. A similar trend was observed for the female marginal worker population, which has shown an increasing pattern (6.09% in 1991 and 12.85% in 2011) (Figure 4a). These data are also significant as men are drifting away from main to marginal status due to mass outward migration of male workers to urban areas as well as different states or even overseas. Furthermore, the non-agriculture occupation is becoming an important sector to absorb the increasing labor force.

**Figure 4.** (**a**) Gender-wise percentage of marginal worker population and (**b**) the percentage of cultivator population in the study area (blue bar—male, brown bar—female).

The occupation preference of the total workforce is still inclined towards agriculture in the study area as 54% of workers are strongly dependent on it according to 2011 census reports. Another noteworthy observation is that the percentage of self-cultivators is declining sharply (39.61% in 1991 and 22.22% in 2011) along with a steady decrease in agriculture laborers (35.44% in 1991 and 32.34% in 2011) (Figure 4b). Since 1991, increasing participation (around 24% in 1991 to 46% in 2011) in the non-agricultural sector has been observed. Nevertheless, this high growth rate in non-agriculture has decelerated, and during the recent decade (2001–2011), the shift was recorded to be 2%, while the shift in the past decade (1991–2001) was 20%. As a result, the considerable change in the structure of the workforce towards non-agriculture observed in the 1990s was reduced in the 2000s. This is in contrast to the growth of GDP in India in its non-agriculture sector that had registered its highest ever during 2001–2011. This indicates a glitch in the structural change of the workforce.

Moreover, the study area is famous for its inland and marine fishery products [40]. Extensive inland fishing is reported in all the blocks of the study area (Figure 5a,b). The fisheries data from 1992 to 2018 have shown that there is an observable decrease in marine fish production [47]. Although various new techniques are available for marine fish capture nowadays, increased cyclones, death rates, and other associated geopolitical complications have forced the marine fishermen to shift into inland fisheries or any other occupation. Subsequently, the total marine fish production has decreased steadily from 20% to 10% over the above-mentioned period (Figure 5c).

**Figure 5.** Statistics of fisheries. (**a**) Represents the effective area for inland fisheries over the years in the study area. (**b**) Represents the number of persons involved in inland fisheries. (**c**) Representation of contribution of marine fisheries to total fish production in the study area over the years (1992–2018).

#### *5.2. Water Demand Derived from WEAP*

#### 5.2.1. Domestic Groundwater Demand

The total groundwater demand under different scenarios is shown in Figure 6. The domestic water demand for humans is expected to increase with population growth, as groundwater demand for domestic purposes is a function of human population size and the rate of utilization. The water requirement under three scenarios for the three zones is shown in Figure 7. According to the reference scenario, the annual domestic groundwater demand at the start of modeling (2011) was estimated as 14.62, 16.54, and 16.82 million cubic meters (mcm) for Zones I, II, and III, respectively. At the current growth rate in the respective zones, the domestic water demand is calculated as 26.13, 35.80, and 68.24 mcm in 2050. At the higher population growth rate (2.5%), the groundwater demand in 2050 is expected to rise to 31.65, 43.32, and 124.72 mcm in the respective zones. At the lower growth rate (1%), the water demand in 2050 is lower, at 21.55, 24.38, and 34.08 mcm in the respective zones. The higher domestic groundwater demand in Zone III is due to its larger population compared with the other two zones. The sector-wise groundwater demand is provided in Table 3.
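Because domestic demand is described as a function of population size and utilization rate, its growth can be sketched as compound annual growth of the base-year demand, assuming the per capita use rate stays constant. This is an assumption made for illustration; the exact computation performed inside WEAP may differ. The function name `project_demand` is hypothetical. Under these assumptions, the 2011 values reported for Zones I and II (14.62 and 16.54 mcm) growing at 1% per year over 2011–2050 give roughly the low-growth figures quoted above.

```python
def project_demand(base_demand, growth_rate, base_year=2011, end_year=2050):
    """Project demand forward assuming compound annual growth.

    Assumes a constant per capita use rate, so that demand scales
    directly with population (an illustrative simplification).
    """
    years = end_year - base_year
    return base_demand * (1.0 + growth_rate) ** years


# Base-year (2011) domestic demand in mcm, as reported for Zones I and II
zone_1_low = project_demand(14.62, 0.01)
zone_2_low = project_demand(16.54, 0.01)

print(f"Zone I, 1% growth:  {zone_1_low:.2f} mcm")   # about 21.55
print(f"Zone II, 1% growth: {zone_2_low:.2f} mcm")   # about 24.38
```

Other growth rates can be explored by changing the second argument, which is how the high, low, and current growth scenarios differ in this simplified picture.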

**Figure 6.** Total groundwater demand under different scenarios in the study area.

**Figure 7.** Domestic groundwater demand under different scenarios.

**Table 3.** Table of sector-wise groundwater demand (in mcm) in 2050 (HGR—High growth rate, LGR—Low growth rate).


#### 5.2.2. Agricultural Groundwater Demand

Sundarbans, a prominent agro-economic region, contributes over 60% of the annual rice production of the district (South 24 Parganas), as calculated from the district statistical abstract. Because no data were available for areas irrigated by groundwater only, the total canal irrigated area was subtracted from the total agricultural area for the respective zones. The study of future trends in groundwater abstraction for irrigation aims to understand the relationship between evolving agricultural demand and the consequent groundwater withdrawal rate. The groundwater demand for the respective zones in the base year (2011) was 80, 50, and 120.6 mcm. It was assumed that water consumption per hectare would decrease yearly due to technological improvements (e.g., rainwater harvesting and artificial groundwater recharge) and improved irrigation efficiency. Based on this scenario, the rate of groundwater use was estimated to decrease by 0.2 cubic meters every 5 years. With this assumption, the groundwater demand is likely to decline to 67.2, 42, and 109.2 mcm in 2050 for the respective zones (Figure 8).
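The relative size of the projected decline can be read directly from the base-year and 2050 values quoted above. The short sketch below computes the percentage change per zone; it assumes that "respective zones" follows the order Zone I, II, III, and the helper name `percent_change` is introduced here for illustration only.

```python
def percent_change(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    return (end - start) / start * 100.0


# (2011 demand, 2050 demand) in mcm, as reported in the text.
# Assumption: the values are listed in the order Zone I, II, III.
agri_demand_mcm = {
    "Zone I": (80.0, 67.2),
    "Zone II": (50.0, 42.0),
    "Zone III": (120.6, 109.2),
}

changes = {zone: percent_change(start, end)
           for zone, (start, end) in agri_demand_mcm.items()}

for zone, change in changes.items():
    print(f"{zone}: {change:.2f}% change between 2011 and 2050")
```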

**Figure 8.** Agricultural groundwater demand under different scenarios.

#### 5.2.3. Livestock Groundwater Demand

Cattle, buffalo, goats, and poultry are the major livestock animals in the study area. Cattle and buffalo are reared for milk and meat, while poultry are reared for meat only. Cattle and buffalo are the major water-consuming livestock, followed by poultry. The water consumption rate for poultry farms was retrieved from The Poultry Site [59] and Water Consumption in Broiler [60]. The groundwater requirements for dairy animals are attributed to drinking purposes only, whereas pond water is used for washing. The livestock water demand is estimated to more than double from 7.04 mcm (in 2011) to 15.25 mcm in Zone I in 2050, while a steady increase in demand is observed in Zone II (8.44 mcm in 2011 to 12.44 mcm in 2050). In Zone III, the livestock water demand is also estimated to more than double from 6.03 mcm in 2011 to 13.1 mcm in 2050 (Figure 9). This difference is due to the presence of more cattle and buffalo in Zone I than in the other two zones, while the steadier increase in Zone II is due to the predominance of poultry farms, which consume less water.

**Figure 9.** Livestock groundwater demand under different scenarios.

#### *5.3. Unmet Groundwater Demand Derived from WEAP*

The unmet demand represents the gap between the allocated water and the demand in each node, or simply the shortage of water. It was calculated by subtracting the WEAP allocated water amount from the demand values in each node. The estimated value of groundwater flow in deep aquifers is 18.25 mcm annually in the study area [35]. The groundwater abstracted from shallow tube wells was estimated to be around 24 mcm (calculated from the water abstraction rate and the number of shallow tube wells installed, as mentioned in the district statistical abstract and CGWB annual reports). The WEAP simulation results indicated strong seasonal changes in required groundwater utilization over the years. The sectoral division of unmet demand is compiled in Table 4 and represented in Figure 10. The unmet demand values clearly indicate that water stress-related issues (described in the next section) are going to rise under any growth rate scenario in Zone III due to higher agricultural and domestic demand.
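The subtraction described above can be sketched in a few lines. The node names and numbers below are hypothetical placeholders, not values from Table 4, and clamping the result at zero (so that a surplus is not reported as negative unmet demand) is an assumption made here rather than a documented step of the study.

```python
def unmet_demand(demand_mcm: float, allocated_mcm: float) -> float:
    """Unmet demand = demand minus allocated water, floored at zero.

    The floor at zero is an illustrative assumption: a node that receives
    at least its full demand is treated as having no shortage.
    """
    return max(demand_mcm - allocated_mcm, 0.0)


# Hypothetical demand nodes: (demand, allocated) in mcm. Illustration only.
nodes = {
    "domestic": (30.0, 22.5),
    "agriculture": (100.0, 60.0),
    "livestock": (10.0, 10.0),
}

shortages = {name: unmet_demand(demand, allocated)
             for name, (demand, allocated) in nodes.items()}

for name, shortage in shortages.items():
    print(f"{name}: unmet demand {shortage:.2f} mcm")
print(f"total unmet demand: {sum(shortages.values()):.2f} mcm")
```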


**Table 4.** Table of sector-wise groundwater unmet demand (in mcm) in 2050.

**Figure 10.** Zone-wise unmet demand under different scenarios.

#### **6. Discussion**

#### *6.1. Implications of Increasing Groundwater Demand*

The future sectoral water demand and unmet demand in the three zones give an indication of groundwater storage vulnerability. Based on unmet water demand, the zones can be grouped into three categories: more vulnerable (Zone III), moderately vulnerable (Zone I), and less vulnerable (Zone II). Zone III comprises five community development blocks, with a decadal human population growth rate of 1.71% along with a larger land area irrigated by groundwater, which translates into higher groundwater demand. The decadal population growth rate (1.44%) is much lower in Zone I, but because it is the major tourist attraction, the water demand is expected to rise at a greater rate, with a subsequent depletion in groundwater level. The groundwater-dependent agricultural land area is much smaller in Zone II than in the other two zones. This is significant because, despite a higher decadal population growth rate (1.93%), the overall demand remained lower. Along with population size, an increasing trend in population density (573 persons/km<sup>2</sup> in 1991 to 819 persons/km<sup>2</sup> in 2011) [42] is observed in the study area, significantly affecting the groundwater abstraction rate. The monthly demand confirmed that groundwater demand rises in the pre-monsoon months (March–June, when rainfall is not sufficient) during the sowing of Aman paddy, and rises again in November during the sowing of Boro paddy, indicating a strong negative correlation with monthly rainwater availability (Figure 11). In other words, groundwater demand increases when rainfall is lower and decreases when rainfall is adequate. During FGD, it was confirmed that in any extreme climatic event (cyclone), hydrological changes can trigger serious repercussions on economic output and associated mental stress among communities. This leads to severe consequences such as restlessness, anxiety, frequent domestic violence, and even suicides [50].
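A negative correlation of the kind described above is commonly quantified with the Pearson coefficient. The following is a minimal sketch using only the standard library; the monthly rainfall and demand series are made-up placeholders chosen to illustrate the calculation, not data from the study, and the choice of the Pearson coefficient is an assumption since the text does not name the statistic used.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series.

    Raises ValueError for mismatched or too-short inputs; a constant
    series (zero variance) would cause a division by zero.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly series (January to December), for illustration only.
rainfall_mm = [10, 20, 30, 40, 110, 260, 350, 330, 280, 140, 30, 10]
demand_mcm = [9, 9, 12, 13, 12, 8, 4, 4, 5, 7, 11, 10]

r = pearson_r(rainfall_mm, demand_mcm)
print(f"Pearson r between rainfall and groundwater demand: {r:.2f}")
```

A coefficient close to −1 would be consistent with demand rising in dry months and falling in wet months; it does not by itself establish causation.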

**Figure 11.** The monthly variation of groundwater demand in the study area.

#### *6.2. Relationship between Agricultural Activities and Climate*

Climatic parameters, particularly rainfall, play a prominent role in agricultural activities in the study area. The monthly variation of rainfall in the study area is depicted in Figure 12a. This rainfall pattern is significant because agricultural groundwater demand increases during non-monsoonal months. Any variability in rainfall might have a significant impact on crop yield. Decreasing rainfall [26] prompts farmers to extract more groundwater, which was confirmed by FGD in the study area. Furthermore, groundwater-based irrigation is not available to all farmers, which acts as a major limiting factor for economic growth. Increasing population density [42] and decreasing rainfall and groundwater recharge rate [26] might be important reasons behind the declining groundwater level shown in Figure 12b. This leads to the extraction of water from deeper wells, which increases the electricity cost as well as the total cost of production. According to Mandal et al. (2015) [15], the Sundarbans coastal region has been experiencing a delayed onset and late recession of the monsoon, while heavy rainfall was sometimes observed during the harvest period of the monsoonal crop. The depletion of the groundwater level causes a higher rate of seawater intrusion, resulting in the qualitative deterioration and quantitative depletion of groundwater resources [26,50]. Additionally, the changes in hydrological parameters have significant consequences, including an increased rate of pest attacks and a consequent decrease in crop productivity [61].

**Figure 12.** (**a**) Monthly rainfall amount in the study area, (**b**) Relationship between annual rainfall amount and changing groundwater level over the years (2004–2017) in the study area.

#### *6.3. Status of the Agricultural Economy*

For the last couple of decades, the contribution of the agricultural sector to GDP has been decreasing significantly in India (from over 50% during the 1950s to 15.4% in 2015–2016), while an increasing contribution from the industrial (manufacturing) and service sectors was observed (constant prices considered) [62]. According to the reports of PRS Legislative Research [62], the major factors influencing agricultural productivity include over-dependence on monsoonal rainfall, landholdings, deteriorating soil nutrients due to improper use of fertilizers, lack of access to formal credit, and the disproportionate use of technology. Compared to other rice-producing states, the lowest growth rate (−1.3%) was observed in West Bengal, as mentioned in the NSSO 2015 report [63]. Almost 95% of farming households that cultivate <4 hectares were incapable of fulfilling their basic consumption needs [64]. These statistics are highly significant, as nearly 84% of total farmland in West Bengal is cultivated by small and marginal farmers. Additionally, West Bengal is among the states where the average monthly income of farmers lies below 5000 rupees [64].

When the groundwater level (GWL) data were plotted against the cost of cultivation in the study area, a significant inverse relationship was found: as the GWL declined, the cost of cultivation increased (Figure 13a). This may be due to electricity consumption during agricultural water abstraction, since electricity is consumed to abstract groundwater from aquifers. Unlike in other states, the farmers of West Bengal do not receive free electricity for agriculture or tariff subsidies [65], and as a result, the cost of production increases as the groundwater level is depleted. This trend is also significant because the cost of cultivation showed an inverse relationship with the rainfall amount over the 2013–2018 period (Figure 13b). These data imply that when rainfall is lower, groundwater demand in agriculture is higher, which leads to excessive electricity costs and an associated increase in the overall cost of paddy production (Figure 13c).

**Figure 13.** (**a**) Relationship between groundwater level (GWL) and cost of paddy production (2004–2017), (**b**) Relationship between annual rainfall and operational cost of paddy production (2013–2018), (**c**) Overall cost of paddy production over the years.

#### *6.4. Impact of Literacy Rate on Economic Wellbeing*

Literacy is one of the key factors in the economic prosperity of any region (country/state/district), and an increasing literacy rate is suggestive of better employment opportunities [66]. According to the Indian Census, literacy is defined as the capability of reading, writing, and interpreting in any language [42]. It is worth mentioning that, beyond this definition of literacy, certain technical skills are important to survive in the modern labor market. These skills will eventually decrease occupational passivity in the workforce and improve employability. Therefore, a certain level of educational qualification and technical knowledge helps in the attainment of better-paid jobs in India [67]. The literacy rate in the study area showed an upward trend: 39%, 54%, and 75% in 1991, 2001, and 2011, respectively. An increasing literacy rate is associated with a shift in occupation from agriculture (lower wage rate) to other industrial sectors (higher wage rate). The daily wage rate for unskilled agricultural laborers is 260 rupees (without food), whereas an unskilled industrial worker (bakery, automobile manufacturing and repairing, biscuit manufacturing, etc.) gets around 338 rupees (without food) daily, as mentioned in a recent report published by the Labour Commissionerate, Labour Department, Government of West Bengal [68]. As the economic benefit from agriculture is very low in West Bengal compared with other states, the occupation shift will be more prominent in the coming years. The relationship between the decadal literacy rate and the trend in occupation shifting is provided in Figure 14.

**Figure 14.** Relationship between literacy rate (percentage value) and agriculture-dependent population.

#### *6.5. Overall Socio-Economic Status under the Hydrological Context*

The increasing salinity content of surface water has compelled the people in Sundarbans to depend solely on groundwater as a drinking water source [26]. Apart from surface water salinization, several constraints in agricultural groundwater utilization have led to economic hardship and the associated migration of local people [50]. According to Bhanja et al. (2017) [69], unsustainable abstraction of groundwater and anthropogenic impacts are the consequences of the inaccurate implementation of management policies. Unproductive agricultural lands are transformed into brackish water aquaculture, possibly leading to increased salinization of shallow aquifers [26,70]. Therefore, improper knowledge about the hydrology of this area and a lack of adequate awareness among communities have resulted in the over-abstraction of deep groundwater, ultimately leading to the disruption of groundwater dynamics [26].

The above analysis shows a conspicuous shift in occupational dependency. Occupational shifting is a broad term that is controlled by multiple factors, including the resource availability, ecosystem vulnerability, and socio-economic status of the area [71]. Sundarbans is exposed to various highly impactful climatic events such as cyclones [72,73], along with a higher rate of coastal aquifer salinization [26,34], leading to serious vulnerability of the ecosystem. One significant outcome of the analysis is the changing occupational trend: the proportion of cultivators is declining while the proportion of agricultural laborers in the workforce is increasing. This suggests that self-cultivators are quitting farming and becoming laborers in the agricultural sector. Additionally, a trend in occupation shifting was also observed in the fisheries-dependent population, where inland fisheries are growing at a considerable rate with a slight decrease in marine fish production. The prevalence of inland aquaculture practice was also reported in Dubey et al. [70]. Even though marine fish production can be affected by various other factors (quality of seawater, climatic change, etc.), the rate of overall production is significant, as a lower yield may lead to a shift in occupation. Although more than half of all working people are dependent on agriculture (Figure 15), a higher rate of conversion is expected in the coming years if the situation does not improve significantly. The principal drivers of hydrological parameters and associated socio-economic status in the study area are described in Figure 16.

**Figure 15.** Overall trend of occupation shifting in the study area.

**Figure 16.** The drivers of socio-economic trend/status under hydroclimatic context in the study area.

#### *6.6. Limitations and Way Forward*

Previously, no data were published on the contribution of groundwater to agricultural growth in the Sundarbans region. Because of this, the net area irrigated by groundwater only was calculated by subtracting the total canal/pond irrigated area from the total agricultural area. Moreover, the non-availability of data prompted us to use various estimated parameters from printed media and data from non-governmental associations. The increasing pattern of groundwater demand is reflected through the human population growth rate only, and a further increment due to urbanization is not considered in this study due to the lack of data. Additionally, the complex hydrostratigraphy of the study area was not considered during modeling, as the zone-wise groundwater demand and unmet demand were only studied with respect to varying population growth rates. Despite these data limitations, the analysis has shown the importance of groundwater and the associated climatic factors and social parameters for the overall economic status of the study area. There is always a gap between the outcomes of scientific experiments and policy-makers' decisions. It is noteworthy that innumerable feedback mechanisms exist between humans and the hydrological system, and combining all these factors is beyond the scope of this study. The presented model and findings portray only a snapshot of future trends in water use and human response for the defined scenarios. This study identified separate zones based on water demand and unmet demand to guide policy-makers in implementing location-specific policies and an impactful water management strategy that maintains socio-economic development. This is one of the first studies to acknowledge the complex human–water interactions in the Sundarbans region, and further studies are required to predict the overall social dynamics under the hydrological context.

#### **7. Conclusions**

This study addresses crucial hydrological queries regarding human–nature interaction along with socio-economic issues in a highly complex coastal ecosystem. Although the history of Sundarbans narrates the transformation of mangrove forests into agricultural land, the region is gradually becoming uninhabitable owing to changing hydrological dynamics, including the rapid salinization and contamination of freshwater aquifers. With these irreversible impacts, crop productivity and aquaculture yield are expected to decrease further in the near future. Sundarbans is possibly the classic case where groundwater availability is a major limiting factor for socio-economic growth. The relationship between hydroclimatic factors and socio-economic vulnerability is prominent, and it is further magnified by low per capita income, uneven allocation of natural resources, an inefficient health care system, improper education, and several associated factors, ultimately leading to an inadequate adaptive response to stressors. As a result, the major occupation sectors (e.g., agriculture, fishery) are affected considerably, posing challenges to the populations living there. Rapid changes in hydroclimatic factors can endanger the utilization of natural resources. Both the qualitative and quantitative security of natural resources are important for sustainable socio-economic wellbeing. The probable factors for occupation shifting are as follows. Firstly, the cost of irrigation is ever-increasing while the profitability of paddy production is falling. Secondly, because the region is a tourist attraction, there is a rapid boom in real estate and urbanization that ultimately leads to a decrease in the land area available for cultivation. Thirdly, climate change (inadequate and inconsistent rainfall, cyclones, storms, and increasing surface water temperature) acts as a slow-burning stressor with a long-lasting impact that promotes this shift. These factors, either individually or in combination, act as push factors. Therefore, the early estimation of vulnerability is crucial to mitigate the aftermath of stressors. Further analysis is required to estimate groundwater exploitation under increasing demand and to understand the impact of climate change. Additional studies are also required to design effective adaptation strategies, both for agriculture and for other employment sectors.

**Author Contributions:** Conceptualization: S.H., P.K., A.M.; methodology: S.H., P.K.; formal analysis: S.H., P.K.; field investigation: S.H., P.K., K.D., R.D.; writing—original draft preparation: S.H., P.K.; writing—review and editing: S.H., P.K., K.D., R.D., A.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by Asia Pacific Network for Global Change Research (APN) under Collaborative Regional Research Programme (CRRP) project with project reference number CRRP2019-01MY-Kumar.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors appreciate the support from the people of the Namkhana, Kakdwip, and Patharpratima community development blocks during FGD.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

