**Water Quality Modelling, Monitoring and Mitigation**

Editors

**Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla**

MDPI Basel Beijing Wuhan Barcelona Belgrade Manchester Tokyo Cluj Tianjin

*Editors*

Amit Kumar, School of Hydrology and Water Resources, Nanjing University of Information Science and Technology, Nanjing, China

Santosh Subhash Palmate, Water Resources Division, Texas A&M AgriLife Research Center, El Paso, United States

Rituraj Shukla, School of Engineering, University of Guelph, Ontario, Canada

*Editorial Office* MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: www.mdpi.com/journal/applsci/special issues/ water quality modelling monitoring mitigation).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-9643-3 (Hbk) ISBN 978-3-0365-9642-6 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


Reprinted from: *Appl. Sci.* **2022**, *12*, 8158, doi:10.3390/app12168158

### **Yan Liu, Qin Chen and Rajendra Prasad Singh**

## **About the Editors**

#### **Amit Kumar**

Dr. Kumar is a hydrologist and environmentalist. He has worked in areas such as the sources, fate, and transport of carbon in water and soil, greenhouse gas emissions from reservoirs/lakes/forests, water quality, and ecological health assessment. He holds an M.Tech and Ph.D from IIT Roorkee, one of the top technical institutes in India, for which he received an MHRD fellowship. He undertook a postdoc at Hohai University's College of Hydrology and Water Resources, Nanjing, China, and now works as an Associate Professor at Nanjing University of Information Science and Technology, School of Hydrology and Water Resources, Nanjing, China. He has published more than 90 research/review articles and attended several training programs and conferences with support from ICAR, ETH, CAS, MOEF and DAAD. Recently, he has edited two Special Issue books for MDPI. He has edited special issues in *Land* (MDPI), *Applied Sciences* (MDPI), and *Frontiers in Environmental Science*; is a Topic Editor for *Water* (MDPI); is an Advisory Board Member of *Ecological Indicators* (Elsevier); and is an Associate Editor for *Applied Water Science* (Springer) and *IJRBM* (Taylor and Francis). He is currently handling an NSFC project and was ranked among the top 2% of world scientists in the list released by Stanford University in 2022.

#### **Santosh Subhash Palmate**

Dr. Palmate has worked on hydrological and climate change modeling for the last 10 years. He completed his M.Tech and Ph.D at IIT Roorkee, India. Before joining the Texas A&M AgriLife Research Center, he worked with Wetlands International South Asia, New Delhi, as a Technical Officer. Currently, he is working as a Postdoctoral Researcher at Texas A&M AgriLife Research Center, USA, and is deeply involved in research. During his academic career, he has published over 20 research articles and written 6 book chapters of international repute. Dr. Palmate is a Guest Editor in *Water* and a reviewer for various prestigious journals. He is also a professional member of EGU, AGU, IAHS, EWRA, etc.

#### **Rituraj Shukla**

Dr. Shukla has worked on hydrological and water quality modeling for the last 10 years. He completed his B.Tech and M.Tech at Indira Gandhi Agricultural University and his Ph.D at IIT Roorkee, India. Before joining the University of Guelph, he worked at the National Institute of Hydrology, Indian Institute of Technology Roorkee, etc., and handled various research projects. Currently, he is working as a Postdoctoral Researcher at the University of Guelph, Canada, and is deeply involved in research. During his academic career, he has published over 40 research articles and written 4 book chapters of international repute.

## **Preface to "Water Quality Modelling, Monitoring and Mitigation"**

In the modern era, water quality indices and models have received global attention from environmentalists, policymakers, governments, stakeholders, water resource planners, and managers due to their ability to evaluate the water quality of freshwater bodies and groundwater aquifers. Despite their wide applicability, models are generally developed based on site-specific guidelines and are not generic; therefore, predicted/calculated values are reported to be highly uncertain. Thus, model and/or index formulation is still challenging and represents a current research hotspot in the scientific community. The inspiration for this reprint came from our desire to provide a platform for sharing results and informing young minds around the world to develop suitable models to understand water quality so that mitigation measures can be taken in advance to make water fit for drinking and for life-supporting activities.

> **Amit Kumar, Santosh Subhash Palmate, and Rituraj Shukla** *Editors*

### *Editorial* **Water Quality Modelling, Monitoring, and Mitigation**

**Amit Kumar <sup>1,2,</sup>\*, Santosh Subhash Palmate <sup>3</sup> and Rituraj Shukla <sup>4</sup>**


**Abstract:** In the modern era, water quality indices and models have received attention from environmentalists, policymakers, governments, stakeholders, water resource planners, and managers for their ability to evaluate the water quality of freshwater bodies. Despite their wide applicability, models are generally developed based on site-specific guidelines and are not generic; therefore, predicted/calculated values are reported to be highly uncertain. Thus, model and/or index formulation is still challenging and represents a current research hotspot in the scientific community. The inspiration for this Special Issue came from our desire to provide a platform for sharing results and informing young minds around the world to develop suitable models to understand water quality so that mitigation measures can be taken in advance to make water fit for drinking and for life-supporting activities.

**Keywords:** water quality; monitoring; modeling; mitigations; water quality indexing

#### **1. Introduction**

Due to the rapid increase in anthropogenic activity in catchments, further adverse changes in access to water resources are expected in the future [1]. Under these conditions, water quality (WQ) plays an important role in determining the economic utility of water, including for potable or drinking water supply, recreation, and agriculture. In the modern era, the study of and commitment to monitoring, modeling, and mitigation have become important and meaningful aspects of the environmental impact assessment process [2]. Under various circumstances, the potentially adverse impacts on ecological flora and fauna can be mitigated through the strategic design and implementation of appropriate models, tools, or techniques to diminish the severity of the effects [3,4]. Different types of nutrients, contaminants (heavy/trace metals), micropollutants, nanoparticles, microplastics, microbes, etc., disturb the ecological life in freshwater bodies [5,6]. Therefore, evidence-based pollution control is urgently needed to focus on the elementary level of water governance, known as "monitoring, modeling, and mitigation". Monitoring sets the empirical basis by providing spatio-temporal information on substance loads (contaminants and WQ parameters such as dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, and nutrients) as well as the driving boundary conditions for evaluating WQ trends and statuses, and it further provides useful information to mitigate contamination and to balance ecological life [7]. Modeling, in turn, helps to provide medium- and long-term information for times and locations where monitoring is not possible [8,9].

The proposed Special Issue will explore cross-disciplinary approaches, modeling, and methods and will discuss water quality risks as well as solutions for the implications for environmental sustainability and for the further conservation of ecological life. The interconnectedness of this critical problem cannot be assessed with traditional approaches; instead, inter- and trans-disciplinary approaches are urgently required worldwide to deal with water resource problems and environmental sustainability challenges.

**Citation:** Kumar, A.; Palmate, S.S.; Shukla, R. Water Quality Modelling, Monitoring, and Mitigation. *Appl. Sci.* **2022**, *12*, 11403. https://doi.org/10.3390/app122211403

Received: 17 July 2022 Accepted: 9 November 2022 Published: 10 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Overview of Water Quality Indexing Models**

In general, water quality index (WQI) models are frequently used to evaluate the WQ of freshwater bodies (e.g., lakes, rivers, and reservoirs) [10,11]. These models use aggregation techniques to convert extensive WQ datasets into a single representative value. Since the 1990s, WQI models have been extensively used to evaluate the WQ of surface water and groundwater [12] based on local criteria because they are easy to handle and free (Figure 1). The literature reports that more than 30 WQI models have been created and introduced worldwide to evaluate the WQ of freshwater bodies [13–15]. WQI models are generally completed in four consecutive stages: (i) the selection of WQ parameters, (ii) the generation of sub-indices for individual parameters, (iii) the calculation of the weighting values of each parameter, and (iv) the summation of all sub-index values to obtain the WQI. The literature also reports a range of applications of WQI models to evaluate the WQ of freshwater systems [10,14–16]. However, most of the models that have been developed are based on site-specific guidelines and are not generic; therefore, their predictions and/or estimations carry large uncertainty, which hinders strategic mitigation measures for WQ control for sustainable ecological life and human use.

**Figure 1.** Commonly used WQI models worldwide [15].
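The four stages described above can be sketched in a few lines of Python. This is only a minimal illustration of a weighted-sum aggregation and does not reproduce any specific WQI model from the cited literature; the parameter names, ideal values, permissible limits, and weights are hypothetical placeholders. In this sketch a higher score indicates poorer water quality.

```python
# Illustrative weighted-sum WQI; every number below is a hypothetical placeholder.

# Stage (i): selected parameters, each with an assumed ideal value,
# an assumed permissible limit, and (stage iii) an assumed weight.
PARAMS = {
    "BOD_mg_L": {"ideal": 0.0, "limit": 5.0, "weight": 0.4},
    "nitrate_mg_L": {"ideal": 0.0, "limit": 45.0, "weight": 0.3},
    "turbidity_NTU": {"ideal": 0.0, "limit": 5.0, "weight": 0.3},
}


def sub_index(value, ideal, limit):
    """Stage (ii): scale a measurement so that ideal -> 0 and limit -> 100."""
    return 100.0 * (value - ideal) / (limit - ideal)


def wqi(sample, params=PARAMS):
    """Stages (iii)-(iv): weight each sub-index, sum, and normalise by total weight."""
    total_weight = sum(p["weight"] for p in params.values())
    weighted = sum(
        p["weight"] * sub_index(sample[name], p["ideal"], p["limit"])
        for name, p in params.items()
    )
    return weighted / total_weight


# Hypothetical measurements for one sampling site.
sample = {"BOD_mg_L": 2.5, "nitrate_mg_L": 9.0, "turbidity_NTU": 5.0}
score = wqi(sample)
print(f"WQI score: {score:.1f}")
```

Because the limits and weights are chosen per site or per guideline, two such indices computed for the same sample can differ, which is one way to see the site-specificity problem noted above.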

#### **3. Water Quality Models, Challenges, and Limitations**

Water quality modeling (WQM) is an important tool that aids environmentalists, policymakers, water resource planners, and managers in strategic water resource management.

However, WQM represents a challenge in the scientific community due to several constraints and limitations. In general, WQ models are classified based on the types of receiving water, the complexity of the models, and the WQ parameters (e.g., nutrients, dissolved oxygen, biological oxygen demand, etc.) that the model can predict. Thus, WQM requires proper standardization, the identification of pollution hotspots and common features, and policy-relevant models. These models save labor costs, materials, and time [16] and help in effective pollution mitigation for the watershed. In the recent era, numerous models have been frequently used to simulate the water quality of freshwater bodies (streams, rivers, reservoirs, and lakes), estuaries, coastal waters, and marine ecosystems [17]. However, because the models apply different theories and algorithms, their outputs can differ considerably; thus, models are useful and produce fruitful results when applied to solve particular environmental problems [18].

Water quality (WQ) models generally fall into two categories: (i) physical and (ii) mathematical models [19]. Furthermore, they can be categorized according to the complexity of model simulation, i.e., 1D, 2D, and 3D; type of approach (conceptual, physical, or empirical); data requirements; types of pollutants; area of application (groundwater, catchment, lake, river, coastal waters, etc.); nature (stochastic or deterministic); and spatial analysis [20]. In recent decades, WQ models such as ANSWERS-2000, AquaChem, MIKE SHE, AGWA, GLEAMS/CREAMS, AQUATOX, APEX, EFDC, EPD-RIV1, BASINS, HSPF, KINEROS2, LSPC, NLEAP, PRMS, QUAL2K, QUAL2E, SWMM, SWAT, WARMF, WAM, WCS, and WASP7 have been frequently used to predict WQ worldwide [21]. Because of data requirements and availability as well as the types of catchment problems, the simplest reliable models dominate over complex models [22].

WQ modeling is still challenging in the scientific domain due to the lack of expert handling of user, site, and/or regionally specific and parameter-specific information as well as inadequacies in model calibration and errors in data reporting. The uncertainty in WQM comes from various sources of errors, such as (i) parametric uncertainty, (ii) structural errors, and (iii) errors in the measurements of the input values and response uncertainty [23]. In developing countries (e.g., India and China), a uniform model standardization system has not been recognized, which limits the extensive utilization of those models for ecological and water management as a result of the lack of benchmarks and comparisons between different modeling outcomes [9,10,16]. Spatial variability is reported as a serious problem in catchment-scale WQM, which generally requires knowledge of catchment behavior, representative site selection, and the integration of nonlinear biogeochemistry [24]. In addition, the complexity of models, the inadequate availability of data, and poor WQ data are other important limiting factors for WQM.
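One common way to explore the first source listed above, parametric uncertainty, is Monte Carlo sampling: the uncertain parameter is drawn many times and the spread of the model outputs is examined. The sketch below uses a deliberately simple first-order decay model as a stand-in for a real WQ model. The model choice, the decay-rate range, and all numbers are assumptions made for illustration and are not taken from the works cited above.

```python
import math
import random
import statistics


def first_order_decay(c0, k, t):
    """Concentration after time t under first-order decay: C = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)


def monte_carlo(c0, k_low, k_high, t, n=10_000, seed=42):
    """Sample the decay rate uniformly in [k_low, k_high] and collect model outputs."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    return [first_order_decay(c0, rng.uniform(k_low, k_high), t) for _ in range(n)]


# Hypothetical inputs: 10 mg/L initial concentration,
# decay rate between 0.1 and 0.3 per day, prediction after 5 days.
outputs = monte_carlo(c0=10.0, k_low=0.1, k_high=0.3, t=5.0)

mean_c = statistics.mean(outputs)
cuts = statistics.quantiles(outputs, n=20)  # cut points at 5%, 10%, ..., 95%
p05, p95 = cuts[0], cuts[-1]

print(f"mean = {mean_c:.2f} mg/L, 5th-95th percentile = {p05:.2f}-{p95:.2f} mg/L")
```

The width of the percentile band reflects the uncertain parameter alone; structural and measurement errors would need to be treated separately.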

#### **4. Water Quality Mitigation Measures**

Water quality mitigation measures or strategies are generally intended to inform and assist communities in identifying potential alternatives to minimize the adverse impacts of pollutants on WQ and to ensure that water is safe for community use [25]. Ultimately, mitigation measures help to protect, restore, preserve, and improve the WQ of receiving water bodies. WQ protection refers to adequately treating runoff to protect downstream resources from WQ degradation [26]. Restoration comes into action if the protection strategies are not sufficient to maintain WQ standards within the permissible limits. Stakeholders from different fields work together to achieve WQ restoration goals [27]. Water quality preservation necessitates a decision-support framework that can be used to evaluate, monitor, and optimize the effects of different drivers on WQ [28]. Furthermore, WQ improvements can be accomplished by identifying the highest priorities for WQ conditions and implementing mitigation strategies to address ongoing issues in a study area (Figure 2) [29,30]. Sometimes, study areas do not follow jurisdictional boundaries; therefore, several stakeholders need to work together to achieve local/regional/national WQ goals.

**Figure 2.** Implementation steps for strategies to mitigate WQ problems [2].

Water quality problems can be mitigated through regulation, remediation, and watershed management [31]. Water regulation in a specific area can control the free discharge of waste from industry or sewage treatment plants by setting standards for each pollutant released into surface waters [31]. Remediation measures, which may be biological, chemical, or physical, help to clean up water contamination. (i) Biological remediation, also called "bioremediation", is a cost-efficient method that uses naturally occurring organisms such as plants, bacteria, and fungi to remove or neutralize water pollutants and to break down hazardous substances into less toxic or nontoxic substances. Human sewage and agricultural chemicals that leach from the soil into the groundwater are generally treated by bioremediation [32,33]. (ii) Chemical remediation methods use chemicals that react with water contaminants to remove them or make them less harmful. (iii) Physical remediation removes water contamination by filtration or disposal. Overall, all three remediation methods are somewhat complex, expensive, and difficult to adapt.

Watershed management strategies consist of reducing the chemicals applied to land, making them more effective for nonpoint source pollution than setting pollution standards [3]. In a watershed, riparian areas promote WQ and limit pollution; therefore, their maintenance and restoration are crucial. Vegetation surrounding a water body absorbs nutrients and provides shade, which keeps the water cool and increases its capacity to hold dissolved oxygen (DO). Additionally, vegetation reduces runoff, promotes infiltration, and lowers soil erosion. Hence, vegetation plays a key role in the effective management of WQ through watershed management. Watershed practices that are beneficial for maintaining WQ standards include (i) regional infiltration basins; (ii) neighborhood-scale practices such as rain gardens, bioretention, and permeable pavement; (iii) stream restoration, including pooling and meandering to enhance infiltration; (iv) floodplain restoration, including floodplain benching; (v) stream (riparian) buffers; (vi) using park green space and fields to store and infiltrate water; (vii) stormwater-friendly post-construction design; and (viii) protecting and restoring natural and human-made wetlands. Some important actions can be taken to prevent water pollution before it happens and to mitigate WQ problems:


Overall, we need to work on mitigating water quality problems and educating friends, family, neighbors, and relatives about the necessary actions for water safety.

Most WQ mitigation measures aim to prohibit illicit discharge, control erosion, reduce pollutants, and control excessive flows. Additionally, strategies consisting of implementing outreach, education, and other activities that promote infiltration, flood reduction, and stable drainage channels could be beneficial for WQ management [34]. Stormwater flow management, floodplain restoration, channel stabilization, and green infrastructure installations are the main strategies to prevent pollutant discharge into surface waters from stormwater, including wastewater. Wetland protection, rehabilitation, and restoration activities improve water quality and quantity and support the maintenance of floodplains in their natural state [35]. Protecting riparian areas and floodplains and keeping hazardous materials away from source water areas can directly safeguard drinking WQ and indirectly protect public health. Sometimes, financial resources limit the application of these mitigation strategies, so prioritization should focus on the important WQ issues that need to be addressed in a short period of time. To overcome this limitation, the provision of grants/funding is also essential to encourage vegetation planting and maintenance over time.

#### **5. Conclusions**

Water quality (WQ) tools and models are described and selected based on their applicability, site- or region-specific qualities, weaknesses, strengths, and whether or not they are intended for commercial or industrial use. The outputs of models and WQ indices differ with the input requirements and data availability and therefore carry large uncertainty; moreover, many models are not freely available for commercial use and require skilled users. Model selection is a demanding task in the scientific domain; therefore, when selecting suitable models for pollution control in freshwater bodies, catchments, or a specific site, the availability of datasets, the complexity of the models, the type of freshwater body, and the intended objectives should all be considered so that mitigation strategies can be implemented in fruitful ways.

**Author Contributions:** Conceptualization, A.K. and S.S.P.; methodology, A.K.; validation, A.K., S.S.P. and R.S.; formal analysis, A.K., S.S.P. and R.S.; investigation, A.K.; resources, A.K.; data curation, A.K.; writing—original draft preparation, A.K., S.S.P. and R.S.; writing—review and editing, A.K., S.S.P. and R.S.; visualization, A.K.; supervision, A.K.; project administration, A.K., S.S.P. and R.S.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was partially supported by the National Natural Science Foundation of China (NSFC) International Young Scientist Project (Grant No. 52150410400).

**Acknowledgments:** We are thankful to the authors for their contributions to this Special Issue, which have been integral to its success. Moreover, the excellent support from the editors (especially Section Managing Editor) and reviewers has been highly encouraging.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Ecosystem Services: A Systematic Literature Review and Future Dimension in Freshwater Ecosystems**

**Deeksha and Anoop Kumar Shukla \***

Manipal School of Architecture and Planning, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India

**\*** Correspondence: anoopgeomatics@gmail.com

**Abstract:** Ecosystem services are part and parcel of human lives. It is of paramount importance to understand the interaction between these ecosystem services, as they are directly related to human life. In the modern era, quantification of ecosystem services (ES) is playing an important role in the proper understanding and efficient management of social–ecological systems. Even though a significant amount of literature is available at present on the topic, there is a need to build an adequate knowledge repository. Hence, a systematic literature review method is used, in which the research question and searching stages are defined. This review study is conducted on ecosystem services and remote-sensing-related keywords in the Scopus database. After a systematic analysis of the papers retrieved from the Elsevier, Scopus database, MDPI, and open source, a total of 140 primary articles were categorized according to their relationship with other ecosystem services, land use, land cover, and planning management. Major findings and important aspects have been analyzed and reported in each category. Based on this analysis and developments in the existing literature, we identify potential areas for future research. The findings indicate that regional or local-level ecosystem services-related work is immensely important and is a hotspot of current research aiming to understand the variability and spatiotemporal dynamics in terrestrial and aquatic ecosystems.

**Keywords:** ecosystem services; provisioning ecosystem services; regulating ecosystem services; cultural ecosystem services; supporting ecosystem services

#### **1. Introduction**

The biophysical state of the ecosystem is affected by multiple elements and, simultaneously, by humans' ability to enjoy its services [1]. Furthermore, anthropogenic and non-anthropogenic interventions can change the biochemical cycles and the earth's energy equilibrium, in turn causing global warming and future climate changes [2]. On the other hand, rapid urbanization degrades ecosystem services [1,3]. Gretchen, in his work, points out that the lifestyle of the people may be hampering the prosperity of ecological biodiversity at the expense of their descendants [4,5]. Talking about ecological biodiversity, the term ecosystem services is described by many authors as the process that helps to sustain human life through the interaction between the natural ecosystem and the species [4,6]. Globally, human systems are supported by nature's contribution, i.e., ecosystem services, and Refs. [7,8] state that land use and land cover (LULC) changes induced by humans have increased over the last three decades, leading to changes from a natural setting to human-conquered land. Furthermore, according to Gómez-Baggethun & Barton [9], more than 50% of the world's inhabitants reside in urban areas and potentially receive benefits from the ecosystem services; the share of people living in urban areas is projected to reach 66% by 2050. The main motive for this research is to find a way to conserve the existing ecosystem against a background of emerging global environmental issues.

**Citation:** Deeksha; Shukla, A.K. Ecosystem Services: A Systematic Literature Review and Future Dimension in Freshwater Ecosystems. *Appl. Sci.* **2022**, *12*, 8518. https://doi.org/10.3390/app12178518

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 11 June 2022 Accepted: 11 August 2022 Published: 25 August 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### *Ecosystem Services*

Ecosystem services (ES) are defined as the benefits people obtain from nature [1,4,6]. ES connects human well-being and natural systems to ecological and economic development, laying a platform between nature and society [10]. Land use and land cover changes, together with a growing population, have a huge impact on ES, leading to its degradation [11–16]. Therefore, evaluation of ES has been a core subject of academic research for years [17,18], and recent interventions also show the readiness of such studies to inform policymakers in undertaking essential decisions in the policymaking process, along with the integration of ecology, geography, and economy [19]. In the study [20], the authors put forward that the authenticity of the ecosystem cannot be based on human intervention alone; it is considered authentic when the researcher considers both pristine and altered forms of the ecosystem, thereby understanding the change in fundamental characteristics of the ecosystem.

ES plays a vital part in an individual's well-being through security provision, meeting the basic needs of day-to-day life, health, and good social relationships. Urban ecosystems are still a critical area of ES research, as half of the world's population dwells in urban areas. According to the Millennium Ecosystem Assessment (MEA), around 60% of global ES has been threatened or used inappropriately, and the same process is expected to continue essentially in the first half of the present century [1]. For this reason, ES has recently come to be considered one of the vital aspects of land use planning and ecological environmental planning and management [19–32].

The interaction between the ESs can take place in two ways. The first is trade-offs, where an increase in one ES results in a decrease in another ES. The second is synergies, where an increase in one ES also leads to an increase in another ES [23,24]. When these relationships recur across space and time, they are called ES bundles [25]. Understanding this relationship is critical, as it focuses on the relationship between ES by concentrating on inherent bundles rather than on discrete ES [26–28]. Bennett et al. state that trade-offs and synergies are caused by the interaction among various ecosystems, so ecosystem services cannot be considered independent [29]. Braat & de Groot infer that the study of various ES is complex [30].
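One simple way to screen pairs of services for possible trade-offs or synergies is to look at the sign of the correlation between their indicator values across sites or years. The sketch below is illustrative only: the indicator names, the values, and the 0.3 threshold are hypothetical, and a correlation by itself does not establish the causal interaction described above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def classify(r, threshold=0.3):
    """Label a pair of services by correlation sign; the threshold is an assumption."""
    if r >= threshold:
        return "synergy"
    if r <= -threshold:
        return "trade-off"
    return "no clear relationship"


# Hypothetical indicator values for the same five sites.
food_production = [1.0, 2.0, 3.0, 4.0, 5.0]
carbon_storage = [9.0, 7.5, 6.0, 4.0, 2.0]
soil_retention = [8.0, 7.0, 6.5, 4.5, 3.0]

r_food_carbon = pearson(food_production, carbon_storage)
r_carbon_soil = pearson(carbon_storage, soil_retention)

print("food vs carbon:", classify(r_food_carbon))
print("carbon vs soil:", classify(r_carbon_soil))
```

Repeating such a pairwise screening over many services and locations is one starting point for identifying the recurring groups referred to as bundles.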

Ecosystems can be monitored at different levels: at a global, regional, or local scale. Global-level studies are carried out worldwide, but researchers suggest studying the services at the local level, which gives a better understanding of the situation and helps in taking up appropriate mitigation strategies at a regional level. This, in turn, helps to achieve sustainable goals at the global level [28]. Although research studies by a wide range of scholars have shed light on the interaction of various ecosystem services in recent years, the amalgamation of our existing knowledge repository and gaps is still inadequate [31].

Various other frameworks emerged in the recent past for ES studies [33]. To account for natural capital, the Common International Classification of Ecosystem Services (CICES) integrates different criteria of various ESs. The framework developed by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) closely captures concepts that relate to nature's contribution to humans. To understand ES, Ref. [1] came up with a base framework for global ES study, under which ES can be classified into four categories: (i) provisioning ecosystem services, (ii) regulating ecosystem services, (iii) cultural ecosystem services, and (iv) supporting ecosystem services.

Provisioning ecosystem services (PES) are defined as the goods that can be directly extracted from nature and consumed and that have a certain market value. Examples of provisioning ecosystem services are water, food, wood, biofuels, etc. For the freshwater supply ecosystem service, for instance, it is necessary to have a well-functioning ecosystem [34]. Climatic factors such as precipitation, evaporation, and climate variability are the important components that control the water yield of a region [34]. Water yield has a positive linkage with evapotranspiration and soil conservation [35], along with other components such as food production, timber, etc.

Regulating ecosystem services (RES) can be defined as the benefits drawn from ecosystem processes that modify the conditions we presently experience. Examples include climate change, carbon storage, soil fertility, floods, etc. [29]. The study emphasizes the relationship in terms of trade-offs and/or synergies of regulating ecosystem services with other ESs, due to which regulating ESs can be considered one of the critical parameters for the assessment of ecological resilience [35]. Managing one ES parameter will improve synergies among other ES parameters, especially among carbon storage, low flow, biodiversity, etc. [36]. Carbon (C) storage is a key attribute in the global service of climate regulation [9]. Practical implementation of C sequestration knowledge takes a back seat in public policymaking due to the lack of effectiveness in translating scientific criteria [37]. Carbon sequestration acts as an important parameter in global climatic regulation [33]. Carbon is stored in four different pools in nature, i.e., aboveground biomass, belowground biomass, soil organic carbon, and dead matter. As most carbon (~70%) is stored in the terrestrial ecosystem [38–40], its carbon dynamics potential could be affected in the future under rising carbon dioxide. Therefore, the carbon present in the soil has a considerable impact on the spatial and non-spatial data. Hence, long- and medium-term modeling that takes different LULC scenarios into consideration is a hotspot of current research, which helps policymakers with the mitigation strategies framework and decision-making process. This process can be re-scaled globally, regionally, and locally by correlating different rationales to economic opportunities and regulatory policymaking.
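The four carbon stores listed above lend themselves to a simple bookkeeping calculation: the total stock of a landscape is the sum, over land-cover classes, of the area multiplied by the per-hectare carbon density of each store. The sketch below illustrates this for two LULC scenarios. The land-cover classes, the densities (in Mg C per hectare), and the areas are hypothetical placeholders and are not values from the cited studies.

```python
# Hypothetical carbon densities (Mg C per hectare) for the four stores:
# aboveground biomass, belowground biomass, soil organic carbon, dead matter.
CARBON_DENSITY = {
    "forest": {"above": 100.0, "below": 25.0, "soil": 80.0, "dead": 10.0},
    "cropland": {"above": 5.0, "below": 1.0, "soil": 50.0, "dead": 0.5},
    "urban": {"above": 1.0, "below": 0.5, "soil": 20.0, "dead": 0.0},
}


def total_carbon(areas_ha, density=CARBON_DENSITY):
    """Sum area x (above + below + soil + dead) over all land-cover classes."""
    return sum(
        area * sum(density[lulc].values())
        for lulc, area in areas_ha.items()
    )


# Two hypothetical LULC scenarios for the same 1000 ha landscape.
baseline = {"forest": 600.0, "cropland": 300.0, "urban": 100.0}
urbanised = {"forest": 400.0, "cropland": 300.0, "urban": 300.0}

stock_baseline = total_carbon(baseline)
stock_urbanised = total_carbon(urbanised)
change = stock_urbanised - stock_baseline  # negative means a net loss of stored carbon

print(f"baseline: {stock_baseline:.0f} Mg C, urbanised: {stock_urbanised:.0f} Mg C")
print(f"change: {change:.0f} Mg C")
```

Comparing scenarios in this way shows how a shift in land cover alone changes the estimated stock, which is the kind of question scenario-based LULC modeling is meant to inform.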

Gómez-Baggethun et al. [41] introduced cultural ecosystem services (CES), but MEA studied and defined them as "the non-material or intangible benefits people obtain from the ecosystem either spiritually, through cognitive development, recreation, self-reflection or through experiencing aesthetically" [1]. In this ES, some of the services, such as recreation, have market value, whereas other services do not. Functions fulfilling life information functionality are the different ways in which cultural ecosystems are included in the study [4,6,42]. Additionally, Sen & Guchhait simplify the definition by correlating humans' sociocultural practices with psychological development [43]. CES is also associated with the intangible benefits that people attain from nature through interaction with it [44,45]. Most of the studies on cultural ecosystem services deal with recreation services that are nature based and aesthetic [46], whereas few studies have been carried out on the spiritual value of landscape due to the limitations of modeling [47].

Supporting ecosystem services (SES) are the fundamental processes of the ecosystem that support life, such as photosynthesis, nutrient cycling, and evolution; this is a vital service that the ecosystem provides, which allows the rest of the ecosystem services to be delivered.

To achieve sustainable development of the city and conserve the ecosystem, it is necessary to understand each of the abovementioned ecosystem services and their interactions with the changing LULC [48]. This study can help provide future research perspectives and proper decision-making strategies. To this end, a literature review for each of the ecosystem services was carried out, providing a global perspective first and then elaborating on the studies conducted at the national level. It provides an overview of the models and methods that are used for the quantitative study of the ecosystem services in focus. The publications used various methods of quantitative assessment, such as spatial mapping, economic valuation, etc. Therefore, the main objective of our paper is to understand (i) the global scenario of ES, (ii) where ES study in India stands, and (iii) the research gaps that could be studied, and a way forward in the same area.

#### **2. Materials and Methods**

#### *2.1. Data Collection*

The literature survey was carried out in November 2021, and data were collected from Science Direct, Scopus, MDPI, official reports, and Wiley. The main strategies of the basic literature review contained four different phases. First, to understand the total number of publications present, we used the keyword "Ecosystem services" to understand the pattern of the study on a yearly basis. Second, to understand works on different types of Ecosystem services, we used the keywords "Ecosystem services and Provisioning Ecosystem services". In the next search, using keywords "Ecosystem services and Regulating Ecosystem services", similarly, the search was carried out for cultural ES and supporting ES and their trend of publication for two decades. This was followed by the search to understand the types of models used to study ES, and also to figure out the types of models catering to ES. Additionally, a search was carried out to understand the trends of individual models serving various parameters of ES. Finally, the investigation was carried out to understand the ES publication in the Indian context using the search keyword "Ecosystem services and India".

After collection of the records, the analysis proceeded in several steps. The first step of the selection criteria was to select the papers that addressed ES throughout the world. This was followed by the data range strategy of selection, wherein the collected data were segregated based on types of publication. In this step, journal articles were selected over books or conference papers, as articles are periodical and are therefore more likely to reflect the current trend of publication; books and conference proceedings were excluded. This step was followed by the title and abstract search, wherein unrelated articles were excluded after going through their titles and abstracts. This was followed by the criteria search, which considered the related variables of the study. As a result, 138 articles were extracted for this review. The assessment parameters for this review are based on the ecosystem services approach and include the date of publication, the context of the publications, the kind of data used/analyzed (qualitative or quantitative), as well as the spatial size of the study. Table 1 gives an insight into the criteria considered for the study.
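The screening steps described above (publication type, then title and abstract relevance) can be expressed as a short filter chain. This is only a sketch of the general idea: the `Record` fields, the keyword tuple, and the sample records are hypothetical, and the review's actual screening was carried out manually against its own criteria.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    abstract: str
    doc_type: str  # e.g. "article", "book", "conference"
    year: int


# Assumed relevance keywords; the real criteria are those listed in Table 1.
KEYWORDS = ("ecosystem service",)


def is_relevant(record):
    """Title and abstract check: keep records that mention a keyword."""
    text = f"{record.title} {record.abstract}".lower()
    return any(keyword in text for keyword in KEYWORDS)


def screen(records, first_year=2000, last_year=2021):
    """Apply the period, publication type, and relevance filters in order."""
    in_period = [r for r in records if first_year <= r.year <= last_year]
    articles = [r for r in in_period if r.doc_type == "article"]
    return [r for r in articles if is_relevant(r)]


records = [
    Record("Mapping ecosystem services in India", "A spatial study.", "article", 2018),
    Record("Ecosystem services handbook", "An overview.", "book", 2015),
    Record("Urban traffic noise", "A study of noise levels.", "article", 2019),
    Record("Ecosystem services of wetlands", "An early study.", "article", 1995),
]

selected = screen(records)
print(f"{len(selected)} of {len(records)} records retained")
```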

**Table 1.** Criteria of Review with Feasible entries.


#### *2.2. Data Analysis*

The data analysis consisted of the database of search records that was built during the data collection process. The study can be considered susceptible given the huge amount of data present in the database. To avoid arbitrary comments, a systematic review was carried out on 140 selected papers that dug deeper to understand the knowledge base of the subject, which excluded book chapters, student theses, and reports. Regardless, the search output is considered authenticated and peer reviewed, as it was taken from distinguished journal article databases. The data were taken as a basis for future study (Figure 1).



**Figure 1.** Showing the flow chart of the data analysis for ecosystem services.

### **3. Results**

#### *3.1. Mapping of Publication*

Research publications until early 2000 were fewer in number (Figure 2). The reason was unfamiliarity with the subject; later, the work changed the whole lens on how the ecosystem was viewed [4,6]. From 2000 to 2005, we can find approximately two thousand papers on ecosystem services (Figure 2). Later, once the MEA [1] was published, we found a sudden rise in the graph, which denotes that universal attention was attained by ecosystem service research. Later on, it became one of the core research areas among academicians and scholars. From the years 2005 to 2010, we found publications that provided an insight into the trade-off and synergies among the ecosystem services. Following MEA, in 2010, TEEB [46] came up with a newer lens of added economic value to the ES. In 2011, CICES [49] gave common ground for all international works related to ecosystem services. Post-2015 marks a prolific change in the number of publications on ecosystem services, with the publication of the Sustainable Development Goals (SDG) 2030. These are aimed at making cities locally and globally sustainable due to the change in global climatic aspects.

**Figure 2.** Number of publications on ecosystem services for two decades.

#### *3.2. Chronological Publication on ES Papers*

Ecosystem services papers published between the years 2000 and 2005 show that awareness of the subject was limited, wherein a critical understanding of the same was not present (Figure 3). Post MEA, the publications on the ecosystem services increased; today, we find 20,000 publications based on ESs (Figure 3). Sustainable development goals gave the necessary push required for the study and to make the cities more sustainable. MEA formed the ecosystem framework, along with four major categories.

**Figure 3.** Number of publications on ecosystem services classification—chronology. Where PES indicates provisioning ecosystem services, RES is regulating ecosystem services, CES is cultural ecosystem services and SES is supporting ecosystem services.

The number of papers was classified into four ES categories (Figure 4). Sixteen percent of the papers discuss provisioning ecosystem services, basically focusing on agricultural products, freshwater bodies, food, etc. (Figure 3). The publication trend of PES is gradually increasing day by day and is more focused on the water and agriculture-related aspect, as it has a significant role to play in every human life today. Regulating ES has 17% of paper publications, mainly focusing on the vital aspects of the present-day scenario, i.e., climatic changes, carbon sequestration, floods, soil erosion, etc. The publication trend of RES falls in line with the provisioning ecosystem, as we find trade-offs and synergies among the ESs [29], so it is important to study critical aspects on the same basis (Figure 4).

**Figure 4.** Number of publications on ecosystem services classification. Where PES denotes provisioning ecosystem services, RES is regulating ecosystem services, CES is cultural ecosystem services, and SES is supporting ecosystem services.

#### *3.3. Models Used to Assess Ecosystem Services*

Modeling of the ES helps the researcher to quantify, spatially locate, and potentially evaluate the economic trends. Daily et al. [19] point out that this information plays a vital role in the decision making of urban planners, urban designers, and policymakers attempting to understand the effect of urban expansion on ES. In the present scenario, there is a proliferation of models and tools that help us to map and assess ES [42,50,51].

Over time, numerous studies tried to simultaneously understand land use changes and their impact on ES, which helped designers and policymakers take appropriate steps to overcome the issue. To monitor LULC changes and ES changes, satellite images have been globally used as the most accurate tool [52,53]. Models are used to investigate the interactions (such as a trade-off, synergies, bundles/clusters, and flows) of ES, and deliberately put forward benefits that are enjoyed by humans for their well-being [54–56]. There is much importance given to enhancing ecosystem service management by objectively quantifying interactions among various ES [57].

Integrated Valuation of Environmental Services and Trade-offs (InVEST) is a globally accepted tool that was developed within the Natural Capital Project [50,54–59]. The InVEST model can produce a spatially visualized map of the ESs. Compared with other models, InVEST does not require any expertise; it provides a nearly accurate assessment with limited input data requirements, and is relevant in understanding the areas dealing with ecological processes [50,51,60]. The InVEST model is a useful tool for small-scale and local studies, giving relevant and credible results for LULC and ES [58]. The InVEST toolbox is used to determine supply changes for nearly 14 ES using user-defined base setups such as land use land cover and climatic changes [61,62].

The Soil and Water Assessment Tool (SWAT) is universally used to simulate hydrological processes [63]. Further, the model offers flexibility in spatial discretization, so that it can evaluate space locally, regionally, and globally. Likewise, a fair number of other models are used to assess ES changes; some of them are ARIES, LUCI, CA-Markov, SLEUTH, CLUE-S, etc.

The study of ecosystem services is performed quantitatively using mapping and modeling techniques. Researchers have also used combinations of models to assess ES, such as a model that maps ES (e.g., InVEST, SWAT, ARIES) combined with a model that maps urban expansion. With the help of the statistical model, the mapping was carried out. Urban expansion models such as LUSD–urban (Land Use Scenario Dynamics–urban) [64] help in a multi-scale simulation of urban expansion; LUSD–urban, along with Cellular Automata (CA) and system dynamics models, signifies micro-scale evolutionary factors and macro-scale resource constraints. This model has undergone certain iterations in recent years, with improved accuracy and an average kappa index [65]. The other models are SLEUTH (slope, land use, exclusion, urban extent, transportation, and hill shade) [66] and CLUE-S (the Conversion of Land Use and its Effects at Small regional extent) [67]. Statistical models such as correlation analysis [13,68], regression analysis [28], and root mean square deviation were used [69]. This combination of models is efficient at forming the correlation among a few variables but is not considered to be functionally viable. The most celebrated models are InVEST, ARIES, and SWAT. Figure 5 gives a brief idea of various models used by researchers.

**Figure 5.** Number of publications on assessment models of ecosystem services.
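Two of the accuracy measures mentioned in this section, the kappa index used to validate simulated LULC maps and the root mean square deviation, can be computed as follows. This is a minimal sketch using the standard definitions; the confusion matrix and the value lists are hypothetical and do not come from any of the cited studies.

```python
import math


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are reference classes, columns are simulated classes.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cells on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


def rmsd(observed, simulated):
    """Root mean square deviation between two equally long sequences."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical two-class example: urban vs. non-urban cells.
confusion = [[45, 5],
             [10, 40]]
kappa = cohen_kappa(confusion)
deviation = rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(f"kappa = {kappa:.2f}, RMSD = {deviation:.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is preferred over plain overall accuracy when one LULC class dominates the map.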

The most commonly used base data are LULC, soil data, terrain data, and hydrological data. These give a whole picture of different criteria such as habitats, soil types, vegetation class, and biomes. According to Metzger et al., the above data are used as ecosystem services indicators [70]. In addition, the same data can be used for valuation and spatial estimation of ecosystem services [71]. The other types of data used for ES assessment are census data; climatic data such as precipitation data, which are used for water yield assessment; and a digital elevation model (DEM), which is used for hydrology assessment [72].

The models used to assess ESs spatiotemporally are InVEST [60], SWAT [61], ARIES (Artificial Intelligence for Ecosystem Services) [73], LUCI, etc. According to the publication trend from 2000 till 2021, we find the InVEST model is being used extensively due to its input data criteria; it uses open source data that are freely available, with a mapping/modeling scale of 30 m × 30 m. This model helps us to assess multiple ecosystem services (water quality, soil erosion, carbon sequestration, biodiversity conservation, nutrients, agricultural produce, etc.) [72].
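To illustrate how precipitation data and LULC enter a water yield assessment, the sketch below uses a Budyko-type annual water balance (Fu's curve), which is the general family of approach used by annual water yield models. It is an assumption-laden illustration rather than the implementation of InVEST or any other tool: the function name, the land-cover coefficients, and the value of the shape parameter `omega` are all hypothetical.

```python
def annual_water_yield_mm(precip_mm, reference_et_mm,
                          crop_coefficient=1.0, omega=2.5):
    """Annual water yield (mm) of one cell from a Budyko-type balance.

    All parameter names and default values are illustrative assumptions.
    """
    if precip_mm <= 0:
        return 0.0
    # Potential evapotranspiration scaled by a land-cover coefficient.
    pet_mm = crop_coefficient * reference_et_mm
    dryness = pet_mm / precip_mm
    # Fu's curve: fraction of precipitation lost to actual evapotranspiration.
    aet_fraction = 1.0 + dryness - (1.0 + dryness**omega) ** (1.0 / omega)
    return precip_mm * (1.0 - aet_fraction)


# Hypothetical coefficients for three LULC classes under the same climate.
coefficients = {"forest": 1.0, "cropland": 0.7, "built_up": 0.3}
yields = {
    name: annual_water_yield_mm(1200.0, 900.0, kc)
    for name, kc in coefficients.items()
}
for name, value in yields.items():
    print(f"{name}: {value:.1f} mm")
```

Under this formulation, a land cover with a lower evapotranspiration coefficient returns more of the precipitation as yield, which is one way LULC change propagates into the water yield service.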

#### *3.4. World ES Publication Status*

Ecosystem service studies have proliferated since the MEA [1] due to rapid urbanization, which causes temporal changes such as climatic variation, global warming, etc. We find that work on ecosystem services is mainly carried out in developed countries. It is of prime importance to study the changing trajectory of spatial settings in developing countries, as the population increase demands changes in the land use and land cover dynamics. The interrelation between the ESs and human activities plays a critical role in global climatic conditions. Publication trends in the world ES scenario are showcased in Figures 6 and 7.

**Figure 7.** Trend of Asia's ecosystem service publications.

In the present scenario, we find ES studied globally in three ways: (1) estimating the physical quantity of services provided [74], which is the primary work carried out to understand the influence of LULC change on ES, as well as its impact on the climatic aspect; (2) the economic value [75], which is needed to understand the income that is generated due to the interaction of ES, and also helps in estimating the economic loss due to the deterioration of ES; and (3) the basic benefit transfer method [6].

#### *3.5. ES Publication Status of India*

The publication trend of ecosystem services in the Indian context gives us a brief idea of the present knowledge gap. After the search carried out in the Scopus database, we found hardly 200 papers published on ESs (Figure 8). India is a very diverse country regarding its spatial, temporal, and cultural aspects. It will be of prime importance for the study of ES to bring about awareness of ESs' influence on recent global temporal changes. India is a developing country; hence, it has experienced a lot of spatiotemporal changes in recent decades.

**Figure 8.** Indian Ecosystem service Publication trends.

India is peninsular; it is surrounded by the sea on three sides. With this being said, mangrove plays an important role in protecting the coastal region. Table 2 gives a brief idea on the studies conducted on ES in India. Giri et al. studied the status of mangrove forests in Southeast Asia [76]. Prasad et al. [77] studied the rate of degradation of seagrass impacting regulating ES due to human activities, whereas Edward et al. gave insight into methods of restoring seagrass [78]. The study conducted by [79] to understand the spatiotemporal dynamics in the mid-sized town of Telangana using statistical methods showed unsustainable growth trends among LULC variables, making study of the patterns vital. Studies were conducted to understand the degradation and rate of sedimentation of wetlands in the Western Himalayan region of Himachal Pradesh, showing a large-scale unregulated development causing the damage to ES [80]. Furthermore, Sannigrahi et al. [81] measured 17 ESs; Sannigrahi et al. [82] showed that climatic factors, biophysical factors, and environmental stress significantly affect the ESs in the Sundarbans region. The seasonal variation was captured using GHG (Green House Gas) on carbon pools in the degraded Sundarbans region [83]. Talukdar et al. [84] demonstrated the relationship between LULC and changes in ES; later on, Das et al. [85] shed light on decreasing ecosystem health in the lower Gangetic Plain region. Stakeholder participation plays a vital role in conserving ES. Sinclair et al. showed the willingness of the stakeholders to maintain the same [86]. ESs by the world hotspot region of Western Ghats played a vital role, elaborating on the impact of LULC on the ecological hotspot region and ES of Western Ghats [87,88]. Water richness and wetland habitable suitability criteria are important for understanding the habitat suitability of a populated region [89]. 
Further, there are stresses related to the dynamics of soil carbon in alternative cropping techniques [90,91]. Shah et al. [92] came up with the framework to understand the ecosystem services with a comprehensive view of common resources used by policymakers to attain sustainability.

**Table 2.** Studies conducted on ESs in India.


Where P denotes provisioning ES, R is regulating ES, S is supporting ES, and C is cultural ES.

#### **4. Discussion**

#### *4.1. Contribution of ES and Global Issues*

The Millennium Ecosystem Assessment [1] inferred that 15 of 24 ecosystem services had degraded globally. Anthropogenic activities are the main reason for 60% of the deterioration of provisioning ecosystem services [1,94–96]. On a global scale, we projected the impact of LULC on ecosystem services and concluded that changes in LULC can deteriorate the ES [97].

Bennett et al. [29] infer that trade-offs and/or synergies do take place between regulating ecosystem services and other ESs, considering this the main determinant to assess the ecological changes [35]. In the global scenario of the past five decades, due to an increase in the population, demand, and usage of water, intensive agricultural produce, industrialization, and economic growth, Ref. [98] pinpoints that the use of water has tripled. Studies conducted by [14,99–101] show that LULC plays a significant role in the water yield. A trade-off relationship was found by Zhang et al. [102] between provisioning ES and soil conservation. The study conducted by Yi et al. [103] found that there is a significant connection between carbon storage and soil sediment retention in urban watersheds and river basins. However, it was also shown that the synergies and trade-offs occur in different scenarios [104]. Hence, a study analyzing various ESs is vital to urban management, planning, and policy decision making [28,105].
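Trade-offs and synergies of the kind discussed above are commonly screened with a correlation between paired ES values, where a positive coefficient suggests a synergy and a negative one a trade-off. The sketch below assumes a Pearson correlation and an arbitrary threshold of 0.3; the sample values per spatial unit are hypothetical and are not taken from the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify_interaction(r, threshold=0.3):
    """Label an ES pair; the threshold is an illustrative assumption."""
    if r >= threshold:
        return "synergy"
    if r <= -threshold:
        return "trade-off"
    return "no clear relationship"


# Hypothetical ES values for five spatial units (e.g., sub-watersheds).
water_yield = [620.0, 580.0, 540.0, 500.0, 450.0]
soil_conservation = [12.0, 15.0, 17.0, 22.0, 25.0]
carbon_storage = [80.0, 95.0, 100.0, 120.0, 130.0]

pairs = {
    "water yield vs. soil conservation": (water_yield, soil_conservation),
    "carbon storage vs. soil conservation": (carbon_storage, soil_conservation),
}
labels = {name: classify_interaction(pearson_r(a, b)) for name, (a, b) in pairs.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Because such relationships are scale dependent, the same pair of services may need to be tested separately at local and regional extents.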

The literature shows that there is a direct relationship between the carbon and soil-based ESs [106–108]; Rodríguez et al. [109] showed the existence of a positive correlation between aboveground carbon storage and water regulation and supply. However, knowledge related to the potential of the coastal belt to regulate climate change and emission levels due to different anthropogenic activities is not available. Nevertheless, it is critical to assess the influence of carbon sequestration on climate change under different LULC in susceptible areas that are sensitive to changing processes. Many research scholars have studied the influence of LULC on carbon stock and climatic changes [110]. Following this, Ref. [111] established the relationship between precipitation variation and LULC and its influence on ES.

The study of soil erosion has caught the attention of researchers recently, and studies have been conducted globally on various scales [112]. The study conducted by Vaezi et al. [113] showed the result of ecosystem services hampered due to soil erosion, desertification, etc. Additionally, due to the presence of spatial heterogeneity, Ref. [114] explains the importance of soil-related study at various scale dynamics. This helps us to understand the effect of soil and its trade-off and/or synergies at various scales, as demonstrated by [115].

The major issue found globally today, as summarized by [116–119], is that ES have been critically impacted by the intense interaction between the ecosystem and humans at a regional scale, and this has to be addressed with immediate effect. This is also important according to the study [120], which found that the relationship between ES is spatially heterogeneous. Some researchers have found that LULC change on a smaller scale can transform into synergy in spatiotemporal distribution at a larger scale [121–123]. Sun et al. [104] stated that future studies at a regional scale should be designed to improve various scenarios in a detailed way to cater to the local situation and policy planning. Researchers such as [82,124,125] found that there was a significant influence of climatic factors on the variations in ESs. Therefore, Refs. [126,127] suggest that an effective planning management strategy is to incorporate ES bundles and hotspots in the decision-making process.

#### *4.2. Way forward to ES Research in India*

Based on the literature review conducted, there is a need to account for ecosystem services on different LULC in Indian scenarios, varying in urban settings and geolocation of the urban areas. On a national scale, we have the work of LULC changes giving an insight into the change dynamics in land use. There is empirical evidence that shows there are evident changes in the structure of ES due to urban expansion, which will lead to the degradation of the same [65]. However, little is known about the intensity of ES losses at a regional scale in the Indian context, due to the influence of LULC. It is also important to understand the influence of climatic factors on ES at a regional scale, providing better service to society.

The reason to select the ESs is that first, ESs such as carbon sequestration, water yield, and soil retention are the common focus areas for research study, as they represent the ESs subset [128–130]. Second, the quantitative methods of spatially analyzing the driving forces of these ESs can be reinforced by the efficient availability of large-scale data [130,131]. Vallet et al. [132] state that the study of ES interaction is important when it comes to questioning the usefulness of various criteria to come up with appropriate decision-making needs and expectations. From the literature survey, it is seen that provision, regulation, and support of ESs are threatened [27].

Even though there is a prolific amount of research on ES, there must be in-depth knowledge about the relationships and trade-offs among various ES [27], which should also be explored in terms of emerging climatic changes [27]. Additionally, Refs. [36,84] suggest that research should be conducted on every site-specific scenario, leading to informed design management strategies, which in turn elevate ES benefits. The authors of [132] investigated the relationship between ES and urbanization, and concluded that for LULC, topography has a greater influence on ESs than urbanization.

#### **5. Conclusions**

Understanding the changes in ESs and their relationship with the help of spatially explicit methods could be helpful for the study to be conducted. In the present review, research dynamics of ES in the global scenario are given, and are then narrowed down to the national scale of the Indian scenario between 2000 and 2021. This analysis is based on 138 articles gathered from the databases of Science Direct, etc., with the help of bibliometric statistics such as keywords, countries, and outcomes. Additionally, ecosystem types, geographical location of the studies conducted, and assessment and valuation methods are in the limelight. The number of publications in the Indian context is gradually increasing. We find a steady increase in the publication trend post-2015. We find publications focusing on the study of two or more ES categories. Crop production and water yield have focused on provisioning ecosystem services; carbon sequestration, soil conservation for regulating ecosystem services; biodiversity conservation, along with the nutrient cycle, for supporting ecosystem services; psychological behavior, and quality of life in cultural ecosystems. We find these studies have been completed, since these are considered major determinants ruling these ESs along with the help of readily available research methodological framework. On the other hand, due to lack of methods and data, it is difficult to map the remaining ESs. In the Indian context, water, carbon, and soil play a major role in improving the socioeconomical aspect of ES.

India is a peninsular country with a wide variety of physical landscapes, including croplands, woods, grasslands, deserts, rivers, lakes, deltas, shelves, oceans, mountains, plateaus, basins, and islands. With its existing rapid economic growth and massive urbanization, India has become increasingly vulnerable to both natural disasters, such as droughts and floods, and human-caused ecological disasters, such as deforestation, salinization, erosion, and water, air, and soil pollution. The ES transdisciplinary paradigm serves as a useful framework for analyzing diversified natural assets and addressing environmental issues through integrated ecosystem management. However, the dataset built in this review work is not comprehensive; it can serve as a foundation for future studies, with the hope of creating a complete ES research database at the national level.

**Author Contributions:** Conceptualization, D. and A.K.S.; methodology, D. and A.K.S.; validation, D. and A.K.S.; formal analysis, D. and A.K.S.; writing—original draft preparation, D. and A.K.S.; writing—review and editing, A.K.S.; supervision, A.K.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author, upon reasonable request.

**Acknowledgments:** The authors are thankful to the Scopus database, Science Direct, Web of Science, and Google Scholar for providing the ecosystem services-related publications. The authors are also thankful to the anonymous reviewers for their constructive suggestions as to how to improve the quality of the present work.

**Conflicts of Interest:** This manuscript has not been published or presented elsewhere in part or entirety and is not under consideration by another journal. There are no conflicts of interest to declare.

#### **References**


**Andrzej Bielski \* and Cezary Toś**

Faculty of Environmental Engineering and Energy, Cracow University of Technology, Warszawska 24, 31-155 Cracow, Poland; cezary.tos@pk.edu.pl

**\*** Correspondence: abielski@riad.usk.pk.edu.pl

**Abstract:** This study examines the chlorophyll a content and turbidity in the shallow dam reservoir of Lake Dobczyce. The analysis of satellite images for thirteen wavelength ranges enabled the selection of wavelengths applicable for a remote determination of chlorophyll a and turbidity. The selection was completed as the test of the significance of the coefficients in the equation, which calculates the values of the parameters on the basis of reflectance. The reflectance of the reservoir surface differs from the reflectance of individual water components, and the overlapping of spectral curves makes it difficult to isolate the significant reflectance. In the case of Lake Dobczyce, the significant reflectance was for wavelengths 665, 705, 740, and 842 nm (chlorophyll a) and for wavelengths 705, 740, and 783 nm (turbidity). In the model, the natural logarithm of chlorophyll a or turbidity was a linear combination of the natural log reflectance and the squares of those logarithms. A lake surface reflectance also includes the bottom reflectance. The reflectance obtained from the Sentinel-2 satellite was corrected with a bottom reflectance determined using the Lambert–Beer equation. The reflectance of a given surface may vary with the position of both the satellite and the sun, atmospheric pollution, and other factors. Correction of reflectance from satellite measurements was performed, as reflectance changes for the reference surface; the reference reflectance was assumed as the first reflectance of the reference surface observed during the study. The models helped to develop the maps of turbidity and chlorophyll a content in the lake.

**Keywords:** Sentinel-2; chlorophyll; turbidity; lake; concentration modeling of contaminants

#### **1. Introduction**

The chlorophyll content in surface waters is related to nutrients, such as nitrogen and phosphorus, and serves as one of many indicators of eutrophication. High amounts of these elements in water contribute to the excessive growth of algae, resulting in poor water quality. This topic is especially urgent in surface water intakes used for municipal or industrial consumers. The chlorophyll content in water can be determined by the traditional laboratory method, based on acetone extraction and absorbance measurement [1,2]. However, to track down changes in the chlorophyll content in rivers or lakes, and to develop concentration maps, a large number of analyses would have to be done; it would be a time-consuming and ineffective approach.

Satellite images of water surfaces allow for faster and more cost-effective estimation of chlorophyll content in water [3]. Another advantage of remote sensing (teledetection) is that it enables a spatial analysis of chlorophyll concentrations, whereas in situ data are collected only at particular points [4]. There are, however, a number of problems that the remote sensing of chlorophyll faces in surface waters. Water regime [5], as well as the reservoir's depth [6], are associated with the chemical composition and content of biological elements in water. Radiometric and atmospheric corrections also play an important role in the case of satellite data. Correction models such as ATCOR [7], Second Simulation of a Satellite Signal in the Solar Spectrum (6SV), Acolite, or Sen2cor are available and used in these studies. All these concerns have prompted the use of empirical [8,9] or semi-analytical approaches in the development of teledetection methods for monitoring chlorophyll content in inland waters. These methods are based on the physics of the interactions of radiation with water and its compounds [10,11]. In recent years, neural networks and machine learning methods have been employed in the research on chlorophyll detection [8,12].

**Citation:** Bielski, A.; Toś, C. Remote Sensing of the Water Quality Parameters for a Shallow Dam Reservoir. *Appl. Sci.* **2022**, *12*, 6734. https://doi.org/10.3390/app12136734

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 25 May 2022 Accepted: 1 July 2022 Published: 2 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Empirical methods, though well suited to local conditions [13], require quite a lot of data. On the other hand, semi-analytical methods are more universal, but provide less accurate results. Deep learning methods are still in the early stage of development, and they require a large amount of heterogeneous training data; their complexity delays their implementation for small objects, such as water reservoirs.

The research focused on automating the estimation of turbidity (in nephelometric turbidity units, NTU) and chlorophyll content in water taken from shallow dam reservoirs used for drinking purposes. In such reservoirs, water is usually classified as case 2, where optical properties are a function of at least three water components, i.e., phytoplankton, suspended sediments, and colored dissolved organic matter [14]. Such objects, being at fixed locations, usually have long observation time series of basic water quality indicators (turbidity, chlorophyll). The described remote water quality research favors the empirical method, which, however, must include the optical interaction between the main pollution components, i.e., mineral suspension and phytoplankton, and the shallow reservoir. Therefore, the authors offer a combination of statistical models that include these elements. The models have been verified using Sentinel-2 data for Lake Dobczyce; the lake serves as the drinking water reservoir for Krakow, Poland.

#### **2. Data, Methods, and Techniques**

#### *2.1. Remote Sensing Methods for a Chlorophyll Content*

In surface waters, algae and bacteria contain many pigments that can be analyzed using spectral methods. Listed according to color, these are, e.g., chlorophylls a, b, c, c1, c2, d, e, f, and g—green; carotene—orange; xanthophyll—yellow; phycoerythrin—red; phycocyanin—blue; and fucoxanthin—brown. The dominant pigments in photosynthetic organs are chlorophyll a (blue-green) and chlorophyll b (yellow-green).

Chlorophyll a is the most frequently used indicator of surface water quality. The surface spectral reflectance curves for water with different chlorophyll a concentrations are presented in Figure 1. Blue and far-red light ranges are strongly absorbed by chlorophyll, while the reflectance peaks are recorded at the wavelengths of approximately 566 and 688 nm (Figure 1).

**Figure 1.** Surface remote sensing reflectance spectrum for waters with different concentrations of chlorophyll a, collected in the study area (119°52′–119°54′ E, 26°16′–26°19′ N) at 10:00–15:40, 2 June 2003. Reprinted with permission from [15].

Mineral and organic suspensions may pose a serious difficulty in studying the concentration of chlorophyll a in surface waters. Both suspensions result in the reflectance peaks at approximately 550, 712.5, and 805 nm (Figure 2), so a shift in the peaks only takes place relative to the peaks for chlorophyll a. The suspensions may also enhance the reflectance in relation to that of chlorophyll a and therefore, models describing the relationship between chlorophyll a and the reflectance should consider several wavelengths.

**Figure 2.** Relative contributions of chlorophyll and suspended sediment to the reflectance spectrum of the surface water, based on in situ laboratory measurements made 1 m above the water surface by the authors of [16,17]. Reprinted with permission from [16,17].

The presence of clay or dusty particles in water results in a specific spectrogram (Figures 3 and 4) [18]. High reflectance in the range of 580–690 nm and reflectance near 810 nm will strongly distort the chlorophyll a spectrogram. The first range is characteristic for a specific type of suspension. In the case of lake bottom sediments that are composed of clay or dusty suspensions, they will have a similar spectrogram. The reflectance pattern around 810 nm is similar for clay and dusty suspensions and therefore, the reflectance around 810 nm can be used to estimate the suspension concentration.

**Figure 3.** Reflectance for water with a clay suspension [g/m3]. Reprinted with permission from [18].

**Figure 4.** Reflectance for water with a dusty suspension [g/m3]. Reprinted with permission from [18].

Mathematical models using the reflectance R to determine the concentration of chlorophyll a, C<sub>chl-a</sub>, in water are developed for a given type of surface water. These are the most abundant types of empirical models. Models such as GlobColour or MODIS Aqua can be used globally or adapted to local reservoir conditions. They can take the form of a polynomial dependence, as shown in [19,20], a dependence in the form of products or quotients of expressions [21,22], or a logarithmic form [23,24]. The estimation of model parameters can be carried out in different ways, using multiple linear regression, support vector machine regression (SVR), or genetic algorithms. Examples of models used to determine the chlorophyll a content in the water are presented in Table 1.


**Table 1.** Typical mathematical formulas to calculate chlorophyll a concentrations.

The models are characterized by a different precision of results. Johnson et al., 2013 [25], testing the global models GlobColour and MODIS Aqua, obtained low coefficients of determination R<sup>2</sup> of 0.25–0.51; on the other hand, for local conditions, this indicator may reach up to 0.96, with an RMSE for chlorophyll a content of 0.07 mg chl/m<sup>3</sup>, determined on the basis of a small amount of data covering 5 days of composite images for the year 2014 [24].
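The two accuracy indicators quoted above can be computed as in the following minimal sketch. It assumes that R<sup>2</sup> is taken as 1 − SS<sub>res</sub>/SS<sub>tot</sub> (one common definition; the cited studies may use the squared correlation coefficient instead), and the sample values are hypothetical, not data from the cited works.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical chlorophyll a values in mg chl/m^3 (illustration only).
measured = [1.0, 2.0, 3.0]
modelled = [1.0, 2.0, 4.0]
print(rmse(measured, modelled), r_squared(measured, modelled))
```

Because RMSE carries the units of the concentration, it should be read together with the concentration range of the data set when models are compared.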

Empirical models that are focused on determining the chlorophyll a concentration take into account the variability of the aquatic environment in terms of the content of other substances in a very limited way, which may affect the accuracy of the results obtained.

A possible solution to this problem in the case of inland waters may be the construction of seasonal models (used for imagery registered in specific seasons of the year) [26]. Another, more widely used technique is the use of bio-optical methods based on the modeling of light-water interaction, such as Hydrolight [27], or the newer 2SeaColor [28], adapted to water with high turbidity.

For the latter model, the coefficient of determination was 0.71, and the model error (RMSE) was 6.23 mg chl/m<sup>3</sup> [28]. Such models may include bottom reflectance [29,30], which is extremely important for shallow reservoirs, as pointed out by Hicks et al., 2013 [31].

In the works of Li et al., 2017, 2018 [32,33], a semi-analytical method for colored dissolved organic matter (CDOM) was presented. This method includes the bottom effect and is accurate (RMSE = 0.17 mg chl/m<sup>3</sup> and R<sup>2</sup> = 0.87); it also uses multispectral data.

An extremely promising result for turbid water research was the model developed by Gitelson et al. [21]. Finding three wavelength ranges that guarantee the least error in estimating the chlorophyll a content enabled an accuracy of 7.8 mg chl a/m<sup>3</sup> in the range of 1.2 to 236 mg chl a/m<sup>3</sup>, with a medium relative error of 18.3% and a high coefficient of determination, R<sup>2</sup> ≈ 0.96. The version of the model with two wavelength ranges provided much worse results. Despite the high accuracy of the in situ research, for low concentrations (below 10 mg chl a/m<sup>3</sup>) the relative errors reach almost 100%.

Remote bathymetric tests of shallow water, conducted by Lee et al. [34,35], make it possible to link the effect of bottom reflectance and chlorophyll concentration. The developed model includes many different reflectances, including bottom reflectance and the absorption of radiation for various water components, such as phytoplankton pigments. However, the model is focused on the remote determination of depth. The number of model parameters is very large, making it troublesome to simultaneously determine the data for several wavelengths. Therefore, the absorption of radiation by phytoplankton is an input parameter for this model, not an output enabling the determination of chlorophyll concentration. In the case of oligotrophic sea waters, the computing models are simpler. Computational difficulties appear when many water components with spectrally different properties affect the calculation of the content of a given water component. This usually requires the use of the reflectances of many water components for many wavelengths. Particularly large complications appear in the case of shallow and turbid water.

The model of chlorophyll a content estimation in a reservoir used for drinking water intake was presented in this work. This application model is characterized by high accuracy, taking into account the bottom effect and interactions between chlorophyll and mineral or organic suspension.

#### *2.2. Area of Study*

The research was carried out in the area of Lake Dobczyce, located in the Myślenice poviat, in the Lesser Poland voivodeship of Poland. It is a dam reservoir, constructed in 1986 by damming the Raba river with a 30 m high and 617 m long dam. The reservoir area is approximately 10.7 km<sup>2</sup>, and its total capacity is 127 million m<sup>3</sup>. The reservoir serves as a source of drinking water. The lake water quality is determined by the quality of the Raba river, which varies due to anthropogenic activities and precipitation. These factors contribute to significant fluctuations in the concentrations of nutrients, mineral and organic suspensions, and other substances. The depth of the lake is just over twenty-six meters (Figure 5).

**Figure 5.** Depth map of Lake Dobczyce.

In the case of Dobczyce Lake, it is difficult to determine its range, especially in the western shallow region. This problem was solved by using information on the area elevation of the water surface and lidar altitude data (ALS), which, due to its high density, was subjected to resampling from the 5 × 5 m network using the MIN method, suggested by Śliwiński et al., 2022 [36], as the most suitable method for this type of task.

#### *2.3. Data for a Remote Sensing Research*

Images from Sentinel-2 (L1C product), registered from 13 April 2016 to 31 December 2021, were used for the remote-sensing modeling of water quality in Lake Dobczyce. The choice of specific images was determined by the dates of the in situ water quality tests. A significant limitation in the number of images resulted from the rejection of images in which clouds or ice covered the lake.

The content of chlorophyll a and suspended solids (as turbidity) in the water was measured in the period from 2016 to 2021 in the laboratory of the Waterworks of the City of Krakow, located at the water intake. Chlorophyll a was analyzed in acetone extracts using the monochromatic spectrophotometric method, with correction for pheopigment a [2,37]; absorbance was measured on a Hach DR 4000 U spectrophotometer, while turbidity was measured on a TL 2360 spectrophotometer, according to the standard methods [38].

#### *2.4. Spectral Reflectance Curves*

To compare the data recorded by Sentinel-2 satellites at different times, they were normalized to the conditions above the reference surface. The reference area was established as a fragment of the lakeside dam crest; it shows a low reflectance, and its properties are stable in time. The air composition over the reference surface was assumed to be the same as that over the lake surface. To correct the lake surface reflectance obtained from satellite data, a special corrective parameter was introduced. It is the ratio of the reflectance registered in the reference field on 13 April 2016 (the base date) to the actual reflectance in the same field on a given day. The normalized reflectance for the lake surface was the product of the corrective parameter and the reflectance of a given point on the lake surface on a given date. Such normalization is treated as an atmospheric correction. It made it possible to accommodate changes in air quality over the reference surface and the lake surface.
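The normalization described above can be sketched per spectral band as follows. This is a minimal illustration, not the authors' processing chain; the function names and all reflectance values are hypothetical.

```python
def corrective_parameter(ref_base, ref_day):
    """Ratio of the reference-field reflectance on the base date
    to the reference-field reflectance on the day of interest."""
    return ref_base / ref_day

def normalize_reflectance(lake_day, ref_base, ref_day):
    """Multiply each band's lake reflectance by that band's corrective parameter."""
    return {
        band: corrective_parameter(ref_base[band], ref_day[band]) * value
        for band, value in lake_day.items()
    }

# Hypothetical reflectances keyed by band wavelength in nm (illustration only).
ref_base = {665: 0.10, 705: 0.12}   # dam crest on the base date
ref_day = {665: 0.12, 705: 0.12}    # dam crest on the day of interest
lake_day = {665: 0.06, 705: 0.04}   # one lake pixel on the day of interest

normalized = normalize_reflectance(lake_day, ref_base, ref_day)
print(normalized)
```

When the reference field looks brighter than on the base date, the parameter is below 1 and the lake reflectance is scaled down accordingly; a band with unchanged reference reflectance is left as it is.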

Spectral curves for the fragment of the dam crest are presented in Figure 6.

**Figure 6.** Some spectral curves for the fragment of the dam crest.

The base reflection curve for the fragment of the dam crest (reference area) is dated 13 April 2016 (Figure 6). Other spectral curves have a similar shape, but pass through points with different reflectance values, even though the properties of the dam crest surface have not changed; this was due to changes in the atmosphere composition over time.

The specific course of the spectral curves for Lake Dobczyce (Figure 7) results from the combination (superposition) of partial spectral curves for the water surface, water column, and bottom of the reservoir. Backscattering in a blue band also results in high reflectance (Figure 7). It should be noted that even at a high concentration of, e.g., montmorillonite suspensions (500 g/m3), the spectral curve does not show strong local extremes for wavelengths over 440 nm (Figure 8). Such a concentration corresponds to a turbidity of 500 NTU, rarely found in surface waters (mostly after heavy rains). Starting from approximately 440 nm, the spectral curve drops down (Figure 8), as shown in the curves in Figure 7. At concentrations exceeding 500 g/m3, the maximum, corresponding to 440 nm, shifts towards longer wavelengths. The spectral curves may take completely different shapes than the ones shown in Figures 7 and 8 for other types of the mineral suspensions.

**Figure 7.** Reflectance at different measuring points of Lake Dobczyce in the years 2016–2021, after a normalization of the original reflectance from the satellite.

**Figure 8.** Spectral curve for the montmorillonite suspensions in water, USGS Spectral Library Version 7, https://crustal.usgs.gov/speclab/QueryAll07a.php?quick\_filter=water (accessed on 1 January 2022). Reprinted with permission from [39].

The spectral curves for the chlorophyll solution can also differ significantly. At low concentrations (2.97 mg chl/m3) (Figure 9), there are no strong local extremes on the spectral curve above 400 nm. Characteristic local extremes appear at higher concentrations (7.609 mg chl/m3) (Figure 10) and are similar to the curves in Figure 2. Therefore, it may be concluded that concentration of water components (chlorophyll, type of suspension) will have a decisive influence on the shape of the spectral curves.

**Figure 9.** Spectral curve of chlorophyll in water (2.97 mg chl/m3), USGS Spectral Library Version 7, https://crustal.usgs.gov/speclab/QueryAll07a.php?quick\_filter=water (accessed on 1 January 2022). Reprinted with permission from [39].

**Figure 10.** Spectral curve of chlorophyll in water (7.609 mg chl/m3), USGS Spectral Library Version 7, https://crustal.usgs.gov/speclab/QueryAll07a.php?quick\_filter=water (accessed on 1 January 2022). Reprinted with permission from [39].

#### *2.5. Model*

To compute concentrations of chlorophyll a (Cchl) from reflectance R obtained from satellite images for different wavelengths, a mathematical relationship has to be developed that meets the rules for logical concentration values for extremely high or low reflectance values. Empirical models, which are a linear combination of partial formulas, are risky to use because they cannot determine model parameters for all possible R values. The risk arises from a finite number of observational data, which usually does not include information on extreme values of R. Therefore, it may happen that Cchl calculated from the R values obtained from the new photos will be illogical, or even negative. Generally, extrapolation of such a model beyond the R values used to determine the model parameters yields highly ambiguous results; that is why models originating from the theory of dimensional analysis are considered to be safer. These types of models are usually the product of function modules at the appropriate power. They may also generate poor results outside of the R values used in the model estimation, but at least they guarantee that the results are positive.

The initial model was defined as:

$$C_{chl} = a_0 R_{443}^{a_1} R_{490}^{a_2} R_{560}^{a_3} R_{665}^{a_4} R_{705}^{a_5} R_{740}^{a_6} R_{783}^{a_7} R_{842}^{a_8} R_{865}^{a_9} R_{945}^{a_{10}} R_{1375}^{a_{11}} R_{1610}^{a_{12}} R_{2190}^{a_{13}} \tag{1}$$

where:

*a*<sub>0</sub> to *a*<sub>13</sub>—model coefficients;

*R*<sub>...</sub>—radiation reflectance for wavelengths: 443, 490, 560, 665, 705, 740, 783, 842, 865, 945, 1375, 1610, and 2190 nm, determined from the satellite data; and

*C<sub>chl</sub>*—concentration of chlorophyll a [mg/m<sup>3</sup>].

Taking the logarithm of Equation (1) leads to a linear relationship for the logarithms of reflectance R. The spectral curves of chlorophyll a (Figure 1) and the suspensions (Figures 2–4) show that the reflectance for wavelengths over 865 nm and shorter than 490 nm will not provide significant information on the concentration of chlorophyll a. Therefore, the logarithmic form of Equation (1), after simplification, takes the form:

$$\begin{aligned} \ln C_{chl} &= a_0 + a_1 \ln R_{490} + a_2 \ln R_{560} + a_3 \ln R_{665} + a_4 \ln R_{705} + \\ &\quad a_5 \ln R_{740} + a_6 \ln R_{783} + a_7 \ln R_{842} + a_8 \ln R_{865} \end{aligned} \tag{2}$$

where:

*a*<sub>0</sub> to *a*<sub>8</sub>—model coefficients,

*R*<sub>...</sub>—radiation reflectance for wavelengths: 490, 560, 665, 705, 740, 783, 842, and 865 nm, determined from the satellite data.
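The equivalence between the product form and its logarithmic form, and the guarantee of positive results, can be checked numerically. The sketch below uses hypothetical coefficients and reflectances for three bands only; note that the intercept of the logarithmic form corresponds to the natural logarithm of the leading factor of the product form.

```python
import math

def chl_product_form(reflectance, factor, exponents):
    """Product form: C = factor * prod(R_band ** exponent)."""
    c = factor
    for band, exponent in exponents.items():
        c *= reflectance[band] ** exponent
    return c

def chl_log_linear_form(reflectance, intercept, exponents):
    """Log form: ln C = intercept + sum(exponent * ln R_band); returns C."""
    ln_c = intercept + sum(
        exponent * math.log(reflectance[band])
        for band, exponent in exponents.items()
    )
    return math.exp(ln_c)

# Hypothetical values (illustration only, not fitted coefficients).
reflectance = {665: 0.03, 705: 0.04, 740: 0.02}
exponents = {665: -1.5, 705: 2.0, 740: 0.5}
factor = 10.0

c_product = chl_product_form(reflectance, factor, exponents)
c_log = chl_log_linear_form(reflectance, math.log(factor), exponents)
print(c_product, c_log)
```

Since the result is an exponential of a real number, it stays positive for any positive reflectances, which is the property that motivates this model family; a plain linear combination of reflectances offers no such guarantee.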

To make the model more general, additional components, like the squares of the reflectance log, were introduced:

$$\begin{aligned} \ln C_{chl} &= a_0 + a_1 \ln R_{490} + a_2 \ln R_{560} + a_3 \ln R_{665} + a_4 \ln R_{705} + \\ &\quad a_5 \ln R_{740} + a_6 \ln R_{783} + a_7 \ln R_{842} + a_8 \ln R_{865} + a_9 (\ln R_{490})^2 + \\ &\quad a_{10} (\ln R_{560})^2 + a_{11} (\ln R_{665})^2 + a_{12} (\ln R_{705})^2 + a_{13} (\ln R_{740})^2 + \\ &\quad a_{14} (\ln R_{783})^2 + a_{15} (\ln R_{842})^2 + a_{16} (\ln R_{865})^2 \end{aligned} \tag{3}$$

The regression analysis of Equation (3) enabled the selection of those coefficients (from *a*<sub>1</sub> to *a*<sub>16</sub>) whose p-values in the Student's t-test (significance test of the equation coefficients) were lower than the significance level of 0.05. It was found that the most important coefficients are related to the wavelengths 665, 705, 740, and 842 nm. These wavelengths approximately correspond to the local minimum and maximum on the spectral curve of chlorophyll a (Figures 1 and 2) and the local minimum and maximum on the spectral curve of the suspensions (Figures 2–4). In the case of the Sentinel satellite, there are no 760 and 810 nm channels corresponding to the local minimum and maximum on the spectral curve of the suspensions (Figures 2–4); hence, the 740 and 842 nm channels turned out to be statistically significant. Eventually, the model took the form:

$$\begin{aligned} \ln C_{chl} &= a_0 + a_3 \ln R_{665} + a_4 \ln R_{705} + a_5 \ln R_{740} + a_7 \ln R_{842} + \\ &\quad a_{11} (\ln R_{665})^2 + a_{12} (\ln R_{705})^2 + a_{13} (\ln R_{740})^2 + a_{15} (\ln R_{842})^2 \end{aligned} \tag{4}$$

To eliminate the influence of the lake bottom reflectance, some reflectance corrections in Equation (4) were required. The effect of light reflection by the lake bottom in four light wavelengths is shown in Figure 11. However, just the reflectance corrections for 665 nm and 705 nm are sufficient for a good model accuracy. Of course, the reflectance for 740 nm and 842 nm may also be corrected, but their impact is negligible for the quality of the model (4). Generally, the shorter the wave, the stronger the bottom reflectance. Such observation is also confirmed by the spectral curves (Figure 7), which show that the shorter the wave, the higher the reflectance.

Following the Lambert–Beer law, absorbance (−**ln**(*I*/*I0*)) of the medium for UV, Vis, and IR radiation is proportional to an optical path length l and a concentration of substance C. This means that a light intensity I in a water column decreases exponentially:

$$I = I_0 \cdot \exp(-k \cdot C \cdot l) \tag{5}$$

*I*<sub>0</sub>—incident light intensity [W/m<sup>2</sup>];

*k*—absorption coefficient [m<sup>3</sup>/(g·m)];

*C*—concentration of a radiation absorber [g/m<sup>3</sup>];

*l*—optical path length [m].

Assuming that the lake bottom reflectance *Rb* is approximately proportional to the intensity of the radiation reaching the depth of l = h, the reflectance over a water surface, recorded by a photosensitive sensor, would be described by the relationship:

$$R_b = \frac{\alpha_b \cdot I_0 \cdot \exp(-k \cdot C \cdot 2h)}{I_0} = \alpha_b \cdot \exp(-k \cdot C \cdot 2h) \tag{6}$$

*αb*—coefficient of the radiation reflectance through the bottom;

*C*—concentration of all substances (e.g., chlorophyll a, mineral suspensions, organic suspensions, water) responsible for absorption of radiation [g/m3], i.e., water turbidity [NTU];

*h*—depth (actual length of an optical path, when shooting close to the zenith, is approximately equal to the depth h) [m].
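A short numerical sketch of Equation (6) shows how quickly the bottom signal fades with depth. The parameter values (*α<sub>b</sub>* = 0.3, *k* = 0.05 (NTU·m)<sup>−1</sup>, turbidity 5 NTU) are the illustrative single-wavelength values used later in this section; the threshold of 0.01 and the function names are assumptions made only for this example.

```python
import math

def bottom_reflectance(alpha_b, k, turbidity, depth):
    """Equation (6): R_b = alpha_b * exp(-k * C * 2h), two-way path through water."""
    return alpha_b * math.exp(-k * turbidity * 2.0 * depth)

def depth_for_bottom_signal(alpha_b, k, turbidity, threshold):
    """Depth h at which R_b falls to `threshold`, obtained by inverting Equation (6)."""
    return math.log(alpha_b / threshold) / (2.0 * k * turbidity)

ALPHA_B = 0.3      # bottom reflectance coefficient (illustrative value)
K = 0.05           # absorption coefficient in (NTU*m)^-1 (illustrative value)
TURBIDITY = 5.0    # NTU (illustrative value)

for depth in (0.0, 1.0, 2.0, 5.0, 10.0):
    print(depth, bottom_reflectance(ALPHA_B, K, TURBIDITY, depth))

# Hypothetical threshold below which the bottom contribution is treated as minor.
h_limit = depth_for_bottom_signal(ALPHA_B, K, TURBIDITY, threshold=0.01)
print(h_limit)
```

Under these assumed values, the bottom contribution matters mainly in the shallow parts of a reservoir, and higher turbidity shortens the depth range over which it is visible.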

**Figure 11.** Images of Lake Dobczyce at four wavelengths (color representation: blue, yellow, violet, with the same color saturation in each photo).

The number 2 in Formula (6) means that radiation passes twice through the water layer. The reflectance *R*, recorded by the satellite, is approximately equal to the sum of the four principal reflectances:

$$\begin{aligned} R &= \frac{\alpha_s I_0 + I_{chl} + I_m + \alpha_b (I_0 - \alpha_s I_0) \exp(-k \cdot C \cdot 2h)}{I_0} \\ &= \alpha_s + \frac{I_{chl}}{I_0} + \frac{I_m}{I_0} + \alpha_b (1 - \alpha_s) \exp(-k \cdot C \cdot 2h) \\ &= \alpha_s + R_{chl} + R_m + \alpha_b (1 - \alpha_s) \exp(-k \cdot C \cdot 2h) \end{aligned} \tag{7}$$

*α<sub>s</sub>*—coefficient of the radiation reflectance for the water surface (water surface reflectance);

*R<sub>chl</sub>*—reflectance related to chlorophyll a in water, as the total effect of radiation *I<sub>chl</sub>* reflected at different water depths;

*Rm*—reflectance related to the substances, other than chlorophyll a, in water, as the total effect of radiation *Im* reflected at different water depths;

*αb*(1 − *αs*)**exp**(−*k*·*C*·2*h*)—reflectance of the lake bottom *Rb*.

The intensity of radiation *I<sub>chl</sub>*, *I<sub>m</sub>* reflected from water at different depths is defined as the integral over depth of the derivative, with respect to depth, of the radiation intensity given by (5); the intensity *I*<sub>0</sub> must be reduced by the intensity of the radiation reflected from the water surface, *α<sub>s</sub>I*<sub>0</sub>. Therefore:

$$\begin{aligned} I_{chl} &= -\int_0^h \frac{d\left(\alpha_{chl} C_{chl} (I_0 - \alpha_s I_0) \exp(-k \cdot C \cdot 2l)\right)}{dl}\, dl \\ &= \alpha_{chl} C_{chl} (I_0 - \alpha_s I_0) \left(1 - \exp(-k \cdot C \cdot 2h)\right) \end{aligned} \tag{8}$$

$$\begin{aligned} I_m &= -\int_0^h \frac{d\left(\alpha_m C_m (I_0 - \alpha_s I_0) \exp(-k \cdot C \cdot 2l)\right)}{dl}\, dl \\ &= \alpha_m C_m (I_0 - \alpha_s I_0) \left(1 - \exp(-k \cdot C \cdot 2h)\right) \end{aligned} \tag{9}$$

*Cchl*—concentration of chlorophyll a [mg/m3];

*αchl*—coefficient of the radiation reflectance for chlorophyll a [m3/mg];

*Cm*—concentration of non-chlorophyll a substances in water responsible for light reflection [g/m<sup>3</sup> ];

*αm*—coefficient of the radiation reflectance for substances other than chlorophyll a [m3/g].

If h and/or C are sufficiently high, then **exp**(−*k*·*C*·2*h*) << 1 and:

$$I_{chl} \approx \alpha_{chl} C_{chl} (I_0 - \alpha_s I_0) \implies R_{chl} \approx \alpha_{chl} C_{chl} (1 - \alpha_s) \tag{10}$$

$$I_m \approx \alpha_m C_m (I_0 - \alpha_s I_0) \implies R_m \approx \alpha_m C_m (1 - \alpha_s) \tag{11}$$

In model (4), the reflectance *R*<sub>...</sub> for each wavelength should be the computational reflectance *R*<sub>calc,...</sub>, which is formally the sum *R*<sub>chl,...</sub> + *R*<sub>m,...</sub> for that wavelength. Therefore, each reflectance in model (4) should be corrected and replaced with the computational reflectance for the respective wavelength:

$$\begin{aligned} R_{\ldots} \leftarrow R_{calc,\ldots} &= R_{chl,\ldots} + R_{m,\ldots} \\ &= R_{\ldots} - \alpha_{s,\ldots} - \alpha_{b,\ldots} (1 - \alpha_{s,\ldots}) \exp(-k_{\ldots} \cdot C \cdot 2h) \end{aligned} \tag{12}$$

. . . —the index dots refer to different wavelengths;

*R* . . . .—radiation reflectance for different wavelengths determined from satellite;

*Rcalc*, ...—computational reflectance for different wavelengths.

The coefficient *α<sub>s</sub>* (or reflectance) for different wavelengths is small compared to the other reflectances and can therefore be neglected, so:

$$\begin{array}{l} R\_{\ldots} \leftarrow R\_{\text{calc},\ldots} = R\_{\text{chl},\ldots} + R\_{m,\ldots} = \\ R\_{\ldots} - \alpha\_{b,\ldots} \exp(-k\_{\ldots} \cdot \mathbf{C} \cdot \mathbf{2} h) \end{array} \tag{13}$$

A reflectance correction (13) in Equation (4) gives the equation describing the concentration of chlorophyll a as:

$$\begin{aligned} C_{chl} = \exp\Big( & a_0 + a_3 \ln\left(R_{665} - \alpha_{b,665} \exp(-k_{665} \cdot C \cdot 2h)\right) + \\ & a_4 \ln\left(R_{705} - \alpha_{b,705} \exp(-k_{705} \cdot C \cdot 2h)\right) + \\ & a_5 \ln R_{740} + a_7 \ln R_{842} + \\ & a_{11} \left(\ln\left(R_{665} - \alpha_{b,665} \exp(-k_{665} \cdot C \cdot 2h)\right)\right)^2 + \\ & a_{12} \left(\ln\left(R_{705} - \alpha_{b,705} \exp(-k_{705} \cdot C \cdot 2h)\right)\right)^2 + \\ & a_{13} (\ln R_{740})^2 + a_{15} (\ln R_{842})^2 \Big) \end{aligned} \tag{14}$$

*Cchl*—chlorophyll a concentration [mg/m<sup>3</sup> ]; **exp**(*a*0)—model's coefficient [mg/m<sup>3</sup> ].
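Equation (14) can be evaluated as in the sketch below. The function name, the parameter dictionary layout, and the placeholder parameter values are assumptions made for illustration; the fitted values belong in Table 2 and are not reproduced here. The guard against a non-positive corrected reflectance is also an addition of this sketch, since the logarithm is undefined in that case.

```python
import math

def chl_model_14(r, turbidity, depth, p):
    """Evaluate Equation (14).

    r         -- reflectances keyed by wavelength in nm (665, 705, 740, 842)
    turbidity -- C in NTU
    depth     -- h in m
    p         -- parameter dict (hypothetical layout, see the example below)
    """
    def corrected_log(band):
        # Bottom correction of Equation (13), applied to the 665 and 705 nm bands.
        bottom = p[f"alpha_b_{band}"] * math.exp(-p[f"k_{band}"] * turbidity * 2.0 * depth)
        corrected = r[band] - bottom
        if corrected <= 0.0:
            raise ValueError(f"corrected reflectance at {band} nm is not positive")
        return math.log(corrected)

    x665, x705 = corrected_log(665), corrected_log(705)
    x740, x842 = math.log(r[740]), math.log(r[842])
    ln_c = (p["a0"] + p["a3"] * x665 + p["a4"] * x705 + p["a5"] * x740 + p["a7"] * x842
            + p["a11"] * x665 ** 2 + p["a12"] * x705 ** 2
            + p["a13"] * x740 ** 2 + p["a15"] * x842 ** 2)
    return math.exp(ln_c)

# Placeholder parameters chosen only so the result is easy to verify by hand:
# with no bottom term and a3 = 1, the model reduces to C = exp(a0) * R_665.
params = {"a0": math.log(2.0), "a3": 1.0, "a4": 0.0, "a5": 0.0, "a7": 0.0,
          "a11": 0.0, "a12": 0.0, "a13": 0.0, "a15": 0.0,
          "alpha_b_665": 0.0, "k_665": 0.0, "alpha_b_705": 0.0, "k_705": 0.0}
pixel = {665: 0.05, 705: 0.04, 740: 0.03, 842: 0.02}
print(chl_model_14(pixel, turbidity=5.0, depth=3.0, p=params))
```

In practice, pixels for which the bottom term exceeds the measured reflectance would need separate handling, for example masking; how the authors treated such cases is not stated in the text.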

Assume the following parameter values for Equations (7)–(9) relating to one wavelength: *α<sub>s</sub>* = 0.002, *α<sub>m</sub>* = 0.001, *α<sub>b</sub>* = 0.3, *α<sub>chl</sub>* = 0.001, *k* = 0.05 (NTU·m)<sup>−1</sup>, *C<sub>m</sub>* = *C* ≈ 5 NTU (in practice, water turbidity is assumed as C) and *C<sub>chl</sub>* = {1, 5, 10, 20, 50} mg/m<sup>3</sup>. From Equations (7)–(9), the reflectance R can be determined as a function of the depth h (Figure 12). Knowing R and the parameters of Equations (7)–(9), the concentration of chlorophyll a *C<sub>chl</sub>* can be determined again. These would be horizontal lines of constant values *C<sub>chl</sub>* = {1, 5, 10, 20, 50} mg/m<sup>3</sup> for all depths h. In order to check the quality of model (14), we write it for a single wavelength:

$$\begin{aligned} C_{chl} = \exp\Big[ & a_0 + a_3 \ln\left(R - \alpha_b^* \exp(-k^* \cdot C \cdot 2h)\right) + \\ & a_{11} \left(\ln\left(R - \alpha_b^* \exp(-k^* \cdot C \cdot 2h)\right)\right)^2 \Big] \end{aligned} \tag{15}$$

**Figure 12.** Changes in reflectance R as a function of depth h for the sensor above the water surface at different concentrations of chlorophyll a *Cchl* = {1, 5, 10, 20, 50} mg/m3.
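The curves of reflectance versus depth follow directly from Equations (7)–(9) with the parameter values listed above. The sketch below reproduces that forward calculation and also inverts it analytically, which recovers the constant chlorophyll a concentrations mentioned in the text. The function names are assumptions of this example, and the inversion is valid only for depths greater than zero.

```python
import math

# Single-wavelength parameter values quoted in the text.
ALPHA_S, ALPHA_M, ALPHA_B, ALPHA_CHL = 0.002, 0.001, 0.3, 0.001
K = 0.05          # (NTU*m)^-1
TURBIDITY = 5.0   # NTU, used for both C and C_m

def reflectance(c_chl, depth, c_m=TURBIDITY, c=TURBIDITY):
    """Forward model: Equation (7) with I_chl and I_m from Equations (8) and (9)."""
    e = math.exp(-K * c * 2.0 * depth)
    r_chl = ALPHA_CHL * c_chl * (1.0 - ALPHA_S) * (1.0 - e)
    r_m = ALPHA_M * c_m * (1.0 - ALPHA_S) * (1.0 - e)
    r_bottom = ALPHA_B * (1.0 - ALPHA_S) * e
    return ALPHA_S + r_chl + r_m + r_bottom

def chlorophyll_from_reflectance(r, depth, c_m=TURBIDITY, c=TURBIDITY):
    """Analytic inverse of the forward model (requires depth > 0)."""
    e = math.exp(-K * c * 2.0 * depth)
    water_column = (r - ALPHA_S - ALPHA_B * (1.0 - ALPHA_S) * e) / ((1.0 - ALPHA_S) * (1.0 - e))
    return (water_column - ALPHA_M * c_m) / ALPHA_CHL

for c_chl in (1.0, 5.0, 10.0, 20.0, 50.0):
    row = [round(reflectance(c_chl, h), 4) for h in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0)]
    print(c_chl, row)
```

At zero depth the forward model returns *α<sub>s</sub>* + *α<sub>b</sub>*(1 − *α<sub>s</sub>*) regardless of the chlorophyll a concentration, so curves for different concentrations start from the same point and separate as depth increases.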

The parameters of model (15) were determined on the basis of changes in reflectance R as a function of h, determined from Equations (7)–(9). The following parameters were obtained: *a*<sub>0</sub> = −0.994654, *a*<sub>3</sub> = −3.91129, *α*<sub>b</sub><sup>∗</sup> = 0.293404, *k*<sup>∗</sup> = 0.04999945 (NTU·m)<sup>−1</sup>, *a*<sub>11</sub> = −0.767181. The parameters *α*<sub>b</sub><sup>∗</sup> and *k*<sup>∗</sup> are almost equal to *α<sub>b</sub>* = 0.3 and *k* = 0.05 (NTU·m)<sup>−1</sup> used in Equations (7)–(9). Formally, *α*<sub>b</sub><sup>∗</sup> should be equal to *α<sub>s</sub>* + *α<sub>b</sub>*(1 − *α<sub>s</sub>*) = 0.002 + 0.3·(1 − 0.002) = 0.3014, as in Equation (12). The agreement of the parameters *α*<sub>b</sub><sup>∗</sup> and *k*<sup>∗</sup> with the parameters *α<sub>b</sub>* and *k* confirms that the correction (13) for the bottom reflectance in models (14) and (15) was needed. Model (15), like model (14), is less accurate for shallow depths and higher concentrations of chlorophyll a C<sub>chl</sub> (Figure 13).

**Figure 13.** Chlorophyll a Cchl concentrations as a function of depth h for Equations (7)–(9) and model (15) (approximation).

It would be difficult to use a combination of Equations (7)–(9) to calculate chlorophyll a concentrations on the basis of the reflectance recorded by a satellite; such reflectance is a combination of both the chlorophyll a reflectance and the reflectance of other substances present in the water. Such calculations were performed for the wavelengths of 665 nm and 705 nm. However, accurate parameters of Equations (7)–(9) for one wavelength and known turbidity could not be found. Therefore, model (14) has been proposed as one that includes the reflectance for different wavelengths.

#### **3. Results**

#### *3.1. Model Parameters for Chlorophyll a*

Based on the measurements of the concentration of chlorophyll a C<sub>chl</sub>, turbidity C, lake depth h, and the reflectance *R*<sub>...</sub> for the wavelengths 665, 705, 740, and 842 nm, the parameters of model (14) were determined. Due to the large number of parameters:

*a*<sub>0</sub>, *a*<sub>3</sub>, *α*<sub>b,665</sub>, *k*<sub>665</sub>, *a*<sub>4</sub>, *α*<sub>b,705</sub>, *k*<sub>705</sub>, *a*<sub>5</sub>, *a*<sub>7</sub>, *a*<sub>11</sub>, *a*<sub>12</sub>, *a*<sub>13</sub>, *a*<sub>15</sub>

a two-stage procedure was introduced. At first, a preliminary estimate of the parameters *a*<sub>0</sub>, *a*<sub>3</sub>, *a*<sub>4</sub>, *a*<sub>5</sub>, *a*<sub>7</sub>, *a*<sub>11</sub>, *a*<sub>12</sub>, *a*<sub>13</sub>, *a*<sub>15</sub> was produced for the logarithmic model (4) using the least squares method, i.e., minimizing the sum of squares of deviations between the logarithm of the measured chlorophyll concentration C<sub>chl</sub> and the logarithm of the concentration calculated from model (4). Then, the other model parameters were found, with the parameter values previously determined for model (4) assumed in the calculations. All parameters were determined by minimizing the sum of squared deviations between the measured C<sub>chl</sub> concentration and the one calculated from model (14); the correlation coefficient was 0.944. The values of the parameters of model (14) are summarized in Table 2, col. 2. The model fit is shown in Figure 14. If turbidity is calculated from model (17), the parameters of model (14) take the values shown in Table 2, col. 3; the correlation coefficient was then 0.925. The fit of model (14), while using model (17), is shown in Figure 15. In both cases, the model's fit to the measurements was good. If there is no detailed data on turbidity and the parameter shows little variability, the average value of turbidity can be used in the calculations.
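The two stages described above minimize different objective functions: the first works on logarithms of concentrations, the second on the concentrations themselves. The following sketch contrasts the two sums of squares on hypothetical values; it illustrates the objectives only and does not reproduce the authors' optimization procedure.

```python
import math

def sse_log(measured, modelled):
    """Stage-1 style objective: squared deviations between logarithms."""
    return sum((math.log(m) - math.log(c)) ** 2 for m, c in zip(measured, modelled))

def sse_concentration(measured, modelled):
    """Stage-2 style objective: squared deviations between concentrations."""
    return sum((m - c) ** 2 for m, c in zip(measured, modelled))

# Hypothetical concentrations in mg/m^3; both predictions are off by a factor of 2.
measured = [2.0, 50.0]
modelled = [4.0, 100.0]

print(sse_log(measured, modelled))            # each sample contributes (ln 2)^2
print(sse_concentration(measured, modelled))  # dominated by the high concentration
```

In logarithmic space, equal relative errors contribute equally, whereas in concentration space the high values dominate the sum. This difference is one plausible reason to refine the parameters against the measured concentrations after a preliminary fit in logarithmic space.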


**Table 2.** Model parameters (14).

#### *3.2. Maps of Chlorophyll a Concentrations for Lake Dobczyce*

The map of chlorophyll a concentrations (Figure 16) was developed from satellite images using models (14) and (17). Long retention times observed in Lake Dobczyce (157 days on average) contribute to algae growth and high chlorophyll a concentrations; higher concentrations are noted in the middle of the lake. In this region, towards the dam, low flow velocities are also observed (Figure 17); therefore, concentrations slightly decrease due to sedimentation of the suspended solids and algae. There are stagnant zones at the northern and southern banks of the lake, where flow velocities are very low (Figure 17). Such conditions promote algae growth, and the observed concentration of chlorophyll a is high. Moreover, in the lake branches, where water exchange is low and the retention times exceed the average, high concentrations of chlorophyll a are observed (Figure 16).

**Figure 14.** Model fit (Cchl model) (14) to the measured concentrations of chlorophyll a (Cchl data) at the known turbidity values.

**Figure 15.** Model fit (Cchl model) (14) to the measured concentrations of chlorophyll a (Cchl data) for turbidity model (17).

In October, water temperatures in the lake are low (around 13 °C), and the algae growth slows down. At that time, higher concentrations of algae could be found only in the northern branch of the lake (stagnant zone) and at the southern shores, where low flow velocities (Figure 17) are responsible for stagnant zones (Figure 18). The chlorophyll a model showed a good fit to the measurement data (Figure 18, table).

**Figure 16.** Chlorophyll concentration in Lake Dobczyce, 9 May 2021 (imagery by Sentinel 2).

**Figure 17.** Two-dimensional field of the average vertical velocity in the main part of Lake Dobczyce (without the northern bay), with the total flow of 10 m<sup>3</sup>/s and no-wind conditions (model RMA2). Reprinted with permission from [40].

#### *3.3. Model Parameters for Turbidity*

The turbidity C model was developed similarly to the chlorophyll a model. At first, it was determined which factors from the range of *a*<sub>0</sub>–*a*<sub>16</sub> are significant for turbidity in Equation (3). Then, the significance test was applied to the *a*<sub>1</sub>–*a*<sub>16</sub> coefficients at a significance level of 0.05 using the Student's t-distribution. It was found that the most important coefficients are related to the wavelengths 705, 740, and 783 nm; these wavelengths approximately correspond to the local minimum and maximum on the spectral curve of suspensions (Figures 2–4). The initial form of the equation was as follows:

$$\begin{array}{l} \ln C = a\_{\mathrm{C,0}} + a\_{\mathrm{C,4}} \ln R\_{705} + a\_{\mathrm{C,5}} \ln R\_{740} + a\_{\mathrm{C,6}} \ln R\_{783} + \\ \quad a\_{\mathrm{C,12}} (\ln R\_{705})^2 + a\_{\mathrm{C,13}} (\ln R\_{740})^2 + a\_{\mathrm{C,14}} (\ln R\_{783})^2 \end{array} \tag{16}$$

where *C* is the turbidity [NTU], *a*<sub>C,…</sub> are the model coefficients, and *R*<sub>…</sub> is the radiation reflectance at the wavelengths 705, 740, and 783 nm.

Taking into account the reflectance of the lake bottom leads to a relationship:

$$\begin{array}{l} C = \exp\Big[\, a\_{\mathrm{C,0}} + a\_{\mathrm{C,4}} \ln\big( R\_{705} - a\_{\mathrm{C},b,705} \exp( -k\_{\mathrm{C},705} \cdot C \cdot 2h ) \big) + \\ \quad a\_{\mathrm{C,5}} \ln R\_{740} + a\_{\mathrm{C,6}} \ln R\_{783} + \\ \quad a\_{\mathrm{C,12}} \Big( \ln\big( R\_{705} - a\_{\mathrm{C},b,705} \exp( -k\_{\mathrm{C},705} \cdot C \cdot 2h ) \big) \Big)^{2} + \\ \quad a\_{\mathrm{C,13}} \big( \ln R\_{740} \big)^{2} + a\_{\mathrm{C,14}} \big( \ln R\_{783} \big)^{2} \,\Big] \end{array} \tag{17}$$

where exp(*a*<sub>C,0</sub>) is a model coefficient [NTU].

**Figure 18.** Chlorophyll concentrations in Lake Dobczyce, 21 October 2021 (imagery by Sentinel 2).

The effect of light reflection by the lake bottom is shown in Figure 11. It appears that it is sufficient to consider the reflectance correction for 705 nm to obtain a satisfactory accuracy of model (17). Of course, it is also possible to correct the reflectance for 740 nm and 783 nm, but the overall improvement of the model quality would be negligible.

Equation (17) is implicit in the turbidity C. The C value can be obtained by successive approximations, i.e., by inserting into exp(−*k*<sub>C,705</sub>·*C*·2*h*) the value of C from the previous approximation; after several iterations (e.g., 4), the value of C becomes reasonably accurate. Another method involves searching for the zero of the function defined as the difference between the right-hand side of Equation (17) and C. Here, the regula falsi method can be used.
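The two solution strategies for the implicit Equation (17) can be illustrated with a minimal Python sketch. All coefficient values, reflectances, and the depth used below are hypothetical placeholders chosen only for illustration (they are not the fitted values of Table 3), and the function and key names are likewise assumptions of this sketch.

```python
import math


def turbidity_rhs(c, r705, r740, r783, h, p):
    """Right-hand side of Equation (17) for a trial turbidity c [NTU]."""
    # Reflectance at 705 nm corrected for the bottom effect at depth h.
    r705_corr = r705 - p["a_b705"] * math.exp(-p["k705"] * c * 2.0 * h)
    l705, l740, l783 = math.log(r705_corr), math.log(r740), math.log(r783)
    exponent = (p["a0"] + p["a4"] * l705 + p["a5"] * l740 + p["a6"] * l783
                + p["a12"] * l705 ** 2 + p["a13"] * l740 ** 2
                + p["a14"] * l783 ** 2)
    return math.exp(exponent)


def solve_fixed_point(c0, n_iter, *args):
    """Successive approximations: feed the previous C back into the RHS."""
    c = c0
    for _ in range(n_iter):
        c = turbidity_rhs(c, *args)
    return c


def solve_regula_falsi(lo, hi, *args, tol=1e-10, max_iter=200):
    """Regula falsi on f(C) = RHS(C) - C over the bracket [lo, hi]."""
    def f(c):
        return turbidity_rhs(c, *args) - c

    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("the root is not bracketed by [lo, hi]")
    c = lo
    for _ in range(max_iter):
        # Zero of the chord through (lo, f_lo) and (hi, f_hi).
        c = hi - f_hi * (hi - lo) / (f_hi - f_lo)
        f_c = f(c)
        if abs(f_c) < tol:
            break
        if f_lo * f_c < 0:
            hi, f_hi = c, f_c
        else:
            lo, f_lo = c, f_c
    return c


# Hypothetical coefficients and inputs (NOT the values from Table 3).
PARAMS = {"a0": 3.0, "a4": 0.5, "a5": 0.2, "a6": -0.1,
          "a12": 0.01, "a13": 0.005, "a14": -0.005,
          "a_b705": 0.01, "k705": 0.05}
ARGS = (0.05, 0.03, 0.03, 5.0, PARAMS)  # R705, R740, R783, depth h [m]

c_fp = solve_fixed_point(1.0, 4, *ARGS)        # four successive approximations
c_rf = solve_regula_falsi(1e-6, 100.0, *ARGS)  # bracketed root search
print(round(c_fp, 4), round(c_rf, 4))
```

For these placeholder values the right-hand side changes only slowly with C, which is why a handful of successive approximations is already close to the bracketed root; with other coefficients the iteration may converge more slowly, and the bracketing method is then the safer choice.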

Based on the measurement data of: turbidity C, lake depth h, and the reflectance *R*... for wavelengths 705, 740, and 783 nm, the parameters of model (17) were determined. Due to the large number of parameters:

$$a\_{\mathrm{C,0}},\ a\_{\mathrm{C,4}},\ a\_{\mathrm{C,5}},\ a\_{\mathrm{C,6}},\ a\_{\mathrm{C,12}},\ a\_{\mathrm{C,13}},\ a\_{\mathrm{C,14}},\ a\_{\mathrm{C},b,705},\ k\_{\mathrm{C},705}$$

the two-stage procedure was employed. First, preliminary values of the parameters *a*<sub>C,0</sub>, *a*<sub>C,4</sub>, *a*<sub>C,5</sub>, *a*<sub>C,6</sub>, *a*<sub>C,12</sub>, *a*<sub>C,13</sub>, and *a*<sub>C,14</sub> were estimated, and then all the others were estimated while correcting the values of the pre-estimated parameters. The parameters were determined using the least squares method (the best fit) between the measured turbidity C and the turbidity calculated from model (17). The model parameters are summarized in Table 3. The correlation coefficient was 0.939, and the model fit is shown in Figure 19. A good fit of the model to the measurement data was obtained.

#### *3.4. Maps of Turbidity for Lake Dobczyce*

The water turbidity map for the lake (Figure 20) was developed from the satellite images and model (17). Long retention times in Lake Dobczyce (average 157 days) favor the sedimentation of suspended particles. Therefore, the water turbidity in the middle of the lake, towards the dam, where velocities are low (Figure 17), is lower than the turbidity close to the place where the Raba River enters the lake (Figure 20). In the branches of the lake, where water exchange is low and retention times are longer, low turbidity may be attributed to good sedimentation conditions (Figure 20); turbidity is also low on the south-eastern shores of the lake, where the flow velocities are low.


**Table 3.** Values of the model (17) parameters.

**Figure 19.** Model (17) fit (C model) to the measured turbidity (C data).

In October (Figure 21) and in May (Figure 20), turbidity decreased along the lake towards the dam. Moreover, in the northern branch, sedimentation contributed to a lower turbidity (Figure 21). The turbidity model showed a good fit with the measurements (Figure 21, table); however, at point 2 (Figure 21), the turbidity calculations were poor due to clouds obscuring the view.

**Figure 20.** Water turbidity of Lake Dobczyce, 9 May 2021 (imagery by Sentinel 2).

**Figure 21.** Water turbidity of Lake Dobczyce, 21 October 2021 (imagery by Sentinel 2).

#### **4. Discussion**

The authors developed the model to calculate the concentration of chlorophyll a and turbidity in the water. In the case of Lake Dobczyce, the chlorophyll a model takes into account reflectance corresponding to the middle wavelengths 665, 705, 740, and 842 nm, while the bottom effect is related to the wavelengths 665 and 705 nm. The model for turbidity considers the reflectance corresponding to the middle wavelengths 705, 740, and 783 nm, while the bottom effect is related to the wavelength 705 nm. To eliminate minor reflectance terms related to other wavelengths from the pseudo-linear model (3), the Student's *t*-test was used for both chlorophyll a and turbidity. It cannot be predicted in advance whether the models for different lakes will always use the reflectance for the same wavelengths. Even if so, the model coefficients will probably be different due to different water characteristics and other properties of the bottom sediments. In the case of Lake Dobczyce, none of the discriminants that are quotients of reflectance differences or reflectance quotients (Table 1) were used in the models because they did not improve their quality.

The average relative error of the model for chlorophyll a is about 0.216, while the average error is about 2.01 mg Chl a/m<sup>3</sup>. Graphs for errors are shown in Figures 22 and 23.
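As a minimal sketch of how such error measures can be computed, the following Python snippet assumes that the "average error" is the mean absolute deviation between modelled and measured values and that the "average relative error" is the mean of that deviation divided by the measured value; the text does not state the definitions explicitly, so both are assumptions. The sample values are hypothetical and are not the study data.

```python
def mean_absolute_error(measured, modelled):
    """Average of |modelled - measured|, in the units of the data."""
    return sum(abs(c - m) for m, c in zip(measured, modelled)) / len(measured)


def mean_relative_error(measured, modelled):
    """Average of |modelled - measured| / measured (dimensionless)."""
    return sum(abs(c - m) / m for m, c in zip(measured, modelled)) / len(measured)


# Hypothetical chlorophyll a concentrations [mg Chl a/m^3], not the study data.
measured = [4.0, 6.0, 10.0, 15.0]
modelled = [4.4, 4.8, 10.5, 12.0]

mae = mean_absolute_error(measured, modelled)
mre = mean_relative_error(measured, modelled)
print(mae, mre)
```

The same two functions apply unchanged to turbidity values in NTU.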

**Figure 22.** Model errors for chlorophyll a versus measurement data.

**Figure 23.** Model relative errors for chlorophyll a versus measurement data.

The greatest errors of the model relate to concentrations of about 15 mg Chl a/m<sup>3</sup> (Figure 22), and the highest relative errors relate to concentrations of about 6 mg Chl a/m<sup>3</sup> (Figure 23). For low and high chlorophyll a concentrations, the relative errors are the smallest. Sometimes, greater errors are caused by a point measurement being unrepresentative of the area covered by one raster cell. In addition, a greater number of data with average values of chlorophyll a concentration increases the likelihood of a greater error.

The reflectance corrections for 665 nm were smaller than for 705 nm (Figure 24). This may be because chlorophyll a emits at a wavelength of 663 nm after being excited by 430 nm radiation. Backscattering at a wavelength of 665 nm, recorded by the satellite, was therefore likely more characteristic of chlorophyll a than of other substances contained in water. Backscattering at a wavelength of 705 nm is characteristic of minerals, in particular montmorillonite (Figure 8), and this likely explains why the reflectance corrections were greater. Reflectances for other wavelengths occurring in the model for chlorophyll a can be characteristic of other substances contained in water and of atmospheric constituents.

**Figure 24.** The reflection corrections graph versus the product of the depth and turbidity for the chlorophyll a model.

The average relative error of the model for turbidity is about 0.184, while the average error is about 0.629 NTU. Graphs of errors are shown in Figures 25 and 26.

**Figure 25.** Model errors for turbidity versus measurement data.

**Figure 26.** Model relative errors for turbidity versus measurement data.

The greatest errors of the turbidity model concern turbidity of about 7 NTU (Figure 25), and the greatest relative errors concern turbidity of about 2–4 NTU (Figure 26). Sometimes, greater errors are caused by a point measurement being unrepresentative of the area covered by one raster cell. In addition, a greater number of data with average turbidity increases the likelihood of a greater error.

The reflectance corrections for 705 nm for the turbidity model were approximately constant and amounted to around 0.0146.

The remote determination of water quality parameters requires corrections of the reflectance measured by the satellite, which may pose some problems. In urbanized areas or areas close to industrial agglomerations, standard corrections do not give good results. The changes in composition of the atmosphere (content of moisture, dust, and other pollutants) may require different corrections. Therefore, it becomes necessary to determine a reference surface that enables a reflectance correction. This can be any surface with spectral properties constant in time. In this case, a fragment of the dam crest served as the reference surface. The base reflectance was the first reference surface reflectance in a series of measurements. The changes in the reflectance of the reference surface in relation to the base surface made it possible to correct the reflectance of the lake surface.

#### **5. Conclusions**


**Author Contributions:** Conceptualization, A.B.; Formal analysis, A.B.; Software, C.T.; Validation, A.B.; Visualization, C.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Acknowledgments:** The authors thank the Waterworks of the City of Krakow for providing assistance in the physicochemical analyses of water samples and for sharing the data regarding the water quality of Lake Dobczyce.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Evaluation of Seasonal and Spatial Variations in Water Quality and Identification of Potential Sources of Pollution Using Multivariate Statistical Techniques for Lake Hawassa Watershed, Ethiopia**

**Semaria Moga Lencha 1,2,\* , Mihret Dananto Ulsido 2,3 and Alemayehu Muluneh <sup>2</sup>**


**\*** Correspondence: semaria.lencha@uni-rostock.de or semaria@hu.edu.et; Tel.: +49-1521-121-2094

**Abstract:** The magnitude of pollution in Lake Hawassa, which is hydrologically closed and retains the pollutants entering it, has been exacerbated by population growth and economic development in the city of Hawassa. This study therefore aimed to examine seasonal and spatial variations in the water quality of Lake Hawassa Watershed (LHW) and to identify possible sources of pollution using multivariate statistical techniques. Water and effluent samples from LHW were collected monthly for analysis of 19 physicochemical parameters during dry and wet seasons at 19 monitoring stations. Multivariate statistical techniques (MVST) were used to investigate the influence of anthropogenic intervention on the physicochemical characteristics of water quality at the monitoring stations. Through cluster analysis (CA), all 19 monitoring stations were spatially grouped into two statistically significant clusters for the dry and wet seasons based on pollution index, which were designated as moderately polluted (MP) and highly polluted (HP). According to the study results, rivers and Lake Hawassa were moderately polluted (MP), while point sources (industry, hospitals and hotels) were found to be highly polluted (HP). Discriminant analysis (DA) was used to identify the most critical parameters for studying the spatial variations, and seven significant parameters were extracted (electrical conductivity (EC), dissolved oxygen (DO), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), sodium ion (Na<sup>+</sup>), and potassium ion (K<sup>+</sup>)) with the spatial variance to distinguish the pollution condition of the groups obtained using CA. Principal component analysis (PCA) was used to qualitatively determine the potential sources contributing to LHW pollution. In addition, three factors determining pollution levels during the dry and wet seasons were identified, explaining 70.5% and 72.5% of the total variance, respectively. Various sources of pollution are prevalent in the LHW, including urban runoff, industrial discharges, diffuse sources from agricultural land use, and livestock. A correlation matrix with seasonal variations was prepared for both seasons using physicochemical parameters. In conclusion, effective management of point and non-point source pollution is imperative to control domestic, industrial, livestock, and agricultural runoff and to reduce pollutants entering the lake. In this regard, proper municipal and industrial wastewater treatment should be complemented, especially, by stringent management that requires a comprehensive application of technologies such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips. Furthermore, application of indigenous aeration practices such as the use of drop structures at critical locations would help improve water quality in the lake watershed.

**Keywords:** monitoring; mitigations; spatial and temporal variabilities; principal component analysis; cluster analysis; discriminant analysis; water quality; pollution; correlation

**Citation:** Lencha, S.M.; Ulsido, M.D.; Muluneh, A. Evaluation of Seasonal and Spatial Variations in Water Quality and Identification of Potential Sources of Pollution Using Multivariate Statistical Techniques for Lake Hawassa Watershed, Ethiopia. *Appl. Sci.* **2021**, *11*, 8991. https://doi.org/10.3390/app11198991

Academic Editor: Jorge Rodríguez-Chueca

Received: 2 August 2021 Accepted: 23 September 2021 Published: 27 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Studies have shown that urban, agricultural, and industrial discharges have a direct effect on surface water quality. Similarly, urban wastewaters cause fecal contamination of surface waters, and urban stormwater runoff, which contains large amounts of fecal microbes, also affects surface water quality [1]. Surface water bodies are vital natural resources that are vulnerable to pollution. The contaminants are chemical, physical, and biological constituents resulting from anthropogenic activities and are of great environmental concern [2]. Surface water bodies are extensively used as the major sources for domestic, non-domestic, industrial, and irrigation purposes. Therefore, monitoring and assessment of water bodies is imperative to obtain reliable information on water quality for effective management [3]. Anthropogenic uses of the water bodies in the study basin can degrade the quality of surface water and impair its usability as a potable water supply or for industry, agriculture, recreation, or other purposes. Hence, regular monitoring of the water quality of rivers and lakes is indispensable [4,5]. The most affected river stretches are those that flow through urbanized and densely populated areas where there is no adequate sanitation. Upstream rural areas are mainly affected by pollutants from non-point sources such as agricultural runoff, whereas urban areas are polluted by point sources, sewage discharges, urban runoff, and pollutants from upstream areas [6,7].

Studies have shown that some lakes and wetlands around the world have disappeared or are showing changes in their ecosystem. Furthermore, factors such as intensive land use for urbanization and agriculture have had significant impact on the hydrology, ecology, and ecosystem services of lakes, which has eventually led to a decline in lake levels [8]. In addition, pollutants have long been a concern, as their accumulation can have serious effects on fauna, flora, and human health when the huge amount of urban and industrial wastewater reaches the shores [9].

Lake Hawassa is located near the city of Hawassa and is surrounded by agricultural land, industries and residential areas. Therefore, it is susceptible to a variety of pollutants that enter the lake directly or indirectly. On the other hand, the Lake Hawassa Watershed is experiencing rapid land cover change, and natural resources have overwhelmingly diminished. The lake is hydrologically closed and has no apparent outlet, so all pollutants entering the lake are retained. As a result, the lake faces numerous problems, and the water quality deteriorates over time, threatening biodiversity [10].

Significant industrialization, augmented with rapid urbanization and increasing economic development, has increased the extent of pollution [11]. The pollution is mainly from non-point sources caused by urban and agricultural runoff, overgrazing, deforestation, soil erosion, land development, and industrial effluents. This leads to numerous environmental concerns that have resulted in substantial hydrological disturbances. The main factories in the study area are a ceramics factory, a flourmill, a cement products factory, a Moha soft drink factory, a BGI (St. George Brewery factory), an Etabs soap factory, an industrial park in Hawassa, and other small-scale industries. They are virtually all concentrated along the main road, which is close to the shallow swamp, and discharge their effluents into the lake through streams. On the other hand, deforestation and irrigation of the land have caused the drying up of Lake Cheleleka by reducing the streamflow [12].

Various studies have been conducted to examine water quality in the LHW catchment and identify sources of pollution. Teshome [11] investigated the eastern catchment of Lake Hawassa Watershed to assess the seasonal water quality and its suitability for the designated uses. The findings revealed that the rivers in the eastern part of Lake Hawassa Watershed are suitable for agriculture and livestock but unsuitable for aquatic life, and that the lake is hypereutrophic.

Amare [13] investigated the primary sources of non-point source pollution and their relative contribution in Lake Hawassa Watershed using the Annualized Agricultural Non-Point Source (AnnAGNPS) model. The pollutant-loading model revealed that non-point source pollutants originating from agricultural lands and associated with deleterious anthropogenic activities are responsible for the water quality impairment of Lake Hawassa. Point sources have also been determined to be the source of numerous pollutants in the lake ecosystem where the effluent control system put in place is unsuitable [14].

Kebede [15] studied the impact of land cover changes on water quality and streamflow in Lake Hawassa Watershed and concluded that water quality in the upper watershed of the three rivers was better than the lower sections of the catchment with respect to the parameters studied, which might be correlated to the observed land use.

A study conducted by Lencha et al. [16] at Lake Hawassa revealed that most of the population, including the inner part of the city, are using latrines. Larger buildings have conventional flushing systems but without any wastewater treatment. Furthermore, industrial and commercial point sources are known to discharge their effluents into streams or rivers that end up in the Lake. In addition, Hawassa Industrial Park and Referral Hospital discharge their effluents directly into the lake. This is a threat to the people who rely on rivers, streams, and the lake for domestic and other purposes and to the survival of aquatic life as well.

To sum up, some studies regarding the water quality have been conducted in either the eastern or the western catchment of Lake Hawassa, while others have been carried out only at Lake Hawassa. Nonetheless, no water quality study has sufficiently connected agricultural and urban land use with the watershed pollution level to identify the sources of pollution. The previous studies mainly relied on random monitoring and data from literature and focused only on a few water quality parameters, which cannot reflect the whole picture of water quality in the watershed. Additionally, some previous studies obtained contradictory findings. On the other hand, urbanization, industrialization, commercial activities, and population growth are increasing rapidly, which could increase sewage and effluent production. Through monitoring data, consistent data analysis, and homogenization of parameters, this study aimed to (1) statistically analyze multiple-parameter data by using principal component analysis (PCA), cluster analysis (CA), and discriminant analysis (DA); (2) investigate the broad-spectrum variation in the parameters of LHW; and (3) cluster monitoring stations with similar characteristics and identify potential sources of pollution in LHW.

#### **2. Materials and Methods**

#### *2.1. Study Area*

Lake Hawassa Watershed (LHW) is located 275 km from the capital Addis Ababa, in the capital of Sidama regional state, on the main road leading to Nairobi, Kenya via Moyale. LHW has a total area of 1431 km<sup>2</sup> and lies between 6°45′ and 7°15′ N latitude and 38°15′ and 38°45′ E longitude (Figure 1). LHW comprises five sub-watersheds [17].

The watershed is known for its flat plains and dissected undulating landscape, with elevation ranging from 1571 to 2962 m above sea level [18]. The area comprises mountains and low-lying areas, with a wide flat wetland called Cheleleka. Perennial rivers and streams on the north and northeast sides of the catchment and runoff on the east wall feed Cheleleka. The sub-basin of Tikur-Wuha has only one tributary, the Tikur-Wuha River, which flows into Lake Hawassa. In this lake system, water leaves the lake only by evaporation and abstraction, with no surface outflow, so the catchment can be considered hydrologically closed [15]. The climate of the Hawassa sub-basin is sub-humid and distinctly seasonal. The months from April to October are wet and humid, and the main rainy season is between July and September, with a mean annual precipitation of about 955 mm. The mean minimum precipitation is 17.8 mm in December (dry season) and the mean maximum precipitation is 119.8 mm in August (rainy season) [19].

#### *2.2. Sampling and Monitoring Parameters*

The monitoring sites and sampling strategy were planned to cover a wide range of factors contributing to the water quality of the river, taking into account tributaries and point sources whose effluents end up in the lake and have a substantial impact on the water quality of the lake. The criteria for selecting monitoring points were hydrological, with confluence of sub-basins having distinct characteristics and land use types, with the intention of transferring parameters to unmonitored sub-basins. Furthermore, factors such as availability of point and non-point sources, land use type, and urban and wastewater drains were considered in the selection of monitoring sites.

Hence, a total of nineteen (19) monitoring stations were selected (Table 1 and Figure 1). Four (4) monitoring sites were selected purposively at the Wesha, Hallow, Wedessa, and Tikur-Wuha river mouths of the respective sub-watersheds.

Eleven (11) monitoring sites were distributed evenly along the entire course of Lake Hawassa for water quality monitoring. Three (3) monitoring sites were selected near the industrial disposal site, and one (1) was at the health care center.

The monitoring sites in the Tikur-Wuha catchment were Wesha River (MS1), Hallo River (MS2), and Wedessa River (MS3), which are located in the upstream part of Lake Hawassa, where agricultural runoff from the catchment flows directly or through its tributaries into the Cheleleka wetland. The three rivers were purposively selected based on their size and spatial location to represent their respective sub-basins. Monitoring station 6 (MS6) is a critical area with mostly fresh water where factories discharge their effluent into the Tikur-Wuha River, and the river eventually flows into Lake Hawassa. This is an area where river inputs to the lake are high.


**Table 1.** Monitoring stations in Lake Hawassa Watershed.

The site codes are indicated in Figure 1. FH designates Fikerhayk, HR labels Haile Resort, LHW designates Lake Hawassa Watershed, LH designates Lake Hawassa.

Monitoring sites for point sources were selected from available industries in the catchment that directly or indirectly feed Lake Hawassa. The selected sites were the St. George Brewery factory, BGI (MS4), and the Moha soft drink factory (MS5), whose effluents discharge into the Cheleleka wetland and eventually enter Lake Hawassa via Tikur-Wuha River, as well as the Referral Hospital (MS15) and Hawassa Industrial Park (MS19), which discharge their effluents directly into Lake Hawassa.

The monitoring stations for Lake Hawassa were selected based on the presence of major pollution sources in the lake, existence of point sources, health facilities, industrial effluent emission sites, availability of boating and recreational activities, presence of service rendering facilities such as Haile and Lewi resorts, fish market (Amora-Gedel and Gudumale), and also the central part of the lake where the disturbance is minimum.

For this purpose, eight (8) monitoring sites were selected in the eastern part (northeast to southeast) of the lake and designated as MS7, MS8, MS9, MS10, MS11, MS12, MS13, and MS14.

The other three (3) monitoring sites were located on the western (northwest to southwest) side of the lake and were designated as MS16 for the local village Ali-Girma site (opposite Haile Resort), MS17 for the Sima site, opposite Mount Tabor, and MS18 for the Dore-Bafana Betemengist site. In this part of the lake, although there is no point source pollution, there is enormous anthropogenic activity in the form of non-point source pollution from recreational activities, agricultural runoff, and animal waste.

The analyses of physicochemical water quality parameters at selected sites and periods were conducted from May 2020 to January 2021 to capture seasonal variation. Sample collection for the wet season was event-based, i.e., samples were collected after rainfall events. The coordinates of each sampling station were determined using GNSS.

Composite samples were collected in pre-cleaned 2 L polyethylene plastic bottles (sterilized glass bottles were used for biochemical oxygen demand (BOD) and chemical oxygen demand (COD) analyses) for different parameters. The bottles were washed with concentrated nitric acid and distilled water before sample collection and thoroughly rinsed with sample water during collection to avoid possible contamination. The water samples were aseptically handled, labelled, preserved in sterile glass bottles, stored in a cooler (Mobicool v30 AC/DC, Germany) and an ice box, and transported for analysis to the laboratories of Hawassa University Environmental Engineering, the Addis Ababa City Government Environmental Protection and Green Development Commission, and the Engineering Corporation of Oromiya.

The collection, handling, preservation, and treatment of the water samples followed the standard methods outlined for the examination of water and wastewater by the American Public Health Association guidelines [20] and all the parameters were presented with their respective analytical methods and instruments used for analysis in Table 2 below.

**Table 2.** Analytical methods and instruments used for analysis.


Total ammonium nitrogen (TAN), electrical conductivity (EC), total dissolved solids (TDS), dissolved oxygen (DO), biochemical oxygen demand (BOD<sub>5</sub>), chemical oxygen demand (COD), soluble reactive phosphorus (SRP), total phosphorus (TP), nitrate (NO<sub>3</sub><sup>−</sup>), nitrite (NO<sub>2</sub><sup>−</sup>), magnesium ion (Mg<sup>2+</sup>), sodium ion (Na<sup>+</sup>), potassium ion (K<sup>+</sup>), calcium ion (Ca<sup>2+</sup>), and suspended solids (SS).

Un-Ionized Ammonia Determination from Total Ammonium Nitrogen (TAN)

The un-ionized free ammonia was calculated by the mass action law in its logarithmic form (1). The pKa as function of temperature was taken from Emerson et al. [21]:

$$\% \text{ Un-ionized NH}\_3\text{-N} = \frac{1}{1 + 10^{\left(\text{pK}\_{\text{a}} - \text{pH}\right)}} \tag{1}$$

$$\mathrm{pK\_a} = 0.09108 + \frac{2729.92}{\mathrm{T\_k}}\tag{2}$$

where T<sub>k</sub> is the temperature in kelvin (273 + °C).
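The calculation of un-ionized ammonia from TAN can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it reads Equation (2) as the sum of the constant 0.09108 (as printed in the text) and the temperature term 2729.92/T<sub>k</sub>, which is an assumption about the intended form, and it treats the right-hand side of Equation (1) as a fraction between 0 and 1 that is multiplied by 100 to give a percentage. The function names and the sample inputs are hypothetical.

```python
def pka_ammonium(temp_c):
    """pKa of the ammonium ion as a function of temperature, Equation (2)."""
    t_k = 273.0 + temp_c  # kelvin, using the conversion given in the text
    return 0.09108 + 2729.92 / t_k


def unionized_fraction(ph, temp_c):
    """Fraction (0..1) of TAN present as un-ionized NH3, Equation (1)."""
    return 1.0 / (1.0 + 10.0 ** (pka_ammonium(temp_c) - ph))


def unionized_ammonia(tan_mg_l, ph, temp_c):
    """Un-ionized NH3-N concentration in the same units as TAN."""
    return tan_mg_l * unionized_fraction(ph, temp_c)


# Hypothetical sample: TAN = 2.0 mg/L, pH 8.0, water temperature 25 degC.
frac = unionized_fraction(8.0, 25.0)
print(f"pKa = {pka_ammonium(25.0):.3f}")
print(f"un-ionized share = {100.0 * frac:.2f} %")
print(f"un-ionized NH3-N = {unionized_ammonia(2.0, 8.0, 25.0):.3f} mg/L")
```

Because the share of un-ionized ammonia rises steeply with pH and, more moderately, with temperature, both should be measured in the field together with TAN.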

#### **3. Multivariate Statistical Techniques and Data Treatment**

#### *3.1. Multivariate Statistical Techniques*

Multivariate statistical techniques (MVST) are a valuable tool to estimate efficiently the spatio-temporal variability in a watershed and the influences of human intervention on the characteristics of physicochemical parameters at monitoring stations [22]. In addition, MVST like cluster analysis (CA), discriminant analysis (DA), and PCA/factor analysis can be implemented to interpret complex databases to offer better visualization of water quality in the studied watershed [23]. The statistical techniques PCA, CA, and DA are vital to determine the primary relationships among the physicochemical parameters measured in experimental data standardized to the Z-scale to avoid inaccurate grouping because of the huge variability in the data dimensionality [5,24–26].

Principal component analysis (PCA), cluster analysis (CA), and discriminant analysis (DA) have been applied to examine seasonal variations, identify possible pollution sources, and analyze and interpret surface water quality data in China [2,7,27–30], Bangladesh [31], Iran [3], India [23,32,33], South Africa [34], Ethiopia [22,35], Malaysia [36], Lebanon [6,37], Spain [38], and Serbia [39].

XLSTAT 2016 (Addinsoft, New York, USA), Microsoft Excel 2016, and IBM SPSS 25 for Windows (Statistical Package for the Social Sciences) were used together for the statistical analysis.

#### *3.2. Data Treatment and Multivariate Statistical Methods*

PCA is sensitive to outliers, missing data, and poor linear correlation among variables due to insufficient assigned variables. Thus, missing data and outliers in the monitored water quality data need to be treated before multivariate statistical analysis is executed. An outlier may reflect a real shift in the value of an observation that arises from non-random causes. In this study, outliers were detected with the Grubbs test [40] in XLSTAT 2016. Data collection and analysis were conducted with great care to minimize the amount of missing data. However, some missing data are inevitable, and these were handled by multiple imputation of missing values using Markov chain Monte Carlo (MCMC) [41].

The raw water quality parameters were standardized to a mean of 0 and a variance of 1 using the Z-scale transformation to examine the normality of the distribution of the data sets and to ensure that the different variables were equally weighted in the statistical analyses [36]. The suitability of the data for factor analysis was then checked using the Kaiser–Meyer–Olkin (KMO) and Bartlett's sphericity tests to determine whether the measured variables could be factorized efficiently. KMO is a measure of sampling adequacy that indicates the proportion of variance likely attributable to underlying factors. Generally, the KMO index ought to be greater than 0.5 for satisfactory factor analysis. When the KMO index is close to 1, PCA of the variables is suitable; when it is close to 0, PCA is not relevant. In this study, the KMO had a value of 0.68. Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix, which would mean that the variables are unrelated. The significance level, which is reported as 0 in this study (less than 0.05), indicates that there are significant relationships among the variables.
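The standardization and the two adequacy checks can be sketched with NumPy. The study itself used XLSTAT and SPSS, so the code below is only an illustration under stated assumptions: the function names are hypothetical, the KMO index is computed from the correlation and partial correlation matrices, and Bartlett's statistic uses the usual chi-square approximation (the p-value lookup is left out).

```python
import numpy as np


def zscore(data):
    """Standardize each column (variable) to mean 0 and sample variance 1."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


def kmo_index(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations follow from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)


def bartlett_statistic(data):
    """Chi-square statistic and degrees of freedom for Bartlett's sphericity test."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof
```

A KMO value above 0.5 and a large Bartlett statistic relative to its degrees of freedom would both point toward proceeding with PCA, in line with the thresholds quoted above.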

#### 3.2.1. Principal Component Analysis (PCA)/Factor Analysis (FA)

PCA reduces the dimensionality of the data set by explaining the correlations amongst a large number of variables in terms of a smaller number of underlying factors without losing much information [42,43]. The loadings are the correlation coefficients between the original variables and the PCs. The formula for the PCs was taken from [33,36]:

$$y_{mn} = z_{m1}x_{1n} + z_{m2}x_{2n} + z_{m3}x_{3n} + \dots + z_{mi}x_{in} \tag{3}$$

where *z* is the component loading, *y* is the component score, *x* is the measured value of a variable, m is the component number, n is the sample number, and i is the total number of variables.
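As an illustration of Equation (3), the sketch below computes component scores as weighted sums of the standardized variables. It is a minimal NumPy example, not the XLSTAT/SPSS procedure used in the study; the function name is hypothetical. Note one assumption: the weights applied to the variables are the eigenvector coefficients of the correlation matrix, while the returned loadings are those coefficients scaled by the square root of the eigenvalue, so that they equal the correlations between variables and PCs.

```python
import numpy as np


def pca_scores(data, n_components):
    """Return eigenvalues, loadings, and component scores for the leading PCs."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # Z-scale standardization
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each score is a weighted sum over all variables, as in Equation (3).
    scores = z @ eigvecs
    # Loadings: correlation of each variable with each retained PC.
    loadings = eigvecs * np.sqrt(eigvals)
    return eigvals, loadings, scores
```

The variance of each score column equals the corresponding eigenvalue, which is why eigenvalues are used to rank the components.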

Meanwhile, FA attempts to extract a lower-dimensional linear structure from the data set and extracts the new group of variables known as varifactors (VFs) via rotation along the PCA axis. In FA, the basic concept is borrowed from [33,36]:

$$y_{mn} = z_{p1}p_{1m} + z_{p2}p_{2m} + z_{p3}p_{3m} + \dots + z_{pr}p_{rm} + e_{pm} \tag{4}$$

where *y* is the measured value of the variable, *z* refers to the factor loading, *p* is the factor score, m is the sample number, n is the variable number, *r* is the total number of factors, and *e* is the residual term accounting for errors or other sources of variation.

In this study, PCA was employed for qualitative determination of pollution sources.

#### 3.2.2. Discriminant Analysis

DA was used for discriminating between and among groups by applying discriminating variables. These variables measure characteristics regarding which the groups are expected to differ [44]. DA applies a linear equation of a regression analysis on raw data with prior knowledge of membership of objects to particular clusters and provides statistical classification of samples, expressed in the following equation [43,45]:

$$f(G_i) = K_i + \sum_{j=1}^{n} W_{ij} P_{ij} \tag{5}$$

where Ki is a constant specific to each particular group, i is the number of groups (G), n is the number of parameters used in group classification, and Wij is the weight coefficient designated by DA for the specific parameter (Pij).
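Equation (5) can be evaluated directly once the constants and weights are known. The sketch below is illustrative only: the function names are hypothetical, P is treated as the vector of measured parameter values for one sample, and assigning the sample to the group with the largest function value is the usual convention for classification functions rather than something stated in the text.

```python
def classification_scores(constants, weights, params):
    """Evaluate f(G_i) = K_i + sum_j W_ij * P_j for every group i."""
    scores = []
    for k_i, w_i in zip(constants, weights):
        if len(w_i) != len(params):
            raise ValueError("each group needs one weight per parameter")
        scores.append(k_i + sum(w * p for w, p in zip(w_i, params)))
    return scores


def assign_group(constants, weights, params):
    """Index of the group with the highest classification score (assumed rule)."""
    scores = classification_scores(constants, weights, params)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical constants and weights for two groups and two parameters.
    constants = [-1.0, -5.0]
    weights = [[0.1, 0.2], [0.5, 0.1]]
    print(assign_group(constants, weights, [2.0, 3.0]))   # low parameter values
    print(assign_group(constants, weights, [20.0, 3.0]))  # high parameter values
```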

Independent variables are entered into DA either all together or stepwise, using both backward and forward approaches. In the first approach of variable entry, the discriminant function is calculated by engaging all the independent variables at once. This approach is used when there are a limited number of independent variables in the interest of discovering how well certain variables perform as discriminants in the absence of others. The stepwise method, on the other hand, involves entering the independent variables into the discriminant function (DF) one at a time. The order of entry reflects relative importance: variables with greater discriminant weights are entered first [46].

In this study, standard, forward, and backward stepwise approaches of DA were applied to each matrix of the primary data. In the forward stepwise mode, discriminant function analysis (DFA) variables were added stepwise until no significant change occurred, while in the backward stepwise mode, variables were removed starting from least significant until a significant change occurred. For this purpose, two groups obtained from CA were selected for spatial evaluations [35].

#### 3.2.3. Pollution Index (PI)

Pollution index (PI) is a simple technique to examine surface water quality and was applied by the Taiwan EPA. The parameters employed to determine PI, namely DO, BOD, SS, and NH3−N, were each classified into four index scores (Table 3), and PI was computed using the equation formulated by [47,48]. In particular, PI refers to the arithmetic mean of the index values with respect to the water quality.

$$\text{PI} = \frac{1}{4} \sum_{i=1}^{4} S_i \tag{6}$$

**Table 3.** Classification system for pollution index.


PI classifies water quality into four categories: (0–2) for good or non-polluted, (2–3) for slightly polluted, (3–6) for moderately polluted, and (>6) for highly polluted. Anthropogenic activities have been associated with water quality degradation [47,49].
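The index and its classification can be sketched as follows. The code takes the four index scores as input, since the score thresholds of Table 3 are not reproduced here. The function names are hypothetical, and the treatment of values that fall exactly on a class boundary (assigned to the lower class) is an assumption.

```python
def pollution_index(scores):
    """Arithmetic mean of the four index scores (DO, BOD, SS, NH3-N), Equation (6)."""
    if len(scores) != 4:
        raise ValueError("PI is defined for exactly four index scores")
    return sum(scores) / 4.0


def classify_pi(pi):
    """Map a PI value to the four categories given in the text."""
    if pi <= 2:          # boundary handling is an assumption
        return "good or non-polluted"
    if pi <= 3:
        return "slightly polluted"
    if pi <= 6:
        return "moderately polluted"
    return "highly polluted"


if __name__ == "__main__":
    # Hypothetical index scores for one station.
    pi = pollution_index([1, 3, 6, 10])
    print(pi, classify_pi(pi))
```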

#### *3.3. Cluster Analysis*

Hierarchical agglomerative CA was carried out on the normalized data set using Ward's approach, where Euclidean distances were used as the degree of similarity among samples, and a distance was represented by the difference between analytical values. In hierarchical clustering, sequentially higher clusters are formed [23,45,50–52]. In cluster analysis, cases are classified into classes based on similarities between two samples, which are usually given by the Euclidean distance between analytical values of the two samples. The squared Euclidean distance can be calculated by [53]:

$$\text{Distance}\left(Q_i, Q_j\right) = \sum_{k=1}^{n} \left(X_{ik} - X_{jk}\right)^2 \tag{7}$$

where Q_i and Q_j are the ith and jth objects, X_ik is the value of the kth variable of the ith object, and n is the number of variables.
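A minimal sketch of the squared Euclidean distance between two stations follows, with each station represented as a list of standardized parameter values. The function names are hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two objects described by the same variables."""
    if len(a) != len(b):
        raise ValueError("both objects need the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_matrix(samples):
    """Pairwise squared distances; this matrix is the input to hierarchical clustering."""
    return [[squared_euclidean(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical standardized values for three stations and two variables.
    stations = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
    for row in distance_matrix(stations):
        print(row)
```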

The dendrogram provides a visual summary of the clustering process to classify a sample of entities into a smaller number of mutually exclusive groups on the basis of multivariate similarities among entities [33].

Therefore, CA, DA, PCA, and pollution index were applied in this study to identify the underlying interrelationship among the parameters and monitoring stations. CA was applied based on prior knowledge of monitoring stations and the results of DA and pollution index to accurately cluster monitoring stations. PCA was employed to qualitatively identify pollution sources and the type of contaminants contributing to pollution.

#### **4. Results and Discussion**

#### *4.1. Correlation Matrix Evaluation and Seasonal Variation*

Correlation coefficients are established to portray a correlation among variables and measure statistical significance between pairs of water quality variables [54,55]. Correlation analysis measures the proximity between the identified dependent and independent variables. Correlation coefficients that are close to −1 or +1 demonstrate a strong linear correlation between x and y. The correlation between the parameters is referred to as strong from (+0.8 to 1.0) or (−0.8 to −1.0), moderate from (+0.5 to 0.8) or (−0.5 to −0.8), and weak from (+0.0 to 0.5) or (−0.0 to −0.5) [56]. When the correlation coefficient between two variables is zero, there is no linear correlation between them at the *p* < 0.05 level [57]. In this study, a correlation matrix was constructed separately for the dry and wet seasons using the physicochemical parameters. Pearson's correlation coefficient (r) was determined from the correlation matrix to identify the highly correlated and interrelated water quality parameters. To test the significance of each pair of parameters, the *p*-value was determined.
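The coefficient and the strength classes quoted above can be sketched in plain Python. The function names are hypothetical, and assigning the boundary values 0.8 and 0.5 to the stronger class is an assumption, since the ranges in the text share their end points.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least two points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Classify |r| as strong (0.8 to 1.0), moderate (0.5 to 0.8), or weak (0.0 to 0.5)."""
    magnitude = abs(r)
    if magnitude >= 0.8:   # boundary assignment is an assumption
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    print(correlation_strength(-0.825), correlation_strength(0.698))
```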

In the wet season, strong positive correlations were observed between TDS values and EC, temperature, TP, TN, and Na<sup>+</sup> values (r = 0.992, r = 0.874, r = 0.850, r = 0.836; *p* < 0.05), and a strong negative correlation was observed between TDS and DO (r = −0.825; *p* < 0.05). Moderate positive correlations were found between TDS and PO4−P, BOD, COD, and K+ values (r = 0.797, r = 0.698, r = 0.695, r = 0.523; *p* < 0.05), and a weak positive correlation was found between TDS and pH (r = 0.26; *p* < 0.05) (Table 4). Strong negative correlations were found between DO and EC, TDS, TP, and TN (r = −0.825, r = −0.850, r = −0.851, r = −0.806; *p* < 0.05), and moderate negative correlations were observed between DO and temperature, BOD, COD, and Na+ values (r = −0.526, r = −0.544, r = −0.692, r = −0.599; *p* < 0.05).

**Table 4.** Correlation matrix Pearson (r) and alpha (p) values for the wet season.


Values in bold are different from 0 with a significance level alpha = 0.05.

Strong positive correlations were observed between temperature and the values of EC, TDS, Na<sup>+</sup> and TP (r = 0.86, r = 0.864, r = 0.849, r = 0.821; *p* < 0.05), and a moderate positive correlation was observed between temperature and the values of TN and PO4−P (r = 0.525, r = 0.669, r = 0.594; *p* < 0.05). There was also a moderate negative correlation between temperature and DO, with r = −0.692 at *p* < 0.005. There was a weak correlation between temperature and the values of COD and BOD (r = 0.447, r = 0.454; *p* < 0.05).

NH3−N had a moderate positive correlation with K+, with r = 0.531 at *p* < 0.005, and weak positive correlations with TN and temperature (r = 0.331, r = 0.481 at *p* < 0.05). NO2−N correlated moderately positively with BOD and COD (r = 0.721, r = 0.664 at *p* < 0.05) and weakly positively with PO4−P and Ca2+ (r = 0.449, r = 0.404 at *p* < 0.05).

A strong positive correlation was found between PO4−P and TN, with r = 0.825 at *p* < 0.005, moderate positive correlations were found between PO4−P and COD, BOD, TP, and temperature (r = 0.712, r = 0.709, r = 0.730, r = 0.602, r = 0.594; *p* < 0.05), and a moderate negative correlation was observed between PO4−P and DO values (r = −0.793; *p* < 0.05). No statistically significant correlation was found between pH or NO3−N and the rest of the parameters of LHW (*p* > 0.05).

In the dry season, strong positive correlations were observed between TDS values and EC, TP, Na+, PO4−P, and temperature values (r = 0.999, r = 0.814, r = 0.899, r = 0.839, r = 0.933; *p* < 0.05), moderate positive correlations were observed between TDS and BOD, COD, K+, and TN values (r = 0.686, r = 0.561, r = 0.645, r = 0.534; *p* < 0.05), and a strong negative correlation was found between TDS values and DO (r = −0.819 at *p* < 0.05).

Strong negative correlations were observed between the values of DO and TDS, EC, and Na<sup>+</sup> (r = −0.819, r = −0.817, r = −0.826; *p* < 0.05), moderate negative correlations were observed between DO and TN, TP, BOD, K+, and temperature values (r = −0.577, r = −0.568, r = −0.687, r = −0.639, r = −0.729; *p* < 0.05), and a weak negative correlation was observed between DO and NO3−N, with r = −0.464 at *p* < 0.005.

Strong positive correlations were found between temperature and EC and TDS (r = 0.839, r = 0.842; *p* < 0.05), and moderate positive correlations were found for temperature with TP and PO4−P (r = 0.730, r = 0.532; *p* < 0.05). There was also a moderate negative correlation observed between temperature and DO, with r = −0.729 at *p* < 0.005. NH3−N had weak to moderate positive correlations with COD, TP, temperature, and Na+ (r = 0.476, r = 0.484, r = 0.550, r = 0.343; *p* < 0.005).

A strong positive correlation was found between PO4−P and TP, with r = 0.921 at *p* < 0.005, moderate positive correlations were found for PO4−P with BOD, COD, TP, Na+, and temperature (r = 0.749, r = 0.647, r = 0.680, r = 0.76; *p* < 0.05), and a moderate negative correlation was found between PO4−P and DO values (r = −0.626; *p* < 0.05) (Table 5).

**Table 5.** Correlation matrix Pearson (r) and alpha (p) values for dry season.


Values in bold are different from 0 with a significance level alpha = 0.05.

The pH of the rivers was 7.4 (7.1 to 7.6) in the dry season and 8.2 (7.5 to 8.7) in the wet season, and the pH of the lake was 8.2 (7.3 to 8.9) in the dry season and 8.5 (7.5 to 9) in the wet season. The pH of the point sources was 8.3 (7.1 to 9) in the dry season and 8.3 (8.1 to 8.7) in the wet season. The recommended pH as per the standard for drinking, irrigation, and aquatic life is 6.5–8.6, and the pH of LHW was within the accepted limit (Table 6). The TDS (EC) of the rivers was 148 mg/L (297 μS/cm) in the dry season and 89 mg/L (179 μS/cm) in the wet season, and the TDS (EC) of the lake was 453 mg/L (877 μS/cm) in the dry season and 421 mg/L (829 μS/cm) in the wet season. The TDS (EC) of the point sources was 1655 mg/L (3509 μS/cm) in the dry season and 1395 mg/L (2809 μS/cm) in the wet season. This shows that the TDS and EC of the rivers, lake, and point sources increase significantly with increasing temperature (Table 6). Similarly, Gebre-Mariam [58] reported that Ethiopian Rift Valley lakes generally have lower EC values in the rainy season than in the dry season, due to dilution by rain coupled with minimal evaporation rates during the rainy season.

The NO3−N concentration of the rivers was 0.5 mg/L, that of Lake Hawassa was 1.4 mg/L, and that of the point sources was 1.5 mg/L in the dry season. In the wet season, the NO3−N concentration was 0.7, 1.9, and 1.9 mg/L for rivers, Lake Hawassa, and point sources, respectively. The value of NO3−N increases in the rainy season due to the contribution of agricultural runoff and use of fertilizers. The PO4−P concentration of the rivers was 6.5 mg/L, that of Lake Hawassa was 3.3 mg/L, and that of the point sources was 43.8 mg/L in the dry season. In the wet season, the PO4−P concentration was 7.4, 2.9, and 25.7 mg/L for rivers, Lake Hawassa, and point sources, respectively (Table 6).

**Table 6.** Descriptive statistics (mean and standard deviation) of the physicochemical characteristics of LHW collected during dry season.


All units in mg/L except pH (dimensionless), temperature (°C), EC (μS/cm), and turbidity (NTU).

The TN (TP) of the rivers was 8 (0.12) mg/L in the dry season and 5 (0.26) mg/L in the wet season, and the TN (TP) of the lake was 5.3 (0.2) mg/L in the dry season and 5.2 (0.6) mg/L in the wet season. Hence, there is an obvious increase of TN in rivers and Lake Hawassa when temperature increases due to lower dilution and greater agricultural contribution from the upper stream by irrigation, whereas TP in rivers and Lake Hawassa increases in wet seasons due to greater agricultural, rural, and urban runoff. The TN (TP) from point sources was 31.8 (7.2) mg/L in the dry season and 13.9 (5.4) mg/L in the wet season. This shows that the TN (TP) of point sources increases significantly with increasing temperature due to lower dilution. The NH3−N of the rivers was 0.2 mg/L, that of Lake Hawassa was 0.83 mg/L, and that of the point sources was 4.72 mg/L in the dry season. In the wet season, the NH3−N values were 0.03, 0.71, and 3.6 mg/L for rivers, Lake Hawassa, and point sources, respectively. The decrease in NH3−N level in the rainy season might be due to the dilution effect (Table 6).

The positive correlation between temperature and TN, TP, EC, TDS, NH3−N, and PO4−P indicates the increase in the concentration of nutrients as the temperature increases (dry period). It also confirms the major contributors of nutrients were the point sources that are releasing a relatively higher amount of pollutants than the agricultural and other sources, as this value lowers during the wet season due to dilution effect. However, the increase in nutrient (NO3−N) concentration in rivers and Lake Hawassa in the wet season might be due to the increased contribution of agricultural runoff and use of fertilizers.

Sodium, calcium, magnesium, and potassium concentrations of the rivers were 49.1, 13.06, 55.1, and 7.74 mg/L in the dry season and 28.9, 32.7, 10.1, and 5.7 mg/L in the wet season. Sodium, calcium, magnesium, and potassium concentrations of the lake were 214, 23.8, 8.7, and 19.7 mg/L in the dry season and 178.9, 25.1, 7.3, and 17.2 mg/L in the wet season. The sodium, calcium, magnesium, and potassium concentrations of the point sources were 575.2, 38.2, 11.5, and 26.2 mg/L, respectively, in the dry season and 375.2, 38.2, 9.5, and 50.1 mg/L, respectively, in the wet season (Table 6). A decrease in ions was observed when the temperature decreased in the study area. This can be ascribed to the discharge of industrial and domestic effluents, which contribute large amounts of alkaline ions to the river system, as the conductivity depends mainly on the ion concentration in surface water [52]. The natural range of sodium ions in water and soil is so low that their presence can indicate river pollution caused by human activities. Calcium is added to water from soil, industrial wastes, and natural resources. Magnesium is an essential nutrient required for numerous biochemical and physiological functions [59].

The TDS of water generally increases with the level of dissolved pollutants (such as nitrate, ammonium, and phosphate). The conductivity of ions in water depends on water temperature, as ions move faster when water is warm. Hence, conductivity increases when water has a higher temperature [60]. In addition, Taylor et al. [61] pointed out a strong relationship between these variables or ions, such as nitrate, ammonium, and phosphate, and stated that high EC values indicate high concentrations of soluble salts. There are strong correlations between EC and TDS, as evidenced by an increase in conductivity as the concentration of all dissolved constituents increases [62] (Table 6).

The BOD (COD) of rivers was 19.7 (96.5) mg/L in dry seasons and 6.9 (89.4) mg/L in the wet season, and the BOD and COD of lakes was 28.1 (133.3) mg/L in dry season and was 19.1 (112.9) mg/L in wet season. The BOD and COD concentrations for point sources were 116.2 (398.6) mg/L in dry season and 111.6 (353.7) mg/L in wet season (Table 6). The DO of rivers was 3.5 mg/L in dry season and 6 mg/L in wet season, and the DO of lakes was 4.2 mg/L in dry season and 4.4 mg/L in wet season. The DO of point sources was 2 mg/L in dry season and 2.3 mg/L in the wet season (Table 6).

The DO of the rivers in the dry seasons and Lake Hawassa were well below the standard value. This indicates that the discharge of industrial and domestic effluents has resulted in serious organic pollution of these rivers, as the decrease of DO was mainly caused by the decomposition of organic compounds. Moreover, an extremely low DO content usually indicates the degradation of an aquatic system [63].

DO showed a negative correlation with most parameters in both the dry and rainy seasons, revealing that DO decreases as other water quality parameters increase. This could explain the temporal variations, as more oxygen was available for reaction with the pollutants, especially metals and organic pollutants, during dry seasons. Additionally, the characteristics of temporal variation in water quality of LHW were affected by DO. DO was strongly correlated with organic matter, nutrients, and metals, and thus seasonal variation should be considered when DO is used as an indicator to evaluate surface water quality. Low DO is primarily the result of excessive algal growth caused by nutrients. As the algae die and decompose, the process consumes DO, which may leave insufficient DO for fish and other aquatic life. Temperature was significantly correlated with water quality parameters such as EC, TDS, TP, PO4−P, and DO in both seasons. Temperature had a significant negative correlation with DO in the dry and wet seasons, indicating that when water temperature increases, the metabolic rate of microorganisms also increases, and the amount of DO in the water decreases. This might be because faster biodegradation of organic matter during dry seasons can effectively improve water quality. The solubility of oxygen is also inversely related to temperature: warmer water is saturated at a lower oxygen concentration and hence holds less DO during the dry season. Singh et al. [32] observed the inverse relationship between temperature and DO in natural processes, as water can hold less DO with increasing temperature.

#### *4.2. Pollution Index (PI)*

The mean pollution index of the rivers in the lake watershed was 4.5 in the dry season and 3.3 in the wet season, indicating a moderately polluted condition of the rivers. The PI of Lake Hawassa was 5 in both the dry and rainy seasons, indicating that the lake was moderately polluted. Anthropogenic activities were causing deterioration of the water quality of the rivers and Lake Hawassa, and the overall status of the water quality is moderately polluted. The PI for the point sources was measured for comparison purposes, and they were found to be highly polluted, with a PI of 6.8 and 7.3 for the wet and dry seasons, respectively (Table 7).



#### *4.3. Cluster Analysis*

Spatial and Temporal Similarities

Cluster analysis was applied to find out if the monitoring stations had similar characteristics in terms of water quality parameters. It was implemented with the water quality data set to group comparable monitoring sites (spatial variability) spread over the watershed. Results from CA display high homogeneity within clusters and high heterogeneity between clusters [64]. Hierarchical agglomerative CA was carried out with the normalized data set employing Ward's method, using Euclidean distances as a measure of similarity. In this approach, the analysis of variance method is used to evaluate the distances between clusters, attempting to reduce the sum of squares of all clusters that can be made at each step. In this method, the clusters are grouped sequentially, beginning with the most comparable pair of objects and establishing better clusters one after the other, demonstrated through a dendrogram [2,65].
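The variance criterion behind Ward's method can be written out explicitly. As a sketch in standard notation (the symbols are introduced here for illustration and are not used elsewhere in this paper), when two clusters A and B with n_A and n_B members and centroids x̄_A and x̄_B are merged, the total within-cluster sum of squares increases by

$$\Delta(A, B) = \frac{n_A n_B}{n_A + n_B} \left\lVert \bar{x}_A - \bar{x}_B \right\rVert^2$$

At each step, the pair of clusters with the smallest Δ is merged, which is why the squared Euclidean distance between standardized observations is the natural similarity measure for this method.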

The dendrogram presents a visual summary of the clustering processes and provides the map of the groups with a dramatic reduction in the dimensionality of the original records [2,5,32,43,44]. The CA grouped all 19 monitoring stations into two statistically significant clusters for the dry and wet seasons in LHW, and the dendrogram displays the grouping of stations for both seasons (Figure 2). For both seasons, monitoring stations from most of the upstream watershed, from the eastern and western sides of the lake, and from the center of Lake Hawassa were grouped in Cluster 1. The stations in this cluster are MS1-MS3, MS6-MS14, and MS16-MS18; they consist of rivers and Lake Hawassa. This cluster received pollution from point sources and non-point sources, consisting of animal waste and runoff. It is characterized by a moderate anthropogenic effect and categorized as moderately polluted.

**Figure 2.** Dendrogram for LHW based on Ward's method showing the clustering of 19 monitoring stations into two significant clusters for both dry (**a**) and wet (**b**) seasons.

The pollution sources for monitoring stations MS1-MS3 were mainly anthropogenic activities from non-point pollution sources such as agricultural and sewage pollution, whereas pollution sources for monitoring stations MS6 (Tikur-Wuha river) and Lake Hawassa (MS7–MS14, MS16–MS18) were mainly industrial pollution, dispersed point sources, agricultural pollution, urban runoff, and sewage pollution.

Owing to their relative sources, all stations in this cluster were rivers and lakes, suggesting that clustering is reasonable for both dry and wet seasons.

The spatial trend of water quality was generally driven by anthropogenic activities from point and non-point sources of pollution, especially anthropogenic activities with respect to pollutant loading and land use.

Cluster 2 comprises four monitoring stations in the middle part of the LHW: MS4, MS5, MS15, and MS19. Four point sources, specifically the BGI, Pepsi Factory, Referral Hospital, and Industrial Park monitoring stations, were assigned to this cluster. Consequently, this cluster is characterized by comparatively heavy pollution.

#### *4.4. Discriminant Analysis*

Discriminant analysis (DA) was used to evaluate the spatial variations in water quality and to distinguish the most critical parameters in relation to variations between clusters. Both the standard and stepwise modes were applied to the primary data by dividing them into wet and dry seasons, and the two spatial groups resulting from CA were used in DA. In this case, the WQ parameters were treated as independent variables, while the clusters were considered as dependent variables. The confusion matrices (CM) showed that 100% of the data points were correctly classified in each of the standard, forward stepwise, and backward stepwise modes for both dry and wet seasons (Table 8).
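The percentage of correct assignation reported in a classification matrix can be computed as sketched below. The function names are hypothetical, and the layout (rows for the cluster obtained from CA, columns for the cluster predicted by DA) is an assumption. The example uses the station counts of the two clusters (15 and 4 of the 19 stations).

```python
def percent_correct(confusion):
    """Overall percentage of cases on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def percent_correct_by_group(confusion):
    """Percentage of correctly assigned cases for each row (original group)."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


if __name__ == "__main__":
    # Rows: clusters C1 and C2 from CA; columns: clusters assigned by DA.
    all_correct = [[15, 0], [0, 4]]
    print(percent_correct(all_correct), percent_correct_by_group(all_correct))
```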

**Table 8.** Classification matrix for standard, forward stepwise, and backward stepwise DA of spatial variation in LHW for both dry and wet seasons, showing percentage of correct assignation for discriminating parameters.


C1: Includes stations (MS1-MS3, MS6-MS14, and MS16-MS18). C2: Includes stations (MS4, MS5, MS15, and MS19).

The standard DA method builds DFs using eighteen parameters, while only three and seven parameters were needed to distinguish the two pollution groups in the forward stepwise and backward stepwise modes, respectively, for both dry and wet seasons. In forward stepwise mode, most of the parameters, such as turbidity, TDS, pH, NH3−N, NO3−N, PO4−P, DO, COD, NO2−N, TN, TP, temperature, Mg2+, Ca2+, and K+, were insignificant variables leading to less variation, and they were deleted in the further process. The three significant variables that distinguished the two pollution groups with 100% correct assignation in the forward stepwise DA mode were EC, BOD, and Na+. The backward stepwise mode deleted the least significant variables and identified seven significant variables: EC, DO, COD, TN, TP, Na+, and K+. These seven parameters, which were 100% correctly assigned, were the critical parameters useful to make distinctions within the two pollution groups. This implies that the expected spatial variation in water quality can be explained sufficiently using the variables EC, DO, COD, TN, TP, Na+, and K+. Wilks' lambda shows that the discriminant distribution is skewed towards high concentrations.

On the other hand, the standard DA functions were constructed using eighteen parameters, of which three and four parameters were used for forward stepwise mode and backward stepwise mode, respectively, for the wet season. In forward stepwise mode, the pollutants that were found to be insignificant variables and had less variation in terms of their spatial distribution were deleted in the further process. In the forward stepwise DA mode, the three significant variables that were useful to make distinctions within the two pollution groups with 84.5% correct assignment were EC, Na+, and COD. The backward stepwise mode deleted the least significant variables and identified two significant variables: EC and Ca2+. These two parameters were the critical parameters useful to make distinctions within the two pollution groups with 87.5% correct assignation (Table 8). This implies that the spatial water quality variation can be sufficiently explained using the variables EC, Na+, COD, and Ca2+, with the Wilks' lambda value showing that the discriminatory distribution is skewed toward high concentrations, as shown in Figure 3.

**Figure 3.** Box plot of the most discriminating parameters, BOD (mg/L), EC (μS/cm) and Na<sup>+</sup> (mg/L) and Wilks' lambda showing skewedness of discriminatory distribution toward high concentration.

#### *4.5. Pollution Source Identification of Monitored Variables*

Principal Component Analysis

PCA was applied to the normalized data and identified three principal components (PCs) using the Kaiser criterion [66], based on loadings higher than 0.5. Scree plots are widely used to identify the number of PCs to be retained to understand the underlying data structure [26]. Based on the scree plot and the eigenvalue > 1 criterion, three factors were chosen as principal factors. Components with eigenvalues lower than 1 were removed due to their low significance [67].

In this study, the scree plot (Figure 4) shows the eigenvalues sorted from large to small as a function of the number of PCs. This figure shows a pronounced change in slope after the third eigenvalue, so three components were retained (Table 9). After the third PC (Figure 4a,b), where the curve changes slope, the remaining components were omitted. The scree plot was used to determine the number of PCs to be retained in order to figure out the underlying data structure [25]. Consequently, a new data set is obtained that may explain the variation of the original data set with fewer variables.

**Figure 4.** Factor loadings derived from scree plot and eigenvalue for LHW and three factors are retained for dry (**a**) and wet (**b**) seasons.

**Table 9.** Matrix of factor loadings calculated based on water quality parameters measured in the period from May to January in the Lake Hawassa Watershed and factor loadings of variables on the first three PCs extracted by using eigenvalue for both wet (a) and dry (b) seasons.


**<sup>a</sup>** strongly correlated factor loadings, **<sup>b</sup>** moderately correlated factor loadings, **<sup>c</sup>** weakly correlated factor loadings.

Moreover, scree plots are used to visually evaluate which components or factors elucidate the maximum variability in the data.

The PCA results, which include the loadings (participation of the original variable in the new one), are summarized in Table 9. The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 72.5% of the total variance for the wet season. An eigenvalue offers a degree of the importance of the factor, and factors having the highest eigenvalues are the most significant. Eigenvalues of 1.0 or more are considered significant. Liu et al. [26] additionally categorized the factor loadings as 'strong', 'moderate', and 'weak', corresponding to absolute loading values of >0.75, 0.75–0.50, and 0.50–0.30, respectively.
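The retention rule and the loading classes described above can be sketched in plain Python. The function names and the example eigenvalues are hypothetical, as is the label "negligible" for absolute loadings below 0.30, which the text does not classify.

```python
def retained_components(eigenvalues):
    """Keep the components whose eigenvalue is 1.0 or more (Kaiser criterion)."""
    return [ev for ev in eigenvalues if ev >= 1.0]


def explained_variance_percent(eigenvalues):
    """Share of the total variance carried by each component, in percent."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def loading_strength(loading):
    """Classify a factor loading by its absolute value, following Liu et al. [26]."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # label assumed; not defined in the text


if __name__ == "__main__":
    # Hypothetical eigenvalues for five components.
    eigenvalues = [8.4, 2.9, 1.7, 0.9, 0.6]
    print(len(retained_components(eigenvalues)))
    print([round(v, 1) for v in explained_variance_percent(eigenvalues)])
    print(loading_strength(-0.842), loading_strength(0.477))
```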

The first factor (F1), accounting for 46.8% of the total variance, showed strong positive loadings of TDS, EC, PO4−P, BOD, COD, TP, TN, Na+, and temperature with factor loadings of 0.974, 0.978, 0.871, 0.811, 0.784, 0.793, 0.898, 0.812, 0.825, and 0.832, respectively; a weak positive loading of K+ (0.477); and a strong negative loading of DO (−0.842) (Table 9). High positive loadings of temperature and a high negative loading of DO might suggest the impact of seasonal variation, as temperature is inversely related to DO. The strong and moderate positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 clearly represents pollution by BOD, COD, and nutrients, with oxygen depletion as a consequence. When the temperature of water bodies decreases, the biodegradation of organic matter decreases, and the solubility of oxygen in the water increases. Similar reports of high concentrations of BOD and COD exist elsewhere [42,44,45]. Similarly, the strong negative DO loading indicates the utilization of DO under anaerobic conditions in rivers and lakes for the degradation of organic matter. F1 showed strongly positive loadings for both COD and BOD, while the loading for DO was strongly negative. This indicates a group of purely organic pollution indicator parameters from industrial effluents, domestic discharges, and livestock affecting water bodies [23,27,51].

High nutrient loadings of factors such as TN and TP represent pollution from point and non-point sources from industrial setup, agriculture areas, domestic sewage, and urban runoff. The high loading of metals demonstrates the influences of industrial effluents and agriculture activities. Phosphorus and nitrogen can originate from point sources such as sewage pollution, industrial facilities and livestock, as well as from non-point sources, mainly from agricultural activities, runoff from rural and urban areas, soil erosion, and livestock. These results are consistent with findings of other reports elsewhere [27,68]. Consequently, the component is more likely to be explained by the combination of domestic pollution and industrial factors. These factors are characteristic of the monitoring stations in the upper catchment (MS1 and MS2), in the middle section including point sources (MS5 and MS15), along Tikur-Wuha River (MS6), and on the eastern side of Lake Hawassa (MS7, MS9, MS12, MS13, and MS14), where domestic and industrial effluents and agricultural runoff are predominant.

The strongly positive loadings of Na+ and weak positive loadings of K<sup>+</sup> are likely due to industrial effluents discharged into the river Tikur-Wuha and Lake Hawassa. Reports also indicate that the sources of Na<sup>+</sup> and K+ might be domestic sources, fertilizers, and residential waste in addition to industrial effluents [69]. During field observation, it was found that the major industries are discharging their treated and untreated effluents directly into the Tikur-Wuha River and the lake during the rainy period when the flow rate is high, resulting in high dilution, but during the dry period, the dilution effect is lower and consequent pollution is higher.

On the other hand, the strong loadings of TN and TP in F1 suggest higher contribution from point sources in industry and non-point sources such as agricultural land use, urban drainage, and residential areas during the rainy season. In general, these factors are symbolic of a blended source of contamination, encompassing industrial discharges, urban runoff, and agricultural land use. The results are in agreement with those of other studies [5,24,67,69]. Hence, they can be considered as the contamination index for surface water [44,45].

The second factor (F2) explained 13.4% of the total variance. It had a moderately negative loading of Mg2+ and Ca2+ (−0.654, −0.627) and a moderately positive loading of NH3–N (0.516). This factor's moderately negative loading of Mg2+ and Ca2+ is likely to originate from industrial wastewater discharged into the Tikur-Wuha River and Lake Hawassa, usually from carbonate minerals, which are naturally present in the soils of the Lake Hawassa watershed. This factor is more pronounced at monitoring stations affected by point sources, agricultural lands, and rural and urban runoff, such as MS3 in the upper catchment, MS19 in the middle section (point source), and MS8, MS11, MS16, and MS18 monitoring stations on both eastern and western sides of Lake Hawassa.

A moderately positive loading of NH3−N (0.7) indicates biodegradation of organic matter. This variable is primarily from runoff, with high loading of solids and wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH3−N is triggered by the decomposition of organic matter, indicating the discharge of domestic sewage to surface water. Studies elsewhere have showed comparable results [42,44,45,69,70].

The third factor (F3), explaining 12.3% of the total variance, had a moderately negative loading for pH (−0.710), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, possibly due to industrial impact from different sources [22]. It had weak positive loading of turbidity (0.452), moderate negative loading of NO2−N (−0.620), and moderate positive loading of NO3−N (0.507). NO3–N may additionally have derived from agricultural areas in the region, where inorganic nitrogen fertilizers are in common use and the role of domestic waste is strong, and hence, this component can be best explained by a "nutrient" factor representing influences from nonpoint sources such as agricultural runoff and the domestic pollution factor. The reports of Yilma et al. [35] in Ethiopia and Zhang et al. [27] elsewhere were comparable with this result. This factor is typical of the monitoring stations in the middle section including point sources and eastern and western sides of Lake Hawassa (MS4, MS10, and MS17), where domestic sewage, industrial effluents, and agricultural runoff are predominant.

The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 70.5% of the total variance for the dry season. The first factor (F1), accounting for 45.7% of the total variance, showed strong positive loadings of TDS, EC, PO4−P, BOD, TP, Na+, and temperature, having factor loadings of 0.962, 0.961, 0.830, 0.796, 0.897, 0.783, and 0.973, respectively; moderate positive loadings of K+, COD, and TN (0.572, 0.721, 0.724); and a strong negative loading of DO (−0.847). Strong positive loadings of temperature and a strong negative loading of DO might suggest the impact of seasonal variations. The strong and moderate positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 clearly represents pollution by BOD, COD, and nutrients, with oxygen depletion as a consequence. High temperature increases biodegradation and reduces the solubility of oxygen in the water. This PC was correlated with COD and BOD5, indicating a group of purely organic pollution indicator parameters from uncontrolled domestic discharges caused by rapid urbanization and industrial effluents. Biodegradation of organic matter affects the concentrations of BOD and dissolved oxygen in water [23,27,51].

A high loading of nutrients represents pollution from industrial setup and domestic wastewater. High loading of metals demonstrates the influences of industrial discharges. Phosphorus and nitrogen may originate from point sources such as sewage pollution, agricultural runoff in the upper stream due to irrigation, industrial facilities, and livestock. Consequently, this component is more likely to be explained by the combination of domestic pollution factors and industrial factors. Strongly positive loading of Na<sup>+</sup> and moderate positive loadings of K+ are likely to originate from industrial effluents discharged directly into the Tikur-Wuha River and Lake Hawassa. These results are also supported by similar findings obtained elsewhere [27,69].

This factor is more pronounced at monitoring stations in the upper catchment (MS1 and MS3), monitoring stations in the middle section including point sources (MS4, MS5, MS15 and MS19), Tikur-Wuha River (MS6), and monitoring stations from both eastern and western sides of Lake Hawassa (MS9, MS10, MS14, MS16, and MS17), where domestic sewage, industrial effluents, and agricultural activities are predominant. The major industries discharge their treated and untreated effluents directly into Tikur-Wuha River and the lake during the dry period when the flow is low, which might lead to higher pollution. On the other hand, the strong loadings of TN and TP at F1 suggest a higher contribution of point sources from industrial facilities and agricultural runoff in the upper stream due to irrigation. Generally, these factors suggest a blended source of contamination encompassing municipal and industrial point source and livestock. This result is also confirmed by other studies [5,23,33,67,69]. Hence, it can be considered to be the contamination index for surface water [44,45].

The second factor (F2) explained 16% of the total variance and had a strong negative loading of turbidity (−0.781), a moderate negative loading of NO2−N and Mg2+ (−0.567, −0.531), and a moderate positive loading of NO3−N and Ca2+ (0.599, 0.524). NO3–N could be mainly from point sources, and the role of domestic waste is also strong. Hence, this component can be explained by the "nutrient" factor, which represents influences from non-point sources such as the domestic pollution factor [24,27,32,35,66,69]. A moderately positive loading of K+ and a moderately negative loading of Mg2+ in this factor likely originate from industrial discharges into the Tikur-Wuha River and Lake Hawassa. This PC is more strongly influenced by industrial discharges and is more pronounced at LHW monitoring stations where industry is predominant. This factor is more pronounced in monitoring stations in the upper catchment (MS2) and the monitoring stations in the eastern and western sides of Lake Hawassa (MS11, MS12, MS13, and MS18), where domestic, industrial, and agricultural activities are predominant in the upper stream due to irrigation.

The third factor (F3), explaining 8.8% of the total variance, had a strong positive loading of pH (0.775), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, and attributed to industrial impact from different sources [22]. A moderate positive loading of NH3−N (0.7) indicates the biodegradation of organic matter causing concentrations of waterborne factors such as NH3−N. This variable originated primarily from wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH3−N is triggered by organic matter decomposition, indicating the discharge of domestic sewage to surface water. Reports elsewhere support the findings of this study [42,44,45,70]. This factor is more pronounced in monitoring stations on the eastern side of Lake Hawassa (MS7 and MS8), where domestic sewage, industrial effluents, and agricultural activities are prevalent.

The bi-plots of PCs on the key parameters TDS, EC, PO4−P, DO, BOD, COD, TN, TP, temperature, Na+, K+, turbidity, NO2−N, NO3−N, Mg2+, and Ca2+ that characterize monitoring stations from rivers in the upper and middle catchment, point sources in the middle catchment, and the eastern and western sides of Lake Hawassa are presented in Figure 5a,b for dry and wet seasons. In fact, the average values of EC, TDS, BOD, COD, Na+, K+, Mg2+, Ca2+, and NH3−N of point sources were considerably higher than those of rivers in the upper and middle catchment (MS1–MS3, and MS6) and Lake Hawassa (MS7–MS14, MS16 and MS18) (Table 6). In addition, NO3−N, NO2−N, TN, TP, and PO4−P were the main parameters characterizing the stated monitoring sites in both seasons. These stations predominantly include rural areas, urban and peri-urban areas, and industrial sites from which domestic sewage, urban runoff, and effluents are discharged into the lake. Furthermore, the influence of agricultural activities in the upper catchment and Tikur-Wuha River feeding the lake was evident. The results of this investigation were comparable to the findings of the studies conducted by Tibebe et al. [71] and Meshesha et al. [72] on Lake Ziway. In particular, higher EC and TDS values were recorded for similar monitoring stations in both seasons (Table 6). In an aquatic environment, EC is used to categorize the pollution status of surface waters, and an increase in conductivity indicates the presence of dissolved ions that can affect aquatic life and water quality [73].

**Figure 5.** PCA biplots (**a**,**b**) show the projection of the monitoring sites (blue dots) and the variable loadings of the primary components (F1 and F2). The biplots additionally display the relationship between highly correlated variables and monitoring stations for dry (**a**) and wet (**b**) seasons. High and low values indicate strong positive and negative correlation, respectively, while values close to 0 imply weak correlation between F1 and F2 and the respective parameter.

#### *4.6. Total Nitrogen to Total Phosphorus (TN:TP) Ratio*

The TN:TP ratio in lakes and reservoirs is a key indicator, as it shows which of these nutrients is in excess or limiting to growth; here it was used to estimate the nutrient limitation in the lake. According to Smith [74], blue-green algae (cyanobacteria) tend to dominate when the TN:TP ratio is less than 29 and tend to be rare when TN:TP > 29. On the other hand, Fisher et al. [75] used a more conservative ratio of TN:TP. According to them, a ratio > 20 indicates phosphorus limitation and a ratio < 10 indicates nitrogen limitation, while a TN:TP ratio of 10 to 16 indicates that either phosphorus or nitrogen (or both) is limiting for growth. The estimated ratio for Lake Hawassa was 31, which exceeds both thresholds (20 and 29), suggesting that cyanobacteria dominance in the lake is likely to be rare. The TN:TP ratio > 20 in Lake Hawassa indicates that phytoplankton growth in the lake might be phosphorus limited.
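The thresholds quoted above can be written as two simple rules. The sketch below is illustrative only: the function names are hypothetical, and the text does not state how ratios between 16 and 20 should be treated, so that band is returned as unclassified rather than guessed.

```python
def fisher_limitation(tn_tp_ratio):
    """Nutrient limitation following the thresholds attributed to Fisher et al. [75]."""
    if tn_tp_ratio > 20:
        return "phosphorus-limited"
    if tn_tp_ratio < 10:
        return "nitrogen-limited"
    if tn_tp_ratio <= 16:
        return "phosphorus- or nitrogen-limited (or both)"
    # The band 16 < ratio <= 20 is not described in the text.
    return "not classified in the text"


def cyanobacteria_tend_to_dominate(tn_tp_ratio, threshold=29.0):
    """Rule attributed to Smith [74]: dominance is expected below the threshold."""
    return tn_tp_ratio < threshold


lake_hawassa_ratio = 31  # estimated TN:TP ratio reported in the text
print(fisher_limitation(lake_hawassa_ratio))
print(cyanobacteria_tend_to_dominate(lake_hawassa_ratio))
```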

#### **5. Conclusions**

Multivariate statistical techniques help researchers to scrutinize the relationships between parameters in a broader fashion by applying different approaches such as cluster analysis, correlation, factor analysis, discriminant analysis, and multiple regressions to determine the association between dependent and independent variables. They reduce the dimensionality of data so that the whole picture can be visualized more easily than looking at specific cases allows. Furthermore, multivariate techniques provide powerful significance testing compared to univariate techniques. Despite their various merits, the results of multivariate statistical modeling are not easy to interpret and require a large data set to get meaningful results due to the high standard errors. In particular, PCA/FA is likely to lose information if PCs or factors are not chosen judiciously.

This study was conducted to evaluate seasonal and spatial variations in water quality and to identify potential sources of pollution using multivariate statistical techniques for the Lake Hawassa Watershed. The results of this study show that the condition of Lake Hawassa Watershed was classified into moderately and highly polluted categories in both dry and wet seasons. In data-limited developing countries such as Ethiopia, it is especially difficult to identify possible sources of pollution for particular contaminants, as this requires frequently monitored water quality data, which are often not available. To address this serious problem, this study applied multivariate statistical techniques (MVST). Multivariate statistics were used to perform temporal and spatial assessment of surface water quality to reduce the number of monitoring stations and chemical parameters in LHW. In this study, we used Pearson correlation, PCA/FA, CA, and DA to evaluate spatial and temporal variance in surface water quality.

CA grouped the monitoring stations into two statistically significant clusters for the dry and wet seasons, labelled MP and HP, using PI. Accordingly, this resulted in a dendrogram with two clusters for the dry and wet seasons. The findings of the study revealed that rivers in the upstream and middle portion of the lake watershed and Lake Hawassa were moderately polluted (MP), while point sources (industries, hospitals, and hotels) in the middle of the LHW were found to be highly polluted (HP).

DA was used to identify the most critical parameters to investigate the spatial variations and extracted seven significant parameters: EC, DO, COD, TN, TP, Na+, and K+, with spatial variance to distinguish the pollution statuses of the groups obtained using CA.

PCA/FA techniques helped to identify the potential sources of water quality degradation. This study comprehensively analyzed the water quality of LHW and identified three significant sources responsible for pollution of Lake Hawassa Watershed in dry and wet seasons affecting the water quality. Accordingly, the pollution is due to mixed sources including point sources such as municipal and industrial effluents, natural processes, livestock, urban runoff, and non-point sources from agricultural activities.

Poor industrial effluent management combined with non-point sources from agriculture and urban runoff contributes significantly to the pollution of Lake Hawassa. Discharge of industrial effluents into the surface water system is the largest point source of anthropogenic pollution. Diffuse sources that contribute enormously to LHW come from agricultural activities, i.e., intensive farming and livestock (F1, F2, and F3).

We conclude that effective management of point and non-point source pollution is imperative to control domestic, industrial, livestock, and agricultural discharges and thereby reduce pollutant inputs into the lake. Stringent management requiring the comprehensive application of measures such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips should complement proper municipal and industrial wastewater treatment.

Furthermore, application of indigenous aeration practices such as the use of drop structures at critical locations would help improve water quality in the lake watershed.

**Author Contributions:** Conceptualization and improvement of the methodology, S.M.L., M.D.U. and A.M.; Data collection, analysis, and interpretation, S.M.L.; Writing of the original manuscript, S.M.L.; Supervision, follow-up of the work, reviewing and modifying of the manuscript, A.M. and M.D.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was part of the DAAD-EECBP Home Grown PhD Scholarship Program at EECBP Homegrown PhD Program, 2019 (57472170). The Open Access Department, University of Rostock, has funded the APC.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the first author. The data are not publicly available, as they are experimental.

**Acknowledgments:** We are grateful to the German Academic Exchange Service (DAAD) for offering a stipend for the first author in the course of the study.

**Conflicts of Interest:** The authors claim no conflict of interest in connection with the work submitted.

#### **References**


### *Article* **Evaluation of the Hydrochemical and Water Quality Characteristics of an Aquifer Located in an Urbanized Area**

**Juan G. Loaiza <sup>1</sup>, Yaneth Bustos-Terrones <sup>2</sup>, Victoria Bustos-Terrones <sup>3</sup>, Sergio Alberto Monjardín-Armenta <sup>4</sup>, Alberto Quevedo-Castro <sup>1</sup>, Rogelio Estrada-Vazquez <sup>1</sup> and Jesús Gabriel Rangel-Peraza <sup>1,</sup>\***


**Abstract:** Groundwater is an important source of fresh water in the world. However, excessive extraction and increasing pollution represent a major challenge for water sustainability in Mexico. Since water quality changes in aquifers are not readily noticeable, aquifer monitoring and assessment are imperative. In this study, the water quality of the Cuernavaca aquifer was evaluated using a database of 23 parameters in 4 sampling points from 2012 to 2019. The spatial behavior of water quality variables was described by using interpolation. The temporal evaluation of groundwater quality was carried out through time series. Water quality indices (WQI) were obtained for this aquifer, and the WQI values suggest that the groundwater could be considered of good quality for potable use and of medium-high quality for irrigation. The chemical characteristics of the groundwater were also evaluated using Gibbs, Piper, and Schoeller diagrams. Finally, with a total of 34 samples of each parameter in each sampling site, a multivariate statistical analysis was performed using a Pearson correlation and hierarchical cluster analysis. This analysis showed a correlation between hydrochemical features and groundwater quality parameters, where nitrates presented the highest number of significant correlations with other parameters. These results may be useful for the authorities to adopt planning methods to improve the sustainable development of the aquifer.

**Keywords:** Cuernavaca aquifer; hydrochemistry; water quality index; time series analysis; spatial analysis

#### **1. Introduction**

Groundwater is one of the most important natural resources and plays an important role in ecosystems [1]. It is widely used for domestic, industrial, and agricultural activities; hence, its demand is constantly increasing [2–5]. Population growth, accidental spills, surface leaching, runoff, and the extensive use of fertilizers in irrigated areas are considered the main causes of groundwater alteration [6,7]. Furthermore, agricultural activities modify groundwater conditions with nutrients and pesticides coming from leachate infiltration into the soil. Therefore, the use of fertilizers, pesticides, and herbicides in agriculture are major threats to aquifers [8–10]. In addition, the deterioration of an aquifer can be also related to natural causes such as floods, droughts, and salinization [11–13]. Once the aquifer is altered, it is complex and expensive to reverse the damage [12,14]. Therefore, groundwater quality must be monitored regularly to prevent aquifer alterations.

Many studies have been proposed for the assessment of aquifer vulnerability. Bannenberg et al. [6] evaluated the hydrological regime and hydrochemical features of the Flamouria aquifer in Edessa, Greece. They found that groundwater quality was not suitable for irrigation use since the high alkalinity and total dissolved solids found in groundwater could generate excessive salinization of the soil. Kumar et al. [8] conducted a hydrochemical study to assess the water quality suitability for drinking and irrigation purposes. As a result, they found a high concentration of some ions, such as As, Fe, and Mn, in an aquifer located in the Central Ganga Basin. Loh et al. [15] evaluated the suitability of an aquifer in Ghana for domestic and irrigation purposes. They used conventional hydrochemical and mass balance models to reveal relationships between water parameters and the main influence on the chemistry of the aquifer under study. As a result, they found that the groundwater in the area is permissible for agricultural irrigation.

**Citation:** Loaiza, J.G.; Bustos-Terrones, Y.; Bustos-Terrones, V.; Monjardín-Armenta, S.A.; Quevedo-Castro, A.; Estrada-Vazquez, R.; Rangel-Peraza, J.G. Evaluation of the Hydrochemical and Water Quality Characteristics of an Aquifer Located in an Urbanized Area. *Appl. Sci.* **2022**, *12*, 6879. https://doi.org/10.3390/app12146879

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 18 June 2022 Accepted: 5 July 2022 Published: 7 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Zakaria et al. [16] evaluated the hydrochemistry of groundwater in the Anayari catchment to identify the hydrogeochemical processes that are responsible for the main ions in groundwater. Their results showed good quality for irrigation without prior treatment. Wisitthammasri et al. [17] studied the water quality and hydrochemical characteristics of an aquifer in Thailand using multivariate statistical analysis to identify preliminary ion sources. This multivariate analysis evidenced the ion exchange between Ca2+ and Na+ from the weathering of silicates and calcite. Some commonly used multivariate statistical techniques, such as Pearson correlations and Hierarchical Cluster Analysis (HCA), have been used to illustrate the relationship between many groundwater variables and describe the relationship between them [18]. These studies have been carried out to develop appropriate groundwater management strategies and policies to protect aquifers. According to Elumalai et al. [4], multivariate statistical analyses are important because they provide essential information on groundwater quality and the processes responsible for its alteration.

Another tool for groundwater quality assessment is the water quality index (WQI). This tool has been widely used by several researchers [8,14,19] since it simplifies the interpretation of water quality behavior. El Osta et al. [19] used this technique to classify the suitability of groundwater. As a result, they found that only a low percentage of the samples were classified as good to excellent, while the rest of the samples were inadequate and required treatment to be used as drinking water. Another effective tool for assessing groundwater quality and its variability is that recommended by Kumar et al. [8], who evaluated the groundwater quality based on the Geographic Information System (GIS) through the Groundwater Quality Index (GQI) in an aquifer in southern India. They mention that this method is reliable for groundwater quality assessment and serves as a useful tool for decision-makers for efficient groundwater quality monitoring and management, mainly in agricultural areas, which have a great influence on groundwater recharge and quality.

In recent years, population growth and the increase of agricultural areas and industrial activities have intensified water demand, threatening the sustainable use of groundwater in the Cuernavaca aquifer. Although this situation has been observed locally, no formal studies have been carried out to demonstrate the effect of these activities on groundwater resources. The novelty of this study lies in describing the hydrological and hydrochemical conditions of this aquifer to identify its vulnerability. The geohydrological features of the Cuernavaca aquifer are described using Gibbs, Piper, and Schoeller diagrams. Groundwater quality evaluation is carried out based on time series analysis, water quality distribution maps, and water quality indices. This study performs a multivariate and correlation analysis to identify possible pollution sources and proposes better water management strategies for this aquifer.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area is in the state of Morelos, Mexico with an approximate area of 820 km<sup>2</sup> (Figure 1). Mean annual precipitation, evapotranspiration, and air temperature are 1278 mm, 874.7 mm, and 19.4 °C, respectively. The highest rainfall values are observed from July to September, which corresponds to the summer season, and less significant precipitations are registered in winter from October to January mainly caused by cold fronts. The Cuernavaca aquifer is a free, heterogeneous, and anisotropic aquifer with surface geology that is represented by lithological units mainly of sedimentary and volcanic origin [20] and does not show any significant structural complications. The static water level of this aquifer varies from 20 to 100 m.

**Figure 1.** Location of the Cuernavaca aquifer and location of sampling wells.

#### *2.2. Data Collection*

In Mexico, the National Water Commission (CONAGUA) is a federal agency responsible for monitoring, surveillance, and management of aquifers [21]. The analysis of the groundwater samples was carried out by an accredited laboratory [22] which applied a methodology based on the standard methods (APHA) [23]. The data used in this study were obtained by this federal agency. For economic and strategic reasons, four sampling wells were monitored, which were in sites with intense anthropogenic activity. These wells are used for water consumption. The extraction of groundwater for consumption purposes is carried out using pumping systems.

The hydrochemical and water quality parameters considered in this study were: bicarbonates (HCO3−), fecal coliforms (FC), total organic carbon (TOC), ammonium (NH3), nitrites (NO2−), nitrates (NO3−), organic nitrogen (ON), total nitrogen (TN), total phosphorus (TP), total dissolved solids (TDS), electrical conductivity (EC), pH, chlorides (Cl−), fluorides (F), silicon oxides (SiO2), potassium (K+), manganese (Mn), sodium (Na+), sulfates (SO42−), calcium (Ca2+), magnesium (Mg2+), total hardness (TH), and water temperature (WT).

Quality control (QC) and quality assurance (QA). The sampling wells were monitored by CONAGUA. An accredited laboratory carried out the analysis of 23 parameters sampled every six months in the 2012–2019 period. The water sampling was carried out according to Mexican regulations. Based on these regulations, chemical products of analytical grade were required for the preparation of the standard solutions and reagents. In addition, replicates were performed to ensure the reliability of the results and comply with quality control required by the General Directorate of Standards and the Federal Law on Metrology and Normalization.

#### *2.3. Spatial and Temporal Assessment of Groundwater Quality*

The spatial evaluation of the groundwater quality of the Cuernavaca aquifer was carried out through interpolation of the measured parameters as suggested by other studies [24,25]. The inverse distance weighting (IDW) interpolation method was used to describe the spatial distribution of the groundwater quality values through the study area by using the QGIS 3.18 software. The weights used in the IDW method were calculated according to the weighting strategy proposed by Bartier [25]. These weight values are determined based on the distance between the sampling points according to Equation (1).

$$z_{x,y} = \frac{\sum_{i=1}^{n} z_i \, d_{x,y,i}^{-\beta}}{\sum_{i=1}^{n} d_{x,y,i}^{-\beta}} \tag{1}$$

where *z<sub>x,y</sub>* is the water quality parameter to be estimated; *z<sub>i</sub>* represents the measured value for the sampling point; *d<sub>x,y,i</sub>* is the distance between *z<sub>x,y</sub>* and *z<sub>i</sub>*; and *β* is a user-defined coefficient (the software default value of 2 was used for the *β* coefficient).
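As an illustration of Equation (1), the following minimal sketch computes an inverse distance weighted estimate in plain Python. It is not the QGIS implementation; the function name, the well coordinates, and the measured values are hypothetical, and returning the measured value when the target coincides with a sampling point is a convention assumed here to avoid division by zero.

```python
import math


def idw_estimate(x, y, samples, beta=2.0):
    """Estimate z at (x, y) from (x_i, y_i, z_i) samples using Equation (1)."""
    numerator = 0.0
    denominator = 0.0
    for x_i, y_i, z_i in samples:
        distance = math.hypot(x - x_i, y - y_i)
        if distance == 0.0:
            # Assumed convention: the estimate at a sampling point is its own value.
            return z_i
        weight = distance ** (-beta)
        numerator += weight * z_i
        denominator += weight
    return numerator / denominator


# Hypothetical wells on the corners of a square: (x, y, measured value).
wells = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0), (2.0, 2.0, 40.0)]

# The center is equidistant from all wells, so the estimate is their plain mean.
print(idw_estimate(1.0, 1.0, wells))
```

A larger *β* gives nearby wells more influence, which makes the interpolated surface more local around each sampling point.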

Temporal evaluation of groundwater quality was performed using time series analysis to determine possible groundwater quality temporal trends using biannual sampling data from 2012 to 2019. A temporal analysis was carried out by describing the groundwater quality variations over time. Finally, groundwater quality data were compared to World Health Organization (WHO) and local guidelines.

#### *2.4. Water Quality Assessment*

#### 2.4.1. Drinking Water Quality Index

The drinking water quality index (DWQI) is frequently used to determine the suitability of groundwaters. In this study, the determination of DWQI was performed according to Equations (2)–(5) [14,26,27].

$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \tag{2}$$

$$Q_i = \frac{e_i - v_i}{b_i - v_i} \times 100 \tag{3}$$

$$SI_i = W_i \times Q_i \tag{4}$$

$$DWQI = \sum_{i=1}^{n} SI_i \tag{5}$$

where *W<sub>i</sub>* is the relative weight; *w<sub>i</sub>* is the weight assigned to each parameter according to its relative importance for drinking water (a maximum weight of 5 was assigned for the highest importance and a minimum weight of 2 for the lowest importance); *n* is the number of groundwater parameters; *Q<sub>i</sub>* is the quality rating of the *i*th parameter; *e<sub>i</sub>* is the concentration of each parameter; *v<sub>i</sub>* is the optimum value of the parameter (0 is considered the optimum value for all parameters except pH, for which it is 7); *b<sub>i</sub>* is the guideline value [28] for each parameter; and *SI<sub>i</sub>* is the sub-index of the *i*th parameter. According to some researchers [14,29,30], the guideline values and weights for the parameters of the DWQI are: pH (*b<sub>i</sub>* = 8.5, *w<sub>i</sub>* = 4, *W<sub>i</sub>* = 0.13), TDS (mg/L, *b<sub>i</sub>* = 500, *w<sub>i</sub>* = 4, *W<sub>i</sub>* = 0.13), total hardness (mg/L, *b<sub>i</sub>* = 300, *w<sub>i</sub>* = 3, *W<sub>i</sub>* = 0.10), calcium (mg/L, *b<sub>i</sub>* = 75, *w<sub>i</sub>* = 3, *W<sub>i</sub>* = 0.10), magnesium (mg/L, *b<sub>i</sub>* = 30, *w<sub>i</sub>* = 3, *W<sub>i</sub>* = 0.10), nitrates (mg/L, *b<sub>i</sub>* = 45, *w<sub>i</sub>* = 4, *W<sub>i</sub>* = 0.13), chlorides (mg/L, *b<sub>i</sub>* = 250, *w<sub>i</sub>* = 2, *W<sub>i</sub>* = 0.06), sulfates (mg/L, *b<sub>i</sub>* = 200, *w<sub>i</sub>* = 2, *W<sub>i</sub>* = 0.06), fluorides (mg/L, *b<sub>i</sub>* = 1, *w<sub>i</sub>* = 4, *W<sub>i</sub>* = 0.13), and total alkalinity (mg/L, *b<sub>i</sub>* = 200, *w<sub>i</sub>* = 2, *W<sub>i</sub>* = 0.06). Based on the results of Equation (5), the aquifer water was then classified into different categories: DWQI < 50 (excellent), DWQI = 50–100 (good), DWQI = 100–150 (moderate), DWQI = 150–200 (poor) and DWQI ≥ 200 (extremely poor).
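As a rough illustration of Equations (2)–(5), the following Python sketch computes the index for one sample using the guideline values and weights listed above. The function names, the dictionary keys, and several of the sample concentrations are hypothetical. Because the published category ranges share their end points, the sketch assumes that each boundary value belongs to the higher category.

```python
# (guideline value b_i, weight w_i, optimum value v_i) as listed in the text.
PARAMS = {
    "pH": (8.5, 4, 7.0),
    "TDS": (500.0, 4, 0.0),
    "total_hardness": (300.0, 3, 0.0),
    "calcium": (75.0, 3, 0.0),
    "magnesium": (30.0, 3, 0.0),
    "nitrates": (45.0, 4, 0.0),
    "chlorides": (250.0, 2, 0.0),
    "sulfates": (200.0, 2, 0.0),
    "fluorides": (1.0, 4, 0.0),
    "total_alkalinity": (200.0, 2, 0.0),
}


def dwqi(sample):
    """Return the drinking water quality index of one sample (Equations (2)-(5))."""
    if set(sample) != set(PARAMS):
        raise ValueError("sample must contain exactly the parameters in PARAMS")
    total_weight = sum(w for _, w, _ in PARAMS.values())
    index = 0.0
    for name, (b, w, v) in PARAMS.items():
        relative_weight = w / total_weight            # Equation (2)
        rating = (sample[name] - v) / (b - v) * 100   # Equation (3)
        index += relative_weight * rating             # Equations (4) and (5)
    return index


def classify(value):
    """Map a DWQI value to a category (assumption: boundaries go to the higher class)."""
    if value < 50:
        return "excellent"
    if value < 100:
        return "good"
    if value < 150:
        return "moderate"
    if value < 200:
        return "poor"
    return "extremely poor"


# Illustrative sample: pH, chlorides, sulfates, fluorides and alkalinity are hypothetical.
sample = {
    "pH": 7.4, "TDS": 316.0, "total_hardness": 179.2, "calcium": 41.9,
    "magnesium": 19.2, "nitrates": 3.2, "chlorides": 30.0, "sulfates": 40.0,
    "fluorides": 0.4, "total_alkalinity": 150.0,
}
category = classify(dwqi(sample))
```

A sample whose concentrations all equal their guideline values yields an index of 100, which is a convenient sanity check for any implementation of these equations.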

#### 2.4.2. Hydrochemical Characteristics

The chemical composition of groundwater is highly variable. Hence, the hydrochemical classification and the evolution of the groundwater chemical composition were determined by using the Gibbs, Piper, and Schoeller plots [15,31]. Then, the suitability of the groundwater for irrigation was evaluated by using the groundwater indices shown in Table 1.

#### 2.4.3. Multivariate Statistical Analysis

Multivariate statistical techniques such as Pearson correlation and hierarchical cluster analysis (HCA) were used to identify the relationships between the water quality variables. This multivariate statistical analysis was used to identify the factors and possible sources that could explain the behavior of the groundwater quality of the aquifer [32–35]. In addition, a dendrogram was constructed using Ward's agglomeration method with a Euclidean distance metric [7,15].


**Table 1.** Groundwater indices based on hydrochemical features.

#### **3. Results and Discussion**

#### *3.1. Descriptive Analysis of Groundwater Quality Parameters*

The total dissolved solids reflect the behavior of the salt concentration of the aquifer. These solids were found in a range of 75–688 mg/L and a mean value of 316 mg/L was registered. This value is low compared to that reported by Tefera et al. [31], who found concentrations up to 2777.6 mg/L. According to WHO [28], groundwaters with TDS values higher than 500 mg/L could be considered unsuitable for drinking water supply. The total hardness was found in a range of 24.6–456.8 mg/L. However, the mean value (179.2 mg/L) is below the concentration of 300 mg/L suggested by WHO [28] for drinking water. This value is also below the total hardness found by Kumar et al. [8], who presented concentrations greater than 292 mg/L in an unconfined aquifer located in the Central Ganga Basin, India.

The electrical conductivity of the Cuernavaca aquifer was between 90 and 991 μS/cm with a mean conductivity of 409.8 μS/cm. A high variation of electrical conductivity was observed in this aquifer, where the lowest conductivity values were found in sampling well one. Anthropogenic activities, such as agriculture, and rainwater infiltration could be the reason for this variation. For comparison, Jama et al. [38] reported conductivity values up to 11,950 μS/cm in the unconfined Doukkala Aquifer located in a large agricultural region in Morocco. The groundwater of the Cuernavaca aquifer is neutral to slightly alkaline since its pH is in the range of 6.2–8.4 (the water is considered alkaline when pH > 8 and acidic when pH < 6). This pH range is largely within the drinking water standards of the WHO (6.5–8.5), although the minimum value (6.2) falls slightly below the lower limit.

Nitrogen and phosphorus were below the permissible limits proposed by local standards. The TN and TP concentrations found were 0.012–7.02 and 0.001–0.39 mg/L, respectively. High nitrogen concentrations are uncommon in natural soils; they occur due to the contact of the soil cover with nitrogen fertilizers, animal waste, domestic effluents, and septic tanks [14]. The total organic carbon was found in a range of 0.07–2.57 mg/L. The presence of organic matter in the Cuernavaca aquifer could be related to the infiltration of the organic matter produced naturally by plants and animals through excretion and decomposition. This is supported by the fecal coliforms found in the aquifer, with a mean value of 276.6 CFU/100 mL. The presence of fecal coliforms in groundwater could indicate pollution from anthropogenic sources since the sampling wells are in an urban area with a large population. Table 2 presents the range, standard deviation, and mean values of the water quality parameters measured in the Cuernavaca aquifer from 2012 to 2019.

**Table 2.** Range, standard deviation and mean values for water quality parameters in the Cuernavaca aquifer from 2012 to 2019.


The concentrations of some minerals, such as calcium and magnesium, can lead to the precipitation of their salts. In the Cuernavaca aquifer, calcium was found in concentrations from 3.88 to 121.1 mg/L with a mean value of 41.9 mg/L, while magnesium ranged from 3.8 to 50.5 mg/L with a mean value of 19.2 mg/L. The presence of these ions (Mg<sup>2+</sup> and Ca<sup>2+</sup>) is due to the geological features of the aquifer. Sodium and potassium were found in ranges of 1.92–37.5 and 1.3–6.6 mg/L, respectively. The mean values for all major cations were within the maximum permissible limits [28].

Bicarbonates were within a range of 48.4–295 mg/L, with a calculated mean value of 145.1 mg/L. It is noteworthy that carbonates were not found in the samples. Sulfates in the Cuernavaca aquifer were between 0.8 and 136 mg/L, below the SO<sub>4</sub><sup>2−</sup> concentrations reported in other studies [39] and the guidelines recommended by the WHO [28]. Moreover, chlorides presented concentrations between 8.4 and 78.2 mg/L, while nitrates showed a maximum concentration of 6.2 mg/L, with a mean value of 3.2 mg/L. The mean values of both anions were also below the WHO maximum allowable values. Adimalla and Qian [14] suggest that nitrates could be found in groundwaters due to anthropogenic activity. They reported NO<sub>3</sub><sup>−</sup> concentrations up to 198.17 mg/L in groundwater under the influence of agricultural activities in Nanganur, India. In this study, a high variation in NO<sub>3</sub><sup>−</sup> concentrations was found between sampling sites, with the highest concentration at sampling site 2. Cadmium, chromium, mercury, lead, zinc, and arsenic were also analyzed in this study; however, the concentrations found can be considered negligible (cadmium < 0.0002 mg/L, chromium < 0.00088 mg/L, mercury < 0.00009 mg/L, lead < 0.00154 mg/L, zinc < 0.002 mg/L, and arsenic < 0.00139 mg/L). According to these results, the influence of geogenic sources was evident, and leaching and weathering of rocks, together with the use of pesticides and fertilizers, could be recognized as the main driving factors for the hydrochemistry and water quality of the aquifer.

#### *3.2. Spatial and Temporal Variations of Measuring Indicators*

A total of 23 water quality parameters were analyzed at four sampling wells. These sampling wells are located within the urban area of the city of Cuernavaca, which has different elevations as shown in Figure 2.

The land use and soil classifications in the study area are shown in Figures 3 and 4, respectively. These figures demonstrate that P1 is in a wooded area with little human settlement, close to the annual rainfed agricultural area. The dominant soil type in this area is Luvic phaeozem which is characterized by organic matter and scarce carbonates. This sample site is next to an oak-pine forest land-use zone. Sample sites P2, P3, and P4 have similar characteristics because they are in irrigated agricultural areas close to the urban area. These sites are in a Pelic vertisol soil characterized by high mineral content.

**Figure 3.** Land use classification of the Cuernavaca aquifer.

**Figure 4.** Soil classification of the Cuernavaca aquifer.

Figure 5 presents the spatial interpolation of the water quality parameters in the Cuernavaca aquifer. The highest values for all the parameters analyzed were observed in sampling point two (P2), while sampling point one (P1) presented the lowest values. This situation could be related to the soil type in the area. Since the highest elevation is observed in P1, the rest of the sample points located in lower elevation areas could be influenced by the erosion, transport, and deposition of contaminants.

**Figure 5.** Spatial behavior of physicochemical parameters in the Cuernavaca aquifer.

Figure 6 presents the spatial behavior of the major ions. The presence of ions in the sampling wells is due to interactions with the geological material of the aquifer, natural processes of rock dissolution, and ion leaching. This figure shows that higher concentrations of ions were found at P2. At this site, groundwater is not suitable for domestic use according to WHO [28] guidelines. The spatial distribution of these chemical elements highlights the vulnerability of the aquifer, especially at P2. Since the concentrations of ions at the P1, P3, and P4 sites are similar, they could be considered reference values for the major ions in the Cuernavaca aquifer. It is noteworthy that the concentration of ions in all the groundwater samples was found to be within the WHO desirable limits for agricultural irrigation [6,28].

**Figure 6.** Spatial behavior of major ions in the Cuernavaca aquifer.

Figure 7 presents the temporal variation of the water quality parameters from 2012 to 2019. No trends or seasonal or cyclic patterns were found in the groundwater quality data. The time series also showed that P2 had higher values for almost all parameters. Lower values of the physicochemical parameters and major ions were observed at P1, since this site is at a higher elevation where the runoff of anthropogenic contaminants is low. Similar values were reported by Adimalla and Qian [14], who evaluated the groundwater of Nanganur county in India. Based on their results, they considered that groundwater quality does not represent health risks for drinking water use and only recommended groundwater defluoridation.

Table 3 presents the ANOVA statistical analysis of the groundwater quality parameters. The table shows that 15 parameters had a statistically significant spatial variation. However, only 4 groundwater quality parameters showed a significant temporal variation.
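The kind of comparison summarized in Table 3 can be sketched as a one-way ANOVA in which observations are grouped by sampling well (spatial factor) or by sampling campaign (temporal factor). The source does not state the exact ANOVA design, so the grouping, the function name `one_way_anova_f`, and the nitrate values below are assumptions for illustration. The sketch returns only the F statistic and its degrees of freedom; obtaining the *p*-value would additionally require the F distribution from a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical nitrate concentrations (mg/L) grouped by well, for illustration only.
nitrate_by_well = [
    [1.0, 1.4, 1.2, 0.9],   # P1
    [5.1, 6.2, 4.8, 5.6],   # P2
    [2.9, 3.3, 3.0, 2.6],   # P3
    [3.1, 2.8, 3.4, 3.0],   # P4
]
f_stat, df_b, df_w = one_way_anova_f(nitrate_by_well)
```

A large F value relative to the critical value for (df_b, df_w) degrees of freedom corresponds to a *p*-value below 0.05, the significance threshold used in Table 3.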

**Figure 7.** Variation of water quality parameters over time (2012–2019) at P1, P2, P3, and P4.


**Table 3.** Spatial and temporal statistical analysis (ANOVA) of the water quality parameters measured in the Cuernavaca aquifer.

\* *p*-value ≤ 0.05 is statistically significant.

#### *3.3. Multivariate Statistical Analysis*

Figure 8 shows the Pearson correlations between the groundwater quality parameters. The Pearson correlation coefficient (*r*) ranges from −1 to +1 and measures the strength of the linear relationship between parameters [9]. A high negative correlation is found when *r* is close to −1, whereas *r* values close to +1 indicate a high positive correlation. A Pearson correlation (*r*) close to 0 indicates that there is no linear relationship between the two variables.
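For reference, the coefficient described above can be computed with a few lines of Python. This is a minimal sketch of the standard definition of *r*, not the software used in the study; the function name `pearson_r` and the paired TDS and EC values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations of TDS (mg/L) and EC (uS/cm), for illustration only.
tds = [120.0, 250.0, 316.0, 480.0, 688.0]
ec = [160.0, 330.0, 410.0, 640.0, 991.0]
r_tds_ec = pearson_r(tds, ec)
```

Applying the function to every pair of parameters yields a symmetric matrix of the kind displayed in Figure 8.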

**Figure 8.** Pearson correlation coefficients of the water quality parameters of the Cuernavaca aquifer.

Nitrates presented the highest number of correlations with other parameters. This parameter is correlated with TN, TDS, EC, pH, Cl<sup>−</sup>, K<sup>+</sup>, Na<sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, TH, and WT. TA is related to HCO<sub>3</sub><sup>−</sup>, NO<sub>3</sub><sup>−</sup>, TN, TDS, EC, Cl<sup>−</sup>, Na<sup>+</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, TH, and WT. Bicarbonates showed a high correlation with NO<sub>3</sub><sup>−</sup>, TN, TDS, EC, Cl<sup>−</sup>, Na<sup>+</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, TH, and WT. Total nitrogen is associated with TDS, EC, Na<sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, TH, and WT. Total dissolved solids are highly related to ions, TH, EC, and WT. The electrical conductivity shows a high correlation with Cl<sup>−</sup>, K<sup>+</sup>, Na<sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, TH, and WT. Chlorides are significantly related to K<sup>+</sup>, Na<sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, and TH. Potassium is related to other ions such as Na<sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, and Mg<sup>2+</sup>, and to TH. In turn, sodium is related to SO<sub>4</sub><sup>2−</sup>, Ca<sup>2+</sup>, Mg<sup>2+</sup>, and TH. Sulfates are related to Ca<sup>2+</sup>, Mg<sup>2+</sup>, and TH, and calcium shows a correlation with Mg<sup>2+</sup> and TH. This method has also been used in other evaluations of groundwater quality. Strong correlations between major ions were reported by Miao et al. [40], who found that the groundwater quality of a coastal city in China was affected by various factors, such as dissolution and water evaporation.

Since a large number of groundwater quality parameters were correlated with each other, a hierarchical cluster analysis was carried out (Figure 9). Hierarchical cluster analysis was used to further identify the main chemical processes controlling groundwater chemistry in the aquifer [15,34]. This analysis included the 23 analyzed parameters and 34 water samples collected at different times of the year. The dendrogram formed one main cluster, which in turn split into two groups. The first group includes only fluorides and sulfates. The second group is composed of the rest of the analyzed water quality parameters. Several subgroups are evident, such as those formed by TDS, EC, and TH; TA and HCO<sub>3</sub><sup>−</sup>; and SiO<sub>2</sub>, K<sup>+</sup>, Ca<sup>2+</sup>, and Mg<sup>2+</sup>. These results corroborated the relationships between the parameters observed in the Pearson correlations. The relationship between the groundwater parameters indicated a common source. Given the nature of these subgroups, groundwater quality is derived from geogenic sources, mainly the dissolution of carbonate minerals [32]. Abdelaziz et al. [34] also noted that the dendrogram can be used to classify the groundwater quality parameters and found great similarities with the grouping carried out by the principal components analysis.

**Figure 9.** Hierarchical cluster analysis for groundwater quality parameters monitored in the Cuernavaca aquifer from 2012 to 2019.

#### *3.4. Drinking Water Quality Index*

Table 4 shows the DWQI obtained at the four sampling sites of the Cuernavaca aquifer. DWQI values ranging from 11.2 to 78.2 were obtained from 2012 to 2019, and 70% of the samples showed excellent groundwater quality, mainly at the P1, P3, and P4 sampling sites, as shown in Figure 10. P1 showed the best water quality, possibly because this sampling site is at the highest elevation and close to a protected green area. In contrast, P2 showed a high variation of groundwater quality because it is in a highly populated area. Similar DWQI results were reported by Ahmed et al. [41], who found that the DWQI ranged from 1.86 to 82.25 for water samples from different sampling sites of an aquifer in India.


**Table 4.** Classification of the water quality index and percentage of the values of the Cuernavaca aquifer samples.

**Figure 10.** Variation of the water quality index in the sampling wells of the Cuernavaca aquifer.

#### *3.5. Hydrochemical Characteristics*

Hydrochemical analysis was carried out to characterize the Cuernavaca aquifer's groundwater. A high content of salts in groundwater could lead to the salinization of the soils and crop yield losses due to dehydration of plants [38,42]. The concentrations of salts in the Cuernavaca aquifer showed the following behavior:

$$\mathrm{HCO_3^-} > \mathrm{Ca^{2+}} > \mathrm{Na^+} > \mathrm{Mg^{2+}} > \mathrm{Cl^-} > \mathrm{SO_4^{2-}} > \mathrm{K^+} > \mathrm{NO_3^-}$$

Figure 11a shows the Piper triangular diagram. In this figure, the mean values of 34 samples at each sampling point were used. This diagram is a graphical representation of groundwater chemistry, where the relative concentrations of cations and anions are shown in separate ternary plots. In the lower-left ternary plot (cation diagram), a dominance of Mg<sup>2+</sup> and Na<sup>+</sup> + K<sup>+</sup> is observed. This dominance could be related to progressive evaporation and ion exchange processes [43]. The lower-right ternary plot (anion diagram) indicated that the groundwater chemistry of the Cuernavaca aquifer is highly influenced by the calcium-bicarbonate and bicarbonate water types [44]. These results are consistent with those reported by other researchers [45]. Figure 11b shows the Schoeller diagram, which exhibits a similar behavior of cations and anions across the samples from the different wells. This diagram demonstrated that the highest equivalent concentrations of the ions were present at sampling site P2, where Ca<sup>2+</sup> and HCO<sub>3</sub><sup>−</sup> showed the highest equivalent concentrations of cations and anions, respectively. Similar results were presented by Tefera et al. [31] for the Tana basin in Ethiopia. However, Abotalib et al. [46] obtained results opposite to those presented in this study for an aquifer located in hyperarid deserts in central Egypt.

**Figure 11.** Piper (**a**) and Schoeller (**b**) diagrams for groundwater chemistry composition at P1, P2, P3, and P4.

El Osta et al. [19] suggest that the groundwater chemistry of an aquifer is the result of evaporation, weathering, and rock–water interaction. In this study, the Gibbs diagram (Figure 12) showed that the cations and anions in the aquifer are primarily controlled by rock–water interaction. The dissolution of the rock in the aquifer is evidenced by the high content of chlorides and sulfates observed. Therefore, this process regulates groundwater chemistry and quality. Likewise, this diagram suggests that the P2 site could be controlled by evaporation. This process produces dissolved solutes in groundwater and soil in shallow areas [15,19,47,48].

**Figure 12.** Gibbs diagram showing the source of cations and anions in the Cuernavaca aquifer at the P1, P2, P3, and P4 sites.

#### *3.6. Groundwater Indices Based on Hydrochemical Features*

Figure 13 presents the classification of water quality for irrigation purposes. Some groundwater indices obtained in this study suggest that the Cuernavaca aquifer has good quality for irrigation purposes. For example, the SAR index classified the groundwater as excellent, which indicates that there is no sodium hazard for irrigation. The RSC index also showed that all samples presented a good quality of water for irrigation. Based on the KR index, the groundwater of the Cuernavaca aquifer was adequate in most of the samples (84%).

**Figure 13.** Classification of water samples for irrigation purposes in the Cuernavaca aquifer.

However, other groundwater indices suggest that groundwater quality is inadequate for irrigation. The SSP index showed that 65% of the samples have good water quality, mainly at P2 and P4. However, most of the samples at P1 and P3 showed an inadequate quality. Tefera et al. [31] reported that 53.3% of samples analyzed in alluvial aquifers in the Upper Blue Nile Basin, Ethiopia, could be considered good quality but 46.7% of the samples could be considered unsuitable. The MH index demonstrated that 56% of the samples have adequate quality at P2 and P4, but inadequate at P1 and P3. The high sodium levels found at these sampling sites could be related to weathering of Na-containing basaltic rocks. However, the content of calcium (41.93 mg/L) and magnesium (19.25 mg/L) in the groundwater of the aquifer maintains an equilibrium state. The groundwater of the Cuernavaca aquifer could be considered good quality according to the %Na. Most of the samples (56%) are within 20–40% Na. However, high sodium percentages were recorded in 38% of the samples, mainly at P1. The presence of high levels of sodium could reduce soil permeability. Similar results were obtained when using the PI index. Good water quality was observed in 56% of the samples, but it is noteworthy that 29% of the samples were within the poor-quality range (PI > 100). This groundwater quality index is related to the texture and structure of the soil. Since a high content of ions such as sodium, magnesium, calcium, and bicarbonates were found in the aquifer, the PI index also suggests that the use of groundwater for irrigation could affect the soil permeability [19]. Moreover, the high levels of bicarbonates over calcium and magnesium make groundwater unsuitable for irrigation uses.
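Several of the indices discussed above can be sketched in Python using their commonly cited definitions, with all ion concentrations expressed in meq/L. These formulas are assumptions in the sense that Table 1 is not reproduced here, so the exact expressions used by the authors may differ. The function names and the sodium and potassium values are hypothetical; the calcium and magnesium values are the mean concentrations quoted in the text.

```python
import math

# Approximate equivalent weights (mg per meq) used to convert mg/L to meq/L.
EQUIVALENT_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10}


def to_meq(mg_per_l):
    """Convert a dict of ion concentrations from mg/L to meq/L."""
    return {ion: conc / EQUIVALENT_WEIGHT[ion] for ion, conc in mg_per_l.items()}


def irrigation_indices(meq):
    """Return SAR, %Na, KR and MH from Ca, Mg, Na and K given in meq/L."""
    ca, mg, na, k = meq["Ca"], meq["Mg"], meq["Na"], meq["K"]
    return {
        # Sodium adsorption ratio.
        "SAR": na / math.sqrt((ca + mg) / 2.0),
        # Sodium percentage.
        "Na_percent": (na + k) / (ca + mg + na + k) * 100.0,
        # Kelly's ratio.
        "KR": na / (ca + mg),
        # Magnesium hazard.
        "MH": mg / (ca + mg) * 100.0,
    }


# Ca and Mg are the mean values from the text; Na and K are hypothetical.
sample_mg_per_l = {"Ca": 41.93, "Mg": 19.25, "Na": 15.0, "K": 3.0}
indices = irrigation_indices(to_meq(sample_mg_per_l))
```

Computing the indices per sample and per well in this way would make it possible to reproduce percentage breakdowns such as those reported for %Na and MH.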

The hardness of groundwater varied from soft to very hard. This variation is related to urbanization since soft groundwater was found at P1, characterized by the presence of agricultural areas, with low population density and small settlements, while very hard groundwater was located at P2 which is characterized by a mineralized subsoil. Hardness levels found in this study could be considered normal according to that suggested by Udeshani [49], who reports similar TH values in the groundwater of Sri Lanka.

The electrical conductivity in the Cuernavaca aquifer was found between 90 and 991 μS/cm. Despite this high variation, most of the samples were in a good quality range according to Tutmez's [50] classification (EC level between 0 and 750 μS/cm). The levels of electrical conductivity have been increasing in recent years. This increase could also be related to the loss of vegetation cover due to urbanization [51]. However, electrical conductivity in groundwater showed a satisfactory quality classification because the presence of ions (Ca<sup>2+</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, K<sup>+</sup>, Cl<sup>−</sup>, HCO<sub>3</sub><sup>−</sup>, SO<sub>4</sub><sup>2−</sup>, and NO<sub>3</sub><sup>−</sup>) is within the permissible limits according to the standards of the World Health Organization (WHO). Determining groundwater suitability is important to understand the potential negative impacts of a high ion content on crop production and to mitigate groundwater contamination problems in order to support healthy crop production [31].

#### **4. Conclusions**


**Author Contributions:** Conceptualization, J.G.L.; formal analysis, S.A.M.-A.; investigation, R.E.-V. and Y.B.-T.; methodology, J.G.L. and V.B.-T.; project administration, V.B.-T.; supervision, Y.B.-T.; validation, J.G.R.-P.; visualization, S.A.M.-A. and A.Q.-C.; writing—original draft, J.G.L. and Y.B.-T.; writing—review and editing, J.G.R.-P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this study are available on request from the corresponding author.

**Acknowledgments:** The authors kindly acknowledge the National Water Commission for providing data and the Polytechnic University of Morelos for their support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Groundwater Quality Monitoring Using In-Situ Measurements and Hybrid Machine Learning with Empirical Bayesian Kriging Interpolation Method**

**Delia B. Senoro 1,2,3,4,\* , Kevin Lawrence M. de Jesus 2,3,4, Leonel C. Mendoza 4,5 , Enya Marie D. Apostol 4,5 , Katherine S. Escalona 4,6 and Eduardo B. Chan <sup>7</sup>**

	- **\*** Correspondence: dbsenoro@mapua.edu.ph; Tel.: +63-2-8251-6622

#### **Featured Application: In-Situ and Hybrid Machine Learning—Geostatistical Interpolation method for groundwater quality monitoring applications.**

**Abstract:** This article discusses the assessment of groundwater quality using a hybrid technique that makes groundwater (GW) quality monitoring more convenient. Twenty-eight (28) GW samples representing 62 barangays in Calapan City, Oriental Mindoro, Philippines were analyzed for their physicochemical characteristics and heavy metal (HM) concentrations. The 28 GW samples were collected at suburban sites identified by coordinates obtained with a Global Positioning System Montana 680 device. The analysis of heavy metal concentrations was conducted onsite using portable handheld X-Ray Fluorescence (pXRF) spectrometry. A hybrid machine learning-geostatistical interpolation (MLGI) method, specifically neural network particle swarm optimization with Empirical Bayesian Kriging (NN-PSO+EBK), was employed for data integration and for GW quality spatial assessment and monitoring. Spatial maps of metal concentrations were produced using NN-PSO+EBK. In addition, a spot map was created for the observed metal concentrations and compared to the spatial maps. Results showed that the created maps achieved low mean squared errors (MSEs), with values of 1.404 × 10<sup>−4</sup>, 5.42 × 10<sup>−5</sup>, 6.26 × 10<sup>−4</sup>, 3.7 × 10<sup>−6</sup>, and 4.141 × 10<sup>−4</sup> for Ba, Cu, Fe, Mn, and Zn, respectively. Also, cross-validation of the observed and predicted values resulted in R values within 0.934–0.994, indicating high accuracy. Based on these results, it can be stated that the technique is efficient for groundwater quality monitoring. Utilization of this technique could be useful in regular and efficient GW quality monitoring.

**Keywords:** groundwater; heavy metals; physicochemical parameters; in-situ; machine learning; geostatistical analysis

**Citation:** Senoro, D.B.; de Jesus, K.L.M.; Mendoza, L.C.; Apostol, E.M.D.; Escalona, K.S.; Chan, E.B. Groundwater Quality Monitoring Using In-Situ Measurements and Hybrid Machine Learning with Empirical Bayesian Kriging Interpolation Method. *Appl. Sci.* **2022**, *12*, 132. https://doi.org/10.3390/app12010132

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 5 December 2021 Accepted: 17 December 2021 Published: 23 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Water quality is associated with ecosystem preservation, economic growth and social development [1]. Groundwater (GW) quality is critical to the Philippines' overall water resources; hence, its monitoring should be given attention. Population expansion and the acceleration of modernization and industrialization have resulted in increased water demand [2]. The quality of both surface water and GW is inevitably compromised in areas where the economy is in transition and where urbanization, industrialization, and agricultural activities are increasing [3]. However, the scarcity of laboratory instruments for elemental analysis in the Philippines, the sample preparation requirements of laboratory station-based instruments, and the travel time from sampling sites to laboratory stations pose a challenge to GW quality monitoring, especially to elemental concentration analysis.

Rice is part of the regular daily meals of the Philippine population. Rice fields and techniques to produce high and good-quality yields are among the agricultural areas and programs, respectively, that are supported by the Philippine government. Calapan City, in Oriental Mindoro province in the Philippines, with GW as its primary water source, is among the top producers of rice in the country. Rice fields require highly flattened areas and good water quality, among other criteria, to increase yields with good rice quality. Because the Philippines lies within the tropical storm belt, Calapan City experiences regular flooding, as shown in Figure 1. This condition poses a challenge to the water quality for rice fields and for domestic supply.

**Figure 1.** Flood Hazard Map of Calapan City [4].

The Philippines is rich in natural resources such as metals and non-metallic minerals [5] and has a tropical climate with high annual rainfall. However, anthropogenic activities driven by economic development, population growth, and urbanization have caused unintentional adverse effects on the environment. When the pristine environment is disturbed and minerals are exposed to oxygen and water, chemical reactions occur. A similar condition takes place during weathering. Given the country's rich mineral resources, high annual rainfall, large flattened agricultural areas, and reliance on GW as the primary source of water supply, GW quality monitoring is important. Water with elevated concentrations of metals such as arsenic (As), copper (Cu), iron (Fe), nickel (Ni), manganese (Mn), lead (Pb), and other metals known for their toxicity would have acute and/or chronic adverse effects on human health [6–8].

Several technologies exist for measuring the presence of metals in GW, such as Inductively Coupled Plasma-Mass Spectrometry (ICP-MS), Inductively Coupled Plasma-Optical Emission Spectrometry (ICP-OES), and Atomic Absorption Spectrometry (AAS). These approaches are laboratory-based and require several days before the results of the analysis are available. This makes them unsuitable for field monitoring and measurements [9] in suburban and rural areas, in areas where access is a challenge, and when time is a primary criterion in the analysis. These conditions require in-situ measurements, and accurate detection becomes a critical component in monitoring GW quality. In-situ measurements provide observations at a rapid pace and cover wider areas, especially those that are difficult to access. This is in contrast to laboratory-based methods, which have significant limitations such as expensive instrumentation with limited availability, complex sample preparation, and limited applicability in field conditions [10]. There are on-site detection and monitoring techniques such as electrochemical analysis [10,11], cyclic voltammetry (CV) [12], anodic stripping voltammetry (ASV) [13], square wave anodic stripping voltammetry (SWASV) [14], electrochemical impedance spectroscopy (EIS) [15], electrochemiluminescence (ECL) [16], and the use of piezoelectric biosensors [17]. However, these techniques have drawbacks, such as difficulty in controlling background noise and an inability to fulfill current requirements for selectivity [9], the detection limits of CV [12], the insolubility of metals and the multiple peaks of ASV [13], complicated interferences and complex matrices in SWASV [14], the inability of EIS to identify different ions [15], frequent fouling of electrodes in ECL [16], and the fact that only a few enzymes are sensitive to heavy metals in the case of biosensors [17]. Hence, the portable X-ray fluorescence spectrometry (pXRF) technique is appropriate for onsite metal detection and analysis in rugged conditions, while providing the user with accurate and rapid analysis. This contrasts with the limitations of laboratory-based methods noted above [18]. Therefore, this non-destructive analytical technique, with its relatively simple spectral lines free of many interferences and its rapid multi-element analyses [19], contributes significantly to the successful implementation of this study.

Concentration maps are a frequently used tool for spatial monitoring. Spatial information on water resources is limited, and GW quality data can be obtained only through spot sampling. However, this procedure often requires extensive manpower and resources [10]. In addition, the density of the sampling locations influences the accuracy of the generated spatial maps [19]. Integrating in-situ measurements with GIS-based spatial interpolation techniques improves the presentation and display of the status of GW quality in an area. This integrated approach provides clear and informative base maps that researchers, policy makers and implementers can use for planning proposals [20] and for creating strategic programs.

Several studies on GW monitoring and assessment using GIS-based approaches have focused on Southeast Asian countries such as the Philippines [1,21], Thailand [22], Malaysia [23], Singapore [24], Indonesia [25], Cambodia [26], Laos [27], and Vietnam [28]. This article illustrates the quantification and mapping of the concentrations of heavy metals (Ba, Cu, Fe, Mn, and Zn) in GW and presents in-situ GW quality monitoring that combines pXRF with a hybrid machine learning-geospatial interpolation technique. This type of technique and analysis gives prompt, accurate data and information on the current GW quality. It addresses the challenges encountered on the ground during sampling activities, the scarcity of instruments in the Philippines due to their price, and the complex sample preparation required by some laboratory-based instruments. Analyzing heavy metal concentrations with a faster, accurate and convenient method can help researchers, authorities, water utility companies and local government units make prompt decisions and develop guidelines and strategic programs.

#### **2. Materials and Methods**

#### *2.1. Description of the Study Area*

The study area is Calapan City, in the province of Oriental Mindoro, Philippines. It is a third-class city and one of only two cities in the MIMAROPA region of the Philippines. It is the capital of the province and is located on the northeastern shore of the island of Mindoro. It has a population of about 150,000 people (about 25,000 households) and 62 barangays (the smallest local government unit) [29]. The city lies at 13°22′ N latitude and 121°9′ E longitude, and Mindoro is located at approximately 13°11′ N latitude and 121°53′ E longitude, south of mainland Luzon. The island of Mindoro is known for rice production. Calapan City has an area of 217.30 square kilometers. Deep and shallow wells, in addition to piped water supply, are currently the primary sources of water in the city.

#### *2.2. Collection/Treatment of Groundwater Samples*

The GW samples were collected from twenty-eight (28) suburban deep and shallow well sites, shown in Figure 2, following USEPA procedures SESDPROC-301-R3/SESDPROC-111-R4 [30]. The samples were collected using a stainless steel sampler and polyethylene (PE) bottles. The PE bottles were thoroughly pre-washed with Type 1 water. Each PE bottle was carefully labeled, sealed, and placed temporarily in a cooler in preparation for the detection of metal concentrations in all collected GW samples.

**Figure 2.** Map of the Study Area and Sampling Sites.

#### *2.3. Physicochemical and Metal Concentrations Analysis*

Temperature, pH, electrical conductivity (EC), and total dissolved solids (TDS) of the samples were determined onsite using a multi-parameter water analyzer (HANNA HI 9811-5) with an HI1285-5 probe (electrode) and HI7007, HI70031 and HI70032 solutions for calibration [31]. The HI7007, HI70031 and HI70032 solutions were used for pH, EC and TDS calibration, respectively, while the HI700661 solution was used for cleaning the electrode. The physicochemical values detected in groundwater were compared to the permissible limits specified in the 2017 Philippine National Standards for Drinking Water (PNSDW) [32] and the WHO Drinking Water Guidelines [33,34]. These water parameters were used in the hybrid machine learning technique.

The heavy metal concentrations were analyzed using a portable handheld Olympus Vanta X-ray fluorescence (pXRF) spectrometer. The pXRF is a rapid and accurate onsite elemental analyzer that can be used for various environmental media, including water [35–39]. The pXRF was set to geochem mode and recorded the metal concentrations detected in the GW samples in ppm (mg/L). The target metals were Ba, Cu, Fe, Mn and Zn.

#### *2.4. Spatial Concentration Mapping Using Machine Learning Informed Empirical Bayesian Kriging (EBK) Method*

A hybrid machine learning-geostatistical interpolation (MLGI) method was employed. The EBK technique was used to produce spatial concentration maps of the GW's physicochemical properties and heavy metal concentrations. By sub-setting and replicating the observed data, EBK automates the most time-consuming aspects of constructing a valid kriging model. EBK estimates a distribution of semivariogram models rather than a single one and thereby accounts for the uncertainty of the semivariogram estimate. It relies on restricted maximum likelihood (REML) estimation, in contrast to other kriging models that rely on weighted least squares estimation, which is why it is considered more realistic than other current geostatistical modeling methods. EBK has several significant benefits, including a low need for interactive modeling, more accurate prediction of standard errors and more accurate predictions for small datasets than traditional kriging techniques, and accurate prediction of nonstationary data [40].
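
To make the semivariogram idea concrete, the following is a minimal sketch of the classical (Matheron) empirical semivariogram, the quantity to which kriging models are fitted. It is not the EBK algorithm itself, which additionally subsets the data and works with a whole distribution of semivariograms. The function name, the bin edges and the toy coordinates are illustrative assumptions and are not taken from the study.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, values, bin_edges):
    """Matheron estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per lag bin.

    points    : list of (x, y) coordinates
    values    : measured value at each point (e.g. a concentration)
    bin_edges : increasing lag distances delimiting the bins
    Returns one (mean_lag, gamma, n_pairs) tuple per non-empty bin.
    """
    n_bins = len(bin_edges) - 1
    sq_diff = [0.0] * n_bins
    lag_sum = [0.0] * n_bins
    count = [0] * n_bins
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])
        for b in range(n_bins):
            if bin_edges[b] <= h < bin_edges[b + 1]:
                sq_diff[b] += (values[i] - values[j]) ** 2
                lag_sum[b] += h
                count[b] += 1
                break
    return [
        (lag_sum[b] / count[b], sq_diff[b] / (2 * count[b]), count[b])
        for b in range(n_bins) if count[b] > 0
    ]

# Hypothetical toy data: four wells on a line, one distance unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_semivariogram(points, values, [0.5, 1.5, 2.5, 3.5])
for lag, gamma, n_pairs in variogram:
    print(f"lag={lag:.1f}  gamma={gamma:.3f}  pairs={n_pairs}")
```

In this toy case the semivariance grows with lag distance, which is the spatial-dependence pattern that a fitted semivariogram model summarizes.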

The Artificial Neural Network (ANN) approach is a subset of artificial intelligence methods inspired by biological neurons. It is capable of quickly learning patterns and forecasting the outcome of a problem in a multi-dimensional environment. ANN models are trained on datasets [41], and their efficacy is then evaluated. The training algorithm and the transfer function are two critical components of an ANN model. The Levenberg-Marquardt (LM) algorithm was chosen as the training algorithm since it is the quickest function for training a network, and the hyperbolic tangent sigmoid function was used as the transfer function because it is the recommended transfer function for rapid processes [42,43].
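
For reference, the hyperbolic tangent sigmoid transfer function and the forward pass of a network with one hidden layer can be written as below. The source does not give the network equations, so the single hidden layer, the linear output layer and the symbols $\mathbf{W}_1$, $\mathbf{b}_1$, $\mathbf{W}_2$, $\mathbf{b}_2$ (weights and biases) are assumptions made for illustration.

$$
\operatorname{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1 = \tanh(n), \qquad
\hat{y} = \mathbf{W}_2 \, \operatorname{tansig}\!\left(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1\right) + \mathbf{b}_2
$$

Here $\mathbf{x}$ is the input vector and $\hat{y}$ the predicted value. The transfer function maps any input to the interval $(-1, 1)$.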

Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique inspired by the collaborative nature of biological communities. PSO is initialized with a population of randomly generated particles as candidate solutions. It searches for the global optimum through iterations in which the particles, each with its own velocity, fly through the search space following the current best particles. PSO was integrated with the ANN to determine the weights and biases that give the minimum error [44].
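
The idea of letting PSO search for network weights and biases can be sketched as follows. This is a minimal global-best PSO with an inertia weight, applied to a tiny one-input network with a tanh hidden layer. It is not the authors' implementation: the network size, the swarm settings (`w`, `c1`, `c2`, 30 particles, 200 iterations) and the toy training data are all assumptions chosen for illustration.

```python
import math
import random

def predict(params, x, n_hidden):
    """One-input, one-output network: tanh hidden layer, linear output."""
    # Assumed parameter layout: [w1 (h values), b1 (h), w2 (h), b2 (1)]
    w1 = params[:n_hidden]
    b1 = params[n_hidden:2 * n_hidden]
    w2 = params[2 * n_hidden:3 * n_hidden]
    b2 = params[3 * n_hidden]
    return sum(w2[k] * math.tanh(w1[k] * x + b1[k]) for k in range(n_hidden)) + b2

def mse(params, xs, ys, n_hidden):
    """Mean squared error of the network over the training set."""
    return sum((predict(params, x, n_hidden) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def pso(objective, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best_position, best_value, history of best values)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

# Hypothetical training set: samples of y = x^2 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]
n_hidden = 3
best, best_mse, history = pso(lambda p: mse(p, xs, ys, n_hidden), dim=3 * n_hidden + 1)
print(f"initial best MSE = {history[0]:.4f}, final best MSE = {best_mse:.6f}")
```

Each particle is one complete set of weights and biases, and the swarm's best error can only decrease or stay the same from one iteration to the next.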

This hybrid Neural Network-Particle Swarm Optimization (NN-PSO) approach, integrated with the EBK method, was applied to generate the spatial concentration maps of the physicochemical parameters and heavy metal (HM) concentrations in the study area.

#### **3. Results**

The following sections present the results of the study and compare them with the WHO and PNSDW guidelines.

#### *3.1. Physicochemical Groundwater Parameters*

The physical and chemical properties of groundwater recorded at the 28 sampling points are shown in Table 1 and compared with the WHO (2017) and PNSDW (2017) guidelines. A detailed description of the sampling locations in the study area, together with the physicochemical properties of the GW, is given in Table A1 of Appendix A.


**Table 1.** The groundwater physical and chemical properties.

The GW temperatures ranged from 26.2 to 33.6 °C, which could lead to increased release rates of metals, especially at water temperatures of 30–35 °C [47]. Furthermore, Zhu et al. (2010) [48] attributed high GW temperatures to rapid urbanization, which was also observed in the City of Calapan. The recorded pH of GW ranged from 6.7 to 8.8, which is within the pH range guidelines set by the WHO and PNSDW [49]. The release rates of metals are affected by a lower water pH. Water with a lower pH is acidic and known to be aggressive, enhancing the breakdown of Fe and Mn and resulting in an unpleasant taste [50]. This condition could have adverse effects including heavy metal poisoning and toxicity [51–53]. The majority of the water samples were slightly basic, which could be attributed to the presence of carbonates and bicarbonates [54]. The recorded TDS and EC ranges were 40–900 ppm and 100–1820 μS/cm, respectively. The TDS in GW was found to be below the WHO guidelines but beyond the PNSDW guidelines. TDS and EC were found to be positively correlated [55]. The elevated EC of 1820 μS/cm has been attributed to inorganic chemicals in ionized form in the water [54], such as metal elements.

#### *3.2. Heavy Metal Concentrations*

The presence of heavy metals was investigated in GW samples collected from the 28 sampling sites indicated in Figure 2. Detected concentrations were compared to the maximum allowable limits of the WHO and the PNSDW 2017, which are enumerated in Table 2. The toxicants found in the GW samples are discussed in more detail in the subsections below.


**Table 2.** Permissible limits of metals in groundwater.

<sup>1</sup> Guidance value.

#### 3.2.1. Barium

Ba concentrations at all sampling locations were below the permissible limits of the WHO, USEPA and PNSDW (Figure 3). The presence of Ba in GW has been attributed to the weathering of rocks such as igneous rocks, sandstone, shale, and coal [57,58].

**Figure 3.** Concentration of Ba in GW samples.

#### 3.2.2. Copper

The Cu concentrations in GW samples from all sampling locations (Figure 4) did not exceed the WHO guideline of 1.3 mg/L and were also within the acceptable limit of the PNSDW (1 mg/L). Trace amounts of Cu in GW are associated with the kind of rock that forms the aquifer [59]. Another possible source of copper is the pipeline.

**Figure 4.** Concentration of Cu in GW samples.

#### 3.2.3. Iron

Iron stains laundry and plumbing fixtures at concentrations above 0.3 mg/L and can also give water a metallic taste [60]. Hence, the USEPA set an allowable concentration limit of 0.3 mg/L. The majority of the Fe in GW comes from minerals and sediments and may be in particulate or dissolved form [61]. The Fe concentration at each sampling location is presented in Figure 5. Sampling location 8 recorded an elevated Fe concentration. This was attributed to a longer residence time [62], which is associated with the type of subsurface (aquifer) and creates the opportunity for metals to react through chemical and physical weathering [63]. In addition, Figure 2 shows that the area around sampling point 8 has fewer active wells, which also contributes to a longer residence time of GW.

**Figure 5.** Concentration of Fe in GW samples.

#### 3.2.4. Manganese

Mn concentrations in the groundwater samples from all sampling locations did not exceed the WHO's maximum permissible level (Figure 6). The natural occurrence of Mn in GW can be influenced by several factors including TDS, GW level fluctuations, and residence time. Agricultural operations and domestic wastewater are two additional potential sources of Mn that can adversely affect GW quality [64,65].

**Figure 6.** Concentration of Mn in GW samples.

#### 3.2.5. Zinc

The highest concentration of Zn was recorded at Brgy. Gutad. However, this concentration was within the WHO and PNSDW permissible limits. No Zn was detected at Brgy. Balingayan, Brgy. Maidlang, Brgy. Managpi, and Brgy. Personas. Zinc is naturally found in GW, and acidity affects the water quality [64]; hence, it is important that monitoring is carried out. The higher the acidity (i.e., the lower the pH) of the water, the higher the Zn concentration. As observed in Table 1, GW samples from all sampling sites were slightly basic, which explains the low Zn concentrations. The Zn concentration at each sampling location is presented in Figure 7.

**Figure 7.** Concentration of Zn in GW samples.

#### *3.3. Correlation Analysis*

The correlations between the physicochemical characteristics were investigated using Pearson correlation analysis in IBM SPSS (International Business Machines Statistical Package for the Social Sciences). The r and *p* values are reported: r expresses the strength and direction of the relationship between variables, and *p* expresses its significance. A lower *p*-value denotes statistical significance, whereas a higher *p*-value denotes the opposite. A negative correlation was found between pH and the other variables, while positive correlations were found among the other parameters. All relationships were significant at the 0.01 level. The correlation matrix for the physicochemical parameters is presented in Table 3.


**Table 3.** Correlation matrix of the physicochemical parameters.

\*\* Correlation is significant at the 0.01 level (2-tailed).

A substantial negative correlation was observed between pH and temperature (r = −0.665), pH and EC (r = −0.602), and pH and TDS (r = −0.603), which is similar to the findings of Abou Zakhem et al. (2017) [66] and Sunkari and Abu (2019) [67]. On the other hand, a substantial positive correlation was observed between temperature and EC (r = 0.657), temperature and TDS (r = 0.664), and EC and TDS (r = 0.995). These correlation values agree with the findings of Wali et al. (2021) [68].
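
As a rough check on how a coefficient of this size relates to the 0.01 significance level, the sketch below computes Pearson's r from paired data and converts a reported r into a t statistic with n − 2 degrees of freedom. The study used IBM SPSS, so this is only an illustration: the function names and the demo readings are hypothetical, n = 28 is taken from the number of sampling points, and 2.779 is the standard two-tailed critical t value for α = 0.01 with 26 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical readings (not the study's data): pH rises while EC falls.
ph = [6.7, 7.2, 7.5, 7.9, 8.3, 8.8]
ec = [1500, 900, 700, 400, 300, 100]
r_demo = pearson_r(ph, ec)

# Reported coefficient for pH vs. temperature, assuming n = 28 sampling points.
t_ph_temp = t_statistic(-0.665, 28)
T_CRIT_01 = 2.779  # two-tailed critical t for alpha = 0.01, 26 degrees of freedom
print(f"demo r = {r_demo:.3f}; t(pH, temperature) = {t_ph_temp:.2f}; "
      f"significant at 0.01: {abs(t_ph_temp) > T_CRIT_01}")
```

Under these assumptions, the magnitude of the t statistic for r = −0.665 is well above the critical value, which is consistent with the significance flag in Table 3.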

As with the physicochemical parameters, Pearson correlation analysis was also performed for the relationships between Ba, Cu, Fe, Mn, and Zn. Fe was positively correlated with Mn and Zn, and Mn was positively correlated with Zn. Substantial positive correlations between metals indicate that they share the same origin, are mutually dependent, and have similar transport characteristics [69]. The positive r values in this study point to a relationship between these metals; however, given the higher *p* values, this relationship was not statistically significant. The presence of these metals in GW is attributed to the natural weathering of rocks. The correlation matrix for the metal concentrations is shown in Table 4.


**Table 4.** Correlation matrix for the heavy metal concentration of groundwater samples.

\*\* Correlation is significant at the 0.01 level (2-tailed).

#### *3.4. Spatial Concentration Mapping Using NN-PSO + EBK*

The NN-PSO simulation was applied to enhance the prediction capability of the EBK method. The simulation showed excellent results, as evidenced by the mean squared error (MSE) and correlation coefficient (R) values, whose ideal values are 0 and 1, respectively. The NN-PSO simulation results for the physicochemical parameters and heavy metal concentrations are presented in Table 5. Correlation plots of R (validation) and R (testing) for the governing NN-PSO models of the physicochemical parameters and heavy metal concentrations are shown in Figure A1 of Appendix B.
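
The source does not spell out the two performance measures; assuming the standard definitions, for $N$ observed values $y_i$ with mean $\bar{y}$ and model predictions $\hat{y}_i$ with mean $\bar{\hat{y}}$, they are:

$$
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2, \qquad
R = \frac{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}
         {\sqrt{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2 \, \sum_{i=1}^{N} \left(\hat{y}_i - \bar{\hat{y}}\right)^2}}
$$

A perfect model gives $\hat{y}_i = y_i$ for every $i$, so that $\mathrm{MSE} = 0$ and $R = 1$, which are the ideal values quoted above.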


**Table 5.** NN-PSO Simulation Results.

The relationship between the number of hidden neurons (HN), ranging from 1 to 30, and the corresponding Akaike Information Criterion (AIC) values obtained for the physicochemical parameters (temperature, pH, EC, and TDS) and for the heavy metal concentrations is shown in Figures 8 and 9, respectively. These figures present the AIC values of all NN-PSO models for each number of hidden neurons simulated in this study. The best models for the physicochemical parameters were obtained with 25, 29, 30, and 27 HN for temperature, pH, EC and TDS, respectively. The best models for the heavy metal concentrations were obtained with 29, 29, 28, 29, and 30 HN for Ba, Cu, Fe, Mn, and Zn, respectively.
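
Model selection by AIC can be illustrated with a short sketch. The source does not state which form of the AIC was used; the code below assumes the common least-squares form, n·ln(MSE) + 2k, with k counted as the weights and biases of a network with a single hidden layer. The candidate errors, the sample size and the two inputs are hypothetical values chosen only to show the trade-off.

```python
import math

def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a network with a single hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def aic_least_squares(mse, n_samples, k):
    """AIC for Gaussian errors, up to an additive constant: n * ln(MSE) + 2k."""
    return n_samples * math.log(mse) + 2 * k

# Hypothetical errors for three candidate hidden-layer sizes.
candidates = {5: 0.040, 10: 0.010, 20: 0.009}
n_samples, n_inputs = 200, 2
scores = {
    h: aic_least_squares(m, n_samples, n_parameters(n_inputs, h))
    for h, m in candidates.items()
}
best_h = min(scores, key=scores.get)
for h, score in scores.items():
    print(f"hidden neurons = {h:2d}  AIC = {score:8.2f}")
print("lowest AIC at", best_h, "hidden neurons")
```

The model with the lowest AIC is preferred: here the largest network has a slightly smaller error, but the penalty for its extra parameters outweighs that gain.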

**Figure 8.** The AIC Values for Physicochemical Parameters.

**Figure 9.** The AIC Values for Heavy Metals.

The spatial concentration of the physicochemical parameters of GW in Calapan City, Oriental Mindoro was mapped using the NN-PSO+EBK interpolation method. The highest GW temperature recorded in the study area was 33.6 °C, observed in Brgy. Parang, while the lowest was 26.2 °C, observed in Brgy. Biga. The highest pH, 8.8, was observed at Brgy. Canubing I, and the lowest, 6.7, at Brgy. Sto. Nino. The highest EC and TDS values, 1820 μS/cm and 900 ppm, respectively, were observed in Brgy. Ibaba West. The lowest EC and TDS values, 100 μS/cm and 40 ppm, respectively, were recorded in Brgy. Sta. Rita. The spatial concentration of the physicochemical parameters of GW is shown in Figure 10.

**Figure 10.** Physicochemical parameters map of Calapan City (**a**) Temperature, (**b**) pH, (**c**) EC, and (**d**) TDS.

The spatial concentration of the heavy metals in the GW of Calapan City, Oriental Mindoro, including Ba, Cu, Fe, Mn, and Zn, was also mapped using the NN-PSO+EBK interpolation method. The resulting heavy metal concentration maps are presented in Figure 11.

**Figure 11.** Heavy metal concentration map of Calapan City (**a**) Ba, (**b**) Cu, (**c**) Fe, (**d**) Mn, and (**e**) Zn.

The highest Ba concentration was measured in Brgy. Canubing I, where it was 7.9 times more than the background value for Ba in the research area. The average Ba concentration across all sample sites was 5.6 times more than the background value. The highest Cu concentrations were measured at several locations and were three times greater than the background concentration of copper, while the mean Cu concentration was 2.1 times the background concentration. The Fe concentration was highest in Brgy. Gutad, where it was 6.1 times greater than the background value reported for the research region. Moreover, multiple sites exceeded the WHO standard for Fe: Brgy. Camansihan, Brgy. Ibaba East, Brgy. Masipit, Brgy. Parang, Site 2 of Brgy. Personas, Brgy. San Vicente East, Brgy. Sta. Cruz, Brgy. Sta. Rita, and Brgy. Sto. Nino. Meanwhile, the mean Fe concentration in Calapan City was just 0.5 percent more than the background level. Mn and Zn concentrations were highest in Brgy. Sto. Nino and Brgy. Gutad, respectively. However, these concentrations were still below the background concentrations in the research region. The heavy metal concentrations in the study area followed the trend Mn < Cu < Ba < Zn < Fe. Generally, these concentrations are within the WHO, USEPA and PNSDW limits, except for Fe in several areas.

#### *3.5. Cross Validation and Spot Sampling Evaluation Results*

The values predicted by the NN-PSO+EBK method were compared with the observed values using accuracy measures to test the robustness of the models. The cross-validation results, presented in Table 6, indicate robust and accurate models, with R values close to 1, and suggest that the models provide an accurate spatial distribution for the study area.
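
The source does not describe the cross-validation scheme in detail. A common choice for spatial interpolation is leave-one-out cross-validation, sketched below under that assumption: each site is removed in turn and predicted from the remaining sites. To keep the example short, simple inverse distance weighting (IDW) stands in for the NN-PSO+EBK interpolator, and the coordinates and values are hypothetical.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighting, used here only as a simple stand-in interpolator."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v  # the target coincides with a data point
        weight = 1.0 / d ** power
        num += weight * v
        den += weight
    return num / den

def leave_one_out(points, values, predict=idw_predict):
    """Predict each site from all the other sites and return the predictions."""
    predictions = []
    for i in range(len(points)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predictions.append(predict(other_points, other_values, points[i]))
    return predictions

# Hypothetical sites: the corners of a unit square plus its centre.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 2.0, 3.0, 2.0]
predicted = leave_one_out(points, values)
cv_mse = sum((p - o) ** 2 for p, o in zip(predicted, values)) / len(values)
print(f"cross-validation MSE = {cv_mse:.4f}")
```

The predicted and observed values obtained this way can then be summarized with measures such as the MSE and R.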



A spot sampling analysis was performed using data from households in different barangays of Calapan City. A total of 21,559 households were included in the spot sampling analysis, which is presented in Figure 12. The distribution of these households per barangay is presented in Figure 13.

**Figure 12.** Concentration of Zn in GW samples.

**Figure 13.** Distribution of the Number of Households included in the Spot Sampling Analysis.

The spot sampling results were compared with the spatial concentration maps in Figure 11. Table 7 shows the spot sampling comparison results for all heavy metals detected in the GW resources of Calapan City. The generated maps agreed well with the spot sampling values, with MSE values approaching zero.

**Table 7.** Spot Sampling Comparison Results.


Considering each barangay, the MSEs for each element were also obtained as presented in Table A2 of Appendix C. Figure 14 presents the summary of the spot sampling comparison results in each barangay for all heavy metals considered in the study.

**Figure 14.** Summary of MSE per Barangay considering (**a**) Ba, (**b**) Cu, (**c**) Fe, (**d**) Mn, and (**e**) Zn.

#### **4. Discussion**

Oriental Mindoro, an island province, is vulnerable to GW pollution and degradation due to natural and human activities. Due to structural disadvantages and characteristics such as smaller land area and population, insufficient natural resources, geographical distribution, and other global factors beyond domestic control, a small island economy is less resilient to the threat of GW deterioration and contamination than larger and more diverse economies [70].

Water plays a critical part in achieving the United Nations' Sustainable Development Goals (SDGs). One of the challenges is ensuring that everyone benefits from SDG 6 (clean water and sanitation), which seeks to guarantee universal access to, and sustainable management of, water. Continuous data integration and frequent monitoring remain critical components of achieving SDG 6 [71]. Hence, tools that make GW monitoring convenient for researchers and authorities, such as the hybrid NN-PSO+EBK, are important.

Various heavy metals were detected at various sampling sites across Calapan City. The mean concentrations of these metals in GW remained below the WHO and PNSDW acceptable levels. The recorded in-situ physicochemical characteristics were also compared to the WHO and PNSDW acceptable limits. The average GW temperature observed in the study area was 29.99 °C, while the average GW pH was 7.69. Both figures are within the permissible ranges of the WHO and PNSDW. One (1) sampling location exhibited a pH value exceeding the PNSDW maximum limit. The average EC for the study area was 560.36 μS/cm, which is within the permissible limit of the WHO. One (1) sampling location exhibited an EC value exceeding the WHO limit of 1500 μS/cm. The EC at this location was categorized as Type II: an EC greater than 1500 μS/cm but less than 3000 μS/cm implies medium salt enrichment [72]. The average TDS was 277.15 ppm, which is below the maximum allowable limits of the PNSDW and WHO. Although TDS levels did not exceed the permissible limits set by the WHO and PNSDW, several were at the high end of, or substantially lower than, the suggested TDS range of 600−1000 ppm, which may impair palatability. Specifically, water samples from Ibaba West recorded a TDS of 900 ppm, which is at the high end of the range and suggests the probability of impaired taste. On the other hand, the TDS concentrations recorded in Balingayan (60), Biga (50), Canubing (50), Comunal (90), Ibaba East Site 1 (160), Managpi (100), Personas (60), and Sta. Rita (40) were all significantly lower than the recommended range (600−1000 ppm), which may result in a flat and insipid flavor.

Heavy metal concentrations in GW were also measured in the study area. The Fe concentrations detected in multiple locations were above the WHO and PNSDW standards, whereas the other heavy metals detected were within the permissible limits. Water with elevated metal concentrations has the potential to cause several public health issues, and health risks associated with elevated Fe in GW are probable. Numerous studies have shown that pollutants entering the human body through drinking water have detrimental health consequences for consumers. Micronutrients are essential to living organisms; however, elevated concentrations adversely affect public health. The same is true of Mn, which is necessary for humans but has negative consequences in excessive quantities. Neurological disorders, such as abnormal gait, ataxia, muscle hypotonicity, and an expressionless face, are frequently associated with Mn [64]. Liver dysfunction has also been reported [73]. Furthermore, excess Mn has been demonstrated to produce neurotoxicity in infants receiving parenteral nutrition [74] and has been linked to lower IQ in children [64].

Meanwhile, asbestos-related cancer is believed to be caused by free radicals produced by iron, which can cause cancer by oxidizing and damaging DNA [75]. Additionally, elevated levels of Mn and Fe in drinking water have been associated with decreased birth weight in term-born infants [76]. Furthermore, because the intestinal mucosa of animals is highly porous, Ba<sup>2+</sup> ions are rapidly absorbed into the blood from the gastrointestinal system and lungs. Ba poisoning mostly affects the cardiovascular system; nevertheless, renal dysfunction has been documented as well [77].

In-situ and hybrid machine learning-geostatistical methods are an integral part of data integration for GW quality monitoring. GW contamination in an island province is a threat to public health, especially when GW is the primary source of domestic, agricultural and industrial water supply. The NN-PSO+EBK hybrid technique makes it possible to map the spatial variability of the contaminants that contribute to the deterioration of GW quality. As a result, future undesired consequences could be avoided using this monitoring technique. NN-PSO+EBK can offer periodic and long-term data for permanent monitoring of GW quality and for risk assessments. This tool can also serve as an early warning of detrimental changes in GW quality [78] caused by human activities and/or natural weathering.

#### **5. Conclusions**

An in-situ approach and a hybrid MLGI method, NN-PSO+EBK, were applied to assess and evaluate the GW quality in Calapan City, Oriental Mindoro, Philippines. Physicochemical characteristics and metal concentrations were measured onsite at various sampling locations. Generally, the physicochemical properties of the GW samples met the WHO and PNSDW guidelines. The average values for temperature, pH, EC, and TDS were within the permissible limits, though a few sampling locations exceeded the permissible limits of the WHO and PNSDW. The pH of all samples was within the limits set by the PNSDW. Barangays Buhuan, Camansihan, Gutad, Ibaba East (Site 2), Ibaba West, Ilaya, Lazareto, Maidlang, Masipit, Nag-iba II, Pachoca, Parang, San Vicente East, Sta. Cruz, and Sto. Nino recorded elevated EC values, which was attributed to the addition of leachable salts. Also, the recorded TDS values suggested probable impaired palatability, since many were significantly below the recommended range of 600−1000 ppm. Heavy metal analysis showed that only Fe, detected in multiple GW samples, had concentrations above the WHO and PNSDW maximum permitted levels, which presents health concerns to consumers. The Fe concentration recorded in the GW samples from Brgy. Gutad was above the WHO and PNSDW limits. The other target metals were within the WHO and PNSDW permissible limits in all GW samples. The spot sampling analysis showed that the maps generated by the hybrid NN-PSO+EBK technique were reliable in describing the heavy metal concentrations in the city of Calapan, based on their MSE, R and AIC values.

This study can serve as a reference for techniques for gathering GW quality monitoring data to help attain SDG 6. Further studies targeting other metals, with regular monitoring of their concentrations using this hybrid MLGI technique, are suggested. Regular monitoring is also necessary for a better understanding of the possible health consequences. Furthermore, a health risk assessment based on GW quality should be conducted. Finally, preliminary interventions for GW quality control are necessary.

**Author Contributions:** Conceptualization, D.B.S.; methodology, K.L.M.d.J. and D.B.S.; software, K.L.M.d.J. and D.B.S.; validation, D.B.S., L.C.M., K.S.E. and E.M.D.A.; formal analysis, K.L.M.d.J. and D.B.S.; investigation, L.C.M., E.M.D.A., K.S.E. and D.B.S.; resources, D.B.S.; data curation, D.B.S., L.C.M., E.M.D.A., K.S.E. and E.B.C.; writing—original draft preparation, K.L.M.d.J., L.C.M. and E.M.D.A.; writing—review and editing, E.B.C., D.B.S. and K.L.M.d.J.; visualization, K.L.M.d.J.; supervision, D.B.S.; project administration, D.B.S.; funding acquisition, D.B.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Philippine Council for Health Research and Development of the Department of Science and Technology, Philippines.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data are contained in the manuscript.

**Acknowledgments:** The authors acknowledge the in-kind support of Mapua University, Manila, Philippines, and the Calapan City local government units, and thank the Philippines Mines and Geosciences Bureau for providing the hazard maps for the MIMAROPA region.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Table A1.** Observed Physicochemical Properties of Groundwater Samples.


#### **Appendix B**

**Figure A1.** R Value Plots for Validation and Testing Phase of the NN-PSO: (**a**) Temperature; (**b**) pH; (**c**) EC; (**d**) TDS; (**e**) Ba; (**f**) Cu; (**g**) Fe; (**h**) Mn; (**i**) Zn.

#### **Appendix C**


**Table A2.** Spot Sampling Results in each Barangay of Calapan City.

#### **References**


### *Article* **Chaotic Characteristic Analysis of Vibration Response of Pumping Station Pipeline Using Improved Variational Mode Decomposition Method**

**Li Jiang <sup>1,2</sup>, Zhenyue Ma <sup>1</sup>, Jianwei Zhang <sup>2,</sup>\*, Mohd Yawar Ali Khan <sup>3</sup>, Mengran Cheng <sup>2</sup> and Libin Wang <sup>2</sup>**


**Abstract:** The measured vibrational responses of a pumping station pipeline at an irrigation site were used to confirm the chaotic characteristics of the pipeline vibration and to determine the vibrational excitation that makes it chaotic. First, the chaotic properties of the pipeline vibration responses were investigated using the saturation correlation dimension and the maximum Lyapunov exponent. Then, the vibration excitation with chaotic features was obtained using an improved variational mode decomposition (IVMD) method to examine the multi-time-scale chaotic characteristics of the pipeline vibration responses. The results show that the vibrational responses at each measuring point of the pipeline under different operating conditions have clear chaotic characteristics, with relatively strong chaotic characteristics at the axial points and bifurcated pipe points. The vibration at operating conditions and measurement points affected by the unit's operation and by changes in flow state is more complicated. After the IVMD decomposes the vibration response, an intrinsic mode function (IMF) component produces a low-dimensional chaotic attractor, whereas the remaining components, which represent the vibration excitation of the units, do not have chaotic properties, implying that water pulsation excitation makes the pumping station pipeline vibrations chaotic. The vibration excitation caused by the unit's operation masks the chaotic characteristics of the pipeline vibration and increases its uncertainty. The outcomes of this study provide a theoretical basis for further exploration of the vibration characteristics of pumping station pipelines, and a new method of chaos analysis is proposed.

**Keywords:** pumping station pipeline; chaotic characteristic; IVMD; vibration response; correlation dimension; Lyapunov exponent

#### **1. Introduction**

High-lift pumping stations and water-diversion irrigation areas have been built in many water-deficient areas due to the continuous development of electric water-lifting equipment and water-diversion irrigation technology in China. These projects have created enormous economic, ecological and social benefits. Thus, ensuring their safe and stable operation is the main task of modernising and developing water conservation in China [1]. Natural and human forces create varying degrees of pipeline vibration during long-term operation at pumping stations [2]. Long-term irregular pipe vibration will lead to the loosening of the pipelines and their auxiliary system, causing catastrophic damage in severe cases [3]. Therefore, it is of great research interest to analyse the vibration characteristics of the pumping station pipeline to avoid its adverse vibrations.

Chaos is a unique mechanical phenomenon in the vibration of strongly nonlinear structures. Most researchers believe that the vibrations of pipelines are weakly nonlinear,

**Citation:** Jiang, L.; Ma, Z.; Zhang, J.; Khan, M.Y.A.; Cheng, M.; Wang, L. Chaotic Characteristic Analysis of Vibration Response of Pumping Station Pipeline Using Improved Variational Mode Decomposition Method. *Appl. Sci.* **2021**, *11*, 8864. https://doi.org/10.3390/app11198864

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 18 August 2021 Accepted: 16 September 2021 Published: 23 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

so they mainly focus on chaotic phenomena caused by water flow and other excitation sources. Research on chaotic processes in the pipeline itself remains scarce. However, the chaotic phenomena of pipelines do not depend solely on the strength of structural nonlinearity; chaos can occur even in weakly nonlinear or linear structures [4]. Païdoussis studied the dynamics of a cantilever pipeline with nonlinear constraints and constant internal flow and discovered chaotic motion of the system [5]. Tang obtained the chaotic characteristics of a transport pipeline by increasing the nonlinear force and found that the occurrence of chaos is mainly affected by the flow velocity in the pipeline [6]. Sinir investigated the nonlinear vibrations of slightly curved pipes transporting fluid at constant velocity and observed both periodic and chaotic motions in the transverse vibrations [7]. Zhao analysed the chaotic phenomenon in pipeline vibration caused by flow pulsation excitation under thermal load and obtained the relationship between the frequency response and the flow velocity [8].

Research on the chaotic characteristics of pipeline systems mainly focuses on oil-gas pipelines and on mathematical models of pipelines with specific nonlinear constraints. In contrast, the chaotic characteristics of pumping station pipeline systems are rarely studied. Most previous studies have only analysed the chaotic characteristics of the vibration system and have not explored the vibration excitation that causes the chaos. In this paper, the measured vibration responses of a pumping station pipeline in an irrigation area are taken as the research object. The chaotic characteristics of the vibration responses of the pipeline under different working conditions are analysed using the saturation correlation dimension and the largest Lyapunov exponent. In addition, the IVMD method is used to decompose the vibration responses of the measurement points under typical working conditions, and the chaotic characteristics of the resulting IMFs are analysed to identify the vibration excitation that causes the chaotic characteristics of the pumping station pipeline.

#### **2. Theoretical Aspects**

#### *2.1. Identification Method of Chaotic Characteristics*

There are many methods for identifying chaotic characteristics, which are roughly divided into qualitative and quantitative analysis. Orbit observation, the Poincaré surface of section, and power spectral analysis are examples of qualitative approaches [9–11]. These methods are straightforward to apply, but they can only indicate whether the system has chaotic characteristics and cannot support comparisons across operating conditions. Quantitative methods, such as the saturation correlation dimension method [12] and the largest Lyapunov exponent method [13], reflect the vibration complexity and the degree of chaos under different conditions through the values of their parameters. To improve the reliability of the result, both the saturation correlation dimension and the largest Lyapunov exponent are chosen as the chaotic identification indexes of pipeline vibration responses in pumping stations.

#### 2.1.1. Saturation Correlation Dimension

The correlation dimension characterises the compactness of a dynamic system and is used to reflect the system's complexity. When the saturation correlation dimension is fractional, the system is said to have chaotic properties. For an m-dimensional phase space, its correlation function can be defined as follows:

$$C(r) = \lim_{M \to \infty} \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} H\left(r - \left\| Y_i - Y_j \right\|\right) \tag{1}$$

where *M* = *N* − (*m* − 1)*τ* is the number of phase points; *H*(·) is the Heaviside function; *N* is the length of the time series; *r* is the radius (distance threshold) around each phase point; *m* is the embedding dimension; *τ* is the time delay; and *Y*<sub>*i*</sub> is the *i*th reconstructed phase-space vector.

When the time series is chaotic, for sufficiently small positive *r* the correlation function *C*(*r*) scales with *r* as

$$C(r) \propto \alpha r^{D_2} \tag{2}$$

where *α* is a constant and *D*<sub>2</sub> is the correlation dimension, which is obtained from the slope of the log<sub>2</sub> *C*(*r*) ∼ log<sub>2</sub> *r* curve, that is

$$D_2 = \lim_{r \to 0} \frac{\log_2 C(r)}{\log_2 r} \tag{3}$$

Because the measured signal contains noise, the embedding dimension is generally increased gradually. For each embedding dimension, the apparent straight-line segment of the log<sub>2</sub> *C*(*r*) ∼ log<sub>2</sub> *r* curve is fitted using the least squares method. The slope of this segment increases with the embedding dimension and eventually saturates; the saturated value is the saturation correlation dimension.
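The procedure behind Equations (1)–(3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `delay_embed`, `correlation_sum` and `correlation_dimension` are hypothetical, the limit *M* → ∞ is replaced by the finite sample, and the scaling range of radii is assumed to be supplied by the user.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Phase-space vectors Y_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau            # number of phase points, M = N - (m-1)*tau
    if M < 2:
        raise ValueError("time series too short for this m and tau")
    return np.column_stack([x[k * tau : k * tau + M] for k in range(m)])

def correlation_sum(Y, radii):
    """Finite-sample C(r) of Eq. (1): fraction of pairs i < j closer than r."""
    M = len(Y)
    diff = Y[:, None, :] - Y[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(M, k=1)]     # each pair counted once
    return np.array([(pair_dist < r).mean() for r in radii])

def correlation_dimension(Y, radii):
    """Least squares slope of log2 C(r) against log2 r, as in Eq. (3)."""
    radii = np.asarray(radii, dtype=float)
    C = correlation_sum(Y, radii)
    keep = C > 0                                  # log2 is undefined for empty balls
    if keep.sum() < 2:
        raise ValueError("need at least two radii with C(r) > 0")
    slope, _ = np.polyfit(np.log2(radii[keep]), np.log2(C[keep]), 1)
    return slope

if __name__ == "__main__":
    # Synthetic example only: a sine wave, not measured pipeline data.
    t = np.arange(600)
    Y = delay_embed(np.sin(0.1 * t), m=3, tau=15)
    print(correlation_dimension(Y, np.logspace(-1.5, -0.5, 8)))
```

Repeating `correlation_dimension` for increasing *m* and checking where the slope stops growing mirrors the saturation check described above. The pairwise distance matrix needs memory proportional to *M*², so long records would have to be subsampled or processed in blocks.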

#### 2.1.2. Largest Lyapunov Exponent

The Lyapunov exponent identifies the chaotic characteristics of a system from the divergence of its phase trajectories. Generally, the expanding directions, which correspond to positive Lyapunov exponents, support the attractor, whereas the contracting directions, which correspond to negative Lyapunov exponents, contribute the fractional part of the attractor dimension after counteracting the effect of the expanding directions. A positive Lyapunov exponent is therefore a prominent feature of chaos. Let *λ*<sub>1</sub> be the largest Lyapunov exponent of a system; the system contains chaotic components if *λ*<sub>1</sub> is positive, and its value reflects the degree of chaos.

Rosenstein [14] proposed the small data sets method for computing *λ*<sub>1</sub>. Its basic steps are as follows. Choose suitable *τ* and *m* to reconstruct the phase space, and find the nearest neighbor *Y*<sub>*î*</sub> of each point *Y*<sub>*i*</sub> in the phase space, subject to a minimum temporal separation:

$$d_i(0) = \min_{\hat{i}} \left\| Y_i - Y_{\hat{i}} \right\|, \quad \left| i - \hat{i} \right| > p \tag{4}$$

where *p* is the average period of the time series, *i* is the index of the reference vector, and *î* is the index of its nearest neighbor.

Define the distance between *Y*<sub>*i*+*j*</sub> and *Y*<sub>*î*+*j*</sub> as

$$d_i(j) = \left\| Y_{i+j} - Y_{\hat{i}+j} \right\| \tag{5}$$

where *j* = 0, 1, 2, ··· , min(*M* − *i*, *M* − *î*).

For each *j*, compute the average of ln *d*<sub>*i*</sub>(*j*) over all *i* as follows:

$$y(j) = \frac{1}{q\Delta t} \sum_{i=1}^{q} \ln d_i(j) \tag{6}$$

where *q* is the number of nonzero *d*<sub>*i*</sub>(*j*) and Δ*t* is the sampling interval. The slope of the regression line of *y*(*j*) against *j*, fitted by the least squares method, is *λ*<sub>1</sub>.
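The steps of the small data sets method can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name `largest_lyapunov` and the parameters `n_steps` (how far each pair is followed) and `fit_steps` (length of the linear region used for the fit) are assumptions introduced here, and the linear region would in practice be chosen by inspecting the curve.

```python
import numpy as np

def largest_lyapunov(x, m, tau, p, n_steps, fit_steps, dt=1.0):
    """Small data sets estimate of the largest Lyapunov exponent, Eqs. (4)-(6)."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau                      # number of phase points
    Y = np.column_stack([x[k * tau : k * tau + M] for k in range(m)])
    n_ref = M - n_steps                             # points that can be followed n_steps ahead
    idx = np.arange(n_ref)

    # Pairwise distances between reference points.
    diff = Y[:n_ref, None, :] - Y[None, :n_ref, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Eq. (4): only neighbours with temporal separation |i - i_hat| > p are allowed.
    dist[np.abs(idx[:, None] - idx[None, :]) <= p] = np.inf
    nn = dist.argmin(axis=1)

    y = np.empty(n_steps + 1)
    for j in range(n_steps + 1):
        d = np.linalg.norm(Y[idx + j] - Y[nn + j], axis=1)   # Eq. (5)
        d = d[d > 0]                                          # keep the q nonzero distances
        y[j] = np.log(d).mean() / dt                          # Eq. (6)

    steps = np.arange(fit_steps + 1)
    slope, _ = np.polyfit(steps, y[: fit_steps + 1], 1)       # least squares slope
    return slope, y

if __name__ == "__main__":
    # Synthetic example only: the logistic map, a standard chaotic test series.
    x = np.empty(1000)
    x[0] = 0.1
    for n in range(999):
        x[n + 1] = 4.0 * x[n] * (1.0 - x[n])
    lam, _ = largest_lyapunov(x, m=2, tau=1, p=10, n_steps=8, fit_steps=4)
    print(lam)
```

Because *y*(*j*) is already divided by Δ*t*, the slope against the step index *j* is expressed per unit time. A positive slope over the initial near-linear region is the indicator of chaos used in this paper.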

#### *2.2. Improved Variational Mode Decomposition (IVMD)*

Variational mode decomposition (VMD) is a multi-component adaptive signal decomposition method [15]. Compared with traditional signal decomposition methods, it effectively avoids modal aliasing and over-decomposition and is therefore of greater practical value [16]. VMD comprises two processes: establishing the variational constraint problem and iterating to find its optimal solution. Specifically, VMD decomposes a given signal *f* into *K* modal functions *m*<sub>*k*</sub>(*t*) under variational constraints. The bandwidth of each IMF is limited, and each IMF is concentrated around its central frequency. The variational constraint model is as follows [17]:

$$\begin{cases} \min\limits_{\{m_k\},\{w_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j w_k t} \right\|_2^2 \right\} \\ \text{s.t.} \ \sum_{k} m_k = f \end{cases} \tag{7}$$

where {*m*<sub>*k*</sub>} = {*m*<sub>1</sub>, *m*<sub>2</sub>, ··· , *m*<sub>*K*</sub>} represents the *K* decomposed IMF components; *δ*(*t*) is the unit pulse (Dirac) function; ∗ denotes convolution; and {*w*<sub>*k*</sub>} = {*w*<sub>1</sub>, ··· , *w*<sub>*K*</sub>} are the central frequencies of the IMFs.

To complete the adaptive decomposition of the input signal *f* and obtain the IMFs with the minimum sum of bandwidths, the following augmented Lagrangian is introduced:

$$L(\{m_k\}, \{w_k\}, \lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j w_k t} \right\|_2^2 + \left\| f(t) - \sum_k m_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_k m_k(t) \right\rangle \tag{8}$$

where *α* is the penalty factor that ensures the accuracy of signal reconstruction; *λ*(*t*) is the Lagrange multiplier used to strengthen the constraint; and ⟨·, ·⟩ represents the inner product operation.

To solve the above variational constraint problem, dual decomposition and the alternating direction method of multipliers (ADMM) are used [18]. *m*<sub>*k*</sub>, *w*<sub>*k*</sub> and *λ* are updated iteratively to find the saddle point of Equation (8), which is the optimal solution of Equation (7). In the frequency domain, the updates of the modal component *m*<sub>*k*</sub>, the central frequency *w*<sub>*k*</sub> and the multiplier *λ* are

$$m_k^{n+1}(w) = \frac{f(w) - \sum_{i \neq k} m_i(w) + \frac{\lambda(w)}{2}}{1 + 2\alpha (w - w_k)^2} \tag{9}$$

$$w_k^{n+1} = \frac{\int_0^\infty w \left| m_k(w) \right|^2 dw}{\int_0^\infty \left| m_k(w) \right|^2 dw} \tag{10}$$

$$\lambda^{n+1}(w) = \lambda^n(w) + \tau \left( f(w) - \sum_{k} m_k^{n+1}(w) \right) \tag{11}$$
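A simplified sketch of the iteration in Equations (9)–(11) is given below. It is not the authors' IVMD code or a full VMD implementation: it works directly on the one-sided FFT spectrum without the mirror extension used in reference implementations, and the function name `vmd_sketch`, the default values of `alpha`, `tau` and `tol`, and the uniform initialisation of the central frequencies are assumptions made for illustration. Here `tau` is the multiplier update step of Equation (11), not the time delay of the phase-space reconstruction.

```python
import numpy as np

def vmd_sketch(f, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD: iterate Eqs. (9)-(11) on the one-sided spectrum of f."""
    f = np.asarray(f, dtype=float)
    F = np.fft.rfft(f)                          # spectrum for w >= 0
    w = np.fft.rfftfreq(len(f))                 # normalised frequency, cycles per sample
    m = np.zeros((K, len(F)), dtype=complex)    # mode spectra m_k(w)
    wk = 0.5 * np.arange(K) / K                 # assumed uniform initial central frequencies
    lam = np.zeros(len(F), dtype=complex)       # Lagrange multiplier lambda(w)

    for _ in range(max_iter):
        m_old = m.copy()
        for k in range(K):
            others = m.sum(axis=0) - m[k]
            # Eq. (9): Wiener-filter-like update of mode k around w_k.
            m[k] = (F - others + lam / 2) / (1 + 2 * alpha * (w - wk[k]) ** 2)
            # Eq. (10): central frequency as the power-weighted mean frequency.
            power = np.abs(m[k]) ** 2
            wk[k] = (w * power).sum() / power.sum()
        # Eq. (11): dual ascent on the reconstruction constraint.
        lam = lam + tau * (F - m.sum(axis=0))
        # Relative change of the modes, used as the stopping criterion.
        change = sum(
            np.sum(np.abs(m[k] - m_old[k]) ** 2)
            / (np.sum(np.abs(m_old[k]) ** 2) + 1e-12)
            for k in range(K)
        )
        if change < tol:
            break

    modes = np.fft.irfft(m, n=len(f), axis=1)   # back to the time domain
    order = np.argsort(wk)
    return modes[order], wk[order]

if __name__ == "__main__":
    # Synthetic two-tone signal, not measured pipeline data.
    t = np.arange(1000)
    f = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
    modes, wk = vmd_sketch(f, K=2)
    print(wk)
```

With `tau=0` the multiplier stays at zero, which is a common choice for noisy signals; a positive `tau` enforces exact reconstruction more strictly. Multiplying the returned central frequencies by the sampling frequency converts them to Hz.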

When VMD decomposes the vibration response sequence, determining the total modal number *K* is a crucial step, because the choice of *K* greatly affects the accuracy of the results [19], yet *K* is usually difficult to determine in advance. If *K* is greater than the number of useful components in the signal, over-decomposition occurs and information is duplicated across modes; if *K* is smaller, some band-limited intrinsic modes cannot be separated. An IVMD method that selects *K* using mutual information (MI) is therefore proposed.

MI reflects the correlation between two random variables and allows better identification of the degree of correlation [15]. MI is as follows:

$$I(X,Y) = H(Y) - H(Y|X) \tag{12}$$

where *H*(*Y*) is the entropy of *Y*, and *H*(*Y*|*X*) is the conditional entropy of *Y* when *X* is known. When *I*(*X*,*Y*) = 0, *X* and *Y* are independent of each other.

The mutual information *I*<sub>*i*</sub> between the original signal and each IMF obtained by the IVMD decomposition is calculated and normalised by Equation (13). The correlation between each modal component and the original signal is then used to judge whether the original signal has been completely decomposed.

$$\sigma_i = \frac{I_i}{\max(I_i)} \tag{13}$$

where *σ*<sub>*i*</sub> is the normalized mutual information value of the *i*th IMF, *i* = 1, 2, ..., *K*. Following reference [20], when *σ*<sub>*i*</sub> is less than 0.02, the IMF is considered to contain no valid feature information, and the original signal has been decomposed completely.
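Equations (12) and (13) can be illustrated with a histogram-based estimate of mutual information. The estimator, the bin count and the function names `mutual_information`, `normalized_mi` and `informative` are assumptions made for this sketch; the paper does not state which MI estimator was used.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram estimate of I(X, Y) in bits, equivalent to H(Y) - H(Y|X)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                   # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)         # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)         # marginal of Y, shape (1, bins)
    nz = pxy > 0                                # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def normalized_mi(signal, imfs, bins=32):
    """Eq. (13): MI of each IMF with the signal, divided by the largest value."""
    I = np.array([mutual_information(imf, signal, bins) for imf in imfs])
    return I / I.max()

def informative(sigma, threshold=0.02):
    """True where an IMF still carries valid feature information."""
    return np.asarray(sigma) >= threshold

if __name__ == "__main__":
    # Synthetic components, not measured data.
    rng = np.random.default_rng(0)
    t = np.arange(5000)
    slow = np.sin(2 * np.pi * t / 200)
    noise = 0.1 * rng.standard_normal(t.size)
    print(normalized_mi(slow + noise, [slow, noise]))
```

Histogram estimates of MI are biased upwards for short records and fine bins, so the 0.02 threshold from reference [20] should be read together with whatever estimator is actually used.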

The specific algorithm for adaptive determination of *K* using the MI method is as follows:

Step 1: Initialize *K* = 1;

Step 2: Set *K* = *K* + 1 and enter the outer loop;

Step 3: Initialize *m*<sub>*k*</sub><sup>1</sup>, *w*<sub>*k*</sub><sup>1</sup> and *λ*<sup>1</sup>, and set *n* = 0;

Step 4: Set *n* = *n* + 1 and enter the inner loop;

Step 5: For all *w* ≥ 0, update *m*<sub>*k*</sub> and *w*<sub>*k*</sub> according to Equations (9) and (10), respectively;

Step 6: Update *λ* according to Equation (11);

Step 7: For a given discrimination accuracy *e* > 0, if the iteration condition

$$\sum_{k} \frac{\left\| m_k^{n+1} - m_k^{n} \right\|_2^2}{\left\| m_k^{n} \right\|_2^2} < e$$

is satisfied, the inner loop is terminated; otherwise, repeat steps 4 to 6;

Step 8: Repeat steps 2 to 7 until the set threshold *σ* is greater than the normalized mutual information *σ*<sub>*i*</sub>, that is, if *I*(*f* − ∑ *m*<sub>*k*</sub>, *f*) < *σ*, end the cycle.

The flow chart of the above calculation steps is shown in Figure 1.

**Figure 1.** Flow chart for adaptive determination of K.

#### **3. Chaotic Characteristics Analysis of Pipeline Vibration Response**

The pipe material is stainless steel. Model 891–2 vibration sensors are used in the test. These sensors have four settings (low velocity, medium velocity, high velocity and acceleration), and the velocity settings are used in this test. The research object is the No. 2 pressure pipeline of pumping station No. 3 of the Jingdian Project, whose branch pipes are connected to the No. 4 and No. 5 units, which are 1200S–56 horizontal centrifugal pumps. Six measurement points are selected on the main pipe and the two branches of the pipeline. Each point is equipped with vibration sensors in the X, Y and Z directions. The measuring points are arranged as shown in Figure 2.

**Figure 2.** Layout of pipeline measuring points (**a**) Field test of pipeline, and (**b**) Measuring points layout (Note: 1~18 is the sensor number).

In the prototype test, four working conditions were selected to collect the vibration responses of the pipeline. The descriptions of each working condition, sampling time and sampling frequency are shown in Table 1.

**Table 1.** Four working conditions.


The velocity-time history of points under typical conditions is shown in Figure 3. The chaotic characteristic analysis of the vibration responses under different conditions is carried out as follows:

**Figure 3.** Velocity time history of points under typical conditions (**a**) Z-axis vibration of point 1 under condition 2, and (**b**) Z-axis vibration of point 1 under condition 4.

First, the phase space of the time series is reconstructed, which requires the time delay *τ* and the embedding dimension *m*. This paper calculates *τ* by the autocorrelation function method and obtains *m* by the Cao method [21], which was proposed by Liangyue Cao in 1997 and essentially uses a minimum error criterion to determine the embedding dimension. The calculation of *τ* and *m* is illustrated by taking the Z-axis vibration of point 1 under condition 4 as an example.

In the autocorrelation function method, *τ* is the time delay at which the autocorrelation function first drops to 1 − 1/e of its initial value. The result of the autocorrelation function is shown in Figure 4.
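The delay selection rule can be sketched as follows. The function name `autocorr_delay` and the normalisation of the autocorrelation by its zero-lag value are assumptions made for illustration; the returned delay is expressed in samples.

```python
import numpy as np

def autocorr_delay(x, max_lag=None):
    """First lag at which the normalised autocorrelation drops to 1 - 1/e."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the mean before correlating
    N = len(x)
    if max_lag is None:
        max_lag = N // 2
    r0 = np.dot(x, x)                       # zero-lag value (the initial value)
    target = 1.0 - 1.0 / np.e
    for k in range(1, max_lag + 1):
        r = np.dot(x[: N - k], x[k:]) / r0  # normalised autocorrelation at lag k
        if r <= target:
            return k
    return None                             # threshold never reached within max_lag

if __name__ == "__main__":
    # Synthetic example only: a sine wave with a period of 100 samples.
    t = np.arange(10000)
    print(autocorr_delay(np.sin(2 * np.pi * t / 100)))
```

Dividing the returned lag by the sampling frequency gives the delay in seconds.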

**Figure 4.** *τ* Calculation of point 1 Z-axis vibration under condition 4.

After *τ* is obtained, the embedding dimension is determined by the Cao method. *E*1(*m*) is used to identify the minimum embedding dimension, and *E*2(*m*) represents the characteristics of the time series. When *E*1(*m*) no longer changes appreciably as *m* increases and *E*2(*m*) tends towards 1, the corresponding *m* is the optimal embedding dimension. Figure 5 shows that the optimal embedding dimension *m* of the point 1 Z-axis vibration under condition 4 is 11.

**Figure 5.** *m* Calculation of point 1 Z-axis vibration under condition 4.

The G-P algorithm [22] and the small data sets method are chosen to calculate the saturation correlation dimension and the largest Lyapunov exponent, respectively. These two indexes are used to analyse the chaotic characteristics of the time series.

The G-P algorithm is a chaotic eigenvalue calculation method proposed by Grassberger and Procaccia to calculate the saturation correlation dimension *D*2.

The embedding dimension is selected as *m* = 2, 4, 6, ··· , 20, and *τ* has been calculated above. According to the correlation function relation in Equation (3), the log<sub>2</sub> *C*(*r*) ∼ log<sub>2</sub> *r* double logarithmic curve is plotted for each *m*. The slope fitted to the near-linear segment of each curve is the correlation dimension for the corresponding embedding dimension. As the embedding dimension increases, the correlation dimension reaches saturation, and the saturated value is the saturation correlation dimension *D*<sub>2</sub>. Figure 6 illustrates the calculation of the saturation correlation dimension for a specific point.

**Figure 6.** Calculation of point 1 z-axis vibration under condition 4 (**a**) Double logarithmic curve, (**b**) Slope of double logarithmic curve, and (**c**) Relation between *D*<sup>2</sup> and m.

To reveal the distribution law of the saturation correlation dimension, the *D*<sup>2</sup> variation curves of points in each direction under different conditions are shown in Figure 7.

**Figure 7.** Correlation dimension curves of points in different directions (**a**) X-axis points, (**b**) Y-axis points, and (**c**) Z-axis points.

As can be seen from Figure 7:


To verify the validity of the above analysis results, the chaotic characteristics of the pumping station pipeline are further analyzed using the largest Lyapunov exponent *λ*<sub>1</sub>. With the time delay *τ* and embedding dimension *m* obtained above, the small data sets method is used to calculate the largest Lyapunov exponent. Figure 8 shows the *λ*<sub>1</sub> calculation for typical points: the separation factor *y* of Equation (6) grows nearly linearly and then tends to be stable. The slope of the linear segment is fitted by the least squares method, and its value is *λ*<sub>1</sub>. The *λ*<sub>1</sub> values of each point in the different vibration directions are shown in Figure 9.

**Figure 8.** Calculation diagram of typical points (**a**) Point 3 Y-axis vibration under condition 1, and (**b**) Point 1 Z-axis vibration under condition 4.

**Figure 9.** Largest Lyapunov exponent curves of points in different directions (**a**) X-axis points (**b**) Y-axis points, and (**c**) Z-axis points.

As shown in Figure 9:


The above analysis is complementary to the calculation results of the saturation correlation dimension *D*2, which further confirms that the unit's operation and flow state changes greatly impact the chaotic characteristics of the pumping station pipeline.

#### **4. The Analysis of Multi-Time-Scale Chaotic Characteristics Based on IVMD**

The vibration characteristics of the pumping station pipeline are different from those of the general pipeline, which is mainly reflected in the influence of the pumping station unit on the vibration of the connecting pipeline. The vibration sources are primarily composed of low-frequency water pulsations caused by the pipeline flow and blade frequency, rotation frequency and frequency doubling produced by the unit's operation [23].

Taking the vibration response of a specific point (point 1 Z-axis vibration under condition 4) as an example, the spectrum analysis is shown in Figure 10. According to the authors' previous articles [23,24], 20, 40 and 60 Hz are the blade frequency, the rotation frequency and the frequency doubling, respectively, and 0.5 Hz is the low-frequency water pulsation. The spectrum analysis shows that the frequency band of the vibration excitation caused by water pulsation (0.5 Hz) is relatively wide, and a wide-peak power spectrum is a typical characteristic of a chaotic system [25,26]. The pipeline vibration excitation produced by the unit's operation (20 Hz, 40 Hz, and 60 Hz) corresponds to sharp peaks in the power spectrum and is highly periodic. It is therefore speculated that the chaotic characteristics of the pipeline are mainly caused by water pulsation, while the unit vibration masks the chaotic characteristics of the pump station pipeline.
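The kind of spectrum analysis described above can be sketched on a synthetic signal. The sampling frequency, record length and amplitudes below are hypothetical values chosen for illustration (the measured sampling settings are those of Table 1), and `dominant_frequencies` is an assumed helper name.

```python
import numpy as np

def dominant_frequencies(x, fs, n_peaks):
    """Frequencies (Hz) of the n_peaks largest local maxima of the amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    amp = np.abs(np.fft.rfft(x)) * 2.0 / len(x)       # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Interior bins that are larger than their left neighbour and not smaller
    # than their right neighbour are treated as spectral peaks.
    is_peak = (amp[1:-1] > amp[:-2]) & (amp[1:-1] >= amp[2:])
    peak_idx = np.where(is_peak)[0] + 1
    top = peak_idx[np.argsort(amp[peak_idx])[::-1][:n_peaks]]
    return np.sort(freqs[top])

if __name__ == "__main__":
    fs = 200.0                                        # hypothetical sampling frequency, Hz
    t = np.arange(4000) / fs                          # hypothetical 20 s record
    x = (1.0 * np.sin(2 * np.pi * 0.5 * t)            # low-frequency water pulsation
         + 0.8 * np.sin(2 * np.pi * 20 * t)           # unit-related components
         + 0.5 * np.sin(2 * np.pi * 40 * t)
         + 0.3 * np.sin(2 * np.pi * 60 * t))
    print(dominant_frequencies(x, fs, n_peaks=4))
```

A pure sinusoid produces a single narrow line, so this synthetic example only shows how the peak frequencies are located; the wide band around 0.5 Hz reported in the measured spectrum would not be reproduced by it.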

The excitation components of different time scales must be effectively separated to clarify the vibration excitation with chaotic characteristics. As a new signal decomposition method, IVMD can adaptively decompose a signal into a series of IMFs with different scale characteristics. Therefore, the IVMD method is used to identify the vibration excitation that causes the chaotic characteristics of the pipeline.

The multi-time scale chaotic vibration response characteristics of the specific point (point 1 Z-axis vibration under condition 4) are analysed.

**Figure 10.** Spectrogram of point 1 Z-axis vibration response under condition 4 (**a**) Holistic drawing, and (**b**) Partial enlarged drawing.

The modal number *K* of IVMD is determined to be 4 by the MI method. Four IMFs are obtained by the IVMD decomposition of the point 1 Z-axis vibration response under condition 4. Figure 11 shows the time histories of the decomposed IMFs.

**Figure 11.** Time histories of IMFs decomposed by IVMD.

The normalized mutual information values are 1.000 for IMF1, 0.025 for IMF2, 0.038 for IMF3 and 0.0661 for IMF4. These values are all above the threshold of 0.02, which meets the decomposition requirements. Figure 11 shows that IVMD sequentially decomposes the original vibration response into four IMFs of increasing frequency. IMF1 to IMF4 correspond to the four major frequency bands in the original response spectrum (0.5, 20, 40 and 60 Hz, respectively), which indicates an effective decomposition. The chaotic characteristics of the decomposed IMFs are then analysed using the saturation correlation dimension and the largest Lyapunov exponent.

The calculation process of typical IMF chaotic eigenvalues is shown in Figure 12. From Figure 12a,b, the saturation correlation dimension *D*<sub>2</sub> of IMF1 is 1.115 and the largest Lyapunov exponent *λ*<sub>1</sub> is 0.0774, indicating that IMF1 has prominent chaotic characteristics. No near-linear region can be found in the double logarithmic curves of IMF2 to IMF4, so these components have no chaotic characteristics. Due to space limitations, only the slope of the IMF2 double logarithmic curve is given in Figure 12c.

**Figure 12.** Calculation of chaotic eigenvalues of typical IMFs (**a**) Relation between D2 and m of IMF1 (**b**) *λ*<sup>1</sup> Separation factor function of IMF1 (**c**) Slope of double logarithmic curve of IMF2.

By comparing the results of the chaotic eigenvalues of the IMFs with those of the vibrational response before decomposition, it can be concluded that:


#### **5. Conclusions**

(1) Comparing the saturation correlation dimension *D*<sub>2</sub> of the vibration responses of the pumping station pipeline under different conditions shows that the *D*<sub>2</sub> values of the measuring points lie in the range of 1.156–5.283 and are all fractional, which indicates that the vibration of the pumping station pipeline has chaotic characteristics. The axial vibration of the pipeline presents a chaotic attractor with a lower dimension (1.156~2.569), and the vibration form is relatively simple. In contrast, the *D*<sub>2</sub> values for the conditions and points that are greatly affected by the unit's operation are larger (3.021~5.283), and the vibration form is more complex;


In this paper, the chaotic characteristics of the vibration system of the pumping station pipeline are shown by the analysis of the measured vibration responses, and the chaotic excitation is found by combination with IVMD, which provides a theoretical basis for the complete description of the vibration characteristics of the pumping station pipeline. A new way of chaotic characteristics analysis based on IVMD decomposition is also proposed.

**Author Contributions:** Conceptualization, L.J. and Z.M.; methodology, L.J.; software, J.Z. and L.W.; validation, L.J. and J.Z.; formal analysis, J.Z. and M.C.; investigation, L.J.; resources, Z.M.; data curation, J.Z.; writing—original draft preparation, L.J.; writing—review and editing, M.Y.A.K.; visualization, L.J.; supervision, Z.M.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Program for Science & Water conservancy science and technology innovation project in GuangDong Province, grant number 2020-18.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data provided in this study are available from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

IVMD: improved variational mode decomposition

IMF: intrinsic mode function

#### **References**


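### *Article* **China's Water Intensity Factor Decomposition and Water Usage Decoupling Analysis**

**Boyu Du <sup>1,2,3</sup>, Xiaoqian Guo <sup>1,2,</sup>\*, Guwang Liu <sup>1,2</sup>, Anjian Wang <sup>1,2,</sup>\*, Hongmei Duan <sup>4</sup> and Shaobo Guo <sup>1,2,3</sup>**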

	- Beijing 100010, China; duanhm@cugb.edu.cn

**Abstract:** As the most populous country in the world, China faces severe water shortage pressure. With the acceleration of urbanization, China's water usage in different sectors will change significantly in the next few years. To investigate the main reasons behind the change in water usage in China, the Logarithmic Mean Divisia Index (LMDI) model was applied in this paper to provincial data from 2000 to 2020. Three effects, namely the technological effect, the industrial structure effect, and the regional scale effect, were analyzed. In addition, the decoupling effect between water usage and economic growth was considered. The results show that: (1) from 2000 to 2020, the technological effect, industrial structure effect, and regional scale effect are −376.54, −89.85 and 20.66, respectively; (2) the technical effect and industrial structure effect have the greatest impact on primary industry, followed by secondary industry; (3) the technical effect is greater than the industrial structure effect in most provinces; and (4) the decoupling state gradually changes from weak decoupling to strong decoupling. The key policy recommendations for water saving are the following: (1) technological innovation is the most efficient way to reduce water usage in China, and (2) the optimization of industrial structure can also contribute to water saving in the future.

**Keywords:** water intensity; LMDI model; Tapio model; technical effect; industrial structure effect; regional scale effect

#### **1. Introduction**

China is one of the countries with the most serious water shortage pressures in the world [1–3]. Besides, the weak awareness of water saving, uneven distribution of water resources, rapid population growth, increasing water usage of residents, and climate change have all aggravated the tensions surrounding water resources in China [4,5]. With the rapid development of modern industry and the accelerating process of urbanization, the demand for water resources in different sectors will change greatly, and access to water will become an important factor restricting China's economic development [6,7].

In the last 20 years, China's water usage structure has changed significantly in line with economic development. The water usage in China has been divided into three industries. The primary industry category mainly includes agriculture, forestry, and animal husbandry and fisheries. The secondary industry category mainly refers to mining, manufacturing, and construction. The tertiary industries include everything not contained within the primary and secondary industries, including the service industry, transportation, accommodation and catering, finance, real estate, culture and sports, public administration, and social security [8,9]. The water usage in primary industry showed a downward trend, and

**Citation:** Du, B.; Guo, X.; Liu, G.; Wang, A.; Duan, H.; Guo, S. China's Water Intensity Factor Decomposition and Water Usage Decoupling Analysis. *Appl. Sci.* **2022**, *12*, 7039. https://doi.org/10.3390/app12147039

Academic Editors: Rituraj Shukla, Amit Kumar and Santosh Subhash Palmate

Received: 20 June 2022 Accepted: 10 July 2022 Published: 12 July 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

the water usage in secondary industry increased first and then decreased [10–12]. Water usage in tertiary industry continued to rise at a rapidly increasing rate, which was 15% in 2020 [13,14]. Therefore, it is of great significance for the sustainable management of water resources to investigate the driving factors behind China's water intensity [15].

A series of publications have calculated the single resource intensity at national level, the city level, and certain industry levels. Some researchers even considered the resource intensity of overall resources in the world or in a certain country.

Previous researchers have analysed the driving factors of different resources using several methods, including structural decomposition analysis (SDA), the Granger causality test, and the Logarithmic Mean Divisia Index (LMDI) model. Most of the existing studies using SDA were based on monetary I–O tables, which require a considerable amount of sector data, and their research scope is mainly at the national level [16–18]. The Granger causality test is only a statistical estimation rather than a test of real causality, so it cannot be used as the basis for affirming or denying causality [19–21]. Therefore, the LMDI model was adopted in this paper.

The LMDI model is a factor decomposition method that does not generate residual error [22]. The application of LMDI is essentially for resources and the environment, such as carbon emissions, energy, land, and water resources [23–28]. At present, LMDI research on water resources has calculated the driving factors at national level or the city level [29–35]. The LMDI method in water resources research was mainly broken down into population scale effect, economic development effect, domestic intensity effect, production intensity, and industrial structural effect [36]. However, as the world's most populous country with serious water usage pressure, there is a shortage of research on the drivers of change in China's water intensity at industrial level with provincial data.

Based on the data of water usage, GDP, and the added value of various industries in different provinces of China from 2000 to 2020, the LMDI model was used to analyze the potential factors affecting the change of water usage within various industries. Three effects, including the technical effect, industrial structure effect, and regional scale effect will be adopted. Besides, these three effects will be applied in each province and each industry in China. The decomposition model can measure the contribution of various factors to water intensity, while it cannot directly measure the decoupling state between economic growth and water usage, and the actual decoupling situation under different policies [37]. The Tapio method is then used for decoupling analysis between water usage and GDP. Finally, the most efficient water-saving methods will also be discussed.

#### **2. Materials and Methods**

#### *2.1. Logarithmic Mean Divisia Index Model*

To analyze the influence of technological progress, regional scale, industrial structure, and other factors on the change in water usage of different industries in China, the driving factors were analyzed using the LMDI proposed by Ang [38]. This method can handle zero values and gives a complete decomposition with no residual [39–41].

Water usage index can be expressed by absolute quantity and relative quantity. The absolute quantity refers to total water usage, and the relative quantity refers to the water usage per unit of economic output, that is, water intensity. It reflects the utilization efficiency of water resources, which is influenced by economic growth, technological progress, industrial structure, regional scale, and policy factors. According to the definition of water intensity, it can be expressed as:

$$w = \sum_{i} \sum_{j} \frac{W_{ij}}{G_{ij}} \tag{1}$$

where *w* is the water intensity (cubic meters/10 thousand CNY); *Wij* is the water usage (cubic meters) of the *j*th industry in the *i*th province; *Gij* is the gross output value (10 thousand CNY) of the *j*th industry in the *i*th province.

According to LMDI analysis framework, by analyzing the influence of each effect on water intensity, we can construct Equation (2) as follows:

$$w = \sum_{i} \sum_{j} \frac{W_{ij}}{G} = \sum_{i} \sum_{j} \frac{W_{ij}}{G_{ij}} \times \frac{G_{ij}}{G_{i}} \times \frac{G_{i}}{G} = \sum_{i} \sum_{j} q_{ij} r_{ij} s_{i} \tag{2}$$

where *w* is the water intensity, *Wij* is the total water usage of the *j*th industry in the *i*th province, *Gij* is the gross product of the *j*th industry in the *i*th province, *Gi* is the gross product of the *i*th province, *G* is the gross domestic product, *qij* is the water intensity of the *j*th industry in the *i*th province, *rij* is the proportion of the gross product of the *j*th industry in the gross product of the *i*th province, and *si* is the ratio of the GDP of the *i*th province to total GDP.

Therefore, the total effect formula of water intensity is:

$$
\Delta tot = \Delta t + \Delta u + \Delta v \tag{3}
$$

where Δ*tot* is the total effect, that is, the sum of all effects, indicating the total change of water intensity; Δ*t* refers to the technical effect, indicating the contribution of the change of resource utilization efficiency caused by technological progress to the total change of water intensity; Δ*u* refers to the industrial structure effect, indicating the contribution of industrial structure adjustment to the total change of water intensity; Δ*v* is the regional scale effect, which indicates the contribution of the ratio of regional economic output to GDP to the total change of water intensity.

The contribution of each effect is expressed as follows:

$$\Delta t = \sum_{i} \sum_{j} \frac{\frac{W_{ij}^{T}}{G^{T}} - \frac{W_{ij}^{0}}{G^{0}}}{\ln\left(\frac{W_{ij}^{T}}{G^{T}}\right) - \ln\left(\frac{W_{ij}^{0}}{G^{0}}\right)} \ln\left(\frac{q_{ij}^{T}}{q_{ij}^{0}}\right) \tag{4}$$

$$\Delta u = \sum_{i} \sum_{j} \frac{\frac{W_{ij}^{T}}{G^{T}} - \frac{W_{ij}^{0}}{G^{0}}}{\ln\left(\frac{W_{ij}^{T}}{G^{T}}\right) - \ln\left(\frac{W_{ij}^{0}}{G^{0}}\right)} \ln\left(\frac{r_{ij}^{T}}{r_{ij}^{0}}\right) \tag{5}$$

$$\Delta v = \sum_{i} \sum_{j} \frac{\frac{W_{ij}^{T}}{G^{T}} - \frac{W_{ij}^{0}}{G^{0}}}{\ln\left(\frac{W_{ij}^{T}}{G^{T}}\right) - \ln\left(\frac{W_{ij}^{0}}{G^{0}}\right)} \ln\left(\frac{s_{i}^{T}}{s_{i}^{0}}\right) \tag{6}$$

The contribution rates of the three effects to the change of water intensity are Δ*t*/Δ*tot*, Δ*u*/Δ*tot*, and Δ*v*/Δ*tot*, respectively. When the sign of an effect is the same as the sign of the total effect, the contribution rate is positive and the effect works in the same direction as the overall change, that is, it promotes the reduction of water intensity when the total effect is negative; when the signs differ, the effect works against the overall change.
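Equations (2)–(6) can be illustrated with a minimal Python sketch. It assumes that the provincial product *Gi* is the sum of the industry values *Gij* and that *G* is the sum over all provinces; the function names and the toy numbers are hypothetical and are not the data used in this paper.

```python
import numpy as np

def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    same = np.isclose(a, b)
    # Use a dummy denominator where a == b to avoid 0/0; those cells return a.
    denom = np.where(same, 1.0, np.log(a) - np.log(b))
    return np.where(same, a, (a - b) / denom)

def lmdi_intensity(W0, G0, WT, GT):
    """Return (dt, du, dv) for arrays shaped (provinces, industries).

    Assumes strictly positive inputs, G_i = sum_j G_ij and G = sum_i G_i.
    """
    W0, G0, WT, GT = (np.asarray(x, dtype=float) for x in (W0, G0, WT, GT))
    Gi0 = G0.sum(axis=1, keepdims=True)
    GiT = GT.sum(axis=1, keepdims=True)
    Gtot0, GtotT = G0.sum(), GT.sum()

    q0, qT = W0 / G0, WT / GT            # industry water intensity
    r0, rT = G0 / Gi0, GT / GiT          # industry share within the province
    s0, sT = Gi0 / Gtot0, GiT / GtotT    # province share of total output

    weight = log_mean(WT / GtotT, W0 / Gtot0)
    dt = float((weight * np.log(qT / q0)).sum())   # technical effect
    du = float((weight * np.log(rT / r0)).sum())   # industrial structure effect
    dv = float((weight * np.log(sT / s0)).sum())   # regional scale effect
    return dt, du, dv

# Hypothetical toy data: 2 provinces x 2 industries.
W0 = [[100.0, 20.0], [80.0, 10.0]]
G0 = [[10.0, 40.0], [8.0, 30.0]]
WT = [[90.0, 25.0], [70.0, 12.0]]
GT = [[12.0, 60.0], [9.0, 45.0]]
dt, du, dv = lmdi_intensity(W0, G0, WT, GT)
dtot = dt + du + dv
print(dt / dtot, du / dtot, dv / dtot)  # contribution rates
```

Because the logarithmic mean is used as the weight, the three effects add up to the change in aggregate water intensity without a residual, which is a convenient check on any implementation.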

#### *2.2. Decoupling Model*

The decomposition model can be used to study the contribution of various factors to the change of water usage intensity, but it cannot directly measure the decoupling state between economy and water usage [22]. Therefore, the Tapio decoupling model is adopted [42–44], and the decomposition model of water usage is as follows:

$$W = \sum_{i} \sum_{j} W_{ij} = \sum_{i} \sum_{j} \frac{W_{ij}}{G_{ij}} \times \frac{G_{ij}}{G_{i}} \times \frac{G_{i}}{G} \times G \tag{7}$$

The change in water usage is then decomposed into:

$$
\Delta TOT = \Delta T + \Delta U + \Delta V + \Delta Q \tag{8}
$$

Among them, Δ*TOT* is the total effect of water usage, Δ*T* is the technical effect, Δ*U* is the effect of industrial structure, Δ*V* is the effect of regional scale, and Δ*Q* is the effect of output scale. The contribution of each effect is as follows:

$$
\Delta T = \sum_{i} \sum_{j} \frac{W_{ij}^{T} - W_{ij}^{0}}{\ln W_{ij}^{T} - \ln W_{ij}^{0}} \ln\left(\frac{W_{ij}^{T} / G_{ij}^{T}}{W_{ij}^{0} / G_{ij}^{0}}\right) \tag{9}
$$

$$
\Delta U = \sum_{i} \sum_{j} \frac{W_{ij}^{T} - W_{ij}^{0}}{\ln W_{ij}^{T} - \ln W_{ij}^{0}} \ln\left(\frac{G_{ij}^{T} / G_{i}^{T}}{G_{ij}^{0} / G_{i}^{0}}\right) \tag{10}
$$

$$
\Delta V = \sum_{i} \sum_{j} \frac{W_{ij}^{T} - W_{ij}^{0}}{\ln W_{ij}^{T} - \ln W_{ij}^{0}} \ln\left(\frac{G_{i}^{T} / G^{T}}{G_{i}^{0} / G^{0}}\right) \tag{11}
$$

$$
\Delta Q = \sum_{i} \sum_{j} \frac{W_{ij}^{T} - W_{ij}^{0}}{\ln W_{ij}^{T} - \ln W_{ij}^{0}} \ln\left(\frac{G^{T}}{G^{0}}\right) \tag{12}
$$
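The four effects in Equations (9)–(12) can be sketched in Python as follows. This is only an illustration under stated assumptions: *Gi* and *G* are taken as sums of the industry values, every cell is strictly positive and changes between the two periods, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def lmdi_usage(W0, G0, WT, GT):
    """Return the effects T, U, V, Q for arrays shaped (provinces, industries)."""
    W0, G0, WT, GT = (np.asarray(x, dtype=float) for x in (W0, G0, WT, GT))
    # Log-mean weight of water usage; assumes WT != W0 in every cell.
    L = (WT - W0) / (np.log(WT) - np.log(W0))
    Gi0 = G0.sum(axis=1, keepdims=True)
    GiT = GT.sum(axis=1, keepdims=True)
    Gtot0, GtotT = G0.sum(), GT.sum()
    return {
        "T": float((L * np.log((WT / GT) / (W0 / G0))).sum()),          # technical
        "U": float((L * np.log((GT / GiT) / (G0 / Gi0))).sum()),        # structure
        "V": float((L * np.log((GiT / GtotT) / (Gi0 / Gtot0))).sum()),  # regional scale
        "Q": float((L * np.log(GtotT / Gtot0)).sum()),                  # output scale
    }

# Hypothetical toy data: 2 provinces x 2 industries.
W0 = [[100.0, 20.0], [80.0, 10.0]]
G0 = [[10.0, 40.0], [8.0, 30.0]]
WT = [[90.0, 25.0], [70.0, 12.0]]
GT = [[12.0, 60.0], [9.0, 45.0]]
effects = lmdi_usage(W0, G0, WT, GT)
print(effects, sum(effects.values()))
```

The sum of the four effects should equal the change in total water usage between the two periods, which gives a quick consistency check.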

The decoupling elasticity index is used to discuss the decoupling relationship between economic growth and water usage. The elasticity coefficient of water usage with respect to GDP is calculated as follows:

$$
\omega(W, G) = \frac{\Delta W / W}{\Delta G / G} \tag{13}
$$

The types of decoupling can essentially be divided into coupling, decoupling, and negative decoupling. In addition, according to the elasticity coefficient, the change of water usage and the change of GDP, the decoupling types can be subdivided into eight cases (Table 1) [45].


**Table 1.** Types of Tapio models.
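The elasticity in Equation (13) and the eight-state classification can be sketched in Python. The thresholds 0.8 and 1.2 and the state names follow the commonly used Tapio framework and are an assumption here; the function name and the example values are hypothetical.

```python
def tapio_state(w0, w1, g0, g1, low=0.8, high=1.2):
    """Return (elasticity, state) for water usage w and GDP g in two periods.

    Thresholds low/high are the usual Tapio bounds (assumed, not taken
    from this paper's table).
    """
    dw = (w1 - w0) / w0          # relative change in water usage
    dg = (g1 - g0) / g0          # relative change in GDP
    if dg == 0:
        raise ValueError("GDP is unchanged; elasticity is undefined")
    e = dw / dg
    if dg > 0:
        if dw < 0:
            state = "strong decoupling"
        elif e < low:
            state = "weak decoupling"
        elif e <= high:
            state = "expansive coupling"
        else:
            state = "expansive negative decoupling"
    else:
        if dw > 0:
            state = "strong negative decoupling"
        elif e < low:
            state = "weak negative decoupling"
        elif e <= high:
            state = "recessive coupling"
        else:
            state = "recessive decoupling"
    return e, state

# Hypothetical example: water usage falls 5% while GDP grows 10%.
print(tapio_state(100.0, 95.0, 1000.0, 1100.0))
```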

The decoupling elasticity index can be used to calculate the decoupling relationship between economic growth and water usage, but it cannot help to investigate the specific factors that affect the decoupling state. The LMDI model can be used to analyze the influence of various factors on water usage, but it cannot be used to analyze the decoupling effect between economic growth and water usage. Combining the LMDI model with the Tapio decoupling model, a decoupling effort index model is constructed:

$$
\Delta W_{US} = \Delta TOT - \Delta Q = \Delta T + \Delta U + \Delta V \tag{14}
$$

where Δ*WUS* indicates the government's efforts to save water, and refers to various measures taken by the government to reduce water usage in the process of economic development, such as improving production technology, adjusting industrial structure, and expanding regional scale.

The decoupling effort indicators are constructed as follows:

$$D_i = -\frac{\Delta W_{US}}{\Delta Q} \tag{15}$$

where *Di* is the total decoupling effect of water usage. When *Di* > 1, it indicates a strong decoupling effect. When 0 < *Di* < 1, it indicates a weak decoupling effect. When *Di* ≤ 0, it means there is no decoupling effect.
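A small Python sketch of Equations (14) and (15) is given below. It takes the four effects as already computed numbers; the function name and the example values are hypothetical, and treating *Di* = 1 as strong decoupling is an assumption, since that boundary case is not specified above.

```python
def decoupling_effort(dT, dU, dV, dQ):
    """Return (dW_US, D, label) from the four decomposition effects."""
    if dQ == 0:
        raise ValueError("output scale effect is zero; index is undefined")
    d_wus = dT + dU + dV          # Equation (14): water-saving effort
    D = -d_wus / dQ               # Equation (15): decoupling effort index
    if D >= 1:                    # assumption: D == 1 counted as strong
        label = "strong decoupling effect"
    elif D > 0:
        label = "weak decoupling effect"
    else:
        label = "no decoupling effect"
    return d_wus, D, label

# Hypothetical effects (same units as water usage).
print(decoupling_effort(dT=-30.0, dU=-5.0, dV=1.0, dQ=25.0))
```

With these illustrative numbers the saving effort (−34) exceeds the output scale effect (25), so the index is above 1.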

#### *2.3. Data*

The data used in this study are the water usage and industrial added value of three major industries in each province of China from 2000 to 2020. All the data in this paper come from *the Water Resources Bulletin* issued by China's Ministry of Water Resources from 2000 to 2020 and the National Bureau of Statistics [46,47].

From 2000 to 2013, China's total annual water usage increased from 549.752 billion cubic meters to 618.394 billion cubic meters, after which it showed a decreasing trend. The water usage of primary industry contributes most to the total water usage and remains stable at approximately 400 billion cubic meters per year. Secondary industry is a smaller user of water in China, and its usage first increased and then decreased. The water usage of tertiary industry continued to rise, from 57.492 billion cubic meters in 2000 to 86.310 billion cubic meters in 2020 (Figure 1).

**Figure 1.** Water usage and proportion of various industries in China from 2000 to 2020.

The added value of each industry over the past 20 years is shown in Figure 2. China's economy maintained a high rate of development from 2000 to 2020, so the added value of each industry increased continually. The fastest growth occurred in tertiary industry, with an average annual growth rate of 0.07%, which demonstrates that China's economy has gradually shifted towards tertiary industry.

**Figure 2.** China's added value of various industries from 2000 to 2020.

#### **3. Results**

*3.1. Water Intensity and Factor Decomposition Analysis*

3.1.1. Analysis of Decomposition Effect in Each Year

According to Equations (3)–(6), the three effects and their respective contribution rates from 2000 to 2020 are shown in Figure 3. The total effect in each year is negative, indicating that water intensity decreased year by year, which reflects progress in water saving. The total effect from 2002 to 2003 was the smallest, with a value of −64.50.

**Figure 3.** Effects and contribution rates in China from 2000 to 2020.

Technical effects in the last 20 years are all negative, and the technical effect has the largest contribution rate among the three effects, indicating that it has an inhibitory impact on water intensity and that technological innovation is the most effective measure for water saving. The technical effect fluctuated greatly, with its largest magnitude of −59.91 from 2002 to 2003 and its highest contribution rate of 2481.29% in 2001. The industrial structure effect was negative except for 1.62 in 2003 and 0.38 in 2020, meaning that it restricted water intensity in most years. The regional scale effect fluctuated greatly from 2000 to 2001, reaching 39.49, and was stable with values between −1 and 2 from 2003 to 2020.

The three effects in each industry in China were also explored through the LMDI (Figure 4). The technical effects are all negative for the three industries, which means that the water intensity of all three industries declined with technological innovation. The technical effect fluctuated greatly before 2011, with the largest magnitudes of −36.24, −12.52, and −11.15, respectively, from 2002 to 2003, and then tended to be flat. It fluctuated most within primary industry, because primary industry accounts for the largest proportion of China's current water usage structure.

**Figure 4.** Decomposition analysis of China's water intensity in each industry from 2000 to 2020.

The industrial structure effect has different characteristics in the three major industries. In primary industry, it increased from −18.21 to 2.63, shifting from a restriction effect to a promoting effect from 2018 to 2020. In secondary industry, it changed from strong promotion to weak promotion, and finally into a restriction effect, which is mainly attributed to the intensive management of industrial development with increasing industry output. In tertiary industry, it remained essentially unchanged. Overall, the industrial structure effect has restricted water usage in China, indicating that industrial transformation has contributed to the reduction of water usage in the last 20 years.

#### 3.1.2. Analysis of Decomposition Effect in Each Province

According to Equations (3)–(6), the three effects in each province are calculated in Table 2. The technical effect in each province is negative with an increasing absolute value, which means that the technical effect in each province in China has generally improved. Moreover, because it has the largest average magnitude (−12.15) among the three effects, the technical effect is the decisive factor in promoting the decline of water intensity. Among all the provinces, Xinjiang has the greatest inhibitory effect, with a value of −34.05, and Tianjin has the smallest, with a value of −1.20. The industrial structure effect is also negative in the studied areas, except for Anhui Province, with values between −0.21 and −10.35, and its absolute value is smaller than that of the technical effect. This indicates that the industrial structure transformation has taken effect.


**Table 2.** Three effects in each province in China.

The decomposition analysis of water intensity by industry in each province from 2000 to 2020 is given in Table 3, based on Equations (3)–(6). The technical effects of all industries in each province are negative, which is consistent with Table 2 and indicates a restraining effect on water usage. In most provinces the value for primary industry is the smallest, with values between −31.70 and −0.06, followed by secondary industry and tertiary industry. Because the water usage of primary industry accounts for the largest proportion of total water usage in China, the technological progress of primary industry plays a significant role. The efficiency of technological progress in secondary industry is higher than that in tertiary industry. Moreover, the industrial structure effect is mostly negative in primary industry, takes both positive and negative values in secondary industry, and is mostly positive in tertiary industry, indicating that industrial transition has a greater effect on primary industry than on secondary or tertiary industry. In addition, the provinces with a positive industrial structure effect in secondary industry are typically underdeveloped areas, such as Tibet, Inner Mongolia, and Qinghai, which also shows that industrial transition in underdeveloped areas needs to be improved.

#### *3.2. The Decoupling Effect of Water Usage*

#### 3.2.1. Decoupling Elasticity Index

In this paper, the elastic index of decoupling analysis between economic growth and water usage in China from 2000 to 2020 is calculated and divided into four stages (Table 4).


**Table 3.** Effects of various industries in various provinces.

In these four stages, the relationships between water usage and economic growth in all provinces are decoupled, indicating that water usage has decoupled from the development of China's economy. There are 13 strongly decoupled provinces in the first stage, 8 in the second stage, 15 in the third stage, and 23 in the fourth stage. In addition, the weak decoupling status in most provinces has gradually changed into strong decoupling status. The increasing number of strongly decoupled provinces across the stages is due to the gradual improvement of water efficiency with economic development. In recent years, corresponding policies have been issued in China to improve water efficiency, such as *the National Water Conservation Action Plan* and *the Water Pollution Prevention Action Plan*, and the task of water conservation has been officially put into *the 13th Five-Year Plan*, which illustrates the Chinese government's determination on the issue of water saving.

From a regional perspective, Beijing, Yunnan, and Qinghai have changed from strong decoupling in the first stage to weak decoupling later. Inner Mongolia shows weak decoupling in all four stages, which means that the quality of economic development with respect to water resources in these regions still needs to be improved. Hebei and Ningxia are strongly decoupled in all four stages, which shows that the popularization of water conservation policies in these two regions is relatively effective and should be maintained. East China, such as Shanghai, Zhejiang, Jiangsu, Anhui, and Fujian; South China, such as Guangdong, Guangxi, and Hainan; and Southwest China, such as Guizhou, Sichuan, and Chongqing, have all changed from weak decoupling in the first stage to strong decoupling later, meaning that areas with relatively abundant water resources are more likely to improve the local decoupling state and achieve high-quality economic development.


**Table 4.** Decoupling index and state of water usage and economic growth in each province.

3.2.2. Decoupling Effort Index

The decoupling effort index is used to measure the decoupling status between economic growth and water usage (Table 5).

**Table 5.** Decoupling effort index of China's water usage from 2000 to 2020.




From the perspective of the contributions of these three effects, the technical effect has the greatest influence on the total decoupling effect, with a maximum absolute value of 998.16, which is larger than that of the industrial structure effect (66.43). This shows that technological innovation is an important measure to realize the decoupling of economic development and water usage. The influence of the regional scale effect is the smallest, but it plays a driving role in most periods.

#### **4. Conclusions and Implications**

This research focused on the investigation of driving factors behind water usage intensity in China from 2000 to 2020, and the identification of decoupling status between water usage and economics. The LMDI model and Tapio model were applied jointly. The results show that:

(1) From 2000 to 2020, the technical effect, industrial structure effect, and regional scale effect are −376.54, −89.85, and 20.66, respectively. The technical effect ranges from −59.91 to −4.05, and the industrial structure effect ranges from −17.11 to 1.62, indicating that these two effects constrained the increase of water usage intensity. The regional scale effect was stable, with values between −1 and 2. From the perspective of different industries, each effect has the greatest impact on primary industry, followed by secondary industry, and finally tertiary industry.

(2) From the perspective of different provinces, the development of technology and the adjustment of industrial structure have promoted the decline of water intensity. The technological effect varies in different provinces. For example, Tianjin has the value of −1.20, while Xinjiang has values of −34.05. The industrial structure effect is smaller, with the largest value of −0.21 in Qinghai and the smallest being −10.35 in Jiangsu. The technology effect is greater than the industrial structure effect, except for in Anhui Province. When the effects in each industry in different provinces were explored, the technical effect is largest in primary industry in most areas, and the industrial structure effect of primary industry is negative, with positive values in tertiary industry.

(3) The decoupling status of most provinces in China has gradually improved, from weak decoupling to strong decoupling. The technical effect is the main factor promoting the decoupling effect, followed by the industrial structure effect.

Therefore, two implications can be put forward. Firstly, technological innovation, together with the proliferation of water-saving facilities, is the most effective driver of the reduction of water usage intensity in China, and it is likely to remain the most efficient policy lever in the near future. Secondly, the optimization of industrial structure is helpful for water saving in China, but it still needs to be strengthened.

**Author Contributions:** All the authors (B.D., X.G., G.L., A.W., H.D. and S.G.) have made substantial contributions to this article. Conceptualization, B.D. and X.G.; methodology, B.D. and X.G.; validation and data curation, B.D. and X.G.; writing-original draft preparation, B.D.; Guiding opinions, X.G., A.W., G.L., H.D. and S.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China, grant numbers: 72088101, 71991485, 71991480. The APC was funded by grant number 72088101.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This research is supported by Institute of Mineral Resources, Chinese Academy of Geological Sciences.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Effects of the Operational Parameters in a Coupled Process of Electrocoagulation and Advanced Oxidation in the Removal of Turbidity in Wastewater from a Curtember**

**Paul Alcocer-Meneses <sup>1</sup> , Angel Britaldo Cabrera-Salazar 1, Juan Taumaturgo Medina-Collana <sup>1</sup> , Jimmy Aurelio Rosales-Huamani 2,\* , Elmar Javier Franco-Gonzales <sup>2</sup> and Gladis Enith Reyna-Mendoza <sup>1</sup>**


**Citation:** Alcocer-Meneses, P.; Cabrera-Salazar, A.B.; Medina-Collana, J.T.; Rosales-Huamani, J.A.; Franco-Gonzales, E.J.; Reyna-Mendoza, G.E. Effects of the Operational Parameters in a Coupled Process of Electrocoagulation and Advanced Oxidation in the Removal of Turbidity in Wastewater from a Curtember. *Appl. Sci.* **2022**, *12*, 8158. https://doi.org/10.3390/app12168158

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 28 May 2022 Accepted: 14 July 2022 Published: 15 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Abstract:** The tannery industry generates various polluting substances during its process, such as organic matter from the skin and chemical inputs, producing wastewater with high turbidity. The objective of this research is to evaluate the most appropriate operational parameters of the coupled process of electrocoagulation and advanced oxidation to achieve the removal of turbidity in wastewater from the riparian zone of a tannery. This process uses a direct current source between perforated aluminum electrodes of circular geometry submerged in the effluent, which causes the dissolution of the aluminum plates. For our study, an electrocoagulation unit coupled to an ozone generator was built at the laboratory level, and the influence of five factors (voltage, inlet flow to the reactor, initial turbidity, pH, and ozone flow) at three levels on turbidity was studied using the Taguchi experimental methodology. The optimal conditions for the removal of turbidity were obtained at 10 volts, pH 7.5, 360 L/h of wastewater recirculation flow rate, 2400 mg/h of ozone flow rate, and 1130 NTU of initial turbidity of the sample in 60 min of treatment, reaching a removal of 99.75% of the turbidity. Under optimal conditions, the removal of chemical oxygen demand (COD) and biochemical oxygen demand (BOD) was determined, reaching 33.2% for COD and 39.36% for BOD. Likewise, the degree of biodegradability of the organic load increased from 0.467 to 0.553.

**Keywords:** electrocoagulation; tannery effluent; ozonation; optimization; turbidity removal; Taguchi

### **1. Introduction**

The leather trade is economically important for developing countries that produce a variety of products from animal leather, such as footwear, luggage, and clothing. However, its production has a very high environmental footprint [1]. In addition, considering the enormous quantity and low biodegradability of the chemical products present in the productive cycle of tannery work, the wastewater from this process represents a major environmental and technological problem [2]. In [3], the authors mentioned that electrocoagulation (EC) is often considered an alternative treatment methodology with several advantages, such as simple equipment, simple operation and automation, a brief retention time, low sludge production, and no need for added chemicals.

Other studies have stated that combinations of electrocoagulation with alternative technologies have been designed to treat wastewater with a high concentration of organic matter, such as that from the textile trade and mixed industries [4,5]. The authors of [6] mentioned that the use of EC as the only treatment process could face serious practical limitations, especially if the wastewater is highly contaminated. Therefore, there is a need for an efficient and relatively inexpensive treatment process. For this reason, the use of a post- or pre-treatment process with EC will improve its performance, as mentioned by several studies that have described more profitable combined treatment systems [6,7].

The authors of [8] published a review that includes EC combined with other treatment processes such as electrocoagulation–ozone, electrocoagulation–adsorption, electrocoagulation–ultrasound, and electrocoagulation–pulses. In their work, the authors also discuss the performance of these combined systems.

According to [9], electrocoagulation (EC) is an electrochemical water treatment technique in which anode electrodes (aluminum, Al, or iron, Fe) are dissolved in situ, which promotes coagulation and the subsequent removal of pollutants, together with the concurrent reduction of turbidity in water and wastewater. EC relies on the physical–chemical process of destabilizing colloidal systems under the action of a direct current [10].

The electrodes dissolve according to Equations (1) and (2) to provide coagulant metal ions (Al<sup>3+</sup> or Fe<sup>2+</sup>/Fe<sup>3+</sup>) to the water, and these ions immediately undergo rapid hydrolysis.

Anode reactions:

$$\text{Al}_{(s)} \rightarrow \text{Al}^{3+}_{(aq)} + 3e^- \quad E^0 = 1.66 \text{ V} \tag{1}$$

$$\text{Fe}_{(s)} \rightarrow \text{Fe}^{3+}_{(aq)} + 3e^- \quad E^0 = 0.04 \text{ V} \tag{2}$$

When the anode potential is sufficiently high, secondary reactions may occur, especially oxygen evolution, according to Equation (3)

$$2\text{H}_2\text{O}_{(l)} \rightarrow \text{O}_{2(g)} + 4\text{H}^+_{(aq)} + 4e^- \quad E^0 = -1.23\text{ V} \tag{3}$$

Simultaneously with the anode reaction, water molecules H2O break down at the cathode, producing hydrogen gas H2 and OH− ions, according to Equation (4).

Cathode reaction:

$$2\text{H}_2\text{O}_{(l)} + 2e^- \rightarrow \text{H}_{2(g)} + 2\text{OH}^-_{(aq)} \quad E^0 = -0.83\text{ V} \tag{4}$$

The electrical energy applied to the anode dissolves the aluminum into the solution, where it reacts with the hydroxide ions from the cathode to form aluminum hydroxide. The most significant advantage of electrocoagulation is that it avoids any addition of chemical substances, thus reducing the likelihood of secondary pollution; the dosing of coagulant depends on the cell potential (or current density) applied [11]. Other advantages are the simple equipment, which requires less maintenance, and the straightforward automation of the method [12].
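The dependence of the coagulant dose on the applied current can be illustrated with Faraday's law of electrolysis, m = I·t·M/(z·F). The sketch below gives only the theoretical mass of dissolved aluminum, assuming 100% current efficiency; the current of 2 A is a hypothetical value chosen for illustration and is not reported in this study.

```python
FARADAY = 96485.33   # Faraday constant, C/mol
M_AL = 26.98         # molar mass of aluminum, g/mol
Z_AL = 3             # electrons transferred per Al atom (Al -> Al3+ + 3e-)

def dissolved_mass_g(current_a, time_s, molar_mass=M_AL, z=Z_AL):
    """Theoretical anode mass dissolved (g), assuming 100% current efficiency."""
    if current_a < 0 or time_s < 0:
        raise ValueError("current and time must be non-negative")
    return current_a * time_s * molar_mass / (z * FARADAY)

# Hypothetical example: 2 A applied for a 60 min run.
mass = dissolved_mass_g(current_a=2.0, time_s=60 * 60)
print(f"theoretical Al dissolved: {mass:.3f} g")
```

Comparing such a theoretical value with the mass loss obtained by weighing the plates before and after a test is one common way to estimate the current efficiency.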

Standard treatments for turbidity removal have many disadvantages, such as the use of large amounts of chemicals and the generation of large amounts of sludge, which causes disposal issues and the loss of water. The authors of [13] mentioned that the combination of the ultrasound technique with different processes such as electrocoagulation, electro-Fenton, and electrooxidation could be important to achieve effective decomposition of organic contaminants in wastewater. Independently, the authors of [14] mentioned that the integrated sonoelectro-Fenton (SEF) method could be a novel methodology for the removal of paracetamol (PCT) from aqueous solutions using synthesized iron oxide (Fe2O3) nanoparticles.

The novelty of our study was the design of the electrocoagulation cell with perforated plates installed vertically, which improves the mixing of ozone with the wastewater and the ions generated by the electrodes. This design reduces areas of stagnation in the electrocoagulation cell, which produce passivation of the electrodes and decrease the efficiency of the process.

The objective of this study was to examine the treatment of wastewater from the tanning industry through the electrocoagulation process, and to evaluate the impact of electrical potential, feed flow, initial turbidity, pH, and ozone flow on the percentage reduction of turbidity and on energy consumption, based on the Taguchi methodology.

#### **2. Materials and Methods**

#### *2.1. Effluent Sample Collection*

The samples were collected from the operations corresponding to the riparian zone (pre-soaking, main soaking, peeling, descaling, and purging or delivery), from the tannery located in the district of Ate Vitarte, Lima (Peru). Each sample was collected and then homogenized and allowed to stand for 3 h. These samples came from a process of transformation of sheep skins preserved with salt, with hair destruction technology.

A part of the sample was sent to a specialized laboratory, applying the corresponding monitoring protocols to know the physicochemical characteristics, as illustrated in Table 1.


**Table 1.** Some of the physicochemical characteristics of the wastewater of the riparian zone.

#### *2.2. Analytical Methods*

The turbidity was measured with an Ezodo model TUB-430 turbidimeter. To determine the pH, conductivity, and total dissolved solids, a HANNA multiparameter instrument (pH, EC, TDS, T °C) was used. To determine the voltage and current intensity, a digital hook multimeter (amps, voltage, temperature, etc.) was used.

#### *2.3. Design of Experiment*

The optimization of wastewater turbidity removal using aluminum electrodes was performed using the Taguchi design. Five factors (voltage, feed flow, effluent concentration, pH, and ozone flow) were used as independent variables and their combined effects were examined, while the percentage of turbidity removal was the dependent variable.
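The dependent variable can be written as a simple percentage of the initial turbidity. The Python sketch below shows the calculation; the function name is hypothetical, and the final turbidity of 2.825 NTU is not a measured value but the figure implied by the reported 1130 NTU initial turbidity and 99.75% removal.

```python
def removal_percent(initial, final):
    """Percentage removal relative to the initial value (same units for both)."""
    if initial <= 0:
        raise ValueError("initial value must be positive")
    return 100.0 * (initial - final) / initial

# Implied (not measured) final turbidity for 1130 NTU and 99.75% removal.
print(removal_percent(initial=1130.0, final=2.825))
```

The same function applies to COD or BOD removal, since it only needs the values before and after treatment.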

This was performed to determine the best conditions for the optimum removal of turbidity from the wastewater. The experimental design involves varying the independent variables at three different levels (−1, 0, +1). The experimental range and levels of the independent variables are presented in Table 2. In this work, a set of 27 experiments with two replicates was carried out, and the mean values are shown in Table 3. The levels of the applied electrical potential were taken from the work developed by [15], and the pH range was selected based on the research work provided by [16].

The interactive effects of the independent (process) variables on the dependent variable (response) were examined using the analysis of variance (ANOVA) as shown in Table 4.
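In Taguchi analyses, replicated responses are often summarized by a signal-to-noise (S/N) ratio before comparing factor levels. Since turbidity removal should be maximized, the larger-the-better form, S/N = −10·log10(mean(1/y²)), is a natural choice. The sketch below is an assumption about how such a summary could be computed, not a description of the calculation used in this study, and the replicate values are hypothetical.

```python
import math

def sn_larger_is_better(values):
    """Taguchi larger-the-better S/N ratio (dB) for one run's replicates."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("replicates must be non-empty and strictly positive")
    mean_sq_inv = sum(1.0 / (v * v) for v in values) / len(values)
    return -10.0 * math.log10(mean_sq_inv)

# Hypothetical removal percentages from two replicates of two runs.
run_a = [99.0, 99.5]
run_b = [60.0, 62.0]
print(sn_larger_is_better(run_a), sn_larger_is_better(run_b))
```

Averaging the S/N ratio over all runs that share a factor level gives the main effect of that level, and the level with the highest mean S/N is usually taken as the preferred setting.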


**Table 2.** Experimental range and levels of independent variables used in this study.

**Table 3.** Results of the 27 experiments carried out using the Taguchi methodology with five factors at three levels.



**Table 4.** Analysis of variance (ANOVA).

#### *2.4. Electrocoagulation Reactor*

The EC experiments were performed as a batch process in a cylindrical electrochemical reactor with a capacity of 7 L (Figure 1). Perforated circular aluminum plates were used for both the anode and the cathode, in a parallel monopolar electrode configuration with a separation of 1 cm, as mentioned in [17–19]. The specific area of each electrode was 0.014 cm2. Each electrode was 10 cm in diameter and 0.3 cm thick, with 10 holes of 10 mm diameter each; four electrodes were used. The EC cell was configured for vertical flow of the feed water, which was delivered by a peristaltic pump. An accessory (ACC) power supply (0–15 volts) was connected. Before installation in the EC unit, each plate was weighed to allow the calculation of the mass consumed after the tests. Each experiment ran for 60 min, which was considered enough to achieve stable operation. Ozone was coupled to the system by means of a Venturi; the ozone generator has a capacity of 0 to 3 g O3/h.

All experiments were performed at room temperature (nominally 20 °C). After the settling time had elapsed, samples were withdrawn from a depth of 2 cm with a syringe and measured with the turbidity meter. The electrodes were cleaned in a low-concentration hydrochloric acid solution (0.04 M) and then in a caustic soda solution (0.08 M) to remove the residues stuck to their surface; they were finally washed with distilled water for reuse. The electrode arrangement consisted of two cathodes interspersed with two anodes, connected by stainless steel rods. Samples were taken every 10 min for the measurement of turbidity. Power was supplied to the electrodes with a direct current (DC) power supply.

An improvement over other reported works [15,20,21] is the configuration of the experimental equipment used. In this investigation, an electrocoagulation cell with perforated circular electrodes has been built. This design allows for improved mixing and a longer residence time for the effluent and ozone; therefore, the mechanisms involved in this hybrid process, such as sedimentation, are improved [15,22]. A disadvantage compared to other configurations of electrocoagulation cells is the maintenance of the electrodes, which is relatively easy.

**Figure 1.** Electrocoagulation and ozone experimental module. (**A**) Photograph of the experimental module doing preliminary tests. The sample is fed to tank 1, followed by the sample being pumped through the flow meter, followed by the Venturi system, dynamic mixer until reaching the electrolytic reactor, once the system is filled again the sample returns to the tank. (**B**) Module diagram, where 1 is the deposit; 2, 3, 5, 6, 8, and 14 stopcocks; 4 recirculation pump; 7 flow meter; 9 Venturi; 10 ozone generator; 11 dynamic mixer; 12 electrocoagulation reactor and 13 current rectifier.

#### *2.5. The Main Calculations of Electrocoagulation Process*

The reduction rate of turbidity, expressed in percentage "*T*" (%), was calculated using Equation (5).

$$T(\%) = \left(\frac{T\_i - T\_f}{T\_i}\right) \times 100\% \tag{5}$$

where *Ti* and *Tf* represent the initial and final turbidity, respectively. Electrical energy consumption is a very important economic parameter in the electrocoagulation process. It was calculated using Equation (6) [23].

$$C.E. = \frac{U}{Vm} \int\_0^{3600} I(t)dt\tag{6}$$

*C*.*E*. is the energy consumption (kWh/m3)

*U* is the applied voltage (V)

*Vm* is the treated volume of the sample (L)

The integral represents the current intensity multiplied by time, in seconds.
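As a minimal sketch of Equation (6), the function below integrates a sampled current trace with the trapezoidal rule and converts the result to kWh/m3. With U in volts, I in amperes, and t in seconds the integral yields joules, and Vm is given in litres, so the two conversion factors (3.6 × 10^6 J per kWh and 1000 L per m3) are an assumption about how the reported units were obtained. The current values in the example are hypothetical, not measured data.

```python
def trapezoid(times_s, values):
    """Trapezoidal integral of values over times_s (same length, times increasing)."""
    total = 0.0
    for k in range(1, len(times_s)):
        total += 0.5 * (values[k] + values[k - 1]) * (times_s[k] - times_s[k - 1])
    return total


def energy_consumption_kwh_per_m3(voltage_v, times_s, current_a, volume_l):
    """Specific energy consumption following Equation (6), reported in kWh/m3."""
    charge_c = trapezoid(times_s, current_a)   # integral of I(t) dt, in A*s
    energy_j = voltage_v * charge_c            # V * A * s = J
    energy_kwh = energy_j / 3.6e6              # assumed conversion: J -> kWh
    volume_m3 = volume_l / 1000.0              # assumed conversion: L -> m3
    return energy_kwh / volume_m3


if __name__ == "__main__":
    # Hypothetical trace: current read every 10 min over a 60 min run.
    times = [0, 600, 1200, 1800, 2400, 3000, 3600]
    current = [1.0] * len(times)
    print(energy_consumption_kwh_per_m3(10.0, times, current, volume_l=7.0))
```

Sampling every 10 min mirrors the sampling interval used for turbidity; a finer current log would give a more accurate integral.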

The amount of dissolved electrode was calculated theoretically using Faraday's law [24], through the following Equation (7).

$$m = \frac{M}{nF} \int\_{t\_{\text{initial}}}^{t\_{\text{final}}} I(t)dt\tag{7}$$

*m* is the aluminum mass (g) dissolved in the electrolytic cell

*I* is the intensity of the current (A)

*t* is the electrocoagulation time (s)

*M* is the molecular weight of the anode material (g/mol) (*MAl* = 26.982 g/mol)

*F* is the Faraday constant (96,500 C/mol)

*n* is the valence (chemical equivalence) of the ions of the electrode material (*nAl* = 3.0).
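The Faraday's-law estimate in Equation (7) can be sketched in the same way. The constants are those quoted in the text (M = 26.982 g/mol, n = 3, F = 96,500 C/mol); the current trace is hypothetical and only illustrates the arithmetic.

```python
M_AL_G_PER_MOL = 26.982      # molar mass of aluminum, as given in the text
N_AL = 3                     # valence of the aluminum ion
FARADAY_C_PER_MOL = 96500.0  # Faraday constant, as given in the text


def trapezoid(times_s, values):
    """Trapezoidal integral of values over times_s (same length, times increasing)."""
    total = 0.0
    for k in range(1, len(times_s)):
        total += 0.5 * (values[k] + values[k - 1]) * (times_s[k] - times_s[k - 1])
    return total


def dissolved_aluminum_g(times_s, current_a):
    """Theoretical anode mass dissolved, following Equation (7)."""
    charge_c = trapezoid(times_s, current_a)  # integral of I(t) dt, in coulombs
    return M_AL_G_PER_MOL * charge_c / (N_AL * FARADAY_C_PER_MOL)


if __name__ == "__main__":
    # Hypothetical constant current of 1 A for 60 min.
    times = [0, 600, 1200, 1800, 2400, 3000, 3600]
    print(round(dissolved_aluminum_g(times, [1.0] * len(times)), 4))  # 0.3355
```

The theoretical value can be compared with the mass loss obtained by weighing each plate before and after a test.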

#### **3. Results**

The results of our experiments are shown below. To complement the results of Table 3, the standard deviation with respect to the mean was evaluated for the percentage of turbidity removal and for the energy consumption; the results are shown in Table 5. The physicochemical parameters obtained are shown in Table 6.

**Table 5.** Standard deviation of percent turbidity removal and energy consumption.



**Table 6.** Results of the physicochemical characterization of the treated sample.

#### *3.1. Main Effect of Variables*

The main effect plots for the five operating variables are given in Figure 2. The main effect of each variable was calculated by averaging the experimental results obtained at each level of that variable. This plot was obtained from Table 3 and is used to visualize the relation between the variables and the output response.
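The averaging described above can be sketched as follows. The function groups the responses by factor level and returns the mean response at each level; the four-run data set in the example is hypothetical and only stands in for the 27 runs of Table 3.

```python
from collections import defaultdict


def main_effects(runs, responses):
    """Mean response at each level of each factor.

    runs      -- list of dicts mapping factor name -> coded level
    responses -- list of measured responses, one per run
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for run, y in zip(runs, responses):
        for factor, level in run.items():
            grouped[factor][level].append(y)
    return {
        factor: {level: sum(ys) / len(ys) for level, ys in sorted(levels.items())}
        for factor, levels in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical two-factor example (not data from Table 3).
    runs = [
        {"voltage": -1, "pH": -1},
        {"voltage": -1, "pH": 1},
        {"voltage": 1, "pH": -1},
        {"voltage": 1, "pH": 1},
    ]
    removal = [60.0, 70.0, 90.0, 100.0]
    print(main_effects(runs, removal))
    # {'voltage': {-1: 65.0, 1: 95.0}, 'pH': {-1: 75.0, 1: 85.0}}
```

The factor with the largest spread between its level means is the one with the strongest main effect, which is what a main effects plot shows graphically.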

**Figure 2.** The effects of the operating variables on the mean turbidity removal percentage.

#### 3.1.1. Comparison of Ozonation, Electrocoagulation, and Ozone-Assisted Electrocoagulation

Figure 3 shows that, for a sample with an initial turbidity of 655 NTU, the hybrid process (electrocoagulation and ozone) reaches a turbidity of 4.19 NTU (99.36% turbidity removal). Electrocoagulation alone reaches 18.34 NTU (97.2%), and ozone alone reaches 196.6 NTU (69.98%). It is therefore concluded that both the hybrid and the electrocoagulation processes achieve turbidity removal above 97%. We also observed that, in all three processes, the removal of turbidity is achieved in the first 20 min of treatment. The authors of [25] reported that the combined electrocoagulation/ozonation process improved both the degradation rate and the maximum removal of COD compared to the electrocoagulation and ozonation processes alone.
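The removal percentages quoted above follow directly from Equation (5). A short check using the reported initial and final turbidities (the function name is illustrative):

```python
def turbidity_removal_percent(t_initial_ntu, t_final_ntu):
    """Percentage turbidity removal, Equation (5)."""
    if t_initial_ntu <= 0:
        raise ValueError("initial turbidity must be positive")
    return (t_initial_ntu - t_final_ntu) / t_initial_ntu * 100.0


if __name__ == "__main__":
    initial = 655.0  # NTU, from the text
    finals = {"EC/O3": 4.19, "EC": 18.34, "O3": 196.6}  # NTU, from the text
    for process, final in finals.items():
        print(f"{process}: {turbidity_removal_percent(initial, final):.2f}%")
    # EC/O3: 99.36%
    # EC: 97.20%
    # O3: 69.98%
```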

**Figure 3.** Turbidity reduction for separate processes such as pure ozone, electrocoagulation and coupled process of EC/O3, operated at conditions of 10.0 volts, feed flow 360 L/h, O3 flow 2400 mg/h, pH 6.89, and initial turbidity of 655 NTU.

#### 3.1.2. Initial pH Effect

From Table 3, trials 1, 4, 10, 20, 24, and 27 have been plotted as they are the most representative. Figure 4 shows that for experiments 1 and 10 the pH of the sample increases with the treatment time. For experiment 27, a pH of 8.21 is reached, which is attributed mainly to the increase in electrical potential (10 volts). When the initial pH is 7.5 (experiments 4 and 20), the increase is not very significant, reaching a final value of 8.54. Finally, when the sample has an initial pH of 10.8 (tests 10 and 24), a decrease is observed, reaching a value of 9.21.

The tannery industry generates effluents with a wide pH range, from pH = 3.5 to pH = 11; moreover, studies show that pH has a significant impact on electrocoagulation performance. The increase in pH is a consequence of the formation of Al<sup>3+</sup>, which precipitates due to the presence of other anions, as well as of the precipitation of aluminum hydroxide. However, when the initial pH is alkaline, the decrease in pH is the result of the formation of Al(OH)<sub>4</sub><sup>−</sup> [26].

#### 3.1.3. Effect of Initial Turbidity

According to Figure 5B, a greater reduction in turbidity is observed when the initial turbidity is less than 1130 NTU. This behavior could be explained by the amount of flocs formed being sufficient for adsorption, so that turbidity decreases quickly. This trend is also consistent with Faraday's law, according to which the amount of Al<sup>3+</sup> released to the solution for the same applied current and time is constant [27].

**Figure 4.** Variation of the pH in the time of treatment by electrocoagulation/O3.

**Figure 5.** Representation of the effect of operational variables on % reduction in turbidity, energy consumed in the electrolytic cell, and operational cost of the module. (**A**) Variation of the pH and voltage variables against % turbidity for fixed values of 300 L/h, 752.5 NTU, and 1650 mg/h. (**B**) Variability of pH and initial turbidity versus % turbidity for fixed values of 6.5 volts, 300 L/h and 1650 mg/h of O3 flow. (**C**) pH and feed flow variability versus % turbidity for fixed values of 6.5 volts, 752.5 NTU, and 1650 mg/h O3 flow. (**D**) Ozone flow and pH variability versus % turbidity for fixed values of 6.5 volts, 300 L/h, and 752.5 NTU. (**E**) Variability of initial turbidity and voltage versus energy consumption in the electrocoagulation cell for fixed values of 300 L/h, 7.4 pH, and 1650 mg/h of O3 flow. (**F**) Variability of voltage and initial turbidity against the cost of the built module for fixed values of 300 L/h, 7.4 pH, and 1650 mg/h of O3 flow.

The proposed mechanism for the reduction of turbidity by the hybrid system of electrocoagulation and ozone is shown in Figure 6. It consists of destabilizing the colloidal particles and forming larger flocs in which the contaminants are trapped; these flocs can then be separated from the solution by flotation or sedimentation [28]. The dissolved air flotation mechanism is effective in reducing the organic load [29], and dissolved ozone flotation gives efficient results in the removal of suspended solids [30,31]. For soluble contaminants, aluminum-based coagulants can act as catalysts for ozone and promote hydroxyl formation [22], and can also oxidize surface functional groups of colloidal contaminants, which promotes colloid aggregation.

**Figure 6.** Mechanism of hydroxyl formation.

#### 3.1.4. Feed Flow Effect

According to Figure 5C, as the feed flow increases (240 to 360 L/h), the reduction in turbidity increases. This could be attributed to the greater formation of bubbles as the feed flow towards the reactor increases, which is influenced by the hydrodynamic cavitation that forms in the Venturi tube [32]. As a consequence, the flotation mechanism predominates in reducing turbidity. This formation of bubbles increases when working under acidic conditions, forming two phases (80% foam and 20% liquid) [33]. However, the bubbles cause a problem at the electrodes (activation polarization), producing an increase in voltage and a decrease in electrical current, and thus an increase in energy consumption [34].

#### 3.1.5. Ozone Flow Effect

Ozone flow is one of the factors with the least influence on reducing turbidity, as can be seen in Figure 5D. Figure 2 likewise shows that the mass flow of ozone does not have much influence on the removal of turbidity. According to [35], ozone is activated and transformed into hydroxide ion (OH−) through electroreduction, which in this case would help in the direct or indirect oxidation of the components present in the effluent (organic matter, nitrates, sulfides, etc.). For oxidizing sulfur, ozone is an alternative to the traditional ions (Fe2+, O2, etc.), as verified in [36].

#### **4. Discussion**

When the five operational parameters are evaluated against the reduction of turbidity (Figure 2), the factor with the greatest influence is the voltage. This is corroborated by the ANOVA in Table 4, where voltage has the greatest contribution with respect to the other parameters. Increasing the potential from 4 to 10 volts, as seen in Figure 5A, raised the percentage of turbidity reduction from 56.83% to 100%. This effect is analogous to that reported in [37], where grey water was treated for one hour at 6, 8.5 and 10 volts, reaching reductions of 68%, 73%, and 86%, respectively.

On the other hand, the effect on removal is due to the increase in particle size as a function of time, as studied in [38], which reported that, in a synthetic sample of kaolinite, the particle size formed is affected as the voltage and time are increased, allowing a higher sedimentation rate of the particles.

This increasing effect of the voltage on turbidity removal can also be seen in [39], where wastewater from car washes was treated in the range of 2.9 to 11.7 mA/cm2 for 14 min, achieving close to a 96% reduction in turbidity. The work presented in [15] also reported the influence of the applied potential on turbidity: four voltage levels (2, 5, 10, and 15 V) were evaluated for a period of 15 min, achieving a reduction of 83% at 2 V and 92% at 15 V. Therefore, as stated in [40], the applied voltage is an influential and important parameter: as a main step, it ensures the production of Al<sup>3+</sup> coagulant ions by electrolytic oxidation of the electrode. Table 7 shows the results.


**Table 7.** Effect of the applied potential difference on the removal of the turbidity.

From Table 3 we generate Figure 7, which shows the effect of the process parameters on energy consumption in kWh/m3. The average energy consumption in the 27 experiments was 0.5 kWh/m3.

In addition, the factor with the greatest influence was the electrical potential applied to the electrocoagulation cell. As indicated by the diagram, the lowest energy consumption (0.069 kWh/m3) was obtained at an electrical potential of 4 volts and the highest (0.94 kWh/m3) at 10 volts. Likewise, turbidity has a significant influence on energy consumption: at the high level, 0.376 kWh/m3 is consumed, which is below the average.

In the study carried out by [15] on the reduction of turbidity and chromium content in tannery wastewater by electrocoagulation with aluminum electrodes, at an electrical potential of 10 volts, a pH of 6.1, and a time of 90 min, the authors obtained an energy consumption of 1.5 kWh/m3, which is quite close to that obtained in our present study.

**Figure 7.** Effect of process parameters on specific energy consumption.

#### **5. Conclusions**

The coupled process of electrocoagulation with ozone was successfully tested in the treatment of wastewater from the riparian zone of a tannery. The effects of applied voltage, feed flow, initial turbidity, pH, and ozone flow on the percentage of turbidity reduction and on the energy consumption of the electrocoagulation cell were studied. The parameters with the greatest influence on turbidity reduction were identified, and the separate effects of ozone, electrocoagulation, and ozone-assisted electrocoagulation on turbidity were compared.

The results showed that the factor with the greatest influence on reducing turbidity is voltage. The present study showed that the coupled electrocoagulation and ozone system reduced more turbidity than either process alone. The optimal conditions for the removal of turbidity, Chemical Oxygen Demand (COD) and Biochemical Oxygen Demand (BOD) were obtained at 10 volts, 7.5 pH, 360 L/h of wastewater recirculation flow, 2400 mg/h of ozone flow, and 1130 NTU of initial turbidity of the sample in 60 min of treatment. Under these conditions, a removal of 99.75% of turbidity, 33.2% of COD, and 39.36% of BOD was achieved. Likewise, the degree of biodegradability of the organic load increased from 0.467 to 0.553.

**Author Contributions:** Conceptualization, P.A.-M.; Data curation, A.B.C.-S.; Investigation, J.A.R.-H.; Project administration, A.B.C.-S.; Software, J.T.M.-C.; Validation, P.A.-M.; Writing—original draft, E.J.F.-G.; Writing—review & editing, G.E.R.-M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


### *Article* **Low-Cost RSAC and Adsorption Characteristics in the Removal of Copper Ions from Wastewater**

**Yan Liu, Qin Chen and Rajendra Prasad Singh \***

School of Civil Engineering, Southeast University, Nanjing 210096, China; liuyian@seu.edu.cn (Y.L.); 220191095@seu.edu.cn (Q.C.)

**\*** Correspondence: rajupsc@seu.edu.cn

**Abstract:** Adsorption is a typical method for treating copper-containing wastewater. Fly ash and steel slag both have good adsorption performance, and activated clay was also added in this study. The performance of residue and soil adsorption composite (RSAC) particles for copper ion adsorption was examined in terms of the substrate ratio and the influence mechanism, to achieve the win–win effect of industrial waste reuse and copper ion wastewater treatment. The results indicated that adsorption time, dosage, initial copper ion concentration, coexisting ions, and temperature each affected the adsorption differently. Additionally, the adsorption kinetic study showed that the removal of copper ions by adsorption on RSAC particles was in accordance with the quasi-primary and quasi-secondary kinetic models. The adsorption thermodynamics study gave ΔG<sup>0</sup> < 0, ΔH<sup>0</sup> > 0 and ΔS<sup>0</sup> > 0, indicating that the process of copper ion adsorption by RSAC particles was spontaneous, heat-absorbing, and entropy-increasing. The research demonstrates that RSAC particles have a certain adsorption capacity for copper ions.

**Keywords:** water pollution; adsorption; copper ions; adsorption mechanism; adsorption kinetics; thermodynamics

#### **1. Introduction**

Water contamination through heavy metal ions is an environmental problem of great concern [1]. Adsorption is one of the most efficient methods to remove noxious heavy metal ions, especially for wastewater with large volumes and low heavy metal ion concentrations [2]. Adsorption is spontaneous and the basic principle is that the surface energy of substances could change the concentration at the phase interface. Adsorption usually relies on some adsorbent materials with a large specific surface area and a high surface energy to remove heavy metal ions [3,4]. Adsorption has two major advantages: the reaction rate is fast, and no other reagents are needed. Therefore, adsorption is regarded as an important and promising method for addressing heavy metal ions such as copper in wastewater.

The key issue of adsorption is the adsorbent. Adsorbents with good adsorption performance have such qualities as: a fast adsorption reaction rate, stable physical and chemical properties, good solid–liquid separation, an economical cost, easy recovery and regeneration, and reusability [5]. However, industrial adsorbents could not meet all of these qualities. Therefore, the core of the adsorption method focusing on treating wastewater with heavy metal ions is to find efficient adsorbents at a low cost [6,7].

Currently, common adsorbents in the water treatment domain include activated carbon, fly ash, etc. Activated carbon has a large surface area, fast filtration rate, stable structure, large adsorption capacity, a wide range of applications, and good adsorption performance. However, activated carbon has a short service life, high sludge treatment cost, and low recycling performance [8,9]. Fly ash, a waste product from power plants that use coal as the main fuel, has a loose and porous structure. It can intercept pollutant molecules and bind pollutants to the active sites on the surface by a chemical bond, resulting in excellent adsorption [10]. Andersson et al. [11] have also considered fly ash a low-cost material for adsorption. It has been found that steel slag has a good adsorption effect on copper, nickel and zinc ions, which mainly relies on the generation of hydroxide complexes [12]. Under optimal conditions, the adsorption efficiency of modified steel slag for uranium was 98% [13]. Activated white clay is an adsorbent made from clay minerals by inorganic acidification and other means, followed by water rinsing and drying. It is mainly made from bentonite clay as a raw material; it appears as a milky white powder that is non-toxic, tasteless, and odorless, with strong adsorption; and it can adsorb colored substances and organic matter [14]. Bentonite is considered to be an excellent adsorbent for Cu2+, with a reported maximum adsorption capacity of 248.9 mg/g [15].

**Citation:** Liu, Y.; Chen, Q.; Singh, R.P. Low-Cost RSAC and Adsorption Characteristics in the Removal of Copper Ions from Wastewater. *Appl. Sci.* **2022**, *12*, 5612. https://doi.org/10.3390/app12115612

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 29 April 2022 Accepted: 28 May 2022 Published: 1 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The residue and soil adsorption composite (RSAC) particle consists of solid waste and natural minerals. The raw materials used in the preparation of RSAC granules are fly ash, steel slag, activated white clay, bonding agent, and porogenic agents. The main raw materials are fly ash, steel slag, and activated white clay, the first two of which are industrial waste substrates [16].

#### **2. Materials and Methods**

#### *2.1. RSAC Preparation*

The fly ash used in this study was obtained from a power plant in Nanjing, China, and the steel slag was from a steel mill in Nanjing, China. Table 1 presents the main physical properties of the fly ash and the steel slag, Table 2 provides the particle size distribution of the fly ash and the steel slag, and Table 3 provides the chemical composition of the fly ash and the steel slag as measured by X-ray fluorescence analysis (XRF).


**Table 1.** Physical properties of waste substrates.

**Table 2.** Particle size distribution of waste substrates.


**Table 3.** Chemical composition of waste substrates (%).


Additionally, the physical properties and the chemical composition of the activated white clay used in this experiment are presented in Tables 4 and 5.


**Table 4.** Physical properties of activated white clay.

**Table 5.** Chemical composition of activated white clay.


The binder used in the experiments was 525R ordinary silicate cement, which has the characteristics of a slow thickening rate, fast setting, and high strength. It can significantly improve the early strength of the composites [17]. The porogenic agent used in the experiment was a plant-based foaming agent. It is made by a saponification reaction with rosin and sodium hydroxide as the main raw materials, and it is a light yellow-brown viscous liquid. The use of a plant-based foaming agent makes the adsorbent structure loose and porous, which can enhance the adsorption performance of RSAC particles [18]. After comprehensive consideration of the adsorption and mechanical properties, the substrate proportions shown in Table 6 were set.

**Table 6.** Ratio of base material to binder for different groups (%).


After the groups of adsorbents were cured and shaped, static adsorption tests were performed. The results of the experiments are shown in Table 7 and Figure 1. The RSAC particle morphology and the appearance of the residual liquid were observed after the completion of the static adsorption experiments.

**Table 7.** Copper ion removal by different groups of adsorbents.


The properties of the RSAC particles in each group were observed after the adsorption of copper ions. The residuals of Groups I, II, and III were clear, the RSAC particles were not abnormal, and the hardness did not change significantly. The residuals of group IV were somewhat turbid, with slight precipitation, the surface of RSAC particles showed signs of shedding, and the hardness decreased.

**Figure 1.** Comparison of adsorption effect of different groups of adsorbents.

Although Group IV had the best adsorption effect on copper ions, the residual solution showed slight precipitation and the strength of the RSAC particles was reduced, so that they broke easily. On balance, Group III, with the second highest removal rate, was selected as the best ratio for the subsequent experiments. The pellet fabrication process is shown in Figure 2.

**Figure 2.** Experimental RSAC pellet fabrication process.

#### *2.2. Determination of Copper Ion Concentration*

#### 2.2.1. Measurement Methods

The copper ion concentration was determined by the bis(acetaldehyde) oxaldihydrazone spectrophotometric method (GB/T 5750.6-2006). The measurement instrument was a 752 UV-Vis spectrophotometer, with a minimum detection mass concentration of 0.04 mg/L. At pH 9, copper ions (Cu2+) react with bis(cyclohexanone) oxaldihydrazone and acetaldehyde. The reaction product is a purple bis(acetaldehyde) oxaldihydrazone chelate, and the copper ion concentration is determined from the direct proportionality between absorbance and copper ion concentration.

#### 2.2.2. Reagents for Experiments

The reagents for the experiments were: copper sulfate (CuSO4), acetaldehyde solution (w(CH3CHO) = 40%), ammonium chloride (NH4Cl), ammonium hydroxide (NH4OH), ammonium citrate ((NH4)3C6H5O7), ethyl alcohol (C2H6O), and bis(cyclohexanone) oxaldihydrazone (BCO). All reagents were supplied by Sinopharm Chemical Reagent Co., Ltd., Nanjing, China. All solutions in these experiments were prepared with analytical grade water (R = 18 MΩ·cm) using grade A glassware unless otherwise stated.

#### 2.2.3. Determination Procedure

A 25.0 mL water sample was pipetted into a 50 mL glass-stoppered colorimetric tube. Five further 50 mL colorimetric tubes were prepared with 0 mL, 0.50 mL, 1.00 mL, 1.50 mL, and 2.00 mL of copper standard solution, respectively, and each was diluted to 25 mL with deionized water.

Next, 2.0 mL of ammonium citrate solution was added to each colorimetric tube and mixed evenly, and the pH was adjusted to 9.0 with (1 + 1) ammonia. Then, 5.0 mL of ammonia-ammonium chloride buffer solution was added and mixed evenly, followed by 5.0 mL of BCO solution and 1.0 mL of acetaldehyde in succession. Finally, deionized water was added to the mark and the contents were mixed evenly.

The tubes were heated for 10 min in a 50 °C water bath, removed, and cooled. After cooling to room temperature (standing for 20 min), the absorbance of the sample and of the standard series was determined at a wavelength of 546 nm, using a cuvette with an optical path of 1 cm and deionized water as the reference.

The standard curve was plotted with the copper ion concentration of the standard series as the abscissa and the corresponding absorbance as the ordinate. The corresponding copper ion concentration was determined from the standard curve according to the absorbance of the water sample to be measured.

The results of the standard series measurements are presented in Table 8.

**Table 8.** Measurement results for the standard series.


The standard curve is shown in Figure 3, and the linear regression equation is given by Equation (1):

$$\mathbf{y} = 0.0914\mathbf{x} + 0.016, \quad \mathbf{R}^2 = 0.9997 \tag{1}$$
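Reading a concentration off the standard curve amounts to inverting Equation (1). The sketch below uses the reported slope and intercept; the result is in whatever units were used for the abscissa of the standard curve, and the `dilution_factor` parameter is a hypothetical convenience for diluted samples, not something described in the source.

```python
SLOPE = 0.0914      # from Equation (1)
INTERCEPT = 0.016   # from Equation (1)


def copper_from_absorbance(absorbance, dilution_factor=1.0):
    """Invert the standard curve y = SLOPE * x + INTERCEPT to obtain x."""
    x = (absorbance - INTERCEPT) / SLOPE
    # dilution_factor is a hypothetical parameter for samples diluted before measurement
    return x * dilution_factor


if __name__ == "__main__":
    # An absorbance equal to the intercept corresponds to the blank (x = 0).
    print(copper_from_absorbance(0.016))
    print(copper_from_absorbance(0.1074))
```

Absorbances outside the range covered by the standard series should not be converted this way, since the fit is only validated within that range.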

#### *2.3. Copper Ion Removal Effect Examination Index*

The adsorption effect of RSAC particles on copper ions is mainly reflected in two aspects, namely the copper ion removal rate and the adsorption amount. Removal rate (η) and adsorption amount (Γ) are used in this paper to investigate the copper ion removal effect and the adsorption performance of RSAC particles, respectively.

$$
\eta = \frac{\mathbf{C}\_0 - \mathbf{C}\_e}{\mathbf{C}\_0} \times 100\% \tag{2}
$$

$$
\Gamma = \frac{\left(\mathbf{C}\_0 - \mathbf{C}\_e\right)\mathbf{V}}{\mathbf{m}} \tag{3}
$$

In Equations (2) and (3), η is the removal rate of copper ions (%), C0 is the initial copper ion concentration of the solution (mg/L), Ce is the concentration of copper ions in solution at equilibrium (mg/L), Γ is the amount of copper adsorbed per unit mass of adsorbent (mg/g), V is the volume of the solution (L), and m is the mass of adsorbent (g).
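Equations (2) and (3) can be evaluated together, as in the following sketch. The initial concentration, volume, and dosage match values used later in the methods (100 mg/L, 150 mL, 5 g); the equilibrium concentration of 20 mg/L is a hypothetical number chosen only to illustrate the arithmetic.

```python
def removal_rate_percent(c0_mg_l, ce_mg_l):
    """Copper ion removal rate, Equation (2)."""
    return (c0_mg_l - ce_mg_l) / c0_mg_l * 100.0


def adsorption_amount_mg_g(c0_mg_l, ce_mg_l, volume_l, mass_g):
    """Copper adsorbed per unit mass of adsorbent, Equation (3)."""
    return (c0_mg_l - ce_mg_l) * volume_l / mass_g


if __name__ == "__main__":
    c0, ce = 100.0, 20.0      # mg/L; ce is hypothetical
    volume, mass = 0.150, 5.0  # L and g
    print(f"removal rate: {removal_rate_percent(c0, ce):.1f} %")
    print(f"adsorption amount: {adsorption_amount_mg_g(c0, ce, volume, mass):.2f} mg/g")
```

Note that the volume must be entered in litres for the result to come out in mg/g.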

**Figure 3.** Copper ion standard curve.

#### *2.4. Effect of Time on Adsorption*

RSAC particles were weighed into five groups: 2, 4, 6, 8, and 10 g. Each group was placed in a conical flask with 150 mL of 100 mg/L copper ion solution. Then, all the samples were mixed at room temperature (25 ± 1 °C). The concentration of residual copper ion in the supernatant of each sample was measured at specific times, and the relationship between residual concentration and time was investigated.

#### *2.5. Effect of Dosage on Adsorption*

The same samples were prepared and shaken at room temperature (25 ± 1 °C). The concentration of residual copper ions in the supernatant of each sample was measured when the adsorption time reached 48 h.

#### *2.6. Study of Initial Concentration on Adsorption*

Copper ion solutions of 10, 50, 100, and 150 mg/L were prepared separately. Then, 5 g of RSAC adsorbent was added to each solution. Finally, the copper ion concentration in the supernatant of each sample was measured when the static adsorption time reached 2, 6, 12, 24, 48, and 72 h.

#### *2.7. Study of Coexisting Metal Cations on Adsorption*

A series of 150 mL copper ion solutions (100 mg/L) was prepared with coexisting Na+, Mg2+, Ca2+, and Fe3+, respectively. To each sample, 5 g of RSAC particles was added, and the samples were then mixed at room temperature (25 ± 1 °C). The copper ion concentration in the supernatant of each sample was measured when the adsorption time reached 48 h.

#### *2.8. Study of Ambient Temperature on Adsorption*

Three groups of 150 mL solutions with 10, 40, 80, 120, 160, and 200 mg/L copper ion were prepared. To each sample, 5 g of RSAC particles was added. The three groups were mixed at 15 °C, 25 °C, and 35 °C, respectively. The copper ion concentration of the supernatant of each sample was measured on reaching adsorption equilibrium.

The Freundlich and Langmuir models are often used to describe the adsorption behavior in solid–liquid systems [19]. Therefore, the adsorption isotherms of the Freundlich and the Langmuir models were plotted using nonlinear fits based on the experimental results of heavy metal ion adsorption by RSAC particles.

The expression for the Langmuir model equation is (Equation (4)):

$$\mathbf{q}\_{\rm e} = \frac{\mathbf{q}\_{\rm max} \mathbf{K}\_{\rm L} \mathbf{C}\_{\rm e}}{1 + \mathbf{K}\_{\rm L} \mathbf{C}\_{\rm e}} \tag{4}$$

In Equation (4), qe is the equilibrium adsorption capacity (mg/g), Ce is the equilibrium concentration (mg/L), qmax is the maximum adsorption capacity (mg/g), and KL is the Langmuir adsorption constant.

The expression for the Freundlich model equation is (Equation (5)):

$$\mathbf{q}\_{\mathbf{e}} = \mathbf{K}\_{\mathbf{F}} \mathbf{C}\_{\mathbf{e}}^{\frac{1}{n}} \tag{5}$$

In Equation (5), qe is the equilibrium adsorption capacity (mg/g), Ce is the equilibrium concentration (mg/L), KF is the Freundlich adsorption constant, and n is a constant related to the adsorption capacity.
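A minimal sketch of both isotherms and of a parameter fit is given below. The text states that nonlinear fits were used; for a dependency-free illustration, this sketch instead fits the linearised forms by ordinary least squares (Ce/qe against Ce for Langmuir, ln qe against ln Ce for Freundlich), which is a simpler stand-in and not the authors' procedure. The data in the example are synthetic, generated from assumed parameters.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def langmuir(ce, q_max, k_l):
    """Equation (4): qe = qmax * KL * Ce / (1 + KL * Ce)."""
    return q_max * k_l * ce / (1.0 + k_l * ce)


def freundlich(ce, k_f, n):
    """Equation (5): qe = KF * Ce^(1/n)."""
    return k_f * ce ** (1.0 / n)


def fit_langmuir(ce, qe):
    """Linearised fit: Ce/qe = Ce/qmax + 1/(KL*qmax). Returns (qmax, KL)."""
    slope, intercept = linear_fit(ce, [c / q for c, q in zip(ce, qe)])
    return 1.0 / slope, slope / intercept


def fit_freundlich(ce, qe):
    """Linearised fit: ln qe = ln KF + (1/n) ln Ce. Returns (KF, n)."""
    slope, intercept = linear_fit([math.log(c) for c in ce], [math.log(q) for q in qe])
    return math.exp(intercept), 1.0 / slope


if __name__ == "__main__":
    ce = [1.0, 5.0, 10.0, 50.0, 100.0]              # mg/L, synthetic
    qe = [langmuir(c, 5.0, 0.1) for c in ce]        # assumed qmax = 5, KL = 0.1
    print(fit_langmuir(ce, qe))
```

Linearised fits weight the data differently from nonlinear fits, so the two approaches can give somewhat different parameters on noisy measurements.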

#### *2.9. Kinetic Study of Copper Ion Adsorption by RSAC*

Currently, quasi-primary (pseudo-first-order) and quasi-secondary (pseudo-second-order) kinetic models are often used to describe the adsorption kinetic behavior of adsorbents in solid–liquid static adsorption systems [20].

The expression for the quasi-primary kinetic model equation is (Equation (6)):

$$\ln\left(\mathbf{q}\_{\text{e}} - \mathbf{q}\_{\text{t}}\right) = \ln\mathbf{q}\_{\text{e}} - \mathbf{k}\_{1}\mathbf{t} \tag{6}$$

In Equation (6), qe is the amount of solute adsorbed on the adsorbent surface at adsorption equilibrium (mg/g), qt is the amount of solute adsorbed on the adsorbent surface at the specified moment (t) during the adsorption process (mg/g), and k1 is the adsorption rate constant (h<sup>−</sup>1).

The expression for the quasi-secondary kinetic model equation is (Equation (7)):

$$\frac{\mathbf{t}}{\mathbf{q}\_{\text{t}}} = \frac{1}{\mathbf{k}\_{2}\mathbf{q}\_{\text{e}}^{2}} + \frac{\mathbf{t}}{\mathbf{q}\_{\text{e}}} \tag{7}$$

In Equation (7), qe is the amount of solute adsorbed on the adsorbent surface at adsorption equilibrium (mg/g), qt is the amount of solute adsorbed on the adsorbent surface at the specified moment (t) during the adsorption process (mg/g), and k2 is the adsorption rate constant (g/(mg·h)).

A 150 mL sample of 100 mg/L copper ion solution was prepared, and 5 g of RSAC particles was added. The sample was then mixed at room temperature (25 ± 1 °C). The residual copper ion concentration was measured at different times to calculate the amount of copper ions adsorbed. The static adsorption data were then fitted with the quasi-primary and quasi-secondary kinetic models in turn.
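The two kinetic fits can be sketched as straight-line regressions on the linear forms of Equations (6) and (7). The sketch assumes the standard quasi-secondary form, in which the intercept is 1/(k2·qe²), consistent with the units g/(mg·h) given for k2. The function names and the synthetic data are hypothetical, and the sampling times simply reuse the 2 to 72 h schedule of the static adsorption tests.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_quasi_primary(t, qt, q_e):
    """Equation (6): ln(q_e - q_t) = ln q_e - k1 * t. Returns k1 (1/h).

    q_e must be supplied (for example the plateau value); points with
    q_t >= q_e are skipped because the logarithm is undefined there.
    """
    pts = [(ti, math.log(q_e - qi)) for ti, qi in zip(t, qt) if qi < q_e]
    slope, _ = linear_fit([p[0] for p in pts], [p[1] for p in pts])
    return -slope


def fit_quasi_secondary(t, qt):
    """Equation (7): t/q_t = 1/(k2 * q_e**2) + t/q_e. Returns (q_e, k2)."""
    slope, intercept = linear_fit(t, [ti / qi for ti, qi in zip(t, qt)])
    q_e = 1.0 / slope                # slope = 1/q_e
    k2 = slope ** 2 / intercept      # intercept = 1/(k2 * q_e**2)
    return q_e, k2


# Synthetic uptake curve (not measured data): quasi-secondary kinetics with
# q_e = 2.5 mg/g and k2 = 0.2 g/(mg*h), sampled at 2-72 h.
times_h = [2.0, 6.0, 12.0, 24.0, 48.0, 72.0]
qt_demo = [0.2 * 2.5 ** 2 * ti / (1.0 + 0.2 * 2.5 * ti) for ti in times_h]
q_e_fit, k2_fit = fit_quasi_secondary(times_h, qt_demo)
k1_fit = fit_quasi_primary(times_h, qt_demo, q_e_fit)
print(f"quasi-secondary: q_e = {q_e_fit:.3f} mg/g, k2 = {k2_fit:.3f} g/(mg*h)")
print(f"quasi-primary:   k1 = {k1_fit:.3f} 1/h")
```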

#### *2.10. Thermodynamic Study of the Adsorption of Copper Ions by RSAC*

The thermodynamic equations are as follows (Equations (8) and (9)):

$$
\Delta \mathbf{G}^0 = \Delta \mathbf{H}^0 - \mathbf{T} \Delta \mathbf{S}^0 = -\mathbf{R} \mathbf{T} \ln \mathbf{K}\_0 \tag{8}
$$

$$
\ln \text{K}\_0 = \Delta \text{S}^0/\text{R} - \Delta \text{H}^0/\text{RT} \tag{9}
$$

In Equations (8) and (9), T is the thermodynamic temperature (K), ΔH<sup>0</sup> is the enthalpy change of adsorption (kJ/mol), ΔG<sup>0</sup> is the free energy of adsorption (kJ/mol), ΔS<sup>0</sup> is the entropy change of adsorption [J/(mol·K)], R is the molar gas constant, 8.314 J/(mol·K), and K0 is the adsorption partition coefficient, usually taken as the Langmuir constant KL. Using the isothermal adsorption data of RSAC particles at different temperatures, the adsorption equilibrium coefficient K0 was calculated [21]. A straight line was then fitted with the reciprocal temperature 1/T on the horizontal axis and lnK0 on the vertical axis.
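The straight-line fit described above can be sketched as follows. From Equation (9), the slope of lnK0 against 1/T equals −ΔH<sup>0</sup>/R and the intercept equals ΔS<sup>0</sup>/R; Equation (8) then gives ΔG<sup>0</sup> at each temperature. The K0 values below are hypothetical placeholders, not the study's data, and the function names are illustrative.

```python
import math

R_GAS = 8.314  # molar gas constant, J/(mol*K)


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vant_hoff(temps_k, k0_values):
    """Fit Equation (9), ln K0 = dS/R - dH/(R*T).

    Returns (dH in J/mol, dS in J/(mol*K)).
    """
    slope, intercept = linear_fit([1.0 / t for t in temps_k],
                                  [math.log(k) for k in k0_values])
    return -slope * R_GAS, intercept * R_GAS


def gibbs_energy(delta_h, delta_s, temp_k):
    """Equation (8): dG = dH - T * dS, in J/mol."""
    return delta_h - temp_k * delta_s


# 15, 25 and 35 degC expressed in kelvin; the K0 values are hypothetical.
temps = [288.15, 298.15, 308.15]
k0_demo = [3.0, 4.1, 5.4]
dh, ds = vant_hoff(temps, k0_demo)
print(f"dH = {dh / 1000:.2f} kJ/mol, dS = {ds:.2f} J/(mol*K)")
for temp in temps:
    print(f"T = {temp:.2f} K: dG = {gibbs_energy(dh, ds, temp) / 1000:.2f} kJ/mol")
```

A positive fitted ΔH<sup>0</sup> together with negative ΔG<sup>0</sup> values would correspond to a spontaneous, endothermic adsorption.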

#### **3. Results and Discussion**

#### *3.1. Adsorption Experiments*

#### 3.1.1. Effect of Adsorption Time

The concentration of the residual copper ion in the supernatant of each sample as a function of adsorption time is shown in Figure 4. The residual copper ion concentration showed a similar trend for different dosage amounts. At the beginning of the adsorption reaction (0–12 h), the residual copper ion concentration decreased significantly with the increasing time; at the middle of the adsorption reaction (12–48 h), the residual copper ion concentration decreased slowly with time; at the end of the adsorption reaction (48–72 h), the adsorption equilibrium state was reached. The reason for this trend could be that the initial adsorption occurs mainly on the surface and in the pores of RSAC particles. In the initial adsorption stage, there are many active sites on the surface and in the pores of RSAC particles so copper ions could occupy the active sites rapidly and show the characteristics of a fast adsorption rate [22]. With the extension of time and the increase of adsorption capacity, the active sites become fewer [23]. It was also found that there were large functional groups on the surface of FA and MFA, such as O-H, C=C, and Si-O-Si, which played a crucial role in the process of adsorption of heavy metal ions [24]. It could be inferred that the adsorption equilibrium time of copper ions on RSAC particles is 48 h. This is consistent with the experimental results to explore the optimal ratio of RSAC particles. Therefore, the adsorption time could be set as 48 h in the subsequent static adsorption experiments.

**Figure 4.** Effect of adsorption time on the adsorption of copper ions by RSAC particles.

#### 3.1.2. Effect of Adsorbent Amount

Figure 5 shows the relationship between adsorbent amount and copper ion adsorption. The copper ion removal rate increased continuously with the amount of RSAC particles, but the slope decreased at the same time. Meanwhile, the adsorption amount per unit mass of RSAC particles showed a different trend: it increased at the beginning and then decreased after reaching a specific point as the adsorbent amount increased. The two trends indicated that there was a balance point between efficiency and performance. The dosage that gives the best removal rate may lead to inefficient use of RSAC particles: the adsorption capacity per unit mass of RSAC particles was only 1.43 mg/g at 10 g, which indicated that the adsorption performance of RSAC particles was not fully utilized. The copper ion removal rate may have increased because a larger amount of RSAC particles provides a larger contact area and more adsorption sites [25].

**Figure 5.** Effect of adsorbent amount on the adsorption of copper ions by RSAC particles.

The adsorption amount of copper ions per unit mass of adsorbent showed a different trend. This may be because, when the concentration of copper ions in the solution is constant and the adsorbent dosage is low, the adsorption sites contribute little to the diffusion and adsorption reaction driven by the copper ion concentration gradient. With the increase in the dosage, the total number of adsorption sites available to copper ions in the solid–liquid system increased, and the adsorption amount per unit mass of RSAC particles also increased [26]. Considering economic factors, the dosage of 5 g/150 mL was chosen as a balance point and used in the subsequent experiments.
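The trade-off between removal rate and capacity per unit mass follows from a simple mass balance, q = (C0 − Ce)·V/m. The sketch below illustrates it under a loudly stated assumption: a 150 mL sample at 100 mg/L, as used in the other batch tests. The conditions of the dosage experiment are not restated in this section, so these numbers are illustrative only.

```python
def adsorption_capacity(c0, ce, volume_l, mass_g):
    """Mass balance q = (C0 - Ce) * V / m, in mg of copper per g of adsorbent."""
    if mass_g <= 0:
        raise ValueError("adsorbent mass must be positive")
    return (c0 - ce) * volume_l / mass_g


# Assumed conditions (not confirmed for the dosage experiment):
C0 = 100.0      # initial copper concentration, mg/L
VOLUME = 0.150  # sample volume, L

# Upper limit of q if every copper ion were removed (Ce = 0), per dosage.
for mass in (2.5, 5.0, 10.0):
    ceiling = adsorption_capacity(C0, 0.0, VOLUME, mass)
    print(f"{mass:4.1f} g adsorbent: q cannot exceed {ceiling:.2f} mg/g")
```

Under these assumptions the ceiling at 10 g is 1.5 mg/g, so a capacity near that value means almost all copper has been removed while each gram of adsorbent carries little of it.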

#### 3.1.3. Effect of Initial Concentration

The relationship between the initial concentration and the residual concentration of copper ions is shown in Figure 6. The removal rate of copper ions and the adsorption amount per unit mass of RSAC particles in each sample after adsorption for 72 h is presented in Table 9. As shown in Figure 6, a high initial copper ion concentration led to a corresponding steep adsorption curve and a fast adsorption rate compared with a low initial concentration sample in the pre-sorption stage. This could be attributed to high initial copper ion concentration providing a sufficient driving force for mass transport, which could make ions occupy the active site on adsorbents rapidly, facilitating the adsorption of copper ions by RSAC particles [27,28].

**Figure 6.** Effect of initial concentration on the adsorption of copper ions by RSAC particles.

**Table 9.** Copper ion removal for different initial concentrations.


#### 3.1.4. Effect of Coexisting Metal Cations

Figure 7 shows the relationship between the metal cation concentration and the effect on the removal of copper ions. The results revealed that the removal rate of copper ions by RSAC particles fluctuated within a small range as the Na+ concentration increased, which indicated that Na+ competes only weakly for adsorption on RSAC particles. The removal rate of copper ions decreased with the increase of Mg2+ and Ca2+ concentration, which indicated that Mg2+ and Ca2+ may have some effect on the removal rate. The removal rate decreased significantly with the increase of Fe3+ concentration, from 74.65% to 62.47%, which indicated that Fe3+ had a significant inhibitory effect on the adsorption of copper ions.

The experimental results may be interpreted as follows: metal cations can replace the original cations in the RSAC particles by ion exchange, entering the adsorbent surface and pore channels and affecting the adsorption of copper ions on the RSAC particles by changing the adsorbent environment [29]. A higher charge number of the metal cation may result in a stronger ability to replace the original cation [30]. Ion exchange and surface adsorption may be involved in the adsorption process of copper [31].

**Figure 7.** Effect of metal cations on the adsorption of copper ions by RSAC particles.

#### 3.1.5. Effect of Ambient Temperature

The fitted curves at 25 °C are shown in Figure 8. The fitting parameters can be obtained from the adsorption isotherm. It can be observed from Table 10 that the fit coefficients R<sup>2</sup> of both the Langmuir and Freundlich models are greater than 0.95 at different temperatures, which indicates that both models could well express the isothermal characteristics of the adsorption of copper ions by RSAC particles. This also implies that the adsorption isotherm characteristics of the adsorbent for copper ions could fit two or more adsorption isotherm models under certain conditions [32]. Based on the Langmuir model, KL and qmax both increase with temperature, which indicates that the intermolecular binding and the adsorption capacity may increase as temperature increases. Based on the Freundlich model, the low 1/n value indicates that the adsorption proceeds readily. Meanwhile, KF, which represents the adsorbability, increases as temperature increases [33]. The two models show that the adsorption of copper ions by RSAC particles is a heat-absorbing (endothermic) process. This could be explained from different aspects. Firstly, the copper ion needs energy to approach the RSAC particles and overcome the resistance of the liquid film around the particles to reach the internal active sites. Secondly, physical adsorption may release heat, since the intermolecular force (van der Waals force) between adsorbates and adsorbents makes the main contribution during adsorption, and the molecular kinetic energy decreases as thermal energy is released [34]. Another study also found that the adsorption process of fly ash involves physical adsorption [35]. In conclusion, chemisorption is endothermic while physical adsorption is exothermic, and the heat absorbed exceeds the heat released, so an increase in temperature promotes the adsorption reaction, which is consistent with previous research conclusions [36,37].

**Figure 8.** Adsorption isotherm of copper ions (25 °C).



#### *3.2. Kinetic Study of Copper Ion Adsorption by RSAC*

The fitted curves are shown in Figure 9 and the fitted parameters of the two kinetic models are shown in Table 11. As shown in Figure 9, the adsorption amount of copper ions by RSAC particles increases rapidly with the increase of adsorption time in the early stage of the adsorption reaction. The increase of adsorption amount decreases gradually to almost 0 in the middle and the late stage of the adsorption reaction, which means the adsorption reaches the equilibrium state. In the preliminary stage of adsorption reaction, the adsorbent mainly adsorbs copper ions at the solid–liquid interface [38]. After the preliminary stage, copper ions diffuse from the adsorbent surface to the internal micropores and lattice, reach and are fixed by the internal surface active-sites, thus the adsorption rate decreases slowly [39]. Another study found that the adsorption of fly ash involves both boundary-layer diffusion and intraparticle diffusion [11]. Kai-sung Wang et al. also found fast surface adsorption was followed by a slow intra-particle diffusion adsorption of fly ash [40]. The adsorption process of copper ions by RSAC particles can reach the equilibrium state at 48 h.

**Figure 9.** Adsorption kinetic curve of copper ions (25 °C).


The R2 of both models is greater than 0.95 and the difference between them is not significant (Table 11), which indicates that both kinetic models could describe the adsorption process of copper ions on RSAC particles well. It could be further inferred that the adsorption process of copper ions on RSAC particles is a mixed control: both surface diffusion and internal fine pore diffusion are important.

#### *3.3. Thermodynamic Study of the Adsorption of Copper Ions by RSAC*

As shown in Figure 10, ΔS<sup>0</sup> and ΔH<sup>0</sup> were calculated from the intercept and the slope of the fitted straight line, respectively; ΔG<sup>0</sup> was then calculated at the different temperatures, and Table 12 presents the thermodynamic parameters obtained. The adsorption free energy ΔG<sup>0</sup> of copper ions adsorbed by RSAC particles is negative at all temperatures, and the absolute value of ΔG<sup>0</sup> increases gradually with the increase of temperature (Table 12). This indicates that the adsorption of copper ions in solution by RSAC particles is a spontaneous reaction and that the spontaneity increases with the increase of temperature [41]. The enthalpy change ΔH<sup>0</sup> during the adsorption of copper ions by RSAC particles is positive, which indicates that the adsorption process is a heat-absorbing (endothermic) reaction and therefore an increase in temperature favors the adsorption [42]. As presented in Table 5, the maximum adsorption capacity of copper ions increases with increasing temperature, which also confirms that the adsorption of copper ions by RSAC particles is a heat-absorbing reaction.

**Figure 10.** Relationship between lnK0 − 1/T for the process of copper ion adsorption by RSAC particles.


**Table 12.** Thermodynamic parameters of copper ion adsorption on RSAC particles.

#### **4. Conclusions**

This study aimed to prepare RSAC and apply it to the removal of copper ions from wastewater, and to discuss the influence mechanism and the microstructure for adsorption by RSAC particles. One of the more significant findings to emerge from this study is that the adsorption of copper ions in solution by RSAC particles is a spontaneous, heat-absorbing (endothermic) reaction. The mechanism of copper ion removal by RSAC particles includes an ion exchange reaction and chemical precipitation in addition to physical adsorption. The second major finding is that metal cations can replace the original cations in the RSAC particles through ion exchange, change the environment of the adsorbent, and affect the adsorption of copper ions on RSAC particles. This study has also indicated that both the Langmuir and Freundlich models can describe well the isothermal characteristics of the adsorption of copper ions by RSAC particles. Both the quasi-primary kinetic model and the quasi-secondary kinetic model can describe the adsorption process of copper ions on RSAC particles well. It can be further inferred that the adsorption process of copper ions on RSAC particles is under mixed control, and both surface diffusion and internal fine pore diffusion are important.

**Author Contributions:** Conceptualization, Y.L.; methodology, Y.L.; software, Y.L. and Q.C.; validation, Y.L. and Q.C.; formal analysis, Q.C.; investigation, Q.C.; resources, Q.C.; data curation, Q.C.; writing—original draft preparation, Q.C. and R.P.S.; writing—review and editing, R.P.S. and Q.C.; visualization, Y.L. and Q.C.; supervision, Y.L. and R.P.S.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to show their gratitude to all those who helped during the experimental period and with the writing of this manuscript. Q.C. would like to gratefully acknowledge the help of Rajendra Prasad Singh, School of Civil Engineering, Southeast University, and appreciate his guidance, patience, encouragement, and professional instructions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Removal of Copper, Nickel, and Zinc Ions from an Aqueous Solution through Electrochemical and Nanofiltration Membrane Processes**

**Jagdeesh Kumar 1,\*, Himanshu Joshi <sup>1</sup> and Sandeep K. Malyan 2,\***


**Abstract:** Heavy metal contamination in water is a major health concern, directly related to rapid growth in industrialization, urbanization, and modernization in agriculture. Keeping this in view, the present study has attempted to develop models for the process optimization of nanofiltration (NF) membrane and electrocoagulation (EC) processes for the removal of copper, nickel, and zinc from an aqueous solution, employing the response surface methodology (RSM). The variable factors were feed concentration, temperature, pH, and pressure for the NF membrane process; and time, solution pH, feed concentration, and current for the EC process, respectively. The central composite design (CCD), the most commonly used fractional factorial design, was employed to plan the experiments. RSM models were statistically analyzed using analysis of variance (ANOVA). For the NF membrane, the rejection of Zn, Ni, and Cu was observed as 98.64%, 90.54%, and 99.79% respectively; while the removal of these through the EC process was observed as 99.81%, 99.99%, and 99.98%, respectively. The above findings and a comparison with the conventional precipitation and adsorption processes apparently indicate an advantage in employing the NF and EC processes. Further, between the two, the EC process emerged as more efficient than the NF process for the removal of the studied metals.

**Keywords:** nanofiltration; electrocoagulation; nickel; zinc; copper; heavy metals; water pollution

#### **1. Introduction**

Heavy metals are inorganic elements naturally found throughout the earth's crust [1]. Their concentration above permissible limits is considered pollution. "Heavy metals" refers to a group of elements with a density greater than 4 g cm−3, including metals and metalloids [2]. Industrial discharges, agricultural runoff, storm water, mining activity, and direct inclusion of sewage/wastewater contribute to the heavy metal pollution load in fresh water, leading to various health and environmental problems. Among the commonly reported heavy metals, copper (Cu) is used widely in electroplating, batteries, pesticides, galvanized pipes, and alloys [3–7]. Regular consumption of copper-contaminated drinking water may cause stomach upsets, abdominal cramp and diarrhea. Nickel (Ni) is another metal found widely in water and wastewater. The electroplating industry, rechargeable batteries, and galvanized pipes are its main sources. High levels of nickel contamination cause serious lung and kidney problems as well as skin dermatitis and pulmonary fibrosis. In drinking water, the maximum allowable limit for nickel is 0.1 ppm [8]. Zinc (Zn) is used in many types of industry, such as metal production, galvanization, food preservation, agri-food and biological engineering, pharmaceuticals, electronics, mining and metallurgy, with major contributions coming from electroplating and mining effluents [9,10]. Zinc is not considered highly toxic but its presence in drinking water if exceeding 15 mg/L is reported to cause nausea, vomiting and diarrhea [11]. These heavy metals are ingested directly either by drinking contaminated water or indirectly through the food chain, and subsequently affect human health [12–14]. Drinking of contaminated water has been reported to lead to around 70–80% of the total diseases in developing countries [15,16], where the impact of increased pollution is particularly problematic because the population at large does not have sufficient resources to effectively treat the contaminated water or access to safe drinking water systems at their homes. According to a WHO (2017) estimate, around 844 million people do not have access to a basic drinking water source and 230 million people spend more than 30 min/day in collecting water from an improved water source, which may include piped water, boreholes, protected wells and springs, rainwater and packaged/delivered water [17]. According to the United Nations, an estimated 80% of all industrial and municipal wastewater in the developing world is released into freshwater bodies without any prior treatment [18]. Heavy metal removal can be achieved through different physical, chemical and biological methods such as fungal remediation [19], microbial remediation [12,20], phytoremediation [21,22], adsorption [23,24], flotation, coagulation–flocculation [25], chemical precipitation or ion exchange [26]; selection between these may be based on the nature and quantum of the pollution load and merits/demerits of decontamination processes along with other factors. It is noted that removal of heavy metals from water/wastewater is still an evolving research area, and there is wide scope for case-specific evaluation, optimization and integration of new and/or available technologies. In this regard, it has been noted that removal of heavy metals from aqueous solutions, especially metal-laden water or wastewater displaying high and heterogeneous concentrations is one of the major challenges.

For this, nanofiltration (NF) and electrocoagulation (EC) processes have been reportedly more reliable than bioremediation in terms of the shorter time taken in providing near complete removal, ease of setup, and predictability. The primary emphasis of the present study is to explore the efficiency of removal of Cu, Ni, and Zn by NF and EC processes from their synthetic aqueous solutions in low to high concentrations. The selection of these metals for study is based on the findings of a comprehensive literature review, indicating that these comprise the major constituents in electroplating effluents or the recipient waters of these effluents.

**Citation:** Kumar, J.; Joshi, H.; Malyan, S.K. Removal of Copper, Nickel, and Zinc Ions from an Aqueous Solution through Electrochemical and Nanofiltration Membrane Processes. *Appl. Sci.* **2022**, *12*, 280. https://doi.org/10.3390/app12010280

Academic Editor: Bart Van der Bruggen

Received: 15 October 2021 Accepted: 22 December 2021 Published: 28 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Materials and Methods**

#### *2.1. Chemicals, Membranes and Electrodes*

The experiments were conducted for the technical evaluation of the NF and EC processes employing a range of concentrations of metals in aqueous solution based on the available secondary data on electroplating effluent quality in the literature and also in the study area [27,28]. All chemicals used in this research were of analytical grade, and synthetic composite metal solutions were prepared by dissolving the appropriate mass of each metal salt in high purity Milli-Q water (18.2 MΩ cm). Copper (II) sulphate pentahydrate (CuSO4·5H2O), nickel (II) sulphate hexahydrate (NiSO4·6H2O), zinc sulphate heptahydrate (ZnSO4·7H2O), sodium hydroxide (NaOH), nitric acid (69–72%), sodium chloride (NaCl) and calcium carbonate extra pure (CaCO3) were all obtained from Merck Specialties Private Ltd. Quicklime (CaO), the nanofiltration membrane (Permionics, Flat Sheet Membrane HFN-300 AR) and stainless steel (SS-304) electrodes were procured commercially. This membrane was chosen because it works effectively in both acidic and basic media. Stainless steel electrodes were used, as they are less susceptible to corrosion and have reportedly shown better performance in earlier studies. The grade of steel used is reported to contain no Zn and a very low amount of carbon.

#### *2.2. Experimental Setup and Procedure*

#### 2.2.1. Experimental Setup for Nanofiltration

The NF unit was a cross-flow lab-scale system (Nilshan Nishotech Pvt. Ltd., Navi Mumbai, India). It consisted of a high-pressure pump, feed vessel, flat membrane sheet housing cell, and a temperature control unit (Figure 1a). The membrane housing cell contained a rectangular channel. The active surface area of the membrane was 0.0155 m<sup>2</sup>. Lab experiments were conducted by filtering the multicomponent solution with the NF membrane. The permeate and the concentrate streams were recirculated back into the feed tank continuously during experiments. After the completion of every single experiment, the system was appropriately cleaned with Milli-Q water. The samples were collected after each experiment.

**Figure 1.** (**a**) Flat Plate Membrane System. (Control Panel, TN = Temperature Node, P = Pressure Node, FR = Flow Rate Controller node, TS = Temperature Sensor, PV = Pressure Valve, PS = Pressure Sensor, FPC = Flat Plate Membrane Cell, HP = High-Pressure Pump, FT = Feed Tank.). (**b**) Laboratory scale experimental setup of electrocoagulation unit. (1-AC, power scheme; 2, direct current supply; 3, treatment vessel, consists of anode and cathode in mono-polar mode, magnetic-bead; 4, magnetic stirrer).

#### 2.2.2. Experimental Setup for Electrocoagulation

The electrocoagulation (EC) experimental setup consisted of a DC power supply unit providing constant DC output. The experimental reactor (11.0 cm × 11.0 cm × 15.0 cm) was made of plexiglass, and four mono-polar stainless-steel plates (9.0 cm × 9.0 cm × 0.1 cm) were submerged in the solution as the electrodes (Figure 1b). Plate spacing was 1 cm. A magnetic stirrer was used to keep the solution uniform throughout the reactor. At the start of each EC experimental run, 1.8 L of the synthetic solution was put into the electrolytic reaction cell after mixing it with 1 g/L of electrolyte (NaCl). The pH of the solution was measured with a pH meter and maintained by adding drops of 0.1 N NaOH and H2SO4 solution. The current was controlled through the power supply regulator. Samples were collected at the end of the electrolysis process.

Samples collected after the experiments employing different treatment processes were digested in a microwave digestion unit (Anton Paar) and filtered through 0.42 μm filter papers. They were then analyzed by inductively coupled plasma mass spectrometry (ICP-MS, Agilent). The removal efficiency was determined from the difference between the concentrations measured by ICP-MS before and after each experiment.

#### *2.3. Preparation of Working Solutions*

Metal solutions were prepared by dissolving the appropriate mass of each metal salt in high purity Milli-Q water (18.2 MΩ cm), as mentioned earlier. The metal salts were added sequentially, each after the previous salt had completely dissolved. Thereafter, the mixed-metal solutions with different concentrations (ppm) were prepared for each batch experiment.

#### *2.4. Calculation of Removal Percentage*

The removal efficiency of any metal can be calculated using the following equation:

$$\mathrm{R}(\%) = \frac{\mathrm{C}\_{\text{i}} - \mathrm{C}\_{\text{f}}}{\mathrm{C}\_{\text{i}}} \times 100 \tag{1}$$

where Ci and Cf (mg/L) denote the concentration of the metal before and after the treatment process, respectively.
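Equation (1) can be written as a small helper, sketched below. The function name and the example concentrations are hypothetical and are not taken from the study's data.

```python
def removal_efficiency(c_initial, c_final):
    """Equation (1): R(%) = (Ci - Cf) / Ci * 100, with Ci and Cf in mg/L."""
    if c_initial <= 0:
        raise ValueError("initial concentration must be positive")
    if c_final < 0:
        raise ValueError("final concentration cannot be negative")
    return 100.0 * (c_initial - c_final) / c_initial


# Hypothetical feed and treated concentrations (mg/L), for illustration only.
for metal, ci, cf in [("Cu", 50.0, 0.5), ("Ni", 50.0, 4.0), ("Zn", 50.0, 1.0)]:
    print(f"{metal}: R = {removal_efficiency(ci, cf):.2f}%")
```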

#### *2.5. Experimental Design and Optimization through Response Surface Methodology*

For mathematical modelling of the process, an empirical approach [29–31] employing response surface methodology (RSM) was adopted [32–34]. RSM reportedly reduces systematic errors while providing an estimate of experimental error, reduces the number of experiments [30], requires fewer computer simulations, and is more accessible and more efficient than other methods based on limited components or computational complexity [31].

RSM based on the central composite design (CCD) was used to examine the efficacy of the NF membrane and EC processes. CCD helped identify the operating conditions that give the highest removal efficiency. In the NF membrane process, the solution pH, pressure, concentration, and temperature were the key factors widely reported to contribute to the removal of metal ions [35–38], and thus Design-Expert software was used for the experimental design with a varying range of these factors (Table 1).

**Table 1.** Factor and range for design experiments of NF membrane and EC.


In the EC process, solution pH, time, concentration, and current were the key factors [25] widely reported to contribute to the removal of metal ions [39–42], and thus Design-Expert software was used for the experimental design with a varying range of these factors (Table 1). The initial and final conductivity values were 2.78 and 2.35 mS/cm, respectively, for the final optimum condition. The complete design matrices of the NF membrane and EC processes obtained after the application of CCD are presented in Supplementary Tables S1 and S2, respectively, which suggest thirty sets of runs and six centrally coded level runs for each treatment process.
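To show how a four-factor CCD is laid out in coded units, the sketch below generates factorial, axial, and centre points. It is one possible reading of the design, not the authors' matrix: it assumes a rotatable axial distance (alpha = 2 for four factors) and six centre points, which gives 16 + 8 + 6 = 30 runs. The axial distance actually used is not stated here, and the pH range in the decoding example is hypothetical.

```python
from itertools import product


def central_composite_design(n_factors=4, n_center=6, alpha=None):
    """Return CCD runs in coded units: factorial, axial, then centre points."""
    if alpha is None:
        # Assumed rotatable design: alpha = (number of factorial runs) ** 0.25.
        alpha = (2 ** n_factors) ** 0.25
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for level in (-alpha, alpha):
            point = [0.0] * n_factors
            point[i] = level
            axial.append(point)
    center = [[0.0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


def decode(coded, low, high):
    """Map a coded level to a real value; low and high correspond to -1 and +1."""
    mid = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return mid + coded * half_range


design = central_composite_design()
print(f"{len(design)} runs in total")
# Hypothetical pH range of 4 to 8 for the -1 and +1 levels (illustrative only).
print("pH levels:", sorted({decode(run[0], 4.0, 8.0) for run in design}))
```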

#### **3. Results and Discussion**

In this study, experiments were performed for different combinations of factors for both processes, as described in the following sections.

#### *3.1. Experimental Performance of NF*

The details of the coded variables (X1, X2, X3 and X4) and their response values are presented in Supplementary Table S1.

#### 3.1.1. Statistical Analysis and Modelling by RSM

The NF membrane process responses were studied for the permeate flux and metal rejections (Zn, Cu, and Ni). The findings of the experimental studies were analyzed statistically through analysis of variance (ANOVA). Table 2 shows the ANOVA models.

#### **Table 2.** ANOVA analysis for the NF membrane.


For the above models, Fisher's test statistic (*F*-value) describes the scatter of the actual data around the fitted models, while the *p*-value indicates the significance of the model terms. The *F*-values of the responses suggested that the respective models were significant relative to the residual error. A model *p*-value lower than 0.05 indicates a significant model, and one higher than 0.10 indicates an insignificant model. The *p*-values of all responses were lower than 0.0001, suggesting that the models are highly significant. The coefficient of determination (*R*<sup>2</sup>) describes the system behaviour and the adequacy of the model over the range of the independent variables. The high *R*<sup>2</sup> and adjusted *R*<sup>2</sup> values in Table 2 also reveal that the models are highly significant.
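For readers unfamiliar with the two fit statistics, the sketch below shows how *R*<sup>2</sup> and adjusted *R*<sup>2</sup> are conventionally computed. The run count (30) and term count (14, for a full quadratic model in four factors: 4 linear, 4 squared, 6 interaction) are assumptions based on the design described above, and the *R*<sup>2</sup> value of 0.98 is a placeholder, not a value from Table 2.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_runs, n_terms):
    """Adjusted R^2 penalises model size; n_terms excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


# Assumed: 30 runs and 14 model terms; 0.98 is a placeholder R^2.
print(f"adjusted R^2 = {adjusted_r_squared(0.98, 30, 14):.4f}")
```

Because the adjustment grows with the number of terms, a large gap between the two statistics would suggest that some model terms add little explanatory value.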

The quadratic regression model equations for NF membrane permeate flux (Y1), Zn removal (Y2), Ni removal (Y3), and Cu removal (Y4) in terms of coded factors are presented below as Equations (2)–(5), respectively.


$$\begin{array}{l} \text{Y}\_{2} = +89.32 + 24.78\text{X}\_{1} - 1.62\text{X}\_{2} - 4.67\text{X}\_{3} - 2.82\text{X}\_{4} - 6.26\text{X}\_{1}^{2} - 4.22\text{X}\_{2}^{2} - 4.71\text{X}\_{3}^{2} - 3.91\text{X}\_{4}^{2} \\ + 0.52\text{X}\_{1}\text{X}\_{2} + 1.96\text{X}\_{1}\text{X}\_{3} + 0.67\text{X}\_{1}\text{X}\_{4} - 0.34\text{X}\_{2}\text{X}\_{3} - 1.25\text{X}\_{2}\text{X}\_{4} + 0.23\text{X}\_{3}\text{X}\_{4} \end{array} \tag{3}$$

$$\begin{array}{l} \text{Y}\_{3} = +84.38 + 19.27 \text{X}\_{1} - 2.23 \text{X}\_{2} - 3.82 \text{X}\_{3} - 2.17 \text{X}\_{4} - 0.43 \text{X}\_{1}^{2} - 3.58 \text{X}\_{2}^{2} - 3.46 \text{X}\_{3}^{2} - 3.75 \text{X}\_{4}^{2} \\ + 0.61 \text{X}\_{1} \text{X}\_{2} + 0.59 \text{X}\_{1} \text{X}\_{3} + 0.51 \text{X}\_{1} \text{X}\_{4} - 0.52 \text{X}\_{2} \text{X}\_{3} + 0.076 \text{X}\_{2} \text{X}\_{4} + 0.26 \text{X}\_{3} \text{X}\_{4} \end{array} \tag{4}$$

$$\begin{aligned} \mathbf{Y\_4} &= +92.65 + 8.88\mathbf{X\_1} - 0.85\mathbf{X\_2} - 2.02\mathbf{X\_3} - 1.04\mathbf{X\_4} - 3.76\mathbf{X\_1}^2 + 0.66\mathbf{X\_2}^2 + 0.47\mathbf{X\_3}^2 - 0.46\mathbf{X\_4}^2 \\ &+ 0.44\mathbf{X\_1}\mathbf{X\_2} + 1.50\mathbf{X\_1}\mathbf{X\_3} + 0.31\mathbf{X\_1}\mathbf{X\_4} - 0.33\mathbf{X\_2}\mathbf{X\_3} + 0.070\mathbf{X\_2}\mathbf{X\_4} + 0.27\mathbf{X\_3}\mathbf{X\_4} \end{aligned} \tag{5}$$
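The quadratic models above can be evaluated directly at any combination of coded factor levels. The sketch below does this for the Zn model of Equation (3); the function and variable names are illustrative and not part of the original study, and the coded levels are assumed to follow the usual convention in which 0 is the centre of the design range and ±1 its limits.

```python
from itertools import combinations

# Coefficients of Equation (3), Zn rejection (Y2), in coded factors X1..X4.
INTERCEPT = 89.32
LINEAR = [24.78, -1.62, -4.67, -2.82]
QUADRATIC = [-6.26, -4.22, -4.71, -3.91]
# Interaction terms in the order X1X2, X1X3, X1X4, X2X3, X2X4, X3X4.
INTERACTION = [0.52, 1.96, 0.67, -0.34, -1.25, 0.23]


def predict_zn_rejection(x):
    """Evaluate the quadratic model at coded levels x = (X1, X2, X3, X4)."""
    y = INTERCEPT
    y += sum(b * xi for b, xi in zip(LINEAR, x))
    y += sum(b * xi ** 2 for b, xi in zip(QUADRATIC, x))
    # combinations() yields index pairs in the same order as INTERACTION.
    pairs = combinations(range(4), 2)
    y += sum(b * x[i] * x[j] for b, (i, j) in zip(INTERACTION, pairs))
    return y


centre = predict_zn_rejection((0, 0, 0, 0))   # all factors at the centre level
low_x1 = predict_zn_rejection((-1, 0, 0, 0))  # only X1 lowered to coded -1
```

At the centre point only the intercept survives, so the model predicts 89.32%. Lowering X1 alone to its coded −1 level removes both the linear and quadratic X1 contributions, which shows how strongly the fitted Zn rejection depends on that factor.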

#### 3.1.2. Response Surface Plots

The response surface plot for the permeate flux of NF is presented in Figure 2. The observations show that the permeate flux rises with increasing trans-membrane pressure. It is well established that permeate flux depends on pressure and increases almost linearly with it [32–36]. The maximum permeate flux of 59.34 L/m<sup>2</sup>·h is obtained at a feed concentration of 25 ppm, pH 9.5, pressure 25 bar, and temperature 35 °C, as shown in Figure 2. It is typically theorized that an increased temperature raises the permeate flux for one or more reasons, such as a decline in solvent viscosity, a rise in solvent diffusion, an increase in the solvent diffusion coefficient, or greater polymer chain mobility [24]. Membrane-solvent interactions can be expected to differ with a change in solvent properties, such as dielectric constant, molecular size, dipole moment, and Hildebrand solubility parameter. The rise in temperature also affects structural properties such as pore radius and membrane thickness, which have shown a much more noticeable impact on membrane performance than solvent and solute mobilities [37–39]. Experiments have demonstrated a linear increase in flux with a rise in temperature, as reported by others [40,41]. Figure 2a shows a direct increase of permeate flux with an increase in trans-membrane pressure. Figure 2b demonstrates a significant increase in the permeate flux with an increase in temperature. Water permeation through micropores is an activated process that is entirely different from viscous flow. It should be taken into account that the water molecule is one of the smallest molecules, with a kinetic diameter (0.29 nm) in the same range as those of helium (0.24 nm) and hydrogen (0.27 nm). A portion of the water molecules acquires enough thermal energy to cross the energy barrier at the pore wall and pass through the pores. Another explanation is based on the adsorption of water on hydrophilic pore walls: the actual pore diameter may be reduced by the water adsorbed on the pore walls, and this adsorbed layer can be thinner at higher temperatures, so that the effective pore diameter becomes larger [41].

The separation of metal ions by NF is attained by size exclusion and by electrical interactions between the ions in the aqueous feed solution and the charged NF membrane. The degree of ionization of the membrane's functional groups is a function of the solution pH, which influences the membrane charge and, therefore, the rejection properties of the membrane [41]. The rejection of Cu, Ni, and Zn ions increased with the increase in the solution pH (Figure 3a–c). The feed solution pH determines the ion charge in the solution and the surface charge density of the membrane. The more the pH increases, the more the membrane charge becomes positive, leading to a stronger electrostatic repulsion between the membrane and the metal ions [35]. Copper hydroxide precipitation starts at pH 5.24, and the precipitation of the other metals (Zn and Ni) starts at a still higher pH. At the different pH values studied, the rejection of copper was higher than that of Zn and Ni ions, as reported earlier [42]. The maximum rejection of Cu, Zn, and Ni was 99.99%, 99.96%, and 99.63%, respectively, in the experiments where concentrations ranged between 10 and 25 ppm and pressures between 10 and 25 bar. It was observed that the rejection of metal ions decreased when the concentration of the feed solution increased, a common phenomenon for NF membranes [37]. The increase in feed concentration apparently forms a screening layer of cations adjacent to the membrane on the high-pressure side, which neutralizes the negative charges of the NF membrane. Thus, the total negative charge of the membrane decreases, and the repulsion between the membrane and anions decreases. As a result, the co-ions (ions with the same charge as the membrane) pass through the membrane more easily and, due to electro-neutrality, the rejection of counter-ions is also reduced [8,43]. Figure 3d–f shows a slight decrease in the rejection of Ni and Zn ions with an increase in feed solution concentration, whereas the rejection of the Cu ions was not much affected. Temperature and pressure also did not show much influence on the rejection of the metal ions. Overall, the findings of this study are in line with other relevant studies, as displayed in Table 3.

**Figure 2.** RSM plot for permeate flux. (**a**) Effect of pressure and pH. (**b**) Effect of temperature and concentration.

**Table 3.** Comparison of the rejection efficiency of metal ions by NF membranes in this study and in other studies in the literature.


Note: NF—nanofiltration membrane; FO—Forward Osmosis.

**Figure 3.** NF membrane RSM plots. Effects of pH and pressure on the rejection efficiency (%) of metal ions: (**a**) Zn ions, (**b**) Ni ions, and (**c**) Cu ions. Effects of temperature and concentration on the rejection efficiency (%) of metal ions: (**d**) Zn ions, (**e**) Ni ions, and (**f**) Cu ions.

#### 3.1.3. Multi Response Optimization

The optimization of all input variables was done using the desirability function approach to arrive at the best values of the responses Y1, Y2, Y3, and Y4. As depicted in Table 4, at the optimal condition, the predicted values of the responses (Y1, Y2, Y3, Y4) were 36.9 L/m<sup>2</sup>·h, 94.77%, 88.67%, and 95.89%, respectively. The average experimental values after three runs were 41.93 L/m<sup>2</sup>·h for Y1, 98.64% for Y2, 90.54% for Y3, and 99.79% for Y4. All the experimentally derived values are close to the predicted response values, showing a good correlation (Table 4).
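The text does not state which form of desirability function was used. A common choice is the Derringer-Suich formulation, in which each response is mapped to a value between 0 and 1 and the overall desirability is the geometric mean of these values. The sketch below assumes that formulation for "larger is better" responses; the lower bounds and targets are hypothetical, and only the predicted response values are taken from the paragraph above.

```python
import math


def desirability_max(y, low, target, weight=1.0):
    """Larger-is-better desirability: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight


def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    return math.prod(values) ** (1.0 / len(values))


# Predicted responses at the reported optimum (flux in L/m^2.h, rejections in %).
predicted = {"Y1": 36.9, "Y2": 94.77, "Y3": 88.67, "Y4": 95.89}
# Hypothetical (low, target) bounds, assumed for illustration only.
bounds = {"Y1": (0.0, 60.0), "Y2": (50.0, 100.0),
          "Y3": (50.0, 100.0), "Y4": (50.0, 100.0)}

individual = [desirability_max(predicted[k], *bounds[k]) for k in predicted]
overall = overall_desirability(individual)
```

Because the geometric mean is used, a single response with zero desirability drives the overall value to zero, so the optimizer cannot trade an unacceptable rejection for a very high flux.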



#### *3.2. Experimental Performance of EC*

The details of the coded variables (X1, X2, X3 and X4), and their response values are presented in Supplementary Table S2.

#### 3.2.1. Statistical Analysis and Modelling by RSM

The findings of the experimental studies were analyzed statistically through analysis of variance (ANOVA). Table 5 shows the ANOVA models.

**Table 5.** ANOVA analysis for the EC process.


The *F* and *p* values presented in Table 5 indicate that the fitted models are significant. The values of *R*<sup>2</sup> and adjusted *R*<sup>2</sup> in Table 5 also reveal the high significance levels of the models.

The quadratic regression model Equations (6)–(8) for Zn removal (Y1), Ni removal (Y2) and Cu removal (Y3) in terms of coded factors are given below.

$$\begin{array}{l} \text{Y}\_{1} = +93.51 + 7.05 \text{X}\_{1} + 10.35 \text{X}\_{2} - 5.22 \text{X}\_{3} + 10.69 \text{X}\_{4} - 0.030 \text{X}\_{1}^{2} - 7.46 \text{X}\_{2}^{2} + 0.58 \text{X}\_{3}^{2} - 6.74 \text{X}\_{4}^{2} \\\ + 0.039 \text{X}\_{1} \text{X}\_{2} + 4.07 \text{X}\_{1} \text{X}\_{3} - 4.70 \text{X}\_{1} \text{X}\_{4} + 0.81 \text{X}\_{2} \text{X}\_{3} - 0.60 \text{X}\_{2} \text{X}\_{4} + 4.38 \text{X}\_{3} \text{X}\_{4} \end{array} \tag{6}$$

$$\begin{array}{l} \text{Y}\_{2} = +90.88 + 9.06 \text{X}\_{1} + 12.42 \text{X}\_{2} - 5.53 \text{X}\_{3} + 10.75 \text{X}\_{4} - 2.14 \text{X}\_{1}^{2} - 8.60 \text{X}\_{2}^{2} + 0.069 \text{X}\_{3}^{2} - 7.71 \text{X}\_{4}^{2} \\ - 1.75 \text{X}\_{1} \text{X}\_{2} + 1.35 \text{X}\_{1} \text{X}\_{3} - 4.46 \text{X}\_{1} \text{X}\_{4} + 1.01 \text{X}\_{2} \text{X}\_{3} + 0.93 \text{X}\_{2} \text{X}\_{4} + 2.24 \text{X}\_{3} \text{X}\_{4} \end{array} \tag{7}$$

$$\begin{array}{l} \text{Y}\_{3} = +97.05 + 2.00 \text{X}\_{1} + 3.69 \text{X}\_{2} - 1.00 \text{X}\_{3} + 3.06 \text{X}\_{4} + 0.10 \text{X}\_{1}^{2} - 2.19 \text{X}\_{2}^{2} + 0.79 \text{X}\_{3}^{2} - 1.55 \text{X}\_{4}^{2} \\ + 0.081 \text{X}\_{1} \text{X}\_{2} - 0.57 \text{X}\_{1} \text{X}\_{3} - 1.25 \text{X}\_{1} \text{X}\_{4} + 0.91 \text{X}\_{2} \text{X}\_{3} - 0.16 \text{X}\_{2} \text{X}\_{4} + 0.32 \text{X}\_{3} \text{X}\_{4} \end{array} \tag{8}$$

#### 3.2.2. Response Surface Plots

It is well documented in the literature that initial pH is an essential operating parameter that strongly affects EC process performance. The effect of pH on the removal efficiencies of metal ions after EC treatment was confirmed by the experimental observations. Maximum removal efficiencies for Zn (99.46%), Ni (98.14%), and Cu (99.96%) were observed at pH 6. Figure 4a–c demonstrates an increase in the removal efficiency with an increase in pH, indicating that metal ion removal decreases in an acidic medium [50]. As reported, in a strongly acidic medium, the protons in the solution are reduced to H<sub>2</sub> gas at the cathode, and a sufficient number of hydroxyl ions is not generated. The initial pH of the solution affects the EC process by changing the solution's physico-chemical properties, such as the solubility of metal hydroxides, the electrical conductivity, and the size of the colloidal particles of iron(III) complexes, which are the most reactive agents for metal ions [51]. A slight reduction in the removal efficiency with the rise in the initial concentration of the metals in solution, as shown in Figure 4, is attributed to the fact that the amount of iron dissolved from the electrode may not have been enough to treat the metal ions present in the wastewater. A higher initial concentration in the feed solution has also been reported to significantly affect the EC process [52].

It was observed (Figure 4d–f) that increasing the constant current substantially reduces the metal ion concentration. The constant current emerged as a crucial parameter in improving metal ion removal: a higher current may strengthen the direct current field and the electrolysis potential, resulting in a greater release of ferric ions and the generation of more iron hydroxides, which in turn form coagulants for metal removal.
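The link between current, time, and the amount of iron released can be made concrete with Faraday's law of electrolysis, m = I·t·M/(z·F). The sketch below is only an order-of-magnitude estimate: it assumes 100% current efficiency and a dissolution valence z = 2 (iron leaving the anode as Fe<sup>2+</sup> before further oxidation), neither of which is stated in the study. The current of 1.5 A and the time of 60 min are the operating conditions reported in this work.

```python
FARADAY = 96485.0  # C/mol, Faraday constant
M_FE = 55.845      # g/mol, molar mass of iron


def dissolved_iron_g(current_a, time_s, electrons=2):
    """Theoretical anode dissolution from Faraday's law: m = I*t*M / (z*F).

    `electrons` (z) is an assumption; use 3 if direct Fe(III) release is assumed.
    """
    return current_a * time_s * M_FE / (electrons * FARADAY)


mass_60min = dissolved_iron_g(1.5, 60 * 60)  # about 1.56 g of iron for z = 2
mass_30min = dissolved_iron_g(1.5, 30 * 60)  # half the time gives half the mass
```

Because the released mass is proportional to the product of current and time, the same coagulant dose can in principle be reached at a lower current by running the cell longer.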

**Figure 4.** Electrocoagulation RSM graphs. Effects of concentration and pH on the removal efficiency (%) of metal ions: (**a**) Zn ions, (**b**) Ni ions, and (**c**) Cu ions. Effects of current and reaction time on the removal efficiency (%) of metal ions: (**d**) Zn ions, (**e**) Ni ions, and (**f**) Cu ions.

Electrolysis time plays a vital role in metal ion removal, along with the constant current, pH, and concentration. The concentrations of Zn, Ni, and Cu were observed to decrease with an increase in the electrolysis time. Complete removal of the metal ions was possible at a lower constant current by extending the electrolysis time. A higher metal ion concentration consumes the adsorption capacity of the flocs formed, leaving fewer flocs accessible for adsorption. Moreover, removal was limited by the formation rate of flocs of iron hydroxide complexes at the anode surface. Figure 4 shows that the lowest removal was observed at the shortest electrolysis times. The present study highlights that both the current and the reaction time play a vital role in the removal efficiency of the EC process. Table 6 compares the results of this study with other studies reported in the literature on metal removal through EC processes.


**Table 6.** Overview of metal ion removal efficiency by EC processes described in the literature.

#### 3.2.3. Multi Response Optimization

For the EC system, the predicted values of the responses (Y1, Y2, and Y3) were 101.50%, 94.452%, and 98.866% under optimal operating conditions. The input variables current and time are the dominant factors in the reaction conditions, which is why the predicted response values are high. After three experimental runs, the average values of Y1, Y2, and Y3 were 99.81%, 99.99%, and 99.98%, respectively. All the experimentally attained values are quite close to the predicted response values and show a good correlation (Table 7).

**Table 7.** EC Optimization of response through RSM.


#### *3.3. Comparison with Chemical Precipitation and Adsorption Processes*

Chemical precipitation is a commonly used treatment process for the removal of heavy metals from industrial wastewater because it is relatively inexpensive and easy to operate. This process involves the precipitation of heavy metals in the form of hydroxides and sulfides. Hydroxide precipitation depends on pH adjustment to basic conditions (pH 9–11) [58]. When quicklime (CaO) is employed as a precipitant, the metal ions dissolved in the solution are precipitated into the insoluble solid phase as metal hydroxides. Adsorption, another common treatment process, is a mass transfer process involving the migration of the metal ions (adsorbate) from the wastewater to a solid surface (adsorbent, commonly CaCO3) and binding through physical (weak van der Waals forces) and chemical (strong covalent bonds) adsorption mechanisms [59,60]. To compare the performance of the NF and EC processes with these routine processes, the present study employed concentration, contact time, and dosing amount as operational variables for the conventional chemical treatment processes employing CaO and CaCO3. Twenty experiments were conducted for each process, and Table 8 provides details of the experimental design.


**Table 8.** Factors and Range of Design for CaO and CaCO3.

In CaO precipitation, the removal efficiency of the process was quite high for Zn, Ni, and Cu ions, as expected and as indicated by the results in Table 9. However, this process also has many demerits. It requires a large amount of chemical precipitant and produces a considerable amount of low-density sludge with poor settling properties, which leads to further dewatering and disposal issues [60,61]. Aggregation of metal precipitates also has long-term environmental impacts. The treated water also has a very high pH (10–12), so it cannot be reused in industrial processes without further treatment.

**Table 9.** Removal efficiency (%) of conventional (CaO and CaCO3) vs. EC and NF processes.


Regarding the CaCO3 adsorption process, the results presented in Table 9 indicate that the removal of Ni ions is not as efficient as that of Cu and Zn ions. In the adsorption process, the generated sludge needs to be separated from the solution and then either regenerated or labelled as hazardous waste, owing to the strong possibility of metal ions leaching into the environment; it therefore requires post-treatment sludge management. Van der Waals forces range from very weak to strong for different adsorbents, due to which the process is unable to deliver promising results [62–64].

The NF process lies between ultrafiltration (UF) and reverse osmosis (RO). Designed to separate contaminants smaller than 10 nm, it emerges as one of the exemplary processes for eliminating dissolved metal ions from wastewater. The main advantages of this process are higher removal efficiency, reliability and easy operation, a smaller space requirement, and a relatively lower energy requirement [60,63,65]. Table 9 shows an outstanding rejection rate for metal ions in this study.

The EC process is also widely recognized as an effective treatment method for eliminating heavy metal ions from industrial wastewater. It does not require any additional chemicals because the electron is a crucial reagent in the process. EC is considered a rapid and well-controlled technique, provides good reduction yield, produces less sludge, has the potential of metal recovery, requires less labor, can save significant energy, and is ecofriendly [57,59]. Table 9 shows an excellent metal ion reduction in the present experimental work.

#### **4. Conclusions**

The present study examines the removal efficiency of heavy metals (Cu, Ni, and Zn) from a mixed aqueous solution in batch mode through a nanofiltration (NF) membrane and an electrocoagulation (EC) process and compares it with conventional chemical treatment processes. Solution pH is seen to significantly affect the removal efficiency in both the NF and EC processes. The highest permeate flux of 59.34 L/m<sup>2</sup>·h was observed at pH 9.5, pressure 25 bar, concentration 25 ppm, and temperature 35 °C in the NF process. The rejection rates of Zn, Ni, and Cu were 95.32%, 94.98%, and 96.93%, respectively. A marked synergistic effect of temperature and pressure was observed, which increased the flux to a high value. The EC process showed a maximum removal of Zn (99.46%), Ni (98.14%), and Cu (99.87%) at the operational conditions of pH 6, time 60 min, concentration 2.5 ppm, and current 1.5 A. The results for the EC process indicated that a lower concentration and an approximately neutral pH helped the system to reach its full potential. Overall, both the NF and EC processes have shown excellent removal for all the studied metal ions, and the outcome of the experiments described above projects them as promising solutions in comparison to conventional chemical treatment approaches.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app12010280/s1, Table S1: Experimental design and responses of the NF process; Table S2: Experimental design and responses of the EC process.

**Author Contributions:** Conceptualization, J.K. and H.J.; methodology, J.K. and H.J.; software, J.K.; formal analysis, J.K. and S.K.M.; investigation, J.K.; resources, H.J. and S.K.M.; writing—original draft preparation, J.K.; writing, review and editing, H.J. and S.K.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** All relevant data are within the manuscript and available from the corresponding author on request.

**Acknowledgments:** This work has been carried out with the Junior/Senior Research Fellowship of the University Grant Commission (UGC), New Delhi, India, under the Ph.D. programme. Authors are highly thankful to the Indian Institute of Technology Roorkee, Roorkee for providing the necessary support and encouragement for bringing out this publication.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Expanded S-Curve Model of Relationship between Domestic Water Usage and Economic Development: A Case Study of Typical Countries**

**Xiaoqian Guo 1,2,\* , Anjian Wang 1,2, Guwang Liu 1,2 and Boyu Du 1,2**


**Abstract:** Domestic water plays a growing role with unprecedented economic development and rising urbanization. The lack of long-term evaluation of domestic water usage trends limits our understanding of the relationship between domestic water usage and economics. Here, we present a pragmatic approach to assess the long-term relationship between domestic water usage and economics through historical data of the last 100 years from 10 typical countries, in order to establish an evaluation method for different economies. The relationship between domestic water usage and GDP per capita was described as an expanded S-curve model, and a mathematical model was derived to simulate this relationship for four typical countries as case studies. The simulation results show that the expanded S-curve of different countries can be calibrated with three key points (takeoff point, turning point, and zero-growth point) and four transitional sections (slow growth, accelerated growth, decelerated growth, and zero/negative growth), corresponding to the same economic development level. In addition, other factors influencing domestic water usage are also discussed in this research, including urbanization, industrial structure, and technical progress. We hope to provide a case study of an expanded S-curve as a foundation for forecasting domestic water usage in different countries or in the same economy at different developmental stages.

**Keywords:** expanded S-curve model; domestic water usage; economic development; mathematical model

#### **1. Introduction**

As an essential resource for human development, water is required throughout the life-cycle processes of all of society. In the context of unprecedented economic development and rising urbanization, water usage (i.e., withdrawal) by humans has increased from 500 km<sup>3</sup> yr<sup>−1</sup> to nearly 4000 km<sup>3</sup> yr<sup>−1</sup> over the last century, with an annual increase rate of 1.5% between 1960 and 2010 [1,2]. This increasing water usage has aggravated water scarcity, affecting more than 2 million people globally. In addition, it is predicted that more than half of the global population will live in regions suffering from at least moderate water shortage by 2050 [3,4].

Among global water usage, the principal user of water is in the agriculture sector, accounting for 70% of total water usage, with the remaining part being attributable to the industrial sector and domestic sectors [5]. The global domestic water demand is projected to see a 130% increase by 2050, which is much faster compared to other water sectors [6]. Therefore, with economic development, domestic water will play a major role in total water usage in the near future. A long-term evaluation of domestic water usage trends could provide references for policymakers, and is emerging as a paramount issue for efficient and sustainable management of water resources [7].

**Citation:** Guo, X.; Wang, A.; Liu, G.; Du, B. Expanded S-Curve Model of Relationship between Domestic Water Usage and Economic Development: A Case Study of Typical Countries. *Appl. Sci.* **2022**, *12*, 6090. https://doi.org/10.3390/app12126090

Academic Editors: Amit Kumar, Santosh Subhash Palmate and Rituraj Shukla

Received: 29 April 2022 Accepted: 11 June 2022 Published: 15 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A series of studies has been conducted on the evaluation of water usage trends, based on different theories and methodologies. These published studies fall into two groups: those focused on drivers and those focused on approaches. The first group investigated water-usage drivers over a short timeframe. Domestic water usage is considered to be related to a series of drivers, including the economy, climate, population, water price, and policies. Zhou et al. quantified socioeconomic drivers, such as urban population and service GVA, to investigate the key drivers of water changes [7,8]. Manouseli et al. considered climate change as a factor affecting domestic water [9,10]. Meng et al. demonstrated a significant relationship between regional GDP, population, and water consumption [11]. Suarez-Varela modeled the linear relationship between water usage and water price [12]. Bijl et al. described GDP per capita, population, and water withdrawal efficiency as synthetic factors for domestic water change [13]. Among these factors, the investigation of long-term drivers was constrained by the lack of continuous data, except for the economic drivers, which could be traced back to 1900 through World Bank data [14]. The second group investigated approaches, which can be divided into two types, namely, single-equation models and hybrid models. Single-equation models include the linear regression model [10], whale optimization algorithm [15], artificial neural network [16], pseudo-panel approach [8], and so on. Hybrid models are driven by macroscale socioeconomic activity to simulate water use in specific regions. They include IMAGE (Integrated Model to Assess the Global Environment) in IMAGE regions [17], QUAIDS (Quadratic Almost Ideal Demand System) models in Spain [12], and IUWM (Integrated Urban Water Management) in Australia [18]. Therefore, a universal and long-term evaluation of water-usage trends is still missing, and a new approach should be introduced.

In this research, we adopted the expanded S-curve model to assess the long-term relationship between water usage and economics, which are represented by domestic water usage per capita (DWPC) and GDP per capita, respectively. The S-curve model has been widely used in mineral and energy resource evaluation. The S-curve pattern was first proposed by the French mathematician Verhulst in 1838 for the description of biological populations [19], and was employed by Wang et al. [20] and Gao et al. [21] to quantify the relationships between economics and energy and steel, respectively. In addition, in the water-usage studies by Zuo [22] and Florke et al. [23], the S-curve pattern was used qualitatively as a country curve. Here, water-usage data from 10 typical countries and regions in the last 100 years are described through the relationship between domestic water usage and GDP per capita, and the key points and transitional sections are identified in the expanded S-curve model. Other factors affecting domestic water usage are also discussed.

#### **2. Material and Methods**

#### *2.1. Data and Key Drivers*

This study first collected a vast amount of publicly available data on domestic water resources from 17 typical countries from 1950 to 2020. The water data in this research come mainly from three sources. Firstly, freely accessible global water data were taken from AQUASTAT, established by the Food and Agriculture Organization of the United Nations (FAO), and from the water databases in the World Bank Open Data [5,14]. It should be noted that national water-use records were conducted every 5 years or longer, and most of the water records could only be traced to 1960 or later. Secondly, a few of the detailed water-use categories were collected from national statistical offices, such as the U.S. Geological Survey (USGS), Eurostat, and the German Association of Energy and Water Industries [24–26]. Thirdly, a series of published literature reviews and statistical surveys about water use were consulted. Gleick [27] and Shiklomanov [28] tried to conduct an adequate data survey for the World's Water Report and USA water data. Florke et al. used the WaterGAP 3 model for back-calculating water-use data on a global scale [23]. However, many historical records on domestic water use were incomplete or discontinuous, as shown in Figure 1 for the primary selection of 17 countries. The relationship between water usage and economic development is shown as the domestic water per capita (DWPC) and GDP per capita to offset the regional disparity.

**Figure 1.** Relationship between GDP per capita and DWPC for (**a**) South Africa and Brazil; (**b**) Spain, Poland, Greece, Romania, and Mexico; and (**c**) all 17 countries.

Figure 1 shows the collected domestic water data from 17 countries, including Japan (JP), China (CN), the United States (USA), Spain (ES), France (FR), the United Kingdom (UK), Poland (PL), India (IND), Indonesia (ID), South Africa (ZA), South Korea (KR), Greece (GR), Romania (RO), Germany (GER), Brazil (BR), Mexico (MX), and Canada (CA). However, several countries showed poor data availability. South Africa and Brazil, as shown in Figure 1a, showed a C-type curve, meaning that the water usage was reduced during economic recession and increased during economic recovery. Spain, Poland, Greece, Romania, and Mexico showed an entangled type, showing that the water usage drastically changed during economic transition. These 7 countries with cluttered data were excluded and the remaining 10 countries were collected as our research objects.

To obtain consistent and long-term DWPC data, the relationship between urbanization rate and DWPC was derived first, and changes in water-usage intensity were expressed through changes in the urbanization rate, based on the observation that, as the urbanization rate increases, water users in a more urban population trend toward a more water-intensive lifestyle. After the maximum level was reached, DWPC was either stable or declined with the increasing urbanization rate. The relationship between urbanization rate and DWPC is shown in Figure 2. Instead of using solely regional curves to estimate past DWPC, the current data model version was derived for 9 countries. Germany was an exception due to its lower DWPC, so the DWPC at urbanization rates between 20% and 60% in Germany was derived from its original data. Where data were missing at urbanization rates between 20% and 60%, the available information was combined in order to allow for the fitting of a simulated curve to the historical data, as shown in Figure 2.

**Figure 2.** Relationship between urbanization and DWPC.

The urbanization data could be traced back to 1900 and the relationship between urbanization and DWPC was simulated as three different models in Table 1, with the R-squared above 0.85. In addition, the corresponding models for applicable countries were listed according to the trends of the curves after urbanization above 60%, as shown in Table 1. When the urbanization was acquired, the DWPC could be derived with corresponding models.

**Table 1.** Simulated models between urbanization and DWPC.


Therefore, the long-term trends between DWPC and GDP per capita during 1900 to 2020 are shown in Figure 3 for 10 countries. It should be noted that the DWPC shown in Figure 3 is adjusted data, which were derived from the urbanization. Consequently, the DWPC is somewhat higher or lower than the observed data.

**Figure 3.** Relationship between GDP per capita and DWPC for 10 typical countries.

#### *2.2. Mathematical Modeling of the Expanded S-Curve*

The expanded S-curve model illustrating the relationship between DWPC and GDP per capita offers a tool to identify critical transitions from one stable state to another during economic development. A mathematical technique is employed to describe the expanded S-curve model. According to the expanded S-curve in previous studies [20,21,29], the relationship between DWPC (*W*) and GDP per capita (*G*) can be expressed as follows:

$$W - W\_i = A \frac{\exp[\alpha\_1 (G - G\_i)] - \exp[-\alpha\_3 (G - G\_i)]}{2 \cosh[\alpha\_2 (G - G\_i)]} \tag{1}$$

where *α*1, *α*2, and *α*3 are the exponential constants, and *A* is the amplitude of the equation. *Wi* and *Gi* are the values of DWPC and GDP per capita, respectively, at the turning point of the expanded S-curve. Equation (1) has the form of a generalized hyperbolic tangent function.
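A minimal numerical sketch of Equation (1) is given below. The parameter values are hypothetical and chosen only to show the shape of the curve; the function name is illustrative. With *α*1 = *α*2 = *α*3 = *α*, Equation (1) reduces to *W* = *Wi* + *A* tanh[*α*(*G* − *Gi*)], which makes the hyperbolic tangent character of the model explicit.

```python
import math


def expanded_s_curve(g, w_i, g_i, amplitude, a1, a2, a3):
    """DWPC (W) at GDP per capita g, following Equation (1)."""
    x = g - g_i
    numerator = math.exp(a1 * x) - math.exp(-a3 * x)
    return w_i + amplitude * numerator / (2.0 * math.cosh(a2 * x))


# Hypothetical parameters (not fitted to any country in this study).
params = dict(w_i=100.0, g_i=10_000.0, amplitude=50.0,
              a1=1e-3, a2=1e-3, a3=1e-3)

at_turning_point = expanded_s_curve(10_000.0, **params)  # equals w_i exactly
far_below = expanded_s_curve(0.0, **params)              # approaches w_i - A
far_above = expanded_s_curve(20_000.0, **params)         # approaches w_i + A
```

In this symmetric case the curve saturates at *Wi* ± *A*. Choosing *α*1 different from *α*2 tilts the upper plateau upward or downward, which is what allows the expanded model to represent the zero/negative growth section.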

Then, the linearity changes before the takeoff point, around the turning point, and after the zero-growth point are derived from Equation (1) as Equations (2)–(4).

$$W - W\_i = A + A(\alpha\_2 - \alpha\_3)(G - G\_i) = A + \rho\_l(G - G\_i) \tag{2}$$

$$W - W\_i = 0.5A(\alpha\_1 + \alpha\_3)(G - G\_i) = \rho\_i(G - G\_i) \tag{3}$$

$$W - W\_i = A + A(\alpha\_1 - \alpha\_2)(G - G\_i) = A + \rho\_v(G - G\_i) \tag{4}$$

where *ρl*, *ρi*, and *ρv* are the slopes of the curve before the takeoff point, around the turning point, and after the zero-growth point, respectively. The exponential constants can then be calculated from the slopes through the system of Equations (5)–(7):

$$\alpha\_1 = \frac{\rho\_l + 2\rho\_i + \rho\_v}{2A} \tag{5}$$

$$\alpha\_2 = \frac{\rho\_l + 2\rho\_i - \rho\_v}{2A} \tag{6}$$

$$\alpha\_3 = \frac{-\rho\_l + 2\rho\_i - \rho\_v}{2A} \tag{7}$$
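The consistency of Equations (5)–(7) with the slope definitions in Equations (2)–(4) can be checked numerically. The sketch below computes the three exponents from a set of slopes and an amplitude, then recovers the slopes again; the numerical values and the function name are hypothetical and serve only as an illustration.

```python
def alphas_from_slopes(rho_l, rho_i, rho_v, amplitude):
    """Equations (5)-(7): exponents from the three limiting slopes and A."""
    a1 = (rho_l + 2 * rho_i + rho_v) / (2 * amplitude)
    a2 = (rho_l + 2 * rho_i - rho_v) / (2 * amplitude)
    a3 = (-rho_l + 2 * rho_i - rho_v) / (2 * amplitude)
    return a1, a2, a3


# Hypothetical slopes (DWPC units per USD) and amplitude.
rho_l, rho_i, rho_v, A = 0.001, 0.010, -0.002, 100.0
a1, a2, a3 = alphas_from_slopes(rho_l, rho_i, rho_v, A)

# Round trip through the slope definitions used in Equations (2)-(4).
back_l = A * (a2 - a3)        # slope before the takeoff point
back_i = 0.5 * A * (a1 + a3)  # slope around the turning point
back_v = A * (a1 - a2)        # slope after the zero-growth point
```

The round trip returns the original slopes, which confirms that Equations (5)–(7) are simply the inverse of the three slope relations.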

The first-order derivative of Equation (1) equals 0 at the zero-growth point of the S-curve (*G* = *Gv*):

$$\tanh[\alpha\_1(G\_v - G\_i)] \tanh[\alpha\_2(G\_v - G\_i)] = \alpha\_1 \alpha\_2^{-1}, \quad \frac{dW}{dG} = 0 \tag{8}$$

By substituting Equations (5) and (6) into Equation (8), Equation (9) can be obtained

$$\tanh\left(\varphi\_1 A^{-1}\right)\tanh\left(\varphi\_2 A^{-1}\right) = \varphi\_3\tag{9}$$

where

$$
\varphi\_1 = 0.5(\rho\_l + 2\rho\_i + \rho\_v)(G\_v - G\_i) \tag{10}
$$

$$
\varphi\_2 = 0.5(\rho\_l + 2\rho\_i - \rho\_v)(G\_v - G\_i) \tag{11}
$$

$$\varphi\_3 = \frac{\rho\_l + 2\rho\_i + \rho\_v}{\rho\_l + 2\rho\_i - \rho\_v} \tag{12}$$

In summary, *Wi*, *Gi*, *ρl*, *ρi*, and *ρv* were obtained from the research data, and *A*, *α*1, *α*2, and *α*3 were derived from the equations above.
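Equation (8), combined with Equations (5) and (6), leaves the amplitude *A* as the only unknown once the slopes and the distance *Gv* − *Gi* are known, but it has no closed-form solution. The sketch below finds *A* by bisection. This is one possible numerical approach (the text does not state which solver was used), and the function names, bracket limits, and input values are all hypothetical.

```python
import math

def residual(A, rho_l, rho_i, rho_v, dG):
    """Left side minus right side of Equation (8), with alpha1 and alpha2
    taken from Equations (5) and (6). dG stands for G_v - G_i."""
    alpha1 = (rho_l + 2 * rho_i + rho_v) / (2 * A)
    alpha2 = (rho_l + 2 * rho_i - rho_v) / (2 * A)
    return math.tanh(alpha1 * dG) * math.tanh(alpha2 * dG) - alpha1 / alpha2

def solve_amplitude(rho_l, rho_i, rho_v, dG, lo=1e-3, hi=1e6, tol=1e-10):
    """Bisection on A; assumes the residual changes sign on [lo, hi]."""
    f_lo = residual(lo, rho_l, rho_i, rho_v, dG)
    f_hi = residual(hi, rho_l, rho_i, rho_v, dG)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change: widen the bracket or check the slopes")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid, rho_l, rho_i, rho_v, dG)
        if f_lo * f_mid <= 0:
            hi = mid          # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)

# Hypothetical slopes (m3 per USD) and distance between the zero-growth
# point and the turning point (USD).
A = solve_amplitude(rho_l=0.001, rho_i=0.010, rho_v=-0.002, dG=8000.0)
print(round(A, 2))
```

Bisection is used here only because the residual is monotonic in *A* for this kind of input (positive *ρl* and *ρi*, negative *ρv*), so a single sign change is easy to bracket; any standard root finder would serve equally well.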

#### **3. Results**

#### *3.1. Expanded S-Curve in Typical Developed Countries*

Using the general form in Equation (1), the expanded S-curve equations of the DWPC were established for four typical countries, namely the US, Japan, the UK, and France, as Equations (13)–(16), respectively.

(1) US

$$W = 170 + 181 \times \frac{\exp[0.000311 \times (G - 11,500)] - \exp[-0.0000872(G - 11,500)]}{2 \cosh[0.000088(G - 11,500)]} \tag{13}$$

(2) Japan

$$W = 95 + 90 \times \frac{\exp[0.0000338 \times (G - 11,000)] - \exp[-0.0000668(G - 11,000)]}{2 \cosh[0.0000673(G - 11,000)]} \tag{14}$$

(3) UK

$$W = 120 + 30 \times \frac{\exp[0.000395 \times (G - 13,000)] - \exp[-0.000311(G - 13,000)]}{2 \cosh[0.000395(G - 13,000)]} \tag{15}$$

(4) France

$$W = 90 + 35 \times \frac{\exp[0.00000763 \times (G - 12,000)] - \exp[-0.000825(G - 12,000)]}{2 \cosh[0.000178(G - 12,000)]} \tag{16}$$

Figure 4 gives the expanded S-curve simulation for these four typical countries.

**Figure 4.** Expanded S-curve simulation of DWPC and GDP for (**a**) the United States, (**b**) Japan, (**c**) France, and (**d**) the United Kingdom.
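To illustrate how one of the fitted curves is evaluated, the sketch below codes Equation (14) for Japan and prints the simulated DWPC at a few GDP per capita levels. The coefficients are transcribed from Equation (14); the function name and the choice of sample GDP values are assumptions made for this example.

```python
import math

# Parameters transcribed from Equation (14) for Japan.
W_I, AMPLITUDE, G_I = 95.0, 90.0, 11000.0
ALPHA1, ALPHA2, ALPHA3 = 0.0000338, 0.0000673, 0.0000668

def dwpc_japan(G):
    """Evaluate Equation (14): DWPC (m3) at GDP per capita G (USD)."""
    x = G - G_I
    numerator = math.exp(ALPHA1 * x) - math.exp(-ALPHA3 * x)
    return W_I + AMPLITUDE * numerator / (2.0 * math.cosh(ALPHA2 * x))

# Sample GDP per capita levels (USD), chosen only for illustration.
for G in (5000, 11000, 15000, 20000, 25000):
    print(G, round(dwpc_japan(G), 1))
```

With these coefficients the curve passes through 95 m<sup>3</sup> at the turning point and gives roughly 125 m<sup>3</sup> at USD 20,000, which is close to the peak of approximately 120 m<sup>3</sup> reported for Japan.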

The changing trajectories of DWPC with GDP per capita in these countries generally experienced three stages.

For the US in Figure 4a, the first stage lasted until the GDP per capita reached USD 6500 in 1930, during the Great Depression, after which the economy entered a special period with a winding curve until World War II. The DWPC remained flat during the first stage. In the second stage, the DWPC in the US kept growing, reaching a high level of 230 m<sup>3</sup> in 1990, with a GDP per capita of USD 20,000. After the GDP per capita reached USD 35,000 in 2000, the DWPC started to decrease owing to technological improvements, such as the wide use of dishwashers and water-saving toilets. These efficiency improvements dramatically reduced water usage.

Japan, in Figure 4b, showed a similar evolution pattern. The DWPC decreased distinctly during World War II in the 1940s, with a GDP per capita of around USD 2500, dropping to 32 m<sup>3</sup> at an annual rate of decrease of 7%. Then, with post-war reconstruction in 1950, at a GDP per capita of around USD 3500, the economic development model enabled Japan to enter a period of rapid urbanization. From 1990 to 2000, after 40 years of linear growth, the DWPC in Japan peaked at approximately 120 m<sup>3</sup> with a GDP per capita of USD 20,000. After 2000, more efficient appliances and fixtures contributed to significant reductions in the DWPC in Japan, mirroring the reduction trend in the US.

France's and Britain's economic development was similar to that of the US and Japan in the first stage, as shown in Figure 4c,d. Both economies were stagnant for a long time after World War I and World War II, before the 1950s. Meanwhile, the DWPC showed stationary trends until 1950. After that, the DWPC in these two countries developed differently. For France, the DWPC stayed around 65 m<sup>3</sup> until 1970, with a GDP per capita of USD 11,000. It then started to increase slightly owing to the soaring urbanization rate. In the third stage, the DWPC in France declined from 106 m<sup>3</sup> after 2000, with a GDP per capita of USD 20,000, owing to the large-scale application of water-saving machines. The DWPC variations in the UK were directly related to the evolution of water bureau management, which was further propelled by urbanization. In the late 1960s, with a GDP per capita above USD 10,000, the DWPC increased from 113 m<sup>3</sup> with the increasing urbanization rate and population, and the water management framework in the UK was optimized to improve water efficiency. From 1970 to 1990, with a GDP per capita of between USD 11,000 and USD 18,000, the DWPC in the UK increased from 115 m<sup>3</sup> to 138 m<sup>3</sup>. In the 1980s, amid economic stagflation, the UK government published the Government White Paper on Privatization of Water Industry in 1986, and the top 10 water industries in the UK completed privatization in 1989, which led to an increase in water prices and a decrease in the DWPC in the 1990s, with a GDP per capita above USD 18,000 [30].

#### *3.2. Implication of the Expanded S-Curve Model*

According to the correlation analysis of the increase in DWPC and GDP per capita in Sections 2.2 and 3.1, we can conclude that the expanded S-curve can be calibrated with three key points: the takeoff point, the turning point, and the zero-growth point. The long-term DWPC trends with GDP per capita were also divided into four stages according to the growth-rate transitions: slow growth, accelerated growth, decelerated growth, and zero/negative growth. Figure 5 shows the key points and stages of the S-curve, and the points for each country are summarized in Table 2.


**Table 2.** Key points of expanded S-curve for each country (1990 GK in USD).

The takeoff point is the starting point of accelerated growth in DWPC, in the range of USD 1500–5000, implying a transition from an agricultural society to an industrial society accompanied by an economic boom. Before the takeoff point, the DWPC was in the slow-growth stage. The takeoff points for developed economies, such as the UK, the USA, France, and South Korea, occurred after USD 3000, whereas those for developing economies, such as India and China, occurred between USD 1500 and USD 2500. The turning point marked an adjustment period of the industrial structure in the process of industrialization, at a GDP per capita of USD 10,000–USD 13,000, with little variation among the studied countries. After the turning point, the growth rate of the DWPC transitioned from accelerated growth to decelerated growth until the zero-growth point. The zero-growth points were concentrated around USD 17,000–USD 22,000, which indicates that the DWPC entered a zero-growth or slow-decline stage. This is also consistent with the post-industrial stage [20], when living standards improved substantially and promoted the technical progress of water-saving facilities.

**Figure 5.** Three key points and four stages with different growth rates of the expanded S-curve.
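The three key points partition the GDP per capita axis into the four stages described above. A minimal sketch of that mapping is given below; the function name and the key-point values in the usage line are hypothetical (picked from within the reported ranges) and are not taken from Table 2.

```python
from bisect import bisect_right

STAGES = (
    "slow growth",
    "accelerated growth",
    "decelerated growth",
    "zero/negative growth",
)

def growth_stage(gdp_per_capita, takeoff, turning, zero_growth):
    """Return the S-curve stage for a GDP per capita, given the three key points."""
    if not takeoff < turning < zero_growth:
        raise ValueError("key points must satisfy takeoff < turning < zero_growth")
    # bisect_right counts how many key points have already been reached.
    return STAGES[bisect_right((takeoff, turning, zero_growth), gdp_per_capita)]

# Hypothetical key points (USD) within the ranges reported in the text.
print(growth_stage(15000, takeoff=3000, turning=12000, zero_growth=20000))
```

In this sketch, an economy sitting exactly on a key point is assigned to the stage that begins there; that boundary convention is an assumption of the example.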

#### **4. Causes for the Changes in Domestic Water Usage**

The expanded S-curve model describes the effect of economic development on domestic water usage; however, domestic water-usage changes are also closely connected with industrial structure, urbanization, and scientific–technical progress, which are themselves promoted by economic development. The relationships between tertiary industry proportion, urbanization, DWPC, and GDP per capita for these 10 typical countries are summarized in Figure 6. The tertiary industry proportion in Figure 6a and urbanization in Figure 6b both show a similar regularity with GDP per capita, with accelerated growth before a GDP per capita of USD 10,000, decelerated growth between a GDP per capita of USD 10,000 and USD 13,000, and a peak value around a GDP per capita of USD 20,000, which is consistent with the DWPC in Figure 6c.

**Figure 6.** Relationships between GDP per capita and (**a**) tertiary industry proportion, (**b**) urbanization, and (**c**) DWPC.

In order to clearly characterize the relationship between these factors, a schematic diagram was established in Figure 7. The three key points of the expanded S-curve were annotated, and the corresponding turning points of tertiary industry proportion and urbanization were also calibrated to compare the corresponding relations.

#### *4.1. Urbanization*

The urbanization rate represents the population structure, which has a significant effect on the DWPC. According to previous studies, water users in a more urbanized population initially trend toward a more water-intensive lifestyle as the urbanization rate increases [13,31,32]. Chen et al. tested the role of the urbanization factor in promoting the DWPC using LMDI [33,34]. The urbanization rate and the DWPC showed a positive relationship with matching key points, as seen in Figures 1 and 7. The evolution of the urbanization rate can be divided into three stages: it grew rapidly from 20% to 70% before the GDP per capita reached USD 10,000, and then its growth decelerated until urbanization reached 80% and the GDP per capita reached USD 20,000. After this stage, urbanization was generally saturated, with a stable trend. The turning point for DWPC also occurred between USD 10,000 and USD 13,000, as seen in Figure 7, which is consistent with urbanization. Therefore, we conclude that urbanization can accelerate domestic water usage.

**Figure 7.** Relationship between key points of expanded S-curve and important indicators of economic and social development.

#### *4.2. Industrial Structure*

Domestic water is used in the tertiary industry for urban households, rural households, and commercial services [5,14,24]. As the tertiary industry proportion increases, urban infrastructure and commercial services are enhanced with economic development, resulting in an increasing trend in the DWPC. During the industrialization and post-industrial eras, the tertiary industry proportion increased linearly. This rapid increase continued until a GDP per capita of USD 10,000. Then, it entered a slowly increasing stage between a GDP per capita of USD 10,000 and USD 15,000. The peak of the tertiary industry proportion was around USD 20,000, coinciding with the zero-growth point of the expanded S-curve of DWPC in Figures 6 and 7.

#### *4.3. Technical Progress*

Technical progress is usually promoted by economic development, and it can be considered a main driver of water-usage change [35]. Figure 8 shows the indoor DWPC subsectors for the US and Japan. The indoor DWPC can be divided into five parts; showers accounted for the greatest share in the US (Figure 8a), and toilets accounted for the greatest share in Japan (Figure 8b). The shares for kitchens (mainly dishwashers), clothes washers, and other uses were also substantial. Between 1990 and 2016, there was a statistically significant reduction in DWPC for toilets in both the US and Japan. According to previous studies, these declines can be explained by the wide use of water-saving toilets. In addition, the increasing efficiency of fixtures and appliances significantly reduced the water used by clothes washers and kitchens. Therefore, the reduction in DWPC after 2000 could, in most cases, be partly attributed to technological progress.

**Figure 8.** DWPC subsector proportions in (**a**) the US and (**b**) Japan.

#### **5. Conclusions**

In this paper, we focused on the evaluation of long-term domestic water-usage trends with economic development in typical countries. The relationship between domestic water usage and socio-economic development on a country scale for the period from 1900 to 2020 in 10 typical countries was demonstrated. The simulation results show that, with the growth in GDP per capita, domestic water per capita followed an expanded S-curve of 'slow growth–rapid growth–zero growth, or even negative growth', with three key points: the takeoff point, the turning point, and the zero-growth point. The takeoff point of the expanded S-curve was located at a GDP per capita of USD 1500–USD 5000, according to the different development levels. The turning point was located at a GDP per capita of USD 10,000–USD 13,000, and the zero-growth point was concentrated around a GDP per capita of USD 17,000–USD 22,000, consistent with the post-industrial stage. In addition, urbanization was shown to accelerate domestic water usage, and a higher tertiary industry proportion of GDP enhanced the domestic water-usage trends. The decrease in water usage was attributed to technological progress, with the widespread use of water-saving appliances.

The results of this research show that the expanded S-curve is applicable to the relationship between domestic water usage and economic development on a country scale. We hope this conclusion can contribute to the development of future solutions and strategies for domestic water prediction in different economies or in similar economies at different development stages. However, much of the application of expanded S-curve models remains uncharted territory. In this paper, only four of the 10 typical countries were simulated in detail, and studies on countries with scarce water-usage data are still lacking. We will apply this expanded S-curve model to other countries in our future research, and hope that this model will encourage the efficient and sustainable management of water resources.

**Author Contributions:** Conceptualization, X.G. and A.W.; methodology X.G. and G.L.; validation and data curation, X.G. and B.D.; writing—original draft preparation, X.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant numbers 72088101, 71991485, and 71991480. The APC was funded by grant number 72088101.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Prediction of the Long-Term Performance Based on the Seepage-Stress-Damage Coupling Theory: A Case in South-to-North Water Diversion Project in China**

**Xinyong Xu 1,2,\*, Wenjie Xu 1, Chenlong Xie <sup>3</sup> and Mohd Yawar Ali Khan <sup>4</sup>**


**Abstract:** The South-to-North Water Diversion Project has been in operation since 2014, directly benefiting more than 79 million people in China. Thus, its service life and long-term performance have gained much attention from scholars. To predict its life and performance, this study used the seepage/stress-damage coupling method. In addition, a seepage/stress-damage coupling theory was proposed and a finite element model of a deep excavated canal in the Xichuan Section of the South-to-North Water Diversion Project was established. The results showed that this canal subsided greatly in the first two years of operation, which can be confirmed by the monitoring data. It is predicted that, after 50 years of normal operation, the canal damage may start and spread from the water level, and reach 37.6%, but such damage will not affect its normal water delivery function. The purpose of this study is to provide guidance for the safe operation of the project.

**Keywords:** settlement; damage evolution; seepage/stress-damage method; data monitoring

#### **1. Introduction**

The South-to-North Water Diversion Project (SNWDP) aims to optimize the temporal and spatial allocation of water resources in China. As a national strategic project, it safeguards China's land management and sustainable development. Canal engineering is an integral part of the SNWDP, and its seepage failure involves complicated hydraulic problems, particularly in some deep excavated sections, due to the high groundwater level, complex geological conditions, soil consolidation and deformation, and rainfall or channel infiltration [1]. Seepage–stress coupling may occur between the concrete lining and the foundation, damaging the lining plate. If the damage persists, the water from the canal will seep into the soil of the canal more quickly, altering the seepage field and causing structural damage between the soil of the canal, the concrete lining, and the seepage field [2]. Therefore, scholars at home and abroad are concerned about the SNWDP's service life and performance evolution in long-term operation, because it matters to water delivery safety and further affects people's living conditions, social and economic development, and environmental protection [3]. The Xichuan Section is the first section of the main channel of the SNWDP Middle Route and is classified as a Class I project according to the engineering grade, so its safe operation is of great significance.

**Citation:** Xu, X.; Xu, W.; Xie, C.; Khan, M.Y.A. Prediction of the Long-Term Performance Based on the Seepage-Stress-Damage Coupling Theory: A Case in South-to-North Water Diversion Project in China. *Appl. Sci.* **2021**, *11*, 11413. https://doi.org/10.3390/app112311413

Academic Editor: Amit Kumar

Received: 9 October 2021 Accepted: 29 November 2021 Published: 2 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A seepage/stress-damage (SSD) coupling theory was proposed, and a finite element model of a deep excavated canal in the SNWDP Xichuan Section was established in the same scale as its actual design drawings. Even the materials used and the surrounding
environment were the same as the actual situation. First, the SSD coupling theory proposed in this paper needs a stable seepage field, on which a lot of research has been performed. For example, Cai et al. [4] used the strength reduction method to establish a seepage–stress coupled numerical model. They discussed the impact of various factors on slope stability under different working conditions and concluded that groundwater and rainfall infiltration have the most considerable effect on slope stability. In southern Jiangxi province, Pan et al. [5] investigated the failure modes of granite residual soil slopes and employed normal soil material parameters to analyze precipitation infiltration under three operating situations. They discovered that the wetting front's depth and slope coefficient vary over time. Zhou et al. [6] studied the spatial–temporal characteristics of water movement on fractured soil slopes under rainfall conditions. They investigated the mechanism of fractured slope instability as a function of soil saturation variations and discovered that matrix suction is the primary driver of overall instability. Huang et al. [7] analyzed the stability of hydraulic landslides with different permeability coefficients under fluctuating reservoir water levels and rainstorm conditions. They found that, due to heavy rains, the stability of the landslide was considerably reduced, and the coefficient of stability increased with increased permeability. Kim et al. [8] analyzed and compared water pressure and pore pressure data from hydraulic wells to observe the influence of seepage changes. Luo et al. [9] analyzed the sudden pipeline crash of a specific project and derived its evolution process. Zhao et al. [10] proposed an innovative permeability evolution equation. They found that the seepage pressure will continuously intensify fracture propagation and penetration in the rock mass due to the time effect of permeability and failure. Nian et al.
[11] used pore pressure as a controlled condition to analyze the rainfall infiltration and seepage on slopes under different rainfall intensities. They obtained the relationship between the rainfall intensity and the actual infiltration rate. The above research results showed that the permeability coefficient will change with time, rainfall intensity and other external factors, which can be verified by the SSD theory (taking the permeability coefficient as a variable) in this study.

Second, the SSD coupling method used in this paper needs the coupling of the seepage field and the stress field, which is mainly achieved by solving the seepage field and converting it into an equivalent load acting on the model nodes. Much research on seepage–stress coupling has been conducted at national and global scales. For example, Wang et al. [12] established a theoretical model of micro-fracture grouting seepage based on the fluid–solid coupling between grout seepage and micro-fractures. They studied the fracturing conditions, the fractures' spatial distribution, and the variation law of mud seepage distance. Through analysis and comparison, Ma et al. [13] obtained the failure mode and seepage characteristics of unloaded rock with and without water pressure. Ma et al. [14] used the finite-difference method to analyze the influence of saturated or unsaturated seepage on slope stability. They obtained the influence of the flow rate on the stability of the slope. Liu et al. [15] found that the external water pressure on the tunnel is related to the basement and its seepage and is the main influencing factor related to safety. Xiao [16] used the neural network method to develop a program that combines the seismic load effect and the fluid–solid coupling effect. He analyzed the seepage stability of the earth dam and determined the dangerous sliding surface of the dam slope. Chen [17] gave a method of calculating the safety factor of slope stability considering seepage conditions based on the law of seepage–stress coupling evolution. He found that seepage has a significant impact on slope stability. Cai et al. [18] evaluated the slope stability under rainfall infiltration conditions based on the shear strength reduction technology. They considered the non-coupled conditions of seepage and deformation, combined with statistics and observation methods. Rahardjo et al.
[19] studied the factors influencing slope stability under rainfall infiltration conditions. They found that slope instability mainly depends on the rainfall intensity and the nature of the soil, as well as slope type and groundwater level. Baum et al. [20] established saturated and unsaturated transient rainfall infiltration models. Based on correlated groundwater transients, unsaturated infiltration analysis, and groundwater pressure diffusion, the models predicted the time and main source areas of landslides caused by rainfall. Rahardjo et al. [21] considered different groundwater levels, rainfall intensity, and soil properties to analyze the stability of the residual soil slope under rainfall infiltration conditions. The results are in good agreement with the research trend in the parameter study. Muntohar et al. [22] analyzed the failure laws of shallow slopes under rainfall infiltration conditions based on the Green–Ampt infiltration model and the infinite slope stability model. The proposed model can be used to estimate the first-order approximation of the time when a rainfall-induced shallow landslide occurs and its sliding depth. Tsai et al. [23] compared the design plan with actual case data. They investigated the influence of unit weight and the function of unsaturated shear strength and saturation on shallow landslides triggered by rainfall infiltration. Borja et al. [24] established a finite element model that couples solid deformation with fluid pressure in unsaturated soil to evaluate slope stability. However, most of the above research was conducted by using the seepage method or the seepage–stress coupling method. Based on these results, the SSD coupling method was adopted in this paper.

Third, the SSD coupling method used in this paper correlates seepage–stress with damage to reflect the impact of damage on the seepage of concrete linings. The relationship between the seepage coefficient and damage was used as a bridge connecting the seepage field, stress field, and damage field. Some scholars have also explored such methods. For example, Zhou [25] derived the permeability coefficient conversion equation taking into account the damage to the tunnel rock and depicted the SSD multi-field coupling model of the surrounding rock. He analyzed the stability of the surrounding rock excavated in the tunnel construction based on the fluid–solid coupling theory. Zhou et al. [26] established an SSD coupling algorithm based on the permeable lining theory and applied it to high-pressure hydraulic tunnels. Their results are consistent with the general engineering laws and provide a reference for solving practical engineering problems. Sheng et al. [27] believe that the influence of groundwater on slope stability cannot be replaced by pore water, and the synergy of the seepage field and the stress field must be considered in foundation pit slope engineering. Xu et al. [28] established the equation relating rock failure and the coefficient of permeability based on damage variables and seepage–stress coupling. They described the evolution of the rock-failure-based permeability and the groundwater seepage field. They studied the evolutionary relationship between rock mass stress and strain, permeability and strain, strain and failure, as well as permeability and failure. Zhu et al. [29] coupled failure and fluid flow with the Mohr–Coulomb failure criterion and, based on the dynamic evolution of damage, porosity, and permeability, proposed SSD models under the effect of hydraulic fracturing and natural fracturing. The results are very close to the engineering practices.
The above research findings and the SSD coupling method used in this paper make the predictions more realistic.

Aiming at the above problems, this study used the SSD coupling method to predict the SNWDP's service life and performance evolution after long-term operation. Taking into account the change in the permeability coefficient induced by soil consolidation over time and the evolution of the infiltration field and its performance after long-term operation, it is necessary to discover the internal mechanism of the seepage failure and further explore the long-term changes in the performance of typical deep excavated sections of the SNWDP. The ultimate goal of this study is to predict the performance evolution of the deep excavated canal in the SNWDP Xichuan Section after long-term operation, providing a theoretical basis for the actual operation of the project.

#### **2. Basic Theory and Realization Method of SSD Coupling**

#### *2.1. Basic Theory of SSD Coupling*

Changes in the seepage–stress coupling environment can cause changes in the internal microstructure (i.e., meso-damage) [30], macro-mechanical properties, and permeability of concrete. Changes in permeability and mechanical properties can affect the concrete's stress state and the distribution of soil pore pressure, and worsen the meso-damage of concrete. This phenomenon is called SSD coupling.

Conventional seepage–stress coupled governing equations include solid-based geometric equations and equilibrium equations, fluid-based mass conservation equations and flow equations, and seepage–stress coupled constitutive equations [31]. The SSD coupled governing equations can be obtained by introducing concrete damage variables into the conventional seepage–stress coupled equations. The following derivation techniques are explained by using direct tensor notation to simplify the theoretical formulations mathematically [32].

Assuming that the seepage process follows the nonlinear Darcy's law in the entire section, water and materials are incompressible, and the volumetric deformation of the saturated porous solid framework is equal to the deformation of the pores, then, the seepage field conforms to the continuity equation of three-dimensional single-phase porous fluid [33,34].

$$\frac{\partial}{\partial x}\left[k\_x \frac{\partial H}{\partial x}\right] + \frac{\partial}{\partial y}\left[k\_y \frac{\partial H}{\partial y}\right] + \frac{\partial}{\partial z}\left[k\_z \frac{\partial H}{\partial z}\right] + Q = 0 \tag{1}$$

where *kx*, *ky*, and *kz* are the permeability coefficients in the *x*, *y*, and *z* directions, respectively; the hydraulic potential is *H* = *p*/*γ* + *z*, in which *p* is the pore water pressure, *γ* is the unit weight of water, and *z* is the elevation head; and *Q* is the source–sink term.

Assuming that concrete and rock masses are equivalent continuum models, then, after finite element discretization, interpolation, and integration, the matrix equation for solving the seepage field can be obtained as follows [35]:

$$[A]\{H\} = \{F\} \tag{2}$$

where [*A*] is the total permeability matrix, {*H*} is the column vector of the node head, and {*F*} is the nodal load obtained by integrating the seepage boundary. After the seepage field is calculated, the water load generated by the hydraulic gradient acts on the inside of the structure in the form of seepage force. In the equivalent continuum model, the seepage gradient acts on the node in the form of seepage force. After the node head is obtained through the seepage field calculation, the seepage load acting on the element node is calculated as follows:

$$\{\mathbf{F}\_p\} = -\iiint\_{\Omega} \gamma [\mathbf{N}]^T \left\{\frac{\partial H}{\partial x}, \frac{\partial H}{\partial y}, \frac{\partial H}{\partial z} - 1\right\}^T d\Omega \tag{3}$$

where [*N*] is the interpolation function, **Ω** is the integration domain of the seepage force, and the superscript *T* denotes the matrix transpose.
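To make the sequence in Equations (2) and (3) concrete, the sketch below solves a steady one-dimensional seepage problem for a vertical column with linear elements and then converts the computed heads into nodal seepage loads. It is a deliberately simplified illustration under stated assumptions (one dimension, equal element lengths, fixed heads at both ends, no source term, and hypothetical permeabilities); it is not the three-dimensional canal model used in this study.

```python
GAMMA_W = 9.81  # unit weight of water, kN/m^3

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def seepage_column(k, length, head_bottom, head_top):
    """Assemble and solve [A]{H} = {F} (Equation (2)) for a vertical column,
    then evaluate the vertical part of Equation (3) element by element.
    k[e] is the permeability of element e, numbered from the bottom up."""
    n_el = len(k)
    le = length / n_el
    n = n_el + 1
    A = [[0.0] * n for _ in range(n)]
    F = [0.0] * n
    for e, ke in enumerate(k):
        c = ke / le  # element conductance
        A[e][e] += c
        A[e][e + 1] -= c
        A[e + 1][e] -= c
        A[e + 1][e + 1] += c
    # Impose the prescribed heads by replacing the boundary rows.
    for node, value in ((0, head_bottom), (n - 1, head_top)):
        A[node] = [0.0] * n
        A[node][node] = 1.0
        F[node] = value
    H = solve_dense(A, F)
    # Each element passes half of -gamma * (dH/dz - 1) * le to each of its nodes.
    Fp = [0.0] * n
    for e in range(n_el):
        grad = (H[e + 1] - H[e]) / le
        load = -GAMMA_W * (grad - 1.0) * le / 2.0
        Fp[e] += load
        Fp[e + 1] += load
    return H, Fp

# Hypothetical 4 m column: permeable lower half, low-permeability upper half.
H, Fp = seepage_column(k=[1e-6, 1e-6, 1e-8, 1e-8], length=4.0,
                       head_bottom=2.0, head_top=10.0)
print([round(h, 4) for h in H])
```

Because the two layers act in series, almost all of the head loss occurs across the low-permeability half, which is the behavior a lining layer is expected to show. In a coupled analysis, the loads in `Fp` would then enter the right-hand side of the stress equation as the pore-pressure term.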

The computational space domain is discretized to obtain the seepage–stress coupling equation:

$$[\mathbf{K}][\mathbf{U}] = \{\mathbf{F}\_V\} + \{\mathbf{F}\_s\} + \{\mathbf{F}\_p\} + \{\mathbf{F}\_{\sigma\_0}\} \tag{4}$$

where [*K*] is the structural stiffness matrix; [*U*] is the nodal displacement matrix; *FV* and *Fs* are the body and surface loads, respectively; *Fp* is the equivalent load formed by pore pressure; and *F*<sub>σ0</sub> is the initial stress load.

According to the incremental theory of plasticity in the plastic damage model, the total strain tensor, *ε*, is composed of an elastic part, *ε*<sup>el</sup>, and a plastic part, *ε*<sup>pl</sup>:

$$
\varepsilon = \varepsilon^{el} + \varepsilon^{pl} \tag{5}
$$

When there is no damage to the concrete, the stress–strain relationship of the concrete is as follows:

$$
\sigma = D^{el} \left( \varepsilon - \varepsilon^{pl} \right) \tag{6}
$$

where *σ* is the total stress, and *D*<sup>el</sup> is the elastic stiffness matrix.

When the concrete material is damaged, according to the theory of continuum damage mechanics, the internal micro-cracks, micro-pores, and other micro-defects under the action of external loads can be described by the damage factor, *d*. The damage factor is mainly used to reflect the concrete stiffness degradation under uniaxial or multiaxial loads. Assuming that the damage is isotropic, the relationship between the damage and stress of the concrete under the three-dimensional multiaxial state can be expressed by the damage elasticity equation, and the concrete stress, *σ*, is calculated as follows [36]:

$$
\sigma = (1 - d)\overline{\sigma} = (1 - d)D^{el} \left(\varepsilon - \varepsilon^{pl}\right) \tag{7}
$$

where $\overline{\sigma}$ is the effective stress, which represents the stress on the net section of the concrete material.
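A uniaxial reading of Equation (7) can be sketched as follows. The function name and the numerical values (elastic modulus, strains, and damage factor) are hypothetical and serve only to show how the damage factor scales the effective stress.

```python
def damaged_stress(strain, plastic_strain, d, e_modulus):
    """Uniaxial form of Equation (7): sigma = (1 - d) * E * (eps - eps_pl).
    Returns the damaged stress and the effective (net-section) stress."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("damage factor d must lie in [0, 1]")
    effective_stress = e_modulus * (strain - plastic_strain)
    return (1.0 - d) * effective_stress, effective_stress

# Hypothetical values: E = 30,000 MPa, total strain 2.0e-4, plastic strain 0.5e-4.
sigma, sigma_eff = damaged_stress(2.0e-4, 0.5e-4, d=0.2, e_modulus=30000.0)
print(sigma, sigma_eff)
```

Setting `d = 0` recovers the undamaged relationship in Equation (6), while `d = 1` corresponds to a complete loss of stiffness.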

The element damage, *d*, is expressed as a function of the equivalent plastic strain, $\hat{\varepsilon}^{pl}$:

$$\begin{cases} d\_t = d\_t\left(\hat{\varepsilon}\_t^{pl}\right), & 0 \le d\_t \le 1 \\ d\_c = d\_c\left(\hat{\varepsilon}\_c^{pl}\right), & 0 \le d\_c \le 1 \end{cases} \tag{8}$$

where *dt* is the tensile damage factor, *dc* is the compressive damage factor, and the subscripts *t* and *c* denote the tensile and compressive states, respectively.

The equivalent plastic strain, $\hat{\varepsilon}^{pl}$, is calculated as follows:

$$\begin{cases} \hat{\varepsilon}_t^{pl} = \int_0^t \dot{\hat{\varepsilon}}_t^{pl} \, dt \\ \hat{\varepsilon}_c^{pl} = \int_0^t \dot{\hat{\varepsilon}}_c^{pl} \, dt \end{cases} \tag{9}$$

$$\begin{cases} \dot{\hat{\varepsilon}}_t^{pl} = r(\hat{\overline{\sigma}}) \, \hat{\dot{\varepsilon}}_{max}^{pl} \\ \dot{\hat{\varepsilon}}_c^{pl} = -\left(1 - r(\hat{\overline{\sigma}})\right) \hat{\dot{\varepsilon}}_{min}^{pl} \end{cases} \tag{10}$$

where $\hat{\dot{\varepsilon}}_{max}^{pl}$ is the maximum value of the plastic strain rate tensor, $\hat{\dot{\varepsilon}}_{min}^{pl}$ is the minimum value of the plastic strain rate tensor, $\dot{\hat{\varepsilon}}_t^{pl}$ is the equivalent plastic strain rate in tension, and $\dot{\hat{\varepsilon}}_c^{pl}$ is the equivalent plastic strain rate in compression. The multiaxial stress weighting factor, $r(\hat{\overline{\sigma}})$, can be defined as follows:

$$r(\hat{\overline{\sigma}}) = \frac{\sum_{i=1}^{3} \langle \hat{\overline{\sigma}}_i \rangle}{\sum_{i=1}^{3} |\hat{\overline{\sigma}}_i|}, \quad 0 \le r(\hat{\overline{\sigma}}) \le 1 \tag{11}$$

where $\hat{\overline{\sigma}}_i$ ($i$ = 1, 2, 3) are the principal stress components, $\langle \cdot \rangle$ is the Macaulay bracket, defined as $\langle x \rangle = (|x| + x)/2$, and $|x|$ is the absolute value of $x$.
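The weighting factor of Equation (11) and the split of the plastic strain rate in Equation (10) can be sketched in a few lines of Python. This is only an illustration of the formulas, not the authors' implementation. The function names are hypothetical, and returning r = 0 when all principal stresses vanish is an assumed convention for the otherwise undefined 0/0 case.

```python
def macaulay(x):
    """Macaulay bracket <x> = (|x| + x) / 2, i.e. the positive part of x."""
    return (abs(x) + x) / 2.0


def stress_weight(principal_stresses):
    """Multiaxial stress weighting factor r of Eq. (11).

    r = 1 when all principal stresses are tensile and r = 0 when all are
    compressive.
    """
    denominator = sum(abs(s) for s in principal_stresses)
    if denominator == 0.0:
        # Assumed convention: r = 0 for a stress-free state (0/0 case).
        return 0.0
    return sum(macaulay(s) for s in principal_stresses) / denominator


def equivalent_plastic_strain_rates(r, rate_max, rate_min):
    """Tensile and compressive equivalent plastic strain rates of Eq. (10).

    rate_max and rate_min are the maximum and minimum values of the plastic
    strain rate tensor.
    """
    rate_t = r * rate_max
    rate_c = -(1.0 - r) * rate_min
    return rate_t, rate_c


# Hypothetical principal stresses (tension positive), for illustration only.
r = stress_weight([2.0, -1.0, -1.0])
rates = equivalent_plastic_strain_rates(r, rate_max=2.0e-4, rate_min=-1.0e-4)
```

With one tensile and two compressive principal stresses, r lies strictly between 0 and 1, so both the tensile and the compressive equivalent plastic strains accumulate.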

Under periodic alternating loads, the complex concrete damage mechanism is related to the cracking and merging of the initial cracks and their interrelation during changes. When the concrete is subject to compression after tension, its stiffness will be partially restored, that is, the unilateral effect is more significant. To reflect this effect, the relationship between tensile and compressive damage variables, *dt* and *dc*, is:

$$(1 - d) = (1 - s_t d_c)(1 - s_c d_t) \tag{12}$$

where $s_t$ and $s_c$, with $0 \le s_t, s_c \le 1$, are functions of the stress state that account for stiffness recovery:

$$\begin{cases} s_t = 1 - \omega_t \, r(\hat{\overline{\sigma}}), & 0 \le \omega_t \le 1 \\ s_c = 1 - \omega_c \left[1 - r(\hat{\overline{\sigma}})\right], & 0 \le \omega_c \le 1 \end{cases} \tag{13}$$

where $\omega_t$ and $\omega_c$ are the weighting factors of stiffness recovery related to the material properties. Figure 1 shows the stiffness recovery curve of the concrete damage model when the weighting factors are $\omega_t = 0$ (compression → tension) and $\omega_c = 1$ (tension → compression) under uniaxial alternating loads.

**Figure 1.** Stress–strain relation under the uniaxial alternating load.
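The stiffness recovery rule of Equations (12) and (13) can be illustrated with a short Python sketch. The function name and the sample damage values are hypothetical. The defaults follow the weighting factors quoted for Figure 1, and the stress weighting factor r is passed in directly rather than computed from a stress tensor.

```python
def combined_damage(d_t, d_c, r, omega_t=0.0, omega_c=1.0):
    """Combined damage d from Eqs. (12) and (13).

    d_t, d_c : tensile and compressive damage factors, each in [0, 1]
    r        : multiaxial stress weighting factor in [0, 1]
               (1 = pure tension, 0 = pure compression)
    omega_t, omega_c : stiffness recovery weighting factors in [0, 1]
    """
    s_t = 1.0 - omega_t * r               # Eq. (13), first line
    s_c = 1.0 - omega_c * (1.0 - r)       # Eq. (13), second line
    return 1.0 - (1.0 - s_t * d_c) * (1.0 - s_c * d_t)  # Eq. (12) solved for d


# Hypothetical state: the material was damaged in tension (d_t = 0.4) and has
# no compressive damage.
d_in_tension = combined_damage(d_t=0.4, d_c=0.0, r=1.0)      # tensile damage acts
d_in_compression = combined_damage(d_t=0.4, d_c=0.0, r=0.0)  # stiffness recovered
```

With $\omega_c = 1$, the tensile damage drops out of the combined damage once the stress state becomes purely compressive. This is the recovery of compressive stiffness on crack closure that the text calls the unilateral effect.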

When the plastic damage model is used, the damage may degrade the stiffness of the concrete structure. On this basis, the influence of concrete damage and cracking on the stress state of the structure can be simulated. At the same time, concrete damage and cracking have a significant impact on the permeability characteristics of the structure. The material element is composed of a damaged phase and an undamaged phase. The element permeability coefficient is calculated as follows [37]:

$$k = (1 - d)k_m + d \, k_d \left(1 + \varepsilon_v^{pf}\right)^3 \tag{14}$$

where $k_m$ is the permeability coefficient of the undamaged phase, and $k_d$ is the permeability coefficient of the damaged phase. Assuming that no damage occurs during elastic deformation, while plastic deformation and damage occur simultaneously, the plastic volumetric strain of the damaged phase is $\varepsilon_v^{pf} = d \, \varepsilon_v^{p}$, in which $\varepsilon_v^{p}$ is the plastic volumetric strain of the element.

Once macroscopic cracks appear, the permeability of a brittle material increases abruptly; thus, a sudden jump factor, $\xi$, is introduced to calculate the permeability coefficient of the damaged phase [25]:

$$
k_d = \xi k_m \tag{15}
$$

where, for compression–shear damage, $\xi = 100$, and for tensile damage,

$$\xi = \begin{cases} 10, & 0 < d \le 0.1 \\ \dfrac{1000 - 10}{0.9 - 0.1} \, d + 10, & 0.1 < d < 0.9 \\ 1000, & 0.9 \le d \le 1 \end{cases}$$
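The damage-dependent permeability of Equations (14) and (15) can be written as a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' subroutine. The function names, the `mode` labels, and the sample input values are hypothetical, and the piecewise jump factor is coded exactly as printed above.

```python
def jump_factor(d, mode="tension"):
    """Sudden jump factor xi for the damaged-phase permeability."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("damage d must lie in [0, 1]")
    if mode == "compression_shear":
        return 100.0
    if mode != "tension":
        raise ValueError("mode must be 'tension' or 'compression_shear'")
    if d <= 0.1:
        return 10.0
    if d < 0.9:
        # Linear branch, coded as printed in the piecewise expression.
        return (1000.0 - 10.0) / (0.9 - 0.1) * d + 10.0
    return 1000.0


def element_permeability(d, k_m, eps_v_p, mode="tension"):
    """Element permeability coefficient k from Eqs. (14) and (15).

    d       : element damage in [0, 1]
    k_m     : permeability coefficient of the undamaged phase
    eps_v_p : plastic volumetric strain of the element
    """
    k_d = jump_factor(d, mode) * k_m      # Eq. (15)
    eps_v_pf = d * eps_v_p                # plastic volumetric strain, damaged phase
    return (1.0 - d) * k_m + d * k_d * (1.0 + eps_v_pf) ** 3  # Eq. (14)


# Hypothetical inputs, for illustration only.
k = element_permeability(d=0.5, k_m=1.0e-9, eps_v_p=1.0e-3)
```

For an undamaged element (d = 0) the expression reduces to $k = k_m$, so the jump factor only matters once damage has started.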

Compared with conventional seepage–stress coupling models, the model in this paper couples the effect of damage and extends the study of the seepage–stress coupling problem from the simple stress state analysis to the damage process analysis, which lays a theoretical basis for further studying the concrete failure process and seepage evolution under seepage–stress coupling conditions.

#### *2.2. SSD Coupling Method*

The lining bears most of the water pressure in the deep excavated canal of the SNWDP [38]. In this paper, the canal lining is simulated with an elastoplastic constitutive relationship based on the Mohr–Coulomb criterion, and the SSD coupling method is used to analyze and predict the long-term behavior of the canal lining.

The SSD coupling method used in this paper was implemented through secondary development of the ABAQUS finite element software [39]. The damage was obtained via the FORTRAN utility routine GETVRM. According to the damage curve and the permeability coefficient, the user subroutine USDFLD (an ABAQUS subroutine that defines field variables at a material point as functions of time) is used to update the canal permeability coefficient as the damage changes. The element's permeability coefficient is defined as a field variable, and the subroutine is called in each increment to obtain the maximum principal strain and the equivalent plastic strain at the material integration point, thereby determining the element's stress state. The element's damage variable is then solved, and the lining's permeability coefficient is revised based on the relevant element and node information to predict the long-term operation of the SNWDP more accurately. The established SSD coupling analysis process is shown in Figure 2.

**Figure 2.** Flowchart of seepage/stress-damage coupling analysis.

#### **3. Model Parameters and Boundary Conditions**

#### *3.1. Project Overview*

There is a canal excavated to a depth of 36–47 m in the SNWDP Xichuan Section. The canal is located on the edge of the northern subtropical zone, in a humid area. Affected by the monsoon climate all year round, it has four distinct seasons and abundant rainfall, with an average annual rainfall of more than 730 mm. In addition, this canal has a high groundwater level. Given that canals with high fills are more likely to undergo slope instability, this paper selects a typical section of the said deep canal in the Xichuan Section for research. Its slope is reinforced by a combination of large-section excavation and water collection wells. The canal consists of concrete lining plates, a geomembrane, sand–gravel cushions, and foundations. There are three-level bridleways on both banks, which can withstand vehicle loads, canal water pressure, groundwater pressure, and gravity.

As illustrated in Figure 3, a three-dimensional finite element numerical simulation model of a representative portion of the deep excavated canal was created. The central point of the canal bottom was taken as the origin, the *X*-axis as the horizontal direction perpendicular to the water flow, the *Y*-axis as the direction parallel to the flow, and the *Z*-axis as the vertical direction. The canal structure in the numerical model was discretized with C3D8RP (hexahedral reduced-integration) elements, with 185,000 elements and 208,098 nodes in total. The bottom of the finite element model was fully constrained, and the lateral boundaries were constrained in the normal direction. The boundary conditions of the total head and the free seepage section were set. The monitored seepage flow of the canal was converted into the seepage velocity and set as the seepage velocity boundary condition.

**Figure 3.** Three-dimensional finite element model of a typical section of the deep excavated canal.

#### *3.2. Finite Element Model and Material Properties*

This study chose a specific section with a high groundwater level for the numerical simulation analysis of the seepage–stress coupling. A three-dimensional finite element numerical simulation model was established based on the design drawings of the typical section. Figure 4 depicts the canal's general details and the distribution of structural materials in each portion. Some of the model parameters are as follows: the canal bottom width is 13.5 m; the digging depth is 46 m; an 8 cm–thick C25 concrete slab is used as the lining plate, under which there is a composite geomembrane and then a 25 cm–thick coarse sand cushion, with the foundation at the bottom; the designed water level is 8 m; the increased water level is 8.77 m; and the underground water level is 41.28 m.

**Figure 4.** Finite element model of a typical section of the deep excavated canal.

Three monitoring points, A, B, and C, were selected in the deep excavated canal to monitor the canal's displacement and settlement. The clay materials are mainly used for the foundation and slope of the deep excavated canal section of the SNWDP. The anti-seepage system is mainly achieved by a concrete lining board, geomembrane, coarse sand cushion, and polysulfide sealant, as shown in Figure 5. During the actual operation of the SNWDP, the canal's infiltrated surface is subject to the continuous change of the permeability coefficient. Therefore, in this study, the characteristics of the water section were constantly assumed to ensure the continuity and accuracy of the results.

**Figure 5.** Canal seepage system.

#### **4. Comparison between Monitored Data and Numerical Simulation**


Three monitoring points, A, B, and C, were selected on the canal, bottom, and slope, respectively (as shown in Figure 6a). Their safety-monitoring data from January 2014 to January 2018 were calculated and analyzed. Settlement monitoring points were used to conduct on-site surveys of the settlement displacement of the deep excavated section of the SNWDP (Figure 6b,c). The calculation results were compared and analyzed. Figure 7 shows the correlation curve between the simulated settlement and the monitored settlement of the monitoring points.

It can be seen from Figure 7 that (1) during the 5-year operation of the deep excavated canal, the displacement gradually increased from the bottom to the top of the canal, reaching its maximum at the top. The maximum difference between the monitored and calculated settlement values is 0.559 mm, and the minimum is 0.02 mm; (2) the canal subsided rapidly during the first two years, after which the settlement slowed down and approached the final settlement; and (3) the simulated settlement trend is similar to the displacement curve of the monitored section. According to the adjusted R-squared coefficient and Pearson's correlation coefficient, the calculated curve fits the monitored curve closely. After five years of operation, the simulated and monitored settlements agree, suggesting that the numerical simulation of the canal's long-term settlements is consistent with the actual project operation.
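The two goodness-of-fit measures named above can be computed as in the following Python sketch. It is an illustration only: the source does not state how the coefficients were obtained, the function names are hypothetical, the adjusted R-squared assumes a single predictor, and the sample settlement values are made up for demonstration and are not the monitoring data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def adjusted_r2(observed, predicted, n_predictors=1):
    """Adjusted R-squared of predicted values against observed values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up settlement series in mm (NOT the data of this study).
monitored = [1.2, 2.3, 3.0, 3.5, 3.8]
simulated = [1.0, 2.2, 3.1, 3.6, 4.0]
print(pearson_r(monitored, simulated), adjusted_r2(monitored, simulated))
```

Values of both measures close to 1 indicate that the simulated curve tracks the monitored curve, which is the sense in which the fit is described as close.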


**Figure 6.** Canal of the South-to-North Water Diversion Project and settlement monitoring points. (**a**) A section of the South-to-North Water Diversion Project, (**b**) Settlement monitoring points protection box, (**c**) Settlement monitoring points.

**Figure 7.** Correlation curve between the estimated settlement and the monitored settlement of the monitoring points.

#### **5. Evolution of the Canal's Long-Term Behavior Based on SSD**

#### *5.1. Evolution of Canal Pore Pressure*

Long-term seepage failure has a significant impact on the safe operation of the canal. Taking the deep excavated canal section of the SNWDP as an example, the long-term settlement of the canal and the changes in the seepage field under seepage–stress coupling are calculated. Based on the above parameters and conditions, the ABAQUS software is used to estimate the deep excavated canal's long-term settlements and seepage field changes.

Figures 8 and 9 show the saturation contours of the deep excavated canal after 10 and 20 years of operation, respectively. Here, the saturation describes the moisture content of the foundation soil under the concrete lining plate. The figures show that most of the soil in the deep excavated canal is saturated and some is unsaturated, and the seepage effect is relatively large. Comparing the two contours shows that, under the given conditions, the saturated zone decreases and the unsaturated zone increases as the operating time of the canal increases. This may be due to the changes in the void ratio and the permeability coefficient produced by canal settlement. With the continuous settlement of the canal, the void ratio decreases, and the coefficient of permeability also decreases. At the macroscopic level, a change in the seepage field generates a change in the stress field, which appears as a change in the settlement.

**Figure 8.** Canal saturation contour when t = 10 years.

**Figure 9.** Canal saturation contour when t = 20 years.

Figures 10 and 11 show the pore pressure contours of the deep excavated canal after 10 and 20 years of operation, respectively. The figures show negative pore pressure at the top of the canal, indicating the existence of unsaturated zones. The foundation exhibits both saturated and unsaturated seepage, which is consistent with the numerical results for saturation. Pore pressure decreases from both sides toward the middle. Because of the high groundwater level on both sides of the deep excavated canal, the groundwater flows toward the canal center under the action of gravity. Therefore, drainage measures should be taken on the slope of the canal to reduce the groundwater seepage and improve the security of the canal water supply.

**Figure 10.** Canal pore pressure contour when t = 10 years.

**Figure 11.** Canal pore pressure contour when t = 20 years.

#### *5.2. Evolution of the Canal's Long-Term Settlements*

The calculation assumes the design water level of the canal and the highest groundwater level of the slope, with the underground drainage measures out of service. The anti-seepage system is damaged, the seepage of the slope is stable, and the interior slope has no boundary flow. Based on the finite element model of the specific cross-section of the deep excavated canal, the characteristic points a, b, c, and d were selected for settlement analysis at the canal bottom, embankment, slope, and top, respectively. The water level was set to the canal's design water level, the vehicle load on the first-level bridleway was the car-10 level load, the crowd load was 0.3 t/m², and the highest groundwater level was 180.218 m.

The SSD coupling method was used to compute the canal settlements after five years of operation under the above working conditions. The nephogram of the canal's vertical displacements after 5 years of operation is shown in Figure 12, and the settlement of each characteristic point is shown in Figure 13. As shown in the figures, the maximum vertical displacement of the canal was 4.151 mm, which was within the allowable settlement range. U2 is the vertical component of the total displacement U.

**Figure 12.** Canal settlement nephogram when t = 5 years.

**Figure 13.** Settlements of each characteristic point when t = 5 years.

Based on the foregoing results, a numerical simulation of the deep excavated canal is carried out, and a nephogram of the canal's settlement after ten years of operation is obtained (see Figure 14). From the nephogram of canal settlements, the settlement curve of each characteristic point was obtained (as shown in Figure 15). The figures show that the maximum settlement is 5.128 mm, an increase of 0.977 mm compared with that after five years of operation. This change is not significant, so the settlement is considered to be close to its final value. Table 1 shows the settlement rate of the canal top's characteristic point at different times during the 10-year operation. The canal subsides at a decreasing rate after its operation begins. After 10 years of operation, the canal settlement rate is 0.16 mm/year. Although this rate is small and may not have much impact, the settlement continues.
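As a rough consistency check on these figures (an illustration added here, not a calculation from the study), the mean rate of increase of the maximum settlement between years 5 and 10 follows from the two reported maxima, assuming they refer to comparable locations on the canal:

$$
\bar{v}_{5 \to 10} = \frac{5.128\ \text{mm} - 4.151\ \text{mm}}{10\ \text{years} - 5\ \text{years}} = \frac{0.977\ \text{mm}}{5\ \text{years}} \approx 0.195\ \text{mm/year}
$$

This interval average is somewhat larger than the 0.16 mm/year reported at year 10, which is what a settlement rate that decreases with time would produce.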

**Figure 14.** Canal settlement nephogram when t = 10 years.

**Figure 15.** Settlement of each characteristic point when t = 10 years.

**Table 1.** Canal top's settlement rate.


The numerical simulation is repeated to investigate the deep excavated canal's long-term settlement and deformation and to obtain the nephogram of canal settlements after 20 years of operation, as shown in Figure 16. The figure shows that the maximum settlement is very close to that calculated after 10 years of operation, indicating that the deep excavated canal has reached a stable state, as shown in Figure 17. However, in actual operation, the stability of the deep excavated canal is affected by many factors, such as complex water distribution conditions and varying environments, and significant settlements can occur locally. As the operating time increases, the canal's settlement increment becomes smaller and does not affect its operating safety.

**Figure 16.** Canal settlement nephogram when t = 20 years.

In this paper, the SSD coupling method was used to predict the life and performance of the deep excavated canal in the SNWDP Xichuan Section. The numerical results showed that, under normal operating conditions, the canal may subside only slightly after 20 years of operation. The numerical simulation in this paper is based on the coupling of the seepage field and the stress field. The canal is expected to operate stably for some time in the future.

#### *5.3. Canal Lining Damage and Crack after Long-Term Operation*

A numerical simulation of the evolution of the SNWDP's deep excavated canal after 50 years of regular operation was conducted using the SSD coupling method. The canal settlement nephogram was obtained (Figure 18). The displacement of each characteristic point is shown in Figure 19. Compared with the settlement nephogram after 20 years of operation, the change in settlement is insignificant, and the settlement remains stable.

**Figure 18.** Canal settlement nephogram when t = 50 years.

**Figure 19.** Settlement of each characteristic point when t = 50 years.

The seepage effect is larger in the deep excavated canal because much of the soil is saturated. As the settlement increases, both the void ratio and permeability coefficient decrease. The change in the seepage field affects the stress field and settlement at the macro level.

After 50 years of operation, the canal was found to be slightly damaged when the evolution of its long-term behavior was investigated (Figure 20). In terms of overall damage, the maximum damage to the canal lining after long-term operation is 37.6%. It is predicted that the deep excavated canal in the SNWDP Xichuan Section can still be used under normal operating conditions, because there is no large-area damage apart from the concrete damage near the water surface. The damage first appeared near the water level of the left lining plate because the elevation of the top of the canal on the left bank was higher than that on the right bank, and the seepage–stress coupling effect was greater. The damage then occurred on the right lining plate, particularly at the point where the water table overflows into the canal, which was symmetrical to the point of damage on the left bank. The high water table and the increase in water pressure with depth are the main causes of damage to the canal lining board. The water pressure differential between the inside and outside of the canal lining board is significant, whereas the water pressure difference between the upper and lower positions near the water surface inside the canal is minor, which results in damage. The canal lining board is generally in a relatively safe condition, and there is no large-scale damage. Over time, the lining board may eventually be damaged by uplift after 50 years of operation, a result that may provide theoretical guidance for the actual operation of the project. This study used the SSD coupling method to predict the service life and running status of the deep excavated canal in the SNWDP Xichuan Section under normal conditions, but it was limited with respect to external factors such as environment, climate, and rainfall intensity [40]. Other influencing factors need further study.

**Figure 20.** Damage-distribution cloud diagram after 50 years.

#### **6. Conclusions**

In this study, the SSD coupling method was adopted. The canal "lining-foundation" is considered as a whole coupled system. The concrete lining's damage and the foundation's seepage damage were linked together, and long-term effects were introduced for numerical simulation and analysis of the canal lining structure. The conclusions are summarized as follows:


nephograms after 20 and 50 years of operation. It is evident from the nephograms that the settlement finally remains the same.


**Author Contributions:** Conceptualization, X.X. and W.X.; methodology, C.X.; software, C.X.; validation, W.X. and C.X.; formal analysis, W.X.; investigation, W.X.; resources, X.X.; data curation, X.X.; writing—original draft preparation, W.X.; writing—review and editing, W.X. and M.Y.A.K.; visualization, W.X.; supervision, X.X.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the National Key Research and Development Program of China (Grant No. 2018YFC0406901), the National Natural Science Foundation of China (Grant No. 51979109), Henan Science and Technology Innovation Talent Program (Grant No. 174200510020), and Henan Province University Science and Technology Innovation Team Support Plan (Grant No. 19IRTSTHN030). These supports are gratefully acknowledged and greatly appreciated.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

