1. Introduction
Rivers play a pivotal role in ecological and human health as well as in the economic development of territories, representing the main water supply for domestic use, irrigation, and industrial activities. In recent decades, their water quality has progressively worsened due to both natural processes and anthropic interventions, such as the discharge of industrial and municipal pollutants together with runoff from agricultural lands [1]. Recently, climate change has further aggravated these problems in many countries by causing increasingly frequent extreme events. On the one hand, lower inflow to rivers during droughts reduces the dilution of the contaminants introduced from human and natural sources; on the other hand, the more frequent occurrence of higher runoff due to intense storms increases their load of pollutants. Similarly, rising water temperature modifies the biogeochemical processes and reduces the dissolved oxygen concentration in natural channels, while the overflow of treated and untreated wastewater systems due to flooding seriously affects the biotic life cycle and the likelihood of waterborne diseases [2]. In addition, the rapid growth of population and economic activities, together with urban sprawl, is pushing towards a higher demand for high-quality water that is seldom matched by the locally available resources, while the discharge of insufficiently treated wastewater raises expenses for downstream users and has damaging effects on aquatic environments [2].
In this context, reliable information about river water quality must be collected for efficient resource management and to implement protective measures and actions able to improve the conditions of the water bodies [3], as required by the Sustainable Development Goals (SDGs). Monitoring networks measuring various chemical, physical, and biological river quality parameters are a valuable source of information on the water status in space and time [4,5,6,7,8]. However, they do not provide a complete and clear picture of the overall condition, only an assessment in terms of individual parameters. In order to quickly and easily summarize river water quality as a whole, different approaches based on the evaluation of only a few indices have been developed in recent years [9]. Among these, the Water Quality Index (WQI) method is widely used to simplify the expression of complex sets of pollution variables in rivers, lakes, and groundwater, and it is considered a key element in water resource management [10]. In particular, the WQI combines various environmental parameters and converts them into a single value describing the overall status of water quality. Therefore, instead of requiring a comparison of the separate evaluation results of multiple parameters, the WQI method provides integrated information on the quality [11]. Moreover, it helps decision makers to manage the water resource correctly and sustainably, it supports the analysis of the impacts of regulatory policies or laws, and it provides a more comprehensive picture of the source’s quality for an easier understanding by non-technical stakeholders [12]. Introduced as early as 1965 by [13] to define the status of water quality in the Ohio River, it has undergone various formulations and modelling over time, becoming one of the 25 environmental performance indicators of the holistic Environmental Performance Index [14]. The evaluation of the WQI is based on four main steps: (1) selecting the parameters; (2) calculating the sub-index values; (3) assigning weights to the different parameters; (4) aggregating the weighted sub-index values into the final index [15].
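As an illustration of steps (3) and (4), the following minimal Python sketch aggregates sub-index values with a weighted arithmetic mean. This is only one of several aggregation forms found in the WQI literature and is not claimed to be the formulation of any specific index cited here; the function name, the parameter names, the 0-100 sub-index scale, and the numerical weights are all hypothetical.

```python
from typing import Dict


def weighted_wqi(sub_indices: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate sub-index values with a weighted arithmetic mean.

    Assumption: each sub-index is already expressed on a common scale
    (here 0-100, higher meaning better quality).
    """
    if set(sub_indices) != set(weights):
        raise ValueError("sub-indices and weights must cover the same parameters")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Step (4): weighted sum of the sub-indices, normalized by the total weight
    weighted_sum = sum(weights[name] * sub_indices[name] for name in sub_indices)
    return weighted_sum / total_weight


# Hypothetical sub-index scores (step 2) and weights (step 3)
example_sub_indices = {"dissolved_oxygen": 80.0, "e_coli": 40.0, "nitrate": 60.0}
example_weights = {"dissolved_oxygen": 0.5, "e_coli": 0.3, "nitrate": 0.2}
example_wqi = weighted_wqi(example_sub_indices, example_weights)
```

Normalizing by the total weight keeps the index on the same scale as the sub-indices even when the weights do not sum exactly to one.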
The choice of parameters is one of the most important phases in the design of the WQI and also the most complex one. Various WQIs exist across the world, based on different sets of selected parameters ranging in number from 4 [16] to 26 [17]. In recent decades, most studies have focused on designing a WQI with fewer environmental parameters that can still describe the overall water quality, in order to remove repetitive or correlated environmental variables and lower the analytical and monitoring costs. Recently, various multivariate statistical techniques, including Cluster Analysis (CA), Principal Component Analysis (PCA), Factor Analysis (FA), and Discriminant Analysis (DA), have been widely used to select the few parameters able to detect variations in river water quality in space and time and to identify potential degradation sources within the basin. For instance, Kumarasamy et al. [18] investigated the hydrochemistry of the Tamiraparani river basin in Southern India with multivariate CA, PCA, and FA. Phung et al. [19] applied the CA, PCA, FA, and DA techniques to estimate the temporal and spatial changes of surface water quality in the Mekong Delta area of Vietnam. Correlation analysis, PCA, and CA were employed by [20] to describe seasonal changes, identify contamination sources, and cluster monitoring stations of the Ganga and Yamuna rivers in the Uttarakhand State (India). In 2016, Barakat et al. [4] determined the main contamination sources in the Oum Er Rbia river and its main tributary in Morocco, using multivariate statistical methods including Pearson’s correlation, PCA, and CA. Zandagba et al. [21] studied the suitability of the water of Nokoué, one of the largest West African lagoons, and identified possible sources of pollution through Hierarchical Cluster Analysis (HCA) and PCA. Although such techniques are becoming more and more popular for their capacity to manage large volumes of spatial and temporal data from a variety of gauge stations, they are still subjective because their outcome depends on the number of parameters provided for the analysis [12,16].
The present paper offers a new approach based on information theory to select the variables causing the spatial and temporal quality variations of a river subject to point and diffuse pollution sources within the basin. Information theory provides powerful tools able to relate various interconnected flow data in order to obtain the best understanding of processes without any assumptions about the correlations/dependencies among time series. It is built on the mathematical concept of entropy, which represents a quantitative measure of the information content associated with a signal. It has been widely used in different sectors of hydraulics and hydrology to derive models of rainfall-runoff, infiltration, and soil moisture [22,23,24,25,26,27] as well as distributions of velocity, sediment concentration, and shear stress in open-channel flows [28,29,30,31,32,33,34,35,36,37,38,39]. Among the different applications, information theory has also been employed for the optimization, design, and management of networks of gauge stations, including those for water quality and groundwater [40,41], rainfall [42,43], streamflow, and water level [44,45,46,47,48,49,50,51]. These problems can be solved through a multi-objective optimization approach, in which the repetitive information is minimized whilst the total information is maximized. This concept is known as Maximum Information Minimum Redundancy (MIMR) [45]. To the authors’ knowledge, the MIMR criterion has not yet been used for the identification of representative sets from an ensemble of quality parameters collected along a river. To that end, an easy-to-implement algorithm is developed here and applied to a sample basin of Northeast Italy, subject to continuous stresses of urban and industrial origin, in order to verify its reliability and accuracy [52,53,54,55]. During the selection, the three criteria of maximum overall information, maximum information transition ability, and minimum redundant information must be satisfied to achieve a unique solution under different scenarios with a good performance and thereby simplify the decision-making process. The MIMR criterion, being based on a mathematical principle, could be more objective and less affected by the number of investigated variables than other selection methods. In fact, the four most used techniques for parameter selection mentioned above (CA, PCA, FA, and DA) have several disadvantages: they need correlated parameters; they strictly assume that the relation among the parameters is linear, which occurs very rarely; and they require more than 300 measured data points [56,57] in the investigated sample to obtain reliable results. The MIMR approach, instead, would allow identifying only the parameters mostly responsible for the river pollution. In this way, the local monitoring programs could be better targeted and prioritized, increasing both the recording frequency of these parameters and the number of measuring sites, especially in fluvial reaches at higher risk of contamination and located in strongly anthropized, industrial, and agricultural areas. A fast and simplified water quality assessment, based on few parameters, could thus be more easily communicated and better understood by the public and non-technical stakeholders. In addition, the local administrators and policy makers could be guided towards a faster and better choice of mitigation measures and structural investments in order to achieve some of the Sustainable Development Goals (SDGs) such as:
- the significant reduction of pollutants in fluvial and marine environments (Goals 6.3 and 14.1);
- the minimum release of hazardous substances and of untreated wastewater in rivers (Goal 6.3);
- an increasingly efficient and proper use of the water resource (Goal 12.2);
- cleaner water to satisfy the needs of society and the safe use of surface waters for recreational purposes, hygiene, and household activities (Goal 6.4).
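To make the selection criterion described above more concrete, the following Python sketch shows one simplified greedy reading of it. It is not the algorithm developed in this paper. It assumes that the series are already discretized to integers, that "overall information" is measured by the joint entropy of the selected set, "information transition ability" by the transinformation between the selected set and each unselected series, and "redundant information" by the total correlation of the selected set. The function names, the data, and the tradeoff weights `w_info` and `w_red` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def joint_entropy(columns: Sequence[Sequence[int]]) -> float:
    """Shannon joint entropy (bits) of one or more discretized series."""
    if not columns:
        return 0.0
    rows = list(zip(*columns))  # one tuple of values per sampling time
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def mimr_score(data: Dict[str, Sequence[int]], selected: List[str],
               w_info: float = 0.8, w_red: float = 0.2) -> float:
    """Weighted information minus weighted redundancy for a candidate subset."""
    sel = [data[name] for name in selected]
    h_sel = joint_entropy(sel)
    # Transinformation between the selected set and each unselected series
    trans = 0.0
    for name, col in data.items():
        if name not in selected:
            trans += h_sel + joint_entropy([col]) - joint_entropy(sel + [col])
    # Total correlation: sum of marginal entropies minus the joint entropy
    redundancy = sum(joint_entropy([col]) for col in sel) - h_sel
    return w_info * (h_sel + trans) - w_red * redundancy


def greedy_mimr(data: Dict[str, Sequence[int]], n_select: int,
                w_info: float = 0.8, w_red: float = 0.2) -> List[str]:
    """Add, one at a time, the parameter that maximizes the subset score."""
    selected: List[str] = []
    while len(selected) < min(n_select, len(data)):
        candidates = [name for name in data if name not in selected]
        best = max(candidates,
                   key=lambda name: mimr_score(data, selected + [name], w_info, w_red))
        selected.append(best)
    return selected


# Hypothetical discretized series: "b" duplicates "a", "c" is independent of "a"
example_data = {
    "a": [0, 0, 1, 1, 2, 2, 3, 3],
    "b": [0, 0, 1, 1, 2, 2, 3, 3],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
example_selection = greedy_mimr(example_data, 2)
```

In this toy case the duplicate series "b" is skipped in favour of "c", because adding "b" brings no new information and only increases the redundancy term. A greedy search is used here purely for brevity; it does not guarantee the optimal subset of a full multi-objective optimization.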
The paper is organized as follows: in Section 2, the study area and data are introduced, the basic entropy theory is briefly described for an easier understanding of the MIMR criterion, and the selection algorithm is presented; Section 3 reports the results of the MIMR application in the identification of the representative quality parameters set, the potentialities of the proposed framework, and the comparison with the PCA selection method; finally, Section 4 states the conclusions.
4. Conclusions
The rapid growth of the worldwide population, together with ongoing climate change, is contributing to the increase of river pollution, pushing research towards the development and implementation of effective methodologies able to provide reliable information on the degradation status rapidly and easily.
The Water Quality Index (WQI) has proved to be a useful tool to obtain a clear and complete picture of the contamination level of a river stressed by point and diffuse sources of natural and anthropic origin, leading policy makers and end-users towards an increasingly correct and sustainable management of the water resource. Such an index is often based on a significant number of environmental parameters describing the overall water quality and, recently, most studies have focused on reducing them in order to remove the redundant variables and lower the analytical and monitoring costs. Therefore, the selection of the quality parameters represents one of the most important and complex phases in the design of the WQI, and recent multivariate statistical techniques do not seem to show great objectivity and accuracy in the identification of the real water pollution status.
This study proposes a new method based on information theory to select the variables causing the quality variations in time and space of a river subject to point and diffuse pollution sources within the basin. Such a method, known as the Maximum Information Minimum Redundancy (MIMR) criterion and built on the mathematical concept of entropy, allows choosing the parameters through a multi-objective optimization approach, where the repetitive information is minimized whilst the total information is maximized. The criterion was validated on a sample basin of Northeast Italy subject to continuous stresses of urban and industrial origin. Its application required the discretization of the data using a mathematical floor function, which converts continuous random variables to integers once a proper value of the bin width is assigned. In the present paper, the three known empirical formulas used to define the optimal bin width were shown not to affect the entropy values significantly, leading to the conclusion that any of them could be chosen for the data discretization. The assessment of the quality parameters’ information content under different time windows showed that a 5-year window already captures about 90% of the information content calculated over 10 years, demonstrating that shorter series could also be considered, especially when a limited amount of data is available. In addition, a sensitivity analysis, performed by varying the information-redundancy tradeoff weights, allowed choosing the most suitable weights to balance the two conflicting objectives, maximum information and minimum redundancy, and thus obtaining the optimal representative subset of quality parameters.
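The discretization step summarized above can be illustrated with a short Python sketch. It assumes the simplest floor-based binning, in which each value is divided by the bin width and rounded down, and then estimates the Shannon entropy of the resulting integer series from relative frequencies. The exact floor expression and the bin-width formulas used in the study are not reproduced here; the function names and the bin width in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def discretize(values: Iterable[float], bin_width: float) -> List[int]:
    """Map continuous values to integer bin labels with a floor function."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return [math.floor(v / bin_width) for v in values]


def shannon_entropy(labels: Iterable[int]) -> float:
    """Shannon entropy (bits) estimated from the relative frequency of each label."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical concentrations discretized with an assumed bin width of 1.0
example_labels = discretize([0.1, 0.9, 1.2, 2.7], bin_width=1.0)
example_entropy = shannon_entropy(example_labels)
```

Repeating the entropy calculation with different values of `bin_width` is one simple way to check how sensitive the information content is to the choice of binning rule.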
The MIMR criterion was also quantitatively compared to the multivariate statistical approach PCA, and the results showed that the MIMR seems to be more suitable to detect the optimal parameters set both when the amount of the investigated data is small and when a non-linear relationship among the parameters exists. In fact, this set of parameters, constituted by Dissolved Oxygen and Escherichia coli, stays constant both when considering all data and when grouping them in the four seasons. In this way, the MIMR criterion could be used to develop a future WQI, more objective and more correctly weighted, able to provide a better water quality assessment of the Bacchiglione river under different conditions. In addition, the correlation between the spatial and temporal variability of only two parameters and one of the factors affecting the river quality status also allows a faster and clearer identification of the contamination sources within the basin. This can help environmental managers to better target and prioritize the local monitoring activities and guide local administrators and policy makers towards the choice of mitigation measures and structural investments, which could speed up the achievement of the Sustainable Development Goals (SDGs). Some of these mitigation measures and interventions could be the adoption of good land use practices and sustainable food production systems (Goal 2.4), the re-naturalization of some fluvial reaches with parks and green areas (Goal 6.6), the revamping of wastewater treatment plants with advanced technologies (Goal 6.A), and the building of new treatment plants (Goal 6.A).
Finally, the results of the method could help the public and non-technical stakeholders to understand more meaningfully the drivers of the water quality degradation in the basin, thereby strengthening the involvement of the local communities in actions aimed at improving water quality and sanitation (Goal 6.B).