**Ambient Background Values of Selected Chemical Substances in Four Groundwater Bodies in the Pannonian Region of Croatia**

## **Zoran Naki´c \*, Zoran Kovaˇc , Jelena Parlov and Dario Perkovi´c**

Faculty of Mining, Geology and Petroleum Engineering, University of Zagreb, 10000 Zagreb, Croatia; zoran.kovac@rgn.unizg.hr (Z.K.); jelena.parlov@rgn.unizg.hr (J.P.); dario.perkovic@rgn.unizg.hr (D.P.)

**\*** Correspondence: zoran.nakic@rgn.unizg.hr

Received: 25 August 2020; Accepted: 21 September 2020; Published: 24 September 2020

**Abstract:** Groundwater quality is a consequence of cumulative effects of natural and anthropogenic processes occurring in unsaturated and saturated zone, which, in certain conditions, can lead to elevated concentrations of chemical substances in groundwater. In this paper, the concept of determining the ambient background value of a chemical substance in groundwater was applied, because the long-term effects of human activity influence the increase in concentrations of substances in the environment. The upper limits of ranges of ambient background values were estimated for targeted chemical substances in four groundwater bodies in the Pannonian region of Croatia, according to the demands of the EU Groundwater Directive. The selected groundwater bodies are typical, according to the aquifer typology, for the Pannonian region of Croatia. Probability plot (PP), the modified Lepeltier method, as well as the simple pre-selection method, were used in this paper, depending on a number of chemical data in analysed data sets and in relation to the proportion of <limit of quantification (LOQ) values in a data set for each groundwater body. Estimates obtained by using PP and the modified Lepeltier method are comparable when data variability is low to moderate, otherwise differences between estimates are notable. These methods should not be used if the proportion of <LOQ values in a data set is higher than 30%; however, the integration of results of both methods can increase the confidence of estimation. If the proportion of <LOQ values is higher than 30%, it is recommended to use the robust pre-selection method with the adequate confidence level. For highly skewed data, the 90th percentile of the pre-selected data set is comparable with other methods and preferable over the 95th percentile. The estimates obtained for inert and mobile substances are comparable on different scales. For highly redox-sensitive substances, estimates may differ by one to two orders of magnitude, in relation to the observed heterogeneity of the aquifer systems. The critical issue in the estimation process is the determination of hydrogeological and geochemical homogeneous units within the heterogeneous aquifer system.

**Keywords:** ambient background values; probability plot; modified Lepeltier method; pre-selection method; LOQ; groundwater body; Croatia

## **1. Introduction**

By definition, a geochemical background value of an element or compound in groundwater indicates the absence of anomalous, usually high, measured values of concentrations of substances that would indicate human influence. Matschullat et al. [1] define the geochemical background concentration as a relative measure for distinguishing between the natural and anthropogenic concentrations of an element or compound in a real sample set. The natural background concentrations of substances are due primarily to interactions between the rock matrix and water, i.e., dissolution of minerals and rocks, chemical and biological processes in the unsaturated and saturated zone, interactions between different groundwater bodies, groundwater residence time, and chemical composition of precipitation [2].

Due to the ubiquitous human impact, which is also reflected in the chemical composition of groundwater, the natural composition of groundwater, especially in shallow aquifers, is almost non-existent today. Accordingly, Reiman and Garrett [3] defined ambient background concentration as a background value under slightly changed conditions, when elevated concentrations of a substance in water result from a long-term human impact, such as agriculture, industry or urbanization, meaning that the measured values of concentrations of a substance cannot entirely reflect natural conditions. Other authors [4–6] take up this concept, recognizing the fact that for some substances in groundwater, e.g., nitrate, there are numerous natural and anthropogenic sources that could have influenced their concentrations.

When determining background values of substances in groundwater, it is necessary to take into considerations the concept of natural variability due to the heterogeneity of the aquifer system. Due to the geological variety of the different regions, some studies have shown the convenience of deriving local or regional background level [7]. It follows from this that the background value should not be presented as a single fixed value, since the background value thus defined does not provide information on the natural variability of the substance [1]. However, for the practical use of background values, in particular in the context of the application of threshold values to the requirements of the EU Groundwater Directive (2006/118/EC), the background value can be defined with a single value, as the upper limit of the range of background concentrations, with the adequate confidence level.

According to Molinari et al. [8], background value can be viably evaluated (i) from groundwater samples unaffected by human impact, including those samples taken from deep wells where no anthropogenic impacts from the surface are present, (ii) using multicomponent reactive transport modelling in real aquifer systems, in cases where discrepancies observed between reaction rates at laboratory and field scales, the problem of bridging across scales and the conceptual and parametric uncertainty can be fully addressed. Alternative to these two approaches is an estimation of background values by statistical analysis of a large set of monitoring data [7,8], with the aim to identify concentrations related to the contamination anomaly as opposed to those solely reflecting background processes.

At the EU level, there is currently no single set of criteria to ensure a standardized Europe-wide approach for defining natural background values [9]. The EU research project BaSeLiNe, Natural Baseline Quality in European Aquifers: A Basis for Aquifer Management, funded under the Fifth Framework Programme, has revealed that Median ± 2MAD and Mean ± 2SD rules have often been used as the main statistical parameters for initially defining original data distribution and background values in EU aquifers [10]. The problem arises from the fact that the use of these methods can give a wrong estimate of the location of the main body of data if data distribution is influenced by more than one process, resulting in multimodal distribution, which could be superimposed [11]. Reimann et al. [12] proposed the use of boxplot, both for determining extreme values and background concentrations, based on the results of comparative analysis, in which they compared the results of multiple statistical methods. They concluded that this method is appropriate if the number of extreme values is less than 10%, while the Median ± 2MAD method produces better results if the number of extreme values is greater than 15%. The EU research project BRIDGE, Background Criteria for Identification of Groundwater Thresholds [13], funded under the Sixth Framework Programme, resulted in a proposal of two methods to setting background values at European level. The component separation method is based on the separation of the relative frequency of concentration of a chemical substance into a natural and anthropogenic component, which are modelled with separate distributions, and is applied when a large amount of data is available for a chemical substance [14]. The pre-selection method is used in cases where a limited data set is available and when chemical samples do not show or show very little human influence [15–18]. It is clear that EU Member States apply very different approaches to determining background concentrations of substances in groundwater. Sweden estimates background concentrations by comparing groundwater chemical status with drinking water standards [19]. Buss et al. [20] state that the Irish Environmental Protection Agency determines concentrations of chemicals free from anthropogenic influences and calculates the upper and lower limits of the range of background concentrations from the extrapolation of the normal distribution curve of chemical concentrations. In Germany, the aforementioned component separation method is used [14]. In the Netherlands, several methods are applied, namely the historical method, the tritium method and the oxidation capacity method [17].

Nowadays, many researchers use the approach to determine background concentrations based on the use of probability plot (PP), triggered by research conducted by Sinclair in the early 1970s [21]. This approach is based on the assessment of one or more inflection points on a probability graph, which separate different populations within the distribution of all measured data for a substance. Kyoung et al. [22] strongly recommend this approach in cases where the distribution of measured data shows bimodality or multimodality. A complete data set from a statistical sample is subdivided into subgroups, which reflect relevant geochemical processes or pollution. A subgroup representing the background concentrations of an element or chemical compound has a characteristic probability density function that results from the cumulative influence of different processes in an aquifer. Such a subset of data can be approximated with a normal or lognormal distribution [23]. In many scientific papers, researchers more often use the lognormal distribution to show the distribution of natural background concentrations, while the normal distribution is mostly used to show the distribution of human-influenced data [8,14,21]. Similar to PP is the Lepeltier method [24] that visually evaluates cumulative sums in double-logarithmic scale graphs. The idea of the Lepeltier method is the assumption of a lognormal concentration distribution of the element for which the upper limit for background concentration is to be determined. Other methods that are intrinsically related to probability graph approach, by splitting the overall data distribution into distinct components, are the iterative 2-σ technique and the calculated distribution function. Both methods take a set of measured data and process the data, i.e., remove non-ambient values, until a normal distribution of ambient concentration range is obtained [1,5,6].

In this paper, three methods were applied to estimate the upper limit of the range of ambient background values (UL) of each targeted chemical substance in the context of assessing groundwater quality status of groundwater bodies in the Pannonian region of Croatia. PP and the modified Lepeltier method were used in accordance to results of a statistical simulation study, which evaluated the robustness and reliability of widely used methods for determining background values [25]. The pre-selection method was applied in accordance to recommendation stemming from the EU research project BRIDGE, in cases of limited data set availability and/or limited data quality.

As presented in Section 2, arsenic (As), sulphate (SO4), chloride (Cl), and nitrate (NO3) were considered, which are stipulated by EU and Croatian regulations as key substances for the assessment of the chemical status of groundwater bodies and should be taken into account when determining background concentrations. These substances occasionally occur in higher concentrations than reference values prescribed by EU and Croatian regulations. Following the recommendations of the EU research project BRIDGE, iron (Fe) was also included in the analysis. Iron is particularly sensitive to change in redox conditions due to human influence that can lead to an enrichment of dissolved iron in groundwater.

Four investigated groundwater bodies enabled comparison of UL estimates in similar hydrogeological settings. Selected groundwater bodies are typical, according to the aquifer typology, for the Pannonian region of Croatia and results are valid for unconsolidated gravel and sand aquifers.

The main aim of this research was to apply a robust methodology for the background estimation, applicable to data sets with moderate to high data variability and high percentage of limit of quantification (LOQ) values. To our knowledge, our work is a rare example of the application of the formal statistical procedures for an estimation of background values, which addresses the above-mentioned data quality issue.

#### **2. Materials and Methods 2. Materials and Methods**

#### *2.1. Study area Description 2.1. Study area Description*

Fifteen groundwater bodies were identified in the Pannonian part of the Republic of Croatia within the process of implementation of the Directive 2000/60/EC. In the River Basin Management Plan of the Republic of Croatia for the period 2016–2021 [26], each groundwater body was categorized as one, laterally and vertically, hydrogeological homogeneous unit. The four investigated groundwater bodies are located at the southern edge of the Pannonian basin, in the northern and the eastern part of Croatia (Figure 1). Fifteen groundwater bodies were identified in the Pannonian part of the Republic of Croatia within the process of implementation of the Directive 2000/60/EC. In the River Basin Management Plan of the Republic of Croatia for the period 2016–2021 [26], each groundwater body was categorized as one, laterally and vertically, hydrogeological homogeneous unit. The four investigated groundwater bodies are located at the southern edge of the Pannonian basin, in the northern and the eastern part of Croatia (Figure 1).

*Water* **2020**, *12*, x FOR PEER REVIEW 4 of 26

**Figure 1.** Location of the investigated groundwater bodies in Croatia. **Figure 1.** Location of the investigated groundwater bodies in Croatia.

This area is characterized by vast plains of the Sava and Drava rivers, in which gravel and sand aquifers are formed at different depths. These aquifers are rich in water and represent the main water supply resource of the northern part of Croatia [26]. In the hilly area between Sava and Drava plains, small alluvial aquifers are contained in the catchment areas of large rivers tributaries, while spatially limited carbonate fissure and karst aquifers are found in the highest, mostly isolated parts of the hilly area. Table 1 lists the total thickness and average hydraulic conductivity of gravel and sand layers in each groundwater body as well as the area of four groundwater bodies analysed. This area is characterized by vast plains of the Sava and Drava rivers, in which gravel and sand aquifers are formed at different depths. These aquifers are rich in water and represent the main water supply resource of the northern part of Croatia [26]. In the hilly area between Sava and Drava plains, small alluvial aquifers are contained in the catchment areas of large rivers tributaries, while spatially limited carbonate fissure and karst aquifers are found in the highest, mostly isolated parts of the hilly area. Table 1 lists the total thickness and average hydraulic conductivity of gravel and sand layers in each groundwater body as well as the area of four groundwater bodies analysed.

**Table 1.** Groundwater body characteristics. Thickness and hydraulic conductivity data refer to gravel **Table 1.** Groundwater body characteristics. Thickness and hydraulic conductivity data refer to gravel and sand aquifers in groundwater bodies.


Groundwater body CDGI\_19 is in the western part of the Drava River plain (Figure 1). It is filled with thick gravel and sand layers separated by silt and clay interlayers and lenses. The groundwater body comprise two alluvial aquifers. The upper unconfined aquifer is of Quaternary age and is built

karst aquifers.

Groundwater body CDGI\_19 is in the western part of the Drava River plain (Figure 1). It is filled with thick gravel and sand layers separated by silt and clay interlayers and lenses. The groundwater body comprise two alluvial aquifers. The upper unconfined aquifer is of Quaternary age and is built of coarse-grained gravel and sand. The lower semi-confined aquifer is of Neogene age and has a higher amount of finer-grained sediments [27]. They are divided by a semipermeable silty to clayey layer of average thickness of several meters (Figure 2). *Water* **2020**, *12*, x FOR PEER REVIEW 5 of 26 of coarse-grained gravel and sand. The lower semi-confined aquifer is of Neogene age and has a higher amount of finer-grained sediments [27]. They are divided by a semipermeable silty to clayey layer of average thickness of several meters (Figure 2).

**Figure 2.** Conceptual model of the groundwater body CDGI\_19: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section. **Figure 2.** Conceptual model of the groundwater body CDGI\_19: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section.

A severe degradation of groundwater quality, due to intensive agricultural activity, is recorded in the upper unconfined aquifer [28]. Before the construction of hydropower plants and reservoirs on the Drava River in 1970s, the groundwater quality was within the limits of drinking water standards. Afterwards, nitrate concentrations increased significantly in the groundwater in the shallow aquifer, in which the nitrate concentrations exceeded the maximum permissible concentration in drinking water. The probable cause of groundwater pollution was an increase in the flow of contaminated groundwater from areas with intensive agricultural activity, due to changes in boundary conditions after the construction of reservoirs and drainage channels. A severe degradation of groundwater quality, due to intensive agricultural activity, is recorded in the upper unconfined aquifer [28]. Before the construction of hydropower plants and reservoirs on the Drava River in 1970s, the groundwater quality was within the limits of drinking water standards. Afterwards, nitrate concentrations increased significantly in the groundwater in the shallow aquifer, in which the nitrate concentrations exceeded the maximum permissible concentration in drinking water. The probable cause of groundwater pollution was an increase in the flow of contaminated groundwater from areas with intensive agricultural activity, due to changes in boundary conditions after the construction of reservoirs and drainage channels.

Groundwater body CDGI\_23 is in the eastern part of the Drava River plain (Figure 1). Sediments in the Drava River depression mostly originate from the Alp massif and to a lesser extent from Slavonian Mountains [29]. Here, coarse and fine-grained clastic sediments alternate laterally and vertically. The aquifer system is of Quaternary age and its thickness is more than 200 m, while thicknesses of single confined and semi-confined aquifers are from five to 50 m (Figure 3). Aquifers are predominantly composed of layers of medium to fine-grained sand in the western part of the groundwater body, while the fine-grained fraction prevails in the east. Sandy layers are separated by silt and clay interlayers and lenses. In the hilly area of the groundwater body, the rocks of the Middle Triassic carbonate complex, dolomites, dolomitic breccias and dolomitic limestones, form fissure and Groundwater body CDGI\_23 is in the eastern part of the Drava River plain (Figure 1). Sediments in the Drava River depression mostly originate from the Alp massif and to a lesser extent from Slavonian Mountains [29]. Here, coarse and fine-grained clastic sediments alternate laterally and vertically. The aquifer system is of Quaternary age and its thickness is more than 200 m, while thicknesses of single confined and semi-confined aquifers are from five to 50 m (Figure 3). Aquifers are predominantly composed of layers of medium to fine-grained sand in the western part of the groundwater body, while the fine-grained fraction prevails in the east. Sandy layers are separated by silt and clay interlayers and lenses. In the hilly area of the groundwater body, the rocks of the Middle Triassic carbonate complex, dolomites, dolomitic breccias and dolomitic limestones, form fissure and karst aquifers.

*Water* **2020**, *12*, x FOR PEER REVIEW 6 of 26

**Figure 3.** Conceptual model of the groundwater body CDGI\_23: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section. **Figure 3.** Conceptual model of the groundwater body CDGI\_23: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section.

Groundwater body CSGI\_29 is in the eastern part of the Sava River plain (Figure 1). The dominant factor on formation of the Sava River depression is transportation and deposition of eroded material by the River Sava tributaries. Rivers from Bosnia and Herzegovina deposited fan-shape coarse-grained clastic sediments in the peripheral areas of the depression. These sediments mostly originate from the Bosnian Mountains and to a lesser extent is of the Alpine provenance. The number of aquifers ranges from two to four, near the Sava River, to eleven in the northern part of the Sava depression (Figure 4). The thickness of the single aquifers is rarely greater than 30 m [29]. The aquifers near the Sava River, found from 20–70 m in depth, consist of gravel and sand layers, deposited during warm and humid interglacial periods, and alternate with fine sand, silt and clay, deposited during cold glacial periods in Pleistocene. Holocene deposits from 20 to 10 m in depth consist of silt, sand, and gravel, while uppermost deposits are fine grained, deposited by flooding of the Sava River and its tributaries and by erosion of loess plateau to the north. The aquifer thickness decreases from the Sava River to the north and the content of fine sediments, namely sand and silt, increases. Based on the measurement of tritium contents at locations far from the Sava River to the north, it is found that the relative mean residence time of groundwater is prior to the 1950s [30]. Groundwater body CSGI\_29 is in the eastern part of the Sava River plain (Figure 1). The dominant factor on formation of the Sava River depression is transportation and deposition of eroded material by the River Sava tributaries. Rivers from Bosnia and Herzegovina deposited fan-shape coarse-grained clastic sediments in the peripheral areas of the depression. These sediments mostly originate from the Bosnian Mountains and to a lesser extent is of the Alpine provenance. The number of aquifers ranges from two to four, near the Sava River, to eleven in the northern part of the Sava depression (Figure 4). The thickness of the single aquifers is rarely greater than 30 m [29]. The aquifers near the Sava River, found from 20–70 m in depth, consist of gravel and sand layers, deposited during warm and humid interglacial periods, and alternate with fine sand, silt and clay, deposited during cold glacial periods in Pleistocene. Holocene deposits from 20 to 10 m in depth consist of silt, sand, and gravel, while uppermost deposits are fine grained, deposited by flooding of the Sava River and its tributaries and by erosion of loess plateau to the north. The aquifer thickness decreases from the Sava River to the north and the content of fine sediments, namely sand and silt, increases. Based on the measurement of tritium contents at locations far from the Sava River to the north, it is found that the relative mean residence time of groundwater is prior to the 1950s [30].

*Water* **2020**, *12*, x FOR PEER REVIEW 7 of 26

**Figure 4.** Conceptual model of the groundwater body CSGI\_29: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section. **Figure 4.** Conceptual model of the groundwater body CSGI\_29: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section.

Chemical properties of deep groundwater in the eastern parts of the Drava and Sava plains are mainly controlled by natural geochemical processes, mostly by dissolution of carbonate and weathering of silicate minerals that form the particles of alluvial deposits [31]. Cation exchange processes, promoted by long groundwater residence time, can also significantly contribute solutes to groundwater. High concentrations of arsenic of natural origin are typical for the groundwater of the Pannonian Basin [32], where the type and geochemical composition of groundwater is result of lithological, sedimentological and palaeographic factors [33]. High arsenic content in groundwater in the eastern part of Croatia is mostly caused by reductive desorption of arsenic from iron oxides and/or clay minerals, reductive dissolution of iron oxides or competition for the sorption sites with organic matter and phosphate [29]. However, the influence of anthropogenic activity, namely agriculture and urbanisation, cause the deterioration of groundwater quality. In groundwater samples taken from observation wells located close to the Sava River, high content of nitrogen, potassium and chloride was periodically detected [30]. Groundwater body CSGN\_25 is in the hilly area between Sava and Drava plains (Figure 1). Low to medium permeability unconfined alluvial aquifers of Quaternary age are found in the lowland part Chemical properties of deep groundwater in the eastern parts of the Drava and Sava plains are mainly controlled by natural geochemical processes, mostly by dissolution of carbonate and weathering of silicate minerals that form the particles of alluvial deposits [31]. Cation exchange processes, promoted by long groundwater residence time, can also significantly contribute solutes to groundwater. High concentrations of arsenic of natural origin are typical for the groundwater of the Pannonian Basin [32], where the type and geochemical composition of groundwater is result of lithological, sedimentological and palaeographic factors [33]. High arsenic content in groundwater in the eastern part of Croatia is mostly caused by reductive desorption of arsenic from iron oxides and/or clay minerals, reductive dissolution of iron oxides or competition for the sorption sites with organic matter and phosphate [29]. However, the influence of anthropogenic activity, namely agriculture and urbanisation, cause the deterioration of groundwater quality. In groundwater samples taken from observation wells located close to the Sava River, high content of nitrogen, potassium and chloride was periodically detected [30].

of the groundwater body (Figure 5). These spatially limited water-bearing layers of small thicknesses alternate with loose and unsorted silt and sandy clay sediments of low permeability. The total thickness of the Quaternary deposits in the lowland part of the groundwater body is 40 to 130 m. To a lesser extent, conglomerate, breccia, dolomite and limestone fissure, and cavernous aquifers of Triassic age have been identified in the hilly part of the groundwater body [34]. A small depth to groundwater and low protection capability of discontinuous aquitards in the lowland area make the groundwater body moderately to highly vulnerable to agricultural pollution. Groundwater body CSGN\_25 is in the hilly area between Sava and Drava plains (Figure 1). Low to medium permeability unconfined alluvial aquifers of Quaternary age are found in the lowland part of the groundwater body (Figure 5). These spatially limited water-bearing layers of small thicknesses alternate with loose and unsorted silt and sandy clay sediments of low permeability. The total thickness of the Quaternary deposits in the lowland part of the groundwater body is 40 to 130 m. To a lesser extent, conglomerate, breccia, dolomite and limestone fissure, and cavernous aquifers of Triassic age have been identified in the hilly part of the groundwater body [34]. A small depth to groundwater and low protection capability of discontinuous aquitards in the lowland area make the groundwater body moderately to highly vulnerable to agricultural pollution.

*Water* **2020**, *12*, x FOR PEER REVIEW 8 of 26

**Figure 5.** Conceptual model of the groundwater body CSGN\_25: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section. **Figure 5.** Conceptual model of the groundwater body CSGN\_25: (**a**) Planar extent of the groundwater body; (**b**) Schematic 3D hydrogeological cross-section.

#### *2.2. Available Data Set 2.2. Available Data Set*

The UL of each selected substance was estimated by groundwater chemistry data obtained from the national database of Croatian Waters: "National monitoring program of groundwater quality". Data from observations wells, included in the national monitoring program, have been used for this purpose. Nine observation wells have been used from the groundwater body CDGI\_19, twenty-six from the groundwater body CDGI\_23, fifteen from the groundwater body CSGI\_29, and ten from the groundwater body CSGN\_25 (Figures 2–5). All observation wells are attributed to unconsolidated sand and gravel aquifers at different depths in selected groundwater bodies. In the groundwater body CSGN\_25, well screens are positioned only at shallow depths to monitor groundwater quality of unconfined alluvial aquifers. Four data sets for chemical substances, one per groundwater body, are evaluated for the period The UL of each selected substance was estimated by groundwater chemistry data obtained from the national database of Croatian Waters: "National monitoring program of groundwater quality". Data from observations wells, included in the national monitoring program, have been used for this purpose. Nine observation wells have been used from the groundwater body CDGI\_19, twenty-six from the groundwater body CDGI\_23, fifteen from the groundwater body CSGI\_29, and ten from the groundwater body CSGN\_25 (Figures 2–5). All observation wells are attributed to unconsolidated sand and gravel aquifers at different depths in selected groundwater bodies. In the groundwater body CSGN\_25, well screens are positioned only at shallow depths to monitor groundwater quality of unconfined alluvial aquifers.

from 2007 to 2017. Five chemical substances were chosen for further analysis: arsenic (As), sulphate (SO4), chloride (Cl), nitrate (NO3), and iron (Fe). The UL were estimated for substances that may be Four data sets for chemical substances, one per groundwater body, are evaluated for the period from 2007 to 2017. Five chemical substances were chosen for further analysis: arsenic (As), sulphate (SO4), chloride (Cl), nitrate (NO3), and iron (Fe). The UL were estimated for substances that may be due to natural and anthropogenic conditions. The EU Groundwater Directive (2006/118/EC), in Annex II, specifically lists substances that occur naturally and under the human influence, as well

as pollution indicators, which need to be considered when setting national groundwater quality standards. This group includes arsenic, sulphate and chloride, as well as nitrate, for which the Groundwater Directive sets Community criteria for the assessment of the chemical status of bodies of groundwater (50 mg NO3/L). The EU research project "Background Criteria for Identification of Groundwater Thresholds (BRIDGE)", funded under the 6th Framework Program of the European Union, proposed that the determination of background concentrations of substances is carried out for important pollutants that occur as a result of natural conditions and for certain characteristic substances, such as iron, which may occur in elevated concentrations due to human activity. Hence, all selected substances are to be considered for the assessment of groundwater body chemical status, according to EU and Croatian regulations and guidelines.

Table 2 shows that arsenic, sulphate, and nitrate data contain a high proportion of <LOQ values. In addition, standard deviation and coefficient of variation show high to moderate data variability for selected substances.


**Table 2.** Main statistics for analysed substances.

Note: \* Statistics for substances after exclusion of contaminated samples based on criteria for the pre-selection method.

It is worth noting that two selected substances frequently occur in higher concentrations than reference values stipulated by EU and Croatian regulations. From Table 2, it is evident that average values of arsenic and iron in some groundwater bodies exceed maximum permissible levels for drinking water (10 µg As/L; 200 µg Fe/L).

#### *2.3. Description of Methods*

The EU Groundwater Directive 2006/118/EC specifically defines background values as "concentration of a substance or the value of an indicator in a body of groundwater corresponding to no, or only very minor, anthropogenic alterations to undisturbed conditions". It further stipulates that the estimation of background values must be based on the characterization of groundwater bodies and on the results of groundwater monitoring. In cases where limited monitoring data are available, additional data need to be collected and background values must be determined by a simplified approach based on these limited monitoring data, considering geochemical reactions and processes in the groundwater system. Accordingly, the methodology for the estimation of ambient background values of substances must be as robust as possible to enable reliable evaluation of the chemical status and groundwater quality risk assessment, according to requirements of the Directive 2000/60/EC.

In this research, model-based objective methods were considered for the estimation of background values, which are based on the approach that a background population in a geological environment has a characteristic probability density function that results from the summation of natural processes that produce the background population. They differ from other methods, e.g., model-based subjective methods, in that the boundary between background and anomalous populations is defined by the data themselves rather than by an arbitrary decision of the researcher [23].

Well-known model-based objective methods have been selected for further testing, the probability plot (PP), the Lepeltier method, the iterative 2-σ technique, and the calculated distribution function. A statistical simulation study was conducted, which compared the reliability and robustness of these methods based on common criteria [25]. Since the lognormal distribution is frequently assumed for the background population [2,8,14,21], lognormal distribution parameters of hypothetical ambient and non-ambient populations were randomly selected using computer-generated values. The size of mixed population used in simulation study was predefined at 30, 100, 300, and 1000, assuming small, medium, or large number of groundwater samples. The mixing factor, i.e., the proportion of values belonging to the ambient and non-ambient distributions, was randomly selected from the uniform distribution, so that the percentages of ambient population vary from 30 to 70% of the mixed population. A proportion of <LOQ values in a hypothetical data set was set at 0%, 1%, 5%, 10%, 15%, 20%, 25%, and 30% and all <LOQ values were either discarded or were substituted with LOQ values or with randomly selected values from the uniform distribution from 0 to LOQ values. The simulation study revealed that PP and the Lepeltier method have the lowest relative and absolute error of the estimate of background values. The Lepeltier method gives better estimates than PP for small data sets with N < 100 and if the proportion of <LOQ values is between 20% and 30% [25].

The probability plot (PP) approach assumes that different processes generate data that have different probability distributions that can overlap, i.e., a certain part of the measurement range can be covered by multiple distributions, which can differ in parameters, while all distributions belong to the same distribution family. For example, it is possible that background and/or non-background processes result in data that can be described by normal or lognormal distribution, but with different parameters.

The aim of PP is to try to identify points that separate multiple distributions, i.e., the value to which the influence of one process is dominant and after that value the influence of another process grows stronger. If there is a partial overlap of background and non-background distributions, then a change in distributions can be seen on the probability graph as an inflection point, i.e., the point where the graph changes from concave to convex or vice versa. The concentration at the inflection point is defined as the threshold value (the upper limit of the range of natural concentrations), below which all measured values of the substance belong to background concentrations [6,23]. If background values follow a lognormal distribution, then the lognormal probability graph in its initial part should be very similar to a straight line. At the point of inflection, there is a visible deviation of the graph from the straight line and near that point a value is expected after which more and more influence has another process that generates non-background values.

In this paper, we have visually identified an inflection point below which all measured values of each analysed substance belong to ambient background concentrations. We have applied the following procedure to construct the probability graph using lognormal distribution:


D'Agostino and Stephens consider that 25 is a minimum number of data for reliable use of PP [35]. Other researchers consider an approach where the lower limit is 100 data, below which the probability graph show significant deviation from normal distribution [4,23]. If a number of data (N) is less than 100, then the graph may look jagged or exhibit nonlinear behaviour, which can increase the likelihood of erroneous readings and making the wrong conclusion. In this paper, we have applied PP to determine the upper limit of ambient background concentrations if N > 100 and, based on the results of statistical simulation study [25], if the proportion of <LOQ values is ≤20%.

The advantage of the PP approach is that it enables the identification of multiple populations, which are determined on the graph by inflection points. In addition, each data value is observable on graph and extreme values can be clearly detected as single values [36]. A limitation in the application is that ambient and non-ambient distributions must be assumed a priori, most often lognormal, although researchers also use normal, gamma, and other distributions [4,8,14,37].

The Lepeltier method is a graphical method that analyzes the cumulative sum on a graph with logarithmic scales [24]. In his original work, Lepeltier suggested a "reverse procedure" for cumulative frequency calculation, i.e., cumulation from high to low values [24]. The aim was to overcome the problem of plotting the highest value at 100% on a probability scale and to compensate the analytical imprecision at the lower end where the cumulation starts, particularly if the proportion of <LOQ values is high in a data set [38]. Ashley and Keith [39] modified the Lepeltier approach by computing log estimators of the central value and dispersion rather than using unadjusted means and deviations obtained graphically.

The advantage of the Lepeltier method is that it enable the identification of ambient background values for relatively small data sets. A limitation in the application is that care must be taken to avoid the temptation to accept the visual deviations at the lower part of the curve as significant if values are close to the detection limit and on a highly magnified scale [38].

In this paper, cumulative relative frequencies have been graphically evaluated in double logarithmic scale graphs. A point is sought on a graph at which a significant visual alteration of the slope shows as a bend in a curve. In the case of finding such a value, for example the value of *x*, then all values less than or equal to *x* are further considered for the calculation of the upper limit of ambient background concentrations. Similar to Ashley and Keith [39], we have modified the Lepeltier method by computing both Mean + 2SD, following the approach described by Matschullat et al. [1], and Median + 2MAD, to compare results of different estimators of central value and spread of the data distribution. The advantage of the MAD estimator in calculating the upper background concentration, i.e., threshold value between ambient and anomalous populations, is that it is much less influenced by skewed data or outliers than other, less robust, spread estimators [36,40].

Although it is difficult to give a precise limit for the minimum number of data required for reliable analysis by this method, in his original work Lepeltier states that it is necessary to analyze a minimum of 50 data, with which meaningful results can be expected [24]. If the number of data is less than 50, then it can be difficult to visually determine the exact location on the graph where the distribution changes, i.e., where the graph has unexpectedly changed appearance, because the points in this case may be too spaced or very localized around one or more points. Accordingly, we have applied the modified Lepeltier method if N > 50 and if the proportion of <LOQ values is ≤30%.

The pre-selection method, developed under the EU research project BRIDGE [13], can be applied for estimation of background concentrations of substances in cases of limited data quality. Since 2008, numerous researchers have used this method [2,15–18]. It is based on the assumption that selected indicators can give a good insight into whether a sample is contaminated or not. In cases where concentrations of indicators exceed a predefined value, the sample is considered contaminated and excluded from the estimation of background concentration. In this paper, the following exclusion criteria have been applied: (a) ion balance error more than ±10%; (b) the sum of chloride and sodium higher than 1000 mg/L (salt or brackish water); (c) NO<sup>3</sup> > 50 mg/L or active substances in pesticides > 0.1 µg/L (>0.5 µg/L for total pesticides) or sum trichloroethylene and tetrachloroethylene > 10 µg/L, in line with the provision of the EU Groundwater Directive (2006/118/EC); and (d) anaerobic samples (DO < 1 mg/L), following the approach described in a previous study [18]. From the resulting data set per groundwater body (Table 2), the upper limit of background concentrations can be expressed as a 70th, 90th, or 95th percentile of the remaining data range, indicating appropriate confidence level, using the following procedure: (a) 95th percentile if N > 30; (b) 90th percentile if 20 < N < 30; and (c) 70th percentile if 10 < N < 20.

The advantage of the pre-selection method is that it can be applied if data does not allow for derivation of natural background levels by more advanced methods [18]. The limitation of this method is that the boundary between background and non-background population is defined by an arbitrary decision. It belongs to model-based subjective methods of background determination that include some type of formal statistical model to a set of selected geochemical values, making no assumptions about the form of the data distribution [11]. This characteristic makes the pre-selection method less sensitive to high proportion of LOQ values in a data set than model-based objective methods. In this paper, the pre-selection method was applied if other methods were not applicable and if, after exclusion of contaminated samples, the proportion of <LOQ values is ≤50%. The pre-selection method was also used in several cases to compare the results of the estimation of ambient background values obtained by other methods.

#### **3. Results and Discussion**

This section presents the results of an estimation of upper limits of ranges of ambient background values (UL) for selected substances in considered groundwater bodies. Methods for background estimation have been applied based on the criteria described in Section 2.3. As shown in Table 3, the modified Lepeltier method and the pre-selection method were more frequently used than the probability plot. The pre-selection method has been additionally used for arsenic (groundwater body CDGI\_23), iron (groundwater body CDGI\_19) and sulphate (groundwater body CSGI\_29), to compare results with those obtained by the modified Lepeltier method. For As (groundwater bodies CDGI\_19 and CSGI\_29) and NO<sup>3</sup> (groundwater bodies CDGI\_23 and CSGI\_29), the proportion of <LOQ values was higher than 50%, hence UL were not estimated by either method.


**Table 3.** Methods selected for calculating the upper limits of ranges of ambient background concentrations of analysed substances based on criteria described in Section 2.3.

Note: \* For comparison purpose only.

## *3.1. Arsenic*

Table 4 shows the results obtained by applying the modified Lepeltier method and the pre-selection method to arsenic data representative for two groundwater bodies (CDGI\_23 and CSGN\_25). It can be seen that UL estimates for these two bodies, obtained by the 95th percentiles for the pre-selected data set, differ by an order of magnitude. High value of the UL estimate obtained for the body CDGI\_23, which significantly exceeds the EU drinking water standard for arsenic, can be associated with the chemical composition of deep groundwater, which is controlled by natural geochemical processes that contribute high amount of solutes to groundwater [29]. It has been shown that arsenic occur naturally in very high concentrations in unconsolidated aquifers at high depth in the eastern part of the Drava River plain and can vary due to local hydrogeological and geochemical conditions in aquifers [41,42]. In the groundwater body CDGI\_23, a high value of the coefficient of variation (1.8) for arsenic is noted (Table 2), indicating significant variability of the chemical composition in different parts of the groundwater body.

**Table 4.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of arsenic (As) obtained by selected methods. EU Drinking Water Standard for As is 10 µg/L.


Note: \* UL expressed as 95th percentiles.

Figure 6 depicts the relative cumulative frequencies of arsenic for the groundwater body CDGI\_23. High percentage of <LOQ values (27.3%) can be clearly seen on the graph at the lower end of data distribution, while a slight bend in the curve can be visually identified at the upper end of data distribution.

*Water* **2020**, *12*, x FOR PEER REVIEW 14 of 26

**Figure 6.** Cumulative relative frequencies of arsenic (As) for the groundwater body CDGI\_23. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges. **Figure 6.** Cumulative relative frequencies of arsenic (As) for the groundwater body CDGI\_23. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges.

The UL estimate for the body CDGI\_23 obtained by Mean + 2SD is significantly higher compared to the Median + 2MAD estimate. The Mean + 2SD estimate is lower, but comparable to the estimate obtained by the pre-selection method. The Median + 2MAD estimate is an order of magnitude lower than the UL estimate obtained by the pre-selection method (Table 4). The UL estimate for the body CDGI\_23 obtained by Mean + 2SD is significantly higher compared to the Median + 2MAD estimate. The Mean + 2SD estimate is lower, but comparable to the estimate obtained by the pre-selection method. The Median + 2MAD estimate is an order of magnitude lower than the UL estimate obtained by the pre-selection method (Table 4).

#### *3.2. Iron 3.2. Iron*

CSGI\_29

modified Lepeltier

Iron is a highly redox-sensitive element that has low background concentrations in unconfined aquifers, where oxygen is present, but background concentrations of iron can be increased significantly across redox boundaries [9]. It is evident from Table 5 that estimated UL for analysed groundwater bodies vary over one to two orders of magnitude, which is consistent with findings of background concentration ranges of iron in groundwater across Europe [43]. The highest estimates obtained for bodies CDGI\_23 and CSGI\_29 are associated with the highest average depth of aquifers. Iron is a highly redox-sensitive element that has low background concentrations in unconfined aquifers, where oxygen is present, but background concentrations of iron can be increased significantly across redox boundaries [9]. It is evident from Table 5 that estimated UL for analysed groundwater bodies vary over one to two orders of magnitude, which is consistent with findings of background concentration ranges of iron in groundwater across Europe [43]. The highest estimates obtained for bodies CDGI\_23 and CSGI\_29 are associated with the highest average depth of aquifers. The lowest values are noted for the unconfined aquifer within the body CDGI\_19.

The lowest values are noted for the unconfined aquifer within the body CDGI\_19. **Table 5.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of iron (Fe) obtained by selected methods. EU Drinking Water Standard for Fe is 200 µg/L. **GW Median +**  Very high UL estimates (Table 5) and mean values (Table 2), detected for groundwater bodies CDGI\_23 and CSGI\_29, by far exceed EU drinking water standard for iron and indicate fast and pronounced reductive dissolution of iron species in anoxic groundwater. It is well known that water-quality thresholds may be frequently breached for iron, which occur in groundwater by natural processes, such as the geochemical conditions existing in the aquifer or due to the specific geology of the area [44].

**Body Method Mean + 2SD (µg/L) 2MAD (µg/L) UL (µg/L)**  CDGI\_19 modified Lepeltier method 9.5 + 16.2 5.6 + 6.6 25.7 (Mean + 2SD) 12.2 (Median + 2MAD) pre-selection method \* - 47.5 CDGI\_23 modified Lepeltier method 1156.4 + 2239.8 712.0 + 1421.4 3396.2 (Mean + 2SD) 2133.4 (Median + 2MAD) Table 5 shows that UL estimates for iron, obtained by the probability plot, are higher but comparable with those obtained by the Mean + 2SD. Median + 2MAD estimates are significantly lower in comparison with other methods. Exception is noted for the body CDGI\_23, for which all estimates are the same order of magnitude, with Mean + 2SD estimate being the highest one. Tables 2 and 5 suggest that: (a) Mean + 2SD and probability plot estimates match well for highly variable data (coefficient of variation > 1), (b) Median + 2MAD and probability plot estimates are comparable for moderate to low data variability (coefficient of variation ≤ 1). It is noted that with higher data variability a difference between Mean + 2SD and Median + 2MAD estimates increases.

probability plot - 1950.0

probability plot - 4270.0

1443.1 (Mean + 2SD)

113.0 (Median + 2MAD)

method 301.2 + 1141.8 39.0 + 74.0


**Table 5.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of iron (Fe) obtained by selected methods. EU Drinking Water Standard for Fe is 200 µg/L. CSGN\_25 modified Lepeltier method 83.2 + 178.8 30.1 + 56.2 262.0 (Mean + 2SD) 86.3 (Median + 2MAD)

*Water* **2020**, *12*, x FOR PEER REVIEW 15 of 26

Note: \* UL expressed as 95th percentiles. for moderate to low data variability (coefficient of variation ≤ 1). It is noted that with higher data

variability a difference between Mean + 2SD and Median + 2MAD estimates increases.

Figure 7 depicts the cumulative relative frequencies of iron. Significant visual deviation at the lower end of curve from the main body of data is identified for groundwater bodies CDGI\_19 and CSGN\_25. These observed deviations are related to analytical imprecision due to high percentage of <LOQ values in data set, which is particularly evident for the body CDGI\_19 (27.3%). Small alterations of the slope at the upper end of data distribution can be visually identified for all plots. Figure 7 depicts the cumulative relative frequencies of iron. Significant visual deviation at the lower end of curve from the main body of data is identified for groundwater bodies CDGI\_19 and CSGN\_25. These observed deviations are related to analytical imprecision due to high percentage of <LOQ values in data set, which is particularly evident for the body CDGI\_19 (27.3%). Small alterations of the slope at the upper end of data distribution can be visually identified for all plots.

**Figure 7.** Cumulative relative frequencies of iron (Fe) for analysed groundwater bodies. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body. **Figure 7.** Cumulative relative frequencies of iron (Fe) for analysed groundwater bodies. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body.

Several inflection points can be identified on the lognormal plot for each groundwater body (Figure 8), which correspond to thresholds between different natural and/or anthropogenic populations. Due to high sensitivity of iron on abrupt changes across redox boundaries, it is difficult to conclude in this particular case the causes of the appearance of multiple inflection points on data Several inflection points can be identified on the lognormal plot for each groundwater body (Figure 8), which correspond to thresholds between different natural and/or anthropogenic populations. Due to high sensitivity of iron on abrupt changes across redox boundaries, it is difficult to conclude in this particular case the causes of the appearance of multiple inflection points on data plots, in terms of

plots, in terms of whether they are a consequence of natural processes or are the result of direct or indirect anthropogenic impacts, which may be reflected, e.g., through intensive water abstraction.

whether they are a consequence of natural processes or are the result of direct or indirect anthropogenic *Water*  impacts, which may be reflected, e.g., through intensive water abstraction. **2020**, *12*, x FOR PEER REVIEW 16 of 26

**Figure 8.** Lognormal probability plot of iron (Fe) for analysed groundwater bodies. Arrow indicates inflection point that separate ambient and non-ambient populations for each groundwater body. **Figure 8.** Lognormal probability plot of iron (Fe) for analysed groundwater bodies. Arrow indicates inflection point that separate ambient and non-ambient populations for each groundwater body.

#### *3.3. Sulphate 3.3. Sulphate*

Table 6 summarizes the main results of UL values for sulphate. An increase of the UL estimate is observed for deep confined aquifers in body CDGI\_23, while significantly lower values are attributed to unconfined and semi-confined aquifers within water bodies CDGI\_19 and CSGI\_29. The highest value of the coefficient of variation (2.4) is noted for the water body CDGI\_23 (Table 2), which indicates a high sensitivity of sulphate to changes of geochemical conditions within the aquifer system. It is well known that high concentrations of sulphate may be triggered by dissolution of minerals that control its natural abundance in water or by various land use [9]. EU Groundwater Directive (2006/118/EC) specifically states sulphate as indicator of the saline intrusion resulting from Table 6 summarizes the main results of UL values for sulphate. An increase of the UL estimate is observed for deep confined aquifers in body CDGI\_23, while significantly lower values are attributed to unconfined and semi-confined aquifers within water bodies CDGI\_19 and CSGI\_29. The highest value of the coefficient of variation (2.4) is noted for the water body CDGI\_23 (Table 2), which indicates a high sensitivity of sulphate to changes of geochemical conditions within the aquifer system. It is well known that high concentrations of sulphate may be triggered by dissolution of minerals that control its natural abundance in water or by various land use [9]. EU Groundwater Directive (2006/118/EC) specifically states sulphate as indicator of the saline intrusion resulting from human activities.

human activities. **Table 6.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the **Table 6.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of sulphate (SO<sup>4</sup> ) obtained by selected methods. EU Drinking Water Standard for SO<sup>4</sup> is 250 mg/L.


pre-selection method \* - 44.5 Note: \* UL expressed as 95th percentiles.

Note: \* UL expressed as 95th percentiles.

CSGN\_25 pre-selection method \* - 87.3

An increase of the sulphate concentration due to human impact is apparent in unconfined

An increase of the sulphate concentration due to human impact is apparent in unconfined aquifers within water body CSGN\_25. Estimated UL in bodies CSGI\_29 and CSGN\_25, obtained by the 95th percentiles for the pre-selected data set (Table 6), differ by two times, which can be associated with pronounced human impact from household diffuse pressure or water abstraction in the body CSGN\_25. *Water* **2020**, *12*, x FOR PEER REVIEW 17 of 26 with pronounced human impact from household diffuse pressure or water abstraction in the body CSGN\_25. *Water* **2020**, *12*, x FOR PEER REVIEW 17 of 26

Figure 9 depicts cumulative relative frequencies of sulphate for groundwater bodies CDGI\_19 and CSGI\_29. The jagged appearance of the cumulative relative frequency graph at the lower end of CSGI\_29 data distribution can be attributed to the high percentage of <LOQ values (23.1%). A significant alteration of the slope at the upper end of CDGI\_19 data distribution can be clearly identified on the cumulative relative frequency graph. Figure 9 depicts cumulative relative frequencies of sulphate for groundwater bodies CDGI\_19 and CSGI\_29. The jagged appearance of the cumulative relative frequency graph at the lower end of CSGI\_29 data distribution can be attributed to the high percentage of <LOQ values (23.1%). A significant alteration of the slope at the upper end of CDGI\_19 data distribution can be clearly identified on the cumulative relative frequency graph. with pronounced human impact from household diffuse pressure or water abstraction in the body CSGN\_25. Figure 9 depicts cumulative relative frequencies of sulphate for groundwater bodies CDGI\_19 and CSGI\_29. The jagged appearance of the cumulative relative frequency graph at the lower end of CSGI\_29 data distribution can be attributed to the high percentage of <LOQ values (23.1%). A

**Figure 9.** Cumulative relative frequencies of sulphate (SO4) for groundwater bodies CDGI\_19 and CSGI\_29. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body. **Figure 9.** Cumulative relative frequencies of sulphate (SO<sup>4</sup> ) for groundwater bodies CDGI\_19 and CSGI\_29. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body. **Figure 9.** Cumulative relative frequencies of sulphate (SO4) for groundwater bodies CDGI\_19 and CSGI\_29. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body.

In the lognormal plot (Figure 10), multiple inflection points can be identified at the middle part and at the upper end of CDGI\_19 data distribution, which denote thresholds between several natural and/or anthropogenic populations in groundwater of this shallow alluvial aquifer system. In the lognormal plot (Figure 10), multiple inflection points can be identified at the middle part and at the upper end of CDGI\_19 data distribution, which denote thresholds between several natural and/or anthropogenic populations in groundwater of this shallow alluvial aquifer system. In the lognormal plot (Figure 10), multiple inflection points can be identified at the middle part and at the upper end of CDGI\_19 data distribution, which denote thresholds between several natural and/or anthropogenic populations in groundwater of this shallow alluvial aquifer system.

**Figure 10.** Lognormal probability plot of sulphate (SO4) for the groundwater body CDGI\_19. Arrow indicates inflection point that separate ambient and non-ambient populations. **Figure 10.** Lognormal probability plot of sulphate (SO<sup>4</sup> ) for the groundwater body CDGI\_19. Arrow indicates inflection point that separate ambient and non-ambient populations.

**Figure 10.** Lognormal probability plot of sulphate (SO4) for the groundwater body CDGI\_19. Arrow

An interesting observation is noted comparing results of the modified Lepeltier method and the probability plot for the groundwater body CDGI\_19 (Table 6). Both Mean + 2SD and Median + 2MAD

indicates inflection point that separate ambient and non-ambient populations.

An interesting observation is noted comparing results of the modified Lepeltier method and the probability plot for the groundwater body CDGI\_19 (Table 6). Both Mean + 2SD and Median + 2MAD UL estimates are directly comparable, but also very similar to the UL estimate obtained with the probability plot. It appears that all UL estimates match well in the case of the low data variability (coefficient of variation < 1).

#### *3.4. Chloride*

Chloride is an inert and mobile compound, which natural amount depends on the geographical location, distance to sea, and amount of precipitation, but also on the regional influence of saline water inputs to the groundwater [9]. The average concentrations of chloride in groundwater of analysed water bodies (Table 2, 6.4 to 16.4 mg/L) are consistent with findings of Thunqvist [45], who stated that natural mean concentrations of chloride in groundwater vary between 10–15 mg/L. Multiple natural and anthropogenic sources of chlorides cause great variability of ambient background concentrations across Europe. The EU research project BRIDGE revealed that background values for chloride in the groundwater of Europe range from 6 to 548 mg/L [18].

Due to low percentage of <LOQ data, UL were estimated only by the modified Lepeltier method and probability plot. Close inspection of Table 7 indicates that UL estimates, obtained by different methods and across water bodies, are directly comparable. A slight increase of UL estimates for the shallow aquifers in the body CSGN\_25 can be attributed to direct (salt used on roads to remove ice at low air temperatures, leakage from sewage, fertilizers), or indirect (water abstraction) anthropogenic impacts.


**Table 7.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of chloride (Cl) obtained by selected methods. EU Drinking Water Standard for Cl is 250 mg/L.

Note: \* UL expressed as 95th percentiles.

Figures 11 and 12 depict cumulative relative frequencies and lognormal plots of chlorides for analysed groundwater bodies. Cumulative relative frequency graph show distinct bend in a curve at the upper end of data distribution (Figure 11). Lognormal plots indicate curved data distributions, but visible changes at inflection points can be easily detected, indicating the existence of multiple natural populations as well as, in the case of shallow aquifers, anthropogenic populations at the upper end of data distribution (Figure 12).

*Water* **2020**, *12*, x FOR PEER REVIEW 19 of 26

**Figure 11.** Cumulative relative frequencies of chloride (Cl) for analysed groundwater bodies. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body. **Figure 11.** Cumulative relative frequencies of chloride (Cl) for analysed groundwater bodies. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body. **Figure 11.** Cumulative relative frequencies of chloride (Cl) for analysed groundwater bodies. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges for each groundwater body.

**Figure 12.** Lognormal probability plot of chloride (Cl) for analysed groundwater bodies. Arrow indicates inflection point that separates ambient and non-ambient populations for each groundwater **Figure 12.** Lognormal probability plot of chloride (Cl) for analysed groundwater bodies. Arrow indicates inflection point that separates ambient and non-ambient populations for each groundwater **Figure 12.** Lognormal probability plot of chloride (Cl) for analysed groundwater bodies. Arrow indicates inflection point that separates ambient and non-ambient populations for each groundwater body.

#### body. body. *3.5. Nitrate*

*3.5. Nitrate*  Nitrate is frequently detected in high concentrations in unconfined alluvial aquifers in Croatia [46], while its origin, in some cases, can be linked with the existence of elevated chloride and sulphate concentrations, particularly in urban areas [47]. Increased nitrate concentrations in groundwater body CDGI\_19, which occasionally exceed EU drinking water standard (50 mg NO3/L), were attributed to agricultural activities, sewage leaks and household discharges not connected to sewerage systems *3.5. Nitrate*  Nitrate is frequently detected in high concentrations in unconfined alluvial aquifers in Croatia [46], while its origin, in some cases, can be linked with the existence of elevated chloride and sulphate concentrations, particularly in urban areas [47]. Increased nitrate concentrations in groundwater body CDGI\_19, which occasionally exceed EU drinking water standard (50 mg NO3/L), were attributed to agricultural activities, sewage leaks and household discharges not connected to sewerage systems Nitrate is frequently detected in high concentrations in unconfined alluvial aquifers in Croatia [46], while its origin, in some cases, can be linked with the existence of elevated chloride and sulphate concentrations, particularly in urban areas [47]. Increased nitrate concentrations in groundwater body CDGI\_19, which occasionally exceed EU drinking water standard (50 mg NO3/L), were attributed to agricultural activities, sewage leaks and household discharges not connected to sewerage systems [26]. From Table 2 it is clear that the mean value of nitrate in the body CDGI\_19 is significantly higher than mean values recorded in other bodies.

[26]. From Table 2 it is clear that the mean value of nitrate in the body CDGI\_19 is significantly higher than mean values recorded in other bodies. Table 8 shows results obtained by used methods to nitrate data representative for two groundwater bodies (CDGI\_19 and CSGN\_25). An increase in the UL estimates for the shallow [26]. From Table 2 it is clear that the mean value of nitrate in the body CDGI\_19 is significantly higher than mean values recorded in other bodies. Table 8 shows results obtained by used methods to nitrate data representative for two groundwater bodies (CDGI\_19 and CSGN\_25). An increase in the UL estimates for the shallow aquifers in groundwater body CDGI\_19, obtained by the modified Lepeltier method and probability Table 8 shows results obtained by used methods to nitrate data representative for two groundwater bodies (CDGI\_19 and CSGN\_25). An increase in the UL estimates for the shallow aquifers in groundwater body CDGI\_19, obtained by the modified Lepeltier method and probability plot, can be associated to deterioration of groundwater quality due to prolonged human influence.

aquifers in groundwater body CDGI\_19, obtained by the modified Lepeltier method and probability plot, can be associated to deterioration of groundwater quality due to prolonged human influence.

plot, can be associated to deterioration of groundwater quality due to prolonged human influence.

Close inspection of Table 8 indicates that the UL estimate for the body CSGN\_25, obtained by the 95th percentiles for the pre-selected data set, is even higher than estimates obtained for the body CDGI\_19. This is not consistent with the average nitrate concentrations observed in these two groundwater bodies (Table 2). However, it is noted that 5% of samples from the pre-selected data set for the body CSGN\_25, with high values of nitrate (>30 mg NO3/L), significantly deviate from the majority of data characterized by the high percentage of <LOQ values in data set, thus having a strong influence on the calculation of the UL estimate. Close inspection of Table 8 indicates that the UL estimate for the body CSGN\_25, obtained by the 95th percentiles for the pre-selected data set, is even higher than estimates obtained for the body CDGI\_19. This is not consistent with the average nitrate concentrations observed in these two groundwater bodies (Table 2). However, it is noted that 5% of samples from the pre-selected data set for the body CSGN\_25, with high values of nitrate (>30 mg NO3/L), significantly deviate from the majority of data characterized by the high percentage of <LOQ values in data set, thus having a strong influence on the calculation of the UL estimate.

**Table 8.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of nitrates (NO<sup>3</sup> ) obtained by selected methods. EU Drinking Water Standard for NO<sup>3</sup> is 50 mg NO<sup>3</sup> /L. **Table 8.** Estimated Mean + 2SD and Median + 2MAD ranges (for modified Lepeltier method) and the upper limits of ranges of ambient background concentrations (UL) of nitrates (NO3) obtained by selected methods. EU Drinking Water Standard for NO3 is 50 mg NO3/L.


Note: \* UL expressed as 95th percentiles. Figures 13 and 14 depict relative cumulative frequencies and lognormal plot of nitrate for the groundwater body CDGI\_19. Cumulative relative frequency graph (Figure 13) shows significant

Figures 13 and 14 depict relative cumulative frequencies and lognormal plot of nitrate for the groundwater body CDGI\_19. Cumulative relative frequency graph (Figure 13) shows significant bend in the curve at the upper end of data distribution. In the lognormal plot (Figure 14), multiple inflection points can be identified at the middle part and at the upper end of data distribution, indicating the existence of natural and probably multiple anthropogenic populations, which is consistent with observed contribution of nitrate from multiple anthropogenic sources in the body CDGI\_19. bend in the curve at the upper end of data distribution. In the lognormal plot (Figure 14), multiple inflection points can be identified at the middle part and at the upper end of data distribution, indicating the existence of natural and probably multiple anthropogenic populations, which is consistent with observed contribution of nitrate from multiple anthropogenic sources in the body CDGI\_19.

**Figure 13.** Cumulative relative frequencies of nitrate (NO3) for the groundwater body CDGI\_19. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges. **Figure 13.** Cumulative relative frequencies of nitrate (NO<sup>3</sup> ) for the groundwater body CDGI\_19. Arrow indicates value **x** considered for the calculation of Mean + 2SD and Median + 2MAD ranges.

*Water* **2020**, *12*, x FOR PEER REVIEW 21 of 26

**Figure 14.** Lognormal probability plot of nitrate (NO3) for the groundwater body CDGI\_19. Arrow indicates inflection point that separates ambient and non-ambient populations. **Figure 14.** Lognormal probability plot of nitrate (NO<sup>3</sup> ) for the groundwater body CDGI\_19. Arrow indicates inflection point that separates ambient and non-ambient populations.

#### *3.6. Comparison of UL Estimates and Methods 3.6. Comparison of UL Estimates and Methods*

Observed similarities and/or differences between estimated UL values can be attributed to local hydrogeology and geochemistry of analysed groundwater bodies and to characteristics of selected methods. For mobile and inert compound like chloride, UL estimates are comparable both between groundwater bodies and at the level of each groundwater body (Table 7). For highly redox-sensitive substances (arsenic, iron) UL estimates across groundwater bodies range over one to two orders of Observed similarities and/or differences between estimated UL values can be attributed to local hydrogeology and geochemistry of analysed groundwater bodies and to characteristics of selected methods. For mobile and inert compound like chloride, UL estimates are comparable both between groundwater bodies and at the level of each groundwater body (Table 7). For highly redox-sensitive substances (arsenic, iron) UL estimates across groundwater bodies range over one to two orders of magnitude (Tables 4 and 5).

magnitude (Tables 4 and 5). Comparing the model-based objective methods, the modified Lepeltier method and probability plot, differences between estimated UL are related to data variability and type of estimators used (for the modified Lepeltier method). When data variability is low to moderate, e.g., for chloride, Mean + 2SD and Median + 2MAD estimators give similar results and are directly comparable with probability plot estimates. Otherwise, for highly variable data, e.g., for iron, differences between UL estimates Comparing the model-based objective methods, the modified Lepeltier method and probability plot, differences between estimated UL are related to data variability and type of estimators used (for the modified Lepeltier method). When data variability is low to moderate, e.g., for chloride, Mean + 2SD and Median + 2MAD estimators give similar results and are directly comparable with probability plot estimates. Otherwise, for highly variable data, e.g., for iron, differences between UL estimates increase, particularly between Median + 2MAD and probability plot.

increase, particularly between Median + 2MAD and probability plot. Median + 2MAD is a robust equivalent of Mean + 2SD, which is not affected by extreme values in data set and is highly resistant up to 50% of the data values being extreme [36]. However, by determining the point x on the cumulative relative frequency curve at which a bend in a curve is noted, the non-ambient population from the mixed data distribution is significantly reduced, so the occurrence of potential extreme values in the data set is also reduced. Given that Median + 2MAD estimates are systematically the lowest, especially in cases where the greatest variability of data was recorded, it appears that Median + 2MAD is excessively conservative and underestimate an actual Median + 2MAD is a robust equivalent of Mean + 2SD, which is not affected by extreme values in data set and is highly resistant up to 50% of the data values being extreme [36]. However, by determining the point x on the cumulative relative frequency curve at which a bend in a curve is noted, the non-ambient population from the mixed data distribution is significantly reduced, so the occurrence of potential extreme values in the data set is also reduced. Given that Median + 2MAD estimates are systematically the lowest, especially in cases where the greatest variability of data was recorded, it appears that Median + 2MAD is excessively conservative and underestimate an actual UL value.

UL value. The choice of an inflection point to discriminate between ambient and non-ambient populations is a critical issue for model-based objective methods. These methods contain element of subjectivity in that the choice of the UL estimate (probability plot) or point x on the cumulative relative frequency curve (the modified Lepeltier method) is subject to visual detection and depends to great extent on the experience of researchers. As noted before, one should avoid temptation to detect a threshold between natural and anthropogenic populations at lower part of curve, close to LOQ values. A further limitation inherent to probability plot is existence of the multiple inflection points due to multiple natural and/or anthropogenic populations. Hence, to increase the confidence in the estimation process, it is necessary to detect non-linear behavior of data distribution, related to The choice of an inflection point to discriminate between ambient and non-ambient populations is a critical issue for model-based objective methods. These methods contain element of subjectivity in that the choice of the UL estimate (probability plot) or point x on the cumulative relative frequency curve (the modified Lepeltier method) is subject to visual detection and depends to great extent on the experience of researchers. As noted before, one should avoid temptation to detect a threshold between natural and anthropogenic populations at lower part of curve, close to LOQ values. A further limitation inherent to probability plot is existence of the multiple inflection points due to multiple natural and/or anthropogenic populations. Hence, to increase the confidence in the estimation process, it is necessary to detect non-linear behavior of data distribution, related to analytical imprecision or

analytical imprecision or to erroneous data, and, whenever possible, to combine two or more model-

based objective methods, particularly in cases of limited data set and/or limited data quality.

to erroneous data, and, whenever possible, to combine two or more model-based objective methods, particularly in cases of limited data set and/or limited data quality.

The UL estimates for arsenic (groundwater body CDGI\_23), iron (groundwater body CDGI\_19), and sulphate (groundwater body CSGI\_29), obtained by the 95th percentile for the pre-selected data set, are systematically higher than those obtained by the modified Lepeltier method. The choice of an appropriate percentile, calculated from a data range remaining after the exclusion of contaminated samples, directly relates to corresponding confidence level of the UL estimate. It is influenced by a number of samples in data set [18] or by positive skewness of data distribution [2]. Since each data set contains more than hundred data, the choice of 95th percentile, according to "number of samples" criterion set by Hinsby et al. [18] is met. The Shapiro–Wilk normality test for all data sets (verified at 0.05 significance level) indicated very high positive skewness of all data distributions, probably due to high percentage of <LOQ values in corresponding data sets (Table 2). Hence, the 90th percentile criterion was additionally tested to examine UL estimates for different percentiles. Comparing UL estimates for arsenic, iron and sulphate (Table 9), differences are evident between the 90th and 95th percentiles, however, the 90th percentile estimates are comparable to the Mean + 2SD estimates, shown in Tables 4–6. Although the pre-selection method is less sensitive to limited data quality compared to model-based objective methods, the high proportion of <LOQ values in the data set significantly affects the UL estimate. In other words, the higher proportion of <LOQ values in data set, the higher degree of uncertainty in the UL estimate obtained with a high percentile.


**Table 9.** Comparison of 90th and 95th percentile UL estimates for pre-selected data sets of arsenic (As), iron (Fe), and sulphate (SO<sup>4</sup> ) for three groundwater bodies.

The UL estimates for redox-sensitive substances obtained by using the same statistical methods differ between groundwater bodies. Table 5 shows that UL estimates of iron for unconfined aquifers within groundwater bodies CDGI\_19 and CSGN\_25, obtained by the modified Lepeltier method, differ by an order of magnitude. Similarly, it is noted that UL estimates of iron for semi-confined and confined aquifers within groundwater body CDGI\_23 are not comparable with UL estimates of iron for groundwater body CSGI\_29 characterized by the same type of aquifer, although the same methods were used for the estimation. A related outcome can be observed comparing UL estimates of sulphate, obtained by the 95th percentiles for the pre-selected data set, between groundwater bodies CDGI\_23 and CSGI\_29 (Table 6).

Given the high data variability (coefficient of variation > 1) of redox-sensitive substances, these results point to the great heterogeneity of considered groundwater bodies. Due to variable dynamics of groundwater flow during the hydrological year, the movement of solutes due to geochemical and hydraulic gradient as well as geochemical barriers that cause retention of solutes on the rock matrix, natural variability is common even in lithologically homogeneous aquifers. As noted by Matschullat et al. [1], the geochemical background concentration must be determined for homogeneous units or areas within a natural system, primarily in relation to climatological, hydrogeological, lithological and pedological characteristics. This concept is taken by Preziosi et al. [48], who singled out representative aquifers as homogeneous units and determined background concentrations of substances in groundwater for each aquifer and for the whole groundwater body. Similarly, Molinari et al. [8] found that background concentrations of redox-sensitive substances increase with depth, showing that the geochemical and hydrogeological stratification can be an important factor in determining

background concentrations and that it is desirable to carry out detailed characterization of aquifer system in order to determine hydrogeochemically homogeneous areas.

Assuming a relevance of determining background values of substances for each homogeneous unit, the concept of "regional" background concentration at the level of groundwater body, as followed in [26], needs to be re-examined, because the actual background value of a substance may differ between lithologically similar aquifers with different geochemical and hydrodynamic conditions. In compliance with the "sample size" and the "percentage of <LOQ values" criteria for the use of methods presented in Section 2.3, this changing paradigm inevitably requires balancing the need for extending the actual monitoring program in relation to the costs of drilling new boreholes and regular monitoring of parameters set by EU and Croatian regulations and guidelines.

#### **4. Conclusions**

Groundwater quality data from four groundwater bodies with similar hydrogeological settings, located in the northern and the eastern part of Croatia, were analysed in order to estimate the upper limits of ranges of background concentrations (UL) for targeted chemical substances. The concept of determining the ambient background value of a substance was applied, recognizing the fact that elevated concentrations of substances in groundwater are no entirely of natural origin and reflect long-term human impact on the chemical composition of groundwater. The UL were estimated using model-based objective methods, probability plot, and modified Lepeltier method, as well as the simple and robust the pre-selection method.

Model-based objective methods are sensitive to high variability of data and to high percentage of <LOQ values in the data set. UL estimates obtained by probability plot and modified Lepeltier method are comparable when data variability is low to moderate, otherwise differences between estimates are notable. It appears that an UL estimate calculated as a Median + 2MAD range of data from undisturbed (ambient) part of mixed data distribution obtained from the cumulative relative frequency curve, particularly underestimate an actual UL value. High percentage of <LOQ values in data set influences non-linear behavior of data distribution at lower part of a curve, thus affecting the detection of inflection point to discriminate between ambient and non-ambient populations. Results from this research indicate that both methods should not be used if data set contains more than 30% of <LOQ values. However, combining results of two or more model-based objective methods, particularly in cases of limited data set and/or limited data quality, can increase the confidence of estimation of threshold values.

The robust pre-selection method proved less sensitive to limited data quality or data availability compared to model-based objective methods. This method is preferable over the others if data set contains more than 30% of <LOQ values. For highly skewed data, the 90th percentile of the pre-selected data set is comparable with other methods and preferable over the 95th percentile estimate.

The estimated UL values for inert and mobile substances, e.g., chloride, are comparable on different scales. On the other hand, significant variability of UL estimates for redox-sensitive substances, e.g., arsenic and iron, can be attributed to change of chemical composition of groundwater across redox boundaries. Observed differences of the UL estimates for redox-sensitive substances between considered groundwater bodies are related to the heterogeneity of aquifer systems. This stresses the value of high-resolution conceptual model of groundwater bodies in the future update of the UL estimates. The critical issue in this process is the determination of hydrogeological and geochemical homogeneous units within the heterogeneous aquifer system.

**Author Contributions:** Z.N. made the conceptualization and wrote the draft of the manuscript. Z.N. and Z.K. developed the methodology. Z.K. made the data investigation and formal analysis, while Z.N. and Z.K. made most of the data interpretation and construction of tables. Z.K., J.P., and D.P. participated in the creation of figures, review and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The publication process was supported by the Development Fund of the Faculty of Mining, Geology, and Petroleum Engineering of the University of Zagreb.

**Acknowledgments:** Authors would like to thank Croatian Waters for providing data for this research within the projects "Trend definition and groundwater status assessment in the Pannonian part of Croatia" and "Definition of criteria for the determination of background concentrations and threshold values of contaminants in the groundwater bodies in the Pannonian part of Croatia".

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Abbreviation**


### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
