where


$$\mathbb{K} = \frac{\mathbb{N}}{12} \times \frac{\text{d}}{30}$$

where


Thornthwaite suggested a method for the simulation of the hydrological phenomenon in a catchment to evaluate the agricultural deficiency (calculated as the difference of the water need ETp and the actual crop ET), which is based on a formula of evaporation of a generic crop [29]. The evaluation of evapotranspiration is necessary for agricultural issues and, especially, for the definition of water resource balance [30]. Another method used to evaluate evapotranspiration is Penman–Monteith method [31], considered more physically realistic but requiring many meteorological variables. The Thornthwaite method is more easily applied because it requires only monthly mean air temperature and the maximum amount of sunshine duration, calculated using latitude [32,33]. The results obtained through these two methods are very similar in terms of correlation, trend, and regional averages [32,33]. First, the potential evapotranspiration is calculated with the aforementioned formula, and then the actual evapotranspiration is calculated using the Turc formula [23].

If the monthly rainfall is more than the potential evapotranspiration, the actual evapotranspiration is considered equal to its potential; the rainfall surplus is assigned to the soil humidity until an assigned limit [20]. The possible precipitation remaining is assigned to the runoff and to the groundwater flow, with the criterion explained below. If the monthly rainfall is less than the potential evapotranspiration, the actual evapotranspiration is considered equal to the total rainfall plus the soil humidity; if the supply is sufficient, the actual evapotranspiration is equal to its potential value, otherwise it is equal to the sum of the precipitation and the water storage such as soil humidity. The water excess that is not lost by evapotranspiration or stored in the soil humidity is assigned half between the monthly runoff in which there is the excess and half in the following month (underground flow) [34].

The main parameters of the hydrological balance are represented using a box plot that allows for understanding if the distribution is symmetric or asymmetric, and to identify the presence of outliers.

These anomalous values were calculated using Tukey fences: the lower threshold equal to Q1 − 1.5 ∗ IQR and the upper threshold equal to Q3 + 1.5 ∗ IQR, where Q1 is the lower quartile, Q3 is the upper quartile, and IQR is the interquartile range.

#### *2.2. Trend of Each Component of Water Balance*

Monthly time series of the main component of water balance, such as direct rainfall on lake (PL), the sum of all surface water inflow into the lake (E = RS + RIR + IRE), the variation of lake volume (ΔV), the lake level (H), the outflow (Q), and the groundwater source (GS) were analysed for the period from 1993 to 2019 using R software, version 3.6.3 [35]. Decomposition of each water balance component of the time series into its constituting parts, namely, the observed trend, seasonality, and random parts, was done for the monthly time series from 1993 to 2019 using the time series functions in R. The observed part represents the data that were measured or calculated through water balance; the trend part specifies if there is an increase or a decrease around the mean value; the

seasonal part represents a cyclical trend; and the random part represents unpredictable changes in the data without a precise and identifiable cause.

Deseasonality was performed to correctly identify an increasing or decreasing trend in water balance component and then subtract the seasonal component from the original time series. Statistical tests to verify the trends and assess their significance were performed using the Mann–Kendall test [36,37]. This test was selected because of its lower sensitivity to outliers and its robustness for detecting a trend in rainfall, temperature, and hydrology, without specifying if the trend is linear or nonlinear [38,39]. In addition, this test identifies a monotonic trend that defines an increasing or decreasing trend, is simple and robust, and adapts to missing values and data that do not have any particular distribution for improving water resource management, detecting a trend in discharge, direct runoff, precipitation, and evaporation [40,41].

#### *2.3. Break Point*

The statistical problem of break point or tipping point in a trend has been addressed in many fields of research, such as medical, images analyses, and human activity [42] but especially on meteorological and climate parameters [43], hydro-meteorological variables [44], and on time series data [45]. Methods in change detection or change point detection in time series data try to identify the times when the probability distribution of a stochastic process such as a time series changes.

Detection of break points, in this study, was done using the algorithm of analyses on change point present in the R library strucchange [46,47] applied to water balance parameters seasonally adjusted in the previous step. The approach we followed was to use least squares regression to estimate the locations of the changes. The function selects an optimal model (choosing the number of change points) using the Bayesian information criterion (BIC) by default [48]. The assessment of changing point was carried out by checking the changes in the average and variance of each variable of water balance, returning the point in time, year or month, in which one or more turning points were highlighted. Furthermore, recent studies in Piedmont Region where Lake Candia is located, highlight a break point in the water table level in 2008, due to a different agricultural technique of rice cultivation; the dry direct-seeded rice technique replaced the traditional techniques in some areas of the Piedmont Plain, affecting water use in the study area [49]. It is therefore interesting and useful to verify any climate break point, to compare with the changes in groundwater level and analyze the trend and behavior of meteorological data before and after 2008.

#### *2.4. Understanding the Drivers of Water Balance*

After investigating the trend of main water balance terms and looking for potential break points, we wanted to evaluate the relative importance of each term (ΔV, PL, Q, GS, E) to determine the main driver of water resource management of Lake Candia. We considered that PL is the direct rainfall on the lake and represents an important entrance that depends only on meteorological factors; Q is the discharge of the outflow and depends on the form of the weir placed on the outflow; GS is the groundwater entrance and depends on the rainfall within the whole hydrogeological catchment and on the water use (water supply and agricultural); E is the other surface entrance, depending on rainfall, land use, and irrigation; ΔV is the variation of the lake volume that depends on rainfall, discharge, runoff, and groundwater supply. The variation of lake level (ΔH) is incorporated into the ΔV term.

Then to avoid various types of noise (e.g., small sample efficiency, outliers, high breakdown point, time complexity) we adopted robust linear regression [50,51] LTS, with the lqs package, Huber function, and bisquare estimate using the R package MASS [52] for Formula (3).

In any multiple regression analysis, it is necessary to highlight multicollinearity, recognizing regressor variables affected by linear dependencies [53], because this issue may

cause serious complication with the reliability of the regression parameter evaluation [54]. The selection of predictors depends on many factors and particular attention must be given; nevertheless, it happens that standard error of the coefficient will increase or that some statistically insignificant variables should be significant; this is due to multicollinearity [55]. In cases of pairs of predictors with Spearman correlation values greater than 0.8, only one predictor was kept. The R package performance [56] was then used to check regression model fit, to its defined quality and goodness, and to check the model's various assumptions (i.e., normality of residuals, normality of random effects, heteroscedasticity, homogeneity of variance, and multicollinearity), and that it includes R2, root mean squared error (RMSE), and intraclass correlation coefficient (ICC) [56].

Finally, we assessed the relative importance of an individual regressor's contribution to the multiple regression model in explaining ΔV by using the R package relaimpo [57].
