**4. Discussion**

In this study, the reliability of the ecological status classification obtained by monitoring macrophyte assemblages in the Venice Lagoon according to the WFD was analyzed and discussed in relation to a review of the monitoring effort.

Previous theoretical studies proposed mixed models to estimate uncertainties in monitoring data, considering the many different sources of variation that can affect an indicator, from the uncertainty related to sampling and analysis to spatial and temporal variations and their interactions. However, when real data were considered, sources of uncertainty that proved small were disregarded in the analysis [20–26]. In the current study, temporal variations were not considered, because the MaQI index is applied to the merged spring and autumn data collected in the monitoring year, which removes the contribution of intra-annual variations to uncertainty [9]. Uncertainties associated with the sampling and analysis methodology were excluded because the same laboratory staff carried out all activities throughout the campaign, strictly following national protocols. Accordingly, only spatial variations were considered, also taking into account the practical need to reduce monitoring effort. However, when uncertainty is estimated from a larger dataset that pools observations from multiple ecosystems with similar characteristics and includes several spatial, temporal, and analytical method variations, it might be desirable to quantify every component at different scales [3,22]. In addition, as soon as data from subsequent monitoring cycles are available, inter-annual variability may also be assessed [7,24–26].

In the current study, the reliability of classification was assessed as the probability that the observed ecological status (mean value at WB scale) lies inside the correct class, i.e., the class assigned by the index. Spatial variations were estimated by the confidence interval: the highest values were found at ENC2, a small WB characterized by strong hydromorphological and pressure gradients from the Lido inlet to Venice island. Lower values were mostly observed at polyhaline WBs (annual mean salinity < 30), such as PC1, PC2, and PC3, which are mainly characterized by lower internal ecological variability. Based on Student's *t*-distribution, the probability that the actual mean value of the MaQI EQRs of each WB fell within its assigned class, among the five WFD classes, ranged between 53.9% and 99.6%. However, the confidence interval alone is not sufficient to determine the risk of misclassification. The uncertainty is in fact determined both by the width of the confidence interval and by the proximity of the mean to a class boundary, in particular to the critical good/moderate threshold. Nevertheless, with respect to the critical good/moderate boundary, which drives governmental decisions about measures, the results highlighted a satisfactory reliability of the 2011 WFD MaQI classification (83–100%).
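The class-membership probabilities described above can be illustrated with a minimal sketch. It assumes that the mean EQR of a WB follows a Student's *t*-distribution with *n* − 1 degrees of freedom around the sample mean, and that the five classes are separated by equally spaced boundaries at 0.2, 0.4, 0.6, and 0.8; both the boundaries and the station values below are illustrative assumptions, not data from this study. The function names `t_cdf` and `class_probabilities` are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """CDF of Student's t-distribution, by Simpson integration of its density."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    if x == 0:
        return 0.5
    # Normalizing constant, computed with lgamma to avoid overflow
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    pdf = lambda t: coef * (1 + t * t / df) ** (-(df + 1) / 2)
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x > 0 else 0.5 - area

def class_probabilities(values, boundaries=(0.2, 0.4, 0.6, 0.8)):
    """Probability that the true mean lies in each class (boundaries are assumed)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    se = sd / math.sqrt(n)  # standard error of the mean (must be > 0)
    edges = [-math.inf, *boundaries, math.inf]
    cdf = [t_cdf((e - mean) / se, n - 1) for e in edges]
    names = ["bad", "poor", "moderate", "good", "high"]
    return {name: cdf[i + 1] - cdf[i] for i, name in enumerate(names)}

# Hypothetical EQR values at seven stations of one water body
eqr = [0.55, 0.65, 0.70, 0.62, 0.58, 0.75, 0.68]
probs = class_probabilities(eqr)
# Probability of being on the "good or better" side of the good/moderate boundary
p_good_or_better = probs["good"] + probs["high"]
```

The sketch makes the point of the paragraph explicit: the probability assigned to a class depends both on the standard error (interval width) and on how far the mean lies from the nearest boundary.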

The confidence estimates made it possible to investigate where the effort of the monitoring network could be reduced without excessively increasing the risk of misclassification. Recent studies proposed that monitoring programs assessing the status of WBs should identify the optimal allocation of samples in time and space by quantifying the different uncertainty components affecting monitoring data [22,23]. As reported above, the main factors affecting the reliability of the MaQI results in this study were spatial variations; therefore, changes focused only on the number and location of sampling stations.

Again, statistical principles offer relatively simple and suitable tools to address the optimization of sampling effort, providing a quantitative and objective assessment of the impact of the sampling strategy on the risk of misclassification. On the other hand, the results of a statistical analysis must be examined carefully before they are applied, and the operative choice is likely to benefit from a final revision by expert judgment. Indeed, this study highlighted that a purely statistical approach, based on the amplitude of the confidence interval, could lead to inapplicable or meaningless results. For instance, according to the first scenario (*L* = 0.1), to reduce the *L* value of the WB ENC2 from 0.27 to 0.1, the number of stations should increase from 7 to 51. Considering that the sampleable area of ENC2 is about 10 km<sup>2</sup>, this means five stations per km<sup>2</sup>, an effort that is unachievable in the framework of institutional monitoring and far from the concept of "optimization". Under scenario *L* = 0.1, more than one station per km<sup>2</sup> would also be required within WB ENC4, and similar inconsistencies, if the statistical approach were applied strictly, are observed in the scenario *L* = *L*<sub>mean</sub>. Conversely, the second scenario (*L* = 0.2) proved only weakly restrictive for some large WBs, especially those with lower standard deviations. For instance, under this scenario, the number of stations within the WB ENC1 decreased from 26 to 8 (1 station per 13.5 km<sup>2</sup>). All these evaluations are necessarily tied to the real data at hand; in other contexts, the statistical results might be confirmed even after expert judgment.
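The ENC2 figures quoted above can be reproduced, to a first approximation, with the sketch below. It assumes that *L* is a confidence-interval width that scales with 1/√*n*, and that the standard deviation and the *t* quantile stay unchanged when stations are added; the second assumption is a simplification, since the quantile depends on *n*. The function name `stations_for_target` is hypothetical.

```python
def stations_for_target(n_current, l_current, l_target):
    """Approximate number of stations needed to reach the interval width l_target.

    Assumes L is proportional to 1/sqrt(n), with the standard deviation and
    the t quantile held fixed (a simplification).
    """
    return n_current * (l_current / l_target) ** 2

# ENC2 under scenario L = 0.1: 7 stations, L reduced from 0.27 to 0.1
n_needed = stations_for_target(7, 0.27, 0.1)   # about 51 stations
density = n_needed / 10.0                      # about 5 stations per km2 over ~10 km2
```

Because the required number of stations grows with the square of the ratio of interval widths, halving *L* roughly quadruples the effort, which explains why a uniform target can become impractical for small, heterogeneous WBs.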

Accordingly, a revision process by expert judgment is essential. Expert judgment is intrinsically difficult to standardize, and it introduces subjectivity into the evaluation. It is therefore crucial to guarantee maximum transparency about the criteria followed and, afterwards, to provide an objective estimation (validation) of the impact of the choices. In this study, the expert judgment followed the criteria described in Sections 2.3.2 and 3.2, and it aimed (i) to homogenize the reliability of classification between WBs, (ii) to ensure a minimum adequate reliability for all WBs, (iii) to ensure a high reliability of the status classification with respect to the critical good/moderate boundary, and (iv) to consider other particular features of each WB, such as its dimensions and its hydrological and morphological characteristics.

To carry out the validation, spatial interpolation of the data was performed. Geostatistical techniques are based on the hypothesis that nearby observations are more similar to one another than to distant observations; they therefore make it possible not only to add new points where knowledge is more approximate, but also to remove points where they are redundant [27]. Furthermore, mathematical variogram models can be used to assess reductions of sampling effort through interpolation. Previous attempts to validate optimal locations of monitoring networks adopted specific variogram models, such as in ordinary Kriging or Bayesian maximum entropy, depending on the characteristics of the study areas [19,28]. The Venice Lagoon is a complex transitional area characterized by a composite mosaic of canals, salt marshes, mud and sand intertidal flats, shoals, man-made structures, and islands [29,30]. Accordingly, spatial interpolations of the MaQI EQRs were performed by applying Kernel interpolation with barriers. This model was chosen over others in order to obtain interpolations that reflect the morphology of the Venice Lagoon, and therefore the presence of breaklines. Cross-validation results of both interpolations supported the choice of the model, as mean prediction errors were close to zero and root-mean-square errors were small. Moreover, both mean prediction errors were positive, which indicated that both models slightly overestimated the data [31], with relatively better results for the 2011 MaQI network than for the reduced one. To quantify the relative error of the reduced MaQI network, the standard errors of the two interpolations were related by the formula of [19], modified to use Kernel standard errors instead of Kriging standard errors.
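The two cross-validation statistics discussed above can be computed as in the following sketch. It assumes that the prediction error is defined as predicted minus measured, so that a positive mean error indicates overestimation; the station values are invented for illustration and the function name `cross_validation_summary` is hypothetical. The interpolation itself (Kernel interpolation with barriers) is not reproduced here.

```python
import math

def cross_validation_summary(measured, predicted):
    """Return (mean prediction error, root-mean-square error).

    Errors are taken as predicted - measured, so a positive mean error
    means the interpolation overestimates the data on average.
    """
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse

# Hypothetical EQRs measured at four stations and their cross-validated predictions
measured = [0.50, 0.62, 0.71, 0.80]
predicted = [0.53, 0.60, 0.74, 0.83]
mean_error, rmse = cross_validation_summary(measured, predicted)
```

A mean error close to zero indicates little systematic bias, while the root-mean-square error summarizes the typical magnitude of the individual prediction errors.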

In summary, the review of the monitoring effort of the WFD MaQI network of the Venice Lagoon yielded a relative error of 22.7%, which can be considered acceptable given the total reduction of stations of 26.3%. To evaluate the optimization process, Adhikary et al. [19] reported values below 30% as acceptable, taking the zero trend as a criterion. To obtain a more objective assessment, we performed a further check to validate the optimization process. As the aim of this study was also to avoid the risk of misclassification, we extracted the EQR values of each WB from the Kernel interpolation maps of both the original and the reduced networks. The results for each WB were then compared by Student's *t*-test to verify that the reduction had not significantly modified the values with respect to the original network. The comparisons did not differ significantly, except for PC1, PC3, and PNC1. Moreover, the optimization process did not change the classification result of any WB.
