1. Introduction
Stormwater runoff from urban impervious surfaces is considered one of the most critical forms of non-point source (NPS) pollution, since it contains contaminants that deteriorate water quality and harm the surrounding ecological environment [1,2,3]. In particular, NPS nutrients are currently the main cause of eutrophication in lakes, streams, and coastal areas all over the world [4]. Eutrophication of surface waters can increase both phytoplankton biomass (reducing water clarity) and the incidence of nuisance and toxic algal blooms (causing taste and odor problems), with consequent increases in treatment costs for drinking water facilities [5]. Therefore, the protection of surface water quality plays a crucial role in sustainable urban watershed management.
NPS nutrients are difficult to quantify because of their diffuse nature and because many small sources may contribute to the generation of NPS pollutants. Another aspect to take into account is nutrient adsorption onto loose soil or sediment particles, which is the primary mechanism of off-site nutrient movement in soil-water or sediment-water continuums. Therefore, the uncertainty in evaluating the cleaning processes of urban impervious surfaces, in terms of particle quantity and granulometry, makes the quantification of nutrients in the urban environment even more challenging.
The first step in controlling the impact of NPS nutrients on water quality is the development of a thorough understanding of their sources. It is therefore important to gain in-depth knowledge of nutrient build-up and wash-off from impervious surfaces and of the factors affecting them. In recent years, considerable research effort has been devoted to investigating the relationships between rainfall, catchment characteristics, and stormwater quality. Liu et al. [6], using a total of 41 rainfall events at three monitoring sites, found that the influence of rainfall characteristics on the wash-off of pollutants (total nitrogen (N_tot), total phosphorus (P_tot), total suspended solids (TSS), and total organic carbon (TOC)) is step-wise, based on specific thresholds. Additionally, pollutant build-up and wash-off processes were found to be significantly influenced by a range of catchment characteristics such as land use, impervious area fraction, and urban characteristics [6]. Yuan et al. [7], exploiting a data set of 30 rainfall events, found that the effects of rainfall conditions on pollutant concentrations (N_tot, P_tot, TSS, chemical oxygen demand (COD), ammonium nitrogen (NH₄–N), nitrate nitrogen (NO₃–N), phosphate (PO₄–P), and heavy metals) depend on the nature of the pollutant itself. In particular, the antecedent dry period tended to affect particle wash-off from impervious areas owing to the predominance of particles smaller than 30 μm. Liu et al. [8] found that pollutant inputs (total inorganic carbon (TIC), nitrate (NO₃⁻), phosphate (PO₄³⁻), iron (Fe), and TOC) into estuarine and freshwater areas were notably affected by seasonal factors, with the dominant pollutant sources differing between wet and dry seasons. This study was supported by extensive field sampling (96 surface water samples at eight monitoring locations).
To the authors’ knowledge, significant research effort has been devoted to catchments with comprehensive records of rainfall, flow rate, and water-quality data, whereas less attention has been given to investigating these correlations in watersheds where data availability is limited. In general, the lack of sufficient hydrological data (e.g., precipitation, discharge, and water quality) may lead to an inadequate representation of the complex responses of urban systems to precipitation inputs. Furthermore, the lack of available data makes it difficult to assess pollutant build-up/wash-off and the parameters affecting them. In such cases, the system behavior is usually investigated through numerical model simulations; however, these models are usually deterministic and based on design rainfall events rather than real observations. This could lead to inefficient stormwater quality treatment owing to inadequate design. In any case, before undertaking expensive studies to gather and analyze additional data, it is reasonable first to understand how much the estimates of system performance would improve with an increased amount of available data.
On the basis of these considerations, the main objective of this work is to develop a methodological framework able to assess how rainfall, catchment, and drainage characteristics affect urban nutrient runoff in poorly gauged watersheds. The second objective of this study is to evaluate the diagnostic capacity of the proposed approach in assessing hydrologic/hydraulic and water-quality modeling performance.
The proposed methodology, based on principal component analysis (PCA) and hierarchical cluster analysis (HCA) to cope with data scarcity, is tested on two real-world case studies, whose practical specifications and challenges are reported in this work.
The remainder of this paper is organized as follows. Section 2 presents an overall description of the study approach, along with a thorough description of each phase. Section 3 reports the parameter selection and the results on the influence of rainfall and basin/drainage characteristics on stormwater quality. The main conclusions are presented in Section 4.
2. Materials and Methods
2.1. Study Approach
The proposed methodology is presented in Figure 1. It consists of four main phases. In Phase 1, Data collection, observed rainfall events over the study areas were analyzed, and 10-year rainfall time series at 15-minute aggregation were generated using the iterated random pulse (IRP) rainfall generator proposed by Veneziano and Iacobellis [9]. In Phase 2, Data selection, the generated and observed rainfall events were compared in order to select the generated events most similar to the observed ones, based on three parameters: antecedent dry period (ADP), total rainfall (TR), and event duration (ED). In Phase 3, Hydrologic/hydraulic and water-quality modeling, hydrologic/hydraulic and water-quality models were implemented to reproduce the flow rate and water quality of the selected events. To address the second objective of this study, models implemented in two different watersheds (Sannicandro di Bari and Ruvo di Puglia) were considered: at Sannicandro di Bari, both the hydrologic/hydraulic and the water-quality models were successfully calibrated and validated, whereas at Ruvo di Puglia only the hydrologic/hydraulic model was calibrated. In Phase 4, Data analysis, the simulation outcomes were analyzed using two multivariate data analysis techniques, principal component analysis (PCA) and hierarchical cluster analysis (HCA), and the results were examined in light of the two main aims of this study.
2.2. Study Area
The study area includes two watersheds belonging to Sannicandro di Bari and Ruvo di Puglia municipalities, located in the Puglia region (Southern Italy).
The drainage area of the watershed located in Sannicandro di Bari (hereafter called SB) is 31.24 ha, covers approximately 60% of the total urban area, and includes 21.87 ha (70%) of impervious surface. The land-use map, extracted from the regional geographical information system (SIT Puglia), shows that the entire drainage area is exclusively residential and only 3.80% of the basin (1.20 ha) is covered by urban green. The drainage basin has an average slope of 1.56%, and the drainage network, used only for stormwater, has a total length of 1.96 km and conveys water into a concrete rectangular channel (1.20 m wide and 1.70 m high) [10]. Rainfall is measured with a rain gauge installed close to the outlet of the drainage network. Water-quality data are obtained from samples collected by an autosampler with 24 bottles of 0.5 L each.
The Ruvo di Puglia watershed (hereafter called RP) is located in the suburban area of the town of Ruvo di Puglia. The drainage area, with a surface of 51.56 ha, includes 41.25 ha (80%) of impervious surface and has an average slope of 2.20%. Similar to SB, RP is characterized exclusively by residential land use. The drainage network, used only for stormwater, has a total length of 5.86 km and ends with a circular pipe 1.30 m in diameter. A rain gauge, a current meter sensor, and an autosampler are installed close to the outlet of the drainage network to record rainfall, discharge, and water-quality data.
Figure 2 shows the area of the two watersheds, the drainage networks, and the respective outfalls.
It is worth remarking that these two watersheds were selected for the following reasons:
i) they are situated in urban areas;
ii) they are located in a semi-arid environment where precipitation events mostly occur in the winter season;
iii) a monitoring campaign (which provided water-quality and quantity data) was carried out in a previous study [10].
2.3. Data Collection
Two different sets of data were used in this study: (i) observed rainfall, flow rate, and water-quality data derived from monitoring campaigns, used for model calibration and validation; (ii) generated rainfall time series obtained through numerical simulation, used as input data in the hydrologic/hydraulic and water-quality models for producing flow rate and water-quality data.
Regarding the observed data, the monitoring campaigns provided rainfall, flow rate, total suspended solids (TSS), and nutrient (N_tot and P_tot) data for five rainfall events that occurred on SB (11/10/2006, 11/22/2006, 12/17/2006, 01/24/2007, and 02/10/2007) and two rainfall events that occurred on RP (09/18/2006 and 09/23/2006). Table 1 and Table 2 summarize, respectively, the hydrologic data (rainfall event duration [h], length of antecedent dry weather [day], total runoff volume [m³], runoff peak flow rate [m³/s], and total depth measured at the outfall [m]) and the water-quality data (maximum, minimum, and mean concentrations [mg/L]) for all the observed events.
Synthetic rainfall time series with 15-minute aggregation and a length of 10 years were generated through the IRP rainfall model (proposed by Veneziano and Iacobellis [9] and Veneziano et al. [11]). The IRP generator uses the classical representation of the exterior process of rainfall as an alternating sequence of dry and wet periods with independent durations, which characterizes the arrival, duration, and average intensity of rainfall events at the synoptic scale. The wet and dry periods are assumed to be exponentially and Weibull distributed, respectively. The average rainfall intensities of different wet periods are independent and identically distributed according to an exponential distribution. The rainfall within each wet period of the exterior model is then described by the “interior” scheme, in which rainfall is represented as the superposition of pulses with a hierarchically nested structure of temporal occurrences, with multifractal properties of location and intensity.
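The exterior process described above can be illustrated with a short sketch. It is a simplified, hypothetical stand-in and not the IRP generator of [9,11]: it alternates Weibull-distributed dry spells and exponentially distributed wet spells, draws an exponentially distributed mean intensity for each wet spell, and, because the multifractal interior process is not implemented, spreads that intensity uniformly over the wet spell. The function name `exterior_process` and the parameter values in the usage note are assumptions for illustration only, not the values of Table 3.

```python
import math
import random

STEP_H = 0.25  # 15-minute aggregation, in hours


def exterior_process(years, m_wet_h, m_dry_h, k, m_i, seed=None):
    """Hypothetical, simplified exterior process of an IRP-type generator.

    Dry spells ~ Weibull(shape k) with mean m_dry_h [h]; wet spells ~
    Exponential with mean m_wet_h [h]; each wet spell has an exponentially
    distributed mean intensity with mean m_i [mm/h]. The multifractal interior
    process is NOT modeled: rain is spread uniformly within each wet spell.
    Returns rainfall depths [mm] for each 15-minute step.
    """
    rng = random.Random(seed)
    n_steps = int(years * 365 * 24 / STEP_H)
    # Weibull scale chosen so that the mean dry duration equals m_dry_h:
    # E[T] = scale * Gamma(1 + 1/k)
    dry_scale = m_dry_h / math.gamma(1.0 + 1.0 / k)
    depths = [0.0] * n_steps
    t = 0  # current step index
    while t < n_steps:
        # dry spell (at least one step), rounded to the 15-minute grid
        t += max(1, round(rng.weibullvariate(dry_scale, k) / STEP_H))
        wet_steps = max(1, round(rng.expovariate(1.0 / m_wet_h) / STEP_H))
        intensity = rng.expovariate(1.0 / m_i)  # mean intensity of this wet spell [mm/h]
        for i in range(t, min(t + wet_steps, n_steps)):
            depths[i] = intensity * STEP_H  # depth fallen in one 15-minute step [mm]
        t += wet_steps
    return depths
```

For example, `exterior_process(10, m_wet_h=6.0, m_dry_h=60.0, k=0.8, m_i=1.5, seed=1)` returns a 10-year series of 350,400 values. Rounding durations to the 15-minute grid slightly perturbs the target means, which is acceptable for an illustration but would need care in a real generator.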
The model has six parameters in total. The exterior process is characterized by four parameters: the mean durations of the wet (m_τwet) [h] and dry (m_τdry) [h] periods, the exponent k of the Weibull distribution of the dry intervals, and the mean value (m_I) of the average rainfall intensity during the synoptic events [mm/h]. The interior process is defined by two further parameters related to its multifractality: the parameter C_1, which controls the multifractal properties of rainfall at small scales, and the multiplicity r, which controls the quasi-fractal behavior of the rainfall support at small scales. To calculate m_I, Veneziano and Iacobellis [9] proposed the following relationship:
where R is the annual mean hourly rainfall intensity over the entire investigation period [mm/h]:
where h_tot is the mean total annual precipitation at each rain gauge station [mm].
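As a hedged illustration of how R and m_I can be linked, the sketch below assumes that R is the mean annual precipitation spread over a 365-day year (8760 h) and that m_I follows from the long-run mass balance of an alternating wet/dry renewal process, R = m_I · m_τwet / (m_τwet + m_τdry). Both expressions are assumptions made here for illustration and may differ from the exact relationship given in [9]; the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year


def annual_mean_hourly_intensity(h_tot_mm: float) -> float:
    """R [mm/h]: mean annual precipitation h_tot [mm] spread over one year (assumed form)."""
    return h_tot_mm / HOURS_PER_YEAR


def mean_wet_period_intensity(r_mm_h: float, m_wet_h: float, m_dry_h: float) -> float:
    """m_I [mm/h] from the assumed mass balance R = m_I * m_wet / (m_wet + m_dry).

    Rain falls only during wet periods, which occupy a long-run fraction
    m_wet / (m_wet + m_dry) of the time, so the mean intensity within wet
    periods must exceed the all-time mean R by the inverse of that fraction.
    """
    return r_mm_h * (m_wet_h + m_dry_h) / m_wet_h


# Hypothetical values, not those of Table 3:
r = annual_mean_hourly_intensity(600.0)
m_i = mean_wet_period_intensity(r, m_wet_h=8.0, m_dry_h=120.0)
```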
The model was applied to the SB and RP watersheds, and the generated rainfall time series are reported in the Supplementary Material (SM-1). For the sake of simplicity, the generated rainfall series are reported in chronological sequence, with the first value associated with 01/01/2014. Table 3 reports the model parameters (extracted from [12]) for the two catchments.
2.4. Data Selection
In this section, the generated rainfall events were compared with the observed ones in order to select the generated events most similar to the observed ones. The following steps were adopted for data selection:
- 1) Single rainfall events were identified in the 10-year generated time series at 15-minute aggregation. In accordance with the regional regulation [13], a dry-weather interval of 48 hours was used to separate individual rainfall events. Accordingly, 408 rainfall events were identified for SB and 117 for RP.
- 2) For each identified rainfall event, the antecedent dry period, total rainfall, and event duration were calculated.
- 3) The generated rainfall events were compared with the observed ones to select those with the most similar characteristics. For each of the seven observed events (five at SB and two at RP), the comparison adopted the following tolerances on the difference between observed and generated values of the three rainfall parameters, assuming that variations within these ranges do not affect the modeling outcomes:
Antecedent Dry Period (ADP) = ±1 day;
Total Rainfall (TR) = ±1 mm;
Event Duration (ED) = ±15 min.
- 4) An index of similarity between observed and generated rainfall events was defined as the number of rainfall parameters (ADP, TR, and ED) whose generated value falls within the corresponding range, so its maximum value is 3. For instance, if only the ADP of a generated rainfall event falls within its range, the index of similarity is equal to 1; if all three parameters fall within their ranges, it is equal to 3. Following this approach, the events with the highest index of similarity were selected for both study areas: twelve generated rainfall events for SB and eight for RP (see Table 4 and Table 5, respectively).
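Steps 1 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: events are separated by at least 48 h of zero rainfall, the event duration runs from the first to the last wet 15-minute step, the ADP is the dry time since the end of the previous event (unknown, and hence never matching, for the first event of a series), and a difference exactly equal to the tolerance counts as a match. The names `RainEvent`, `split_events`, `similarity_index`, and `most_similar` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

STEP_H = 0.25     # 15-minute aggregation [h]
MIN_DRY_H = 48.0  # dry-weather interval separating events [h]
# Tolerances from step 3: ADP +/-1 day, TR +/-1 mm, ED +/-15 min
TOLERANCES = {"adp_days": 1.0, "tr_mm": 1.0, "ed_min": 15.0}


@dataclass
class RainEvent:
    adp_days: Optional[float]  # antecedent dry period; None if unknown
    tr_mm: float               # total rainfall depth
    ed_min: float              # event duration


def split_events(depths: List[float], step_h: float = STEP_H,
                 min_dry_h: float = MIN_DRY_H) -> List[RainEvent]:
    """Split a depth series [mm/step] into events separated by >= min_dry_h of no rain."""
    min_dry_steps = round(min_dry_h / step_h)
    wet = [i for i, d in enumerate(depths) if d > 0.0]
    groups: List[List[int]] = []
    for i in wet:
        # start a new event if the dry gap since the last wet step is long enough
        if not groups or i - groups[-1][-1] - 1 >= min_dry_steps:
            groups.append([i])
        else:
            groups[-1].append(i)
    events, prev_end = [], None
    for g in groups:
        start, end = g[0], g[-1]
        adp = None if prev_end is None else (start - prev_end - 1) * step_h / 24.0
        events.append(RainEvent(adp_days=adp,
                                tr_mm=sum(depths[start:end + 1]),
                                ed_min=(end - start + 1) * step_h * 60.0))
        prev_end = end
    return events


def similarity_index(gen: RainEvent, obs: RainEvent) -> int:
    """Number of parameters (ADP, TR, ED) whose difference lies within tolerance."""
    score = 0
    for name, tol in TOLERANCES.items():
        g, o = getattr(gen, name), getattr(obs, name)
        if g is not None and o is not None and abs(g - o) <= tol:
            score += 1
    return score


def most_similar(generated: List[RainEvent], obs: RainEvent) -> List[RainEvent]:
    """Generated events sharing the highest similarity index with one observed event."""
    scores = [similarity_index(g, obs) for g in generated]
    best = max(scores)
    return [g for g, s in zip(generated, scores) if s == best]
```

In this sketch, `most_similar` is applied once per observed event; it assumes a non-empty list of generated events, and a real selection would probably also discard candidates whose best index is 0.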
2.5. Hydrologic/Hydraulic and Water-Quality Model Implementation
The Storm Water Management Model (SWMM) simulates the hydrograph and the pollutograph of a storm event, in either single-event or long-term (continuous) mode, based on rainfall and other meteorological inputs and on system characteristics (catchment, conveyance, and storage/treatment) for urban and rural areas [14]. SWMM is organized in operating units, or blocks; the Runoff and Transport blocks were adopted for this study. The kinematic-wave equation was chosen to simulate runoff from urban surfaces. The water losses taken into account are depression storage on the impervious area of the basin and infiltration on the remaining portion of the catchment. Infiltration is described by Horton’s equation, whose parameter values were chosen according to representative values reported in the literature for the relevant soil type [15].
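For reference, Horton's equation in its standard textbook form gives the infiltration capacity f(t) = f_c + (f_0 − f_c)·exp(−k·t), which decays from an initial rate f_0 to a final rate f_c. Its time integral gives the cumulative capacity F(t). The sketch below assumes ponded conditions from t = 0 and omits the capacity recovery during dry periods that SWMM also handles; the parameter values are hypothetical and are not those used in this study.

```python
import math


def horton_capacity(t_h, f0, fc, decay):
    """Infiltration capacity f(t) = fc + (f0 - fc) * exp(-decay * t) [mm/h]."""
    return fc + (f0 - fc) * math.exp(-decay * t_h)


def horton_cumulative(t_h, f0, fc, decay):
    """Cumulative capacity F(t) = fc*t + (f0 - fc) * (1 - exp(-decay*t)) / decay [mm]."""
    return fc * t_h + (f0 - fc) * (1.0 - math.exp(-decay * t_h)) / decay


# Hypothetical parameters: f0 and fc in mm/h, decay constant in 1/h
F0, FC, DECAY = 75.0, 5.0, 4.0
for t in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"t={t:4.2f} h  f={horton_capacity(t, F0, FC, DECAY):6.2f} mm/h  "
          f"F={horton_cumulative(t, F0, FC, DECAY):6.2f} mm")
```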
Water-quality processes include the generation of surface runoff constituent loads through the build-up of pollutants during dry weather and their wash-off during wet weather, as well as first-order decay and the simulation of resuspension and deposition within the sewer system. An in-depth explanation of the mathematical representation of these physical processes is reported in the recent scientific literature [3,10,16]. In particular, Di Modugno et al. [10] obtained good performance in predicting the quantity response of SB using a hydrologic/hydraulic model with different sets of input parameters, and also successfully simulated sediment build-up and wash-off.
With the aim of accomplishing the second objective of this study, i.e., evaluating the diagnostic capacity of the proposed framework in assessing model performance, the water-quality model implemented in RP was not calibrated.
For both water-quality models, nutrients were simulated using the power build-up and exponential wash-off functions for high-density residential land use [17].
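The power build-up and exponential wash-off functions can be sketched in their generic SWMM form: build-up B = min(C1, C2·t^C3) after t dry days, and wash-off rate W = C1·q^C2·B for a runoff rate q. The sketch integrates the wash-off exactly over each time step, assuming q is constant within the step, so that mass is conserved between build-up, washed-off load, and residual. The coefficient values, units (mm/h for q, an arbitrary mass per unit area for B), and function names are hypothetical and are not the calibrated values of this study.

```python
import math


def power_buildup(t_days, c1, c2, c3):
    """Power build-up: B = min(C1, C2 * t**C3), with t the antecedent dry days."""
    return min(c1, c2 * t_days ** c3)


def exponential_washoff_step(b, q, dt_h, c1, c2):
    """Exponential wash-off W = C1 * q**C2 * B, integrated exactly over one step
    with constant runoff rate q: B_new = B * exp(-C1 * q**C2 * dt).
    Returns (remaining build-up, mass washed off during the step)."""
    b_new = b * math.exp(-c1 * q ** c2 * dt_h)
    return b_new, b - b_new


def event_pollutograph(adp_days, runoff_rates, dt_h, buildup, washoff):
    """Mass washed off in each step of one event, plus the residual build-up."""
    b = power_buildup(adp_days, *buildup)
    loads = []
    for q in runoff_rates:
        b, w = exponential_washoff_step(b, q, dt_h, *washoff)
        loads.append(w)
    return loads, b


# Hypothetical coefficients: build-up (C1, C2, C3), wash-off (C1, C2)
loads, residual = event_pollutograph(4.0, [0, 2, 5, 3, 1, 0], 0.25,
                                     buildup=(10.0, 2.0, 0.5), washoff=(0.1, 1.5))
```

The washed-off loads peak during the highest runoff rates and decline as the surface is depleted, which is the mechanism behind the first-flush behavior often observed in pollutographs.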
2.6. Data Analysis
Two multivariate data analysis techniques, PCA and HCA, were applied to identify linkages among pollutant parameters and their correlations with rainfall, catchment, and drainage system characteristics. Both were coded and run in R [18] using the packages “devtools” and “ggbiplot”.
PCA is a technique that reduces a set of parameters to a smaller number of principal components (PCs), which explain most of the variance in the original data, with the aim of identifying possible patterns between variables and data points. In its graphical visualization (PCA biplot), positively correlated parameters are represented by vectors that form an acute angle, uncorrelated parameters by perpendicular vectors, and negatively correlated parameters by vectors that form an obtuse angle. This technique has been extensively used for several applications related to water quality [19,20,21]. A detailed description of PCA and its applications can be found in the literature [22,23]. PCA was chosen for this study because it provides a holistic view of all the variables involved in the system. In particular, this analysis offers a meta-description of all the variables, which has the advantage of capturing possible emergent properties of the system. Such properties could remain hidden if the focus were only on the physical meaning of each original variable (macro-description).
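The link between biplot angles and correlations can be made concrete with a short sketch. It is written in Python/NumPy rather than the R packages used by the authors, and it assumes that the variables are standardized before PCA (roughly analogous to R's `prcomp(x, scale. = TRUE)`), which the text does not state. With this scaling, each variable's loading vector holds its correlations with the PCs, so the cosine of the angle between two variable vectors equals their correlation when all components are kept and approximates it in a two-component biplot. The functions `pca` and `vector_cosine` and the synthetic data are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Correlation-matrix PCA via SVD of z-scored data.

    Returns scores (data points in PC space), loadings (variable vectors,
    i.e. correlations between variables and PCs) and explained variance ratios.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T * s[:n_components] / np.sqrt(n - 1)
    return scores, loadings, explained


def vector_cosine(loadings, i, j):
    """Cosine of the angle between the biplot vectors of variables i and j."""
    a, b = loadings[i], loadings[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic data: variable 1 is almost proportional to variable 0, variable 2 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=50)
X = np.column_stack([a, 2 * a + 0.1 * rng.normal(size=50), rng.normal(size=50)])
scores, loadings, explained = pca(X, n_components=2)
```

In this example, the vectors of variables 0 and 1 point in nearly the same direction (cosine close to 1), mirroring the acute-angle reading of correlated parameters described above.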
HCA is typically used for clustering complex data sets and, in this study, it is considered a complementary technique to PCA. Both PCA and HCA are unsupervised methods, meaning that no information about cluster membership or other response variables is used to obtain the graphical representation. This makes these methods suitable for exploratory data analysis, where the aim is hypothesis generation rather than hypothesis verification [24]. The goal of the clustering algorithm is to partition the data points into homogeneous groups, such that within-group similarities are large compared to between-group similarities. These groups can reveal patterns related to the phenomenon under study. A distance function is used to evaluate the similarity between data points: the similarity is calculated first between individual observations and, once the observations begin to be grouped into clusters, between groups as well. Several metrics can be used to calculate the similarity, and the choice of similarity measure can have a large effect on the result. The analysis also requires an a priori, and to some extent arbitrary, choice of the number of clusters (k).
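The agglomerative procedure described above can be sketched in a few lines. The text does not state which distance or linkage rule was used, so this sketch assumes Euclidean distance between observations and average linkage between clusters (the mean of all pairwise distances across two clusters); other choices can give different groupings. In practice the variables would normally be standardized first. The function `hca_average_linkage` is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hca_average_linkage(points: List[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns the clusters as sorted lists of point indices.
    """
    n = len(points)
    # similarity between observations: pairwise Euclidean distances
    dist = {(i, j): euclidean(points[i], points[j]) for i, j in combinations(range(n), 2)}

    def linkage(c1: List[int], c2: List[int]) -> float:
        # similarity between groups: average of all pairwise distances
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]  # start with one cluster per observation
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]                  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
groups = hca_average_linkage(points, k=3)
```

Running the loop down to k = 1 while recording the merge heights would give the full dendrogram; the arbitrary element noted above is where that tree is cut.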
4. Conclusions
In this study, we propose a methodological framework able to: (i) assess how rainfall, catchment, and drainage characteristics affect urban nutrient runoff in poorly gauged watersheds; (ii) evaluate hydrologic/hydraulic and water-quality model performance.
We exploited data from two real urban watersheds (SB and RP) located in the Puglia region (Southern Italy). The water-quality model was implemented at both sites, but it was calibrated and validated only at SB, with the aim of evaluating the diagnostic capacity of the proposed approach in assessing water-quality model performance. Good performance in predicting the quantity response of both sites was obtained using the hydrologic/hydraulic module of SWMM with different sets of input parameters.
PCA and HCA were used to identify linkages between pollutant parameters and rainfall, catchment, and drainage-network characteristics. Three different sets of data were analyzed: (i) observations (data obtained from the monitoring campaign), (ii) simulations (data obtained from SWMM and measured rainfall), and (iii) generations (data obtained from SWMM and rainfall generated with IRP).
Based on PCA and HCA outcomes, the main conclusions can be summarized in the following points:
The three rainfall-related variables (ADP, total rainfall, and runoff volume) showed a strong relationship driven by the ADP vector: the longer the ADP, the higher the water losses in the system, and the greater the distance between the total-rainfall and runoff-volume vectors.
The strong relationships between EML_TSS and the nutrient EMLs, and between EMC_TSS and the nutrient EMCs, suggest that sediment transport plays a significant role in the mobilization of nutrients from urban impervious surfaces. This confirms that TSS can be considered a synthetic index of the general level of pollution in urban areas.
The strong inverse correlation between the runoff-volume vector and the EMCs suggests that the greater the runoff volume, the lower the pollutant concentrations, owing to the dilution effect.
The ADP behaves differently in the two study areas: at RP, the longer the dry period, the larger the amount of pollutants (EMLs) accumulated on impervious surfaces, whereas at SB this relationship is less clear. It is worth mentioning that the ADP values for SB are higher (by up to one order of magnitude) than those measured/generated for RP; therefore, it is likely that during longer dry periods degradation processes (not considered in this study) play a critical role in urban pollutant build-up.
Because the % impervious-area, % slope, basin-area, and conduit-roughness vectors almost overlap, these parameters appear to have essentially the same effect on water quality: as they increase, sediment and nutrient loads increase. Width, in contrast, shows a diametrically opposite behavior. Since width represents the distance over which overland flow leaves the sub-catchment surface and enters the main drainage conduit, the larger the width, the larger the amount of pollutants that enter the drainage network.
The diagnostic capacity of this approach for evaluating model performance was successfully tested with HCA. At SB, the simulated events fell in the same clusters as the observed and generated events, whereas at RP the simulations were placed in the least similar clusters.
Further developments can be pursued within the proposed framework, in particular to evaluate possible model improvements aimed at reducing model structural uncertainty without increasing the consequent parameter uncertainty. For the purposes of this paper, however, the key element is the proposed methodological framework, whose diagnostic ability becomes crucial in real design cases where classical calibration/validation exercises are not possible owing to the lack of sufficient direct observations. The outcomes presented in this study are expected to help researchers and practitioners assess and quantify their confidence in water-quality predictions, thereby facilitating informed analysis, communication, and decision-making.