**3. Results**

The methodology proposed in the previous sections was applied to the water consumption database of the pilot area, made up of *Nmax* = 1162 connections, each corresponding to a registered consumption time series. The analysis was initialized by setting 50 different values of *N* to be tested, ranging between 1 and 1162 (Table 1). For each *N*, *M* = 150 samples of *N* time series were randomly extracted from the consumption database and aggregated. Then, for each aggregated series, hourly peak demand factors *CPm*,*N*(*d*) were computed for the *m*-th sample by means of Equation (5). The total number of available monitored days *Dmax* in the 2016 database is equal to 322; thus, for each day of the week, the maximum number of monitored days is 46 (Table 1).
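
The resampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the database is a list of households, each holding a list of days with 24 hourly volumes, and it assumes the daily peak factor is the ratio of the maximum to the mean hourly demand, which may differ from the exact form of Equation (5). All function names are hypothetical. Sampling is done without replacement, in line with the later remark that for *N* = *Nmax* every sample contains the same elements.

```python
import random


def daily_peak_factor(hourly):
    """Peak factor of one day: max hourly demand / mean hourly demand (assumed definition)."""
    mean = sum(hourly) / len(hourly)
    return max(hourly) / mean


def aggregate(series_list):
    """Sum several hourly series element-wise into one aggregated series."""
    return [sum(values) for values in zip(*series_list)]


def sample_peak_factors(database, n, m, rng):
    """Return m lists of daily peak factors, one list per random sample of n households.

    database: list of households; each household is a list of days;
    each day is a list of hourly volumes.
    """
    results = []
    for _ in range(m):
        chosen = rng.sample(database, n)  # n households, drawn without replacement
        n_days = len(chosen[0])
        factors = []
        for d in range(n_days):
            day_total = aggregate([house[d] for house in chosen])
            factors.append(daily_peak_factor(day_total))
        results.append(factors)
    return results


# Tiny synthetic database (made-up numbers): 5 identical households, 1 day each.
toy_day = [1] * 23 + [25]
toy_database = [[toy_day] for _ in range(5)]
toy_result = sample_peak_factors(toy_database, n=3, m=4, rng=random.Random(0))
```

In the setting of the paper, `n` would range over the 50 tested values of *N* and `m` would be 150.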

The sample means *CPm*,*N* were computed by means of Equation (6), gathering the *CPm*,*N*(*d*) values in seven groups according to the day of the week. Then, an ANOVA test was performed to highlight possible differences in the behaviour of peak factors during the week. Figure 3 shows the results of the ANOVA test as box-plots of peak factor sample means for two different numbers of aggregated households, *N* = 10 and *N* = 1000. The ANOVA outcomes highlight significant differences in the expected values of sample means among weekdays, Saturdays, and Sundays, so that three clusters can be identified, consistent with the findings in [40]. Moreover, as expected, those differences become more evident as *N* increases and can be considered statistically significant starting from *N* = 5–10.
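
As an illustration of the test applied to the seven day-of-week groups, the sketch below computes the one-way ANOVA *F* statistic from first principles. It is a generic textbook computation, not the authors' implementation, and the function name is hypothetical. Turning *F* into a p-value needs the *F* distribution, which a statistics library such as `scipy.stats.f_oneway` provides; that step is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        sum((x - gm) ** 2 for x in g) for g, gm in zip(groups, group_means)
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up example with three groups; the paper uses seven (one per day of the week).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large *F* relative to the critical value for (`df_between`, `df_within`) degrees of freedom indicates that at least one group mean differs from the others.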


**Table 1.** Cluster definition and relevant parameters.

**Figure 3.** Box-plot of hourly peak factor sample means for the different days of the week and for two different numbers of aggregated households: (**a**) *N* = 10 and (**b**) *N* = 1000.

Figure 3 also shows that the expected value of the peak factors is higher on weekends than on weekdays. A possible explanation is that, during the weekend, people tend to adopt predictable schedules, which translates into more homogeneous consumption behaviours, more synchronous water uses and, therefore, more coherent diurnal water demand patterns. Finally, Figure 3 indicates that the differences among the three clusters are statistically significant and should be accounted for in further analyses. As a consequence, four clusters are investigated separately in the following sections: Cluster 1, made up of Saturdays (*Dmax* = 46); Cluster 2, made up of Sundays (*Dmax* = 46); Cluster 3, made up of the remaining weekdays (*Dmax* = 230); and Cluster 4, made up of all the days of the week (*Dmax* = 322). This last cluster is included to assess how much cluster separation matters when evaluating the statistics of interest.

#### *3.1. Sample Mean: Expected Value, Standard Error, and Scaling Laws*

According to the proposed methodology, the analysis of hourly peak factor sample means consists of the estimation of the expected value, associated standard deviation, and confidence band.

For each *N* value, *M* sample means *CPm*,*N* were computed by means of Equation (5) and the corresponding expected values μ*N* were estimated by means of Equation (6); then, the empirical relation between *N* and μ*N* was found by calibrating the parameters of Equation (7). Sample means and expected values are shown, for each Cluster, in Figure 4 as a function of the number of aggregated households. Table 1 shows the estimated values of the regression coefficients *a*, *b*, and *c*, together with the coefficient of determination, which is very high for all Clusters. Figure 5a compares the observed expected values, computed by means of Equation (6), with the predicted expected values, obtained from Equation (7), for all Clusters. The points gather almost perfectly along the 1:1 line, showing close agreement between observed and predicted values, with just a slight deviation for the highest mean values, corresponding to *N* = 1.

**Figure 4.** Sample means, expected values, and confidence bands as a function of the number of aggregated households for: (**a**) Cluster 1; (**b**) Cluster 2; (**c**) Cluster 3; (**d**) Cluster 4.

**Figure 5.** (**a**) Accordance between expected values of sample means estimated by Equations (6) and (7) for all the Clusters. (**b**) Comparison among calibrations of Equation (7) performed on the different Clusters.

Table 1 and Figure 5b show that the regression curves of the four Clusters are very similar, differing only in the value of the *c* coefficient, which represents the expected value of the hourly peak demand factor for a large number of households. As Figure 4 shows, this asymptotic value can be considered attained for *N* > 100–200 for every Cluster. Figure 5b and Table 1 also show that the highest asymptotic expected value is observed for the Sundays Cluster, followed by the Saturdays and the Weekdays Clusters. Cluster 4 shows intermediate values.
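
The calibration of a three-parameter curve with an asymptote can be sketched as below. The functional form μ = *a*·*N*^(−*b*) + *c* is an assumption made here for illustration; it matches the role of *c* as the large-*N* limit but is not necessarily the exact form of Equation (7). The function name, the grid over *b*, and the synthetic data are all hypothetical and unrelated to the values in Table 1.

```python
def fit_asymptotic_power_law(n_values, mu_values, b_grid):
    """Fit mu = a * N**(-b) + c (assumed form, not necessarily Equation (7)).

    b is chosen by grid search; for each b, a and c follow from ordinary
    least squares of mu against x = N**(-b). Returns (a, b, c, r_squared).
    """
    k = len(n_values)
    mean_mu = sum(mu_values) / k
    sst = sum((y - mean_mu) ** 2 for y in mu_values)
    best = None
    for b in b_grid:
        x = [n ** (-b) for n in n_values]
        mean_x = sum(x) / k
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:  # x is constant (e.g. b = 0): a and c are not identifiable
            continue
        sxy = sum((xi - mean_x) * (y - mean_mu) for xi, y in zip(x, mu_values))
        a = sxy / sxx
        c = mean_mu - a * mean_x
        sse = sum((a * xi + c - y) ** 2 for xi, y in zip(x, mu_values))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    sse, a, b, c = best
    return a, b, c, 1.0 - sse / sst


# Synthetic check with made-up coefficients (a = 5, b = 0.5, c = 2).
n_values = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1162]
mu_values = [5.0 * n ** (-0.5) + 2.0 for n in n_values]
b_grid = [i / 100 for i in range(1, 201)]
a_fit, b_fit, c_fit, r2 = fit_asymptotic_power_law(n_values, mu_values, b_grid)
```

Under this assumed form, the fitted `c_fit` plays the role of the asymptotic expected value discussed above, while `a_fit` and `b_fit` control how quickly the curve approaches it.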

As Figure 4 shows, for a fixed *N*, the *M* sample means *CPm*,*N* show a non-negligible variability, which can be quantified by means of the standard error *ESD*,*N*. To compute standard errors, the regression coefficients in Equation (11) were calibrated for each Cluster using the estimate of *ESD*,*N* provided by Equation (9); their values are shown in Table 1, along with the coefficient of determination, which is very high. To capture the dependence of *ESD*,*N* on both *N* and *D*, different values of *D* were tested in the range *Dmin*–*Dmax*, where *Dmin* = 30 was set to ensure normality, as previously mentioned. However, in all cases the dependence on *D* turned out to be negligible compared with that on the aggregation level: the exponent β1 took very small values and was approximated to zero for all Clusters (Table 1). Moreover, Table 1 shows that β2 is equal to 0.5 for all the Clusters, which simplifies the proposed regression equation.
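
With β1 set to zero, a power-law dependence of the standard error on *N* alone can be checked with a straight-line fit on log-log axes. The sketch below assumes the reduced form *ESD*,*N* = α·*N*^(−β2), which is consistent with the exponents named in the text but is not confirmed to be the exact form of Equation (11); α, the function name, and the synthetic data are hypothetical.

```python
import math


def fit_power_law(n_values, se_values):
    """Fit se = alpha * N**(-beta2) by least squares on log-log axes.

    Returns (alpha, beta2). All inputs must be strictly positive.
    """
    x = [math.log(n) for n in n_values]
    y = [math.log(s) for s in se_values]
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic standard errors with made-up alpha = 3 and beta2 = 0.5.
ns = [1, 4, 16, 64, 256]
ses = [3.0 / math.sqrt(n) for n in ns]
alpha_fit, beta2_fit = fit_power_law(ns, ses)
```

A fitted exponent close to 0.5 corresponds to the familiar 1/√*N* decay of the standard error of a mean.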

Figure 6 shows, for each Cluster, the regression curve provided by Equation (11) together with the standard errors estimated as the square root of Equation (9) for three different values of *D* (*Dmin*, *Dmax*, and an intermediate value depending on *Dmax*). Consistent with the approximation β1 = 0, no effect of the number of recording days can be observed: all the points gather along the regression curve, with just a slight deviation for *N* = 1.

**Figure 6.** Standard errors of the sample mean estimated by Equation (9) and predicted by Equation (11) for three different *D* for: (**a**) Cluster 1; (**b**) Cluster 2; (**c**) Cluster 3; (**d**) Cluster 4.

As a goodness-of-fit measure, Figure 7a compares the squared standard error estimated by means of Equation (9) with the values predicted by Equation (11) for all the Clusters, with regression coefficients shown in Table 1. The points in Figure 7a gather almost perfectly along the 1:1 line, indicating that Equation (11) predicts the standard error of the sample mean very well. Figure 7b compares the prediction curves of the standard error as a function of *N* for the different Clusters for *D* = *Dmax*. The four curves show small differences for small values of *N*, which become negligible for *N* > 100–200. Consistently, in the same range of *N*, the prediction curve for μ*N* reaches its asymptotic value for all the Clusters, which suggests a very high accuracy in the estimation of the expected value of the sample mean for *N* > 100–200. On the other hand, this can be regarded as an effect of investigating a finite population. Indeed, if the same *N* values were analysed based on a more extended database (i.e., if a higher number of recorded households were monitored), higher values for the standard error would possibly be expected. Moreover, for *N* = *Nmax*, each of the *M* samples is made up of the same elements, so that the *M* estimates of the sample mean are equal, and the standard error of the sample mean is equal to zero.
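
The finite-population effect can be reproduced with a small simulation. The sketch below draws repeated samples without replacement from a synthetic population of made-up integer values (a stand-in for the households, not the pilot-area data) and measures the spread of the resulting sample means; the function name is hypothetical.

```python
import random
import statistics


def se_of_sample_mean(population, n, m, rng):
    """Standard deviation of m sample means, each from n items drawn without replacement."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(m)]
    return statistics.pstdev(means)


rng = random.Random(0)
population = [rng.randint(1, 100) for _ in range(200)]  # synthetic stand-in values

# Spread of the sample mean for increasing sample sizes, up to the whole population.
spread_by_n = {n: se_of_sample_mean(population, n, 150, rng) for n in (1, 10, 100, 200)}
```

When `n` equals the population size, every sample contains the same elements, so all sample means coincide and the computed spread is exactly zero, mirroring the behaviour described for *N* = *Nmax*.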

**Figure 7.** (**a**) Accordance between the standard error estimated for *D* = *Dmax* by Equations (9) and (11) for all the Clusters. (**b**) Comparison among calibrations of Equation (11) performed on the different Clusters for *D* = *Dmax*.

The empirical estimates of the standard error were adopted in Equation (12) to obtain the 95% confidence band centred on the expected value of the sample mean, as shown in Figure 4. Confirming the previous evidence, the confidence band narrows as *N* increases, with a width that can be considered negligible for *N* > 100–200.
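
A band of this kind can be sketched as follows, assuming Equation (12) is the usual normal-approximation interval, i.e., the expected value plus or minus a standard normal quantile times the standard error. That form is an assumption based on the normality requirement mentioned earlier; the function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist


def confidence_band(mu, se, level=0.95):
    """Symmetric normal-approximation band around mu: mu -/+ z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return mu - z * se, mu + z * se


# Made-up expected value and standard error, for illustration only.
lower, upper = confidence_band(2.0, 0.1)
```

Because the band width is proportional to the standard error, any decay of the standard error with *N* carries over directly to the width of the band.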
