#### 2.2.1. Parameter Definition

Let $q\_{h,i}(d)$ be a random variable describing the water volume consumed by a single household $i$ within a specific hour $h$ of a specific day $d$; if $D$ is the number of days with hourly registrations, the recorded sample for hour $h$ contains at most $D$ observations. $M$ random samples of $N$ households are drawn from the database of $N\_{max}$ households ($1 \le N \le N\_{max}$), so that each household can belong to different samples but can be extracted only once within each sample. The aggregated water demand for each day $d$ at hour $h$ of the random sample $m$ is:

$$Q\_{h,N}^m(d) = \sum\_{i=1}^N q\_{h,i}(d) \qquad \qquad h = 1, \dots, 24 \tag{1}$$
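
The sampling scheme and Equation (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nested list `q[i][d][h]`, the function names, and the synthetic volumes are all assumptions, and hours are indexed 0 to 23 rather than 1 to 24.

```python
import random

def draw_samples(n_max, n, m, seed=0):
    """Draw m random samples of n distinct household indices out of n_max."""
    rng = random.Random(seed)
    # random.sample draws without replacement inside one sample; separate
    # calls are independent, so a household may recur across samples.
    return [rng.sample(range(n_max), n) for _ in range(m)]

def aggregate_demand(q, sample, d):
    """Equation (1): sum the household volumes for every hour of day d.

    q[i][d][h] is the (hypothetical) volume of household i, day d, hour h.
    """
    return [sum(q[i][d][h] for i in sample) for h in range(24)]

# Toy database: 5 households, 2 days, 24 hours, synthetic volumes.
rng = random.Random(1)
q = [[[rng.uniform(0.0, 50.0) for _ in range(24)] for _ in range(2)]
     for _ in range(5)]
samples = draw_samples(n_max=5, n=3, m=4)
Q_h = aggregate_demand(q, samples[0], d=0)
```

Each call to `rng.sample` produces one sample without repeated households, which matches the requirement that a household is extracted at most once per sample.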

For a group of $N$ households, the hourly peak water demand $Q\_{p,N}^{m}(d)$ of the random sample $m$ for each day $d$ is defined as:

$$Q\_{p,N}^{\rm m}(d) = \max\_{h=1,\ldots,24} \left[ Q\_{h,N}^{\rm m}(d) \right] \tag{2}$$

The daily mean water demand $Q\_{\mu,N}^{m}(d)$ of the random sample $m$ for a group of $N$ households for each day $d$ is expressed as:

$$Q^{\rm m}\_{\mu,N}(d) = \frac{\sum\_{h=1}^{24} Q^{\rm m}\_{h,N}(d)}{24} \tag{3}$$

Then, for a group of $N$ households, the dimensionless hourly peak water demand factor $CP\_{m,N}(d)$ of the random sample $m$ for each day $d$ is defined as:

$$CP\_{m,N}(d) = \frac{Q^m\_{p,N}(d)}{Q^m\_{\mu,N}(d)}\tag{4}$$
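
Equations (2) to (4) reduce to a maximum, an average, and a ratio over the 24 aggregated hourly values of one day. The sketch below assumes a plain list of hourly demands; the function name and the illustrative profile are hypothetical.

```python
def peak_factor(Q_h):
    """Return CP for one day from the 24 aggregated hourly demands Q_h."""
    if len(Q_h) != 24:
        raise ValueError("expected 24 hourly values")
    q_peak = max(Q_h)        # Equation (2): hourly peak demand
    q_mean = sum(Q_h) / 24   # Equation (3): daily mean demand
    if q_mean == 0:
        raise ValueError("peak factor is undefined for a day with no demand")
    return q_peak / q_mean   # Equation (4): dimensionless peak factor

# Illustrative profile: 10 units in every hour except a 30-unit peak.
profile = [10.0] * 24
profile[7] = 30.0
cp = peak_factor(profile)
```

A perfectly flat profile gives a peak factor of 1, which is the lower bound of $CP$ by construction.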

By the adopted notation, $CP\_{m,N}(d)$ stands for a $CP$ value belonging to the $m$-th sample of size $N$ and referring to day $d$. Depending on the purpose of the analysis, it can be seen either as part of a sub-sample of size $D$ made up of all the daily observations of $CP$ within one specific sample $m$, or as part of a sub-sample of size $M$ made up of all the observations referring to one specific day $d$ across all the extracted samples. In all cases, $CP\_{m,N}(d)$ is a single realization drawn from the population of the random variable $CP\_N$ with expected value $\mu\_N$.

#### 2.2.2. Expected Value, Variance, and Distribution of the Sample Mean

For a group of *N* households, the sample mean of the hourly peak water demand factor related to a sample *m* of size *D* is:

$$
\overline{CP}\_{m,N} = \frac{\sum\_{d=1}^{D} CP\_{m,N}(d)}{D} \tag{5}
$$

If the observations of the random variable $CP\_N$ are independent, the mean (i.e., the expected value) of the sample means $\overline{CP}\_{m,N}$ coincides with the population mean $\mu\_N$:

$$
\mu\_N = \frac{\sum\_{m=1}^{M} \overline{CP}\_{m,N}}{M} \tag{6}
$$
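
Equations (5) and (6) are two nested averages: first over the $D$ days of each sample, then over the $M$ samples. A minimal sketch, assuming the peak factors are stored as a hypothetical list of rows `cp[m][d]` with invented values:

```python
from statistics import fmean

def sample_mean(cp_days):
    """Equation (5): average of the D daily peak factors of one sample m."""
    return fmean(cp_days)

def mean_of_sample_means(cp):
    """Equation (6): average of the M sample means; cp[m][d] is CP_{m,N}(d)."""
    return fmean(sample_mean(row) for row in cp)

# Hypothetical CP values: M = 3 samples (rows), D = 4 days (columns).
cp = [
    [2.0, 2.5, 3.0, 2.5],
    [2.2, 2.4, 2.6, 2.8],
    [3.0, 2.0, 2.5, 2.5],
]
means = [sample_mean(row) for row in cp]
mu_hat = mean_of_sample_means(cp)
```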

The literature suggests an empirical relationship between $\mu\_N$ and the number $N$ of aggregated households in the following form [7,10,11,16,28]:

$$
\mu\_N = \frac{a}{N^b} + c \tag{7}
$$

where $a$, $b$, and $c$ are the estimated regression coefficients. The coefficient $c$ is the horizontal asymptote of the function, representing the expected value of the sample mean of the peak factor $CP\_N$ for a very large $N$.
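
To make the role of the asymptote concrete, Equation (7) can be evaluated for increasing $N$. The coefficients below are placeholders chosen for illustration only and are not values estimated in this paper; a positive exponent $b$ is assumed.

```python
def expected_peak_factor(n, a, b, c):
    """Equation (7): mu_N = a / N**b + c for N aggregated households."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return a / n**b + c

# Placeholder coefficients for illustration only; not values from the paper.
A, B, C = 4.0, 0.5, 1.5
curve = {n: expected_peak_factor(n, A, B, C) for n in (1, 4, 100, 10_000)}
```

With $b > 0$ the first term decays as $N$ grows, so the computed values approach $c$ from above.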

According to classical sampling theory, the standard deviation of the sample mean, usually referred to as the "standard error of the sample mean" [47], $ES\_{D,N}$, is directly related to the population variance and to the sample size $D$:

$$\text{ES}\_{D,N}^2 = Var\left\{ \overline{\text{CP}}\_{m,N} \right\} = Var\left\{ \frac{\sum\_{d=1}^D \text{CP}\_{m,N}(d)}{D} \right\} = \frac{1}{D^2} \sum\_{d=1}^D Var\left\{ \text{CP}\_{m,N}(d) \right\} = \frac{\sigma\_N^2}{D} \tag{8}$$

where $\sigma\_N^2$ is the population variance of $CP\_N$.
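
Under the independence assumption, Equation (8) can be sketched in a few lines. The population variance is not known in practice, so this example substitutes the unbiased sample variance as an estimate; that substitution, the function name, and the data are assumptions.

```python
from math import sqrt
from statistics import variance

def standard_error_independent(cp_values, d):
    """Equation (8): ES = sqrt(sigma_N^2 / D), assuming independent daily values.

    sigma_N^2 is estimated here by the unbiased sample variance.
    """
    sigma2 = variance(cp_values)
    return sqrt(sigma2 / d)

# Hypothetical daily CP values used to estimate the variance.
cp_values = [2.1, 2.6, 2.4, 3.0, 2.2, 2.7, 2.5, 2.9]
es_30 = standard_error_independent(cp_values, d=30)
es_120 = standard_error_independent(cp_values, d=120)
```

Quadrupling $D$ from 30 to 120 halves the standard error, which is the $1/\sqrt{D}$ behaviour implied by Equation (8).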

If the random variable $CP\_N$ is normally distributed, the sample mean is normally distributed too, with $\overline{CP}\_{m,N} \sim \mathcal{N}(\mu\_N, ES\_{D,N})$ independently of the sample size $D$. Otherwise, based on the central limit theorem, when the random sample becomes sufficiently large ($D \ge 30$), the distribution of the sample mean can be approximated by a normal distribution regardless of the specific distribution of $CP\_N$. The normality of the sample mean $\overline{CP}\_{m,N}$ can be verified with well-known statistical tests, such as the Kolmogorov-Smirnov (KS) test [48].
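
As a rough illustration of the KS check, the sketch below computes the KS distance between the empirical distribution of a set of sample means and a normal distribution fitted to them. It rests on several assumptions: the data are invented, the normal parameters are estimated from the same data (which makes the classical critical value conservative), and the threshold $1.36/\sqrt{n}$ is the asymptotic 5% value, only approximate for small $n$. A statistical library would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(values):
    """One-sample KS distance between the data and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d_max = max(d_max, i / n - f, f - (i - 1) / n)
    return d_max

# Hypothetical sample means, one per extracted sample m.
sample_means = [2.41, 2.55, 2.48, 2.60, 2.52, 2.45, 2.58, 2.50, 2.47, 2.53]
d_stat = ks_statistic_normal(sample_means)
# Asymptotic 5% critical value for a fully specified distribution.
d_crit = 1.36 / sqrt(len(sample_means))
normality_rejected = d_stat > d_crit
```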

If the observations of the random variable $CP\_N$ are not independent (as will be demonstrated in the present paper), Equation (6) is still valid, whereas the standard error of the sample mean can be estimated according to the following equation [45], which explicitly accounts for the covariance matrix:

$$\begin{aligned} \text{ES}\_{D,N}^2 &= \text{Var}\left\{\frac{\sum\_{d=1}^D \text{CP}\_{m,N}(d)}{D}\right\} \\ &= \sum\_{d=1}^D \frac{\text{Var}\{\text{CP}\_{m,N}(d)\}}{D^2} + \sum\_{i=1}^D \sum\_{\substack{j=1 \\ j \neq i}}^D \frac{\text{Cov}\{\text{CP}\_{m,N}(i), \text{CP}\_{m,N}(j)\}}{D^2} \\ &= \frac{1}{D^2} \left[ \sum\_{d=1}^D \text{Var}\{\text{CP}\_{m,N}(d)\} + \sum\_{i=1}^D \sum\_{\substack{j=1 \\ j \neq i}}^D \text{Cov}\{\text{CP}\_{m,N}(i), \text{CP}\_{m,N}(j)\} \right] \\ &= \frac{1}{D^2} \left[ \sum\_{i=1}^D \sum\_{j=1}^D \text{Cov}\{\text{CP}\_{m,N}(i), \text{CP}\_{m,N}(j)\} \right] \end{aligned} \tag{9}$$

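where the first term sums the cross-sample variance for each day $d$, and the second term sums the covariances between the values of every pair of distinct days $i \neq j$. Since $\text{Cov}\{X, X\} = \text{Var}\{X\}$, the two terms together equal the sum of all the elements of the covariance matrix.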
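
Equation (9) can be checked numerically: summing every element of the day-by-day covariance matrix and dividing by $D^2$ should reproduce the variance of the sample means computed directly. The sketch below assumes the peak factors are stored as a hypothetical list of rows `cp[m][d]` with invented values, and estimates each covariance across the $M$ samples with the unbiased estimator.

```python
from statistics import fmean, variance

def covariance(x, y):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def squared_standard_error(cp):
    """Equation (9): sum of Cov{CP(i), CP(j)} over all days i, j, over D**2.

    cp[m][d] holds CP_{m,N}(d); covariances are estimated across the M samples.
    """
    days = list(zip(*cp))  # one tuple of M values per day
    d = len(days)
    total = sum(covariance(days[i], days[j])
                for i in range(d) for j in range(d))
    return total / d**2

# Hypothetical CP values: M = 4 samples (rows), D = 3 days (columns).
cp = [
    [2.0, 2.4, 2.2],
    [2.6, 2.9, 2.5],
    [2.3, 2.2, 2.4],
    [2.8, 3.1, 2.9],
]
es2 = squared_standard_error(cp)
es2_direct = variance([fmean(row) for row in cp])
```

The two estimates agree up to floating-point rounding, because the variance of an average is the average of all pairwise covariances.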

Equation (9) estimates the standard error of the sample means. When sample data are extracted from a finite population, as in the present paper, the standard error can be underestimated, because there is a high probability that the same elements are extracted in different samples. Indeed, for $N = N\_{max}$ Equation (9) gives a null standard error, a degenerate result caused by the fact that the $M$ samples are made up of exactly the same $CP\_N$ values. For an infinite population, instead, a finite, although small, standard error should be expected even for very high $N$ values.
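
The degenerate case $N = N\_{max}$ can be reproduced with a toy experiment. As a simplification, the sketch tracks the total volume of each sample instead of the peak factor, and the six household volumes are invented; the point is only that samples containing every household are identical, so their spread is zero.

```python
import random
from statistics import pvariance

def spread_of_sample_totals(volumes, n, m, seed=0):
    """Variance across m samples of the total volume of n distinct households."""
    rng = random.Random(seed)
    totals = [sum(rng.sample(volumes, n)) for _ in range(m)]
    return pvariance(totals)

# Hypothetical hourly volumes (litres) for N_max = 6 households;
# integers keep the sums exact regardless of the order of extraction.
volumes = [12, 30, 7, 45, 22, 18]
var_small_n = spread_of_sample_totals(volumes, n=2, m=200)
var_full_n = spread_of_sample_totals(volumes, n=len(volumes), m=200)
```

Here `var_full_n` is exactly zero, while `var_small_n` is positive because samples of two households differ from one draw to the next.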

If there is no spatial correlation among water demands, the covariance term in Equation (9) vanishes and the variance reduces to Equation (8), which is inversely proportional to the sample size $D$. In any other case, also accounting for the finite population effect (i.e., a null asymptotic value of $ES$), the dependence of $ES$ on $D$ can be formulated for each $N$ in the generic form:

$$ES\_D = \alpha\_1 \times D^{\beta\_1} \tag{10}$$

where the coefficients depend on the structure of the spatial correlation [30,41]. Since Equation (10) can be applied for each fixed *N*, the following general equation is proposed to consider the additional dependence of the variance on *N*:

$$ES\_{D,N} = \frac{\alpha\_1 \times D^{\beta\_1}}{(\alpha\_2 + N)^{\beta\_2}} \tag{11}$$
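
One possible way to estimate the coefficients of Equation (10) for a fixed $N$ is ordinary least squares on the logarithms, since $\ln ES\_D = \ln \alpha\_1 + \beta\_1 \ln D$. The fitting procedure used in this paper is not stated here, so the log-log regression below is an assumption, as are the function names and the synthetic data. A direct evaluation of Equation (11) is included for completeness.

```python
from math import exp, log

def fit_power_law(d_values, es_values):
    """Least-squares fit of ES = alpha1 * D**beta1 in log-log space."""
    xs = [log(d) for d in d_values]
    ys = [log(es) for es in es_values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    alpha1 = exp(y_bar - beta1 * x_bar)
    return alpha1, beta1

def standard_error_model(d, n, alpha1, beta1, alpha2, beta2):
    """Equation (11): ES as a function of sample size D and group size N."""
    return alpha1 * d**beta1 / (alpha2 + n) ** beta2

# Synthetic check: data generated from ES = 0.8 * D**-0.5 should be recovered.
d_values = [10, 30, 60, 120, 365]
es_values = [0.8 * d**-0.5 for d in d_values]
alpha1, beta1 = fit_power_law(d_values, es_values)
```

Fitting in log space weights relative rather than absolute errors; a nonlinear fit on the original scale would be an equally plausible choice.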

When the distribution of the sample mean $\overline{CP}\_{m,N}$ for a group of $N$ households is (at least approximately) normal, the lower and upper limits $\left[\overline{CP}\_{m,N}\right]\_p$ of a confidence interval centered on the mean $\mu\_N$, for a predefined probability $p$, are:

$$\left[\overline{CP}\_{m,N}\right]\_p = \mu\_N \pm \xi\_p \times ES\_{D,N} \tag{12}$$

where $\xi\_p$ is the normal $p$-th quantile; the standard error $ES\_{D,N}$ can be estimated as the square root of either Equation (8) or Equation (9), or directly by its empirical approximation provided by Equation (11), based on the probability distribution of the random variable $CP\_N$. If the sample mean $\overline{CP}\_{m,N}$ is normally distributed, substituting the 2.5th and 97.5th normal percentile values ($\xi\_p = \pm 1.96$) in Equation (12) yields the 95% confidence interval.
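
Equation (12) can be sketched with the standard normal quantile function from the Python standard library. The inputs $\mu\_N = 2.5$ and $ES\_{D,N} = 0.1$ are hypothetical, and the function name is an assumption.

```python
from statistics import NormalDist

def confidence_interval(mu_n, es_dn, p=0.95):
    """Equation (12): two-sided interval mu_N -/+ xi_p * ES_{D,N}."""
    # Upper quantile of the standard normal; about 1.96 for p = 0.95.
    xi_p = NormalDist().inv_cdf(0.5 + p / 2)
    return mu_n - xi_p * es_dn, mu_n + xi_p * es_dn

# Hypothetical inputs: mu_N = 2.5 and ES_{D,N} = 0.1.
lower, upper = confidence_interval(2.5, 0.1)
```

For these inputs the interval is roughly 2.30 to 2.70, that is, the mean plus or minus 1.96 standard errors.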
