## **1. Introduction**

In the last decades, the understanding and prediction of water consumption have become a focal point of EU policies and directives, with the general aim of supporting safe access to drinking water and basic sanitation services for the population. In this context, the estimation of water demand in a distribution system is a key issue when applying management strategies to reduce costs and preserve the resource [1].

The water demand of a single user exhibits a random and pulsing behaviour; however, the aggregation of a large number of consumers is able to highlight trends, seasonal cycles, and the possible existence of peaks. Such quantities usually have different values and features according to the scales of the measured or aggregated data (hourly, daily, weekly, monthly, seasonally, yearly). The estimate of peak values is crucial to design drinking water distribution networks, in order to obtain reliable systems, able to provide a good level of service in terms of demands and pressures [2]. The knowledge of water consumptions and of the relative peak values is also required in many applications where the simulation of the system functioning is needed, whose results are strongly affected by demand uncertainty [3–6].

The estimation of hourly or sub-hourly peak demand due to residential uses has been widely studied using different methods and techniques. Top-Down Deterministic Approaches (TDAs) provide empirical relationships based on the number of users for the estimation of the hourly or sub-hourly demand peak factor, defined as the ratio between the maximum and mean flow. TDAs usually focus on the whole network, analysing the water consumption of the total served population. The first relationships [7,8] estimated the dependence on the population of the instantaneous peak factor in sewer systems. Some research found that the hourly peak factor can be considered constant when the population is lower than a fixed threshold, while it decreases when the users exceed the threshold value [9]. Those empirical equations for wastewater peak factors tend to be restricted to a minimum population of one thousand and a maximum population of one million. More recently, a formula was proposed for characterizing the mean value of the peak water demand for small towns through statistical inferences on a large database [10], providing a lower estimate compared to Babbitt's formula [7]. Moreover, the effect of the data time sampling interval on the evaluation of the peak factor was investigated [10]. The dependence of peak factors on the number of users was also investigated [11] to provide empirical relationships for the estimation of the parameters of the Gumbel probability distribution, which is able to represent the stochastic behaviour of peak water demand.
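As a minimal illustration of the peak factor definition given above (ratio between the maximum and the mean flow), the following sketch computes it for a short flow series. The function name `peak_factor` and the flow values are hypothetical and serve only to illustrate the definition; they are not taken from the cited studies.

```python
from statistics import mean


def peak_factor(flows):
    """Return the ratio between the maximum and the mean of a flow series."""
    avg = mean(flows)
    if avg <= 0:
        raise ValueError("mean flow must be positive")
    return max(flows) / avg


# Hypothetical hourly flows (L/s), chosen only to illustrate the definition.
hourly_flows = [2.0, 1.0, 1.0, 3.0, 6.0, 4.0, 3.0, 4.0]
pf = peak_factor(hourly_flows)  # maximum 6.0 divided by mean 3.0
```

By construction the peak factor is never lower than 1, since the maximum of a series cannot be smaller than its mean.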

Bottom-Up Approaches (BUAs) try to reconstruct nodal demands generating a large number of synthetic realizations of individual users' consumptions described by a stochastic variable. It has been proved that at the fine temporal scale the nodal demand takes the shape of a pulse [12]. In this context, temporal trends of instantaneous nodal consumptions are reconstructed aggregating demands produced by stochastic pulse generation methods, such as the Poisson Rectangular Pulse (PRP) (e.g., [13–18]) or the cluster Neyman-Scott Rectangular Pulse (NSRP) (e.g., [4,19,20]). A single pulse is associated with each demand event, whose arrival time is described through a Poisson process. In the proposed methods, pulse duration and intensity have been generated assigning different specific probability distributions: Normal [15], exponential [4,19–21], log-normal [12] for the duration; exponential [4,15,19,20], Weibull [21], log-normal [12] for the intensity. More recently, a method was proposed to account for the correlation between pulse duration and intensity, which led to some improvement in pulse consistency [22].
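The pulse generation idea described above can be sketched as follows, assuming a homogeneous Poisson arrival process and exponential distributions for both duration and intensity (one of the combinations listed above). All function names and parameter values are hypothetical, and the sketch is not the implementation used in any of the cited works.

```python
import random


def generate_prp_pulses(rate_per_s, mean_duration_s, mean_intensity_lps,
                        horizon_s, rng):
    """Generate rectangular pulses as (start, duration, intensity) tuples.

    Arrivals follow a Poisson process, so inter-arrival times are
    exponential with mean 1 / rate_per_s. Duration and intensity are
    drawn from exponential distributions (an assumption of this sketch).
    """
    pulses = []
    t = rng.expovariate(rate_per_s)
    while t < horizon_s:
        duration = rng.expovariate(1.0 / mean_duration_s)
        intensity = rng.expovariate(1.0 / mean_intensity_lps)
        pulses.append((t, duration, intensity))
        t += rng.expovariate(rate_per_s)
    return pulses


def demand_at(pulses, t):
    """Instantaneous demand at time t: sum of the pulses active at t."""
    return sum(q for (start, d, q) in pulses if start <= t < start + d)


rng = random.Random(42)
day_s = 24 * 3600
# Hypothetical parameters: about 50 pulses per day, 60 s mean duration,
# 0.1 L/s mean intensity.
pulses = generate_prp_pulses(50 / day_s, 60.0, 0.1, day_s, rng)
```

Overlapping pulses simply add up, which is how the demand of several users or fixtures is aggregated in this kind of model.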

To apply these methods, model parameters need to be assigned. The parameter values can be derived from pulse features measured by monitoring consumptions with an ultra-high time resolution [12,17,18], or by reproducing the statistical properties of aggregated consumptions when they are known at a coarser temporal step (1 min or larger) [19,23].

In this context, Blokker et al. [24,25] proposed the SIMDEUM model for the reconstruction of water consumptions starting from the micro-components of water demand. The PRP model was used, but different distributions were adopted to generate the pulses produced by the different household fixtures and users. Its parametrization therefore requires knowledge of the occupants' habits and of the end uses of the fixtures, obtained from a survey of the considered households. This can be done, for example, by analysing the water end-users that drive peak daily demand and examining their diurnal demand patterns using data obtained from high resolution smart meters [26]. The PRP and SIMDEUM models have similar performances [27], with the former prevailing at the single household scale and the latter prevailing in the case of multiple households. In all cases, BUAs require a significant computational effort and, for their parametrization, a detailed knowledge of the consumptions at a small spatial scale.

More recently, a probabilistic approach was proposed for a reliable estimation of the maximum residential water demand represented by a single variable [28], showing the reliability of the log-normal and Gumbel distributions in representing peak water demand during the day. The authors suggested practical equations for the estimation of the expected value and coefficient of variation of the daily peak factor and investigated time scaling effects.

Many studies investigated the influence of the acquisition time step in water demand modelling [23,29,30]. In this context, analysing water consumption data recorded at time intervals from 5 min to one hour, a significant effect of the sampling time step was observed [31] and new equations were derived for the evaluation of the peak value. A comparison of the instantaneous estimate of the maximum demand obtained at a 1 s time step through a BUA with the one computed from hourly average estimates using a TDA was also performed [32]. As expected, results showed that the latter gives smaller demand values, especially at small spatial aggregation scales, while at increasing aggregation levels the difference decreases, because the random fluctuations tend to be smoothed, with consequently smaller peak values.
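The smoothing effect of a coarser time step can be illustrated with a small sketch: averaging a fine-resolution series over longer blocks preserves the mean (when the blocks have equal length) but can only lower the maximum, so the peak factor decreases or stays the same. The helper names and the numerical values are hypothetical and are not data from the cited studies.

```python
from statistics import mean


def peak_factor(series):
    """Ratio between the maximum and the mean of a demand series."""
    return max(series) / mean(series)


def block_average(series, block):
    """Average a series over consecutive blocks of equal length."""
    if len(series) % block != 0:
        raise ValueError("series length must be a multiple of block")
    return [mean(series[i:i + block]) for i in range(0, len(series), block)]


# Hypothetical fine-resolution demand (arbitrary units): one short spike
# followed by a steady use.
fine = [0.0, 12.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0]
coarse = block_average(fine, 4)  # block means: [3.0, 4.0]

pf_fine = peak_factor(fine)      # 12.0 / 3.5
pf_coarse = peak_factor(coarse)  # 4.0 / 3.5
```

In this example the short spike dominates the fine-resolution peak factor, whereas after averaging the peak shifts to the block with steady use.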

The effect of spatial aggregation is less studied. First attempts investigated the effect of both time step and spatial aggregation on the cross-correlation between nodal demands, however limiting the analysis to a group of five and ten houses [33]. Results highlighted an increase in correlation for increasing spatial aggregation, while a decrease of the standard deviation was observed.

In the last decades, the growing development of smart meter systems for household water consumption monitoring provided new modelling perspectives [34,35]. Smart metering can provide data recordings at different levels of accuracy, from 1 s to hours, depending on the characteristics of the system and on the objective of the investigations [36,37]. To keep the economic impact reasonable, water companies have started installing smart water meters that are usually placed in a large number of households and collect hourly measurements. In fact, water companies are mainly interested in controlling and understanding aggregated consumptions in order to make decisions on pricing strategies, on future interventions, and on consumption reduction. Some approaches have been recently proposed for modelling demand patterns using measurements at large time steps [38–41].

A first objective of the present study is to understand how water companies can obtain information about the estimate of the peak factor, starting from measurements carried out for different purposes on large networks with an hourly temporal scale. The paper presents a methodology for performing a statistical analysis of hourly data in order to analyse the behaviour of the hourly peak demand values as a function of spatial data aggregation using a high number of measurements. The considered test case is a large District Metering Area of the water distribution network of Naples (Italy) equipped with a smart metering system, which provided water demand measurements with a one-hour time aggregation on more than 1000 households for one year [39,40]. The main novelty of the study lies in the complex sampling design adopted, which allows treating hourly peak factors as stochastic variables for each fixed number of aggregated meters, accounting for possible cross-correlation and finite population effects. In this way, the main statistics (including expected values and variability) of the peak factor can be obtained as a function of the size of the considered group of users, and compared with other indications from the literature, suitably scaled to account for different time scales. The main goal of the research is to provide the operators with a procedure for understanding the reliability of the network in terms of demand and pressure at different levels of users' aggregation using available data. This information is particularly useful to analyse the behaviour of old water networks, where the operating conditions may differ from the ones considered at the design stage, or to design future measures to improve the system management, such as the creation of District Metering Areas.
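The idea of treating the peak factor as a stochastic variable for each fixed number of aggregated meters can be sketched as follows. This is only an illustration under simplifying assumptions: groups are drawn by simple random sampling without replacement, which is not necessarily the sampling design adopted in the study, and the hourly data are synthetic and independent between meters. All names and values are hypothetical.

```python
import random
from statistics import mean, pstdev


def peak_factor(series):
    """Ratio between the maximum and the mean of a demand series."""
    return max(series) / mean(series)


def sample_peak_factors(meter_series, group_size, n_draws, rng):
    """Peak factors of the aggregated demand for random groups of meters."""
    factors = []
    for _ in range(n_draws):
        # Simple random sampling without replacement (assumption of this sketch).
        group = rng.sample(meter_series, group_size)
        aggregated = [sum(values) for values in zip(*group)]
        factors.append(peak_factor(aggregated))
    return factors


rng = random.Random(0)
# Synthetic hourly data: 50 hypothetical meters, 24 hourly values each.
meters = [[rng.uniform(0.0, 1.0) + 0.1 for _ in range(24)] for _ in range(50)]

# Mean and coefficient of variation of the peak factor for each group size.
summary = {}
for n in (1, 10, 50):
    pf = sample_peak_factors(meters, n, n_draws=200, rng=rng)
    summary[n] = (mean(pf), pstdev(pf) / mean(pf))
```

When the group size equals the total number of meters, every draw selects the whole population and the variability of the peak factor vanishes, which is a simple example of the finite population effect mentioned above.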

The paper is structured as follows. Section 2 describes the District Metering Area under investigation and the collected measurements, the main objectives and features, and the methodological framework of the analysis. Section 3 reports the outcomes, while Section 4 provides a discussion about the applicability of results. Finally, significant conclusions are drawn in Section 5.

## **2. Materials and Methods**
