**1. Introduction**

A water distribution system (WDS) comprises various components (e.g., nodes, pipes and pumps), each of which has its own purpose and function. For example, a pumping unit elevates the total head of water to supply demand at high elevation, whereas a pipe transports water from one location to another. Hydraulic WDS models have been developed for many purposes, including design and rehabilitation, operation and management, and system surveillance. During the development of a hydraulic model, the real-world system is simplified for modeling purposes. As shown in Figure 1, a group of households (h1–h5) can be simplified to node N1, because they are the same type of users (*i.e*., residential) and are in proximity to each other. Similarly, households h6–h9 can be modeled as node N2. Node N3 is the sum of the commercial demands c1–c5.

**Figure 1.** Residential and commercial demands represented as nodes in a hydraulic WDS model. The dashed ellipse shows a potential node group.

State estimation is defined as the process of calculating a state variable of interest that cannot be directly measured [1,2]. In a WDS, nodal demands are often treated as state variables (*i.e.*, unknown variables) and can be estimated from nodal pressures and pipe flow rates measured in the field. Previous demand forecasting/estimation studies have focused on estimating hourly to monthly macro-scale demand (*i.e.*, total system demand) using time series models [3–6]. A few recent studies have proposed methods for estimating micro-scale demand (*i.e.*, nodal demands) at a finer time step (e.g., 15 min) by coupling the Kalman filter (KF) with a hydraulic simulator (e.g., EPANET [7]) [2,8]. Advances in sensor technology, such as longer battery life and higher communication frequency, have encouraged the development of demand estimation techniques that can use field measurements from advanced sensors (e.g., advanced metering infrastructure (AMI)).
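
To illustrate the KF-based idea, the following Python sketch performs predict/update cycles of a linear Kalman filter for node-group demands. It assumes a random-walk demand model and a linear (linearized) measurement matrix `H` that maps demands to sensor readings; in the cited studies the hydraulic simulator (e.g., EPANET) provides this mapping, so `H`, `kf_step`, and all numbers below are hypothetical.

```python
import numpy as np

def kf_step(x, P, y, H, Q, R):
    """One predict/update cycle of a linear Kalman filter (illustrative sketch).

    Assumed state model: random walk, x_k = x_{k-1} + w_k with w_k ~ N(0, Q).
    Assumed measurement model: y_k = H @ x_k + v_k with v_k ~ N(0, R),
    where H is a hypothetical linearized demand-to-sensor sensitivity matrix.
    """
    # Predict: a random walk keeps the mean and inflates the covariance.
    x_pred = x
    P_pred = P + Q
    # Update: correct the prediction with the measurement residual.
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical example: two node-group demands observed by two sensors.
rng = np.random.default_rng(0)
H = np.array([[1.0, 1.0],    # e.g., a flowmeter influenced by both groups
              [0.2, 0.8]])   # e.g., a sensor dominated by group 2
true_demand = np.array([10.0, 5.0])
x, P = np.zeros(2), 100.0 * np.eye(2)
Q, R = 0.01 * np.eye(2), 1e-4 * np.eye(2)
for _ in range(50):
    y = H @ true_demand + rng.normal(0.0, 0.01, size=2)  # synthetic measurements
    x, P = kf_step(x, P, y, H, Q, R)
print(x)  # close to [10, 5]
```

The random-walk model is a common default when no demand dynamics are assumed; in practice `Q` controls how quickly the estimate follows demand changes.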

WDS demand estimation can produce accurate results when sufficient field measurements are available. However, budget constraints typically prevent installing a sufficient number of meters in a system. In demand estimation, the nodal demands are the unknown variables, whereas the measured pipe flows and nodal pressures are the known variables. When the unknowns outnumber the measurements, the estimation problem is underdetermined; therefore, the number of unknowns should be reduced to make the problem even-determined (or overdetermined). One way to do so is to aggregate (*i.e.*, group) the nodes. For example, nodes N1 and N2 can be aggregated into Node Group 1 (NG1) because they share the same residential usage pattern (Figure 1).
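
The counting argument can be made concrete with a small Python sketch. It assumes one unknown demand per node (or per node group) and compares that count with the number of measurements; the names `classify_problem` and `group_of` are hypothetical, and the grouping mirrors Figure 1 with two assumed meters.

```python
def classify_problem(n_unknowns: int, n_measurements: int) -> str:
    """Classify an estimation problem by counting unknowns and measurements."""
    if n_unknowns > n_measurements:
        return "underdetermined"
    if n_unknowns == n_measurements:
        return "even-determined"
    return "overdetermined"

# Hypothetical grouping based on Figure 1: N1 and N2 are residential, N3 is commercial.
group_of = {"N1": "NG1", "N2": "NG1", "N3": "NG2"}
n_measurements = 2  # assumed number of meters

print(classify_problem(len(group_of), n_measurements))               # underdetermined
print(classify_problem(len(set(group_of.values())), n_measurements)) # even-determined
```

Counting is only a necessary condition: the measurements must also carry independent information about every group (i.e., the demand-to-measurement sensitivity matrix must have full column rank).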

Many methodologies have been proposed for the WDS component grouping, clustering and aggregation in the last two decades. However, most do not target demand estimation. Mallick *et al.* [9] investigated the tradeoff between the WDS model error caused by simplification (*i.e.*, pipe grouping) and the model prediction error. They subsequently proposed a method to identify the best number of pipe groups. Deuerlein [10] proposed a water distribution network decomposition method that classifies network components into forest, bridge and biconnected-block components. The latter two are called "core components". The three component types are used to augment the network graph. Perelman and Ostfeld [11] developed a multilayered clustering method as an extension of their previous clustering algorithm [12] based on depth-first search [13] and breadth-first search [14].
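
The graph concepts behind a forest/bridge/block decomposition can be sketched in Python. This is a simplified illustration, not the procedure of [10]: it iteratively prunes degree-one vertices to separate the tree-like forest part from the looped core, then finds bridges in the core with Tarjan's depth-first-search lowlink test. It assumes a simple undirected graph (no parallel pipes); the function names and the example network are hypothetical.

```python
from collections import defaultdict

def prune_forest(edges):
    """Iteratively strip degree-1 vertices; the stripped edges form the forest."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    forest = []
    leaves = [n for n in adj if len(adj[n]) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(adj[leaf]) != 1:   # may have become isolated in the meantime
            continue
        (nbr,) = adj[leaf]
        adj[leaf].discard(nbr)
        adj[nbr].discard(leaf)
        forest.append((leaf, nbr))
        if len(adj[nbr]) == 1:    # neighbor became a new leaf
            leaves.append(nbr)
    core = {n: nbrs for n, nbrs in adj.items() if nbrs}
    return forest, core

def find_bridges(adj):
    """Tarjan's lowlink test: tree edge (u, v) is a bridge if low[v] > disc[u]."""
    disc, low, bridges = {}, {}, []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery time = number visited so far
        for v in adj[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge closes a loop

    for n in adj:
        if n not in disc:
            dfs(n, None)
    return bridges

# Hypothetical network: two loops joined by pipe 3-4, plus a dead-end branch 6-7-8.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4), (6, 7), (7, 8)]
forest, core = prune_forest(edges)
print(forest)              # [(8, 7), (7, 6)]
print(find_bridges(core))  # [(3, 4)]
```

For large networks, an iterative DFS would avoid Python's recursion limit.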

Diao *et al.* [15] proposed a community structure identifier that uses modularity (*M*) as an indicator to "quantify the quality of the graph division into communities" [16]. The approach iteratively merges vertices into communities, choosing at each step the merge that yields the greatest increase in modularity (Δ*M*). In a follow-up paper [17], they proposed a methodology for decomposing a WDS into a twin-simplified pipeline structure comprising backbone mains and community feedlines.
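
One standard form of the modularity index is $M = \sum_c \left[ L_c/m - \left( d_c/2m \right)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its vertices; it is equivalent to $\frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$. A minimal Python sketch on a hypothetical two-loop network follows; it is not the implementation of [15].

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity M = sum_c [L_c / m - (d_c / (2m))^2] of an undirected graph.

    edges: list of (u, v) pairs; community: dict mapping vertex -> community id.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: number of edges inside community c
    degree = defaultdict(int)    # d_c: total degree of vertices in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical network: two triangular loops joined by one pipe (3-4).
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {v: "A" for v in split}
print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0
```

The increment Δ*M* for a candidate merge is the difference between two such evaluations; here merging the two loops would give Δ*M* = −5/14, so a greedy agglomerative procedure would keep them apart.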

Di Nardo *et al.* [18] used graph theory principles and a heuristic procedure for WDS sectorization based on minimizing the power dissipated in the WDS. Giustolisi and Ridolfi [19] developed a multi-objective strategy for optimal WDS segmentation that maximizes a modified modularity index while minimizing the cost of newly installed devices. In summary, little effort has been devoted to applying optimization techniques to WDS component grouping, clustering and aggregation.
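
For reference, the power dissipated by friction in a pipe can be computed as $P = \gamma Q h_f$, where $\gamma = \rho g$ is the specific weight of water, $Q$ the flow rate and $h_f$ the head loss. The Python sketch below sums this quantity over pipes; the values are hypothetical, and the exact objective used in [18] may be formulated differently.

```python
RHO = 1000.0  # density of water, kg/m^3 (assumed constant)
G = 9.81      # gravitational acceleration, m/s^2

def dissipated_power(flows, headlosses):
    """Total power (W) dissipated in pipes: sum of rho * g * |Q * h_f|.

    flows in m^3/s, headlosses in m; the sign of h_f follows the flow direction.
    """
    return sum(RHO * G * abs(q * h) for q, h in zip(flows, headlosses))

# Hypothetical two-pipe example.
p = dissipated_power([0.05, 0.02], [2.0, 1.5])
print(f"{p:.1f} W")  # 1275.3 W
```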

Few studies have considered node grouping as a means of reducing the number of unknowns in WDS demand estimation. Kang and Lansey [2] compared two real-time demand estimation methods, the tracking state estimator and the KF, in terms of demand estimation accuracy. They aggregated the nodes of a study network into multiple groups to reduce the number of unknowns and make the demand estimation problem overdetermined or even-determined, as was also suggested in [20]. Jung and Lansey [8] adopted the nodal demand aggregation approach of Kang and Lansey in their KF-based WDS pipe burst detection model and demonstrated that using as many node groups as flowmeters yielded the best accuracy with the KF-based demand estimation scheme.
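
A sketch of why matching the number of groups to the number of flowmeters helps: with a grouping matrix, the unknowns shrink from one per node to one per group, and the system becomes square. The Python example assumes (i) a linear demand-to-flow relation, which holds for branched networks where a pipe carries the sum of its downstream demands but not for looped ones, and (ii) that nodal demands within a group are split in proportion to base demand. `grouping_matrix`, `H`, and all values are hypothetical.

```python
import numpy as np

def grouping_matrix(group_of, base_demand, n_groups):
    """Map group totals q to nodal demands d = A @ q by base-demand share (assumed)."""
    totals = np.zeros(n_groups)
    for i, g in enumerate(group_of):
        totals[g] += base_demand[i]
    A = np.zeros((len(group_of), n_groups))
    for i, g in enumerate(group_of):
        A[i, g] = base_demand[i] / totals[g]
    return A

# Hypothetical branched network: 3 nodes, 2 flowmeters, 2 node groups.
group_of = [0, 0, 1]                 # N1, N2 -> NG1; N3 -> NG2
base = [2.0, 1.0, 4.0]
A = grouping_matrix(group_of, base, 2)
H = np.array([[1.0, 1.0, 1.0],       # meter on the main line feeding all nodes
              [0.0, 1.0, 1.0]])      # meter on a branch feeding N2 and N3
y = H @ (A @ np.array([6.0, 8.0]))   # synthetic flows for group totals [6, 8]

q_hat = np.linalg.solve(H @ A, y)    # square 2x2 system: 2 groups, 2 meters
print(q_hat, A @ q_hat)              # group totals and disaggregated nodal demands
```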

Note that the KF-based estimation method has been used for real-time WDS demand estimation [2,8], whereas optimization techniques have been widely used for calibrating demands and their hourly patterns, which estimates base demands and leakage offline (non-real-time estimation) [21–24]. Real-time estimation generally allows much less computation time (e.g., 5 min), so different methodologies are used for the two purposes; moreover, base demands and their hourly patterns are not determined in real-time demand estimation. What the two approaches share is the use of a hydraulic model (e.g., EPANET): demand calibration methods solve the nonlinear network equations to quantify the accuracy of candidate solutions, whereas the KF-based method solves the network equations to obtain pipe flow rates that are compared with field measurements.

Node grouping is one of the most important factors affecting the accuracy of WDS demand estimation. However, previous studies have either determined node groups based on engineering judgment (e.g., grouping nodes with the same demand pattern or in close proximity) or assumed that the groups are given. Identifying optimal node groups in this way is very difficult because of the complex hydraulic relationship between the pipe flows and nodal pressures at sensor locations and the demands of the node groups, especially in large real-world networks, which are mostly loop-dominated. The task becomes even more difficult when sensors are not located at the best points for demand estimation; demand estimation is generally not the main concern in sensor network design. Therefore, an optimization-based approach for finding the node groups that yield highly accurate WDS demand estimation is required.
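
The size of the search space can be quantified: the number of ways to partition $n$ nodes into $k$ non-empty groups is the Stirling number of the second kind $S(n, k)$, which satisfies $S(n, k) = k\,S(n-1, k) + S(n-1, k-1)$. The Python sketch below evaluates it. It counts all partitions and ignores practical constraints (e.g., spatial contiguity of groups), so it is an upper bound on the candidates an engineer would actually consider.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labeled nodes into k non-empty groups."""
    if n == k:
        return 1          # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Node n either joins one of k existing groups or forms a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(10, 3))            # 9330
print(f"{stirling2(100, 5):.2e}")  # 6.57e+67
```

Even a modest network of 100 nodes split into 5 groups thus admits far too many groupings to enumerate, which motivates a heuristic search.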

This study proposes an optimal node grouping (ONG) model to maximize the accuracy of real-time WDS demand estimation. The KF-based demand estimation method is linked with a genetic algorithm (GA) for node group optimization. The validity of the proposed model is demonstrated by estimating demands in the modified Austin network. True demands and field measurements are synthetically generated using a hydraulic model of the study network.
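
Coupling an optimizer with an estimator can be sketched as a GA over integer chromosomes in which gene *i* is the group index of node *i*. In an ONG-type model the fitness would come from the KF-based demand estimation accuracy; in this Python sketch a hypothetical surrogate, `aggregation_error`, stands in for it, measuring the error of representing each node's demand as a fixed base-demand share of its group total, assuming group totals are known exactly. The operators (binary tournament, uniform crossover, random-reset mutation, one elite) and all data are illustrative assumptions, not the settings of this study.

```python
import random

def aggregation_error(groups, demands, base, n_groups):
    """Hypothetical surrogate fitness (lower is better): squared error from
    splitting each group's total demand among its nodes by base-demand share."""
    base_tot = [0.0] * n_groups
    for i, g in enumerate(groups):
        base_tot[g] += base[i]
    err = 0.0
    for t in range(len(demands[0])):
        totals = [0.0] * n_groups
        for i, g in enumerate(groups):
            totals[g] += demands[i][t]
        for i, g in enumerate(groups):
            est = totals[g] * base[i] / base_tot[g]
            err += (est - demands[i][t]) ** 2
    return err

def genetic_algorithm(fitness, n_nodes, n_groups, pop_size=30,
                      generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(n_groups) for _ in range(n_nodes)]
           for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        elite = min(pop, key=fitness)
        history.append(fitness(elite))
        new_pop = [elite[:]]  # elitism: carry the best chromosome over unchanged
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=fitness)
            p2 = min(rng.sample(pop, 2), key=fitness)
            # Uniform crossover, then random-reset mutation of group labels.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [rng.randrange(n_groups) if rng.random() < p_mut else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=fitness)
    return best, fitness(best), history

# Hypothetical data: nodes 0-1 follow one demand pattern, nodes 2-3 another.
demands = [[1, 2, 1], [2, 4, 2], [2, 1, 2], [4, 2, 4]]
base = [1.0, 2.0, 1.0, 2.0]

def fit(groups):
    return aggregation_error(groups, demands, base, n_groups=2)

best, best_fit, history = genetic_algorithm(fit, n_nodes=4, n_groups=2)
print(best, best_fit)
```

Because group labels are interchangeable, chromosomes such as `[0, 0, 1, 1]` and `[1, 1, 0, 0]` encode the same grouping; elitism guarantees that the best fitness never worsens from one generation to the next.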
