**2. Methodology**

This research proposes an ONG approach that minimizes the sum of the root-mean-square errors (RMSEs) of the estimated demands for a given number of node groups. Nodes are aggregated to reduce the number of unknowns in the demand estimation problem. Figure 2 shows the structure of the proposed ONG model, which comprises three main submodules. Each submodule represents an important factor that affects the demand estimation accuracy.

The WDS demand estimation accuracy is affected by the available information, estimation method and node grouping. The information that field measurements provide about the demands depends on the number, types and locations of the sensors. The sensor network layout (*i.e.*, number and location of sensors) is assumed to be given ("sensor network" in Figure 2) because demand estimation is not a primary concern when the number, types and locations of sensors are decided. The KF-based WDS demand estimation methodology proposed by Jung and Lansey [8] is employed in this study because of its high accuracy. Given a candidate node grouping supplied by the optimization algorithm (GA) submodule and the field measurements obtained from the sensor network, the KF-based method estimates the node group demands. The accuracy of these estimates is then quantified using the RMSE. The RMSE serves as the fitness value of the node grouping optimization and is minimized to increase the WDS demand estimation accuracy.
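As an illustration of how such a fitness value could be computed, the following minimal sketch sums the per-group RMSE over all node groups. The function name `sum_group_rmse`, the array layout (time steps in rows, node groups in columns) and the averaging over time steps within each group are assumptions made for this example; the text above states only that the sum of the RMSEs of the estimated group demands is minimized.

```python
import numpy as np


def sum_group_rmse(estimated, actual):
    """Return the sum over node groups of the per-group RMSE.

    Both inputs are assumed (hypothetically) to have shape
    (n_time_steps, n_groups): one column of demands per node group.
    """
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if estimated.shape != actual.shape:
        raise ValueError("estimated and actual must have the same shape")
    # RMSE of each group, averaging squared errors over the time steps.
    rmse_per_group = np.sqrt(np.mean((estimated - actual) ** 2, axis=0))
    # The fitness value is the sum of the group RMSEs (to be minimized).
    return float(rmse_per_group.sum())


# Illustrative numbers only: two time steps, two node groups.
actual_demands = [[10.0, 20.0], [12.0, 18.0]]
estimated_demands = [[11.0, 20.0], [11.0, 18.0]]
fitness = sum_group_rmse(estimated_demands, actual_demands)
```

In this sketch a lower `fitness` indicates a node grouping whose group demands are estimated more accurately.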

**Figure 2.** Structure of the proposed optimal node grouping (ONG) model. OF indicates objective function.

The following subsections detail the node aggregation, field measurements, KF-based demand estimation method, ONG model and GA. The blocks corresponding to each subsection are shown in Figure 2.

### *2.1. Node Groups and Demand Patterns*

Each node is assigned to exactly one node group, and the total number of groups is predefined. During optimization, each node is labeled with an integer from 1 to the total number of node groups. For example, each node is assigned an integer from 1 to 14 if the WDS has a total of 14 node groups. The number of node groups is set equal to the total number of flowmeters. Therefore, the decision variables of the proposed ONG problem are all integers (Figure 3).
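The integer encoding can be sketched as follows, assuming one label per node and a simple summation of nodal demands within each group. The function name `aggregate_group_demands` and the example values are hypothetical and serve only to illustrate the decision-variable representation.

```python
import numpy as np


def aggregate_group_demands(group_labels, node_demands, n_groups):
    """Sum nodal demands into group demands.

    group_labels: one integer in 1..n_groups per node (decision variables).
    node_demands: demand of each node, in the same order as group_labels.
    """
    labels = np.asarray(group_labels, dtype=int)
    demands = np.asarray(node_demands, dtype=float)
    if labels.shape != demands.shape:
        raise ValueError("one label is required per node")
    if labels.size and (labels.min() < 1 or labels.max() > n_groups):
        raise ValueError("labels must lie between 1 and n_groups")
    # Shift labels to 0-based indices and accumulate demands per group.
    return np.bincount(labels - 1, weights=demands, minlength=n_groups)


# Illustrative example: five nodes assigned to three groups.
labels = [1, 3, 1, 2, 3]
demands = [5.0, 2.0, 4.0, 1.0, 3.0]
group_demands = aggregate_group_demands(labels, demands, n_groups=3)
```

A candidate solution in the optimization is then just the vector `labels`; a group that receives no node has zero aggregated demand in this sketch.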

Figure 4 shows the following five general diurnal demand patterns considered by Kang and Lansey [2]: three residential, one industrial and one commercial. The three residential user types are apartments (Residential 1 in Figure 4), houses with half-acre lots (Residential 2) and houses with large lots (Residential 3). The apartment demand is characterized by higher peaks in the early morning (6–7 a.m.) and evening (6–7 p.m.) than those of the other two residential demands. Note that residential users with large lots have the lowest peak factor and an attenuated demand variation during the day. The industrial water usage is relatively constant throughout the day, whereas the commercial demand rises sharply at the start of the workday, remains constant during the workday and falls sharply at its end.

**Figure 3.** Decision variables of the proposed ONG model.

**Figure 4.** Diurnal demand curves for five user types: Residential 1 (apartments), Residential 2 (houses with 1/2-acre lots), Residential 3 (houses with large home lots), commercial and industrial.

All users are assumed to live in apartments (Residential 1), which is a reasonable assumption for dense urban areas. A more complicated demand estimation problem could be formulated by randomly assigning each node one of the above five user types, which assumes that there is no spatial correlation in the demand patterns. However, this is not realistic: in the real world, demand patterns show strong spatial correlations (e.g., commercial districts are concentrated in the downtown area of a city).
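For illustration only, the spatially uncorrelated random assignment mentioned above could be sketched as below. The function name `assign_user_types`, the uniform selection probability and the use of a seed are assumptions of this example; this study does not use such an assignment, since all nodes are treated as Residential 1.

```python
import numpy as np

# The five user types shown in Figure 4.
USER_TYPES = (
    "Residential 1",
    "Residential 2",
    "Residential 3",
    "Commercial",
    "Industrial",
)


def assign_user_types(n_nodes, seed=None):
    """Draw a user type for each node independently (no spatial correlation).

    Each type is assumed equally likely; this is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, len(USER_TYPES), size=n_nodes)
    return [USER_TYPES[i] for i in indices]


# Example: assign types to ten nodes with a fixed seed for repeatability.
node_types = assign_user_types(10, seed=0)
```

Because each node is drawn independently, neighboring nodes are no more likely to share a user type than distant ones, which is the unrealistic property noted above.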
