*3.1. Graph Theory*

Most of the existing clustering algorithms developed for WNP are based on graph theory. To gain a deeper understanding of these algorithms, it is necessary to formalize some of the topological characteristics and properties of a WDN. Readers unfamiliar with graph theory should refer to previous studies [53,54].

A WDN is a social infrastructure that conveys water along pipes between the nodes of the network. The topology of a WDN is mapped onto an undirected or directed graph characterized by a pair of sets *G* = (*V*, *E*), where *V* is the vertex set representing junctions, reservoirs, and tanks, and *n* = |*V*| is the total number of vertices. *E* is the set of edges corresponding to pipes, valves, and pumps, and *m* = |*E*| is the total number of edges. In an undirected graph, each edge is an unordered pair {*v*1, *v*2}, whereas in a directed graph each edge is an ordered pair (*v*1, *v*2); in both cases, the vertices *v*1 and *v*2 are called the endpoints of the edge.

A network graph, and a WDN in particular, can be represented by an adjacency matrix **A**, an *n* × *n* matrix whose (*i*, *j*) element *Aij* equals 1 if *vi* is adjacent to *vj* and 0 otherwise. A weighted graph can be represented by a weighted adjacency matrix **W** that assigns a weight *Wij* to each pair of vertices (*i*, *j*). The weights are usually non-negative real numbers and must satisfy *Wij* = *Wji* ≥ 0 if *i* and *j* are connected; otherwise, *Wij* = *Wji* = 0. The nodal degree, *ki*, is the number of edges attached to vertex *i*. It is defined as $k_i = \sum_{j=1}^{n} A_{ij}$ for the adjacency matrix **A**, and as $k_i = \sum_{j=1}^{n} W_{ij}$ for the weighted adjacency matrix **W**. Drawing on topology and complex network theory, Giudicianni et al. [35] treated the WDN as a graph and used several complex network metrics to characterize the topology of typical WDNs. This was a preliminary step toward better understanding the network itself, and it provided a classical basis for partitioning and/or designing WDNs.
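The definitions above can be illustrated with a minimal Python sketch. The four-node network, its edge list, and the weights are hypothetical values chosen only for illustration; the function names `build_matrices` and `degrees` are likewise assumptions and do not come from the cited studies.

```python
# Hypothetical undirected network: each edge is (i, j, weight) with 0-based node indices.
# The weight could stand for any pipe attribute (e.g., length); this is an assumption.
edges = [(0, 1, 2.0), (1, 2, 1.5), (1, 3, 3.0), (2, 3, 0.5)]
n = 4

def build_matrices(n, edges):
    """Return the binary adjacency matrix A and the weighted adjacency matrix W."""
    A = [[0] * n for _ in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        A[i][j] = A[j][i] = 1      # undirected graph: A is symmetric
        W[i][j] = W[j][i] = w      # W_ij = W_ji >= 0
    return A, W

def degrees(M):
    """Nodal degree as a row sum: k_i = sum_j M_ij."""
    return [sum(row) for row in M]

A, W = build_matrices(n, edges)
k = degrees(A)            # number of edges attached to each node
k_weighted = degrees(W)   # weighted degree of each node
print(k)                  # [1, 3, 2, 2]
print(k_weighted)         # [2.0, 6.5, 2.0, 3.5]
```

Nested lists keep the sketch dependency-free; for networks of realistic size, a sparse matrix representation would normally be preferable.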

One of the graph theory algorithms applied to network clustering is DFS, which, as proposed by Tarjan [55], explores the connectivity of a graph by traversing its nodes. The DFS algorithm is a recursive approach based on backtracking. It starts at a chosen root node and follows each path as far as possible (the depth dimension) until no unvisited adjacent nodes remain on the current path; it then backtracks and continues along the next path. In contrast, the BFS algorithm proposed by Pohl [56] starts at a root node and traverses the graph breadth-wise, exploring all the nodes at the current level before moving to the next level. Figure 3 shows how DFS and BFS work.

**Figure 3.** Diagram of depth-first search (DFS) and breadth-first search (BFS).
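The two traversal orders can be compared with a short Python sketch. The six-node graph and its labels are hypothetical and are not taken from Figure 3; the sketch only illustrates the generic DFS and BFS procedures.

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F"],
    "D": ["B"],
    "E": ["B"],
    "F": ["C"],
}

def dfs(graph, root):
    """Recursive depth-first search: go as deep as possible, then backtrack."""
    order, visited = [], set()

    def visit(node):
        visited.add(node)
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visit(nxt)

    visit(root)
    return order

def bfs(graph, root):
    """Breadth-first search: visit every node of one level before the next level."""
    order, visited = [root], {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(dfs(graph, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The recursive form mirrors the backtracking description; for large networks an explicit stack avoids Python's recursion limit.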

Tzatchkov et al. [38] applied DFS and BFS to a WNP project in Mexico. DFS was used to segment the whole network into independent sectors by identifying the nodes belonging to each sector (i.e., each sector is supplied exclusively from its own water sources and is not connected to other sectors in the network), and BFS was used to examine the set of nodes disconnected from any water source, thus obtaining the size and configuration of the independent sectors in the WDN. More specifically, Perelman and Ostfeld [39] and Lifshitz and Ostfeld [1] proposed a procedure for topology clustering based on the DFS algorithm to identify strongly connected clusters, in which the nodes had at least one path in both directions between them, while the BFS algorithm was used to classify weakly connected clusters, in which only one directed path existed between a set of nodes (i.e., from node *u* to node *v*, but not from node *v* to node *u*). The results were utilized for various purposes, such as contaminant prediction from a source and the spread of infection in a WDN [1]. Di Nardo et al. [6] proposed a method for optimizing water network sectorization based on graph theory. DFS was used to find the independent sectors and was combined with a hierarchical approach developed by Di Battista et al. [57] to draw the hierarchical levels of a tree graph corresponding to each source, creating isolated DMAs, each of which was supplied by its own source and was disconnected from the rest of the network through gate valves.
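The idea of source-based sectors can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of any of the cited studies: the network, the node labels (`R` for reservoirs, `J` for junctions), and the function names `sector_of` and `unsupplied_nodes` are all hypothetical.

```python
from collections import deque

# Hypothetical network: two reservoirs feeding separate branches, plus two
# junctions (J5, J6) that are connected to each other but to no source.
graph = {
    "R1": ["J1"], "J1": ["R1", "J2"], "J2": ["J1"],
    "R2": ["J3"], "J3": ["R2", "J4"], "J4": ["J3"],
    "J5": ["J6"], "J6": ["J5"],
}
sources = ["R1", "R2"]

def sector_of(graph, source):
    """Nodes reachable from one source, found with an iterative DFS."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def unsupplied_nodes(graph, sources):
    """Nodes that no source can reach, found with a multi-source BFS."""
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(graph) - seen

sectors = {s: sector_of(graph, s) for s in sources}
# Sectors are independent when no node is shared between them.
independent = sectors["R1"].isdisjoint(sectors["R2"])
isolated = unsupplied_nodes(graph, sources)
```

Here `sectors` maps each source to the nodes it supplies, `independent` is true because the two sectors share no node, and `isolated` contains the two junctions that no source reaches.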

Campbell et al. [58,59] proposed a more advanced, ordered combination of several graph-theoretic concepts to generate a flexible method for defining feasible DMA layouts. They proposed dividing a network into two components: a trunk network and a distribution network. To determine the extent of the trunk network, the shortest path and BFS concepts were implemented. Once the trunk network was determined, it was detached from the network, and a community detection algorithm was applied to the rest of the network (the distribution network) to define its best structural communities, which constitute the configuration of sectors. The innovation of this study was that the trunk pipes acted as entrances to each DMA and were not considered candidates for sectorization, ensuring the reliability of the WDN. In a similar methodology proposed by Alvisi and Franchini [60], BFS defined the location of possible nodes to form an assigned number of DMAs, and the shortest path distance from each source to the nodes was simultaneously estimated to determine the set of boundary pipes for each DMA. For a WDN with numerous water sources, Scarpa et al. [9] successfully applied a BFS algorithm to identify elementary DMAs, each of which was supplied only by its own source.
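How shortest path distances from the sources can lead to a set of boundary pipes is sketched below in Python. This is a deliberately simplified illustration and not the procedure of Alvisi and Franchini [60]: it assumes that each node is assigned to its closest source and that a boundary pipe is any pipe whose endpoints fall in different groups. The network, the pipe lengths, and the function names `nearest_source` and `boundary_pipes` are hypothetical.

```python
import heapq

# Hypothetical network: adjacency list of (neighbor, pipe length) pairs.
graph = {
    "R1": [("J1", 1.0)],
    "J1": [("R1", 1.0), ("J2", 1.0)],
    "J2": [("J1", 1.0), ("J3", 3.0)],
    "J3": [("J2", 3.0), ("R2", 1.0)],
    "R2": [("J3", 1.0)],
}
sources = ["R1", "R2"]

def nearest_source(graph, sources):
    """Multi-source Dijkstra: label every reachable node with its closest source."""
    dist = {s: 0.0 for s in sources}
    label = {s: s for s in sources}
    heap = [(0.0, s, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node, src = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # outdated heap entry
        for nxt, length in graph[node]:
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                label[nxt] = src
                heapq.heappush(heap, (nd, nxt, src))
    return label, dist

def boundary_pipes(graph, label):
    """Pipes whose two endpoints are assigned to different sources."""
    cut = set()
    for u in graph:
        for v, _ in graph[u]:
            if label[u] != label[v]:
                cut.add(tuple(sorted((u, v))))
    return sorted(cut)

label, dist = nearest_source(graph, sources)
pipes = boundary_pipes(graph, label)
print(pipes)  # [('J2', 'J3')]
```

The sketch assumes that every node is reachable from at least one source; otherwise `label` would lack entries for the unreachable nodes.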

Gomes et al. [61] proposed a systematic way to divide a WDN into suitable DMAs based on the Floyd–Warshall algorithm [62] and user-defined criteria (e.g., pipe length and number of users in each DMA). This method facilitated the creation of appropriate DMAs by finding the shortest distance from the source to the nodes according to the network flow direction at peak flow conditions. Compared with BFS, this algorithm provided superior results, as it considered the shortest paths from the sources to every other node in the network and identified the best path. The algorithm was repeated until the target number of DMAs and the user-defined constraints were met. Further adjustment could be carried out by combining adjacent DMAs to minimize the number of boundary pipes, as long as the user-defined criteria remained fulfilled.
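For reference, the generic Floyd–Warshall all-pairs shortest path computation can be sketched in Python as follows. The sketch covers only the shortest path step, not the DMA-forming procedure or the user-defined criteria of Gomes et al. [61]. The directed arcs, which are assumed here to follow the flow direction, and their lengths are hypothetical.

```python
INF = float("inf")

# Hypothetical directed arcs (from, to, length); node 0 plays the role of the source.
arcs = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0), (2, 3, 8.0)]
n = 4

def floyd_warshall(n, arcs):
    """Return the matrix of shortest path lengths between all pairs of nodes."""
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in arcs:
        dist[i][j] = min(dist[i][j], w)   # keep the shortest parallel arc
    for k in range(n):                    # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

dist = floyd_warshall(n, arcs)
print(dist[0])  # [0.0, 3.0, 1.0, 8.0]
```

The triple loop runs in O(*n*³) time, which is the cost of obtaining the distances between all pairs of nodes rather than from a single root.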
