Article

A Convolutional Graph Neural Network Model for Water Distribution Network Leakage Detection Based on Segment Feature Fusion Strategy

1
Department of Municipal and Environmental Engineering, Hebei University of Architecture, Zhangjiakou 075000, China
2
Key Laboratory of Water Quality Engineering and Comprehensive Utilization of Water Resources, Zhangjiakou 075000, China
*
Author to whom correspondence should be addressed.
Water 2024, 16(24), 3555; https://doi.org/10.3390/w16243555
Submission received: 29 September 2024 / Revised: 6 December 2024 / Accepted: 8 December 2024 / Published: 10 December 2024
(This article belongs to the Section Urban Water Management)

Abstract

In this study, an innovative leak detection model based on Convolutional Graph Neural Networks (CGNNs) is proposed to enhance response speed during pipeline bursts and to improve detection accuracy. By integrating node features into pipe segment features, the model effectively combines CGNN with water distribution networks, achieving leak detection at the pipe segment level. Optimizing the receptive field and convolutional layers ensures high detection performance even with sparse monitoring device density. Applied to two representative water distribution networks in City H, China, the model was trained on synthetic leak data generated by EPANET simulations and validated using real-world leak events. The experimental results show that the model achieves 90.28% accuracy in high-density monitoring areas, and over 85% accuracy within three pipe segments of actual leaks in low-density areas (10%–20%). The impact of feature engineering on model performance is also analyzed and strategies are suggested for optimizing monitoring point placement, further improving detection efficiency. This research provides valuable technical support for the intelligent management of water distribution networks under resource-limited conditions.

1. Introduction

The rapid growth of urbanization and water demand worldwide has led to the increasingly severe issue of distribution network aging. Frequent network failures not only waste valuable water resources but also significantly increase operation and maintenance costs [1,2]. Leaks and pipe bursts in aging networks have become more prevalent, severely impacting the sustainability of water resource management and the reliability of the water supply [3,4]. Despite the significant volume of water lost annually through network leaks worldwide, sufficient attention to this issue has not been given in many regions. As distribution networks continue to grow in complexity, the lack of effective management and monitoring methods further exacerbates water loss [5]. Rapid and accurate leak detection and timely repair have become key challenges in ensuring the reliability of water supply systems. While advanced detection devices and high-density monitoring points can significantly improve leak detection accuracy and efficiency [6,7], many regions still lack adequate monitoring equipment due to economic constraints, making leak detection difficult [8,9]. In this context, precise leak detection methods based on limited monitoring data have gradually become a research hotspot. These data-driven approaches, which do not require additional investment, can effectively improve network maintenance efficiency and demonstrate broad application potential [2].
In recent years, data-driven technologies have made significant advancements in the field of leak detection and localization in water distribution networks [10,11]. For example, Farley et al. successfully localized burst events by optimizing the sensitivity matrix and used a genetic algorithm (GA) to optimize sensor placement, effectively reducing the search area [12]. Min et al. developed a two-stage model, initially detecting leaks using the K-means algorithm and then further localizing them through a trial-and-error optimization approach [13]. Although they overcome the limitations of traditional physical models, these methods are primarily suitable for small-scale networks or can achieve leak localization only at the regional level [14]. With the rise of deep learning technologies, pattern recognition in complex network structures has gradually shifted from theory to practice, leading to new breakthroughs in leak detection for distribution networks. Garajeh et al. combined deep learning convolutional neural networks (DL-CNNs) with the analytic network process (ANP) method, demonstrating excellent application potential in urban water network management [15]. Wu et al. integrated fuzzy C-means clustering with the XGBoost algorithm to identify leak areas and predict leak severity based on simulation data [16]. Kang et al. combined a one-dimensional convolutional neural network (1D-CNN) with a support vector machine (SVM) ensemble model, demonstrating its high efficiency and accuracy in leak detection and localization [17]. Ran Yan et al. employed a confidence-learning-based Gaussian mixture model (CLGMM) approach to address the challenge of leak detection in the absence of sufficient leak data by cleaning labels and utilizing long-term historical data. This method improved the accuracy of leak detection in water distribution networks [18]. In addition, the edge-based graph neural network (EGNN) proposed by Kerimov et al. significantly improved the accuracy and transferability of flow and pressure predictions in water distribution systems, demonstrating superior performance and efficient computational capabilities under unknown topologies [19]. These studies highlight the enormous potential of deep learning algorithms in handling complex data and addressing the dynamic environment of distribution networks [20]. Neural network models, with their diverse structural designs and exceptional feature learning capabilities, are capable of capturing nonlinear relationships and multivariable interactions within complex networks, providing critical support for further improving the accuracy and efficiency of leak detection.
Neural networks, as the core of deep learning technologies, have demonstrated significant application potential across various fields. Different architectures of neural networks exhibit unique advantages in specific scenarios due to their inherent characteristics. For instance, convolutional neural networks (CNNs) are widely used in image and signal processing due to their ability to extract local features [21]; recurrent neural networks (RNNs) and their variant, long short-term memory networks (LSTMs), excel in modeling time-series data, particularly for handling dynamic data such as water pressure [22]; and graph neural networks (GNNs), which can directly operate on graph-structured data, have become an effective tool for solving complex topological relationship problems, demonstrating unique advantages in leak detection and localization in water supply networks [23]. In contrast, traditional methods such as Shewhart charts, CUSUM charts, and EWMA charts, based on threshold settings and statistical properties, are simple to implement and offer real-time performance, making them widely used in anomaly detection for water pressure fluctuations [24,25,26]. However, these methods fall short when dealing with multivariable features and complex network interactions, particularly in addressing issues such as node pressure, flow fluctuations, and topological relationships in water distribution networks. Convolutional Graph Neural Networks (CGNNs), through graph convolution operations, effectively capture complex interactions and nonlinear relationships between nodes while enabling the unified modeling of multidimensional data, thus overcoming the limitations of traditional methods. In complex network environments, CGNNs demonstrate significant adaptability and generalization ability, not only enhancing leak detection accuracy but also providing technical support for resource optimization and cost control. 
Based on the CGNN framework, an improved method is proposed in this study to address the practical challenges in leak detection for water distribution networks.
Convolutional Graph Neural Networks (CGNNs) have demonstrated significant potential in leak localization for water distribution networks, and their ability to process topological structures and fuse multidimensional data makes them well suited for complex network environments. However, traditional CGNN frameworks face two major limitations when applied to the problem of leak localization in water distribution networks: first, due to topological constraints, leaks can be localized only at the node level, making precise localization at the pipe segment level difficult; second, in sparsely distributed monitoring networks, there are fewer nodes with monitoring information, making the effective extraction and fusion of data more challenging. To address these issues, in this paper, an innovative method is proposed that integrates node features into pipe segment features by combining the characteristics of the segment’s start and end points, achieving leak localization at the pipe segment level and thereby overcoming the limitations of the traditional framework. Furthermore, to accommodate the uneven distribution of monitoring points in real-world scenarios, the model introduces improvements to the receptive field setting. By applying multi-layer convolution operations, the receptive field of each node is expanded, allowing it to gather information from a broader neighborhood. This improvement not only ensures high-precision localization in areas with high monitoring point density, but also maintains strong robustness under sparse monitoring conditions. The effectiveness of the proposed method is validated through both simulated data and real-world leak events. The results demonstrate that the model is capable of efficient leak detection and localization under various monitoring point distribution conditions.

2. Methodology

2.1. Overview of the Convolutional Graph Neural Network (CGNN)

Convolutional Graph Neural Networks (CGNNs) are utilized in this study to achieve leakage localization in water distribution networks. CGNNs are deep learning models specifically designed for efficient feature extraction and information aggregation in graph-structured data. Unlike traditional convolutional neural networks (CNNs), CGNNs can directly handle complex network structures such as water distribution systems. Through the edges in the graph, the CGNN enables the exchange and integration of information between nodes, allowing each node’s features to reflect not only its own data but also the broader structural information of the entire graph. By performing multi-layer convolution operations, the CGNN gradually expands the receptive field of each node, enabling it to gather information from neighboring nodes and even distant nodes. This characteristic makes CGNNs particularly well suited for handling water distribution network data, effectively integrating pressure, flow, and other features from different monitoring points and segments, thereby aiding in the accurate localization of potential leaks.
To provide a clearer overview of the modeling process for the proposed CGNN-based leak localization model, Figure 1 illustrates the complete sequence of steps. This process summarizes the key stages, including feature engineering, node feature convolution operations, the integration of node features into pipe segment features, and model training and prediction. In Figure 1, the blue sections in the center represent the primary modeling steps, while the light green sections highlight the key innovations and contributions of this study. The yellow sections on the right indicate the objectives of each step, and the left side expands on specific components of the process. Each component will be discussed in detail in the following sections.

2.2. Feature Engineering

Feature engineering plays a crucial role in the development of deep learning models. Although models such as convolutional neural networks (CNNs) and Convolutional Graph Neural Networks (CGNNs) can automatically extract features, carefully designed feature engineering can still significantly enhance the model’s performance and generalization capability for certain complex tasks.
In this study, to implement data-driven network detection, three key features were analyzed: instantaneous flow, cumulative flow, and instantaneous pressure. However, the performance of these features varies in practical applications. Instantaneous flow often exhibits significant fluctuations, largely influenced by short-term changes in water demand, and tends to be noisy, making it difficult to reliably capture abnormal events. Its variation is often subject to multiple factors, complicating the detection process and increasing the likelihood of false positives or false negatives. Cumulative flow, on the other hand, requires a period of accumulation before an issue becomes apparent, resulting in significant delays and limiting the ability to respond promptly, potentially causing delays in repair efforts.
Among these features, instantaneous water pressure exhibits less fluctuation and is less affected by short-term changes in water demand, resulting in more stable data. This makes it more suitable for detecting and analyzing anomalies in the network. When a leak occurs, the water pressure in the affected area drops rapidly, and this change can directly and promptly indicate a physical anomaly in the system. Leakage detection based on instantaneous water pressure can also be combined with spatial location data to more accurately pinpoint leak locations. By deploying pressure sensors throughout the network, pressure anomalies can be quickly detected, allowing for the precise identification of the leakage location, thereby significantly improving detection efficiency and accuracy. For example, Figure 2 shows eight warning events in the SCADA system of City H, where (a) in red represents four actual leakage events, and (b) in blue represents four false alarm events. Analysis reveals that in actual leakage events, both instantaneous flow and instantaneous water pressure exhibit significant and sustained changes, while in false alarms, only the instantaneous flow shows inconsistent fluctuations.
In summary, instantaneous water pressure was selected as the primary indicator for leak detection due to its ability to effectively reflect network anomalies. Compared with cumulative flow and instantaneous flow, instantaneous water pressure has the advantage of more stable data with lower noise levels, making it better suited for anomaly detection and precise localization in complex water supply systems. Specifically, current pressure directly reflects the actual pressure state at a node at a given moment, providing the foundation for capturing real-time operational conditions; predicted pressure provides a reference for the expected pressure under normal operating conditions, serving as a basis for identifying abnormal pressure variations; and pressure fluctuation, defined as the difference between current and predicted pressures, quantifies the range of pressure changes at a node, enabling precise detection of abnormal behaviors. Its scientific validity for leak detection has been confirmed by multiple studies, not only reflecting the direct relationship between pressure and leakage, but also demonstrating significant advantages in model performance optimization [27,28]. Based on this, three feature engineering steps are designed in the study to extract three key features from the nodes, namely, current pressure, predicted pressure, and pressure fluctuation, aiming to comprehensively capture the operational status and leak characteristics of the water distribution network.
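As a minimal sketch of the three node features described above, the following plain-Python snippet builds one feature row per node. The function name `pressure_features` and the sample readings are hypothetical and serve only as an illustration; the sign convention (current minus predicted) follows the definition of pressure fluctuation given in the text.

```python
def pressure_features(current, predicted):
    """Return one [current, predicted, fluctuation] row per node.

    Fluctuation is current minus predicted pressure, so a leak-induced
    pressure drop appears as a negative value.
    """
    if len(current) != len(predicted):
        raise ValueError("current and predicted must have the same length")
    return [[c, p, c - p] for c, p in zip(current, predicted)]


# Hypothetical readings at three nodes; the second node shows a drop.
current = [30.0, 27.5, 31.0]
predicted = [30.0, 30.0, 31.0]
features = pressure_features(current, predicted)
```

In this illustration, only the second node has a nonzero fluctuation, which is the kind of signal the model is expected to pick up.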

2.3. Convolution Operation on Node Features

To extract higher-order node features, this study employs graph convolution layers to perform multi-layer convolution operations on the node features. The specific convolution operation is defined as follows [29]:
$$X^{(l+1)} = \sigma\left( D^{-\frac{1}{2}} A D^{-\frac{1}{2}} X^{(l)} W^{(l)} \right)$$
where X(l) represents the node feature matrix at layer l; A is the adjacency matrix of the graph, indicating the connections between nodes; D is the degree matrix, used to normalize the adjacency matrix; W(l) is the weight matrix at layer l, learned during training; and σ(⋅) is the activation function, for which the commonly used ReLU activation function is applied in this study. Through multi-layer convolution operations, the node features gradually incorporate information from neighboring nodes, resulting in more expressive higher-order features.
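The propagation rule can be illustrated with a small, dependency-free sketch. It is not the implementation used in this study: the helper names `matmul` and `gcn_layer`, the three-node path graph, and the weights are assumptions chosen for illustration. The optional `self_loops` flag adds the identity matrix to A before normalization, a common variant of this rule that the text does not state explicitly.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, x, w, self_loops=False):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    if self_loops:  # assumption: optional A + I variant
        adj = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in adj]
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    a_norm = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    z = matmul(matmul(a_norm, x), w)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU


# Hypothetical path graph 0 - 1 - 2 with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [2.0], [3.0]]
w = [[1.0]]
out = gcn_layer(adj, x, w)
```

Stacking several such calls widens the set of nodes that contribute to each output row, which is the receptive field expansion discussed in this paper.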

2.4. Synthesis of Node Features into Segment Features

In traditional graph neural networks, node features are typically processed, while edge features are not explicitly considered [30]. However, in water distribution networks, leak events usually occur on pipe segments (i.e., edges). To address this issue, in this study, an innovative node-based edge feature synthesis method is proposed. Specifically, each node may be connected to one or more edges. By introducing directed edges, the combination of the starting and ending node IDs for each edge becomes a unique identifier. This approach not only overcomes the limitations of traditional graph neural networks in edge feature modeling, but also directly improves leak localization from the traditional node-level or node-referenced fuzzy localization to precise pipe segment-level localization. The specific steps are as follows.
Node Feature Combination: For each segment, its feature vector is constructed by combining the feature vectors of its two endpoints. Let segment eij connect nodes vi and vj; then, the feature vector of segment eij is represented as follows [31]:
$$f_{e_{ij}} = \mathrm{concat}\left( f_{v_i}, f_{v_j} \right)$$
where fvi and fvj are the feature vectors of nodes vi and vj, respectively, and concat(⋅) denotes the concatenation operation of the feature vectors.
Feature Fusion and Dimensionality Reduction: Through feature fusion, the high-dimensional node features are combined into segment features. In practice, to reduce dimensionality and computational complexity, segment features can be further processed through linear transformation or pooling operations [31]:
$$f'_{e_{ij}} = W_f f_{e_{ij}} + b_f$$
where f′eij is the transformed segment feature, Wf is the linear transformation matrix, and bf is the bias term.
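The concatenation and linear transformation steps above can be sketched as follows. The function names, the node feature values, and the weight matrix are hypothetical; in the trained model, the transformation weights would be learned rather than fixed by hand.

```python
def segment_features(edges, node_feats):
    """Build f_e = concat(f_vi, f_vj) for each directed edge (i, j)."""
    return {(i, j): node_feats[i] + node_feats[j] for i, j in edges}


def linear(vec, weight, bias):
    """Apply W f + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


# Hypothetical [current, predicted, fluctuation] features at two nodes.
node_feats = {5: [30.0, 30.5, -0.5], 6: [28.0, 30.0, -2.0]}
edges = [(5, 6)]  # the (start, end) pair identifies the segment
seg = segment_features(edges, node_feats)

# Hypothetical 2 x 6 matrix that keeps only the two fluctuation entries.
W_f = [[0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1]]
b_f = [0.0, 0.0]
reduced = linear(seg[(5, 6)], W_f, b_f)
```

Because the edge key is the ordered pair of node IDs, the segment from node 5 to node 6 is distinct from a segment from node 6 to node 5, which matches the directed-edge identifier described above.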
Figure 3 illustrates the process of fusing node features into segment features through convolutional layers in the water distribution network case study from Section 5.2. In Figure 3a, segment 5 is located between nodes 5 and 6. Through the three convolutional layers shown in Figure 3b, the receptive field of node 5 is progressively expanded. In Figure 3c, node 5’s receptive field includes nodes 4, 5, 6, and 10 (first layer); nodes 3, 9, 11, and 16 (second layer); and nodes 1, 2, 15, 23, 17, 18, and 19 (third layer). Similarly, in Figure 3d, node 6’s receptive field includes nodes 5, 6, 7, and 11 (first layer); nodes 4, 10, 18, and 19 (second layer); and nodes 3, 9, 16, 17, 24, 25, and 20 (third layer). By fusing the features of these two nodes, the final receptive field for segment 5, as shown in Figure 3e, incorporates information from all of the aforementioned nodes. Through the layer-by-layer expansion of the receptive field and feature fusion, the pipe segment features are not only able to integrate information from surrounding nodes, but also significantly improve leak localization accuracy at the pipe segment level.

2.5. Model Architecture and Training

After completing feature engineering and node feature synthesis, the feature matrix of the entire water distribution network is used as the input for the Convolutional Graph Neural Network (CGNN). The model architecture consists of three graph convolutional layers followed by a fully connected layer to output the predicted results. The training process of the model is as follows:
Input Data: The segment features obtained from feature engineering and synthesis are fed into the model.
Convolutional Layers: Multiple convolutional layers are applied sequentially to extract higher-order features.
Global Pooling: Global average pooling is performed on the features of each segment [21]:
$$h = \frac{1}{N} \sum_{i=1}^{N} X_i$$
where N is the number of nodes and Xi is the feature vector of each node.
Fully Connected Layer and Output: The final features are passed through a fully connected layer, which outputs the probability distribution of failure segments.
Loss Function: The model uses the cross-entropy loss function to measure the error between the predicted and true labels [32]:
$$L_{CE} = -\sum_{c=1}^{K} y_c \log \hat{y}_c$$
where yc is the true class label, ŷc is the predicted probability for class c, and K is the number of classes.
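The pooling, fully connected, and loss steps listed above can be illustrated with a short sketch. The feature rows, the dense-layer weights, and the choice of K = 3 classes are hypothetical; the softmax step is an assumption about how the fully connected output is turned into a probability distribution.

```python
import math


def global_average_pool(rows):
    """Average feature rows column by column: h = (1/N) * sum_i X_i."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """L_CE = -sum_c y_c * log(y_hat_c), with eps guarding log(0)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, y_prob))


# Hypothetical feature rows and dense layer mapping 2 features to 3 classes.
rows = [[1.0, 3.0], [3.0, 5.0]]
h = global_average_pool(rows)
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
logits = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
probs = softmax(logits)
loss = cross_entropy([0, 1, 0], probs)  # one-hot label for class 1
```

With a one-hot label, the loss reduces to the negative log-probability assigned to the true class, so it shrinks as the model becomes more confident in the correct segment.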

2.6. Determination of the Number of Convolutional Layers

In practical water distribution networks, not all nodes are equipped with detection devices, and the model’s ability to localize faulty segments relies on data from monitoring nodes. To meet the basic localization requirements, each segment (edge) must have a receptive field that includes at least two monitoring nodes to enable localization based on pressure data. Therefore, the number of convolutional layers L must satisfy the conditions outlined in the following subsections.

2.6.1. Coverage of the Receptive Field

For a regular node vo, after L convolutional layers, its receptive field should cover at least two monitoring nodes. This means that after the L-th convolutional layer, the neighborhood set N(vo,L) of the regular node vo should contain at least two monitoring nodes. This can be expressed as follows:
$$\left| \left\{ v_m \mid v_m \in N(v_o, L) \right\} \right| \geq 2$$
where N(vo,L) is the neighborhood set of the regular node vo after L layers of convolution, and vm represents the monitoring nodes in the neighborhood set.

2.6.2. Calculation of the Receptive Field

Each convolutional layer expands the receptive field of a node, allowing the node to receive information from more of its neighboring nodes. The receptive field of a regular node vo after the l-th convolutional layer can be estimated as follows [29]:
$$R(v_o, l) = \bigcup_{i=1}^{l} N(v_o, i)$$
where R(vo,l) represents the receptive field of the regular node vo after l layers of convolution, covering all nodes accessible through l layers of convolution. N(vo,i) denotes the direct neighborhood set of the regular node vo after the i-th layer of convolution.
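To make the receptive field concrete, the following sketch collects all nodes within a given number of hops using breadth-first search. It assumes that each convolutional layer extends the reach by one hop and that the node itself belongs to its own receptive field; the five-node path graph is hypothetical.

```python
from collections import deque


def receptive_field(adjacency, start, layers):
    """Return the nodes within `layers` hops of `start`, including `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == layers:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Hypothetical path graph 1 - 2 - 3 - 4 - 5.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
one_layer = receptive_field(adjacency, 3, 1)
two_layers = receptive_field(adjacency, 3, 2)
```

In this example, one layer lets node 3 see nodes 2, 3, and 4, and a second layer extends the field to the whole path.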

2.6.3. Selection of the Number of Convolutional Layers

Considering regularization and model complexity, the optimal number of convolutional layers L* should satisfy the receptive field coverage condition while minimizing the model’s loss function. Specifically,
$$L^{*} = \arg\min_{L} \left\{ \zeta_{total} \;\middle|\; \sum_{i=1}^{L} \left| \left\{ v_m \in N(v_o, i) \right\} \right| \geq 2 \right\}$$
where $\zeta_{total} = \zeta_{CE} + \lambda \sum_{l=1}^{L} \left\| W^{(l)} \right\|^{2}$ is the total loss function, which includes the cross-entropy loss and a regularization term. Using this formula, the required number of convolutional layers L can be systematically determined, ensuring that the model effectively covers all regular nodes while optimizing performance and minimizing overfitting to achieve the best possible outcome.
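The coverage constraint alone already gives a lower bound on the number of layers. The sketch below searches for the smallest L at which every segment sees at least two monitoring nodes through its two endpoints. It covers only the constraint, not the loss minimization, and the function names, the `max_layers` cap, and the six-node path graph are assumptions made for illustration.

```python
def k_hop(adjacency, start, k):
    """Return the nodes within k hops of `start`, including `start`."""
    seen = {start}
    frontier = {start}
    for _ in range(k):
        frontier = {nb for n in frontier for nb in adjacency[n]} - seen
        seen |= frontier
    return seen


def min_layers(adjacency, segments, monitors, max_layers=10):
    """Smallest L for which every segment's receptive field (the union of
    its endpoints' fields) holds at least two monitoring nodes."""
    for layers in range(1, max_layers + 1):
        if all(len((k_hop(adjacency, i, layers) | k_hop(adjacency, j, layers))
                   & monitors) >= 2
               for i, j in segments):
            return layers
    return None  # constraint cannot be met within max_layers


# Hypothetical path graph 1 - 2 - 3 - 4 - 5 - 6.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
segments = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
sparse = min_layers(adjacency, segments, {1, 6})
denser = min_layers(adjacency, segments, {1, 3, 6})
```

Adding a single monitor in the middle of the path halves the required depth in this toy case, which illustrates why monitoring point placement interacts with the choice of L.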
Figure 4 provides a schematic overview of the entire process of the Convolutional Graph Neural Network (CGNN), visually illustrating the key components of the model and their functions. First, the input layer includes three critical features, namely, current pressure, predicted pressure, and pressure fluctuation, which are used to characterize the real-time status of the network nodes. Next, the three GCN convolutional layers progressively extract higher-order node features while expanding the receptive field of each node, allowing them to gather information from neighboring nodes and their surrounding areas. This layer-by-layer convolution operation enhances feature interactions between nodes, facilitating the effective transmission of information in the complex topology of the water distribution network. The features are then summarized through a global average pooling layer and further refined by a fully connected layer, which outputs the predicted status of each segment (i.e., normal or faulty). Ultimately, the model is able to accurately locate faults for each segment in the network.

3. Water Pressure Forecasting

To meet the real-time and accuracy requirements of the water distribution network leakage localization based on the Convolutional Graph Neural Network (CGNN), water pressure forecasting becomes a crucial step, providing a solid foundation for real-time computation and leak localization. In this study, we compared three commonly used time series forecasting methods: long short-term memory (LSTM), Seasonal Autoregressive Integrated Moving Average (SARIMA), and Random Forest [33,34]. These methods are well suited for handling typical time series data such as water pressure and flow.
In the experiment, we selected 10 monitoring points with a frequency of once every 5 min for prediction, and applied the 3 methods to predict water pressure at random time points. The training data were sourced from the historical data of the previous seven days, and the Mean Absolute Error (MAE) was used as the main evaluation metric. Additionally, we evaluated the error between the historical water pressure and the actual water pressure. Table 1 shows the water pressure prediction results.
The experimental results indicate that the errors of all three time series forecasting methods are relatively small, with the maximum absolute error of 0.193 m occurring in the LSTM prediction at node 7. This error is far smaller than the pressure fluctuation magnitude during a leak event. Due to the cyclic nature of water pressure, the difference between historical and real-time pressure is also minor, with a maximum error of 0.158 m, which is within the acceptable range for leakage fluctuation detection. This suggests that regardless of the time series forecasting method used, the prediction error has minimal impact on leak detection. Additionally, historical water pressure data also have high reference value, sufficiently supporting efficient leak event detection.
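For reference, the MAE metric used in this comparison can be computed as in the sketch below. The pressure values are hypothetical and are not taken from Table 1.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|actual_i - predicted_i|)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical pressure readings and forecasts in metres of head.
actual = [30.0, 30.5, 31.0]
forecast = [30.0, 30.0, 32.0]
mae = mean_absolute_error(actual, forecast)
```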

4. Preparation Work

To train the Convolutional Graph Neural Network (CGNN) model and achieve accurate leakage localization in water distribution networks, a large amount of data from different leakage scenarios and leakage levels is required. However, due to the limited number of recorded real-world leakage and burst events, it is challenging to meet the data requirements for model training. Therefore, simulation-generated data are necessary.

4.1. Leakage Simulation Method

In this study, EPANET 2.2 software was used to establish a Pressure Dependent Analysis (PDA) hydraulic model. By adding water consumption nodes into the model to simulate leakage points, and adjusting the leakage volume based on the pressure drop observed at surrounding nodes during actual leakage events, the water pressure at the surrounding nodes of the simulated leakage location was made consistent with the pressure values observed during the real leakage events. The range of simulated leakage volume was determined based on pipe diameter and the pressure drop observed in different leakage events. Specifically, the leakage volume under leakage conditions was set to be one to several times the flow rate of the pipe section under normal conditions, as detailed in Table 2.

4.2. Generation of Training Samples

Based on the hydraulic model established for the actual distribution network in City H, pressure data were calculated using real measured consumption through the PDA. In the feature engineering process, three key features were selected: actual pressure, predicted pressure, and pressure difference. Among these, PDA simulation data were used to represent the actual pressure. To simulate the error in predicted pressure, random disturbances ranging from −2 to 2 kPa were added to the simulated pressure at each node (this range exceeds the maximum absolute error between the predicted and actual pressures described in Section 3), effectively accounting for the uncertainty in pressure prediction. Pressure fluctuation was then calculated as the difference between the actual and predicted pressure, quantifying the pressure changes.
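The disturbance step can be sketched as follows. The uniform distribution is an assumption, since the text specifies only the range, and the pressure values are hypothetical. The sketch also converts 2 kPa to metres of water head, assuming a water density of 1000 kg/m³ and standard gravity, to check the range against the 0.193 m maximum error reported in Section 3.

```python
import random

KPA_PER_M_HEAD = 9.80665  # rho * g * 1 m, with rho = 1000 kg/m3 (assumed)


def kpa_to_m_head(kpa):
    """Convert a pressure in kPa to metres of water head."""
    return kpa / KPA_PER_M_HEAD


def simulate_predicted(actual_kpa, max_dev_kpa=2.0, rng=None):
    """Add a uniform disturbance in [-max_dev_kpa, max_dev_kpa] per node."""
    rng = rng or random.Random()
    return [p + rng.uniform(-max_dev_kpa, max_dev_kpa) for p in actual_kpa]


rng = random.Random(0)  # fixed seed so the example is repeatable
actual = [300.0, 295.0, 310.0]  # hypothetical simulated pressures in kPa
predicted = simulate_predicted(actual, rng=rng)
fluctuation = [a - p for a, p in zip(actual, predicted)]
bound_m = kpa_to_m_head(2.0)  # roughly 0.204 m
```

Under these assumptions, 2 kPa corresponds to about 0.204 m of head, slightly above the 0.193 m maximum forecasting error, which is consistent with the statement that the disturbance range exceeds that error.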
To further enhance the model’s generalization ability and scalability, training samples were generated by randomly selecting 25%, 50%, and 75% of the positions along the pipe segments and simulating leakage based on the set leakage flow rates. This approach ensures that the data cover a variety of potential leakage scenarios. This process not only comprehensively simulates the network’s operational status under different leakage conditions, but also strengthens the CGNN model’s robustness and reliability in real-world environments by reasonably incorporating prediction errors. The generated samples provide diverse training data, laying the foundation for the model’s efficiency and accuracy in practical applications.
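One way to read the sampling procedure is sketched below: a pipe is chosen at random, the leak is placed at 25%, 50%, or 75% of its length, and the leak flow is a multiple of the pipe's normal flow. The multipliers are hypothetical placeholders, because the values in Table 2 are not reproduced here, and the pipe names and flows are likewise illustrative.

```python
import random


def sample_leak_scenarios(pipe_flows, n, multipliers=(1.0, 2.0, 3.0),
                          positions=(0.25, 0.5, 0.75), seed=None):
    """Draw n leak scenarios as dicts with pipe, position, and leak flow.

    `multipliers` are hypothetical stand-ins for the ranges in Table 2.
    """
    rng = random.Random(seed)
    pipes = sorted(pipe_flows)
    scenarios = []
    for _ in range(n):
        pipe = rng.choice(pipes)
        scenarios.append({
            "pipe": pipe,
            "position": rng.choice(positions),  # fraction of pipe length
            "leak_flow": rng.choice(multipliers) * pipe_flows[pipe],
        })
    return scenarios


# Hypothetical normal-condition flows for two pipes.
pipe_flows = {"P1": 2.0, "P2": 5.0}
scenarios = sample_leak_scenarios(pipe_flows, n=4, seed=1)
```

Each sampled scenario would then be passed to the hydraulic model to obtain the node pressures used as training features.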

5. Case Study

5.1. Background

The water distribution network in City H is supplied by two long-operating water plants (A and B) with designed daily water supply capacities of 100,000 and 50,000 cubic meters, respectively, serving the daily water needs of 600,000 people. The network covers a variety of water usage scenarios, including residential, commercial, and industrial areas, and spans both urban and rural regions. Figure 5 shows the layout of the City H water distribution network, including water sources, pipe segments, key valves, consumption nodes, and special consumption nodes with monitoring devices. In the urban center near the Ring A Water Plant, the pressure monitoring points are densely distributed, while other areas have sparser and unevenly distributed monitoring points. Figure 6a shows a field photo of the monitoring device at node 6, and Figure 6b shows a field photo of the monitoring device at node 49. This study primarily analyzes two aspects:
(1) The prediction accuracy of leak localization using Convolutional Graph Neural Network (CGNN) in the actual water distribution network, evaluating the performance of the CGNN model in City H’s water distribution network, particularly its localization accuracy within different topological distance ranges;
(2) The impact of pressure monitoring point layout on leak localization accuracy and its optimization, analyzing the effect of pressure monitoring point layout in different regions on the CGNN leak localization accuracy and offering suggestions for layout optimization.

5.2. Case Study 1: Core Area Analysis of Plant A

The core area of the Ring A Water Plant has a dense layout of pressure detection points, which facilitates accurate leakage localization analysis. Data gaps are filled using PDA simulation, ensuring that all nodes are equipped with monitoring data to represent the dense distribution of pressure sensors in the actual pipeline network. The distribution network consists of 55 water consumption nodes and 72 supply pipelines. Figure 7 shows the water supply network map of the core area of the Ring A Water Plant. A training set of 600 leakage conditions was randomly generated, and a CGNN model with three embedded convolutional layers was trained by calculating these conditions. Accuracy tests were conducted on 72 validation sets. Table 3 presents the leakage localization accuracy of the CGNN model for both the training and validation sets in this area.
As shown in the table, the localization accuracy of the training and validation sets is very close, indicating that the model has been well trained. The topological distance here refers to the shortest path distance between the leak point and the node identified by the model, measured based on the topological structure of the network. A topological distance of 0 indicates that the leak point and the predicted point are exactly the same. A larger topological distance allows a small margin of error between the leak point and the predicted point, so the localization accuracy gradually improves as the distance increases, reaching 98.61% in the validation set at a topological distance of 3.
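The accuracy-within-distance metric can be sketched as follows. The sketch assumes that distance is counted in hops between pipe segments, with two segments one hop apart when they share a node; this is one possible reading of the topological distance defined above. The four-segment network and the predictions are hypothetical.

```python
from collections import deque


def segment_distances(segments, source):
    """BFS hop counts from `source` to every reachable segment."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for other in segments:
            # Segments sharing an endpoint are treated as neighbors.
            if other not in dist and set(cur) & set(other):
                dist[other] = dist[cur] + 1
                queue.append(other)
    return dist


def accuracy_within(segments, true_segs, pred_segs, max_distance):
    """Fraction of predictions within max_distance hops of the true leak."""
    hits = 0
    for true_seg, pred_seg in zip(true_segs, pred_segs):
        d = segment_distances(segments, true_seg).get(pred_seg)
        if d is not None and d <= max_distance:
            hits += 1
    return hits / len(true_segs)


# Hypothetical chain of four segments and three test cases.
segments = [(1, 2), (2, 3), (3, 4), (4, 5)]
true_segs = [(1, 2), (2, 3), (4, 5)]
pred_segs = [(1, 2), (3, 4), (1, 2)]
acc0 = accuracy_within(segments, true_segs, pred_segs, 0)
acc1 = accuracy_within(segments, true_segs, pred_segs, 1)
acc3 = accuracy_within(segments, true_segs, pred_segs, 3)
```

Because the hit criterion is cumulative, the accuracy can only stay the same or rise as the allowed distance grows, which matches the trend reported in Table 3.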
This result indicates that the CGNN model achieves high leak localization accuracy in the water distribution network of the Ring A Water Plant. Additionally, we combined data from four actual leak events in the Ring A Water Plant network (shown in Figure 1) with PDA-completed data from unmonitored nodes for leak point localization. Specifically, the monitoring data from nodes equipped with monitoring devices during the leak were input into the model, while simulation data from EPANET under leak conditions were used for the other nodes. The model successfully achieved precise localization, reaching 100% localization accuracy. This further demonstrates that the CGNN-based leak localization method, utilizing water pressure as a feature, is highly applicable to urban water distribution networks with dense monitoring points and holds significant practical value.
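The accuracy-within-topological-distance metric reported in Table 3 can be illustrated with a minimal Python sketch. It assumes the network is available as an adjacency mapping and that distance is counted in hops by breadth-first search; the toy graph, the case lists, and the function names are hypothetical and are not taken from the City H data.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of edges on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def accuracy_within(adjacency, true_leaks, predictions, max_distance):
    """Fraction of cases whose prediction lies within max_distance hops of the true leak."""
    hits = 0
    for true_item, predicted_item in zip(true_leaks, predictions):
        dist = hop_distances(adjacency, true_item)
        # Unreachable predictions are treated as infinitely far away.
        if dist.get(predicted_item, float("inf")) <= max_distance:
            hits += 1
    return hits / len(true_leaks)


# Hypothetical toy graph: a chain 1-2-3-4-5 with a branch 3-6.
toy_adjacency = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
toy_true = [1, 2, 3, 5]   # actual leak locations (assumed)
toy_pred = [1, 3, 5, 1]   # model outputs (assumed)

toy_accuracy = {k: accuracy_within(toy_adjacency, toy_true, toy_pred, k) for k in range(4)}
print(toy_accuracy)
```

In this toy example the accuracy can only stay the same or rise as the permitted distance grows, which is the same monotonic behaviour seen in Table 3. The same functions apply to pipe segments if the adjacency mapping is built over segments instead of nodes.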

5.3. Case Study 2: Practical Application in a Large-Scale Dual Water Treatment Plant Network

In this study, we conducted a case analysis of the water distribution network in City H. This network consists of 332 nodes and 442 pipe segments with 2 water sources, covering a wide range of water supply types. It represents a typical structure of a large urban water distribution network. To examine the impact of monitoring point density and distribution on leakage localization accuracy, the study used a training set of 1,800 simulated leakage scenarios to train the CGNN model followed by accuracy testing on a validation set of 442 scenarios. Additionally, real leakage event data were integrated into the model for detection. Given the large network size, the distribution network was divided into seven zones for analysis based on monitoring point density and distribution patterns (Figure 8). The results are shown in Table 4.
In these areas, Zone C, as the core area, has a monitoring point density of 70.9%, resulting in significantly higher localization accuracy compared with other regions. In particular, when the topological distance is 3, the accuracy reaches 96.1%. Moreover, data from the monitoring devices during the four actual leak events in City H (all located in Zone C) were input into the model, and the model’s localization results perfectly matched the actual leak segments, achieving a 100% localization accuracy. In contrast, while the monitoring point density in the other six areas is similar, there is a significant difference in localization accuracy, mainly due to the uneven distribution of the monitoring points.

6. Results and Discussion

6.1. Impact of Monitoring Device Distribution on Model Localization Accuracy

Zone C is the most densely monitored area in City H’s water distribution network, with a monitoring point density of 70.9%. In Case 1, after completing the data with EPANET simulation, the monitoring point density was increased to 100%, and the localization accuracy reached 90.28% when the topological error was 0. In contrast, in Case 2, the monitoring point density in Zone C was 70.9%, and the localization accuracy was 84.6%, a decrease of only 5.68 percentage points. When the topological error was 3, the localization accuracies in Case 1 and Case 2 were 98.61% and 96.1%, respectively, a difference of only 2.51 percentage points. These results indicate that, under high-density monitoring conditions, the model maintains localization accuracy close to that achieved under 100% coverage even when the monitoring point coverage drops to about 70%, demonstrating good adaptability and robustness.
The monitoring point density in the remaining six zones ranges from 12.1% to 18.2%, indicating a sparse distribution. Even at this low density, the model’s localization accuracy is at least 50.6% when the topological error is 0 and at least 83.9% when the topological error is 3, demonstrating considerable robustness. However, localization accuracy is influenced not only by monitoring point density but also by the network’s topological structure and the uniformity of the monitoring point distribution. For example, in Zone A, the localization accuracy at topological errors of 2 and 3 is the highest among Zones A, B, D, E, F, and G. An analysis reveals that some of the pipelines in Zone A have a single-pipeline ring structure, and high-precision localization can be achieved using two key monitoring points located at nodes 193 and 164. In contrast, although Zone E has a similar monitoring point density to Zone A, its monitoring devices are primarily located at the edges of the area, and the pipeline structure is complex and irregular, leading to lower overall localization accuracy. Zone B has the lowest monitoring point density at only 12.1%. Although its accuracy is lower at topological errors of 0 and 1, its relatively regular pipeline structure allows the localization accuracy to improve significantly at topological errors of 2 and 3.
In summary, the monitoring point density is the primary factor influencing the model’s localization accuracy, while the pipeline structure and the uniformity of monitoring point distribution also play crucial roles. Given the limited construction budget for most water distribution networks, optimizing the layout of monitoring points becomes key to enhancing localization performance. Based on this study and the analysis of the seven regions, the following optimization recommendations are proposed: For networks with regular structures, monitoring points should be distributed evenly both within the area and along its edges to improve coverage and localization accuracy; for single-loop long-distance networks, placing monitoring points only at the start and end can achieve precise localization, thereby maximizing the utilization of monitoring points and reducing construction costs.
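The zone-level comparisons in this subsection can be re-derived directly from Tables 3 and 4. The short Python sketch below does only that arithmetic; the numbers are copied from the two tables, while the variable and function names are illustrative choices and not part of the original study.

```python
# Table 4: zone -> (monitoring point density %, accuracy % at topological distance 0, 1, 2, 3)
TABLE_4 = {
    "A": (16.7, 59.1, 77.3, 88.6, 90.1),
    "B": (12.1, 50.6, 62.9, 80.9, 88.8),
    "C": (70.9, 84.6, 88.4, 92.5, 96.1),
    "D": (18.2, 75.0, 83.3, 83.3, 87.5),
    "E": (16.4, 51.7, 67.7, 70.9, 83.9),
    "F": (15.2, 67.9, 78.6, 82.1, 85.7),
    "G": (13.7, 54.8, 76.2, 81.0, 85.7),
}
# Table 3: validation accuracy % of Case 1 at topological distance 0, 1, 2, 3
CASE_1_VALIDATION = (90.28, 94.44, 97.22, 98.61)


def accuracy(zone, distance):
    """Accuracy (%) of a zone at the given topological distance."""
    return TABLE_4[zone][1 + distance]


# The sparsely monitored zones are all zones except the core Zone C.
sparse_zones = [zone for zone in TABLE_4 if zone != "C"]

worst_exact = min(sparse_zones, key=lambda z: accuracy(z, 0))
worst_within_3 = min(sparse_zones, key=lambda z: accuracy(z, 3))
best_within_2 = max(sparse_zones, key=lambda z: accuracy(z, 2))
best_within_3 = max(sparse_zones, key=lambda z: accuracy(z, 3))

# Gap between 100% coverage (Case 1) and 70.9% coverage (Zone C), in percentage points.
gap_exact = round(CASE_1_VALIDATION[0] - accuracy("C", 0), 2)
gap_within_3 = round(CASE_1_VALIDATION[3] - accuracy("C", 3), 2)

print(worst_exact, accuracy(worst_exact, 0))
print(worst_within_3, accuracy(worst_within_3, 3))
print(best_within_2, best_within_3, gap_exact, gap_within_3)
```

Keeping the table in a single mapping makes it straightforward to repeat the comparison for other distances or to add zones if further monitoring data become available.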

6.2. Feature Engineering Contribution Analysis

To assess the importance of each input feature in the Convolutional Graph Neural Network (CGNN), we adopted a gradient-based method to quantify the contribution of each feature [35]. This method measures the impact of each feature on the model’s prediction by computing the gradient of the loss function with respect to the input features, thereby providing an effective way to evaluate feature significance. The procedure consists of the following steps:
Forward propagation: The input data are passed through the model to obtain the predicted value $\hat{y}$.
Loss calculation: The error between the predicted value and the actual label $y$ is computed using the loss function $L(\hat{y}, y)$.
Backward propagation: The gradient of the loss with respect to the input features is computed: $\partial L / \partial x_i$.
Absolute sum: The absolute value of each feature’s gradient is taken to represent the contribution of that feature to the model’s output:
$$\text{Feature Importance} = \sum_{i} \left| \frac{\partial L}{\partial x_i} \right|$$
This method quantifies the influence of each feature on the loss function, thereby estimating its importance in the model.
To standardize the feature contributions so that their sum equals 1, we normalize the importance of each feature. The normalization formula is as follows:
$$\text{Normalized Importance}(x_i) = \frac{\left| \partial L / \partial x_i \right|}{\sum_{j} \left| \partial L / \partial x_j \right|}$$
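The steps and the normalization above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model used in the study: a single logistic unit with binary cross-entropy loss stands in for the CGNN so that the input gradient has a closed form, absolute gradients are accumulated over samples before normalizing, and the weights, samples, and feature names are hypothetical.

```python
import math

FEATURES = ["current_pressure", "predicted_pressure", "pressure_fluctuation"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(weights, bias, x, y):
    """Gradient of the loss with respect to the inputs of a logistic stand-in model.

    For y_hat = sigmoid(w . x + b) and binary cross-entropy loss L(y_hat, y),
    the chain rule gives dL/dx_i = (y_hat - y) * w_i.
    """
    y_hat = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)  # forward pass
    return [(y_hat - y) * w for w in weights]                         # backward pass


def normalized_importance(weights, bias, samples):
    """Sum |dL/dx_i| over samples, then scale so the importances add up to 1."""
    totals = [0.0] * len(weights)
    for x, y in samples:
        for i, g in enumerate(input_gradient(weights, bias, x, y)):
            totals[i] += abs(g)
    grand_total = sum(totals)
    if grand_total == 0.0:
        return totals  # all gradients vanished; nothing to normalize
    return [t / grand_total for t in totals]


# Hypothetical weights and samples, chosen only to exercise the code.
toy_weights = [0.5, -0.5, 3.0]
toy_bias = 0.0
toy_samples = [([1.0, 1.0, 0.2], 1), ([1.0, 1.0, -0.3], 0), ([0.9, 1.1, 0.5], 1)]

importance = normalized_importance(toy_weights, toy_bias, toy_samples)
for name, value in zip(FEATURES, importance):
    print(f"{name}: {value:.2%}")
```

For a deep model the closed-form gradient would be replaced by automatic differentiation with respect to the input tensor; the accumulation and normalization steps stay the same.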
In both cases, we calculated the normalized contribution of each feature to assess its importance in the model’s prediction. Specifically, the feature contributions in Case 1 were current pressure: 14.52%; predicted pressure: 12.58%; and pressure fluctuation: 72.90%. In Case 2, the feature contributions were current pressure: 18.63%; predicted pressure: 15.49%; and pressure fluctuation: 65.88%.
Although the contribution values of the features differ between the two cases, their ranking and relative differences remain consistent, indicating a high level of stability in the model’s reliance on these features. Notably, pressure fluctuation consistently has a significantly higher contribution than the other features in both cases, demonstrating its dominant role in the model. Specifically, the contribution of pressure fluctuation is 72.90% in Case 1 and 65.88% in Case 2. This finding indicates that pressure fluctuation plays a critical role in predicting the target variable, providing key dynamic information, particularly in capturing the trends and fluctuations of water pressure.
Although pressure fluctuation exhibits the highest contribution among all features, both current pressure and predicted pressure are also indispensable features in the model. Current pressure, as a direct monitoring indicator, reflects the system’s current state, making it the most intuitive and fundamental input in the model by providing real-time water pressure data. Predicted pressure, on the other hand, serves as a benchmark for comparison, which is crucial for the prediction task. It helps the model evaluate future water pressure trends and provides reference information for decision making.

7. Conclusion

In this study, an efficient and robust leak detection method is proposed, offering technical support for the intelligent management of modern urban water supply systems. It also provides a theoretical foundation for optimizing network layout and model design under resource-constrained conditions. The research was applied to two representative water distribution networks in City H, China, leading to the following key conclusions:
  • CGNN model with pipe segment feature fusion: A Convolutional Graph Neural Network (CGNN) model for rapid leak detection in water distribution networks is presented, utilizing a pipe segment feature fusion strategy. The model leverages current pressure, predicted pressure, and pressure fluctuations as core features. By improving leak localization accuracy from node-level to pipe-segment-level precision, the model achieves more accurate leak detection. Its effectiveness has been validated using both EPANET simulation data and real leak event data, demonstrating excellent performance.
  • Accuracy in high-density sensor areas: In high-density sensor areas, the CGNN model achieved a leak localization accuracy of 90.28% with zero topological error, demonstrating its strong practical potential for modern urban water distribution networks.
  • Robustness in low-density sensor areas: In areas with low sensor density (10%–20%) and uneven distribution, the CGNN model maintained a localization accuracy of 83.9% to 90.1% within a topological distance of 3 (Table 4), demonstrating strong robustness. This highlights the model’s effectiveness under resource-constrained conditions and its potential for wider application.
  • Contribution of feature engineering: The study shows that pressure fluctuation is the most influential feature, playing a dominant role in leak localization. Current pressure and predicted pressure, as key features, provide essential real-time monitoring and predictive references. A well-designed feature engineering approach is crucial for enhancing model performance.
  • Importance of optimizing monitoring point layout: In this study, it was found that optimizing the layout of monitoring points, particularly in areas with complex topologies or significant pipeline variations, can significantly improve the localization accuracy. This offers both theoretical support and practical recommendations for future monitoring point deployment in water distribution networks.
Future work will focus on refining leak localization at the pipeline segment level by analyzing data from monitoring devices at both ends of a segment and surrounding areas. This will be combined with advanced deep learning frameworks to achieve precise leak localization within specific segments. Additionally, the feature engineering process will be further optimized through multi-feature fusion strategies, incorporating factors such as historical maintenance records, pipe material characteristics, and environmental conditions. These improvements aim to enhance the model’s adaptability, accuracy in complex scenarios, and potential for broader real-world applications.

Author Contributions

Conceptualization, X.L. and Y.W.; methodology, X.L.; resources, Y.W.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hebei Education Department (No. ZD2017225) and the Key Laboratory of Water Quality Engineering and Comprehensive Utilization of Water Resources, Zhangjiakou, 075000, China.

Data Availability Statement

The data that support the findings of this study are not publicly available due to restrictions from Hebei Construction Investment Hengshui Water Affairs Company Limited.

Acknowledgments

The authors would like to acknowledge the data support provided by Hebei Construction Investment Hengshui Water Affairs Company Limited.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Khorshidi, M.S.; Nikoo, M.R.; Taravatrooy, N.; Sadegh, M.; Al-Wardy, M.; Al-Rawas, G.A. Pressure Sensor Placement in Water Distribution Networks for Leak Detection Using a Hybrid Information-Entropy Approach. Inf. Sci. 2020, 516, 56–71.
  2. Serafeim, A.V.; Fourniotis, N.T.; Deidda, R.; Kokosalakis, G.; Langousis, A. Leakages in Water Distribution Networks: Estimation Methods, Influential Factors, and Mitigation Strategies—A Comprehensive Review. Water 2024, 16, 1534.
  3. Wu, Y.P.; Ma, X.K.; Guo, G.C.; Jia, T.L.; Huang, Y.J.; Liu, S.M.; Fan, J.J.; Wu, X. Advancing Deep Learning-Based Acoustic Leak Detection Methods Towards Application for Water Distribution Systems from a Data-Centric Perspective. Water Res. 2024, 261, 121999.
  4. Perelman, L.S.; Allen, M.; Preis, A.; Iqbal, M.; Whittle, A.J. Automated Sub-Zoning of Water Distribution Systems. Environ. Model. Softw. 2015, 65, 1–14.
  5. Boaventura, O.D.Z.; Proença, M.S.; Obata, D.H.S.; Paschoalini, A.T. Convolutional Neural Network for Leak Location in Buried Pipes of Underground Water Supply. J. Braz. Soc. Mech. Sci. Eng. 2024, 46, 352.
  6. Chan, T.K.; Chin, C.S.; Zhong, X.H. Review of Current Technologies and Proposed Intelligent Methodologies for Water Distributed Network Leakage Detection. IEEE Access 2018, 6, 78846–78867.
  7. Yu, J.; Zhang, L.; Chen, J.Y.; Xiao, Y.; Hou, D.B.; Huang, P.J.; Zhang, G.X.; Zhang, H.J. An Integrated Bottom-Up Approach for Leak Detection in Water Distribution Networks Based on Assessing Parameters of Water Balance Model. Water 2021, 13, 867.
  8. Islam, M.R.; Azam, S.; Shanmugam, B.; Mathur, D. A Review on Current Technologies and Future Direction of Water Leakage Detection in Water Distribution Network. IEEE Access 2022, 10, 107177–107201.
  9. Fang, Q.S.; Zhao, H.Y.; Xie, C.L.; Chen, T. A Method for Water Supply Network DMA Partitioning Planning Based on Improved Spectral Clustering. Water Supply 2023, 23, 3432–3452.
  10. Nimri, W.; Wang, Y.; Zhang, Z.; Deng, C.; Sellstrom, K. Data-Driven Approaches and Model-Based Methods for Detecting and Locating Leaks in Water Distribution Systems: A Literature Review. Neural Comput. Appl. 2023, 35, 11611–11623.
  11. Zou, X.Y.; Lin, Y.L.; Xu, B.; Guo, Z.B.; Xia, S.J.; Zhang, T.Y.; Wang, A.Q.; Gao, N.Y. A Novel Event Detection Model for Water Distribution Systems Based on Data-Driven Estimation and Support Vector Machine Classification. Water Resour. Manag. 2019, 33, 4569–4581.
  12. Farley, B.; Mounce, S.R.; Boxall, J.B. Development and Field Validation of a Burst Localization Methodology. J. Water Resour. Plan. Manag. 2013, 139, 604–613.
  13. Min, K.W.; Kim, T.; Lee, S.; Choi, Y.H.; Kim, J.H. Detecting and Localizing Leakages in Water Distribution Systems Using a Two-Phase Model. J. Water Resour. Plan. Manag. 2022, 148, 04022051.
  14. Li, R.; Huang, H.; Xin, K.; Tao, T. A Review of Methods for Burst/Leakage Detection and Location in Water Distribution Systems. Water Sci. Technol. Water Supply 2015, 15, 429–441.
  15. Garajeh, M.K.; Feizizadeh, B.; Salmani, B.; Ghasemi, M. Analyzing Urban Drinking Water System Vulnerabilities and Locating Relief Points for Urban Drinking Water Emergencies. Water Resour. Manag. 2024, 38, 2339–2358.
  16. Wu, J.; Ma, D.; Wang, W. Leakage Identification in Water Distribution Networks Based on XGBoost Algorithm. J. Water Resour. Plan. Manag. 2022, 148, 04021107.
  17. Kang, J.; Park, Y.-J.; Lee, J.; Wang, S.-H.; Eom, D.S. Novel Leakage Detection by Ensemble CNN-SVM and Graph-Based Localization in Water Distribution Systems. IEEE Trans. Ind. Electron. 2018, 65, 4279–4289.
  18. Yan, R.; Huang, J.J. Confident Learning-Based Gaussian Mixture Model for Leakage Detection in Water Distribution Networks. Water Res. 2023, 247, 120773.
  19. Kerimov, B.; Taormina, R.; Tscheikner-Gratl, F. Towards Transferable Metamodels for Water Distribution Systems with Edge-Based Graph Neural Networks. Water Res. 2024, 261, 121933.
  20. Pérez, R.; Puig, V.; Pascual, J.; Quevedo, J.; Landeros, E.; Peralta, A. Methodology for Leakage Isolation Using Pressure Sensitivity Analysis in Water Distribution Networks. Control Eng. Pract. 2011, 19, 1067–1078.
  21. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
  22. Lim, B.; Zohren, S. Time-Series Forecasting with Deep Learning: A Survey. Philos. Trans. R. Soc. A 2021, 379, 20200209.
  23. Molnö, V.; Sandberg, H. On Well-Posedness of the Leak Localization Problem in Parallel Pipe Networks. Automatica 2024, 167, 111801.
  24. Freitas, L.L.G.; Henning, E.; Kalbusch, A.; Konrath, A.C.; Walter, O.M.F.C. Analysis of Water Consumption in Toilets Employing Shewhart, EWMA, and Shewhart-EWMA Combined Control Charts. J. Clean. Prod. 2019, 233, 1146–1157.
  25. Huang, T.; Hu, X.; Tang, A.; Wu, S.; Zhao, M. An Adaptive EWMA Median Chart for Monitoring the Process Mean. In Proceedings of the 31st Chinese Conference on Control and Decision, Nanchang, China, 3–5 June 2019; Volume 3, pp. 1156–1160.
  26. Bakker, M.; Jung, D.; Vreeburg, J.; van de Roer, M.; Lansey, K.; Rietveld, L. Detecting Pipe Bursts Using Heuristic and CUSUM Methods. Procedia Eng. 2014, 89, 975–982.
  27. Xu, X.; Liu, Y.; Liu, S.; Li, J.; Guo, G.; Smith, K. Real-Time Detection of Potable-Reclaimed Water Pipe Cross-Connection Events by Conventional Water Quality Sensors Using Machine Learning Methods. J. Environ. Manag. 2019, 238, 201–209.
  28. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Process. Mag. 2017, 34, 18–42.
  29. Ren, S.; Zhou, F. Semi-Supervised Classification for PolSAR Data with Multi-Scale Evolving Weighted Graph Convolutional Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2911–2923.
  30. Ge, L.; Li, Y.; Li, H.; Tian, L.; Wang, Z. A Review of Privacy-Preserving Research on Federated Graph Neural Networks. Neurocomputing 2024, 600, 128166.
  31. Bronstein, M.M.; Bruna, J.; Cohen, T.; Veličković, P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv 2021, arXiv:2104.13478.
  32. Baskerville, N.P.; Keating, J.P.; Mezzadri, F.; Najnudel, J. The Loss Surfaces of Neural Networks with General Activation Functions. J. Stat. Mech. 2020, 2020, 064001.
  33. Kavya, M.; Mathew, A.; Shekar, P.R.; Sarwesh, P. Short Term Water Demand Forecast Modelling Using Artificial Intelligence for Smart Water Management. Sustain. Cities Soc. 2023, 95, 104610.
  34. Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N. A Comparative Assessment of SARIMA, LSTM RNN and Fb Prophet Models to Forecast Total and Peak Monthly Energy Demand for India. Energy Policy 2022, 168, 113097.
  35. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 3319–3328.
Figure 1. Flowchart of the leak localization model process.
Figure 2. Eight warning events in the SCADA system of City H.
Figure 3. The process of fusing node features into segment features through convolutional layers. (a) Pipe segment 5 in the water supply network. (b) Convolutional layer connectivity. (c) Receptive field expansion of node 5. (d) Receptive field expansion of node 6. (e) Final receptive field of pipe segment 5.
Figure 4. A schematic overview of the entire process of the Convolutional Graph Neural Network (CGNN).
Figure 5. Water distribution network of City H.
Figure 6. On-site photos of pressure sensors in City H.
Figure 7. Water supply network map of Ring A Water Plant.
Figure 8. Zonal analysis of the water distribution network in City H.
Table 1. Maximum absolute error in water pressure prediction.
| Node ID | LSTM Error (m) | Random Forest Error (m) | SARIMA Error (m) | Historical Pressure Error (m) |
|---|---|---|---|---|
| 1 | 0.118 | 0.162 | 0.119 | 0.103 |
| 5 | 0.137 | 0.156 | 0.129 | 0.114 |
| 7 | 0.193 | 0.142 | 0.156 | 0.153 |
| 12 | 0.055 | 0.068 | 0.091 | 0.127 |
| 20 | 0.127 | 0.161 | 0.134 | 0.158 |
| 26 | 0.086 | 0.142 | 0.125 | 0.126 |
| 45 | 0.073 | 0.089 | 0.079 | 0.135 |
| 49 | 0.126 | 0.143 | 0.119 | 0.098 |
| 163 | 0.153 | 0.162 | 0.144 | 0.103 |
| 169 | 0.099 | 0.135 | 0.152 | 0.086 |
Table 2. Simulated leakage flow range for pipe segments.
| Pipe Diameter (mm) | Leakage Flow (Multiples of Normal Flow) |
|---|---|
| 100 ≤ Diameter ≤ 300 | 5–10 times |
| 300 < Diameter ≤ 500 | 3–8 times |
| 500 < Diameter ≤ 800 | 2–6 times |
| 800 < Diameter ≤ 1500 | 1–4 times |
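The ranges in Table 2 can be turned into a simple scenario sampler. The Python sketch below is only an illustration under stated assumptions: the leaking pipe is drawn uniformly at random, the leakage multiple is drawn uniformly within the Table 2 range for that pipe's diameter, and the pipe identifiers and diameters are hypothetical. How a multiple is applied inside the hydraulic simulation is not covered here.

```python
import random

# Table 2 bands: (upper diameter bound in mm, minimum multiple, maximum multiple).
# The lowest band starts at 100 mm; each later band starts just above the previous bound.
LEAK_BANDS = [(300, 5, 10), (500, 3, 8), (800, 2, 6), (1500, 1, 4)]
MIN_DIAMETER_MM = 100


def leak_multiple_range(diameter_mm):
    """Return the (min, max) leakage-flow multiple for a pipe diameter per Table 2."""
    if diameter_mm < MIN_DIAMETER_MM or diameter_mm > LEAK_BANDS[-1][0]:
        raise ValueError(f"diameter {diameter_mm} mm is outside the Table 2 ranges")
    for upper_bound, low, high in LEAK_BANDS:
        if diameter_mm <= upper_bound:
            return low, high


def sample_leak_scenarios(pipe_diameters, count, seed=0):
    """Draw (pipe_id, leakage multiple) pairs; uniform sampling is an assumption."""
    rng = random.Random(seed)
    pipe_ids = sorted(pipe_diameters)
    scenarios = []
    for _ in range(count):
        pipe_id = rng.choice(pipe_ids)
        low, high = leak_multiple_range(pipe_diameters[pipe_id])
        scenarios.append((pipe_id, rng.uniform(low, high)))
    return scenarios


# Hypothetical pipes (identifier -> diameter in mm), used only to exercise the sampler.
toy_pipes = {"P1": 150, "P2": 300, "P3": 400, "P4": 1000}
toy_scenarios = sample_leak_scenarios(toy_pipes, count=600)
print(toy_scenarios[:3])
```

Fixing the seed keeps the generated scenario set reproducible, which helps when training and validation sets need to be regenerated consistently.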
Table 3. Leakage localization results for the water network in Plant A.
| Localization Range (Topological Distance) | Training Set Accuracy | Validation Set Accuracy |
|---|---|---|
| 0 | 92.67% | 90.28% |
| 1 | 95.33% | 94.44% |
| 2 | 98.50% | 97.22% |
| 3 | 100% | 98.61% |
Table 4. Leakage localization results for the water distribution network in City H.
| Zone | Monitoring Point Density (Number of Monitoring Points/Total Nodes in the Zone) | Topology 0 | Topology 1 | Topology 2 | Topology 3 |
|---|---|---|---|---|---|
| A | 16.7% | 59.1% | 77.3% | 88.6% | 90.1% |
| B | 12.1% | 50.6% | 62.9% | 80.9% | 88.8% |
| C | 70.9% | 84.6% | 88.4% | 92.5% | 96.1% |
| D | 18.2% | 75.0% | 83.3% | 83.3% | 87.5% |
| E | 16.4% | 51.7% | 67.7% | 70.9% | 83.9% |
| F | 15.2% | 67.9% | 78.6% | 82.1% | 85.7% |
| G | 13.7% | 54.8% | 76.2% | 81.0% | 85.7% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, X.; Wu, Y. A Convolutional Graph Neural Network Model for Water Distribution Network Leakage Detection Based on Segment Feature Fusion Strategy. Water 2024, 16, 3555. https://doi.org/10.3390/w16243555
