Article

Study on Large-Scale Urban Water Distribution Network Computation Method Based on a GPU Framework

1 State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi’an University of Technology, Xi’an 710048, China
2 Shaanxi Province Institute of Water Resources and Electric Power Investigation and Design, Xi’an 710048, China
* Author to whom correspondence should be addressed.
Water 2024, 16(18), 2642; https://doi.org/10.3390/w16182642
Submission received: 22 August 2024 / Revised: 12 September 2024 / Accepted: 16 September 2024 / Published: 18 September 2024
(This article belongs to the Special Issue Urban Flood Mitigation and Sustainable Stormwater Management)

Abstract
Large-scale urban water distribution network simulation plays a critical role in the construction, monitoring, and maintenance of urban water distribution systems. However, during the simulation process, matrix inversion calculations generate a large amount of computational data and consume significant amounts of time, posing challenges for practical applications. To address this issue, this paper proposes a parallel gradient calculation algorithm based on GPU hardware and the CUDA Toolkit library and compares it with the EPANET model and a model based on CPU hardware and the Armadillo library. The results show that the GPU-based model not only achieves a precision level very close to the EPANET model, reaching 99% accuracy, but also significantly outperforms the CPU-based model. Furthermore, during the simulation, the GPU architecture is able to efficiently handle large-scale data and achieve faster convergence, significantly reducing the overall simulation time. Particularly in handling larger-scale water distribution networks, the GPU architecture can improve computational efficiency by up to 13 times. Further analysis reveals that different GPU models exhibit significant differences in computational efficiency, with memory capacity being a key factor affecting performance. GPU devices with larger memory capacity demonstrate higher computational efficiency when processing large-scale water distribution networks. This study demonstrates the advantages of GPU acceleration technology in the simulation of large-scale urban water distribution networks and provides important theoretical and technical support for practical applications in this field. By carefully selecting and configuring GPU devices, the computational efficiency of large-scale water distribution networks can be significantly improved, providing more efficient solutions for future urban water resource management and planning.

1. Introduction

The water distribution network is a vital component of modern urban infrastructure, and as urbanization progresses and population growth continues, the demand for water by residents steadily increases, leading to the continuous expansion of water distribution networks [1,2]. According to recent statistics, the total length of urban water distribution networks in China has reached 1.1 million kilometers, with a water supply volume of 58.65 billion m3. Safe and reliable operation of urban water distribution networks is crucial. However, factors such as temperature fluctuations, soil erosion, and ground subsidence often result in damage and corrosion to the pipelines within the network [3,4]. In 2023, the total water loss in urban areas across China was reported to be 7.85 billion m3, with a leakage rate of around 13% [5]. In recent years, with the widespread implementation of China’s urban lifeline projects, the upgrading and renovation of water distribution network systems, particularly in terms of leak monitoring and structural optimization, have become key measures to enhance urban resilience against disasters [6,7]. Therefore, constructing a stable and efficient model for large-scale urban water distribution networks is essential, as it can effectively support the stable operation of urban water distribution systems [8].
However, in the face of the massive scale of urban water supply networks, their complex topological structures, highly variable boundary conditions, and uncertainties in modeling parameters, the process of solving computational equations becomes particularly intricate [9]. This greatly prolongs the model’s computation time, becoming a major technical bottleneck limiting the widespread application of large-scale network models [10]. At the same time, in the simulation of unsteady flow in large-scale water supply networks, complex tank modeling and dynamic head processing have long been research focal points. For instance, Avesani et al. extended the EPANET source code to better simulate unsteady flow with variable head tanks [11]; similarly, Todini’s research adapted the extended global gradient algorithm (GGA) to improve long-period unsteady flow simulations in distribution systems [12]. While these methods have improved model accuracy, they have also increased computational complexity. To address the issue of low computational efficiency in large urban water supply networks, current research has focused on two main approaches to improve computational efficiency: The first approach involves the use of advanced numerical optimization methods, including, but not limited to, parallel computing systems integrated with adaptive hydraulic solvers, hybrid strategies that combine multiple algorithms, and the development of more efficient hydraulic calculation models, or flexible combinations of these methods. These algorithmic innovations aim to streamline the computational process and reduce response times [13,14]. The second approach centers on the upgrade of computing hardware and the deep application of parallel computing technologies [15]. This involves deploying high-performance computers equipped with more powerful processors and acceleration hardware, such as GPUs, and leveraging parallel computing frameworks to fully utilize hardware resources and accelerate the computation process. This hardware-driven method seeks to push the physical limits of single computations, leading to a leap in computational efficiency and providing robust hardware support for the efficient operation of large-scale urban water distribution network models.
Currently, numerical methods mainly focus on optimizing the topological matrices involved in network computation to accelerate the convergence of nonlinear matrices, thereby improving computational efficiency. For example, Ivetic et al. utilized a variation of the ΔQ method for hydraulic calculations in networks [16]. By applying the ΔQ method within the evaluation function, they performed hydraulic solving and ranking for each tested alternative, bypassing the computational burden of iterative solvers, which enhanced computational efficiency in solving single-objective optimization problems and improved the convergence criteria. Similarly, Zecchin et al. used a ΔQ-based method and implemented a triangulation-based loop identification algorithm (TRIBAL) to replace the GGA algorithm [17]. Although this approach increased the number of iterations, it significantly improved computational efficiency compared to GGA-based solvers while maintaining good numerical stability. In another study by Diao et al., the EPANET2 source code was modified using domain Schur decomposition [18]. This method divided the nonlinear equation system into smaller independent submodules, rearranged matrix elements, and used Schur decomposition to calculate the separator values while simultaneously solving the subdomain values. The iteration of these steps continued until the equation system converged, ultimately improving computation speed by approximately eightfold. Other research on improving the traditional GGA algorithm for accelerating the solution of water distribution system (WDS) hydraulic variables includes Zecchin et al., who employed the algebraic multigrid (AMG) method [19]. By utilizing a series of smaller dimensional computational modules to approximate Jacobian matrix calculations, they demonstrated that AMG outperforms sparse Cholesky methods in node reordering (used by the EPANET2 solver), incomplete LU decomposition (ILU), and PARDISO (standard iterative and direct sparse linear solvers), thereby enhancing computational efficiency. Ali Suvizi proposed a parallel computing architecture based on the concept of cellular automata, using Taylor series to solve hydraulic equations [20]. This method, by considering networks with regular structures, replaced complex nonlinear equations with more intuitive forms to accelerate the numerical solution of steady-state distribution network models, effectively improving computational efficiency. However, its acceleration effect was limited by the network size and required the setting of numerous regular structured networks. Duan et al. applied the spanning tree technique from graph theory to improve water distribution network computation methods, reducing computation time and memory consumption compared to existing methods while enhancing accuracy [21]. These methods, by optimizing the underlying formulas, reduced the size of matrices involved in computations, improving both computational efficiency and accuracy. However, the complexity of code modifications in these methods makes them challenging to apply in practical engineering. In particular, solving the hydraulic characteristics of water distribution systems using the ΔQ method involves a high level of programming difficulty.
Additionally, several studies have explored the use of hardware devices to enhance the computational efficiency of large-scale water distribution networks. For example, Crous et al. investigated the possibility of using GPU parallel computing technology to solve the conjugate gradient method [22]. By altering the computational formula of the conjugate gradient method to avoid matrix inversion, they proposed a parallel method that could run on graphics processors for numerically solving hydraulic models of distribution networks. However, this approach did not fully solve the hydraulic model for water distribution networks, and limitations in the interface capacity of GPU and CPU devices led to decreased computational performance, failing to leverage the advantages of GPU parallel acceleration. Guidolin et al. utilized single instruction, multiple data (SIMD) and GPU-based high-performance computing (HPC) techniques to accelerate demand-driven hydraulic solvers, implementing the hydraulic model through the CWSNet library [23]. They noted that the linear solver was not the most time-consuming code block in the simulation, and that pipeline head loss calculations were a major factor limiting computational efficiency. However, the acceleration of matrix multiplication did not significantly increase the algorithm’s speed, with computational efficiency only improving by 18% for distribution networks with more than 2000 nodes. Yang used the “Songshan” supercomputing system to simulate large-scale networks, employing a CPU + DCU architecture for intensive data calculations [24]. Current research primarily focuses on using GPU hardware for parallel computing to accelerate the solving process. However, in the computation of large-scale water supply networks, the presence of large sparse matrices limits the efficiency of inversion operations. These limitations arise due to the strong data dependencies in matrix inversion algorithms, the inefficiency of parallelization caused by the sparse matrix structure, and the significant communication overhead required to synchronize intermediate results across processors. Additionally, load imbalance and high memory bandwidth demands further complicate the parallelization of these tasks. As a result, this approach has not been widely adopted in practice.
In traditional water supply network computations, while the CPU has strong versatility and the ability to handle complex logic, its computational efficiency is significantly limited when solving large sparse matrices, especially for nonlinear equation systems. In hydraulic model computations, the CPU gradually exhibits performance bottlenecks, making it difficult to process large-scale data quickly. Although the GPU has excellent parallel processing capabilities and can accelerate matrix operations, early algorithms faced challenges when handling complex sparse matrix inversion, limiting the full potential of its multi-core architecture. However, with the rapid development of GPU hardware in recent years and continuous optimization of CUDA algorithms, particularly in improving linear equation system computations within parallel computing frameworks, the GPU is no longer confined to simple matrix multiplication; it can now efficiently solve sparse matrix inversions. This advancement provides strong technical support for solving water supply network models, significantly reducing computation time and laying a solid foundation for efficient simulation and analysis of water supply network models.
In summary, to address the comprehensive needs of large-scale urban water distribution network computations in terms of programming convenience, computational accuracy, and efficiency improvement, this study adopts a combined acceleration method using GPU acceleration technology and the CUDA Toolkit library to solve water distribution network models. This method aims to optimize the performance bottlenecks of traditional CPUs in large-scale network model computations by leveraging the powerful parallel processing capabilities of GPUs. Compared to traditional CPU-based models, this approach significantly reduces computation time, improves computational efficiency, and enhances accuracy, fully meeting the demands for rapid simulation and analysis of large-scale urban water distribution networks in complex scenarios. This study provides technical support for the application of GPU acceleration technology in water distribution network computations.

2. Computing Methods and Architecture

2.1. GGA “Direct” Flow-Pressure d(H) Approach

This paper converts the actual network structure into a topological structure following the LINK-NODE format [25,26], as shown in Figure 1 below:
In Figure 1, −1 represents the direction of the pipe starting from the node, 1 represents the direction of the pipe ending at the node, and 0 indicates no direct connection between the node and the pipe. The yellow and green grids represent the topological relationships between pipes and known/unknown head nodes. This transformation relationship converts the actual pipeline network into a topological structure.
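To make the LINK-NODE mapping concrete, the following minimal C++/Armadillo sketch (not from the original paper; the function name and pipe-list layout are illustrative) assembles a node-by-pipe incidence matrix from a list of pipe start/end nodes, with −1 for the start node, +1 for the end node, and 0 elsewhere, exactly as encoded in Figure 1.

#include <armadillo>
#include <utility>
#include <vector>

// Illustrative helper: build the node-by-pipe incidence matrix of Figure 1.
// Each pipe stores its (start node, end node) indices; nodes with unknown head
// are numbered 0..nUnknown-1, so known-head nodes are simply skipped here.
arma::mat buildIncidence(const std::vector<std::pair<int, int>>& pipes, int nUnknown) {
    arma::mat A21(nUnknown, pipes.size(), arma::fill::zeros);
    for (std::size_t p = 0; p < pipes.size(); ++p) {
        int from = pipes[p].first;    // pipe starts at this node: entry -1
        int to   = pipes[p].second;   // pipe ends at this node:   entry +1
        if (from < nUnknown) A21(from, p) = -1.0;
        if (to   < nUnknown) A21(to,   p) = +1.0;
    }
    return A21;                       // its transpose gives the pipe-by-node matrix
}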
The hydraulic characteristics of the water supply network are solved using a set of pipe equations. Assuming a network with N junctions and NF known head nodes, by constructing the flow-head loss relationship equation between nodes i and j and the flow continuity equation for all nodes, the values of flow rate Q and head h are simultaneously solved. The relationship between the flow rate within the pipe and the head loss is expressed using the Hazen–Williams formula [8,27,28], as follows:
$$ H_i - H_j = h_{ij} = s Q_{ij}^{n} + m Q_{ij}^{2} \quad (1) $$
$$ \sum_{j} Q_{ij} - D_i = 0 \quad (2) $$
$$ s = \frac{10.654\, L}{C^{1.852}\, d^{4.871}} \quad (3) $$
In Equations (1)–(3), H represents the node head in meters (m); h is the head loss, (m); s is the resistance coefficient; Q is the flow rate, (m3/s); n is the flow index, with a value of 1.852; m is the local loss coefficient; Di is the water demand at node i, (m3/s); C is the Hazen–Williams roughness coefficient, which is related to the pipe material; d is the pipe diameter in meters (m); L is the pipe length, (m).
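As a concrete illustration of Equations (1)–(3), the short C++ sketch below (illustrative only; the helper names and single-pipe scope are ours) computes the Hazen–Williams resistance coefficient and the corresponding head loss for one pipe.

#include <cmath>

// Illustrative helpers for Equations (1)-(3); names and single-pipe scope are assumptions.
double resistanceCoefficient(double C, double d, double L) {
    // Equation (3): s = 10.654 * L / (C^1.852 * d^4.871), with d and L in metres
    return 10.654 * L / (std::pow(C, 1.852) * std::pow(d, 4.871));
}

double headLoss(double s, double m, double Q) {
    // Equation (1): h_ij = s*Q^n + m*Q^2 with n = 1.852; |Q| keeps the magnitude
    // well defined, and the sign of h_ij follows the flow direction
    const double n = 1.852;
    return s * std::pow(std::fabs(Q), n) + m * Q * Q;
}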
Significant progress has been made in the study of methods for solving pipe equation systems. For instance, the UD-PD computational method introduced by Andrea et al., which incorporates distributed pressure-driven demand into the global gradient algorithm (GGA), has been recognized as a key tool for accurately simulating flow and pressure in water distribution networks (WDN) under pressure-deficient conditions [29]. Additionally, a fast and robust solution was proposed by Orazio et al., successfully accelerating the computation of large hydraulic networks under various operational scenarios [30]. However, the primary focus of this paper is to validate the critical role of hardware devices in solving pipe equation systems; therefore, the traditional GGA algorithm proposed by Todini and Pilati has been employed to solve the equation system [31]. Formulas (1) and (2) are linearized, with their differential form shown as follows:
$$ \begin{bmatrix} N A_{11} & A_{12} \\ A_{21} & 0 \end{bmatrix} \begin{bmatrix} dQ \\ dh \end{bmatrix} = \begin{bmatrix} dE \\ dq \end{bmatrix} \quad (4) $$
In Equation (4), $N$ and $A_{11}$ are diagonal matrices, where the diagonal elements of $N$ are 1.852 and the diagonal elements of $A_{11}$ are the linearization coefficients $s Q_k^{n-1}$; $A_{12} = A_{21}^{T}$ is the incidence matrix composed of −1, 0, and 1, indicating whether a node is associated with a specific component; $A_{10}$ is the head matrix composed of known head nodes; $dE$ and $dq$ are the residuals for the current solution of $dQ$ and $dh$. According to the gradient algorithm, the flow rate Q and head h at the next time step are:
$$ h^{k+1} = -\left(A_{21} N^{-1} A_{11}^{-1} A_{12}\right)^{-1}\left\{A_{21} N^{-1}\left(Q^{k} + A_{11}^{-1} A_{10} H_0\right) + \left(q - A_{21} Q^{k}\right)\right\} \quad (5) $$
$$ Q^{k+1} = \left(1 - N^{-1}\right) Q^{k} - N^{-1} A_{11}^{-1}\left(A_{12} h^{k+1} + A_{10} H_0\right) \quad (6) $$
In Equations (5) and (6), $h^k$ and $Q^k$ represent the current node head and pipe flow in the network, with units of (m) and (m3/s), respectively; $H_0$ is the known node head, (m). Convergence is determined by evaluating the results of successive calculations, with further iterations performed if necessary. This method is widely applied in the EPANET software [32,33,34].
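For readers implementing the method, the following hedged C++/Armadillo sketch shows how one iteration sweep of Equations (5) and (6) could be coded. The function name, argument layout, tolerance, and convergence test are illustrative assumptions, not the authors' implementation; in practice a small floor on |Q| is also needed to avoid division by zero in the linearization coefficients.

#include <armadillo>

// Hedged sketch of Equations (5) and (6). A21 is the node-by-pipe incidence
// matrix, A10 maps the known heads H0, s holds the pipe resistance coefficients,
// and q holds the nodal demands.
void ggaSolve(arma::vec& Q, arma::vec& h,
              const arma::mat& A21, const arma::mat& A10,
              const arma::vec& s, const arma::vec& H0, const arma::vec& q,
              double tol = 1e-6, int maxIter = 100) {
    const double n = 1.852;
    const arma::mat A12 = A21.t();
    for (int k = 0; k < maxIter; ++k) {
        // Diagonal of A11: linearization coefficients s*|Q|^(n-1)
        arma::vec a11 = s % arma::pow(arma::abs(Q), n - 1.0);
        arma::mat A11inv = arma::diagmat(1.0 / a11);
        // Equation (5): head update (N is diagonal with entries n, so N^-1 = (1/n)*I)
        arma::mat J = (1.0 / n) * A21 * A11inv * A12;
        arma::vec rhs = (1.0 / n) * A21 * (Q + A11inv * A10 * H0) + (q - A21 * Q);
        h = -arma::solve(J, rhs);
        // Equation (6): flow update
        arma::vec Qnew = (1.0 - 1.0 / n) * Q - (1.0 / n) * A11inv * (A12 * h + A10 * H0);
        if (arma::norm(Qnew - Q, 2) < tol) { Q = Qnew; break; }
        Q = Qnew;
    }
}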
From Equations (5) and (6), it can be observed that solving the water supply network model using the gradient algorithm primarily involves operations such as matrix inversion, addition, subtraction, and multiplication among multiple arrays. The inversion operation, due to its complex steps and higher computational difficulty, typically takes longer than other array operations. Common methods for matrix inversion include the elementary transformation method, the Gaussian elimination method, the adjoint matrix method, and the LU decomposition method. Among these, the LU decomposition method is widely used for inverting large matrices because it offers advantages such as reducing data volume, ease of parallel computation, numerical stability, and ease of implementation and extension [35,36]. Therefore, in this study, the LU decomposition method is employed for matrix inversion under both CPU and GPU architectures [37].

2.2. CPU Computing Architecture

In this study, the CPU architecture used for gradient computation in the water distribution network is based on the C++ language, with code editing carried out using Visual Studio 2022. The differential equations involved in the gradient calculation, as discussed earlier, indicate that the computational process primarily consists of matrix operations such as addition, subtraction, multiplication, and inversion. Matrix inversion is a crucial part of the overall computational task. To enhance programming convenience, the CPU architecture utilizes the Armadillo-12.6.4 library for matrix computations [38]. Armadillo is an open-source, high-quality C++ linear algebra library that integrates LAPACK (Linear Algebra Package) or other highly optimized backend libraries, such as Intel’s MKL, depending on the processor [39,40]. By employing the mature and stable Armadillo library, the limitations associated with developing custom solvers are reduced, enabling more efficient utilization of the CPU’s processing power.
Below is the flowchart for gradient computation in a water distribution network based on the CPU architecture using the Armadillo library. The model constructs a topological matrix of the computational equations by reading the basic information of the network. The state of the network at time T = n − 1 is input into the CPU architecture, where linear calculations are performed using the Armadillo library. Convergence of the computation results is evaluated to determine whether to proceed with the next iteration or output the final results. Due to the complexity and necessity of matrix inversion, the flowchart primarily highlights the matrix inversion process under the CPU architecture, as detailed below:
(1) Load data from the main memory to the L3 cache.
(2) Perform partial computations and move data to the L2 cache.
(3) Continue computations and move data to the L1 cache.
(4) Complete computations and move data to the CPU core.
(5) Compute the L and U matrices in the CPU core and store the results back in the L1 cache.
(6) Use L and U to compute the inverse matrix in the CPU core and store the results in the L1 cache.
(7) Results are stored back from the L1 cache -> L2 cache -> L3 cache -> main memory.
L1, L2, and L3 caches are organized in a hierarchical structure. The L1 cache, being closest to the CPU core, offers the highest speed but has the smallest capacity. The L2 cache, with greater capacity but slightly lower speed, typically serves to share data between cores. The L3 cache is larger still, with slower access speeds, and is shared across all cores. During execution, data is progressively transferred from the main memory through L3 and L2 to L1, before reaching the CPU core for processing, establishing a multi-tiered storage and transfer mechanism. This architecture enhances computational efficiency by minimizing direct access to the main memory, as shown in Figure 2 and Table 1.
Regarding the sample size, 10 different network structures, from Model-1 to Model-10, were selected, with node counts ranging from 100 to 6000, representing urban water supply networks of various regional scales, as shown in Figure 3. Specifically, Model-1 to Model-3 represent community-level networks with 100, 200, and 500 nodes, respectively; Model-4 to Model-6 represent constituency-level networks with 700, 1000, and 2000 nodes; Model-7 and Model-8 represent district-level networks with 3000 and 4000 nodes; and Model-9 and Model-10 represent city-level networks with 5000 and 6000 nodes. The sample size was chosen to cover a range of network complexities, from small to large-scale networks, ensuring broad applicability and reliability of the model results across different network scales. This approach allows for a comprehensive analysis of improvements in computational efficiency under various conditions. Since the models are self-constructed, the node attributes were uniformly configured for ease of modeling, as shown in Table 2.
By extracting the total computation time and matrix inversion time from each model within the CPU computation framework, the main factors limiting the efficiency of water distribution network calculations were analyzed, as shown in Figure 4. It was observed that for models with fewer than 1000 nodes, most of the computation time is consumed by matrix operations other than inversion. However, for models with more than 1000 nodes, the majority of computation time is spent on matrix inversion. In the most extreme cases, up to 92% of the total computation time is dedicated to inverting large matrices. This finding underscores that improving the efficiency of matrix inversion is a critical factor in enhancing the overall computation efficiency when using gradient algorithms to solve large-scale urban water distribution network models.
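The timing breakdown in Figure 4 can be reproduced in principle by bracketing the inversion call with a wall-clock timer. The sketch below is illustrative only (the 2000-by-2000 diagonally dominant test matrix is an arbitrary stand-in, not the study's Jacobian) and shows one way to time a single Armadillo inversion.

#include <armadillo>
#include <chrono>
#include <iostream>

int main() {
    using clock = std::chrono::steady_clock;
    // Illustrative test matrix: random, with a dominant diagonal so it is invertible
    arma::mat J = arma::randu<arma::mat>(2000, 2000) + 2000.0 * arma::eye(2000, 2000);
    auto t0 = clock::now();
    arma::mat Jinv = arma::inv(J);   // the operation that dominates large models (Figure 4)
    auto t1 = clock::now();
    double invMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::cout << "Single 2000 x 2000 inversion took " << invMs << " ms" << std::endl;
    return 0;
}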

2.3. GPU Computing Architecture

Based on the computational results using the CPU architecture, this study transitions to using a GPU architecture to solve the water distribution network equations in order to improve computational efficiency. Earlier GPUs had several limitations when performing matrix inversion, including restrictions due to single-precision floating-point operations, the complexity of parallel computations, memory bandwidth bottlenecks, and limited support from development tools and libraries. These issues resulted in inadequate precision, low efficiency, and increased development difficulty for matrix inversion operations. However, the latest versions of CUDA have addressed these issues by supporting double-precision floating-point operations, optimizing parallel computing models, enhancing memory management efficiency, and providing powerful mathematical libraries. These improvements have significantly enhanced the performance and accuracy of matrix inversion on GPUs, enabling them to handle complex matrix operations more efficiently and reliably. Therefore, unlike the findings by Crous et al., current GPU hardware and corresponding libraries now meet the precision and efficiency requirements for inverting large matrices, making them suitable for building large-scale urban water distribution network models [22].
The GPU architecture employed in this study follows a computational process similar to that of the CPU architecture, utilizing the C++ language and Visual Studio 2022 as the development environment. However, the linear matrix computations are handled using GPU-specific libraries such as cuBLAS and cuSOLVER [41]. Since cuSOLVER does not provide a direct matrix inversion function, the code uses the cusolverDnDgetrf function to perform LU decomposition of the matrix, followed by cusolverDnDgetrs to solve the equation AX = I, thereby obtaining the inverse of the matrix. A sample code snippet for this calculation process is shown in Table 3.
The version of CUDA used in this study is CUDA Toolkit 12.0 [42]. Unlike the matrix linear computation process in the CPU using the Armadillo library, the GPU computation process in the CUDA environment involves data transfer between the CPU and GPU while maintaining a high level of parallel computing capability [43]. Below is a flowchart of the gradient computation for the water distribution network based on the GPU architecture using the CUDA library. Unlike the CPU architecture, where matrix inversion relies on multi-level caching for sequential processing, the GPU improves computational efficiency through multi-threaded parallel processing [44]. The GPU processes computation matrices in blocks, performing LU decomposition and matrix inversion simultaneously on each block. The specific computational process is as follows:
(1) Data is loaded from global memory to the shared memory of thread blocks.
(2) Data is loaded from shared memory to the registers of each thread.
(3) Each GPU core performs computations, storing results back in registers, then shared memory, and finally back in global memory.
The above calculation framework is shown in Figure 5.
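Steps (1)–(3) describe the global-memory to shared-memory to register data flow inside a thread block. The minimal CUDA kernel below is a stand-in written for illustration, not part of the solver (the actual LU and inversion work is done inside cuSOLVER); it mirrors that flow with a simple block-wise scaling operation.

#include <cuda_runtime.h>

// Minimal illustration of steps (1)-(3): global memory -> shared memory of a
// thread block -> per-thread registers -> result written back to global memory.
// The kernel only scales a vector block by block; it stands in for the per-block
// work (e.g., LU panels) that cuSOLVER performs internally.
__global__ void scaleBlocks(const double* in, double* out, double alpha, int nElem) {
    __shared__ double tile[256];                       // one tile per thread block
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < nElem) tile[threadIdx.x] = in[gid];      // (1) global -> shared
    __syncthreads();                                   // the whole block waits
    if (gid < nElem) {
        double r = tile[threadIdx.x];                  // (2) shared -> register
        out[gid] = alpha * r;                          // (3) compute, write back
    }
}
// Example launch (assuming d_in and d_out are device buffers of nElem doubles):
// scaleBlocks<<<(nElem + 255) / 256, 256>>>(d_in, d_out, 2.0, nElem);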

3. Model Calculation Results

3.1. Validation of Computational Model Stability

To validate the computational accuracy of the CPU and GPU models, the results under the same case were compared with the results from EPANET, specifically comparing the pipeline flow rates and node head distributions under the same simulation conditions. The simulation adopts the case study (Model-Large) from Robert’s paper, as shown in Figure 6a, which contains a total of 3557 nodes, 4021 pipelines, and 1 reservoir. Under the conditions of Robert’s paper, the steady-state hydraulic analysis results of the model are consistent with the literature, verifying the accuracy of the model’s construction [45].
To further validate the accuracy of the model construction under transient simulation, the base demand for all nodes was uniformly set to 1 L/s for ease of model configuration, and the demand factor profile was artificially defined, as shown in Figure 6b. A 12-h simulation was conducted. The initial head of the reservoir was set to 5000 m, a condition established to avoid negative pressure during the computation. Since the purpose of this paper is only to verify the model’s computational stability, the initial head setting was not based on actual conditions.
T = 1 h, 6 h, and 11 h were randomly selected as the time points for validating the model’s stability, and the Nash–Sutcliffe efficiency coefficient (NSE) was used to measure the pipe pressure calculation results of the EPANET software, the CPU model, and the GPU model at each time point, as shown in Formula (7).
$$ \mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i} - \bar{Q}_{obs}\right)^{2}} \quad (7) $$
In Formula (7), $Q_{obs,i}$ is the observed value (calculated by EPANET); $Q_{sim,i}$ is the simulated value (representing the CPU and GPU simulation results); $\bar{Q}_{obs}$ is the mean of the observed values; $n$ is the number of observations.
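For reference, Formula (7) can be evaluated in a few lines of C++/Armadillo; the helper below is an illustrative sketch, not code from the study.

#include <armadillo>

// Illustrative helper for Formula (7); the function name is ours.
double nashSutcliffe(const arma::vec& obs, const arma::vec& sim) {
    const double meanObs = arma::mean(obs);
    const double num = arma::accu(arma::square(obs - sim));       // sum of squared errors
    const double den = arma::accu(arma::square(obs - meanObs));   // spread of the observations
    return 1.0 - num / den;
}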
The results are shown in Table 4. They indicate that the accuracy of the CPU and GPU models remains above 98% at all three time points, demonstrating good overall accuracy. Moreover, the accuracy of the GPU calculations exceeds 99%, further indicating that the models constructed in this study have high computational accuracy and that the GPU computing method using the CUDA Toolkit 12.0 library provides better stability.

3.2. Validation of GPU Acceleration Performance

With the model accuracy meeting the requirements, the next step is to validate the acceleration performance of the GPU model. The 12 water supply network examples, Model-1 to Model-12 from Figure 3, were used to test the computational limits and calculation time under different architectures to verify the acceleration efficiency of the GPU model. The GPU used is an NVIDIA GeForce RTX 3070, and the CPU used is a 12th Gen Intel(R) Core(TM) i7-12700. The equipment used in this study was sourced from ASUS, headquartered in Taipei, Taiwan. The relevant device parameters are shown in Table 5.
The computational results of the model are shown in Figure 7. Regarding the maximum scale of the water distribution network that can be computed, as indicated by the dashed line in Figure 7a, the CPU-i7-12700 device can handle up to 6000 nodes in the network model due to its memory capacity. Beyond this scale, the computational efficiency of the model significantly decreases, and the computer may even freeze. In contrast, the GPU-RTX3070 can compute a network model with up to 8000 nodes, indicating that the GPU model can handle larger water distribution networks and is better suited for the computational demands of large urban water distribution systems.
In terms of computational efficiency, as shown in Figure 7b, the GPU demonstrates a clear advantage, especially when the network contains a large number of nodes, completing the computation approximately five times faster than the CPU. However, in smaller network structures, the CPU model outperforms the GPU model in computational efficiency, as shown in Figure 7a for networks with 100 and 200 nodes. This is because, at smaller computational scales, the GPU requires data transfer from the CPU and loading of the GPU computation program, which consumes a larger share of the computation time. As the number of network nodes increases, the proportion of time spent on data transfer and program loading decreases until it becomes negligible relative to the overall computation time. These results indicate that the CPU is better suited for small-scale water distribution network computations, while the GPU is more appropriate for larger-scale networks, making it particularly well-suited for the hydraulic calculations of large urban water distribution networks.

3.3. Comparison of GPU Acceleration Performance across Different Devices

Section 3.2 compared the computational efficiency of the CPU and GPU under the same computational scenarios. However, in practical engineering, there are also significant differences in performance between different GPU devices. To compare the computational efficiency of different GPU devices, the same network examples from Section 3.2 were used to obtain the model computation response times on various GPU devices. The detailed information and computational results for each GPU device are shown in Table 6.
In terms of model computation specifications, Figure 8a shows that all GPU models outperform the CPU in computational efficiency, and the better the GPU’s performance, the larger the water distribution network it can handle. The RTX4090 can compute networks with up to 10,000 nodes and has the potential to handle even more, the RTX3070 can manage up to 8000 nodes, and the GTX1080 and RTX2080 SUPER can each handle up to 6000 nodes.
In terms of computational efficiency, as shown in Figure 8a, for large-scale network cases, the better the GPU’s performance, the higher the computational efficiency. The RTX4090 far surpasses other GPU models, with a maximum efficiency improvement of over 13 times. However, as shown in Figure 8b, for smaller networks, the higher-performing RTX3070 actually has the worst computational efficiency, with computation times even longer than those of the CPU model, especially when the network has around 100 nodes. This discrepancy is due to differences in how different GPUs load programs, which leads to variations in data transfer and program loading times. However, these differences in overall computation time are relatively small, and as GPU performance continues to improve, these disparities are expected to diminish, with GPU efficiency significantly surpassing that of the CPU. In summary, the better the GPU’s performance, the higher its computational efficiency, with the high-end RTX4090 significantly outperforming the CPU even in small-scale networks, making it suitable for water distribution network computations of varying scales in large cities.
In terms of efficiency improvements across different network scales, as shown in Figure 8d, most GPUs begin to demonstrate their acceleration advantages when the number of network nodes reaches 500. The acceleration factors for the GPUs, ranked from lowest to highest performance, are approximately 1.50, 1.44, 1.36, and 2.83. As the number of nodes increases, computational efficiency gradually improves, but there is little increase before reaching 2000 nodes, where the acceleration factors remain between 1 and 2. At 2000 nodes, computational efficiency sees a significant boost, and the acceleration factors reach 3.50, 3.53, 4.40, and 9.58, with the RTX4090 showing the most notable improvement. However, as the number of nodes continues to increase, the acceleration factors slightly decrease for all GPUs except the RTX4090. After reaching 6000 nodes, computational efficiency begins to slightly improve again. Apart from the RTX4090, which consistently shows improved efficiency as the number of nodes increases, the other GPUs follow a pattern of slight improvement, rapid improvement, slight decrease, and then slight recovery in efficiency as the number of nodes increases. These results indicate that while there are performance differences between GPU models, most mid-range GPUs exhibit similar acceleration patterns. High-performance GPUs, however, deliver the best computational efficiency and acceleration, making them the most suitable for large-scale urban water distribution network computations.

4. Discussion

4.1. Analysis of GPU Model Computational Efficiency

A comparison of the computational stability and efficiency between CPU and GPU models reveals that GPUs are better suited for solving large-scale water distribution network problems. The better the GPU, the higher the computational efficiency. This finding validates that, with current technology, using GPU parallel computing to solve the inverse of large sparse matrices is effective. It provides an additional option for water distribution network solutions, reducing the complexity of developing custom solvers. As shown in Figure 8a, the comparison between GPU and CPU models across different network sizes indicates that as the number of nodes in the network increases, the CPU’s computation time grows exponentially, while the GPU’s computation time increases only slightly. This difference is attributed to the larger memory capacity of GPUs, which allows parallel computing techniques to alleviate the computational load from the increased data volume, preventing the computation time from increasing exponentially with the number of network nodes.

4.2. Analysis of Differences in GPU Computational Performance

Figure 8c shows that when the number of network nodes reaches 2000, 3000, 4000, and 5000, the computation times for the GPU-GTX1080 and GPU-RTX2080 SUPER are very close, unlike the noticeable differences observed between other GPU devices. This similarity suggests that the two GPUs deliver comparable performance for this task. A comparison of the parameters in Table 6, however, reveals significant differences in the computing architecture, transistor count, stream processor count, single-precision floating-point performance, and memory bandwidth between the two GPUs, with the GTX1080 being less powerful than the RTX2080 SUPER. Both GPUs have the same memory capacity, indicating that memory capacity is a critical factor limiting computational performance in large-scale matrix inversion. Therefore, when performing large-scale parallel computations, it is advisable to prioritize GPUs with higher memory capacity. However, for particularly large network scales, other GPU parameters should also be considered. For example, although the RTX3070 and RTX2080 SUPER have the same memory capacity, their computational efficiency in large network models differs significantly, which is closely related to other GPU parameters. This observation aligns with findings in other studies [41,46,47]. Therefore, when choosing a GPU for solving water distribution networks, it is recommended to first select a device with higher memory capacity. If memory capacities are the same, other key performance metrics should be considered to maximize GPU performance while minimizing costs.
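In practice, the memory-first selection rule suggested above can be automated by querying the visible devices through the CUDA runtime. The snippet below is an illustrative sketch (the function name is ours) that lists the available GPUs and selects the one with the largest global memory.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch: list the visible GPUs and select the one with the
// largest global memory, following the memory-first rule discussed above.
int pickLargestMemoryDevice() {
    int count = 0, best = 0;
    size_t bestMem = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        std::printf("Device %d: %s, %zu MB global memory\n",
                    d, prop.name, prop.totalGlobalMem / (1024 * 1024));
        if (prop.totalGlobalMem > bestMem) { bestMem = prop.totalGlobalMem; best = d; }
    }
    cudaSetDevice(best);   // subsequent cuSOLVER/cuBLAS calls run on this device
    return best;
}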

5. Conclusions

This study addresses the issue of slow matrix inversion in solving large-scale water distribution networks by using a CPU model based on the Armadillo library and a GPU model based on CUDA. Both models were applied to test cases with different network sizes to evaluate computational stability and efficiency, while also exploring the relationship between GPU device parameters and computational efficiency. The following conclusions were drawn:
1. Both the CPU and GPU models developed in this study can solve water distribution network equations with high computational accuracy. The GPU model demonstrates superior accuracy compared to the CPU model, making it better suited to meet the stability requirements of large-scale urban water distribution network computations.
2. The computational efficiency of the GPU model is generally higher than that of the CPU model. While the CPU model shows higher efficiency for smaller networks, the efficiency of the GPU model increases with network size, reaching up to a 13-fold improvement. Therefore, the GPU model is better equipped to meet the computational efficiency demands of large-scale urban water distribution networks.
3. The better the GPU’s performance, the higher the model’s computational efficiency. Additionally, computational efficiency is closely related to the GPU’s memory capacity, which is a critical factor affecting performance. When selecting a GPU device, priority should be given to those with larger memory capacity.
Based on these conclusions, a deeper understanding of large-scale urban water distribution network computations can be gained. By selecting appropriate GPU devices, computational stability and efficiency can be enhanced in large-scale urban network simulations. Compared to other numerical methods for water distribution networks, this approach offers simplicity and efficiency, aiding in the independent development of water distribution network models and their application in leak monitoring and other areas.

6. Future Prospects

While GPU acceleration technology enables efficient solving of water distribution network equations, there are still some limitations and areas for improvement in this research. Firstly, the solution method adopted in this study is relatively simple and has not yet incorporated more advanced computational techniques, meaning there is still room to improve precision and efficiency. Secondly, the study does not fully account for the impact of special components such as pump stations on the solution process, which could lead to certain limitations in practical applications. Additionally, compared to professional software such as EPANET, the method used in this study still lags in computational efficiency, primarily due to the lack of optimization in the model’s computational architecture. Future research could focus on improving the computational architecture and optimizing algorithms to enhance efficiency. Further, incorporating a broader range of GPU devices could allow for more comprehensive exploration and experimentation, ultimately improving computational performance and accuracy.

Author Contributions

R.Z.: conceptualization, experiment design and development, data collection, formal analysis, investigation, methodology, visualization, writing of the original draft, review, supervision, and editing. T.W., J.L. and M.I. contributed to the review and editing of the draft. J.H. supervised the entire research study and served as an internal reviewer for the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partly supported by the National Natural Science Foundation of China (No. 52079106), the Chinesisch-Deutsches Mobilitätsprogramm (No. M-0427), the Natural Science Foundation of Shaanxi Province (No. 2022JC-LHJJ-09), and the Key R&D Program of Shaanxi, China (No. 2023GXLH-042).

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors are thankful to and acknowledge the State Key Laboratory of Eco-hydraulics in the Northwest Arid Region of China, Xi’an University of Technology, Xi’an 710048, Shaanxi, China.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Wang, J.; Liu, G.H.; Wang, J.; Xu, X.; Shao, Y.; Zhang, Q.; Liu, Y.; Qi, L.; Wang, H. Current status, existent problems, and coping strategy of urban drainage pipeline network in China. Environ. Sci. Pollut. Res. Int. 2021, 28, 43035–43049. [Google Scholar] [CrossRef] [PubMed]
  2. Chi, X.; Zhenyang, P.; Hongya, Z.; Zijie, H. Study of Comprehensive Utilization of Water Resources of Urban Water Distribution Network. Water 2021, 13, 2791. [Google Scholar] [CrossRef]
  3. Cieślak, B.T.; Szpak, D.; Żywiec, J.; Rożnowski, M. The concept of estimating the risk of water losses in the water supply network. J. Environ. Manag. 2024, 359, 120965. [Google Scholar] [CrossRef] [PubMed]
  4. Carlos, J.-A.; Ivan, S. Pipe breaks and estimating the impact of pressure control in water supply networks. Reliab. Eng. Syst. Saf. 2021, 210, 107525. [Google Scholar]
  5. Junling, W.; Honghong, W.; Liantao, N. Hydraulic Simulation of Water Supply Network Leakage Based on EPANET. J. Pipeline Syst. Eng. Pract. 2024, 15, 05023006. [Google Scholar]
  6. Qian, J.; Du, Y.; Liang, F.; Yi, J.; Wang, N.; Tu, W.; Huang, S.; Pei, T.; Ma, T.; Burghardt, K.; et al. Evaluating resilience of urban lifelines against flooding in China using social media data. Int. J. Disaster Risk Reduct. 2024, 106, 104453. [Google Scholar] [CrossRef]
  7. Liu, H.; Savić, D.A.; Kapelan, Z.; Creaco, E.; Yuan, Y. Reliability Surrogate Measures for Water Distribution System Design: Comparative Analysis. J. Water Resour. Plan. Manag. 2016, 143, 04016072. [Google Scholar] [CrossRef]
  8. Wannapop, R.; Jearsiripongkul, T.; Jiamjiroch, K. Adaptive urban drinking water supply model using the effect of node elevation and head loss formula: A case study. Heliyon 2024, 10, e26181. [Google Scholar] [CrossRef]
  9. Avrin, A.-P.M.L. China’s Power Sector Decarbonization: Modeling Emission Reduction Potential, Technical Feasibility and Cost Efficiency of Inter-sectoral Approaches. Ph.D. Thesis, University of California, Berkeley, CA, USA, 2018. [Google Scholar]
  10. He, G.; Zhang, T.; Zheng, F.; Zhang, Q. An efficient multi-objective optimization method for water quality sensor placement within water distribution systems considering contamination probability variations. Water Res. 2018, 143, 165–175. [Google Scholar] [CrossRef]
  11. Avesani, D.; Righetti, M.; Righetti, D.; Bertola, P. The extension of EPANET source code to simulate unsteady flow in water distribution networks with variable head tanks. J. Hydroinform. 2012, 14, 960–973. [Google Scholar] [CrossRef]
  12. Todini, E. Extending the global gradient algorithm to unsteady flow extended period simulations of water distribution systems. J. Hydroinform. 2011, 13, 167–180. [Google Scholar] [CrossRef]
  13. Nikita, P.; Vishnu, P.; Ruchi, K. A new multi-objective evolutionary algorithm for the optimization of water distribution networks. Water Supply 2022, 22, 8972–8987. [Google Scholar]
  14. Dan, L.; Pei, M.; Shixuan, L.; Wei, L.; Danhui, F. Graph Convolutional Neural Network for Pressure Prediction in Water Distribution Network Sites. Water Resour. Manag. 2024, 38, 2581–2599. [Google Scholar]
  15. Avila-Melgar, E.Y.; Cruz-Chávez, M.A.; Martínez-Bahena, B.; Eraña-Díaz, M.L.; Cruz-Rosales, M.H. Parallel evolutionary algorithm for Water Distribution Network Design, using the Masters–Students model in distributed environment. Appl. Soft Comput. J. 2023, 135, 109986. [Google Scholar] [CrossRef]
  16. Ivetic, D.; Vasilic, Z.; Stanic, M.; Prodanovic, D. Speeding up the water distribution network design optimization using the Delta Q method. J. Hydroinform. 2016, 18, 33–48. [Google Scholar] [CrossRef]
  17. Vasilic, Z.; Stanic, M.; Kapelan, Z.; Ivetic, D.; Prodanovic, D. Improved Loop-Flow Method for Hydraulic Analysis of Water Distribution Systems. J. Water Resour. Plan. Manag. 2018, 144, 04018012. [Google Scholar] [CrossRef]
  18. Diao, K.; Wang, Z.; Burger, G.; Chen, C.-H.; Rauch, W.; Zhou, Y. Speedup of water distribution simulation by domain decomposition. Environ. Model. Softw. 2014, 52, 253–263. [Google Scholar] [CrossRef]
  19. Zecchin, A.C.; Thum, P.; Simpson, A.R.; Tischendorf, C. Steady-State Behavior of Large Water Distribution Systems: Algebraic Multigrid Method for the Fast Solution of the Linear Step. J. Water Resour. Plan. Manag. 2012, 138, 639–650. [Google Scholar] [CrossRef]
  20. Ali, S.; Azim, F.; Morteza, S.Z. A parallel computing architecture based on cellular automata for hydraulic analysis of water distribution networks. J. Parallel Distrib. Comput. 2023, 178, 11–28. [Google Scholar]
  21. Duan, H.-f.; Yu, G.-p. Spanning tree-based algorithm for hydraulic simulation of large-scale water supply networks. Water Sci. Eng. 2010, 3, 23–35. [Google Scholar]
  22. Crous, P.A.; Zyl, J.E.v.; Roodt, Y. The potential of graphical processing units to solve hydraulic network equations. J. Hydroinform. 2012, 14, 603–612. [Google Scholar] [CrossRef]
  23. Guidolin, M.; Kapelan, Z.; Savić, D. Using high performance techniques to accelerate demand-driven hydraulic solvers. J. Hydroinform. 2013, 15, 38–54. [Google Scholar] [CrossRef]
  24. Yang, Z.; Han, L.; Li, B.; Xie, J.; Han, P.; Liu, Y. Large-Scale Pipe Network Simulation Based on the “Songshan” Supercomputer System. Comput. Eng. 2022, 48, 155–161. [Google Scholar]
  25. Carlo, C.; Enrico, C. Comparison of Pressure-Driven Formulations for WDN Simulation. Water 2018, 10, 523. [Google Scholar] [CrossRef]
  26. Deuerlein, J.W.; Piller, O.; Elhay, S.; Simpson, A.R. Content-Based Active-Set Method for the Pressure-Dependent Model of Water Distribution Systems. J. Water Resour. Plan. Manag. 2019, 145, 04018082. [Google Scholar] [CrossRef]
  27. Sherri, F.; Mahvi, A.H.; Eshlaghy, A.T.; Hassani, A.H. A new approach in simultaneous calibration of Hazen-Williams coefficients and demand of nodes in of water distribution systems. Desalination Water Treat. 2017, 74, 137–148. [Google Scholar] [CrossRef]
  28. Caroline, B.; Filippo, P.; Ivan, S. Prior Assumptions for Leak Localisation in Water Distribution Networks with Uncertainties. Water Resour. Manag. 2021, 35, 5105–5118. [Google Scholar]
  29. Menapace, A.; Avesani, D. Global Gradient Algorithm Extension to Distributed Pressure Driven Pipe Demand Model. Water Resour. Manag. 2019, 33, 1717–1736. [Google Scholar] [CrossRef]
  30. Giustolisi, O.; Moosavian, N. Testing linear solvers for global gradient algorithm. J. Hydroinform. 2014, 16, 1178–1193. [Google Scholar] [CrossRef]
  31. Todini, E.; Santopietro, S.; Gargano, R.; Rossman, L.A. Pressure Flow–Based Algorithms for Pressure-Driven Analysis of Water Distribution Networks. J. Water Resour. Plan. Manag. 2021, 147, 04021048. [Google Scholar] [CrossRef]
  32. Sela, L.; Salomons, E.; Housh, M. Plugin prototyping for the EPANET software. Environ. Model. Softw. 2019, 119, 49–56. [Google Scholar] [CrossRef]
  33. Iglesias-Rey, P.L.; Martínez-Solano, F.J.; Ribelles-Aquilar, J.V. Extending EPANET Capabilities with Add-In Tools. Procedia Eng. 2017, 186, 626–634. [Google Scholar] [CrossRef]
  34. Arandia, E.; Eck, B.J. An R package for EPANET simulations. Environ. Model. Softw. 2018, 107, 59–63. [Google Scholar] [CrossRef]
  35. Yucong, L.; Simiao, J.; Lek-Heng, L. LU decomposition and Toeplitz decomposition of a neural network. Appl. Comput. Harmon. Anal. 2024, 68, 101601. [Google Scholar]
  36. Al-Kurdi, A.; Kincaid, D.R. LU-decomposition with iterative refinement for solving sparse linear systems. J. Comput. Appl. Math. 2005, 185, 391–403. [Google Scholar] [CrossRef]
  37. Ozcan, C.; Sen, B. Investigation of the performance of LU decomposition method using CUDA. Procedia Technol. 2012, 1, 50–54. [Google Scholar] [CrossRef]
  38. Jiang, Z.; Jiazhi, J.; Jiangsu, D.; Dan, H.; Yutong, L. Optimizing massively parallel sparse matrix computing on ARM many-core processor. Parallel Comput. 2023, 117, 103035. [Google Scholar]
  39. Sanderson, C.; Curtin, R.R. Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 2016, 1, 26. [Google Scholar] [CrossRef]
  40. Sanderson, C.; Curtin, R. Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation. Math. Comput. Appl. 2019, 24, 70. [Google Scholar] [CrossRef]
  41. Yoshida, K.; Miwa, S.; Yamaki, H.; Honda, H. Analyzing the impact of CUDA versions on GPU applications. Parallel Comput. 2024, 120, 103081. [Google Scholar] [CrossRef]
  42. Sharma, G.; Agarwala, A.; Bhattacharya, B. A fast parallel Gauss Jordan algorithm for matrix inversion using CUDA. Comput. Struct. 2013, 128, 31–37. [Google Scholar] [CrossRef]
  43. Pieter, G.; Ryan, S. High performance sparse multifrontal solvers on modern GPUs. Parallel Comput. 2022, 110, 102897. [Google Scholar]
  44. GPU Programming in MATLAB; Elsevier Inc.: Amsterdam, The Netherlands, 2016.
  45. Robert, S.; Mohsen, H.; Sina, H.; Kegong, D. Dual graph characteristics of water distribution networks-how optimal are design solutions? Complex Intell. Syst. 2023, 9, 147–160. [Google Scholar]
  46. Garland, M.; Kirk, D.B. Understanding throughput-oriented architectures. Commun. ACM 2010, 53, 58–66. [Google Scholar] [CrossRef]
  47. Ben-Nun, T.; Hoefler, T. Demystifying Parallel and Distributed Deep Learning. ACM Comput. Surv. (CSUR) 2019, 52, 65. [Google Scholar] [CrossRef]
Figure 1. Topological representation of the pipeline network structure.
Figure 2. CPU computational framework for water supply network.
Figure 3. Computational model examples.
Figure 4. Proportion of computation time for the models.
Figure 5. GPU computational framework for water supply network.
Figure 6. Large computational model.
Figure 7. Comparison of computational performance between GPU and CPU models across different examples.
Figure 8. Comparison of computational performance across different scenarios under various GPU devices.
Table 1. Sample code for matrix inversion and multiplication using the Armadillo library.
#include <iostream>
#include <armadillo>

int main() {
    const double s = 0.5;     // example value of s; in practice computed from Equation (3)
    const double n = 1.852;   // flow exponent n
    // Q represents the pipe flow rates, shown here as constants for demonstration
    arma::vec Q = {10.0, 20.0, 30.0, 40.0};
    // Create the diagonal matrix A11 with entries s*|Q|^(n-1)
    arma::mat A11 = arma::diagmat(arma::abs(s * arma::pow(Q, n - 1)));
    // Compute and print the inverse
    arma::mat A11_inv = arma::inv(A11);
    A11_inv.print("Inverse of A11:");
    return 0;
}
Table 2. Model parameter settings.
Length/m (Pipe) | Diameter/mm (Pipe) | Roughness (Pipe) | Base Demand/(L/s) (Node) | Total Head/m (Reservoir)
100 | 200 | 130 | 1 | 189,000
Table 3. Sample code for matrix inversion using the CUDA library.
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <cublas_v2.h>
int main() {
    // Assuming A11 (N x N) has been computed on the CPU as h_A11, and that the device
    // buffers d_A11, d_I (identity matrix), d_ipiv and d_info have been allocated
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);
    // Copy A11 from CPU to GPU
    cudaMemcpy(d_A11, h_A11, N * N * sizeof(double), cudaMemcpyHostToDevice);
    // Allocate the workspace required by the LU factorization
    int lwork = 0;
    cusolverDnDgetrf_bufferSize(handle, N, N, d_A11, N, &lwork);
    double* d_work;
    cudaMalloc(&d_work, lwork * sizeof(double));
    // Perform LU decomposition of A11 in place
    cusolverDnDgetrf(handle, N, N, d_A11, N, d_work, d_ipiv, d_info);
    // Solve A11 * X = I using the LU factors; on exit d_I holds the inverse of A11
    cusolverDnDgetrs(handle, CUBLAS_OP_N, N, N, d_A11, N, d_ipiv, d_I, N, d_info);
    // Free resources
    cudaFree(d_work);
    cusolverDnDestroy(handle);
    return 0;
}
Table 4. NSE of the models’ calculation results at different time points.
Model | T = 1 h | T = 6 h | T = 11 h
GPU | 98.45% | 99.61% | 99.61%
CPU | 98.45% | 98.45% | 98.45%
Table 5. Device types and key computational parameters.
Device Type | Computational Architecture | Number of Transistors (Hundred Million) | Number of Stream Processors | Memory Capacity (MB) | Single-Precision Floating-Point (TFLOPS) | Memory Bandwidth (GB/s)
GPU-RTX3070 | Ampere | 714 | 5888 | 8192 | 20.37 | 448

Device Type | Number of Cores/Threads | Base Frequency (GHz) | Maximum Turbo Frequency (GHz) | Level 3 Cache (MB) | Integrated Graphics | TDP (W)
CPU-i7-12700 | 12 (8P + 4E)/20 | 2.1 | 4.9 | 25 | Intel UHD Graphics 770 | 65
Table 6. GPU device types and key computational parameters.
Device Type | Computational Architecture | Number of Transistors (Hundred Million) | Number of Stream Processors | Memory Capacity (MB) | Single-Precision Floating-Point (TFLOPS) | Memory Bandwidth (GB/s)
NVIDIA GeForce GPU-GTX1080 | Pascal | 72 | 2560 | 8192 | 8.89 | 320
NVIDIA GeForce GPU-RTX2080 SUPER | Turing | 136 | 3072 | 8192 | 11.15 | 496
NVIDIA GeForce GPU-RTX3070 | Ampere | 714 | 5888 | 8192 | 20.37 | 448
NVIDIA GeForce GPU-RTX4090 | Ada Lovelace | 762 | 16,384 | 24,576 | 83 | 1008

