Article

Fine-Grained Leakage Detection for Water Supply Pipelines Based on CNN and Selective State-Space Models

by Niannian Wang 1, Weiyi Du 1,*, Hongjin Liu 1, Kuankuan Zhang 1, Yongbin Li 2, Yanquan He 2 and Zejun Han 3
1 Yellow River Laboratory, Zhengzhou University, Zhengzhou 450001, China
2 Guangxi New Development Transportation Group Co., Ltd., Nanning 530029, China
3 College of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
Water 2025, 17(8), 1115; https://doi.org/10.3390/w17081115
Submission received: 20 February 2025 / Revised: 28 March 2025 / Accepted: 7 April 2025 / Published: 9 April 2025
(This article belongs to the Section Urban Water Management)

Abstract

The water supply pipeline system is responsible for providing clean drinking water to residents, but pipeline leaks can lead to water resource wastage, increased operational costs, and safety hazards. To effectively detect the leakage level in water supply pipelines and address the difficulty of accurately distinguishing fine-grained leakage levels using traditional methods, this paper proposes a fine-grained leakage identification method based on Convolutional Neural Networks (CNN) and the Selective State Space Model (Mamba). An experimental platform was built to simulate different leakage conditions, and multi-axis sensors were used to collect data, resulting in a high-quality dataset. The signals were converted into frequency-domain images using the Short-Time Fourier Transform (STFT), and a CNN was employed to extract image features. Mamba was integrated to capture the one-dimensional temporal dynamics of the leakage signal, and the CosFace loss function was introduced to increase the inter-class distance, thereby improving the fine-grained classification ability. Experimental results show that the proposed method achieves the best performance across the evaluation metrics considered. Compared to the Support Vector Machine (SVM), Backpropagation neural network (BP), attention mechanism with the LSTM network (LSTM-AM), CNN, and inverted transformers network (iTransformer) methods, the accuracy improved by 17.9%, 15.9%, 7.8%, 3.0%, and 2.3%, respectively. Additionally, the method enhanced intra-class consistency and increased inter-class differences, performing well at different leakage levels, which could contribute to improved intelligent management for water pipeline leakage detection.

1. Introduction

Urban water supply pipeline systems, as a critical component of modern urban infrastructure, are tasked with providing clean drinking water to residents. However, due to factors such as pipe aging, poor construction quality, and geological changes, water supply pipelines frequently experience leaks, leading to water resource waste, increased operational costs, and potential safety risks [1,2,3]. The economic losses caused by leaks amount to billions of yuan annually. In some developing countries, non-revenue water, primarily due to pipeline leaks, accounts for up to 40% of the total water supply [4].
To effectively detect leak points in water supply pipelines, researchers have proposed various methods. Traditional acoustic detection methods determine leak locations by listening for abnormal sounds within the pipes, but their accuracy is limited by external noise [5,6,7]. Transient Test-Based Techniques (TTBT) detect and localize leaks by analyzing the propagation and reflection characteristics of pressure waves in pipeline systems [8]. Additionally, flow analysis methods identify leaks by comparing flow data at different times. With advancements in sensor technology and data analysis, more advanced leak detection techniques, such as thermal imaging, fiber-optic sensing, and real-time monitoring systems integrated with the Internet of Things (IoT), have emerged [9,10]. However, variable pipe materials, complex environmental factors, and high costs still pose challenges for these technologies.
In recent years, deep learning has achieved significant success in areas such as image processing and speech recognition. Convolutional Neural Networks (CNNs) can automatically extract features and perform classification, offering a new approach to solving complex pattern recognition problems [11,12,13]. Candelieri et al. proposed a spectral clustering and SVM-based method for accurate water pipeline leak localization [14]. Shen and Cheng developed an intelligent identification method for leak detection in water distribution systems (WDS) based on machine learning (ML) models such as decision trees and random forests [15]. Shukla and Piratla proposed an end-to-end learning method for simultaneous feature extraction and classification, and found that it has better detection performance when combined with support vector machines [16]. Yungyeong Shin integrated an LSTM autoencoder with ensemble machine learning methods for detecting the time-series noise signals of sewer pipeline leaks [17]. Ahmad S. et al. conducted experimental research using evolutionary neural networks, employing acoustic imaging and CNN-based prediction for water pipeline leak detection [18]. Boujelben M. et al. discovered that embedding preprocessing tasks directly on the device can avoid transmitting large volumes of acoustic signals, thereby conserving node energy through the use of lightweight neural networks [19]. However, these algorithms did not identify or classify the degree of leakage in water supply pipelines. The fine-grained classification method adopted in this paper extends existing CNN-based classification by improving the model's ability to distinguish subtle differences, especially in complex environments with multiple noise sources or overlapping signals. Increasing the class spacing helps to reduce the risk of misclassification, so that different degrees of pipeline leakage can be distinguished more accurately.
Since its introduction, the Transformer attention mechanism has been successfully applied in various fields [20,21]. Wang C. et al. proposed a two-stage acoustic signal-based pipeline leak monitoring framework that transforms input signals into Mel-spectrogram features as network inputs and incorporates spatial group enhancement at the spatial scale, demonstrating significant improvements over conventional neural network algorithms [22]. However, when the sequence is very long, capturing long-term dependencies becomes more difficult due to increased computational complexity, especially in noisy environments where the model is prone to losing sensitivity to long-term dependencies [23]. Selective State-Space Models (SSMs), such as Mamba, have been widely applied in recent years as linear-complexity sequence models, demonstrating significant advantages in capturing the dynamic characteristics of long-sequence systems [24].
This article uses the Mamba algorithm to efficiently model long-term dynamic dependencies in time series through a state space model. The dynamic system of its internal state space enables the model to capture signal features over a long time span, without being limited by computational bottlenecks like Transformers. In addition, Mamba selectively models by focusing only on key information in the sequence, thus better extracting and modeling the long-term variation patterns of leakage signals.
Accurately identifying the degree of pipeline leakage is crucial for improving safety. Traditional methods based on experimental mechanisms struggle to extract effective features and to cope with complex operating conditions. Machine learning and convolutional deep learning methods are limited in their ability to decouple complex variables and to exploit global information. Attention-based methods incur rising computational cost on long input sequences, which makes real-time identification of the leakage degree difficult. Research on detecting leakage levels in water supply pipelines is also scarce, both domestically and internationally. Combining CNN with Mamba pairs local feature extraction with dynamic modeling of long time series, which overcomes the limitations of traditional convolutional networks in modeling long-sequence dependencies and avoids the high computational complexity of Transformers, with the aim of higher efficiency and accuracy. A sound-signal pipeline leak detection model based on CNN and Mamba, combined with a fine-grained classification method that increases class spacing, is therefore intended to address three problems in existing algorithms: the computational cost of processing long sequences, the difficulty of distinguishing subtle leak types, and limited robustness in complex noise environments.
Therefore, this article first builds an experimental platform to simulate the working conditions of different leakage levels in actual production and uses multiple sensors to collect high-quality data. Next, we construct a convolutional neural network (CNN) and a selective state space model (Mamba), as well as establish a fine-grained classification method based on increasing class spacing to improve the ability to distinguish between different leakage levels. Finally, by comparing different models, visualizing the results, and analyzing different module components, the effectiveness of our method is comprehensively evaluated. In addition, the performance of the model under different noises is evaluated, comprehensively verifying the excellent generalization performance of the model.

2. Experimental Setup and Data Collection

2.1. Water Supply Pipeline Leakage Test Bench

In this study, field experiments were conducted based on actual leakage conditions in a water supply pipeline network to investigate the signal characteristics of different leakage sizes and to detect and identify leakage levels in the pipeline. As illustrated in Figure 1, a water supply pipeline leakage test bench was designed and constructed. The experimental platform comprises key modules, including a water pump, water supply pipelines, a water tank, and a computer terminal. Controllable leakage points were installed at various positions along the water supply pipeline. By adjusting valves, the actual leakage in the pipeline network was regulated to simulate different degrees of leakage that may occur in real-world scenarios.
The experimental platform uses a steel water supply pipe with a diameter of 50 mm and simulates different degrees of leakage through controllable valves. The opening of each valve can be adjusted to simulate the leakage size of different leakage sources. The maximum measurement range of the pipeline system is 159 m, and the spacing between valves is 27 m. With sensors installed on both sides of the pipeline, each collection point records 6 signals related to the operating condition. This design allows different leakage sources and degrees to be simulated, enabling the study of signal characteristics under different leakage scenarios.
The data acquisition system is an important component of the experimental platform, responsible for the real-time collection of signals from various sensors in the experiment. The main collection equipment includes three-axis acceleration sensors, signal acquisition devices, and computer terminals.
The three-axis acceleration sensor installed on the outer surface of the water supply pipeline has a frequency response range of 0–5 kHz and a sensitivity of 50 mV/(m·s⁻²), which can accurately capture small vibration changes. The resolution is 0.00004 g, ensuring the collection of subtle signal changes. The leakage signal of steel pipes is concentrated mainly between 1 kHz and 2 kHz. To preserve signal integrity and capture these high-frequency components, the sampling frequency was set to 5 kHz. The corresponding Nyquist frequency of 2.5 kHz lies above the 1–2 kHz leakage band, so the setting satisfies the Nyquist theorem for the signals of interest. Compared to single-axis acceleration sensors, three-axis acceleration sensors can more comprehensively capture the longitudinal, axial, and transverse vibration signals of pipelines, providing richer features. In addition, the experimental platform is equipped with pressure sensors and data acquisition equipment. The acquisition instrument transmits the signals collected by the sensors in real time to the computer terminal for signal storage, visualization analysis, and post-processing. Nearly 3000 sets of data were collected by varying the water supply pressure, leakage degree, distance between sensors, and location of the leakage source.

2.2. Data Acquisition and Processing

As shown in Figure 2, the experimental system is capable of simulating various levels of water supply pipeline leakage conditions, including no leakage, 1/8 leakage, 1/4 leakage, 1/2 leakage, and 3/4 leakage. In this experiment, signals were collected from the water supply pipeline using an accelerometer with a sampling frequency of 5 kHz. To enhance the generalization of the results, vibration signals were acquired under different working conditions, such as varying water pressure, sensor spacing, and leakage degrees.
While collecting sensor signals, abnormal situations such as data loss caused by sensor failures are inevitable. These abnormal data points can degrade model training, so outliers were handled using methods such as elimination and interpolation. To further enrich the diversity of the dataset and improve the robustness of the model, the data samples were augmented. A time shift strategy simulates signals that begin at different starting times, providing the model with more diverse learning samples. In addition, adding an appropriate amount of Gaussian noise to the original signal makes the data more representative of real conditions and improves the model's resistance to external interference.
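The two augmentations described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the circular form of the time shift, the function names, and the noise level `sigma` are assumptions, since the text does not specify them.

```python
import random

def time_shift(signal, shift):
    """Circularly shift a 1-D signal by `shift` samples.

    A circular shift is an assumed form of the time shift strategy;
    the source does not state how the shifted samples are filled.
    """
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return list(signal[-shift:]) + list(signal[:-shift])

def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (hypothetical value)."""
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Example: one augmented copy of a short toy signal.
rng = random.Random(0)
augmented = add_gaussian_noise(time_shift([1.0, 2.0, 3.0, 4.0, 5.0], 2), 0.01, rng)
```

In practice the shift and noise level would be drawn at random for each training sample so that the model sees a different variant in every pass.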
To evaluate the generalization ability of the model, we divided the dataset into a training set, a validation set, and a test set in a ratio of 6:2:2. The validation set is used to monitor overfitting during training, and the held-out test set measures performance on unseen data.
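A 6:2:2 split of sample indices might look like the sketch below. The shuffling step, the seed, and the function name are assumptions for illustration; the text only states the ratio.

```python
import random

def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and split them into train/val/test parts.

    `ratios` are integer weights (6:2:2 as in the text); the seed and the
    use of a random shuffle are illustrative assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(3000)
```

If augmented or overlapping slices are derived from the same raw recording, splitting at the recording level before slicing avoids near-duplicate samples appearing in both the training and test sets.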
As shown in Figure 3, the data processing strategy for the training set is presented. To enhance the diversity of the training samples, slice partitioning is applied to the signals collected from each channel. During the sampling process, a sampling method with overlapping regions is adopted to further enrich the diversity of the samples. Subsequently, frequency domain transformation is performed on each slice sample, and the frequency domain image is ultimately combined with the original time domain signal as the input of the model.
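The overlapping slice partitioning described above can be sketched as a sliding window. The window length and hop used here are hypothetical; the text does not give the values used in the experiments.

```python
def slice_with_overlap(signal, window, hop):
    """Cut a 1-D signal into fixed-length slices.

    Adjacent slices overlap by `window - hop` samples when hop < window.
    Trailing samples that do not fill a whole window are dropped.
    """
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, hop)]

# Example: a 10-sample signal, 4-sample windows, 50% overlap.
slices = slice_with_overlap(list(range(10)), window=4, hop=2)
```

A smaller hop yields more slices per recording at the cost of higher correlation between neighboring samples.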

3. Methodology

3.1. Overall Model Architecture

Due to the complex and ever-changing environment in which water supply pipelines are located, detection signals are susceptible to various types of noise interference. Leakage signals are usually weak and have nonlinear characteristics, making it difficult for traditional methods to accurately capture them. In addition, the processing of multi-channel acceleration sensor signals increases the difficulty of leak identification. The demand for real-time monitoring and rapid response also places higher demands on the stability and reliability of the model. In order to effectively integrate multi-channel vibration signals and improve the accuracy of water supply pipeline leakage identification, this paper proposes a fine-grained identification method for water supply pipeline leakage based on a convolutional neural network (CNN) and selective state space (Mamba).
Figure 4 shows the overall framework of the model. The framework consists of three main parts: the frequency domain image encoder module, the time domain encoder module, and the fine-grained classification module. First, a Short-Time Fourier Transform (STFT) was performed on each channel of the multi-channel signal to generate a time-frequency spectrum. Then, the time-frequency spectra of the acceleration sensor channels were concatenated as the input to the frequency domain image encoder module. The image encoder adopted a Convolutional Neural Network (CNN), which can effectively extract features from images. The input of the time domain encoder module was the original multi-channel one-dimensional signal, whose long-sequence features were extracted through a selective state space Mamba network. To further improve performance, the features extracted by the frequency domain image encoder module were concatenated and fused with the features extracted by the time domain encoder module. Because the differences between signal categories with different degrees of leakage are small, fine-grained recognition is difficult. Therefore, the fused features were fed into the fine-grained classification module to increase the inter-class gap and reduce intra-class spacing, achieving accurate recognition of different leakage levels.
As shown in Figure 5, the process of leak detection in water supply pipelines starts with data preprocessing, including data sampling and conversion to prepare the dataset. The dataset was then divided into training, validation, and test sets in a ratio of 6:2:2, so that overfitting can be monitored during training and performance can be measured on unseen data. Next, we constructed a hybrid network that integrated CNN and Mamba to capture frequency domain and time domain features. Subsequently, the model was trained using the training set and periodically checked to determine whether the specified number of iterations had been reached or the model had converged. Once the training criteria were met, the model weights were saved. The final step was to evaluate the model on the test set to assess how accurately it detects leaks in the water supply pipeline.

3.2. A Hybrid Model Based on CNN and Mamba

The time-domain signals for leak detection in water supply pipelines have problems such as long sequences and difficulties in fusing multiple axis sensors. As shown in Figure 6, this paper extracted the features of frequency domain signals through a CNN network, and further combined it with the Mamba network to extract the features of one-dimensional long sequence signals in the time domain, thereby enhancing the feature extraction ability of the model.
The time-frequency spectrum can capture the subtle changes in the frequency domain of water supply pipeline leakage signals. In order to extract useful features from images and apply them to the recognition of the leakage degree, a convolutional neural network module based on MobileNet-V3 was established, which can effectively extract features from time-frequency spectrum images and improve the accuracy of water supply pipeline leakage degree recognition [25]. MobileNet-V3 is an efficient lightweight network architecture built from depthwise separable convolutions, a channel attention mechanism, and residual connections, achieving a balance between accuracy and efficiency.
Depthwise separable convolution decomposes the standard convolution into two parts, a depthwise convolution and a pointwise convolution, preserving representational power while significantly reducing the computation and the number of parameters. The standard convolution is:
$$Y = X * K$$
where $X$ is the input feature map and $K$ is the convolution kernel. The depthwise separable decomposition is:
$$Y = (X * D) * P$$
where $D$ is the depthwise convolution kernel, applied to each channel separately, and $P$ is the pointwise ($1 \times 1$) convolution kernel.
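To make the parameter saving concrete, the sketch below counts weights for a standard convolution and for its separable decomposition, ignoring biases. The layer sizes in the example are hypothetical and are not taken from Table 1.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 16 input channels, 32 output channels.
full = standard_conv_params(3, 16, 32)       # 4608
reduced = separable_conv_params(3, 16, 32)   # 656
ratio = reduced / full                       # equals 1/c_out + 1/k**2
```

The ratio 1/c_out + 1/k² shows that the saving grows with the number of output channels and the kernel size.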
Residual connections mitigate the vanishing gradient problem in deep networks through skip connections, improving the training stability and performance of the model. The channel attention mechanism introduces adaptive channel weights that let the network focus on important feature channels, thereby enhancing its expressive power. The mathematical expression is as follows:
$$S = F_{avg}(X)$$
$$Z = \sigma(F_{FC}(S))$$
$$X_{out} = X \cdot Z$$
where $F_{avg}$ represents global average pooling, $F_{FC}$ represents the fully connected layer, $\sigma$ represents the activation function, $Z$ represents the channel weights, and $X_{out}$ represents the output feature map, obtained by scaling each channel of $X$ by its weight.
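The three steps of the channel attention mechanism (pool, weight, rescale) can be sketched in plain Python. This is a simplified illustration: `fc` is a hypothetical callable standing in for the fully connected layer, and a plain sigmoid is assumed for the activation, which may differ from the gating function used in the actual MobileNet-V3 blocks.

```python
import math

def channel_attention(x, fc):
    """Apply channel attention to a feature map.

    x  : list of C channels, each a flat list of spatial activations.
    fc : callable mapping a length-C list to a length-C list
         (hypothetical stand-in for the fully connected layer).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    s = [sum(channel) / len(channel) for channel in x]
    # Excite: fully connected layer followed by a sigmoid gives weights in (0, 1).
    z = [1.0 / (1.0 + math.exp(-v)) for v in fc(s)]
    # Rescale: multiply every activation in a channel by that channel's weight.
    return [[v * weight for v in channel] for channel, weight in zip(x, z)]

# Example with an identity "layer" in place of learned weights.
out = channel_attention([[1.0, 1.0], [0.0, 0.0]], fc=lambda s: s)
```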
In Table 1, the detailed structural parameters of the convolutional neural network MobileNet-V3 are presented. Here, SE indicates whether channel attention is present, and NBN indicates no batch normalization. In the nonlinearity category, HS stands for the h-swish activation function, while RE represents the ReLU activation function.
To enhance the feature extraction capability for multi-dimensional signals, the Mamba module is introduced alongside CNN feature extraction. Mamba is a state-space-based model that performs well in long sequence modeling tasks and has linear computational complexity. By incorporating a global receptive field and a dynamic weighting mechanism, Mamba overcomes the limitations of traditional Recurrent Neural Networks (RNNs) in long sequence modeling, while retaining advantages of the Transformer attention mechanism. Unlike the Transformer, whose computational cost grows quadratically with sequence length, Mamba scales linearly, allowing it to handle long time-series feature data. This is crucial for complex time-domain feature extraction in water pipeline leakage detection.
Mamba integrates multi-level operations, including linear projection, convolution, an activation function, the Selective State Space Model (SSM), and residual connections. The design of the SSM allows the model to selectively update state variables over long time series, thereby reducing computational complexity and improving the accuracy of information capture. The SSM layer and an MLP are then combined to perform temporal feature mixing and channel mixing. Residual connections ensure the effective transfer of features between layers. The input to the Mamba layer is a tensor of dimensions batch_size by sequence_length by feature_dim, which is processed by the Mamba module. At the output, a normalization layer and a Feed-Forward Network (FFN) layer normalize and map the output representation of the Mamba module. Combined with residual connections, this enhances the convergence and stability of the model. Mamba is based on a state space model derived from a continuous system, which maps a univariate input sequence to an output sequence through hidden intermediate states. The system is defined as:
$$h'(t) = A h(t) + B x(t)$$
$$y(t) = C h(t)$$
where $A \in \mathbb{R}^{N \times N}$ is the state transition matrix; $B \in \mathbb{R}^{N \times 1}$ is the weight matrix mapping the input to the hidden state; and $C \in \mathbb{R}^{1 \times N}$ is the observation matrix that maps the hidden state to the output.
Mamba integrates continuous systems into deep learning architectures by using fixed discretization rules to convert parameters A and B into discrete counterparts. Mamba effectively enhances the ability to capture and model long-term sequence features in the leak detection of water supply pipelines.
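The discretization step can be illustrated with a small recurrence. The sketch below assumes zero-order-hold discretization and a diagonal state matrix, both common choices for this family of models; the text does not name the rule used. It also keeps the step size and the B and C parameters fixed, whereas a selective model makes them depend on the input, so this shows only the underlying linear scan.

```python
import math

def ssm_scan(x, a, b, c, delta):
    """Run a discretized diagonal state space model over a scalar input sequence.

    a, b, c : length-N lists (diagonal of A, entries of B, entries of C); a[i] != 0.
    delta   : step size (fixed here; input-dependent in a selective model).
    Zero-order hold (assumed rule): a_bar = exp(delta * a),
                                    b_bar = (a_bar - 1) / a * b.
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    h = [0.0] * len(a)
    outputs = []
    for xk in x:
        # h_k = a_bar * h_{k-1} + b_bar * x_k, then y_k = C h_k
        h = [abi * hi + bbi * xk for abi, hi, bbi in zip(a_bar, h, b_bar)]
        outputs.append(sum(ci * hi for ci, hi in zip(c, h)))
    return outputs

# Step response of h'(t) = -h(t) + x(t): the state follows 1 - exp(-t).
ys = ssm_scan([1.0] * 10, a=[-1.0], b=[1.0], c=[1.0], delta=0.1)
```

Because each step costs a fixed amount of work, the total cost is linear in the sequence length, which is the property the text contrasts with quadratic attention.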
To better extract the time-varying frequency characteristics from the signal, it is necessary to convert the signal from the one-dimensional time domain to the two-dimensional time-frequency domain. The Short-Time Fourier Transform (STFT) generates the time-frequency spectrogram of a signal: it divides a long signal into shorter time segments and performs a Fourier transform on each segment, thereby representing the signal's distribution in both time and frequency. For a given discrete-time signal $x[n]$, the STFT can be expressed as:
$$X(m, k) = \sum_{n=0}^{N-1} x[n + mR] \, w[n] \, e^{-j 2 \pi k n / N}$$
where $w[n]$ is the window function; $m$ is the time index; $k$ is the frequency index; $N$ is the length of the window function; $R$ is the hop size, that is, the number of samples between the starts of adjacent windows (adjacent windows overlap by $N - R$ samples); and $X(m, k)$ is the result of the STFT.
As shown in Figure 7, a two-dimensional time-frequency spectrum was established based on the STFT. The horizontal axis represents time, the vertical axis represents frequency, and the color values represent power spectral density. Because the signal strength varies over a wide range, the power is expressed in decibels to compress the dynamic range. The conversion is as follows:
$$S_{dB}(m, k) = 10 \log_{10} \left( |X(m, k)|^2 \right)$$
where $|X(m, k)|^2$ is the squared modulus of the STFT result, which is the power spectral density.
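The STFT and decibel conversion can be sketched directly from their definitions. This is a slow, illustrative version using only the standard library; a real pipeline would use an FFT routine. The Hann window, the window length, the hop size, and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions, not values from the paper.

```python
import cmath
import math

def stft(x, window, hop):
    """Short-time Fourier transform by direct summation.

    Returns a list of frames; each frame holds the non-negative
    frequency bins k = 0 .. N/2 of one windowed segment.
    `hop` is the number of samples between the starts of adjacent windows.
    """
    n_win = len(window)
    n_frames = (len(x) - n_win) // hop + 1
    frames = []
    for m in range(n_frames):
        segment = [x[n + m * hop] * window[n] for n in range(n_win)]
        frames.append([
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / n_win)
                for n in range(n_win))
            for k in range(n_win // 2 + 1)
        ])
    return frames

def power_db(frames, eps=1e-12):
    """Convert STFT values to decibels: 10 * log10(|X|^2 + eps)."""
    return [[10.0 * math.log10(abs(v) ** 2 + eps) for v in frame] for frame in frames]

# Example: a 1250 Hz tone sampled at 5 kHz falls in bin 1250 / 5000 * 64 = 16.
fs, n_win = 5000, 64
tone = [math.sin(2 * math.pi * 1250 * n / fs) for n in range(256)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_win) for n in range(n_win)]
spectrogram_db = power_db(stft(tone, hann, hop=32))
```

Each row of `spectrogram_db` is one time slice of the image; stacking the rows for every sensor channel gives the kind of input the image encoder consumes.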

3.3. Fine-Grained Leakage Degree Recognition Method Based on Increasing Class Spacing

SoftMax loss, commonly referred to as cross-entropy loss in the classification context, is widely used for leak detection in water supply pipelines. It first applies the SoftMax function to the raw outputs (logits) of the network, converting them into probabilities. The SoftMax function is defined as:
$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$
where $z_i$ is the unnormalized logit of class $i$, and $C$ is the total number of classes. After obtaining the probability distribution, the cross-entropy loss is calculated as follows:
$$L = -\sum_{i=1}^{C} y_i \log(p_i)$$
where $y_i$ is a binary indicator that equals 1 if the sample belongs to class $i$ and 0 otherwise, and $p_i$ is the predicted probability that the sample belongs to class $i$.
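A small numerical sketch of the SoftMax function and cross-entropy loss follows. Subtracting the maximum logit before exponentiating is a standard stability trick that does not change the result; the function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to one."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy loss for one sample whose true class index is `target`."""
    return -math.log(softmax(logits)[target])

# With five equal logits (one per leakage level) the loss is log(5).
uniform_loss = cross_entropy([0.0] * 5, target=2)
```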
SoftMax loss pulls the sampling points in the feature space closer to their respective class centers and extends different classes around the origin [26]. However, SoftMax loss alone may not be sufficient to effectively distinguish highly similar categories. Therefore, this article uses the CosFace loss function to achieve fine-grained leak recognition.
The CosFace loss introduces a margin m in cosine similarity, which can be formulated as:
$$L = -\log \frac{e^{s(\cos\theta_y - m)}}{e^{s(\cos\theta_y - m)} + \sum_{i \neq y} e^{s \cos\theta_i}}$$
where $s$ is the scaling factor, $\theta_y$ is the angle between the feature vector and the weight vector of the correct class, and $\theta_i$ is the angle between the feature vector and the weight vector of class $i$. Subtracting the margin $m$ in cosine space lowers the adjusted cosine similarity of the correct class, so a sample must lie closer in angle to its class center to be classified correctly. This encourages the model to further separate different class representations during training, significantly increasing the inter-class distance.
As shown in Figure 8, the margin in CosFace increases the angular distance between different categories, thereby enhancing the separability of categories compared to traditional SoftMax loss. As the training converges, sample points gather more tightly around their respective class centers, while different class points are more clearly separated, thereby improving the sensitivity and overall discriminative ability of the model to subtle differences in leakage degree.
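The effect of the cosine margin can be seen in a small sketch of the CosFace loss for a single sample. The scale `s` and margin `m` defaults below are illustrative values, not the settings used in the experiments, and the input is assumed to be the cosine similarities between a normalized feature and each normalized class weight.

```python
import math

def cosface_loss(cosines, target, s=30.0, m=0.35):
    """CosFace loss for one sample.

    cosines : cosine similarity to each class weight vector.
    target  : index of the true class.
    s, m    : scale and cosine margin (illustrative defaults).
    The margin is subtracted from the target-class cosine only.
    """
    logits = [s * (c - m) if i == target else s * c
              for i, c in enumerate(cosines)]
    peak = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]

# The same sample is penalized more once the margin is applied.
sims = [0.9, 0.1, 0.0, -0.2, 0.3]
with_margin = cosface_loss(sims, target=0, m=0.35)
without_margin = cosface_loss(sims, target=0, m=0.0)
```

Setting `m = 0` and `s = 1` recovers the ordinary SoftMax loss on cosine logits, which makes the margin's contribution easy to isolate.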

3.4. Evaluation Metrics

In order to comprehensively evaluate the performance of the model, multiple evaluation metrics were used, including accuracy, precision, recall, and F1 score. Accuracy is the proportion of correct predictions among all predictions made. Precision is the proportion of samples predicted as positive that are actually positive. Recall is the proportion of actual positive samples that are successfully identified. The F1 score is the harmonic mean of precision and recall. The calculation formulas for each indicator are as follows:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
where TP is the number of positive samples correctly predicted as positive; TN is the number of negative samples correctly predicted as negative; FP is the number of negative samples incorrectly predicted as positive; and FN is the number of positive samples incorrectly predicted as negative.
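For a multi-class problem such as the five leakage levels, these quantities are usually computed per class from a confusion matrix, treating one class as positive and the rest as negative. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean). The row and column convention and the toy matrix are assumptions for illustration, and the text does not state how per-class values were averaged.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions (the diagonal) over all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k.

    cm[i][j] counts samples of actual class i predicted as class j
    (assumed convention: rows are actual labels, columns are predictions).
    """
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                             # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy two-class confusion matrix (not data from the paper).
toy_cm = [[8, 2],
          [1, 9]]
toy_accuracy = accuracy(toy_cm)
toy_precision, toy_recall, toy_f1 = per_class_metrics(toy_cm, 0)
```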

4. Results and Discussion

This experiment trains and evaluates the model on data collected from the constructed experimental platform to verify its effectiveness in identifying the degree of leakage in water supply pipelines. Training used an NVIDIA 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA) and the PyTorch framework (version 2.2, Meta Platforms, Inc., Menlo Park, CA, USA). The learning rate was 0.0001 and the batch size was 16. A cosine learning rate schedule was used with a cycle of three and a multiplier of two, and the total number of training iterations was 100.
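One plausible reading of the schedule is cosine annealing with warm restarts, where the first cycle lasts three steps and each later cycle is twice as long as the previous one. The sketch below computes the learning rate under that assumption. The mapping of "cycle" and "multiplier" to `t0` and `t_mult`, the minimum rate of zero, and the choice of step unit are all assumptions, since the text does not spell them out.

```python
import math

def cosine_restart_lr(step, base_lr=1e-4, t0=3, t_mult=2, min_lr=0.0):
    """Learning rate at `step` for cosine annealing with warm restarts.

    t0     : length of the first cycle (assumed to be the "cycle" of three).
    t_mult : factor by which each cycle grows (assumed to be the "multiplier" of two).
    """
    cycle_len, pos = t0, step
    while pos >= cycle_len:      # skip over completed cycles
        pos -= cycle_len
        cycle_len *= t_mult
    cosine = math.cos(math.pi * pos / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)

# Restarts fall at steps 0, 3, 9, 21, 45, ... under these settings.
schedule = [cosine_restart_lr(step) for step in range(100)]
```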

4.1. Analysis of Experimental Results

Figure 9 shows the changes in loss values and validation accuracy during the iterative training process. As the training steps increase, the loss value continuously decreases, indicating that the model gradually learns the mapping relationship between the degree of water supply pipeline leakage and the input signal features. Meanwhile, the recognition accuracy of the validation dataset steadily improves and ultimately stabilizes. This trend indicates that the model not only effectively fits the training data, but also demonstrates good generalization ability to some extent.
The classification performance of the water supply pipeline leakage detection model was analyzed in detail using a confusion matrix, which displays the correspondence between the predicted results and the actual labels for each category. Labels 1–5 correspond to no leakage, 1/8 leakage, 1/4 leakage, 1/2 leakage, and 3/4 leakage, respectively. As shown in Figure 10, the model performs well at the 1/8 leakage level, with an accuracy of 100%, which indicates that it can reliably raise an alert even for small leaks. At the 1/2 leakage level, the model exhibits some misclassification, with some 1/2 leakage samples incorrectly predicted as 1/4 leakage and some 1/4 leakage samples misclassified as 3/4 leakage. Accuracy therefore varies across leakage levels.
The ablation study on leak detection in water supply pipelines systematically analyzed the impact of each module on leak detection performance through variable control methods. This experiment evaluates the contribution of specific components to overall performance by sequentially deleting or modifying certain parts of the model. We compared the performance of using only the MobileNetV3 module, only the Mamba module, and a combination of the two to comprehensively understand the role of each module.
As shown in Table 2, the ablation study evaluated the effects of different components. When using MobileNetV3 alone, the accuracy was 0.960, precision was 0.976, recall was 0.952, and F1 score was 0.964. When MobileNetV3 was combined with Mamba, all evaluation metrics improved, and the accuracy of the combined model reached 0.975, indicating that the two modules complement each other. In addition, the introduction of the CosFace fine-grained classification module further improved all metrics, achieving an accuracy of 0.983. These results demonstrate the contribution of each component to recognition performance, and show that integrating these modules enhances the model's ability to identify the degree of water supply pipeline leakage.
As shown in Table 2, the ablation study evaluated the effects of different components. When using MobileNetV3 alone, the accuracy was 0.960, precision was 0.976, recall was 0.952, and F1 score was 0.964. When MobileNetV3 is combined with Mamba, all evaluation metrics will significantly improve. The accuracy of the combined model reached 0.975, indicating that the synergistic effect between the two modules effectively improved overall performance. In addition, the introduction of the CosFace fine-grained classification module further improved all metrics, achieving an accuracy of 0.983. These results demonstrate the contribution of each component to improving recognition performance, and that the integration of these modules enhances the model’s ability to identify the degree of water supply pipeline leakage.
To visualize the intrinsic structure of the water supply pipeline leakage data and its distinguishability at different leakage levels, t-SNE (t-distributed stochastic neighbor embedding) was used to reduce the high-dimensional features to two dimensions for plotting.
Figure 11 shows the distribution of different leakage levels in two-dimensional space, with and without fine-grained identification methods. The color codes are as follows: blue indicates no leakage, pink indicates 1/8 leakage, green indicates 1/4 leakage, red indicates 1/2 leakage, and yellow indicates 3/4 leakage. In Figure 11a, without a fine-grained recognition method based on increasing class separation, there is a significant overlap between the 1/2 leak and 1/4 leak levels, reflecting the high similarity between these two levels, which makes it difficult to distinguish them.
In Figure 11b, after introducing the CosFace fine-grained recognition method, data points with different leakage levels form relatively clear clusters. Data points within the same class are more tightly clustered, and the separation between different classes is more pronounced. This distribution indicates that the feature differences between leakage levels were effectively enhanced, improving the distinguishability of these categories. The fine-grained recognition method based on increasing class separation therefore improves the recognition accuracy between fine-grained leak levels.
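The effect of a cosine margin on class separation can be sketched as follows. This is a minimal illustration of the large-margin cosine loss for a single sample, not the authors' implementation; the function name `cosface_loss`, the scale `s = 30`, the margin `m = 0.35`, and the example cosine values are all assumptions chosen for illustration.

```python
import math


def cosface_loss(cosines, target, s=30.0, m=0.35):
    """Large-margin cosine loss for one sample.

    cosines: cosine similarity between the normalized feature and each
             normalized class weight vector.
    target:  index of the true class.
    s, m:    scale and cosine margin (assumed values, not from the paper).
    """
    # Subtract the margin from the true-class cosine only, then scale.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]  # negative log-softmax of the true class


# Hypothetical similarities to five leakage classes; class index 3 is the true one.
cosines = [0.10, 0.05, 0.70, 0.80, 0.20]
loss_with_margin = cosface_loss(cosines, target=3)
loss_without_margin = cosface_loss(cosines, target=3, m=0.0)
```

Because the margin lowers the true-class logit, the same feature incurs a larger loss than under plain softmax, so training has to push the feature further from neighbouring classes (here, index 2) before the loss becomes small.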

4.2. Comparative Analysis of Different Methods

Table 3 compares the proposed model with SVM, the Backpropagation (BP) neural network, the LSTM network with an attention mechanism (LSTM-AM) [27], the inverted Transformer network (iTransformer) [28], and CNN models, using accuracy, precision, recall, and F1 score as metrics. The SVM and BP neural networks take statistically extracted time-domain and frequency-domain features as inputs, while the CNN is a one-dimensional convolutional network that takes raw time-domain signals as its input. LSTM-AM and iTransformer also use time-domain signals as their input.
The results show that traditional machine learning methods, which rely on manually designed features, are prone to information loss. SVM and BP neural networks perform similarly, but their performance is constrained by the quality of manual feature engineering, resulting in inferior overall performance compared to deep learning methods. Because the differences between leak-degree categories are small and the signals are strongly coupled, the feature extraction capability of one-dimensional convolutional networks is limited, leading to insufficient recognition performance. LSTM-AM enhances the modeling of sequential data by introducing an attention mechanism and outperforms SVM and BP neural networks across all evaluation metrics, demonstrating its advantages in handling long sequences and complex patterns. iTransformer shows great potential in processing time-series data, surpassing CNN in accuracy, recall, and F1 score (though not in precision, Table 3), but still falling behind the proposed method on all four metrics. The proposed model achieves the best results across all evaluation metrics, with an accuracy of 98.3%. Compared to SVM, BP, LSTM-AM, CNN, and iTransformer, the accuracy improved by 17.9, 15.9, 7.8, 3.0, and 2.3 percentage points, respectively.
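The reported accuracy improvements are absolute differences in percentage points, which can be checked directly from the accuracy column of Table 3. The short sketch below does that arithmetic; the variable names are illustrative only.

```python
# Accuracy values as listed in Table 3.
accuracy = {
    "SVM": 0.804,
    "BP": 0.824,
    "LSTM-AM": 0.905,
    "CNN": 0.953,
    "iTransformer": 0.960,
}
proposed = 0.983

# Absolute gain of the proposed model over each baseline, in percentage points.
gains = {name: round((proposed - acc) * 100, 1) for name, acc in accuracy.items()}
```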
As shown in Table 4, there are significant differences in the performance of different methods across various classes. Our method performs exceptionally well in all classes, achieving perfect accuracy in Class 2 and maintaining high accuracy in all other classes. In contrast, although CNN performs well overall, its accuracy in Class 4 is slightly lower. LSTM-AM and iTransformer demonstrate balanced and relatively high accuracy across all classes but still fall short of our method in terms of overall performance. SVM shows the most variability among the different classes, with notably lower accuracy in Class 4. Overall, our method demonstrates outstanding performance across all classes, validating its effectiveness and superiority in practical applications and highlighting its powerful capability in handling complex patterns and fine-grained classification tasks.
Figure 12 illustrates the impact of different input window lengths on model performance. As the sampling length increases, the accuracy of the model shows an upward trend. When the sampling length is short, the accuracy of the prediction results decreases significantly, indicating poor reliability of the model output. When the sampling length reaches 5000 data points, which is equivalent to a 1 s time window, the model achieves near-optimal accuracy, and further increases in sampling length do not lead to a significant improvement. It is worth noting that a longer input window increases processing latency, which may affect the real-time response capability of the leakage detection system. Practical applications therefore require a balance between high accuracy and low latency.
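The relationship between window length and latency can be made concrete with a small sketch. The 5000 Hz sampling rate is inferred from the statement that 5000 data points correspond to 1 s; the helper names `window_duration_s` and `split_windows`, and the optional overlapping hop, are assumptions for illustration rather than the authors' preprocessing code.

```python
SAMPLING_RATE_HZ = 5000  # implied by "5000 data points = 1 s" in the text


def window_duration_s(n_samples, fs=SAMPLING_RATE_HZ):
    """Time span covered by a window of n_samples at sampling rate fs."""
    return n_samples / fs


def split_windows(signal, window_len, hop=None):
    """Cut a 1-D sequence into fixed-length windows.

    Windows do not overlap unless a hop smaller than window_len is given.
    Trailing samples that do not fill a whole window are dropped.
    """
    hop = hop or window_len
    return [signal[start:start + window_len]
            for start in range(0, len(signal) - window_len + 1, hop)]


# A placeholder 2.4 s recording (12000 samples) stands in for a sensor trace.
recording = list(range(12000))
windows = split_windows(recording, 5000)
overlapping = split_windows(recording, 5000, hop=2500)
```

A model fed 5000-sample windows must wait at least `window_duration_s(5000)` seconds of signal before it can produce a prediction, which is the latency cost discussed above.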
In real environments, the vibration signals of water supply pipelines are often affected by varying levels of environmental noise. To evaluate the robustness of the proposed model, we added Gaussian white noise at different signal-to-noise ratios (SNRs) to the test set and conducted further testing. As shown in Figure 13, as the noise level increases, the recognition accuracy of traditional methods decreases significantly, whereas the proposed method shows a smaller decrease in accuracy. This result indicates that the proposed method has stronger noise resistance and better generalization ability.
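A noise-injection step of this kind can be sketched as below. The noise variance is chosen so that the ratio of signal power to noise power matches a target SNR in decibels. The function names, the 50 Hz test tone, and the 10 dB target are illustrative assumptions; the paper does not give its exact procedure.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus zero-mean Gaussian white noise at the requested SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy copy relative to the clean signal."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical 1 s test tone sampled at 5000 Hz, corrupted at a 10 dB SNR.
clean = [math.sin(2 * math.pi * 50 * t / 5000) for t in range(5000)]
noisy = add_awgn(clean, snr_db=10, rng=random.Random(0))
achieved = measured_snr_db(clean, noisy)
```

Lower `snr_db` values correspond to stronger noise, so sweeping it downward reproduces the kind of robustness curve the paragraph describes.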

5. Conclusions

This article proposes a novel fine-grained identification method for water supply pipeline leaks based on Convolutional Neural Networks (CNN) and Selective State Space Models (Mamba), which significantly improves the accuracy and robustness of leak detection. The main contributions are as follows:
(1) In this study, we developed an experimental setup to simulate various leakage conditions and collected a high-quality dataset using multi-axis sensors, encompassing a wide range of leakage levels and environmental conditions. We proposed a hybrid model that integrates Convolutional Neural Networks (CNN) with Mamba, effectively combining frequency-domain image features with one-dimensional long-sequence time-dynamic characteristics. This approach addresses the challenge of capturing weak signals and non-linear features, which traditional methods often fail to detect. Experimental results show that our proposed method significantly outperforms existing techniques. Compared to SVM, BP, LSTM-AM, CNN, and iTransformer, the accuracy improved by 17.9%, 15.9%, 7.8%, 3.0%, and 2.3%, respectively.
(2) The impact of different factors on the experimental results was analyzed. It was found that increasing the input window length up to 5000 data points improved model accuracy; however, further increases did not yield significant benefits and could introduce processing delays. The proposed method exhibits greater robustness in noisy environments, showing a smaller decrease in accuracy compared to traditional methods when subjected to varying Signal-to-Noise Ratios.
(3) To enhance the distinguishability between different leakage levels, this study introduced a fine-grained classification method for water supply pipeline leak levels based on CosFace, which enhances inter-class distances and improves distinguishability between different leakage levels. t-SNE visualization confirms that after incorporating CosFace, data points for different leakage levels form clearer and more distinct clusters in two-dimensional space. Additionally, by removing or modifying specific parts of the model, we systematically analyzed the contributions of each component. The results showed that using MobileNetV3 alone achieved an accuracy of 0.960, which improved to 0.975 when combined with Mamba, and further integrating CosFace raised the accuracy to 0.983. These findings highlight the significant role of each component in enhancing overall recognition performance and underscore the effectiveness of our integrated approach.
Limitations and future work. The experimental bench supports leakage signal data collection, experiments at different leakage levels, leakage experiments at different distances, and sensor signal transmission. Due to experimental limitations, however, it is difficult to fully replicate the actual underground operating conditions of water pipes. Future work will further explore the collection and analysis of vibration signals from underground water pipes made of different materials.

Author Contributions

Methodology, N.W.; Resources, K.Z. and Y.L.; Data curation, H.L.; Writing—original draft, W.D.; Writing—review & editing, W.D.; Supervision, Z.H.; Project administration, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2022YFC3801000), the National Natural Science Foundation of China (No. 52479111 and No. 52108289), the Natural Science Foundation of Henan Province (No. 242300421065), the Program for Innovative Research Team (in Science and Technology) in University of Henan Province (No. 23IRTSTHN004), and the Program for Science and Technology Innovation Talents in Universities of Henan Province (No. 23HASTIT006).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Yongbin Li and Yanquan He were employed by the company Guangxi New Development Transportation Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. El-Zahab, S.; Zayed, T. Leak detection in water distribution networks: An introductory overview. Smart Water 2019, 4, 5.
2. Khulief, Y.A.; Khalifa, A.; Mansour, R.B.; Habib, M.A. Acoustic detection of leaks in water pipelines using measurements inside pipe. J. Pipeline Syst. Eng. Pract. 2012, 3, 47–54.
3. Korlapati, N.V.; Khan, F.; Noor, Q.; Mirza, S.; Vaddiraju, S. Review and analysis of pipeline leak detection methods. J. Pipeline Sci. Eng. 2022, 2, 100074.
4. Fan, H.; Tariq, S.; Zayed, T. Acoustic leak detection approaches for water pipelines. Autom. Constr. 2022, 138, 104226.
5. Lai, W.W.; Lau, T.C.; Cheng, A.Y.; Liu, P.K.; Leung, J.W. Outcome-Based blind tests for leakage diagnosis in underground watermains by acoustic technologies. Tunn. Undergr. Space Technol. 2023, 142, 105413.
6. Rizzo, P. Water and wastewater pipe nondestructive evaluation and health monitoring: A review. Adv. Civ. Eng. 2010, 2010, 818597.
7. Deshpande, A.; Kalikate, S.; Ranade, N.; Diwanji, A.; Pawar, J.; Kshirsagar, K. Water leakage detection system. Int. J. Eng. Manag. Res. 2022, 12, 259–263.
8. Brunone, B.; Maietta, F.; Capponi, C.; Keramat, A.; Meniconi, S. A review of physical experiments for leak detection in water pipes through transient tests for addressing future research. J. Hydraul. Res. 2022, 60, 894–906.
9. Islam, M.R.; Azam, S.; Shanmugam, B.; Mathur, D. An intelligent IoT and ML-based water leakage detection system. IEEE Access 2023, 11, 123625–123649.
10. Banjara, N.K.; Sasmal, S.; Voggu, S. Machine learning supported acoustic emission technique for leakage detection in pipelines. Int. J. Press. Vessel. Pip. 2020, 188, 104243.
11. Yussif, A.M.; Sadeghi, H.; Zayed, T. Application of machine learning for leak localization in water supply networks. Buildings 2023, 13, 849.
12. Hu, X.; Han, Y.; Yu, B.; Geng, Z.; Fan, J. Novel leakage detection and water loss management of urban water supply network using multiscale neural networks. J. Clean. Prod. 2021, 278, 123611.
13. Boaventura, O.D.; Proença, M.S.; Obata, D.H.; Paschoalini, A.T. Convolutional neural network for leak location in buried pipes of underground water supply. J. Braz. Soc. Mech. Sci. Eng. 2024, 46, 352.
14. Candelieri, A.; Soldi, D.; Conti, D.; Archetti, F. Analytical leakages localization in water distribution networks through spectral clustering and support vector machines. The icewater approach. Procedia Eng. 2014, 89, 1080–1088.
15. Shen, Y.; Cheng, W. A Tree-Based Machine Learning Method for Pipeline Leakage Detection. Water 2022, 14, 2833.
16. Shukla, H.; Piratla, K.R. Leakage Detection in Water Distribution Network Using Machine Learning. Pipelines 2023, 192–200.
17. Shin, Y.; Na, K.Y.; Kim, S.E.; Kyung, E.J.; Choi, H.G.; Jeong, J. LSTM-Autoencoder Based Detection of Time-Series Noise Signals for Water Supply and Sewer Pipe Leakages. Water 2024, 16, 2631.
18. Ahmad, S.; Ahmad, Z.; Kim, C.H.; Kim, J.M. A method for pipeline leak detection based on acoustic imaging and deep learning. Sensors 2022, 22, 1562.
19. Boujelben, M.; Benmessaoud, Z.; Abid, M.; Elleuchi, M. An efficient system for water leak detection and localization based on IoT and lightweight deep learning. Internet Things 2023, 24, 100995.
20. McMillan, L.; Fayaz, J.; Varga, L. Machine-Learning-Based Health Monitoring and Leakage Management of Water Distribution Systems. In Proceedings of the International Conference on Evolving Cities, Online, 12–14 September 2023; Evolving Cities Publications: Southampton, UK, 2023; Volume 2022.
21. Nam, Y.W.; Arai, Y.; Kunizane, T.; Koizumi, A. Water leak detection based on convolutional neural network using actual leak sounds and the hold-out method. Water Supply 2021, 21, 3477–3485.
22. Wang, C.; Chen, X.; Xu, Y.; Yan, W.; Yang, Y.; Shao, Y.; Yu, T. A two-stage leak monitoring framework for water distribution networks based on acoustic signals. Mech. Syst. Signal Process. 2025, 225, 112275.
23. Cody, R.A.; Dey, P.; Narasimhan, S. Linear prediction for leak detection in water distribution networks. J. Pipeline Syst. Eng. Pract. 2020, 11, 04019043.
24. Bick, A.; Li, K.; Xing, E.; Kolter, J.Z.; Gu, A. Transformers to ssms: Distilling quadratic knowledge to subquadratic models. Adv. Neural Inf. Process. Syst. 2024, 37, 31788–31812.
25. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019.
26. Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
27. Kammoun, M.; Kammoun, A.; Abid, M. LSTM-AE-WLDL: Unsupervised LSTM auto-encoders for leak detection and location in water distribution networks. Water Resour. Manag. 2023, 37, 731–746.
28. Zou, Y.; Chen, Y.; Xu, Y.; Zhang, H.; Zhang, S. Short-term freeway traffic speed multistep prediction using an iTransformer model. Phys. A Stat. Mech. Its Appl. 2024, 655, 130185.
Figure 1. Water Supply Pipeline Leakage Test Bench.
Figure 2. Different leakage level conditions of the water supply pipeline: (a) 1/8 leakage; (b) 1/4 leakage; (c) 1/2 leakage; (d) 3/4 leakage.
Figure 3. Data processing strategy for the training set.
Figure 4. Overall framework of the model.
Figure 5. The overall training and testing process flow.
Figure 6. Efficient lightweight CNN and Mamba network.
Figure 7. Time-frequency spectrogram generated by STFT.
Figure 8. The principle of CosFace in increasing inter-class distances.
Figure 9. Loss and accuracy curves at different iteration steps.
Figure 10. Confusion matrix for pipeline leak detection.
Figure 11. Visualization analysis of feature dimensionality reduction.
Figure 12. Impact of different input window lengths on model performance.
Figure 13. Model performance at different noise ratios.
Table 1. Detailed structural parameters of Convolutional Neural Network MobileNet-V3.

| Input | Operator | Number of Expansion Filters | Number of Projection Filters | SE Module | Nonlinearity Types |
|---|---|---|---|---|---|
| 224² × 3 | conv2d, 3 × 3 | / | 16 | / | HS |
| 112² × 16 | bneck, 3 × 3 | 16 | 16 | ✓ | RE |
| 56² × 16 | bneck, 3 × 3 | 72 | 24 | / | RE |
| 28² × 24 | bneck, 3 × 3 | 88 | 24 | / | RE |
| 28² × 24 | bneck, 5 × 5 | 96 | 40 | ✓ | HS |
| 14² × 40 | bneck, 5 × 5 | 240 | 40 | ✓ | HS |
| 14² × 40 | bneck, 5 × 5 | 240 | 40 | ✓ | HS |
| 14² × 40 | bneck, 5 × 5 | 120 | 48 | ✓ | HS |
| 14² × 48 | bneck, 5 × 5 | 144 | 48 | ✓ | HS |
| 14² × 48 | bneck, 5 × 5 | 288 | 96 | ✓ | HS |
| 7² × 96 | bneck, 5 × 5 | 576 | 96 | ✓ | HS |
| 7² × 96 | bneck, 5 × 5 | 576 | 96 | ✓ | HS |
| 7² × 96 | conv2d, 1 × 1 | / | 576 | ✓ | HS |
| 7² × 576 | pool, 7 × 7 | / | / | / | / |
| 1² × 576 | conv2d, 1 × 1, NBN | / | 1280 | / | HS |
| 1² × 1280 | conv2d, 1 × 1, NBN | / | k | / | / |
Table 2. Ablation Study Results.

| Method | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| MobileNetV3 | 0.960 | 0.976 | 0.952 | 0.964 |
| Mamba | 0.955 | 0.953 | 0.944 | 0.948 |
| MobileNetV3 + Mamba | 0.975 | 0.978 | 0.972 | 0.975 |
| MobileNetV3 + Mamba + CosFace | 0.983 | 0.984 | 0.983 | 0.984 |
Table 3. Comparison of Different Methods.

| Method | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| SVM | 0.804 | 0.828 | 0.792 | 0.810 |
| BP | 0.824 | 0.831 | 0.806 | 0.818 |
| LSTM-AM | 0.905 | 0.913 | 0.935 | 0.924 |
| CNN | 0.953 | 0.962 | 0.947 | 0.954 |
| iTransformer | 0.960 | 0.957 | 0.959 | 0.958 |
| Ours | 0.983 | 0.984 | 0.983 | 0.984 |
Table 4. Performance of Different Methods in Each Class.

| Method | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| SVM | 0.791 | 0.817 | 0.803 | 0.789 | 0.804 |
| BP | 0.811 | 0.828 | 0.823 | 0.821 | 0.822 |
| LSTM-AM | 0.902 | 0.912 | 0.909 | 0.917 | 0.926 |
| CNN | 0.961 | 0.947 | 0.953 | 0.949 | 0.971 |
| iTransformer | 0.958 | 0.961 | 0.957 | 0.959 | 0.963 |
| Ours | 0.991 | 1.000 | 0.992 | 0.992 | 0.982 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
