1. Introduction
Pipelines are the primary mechanical component required for the long-distance distribution and transportation of liquids and gases. Pipeline networks therefore need consistent quality, high safety levels, and efficiency. At present, pipeline networks suffer from very high annual leakage rates and a corresponding waste of natural resources. Pipeline leaks may have detrimental effects on the environment, human safety, property, and reputation, and additionally lead to financial losses from fines and cleanup expenses. BenSaleh et al. [1] noted that many countries rely heavily on the long-distance transportation of oil and of water from desalination plants to their intended destinations. Unfortunately, significant quantities of these resources are lost annually due to leaks in the pipelines, with an estimated 60% of water being wasted each year [2]. For pipeline operators to achieve optimal performance, a reliable leak detection system (LDS) is imperative. An effective LDS should be able to detect leaks promptly, provide accurate leak localization, minimize false alarms, be easy to retrofit, function well under various operating conditions, and utilize sensors with high dependability and low maintenance requirements. This article proposes an LDS using acoustic emission (AE) and machine learning (ML) algorithms.
Pipelines, as a means of long-distance distribution and transportation, must meet stringent safety standards and maintain consistent efficiency and quality. However, monitoring these pipelines over long distances poses challenges due to the difficulties in maintaining the infrastructure. Therefore, much research has been conducted to develop robust and reliable methods for detecting spills, explosions, and other anomalies in the pipeline infrastructure. Such failures can result in severe environmental damage, financial loss, and even loss of life [3]. Wang et al. [4] discussed the ever-increasing energy demand of oil distribution, the topological complexity of pipeline distribution networks, and the real-time assessment of distribution network safety. Korlapati et al. [5] surveyed leakage detection methods and classified them into three broad categories: visual inspections, internally determined/computational methods, and externally based methods and techniques. Fiber optic cable-based leakage detection is carried out by laying the cable alongside the pipes and observing changes in strain and temperature. By analyzing Raman scattering, the optical fiber cable can provide a means for measuring the temperature [6]. Although optical fiber cable can monitor changes in temperature at various locations, it fails both in monitoring small leakages and in determining the exact location of a leakage [7,8]. Christos et al. [9,10] proposed a low-energy and low-cost wireless sensor-based system to immediately detect leakage in metallic pipes. They monitored the changes in the vibration signal that a leak induces on the pipeline walls. Another leakage detection system installs a pressure sensor in the middle of the pipeline segment [11]. It is more sensitive to detecting leaks that emerge far away, such as at the inlet and outlet. In another approach, the continuous wavelet transform is used to convert the time-domain AE signal into an image, and a convolutional neural network is then applied to detect the leakage [12]. In summary, several internal and external monitoring methods have been developed to detect pipeline leaks. These include negative pressure wave techniques [13], accelerometer-based techniques [14], time-domain reflectometry [15], distributed temperature sensing systems [16], acoustic emission technology [17], ultrasonic technology [18], and magnetic flux leakage techniques [19]. Among these, acoustic emission technology has gained significant popularity for its ability to quickly detect leaks, with real-time responses, high sensitivity, and ease of retrofit [20,21]. Research in this area has focused on using pattern recognition and feature extraction techniques to construct leak detection models [22]. Studies have demonstrated the effectiveness of techniques such as wavelet feature extraction and support vector machine classification for identifying leaks, and of using frequency-width characteristics to train leak-detection support vector data-description models from time-domain pipeline signals [23]. Claudia et al. [24] summarized the different applications of AE descriptors for damage analysis in fiber-reinforced plastics. They analyzed the amplitude, frequency, and cumulative acoustic energy with regard to fiber-reinforced plastic damage, crack analysis, and crack propagation.
ML is a subset of artificial intelligence, namely algorithms that improve through previous data records and experience [25,26]. The algorithms focus on building mathematical models using training data or sample data to make decisions or predictions without explicit programming [27]. When properly executed, machine learning can enable tasks to be automated at high speed. As such, it is critical to integrate real-world data with artificial intelligence in fields that require rapid and precise detection, such as pipeline leak incidents. El-Zahab et al. [28] proposed a system that utilizes accelerometer-based monitoring for pressurized water pipelines. The experimental data were analyzed using three machine learning algorithms: support vector machines, decision trees, and naive Bayes. The proposed system demonstrated its effectiveness in accurately detecting leakage events in pressurized water pipelines. Other machine learning algorithms, such as decision trees, random forests, k-nearest neighbors, and neural networks, have also been applied to the collected data to further enhance detection capabilities.
Overall, the above research works performed well in leak detection and size identification. However, some shortcomings remain. Traditional AE hit features can be extracted from the AE signal by defining a threshold above the level of continuous background noise. However, the predefined threshold for extracting the AE features can lead to false alarms due to noise in the AE signal. Furthermore, defining a threshold above the level of continuous background noise requires human expertise and domain understanding. Additionally, the type of transportation medium also affects the AE hits in the signal. To address these problems, it is of primary importance to develop a leak-sensitive model for pipeline leak detection and size identification. As such, in this manuscript, an attempt has been made to develop a leak-sensitive model for pipeline leak detection and size identification with different transportation media, such as fluid and gas. Instead of utilizing a predetermined threshold for AE feature extraction, a sliding window is used in this paper. To exploit the statistical changes in the AE signal caused by defects in the pipeline, statistical indicators, such as kurtosis, skewness, mean value, RMS, peak value, standard deviation, and entropy, are calculated from each sliding window. Furthermore, the changes in the frequency spectrum caused by the defect are captured by calculating spectral features from each sliding window. Additionally, a set of classification models was tested and validated for pipeline leak detection and size identification by considering two different transportation media, and the best-performing classification model is reported in this study.
The overall novelty and contribution of this work can be summarized as follows:
- (i) To exploit the statistical changes in the AE signal caused by defects in the pipeline, a sliding window is used, and temporal statistical indicators are calculated from each window. Furthermore, the changes in the frequency spectrum caused by the defect are captured by calculating spectral features from each window. To the best of our knowledge, utilizing a sliding window to extract statistical and spectral indicators from the AE signal is reported for the first time in this work;
- (ii) A pipeline health-sensitive classification model is reported in this study, selected by evaluating different classification models for pipeline leak detection and size identification under two different transportation media, such as fluid and gas;
- (iii) Real-world industrial fluid pipeline data were utilized in this study for leak detection and size identification using machine learning algorithms.
The proposed platform achieved an overall classification accuracy of 99%, which indicates that it is a reliable and effective solution for pipeline leakage detection and leak pinhole size identification.
The remainder of the paper is organized as follows. Section 2 presents the architecture, methodology, and ML algorithms for leakage detection. Section 3 presents the experimental pipeline test rig and the results. Section 4 presents the conclusion of this study.
2. The Proposed Architecture and Methodology
Figure 1 and Figure 2 show an overview of the proposed methodology for pipeline leakage detection. The architecture is implemented using acoustic emission sensors. Three sensors are placed on the pipeline: channel 3 at 0 mm, channel 2 at 1600 mm, and channel 1 at 2500 mm. The sensor outputs are passed to the data acquisition step, in which the signals that measure real-world physical conditions are sampled and converted into digital numeric values that a computer can work with. The data are acquired at a sampling rate of 1 MHz; the desired features are then extracted from each one-second segment, and a label is assigned to each resulting feature vector across the complete dataset. The completed dataset is then used to test the classification accuracy of different algorithms for detecting leakage in the pipeline. Each segment is labeled either “leakage” or “normal”, so the output indicates whether the sensor data contain a leak or are normal. The final step is to show the result on a display for the monitoring supervisor. In the following subsections, we briefly explain each subcomponent.
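The segmentation and labeling step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the study: the helper names `frame_signal` and `label_frames`, the toy sampling rate, and the synthetic sample values are all hypothetical. At the 1 MHz rate reported here, a one-second frame would hold 1,000,000 samples per channel.

```python
from typing import List, Sequence, Tuple


def frame_signal(samples: Sequence[float], fs: int,
                 frame_seconds: float = 1.0) -> List[List[float]]:
    """Split one channel into non-overlapping frames of `frame_seconds` each.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(fs * frame_seconds)
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


def label_frames(frames: List[List[float]],
                 label: str) -> List[Tuple[List[float], str]]:
    """Attach the same class label to every frame of one recording."""
    return [(frame, label) for frame in frames]


# Toy data (assumption): fs = 4 Hz stands in for the 1 MHz acquisition rate.
fs = 4
normal_recording = [0.0, 0.1, -0.1, 0.0, 0.1, 0.0, -0.1, 0.1, 0.05]
leak_recording = [0.4, -0.5, 0.6, -0.4, 0.5, -0.6, 0.4, -0.5]

dataset = (label_frames(frame_signal(normal_recording, fs), "normal")
           + label_frames(frame_signal(leak_recording, fs), "leakage"))
```

Each entry of `dataset` pairs one frame with its label; in a full pipeline the frame would be replaced by the feature vector computed from it before being passed to a classifier.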
2.1. Acoustic Emission
A leak in the pipeline results in a change in the structural integrity of the material. This change in the structural integrity can be due to fatigue rupture, stress cracks, corrosion cracks, and structural discontinuities. The structural discontinuity or leak in the pipeline (irrespective of the cause) will disturb the flow of the fluid or gas inside the pipeline. However, the intramolecular interactions or chemical bonding of the fluid will force the fluid to keep its flow consistent [29]. Thus, for the fluid to keep its flow consistent inside the pipeline, the fluid molecules exert pressure at the position of the structural discontinuity or leak, which results in a short, rapid release of energy in the form of an elastic wave. An AE can be defined as a transient sound wave generated by a short, rapid release of energy in the form of an elastic wave, produced by a change in the structural integrity of the material within a specimen, such as a pipeline [30]. The physical phenomenon resulting from the fluid interaction with the structural discontinuity is referred to as an AE event. This AE event is detected by the AE sensors in the form of AE hits. Thus, the AE hits in the signal can be related to a leak due to fatigue rupture, stress cracks, corrosion cracks, and structural discontinuities. For this reason, the nondestructive method based on AE is considered ‘global’ in nature. The word global means that AE-based monitoring allows the investigator to get a bigger picture of the overall performance of the specimen, irrespective of the cause of degradation [31]. Specifically, in this study, a nondestructive method based on AE was used to investigate the health conditions of the pipeline. Based on the global nature of AE, the proposed method should, in theory, be able to identify a leak and its size.
2.2. Acoustic Emission Sensor
An acoustic emission (AE) sensor is a device that detects and analyzes the sound waves generated by changes in the internal structure of a material or structure. The R15I-AST, manufactured by MISTRAS Group, Inc., is one example of such a sensor. It uses piezoelectric transducers to convert mechanical stress or strain into electrical signals, which can then be analyzed to determine the location and severity of structural changes. These sensors are commonly used for non-destructive testing and the structural health monitoring of various structures, such as bridges, pipelines, and pressure vessels. With the capability of working under high temperatures, humidity, and pressures, the R15I-AST can monitor structural integrity in near real-time and provide early warnings of potential issues, allowing the necessary actions to be taken before any damage occurs. The operating frequency range of the R15I-AST sensor and the parameters used for the data acquisition are listed in Table 1.
2.3. Data Acquisition
In data acquisition, the physical conditions in the real world are measured through the use of sensors. These measurements are then sampled and translated into digital numeric values that a computer can interpret. This is typically achieved by converting analog waveforms into digital values for further processing. The key component of the current data acquisition system is the acoustic emission sensor, which is used to convert physical parameters into electrical signals. Once the data are acquired from the sensors, they can be used to detect leaks or other anomalies in the pipeline. A spectral analysis was performed on the time-domain signal to quickly identify the leak time. Figure 3a–c show the time responses of the three sensors, and Figure 3d–f show the corresponding spectra when there is no leakage. Under normal conditions, the amplitudes are low in both the time response and the frequency response. Figure 3g–i show the time-domain amplitude, and Figure 3j–l show the frequency response. These figures illustrate that when the leakage was introduced, both the time and frequency amplitudes increased to twice those of the normal condition.
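The comparison of time-domain and frequency-domain amplitudes can be illustrated with a small, dependency-free sketch. Everything in it is an assumption made for illustration: the function name `amplitude_spectrum`, the 64 Hz toy sampling rate, and the synthetic sine tones standing in for the normal and leak recordings. A direct DFT is used only to keep the example self-contained; for real 1 MHz records an FFT routine would be the practical choice.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_spectrum(x: Sequence[float],
                       fs: float) -> Tuple[List[float], List[float]]:
    """Return (frequencies in Hz, single-sided amplitudes) via a direct DFT."""
    n = len(x)
    freqs: List[float] = []
    amps: List[float] = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and (for even n) the Nyquist bin are not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs / n)
        amps.append(scale * abs(s) / n)
    return freqs, amps


# Synthetic stand-ins (assumption): an 8 Hz tone, twice as strong for "leak".
fs = 64.0
n = 64
normal = [1.0 * math.sin(2 * math.pi * 8 * t / fs) for t in range(n)]
leak = [2.0 * math.sin(2 * math.pi * 8 * t / fs) for t in range(n)]

freqs, amp_normal = amplitude_spectrum(normal, fs)
_, amp_leak = amplitude_spectrum(leak, fs)

peak_bin = max(range(len(amp_leak)), key=amp_leak.__getitem__)
peak_frequency = freqs[peak_bin]
amplitude_ratio = amp_leak[peak_bin] / amp_normal[peak_bin]
```

Here `amplitude_ratio` compares the spectral peak of the synthetic leak signal with that of the synthetic normal signal, mirroring the kind of comparison made between the normal and leak panels of Figure 3.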
2.4. Features Extraction in the Time and Frequency Domains
Feature extraction and selection methods transform the preprocessed data into a representation in which significant trends can be identified. Feature extraction is an essential approach for reducing data size, and it provides valuable information for developing a classification model. Most researchers use statistical approaches for feature extraction. Chai et al. [32] extracted various features from the AE signal, such as peak amplitude, entropy, energy, count, peak frequency, and centroid frequency, to characterize crack growth under different stress ratios. Muir et al. [33] reviewed the time domain, frequency domain, and composite features extracted from the AE signal for the damage analysis of a structure. They mentioned around 31 different features used in the literature. Traditional AE hit features, such as rise time, decay time, counts, etc., can be extracted from the AE signal by defining a threshold above the level of continuous background noise. However, the predefined threshold for extracting the AE features can lead to false alarms due to noise in the AE signal. Furthermore, defining a threshold above the level of continuous background noise requires human expertise and domain understanding. To address this concern, a sliding window is used in this paper instead of a predetermined threshold for AE feature extraction. To exploit the statistical changes in the AE signal caused by defects in the pipeline, this research extracts 25 statistical time and frequency domain features for each AE channel using a sliding window. A total of 75 features are extracted from the three AE channels. All these features are provided as input to the classification model for the task of pipeline health assessment. The features extracted from each AE channel comprise 11 time domain features (such as mean, standard deviation, skewness, kurtosis, crest factor, clearance factor, etc.) and 14 frequency domain features (namely, P1, P2, P3, P4, P5, …, P14). The frequency spectrum-based features take both the lower and higher frequencies into account through their power. Figure 4 shows the details of feature extraction and feature vectors. Table 2 and Table 3 show the mathematical formulas for the time domain and frequency domain features, respectively.
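As a rough illustration of threshold-free, window-based feature extraction, the sketch below computes a subset of the time domain indicators named above for each sliding window. It is a sketch under assumptions: the function names, the window size and step, and the toy signal are hypothetical, and the formulas are the common textbook definitions, which may differ in detail from those listed in Table 2.

```python
import math
from typing import Dict, Iterator, List, Sequence


def sliding_windows(x: Sequence[float], size: int,
                    step: int) -> Iterator[Sequence[float]]:
    """Yield successive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(x) - size + 1, step):
        yield x[start:start + size]


def time_features(w: Sequence[float]) -> Dict[str, float]:
    """Common statistical indicators of one window (textbook definitions)."""
    n = len(w)
    mean = sum(w) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in w) / n)  # population std
    rms = math.sqrt(sum(v * v for v in w) / n)
    peak = max(abs(v) for v in w)
    # Standardized third and fourth moments; constant windows map to 0.0.
    skew = sum((v - mean) ** 3 for v in w) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in w) / (n * std ** 4) if std > 0 else 0.0
    crest = peak / rms if rms > 0 else 0.0
    sqrt_amp = (sum(math.sqrt(abs(v)) for v in w) / n) ** 2
    clearance = peak / sqrt_amp if sqrt_amp > 0 else 0.0
    return {"mean": mean, "std": std, "rms": rms, "peak": peak,
            "skewness": skew, "kurtosis": kurt,
            "crest_factor": crest, "clearance_factor": clearance}


# Toy signal (assumption): the second half has three times the amplitude.
signal = [1.0, -1.0, 1.0, -1.0, 3.0, -3.0, 3.0, -3.0]
features: List[Dict[str, float]] = [
    time_features(w) for w in sliding_windows(signal, size=4, step=4)
]
```

Concatenating such per-window dictionaries across the three channels, together with the spectral features, would give one feature vector per window in the spirit of Figure 4.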
2.5. Machine Learning Algorithms for Leakage Detection and Identification
Classification algorithms are used to predict the class or label of a given set of data. The input to these algorithms is a set of features extracted from raw sensor data, which are associated with specific activities or classes. A decision rule or function that can accurately predict the class of new data based on these features must be determined. This process is known as the classification task.
Classifiers are machine learning techniques that can be used to assign labels to activities. They are trained on a dataset of labeled data, in which each feature vector of the training dataset has been given a label. The learning algorithm adjusts its parameters to generate a model or hypothesis, which can then be used to predict the label ‘y’ for new input data ‘x’. In this paper, the data were collected from a pipeline, and MATLAB was used both to extract the features from each record and to apply the classification algorithms. Four distinct algorithms were used to assess the classification accuracy.
A simple and successful technique for classification and regression applications is k-nearest neighbors (KNN). It operates by locating the k closest data points in the training set for a specific point in the test set, then making a prediction using the labels or values of these nearest neighbors. The Euclidean distance, or the distance along a straight line between two points, is one way to measure the separation between data points. It is computed as the square root of the sum of the squared coordinate differences between the two points. The performance of the KNN model can be influenced by the distance metric that is selected. Euclidean and Manhattan distances can both be employed: in some instances the Euclidean distance is more appropriate, while in others the Manhattan distance is more suitable. The choice of distance metric depends on the dataset’s characteristics and the task.
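The KNN procedure and the two distance metrics can be sketched in a few lines. This is a minimal illustration with hypothetical function names and made-up two-dimensional feature vectors, not the MATLAB configuration used in the study.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(train_x: List[Vector], train_y: List[str], query: Vector,
                k: int = 3,
                distance: Callable[[Vector, Vector], float] = euclidean) -> str:
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors (assumption), e.g. (RMS, peak) per window.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 1.00), (1.00, 0.90), (0.95, 0.95)]
train_y = ["normal", "normal", "normal", "leakage", "leakage", "leakage"]

pred_euclidean = knn_predict(train_x, train_y, (0.85, 0.90), k=3)
pred_manhattan = knn_predict(train_x, train_y, (0.20, 0.20), k=3,
                             distance=manhattan)
```

Swapping the `distance` argument is all that is needed to compare the two metrics on the same data.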
A random forest is a classification method that constructs many decision trees (weak learners) and adopts the verdict reached by the majority of them. A decision tree is a single tree, whereas a random forest contains multiple trees. In a single decision tree, overfitting is normally prevented by pruning, which involves a trade-off between precision and simplicity. Leaving the tree unpruned results in greater complexity, extra work, and higher resource use. The parameters of a random forest are essentially the same as those of a decision tree classifier.
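The majority-vote step of a random forest can be shown in isolation. In the sketch below, the three "trees" are hypothetical one-rule classifiers with invented thresholds; a real forest would learn each tree from a bootstrap sample of the training data.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, float]
Tree = Callable[[Sample], str]


def forest_predict(trees: List[Tree], x: Sample) -> str:
    """Return the class chosen by the majority of the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical weak learners (assumption): one threshold rule each.
trees: List[Tree] = [
    lambda x: "leakage" if x["rms"] > 0.5 else "normal",
    lambda x: "leakage" if x["kurtosis"] > 4.0 else "normal",
    lambda x: "leakage" if x["peak"] > 1.2 else "normal",
]

sample = {"rms": 0.8, "kurtosis": 3.0, "peak": 1.5}
prediction = forest_predict(trees, sample)
```

Two of the three rules vote for a leak on this sample, so the ensemble overrides the single dissenting tree.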
Three different node types are formed while constructing a decision tree, i.e.,
- The root node is the node with no input link and zero or more output links;
- Internal nodes have one input link and two or more output links;
- Leaf nodes are the end nodes that have exactly one input link and no output link.
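The three node types can be made concrete with a small binary tree. The sketch below is an assumption-laden illustration: the `Node` class, the feature indices, and the thresholds are hypothetical, and each internal node is restricted to two output links for simplicity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A tree node: internal nodes hold a test, leaf nodes hold a label."""
    feature: Optional[int] = None      # index of the feature to test
    threshold: Optional[float] = None  # go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, x: Sequence[float]) -> str:
    """Follow output links from the root until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Root (no input link) -> one internal node -> three leaves (no output link).
root = Node(
    feature=0, threshold=0.5,
    left=Node(label="normal"),
    right=Node(feature=1, threshold=4.0,
               left=Node(label="normal"),
               right=Node(label="leakage")),
)
```

Calling `predict(root, [0.8, 5.0])` walks root, internal node, then the right-hand leaf.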
Neural networks can be used for more complex models, including multi-class classification. They are inspired by the brain, which is a network of neurons. The neuron model consists of inputs with input weights, a hidden layer, and an output (hypothesis). Neural networks interpret information through a form of machine perception, labeling, or clustering of the input. The patterns they recognize are numerical and contained in vectors, into which all real-world data, whether images, sound, text, or time series, must be translated. A neural network is a collection of “neurons” connected by “synapses”. Hidden layers are necessary when the network has to learn something complicated, contextual, or non-obvious, such as image recognition. In a network diagram, the circles represent the neurons, and the lines represent the synapses. Synapses take the input and multiply it by a weight. The neurons add the outputs from all synapses and apply an activation function.
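The weighted-sum-then-activation behavior of the neurons can be sketched as a forward pass through one hidden layer. The weights, biases, layer sizes, and the choice of a sigmoid activation below are assumptions for illustration and do not reflect the network trained in this study.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float],
          activation: Callable[[float], float]) -> List[float]:
    """One layer: each neuron sums its weighted inputs, then activates."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical parameters (assumption): 2 inputs -> 2 hidden -> 1 output.
W1 = [[0.5, -0.5], [1.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]
b2 = [0.0]


def forward(x: Sequence[float]) -> float:
    """Propagate one input vector through the hidden and output layers."""
    hidden = dense(x, W1, b1, sigmoid)
    return dense(hidden, W2, b2, sigmoid)[0]


output = forward([1.0, 1.0])
```

Training would adjust `W1`, `b1`, `W2`, and `b2` so that `output` approaches the desired label; only the forward computation is shown here.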
2.6. Performance Metrics
In order to evaluate the performance of the proposed method in comparison to the reference method, metrics such as accuracy, precision, and recall were employed. Equations (1)–(3) were used to determine these metrics:
In this context, TPa, FPa, and FNa refer to the true-positive, false-positive, and false-negative results, respectively, obtained from the features that are representative of class a; na represents the total number of samples from class a; A represents the overall number of classes in the dataset; and N denotes the total number of samples across the testing sets.
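One common way to compute these metrics from per-class counts is sketched below. The macro-averaged form used for precision and recall (an unweighted mean over the A classes) and the accuracy as the sum of true positives divided by N are assumptions; the exact form of Equations (1)–(3) may differ, for example by weighting each class by na. The function names and the toy labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: List[str]) -> Dict[str, Tuple[int, int, int]]:
    """Return (TP, FP, FN) for each class a."""
    counts = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def evaluate(y_true: Sequence[str],
             y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Accuracy, macro precision, macro recall (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, classes)
    n_total = len(y_true)              # N: all test samples
    n_classes = len(classes)           # A: number of classes
    accuracy = sum(tp for tp, _, _ in counts.values()) / n_total
    precision = sum(tp / (tp + fp) if tp + fp else 0.0
                    for tp, fp, _ in counts.values()) / n_classes
    recall = sum(tp / (tp + fn) if tp + fn else 0.0
                 for tp, _, fn in counts.values()) / n_classes
    return accuracy, precision, recall


# Toy labels (assumption): one normal sample is misclassified as leakage.
y_true = ["normal"] * 4 + ["leakage"] * 4
y_pred = ["normal"] * 3 + ["leakage"] * 5
accuracy, precision, recall = evaluate(y_true, y_pred)
```

With a single misclassification out of eight samples, the three values differ, which shows why accuracy alone does not describe per-class behavior.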
4. Conclusions
This article presents a machine learning-based platform for detecting and localizing pipeline leaks using acoustic emission (AE) technology. By extracting various statistical measures from AE signals and using them as features to train machine learning models, the platform can accurately identify and locate leaks in pipelines. In order to preserve the characteristics of both burst and continuous-type emissions, a sliding window with an adaptive threshold was used, allowing for real-time data collection and analysis. The article also presents an evaluation of the proposed platform using four datasets that contain water and gas leaks at different pressures, and various machine learning classifiers, such as neural networks, decision trees, random forests, and k-nearest neighbors. An overall classification accuracy of 99% was achieved, indicating that the proposed platform is a reliable and effective solution for pipeline leak detection and localization. The severe consequences of pipeline leaks include wasted resources, health risks, distribution downtime, and economic losses, so it is important to develop efficient leak detection systems. The success of the proposed platform in detecting and localizing leaks with high accuracy provides a strong indication of its potential for use in real-world applications. The results also show that AE technology is a promising solution for detecting pipeline leaks, as it is capable of leak diagnosis. The current study is capable of detecting a leak in the pipeline and identifying its size. However, the classification model cannot predict the condition of the pipeline together with the pressure and the transportation medium. For this reason, in the future, a classification model can be developed that predicts the condition of the pipeline along with the pressure and transportation medium.