A Sparse Autoencoder-Based Unsupervised Scheme for Pump Fault Detection and Isolation

Liang, Xiaoxia; Duan, Fang; Bennett, Ian; Mba, David

doi:10.3390/app10196789


¹ School of Engineering, London South Bank University, London SE1 0AA, UK

² Technology Manager Services, Shell Research Ltd., Floor 21, London Shell Centre, London SE1 7NA, UK

³ Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK

* Authors to whom correspondence should be addressed.

Appl. Sci. 2020, 10(19), 6789; https://doi.org/10.3390/app10196789

Submission received: 2 September 2020 / Revised: 22 September 2020 / Accepted: 23 September 2020 / Published: 28 September 2020

(This article belongs to the Special Issue Condition Monitoring and Their Applications in Industry)


Abstract:

Pumps are among the most critical machines in petrochemical processes. Condition monitoring of such machines and detecting faults at an early stage are crucial for reducing downtime in the production line and improving plant safety, efficiency and reliability. This paper develops a fault detection and isolation scheme based on an unsupervised machine learning method, the sparse autoencoder (SAE), and evaluates the model on industrial multivariate data. The Mahalanobis distance (MD) is employed to calculate the statistical difference of the residual outputs between monitoring and normal states and is used as a system-wide health indicator. Furthermore, fault isolation is achieved by a reconstruction-based two-dimensional contribution map, in which the variables with larger contributions are responsible for the detected fault. To demonstrate the effectiveness of the proposed scheme, two case studies are carried out based on a multivariate data set from a pump system in an oil and petrochemical factory. The classical principal component analysis (PCA) method is compared with the proposed method, and the results show that the SAE performs better than PCA in fault detection and can effectively isolate the abnormal variables, which helps trace the root cause of the detected fault.

Keywords:

sparse autoencoders; unsupervised learning; multivariate data; fault detection; pump

1. Introduction

As an important component in the oil and petrochemical industry, pumps are widely used in different sectors, including production lines, transportation processes and refineries. According to the investigations in [1], pumps are responsible for the most equipment failures in a petrochemical plant. A minor fault in a pump can gradually develop into a severe one, which threatens the safety and productivity of the plant. Therefore, detecting faults in pumps at an early stage is of great importance for engineers and managers.

The fault detection of pumps belongs to the field of “process monitoring and fault detection” (PMFD) [2,3]. The genesis of PMFD strategies can be traced back to the advent of the basic Shewhart control charts in the 1930s [4]. Control charts have been used successfully to date for monitoring a single variable. They are most effective for detecting large shifts in the measured variables, which can indicate the health condition of the process or equipment being monitored; however, they are insensitive to smaller shifts, which may indicate an incipient fault in the process or equipment concerned [5]. This led to the development of improved control charts (i.e., the cumulative sum (CUSUM) control chart and the exponentially weighted moving average (EWMA) control chart) in the 1950s, as well as model-based and data-based approaches.

The model-based PMFD approach develops explicit mathematical models to detect faults in the facility concerned. The two main approaches of this kind for pump fault detection are state-observer-based methods and models based on parameter estimation [3]. However, in practice, model-based approaches often fail when monitoring complex equipment, because it is difficult to model the multiple couplings among system parameters accurately [6].

By contrast, data-driven approaches infer faults directly from the monitored variables, instead of using physical or accurate mathematical models. A distinct feature of data-driven approaches is that no prior information about the system is necessary, which makes them more suitable for fault detection in complex systems. Prominent among these approaches are the statistical-based [7], frequency- and time–frequency-analysis-based [8,9,10], and artificial intelligence (AI)-based PMFD methods [11,12,13].

The statistical-based PMFD methods started from the combination of control charts (i.e., Shewhart, CUSUM and EWMA control charts) with the R chart, the S chart, and various other charts for single-variable monitoring [7]. With the advancement of data acquisition systems, measurements of a number of important process variables are now available. Hotelling (1947) [14] paved the way for multivariate statistical process control by introducing the T² statistic, which became the harbinger of all the multivariate statistical process control methodologies developed since. The squared prediction error (SPE), also known as the Q statistic [15], was also developed for multivariate fault detection. Frequency- and time–frequency-analysis-based methods are very popular in pump fault detection. These methods have proved very effective in analysing vibration and acoustic emission signals [8,9,10].

The availability of cost-effective data acquisition systems and great advancements in machine learning techniques have led to the application of artificial intelligence (AI) in PMFD, namely AI-based strategies [16,17]. A few papers can be found on AI-based PMFD for pumps, and most of them were tested on simulated or experimental data. Zouari et al. [11] created a real-time fault detection tool for a centrifugal pump using a multi-layer perceptron neural network and a fuzzy techniques system based on vibration measurements. Using experimental data, the system was able to detect fault modes including partial flow rates, loosening of front/rear pump attachments, misalignment, cavitation, and air injection on the inlet. Rajakarunakaran et al. [12] developed two different artificial neural network approaches for the fault detection of a centrifugal pumping system: the feed-forward network with the back propagation algorithm and the binary adaptive resonance network. Both models performed well on simulated data. Eom et al. [13] applied convolutional neural networks to detect refrigerant leakage in heat pumps, and the proposed method proved effective on experimental data.

However, some challenges in the fault detection of pumps still remain:

Based on the literature review, apart from conventional fault detection models (Shewhart control charts and their variants, principal component analysis (PCA), etc.), state-of-the-art fault detection methods (e.g., multi-layer perceptron neural network and fuzzy techniques system [11], binary adaptive resonance network [12], convolutional neural networks [13], SAE, etc.) worked well on simulated and experimental data but lack tests in real industrial applications.
Most of the aforementioned references detect faults in specific components or of specific fault types, and lack a global health indicator from a system-wide perspective. Currently, there is a general trend to construct system-wide health monitoring systems using multivariate industrial field data.
During the operation of modern industrial machines, a huge volume of data is collected and archived by the monitoring systems. However, most data belong to normal operating conditions, while faulty data are usually rare and sometimes cannot be obtained. Instead of a classification model, this situation requires an anomaly detection model that is able to take full advantage of the large volume of healthy data and detect faults effectively.
Again, based on the literature reviews [5,18,19] and input from our sponsor (Shell), conventional fault detection systems in process industries are either insensitive to incipient faults or generate a large number of false alarms, which may incur high maintenance costs and reduce the availability of the equipment. Therefore, there is a need for an effective fault detection model with a lower false alarm rate and a higher fault detection rate. Meanwhile, it is necessary to identify the variables that are most strongly related to the detected fault, because this fault isolation can help determine the root cause of the fault and provide decision support for operators to adjust pump operation in time and take maintenance actions if necessary.

Among the artificial intelligence (AI)-based fault detection approaches, unsupervised machine learning methods are of great interest. In this strategy, an anomaly/fault detection model is trained using data collected when the system is in a healthy state. When new data come in, they are compared to the model’s prediction, and an alarm is raised to indicate an anomalous condition if an intolerable deviation is found. Principal component analysis (PCA) is one of the most widely used unsupervised machine learning methods. It projects high-dimensional data onto a lower-dimensional subspace to distinguish “normal” data from “abnormal” data. PCA has been widely applied in process control [7], fault detection and system risk assessment [20], network traffic anomaly detection [21], network intrusion detection [22], etc. More recently, the autoencoder (AE) and its variants [23], which can automatically learn features from unlabeled data, have been gaining popularity for anomaly detection. An autoencoder is a special type of neural network that includes an encoder and a decoder. The encoder layer(s) work on a principle similar to that of PCA, with the difference that PCA is linear while the autoencoder can be nonlinear. The decoder layer(s) of the AE reconstruct the input. A notable feature of the AE approach is that it can learn arbitrary relationships among different sensor variables in both linear and nonlinear cases and is therefore more flexible than existing linear monitoring methods [24]. However, like many neural networks, the AE suffers from the overfitting problem [25]. Thus, several improved AEs have been derived, e.g., the multi-level-denoising AE [24] and the sparse autoencoder (SAE) [26]. Among them, the SAE is an effective one, which addresses the overfitting problem by putting a sparsity constraint on the hidden units [23,26]. A successful application of the SAE for anomaly detection in turbomachinery can be found in [27].

To address the aforementioned challenges, we developed a fault detection and isolation scheme based on the unsupervised machine learning method, SAE. The Mahalanobis distance (MD) is employed to measure the distance between the condition monitoring data and the data under healthy conditions. The MD merges multiple variables into one system-wide health indicator. Fault isolation is achieved by using a two-dimensional contribution map, in which the variables with larger contributions are responsible for the fault. The performance of the fault detection model is evaluated by the receiver operating characteristics (ROC) curve and the area under the ROC curve (AUC value), considering both the false alarm rate and the fault detection rate.

The remainder of the paper is organized as follows. Section 2 explains the fault detection and isolation scheme and the algorithms employed in this paper. Then, the experimental data from an industrial factory are explained in Section 3. The performance of the fault detection models on the multivariate industrial data is evaluated and discussed through two case studies in Section 4. Finally, conclusions are drawn in Section 5.

2. Fault Detection and Isolation Scheme

The proposed fault detection scheme is summarized in Figure 1, which consists of two phases: (1) offline training; and (2) online fault detection. The two phases and techniques applied in these two phases are explained in detail in the following sections.

In general, the fault detection scheme is based on the comparison and analysis of the reconstruction error between the actual monitoring data and the reconstructed output obtained from the trained normal reference model. Changes in the reconstruction error could indicate the occurrence of faults. The reconstruction errors of the monitored variables are shown on a two-dimensional contribution map, which stacks multiple observations (time points) into one image to clearly illustrate the contribution of the variables over the entire faulty data time series. On the contribution map, variables with a high reconstruction error are identified as anomalous or faulty.

2.1. Offline Model Training

2.1.1. Data Pre-Processing

As shown in Figure 1, the monitoring data are first pre-processed, and this step applies to both the offline training and the online fault detection phases. The pre-processing filters out erroneous monitoring data, which are common in practical measurements because of communication or instrumentation failures [28]. Note that the pre-processing rules can differ from one data set to another; the rules for our experimental data are explained in Section 3.

Moreover, a data standardization process is applied to the monitoring data to scale the data to an equal range. Assume that $\hat{X} = \{\hat{x}_{i};\ i = 1, \dots, D\}$ ($\hat{x}_{i} \in \mathbb{R}^{n}$ for each measurement $i$), with $D$ different measurements, is the monitoring dataset, and that $X = \{x_{i};\ i = 1, \dots, D\}$ ($x_{i} \in \mathbb{R}^{n}$ for each measurement $i$) is the dataset after the standardization process. The transformation from $\hat{X}$ to $X$ is given in Equation (1).

$$x_{i} = \frac{\hat{x}_{i} - \mu_{i}}{\sigma_{i}}, \quad i = 1, \dots, D \qquad (1)$$

where $\mu_{i}$ is the mean and $\sigma_{i}$ is the standard deviation of the input $\hat{x}_{i}$. Note that $\mu_{i}$ and $\sigma_{i}$ were calculated from the training data and then applied to both the training and the monitoring data.
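As an illustration of Equation (1), the following minimal Python sketch fits the mean and standard deviation on training data only and reuses them for the monitoring data. The function names, the toy values and the use of the population standard deviation are assumptions made for this example and are not taken from the paper; every variable is also assumed to have a non-zero standard deviation.

```python
import math

def fit_scaler(train_rows):
    """Return per-variable means and standard deviations from training rows only."""
    n_obs = len(train_rows)
    n_vars = len(train_rows[0])
    means = [sum(row[k] for row in train_rows) / n_obs for k in range(n_vars)]
    stds = []
    for k in range(n_vars):
        # Population variance (divide by n_obs); an assumption of this sketch.
        var = sum((row[k] - means[k]) ** 2 for row in train_rows) / n_obs
        stds.append(math.sqrt(var))
    return means, stds

def standardize(rows, means, stds):
    """Apply Equation (1) with statistics fitted on the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical values: two variables (e.g. speed and discharge pressure).
train = [[100.0, 226.0], [102.0, 228.0], [104.0, 230.0]]
monitor = [[106.0, 224.0]]

means, stds = fit_scaler(train)
train_z = standardize(train, means, stds)
monitor_z = standardize(monitor, means, stds)  # reuses the training statistics
```

Reusing the training statistics keeps the monitoring data on the same scale as the healthy reference, so a shift in the monitored variables is not hidden by rescaling.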

2.1.2. Sparse Autoencoder

The sparse autoencoder (SAE) is a special unsupervised feedforward neural network. A representative structure of the SAE is presented in Figure 2. The SAE network includes two parts: an encoder and a decoder. The encoder connects the input layer to the hidden layer, with the weight matrix and the bias of this part represented by $W^{(1)}$ and $b^{(1)}$, respectively. The decoder connects the hidden layer to the output layer, with the corresponding weight matrix $W^{(2)}$ and bias $b^{(2)}$.

The network tries to reconstruct the input vector in the output layer, as shown in Equation (2).

$$\tilde{X} = H_{W, b}(X) \approx X \qquad (2)$$

where $X \in \mathbb{R}^{D \times 1}$ denotes the input vector and $\tilde{X}$ is the output vector. $H_{W, b}(X)$ is the nonlinear function of the SAE, which predicts the output $\tilde{X}$ from the input $X$ using the parameters $W$ and $b$.

The SAE is trained by minimizing the cost function [26] in Equation (3). In this paper, the L-BFGS algorithm and back propagation [26,29] were employed to train the network by minimizing the cost function.

$$E(W, b) = \frac{1}{D} \sum_{j = 1}^{D} \left[ \frac{1}{2} \left\| H_{W, b}(x_{j}) - x_{j} \right\|^{2} \right] + \frac{\lambda_{1}}{2} \sum_{l = 1}^{L - 1} \sum_{i = 1}^{s_{l}} \sum_{j = 1}^{s_{l + 1}} \left( W_{j i}^{(l)} \right)^{2} + \beta \sum_{H = 1}^{s_{2}} \mathrm{KL}(\rho \,\|\, \tilde{\rho}_{H}) \qquad (3)$$

The first part of Equation (3) is an average sum-of-squares error term, which is used for minimizing the error between the input data $x$ and the output data $\tilde{x}$. $x_{j}$ represents the $j$-th training input variable, and $D$ is the total number of input variables. The decoder function of the $j$-th training variable is given by $H_{W, b}(x_{j}) = f(W^{(2)} a + b^{(2)})$, where $a = f(W^{(1)} x_{j} + b^{(1)})$ is the activation of the hidden layer, $W^{(1)}, W^{(2)}$ are weights, $b^{(1)}, b^{(2)}$ are bias units, and $f(x) = (1 + \exp(-x))^{-1}$ is the logistic sigmoid function.

The second part of the cost function is the regularization term, also called the weight decay term, which is used to avoid over-fitting. Here, $\lambda_{1}$ is the weight decay factor and $L$ is the number of layers in the network; in Figure 2, $L = 3$. $s_{l}$ represents the number of units in layer $l$, excluding the bias unit. $W_{j i}^{(l)}$ is the weight associated with the connection between the $i$-th unit in layer $l$ and the $j$-th unit in layer $l + 1$.

The last part of the cost function is the sparsity penalty term, which imposes a sparsity constraint on the hidden units. Here, $\beta$ is the weight of this term and $s_{2}$ is the number of neurons in the hidden layer. $\mathrm{KL}(\rho \,\|\, \tilde{\rho}_{H})$ measures how different two distributions are, and this term is given by the Kullback–Leibler divergence in Equation (4):

$$\mathrm{KL}(\rho \,\|\, \tilde{\rho}_{H}) = \rho \log \frac{\rho}{\tilde{\rho}_{H}} + (1 - \rho) \log \frac{1 - \rho}{1 - \tilde{\rho}_{H}} \qquad (4)$$

where $\rho$ is the sparsity parameter, which is usually small and close to zero, $H$ is the index of a hidden unit in the SAE, and $\tilde{\rho}_{H}$ is the average activation of hidden unit $H$, given by $\tilde{\rho}_{H} = \frac{1}{D} \sum_{j = 1}^{D} \left[ a_{H}^{(2)}(x_{j}) \right]$. $a_{H}^{(2)}(x_{j})$ represents the activation of hidden unit $H$ in the SAE for the input $x_{j}$.
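To make the three terms of Equation (3) concrete, the sketch below evaluates the cost for a single-hidden-layer network ($L = 3$) in plain Python. It is only an illustration under stated assumptions: the function names, the toy weights and the hyperparameter values are hypothetical, and the L-BFGS optimisation and back propagation used in the paper are not shown.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, v):
    """Compute f(W v + b) for a weight matrix W (list of rows) and bias vector b."""
    return [sigmoid(sum(w * x for w, x in zip(row, v)) + bi)
            for row, bi in zip(W, b)]

def kl(rho, rho_hat):
    """Equation (4): KL divergence between target and average activation."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def sae_cost(X, W1, b1, W2, b2, lam, beta, rho):
    """Equation (3): reconstruction error + weight decay + sparsity penalty."""
    D = len(X)
    hidden = [layer(W1, b1, x) for x in X]      # encoder activations
    recon = [layer(W2, b2, a) for a in hidden]  # decoder outputs
    error = sum(0.5 * sum((r - v) ** 2 for r, v in zip(rx, x))
                for rx, x in zip(recon, X)) / D
    decay = 0.5 * lam * sum(w ** 2 for W in (W1, W2) for row in W for w in row)
    rho_hat = [sum(a[h] for a in hidden) / D for h in range(len(W1))]
    sparsity = beta * sum(kl(rho, r) for r in rho_hat)
    return error + decay + sparsity

# Hypothetical toy problem: 2 inputs, 1 hidden unit, all weights zero.
X = [[0.2, 0.8], [0.6, 0.4]]
W1, b1 = [[0.0, 0.0]], [0.0]
W2, b2 = [[0.0], [0.0]], [0.0, 0.0]
cost = sae_cost(X, W1, b1, W2, b2, lam=1e-3, beta=3.0, rho=0.5)
```

With all weights at zero, every unit outputs 0.5, so the weight decay term vanishes and, for the target $\rho = 0.5$ chosen here, so does the sparsity penalty; only the reconstruction error remains.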

2.1.3. Residual Evaluation and Threshold Calculation

With the data under a healthy condition, a health reference model can be trained. Then, we can calculate the multivariate residuals $E$ between the input variables $X$ and the reconstructed outputs $\tilde{X}$ as follows:

$$E = \tilde{X} - X \qquad (5)$$

To detect the existence of a fault, an integrated monitoring indicator needs to be calculated from the multivariate residuals. The Euclidean distance or the Mahalanobis distance (MD) is typically used. The MD is a unitless distance measure that takes into account the correlations among variables. It was proposed by the Indian statistician Mahalanobis to represent the covariance distance of the data. The MD provides a univariate distance value for multivariate data and is therefore applied in anomaly detection models. In the past, the MD was successfully applied to anomaly detection in wind turbine data [6,30,31].

In this paper, a robust MD [6,32] is employed, as shown in Equation (6).

$$h = \sqrt{(E - \hat{\mu})\, \mathrm{MCD}^{-1} (E - \hat{\mu})^{T}} \qquad (6)$$
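The sketch below shows how Equation (6) could be evaluated for a single residual vector once a location estimate $\hat{\mu}$ and a scatter matrix are available. It assumes that MCD denotes a robust scatter estimate (presumably the minimum covariance determinant estimator, which is not implemented here); the values of `mu_hat` and `cov_hat` are hypothetical stand-ins, and the helper names are chosen for this example only.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def mahalanobis(e, mu, cov):
    """Equation (6) for one residual vector e, given location mu and scatter cov."""
    diff = [ei - mi for ei, mi in zip(e, mu)]
    z = solve(cov, diff)  # z = cov^{-1} (e - mu), without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

# Hypothetical location and scatter estimates for two residual variables.
mu_hat = [0.0, 0.0]
cov_hat = [[4.0, 0.0], [0.0, 1.0]]
h = mahalanobis([2.0, 1.0], mu_hat, cov_hat)
```

Solving the linear system instead of inverting the scatter matrix explicitly is a common numerical choice; it gives the same distance.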

Next, the fault detection threshold $d$ is determined using the statistical properties of $h$. In other words, the fault detection threshold $d$ can be obtained from the probability density function (PDF) of $h$ for a given confidence level $\alpha$ by solving Equation (7):

$$P(h > d) = \int_{d}^{\infty} p(h)\, \mathrm{d}h = \alpha \qquad (7)$$

where $p(h)$ is the PDF of $h$. In this paper, the kernel density estimation (KDE) method is adopted for distribution fitting. The KDE method is a well-established approach in statistical distribution fitting and has been successfully applied to the field of process monitoring and fault detection [6]. According to the KDE method, $p(h)$ can be written as:

$$p(h) = \frac{1}{N \sigma} \sum_{i = 1}^{N} K\left( \frac{h - h_{i}}{\sigma} \right) \qquad (8)$$

where $N$ is the total number of $h$ values, $K(\cdot)$ is the kernel function and $\sigma$ is the bandwidth. The selection of the optimal value for $\sigma$ is described in [33]. Here, the Gaussian kernel is used.
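As a sketch of how a threshold $d$ satisfying $P(h > d) = \alpha$ could be obtained from a Gaussian KDE, the Python code below averages the upper-tail probabilities of the individual kernels and solves for $d$ by bisection. The bandwidth value, the toy training distances and the function names are assumptions for illustration; the bandwidth selection rule of [33] is not reproduced.

```python
import math

def kde_upper_tail(d, samples, bandwidth):
    """P(h > d) under a Gaussian KDE: mean of the kernels' upper-tail areas."""
    scale = bandwidth * math.sqrt(2.0)
    tails = [0.5 * math.erfc((d - hi) / scale) for hi in samples]
    return sum(tails) / len(samples)

def kde_threshold(samples, bandwidth, alpha, tol=1e-9):
    """Find d with P(h > d) = alpha by bisection (the tail decreases with d)."""
    lo = min(samples) - 10.0 * bandwidth  # tail probability is close to 1 here
    hi = max(samples) + 10.0 * bandwidth  # tail probability is close to 0 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_upper_tail(mid, samples, bandwidth) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical MD values from a healthy training period.
h_train = [0.8, 1.0, 1.1, 1.3, 1.4, 1.6, 2.0, 2.1]
d = kde_threshold(h_train, bandwidth=0.3, alpha=0.01)
```

Because each Gaussian kernel has a closed-form tail area, no numerical integration of Equation (8) is needed in this sketch.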

2.2. Online Fault Detection Phase

In the online fault detection phase, the monitoring data are pre-processed in the same way as in the offline training phase and then fed to the SAE model that was trained offline.

2.2.1. Fault Detection

The health indicator is calculated as

$$h_{M} = \sqrt{(E_{M} - \hat{\mu})\, \mathrm{MCD}^{-1} (E_{M} - \hat{\mu})^{T}} \qquad (9)$$

$$E_{M} = Y - \bar{Y} \qquad (10)$$

where $\hat{\mu}$ and $\mathrm{MCD}^{-1}$ are adopted directly from the training phase, as used in Equation (6). $E_{M}$ is the residual between the actual measurement values $Y$ and the reconstructed output $\bar{Y}$ obtained using the trained SAE fault detection model.

A fault is detected when the real-time MD value $h_{M}$ exceeds the predefined threshold $d$.

2.2.2. Fault Isolation

After a fault is detected, it is necessary to identify the variables most relevant to the fault. This can be achieved by a reconstruction-based contribution map, in which the variables with larger contributions are responsible for the fault.

The $Q$ statistic (also called the squared prediction error, SPE) is widely used in process control for condition monitoring data [34,35,36]. The traditional $Q$ statistic contribution plot can be calculated by Equation (11) [37]:

$$Q_{i} = (x_{i} - \hat{x}_{i})^{2} \qquad (11)$$

where $\hat{x}_{i}$ is the value of $x_{i}$ reconstructed by the SAE model.

The contribution plot is a one-dimensional plot, which only examines the contributions at one time point (one observation), so multiple contribution plots are needed to illustrate multiple observations in time series data. In contrast, a two-dimensional contribution map [38] stacks multiple observations into one image to clearly illustrate the contribution of the variables over the entire faulty data time series, which enables the fast identification of faulty variables within large heterogeneous data sets. Therefore, a two-dimensional $Q$ statistic contribution map is applied.
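A two-dimensional contribution map of this kind can be sketched as a grid of squared reconstruction errors with one row per observation and one column per variable. In the Python example below, the variable names, the toy data and the ranking helper `top_contributors` are hypothetical additions for illustration; the paper presents the map as an image rather than as a ranking.

```python
def q_contribution_map(X, X_rec):
    """Return a (time x variable) grid of Q contributions, Equation (11)."""
    return [[(x - xr) ** 2 for x, xr in zip(row, row_rec)]
            for row, row_rec in zip(X, X_rec)]

def top_contributors(qmap, names, k=2):
    """Rank variables by their total contribution over the whole time window."""
    totals = [sum(row[j] for row in qmap) for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: totals[j], reverse=True)
    return [names[j] for j in order[:k]]

# Hypothetical standardized measurements and their reconstructions.
names = ["speed", "bearing_1_temp", "discharge_pressure"]
X = [[0.1, 0.2, 0.0], [0.0, 1.5, 0.4], [0.1, 2.0, 0.6]]
X_rec = [[0.1, 0.1, 0.0], [0.1, 0.3, 0.1], [0.0, 0.4, 0.1]]

qmap = q_contribution_map(X, X_rec)
suspects = top_contributors(qmap, names)
```

Each row of `qmap` corresponds to one traditional contribution plot; stacking the rows gives the image-like map described above.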

2.3. Performance Metrics

To evaluate the performance of the model for fault detection, the receiver operating characteristics (ROC) curve and the resulting quantification metric, the area under the ROC curve (AUC) [6], are employed. The ROC curve is created by plotting the fault detection rate (FDR) on the Y-axis against the false alarm rate (FAR) on the X-axis under different threshold levels. FDR is defined as the number of fault samples that are detected divided by the total number of fault samples, and FAR as the number of normal samples that are flagged as faults divided by the total number of normal samples [39]. Both FDR and FAR fall in the range [0, 1].

An illustration of the ROC curve and the AUC value is shown in Figure 3. On the ROC plot, the points situated at the top left corner have a higher FDR value and a lower FAR value, and hence are regarded as having better performance than other points on the ROC curve. The ROC curve and the AUC metric describe the trade-off between FDR and FAR. A good fault detector should yield a high FDR and a low FAR on an ROC curve and, accordingly, a large AUC value.
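The following sketch computes ROC points and the AUC from health indicator values of labelled normal and fault samples. The sample values and function names are hypothetical; the threshold is swept over the observed scores, and a sample is counted as an alarm when its score is at least the current threshold, which is an implementation choice of this example.

```python
def roc_points(h_normal, h_fault):
    """Sweep the threshold over all observed scores; return (FAR, FDR) pairs."""
    thresholds = sorted(set(h_normal) | set(h_fault), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: no alarms at all
    for d in thresholds:
        far = sum(h >= d for h in h_normal) / len(h_normal)
        fdr = sum(h >= d for h in h_fault) / len(h_fault)
        points.append((far, fdr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical MD values for normal and faulty samples.
h_normal = [0.5, 0.8, 1.0, 1.2]
h_fault = [1.1, 2.0, 2.5, 3.0]

points = roc_points(h_normal, h_fault)
area = auc(points)
```

In this toy case 15 of the 16 fault/normal pairs are ranked correctly, which matches the AUC of 0.9375 returned by `auc`.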

3. Experimental Data Description

High-pressure injection pumps are widely used in the oil and petrochemical industry for oil transportation, lift and injection. Centrifugal pumps are the most common pump type. They are typically operated under high rotating speed, high pressure, and high loading conditions and thus are likely subjected to performance degradations.

The pumps are used to move fluid through mechanical actions. To be specific, the pump’s impeller rotates within the housing, reduces pressure at the inlet, and creates a vacuum. This motion then drives fluid to the outside of the pump’s housing, which increases the pressure high enough to send it outside the discharge [40].

The working conditions of the pump are linked to its mechanical action and control methods. Several methods have been applied to control the pump to meet the end users' requirements. The most common include throttling control, rotational speed control, and the use of a sequence circuit [41]. These control methods adjust the flow, pressure or shaft speed of a pump to achieve the expected performance. Consequently, in this paper, the flow, pressure and speed are used to evaluate the pump’s working conditions. The training data are selected under the same working condition as the test data.

The data used in this paper were obtained from a multivariate condition monitoring system mounted on a high-pressure injection pump at an oil and petrochemical plant. The measurement variables are listed in Table 1. The data cover the period from 18 September 2012 to 8 September 2017, captured at a sampling rate of one sample per hour.

In practice, the raw data from the monitoring system cannot be used directly, as they often contain erroneous or missing data due to communication system failures, sensor faults, maintenance or repair actions, etc. In order to filter out the errors in the training data, the following rules are applied for data pre-processing:

Filter out all missing values;
Filter out downtime data, where speed is less than 10 rpm;
Filter out all data vectors in which one or more parameters have a value above or below a predefined threshold (viewed as erroneous data). This step aims to delete downtime data and erroneous data due to sensor faults. In this paper, the threshold values were decided based on the manufacturer specifications. For example, all measurements with a discharge pressure lower than 130 bar or greater than 250 bar were filtered out.

After this process, some outliers might still exist in the training data. The influence of such outliers was further eliminated by setting reference fault detection thresholds in the training process of both the SAE and the PCA models using Equation (7).
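The three pre-processing rules above can be sketched as a simple record filter. In the Python example below, the dictionary field names, the use of `None` for a missing value and the sample records are assumptions made for illustration; only the 10 rpm speed limit and the 130 to 250 bar discharge pressure range come from the text.

```python
def keep_record(record, limits, min_speed=10.0):
    """Apply the three filtering rules to one hourly record (a dict)."""
    # Rule 1: drop records with any missing value.
    if any(value is None for value in record.values()):
        return False
    # Rule 2: drop downtime data, where speed is less than 10 rpm.
    if record["speed"] < min_speed:
        return False
    # Rule 3: drop records with any variable outside its allowed range.
    for name, (low, high) in limits.items():
        if not low <= record[name] <= high:
            return False
    return True

# Allowed range in bar for the discharge pressure, as stated in the text.
limits = {"discharge_pressure": (130.0, 250.0)}

# Hypothetical hourly records.
records = [
    {"speed": 100.0, "discharge_pressure": 228.0},  # kept
    {"speed": 0.0, "discharge_pressure": 5.0},      # downtime
    {"speed": 98.0, "discharge_pressure": None},    # missing value
    {"speed": 101.0, "discharge_pressure": 260.0},  # out of range
]
clean = [r for r in records if keep_record(r, limits)]
```

Further variables can be covered by adding their manufacturer limits to `limits`.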

4. Results and Discussion

4.1. Case One: Detection of a Misalignment Fault

In order to demonstrate the effectiveness of the proposed approach, the traditional PCA method is also applied to the data set. In this case, both models (PCA and SAE) for pump anomaly detection were trained on data acquired over roughly a quarter of a year, from 10 March 2013 to 21 June 2013, during which no abnormal events were recorded for the pump. The models were then used to assess the health condition after 22 June 2013. In this period, the pump operated under the following working conditions: speed around 90–105 rpm, suction pressure of 10–12 bar, discharge pressure around 226–230 bar, and flow fluctuating between 850 and 1250 km³/d.

The MD thresholds for PCA and SAE are calculated based on the training data. The results are presented in Figure 4, in which the blue points under the threshold are healthy data, and the red points above the threshold are viewed as anomalous data. The reference thresholds ($d_{\mathrm{PCA}}$ and $d_{\mathrm{SAE}}$) for these two models were calculated based on Equation (7) with confidence level $\alpha = 0.01$ [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rate exceeds 95%. For the SAE model, the number of nodes in the hidden layer was set to 11. Later in this case study, the influence of parameter selection is evaluated by ROC and AUC.
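The 95% rule for the PCA model can be illustrated as follows: given the eigenvalues of the covariance matrix of the training data, the number of retained components is the smallest count whose cumulative variance contribution exceeds the target. The eigenvalues and the function name in this Python sketch are hypothetical.

```python
def n_components_for(eigenvalues, target=0.95):
    """Smallest number of leading components whose cumulative variance
    contribution rate exceeds the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, eigenvalue in enumerate(ordered, start=1):
        cumulative += eigenvalue
        if cumulative / total > target:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a five-variable covariance matrix.
eigs = [5.0, 3.0, 1.6, 0.2, 0.2]
k = n_components_for(eigs)
```

Here the first three components explain 96% of the total variance, so three components would be retained.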

The anomaly detection results for the pump from 22 June 2013 to 6 July 2013 are shown in Figure 5. As can be seen, neither the SAE nor the PCA model raised an alarm before 3 July, because no continuous anomalies were found during that period. This is consistent with the fact that no failure was recorded in the historical maintenance log, and that no obvious changes can be observed in the original bearing temperature signals from 22 June to 3 July in Figure 6.

Moreover, it can be observed in Figure 5 that the SAE model generated continuous alarms after 4:00 pm on 3 July, and the PCA model generated continuous alarms after 6:00 am on 4 July. The maintenance log recorded a maintenance action on 7 July for a misalignment fault in the pump system, with four bearing temperatures changing together. The SAE anomaly detection model thus detected the fault four days before the maintenance action was carried out. By contrast, Figure 6 shows no obvious increase in the bearing measurements from 3 July to 4 July. After 4 July, the four bearing temperatures increased together (see Figure 6), and both the SAE and PCA models showed an obvious increase in MD values.

The two-dimensional Q statistic contribution map of PCA from 22 June to 6 July is presented in Figure 7. A contribution map shows the contribution of each input variable to the detected anomaly. For comparison, the contribution of each variable calculated by SAE from 22 June to 6 July is presented in Figure 8.

Figure 8 clearly shows that the changes in the four bearing measurements and the discharge pressure were the main contributors to the increase in the health indicator. The maintenance staff could therefore check for bearing-related faults, including bearing faults, misalignment, rotor unbalance, etc. However, in Figure 7, the PCA model failed to show the abnormality in the temperatures of bearings 1 and 2.

To evaluate and compare the performance of PCA and SAE for fault detection, the ROC curves and AUC values are employed. The number of PCA components and the number of nodes in the hidden layer of the SAE can influence the performance of the fault detection models; therefore, the ROC and AUC are calculated for different values of these parameters. The results are presented in Figure 9. As explained in Section 2.3, a good fault detector should yield a high FDR and a low FAR in the ROC plot and, accordingly, a high AUC value. As can be seen in Figure 9, the two models show similar performance for this case, and the influence of the number of principal components and SAE nodes is not obvious.

4.2. Case Two: Detection of a Misalignment Fault and Bearing Fault

From 1 September 2015 to 11 July 2016, the pump worked under another condition, with speed around 95–105 rpm, discharge pressure around 203–220 bar, and flow fluctuating around 1000–1400 km³/d. The training data were selected by setting thresholds for shaft speed, discharge pressure and flow, to make sure that the selected training data had the same working condition as the monitoring data. Then, the trained models were applied to check the health condition after 12 July 2016.

The MD thresholds calculated during the training stage by the PCA and SAE anomaly detection models are presented in Figure 10a,b, respectively. As shown in the figure, the blue points under the threshold are regarded as healthy data, and the red points are viewed as abnormal data. As in case one, during the training stage, the reference thresholds ($d_{\mathrm{PCA}}$ and $d_{\mathrm{SAE}}$) were calculated via Equation (7), with a confidence level $\alpha = 0.01$ [30]. For the PCA model, the number of principal components is determined when the cumulative variance contribution rate exceeds 95%. For the SAE model, the number of nodes in the hidden layer was set to 11.

The fault detection results from 12 July 2016 to 23 August 2016 are shown in Figure 11, and, for comparison, the bearing measurements are shown in Figure 12. As can be seen in Figure 12, from 12 July to 17 July there were no obvious changes in the original data, and, after 17 July, the temperatures of four bearings increased as a result of a misalignment fault. In Figure 11a,b, both the PCA and SAE models indicated that the pump worked healthily from 12 July to 17 July. The PCA model generated continuous alarms after 22:00 on 18 July. By contrast, the SAE model discovered some anomalies and generated continuous alarms after 16:00 on 17 July.

Contribution maps for the PCA and SAE models are presented in Figure 13 and Figure 14, respectively. In Figure 13, the anomalies in the PCA model are mainly contributed by bearings 1 and 2, and the contribution map failed to indicate changes in the thrust bearings’ temperatures. Figure 14 shows that the anomalies in the SAE model were caused by the temperature changes in four bearings, which is consistent with the observations in Figure 12.

From 20 July to 23 August 2016, it can be seen in Figure 11a,b that continuous anomalous points were detected by both the PCA and SAE models. Both Figure 13 and Figure 14 clearly show that it was the vibration amplitude of bearing 2 that mainly contributed to the fault. This is consistent with the behaviour shown in Figure 12b, where the vibration amplitude of bearing 2 kept increasing in this period.

To evaluate the performance of PCA and SAE for fault detection, the ROC curves and AUC values were calculated and are shown in Figure 15 and Figure 16. The influence of the number of PCA components and the number of nodes in the hidden layer of the SAE on model performance was also explored. Figure 15 shows the ROC curves and average AUC values for the detection of the misalignment, and Figure 16 shows those for the detection of the bearing fault. As can be seen in both figures, the SAE model had a higher AUC value than the PCA model, especially when detecting the misalignment fault.
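
A ROC curve of this kind can be traced by sweeping the detection threshold over the health-indicator values and recording the false alarm rate and fault detection rate at each setting. The sketch below assumes the standard definitions (false alarms over healthy samples, detected faults over faulty samples) and trapezoidal integration for the AUC; the function names and toy scores are ours, not the paper's.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Return (far, fdr) arrays obtained by sweeping a threshold over scores.
    labels: 1/True for faulty samples, 0/False for healthy ones."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    n_fault, n_healthy = labels.sum(), (~labels).sum()
    fdr = np.array([(scores >= t)[labels].sum() / n_fault for t in thresholds])
    far = np.array([(scores >= t)[~labels].sum() / n_healthy for t in thresholds])
    return far, fdr

def auc(far, fdr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(far) * (fdr[1:] + fdr[:-1]) / 2.0))

# Toy scores: one healthy sample scores higher than one faulty sample.
far, fdr = roc_curve_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
area = auc(far, fdr)
```

An AUC of 1 corresponds to a health indicator that ranks every faulty sample above every healthy one; values closer to 0.5 indicate little separation.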

4.3. Discussion

In summary, in these two case studies, the SAE model was able to detect the fault before the measured signals showed obvious changes. Early detection of pump faults enables operators to schedule maintenance at the optimum time and thereby increase the safety and reliability of the system and reduce energy consumption and service and maintenance costs. Compared with the PCA model, the SAE model gave a lower FAR (around 0.1), a higher FDR (over 0.9) and a higher AUC value (over 0.98) in both cases. In particular, when detecting the misalignment fault in case two, the FAR of the PCA model is over 0.22 when its FDR is above 0.8. In contrast, the SAE model maintained good performance, with a high FDR (over 0.9) and a low FAR (lower than 0.1). The results indicate that the proposed SAE-based fault detection scheme is a good alternative to conventional fault detection models in industrial fault detection systems that are insensitive to incipient faults or have high false alarm rates. Furthermore, the SAE performed better in isolating the variables responsible for the fault. It can help to correctly isolate the abnormal variables of the detected fault, assisting operators and data analysts in tracing the root cause and making maintenance planning easier.
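
Statements such as "the FAR is over 0.22 when the FDR is above 0.8" read off one operating point: the lowest false alarm rate at which a target fault detection rate is reached. A minimal sketch of that lookup follows, assuming the standard definitions of FAR and FDR; the helper names and toy data are hypothetical.

```python
def far_fdr(alarms, is_fault):
    """FAR = false alarms / healthy samples; FDR = detected faults / faulty samples."""
    false_alarms = sum(a and not f for a, f in zip(alarms, is_fault))
    detected = sum(a and f for a, f in zip(alarms, is_fault))
    n_fault = sum(bool(f) for f in is_fault)
    n_healthy = len(is_fault) - n_fault
    return false_alarms / n_healthy, detected / n_fault

def far_at_fdr(scores, is_fault, target_fdr):
    """Lower the threshold until FDR >= target_fdr; return (FAR, threshold),
    or None if the target is never reached."""
    for t in sorted(set(scores), reverse=True):
        far, fdr = far_fdr([s >= t for s in scores], is_fault)
        if fdr >= target_fdr:
            return far, t
    return None

scores = [0.1, 0.4, 0.35, 0.8]             # illustrative health-indicator values
is_fault = [False, False, True, True]
point = far_at_fdr(scores, is_fault, 0.8)
```

Because both rates can only grow as the threshold is lowered, the first threshold that meets the FDR target is also the one with the lowest FAR.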

5. Conclusions

In this paper, a fault detection and isolation scheme based on SAE was developed and applied to an industrial multivariate monitoring data set. The robust Mahalanobis distance was applied to combine multiple variables into one system-wide health indicator. The two-dimensional contribution map was calculated to help isolate the variables responsible for the fault. In summary, the contributions of this paper are as follows: (1) A state-of-the-art unsupervised fault detection model, SAE, was applied to industrial multivariate process data, with each step (data pre-processing, parameter setting, and model evaluation) explained in detail. Our experience with processing the industrial data set can benefit relevant readers. (2) The proposed fault detection scheme based on SAE and Mahalanobis distance was shown to be more effective in detecting faults than the traditional PCA method, with a lower false alarm rate, a higher fault detection rate and higher AUC values; it therefore offers a good alternative for complicated industrial process systems that experience high false alarm rates. (3) The proposed fault detection and isolation scheme can detect faults at their incipient stage and isolate them accurately, which can assist operators and data analysts in inferring the fault cause and make maintenance planning easier. This scheme can also be applied to the fault detection and isolation of other process machines, such as compressors and turbines.

Author Contributions

Conceptualization, X.L. and F.D.; investigation and methodology, X.L. and F.D.; data analysis, X.L.; validation, F.D. and D.M.; writing—original draft preparation, X.L.; writing—review and editing, F.D., D.M. and I.B.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Azadeh, A.; Ebrahimipour, V.; Bavar, P. A pump FMEA approach to improve reliability centered maintenance procedure: The case of centrifugal pumps in onshore industry. In Proceedings of the 6th WSEAS International Conference on FLUID MECHANICS (FLUIDS09), Ningbo, China, 10–12 January 2009; pp. 38–45.
2. Isermann, R. Process fault detection based on modeling and estimation methods—A survey. Automatica 1984, 20, 387–404.
3. Isermann, R. Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance; Springer Science & Business Media: Berlin, Germany, 2006.
4. Das, A.; Maiti, J.; Banerjee, R.N. Process monitoring and fault detection strategies: A review. Int. J. Qual. Reliab. Manag. 2012, 29, 720–752.
5. Antzoulakos, D.L.; Rakitzis, A.C. The modified r out of m control chart. Commun. Stat. Comput. 2008, 37, 396–408.
6. Jiang, G.; Xie, P.; He, H.; Yan, J. Wind turbine fault detection using a denoising autoencoder with temporal information. IEEE ASME Trans. Mechatron. 2017, 23, 89–100.
7. Oakland, J.S. Statistical Process Control; Routledge: London, UK, 2007.
8. Mba, D.; Rao, R.B. Development of acoustic emission technology for condition monitoring and diagnosis of rotating machines; bearings, pumps, gearboxes, engines and rotating structures. Shock Vibr. Dig. 2006, 38, 3–16.
9. Alfayez, L.; Mba, D.; Dyson, G. The application of acoustic emission for detecting incipient cavitation and the best efficiency point of a 60 kW centrifugal pump: Case study. NDT E Int. 2005, 38, 354–358.
10. Sakthivel, N.R.; Sugumaran, V.; Babudevasenapati, S. Vibration based fault diagnosis of monoblock centrifugal pump using decision tree. Expert Syst. Appl. 2010, 37, 4040–4049.
11. Zouari, R.; Sieg-Zieba, S.; Sidahmed, M. Fault detection system for centrifugal pumps using neural networks and neuro-fuzzy techniques. Surveillance 2004, 5, 11–13.
12. Rajakarunakaran, S.; Venkumar, P.; Devaraj, D.; Rao, K.S.P. Artificial neural network approach for fault detection in rotary system. Appl. Soft Comput. 2008, 8, 740–748.
13. Eom, Y.H.; Yoo, J.W.; Hong, S.B.; Kim, M.S. Refrigerant charge fault detection method of air source heat pump system using convolutional neural network for energy saving. Energy 2019, 187, 115877.
14. Hotelling, H. Multivariate quality control. In Techniques of Statistical Analysis; Eisenhart, C., Hastay, M.W., Wallis, W.A., Eds.; McGraw-Hill: New York, NY, USA, 1947.
15. Ahmed, M.; Baqqar, M.; Gu, F.; Ball, A.D. Fault detection and diagnosis using Principal Component Analysis of vibration data from a reciprocating compressor. In Proceedings of the 2012 UKACC International Conference on Control, Cardiff, UK, 3–5 September 2012; pp. 461–466.
16. Stetco, A. Machine learning methods for wind turbine condition monitoring: A review. Renew. Energy 2019, 133, 620–635.
17. Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249.
18. Bangert, P. Optimization for Industrial Problems; Springer Science & Business Media: Berlin, Germany, 2012.
19. Bangert, P. Smart condition monitoring using machine learning. Soc. Pet. Eng. 2017.
20. Zadakbar, O.; Imtiaz, S.; Khan, F. Dynamic risk assessment and fault detection using principal component analysis. Ind. Eng. Chem. Res. 2013, 52, 809–816.
21. Ringberg, H.; Soule, A.; Rexford, J.; Diot, C. Sensitivity of PCA for traffic anomaly detection. ACM Sigmetr. Perform. Eval. Rev. 2007, 35, 109–120.
22. Thottan, M.; Ji, C. Anomaly detection in IP networks. IEEE Trans. Signal Process. 2003, 51, 2191–2204.
23. Géron, A. Hands-on Machine Learning with Scikit-Learn and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems; O'Reilly Media, Inc.: Newton, MA, USA, 2017.
24. Wu, X.; Jiang, G.; Wang, X.; Xie, P.; Li, X. A Multi-Level-Denoising Autoencoder Approach for Wind Turbine Fault Detection. IEEE Access 2019, 7, 59376–59387.
25. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
26. Ng, A. Sparse autoencoder. Cs294a Lect. Notes 2011, 72, 1–19.
27. Gugulothu, N.; Malhotra, P.; Vig, L.; Shroff, G. Sparse Neural Networks for Anomaly Detection in High-Dimensional Time Series. In Proceedings of the AI4IOT workshop in conjunction with ICML, IJCAI and ECAI, Stockholm, Sweden, 13–15 July 2018.
28. Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510.
29. Hou, W.; Wei, Y.; Guo, J.; Jin, Y.; Zhu, C. Automatic detection of welding defects using deep neural network. J. Phys: Conf. Ser. 2018, 933, 012006.
30. Bangalore, P.; Tjernberg, L.B. An Artificial Neural Network Approach for Early Fault Detection of Gearbox Bearings. IEEE Trans. Smart Grid 2015, 6, 980–987.
31. Bangalore, P.; Letzgus, S.; Karlsson, D.; Patriksson, M. An artificial neural network-based condition monitoring method for wind turbines, with application to the monitoring of the gearbox. Wind Energy 2017, 20, 1421–1438.
32. Leys, C.; Klein, O.; Dominicy, Y.; Ley, C. Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance. J. Exp. Soc. Psychol. 2018, 74, 150–156.
33. Odiowei, P.-E.P.; Cao, Y. Nonlinear dynamic process monitoring using canonical variate analysis and kernel density estimations. IEEE Trans. Ind. Inf. 2009, 6, 36–45.
34. Ketelaere, B.D.E.; Hubert, M.I.A.; Schmitt, E. Overview of PCA-Based Statistical Process-Monitoring Methods for Time-Dependent, High-Dimensional Data. J. Qual. Technol. 2015, 47, 318–335.
35. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.
36. Kruger, U.; Xie, L. Statistical Monitoring of Complex Multivariate Processes; John Wiley & Sons, Ltd.: Chichester, UK, 2012.
37. Brown, S.; Tauler, R.; Walczak, B. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis; Elsevier: Amsterdam, The Netherlands, 2020.
38. Zhu, X.; Braatz, R.D. Two-dimensional contribution map for fault identification focus on education. IEEE Control Syst. Mag. 2014, 34, 72–77.
39. Chen, Z. Comparison of two basic statistics for fault detection and process monitoring. IFAC-PapersOnLine 2017, 50, 14776–14781.
40. The Process Piping, Introduction to Pumps. Available online: https://www.theprocesspiping.com/introduction-to-pumps/ (accessed on 14 August 2020).
41. Gülich, J.F. Centrifugal Pumps; Springer Science & Business Media: Berlin, Germany, 2007.

Figure 1. Diagram of proposed fault detection and isolation scheme.

Figure 2. Sparse autoencoder structure.

Figure 3. Receiver operating characteristics (ROC) curve and area under the ROC curve (AUC) value.

Figure 4. Mahalanobis distance (MD) fault detection thresholds calculated during the training stage from 10 March 2013 to 21 June 2013. (a) MD threshold calculated by principal component analysis (PCA). (b) MD threshold calculated by sparse autoencoder (SAE). (Blue points: healthy data; red points: anomalies; magenta line: reference MD thresholds.)

Figure 5. Anomaly detection from 22 June 2013 to 6 July 2013. (a) Anomaly detection by PCA. (b) Anomaly detection by SAE.

Figure 6. Measured bearing temperature obtained from the pump's condition monitoring system.

Figure 7. Contribution map by PCA model from 22 June 2013 to 6 July 2013.

Figure 8. Contribution map by SAE model from 22 June 2013 to 6 July 2013.

Figure 9. ROC curves and average AUC values: (a) ROC curves for the PCA model with different numbers of principal components; (b) ROC curves for the SAE model with different numbers of nodes in the hidden layer; (c) average AUC values of the PCA and SAE models.

Figure 10. MD fault detection thresholds for the pump, calculated during the training stage from 1 September 2015 to 11 July 2016. (a) MD threshold calculated by PCA. (b) MD threshold calculated by SAE. (Blue points: healthy data; red points: anomalies; magenta line: reference MD thresholds.)

Figure 11. Anomaly detection from 12 July 2016 to 23 August 2016. (a) Anomaly detection by PCA. (b) Anomaly detection by SAE.

Figure 12. Measurements of the pump from 12 July to 23 August 2016. (a) The increase in the pump bearings’ temperature due to misalignment. (b) The increase in the pump bearings’ vibration amplitude due to a bearing fault.

Figure 13. Contribution map by the PCA model from 12 July 2016 to 23 August 2016.

Figure 14. Contribution map by the SAE model from 12 July 2016 to 23 August 2016.

Figure 15. ROC curves and average AUC values for the detection of the misalignment: (a) ROC curves for the PCA model with different numbers of principal components; (b) ROC curves for the SAE model with different numbers of nodes in the hidden layer; (c) average AUC values of the PCA and SAE models.

Figure 16. ROC curves and average AUC values for the detection of the bearing fault: (a) ROC curves for the PCA model with different numbers of principal components; (b) ROC curves for the SAE model with different numbers of nodes in the hidden layer; (c) average AUC values of the PCA and SAE models.

Table 1. Measurement variables in the pump monitoring system.

ID	Variable Name	ID	Variable Name	ID	Variable Name
1	Speed	2	Suction pressure	3	Discharge pressure
4	Discharge temperature	5	Actual flow	6	Radial vibration overall X1
7	Radial vibration overall Y1	8	Radial bearing temperature 1	9	Radial vibration overall X2
10	Radial vibration overall Y2	11	Radial bearing temperature 2	12	Thrust position axial probe 1
13	Thrust position axial probe 2	14	Active thrust bearing temperature 1	15	Inactive thrust bearing temperature 1

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liang, X.; Duan, F.; Bennett, I.; Mba, D. A Sparse Autoencoder-Based Unsupervised Scheme for Pump Fault Detection and Isolation. Appl. Sci. 2020, 10, 6789. https://doi.org/10.3390/app10196789

