An Optimal Spatio-Temporal Hybrid Model Based on Wavelet Transform for Early Fault Detection

Xing, Jingyang; Li, Fangfang; Ma, Xiaoyu; Qin, Qiuyue

doi:10.3390/s24144736

¹ School of Chang Chien, Nantong University, Nantong 226019, China

² School of Electrical Engineering and Automation, Nantong University, Nantong 226019, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(14), 4736; https://doi.org/10.3390/s24144736 (registering DOI)

Submission received: 26 May 2024 / Revised: 27 June 2024 / Accepted: 15 July 2024 / Published: 21 July 2024

(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

An optimal spatio-temporal hybrid model (STHM) based on wavelet transform (WT) is proposed to improve the sensitivity and accuracy of detecting slowly evolving faults that occur in the early stage and are easily submerged in noise in complex industrial production systems. Specifically, a WT is performed to denoise the original data, thus reducing the influence of background noise. Then, principal component analysis (PCA) and the sliding window algorithm are used to acquire the nearest neighbors in both the spatial and time dimensions. Subsequently, the cumulative sum (CUSUM) and the Mahalanobis distance (MD) are used to construct the hybrid statistic from the temporal and spatial sequences. This enhances the correlation between high-frequency temporal dynamics and space and improves fault detection precision. Moreover, the kernel density estimation (KDE) method is used to estimate the upper threshold of the hybrid statistic so as to optimize the fault detection process. Finally, simulations are conducted by applying the WT-based optimal STHM to early fault detection in the Tennessee Eastman (TE) process, with the aim of showing that the proposed fault detection method has a high fault detection rate (FDR) and a low false alarm rate (FAR) and that it can improve both production safety and product quality.

Keywords:

wavelet transform; principal component analysis; kernel density estimation; spatio-temporal hybrid model; early fault detection

1. Introduction

Effective early fault detection in the industrial production process is extremely important for improving the operation safety of production systems and achieving the maximum economic benefit. In modern industrial processes [1], all the links are correlated. In particular, in the chemical [2], petroleum [3], pharmaceutical [4], and other crucial sectors, the failure of any link will lead to the failure of partial functions in the production process or even of the entire process. Currently available fault detection systems cannot effectively process the dynamic data produced by high-frequency changes in system operation or the complex spatial correlation between production links. In addition, these systems often fail to detect and respond to small early changes or abnormalities because they lack an advanced data processing method that can comprehensively and synchronously analyze complex dynamic data. Effective fault detection for complex industrial processes can therefore improve industrial production safety, and developing new technology to improve early fault detection precision and response speed has become a research focus in this field.

Data-driven fault detection and process monitoring techniques, such as principal component analysis (PCA) [5], partial least squares (PLS) [6], and independent component analysis (ICA) [7], have been used to monitor complex chemical reactions and process control systems in the chemical industrial process [8]. These methods extract the main or independent components from the key variables and use Hotelling’s T-square ($T^{2}$) [9] and the squared prediction error (SPE, also called the Q statistic) [10] for production process monitoring and quality control. However, these statistics are usually based on the assumption that the data come from an independent and identically distributed unimodal production environment [11]. This assumption may not hold in a multimodal process [12]. For instance, when the production conditions change, a single monitoring model will not be able to correctly reflect the dynamic changes in all the operation states [13]. In addition, these methods often do not take nonlinear relationships and complex correlations between time sequences into account, which may lead to the delayed or missed detection of faults.

There are multiple variables involved in the fault detection of modern industrial systems, so the dynamic analysis method has been applied to the real-time monitoring of key parameters in the industrial production process [14]. Zhou et al. proposed a hierarchical PCA method based on differential features for dynamic fault detection and used it to identify abnormalities in the production process [15]. Pan et al. developed a generalized likelihood ratio (GLR)-based fault detection method for non-Gaussian dynamic processes [16]. Song et al. made a dynamic inner slow feature analysis and proposed a method based on feature selection and extraction for enhancing the sensitivity of fault detection in walking gear systems [17]. However, the performance of the above-mentioned methods may be affected by the changes in the actual process. In particular, when they are used for mean-shift detection, delays are likely to occur, and they often cannot detect the fault accurately in time [18].

To solve the problem of nonlinear features in the fault detection of modern industrial systems, Mohammed et al. designed an extended Kalman filter-based fault detection method specifically for nonlinear random systems [19]. Ferdowsi improved this method and introduced fault detection and prediction techniques for actuators and sensors applicable to multidimensional nonlinear partial differential equation systems [20]. Moreover, Han et al. proposed a new fault detection method based on nonlinear factorization and fuzzy models, which could identify early faults that caused data distribution changes [21]. Yan and Zhang et al. analyzed the hierarchical structure of the nonlinear system and built an unmeasurable nonlinear system fault detection framework [22]. Furthermore, Gong et al. used multi-source information fusion in the fault detection of nonlinear systems [23]. This method can effectively detect early faults in chemical processes.

With the development of machine learning in the artificial intelligence field, it has been applied to fault detection and prediction in modern industrial systems [24]. Harichandran et al. used hybrid machine learning frameworks to recognize device activity and detect early faults in automated construction [25]. Wang et al. employed graph autoencoders and ensemble learning for the fault detection of bearings [26]. Shubita et al. proposed a method for fault detection in rotating machines based on sound signals using edge machine learning [27]. Kumar et al. summarized research advances in the machine learning algorithm-based fault detection of asynchronous motors [28]. The above-mentioned learning algorithms perform well in early fault detection, but they require a large training dataset, which is difficult to obtain in practical situations [29]. In addition, deep learning models have high computational complexity, requiring significant computational resources for training and inference, which makes it difficult to meet the demands of real-time detection [30,31]. The interpretability of deep learning models is also low, making the diagnostic results challenging for industrial engineers to understand and apply.

New statistic models have been developed for data analysis and to improve detection sensitivity and accuracy. Hou et al. introduced a spatial data processing technique into the spatial geometry-based fault detection of output feedback systems, aiming at solving the problems that traditional methods encounter when processing complex data [32]. Qian et al. developed a method of fault detection in wind turbines based on spatio-temporal features and neighborhood operation states [33]. Temporal sequence analysis and spatial analysis improved the ability to predict the fault development trend. However, some of these methods cannot acquire the nonlinear relationship between large-scale datasets and systems in complex industrial processes, and some cannot solve the complex correlations between temporal sequences. Moreover, their generalization ability and reliability in practical applications still require verification.

Therefore, a wavelet transform (WT)-based optimal spatio-temporal hybrid model (STHM) is proposed for the high-sensitivity and high-precision detection of early faults in complex industrial production systems [34]. A WT is performed first to denoise the data and reduce the influence of background noise. PCA, cumulative sum (CUSUM), and Mahalanobis distance (MD) methods are then used to construct the STHM from temporal and spatial sequences [35,36,37]. Moreover, the kernel density estimation (KDE) method is used to optimize the detection process, thereby enhancing the correlation between high-frequency temporal dynamics and space and improving fault detection precision. Simulations are run on the Tennessee Eastman (TE) experimental platform to compare the fault detection performance of the proposed model with that of the PCA and spatio-temporal nearest neighbor (STN) methods [38,39]. This study aims to demonstrate that the proposed method has a high fault detection rate (FDR) and a low false alarm rate (FAR) and that it can improve production safety and product quality.

2. Preliminary Work

2.1. Wavelet Transform

Data collected from chemical processes are often complex and dynamic, with high-dimensional and complex temporal sequences. WT is effective in the analysis of signals at different scales since it has the characteristics of time-frequency localization. Therefore, WT is able to preserve more key information in dynamic processes while reducing the impact of noise. During the WT process, signal energy is mainly concentrated in some coefficients, and noise energy is distributed on the entire coefficient axis. The wavelet coefficient of signals is generally larger than that of noise. These coefficients describe the energy distribution of signals at different scales. A noisy model is represented as follows:

Z(i) = X(i) + ε \cdot e(i), \quad i = 1, 2, \ldots, n

(1)

where $X(i)$ is the real signal required, $Z(i)$ is the original signal with noise, $e(i)$ is noise, and $ε$ is the variable coefficient of noise.

To differentiate real signals from noise, a corresponding threshold is set, and wavelet coefficients smaller than this threshold are removed. In this study, a soft threshold function is used. All the wavelet coefficients whose magnitude is smaller than the set threshold are considered noise and set to zero, and those at or above the threshold are retained as important components of the signal after their magnitude is reduced by the threshold. The soft threshold function is the following:

w_{j,k} = \begin{cases} \operatorname{sgn}(w_{j,k}) (|w_{j,k}| - thr), & |w_{j,k}| \geq thr \\ 0, & |w_{j,k}| < thr \end{cases}

(2)

where $w_{j,k}$ is the $k$th wavelet coefficient on the $j$th layer of WT, and $thr$ is the threshold. A proper threshold can improve the noise reduction effect and help preserve key features. The threshold is calculated by the following:

thr = σ \sqrt{2 \ln N}

(3)

where $N$ is the number of wavelet coefficients on the $j$th layer and $σ$ is the standard deviation of noise after decomposition on the $j$th layer. It is estimated by the following:

σ = \frac{1}{0.6745} \times \frac{1}{N} \sum_{k = 1}^{N} |w_{j,k}|

(4)

The real signal $X(i)$ obtained by WT provides more accurate and clearer data support for subsequent fault detection and improves the reliability and accuracy of fault detection. The WT process is detailed in Figure 1.

As shown in Figure 1, there are several key steps for wavelet denoising:

The WT of the original signal is performed. In this process, the signal is decomposed into $i$ layers, yielding an approximate component $ca_{i}$ and detailed components $(cd_{1}, cd_{2}, \ldots, cd_{i})$.
The detailed components are compared with the given threshold $thr$, and the signal components lower than this threshold are deemed noise and removed.
The processed detailed components and the unprocessed approximate component are used to reconstruct the signal, which is thereby transformed from the wavelet domain back to the time domain. The signal is thus restored.
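
The three denoising steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not name the mother wavelet or the number of layers, so the Haar wavelet, `levels=3`, and all function names are hypothetical choices. The threshold follows Formulas (3) and (4) as written, with σ taken from the mean absolute detail coefficient of each layer.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def soft_threshold(w, thr):
    """Formula (2): zero small coefficients, shrink the others toward zero by thr."""
    return np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)

def wavelet_denoise(z, levels=3):
    """Decompose, threshold every detail layer, and reconstruct the signal."""
    z = np.asarray(z, dtype=float)
    if z.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = z, []
    for _ in range(levels):                      # step 1: layer-by-layer decomposition
        approx, d = haar_dwt(approx)
        details.append(d)
    for d in reversed(details):                  # steps 2 and 3, coarsest layer first
        sigma = np.mean(np.abs(d)) / 0.6745      # Formula (4)
        thr = sigma * np.sqrt(2.0 * np.log(d.size))  # Formula (3)
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx
```

Only the detail coefficients are thresholded; the approximation component passes through unchanged, as in the third step. A smoother wavelet family would normally be preferred for process data, and Haar is used here only to keep the sketch dependency-free.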

2.2. Principal Component Analysis

In the fault detection of multivariate industrial processes, PCA is extensively applied to the dimensional reduction or feature extraction of data as it can effectively identify the most important components in data.

In this study, we assume that an industrial process has $m$ observed variables, each of which is sampled $n$ times. The data are preprocessed by WT, and a new dataset $X(i)$ is acquired. The new dataset is divided into a training dataset $X^{tr} = [x_{1}^{tr}, x_{2}^{tr}, \ldots, x_{n}^{tr}]^{T} \in R^{n \times m}$ and a testing dataset $X^{te} = [x_{1}^{te}, x_{2}^{te}, \ldots, x_{n}^{te}]^{T} \in R^{n \times m}$. The training dataset $X^{tr}$ and testing dataset $X^{te}$ are standardized into $n \times m$ dimensional matrices $X_{nm}^{tr}$ and $X_{nm}^{te}$, respectively. The matrices are normalized to eliminate the adverse effect of primary variables in the dimensional reduction process. The dispersion of the $m$ columns of data is measured according to the correlation between pairs of features calculated by the covariance formula. The $m \times m$ dimensional covariance matrix $C_{mm}$, which holds the correlation between each pair of the $m$ features, is given by the following:

C_{mm} = \frac{1}{n - 1} (X_{nm} - \bar{X}_{nm})^{T} (X_{nm} - \bar{X}_{nm})

(5)

The eigenvalues $λ_{m}$ and eigenvectors $p_{m}$ of $C_{mm}$ are solved. By arranging the eigenvalues $λ_{m}$ in descending order, we can obtain the corresponding eigenvectors $p_{m}$ and generate a matrix $P_{m}$. Supposing that the $m$ dimensional eigenvectors are reduced to $k$ dimensional eigenvectors, the first $k$ columns of the load matrix are selected to construct a new load matrix $P_{mk}$. Then, $X_{nm}$ is reconstructed, and a score matrix is obtained. That is, the original dataset is projected onto the space formed by the $k$ eigenvectors. The PCA model is finally obtained as follows:

X = X_{nm} P_{mk} + E

(6)

where $X_{nm} P_{mk}$ is the principal subspace and $E$ is the residual subspace.

The principal components are selected according to the cumulative percent variance (CPV). An expected CPV no smaller than 85% is given at first. The first $k$ eigenvalues of $C_{mm}$ are selected, and the ratio of the sum of these eigenvalues to the sum of all eigenvalues reflects the CPV. The $k$ value is determined when the calculated CPV is greater than the given value for the first time. The calculation formula for CPV is as follows:

CPV(k) = \frac{\sum_{j = 1}^{k} λ_{j}}{\sum_{i = 1}^{m} λ_{i}} \times 100\% \geq 90\%

(7)

In this study, the PCA method is used to process the training dataset $X_{nm}^{tr}$. The load matrix $P_{mk}$ derived from Formulas (5)–(7) effectively extracts the most important variations in the data. These principal components are used to reconstruct the testing dataset $X_{nm}^{te}$, and finally, the principal subspace $N(x) \in R^{n \times k}$ is obtained. The PCA method optimizes data processing and can effectively identify and analyze key features in complex data.
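
The PCA step described in this subsection can be sketched as below. The function names `fit_pca` and `project` and the parameter `cpv_target` are hypothetical; the default of 0.90 follows the bound shown in Formula (7), and the zero-variance guard is an added assumption that the text does not discuss.

```python
import numpy as np

def fit_pca(x_train, cpv_target=0.90):
    """Fit on normal training data; return (mean, std, load matrix P_mk)."""
    x_train = np.asarray(x_train, dtype=float)
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0                        # assumption: guard constant columns
    xs = (x_train - mean) / std                  # standardized, hence centered
    cov = xs.T @ xs / (xs.shape[0] - 1)          # Formula (5)
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cpv = np.cumsum(eigvals) / np.sum(eigvals)   # Formula (7) as a fraction
    k = int(np.searchsorted(cpv, cpv_target)) + 1  # first k whose CPV reaches the target
    k = min(k, eigvals.size)
    return mean, std, eigvecs[:, :k]

def project(x, mean, std, loadings):
    """Standardize with the training statistics and project onto the k components."""
    return ((np.asarray(x, dtype=float) - mean) / std) @ loadings
```

Calling `project` on the testing data with the mean, standard deviation, and load matrix fitted on the training data gives an $n \times k$ score matrix that plays the role of the principal subspace $N(x)$ in this sketch.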

2.3. Spatio-Temporal Nearest Neighbor Method for Fault Detection

A sliding window $W$ is given to find the nearest neighbors $T(x) = \{x_{1}, \ldots, x_{f}, \ldots, x_{W}\} \in R^{W \times k}$ of the sample $x$ in the principal subspace $N(x)$ for the time dimension. The k-nearest neighbors $Q(x) = \{x_{1}, x_{2}, \ldots, x_{K}\} \in R^{K \times k}$ of the sample $x_{f}$ in the spatial dimension are searched based on distance. The mean and standard deviation of $Q(x)$ are calculated by the following:

m(Q(x)) = \frac{1}{K} \sum_{f = 1}^{K} x_{f}^{'}

(8)

s(Q(x)) = \sqrt{\frac{1}{K - 1} \sum_{f = 1}^{K} (x_{f}^{'} - m(Q(x)))^{2}}

(9)

The sample $x$ is standardized:

\bar{x} = \frac{1}{W} \sum_{i = 1}^{W} \frac{x - m(Q(x))}{s(Q(x))}

(10)

Euclidean distance is used to measure the distance between two points:

d(x, y) = \sqrt{\sum_{i = 1}^{n} (x_{i} - y_{i})^{2}}

(11)

where $x$ and $y$ are two $n$ dimensional points, and $i$ indexes the dimensions of the data. The distance between each sample in the dataset and its k-nearest neighbors is calculated and used to estimate the local density of the sample. The distance is calculated by the following:

D_{k}(x) = \frac{1}{k} \sum_{i = 1}^{k} d(x, x_{(i)})

(12)

where $x_{(i)}$ is the $i$th nearest neighbor of the sample $x$. All $D_{k}(x)$ values are arranged in order, and a high percentile $p$ (of 0.95) is taken as the control limit. Data points with the largest $D_{k}(x)$ values, that is, those with the lowest estimated local density, are considered potential abnormalities. The threshold is calculated by the following:

STN = D_{k}^{(p)}

(13)

where $D_{k}^{(p)}$ is the $p$th percentile of all $D_{k}(x)$ values. During the monitoring process, the $D_{k}(x_{new})$ value of a new observed point is calculated and compared with the control limit $STN$ to determine if a malfunction occurs.
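
A minimal sketch of the distance statistic and percentile control limit in Formulas (11)–(13) is given below. It covers only the spatial k-nearest-neighbor distance and omits the standardization of Formulas (8)–(10); the function names and the `exclude_self` flag are hypothetical, and the brute-force pairwise distance is chosen for clarity rather than speed.

```python
import numpy as np

def knn_mean_distance(query, reference, k, exclude_self=False):
    """Formula (12): mean distance from each query row to its k nearest reference rows."""
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = query[:, None, :] - reference[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # Formula (11) for every pair
    dist.sort(axis=1)
    start = 1 if exclude_self else 0             # skip the zero distance to itself
    return dist[:, start:start + k].mean(axis=1)

def stn_control_limit(train, k, p=0.95):
    """Formula (13): the p-th percentile of D_k over the normal training samples."""
    d_train = knn_mean_distance(train, train, k, exclude_self=True)
    return float(np.quantile(d_train, p))

def is_fault(new_samples, train, k, limit):
    """Flag new samples whose D_k exceeds the control limit."""
    return knn_mean_distance(new_samples, train, k) > limit
```

With `p=0.95`, roughly 5% of the normal training samples lie above the limit by construction, which sets the baseline false alarm level of this simplified detector.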

3. Spatio-Temporal Hybrid Model for Early Fault Detection

3.1. Construction of the Spatio-Temporal Hybrid Model

A sliding window $W$ is given to find the nearest neighbors $T(x) = \{x_{1}, x_{2}, \ldots, x_{W}\} \in R^{W \times k}$ of the sample $x$ in the principal subspace $N(x)$ in the time dimension. The k-nearest neighbors $S(x) = \{x_{1}, x_{2}, \ldots, x_{K}\} \in R^{K \times k}$ in the spatial dimension are searched based on distance.

In data analysis, especially in the monitoring of industrial processes or system state changes, CUSUM is a commonly used algorithm for the detection of minor changes. CUSUM is a sequential detection technology that accumulates incremental changes in data by recursion. The cumulative sum obtained can be used to update and detect statistical deviations in the data flow in real-time and effectively capture small changes or abnormalities in a process. The CUSUM formula for each variable is defined as follows:

CUSUM_{ij} = \max(0, CUSUM_{i-1,j} + (x_{ij} - μ_{j} - c))

(14)

where $CUSUM_{ij}$ and $x_{ij}$ are the CUSUM and observed values of the $j$th variable in the $i$th sample, respectively; $μ_{j}$ is the expected offset value of the $j$th variable; and $c$ is the decision interval.

The decision interval has great influence on the sensitivity and FAR of the CUSUM algorithm. To improve the performance of the algorithm, the standard deviation $σ$ of the nearest neighbor sample set in the time dimension is estimated first. A coefficient $r$ is then given to set the sensitivity of CUSUM. The decision interval is calculated by the formula below to determine the change threshold of the statistic.

c = r \times σ

(15)

According to Formulas (14) and (15), for a given time window $W$, the statistic $S_{t}$ of the sample $x$ in the time dimension is expressed as follows:

S_{t} = \frac{1}{W} \sum_{i = 1}^{W} \sum_{j = 1}^{k} CUSUM_{ij}

(16)
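
The temporal statistic of Formulas (14)–(16) might be computed for one window as in the sketch below. Several details are assumptions rather than statements from the text: the CUSUM is restarted from zero at the beginning of each window, $σ$ is estimated per variable from the window itself, `mu` is supplied by the caller (for example, the training mean in the principal subspace), and the default `r=0.5` is purely illustrative.

```python
import numpy as np

def temporal_statistic(window, mu, r=0.5):
    """S_t for one window of shape (W, k), following Formulas (14)-(16)."""
    window = np.asarray(window, dtype=float)
    sigma = window.std(axis=0, ddof=1)     # spread of the temporal neighbors, per variable
    c = r * sigma                          # Formula (15)
    cusum = np.zeros(window.shape[1])      # assumption: CUSUM restarts at zero
    total = 0.0
    for row in window:
        cusum = np.maximum(0.0, cusum + (row - mu - c))   # Formula (14)
        total += cusum.sum()               # inner sum over the k variables
    return total / window.shape[0]         # Formula (16)
```

Because Formula (14) is one-sided, this sketch accumulates only upward deviations from `mu`; a sustained small positive drift therefore grows the statistic steadily, whereas values at or below `mu` leave it at zero.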

The mean $μ(x) = \frac{1}{K} \sum_{i = 1}^{K} x_{i}$ and covariance matrix $Σ$ of the nearest neighbor set $S(x)$ of the sample $x$ in the spatial dimension are obtained.

First, the data points are rotated so that the dimensions become linearly independent. Then, we obtain the following:

F = S(x) = U^{T} X

(17)

Since the dimensions are linearly independent after transformation and the eigenvalue of each dimension is its variance, we obtain the following:

(F - μ)(F - μ)^{T} = U^{T} Σ U

(18)

Through the rotation and scaling of Euclidean distance, MD is acquired as follows:

D(x) = \sqrt{(x - μ)^{T} Σ^{-1} (x - μ)}

(19)

The statistic $S_{s}$ of the sample $x$ in the spatial dimension is derived from Formulas (17) to (19):

S_{s} = D^{2}(x) = (x - μ)^{T} Σ^{-1} (x - μ)

(20)
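
A short sketch of the spatial statistic in Formula (20) follows. The function name and the small `ridge` term are assumptions; the ridge is added only so that the covariance of a small neighbor set remains invertible, which the text does not discuss.

```python
import numpy as np

def spatial_statistic(x, neighbors, ridge=1e-8):
    """S_s: squared Mahalanobis distance of x to its spatial neighbor set S(x)."""
    neighbors = np.asarray(neighbors, dtype=float)
    mu = neighbors.mean(axis=0)                          # mean of the K neighbors
    cov = np.atleast_2d(np.cov(neighbors, rowvar=False)) # sample covariance of S(x)
    cov = cov + ridge * np.eye(cov.shape[0])             # assumption: regularization
    diff = np.asarray(x, dtype=float) - mu
    return float(diff @ np.linalg.solve(cov, diff))      # Formula (20)
```

Solving the linear system instead of forming the inverse covariance explicitly is a numerical-stability choice and gives the same value as Formula (20).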

3.2. Calculation of the Hybrid Statistic Using the Absolute Deviation

The absolute deviation can improve the sensitivity of the model to data changes in time and spatial dimensions, especially when statistics are related in time. The absolute deviation is defined as follows:

M_{i} = \frac{1}{n} \sum_{i = 1}^{n} |S_{t} (x_{i}) - S_{s} (x_{i})|

(21)

where $M_{i}$ is the absolute deviation of the statistics of the $i$th sample in time and spatial dimensions. The basic weight of the time dimension is thus obtained as follows:

ω_{i} = \frac{S_{t} (x_{i})}{M_{i}}

(22)

Through scaling, the ratio of the statistics in the different dimensions of the STHM is obtained, which enables the measurement of the maximum fault severity. The final hybrid statistic $S_{ts}$ is expressed as follows:

S_{t s} (x_{i}) = ω_{i} \cdot S_{t} (x_{i}) + (1 - ω_{i}) \cdot S_{s} (x_{i})

(23)
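
Formulas (21)–(23) can be combined in a few lines, as sketched below. The sketch reads Formula (21) literally, so the absolute deviation is a single mean over all $n$ samples that every sample shares; that reading, the function name, and the `eps` guard against division by zero are assumptions.

```python
import numpy as np

def hybrid_statistic(s_t, s_s, eps=1e-12):
    """Combine the temporal and spatial statistics into S_ts."""
    s_t = np.asarray(s_t, dtype=float)
    s_s = np.asarray(s_s, dtype=float)
    m = np.mean(np.abs(s_t - s_s))        # Formula (21): mean absolute deviation
    w = s_t / max(m, eps)                 # Formula (22): weight of the time dimension
    return w * s_t + (1.0 - w) * s_s      # Formula (23)
```

Under this reading the weight is not confined to [0, 1], so a large temporal statistic can dominate the hybrid value.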

The KDE method can flexibly and accurately estimate the distribution density of complex data and improve fault detection accuracy and reliability. A significance level of $α = 0.05$ is given, and the upper limit $η_{S_{ts}}(α)$ of $S_{ts}$ is estimated using the KDE method:

\int_{-\infty}^{η_{S_{ts}}(α)} p(S_{ts}) \, dS_{ts} = 1 - α

(24)

where $p(z)$ is the probability density function of a random variable $z$. The following fault detection logic is used in practice:

\begin{cases} S_{ts}(x_{i}) \leq η_{S_{ts}}(α), \ \forall i \in (1, 2, \ldots, n) & \text{normal} \\ S_{ts}(x_{i}) > η_{S_{ts}}(α), \ \forall i \in (1, 2, \ldots, n) & \text{fault} \end{cases}

(25)
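
The control limit and the decision rule can be sketched as follows, assuming that the upper limit is the $(1 - α)$ quantile of the estimated density, as is usual for an upper control limit at significance level $α$. The Gaussian kernel, the normal-reference bandwidth $h = 1.06\,σ\,n^{-1/5}$, the bisection search, and the function names are all assumptions; the text does not state which kernel or bandwidth rule is used.

```python
import math
import numpy as np

def kde_upper_limit(stats, alpha=0.05, bandwidth=None, tol=1e-8):
    """Return eta such that the Gaussian-KDE CDF of the statistics equals 1 - alpha."""
    stats = np.asarray(stats, dtype=float)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb for the bandwidth.
        bandwidth = 1.06 * stats.std(ddof=1) * stats.size ** (-0.2)
    if bandwidth <= 0.0:
        raise ValueError("bandwidth must be positive")

    def cdf(t):
        # The CDF of a Gaussian KDE is the average of the kernel CDFs.
        z = (t - stats) / (bandwidth * math.sqrt(2.0))
        return float(np.mean([0.5 * (1.0 + math.erf(v)) for v in z]))

    lo = float(stats.min()) - 10.0 * bandwidth
    hi = float(stats.max()) + 10.0 * bandwidth
    while hi - lo > tol:                  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detect(s_ts, limit):
    """Formula (25): True marks a fault, False marks normal operation."""
    return np.asarray(s_ts, dtype=float) > limit
```

The limit would be estimated once from the hybrid statistics of normal training data and then reused unchanged for online samples.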

3.3. Steps of Fault Detection by the Spatio-Temporal Hybrid Model

To solve the problem that the nearest neighbor set of the sample in the time dimension is incomplete, the window is extended forward to ensure that the sample set is complete. The fault detection flowchart based on the Spatiotemporal Hybrid Model (STHM) is shown in Figure 2. The fault detection process of the STHM is detailed as follows.

Offline detection

Normal samples are collected to form a dataset $Z (i)$ for the model, and a WT is conducted to reduce the influence of background noise. A real dataset $X (i)$ is thus obtained.
The real dataset is divided into a training dataset $X^{tr} = [x_{1}^{tr}, x_{2}^{tr}, \ldots, x_{n}^{tr}]^{T} \in R^{n \times m}$ and a testing dataset $X^{te} = [x_{1}^{te}, x_{2}^{te}, \ldots, x_{n}^{te}]^{T} \in R^{n \times m}$, which are standardized into $n \times m$ dimensional matrices $X_{nm}^{tr}$ and $X_{nm}^{te}$.
The PCA method is used to build a model with the normal training dataset, and a load matrix $P_{mk}$ is obtained. Through further calculation, the principal subspace $N(x) \in R^{n \times k}$ of the testing dataset is acquired.
A sliding window $W$ is given to find the nearest neighbor set $T(x) = \{x_{1}, x_{2}, \ldots, x_{W}\}$ of the sample $x$ in the time dimension. The k-nearest neighbors $S(x) = \{x_{1}, x_{2}, \ldots, x_{K}\}$ in the spatial dimension are searched based on distance. The statistics $S_{t}$ and $S_{s}$ of the sample $x$ in time and spatial dimensions are calculated, respectively.
The moving window method is used to repeat the last step for every sample, yielding $n$ statistics $S_{t}$ in the time dimension and $n$ statistics $S_{s}$ in the spatial dimension.
The absolute deviation $M_{i}$ is calculated, and the hybrid statistic $S_{t s}$ is obtained by the STHM.
A confidence level is given, and the control limit is estimated using the KDE method.

Online detection

Testing samples are collected to form a sample set $Z'(i)$. Then, a WT is performed to obtain a real dataset $X'(i)$ for online detection.
The load matrix $P_{mk}$ obtained in offline detection is used to project the real dataset, and its principal subspace $N'(x)$ is obtained.
The moving window method is used to calculate the statistics $S_{t}'$ in the time dimension and $S_{s}'$ in the spatial dimension.
The weight $ω_{i}$ is calculated based on the absolute deviation and then assigned to the statistics $S_{t}'$ in the time dimension and $S_{s}'$ in the spatial dimension. The hybrid statistic $S_{ts}'$ is finally acquired.
The hybrid statistic $S_{ts}'$ is compared with the control limit $η_{S_{ts}}(α)$ obtained in offline detection to determine if a fault occurs in the testing sample.

Figure 2. Flowchart of early fault detection based on the spatio-temporal hybrid model.

4. Experiment and Result Analysis of the TE Process

4.1. Simulation

The TE process is a reliable simulation platform mainly used in research on the control of chemical processes and also in fields such as machinery equipment state estimation in production, production state prediction, and recognition of equipment failure sounds. Data produced by the platform are time-varying, nonlinear, and strongly coupled, so they adequately simulate the typical features of real complex industrial process systems. The TE process mainly consists of five operating units, namely, a product condenser, a reactor, a product stripper, a recycle compressor, and a vapor-liquid separator. The process flow diagram of the TE process is shown in Figure 3.

A total of 52 measurements, including 41 measured variables and 11 manipulated variables, were collected in the TE process. As shown in Figure 3, the measured variables included material flow, product pressure, reaction temperature, liquid level, etc. The manipulated variables are the opening degrees of the valves, set in the range [0, 100], with 0 indicating a fully closed valve and 100 indicating a fully open valve. Table 1 introduces the main fault types of the TE process, including step changes, random variations, the slow drift of variables, and sticky valves.

As shown in Table 1, in the TE process, a total of 21 faults were detected, which were classified into six types. Faults 1 to 7 are caused by step changes in process variables. Fault 4, which is the abnormal temperature of cooling water in the reactor, is a case in point. Faults 8 to 12 are attributed to random variations in process variables, such as Fault 12, which is the abnormal inlet temperature of cooling water in the condenser. Fault 13 refers to the abnormal constants of reaction kinetics, which are caused by slow changes in reaction kinetics. Faults 14 and 15 (which refer to the abnormal cooling water valve in the condenser) are triggered by sticky valves. The other faults are of unknown types.

The dataset of the TE process simulates the real industrial production process. The 48 h-long simulation data were collected every 3 min, and fault data were introduced from the 8th hour. There were 1 normal-state dataset and 21 fault datasets of different fault types. The data of each fault type were subdivided into a training set (containing 500 samples) and a testing set (containing 960 samples). To verify the performance of the WT-based STHM in fault detection, these datasets were used to detect faults in the TE process, and the performance of the STHM was compared with that of STN and of the $T^{2}$ and SPE statistics of the PCA method.

4.2. Signal Processing Model Demonstration

To analyze the effectiveness of the proposed model in signal processing, demonstrations were conducted using the training set of the TE dataset. The tenth feature column was selected for comparative analysis before and after wavelet transform.

First, the raw data of the tenth feature column were analyzed. Next, the same feature column data were processed using wavelet transform, which, through multi-scale analysis, effectively separated the signal from the noise. Figure 4 shows the signal before and after wavelet denoising. It can be seen from the figure that wavelet transform reduces the noise, making the main features of the signal more prominent.

4.3. Simulation Result Analysis

Fault 4 (caused by step change), Fault 10 (caused by random variations), and Fault 15 (caused by sticky valves) were selected from the TE process to compare the performance of STHM with that of the STN and PCA methods. These types of faults exhibit different characteristics in the early stages and can therefore fully demonstrate the comprehensive capability of detecting early faults. By analyzing the characteristics of the training dataset, the time window width was set to 8, and the size of the spatial nearest neighbor set was set to 10. The results are shown in Figure 5, Figure 6 and Figure 7. The dashed line in each figure represents the threshold line.

Figure 5 shows the results of Fault 4 detection using the four different methods. STHM responds fast and shows high sensitivity to the fault, especially in the early stage of the fault, due to its comprehensive consideration of the statistical information of time and space. Compared with STN, STHM shows better detection performance before and at the time of fault occurrence. Compared with the $T^{2}$ of the PCA method, STHM performs better when the fault is happening. STHM is also superior to the SPE of the PCA method in detection performance, including before fault occurrence.

Figure 6 shows the results of Fault 10 detection using the four different methods. Compared with other methods, STHM still exhibits excellent performance and can accurately capture and quickly respond to data fluctuations caused by random variations. STHM not only has a lower FAR than STN before fault occurrence but is also more accurate than STN in fault detection after fault occurrence. Compared with the $T^{2}$ and SPE of the PCA method, STHM can identify the fault earlier, with a lower FAR.

Figure 7 summarizes the results of Fault 15 detection using the four different methods. STHM has advantages over the other three methods in the accurate identification of early minor faults. STHM has a lower FAR and faster response speed than STN. STHM is able to detect the signs of fault occurrence earlier than the two statistic measures of the PCA method. In general, STHM can greatly enhance responses to early faults and reduce the FAR. It provides a more effective and practical technical means of detecting early faults.

5. Discussion

The FAR and FDR are often used as indicators of fault detection in industrial processes. In the experiment, the number of normal samples is labeled as $TN$, the number of fault samples as $FN$, the number of false alarms as $f_{n}$, and the number of fault samples detected as $t_{n}$. The FAR, also called the false detection rate, refers to the probability that the statistic exceeds the threshold before fault occurrence. A low FAR indicates better detection performance. The FAR is expressed as follows:

FAR = \frac{f_{n}}{TN}

(26)

FDR refers to the probability that a fault is detected when it occurs. In the TE process, FDR is the probability that the statistic exceeds the threshold at the time of fault occurrence. It is expressed as follows:

FDR = \frac{t_{n}}{FN}

(27)
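
Both indicators can be computed from a monitored statistic and its control limit as in the sketch below. The function name and the `fault_start` argument (the zero-based index of the first fault sample) are hypothetical.

```python
import numpy as np

def far_fdr(stat, limit, fault_start):
    """Return (FAR, FDR) for one test run, following Formulas (26) and (27)."""
    alarms = np.asarray(stat, dtype=float) > limit
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    far = normal.sum() / normal.size      # f_n / TN: alarms raised before the fault
    fdr = faulty.sum() / faulty.size      # t_n / FN: alarms raised during the fault
    return float(far), float(fdr)
```

For the testing sets described in Section 4.1, sampling every 3 min with the fault introduced from the 8th hour would correspond to `fault_start = 8 * 60 // 3 = 160`, assuming the first sample has index 0.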

Eight different types of faults in the TE process were detected using the STHM, STN, and PCA methods. Their FARs and FDRs were calculated, and the results are shown in Table 2. Figure 8 compares the FDRs of different methods. In the table, the bold numbers indicate that the FARs and FDRs of STHM are better than those of the STN and PCA methods.

As shown in Table 2, STHM is superior to other methods in the detection of most types of faults. It has an excellent FDR and FAR, indicating that it is highly sensitive to early faults and can accurately identify them. Figure 8 compares the FDRs of different methods in the detection of different types of faults.

The FDRs of STHM for Faults 1 and 4 (both caused by step changes) are 99.8% and 99.9%, respectively, indicating the outstanding detection performance of STHM. The FAR of STHM is 0%, demonstrating higher reliability than the PCA and SPE methods. This is because STHM optimizes data denoising and feature extraction by integrating the WT with spatio-temporal data fusion. Therefore, STHM is more sensitive to early step changes and captures them accurately.

STHM also performs much better than the other methods in the detection of Fault 10 (caused by random variations). It has an FDR of 99.4%, which is much higher than the 41.6% of PCA-$T^{2}$, 71.3% of PCA-SPE, and 75.6% of STN. This finding indicates that STHM can capture more accurately and respond more quickly to data fluctuations caused by random variations.

Moreover, STHM has a higher FDR in the detection of Faults 13 and 15 (caused by minor changes) in the principal component space compared with the PCA and STN methods. This demonstrates that STHM has advantages in the detection of minor faults. Through the joint analysis of temporal and spatial data, STHM is more sensitive to small changes and can accurately detect the early signs of faults, which enhances the early warning performance of the system. The FAR of STHM is slightly higher than that of PCA-$T^{2}$. A possible reason is that the CUSUM calculation in the time dimension, together with its correlation in the spatial dimension, increases the sensitivity of STHM to faults, so that it also responds to normal fluctuations or non-fault variations. The simulation results show that STHM performs well in the detection of early faults in complex industrial processes.
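The trade-off between sensitivity and false alarms described above can be illustrated with a generic one-sided CUSUM recursion. This is only a sketch under stated assumptions: it is a textbook CUSUM on a single variable, not the hybrid CUSUM and MD statistic of the proposed model, and the reference mean `mu`, allowance `k`, decision limit `h`, and the synthetic data are hypothetical choices.

```python
import numpy as np

def one_sided_cusum(x, mu=0.0, k=0.25):
    """Upper one-sided CUSUM: S_t = max(0, S_{t-1} + x_t - mu - k)."""
    x = np.asarray(x, dtype=float)
    s = np.zeros_like(x)
    acc = 0.0
    for i, xi in enumerate(x):
        # Accumulate deviations above mu + k; reset at zero otherwise.
        acc = max(0.0, acc + xi - mu - k)
        s[i] = acc
    return s

# Synthetic, noise-free data: 5 normal samples at 0, then a small shift to 0.5.
x = np.concatenate([np.zeros(5), np.full(10, 0.5)])
s = one_sided_cusum(x, mu=0.0, k=0.25)

h = 2.0                               # hypothetical decision limit
first_alarm = int(np.argmax(s > h))   # first index where the CUSUM exceeds h
print(first_alarm)                    # 13
```

No single sample here comes close to a limit of 3 on the raw values, yet the accumulated sum crosses `h` nine samples after the shift begins. Lowering `k` or `h` would shorten this delay, but on noisy data it would also let ordinary fluctuations accumulate, which is consistent with the slightly higher FAR noted above.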

Early fault detection has significant practical application value in many industrial fields. In chemical production processes, any failure at a given stage can lead to severe safety incidents and economic losses. The proposed model can be used to detect early faults in critical equipment, such as reactors and separation devices, preventing downtime and accidents caused by equipment failure. For example, by monitoring the temperature and pressure data of a reactor, the proposed model can identify anomalies at an early stage of failure, allowing timely preventive measures to be taken to ensure production safety.

In automated manufacturing processes, equipment failure can lead to production line stoppages, affecting production efficiency. The proposed model can be used to detect early faults in various mechanical equipment on the production line, enhancing equipment maintenance management. For example, in automotive manufacturing, by monitoring the motion trajectories and current data of robotic arms, the proposed model can provide early warnings of mechanical failures, reducing downtime and maintenance costs.

6. Conclusions

Traditional spatio-temporal analysis methods have the problem of poorly correlating high-frequency temporal dynamics with space. Therefore, a WT-based STHM is proposed in this paper to improve the accuracy and sensitivity of early fault detection in complex industrial systems. The data were denoised through the WT, and the PCA method was used to construct the principal subspace. Moreover, the CUSUM and MD were used to build the hybrid model, which greatly enhanced fault detection performance. According to the simulation of the TE process, STHM outperforms the PCA and STN methods in both the FDR and FAR, indicating the great potential of STHM in actual industrial applications. However, the high sensitivity of STHM may lead to false alarms. Future research should focus on the optimization of model performance and improvement of algorithm robustness so that more comprehensive technical innovations and applications can be achieved.

Author Contributions

Conceptualization, J.X. and Q.Q.; software, J.X. and F.L.; validation, J.X. and X.M.; formal analysis, J.X. and F.L.; investigation, J.X. and Q.Q.; data curation, J.X. and X.M.; writing—original draft preparation, J.X. and Q.Q.; writing—review and editing, J.X. and F.L.; supervision, Q.Q.; project administration, Q.Q.; funding acquisition, Q.Q.; All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Talent Introduction Startup Fund of Nantong University, grant number 135437612076.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Wavelet denoising workflow diagram.

Figure 3. Process flow diagram of the TE process.

Figure 4. Detection results of statistical measures for Fault 4 using different detection methods. (a) Demonstration of raw data for Fault 1; (b) demonstration of denoised data for Fault 1; (c) demonstration of raw data for Fault 2; (d) demonstration of denoised data for Fault 2; (e) demonstration of raw data for Fault 3; (f) demonstration of denoised data for Fault 3; (g) demonstration of raw data for Fault 4; (h) demonstration of denoised data for Fault 4; (i) demonstration of raw data for Fault 5; (j) demonstration of denoised data for Fault 5; (k) demonstration of raw data for Fault 8; (l) demonstration of denoised data for Fault 8; (m) demonstration of raw data for Fault 10; (n) demonstration of denoised data for Fault 10; (o) demonstration of raw data for Fault 12; (p) demonstration of denoised data for Fault 12; (q) demonstration of raw data for Fault 13; and (r) demonstration of denoised data for Fault 13.

Figure 5. Detection results of statistical measures for Fault 4 using different detection methods. (a) Statistical measures of the STHM method; (b) statistical measures of the STN method; (c)

T^{2}