Article

A Hierarchical Matrix Factorization-Based Method for Intelligent Industrial Fault Diagnosis

Yanxia Li, Han Zhou, Jiajia Liu and Xuemin Tan
1 School of Automation, Chengdu University of Information Technology, Chengdu 610225, China
2 School of Automation, Chongqing University, Chongqing 400044, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(16), 5408; https://doi.org/10.3390/s24165408
Submission received: 21 June 2024 / Revised: 29 July 2024 / Accepted: 6 August 2024 / Published: 21 August 2024

Abstract
Data-driven fault diagnosis, which identifies the causes of abnormality from collected industrial data, is one of the challenging tasks in intelligent industry safety management. It is worth noting that practical industrial data are usually related to a mixture of several physical attributes, such as the operating environment, product quality and working conditions. However, traditional models may not be sufficient to leverage this coherent information for diagnostic performance enhancement, due to their shallow architecture. This paper presents a hierarchical matrix factorization (HMF) that relies on a succession of matrix factorizations to find an efficient representation of industrial data for fault diagnosis. Specifically, HMF consecutively decomposes data into several hierarchies. The intermediate hierarchies play the role of analysis operators which automatically learn the implicit characteristics of industrial data; the final hierarchy outputs high-level and discriminative features. Furthermore, HMF is also extended in a nonlinear manner by introducing activation functions, referred to as NHMF, to deal with the nonlinearities of practical industrial processes. The applications of HMF and NHMF to fault diagnosis are evaluated on a multiphase flow process. The experimental results show that our models achieve competitive performance against the considered shallow and deep models while consuming less computing time than the deep models.

1. Introduction

To ensure the reliability and safety of industrial processes, fault diagnosis, which identifies the causes of abnormality at an early stage, is one of the ongoing research activities in intelligent industry safety management. Over the past decades, increasing system complexity has posed challenges to traditional methods [1], since these methods usually require a good deal of expertise or a rigorous clarification of the system structure. Meanwhile, with the extensive usage of intelligent sensors and computer systems, large amounts of process data can be recorded and stored in industrial databases. Under these circumstances, data-driven methods, which automatically extract valuable information from industrial data and make reliable decisions without much prior knowledge, supply a feasible solution to the fault diagnosis problem.
For data-driven fault diagnosis, it is reasonable to treat process data that share similar features as the same fault class. If a newly received datum is assigned to a known class with similar features, its fault is identified. One class of methodologies is collectively referred to as Multivariate Statistical Analysis (MSA), which jointly analyzes two or more statistical variables collected from industrial processes. Basic techniques include Principal Component Analysis (PCA) [2], Fisher Discriminant Analysis (FDA) [3], Independent Component Analysis (ICA) [4], etc. Another class can be called representation learning (RL)-based methods. They learn representative features of fault data so that different faults can be distinguished by trained classifiers. Popular techniques include Dictionary Learning [5], Manifold Learning [6], Non-negative Matrix Factorization (NMF) [7,8,9,10], etc. Their advantages lie in model interpretability and reduced analysis complexity.
However, practical industrial process data usually contain many physical attributes, such as product quality and operating modes, that can help accurately identify faults. As shown in Figure 1a, a shallow structure, which finds only one mapping between low-dimensional features and the original process data, might not be sufficient to exploit these complex industrial characteristics. Fortunately, deep learning-based methods can deal with such cases via hierarchies [11,12,13,14,15]. Unfortunately, most deep network-based methods have apparent deficiencies. Firstly, the very tricky (hyper-)parameter tuning strategy makes deep networks hard to explain theoretically because of too many interfering factors. Secondly, some models require a huge amount of training data; however, fault data are not always sufficient due to the high collection and labeling costs. Furthermore, they consume prodigious amounts of time and computing resources (such as distributed computers and GPU facilities) during training, limiting their wide application in many practical situations. Inspired by the recent advances in deep learning, we conjecture that, if we leverage the hierarchy in other learning models, we might be able to achieve competitive fault diagnosis performance with fewer of the aforementioned deficiencies.
Currently, hierarchical learning has been developed and applied to a diverse range of practical tasks with much success. Instead of learning a single mapping, hierarchical learning tends to learn multiple layers of mappings, as shown in Figure 1b. There have been some efforts to extend shallow models to hierarchical models. For example, Yao et al. [16] introduced a deep discriminative sparse representation learning framework with a deep architecture for machinery fault diagnosis. Chen et al. [17] proposed a deep PCA-based fault diagnosis method for the electrical drive in high-speed trains. Particularly, along with the subspaces obtained by shallow PCA, they continue individually decomposing them until reaching the $n$-th ($n \geq 2$) order subspaces. The higher-order subspaces can mine more implicit information about fault data. As the layer number increases, however, the computation cost grows exponentially, because their model has to compute $2^n$ subspaces at the $n$-th layer. Deng et al. [18] stacked multiple Kernel PCA models for layer-wise features, where the output (score vector) of the previous KPCA layer is used as the input of the next KPCA layer. Ren et al. [19] stacked an autoencoder on NMF: the first layer provides a nonlinear interpretation of process data and the second layer achieves dimensionality reduction. Despite their effectiveness, [18,19] still require tricky parameter/model-structure determination for the nonlinearity interpretation.
Based on the above observations and inspired by the recent advance in deep learning, this paper presents a hierarchical matrix factorization-based method (HMF) for industrial process fault diagnosis, by extending popular NMF to a hierarchical model. The contributions of our work are as follows:
  • This paper proposes an HMF for fault diagnosis. The presented model consecutively decomposes fault data, cascading over several hierarchies. The middle hierarchies are expected to learn intrinsic characteristics of processes, while the final hierarchy is expected to achieve high-level discriminative features of process data.
  • Further, this paper extends HMF to a nonlinear version, referred to as nonlinear HMF (NHMF). It adopts activation functions as the nonlinear transformers to describe the nonlinearities in practical industrial processes. Unlike traditional kernel-based methods, this strategy allows our model to avoid tricky kernel parameter tuning.
  • The experimental results on the Multi-Phase Flow Process verify the effectiveness of the hierarchies and the nonlinear transformations.
The remainder of this paper is structured as follows. Section 2 reviews the preliminary NMF. Section 3 introduces the construction of HMF and NHMF. In Section 4, a case study on a practical industrial process is carried out to validate the effectiveness of the proposed methods. Our conclusions are drawn in Section 5.

2. Preliminary

NMF has gained much attention since it satisfies the psychological and physiological evidence for the part-based learning strategy in human brains [20]. Specifically, NMF discovers the non-negative low-dimensional features $\mathbf{Y} \in \mathbb{R}_+^{k \times N}$ of non-negative original process data $\mathbf{X} \in \mathbb{R}_+^{d \times N}$ with one mapping $\mathbf{U} \in \mathbb{R}_+^{d \times k}$:
$$\min_{\mathbf{U}_+,\, \mathbf{Y}_+} \left\| \mathbf{X}_+ - \mathbf{U}_+ \mathbf{Y}_+ \right\|_F^2 \qquad (1)$$
where $N$ and $d$ denote the item number and dimensionality of the process data, respectively, and $k$ is the expected feature dimensionality. $\mathbf{A}_+$ denotes that the matrix $\mathbf{A}$ contains only non-negative elements. It should be noted that this problem is not convex, and its suboptimal solutions can be obtained by an iterative multiplicative strategy:
$$U_{ik} \leftarrow U_{ik} \frac{\left( \mathbf{X} \mathbf{Y}^{T} \right)_{ik}}{\left( \mathbf{U} \mathbf{Y} \mathbf{Y}^{T} \right)_{ik}}, \qquad Y_{kj} \leftarrow Y_{kj} \frac{\left( \mathbf{U}^{T} \mathbf{X} \right)_{kj}}{\left( \mathbf{U}^{T} \mathbf{U} \mathbf{Y} \right)_{kj}} \qquad (2)$$
where the subscript $+$ is omitted for simplicity.
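For concreteness, the multiplicative updates of Equation (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation; the small constant eps is an added numerical safeguard against division by zero.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Basic NMF via the multiplicative updates of Equation (2).
    X: (d, N) non-negative data; returns U of shape (d, k) and Y of shape (k, N)."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    U = rng.random((d, k))
    Y = rng.random((k, N))
    for _ in range(n_iter):
        U *= (X @ Y.T) / (U @ Y @ Y.T + eps)  # update U with Y fixed
        Y *= (U.T @ X) / (U.T @ U @ Y + eps)  # update Y with U fixed
    return U, Y
```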
Researchers usually improve NMF to meet the requirements of practical fault diagnosis problems, such as sparseness [21] and geometry preservation [22]. Yang et al. [23] embedded the fault's prior information into the traditional NMF to enhance its diagnostic performance on diesel engines. Yi et al. [24] adopted the kernel trick in NMF; the Gaussian kernel function was empirically selected to deal with the nonlinearities of industrial processes. In their work, they also introduced the False Nearest Neighbors algorithm into NMF to reduce the time and space costs of fault diagnosis. However, these methods only find one mapping between low-dimensional features and the original process data, which might not be sufficient to exploit the complex nature of industrial processes. Motivated by the recent progress of deep learning, we extend the standard NMF to a hierarchical model, which can automatically learn high-level, discriminative features of industrial process data. We build our discriminative hierarchical feature learning scheme on the NMF structure; thus, the proposed HMF is still an NMF-based method.

3. Methodology

Similar to previous fault diagnosis works, our goal is to obtain a representative feature matrix $\mathbf{Y}$ of the original process fault data $\mathbf{X}$, with a matrix $\mathbf{U}$ serving as the mapping between $\mathbf{Y}$ and $\mathbf{X}$. The industrial process data we wish to analyze often have intrinsic attributes, and leveraging these attributes may help accurately identify faults. We conjecture that, if we consecutively map the original process data into several intermediate feature spaces, we may discover rich process attributes and finally achieve discriminative features.

3.1. Hierarchical Matrix Factorization

Following the standard NMF, hierarchical matrix factorization (HMF) extracts features by consecutively decomposing an original data matrix $\mathbf{X} \in \mathbb{R}_+^{d \times N}$ into $m+1$ factors:
$$\min_{\mathbf{U}_1 \ldots \mathbf{U}_m,\, \mathbf{Y}_m} \frac{1}{2} \left\| \mathbf{X}_+ - \mathbf{U}_1^+ \mathbf{U}_2^+ \mathbf{U}_3^+ \cdots \mathbf{U}_m^+ \mathbf{Y}_m^+ \right\|_F^2, \quad \text{with } \mathbf{Y}_{i-1}^+ \approx \mathbf{U}_i^+ \mathbf{Y}_i^+ \ (i = m, \ldots, 2) \qquad (3)$$
where $\| \cdot \|_F$ denotes the Frobenius norm, $\mathbf{U}_1 \in \mathbb{R}_+^{d \times k_1}$, $\mathbf{U}_m \in \mathbb{R}_+^{k_{m-1} \times k_m}$ and $\mathbf{Y}_m \in \mathbb{R}_+^{k_m \times N}$. As shown in Figure 2, in this model, each hierarchy is expected to be automatically related to a distinct attribute and is assigned the implicit feature accordingly. Finally, the last layer is able to find the high-level features by modeling the complex process attributes.
Since practical process data usually include negative elements, we allow $\mathbf{X}$ and $\mathbf{U}$ in Equation (3) to have negative parts while enforcing that $\mathbf{Y}$ contains only non-negative elements. This strategy still retains the part-based interpretability [25]. Therefore, the objective function of Equation (3) is formulated as follows:
$$J = \min_{\mathbf{U}_1 \ldots \mathbf{U}_m,\, \mathbf{Y}_m} \frac{1}{2} \left\| \mathbf{X}^{\pm} - \prod_{i=1}^{m} \mathbf{U}_i^{\pm} \, \mathbf{Y}_m^{+} \right\|_F^2 \quad \mathrm{s.t.} \ \mathbf{Y}_m^{+} \geq 0 \qquad (4)$$
where $\mathbf{A}^{\pm}$ denotes that the matrix $\mathbf{A}$ may contain both positive and negative elements. Equation (4) is a non-convex problem, but we can adopt an alternating strategy to solve it, i.e., updating one factor while keeping the others fixed. For simplicity, we omit the superscripts $\pm$ and $+$ in the following equations.
Updating $\mathbf{U}_i$ while keeping the others fixed. Obviously, Equation (4) expands into the following sum of squared residuals:
$$J = \frac{1}{2} \left\| \mathbf{X} - \prod_{i=1}^{m} \mathbf{U}_i \mathbf{Y}_m \right\|_F^2 = \frac{1}{2} \mathrm{tr}\left( \mathbf{X} \mathbf{X}^{T} - 2 \mathbf{X}^{T} \prod_{i=1}^{m} \mathbf{U}_i \mathbf{Y}_m + \mathbf{Y}_m^{T} \left( \prod_{i=1}^{m} \mathbf{U}_i \right)^{T} \prod_{i=1}^{m} \mathbf{U}_i \mathbf{Y}_m \right) \qquad (5)$$
Setting $\partial J / \partial \mathbf{U}_i = 0$ gives the following updating rule for $\mathbf{U}_i$:
$$\mathbf{U}_i \leftarrow \left[ \left( \prod_{s=1}^{i-1} \mathbf{U}_s \right)^{T} \prod_{s=1}^{i-1} \mathbf{U}_s \right]^{-1} \left( \prod_{s=1}^{i-1} \mathbf{U}_s \right)^{T} \mathbf{X} \, \mathbf{Y}_i^{T} \left( \mathbf{Y}_i \mathbf{Y}_i^{T} \right)^{-1} \qquad (6)$$

where $\mathbf{Y}_i = \mathbf{U}_{i+1} \cdots \mathbf{U}_m \mathbf{Y}_m$ denotes the intermediate feature of the $i$-th layer.
Updating $\mathbf{Y}_i$ while keeping the others fixed. We employ the Lagrange multiplier $\boldsymbol{\Phi}_i$ for the non-negativity constraint on $\mathbf{Y}_i$, and the Lagrange function related to $\mathbf{Y}_i$ is written as
$$L(\mathbf{Y}_i) = \mathrm{Tr}\left( -2 \mathbf{X}^{T} \prod_{s=1}^{i} \mathbf{U}_s \, \mathbf{Y}_i + \mathbf{Y}_i^{T} \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \prod_{s=1}^{i} \mathbf{U}_s \, \mathbf{Y}_i \right) - \mathrm{Tr}\left( \boldsymbol{\Phi}_i \mathbf{Y}_i \right) \qquad (7)$$
Setting its gradient to zero, from the complementary slackness condition we can obtain
$$\left[ -\left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \mathbf{X} + \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \prod_{s=1}^{i} \mathbf{U}_s \, \mathbf{Y}_i \right] \odot \mathbf{Y}_i = \boldsymbol{\Phi}_i \odot \mathbf{Y}_i = \mathbf{0} \qquad (8)$$
where ⊙ denotes the element-wise (Hadamard) product.
This is a fixed-point equation that the limiting solution must satisfy at convergence, i.e., $\mathbf{Y}_i^{(\infty)} = \mathbf{Y}_i^{(t+1)} = \mathbf{Y}_i^{(t)} = \mathbf{Y}_i$:
$$\mathbf{Y}_i \leftarrow \mathbf{Y}_i \odot \sqrt{\frac{\left[ \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \mathbf{X} \right]^{pos} + \left[ \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \prod_{s=1}^{i} \mathbf{U}_s \right]^{neg} \mathbf{Y}_i}{\left[ \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \mathbf{X} \right]^{neg} + \left[ \left( \prod_{s=1}^{i} \mathbf{U}_s \right)^{T} \prod_{s=1}^{i} \mathbf{U}_s \right]^{pos} \mathbf{Y}_i}} \qquad (9)$$
where $\mathbf{A}^{pos}$ keeps the positive elements of $\mathbf{A}$ (with negative entries set to zero) and $\mathbf{A}^{neg}$ keeps the magnitudes of the negative elements:
$$\mathbf{A}^{pos} = \frac{|\mathbf{A}| + \mathbf{A}}{2}, \qquad \mathbf{A}^{neg} = \frac{|\mathbf{A}| - \mathbf{A}}{2} \qquad (10)$$
By iteratively updating the factors $\mathbf{U}_i$ and $\mathbf{Y}_i$ with Equations (6) and (9), we can obtain a sub-optimal solution to Equation (4).
Pre-training has been successfully employed in previous deep learning works and can greatly reduce the training time [12]. We follow this tactic to obtain an initial approximation of each layer and, thus, expedite the approximation of $\mathbf{U}_i$ and $\mathbf{Y}_i$ in HMF and NHMF. To be specific, the original data matrix $\mathbf{X}$ is first decomposed into $\mathbf{U}_1 \in \mathbb{R}^{d \times k_1}$ and $\mathbf{Y}_1 \in \mathbb{R}_{\geq 0}^{k_1 \times N}$ using conventional semi-NMF [25]. Further, $\mathbf{Y}_1$ is decomposed into $\mathbf{U}_2 \in \mathbb{R}^{k_1 \times k_2}$ and $\mathbf{Y}_2 \in \mathbb{R}_{\geq 0}^{k_2 \times N}$, and so on until all layers have been initialized. Afterward, $\mathbf{U}_i$ and $\mathbf{Y}_i$ are iteratively fine-tuned via Equations (6)–(9) until convergence. The stopping criterion of the algorithms can be a maximum iteration count or a sufficiently small loss value.
The computational complexity is estimated as follows. Updating $\mathbf{U}_i$ via Equation (6) requires $O(dNk)$ operations and updating $\mathbf{Y}_i$ via Equation (9) requires $O(dNk + (d+N)k^2)$, where $k = \max\{k_i\}$. Therefore, to optimize a model with $m$ layers over $t$ iterations, the overall computational complexity is $O((dNk + (d+N)k^2)mt)$.
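To make the training procedure concrete, the following NumPy sketch wires Equations (6), (9) and (10) into the alternating loop described above. It is an illustrative reading of the method, not the authors' code: the semi-NMF pre-training is replaced by a simple random initialization, and all function and variable names are our own.

```python
import numpy as np

def pos(A):  # A^pos in Equation (10)
    return (np.abs(A) + A) / 2

def neg(A):  # A^neg in Equation (10)
    return (np.abs(A) - A) / 2

def chain(mats, fallback_dim):
    """Product of a list of matrices; identity if the list is empty."""
    out = np.eye(fallback_dim)
    for M in mats:
        out = out @ M
    return out

def hmf(X, layer_sizes, n_iter=50, eps=1e-9, seed=0):
    """Sketch of HMF fine-tuning via Equations (6) and (9).
    X: (d, N) data (may contain negative entries); layer_sizes: e.g. [19, 12, 5]."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    dims = [d] + list(layer_sizes)
    # stand-in for the semi-NMF pre-training described in the text
    Us = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(layer_sizes))]
    Y = np.abs(rng.standard_normal((layer_sizes[-1], N)))
    for _ in range(n_iter):
        for i in range(len(Us)):
            Psi = chain(Us[:i], d)                   # U_1 ... U_{i-1}
            Yi = chain(Us[i + 1:], dims[i + 1]) @ Y  # Y_i = U_{i+1} ... U_m Y_m
            # Equation (6), written with pseudo-inverses
            Us[i] = np.linalg.pinv(Psi) @ X @ np.linalg.pinv(Yi)
        Phi = chain(Us, d)                           # U_1 ... U_m
        A, B = Phi.T @ X, Phi.T @ Phi
        # Equation (9): multiplicative update keeping Y_m non-negative
        Y *= np.sqrt((pos(A) + neg(B) @ Y + eps) / (neg(A) + pos(B) @ Y + eps))
    return Us, Y
```

For the MPF case study below, calling this sketch with layer_sizes=[19, 12, 5] would mirror the 19-12-5 architecture used in the experiments.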

3.2. Nonlinear Hierarchical Matrix Factorization

Practical industrial processes usually exhibit nonlinearity due to their complex system mechanisms, coupled sensors and varying operating conditions. This nonlinearity poses difficulties for HMF, since it can only learn low-dimensional features through linear mappings. The failure to describe nonlinear correlations may lead to unsatisfactory fault diagnosis performance.
From a mathematical point of view, to bring nonlinearity into HMF, one can apply a nonlinear function to each layer of representations:
$$\mathbf{Y}_{i-1} = \varphi\left( \mathbf{U}_i \mathbf{Y}_i \right) \qquad (11)$$
where $\varphi$ denotes a nonlinear function. Popular choices include kernel functions; the kernel trick, however, usually requires cumbersome parameter determination. Therefore, activation functions are introduced into HMF instead. In this situation, the nonlinear extension of Equation (4) is formulated as
$$J_{non} = \frac{1}{2} \left\| \mathbf{X}^{\pm} - \mathbf{U}_1^{\pm} \varphi\left( \cdots \varphi\left( \mathbf{U}_m^{\pm} \mathbf{Y}_m^{+} \right) \right) \right\|_F^2 \quad \mathrm{s.t.} \ \mathbf{Y}_m \geq 0 \qquad (12)$$
We can use gradient descent to minimize this cost function, and the derivative with respect to each factor needs to be computed.
In particular, when $m = 1$, this model is equivalent to the shallow one:
$$\min_{\mathbf{U}_1, \mathbf{Y}_1} \frac{1}{2} \left\| \mathbf{X} - \mathbf{U}_1 \varphi(\mathbf{Y}_1) \right\|_F^2 \qquad (13)$$
and the derivative of $J_{non}$ with respect to $\mathbf{Y}_1$ is written as
$$\frac{\partial J_{non}}{\partial \mathbf{Y}_1} = \frac{1}{2} \frac{\partial \, \mathrm{Tr}\left\{ -2 \mathbf{X}^{T} \mathbf{U}_1 \varphi(\mathbf{Y}_1) + \left[ \mathbf{U}_1 \varphi(\mathbf{Y}_1) \right]^{T} \mathbf{U}_1 \varphi(\mathbf{Y}_1) \right\}}{\partial \varphi(\mathbf{Y}_1)} \odot \varphi'(\mathbf{Y}_1) = \mathbf{U}_1^{T} \left[ \mathbf{U}_1 \varphi(\mathbf{Y}_1) - \mathbf{X} \right] \odot \varphi'(\mathbf{Y}_1) \qquad (14)$$
and the derivative of $J_{non}$ with respect to $\mathbf{U}_1$ is written as
$$\frac{\partial J_{non}}{\partial \mathbf{U}_1} = \frac{1}{2} \frac{\partial}{\partial \mathbf{U}_1} \mathrm{Tr}\left\{ -2 \mathbf{X}^{T} \mathbf{U}_1 \varphi(\mathbf{Y}_1) + \left[ \mathbf{U}_1 \varphi(\mathbf{Y}_1) \right]^{T} \mathbf{U}_1 \varphi(\mathbf{Y}_1) \right\} = \left[ \mathbf{U}_1 \varphi(\mathbf{Y}_1) - \mathbf{X} \right] \varphi(\mathbf{Y}_1)^{T} \qquad (15)$$
To compute the derivative with respect to $\mathbf{Y}_i$ ($i \geq 2$), we use the chain rule:
$$\frac{\partial J_{non}}{\partial \mathbf{Y}_i} = \mathbf{U}_i^{T} \left[ \frac{\partial J_{non}}{\partial \varphi(\mathbf{U}_i \mathbf{Y}_i)} \odot \varphi'(\mathbf{U}_i \mathbf{Y}_i) \right] = \mathbf{U}_i^{T} \left[ \frac{\partial J_{non}}{\partial \mathbf{Y}_{i-1}} \odot \varphi'(\mathbf{U}_i \mathbf{Y}_i) \right] \qquad (16)$$
Similarly, for $\mathbf{U}_i$ ($i \geq 2$), we obtain
$$\frac{\partial J_{non}}{\partial \mathbf{U}_i} = \left[ \frac{\partial J_{non}}{\partial \varphi(\mathbf{U}_i \mathbf{Y}_i)} \odot \varphi'(\mathbf{U}_i \mathbf{Y}_i) \right] \mathbf{Y}_i^{T} = \left[ \frac{\partial J_{non}}{\partial \mathbf{Y}_{i-1}} \odot \varphi'(\mathbf{U}_i \mathbf{Y}_i) \right] \mathbf{Y}_i^{T} \qquad (17)$$
With these derivatives, gradient descent can be utilized to minimize the cost function with respect to each layer's $\mathbf{U}_i$ and $\mathbf{Y}_i$:
$$\mathbf{Y}_i \leftarrow \mathbf{Y}_i - \eta \frac{\partial J_{non}}{\partial \mathbf{Y}_i}, \qquad \mathbf{U}_i \leftarrow \mathbf{U}_i - \eta \frac{\partial J_{non}}{\partial \mathbf{U}_i} \qquad (18)$$
where η is the learning rate.
By iteratively updating the factors $\mathbf{U}_i$ and $\mathbf{Y}_i$ with Equation (18), we can obtain a sub-optimal solution to Equation (12). Similar to HMF, pre-training can greatly expedite the approximation procedure, and we also use conventional semi-NMF to initialize each layer of this model. Afterward, each layer is fine-tuned via Equation (18) until convergence.
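A backpropagation-style sketch of one NHMF update pass, with the tanh activation used in the case study, is given below. It follows Equations (14)–(18) as we read them; the projection that keeps $\mathbf{Y}_m$ non-negative after the gradient step is our own simplification of the constraint handling, and all names are illustrative.

```python
import numpy as np

def nhmf_step(X, Us, Y, eta=1e-3):
    """One NHMF gradient step (Equations (14)-(18)) with phi = tanh.
    Us = [U_1, ..., U_m]; Y is the non-negative Y_m. Illustrative sketch only."""
    phi = np.tanh
    dphi = lambda Z: 1.0 - np.tanh(Z) ** 2
    # forward pass: Y_{i-1} = phi(U_i Y_i), Equation (11)
    Hs, Zs = [Y], []
    for U in reversed(Us[1:]):                 # layers m, m-1, ..., 2
        Zs.append(U @ Hs[-1])
        Hs.append(phi(Zs[-1]))
    H1 = Hs[-1]                                # Y_1 (or Y_m itself when m = 1)
    R = Us[0] @ H1 - X                         # reconstruction residual
    grads = [(0, R @ H1.T)]                    # dJ/dU_1, Equation (15)
    G = Us[0].T @ R                            # dJ/dY_1
    for idx in range(1, len(Us)):
        Z = Zs[-idx]                           # Z_i = U_i Y_i
        Hi = Hs[-idx - 1]                      # Y_i
        G = G * dphi(Z)                        # chain rule, Equations (16)/(17)
        grads.append((idx, G @ Hi.T))          # dJ/dU_i
        G = Us[idx].T @ G                      # dJ/dY_i
    for idx, gU in grads:                      # Equation (18)
        Us[idx] -= eta * gU
    Y = np.maximum(Y - eta * G, 0.0)           # projection keeps Y_m non-negative
    return Us, Y
```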

3.3. Fault Identification

As shown in Figure 2, since both HMF and NHMF are unsupervised models, all samples $\mathbf{X} = [\mathbf{X}_{train}; \mathbf{X}_{test}]$ are fed into the model to learn the fault features $\mathbf{Y}_m = [\mathbf{Y}_{train}; \mathbf{Y}_{test}]$. $\mathbf{Y}_m$ is then divided into $\mathbf{Y}_{train}$ and $\mathbf{Y}_{test}$; $\mathbf{Y}_{train}$ is utilized to train a simple classifier $\mathcal{C}$, with which the fault type of $\mathbf{Y}_{test}$ can be predicted. In this work, we adopt the K-Nearest Neighbor (KNN) classifier for its simplicity and efficiency. To be specific, KNN assigns $\mathbf{y}_i^{test}$ to a fault type with the following rule:
$$\text{if } i_0 = \arg\min_{i'} \left\| \mathbf{y}_i^{test} - \mathbf{y}_{i'}^{train} \right\|, \text{ then } label\!\left( \mathbf{y}_i^{test} \right) = label\!\left( \mathbf{y}_{i_0}^{train} \right) \qquad (19)$$
If a new datum $\mathbf{x}^*$ arrives, we can use either HMF or NHMF to project it and learn its feature $\mathbf{y}^*$. This can be achieved via basic matrix reconstruction:
$$\mathbf{y}^* \approx \left[ \mathbf{U}_1 \mathbf{U}_2 \cdots \mathbf{U}_m \right]^{\dagger} \mathbf{x}^* \qquad (20)$$
for the linear model, and
$$\mathbf{y}^* \approx \mathbf{U}_m^{\dagger} \, \varphi^{-1}\!\left( \cdots \mathbf{U}_2^{\dagger} \, \varphi^{-1}\!\left( \mathbf{U}_1^{\dagger} \mathbf{x}^* \right) \right) \qquad (21)$$
for the nonlinear model, where † denotes the Moore–Penrose pseudo-inverse.
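The following sketch illustrates Equations (19)–(21) for a single new sample. The clipping inside the inverse activation is a numerical safeguard we add (arctanh is only defined on (-1, 1)); otherwise the functions follow the equations directly, with illustrative names.

```python
import numpy as np

def project_linear(Us, x_new):
    """Equation (20): y* = (U_1 U_2 ... U_m)^+ x* for the linear HMF."""
    Phi = Us[0]
    for U in Us[1:]:
        Phi = Phi @ U
    return np.linalg.pinv(Phi) @ x_new

def project_nonlinear(Us, x_new, eps=1e-6):
    """Equation (21): peel layers off with pseudo-inverses and phi^{-1}.
    phi = tanh, so phi^{-1} = arctanh; clipping keeps values in-domain."""
    h = np.linalg.pinv(Us[0]) @ x_new
    for U in Us[1:]:
        h = np.linalg.pinv(U) @ np.arctanh(np.clip(h, -1 + eps, 1 - eps))
    return h

def knn_label(y_new, Y_train, labels_train):
    """1-NN rule of Equation (19): the new feature inherits the label of
    its nearest training feature (columns of Y_train)."""
    i0 = np.argmin(np.linalg.norm(Y_train - y_new[:, None], axis=0))
    return labels_train[i0]
```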

4. Case Study

To verify the effectiveness of the proposed methods, this section provides the experimental results and discussion on the Multiphase Flow process (MPF) (http://www.mathworks.com/matlabcentral/fileexchange/50938-a-benchmark-case-for-statistical-process-monitoring-cranfield-multiphase-flow-facility (accessed on 10 June 2024)).

4.1. Data Description

The MPF is utilized to verify the effectiveness of the proposed models. It was designed by Cranfield University to provide a controlled and measured flow rate of water, oil and air to a pressurized system, whose diagram is shown in Figure 3. In the MPF process, there are 24 measurements describing the condition of the process, all captured at a sampling rate of 1 Hz. As summarized in Table 1, we select five different conditions as the dataset: the normal condition, air line blockage, water line blockage, top separator input blockage and open direct bypass. A total of 50% of the samples are utilized as the training dataset, while the rest form the testing dataset. In particular, these fault data were collected under changing operational conditions rather than in the steady-state regime. The multimode and nonlinearity characteristics, together with the system size of the MPF, make this case a desirable benchmark.

4.2. Comparison with Shallow Methods

In this work, a shallow model is a machine learning algorithm with only one layer of composition. The linear models include Principal Component Analysis (PCA) [27], Linear Discriminant Analysis (LDA) [28], Sparse Discriminant Analysis (SDA) [29] and Non-negative Matrix Factorization (NMF) [20], while the nonlinear models include Kernel Principal Component Analysis (KPCA) [30], Sparse Exponential Discriminant Analysis (SEDA) [31] and Kernel Non-negative Matrix Factorization (KNMF) [32]. For all methods, the final representation dimensionality k was set to 5, except for LDA, where it was set to 4. For SDA and SEDA, the parameters were set as recommended by the authors in the original papers. For the kernel-based methods, the Gaussian kernel width was empirically set to 1.3, and the offset and degree parameters of the polynomial kernel in the polynomial KNMF were set to 1 and 20, respectively. For both HMF and NHMF, we set the layer number m to 3, with layer sizes of 19, 12 and 5 from the first to the third layer. Moreover, the nonlinear function for NHMF is the tanh function.
We report the performance of the different methods with respect to each fault class, where the performance is evaluated by the True Positive Rate (TPR) and Positive Predictive Value (PPV):
$$TPR = \frac{n_i^t}{n_i} \times 100\%, \quad Avg(TPR) = \sum_{i=1}^{C} n_i \, TPR(i) / n; \qquad PPV = \frac{n_i^t}{n_i^p} \times 100\%, \quad Avg(PPV) = \sum_{i=1}^{C} n_i \, PPV(i) / n \qquad (22)$$
where $C$ denotes the total number of fault types, $n$ is the total number of samples, $n_i^t$ and $n_i$ are the number of correctly diagnosed samples and the total number of samples of the $i$-th fault type, respectively, and $n_i^p$ is the number of samples predicted as the $i$-th fault. Specifically, TPR measures the accuracy of the models and PPV measures their precision. Larger values denote better performance.
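A small helper illustrating Equation (22) is shown below. Reading the averages as sample-weighted per-class scores is an assumption on our part; under that reading, Avg(TPR) coincides with the overall accuracy.

```python
import numpy as np

def tpr_ppv(y_true, y_pred, n_classes):
    """Per-class TPR/PPV and their sample-weighted averages (Equation (22)).
    y_true, y_pred: integer arrays with labels 0..C-1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    tpr, ppv, weights = [], [], []
    for c in range(n_classes):
        n_t = np.sum((y_pred == c) & (y_true == c))  # correctly diagnosed
        n_c = np.sum(y_true == c)                    # samples of class c
        n_p = np.sum(y_pred == c)                    # samples predicted as c
        tpr.append(n_t / n_c if n_c else 0.0)
        ppv.append(n_t / n_p if n_p else 0.0)
        weights.append(n_c / n)
    avg_tpr = sum(w * t for w, t in zip(weights, tpr))
    avg_ppv = sum(w * p for w, p in zip(weights, ppv))
    return tpr, ppv, avg_tpr, avg_ppv
```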
The diagnosis performance with respect to TPR and PPV is summarized in Table 2. There are some interesting observations in this table. (1) Comparing the results between PCA and KPCA, between LDA/SDA and SEDA and between NMF and KNMF, models with kernel or activation functions achieve more promising diagnosis performance. This may be attributed to the nonlinearity of the MPF process. (2) During the grid search procedure for kernel parameters, we find that some parameter combinations even decrease the fault diagnosis performance of KPCA and KNMF. How to determine proper parameters is very tricky in practical cases, whereas NHMF only requires the selection of activation functions. (3) Both HMF and NHMF perform better than the shallow NMF. To be specific, the average TPR and PPV of HMF are 5.33% and 4.13% higher than those of the best competitors; the average TPR and PPV of NHMF are 7.58% and 6.70% higher than those of the best competitors.
For more detailed diagnosis results, the confusion matrices of the different methods are provided in Figure 4. Class #1 denotes the normal condition of the MPF, while #2–#5 denote air line blockage, water line blockage, top separator input blockage and open direct bypass, respectively. It is clear that the results of NHMF have the most obvious diagonal block structure among these five methods; namely, NHMF works well for all four fault types and the normal condition. HMF performs well on faults #3, #4 and #5, while it has a relatively higher misclassification error on classes #1 and #2. Shallow models perform even worse because they fail to exploit the complex attributes of industrial processes.

4.3. Comparison with Deep Models

As for deep models, our work mainly focuses on deep neural networks that consist of multiple fully connected layers: an input layer, multiple hidden layers (at least two) and a single output layer. In this experiment, we selected two typical deep neural network-based fault diagnosis methods as competitors, i.e., the Stacked Auto-Encoder (SAE) and the Deep Belief Network (DBN). A total of four layers were constructed in SAE and DBN, whose neuron numbers from the first to the fourth layer were set to 24, 19, 12 and 5, respectively.
The experimental results are reported in Table 3. Compared with Table 2, all the hierarchical models achieve better diagnosis performance, with approximately 90% TPR and PPV. This fact shows the effectiveness of the deep learning strategy. We also provide violin plots of the obtained results in Figure 5; only the results in terms of TPR are shown, as similar observations hold for PPV. Generally speaking, the performance of the deep neural networks varies over a wide range (80–96%). This may be attributed to the limited training samples and randomized initialization. The potential power of deep neural networks is undeniably strong; however, it depends heavily on some tricky operations. If they are not well-tuned, they may achieve unsatisfactory performance, whereas our models achieve relatively stable and desirable diagnosis performance.

4.4. Time Consumption Analysis

In this experiment, we compared the average time consumption of HMF and NHMF with that of DBN and NMF. Specifically, we varied the number of training samples among 200, 500, 1000, 2000 and 4000, and each algorithm was run 50 times to average out randomness. The layer sizes of both HMF and NHMF were set to 19-12-5, and their maximum iteration count was set to 50, since they converge very fast. The configurations of DBN and NMF were set as previously mentioned. Figure 6 shows the results, where the shaded region denotes the variance. We only report the time of DBN, since similar observations hold for SAE. Clearly, the shallow architecture (NMF) generally takes the least computational time among the competitors due to its simple structure. The time cost of HMF is slightly higher than that of NMF but still within an acceptable range. NHMF costs much more time, because differentiating the nonlinear functions adds computational complexity during optimization. However, it is still faster than DBN, which consumes by far the most time during training.

4.5. Effectiveness of Hierarchies

To show the effectiveness of the layer-wise structure of HMF and NHMF, we provide the diagnosis results obtained by the shallow model and the hierarchical models. The configuration of NMF is the same as before. For the two-layer HMF and NHMF, the layer sizes were set to 19 and 5; for the three-layer HMF and NHMF, the layer sizes were set to 19, 12 and 5 from the first to the last layer. As shown in Table 4 and Table 5, the deeper models achieve higher scores in both TPR and PPV. To be specific, the average TPR of the shallow NMF is 82.80%, while those of the two-layer models are 89.78% and 88.50%. Furthermore, the three-layer HMF and NHMF outperform both the shallow model and the two-layer models, exceeding 90% TPR and PPV in the experiments. This demonstrates that a relatively deeper structure can truly yield better diagnosis performance. However, adding more layers may even decrease performance, because limited data leave a deeper model under-fitted. Unfortunately, as with other deep models, how to precisely decide the number of layers for HMF and NHMF remains an open research problem, since it depends on the quality of the available data. Some empirical strategies can help with the decision, such as trial and error, heuristic search and exhaustive search. In this paper, we suggest 2–4 layers for datasets whose scale is similar to that of the MPF process.

4.6. Parameter Study on Hierarchical Structure

Next, we evaluated the influence of the layer sizes. To this end, we constructed two-layer HMF and NHMF models. The size of the first layer varied from 7 to 23 and the size of the second layer varied from 6 to 22, both in steps of 2. Figure 7 illustrates the performance heat maps when HMF and NHMF adopt different layer sizes. In Figure 7, the number on the diagonal is the size of the last layer, while the numbers on the horizontal and vertical axes are the sizes of the first layer. It should be noted that the size of the first layer is larger than that of the last layer. Clearly, as the layer sizes increase, both the TPR and PPV scores rise. To balance performance and model size, we suggest setting the size of the $m$-th layer to $\lambda^m d$, where $\lambda \in (0.6, 0.8)$.
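As a minimal illustration of this rule of thumb (the helper name and the default $\lambda$ are our own choices within the suggested range):

```python
def suggest_layer_sizes(d, m, lam=0.7):
    """Layer sizes following the lambda^i * d rule of thumb."""
    return [max(1, round(lam ** i * d)) for i in range(1, m + 1)]

# e.g., for the 24-dimensional MPF data: suggest_layer_sizes(24, 3) -> [17, 12, 8]
```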

4.7. Convergence Analysis

As discussed in the previous sections, we adopted iterative updating rules to obtain local optima of HMF and NHMF. Figure 8 experimentally illustrates the convergence of our models on the MPF dataset. As we can see, the loss values of both HMF and NHMF monotonically decrease with the iterations and converge quickly. Specifically, HMF stabilizes within 10 iterations, while NHMF stabilizes within 15 iterations.

4.8. Visualization of the Learned Feature Matrices

The learned fault feature matrices $\mathbf{Y}$ are visualized in Figure 9, where the horizontal axis denotes the learned features and the vertical axis the weight values. It should be noted that all weight values in $\mathbf{Y}$ are normalized to [0, 1]. The figure reveals that the five weight values extracted by NMF are nearly indistinguishable from the remaining values, so NMF possibly learns some indiscriminative features. In contrast, the discrimination between the weight values of HMF and NHMF is notably enhanced, contributing to the superior fault diagnosis performance of the proposed methods.

5. Conclusions

This paper presented a hierarchical matrix factorization-based fault diagnosis method. HMF consecutively decomposes the original process data into several intermediate spaces so that it can automatically learn process attributes, allowing for better feature discrimination and interpretability. Further, to provide nonlinear interpretability for practical industrial processes, HMF is extended to a nonlinear version with the aid of activation functions. The experimental results on a practical process demonstrate that, by adding intermediate hierarchies to the conventional shallow model, the presented HMF is able to learn high-level, discriminative features of industrial process data. With activation functions, NHMF is able to deal with the nonlinearities of practical industrial processes without tricky parameter tuning and achieves better diagnosis performance. Both models outperformed the considered range of typical fault diagnosis methods while consuming less computing time than the considered deep models.
However, the proposed approach still has some limitations. Firstly, the proposed method may be sensitive to outliers: when the training data are heavily contaminated by noise and outliers, classification performance may degrade. Secondly, it implicitly assumes a balanced data distribution: when the data are imbalanced, the method may strongly favor the majority fault class and detect the minority fault class at extremely low rates, leading to poor fault diagnosis performance. Future work will focus on designing a more robust classification framework and on handling imbalanced data distributions to achieve better diagnosis performance.

Author Contributions

Conceptualization, Y.L.; methodology, H.Z.; validation, J.L. and X.T.; writing—original draft preparation, H.Z.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific Research Fund of Sichuan Science and Technology Program, under Grants (2023NSFSC1429, 2024NSFSC1425 and 2024NSFSC0868), the Key R&D project of Science and Technology Department of Sichuan Province, under Grants (2023YFG0196 and 2023YFN0077), Sichuan unmanned system and intelligent perception Engineering Laboratory Open Fund and Research Fund of Chengdu University of information engineering, under Grant (No. WRXT2022-005) and Research Fund of Chengdu University of Information Technology, under Grant (KYTZ202139).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, Z.; Chai, Y.; Yin, H.; Tao, S. LPV model based sensor fault diagnosis and isolation for permanent magnet synchronous generator in wind energy conversion systems. Appl. Sci. 2018, 8, 1816. [Google Scholar] [CrossRef]
  2. Fan, S.K.S.; Cheng, C.W.; Tsai, D.M. Fault Diagnosis of Wafer Acceptance Test and Chip Probing between Front-End-of-Line and Back-End-of-Line Processes. IEEE Trans. Autom. Sci. Eng. 2022, 19, 3068. [Google Scholar] [CrossRef]
  3. Sun, Z.; Wang, Y.; Sun, G. Fault Diagnosis of Rotating Machinery Based on Local Centroid Mean Local Fisher Discriminant Analysis. J. Vib. Eng. Technol. 2023, 11, 1417–1441. [Google Scholar] [CrossRef]
  4. Liu, M.; Kong, X.; Luo, Z.Y.L. Novel reduced kernel independent component analysis for process monitoring. Trans. Inst. Meas. Control. 2024, 46, 1374–1387. [Google Scholar] [CrossRef]
  5. Liu, Z.; Wu, D.; Huang, K.; Yang, C.; Gui, W. Fault Diagnosis of Complex Industrial Systems Based on Multi-Granularity Dictionary Learning and Its Application. IEEE Trans. Autom. Sci. Eng. 2024, 21, 192–203. [Google Scholar] [CrossRef]
  6. Tian, J.; Zhang, J.; Jiang, Y.; Wu, S.; Luo, H.; Yin, S. A novel generalized source-free domain adaptation approach for cross-domain industrial fault diagnosis. Reliab. Eng. Syst. Saf. 2024, 243, 1–12. [Google Scholar] [CrossRef]
  7. Zhang, B.; Lin, J.; Miao, Y.; Jiao, J.; Liu, H. Harmonic Sparse Structured Nonnegative Matrix Factorization: A Novel Method for the Separation of Coupled Fault Feature. IEEE Trans. Ind. Inform. 2024, 20, 6209–6221. [Google Scholar] [CrossRef]
  8. Liang, L.; Ding, X.; Wen, H.; Liu, F. Impulsive components separation using minimum-determinant KL-divergence NMF of bi-variable map for bearing diagnosis. Mech. Syst. Signal Process. 2022, 175, 1–23. [Google Scholar] [CrossRef]
  9. Zhou, H.; Yin, H.; Chai, Y. Multi-grained mode partition and robust fault diagnosis for multimode industrial processes. Reliab. Eng. Syst. Saf. 2023, 231, 109011. [Google Scholar] [CrossRef]
  10. Xiu, X.; Fan, J.; Yang, Y.; Liu, W. Fault Detection Using Structured Joint Sparse Nonnegative Matrix Factorization. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  11. Matania, O.; Dattner, I.; Bortman, J.; Kenett, R.S.; Parmet, Y. A systematic literature review of deep learning for vibration-based fault diagnosis of critical rotating machinery: Limitations and challenges. J. Sound Vib. 2024, 590, 118562. [Google Scholar] [CrossRef]
  12. Jalal, M.; Khalil, I.; Haq, A. Deep Learning approaches for visual faults diagnosis of photovoltaic systems: State-of-the-art review. Results Eng. 2024, 23, 102622. [Google Scholar] [CrossRef]
  13. Liu, Y.; Jiang, H.; Yao, R.; Zhu, H. Interpretable data-augmented adversarial variational autoencoder with sequential attention for imbalanced fault diagnosis. J. Manuf. Syst. 2023, 71, 342–359. [Google Scholar] [CrossRef]
  14. Liu, Y.; Jiang, H.; Yao, R.; Zeng, T. Counterfactual-augmented few-shot contrastive learning for machinery intelligent fault diagnosis with limited samples. Mech. Syst. Signal Process. 2024, 216, 111507. [Google Scholar] [CrossRef]
  15. Liu, Y.; Jiang, H.; Liu, C.; Yang, W.; Sun, W. Data augmented wavelet capsule generative adversarial network for rolling bearing fault diagnosis. Knowl. Based Syst. 2022, 252, 109439. [Google Scholar] [CrossRef]
  16. Yao, R.; Jiang, H.; Jiang, W.; Liu, Y.; Dong, Y. Deep discriminative sparse representation learning for machinery fault diagnosis. Eng. Appl. Artif. Intell. 2024, 135, 108836. [Google Scholar] [CrossRef]
  17. Chen, H.; Jiang, B.; Lu, N.; Mao, Z. Deep PCA based real-time incipient fault detection and diagnosis methodology for electrical drive in high-speed trains. IEEE Trans. Veh. Technol. 2018, 67, 4819–4830. [Google Scholar] [CrossRef]
  18. Deng, X.; Tian, X.; Chen, S.; Harris, C.J. Deep principal component analysis based on layerwise feature extraction and its application to nonlinear process monitoring. IEEE Trans. Control. Syst. Technol. 2018, 27, 2526–2540. [Google Scholar] [CrossRef]
  19. Ren, Z.; Zhang, W.; Zhang, Z. A Deep Nonnegative Matrix Factorization Approach via Autoencoder for Nonlinear Fault Detection. IEEE Trans. Ind. Inform. 2019, 16, 5042–5052. [Google Scholar] [CrossRef]
  20. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef]
  21. Yang, J.; Chen, Y.; Zhang, L. An efficient approach for fault detection, isolation, and data recovery of self-validating multifunctional sensors. IEEE Trans. Instrum. Meas. 2017, 66, 543–558. [Google Scholar] [CrossRef]
  22. Fan, S.; Jia, Q.; Cheng, W.S. Safety Monitoring by a Graph-Regularized Semi-Supervised Nonnegative Matrix Factorization with Applications to a Vision-Based Marking Process. IEEE Access 2020, 8, 112278–112286. [Google Scholar] [CrossRef]
  23. Yang, Y.; Ming, A.; Zhang, Y.; Zhu, Y. Discriminative non-negative matrix factorization (DNMF) and its application to the fault diagnosis of diesel engine. Mech. Syst. Signal Process. 2017, 95, 158–171. [Google Scholar] [CrossRef]
  24. Yi, J.; Wu, L.; Zhou, W.; Wei, Z.; He, H.; Yao, L. A Sparse Dimensionality Reduction Approach Based on False Nearest Neighbors for Nonlinear Fault Detection. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 4980–4992. [Google Scholar] [CrossRef]
  25. Ding, C.H.Q.; Li, T.; Jordan, M.I. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 45–55. [Google Scholar] [CrossRef]
  26. Ruiz-Carcel, C.; Cao, Y.; Mba, D.; Lao, L.; Samuel, R.T. Statistical process monitoring of a multiphase flow facility. Control. Eng. Pract. 2015, 42, 74–88. [Google Scholar] [CrossRef]
  27. Chiang, L.H.; Russell, E.L.; Braatz, R.D. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 2000, 50, 243–252. [Google Scholar] [CrossRef]
  28. Balakrishnama, S.; Ganapathiraju, A. Linear discriminant analysis-a brief tutorial. Inst. Signal Inf. Process. 1998, 18, 1–8. [Google Scholar]
  29. Clemmensen, L.; Hastie, T.; Witten, D.; Ersbll, B. Sparse discriminant analysis. Technometrics 2011, 53, 406–413. [Google Scholar] [CrossRef]
  30. Lee, J.-M.; Yoo, C.K.; Choi, S.W.; Vanrolleghem, P.A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234. [Google Scholar] [CrossRef]
  31. Yu, W.; Zhao, C. Sparse exponential discriminant analysis and its application to fault diagnosis. IEEE Trans. Ind. Electron. 2017, 65, 5931–5940. [Google Scholar] [CrossRef]
  32. Lee, H.; Cichocki, A.; Choi, S. Kernel nonnegative matrix factorization for spectral EEG feature extraction. Neurocomputing 2009, 72, 3182–3190. [Google Scholar] [CrossRef]
Figure 1. Illustration of the hypothesized characteristics of the shallow model and the hierarchical model.
Figure 2. Illustration of the HMF-based fault diagnosis.
Figure 3. Diagram of the Multiphase Flow process [26].
Figure 4. The confusion matrices of different methods.
Figure 5. Diagnosis performance of deep models and our models in terms of TPR.
Figure 6. Average time cost comparison of NMF, HMF, NHMF and DBN.
Figure 7. The performance heat maps when models adopt different layer sizes: (a) HMF and (b) NHMF.
Figure 8. The convergence curves of (a) HMF and (b) NHMF.
Figure 9. The weight vectors of Y. (a) NMF, (b) HMF and (c) NHMF.
Table 1. The description of the MPF process.

| Class | Type | Description | #Samples |
| --- | --- | --- | --- |
| #1 | Normal condition | - | 4000 |
| #2 | Fault | Air line blockage | 1000 |
| #3 | Fault | Water line blockage | 1000 |
| #4 | Fault | Top separator input blockage | 1000 |
| #5 | Fault | Open direct bypass | 1000 |
Table 2. The diagnosis performance comparison in terms of TPR and PPV.

| Type | Method | avg.TPR | avg.PPV | Type | Method | avg.TPR | avg.PPV |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Linear | PCA(5) | 71.98% | 79.95% | Nonlinear | KPCA(Gaussian) | 87.40% | 87.21% |
| | LDA(4) | 82.23% | 83.21% | | SEDA(Default) | 88.40% | 88.05% |
| | SDA(4) | 85.88% | 86.41% | | KNMF(Gaussian) | 85.70% | 86.05% |
| | NMF(5) | 82.80% | 83.29% | | KNMF(Polynomial) | 86.90% | 88.25% |
| | HMF(19-12-5) | 92.23% | 92.34% | | NHMF(19-12-5) | 94.48% | 94.45% |
Table 3. The diagnosis performance comparison of deep models.

| | SAE | DBN | HMF | NHMF |
| --- | --- | --- | --- | --- |
| avg. TPR | 88.49% | 89.88% | 92.23% | 94.48% |
| avg. PPV | 88.45% | 89.98% | 92.34% | 94.45% |
Table 4. The performance comparison when NHMF adopts different layers (TPR/PPV per class).

| Class | NMF Shallow (5) | NHMF 2-Layer (19-5) | NHMF 3-Layer (19-12-5) | NHMF 4-Layer (20-15-10-5) | NHMF 5-Layer (22-20-15-10-5) |
| --- | --- | --- | --- | --- | --- |
| #1 | 79.55% / 85.26% | 85.95% / 90.57% | 90.85% / 97.95% | 93.05% / 96.18% | 92.20% / 95.20% |
| #2 | 53.60% / 54.03% | 94.00% / 81.74% | 98.00% / 87.50% | 94.80% / 84.79% | 88.20% / 84.48% |
| #3 | 100.00% / 74.85% | 72.20% / 68.89% | 94.80% / 81.87% | 91.20% / 91.38% | 95.20% / 89.81% |
| #4 | 91.40% / 96.41% | 98.00% / 97.42% | 99.60% / 98.81% | 99.20% / 97.83% | 98.00% / 96.27% |
| #5 | 99.20% / 100.00% | 100.00% / 100.00% | 100.00% / 99.60% | 99.80% / 99.80% | 100.00% / 99.60% |
| Avg. | 82.80% / 83.29% | 88.50% / 88.79% | 94.48% / 94.95% | 94.65% / 94.81% | 93.78% / 93.87% |
Table 5. The performance comparison when HMF adopts different layers (TPR/PPV per class).

| Class | NMF Shallow (5) | HMF 2-Layer (19-5) | HMF 3-Layer (19-12-5) | HMF 4-Layer (20-15-10-5) | HMF 5-Layer (22-20-15-10-5) |
| --- | --- | --- | --- | --- | --- |
| #1 | 79.55% / 85.26% | 89.00% / 91.24% | 91.65% / 92.81% | 88.60% / 88.78% | 89.50% / 92.55% |
| #2 | 53.60% / 54.03% | 74.00% / 71.43% | 84.80% / 79.25% | 79.20% / 74.86% | 77.80% / 75.68% |
| #3 | 100.00% / 74.85% | 96.20% / 89.91% | 90.00% / 93.56% | 79.40% / 80.36% | 97.20% / 87.88% |
| #4 | 91.40% / 96.41% | 95.20% / 93.15% | 96.40% / 94.88% | 96.60% / 96.60% | 95.20% / 90.32% |
| #5 | 99.20% / 100.00% | 96.80% / 99.79% | 100.00% / 99.80% | 96.00% / 99.79% | 93.40% / 98.94% |
| Avg. | 82.80% / 83.29% | 89.78% / 89.90% | 92.23% / 92.34% | 88.20% / 88.34% | 90.20% / 90.38% |
