Article

Fault Detection and Isolation for Time-Varying Processes Using Neural-Based Principal Component Analysis

1 Department of Chemical Engineering, The University of Manchester, Manchester M13 9PL, UK
2 Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, 43007 Tarragona, Spain
* Author to whom correspondence should be addressed.
Processes 2024, 12(6), 1218; https://doi.org/10.3390/pr12061218
Submission received: 13 May 2024 / Revised: 1 June 2024 / Accepted: 9 June 2024 / Published: 14 June 2024
(This article belongs to the Section Process Control and Monitoring)

Abstract

This paper introduces a new adaptive framework for fault detection and diagnosis using neural-based PCA. The framework addresses the limitations of traditional PCA in handling time-varying processes: it updates the correlation matrix recursively, allowing the model to adapt to the natural time-varying behavior of processes, and it recursively determines the number of principal components and the confidence limits for three process monitoring statistics (T2, Q, and the combined index φ). To diagnose faults, four types of contribution plots are used: complete decomposition contributions (CDCs), partial decomposition contributions (PDCs), diagonal-based contributions (DBCs), and reconstruction-based contributions (RBCs). The evaluation through three simulation studies, namely a numerical example, the continuous stirred tank reactor (CSTR) process, and water resource recovery facilities (WRRFs), demonstrates that the combined statistic provides superior fault detection and diagnosis performance compared with the individual statistics. Additionally, the study of the isolation methods shows that no single method can definitively be claimed as superior. Overall, our study highlights the strength and versatility of neural-based PCA for detecting and diagnosing faults in dynamic processes.

1. Introduction

Although fault detection and diagnosis (FDD) has been a well-established field for several decades, it continues to receive significant attention because of the ever-increasing complexity of industrial processes. Implementing a proper process monitoring framework can improve product quality, prevent wastage of resources, and enhance process safety. Faults are the unpermitted deviation of at least one process feature from its expected value [1]. With the vast amount of sensor measurements in industrial processes, it is likely impossible for plant operators to monitor all measurements and detect faults manually. Therefore, developing an automated process monitoring framework is imperative for such industries [2,3].
Fault detection frameworks are divided into three main categories: model-based methods, knowledge-based methods, and data-driven methods [4,5]. Because of the complex nature of modern industrial processes, applying the first two categories may become challenging because acquiring in-depth knowledge of a complex system’s behavior takes tremendous time and effort and is sometimes impossible. The lack of an understanding of system behavior, the availability of a massive amount of sensor measurements, digitalization, and the emergence of the Fourth Industrial Revolution are among the most important driving forces for employing data-driven methods, which are designed purely from historical data [2,6]. Data-driven methods are divided into two main categories: supervised and unsupervised approaches. In the supervised approach, a model is developed using labeled data that contain normal and faulty samples, while in the unsupervised approach, the data do not have any labels. In the past few years, different unsupervised methods such as principal component analysis (PCA), partial least squares (PLS), Fisher discriminant analysis (FDA), independent component analysis (ICA), and canonical variate analysis (CVA) have been used for process monitoring [7]. PCA is widely recognized as an effective unsupervised technique for process monitoring. It transforms a high-dimensional dataset containing correlated variables into a collection of linearly uncorrelated variables termed principal components (PCs), which retain most of the information in the original data. The first step in developing a PCA-based fault detection framework is collecting process data under normal operation. Then, two common monitoring statistics (Hotelling’s T2 and Q) are calculated together with their corresponding control limits [8]. If one of these two statistics exceeds its control limit, it indicates the presence of a fault in the process. Although numerous successful applications of the conventional PCA method have been reported in the literature, it also has a few significant drawbacks. Perhaps the most significant one is its inability to adapt to the natural slow time-varying changes in processes. Therefore, if applied to processes with time-varying behavior, it interprets the natural changes in the process as faults, which may lead to excessive rates of false alarms and missed detections. Catalyst deactivation, heat exchanger fouling, equipment and sensor aging, and process time-drifting are some examples of natural changes in processes [9]. Because of this problem, developing an adaptive PCA fault detection framework that represents the current condition of the process is essential. Two important properties of a PCA fault detection framework for time-varying processes are the speed of adaptation and the computational complexity. The speed of adaptation describes how fast the model adapts to the natural time-varying changes in the process, and the computational complexity is the time that the algorithm spends to finish one iteration [10]. These two properties depend on the method used for the eigendecomposition of the sample covariance matrix during the adaptation process. In adaptive PCA fault detection, it is computationally expensive to recalculate the eigendecomposition for each new data sample.
Therefore, seeking less computationally expensive PCA fault detection frameworks that employ new approaches for estimating the eigendecomposition is an ongoing research topic in this field [11]. The commonly used adaptive frameworks for PCA fault detection are moving window (MW) and recursive. In MW, as the window moves forward, the new process model is generated by limiting the effect of older observations and including the newest one to estimate the mean and covariance. The size of the window determines the adaptation speed of the model; therefore, the computational complexity of MW can increase severely when a large window size is chosen to capture sufficient process variation. On the other hand, choosing a smaller window size to reduce the computational complexity may not properly capture the underlying relationships among the process variables because of the fewer observations inside the window. Therefore, the trade-off between computational complexity and capturing sufficient information should be considered when choosing the window size. Unlike MW, in the recursive framework, the model is updated as a new sample becomes available without discarding the older samples. This can significantly reduce the computational complexity by updating the previous model rather than building a new model from the original data. In addition, using a forgetting factor in recursive techniques allows for down-weighting the older samples in favor of newer ones that accurately reflect the current process operation [10]. The forgetting factor needs to be determined according to the speed of process changes; it should be larger for a process with rapid change, whereas a smaller forgetting factor should be applied when the change is slow. The computational complexity of an algorithm is commonly expressed as a count of floating-point operations, or flops. A flop is a basic computation unit defined as one addition, subtraction, multiplication, or division of floating-point numbers [12]. As discussed earlier, many researchers have developed fault detection frameworks by applying different eigendecomposition algorithms with various computational complexities; some of them are summarized in the following paragraph. Assuming that m and q are the numbers of process variables and retained PCs, respectively, Elshenawy et al. presented two recursive fault detection frameworks based on the first-order perturbation analysis (FOP) and data projection method (DPM) techniques. The computational complexities of the FOP and DPM techniques were O(m2) and O(mq2), respectively. The effectiveness of the presented fault detection frameworks was evaluated by monitoring a non-isothermal continuous stirred tank reactor (CSTR) process [13]. Haimi et al. investigated a PCA fault detection framework for a large-scale municipal water resource recovery facility (WRRF), employing two types (adaptive and fixed window lengths) of moving-window PCA (MWPCA). For both types of MWPCA, the computational complexity was O(m3). The study found that monitoring systems with appropriate settings were able to effectively detect shifts and spikes in measurements and process conditions [14]. Additionally, Kazemi et al. created a comprehensive process monitoring framework using incremental principal component analysis (IPCA). In their proposed framework, the eigenspace is updated by incrementing new data to the PCA model with a computational complexity of O(mq2).
The performance of the proposed fault detection framework was evaluated using Benchmark Simulation Model No. 2 (BSM2), developed by the International Water Association (IWA). The simulation results showed that the proposed framework could correctly isolate sensor faults even when these faults were relatively small [6]. In another work, Elshenawy et al. introduced a recursive fault detection framework based on an orthogonal iteration method, the fast data projection method (FDPM). It was claimed that the proposed method had the lowest computational complexity, O(mq), among the methods commonly used for eigendecomposition of the covariance matrix. The effectiveness of the proposed recursive fault detection framework was demonstrated by three simulation studies, including a numerical example, the CSTR process, and the Tennessee Eastman process. The simulation results showed the reliability of the proposed adaptive fault detection method [15]. Chakour et al. developed an adaptive CIPCA (complete information PCA) method for handling complex data, particularly time-varying and interval data. The proposed approach, RCIPCA, recursively updates the CIPCA model using weighted mean and covariance matrix formulas for interval-valued data. The method was validated on a wind turbine benchmark, demonstrating its effectiveness in the online monitoring of uncertain systems. However, the authors did not address the complexity of the calculations involved in updating the CIPCA model [16]. Recently, Qi et al. introduced a new technique for identifying faults in non-stationary processes. This method utilizes data-driven Dynamic Surrogate Models (DSMs) to predict process dynamic behaviors, allowing for robust fault detection. A crucial aspect of this method is the calculation of residuals between the model predictions and the actual process measurements, which is essential for identifying significant process faults. To accomplish this, a detector based on either a One-Class Support Vector Machine (OC-SVM) or an Autoencoder (AE) was employed, with the OC-SVM detector demonstrating superiority over the AE detector [17]. While this method has numerous advantages, it also presents challenges. For instance, creating highly accurate models for large-scale processes and establishing the key performance indicator for generating residuals can be difficult. In such cases, an unsupervised technique such as PCA might be a better approach.
Adaptive PCA-based fault detection methods face a significant challenge in computational complexity. As can be seen, most of the current techniques are computationally expensive, with a complexity of O(m2) or O(m3), except for FDPM, which offers a complexity of O(mq). Our proposed method exhibits the same favorable computational complexity of O(mq), highlighting its potential for efficient fault detection.
Once a fault is detected, it is important to isolate the problematic variables in order to determine the underlying root cause precisely. The typical approach for this is the use of contribution plots [18]. These plots quantify the impact of each variable on monitoring statistics such as T2 and Q, and the variable with the highest contribution is usually considered the primary factor responsible for the fault. In real-time operation of the fault detection framework, the contribution plots are continually updated for each sample based on the current eigendecomposition.
This paper introduces a new recursive, low-computational-cost fault detection and isolation framework based on neural network-based PCA (NN-based PCA). NN-based PCA is a subset of adaptive PCA in which the principal components are updated as each new sample becomes available. Several types of NN-based PCA algorithms have been developed by drawing inspiration from the Hebbian and Oja learning rules [19,20,21]. These rules have a remarkable similarity to neural network algorithms, where each principal component is analogous to a neuron defined by its input weights. The Hebbian learning rule determines weight adjustments based on the correlation between presynaptic and postsynaptic signals [20]. Many other algorithms, like Oja’s rule, are built upon the normalized version of the Hebbian learning rule; Oja introduced a weight decay term into the normalized Hebbian rule to prevent instability [22]. The adaptive fault detection framework proposed in this paper utilizes Oja’s rule, which has a complexity of O(mq) [23]. The advantages of this PCA method are its simplicity of implementation, reasonable complexity, stability, and fast convergence toward orthonormality. The main objective of this paper is to provide a complete, novel, real-time adaptive fault detection and isolation framework based on low-computational-cost O(mq) neural-based PCA.
The paper is structured as follows: Section 2 offers a concise overview of conventional PCA-based fault detection. Section 3 outlines the architecture of the neural-based PCA. In Section 4, adaptive fault isolation methods are presented. The various stages for implementing the proposed framework are detailed in Section 5. Section 6 examines the performance of the proposed fault detection and diagnosis framework through a numerical example, as well as simulations of the CSTR process and the WRRFs. Section 6 also provides a discussion of the results. Finally, Section 7 offers conclusions drawn from this study.

2. Conventional PCA

The PCA method is a multivariate statistical technique commonly used for reducing the dimensionality of data in an interpretable way while preserving most of the information. In other words, PCA transforms a dataset containing potentially correlated variables into a collection of linearly uncorrelated variables known as principal components (PCs) [24]. Let X∈ℜN×m, with N time samples and m sensor measurements, be a historical data matrix collected under normal operating conditions. To avoid scaling problems, X must be scaled to zero mean and unit variance. The PCA model can be expressed as:
$$X = TP^{T} + E$$
where E∈ℜN×m is the residual matrix and T∈ℜN×m and P∈ℜm×m are the score matrix (whose columns are the PCs) and the loading matrix (whose columns are eigenvectors), respectively. The columns of the score matrix are orthogonal and are generated by projecting X onto the loading matrix (T = XP).
To derive the PCA model, the covariance matrix S∈ℜm×m is decomposed by singular value decomposition into:
$$S = \frac{1}{N-1}\,X^{T}X = P\Lambda P^{T}$$
where Λ∈ℜm×m is a diagonal matrix containing the eigenvalues, Λ = diag(λ1, λ2, …, λm), arranged in descending order. Usually, the first β PCs explain most of the variation in the dataset; therefore, P∈ℜm×β and the corresponding eigenvalue matrix is Λ∈ℜβ×β. In conventional PCA, the number of retained PCs is calculated once and remains constant over time. For process fault detection, two essential statistics, Hotelling’s T2 and Q, are estimated. Hotelling’s T2 measures the variation captured by the PCA model, while the non-captured variation is monitored using the Q statistic [3]. These statistics are given by:
$$T^{2} = \left(\frac{x-\mu}{\delta}\right)^{T} P\Lambda^{-1}P^{T}\left(\frac{x-\mu}{\delta}\right)$$
$$Q = \left(\frac{x-\mu}{\delta}\right)^{T}\left(I-PP^{T}\right)\left(\frac{x-\mu}{\delta}\right)$$
where x∈ℜm is the vector of sensor measurements (data stream), μ and δ are the mean and standard deviation used to scale x, and I∈ℜm×m is the identity matrix.
The control limit for Hotelling’s T2 index is approximated with the chi-squared distribution χ2 with β degrees of freedom and significance level α:
$$\sigma^{2} = \chi^{2}_{\alpha,\beta}$$
and for the Q statistic, the control limit is given by
$$\gamma^{2} = g_{Q}\,\chi^{2}_{\alpha,h_{Q}}$$
where
$$\theta_{i} = \sum_{j=\beta+1}^{m}\lambda_{j}^{i}, \qquad i = 1,2$$
$$g_{Q} = \frac{\theta_{2}}{\theta_{1}}$$
$$h_{Q} = \frac{\theta_{1}^{2}}{\theta_{2}}$$
Yue and Qin suggested that the T2 and Q statistics behave in a complementary manner and therefore combined them into a single statistic to simplify the fault detection task [25]. The proposed combined statistic φ for fault detection is estimated as follows:
$$\varphi = \frac{T^{2}}{\sigma^{2}} + \frac{Q}{\gamma^{2}} = \left(\frac{x-\mu}{\delta}\right)^{T}\Phi\left(\frac{x-\mu}{\delta}\right)$$
where
$$\Phi = \frac{P\Lambda^{-1}P^{T}}{\sigma^{2}} + \frac{I-PP^{T}}{\gamma^{2}}$$
and the φ control limit can be calculated as follows:
$$\eta^{2} = g_{\varphi}\,\chi^{2}_{\alpha,h_{\varphi}}$$
where
$$g_{\varphi} = \frac{\beta/\sigma^{4} + \theta_{2}/\gamma^{4}}{\beta/\sigma^{2} + \theta_{1}/\gamma^{2}}$$
$$h_{\varphi} = \frac{\left(\beta/\sigma^{2} + \theta_{1}/\gamma^{2}\right)^{2}}{\beta/\sigma^{4} + \theta_{2}/\gamma^{4}}$$
A fault can be detected by plotting these three statistics together with their corresponding control limits. If any of the statistics exceeds its control limit, a fault has occurred; otherwise, the process is fault-free.
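To make the conventional computation concrete, the following is a minimal sketch of the offline monitoring statistics and control limits described above, written in Python with NumPy and SciPy. The function names (fit_pca_monitor, monitor) and the use of chi2.ppf for the χ2 quantiles are illustrative choices, not part of the original framework.

```python
import numpy as np
from scipy.stats import chi2

def fit_pca_monitor(X, cpv=0.95, alpha=0.99):
    """Offline PCA monitoring model from normal-operation data X (N x m)."""
    mu, delta = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / delta
    S = Xs.T @ Xs / (X.shape[0] - 1)              # correlation matrix of scaled data
    lam, P = np.linalg.eigh(S)                    # eigendecomposition S = P Lam P^T
    idx = np.argsort(lam)[::-1]
    lam, P = lam[idx], P[:, idx]
    beta = int(np.searchsorted(np.cumsum(lam) / lam.sum(), cpv) + 1)  # retained PCs
    th1, th2 = lam[beta:].sum(), (lam[beta:] ** 2).sum()  # theta_1, theta_2 (assumes beta < m)
    sigma2 = chi2.ppf(alpha, beta)                                    # T^2 limit
    gamma2 = (th2 / th1) * chi2.ppf(alpha, th1 ** 2 / th2)            # Q limit
    g_phi = (beta / sigma2 ** 2 + th2 / gamma2 ** 2) / (beta / sigma2 + th1 / gamma2)
    h_phi = (beta / sigma2 + th1 / gamma2) ** 2 / (beta / sigma2 ** 2 + th2 / gamma2 ** 2)
    eta2 = g_phi * chi2.ppf(alpha, h_phi)                             # phi limit
    return dict(mu=mu, delta=delta, P=P[:, :beta], lam=lam[:beta],
                sigma2=sigma2, gamma2=gamma2, eta2=eta2)

def monitor(model, x):
    """Return T^2, Q and phi for one new measurement vector x (length m)."""
    xs = (x - model["mu"]) / model["delta"]
    P, lam = model["P"], model["lam"]
    t2 = xs @ P @ np.diag(1.0 / lam) @ P.T @ xs
    q = xs @ (np.eye(len(xs)) - P @ P.T) @ xs
    phi = t2 / model["sigma2"] + q / model["gamma2"]
    return t2, q, phi
```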

3. Neural Network-Based PCA

For time-varying processes, the covariance matrix can vary over a period of time; therefore, if conventional PCA is employed for fault detection in such processes, distinguishing between faults and typical time-varying characteristics resulting from external disturbances and operational changes becomes challenging. Thus, an adaptive fault detection framework must be developed to cope with this issue. Adaptive PCA refers to a subset of PCAs that can dynamically update its parameters without requiring prior knowledge of the data’s history. NN-based PCA is a type of adaptive PCA in which the eigenvalues (λ1, λ2,…, λm) and eigenvectors (u1,…, um) are estimated recursively from the input data stream xk∈ℜm (with k being discrete time). Usually, the sequence of xk is assumed stationary, but the proposed algorithm can also be used as an adaptive PCA for non-stationary (time-varying) input streams. In adaptive fault detection, the mean μ and the standard deviation δ must be re-estimated for each data sample according to the following equations [26]:
$$\mu_{k} = (1-f)\,\mu_{k-1} + f\,x_{k}$$
$$\delta_{k} = \sqrt{(1-f)\,\delta_{k-1}^{2} + f\,(x_{k}-\mu_{k})^{2}}$$
where 0 ≤ f &lt; 1 is the forgetting factor.
By using the above equations, x can be scaled to zero mean and unit variance according to:
$$\hat{x}_{k} = \frac{x_{k}-\mu_{k}}{\delta_{k}}$$
where x ^ k is the standardized sample.
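As a small sketch of Equations (15)–(17), the online scaling step can be written as below; the function name and the elementwise array handling are assumptions made for illustration.

```python
import numpy as np

def recursive_scale(x, mu_prev, delta_prev, f):
    """Recursively update the mean/std with forgetting factor f and scale sample x."""
    mu = (1.0 - f) * mu_prev + f * x                          # Equation (15)
    delta = np.sqrt((1.0 - f) * delta_prev ** 2 + f * (x - mu) ** 2)  # Equation (16)
    x_hat = (x - mu) / delta                                  # Equation (17)
    return x_hat, mu, delta
```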
Oja suggested the following recursive PCA expression [26]:
Let $U_{k} = (u_{1,k},\ldots,u_{\beta_{k},k})$ be an $m \times \beta_{k}$ matrix ($\beta_{k} < m$) whose columns are the vectors $u_{i,k}$. The eigenvectors can then be updated as
$$\tilde{U}_{k} = U_{k-1} + g\,\hat{x}_{k}\hat{x}_{k}^{T}U_{k-1}$$
$$U_{k} = \mathrm{Orthonormalization}\left(\tilde{U}_{k}\right)$$
where the columns of $U_{k}\in\Re^{m\times\beta_{k}}$ are the eigenvectors of the $\beta_{k}$ retained PCs at time k, and $g$ is the gain parameter.
In contrast to conventional PCA, in the recursive PCA framework the number of retained PCs must be estimated over time (βk). Equation (18) can be orthonormalized using different approaches. For example, Oja employed the Gram–Schmidt procedure and called the combination of Equations (18) and (19) the Stochastic Gradient Ascent (SGA) algorithm. Performing the Gram–Schmidt orthonormalization on the columns of matrix Uk yields the following result:
Lemma
For small $g$, the j-th column $u_{j,k}$ of matrix $U_{k}$ in Equations (18) and (19) satisfies
$$u_{j,k} = u_{j,k-1} + g\left(\hat{x}_{k}^{T}u_{j,k-1}\right)\left[\hat{x}_{k} - \left(\hat{x}_{k}^{T}u_{j,k-1}\right)u_{j,k-1} - 2\sum_{i=1}^{j-1}\left(\hat{x}_{k}^{T}u_{i,k-1}\right)u_{i,k-1}\right] + O(\tau^{2}), \qquad j = 1,\ldots,\beta_{k}$$
Equation (20) is very similar to the implementation of a one-layer neural network of $\beta_{k}$ parallel units, with $\hat{x}_{k}$ as the input vector and $u_{j,k-1}$ as the weight vector of the j-th unit. The output of unit j can be expressed as
$$y_{j,k} = \hat{x}_{k}^{T}u_{j,k-1}$$
The computational speed of Equation (20) can be increased by applying the first-order approximation of the Gram–Schmidt orthonormalization (i.e., omitting the terms of order O(τ2)). This allows Equation (20) to be approximated as follows:
$$u_{j,k} = u_{j,k-1} + g\,y_{j,k}\left[\hat{x}_{k} - y_{j,k}u_{j,k-1} - 2\sum_{i=1}^{j-1}y_{i,k}u_{i,k-1}\right], \qquad j = 1,\ldots,\beta_{k}$$
This implementation of the SGA algorithm can be interpreted as NN-based PCA [23]. By considering j = 1, Equation (22) is converted to Oja’s learning rule [27].
Using the SGA algorithm allows for consistent recursive estimation of the eigenvalues λj,k corresponding to the eigenvectors $u_{j,k}$ through the following equation [28]:
$$\lambda_{j,k} = \lambda_{j,k-1} + g\left(y_{j,k}^{2} - \lambda_{j,k-1}\right)$$
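A minimal sketch of the first-order SGA update of Equations (22) and (23) is given below, assuming the scaled sample x_hat from Equation (17). Tracking all m eigenpairs (rather than only the retained ones), so that θ1,k and θ2,k can later be evaluated, is a design choice of this sketch rather than a requirement of the algorithm.

```python
import numpy as np

def sga_update(x_hat, U, lam, g):
    """One SGA step for eigenvectors U (m x beta) and eigenvalues lam (beta,)."""
    beta = U.shape[1]
    y = x_hat @ U                      # unit outputs y_j = x_hat^T u_{j,k-1}
    U_new = U.copy()
    for j in range(beta):
        # first-order Gram-Schmidt correction: current unit plus twice the previous ones
        corr = y[j] * U[:, j] + 2.0 * (U[:, :j] @ y[:j])
        U_new[:, j] = U[:, j] + g * y[j] * (x_hat - corr)     # Equation (22)
    lam_new = lam + g * (y ** 2 - lam)                        # Equation (23)
    return U_new, lam_new
```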
As mentioned earlier, for the recursive fault detection framework, the monitoring statistics need to be updated for each data sample. Therefore, the monitoring statistics and their control limits defined in Section 2 should be modified to the following:
For Hotelling’s T2 statistic:
$$T_{k}^{2} = \hat{x}_{k}^{T}U_{k}\Lambda_{k}^{-1}U_{k}^{T}\hat{x}_{k}$$
$$\sigma_{k}^{2} = \chi^{2}_{\alpha,\beta_{k}}$$
for the Q statistic:
$$Q_{k} = \hat{x}_{k}^{T}\left(I - U_{k}U_{k}^{T}\right)\hat{x}_{k}$$
$$\gamma_{k}^{2} = g_{k}^{Q}\,\chi^{2}_{\alpha,h_{k}^{Q}}$$
where
$$\theta_{i,k} = \sum_{j=\beta_{k}+1}^{m}\lambda_{j,k}^{i}, \qquad i = 1,2$$
$$g_{k}^{Q} = \frac{\theta_{2,k}}{\theta_{1,k}}$$
$$h_{k}^{Q} = \frac{\theta_{1,k}^{2}}{\theta_{2,k}}$$
and for the φ statistic:
$$\varphi_{k} = \frac{T_{k}^{2}}{\sigma_{k}^{2}} + \frac{Q_{k}}{\gamma_{k}^{2}} = \hat{x}_{k}^{T}\Phi_{k}\hat{x}_{k}$$
where
$$\Phi_{k} = \frac{U_{k}\Lambda_{k}^{-1}U_{k}^{T}}{\sigma_{k}^{2}} + \frac{I - U_{k}U_{k}^{T}}{\gamma_{k}^{2}}$$
$$\eta_{k}^{2} = g_{k}^{\varphi}\,\chi^{2}_{\alpha,h_{k}^{\varphi}}$$
where
$$g_{k}^{\varphi} = \frac{\beta_{k}/\sigma_{k}^{4} + \theta_{2,k}/\gamma_{k}^{4}}{\beta_{k}/\sigma_{k}^{2} + \theta_{1,k}/\gamma_{k}^{2}}$$
$$h_{k}^{\varphi} = \frac{\left(\beta_{k}/\sigma_{k}^{2} + \theta_{1,k}/\gamma_{k}^{2}\right)^{2}}{\beta_{k}/\sigma_{k}^{4} + \theta_{2,k}/\gamma_{k}^{4}}$$
To simplify the notation, for the rest of this paper, Ξk is defined as in Table 1.

Determination of the Number of PCs

As stated previously, for the recursive PCA framework, the number of retained PCs needs to be estimated over time. There are many ways to determine the number of PCs [29]. In this study, we utilized the cumulative percent variance (CPV), calculated as follows:
$$CPV(\beta_{k}) = \frac{\sum_{j=1}^{\beta_{k}}\lambda_{j}}{\sum_{j=1}^{m}\lambda_{j}} \times 100\%$$
where βk represents the number of selected PCs. The number of PCs is taken as the smallest βk for which the CPV attains a pre-established threshold, such as 95%.
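A small sketch of the CPV rule in Equation (35) is shown below; sorting the recursively estimated eigenvalues before accumulating them is an assumption of this sketch, added to guard against minor ordering drift in the online estimates.

```python
import numpy as np

def number_of_pcs(lam, threshold=0.95):
    """Smallest beta_k whose cumulative percent variance reaches the threshold."""
    cpv = np.cumsum(np.sort(lam)[::-1]) / np.sum(lam)   # cumulative fraction of variance
    return int(np.searchsorted(cpv, threshold) + 1)
```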

4. Fault Isolation

After detecting a fault, the subsequent step involves determining its primary cause. Among the most widely used approaches for fault diagnosis is the use of contribution plots. This method entails calculating the contribution of each variable to the fault; the variables exhibiting the highest contributions are likely to be the faulty ones. Since there is no single way of decomposing the fault detection statistics, several techniques have been proposed for fault isolation [30]. In this study, we implement and compare four common fault isolation methods: complete decomposition contributions (CDCs), partial decomposition contributions (PDCs), diagonal-based contributions (DBCs), and reconstruction-based contributions (RBCs).

4.1. Complete Decomposition Contributions (CDCs)

In this approach, a fault detection index is decomposed into the sum of individual variable contributions. The CDC is widely used in industry and was first presented for the Q statistic by Miller et al. [31]. Later, CDC-based diagnosis methods for the T2 and φ statistics were introduced by Wise et al. and Qin et al., respectively [30,32]. The CDC is calculated according to the following relation:
$$CDC_{i,k}^{\mathrm{index}} = \left(\xi_{i}^{T}\,\Xi_{k}^{0.5}\,\hat{x}_{k}\right)^{2}$$
where Ξk is calculated recursively according to Table 1 for the monitoring statistics $T_{k}^{2}$, $Q_{k}$, and $\varphi_{k}$, and $\xi_{i}$ is the i-th column of the identity matrix. It should be noted that $CDC_{i,k}^{\mathrm{index}}$ is always positive.

4.2. Partial Decomposition Contributions (PDCs)

Unlike the CDC, the PDC partially decomposes a fault detection statistic as the summation of variable contributions. This method was first introduced for the T2 statistic by Nomikos and MacGregor [18]. Later, Alcala and Qin proposed the PDC method for both the Q and φ statistics [30]. The PDC is calculated according to the following relation:
$$PDC_{i,k}^{\mathrm{index}} = \hat{x}_{k}^{T}\,\Xi_{k}\,\xi_{i}\xi_{i}^{T}\,\hat{x}_{k}$$
The PDC can be negative, as shown in the study by Westerhuis et al. [33].

4.3. Diagonal-Based Contributions (DBCs)

In the DBC method, the correlations among variables are ignored. While ignoring the correlations between variables is not recommended for fault detection, it can be helpful in contribution analysis for fault diagnosis. The DBC method was introduced for T2 by Qin and Li [34], for Q by Alcala and Qin [35], and for the φ statistic by Cherry and Qin [36]. The general form of the DBC equation can be written as follows:
$$DBC_{i,k}^{\mathrm{index}} = \hat{x}_{k}^{T}\,\xi_{i}\xi_{i}^{T}\,\Xi_{k}\,\xi_{i}\xi_{i}^{T}\,\hat{x}_{k}$$
According to the above relation, DBC values can be negative as well.

4.4. Reconstruction-Based Contributions (RBCs)

The RBC method uses the amount of reconstruction of monitoring statistics along a variable direction as the contribution of that variable to the monitoring statistics. RBC values are always positive. Alcala and Qin introduced this method for Hotelling’s T2, Q, and φ statistics [37]. The general form of the RBC equation can be written as follows
$$RBC_{i,k}^{\mathrm{index}} = \frac{\left(\xi_{i}^{T}\,\Xi_{k}\,\hat{x}_{k}\right)^{2}}{\xi_{i}^{T}\,\Xi_{k}\,\xi_{i}}$$
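The four contribution measures in Equations (36)–(39) share the same structure and differ only in the index-specific matrix Ξk from Table 1. A minimal sketch is given below; the helper names and the eigenvalue-based matrix square root are illustrative choices, not the only possible implementation.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def index_matrices(U, lam, sigma2, gamma2):
    """Index-specific matrices Xi_k (Table 1) for T^2, Q and the combined phi."""
    m = U.shape[0]
    D_T2 = U @ np.diag(1.0 / lam) @ U.T
    D_Q = np.eye(m) - U @ U.T
    D_phi = D_T2 / sigma2 + D_Q / gamma2
    return {"T2": D_T2, "Q": D_Q, "phi": D_phi}

def contributions(xs, Xi):
    """CDC, PDC, DBC and RBC of every variable for one index matrix Xi (m x m)."""
    cdc = (psd_sqrt(Xi) @ xs) ** 2            # (xi_i^T Xi^0.5 x)^2
    pdc = xs * (Xi @ xs)                      # x^T Xi xi_i xi_i^T x
    dbc = np.diag(Xi) * xs ** 2               # x^T xi_i xi_i^T Xi xi_i xi_i^T x
    rbc = (Xi @ xs) ** 2 / np.diag(Xi)        # (xi_i^T Xi x)^2 / (xi_i^T Xi xi_i)
    return cdc, pdc, dbc, rbc

# example usage (illustrative): contributions to the combined phi index
# cdc, pdc, dbc, rbc = contributions(x_hat, index_matrices(U, lam, sigma2, gamma2)["phi"])
```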

5. Adaptive Fault Detection and Isolation

This section outlines the comprehensive structure of the adaptive fault detection and isolation approach. The proposed framework comprises two primary stages as follows: (1) offline model training and (2) online model updating. The subsequent steps provide an overview of the entire process.
Offline training
  • Collect the training data matrix X∈ℜN×m and normalize it to zero mean and unit variance using Equations (15)–(17).
  • Utilize Equation (2) to compute S and its corresponding eigenpairs $u_{j,k-1}$ and $\lambda_{j,k-1}$.
  • Determine the number of retained principal components (PCs), $\beta_{k-1}$, by applying Equation (35).
  • Calculate the control limits of the fault detection statistics $\sigma_{k-1}^{2}$, $\gamma_{k-1}^{2}$, and $\eta_{k-1}^{2}$ using Equations (25), (27), and (32).
Online monitoring
  • Collect a new data vector xk and normalize it to zero mean and unit variance using Equations (15)–(17).
  • Calculate the monitoring statistics $T_{k}^{2}$, $Q_{k}$, and $\varphi_{k}$ using Equations (24), (26), and (31).
  • If the values of the monitoring statistics are below their corresponding control limits ($\sigma_{k-1}^{2}$, $\gamma_{k-1}^{2}$, and $\eta_{k-1}^{2}$), the process status is normal. The updating procedure continues as follows:
    (i) Update $\mu_{k}$ and $\delta_{k}$ using Equations (15) and (16).
    (ii) Calculate the updated eigenpairs $u_{j,k}$ and $\lambda_{j,k}$ using Equations (22) and (23).
    (iii) Determine the number of retained PCs, $\beta_{k}$, using Equation (35).
    (iv) Update the monitoring statistics control limits using Equations (25), (27), and (32).
  • Return to step one.
  • If the values of the monitoring statistics exceed their corresponding control limits, indicating a faulty process status, the updating procedure is halted. The fault isolation procedure is initiated to identify the process variables responsible for the detected fault.
    (i) Calculate the contribution statistics for the faulty points according to Equations (36)–(39).
    (ii) The process variables with the largest contributions are responsible for the fault.
The complete diagram for the proposed adaptive NN-based PCA monitoring procedure is shown in Figure 1.
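The online stage summarized above can also be sketched in code. This is a schematic sketch that reuses the helper routines sketched in the previous sections (recursive_scale, sga_update, number_of_pcs, index_matrices, contributions); the control_limits helper, the state dictionary, and the way the update is frozen as soon as any statistic exceeds its limit follow the step list above but are otherwise illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def control_limits(lam_sorted, beta, alpha):
    """Recursive control limits for T^2, Q and phi (Equations (25), (27), (32))."""
    th1 = lam_sorted[beta:].sum()
    th2 = (lam_sorted[beta:] ** 2).sum()
    sigma2 = chi2.ppf(alpha, beta)
    gamma2 = (th2 / th1) * chi2.ppf(alpha, th1 ** 2 / th2)
    g_phi = (beta / sigma2 ** 2 + th2 / gamma2 ** 2) / (beta / sigma2 + th1 / gamma2)
    h_phi = (beta / sigma2 + th1 / gamma2) ** 2 / (beta / sigma2 ** 2 + th2 / gamma2 ** 2)
    return sigma2, gamma2, g_phi * chi2.ppf(alpha, h_phi)

def online_step(x, state, f, g, alpha, cpv):
    """One pass of the online stage: normal samples update the model,
    faulty samples freeze it and trigger the contribution analysis."""
    x_hat, mu, delta = recursive_scale(x, state["mu"], state["delta"], f)
    U, lam, beta = state["U"], state["lam"], state["beta"]
    D = index_matrices(U[:, :beta], lam[:beta], state["sigma2"], state["gamma2"])
    t2 = x_hat @ D["T2"] @ x_hat
    q = x_hat @ D["Q"] @ x_hat
    phi = t2 / state["sigma2"] + q / state["gamma2"]
    if t2 < state["sigma2"] and q < state["gamma2"] and phi < state["eta2"]:
        # normal sample: adapt mean/std, eigenpairs, number of PCs and limits
        U, lam = sga_update(x_hat, U, lam, g)
        beta = number_of_pcs(lam, cpv)
        sigma2, gamma2, eta2 = control_limits(np.sort(lam)[::-1], beta, alpha)
        state.update(mu=mu, delta=delta, U=U, lam=lam, beta=beta,
                     sigma2=sigma2, gamma2=gamma2, eta2=eta2)
        return (t2, q, phi), None
    # faulty sample: keep the model frozen and run the contribution analysis
    return (t2, q, phi), {name: contributions(x_hat, Xi) for name, Xi in D.items()}
```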

6. Simulation Results

In this section, we evaluate the effectiveness of the proposed adaptive fault detection and isolation approach through three simulation cases: a numerical example, the continuous stirred tank reactor (CSTR) chemical process, and the WRRFs. We demonstrate that the conventional PCA model struggles to differentiate natural process variations from faults, leading to numerous false alarms in each simulation case. To gauge the performance of the proposed framework, we calculate two key indices: the false alarm rate (FAR) and the missed detection rate (MDR). FAR quantifies the proportion of data samples violating the control limits under normal operating conditions, while MDR measures the proportion of faulty samples misidentified as normal under faulty operating conditions. These indices are obtained from the following equations:
$$FAR = \frac{\text{number of violated samples (normal operation)}}{\text{number of normal data samples}} \times 100$$
$$MDR = \left(1 - \frac{\text{number of violated samples (faulty operation)}}{\text{number of faulty data samples}}\right) \times 100$$
In this paper, the detection delay is defined as the number of data samples from the fault onset until the monitoring statistics exceed their control limit for five consecutive data samples. The f, g, and α parameters of the NN-based PCA for all simulations were determined by trial and error: the tuning parameters were adjusted in both normal and faulty scenarios until the algorithm adapted to the normal non-stationary behavior and achieved a good fault detection rate.
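A small sketch of how the FAR, MDR, and detection delay defined above can be computed for one monitoring statistic is shown below; the five-sample persistence rule follows the definition in the text, while the function name and array interface are illustrative assumptions.

```python
import numpy as np

def far_mdr_delay(stat, limit, fault_start, run=5):
    """FAR, MDR and detection delay for one monitoring statistic.
    `stat` and `limit` are arrays over time; the fault begins at index `fault_start`."""
    viol = np.asarray(stat) > np.asarray(limit)
    far = 100.0 * viol[:fault_start].mean()            # violations under normal operation
    mdr = 100.0 * (1.0 - viol[fault_start:].mean())    # missed detections under fault
    delay = None
    for k in range(fault_start, len(viol) - run + 1):
        if viol[k:k + run].all():                       # five consecutive violations
            delay = k - fault_start
            break
    return far, mdr, delay
```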

6.1. Numerical Example

The data for the numerical example were obtained by simulating a multivariate numerical process. The process contains six variables that depend on three parameters, t1, t2, and t3, according to the following equation [7]:
$$\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\\ x_{6} \end{bmatrix} = \begin{bmatrix} 0.2310 & 0.0816 & 0.2662\\ 0.3241 & 0.7055 & 0.2158\\ 0.2170 & 0.3056 & 0.5207\\ 0.4089 & 0.3442 & 0.4501\\ 0.6408 & 0.3102 & 0.2372\\ 0.4655 & 0.4330 & 0.5938 \end{bmatrix} \begin{bmatrix} t_{1}\\ t_{2}\\ t_{3} \end{bmatrix} + e$$
The parameters t1, t2, and t3 are uniformly distributed random variables over the intervals (0, 2), (0, 1.6), and (0, 1.2), respectively. The noise e adds uncertainty to the model, and each of its elements has zero mean and a standard deviation of 0.2. A dataset of 1600 samples was generated to evaluate the performance of the proposed framework. Out of these 1600 samples, 300 were used for offline training, and the rest (1300 samples) were used for online monitoring. In this paper, the offline training samples are not presented in the monitoring plots. The time-varying characteristic was introduced by a slow drift in variables t1, t2, and t3 beginning at time sample 500; the slopes of the drifts were 0.001, 0.003, and 0.0008, respectively, and the drift continued until the simulation ended. Two fault scenarios, each starting at time sample 700, were simulated (a short data-generation sketch is given after the list below):
  • Fault 1: step change of t2 by 1.
  • Fault 2: t2 linear ramp with a slope of 0.007.
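For reproducibility, a sketch of how this dataset can be generated is shown below; the random seed, the row-wise layout of the coefficient matrix (as written in the equation above), and the way the drift and the step fault are injected are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.2310, 0.0816, 0.2662],
              [0.3241, 0.7055, 0.2158],
              [0.2170, 0.3056, 0.5207],
              [0.4089, 0.3442, 0.4501],
              [0.6408, 0.3102, 0.2372],
              [0.4655, 0.4330, 0.5938]])

N = 1600
t = np.column_stack([rng.uniform(0, 2.0, N),
                     rng.uniform(0, 1.6, N),
                     rng.uniform(0, 1.2, N)])

# slow drift in t1, t2, t3 from sample 500 onwards
k = np.arange(N)
ramp = np.clip(k - 500, 0, None)[:, None]
t_drift = t + ramp * np.array([0.001, 0.003, 0.0008])

# fault 1: step change of t2 by 1 from sample 700
t_fault = t_drift.copy()
t_fault[700:, 1] += 1.0

# six measured variables with additive noise of standard deviation 0.2
X = t_fault @ A.T + rng.normal(0.0, 0.2, size=(N, 6))
```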
A dataset without fault was generated to show the drawback of conventional PCA in dealing with the time-varying characteristic of the simulated data. The number of PCs for both methods was selected based on 84% of the total variance. The f , g , and α parameters of the NN-based PCA were chosen to be 0.00007, 0.004, and 0.98, respectively. Figure 2 shows the monitoring statistics estimated by conventional PCA and NN-based PCA for the fault-free simulation.
As illustrated in Figure 2, the T2, Q, and φ monitoring statistics of conventional PCA exceeded their respective control limits several times under the fault-free state of the numerical simulation. In the case of NN-based PCA, by contrast, the monitoring statistics exceeded their control limits only a few times (low FAR in Table 2). These results indicate the significant drawback of conventional PCA in dealing with the time-varying characteristics of processes. Time-varying processes have dynamic or statistical characteristics that change over time, but conventional PCA assumes these characteristics to be constant, resulting in an excessive rate of false alarms [6]. Because of this drawback, the results of conventional PCA are only shown for the fault-free scenarios in this paper. Figure 3 shows the detection results for numerical simulation faults 1 and 2. As seen in both cases, the φ statistic outperformed the other monitoring statistics.
Table 2 summarizes the detection performance of different simulated cases. The results show that the φ statistic performs better than the Q and T2 statistics because of its lower MDR for faulty cases.
In the case of fault 2, both the T2 and φ statistics detected the fault with some delay (176 and 136 data samples, respectively). The delay is due to the small increments (0.007) of the linear ramp, which make the fault difficult to detect at the beginning of its occurrence. The FAR of the φ statistic for both the faulty and fault-free cases is slightly higher than that of the other statistics; however, this can be tolerated because of its higher sensitivity in detecting the faults. Generally, based on Table 2 and Figure 3, it can be concluded that the φ statistic is more accurate in detecting the faults in this case compared with the Q and T2 statistics.
The diagnosis results of NN-based PCA for numerical simulation fault 1 using different contribution plots are illustrated in Figure 4.
As can be seen, almost all contribution plots correctly detected variable two as a faulty variable. However, in the RBC-T2 and RBC-Q plots, variable 5 also contributed to the fault. This is attributed to the smearing effect, wherein a fault identified in one variable influences the contributions of other variables. The degree of smearing is determined by the chosen number of principal components in the underlying PCA model [38]. The diagnosis results for numerical simulation fault 2 can be found in the Supplementary Materials Section, Figure S1.

6.2. CSTR Process

The simulated CSTR process is a well-known benchmark for assessing the performance of fault detection frameworks [7]. A schematic of the CSTR process is shown in Figure 5.
The process inputs include a premixed stream of reactant and solvent and a cooling water stream. The nine measured process variables are as follows: the reactant flow rate Fa, the reactant concentration Ca, the solvent flow rate Fs, the solvent concentration Cs, the coolant water flow rate Fc, the coolant water temperature Tc, the inlet stream temperature Ti, the outlet concentration C, and the outlet temperature T. The governing equations of the CSTR process are as follows:
$$\frac{dT}{dt} = \frac{F}{V}\left(T_{i} - T\right) - \frac{UA}{V\rho C_{p}}\left(1 + \frac{UA}{2F_{c}\rho_{c}C_{pc}}\right)^{-1}\left(T - T_{c}\right) + \frac{(-\Delta H_{r})}{\rho C_{p}}\,r$$
$$\frac{dC}{dt} = \frac{F\left(C_{i} - C\right)}{V} - r$$
where $UA = \beta_{UA}\,a\,F_{c}^{b}$ is the heat transfer coefficient, $r = \beta_{r}\,k_{0}\,e^{-E/RT}\,C$ is the chemical reaction rate, and $C_{i} = (F_{a}C_{a} + F_{s}C_{s})/(F_{a} + F_{s})$ is the concentration of the inlet stream. The concentration C and temperature T of the product stream are controlled by manipulating the reactant flow rate Fa and the coolant water flow rate Fc using two PI controllers.
All settings and parameters of the CSTR model are given in the Supplementary Materials Section (Tables S1–S5). Process disturbances (see Table S4) are modeled as first-order autoregressive (AR) processes, $y_{i+1} = \omega y_{i} + \nu_{i+1}$ with $\nu_{i+1} \sim N(0, \sigma_{\nu}^{2})$. All measured variables are contaminated with white Gaussian noise $e(t) \sim N(0, \sigma_{e}^{2})$ (see Table S5). The process variables were arranged into the measurement vector as follows:
$$x = \left[\,C\;\; C_{a}\;\; C_{s}\;\; F_{a}\;\; F_{c}\;\; F_{s}\;\; T\;\; T_{c}\;\; T_{i}\,\right]^{T}$$
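A brief sketch of the two CSTR balances above is given below; the parameter dictionary keys and the grouping of the heat-transfer and heat-of-reaction terms follow the reconstruction of the equations above and the values in Tables S1–S5, so they should be treated as assumptions rather than the exact benchmark implementation.

```python
import numpy as np

def cstr_rhs(T, C, u, p):
    """Right-hand side of the CSTR energy and mass balances.
    `u` holds the current inputs (Fa, Fs, Fc, Ca, Cs, Ti, Tc) and `p` the parameters."""
    UA = p["beta_UA"] * p["a"] * u["Fc"] ** p["b"]                   # heat transfer coefficient
    r = p["beta_r"] * p["k0"] * np.exp(-p["E"] / (p["R"] * T)) * C   # reaction rate
    F = u["Fa"] + u["Fs"]
    Ci = (u["Fa"] * u["Ca"] + u["Fs"] * u["Cs"]) / F                 # inlet concentration
    dT = (F / p["V"]) * (u["Ti"] - T) \
         - UA / (p["V"] * p["rho"] * p["Cp"]) \
           / (1.0 + UA / (2.0 * u["Fc"] * p["rho_c"] * p["Cpc"])) * (T - u["Tc"]) \
         + (-p["dHr"]) / (p["rho"] * p["Cp"]) * r
    dC = F * (Ci - C) / p["V"] - r
    return dT, dC
```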
To evaluate the performance of the proposed framework, a dataset of 1600 samples was generated. Out of these 1600 samples, 300 were used for offline training, and the rest (1300 samples) were used for online monitoring. The time-varying characteristic was introduced by a slow drift in $\beta_{r}$, with magnitude $d\beta_{r}/dt = -10^{-3}$, beginning at time sample 500. Five faulty scenarios were simulated; the faults started at time sample 700 and continued until time sample 1000. These faulty scenarios are described in Table 3.
The number of PCs for both methods was selected based on 95% of the total variance. The f, g, and α parameters of the NN-based PCA were chosen to be 13 × 10−9, 0.007, and 0.99, respectively.
The monitoring statistics estimated by conventional PCA and NN-based PCA for the CSTR simulation without faults are shown in Figure 6.
As can be seen, all monitoring statistics of the conventional PCA model exceeded their control limits after a few samples of variation in $\beta_{r}$. The proposed adaptive process monitoring model can automatically adapt to the natural process changes; therefore, the false alarm rate decreased drastically. Because of the space limitations of this paper, only the fault detection results of the NN-based PCA for faults 1-Bias and 3-Drift are shown and discussed in this section. The rest of the results can be found in the Supplementary Materials Section, Figures S2–S7. The fault detection results of the NN-based PCA for the CSTR simulation with faults 1-Bias and 3-Drift are shown in Figure 7.
As can be seen, in the case of fault 1-Bias, all three monitoring statistics could detect the fault; however, the Q and φ statistics have lower MDRs compared with the T2 statistic. In the case of fault 3-Drift, again, all three monitoring statistics could detect the fault with reasonable accuracy, although the Q statistic shows a longer delay. Figure 7 also shows that, for both faulty scenarios, as the fault ends at the 1000th time sample, the detection statistics return to their normal values. This shows the ability of the model to continue working without any intervention by the operator. The performance of the proposed framework applied to the CSTR simulation is summarized in Table 4. The results show that the φ statistic performs better than the T2 and Q statistics in terms of average MDR and delay. Furthermore, comparing the monitoring statistics shows that both T2 and Q have lower average FAR values than the φ statistic. However, the higher average FAR of the φ statistic can be tolerated because of its lower delay and MDR.
The diagnosis results of the NN-based PCA for CSTR simulation faults 1-Bias and 3-Drift using different contribution plots are illustrated in Figure 8 and Figure 9, respectively. As can be seen in Figure 8, most of the contribution plots correctly isolated sensor 6 ($F_{s}$) as the faulty sensor, except for the RBC-Q and RBC-T2 contribution plots, which suffer from the smearing effect and misidentify the faulty sensor. The smearing effect can also be seen in Figure 9; because of this effect, the PDC-Q, DBC-Q, and RBC-Q contribution plots in Figure 9 could not correctly isolate sensor 3 (Cs) as the faulty sensor.

6.3. Water Resource Recovery Facilities

Water resource recovery facilities (WRRFs) are known for being complicated and constantly changing because of factors such as the variable flow rate and the composition of influent. In this section, we use a proposed NN-based PCA framework to detect and isolate various faults in Benchmark Simulation No. 2 (BSM2) [39]. BSM2 is a sophisticated dynamic mathematical model that is commonly used to simulate WRRFs. It involves various unit operations such as a primary clarifier, an activated sludge biological reactor, a secondary clarifier, a thickener, an anaerobic digester, a dewatering unit, and a storage tank.
The activated sludge unit in BSM2 comprises two anoxic reactors (used for predenitrification) with a total volume of 3000 m3 and three aerobic reactors (used for nitrification) with a total volume of 9000 m3. The plant treats an average dry-weather influent flow rate of 20,648 m3 d−1 with an average biodegradable COD concentration of 592 mg L−1. To model the biological processes in the activated sludge and anaerobic digestion reactors, Activated Sludge Model No. 1 (ASM1) and Anaerobic Digestion Model No. 1 (ADM1) were employed, respectively.
The influent characteristics are represented by a 609-day dynamic influent data file, with a sampling frequency of one data point every 15 min, capturing variations in rainfall and seasonal temperature over the year [40]. To simulate different fault types and obtain sensor data from various parts of the process, modifications were made to the BSM2 simulation, resulting in a dataset comprising 320 data measurements (16 state variables × 20 measurement points). Given that monitoring all these variables is not standard practice in real WRRFs, this study focuses on realistic and commonly available sensor measurements. Figure 10 shows the layout of BSM2 and twenty-eight sensor measurements that can be obtained from various locations of the process.
These sensor measurements, captured every 15 min, are intentionally corrupted with white noise to mimic real-world sensor behavior. For the construction of the offline NN-based PCA process monitoring framework, data were collected under normal operating conditions spanning from days 435 to 457, amounting to approximately 2113 samples. This period was selected specifically because of the absence of rain or storm events.
Subsequently, measurements from days 458 to 488, totaling around 2974 samples, were utilized to evaluate the performance of the NN-based PCA framework under conditions of time-varying characteristics and abnormal behavior. The primary sources of time-variant behavior in this simulation stem from drift and abrupt changes in the total suspended solid (TSS) concentration of the reject flow from the dewatering unit. These fluctuations in TSS concentration are influenced by seasonal and diurnal patterns.
To simulate fault scenarios, six distinct scenarios related to sensors, process variables, and process parameters were generated. These faults were introduced starting from the 1248th time sample and persisted to the end of the simulation. Detailed descriptions of these faulty scenarios can be found in Table 5. The number of principal components for both methods was chosen based on capturing 82% of the total variance. The f, g, and α parameters of the NN-based PCA were chosen to be 10−8, 9 × 10−5, and 0.999, respectively.
The monitoring statistics estimated by conventional PCA and NN-based PCA for the WRRF simulation without faults are shown in Figure 11. As can be seen, because of the time-varying nature of WRRFs, all monitoring statistics of the conventional PCA passed their control limits after some time and caused many false alarms. In contrast, all monitoring statistics of the NN-based PCA framework stayed within their control limits because of its adaptation mechanism.
Because of the space limitations of this paper, only the fault detection results of the NN-based PCA for faults 1 and 5 are shown and discussed in this section. The rest of the results can be found in the Supplementary Materials Section, Figures S8–S15. The fault detection results of the NN-based PCA for the WRRF simulation with faults 1 and 5 are shown in Figure 12. As can be seen in the figure, the monitoring statistics correctly detect both faults; however, in both cases, there is some delay in detection. The delay in the case of fault 1 is due to the PI controller action; as the drift starts, the PI controller tries to compensate for it, so the fault cannot be detected at the initial stage of the drift. Fault 5 simulates a step change in the nitrogen level of the anaerobic digestion (AD) process. In the AD process, the nitrogen level is an important factor that needs to be monitored closely, because high levels of nitrogen can inhibit microbial activity. It is important to note that fault 5 may not be detected immediately, as it is a relatively minor change and its impact may not be significant during the early stages.
The contribution plots of fault 1 are shown in Figure 13. As can be seen, most of the contribution plots isolate sensor no. 10 instead of sensor no. 7, which is the actual faulty sensor. This is due to the PI controller action that compensates for the drift; thus, the faulty sensor cannot be identified directly by the contribution plots. However, the oxygen level reduction in the aerated reactors (reactors no. 3, 4, and 5) caused by the oxygen sensor drift affects the other sensor measurements. For example, in this case, sensor no. 10, which measures the ammonia concentration in the output stream of the reactors, is isolated as faulty. As the concentration of dissolved oxygen decreases under the influence of the PI controller, the nitrification rate declines, leading to an increase in the ammonia concentration. The other sensor affected by the oxygen level reduction is sensor no. 6, which is isolated by the RBC-Q and RBC-φ plots.
The contribution plots of fault 5 are shown in Figure 14. Fault 5 is not a sensor fault; therefore, the contribution plots cannot detect it directly. As seen in Figure 14, the variable that contributes most to the fault is variable no. 27. This is because of the direct correlation between the inorganic nitrogen concentration and the pH in AD; the pH will increase by increasing the inorganic nitrogen concentration in AD. Also, the smearing effect is more pronounced for the contribution plots related to the T2 statistic. Although the contribution plots could not detect the fault directly in either case, they provide some indirect clues as to the root of the fault. The performance of the proposed framework applied to the WRRF simulation is summarized in Table 6. The results revealed that the φ statistic outperforms the T2 and Q statistics in terms of average MDR and delay. Additionally, all three monitoring statistics have low average FAR values. When comparing the CSTR example with the WRRF simulation, a more significant delay in fault detection is observed in the WRRFs. This is expected because of the large scale of WRRFs and the prolonged residence time associated with these processes. Another contributing factor is the inherently slow nature of the biological processes involved.

7. Conclusions

This paper introduces a new and computationally efficient adaptive fault detection and isolation framework based on NN-based PCA. The computational complexity of this framework falls into the category of fast methods, making it suitable for implementation in real-time control systems. The proposed framework is validated through application to a numerical simulation, the CSTR process, and the WRRF simulation. In comparison with conventional PCA, NN-based PCA adapts to the nonlinear and time-varying characteristics of the studied processes, leading to a significant reduction in the FAR. Additionally, during faulty scenarios, the framework effectively adapts to the time-varying process characteristics while detecting and isolating the faults. To detect faults, the commonly used T2, Q, and φ statistics were employed. The results indicate that the combined statistic φ has a slightly higher FAR but provides better fault detection and diagnosis performance compared with the single statistics T2 and Q. Fault isolation was carried out using four different approaches: CDC, PDC, DBC, and RBC. The results revealed that the effectiveness of the isolation methods is heavily reliant on the type of fault and the specific process being studied. Consequently, it cannot be definitively stated that one isolation method holds an advantage over the others.
The main limitation of this approach is the lack of a systematic way to adjust its tuning parameters. This is a common issue with most adaptive frameworks introduced by other researchers. Future research should focus on implementing a systematic approach to fine-tune this method.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pr12061218/s1, Figure S1: Fault isolation results of various contribution plots for the numerical simulation fault 2 (720th sample).; Figure S2: Fault detection results of NN-based PCA for CSTR simulation with fault 2-Drift (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S3: Fault isolation results of various contribution plots for the CSTR simulation fault 2-Drift (720th sample).; Figure S4: Fault detection results of NN-based PCA for CSTR simulation with fault 4-Bias (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S5: Fault isolation results of various contribution plots for the CSTR simulation fault 4-Bias (720th sample).; Figure S6: Fault detection results of NN-based PCA for CSTR simulation with fault 5- Complete failure (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S7: Fault isolation results of various contribution plots for the CSTR simulation fault 5- Complete failure (720th sample).; Figure S8: Fault detection results of NN-based PCA for WRRF simulation with fault 2 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S9: Fault isolation results of various contribution plots for the WRRF simulation fault 2 (1500th sample).; Figure S10: Fault detection results of NN-based PCA for WRRF simulation with fault 3 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S11: Fault isolation results of various contribution plots for the WRRF simulation fault 3 (1500th sample).; Figure S12: Fault detection results of NN-based PCA for WRRF simulation with fault 4 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S13: Fault isolation results of various contribution plots for the WRRF simulation fault 4 (1500th sample).; Figure S14: Fault detection results of NN-based PCA for WRRF simulation with fault 6 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).; Figure S15: Fault isolation results of various contribution plots for the WRRF simulation fault 6 (1500th sample).; Table S1: Summary of reactor parameters.; Table S2: Summary of reactor initial conditions.; Table S3: Summary of parameters of the PI controllers.; Table S4: Autoregressive model parameters of input variables.; Table S5: Measurement noise of process variables.

Author Contributions

Conceptualization, P.K. and P.M.; Methodology, P.K.; Validation, P.K.; Investigation, A.M.; Data curation, P.K. and A.M.; Writing—original draft, P.K. and A.M.; Writing—review & editing, P.K., A.M. and P.M.; Supervision, P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article and Supplementary Materials.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AD: anaerobic digestion
AE: Autoencoder
ADM1: Anaerobic Digestion Model No. 1
ASM1: Activated Sludge Model No. 1
BSM2: Benchmark Simulation Model No. 2
CC: concentration controller
CDCs: complete decomposition contributions
CPV: cumulative percent variance
CSTR: continuous stirred tank reactor
CIPCA: complete information PCA
CVA: canonical variate analysis
DSMs: Dynamic Surrogate Models
DBCs: diagonal-based contributions
DPM: data projection method
FAR: false alarm rate
FDA: Fisher discriminant analysis
FDD: fault detection and diagnosis
FDPM: fast data projection method
FOP: first-order perturbation analysis
ICA: independent component analysis
IPCA: incremental principal component analysis
IWA: International Water Association
MW: moving window
MDR: missed detection rate
NN: neural network
OC-SVM: One-Class Support Vector Machine
PCs: principal components
PCA: principal component analysis
PDCs: partial decomposition contributions
PLS: partial least squares
RBCs: reconstruction-based contributions
SGA: Stochastic Gradient Ascent
TC: temperature controller
TSS: total suspended solids
WRRFs: water resource recovery facilities

References

  1. Isermann, R.; Ballé, P. Trends in the Application of Model-Based Fault Detection and Diagnosis of Technical Processes. Control Eng. Pract. 1997, 5, 709–719. [Google Scholar] [CrossRef]
  2. Maran Beena, A.; Pani, A.K. Fault Detection of Complex Processes Using Nonlinear Mean Function Based Gaussian Process Regression: Application to the Tennessee Eastman Process. Arab. J. Sci. Eng. 2021, 46, 6369–6390. [Google Scholar] [CrossRef]
  3. Lee, J.M.; Qin, S.J.; Lee, I.B. Fault Detection and Diagnosis Based on Modified Independent Component Analysis. AIChE J. 2006, 52, 3501–3514. [Google Scholar] [CrossRef]
  4. Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N. A Review of Process Fault Detection and Diagnosis: Part II: Qualitative Models and Search Strategies. Comput. Chem. Eng. 2003, 27, 313–326. [Google Scholar] [CrossRef]
  5. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A Review of Process Fault Detection and Diagnosis: Part I: Quantitative Model-Based Methods. Comput. Chem. Eng. 2003, 27, 293–311. [Google Scholar] [CrossRef]
  6. Kazemi, P.; Giralt, J.; Bengoa, C.; Masoumian, A.; Steyer, J.-P. Fault Detection and Diagnosis in Water Resource Recovery Facilities Using Incremental PCA. Water Sci. Technol. 2020, 82, 2711–2724. [Google Scholar] [CrossRef]
  7. Elshenawy, L.M.; Mahmoud, T.A. Fault Diagnosis of Time-Varying Processes Using Modified Reconstruction-Based Contributions. J. Process Control 2018, 70, 12–23. [Google Scholar] [CrossRef]
  8. Jackson, J.E.; Mudholkar, G.S. Control Procedures for Residuals Associated with Principal Component Analysis. Technometrics 1979, 21, 341–349. [Google Scholar] [CrossRef]
  9. Portnoy, I.; Melendez, K.; Pinzon, H.; Sanjuan, M. An Improved Weighted Recursive PCA Algorithm for Adaptive Fault Detection. Control Eng. Pract. 2016, 50, 69–83. [Google Scholar] [CrossRef]
  10. Wang, X.; Kruger, U.; Irwin, G.W. Process Monitoring Approach Using Fast Moving Window PCA. Ind. Eng. Chem. Res. 2005, 44, 5691–5702. [Google Scholar] [CrossRef]
  11. Chakour, C. Fault Diagnosis of Dynamic Processes Based on Neuronal Principal Component Analysis. In Proceedings of the International Conference on Automatic control, Telecommunications and Signals (ICATS15), Annaba, Algeria, 16–18 November 2015; pp. 91–96. [Google Scholar]
  12. Golub, G.H.; Van Loan, C.F. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013; ISBN 978-1421407944. [Google Scholar]
  13. Elshenawy, L.M.; Yin, S.; Naik, A.S.; Ding, S.X. Efficient Recursive Principal Component Analysis Algorithms for Process Monitoring. Ind. Eng. Chem. Res. 2010, 49, 252–259. [Google Scholar] [CrossRef]
  14. Haimi, H.; Mulas, M.; Corona, F.; Marsili-Libelli, S.; Lindell, P.; Heinonen, M.; Vahala, R. Adaptive Data-Derived Anomaly Detection in the Activated Sludge Process of a Large-Scale Wastewater Treatment Plant. Eng. Appl. Artif. Intell. 2016, 52, 65–80. [Google Scholar] [CrossRef]
  15. Elshenawy, L.M.; Mahmoud, T.A.; Chakour, C. Simultaneous Fault Detection and Diagnosis Using Adaptive Principal Component Analysis and Multivariate Contribution Analysis. Ind. Eng. Chem. Res. 2020, 59, 20798–20815. [Google Scholar] [CrossRef]
  16. Chakour, C.; Hamza, A.; Elshenawy, L.M. Adaptive CIPCA-Based Fault Diagnosis Scheme for Uncertain Time-Varying Processes. Neural Comput Appl. [CrossRef]
  17. Qi, M.; Jang, K.; Cui, C.; Moon, I. Novel Control-Aware Fault Detection Approach for Non-Stationary Processes via Deep Learning-Based Dynamic Surrogate Modeling. Process Saf. Environ. Prot. 2023, 172, 379–394. [Google Scholar] [CrossRef]
  18. Nomikos, P.; MacGregor, J.F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37, 41–59. [Google Scholar] [CrossRef]
  19. Oja, E. Neural Networks, Principal Components, and Subspaces. Int. J. Neural Syst. 2011, 1, 61–68. [Google Scholar] [CrossRef]
  20. Kong, X.; Hu, C.; Duan, Z. Principal Component Analysis Networks and Algorithms. In Principal Component Analysis Networks and Algorithms; Springer: Singapore; Science Press: Beijing, China, 2017; pp. 1–323. [Google Scholar] [CrossRef]
  21. Du, K.L.; Swamy, M.N.S. Neural Networks and Statistical Learning, Second Edition. Neural Networks and Statistical Learning, 2nd ed.; Springer: London, UK, 2019; pp. 1–988. [Google Scholar] [CrossRef]
  22. Qiu, J.; Wang, H.; Lu, J.; Zhang, B.; Du, K.-L. Neural Network Implementations for PCA and Its Extensions. ISRN Artif. Intell. 2012, 2012, 847305. [Google Scholar] [CrossRef]
  23. Oja, E. Principal Components, Minor Components, and Linear Neural Networks. Neural Netw. 1992, 5, 927–935. [Google Scholar] [CrossRef]
  24. Jolliffe, I.T.; Cadima, J. Principal Component Analysis: A Review and Recent Developments. Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef]
  25. Yue, H.H.; Qin, S.J. Reconstruction-Based Fault Identification Using a Combined Index. Ind. Eng. Chem. Res. 2001, 40, 4403–4414. [Google Scholar] [CrossRef]
  26. Cardot, H.; Degras, D. Online Principal Component Analysis in High Dimension: Which Algorithm to Choose? Int. Stat. Rev. 2018, 86, 29–50. [Google Scholar] [CrossRef]
  27. Oja, E. Simplified Neuron Model as a Principal Component Analyzer. J. Math. Biol. 1982, 15, 267–273. [Google Scholar] [CrossRef] [PubMed]
  28. Oja, E.; Karhunen, J. On Stochastic Approximation of the Eigenvectors and Eigenvalues of the Expectation of a Random Matrix. J. Math. Anal. Appl. 1985, 106, 69–84. [Google Scholar] [CrossRef]
  29. Li, W.; Yue, H.H.; Valle-Cervantes, S.; Qin, S.J. Recursive PCA for Adaptive Process Monitoring. J. Process Control 2000, 10, 471–486. [Google Scholar] [CrossRef]
  30. Alcala, C.F.; Qin, S.J. Analysis and Generalization of Fault Diagnosis Methods for Process Monitoring. J. Process Control 2011, 21, 322–330. [Google Scholar] [CrossRef]
  31. Miller, P.; Swanson, R.; Heckler, C. Contribution Plots: A Missing Link in Multivariate Quality Control. Int. J. Appl. Math. Comput. Sci. 1998, 8, 775–792. [Google Scholar]
  32. Wise, B.M.; Gallagher, N.B.; Bro, R.; Shaver, J.M.; Windig, W.; Koch, R.S. PLS Toolbox User Manual; Eigenvector Research Inc.: Manson, WA, USA, 2006; ISBN 0976118416. [Google Scholar]
  33. Westerhuis, J.A.; Gurden, S.P.; Smilde, A.K. Generalized Contribution Plots in Multivariate Statistical Process Monitoring. Chemom. Intell. Lab. Syst. 2000, 51, 95–114. [Google Scholar] [CrossRef]
  34. Qin, S.J.; Li, W. Detection and Identification of Faulty Sensors in Dynamic Processes. AIChE J. 2001, 47, 1581–1593. [Google Scholar] [CrossRef]
  35. Alcala, C.F.; Qin, S.J. Unified Analysis of Diagnosis Methods for Process Monitoring. IFAC Proc. Vol. 2009, 42, 1007–1012. [Google Scholar] [CrossRef]
  36. Cherry, G.A.; Qin, S.J. Multiblock Principal Component Analysis Based on a Combined Index for Semiconductor Fault Detection and Diagnosis. IEEE Trans. Semicond. Manuf. 2006, 19, 159–171. [Google Scholar] [CrossRef]
  37. Alcala, C.F.; Qin, S.J. Reconstruction-Based Contribution for Process Monitoring. Automatica 2009, 45, 1593–1600. [Google Scholar] [CrossRef]
  38. Van Den Kerkhof, P.; Vanlaer, J.; Gins, G.; Van Impe, J.F.M. Contribution Plots for Statistical Process Control: Analysis of the Smearing-Out Effect. In Proceedings of the 2013 European Control Conference (ECC), Zurich, Switzerland, 17–19 July 2013; ISBN 9783952417348. [Google Scholar]
  39. Jeppsson, U.; Rosen, C.; Alex, J.; Copp, J.; Gernaey, K.V.; Pons, M.-N.; Vanrolleghem, P.A. Towards a Benchmark Simulation Model for Plant-Wide Control Strategy Performance Evaluation of WWTPs. Water Sci. Technol. 2006, 53, 287–295. [Google Scholar] [CrossRef]
  40. Nopens, I.; Benedetti, L.; Jeppsson, U.; Pons, M.-N.; Alex, J.; Copp, J.B.; Gernaey, K.V.; Rosen, C.; Steyer, J.-P.; Vanrolleghem, P.A. Benchmark Simulation Model No 2: Finalisation of Plant Layout and Default Control Strategy. Water Sci. Technol. 2010, 62, 1967–1974. [Google Scholar] [CrossRef]
Figure 1. Adaptive NN-based PCA diagram.
Figure 2. Fault detection results of conventional PCA and NN-based PCA for numerical simulation without fault (horizontal red line: confidence limit).
Figure 3. Fault detection results of NN-based PCA for numerical simulation with faults 1 and 2 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).
Figure 4. Fault isolation results of various contribution plots for numerical simulation fault 1 (720th sample).
Figure 5. The schematic of the CSTR process with a concentration controller (CC) and temperature controller (TC).
Figure 6. Fault detection results of conventional PCA and NN-based PCA for CSTR simulation without fault (horizontal red line: confidence limit).
Figure 7. Fault detection results of NN-based PCA for CSTR simulation with faults 1—Bias and 3—Drift (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).
Figure 8. Fault isolation results of various contribution plots for the CSTR simulation fault 1—Bias (720th sample).
Figure 9. Fault isolation results of various contribution plots for the CSTR simulation fault 3—Drift (720th sample).
Figure 10. Layout of BSM2 and locations of the measured variables.
Figure 11. Fault detection results of conventional PCA and NN-based PCA for WRRF simulation without fault (horizontal red line: confidence limit).
Figure 12. Fault detection results of NN-based PCA for WRRF simulation with faults 1 and 5 (horizontal red line: confidence limit; dashed vertical red line: onset of the fault).
Figure 13. Fault isolation results of various contribution plots for the WRRF simulation fault 1 (1500th sample).
Figure 14. Fault isolation results of various contribution plots for the WRRF simulation fault 5 (1500th sample).
Table 1. Monitoring statistics.
Monitoring Statistics (Index) | $T_k^2$ | $Q_k$ | $\varphi_k$
$\Xi_k$ | $U_k \Lambda_k^{-1} U_k^T$ | $I - U_k U_k^T$ | $\Phi_k$
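Each index in Table 1 is a quadratic form, index(x) = xᵀ Ξ x, built from the retained loading matrix and eigenvalues of the PCA model. A minimal Python sketch of how the three statistics can be evaluated for a single sample is given below; it assumes that the loadings U, the retained eigenvalues, and the confidence limits of T2 and Q are available from the adaptive model, and it uses the usual limit-weighted form of the combined index φ. Variable names are illustrative and are not taken from the original implementation.

```python
import numpy as np

def monitoring_statistics(x, U, lam, T2_limit, Q_limit):
    """Evaluate T2, Q and the combined index phi for one scaled sample x.

    x         : (m,) measurement vector (already mean-centred/scaled)
    U         : (m, l) retained principal loading vectors
    lam       : (l,) eigenvalues of the retained principal components
    T2_limit  : confidence limit of T2 (tau^2 in the combined index)
    Q_limit   : confidence limit of Q (delta^2 in the combined index)
    """
    x = np.asarray(x, dtype=float)
    t = U.T @ x                      # score vector

    # T2 = x^T U diag(lam)^-1 U^T x
    T2 = float(t @ (t / lam))

    # Q = x^T (I - U U^T) x, the squared prediction error in the residual space
    residual = x - U @ t
    Q = float(residual @ residual)

    # Combined index: phi = Q / delta^2 + T2 / tau^2
    phi = Q / Q_limit + T2 / T2_limit
    return T2, Q, phi
```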
Table 2. The detection performance of NN-based PCA for numerical simulation.
Case    | FAR (T²) | MDR (T²) | Delay (T²) | FAR (Q) | MDR (Q) | Delay (Q) | FAR (φ) | MDR (φ) | Delay (φ)
Normal  | 0.69     | -        | -          | 2.30    | -       | -         | 6.76    | -       | -
Fault 1 | 0.71     | 24.91    | 138        | 1.56    | 65.71   | 14        | 6.11    | 2.67    | 11
Fault 2 | 0.85     | 28.59    | 176        | 2.41    | 80.60   | 49        | 17.96   | 18.72   | 136
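In Tables 2, 4 and 6, FAR is the false alarm rate, MDR the missed detection rate, and Delay the detection delay after the fault onset. The sketch below shows one common way of computing these metrics from a monitoring statistic and its confidence limit; it is an assumption about the evaluation procedure (pre-onset alarms counted as false alarms, post-onset samples below the limit counted as missed detections, and the delay taken as the first post-onset alarm), not a restatement of the exact code used in the study.

```python
import numpy as np

def detection_metrics(stat, limit, fault_onset):
    """Compute FAR, MDR and detection delay for one monitoring statistic.

    stat        : (n,) values of the statistic (e.g. T2, Q or phi)
    limit       : scalar (or (n,) array) confidence limit
    fault_onset : index of the first faulty sample
    """
    alarms = np.asarray(stat) > np.asarray(limit)
    pre, post = alarms[:fault_onset], alarms[fault_onset:]

    far = 100.0 * pre.mean() if pre.size else 0.0             # false alarm rate (%)
    mdr = 100.0 * (1.0 - post.mean()) if post.size else 0.0   # missed detection rate (%)

    hits = np.flatnonzero(post)
    delay = int(hits[0]) + 1 if hits.size else None           # None if never detected
    return far, mdr, delay
```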
Table 3. CSTR faulty scenarios.
Fault Type         | Faulty Sensor | Fault Description           | Fault Time
1—Bias             | Fs (6) *      | f = 2 (m³/min)              | 700–1000
2—Drift            | Fs (6)        | f = dFs/dt = 0.05 (m³/min)  | 700–1000
3—Drift            | Cs (3)        | f = dCs/dt = 0.01 (kmol/m³) | 700–1000
4—Bias             | Tc (8)        | f = 7 (K)                   | 700–1000
5—Complete failure | T (7)         | f = 363 (K)                 | 700–1000
* Variable number in parentheses.
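The bias, drift, and complete-failure scenarios of Table 3 (and the analogous WRRF scenarios in Table 5) are disturbances superimposed on, or substituted for, otherwise healthy sensor signals. A short Python sketch of one way to inject such faults into simulated measurements follows; the function name, argument list, and sampling-interval handling are illustrative assumptions rather than the simulators' actual interfaces.

```python
import numpy as np

def add_sensor_fault(signal, fault_type, start, end, magnitude, dt=1.0):
    """Superimpose a sensor fault of the kinds listed in Table 3.

    signal     : (n,) fault-free sensor readings
    fault_type : 'bias', 'drift' or 'complete_failure'
    start, end : faulty sample window (end exclusive)
    magnitude  : bias value, drift slope per unit time, or the frozen
                 reading reported during a complete failure
    dt         : sampling interval used to scale the drift slope
    """
    faulty = np.array(signal, dtype=float)
    idx = np.arange(start, min(end, len(faulty)))

    if fault_type == "bias":                 # constant offset, e.g. f = 2 m3/min
        faulty[idx] += magnitude
    elif fault_type == "drift":              # ramp growing at df/dt = magnitude
        faulty[idx] += magnitude * dt * (idx - start + 1)
    elif fault_type == "complete_failure":   # sensor stuck at a fixed value
        faulty[idx] = magnitude
    else:
        raise ValueError(f"unknown fault type: {fault_type}")
    return faulty
```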
Table 4. The detection performance of NN-based PCA for CSTR simulation.
Case               | FAR (T²) | MDR (T²) | Delay (T²) | FAR (Q) | MDR (Q) | Delay (Q) | FAR (φ) | MDR (φ) | Delay (φ)
Normal             | 2.45     | -        | -          | 1.69    | -       | -         | 3.07    | -       | -
1—Bias             | 2.49     | 36.45    | 90         | 2.59    | 3.01    | 1         | 4.58    | 2.34    | 1
2—Drift            | 2.89     | 11.03    | 37         | 5.78    | 4.68    | 23        | 6.38    | 4.01    | 20
3—Drift            | 2.39     | 6.68     | 24         | 2.69    | 20.40   | 73        | 3.19    | 8.02    | 24
4—Bias             | 1.89     | 45.48    | 69         | 4.68    | 0       | 1         | 5.08    | 0       | 1
5—Complete failure | 2.49     | 90.96    | 294        | 1.99    | 0       | 1         | 3.88    | 0       | 1
Average            | 2.43     | 31.76    | 85.66      | 3.23    | 4.68    | 16.5      | 4.36    | 2.39    | 7.83
Table 5. WRRFs faulty scenarios.
Fault Type                                                 | Faulty Sensor or Parameter | Fault Description            | Fault Time
1—O2 sensor drift                                          | Sensor no. 7               | f = dO2/dt = 0.5 (g·m⁻³·d⁻¹) | 1248
2—pH sensor drift                                          | Sensor no. 27              | f = d(pH)/dt = 0.5           | 1248
3—O2 sensor bias                                           | Sensor no. 7               | f = 1 (g·m⁻³·d⁻¹)            | 1248
4—pH sensor bias                                           | Sensor no. 27              | f = 2                        |
5—Step change in inorganic nitrogen of anaerobic digester  | Sin                        | f = 0.2 (kmol/m³)            | 1248
6—Secondary clarifier parameter                            | vs                         | f = decreased by 50%         | 1248
Table 6. The detection performance of NN-based PCA for WRRF simulation.
Case                                                       | FAR (T²) | MDR (T²) | Delay (T²) | FAR (Q) | MDR (Q) | Delay (Q) | FAR (φ) | MDR (φ) | Delay (φ)
Normal                                                     | 0.16     | -        | -          | 0.36    | -       | -         | 0.70    | -       | -
1—O2 sensor drift                                          | 0.08     | 16.51    | 241        | 0.16    | 11.81   | 147       | 0.32    | 10.31   | 142
2—pH sensor drift                                          | 0.08     | 6.25     | 103        | 0.16    | 3.18    | 70        | 0.32    | 2.027   | 34
3—O2 sensor bias                                           | 0.08     | 93.22    | 804        | 0.16    | 23.92   | 26        | 0.32    | 16.28   | 32
4—pH sensor bias                                           | 0.08     | 0        | 1          | 0.16    | 0       | 1         | 0.32    | 0       | 1
5—Step change in inorganic nitrogen of anaerobic digester  | 0.08     | 9.26     | 130        | 0.16    | 9.09    | 161       | 0.32    | 6.14    | 101
6—Secondary clarifier parameter                            | 0.08     | 72.36    | 132        | 0.16    | 26.36   | 4         | 0.32    | 25.89   | 5
Average                                                    | 0.09     | 28.22    | 201.57     | 0.18    | 10.62   | 58.42     | 0.37    | 8.66    | 45