2.1. Distributed Robust Dictionary Pair Learning
To overcome the deficiency of traditional dictionary learning, dictionary pair learning combines a synthetical dictionary and an analytical dictionary to reduce the computational burden of the $\ell_0$- or $\ell_1$-norm constraint and to enhance the reconstruction ability of dictionary learning [18]. The structure diagram of dictionary pair learning is shown in Figure 3, and its general model is formulated as follows:

$$\min_{D,P}\ \|X - DPX\|_F^2 + \Psi(D, P, X, Y), \tag{1}$$

where $X \in \mathbb{R}^{p \times n}$ is the $p$-dimensional data matrix, $D \in \mathbb{R}^{p \times m}$ represents the synthetical dictionary, $P \in \mathbb{R}^{m \times p}$ is the analytical dictionary, and $m$ is the number of dictionary atoms. $\|X - DPX\|_F^2$ denotes the reconstruction error term of dictionary pair learning, $\Psi(D, P, X, Y)$ denotes a set of discriminative functions, and $Y$ stands for the label matrix of $X$. In the dictionary pair learning model, the representation coefficients $A$ can be obtained by linear projection instead of nonlinear sparse coding with the $\ell_0$- or $\ell_1$-norm. That is, we can learn an analytical dictionary $P$ such that $A$ is analytically obtained as $A = PX$. On this basis, the dictionary pair learning method learns the analytical dictionary $P$ together with the synthetical dictionary $D$, so that the data matrix $X$ can be reconstructed from $D$, $P$, and $X$ itself, i.e., $X \approx DPX$, where $D$ is used to reconstruct $X$ and $P$ is applied to analytically code $X$.
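To make the coding step concrete, the following sketch contrasts analytical coding with reconstruction on toy NumPy matrices; the dimensions and the dictionaries here are random stand-ins for illustration, not learned quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

p, m, n = 8, 5, 100               # variables, dictionary atoms, samples
X = rng.standard_normal((p, n))   # toy data matrix (samples as columns)
D = rng.standard_normal((p, m))   # synthetical dictionary: reconstructs X
P = rng.standard_normal((m, p))   # analytical dictionary: codes X

A = P @ X          # coding is a single linear projection, A = P X
X_hat = D @ A      # reconstruction X ≈ D P X

assert A.shape == (m, n) and X_hat.shape == X.shape
```

Because coding reduces to a single matrix product, it avoids the iterative solver that an $\ell_0$- or $\ell_1$-constrained sparse coding step would require.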
The dictionary pair learning described above has been improved and applied to industrial process monitoring. However, the process data collected in practical industrial systems often contain noise and outliers, which complicate process monitoring. Figure 4 shows the impact of noise and outliers on process monitoring results: a sample whose monitoring statistic is lower than the control threshold is considered normal, while one whose statistic is higher than the threshold is detected as an anomaly. From Figure 4a, we can observe that normal and abnormal samples are correctly detected without the interference of noise and outliers. In Figure 4b, we can see that when the process data contain noise and outliers, false alarms appear in the anomaly detection results of process monitoring.
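The false-alarm effect illustrated in Figure 4b can be reproduced with a toy monitoring statistic; the distribution, threshold, and contamination level below are illustrative assumptions, not process data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monitoring statistic of 200 normal samples (e.g., a reconstruction error).
stat_clean = np.abs(rng.normal(1.0, 0.2, size=200))
threshold = 2.0                       # control threshold: above it => anomaly

# Contaminate 5% of the normal samples with large outliers.
stat_noisy = stat_clean.copy()
idx = rng.choice(200, size=10, replace=False)
stat_noisy[idx] += 5.0

false_alarms_clean = int(np.sum(stat_clean > threshold))
false_alarms_noisy = int(np.sum(stat_noisy > threshold))
print(false_alarms_clean, false_alarms_noisy)  # contamination raises the alarm count
```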
Therefore, to address the degradation of process monitoring performance caused by outliers and noise, we propose a robust dictionary pair learning (RDPL) method for industrial process monitoring. In addition, a process monitoring method based on distributed robust dictionary pairs is developed for high-dimensional process data. Prior process knowledge is used to divide the training samples into blocks along the variable dimension, i.e., $X = [X_1; X_2; \ldots; X_B]$. The RDPL model of the $K$th block of training samples $X_K$ is formulated as follows:

$$\min_{D_K, P_K}\ \|X_K - D_K P_K X_K\|_{2,1} + \lambda \|P_K\|_{2,1} + \gamma\,\mathrm{rank}(D_K), \quad \text{s.t.}\ P_K X_K \geq 0, \tag{2}$$

where $D_K \in \mathbb{R}^{s \times m}$ represents the synthetical dictionary and $P_K \in \mathbb{R}^{m \times s}$ represents the analytical dictionary of the $K$th block, $s$ is the number of variables in the $K$th sub-block, and $m$ denotes the number of dictionary atoms. The first term is the reconstruction function of the data, the second term is the sparse regularization of the analytical dictionary, and the third term is the low-rank constraint on the synthetical dictionary. $\lambda$ and $\gamma$ are positive parameters used to balance the terms. Besides, the constraint $P_K X_K \geq 0$ is imposed to ensure that the coding coefficient matrix is non-negative.
By introducing the analytical coding matrix $A_K$, the non-convex problem of Equation (2) is relaxed and transformed into the following optimization function:

$$\min_{D_K, P_K, A_K}\ \|X_K - D_K A_K\|_{2,1} + \lambda \|P_K\|_{2,1} + \gamma\,\mathrm{rank}(D_K) + \tau \|P_K X_K - A_K\|_{2,1}, \quad \text{s.t.}\ A_K \geq 0, \tag{3}$$

where $A_K \approx P_K X_K$, and $\tau$ is a scalar constant. The optimization of the objective function in Equation (3) is conducted in the following steps.
- (1) Fix $D_K$ and $P_K$, update $A_K$.
Firstly, we fix the synthetical dictionary $D_K$ and the analytical dictionary $P_K$; the problem with respect to the analytical coding matrix $A_K$ can be reformulated as follows:

$$\min_{A_K \geq 0}\ \|X_K - D_K A_K\|_{2,1} + \tau \|P_K X_K - A_K\|_{2,1}. \tag{4}$$
Based on the definition of the $\ell_{2,1}$ norm [28], minimizing $\|X_K - D_K A_K\|_{2,1}$ is equivalent to iteratively minimizing $\mathrm{tr}\big((X_K - D_K A_K)^{T} G_K (X_K - D_K A_K)\big)$, where $G_K$ is a diagonal matrix with the $i$th diagonal entry $(G_K)_{ii} = \frac{1}{2\|g^{i}\|_2}$, and $g^{i}$ is the $i$th row vector of $X_K - D_K A_K$. In fact, since $\|g^{i}\|_2$ may be equal to 0, we approximate $(G_K)_{ii} = \frac{1}{2\sqrt{\|g^{i}\|_2^2 + \varepsilon}}$ instead, where $\varepsilon$ is a small value used to avoid singular values and to make the inversion more stable. Similarly, $\|P_K X_K - A_K\|_{2,1}$ corresponds to $\mathrm{tr}\big((P_K X_K - A_K)^{T} H_K (P_K X_K - A_K)\big)$, where $H_K$ is a diagonal matrix with the $i$th diagonal entry $(H_K)_{ii} = \frac{1}{2\|h^{i}\|_2}$, and $h^{i}$ is the $i$th row vector of $P_K X_K - A_K$; we use $(H_K)_{ii} = \frac{1}{2\sqrt{\|h^{i}\|_2^2 + \varepsilon}}$ to approximate it. Then, the problem with respect to $A_K$ can be reformulated as follows:

$$\min_{A_K \geq 0}\ \mathrm{tr}\big((X_K - D_K A_K)^{T} G_K (X_K - D_K A_K)\big) + \tau\,\mathrm{tr}\big((P_K X_K - A_K)^{T} H_K (P_K X_K - A_K)\big). \tag{5}$$
Let $\theta_{rc}$ be the Lagrange multiplier for the constraint $(A_K)_{rc} \geq 0$ and $\Theta = [\theta_{rc}]$ [20]; the Lagrange function $L(A_K, \Theta)$ can be deduced as follows:

$$L(A_K, \Theta) = \mathrm{tr}\big((X_K - D_K A_K)^{T} G_K (X_K - D_K A_K)\big) + \tau\,\mathrm{tr}\big((P_K X_K - A_K)^{T} H_K (P_K X_K - A_K)\big) + \mathrm{tr}\big(\Theta A_K^{T}\big). \tag{6}$$

The partial derivative of $L$ with respect to $A_K$ in Equation (6) is computed as follows:

$$\frac{\partial L}{\partial A_K} = -2 D_K^{T} G_K (X_K - D_K A_K) - 2\tau H_K (P_K X_K - A_K) + \Theta. \tag{7}$$

By the definition of the KKT condition [29], the equation with respect to $A_K$ is obtained as follows:

$$\big(D_K^{T} G_K D_K A_K - D_K^{T} G_K X_K + \tau H_K A_K - \tau H_K P_K X_K\big)_{rc}\,(A_K)_{rc} = 0. \tag{8}$$

Thus, the element in the $r$th row and $c$th column of $A_K$ is updated as follows:

$$(A_K)_{rc} \leftarrow (A_K)_{rc}\,\frac{\big(D_K^{T} G_K X_K + \tau H_K P_K X_K\big)_{rc}}{\big(D_K^{T} G_K D_K A_K + \tau H_K A_K\big)_{rc}}. \tag{9}$$
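A minimal sketch of this update step, assuming the reweighted formulation above; `row_weights` and the toy non-negative matrices are illustrative choices (non-negative data keeps the multiplicative update well defined).

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 1e-8                      # small constant stabilizing the reweighting
s, m, n, tau = 6, 4, 50, 0.5

# Non-negative toy data so numerator and denominator stay non-negative.
X = rng.random((s, n))
D = rng.random((s, m))
P = rng.random((m, s))
A = rng.random((m, n))          # initial non-negative coding matrix

def row_weights(E):
    """Diagonal IRLS weights 1 / (2 * sqrt(||e^i||^2 + eps)) for the l2,1 norm."""
    return np.diag(1.0 / (2.0 * np.sqrt(np.sum(E**2, axis=1) + eps)))

def objective(A):
    return (np.sum(np.linalg.norm(X - D @ A, axis=1))        # ||X - D A||_{2,1}
            + tau * np.sum(np.linalg.norm(P @ X - A, axis=1)))

obj0 = objective(A)
for _ in range(50):
    G = row_weights(X - D @ A)
    H = row_weights(P @ X - A)
    num = D.T @ G @ X + tau * (H @ (P @ X))
    den = D.T @ G @ (D @ A) + tau * (H @ A) + eps
    A *= num / den              # KKT-style multiplicative update; preserves A >= 0

assert np.all(A >= 0)
```

The element-wise ratio plays the role of the KKT-derived update: entries grow where the numerator (data-fitting pull) exceeds the denominator and shrink otherwise, while non-negativity is preserved automatically.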
- (2) Fix $D_K$ and $A_K$, update $P_K$.
Secondly, after the analytical coding matrix $A_K$ is updated, we can update the analytical dictionary $P_K$. By removing the terms that are irrelevant to $P_K$, the problem in Equation (3) is reformulated as follows:

$$\min_{P_K}\ \lambda \|P_K\|_{2,1} + \tau \|P_K X_K - A_K\|_{2,1}. \tag{10}$$
Similarly, minimizing $\|P_K\|_{2,1}$ is equivalent to iteratively minimizing $\mathrm{tr}\big(P_K^{T} W_K P_K\big)$, where $W_K$ is a diagonal matrix with the $i$th diagonal entry $(W_K)_{ii} = \frac{1}{2\|p^{i}\|_2}$, and $p^{i}$ is the $i$th row vector of $P_K$; we use $(W_K)_{ii} = \frac{1}{2\sqrt{\|p^{i}\|_2^2 + \varepsilon}}$ to approximate it. Then, the problem with respect to $P_K$ can be converted as follows:

$$\min_{P_K}\ \lambda\,\mathrm{tr}\big(P_K^{T} W_K P_K\big) + \tau\,\mathrm{tr}\big((P_K X_K - A_K)^{T} H_K (P_K X_K - A_K)\big). \tag{11}$$

Let the partial derivative of Equation (11) with respect to $P_K$ be 0, and we can obtain the closed-form solution of $P_K$ in Equation (12).
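The closed-form solution of Equation (12) is not reproduced here. As an illustrative stand-in, assuming Equation (11) has the reweighted form above, its stationarity condition $\lambda W_K P_K + \tau H_K (P_K X_K - A_K) X_K^{T} = 0$ is a Sylvester-type linear equation in $P_K$ that can be solved by Kronecker vectorization:

```python
import numpy as np

rng = np.random.default_rng(3)
s, m, n = 6, 4, 50
lam, tau, eps = 0.1, 0.5, 1e-8

X = rng.standard_normal((s, n))
A = rng.standard_normal((m, n))
P = rng.standard_normal((m, s))

# IRLS weights held fixed for one inner solve (computed from the current P).
W = np.diag(1.0 / (2.0 * np.sqrt(np.sum(P**2, axis=1) + eps)))             # rows of P
H = np.diag(1.0 / (2.0 * np.sqrt(np.sum((P @ X - A)**2, axis=1) + eps)))   # rows of PX - A

# Stationarity: lam*W*P + tau*H*P*(X X^T) = tau*H*A*X^T, linear in vec(P).
C = X @ X.T
lhs = lam * np.kron(np.eye(s), W) + tau * np.kron(C.T, H)   # column-major vec convention
rhs = (tau * H @ A @ X.T).flatten(order="F")
P_new = np.linalg.solve(lhs, rhs).reshape((m, s), order="F")

# Verify the stationarity condition holds for the solved P_new.
residual = lam * W @ P_new + tau * H @ (P_new @ X - A) @ X.T
assert np.allclose(residual, 0, atol=1e-6)
```

In practice, each fixed-weight solve is followed by re-estimating $W_K$ and $H_K$ from the new $P_K$, as in the other reweighted subproblems.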
- (3) Fix $P_K$ and $A_K$, update $D_K$.
Finally, after the analytical dictionary $P_K$ is calculated, we can update the synthetical dictionary $D_K$. By removing the terms that are irrelevant to $D_K$, the problem with respect to $D_K$ is expressed as follows:

$$\min_{D_K}\ \|X_K - D_K A_K\|_{2,1} + \gamma\,\mathrm{rank}(D_K). \tag{13}$$
Obviously, the optimization problem in Equation (13) is NP-hard. Therefore, we use the nuclear norm, a convex relaxation of the low-rank function, to relax the optimization problem as follows [30]:

$$\min_{D_K}\ \|X_K - D_K A_K\|_{2,1} + \gamma \|D_K\|_{*}, \tag{14}$$

where $\|D_K\|_{*}$ is the nuclear norm of $D_K$. To reduce the computational complexity, we use $\frac{1}{2}\big(\|U_K\|_F^2 + \|V_K\|_F^2\big)$ to replace $\|D_K\|_{*}$ [31], where $D_K = U_K V_K$, $U_K \in \mathbb{R}^{s \times r}$, and $V_K \in \mathbb{R}^{r \times m}$. Thus, the optimization problem of Equation (14) can be converted as follows:

$$\min_{D_K, U_K, V_K}\ \|X_K - D_K A_K\|_{2,1} + \frac{\gamma}{2}\big(\|U_K\|_F^2 + \|V_K\|_F^2\big), \quad \text{s.t.}\ D_K = U_K V_K. \tag{15}$$
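This replacement relies on the factorization identity $\|D\|_{*} = \min_{D = UV} \frac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big)$, which the following sketch checks numerically on a toy low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
s, m, r = 6, 5, 3

D = rng.standard_normal((s, r)) @ rng.standard_normal((r, m))  # rank-r matrix
nuc = np.linalg.norm(D, "nuc")          # nuclear norm = sum of singular values

# The balanced factorization from the SVD attains the bound with equality.
U_svd, sv, Vt = np.linalg.svd(D, full_matrices=False)
U = U_svd * np.sqrt(sv)                 # scale columns by sqrt of singular values
V = np.sqrt(sv)[:, None] * Vt
assert np.allclose(U @ V, D)
assert np.isclose(0.5 * (np.linalg.norm(U)**2 + np.linalg.norm(V)**2), nuc)

# Any other factorization D = U'V' can only give a larger value.
M = np.eye(len(sv)) + 0.3 * rng.standard_normal((len(sv), len(sv)))
U2, V2 = U @ M, np.linalg.solve(M, V)
assert 0.5 * (np.linalg.norm(U2)**2 + np.linalg.norm(V2)**2) >= nuc - 1e-9
```

Minimizing the Frobenius terms over a fixed-size factorization therefore bounds the nuclear norm from above while avoiding an SVD at every iteration.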
We use the inexact ALM algorithm [30] to solve the optimization problem in Equation (15), and the augmented Lagrange function is formulated as follows:

$$L(D_K, U_K, V_K, Y_1) = \|X_K - D_K A_K\|_{2,1} + \frac{\gamma}{2}\big(\|U_K\|_F^2 + \|V_K\|_F^2\big) + \langle Y_1, D_K - U_K V_K\rangle + \frac{\mu}{2}\|D_K - U_K V_K\|_F^2, \tag{16}$$

where $\mu > 0$ is a penalty parameter and $Y_1$ is a Lagrange multiplier. To solve the optimization problem in Equation (16), we minimize the augmented Lagrange function by iteratively updating as follows:
- (1) Fix $U_K$ and $V_K$, update $D_K$.
By removing the terms of Equation (16) that are irrelevant to $D_K$, the optimization problem with respect to $D_K$ can be reformulated as follows:

$$\min_{D_K}\ \mathrm{tr}\big((X_K - D_K A_K)^{T} G_K (X_K - D_K A_K)\big) + \langle Y_1, D_K\rangle + \frac{\mu}{2}\|D_K - U_K V_K\|_F^2. \tag{17}$$
Let the partial derivative of Equation (17) with respect to $D_K$ be 0; solving the resulting linear system yields the update of the synthetical dictionary $D_K$ in Equation (18).
- (2) Fix $D_K$ and $V_K$, update $U_K$.
By removing the terms of Equation (16) that are irrelevant to $U_K$, the optimization problem with respect to $U_K$ can be reformulated as follows:

$$\min_{U_K}\ \frac{\gamma}{2}\|U_K\|_F^2 - \langle Y_1, U_K V_K\rangle + \frac{\mu}{2}\|D_K - U_K V_K\|_F^2. \tag{19}$$
Let the partial derivative of Equation (19) with respect to $U_K$ be 0, and we can update the variable matrix $U_K$ as follows:

$$U_K = \big(Y_1 + \mu D_K\big) V_K^{T} \big(\gamma I + \mu V_K V_K^{T}\big)^{-1}. \tag{20}$$
- (3) Fix $D_K$ and $U_K$, update $V_K$.
By removing the terms of Equation (16) that are irrelevant to $V_K$, the optimization problem with respect to $V_K$ can be reformulated as follows:

$$\min_{V_K}\ \frac{\gamma}{2}\|V_K\|_F^2 - \langle Y_1, U_K V_K\rangle + \frac{\mu}{2}\|D_K - U_K V_K\|_F^2. \tag{21}$$
Let the partial derivative of Equation (21) with respect to $V_K$ be 0, and we can update the variable matrix $V_K$ as follows:

$$V_K = \big(\gamma I + \mu U_K^{T} U_K\big)^{-1} U_K^{T}\big(Y_1 + \mu D_K\big). \tag{22}$$

After each round, the multiplier is refreshed as $Y_1 \leftarrow Y_1 + \mu\big(D_K - U_K V_K\big)$, and the iterative updating process of the synthetical dictionary $D_K$ is summarized in Equation (23).
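The inner ALM loop can be sketched as follows. To keep the $D_K$-step in a simple closed form, this toy version replaces the $\ell_{2,1}$ reconstruction term with a Frobenius-norm one, so it illustrates the alternating updates and the multiplier/penalty refresh rather than the exact update of Equation (23):

```python
import numpy as np

rng = np.random.default_rng(5)
s, m, n, r = 6, 5, 80, 3
gamma, mu, rho = 0.1, 1.0, 1.1      # regularizer, penalty, penalty growth rate

X = rng.standard_normal((s, n))
A = rng.standard_normal((m, n))
D = rng.standard_normal((s, m))
U = rng.standard_normal((s, r))
V = rng.standard_normal((r, m))
Y1 = np.zeros((s, m))               # Lagrange multiplier for D = U V

for _ in range(200):
    # D-step (Frobenius reconstruction used here for a simple closed form):
    D = (X @ A.T - Y1 + mu * U @ V) @ np.linalg.inv(A @ A.T + mu * np.eye(m))
    # U-step and V-step, each a regularized least-squares solve:
    U = (Y1 + mu * D) @ V.T @ np.linalg.inv(gamma * np.eye(r) + mu * V @ V.T)
    V = np.linalg.inv(gamma * np.eye(r) + mu * U.T @ U) @ U.T @ (Y1 + mu * D)
    # Multiplier and penalty updates of the inexact ALM:
    Y1 = Y1 + mu * (D - U @ V)
    mu = min(mu * rho, 1e6)

# The factorization constraint is (approximately) satisfied at convergence.
assert np.linalg.norm(D - U @ V) < 1e-2
```

As the penalty $\mu$ grows, the constraint $D_K = U_K V_K$ is enforced ever more tightly, which is the usual behavior of the inexact ALM scheme.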
To fully introduce the proposed method, Algorithm 1 describes the optimization of RDPL, which stops updating each variable when the algorithm reaches the maximum number of iterations $T$.
Algorithm 1 Robust Dictionary Pair Learning
- 1: Input: The training samples $X_K$ of the $K$th sub-block, and the parameters $\lambda$, $\gamma$, $\tau$, $\mu$, and $T$.
- 2: Step 1: Initialize the synthetical dictionary $D_K$ and the analytical dictionary $P_K$ as random matrices with unit Frobenius norm; set $t = 1$.
- 3: Step 2: Repeat until $t > T$:
- 4: Step 2.1: Fix the analytical dictionary $P_K$ and the synthetical dictionary $D_K$, update the analytical coding matrix $A_K$ by Equation (9);
- 5: Step 2.2: Fix the synthetical dictionary $D_K$ and the analytical coding matrix $A_K$, update the analytical dictionary $P_K$ by Equation (12);
- 6: Step 2.3: Fix the analytical coding matrix $A_K$ and the analytical dictionary $P_K$, update the synthetical dictionary $D_K$ by Equation (23);
- 7: Step 2.4: Set $t = t + 1$.
- 8: Output: The analytical dictionary $P_K$ and the synthetical dictionary $D_K$ of the $K$th sub-block.
By building the RDPL model through Algorithm 1, we can calculate the reconstruction error of a training sample $x_h$ in the $K$th sub-block as follows:

$$e_K(h) = \|x_h - D_K P_K x_h\|_2^2. \tag{24}$$

Then, the control threshold of the $K$th sub-block can be obtained by the kernel density estimation (KDE) method [32], and the univariate kernel density estimation is conducted as follows:

$$\hat{f}(x) = \frac{1}{MH}\sum_{h=1}^{M} \mathcal{K}\!\left(\frac{x - e_K(h)}{H}\right), \tag{25}$$

where $x$ represents the data point under consideration, $M$ is the number of training samples, $H$ represents the bandwidth, $e_K(h)$ is the reconstruction error of the $h$th sample in the $K$th sub-block, and $\mathcal{K}(\cdot)$ is the uniform kernel function.
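The threshold selection can be sketched as follows; the gamma-distributed toy errors, the bandwidth, and the 99% confidence level are illustrative assumptions, and `kde_cdf` is a helper implementing the CDF of the uniform-kernel estimator:

```python
import numpy as np

rng = np.random.default_rng(6)
M, H, alpha = 500, 0.05, 0.99       # samples, bandwidth, confidence level

# Toy reconstruction errors of normal training samples (stand-in for eq. (24)).
e = rng.gamma(shape=2.0, scale=0.1, size=M)

def kde_cdf(x):
    """CDF of the uniform-kernel KDE: each sample spreads mass over [e_h - H, e_h + H]."""
    return np.mean(np.clip((x - e + H) / (2 * H), 0.0, 1.0))

# The control threshold is the point where the estimated CDF reaches alpha.
lo, hi = e.min() - H, e.max() + H
for _ in range(100):                # bisection on the monotone CDF
    mid = 0.5 * (lo + hi)
    if kde_cdf(mid) < alpha:
        lo = mid
    else:
        hi = mid
threshold = hi

print(float(np.mean(e <= threshold)))   # fraction of training errors below the threshold
```

At monitoring time, a sample whose reconstruction error exceeds this threshold is flagged as abnormal.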
2.3. Contribution Index Based Anomaly Isolation
For the detected abnormal samples, we need to further locate the abnormal sources. The anomaly is located with the counting-time-based abnormal block localization method [27]; that is, when the posterior probability of the abnormal block exceeds the significance level, the block anomaly flag (BAF) is set to 1. BAF is defined in Equation (26), where $x_h^K$ represents the $h$th abnormal sample of the $K$th block. To ensure the reliability of abnormal block isolation, the block anomaly index (BAI) and the block contribution index (BCI) are defined in Equations (27) and (28), where $H$ is the number of abnormal samples.
In industrial process monitoring, the contribution plot method [34] has become a common approach for anomaly isolation. On the basis of locating the abnormal block, the contribution plot method is used to locate the abnormal variable so as to realize anomaly isolation accurately. Suppose that the synthetical dictionary and the analytical coding vector of the abnormal sample $x_{ab}$ are $D_K$ and $a_{ab}$, respectively; then the abnormal sample can be expressed as follows:

$$x_{ab} = D_K a_{ab} + I f, \tag{29}$$

where $I \in \mathbb{R}^{s \times s}$ is an identity matrix and $s$ is the number of variables in the $K$th sub-block. The non-zero terms of the vector $f$ represent the position and size of the anomaly source. To represent the anomaly source more clearly, the augmented synthetical dictionary is defined as $\bar{D}_K = [D_K, I]$, so the new analytical coding vector $\bar{a}_{ab}$ of the anomaly sample $x_{ab}$ under $\bar{D}_K$ is calculated as in Equation (30).
Then, the abnormal sample is reformulated as $x_{ab} = \bar{D}_K \bar{a}_{ab}$. In addition, the vector $f$ can be recovered as $f = [O, I]\,\bar{a}_{ab}$, where $O$ is the zero matrix and $I$ is an identity matrix. The variable contribution (VC) of the $j$th variable in the $K$th block is defined by the contribution plot method and is calculated as in Equation (31), and the corresponding variable contribution index (VCI) is expressed as in Equation (32).
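The augmented-dictionary localization can be sketched as follows; the ridge-regularized least-squares coding below is an illustrative stand-in for the coding of Equation (30), not the paper's exact solver:

```python
import numpy as np

rng = np.random.default_rng(7)
s, m = 6, 4                         # variables in the sub-block, dictionary atoms

D = rng.standard_normal((s, m))     # toy synthetical dictionary of the Kth block
a_true = 0.1 * rng.standard_normal(m)
x_ab = D @ a_true
x_ab[2] += 5.0                      # inject an anomaly into variable 2

# Augmented synthetical dictionary [D, I]; its coding vector splits into [a; f].
D_bar = np.hstack([D, np.eye(s)])
reg = 1e-2                          # small ridge term so the augmented coding is unique
a_bar = np.linalg.solve(D_bar.T @ D_bar + reg * np.eye(m + s), D_bar.T @ x_ab)
f = a_bar[m:]                       # anomaly part: f = [O, I] @ a_bar

# A large |f_j| indicates variable j as a candidate anomaly source.
assert np.linalg.norm(x_ab - D_bar @ a_bar) < 0.5
```

In this toy run the entry of $f$ corresponding to the contaminated variable carries most of the anomaly magnitude, which is exactly the information a contribution plot visualizes per variable.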