1. Introduction
To ensure production safety and product quality, process monitoring technology has become an indispensable part of industrial processes in recent years. It is commonly divided into model-based methods and data-driven methods. Compared with the former, the latter can take advantage of routine measurements and do not rely on prior process knowledge or precise mechanism models, which are sometimes unavailable or costly to obtain [1,2]. Therefore, data-driven methods are widely used in modern industrial processes.
During the past decades, many data-driven process monitoring methods have been published [3,4,5,6]. Kim et al. [7] proposed a probabilistic PCA method to monitor industrial processes, which first extracts redundant information from the variables and then constructs a feature distribution for monitoring, but it only extracts features of the input space. Zhao et al. [8] proposed a probabilistic PLSR process monitoring method for quality-related faults, which considers the fault characteristics of the input and output spaces simultaneously. Furthermore, Chen et al. [9] proposed a probability-related PCA method for detecting incipient faults, which greatly improves the detection of minor faults. Probabilistic modeling frameworks can also cope with process noise [10]. However, all of these methods are static, whereas actual production processes are dynamic and exhibit variable time lags [11,12].
Process dynamics refers to the mutual influence between samples before and after the current sampling instant [13]. To deal with process dynamics, Ku et al. [14] built an augmented matrix and extended the static PCA model to dynamic PCA (DPCA) for process monitoring. However, the augmented matrix increases the parameter dimension, a problem known as the curse of dimensionality [15]. Motivated by DPCA, Li et al. [16] proposed a dynamic latent variable model for monitoring the Tennessee Eastman process. In this model, an autoregressive model is used to extract the dynamic information in the data, and PCA is performed to reduce the redundancy between variables. Because dimensionality reduction and dynamic information extraction are carried out in two separate stages, the system is complex and difficult to tune. In addition to process variables, quality variables also contain useful fault information [17]. For this reason, Ge et al. [18] proposed a process monitoring method based on a supervised linear dynamic system model. This method uses a first-order autoregressive equation to model first-order dynamics [19] but does not take variable time lags into account.
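To make the dimensionality cost of DPCA concrete, the sketch below builds an augmented matrix by stacking each sample with its lagged copies. It is a minimal illustration of the usual DPCA construction, not Ku et al.'s implementation, and the function name `augmented_matrix` is hypothetical.

```python
import numpy as np


def augmented_matrix(X: np.ndarray, lags: int) -> np.ndarray:
    """Build a DPCA-style augmented matrix with rows [x_t, x_{t-1}, ..., x_{t-lags}].

    X has shape (n_samples, n_vars); the result has shape
    (n_samples - lags, n_vars * (lags + 1)).
    """
    n = X.shape[0]
    if not 0 <= lags < n:
        raise ValueError("lags must satisfy 0 <= lags < n_samples")
    # Column block k holds the data delayed by k samples.
    blocks = [X[lags - k : n - k] for k in range(lags + 1)]
    return np.hstack(blocks)


X = np.arange(12.0).reshape(6, 2)   # 6 samples of 2 variables
Xa = augmented_matrix(X, lags=2)
print(Xa.shape)   # (4, 6): 2 variables become 6 columns
print(Xa[0])      # [x_2, x_1, x_0]
```

With m variables and l lags the augmented matrix has m(l + 1) columns, so the covariance matrix that PCA decomposes grows quadratically in l; for instance, 6 variables with 3 lags already give 24 columns.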
Variable time lag refers to the delay with which one variable affects another [20]. Existing monitoring methods that consider time lags usually fall into two categories [21]. The first finds the time lags between variables, shifts the data to eliminate them, and then builds a static process monitoring model on the shifted data. For example, Wang et al. [22] proposed a spatial reconstruction method to identify system time lags, then aligned the data and established a monitoring model; however, the alignment operation destroys the data structure and causes data loss. The second treats the time lag as an unknown parameter of the process monitoring model and identifies it with a data-driven method. For example, Huber et al. [23] treated the time lag as a parameter of a high-order state space model and estimated it jointly with the other model parameters, but this method depends on how the time lag parameter is set and on the parameter identification method.
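The first category can be sketched as follows: once the per-variable lags are known (here they are simply given; estimating them is a separate problem), each column is shifted so that delayed effects line up, and rows without a complete set of shifted values are dropped. The dropped rows illustrate the data loss mentioned above. The function name `align_by_lags` is hypothetical.

```python
from typing import Sequence

import numpy as np


def align_by_lags(X: np.ndarray, lags: Sequence[int]) -> np.ndarray:
    """Shift column j of X by lags[j] samples so that delayed effects line up.

    lags[j] >= 0 is the (assumed known) delay of variable j relative to the
    reference variable. Rows lacking a complete set of shifted values are
    dropped, so the output has n_samples - max(lags) rows.
    """
    n = X.shape[0]
    max_lag = max(lags)
    cols = [X[lag : n - max_lag + lag, j] for j, lag in enumerate(lags)]
    return np.column_stack(cols)


x = np.arange(6.0)                   # reference variable
y = np.r_[np.nan, np.nan, x[:-2]]    # y responds to x two samples later
aligned = align_by_lags(np.column_stack([x, y]), lags=[0, 2])
print(aligned.shape)   # (4, 2): two samples are lost to the shift
```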
From the above discussion, it can be seen that the variable time lags of a process degrade the performance of previous methods, yet such time lags are common in industrial processes [24,25]. To deal with this problem, this paper proposes a process monitoring method based on a dynamic autoregressive latent variable model. First, from a data-driven perspective, a linear dynamic model is constructed between process variables and quality variables, the dynamic information of the process input and output is compressed into latent variables, and a dynamic autoregressive latent variable model (DALM) is then constructed for the latent variables to extract variable time lag information. A fusion of Bayesian filtering, smoothing and the expectation maximization algorithm is used to identify the model parameters. The DALM is then applied to industrial process monitoring: the process variables are filtered with an improved Bayesian filter to obtain the latent distribution of the current state, and the T² statistic of the latent space is constructed and monitored [26]. The main contributions of this paper are: (1) a process monitoring method based on a dynamic autoregressive latent variable model is proposed; (2) a dynamic autoregressive latent variable model (DALM) is developed to extract variable time lag information; (3) an improved fusion of Bayesian filtering, smoothing and expectation maximization is used to identify the model parameters; (4) based on the DALM, the T² statistic of the latent space is constructed to realize process monitoring.
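The DALM equations are derived in Section 2. As an assumption for illustration only, suppose the latent variables follow an L-th order vector autoregression z_t = A_1 z_{t-1} + ... + A_L z_{t-L} + w_t. Stacking the lags into a companion matrix turns this recursion into a first-order state transition, which is the form that standard Bayesian filters and smoothers operate on. The helper `companion` and the coefficient values below are hypothetical.

```python
import numpy as np


def companion(A_list: list) -> np.ndarray:
    """Stack VAR(L) matrices A_1..A_L (each d x d) into a (dL x dL) companion matrix.

    With s_t = [z_t; z_{t-1}; ...; z_{t-L+1}], the recursion
    z_t = A_1 z_{t-1} + ... + A_L z_{t-L} + w_t becomes
    s_t = F s_{t-1} + [w_t; 0; ...; 0].
    """
    L = len(A_list)
    d = A_list[0].shape[0]
    F = np.zeros((d * L, d * L))
    F[:d, :] = np.hstack(A_list)        # first block row: [A_1 ... A_L]
    F[d:, :-d] = np.eye(d * (L - 1))    # copy z_{t-1}, ..., z_{t-L+1} down one block
    return F


# Hypothetical example: d = 2 latent variables, L = 3 lags.
A_list = [0.5 * np.eye(2), 0.2 * np.eye(2), 0.1 * np.eye(2)]
F = companion(A_list)
print(F.shape)                                   # (6, 6)
print(np.max(np.abs(np.linalg.eigvals(F))) < 1)  # True: the recursion is stable
```

A spectral radius of F below 1 means the latent recursion is stable, which is a quick sanity check on identified autoregressive coefficients.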
The rest of the paper is organized as follows. Section 2 proposes the dynamic autoregressive latent variable model and derives its parameter identification algorithm in detail. Section 3 presents a process monitoring method based on the DALM. Section 4 applies the method to the sintering process of ternary cathode material to verify its monitoring performance. Finally, Section 5 concludes the paper.
3. Process Monitoring Method Based on Dynamic Autoregressive Latent Variable Model
In this section, the established dynamic autoregressive latent variable model is used for industrial process monitoring. First, the DALM is used to model the process data so that the current state information is reflected in the latent variables. The latent variable distribution at the current time is then obtained by filtering the process data, and a statistic is constructed from it and monitored. The monitoring procedure is described in detail below.
Although the latent space is unobservable, a data-driven DALM built on the characteristics of the process data transfers the information in the process variables to the distribution of the latent variables. The process input and output are first normalized as shown in (9), using the means and variances of the input and output variables, respectively. The preprocessed data are then passed through the filtering algorithm to obtain the distribution of the latent variables, as shown in (10).
In (10), the mean and variance of the latent distribution are obtained from (11); the detailed derivation is given in (A12)–(A16).
It can be seen from (A24) that the information in the input and output data at the current and previous moments is filtered into the current latent variable; the latent variable distribution at the current moment is given in (12).
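The paper's improved Bayesian filter is derived in the appendix and is not reproduced here. As a stand-in, the sketch below shows one predict/update step of a standard Kalman filter for a first-order linear-Gaussian state-space model, the textbook case of the filtering idea described above; an autoregressive latent model of order L can be brought into this form by stacking its lags into the state. The function name and the model matrices A, C, Q, R are assumptions.

```python
import numpy as np


def kalman_step(m_prev, P_prev, x_t, A, C, Q, R):
    """One predict/update step of a standard Kalman filter.

    Assumed model: z_t = A z_{t-1} + w_t,  w_t ~ N(0, Q)
                   x_t = C z_t + v_t,      v_t ~ N(0, R)
    Returns the mean and covariance of p(z_t | x_1, ..., x_t).
    """
    # Predict: propagate the previous posterior through the dynamics.
    m_pred = A @ m_prev
    P_pred = A @ P_prev @ A.T + Q
    # Update: correct the prediction with the new observation.
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = np.linalg.solve(S, C @ P_pred).T   # gain P_pred C^T S^{-1} (S, P_pred symmetric)
    m = m_pred + K @ (x_t - C @ m_pred)
    P = P_pred - K @ C @ P_pred
    return m, P


# Scalar example: prior N(0, 1), random walk without noise, unit observation noise.
m, P = kalman_step(np.zeros(1), np.eye(1), np.array([2.0]),
                   np.eye(1), np.eye(1), np.zeros((1, 1)), np.eye(1))
print(m, P)   # [1.] [[0.5]]
```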
Because the latent space contains the current state of the process dynamics and the variable time lag information, the T² statistic is constructed for the latent variable at the current time t, as shown in (13).
Here, the conditional expectation and variance of the latent variable given the observations up to the current moment are given in (14).
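Equations (13) and (14) are not reproduced in this text, so the sketch below assumes only that T² is a quadratic form of a latent mean in the inverse of a covariance matrix, as in the usual Hotelling-type statistic. Which mean and covariance enter the form should be taken from (14); the function name `t2_statistic` is hypothetical.

```python
import numpy as np


def t2_statistic(mu, sigma) -> float:
    """Quadratic-form statistic T^2 = mu^T sigma^{-1} mu.

    mu is a d-dimensional latent mean and sigma a d x d covariance matrix.
    """
    mu = np.asarray(mu, dtype=float)
    # Solve sigma @ v = mu instead of forming the inverse explicitly.
    return float(mu @ np.linalg.solve(sigma, mu))


print(t2_statistic([1.0, 2.0], np.diag([1.0, 4.0])))   # 1 + 4/4 = 2.0
```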
Since the latent variable follows a Gaussian distribution, by the definition of the chi-square distribution this statistic follows a chi-square distribution with d degrees of freedom after data preprocessing, where d is the latent variable dimension of the model. Given d and the significance level α required by the application, the control threshold of the monitoring method is the upper α quantile of this distribution. The statistic of each new sample is then calculated online and compared with the control threshold to determine whether the process has deviated from the normal state. The monitoring logic is given by (15).
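The threshold rule can be sketched with SciPy's chi-square distribution (`scipy.stats.chi2.ppf`). The helper names `control_limit` and `detect` are hypothetical, and the strict comparison assumes that a statistic exactly equal to the threshold counts as normal.

```python
import numpy as np
from scipy.stats import chi2


def control_limit(d: int, alpha: float = 0.01) -> float:
    """Upper-alpha quantile of the chi-square distribution with d degrees of freedom."""
    return float(chi2.ppf(1.0 - alpha, df=d))


def detect(t2_values, d: int, alpha: float = 0.01) -> np.ndarray:
    """Decision rule: flag a sample as faulty when its T^2 exceeds the limit."""
    return np.asarray(t2_values, dtype=float) > control_limit(d, alpha)


limit = control_limit(d=3, alpha=0.01)   # about 11.34
print(detect([2.0, 9.5, 15.2], d=3))     # [False False  True]
```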
Too large an α leads to a high false alarm rate, and too small an α leads to a high missed alarm rate; in practice, the choice of α is therefore a balance between false alarms and missed alarms. This paper chose α = 0.01, which means that the expected false alarm rate on normal data is 0.01. If T² does not exceed the control threshold, the process is in a normal state. Otherwise, the process is in a fault state, and further fault diagnosis and identification are required for process maintenance. The procedure of DALM modeling and online process monitoring is shown in Figure 2.
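The trade-off in choosing α can be checked with a small simulation. The sketch assumes, for illustration only, that normal latent vectors are standard Gaussian (so T² is exactly chi-square with d degrees of freedom) and that a fault shifts the latent mean by a fixed amount; both this fault model and the function name `alarm_rates` are hypothetical.

```python
from typing import Tuple

import numpy as np
from scipy.stats import chi2


def alarm_rates(alpha: float, d: int = 3, shift: float = 3.0,
                n: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Empirical false and missed alarm rates for a chi-square(d) T^2 limit.

    Normal latent vectors are drawn from N(0, I); faulty ones have their
    mean shifted by a vector of Euclidean norm `shift` (a synthetic fault).
    """
    rng = np.random.default_rng(seed)
    normal = rng.standard_normal((n, d))
    faulty = rng.standard_normal((n, d)) + shift / np.sqrt(d)
    limit = chi2.ppf(1.0 - alpha, df=d)
    far = np.mean((normal ** 2).sum(axis=1) > limit)      # normal samples flagged
    missed = np.mean((faulty ** 2).sum(axis=1) <= limit)  # faulty samples not flagged
    return float(far), float(missed)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, alarm_rates(alpha))
```

As α decreases, the empirical false alarm rate follows α downward while the missed alarm rate rises, which is the balance described above.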
The main steps of the process monitoring method based on the DALM model are as follows:
Step 1: Collect process data, divide the training and test data sets and standardize them.
Step 2: Use the training data set to learn the parameters of the DALM model.
Step 3: Build the model and determine the control threshold.
Step 4: Filter the process data online to get the latent space distribution at the current moment.
Step 5: Calculate the statistic and compare it with the control threshold to determine whether the process is abnormal.
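Steps 1 and 4 both need the same scaling: the statistics are estimated once on the training data and then applied unchanged to test and online samples. The sketch below assumes the usual z-score form, dividing by the standard deviation; (9) is described in terms of means and variances, so the exact scaling should be checked against it. The function names `fit_scaler` and `standardize` are hypothetical.

```python
import numpy as np


def fit_scaler(X_train: np.ndarray):
    """Per-variable mean and standard deviation estimated on training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return mean, std


def standardize(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply training statistics to any data set (training, test or online)."""
    return (X - mean) / std


X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean, std = fit_scaler(X_train)
print(standardize(X_train, mean, std))   # each column becomes [-1, 0, 1]
```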
4. Case Study on the Sintering Process of Ternary Cathode Materials
In this section, the proposed process monitoring method based on the dynamic autoregressive latent variable model is used to monitor the sintering process of ternary cathode materials to verify its effectiveness. First, the sintering process of the ternary cathode material is introduced; then the model structure and parameter determination are described in detail; finally, the performance of the model is evaluated.
4.1. Introduction to the Sintering Process of Ternary Cathode Materials
The rapid development of the new energy industry has created an urgent demand for high-quality ternary cathode materials, and the sintering of battery materials is the core process of battery preparation. The sintering process consists of a heating section, a constant temperature section and a cooling section connected in series, as shown in Figure 3. An optimal production state in a single temperature section cannot guarantee that the product performance indicators of the entire sintering process lie within the optimal range, and changes in operating conditions, such as environmental humidity or temperature, also affect the stability of these indicators. To keep the product performance indicators as stable as possible while reducing energy and material consumption, the sintering parameters of the kiln must be adjusted in real time according to the sintering state. As a result, each temperature zone involves many variables that are coupled in series across zones, and the process data exhibit complex characteristics [29].
The temperature field in the sintering process has a significant effect on the material properties. Over-firing changes the morphology and internal structure of the material, whereas under-firing does not provide sufficient activation energy for the chemical reactions. In particular, the decomposition reaction in the heating section is endothermic and requires a sufficient heat supply; otherwise, a reverse reaction occurs, water removal becomes inefficient, and the subsequent oxidation reaction is affected. Therefore, the state of the heating section is very important to the sintering process. At the same time, the residual lithium content directly reflects product quality. To monitor the process status in real time, a monitoring model is established for the temperatures and the residual lithium content of the heating section.
Huang et al. [30] established a temperature field monitoring model based on the PBF equipment equation to monitor the dynamic sintering process of parts, but this method requires precise grinding tool structure parameters and can only monitor uniformly distributed temperature fields. Egorova et al. [31] combined neural networks with a PCA diagnosis method to monitor and diagnose the sintering process. This method can locate a fault and diagnose its cause; however, the neural networks increase the time and space complexity of the system, and the method ignores process dynamics and time lags.
Because of the strong coupling between temperature zones, the process variables exhibit complex characteristics, and traditional static monitoring methods cannot achieve accurate monitoring results. The dynamic autoregressive latent variable model proposed in this paper considers both the dynamics and the time lag information of the process, so it is better suited to the sintering process.
4.2. Determination of Model Parameters
This section establishes a monitoring model for the temperature and product quality in the heating section of the sintering process. The heating section contains seven temperature zones, each with an upper and a lower temperature measuring point, but the temperature changes in the 4th to 7th zones were not obvious. The temperatures of the first three zones were therefore selected as the process variables of the model. The residual lithium content of the product, which reflects battery quality, was selected as the quality variable of the model. Table 1 lists the physical meaning of these variables.
To test the monitoring effect of the model under different faults, a total of 2200 consecutive samples were collected on site with a sampling period of five minutes. The data include three types of faults: over-temperature, under-temperature and shutdown. Detailed status information is given in Table 2.
First, the dynamics of the data and the time lag characteristics of the variables are analyzed from the data. Figure 4 shows the autocorrelation and cross-correlation of the first four process variables. The value at lag 0 in each plot represents the cross-correlation between variables, and the values at nonzero lags show the autocorrelation between variables at different time lags. The cross-correlation index measures the redundancy of variable information, while the autocorrelation index indirectly measures the dynamic and time delay information between variables. The cross-correlation between the variables was above 0.5, indicating strong redundancy between them. Moreover, even at a difference of 10 sampling intervals, the autocorrelation between the variables remained very high, indicating that the variables have time lags and dynamic characteristics. A DALM is therefore appropriate for this process: the emission equation of the model extracts the redundant information in the data, and the autoregressive equation extracts the dynamic and time lag information of the variables. The time lag coefficient was determined with the trend similarity algorithm, which constructs a trend similarity function from the time lag features and solves it, giving L = 3.
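The trend similarity algorithm is not detailed in this section, so the sketch below uses a different, simpler technique: plain lagged cross-correlation, taking the delay that maximizes the absolute correlation. Passing the same series twice (`lagged_corr(x, x, k)`) gives the autocorrelation at lag k, as plotted in Figure 4. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def lagged_corr(x, y, lag: int) -> float:
    """Pearson correlation between x_t and y_{t+lag}, for lag >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def estimate_lag(x, y, max_lag: int) -> int:
    """Delay in [0, max_lag] that maximizes |corr(x_t, y_{t+lag})|."""
    scores = [abs(lagged_corr(x, y, k)) for k in range(max_lag + 1)]
    return int(np.argmax(scores))


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.r_[np.zeros(3), x[:-3]] + 0.1 * rng.standard_normal(500)   # y lags x by 3
print(estimate_lag(x, y, max_lag=10))   # 3
```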
To verify the rationality of this time lag coefficient, a dynamic autoregressive latent variable monitoring model was established for each of several time lag coefficients. To prevent the latent variable dimension from interfering with the selection of the time lag coefficient, the latent variable dimension selected by the Akaike information criterion (AIC) was used temporarily [32]. The false alarm rate (FAR) and fault detection rate (FDR), defined in (16), were used as monitoring performance indicators. The FAR is the number of normal samples mistakenly detected as abnormal by the monitoring method divided by Nn, the number of all normal samples; the FDR is the number of fault samples correctly detected divided by the number of all abnormal samples. Therefore, the closer the FAR is to the significance level and the closer the FDR is to 1, the better. The significance level in this work was set to 0.01.
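The two indicators in (16) reduce to simple ratios over a labeled test set. The sketch below computes them from a boolean alarm sequence; the function name `far_fdr` is hypothetical, and the alarm pattern is synthetic, laid out like the test sets used in this paper (200 normal samples followed by 200 fault samples).

```python
import numpy as np


def far_fdr(alarms, is_fault):
    """FAR = flagged normal / all normal; FDR = flagged faulty / all faulty."""
    alarms = np.asarray(alarms, dtype=bool)
    is_fault = np.asarray(is_fault, dtype=bool)
    far = alarms[~is_fault].mean()
    fdr = alarms[is_fault].mean()
    return float(far), float(fdr)


# Synthetic labels and alarms: 200 normal samples, then 200 fault samples.
is_fault = np.r_[np.zeros(200, bool), np.ones(200, bool)]
alarms = np.r_[np.zeros(198, bool), np.ones(2, bool),     # 2 false alarms
               np.ones(190, bool), np.zeros(10, bool)]    # 10 missed faults
print(far_fdr(alarms, is_fault))   # (0.01, 0.95)
```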
The first 1000 normal samples were used to train the model, and the fault 1 data set was used to test its monitoring effect.
Table 3 shows the monitoring indicators of the new method under different time lag coefficients.
The model did not converge when the time lag coefficient was 5, and with a time lag coefficient of 3 the error and false alarm rate of the model were the best, so the model performed best at this value. To visualize the monitoring results, Figure 5 shows the T² monitoring charts for time lag coefficients of 2, 3 and 4.
It can be seen from Figure 5 that with time lag coefficients of 2 and 4, samples were easily misclassified. In particular, in the fault interval from the 201st to the 400th sample, both normal and abnormal samples lay close to the monitoring threshold, which indicates low robustness at these time lags. With a time lag coefficient of 3, the model was insensitive to noise and produced the fewest false alarms; hence, its FAR and FDR were the best. Therefore, the time lag coefficient obtained by the trend similarity identification algorithm enabled the model to achieve a better monitoring effect.
Next, the latent variable dimension was determined as a trade-off between the complexity and the accuracy of the model. The root mean square error (RMSE), which measures model accuracy, is given in (17), where N is the number of test samples and the remaining terms are the predictions, the true values, and the mean of the true values of the test samples. Samples 1 to 600 were used to train the model, and samples 601 to 1000 were used as the test set.
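The sketch below computes the standard RMSE. Equation (17) as described also involves the mean of the true values, which the plain RMSE does not use, so this approximates the paper's indicator rather than copying it. The helper `smallest_stable_dim` is a hypothetical formalization of the "stabilizes" criterion used to pick the dimension: it returns the smallest dimension whose RMSE is within a relative tolerance of the best one.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error over N test samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def smallest_stable_dim(rmse_by_dim: dict, rel_tol: float = 0.02) -> int:
    """Smallest latent dimension whose RMSE is within rel_tol of the best RMSE."""
    best = min(rmse_by_dim.values())
    return min(d for d, e in rmse_by_dim.items() if e <= best * (1 + rel_tol))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # sqrt(4/3), about 1.155
```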
Table 4 shows the root mean square error of model prediction under different latent variable dimensions.
Table 4 shows that the prediction performance of the model stabilizes once the latent variable dimension reaches 3, which is the balance point between model complexity and accuracy. Under the selected time lag coefficient, the latent variable dimension selected by AIC was also 3, so the latent variable dimension was set to 3.
4.3. Model Performance Test
This section verifies the effect of the proposed monitoring method by comparing it with a first-order dynamic process monitoring method, DPLVM [18], and a static process monitoring method, PPLSR [33]. The latent variable dimension of all models was set to 3.
The first 1000 normal samples were used to train the model parameters, and the trained models were tested on the three fault test sets. To distinguish between normal and abnormal samples, the first 200 samples of each fault test set were normal samples and the last 200 were the corresponding fault samples.
Table 5 shows the FAR and FDR of the different monitoring methods on the different fault test sets, and the last row gives the average of each indicator.
It can be seen from Table 5 that the monitoring performance of the proposed method was better than that of the static model PPLSR and the first-order dynamic model DPLVM. This indicates that adding the autoregressive equation to the model to extract dynamic and time lag information greatly improves detection performance. Compared with the basic first-order dynamic DPLVM fault detection method, DALM also accounts for time lag characteristics, which further improves model performance. The detailed monitoring results of the three methods for the three types of faults are shown in Figures 6–8.
For each fault test set, the first 200 samples were in a normal state and the last 200 samples were fault samples. It can be seen from Figure 8 that the static model PPLSR easily misclassified normal samples as faulty and faulty samples as normal. The error rate of the first-order dynamic model DPLVM was considerably lower. Furthermore, the FAR of the proposed DALM fault detection method was close to the significance level and its FDR was close to 1, confirming that its monitoring performance was greatly improved.
5. Conclusions
A process monitoring method based on a dynamic autoregressive latent variable model was proposed in this paper. Compared with the traditional DPLVM monitoring method, this method considers not only the dynamic characteristics of the process but also its complex time lag characteristics, integrates the time lag information into the model, and greatly improves monitoring performance for processes with time lags. First, from a data-driven perspective, the method builds a dynamic autoregressive latent variable model to capture the dynamics and variable time lags. Then, a fusion of Bayesian filtering, smoothing and the expectation maximization algorithm is used to identify the model parameters. Next, based on the identified model, an improved Bayesian filtering technique infers the latent variable distribution of the process state, and the T² statistic is constructed in the latent space for online monitoring. Finally, the proposed method was applied to monitor the sintering process of ternary cathode materials. The modeling and monitoring results of this industrial case study show that the DALM-based method outperformed the static and first-order dynamic process monitoring methods.
An important issue for process monitoring in industrial processes is the multi-sampling-rate problem. The method proposed in this paper assumes that the input and output data have the same sampling rate; if the sampling rates differ, some data must be discarded by down-sampling. A more promising approach would be to incorporate semi-supervised learning methods, which can be trained on unequal amounts of input and output data, thereby improving data utilization. Another practical problem is the nonlinear relationship between process variables, which is very common in industrial processes. How to deal with this problem effectively is worth further research, to make the monitoring method more widely applicable.