**2. Methodology**

#### *2.1. Critical-to-Quality Phase Identification Based on Time-Slice Model*

A batch process has different operating requirements over the course of a run, which gives it pronounced phase characteristics. The process can therefore be divided into several phases according to the correlations among process variables. Within a phase, the correlation between process variables and quality variables does not change significantly from one sampling time to the next; that is, process operation affects quality in a similar way throughout the phase. Across phases, however, the process variables influence quality differently and show different statistical relationships. Given these characteristics, a phase that contributes significantly to the final quality is defined as a critical-to-quality phase. A batch process may have several critical-to-quality phases. If production has multiple quality variables, their critical-to-quality phases may be the same or different, depending on the characteristics of the process. It is therefore important to identify the critical-to-quality phases that contribute the most to quality variation.

Batch process data are generally represented by **X**(*I* × *J<sub>x</sub>* × *K*), where *I* is the number of batches, *J<sub>x</sub>* is the number of process variables, and *K* is the number of sampling times. The quality data are generally represented by **Y**(*I* × *J<sub>y</sub>*), where *J<sub>y</sub>* is the number of quality variables. The measurements of all *J<sub>x</sub>* variables at sampling time *k* (*k* = 1, ..., *K*) are stored in **X**<sub>*k*</sub>(*I* × *J<sub>x</sub>*), which is called the *k*th time slice of **X**. The relationship between process variables and quality variables at sampling time *k* can be extracted from the matrices **X**<sub>*k*</sub> and **Y**. Applying PLS to them gives the *k*th time-slice PLS model:

$$\begin{array}{l} \mathsf{X}\_{k} = \mathsf{T}\_{k} \mathsf{P}\_{k}^{\mathsf{T}} + \mathsf{E}\_{k} \\ \mathsf{Y} = \mathsf{U}\_{k} \mathsf{Q}\_{k}^{\mathsf{T}} + \mathsf{F}\_{k} \end{array} \tag{1}$$

The previous model can be expressed by the regression model as:

$$
\hat{\mathbf{Y}}\_k = \mathbf{X}\_k \mathbf{B}\_k \tag{2}
$$

where **T**<sub>*k*</sub> and **U**<sub>*k*</sub> are the score matrices, **P**<sub>*k*</sub> and **Q**<sub>*k*</sub> are the loading matrices, **E**<sub>*k*</sub> and **F**<sub>*k*</sub> are the residual matrices, **B**<sub>*k*</sub> is the regression parameter matrix, *k* = 1, 2, ..., *K*, and **Ŷ**<sub>*k*</sub> is the predicted quality. When a single quality variable **y**(*I* × 1) is considered, the regression model can be simply expressed as:

$$
\hat{\mathbf{y}}\_k = \mathbf{X}\_k \boldsymbol{\beta}\_k \tag{3}
$$

where **β**<sub>*k*</sub> is the regression parameter vector and **ŷ**<sub>*k*</sub> is the predicted quality at the current time. The number of latent variables in the regression model needs to be determined; four-fold cross-validation is used for this purpose in this work [25,26].

In this paper, the index *R*<sup>2</sup>, which describes the goodness of fit of a regression model in multivariable linear regression, is used to measure the influence of each time slice on the final quality. Time slices with high *R*<sup>2</sup> are identified as critical to quality, and the phases containing these time slices are identified as critical-to-quality phases. For the *k*th sampling time, the prediction accuracy *R*<sup>2</sup><sub>*k*</sub> of the quality prediction model for the quality index **y** is defined as follows:

$$R\_k^2 = \frac{\sum\_{i=1}^I (\mathbf{\hat{y}}\_{i,k} - \overline{\mathbf{y}})^2}{\sum\_{i=1}^I (\mathbf{y}\_i - \overline{\mathbf{y}})^2} \tag{4}$$

where **y**<sub>*i*</sub> is the measured quality of the *i*th batch among the test batches, **ŷ**<sub>*i*,*k*</sub> is the quality of the *i*th batch predicted by the *k*th time-slice model, and **ȳ** is the average measured quality of the test batches. *R*<sup>2</sup><sub>*k*</sub> ranges from 0 to 1. A value close to 1 indicates that the quality prediction model is accurate, and hence that the corresponding time slice has a large impact on the quality variable. Conversely, the smaller *R*<sup>2</sup><sub>*k*</sub> is, the smaller the impact on the quality variable. The critical-to-quality phases of the batch process can therefore be determined by comparing the *R*<sup>2</sup><sub>*k*</sub> values of the different phases.
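As a minimal sketch of Eq. (4), the ratio can be computed directly from the measured and predicted qualities of the test batches. The function name `r_squared_k` and the numbers below are hypothetical and serve only to illustrate how a time slice that tracks the quality scores higher than one that barely departs from the mean.

```python
def r_squared_k(y_true, y_pred_k):
    """Eq. (4): explained sum of squares divided by total sum of squares."""
    y_bar = sum(y_true) / len(y_true)
    explained = sum((yp - y_bar) ** 2 for yp in y_pred_k)
    total = sum((yt - y_bar) ** 2 for yt in y_true)
    return explained / total


# Hypothetical quality measurements for four test batches.
y = [10.0, 12.0, 14.0, 16.0]
# Hypothetical predictions from two different time-slice models.
slice_a = [10.5, 11.5, 14.5, 15.5]  # follows the measurements closely
slice_b = [12.8, 13.0, 13.1, 13.1]  # stays near the mean of y

r2_a = r_squared_k(y, slice_a)
r2_b = r_squared_k(y, slice_b)
```

Under these assumed numbers, `slice_a` would be treated as far more relevant to quality than `slice_b`.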

#### *2.2. Phase Mean Model*

According to the characteristics of batch processes, the whole process can be divided into several phases. The process variables behave markedly differently in different phases, whereas within a phase they can be regarded as behaving similarly. In this work, for the multi-phase and multi-mode quality analysis, the process characteristics are assumed to be constant along the time direction within each phase. A single model can then represent the process variable relationships of an entire phase. The phase mean model is built as follows.

First, the average variable matrix is calculated in phase *c*,

$$\overline{\mathbf{X}}\_{\mathbf{c}} = \sum\_{k=1}^{K\_{\mathbf{c}}} \mathbf{X}\_{k,\mathbf{c}} / K\_{\mathbf{c}} \tag{5}$$

where *K<sub>c</sub>* is the data length of phase *c* and **X**<sub>*k*,*c*</sub> is the data matrix of the process variables at the *k*th sampling time in phase *c*. Thus, **X̄**<sub>*c*</sub> is the average variable matrix of phase *c*.
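The averaging in Eq. (5) can be sketched with NumPy as a mean over the time axis of the three-way array. The array sizes and the phase boundaries used here are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np


def phase_mean(X, start, stop):
    """Average the time slices X[:, :, start:stop] of one phase (Eq. 5)."""
    return X[:, :, start:stop].mean(axis=2)


rng = np.random.default_rng(0)
# Hypothetical data: 6 batches, 4 process variables, 50 sampling times.
X = rng.normal(size=(6, 4, 50))
# Assumed phase c covering sampling times 10..29 (K_c = 20).
X_bar_c = phase_mean(X, 10, 30)
```

The result has one row per batch and one column per process variable, so it can be used in place of a single time slice in the regression that follows.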

Within phase *c*, phase regression models can be built using the PLS method,

$$\begin{aligned} \overline{\mathbf{X}}\_{c} &= \mathbf{T}\_{c} \mathbf{P}\_{c}^{\mathrm{T}} + \mathbf{E}\_{c} \\ \mathbf{Y} &= \mathbf{U}\_{c} \mathbf{Q}\_{c}^{\mathrm{T}} + \mathbf{F}\_{c} \end{aligned} \tag{6}$$

The previous model can be expressed by the regression form as:

$$
\hat{\mathbf{Y}}\_{c} = \overline{\mathbf{X}}\_{c} \mathbf{B}\_{c} \tag{7}
$$

where **T**<sub>*c*</sub>, **U**<sub>*c*</sub>, **P**<sub>*c*</sub>, **E**<sub>*c*</sub>, **F**<sub>*c*</sub>, and **B**<sub>*c*</sub> have the same meaning as in the time-slice model, except that each matrix now refers to the phase mean. **Ŷ**<sub>*c*</sub> is the quality predicted by the *c*th phase mean model. When a single quality variable **y**(*I* × 1) is considered, the regression model can be simply expressed as:

$$
\hat{\mathbf{y}}\_{c} = \overline{\mathbf{X}}\_{c} \boldsymbol{\beta}\_{c} \tag{8}
$$

where **β**<sub>*c*</sub> is the regression parameter vector. In this case, **T**<sub>*c*</sub> is a matrix of dimension *I* × *H* and **U**<sub>*c*</sub> is a matrix of dimension *I* × 1.

For process monitoring, Hotelling-*T*<sup>2</sup> and *SPE* statistics are calculated in systematic and residual subspaces, respectively [8,27].

$$T\_{c}^{2} = \mathbf{x}\_{c}^{\mathrm{T}} \mathbf{R}\_{c} \left(\frac{\mathbf{T}\_{c}^{\mathrm{T}} \mathbf{T}\_{c}}{I-1}\right)^{-1} \mathbf{R}\_{c}^{\mathrm{T}} \mathbf{x}\_{c} \tag{9}$$

$$\text{SPE}\_{c} = \left\lVert \tilde{\mathbf{x}}\_{c} \right\rVert^{2} = \left\lVert \left(\mathbf{I}\_{J\_{x}} - \mathbf{P}\_{c} \mathbf{R}\_{c}^{\mathrm{T}}\right) \mathbf{x}\_{c} \right\rVert^{2} \tag{10}$$

where **x̃**<sub>*c*</sub> is the residual vector; **R**<sub>*c*</sub> = **W**<sub>*c*</sub>(**P**<sub>*c*</sub><sup>T</sup>**W**<sub>*c*</sub>)<sup>−1</sup>; **W**<sub>*c*</sub> is the weight matrix; *δ*<sub>*T*<sub>*c*</sub><sup>2</sup></sub>(*α*) is the control limit of *T*<sub>*c*</sub><sup>2</sup> at confidence level *α*; and *SPE<sub>c</sub>*(*α*) is the control limit of *SPE* at confidence level *α*. The detailed properties and calculations can be found in reference [28].
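The two statistics in Eqs. (9) and (10) can be sketched as follows for a single measurement vector. The function `monitoring_statistics` is a hypothetical helper. To keep the example self-contained, the demonstration does not fit a PLS model; it uses orthonormal loadings from an SVD as a stand-in, which is the special case **W** = **P**.

```python
import numpy as np


def monitoring_statistics(x, W, P, T):
    """Return (T2, SPE) for one measurement vector x, following Eqs. (9)-(10)."""
    n_batches = T.shape[0]
    R = W @ np.linalg.inv(P.T @ W)          # R = W (P^T W)^-1
    t = R.T @ x                             # scores of the new sample
    score_cov = T.T @ T / (n_batches - 1)   # covariance of the training scores
    T2 = float(t @ np.linalg.solve(score_cov, t))
    residual = x - P @ t                    # (I - P R^T) x
    return T2, float(residual @ residual)


# Stand-in model (assumption): loadings from an SVD, so W equals P here.
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 5))
X -= X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = W = Vt[:2].T
T = X @ W

x_in_model = P @ np.array([1.0, -0.5])   # lies entirely in the model subspace
x_off_model = x_in_model + Vt[4]         # adds a unit vector outside the subspace

T2_in, SPE_in = monitoring_statistics(x_in_model, W, P, T)
T2_off, SPE_off = monitoring_statistics(x_off_model, W, P, T)
```

In this construction the added component changes only *SPE*, which illustrates that *T*<sup>2</sup> measures variation inside the systematic subspace while *SPE* measures what the model leaves unexplained.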

The corresponding control limits are:

$$\delta\_{T\_{c}^{2}}(\alpha) = \frac{H(I^{2} - 1)}{I(I - H)} F\_{c,\alpha}(H, I - H) \tag{11}$$

$$\text{SPE}\_{c}(\alpha) = g\_{c} \chi^{2}\_{h,\alpha} \tag{12}$$

where *F*<sub>*c*,*α*</sub>(*H*, *I* − *H*) is the critical value of the *F* distribution at confidence level *α* with *H* and *I* − *H* degrees of freedom, and *H* is the number of retained latent variables; *g<sub>c</sub>χ*<sup>2</sup><sub>*h*,*α*</sub> is the critical value of the scaled *χ*<sup>2</sup> distribution at the same confidence level *α*, with proportional coefficient *g<sub>c</sub>* = *s*/(2*μ*) and *h* = 2*μ*<sup>2</sup>/*s* degrees of freedom; *μ* is the mean value of *SPE*; and *s* is the variance of *SPE*.
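The parameters *g<sub>c</sub>* and *h* of the *SPE* limit follow from matching the mean and variance of the scaled *χ*<sup>2</sup> distribution to those of the training *SPE* values. The sketch below computes them with the standard library only. The sample values are hypothetical, and the use of the sample variance (divisor *n* − 1) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, variance


def spe_limit_parameters(spe_values):
    """Return (g, h) with g = s / (2 mu) and h = 2 mu^2 / s."""
    mu = mean(spe_values)
    s = variance(spe_values)  # sample variance; an assumption of this sketch
    g = s / (2 * mu)
    h = 2 * mu ** 2 / s
    return g, h


# Hypothetical SPE values of normal training batches at one time interval.
spe_train = [1.0, 2.0, 3.0, 4.0, 5.0]
g, h = spe_limit_parameters(spe_train)
```

With these definitions, *g* · *h* reproduces the mean of *SPE* and 2*g*<sup>2</sup>*h* reproduces its variance. The limit itself is *g* times the *χ*<sup>2</sup> quantile with *h* degrees of freedom, which could be evaluated, for example, with `scipy.stats.chi2.ppf`.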

#### *2.3. Multi-Phase Residual Recursive Modeling for Single Mode*

In the phase-based PLS method, a phase regression model is established between the process variables and the final quality variables in each phase, on the assumption that the model of each phase captures the relationship between the two. However, these individual models are not related to each other, so each phase appears to contribute to the final quality on its own. This contradicts the nature of the multi-phase batch process, in which the phases act on the final quality together and in sequence. Moreover, an earlier phase may affect both the later phases and the final quality, so the quality regression model of the current phase should take the influence of the previous phases into account. Therefore, a recursive quality regression method for the multi-phase batch process is proposed, which uses the quality residuals of the previous phase model to establish the regression model of the current phase. All critical-to-quality phases are thereby linked through the phase-based recursive regression residuals, so that together they contribute to the final quality.

The establishment of a multi-phase residual recursive model is shown in Figure 1.

**Figure 1.** Illustration of multi-phase residual recursive modeling.

Each phase is modeled by regressing the current quality residual **f**<sub>*c*</sub> on the average variable matrix **X̄**<sub>*c*</sub>, which gives the regression parameter **β**<sub>*c*</sub> and the predicted residual **f̂**<sub>*c*</sub>,

$$\begin{aligned} \mathbf{f}\_{c} &= \overline{\mathbf{X}}\_{c} \boldsymbol{\beta}\_{c} + \mathbf{f}^\* \\ \hat{\mathbf{f}}\_{c} &= \overline{\mathbf{X}}\_{c} \boldsymbol{\beta}\_{c} \end{aligned} \tag{13}$$

The quality residual of the first phase, **f**<sub>1</sub>, is the quality measurement itself. The residual of the second phase is the deviation between the residual of the first phase and its prediction, **f**<sub>2</sub> = **f**<sub>1</sub> − **f̂**<sub>1</sub>, and so on for the later phases.

The quality prediction at the current phase is the sum of the residual predictions of the completed phases and of the current phase,

$$\hat{\mathbf{y}}\_{c} = \sum\_{i=1}^{c} \hat{\mathbf{f}}\_{i}, \quad c = 1, 2, \ldots, C \tag{14}$$
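The recursion of Eqs. (13) and (14) can be sketched as a loop over phases in which each phase regresses what the earlier phases left unexplained. For brevity, this sketch replaces PLS with ordinary least squares (`numpy.linalg.lstsq`); that substitution, the function name, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def fit_residual_recursion(phase_means, y):
    """Fit one regression per phase on the residual left by the earlier phases."""
    f = y.copy()                      # f_1 is the quality measurement itself
    betas, f_hats = [], []
    for X_bar in phase_means:
        # Least squares used here as a stand-in for the PLS regression.
        beta, *_ = np.linalg.lstsq(X_bar, f, rcond=None)
        f_hat = X_bar @ beta          # predicted residual of this phase, Eq. (13)
        betas.append(beta)
        f_hats.append(f_hat)
        f = f - f_hat                 # residual handed to the next phase
    y_hat = np.cumsum(f_hats, axis=0)  # row c-1 holds the prediction of Eq. (14)
    return betas, y_hat, f


rng = np.random.default_rng(1)
n_batches = 18
# Hypothetical phase mean matrices for two critical-to-quality phases.
phase_means = [rng.normal(size=(n_batches, 3)) for _ in range(2)]
y = (phase_means[0] @ np.array([1.0, 0.5, -0.2])
     + phase_means[1] @ np.array([0.3, 0.0, 0.8]))

betas, y_hat, final_residual = fit_residual_recursion(phase_means, y)
```

Each row of `y_hat` is the cumulative prediction after one more phase, so the last row plus `final_residual` recovers the measured quality.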

The final online quality prediction results are as follows:

$$\hat{\mathbf{y}}\_k = \begin{cases} \mathbf{X}\_{k,c\_1} \boldsymbol{\beta}\_1 & k \in c\_1 \\ \hat{\mathbf{f}}\_1 + \mathbf{X}\_{k,c\_2} \boldsymbol{\beta}\_2 & k \in c\_2 \\ \hat{\mathbf{f}}\_1 + \hat{\mathbf{f}}\_2 + \mathbf{X}\_{k,c\_3} \boldsymbol{\beta}\_3 & k \in c\_3 \\ \hat{\mathbf{f}}\_1 + \hat{\mathbf{f}}\_2 + \hat{\mathbf{f}}\_3 + \mathbf{X}\_{k,c\_4} \boldsymbol{\beta}\_4 & k \in c\_4 \end{cases} \tag{15}$$

where *c*<sub>1</sub>, ..., *c*<sub>4</sub> are the four phases and **X**<sub>*k*,*c*<sub>1</sub></sub>, ..., **X**<sub>*k*,*c*<sub>4</sub></sub> are the phase mean variable matrices.

The Hotelling-*T*<sup>2</sup> and *SPE* statistics for the current time *k* are:

$$T\_k^2 = \mathbf{x}\_k^{\mathrm{T}} \mathbf{R}\_k \left(\frac{\mathbf{T}\_k^{\mathrm{T}} \mathbf{T}\_k}{I - 1}\right)^{-1} \mathbf{R}\_k^{\mathrm{T}} \mathbf{x}\_k \tag{16}$$

$$\text{SPE}\_k = \left\| \tilde{\mathbf{x}}\_k \right\|^2 = \left\| \left(\mathbf{I}\_{J\_x} - \mathbf{P}\_k \mathbf{R}\_k^{\mathrm{T}}\right) \mathbf{x}\_k \right\|^2 \tag{17}$$

where **x̃**<sub>*k*</sub> is the residual vector at the current time.

The corresponding control limits are:

$$\delta\_{T\_k^2}(\alpha) = \frac{H(I^2 - 1)}{I(I - H)} F\_{k,\alpha}(H, I - H) \tag{18}$$

$$\text{SPE}\_k(\alpha) = g\_k \chi^2\_{h,\alpha} \tag{19}$$

where *F*<sub>*k*,*α*</sub>(*H*, *I* − *H*) is the critical value of the *F* distribution at confidence level *α* with *H* and *I* − *H* degrees of freedom, and *H* is the number of retained latent variables; *g<sub>k</sub>χ*<sup>2</sup><sub>*h*,*α*</sub> is the critical value of the scaled *χ*<sup>2</sup> distribution at the same confidence level *α*, with proportional coefficient *g<sub>k</sub>* = *s*/(2*μ*) and *h* = 2*μ*<sup>2</sup>/*s* degrees of freedom; *μ* is the mean value of *SPE*; and *s* is the variance of *SPE*.

#### *2.4. Between-Mode Modeling for Multiple Modes*

The multi-phase problem was addressed in the previous section; this section focuses on the multi-mode problem. The multi-phase problem is still taken into account: unless stated otherwise, the methodology below builds on the multi-phase analysis above.

To solve the multi-mode problem, the main idea is to extract the relationship between the historical modes and the new mode. This proposed model contains more modal information and can better predict and monitor multiple modes. The framework of this section is shown in Figure 2. The model is established not only based on the new mode but also on the historical modes in the modal library. Firstly, the process variables and quality variables in historical modes are regressed and analyzed using the single-mode model. Secondly, the new mode process variables are applied to those single-mode models of historical modes, and the assumed predicted qualities of the new mode are obtained. Then, the regression analysis is carried out on the assumed predicted qualities and the final actual quality, obtaining the between-mode model. Finally, by applying the between-mode model, the final prediction quality is obtained. The details of between-mode quality regression modeling are introduced as follows.

**Figure 2.** Illustration of between-mode quality regression modeling.

Within phase *c*, for the new mode with the normalized time-slice process variables **X**<sub>*t*,*k*</sub>(*I<sub>t</sub>* × *J<sub>x</sub>*) and the quality variable **y**<sub>*t*</sub>, the process variables are first applied to the regression models obtained from the historical modes to obtain the assumed quality predictions,

$$\hat{\mathbf{y}}\_{t,m,k} = \mathbf{X}\_{t,k} \boldsymbol{\beta}\_{m,c} \tag{20}$$

where *m* = 1, 2, ..., *M* indexes the historical modes, *M* is the number of historical modes, and *t* denotes the new mode. **ŷ**<sub>*t*,*m*,*k*</sub> is the assumed quality prediction, and **β**<sub>*m*,*c*</sub> is the regression parameter of historical mode *m* in phase *c*. Through the assumed quality predictions, the quality information of the historical modes is shared with the new mode. This information is then evaluated and extracted by the next regression.

The assumed quality predictions are then regressed against the quality data of the new mode. The assumed predictions from all historical modes form a new matrix **Z**<sub>*t*,*k*</sub>(*I<sub>t</sub>* × *M*), **Z**<sub>*t*,*k*</sub> = [**ŷ**<sub>*t*,1,*k*</sub>, ..., **ŷ**<sub>*t*,*m*,*k*</sub>, ..., **ŷ**<sub>*t*,*M*,*k*</sub>]. The *k*th time-slice PLS regression model is then built between **Z**<sub>*t*,*k*</sub> and **y**<sub>*t*</sub> as follows [29]:

$$\begin{aligned} \mathbf{Z}\_{t,k} &= \mathbf{T}\_{t,k} \mathbf{P}\_{t,k}^{\mathrm{T}} + \mathbf{E}\_{t,k} \\ \mathbf{y}\_t &= \mathbf{U}\_{t,k} \mathbf{Q}\_{t,k}^{\mathrm{T}} + \mathbf{F}\_{t,k} \end{aligned} \tag{21}$$

where **T**<sub>*t*,*k*</sub> and **U**<sub>*t*,*k*</sub> are the score matrices, **P**<sub>*t*,*k*</sub> and **Q**<sub>*t*,*k*</sub> are the loading matrices, and **E**<sub>*t*,*k*</sub> and **F**<sub>*t*,*k*</sub> are the residual matrices of the new mode. The new predictions are then obtained,

$$\hat{\mathbf{y}}\_{t,k}^{\*} = \mathbf{Z}\_{t,k} \boldsymbol{\alpha}\_{t,k} \tag{22}$$

where **ŷ**<sup>∗</sup><sub>*t*,*k*</sub> is the prediction given by the between-mode regression relationship, *k* = 1, 2, ..., *K<sub>c</sub>*, and **α**<sub>*t*,*k*</sub> is the regression parameter of the *k*th time-slice model.
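The two-stage structure of Eqs. (20) to (22) can be sketched for one time slice as follows. The historical regression vectors are drawn at random purely for illustration, and ordinary least squares again stands in for the PLS regression of Eq. (21); both are assumptions of this sketch, as is the helper name `between_mode_slice_model`.

```python
import numpy as np


def between_mode_slice_model(X_tk, y_t, betas_hist):
    """Build Z from the historical models, then regress y_t on Z."""
    # Eq. (20): one assumed prediction per historical mode, stacked as columns.
    Z = np.column_stack([X_tk @ beta for beta in betas_hist])
    # Least squares used as a stand-in for the PLS model of Eq. (21).
    alpha, *_ = np.linalg.lstsq(Z, y_t, rcond=None)
    return Z, alpha


rng = np.random.default_rng(2)
n_batches, n_vars, n_modes = 18, 4, 3
X_tk = rng.normal(size=(n_batches, n_vars))                   # new-mode time slice
betas_hist = [rng.normal(size=n_vars) for _ in range(n_modes)]  # hypothetical
# Synthetic new-mode quality that mixes the behavior of modes 1 and 3.
y_t = X_tk @ (0.6 * betas_hist[0] + 0.4 * betas_hist[2])

Z, alpha = between_mode_slice_model(X_tk, y_t, betas_hist)
y_star = Z @ alpha   # Eq. (22)
```

In this synthetic case the second-stage coefficients recover the mixing weights, which is the sense in which the between-mode regression judges how much each historical mode contributes to the new mode.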

The regression parameters of phase *c* can be obtained from the regression parameters of the time-slice models,

$$\boldsymbol{\alpha}\_{t,c} = \frac{1}{K\_{c}} \sum\_{k=1}^{K\_{c}} \boldsymbol{\alpha}\_{t,k} \tag{23}$$

where *K<sub>c</sub>* is the number of time intervals within phase *c*. The predictions **ŷ**<sup>∗</sup><sub>*t*,*c*,*k*</sub> based on the regression parameter of the whole phase, **α**<sub>*t*,*c*</sub>, are then obtained,

$$\hat{\mathbf{y}}\_{t,c,k}^{\*} = \mathbf{Z}\_{t,k} \boldsymbol{\alpha}\_{t,c} \tag{24}$$
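A small numerical sketch of Eqs. (23) and (24): the time-slice coefficients of a phase are averaged, and the averaged vector is then applied to the assumed predictions of any time slice in that phase. All numbers are hypothetical.

```python
import numpy as np


def phase_coefficients(alpha_slices):
    """Eq. (23): average the time-slice coefficients over one phase."""
    return np.mean(alpha_slices, axis=0)


# Hypothetical coefficients for K_c = 3 time slices and M = 3 historical modes.
alpha_slices = np.array([[0.5, 0.1, 0.4],
                         [0.7, 0.0, 0.3],
                         [0.6, 0.2, 0.2]])
alpha_tc = phase_coefficients(alpha_slices)

# Hypothetical assumed predictions Z_{t,k} for two batches at one time slice.
Z_tk = np.array([[10.0, 11.0, 9.0],
                 [12.0, 12.5, 11.5]])
y_star = Z_tk @ alpha_tc   # Eq. (24)
```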

Then for phase *c*, corresponding coefficients can be obtained:

$$\mathbf{Z}\_{t,c} = \frac{1}{K\_{c}} \sum\_{k=1}^{K\_{c}} \mathbf{Z}\_{t,k} \tag{25}$$

$$\mathbf{T}\_{t,c} = \frac{1}{K\_{c}} \sum\_{k=1}^{K\_{c}} \mathbf{T}\_{t,k} \tag{26}$$

$$\mathbf{P}\_{t,c} = \frac{1}{K\_{c}} \sum\_{k=1}^{K\_{c}} \mathbf{P}\_{t,k} \tag{27}$$

$$\mathbf{W}\_{t,c} = \frac{1}{K\_{c}} \sum\_{k=1}^{K\_{c}} \mathbf{W}\_{t,k} \tag{28}$$

$$\mathbf{R}\_{t,c} = \mathbf{W}\_{t,c} \left(\mathbf{P}\_{t,c}^{\mathrm{T}} \mathbf{W}\_{t,c}\right)^{-1} \tag{29}$$

where *Kc* is the number of time intervals in phase *c*, *k* = 1,2, . . . ,*Kc*.

In online monitoring, the score matrix **T**<sub>*t*,*c*</sub>, the loading matrix **P**<sub>*t*,*c*</sub>, and the weight matrix **W**<sub>*t*,*c*</sub> are taken from the offline model, and the online *T*<sup>2</sup> and *SPE* statistics are calculated.

Online *T*<sup>2</sup> statistics:

$$T\_k^2 = \mathbf{z}\_k^{\mathrm{T}} \mathbf{R}\_{t,c} \left(\frac{\mathbf{T}\_{t,c}^{\mathrm{T}} \mathbf{T}\_{t,c}}{I - 1}\right)^{-1} \mathbf{R}\_{t,c}^{\mathrm{T}} \mathbf{z}\_k \tag{30}$$

Online *SPE* statistics:

$$\text{SPE}\_k = \left\| \tilde{\mathbf{z}}\_k \right\|^2 = \left\| \left( \mathbf{I}\_M - \mathbf{P}\_{t,c} \mathbf{R}\_{t,c}^{\mathrm{T}} \right) \mathbf{z}\_k \right\|^2 \tag{31}$$

where *T*<sub>*k*</sub><sup>2</sup> and *SPE<sub>k</sub>* are the *T*<sup>2</sup> and *SPE* statistics calculated at the *k*th time interval, respectively, and **z̃**<sub>*k*</sub> is the residual vector of the *k*th time interval.

The corresponding control limits are:

$$\delta\_{T\_k^2}(\alpha) = \frac{H(I^2 - 1)}{I(I - H)} F\_{k,\alpha}(H, I - H) \tag{32}$$

$$\text{SPE}\_k(\alpha) = g\_k \chi^2\_{h,\alpha} \tag{33}$$

where *F*<sub>*k*,*α*</sub>(*H*, *I* − *H*) is the critical value of the *F* distribution at confidence level *α* with *H* and *I* − *H* degrees of freedom, and *H* is the number of retained latent variables; *g<sub>k</sub>χ*<sup>2</sup><sub>*h*,*α*</sub> is the critical value of the scaled *χ*<sup>2</sup> distribution at the same confidence level *α*, with proportional coefficient *g<sub>k</sub>* = *s*/(2*μ*) and *h* = 2*μ*<sup>2</sup>/*s* degrees of freedom; *μ* is the mean value of *SPE*; and *s* is the variance of *SPE*.

#### *2.5. Model Comparison and Selection*

In this section, two models are compared: the single-mode model and the between-mode model. The single-mode model, introduced in Section 2.3, considers only one mode; quality is predicted and monitored within that mode on the basis of the critical-to-quality phase residual recursive analysis. The between-mode model, developed in Section 2.4, is established from both the historical modes and the new mode: the assumed quality predictions bring the quality information of the historical modes into the regression model for the new mode. Note that both models address the multi-phase issue in the same way, by residual recursive modeling, for the sake of a fair comparison and a consistent strategy.

First, for the new batches **X**<sub>*new*</sub>(*I<sub>new</sub>* × *J<sub>x</sub>*), the single-mode quality predictions **ŷ**<sub>*new*,*t*,*m*,*k*</sub> and the multi-mode quality predictions **ŷ**<sup>∗</sup><sub>*new*,*t*,*c*,*k*</sub> are obtained at the *k*th sampling time.

$$\begin{array}{l} \hat{\mathbf{y}}\_{new,t,c,k} = \mathbf{X}\_{new} \boldsymbol{\beta}\_{t,c} \\ \hat{\mathbf{y}}\_{new,t,m,k} = \mathbf{X}\_{new} \boldsymbol{\beta}\_{m,c} \end{array} \tag{34}$$

$$\mathbf{Z}\_{new,t,k} = [\hat{\mathbf{y}}\_{new,t,1,k}, \dots, \hat{\mathbf{y}}\_{new,t,m,k}, \dots, \hat{\mathbf{y}}\_{new,t,M,k}] \tag{35}$$

$$\hat{\mathbf{y}}\_{new,t,c,k}^{\*} = \mathbf{Z}\_{new,t,k} \boldsymbol{\alpha}\_{t,c}$$

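Then, the root-mean-square error (RMSE) values are obtained for the single-mode predictions, Equation (36), and for the between-mode predictions, Equation (37),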

$$\mathbf{RMSE} = \sqrt{\frac{1}{I\_{new}} \sum\_{i=1}^{I\_{new}} \left( \mathbf{y}\_{new,t,i} - \mathbf{\hat{y}}\_{new,t,i,k} \right)^2} \tag{36}$$

$$\mathbf{RMSE} = \sqrt{\frac{1}{I\_{new}} \sum\_{i=1}^{I\_{new}} \left( \mathbf{y}\_{new,t,i} - \mathbf{\hat{y}}\_{new,t,i,k}^\* \right)^2} \tag{37}$$

The RMSE values can well reflect the precision of prediction. The smaller the RMSE values, the higher the prediction accuracy.
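The comparison based on Eqs. (36) and (37) amounts to computing the RMSE of each model on the new batches and keeping the model with the smaller value. The sketch below uses hypothetical part weights and predictions; the dictionary keys are labels chosen for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error over the new batches, as in Eqs. (36)-(37)."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


# Hypothetical measured part weights and predictions for four new batches.
y_new = [27.0, 27.2, 26.9, 27.1]
pred_single = [27.1, 27.0, 27.0, 27.3]      # single-mode model
pred_between = [27.05, 27.15, 26.95, 27.1]  # between-mode model

scores = {
    "single-mode": rmse(y_new, pred_single),
    "between-mode": rmse(y_new, pred_between),
}
selected = min(scores, key=scores.get)  # model with the smaller RMSE
```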

**3. Illustration and Discussions**

#### *3.1. Process Description*

Injection molding is an important plastic processing technology and a typical batch process. Accurate prediction of product quality requires a good understanding of the process. A complete injection molding cycle mainly consists of mold closing, injection, packing-holding, plasticizing, cooling, mold opening, part ejection, and other steps. Four of these phases are the most important in determining part quality. In the injection phase, molten plastic is injected into the mold. In the packing-holding phase, additional material is packed into the mold under a certain pressure. In the plasticizing phase, the material is conveyed forward, plasticized and melted into a viscous fluid, and stored. In the cooling phase, the plastic is cooled in the mold until the part is rigid enough for ejection. The process variables that have an important influence on the final quality can be read online by high-precision sensors.

In this work, high-density polyethylene (HDPE) was used as the injection material. The quality index analyzed in this experiment is the weight of the injection-molded parts. According to the settings of packing pressure (PP) and barrel temperature (BT), the experimental batches are divided into five modes. The experimental conditions are shown in Table 1. The process data of each mode are stored in **X**(23 × 11 × 525), and the quality data of each mode are stored in **y**(23 × 1). All data used in modeling are real data obtained from experiments.


**Table 1.** Different operation modes.

#### *3.2. Critical-to-Quality Phase Identification*

In the injection molding process, different phases have different effects on product quality. For example, in the injection phase, the main variables affecting the final product weight are the injection speed and the barrel temperature. In general, the higher the barrel temperature, the lower the product weight, and the higher the injection speed, the more melt is injected and the greater the product weight. In addition, the pressure variables (such as the nozzle pressure and the cylinder pressure), the screw stroke, the injection speed, and the barrel temperature are positively correlated with sputtering in the injection products. That is, the higher the injection speed, the pressure, and the temperature, the more likely sputtering is to appear in the batch operation. In the packing-holding phase, the weight of the injection-molded part is mainly determined by the nozzle pressure, the cylinder pressure, and the cavity pressure. Two temperature variables, the cavity temperature and the barrel temperature, also affect the weight of the product: the lower the temperature, the greater the weight.

Taking mode 3 as an example, the critical-to-quality phase analysis is carried out. There are 23 batches in mode 3. A total of 18 training batches are selected as the prediction batches for analyzing the phase characteristics. The *R*<sup>2</sup><sub>*k*</sub> values and their phase means are shown in Figure 3. The figure shows that the *R*<sup>2</sup><sub>*k*</sub> values of the injection phase and the packing-holding phase are larger, which means that these two phases have a greater impact on the final prediction quality than the other phases.

**Figure 3.** *R*<sup>2</sup><sub>*k*</sub> contribution rate of the different batches in mode 3.

The phase mean value of *R*<sup>2</sup><sub>*k*</sub> of the four phases under three different modes is shown in Table 2.

**Table 2.** Phase mean value of *R*<sup>2</sup><sub>*k*</sub> of the different modes and different phases.

According to the data in the above table, for mode 1, mode 2, mode 3, and mode 4, the phase mean values of *R*<sup>2</sup><sub>*k*</sub> of the injection phase and the packing-holding phase are greater than those of the plasticizing phase and the cooling phase. For mode 5, the phase mean value of *R*<sup>2</sup><sub>*k*</sub> of the cooling phase is the largest. Based on the mean *R*<sup>2</sup><sub>*k*</sub>, the injection phase and the packing-holding phase are selected as the critical-to-quality phases for subsequent monitoring and analysis.

#### *3.3. Multi-Phase Monitoring for Single Mode*

In this part, the single-mode model is adopted for quality prediction and process monitoring. The first 18 batches of mode 3 are used for modeling, and the last 5 batches of mode 3 are used for testing. According to four-fold cross-validation, four latent variables are retained in the injection phase for both the traditional method and the proposed method. In the packing-holding phase, three latent variables are retained for the traditional method and two for the proposed method. The confidence level *α* is set to 0.99. The predicted quality of one test batch is shown in Figure 4 and compared with that of the traditional multi-phase partial least squares method [30], in which a single model is built for quality prediction in each phase. The mean RMSE values over the five test batches for the different prediction methods are shown in Table 3. The mean RMSE of the traditional method is 0.0702, while that of the proposed method is 0.0632, which indicates that the proposed phase residual recursive method predicts more accurately.

The monitoring results of the first test batch are shown in Figures 5 and 6. The traditional method also divides the batch process into four phases and predicts and monitors the quality directly in each phase, while the proposed method takes the actual quality as the residual of the first phase. The prediction and monitoring results of the two methods are therefore the same in the first phase, namely the injection phase. The monitoring results of the injection phase are shown in Figure 5, where *T*<sup>2</sup> and *SPE* stay within the control limits. The monitoring results of the packing-holding phase are shown in Figure 6, where *T*<sup>2</sup> and *SPE* of both the traditional method and the proposed method stay within their respective control limits. This shows that the proposed single-mode modeling method can monitor the corresponding test batches.

**Figure 4.** Single-mode online prediction of mode 3.

**Table 3.** RMSE of the different prediction modes in single-mode prediction.


**Figure 5.** Single-mode online monitoring of injection phase of mode 3.

**Figure 6.** Single-mode online monitoring of packing-holding phase of mode 3.

In addition, batches from mode 1 are tested with the monitoring model built on mode 3; that is, the first 18 batches of mode 3 are used for modeling, and 5 batches of mode 1 are used for testing. The results for each phase are displayed for one of the five test batches. The quality prediction result is shown in Figure 7 and compared with that of the traditional partial least squares method. The mean RMSE values over the five test batches for the different prediction methods are shown in Table 3. The mean RMSE of the traditional method is 0.1398, while that of the proposed method is 0.1154, which indicates that the proposed phase residual recursive method predicts more accurately. The monitoring results of the injection phase of mode 1 are shown in Figure 8: the *T*<sup>2</sup> statistics do not exceed the control limit, but the *SPE* statistics do. The monitoring results of the packing-holding phase of mode 1 are shown in Figure 9: the *SPE* statistics of the proposed method exceed the control limit at the beginning of the phase, whereas the *SPE* statistics of the traditional method do not. The proposed method can therefore distinguish this batch of mode 1 and performs better than the traditional method. Thus, when a single mode is used for modeling, batches from other modes can be distinguished by the proposed method.

In order to compare the prediction effect of different modes and different methods under the single-mode modeling, RMSE of prediction results of five test batches of mode 3 and mode 1 are calculated respectively on the basis of the model of mode 3, as shown in Table 3.

**Figure 7.** Single-mode online prediction of mode 1.

**Figure 8.** Single-mode online monitoring of injection phase of mode 1.

**Figure 9.** Single-mode online monitoring of packing-holding phase of mode 1.

Table 3 shows that, in single-mode modeling and prediction, predictions are more accurate when the test mode is the same as the modeling mode than when it is a different mode. In addition, the comparison of the two methods shows that the proposed method predicts more accurately than the traditional method.

In the injection molding process, there are two main faults. The first is material disturbance: a small amount of polypropylene (PP) is mixed into the original material, HDPE. Because the viscosity of PP is higher than that of HDPE, more heat is generated during operation, so the melt temperature in the nozzle is higher than in the normal state. The second is sensor fault: a faulty sensor returns no data, which results in a fault in the process.

First, a faulty batch caused by material disturbance is monitored. In line with the actual process situation, a batch is selected from the test batches of mode 3, and the temperature variable is increased by 5 °C at the 60th sampling point. The monitoring results of the traditional method and the proposed method are shown in Figure 10. The proposed method monitors the fault better than the traditional method, since its statistics rise rapidly when the fault occurs, especially the *T*<sup>2</sup> statistics.

**Figure 10.** Single-mode online monitoring of material disturbance fault.

For the sensor fault, a test batch with the pressure variable removed after the 150th sampling point is monitored. The *T<sup>2</sup>* and *SPE* monitoring effects of the traditional method and the proposed method of the single-mode model are shown in Figure 11. Compared with the traditional method, the statistics of the proposed method rise more rapidly, and the amplitudes are relatively large.

**Figure 11.** Single-mode online monitoring of sensor fault.
