**5. Conclusions**

A process monitoring method based on the dynamic autoregressive latent variable model (DALM) was proposed in this paper. Compared with the traditional DPLVM monitoring method, the proposed method considers not only the dynamic characteristics of the process but also its complex time-lag characteristics; by integrating the time-lag information into the model, it greatly improves monitoring performance for time-lag processes. First, from the data point of view, a dynamic autoregressive latent variable model was established to capture the dynamics and the variable time lags. Then, an algorithm fusing Bayesian filtering, Bayesian smoothing, and expectation maximization was used to identify the model parameters. Next, on the basis of the identified model, an improved Bayesian filtering technique was used to infer the latent-variable distribution of the process state, and a T<sup>2</sup> statistic was constructed in the latent space for online monitoring. Finally, the proposed method was applied to monitoring the sintering process of ternary cathode materials. The modeling and monitoring results of the industrial case studies show that the DALM model outperformed both the static and the first-order dynamic process monitoring methods.

An important issue for applying process monitoring in industrial processes is the multi-rate sampling problem. The method proposed in this paper assumes that the input and output data share the same sampling rate; when the rates are inconsistent, some data are discarded by down-sampling. A more promising direction would be to combine semi-supervised learning methods, which can be trained on unbalanced input and output data, thereby improving data utilization. Another practical problem is the nonlinear relationship among process variables, which is very common in industrial processes. How to handle such nonlinearity effectively is worthy of further research in the near future, to make the monitoring method more widely applicable.

**Author Contributions:** Conceptualization, N.C. and W.G.; methodology, F.H. and J.C.; software, F.H. and Z.C.; validation, J.C.; writing—review and editing, N.C., F.H., J.C., Z.C. and X.L.; supervision, N.C., Z.C. and W.G.; project administration, W.G. and X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded in part by the Key Program of the National Natural Science Foundation of China (62033014), in part by the Application Projects of Integrated Standardization and New Paradigm for Intelligent Manufacturing from the Ministry of Industry and Information Technology of China in 2016, in part by the Fundamental Research Funds for the Central Universities of Central South University (2021zzts0700), and in part by the Project of State Key Laboratory of High Performance Complex Manufacturing (ZZYJKT2020-14).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available upon request from the first author. The data are not publicly available due to intellectual property protection.

**Acknowledgments:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Detailed Derivation of the M-Step**

According to the EM algorithm, all the parameters of the DALM model can be updated in the M-step. By maximizing the cost function $Q(\Theta|\Theta\_{old})$, the estimate $\Theta^{new}$ for the next iteration is determined, as shown in (A1).

$$\Theta^{new} = \operatorname\*{argmax}\_{\Theta} Q(\Theta | \Theta\_{old}). \tag{A1}$$

The partial derivatives of the Q function with respect to the model parameters were taken and set to zero.

$$\frac{\partial Q(\boldsymbol{\Theta}|\boldsymbol{\Theta}\_{old})}{\partial \boldsymbol{\Theta}} = 0.\tag{A2}$$

The updated values of the model parameters $\Theta^{new}$ were obtained, as shown in (A3)–(A10).

$$\mathbf{u}\_0^{new} = E\_{\mathbf{z}\_T} \left( \begin{bmatrix} \mathbf{z}\_0 \\ \vdots \\ \mathbf{z}\_{-L+1} \end{bmatrix} \right), \tag{A3}$$

$$\mathbf{V}\_{0}^{new} = E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{0} \\ \vdots \\ \mathbf{z}\_{-L+1} \end{bmatrix} \begin{bmatrix} \mathbf{z}\_{0} \\ \vdots \\ \mathbf{z}\_{-L+1} \end{bmatrix}^{T} \right) - E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{0} \\ \vdots \\ \mathbf{z}\_{-L+1} \end{bmatrix} \right) E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{0} \\ \vdots \\ \mathbf{z}\_{-L+1} \end{bmatrix} \right)^{T}, \tag{A4}$$

$$\mathbf{A}^{new} = \sum\_{t=1}^{T} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t} \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix}^{T} \right) \left[ \sum\_{t=1}^{T} E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix} \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix}^{T} \right) \right]^{-1},\tag{A5}$$

$$\Sigma\_{\mathbf{z}}^{new} = \frac{1}{T} \sum\_{t=1}^{T} \left( E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t} \mathbf{z}\_{t}^{T} \right) - 2 \mathbf{A}^{new} E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix} \mathbf{z}\_{t}^{T} \right) + \mathbf{A}^{new} E\_{\mathbf{z}\_{T}} \left( \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix} \begin{bmatrix} \mathbf{z}\_{t-1} \\ \vdots \\ \mathbf{z}\_{t-L} \end{bmatrix}^{T} \right) \left(\mathbf{A}^{new}\right)^{T} \right), \tag{A6}$$

$$\mathbf{B}\_{\mathbf{x}}^{new} = \sum\_{t=1}^{T} \mathbf{x}\_{t} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t}^{T} \right) \left[ \sum\_{t=1}^{T} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t} \mathbf{z}\_{t}^{T} \right) \right]^{-1},\tag{A7}$$

$$\mathbf{B}\_{\mathbf{y}}^{new} = \sum\_{t=1}^{T} \mathbf{y}\_{t} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t}^{T} \right) \left[ \sum\_{t=1}^{T} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t} \mathbf{z}\_{t}^{T} \right) \right]^{-1}, \tag{A8}$$

$$\Sigma\_{\mathbf{x}}^{new} = \frac{1}{T} \sum\_{t=1}^{T} \left( \mathbf{x}\_{t} \mathbf{x}\_{t}^{T} - 2 \mathbf{B}\_{\mathbf{x}}^{new} E\_{\mathbf{z}\_{T}}(\mathbf{z}\_{t}) \mathbf{x}\_{t}^{T} + \mathbf{B}\_{\mathbf{x}}^{new} E\_{\mathbf{z}\_{T}} \left( \mathbf{z}\_{t} \mathbf{z}\_{t}^{T} \right) \left(\mathbf{B}\_{\mathbf{x}}^{new}\right)^{T} \right), \tag{A9}$$

$$\Sigma\_{\mathbf{y}}^{new} = \frac{1}{T} \sum\_{t=1}^{T} \left( \mathbf{y}\_t \mathbf{y}\_t^T - 2 \mathbf{B}\_{\mathbf{y}}^{new} E\_{\mathbf{z}\_T}(\mathbf{z}\_t) \mathbf{y}\_t^T + \mathbf{B}\_{\mathbf{y}}^{new} E\_{\mathbf{z}\_T} \left( \mathbf{z}\_t \mathbf{z}\_t^T \right) \left(\mathbf{B}\_{\mathbf{y}}^{new}\right)^{T} \right). \tag{A10}$$
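The regression-style emission updates (A7) and (A8) can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: the latent moments are taken as exact, noise-free quantities (so $E(\mathbf{z}\_t) = \mathbf{z}\_t$), and all variable names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
q, m, T = 3, 5, 200                 # latent dim, output dim, sample count (illustrative)

B_y_true = rng.normal(size=(m, q))  # illustrative "true" emission matrix
Z = rng.normal(size=(T, q))         # latent states z_t, treated as known moments
Y = Z @ B_y_true.T                  # noise-free outputs y_t = B_y z_t

# (A8): B_y_new = [sum_t y_t E(z_t^T)] [sum_t E(z_t z_t^T)]^{-1}
S_yz = Y.T @ Z                      # sum_t y_t z_t^T
S_zz = Z.T @ Z                      # sum_t z_t z_t^T
B_y_new = S_yz @ np.linalg.inv(S_zz)
```

Because the toy data are noise-free, the update recovers the emission matrix exactly; with the smoothed moments of the E-step, the same formula gives the maximum-likelihood update.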

With the updated parameter set $\Theta^{new} = \{\mathbf{A}^{new}, \mathbf{B}\_{\mathbf{x}}^{new}, \mathbf{B}\_{\mathbf{y}}^{new}, \mathbf{u}\_0^{new}, \mathbf{V}\_0^{new}, \Sigma\_{\mathbf{z}}^{new}, \Sigma\_{\mathbf{x}}^{new}, \Sigma\_{\mathbf{y}}^{new}\}$, the E-step and M-step were iterated until the parameter set $\Theta$ converged, that is, until (A11) was satisfied, where $\varsigma$ is a sufficiently small positive constant; the model parameter identification was then complete.

$$\|\Theta^{new} - \Theta^{old}\| < \varsigma. \tag{A11}$$

Here, $\Theta^{old}$ is the parameter set from the previous iteration and $\Theta^{new}$ is the parameter set after the current iteration. The algorithm stops only when the parameters obtained in two adjacent iterations converge; convergence is guaranteed by the EM algorithm itself.
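The overall identification loop, alternating E-steps and M-steps until (A11) holds, can be sketched as follows. The E/M callbacks below are toy stand-ins (a damped fixed-point update toward a sample mean), not the DALM computations; the function names and tolerance are illustrative assumptions.

```python
import numpy as np

def em_identify(e_step, m_step, theta0, tol=1e-8, max_iter=500):
    """Iterate E- and M-steps until ||theta_new - theta_old|| < tol, as in (A11)."""
    theta_old = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(theta_old)              # E-step: expected sufficient statistics
        theta_new = np.asarray(m_step(stats))  # M-step: closed-form parameter update
        if np.linalg.norm(theta_new - theta_old) < tol:
            return theta_new                   # converged per (A11)
        theta_old = theta_new
    return theta_old

# Toy stand-in for the DALM E/M computations: a damped update that
# converges to the sample mean of `data`.
data = np.array([1.0, 2.0, 3.0, 4.0])
e_step = lambda th: th + 0.5 * (data.mean() - th)  # partially updated parameter
m_step = lambda stats: stats                       # identity M-step for the toy
theta_hat = em_identify(e_step, m_step, theta0=np.array([0.0]))
```

In the DALM setting, `e_step` would return the smoothed moments of Appendix B and `m_step` the closed-form updates (A3)–(A10); only the stopping rule is taken directly from (A11).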

### **Appendix B. Detailed Derivation of the E-Step**

In order to determine the statistics $E\_{\mathbf{z}\_T}(\mathbf{z}\_t)$, $E\_{\mathbf{z}\_T}(\mathbf{z}\_t\mathbf{z}\_t^T)$ and $E\_{\mathbf{z}\_T}(\mathbf{z}\_t\mathbf{z}\_{t-i}^T)$, the forward–backward algorithm was employed. It is an iterative calculation method consisting of a forward filtering step and a backward correction step.

In the Bayesian filtering stage, the goal was to calculate the posterior probability of the latent variables $[\mathbf{z}\_t, \mathbf{z}\_{t-1}, \cdots, \mathbf{z}\_{t-L+1}]$ given $\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}$ at time $t$, starting from the posterior distribution $\mathbf{z}\_{t-1}, \mathbf{z}\_{t-2}, \cdots, \mathbf{z}\_{t-L}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1} \sim N(\mathbf{u}\_{t-1}, \mathbf{V}\_{t-1})$ of the latent variables $[\mathbf{z}\_{t-1}, \mathbf{z}\_{t-2}, \cdots, \mathbf{z}\_{t-L}]$ given $\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}$ at the previous time $t-1$, as shown in (A12), where $1 \le t \le T$,

$$\mathbf{z}\_{t}, \mathbf{z}\_{t-1}, \dots, \mathbf{z}\_{t-L+1} | \mathbf{x}\_{1:t}, \mathbf{y}\_{1:t} \sim N(\mathbf{u}\_{t}, \mathbf{V}\_{t}) = N\left( \begin{bmatrix} \mathbf{u}\_{t}^{1} \\ \vdots \\ \mathbf{u}\_{t}^{L} \end{bmatrix}, \begin{bmatrix} \mathbf{V}\_{t}^{11} & \cdots & \mathbf{V}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{V}\_{t}^{L1} & \cdots & \mathbf{V}\_{t}^{LL} \end{bmatrix} \right). \tag{A12}$$

The joint probability distribution of the latent variables $[\mathbf{z}\_t, \mathbf{z}\_{t-1}, \cdots, \mathbf{z}\_{t-L+1}]$ and $\mathbf{x}\_t, \mathbf{y}\_t$ given $\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}$ is shown in (A13).

$$\begin{aligned} \mathbf{z}\_{t}, \mathbf{z}\_{t-1}, \dots, \mathbf{z}\_{t-L+1}, \mathbf{x}\_{t}, \mathbf{y}\_{t} | \mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1} &\sim N(\mathbf{g}\_{t}, \mathbf{G}\_{t}) \\ &= N\left( \begin{bmatrix} \mathbf{g}\_{t}^{1} \\ \vdots \\ \mathbf{g}\_{t}^{L+2} \end{bmatrix}, \begin{bmatrix} \mathbf{G}\_{t}^{11} & \cdots & \mathbf{G}\_{t}^{1(L+2)} \\ \vdots & \ddots & \vdots \\ \mathbf{G}\_{t}^{(L+2)1} & \cdots & \mathbf{G}\_{t}^{(L+2)(L+2)} \end{bmatrix} \right). \end{aligned} \tag{A13}$$

The parameters of (A13) can be calculated by (A14) and (A15).

$$\begin{cases} \mathbf{g}\_{t}^{1} = E(\mathbf{z}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{A}\mathbf{u}\_{t-1} \\ \mathbf{g}\_{t}^{i} = E(\mathbf{z}\_{t-i+1}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{u}\_{t-1}^{i-1} \\ \mathbf{g}\_{t}^{L+1} = E(\mathbf{x}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{x}}\mathbf{g}\_{t}^{1} \\ \mathbf{g}\_{t}^{L+2} = E(\mathbf{y}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{y}}\mathbf{g}\_{t}^{1} \end{cases}, \tag{A14}$$

$$\begin{cases} \mathbf{G}\_{t}^{11} = \operatorname{cov}(\mathbf{z}\_{t}, \mathbf{z}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{A}\mathbf{V}\_{t-1}\mathbf{A}^{T} + \Sigma\_{\mathbf{z}} \\ \mathbf{G}\_{t}^{1i} = \operatorname{cov}(\mathbf{z}\_{t}, \mathbf{z}\_{t-i+1}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{A}\begin{bmatrix} \mathbf{V}\_{t-1}^{1(i-1)} \\ \vdots \\ \mathbf{V}\_{t-1}^{L(i-1)} \end{bmatrix} \\ \mathbf{G}\_{t}^{i1} = \left(\mathbf{G}\_{t}^{1i}\right)^{T} \\ \mathbf{G}\_{t}^{ij} = \operatorname{cov}(\mathbf{z}\_{t-i+1}, \mathbf{z}\_{t-j+1}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{V}\_{t-1}^{(i-1)(j-1)} \\ \mathbf{G}\_{t}^{(L+1)k} = \operatorname{cov}(\mathbf{x}\_{t}, \mathbf{z}\_{t-k+1}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{x}}\mathbf{G}\_{t}^{1k} \\ \mathbf{G}\_{t}^{k(L+1)} = \left(\mathbf{G}\_{t}^{(L+1)k}\right)^{T} \\ \mathbf{G}\_{t}^{(L+2)k} = \operatorname{cov}(\mathbf{y}\_{t}, \mathbf{z}\_{t-k+1}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{y}}\mathbf{G}\_{t}^{1k} \\ \mathbf{G}\_{t}^{k(L+2)} = \left(\mathbf{G}\_{t}^{(L+2)k}\right)^{T} \\ \mathbf{G}\_{t}^{(L+1)(L+1)} = \operatorname{cov}(\mathbf{x}\_{t}, \mathbf{x}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{x}}\mathbf{G}\_{t}^{11}\mathbf{B}\_{\mathbf{x}}^{T} + \Sigma\_{\mathbf{x}} \\ \mathbf{G}\_{t}^{(L+1)(L+2)} = \operatorname{cov}(\mathbf{x}\_{t}, \mathbf{y}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{x}}\mathbf{G}\_{t}^{11}\mathbf{B}\_{\mathbf{y}}^{T} \\ \mathbf{G}\_{t}^{(L+2)(L+1)} = \left(\mathbf{G}\_{t}^{(L+1)(L+2)}\right)^{T} \\ \mathbf{G}\_{t}^{(L+2)(L+2)} = \operatorname{cov}(\mathbf{y}\_{t}, \mathbf{y}\_{t}|\mathbf{x}\_{1:t-1}, \mathbf{y}\_{1:t-1}) = \mathbf{B}\_{\mathbf{y}}\mathbf{G}\_{t}^{11}\mathbf{B}\_{\mathbf{y}}^{T} + \Sigma\_{\mathbf{y}} \end{cases}, \tag{A15}$$

where $i = 2, 3, \cdots, L$; $j = 2, 3, \cdots, L$; $k = 1, 2, \cdots, L$. Therefore, according to the knowledge of conditional probability [34] and Appendix C, the mean and variance of the latent-variable filter distribution $\mathbf{z}\_t, \mathbf{z}\_{t-1}, \cdots, \mathbf{z}\_{t-L+1}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t} \sim N(\mathbf{u}\_t, \mathbf{V}\_t)$ were calculated as shown in (A16).

$$\begin{aligned} \mathbf{u}\_{t} &= \begin{bmatrix} \mathbf{g}\_{t}^{1} \\ \vdots \\ \mathbf{g}\_{t}^{L} \end{bmatrix} + \begin{bmatrix} \mathbf{G}\_{t}^{1(L+1)} & \mathbf{G}\_{t}^{1(L+2)} \\ \vdots & \vdots \\ \mathbf{G}\_{t}^{L(L+1)} & \mathbf{G}\_{t}^{L(L+2)} \end{bmatrix} \begin{bmatrix} \mathbf{G}\_{t}^{(L+1)(L+1)} & \mathbf{G}\_{t}^{(L+1)(L+2)} \\ \mathbf{G}\_{t}^{(L+2)(L+1)} & \mathbf{G}\_{t}^{(L+2)(L+2)} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{x}\_{t} - \mathbf{g}\_{t}^{L+1} \\ \mathbf{y}\_{t} - \mathbf{g}\_{t}^{L+2} \end{bmatrix}, \\ \mathbf{V}\_{t} &= \begin{bmatrix} \mathbf{G}\_{t}^{11} & \cdots & \mathbf{G}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{G}\_{t}^{L1} & \cdots & \mathbf{G}\_{t}^{LL} \end{bmatrix} - \begin{bmatrix} \mathbf{G}\_{t}^{1(L+1)} & \mathbf{G}\_{t}^{1(L+2)} \\ \vdots & \vdots \\ \mathbf{G}\_{t}^{L(L+1)} & \mathbf{G}\_{t}^{L(L+2)} \end{bmatrix} \begin{bmatrix} \mathbf{G}\_{t}^{(L+1)(L+1)} & \mathbf{G}\_{t}^{(L+1)(L+2)} \\ \mathbf{G}\_{t}^{(L+2)(L+1)} & \mathbf{G}\_{t}^{(L+2)(L+2)} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{G}\_{t}^{1(L+1)} & \mathbf{G}\_{t}^{1(L+2)} \\ \vdots & \vdots \\ \mathbf{G}\_{t}^{L(L+1)} & \mathbf{G}\_{t}^{L(L+2)} \end{bmatrix}^{T}. \end{aligned} \tag{A16}$$
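The measurement update (A16) is Gaussian conditioning (Lemma A2 in Appendix C) applied to the joint (A13). A minimal numerical sketch follows, with illustrative block sizes and a linear solve in place of the explicit inverse; the joint covariance here is a synthetic positive-definite matrix, not one from the paper's case study.

```python
import numpy as np

def filter_update(g_lat, g_obs, G_ll, G_lo, G_oo, obs):
    """Condition the joint Gaussian of (stacked latent blocks, [x_t; y_t])
    on the observation obs = [x_t; y_t], giving u_t and V_t as in (A16)."""
    # gain = G_lo @ inv(G_oo), computed via a linear solve for stability
    gain = np.linalg.solve(G_oo.T, G_lo.T).T
    u_t = g_lat + gain @ (obs - g_obs)   # conditional mean
    V_t = G_ll - gain @ G_lo.T           # conditional covariance (Schur complement)
    return u_t, V_t

# Synthetic joint covariance, symmetric positive definite by construction.
rng = np.random.default_rng(1)
n_lat, n_obs = 4, 2                      # illustrative dimensions
S = rng.normal(size=(n_lat + n_obs, n_lat + n_obs))
S = S @ S.T + (n_lat + n_obs) * np.eye(n_lat + n_obs)
G_ll, G_lo, G_oo = S[:n_lat, :n_lat], S[:n_lat, n_lat:], S[n_lat:, n_lat:]
g_lat, g_obs = np.zeros(n_lat), np.zeros(n_obs)
obs = rng.normal(size=n_obs)

u_t, V_t = filter_update(g_lat, g_obs, G_ll, G_lo, G_oo, obs)
```

In (A16) the latent block gathers the $L$ sub-blocks $\mathbf{G}\_t^{k(L+1)}, \mathbf{G}\_t^{k(L+2)}$; the sketch treats them as one flattened cross-covariance block.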

In the Bayesian smoothing stage, the goal was to use the posterior probability $\mathbf{z}\_{t+1}, \mathbf{z}\_t, \cdots, \mathbf{z}\_{t-L+2}|\mathbf{x}\_{1:T}, \mathbf{y}\_{1:T} \sim N(\mathbf{m}\_{t+1}, \mathbf{M}\_{t+1})$ of the latent variables $[\mathbf{z}\_{t+1}, \mathbf{z}\_t, \cdots, \mathbf{z}\_{t-L+2}]$ given $\mathbf{x}\_{1:T}, \mathbf{y}\_{1:T}$ at time $t+1$ to calculate the posterior probability of the latent variables $[\mathbf{z}\_t, \mathbf{z}\_{t-1}, \cdots, \mathbf{z}\_{t-L+1}]$ given $\mathbf{x}\_{1:T}, \mathbf{y}\_{1:T}$ at time $t$, as shown in (A17).

$$\mathbf{z}\_{t}, \mathbf{z}\_{t-1}, \dots, \mathbf{z}\_{t-L+1} | \mathbf{x}\_{1:T}, \mathbf{y}\_{1:T} \sim N(\mathbf{m}\_{t}, \mathbf{M}\_{t}) = N\left( \begin{bmatrix} \mathbf{m}\_{t}^{1} \\ \vdots \\ \mathbf{m}\_{t}^{L} \end{bmatrix}, \begin{bmatrix} \mathbf{M}\_{t}^{11} & \cdots & \mathbf{M}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{M}\_{t}^{L1} & \cdots & \mathbf{M}\_{t}^{LL} \end{bmatrix} \right), \tag{A17}$$

where $0 \le t \le T-1$, with the initialization $\mathbf{m}\_T = \mathbf{u}\_T$, $\mathbf{M}\_T = \mathbf{V}\_T$. In order to calculate this distribution, the posterior distribution of the latent variables $[\mathbf{z}\_{t+1}, \mathbf{z}\_t, \cdots, \mathbf{z}\_{t-L+1}]$ given $\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}$ was first calculated, as shown in (A18).

$$\mathbf{z}\_{t+1}, \mathbf{z}\_{t}, \cdots, \mathbf{z}\_{t-L+1} | \mathbf{x}\_{1:t}, \mathbf{y}\_{1:t} \sim N(\mathbf{d}\_{t}, \mathbf{D}\_{t}) = N\left(\begin{bmatrix} \mathbf{d}\_{t}^{1} \\ \vdots \\ \mathbf{d}\_{t}^{L+1} \end{bmatrix}, \begin{bmatrix} \mathbf{D}\_{t}^{11} & \cdots & \mathbf{D}\_{t}^{1(L+1)} \\ \vdots & \ddots & \vdots \\ \mathbf{D}\_{t}^{(L+1)1} & \cdots & \mathbf{D}\_{t}^{(L+1)(L+1)} \end{bmatrix}\right). \tag{A18}$$

The parameters of (A18) are calculated as shown in (A19).

$$\begin{cases} \mathbf{d}\_{t}^{1} = E(\mathbf{z}\_{t+1}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}) = \mathbf{A}\mathbf{u}\_{t} \\ \mathbf{d}\_{t}^{i} = E(\mathbf{z}\_{t-i+2}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}) = \mathbf{u}\_{t}^{i-1} \\ \mathbf{D}\_{t}^{11} = \operatorname{cov}(\mathbf{z}\_{t+1}, \mathbf{z}\_{t+1}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}) = \mathbf{A}\mathbf{V}\_{t}\mathbf{A}^{T} + \Sigma\_{\mathbf{z}} \\ \mathbf{D}\_{t}^{1i} = \operatorname{cov}(\mathbf{z}\_{t+1}, \mathbf{z}\_{t-i+2}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}) = \mathbf{A}\begin{bmatrix} \mathbf{V}\_{t}^{1(i-1)} \\ \vdots \\ \mathbf{V}\_{t}^{L(i-1)} \end{bmatrix} \\ \mathbf{D}\_{t}^{ij} = \operatorname{cov}\left(\mathbf{z}\_{t-i+2}, \mathbf{z}\_{t-j+2}|\mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}\right) = \mathbf{V}\_{t}^{(i-1)(j-1)} \end{cases}, \tag{A19}$$

where $i = 2, 3, \cdots, L+1$; $j = 2, 3, \cdots, L+1$, and then the following distribution was calculated.

$$P(\mathbf{z}\_{t-L+1}|\mathbf{z}\_{t+1}, \mathbf{z}\_{t}, \cdots, \mathbf{z}\_{t-L+2}, \mathbf{x}\_{1:T}, \mathbf{y}\_{1:T}) = P(\mathbf{z}\_{t-L+1}|\mathbf{z}\_{t+1}, \mathbf{z}\_{t}, \cdots, \mathbf{z}\_{t-L+2}, \mathbf{x}\_{1:t}, \mathbf{y}\_{1:t}) = N(\mathbf{r}\_{t}, \mathbf{R}\_{t}). \tag{A20}$$

According to the knowledge of conditional probability [34] and Appendix C, its mean and variance are calculated as shown in (A21).

$$\begin{aligned} \mathbf{r}\_{t} &= \mathbf{d}\_{t}^{L+1} + \begin{bmatrix} \mathbf{D}\_{t}^{1(L+1)} \\ \vdots \\ \mathbf{D}\_{t}^{L(L+1)} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{D}\_{t}^{11} & \cdots & \mathbf{D}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{D}\_{t}^{L1} & \cdots & \mathbf{D}\_{t}^{LL} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{z}\_{t+1} - \mathbf{d}\_{t}^{1} \\ \vdots \\ \mathbf{z}\_{t-L+2} - \mathbf{d}\_{t}^{L} \end{bmatrix}, \\ \mathbf{R}\_{t} &= \mathbf{D}\_{t}^{(L+1)(L+1)} - \begin{bmatrix} \mathbf{D}\_{t}^{1(L+1)} \\ \vdots \\ \mathbf{D}\_{t}^{L(L+1)} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{D}\_{t}^{11} & \cdots & \mathbf{D}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{D}\_{t}^{L1} & \cdots & \mathbf{D}\_{t}^{LL} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{D}\_{t}^{1(L+1)} \\ \vdots \\ \mathbf{D}\_{t}^{L(L+1)} \end{bmatrix}. \end{aligned} \tag{A21}$$

The posterior probability of the latent variables $[\mathbf{z}\_{t+1}, \mathbf{z}\_t, \cdots, \mathbf{z}\_{t-L+1}]$ given $\mathbf{x}\_{1:T}, \mathbf{y}\_{1:T}$ was then obtained, as shown in (A22).

$$\begin{split} &P(\mathbf{z}\_{t+1},\mathbf{z}\_{t},\cdots,\mathbf{z}\_{t-L+1}|\mathbf{x}\_{1:T},\mathbf{y}\_{1:T}) \\ &= P(\mathbf{z}\_{t+1},\mathbf{z}\_{t},\cdots,\mathbf{z}\_{t-L+2}|\mathbf{x}\_{1:T},\mathbf{y}\_{1:T})P(\mathbf{z}\_{t-L+1}|\mathbf{z}\_{t+1},\mathbf{z}\_{t},\cdots,\mathbf{z}\_{t-L+2},\mathbf{x}\_{1:T},\mathbf{y}\_{1:T}) \\ &= N(\mathbf{h}\_{t},\mathbf{H}\_{t}) = N\left(\begin{bmatrix} \mathbf{h}\_{t}^{1} \\ \vdots \\ \mathbf{h}\_{t}^{L+1} \end{bmatrix}, \begin{bmatrix} \mathbf{H}\_{t}^{11} & \cdots & \mathbf{H}\_{t}^{1(L+1)} \\ \vdots & \ddots & \vdots \\ \mathbf{H}\_{t}^{(L+1)1} & \cdots & \mathbf{H}\_{t}^{(L+1)(L+1)} \end{bmatrix}\right). \end{split} \tag{A22}$$

The mean and variance of the distribution were calculated as shown in (A23).

$$\begin{aligned} \mathbf{h}\_{t} &= \begin{bmatrix} \mathbf{m}\_{t+1} \\ \mathbf{d}\_{t}^{L+1} + \mathbf{K}\begin{bmatrix} \mathbf{m}\_{t+1}^{1} - \mathbf{d}\_{t}^{1} \\ \vdots \\ \mathbf{m}\_{t+1}^{L} - \mathbf{d}\_{t}^{L} \end{bmatrix} \end{bmatrix}, \\ \mathbf{H}\_{t} &= \begin{bmatrix} \mathbf{M}\_{t+1} & \mathbf{M}\_{t+1}\mathbf{K}^{T} \\ \mathbf{K}\mathbf{M}\_{t+1} & \mathbf{K}\mathbf{M}\_{t+1}\mathbf{K}^{T} + \mathbf{R}\_{t} \end{bmatrix}, \quad \text{where } \mathbf{K} = \begin{bmatrix} \mathbf{D}\_{t}^{1(L+1)} \\ \vdots \\ \mathbf{D}\_{t}^{L(L+1)} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{D}\_{t}^{11} & \cdots & \mathbf{D}\_{t}^{1L} \\ \vdots & \ddots & \vdots \\ \mathbf{D}\_{t}^{L1} & \cdots & \mathbf{D}\_{t}^{LL} \end{bmatrix}^{-1}. \end{aligned} \tag{A23}$$

According to the Bayesian smoothing rule [31], the smooth distribution of the latent variable was obtained, as shown in (A24).

$$\mathbf{z}\_t, \mathbf{z}\_{t-1}, \dots, \mathbf{z}\_{t-L+1} | \mathbf{x}\_{1:T}, \mathbf{y}\_{1:T} \sim N(\mathbf{m}\_t, \mathbf{M}\_t). \tag{A24}$$

The calculation of its mean and variance is shown in (A25).

$$\mathbf{m}\_{t} = \begin{bmatrix} \mathbf{h}\_{t}^{2} \\ \vdots \\ \mathbf{h}\_{t}^{L+1} \end{bmatrix}, \quad \mathbf{M}\_{t} = \begin{bmatrix} \mathbf{H}\_{t}^{22} & \cdots & \mathbf{H}\_{t}^{2(L+1)} \\ \vdots & \ddots & \vdots \\ \mathbf{H}\_{t}^{(L+1)2} & \cdots & \mathbf{H}\_{t}^{(L+1)(L+1)} \end{bmatrix}. \tag{A25}$$
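One backward pass of (A21), (A23) and (A25) can be sketched as follows. All covariance blocks below are synthetic illustrative values (not quantities from the paper's case study), and the stacked vectors are handled as flat arrays rather than per-lag sub-blocks.

```python
import numpy as np

def smooth_step(m_next, M_next, d_a, d_b, D_aa, D_ab, D_bb, q):
    """One backward smoothing pass: from the smoothed N(m_{t+1}, M_{t+1}) of
    a = [z_{t+1}, ..., z_{t-L+2}], append b = z_{t-L+1} via (A21)/(A23) and
    drop z_{t+1} (the leading q entries) via (A25)."""
    K = np.linalg.solve(D_aa.T, D_ab).T                    # (A21) gain D_ab^T D_aa^{-1}
    R = D_bb - K @ D_ab                                    # (A21) residual covariance
    h = np.concatenate([m_next, d_b + K @ (m_next - d_a)])  # (A23) joint mean
    H = np.block([[M_next, M_next @ K.T],
                  [K @ M_next, K @ M_next @ K.T + R]])      # (A23) joint covariance
    return h[q:], H[q:, q:]                                # (A25) m_t, M_t

# Illustrative dimensions: latent size q = 2 with L = 3 lags.
rng = np.random.default_rng(2)
q, L = 2, 3
n = L * q
W = rng.normal(size=(n + q, n + q))
D_full = W @ W.T + (n + q) * np.eye(n + q)   # SPD stand-in for the covariance of (A18)
D_aa, D_ab, D_bb = D_full[:n, :n], D_full[:n, n:], D_full[n:, n:]
d_a, d_b = rng.normal(size=n), rng.normal(size=q)
Mh = rng.normal(size=(n, n))
M_next = Mh @ Mh.T + n * np.eye(n)           # smoothed covariance at t+1
m_next = rng.normal(size=n)

m_t, M_t = smooth_step(m_next, M_next, d_a, d_b, D_aa, D_ab, D_bb, q)
```

Note that the blocks of $[\mathbf{z}\_t, \cdots, \mathbf{z}\_{t-L+2}]$ are carried over unchanged from $\mathbf{m}\_{t+1}, \mathbf{M}\_{t+1}$, as the recursion requires.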

### **Appendix C. Properties of Gaussian Distribution**

**Definition A1.** *(Gaussian distribution) A random variable* $\mathbf{x} \in R^n$ *has a Gaussian distribution with mean* $\mathbf{m} \in R^n$ *and covariance* $\mathbf{P} \in R^{n \times n}$ *if its probability density has the form*

$$N(\mathbf{x}|\mathbf{m},\mathbf{P}) = \frac{1}{(2\pi)^{n/2}|\mathbf{P}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{m})^{T}\mathbf{P}^{-1}(\mathbf{x}-\mathbf{m})\right),\tag{A26}$$

*where* |**P**| *is the determinant of the matrix* **P**.

**Lemma A1.** *(Joint distribution of Gaussian variables) If random variables* $\mathbf{x} \in R^n$ *and* $\mathbf{y} \in R^m$ *have the Gaussian probability distributions*

$$\begin{cases} \mathbf{x} \sim \mathbf{N}(\mathbf{m}, \mathbf{P}),\\ \mathbf{y}|\mathbf{x} \sim \mathbf{N}(\mathbf{H}\mathbf{x} + \mathbf{u}, \mathbf{R}). \end{cases} \tag{A27}$$

*then the joint distribution of* $\mathbf{x}, \mathbf{y}$ *and the marginal distribution of* $\mathbf{y}$ *are given as (A28).*

$$\begin{aligned} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} &\sim N\left( \begin{pmatrix} \mathbf{m} \\ \mathbf{H}\mathbf{m} + \mathbf{u} \end{pmatrix}, \begin{pmatrix} \mathbf{P} & \mathbf{P}\mathbf{H}^{T} \\ \mathbf{H}\mathbf{P} & \mathbf{H}\mathbf{P}\mathbf{H}^{T} + \mathbf{R} \end{pmatrix} \right), \\ \mathbf{y} &\sim N(\mathbf{H}\mathbf{m} + \mathbf{u}, \mathbf{H}\mathbf{P}\mathbf{H}^{T} + \mathbf{R}). \end{aligned} \tag{A28}$$

**Lemma A2.** *(Conditional distribution of Gaussian variables) If the random variables* **x** *and* **y** *have the joint Gaussian probability distribution.*

$$\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim N\left( \begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix}, \begin{pmatrix} \mathbf{A} & \mathbf{C} \\ \mathbf{C}^T & \mathbf{B} \end{pmatrix} \right), \tag{A29}$$

*then the marginal and conditional distributions of* **x** *and* **y** *are given as follows:*

$$\begin{aligned} &\mathbf{x} \sim \mathbf{N}(\mathbf{a}, \mathbf{A}), \\ &\mathbf{y} \sim \mathbf{N}(\mathbf{b}, \mathbf{B}), \\ &\mathbf{x}|\mathbf{y} \sim \mathbf{N}(\mathbf{a} + \mathbf{C}\mathbf{B}^{-1}(\mathbf{y} - \mathbf{b}), \mathbf{A} - \mathbf{C}\mathbf{B}^{-1}\mathbf{C}^{T}), \\ &\mathbf{y}|\mathbf{x} \sim \mathbf{N}(\mathbf{b} + \mathbf{C}^{T}\mathbf{A}^{-1}(\mathbf{x} - \mathbf{a}), \mathbf{B} - \mathbf{C}^{T}\mathbf{A}^{-1}\mathbf{C}). \end{aligned} \tag{A30}$$
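As a numerical sanity check of Lemmas A1 and A2, the following sketch builds the joint (A28) from illustrative values of $\mathbf{m}, \mathbf{P}, \mathbf{H}, \mathbf{u}, \mathbf{R}$ and conditions it according to (A30); all numbers are arbitrary.

```python
import numpy as np

def conditional(a, b, A, C, B, y):
    """Lemma A2: x|y ~ N(a + C B^{-1} (y - b), A - C B^{-1} C^T)."""
    W = np.linalg.solve(B.T, C.T).T          # C @ inv(B) via a linear solve
    return a + W @ (y - b), A - W @ C.T

# Joint of (x, y) built per Lemma A1 from x ~ N(m, P), y|x ~ N(Hx + u, R).
rng = np.random.default_rng(3)
nx, ny = 3, 2
m = rng.normal(size=nx)
Ph = rng.normal(size=(nx, nx))
P = Ph @ Ph.T + nx * np.eye(nx)              # SPD prior covariance
H = rng.normal(size=(ny, nx))
u = rng.normal(size=ny)
Rh = rng.normal(size=(ny, ny))
R = Rh @ Rh.T + ny * np.eye(ny)              # SPD observation covariance

mean_y = H @ m + u                           # (A28) marginal mean of y
S = H @ P @ H.T + R                          # (A28) marginal covariance of y
y_obs = rng.normal(size=ny)

# Condition per (A30); C = P H^T is the cross-covariance block in (A28).
mu_post, P_post = conditional(m, mean_y, P, P @ H.T, S, y_obs)
```

Substituting the blocks of (A28) into (A30) yields exactly the Kalman measurement update, which is what the filtering step (A16) exploits.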
