**1. Introduction**

Process monitoring plays an essential role in intelligent manufacturing [1,2]. Statistical process control (SPC) is a quality control technique that uses statistical methods to monitor and control processes. The aim of SPC is to ensure that the process runs efficiently, producing more conforming products while reducing waste. Shewhart control charts are a key SPC tool for determining whether a process is in control: if an observation falls within the upper and lower control limits, the process is in control; otherwise, it is out of control. Shewhart charts typically assume that the data collected from the process are independent. However, this assumption fails in a variety of processes. For example, consecutive measurements of chemical and pharmaceutical processes or product characteristics are often highly autocorrelated, as are the chronological measurements of each characteristic on each unit in automated test and inspection procedures. Numerous studies have shown that conventional charts perform poorly, producing too many false alarms, when the observations exhibit even a small amount of autocorrelation over time [3–9]. Clearly, better approaches are needed. In the following paragraphs, three common ways to address autocorrelation are introduced.

The first approach to solving an autocorrelation problem is simply to sample from the observation data stream at a lower frequency [10]. This seems an easy solution, but it has shortcomings. Concretely, it makes inefficient use of the available data: if every tenth observation is sampled, approximately 90% of the information is discarded. In addition, since only every tenth datum is used, this approach may delay the detection of a process shift.

The second approach is to re-estimate the true process variance in order to revise the upper and lower control limits. See, for example, [9,11–19].


The third approach is to use residual charts. The statistics plotted in these charts are residuals, calculated by subtracting predicted values from observed values. In the implementation of residual charts, the key step is choosing a reasonable prediction model to obtain the predicted values. Autoregressive integrated moving average (ARIMA) models are most often used to model autocorrelated data. See, for example, [3,8,20–26]. In addition, multistage or multivariate autocorrelated processes are mostly studied by using Hotelling *T*<sup>2</sup> control charts, such as in [27–30]. Some exceptions include Pan et al., who proposed an overall run length (ORL) to replace *T*<sup>2</sup> charts [31], and S. Yang and C. Yang, who used a residual chart and cause-selecting control chart [32].
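As a minimal illustration of this approach (a sketch using an AR(1) model fitted by least squares rather than a full ARIMA model; all values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated in-control AR(1) data: o_t = 0.5 * o_{t-1} + eps_t.
phi_true = 0.5
obs = np.zeros(500)
for t in range(1, len(obs)):
    obs[t] = phi_true * obs[t - 1] + rng.normal()

# Least-squares estimate of the AR(1) coefficient from (o_{t-1}, o_t) pairs.
x, y = obs[:-1], obs[1:]
phi_hat = np.sum(x * y) / np.sum(x * x)

# Residuals = observed - predicted; approximately i.i.d. white noise if the
# model fits, so a conventional Shewhart chart on them behaves as designed.
residuals = y - phi_hat * x
print(f"phi_hat = {phi_hat:.3f}, residual std = {residuals.std():.3f}")
```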

With the rapid development of artificial intelligence, machine-learning methods are becoming increasingly popular in SPC for autocorrelated observations [33–38]. Most of the related literature focuses on neural-network methods [39–42]. Some successful applications have also been reported, for example, the failure diagnosis of wind turbines [43] and the prediction of the remaining useful life (RUL) of bearings [44]. Another machine-learning approach proposed in SPC is the hidden Markov model (HMM). Lee et al. proposed a modified HMM, combined with a Hotelling multivariate control chart, to perform adaptive fault diagnosis [45]. HMMs trained on autocorrelated data have been employed to forecast observation values for residual charts in process monitoring [46]. However, although HMMs have been applied to autocorrelated processes, the model itself is at odds with autocorrelation: a key assumption in conventional HMMs is that the observations are independent of each other. It is therefore worth developing a modified HMM that accounts for observation autocorrelation within the model itself, and examining whether it outperforms a traditional HMM.

To monitor autocorrelated processes well, a residual chart based on an improved hidden Markov model (IHMM) that accounts for autocorrelated observations is therefore developed. Due to the autocorrelation, the conventional expectation maximization (EM) algorithm for HMMs is not appropriate, so a new EM algorithm is developed. The Shewhart residual chart is employed for quality shift detection in conjunction with the IHMM, with the residual defined as the deviation between the value predicted by the IHMM and the current observed value. Through the residual chart, we can see whether the process is in control. Each time the chart raises an alarm, one run length is obtained; with sufficient samples, the average run lengths (ARLs) can then be calculated. The ARL serves as the comparison index for the different models: the IHMM, the HMM, and the AR model.
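The chart logic described above can be sketched as follows (a schematic reading with illustrative function names; not the authors' code):

```python
import numpy as np

def shewhart_limits(in_control_residuals, k=3.0):
    """CL and k-sigma control limits estimated from in-control residuals."""
    cl = in_control_residuals.mean()
    sigma = in_control_residuals.std()
    return cl, cl + k * sigma, cl - k * sigma

def run_length(residuals, ucl, lcl):
    """Index (1-based) of the first residual outside the limits, i.e., one
    run length; returns len(residuals) if the chart never signals."""
    outside = np.flatnonzero((residuals > ucl) | (residuals < lcl))
    return int(outside[0]) + 1 if outside.size else len(residuals)

def average_run_length(residual_sequences, ucl, lcl):
    """ARL estimate: average run length over many monitored sequences.
    `residual_sequences` is a hypothetical list of residual arrays, one per
    monitored realization of the process."""
    return np.mean([run_length(r, ucl, lcl) for r in residual_sequences])
```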

The rest of this paper is organized as follows: Section 2 introduces the development of the IHMM model and its algorithm. In addition, the comparison of the prediction performances of different approaches is presented in Section 2. In Section 3, residual charts are introduced. In Section 4, numerical examples and interesting results are presented. The conclusions and possible areas for future research are given in Section 5.

#### **2. Model Development**

#### *2.1. Hidden Markov Models*

Denote a Markov chain with a finite state set $\{s_1, s_2, \cdots, s_N\}$ by $\{S_n, n = 1, 2, \cdots\}$. Let $a_{ij}$ be the probability that the Markov chain enters state $s_j$ from state $s_i$ ($1 \le i, j \le N$), and let $\pi_i = P\{S_1 = s_i\}$, $i = 1, 2, \cdots, N$ be the initial state probabilities. Denote a finite set of signals by $\zeta$, and suppose a signal from $\zeta$ is sent each time the Markov chain enters a state. Suppose that, when the Markov chain enters state $s_j$, the signal sent is $o_r$ with probability $p(o_r|s_j)$, independently of previous states and signals, where $\sum_{o_r \in \zeta} p(o_r|s_j) = 1$. That is, if $O_n$ represents the $n$th signal, then it can be written as

$$P\left\{O_1 = o_r \middle| S_1 = s_j\right\} = p(o_r|s_j),\tag{1}$$

$$P\left\{O_n = o_r \middle| S_1, O_1, \cdots, S_{n-1}, O_{n-1}, S_n = s_j\right\} = p(o_r|s_j).\tag{2}$$

Such a model, in which the signal sequence *O*1,*O*2, ··· can be observed while the underlying Markov chain state sequence *S*1, *S*2, ··· cannot be observed, is called a hidden Markov model [47].

Normally, an HMM contains the following elements [48]:

• A matrix of state transition probabilities, $A = \{a_{ij}\}$, where

$$a_{ij} = P\{S_{t+1} = s_j | S_t = s_i\}, \; 1 \le i, j \le N, \; \sum_j a_{ij} = 1. \tag{3}$$

• A distribution of initial state probabilities, $\pi = \{\pi_i\}$, where

$$\pi_i = P\{S_1 = s_i\}, \; i = 1, 2, \cdots, N, \; \sum_i \pi_i = 1. \tag{4}$$

• A conditional probability distribution of the observations given $S_t = s_i$, $B = \{b_i(q_k)\}$, where

$$b_i(q_k) = P\{o_t = q_k | S_t = s_i\}, \; 1 \le i \le N, \; 1 \le t \le T, \; \sum_{q_k} b_i(q_k) = 1. \tag{5}$$

In general, an HMM contains these three elements, denoted by $\lambda = \{A, B, \pi\}$.
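As a concrete illustration (a toy sketch with made-up numbers, not from the paper), the three elements of $\lambda$ for a two-state, three-signal HMM can be stored and sampled as follows:

```python
import numpy as np

rng = np.random.default_rng(1)

# lambda = {A, B, pi} for a toy 2-state HMM with 3 discrete signals.
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])          # a_ij = P(S_{t+1} = s_j | S_t = s_i)
B  = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]])     # b_i(q_k) = P(o_t = q_k | S_t = s_i)
pi = np.array([0.6, 0.4])            # pi_i = P(S_1 = s_i)

def simulate(T):
    """Draw a hidden state path and the observable signal sequence."""
    s = rng.choice(2, p=pi)
    states, signals = [], []
    for _ in range(T):
        states.append(s)
        signals.append(rng.choice(3, p=B[s]))  # signal emitted in state s
        s = rng.choice(2, p=A[s])              # hidden Markov transition
    return states, signals
```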

#### *2.2. The Improved HMM with Autocorrelated Observations*

#### 2.2.1. The Selection for the Order of Autocorrelation

Different from traditional HMMs, in which the current observation is assumed to be independent of previous states and observations, autocorrelated observations are considered here. The observations are assumed to follow a Gaussian distribution, and the current observation is assumed to depend not only on the current hidden state but also on the previous observations.

Autocorrelated data from production may exhibit higher-order autocorrelation, and it is not cost-effective to judge the exact order using engineering experience and professional knowledge alone. Therefore, it is desirable to find a suitable order that keeps the implementation of the IHMM feasible, efficient, and cost-effective.

The numerical experiment on order selection for autocorrelated data in SPC applications is designed as follows:

Step 1. Generate stationary autocorrelated data by a *d*th-order autoregressive model, AR(*d*), *d* ≥ 2;

Step 2. Use AR(*p*), *p* = *d*, *d* − 1, ... , 1 models to fit the data, respectively, and obtain the parameters that need to be estimated by least-squares estimation;

Step 3. Generate 2000 stationary data sequences, each of length 2000, under processes with a mean shift of magnitude *δ*;

Step 4. Predict each sequence by using AR(*p*), *p* = *d*, *d* − 1, ... , 1 models, respectively, and calculate their residuals;

Step 5. Determine the central lines (CLs), upper control limits (UCLs) and lower control limits (LCLs) of *p* residual charts;

Step 6. Generate stationary autocorrelated data by a *d*th-order autoregressive model;

Step 7. Calculate the average run lengths (ARLs) of the *p* residual charts.
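A condensed sketch of Steps 1–7 is given below (illustrative AR coefficients and only 200 replications for speed; the study itself uses the settings described in the next paragraph):

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_ar(coeffs, T, shift=0.0):
    """Stationary AR(d) series with optional mean shift (Steps 1, 3, 6)."""
    d = len(coeffs)
    x = np.zeros(T + 100)                          # 100-sample burn-in
    for t in range(d, len(x)):
        x[t] = np.dot(coeffs, x[t - d:t][::-1]) + rng.normal()
    return x[100:] + shift

def fit_ar(x, p):
    """Least-squares AR(p) coefficient estimates (Step 2)."""
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(X, x[p:], rcond=None)[0]

def residuals(x, coeffs):
    """One-step-ahead prediction residuals (Step 4)."""
    p = len(coeffs)
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    return x[p:] - X @ coeffs

# Steps 3-7 in miniature: data from an AR(2) process, fitted by AR(2) and
# AR(1); compare in-control ARLs of the three-sigma residual charts.
true_coeffs = [0.5, 0.3]
train = gen_ar(true_coeffs, 2000)
for p in (2, 1):
    a = fit_ar(train, p)
    r = residuals(train, a)
    cl, s = r.mean(), r.std()                      # Step 5: chart limits
    run_lengths = []
    for _ in range(200):
        rr = residuals(gen_ar(true_coeffs, 2000), a)
        alarms = np.flatnonzero(np.abs(rr - cl) > 3 * s)
        run_lengths.append(alarms[0] + 1 if alarms.size else len(rr))
    print(f"AR({p}) fit: in-control ARL ~ {np.mean(run_lengths):.0f}")
```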

In this study, twelve AR(2) models and eight AR(3) models are used to generate data. The control limit coefficient of the residual charts is set to 3. The mean shift magnitudes are set to 0, 1.5, and 3, respectively.

When data are generated by AR(2) models for an in-control process, the ARL of the residual charts is 371.3803 if the data are fitted by AR(2) models, and the ARL is 361.1688 if the data are fitted by AR(1) models. Thus, the probability of the type I error is increased by 2.83%. Similarly, when data are generated by AR(3) models, the ARL of the residual charts is 371.8250 if the data are fitted by AR(3) models, the ARL is 358.5401 if the data are fitted by AR(2) models, and the ARL is 345.8113 if the data are fitted by AR(1) models. Thus, the probabilities of the type I error are increased by 3.7% and 7.51%, respectively.
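These percentages follow from the fact that the in-control type I error rate $\alpha$ is approximately the reciprocal of the in-control ARL. For the AR(2) data fitted by AR(1) models, for instance,

$$\frac{\alpha_{\mathrm{AR}(1)}}{\alpha_{\mathrm{AR}(2)}} = \frac{1/361.1688}{1/371.3803} = \frac{371.3803}{361.1688} \approx 1.0283,$$

i.e., an increase of about 2.83%; the 3.7% and 7.51% figures for the AR(3) case follow in the same way.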

The ARLs of different situations are shown in Figure 1, in which the numbers on the x-axis represent the combinations of autocorrelation coefficients.

**Figure 1.** The ARL results of the order selection experiment.

From Figure 1, it can be seen that AR(1) models generally outperformed the AR(*d*), *d* ≥ 2 models in shift detection, regardless of the true autocorrelation order. Although AR(1) models lead to a slight increase in the type I error, this appears insignificant compared with their good performance in detecting quality shifts. Therefore, only first-order autocorrelation is considered in this study. As a result, there is no need to judge the order of the autocorrelation, which saves modeling cost.

#### 2.2.2. The Development of the IHMM

According to the analysis in Section 2.2.1, the construction of the IHMM, in which the current observation is related to its previous observation, is shown in Figure 2.

**Figure 2.** The construction of the IHMM.

In Figure 2, arrows pointing to the right indicate the correlation between neighboring states or observations. It is intuitive that the IHMM will become a traditional HMM if there is no correlation between neighboring observations.

Let $o_i(t)$ be the observation value at time $t$ given state $s_i$; $o_i(t)$ can be fitted with the following function:

$$o_i(t) = \varsigma_i + c_i o_{t-1} + \varepsilon_i, \; 1 \le i \le N, \; 1 \le t \le T,\tag{6}$$

where $c_i$ is the first-order autocorrelation coefficient, $\varsigma_i$ is the constant term, and $\varepsilon_i$ is white noise following a normal distribution with mean zero and variance $\sigma_i^2$.

Let $\mu_i(t)$ be the mean of $o_i(t)$ given state $s_i$, and let $x_t = (1, o_{t-1})^{\top}$; then, Equation (6) can be written as:

$$\mu_i(t) = C_i x_t,\tag{7}$$

where $C_i$ is a $1 \times 2$ matrix consisting of $\varsigma_i$ and $c_i$.

Hence, we conclude that the states form a Markov process and that the conditional observation given state $s_i$ follows a Gaussian distribution with mean $C_i x_t$ and variance $\sigma_i^2$, i.e.,

$$P\{S_{t+1} | S_1, \cdots, S_t, o_1, \cdots, o_t\} = p(S_{t+1}|S_t),\tag{8}$$

$$b_i(o_t|o_{t-1}) = P(o_t|o_{t-1}, S_t = s_i) = \frac{1}{\sqrt{2\pi}\sigma_i} \exp\left(-\frac{(o_t - C_i x_t)^2}{2\sigma_i^2}\right). \tag{9}$$

If the current observation is not related to its previous observation, $\mu_i(t)$ reduces to the constant $\varsigma_i$, and the IHMM becomes a traditional HMM.
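To make the construction concrete, the following minimal sketch (all parameter values are illustrative assumptions) simulates observations from a two-state IHMM with the state-dependent AR(1) emissions of Equation (6):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative two-state IHMM: each state s_i carries its own AR(1) emission
# o_t = varsigma_i + c_i * o_{t-1} + eps_i, as in Equation (6).
A     = np.array([[0.95, 0.05],
                  [0.10, 0.90]])     # state transition probabilities a_ij
pi    = np.array([1.0, 0.0])         # initial state probabilities pi_i
const = np.array([0.0, 2.0])         # constant terms varsigma_i
c     = np.array([0.5, 0.5])         # first-order autocorrelation c_i
sd    = np.array([1.0, 1.0])         # noise standard deviations sigma_i

def simulate_ihmm(T):
    """Generate T observations from the IHMM of Figure 2."""
    s = rng.choice(2, p=pi)
    o = np.zeros(T)
    o[0] = const[s] + sd[s] * rng.normal()   # no o_{-1}: start from noise
    for t in range(1, T):
        s = rng.choice(2, p=A[s])            # hidden Markov chain step
        o[t] = const[s] + c[s] * o[t - 1] + sd[s] * rng.normal()
    return o
```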

#### *2.3. Parameter Estimation*

The aim of using an IHMM is to forecast observation values. First, we estimate the parameters to obtain an optimal $\hat{\lambda}$ by maximizing the probability $P(o|\lambda)$.

The EM algorithm is a popular method for estimating the parameters of HMMs. However, since autocorrelation between observations is considered, the traditional algorithm cannot be used directly. We redefine the parameters as $\lambda = \{A, C, \sigma^2, \pi\}$, where $C = \{C_i\}$ and $\sigma^2 = \{\sigma_i^2\}$, and the definitions of $C_i$ and $\sigma_i^2$ are provided in Section 2.2.2. The flowchart of the improved EM algorithm for estimating the IHMM parameters is shown in Figure 3.

**Figure 3.** The flowchart of the improved EM algorithm.

First, we introduce two probabilities: the forward probability, $F_t(i)$, and the backward probability, $B_t(i)$, defined as:

$$F_t(i) = P(o_1, \cdots, o_t, S_t = s_i | \lambda),\tag{10}$$

$$B_t(i) = P(o_{t+1}, \cdots, o_T | S_t = s_i, \lambda). \tag{11}$$

The calculation of these two probabilities is similar to that for traditional HMMs. The only slight difference is that the initial values are defined by $F_{d+1}(i) = \pi_i b_i(o_{d+1} | o_1, \cdots, o_d)$ and $B_T(i) = 1$, respectively.

Based on the forward and backward probabilities, some intermediate probabilities are computed. Given $\lambda$ and $o$, denote by $\gamma_t(i)$ the probability that the process is in state $s_i$ at time $t$, and by $\xi_t(i,j)$ the probability that the process is in state $s_i$ at time $t$ and in state $s_j$ at time $t+1$. These can be written as Equations (12) and (13); please refer to [48] for the details of the derivations.

$$\gamma_t(i) = \frac{F_t(i)B_t(i)}{\sum_{j=1}^{N} F_t(j)B_t(j)},\tag{12}$$

$$\xi_t(i,j) = \frac{F_t(i)\, a_{ij}\, b_j(o_{t+1}|o_t)\, B_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} F_t(i)\, a_{ij}\, b_j(o_{t+1}|o_t)\, B_{t+1}(j)}.\tag{13}$$
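A minimal NumPy sketch of Equations (9)–(12) is given below, treating the first observation as the conditioning value (variable names are illustrative; $\xi_t(i,j)$ of Equation (13) follows analogously from $F$, $B$, $a_{ij}$, and the conditional emission density):

```python
import numpy as np

def emission(o_t, o_prev, const, c, sd):
    """b_i(o_t | o_{t-1}) of Equation (9), vectorized over the N states."""
    mu = const + c * o_prev
    return np.exp(-(o_t - mu) ** 2 / (2 * sd ** 2)) / (np.sqrt(2 * np.pi) * sd)

def forward_backward(o, A, pi, const, c, sd):
    """Unscaled F_t(i), B_t(i), and gamma_t(i) of Equations (10)-(12) for
    d = 1: o[0] only conditions the first emission, so the recursion starts
    with F_2(i) = pi_i * b_i(o_2 | o_1). In practice the quantities should
    be rescaled at each t to avoid floating-point underflow."""
    T, N = len(o), len(pi)
    F = np.zeros((T, N))
    B = np.ones((T, N))
    F[1] = pi * emission(o[1], o[0], const, c, sd)
    for t in range(2, T):
        F[t] = (F[t - 1] @ A) * emission(o[t], o[t - 1], const, c, sd)
    for t in range(T - 2, 0, -1):
        B[t] = A @ (emission(o[t + 1], o[t], const, c, sd) * B[t + 1])
    gamma = F * B
    gamma[1:] /= gamma[1:].sum(axis=1, keepdims=True)   # Equation (12)
    return F, B, gamma
```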

Next, we develop an improved EM algorithm for the IHMM, step by step.

Step 1. Determine the log-likelihood function for complete data.

The complete data are $(\mathbf{o}, \mathbf{s}) = (o_1, \cdots, o_T, s_1, \cdots, s_T)$, and the corresponding log-likelihood function is $\log P(\mathbf{o}, \mathbf{s}|\lambda)$.

Step 2. E-step: determine *Q* functions.

$$Q\_1\left(\lambda,\lambda^{(n)}\right) = E\_s\left[\log P(\mathbf{o},\mathbf{s}|\lambda) \Big| \mathbf{o},\lambda^{(n)}\right] = \sum\_s \log P(\mathbf{o},\mathbf{s}|\lambda) P(\mathbf{o},\mathbf{s}|\lambda^{(n)}),\tag{14}$$

where $\lambda$ is the parameter that will maximize the $Q_1$ function, while $\lambda^{(n)}$ is the current parameter value.

With $P(\mathbf{o},\mathbf{s}|\lambda) = \pi_{s_1} b_{s_1}(o_1) a_{s_1 s_2} b_{s_2}(o_2) \cdots a_{s_{T-1} s_T} b_{s_T}(o_T)$, $Q_1\left(\lambda, \lambda^{(n)}\right)$ can then be written as

$$\begin{split} Q_1\left(\lambda,\lambda^{(n)}\right) &= \sum_{\mathbf{s}} \log \pi_{s_1}\, P\left(\mathbf{o},\mathbf{s} \middle| \lambda^{(n)}\right) \\ &+ \sum_{\mathbf{s}} \left(\sum_{t=1}^{T-1} \log a_{s_t s_{t+1}}\right) P\left(\mathbf{o},\mathbf{s} \middle| \lambda^{(n)}\right) \\ &+ \sum_{\mathbf{s}} \left(\sum_{t=1}^{T} \log b_{s_t}(o_t)\right) P\left(\mathbf{o},\mathbf{s} \middle| \lambda^{(n)}\right). \end{split} \tag{15}$$

For the re-estimation of $b_i(o_t|o_{t-1})$, one more auxiliary function, $Q_2$, is proposed by taking the conditional expectation of the log-likelihood of the observation sequence:

$$Q_2\left(\lambda, \lambda^{(n)}\right) = \sum_{i=1}^{N} \sum_{t=1}^{T} \gamma_t(i) \left( \ln\frac{1}{\sqrt{2\pi}} + \ln \frac{1}{\sigma_i} - \frac{(o_t - C_i x_t)^2}{2\sigma_i^2} \right). \tag{16}$$

Step 3. M-step: re-estimation.

Re-estimate the $\lambda$ that maximizes $Q_1\left(\lambda, \lambda^{(n)}\right)$; that is,

$$\lambda^{(n+1)} = \underset{\lambda}{\operatorname{argmax}}\, Q_1\left(\lambda, \lambda^{(n)}\right). \tag{17}$$

The state transition probabilities are derived as:

$$a_{ij}^{(n+1)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.\tag{18}$$

The initial state probabilities are derived as:

$$\pi_i^{(n+1)} = \gamma_1(i). \tag{19}$$

By re-estimating the $\lambda$ that maximizes $Q_2\left(\lambda, \lambda^{(n)}\right)$, the updates $c_i^{(n+1)}$, $\varsigma_i^{(n+1)}$, and $\sigma_i^{2(n+1)}$ can be written as:

$$c_i^{(n+1)} = \frac{\sum_{t=1}^{T} \gamma_t(i)\, o_{t-1} \left( o_t - \frac{\sum_{t=1}^{T} \gamma_t(i) o_t}{\sum_{t=1}^{T} \gamma_t(i)} \right)}{\sum_{t=1}^{T} \gamma_t(i)\, o_{t-1} \left( o_{t-1} - \frac{\sum_{t=1}^{T} \gamma_t(i) o_{t-1}}{\sum_{t=1}^{T} \gamma_t(i)} \right)},\tag{20}$$

$$\varsigma\_{i}^{(n+1)} = \frac{\sum\_{t=1}^{T} \gamma\_{t}(i)o\_{t} - \sum\_{t=1}^{T} \gamma\_{t}(i)o\_{t-1}c\_{i}^{(n+1)}}{\sum\_{t=1}^{T} \gamma\_{t}(i)},\tag{21}$$

$$\sigma_i^{2(n+1)} = \frac{\sum_{t=1}^T \gamma_t(i) \left(o_t - c_i^{(n+1)} o_{t-1} - \varsigma_i^{(n+1)}\right)^2}{\sum_{t=1}^T \gamma_t(i)}.\tag{22}$$
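As a consistency check on Equation (22) (a sketch of the standard weighted maximum-likelihood step, not reproduced from the paper), setting the derivative of $Q_2$ with respect to $\sigma_i$ to zero gives

$$\frac{\partial Q_2}{\partial \sigma_i} = \sum_{t=1}^{T} \gamma_t(i)\left(-\frac{1}{\sigma_i} + \frac{(o_t - C_i x_t)^2}{\sigma_i^3}\right) = 0 \;\Longrightarrow\; \sigma_i^2 = \frac{\sum_{t=1}^{T} \gamma_t(i)\,(o_t - C_i x_t)^2}{\sum_{t=1}^{T} \gamma_t(i)},$$

which, with $C_i x_t = c_i o_{t-1} + \varsigma_i$, is exactly Equation (22); Equations (20) and (21) follow similarly from the derivatives with respect to $c_i$ and $\varsigma_i$.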

Repeat Step 2 and Step 3 until the log-likelihood function converges.
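For concreteness, the M-step can be read as a set of weighted least-squares updates, sketched below under the same indexing conventions as the forward–backward sketch above (an illustrative reading of Equations (18)–(22), not the authors' implementation):

```python
import numpy as np

def m_step(o, gamma, xi):
    """One M-step of the improved EM algorithm, Equations (18)-(22).

    Conventions match the forward-backward sketch: o holds o_1..o_T with
    o[0] conditioning the first emission (row 0 of gamma unused), and
    xi[t] holds the pairwise posterior xi_t(i, j) for the valid t.
    """
    A_new = xi[1:-1].sum(axis=0) / gamma[1:-1].sum(axis=0)[:, None]   # Eq. (18)
    pi_new = gamma[1]                                                 # Eq. (19)

    w = gamma[1:]                        # weights gamma_t(i) for emissions
    y = o[1:][:, None]                   # o_t
    x = o[:-1][:, None]                  # o_{t-1}
    W = w.sum(axis=0)
    ybar = (w * y).sum(axis=0) / W       # weighted mean of o_t per state
    xbar = (w * x).sum(axis=0) / W       # weighted mean of o_{t-1} per state
    c_new = (w * x * (y - ybar)).sum(axis=0) \
          / (w * x * (x - xbar)).sum(axis=0)                          # Eq. (20)
    const_new = ybar - xbar * c_new                                   # Eq. (21)
    var_new = (w * (y - c_new * x - const_new) ** 2).sum(axis=0) / W  # Eq. (22)
    return A_new, pi_new, c_new, const_new, var_new
```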
