**1. Introduction**

As technology develops, power system operation faces growing demands for higher reliability, lower environmental risk, and greater human safety. Power system failures usually come at the cost of high maintenance expenses and uncertain downtime, yet it is difficult to obtain an accurate health status and to predict failures of a power system in time. Power system health prognosis is therefore an important topic in reliability and maintenance engineering; it concerns how to properly integrate degradation data into power system fault detection and failure prevention [1]. Health prognosis involves evaluating the current state, classifying it among several failure modes, and predicting the residual lifetime of the power system, i.e., the remaining life from the current health status to the functional failure of the system.

**Citation:** Liu, Q.; Li, D.; Liu, W.; Xia, T.; Li, J. A Novel Health Prognosis Method for a Power System Based on a High-Order Hidden Semi-Markov Model. *Energies* **2021**, *14*, 8208. https://doi.org/10.3390/en14248208

Academic Editor: Andrea Mariscotti

Received: 28 October 2021; Accepted: 3 December 2021; Published: 7 December 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As a probabilistic statistical method, the hidden Markov model (HMM) has a good ability to represent randomness and to describe latent structural relationships, and it is widely used in complex system modeling. Carey first migrated HMM from the field of speech recognition to power system fault diagnosis, creating a new research direction in fault diagnosis [2]. Researchers then proposed an improved hidden semi-Markov model (HSMM) to get rid of the constraints of HMM characteristics; it further divides macro-states into micro-states and makes a priori assumptions about the duration distribution of each macro-state, successfully extending HMM to the field of residual lifetime prediction. Liu et al. proposed an information-fusion HSMM, completed the diagnosis and life prediction of equipment, and achieved good results [3].

System health prognosis is a complex process, and real-time prognosis of an operating power system is particularly difficult. Most methods in the reliability field involve intensive computation and the processing of large amounts of historical data [4,5]. These methods are usually divided into three categories: model-driven models, physical models, and data-driven models.

A physical model considers the operating conditions to model the operating process of the system. Such models integrate the physical and numerical features of the monitored system and develop mappings between parameters and prognostic features, e.g., for rotating machinery [6,7] and lithium-ion batteries [8]. However, their accuracy is difficult to guarantee, since no perfect physical model adapts to every system.

The data-driven model is suitable for systems with little prior knowledge or a complex structure because it uses existing data to predict the health of the system. Moreover, such models rely solely on past observed trajectories [9]; they include neural networks [10–12], support vector machines [13,14], and Gaussian process regression [15]. Finding the informative part of the original data is an arduous task. The main disadvantages of these models are slow convergence and the ease with which they fall into local minima; these shortcomings limit their application.

Model-driven models focus on prediction-level fusion of information. Dong et al. were the first to use HSMM for multi-sensor diagnosis research [16]. Several variants of Markovian models were then applied to the diagnosis and prognostics of equipment. Yan et al. used HSMM-based equipment health estimation and proposed several new methods to effectively predict equipment RUL [17]. Li et al. proposed an HMM improved through performance degradation modeling and successfully applied it to the reliability evaluation of wind turbine bearings [18]. Huang et al. proposed an improved HMM based on a predictive neural network and evaluated it on a motor drive system [19]. These studies performed well in different domains, but the modifications were mostly made in the modeling part, with little consideration given to signal preprocessing and parameter estimation; furthermore, the recognition and training processes of model-driven models are usually time-consuming, so they are applied to offline health prognosis.

In order to better use historical statistical information to improve the model recognition rate, researchers proposed the high-order hidden Markov model (HOHMM). To address the parameter explosion and the more complex derivation of the high-order model, researchers established an order-reduction algorithm, ORED, to simplify HOHMM; a fast incremental algorithm was also proposed to train HOHMM, and its three canonical problems were discussed, respectively [20]. In contrast to the successive order reduction of the ORED algorithm, Hadar proposed a more elaborate algorithm based on the idea of equivalent transformation, which transforms any high-order model into a corresponding first-order model [21]. These works provide a general methodology for the study of high-order models. However, in the process of reducing an HOHMM to an HMM, every variable of the model must be re-derived and the parameters re-estimated with the Baum–Welch algorithm; as the order of the model increases, the workload grows explosively. Dong et al. established the HO-HMMAR model, derived its variables, and better solved the optimal portfolio problem [22]. Heng et al. established a daily average temperature evolution model, assuming that the asymmetric component is a high-order hidden Markov process; the results showed that the model can effectively capture the characteristics of temperature data under various conditions [23].

Polynomial fitting is a simple tool for nonlinear fitting with efficient data-processing ability. Ritesh et al. used the coefficients of polynomial fitting to generate the feature vector for iris recognition and verified the effectiveness of the polynomial method on the benchmark IITD and CASIA-v4 Interval iris data [24]. With the popularization of neural networks and various machine learning methods, polynomial fitting has faded in the field of fault diagnosis. However, polynomial fitting is one of the few methods that can directly yield the analytical formula of a nonlinear relationship, which a neural network cannot do.
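To make the role of polynomial fitting concrete, the following minimal sketch (with synthetic data, not taken from the cited studies) recovers a directly writable analytical formula for a degradation trend with NumPy:

```python
import numpy as np

# Synthetic degradation feature sequence (illustrative only).
t = np.arange(10, dtype=float)
rng = np.random.default_rng(0)
y = 0.5 * t**2 - 1.0 * t + 3.0 + rng.normal(0.0, 0.1, t.size)

# Fit a degree-2 polynomial; polyfit returns coefficients highest power
# first, so the analytical formula is y ~= c2*t^2 + c1*t + c0.
coeffs = np.polyfit(t, y, deg=2)
fitted = np.polyval(coeffs, t)
```

The fitted coefficients are themselves the analytical model, which is exactly the property the text highlights over black-box learners.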

Parameter estimation has always been the most difficult link in probabilistic modeling. The EM algorithm has a heavy workload, depends on the initial value, and can easily fall into local optima; thus, more and more researchers are turning to various intelligent optimization algorithms. Yu [13] combined genetic algorithms with the grey model; the numerical results showed that the new method combined with a genetic algorithm (GA) can greatly improve prediction accuracy. Krause et al. [14] used a GA-based function approximator to estimate the parameters of an induction motor and achieved better results than other methods. Zhang et al. [15] used an artificial immune algorithm to optimize HMM and obtained the initial observation matrix with the highest identification degree. Zhang et al. [16] used an adaptive genetic particle swarm optimization algorithm to optimize HSMM and obtained more accurate results than the traditional HSMM. How intelligent optimization algorithms can be used to simplify HMM research deserves attention in future work.

In order to obtain more accurate and practical predictions when the duration distribution is unknown, this paper proposes an improved HOHSMM for the health prognosis of the power system. First, the framework of HOHSMM is established based on HSMM. Drawing on the ORED algorithm and Hadar's equivalent transformation, a model order-reduction method based on permutation is proposed: by the definition of the high-order hidden Markov group model, the high-order model is transformed into a corresponding first-order model, so that the solutions of the three canonical problems of the low-order model can be reused in the high-order model. Moreover, the transition probability matrix and observation probability matrix are reshaped so that the interdependence information of nodes in the high-order model is naturally embedded into the model parameters, which simplifies the model. Then, auxiliary resident variables are defined and derived, and the parameter group carrying the additional dependency information is estimated by an intelligent optimization algorithm group: maximizing the occurrence probability of the observations is taken as the objective, the decomposed dependencies of the high-order model are represented by the parameter group, and the complexity is thereby transferred from the model itself to the parameter group. A polynomial fitting method is used to fit each resident-variable sequence, and the residual lifetime of the power system is predicted without assuming a distribution. Finally, the proposed model is verified and evaluated on one power system data set. The results show that the proposed method is feasible and effective.

This article aims to develop a superior power system health prognostics method. The paper is organized as follows: Section 2 introduces first-order and high-order HMMs, and Section 3 develops the improved high-order hidden semi-Markov model. Section 4 then presents the residual lifetime prognosis method. Section 5 analyzes and discusses the case study, and Section 6 concludes the article.

#### **2. Hidden Markov Model**

#### *2.1. First-Order HMM*

The first-order HMM can be described as $\lambda = (\pi, A, B)$. It is composed of observable nodes and hidden state nodes. If the health state of the hidden node at time *t* is denoted $State_t$, the transitions between the hidden states conform to the Markov property:

$$\text{Prob}(State_t \mid State_1 \cdots State_{t-1}) = \text{Prob}(State_t \mid State_{t-1})$$

Note that the total number of health states is *N* and the total number of distinct observations is *M*. $\pi$ is the initial state distribution vector of the model, giving the probability of each state of the power system at $t = 1$: $\pi = [\pi_i]_{1 \times N}$. $A$ is the transition probability matrix, $A = [a_{ij}]_{N \times N}$, where $a_{ij}$ is the probability of moving from state $i$ to state $j$: $a_{ij} = \text{Prob}(State_{t+1} = j \mid State_t = i)$, with $\sum_{j=1}^{N} a_{ij} = 1$. $B$ is the observation probability matrix, $B = [b_i(k)]_{N \times M}$, where $b_i(k) = \text{Prob}(o_t = k \mid State_t = i)$ is the probability that the node generates observation $k$ in state $i$; $o_t$ denotes the observation at time $t$, and $\sum_{k=1}^{M} b_i(k) = 1$.

In order to solve the evaluation problem of HMM, that is, computing $\text{Prob}(O \mid \lambda)$, forward–backward variables are usually developed, and the evaluation problem is decomposed into recursive expressions of these variables. To solve the learning problem, the Baum–Welch algorithm is usually used to find the parameter set $\bar{\lambda}$ that best explains the current observations; its idea is to establish a Lagrange-multiplier equation between the current parameter group $\lambda$ and the optimized parameter group $\bar{\lambda}$ and to take partial derivatives under the constraints of each parameter to improve the current parameters. To solve the decoding (prognosis) problem, dynamic programming is applied to HMM to produce the Viterbi algorithm, which finds the most likely health state path. Due to its own characteristics, the conventional HMM has the defects that state durations must obey the exponential distribution and that it cannot describe the deterioration of the power system.
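The evaluation problem can be illustrated with the standard forward recursion; the parameters below are made-up illustrative numbers, not the paper's model:

```python
import numpy as np

# Illustrative first-order HMM lambda = (pi, A, B) with N = 2, M = 2.
pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # transition probabilities a_ij
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # observation probabilities b_i(k)
              [0.3, 0.7]])
O = [0, 1, 0]                             # observation sequence

# Forward variable alpha_t(i) = Prob(o_1..o_t, State_t = i | lambda).
alpha = pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]         # one recursive forward step
prob = alpha.sum()                        # Prob(O | lambda) = 0.104796
```

The recursion sums over all state paths in O(T·N²) time instead of enumerating the Nᵀ paths directly.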

#### *2.2. The High-Order HMM*

The high-order hidden Markov model (HOHMM) is a generalization of the first-order model that retains more historical statistical information: the current health state of the research object is assumed to be related to the previous health states. Taking an *n*-order hidden Markov model as an example, its health state at time $t$ ($t > n + 1$) is related to the health states at the previous $n$ times, which is expressed as

$$\text{Prob}(State_t \mid State_1 \cdots State_{t-1}) = \text{Prob}(State_t \mid State_{t-n} \cdots State_{t-1})$$

HOHMM differs from the conventional HMM and is described as $\lambda = (\pi, A, B, \breve{A}, \breve{B})$, where $\breve{A}$ is the state transition matrix applicable only to the first $n$ time steps and $\breve{B}$ is the observation probability matrix applicable only to the first $n$ time steps. HOHMM inherits the three canonical problems of HMM. However, the node dependencies are relatively complex, they become more complex as the order increases, and the number of model parameters grows exponentially. Thus, research on HOHMM generally focuses on model order reduction. HOHMM improves the recognition rate by retaining more historical statistical information, but it still fails to overcome the shortcomings of HMM.
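The order-reduction idea behind Hadar's equivalent transformation can be sketched as follows: a second-order chain over states is rewritten as a first-order chain over composite state pairs. The probabilities below are illustrative, not the paper's data:

```python
import numpy as np

# A2[i, j, k] = Prob(State_t = k | State_{t-2} = i, State_{t-1} = j),
# an illustrative second-order transition "cube" with N = 2 states.
N = 2
A2 = np.array([[[0.8, 0.2], [0.4, 0.6]],
               [[0.5, 0.5], [0.1, 0.9]]])

# First-order transition matrix over composite states (i, j) -> (j, k):
# consecutive pairs must overlap in the shared state j.
pairs = [(i, j) for i in range(N) for j in range(N)]
A1 = np.zeros((N * N, N * N))
for a, (i, j) in enumerate(pairs):
    for b, (j2, k) in enumerate(pairs):
        if j2 == j:
            A1[a, b] = A2[i, j, k]
```

Each row of the composite-state matrix is again a probability distribution, so the standard first-order algorithms apply unchanged.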

#### **3. Improved High-Order Hidden Semi-Markov Model**

In this paper, considering the shortcomings of HMM and the advantages of high-order modeling, an improved high-order hidden semi-Markov model (HOHSMM) is proposed based on HSMM.

Taking the second-order HSMM as an example, the model can be described as $\lambda = (\pi, A, B, \breve{A}, \breve{B})$. It is usually assumed that each sub-state conforms to the same duration distribution, and the topology of a second-order HSMM is described in Figure 1.

**Figure 1.** The illustration of a second-order HSMM.

#### *3.1. The Model Order Reduction Based on Permutation Mapping*

Similarly, taking the second-order HOHSMM as an example: because of the structural changes in the second-order model, the model parameters and related algorithms change accordingly, and the algorithms for solving the three problems of the low-order model are no longer applicable. Thus, this paper proposes a model order-reduction method based on permutation mapping. The essence is to merge the hidden state nodes of two adjacent time points in the second-order model into one node; the merged nodes can then be modeled as a Markov process, as shown in Figure 2.

**Figure 2.** The illustration of the model reduction based on permutation. (**a**): division of compound nodes; (**b**): reduced-order Markov chain model after node compounding.

In this paper, a health state in HSMM generates a segment of observations, as opposed to a single observation in the HMM. Thus, the states in a segmental semi-Markov model are called super states; each super state consists of several single states, called sub-states, as shown in Figure 2. Figure 2a shows the division of the second-order HOHSMM: adjacent state nodes are combined into a new, larger node. The relationship between the two health states inside the new node has no impact on the Markov property of the whole model. The Markov property of the reduced-order model can be described by Equation (1).

$$\text{Prob}(State_t, State_{t-1} \mid State_{t-1} \cdots State_1) = \text{Prob}(State_t, State_{t-1} \mid State_{t-1}, State_{t-2}), \quad t \ge 3 \tag{1}$$

The description of time in the new model also changes from $t$ to $(t, t+1)$. If the first element of the time combination $(t, t+1)$ is taken as the unique time index of the new model, the time can also be expressed as $\breve{t}$; $\breve{t}$ differs from $t$ in the mathematical sense, but physically $\breve{t} = t$. When the hidden state node of the new model at combined time $\breve{t}$ is expressed as $(State_{\breve{t}}, State_{t+1})$, it generates two groups of observations, $o_{\breve{t}}$ and $o_{\breve{t}+1}$, where $o_{\breve{t}+1}$ depends completely on $(State_{\breve{t}}, State_{t+1})$ and $o_{\breve{t}}$ depends completely on $(State_{\breve{t}-1}, State_t)$. Moreover, at time $\breve{t} - 1$, the observations connected by dotted lines in Figure 2b are the same observation, and the sub-state sets connected by dotted lines are the same sub-state set. Thus, each group of observations can be obtained from a unique combined hidden state node, while only the uniquely determined observations corresponding to the combined hidden state are retained (except at the initial time) in the topology.

In the reduced-order model, with combinations of hidden states as the modeling object, the different states essentially form an arrangement problem. Based on the above definitions of the HMM parameters, if a second-order HOHSMM has $N$ different super states, and considering that the power system undergoes performance degradation and its state is irreversible, then $C_N^2 + N$ composite states appear in the original model. In this paper, a simple mapping between the arrangement scheme and the natural numbers is introduced, as shown in Equation (2), where $i$ and $j$ are states.

$$Index = \text{Mapping}(i, j) = iN + j - \sum_{z=0}^{i} z, \quad i, j \in \{0, 1, \cdots, N-1\},\ j \ge i \tag{2}$$
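A small sketch of this index mapping (written for general *N*, under the reading that the pair $(i, j)$, $j \ge i$, is enumerated lexicographically) shows that it is a bijection onto $\{0, \cdots, C_N^2 + N - 1\}$:

```python
def mapping(i: int, j: int, N: int) -> int:
    """Map an ordered, non-decreasing state pair (i, j), j >= i, to a
    unique natural-number index: i*N + j - (0 + 1 + ... + i).
    A sketch of Equation (2) for general N."""
    assert 0 <= i <= j < N
    return i * N + j - sum(range(i + 1))

# For N = 4, the C(4,2) + 4 = 10 admissible pairs map onto 0..9.
N = 4
indices = [mapping(i, j, N) for i in range(N) for j in range(i, N)]
```

Enumerating the pairs in order of increasing $i$, then $j$, yields consecutive indices with no gaps, which is what makes the reduced-order matrices compactly indexable.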

In the parameters of the original second-order HOHSMM, the transition probability matrix becomes a transition probability cube of size $N \times N \times N$ and the observation probability matrix becomes an observation probability cube of size $N \times N \times M$; the initial probability distribution remains unchanged, and the initial transition probability matrix and initial observation probability matrix are the same as in the first-order model. The parameters of the reduced-order model consist of a transition probability matrix of size $(C_N^2 + N) \times (C_N^2 + N)$, an observation probability matrix of size $(C_N^2 + N) \times M$, the initial probability distribution, the initial transition probability matrix, and the initial observation probability matrix. Let the reduced-order transition probability matrix be $\hat{A}$ with elements $\hat{a}_{ij}$ ($i, j \in \{0, \cdots, C_N^2 + N - 1\}$), and the reduced-order observation probability matrix be $\hat{B}$ with elements $\hat{b}_i(k)$ ($i \in \{0, \cdots, C_N^2 + N - 1\}$, $k \in \{0, \cdots, M-1\}$). For $\lambda = (\pi, A, B, \breve{A}, \breve{B})$, the element of $\pi$ is $\pi_i$ ($i \in \{0, \cdots, N-1\}$), the element of $\breve{A}$ is $\breve{a}_{ij}$ ($i, j \in \{0, \cdots, N-1\}$), and the element of $\breve{B}$ is $\breve{b}_i(j)$ ($i, j \in \{0, \cdots, N-1\}$).

Based on the order-reduction method in Section 3.1, the transition probability matrix $\hat{A}$ of size $(C_N^2 + N) \times (C_N^2 + N)$ after dimension reduction can be obtained. The matrix $\hat{A}$ is sparse, and the number of nonzero (effective) entries is

$$\sum\_{j=0}^{N-1} \left( -j^2 + jN + N - j \right) \tag{3}$$

where $j$ is the second state in the state arrangement $(i, j)$. Taking the second-order HOHSMM with $N = 4$ as an example, the number of effective entries of the reduced-order transition probability matrix is 20. In Table 1, the positions marked 1 and 1\* hold the effective data; the composite state in the column header at a 1\* position is the main state, the composite state in the column header at a 1 position is defined as a transition state, and the positions marked 0 are invalid data.
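The count in Equation (3) can be checked by direct enumeration, under the admissibility rule implied above: from composite state $(i, j)$ the chain may move only to $(j, k)$ with $k \ge j$ (shared state, irreversible degradation):

```python
from itertools import combinations_with_replacement

def valid_transition_count(N: int) -> int:
    """Count nonzero entries of the reduced-order transition matrix by
    enumerating admissible moves (i, j) -> (j, k), k >= j."""
    pairs = list(combinations_with_replacement(range(N), 2))  # (i, j), j >= i
    count = 0
    for (i, j) in pairs:
        for (j2, k) in pairs:
            if j2 == j and k >= j:
                count += 1
    return count

def formula(N: int) -> int:
    """Closed form of Equation (3)."""
    return sum(-j * j + j * N + N - j for j in range(N))
```

For each second component $j$ there are $j + 1$ pairs $(i, j)$, each with $N - j$ admissible successors, giving the summand $(j+1)(N-j) = -j^2 + jN + N - j$; with $N = 4$ both routes give 20, matching the text.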


**Table 1.** Sparse representation of the second-order HOHSMM reduced-order transition probability matrix with 4 states.

\*: Main state; Shadow: Valid data.

#### *3.2. The Model Reasoning*

Following the idea of the forward–backward algorithm, a linger time (LT) mechanism is introduced and an auxiliary variable $\xi_t(i, d)$ is established, as described in Equation (4).

$$\xi_t(i, d) = \text{Prob}\left(O_{[1:t]}, LT((i,i)) = d \,\middle|\, \lambda\right) \tag{4}$$

Equation (4) gives the probability that, under a given model parameter group $\lambda$, the observation sequence $O_{[1:t]}$ is generated up to the cut-off time $t$ and the chain has stayed in the current state for a duration $d$.

The sparse representation of the transition probability matrix of the general second-order HOHSMM has been given above, so the significance of the main state and the transition state can now be described. The transition among the different main health states of the second-order model is a gradual process:

Main state → (Transition state sequence) → Next main state

This transformation process needs at most *j* − *i* + 2 time points to be fully described. Taking the transition from the health state (0,0) to (3,3) as an example, the transition process needs five time points to be described, as shown in Figure 3.

**Figure 3.** Schematic diagram of main state transition process.

In order to facilitate model reasoning, a transition variable $\vartheta_t(j, i, Road)$ is defined, representing the intermediate value of the process from the main state $(j, j)$ to the main state $(i, i)$ at time $t$. The main states being transferred out are defined as acceptance states, and the main states being transferred in are defined as return states. $\vec{t}$ is defined as the time point at which an acceptance state appears.

(1) For $i = 0$ and $d = t$, $\xi_t(i, d)$ can be described as

$$\xi_t(0, d) = \text{Prob}\left(O_{[1:t]}, LT(0) = d \,\middle|\, \lambda\right) \tag{5}$$

The recursive initial value is

$$\xi_2(0, 1) = \pi_0\, \breve{b}_0(o_1)\, \breve{b}_0(o_2)\, \breve{a}_{00} \tag{6}$$

The inner transfer recursion is

$$\xi_t(0, d) = \xi_{t-1}(0, d-1)\, \hat{a}_{(0,0)(0,0)}(t-1)\, \hat{b}_{(0,0)}(o_t) \tag{7}$$

(2) For $i = 1$, $\xi_t(i, d)$ can be described as

$$\xi_t(1, d) = \text{Prob}\left(O_{[1:t]}, LT(1) = d \,\middle|\, \lambda\right) \tag{8}$$

The acceptance state can only be $(0,0)$ and the transition state can only be $(0,1)$. Then the acceptance value is $\xi_{t-d}(0, t-d)$ and $\vartheta_t(0, 1, Road) = \hat{a}_{(0,0)(0,1)}(t-d+1)\, \hat{b}_{(0,1)}(o_{t-d+1})$. Thus, the recursive median is $\xi_t(1, 0) = \xi_{t-d}(0, t-d)\, \hat{a}_{(0,0)(0,1)}(t-d+1)\, \hat{b}_{(0,1)}(o_{t-d+1})\, \hat{a}_{(0,1)(1,1)}(t-d+2)\, \hat{b}_{(1,1)}(o_{t-d+2})$, and the recursive equation of the inner transfer is

$$\xi_t(1, d) = \xi_{t-1}(1, d-1)\, \hat{a}_{(1,1)(1,1)}(t-1)\, \hat{b}_{(1,1)}(o_t) \tag{9}$$

(3) For $i = 2$, the acceptance states can be $(0,0)$ and $(1,1)$, and $\xi_t(i, d)$ can be described as

$$\xi_t(2, d) = \text{Prob}\left(O_{[1:t]}, LT(2) = d \,\middle|\, \lambda\right) \tag{10}$$

The inner transfer recursion is

$$\xi_t(2, d) = \xi_{t-1}(2, d-1)\, \hat{a}_{(2,2)(2,2)}(t-1)\, \hat{b}_{(2,2)}(o_t) \tag{11}$$

Sit1: When the acceptance state is (0,0), it needs four time points to describe from (0,0) to (2,2) and the acceptance value is

$$\xi_{\vec{t}}\left(0, \vec{t}\right), \quad \vec{t} \in \{t-d-3,\ t-d-2\} \tag{12}$$

The corresponding *<sup>ϑ</sup>t*(0, 2, *Road*) can be described as

$$\begin{cases} \hat{a}_{(0,0)(0,1)}(\vec{t})\, \hat{b}_{(0,1)}(o_{\vec{t}+1})\, \hat{a}_{(0,1)(1,2)}(\vec{t}+1)\, \hat{b}_{(1,2)}(o_{\vec{t}+2})\, \hat{a}_{(1,2)(2,2)}(\vec{t}+2)\, \hat{b}_{(2,2)}(o_{\vec{t}+3}), & \vec{t} = t-d-3 \\ \hat{a}_{(0,0)(0,2)}(\vec{t})\, \hat{b}_{(0,2)}(o_{\vec{t}+1})\, \hat{a}_{(0,2)(2,2)}(\vec{t}+1)\, \hat{b}_{(2,2)}(o_{\vec{t}+2}), & \vec{t} = t-d-2 \end{cases} \tag{13}$$

The corresponding return value is given by Equation (14).

$$\begin{cases} \xi_{\vec{t}}(0, \vec{t})\, \hat{a}_{(0,0)(0,1)}(\vec{t})\, \hat{b}_{(0,1)}(o_{\vec{t}+1})\, \hat{a}_{(0,1)(1,2)}(\vec{t}+1)\, \hat{b}_{(1,2)}(o_{\vec{t}+2})\, \hat{a}_{(1,2)(2,2)}(\vec{t}+2)\, \hat{b}_{(2,2)}(o_{\vec{t}+3}), & \vec{t} = t-d-3 \\ \xi_{\vec{t}}(0, \vec{t})\, \hat{a}_{(0,0)(0,2)}(\vec{t})\, \hat{b}_{(0,2)}(o_{\vec{t}+1})\, \hat{a}_{(0,2)(2,2)}(\vec{t}+1)\, \hat{b}_{(2,2)}(o_{\vec{t}+2}), & \vec{t} = t-d-2 \end{cases} \tag{14}$$

Sit2: When the acceptance state is (1,1), three time points are needed to describe the transition from (1,1) to (2,2); with $\vec{t} = t - d - 2$, the acceptance value is

$$\sum_{d_u=1}^{\vec{t}} \xi_{\vec{t}}(1, d_u)$$

*<sup>ϑ</sup>t*(1, 2, *Road*) is

$$\hat{a}_{(1,1)(1,2)}\left(\vec{t}\right) \hat{b}_{(1,2)}\left(o_{\vec{t}+1}\right) \hat{a}_{(1,2)(2,2)}\left(\vec{t}+1\right) \hat{b}_{(2,2)}\left(o_{\vec{t}+2}\right) \tag{15}$$

The return value is

$$\sum_{d_u=1}^{\vec{t}} \xi_{\vec{t}}(1, d_u)\, \hat{a}_{(1,1)(1,2)}\left(\vec{t}\right) \hat{b}_{(1,2)}\left(o_{\vec{t}+1}\right) \hat{a}_{(1,2)(2,2)}\left(\vec{t}+1\right) \hat{b}_{(2,2)}\left(o_{\vec{t}+2}\right) \tag{16}$$

(4) For $i = 3$, the acceptance states can be $(0,0)$, $(1,1)$, and $(2,2)$, and $\xi_t(i, d)$ can be described as

$$\xi_t(3, d) = \text{Prob}\left(O_{[1:t]}, LT(3) = d \,\middle|\, \lambda\right) \tag{17}$$

Inner transfer recursion is

$$\xi_t(3, d) = \xi_{t-1}(3, d-1)\, \hat{a}_{(3,3)(3,3)}(t-1)\, \hat{b}_{(3,3)}(o_t) \tag{18}$$

Sit1: When the acceptance state is (0,0), it needs five time points to describe from (0,0) to (3,3), and the corresponding acceptance value is

$$\xi_{\vec{t}}\left(0, \vec{t}\right), \quad \vec{t} \in \{t-d-4,\ t-d-3,\ t-d-2\} \tag{19}$$

*<sup>ϑ</sup>t*(0, 3, *Road*) can be described as

$$\begin{cases} \hat{a}_{(0,0)(0,1)}(\vec{t})\, \hat{b}_{(0,1)}(o_{\vec{t}+1})\, \hat{a}_{(0,1)(1,2)}(\vec{t}+1)\, \hat{b}_{(1,2)}(o_{\vec{t}+2})\, \hat{a}_{(1,2)(2,3)}(\vec{t}+2)\, \hat{b}_{(2,3)}(o_{\vec{t}+3})\, \hat{a}_{(2,3)(3,3)}(\vec{t}+3)\, \hat{b}_{(3,3)}(o_{\vec{t}+4}), & \vec{t} = t-d-4 \\ \hat{a}_{(0,0)(0,2)}(\vec{t})\, \hat{b}_{(0,2)}(o_{\vec{t}+1})\, \hat{a}_{(0,2)(2,3)}(\vec{t}+1)\, \hat{b}_{(2,3)}(o_{\vec{t}+2})\, \hat{a}_{(2,3)(3,3)}(\vec{t}+2)\, \hat{b}_{(3,3)}(o_{\vec{t}+3}), & \vec{t} = t-d-3 \\ \hat{a}_{(0,0)(0,3)}(\vec{t})\, \hat{b}_{(0,3)}(o_{\vec{t}+1})\, \hat{a}_{(0,3)(3,3)}(\vec{t}+1)\, \hat{b}_{(3,3)}(o_{\vec{t}+2}), & \vec{t} = t-d-2 \end{cases} \tag{20}$$

The return value is the product of the corresponding acceptance value and the intermediate value.

Sit2: When the acceptance state is (1,1), it needs four time points to describe from (1,1) to (3,3), and the corresponding acceptance value is

$$\xi_{\vec{t}}\left(1, \vec{t}\right), \quad \vec{t} \in \{t-d-3,\ t-d-2\} \tag{21}$$

The corresponding intermediate value $\vartheta_t(1, 3, Road)$ is

$$\begin{cases} \hat{a}_{(1,1)(1,2)}(\vec{t})\, \hat{b}_{(1,2)}(o_{\vec{t}+1})\, \hat{a}_{(1,2)(2,3)}(\vec{t}+1)\, \hat{b}_{(2,3)}(o_{\vec{t}+2})\, \hat{a}_{(2,3)(3,3)}(\vec{t}+2)\, \hat{b}_{(3,3)}(o_{\vec{t}+3}), & \vec{t} = t-d-3 \\ \hat{a}_{(1,1)(1,3)}(\vec{t})\, \hat{b}_{(1,3)}(o_{\vec{t}+1})\, \hat{a}_{(1,3)(3,3)}(\vec{t}+1)\, \hat{b}_{(3,3)}(o_{\vec{t}+2}), & \vec{t} = t-d-2 \end{cases} \tag{22}$$

The recursive median is

$$\begin{cases} \xi_{\vec{t}}(1, \vec{t})\, \hat{a}_{(1,1)(1,2)}(\vec{t})\, \hat{b}_{(1,2)}(o_{\vec{t}+1})\, \hat{a}_{(1,2)(2,3)}(\vec{t}+1)\, \hat{b}_{(2,3)}(o_{\vec{t}+2})\, \hat{a}_{(2,3)(3,3)}(\vec{t}+2)\, \hat{b}_{(3,3)}(o_{\vec{t}+3}), & \vec{t} = t-d-3 \\ \xi_{\vec{t}}(1, \vec{t})\, \hat{a}_{(1,1)(1,3)}(\vec{t})\, \hat{b}_{(1,3)}(o_{\vec{t}+1})\, \hat{a}_{(1,3)(3,3)}(\vec{t}+1)\, \hat{b}_{(3,3)}(o_{\vec{t}+2}), & \vec{t} = t-d-2 \end{cases} \tag{23}$$

Sit3: When the acceptance state is (2,2), it needs three time points to describe from (2,2) to (3,3), and the corresponding acceptance value is

$$\xi_{\vec{t}}\left(2, \vec{t}\right), \quad \vec{t} \in \{t-d-2\} \tag{24}$$

The corresponding intermediate value *<sup>ϑ</sup>t*(2, 3, *Road*) is

$$\hat{a}_{(2,2)(2,3)}\left(\vec{t}\right) \hat{b}_{(2,3)}\left(o_{\vec{t}+1}\right) \hat{a}_{(2,3)(3,3)}\left(\vec{t}+1\right) \hat{b}_{(3,3)}\left(o_{\vec{t}+2}\right) \tag{25}$$

The recursive intermediate term is

$$\xi_{\vec{t}}\left(2,d_1\right)\hat{a}_{(2,2)(2,3)}\left(\vec{t}\right)\hat{b}_{(2,3)}\left(o_{\vec{t}+1}\right)\hat{a}_{(2,3)(3,3)}\left(\vec{t}+1\right)\hat{b}_{(3,3)}\left(o_{\vec{t}+2}\right) \tag{26}$$

The general recursive equation of ξ*t*(*i*, *d*) can be expressed by Equation (27):

$$\xi_t(i,d) = \sum_{j=0}^{i}\left[\left(\prod_{k=t-d}^{t}\hat{a}_{(i,i)(i,i)}\,\hat{b}_{(i,i)}(o_k)\right)\sum_{d_1=1}^{c_t}\sum_{Road\in R}\xi_{c_t}(j,d_1)\,\vartheta_t(j,i,Road)\right] \tag{27}$$

where *c<sub>t</sub>* = *t* − *d* − *l<sub>r</sub>* + 1, and *l<sub>r</sub>* is the length of *Road*.

Based on the above classification and derivation, the auxiliary variable *ξt* of a higher state is decomposed into an expression of lower auxiliary variables. In the recursive process, all transition state paths satisfying the main state transition can be generated, and *ξt*(*i*, *d*) can be calculated by traversing all possible transition state paths. The transition state paths included in each main state are given in Table 2, where *x* is not included in the path length.

**Table 2.** Path generator for each main state transition process.


Given the model parameters, the probability of generating the observations *O* is

$$\text{Prob}(O|\lambda) = \sum\_{i=0}^{N-1} \sum\_{d=1}^{D} \xi\_T(i,d)$$

The auxiliary variable τ*t*(*index*) is the probability of being in Mapping<sup>−1</sup>(*index*) at time *t*, given the model parameters and observations.

$$\tau_t(index) = \mathrm{Prob}\left(State_t = Mapping^{-1}(index) \,\middle|\, \lambda, O\right)$$

where τ<sub>0</sub>(*index*) can be obtained by calculating *π*, *A*˘, and *B*˘, and the recursive equation for τ*t*(*index*) is as follows.

$$\tau_{t+1}(index) = \frac{\mathrm{Prob}\left(o_{t+1}, State_{t+1} = Mapping^{-1}(index) \,\middle|\, \lambda\right)}{\sum_{index1=0}^{q}\mathrm{Prob}\left(o_{t+1}, State_{t+1} = Mapping^{-1}(index1) \,\middle|\, \lambda\right)} = \frac{\hat{b}_{(index)}\left(o_{t+1}\right)\sum_{i=0}^{q}\tau_t(i)\,\hat{a}_{(i)(index)}}{\sum_{index1=0}^{q}\hat{b}_{(index1)}\left(o_{t+1}\right)\sum_{i=0}^{q}\tau_t(i)\,\hat{a}_{(i)(index1)}} \tag{28}$$
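Equation (28) is a normalized one-step filtering update and can be sketched directly. In the sketch below, `A_hat` and `B_hat` are small hypothetical stand-ins for the reduced transition and emission matrices; this illustrates the recursion only, not the authors' implementation.

```python
import numpy as np

def tau_update(tau_t, A_hat, B_hat, obs):
    """One step of the normalized recursion in Eq. (28): tau_{t+1}(index)
    is proportional to b_hat[index](o_{t+1}) * sum_i tau_t(i)*a_hat[i][index],
    renormalized over all composite-state indices."""
    unnorm = B_hat[:, obs] * (tau_t @ A_hat)   # unnormalized filtered probabilities
    return unnorm / unnorm.sum()

# Toy example: q + 1 = 3 composite-state indices, 2 observation symbols (assumed values)
A_hat = np.array([[0.7, 0.2, 0.1],
                  [0.0, 0.6, 0.4],
                  [0.0, 0.0, 1.0]])            # row-stochastic transition matrix
B_hat = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.2, 0.8]])                 # per-index emission probabilities
tau = np.array([1.0, 0.0, 0.0])                # start in index 0
for o in [0, 1, 1]:                            # an assumed observation sequence
    tau = tau_update(tau, A_hat, B_hat, o)
print(tau)                                     # filtered state distribution, sums to 1
```

The normalization in the denominator keeps τ a proper probability distribution over composite-state indices at every step.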

For the corresponding low-order model, the probability of being in a single state *i* (*i* ∈ {1, 2, 3, 4}) at time *t* can be described by the high-order model.

$$\mathrm{Prob}(S_t = i) = \sum_{index_i \in I} \tau_t(index_i)$$

*I* is the index set corresponding to the second sub-state *i* under the different composite states, and the index correspondence is given in Table 3.


For the parameter estimation problem, this paper selects an intelligent simulation algorithm group to replace the Baum–Welch algorithm and carries out a two-stage estimation. First, the one-stage likelihood function *L*(*λ*, *A*˘, *B*˘) is established and described as

$$L(\lambda, \check{A}, \check{B}) = \mathrm{Prob}\left(O_{[1:2]} \,\middle|\, \lambda, \check{A}, \check{B}\right)$$

For the current *λ*, *A*˘, *B*˘, the probability of generating the first two observations is evaluated; the optimal *λ*, *A*˘, *B*˘ in the first stage is then

$$\lambda, \check{A}, \check{B} = \operatorname*{arg\,max}_{\lambda, \check{A}, \check{B}} \mathrm{Prob}\left(O_{[1:2]} \,\middle|\, \lambda, \check{A}, \check{B}\right)$$

The two-stage likelihood function is

$$L(A, B) = \mathrm{Prob}\left(O_{[3:]} \,\middle|\, \lambda, \check{A}, \check{B}, A, B\right)$$

For the current *λ*, *A*˘, *B*˘, *A*, *B*, the probability of generating the third through final observations is evaluated, and the corresponding optimal parameters *A*, *B* in the second stage are

$$A, B = \operatorname*{arg\,max}_{A, B} \mathrm{Prob}\left(O_{[3:]} \,\middle|\, \lambda, \check{A}, \check{B}, A, B\right)$$

Different intelligent simulation algorithms are used to optimize the two-stage likelihood functions and compare the results, and the best result is finally selected.

#### **4. Residual Life Prognosis under Uncertain Distribution**

From the recursive reasoning on the duration of each state in Section 3.2, the joint probability of the system duration in each health state and the observations can be obtained. In fact, there is a certain correlation between the observations generated by the system and its duration in a single state, which can be described by the observation and state transition matrices in a specific mode. In order to obtain the marginal probability of the system duration in each health state, it is reasonable to assume that the system observations and dwell times are independent of each other. For the *ξt*(*i*, *d*) obtained above, the conditional probability equation is obtained by calculating the generation probability of the corresponding observations.

$$\mathrm{Prob}(LT(i) = d \,|\, \lambda) = \frac{\xi_t(i, d)}{\mathrm{Prob}(O \,|\, \lambda)}$$

The time represented by each time point is used to calculate the probability that a certain state produces the corresponding duration, yielding a group of duration probability sequences. Generally, when the discrete data points are known, an a priori distribution assumption is conducive to studying the continuity characteristics of the data. However, if the data distribution is incorrectly assumed, there will be great errors in the fitting and the original properties of the data will be lost. Thus, this paper proposes a method to fit the data under an unknown distribution based on polynomial regression. The polynomial regression can be described as

$$f^m(x) = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + \cdots + w_m x^m$$
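As a minimal sketch of this regression step, assuming a hypothetical duration-probability sequence in place of the real one, the coefficients *w*<sub>0</sub>, …, *w<sub>m</sub>* can be estimated by least squares (here via `numpy.polyfit`):

```python
import numpy as np

# Hypothetical duration-probability sequence for one health state:
# x = duration values (time points), y = estimated probabilities.
x = np.arange(1, 21, dtype=float)
y = np.exp(-0.5 * ((x - 8.0) / 3.0) ** 2)   # assumed stand-in for Prob(LT(i) = d)
y /= y.sum()

m = 5                                       # polynomial order (the paper tunes m per state)
w = np.polyfit(x, y, m)                     # least-squares coefficients w_m ... w_0
Lds = np.poly1d(w)                          # callable m-order polynomial, like Lds_m^i

residuals = y - Lds(x)
print(np.max(np.abs(residuals)))            # worst-case fit error on the sequence
```

`np.poly1d` gives a callable analytical formula for each state, which is what the residual-life integral below consumes.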

Letting the *m*-order polynomial function fitted to the duration probability sequences generated by health state *i* be *Lds<sup>i</sup><sub>m</sub>*(*t*), the residual lifetime of the system at time *t* can be obtained.

$$RUL(t) = \sum_{i=0}^{N-1}\left[\int_t^D \frac{Lds_m^i(x)\,R_i}{\int_0^D Lds_m^i(y)\,dy}(x-t)\,dx + \sum_{j=i+1}^{N-1}\int_t^D x\, Lds_m^j(x)\,R_j\, dx\right]\mathrm{Prob}(s_t = i) \tag{29}$$

where *D* is the maximum duration of each state, Prob(*st* = *i*) is the probability of being in health state *i* at time *t*, and *Ri*, *Rj* are the integral scaling coefficients obtained after the discrete data are made continuous.
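The per-state remaining-dwell term of Equation (29) can be evaluated numerically once a fitted *Lds* function is available. The sketch below uses the trapezoid rule and an assumed exponential stand-in for the fitted density; it is illustrative only, not the paper's implementation.

```python
import numpy as np

def trap(y, x):
    """Trapezoid-rule integral of sampled values y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def residual_life_in_state(Lds, t, D, n=2000):
    """Expected remaining dwell time in one state at time t:
    int_t^D Lds(x)/Z * (x - t) dx, with Z = int_0^D Lds(y) dy,
    i.e. a numerical stand-in for one per-state term of Eq. (29)."""
    xs = np.linspace(0.0, D, n)
    Z = trap(Lds(xs), xs)                    # normalizing constant
    tail = xs[xs >= t]
    return trap(Lds(tail) / Z * (tail - t), tail)

# Assumed dwell-time density (a stand-in for a fitted Lds_m^i polynomial)
Lds = lambda x: np.exp(-x / 10.0)
rl = residual_life_in_state(Lds, t=2.0, D=60.0)
print(rl)
```

Replacing the stand-in density with the fitted polynomial of each state and weighting by Prob(*st* = *i*) recovers the structure of Equation (29).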

#### **5. Case Study**

This paper verifies and evaluates the proposed model and method through the example of system health diagnosis and life prediction at one power station. An LW42A-40.5 high-voltage circuit breaker system was selected as the experimental platform; it was equipped with a CT14 spring operating mechanism and three-phase linkage during opening and closing, and SF6 was used as the arc-extinguishing medium. The LW42A-40.5 high-voltage circuit breaker system is mainly used in 35 kV power transmission and distribution systems for protection and control. The vibration signals of the system in the laboratory were collected by a hydraulic accelerometer installed parallel to the rotating shaft of the power system. In the application example, the power system was filled with 20, 40, 60, and 80 mg of micro dust, respectively, and a fixed-length time window was applied every 10 min to collect about 1 min of vibration signal (P6), as shown in Figure 4. The vibration signal was then decomposed into five layers using db10 wavelets, and the arrays of high-frequency and low-frequency wavelet coefficients were obtained. The dimensionally reduced wavelet coefficients were used as the input feature sequence vector of DGHMM. During the whole experiment, the states of the power system could be divided into four types: Baseline, Cont1, Cont2, and Cont3, where Cont3 is the complete failure state of the system. The whole experimental analysis platform was Python 3 and the running environment was Windows 10.

The power system state monitoring information was used to predict the health state. In order to obtain a reliable health prognosis, the monitored features should be sensitive to vibration trends. In this paper, we used the wavelet transform to remove noise from the original signal and to perform feature extraction [25]. The wavelet transform provides a suitable framework for studying the multi-scale transient representation of a signal, and it is well suited to time-frequency analysis and the processing of non-stationary signals. More than one feature should usually be used for health prognosis. Here, a wavelet amplitude model based on the overall monitoring signal was used to demonstrate how to use features for health prognosis.
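The feature-extraction step above decomposes each vibration window into wavelet coefficient bands. As a dependency-free illustration of the idea (not the authors' pipeline, which uses five layers of db10 wavelets), the sketch below performs a five-level decomposition with the simpler Haar wavelet; the synthetic signal and all function names are assumptions.

```python
import numpy as np

def haar_dwt(signal):
    """One level of a Haar discrete wavelet transform: returns the
    low-frequency (approximation) and high-frequency (detail) coefficients.
    Haar is used here only to keep the sketch dependency-free."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                       # pad to even length if needed
        s = np.append(s, s[-1])
    low = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    high = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return low, high

def multilevel(signal, levels=5):
    """Recursively decompose the approximation band, collecting the detail
    coefficients of each layer (finest band first)."""
    details = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details

# Synthetic stand-in for one windowed vibration record (P6)
t = np.linspace(0, 1, 1024)
vib = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
approx, details = multilevel(vib, levels=5)
print(approx.shape, [d.shape for d in details])
```

The per-band coefficient arrays (after dimensionality reduction) play the role of the model's input feature sequence vectors.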

**Figure 4.** Experimental bench and signal acquisition.

#### *5.1. Partial Data Preview*

Using the vibration signal monitoring data from the hydraulic accelerometer, the health status diagnosis and residual life prognosis of the power system were carried out. Part of the vibration data (P6) after wavelet transform is shown in Table 4.



#### *5.2. Parameter Estimation of the Intelligent Optimization Algorithm Group*

Considering that the expectation maximization algorithm depends heavily on the initial value, we used a genetic algorithm (GA), particle swarm optimization (PSO), and an artificial fish swarm algorithm (AFSA) to estimate the parameters of the proposed model under the same set of full observations.

The maximum number of iterations was 300, and the population size (the number of particles for the PSO) was 30. The GA adopted an adaptive encoding and decoding strategy for large parameter values. The PSO adopted a fixed inertia weight strategy, with an inertia weight of 0.5 and learning factors of 2. The AFSA perception field was 1 and the crowding factor was 0.6. The optimization objective was to maximize the occurrence probability of the observations. The iterative process and results are shown in Figure 5. Under the premise of the same population size and number of iterations, the GA evolved continuously and achieved the maximum likelihood. The likelihood values of the parameter groups found by the three optimization algorithms are shown in Table 5, and the reduced probability transition matrix is given in Table 6.
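With the settings above (fixed inertia weight 0.5, learning factors of 2, 30 particles, 300 iterations), a plain PSO maximizer can be sketched as follows. Since the full HOHSMM likelihood is beyond a short example, the objective here is a simple surrogate standing in for Prob(*O* | parameters); all names and the search box [0, 1]^dim are assumptions.

```python
import numpy as np

def pso_maximize(f, dim, n_particles=30, iters=300, w=0.5, c1=2.0, c2=2.0, seed=0):
    """Plain PSO with a fixed inertia weight, maximizing f over [0, 1]^dim.
    (In a real setting, constraints such as row-stochastic transition
    matrices would be enforced inside f.)"""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, (n_particles, dim))      # particle positions
    v = np.zeros_like(x)                           # particle velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmax(pbest_val)].copy()         # global best position
    for _ in range(iters):
        r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, 0, 1)
        vals = np.array([f(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()

# Surrogate objective standing in for the observation likelihood
target = np.array([0.3, 0.7, 0.5])
loglik = lambda p: -np.sum((p - target) ** 2)
best, best_val = pso_maximize(loglik, dim=3)
print(best, best_val)
```

Swapping the surrogate for the two-stage likelihood functions of Section 3 (and running the GA and AFSA analogously) reproduces the comparison described above.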




**Table 5.** The maximum likelihood of each algorithm.

**Table 6.** The optimal probability transition matrix, represented sparsely.

#### *5.3. Residual Lifetime Prognosis*

Based on the model reasoning, the duration of each state was analyzed, and the full probability equation was used for equivalent replacement in the calculation process. The duration values of each state at different time points were separated by state to obtain the duration probability sequence of each state, and polynomial regression was used to fit each sequence and obtain its analytical formula. In the fitting process, the order *m* with the smallest resonance effect on the prediction of the residual life of subsequent states was preferentially selected (the priority selection principle). The polynomial fitting of the four states at the final time point is given as an example in Figure 6. The optimal fitting orders for the four states were 20, 15, 19, and 17, respectively, and the detailed polynomial regression coefficients are given in Table 7.

**Figure 5.** Swarm intelligence algorithm iteration process.

**Figure 6.** The polynomial fitting of each state at the final time.


**Table 7.** The polynomial fitting coefficients of each state at the final time.

In principle, negative values are not allowed in a probability integration, but the polynomial regression shows obvious fluctuation characteristics; positive and negative values partly offset each other, and resonance also exists at the same time point in different states. It is therefore not difficult to predict that the predicted residual life values may fluctuate. In order to fit the polynomial better, the original data points were linearly interpolated according to the characteristics of the original data. In Figure 7, the red points are the original data points, that is, the adaptive duration generation probability values, the light blue points are the interpolation points, and the yellow line is the polynomial fitting line.
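The interpolate-then-fit procedure described above can be sketched with hypothetical data points standing in for the adaptive duration generation probabilities:

```python
import numpy as np

# Hypothetical duration-probability points (the "red points" of the procedure)
x_raw = np.array([1.0, 4.0, 7.0, 10.0, 13.0, 16.0])
y_raw = np.array([0.02, 0.10, 0.22, 0.18, 0.07, 0.01])

# Densify by linear interpolation (the "light blue" points) before fitting
x_dense = np.linspace(x_raw[0], x_raw[-1], 61)
y_dense = np.interp(x_dense, x_raw, y_raw)

# Polynomial fit on the densified sequence (the "yellow line")
coeffs = np.polyfit(x_dense, y_dense, 4)    # order 4 chosen arbitrarily here
fit = np.poly1d(coeffs)
print(np.max(np.abs(fit(x_raw) - y_raw)))   # deviation at the original points
```

Densification gives the least-squares fit more support between sparse original points, which is the stated motivation for the interpolation step.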

**Figure 7.** Residual lifetime prognosis of P6.

Finally, the residual life prognosis of P6 is shown in Figure 7, in which the discrete prediction points have been interpolated and smoothed. Mark 1 shows a large deviation of the predicted value at the final time point. This corresponds to the phenomenon that the initial fluctuation range of the polynomial regression in each subgraph of Figure 7 is large but gradually decreases as the time point increases, and it is a result of the priority selection principle. Early damage appears at Mark 2, corresponding to the early reduction of the dwell time in Figure 6d.

Compared with the low-order HSMM results [3], the residual life prognosis obtained by the proposed model is shown in Table 8. In terms of the sampling time points, the overall effect of the residual life prognosis method based on the HOHSMM and polynomial fitting is significantly better than that of the conventional HSMM. The case shows that the life prognosis method based on the HOHSMM and polynomial regression is effective and feasible.


**Table 8.** Relative error analysis of the predicted RUL.
