Article

Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence

1
School of Science, Wuhan University of Technology, Wuhan 430071, China
2
Hubei Longzhong Laboratory, Xiangyang 441100, China
*
Author to whom correspondence should be addressed.
Symmetry 2024, 16(4), 474; https://doi.org/10.3390/sym16040474
Submission received: 6 March 2024 / Revised: 7 April 2024 / Accepted: 10 April 2024 / Published: 13 April 2024

Abstract:
Many complicated dynamical events may be broken down into simpler pieces and efficiently described by a system that switches among a set of conditionally linear dynamical modes. Building on switching linear dynamical systems (SLDS), we develop a new model that extends SLDS to better discover these dynamical modes. In the proposed model, the linear dynamics of the latent variables are described by a higher-order vector autoregressive process, which makes it feasible to evaluate higher-order dependency relationships in the dynamics. In addition, the transition of switching states is determined by a stick-breaking logistic regression, overcoming the limitation of a restricted geometric state duration and recovering the symmetric dependency between the switching states and the latent variables from asymmetric relationships. Furthermore, logistic regression evidence potentials can appear as conditionally Gaussian potentials by utilizing the Pólya-gamma augmentation strategy. Filtering and smoothing algorithms and Bayesian inference for parameter learning in the proposed model are presented. The utility and versatility of the proposed model are demonstrated on synthetic data and public functional magnetic resonance imaging data. Our model improves on current methods for learning switching linear dynamical modes, which will facilitate the identification and assessment of the dynamics of complex systems.

1. Introduction

Complex systems frequently exhibit multiple levels of abstraction in their descriptions [1]. For example, a computer program can be characterized by the collection of functions it calls, the sequence in which it carries out statements or the assembly instructions it sends to the CPU. This assertion is true for a multitude of natural systems. Brain activity can be classified based on either broad psychological states or the activation of individual ion channels. The necessary amount of specificity may differ based on the particular task being performed [2]. Through the identification of these behavioral units and their interdependence, we can obtain a deeper understanding of the intricate mechanisms that give rise to complex natural occurrences. Furthermore, modern machine learning offers powerful tools to help model the dynamics of complex systems. The toolbox has recently been improved to incorporate more versatile elements, such as Gaussian processes [3] and neural networks [4], into probabilistic time series models.
Time series analysis encompasses several methodologies and models, such as stationary process models, spectral models, state space models, and non-linear models [5]. Among them, the state space methods are regarded as more versatile and adept at addressing a broader range of problems compared to the other models. Hidden Markov models (HMM) and switching linear dynamical systems (SLDS) are two well-known state space models. In numerous real-world time series situations, the present condition of a dynamic system is connected to its condition in prior time intervals. Both HMM and SLDS can address such problems and have been widely used in several problem fields. Examples include human motion [6,7], computer vision [8], speech recognition [9], econometrics [10], machine learning [11] and neuroscience [12,13].
An HMM is a discrete Markov process with doubly embedded stochastic models, namely, (i) an unobservable hidden process characterized by a Markov chain and (ii) an observation process determined by the hidden states. HMM-based models have garnered significant interest and demonstrated their utility in the signal processing community for accurately simulating the intricate temporal progression of signals. More precisely, HMMs have a lengthy track record in signal processing, particularly in the field of speech processing, where they have achieved notable success [14]. The SLDS allows for the modeling of nonlinear time series data by dividing it into subsequences controlled by linear dynamics. Indeed, the generative model of SLDS differs slightly from that of HMM, in that there is an additional set of latent variables between the switching states and the observations. In addition, an SLDS differs from an HMM by choosing from a collection of linear Gaussian dynamics that evolve continuously, instead of a standard Gaussian mixture density as in HMM. The SLDS can be viewed as an expansion of the HMM, where each HMM state, or mode, is linked to a linear dynamical process. A further benefit of SLDS is its ability to handle time series data with high dimensionality. From a generative model perspective, it can be claimed that high-dimensional time series can be cheaply represented by a dynamical process specified on a low-dimensional manifold. This representation benefits from the model structure of SLDS. Nevertheless, employing HMM directly on time series with high dimensionality is likely to result in overfitting [15]. Therefore, SLDS offers the potential for increased descriptive capability [1].
Although SLDS offers advanced predictive models, the dynamics from the data are only stated under certain conditions, which poses some shortcomings of the generative model and may limit its application. First, the continuously evolving linear Gaussian dynamics of latent variables is essentially equivalent to a first-order switching vector autoregressive (VAR) process, i.e., VAR(1) process, which may not be suitable for the case when there are higher-order dependency relationships in the time series. Higher-order autoregressive interactions are more common in time series data [16,17]. Furthermore, determining an appropriate autoregressive order must trade off the model likelihood and complexity, constituting a non-trivial task in the applications [18]. Indeed, a restricted VAR(1) process involved in the linear dynamics of latent variables limits the ability of SLDS to recover the higher-order dependency relationships in dynamical phenomena.
Second, the switching state transition in SLDS follows the Markov assumption, which means that the duration time (i.e., time interval spent in a certain state) is restricted to a geometric distribution [19]. Thus, more weight is given to shorter consecutive time periods within a certain state, which means states switch frequently. This may not be appropriate when the system spends a long time in one state [20]. In addition, the state transitions are based only on the previous state and are unrelated to the latent variables. If a discrete switch happens when the system enters a particular area of the state space, the SLDS cannot recognize this relationship. While hidden semi-Markov model (HSMM) based on the semi-Markov assumption is an extension of the HMM that alleviates this issue, it should be noted that state transitions in HSMM still have no associations with the previous observations [20].
To address the mentioned limitations, an extension of SLDS, the recurrent switching linear dynamical systems with higher-order dependence (HO-rSLDS), is presented. The HO-rSLDS provides a new method that improves the modeling of dynamics from complex systems, by enabling a higher-order dependence in the linear Gaussian dynamics of latent variables and allowing the switching state transition probabilities to hinge on the preceding values of the latent variables. Specifically, to address the first limitation, we enhance the generative process of latent variables by directly making the current latent variable depend on the state values at the previous several time steps, rather than only the previous one time step. Therefore, the proposed model is feasible for capturing the higher-order autoregressive relationships in the data. To address the second limitation, we utilize the stick-breaking logistic regression to determine the switching state transition [21]. In addition, we leverage a Pólya-gamma augmentation approach [22] to improve block inference updating. This enhancement transforms specific logistic regression evidence potentials into conditionally Gaussian potentials, facilitating efficient Bayesian inference procedures. In this way, the transition of switching states is directly associated with the previous switching state and latent variable. Moreover, since the switching states and the latent variables are mutually interdependent in HO-rSLDS, the symmetric dependency among them is recovered, enhancing the generative process of both switching states and latent variables.
Similar to SLDS, the exact inference in an HO-rSLDS is intractable, which impedes efficient model estimation and parameter learning [8]. In this study, the HO-rSLDS is learned through variational inference, which is an approximate scheme to maximize the evidence lower bound, i.e., minimize the Kullback–Leibler divergence between a restricted family of functions of the model parameters and the actual posterior distribution [23]. Moreover, we propose message-passing algorithms about switching states and latent variables in HO-rSLDS to evaluate sufficient statistics for further parameter updates.
The primary contributions of the novel approach suggested in this study are outlined below:
  • HO-rSLDS assumes the linear dynamics of latent variables can be described by a higher-order VAR process, which makes it feasible to dig out and evaluate the long-term dependency relationships from dynamical phenomena;
  • Stick-breaking logistic regression is applied to determine the switching state transition in HO-rSLDS. By this means, the transition probabilities are time-varying and can be adjusted according to the previous latent variable, which overcomes the limitation of restricted geometric state duration time by Markov assumption and recovers the symmetric dependency between the switching states and the latent variable;
  • The Pólya-gamma augmentation strategy permits efficient Bayesian inference algorithms. In addition, we propose message-passing algorithms, including a forward Kalman filter and backward Kalman smoother for HO-rSLDS, which facilitates the parameter update in variational inference.
The rest of this study is organized as follows. In Section 2, we discuss the related models and present the proposed HO-rSLDS. In Section 3, we present the model learning algorithms in detail. In Section 4, we demonstrate the performance of the proposed HO-rSLDS on simulated data. Our method is applied to public functional magnetic resonance imaging (fMRI) data as well, and the results are described in Section 5. Finally, we wrap up our conclusions in Section 6.

2. Methods

2.1. HMM and HSMM

HMMs are mathematical models that analyze dynamic systems by using random processes with hidden states. An HMM is constructed using a Markov chain and is composed of two layers: a layer representing hidden states and a layer representing observations. The observed sequences are allocated to their respective hidden states according to the observation probability distribution. In the context of HMM, the observations are solely influenced by the current state and are not influenced by any preceding states [20]. The entire state sequence can be presented as $z_{1:T} = \{z_t\}_{t=1}^{T}$, which can be viewed as a sequence over a finite set $\mathcal{Z}$ with cardinality $|\mathcal{Z}|$. The state transition probability from state $i$ to $j$ is defined as $\pi_{ij} = p(z_{t+1} = j \mid z_t = i)$. The distribution of observations $y_t \in \mathbb{R}^n$ given a specific state is denoted by $p(y_t \mid z_t, \theta_j)$, where $\theta_j$ denotes the emission parameters of state $j$. Above all, the HMM can be described as:
$$z_t \mid z_{t-1} \sim \pi_{z_{t-1}},$$
$$y_t \mid z_t \sim F(\theta_{z_t}),$$
where $F(\cdot)$ denotes the emission distribution.
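As an illustration, the two-equation generative process above can be simulated directly. The sketch below assumes Gaussian emissions, $F(\theta_k) = \mathcal{N}(\mathrm{means}[k], \mathrm{covs}[k])$; the function and variable names are ours, not the paper's.

```python
import numpy as np

def sample_hmm(pi0, Pi, means, covs, T, seed=0):
    """Draw z_{1:T} and y_{1:T} from an HMM:
    z_t | z_{t-1} ~ pi_{z_{t-1}}, y_t | z_t ~ N(means[z_t], covs[z_t])."""
    rng = np.random.default_rng(seed)
    K, n = means.shape
    z = np.empty(T, dtype=int)
    y = np.empty((T, n))
    z[0] = rng.choice(K, p=pi0)                      # initial state
    for t in range(T):
        if t > 0:
            z[t] = rng.choice(K, p=Pi[z[t - 1]])     # Markov transition
        y[t] = rng.multivariate_normal(means[z[t]], covs[z[t]])  # emission
    return z, y
```

Replacing the Gaussian emission with any other family $F(\theta_k)$ only changes the last sampling line.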
Compared to HMM, the HSMM formalism improves upon the standard HMM by incorporating a random state duration, drawn from a state-specific distribution, into the generative process. The state remains constant until the duration elapses, at which point there is a Markov transition to a fresh state. By using this approach, the distribution of state durations is not limited to a geometric form [24,25]. The random variable $D_t$ represents the duration of a state entered at time $t$. The associated probability mass function is denoted as $p(D_t \mid z_t = j)$, and the HSMM can be described as
$$\nu_s \mid \nu_{s-1} \sim \pi_{\nu_{s-1}},$$
$$D_s \sim g(\omega_s),$$
$$z_{t_s - D_s + 1 : t_s} = \nu_s,$$
$$y_{t_s - D_s + 1 : t_s} \mid z_{t_s - D_s + 1 : t_s} \sim F(\theta_{\nu_s}, D_s),$$
where $\nu_s$ indexes the state shared by state segment $s$ and $t_s$ denotes the last time step of segment $s$. A graphical illustration of HMM and HSMM is shown in Figure 1. HMM or HSMM typically uses a Gaussian mixture as the emission distribution, which fails to capture potential dependencies in the observed time series. An effective approach to tackle this problem is to employ a VAR emission model, which describes the behavior of the time series through linear historical interactions among the observed series. In this way, the emission model in HMM or HSMM can be specifically expressed as:
$$y_t \mid z_t = k \sim \mathcal{N}\Big( \sum_{l=1}^{p} W_{k,l}\, y_{t-l},\, \Sigma_k \Big),$$
where $p$ denotes the maximum lag order, $W_{k,l}$ are $n \times n$ matrices representing the autoregressive coefficients of state $k$ at lag $l$, and $\Sigma_k$ is the covariance matrix. Subsequently, we refer to the H(S)MM with a VAR($p$) emission model as the AR($p$)-H(S)MM.
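To make the VAR($p$) emission concrete, the state-conditional mean $\sum_{l=1}^{p} W_{k,l}\, y_{t-l}$ can be computed as in the following sketch (names are illustrative):

```python
import numpy as np

def var_emission_mean(W_k, y_hist):
    """Mean of the AR(p)-H(S)MM emission for state k:
    sum_{l=1}^{p} W_{k,l} @ y_{t-l}.
    W_k    : (p, n, n) lag-coefficient matrices of state k
    y_hist : (p, n) past observations, y_hist[l-1] = y_{t-l}"""
    # contract over the lag index l and the input dimension j
    return np.einsum('lij,lj->i', W_k, y_hist)
```

Drawing $y_t$ then amounts to sampling $\mathcal{N}(\text{var\_emission\_mean}(W_k, y_{\text{hist}}), \Sigma_k)$.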

2.2. SLDS

An SLDS differs from a hidden Markov model (HMM) by choosing from a collection of linear Gaussian dynamics that evolve continuously, instead of a standard Gaussian mixture density as in HMM. Therefore, SLDS offers the potential for increased descriptive capability.
Similar to HMM, at each time $t = 1, 2, \ldots, T$ there is a discrete switching state $z_t \in \{1, 2, \ldots, K\}$ that evolves according to Markov dynamics:
$$z_{t+1} \mid z_t, \{\pi_k\}_{k=1}^{K} \sim \pi_{z_t},$$
where $\{\pi_k\}_{k=1}^{K}$ constitutes the Markov transition matrix and $\pi_k \in [0,1]^K$ is its $k$-th row. The state $z_1$ is assumed to be sampled from an initial distribution $\pi_0$ parameterized by the prior parameter $\alpha_0$, i.e., $z_1 \mid \alpha_0 \sim \pi_0(\alpha_0)$. Furthermore, a continuous latent variable $x_t \in \mathbb{R}^m$ follows conditionally linear dynamics that depend on the state value at the previous time step, and the switching state $z_t$ determines the linear dynamical system at time $t$ by:
$$x_{t+1} = A^{(0)}_{z_{t+1}} x_t + v_t, \qquad v_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}\big(0, Q^{(0)}_{z_{t+1}}\big),$$
where $A^{(0)}_{z_{t+1}} \in \mathbb{R}^{m \times m}$ denotes the coefficient matrix and $Q^{(0)}_{z_{t+1}} \in \mathbb{R}^{m \times m}$ is the covariance matrix. At each time $t$, a linear Gaussian observation $y_t \in \mathbb{R}^n$ is produced from the associated continuous latent variable:
$$y_t = C^{(0)} x_t + w_t, \qquad w_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}\big(0, S^{(0)}\big),$$
where $C^{(0)} \in \mathbb{R}^{n \times m}$ and $S^{(0)} \in \mathbb{R}^{n \times n}$. Usually, $n$ is much larger than $m$, so that the dependence structure of the high-dimensional observations can be evaluated in the low-dimensional latent variable space. A graphical illustration of SLDS is given in Figure 2.
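The generative process in Equations (8)–(10) can be sketched as a forward simulation. This is a minimal sketch, not the paper's implementation; names are ours, and for simplicity the initial latent $x_1$ is drawn from a standard normal.

```python
import numpy as np

def simulate_slds(pi0, Pi, A, Q, C, S, T, seed=0):
    """Forward-simulate an SLDS: Markov switching states z_t,
    per-state VAR(1) latents x_t, and observations y_t = C x_t + w_t."""
    rng = np.random.default_rng(seed)
    K, m, _ = A.shape
    n = C.shape[0]
    z = np.empty(T, dtype=int)
    x = np.empty((T, m))
    y = np.empty((T, n))
    z[0] = rng.choice(K, p=pi0)
    x[0] = rng.standard_normal(m)                  # simple initial latent
    for t in range(T):
        if t > 0:
            z[t] = rng.choice(K, p=Pi[z[t - 1]])                         # Eq. (8)
            x[t] = rng.multivariate_normal(A[z[t]] @ x[t - 1], Q[z[t]])  # Eq. (9)
        y[t] = rng.multivariate_normal(C @ x[t], S)                      # Eq. (10)
    return z, x, y
```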
In its classical form, however, the generative model of SLDS has some shortcomings that may limit its application. First, the continuously evolving linear Gaussian dynamics of $x_t$ are essentially equivalent to a first-order switching vector autoregressive (VAR) process, i.e., the VAR(1) process, which may not be suitable when there are higher-order dependency relationships in $x_t$ or $y_t$. Substituting Equation (9) into Equation (10) and assuming $C^{(0)\dagger} C^{(0)} = I$ and $w_t$ approximately zero, one immediately concludes that SLDS describes the dependency relationships by a VAR(1) process, since $y_t \approx C^{(0)} A^{(0)}_{z_t} C^{(0)\dagger} y_{t-1} + C^{(0)} v_t$. Second, the state transition of $z_t$ follows the Markov assumption, which means that the duration time is restricted to a geometric distribution [19]. Thus, more weight is given to shorter consecutive time periods within a certain state, which means states switch frequently. This may not be appropriate when the system spends a long time in one state [20]. In addition, the state $z_{t+1}$ transitions based only on the previous state $z_t$, and $z_{t+1} \mid z_t$ is unrelated to the latent variable $x_t$. If a discrete switch happens when the system enters a particular area of the state space, the SLDS will not be able to recognize this relationship. To address these issues, the proposed HO-rSLDS enhances the generative process of $x_t$ and the switching state transition, as demonstrated in the following subsection.

2.3. Recurrent Switching Linear Dynamic System with Higher-Order Dependence

In this section, we describe the proposed HO-rSLDS, which comprises two main components: the switching linear dynamical system with higher-order dependence (HO-SLDS, without recurrence), and the stick-breaking logistic regression together with the Pólya-gamma augmentation for HO-SLDS. Each component is demonstrated below.

2.3.1. HO-SLDS

The switching state transition in HO-SLDS is the same as that in SLDS. However, the continuous latent variable $x_t \in \mathbb{R}^m$ follows conditionally linear dynamics depending on the state values at the previous $p$ time steps, rather than only the previous one, and the switching state $z_t$ determines the linear dynamical system at time $t$ by:
$$x_{t+1} = A_{z_{t+1}} \bar{x}_t + v_t, \qquad v_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}\big(0, Q_{z_{t+1}}\big),$$
where $\bar{x}_t = [x_t^\top, \ldots, x_{t-p+1}^\top]^\top \in \mathbb{R}^{pm}$, $A_{z_{t+1}} \in \mathbb{R}^{m \times pm}$ and $Q_{z_{t+1}} \in \mathbb{R}^{m \times m}$. Indeed, the evolution of the latent variable dynamics in $x_t$ can be considered a VAR($p$) process. The initial $\bar{x}_0$ given the initial discrete state $z_0$ is supposed to be normally distributed with mean $\mu_0$ and covariance $\Sigma_0$, i.e., $\bar{x}_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$. Similarly, at each time $t$, a linear Gaussian observation $y_t \in \mathbb{R}^n$ is produced from the associated continuous latent variable:
$$y_t = C_{z_t} x_t + w_t, \qquad w_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}\big(0, S_{z_t}\big),$$
where $C_{z_t} \in \mathbb{R}^{n \times m}$ and $S_{z_t} \in \mathbb{R}^{n \times n}$. Note that here we assume $C_{z_t}$ and $S_{z_t}$ depend on $z_t$. Usually, $n$ is much larger than $m$, so that the dependence structure of the high-dimensional observations can be evaluated in the low-dimensional latent variable space. The system parameters consist of the discrete Markov transition matrix and the collection of linear dynamical system matrices, denoted as:
$$\theta = \big\{ \pi_0, \mu_0, \Sigma_0, \{\pi_k, A_k, Q_k, C_k, S_k\}_{k=1}^{K} \big\}.$$
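One step of the higher-order latent dynamics can be written with an explicit lag stack $\bar{x}_t$. The helper below is illustrative (names are ours) and assumes the stack orders lags from most recent to oldest:

```python
import numpy as np

def ho_latent_step(A_k, Q_k, x_lags, rng):
    """One draw of x_{t+1} ~ N(A_k @ xbar_t, Q_k), where
    x_lags = [x_t, x_{t-1}, ..., x_{t-p+1}], each of dimension m,
    and A_k has shape (m, p*m)."""
    xbar = np.concatenate(x_lags)            # stack the last p latents
    return rng.multivariate_normal(A_k @ xbar, Q_k)
```

With $p = 1$ the stack reduces to $x_t$ and this is exactly the SLDS transition of Equation (9).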

2.3.2. Stick-Breaking Logistic Regression and Pólya-Gamma Augmentation for HO-SLDS

Another component included in HO-rSLDS is a stick-breaking logistic regression. We utilize a Pólya-gamma augmentation technique to improve block inference updates [21]. This approach allows certain logistic regression evidence potentials to transform to conditionally Gaussian potentials in an augmented distribution, which facilitates the Bayesian inference algorithms.
Consider a logistic regression model with regressors $x \in \mathbb{R}^m$ that maps to a categorical distribution over the switching state $z \in \{1, 2, \ldots, K\}$, denoted as:
$$z \mid x \sim \pi_{\mathrm{SB}}(\nu), \qquad \nu = Rx + r,$$
where $R \in \mathbb{R}^{(K-1) \times m}$ represents a weight matrix and $r \in \mathbb{R}^{K-1}$ denotes a bias vector. We use a stick-breaking link function $\pi_{\mathrm{SB}} : \mathbb{R}^{K-1} \to [0,1]^K$, which transforms a real vector into a normalized probability vector based on the stick-breaking process:
$$\pi_{\mathrm{SB}}(\nu) = \big( \pi_{\mathrm{SB}}^{(1)}(\nu), \ldots, \pi_{\mathrm{SB}}^{(K)}(\nu) \big),$$
$$\pi_{\mathrm{SB}}^{(k)}(\nu) = \sigma(\nu_k) \prod_{j<k} \sigma(-\nu_j),$$
for $k = 1, \ldots, K-1$, and $\pi_{\mathrm{SB}}^{(K)}(\nu) = \prod_{k=1}^{K-1} \sigma(-\nu_k)$, where $\nu_k$ represents the $k$th component of $\nu$ and $\sigma(\nu) = (1 + e^{-\nu})^{-1}$ is the logistic function. The probability mass function $p(z \mid x)$ is given by
$$p(z \mid x) = \prod_{k=1}^{K-1} \sigma(\nu_k)^{\mathbb{I}(z = k)}\, \sigma(-\nu_k)^{\mathbb{I}(z > k)},$$
where $\mathbb{I}(\cdot)$ is the indicator function.
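The stick-breaking map $\pi_{\mathrm{SB}}$ is straightforward to implement; a minimal sketch (function names are ours):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def pi_sb(nu):
    """Map nu in R^{K-1} to a probability vector of length K:
    pi_k = sigma(nu_k) * prod_{j<k} sigma(-nu_j);
    the last entry keeps whatever stick length remains."""
    probs = np.empty(len(nu) + 1)
    stick = 1.0                        # remaining stick length
    for k, v in enumerate(nu):
        probs[k] = sigmoid(v) * stick
        stick *= sigmoid(-v)           # sigma(-v) = 1 - sigma(v)
    probs[-1] = stick
    return probs
```

By construction the entries are in $[0,1]$ and sum to one, since each step breaks off a $\sigma(\nu_k)$ fraction of the remaining stick.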
The posterior density $p(x \mid z)$ is non-Gaussian and does not allow straightforward Bayesian inference for this regression model, because the likelihood $p(z \mid x)$ does not align with a Gaussian prior density $p(x)$. To solve this problem, we introduce Pólya-gamma auxiliary variables $\omega = \{\omega_k\}_{k=1}^{K}$ so that the conditional density $p(x \mid z, \omega)$ becomes Gaussian [22]. Specifically, the conditional density of $\omega_k$ follows a Pólya-gamma distribution. A random variable $X$ has a Pólya-gamma distribution with parameters $b > 0$ and $c \in \mathbb{R}$, denoted as $X \sim \mathrm{PG}(b, c)$, if
$$X \overset{D}{=} \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{g_k}{(k - 1/2)^2 + c^2/(4\pi^2)},$$
where the $g_k \sim \mathrm{Gamma}(b, 1)$ are independent Gamma random variables, and $\overset{D}{=}$ indicates equality in distribution. In particular, by choosing $\omega_k \mid x, z \sim \mathrm{PG}\big(\mathbb{I}(z \geq k), \nu_k\big)$, we have the following:
$$x \mid z, \omega \sim \mathcal{N}\big( \Omega^{-1} \kappa,\, \Omega^{-1} \big),$$
where the mean vector $\Omega^{-1}\kappa$ and covariance matrix $\Omega^{-1}$ are calculated based on the following:
$$\Omega = \mathrm{diag}(\omega),$$
$$\kappa_k = \mathbb{I}(z = k) - \tfrac{1}{2}\, \mathbb{I}(z \geq k).$$
This allows efficient block updates while maintaining a conditionally Gaussian density $p(x \mid z, \omega)$ in the Bayesian inference.
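As a small worked example of the augmentation bookkeeping, $\Omega$ and $\kappa$ can be assembled as below. We use the convention $\kappa_k = \mathbb{I}(z=k) - \tfrac{1}{2} b_k$ with Pólya-gamma shape $b_k = \mathbb{I}(z \geq k)$, which follows from the stick-breaking likelihood; names are ours.

```python
import numpy as np

def pg_gaussian_params(z, omega):
    """Conditional Gaussian parameters after Polya-gamma augmentation.
    z     : observed category in {1, ..., K} (one-based)
    omega : (K-1,) auxiliary variables
    Returns Omega = diag(omega) and kappa with
    kappa_k = I(z == k) - 0.5 * I(z >= k)."""
    k = np.arange(1, len(omega) + 1)
    kappa = (z == k).astype(float) - 0.5 * (z >= k)
    return np.diag(omega), kappa
```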
The HO-rSLDS splits the latent space into $K$ sections, each with its own linear dynamics, by including the recurrence (Equation (17)) in the transition density of $z_t$, which can be described as follows:
$$z_{t+1} \mid z_t, x_t, R_{z_t}, r_{z_t} \sim \pi_{\mathrm{SB}}(\nu_t),$$
$$\nu_t = R_{z_t} x_t + r_{z_t}.$$
Moreover, we place Gaussian priors on $R_{z_t}$ and $r_{z_t}$. In this way, the probability mass function $p(z_{t+1} \mid x_t)$ can be expressed as:
$$p(z_{t+1} \mid x_t) = \prod_{k=1}^{K-1} \sigma(\nu_{t,k})^{\mathbb{I}(z_{t+1} = k)}\, \sigma(-\nu_{t,k})^{\mathbb{I}(z_{t+1} > k)} = \prod_{k=1}^{K-1} \big( e^{\nu_{t,k}} \big)^{\mathbb{I}(z_{t+1} = k)} \big( 1 + e^{\nu_{t,k}} \big)^{-\mathbb{I}(z_{t+1} \geq k)}.$$
The Pólya-gamma augmentation focuses on exactly the above densities, utilizing the following integral identity:
$$\big( e^{\nu} \big)^{a} \big( 1 + e^{\nu} \big)^{-b} = 2^{-b} e^{\kappa \nu} \int_0^{\infty} e^{-\omega \nu^2 / 2}\, p_{\mathrm{PG}}(\omega \mid b, 0)\, d\omega,$$
where $\kappa = a - b/2$ and $p_{\mathrm{PG}}(\omega \mid b, 0)$ refers to the density of the Pólya-gamma distribution $\mathrm{PG}(b, 0)$, which is independent of $\nu$ [22]. Combining Equations (24) and (25), $p(z_{t+1} \mid x_t)$ can be expressed as the marginalization of a component in the probability distribution $p(z_{t+1} \mid x_t, \omega_t)$, where $\omega_t \in \mathbb{R}^{K-1}$ represents a vector of auxiliary variables. As a function of $\nu_t$, there is
$$p(z_{t+1}, x_t, \omega_t) \propto \prod_{k=1}^{K-1} \exp\Big\{ -\frac{1}{2}\, \omega_{t,k}\, \nu_{t,k}^2 + \kappa_{t+1,k}\, \nu_{t,k} \Big\},$$
where $\kappa_{t+1,k} = \mathbb{I}(z_{t+1} = k) - \frac{1}{2}\, \mathbb{I}(z_{t+1} \geq k)$. Hence,
$$p(z_{t+1}, x_t, \omega_t) \propto \mathcal{N}\big( \nu_t \mid \Omega_t^{-1} \kappa_{t+1},\, \Omega_t^{-1} \big),$$
where $\Omega_t = \mathrm{diag}(\omega_t)$ and $\kappa_{t+1} = (\kappa_{t+1,1}, \ldots, \kappa_{t+1,K-1})^\top$. After augmentation, the conditional density on $x_t$ becomes virtually Gaussian, simplifying the evaluation of the integrals needed for message passing during Bayesian inference. Finally, we summarize the proposed HO-rSLDS as
$$z_1 \mid \theta \sim \pi_0(\alpha_0), \qquad \bar{x}_0 \mid \theta \sim \mathcal{N}(\bar{x}_0 \mid \mu_0, \Sigma_0),$$
$$z_{t+1} \mid z_t, x_t, \theta \sim \pi_{\mathrm{SB}}(\nu_t), \qquad \nu_t = R_{z_t} x_t + r_{z_t},$$
$$x_{t+1} \mid \bar{x}_t, z_{t+1}, \theta \sim \mathcal{N}\big( x_{t+1} \mid A_{z_{t+1}} \bar{x}_t,\, Q_{z_{t+1}} \big),$$
$$y_t \mid x_t, z_t, \theta \sim \mathcal{N}\big( y_t \mid C_{z_t} x_t,\, S_{z_t} \big),$$
$$\theta = \big\{ \alpha_0, \mu_0, \Sigma_0, \{A_k, Q_k, C_k, S_k, R_k, r_k\}_{k=1}^{K} \big\}.$$
To learn the HO-rSLDS utilizing Bayesian inference, we assign conjugate priors to each component of the model parameters θ , given by
$$\pi_0 \mid \alpha_0 \sim \mathrm{Dir}(\alpha_0), \qquad (A_k, Q_k) \mid \eta \sim \mathrm{MNIW}(\eta), \qquad (C_k, S_k) \mid \lambda \sim \mathrm{MNIW}(\lambda),$$
$$R_k \mid \zeta \sim \prod_{i=1}^{K-1} \mathcal{N}(R_{k,i} \mid \zeta_i), \qquad r_k \mid \rho \sim \mathcal{N}(r_k \mid \rho),$$
where $\mathrm{Dir}$ and $\mathrm{MNIW}$ denote the Dirichlet and matrix normal inverse Wishart distributions, respectively, and $R_{k,i}$ denotes the $i$th row of $R_k$. A graphical illustration of HO-rSLDS is given in Figure 3.

3. Variational Inference of HO-rSLDS

Now we describe the model learning of HO-rSLDS, which is fulfilled by variational inference. In variational inference, the parameters are inferred by maximizing the evidence lower bound, hence minimizing the Kullback–Leibler divergence, i.e., the variational free energy. Specifically, we denote $Y := y_{1:T}$ and $\Theta := \{z_{1:T}, (\bar{x}_0, x_{1:T}), \omega_{1:T}, \theta\}$; then the following decomposition holds:
$$\ln p(Y) = \mathcal{L}\big(q(\Theta)\big) + \mathrm{KL}\big(q(\Theta)\,\|\,p(\Theta \mid Y)\big),$$
where
$$\mathcal{L}\big(q(\Theta)\big) = \int q(\Theta) \ln \frac{p(Y, \Theta)}{q(\Theta)}\, d\Theta,$$
$$\mathrm{KL}\big(q(\Theta)\,\|\,p(\Theta \mid Y)\big) = -\int q(\Theta) \ln \frac{p(\Theta \mid Y)}{q(\Theta)}\, d\Theta.$$
The exact evaluation of $p(z_{1:T}, x_{0:T}, \omega_{1:T} \mid y_{1:T})$, where $x_{0:T} := \{\bar{x}_0, x_{1:T}\}$, is intractable in HO-rSLDS. In this study, we leverage an approximate variational inference method to assess the approximate posterior $q(z_{1:T}, x_{0:T}, \omega_{1:T} \mid y_{1:T})$ in the proposed HO-rSLDS. In order to obtain tractable inference, a structured mean-field approximation is adopted on the generative model, given by
$$q(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta \mid y_{1:T}) = q(z_{1:T})\, q(x_{0:T})\, q(\omega_{1:T})\, q(\theta).$$
Moreover, $q(\omega_{1:T})$ further factorizes as:
$$q(\omega_{1:T}) = \prod_{t=1}^{T} \prod_{k=1}^{K-1} q(\omega_{t,k}).$$
Given the factorized forms in Equations (33) and (34), we can obtain the optimal solutions of the factorized variational posteriors, i.e., $q(z_{1:T})$, $q(x_{0:T})$, $q(\omega_{1:T})$ and $q(\theta)$, by maximizing the evidence lower bound in Equation (31), given by
$$\ln q(z_{1:T}) \propto \mathbb{E}_{\backslash q(z_{1:T})} \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta, y_{1:T}),$$
$$\ln q(x_{0:T}) \propto \mathbb{E}_{\backslash q(x_{0:T})} \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta, y_{1:T}),$$
$$\ln q(\omega_{1:T}) \propto \mathbb{E}_{\backslash q(\omega_{1:T})} \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta, y_{1:T}),$$
$$\ln q(\theta) \propto \mathbb{E}_{\backslash q(\theta)} \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta, y_{1:T}),$$
where $\mathbb{E}_{\backslash q(z_{1:T})}(\cdot) := \mathbb{E}_{q(x_{0:T})\, q(\omega_{1:T})\, q(\theta)}(\cdot)$, and the other expectations are defined analogously. Hence the joint log-likelihood is
$$\ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, \theta, y_{1:T}) = \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, y_{1:T} \mid \theta) + \ln p(\theta).$$
The conditional log-likelihood $\ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, y_{1:T} \mid \theta)$ can be written as
$$\ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, y_{1:T} \mid \theta) = \ln p(z_1 \mid \theta) + \ln p(\bar{x}_0 \mid \theta) + \sum_{t=1}^{T} \Big[ \mathbb{I}(t>1) \ln p(z_t \mid x_{t-1}, z_{t-1}, \omega_{t-1}, \theta) + \ln p(x_t \mid \bar{x}_{t-1}, z_t, \theta) \Big] + \sum_{t=1}^{T} \ln p(y_t \mid z_t, x_t, \theta)$$
$$\propto \ln p(z_1 \mid \theta) - \frac{1}{2} (\bar{x}_0 - \mu_0)^\top \Sigma_0^{-1} (\bar{x}_0 - \mu_0) - \frac{1}{2} \sum_{t=1}^{T} \mathbb{I}(t>1)\, \big( \nu_{t-1} - \Omega_{t-1}^{-1} \kappa_t \big)^\top \Omega_{t-1} \big( \nu_{t-1} - \Omega_{t-1}^{-1} \kappa_t \big) - \frac{1}{2} \sum_{t=1}^{T} \big( x_t - A_{z_t} \bar{x}_{t-1} \big)^\top Q_{z_t}^{-1} \big( x_t - A_{z_t} \bar{x}_{t-1} \big) - \frac{1}{2} \sum_{t=1}^{T} \big( y_t - C_{z_t} x_t \big)^\top S_{z_t}^{-1} \big( y_t - C_{z_t} x_t \big).$$
The exact optimal solutions for $q(z_{1:T})$ and $q(x_{0:T})$ will not be evaluated explicitly; rather, we use message-passing algorithms to obtain the sufficient statistics with respect to these factors. In the following subsections, we present the involved message-passing algorithms and the detailed derivations of the update formulas for each factor.

3.1. Update for $q(x_{0:T})$

Since $\mathbb{E}_{\backslash q(x_{0:T})} \ln p(\theta)$ is not a function of $x_{0:T}$, Equation (36) can be written as
$$\ln q(x_{0:T}) \propto \mathbb{E}_{\backslash q(x_{0:T})} \ln p(z_{1:T}, x_{0:T}, \omega_{1:T}, y_{1:T} \mid \theta) \propto -\,\mathbb{E}_{\backslash q(x_{0:T})} \frac{1}{2} (\bar{x}_0 - \mu_0)^\top \Sigma_0^{-1} (\bar{x}_0 - \mu_0) - \mathbb{E}_{\backslash q(x_{0:T})} \frac{1}{2} \sum_{t=2}^{T} \big( \nu_{t-1} - \Omega_{t-1}^{-1} \kappa_t \big)^\top \Omega_{t-1} \big( \nu_{t-1} - \Omega_{t-1}^{-1} \kappa_t \big) - \mathbb{E}_{\backslash q(x_{0:T})} \frac{1}{2} \sum_{t=1}^{T} \big( x_t - A_{z_t} \bar{x}_{t-1} \big)^\top Q_{z_t}^{-1} \big( x_t - A_{z_t} \bar{x}_{t-1} \big) - \mathbb{E}_{\backslash q(x_{0:T})} \frac{1}{2} \sum_{t=1}^{T} \big( y_t - C_{z_t} x_t \big)^\top S_{z_t}^{-1} \big( y_t - C_{z_t} x_t \big).$$
From Equation (42), it can be observed that the structure of the variational posterior $\ln q(x_{0:T})$ resembles the joint log-likelihood function of a time-varying linear dynamical system with higher-order dependence (HO-LDS). Thus, $\ln q(x_{0:T})$ can be re-expressed as
$$\ln q(x_{0:T}) \propto -\frac{1}{2} (\bar{x}_0 - \hat{\mu}_0)^\top \hat{\Sigma}_0^{-1} (\bar{x}_0 - \hat{\mu}_0) - \frac{1}{2} \sum_{t=1}^{T} \big( x_t - \hat{A}_t \bar{x}_{t-1} \big)^\top \hat{Q}_t^{-1} \big( x_t - \hat{A}_t \bar{x}_{t-1} \big) - \frac{1}{2} \sum_{t=1}^{T} \big( y_t - \hat{C}_t x_t \big)^\top \hat{S}_t^{-1} \big( y_t - \hat{C}_t x_t \big),$$
and the set of variational parameters involved in Equation (43) are defined as
$$\hat{\theta}_{x_{0:T}} := \big\{ \hat{\mu}_0, \hat{\Sigma}_0, \{\hat{A}_t\}_{t=1}^{T}, \{\hat{Q}_t\}_{t=1}^{T}, \{\hat{C}_t\}_{t=1}^{T}, \{\hat{S}_t\}_{t=1}^{T} \big\}.$$
Each component of $\hat{\theta}_{x_{0:T}}$ can be precisely determined by matching the coefficients of each term in Equations (42) and (43). Algorithm 1 outlines an efficient scheme for this purpose.
Algorithm 1: Obtaining $\hat{\theta}_{x_{0:T}}$
Remark 1. 
$A_{z_t} \bar{x}_{t-1} = \sum_{j=1}^{p} A_{z_t}^{(j)} x_{t-j}$, where $A_{z_t}^{(j)}$ is defined as the submatrix of $A_{z_t}$ associated with $x_{t-j}$ in $A_{z_t} \bar{x}_{t-1}$.
Thus, we perform message passing on a time-varying HO-LDS parameterized by $\hat{\theta}_{x_{0:T}}$ to obtain the posterior distribution $q(x_{0:T})$ and the sufficient statistics used in the updates for the other factors. Concretely, the message passing comprises the generalized Kalman filter and Kalman smoother for the time-varying HO-LDS, which are demonstrated by Theorems 1 and 2. To prove the theorems, we need the following lemmas [26]:
Lemma 1. 
If random variables X and Y follow the Gaussian probability distributions:
$$X \sim \mathcal{N}(m, P),$$
$$Y \mid X \sim \mathcal{N}(HX + u, R),$$
then the joint distribution of $X$ and $Y$, as well as the marginal distribution of $Y$, are given by
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} m \\ Hm + u \end{pmatrix}, \begin{pmatrix} P & PH^\top \\ HP & HPH^\top + R \end{pmatrix} \right), \qquad Y \sim \mathcal{N}\big( Hm + u,\, HPH^\top + R \big).$$
Lemma 2. 
If the random variables $X$ and $Y$ follow the joint Gaussian probability distribution
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} a \\ b \end{pmatrix}, \begin{pmatrix} A & C \\ C^\top & B \end{pmatrix} \right),$$
then the marginal and conditional distributions of $X$ and $Y$ are as follows:
$$X \sim \mathcal{N}(a, A), \qquad Y \sim \mathcal{N}(b, B),$$
$$X \mid Y \sim \mathcal{N}\big( a + C B^{-1} (Y - b),\, A - C B^{-1} C^\top \big),$$
$$Y \mid X \sim \mathcal{N}\big( b + C^\top A^{-1} (X - a),\, B - C^\top A^{-1} C \big).$$
Now we demonstrate Theorems 1 and 2 in turn, which provide the Kalman filter and smoother equations for $q(x_{0:T})$ in the time-varying HO-LDS.
Theorem 1. 
The Kalman filter equations for the time-varying HO-LDS (Equation (43)) parameterized by $\hat{\theta}_{x_{0:T}}$ can be evaluated in closed form, resulting in the Gaussian distributions:
$$p(x_t \mid y_{1:t-1}) = \mathcal{N}\big( x_t \mid \mu_t^{(p)}, \Sigma_t^{(p)} \big),$$
$$p(x_t \mid y_{1:t}) = \mathcal{N}\big( x_t \mid \mu_t^{(u)}, \Sigma_t^{(u)} \big),$$
$$p(\bar{x}_t \mid y_{1:t}) = \mathcal{N}\big( \bar{x}_t \mid \mu_t^{(m)}, \Sigma_t^{(m)} \big).$$
The distribution parameters can be calculated using the Kalman filter prediction, update and merging steps detailed in the following proof. The recursion begins with the prior mean $\mu_0^{(m)}$ and covariance $\Sigma_0^{(m)}$ (i.e., $\hat{\mu}_0$ and $\hat{\Sigma}_0$).
Proof. 
The Gaussian filter distributions (Equations (50)–(52)) can be obtained by the following steps:
1. Prediction step. By Lemma 1, the joint distribution of $x_t$ and $\bar{x}_{t-1}$ given $y_{1:t-1}$ is
$$p(x_t, \bar{x}_{t-1} \mid y_{1:t-1}) = p(x_t \mid \bar{x}_{t-1})\, p(\bar{x}_{t-1} \mid y_{1:t-1}) = \mathcal{N}\big( x_t \mid \hat{A}_t \bar{x}_{t-1}, \hat{Q}_t \big)\, \mathcal{N}\big( \bar{x}_{t-1} \mid \mu_{t-1}^{(m)}, \Sigma_{t-1}^{(m)} \big) = \mathcal{N}\!\left( \begin{pmatrix} \bar{x}_{t-1} \\ x_t \end{pmatrix} \,\Big|\, \mu^{(1)}, \Sigma^{(1)} \right),$$
where
$$\mu^{(1)} = \begin{pmatrix} \mu_{t-1}^{(m)} \\ \hat{A}_t \mu_{t-1}^{(m)} \end{pmatrix}, \qquad \Sigma^{(1)} = \begin{pmatrix} \Sigma_{t-1}^{(m)} & \Sigma_{t-1}^{(m)} \hat{A}_t^\top \\ \hat{A}_t \Sigma_{t-1}^{(m)} & \hat{A}_t \Sigma_{t-1}^{(m)} \hat{A}_t^\top + \hat{Q}_t \end{pmatrix},$$
and by Lemma 2 the marginal distribution of $x_t$ is given by
$$p(x_t \mid y_{1:t-1}) = \mathcal{N}\big( x_t \mid \mu_t^{(p)}, \Sigma_t^{(p)} \big),$$
where
$$\mu_t^{(p)} = \hat{A}_t \mu_{t-1}^{(m)}, \qquad \Sigma_t^{(p)} = \hat{A}_t \Sigma_{t-1}^{(m)} \hat{A}_t^\top + \hat{Q}_t.$$
2. Update step. By Lemma 1, the joint distribution of $y_t$ and $x_t$ given $y_{1:t-1}$ is
$$p(x_t, y_t \mid y_{1:t-1}) = p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}) = \mathcal{N}\big( y_t \mid \hat{C}_t x_t, \hat{S}_t \big)\, \mathcal{N}\big( x_t \mid \mu_t^{(p)}, \Sigma_t^{(p)} \big) = \mathcal{N}\!\left( \begin{pmatrix} x_t \\ y_t \end{pmatrix} \,\Big|\, \mu^{(2)}, \Sigma^{(2)} \right),$$
where
$$\mu^{(2)} = \begin{pmatrix} \mu_t^{(p)} \\ \hat{C}_t \mu_t^{(p)} \end{pmatrix}, \qquad \Sigma^{(2)} = \begin{pmatrix} \Sigma_t^{(p)} & \Sigma_t^{(p)} \hat{C}_t^\top \\ \hat{C}_t \Sigma_t^{(p)} & \hat{C}_t \Sigma_t^{(p)} \hat{C}_t^\top + \hat{S}_t \end{pmatrix}.$$
By Lemma 2, the conditional distribution of $x_t$ given $y_{1:t}$ is
$$p(x_t \mid y_t, y_{1:t-1}) = p(x_t \mid y_{1:t}) = \mathcal{N}\big( x_t \mid \mu_t^{(u)}, \Sigma_t^{(u)} \big),$$
where
$$\mu_t^{(u)} = \mu_t^{(p)} + \Sigma_t^{(p)} \hat{C}_t^\top \big( \hat{C}_t \Sigma_t^{(p)} \hat{C}_t^\top + \hat{S}_t \big)^{-1} \big( y_t - \hat{C}_t \mu_t^{(p)} \big),$$
$$\Sigma_t^{(u)} = \Sigma_t^{(p)} - \Sigma_t^{(p)} \hat{C}_t^\top \big( \hat{C}_t \Sigma_t^{(p)} \hat{C}_t^\top + \hat{S}_t \big)^{-1} \hat{C}_t \Sigma_t^{(p)}.$$
3. Merging step. By Lemma 1, the joint distribution of x ¯ t 1 , x t , y t given y 1 : t 1 is
p ( x ¯ t 1 , x t , y t | y 1 : t 1 ) = p ( y t | x t ) p ( x t , x ¯ t 1 | y 1 : t 1 ) = N ( C ^ t x t , S ^ t ) N x ¯ t 1 x t μ ( 1 ) , Σ ( 1 ) , = N x ¯ t 1 x t y t μ ( 3 ) , Σ ( 3 ) ,
where
μ ( 3 ) = μ 1 ( 3 ) μ 2 ( 3 ) μ 3 ( 3 ) = μ t 1 ( m ) A ^ t μ t 1 ( m ) C ^ t A ^ t μ t 1 ( m ) ,
Σ ( 3 ) = Σ 11 ( 3 ) Σ 12 ( 3 ) Σ 13 ( 3 ) Σ 21 ( 3 ) Σ 22 ( 3 ) Σ 23 ( 3 ) Σ 31 ( 3 ) Σ 32 ( 3 ) Σ 33 ( 3 ) = Σ t 1 ( m ) Σ t 1 ( m ) A ^ t Σ t 1 ( m ) A ^ t C ^ t A ^ t Σ t 1 ( m ) A ^ t Σ t 1 ( m ) A ^ t + Q ^ t ( A ^ t Σ t 1 ( m ) A ^ t + Q ^ t ) C ^ t C ^ t A ^ t Σ t 1 ( m ) C ^ t ( A ^ t Σ t 1 ( m ) A ^ t + Q ^ t ) C ^ t ( A ^ t Σ t 1 ( m ) A ^ t + Q ^ t ) C ^ t + S ^ t .
By Lemma 2, the conditional distribution of x ¯ t 1 and x t given y 1 : t is
p ( x ¯ t 1 , x t | y t , y 1 : t 1 ) = p ( x ¯ t 1 , x t | y 1 : t ) = N ( x ¯ t 1 , x t | μ ( 4 ) , Σ ( 4 ) ) ,
where
μ ( 4 ) = μ 1 ( 3 ) μ 2 ( 3 ) + Σ 13 ( 3 ) Σ 23 ( 3 ) ( Σ 33 ( 3 ) ) 1 ( y t μ 3 ( 3 ) ) ,
Σ ( 4 ) = Σ 11 ( 3 ) Σ 13 ( 3 ) Σ 23 ( 3 ) ( Σ 33 ( 3 ) ) 1 Σ 13 ( 3 ) Σ 23 ( 3 ) .
Finally, the marginal distribution of x ¯ t is
p ( x ¯ t | y 1 : t ) = N ( x ¯ t | μ t ( m ) , Σ t ( m ) ) = N ( x ¯ t | μ ( 4 ) ( x ¯ t ) , Σ ( 4 ) ( x ¯ t ) ) ,
where μ ( 4 ) ( x ¯ t ) and Σ ( 4 ) ( x ¯ t ) denote the marginal mean vector and covariance matrix of x ¯ t extracted from μ ( 4 ) and Σ ( 4 ) .    □
The Kalman filter for the time-varying HO-LDS in Theorem 1 calculates the estimates based on the measurements collected up to and including time step t. After obtaining the filtering posterior state distributions, Theorem 2 outlines the results for calculating the marginal posterior distributions at each time step based on all data up to the final time step T, as follows:
Theorem 2. 
The Kalman smoother equations for the time-varying HO-LDS (Equation (43)) parameterized by θ ^ x 0 : T can be evaluated in closed form, which results in the Gaussian distributions:
p ( x t | y 1 : T ) = N ( x t | μ t ( s 1 ) , Σ t ( s 1 ) ) ,
p ( x ¯ t | y 1 : T ) = N ( x ¯ t | μ t ( s 2 ) , Σ t ( s 2 ) ) ,
p ( x ¯ t 1 , x t | y 1 : T ) = N ( ( x ¯ t 1 , x t ) | μ t ( s 3 ) , Σ t ( s 3 ) ) .
The smoothing distribution and filtering distribution of the last time step are identical, allowing for the recursive computation of the smoothing distribution for all time steps by beginning from the last step and moving backward.
Proof. 
Firstly by Lemma 1, the joint distribution of x ¯ t 1 and x t given y 1 : t 1 is
p ( x t , x ¯ t 1 | y 1 : t 1 ) = p ( x t | x ¯ t 1 ) p ( x ¯ t 1 | y 1 : t 1 ) = N ( x t | A ^ t x ¯ t 1 , Q ^ t ) N ( x ¯ t 1 | μ t 1 ( m ) , Σ t 1 ( m ) ) = N x ¯ t 1 x t μ ( 1 ) , Σ ( 1 ) ,
where
μ ( 1 ) = μ t 1 ( m ) A ^ t μ t 1 ( m ) , Σ ( 1 ) = Σ t 1 ( m ) Σ t 1 ( m ) A ^ t A ^ t Σ t 1 ( m ) A ^ t Σ t 1 ( m ) A ^ t + Q ^ t .
According to the Markov property of the states, we obtain
p ( x ¯ t 1 | x t , y 1 : T ) = p ( x ¯ t 1 | x t , y 1 : t 1 ) .
Therefore, we have the conditional distribution as
p ( x ¯ t 1 | x t , y 1 : T ) = p ( x ¯ t 1 | x t , y 1 : t 1 ) = N ( x ¯ t 1 | μ ˜ ( 2 ) , Σ ˜ ( 2 ) ) ,
where
G t = Σ t 1 ( m ) A ^ t ( A ^ t Σ t 1 ( m ) A ^ t + Q ^ t ) 1 , μ ˜ ( 2 ) = μ t 1 ( m ) + G t ( x t A ^ t μ t 1 ( m ) ) , Σ ˜ ( 2 ) = Σ t 1 ( m ) G t A ^ t Σ t 1 ( m ) .
Then, the joint distribution of x ¯ t 1 and x t given all the data is
p ( x ¯ t 1 , x t | y 1 : T ) = p ( x ¯ t 1 | x t , y 1 : T ) p ( x t | y 1 : T ) = N ( x ¯ t 1 | μ ˜ ( 2 ) , Σ ˜ ( 2 ) ) N ( x t | μ t ( s 1 ) , Σ t ( s 1 ) ) = N x ¯ t 1 x t μ t ( s 3 ) , Σ t ( s 3 ) ,
where
μ t ( s 3 ) = G t μ t ( s 1 ) + μ t 1 ( m ) G t A ^ t μ t 1 ( m ) μ t ( s 1 ) ,
Σ t ( s 3 ) = G t Σ t ( s 1 ) G t + Σ ˜ ( 2 ) Σ t ( s 1 ) G t G t Σ t ( s 1 ) Σ t ( s 1 ) .
Thus, the marginal distribution of x ¯ t 1 is given as
p ( x ¯ t 1 | y 1 : T ) = N ( x ¯ t 1 | μ t ( s 2 ) , Σ t ( s 2 ) ) ,
where
μ t ( s 2 ) = G t μ t ( s 1 ) + μ t 1 ( m ) G t A ^ t μ t 1 ( m ) ,
Σ t ( s 2 ) = G t Σ t ( s 1 ) G t + Σ ˜ ( 2 ) .
The marginal p ( x t 1 | y 1 : T ) can then be obtained directly from p ( x ¯ t 1 | y 1 : T ) . Therefore, we have derived the concrete expressions of the Gaussian smoother distributions (Equations (69)–(71)).    □
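The backward recursion above amounts to one Rauch-Tung-Striebel smoothing step per time index; a minimal NumPy sketch (names ours):

```python
import numpy as np

def rts_smooth_step(mu_m, Sigma_m, mu_s_next, Sigma_s_next, A_hat, Q_hat):
    """One backward smoothing step, following the expressions for G_t,
    mu_t^(s2), and Sigma_t^(s2) above: combine the filtered moments at t-1
    with the smoothed moments at t."""
    Sigma_pred = A_hat @ Sigma_m @ A_hat.T + Q_hat
    G = Sigma_m @ A_hat.T @ np.linalg.inv(Sigma_pred)   # smoother gain G_t
    mu_s = mu_m + G @ (mu_s_next - A_hat @ mu_m)
    # G Sigma_s_next G^T + Sigma_tilde^(2), with Sigma_tilde^(2) as above
    Sigma_s = G @ Sigma_s_next @ G.T + (Sigma_m - G @ A_hat @ Sigma_m)
    return mu_s, Sigma_s

# scalar demo with A = Q = Sigma_m = 1, so G = 1/2
mu_s, Sigma_s = rts_smooth_step(np.zeros(1), np.eye(1), 2 * np.ones(1),
                                np.eye(1), np.eye(1), np.eye(1))
```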
Thus, we can obtain the posterior distributions of q ( x 0 : T | y 1 : T ) as well as the expectations involved in the update for the other factors, which are E q ( x 0 : T ) ( x t ) , E q ( x 0 : T ) ( x t x t ) and E q ( x 0 : T ) ( x ¯ t x t + 1 ) .

3.2. Update for q ( ω 1 : T )

q ( ω 1 : T ) can be updated in closed form using the Pólya-gamma augmentation. Combining Equations (37) and (41) results in the following update for q ( ω 1 : T ) = t = 1 T k = 1 K 1 q ( ω t , k ) :
ln q ( ω t , k ) E q ( ω 1 : T ) ln p ( z t + 1 , ω t | x t ) E q ( ω 1 : T ) ln p ( z t + 1 | ω t , x t ) + E q ( ω 1 : T ) ln PG ( ω t , k | I ( z t + 1 > k ) , 0 ) .
Since E q ( ω 1 : T ) ln p ( z t + 1 | ω t , x t ) contributes the term − 1 2 E q ( ω 1 : T ) ( ν t , k 2 ) ω t , k , we have
q ( ω t , k ) = PG ( ω t , k | E q ( z 1 : T ) I ( z t + 1 > k ) , E q ( ω 1 : T ) ( ν t , k 2 ) ) .
Moreover, the expectation of the posterior Pólya-gamma distribution is available in closed form [22]:
E ( ω t , k ) = ( E q ( z 1 : T ) I ( z t + 1 > k ) / ( 2 c t , k ) ) tanh ( c t , k / 2 ) , where c t , k : = ( E q ( z 1 : T ) q ( x 1 : T ) q ( θ ) ( ν t , k 2 ) ) 1 / 2 .
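The closed-form expectation above is the standard Pólya-gamma mean, E[PG(b, c)] = (b / 2c) tanh(c / 2), with b the expected indicator and c the square root of the expected ν². A minimal sketch (function name ours):

```python
import numpy as np

def pg_mean(b, c):
    """Mean of a Polya-gamma PG(b, c) variable:
    E[omega] = (b / (2 c)) * tanh(c / 2), with the limit b / 4 as c -> 0."""
    c = np.asarray(c, dtype=float)
    c_safe = np.where(np.abs(c) < 1e-8, 1.0, c)  # avoid division by zero
    return np.where(np.abs(c) < 1e-8, b / 4.0,
                    (b / (2.0 * c_safe)) * np.tanh(c_safe / 2.0))
```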

3.3. Update for q ( z 1 : T )

Since E q ( z 1 : T ) ln p ( θ ) is not a function of z 1 : T , Equation (35) can be written as
ln q ( z 1 : T ) E q ( z 1 : T ) ln p ( z 1 : T , x 0 : T , ω 1 : T , y 1 : T | θ ) E q ( z 1 : T ) ln p ( z 1 | θ ) + t = 2 T ln p ( z t | x t 1 , z t 1 , ω t 1 , θ ) + t = 1 T ln p ( x t | z t , θ ) .
From Equation (86) it is evident that the resulting factor q ( z 1 : T ) has the form of an HMM parameterized by q ( x 1 : T , ω 1 : T , θ ) , and the expected sufficient statistics required for updating the other factors can be obtained by message passing algorithms. We define ln π ˜ z 1 : = E q ( z 1 : T ) ln p ( z 1 | θ ) , ln a ˜ z t z t + 1 : = E q ( z 1 : T ) ln p ( z t + 1 | z t , x t , ω t , θ ) , ln b ˜ z t ( x t ) : = E q ( z 1 : T ) ln p ( x t | z t , θ ) and the set of the HMM parameters is denoted as θ ^ z 1 : T . Thus, we have
ln q ( z 1 : T ) ln π ˜ z 1 + t = 2 T ln a ˜ z t 1 z t ( t 1 ) + t = 1 T ln b ˜ z t ( x t ) .
Since the densities p ( z 1 | θ ) , p ( z t + 1 | z t , x t , ω t , θ ) , and p ( x t | z t , θ ) belong to the exponential family, these expectations can be evaluated in closed form. The computation of the HMM parameters by Equation (86) is summarized in Algorithm 2.
We apply a standard forward-backward algorithm for an HMM where the forward recursion is:
α t ( i ) : = p ( z t = i | x 1 : t ) ∝ b ˜ i ( x t ) j = 1 K α t 1 ( j ) a ˜ j i ( t 1 ) ,
where α 1 ( i ) π ˜ i b ˜ i ( x 1 ) and the backward recursion is:
β t ( i ) : = p ( x t + 1 : T | z t = i ) ∝ j = 1 K β t + 1 ( j ) b ˜ j ( x t + 1 ) a ˜ i j ( t ) .
Algorithm 2: Obtaining θ ^ z 1 : T
Then we can obtain the sufficient statistics used in the update for the other factors using the following equations:
γ t ( i ) : = p ( z t = i | x 1 : T ) ∝ α t ( i ) β t ( i ) ,
ξ t ( i , j ) : = p ( z t = i , z t + 1 = j | x 1 : T ) ∝ α t ( i ) a ˜ i j ( t ) b ˜ j ( x t + 1 ) β t + 1 ( j ) .
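The normalized forward-backward recursions above can be sketched as follows; the time-varying transitions ã(t) enter as a (T−1, K, K) array, and all names are illustrative.

```python
import numpy as np

def forward_backward(pi0, A, B):
    """Normalized HMM forward-backward, matching the recursions above.
    pi0: (K,) initial probs; A: (T-1, K, K) time-varying transitions
    a~_{ij}(t); B: (T, K) state likelihoods b~_i(x_t).
    Returns gamma_t(i) and xi_t(i, j)."""
    T, K = B.shape
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi0 * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                       # forward pass
        alpha[t] = B[t] * (alpha[t - 1] @ A[t - 1])
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A[t] @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = alpha[:-1, :, None] * A * (B[1:] * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    return gamma, xi

# demo: with uniform inputs the posteriors are uniform too
gamma, xi = forward_backward(np.full(2, 0.5), np.full((3, 2, 2), 0.5),
                             np.ones((4, 2)))
```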

3.4. Update for q ( θ )

Given the sufficient statistics of q ( z 1 : T ) and q ( x 1 : T ) obtained by message passing, the conjugate posterior distribution of the model parameters can be updated in closed form by evaluating Equation (38). For each parameter, we present the optimized variational distribution in the following:
  • Update q ( π 0 ) . Simplifying Equation (38) and extracting the terms related to π 0 , we have
    ln q ( π 0 ) E q ( θ ) ln p ( z 1 | θ ) + ln p ( π 0 ) ,
    which results in
    q ( π 0 ) Dir ( π 0 | α 0 ) ,
    where α 0 = α 0 , 1 , , α 0 , k , , α 0 , K R K and
    α 0 , k = α 0 , k + γ 1 ( k ) ,
    α 0 , k are prior parameters of p ( π 0 ) .
  • Update ( μ 0 , Σ 0 ) . The posterior q ( μ 0 , Σ 0 ) can be obtained by the expectations of q ( x 0 : T ) , given by
    μ 0 E q ( x 0 : T ) x ¯ 0 ,
    Σ 0 E q ( x 0 : T ) x ¯ 0 x ¯ 0 μ 0 ( μ 0 ) .
  • Update q ( A k , Q k ) . Simplifying Equation (38) and extracting the terms related to ( A k , Q k ) , we have
    ln q ( A k , Q k ) E q ( θ ) t = 1 T γ t ( k ) 2 ln | Q k | 1 2 t = 1 T γ t ( k ) ( x t A k x ¯ t 1 ) Q k 1 ( x t A k x ¯ t 1 ) + ln p ( A k , Q k ) ,
    which results in
    q ( A k , Q k ) MNIW ( A k , Q k | η k ) ,
    where η k : = M k ( η ) , Φ k ( η ) , ι k ( η ) , Δ k ( η ) ,
    q ( A k , Q k ) | Q k | ι k ( η ) + ( p + 1 ) m + 1 2 exp { 1 2 Tr ( Δ k ( η ) Q k 1 ) 1 2 Tr ( Φ k ( η ) ) 1 ( A k M k ( η ) ) Q k 1 ( A k M k ( η ) ) } .
    The variational posterior parameters can be optimized as
    Φ k ( η ) = ( Φ k ( η ) ) 1 + t = 1 T E q ( x 0 : T ) ( γ t ( k ) x ¯ t 1 x ¯ t 1 ) 1 , M k ( η ) = M k ( η ) ( Φ k ( η ) ) 1 + t = 1 T E q ( x 0 : T ) ( γ t ( k ) x t x ¯ t 1 ) Φ k ( η ) , Δ k ( η ) = Δ k ( η ) + t = 1 T E q ( x 0 : T ) ( γ t ( k ) x t x t ) + M k ( η ) ( Φ k ( η ) ) 1 ( M k ( η ) ) M k ( η ) ( Φ k ( η ) ) 1 ( M k ( η ) ) , ι k ( η ) = ι k ( η ) + t = 1 T γ t ( k ) ,
    where M k ( η ) , Φ k ( η ) , ι k ( η ) , Δ k ( η ) are prior parameters of p ( A k , Q k ) .
  • Update q ( C k , S k ) . Simplifying Equation (38) and extracting the terms related to ( C k , S k ) , we have
    ln q ( C k , S k ) E q ( θ ) t = 1 T γ t ( k ) 2 ln | S k | 1 2 t = 1 T ( y t C k x t ) S k 1 ( y t C k x t ) + ln p ( C k , S k ) ,
    which results in
    q ( C k , S k ) MNIW ( C k , S k | λ k ) ,
    where λ k : = M k ( λ ) , Φ k ( λ ) , ι k ( λ ) , Δ k ( λ ) ,
    q ( C k , S k ) | S k | ι k ( λ ) + ( p + 1 ) m + 1 2 exp { 1 2 Tr ( Δ k ( λ ) S k 1 ) 1 2 Tr ( Φ k ( λ ) ) 1 ( C k M k ( λ ) ) Q k 1 ( C k M k ( λ ) ) } .
    The variational posterior parameters can be optimized as
    Φ k ( λ ) = ( Φ k ( λ ) ) 1 + t = 1 T E q ( x 0 : T ) ( γ t ( k ) x t x t ) 1 , M k ( λ ) = M k ( λ ) ( Φ k ( λ ) ) 1 + t = 1 T E q ( x 0 : T ) ( γ t ( k ) y t x t ) Φ k ( λ ) , Δ k ( λ ) = Δ k ( λ ) + t = 1 T γ t ( k ) y t y t + M k ( λ ) ( Φ k ( λ ) ) 1 ( M k ( λ ) ) M k ( λ ) ( Φ k ( λ ) ) 1 ( M k ( λ ) ) , ι k ( λ ) = ι k ( λ ) + t = 1 T γ t ( k ) ,
    where M k ( λ ) , Φ k ( λ ) , ι k ( λ ) , Δ k ( λ ) are prior parameters of p ( C k , S k ) .
  • Update q ( R k ) . Simplifying Equation (38) and extracting the terms related to R k , we have
    ln q ( R k ) E q ( θ ) 1 2 t = 1 T 1 γ t ( k ) ( R k x t + r k Ω t 1 κ t + 1 ) Ω t ( R k x t + r k Ω t 1 κ t + 1 ) + i = 1 K 1 ln p ( R k , i ) ,
    which results in
    q ( R k ) i = 1 K 1 N ( R k , i | ζ k , i ) ,
    where ζ k , i : = μ k , i ( ζ ) , Σ k , i ( ζ ) , which can be optimized as
    Σ k , i ( ζ ) = Σ k , i ( ζ ) 1 + t = 1 T E q ( ω 1 : T , x 0 : T ) x t γ t ( k ) Ω t , i x t 1 ,
    μ k , i ( ζ ) = Σ k , i ( ζ ) t = 1 T 1 E q ( ω 1 : T ) q ( x 0 : T ) γ t ( k ) Ω t , i H t , k , i x t + Σ k , i ( ζ ) 1 μ k , i ( ζ ) ,
    H t , k = [ H t , k , 1 , , H t , k , i , , H t , k , K 1 ] = E q ( r k ) q ( ω 1 : T ) r k Ω t 1 κ t + 1 ,
    and μ k , i ( ζ ) , Σ k , i ( ζ ) are prior parameters of p ( R k , i ) . To derive Equation (107), just note that R k Ω t R k = i = 1 K 1 R k , i Ω k , i R k , i and use the matrix trace.
  • Update q ( r k ) . Simplifying Equation (38) and extracting the terms related to r k , we have
    ln q ( r k ) E q ( θ ) 1 2 t = 1 T 1 γ t ( k ) ( R k x t + r k Ω t 1 κ t + 1 ) Ω t ( R k x t + r k Ω t 1 κ t + 1 ) + ln p ( r k ) ,
    which results in
    q ( r k ) N ( r k | ρ k ) ,
    where ρ k : = μ k ( ρ ) , Σ k ( ρ ) , which can be optimized as
    Σ k ( ρ ) = Σ k ( ρ ) 1 + t = 1 T 1 E q ( ω 1 : T ) γ t ( k ) Ω t 1 ,
    μ k ( ρ ) = Σ k ( ρ ) Σ k ( ρ ) 1 μ k ( ρ ) + t = 1 T 1 E q ( R k ) q ( x 0 : T ) q ( ω 1 : T ) γ t ( k ) Ω t R k x t Ω t 1 κ t + 1 ,
    where μ k ( ρ ) , Σ k ( ρ ) are prior parameters of p ( r k ) .
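As an illustration of the conjugate updates above, here is a sketch of the MNIW posterior-parameter update for ( A k , Q k ); the statistic arguments (`XbarXbar`, `XXbar`, `XX`) stand for the γ-weighted expected sufficient statistics and the naming is ours, not the paper's.

```python
import numpy as np

def mniw_update(M0, Phi0, iota0, Delta0, gamma_k, XbarXbar, XXbar, XX):
    """Variational MNIW posterior update for (A_k, Q_k), following the
    optimized parameters above. gamma_k is the (T,) responsibility vector;
    XbarXbar = sum_t E[gamma_t(k) xbar_{t-1} xbar_{t-1}^T],
    XXbar    = sum_t E[gamma_t(k) x_t xbar_{t-1}^T],
    XX       = sum_t E[gamma_t(k) x_t x_t^T]."""
    Phi0_inv = np.linalg.inv(Phi0)
    Phi = np.linalg.inv(Phi0_inv + XbarXbar)
    M = (M0 @ Phi0_inv + XXbar) @ Phi
    Delta = Delta0 + XX + M0 @ Phi0_inv @ M0.T - M @ np.linalg.inv(Phi) @ M.T
    iota = iota0 + gamma_k.sum()
    return M, Phi, iota, Delta

# sanity check: with zero statistics the posterior collapses to the prior
M, Phi, iota, Delta = mniw_update(np.eye(2), np.eye(2), 3.0, np.eye(2),
                                  np.zeros(5), np.zeros((2, 2)),
                                  np.zeros((2, 2)), np.zeros((2, 2)))
```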

3.5. Initialization

To obtain reliable variational inference, we initialize the model parameters and latent states with reasonable values using the following procedure:
  • Probabilistic Principal Component Analysis (PPCA) [27] is conducted on the data, to initialize the continuous latent variables, x 0 : T and the parameters C;
  • Fit an AR(p)-HMM to initialize the switching states, z 1 : T , and the parameters, A k , Q k . The autoregressive order, p, is determined by the AIC, BIC, and HQ criteria;
  • To alleviate the possible and undesirable dependence on ordering that arises from the stick-breaking formulation during the inference, we adopt a strategy of greedily fitting a decision list [21] to identify the most suitable permutation of the switching states for the stick-breaking process. Specifically, we start by performing a greedy search on permutations by creating a decision list based on ( x t , z t ) , z t + 1 pairs, given by
    z t + 1 = o 1 if p 1 ; o 2 if ¬ p 1 ∧ p 2 ; … ; o K if ∧ i = 1 K − 1 ¬ p i ,
    where ( o 1 , … , o K ) is a permutation of ( 1 , … , K ) , and p 1 , … , p K − 1 are predicates that depend on ( x t , z t ) and return true or false. In our framework, these predicates are defined by logistic functions:
    p j = σ R 0 , j x t > ε ,
    where ε ∈ ( 0 , 1 ) should be predetermined. To determine o 1 and r 1 , we used the maximum a posteriori estimate of the model for each of the K potential states. For the kth logistic regression, the inputs are x 0 : T and the outputs are y t = I ( z t + 1 = k ) . We chose the logistic regression model with the largest likelihood as the first output. We then excluded the time points where z t + 1 = o 1 from the data and proceeded with K − 1 logistic regressions to predict the subsequent output, o 2 , and so on. After cycling through all K outputs, we obtained the permutation of z 1 : T . In addition, R 0 , j j = 1 K − 1 served as an initialization of the recurrence weights (R).

4. Numerical Experiments

To investigate the performance of the proposed HO-rSLDS, we generated synthetic datasets through the following steps:
  • Benchmark model settings. We take the AR-HSMM as a benchmark model to generate synthetic time series. Specifically, the dimensionality of the synthetic time series is n = 5 , and the total length is T = 1000 . The state number is K = 3 , and the autoregressive order p takes the values 1 , 2 , 3 .
  • Generating ( A k , Q k ) . ( A k , Q k ) corresponds to the emission parameters involved in the VAR process of latent variables x t (Equation (11)). The autoregressive coefficient matrices A k are generated with 50% sparsity (defined as the proportion of the zero elements), i.e., 50% of the elements in A k are 0. The non-zero elements of A k are generated to be positive or negative with equal probability, and their absolute value is sampled uniformly between 0.2 and 0.5. The covariance matrices Q k are generated as Q k = P k diag σ 1 , , σ d P k , with P k being an orthogonal matrix and σ i assumed positive. The matrix P k is constructed by orthogonalizing a random matrix whose entries are simulated from a standard Normal, while each σ i is uniformly sampled in the interval [ 1 , 3 ] . Additionally, a rejection step is done to check that the sampled ( A k , Q k ) constituted a stable VAR process.
  • Generating switching state sequence z 1 : T . To generate z 1 : T , we set the state transition probabilities π j k to 0.5 for j ≠ k , j , k = 1 , 2 , 3 . Note that in an HSMM, the self-transition probability is 0. Furthermore, we simulate the state duration time in two cases. In the first case, the state duration time is sampled from a geometric distribution (the between-state transition probabilities are set to 0.1). In the second case, the state duration time is directly sampled from { 1 , … , 10 } with specified sampling probabilities. Since the second case does not correspond to a parametric distribution, we denote these two cases as geometric and nonparametric, respectively. In addition, the initial state is uniformly sampled from { 1 , 2 , 3 } .
  • Generating ( C k , S k ) . ( C k , S k ) corresponds to the emission parameters generating the Gaussian observations y t (Equation (12)). We generate ( C k , S k ) in the same manner as ( A k , Q k ) .
  • Repeat the above procedure 100 times to obtain 100 synthetic data by setting different random seeds when generating ( A k , Q k , C k , S k ) .
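The coefficient-sampling and rejection steps above can be sketched as follows; the stability check uses the VAR companion matrix, and all names are ours, not the paper's code.

```python
import numpy as np

def sample_stable_var(d=5, p=2, sparsity=0.5, rng=None, max_tries=1000):
    """Sample sparse VAR(p) coefficient matrices (A_1, ..., A_p) with
    nonzero magnitudes uniform in [0.2, 0.5] and random signs, rejecting
    draws until the companion matrix has spectral radius < 1."""
    rng = np.random.default_rng(rng)
    for _ in range(max_tries):
        A = rng.uniform(0.2, 0.5, size=(p, d, d))
        A *= rng.choice([-1.0, 1.0], size=A.shape)   # random signs
        A *= rng.random(size=A.shape) > sparsity     # ~50% zero entries
        # stability of VAR(p) <=> companion-matrix eigenvalues inside unit circle
        comp = np.zeros((d * p, d * p))
        comp[:d] = np.concatenate(list(A), axis=1)
        comp[d:, :-d] = np.eye(d * (p - 1))
        if np.max(np.abs(np.linalg.eigvals(comp))) < 1.0:
            return A
    raise RuntimeError("no stable draw found")

A = sample_stable_var(d=3, p=2, rng=0)
```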
Figure 4 displays all dimensions and the true state sequence of one synthetic dataset.
We investigated the performance of HO-rSLDS by comparing with the competing models, while the estimation accuracy is evaluated by the normalized mean squared error (NMSE) between the true and estimated emission model parameters and the signal-to-noise ratio (SNR) of the inferred optimized switching state sequence ( z ^ 1 : T ), given by
NMSE = 1 K ∑ k = 1 K ‖ C ^ k − C k ‖ F / ‖ C k ‖ F ,
SNR = 1 T ∑ t = 1 T I ( z t = z ^ t ) ,
where ‖ · ‖ F denotes the Frobenius norm, and C ^ k and z ^ 1 : T are the estimated values. For AR-HMM, the NMSE is evaluated based on the true and estimated VAR emission parameters. To obtain z ^ 1 : T , we set z ^ t = arg max k γ t ( k ) . In addition, we avoid optimizing over the discrete state number (K) and the dependence order (p) in HO-rSLDS by setting them directly to their true values.
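The two evaluation metrics are straightforward to compute; note that the quantity the paper calls SNR is the fraction of correctly recovered discrete states.

```python
import numpy as np

def nmse(C_true, C_hat):
    """Average normalized Frobenius error over the K states."""
    return np.mean([np.linalg.norm(Ch - Ct) / np.linalg.norm(Ct)
                    for Ct, Ch in zip(C_true, C_hat)])

def state_accuracy(z_true, z_hat):
    """Fraction of time steps whose discrete state is recovered correctly
    (the quantity reported as SNR above)."""
    return np.mean(np.asarray(z_true) == np.asarray(z_hat))
```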
The results of the estimated NMSE and SNR for the three models are summarized in Table 1 and Table 2, respectively. HO-rSLDS outperforms AR-HMM and SLDS, with a lower average NMSE and a higher average SNR for all combinations of autoregressive order and state duration distribution. In addition, when p = 1 , for both geometric and nonparametric state durations, SLDS outperforms AR-HMM slightly, which may be because the synthetic data are generated by a simulated extensive SLDS structure. However, when p > 1 and long-term dependence is involved in the latent variables or observation data, the ordering reverses, since an AR-HMM with higher p can capture the temporal relationships in the observation data while SLDS cannot. Moreover, when the transition of the underlying discrete states is characterized by a nonparametric duration distribution, the estimation accuracy of AR-HMM and SLDS is reduced, while HO-rSLDS remains stable with good estimation accuracy. This is because state transitions in both AR-HMM and SLDS follow a Markov assumption and are restricted to geometric duration distributions. Thus, more weight is given to short consecutive periods within a state, so the regimes switch frequently, which may be inappropriate without sufficient prior evidence for a geometric state duration. By comparison, HO-rSLDS is robust to the state duration distribution, since it improves the state transition by leveraging the stick-breaking logistic regression: the current state z t depends not only on the preceding state z t 1 , but also on the latent variable x t 1 and, through it, the previous observations.
A more exhaustive comparison is illustrated in Figure 5 where HO-rSLDS outperforms the competing models with better estimation accuracy and statistical significance in all the cases (t-test, ** p < 0.01 ).

5. Dynamic Functional Connectivity Analysis in fMRI Data

To show that HO-rSLDS also works well with real-world data, we evaluate its performance against standard SLDS on public functional magnetic resonance imaging (fMRI) data. The data are from the Human Connectome Project (HCP) 1200 Parcellation+Timeseries+Netmats (HCP1200-PTN) dataset (available at https://www.humanconnectome.org, accessed on 6 March 2024). Please see [28] for a detailed explanation of the entire acquisition protocol. In this study, we apply HO-rSLDS to data from 20 unrelated subjects, with dimension d = 15 and time series length T 0 = 4800 for each subject included in the HCP1200-PTN release, to perform dynamic functional connectivity (DFC) analysis, which can be used to analyze the dynamic temporal coherence among endogenous fluctuations in distributed brain regions [29,30]. In general, HO-rSLDS and SLDS assume that there are metastable states with characteristic connectivity patterns in the brain, and the connectivity pattern can be represented directly by statistical correlations. Specifically, by Lemma 1, the functional connectivity pattern of state k in HO-rSLDS and SLDS can be obtained as
DFC k = [ ∑ t = 1 T γ t ( k ) ( E ( C k ) ( E ( x t x t ) − E ( x t ) E ( x t ) ) E ( C k ) + E ( S k ) ) ] / ∑ t = 1 T γ t ( k ) .
In addition, there are two unknown parameters: the state number K and the dependence order p. To choose K, we use the mixture minimum description length (MMDL), which accounts for the optimal code length of each state in a mixture model [31], given by
MMDL ( K ) : = − ln L ( K ) + 1 2 K ( K − 1 ) ln ( T ) + ∑ k = 1 K d 2 2 ln ( T γ ¯ k ) ,
where L ( K ) is the variational lower bound evaluated during model inference and γ ¯ k : = ∑ t = 1 T γ t ( k ) . The concept of the MMDL is elucidated by the minimum description length principle in [32]. The state number K is determined by minimizing the MMDL over a specified list of candidate values. In addition, we fit a VAR model to the fMRI time series, and the optimal dependence order p is determined by the AIC, BIC, and HQ criteria [33], given by
AIC : = ln det ( Σ ^ ) + 2 p d 2 / T ,
BIC : = ln det ( Σ ^ ) + ln ( T ) p d 2 / T ,
HQ : = ln det ( Σ ^ ) + 2 ln ( ln T ) p d 2 / T ,
where Σ ^ is the estimated covariance matrix of the noise in the VAR model. To keep the model order from becoming too large for feasible computation, we choose the lag order at which AIC/BIC/HQ show no further substantial decrease at higher orders [34]. As illustrated in Figure 6, we set K = 4 according to the minimal MMDL, and p = 4 , since the downward trend of AIC, BIC, and HQ flattens at higher orders.
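The three order-selection criteria above share one log-determinant term plus a lag penalty; a minimal helper (function and argument names ours):

```python
import numpy as np

def var_order_criteria(Sigma_hat, p, d, T):
    """AIC/BIC/HQ values for a fitted VAR(p) with d-dimensional noise
    covariance estimate Sigma_hat and series length T, per the formulas above."""
    ll = np.log(np.linalg.det(Sigma_hat))  # ln det(Sigma_hat)
    pen = p * d ** 2 / T                   # shared lag penalty p d^2 / T
    return {"AIC": ll + 2 * pen,
            "BIC": ll + np.log(T) * pen,
            "HQ": ll + 2 * np.log(np.log(T)) * pen}

crit = var_order_criteria(np.eye(2), p=1, d=2, T=100)
```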
In Figure 7, we present the DFC patterns derived from the proposed HO-rSLDS and the standard SLDS. The 15 brain regions producing the fMRI time series are indexed by the numbers 1 to 15. The states are ordered by their fractional occupancy (defined as the percentage of time allocated to that specific state); it is evident that the 4 states from SLDS have a more uniform fractional occupancy distribution. Moreover, the DFC patterns derived from SLDS are more similar to one another than those from HO-rSLDS; there is only a slight difference between State4 and the other three SLDS states. We use the cosine similarity to measure the difference between the DFC patterns characterized by each state, given by
cos ( F 1 , F 2 ) = Tr ( F 1 F 2 ) F 1 F F 2 F ,
where F 1 and F 2 are square matrices. By this measure, the averaged cosine similarity for SLDS is 0.9189, while that for HO-rSLDS is 0.6731. Although the true DFC patterns of these fMRI data are unknown, HO-rSLDS uncovers more information about the DFC patterns than standard SLDS. For example, State4 in HO-rSLDS, with the lowest fractional occupancy, can be considered a strongly connected state whose absolute functional connectivity is substantially larger than that of the other three states. Moreover, compared with standard SLDS, the four states derived from HO-rSLDS exhibit clearly positive or negative functional connectivity. In addition, State1 and State2 share similar functional connectivity, which results in a cosine similarity (0.7776) higher than the average for HO-rSLDS; this similarity is not exhibited in State3 (e.g., the negative connectivity between region 8 and regions 11, 12, and 13). These findings cannot be obtained by standard SLDS.
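The similarity measure above is the trace inner product normalized by Frobenius norms; a one-line helper (the transpose in the inner product is immaterial for symmetric connectivity matrices):

```python
import numpy as np

def matrix_cosine(F1, F2):
    """Cosine similarity between two square connectivity matrices via the
    Frobenius inner product Tr(F1^T F2) and Frobenius norms."""
    return np.trace(F1.T @ F2) / (np.linalg.norm(F1) * np.linalg.norm(F2))
```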

6. Conclusions

In this study, we have established a new method, named HO-rSLDS, as an extension of SLDS that improves the statistical modeling of dynamics in complex systems. First, HO-rSLDS addresses the problem of discovering higher-order temporal dependence in the latent variables or observation data by allowing the current latent variable to be associated with the measurements of several previous time steps, as in an AR-HMM. Second, HO-rSLDS improves the switching state transition by utilizing stick-breaking logistic regression and Pólya-gamma augmentation, making the transition dependent on the latent variable and thus overcoming the limitation of the Markov assumption, which restricts state durations to a geometric distribution. Moreover, the symmetric dependency between the switching states and the latent variable is recovered from the asymmetric relationships in standard SLDS. We have presented the detailed inference algorithms of HO-rSLDS for message passing and parameter learning. In numerical experiments, HO-rSLDS outperformed the competing models with higher estimation accuracy and robustness. The utility and versatility of the developed HO-rSLDS have also been demonstrated on real-world data: the application to fMRI data for DFC analysis indicates the superiority of the method in discovering abundant information about DFC patterns. A potential limitation of the proposed HO-rSLDS is that the state number and dependence order need to be predetermined in applications, and hyperparameter optimization is a non-trivial task. Future research will improve the proposed model by utilizing Bayesian nonparametric methods so that this complicated hyperparameter optimization can be avoided [35]. The proposed model improves the current methods for learning switching linear dynamical modes, which will facilitate the identification and assessment of the dynamics of complex systems.

Author Contributions

Conceptualization, H.W. and J.C.; methodology, H.W.; software, H.W.; validation, H.W.; formal analysis, H.W. and J.C.; investigation, H.W.; resources, H.W. and J.C.; data curation, J.C.; writing—original draft preparation, H.W.; writing—review and editing, J.C.; visualization, H.W.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation (grant number 81671633), and the project was supported by the Open Fund of Hubei Longzhong Laboratory.

Data Availability Statement

The data that supported the findings of this study are included in this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SLDS: Switching linear dynamical systems
HO-SLDS: Switching linear dynamical systems with higher-order dependence
HO-rSLDS: Recurrent switching linear dynamical systems with higher-order dependence
HMM: Hidden Markov model
VAR: Vector autoregressive
AR-HMM: Autoregressive hidden Markov model
AR-HSMM: Autoregressive hidden semi-Markov model
NMSE: Normalized mean squared error
SNR: Signal-to-noise ratio
fMRI: Functional magnetic resonance imaging
DFC: Dynamic functional connectivity

References

  1. Fox, E.; Sudderth, E.B.; Jordan, M.I.; Willsky, A.S. Bayesian nonparametric inference of switching dynamic linear models. IEEE Trans. Signal Process. 2011, 59, 1569–1585. [Google Scholar] [CrossRef]
  2. Pandarinath, C.; O’Shea, D.J.; Collins, J.; Jozefowicz, R.; Stavisky, S.D.; Kao, J.C.; Trautmann, E.M.; Kaufman, M.T.; Ryu, S.I.; Hochberg, L.R.; et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods 2018, 15, 805–815. [Google Scholar] [CrossRef] [PubMed]
  3. Frigola, R.; Chen, Y.; Rasmussen, C.E. Variational Gaussian process state-space models. In Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
  4. Krishnan, R.; Shalit, U.; Sontag, D. Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  5. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020. [Google Scholar]
  6. de Souza Baptista, R.; Bó, A.P.; Hayashibe, M. Automatic human movement assessment with switching linear dynamic system: Motion segmentation and motor performance. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 25, 628–640. [Google Scholar] [CrossRef] [PubMed]
  7. Liu, H.; Wang, L. Human motion prediction for human-robot collaboration. J. Manuf. Syst. 2017, 44, 287–294. [Google Scholar] [CrossRef]
  8. Alameda-Pineda, X.; Drouard, V.; Horaud, R.P. Variational inference and learning of piecewise linear dynamical systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3753–3764. [Google Scholar] [CrossRef] [PubMed]
  9. Ma, T.; Srinivasan, S.; Lazarou, G.; Picone, J. Continuous speech recognition using linear dynamic models. Int. J. Speech Technol. 2014, 17, 11–16. [Google Scholar] [CrossRef]
  10. Pagan, A.R.; Pesaran, M.H. Econometric analysis of structural systems with permanent and transitory shocks. J. Econ. Dyn. Control 2008, 32, 3376–3395. [Google Scholar] [CrossRef]
  11. Haluszczynski, A.; Räth, C. Controlling nonlinear dynamical systems into arbitrary states using machine learning. Sci. Rep. 2021, 11, 12991. [Google Scholar] [CrossRef] [PubMed]
  12. Smith, J.F.; Pillai, A.; Chen, K.; Horwitz, B. Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage 2010, 52, 1027–1040. [Google Scholar] [CrossRef]
  13. Wang, E.T.; Vannucci, M.; Haneef, Z.; Moss, R.; Rao, V.R.; Chiang, S. A Bayesian switching linear dynamical system for estimating seizure chronotypes. Proc. Natl. Acad. Sci. USA 2022, 119, e2200822119. [Google Scholar] [CrossRef]
  14. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef]
  15. Vidaurre, D. A new model for simultaneous dimensionality reduction and time-varying functional connectivity estimation. PLoS Comput. Biol. 2021, 17, e1008580. [Google Scholar] [CrossRef] [PubMed]
  16. Hamada, R.; Kubo, T.; Ikeda, K.; Zhang, Z.; Shibata, T.; Bando, T.; Hitomi, K.; Egawa, M. Modeling and prediction of driving behaviors using a nonparametric Bayesian method with AR models. IEEE Trans. Intell. Veh. 2016, 1, 131–138. [Google Scholar] [CrossRef]
Figure 1. Graphical models of the HMM (a) and the HSMM (b). An HMM typically consists of two layers: a hidden state layer governed by a Markov chain and an observation layer dependent on the current state. An HSMM extends the generative process of the standard HMM with a random state duration, drawn from a state-specific distribution when the state is entered. K denotes the cardinality of the hidden state space, which may be finite or infinite.
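The caption's distinction, geometric durations implicit in an HMM's self-transitions versus an explicit state-specific duration draw in an HSMM, can be sketched at the state-sequence level. The transition matrix and Poisson duration rates below are illustrative placeholders, not parameters from the paper:

```python
import numpy as np

def hmm_states(P, T, rng):
    """Markov-chain state sequence; dwell times are implicitly geometric."""
    z = np.zeros(T, dtype=int)
    for t in range(1, T):
        z[t] = rng.choice(len(P), p=P[z[t - 1]])
    return z

def hsmm_states(P, dur_rate, T, rng):
    """Semi-Markov: on entering state k, draw an explicit duration."""
    z, t, k = [], 0, 0
    while t < T:
        d = 1 + rng.poisson(dur_rate[k])   # state-specific duration draw
        z.extend([k] * d)
        t += d
        # no self-transitions: renormalize the off-diagonal row
        p = P[k].copy(); p[k] = 0.0; p /= p.sum()
        k = rng.choice(len(p), p=p)
    return np.array(z[:T])

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
z_hsmm = hsmm_states(P, dur_rate=[5, 10], T=200, rng=rng)
```

The HMM sequence tends to produce many short dwells, while the HSMM dwell lengths follow the chosen duration distribution directly.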
Figure 2. The graphical model of the SLDS, which consists of switching states ( z t ), latent variables ( x t ), and observations ( y t ). Note that the model contains no direct link from z t to y t .
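The generative structure in the figure, with discrete switches z t, continuous latents x t under state-dependent linear dynamics, and observations y t emitted from x t alone, can be sketched as follows. The dynamics matrices, dimensions, and noise level are illustrative choices, not fitted values from the paper:

```python
import numpy as np

def simulate_slds(A, b, C, P, T, rng, noise=0.1):
    """SLDS: z_t switches Markov-style; x_t follows the linear dynamics
    of the active state; y_t depends on x_t only (no z_t -> y_t link)."""
    K, d = A.shape[0], A.shape[1]
    z = np.zeros(T, dtype=int)
    x = np.zeros((T, d))
    for t in range(1, T):
        z[t] = rng.choice(K, p=P[z[t - 1]])
        x[t] = A[z[t]] @ x[t - 1] + b[z[t]] + noise * rng.standard_normal(d)
    y = x @ C.T + noise * rng.standard_normal((T, C.shape[0]))
    return z, x, y

rng = np.random.default_rng(0)
K, d, n = 2, 2, 3
A = np.stack([0.9 * np.eye(d),                       # state 0: slow decay
              np.array([[0.0, -0.9], [0.9, 0.0]])])  # state 1: rotation
b = np.zeros((K, d))
C = rng.standard_normal((n, d))
P = np.array([[0.95, 0.05], [0.05, 0.95]])
z, x, y = simulate_slds(A, b, C, P, T=200, rng=rng)
```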
Figure 3. The graphical model of the HO-rSLDS, with higher-order dependence in x t (blue) and the recurrent dependence from x t−1 to z t (red).
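The two ingredients highlighted in the figure are an order-p vector autoregression for x t and a switch z t whose distribution depends on x t−1 through a stick-breaking logistic regression. A minimal single-step sketch, with illustrative recurrence weights and AR matrices (not the paper's fitted parameters):

```python
import numpy as np

def stick_breaking_logistic(v):
    """Map K-1 logits to a K-vector of probabilities via stick breaking."""
    sig = 1.0 / (1.0 + np.exp(-np.asarray(v, dtype=float)))
    rem = np.concatenate(([1.0], np.cumprod(1.0 - sig)))
    return np.concatenate((sig, [1.0])) * rem

def hors_step(x_hist, A, R, r, rng, noise=0.05):
    """One HO-rSLDS step: z_t depends on x_{t-1} (recurrence),
    then x_t follows an order-p vector autoregression under state z_t.
    x_hist holds the p most recent latents, newest first."""
    probs = stick_breaking_logistic(R @ x_hist[0] + r)   # z_t | x_{t-1}
    z = rng.choice(len(probs), p=probs)
    x = sum(A_k @ x_l for A_k, x_l in zip(A[z], x_hist))  # order-p VAR
    return z, x + noise * rng.standard_normal(len(x))

rng = np.random.default_rng(0)
d, p, K = 2, 2, 3
A = [[0.4 * np.eye(d)] * p for _ in range(K)]       # per-state AR matrices
R = rng.standard_normal((K - 1, d))
r = np.zeros(K - 1)
z, x = hors_step([np.ones(d), np.ones(d)], A, R, r, rng)
```

The stick-breaking map guarantees a valid probability vector for any real-valued logits, which is what makes the Pólya-gamma augmentation applicable to the transition model.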
Figure 4. Simulated time series with dimension d = 5 and total time points T = 1000 (the last 200 time points are shown). The five dimensions are displayed in different colors, and the colored background intervals indicate the state sequence.
Figure 5. Box-plots of NMSE and SNR derived from AR-HMM, SLDS, and HO-rSLDS over the 100 numerical experiments. HO-rSLDS outperforms the competing models, with significantly lower NMSE and higher SNR on average for every combination of autoregressive order and state duration distribution (t-test, ** p < 0.01 ).
Figure 6. (a) State number (K) determined by MMDL. The best K = 4 is chosen at the minimal MMDL; the y-axis is logarithmically scaled to highlight the variation. (b) Dependence order p determined by the AIC, BIC, and HQ criteria. The order p is set to 4 since the criteria show no further substantial decrease at higher orders.
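The procedure behind panel (b), fitting a VAR at each candidate lag and comparing information criteria, can be sketched with a least-squares VAR fit and the standard Gaussian AIC/BIC formulas. The simulated VAR(2) data and all constants below are illustrative; this is not code or data from the paper:

```python
import numpy as np

def var_fit_ic(y, p):
    """Least-squares VAR(p) fit; return Gaussian AIC and BIC."""
    T, d = y.shape
    # stack lagged regressors: row t holds [y_{t-1}, ..., y_{t-p}]
    X = np.hstack([y[p - k - 1 : T - k - 1] for k in range(p)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    n = T - p
    sigma = resid.T @ resid / n               # residual covariance
    logdet = np.linalg.slogdet(sigma)[1]
    k_params = d * d * p                      # AR coefficients per lag
    aic = n * logdet + 2 * k_params
    bic = n * logdet + np.log(n) * k_params
    return aic, bic

rng = np.random.default_rng(0)
# simulate a VAR(2) so that order selection has a known right answer
A1, A2 = 0.5 * np.eye(2), -0.3 * np.eye(2)
y = np.zeros((500, 2))
for t in range(2, 500):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + 0.1 * rng.standard_normal(2)
bics = [var_fit_ic(y, p)[1] for p in range(1, 6)]
```

As in the figure, the order is chosen where the criterion stops decreasing substantially; for the simulated VAR(2) above, the BIC drops sharply from lag 1 to lag 2 and flattens afterwards.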
Figure 7. DFC patterns derived from the HO-rSLDS (a) and the standard SLDS (b). The 15 brain regions are indexed 1 to 15, and the states are ordered by their fractional occupancy (number in brackets). The color bar is shared by the two groups of heatmaps.
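Fractional occupancy, used above to order the states, is simply the fraction of time points assigned to each state. A minimal helper (the variable names are illustrative):

```python
import numpy as np

def fractional_occupancy(z, K):
    """Fraction of time points spent in each of K states."""
    z = np.asarray(z)
    return np.bincount(z, minlength=K) / len(z)

z = np.array([0, 0, 1, 2, 2, 2, 1, 0])
occ = fractional_occupancy(z, K=4)   # states 0..3 -> 0.375, 0.25, 0.375, 0.0
order = np.argsort(-occ)             # order states by occupancy, as in the figure
```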
Table 1. NMSE derived from AR-HMM, SLDS, and HO-rSLDS on synthetic time series with different state duration distributions and autoregressive orders in the generative model. All results are averaged over 100 experiments (mean ± std).
NMSE             p = 1          p = 2          p = 3
Geometric
  AR-HMM         0.25 ± 0.06    0.33 ± 0.07    0.37 ± 0.09
  SLDS           0.23 ± 0.05    0.39 ± 0.09    0.44 ± 0.11
  HO-rSLDS       0.16 ± 0.04    0.17 ± 0.05    0.20 ± 0.06
Nonparametric
  AR-HMM         0.32 ± 0.07    0.41 ± 0.09    0.48 ± 0.11
  SLDS           0.31 ± 0.09    0.44 ± 0.11    0.49 ± 0.13
  HO-rSLDS       0.18 ± 0.06    0.20 ± 0.05    0.21 ± 0.06
Table 2. SNR derived from AR-HMM, SLDS, and HO-rSLDS on synthetic time series with different state duration distributions and autoregressive orders in the generative model. All results are averaged over 100 experiments (mean ± std).
SNR              p = 1          p = 2          p = 3
Geometric
  AR-HMM         0.79 ± 0.08    0.76 ± 0.07    0.69 ± 0.08
  SLDS           0.85 ± 0.06    0.68 ± 0.09    0.64 ± 0.09
  HO-rSLDS       0.93 ± 0.04    0.92 ± 0.05    0.90 ± 0.05
Nonparametric
  AR-HMM         0.73 ± 0.08    0.65 ± 0.07    0.59 ± 0.10
  SLDS           0.80 ± 0.06    0.60 ± 0.09    0.54 ± 0.11
  HO-rSLDS       0.91 ± 0.04    0.87 ± 0.05    0.86 ± 0.05
Wang, H.; Chen, J. Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence. Symmetry 2024, 16, 474. https://doi.org/10.3390/sym16040474