Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence

Wang, Houxiang; Chen, Jiaqing

doi:10.3390/sym16040474

Open AccessArticle

Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence

by

Houxiang Wang

¹

and

Jiaqing Chen

^1,2,*

¹

School of Science, Wuhan University of Technology, Wuhan 430071, China

²

Hubei Longzhong Laboratory, Xiangyang 441100, China

^*

Author to whom correspondence should be addressed.

Symmetry 2024, 16(4), 474; https://doi.org/10.3390/sym16040474

Submission received: 6 March 2024 / Revised: 7 April 2024 / Accepted: 10 April 2024 / Published: 13 April 2024

(This article belongs to the Topic Nonlinear Phenomena, Chaos, Control and Applications to Engineering and Science and Experimental Aspects of Complex Systems)

Download

Browse Figures

Versions Notes

Abstract

:

Many complicated dynamical events may be broken down into simpler pieces and efficiently described by a system that shifts among a variety of conditionally dynamical modes. Building on switching linear dynamical systems, we develop a new model that extends the switching linear dynamical systems for better discovering these dynamical modes. In the proposed model, the linear dynamics of latent variables can be described by a higher-order vector autoregressive process, which makes it feasible to evaluate the higher-order dependency relationships in the dynamics. In addition, the transition of switching states is determined by a stick-breaking logistic regression, overcoming the limitation of a restricted geometric state duration and recovering the symmetric dependency between the switching states and the latent variables from asymmetric relationships. Furthermore, logistic regression evidence potentials can appear as conditionally Gaussian potentials by utilizing the Pólya-gamma augmentation strategy. Filtering and smoothing algorithms and Bayesian inference for parameter learning in the proposed model are presented. The utility and versatility of the proposed model are demonstrated on synthetic data and public functional magnetic resonance imaging data. Our model improves the current methods for learning the switching linear dynamical modes, which will facilitate the identification and assessment of the dynamics of complex systems.

Keywords:

dynamic systems; state space methods; Kalman filter; Bayesian inference

1. Introduction

Complex systems frequently exhibit multiple levels of abstraction in their descriptions [1]. For example, a computer program can be characterized by the collection of functions it calls, the sequence in which it carries out statements or the assembly instructions it sends to the CPU. This assertion is true for a multitude of natural systems. Brain activity can be classified based on either broad psychological states or the activation of individual ion channels. The necessary amount of specificity may differ based on the particular task being performed [2]. Through the identification of these behavioral units and their interdependence, we can obtain a deeper understanding of the intricate mechanisms that give rise to complex natural occurrences. Furthermore, modern machine learning offers powerful tools to help model the dynamics of complex systems. The toolbox has recently been improved to incorporate more versatile elements, such as Gaussian processes [3] and neural networks [4], into probabilistic time series models.

Time series analysis encompasses several methodologies and models, such as stationary process models, spectral models, state space models, and non-linear models [5]. Among them, the state space methods are regarded as more versatile and adept at addressing a broader range of problems compared to the other models. Hidden Markov models (HMM) and switching linear dynamical systems (SLDS) are two well-known state space models. In numerous real-world time series situations, the present condition of a dynamic system is connected to its condition in prior time intervals. Both HMM and SLDS can address such problems and have been widely used in several problem fields. Examples include human motion [6,7], computer vision [8], speech recognition [9], econometrics [10], machine learning [11] and neuroscience [12,13].

HMM is a discrete Markov process with doubly embedded stochastic models, namely, (i) an unobservable hidden process characterized by a Markov chain and (ii) an observation process determined by hidden states. HMM-based models have garnered significant interest and demonstrated their utility in the signal processing community for accurately simulating the intricate temporal progression of signals. More precisely, HMM have a lengthy track record in signal processing, particularly in the field of speech processing where they have achieved notable success [14]. The SLDS allows for the modeling of nonlinear time series data by dividing it into subsequences controlled by linear dynamics. Indeed, the generative model of SLDS is slightly different from HMM, where there is an additional set of latent variables between the switching states and the observations. In addition, a SLDS differs from a HMM by choosing from a collection of linear Gaussian dynamics that evolve continuously, instead of a standard Gaussian mixture density as in HMM. The SLDS can be viewed as an expansion of the HMM, where each HMM state, or mode, is linked to a linear dynamical process. One further benefit of SLDS is its ability to handle time series data with high dimensionality. From a generative model perspective, it can be claimed that high-dimensional time series can be cheaply represented by a dynamical process specified on a low-dimensional manifold. This representation benefits from the model structure of SLDS. Nevertheless, employing HMM directly on time series with high dimensionality is likely to result in overfitting. [15]. Therefore, SLDS offers the potential for increased descriptive capability [1].

Although SLDS offers advanced predictive models, the dynamics from the data are only stated under certain conditions, which poses some shortcomings of the generative model and may limit its application. First, the continuously evolving linear Gaussian dynamics of latent variables is essentially equivalent to a first-order switching vector autoregressive (VAR) process, i.e., VAR(1) process, which may not be suitable for the case when there are higher-order dependency relationships in the time series. Higher-order autoregressive interactions are more common in time series data [16,17]. Furthermore, determining an appropriate autoregressive order must trade off the model likelihood and complexity, constituting a non-trivial task in the applications [18]. Indeed, a restricted VAR(1) process involved in the linear dynamics of latent variables limits the ability of SLDS to recover the higher-order dependency relationships in dynamical phenomena.

Second, the switching state transition in SLDS follows the Markov assumption, which means that the duration time (i.e., time interval spent in a certain state) is restricted to a geometric distribution [19]. Thus, more weight is given to shorter consecutive time periods within a certain state, which means states switch frequently. This may not be appropriate when the system spends a long time in one state [20]. In addition, the state transitions are based only on the previous state and are unrelated to the latent variables. If a discrete switch happens when the system enters a particular area of the state space, the SLDS cannot recognize this relationship. While hidden semi-Markov model (HSMM) based on the semi-Markov assumption is an extension of the HMM that alleviates this issue, it should be noted that state transitions in HSMM still have no associations with the previous observations [20].

To address the mentioned limitations, an extension of SLDS, the recurrent switching linear dynamical systems with higher-order dependence (HO-rSLDS), is presented. The HO-rSLDS provides a new method that improves the modeling of dynamics from complex systems, by enabling a higher-order dependence in the linear Gaussian dynamics of latent variables and allowing the switching state transition probabilities to hinge on the preceding values of the latent variables. Specifically, to address the first limitation, we enhance the generative process of latent variables by directly making the current latent variable depend on the state values at the previous several time steps, rather than only the previous one time step. Therefore, the proposed model is feasible for capturing the higher-order autoregressive relationships in the data. To address the second limitation, we utilize the stick-breaking logistic regression to determine the switching state transition [21]. In addition, we leverage a Pólya-gamma augmentation approach [22] to improve block inference updating. This enhancement transforms specific logistic regression evidence potentials into conditionally Gaussian potentials, facilitating efficient Bayesian inference procedures. In this way, the transition of switching states is directly associated with the previous switching state and latent variable. Moreover, since the switching states and the latent variables are mutually interdependent in HO-rSLDS, the symmetric dependency among them is recovered, enhancing the generative process of both switching states and latent variables.

Similar to SLDS, the exact inference in an HO-rSLDS is intractable, which impedes efficient model estimation and parameter learning [8]. In this study, the HO-rSLDS is learned through variational inference, which is an approximate scheme to maximize the evidence lower bound, i.e., minimize the Kullback–Leibler divergence between a restricted family of functions of the model parameters and the actual posterior distribution [23]. Moreover, we propose message-passing algorithms about switching states and latent variables in HO-rSLDS to evaluate sufficient statistics for further parameter updates.

The primary contributions of the novel approach suggested in this study are outlined below:

HO-rSLDS assumes the linear dynamics of latent variables can be described by a higher-order VAR process, which makes it feasible to dig out and evaluate the long-term dependency relationships from dynamical phenomena;
Stick-breaking logistic regression is applied to determine the switching state transition in HO-rSLDS. By this means, the transition probabilities are time-varying and can be adjusted according to the previous latent variable, which overcomes the limitation of restricted geometric state duration time by Markov assumption and recovers the symmetric dependency between the switching states and the latent variable;
The Pólya-gamma augmentation strategy permits efficient Bayesian inference algorithms. In addition, we propose message-passing algorithms, including a forward Kalman filter and backward Kalman smoother for HO-rSLDS, which facilitates the parameter update in variational inference.

The rest of this study is organized as follows. In Section 2.3, we discuss the related models. In Section 3, we present the model learning algorithms in detail. In Section 4, we demonstrate the performance of the proposed HO-rSLDS on simulated data. Our method is applied to public functional magnetic resonance imaging (fMRI) data as well, and the results are described in Section 5. Finally, we wrap up our conclusions in Section 6.

2. Methods

2.1. HMM and HSMM

HMMs are mathematical models that analyze dynamic systems by using random processes with hidden states. A HMM is constructed using a Markov chain and is composed of two layers: a layer representing hidden states and a layer representing observations. The observed sequences are allocated to their respective hidden states according to the observation probability distribution. In the context of HMM, the observations are solely influenced by the current state and are not influenced by any preceding states [20]. The entire state sequence can be presented as

z_{1 : T} = {\{z_{t}\}}_{t = 1}^{T}

, which can be viewed as a sequence over a finite set

Ƶ

with cardinality

| Ƶ |

. The state transition probability from state i to j is defined as

π_{i j} = p (z_{t + 1} = j | z_{t} = i)

. The distribution of observations

y_{t} \in R^{n}

given a specific state is denoted by

p (y_{t} | z_{t}, θ_{j})

, where

θ_{j}

denotes the emission parameters of state j. Above all, the HMM can be described as:

\begin{matrix} z_{t} | z_{t - 1} \sim π_{z_{t - 1}}, \end{matrix}

(1)

\begin{matrix} y_{t} | z_{t} \sim F (θ_{z_{t}}), \end{matrix}

(2)

where

F (\cdot)

denotes emission distribution.

Compared to HMM, the HSMM formalism improves upon the standard HMM by incorporating a random state length time, chosen from a state-specific distribution, into the generative process. The state remains constant until the period elapses, at which juncture there is a Markov transition to a fresh state. By using this approach, the distribution of state durations is not limited to a geometric form [24,25]. The random variable

D_{t}

represents the length of time that a state is entered at time t. The associated probability mass function is denoted as

p (D_{t} | z_{t} = j)

and the HSMM can be described as

ν_{s} | ν_{s - 1} \sim π_{ν_{s - 1}},

(3)

D_{s} \sim g (ω_{s}),

(4)

z_{t_{s}^{1} : t_{s}^{D_{s} + 1}} = ν_{s},

(5)

y_{t_{s}^{1} : t_{s}^{D_{s} + 1}} | z_{t_{s}^{1} : t_{s}^{D_{s} + 1}} \sim F (θ_{ν_{s}}, D_{s}),

(6)

where

ν_{s}

indexes the state shared by state segment s. A graphical illustration of HMM and HSMM is shown in Figure 1. HMM or HSMM typically uses a Gaussian mixture as the emission distribution, which fails to capture potential dependencies in observed time series. An effective approach to tackle this problem is to employ the VAR emission model, which describes the behavior of time series through linear historical interactions among the observed time series. In this way, the emission model in HMM or HSMM can be specifically expressed as:

\begin{matrix} y_{t} | z_{t} = k \sim N (\sum_{l = 1}^{p} W_{k, l} y_{t - l}, Σ_{k}), \end{matrix}

(7)

where p denotes the maximum lag order,

W_{k, l}

are

n \times n

dimensional matrices representing the k-th state autoregressive coefficient matrices for lag l and the covariance matrix is

Σ_{k}

. Subsequently, we refer to the H(S)MM with VAR(p) emission model as the AR(p)-H(S)MM.

2.2. SLDS

A SLDS Markov process differs from a hidden Markov model (HMM) by choosing from a collection of linear Gaussian dynamics that evolve continuously, instead of a standard Gaussian mixture density as in HMM. Therefore, SLDS offers the potential for increased descriptive capability.

Similar to HMM, at each time

t = 1, 2, \dots, T

there is a discrete switching state

z_{t} \in \{1, 2, \dots, K\}

that evolves according to Markov dynamics:

\begin{matrix} z_{t + 1} | z_{t}, {\{π_{k}\}}_{k = 1}^{K} \sim π_{z_{t}}, \end{matrix}

(8)

{\{π_{k}\}}_{k = 1}^{K}

constitutes the Markov transition matrix and

π_{k} \in {[0, 1]}^{K}

is its k-th row. The

z_{1}

is assumed to be sampled from an initial distribution

π_{0}

parameterized by prior parameter

α_{0}

, i.e.,

z_{1} | α_{0} \sim π_{0} (α_{0})

. Furthermore, a continuous latent variable

x_{t} \in R^{m}

follows conditionally linear dynamics depend on the state values with at the previous time step, and the switching state

z_{t}

determines the linear dynamical system at time t by:

\begin{matrix} x_{t + 1} = A_{z_{t + 1}}^{(0)} x_{t} + v_{t}, v_{t} \overset{i.i.d}{\sim} N (0, Q_{z_{t + 1}}^{(0)}), \end{matrix}

(9)

where

A_{z_{t + 1}}^{(0)} \in R^{m \times m}

denotes the coefficient matrix and

Q_{z_{t + 1}}^{(0)} \in R^{m \times m}

is covariance matrix. At each time t, a linear Gaussian observation

y_{t} \in R^{n}

is produced from the associated latent continuous variable:

\begin{matrix} y_{t} = C^{(0)} x_{t} + w_{t}, w_{t} \overset{i.i.d}{\sim} N (0, S^{(0)}), \end{matrix}

(10)

where

C^{(0)} \in R^{n \times m}

,

S^{(0)} \in R^{n \times n}

. Usually, n is much larger than m, such that the dependence structure of high-dimensional observation data can be evaluated based on the low-dimensional latent variable space. A graphical illustration of SLDS is given in Figure 2.

In classical form, however, there are some shortcomings in the generative model of SLDS, which may limit its application. First, the continuously-evolving linear Gaussian dynamics of

x_{t}

is essentially equivalent to a first-order switching vector autoregressive (VAR) process, i.e., the VAR(1) process, which may not be suitable for the case when there are higher-order dependency relationships in

x_{t}

or

y_{t}

. Substitute Equation (9) into Equation (10) and assume

{C^{(0)}}^{⊤} C^{(0)} = I

and

w_{t}

approximately zero, one can immediately obtain the conclusion that SLDS describes the dependency relationships by a VAR(1) process since

y_{t} = C^{(0)} A_{z_{t}}^{(0)} {C^{(0)}}^{⊤} y_{t - 1} + C^{(0)} v_{t}

. Second, the state transition of

z_{t}

follows the Markov assumption, which means that the duration time is restricted to a geometric distribution [19]. Thus, more weight is given to shorter consecutive time periods within a certain state, which means states switch frequently. This may not be appropriate when the system spends a long time in one state [20]. In addition, the state

z_{t + 1}

transitions based only on the previous state

z_{t}

, and

z_{t + 1} | z_{t}

is unrelated to the latent variable

x_{t}

. If a discrete switch happens when the system enters a particular area of the state space, the SLDS will not be able to recognize this relationship. To address these issues, the proposed HO-rSLDS enhances the generative process of

x_{t}

and the switching state transition, which is demonstrated in the following subsection.

2.3. Recurrent Switching Linear Dynamic System with Higher-Order Dependence

In this section, we describe the proposed HO-rSLDS comprising two main components that are switching linear dynamic systems with higher-order dependence (HO-SLDS, without recurrence) and the stick-breaking logistic regression as well as the Pólya-gamma augmentation for HO-SLDS. Each component is demonstrated below.

2.3.1. HO-SLDS

The switching state transition in HO-SLDS is the same as that in SLDS. However, the continuous latent variable

x_{t} \in R^{m}

follows conditionally linear dynamics depending on the state values at the previous p time steps, rather than only the previous one time step, and the switching state

z_{t}

determines the linear dynamical system at time t by:

\begin{matrix} x_{t + 1} = A_{z_{t + 1}} {\bar{x}}_{t} + v_{t}, v_{t} \overset{i.i.d}{\sim} N (0, Q_{z_{t + 1}}), \end{matrix}

(11)

where

{\bar{x}}_{t} = {[x_{t - 1}^{⊤}, \dots, x_{t - p}^{⊤}]}^{⊤} \in R^{p m}

,

A_{z_{t + 1}} \in R^{m \times p m}

and

Q_{z_{t + 1}} \in R^{m \times m}

. Indeed, the evolution of the latent variable dynamics in

x_{t}

can be considered as a

VAR (p)

process. The initial

{\bar{x}}_{0}

given the initial discrete state

z_{0}

is supposed to be normally distributed with mean

μ_{0}

and covariance

Σ_{0}

, i.e.,

{\bar{x}}_{0} \sim N (μ_{0}, Σ_{0})

. Similarly, at each time t, a linear Gaussian observation

y_{t} \in R^{n}

is produced from the associated latent continuous variable:

\begin{matrix} y_{t} = C_{z_{t}} x_{t} + w_{t}, w_{t} \overset{i.i.d}{\sim} N (0, S_{z_{t}}), \end{matrix}

(12)

where

C_{z_{t}} \in R^{n \times m}

,

S_{z_{t}} \in R^{n \times n}

. Note that here we assume

C_{z_{t}}

and

S_{z_{t}}

are related to

z_{t}

. Usually, n is much larger than m, such that the dependence structure of high-dimensional observation data can be evaluated based on the low-dimensional latent variable space. The system parameters consist of the discrete Markov transition matrix and the collection of linear dynamical system matrices, denoted as:

\begin{matrix} θ = π_{0} \cup {\{μ_{0}, Σ_{0}, π_{k}, A_{k}, Q_{k}, C_{k}, S_{k}\}}_{k = 1}^{K} . \end{matrix}

(13)

2.3.2. Stick-Breaking Logistic Regression and Pólya-Gamma Augmentation for HO-SLDS

Another component included in HO-rSLDS is a stick-breaking logistic regression. We utilize a Pólya-gamma augmentation technique to improve block inference updates [21]. This approach allows certain logistic regression evidence potentials to transform to conditionally Gaussian potentials in an augmented distribution, which facilitates the Bayesian inference algorithms.

Consider a logistic regression model with regressors

x \in R^{m}

that maps to a categorical distribution over the switching state

z \in \{1, 2, \dots, K\}

, denoted as:

\begin{matrix} z | x \sim π_{SB} (ν), ν = R x + r, \end{matrix}

(14)

where

R \in R^{K - 1 \times M}

represents a weight matrix and

r \in R^{K - 1}

denotes a bias vector. We use a stick-breaking link function

π_{SB} : R^{K - 1} \to {[0, 1]}^{K}

, which transforms a real vector into a normalized probability vector based on the stick-breaking process:

\begin{matrix} π_{SB} (ν) = (π_{SB} {(ν)}^{(1)}, \dots, π_{SB} {(ν)}^{(K)}), \end{matrix}

(15)

\begin{matrix} π_{SB}^{(k)} = σ (ν_{k}) \prod_{j < K} σ (- ν_{j}), \end{matrix}

(16)

for

k = 1, \dots, K - 1

and

π_{SB}^{(K)} = \prod_{k = 1}^{K - 1} σ (- ν_{k})

where

ν_{k}

represents the kth component of

ν

and

σ (ν) = {(1 + e^{- ν})}^{- 1}

is the logistic function. The probability mass function

p (z | x)

is given by

\begin{matrix} p (z | x) = \prod_{k = 1}^{K - 1} σ {(ν_{k})}^{I (z = k)} σ {(- ν_{k})}^{I (z > k)}, \end{matrix}

(17)

where

I (\cdot)

is the indicator function.

The posterior density

p (x | z)

is non-Gaussian and does not allow straightforward Bayesian inference for this regression model due to the likelihood

p (z | x)

not aligning with a Gaussian prior density

p (x)

. To solve this problem, we introduce Pólya-gamma auxiliary variables

ω = {\{ω_{k}\}}_{k = 1}^{K}

so that the conditional density

p (x | z, ω)

becomes Gaussian [22]. Specifically, the conditional density of

ω_{k}

is distributed according to a Pólya-Gamma distribution. A random variable X has a Pólya-Gamma distribution with parameters

b > 0

and

c \in R

, denoted as

X \sim PG (b, c)

, if

\begin{matrix} X \overset{D}{=} \frac{1}{2 π^{2}} \sum_{k = 1}^{\infty} \frac{g_{k}}{{(k - 1 / 2)}^{2} + c^{2} / (4 π^{2})}, \end{matrix}

(18)

where the

g_{k} \sim Gamma (b, 1)

are independent Gamma random variables, and where

\overset{D}{=}

indicates equality in distribution. In particular, by choosing

ω_{k} | x, z \sim PG (I (z > k), ν_{k})

, we have the following:

\begin{matrix} x | z, ω \sim N (Ω^{- 1} κ, Ω^{- 1}), \end{matrix}

(19)

where the mean vector

Ω^{- 1} κ

and covariance matrix

Ω^{- 1}

are calculated based on the following:

\begin{matrix} Ω = diag (ω), \end{matrix}

(20)

\begin{matrix} κ_{k} = I (z = k) - \frac{1}{2} I (z > k) . \end{matrix}

(21)

Thus allowing efficient block updates while maintaining Gaussian posterior distribution

p (x | z)

in Bayesian inference.

The HO-rSLDS splits the latent space into K sections, with each section following its own linear dynamics by including the recurrence (Equation (17)) in the transition density of

z_{t}

, which can be described as follows:

\begin{matrix} z_{t + 1} | z_{t}, x_{t}, R_{z_{t}}, r_{z_{t}} \sim π_{SB} (ν_{t}), \end{matrix}

(22)

\begin{matrix} ν_{t} = R_{z_{t}} x_{t} + r_{z_{t}} . \end{matrix}

(23)

Moreover, we place Gaussian priors on

R_{z_{t}}

and

r_{z_{t}}

. In this way, the probability mass function

p (z_{t + 1} | x_{t})

can be expressed as:

\begin{matrix} p (z_{t + 1} | x_{t}) & = \prod_{k = 1}^{K - 1} σ {(ν_{k})}^{I (z_{t + 1} = k)} σ {(- ν_{k})}^{I (z_{t + 1} > k)} \\ = \prod_{k = 1}^{K - 1} \frac{{(e^{ν_{t, k}})}^{I (z_{t + 1} = k)}}{{(1 + e^{ν_{t, k}})}^{I (z_{t + 1} > k)}} . \end{matrix}

(24)

The Pólya-gamma augmentation focuses on exactly the above densities, utilizing the following integral identity:

\begin{matrix} \frac{{(e^{ν})}^{a}}{{(1 + e^{ν})}^{b}} = 2^{- b} e^{κ ν} \int_{0}^{\infty} e^{- ω ν^{2}} p_{PG} (ω | b, 0) d ω, \end{matrix}

(25)

where

κ = a - b / 2

and

p_{PG} (ω | b, 0)

refers to the density of the Pólya-gamma distribution,

PG (b, 0)

, which is independent of

ν

[22]. Combing Equations (24) and (25),

p (z_{t + 1} | x_{t})

can be expressed as the marginalization of a component in the probability distribution

p (z_{t + 1} | x_{t}, ω_{t})

, where

ω_{t} \in R^{K - 1}

represents a vector of auxiliary variables. Dependent on

ν_{t}

, there is

\begin{matrix} p (z_{t + 1}, x_{t}, ω_{t}) \propto \prod_{k = 1}^{K - 1} exp \{- \frac{1}{2} ω_{t, k} ν_{t, k}^{2} + κ_{t + 1, k} ν_{t, k}\}, \end{matrix}

(26)

where

κ_{t + 1, k} = I (z_{t + 1} = k) - \frac{1}{2} I (z_{t + 1} > k)

. Hence,

\begin{matrix} p (z_{t + 1}, x_{t}, ω_{t}) \propto N (ν_{t} | Ω_{t}^{- 1} κ_{t + 1}, Ω_{t}^{- 1}), \end{matrix}

(27)

where

Ω_{t} = diag (ω_{t})

and

κ_{t + 1} = (κ_{t + 1, 1}, \dots, κ_{t + 1, K - 1})

. After augmentation, the conditional density on

x_{t}

becomes virtually Gaussian, simplifying the evaluation of the integrals needed for message passing during the Bayesian inference. Finally, we summarize the proposed HO-rSLDS as

\begin{matrix} z_{1} | θ \sim π_{0} (α_{0}), \\ {\bar{x}}_{0} | θ \sim N ({\bar{x}}_{0} | μ_{0}, Σ_{0}), \\ z_{t + 1} | z_{t}, x_{t}, θ \sim π_{SB} (ν_{t}), ν_{t} = R_{z_{t}} x_{t} + r_{z_{t}}, \\ x_{t + 1} | {\bar{x}}_{t}, z_{t + 1}, θ \sim N (x_{t + 1} | A_{z_{t + 1}} {\bar{x}}_{t}, Q_{z_{t + 1}}), \\ y_{t} | x_{t}, z_{t}, θ \sim N (y_{t} | C_{z_{t}} x_{t}, S_{z_{t}}), \\ θ = \{α_{0}, μ_{0}, Σ_{0}\} \cup {\{A_{k}, Q_{k}, C_{k}, S_{k}, R_{k}, r_{k}\}}_{k = 1}^{K} . \end{matrix}

(28)

To learn the HO-rSLDS utilizing Bayesian inference, we assign conjugate priors to each component of the model parameters

θ

, given by

\begin{matrix} π_{0} | α_{0} \sim Dir (α_{0}), \\ (A_{k}, Q_{k}) | η \sim MNIW (η), \\ (C_{k}, S_{k}) | λ \sim MNIW (λ), \\ R_{k} | ζ \sim \prod_{i = 1}^{K - 1} N (R_{k, i} | ζ_{i}), \\ r_{k} | ρ \sim N (r_{k} | ρ), \end{matrix}

(29)

where Dir and MNIW denote Dirichelet and matrix normal inverse Wishart distribution, respectively,

R_{k, i}

denotes the ith row of

R_{k}

. A graphical illustration of HO-rSLDS is given in Figure 3.

3. Variational Inference of HO-rSLDS

Now we describe the model learning inference of HO-rSLDS, which is fulfilled by variational inference. In variational inference, a parameter is inferred by maximizing the evidence’s lower bound, hence minimizing the Kullback–Leibler divergence as well as the variational free energy. Specifically, we denote

Y : = y_{1 : T}

,

Θ : = \{z_{1 : T}, ({\bar{x}}_{0}, x_{1 : T}), ω_{1 : T}, θ\}

, then the following decomposition holds:

\begin{matrix} ln p (Y) = L (q (Θ)) + KL (q (Θ) | | p (Θ | Y)), \end{matrix}

(30)

where

\begin{matrix} L (q (Θ)) = \int q (Θ) log \frac{p (Y, Θ)}{q (Θ)} d Θ, \end{matrix}

(31)

\begin{matrix} KL (q (Θ) | | p (Θ | Y)) = - \int q (Θ) log \frac{p (Θ | Y)}{q (Θ)} d Θ . \end{matrix}

(32)

The exact evaluation of

p (z_{1 : T}, x_{0 : T}, ω_{1 : T} | y_{1 : T})

where

x_{0 : T} : = {\bar{x}}_{0} \cup x_{1 : T}

in HO-rSLDS is intractable. In this study, we leverage an approximate variational inference method to assess the approximate posterior

q (z_{1 : T}, x_{0 : T}, ω_{1 : T} | y_{1 : T})

in the proposed HO-rSLDS. In order to obtain a tractable inference, a structured mean field approximation is adopted on the generative model, given by

\begin{matrix} q (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ | y_{1 : T}) = q (z_{1 : T}) q (x_{0 : T}) q (ω_{1 : T}) q (θ) . \end{matrix}

(33)

Moreover,

q (ω_{1 : T})

further factorizes as:

\begin{matrix} q (ω_{1 : T}) = \prod_{t = 1}^{T} \prod_{k = 1}^{K - 1} q (ω_{t, k}) . \end{matrix}

(34)

Given the factorized forms in Equations (33) and (34), we can obtain the optimal solution of the factorized variational posteriors by maximizing the evidence lower bound in Equation (31), i.e.,

q (z_{1 : T}), q (x_{0 : T}), q (ω_{1 : T})

and

q (θ)

, given by

\begin{matrix} ln q (z_{1 : T}) \propto E_{- q (z_{1 : T})} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ, y_{1 : T}), \end{matrix}

(35)

\begin{matrix} ln q (x_{0 : T}) \propto E_{- q (x_{0 : T})} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ, y_{1 : T}), \end{matrix}

(36)

\begin{matrix} ln q (ω_{1 : T}) \propto E_{- q (ω_{1 : T})} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ, y_{1 : T}), \end{matrix}

(37)

\begin{matrix} ln q (θ) \propto E_{- q (θ)} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ, y_{1 : T}), \end{matrix}

(38)

where

E_{- q (z_{1 : T})} (\cdot) : = E_{q (x_{0 : T}) q (ω_{1 : T}) q (θ)} (\cdot)

. Hence the joint log-likelihood is

\begin{matrix} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, θ, y_{1 : T}) = ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, y_{1 : T} | θ) + ln p (θ) . \end{matrix}

(39)

The conditional log-likelihood

ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, y_{1 : T} | θ)

can be written as

\begin{matrix} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, y_{1 : T} | θ) = \\ ln p (z_{1} | θ) + ln p ({\bar{x}}_{0} | θ) + \sum_{t = 1}^{T} [ln p (z_{t} | x_{t - 1}, z_{t - 1}, ω_{t - 1}, θ) I (t > 1) + ln (x_{t} | {\bar{x}}_{t - 1}, z_{t}, θ)] \\ + \sum_{t = 1}^{T} p (y_{t} | z_{t}, x_{t}, θ) \end{matrix}

(40)

\begin{matrix} \propto ln p (z_{1} | θ) - \frac{1}{2} {({\bar{x}}_{0} - μ_{0})}^{⊤} {(Σ_{0})}^{- 1} ({\bar{x}}_{0} - μ_{0}) \\ - \frac{1}{2} \sum_{t = 1}^{T} [{(ν_{t - 1} - Ω_{t - 1}^{- 1} κ_{t})}^{⊤} Ω_{t - 1} (ν_{t - 1} - Ω_{t - 1}^{- 1} κ_{t}) I (t > 1) \\ + {(x_{t} - A_{z_{t}} {\bar{x}}_{t - 1})}^{⊤} Q_{z_{t}}^{- 1} (x_{t} - A_{z_{t}} {\bar{x}}_{t - 1})] \\ - \frac{1}{2} \sum_{t = 1}^{T} {(y_{t} - C_{z_{t}} x_{t})}^{⊤} S_{z_{t}}^{- 1} (y_{t} - C_{z_{t}} x_{t}) . \end{matrix}

(41)

The exact optimal solutions for

q (z_{1 : T})

and

q (x_{0 : T})

will not be explicitly evaluated; rather, we use message passing algorithms to obtain the sufficient statistics with respect to these factors. In the following subsections, we present the involved message passing algorithms and the detailed derivations of update formula for each factor.

3.1. Update for $q (x_{0 : T})$

Since

E_{- q (x_{0 : T})} ln p (θ)

is not a function of

x_{0 : T}

, Equation (36) can be written as

\begin{matrix} ln q (x_{0 : T}) & \propto E_{- q (x_{0 : T})} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, y_{1 : T} | θ), \\ \propto E_{- q (x_{0 : T})} [- \frac{1}{2} {({\bar{x}}_{0} - μ_{0})}^{⊤} {(Σ_{0})}^{- 1} ({\bar{x}}_{0} - μ^{0})] \\ + E_{- q (x_{0 : T})} [- \frac{1}{2} \sum_{t = 2}^{T} {(ν_{t - 1} - Ω_{t - 1}^{- 1} κ_{t})}^{⊤} Ω_{t - 1} (ν_{t - 1} - Ω_{t - 1}^{- 1} κ_{t})] \\ + E_{- q (x_{0 : T})} [- \frac{1}{2} \sum_{t = 1}^{T} {(x_{t} - A_{z_{t}} {\bar{x}}_{t - 1})}^{⊤} Q_{z_{t}}^{- 1} (x_{t} - A_{z_{t}} {\bar{x}}_{t - 1})] \\ + E_{- q (x_{0 : T})} [- \frac{1}{2} \sum_{t = 1}^{T} {(y_{t} - C_{z_{t}} x_{t})}^{⊤} S_{z_{t}}^{- 1} (y_{t} - C_{z_{t}} x_{t})] . \end{matrix}

(42)

From Equation (42) it can be observed that the structure of the variational posterior

ln q (x_{0 : T})

resembles the joint log-likelihood function of a time-varying linear dynamical system with higher-order dependence (HO-LDS). Thus, the

ln q (x_{0 : T})

can be re-expressed as

\begin{matrix} ln q (x_{0 : T}) & \propto - \frac{1}{2} {({\bar{x}}_{0} - {\hat{μ}}_{0})}^{⊤} {({\hat{Σ}}_{0})}^{- 1} ({\bar{x}}_{0} - {\hat{μ}}_{0}) \\ - \frac{1}{2} \sum_{t = 1}^{T} {(x_{t} - {\hat{A}}_{t} {\bar{x}}_{t - 1})}^{⊤} {\hat{Q}}_{t}^{- 1} (x_{t} - {\hat{A}}_{t} {\bar{x}}_{t - 1}) \\ - \frac{1}{2} \sum_{t = 1}^{T} {(y_{t} - {\hat{C}}_{t} x_{t})}^{⊤} {\hat{S}}_{t}^{- 1} (y_{t} - {\hat{C}}_{t} x_{t}), \end{matrix}

(43)

and the set of variational parameters involved in Equation (43) are defined as

\begin{matrix} {\hat{θ}}_{x_{0 : T}} : = \{{\hat{μ}}_{0}, {\hat{Σ}}_{0}, {\{{\hat{A}}_{t}\}}_{t = 1}^{T}, {\{{\hat{Q}}_{t}\}}_{t = 1}^{T}, {\{{\hat{C}}_{t}\}}_{t = 1}^{T}, {\{{\hat{S}}_{t}\}}_{t = 1}^{T}\} . \end{matrix}

(44)

Each component in the

{\hat{θ}}_{x_{0 : T}}

can be precisely determined by matching the coefficients of each term in Equations (42) and (43). Algorithm 1 outlines an efficient scheme for this purpose.

Algorithm 1: Obtaining

{\hat{θ}}_{x_{0 : T}}

Remark 1.

A_{z_{t}} {\bar{x}}_{t - 1} = \sum_{j = 1}^{p} A_{z_{t}} (j) x_{t - j}

where

A_{z_{t}} (j)

is defined as the submatrix of

A_{z_{t}}

associated with

x_{t - j}

in

A_{z_{t}} {\bar{x}}_{t - 1}

.

Thus, we perform message passing on a time-varing HO-LDS parameterized by

{\hat{θ}}_{x_{0 : T}}

to obtain the posterior distribution

q (x_{0 : T})

and sufficient statistics used in the update for the other factors. To be concrete, the message passing comprises the generalized Kalman filter and Kalman smoother for the time-varing HO-LDS, which are demonstrated by Theorems 1 and 2. To prove the theorems, we need the following lemmas [26]:

Lemma 1.

If random variables X and Y follow the Gaussian probability distributions:

\begin{matrix} X \sim N (m, P), \end{matrix}

(45)

\begin{matrix} Y | X \sim N (H X + u, R), \end{matrix}

(46)

then the joint distribution of X and Y, as well as the marginal distribution of Y, are provided by

\begin{matrix} (\begin{matrix} X \\ Y \end{matrix}) \sim N ((\begin{matrix} m \\ H m + u \end{matrix}), (\begin{matrix} P & P H^{⊤} \\ H P & H P H^{⊤} + R \end{matrix})), \\ Y \sim N (H m + u, H P H^{⊤} + R) . \end{matrix}

(47)

Lemma 2.

If the random variables X and Y follow the joint Gaussian probability distribution

\begin{matrix} (\begin{matrix} X \\ Y \end{matrix}) \sim N ((\begin{matrix} a \\ b \end{matrix}), (\begin{matrix} A & C \\ C^{⊤} & B \end{matrix})), \end{matrix}

(48)

then the marginal and conditional distributions of X and Y are as follows:

\begin{matrix} X \sim N (a, A), \\ Y \sim N (b, B), \\ X | Y \sim N (a + C B^{- 1} (Y - b), A - C B^{- 1} C^{⊤}), \\ Y | X \sim N (b + C^{⊤} A^{- 1} (X - a), B - C^{⊤} A^{- 1} C) . \end{matrix}

(49)

Now we demonstrate Theorems 1 and 2 in turn, which provides the Kalman filter and smoother equations about

q (x_{0 : T})

in the time-varying HO-LDS.

Theorem 1.

The Kalman filter equations for the time-varing HO-LDS (Equation (43)) parameterized by

{\hat{θ}}_{x_{0 : T}}

can be evaluated in closed form, which results in Gaussian distributions:

\begin{matrix} p (x_{t} | y_{1 : t - 1}) = N (x_{t} | μ_{t}^{(p)}, Σ_{t}^{(p)}), \end{matrix}

(50)

\begin{matrix} p (x_{t} | y_{1 : t}) = N (x_{t} | μ_{t}^{(u)}, Σ_{t}^{(u)}), \end{matrix}

(51)

\begin{matrix} p ({\bar{x}}_{t} | y_{1 : t}) = N ({\bar{x}}_{t} | μ_{t}^{(m)}, Σ_{t}^{(m)}) . \end{matrix}

(52)

The distribution parameters can be calculated using the following Kalman filter prediction, update and merging steps detailed in the following proof. The recursion begins with the prior mean

μ_{0}^{m}

and covariance

Σ_{0}^{m}

(i.e.,

{\hat{μ}}_{0}

and

{\hat{Σ}}_{0}

).

Proof.

The Gaussian filter distributions (Equations (50)–(52)) can be obtained by the following steps:

1. Prediction step. By Lemma 1, the joint distribution of

x_{t}

and

{\bar{x}}_{t - 1}

given

y_{1 : t - 1}

is

\begin{matrix} p (x_{t}, {\bar{x}}_{t - 1} | y_{1 : t - 1}) & = p (x_{t} | {\bar{x}}_{t - 1}) p ({\bar{x}}_{t - 1} | y_{1 : t - 1}) \\ = N (x_{t} | {\hat{A}}_{t} {\bar{x}}_{t - 1}, {\hat{Q}}_{t}) N ({\bar{x}}_{t - 1} | μ_{t - 1}^{(m)}, Σ_{t - 1}^{(m)}) \\ = N ((\begin{matrix} {\bar{x}}_{t - 1} \\ x_{t} \end{matrix})| μ^{(1)}, Σ^{(1)}), \end{matrix}

(53)

where

\begin{matrix} μ^{(1)} = (\begin{matrix} μ_{t - 1}^{(m)} \\ {\hat{A}}_{t} μ_{t - 1}^{(m)} \end{matrix}), Σ^{(1)} = (\begin{matrix} Σ_{t - 1}^{(m)} & Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} \\ {\hat{A}}_{t} Σ_{t - 1}^{(m)} & {\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t} \end{matrix}), \end{matrix}

(54)

and by Lemma 2 the marginal distribution of

x_{t}

is given by

\begin{matrix} p (x_{t} | y_{1 : t - 1}) = N (x_{t} | μ_{t}^{(p)}, Σ_{t}^{(p)}), \end{matrix}

(55)

where

\begin{matrix} μ_{t}^{(p)} = {\hat{A}}_{t} μ_{t - 1}^{(m)}, Σ_{t}^{(p)} = {\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t} . \end{matrix}

(56)

2. Update step. By Lemma 1, the joint distribution of

y_{t}

and

x_{t}

given

y_{1 : t - 1}

is

\begin{matrix} p (x_{t}, y_{t} | y_{1 : t - 1}) & = p (y_{t} | x_{t}) p (x_{t} | y_{1 : t - 1}) \\ = N (y_{t} | {\hat{C}}_{t} x_{t}, {\hat{S}}_{t}) N (x_{t} | μ_{t}^{(p)}, Σ_{t}^{(p)}) \\ = N ((\begin{matrix} x_{t} \\ y_{t} \end{matrix})| μ^{(2)}, Σ^{(2)}), \end{matrix}

(57)

where

\begin{matrix} μ^{(2)} = (\begin{matrix} μ_{t}^{(p)} \\ {\hat{C}}_{t} μ_{t}^{(p)} \end{matrix}), Σ^{(2)} = (\begin{matrix} Σ_{t}^{(p)} & Σ_{t}^{(p)} {\hat{C}}_{t}^{⊤} \\ {\hat{C}}_{t} Σ_{t}^{(p)} & {\hat{C}}_{t} Σ_{t}^{p} {\hat{C}}_{t}^{⊤} + {\hat{S}}_{t} \end{matrix}) . \end{matrix}

(58)

By Lemma 2 the conditional distribution of

x_{t}

given

y_{1 : t}

is

\begin{matrix} p (x_{t} | y_{t}, y_{1 : t - 1}) & = p (x_{t} | y_{1 : t}) \\ = N (x_{t} | μ^{(u)}, Σ^{(u)}), \end{matrix}

(59)

where

\begin{matrix} μ^{(u)} = μ_{t}^{(p)} + Σ_{t}^{(p)} {\hat{C}}_{t}^{⊤} {({\hat{C}}_{t} Σ_{t}^{(p)} {\hat{C}}_{t}^{⊤} + {\hat{S}}_{t})}^{- 1} (y_{t} - {\hat{C}}_{t} μ_{t}^{(p)}), \end{matrix}

(60)

\begin{matrix} Σ^{(u)} = Σ_{t}^{(p)} - Σ_{t}^{(p)} {\hat{C}}_{t}^{⊤} {({\hat{C}}_{t} Σ_{t}^{(p)} {\hat{C}}_{t}^{⊤} + {\bar{S}}_{t})}^{- 1} {\hat{C}}_{t} Σ_{t}^{(p)} . \end{matrix}

(61)

3. Merging step. By Lemma 1, the joint distribution of

{\bar{x}}_{t - 1}, x_{t}, y_{t}

given

y_{1 : t - 1}

is

\begin{matrix} p ({\bar{x}}_{t - 1}, x_{t}, y_{t} | y_{1 : t - 1}) & = p (y_{t} | x_{t}) p (x_{t}, {\bar{x}}_{t - 1} | y_{1 : t - 1}) \\ = N ({\hat{C}}_{t} x_{t}, {\hat{S}}_{t}) N ((\begin{matrix} {\bar{x}}_{t - 1} \\ x_{t} \end{matrix})| μ^{(1)}, Σ^{(1)}), \\ = N ((\begin{matrix} {\bar{x}}_{t - 1} \\ x_{t} \\ y_{t} \end{matrix})| μ^{(3)}, Σ^{(3)}), \end{matrix}

(62)

where

\begin{matrix} μ^{(3)} = (\begin{matrix} μ_{1}^{(3)} \\ μ_{2}^{(3)} \\ μ_{3}^{(3)} \end{matrix}) = (\begin{matrix} μ_{t - 1}^{(m)} \\ {\hat{A}}_{t} μ_{t - 1}^{(m)} \\ {\hat{C}}_{t} {\hat{A}}_{t} μ_{t - 1}^{(m)} \end{matrix}), \end{matrix}

(63)

\begin{matrix} Σ^{(3)} & = (\begin{matrix} Σ_{11}^{(3)} & Σ_{12}^{(3)} & Σ_{13}^{(3)} \\ Σ_{21}^{(3)} & Σ_{21}^{(3)} & Σ_{23}^{(3)} \\ Σ_{31}^{(3)} & Σ_{32}^{(3)} & Σ_{33}^{(3)} \end{matrix}), \\ = (\begin{matrix} Σ_{t - 1}^{(m)} & Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} & Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} {\hat{C}}_{t}^{⊤} \\ {\hat{A}}_{t} Σ_{t - 1}^{(m)} & {\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t} & ({\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t}) {\hat{C}}_{t}^{⊤} \\ {\hat{C}}_{t} {\hat{A}}_{t} σ_{t - 1}^{(m)} & {\hat{C}}_{t} ({\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}^{⊤} + {\hat{Q}}_{t}) & {\hat{C}}_{t} ({\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t}) {\hat{C}}_{t}^{⊤} + {\hat{S}}_{t} \end{matrix}) . \end{matrix}

(64)

By Lemma 2, the conditional distribution of

{\bar{x}}_{t - 1}

and

x_{t}

given

y_{1 : t}

is

\begin{matrix} p ({\bar{x}}_{t - 1}, x_{t} | y_{t}, y_{1 : t - 1}) & = p ({\bar{x}}_{t - 1}, x_{t} | y_{1 : t}) \\ = N ({\bar{x}}_{t - 1}, x_{t} | μ^{(4)}, Σ^{(4)}), \end{matrix}

(65)

where

\begin{matrix} μ^{(4)} = (\begin{matrix} μ_{1}^{(3)} \\ μ_{2}^{(3)} \end{matrix}) + (\begin{matrix} Σ_{13}^{(3)} \\ Σ_{23}^{(3)} \end{matrix}) {(Σ_{33}^{(3)})}^{- 1} (y_{t} - μ_{3}^{(3)}), \end{matrix}

(66)

\begin{matrix} Σ^{(4)} = Σ_{11}^{(3)} - (\begin{matrix} Σ_{13}^{(3)} \\ Σ_{23}^{(3)} \end{matrix}) {(Σ_{33}^{(3)})}^{- 1} (\begin{matrix} Σ_{13}^{(3)} & Σ_{23}^{(3)} \end{matrix}) . \end{matrix}

(67)

Finally, the marginal distribution of

{\bar{x}}_{t}

is

\begin{matrix} p ({\bar{x}}_{t} | y_{1 : t}) & = N ({\bar{x}}_{t} | μ_{t}^{(m)}, Σ_{t}^{(m)}) \\ = N ({\bar{x}}_{t} | μ^{(4)} ({\bar{x}}_{t}), Σ^{(4)} ({\bar{x}}_{t})), \end{matrix}

(68)

where

μ^{(4)} ({\bar{x}}_{t})

and

Σ^{(4)} ({\bar{x}}_{t})

are marginal mean vector and covariance matrix of

{\bar{x}}_{t}

. □

The Kalman filter for the time-varying HO-LDS in Theorem 1 calculates the estimates based on measurements collected up to and including time step t. After obtaining the filtering posterior state distributions, Theorem 2 outlines the results for calculating the marginal posterior distributions for each time step based on all data up to the final time step T, which is described as the following:

Theorem 2.

The Kalman smoother equations for the time-varing HO-LDS (Equation (43)) parameterized by

{\hat{θ}}_{x_{0 : T}}

can be evaluated in closed form, which results Gaussian distributions:

\begin{matrix} p (x_{t} | y_{1 : T}) = N (x_{t} | μ_{t}^{(s_{1})}, Σ_{t}^{(s_{1})}), \end{matrix}

(69)

\begin{matrix} p ({\bar{x}}_{t} | y_{1 : T}) = N ({\bar{x}}_{t} | μ_{t}^{(s_{2})}, Σ_{t}^{(s_{2})}), \end{matrix}

(70)

\begin{matrix} p ({\bar{x}}_{t - 1}, x_{t} | y_{1 : T}) = N (({\bar{x}}_{t - 1}, x_{t}) | μ_{t}^{(s_{3})}, Σ_{t}^{(s_{3})}) . \end{matrix}

(71)

The smoothing distribution and filtering distribution of the last time step are identical, allowing for the recursive computation of the smoothing distribution for all time steps by beginning from the last step and moving backward.

Proof.

Firstly by Lemma 1, the joint distribution of

{\bar{x}}_{t - 1}

and

x_{t}

given

y_{1 : t - 1}

is

\begin{matrix} p (x_{t}, {\bar{x}}_{t - 1} | y_{1 : t - 1}) & = p (x_{t} | {\bar{x}}_{t - 1}) p ({\bar{x}}_{t - 1} | y_{1 : t - 1}) \\ = N (x_{t} | {\hat{A}}_{t} {\bar{x}}_{t - 1}, {\hat{Q}}_{t}) N ({\bar{x}}_{t - 1} | μ_{t - 1}^{(m)}, Σ_{t - 1}^{(m)}) \\ = N ((\begin{matrix} {\bar{x}}_{t - 1} \\ x_{t} \end{matrix})| μ^{(1)}, Σ^{(1)}), \end{matrix}

(72)

where

\begin{matrix} μ^{(1)} = (\begin{matrix} μ_{t - 1}^{(m)} \\ {\hat{A}}_{t} μ_{t - 1}^{(m)} \end{matrix}), Σ^{(1)} = (\begin{matrix} Σ_{t - 1}^{(m)} & Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} \\ {\hat{A}}_{t} Σ_{t - 1}^{(m)} & {\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t} \end{matrix}) . \end{matrix}

(73)

According to the Markov property of the states, we obtain

\begin{matrix} p ({\bar{x}}_{t - 1} | x_{t}, y_{1 : T}) = p ({\bar{x}}_{t - 1} | x_{t}, y_{t - 1}) . \end{matrix}

(74)

Therefore, we have the conditional distribution as

\begin{matrix} p ({\bar{x}}_{t - 1} | x_{t}, y_{1 : T}) & = p ({\bar{x}}_{t - 1} | x_{t}, y_{t - 1}), \\ = N ({\bar{x}}_{t - 1} | {\tilde{μ}}^{(2)}, {\tilde{Σ}}^{(2)}), \end{matrix}

(75)

where

\begin{matrix} G_{t} = Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} {({\hat{A}}_{t} Σ_{t - 1}^{(m)} {\hat{A}}_{t}^{⊤} + {\hat{Q}}_{t})}^{- 1}, \\ {\tilde{μ}}^{(2)} = μ_{t - 1}^{(m)} + G_{t} (x_{t} - {\hat{A}}_{t} μ_{t - 1}^{(m)}), \\ {\tilde{Σ}}^{(2)} = Σ_{t - 1}^{(m)} - G_{t} {\hat{A}}_{t} Σ_{t - 1}^{(m)} . \end{matrix}

(76)

Then, the joint distribution of

{\bar{x}}_{t - 1}

and

x_{t}

given all the data is

\begin{matrix} p ({\bar{x}}_{t - 1}, x_{t} | y_{1 : T}) & = p ({\bar{x}}_{t - 1} | x_{t}, y_{1 : T}) p (x_{t} | y_{1 : T}) \\ = N ({\bar{x}}_{t - 1} | {\tilde{μ}}^{(2)}, {\tilde{Σ}}^{(2)}) N (x_{t} | μ_{t}^{(s_{1})}, Σ_{t}^{(s_{1})}) \\ = N ((\begin{matrix} {\bar{x}}_{t - 1} \\ x_{t} \end{matrix})| μ_{t}^{(s_{3})}, Σ_{t}^{(s_{3})}), \end{matrix}

(77)

where

\begin{matrix} μ_{t}^{(s_{3})} = (\begin{matrix} G_{t} μ_{t}^{(s_{1})} + μ_{t - 1}^{(m)} - G_{t} {\hat{A}}_{t} μ_{t - 1}^{(m)} \\ μ_{t}^{(s_{1})} \end{matrix}), \end{matrix}

(78)

\begin{matrix} Σ_{t}^{s 3} = (\begin{matrix} G_{t} Σ_{t}^{(s_{1})} G_{t}^{⊤} + {\tilde{Σ}}^{(2)} & Σ_{t}^{(s_{1})} G_{t}^{⊤} \\ G_{t} Σ_{t}^{(s_{1})} & Σ_{t}^{(s_{1})} \end{matrix}) . \end{matrix}

(79)

Thus, the marginal distribution of

{\bar{x}}_{t - 1}

is given as

\begin{matrix} p ({\bar{x}}_{t - 1} | y_{1 : T}) = N ({\bar{x}}_{t - 1} | μ_{t}^{(s_{2})}, Σ_{t}^{(s_{2})}), \end{matrix}

(80)

where

\begin{matrix} μ_{t}^{(s_{2})} = G_{t} μ_{t}^{(s_{1})} + μ_{t - 1}^{(m)} - G_{t} {\hat{A}}_{t} μ_{t - 1}^{(m)}, \end{matrix}

(81)

\begin{matrix} Σ_{t}^{(s_{2})} = G_{t} Σ_{t}^{(s_{1})} G_{t}^{⊤} + {\tilde{Σ}}^{(2)} . \end{matrix}

(82)

The

p (x_{t - 1} | y_{1 : T})

will be directly obtained from

p ({\bar{x}}_{t - 1} | y_{1 : T})

. Therefore, we have derived the concrete expressions of the Gaussian smoother distributions (Equations (69)–(71)). □

Thus, we can obtain the posterior distributions of

q (x_{0 : T} | y_{1 : T})

as well as the expectations involved in the update for the other factors, which are

E_{q (x_{0 : T})} (x_{t}), E_{q (x_{0 : T})} (x_{t} x_{t}^{⊤})

and

E_{q (x_{0 : T})} ({\bar{x}}_{t} x_{t + 1}^{⊤})

.

3.2. Update for $q (ω_{1 : T})$

q (ω_{1 : T})

can be updated using a closed-form expression by the Pólya-gamma augmentation. Equations (37) and (41) results in the following update for

q (ω_{1 : T}) = \prod_{t = 1}^{T} \prod_{k = 1}^{K - 1} q (ω_{t, k})

:

\begin{matrix} ln q (ω_{t, k}) & \propto E_{- q (ω_{1 : T})} ln p (z_{t + 1}, ω_{t} | x_{t}) \\ \propto E_{- q (ω_{1 : T})} ln p (z_{t + 1} | ω_{t}, x_{t}) + E_{- q (ω_{1 : T})} ln PG (ω_{t, k} | I (z_{t + 1} > k), 0) . \end{matrix}

(83)

Since

E_{- q (ω_{1 : T})} ln p (z_{t + 1} | ω_{t}, x_{t}) \propto - \frac{1}{2} E_{- q (ω_{1 : T})} (ν_{t, k}^{2}) ω_{t, k}

, we have

\begin{matrix} q (ω_{t, k}) = PG (ω_{t, k} | E_{q (z_{1 : T})} I (z_{t + 1} > k), E_{- q (ω_{1 : T})} (ν_{t, k}^{2})) . \end{matrix}

(84)

Moreover, the expectation of the posterior Pólya-gamma distribution is available in closed form [22]:

\begin{matrix} E (ω_{t, k}) = \frac{E_{q (z_{1 : T})} I (z_{t + 1} > k)}{2 E_{q (z_{1 : T}) q (x_{1 : T}) q (θ)} (ν_{t, k}^{2})} tanh [\frac{1}{2} E_{q (z_{1 : T}) q (x_{1 : T}) q (θ)} (ν_{t, k}^{2})] . \end{matrix}

(85)

3.3. Update for $q (z_{1 : T})$

Since

E_{- q (z_{1 : T})} ln p (θ)

is not a function of

z_{1 : T}

, Equation (35) can be written as

\begin{matrix} ln q (z_{1 : T}) & \propto E_{- q (z_{1 : T})} ln p (z_{1 : T}, x_{0 : T}, ω_{1 : T}, y_{1 : T} | θ) \\ \propto E_{- q (z_{1 : T})} [ln p (z_{1} | θ) + \sum_{t = 2}^{T} ln p (z_{t} | x_{t - 1}, z_{t - 1}, ω_{t - 1}, θ) + \sum_{t = 1}^{T} ln p (x_{t} | z_{t}, θ)] . \end{matrix}

(86)

From Equation (86) it is evident that the resulting factor

q (z_{1 : T})

has the form of an HMM parameterized by

q (x_{1 : T}, ω_{1 : T}, θ)

, and the expected sufficient statistics required for updating the other factors can be obtained by message passing algorithms. We define

ln {\tilde{π}}_{z_{1}} : = E_{- q (z_{1 : T})} ln p (z_{1} | θ)

,

ln {\tilde{a}}_{z_{t} z_{t + 1}} : = E_{- q (z_{1 : T})} ln p (z_{t + 1} | z_{t}, x_{t}, ω_{t}, θ)

,

ln {\tilde{b}}_{z_{t}} (x_{t}) : = E_{- q (z_{1 : T})}

ln p (x_{t} | z_{t}, θ)

and the set of the HMM parameters is denoted as

{\hat{θ}}_{z_{1 : T}}

. Thus, we have

\begin{matrix} ln q (z_{1 : T}) \propto ln {\tilde{π}}_{z_{1}} + \sum_{t = 2}^{T} ln {\tilde{a}}_{z_{t - 1} z_{t}} (t - 1) + \sum_{t = 1}^{T} ln {\tilde{b}}_{z_{t}} (x_{t}) . \end{matrix}

(87)

Note that the densities

p (z_{1} | θ), p (z_{t + 1} | z_{t}, x_{t}, ω_{t}, θ), p (x_{t} | z_{t}, θ)

belong to the exponential family, these expectations can be evaluated in closed form. The computation of the HMM parameters by Equation (86) is summarized in Algorithm 2.

We apply a standard forward-backward algorithm for an HMM where the forward recursion is:

\begin{matrix} α_{t} (i) : = p (z_{t} = i | x_{1 : t}) \propto {\tilde{b}}_{i} (x_{t}) \sum_{j = 1}^{K} α_{t - 1} (j), \end{matrix}

(88)

where

α_{1} (i) \propto {\tilde{π}}_{i} {\tilde{b}}_{i} (x_{1})

and the backward recursion is:

\begin{matrix} β_{t} (i) : = p (x_{t + 1 : T} | z_{t} = i) \propto \sum_{j = 1}^{K} β_{t} (j) {\tilde{b}}_{j} (x_{t}) {\tilde{a}}_{i j} (t) . \end{matrix}

(89)

Algorithm 2: Obtaining

{\hat{θ}}_{z_{1 : T}}

Then we can obtain the sufficient statistics used in the update for the other factors using the following equations:

\begin{matrix} γ_{t} (i) : = p (z_{t} = i | x_{1 : T}) \propto α_{t} (i) β_{t} (i), \end{matrix}

(90)

\begin{matrix} ξ_{t} (i, j) : = p (z_{t} = i, z_{t + 1} = j | x_{1 : T}) \propto α_{t} (i) {\tilde{a}}_{i j} (t) {\tilde{b}}_{j} (x_{t + 1}) β_{t + 1} (j) . \end{matrix}

(91)

3.4. Update for $q (θ)$

Given the sufficient statistics of

q (z_{1 : T})

and

q (x_{1 : T})

obtained by message passing, the conjugate posterior distribution of the model parameters can be updated in closed form by evaluating Equation (38). For each parameter, we present the optimized variational distribution in the following:

Update $q (π_{0})$ . Simplifying Equation (38) and extracting the terms related to $π_{0}$ , we have

$\begin{matrix} ln q (π_{0}) \propto E_{q (- θ)} ln p (z_{1} | θ) + ln p (π_{0}), \end{matrix}$

(92)

which results in

$\begin{matrix} q (π_{0}) \sim Dir (π_{0} | α_{0}^{'}), \end{matrix}$

(93)

where $α_{0}^{'} = [α_{0, 1}^{'}, \dots, α_{0, k}^{'}, \dots, α_{0, K}^{'}] \in R^{K}$ and

$\begin{matrix} α_{0, k}^{'} = α_{0, k} + γ_{1} (k), \end{matrix}$

(94)

$α_{0, k}$ are prior parameters of $p (π_{0})$ .
Update $(μ_{0}, Σ_{0})$ . The posterior $q (μ_{0}, Σ_{0})$ can be obtained by the expectations of $q (x_{0 : T})$ , given by

$\begin{matrix} μ_{0} \leftarrow E_{q (x_{0 : T})} ({\bar{x}}_{0}), \end{matrix}$

(95)

$\begin{matrix} Σ_{0} \leftarrow E_{q (x_{0 : T})} ({\bar{x}}_{0} {\bar{x}}_{0}^{⊤}) - μ_{0} {(μ_{0})}^{⊤} . \end{matrix}$

(96)
Update $q (A_{k}, Q_{k})$ . Simplifying Equation (38) and extracting the terms related to $(A_{k}, Q_{k})$ , we have

$\begin{matrix} ln q (A_{k}, Q_{k}) \propto \\ E_{- q (θ)} [- \frac{\sum_{t = 1}^{T} γ_{t} (k)}{2} ln | Q_{k} | - \frac{1}{2} \sum_{t = 1}^{T} γ_{t} (k) {(x_{t} - A_{k} {\bar{x}}_{t - 1})}^{⊤} Q_{k}^{- 1} (x_{t} - A_{k} {\bar{x}}_{t - 1})] \\ + ln p (A_{k}, Q_{k}), \end{matrix}$

(97)

which results in

$\begin{matrix} q (A_{k}, Q_{k}) \sim MNIW (A_{k}, Q_{k} | η_{k}^{'}), \end{matrix}$

(98)

where $η_{k}^{'} : = \{M_{k}^{(η^{'})}, Φ_{k}^{(η^{'})}, ι_{k}^{(η^{'})}, Δ_{k}^{(η^{'})}\}$ ,

$\begin{matrix} q (A_{k}, Q_{k}) & \propto | Q_{k} |^{- \frac{ι_{k}^{(η^{'})} + (p + 1) m + 1}{2}} exp {- \frac{1}{2} Tr (Δ_{k}^{(η^{'})} Q_{k}^{- 1}) \\ - \frac{1}{2} Tr [{(Φ_{k}^{(η^{'})})}^{- 1} {(A_{k} - M_{k}^{(η^{'})})}^{⊤} Q_{k}^{- 1} (A_{k} - M_{k}^{(η^{'})})]} . \end{matrix}$

(99)

The variational posterior parameters can be optimized as

$\begin{matrix} Φ_{k}^{(η^{'})} = {({(Φ_{k}^{(η)})}^{- 1} + \sum_{t = 1}^{T} E_{q (x_{0 : T})} (γ_{t} (k) {\bar{x}}_{t - 1} {\bar{x}}_{t - 1}^{⊤}))}^{- 1}, \\ M_{k}^{(η^{'})} = (M_{k}^{(η)} {(Φ_{k}^{(η)})}^{- 1} + \sum_{t = 1}^{T} E_{q (x_{0 : T})} (γ_{t} (k) x_{t} {\bar{x}}_{t - 1}^{⊤})) Φ_{k}^{(η^{'})}, \\ Δ_{k}^{(η^{'})} = Δ_{k}^{(η)} + \sum_{t = 1}^{T} E_{q (x_{0 : T})} (γ_{t} (k) x_{t} x_{t}^{⊤}) + M_{k}^{(η)} {(Φ_{k}^{(η)})}^{- 1} {(M_{k}^{(η)})}^{⊤} - M_{k}^{(η^{'})} {(Φ_{k}^{(η^{'})})}^{- 1} {(M_{k}^{(η^{'})})}^{⊤}, \\ ι_{k}^{(η^{'})} = ι_{k}^{(η)} + \sum_{t = 1}^{T} γ_{t} (k), \end{matrix}$

(100)

where $M_{k}^{(η)}, Φ_{k}^{(η)}, ι_{k}^{(η)}, Δ_{k}^{(η)}$ are prior parameters of $p (A_{k}, Q_{k})$ .
Update $q (C_{k}, S_{k})$ . Simplifying Equation (38) and extracting the terms related to $(C_{k}, S_{k})$ , we have

$\begin{matrix} ln q (C_{k}, S_{k}) \propto \\ E_{- q (θ)} [- \frac{\sum_{t = 1}^{T} γ_{t} (k)}{2} ln | S_{k} | - \frac{1}{2} \sum_{t = 1}^{T} {(y_{t} - C_{k} x_{t})}^{⊤} S_{k}^{- 1} (y_{t} - C_{k} x_{t})] \\ + ln p (C_{k}, S_{k}), \end{matrix}$

(101)

which results in

$\begin{matrix} q (C_{k}, S_{k}) \sim MNIW (C_{k}, S_{k} | λ_{k}^{'}), \end{matrix}$

(102)

where $λ_{k}^{'} : = \{M_{k}^{(λ^{'})}, Φ_{k}^{(λ^{'})}, ι_{k}^{(λ^{'})}, Δ_{k}^{(λ^{'})}\}$ ,

$\begin{matrix} q (C_{k}, S_{k}) & \propto | S_{k} |^{- \frac{ι_{k}^{(λ^{'})} + (p + 1) m + 1}{2}} exp {- \frac{1}{2} Tr (Δ_{k}^{(λ^{'})} S_{k}^{- 1}) \\ - \frac{1}{2} Tr [{(Φ_{k}^{(λ^{'})})}^{- 1} {(C_{k} - M_{k}^{(λ^{'})})}^{⊤} Q_{k}^{- 1} (C_{k} - M_{k}^{(λ^{'})})]} . \end{matrix}$

(103)

The variational posterior parameters can be optimized as

$\begin{matrix} Φ_{k}^{(λ^{'})} = {({(Φ_{k}^{(λ)})}^{- 1} + \sum_{t = 1}^{T} E_{q (x_{0 : T})} (γ_{t} (k) x_{t} x_{t}^{⊤}))}^{- 1}, \\ M_{k}^{(λ^{'})} = (M_{k}^{(λ)} {(Φ_{k}^{(λ)})}^{- 1} + \sum_{t = 1}^{T} E_{q (x_{0 : T})} (γ_{t} (k) y_{t} x_{t}^{⊤})) Φ_{k}^{(λ^{'})}, \\ Δ_{k}^{(λ^{'})} = Δ_{k}^{(λ)} + \sum_{t = 1}^{T} γ_{t} (k) y_{t} y_{t}^{⊤} + M_{k}^{(λ)} {(Φ_{k}^{(λ)})}^{- 1} {(M_{k}^{(λ)})}^{⊤} - M_{k}^{(λ^{'})} {(Φ_{k}^{(λ^{'})})}^{- 1} {(M_{k}^{(λ^{'})})}^{⊤}, \\ ι_{k}^{(λ^{'})} = ι_{k}^{(λ)} + \sum_{t = 1}^{T} γ_{t} (k), \end{matrix}$

(104)

where $M_{k}^{(λ)}, Φ_{k}^{(λ)}, ι_{k}^{(λ)}, Δ_{k}^{(λ)}$ are prior parameters of $p (C_{k}, S_{k})$ .
Update $q (R_{k})$ . Simplifying Equation (38) and extracting the terms related to $R_{k}$ , we have

$\begin{matrix} ln q (R_{k}) \propto \\ E_{- q (θ)} [- \frac{1}{2} \sum_{t = 1}^{T - 1} γ_{t} (k) {(R_{k} x_{t} + r_{k} - Ω_{t}^{- 1} κ_{t + 1})}^{⊤} Ω_{t} (R_{k} x_{t} + r_{k} - Ω_{t}^{- 1} κ_{t + 1})] \\ + \sum_{i = 1}^{K - 1} ln p (R_{k, i}), \end{matrix}$

(105)

which results in

$\begin{matrix} q (R_{k}) \sim \prod_{i = 1}^{K - 1} N (R_{k, i} | ζ_{k, i}^{'}), \end{matrix}$

(106)

where $ζ_{k, i}^{'} : = \{μ_{k, i}^{(ζ^{'})}, Σ_{k, i}^{(ζ^{'})}\}$ , which can be optimized as

$\begin{matrix} Σ_{k, i}^{(ζ^{'})} = {[{(Σ_{k, i}^{(ζ)})}^{- 1} + \sum_{t = 1}^{T} E_{q (ω_{1 : T}, x_{0 : T})} (x_{t} γ_{t} (k) Ω_{t, i} x_{t}^{⊤})]}^{- 1}, \end{matrix}$

(107)

$\begin{matrix} μ_{k, i}^{(ζ^{'})} = Σ_{k, i}^{(ζ^{'})} [\sum_{t = 1}^{T - 1} E_{q (ω_{1 : T}) q (x_{0 : T})} (γ_{t} (k) Ω_{t, i} H_{t, k, i} x_{t}) + {(Σ_{k, i}^{(ζ)})}^{- 1} μ_{k, i}^{(ζ)}], \end{matrix}$

(108)

$\begin{matrix} H_{t, k} = {[H_{t, k, 1}, \dots, H_{t, k, i}, \dots, H_{t, k, K - 1}]}^{⊤} = E_{q (r_{k}) q (ω_{1 : T})} (r_{k} - Ω_{t}^{- 1} κ_{t + 1}), \end{matrix}$

(109)

and $μ_{k, i}^{(ζ)}, Σ_{k, i}^{(ζ)}$ are prior parameters of $p (R_{k, i})$ . To derive Equation (107), just note that $R_{k}^{⊤} Ω_{t} R_{k} = \sum_{i = 1}^{K - 1} R_{k, i} Ω_{k, i} R_{k, i}^{⊤}$ and use the matrix trace.
Update $q (r_{k})$ . Simplifying Equation (38) and extracting the terms related to $r_{k}$ , we have

$\begin{matrix} ln q (r_{k}) \propto \\ E_{- q (θ)} [- \frac{1}{2} \sum_{t = 1}^{T - 1} γ_{t} (k) {(R_{k} x_{t} + r_{k} - Ω_{t}^{- 1} κ_{t + 1})}^{⊤} Ω_{t} (R_{k} x_{t} + r_{k} - Ω_{t}^{- 1} κ_{t + 1})] \\ + ln p (r_{k}), \end{matrix}$

(110)

which results in

$\begin{matrix} q (r_{k}) \sim N (r_{k} | ρ_{k}^{'}), \end{matrix}$

(111)

where $ρ_{k}^{'} : = \{μ_{k}^{(ρ^{'})}, Σ_{k}^{(ρ^{'})}\}$ , which can be optimized as

$\begin{matrix} Σ_{k}^{(ρ^{'})} = {[{(Σ_{k}^{(ρ)})}^{- 1} + \sum_{t = 1}^{T - 1} E_{q (ω_{1 : T})} (γ_{t} (k) Ω_{t})]}^{- 1}, \end{matrix}$

(112)

$\begin{matrix} μ_{k}^{(ρ^{'})} = Σ_{k}^{(ρ^{'})} [{(Σ_{k}^{(ρ)})}^{- 1} μ_{k}^{(ρ)} + \sum_{t = 1}^{T - 1} E_{q (R_{k}) q (x_{0 : T}) q (ω_{1 : T})} (γ_{t} (k) Ω_{t} (R_{k} x_{t} - Ω_{t}^{- 1} κ_{t + 1}))], \end{matrix}$

(113)

where $μ_{k}^{(ρ)}, Σ_{k}^{(ρ)}$ are prior parameters of $p (r_{k})$ .

3.5. Initialization

To obtain a reliable variational inference, we initialized the model parameters and latent states with reasonable values by using the following initialization procedure:

Probabilistic Principal Component Analysis (PPCA) [27] is conducted on the data, to initialize the continuous latent variables, $x_{0 : T}$ and the parameters C;
Fit an AR(p)-HMM to initialize the switching states, $z_{1 : T}$ and the parameters, $\{A_{k}, Q_{k}\}$ . The autoregressive order, p, is determined by AIC, BIC, and HQ criterion;
To alleviate the possible and undesirable dependence on ordering that arises from the stick-breaking formulation during the inference, we adopt a strategy of greedily fitting a decision list [21] to identify the most suitable permutation of the switching states for the stick-breaking process. Specifically, we start by performing a greedy search on permutations by creating a decision list based on $(x_{t}, z_{t}), z_{t + 1}$ pairs, given by

$\begin{matrix} z_{t + 1} = \{\begin{matrix} o_{1} & if p_{1} \\ o_{2} & if \neg p_{1} \cap p_{2} \\ ⋮ \\ o_{K} & if \cap_{i = 1}^{K - 1} (\neg p_{i}) \end{matrix}, \end{matrix}$

(114)

where $(o_{1}, \dots, o_{K})$ is a permutation of $(1, \dots, K)$ , and $p_{1}, \dots, p_{k}$ are predicates that rely on $(x_{t}, z_{t})$ and provide a true or false. In our framework, these predicates are defined by logistic functions:

$\begin{matrix} p_{j} = σ (R_{0, j}^{⊤} x_{t}) > ε, \end{matrix}$

(115)

where $ε \in (0, 1)$ should be predetermined. To determine $o_{1}$ and $r_{1}$ , we used the maximum a posteriori estimate of the model for each of the K potential states. For the kth logistic regression, the inputs are $x_{0 : T}$ and the outputs are $y_{t} = I (z_{t + 1} = k)$ . We chose the logistic regression model with the largest likelihood as the first output. Then we excluded time points where $z_{t + 1} = o_{1}$ from the data and proceeded to $K - 1$ logistic regressions to predict the subsequent output, $o_{2}$ , and so on. After cycling through all K results, we obtained the permutation of $z_{1 : T}$ . In addition, the ${\{R_{0, j}\}}_{j = 1}^{K - 1}$ served as an initialization of the recurrence weights (R).

4. Numerical Experiments

To investigate the performance of the proposed HO-rSLDS, we generated synthetic datasets through the following steps:

Benchmark model settings.We take AR-HSMM as a benchmark model to generate synthetic time series. Specifically, the dimensionality of the synthetic time series is $n = 5$ , and the total length is $T = 1000$ . The state number is $K = 3$ , and the autoregressive order p takes values $\{1, 2, 3\}$ .
Generating $(A_{k}, Q_{k})$ . $(A_{k}, Q_{k})$ corresponds to the emission parameters involved in the VAR process of latent variables $x_{t}$ (Equation (11)). The autoregressive coefficient matrices $A_{k}$ are generated with 50% sparsity (defined as the proportion of the zero elements), i.e., 50% of the elements in $A_{k}$ are 0. The non-zero elements of $A_{k}$ are generated to be positive or negative with equal probability, and their absolute value is sampled uniformly between 0.2 and 0.5. The covariance matrices $Q_{k}$ are generated as $Q_{k} = P_{k}^{⊤} diag (σ_{1}, \dots, σ_{d}) P_{k}$ , with $P_{k}$ being an orthogonal matrix and $σ_{i}$ assumed positive. The matrix $P_{k}$ is constructed by orthogonalizing a random matrix whose entries are simulated from a standard Normal, while each $σ_{i}$ is uniformly sampled in the interval $[1, 3]$ . Additionally, a rejection step is done to check that the sampled $(A_{k}, Q_{k})$ constituted a stable VAR process.
Generating switching state sequence $z_{1 : T}$ . To generate $z_{1 : T}$ , we set the state transition probabilities $π_{j k}$ to 0.5, for $j \neq k = 1, 2, 3$ . Note that in an HSMM, the self-transition probability is 0. Furthermore, we simulate the state duration time in two cases. In the first case, the state duration time is sampled from a geometric distribution (the between-state transition probabilities are set to 0.1). In the second case, the state duration time is directly sampled from $\{1, \dots, 10\}$ with specified sampling probabilities. Since the second case does not correspond to a parametric distribution, we denote these two cases as geometric and nonparametric, respectively. In addition, the initial state is uniformly sampled from $\{1, \dots, 10\}$ .
Generating $(C_{k}, S_{k})$ . $(C_{k}, S_{k})$ corresponds to the emission parameters generating the Gaussian observations $y_{t}$ (Equation (12)). We generate $(C_{k}, S_{k})$ in the same manner as $(A_{k}, Q_{k})$ .
Repeat the above procedure 100 times to obtain 100 synthetic data by setting different random seeds when generating $(A_{k}, Q_{k}, C_{k}, S_{k})$ .

Figure 4 displays all dimensions and the true state sequence of one synthetic dataset.

We investigated the performance of HO-rSLDS by comparing with the competing models, while the estimation accuracy is evaluated by the normalized mean squared error (NMSE) between the true and estimated emission model parameters and the signal-to-noise ratio (SNR) of the inferred optimized switching state sequence (

{\hat{z}}_{1 : T}

), given by

\begin{matrix} NMSE = \frac{1}{K} \sum_{k = 1}^{K} \frac{∥ {\hat{C}}_{k} - C_{k} ∥_{F}}{∥ C_{k} ∥_{F}}, \end{matrix}

(116)

\begin{matrix} SNR = \frac{\sum_{t = 1}^{T} I (z_{t} = {\hat{z}}_{t})}{T}, \end{matrix}

(117)

where

{∥ \cdot ∥}_{F}

denotes Frobenius norm,

{\hat{C}}_{k}

and

{\hat{z}}_{1 : T}

are estimated values. For AR-HMM, the NMSE is evaluated based on the true and estimated VAR emission parameters. To obtain

{\hat{z}}_{1 : T}

, we set

{\hat{z}}_{t} = \underset{k}{arg max} γ_{t} (k)

. In addition, we avoid the optimization about discrete state number (K) and the dependence order (p) in HO-rSLDS by directly setting them to the true value.

The results of estimated NMSE and SNR for the three models are summarized in Table 1 and Table 2, respectively. It is observed that HO-rSLDS outperforms AR-HMM and SLDS with lower average NMSE and higher average SNR for all the autoregressive orders and state duration distribution combinations. In addition, when

p = 1

, for both geometric and nonparametric state duration, SLDS outperforms AR-HMM slightly, which may be because the synthetic data are generated by a simulated extensive SLDS structure. However, when

p > 1

and there is long-term dependence involved in the latent variables or observation data, it is just the opposite, since AR-HMM with higher p can capture the temporal relationships in the observation data while SLDS cannot. Moreover, when the transition of the underlying discrete states is characterized by a nonparametric duration distribution, the estimation accuracy of AR-HMM and SLDS is reduced, while HO-rSLDS remains stable with good estimation accuracy. This is because state transitions in both AR-HMM and SLDS are characterized by a Markov assumption and restricted to geometric duration distribution. Thus, more weight is given to shorter consecutive time periods within a certain state, which means the regimes switch frequently, which may not be appropriate in cases without sufficient prior information about a geometric state duration. By comparison, HO-rSLDS exhibits robustness to the state duration distributions since it improves the state transition by leveraging the stick-breaking logistic regression. The current state

z_{t}

no longer depends on only the preceding state

z_{t - 1}

, but also the latent variable

x_{t - 1}

, and, furthermore, the previous observations.

A more exhaustive comparison is illustrated in Figure 5 where HO-rSLDS outperforms the competing models with better estimation accuracy and statistical significance in all the cases (t-test, **

p < 0.01

).

5. Dynamic Functional Connectivity Analysis in fMRI Data

To show that HO-rSLDS can also work well with real-world data, we evaluate the performance of HO-rSLDS by comparing it with standard SLDS based on public functional magnetic resonance imaging (fMRI) data. The data are collected in the Human Connectome Project (HCP) 1200 Parcellation Timeseries Netmats (HCP1200-TPN) dataset (Available at https://www.humanconnectome.org, (accessed on 6 March 2024)). Please see [28] for a detailed explanation of the entire acquisition protocol. In this study, we apply HO-rSLDS to data from 20 unrelated subjects with dimension

d = 15

and times series length

T_{0} = 4800

for each subject included in the HCP1200-PTN release to perform dynamic functional connectivity (DFC) analysis, which can be used to analysis the dynamical temporal coherence among endogenous fluctuations in distributed brain regions [29,30]. In general, HO-rSLDS and SLDS assume that there are metastable states with characteristic connectivity pattern in the brain, while the connectivity pattern can be represented by the statistical correlations directly. Specifically, by Lemma 1, the functional connectivity pattern of state k in HO-rSLDS and SLDS can be obtained as

\begin{matrix} {DFC}_{k} = \frac{\sum_{t = 1}^{T} γ_{t} (k) [E C_{k} (E x_{t} x_{t}^{⊤} - E x_{t} E x_{t}^{⊤}) E C_{k}^{⊤} + E S_{k}]}{\sum_{t = 1}^{T} γ_{t} (k)} . \end{matrix}

(118)

In addition, there are two unknown parameters: the state number K and the dependence order p. We use the mixture minimum description length (MMDL) considering optimal code lengths for each state in a mixed model [31], given by

\begin{matrix} MMDL (K) : = - ln L (K) + \frac{1}{2} ln (T) K (K - 1) + \sum_{k = 1}^{K} \frac{1}{2} ln (T {\bar{γ}}_{k}) d^{2}, \end{matrix}

(119)

where

L (K)

is the variational lower bound evaluated during the model inference and

{\bar{γ}}_{k} : = \sum_{t = 1}^{T} γ_{t} (k)

. The concept of MMDL may be elucidated by the minimal description length principle in [32]. The state number K is determined by minimizing the MMDL on a specified list of K. In addition, we fit VAR model on the fMRI time series and the optimal dependence order p will be determined by AIC, BIC and HQ criteria [33], given by

\begin{matrix} AIC : = ln (det (\hat{Σ})) + \frac{2 p d^{2}}{T}, \end{matrix}

(120)

\begin{matrix} BIC : = ln (det (\hat{Σ})) + \frac{ln (T) p d^{2}}{T}, \end{matrix}

(121)

\begin{matrix} HQ : = ln (det (Σ)) + \frac{2 ln (ln T) p d^{2}}{T}, \end{matrix}

(122)

where

\hat{Σ}

is the estimated covariance matrix of noise in VAR model. To avoid the model order too large to permit feasible computation, we choose the lag order on the condition when AIC/BIC/HQ show no further substantial decreases at higher orders [34]. As illustrated in Figure 6, we set

K = 4

according to the minimal MMDL and

p = 4

since the downward trend of AIC, BIC and HQ tends to be flat with higher order.

In Figure 7, we present the DFC patterns derived from the proposed HO-rSLDS and the standard SLDS. The 15 brain regions producing the fMRI time series are indexed by numbers 1 to 15. The states are ordered by their fractional occupancy (defined as the percentage of time allocated to that specific state); it is evident that the 4 states from SLDS have a more average fractional occupancy distribution. Moreover, it is observed that DFC patterns derived from SLDS are more similar than those from HO-rSLDS, where there is only a slight difference in DFC patterns between State4 and the other three states from SLDS. We use cosine similarity to measure the difference of the DFC patterns characterized by each state, given by

\begin{matrix} cos (F_{1}, F_{2}) = \frac{Tr (F_{1} F_{2})}{∥ F_{1} ∥_{F} {∥ F_{2} ∥}_{F}}, \end{matrix}

(123)

where

F_{1}

and

F_{2}

are square matrices. By this means, the averaged cosine similarity for SLDS is 0.9189 while that for HO-rSLDS is 0.6731. Although the real DFC patterns of this fMRI data are unknown, compared to standard SLDS, HO-rSLDS can indeed dig out more information about the DFC patterns by the model superiority. For example, the State4 in HO-rSLDS, with the lowest fractional occupancy, can be considered as a strongly connected state where the absolute functional connectivity is significantly larger than that of the other three states. Moreover, compared with standard SLDS, there is significantly positive or negative functional connectivity in the four states derived from HO-rSLDS. In addition, there is similar functional connectivity between State1 and State2, which results in a higher cosine similarity than the averaged value for HO-rSLDS (0.7776). The similar functional connectivity is not exhibited obviously in State3 (such as the negative connectivity between region 8 and regions 11, 12, 13). However, these findings cannot be obtained by standard SLDS.

6. Conclusions

In this study, we have established a new method named HO-rSLDS as an extension of SLDS that improves the statistical modeling of dynamics from complex systems. First, HO-rSLDS addresses the problem of discovering a higher-order temporal dependence involved in the latent variables or observation data by allowing the current latent variable to be associated with the measurements of several previous time steps as in AR-HMM. Second, HO-rSLDS improves the switching state transition by utilizing stick-breaking logistic regression and Pólya-gamma augmentation, making the transition dependent on the latent variable and thus overcoming the limitations of the Markov assumption, leading to a geometric distribution of state durations. Moreover, the symmetric dependency between the switching states and the latent variable is recovered from the asymmetric relationships as in standard SLDS. We have presented the detailed inference algorithms of the HO-rSLDS for message passing and parameter learning. In numerical experiments, we concluded that HO-rSLDS outperforms the competing models with higher estimation accuracy and robustness. The utility and versatility of the developed HO-rSLDS have been demonstrated on real-world data as well. The application of fMRI data for DFC analyses indicates the superiority of the method in discovering abundant information about DFC patterns. A potential limitation of the proposed HO-rSLDS is that the state number and dependency order need to be predetermined in applications, while the hyperparameter optimization is always a non-trivial task. Future research will improve the proposed model by utilizing Bayesian nonparametric methods so that the complicated hyperparameter optimization can be efficiently avoided [35]. The proposed model improves the current methods for learning the switching linear dynamical modes, which will facilitate the identification and assessment of the dynamics of complex systems.

Author Contributions

Conceptualization, H.W. and J.C.; methodology, H.W.; software, H.W.; validation, H.W.; formal analysis, H.W. and J.C.; investigation, H.W.; resources, H.W. and J.C.; data curation, J.C.; writing—original draft preparation, H.W.; writing—review and editing, J.C.; visualization, H.W.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation (grant number 81671633), and the project was supported by the Open Fund of Hubei Longzhong Laboratory.

Data Availability Statement

The data that supported the findings of this study are included in this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SLDS	Switching linear dynamical systems
HO-SLDS	Switching linear dynamical systems with higher-order dependence
HO-rSLDS	Recurrent switching linear dynamical systems with higher-order dependence
HMM	Hidden Markov model
VAR	Vector autoregressive
AR-HMM	Autoregressive hidden Markov model
AR-HSMM	Autoregressive hidden semi-Markov model
NMSE	Normalized mean squared error
SNR	Signal-to-noise ratio
fMRI	Functional magnetic resonance imaging
DFC	Dynamic functional connectivity

References

Fox, E.; Sudderth, E.B.; Jordan, M.I.; Willsky, A.S. Bayesian nonparametric inference of switching dynamic linear models. IEEE Trans. Signal Process. 2011, 59, 1569–1585. [Google Scholar] [CrossRef]
Pandarinath, C.; O’Shea, D.J.; Collins, J.; Jozefowicz, R.; Stavisky, S.D.; Kao, J.C.; Trautmann, E.M.; Kaufman, M.T.; Ryu, S.I.; Hochberg, L.R.; et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods 2018, 15, 805–815. [Google Scholar] [CrossRef] [PubMed]
Frigola, R.; Chen, Y.; Rasmussen, C.E. Variational Gaussian process state-space models. In Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
Krishnan, R.; Shalit, U.; Sontag, D. Structured inference networks for nonlinear state space models. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020. [Google Scholar]
de Souza Baptista, R.; Bó, A.P.; Hayashibe, M. Automatic human movement assessment with switching linear dynamic system: Motion segmentation and motor performance. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 25, 628–640. [Google Scholar] [CrossRef] [PubMed]
Liu, H.; Wang, L. Human motion prediction for human-robot collaboration. J. Manuf. Syst. 2017, 44, 287–294. [Google Scholar] [CrossRef]
Alameda-Pineda, X.; Drouard, V.; Horaud, R.P. Variational inference and learning of piecewise linear dynamical systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3753–3764. [Google Scholar] [CrossRef] [PubMed]
Ma, T.; Srinivasan, S.; Lazarou, G.; Picone, J. Continuous speech recognition using linear dynamic models. Int. J. Speech Technol. 2014, 17, 11–16. [Google Scholar] [CrossRef]
Pagan, A.R.; Pesaran, M.H. Econometric analysis of structural systems with permanent and transitory shocks. J. Econ. Dyn. Control 2008, 32, 3376–3395. [Google Scholar] [CrossRef]
Haluszczynski, A.; Räth, C. Controlling nonlinear dynamical systems into arbitrary states using machine learning. Sci. Rep. 2021, 11, 12991. [Google Scholar] [CrossRef] [PubMed]
Smith, J.F.; Pillai, A.; Chen, K.; Horwitz, B. Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage 2010, 52, 1027–1040. [Google Scholar] [CrossRef]
Wang, E.T.; Vannucci, M.; Haneef, Z.; Moss, R.; Rao, V.R.; Chiang, S. A Bayesian switching linear dynamical system for estimating seizure chronotypes. Proc. Natl. Acad. Sci. USA 2022, 119, e2200822119. [Google Scholar] [CrossRef]
Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef]
Vidaurre, D. A new model for simultaneous dimensionality reduction and time-varying functional connectivity estimation. PLoS Comput. Biol. 2021, 17, e1008580. [Google Scholar] [CrossRef] [PubMed]
Hamada, R.; Kubo, T.; Ikeda, K.; Zhang, Z.; Shibata, T.; Bando, T.; Hitomi, K.; Egawa, M. Modeling and prediction of driving behaviors using a nonparametric Bayesian method with AR models. IEEE Trans. Intell. Veh. 2016, 1, 131–138. [Google Scholar] [CrossRef]
Houpt, J.W.; Frame, M.E.; Blaha, L.M. Unsupervised parsing of gaze data with a beta-process vector auto-regressive hidden Markov model. Behav. Res. Methods 2018, 50, 2074–2096. [Google Scholar] [CrossRef] [PubMed]
Vidaurre, D.; Quinn, A.J.; Baker, A.P.; Dupret, D.; Tejero-Cantero, A.; Woolrich, M.W. Spectrally resolved fast transient brain states in electrophysiological data. Neuroimage 2016, 126, 81–95. [Google Scholar] [CrossRef] [PubMed]
Glennie, R.; Adam, T.; Leos-Barajas, V.; Michelot, T.; Photopoulou, T.; McClintock, B.T. Hidden Markov models: Pitfalls and opportunities in ecology. Methods Ecol. Evol. 2023, 14, 43–56. [Google Scholar] [CrossRef]
Johnson, M.J.; Willsky, A.S. Bayesian Nonparametric Hidden Semi-Markov Models. J. Mach. Learn. Res. 2013, 14, 673–701. [Google Scholar]
Linderman, S.W.; Johnson, M.J.; Miller, A.C.; Adams, R.P.; Blei, D.M.; Paninski, L. Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017; Volume 54, pp. 914–922. [Google Scholar]
Polson, N.G.; Scott, J.G.; Windle, J. Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc. 2013, 108, 1339–1349. [Google Scholar] [CrossRef]
Jacobs, W.R.; Baldacchino, T.; Dodd, T.; Anderson, S.R. Sparse Bayesian nonlinear system identification using variational inference. IEEE Trans. Autom. Control 2018, 63, 4172–4187. [Google Scholar] [CrossRef]
Yu, S.Z. Hidden semi-Markov models. Artif. Intell. 2010, 174, 215–243. [Google Scholar] [CrossRef]
Liu, Q.; Liu, W.; Dong, M.; Li, Z.; Zheng, Y. Residual useful life prognosis of equipment based on modified hidden semi-Markov model with a co-evolutional optimization method. Comput. Ind. Eng. 2023, 182, 109433. [Google Scholar] [CrossRef]
Särkkä, S.; Svensson, L. Bayesian Filtering and Smoothing; Cambridge University Press: Cambridge, UK, 2023; Volume 17. [Google Scholar]
Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 1999, 61, 611–622. [Google Scholar] [CrossRef]
Glasser, M.F.; Sotiropoulos, S.N.; Wilson, J.A.; Coalson, T.S.; Fischl, B.; Andersson, J.L.; Xu, J.; Jbabdi, S.; Webster, M.; Polimeni, J.R.; et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 2013, 80, 105–124. [Google Scholar] [CrossRef] [PubMed]
Hutchison, R.M.; Womelsdorf, T.; Allen, E.A.; Bandettini, P.A.; Calhoun, V.D.; Corbetta, M.; Della Penna, S.; Duyn, J.H.; Glover, G.H.; Gonzalez-Castillo, J.; et al. Dynamic functional connectivity: Promise, issues, and interpretations. Neuroimage 2013, 80, 360–378. [Google Scholar] [CrossRef] [PubMed]
Motlaghian, S.; Vahidi, V.; Baker, B.; Belger, A.; Bustillo, J.; Faghiri, A.; Ford, J.; Iraji, A.; Lim, K.; Mathalon, D.; et al. A method for estimating and characterizing explicitly nonlinear dynamic functional network connectivity in resting-state fMRI data. J. Neurosci. Methods 2023, 389, 109794. [Google Scholar] [CrossRef]
Zhang, G.; Cai, B.; Zhang, A.; Stephen, J.M.; Wilson, T.W.; Calhoun, V.D.; Wang, Y.P. Estimating dynamic functional brain connectivity with a sparse hidden Markov model. IEEE Trans. Med. Imaging 2019, 39, 488–498. [Google Scholar] [CrossRef] [PubMed]
Barron, A.; Rissanen, J.; Yu, B. The minimum description length principle in coding and modeling. IEEE Trans. Inf. Theory 1998, 44, 2743–2760. [Google Scholar] [CrossRef]
Gredenhoff, M.; Karlsson, S. Lag-length selection in VAR-models using equal and unequal lag-length procedures. Comput. Stat. 1999, 14, 171–187. [Google Scholar] [CrossRef]
Brovelli, A.; Ding, M.; Ledberg, A.; Chen, Y.; Nakamura, R.; Bressler, S.L. Beta oscillations in a large-scale sensorimotor cortical network: Directional influences revealed by Granger causality. Proc. Natl. Acad. Sci. USA 2004, 101, 9849–9854. [Google Scholar] [CrossRef]
Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet Processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581. [Google Scholar] [CrossRef]

Figure 1. Graphical model of HMM (a) and HSMM (b). An HMM typically consists of two layers: a hidden state layer based on a Markov chain and an observation layer dependent on the current state. An HSMM enhances the generative process of the standard HMM with a random state duration time, which is drawn from some state-specific distribution when the state is entered. K denotes the cardinality of the hidden state space, which can be a finite or infinite value.

Figure 2. The graphical model of the SLDS, which consists of switching states (

z_{t}

), latent variables (

x_{t}

) and observations (

y_{t}

). Note that we assume it does not contain a link from

z_{t}

to

y_{t}

.

Figure 2. The graphical model of the SLDS, which consists of switching states (

z_{t}

), latent variables (

x_{t}

) and observations (

y_{t}

). Note that we assume it does not contain a link from

z_{t}

to

y_{t}

.

Figure 3. The graphical model of the HO-rSLDS, where there is higher-order dependence involved in

x_{t}

(blue) and the recurrent dependence from

x_{t - 1}

to

z_{t}

(red).

Figure 3. The graphical model of the HO-rSLDS, where there is higher-order dependence involved in

x_{t}

(blue) and the recurrent dependence from

x_{t - 1}

to

z_{t}

(red).

Figure 4. Simulated time series with dimension

d = 5

and total time points

T = 1000

(the last 200 time points are exhibited). The five dimensions are displayed in different colors. The color intervals indicate the state sequence.

Figure 4. Simulated time series with dimension

d = 5

and total time points

T = 1000

(the last 200 time points are exhibited). The five dimensions are displayed in different colors. The color intervals indicate the state sequence.

Figure 5. Box-plot of NMSE and SNR derived from AR-HMM, SLDS and HO-rSLDS based on the 100 numerical experiments. HO-rSLDS outperforms the competing models with significant lower NMSE and higher SNR on average for all the autoregressive order and state duration distribution combination. (t-test, **

p < 0.01

).

Figure 5. Box-plot of NMSE and SNR derived from AR-HMM, SLDS and HO-rSLDS based on the 100 numerical experiments. HO-rSLDS outperforms the competing models with significant lower NMSE and higher SNR on average for all the autoregressive order and state duration distribution combination. (t-test, **

p < 0.01

).

Figure 6. (a) State number (K) determined by MMDL. The best

K = 4

is chosen with respect to the minimal MMDL. The y-axis has been logarithmically scaled to highlight the variation. (b) Dependence order p is determined by AIC, BIC, and HQ criteria. The ideal p is set to 4 since there is no further substantial decrease with higher order.

Figure 6. (a) State number (K) determined by MMDL. The best

K = 4

is chosen with respect to the minimal MMDL. The y-axis has been logarithmically scaled to highlight the variation. (b) Dependence order p is determined by AIC, BIC, and HQ criteria. The ideal p is set to 4 since there is no further substantial decrease with higher order.

Figure 7. DFC patterns derived from HO-rSLDS (a) and standard SLDS (b). The 15 brain regions are indexed by the numbers 1 to 15, and the states are ordered by their fractional occupancy (number in the brackets). The color bar is shared by the two group heatmaps.

Table 1. NMSE derived from AR-HMM, SLDS, and HO-rSLDS based on the synthetic time series with different state duration distribution and autoregressive order in the generative model. All the results are the average of the 100 experiments (mean ± std).

NMSE		$p = 1$	$p = 2$	$p = 3$
Geometric	AR-HMM	0.25 ± 0.06	0.33 ± 0.07	0.37 ± 0.09
	SLDS	0.23 ± 0.05	0.39 ± 0.09	0.44 ± 0.11
	HO-rSLDS	0.16 ± 0.04	0.17 ± 0.05	0.20 ± 0.06
Nonparametric	AR-HMM	0.32 ± 0.07	0.41 ± 0.09	0.48 ± 0.11
	SLDS	0.31 ± 0.09	0.44 ± 0.11	0.49 ± 0.13
	HO-rSLDS	0.18 ± 0.06	0.20 ± 0.05	0.21 ± 0.06

Table 2. SNR derived from AR-HMM, SLDS, and HO-rSLDS based on the synthetic time series with different state duration distribution and autoregressive order in the generative model. All the results are the average of the 100 experiments (mean ± std).

SNR		$p = 1$	$p = 2$	$p = 3$
Geometric	AR-HMM	0.79 ± 0.08	0.76 ± 0.07	0.69 ± 0.08
	SLDS	0.85 ± 0.06	0.68 ± 0.09	0.64 ± 0.09
	HO-rSLDS	0.93 ± 0.04	0.92 ± 0.05	0.90 ± 0.05
Nonparametric	AR-HMM	0.73 ± 0.08	0.65 ± 0.07	0.59 ± 0.10
	SLDS	0.80 ± 0.06	0.60 ± 0.09	0.54 ± 0.11
	HO-rSLDS	0.91 ± 0.04	0.87 ± 0.05	0.86 ± 0.05

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, H.; Chen, J. Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence. Symmetry 2024, 16, 474. https://doi.org/10.3390/sym16040474

AMA Style

Wang H, Chen J. Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence. Symmetry. 2024; 16(4):474. https://doi.org/10.3390/sym16040474

Chicago/Turabian Style

Wang, Houxiang, and Jiaqing Chen. 2024. "Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence" Symmetry 16, no. 4: 474. https://doi.org/10.3390/sym16040474

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence

Abstract

1. Introduction

2. Methods

2.1. HMM and HSMM

2.2. SLDS

2.3. Recurrent Switching Linear Dynamic System with Higher-Order Dependence

2.3.1. HO-SLDS

2.3.2. Stick-Breaking Logistic Regression and Pólya-Gamma Augmentation for HO-SLDS

3. Variational Inference of HO-rSLDS

3.1. Update for $q (x_{0 : T})$

3.2. Update for $q (ω_{1 : T})$

3.3. Update for $q (z_{1 : T})$

3.4. Update for $q (θ)$

3.5. Initialization

4. Numerical Experiments

5. Dynamic Functional Connectivity Analysis in fMRI Data

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Bayesian Inference of Recurrent Switching Linear Dynamical Systems with Higher-Order Dependence

Abstract

1. Introduction

2. Methods

2.1. HMM and HSMM

2.2. SLDS

2.3. Recurrent Switching Linear Dynamic System with Higher-Order Dependence

2.3.1. HO-SLDS

2.3.2. Stick-Breaking Logistic Regression and Pólya-Gamma Augmentation for HO-SLDS

3. Variational Inference of HO-rSLDS

3.1. Update for q ( x 0 : T )

3.2. Update for q ( ω 1 : T )

3.3. Update for q ( z 1 : T )

3.4. Update for q ( θ )

3.5. Initialization

4. Numerical Experiments

5. Dynamic Functional Connectivity Analysis in fMRI Data

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.1. Update for $q (x_{0 : T})$

3.2. Update for $q (ω_{1 : T})$

3.3. Update for $q (z_{1 : T})$

3.4. Update for $q (θ)$