1. Introduction
Regime switching models describe time series whose dynamics change between regimes, each with its own characteristics. The two key concepts in regime switching models are the regime itself and the switching mechanism. Regimes arise from structural breaks in the time series, such as financial crises or changes in government policy, and they often represent different market conditions or economic cycles. Examples include bull or bear markets, low- or high-volatility periods, inflation or deflation phases, and growth or recession cycles (see, e.g., [1]). The regime switching mechanism can be visible or hidden.
Tong [2] first proposed the threshold autoregression (TAR) model in 1978 to capture the dynamics of financial time series data by switching between different regimes. The general form of Tong’s TAR model is given by
$$y_t = \phi_0^{(S_t)} + \sum_{i=1}^{k} \phi_i^{(S_t)} y_{t-i} + \varepsilon_t,$$
where $\varepsilon_t$ is a sequence of independent and identically distributed (IID) random variables with mean 0 and variance $\sigma^2$, and $S_t$ is an indicator variable representing the regimes or states. The value of $S_t$, determined by a threshold variable $z_t$, can be either observable or hidden.
Threshold models are a class of non-linear processes designed to provide local linear approximations to underlying non-linear relationships rather than to describe regime switching behavior. To capture regime switching behavior such as business cycles, Hamilton [3] extended the regime switching model by introducing a switching mechanism based on hidden Markov chains (HMCs) with an unobservable stochastic process $S_t$, called hidden Markov switching (HMS). In this framework, the transitions between regimes are not directly observable; instead, probabilistic inference is used to determine whether and when a regime change is likely to have occurred based on the time series. In [3,4], autoregressive conditional heteroskedasticity (ARCH)-type models were applied within the regimes.
Ferguson [5] extended the hidden Markov chain (HMC) model to the hidden semi-Markov state transition (HSMS) model, introducing more flexible sojourn distributions $d_k(u)$ that represent the probability of spending time $u$ in state $k$. These sojourn distributions $d_k(u)$ were generalized from the geometric distributions used in HMC to a wider range of discrete and continuous distributions with positive support, such as Poisson, negative binomial, logarithmic, nonparametric, gamma, Weibull, and log-normal distributions [6].
Teräsvirta [7] proposed another regime switching model, the smooth transition autoregressive (STAR) time series model. In this model, regime changes occur gradually rather than abruptly. The smooth transition function $G(z_t; \gamma, c)$ determines the degree of influence exerted by each regime. $G(z_t; \gamma, c)$ is continuous and bounded between 0 and 1. The transition variable $z_t$ is usually considered to be a lagged endogenous variable and is defined as $z_t = y_{t-d}$, where $d > 0$ is an integer. The parameter $\gamma$ controls the smoothness of the transition, while $c$ represents the threshold between two regimes. A common example of $G(z_t; \gamma, c)$ is a first-order logistic function:
$$G(z_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (z_t - c)\}}, \quad \gamma > 0, \tag{2}$$
which defines the logistic STAR (LSTAR) model.
This article reviews various regime switching models with different switching mechanisms. Threshold models that depend on observable variables are discussed in Section 2, hidden Markov regime switching models are covered in Section 3, hidden semi-Markov models in Section 4, and smooth transition models in Section 5. Some other switching mechanisms are introduced in Section 6, and empirical examples are presented in Section 7. Section 8 concludes this review paper.
2. Threshold Models
Let us start with a simple threshold model example. The autoregressive (AR) model is a popular choice for modeling the time series $y_t$, $t = 1, \dots, T$, and its formula is as follows:
$$y_t = \phi_0 + \sum_{i=1}^{k} \phi_i y_{t-i} + \varepsilon_t,$$
where $k$ is the order of the model, $\phi_0$ and $\phi_i$, $i = 1, \dots, k$, are parameters, and $\varepsilon_t$ represents white noise.
The regime switching version of the autoregressive model of order $k$ has the following formulation:
$$y_t = \phi_0^{(S_t)} + \sum_{i=1}^{k} \phi_i^{(S_t)} y_{t-i} + \varepsilon_t^{(S_t)}, \tag{3}$$
where $\varepsilon_t^{(S_t)} \sim N(0, \sigma_{S_t}^2)$. In addition to continuous-valued models such as autoregressive models, discrete-valued models can also be used to model a given regime $S_t$. Many regime switching discrete-valued models have been proposed, such as the models in [8,9,10,11]. The parameters of the model depend on $S_t$, where $S_t \in \{1, \dots, K\}$, which is an indicator variable representing the regime or state at time $t$. The parameters vary between $K$ different regimes. The threshold model assumes that $S_t$ is determined by a deterministic function of the observed variable $z_t$. Different types of threshold models may have different deterministic functions of the observed variable $z_t$ or different model specifications within regimes.
Tong and Lim [12] introduced the self-exciting TAR (SETAR) model. In this model, $z_t$ is the lagged endogenous variable $y_{t-d}$. For the SETAR model, $S_t$ is determined as follows:
$$S_t = j \quad \text{if } r_{j-1} < y_{t-d} \le r_j, \quad j = 1, \dots, K, \quad -\infty = r_0 < r_1 < \cdots < r_K = \infty.$$
The parameters $d$ and $r_j$ in the SETAR model represent the “delay” parameter and the thresholds, respectively. Both are unknown, so the regime $S_t$ determined by $d$ and $r_j$ must be estimated from the data. The threshold variable $z_t$ can also be an exogenous time series $x_t$ [13], a linear combination of lagged or exogenous variables [14,15,16], or a non-linear combination [17,18].
There are different types of threshold models for the financial time series. Chen et al. [
19] provided a comprehensive summary of threshold models categorized into three classes: non-linearity in mean, non-linearity in volatility, and ‘double’ threshold dynamics (in both mean and volatility). Models with non-linearity in mean include the threshold autoregressive (TAR) model [
20,
21,
22,
23,
24], the threshold moving-average (TMA) model [
25,
26], and the threshold autoregressive moving-average (TARMA) model [
27,
28]. Within each regime, the autoregressive, moving-average (MA), and autoregressive moving-average (ARMA) models are employed.
Models with non-linearity in volatility include GJR-GARCH [
29], ST-GARCH [
30,
31], DTGARCH [
15], Threshold GJR-GARCH [
32], TARR [
33], TGARCH [
4] and DTGARCH [
34]. These models are extensions of one of the most popular volatility models, the GARCH-type model.
To account for non-linearity in the mean and volatility models, Li and Li [
35] proposed a DTARCH model to handle asymmetric mean responses in the presence of non-linearities in both mean and volatility. In addition, Brooks [
36] extended the DTARCH model to a double threshold GARCH (DT-GARCH) model to assess asymmetries in market indices, stock returns, and exchange rates. Chen and So [
15] allowed the threshold variable $z_t$ to be a weighted average of important auxiliary variables, defined as follows:
$$z_t = \sum_{i=1}^{m} w_i x_{i,t}, \quad w_i \ge 0, \quad \sum_{i=1}^{m} w_i = 1.$$
Here, $x_{i,t}$ can represent any function of endogenous variables $y_t$ and exogenous variables. Therefore, information about both endogenous variables and exogenous variables can be used to govern the switching mechanism. Chen and So [
15] found that domestic returns are the main factor determining the threshold compared with international returns. Chen et al. [
34,
37] proposed a three-regime DTGARCH model to capture the asymmetry in both mean and volatility. Audrino and Bühlmann [
38] proposed a tree-based GARCH model. In addition to GARCH-type models, So et al. [
39] proposed threshold stochastic volatility (THSV) models. Chen et al. [
40] introduced a generalization of the THSV model, incorporating error innovations that follow a standardized t-distribution.
The challenge in estimating threshold models is that the likelihood function is not differentiable with respect to the thresholds and is often multi-modal. Various approaches have been used to estimate the parameters of threshold models. Two-stage methods are commonly used, as described in [12,41]. These methods first fix the delay parameters and thresholds, for example on a grid of candidate values, and then use maximum likelihood estimation (MLE) to estimate the remaining parameters of the model. Researchers are increasingly using Bayesian methods via Markov chain Monte Carlo (MCMC) to estimate more complex models, benefiting from improvements in computational speed. With MCMC, all parameters, including the delay parameters and thresholds, can be estimated simultaneously.
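The two-stage idea can be illustrated with a minimal sketch, assuming a two-regime SETAR model with AR(1) regimes, a fixed delay, and least squares in place of MLE (the two coincide for Gaussian errors with regime-specific variances only up to the variance terms). The helper name `fit_setar`, the trimming share, the grid size, and the simulated data are illustrative assumptions, not taken from [12,41].

```python
import numpy as np

def fit_setar(y, d=1, trim=0.15, n_grid=100):
    """Two-regime SETAR with AR(1) regimes: grid search over the threshold,
    least squares within each regime (hypothetical helper, requires d >= 1)."""
    y = np.asarray(y, dtype=float)
    target = y[d:]                     # y_t
    lag1 = y[d - 1:-1]                 # y_{t-1}, the AR(1) regressor
    z = y[:len(y) - d]                 # y_{t-d}, the threshold variable
    X = np.column_stack([np.ones_like(lag1), lag1])
    # Candidate thresholds are interior sample quantiles of z, so each
    # regime keeps at least a `trim` share of the observations.
    candidates = np.quantile(z, np.linspace(trim, 1.0 - trim, n_grid))
    best = None
    for r in candidates:
        low = z <= r
        ssr, coefs = 0.0, []
        for mask in (low, ~low):
            beta, *_ = np.linalg.lstsq(X[mask], target[mask], rcond=None)
            resid = target[mask] - X[mask] @ beta
            ssr += float(resid @ resid)
            coefs.append(beta)
        if best is None or ssr < best["ssr"]:
            best = {"ssr": ssr, "threshold": float(r), "coefs": coefs}
    return best

# Simulated example with a true threshold at 0 (all values are made up).
rng = np.random.default_rng(0)
y_sim = np.zeros(2000)
for t in range(1, len(y_sim)):
    if y_sim[t - 1] <= 0.0:
        y_sim[t] = 1.0 + 0.6 * y_sim[t - 1] + rng.normal()
    else:
        y_sim[t] = -1.0 - 0.4 * y_sim[t - 1] + rng.normal()
fit = fit_setar(y_sim)
print(fit["threshold"], fit["coefs"])
```

The grid search sidesteps the non-differentiability in the threshold: for each candidate value the remaining problem is an ordinary regression, and only the outer search over thresholds is non-smooth.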
3. Hidden Markov Regime Switching Models
Hamilton [3] and Hamilton and Susmel [42] introduced the hidden Markov regime switching model (also called the hidden Markov switching model, abbreviated as the HMS model) to analyze financial time series, and Hamilton and Susmel [42] incorporated the ARCH model within regimes. The regimes are unobservable and are modeled using probabilistic reasoning with a Markov chain, rather than deterministic functions. In contrast to the threshold models, which are used to model non-linearities in data, the Markov regime switching model is used to capture regime switching behaviors, such as business cycles. Since then, hidden Markov regime switching models have been widely used in financial modeling. The HMS model can also be viewed as a type of hidden Markov model (HMM) [43] with continuous observations. HMMs are statistical models built on a Markov process with hidden states. These models are widely used in speech recognition, bioinformatics, human activity recognition, and other fields.
In the HMS model, $S_t \in \{1, \dots, K\}$ represents the latent regime or state at time $t$. The observations $y_t$, $t = 1, \dots, T$, depend on the emission probabilities $f(y_t \mid S_t = k; \theta_k)$, which depend on the hidden state $S_t$. Here, $f(y_t \mid S_t = k; \theta_k)$ is the conditional distribution of the observations given the hidden state $S_t = k$, where $\theta_k$ denotes the model parameters under the regime $k$, $k = 1, \dots, K$. The initial probability $\pi_k = P(S_1 = k)$ represents the probability of starting from state $k$, and the transition probability $p_{ij} = P(S_t = j \mid S_{t-1} = i)$ represents the probability of transitioning from state $i$ to state $j$.
The two-state HMS model (where $K = 2$) remains a popular specification in financial applications. The two states usually correspond to depreciation and appreciation regimes. The three-state HMS model ($K = 3$) is also used in financial applications. The three states can correspond to three different growth rate phases: recession regime, medium growth regime, and high growth regime, as described in [44,45].
In contrast to the threshold model, the hidden state in the Markov regime switching model is unobservable and is modeled by probabilistic reasoning with a Markov chain rather than a deterministic function. The Markov regime switching chain is defined by the transition matrix $P$. For the two-state model, the transition matrix $P$ is given by
$$P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix},$$
where
$$p_{ij} = P(S_t = j \mid S_{t-1} = i, S_{t-2}, \dots, S_1) = P(S_t = j \mid S_{t-1} = i), \quad \sum_{j=1}^{2} p_{ij} = 1.$$
This reflects the Markov property, in contrast to the behavior of the threshold model. In the Markov switching model, regimes evolve exogenously with respect to the time series, whereas in the threshold model the regimes evolve endogenously based on the behavior of the observable variables. Markov switching models do not tie regime switching to the behavior of particular observations.
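To make the exogenous switching mechanism concrete, the following minimal sketch simulates a two-state Markov switching AR(1) series. The function name `simulate_hms_ar1`, the regime coding (0 and 1 rather than 1 and 2), and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_hms_ar1(T, P, phi0, phi1, sigma, seed=0):
    """Simulate (y_t, S_t) from a two-state Markov switching AR(1) model."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    states = np.zeros(T, dtype=int)      # regimes coded 0 and 1
    y = np.zeros(T)
    for t in range(1, T):
        # The next regime depends only on the current regime (Markov
        # property), not on the observed values of y.
        states[t] = rng.choice(2, p=P[states[t - 1]])
        k = states[t]
        y[t] = phi0[k] + phi1[k] * y[t - 1] + sigma[k] * rng.normal()
    return y, states

P_demo = [[0.9, 0.1], [0.2, 0.8]]
y_demo, s_demo = simulate_hms_ar1(5000, P_demo, phi0=[-0.5, 0.5],
                                  phi1=[0.3, 0.3], sigma=[2.0, 1.0])
# Share of time spent in regime 0; the stationary value is 0.2 / 0.3 = 2/3.
print((s_demo == 0).mean())
```

Because the regime path is drawn without reference to `y`, regime changes in this simulation are not tied to any particular observation, which is the contrast with threshold models drawn in the text.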
In finance, there are several popular examples of different types of Markov switching models. Hamilton and Susmel [
42] and Gray [
46] discussed the Markov switching GARCH (MS-GARCH) model to capture the volatility dynamics of financial time series. Krolzig [
47] developed the Markov switching vector autoregression (MS-VAR) model using the traditional vector autoregression (VAR) model as the emission distribution. So et al. [
48] combined the stochastic volatility model with Markov regime switching to model time-varying volatility. Hardy [
49] considered a simplified Markov regime switching model for analyzing complex monthly stock price return data using log-normal distributions within regimes. Shi and Ho [
50] first modeled long memory and regime switching via the autoregressive fractional moving average (ARFIMA) and Markov regime switching models. Ma et al. [
51] introduced Markov regime switching into the heterogeneous autoregressive model of realized volatility (HAR-RV type) models and evaluated the forecasting performance.
3.1. The Maximum Likelihood Estimation (MLE) Algorithm
Among the estimation methods for the HMS model, the likelihood method is one of the main approaches. For example, Hamilton [
3] used maximum likelihood estimation (MLE) to estimate the parameters in the HMS model.
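Denote by $\theta$ the vector of all the parameters in an HMS model. For a given $\theta$, the log-likelihood function is given by
$$\ell(\theta) = \sum_{t=1}^{T} \log f(y_t \mid \mathcal{F}_{t-1}; \theta),$$
where $f(y_t \mid \mathcal{F}_{t-1}; \theta)$ denotes the density function of $y_t$, and $\mathcal{F}_t = \{y_1, \dots, y_t\}$ represents the observations up to time $t$.
For the simplest case with $K = 2$ regimes, where $S_t = 1$ or 2, $P(S_t = j \mid \mathcal{F}_{t-1})$ is given as follows:
$$P(S_t = j \mid \mathcal{F}_{t-1}) = \sum_{i=1}^{2} p_{ij}\, P(S_{t-1} = i \mid \mathcal{F}_{t-1}), \tag{5}$$
where $p_{ij}$ denotes the transition probabilities between regimes. The likelihood of the observations, $f(y_t \mid \mathcal{F}_{t-1})$, is then computed as follows:
$$f(y_t \mid \mathcal{F}_{t-1}) = \sum_{j=1}^{2} f(y_t \mid S_t = j, \mathcal{F}_{t-1})\, P(S_t = j \mid \mathcal{F}_{t-1}). \tag{6}$$
If we use the regime switching AR(1) model (Equation (3) with $k = 1$) as the state-dependent model, the conditional likelihood function in Equation (6) is expressed as follows:
$$f(y_t \mid S_t = j, \mathcal{F}_{t-1}) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left\{-\frac{\left(y_t - \phi_0^{(j)} - \phi_1^{(j)} y_{t-1}\right)^2}{2\sigma_j^2}\right\}, \tag{7}$$
where $j = 1, 2$. By Bayes’ rule, the posterior probability of regime $i$ at time $t$ is
$$P(S_t = i \mid \mathcal{F}_t) = \frac{f(y_t \mid S_t = i, \mathcal{F}_{t-1})\, P(S_t = i \mid \mathcal{F}_{t-1})}{f(y_t \mid \mathcal{F}_{t-1})}. \tag{8}$$
The right-hand side of (8) is obtained using Equations (5) and (7).
To initialize the model, we calculate the initial probabilities as follows:
$$P(S_1 = j \mid \mathcal{F}_0) = \pi_j, \quad j = 1, 2,$$
where $\pi_1$ and $\pi_2$ are the initial probabilities. These probabilities can be derived by solving the following stationary condition:
$$(\pi_1, \pi_2)\, P = (\pi_1, \pi_2), \quad \pi_1 + \pi_2 = 1. \tag{9}$$
It is noted that $P$ above cannot be the identity matrix. Otherwise, regime switching would not occur, every distribution would satisfy Equation (9), and the regime probabilities would depend solely on the initial probabilities. Solving Equation (9) yields
$$\pi_1 = \frac{1 - p_{22}}{2 - p_{11} - p_{22}}, \quad \pi_2 = \frac{1 - p_{11}}{2 - p_{11} - p_{22}}.$$
Using these initial probabilities, we can recursively update $P(S_t = i \mid \mathcal{F}_t)$ via Equation (8) if $f(y_t \mid S_t = j, \mathcal{F}_{t-1})$, which follows an AR(1) model (Equation (3) with $k = 1$) here, is known. Finally, the likelihood function $\ell(\theta)$ can be constructed recursively, allowing the parameters to be estimated by maximizing the likelihood function.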
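The recursion described in this subsection (prediction of the regime probabilities, evaluation of the likelihood contribution, Bayes update, and stationary initialization) can be sketched as follows. This is a minimal sketch, assuming two regimes with Gaussian AR(1) dynamics and conditioning on the first observation; the function name `hamilton_filter` and the demonstration values are illustrative assumptions. In practice the returned log-likelihood would be passed to a numerical optimizer.

```python
import numpy as np

def hamilton_filter(y, phi0, phi1, sigma, P):
    """Log-likelihood and filtered regime probabilities for a two-state
    switching AR(1) model, conditioning on the first observation."""
    y = np.asarray(y, dtype=float)
    phi0, phi1, sigma = (np.asarray(a, dtype=float) for a in (phi0, phi1, sigma))
    P = np.asarray(P, dtype=float)
    # Stationary initial probabilities (P must not be the identity matrix).
    denom = 2.0 - P[0, 0] - P[1, 1]
    filt = np.array([1.0 - P[1, 1], 1.0 - P[0, 0]]) / denom
    loglik, history = 0.0, []
    for t in range(1, len(y)):
        pred = filt @ P                           # P(S_t = j | F_{t-1})
        mean = phi0 + phi1 * y[t - 1]             # regime-specific AR(1) means
        dens = (np.exp(-0.5 * ((y[t] - mean) / sigma) ** 2)
                / (np.sqrt(2.0 * np.pi) * sigma)) # f(y_t | S_t = j, F_{t-1})
        joint = dens * pred
        lik = joint.sum()                         # f(y_t | F_{t-1})
        filt = joint / lik                        # Bayes update: P(S_t = j | F_t)
        loglik += np.log(lik)
        history.append(filt)
    return loglik, np.array(history)

# Made-up data and parameter values, for illustration only.
y_demo = np.array([0.2, -0.1, 0.4, 1.5, 1.1, 1.8, 0.3, -0.2])
ll, probs = hamilton_filter(y_demo, phi0=[0.0, 1.0], phi1=[0.3, 0.3],
                            sigma=[0.5, 0.5], P=[[0.9, 0.1], [0.2, 0.8]])
print(ll, probs[-1])
```

A useful sanity check is that, when both regimes share the same parameters, the log-likelihood collapses to that of a plain Gaussian AR(1) model and the filtered probabilities stay at their stationary values.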
3.2. The Expectation-Maximization (EM) Algorithm
In some cases, directly maximizing the likelihood of the observed data can be challenging. For example, Augustyniak [
52] applied the EM algorithm and an importance sampling method to improve the performance of MLE in the Markov-switching GARCH model. The EM algorithm provides an alternative technique to obtain the maximum likelihood estimate through an iterative process without directly maximizing the likelihood function of the observed data.
The EM algorithm consists of two steps, the E-step and the M-step, as described below:
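E-step: Compute the expected complete data log-likelihood to obtain the Q-function, $Q(\theta, \theta^{(\ell)})$, based on the observed data $\tilde{y}_T = (y_1, \dots, y_T)$ and the current estimate of the parameters, $\theta^{(\ell)}$, at the $\ell$-th iteration:
$$Q(\theta, \theta^{(\ell)}) = E\left[\log f(\tilde{y}_T, \tilde{S}_T; \theta) \mid \tilde{y}_T; \theta^{(\ell)}\right], \quad \tilde{S}_T = (S_1, \dots, S_T).$$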
If we use the regime switching AR(1) model (Equation (
3) with k = 1) as the state-dependent model,
is the same as in Equation (
7), and
is the same as in Equation (
8).
M-step: Maximize the Q-function with respect to $\theta$, i.e.,
$$\theta^{(\ell+1)} = \arg\max_{\theta \in \Theta} Q(\theta, \theta^{(\ell)}).$$
Here, $\Theta$ denotes the set of model parameters. The above two steps are iterated until convergence, at which point the parameter estimates stabilize.
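The E-step and M-step can be illustrated with a minimal sketch. To keep the M-step in closed form, the sketch assumes two regimes with normal state-dependent distributions rather than the AR(1) structure, and it uses the scaled forward-backward (Baum-Welch) recursions for the E-step. The function name `em_gaussian_hms`, the starting values, and the simulated data are illustrative assumptions.

```python
import numpy as np

def em_gaussian_hms(y, n_iter=50):
    """EM for a two-state HMS model with y_t | S_t = k ~ N(mu_k, sigma_k^2)."""
    y = np.asarray(y, dtype=float)
    T, K = len(y), 2
    med = np.median(y)                     # crude starting values (an assumption)
    mu = np.array([y[y <= med].mean(), y[y > med].mean()])
    sigma = np.full(K, y.std())
    P = np.array([[0.9, 0.1], [0.1, 0.9]])
    pi = np.full(K, 0.5)
    logliks = []
    for _ in range(n_iter):
        B = (np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2)
             / (np.sqrt(2.0 * np.pi) * sigma))           # emission densities
        # E-step: scaled forward-backward recursions.
        alpha, beta, c = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
        a = pi * B[0]
        c[0] = a.sum()
        alpha[0] = a / c[0]
        for t in range(1, T):
            a = (alpha[t - 1] @ P) * B[t]
            c[t] = a.sum()
            alpha[t] = a / c[t]
        for t in range(T - 2, -1, -1):
            beta[t] = (P @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                              # P(S_t = k | all data)
        xi = (alpha[:-1, :, None] * P[None]
              * (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        logliks.append(np.log(c).sum())                   # observed-data log-likelihood
        # M-step: closed-form maximizers of the Q-function.
        pi = gamma[0]
        P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        w = gamma.sum(axis=0)
        mu = (gamma * y[:, None]).sum(axis=0) / w
        sigma = np.sqrt((gamma * (y[:, None] - mu) ** 2).sum(axis=0) / w)
    return mu, sigma, P, np.array(logliks)

# Simulated data with regime means 0 and 3 (made-up values).
rng = np.random.default_rng(42)
s = np.zeros(1000, dtype=int)
for t in range(1, len(s)):
    s[t] = s[t - 1] if rng.random() < 0.95 else 1 - s[t - 1]
y_sim = np.where(s == 0, 0.0, 3.0) + rng.normal(size=len(s))
mu_hat, sigma_hat, P_hat, logliks = em_gaussian_hms(y_sim)
print(mu_hat, sigma_hat, P_hat)
```

The recorded log-likelihood values should be non-decreasing across iterations, which is a convenient check that the two steps are implemented consistently.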
3.3. The Markov Chain Monte Carlo (MCMC) Algorithm
The Markov chain Monte Carlo (MCMC) algorithm is a commonly used approach for HMS estimation. For instance, Amisano and Fagan [
53] estimated parameters via the MCMC algorithm because their model exhibited a highly irregular likelihood surface, which makes it challenging to estimate parameters via maximum likelihood estimation. A joint posterior distribution for the parameters and latent variables is obtained by the Bayesian approach. This is done through Gibbs sampling-based posterior simulation. Kim and Nelson [
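54] described in detail in Chapter 9 how to use the Gibbs sampling method in the HMS model. In the likelihood-based approach, the model’s unknown parameters $\theta$ are estimated first. Then, the latent regime $\tilde{S}_T = (S_1, \dots, S_T)$ conditional on parameters $\theta$ is estimated. In the Bayesian analysis, both parameters $\theta$ and latent regime $\tilde{S}_T$ are treated as random variables. The joint posterior density is derived as follows:
$$p(\theta, P, \tilde{S}_T \mid \tilde{y}_T) = p(\theta \mid \tilde{S}_T, \tilde{y}_T)\, p(P \mid \tilde{S}_T)\, p(\tilde{S}_T \mid \tilde{y}_T), \tag{11}$$
where $\tilde{y}_T = (y_1, \dots, y_T)$ and, in this subsection, $\theta$ collects the state-dependent parameters. Equation (11) assumes that the transition probabilities in $P$ given $\tilde{S}_T$ are independent of other parameters $\theta$ and the observed data $\tilde{y}_T$ so that $p(P \mid \tilde{S}_T, \theta, \tilde{y}_T) = p(P \mid \tilde{S}_T)$. We then apply the Gibbs sampling as follows: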
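Step 1: Generate each $S_t$ from $p(S_t \mid \tilde{S}_{\neq t}, \theta, P, \tilde{y}_T)$, $t = 1, \dots, T$, where $\tilde{S}_{\neq t}$ is a vector of $S$ variables that exclude $S_t$, $\theta$ denotes the state-dependent parameters, $P$ the transition matrix, and $\tilde{y}_T$ the observed data; or generate the whole block $\tilde{S}_T = (S_1, \dots, S_T)$ from $p(\tilde{S}_T \mid \theta, P, \tilde{y}_T)$.
Step 2: Generate transition probabilities $P$ from $p(P \mid \tilde{S}_T)$.
Step 3: Generate $\theta$ from $p(\theta \mid \tilde{S}_T, \tilde{y}_T)$.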
These three steps are iterated until convergence. The detailed implementation is given below.
3.3.1. Step 1
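There are two generation methods. The first is single-step Gibbs sampling, which generates $S_t$ based on $\tilde{S}_{\neq t}$, $t = 1, \dots, T$. By suppressing the conditioning on the parameters, $S_t$ is generated by $p(S_t \mid \tilde{S}_{\neq t}, \tilde{y}_T)$, and the following results can be derived for a model such as Equation (3), in which $y_t$ depends on the regimes only through $S_t$:
$$p(S_t \mid \tilde{S}_{\neq t}, \tilde{y}_T) \propto p(S_t \mid S_{t-1})\, p(S_{t+1} \mid S_t)\, f(y_t \mid S_t, \mathcal{F}_{t-1}). \tag{12}$$
For the derivation process, see Section 9.1.1 of [54]. Using Equation (12), $P(S_t = 1 \mid \tilde{S}_{\neq t}, \tilde{y}_T)$ can be calculated as
$$P(S_t = 1 \mid \tilde{S}_{\neq t}, \tilde{y}_T) = \frac{p(S_t = 1 \mid S_{t-1})\, p(S_{t+1} \mid S_t = 1)\, f(y_t \mid S_t = 1, \mathcal{F}_{t-1})}{\sum_{j=1}^{2} p(S_t = j \mid S_{t-1})\, p(S_{t+1} \mid S_t = j)\, f(y_t \mid S_t = j, \mathcal{F}_{t-1})}.$$
Then, $S_t$ can be generated using a uniform distribution. If the random number generated from $U(0, 1)$ is less than $P(S_t = 1 \mid \tilde{S}_{\neq t}, \tilde{y}_T)$, then $S_t = 1$; otherwise, $S_t = 2$.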
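Similarly, the second method is multi-move Gibbs sampling, which generates the whole block $\tilde{S}_T$ based on $\tilde{y}_T$: $S_T$ is drawn first from the filtered probability $P(S_T \mid \tilde{y}_T)$, and then $S_t$ is drawn backward for $t = T-1, \dots, 1$. Finally, $P(S_t \mid S_{t+1}, \tilde{y}_t)$ can be calculated by the following formula:
$$P(S_t \mid S_{t+1}, \tilde{y}_t) = \frac{p(S_{t+1} \mid S_t)\, P(S_t \mid \tilde{y}_t)}{\sum_{j=1}^{2} p(S_{t+1} \mid S_t = j)\, P(S_t = j \mid \tilde{y}_t)}.$$
The derivation is given in Section 9.1.1 of [54]. Then, $S_t$ can be generated by a random variable drawn from $U(0, 1)$ based on $P(S_t = 1 \mid S_{t+1}, \tilde{y}_t)$, which is similar to the single-step Gibbs sampling method.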
3.3.2. Step 2
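Kim and Nelson [54] used the beta distribution as a conjugate prior for the transition probabilities. The resulting posterior distribution is given by the following two independent beta distributions:
$$p_{11} \mid \tilde{S}_T \sim \text{Beta}(u_{11} + n_{11},\, u_{12} + n_{12}), \quad p_{22} \mid \tilde{S}_T \sim \text{Beta}(u_{22} + n_{22},\, u_{21} + n_{21}),$$
where $u_{ij}$ is a known prior hyper-parameter and $n_{ij}$ represents the number of transitions from state $i$ to $j$, which can be easily calculated for a given $\tilde{S}_T$.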
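Step 2 amounts to counting transitions in the current draw of the state sequence and sampling from two beta distributions. The following minimal sketch assumes two states coded 1 and 2 and the standard conjugate beta update; the function name `sample_transition_probs` and the default hyper-parameters (all equal to 1) are illustrative assumptions.

```python
import numpy as np

def sample_transition_probs(states, prior=((1.0, 1.0), (1.0, 1.0)), rng=None):
    """Draw (p11, p22) from independent beta posteriors given a state path."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(states, dtype=int)
    # n[i, j] counts transitions from state i + 1 to state j + 1.
    n = np.zeros((2, 2))
    for i, j in zip(s[:-1] - 1, s[1:] - 1):
        n[i, j] += 1
    u = np.asarray(prior, dtype=float)     # u[i, j]: prior hyper-parameters
    p11 = rng.beta(u[0, 0] + n[0, 0], u[0, 1] + n[0, 1])
    p22 = rng.beta(u[1, 1] + n[1, 1], u[1, 0] + n[1, 0])
    return np.array([[p11, 1.0 - p11], [1.0 - p22, p22]]), n

# A short made-up state path, for illustration only.
path = [1, 1, 1, 2, 2, 1, 1, 2, 2, 2]
P_draw, counts = sample_transition_probs(path, rng=np.random.default_rng(0))
print(counts)
print(P_draw)
```

Because the posterior depends on the data only through the transition counts, this step is cheap compared with generating the state sequence itself.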
3.3.3. Step 3
The last step generates the parameters of the state-dependent model; the conditional posterior distribution must be derived separately for each model. An example in [
54] is about generating
and
, conditional on
. They assumed a normal prior for
as follows:
where
. The posterior distribution is given by
where
and
,
, and
. The parameters
and
can be generated based on the posterior distribution.
3.4. Other Algorithms
In addition, Francq and Zakoïan [
55] employed the generalized method of moments (GMM) to estimate the MS-GARCH parameters. GMM provides an alternative to simulation-based methods such as MCMC, which typically require more computational power. For moderately large samples, GMM performs well.
Furthermore, Zheng et al. [
56] proposed a spectral clustering hidden Markov model that extracts information using spectral methods. Spectral methods aim to reveal the relationship between hidden states and observations through singular value decomposition (SVD) to relate past and future observations. Zheng et al. [
56] extended the spectral method of discrete HMMs with discrete observations to accommodate continuous observations, which can be used for HMS models. The advantages of spectral methods include scalability, strong theoretical properties, and the fact that no likelihood function is required.
4. Hidden Semi-Markov State Transition Models
The HMS model has been extended and adapted into more complex models, such as hidden semi-Markov models (HSMMs), for tasks where it faces limitations. Note that the HMS model makes two strong assumptions. First, the current latent state depends only on the most recent state in the past, i.e., $P(S_t \mid S_{t-1}, \dots, S_1) = P(S_t \mid S_{t-1})$. Second, the time spent in a state follows a geometric distribution. Let $d_k(u)$ be the probability that time $u$ is spent in state $k$, called the sojourn distribution or duration distribution. In the HMS model, $d_k(u) = p_{kk}^{u-1}(1 - p_{kk})$, $u = 1, 2, \dots$, which is a geometric distribution [57].
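The geometric sojourn property can be checked by simulation. The following minimal sketch simulates a two-state Markov chain (states coded 0 and 1 here) and compares the observed run lengths in one state with the geometric values implied by the self-transition probability. The helper name `sojourn_lengths` and the transition matrix are illustrative assumptions.

```python
import numpy as np

def sojourn_lengths(states, k):
    """Lengths of completed runs of state k; the first and last runs are
    dropped because they may be censored by the sample boundaries."""
    s = np.asarray(states)
    change = np.flatnonzero(np.diff(s) != 0) + 1   # indices where a new run starts
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(s)]))
    runs = [(s[a], b - a) for a, b in zip(starts, ends)][1:-1]
    return np.array([length for state, length in runs if state == k])

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])             # made-up transition matrix
T = 50_000
u = rng.random(T)
states = np.zeros(T, dtype=int)
for t in range(1, T):
    prev = states[t - 1]
    states[t] = prev if u[t] < P[prev, prev] else 1 - prev
runs0 = sojourn_lengths(states, k=0)
print(runs0.mean(), 1.0 / (1.0 - P[0, 0]))         # sample mean vs. geometric mean
print(np.mean(runs0 == 1), 1.0 - P[0, 0])          # share of length-1 runs vs. 1 - p_00
```

With a self-transition probability of 0.9, the geometric distribution implies a mean sojourn of 10 periods, and the sample mean of the simulated runs should be close to that value.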
In fact, the two strong assumptions mentioned above limit the scope and flexibility of possible applications of the HMS model. If the underlying process does not follow a geometric sojourn distribution, the HMS model may not be appropriate. More drawbacks of the HMS model can be found in [58,59]. To overcome this inflexibility, Ferguson [5] proposed a hidden semi-Markov state switching (HSMS) model that allows for arbitrary sojourn distributions, including discrete and continuous distributions with positive support, such as Poisson, negative binomial, logarithmic, nonparametric, gamma, Weibull, and log-normal distributions [6].
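In the hidden semi-Markov switching model, we still use the following notations as in HMS models:
the initial probabilities $\pi_k = P(S_1 = k)$;
the emission probabilities $f(y_t \mid S_t = k; \theta_k)$;
the transition matrix $P = (p_{ij})$, where $p_{ii} = 0$.
In addition, the probability $d_k(u)$ of the hidden process $S_t$ staying in the $k$th state for a time $u$ is defined in the HSMS model as follows:
$$d_k(u) = P(S_{t+u+1} \neq k,\ S_{t+u-v} = k,\ v = 0, \dots, u-2 \mid S_{t+1} = k,\ S_t \neq k).$$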
Ferguson [
5] pointed out that speech recognition is one of the earliest and most successful implementations of HSMS. Over time, this approach has been extended to different domains, including the recognition of printed text, handwriting, human DNA sequences, film events, symbolic plan, and language. Its versatility has also inspired many other applications, such as ground target tracking, mobility tracking in cellular networks, protein structure prediction, air particle prediction, speech synthesis, music classification, image segmentation, electrocardiogram segmentation, and financial time series modeling. Yu [
60] gave an overview of these applications, while Balcilar et al. [
61] used the HSMS model to study various stages of human information processing. Based on the HSMS framework, Xiao and Dong [
62] introduced an innovative reputation management system for the online-to-offline (O2O) e-commerce market. There are also some applications of the HSMS model in the financial field. Bernardi et al. [
63] constructed an HSMS model with Student-t distribution for tail risk interdependence. Maruotti et al. [
59] introduced an HSMS model with a multivariate leptokurtic-normal distribution. Maruotti et al. [
57] combined the HSMS model with quantile regression for time series. Qin et al. [
64] proposed a robust HSMS model with Huber’s least favorable distribution.
The expectation-maximization (EM) algorithm and forward-backward algorithm are commonly used to estimate parameters
in the HSMS models, where
. If we continue with the AR(1) model (Equation (
3) with
) as a simple state-dependent example, the parameter set is given by
. Through simple calculation, the Q function in the HSMS model becomes the following formula:
where
5. Smooth Transition Models
Teräsvirta [7] and Eitrheim and Teräsvirta [65] first introduced and discussed the smooth transition (ST) model and related procedures. The autoregressive model that allows for regime switching in Equation (3) can also be viewed as
$$y_t = \phi_0(S_t) + \sum_{i=1}^{k} \phi_i(S_t)\, y_{t-i} + \varepsilon_t, \tag{14}$$
where $\phi_i(S_t)$ is a function of the state $S_t$. The regime $S_t$ can be represented by the vector $z_t$, where $z_t$ are observable variables. Then, Equation (14) can be written as
$$y_t = \phi_0(z_t) + \sum_{i=1}^{k} \phi_i(z_t)\, y_{t-i} + \varepsilon_t,$$
where $\phi_i(\cdot)$ can be estimated by nonparametric estimation methods.
For the observed data
, where
, the smooth transition model is as follows:
where
consists of lagged endogenous and exogenous variables. The coefficients are denoted as
, where
and
. The variables
are observable and exogenous. The transition function $G(z_t; \gamma, c)$ is continuous and bounded between 0 and 1. The transition variable $z_t$ can be a lagged endogenous variable ($z_t = y_{t-d}$ for a certain integer $d > 0$), an exogenous variable ($z_t = x_t$, where $x_t$ is exogenous), or a function of lagged endogenous and exogenous variables. A popular choice for $z_t$ is the lagged endogenous variable $y_{t-d}$. The threshold between the two regimes is denoted by $c$, and the smoothness of the transition is controlled by the parameter $\gamma$.
We consider the AR(1) model (Equation (3) with $k = 1$) as a simple state-dependent example with the number of states being $K = 2$. The smooth transition autoregressive (STAR) model is given as follows:
$$y_t = \left(\phi_0^{(1)} + \phi_1^{(1)} y_{t-1}\right)\left(1 - G(z_t; \gamma, c)\right) + \left(\phi_0^{(2)} + \phi_1^{(2)} y_{t-1}\right) G(z_t; \gamma, c) + \varepsilon_t.$$
The STAR model is a regime switching model with two regimes, corresponding to $G(z_t; \gamma, c) = 0$ and $G(z_t; \gamma, c) = 1$. Moreover, the transition between the two regimes is gradual because it depends on the smooth variation of the transition function $G(z_t; \gamma, c)$. As an example, we use the logistic STAR (LSTAR) model, where the transition function $G(z_t; \gamma, c)$ is defined in Equation (2) and discussed in Section 1. We further consider $\gamma \to \infty$, which leads to the self-exciting TAR model as described in Section 2. When $\gamma \to 0$, the value of the transition function $G(z_t; \gamma, c)$ approaches 0.5. Therefore, when $\gamma = 0$, the LSTAR model degenerates into a linear model.
In the LSTAR model, the two regimes are usually associated with smaller and larger values of $z_t$ relative to the threshold $c$. This type of transition function is particularly suitable for modeling business cycles. The LSTAR model with $c = 0$ can describe periods of positive and negative growth. However, there are other types of regime switching behavior. If regime transitions are associated with smaller and larger absolute values of $z_t$, then the exponential STAR (ESTAR) model with the following $G(z_t; \gamma, c)$ is more appropriate:
$$G(z_t; \gamma, c) = 1 - \exp\{-\gamma (z_t - c)^2\}, \quad \gamma > 0.$$
For example, Michael et al. [66] and Baum et al. [67] applied the ESTAR model to real exchange rates. The ESTAR model becomes linear for both $\gamma \to 0$ and $\gamma \to \infty$. Unlike the LSTAR model, the ESTAR model does not nest SETAR as a special case.
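The limiting behavior of the two transition functions can be checked numerically. The following minimal sketch assumes the standard logistic and exponential forms of the transition function; the function names `lstar` and `estar` and the evaluation points are illustrative assumptions.

```python
import numpy as np

def lstar(z, gamma, c):
    """Logistic transition function: increases from 0 to 1 as z passes c."""
    return 1.0 / (1.0 + np.exp(-gamma * (np.asarray(z, dtype=float) - c)))

def estar(z, gamma, c):
    """Exponential transition function: 0 at z = c, tends to 1 as |z - c| grows."""
    return 1.0 - np.exp(-gamma * (np.asarray(z, dtype=float) - c) ** 2)

z = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
print(lstar(z, gamma=2.0, c=0.0))    # smooth, monotone transition
print(lstar(z, gamma=50.0, c=0.0))   # close to a step function (SETAR-like)
print(lstar(z, gamma=0.0, c=0.0))    # constant 0.5: the model is linear
print(estar(z, gamma=2.0, c=0.0))    # symmetric around c
print(estar(z, gamma=0.0, c=0.0))    # constant 0: the model is linear
```

The logistic function distinguishes values of the transition variable below and above the threshold, whereas the exponential function treats deviations on either side of the threshold symmetrically, which matches the distinction between the two models drawn in the text.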
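To overcome this limitation of the ESTAR model, Jansen and Teräsvirta [68] proposed a quadratic logistic function
$$G(z_t; \gamma, c_1, c_2) = \frac{1}{1 + \exp\{-\gamma (z_t - c_1)(z_t - c_2)\}}, \quad c_1 \le c_2, \quad \gamma > 0.$$
This STAR model can nest a three-state SETAR model as a special case.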
The smooth transition model mentioned above is a basic smooth transition model that only allows the transition function to handle two regimes. To overcome this limitation, the model has also been extended to incorporate multiple regimes, accounting for three business cycle phases: depreciation period, slow appreciation, and strong appreciation.
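Equation (16) can be rewritten as
$$y_t = \phi_1' x_t + (\phi_2 - \phi_1)' x_t\, G_1(z_t; \gamma_1, c_1) + \varepsilon_t,$$
where, in the AR(1) example, $x_t = (1, y_{t-1})'$ and $\phi_j = (\phi_0^{(j)}, \phi_1^{(j)})'$. This can then be extended to a three-regime model by adding a second non-linear component as follows:
$$y_t = \phi_1' x_t + (\phi_2 - \phi_1)' x_t\, G_1(z_t; \gamma_1, c_1) + (\phi_3 - \phi_2)' x_t\, G_2(z_t; \gamma_2, c_2) + \varepsilon_t.$$
Assuming that $c_1 < c_2$, the autoregressive parameters in this model change smoothly from $\phi_1$ to $\phi_3$ as $z_t$ increases.
More generally, it can be extended to an $m$-regime model with $m - 1$ smoothing parameters $\gamma_1, \dots, \gamma_{m-1}$ and thresholds $c_1 < \cdots < c_{m-1}$, as shown below:
$$y_t = \phi_1' x_t + \sum_{j=1}^{m-1} (\phi_{j+1} - \phi_j)' x_t\, G_j(z_t; \gamma_j, c_j) + \varepsilon_t.$$
Another way to extend the basic model to a four-regime model is to nest two different two-regime smooth transition models, as shown in the following equation:
$$y_t = \left[\phi_1' x_t \left(1 - G_1(z_{1t}; \gamma_1, c_1)\right) + \phi_2' x_t\, G_1(z_{1t}; \gamma_1, c_1)\right]\left(1 - G_2(z_{2t}; \gamma_2, c_2)\right) + \left[\phi_3' x_t \left(1 - G_1(z_{1t}; \gamma_1, c_1)\right) + \phi_4' x_t\, G_1(z_{1t}; \gamma_1, c_1)\right] G_2(z_{2t}; \gamma_2, c_2) + \varepsilon_t. \tag{19}$$
A further extension of the multi-regime model discussed in the previous paragraphs is the time-varying smooth transition autoregressive (TV-STAR) model, which allows for both non-linear dynamics and time-varying parameters. The TV-STAR model is derived from Equation (19) by setting $z_{1t} = y_{t-d}$ and $z_{2t} = t$, using a simplified notation for the transition functions, as shown below:
$$y_t = \left[\phi_1' x_t \left(1 - G_1(y_{t-d})\right) + \phi_2' x_t\, G_1(y_{t-d})\right]\left(1 - G_2(t)\right) + \left[\phi_3' x_t \left(1 - G_1(y_{t-d})\right) + \phi_4' x_t\, G_1(y_{t-d})\right] G_2(t) + \varepsilon_t. \tag{20}$$
The switching variables in Equation (20) include the lagged endogenous variable $y_{t-d}$ and time $t$.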
6. Some Other Switching Mechanisms
The previous sections review switching mechanisms in regime switching models, which have attracted the attention of many researchers. There are also some other regime switching models with different switching mechanisms.
Chang et al. [
69] introduced a new regime switching model with autoregressive latent factors, which combined the concepts of both threshold and Markov switching. In the HMS models, the Markov chain determines the regimes independent from other parts of the model, which is unrealistic in some cases. In the model proposed by [
69], the mean and volatility processes switch depending on whether the autoregressive latent factor is above or below a certain threshold. The innovation in the latent factor is correlated with the previous innovation. Chang et al. [
69] also allowed the autoregressive latent factor to have a unit root and to accommodate a strongly persistent regime change. If the autoregressive latent factor is exogenous and stationary, the regime switching model proposed by [
69] simplifies to a conventional Markov-switching model. This new model can take advantage of HMS models and overcome their shortcomings effectively.
Bazzi et al. [
70] proposed a time-varying Markov regime switching model. In the basic HMS model, the transition probability matrix
is constant over time. Bazzi et al. [
70] let the transition probability matrix vary over time as specific transformations of lagged dependent observations. The update of the time-varying parameters is based on the probability of the regimes, given information up to time
.
7. Empirical Examples
In this section, we compare the performance of threshold models, hidden Markov regime switching models, hidden semi-Markov models, and smooth transition models when applied to real data. These models are assumed to have the same AR(1) state-dependent structure as Equation (7) but differ in the switching mechanism. In addition, we consider the hidden Markov regime switching model and the hidden semi-Markov model, where the state-dependent conditional distributions of observations are normal:
$$y_t \mid S_t = k \sim N(\mu_k, \sigma_k^2), \quad k = 1, 2.$$
For convenience, we denote these models as follows: the threshold model with an AR(1) state-dependent structure (TH-ar), the hidden Markov regime switching model with an AR(1) state-dependent structure (HMS-ar), the hidden semi-Markov model with an AR(1) state-dependent structure (HSMS-ar), the smooth transition model with an AR(1) state-dependent structure (ST-ar), the hidden Markov regime switching model with a normal distribution state-dependent structure (HMS-norm), and the hidden semi-Markov model with a normal distribution state-dependent structure (HSMS-norm). The number of regimes here is set to two.
For the TH-ar model, we employ only the simplest SETAR model, where the state variable $S_t$
is given by
We estimate the model parameters
, and
using the least squares estimation (LSE) method [
71] by minimizing the following equation:
After obtaining the estimates
, and
, we estimate
and
as follows:
where
and
.
For the HMS models, we estimate the model parameters by applying the maximum likelihood estimation (MLE) method as described in
Section 3. Details of the MLE algorithm are provided in [
49]. For the HSMS models, we estimate the model parameters by utilizing the expectation-maximization (EM) algorithm described in
Section 4 through the
Q function in Equation (
13). Specifically, a nonparametric distribution is used as the sojourn distribution. Details of the EM algorithm are provided in [
64].
For the ST-ar model, we use the logistic function in Equation (
2) as the transition function, with
. We estimate the parameters
and
c by minimizing the following function [
72]:
where
7.1. Example 1: The Daily Returns of the S&P 500
This subsection presents a real example comparing the six regime switching models above. We consider the daily log returns of the S&P 500 from 1 January 2020 to 27 December 2024, which contain 1255 observations. The original dataset is available at
https://ca.investing.com/indices/us-spx-500-historical-data (accessed on 28 December 2024). This example illustrates the switching processes modeled by these six different regime switching approaches.
Figure 1 and
Figure 2 visually depict the estimated regimes or states of the daily log returns of the S&P 500 via these six models. As discussed in
Section 1, the TH-ar model aims to provide a local linear approximation to the underlying non-linear relationship, but it fails to capture regime switching behavior. The TH-ar model partitions the data based on an estimated threshold value of
that does not convey any information about the business cycle. In contrast, the HMS and HSMS models effectively identify bear and bull markets as States 1 and 2, respectively. Judging by the similarity of the plots of HMS-norm and HMS-ar in Figure 1, and of HSMS-norm and HSMS-ar in Figure 2, the state-dependent structure has much less influence on state estimation than the switching mechanism. The ST-ar model provides a smooth transition between states, capturing the intermediate dynamics. In addition, the ST-ar model can represent the business cycle, with dark blue dots appearing around recession periods in the time series, indicating that the model aligns more closely with State 1 during these time periods.
Table 1 provides the estimated parameters for each regime or state. State 1 represents depreciation periods with larger standard deviations and negative means. The TH-ar model exhibits slight differences in estimated parameters compared to the HMS and HSMS models. The HMS and HSMS models yield similar estimates because the HSMS model extends the HMS model by incorporating more flexible sojourn distributions. Both models are able to effectively capture regime switching behavior. The histogram of the state estimates obtained by fitting the ST-ar model is displayed in
Figure 3. Most estimated states fall within the range of 1.6 to 1.8. The state-dependent parameters of the ST-ar framework exhibit greater variability across states than those of the other models.
Ref. [73] introduced the Akaike information criterion (AIC), one of the most popular model selection criteria, which is defined as follows:
$$\mathrm{AIC} = 2k - 2\ln \hat{L},$$
where $k$ is the number of parameters, and $\ln \hat{L}$ is the log-likelihood of the model with estimated parameters. Some studies have found that AIC is not consistent, especially in large samples.
Since AIC tends to select over-parameterized models [74], Ref. [75] introduced the Bayesian information criterion (BIC), a criterion that is consistent in large samples. It is based on a Bayesian framework and is given by
$$\mathrm{BIC} = k\ln n - 2\ln \hat{L},$$
where $k$ is the number of parameters, $n$ is the number of observations, and $\ln \hat{L}$ is the log-likelihood of the model with estimated parameters.
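As a small worked illustration of how the two criteria can disagree, the sketch below evaluates the standard AIC and BIC formulas for two candidate models. The log-likelihoods, parameter counts, sample size, and model labels are hypothetical numbers chosen for the example; they are not results from the tables in this paper.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "simple": (-1000.0, 4),
    "flexible": (-994.0, 8),
}
n = 500  # hypothetical sample size

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

print("AIC:", aic_scores, "-> selects", best_aic)
print("BIC:", bic_scores, "-> selects", best_bic)
```

With these made-up inputs, AIC prefers the model with more parameters while BIC prefers the smaller one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7.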
Table 2 compares the AIC and BIC of two-state regime switching models for the daily S&P 500 returns. The HMS-ar model performs best in terms of both AIC and BIC, with the best result highlighted in bold. The HSMS model performs better than the TH-ar model under AIC but worse under BIC, because BIC imposes a larger penalty on model complexity.
7.2. Example 2: The Weekly Returns of the EURO STOXX 50
We estimate the two-state regime switching models for the weekly returns of the EURO STOXX 50 from 1 January 2015 to 1 January 2025, using data available as of January 2025. The original dataset is from
https://ca.investing.com/indices/eu-stoxx50-historical-data (accessed on 2 January 2025).
Figure 4 and
Figure 5 illustrate the states estimated using six different approaches. The TH-ar model fails to clearly distinguish between depreciation and appreciation periods, as evidenced by the overlap in the state assignments for States 1 and 2.
In contrast, the HMS and HSMS models successfully separate the two business phases, with the states showing more pronounced differences. These differences are even more distinct than those for the daily returns of the S&P 500. Notably, the HSMS model assigns more observations to State 1, which can be attributed to its more flexible sojourn distribution. This flexibility suggests that, unlike for the daily returns of the S&P 500, the sojourn distribution for the weekly returns of the EURO STOXX 50 is not close to a geometric distribution.
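The reference to the geometric distribution reflects a property of hidden Markov switching: when the regime follows a first-order Markov chain, the time spent in a state before leaving it is geometrically distributed, and the HSMS model relaxes exactly this restriction. The simulation below is a minimal sketch of that property; the staying probabilities, chain length, and seed are hypothetical and unrelated to the fitted models.

```python
import random


def simulate_chain(p_stay, n_steps, seed=0):
    """Simulate a two-state Markov chain; p_stay[s] is the probability of remaining in state s."""
    rng = random.Random(seed)
    state, path = 1, []
    for _ in range(n_steps):
        path.append(state)
        if rng.random() >= p_stay[state]:
            state = 2 if state == 1 else 1
    return path


def sojourn_lengths(path, state):
    """Lengths of completed consecutive runs spent in `state`."""
    lengths, run = [], 0
    for s in path:
        if s == state:
            run += 1
        elif run > 0:
            lengths.append(run)
            run = 0
    # A run still in progress at the end of the sample is censored and is dropped.
    return lengths


p_stay = {1: 0.90, 2: 0.95}           # hypothetical staying probabilities
path = simulate_chain(p_stay, 200_000)
runs = sojourn_lengths(path, 1)

mean_run = sum(runs) / len(runs)
expected = 1.0 / (1.0 - p_stay[1])    # mean of a geometric sojourn time
share_len_one = sum(1 for r in runs if r == 1) / len(runs)

print(f"empirical mean sojourn in State 1: {mean_run:.2f} (geometric mean: {expected:.1f})")
print(f"share of sojourns of length 1: {share_len_one:.3f} (geometric: {1 - p_stay[1]:.3f})")
```

If the empirical sojourn times in real data depart markedly from this geometric shape, a semi-Markov specification with an explicit sojourn distribution has room to fit the regime durations better.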
The ST-ar model performs poorly: its state estimates overlap, so no clear state separation can be achieved. The estimated parameters for all models are summarized in
Table 3, while the histogram of the state estimates by fitting the ST-ar model to the EURO STOXX 50 returns is presented in
Figure 6. The parameter estimates for the HMS and HSMS models are similar, in line with the results observed for the daily returns of the S&P 500. However, the distribution of states estimated by the ST-ar model for the weekly returns of the EURO STOXX 50 is less concentrated than that for the daily returns of the S&P 500, where states are mainly located around 1 and 2.
Table 4 compares the AIC and BIC of the two-state regime switching models for the weekly EURO STOXX 50 returns. The HMS-ar model performs best according to the AIC, while the HMS-norm model performs best according to the BIC, with the best result highlighted in bold. The HSMS model performs better than the TH-ar model under AIC but worse under BIC, as BIC imposes a larger penalty on model complexity.
8. Conclusions
In this paper, we review four popular regime switching models, each defined by a different mechanism: threshold models, hidden Markov regime switching models, hidden semi-Markov models, and smooth transition models. For each type of switching mechanism, we review its relevant framework, popular models, and commonly used estimation methods. In addition, in
Section 6, we introduce several emerging switching models that deserve further theoretical development and empirical investigation.
Furthermore, we compare six different regime switching models using two real data examples. The comparison considers different switching mechanisms and state-dependent structures. Threshold models aim to provide a local linear approximation of the underlying non-linear relationship. The HMS and HSMS models capture business cycle dynamics better than the other models, and the HSMS model additionally features a more flexible sojourn distribution. Smooth transition models yield a smooth regime switching process, but in some applications they may have difficulty clearly separating the states.