1. Introduction
Conditional volatility models are an essential tool in the fields of financial econometrics and risk management, and because of this importance, the literature related to the development of econometric models and inference methods to capture volatility patterns over time is wide-ranging. The main problem is that conditional volatility is a latent process, which adds a considerable degree of difficulty to traditional inference methods, such as maximum likelihood estimators.
Stochastic volatility (SV) models play a crucial role in understanding the dynamic behavior of financial time series, especially in capturing the inherent variability in returns over time. A common and effective approach in these models is to use a first-order autoregressive (AR(1)) structure for the latent log-variance, which is the natural logarithm of the conditional variance of returns. This AR(1) specification, as proposed in
Taylor (
1986), models the evolution of log-variance as a process that depends linearly on its past value plus a stochastic innovation. The formulation is particularly attractive due to its simplicity and ability to reproduce well-known stylized facts related to financial time series, such as volatility clustering, heavy tails, and the non-Gaussian nature of return distributions.
The AR(1) structure assumes a mean-reverting process where volatility tends to revert to a long-term average level, capturing the short-memory property of financial volatility. This property aligns with the observation that shocks to volatility have a significant but temporary impact, decaying over time. Practical advantages of this structure lie in its tractability and the relative ease of estimation using state-space methods such as the Kalman filter or Bayesian inference techniques like Markov Chain Monte Carlo (MCMC).
Despite these advantages, the short-memory assumption inherent in AR(1) models limits their ability to capture certain empirical characteristics of volatility. Specifically, financial markets often exhibit persistent volatility patterns that decay more slowly for very distant lags than an AR(1) process can represent. This phenomenon, known as long-range dependence or long memory, implies that past volatility shocks can have prolonged effects that influence future volatility over extended periods. Short-memory models, which rely on exponential decay, fail to account for this feature, potentially leading to biased estimates and suboptimal forecasting performance when applied to assets with strong persistence in terms of volatility.
To address these limitations, long-memory structures for volatility, such as those based on fractional Brownian motion or fractional Gaussian noise, have been proposed. These models exhibit hyperbolic rather than exponential decay, capturing the persistence seen in empirical studies of financial time series. The Long-Memory Stochastic Volatility (LMSV) model is a prominent example, in which the log-variance process is fractionally integrated, with an integration parameter that allows for a gradual decay in dependence. This modification enables the model to better represent the observed behavior of volatility, particularly in markets with persistent volatility clusters.
The importance of long-memory structures is reinforced by studies demonstrating their relevance for various asset classes and volatility measures. For instance, evidence from stock market indices, commodities, and even cryptocurrency markets indicates that incorporating long memory into volatility modeling can significantly improve in-sample fit and out-of-sample forecasting performance. Additionally, the use of long-memory models helps capture the slow mean reversion observed in realized volatility measures obtained from high-frequency data, aligning with empirical findings in financial econometrics (
Christensen and Nielsen 2007;
Maasoumi and McAleer 2008).
In summary, while first-order autoregressive structures are useful for modeling stochastic volatility due to their simplicity and alignment with short-term volatility dynamics, they fall short when addressing the long-term persistence seen in financial markets. Introducing long memory into volatility models provides a more flexible framework that aligns with the empirical properties of asset returns, enhancing both the modeling accuracy and the practical application of these models in risk management and financial forecasting.
The modeling of stochastic volatility with the use of a long-memory structure presents an additional complexity in relation to the models that assume an autoregressive structure. The key point is that the long-memory structure based on Fractional Brownian Motion/Fractional Gaussian Noise is a non-Markovian and non-semimartingale process, violating the usual assumptions regarding Markovian structures and the Martingale difference innovation processes used in the estimation of these models. As a direct example of the difficulties generated by the introduction of long-memory structures, note that a good portion of the methods used in the estimation of stochastic volatility models is based on a linear Gaussian state space representation, whose fundamental assumption is the use of a Markovian structure of dependence for latent states. Due to this difficulty, alternative forms were proposed for the estimation of this class of models.
In this paper, we employ an alternative Bayesian estimation approach for long-memory stochastic volatility models, using an approximation of a Fractional Gaussian Noise process that allows us to represent this process as a Gaussian Markov Random Field, as proposed in
Sørbye et al. (
2019). From this representation, we can use Integrated Nested Laplace Approximations (INLA) to perform the Bayesian estimation of parameters and latent log-variances (
Rue et al. 2009). This formulation is attractive since it is computationally efficient, allowing for the fast and accurate estimation of this class of models without requiring simulation-based methods, which can become computationally prohibitive when the sample size is very large; this occurs, for example, when high-frequency data are used to construct realized volatility measures. An application of this methodology to the modeling of interest rates in multifactor models can be found in
Valente and Laurini (
2024), showing the good properties of this methodology for the approximation of long-memory structures.
The introduced methodology is also interesting in that it allows for modifications to be introduced to long-memory models involving variance. In this paper, we propose two extensions of the stochastic volatility model with long memory. The first extension is a two-factor model, where the first factor is the long-memory component, and the second component can be thought of as a smooth variation process in the long-term average of the volatility process, similar to the formulation of a Spline-GARCH model (e.g.,
Engle and Rangel 2008). The second modification is a stochastic volatility model with long memory that is adapted to the persistence patterns observed in high-frequency data. For this model, we employed a structure with two latent factors, with the first latent factor being a long-memory process based on a Fractional Gaussian Noise representation, and the second latent factor allowing us to incorporate patterns of intraday seasonality into the stochastic volatility structure. This structure allows for the simultaneous estimation of the long-memory process and the intraday seasonality pattern, avoiding the problems associated with the two-stage estimation proposed in
Deo et al. (
2006).
We can summarize the main research hypotheses as follows: the INLA methodology can estimate latent parameters and log-variances in LMSV models with a comparable accuracy to traditional MCMC methods, while significantly reducing computational complexity and alleviating convergence issues. By approximating the Fractional Gaussian Noise process as a Gaussian Markov Random Field, the approach is hypothesized to be computationally efficient, making it suitable for high-frequency financial data analysis. Furthermore, it is expected that multi-factor extensions of LMSV models using INLA will demonstrate a strong in-sample fit and out-of-sample forecasting performance, particularly in applications such as 5 min Bitcoin returns with integrated intra-day seasonal components.
Our simulation results suggest the INLA methodology can provide estimates of latent parameters and log-variances in LMSV models, which are comparable in terms of accuracy to traditional MCMC methods. The proposed extensions display good properties in terms of their in-sample adjustment and out-of-sample forecast, showcasing the flexibility of the INLA approach. The computational burden that arises from a large number of latent variables often prohibits the application of traditional posterior simulation methods in high-frequency settings. This is especially true for LMSV models. The LMSV specification we apply to 5 min Bitcoin returns incorporates the simultaneous estimation of additive intra-day seasonal components and performs well in the construction of realized volatility measures.
This paper is structured as follows.
Section 2 presents a literature review of the estimation of SV models.
Section 3 presents the proposed formulation of the stochastic volatility models and discusses the estimation methods. In
Section 4, we present a Monte Carlo study of the small sample properties of the estimators. In
Section 5, we apply the methodology to the daily returns of major cryptocurrencies and the S&P 500 Index, and compare the in-sample fit and out-of-sample forecast performance with popular alternatives. The method’s application to high-frequency Bitcoin data is presented in
Section 6.
Section 7 concludes the paper.
2. Literature Review
The literature on econometric models of conditional volatility can be divided into two main classes of models based on the form of treatment applied for the latent volatility process. The first important class is the class of observation-driven models, which considers that the process of latent variance can be approximated by a structure that depends only on observable processes. In this class of models, the main reference is the GARCH (Generalized Autoregressive Conditional Heteroskedasticity) family of models, introduced in
Engle (
1982) and generalized in
Bollerslev (
1986), and later extended in several other directions.
In this structure, latent variance is considered a deterministic function of the past variance itself and the past squared returns or residuals. A general framework for observation-driven process modeling is the generalized score model class proposed in
Creal et al. (
2013), which uses the model’s past score as the update mechanism for time-varying model parameters. This observation-driven structure is attractive because, since the variance process depends only on components that are observable in the current period, it avoids the inference problems generated by the presence of latent variables in the model, allowing for the direct use of maximum likelihood estimators.
The second class of models is based on the treatment of conditional volatility as a truly stochastic dependent process, with the main reference in discrete time being the so-called log-normal model of stochastic volatility (SV) introduced in
Taylor (
1986). In this model, the dynamics of the conditional log-variance follow a first-order autoregressive process. This structure is very convenient since the formulation is consistent with the main stylized facts of financial series, such as volatility clusters, heavy tails, and the non-Gaussianity of returns, and avoids the artificial assumption that volatility is deterministic.
However, assuming that the process of latent volatility is stochastic adds significant difficulties in the inference procedures for parameters and for the filtering of the conditional volatility itself. When assuming a stochastic structure, it is not possible to directly use maximum likelihood estimators, since these require the marginalization of each unobserved volatility, thus necessitating the calculation of a multiple integral with a dimension equal to the sample size, which is not computationally feasible. Due to this difficulty, a series of frequentist and Bayesian methods was created to estimate stochastic volatility models. The estimation method proposed in
Taylor (
1986) is based on the method of moments, which does not allow for an estimation of the latent volatility process itself, only the parameter vector of the model.
Among the frequentist methods, the main reference is the Quasi-Maximum Likelihood method based on the decomposition of the forecast error through the Kalman Filter independently proposed in
Harvey et al. (
1992) and
Nelson (
1988). In this structure, the log of the squared returns is composed of a mean component plus a latent autoregressive process, which is the log-variance of the returns, formulated as a state-space process that allows for the use of the Kalman Filter to filter this latent component. As this estimation is based on a linearization of the model, the estimator is approximated and is therefore a Quasi-Maximum Likelihood estimator. Although this estimator is computationally simple, its performance may be sub-optimal due to the bias introduced by the linearization of the model, as discussed in
Andersen and Sørensen (
1996). To circumvent this limitation, other frequentist estimation methods use maximum simulated likelihood (
Sandmann and Koopman 1998), efficient method of moments (
Andersen et al. 1999), empirical likelihood, and minimum discrepancy (
Laurini and Hotta 2017), among other possible treatments.
Several Bayesian estimation methods have been proposed for the class of stochastic volatility models. Bayesian estimation is particularly interesting for the problem at hand because latent processes can be treated as additional components of the model, and thus additional parameters to be estimated. This allows for the use of traditional methods of Bayesian inference without additional complications. Among the main works proposing the use of Markov Chain Monte Carlo (MCMC) for estimating SV models, we have
Kim et al. (
1998),
Chib et al. (
2002) and
Asai (
2005), and as examples of recent developments, we have
Kastner and Frühwirth-Schnatter (
2014) and
Kastner (
2019).
Martino et al. (
2011) proposed an alternative form of the Bayesian estimation of SV models based on the Integrated Nested Laplace Approximations (INLA) framework of
Rue et al. (
2009). The INLA methodology allows one to perform Bayesian estimation of parameters and latent factors for a wide class of models that can be represented or approximated by Gaussian Markov Random Fields. Because INLA is based on an analytical approximation, it does not require simulation procedures and thus is not affected by the chain convergence issues that sometimes affect MCMC algorithms. In addition, these methods are very accurate and generally computationally fast and efficient, since, in conjunction with the INLA methodology, it is possible to use sparse matrix representations of the Gaussian Markov Random Field. The properties of this method for the estimation of SV models are analyzed in
Ehlers and Zevallos (
2015), which shows that the method is very accurate and has good properties for the calculation of measures derived from conditional volatility, such as the Value at Risk. A multifactor formulation of multivariate stochastic volatility models is presented in
Nacinben and Laurini (
2024a).
An important point is that most of the models proposed for stochastic volatility modeling are based on a short-memory structure for the dependency process, assuming a mean-reversion speed compatible with an autoregressive process. An alternative possibility is to assume that the dependence of the volatility process is governed by a long-memory process, also known as a long-range dependence process. In this setting, the most commonly used form is an innovation structure based on a discrete version of fractional Brownian Motion, namely the fractional Gaussian Noise representation, as described in
Taqqu (
2003). In this case, the innovations of the dependency process follow a hyperbolic decay process, rather than the standard exponential decay of short-memory models. References to the use of long-memory structures in stochastic volatility models can be found in
Harvey (
1998),
Breidt et al. (
1998), and
Hurvich and Soulier (
2009). The main model used is the Long-Memory Stochastic Volatility (LMSV), where the log variance follows a long-memory process.
The use of long-memory models for variance is appealing, as there is empirical evidence showing that this structure is appropriate for several asset classes and volatility measures.
Christensen and Nielsen (
2007),
Scharth and Medeiros (
2009),
Hillebrand and Medeiros (
2016),
Shackleton et al. (
2008), and
Fleming and Kirby (
2011) discuss the importance of a long memory in conditional stock market volatility. Another important point is the impact of long memory on the calculation of realized volatility measurements using high-frequency data, as discussed in
Maasoumi and McAleer (
2008),
McAleer and Medeiros (
2008), and
Lieberman and Phillips (
2008). There is recent interest in the use of these models because of the ample evidence of the presence of long-range dependence features in cryptocurrency markets, as shown by
Mensi et al. (
2019),
Phillip et al. (
2019),
Bouri et al. (
2019), and
Chaim and Laurini (
2019), among other works.
Many methods have been proposed for the estimation of LMSV models.
Harvey (
1998) and
Breidt et al. (
1998) employed a spectral approximation for the likelihood function, while
Arteche (
2004) specified a semi-parametric form based on a local Whittle estimator. Other formulations rely on an approximation of the state space representation through a truncated Autoregressive Fractionally Integrated Moving Average (ARFIMA) process, as used in
Chan and Palma (
1998),
Basak et al. (
2001) and
Ferraz and Hotta (
2007) in frequentist formulations, and
Chan and Petris (
2000) for the Bayesian estimation in a Markov Chain Monte Carlo algorithm. Comparative reviews of estimation methods of stochastic volatility models with long memory can be found in
Perez and Ruiz (
2001) and
Crato and Ray (
2002).
3. Long-Memory Stochastic Volatility Models
The standard stochastic volatility (SV) model, introduced in
Taylor (
1986), is a log-normal mixture model in which the log-variance component follows a stationary first-order autoregressive process. Let $y_t$, $t = 1, \dots, n$, be an equally spaced sequence of asset returns; then, an SV model can be written as
$$y_t = \exp(h_t/2)\,\epsilon_t, \qquad \epsilon_t \sim N(0, 1), \qquad (1)$$
$$h_t = \mu + \phi\,(h_{t-1} - \mu) + \eta_t, \qquad \eta_t \sim N(0, \tau_\eta^{-1}). \qquad (2)$$
This model has three parameters that dictate the AR(1) dynamics of the latent log-variance $h_t$: the long-run mean is $\mu$, autoregressive persistence depends on $\phi$, and the latent log-variance negatively depends on the (marginal) precision $\tau$. Due to internal INLA conventions, in this paper we report the marginal precision of the log-variance $h_t$, $\tau = 1/\mathrm{Var}(h_t)$. It is more common to speak in terms of the precision of the i.i.d. error term $\eta_t$, $\tau_\eta$, which can easily be obtained through the relation $\tau_\eta = \tau/(1 - \phi^2)$.
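As a brief illustration (a minimal sketch in R, with illustrative parameter values rather than the configurations used later in the paper), the data-generating process in Equations (1) and (2) can be simulated as follows:

## Minimal sketch: simulate the AR(1) SV model of Equations (1) and (2).
## Parameter values are illustrative only.
set.seed(123)
n_obs <- 1000
mu    <- -9      # long-run mean of log-variance
phi   <- 0.95    # autoregressive persistence
tau_e <- 20      # precision of the innovation eta_t
h <- numeric(n_obs)
h[1] <- mu + rnorm(1, sd = sqrt(1 / (tau_e * (1 - phi^2))))  # stationary start
for (t in 2:n_obs) {
  h[t] <- mu + phi * (h[t - 1] - mu) + rnorm(1, sd = sqrt(1 / tau_e))
}
y <- exp(h / 2) * rnorm(n_obs)   # returns given the latent log-variance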
Because log-variance is not an observed variable, maximum likelihood estimation requires evaluating a multiple integral with a dimension equal to the sample size, which is often computationally unfeasible. By employing the method of moments, originally proposed by
Taylor (
1986), one can estimate the parameters $\mu$, $\phi$, and $\tau_\eta$, but one is unable to recover the latent log-variance $h_t$. A traditional frequentist solution is the Quasi-Maximum Likelihood (QML) method, based on the forecast error decomposition obtained using the Kalman Filter (
Harvey 1989). Here, log-squared returns are linearized and decomposed into a mean component plus a first-order autoregressive process (the log-variance), so that the model has a linear state-space representation and is amenable to Kalman filtering. Alternative frequentist methods that circumvent the bias induced by this linearization include simulated maximum likelihood (
Sandmann and Koopman 1998), the efficient method of moments (
Andersen et al. 1999), empirical likelihood, and generalized minimum discrepancy (
Laurini and Hotta 2017), to mention a few.
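For reference, the linearization underlying the QML approach (a standard derivation, stated here only for completeness) transforms the measurement equation into a linear, non-Gaussian state-space form:
$$\log y_t^{2} = h_t + \log \epsilon_t^{2}, \qquad \mathbb{E}\!\left[\log \epsilon_t^{2}\right] \approx -1.2704, \qquad \mathrm{Var}\!\left(\log \epsilon_t^{2}\right) = \pi^{2}/2,$$
so that $\log y_t^{2} + 1.2704 = h_t + \xi_t$, with $\xi_t$ a zero-mean, non-Gaussian error that the QML estimator treats as Gaussian when applying the Kalman Filter.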
There are several Bayesian estimation methods for the class of stochastic volatility models. Because latent variables are treated as additional parameters, traditional Bayesian inference methods are directly applicable to SV models. Markov Chain Monte Carlo algorithms are commonly used for posterior sampling, e.g.,
Gamerman and Lopes (
2006) and
Johanes and Polson (
2005). It is known that due to the high correlation within latent variables and between latent variables and deep parameters, MCMC methods can require long chains to provide a good posterior characterization of SV models (
Kastner and Frühwirth-Schnatter 2014;
Gong and Stoffer 2020).
Although estimation using posterior simulation methods is feasible in a wide range of practical applications, MCMC methods can still be computationally demanding, especially for models involving latent variables, whose computational burden increases more than proportionally with the sample size.
Martino et al. (
2011) show how an SV model can be estimated using Integrated Nested Laplace Approximations (INLA), a class of methods introduced by
Rue et al. (
2009), which can be applied to the Bayesian estimation of parameters and latent factors of models that can be represented by, or approximated by, a Gaussian Markov Random Field (GMRF). Because INLA calculations are analytic, this avoids issues inherent to posterior simulation MCMC methods, such as slow chain convergence.
Following
Martino et al. (
2011), if we assign the mean parameter $\mu$ a Gaussian prior with zero mean and a large known variance, the standard stochastic volatility model (1) and (2) can be seen as a latent Gaussian model with latent field $\boldsymbol{x} = (h_1, \dots, h_n, \mu)$:
$$\pi(\boldsymbol{x} \mid \boldsymbol{\theta}_1) \propto \exp\!\left(-\tfrac{1}{2}\, \boldsymbol{x}' \boldsymbol{Q}(\boldsymbol{\theta}_1)\, \boldsymbol{x}\right), \qquad (3)$$
where $\boldsymbol{\theta}_1 = (\phi, \tau)$ are the parameters driving the log-variance and $\boldsymbol{Q}(\boldsymbol{\theta}_1)$ is the sparse precision matrix implied by the AR(1) dynamics and the prior on $\mu$. The $(n+1)$-dimensional latent Gaussian field is partially observed through the $n$ conditionally independent data $y_1, \dots, y_n$ with the following likelihood:
$$\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}_2) = \prod_{t=1}^{n} \pi(y_t \mid h_t, \boldsymbol{\theta}_2),$$
where $\boldsymbol{\theta}_2$ are the parameters of the $y_t$ return process, which, in our case, reduce to a precision of one, since $\epsilon_t \sim N(0, 1)$.
Let $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$. The main goal in estimating SV models is to evaluate the marginal distributions
$$\pi(x_i \mid \boldsymbol{y}) = \int \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})\, \pi(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}, \qquad \pi(\theta_j \mid \boldsymbol{y}) = \int \pi(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}_{-j}.$$
The INLA procedure is a computationally efficient method to compute these marginal distributions. At its core is a very fast Gaussian approximation to densities of the following form:
$$\pi(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y}) \propto \exp\!\left(-\tfrac{1}{2}\, \boldsymbol{x}' \boldsymbol{Q}(\boldsymbol{\theta})\, \boldsymbol{x} + \sum_{t=1}^{n} g_t(x_t)\right),$$
where $g_t(x_t) = \log \pi(y_t \mid x_t, \boldsymbol{\theta})$. The Gaussian approximation $\tilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y})$ is obtained by matching the curvature at the posterior mode value $\boldsymbol{x}^{*}(\boldsymbol{\theta})$, which is computed iteratively using a Newton–Raphson algorithm:
$$\tilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y}) = \frac{1}{Z} \exp\!\left(-\tfrac{1}{2}\, (\boldsymbol{x} - \boldsymbol{x}^{*})' \left(\boldsymbol{Q}(\boldsymbol{\theta}) + \mathrm{diag}(\boldsymbol{c})\right) (\boldsymbol{x} - \boldsymbol{x}^{*})\right),$$
where $Z$ is a normalizing constant, and the vector $\boldsymbol{c}$ is a correction term given by the second-order terms in the Taylor expansion of $g_t(x_t)$ at the modal value $\boldsymbol{x}^{*}$.
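To illustrate how this representation is used in practice, the sketch below shows a minimal r-inla call for the AR(1) SV model, in the spirit of Martino et al. (2011). It is indicative only: prior and control settings are omitted, and the availability of the "stochvol" likelihood and the "ar1" latent model depends on the installed INLA version.

## Minimal sketch of an AR(1) SV fit with r-inla; 'ret' is a vector of returns.
library(INLA)
dat <- data.frame(y = ret, t = seq_along(ret))
fit <- inla(y ~ 1 + f(t, model = "ar1"),
            data = dat, family = "stochvol",
            control.compute = list(mlik = TRUE, waic = TRUE))
summary(fit)                  # posterior summaries of the hyperparameters
head(fit$summary.random$t)    # posterior of the latent log-variance component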
3.1. Long-Memory Stochastic Volatility Models
Although SV models capture key aspects of return series, such as the near absence of autocorrelation in levels and the presence of excess kurtosis, for some assets the decay in the autocorrelation of squared returns at distant lags can be slower than implied by the autoregressive structure in Equation (2).
An alternative is to model log-variance as a stationary process with long memory. Following
Beran (
2017), a weakly stationary process has long memory if its autocovariance function $\gamma(k)$ for distant lags $k$ satisfies $\gamma(k) \sim c_{\gamma}\, k^{2d-1}$ for $k \to \infty$, with $0 < d < 1/2$. Or, equivalently, in the frequency domain, if its spectral density $f(\lambda)$ for frequencies $\lambda$ close to zero satisfies $f(\lambda) \sim c_{f}\, |\lambda|^{-2d}$ for $\lambda \to 0$ and $c_{f} > 0$.
If the log-variance component $h_t$ has long-range dependence properties, then we have the long-memory stochastic volatility model (LMSV), as introduced by
Harvey (
1998) and
Breidt et al. (
1998).
The traditional approach when introducing long-range dependence to SV models involves characterizing latent log-variance dynamics as an ARFIMA process. In its simplest form, log-variance $h_t$ follows an ARFIMA(0, d, 0) and this model can be written as
$$(1 - B)^{d}\,(h_t - \mu) = \eta_t, \qquad \eta_t \sim N(0, \sigma_{\eta}^{2}),$$
where $B$ is the backshift operator and the parameter $d$ determines the fractional integration order.
Because long-memory processes do not have a finite state space representation, the estimation of LMSV models is not straightforward. Notice that many of the presented methods for estimating SV models take advantage of some sort of state-space representation of the model. Estimations of LMSV models have traditionally been carried out in the frequency domain using spectral likelihood estimators (
Breidt et al. 1998), and in the time domain using simulation-based Bayesian methods and Quasi-Maximum likelihood estimators. The latter approaches are often based on a truncated state-space representation of the long-memory process, as discussed in
Chan and Palma (
1998),
Chan and Petris (
2000) and
Ferraz and Hotta (
2007). A popular approach is to work with an approximate representation of the ARFIMA process as a long-lag Autoregressive Moving Average (ARMA) model. A simple version of this is the expansion of an ARFIMA(0, d, 0) as an autoregressive process of the following form:
$$h_t = \sum_{j=1}^{P} \phi_j\, h_{t-j} + \eta_t,$$
with $\phi_j = -\frac{\Gamma(j-d)}{\Gamma(j+1)\,\Gamma(-d)}$, $j = 1, \dots, P$, $P$ as the order of the approximation, $h_t$ as the latent process, and $\eta_t$ as a white noise process. This representation is quite convenient since it is sufficient to rewrite the state space representation as that of an AR(P) model, using the coefficients determined by the above expansion. The usual quasi-maximum likelihood and MCMC estimators can easily be adapted to this representation.
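As a small illustration of this truncation (a sketch in base R; the value of d is illustrative, while P = 20 matches the truncation order used in the Monte Carlo benchmark of Section 4), the AR coefficients implied by the fractional-difference expansion can be computed recursively:

## AR(P) coefficients of the truncated ARFIMA(0, d, 0) representation.
## The recursion follows from the binomial expansion of (1 - B)^d:
## phi_1 = d and phi_j = phi_{j-1} * (j - 1 - d) / j.
arfima_ar_coefs <- function(d, P) {
  phi <- numeric(P)
  phi[1] <- d
  for (j in 2:P) phi[j] <- phi[j - 1] * (j - 1 - d) / j
  phi
}
arfima_ar_coefs(d = 0.4, P = 20)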
The approach we use here involves introducing long-range dependence through a fractional Gaussian noise (fGn) process rather than an ARFIMA. Conceptually, an ARFIMA arises from the fractional differencing of a discrete ARMA process, while an fGn arises as the increment process of a continuous fractional Brownian motion (
Hosking 1981). The two processes are closely related, especially when the autoregressive and moving average orders of the ARFIMA are 0. The relationship between the Hurst exponent $H$ and the fractional integration order $d$ is such that $H = d + 1/2$.
The LMSV specification we explore here can be written as
$$y_t = \exp(h_t/2)\,\epsilon_t, \qquad \epsilon_t \sim N(0, 1), \qquad (7)$$
$$h_t = \mu + x_t, \qquad (8)$$
where $\mu$ is a mean parameter, and $x_t$ is an fGn with parameters $\tau$ and $H$.
Specifically, $\boldsymbol{x} = (x_1, \dots, x_n)$ is a zero-mean multinormal vector such that
$$\boldsymbol{x} \sim N\!\left(\boldsymbol{0},\, \tau^{-1} \boldsymbol{\Sigma}\right).$$
The covariance matrix $\boldsymbol{\Sigma}$ has a Toeplitz structure, with the first-row elements given by the autocorrelation function:
$$\rho(k) = \tfrac{1}{2}\left(|k + 1|^{2H} - 2|k|^{2H} + |k - 1|^{2H}\right), \qquad (10)$$
where $k = 0, 1, \dots, n - 1$. Note that the autocorrelation function is indeed a form of hyperbolic decay, as $\rho(k) \approx H(2H - 1)\,k^{2H - 2}$ when $k \to \infty$. We will sometimes write $x_t \sim \mathrm{fGn}(\tau, H)$ as a shorthand notation when referring to the structure described above.
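To make this covariance structure concrete, the short sketch below (base R, with illustrative values of H and n) evaluates the autocorrelation function in Equation (10) and assembles the Toeplitz correlation matrix:

## fGn autocorrelation (Equation (10)) and its Toeplitz correlation matrix.
fgn_acf <- function(k, H) {
  0.5 * (abs(k + 1)^(2 * H) - 2 * abs(k)^(2 * H) + abs(k - 1)^(2 * H))
}
H <- 0.8                         # illustrative Hurst exponent
n <- 500                         # illustrative sample size
rho   <- fgn_acf(0:(n - 1), H)   # first row of the correlation matrix
Sigma <- toeplitz(rho)           # Toeplitz structure described above
rho[c(2, 11, 101)]               # slow hyperbolic decay at lags 1, 10, 100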
3.2. Gaussian Markov Random Field Approximation of Fractional Gaussian Noise
Sørbye et al. (
2019) took advantage of the known relationship between long memory and cross-sectional aggregation (
Granger 1980;
Beran et al. 2010) to construct a Gaussian Markov Random Field-based approximation of a fractional Gaussian noise model from weighted sums of independent AR(1) components. They proposed matching the autocorrelation function of this composite autoregressive process to the autocorrelation (
10) of an fGn.
Following
Sørbye et al. (
2019), consider $m$ independent AR(1) processes
$$x_t^{(j)} = \phi_j\, x_{t-1}^{(j)} + \varepsilon_t^{(j)}, \qquad j = 1, \dots, m,$$
where $\phi_j$ is the first-order autoregressive parameter of the $j$-th process. Also, let $\varepsilon_t^{(j)}$ be zero-mean independent Gaussian shocks with variance $\sigma_j^{2}$. Define the cross-sectional aggregation of the $m$ processes as
$$\tilde{x}_t = \sum_{j=1}^{m} \sqrt{w_j}\; \bar{x}_t^{(j)}, \qquad (11)$$
where $\bar{x}_t^{(j)}$ denotes the $j$-th AR(1) process standardized to unit marginal variance, and the weights $w_1, \dots, w_m$ sum to one.
Haldrup and Valdés (
2017) studied the finite sample properties of similar aggregations of AR(1) processes.
The autocorrelation function of (
11) is simply
$$\tilde{\rho}(k) = \sum_{j=1}^{m} w_j\, \phi_j^{\,|k|}. \qquad (12)$$
The idea put forward in
Sørbye et al. (
2019) is to fit weights $w_j$ and coefficients $\phi_j$ such that the autocorrelation function (
12) of the composite AR(1) process matches the autocorrelation function of a true fGn process, as in Equation (
10). Values of $(w_j, \phi_j)$, $j = 1, \dots, m$, are obtained by minimizing the squared error
$$\sum_{k=1}^{K} \frac{\left(\rho(k) - \tilde{\rho}(k)\right)^{2}}{k}, \qquad (13)$$
where $K$ is an arbitrary upper limit to the number of lags included. Since the squared error at lag $k$ is weighted by $1/k$, persistence at distant lags has little impact on the objective function (
13).
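A conceptual sketch of this matching step is given below (base R). It is not the routine used internally by r-inla; the logistic and softmax transforms that keep the coefficients in (0, 1) and the weights on the simplex are illustrative choices, and fgn_acf() is the function defined in the previous sketch.

## Sketch: match the mixture autocorrelation (12) of m = 3 AR(1) components
## to the fGn autocorrelation (10) by minimizing the weighted error (13).
fit_ar1_mixture <- function(H, m = 3, K = 1000) {
  target <- fgn_acf(1:K, H)
  obj <- function(par) {
    w   <- exp(par[1:m]); w <- w / sum(w)      # weights summing to one
    phi <- plogis(par[(m + 1):(2 * m)])        # AR(1) coefficients in (0, 1)
    approx_acf <- sapply(1:K, function(k) sum(w * phi^k))
    sum((target - approx_acf)^2 / (1:K))       # 1/k-weighted squared error
  }
  opt <- optim(rep(0, 2 * m), obj, method = "BFGS")
  w <- exp(opt$par[1:m]); w <- w / sum(w)
  list(w = w, phi = plogis(opt$par[(m + 1):(2 * m)]))
}
fit_ar1_mixture(H = 0.8)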
With this approximation of the fGn process, we can represent the LMSV model (
7) and (8) as a latent GMRF using the INLA method, as shown in
Martino et al. (
2011). Following the recommendations in
Sørbye et al. (
2019), we use a third-order approximation to represent the fGn process. Further details on the implementation can be found in
Sørbye et al. (
2019).
3.3. Alternative Specifications
The INLA methodology is not restrictive and SV models can be easily augmented with additional latent factors. As long as the affine structure of log-variance is preserved, the model has a GMRF representation similar to Equation (
3), and can be estimated in the same manner. To showcase this flexibility, we consider alternative formulations for both SV and LMSV models, in which log-variance is composed of a smooth spline trend in addition to either the AR(1) or fGn processes. These specifications are similar to the Spline-GARCH model of
Engle and Rangel (
2008), where conditional volatility is subject to low-frequency variations which work as a time-varying long-run average.
The AR(1) Spline-SV model can be written as
$$y_t = \exp(h_t/2)\,\epsilon_t, \qquad \epsilon_t \sim N(0, 1), \qquad (14)$$
$$h_t = c_t + x_t, \qquad x_t = \phi\, x_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \tau_\eta^{-1}), \qquad (15)$$
where $c_t$ follows a second-order random walk, which is constructed by assuming independent second-order increments with precision $\tau_c$. That is,
$$\Delta^{2} c_t = c_t - 2 c_{t-1} + c_{t-2} \sim N(0, \tau_c^{-1}). \qquad (16)$$
Since we use an unrestricted specification for this component, i.e., the component does not sum to zero, it can capture a smoothly varying long-run average of the log-variance. Hence, in this specification, it is not necessary to include a mean parameter, which is already captured by the dynamics of this process.
Therefore, the AR(1) Spline model has three parameters: the autoregressive coefficient $\phi$, the marginal precision of log-volatility $\tau$, and the precision of the spline (trend) component $\tau_c$. More details on the specification of the second-order random walk model can be found in
Appendix A.
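A sketch of how this two-component log-variance can be specified in r-inla is given below; it is indicative only. The duplicated time index is needed because each f() term requires its own index variable, constr = FALSE reflects the unrestricted (non-zero-sum) spline discussed above, and prior settings are omitted.

## Sketch: AR(1) + second-order random walk (spline) log-variance components,
## without an intercept (the rw2 term absorbs the smoothly varying level).
dat <- data.frame(y = ret, t = seq_along(ret), t.spline = seq_along(ret))
fit <- inla(y ~ -1 + f(t, model = "ar1") +
                 f(t.spline, model = "rw2", constr = FALSE),
            data = dat, family = "stochvol",
            control.compute = list(mlik = TRUE, waic = TRUE))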
Likewise, consider the LMSV Spline model
$$y_t = \exp(h_t/2)\,\epsilon_t, \qquad \epsilon_t \sim N(0, 1), \qquad (17)$$
$$h_t = c_t + x_t, \qquad x_t \sim \mathrm{fGn}(\tau, H), \qquad (18)$$
$$\Delta^{2} c_t \sim N(0, \tau_c^{-1}). \qquad (19)$$
This model has three parameters: the Hurst exponent $H$, which determines the temporal persistence of the fGn process, the marginal precision $\tau$ of the fGn component, and the precision $\tau_c$ of the second-order random walk spline.
Another alternative specification explored here is the inclusion of an additive seasonal component $s_t$ in the conditional log-variance dynamics. The seasonal component $s_t$ has a set periodicity $M$ and is restricted such that the sum of the $M$ individual components is zero. This model can be written as
$$y_t = \exp(h_t/2)\,\epsilon_t, \qquad h_t = \mu + x_t + s_t, \qquad x_t \sim \mathrm{fGn}(\tau, H). \qquad (20)$$
This seasonal component is of special interest when we are dealing with high-frequency data. As discussed in
Deo et al. (
2006), there is evidence of periodic patterns in intraday returns volatility. The periodicity
M, and thus the number of seasonal components, depend on the aggregation used to compute intraday returns. For example, in
Section 6 we use prices 5 min apart, which gives M = 288 individual seasonal components over a 24 h trading day.
The direct incorporation of this periodicity structure is advantageous since it avoids the use of multi-stage methods for the estimation of the LMSV model for high-frequency data. For example, in
Deo et al. (
2006) the periodicity pattern is estimated in the first stage, and then returns adjusted for this pattern are modeled as an LMSV process. Our formulation allows for the simultaneous estimation of these components, avoiding the problems associated with estimation at different stages.
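A sketch of the corresponding r-inla specification is shown below, again only indicative: the "fgn" and "seasonal" latent models are assumed to be available in the installed INLA build, and M = 288 corresponds to the 5 min intervals of a 24 h trading day, as used in Section 6.

## Sketch: fGn log-variance plus an additive intraday seasonal component.
M   <- 288                       # 5 min intervals in a 24 h trading day
dat <- data.frame(y = ret, t = seq_along(ret), t.seas = seq_along(ret))
fit <- inla(y ~ 1 + f(t, model = "fgn") +
                 f(t.seas, model = "seasonal", season.length = M),
            data = dat, family = "stochvol")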
A flowchart presenting a summary of the steps required to apply the method is shown in
Figure 1.
4. Monte Carlo Simulation Study
We performed Monte Carlo experiments to evaluate the small sample properties of the INLA fGn approximation and the estimation of LMSV models described in
Section 3. INLA estimates were compared with a traditional approach employing an MCMC posterior sampling algorithm based on a truncated state space representation of long-memory processes, as proposed by
Chan and Petris (
2000). The long-memory process is approximated as an AR(20). We sample the parameters $H$ and $\tau$ and the latent volatility $h_t$ using a Random Walk Metropolis scheme, while the mean parameter $\mu$ has a conjugate sampler. For a detailed description of this method, see
Chan and Petris (
2000) or
Ferraz and Hotta (
2007). Each MCMC estimation was based on a chain of 20,000 samples, after discarding the first 4000. Longer chains did not show significant gains in terms of MCMC accuracy performance in this specific application.
Three prior specifications/estimation methods were compared. The first two had the same prior configuration: for the MCMC estimation and the baseline INLA implementation (INLA 1), we considered a Gaussian prior for the average volatility parameter $\mu$, a Gamma prior for the marginal precision $\tau$, and a Gaussian prior for the parameter
H. We also considered a second prior structure (INLA 2), which is based on the penalized complexity priors framework (pc-priors), as introduced by
Simpson et al. (
2017). These pc-priors are invariant to reparameterization, have excellent robustness properties, and allow for a direct comparison between different conditional models.
Sørbye and Rue (
2018) showed how to construct such priors for the precision $\tau$ and persistence $H$ parameters describing an fGn process. See specific details in
Appendix A.
We chose two parameter configurations for the data-generating processes, describing different levels of volatility. The first configuration represents standard market conditions, while the second parameter vector represents more volatile conditions, such as those observed in cryptocurrency markets. For both parameter vectors, we simulated 1000 samples from an LMSV model with 500 and 1000 observations. In each experiment, we compared the point estimates to the true parameter values in terms of mean error (ME), root mean squared error (RMSE), and mean absolute error (MAE).
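For concreteness, one LMSV sample path for these experiments can be generated as in the sketch below (base R, with illustrative parameter values rather than the exact configurations used in the tables), drawing the fGn component from the Cholesky factor of its Toeplitz correlation matrix, with fgn_acf() as defined in Section 3:

## Simulate one LMSV path: fGn log-variance drawn via the Cholesky factor
## of its Toeplitz correlation matrix (illustrative parameter values).
set.seed(1)
n   <- 1000
H   <- 0.9     # Hurst exponent
tau <- 4       # marginal precision of the fGn component
mu  <- -9      # mean log-variance
L   <- t(chol(toeplitz(fgn_acf(0:(n - 1), H))))
x   <- drop(L %*% rnorm(n)) / sqrt(tau)
h   <- mu + x
y   <- exp(h / 2) * rnorm(n)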
Table 1 and
Table 2 present the results for each parameter vector, with sample sizes of 500 and 1000. It can be observed that the INLA estimation presents better results in terms of mean error (ME), root mean squared error (RMSE), and mean absolute error (MAE) for the parameters $\mu$ and $\tau$ for the two parameter configurations and sample sizes. The different prior configurations used in the INLA estimation do not seem to have relevant impacts on the estimation of these parameters. When looking at the Hurst exponent
H, we find rather mixed results. For the first parameter vector in
Table 1, INLA fares better with respect to all three accuracy measures. Interestingly, the opposite is true for the second parameter configuration in
Table 2, whose values represent more turbulent market conditions.
We also compared the performance of these methods in estimating the paths of the latent log-variance $h_t$. In this case, we computed the ME, RMSE, and MAE between the true and estimated log-variance for each replication of the experiment, and report the average values across all simulations in
Table 1 and
Table 2. The performances of the MCMC and INLA methods in estimating latent log-variance are quite similar, with a marginally better estimation performance being obtained using INLA for both parameter settings and sample sizes. This result is very relevant, since the main objective in risk measurements is to accurately recover the true volatility of the process, which is the necessary input for the calculation of risk management measures such as Value at Risk. In this regard, the proposed methodology has a satisfactory performance for empirical applications, showing the validity of the method for real problems of risk measurement and management.
A major advantage of the INLA approach comes from its lower computational cost. We performed the MCMC estimations using compiled C++ code through the R package nimble, and the INLA estimations with the r-inla
1 package. The average time required for MCMC estimations with a sample size of 1000 was 38.5 s, increasing to 106.8 s if we include compilation time. Each INLA estimation was performed, on average, in 4.805 s. This simple comparison suggests a speed-up of about eight times. The results of the Monte Carlo experiment indicate that the INLA estimation presents a performance that is generally equal or superior to that of the MCMC method. Thus, we have evidence of INLA’s good performance in LMSV model estimations, in terms of both statistical accuracy and computational time.
5. Application to Daily Data
Table 3 reports descriptive statistics of the daily log returns and absolute returns of Bitcoin and Ethereum, as well as the S&P500 index. The trajectories are depicted in
Figure 2.
For each asset, we experimented with four specifications. The first one, denoted Ar1SV, is a traditional stochastic volatility model with AR(1) conditional variance, as in Equations (
1) and (2). Our second specification is the LMSV model with volatility persistence given by a fractional Gaussian noise, as described in Equations (
7) and (8). Since we employed penalized complexity priors for the fGn process, the LMSV specification is equivalent to INLA 2 from the Monte Carlo simulation tables of
Section 4. As discussed in
Section 3, to showcase the flexibility of the INLA method we augmented both Ar1SV and LMSV with a smooth spline component following a second-order random walk, which functions as a slow-moving unconditional average volatility. These specifications, Ar1SV Spline and LMSV Spline, are presented in Equations (
14)–(
19). For further details on implementation and prior structure, we point to
Appendix A.
5.1. Parameter Estimates
Table 4 and
Table 5 report descriptive statistics of the posterior distributions of the estimated parameters for Bitcoin and S&P500, respectively. Due to space constraints, we do not present the estimated parameters of Ethereum, but the results can be obtained from the authors.
The qualitative implications of our estimations appear consistent across all three assets. The estimated values of the persistence parameters $\phi$ and $H$ in the Ar1SV and LMSV models are large, but not so close to unity as to suggest nonstationarity of the latent volatility, a pattern that is typical in applications to financial data.
Incorporating a spline component into our stochastic volatility models has the effect of diminishing the persistence of the latent log-variance by lowering the values of the parameters $\phi$ (Ar1SV-Spline) and
H (LMSV-Spline). Since we did not include a constant in the expressions for log-variance in our Ar1SV-Spline and LMSV-Spline specifications, the second-order random walk component serves as a smoothly varying unconditional mean log-variance. The effect of reduced persistence is more pronounced for Bitcoin than for the S&P 500, which could be taken as a suggestion that mean log-variance itself varies more over time for Bitcoin. This contributes to the body of empirical evidence showing that cryptocurrencies display higher overall levels of volatility, but also a relatively higher variability in the volatility level over time (e.g.,
Ghosh et al. 2023;
Ahmed et al. 2024).
5.2. Model Selection
Table 6 reports the estimated log marginal likelihood (mlik) and widely applicable information criterion (waic) measures for all assets and models.
Vehtari et al. (
2016) introduced waic as an adequate method for choosing between Bayesian models of different complexities with an interpretation analogous to traditional information criteria (the lower the better).
The boldface entries in
Table 6 indicate the best specification for each asset according to waic. For the two cryptocurrencies, the LMSV spline model was chosen as the best specification, while for S&P 500 the Ar1SV model was selected. These results seem to indicate that the LMSV specification is suitable for the more volatile and more persistent cryptocurrency series, as the inclusion of a smoothly varying component in the unconditional average log-variance seems to better capture the volatility dynamics of this market. The inclusion of this component provides a way of incorporating a change in parameters into the unconditional volatility process, and the results are consistent with the many changes that occurred in these markets in the analyzed period.
In-sample fit measures, ME, RMSE, and MAE are reported in
Table 7. Absolute returns were taken as a proxy for the true unobserved volatility. We observe that the LMSV Spline model was the best model in terms of in-sample fit for S&P500, achieving the best results for all criteria, whereas for Bitcoin and Ethereum, the Ar1SV Spline model was chosen by ME, RMSE, and MAE. An out-of-sample forecast analysis is presented in the next subsection.
Figure 3,
Figure 4,
Figure 5 and
Figure 6 present a comparison of the volatility implied by each model specification and the observed absolute returns for Bitcoin and S&P 500. We can see that the adjustment of estimated models is consistent with the pattern of volatility observed in all the analyzed series.
5.3. Dynamic Value at Risk Estimation
Dynamic Value at Risk (VaR) measurements, calculated using different specifications, are presented in the bottom panels of
Figure 3,
Figure 4,
Figure 5 and
Figure 6. The values were computed using the approximation VaR($\alpha$) $= \hat{\mu}_y + z_{\alpha}\, \hat{\sigma}_t$, where $\hat{\mu}_y$ is the sample mean of the process, $z_{\alpha}$ is the $\alpha$-quantile of a standard Normal distribution, and $\hat{\sigma}_t$ is the estimated conditional volatility. In this experiment, we show the results for the calculation of a VaR with $\alpha = 5\%$, a usual measure of tail risk. The results for other VaR levels are available from the authors upon request. In the same Figures, we also show the dynamic Value at Risk (VaR) measurements calculated using the different volatility components estimated using the four analyzed specifications.
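A minimal sketch of this calculation and of the violation proportion used in the backtests is given below (base R); sigma_t stands for the conditional volatility extracted from one of the fitted specifications.

## Dynamic VaR at level alpha and in-sample proportion of violations.
alpha  <- 0.05
mu_hat <- mean(ret)                        # sample mean of the returns
VaR_t  <- mu_hat + qnorm(alpha) * sigma_t  # lower-tail VaR approximation
viol   <- ret < VaR_t                      # violation indicator
mean(viol)                                 # compare with the nominal 5% level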
Broadly, dynamic VaR measurements appear to follow the observed tail risk for all assets. In order to more carefully verify the performance of the different specifications for conditional volatility, we show, in
Table 8, the proportion of observed violations within the sample (prop. viol.) and the p-values from the backtesting method for the VaR proposed in
Christoffersen (
1998) to verify if the empirical coverage of the estimated VaR is statistically equal to the VaR nominal value. This is compared against an alternative hypothesis of empirical coverage that is distinct from the nominal, indicating a problem in the VaR estimation.
For the calculation of VaR for BTC, the Ar1SV and LMSV spline models present different performances. Ar1SV comes closest to the expected value of violations (5%), while the LMSV spline tends to be conservative, underestimating the violations. For ETH, Ar1SV is closest to the expected performance with respect to the VaR of 5%. The LMSV and LMSV spline models consistently underestimate the violations. Finally, for the S&P 500 series, both Ar1SV and Ar1SV spline are well-calibrated with the VaR of 5%, with violation proportions very close to 5%. The LMSV spline also presents a good adherence.
5.4. Out-of-Sample Forecast
Our previous analyses compared the in-sample performance of the conditional volatility models. To verify the predictive performance for conditional volatility out-of-sample, we performed a forecast experiment using a rolling sample for the last 22 observations in the sample, which were left out of the estimation procedure.
In this experiment, we employed rolling samples with a size equal to that of the original series minus 22, and with each sample we estimated the four specifications analyzed above. From these estimates, we predicted future volatility 1, 5, and 22 steps ahead and compared the predictions with the observed absolute returns (a proxy for the true unobserved volatility). We then added one more observation to the estimation sample and repeated the forecasting process until the end of the sample. The prediction results for each analyzed series, the four specifications, and the three forecast horizons are presented in
Table 9. The best results in each category are highlighted in bold.
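The design of this exercise can be sketched as follows (base R); fit_sv_model() and forecast_vol() are hypothetical wrappers standing in for the estimation and volatility-prediction steps of each specification.

## Sketch of the out-of-sample exercise: re-estimate on an expanding window
## and forecast volatility 1, 5, and 22 steps ahead.
## fit_sv_model() and forecast_vol() are hypothetical placeholders.
h_set <- c(1, 5, 22)
n_out <- 22
n     <- length(ret)
res   <- list()
for (i in seq_len(n_out)) {
  train <- ret[1:(n - n_out + i - 1)]
  fit   <- fit_sv_model(train)                  # e.g., LMSV estimated via r-inla
  for (h in h_set) {
    idx <- length(train) + h
    if (idx <= n) {
      res[[length(res) + 1]] <- data.frame(
        h = h, error = abs(ret[idx]) - forecast_vol(fit, h))  # abs. return proxy
    }
  }
}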
To compare the predictive results, we evaluated three error indicators for each time series (btc, eth, and sp500) and three forecast horizons (1-step, 5-step, and 22-step) for the four forecast models: Ar1SV, LMSV, Ar1SV Spline, and LMSV Spline. For the btc series and the one-step-ahead forecast, in terms of average error, the LMSV spline and Ar1SV spline stand out as they present the values closest to zero (0.0001 and −0.0012), respectively. In terms of RMSE, LMSV has the best performance (0.01558), which is repeated for the MAE at this forecast horizon (0.0120), showing that the LMSV stands out in this configuration. For the five-step horizon, the LMSV model presents the best result in terms of ME (−0.00056), RMSE (0.01375), and MAE (0.01070). For the 22-step horizon, the Ar1 spline (−0.00055) and LMSV spline (0.00101) have the lowest mean errors, while the LMSV model has the lowest RMSE (0.01419), and the Ar1 spline model has the lowest MAE (0.01123).
Analyzing the predictive results for eth, for one-step-ahead, the Ar1SV spline model stands out, with the best ME (−7 ), RMSE (0.01571), and MAE (0.012190). For the five-step horizon, the Ar1SV spline is again better in terms of ME (0.00143) and MAE (0.01320), and the LMSV presents the lowest RMSE (0.01700). The results are similar for the 22-step horizon, with a better ME (−0.00017) and MAE (0.01283) for the Ar1SV spline model and a better RMSE (0.017190) obtained by the LMSV. The results for sp500 indicate that the Ar1SV spline obtained the best performance for one-step-ahead, with an ME, RMSE, and MAE of 0.001690, 0.00380, and 0.00359, respectively. Again, the best performance for five-steps-ahead was achieved by the Ar1SV spline, with an ME, RMSE, and MAE of 0.00187, 0.00334, and 0.002980, and for 22-steps, the LMSV had the best ME (5 ), while the Ar1SV presented the lowest RMSE (0.00504) and the Ar1SV spline the lowest MAE (0.00346).
In summary, for btc, the LMSV model is consistently the best in terms of RMSE and MAE, except for 22-steps, where the Ar1 spline excels in terms of MAE. For eth, the Ar1SV spline is the most effective for 1-step, while for 5- and 22 steps, the LMSV and Ar1 spline models present the best results in terms of RMSE and MAE, respectively, and for sp500, the Ar1SV spline has a better performance for 1- and 5-steps, while for 22-steps, the Ar1 and Ar1 spline stand out.
6. Application to High Frequency Data
A natural application of long-memory models is to model conditional volatility in high-frequency return data (
Deo et al. 2006). This type of application is important since it is possible to use the estimated conditional variance for high-frequency data to calculate the realized volatility measurements (e.g.,
Maasoumi and McAleer 2008;
McAleer and Medeiros 2008;
Lieberman and Phillips 2008) for daily data. The estimation methodology based on INLA and the approximation of an fGn by a Gaussian Markov Random Field is especially interesting here. First, it is efficient in terms of computational speed and memory usage, and thus can handle the large number of observations typical of high-frequency data. The second advantage is the possibility of directly incorporating a latent variable into the model to capture patterns of intraday seasonality in market volatility without the need for multi-step estimation procedures such as the one proposed in
Deo et al. (
2006). The model uses the sum of an fGn component and zero-sum intraday periodic components, as provided by Equation (
20).
We estimated an LMSV model using 5 min intraday Bitcoin data from Bitstamp exchange for the period from 1 January 2021 until 20 September 2024, corresponding to a sample with 390,544 intraday observations.
Figure 7 shows the intraday Bitcoin returns that were analyzed. As described in
Section 3, additive intraday seasonality was included. For each 5 min window, one component was included, totaling 288 different intraday seasonal fixed effects.
Table 10 shows the parameters estimated using the LMSV model for this series, and
Figure 8 shows the estimated conditional volatility compared to the absolute intraday returns. We can observe that the model closely fits the temporal variation observed in the intraday returns.
The patterns of intraday seasonality estimated by the model can be seen in
Figure 9, which shows that there is substantial variation during the day. Log-variance is higher during the night time (with a notable increase starting from 18:00 UTC) and lower during the day, from 6:00 until 17:00. Shaded bands represent 95% credible intervals.
The mean error in the estimation of this model, using absolute returns as a proxy for the true volatility, was 0.00229, indicating minimal bias in this estimation. The root mean squared error was 0.04588, and the mean absolute error was 0.02710, indicating an adequate fit to the observed intraday data.
As previously discussed, one of the main advantages of using INLA in the Bayesian estimation procedure is its computational efficiency, which addresses a fundamental limitation in the estimation of complex dynamic models with latent variables using intraday data. The total computational time involved in estimating this model was 3905 s, which is quite modest given the complexity of a model with long memory and such a large number of observations, emphasizing the computational gains of this approach compared to alternative methods, such as MCMC.
7. Conclusions
In this work, we explored the use of an estimation methodology for long-memory stochastic volatility models using an approximation of a fractional Gaussian Noise process as a Gaussian Markov Random Field, and with this approach we used the Integrated Nested Laplace Approximations methodology to perform a Bayesian estimation of parameters and latent variables.
This methodology is a useful addition to the set of tools used in the modeling of conditional volatility in time series, since it is computationally efficient when compared to traditional posterior simulation methods and the model specification can easily be extended to include additional latent factors. In this work, we show that a simple extension, with the addition of a smooth variation component to model the variation in the long-run average of the conditional variance, analogous to a spline, allows for gains in terms of in-sample fit and out-of-sample forecasting. Another extension is that of an LMSV model that includes seasonal patterns, which is especially important in the modeling of intraday returns.
The computational efficiency of this method is especially important in the modeling of high-frequency data, which are characterized by a large number of observations. Sample size may be an important limitation for other methods of conditional variance estimation, such as quasi-maximum likelihood estimators and MCMC-based methods. We show how the proposed estimator can be used to construct measures of realized variance, and also to make out-of-sample forecasts for this measure, incorporating the entire dependence structure in the conditional variance observed in the intraday returns.
The Integrated Nested Laplace Approximations (INLA) methodology, despite its notable computational efficiency and effectiveness, presents several limitations that warrant consideration. Primarily, INLA excels in the estimation of latent Gaussian models; however, its applicability may be constrained when addressing models that do not conform to this framework. More intricate non-Gaussian or non-linear models can pose challenges for estimation through INLA. While it is feasible to incorporate alternative distributions as innovation processes within the observation equation, the reliance on Gaussian distributions for latent variables constitutes a fundamental assumption of the INLA approach. Consequently, stochastic volatility models that incorporate non-Gaussian distributions often necessitate modifications, such as the implementation of variational approximations (
Cabral et al. 2024;
Van Niekerk et al. 2023), which can complicate the inference process and increase computational demands.
Additionally, INLA employs numerical approximations to compute posterior marginals, which may not consistently achieve the precision of fully simulated methods like Markov Chain Monte Carlo (MCMC), particularly in scenarios characterized by highly skewed or multi-modal posterior distributions. Although INLA is well-suited to hierarchical and structured models, its flexibility may be more limited compared to MCMC, which can accommodate models with complex dependencies and structures that diverge from conventional latent Gaussian settings. Furthermore, while INLA is adept at managing large datasets, its computational time and memory requirements may escalate significantly when applied to extremely high-frequency data or to models with intricate dependencies, although it remains more efficient than traditional MCMC-based methodologies.
This methodology can be extended in several directions. It is possible to construct univariate multifactor models, for example, with permanent and transient components, or factors with short and long memory, in the same model. It is possible to formulate a model with both Ar1 and fGn components, and to verify whether there are different memory patterns in the conditional variance. A model with Ar1 and fGn components would be similar to an ARFIMA (1,d,0) specification for the conditional variance of the process, which may be a useful specification for some time series.
The method presented in this article can also be used to generalize the structure of multivariate stochastic volatility models estimated using INLA, as proposed in
Nacinben and Laurini (
2024a). In this respect, the long-memory dynamics could be used as an alternative to the first-order autoregressive structure in the construction of multivariate models using a multifactor structure. Another possible application is to compare the fit and predictions of the long memory model on conditional volatility with alternative models using regime shifts and other forms of parameter variation. Since the approximation structure for the long memory process used in our study is itself based on the aggregation of short-memory models, it is expected that this method could be an alternative way to predict volatility in the presence of changes in the process memory.
Another potential extension is the incorporation of non-Gaussian distributions into the innovation process of latent factor dynamics. This approach could enhance the model’s robustness in the presence of heavy-tailed distributions that affect the conditional volatility process. Furthermore, variational methods can be integrated with INLA to facilitate inference in stochastic volatility models that employ non-Gaussian distributions, as demonstrated in
Cabral et al. (
2024) and applied in the estimation of univariate SV models by
Nacinben and Laurini (
2024b). By combining long memory structures with non-Gaussian distributions, we can broaden the applicability of stochastic volatility models to encompass the two significant empirical features commonly observed in financial data.