In this section, we detail the econometric methodology and the various estimation tasks step-by-step.
3.1. Objectives of the Model
Looking at the evolution of the prices of any asset, a good understanding of the evolution of the associated volatility is essential. GARCH processes (Engle [31], Bollerslev [32]) allow the extraction of information about the current level of volatility. Nevertheless, the current level of volatility inferred from squared returns can be a poor signal when volatility shifts rapidly to a new level: a GARCH process is slow at catching such changes.
If the assets are known by their daily prices, GARCH models give information on the volatility of these daily prices. The recent availability of high-frequency data (seconds, minutes, etc.) has led researchers to construct new measures of volatility, called realized measures, which were then incorporated into GARCH modeling. Thus, Hansen et al. [3] introduced the Realized GARCH (REGARCH) model in 2011. These REGARCH models have provided an advantageous structure for the joint modeling of stock returns and realized volatility measures.
According to Andersen et al. [33], the granular information in high-frequency data exploited through realized measures constitutes a much stronger signal of latent volatility than squared returns. However, this approach alone does not capture the dependence structure characterized by a positive and slowly decaying autocorrelation function, or a persistence parameter close to unity (the famous ‘integrated’ GARCH effect). A multiplicative decomposition of the conditional variance into a short-term and a long-term component has been developed to capture this evident high persistence.
In 2019, Borup and Jakobsen [8] modeled the short-term component via a first-order REGARCH model and the long-term component via, for instance, a MIxed-DAta Sampling (MIDAS) structure. Several MIDAS variables can be taken into account in this long-term component. Typically, they could be macro variables, such as global financial stress indexes or economic policy uncertainty indexes known at a monthly frequency, or financial variables, such as the monthly realized volatility computed from the daily realized variances.
In this paper, we aimed to better understand the evolution of Bitcoin and Ethereum price swings observed within a day. Thus, we investigated the volatility of these crypto assets by looking simultaneously at the evolution of their latent volatility based on daily returns, with long- and short-term components, and at their jump-robust realized volatility based on high-frequency (60 min) data. With a long-term forecasting horizon in mind, we are interested in detecting the drivers behind these cryptocurrency assets while accounting for the persistence in their volatility.
We proceed with a brief exposition of the REGARCH-MIDAS-X model to answer these different expectations. A full-fledged version can be found in Borup and Jakobsen [8].
3.2. Some Notations
Let $\{r_t\}$ denote a time series of returns, $\{x_t\}$ a time series of realized measures, and $\{\mathcal{F}_t\}$ a filtration so that $\{r_t, x_t\}$ is adapted to $\mathcal{F}_t$. We define the conditional mean $\mu = E[r_t \mid \mathcal{F}_{t-1}]$ and the conditional variance $\sigma_t^2 = \operatorname{Var}(r_t \mid \mathcal{F}_{t-1})$. We now introduce the specific modeling used to handle several time scales, in terms of low and high frequencies.
We define the daily returns $r_{i,t} = \log P_{i,t} - \log P_{i-1,t}$, where $P_{i,t}$ is the price of the asset on day $i$ of month $t$. In the MIDAS framework, it is convenient to introduce two time scales: here, $t$ denotes the monthly frequency and $N_t$ the number of days within a month $t$, $i = 1, \dots, N_t$. (Typically, $N_t$ amounts to the number of days traded within a month. The long-term component can be formulated either via keeping it locally constant or else based on a local moving window. Engle et al. [2] documented that the difference between the two is negligible.) We assumed that the conditional mean $\mu$ corresponds to a zero-beta portfolio (Black [34]):

$$r_{i,t} = \mu + \sigma_{i,t}\, z_{i,t}, \qquad \forall\, i = 1, \dots, N_t, \tag{7}$$

with

$$\sigma_{i,t}^2 = h_{i,t} \times g_t. \tag{8}$$

The innovation $z_{i,t}$ is assumed to be i.i.d. with mean zero and variance one. (The innovation $z_{i,t}$ is an i.i.d. random variable from a Student's $t$ distribution with $\nu$ degrees of freedom, as fat-tailed distributions are found adequate to describe financial data (Kuester et al. [35]).) $h_{i,t}$ and $g_t$ denote the short- and long-term components of the conditional variance, respectively.
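To fix ideas, the following minimal Python sketch (assuming a price series indexed by calendar dates in pandas; the function name is ours, for illustration) tags each daily log return with the two MIDAS time scales, month $t$ and day-within-month $i$:

```python
import numpy as np
import pandas as pd

def daily_log_returns(prices: pd.Series) -> pd.DataFrame:
    """Daily log returns r_{i,t} = log P_{i,t} - log P_{i-1,t}, tagged
    with the two MIDAS time scales: month t and day-within-month i."""
    r = np.log(prices).diff().dropna().to_frame("r")
    r["t"] = r.index.to_period("M")          # monthly scale t
    r["i"] = r.groupby("t").cumcount() + 1   # day i = 1, ..., N_t within month t
    return r
```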
The short-term component $h_{i,t}$ varies at the daily frequency and follows a unit-variance GARCH(1,1) process. We used a logarithmic specification, which automatically ensures a positive variance:

$$\log h_{i,t} = \beta \log h_{i-1,t} + \tau(z_{i-1,t}) + \alpha\, u_{i-1,t}, \tag{9}$$

where $|\beta| < 1$ governs the persistence of the short-term component, $\alpha$ is the loading on the innovation $u_{i-1,t}$ of the measurement equation introduced below, and $\tau(\cdot)$ is the leverage function defined in Equation (13).
The specification of the long-term (secular, low-frequency) volatility component builds on a long tradition, dating back to Merton [36] and Schwert [37], of measuring long-run volatility by realized volatility over a monthly horizon. What is new here is that $g_t$ is instead specified by smoothing historical realized volatilities in the spirit of MIDAS filtering. Accordingly, the long-term component $g_t$ is regressed on a set of variables, varies at the monthly frequency, and is given by:

$$\log g_t = \lambda + \theta \sum_{k=1}^{K} \Gamma_k(\omega)\, RV_{t-k} + z\, X_{t-1}, \tag{10}$$

where $\lambda$ is an intercept and $K$ is the number of periods over which the variables are smoothed. $RV_t$ denotes the MIDAS variable at the monthly frequency. $\theta$ assigns the importance of the MIDAS variable. $z$ is the parameter accounting for the statistical significance of the volatility index $X_t$, inserted one at a time as an exogenous explanatory variable in our REGARCH-MIDAS-X experiments.
The parameter $\omega$ depicts a selected weighting scheme. This procedure allowed us to estimate optimally the number of lags for both the daily and monthly returns within MIDAS. It can produce various lag structures for past returns: monotonically increasing/decreasing, hump-shaped, etc. Ghysels et al. [38] documented that the beta function is a better choice than the exponential Almon for high-frequency models. In our setting, we, therefore, introduced:

$$\Gamma_k(\omega_1, \omega_2) = \frac{(k/K)^{\omega_1 - 1}\,(1 - k/K)^{\omega_2 - 1}}{\sum_{j=1}^{K} (j/K)^{\omega_1 - 1}\,(1 - j/K)^{\omega_2 - 1}}. \tag{11}$$

By construction, the weights $\Gamma_k$, $k = 1, \dots, K$, sum to one. Practically, we imposed $\omega_1 = 1$, which implies that the weights are monotonically declining.
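A minimal sketch of the beta weighting scheme in Equation (11) may help fix ideas; the function name and default parameter values are ours, for illustration only:

```python
import numpy as np

def beta_midas_weights(K: int, omega1: float = 1.0, omega2: float = 5.0) -> np.ndarray:
    """Beta weighting scheme of Eq. (11): normalized beta-density weights
    over lags k = 1, ..., K. With omega1 = 1 and omega2 > 1, the weights
    decline monotonically with the lag."""
    k = np.arange(1, K + 1) / K
    raw = k ** (omega1 - 1.0) * (1.0 - k) ** (omega2 - 1.0)
    return raw / raw.sum()   # weights sum to one by construction

# Hypothetical usage: 12 monthly lags with declining weights
print(beta_midas_weights(12).round(3))
```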
We now introduce the realized measure of volatility $x_{i,t}$ inside the measurement equation, which provides a framework for the joint modeling of returns and volatility based on high-frequency data. Specifically, this paper features jump-robust estimators (sampled at an hourly frequency to avoid microstructure noise; see, e.g., Hansen and Lunde [39]) of Bitcoin and Ethereum volatility instead of the naive RV, as recalled in Section 2 (see Sanhaji and Chevallier [18] for a further reference).

Unlike the naive augmentation of GARCH processes by realized measures, the REGARCH model relates the observed realized measure to the latent volatility via the measurement equation. According to Hansen et al. [3], the inclusion of the realized measure in the model and the fact that $\log h_{i,t}$ has an ARMA representation motivate its name. In what follows, $t$ denotes the daily frequency and $i$ the hourly frequency used in computing the realized measures of Section 2:

$$\log x_{i,t} = \xi + \varphi \log\!\left( h_{i,t}\, g_t \right) + \tau(z_{i,t}) + u_{i,t}. \tag{12}$$

The sequence $\{u_{i,t}\}$ was assumed to be i.i.d. with mean zero and variance $\sigma_u^2$, $z_{i,t}$ and $u_{i,t}$ being mutually independent. Equation (12) is natural when $x_{i,t}$ is a consistent estimator of the integrated variance. The innovation is captured by $u_{i,t}$, and $\tau(\cdot)$ is called the leverage function. This function captures the dependence between returns and future volatility, and we used the functional form constructed on Hermite polynomials, i.e.,:

$$\tau(z) = \tau_1 z + \tau_2 (z^2 - 1). \tag{13}$$

The choice of Hermite polynomials permits a simple quadratic form, ensuring $E[\tau(z_{i,t})] = 0$ with $E[z_{i,t}] = 0$ and $E[z_{i,t}^2] = 1$ for any distribution. This form generates an asymmetric response of volatility to return shocks, permitting us to map out how positive and negative shocks to the price affect future volatility. The parameter $\xi$ reflects how much of the daily volatility occurs during trading hours. The parameters $\tau_1$ and $\tau_2$ (for instance) show how a shock in the price impacts volatility. For that, we can use the impact curve defined as $\nu(z) = \alpha\, \tau(z)$. So, $100\, \nu(z)$ measures the percentage impact on volatility as a function of a return shock: $100\, \alpha\, [\tau_1 z + \tau_2 (z^2 - 1)]$.
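The asymmetry is easy to visualize numerically. The following sketch (with illustrative, not estimated, parameter values) evaluates the leverage function and the resulting percentage impact curve:

```python
import numpy as np

def leverage(z, tau1=-0.10, tau2=0.05):
    """Leverage function of Eq. (13): tau(z) = tau1*z + tau2*(z^2 - 1).
    Its mean is zero whenever E[z] = 0 and E[z^2] = 1."""
    return tau1 * z + tau2 * (z ** 2 - 1.0)

def impact_curve(z, alpha=0.30, tau1=-0.10, tau2=0.05):
    """Percentage impact of a return shock z on future log volatility,
    100 * alpha * tau(z), under the reconstructed Eq. (9)."""
    return 100.0 * alpha * leverage(z, tau1, tau2)

# With tau1 < 0, a negative shock raises future volatility more than
# a positive shock of the same size (asymmetric response):
print(impact_curve(np.array([-2.0, 2.0])))   # [10.5, -1.5]
```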
To sum up, Equation (7) is labeled as the ‘return equation’, Equation (9) as the ‘GARCH equation’, Equation (10) as the ‘long-term equation’, Equation (11) as the ‘weighting scheme’, and Equation (12) as the ‘measurement equation’. In this multiplicative framework, the ‘GARCH equation’ drives the dynamics of latent volatility. The ‘measurement equation’ is the true innovation in REGARCH and makes the model dynamically complete. It links the ex post realized measure with the ex ante conditional variance, and it facilitates a simple modeling of the dependence between returns and future volatility. Of course, discrepancies between the two measures can be observed. On the one hand, the conditional variance refers to close-to-close market intervals (e.g., daily returns); on the other hand, the realized measure is computed from open-to-close market intervals (in our setting, at an hourly sampling frequency). That is why both proportional ($\xi$) and exponential ($\varphi$) correction parameters are included.

Taken together, Equations (7)–(12) form the realized mixed-frequency GARCH model for time-varying conditional variance, which we propose to tailor to jump-robust estimators of Bitcoin and Ethereum (with volatility indices as exogenous variables) as an original application of this paper.
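To see how Equations (7)–(12) interlock, the following self-contained Python sketch simulates the model under the notation reconstructed above; all parameter values and the placeholder MIDAS/index series are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (not estimated) parameters
mu, beta, alpha = 0.0, 0.95, 0.30            # Eqs. (7) and (9)
tau1, tau2 = -0.10, 0.05                     # leverage function, Eq. (13)
xi, phi, sigma_u = -0.20, 1.00, 0.40         # measurement equation, Eq. (12)
lam, theta, z_coef = 0.10, 0.50, 0.05        # long-term equation, Eq. (10)
K, omega2, T, N = 12, 5.0, 60, 22            # MIDAS lags, months, days/month

# Beta weighting scheme, Eq. (11), with omega_1 = 1 (declining weights)
k = np.arange(1, K + 1) / K
w = (1.0 - k) ** (omega2 - 1.0)
w /= w.sum()

RV = np.abs(rng.normal(1.0, 0.2, T + K))     # placeholder MIDAS variable
X = rng.normal(size=T + K)                   # placeholder volatility index

r = np.zeros((T, N)); x = np.zeros((T, N))
log_h = 0.0                                  # E[log h] = 0 normalization
for t in range(T):
    lags = RV[t:t + K][::-1]                 # most recent monthly lag first
    log_g = lam + theta * w @ lags + z_coef * X[t + K - 1]    # Eq. (10)
    for i in range(N):
        sigma2 = np.exp(log_h + log_g)                        # Eq. (8)
        z = rng.standard_t(5) / np.sqrt(5 / 3)                # unit-variance t(5)
        u = rng.normal(0.0, sigma_u)
        lev = tau1 * z + tau2 * (z ** 2 - 1)                  # Eq. (13)
        r[t, i] = mu + np.sqrt(sigma2) * z                    # Eq. (7)
        x[t, i] = np.exp(xi + phi * np.log(sigma2) + lev + u) # Eq. (12)
        log_h = beta * log_h + lev + alpha * u                # Eq. (9)
```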
3.3. Estimation Practicalities
In the tables of results, the parameter space will be synthetically reproduced as the parameter vector $\theta$, with distributional assumptions $z_{i,t} \overset{\text{i.i.d.}}{\sim} t_{\nu}(0, 1)$ and $u_{i,t} \overset{\text{i.i.d.}}{\sim} N(0, \sigma_u^2)$. $z_{i,t}$ and $u_{i,t}$ were assumed to be mutually and serially independent. Hansen et al. [40] further detailed the decomposition of the conditional density, leading to two contributions to the log-likelihood function. Empirically, the model is estimated by the Quasi-Maximum Likelihood Estimator (QMLE).
The asymptotic analysis of the quasi-maximum likelihood estimator for the GARCH-MIDAS model can be found in Wang and Ghysels [41]. They provided a rigorous analysis of the maximum likelihood estimator of the GARCH-MIDAS model, showing that it admits covariance-stationary or strictly stationary ergodic solutions and satisfies $\beta$-mixing properties. Besides, they documented that the QMLE is unbiased and that the asymptotic standard errors are valid in the presence of exogenous explanatory variables.
For the Realized GARCH, Hansen et al. [3] and Hansen and Huang [42] detailed the regularity conditions that justify the QMLE's inference. They noticed that the mathematical structure of the log-likelihood function for the Realized GARCH is similar to the structure analyzed in Straumann et al. [43], who adopted a stochastic recurrence approach to analyze the QMLE properties for a broad class of GARCH models. Besides, they relied on proofs from previous works by Jensen and Rahbek [44] and Jensen and Rahbek [45], documenting that the QMLE is consistent with a Gaussian limit distribution regardless of the process being stationary or non-stationary. Regarding the theoretical properties of the log-linear specification in the Realized GARCH, Hansen et al. [3] provided closed-form expressions for the first and second derivatives of the log-likelihood function that enable the computation of robust standard errors. (An attractive feature of the log-linear representation is that it conveniently preserves the ARMA structure that characterizes the GARCH equation. Another advantage of using a logarithmic form is that it automatically ensures a positive variance. Hansen et al. [3] presented additional evidence in favor of the log-linear specification in Section 5.5 of their paper.)
Based on previous works by Han and Kristensen [46], Han [47], and Francq et al. [48], Borup and Jakobsen [8] further documented the log-likelihood function of REGARCH-MIDAS:

$$\ell(r, x; \theta) = \underbrace{-\frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{N_t} \left[ \log(2\pi) + \log \sigma_{i,t}^2 + \frac{(r_{i,t} - \mu)^2}{\sigma_{i,t}^2} \right]}_{\ell(r;\, \theta)} \; \underbrace{-\, \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{N_t} \left[ \log(2\pi) + \log \sigma_u^2 + \frac{u_{i,t}^2}{\sigma_u^2} \right]}_{\ell(x \mid r;\, \theta)}. \tag{14}$$

In Equation (14), the joint log-likelihood is split into a sum of univariate models, whose likelihoods can be maximized separately, as sketched in the code below. The factorization of the likelihood is possible because:
All observables are tied to their individual latent volatility process (e.g., $r_{i,t}$ is tied directly to the conditional volatility $\sigma_{i,t}^2$);
The innovations $z_{i,t}$ and $u_{i,t}$ are taken to be independent in the formulation of the likelihood function.
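A compact sketch of the two contributions to the Gaussian quasi log-likelihood of Equation (14) (function and argument names are ours) reads:

```python
import numpy as np

def regarch_midas_quasi_loglik(r, x, sigma2, u, sigma_u2, mu=0.0):
    """Gaussian quasi log-likelihood of Eq. (14), split into the return
    contribution l(r) and the measurement contribution l(x|r). sigma2
    holds the fitted conditional variances h*g, and u the measurement-
    equation residuals implied by a candidate parameter set."""
    l_r = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2)
                        + (r - mu) ** 2 / sigma2)
    l_x = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma_u2)
                        + u ** 2 / sigma_u2)
    return l_r + l_x   # joint quasi log-likelihood to be maximized
```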
Borup and Jakobsen [8] proceeded to calculate the score functions as a martingale difference sequence, which defines the first-order conditions for the maximum-likelihood estimator and facilitates the direct computation of standard errors for the coefficients. Specifically, for the long-run component, the derivatives for REGARCH-MIDAS can be found in Equation (A.41) of their Supplementary Appendix. To check the validity of the asymptotic distribution of the estimators, Borup and Jakobsen [8] followed the parametric bootstrapping technique of Paparoditis and Politis [49]. The in-sample distribution of the estimated parameters for REGARCH-MIDAS aligns with a normal distribution. The authors concluded that the QMLE approach and associated inferences are valid.
In terms of estimation ‘tricks’, we initialized the conditional variance process $h_{i,t}$ to be equal to its unconditional mean. To initialize the long-term component $g_t$, we set the past values of the MIDAS variable equal to its sample average for the length of the backwards-looking horizon in the MIDAS filter. To avoid the issue of inferior local maxima during the estimation, we considered a grid of starting values by perturbation. Given this perturbation, the numerical optimization was stable. Currently, the estimation of MIDAS filters is eased by several licensed (e.g., Matlab, EViews) or GNU (R CRAN, Octave, Gretl) software packages, with resources flagged on Eric Ghysels' website (http://eghysels.web.unc.edu/, accessed on 13 January 2023). (Onno Kleen and Daniel Borup will also provide their respective codes upon request.)
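These initialization steps can be summarized in a short sketch (the function name and perturbation scale are ours; the presample fill follows the description above):

```python
import numpy as np

def init_states(rv_monthly: np.ndarray, K: int, theta0: np.ndarray,
                n_starts: int = 10, scale: float = 0.1, seed: int = 1):
    """Estimation 'tricks' sketched above: start log h at its unconditional
    mean (zero under the unit-variance normalization), fill the K presample
    MIDAS lags with the sample average of the MIDAS variable, and build a
    grid of perturbed starting values to avoid inferior local maxima."""
    rng = np.random.default_rng(seed)
    log_h0 = 0.0                                   # E[log h] = 0
    rv_presample = np.full(K, rv_monthly.mean())   # presample MIDAS lags
    starts = [theta0 + scale * rng.normal(size=theta0.shape)
              for _ in range(n_starts)]
    return log_h0, rv_presample, starts
```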
3.4. Selecting the Mixed Data Sampling Filter: Cryptocurrency Hashrates
Since the seminal contribution by Ghysels et al. [50], the appeal of MIxed DAta Sampling (MIDAS) to macroeconomists has been immediate. (Ghysels et al. [38] defined MIDAS regressions as “a simple, parsimonious, and flexible class of time series models that allow the left-hand and right-hand variables of time series regressions to be sampled at different frequencies”.) It offers, indeed, the possibility to contrast the frequency available in financial markets (typically, daily or intraday) with that available for macroeconomic variables (e.g., quarterly or monthly). By using the highest frequency available for each series, the econometrician loses no information, as would happen if all the series were harmonized (say, to a monthly frequency) to obtain a balanced sample. Initial applications of this technique include, to cite a few, Ghysels et al. [51], who found a significantly positive relation between the conditional mean and the conditional variance of the aggregate stock market return, and Ghysels et al. [52], who considered various MIDAS regressions to predict volatility. A survey can be found in Ghysels et al. [38].
The MIDAS component model splits volatility into two components (multiplicative in levels, hence additive in logs): one interpreted as a short-run (transitory) component estimated with daily return data, and a second one identified as the long-run (secular trend) component obtained from monthly macroeconomic data. Alternatively, the short-term part could reflect day traders' investment horizon, whereas the long-term component could relate to pension funds or other types of investors with longer-term maturities in mind. Katsiampa [53] documented that Bitcoin volatility can indeed be decomposed into short- and long-term components. Once again, we observed the ability of component models to capture complex dynamics via a parsimonious parameter structure.
Regarding the long-term MIDAS macro component serving as a proxy of the business cycle, monthly industrial production is typically selected by studies on the S&P 500 index (see, e.g., Engle et al. [2] or Conrad and Loch [54]). In this paper, we opted for a more ‘digital’ MIDAS filter by selecting the hashrate, which measures the historical processing power on a given blockchain network. For instance, Bitcoin's market price and total hashrate tend to go hand in hand (see, e.g., Fantazzini and Kolodin [55], Marthinsen and Gordon [56], and Kubal and Kristoufek [57]), as is visible to the interested reader in Figure A1 of Appendix A. Bitcoin's monthly hashrate was sourced from Nasdaq Data Link. Ethereum's hashrate was sourced from Etherscan. Both variables are displayed in Figure 3. The ‘digital’ MIDAS filter was transformed into a logarithmic first difference.
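For concreteness, the transformation amounts to the following (the function and variable names are hypothetical):

```python
import numpy as np
import pandas as pd

def log_first_difference(hashrate: pd.Series) -> pd.Series:
    """Transform a monthly hashrate series into logarithmic first
    differences, i.e., the monthly log-growth rate of processing power."""
    return np.log(hashrate).diff().dropna()

# Hypothetical usage on a monthly series indexed by month-end dates:
# midas_x = log_first_difference(btc_hashrate_monthly)
```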