**4. Analysis**

The out-of-sample period begins on 1 September 2016 and the forecast horizon ranges from *h* = 1 to *h* = 7 days ahead. The analysis compares the performance of two models: the first, M1, includes all the predictors (financial, commodity and crypto predictors, see Table 1); the second, M2, excludes the crypto predictors. The benchmark model, denoted M0, is an ARMA(1,1)-GARCH(1,1) model. M1 and M2 can suffer from massive model uncertainty due to the number of possible predictor combinations at each time point *t*. For example, M1 has 2<sup>9</sup> = 512 models at each point in time. To mitigate this, we use DMA and DMS as described in Koop and Korobilis (2012) and reported in Supplementary Materials. As already mentioned, the methodology requires fixing three hyperparameters: the forgetting factor *λ* for the parameter variation; the decay factor *κ* for the EWMA; and, finally, the discount weight *α* that weights each model based on its forecast performance.
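
To make the size of the model space concrete, the sketch below enumerates every predictor subset that DMA/DMS must track at each time point. It is an illustration only: apart from BTC and BHL, the predictor labels are placeholders, since the actual set is listed in Table 1.

```python
from itertools import combinations

# BTC and BHL plus seven placeholder labels standing in for the
# remaining predictors of M1 (the actual set is listed in Table 1).
predictors = ["BTC", "BHL"] + [f"x{i}" for i in range(3, 10)]

# Every subset of the nine predictors defines one candidate model,
# so DMA/DMS must track 2^9 = 512 models at each time point t.
model_space = [subset
               for k in range(len(predictors) + 1)
               for subset in combinations(predictors, k)]
print(len(model_space))  # 512
```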

The results reported in this section are based on *κ* = 0.94, a value that suits daily data, see RiskMetrics (1996) and Prado and West (2010). The other parameters are set to *α* = 0.99 and *λ* = 0.99, consistent with Raftery et al. (2010). In Section 4.2 a robustness analysis for the forgetting factors *α* and *λ* is carried out. Moreover, we also tried to optimize *λ<sub>t</sub>* at each time point using a standard data-driven approach that minimizes the expected prediction error. Unfortunately, the optimized *λ<sub>t</sub>* appears to be very unstable with crypto time series; we leave this issue as a topic for further research.
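
For reference, the sketch below illustrates how the three hyperparameters enter one step of the Koop and Korobilis (2012) recursions: *α* discounts past model performance, *λ* inflates the state covariance, and *κ* drives the EWMA variance update. It is a minimal single-step sketch with our own variable names, not the full estimation code.

```python
import numpy as np

def dma_step(pi_prev, pred_dens, P_prev, sigma2_prev, resid,
             alpha=0.99, lam=0.99, kappa=0.94):
    """One illustrative DMA update step for K candidate models.

    pi_prev     : (K,) model probabilities at time t-1
    pred_dens   : (K,) predictive densities of y_t under each model
    P_prev      : (K,) posterior parameter variances (scalar states here)
    sigma2_prev : EWMA observation variance at t-1
    resid       : observation residual at time t
    """
    # Model forgetting: discount past performance with alpha.
    pi_pred = pi_prev**alpha / np.sum(pi_prev**alpha)
    # Update model probabilities with the realized predictive densities.
    pi_post = pi_pred * pred_dens
    pi_post /= pi_post.sum()
    # Parameter forgetting: inflate the state covariance by 1/lambda.
    P_pred = P_prev / lam
    # EWMA update of the observation variance (RiskMetrics-style).
    sigma2 = kappa * sigma2_prev + (1.0 - kappa) * resid**2
    return pi_post, P_pred, sigma2
```

DMA averages forecasts across all models with weights `pi_post`, while DMS selects the single model with the highest probability.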

The analysis begins with the investigation of the posterior inclusion probabilities of each predictor: the higher the probability, the greater the predictor's influence on the dependent variable. Figure 2 depicts the posterior probabilities of BTC (Panel (a)) and of BHL (Panel (b)). Time-varying posterior inclusion probabilities for the other exogenous variables are reported in Supplementary Materials.
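
Recall that the posterior inclusion probability of a predictor at time *t* is the sum of the posterior probabilities of all models that contain it; a minimal sketch, reusing the model space enumerated above:

```python
import numpy as np

def inclusion_probability(pi_t, model_space, predictor):
    """Posterior inclusion probability of `predictor` at time t.

    pi_t        : (K,) posterior model probabilities at time t
    model_space : list of K predictor subsets (as enumerated above)
    """
    mask = np.array([predictor in m for m in model_space])
    return float(pi_t[mask].sum())
```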

**Figure 2.** Posterior inclusion probabilities. Panel (**a**) shows the posterior inclusion probability for BTC. Panel (**b**) shows the posterior inclusion probability for BHL.

The figures show that the importance of each predictor switches rapidly over time, with a high inclusion probability for BTC in some specific periods. One important change occurred in 2016, when the inclusion probability suddenly jumped from 0.5 to 0.9, increasing the correlation with the S&P500 and potentially BTC's role as a leading indicator.

After a calm period during 2017, BTC gained importance once again at the end of the same year, with a steep rise in price. During this period, several articles pointed out a correlation between BTC and financial markets. Bloomberg (2018) stated that "big investors may be dragging Bitcoin toward market correlation" and described BTC as an asset offering the highest potential risk/return combination in the market. This may have attracted the interest of big investors able to move huge amounts of funds and consequently correlate BTC with the USA stock market. Another article, by Cointelegraph (2018), asserted that BTC might be correlated with the VIX, but there is no evidence that it may influence the S&P500 index. An extensive analysis of the latter issue is carried out in the next sections using point and density forecasts.

#### *4.1. Forecast Metrics*

To assess the leading property of BTC we use point and density forecasts. For the point forecasts, we use the mean absolute forecast error (MAFE) for each forecast horizon, *h* = 1, . . . , 7:

$$\text{MAFE}\_{h} = \frac{1}{T - R} \sum\_{t = R}^{T - h} \left| \hat{y}\_{t + h|t} - y\_{t + h} \right|,\tag{2}$$

where *T* is the number of observations, *R* is the length of the rolling window, ŷ<sub>*t*+*h*|*t*</sub> is the S&P500 forecast made at time *t* for horizon *h*, and *y*<sub>*t*+*h*</sub> is the realization.
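
A direct translation of Equation (2) into code may be useful; the sketch below assumes the forecasts and realizations have already been aligned over the out-of-sample window.

```python
import numpy as np

def mafe(forecasts, realized, T, R):
    """Mean absolute forecast error of Eq. (2) for a given horizon h.

    forecasts[j] is the h-step forecast made at time t = R + j and
    realized[j] the matching realization y_{t+h}, for t = R, ..., T-h,
    so both arrays have length T - h - R + 1.
    """
    errors = np.abs(np.asarray(forecasts) - np.asarray(realized))
    return errors.sum() / (T - R)  # normalization as in Eq. (2)
```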

To evaluate the density forecasts, we use the predictive log score (LS), which is commonly viewed as the broadest measure of density accuracy, see Geweke and Amisano (2010). As for the MAFE, we compute the LS for each horizon:

$$\text{LS}\_h = \sum\_{t=R}^{T-h} \ln f\left(y\_{t+h} \mid I\_t\right),\tag{3}$$

where *f*(*y*<sub>*t*+*h*</sub> | *I<sub>t</sub>*) is the predictive density for *y*<sub>*t*+*h*</sub> constructed using information up to time *t*.
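
Equation (3) is straightforward to compute once the predictive densities are available. The sketch below assumes a Gaussian predictive density for each observation, summarized by its mean and variance; this corresponds to the single-model case, whereas under DMA the predictive density is a mixture across models.

```python
import numpy as np
from scipy.stats import norm

def log_score(realized, mu, sigma2):
    """Cumulative predictive log score of Eq. (3).

    realized, mu, sigma2 : arrays over t = R, ..., T-h, holding the
    realization y_{t+h} and the mean and variance of its predictive
    density f(y_{t+h} | I_t).
    """
    return np.sum(norm.logpdf(realized, loc=mu, scale=np.sqrt(sigma2)))
```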

We report the MAFEs as the ratio of each model's error to the baseline's, so entries smaller than 1 indicate that a given model yields forecasts that are more accurate than those from the baseline. The LSs are reported as differences in score relative to the baseline, so a negative entry indicates a model that beats the baseline. To statistically assess the differences between alternative models, we apply the Diebold and Mariano (1995) test for equality of the average loss (with loss defined as squared error and negative log score) of each model versus the ARMA(1,1)-GARCH(1,1) benchmark. We also employ the Model Confidence Set procedure of Hansen et al. (2011), using the R package MCS detailed in Bernardi and Catania (2016), to jointly compare all predictions. Differences are tested separately for each forecast horizon.
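
For completeness, here is a minimal sketch of the Diebold and Mariano (1995) statistic as it could be applied here, using a rectangular HAC variance with *h* − 1 autocovariance lags for *h*-step-ahead losses; the exact small-sample adjustments used in the paper may differ.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(loss_model, loss_bench, h=1):
    """DM statistic for H0: equal expected loss (e.g., squared error
    or negative log score) of a model versus the benchmark."""
    d = np.asarray(loss_model) - np.asarray(loss_bench)
    n = d.size
    d_bar = d.mean()
    # Long-run variance of the loss differential, truncated at h-1 lags.
    lrv = np.mean((d - d_bar) ** 2)
    for k in range(1, h):
        lrv += 2.0 * np.mean((d[k:] - d_bar) * (d[:-k] - d_bar))
    stat = d_bar / np.sqrt(lrv / n)
    pval = 2.0 * (1.0 - norm.cdf(abs(stat)))  # two-sided p-value
    return stat, pval
```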

#### *4.2. Point Forecast*

Point forecasts are evaluated through the MAFE for both DMA and DMS, as well as for their special case, Bayesian Model Averaging (BMA). For each forecast horizon, the errors are calculated using the following combinations of forgetting and discount factors: *λ* = *α* = 0.99; *λ* = *α* = 0.95; *λ* = 1 and *α* = 0.99; *λ* = 0.99 and *α* = 1; and, finally, *λ* = *α* = 1. In all cases, the decay factor is fixed at *κ* = 0.94.

Table 2 compares the point forecasts of M1 against M0 (top) and of M2 against M0 (bottom). From the upper table, it emerges that the errors increase with the forecast horizon. Moreover, as *h* increases, the ratio increases, meaning that the benchmark model displays better results than DMA and DMS. Table S8 in Supplementary Materials B shows that increasing the forecast horizon to *h* = 10 does not improve the forecasting performance of M1 and M2. However, Section 4.3, which analyses the density forecast results, reveals different outcomes.

Another peculiarity is that forecasts improve as *α* and *λ* tend to 1. When *α* = *λ* = 0.95 we get the worst forecast performance for DMA and DMS, while the best results are obtained with BMA. This may be due to the nature of the series: the presence of outliers and high peaks in the BTC series may distort the point forecasts.

To see whether BTC improves predictability of the S&P 500, a DM test is performed at the 95% confidence level. Results are reported in Supplementary Materials Table S2. There is no evidence of an improvement in prediction when BTC is added to the set of predictors. Further results for different forecast horizons are reported in Supplementary Materials.

Using point forecasts, it seems that BTC does not improve predictability of the S&P 500 index.

**Table 2.** Point forecast: M1 vs. M0 (top table) and M2 vs. M0 (bottom table). Results are reported as the ratio of the model considered to the benchmark. From both tables it emerges that the simplest model, the ARMA(1,1)-GARCH(1,1), forecasts better than both DMA and DMS.


