*4.3. Density Forecast*

A density forecast is more informative than a point forecast, as it provides a measure of the prediction uncertainty. The PL, which is the basis of the density forecast, comes as a by-product of the adopted estimation strategy. Table 3 reports the LS: the evidence is striking, and the results are almost the opposite of those in Section 4.2. Both M1 and M2 provide statistically superior forecasts with respect to M0.
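As a minimal illustration of how the LS entries in Table 3 could be computed, the sketch below evaluates the average log predictive likelihood of two Gaussian one-step-ahead density forecasts and reports one model relative to the other; the return series and the predictive moments are hypothetical stand-ins for the models' actual predictive densities.

```python
import numpy as np
from scipy.stats import norm

def log_score(y, mu, sigma):
    """Average log predictive likelihood of a Gaussian density forecast.

    y     : realized returns over the evaluation period (length P)
    mu    : one-step-ahead predictive means
    sigma : one-step-ahead predictive standard deviations
    """
    return norm.logpdf(y, loc=mu, scale=sigma).mean()

# Hypothetical evaluation sample: realized returns and two models'
# predictive moments (in the paper these come from M0 and M1/M2).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=250)
ls_m0 = log_score(y, mu=np.zeros(250), sigma=np.ones(250))
ls_m1 = log_score(y, mu=np.zeros(250), sigma=np.full(250, 1.1))

# As in Table 3: absolute score for the benchmark, difference for the rival.
print(f"LS(M0) = {ls_m0:.3f}, LS(M1) - LS(M0) = {ls_m1 - ls_m0:.3f}")
```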

**Table 3.** Log Score (LS), computed over the forecast horizon. Results are reported relative to the benchmark specification (ARMA(1,1)-GARCH(1,1)), for which the absolute score is reported. Values in **bold** indicate rejection of the null hypothesis of Equal Predictive Ability between each model and the benchmark according to the Diebold-Mariano test at the 5% significance level. Grey cells indicate the models that belong to the Superior Set of Models delivered by the Model Confidence Set procedure at the 10% confidence level. As the table shows, the difference between M1 and M2 is very small.


The first column reports the LS for the benchmark model (M0), and the other columns report the differences of M1 and M2 with respect to the benchmark. Among the three models, M0 shows the worst results, in contrast with the results of Section 4.2. The best forecast is obtained by M1 when *h* = 1; however, the difference between M1 and M2 is almost irrelevant. Following the same strategy as before, the DM test is carried out at the 5% significance level.

The DM statistic, equal to −2.236, suggests that the null hypothesis of equal forecasting ability is rejected. As discussed in Harvey et al. (1997), the DM test can be over-sized when the forecast horizon is greater than one, so in those cases we used the modified test given by:

$$S_1^* = \left[\frac{P + 1 - 2h + P^{-1}h(h-1)}{P}\right]^{1/2} S_1$$

where *S*1 is the original DM statistic, *h* is the forecast horizon, and *P* is the length of the forecast evaluation period. The modified version of the DM test maintains the same null hypothesis of equal forecasting ability, whereas the alternative is that model M2 is more accurate than model M1.
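A minimal sketch of this procedure is given below, assuming the usual rectangular-kernel long-run variance for the original statistic and the Student-*t* comparison with *P* − 1 degrees of freedom that Harvey et al. (1997) recommend for the corrected one; the loss differential series is hypothetical.

```python
import numpy as np
from scipy.stats import t

def modified_dm_test(d, h):
    """Diebold-Mariano test with the Harvey et al. (1997) correction.

    d : loss differential over the evaluation period, defined here as
        loss(M1) - loss(M2), so large positive values favour M2
    h : forecast horizon
    Returns the corrected statistic S1* and the one-sided p-value for
    H1: model M2 is more accurate than model M1.
    """
    d = np.asarray(d, dtype=float)
    P = len(d)
    dbar = d.mean()
    # Long-run variance with a rectangular kernel over h-1 lags, the
    # standard choice for h-step-ahead forecast errors.
    lrv = np.mean((d - dbar) ** 2)
    for k in range(1, h):
        lrv += 2.0 * np.mean((d[k:] - dbar) * (d[:-k] - dbar))
    s1 = dbar / np.sqrt(lrv / P)
    # Small-sample correction, as in the formula above.
    s1_star = np.sqrt((P + 1 - 2 * h + h * (h - 1) / P) / P) * s1
    # Compare S1* with a Student-t with P-1 degrees of freedom.
    p_value = 1.0 - t.cdf(s1_star, df=P - 1)
    return s1_star, p_value

# Hypothetical loss differential for a 5-day-ahead comparison.
rng = np.random.default_rng(1)
d = rng.normal(0.01, 1.0, size=250)
print(modified_dm_test(d, h=5))
```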

The test yields a very high *p*-value (0.987), and *H*1 is accepted in this case; in other words, model M2 performs better than M1 in terms of forecasting. Finally, the MCS indicates that DMS and DMA have similar performance across horizons and that both are superior to the benchmark.
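For completeness, the selection of the Superior Set of Models can be sketched with the `MCS` class of the Python `arch` package; this implementation is an assumption (the paper does not state which one was used), and `losses` is a hypothetical *P*-by-3 table holding each model's per-period loss, e.g. its negative log predictive likelihood.

```python
import numpy as np
import pandas as pd
from arch.bootstrap import MCS  # Model Confidence Set (assumed implementation)

# Hypothetical per-period losses (negative log-PL) for the three models.
rng = np.random.default_rng(2)
losses = pd.DataFrame(
    {
        "M0": rng.normal(1.45, 0.2, size=250),  # benchmark
        "M1": rng.normal(1.40, 0.2, size=250),
        "M2": rng.normal(1.40, 0.2, size=250),
    }
)

# Superior Set of Models at the 10% confidence level, as in Table 3.
mcs = MCS(losses, size=0.10)
mcs.compute()
print("Included in the SSM:", mcs.included)
print(mcs.pvalues)
```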

Therefore, the density forecast shows a different outcome from that of the point forecast. While the benchmark model performs better than DMA and DMS in terms of MAFEs, the opposite is true when the density forecast is considered: DMA and DMS give much better predictions for the S&P 500 when the PL is considered.

The main goal of the paper is to understand whether BTC can be assumed to be a good predictor for the S&P 500. The point forecast does not contribute to answering this question; a more precise result is reached when the density forecast is used. Even though the PLs are close to each other, the model that excludes the BTC-related series outperforms the one that includes them at lag one. For the other lags, the results are mixed, and almost all the models are included in the MCS without a clearly superior model. This indicates that BTC does not seem to improve the predictability of the S&P 500 index.

Table S9 in the Supplementary Materials reports the results for *α* = *λ* = 0.99 and *κ* = 0.94 when the Dow Jones (DJ) index is substituted for the S&P 500. It emerges that, using the DJ, BTC improves the results of both the point and the density forecast at shorter horizons (one or two days ahead). These results are promising, and carrying out an extensive analysis for different markets is a topic for further research.
