1. Introduction
Appropriate modelling and prediction of volatility is important for several reasons, as outlined in [
1]: Firstly, when volatility is interpreted as uncertainty, it becomes a key input to investment decisions and portfolio choices. Secondly, volatility is the most important variable in the pricing of derivative securities. To price an option, one needs reliable estimates of the volatility of the underlying assets. Thirdly, financial risk management according to the Basel Accord as established in 1996 also requires modelling and prediction of volatility as a compulsory input to risk management for financial institutions around the world. Finally, financial market volatility, as witnessed during the Global Financial Crisis and the COVID-19 pandemic, can have wide repercussions on the economy as a whole, via its effect on real economic activity and public confidence. Hence, estimates of market volatility can serve as a measure for the vulnerability of financial markets and the economy, and can help policy makers design appropriate policies. Evidently, appropriate modelling and accurate predictability of the process of volatility based on factors (predictors), has ample implications for portfolio selection, the pricing of derivative securities and risk management [
2], making it a metric of paramount importance to not only investors, but also policy makers. Hence, not surprisingly the associated international academic literature on stock market volatility, in terms of econometric methods and predictors being considered, is massive to say the least, with a proper review being beyond the scope of this paper. We refer interested readers to the works of [
3,
4,
5,
6,
7] for detailed discussions in this regard.
Against this backdrop of the importance of volatility modelling, and given the fact that macroeconomic news involving movements of fundamentals of the United States (US) have been found in earlier research to drive the volatility of the South African stock market [
8], our aim in this paper is to revisit this issue in greater detail based on a sophisticated machine learning methodology (as will be discussed below). The finding of [
8] is in line with the extant international literature in this regard (see, for example, [
9,
10,
11,
12,
13]). Ref. [
8] showed that surprises about inflation and unemployment rate announcements carried predictive content for the stock market volatility of the Johannesburg stock exchange (JSE), based on the Glosten–Jagannathan–Runkle-generalized autoregressive conditional heteroskedasticity (GJR-GARCH) model [
14]. In addition, we also draw comparisons of our findings related to US fundamentals on the South African stock market volatility with the role of economic sentiment of the US, the influence of which on the South African stock returns variability has been reported by [
15,
16]. In other words, we aim to compare the influence of US fundamentals versus behavioural aspects in shaping the risk profile of the stock market of an emerging market economy. Although our focus is on South Africa mainly, we also contextualize our findings relative to other major emerging markets, namely Brazil, China, India and Russia. This contextualization helps us to provide a comprehensive understanding of the impact of the US-based fundamentals and sentiments on the entire BRICS bloc, which is now well-established as providing diversification benefits to international investors [
17].
The choice of South Africa as the main component of our case study is driven not only by the availability of stock market data spanning over four decades in our analyses (1980 to 2020), but also because the need to conduct a focussed analysis of the South African stock market is warranted due to its sophistication. Moreover, South Africa is one of the largest exporters of highly financialized strategic commodities such as coal, chrome, diamond, gold, ilmenite, iron ore, manganese, palladium, platinum, rutile, vanadium, vermiculite, and zirconium [
18]. Being a commodity-based economy, South Africa is globally well-integrated, and in light of the dominance of the US economy in the world financial system, changes in its macroeconomic fundamentals and behavioural components are likely to affect international financial markets in general and the South African stock market in particular, besides domestically, given that asset prices are functions of the state variables of the economy, shaped by the dynamics of fundamentals and sentiments [
19,
20]. At the same time, emerging markets are subject to US investment flows as a matter of portfolio diversification. Hence, changes in the macroeconomic and behavioural environment of the US affect its domestic and global investment potential, which in turn is likely to feed into the risk status of the stock markets of emerging market economies [
21,
22]. The theoretical grounding of stock return volatility dates back to [23], which states that stock return volatility is mainly caused by leverage effects, i.e., basically affected by news on fundamentals and/or sentiment. Naturally, in light of the connectivity of the global economy, theoretical models of volatility should incorporate information on not only domestic shocks, but also international innovations to macroeconomic and financial variables in the state-space of asset price variability. This is more so for emerging markets, which are susceptible to global surprises, while simultaneously acting as portfolio diversification outlets.
Specifically speaking, using a long-horizon prediction model, we evaluate the predictability of monthly stock market realized volatility of South Africa, and the BRIC countries, with newspaper-based macroeconomic attention indexes (MAIs) of the US on eight fundamentals (unemployment, monetary policy, output growth, inflation, housing market, credit ratings, oil, and the US dollar) with comparisons drawn with a newspaper-based measure of US economic sentiment. Realized volatility, as captured by the (log) square root of the sum of squared daily log-returns (following [
24]), is considered as an accurate, observable, and unconditional metric of volatility [
25], unlike the measures derived from the popular alternative types of GARCH models, which have been primarily used in the South African stock market context to capture volatility (see, for example, [
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36], and references cited therein), as well as the stochastic volatility (SV) framework. At this stage, it must be noted that, within the GARCH class of models, volatility analysis in South Africa has been dominated by univariate frameworks, and when multivariate settings were indeed adopted to capture the information content of predictors, the focus was primarily on domestic variables [
37,
38,
39]. More importantly, we use quantile random forests [
40], a flexible data-driven machine learning technique, to derive inferences from our prediction models. This machine learning technique is capable of capturing complex non-linear and interaction effects in a natural way from the fundamentals- and sentiment-based predictors mentioned above, which could be as many as seventeen in number in a full model. Moreover, quantile random forests allow the predictive value of the US MAIs and sentiment to be traced out along the quantiles of the conditional distribution of realized volatility. This is advantageous for a risk manager looking to invest in the BRICS, who may need information on whether movements in fundamentals or sentiment precede turbulent market phases, and hence may be interested less in a prediction of the conditional mean than in some upper quantile of future realized volatility.
To the best of our knowledge, we are the first to compare the importance of MAIs versus sentiment of the US economy in predicting realized volatility of the stock returns of South Africa and the BRIC countries using a quantile-based machine learning technique. In a sense, our research can be considered an extension of the recent work by [
41], who highlight the importance of the MAIs, particularly at longer horizons, in predicting the path of realized volatility of the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom (UK), and the US) stock markets. Hence, our paper takes an emerging markets perspective, and goes beyond the conditional-mean-based predictive regression approach by considering quantile random forests to study the entire conditional distribution of the realized volatility of the BRICS, information of which should be of immense value to risk managers, given that these are economies known to provide portfolio diversification opportunities beyond developed equity markets. We organize the remainder of our research as follows: we describe our data in
Section 2, while we lay out our empirical methods in
Section 3. We discuss in
Section 4 our empirical results, and finally we conclude in
Section 5.
2. Data
We computed the sum of squared daily log-returns of the JSE stock index within each month and, thus, considered the classical estimator of realized variance as per [24], with the underlying data derived from Refinitiv Datastream. In our predictive analyses, we used the square root of realized variance (that is, realized volatility, RV) to scale down the large spikes in the data during various periods of crises over the sample period, and we considered the natural logarithm of RV to bring the empirical distribution of realized volatility closer to a normal distribution. We shall also report, however, in
Section 4 results for realized variance and the anti-log of realized volatility.
Reference [42] observed that market participants, in addition to the level of volatility, also care about the nature of volatility, with investors typically differentiating between “good” upside (calculated based on the sum of squared daily positive log-returns only) and “bad” downside (calculated based on the sum of squared daily negative log-returns only) volatilities. In light of this, we also predicted the logarithms of good and bad RVs, besides overall RV. We plot the RV along with good and bad RV in
Figure 1.
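To make the construction of the three volatility measures concrete, the following is a minimal Python sketch rather than the code used in the paper (the empirical work was carried out in R). The function name `realized_measures` and the sample returns are illustrative assumptions. Given the daily log-returns of one month, it returns the logarithm of the square root of the sum of squared returns computed over all days, over positive-return days only, and over negative-return days only.

```python
import math


def realized_measures(daily_log_returns):
    """Log realized volatility (overall, good, bad) for one month of daily log-returns."""
    # Realized variance: sum of squared daily log-returns.
    rv_all = sum(r * r for r in daily_log_returns)
    # "Good" and "bad" variants use positive and negative returns only.
    rv_good = sum(r * r for r in daily_log_returns if r > 0)
    rv_bad = sum(r * r for r in daily_log_returns if r < 0)

    def log_vol(variance):
        # log(sqrt(variance)) = 0.5 * log(variance); undefined when variance is zero.
        return 0.5 * math.log(variance) if variance > 0 else float("nan")

    return {
        "log_rv": log_vol(rv_all),
        "log_rv_good": log_vol(rv_good),
        "log_rv_bad": log_vol(rv_bad),
    }


# Hypothetical daily log-returns for a single month (illustration only).
example = realized_measures([0.01, -0.02, 0.015, -0.005])
```

By construction, the good and bad realized variances add up to the overall realized variance, which is a convenient sanity check on any implementation.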
As far as our predictors are concerned, the MAIs are indicators constructed by [
43] to focus on different macroeconomic risks of the US. (The MAIs can be downloaded from the data segment of the internet page of Professor Jinfei Sheng at
https://sites.google.com/site/shengjinfei/data?authuser=0, accessed on 31 January 2023.) To this end, the authors consider eight macroeconomic news categories. These eight categories capture risks associated with unemployment, monetary policy, output growth, inflation, housing market, credit ratings, oil, and the US dollar. They then measure the attention paid to each category by building a word list to count the number of articles in every category. The MAIs are constructed based on a text corpus of articles from the New York Times (NYT) and the Wall Street Journal (WSJ).
In terms of our behavioural variable dealing with economic sentiment, we utilize the news sentiment index (NSI) developed by [
44], which is based on a lexical analysis of economics-related news articles from 24 major US newspapers (compiled by the news aggregator service Factiva). (The data can be downloaded from:
https://www.frbsf.org/economic-research/indicators-data/daily-news-sentiment-index/, accessed on 31 January 2023). The articles that the researchers selected were those with at least 200 words which Factiva identified as dealing with “economics” as the topic, and the “United States” as the subject country. Finally, they combined publicly available lexicons with their news-specific lexicon and constructed a sentiment-scoring model designed specifically for newspaper articles.
When we use MAIs derived from the NYT (with or without the NSI), our analysis covers June 1980 to December 2020; when we rely on the WSJ-based MAIs (with or without the NSI), it covers January 1984 to December 2020. The sentiment index data stretch back to January 1980.
3. Methods
A standard way to examine the link between realized volatility, fundamentals, and sentiment is to estimate a long-horizon prediction model of the following format:
$$RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 X_t + \epsilon_{t+h}, \tag{1}$$
where $\beta_0$ denotes an intercept term, $\beta_1$ and $\beta_2$ denote slope coefficients (or coefficient vectors) to be estimated, $\epsilon_{t+h}$ denotes a disturbance term, $RV_t$ denotes the realized volatility, the parameter $h$ denotes the prediction horizon, and (for $h > 1$) $RV_{t+h}$ denotes the average realized volatility over the relevant horizon. The $RV_t$ term on the right-hand side of Equation (1) controls for the presence of autocorrelation of realized volatility, and the $X_t$ term denotes fundamentals or sentiment (or both).
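As an illustration of how the data for such a long-horizon model might be arranged, the sketch below pairs the predictors observed in period t with the average realized volatility over the subsequent h periods. It is written in Python, and the helper name `long_horizon_pairs` is hypothetical. The exact averaging window, here t+1 to t+h, is an assumption, since the text only states that the average is taken over the relevant horizon.

```python
def long_horizon_pairs(rv, x, h):
    """Align period-t predictors with the average of rv over periods t+1, ..., t+h.

    rv: list of realized volatilities, one per period.
    x:  list of predictor values (fundamentals or sentiment), same length as rv.
    h:  prediction horizon (h >= 1).
    Returns a list of tuples (target, lagged rv, predictor).
    """
    rows = []
    # The last h periods have no complete future window and are dropped.
    for t in range(len(rv) - h):
        future_window = rv[t + 1 : t + 1 + h]  # assumed window: t+1 .. t+h
        target = sum(future_window) / h
        rows.append((target, rv[t], x[t]))
    return rows


# Hypothetical series, for illustration only.
rows = long_horizon_pairs([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0], h=2)
```

Each tuple can then serve as one observation (dependent variable, autoregressive term, predictor) in a regression or a forest.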
The long-horizon prediction model formalized in Equation (
1) sheds light on the effect of fundamentals and sentiment on the conditional mean of subsequent realized volatility and, thus, may be of limited use in certain situations when it comes to risk management. A risk manager, for example, who needs information on whether movements in fundamentals or sentiment precede turbulent market phases may be interested not so much in a prediction of the conditional mean as in some upper quantile of subsequent realized volatility. A natural extension of the long-horizon prediction model, thus, is to estimate Equation (
1) as a quantile-regression model (Koenker and Bassett, 1978; Koenker 2004) [
45,
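46]. The quantile-regression version of the long-run prediction model is given by
$$\hat{\beta}_{\alpha} = \arg\min_{\beta_{\alpha}} \sum_{t} \rho_{\alpha}\left(RV_{t+h} - \beta_{0,\alpha} - \beta_{1,\alpha} RV_t - \beta_{2,\alpha} X_t\right), \tag{2}$$
where $\alpha$ denotes the quantile being studied, $\hat{\beta}_{\alpha}$ denotes the quantile-dependent coefficient vector (a hat denotes an estimated parameter), and $\rho_{\alpha}$ denotes the check function, defined as $\rho_{\alpha}(u) = \alpha u$ if $u \geq 0$ and $\rho_{\alpha}(u) = (\alpha - 1) u$ if $u < 0$.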
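The role of the check function can be illustrated with a small Python sketch (the function names are illustrative and not taken from the paper). It implements the standard check loss of quantile regression and shows, by brute force over the observed values, that the constant minimizing the summed check loss is a sample quantile. This is the intuition for why minimizing this loss targets a conditional quantile rather than the conditional mean.

```python
def check_loss(u, alpha):
    """Standard quantile-regression check function for residual u and quantile alpha."""
    return alpha * u if u >= 0 else (alpha - 1.0) * u


def best_constant(values, alpha):
    """Observed value c that minimizes the summed check loss of (value - c)."""
    def total_loss(c):
        return sum(check_loss(v - c, alpha) for v in values)

    return min(sorted(values), key=total_loss)


# Illustration with the integers 1..10: an upper quantile is picked out
# when alpha is large, because under-predictions are penalized more heavily.
upper = best_constant(list(range(1, 11)), alpha=0.75)
```

For alpha above 0.5, positive residuals (realized volatility above the prediction) receive the larger weight, which pushes the fitted value towards the upper tail.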
Estimation of the long-run prediction models given in Equations (
1) and (
2) is complicated when the number of predictors is large. In our empirical analysis, for example, we study various sources of fundamentals, and from two different newspapers. A large number of predictors naturally inflates the number of coefficients to be estimated. In addition, the various fundamentals and sentiment predictors are likely to interact, and accounting for such interactions may help to improve upon the predictive performance of the prediction models. In addition, the link between subsequent realized volatility and its predictors may be non-linear in some cases, such that it would be advantageous to have available a version of the long-run prediction model that accounts, preferably in a purely data-driven way, for predictor interactions and potential non-linearities in the data. A quantile-regression forest is such a model. A recent application of quantile-regression forests in empirical finance can be found in a paper by [
47]. The following exposition of how a quantile-regression forest works closely draws on the discussion in that paper.
A quantile-regression forest consists of many individual regression trees. A regression tree, in turn, consists of a root, interior nodes, and terminal nodes (for a comprehensive textbook exposition, see [
48]). The nodes recursively partition the space of predictors into subspaces (this partitioning is done in a top-down and binary way). The formation of such subspaces starts at the top level of a regression tree by choosing a partitioning predictor, $s$ (that is, one of the right-hand side variables in Equations (1) and (2)), and a partitioning point, $z$, in such a way as to form two regions $R_1(s,z)$ and $R_2(s,z)$, which are obtained as the solutions to the optimization problem $\min_{s,z} \{RSS_1 + RSS_2\}$, where $RSS_k = \sum_{x_{s,t} \in R_k(s,z)} (RV_t - \overline{RV}_k)^2$, with $k = 1, 2$, $\overline{RV}_k$ denotes the mean of realized volatility in region $R_k(s,z)$, $x_{s,t} \in R_k(s,z)$ denotes that the period $t$ realization of predictor $s$ belongs to region $R_k(s,z)$, and the $RV_t$ entering $RSS_k$ denote the realizations of realized volatility in region $k$. Hence, the regions can be interpreted to represent areas of relative homogeneity of realized volatility. This region-formation process then proceeds in a recursive and hierarchical way until every leaf contains a minimum number of observations of realized volatility or some maximal tree size is reached (a researcher defines these “hyperparameters” in advance).
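A single step of this region-formation process can be sketched as follows. The Python code below is an illustrative, brute-force search and not the algorithm implemented in any particular package; the names `sse` and `best_split` are assumptions. For every predictor and every observed candidate split point, it computes the sum of squared deviations from the region means in the two resulting regions and keeps the split with the smallest total.

```python
def sse(values):
    """Sum of squared deviations of values from their mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf=1):
    """Exhaustive search for one binary split.

    X: list of rows, each row a list of predictor values.
    y: list of responses (e.g., realized volatility), same length as X.
    Returns (predictor index s, split point z, total SSE), or None if no
    split leaves at least min_leaf observations on both sides.
    """
    best = None
    for s in range(len(X[0])):
        for z in sorted(set(row[s] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[s] <= z]
            right = [yi for row, yi in zip(X, y) if row[s] > z]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (s, z, total)
    return best


# Toy data (hypothetical): the first predictor separates low from high responses.
X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = [1.0, 1.0, 5.0, 5.0]
split = best_split(X, y)
```

Growing a full tree amounts to applying the same search recursively within each of the two regions until the stopping rule set by the hyperparameters is met.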
An individual regression tree clearly represents a complex hierarchical object, and this object highly depends on the data on which it is grown. A complex individual regression tree, thereby, is likely to have poor predictive ability when it is applied to new data. A random forest overcomes this problem of individual regression trees in three ways. First, a random forest is a forest and as such consists of many individual trees. Second, the individual trees that make up a random forest are random regression trees. A random regression tree differs from a standard regression tree in that it is grown by choosing a random subset of the predictors in every step of the region-formation process. Third, every single random regression tree that is a member of a random forest is estimated on a bootstrapped sample of the data [
49]. Averaging predictions across many random regression trees stabilizes predictions, where choosing a random subset of the predictors mitigates the influence of individual influential predictors, and bootstrapping the data decorrelates the predictions from individual random regression trees.
Importantly, bootstrapping the data has the further advantage that a random forest can easily be used to compute predictions of realized volatility based on the hold-out (or, in the terminology of the machine learning literature, the out-of-bag) data. In our empirical research, we use these out-of-bag predictions to study the predictive value of fundamentals and sentiment for subsequent realized volatility. Studying out-of-bag predictions has the advantage that a random forest (or, in our case, a quantile random forest) is grown by sampling from the full sample of data. Hence, studying the predictive value of fundamentals and sentiment by means of the out-of-bag predictions uses information from the entire sample. At the same time, the out-of-bag predictions, unlike conventional predictions obtained from exploiting the information in the full sample of data, are not merely in-sample predictions, but can be interpreted as quasi “out-of-sample” predictions obtained by averaging the predictions of random regression trees on hold-out (“test”) data. In a sense, the out-of-bag predictions can be interpreted to blend elements of in-sample and pure out-of-sample testing. While the latter is often considered the ultimate test of predictive ability, the former may have higher power and, therefore, may be more credible than the latter [
50].
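The mechanics of out-of-bag prediction can be sketched in a few lines of Python. The sketch is deliberately simplified and rests on a loud assumption: each "tree" is replaced by a stand-in that predicts the mean of its bootstrap sample, so that only the bookkeeping is shown, namely that an observation is predicted solely by those trees whose bootstrap sample did not contain it. The function names are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def oob_predictions(y, n_trees=500, seed=0):
    """Average, for every observation, the predictions of trees that did not see it."""
    rng = random.Random(seed)
    n = len(y)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_trees):
        in_bag, oob = bootstrap_split(n, rng)
        # Stand-in for a fitted random regression tree (assumption): it predicts
        # the mean of the in-bag responses for every query point.
        tree_prediction = sum(y[i] for i in in_bag) / len(in_bag)
        for i in oob:
            sums[i] += tree_prediction
            counts[i] += 1
    # An observation that was in-bag for every tree has no out-of-bag prediction.
    return [s / c if c > 0 else None for s, c in zip(sums, counts)]


# Hypothetical response series, for illustration only.
preds = oob_predictions([float(i) for i in range(20)], n_trees=200)
```

With a bootstrap sample of the same size as the data, roughly one third of the observations are left out of any given tree, so a forest with many trees yields an out-of-bag prediction for essentially every observation.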
Building on the research on random forests, Ref. [
40] developed quantile random forests. Intuitively, the basic idea is that a quantile random forest does not only store information on the mean of realized volatility at the leaves (as a conventional regression tree does) but keeps all observations of realized volatility, and then uses this information to compute an estimate, $\hat{F}(rv \mid X = x)$, of the conditional distribution function of realized volatility. The $\alpha$-quantile of the conditional distribution function is defined such that the probability that realized volatility is smaller than $Q_{\alpha}(x)$, given $X = x$, is equal to $\alpha$, with an estimate of the $\alpha$-quantile being computed as $\hat{Q}_{\alpha}(x) = \inf\{rv : \hat{F}(rv \mid X = x) \geq \alpha\}$.
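The last step, turning an estimated conditional distribution function into a quantile, can be illustrated as follows. The Python sketch assumes that the forest has already assigned a non-negative weight to every training observation for a given query point (how these weights are obtained from the leaves is not shown), and the function name `conditional_quantile` is hypothetical. The quantile is the smallest observed value at which the weighted empirical distribution function reaches the desired level.

```python
def conditional_quantile(values, weights, alpha):
    """Smallest value whose weighted empirical CDF is at least alpha.

    values:  observed responses (e.g., realized volatility) kept by the forest.
    weights: non-negative weights of these observations for one query point
             (assumed given; they need not sum to one).
    alpha:   quantile level in (0, 1].
    """
    total = sum(weights)
    cumulative = 0.0
    pairs = sorted(zip(values, weights))
    for value, weight in pairs:
        cumulative += weight / total
        # Small tolerance guards against floating-point round-off in the sum.
        if cumulative >= alpha - 1e-12:
            return value
    return pairs[-1][0]


# Hypothetical example: equal weights reproduce an ordinary sample quantile.
median = conditional_quantile([3.0, 1.0, 2.0, 4.0], [1, 1, 1, 1], 0.5)
```

Unequal weights shift the estimated quantile towards the observations that share leaves with the query point more often.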
We used the R language and environment for statistical computing [51] to carry out our empirical research, where we used the R add-on package “grf” [52] to estimate the quantile-random-forest versions of our long-horizon prediction models (using the “quantregForest” package developed by [53] gave qualitatively similar results, which we do not report for reasons of space). We used 2000 trees to form a random forest, and largely used default values for the other hyperparameters.
We estimated various versions of our long-run prediction models: models that feature only autoregressive terms, models that feature only fundamentals/sentiment as predictors, models that feature both fundamentals and sentiment as predictors, and models that rely on only WSJ fundamentals or only NYT fundamentals as predictors. We used these different versions of our long-run prediction models to trace out the contribution of fundamentals versus sentiment to the predictive performance of the models. To this end, we plugged the out-of-bag predictions of realized volatility implied by the various long-run prediction models into a relative performance statistic (see also [
54,
55]). The relative performance statistic, $RP$, accounts for the performance, given the quantile being analysed, and is given by
$$RP = 1 - \frac{\sum_{t} \rho_{\alpha}(e_{R,t})}{\sum_{t} \rho_{\alpha}(e_{B,t})}, \tag{3}$$
where $\rho_{\alpha}$ again denotes the check function, $e_{B,t}$ denotes the prediction error implied by a benchmark model, and $e_{R,t}$ denotes the prediction error implied by a rival model. Equation (3) makes clear that, given a quantile, the rival model performs better than the benchmark model when $RP > 0$, while the benchmark model dominates the rival model when $RP < 0$. It should be noted that we evaluated out-of-bag predictions under the loss (check) function. Hence, the relative performance statistic is a metric of the relative predictive value of the benchmark and the rival model at the quantile being studied in terms of a loss-function-weighted sum of absolute prediction errors [
54]. It follows that the relative performance statistic is a quantile-specific, local measure of relative predictive performance rather than a global measure evaluated over the entire conditional distribution of realized volatility [
55].
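A relative performance statistic of this kind can be computed along the following lines. The Python sketch assumes the common form of one minus the ratio of the summed check losses of the rival and the benchmark model, which matches the sign convention used in the text (positive values favour the rival model); the function names and the toy numbers are illustrative.

```python
def check_loss(u, alpha):
    """Standard quantile-regression check function for prediction error u."""
    return alpha * u if u >= 0 else (alpha - 1.0) * u


def relative_performance(actual, pred_benchmark, pred_rival, alpha):
    """One minus the ratio of rival to benchmark check loss (assumed form).

    Positive values indicate that the rival model has the smaller loss at
    quantile alpha. The benchmark loss is assumed to be strictly positive.
    """
    loss_benchmark = sum(check_loss(a - p, alpha) for a, p in zip(actual, pred_benchmark))
    loss_rival = sum(check_loss(a - p, alpha) for a, p in zip(actual, pred_rival))
    return 1.0 - loss_rival / loss_benchmark


# Hypothetical predictions: the rival model is closer to the realized values.
rp = relative_performance([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 2.0, 2.0], alpha=0.5)
```

Because the statistic is evaluated at a single quantile, the ranking of two models may differ between, say, the median and the upper tail.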
4. Empirical Results
We summarize the results for the relative performance statistic for our baseline scenario in
Table 1 for five different quantiles, $\alpha$, and four different prediction horizons, $h$
. We computed the relative performance statistic based on the out-of-bag predictions of the quantile-random-forest long-run prediction models. Panel A summarizes results for the NYT fundamentals, while panel B depicts the results for the WSJ fundamentals. In the baseline scenario, we use a simple first-order autoregressive model as our benchmark model. A positive relative performance statistic indicates that the rival model, given in the first column of the table, outperforms the benchmark model. The relative performance statistic is positive throughout all model specifications, quantiles, and prediction horizons (with only one minor exception). The message to take home, thus, is that the fundamental-/sentiment-based models outperform the benchmark model, where this effect is stronger for the fundamental-based model than for the sentiment-based models. Hence, fundamentals tend to matter more for the out-of-bag predictive performance than sentiment. We also observe, for both fundamentals and sentiment, that the relative performance statistic increases in the prediction horizon for the quantiles beyond the median. When the model features fundamentals as predictors, its relative performance also improves when we study the median. It should be noted that the MAIs and the NSI are also available at a daily frequency. Given this, following the extant literature on the South African stock market, we first obtained daily estimates of conditional volatility using GARCH models over the period from 1 June 1980 to 31 December 2020, and 1 January 1984 to 31 December 2020 to correspond to the MAIs derived from the NYT and WSJ. Complete details of the parameter estimates of the GARCH model are available upon request from the authors. Then, in the next step, we conducted a quantile-based bivariate causality test as outlined in [
56], with the results reported in
Table A1 at the end of the paper (
Appendix A). The results are consistent with our results based on RV: different (standardized) fundamentals tend to consistently produce higher predictability (shown by higher values of the standard normal test statistic) compared to the (standardized) NSI for all the considered quantiles of 0.05, 0.25, 0.5, 0.75, and 0.95 for the WSJ data, and for all quantiles barring 0.05 when we use the NYT-based MAIs.
In
Table 2, we report the results of a direct comparison between the models that feature, on the one hand, fundamentals and, on the other hand, sentiment (in addition to an autoregressive term). (In
Table A2, at the end of the paper, we report results of an analysis when the sample period starts in 1994, as that is when democratic South Africa came into being, with various international restrictions deregulated. The general picture that emerges from the results for the shorter sample period is the same as conveyed by the results in
Table 2.) The negative relative performance statistic for the comparison of the fundamentals model with the sentiment model shows that the predictive ability of the former clearly dominates that of the latter, where this result grows stronger for the long prediction horizons. This does not mean, however, that sentiment does not carry any predictive value. In fact, we observe a mainly positive relative performance statistic when we compare the fundamentals model with the fundamentals-cum-sentiment model. While the positive relative performance statistic is smaller in absolute terms than when we compare the fundamentals model with the sentiment model, the results show that sentiment adds some incremental performance relative to fundamentals alone, which likely reflects that the quantile-random-forest model captures interaction effects between both types of predictors. Not surprisingly, the fundamentals-cum-sentiment model also performs better than the sentiment model, where the margin by which the fundamentals-cum-sentiment model outperforms the sentiment model is larger than the one we observe when we compare the fundamentals-cum-sentiment model with the fundamentals model. It is also interesting to observe that, when we compare fundamentals with sentiment, the relative performance statistic exhibits a U-shaped pattern across the quantiles for the short predictive horizons, which turns into an inverted U-shaped pattern for the long predictive horizons. Hence, fundamentals contribute more to relative predictive model performance at the upper and lower quantiles when one focuses on the short predictive horizons, while their relative contribution centres at the median for the long predictive horizon.
The results we report in
Table A3,
Table A4 and
Table A5 at the end of the paper (
Appendix A) demonstrate that our baseline results carry over to “good” and “bad” realized volatility, and to a model that features ten autoregressive lags of realized volatility in the array of predictors. We computed good realized volatility based on data for days with positive returns, and bad realized volatility based on data for days with negative returns. Fundamentals add value to out-of-bag predictions in the case of good realized volatility, where the direct comparison of the fundamental- and sentiment-based models shows that this effect is stronger for the long prediction horizon and the quantiles below the median. Moreover, the contribution to relative performance is stronger for the WSJ fundamentals than for the NYT fundamentals. For bad realized volatility, in turn, fundamentals have a large effect on out-of-bag predictive performance. Similarly, the WSJ fundamentals work better than the NYT fundamentals in the case of bad realized volatility and when we consider the extended autoregressive benchmark model, especially so for the upper quantiles.
In
Table 3, we compare the predictive ability of the NYT with that of the WSJ fundamentals, where we control for the effects of lagged realized volatility and sentiment. The results show that the WSJ fundamentals have a stronger predictive ability over subsequent realized volatility than the NYT fundamentals, an effect that grows stronger with the predictive horizon and for the quantiles below the median. Similarly, the model that features both types of fundamentals performs better than the model that only features the NYT fundamentals. Nonetheless, NYT fundamentals also have some incremental predictive value, as the results for a comparison of the model that features both types of fundamentals in its array of predictors with the model that only features the WSJ fundamentals demonstrate.
In
Table 4, we report the results of two robustness checks. Specifically, we report results for realized volatility (that is, we do not study its natural logarithm) and squared realized volatility, that is, realized variance. The robustness checks, which are based on the WSJ fundamentals, corroborate our main result. Fundamentals have a stronger predictive value than sentiment, especially when we consider the longer predictive horizons (interestingly, this effect is strongest at the upper quantiles), but adding sentiment to the array of predictors further improves the incremental predictive value of the model on out-of-bag data.
While our main results are based on out-of-bag predictions, we summarize some out-of-sample results in
Table A6 at the end of the paper. In order to keep the analysis simple, we split the sample into an in-sample and an out-of-sample part. We then estimated the models on the in-sample part of the data, and used the out-of-sample part to make predictions of realized volatility. We used the last 15, 20, and 25% of the data for the out-of-sample part. The results show that the NYT fundamentals outperform sentiment when we use 20 or 25% of the data for the out-of-sample analysis. The results for an out-of-sample proportion of 15% are mixed. The results for the WSJ fundamentals are mixed as well. The WSJ fundamentals perform best relative to sentiment at the short predictive horizon. Importantly, the negative relative performance statistic reveals that, when we study the WSJ data (and to a lesser extent also for the NYT data), fundamentals dominate sentiment at the upper (95%) quantile of realized volatility. That is, our results imply that investors should take fundamentals into account when trying to compute upper tails (peaks) of realized volatility.
A reader also may wonder whether the results we obtained for South Africa are representative for the other BRIC countries, with the underlying daily data for the stock markets also derived from Refinitiv Datastream. The results for Brazil, China, India, and Russia, that we report in
Table 5, show that this is indeed the case over the periods of July 1994–December 2020, August 1991–December 2020, January 1990–December 2020, and January 1998–December 2020. For estimation of the BRIC models, we used the WSJ fundamentals, but the results for the NYT fundamentals were qualitatively similar (see
Table A7 at the end of the paper). While the details differ across the BRIC countries, we observe, as in the case of South Africa, that fundamentals have a stronger out-of-bag predictive value than sentiment, and that the contribution of fundamentals to the relative out-of-bag predictive performance of the model being studied increases in the predictive horizon. Moreover, the relative out-of-bag predictive performance of a model that features both fundamentals and sentiment tends to be stronger when we compare such a comprehensive model with a model that features only sentiment than when we use a model that features fundamentals as our benchmark model.
Overall, we find that:
While both fundamentals and sentiment have predictive value, the relative impact of the former is stronger than that of the latter;
The importance of accounting for fundamentals and sentiment varies across the quantiles of the conditional distribution of realized volatility, which motivates the quantiles-based approach we have used in our research;
The importance of accounting for fundamentals and sentiment also varies across different prediction horizons that we have studied, where fundamentals are more important at the extreme quantiles when we consider short horizons, and at the median when we study the long-run horizon;
Robustness checks corroborate our main findings. Importantly, our results for South Africa tend to carry over to the BRIC countries (Brazil, China, India, and Russia).
Having summarized our main findings, some comparisons can be drawn with a couple of related studies, namely) [
8] and [
41], though it must be realized that a one-to-one correspondence is impossible, with ours being the first paper to predict the realized volatility of South Africa and the BRIC countries using quantile-based machine learning methods, applied to a wide array of newspaper-derived predictors involving fundamentals and sentiments associated with the US economy. In general, just as in [
8], we find that US news on fundamentals, beyond unemployment and inflation, is important in defining the path of South African stock market volatility. These authors, however, rely on a GARCH model, and are not able to study the entire conditional distribution of realized volatility, as we do using quantile random forests. While we find that fundamentals matter more than sentiment in predicting the realized volatility of the BRICS stock markets, we also highlight the importance of the combined role of these predictors. This finding is in line with [
41], who highlighted that combining information from economic policy uncertainty (EPU) with MAIs tends to perform better in predicting the stock market realized volatility of advanced economies, i.e., the G7, than models with just the fundamentals. The similarity arises when one recognizes that sentiment and uncertainty are strongly negatively correlated. In other words, for the predictability of risk in both developed and emerging stock markets, news on macroeconomic variables and behavioural decisions contain complementary information. Then again, unlike [
41], who used conditional mean-based predictive regressions, our results are state-specific, and hence can be regarded as more informative when dealing with the realized volatility of South Africa and the BRIC countries.
5. Concluding Remarks
Given that significant early research has established the importance of US macroeconomic news for international stock market volatility, we utilize quantile random forests, a machine learning technique, to predict the realized volatility of the stock market of South Africa, one of the world’s major commodity exporters. In this regard, we use newspaper-based fundamentals and sentiments associated with US economic performance. While some evidence exists on the role of US fundamentals in driving South African stock market volatility using GARCH models, we go beyond this research by considering information on a wider array of fundamentals, as well as sentiment, by using a more robust, unconditional and observable measure of volatility, i.e., realized volatility, and by utilizing a sophisticated quantile-based machine learning approach (random forest) to draw inferences. The econometric framework is capable of capturing complex non-linear and interaction effects in a natural way from the predictors, while simultaneously tracing out their predictive value along the quantiles of the conditional distribution of realized volatility, which correspond to alternative states (levels). While the focus is on South Africa, we also compare our results with those for Brazil, China, India and Russia, thereby covering the entire BRICS bloc. In the process, we make the first attempt to compare the importance of US fundamentals versus sentiment in predicting the realized volatility of stock returns of the BRICS using a quantile-based machine learning technique. In this regard, we build on a similar investigation performed for the G7 countries, which is, however, limited to a conditional-mean-based predictive model.
The results of our empirical research shed light on how the impact of fundamentals and sentiment on predictive performance varies across the quantiles of the conditional distribution of realized volatility, and across different prediction horizons. A major finding is that, while both fundamentals and sentiment improve predictive performance, the relative impact of fundamentals outweighs that of sentiment. More specifically, fundamentals matter more at the extreme quantiles at short predictive horizons, and at the median in the long run. Our results are robust to sample periods and alternative definitions of realized volatility. Finally, results for the BRIC countries corroborate the major empirical observations for South Africa.
Our findings are important for investors looking for portfolio diversification involving investments in emerging stock markets. In particular, the information contained in US fundamentals can be used more efficiently than sentiment in predicting the realized volatility of emerging markets, and can thus serve as a key input to investment decisions and portfolio choices. However, results are contingent on the state of the market, with the extremes/tails predictable in the short run, and the median (normal) state at the long horizon. At the same time, with stock market volatility also often interpreted as a measure of financial uncertainty, its predictability, based on the information contained in US fundamentals, should assist the South African policy authorities in designing appropriate monetary and fiscal policy responses to prevent possible future recessions, especially if stock market volatility is expected to rise and produce a theoretically consistent reduction in economic activity [
57].
As part of future research, contingent on the availability of newspaper-based macroeconomic attention and sentiment indexes for South Africa, it would be interesting to compare the role of such domestic predictors with those of the US considered in this paper, though it is likely that the latter will contain leading information for the former. Furthermore, in light of the recent emphasis on climate finance [
58], one could also compare US news on uncertainty surrounding climate risks with the fundamentals and behavioural predictors. Some preliminary evidence in this regard, using the WSJ-reliant metric of climate risks of [
59] (WSJ_Engle) over January 1984–June 2017, the multiple newspaper-based climate policy uncertainty (CPU) index [
60] for the period of April 1987–August 2022, the environmental policy (EnvP) index (as well as two sub-topic indexes for renewable energy policy (EnvP_REP) and international climate negotiations (EnvP_ICN)) over January 1981–March 2019, and the environmental policy uncertainty (EnvPU) index covering October 1990–March 2019 (with the last four indexes developed by [
61]), points to predictability over the conditional distribution of South African stock market realized volatility, as derived from the causality-in-quantiles test of [
56]. The evidence of in-sample predictability from these US indexes is particularly strong around the median, as can be observed from
Table A8 in
Appendix A. Furthermore, when good and bad realized volatilities are compared, the test statistic is generally higher for the latter, indicative of the negative impact of climate risks on conventional stock returns.
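For readers unfamiliar with the distinction between good and bad realized volatility, the following minimal sketch may help. It assumes the common construction in which realized variance over a period is the sum of squared daily returns, with the good and bad components collecting the squared positive and negative returns, respectively; the exact construction and aggregation used in our estimations may differ, and the function name and return values below are hypothetical.

```python
def realized_variances(daily_returns):
    """Return (total, good, bad) realized variance for one period.

    Assumed definitions: total is the sum of squared daily returns,
    good uses only positive returns, bad uses only negative returns.
    """
    total = sum(r * r for r in daily_returns)
    good = sum(r * r for r in daily_returns if r > 0)
    bad = sum(r * r for r in daily_returns if r < 0)
    return total, good, bad


# Hypothetical daily returns for a single period, not taken from our data.
returns = [0.01, -0.02, 0.03, -0.01]
total_rv, good_rv, bad_rv = realized_variances(returns)
```

Under these assumptions the good and bad components add up to the total, so either component can be treated as a separate target variable in the predictive exercise.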
At this stage, it is important to point out a possible limitation of our work. While newspaper-based indexes dealing with fundamental-related information have the advantages of being possibly exogenous and of being able to measure otherwise latent variables, such as sentiments, news on these topics can itself be biased due to the possibility of the media being managed. Having said this, such concerns are less likely to arise with the reputable news sources we rely on. Reliance on actual values of fundamentals may also be a viable route, though this often involves publication lags, low frequency, and endogeneity concerns.