**3. Methodology**

#### *3.1. Indices and Models*

Many different indicators have been proposed as investor sentiment indices, and several measurement mechanisms exist to build them. They fall mostly into two macro-categories: direct and indirect measures. Direct measures are indices whose data are obtained through surveys of consumers, investors, or other agents, who explicitly state their sentiment in response to specific questions and issues. Indirect measures, instead, use a financial or purely mathematical index as a proxy to define the sentiment indicator.

In the surveys, investors are usually classified as bullish, neutral, or bearish. Alternatively, they are asked to express an opinion through numbers indicating high or low expectations. Some examples are the American Association of Individual Investors (AAII), which officially conducts and publishes surveys of investors; the Conference Board Consumer Confidence Index, which is built from surveys of individuals' expectations about macroeconomic issues; and others that deal with businesses or industrial sectors.

The literature provides many examples of indirect measures that can be used as sentiment indices. The most widely applied are: IPOs, namely the number of Initial Public Offerings and their average first-day returns; NYSE turnover, measuring trading volume; CEFD, the closed-end fund discount, since it seems to be inversely correlated with sentiment; and the dividend premium, which is the difference between the average market-to-book ratios of dividend payers and non-payers. All these proxies are considered subject to sentiment, although probably with different timing. Consequently, Baker and Wurgler, and Huang et al. (Baker and Wurgler 2006; Huang et al. 2014) combine several of these proxies to create a single index.

Huang et al. (Huang et al. 2014) and, before them, Baker and Wurgler (Baker and Wurgler 2006, 2007) study how investor sentiment works and which factors constitute it. Both indices are constructed from the same set of variables. Both the BW investor sentiment index, created by Baker and Wurgler (Baker and Wurgler 2006, 2007), and the aligned one (henceforth denoted SPLS), created by Huang et al. (Huang et al. 2014), are obtained from the following six individual sentiment proxies:


In constructing the sentiment index, Huang et al. (Huang et al. 2014) and Baker and Wurgler (Baker and Wurgler 2006) use the same structure and the same choice of proxies (see above). The reference equation for investor sentiment is written as follows:

$$\text{Sent}_{t} = \text{CEFD}_{t}\,\beta_1 + \text{TURN}_{t}\,\beta_2 + \text{NIPO}_{t}\,\beta_3 + \text{RIPO}_{t}\,\beta_4 + \text{PDND}_{t}\,\beta_5 + \text{EQTI}_{t}\,\beta_6 \tag{1}$$

Baker and Wurgler (Baker and Wurgler 2006) extract the first principal component (PC), whereas Huang et al. (Huang et al. 2014) prefer partial least squares (PLS). According to (Huang et al. 2014), PC fails to produce significant forecasts because it can accumulate approximation errors coming from parts of the variation of the proxies. Before being combined, each of the aforementioned proxies is smoothed with a six-month moving average, standardised, and regressed on industrial production, durable and nondurable consumption, service consumption, employment, and a series of dummy variables in order to reduce business-cycle variation. The residuals from these regressions are then used as the proxies that are combined into the new investor sentiment index. This procedure orthogonalises the proxies to macro variables in order to compensate for systematic risk and to prevent high correlations when the raw data are driven by macroeconomic factors.
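The preprocessing and the principal-component combination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by the cited authors: the function names, the trailing form of the moving average, and the use of NumPy's SVD for the first principal component are all choices made here for exposition, and the PLS alternative is not shown.

```python
import numpy as np


def smooth(proxies, window=6):
    """Trailing moving average over `window` months, column by column.

    Returns an array with window - 1 fewer rows than the input.
    """
    proxies = np.asarray(proxies, dtype=float)
    n_cols = proxies.shape[1]
    csum = np.vstack([np.zeros((1, n_cols)), np.cumsum(proxies, axis=0)])
    return (csum[window:] - csum[:-window]) / window


def standardise(x):
    """Give every column zero mean and unit sample variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def orthogonalise(proxies, macro):
    """Regress each proxy on a constant and the macro variables; keep residuals."""
    Z = np.column_stack([np.ones(len(macro)), macro])
    coef, *_ = np.linalg.lstsq(Z, proxies, rcond=None)
    return proxies - Z @ coef


def first_principal_component(x):
    """Return the first PC scores and loadings (the sign is arbitrary)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    loadings = vt[0]
    return xc @ loadings, loadings


def build_sentiment_index(proxies, macro, window=6):
    """Smooth, standardise, orthogonalise, then combine the proxies."""
    smoothed = smooth(proxies, window)
    macro_aligned = np.asarray(macro, dtype=float)[window - 1:]  # drop lost rows
    residuals = orthogonalise(standardise(smoothed), macro_aligned)
    return first_principal_component(residuals)
```

Here `proxies` is assumed to be a (months × 6) array of raw proxy series and `macro` a (months × m) array holding the macro variables and dummies; the returned loadings play the role of the weights $\beta_1, \ldots, \beta_6$ in Equation (1).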

Then, Huang et al. (Huang et al. 2014) apply a linear regression model in which returns at t + 1 are regressed on sentiment indices at time t. We extend the regression in two directions. First, we include a set of control variables in the linear regression: investor sentiment indices could proxy for other information, and we control for it. The resulting model is:

$$R_{t+1} = \alpha + \beta\,\text{Sent}_{t,k} + \delta X_{t} + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim \text{i.i.d.}\,(0, \sigma^2), \quad k = 1, \ldots, K \tag{2}$$

where $R_{t+1}$ is the excess market return at time t + 1, $\text{Sent}_{t,k}$ is the investor sentiment at time t, k indexes the K alternative investor sentiment indices, and $X_t$ is a set of predictors described in the next section.

Second, we apply Bayesian inference. Barberis (Barberis 2000), Kandel and Stambaugh (Kandel and Stambaugh 1996), and Hodrick (Hodrick 1992) are among the first papers to advocate the use of Bayesian inference for investigating stock market predictability. Bayesian inference allows priors to be set such that the posterior distribution of the parameters of the predictive return regression can better learn from the data. This is particularly useful when the sample size is small and priors help to reduce parameter uncertainty. Moreover, priors can be set to improve long-term asset allocation and to remove biases. Recently, Pettenuzzo, Timmermann, and Valkanov (Pettenuzzo et al. 2014) documented that economic constraints based on prior beliefs systematically reduce uncertainty about model parameters, reduce the risk of selecting a poor forecasting model, and improve both statistical and economic measures of out-of-sample forecast performance.

We apply a normal-inverted gamma prior for our linear regression. We set prior mean values equal to OLS estimates and large prior variance values so that the likelihood dominates the prior. Degrees of freedom are set equal to 10% of the sample size. Our priors result in a closed-form solution for parameter posteriors and predictive distributions. Precisely, the parameters β follow a Student's *t* posterior distribution, and the predictive density is also Student's *t* distributed. See (Koop 2003) for exact values.<sup>1</sup> The estimation is run recursively.
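For reference, the conjugate updating behind these closed forms can be written out as below. This is a sketch in generic textbook notation for the natural conjugate prior, not notation taken from this paper: $z_t = (1, \text{Sent}_{t,k}, X_t')'$ stacks the regressors of Equation (2) into the matrix $Z$, $y$ stacks the corresponding returns $R_{t+1}$, $\theta = (\alpha, \beta, \delta')'$ collects the coefficients, $h = 1/\sigma^2$ is the error precision, $N$ is the number of observations, and $\theta_0$, $V_0$, $\nu_0$, $s_0^2$ are the prior hyperparameters. All of these symbols are assumptions introduced here for illustration.

$$\begin{aligned}
\text{Prior:}\quad & \theta \mid h \sim N\!\left(\theta_0,\; h^{-1} V_0\right), \qquad h \sim G\!\left(s_0^{-2},\; \nu_0\right) \\
\text{Posterior:}\quad & V_1 = \left(V_0^{-1} + Z'Z\right)^{-1}, \qquad \theta_1 = V_1\left(V_0^{-1}\theta_0 + Z'y\right), \qquad \nu_1 = \nu_0 + N \\
& \nu_1 s_1^2 = \nu_0 s_0^2 + (y - Z\theta_1)'(y - Z\theta_1) + (\theta_1 - \theta_0)' V_0^{-1} (\theta_1 - \theta_0) \\
\text{Marginal posterior:}\quad & \theta \mid y \sim t\!\left(\theta_1,\; s_1^2 V_1,\; \nu_1\right) \\
\text{Predictive:}\quad & R_{T+1} \mid y \sim t\!\left(z_T'\theta_1,\; s_1^2\left(1 + z_T' V_1 z_T\right),\; \nu_1\right)
\end{aligned}$$

When the prior mean $\theta_0$ equals the OLS estimate, the posterior mean $\theta_1$ coincides with it, so the prior then acts only through the scale and the degrees of freedom.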

<sup>1</sup> We also investigate uniform flat priors. For the US example the results are almost identical; for the EU exercise we find large parameter uncertainties and lower forecast accuracy.

At each step, posterior distributions and predictive densities are computed using the data up to the last available observation in order to predict the following value. In the next period, when new data are available, the process is repeated to obtain further predictions.
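The recursive scheme can be sketched in a few lines. The following is an illustrative implementation under stated assumptions rather than the code behind the reported results: the function names, the initial window `first`, the prior covariance `prior_scale * I`, and the choice of the OLS residual variance as prior scale `s0_sq` are hypothetical; only the OLS prior mean and the 10% degrees-of-freedom rule come from the text.

```python
import numpy as np


def nig_posterior(Z, y, theta0, V0, nu0, s0_sq):
    """Conjugate normal-inverted gamma update for a linear regression."""
    V0_inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0_inv + Z.T @ Z)
    theta1 = V1 @ (V0_inv @ theta0 + Z.T @ y)
    nu1 = nu0 + len(y)
    resid = y - Z @ theta1
    shift = theta1 - theta0
    s1_sq = (nu0 * s0_sq + resid @ resid + shift @ V0_inv @ shift) / nu1
    return theta1, V1, nu1, s1_sq


def predictive_t(z_new, theta1, V1, nu1, s1_sq):
    """Location, squared scale and degrees of freedom of the predictive t."""
    mean = z_new @ theta1
    scale_sq = s1_sq * (1.0 + z_new @ V1 @ z_new)
    return mean, scale_sq, nu1


def recursive_forecasts(returns, regressors, first=60,
                        prior_scale=100.0, dof_share=0.10):
    """One-step-ahead predictive densities on an expanding window.

    regressors[t] holds (Sent_t, X_t) and is used to predict returns[t + 1].
    """
    returns = np.asarray(returns, dtype=float)
    T = len(returns)
    Z_all = np.column_stack([np.ones(T), regressors])
    forecasts = []
    for t in range(first, T - 1):
        # Pairs (z_s, R_{s+1}) for s < t: everything observed up to time t.
        Z, y = Z_all[:t], returns[1:t + 1]
        ols, *_ = np.linalg.lstsq(Z, y, rcond=None)
        e = y - Z @ ols
        s_sq = e @ e / (len(y) - Z.shape[1])
        V0 = prior_scale * np.eye(Z.shape[1])  # assumed "large" prior variance
        nu0 = dof_share * len(y)               # 10% of the sample size
        posterior = nig_posterior(Z, y, ols, V0, nu0, s_sq)
        forecasts.append(predictive_t(Z_all[t], *posterior))
    return forecasts
```

Each element of the returned list is a `(mean, scale_sq, dof)` triple describing the Student's *t* predictive density for the next excess return, computed only from data available at the forecast origin.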
