*Article* **Unexpected Information Demand and Volatility Clustering of Chinese Stock Returns: Evidence from Baidu Index**

#### **Gang Chu <sup>1</sup>, Xiao Li <sup>2,</sup>\*, Dehua Shen <sup>1</sup> and Yongjie Zhang <sup>1</sup>**


Received: 20 November 2019; Accepted: 26 December 2019; Published: 28 December 2019

**Abstract:** This paper employs the Baidu Index as a novel proxy for unexpected information demand and shows that this proxy can explain the volatility clustering of Chinese stock returns. Generally speaking, these findings suggest that investors in China could take advantage of the Baidu Index to obtain information and thereby improve their investment decisions.

**Keywords:** volatility clustering; Baidu Index; information demand; generalized autoregressive conditional heteroscedasticity model (GARCH); mixture of distribution hypothesis

#### **1. Introduction**

Recently, scholars have begun to employ Internet news as a proxy for information flow to explain the volatility clustering of stock returns. For example, Zhang et al. [1] were the first to employ the number of news items appearing in Baidu News as a novel proxy for information arrival, and showed that this proxy could explain the volatility clustering of the SME Price Index. Based on the Mixture of Distribution Hypothesis (MDH), Shen et al. [2] further showed that this proxy could also explain the volatility clustering of individual stocks. This Internet news-based proxy has gained increasing popularity and is used in various empirical studies, such as [3–6], among others. However, the key drawback of this proxy is that media outlets may not actually play a role in diffusing information, in which case the observed phenomenon, such as reduced volatility clustering, may instead be driven by investor sentiment or psychological biases [4,7,8]. In this paper, we construct a novel proxy for unexpected information demand based on the search frequency reported by the Baidu Index, and show that this proxy can explain the volatility clustering of stock returns in the Chinese stock market.

In that sense, the contribution of this paper is twofold. First, unlike prevailing studies that employ trading volume as the proxy for information flow [9–13], we advocate a novel proxy, namely the unexpected information demand calculated from the Baidu Index. The rationale for employing the Baidu Index is that, as Zhang et al. [14] claim, it provides more scientific, authentic, and objective data than Google Trends, and its search results are available at daily frequency. In particular, our results show that the Baidu Index explains more of the volatility clustering than is reported in studies relying on trading volume as the proxy for information flow, such as [10] and [12]. Second, we provide stock-level evidence that Internet information can explain volatility clustering by focusing on 40 stocks in the Chinese stock market. To the best of our knowledge, we are the first to employ the Baidu Index to explain volatility clustering at the stock level.

The remainder of this paper is structured as follows: Section 2 describes the data and variables construction; Section 3 gives the research methodology; Section 4 gives the empirical results; and Section 5 presents the conclusions.

#### **2. Data and Variables Construction**

We used daily stock closing prices over the period from 1 January 2015 to 31 December 2018, obtained from the China Stock Market & Accounting Research Database (CSMAR). The daily return of each stock was calculated as follows:

$$Return\_{i,t} = \frac{(Closing\ Price\_{i,t} - Closing\ Price\_{i,t-1})}{Closing\ Price\_{i,t-1}}\tag{1}$$

where *Return<sub>i,t</sub>* represents the return of stock *i* on day *t*, and *Closing Price<sub>i,t</sub>* represents the closing price of stock *i* on day *t*.
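As a minimal sketch of Equation (1), the simple daily return can be computed from consecutive closing prices. The function name `simple_returns` and the price values are hypothetical and serve only as an illustration; they are not CSMAR data.

```python
def simple_returns(closing_prices):
    """Daily simple returns per Equation (1): (P_t - P_{t-1}) / P_{t-1}."""
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(closing_prices, closing_prices[1:])
    ]


# Hypothetical closing prices for one stock (illustrative values only).
prices = [10.0, 10.5, 10.5, 9.45]
returns = simple_returns(prices)
print(returns)  # one return fewer than the number of prices
```

The first observation has no return because Equation (1) requires the previous day's closing price.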

In this paper, we used keyword search volume data from the Baidu Index instead of Google Trends. Baidu is the biggest search engine in China, and we collected the search volume time series data from its website (https://index.baidu.com). The abnormal change in the Baidu search volume represents the unexpected information demand. We followed Drake, Roulstone, and Thornock [15] to define the abnormal search volume:

$$\overline{BSVI}\_{i,t} = \frac{1}{10} \sum\_{k=1}^{10} BSVI\_{i,t-7 \times k} \tag{2}$$

$$AbSearch\_{i,t} = \frac{BSVI\_{i,t} - \overline{BSVI}\_{i,t}}{\overline{BSVI}\_{i,t}} \tag{3}$$

where *BSVI<sub>i,t</sub>* represents the Baidu search volume of stock *i* on day *t*, and *AbSearch<sub>i,t</sub>* represents the abnormal search volume of stock *i* on day *t*. That is, *AbSearch<sub>i,t</sub>* is the Baidu search volume (*BSVI*) for stock *i* on day *t* less the average *BSVI* for the same stock and weekday over the previous 10 weeks, divided by that same average.
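The definition in Equations (2) and (3) can be sketched as follows. This is an illustration under stated assumptions: the hypothetical list `bsvi` holds one observation per calendar day, so that stepping back in multiples of 7 always lands on the same weekday, and the function name `abnormal_search` is ours.

```python
def abnormal_search(bsvi, t, weeks=10):
    """AbSearch on day index t per Equations (2)-(3).

    Assumes bsvi has one entry per calendar day, so t - 7k is the same weekday.
    """
    if t < 7 * weeks:
        raise ValueError("need at least 7 * weeks earlier observations")
    # Equation (2): average over the same weekday in the previous `weeks` weeks.
    baseline = sum(bsvi[t - 7 * k] for k in range(1, weeks + 1)) / weeks
    # Equation (3): relative deviation from that baseline.
    return (bsvi[t] - baseline) / baseline


# Hypothetical search volumes: flat at 100, then a spike to 150 on the last day.
bsvi = [100.0] * 70 + [150.0]
ab = abnormal_search(bsvi, t=70)
print(ab)  # 0.5, i.e. search volume is 50% above its same-weekday baseline
```

A value of zero means search volume equals its recent same-weekday average, so positive values correspond to unexpectedly high information demand.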

We randomly selected 40 stocks from the whole stock market that exhibit a significant autoregressive conditional heteroscedasticity (ARCH) effect. Figure 1 illustrates the daily return, the autocorrelation coefficient, and the partial correlation coefficient of SHENZHEN ZHENYE (GROUP) CO., LTD (Shenzhen, China) (000006.SZ). We found that the autocorrelation coefficient and the partial correlation coefficient are significantly different from zero (the values fall outside the confidence bounds), which shows that the return time series of stock 000006.SZ is significantly autocorrelated.

To examine the ARCH effect in the residuals, we used two different tests: the Ljung-Box-Pierce Q test on the squared residuals and the ARCH Lagrange Multiplier (LM) test. We used the Ljung-Box-Pierce Q statistic to investigate the autocorrelation and partial correlation of the squared residuals of the mean equation. Table 1 reports that the Ljung-Box-Pierce Q test is statistically significant at the 5% level at lags 5, 10, 15, and 20 for the 40 stocks. This indicates significant autocorrelation in the squared residuals for all 40 stocks and hence a significant ARCH effect in the residuals of the mean equation.
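For illustration, the Q statistic applied to squared residuals can be sketched with the textbook Ljung-Box formula, Q = n(n + 2) Σ ρ<sub>k</sub><sup>2</sup>/(n − k). The sketch assumes a constant-mean equation (residual = return minus sample mean) and uses a synthetic two-regime return series; neither the function name nor the data come from the paper.

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic: n(n+2) * sum_{k=1..m} rho_k^2 / (n - k)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Synthetic returns: a calm regime followed by a turbulent one (illustrative).
returns = [0.01 * (-1) ** t for t in range(100)] + [0.05 * (-1) ** t for t in range(100)]
mean_return = sum(returns) / len(returns)
squared_residuals = [(r - mean_return) ** 2 for r in returns]

q5 = ljung_box_q(squared_residuals, max_lag=5)
# Under the null of no autocorrelation, Q(5) is approximately chi-square with
# 5 degrees of freedom, whose 5% critical value is about 11.07.
print(q5 > 11.07)
```

Because the squared residuals stay small for a long stretch and then stay large, their autocorrelations are high and the statistic far exceeds the critical value, which is the pattern the test is designed to detect.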

**Figure 1.** This figure shows the daily return, the autocorrelation coefficient, and the partial correlation coefficient of SHENZHEN ZHENYE (GROUP) CO., LTD (000006.SZ).


**Table 1.** The results of Ljung-Box-Pierce Q.



Notes: \*, \*\* and \*\*\* denote statistical significance at the 10%, 5%, and 1% levels, respectively.

The ARCH LM test statistic was calculated from an auxiliary regression and used to test for heteroscedasticity in the time series. Table 2 reports that the LM values are statistically significant at the 5% level at lags 5, 10, 15, and 20 for the 40 stocks, indicating the existence of an ARCH effect in the residual sequence. Hence, the Generalized ARCH (GARCH) model is appropriate for all 40 stocks.
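The auxiliary regression behind the LM test can be sketched for the simplest case of a single lag. This is a deliberate simplification of the lags 5 to 20 used above: squared residuals are regressed on a constant and their own first lag, and the statistic is the number of observations times the regression R<sup>2</sup>. The function name and the synthetic data are assumptions made for illustration.

```python
def arch_lm_one_lag(residuals):
    """LM statistic n * R^2 from regressing e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in residuals]
    y, x = sq[1:], sq[:-1]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant squared residuals: nothing for the lag to explain
    # With one regressor plus a constant, R^2 is the squared correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return n * r_squared


# Synthetic residuals: a calm regime followed by a turbulent one (illustrative).
residuals = [0.01 * (-1) ** t for t in range(100)] + [0.05 * (-1) ** t for t in range(100)]
lm = arch_lm_one_lag(residuals)
# Under the null of no ARCH effect, LM is approximately chi-square with 1 degree
# of freedom, whose 5% critical value is about 3.84.
print(lm > 3.84)
```

Extending the sketch to q lags would require a multiple regression on q lagged squared residuals, with the statistic compared against a chi-square distribution with q degrees of freedom.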


**Table 2.** The results of the ARCH LM test.



Notes: \*\*\* denotes statistical significance at the 1% level.

The results of the Ljung-Box-Pierce Q and ARCH LM tests show serious heteroscedasticity and autocorrelation in the stock returns, so the GARCH(1,1) model is a suitable choice for the data. We used GARCH(1,1) to calculate the daily return volatility. The GARCH(1,1) model is as follows:

$$
\varepsilon\_t = \sqrt{h\_t} \nu\_t \tag{4}
$$

$$h\_t = \alpha\_0 + \beta\_1 h\_{t-1} + \alpha\_1 \varepsilon\_{t-1}^2 \tag{5}$$
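To make the recursion in Equations (4) and (5) concrete, the following sketch simulates a GARCH(1,1) series. It assumes that ν<sub>t</sub> is an i.i.d. standard normal shock (the usual assumption, not stated in the text above) and that α<sub>1</sub> + β<sub>1</sub> < 1, so the process can start at its unconditional variance α<sub>0</sub>/(1 − α<sub>1</sub> − β<sub>1</sub>). The parameter values are hypothetical.

```python
import math
import random


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Simulate eps_t = sqrt(h_t) * nu_t with h_t following Equation (5)."""
    rng = random.Random(seed)
    # Start at the unconditional variance (requires alpha1 + beta1 < 1).
    h = [alpha0 / (1.0 - alpha1 - beta1)]
    eps = [math.sqrt(h[0]) * rng.gauss(0.0, 1.0)]
    for _ in range(1, n):
        # Equation (5): today's variance from yesterday's variance and shock.
        h.append(alpha0 + beta1 * h[-1] + alpha1 * eps[-1] ** 2)
        # Equation (4): scale a standard normal shock by the conditional s.d.
        eps.append(math.sqrt(h[-1]) * rng.gauss(0.0, 1.0))
    return eps, h


# Hypothetical parameter values chosen only for illustration.
eps, h = simulate_garch11(n=1000, alpha0=1e-5, alpha1=0.10, beta1=0.85)
daily_volatility = [math.sqrt(v) for v in h]
```

In the simulated path, a large shock raises the next day's conditional variance, which then decays only gradually; this is the volatility clustering that the model is meant to capture.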

Table 3 reports the Pearson and Spearman correlation coefficients between daily return volatility and the logarithm of the Baidu search volume. The table suggests a significantly positive contemporaneous correlation between daily return volatility and Baidu search volume for all 40 stocks. Furthermore, the mean Pearson correlation coefficient is 0.6428 and the mean Spearman correlation coefficient is 0.6398, which indicates that these two variables are highly correlated.

**Table 3.** Contemporaneous correlations between daily return volatility and the logarithm of the Baidu search volume index (logBSVI). The daily return volatility was estimated by GARCH(1,1), and the BSVI was downloaded from the Baidu Index website (http://index.baidu.com/). "Pearson" denotes the Pearson correlation coefficients and "Spearman" denotes the Spearman correlation coefficients.




Notes: \*\*\* denotes statistical significance at the 1% level.
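The two correlation measures reported in Table 3 can be sketched as follows. The Spearman coefficient is computed here as the Pearson coefficient of average ranks, which is one standard way to handle ties; the volatility and search volume values are hypothetical and are not taken from the sample.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical daily volatilities and search volumes for one stock.
volatility = [0.010, 0.012, 0.020, 0.015, 0.030]
log_bsvi = [math.log(v) for v in [800, 950, 2100, 1200, 9000]]

pearson_r = pearson(volatility, log_bsvi)
spearman_r = spearman(volatility, log_bsvi)
```

In this toy example the two series are ordered identically, so the rank correlation is one while the Pearson coefficient is lower, which illustrates why reporting both measures is informative.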

To further examine the relation between daily return volatility and Baidu search volume, we introduced another direct measure, namely, mutual information. Mutual information is an information-theoretic indicator that is widely used to measure the dependence between two variables. To measure the dependence between two equal-length time series {*x<sub>t</sub>*} and {*y<sub>t</sub>*}, *t* = 1, 2, 3, ..., *N*, we computed the mutual information between them as follows:

$$MI(X,Y) = \int\_{Y} \int\_{X} p(x,y) \log \left( \frac{p(x,y)}{p(x)p(y)} \right) dx\,dy \tag{6}$$

where *p*(*x*, *y*) is the joint probability density function of *X* and *Y*, *p*(*x*) is the marginal probability density function of *X*, and *p*(*y*) is the marginal probability density function of *Y*.
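Equation (6) is defined for continuous densities, which must be estimated in practice. One simple possibility, shown here purely as an assumption because the estimator is not specified above, is a plug-in estimate based on an equal-width two-dimensional histogram. The function name, the number of bins, and the toy data are hypothetical.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Plug-in estimate of mutual information (in nats) from a 2D histogram."""

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant series
        # The maximum value is folded into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # p(x,y) * log(p(x,y) / (p(x) p(y))) summed over the occupied cells.
    return sum(
        (count / n) * math.log(count * n / (px[i] * py[j]))
        for (i, j), count in joint.items()
    )


# Toy check 1: y is identical to x, so MI equals the entropy of x (log 4 here).
x = [0, 1, 2, 3] * 25
mi_identical = binned_mutual_information(x, x, bins=4)

# Toy check 2: every combination occurs equally often, so MI is zero.
mi_independent = binned_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2)
```

Histogram estimates depend on the bin count and are biased upward in small samples, so values obtained this way are comparable only when the same binning is used for every stock.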

Table 4 reports the mutual information between the daily return volatility and the abnormal Baidu search volume. All 40 stocks showed a positive value of mutual information, with a mean of 0.7106, which indicates strong dependence between these two variables. These empirical results support a significant relationship between the Baidu Index and daily return volatility.

**Table 4.** The mutual information between the daily return volatility and abnormal Baidu search volume. The daily return volatility is the GARCH(1,1) volatility of Bollerslev [16], and the abnormal Baidu search volume (*AbSearch*) is calculated by Equation (3).


#### **3. Methodology**

In financial time series models, the variance of the disturbance term is often found to be unstable: the conditional variance of the error term usually varies over time and depends on the magnitude of previous errors. To address this heteroscedasticity, Bollerslev [16] proposed the generalized autoregressive conditional heteroscedasticity (GARCH) model, which is designed to capture volatility persistence and describe how the amplitude of returns varies over time. In this paper, we adopted the GARCH(1,1) model because it has been shown to fit the conditional variance of many financial time series quite well [16,17]. The model can be described by the following equations:

$$R\_t = \mu + \varepsilon\_t,\text{ where } \varepsilon\_t | \Omega\_{t-1} \sim \left(0, h\_t^2\right) \tag{7}$$

$$
h\_t^2 = \omega + \alpha \varepsilon\_{t-1}^2 + \beta h\_{t-1}^2 \tag{8}
$$

where *R<sub>t</sub>* represents the stock return on day *t*, μ is a constant, ε<sub>*t*</sub> represents the serially uncorrelated errors, Ω<sub>*t*−1</sub> is the information set available at day *t* − 1, and *h*<sub>*t*</sub><sup>2</sup> represents the conditional variance of ε<sub>*t*</sub>. The sum of the coefficients α and β indicates the degree of volatility persistence.
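Models of this form are typically estimated by maximum likelihood. As a sketch only, and assuming conditionally normal errors (an assumption, since Equation (7) specifies just the first two conditional moments), the negative log-likelihood implied by Equations (7) and (8) can be written as below. Initializing the variance at the sample variance of the residuals is likewise our choice, and all names and data are hypothetical.

```python
import math


def garch11_neg_loglik(returns, mu, omega, alpha, beta):
    """Gaussian negative log-likelihood for Equations (7)-(8)."""
    eps = [r - mu for r in returns]  # Equation (7): eps_t = R_t - mu
    # Initial conditional variance: sample variance of eps (an assumption).
    var = sum(e * e for e in eps) / len(eps)
    nll = 0.0
    for t, e in enumerate(eps):
        if t > 0:
            # Equation (8): h_t^2 = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}^2
            var = omega + alpha * eps[t - 1] ** 2 + beta * var
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(var) + e * e / var)
    return nll


# Hypothetical returns for illustration.
sample_returns = [0.01, -0.02, 0.015, -0.005]
sample_var = sum(r * r for r in sample_returns) / len(sample_returns)

# Sanity check: with alpha = beta = 0 the variance is constant, so the result
# must equal the closed-form Gaussian value 0.5 * n * (log(2*pi*var) + 1).
nll_flat = garch11_neg_loglik(sample_returns, 0.0, sample_var, 0.0, 0.0)
closed_form = 0.5 * len(sample_returns) * (math.log(2.0 * math.pi * sample_var) + 1.0)
```

Minimizing this function over (μ, ω, α, β), subject to ω > 0, α ≥ 0, β ≥ 0, would yield the parameter estimates; the numerical optimizer is omitted here.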

The Baidu search volume index (*BSVI*) is a well-suited proxy for information demand because it reflects investors' effort to obtain firm-specific financial information. The abnormal search volume (*AbSearch*) represents investors' unexpected demand for information. Clark [18] proposed the Mixture of Distributions Hypothesis (MDH), which holds that the time-varying conditional volatility of prices is associated with information flow. Following the MDH, we hypothesized that introducing a proxy for information arrival into the variance equation would decrease the observed volatility clustering. Therefore, we proposed an extended model that includes the abnormal Baidu search volume, which can be written as follows:

$$h\_t^2 = \omega + \alpha \varepsilon\_{t-1}^2 + \beta h\_{t-1}^2 + \lambda AbSearch\_t \tag{9}$$

If the assumption is correct, the volatility persistence, represented by α + β, should be significantly reduced in comparison with the benchmark model, that is, the original GARCH(1,1) model.
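The extended variance equation can be sketched as a simple recursion. The function name, the starting variance `h0`, and all numerical values are hypothetical; setting `lam` to zero recovers the benchmark recursion of Equation (8).

```python
def extended_garch11_variance(eps, ab_search, omega, alpha, beta, lam, h0):
    """Conditional variance path from Equation (9), starting at h0."""
    h = [h0]
    for t in range(1, len(eps)):
        h.append(
            omega
            + alpha * eps[t - 1] ** 2   # ARCH term
            + beta * h[-1]              # GARCH term
            + lam * ab_search[t]        # abnormal search volume on day t
        )
    return h


# Hypothetical inputs: three residuals and a search spike on the second day.
eps = [0.01, -0.02, 0.01]
ab_search = [0.0, 0.5, 0.0]

h_extended = extended_garch11_variance(
    eps, ab_search, omega=1e-5, alpha=0.1, beta=0.6, lam=1e-4, h0=2e-4
)
h_benchmark = extended_garch11_variance(
    eps, ab_search, omega=1e-5, alpha=0.1, beta=0.6, lam=0.0, h0=2e-4
)
```

Note that *AbSearch* can be negative (search volume below its baseline), so in an actual estimation one would need to check that the fitted variance stays positive; the sketch does not enforce this.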

#### **4. Empirical Results**

We first focus on the estimation results of the benchmark GARCH(1,1) model. In an unreported table, both coefficients α and β are statistically significant at the 1% level. The sum α + β ranges from 0.769458 to 0.998455, with a mean value of 0.904028. Figure 2 illustrates the residuals, standardized conditional variance, and standardized residuals of SHENZHEN ZHENYE (GROUP) CO., LTD (000006.SZ). We find that the benchmark model fits the volatility dynamics quite well.

Table 5 presents the estimation results of the extended model that contains *AbSearch*. All the coefficients α, β, and λ of the extended GARCH(1,1) model are statistically significant at the 1% level. The sum α + β ranges from 0.489454 to 0.87521, with a mean value of 0.698305. Table 6 reports the summarized results for the degree of volatility clustering, indicating that α + β decreases significantly. In particular, after incorporating the proxy for unexpected information demand, the sum α + β drops by 0.205723 on average (from 0.904028 to 0.698305). All these findings suggest that the GARCH(1,1) model captures the volatility clustering well, and that unexpected information demand is able to explain the volatility clustering.
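One common way to read these persistence values, offered here as an interpretive aid rather than a result of the paper, is the half-life of a variance shock. In a GARCH(1,1) model the expected deviation of the conditional variance from its long-run level shrinks by a factor of α + β per day, so the half-life is ln(0.5)/ln(α + β). The sketch below applies this to the two mean values reported above; for the extended model it holds only for a given path of *AbSearch*.

```python
import math


def half_life(persistence):
    """Days for a variance shock to halve when it decays by `persistence` per day."""
    return math.log(0.5) / math.log(persistence)


benchmark = 0.904028  # mean alpha + beta of the benchmark model (reported above)
extended = 0.698305   # mean alpha + beta of the extended model (reported above)

drop = benchmark - extended
print(round(drop, 6))                   # 0.205723, matching the reported decrease
print(round(half_life(benchmark), 1))   # roughly a week of trading days
print(round(half_life(extended), 1))    # roughly two trading days
```

On this reading, a shock to the conditional variance fades in about two days rather than about seven once abnormal search volume enters the variance equation.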

**Figure 2.** This figure shows the residuals, standardized conditional variance, and standardized residuals of SHENZHEN ZHENYE (GROUP) CO., LTD (000006.SZ).


**Table 5.** Estimates of the extended GARCH(1,1) model.



Notes: \* and \*\*\* denote statistical significance at the 10% and 1% levels, respectively.

**Table 6.** Improvement by the extended model.


Notes: \*\*\* denotes statistical significance at the 1% level.

#### **5. Conclusions**

This paper employed the Baidu search volume index (*BSVI*) as a novel proxy for unexpected information demand and validated the MDH. *BSVI* represents investors' searching behavior through Baidu, which is the largest search engine in China. In that sense, *BSVI* is a suitable proxy for information demand. To test the contemporaneous correlation, we employed the Pearson and Spearman correlation coefficients, as well as the mutual information, between *BSVI* and return volatility. The empirical results based on the GARCH(1,1) model reveal a positive and significant impact of the abnormal Baidu search volume on the conditional volatility of stock returns. Generally speaking, these findings suggest that investors in China could take advantage of the Baidu Index to gather information about the stock market and thereby improve their financial decision-making. For example, investors could employ the high-frequency news to "nowcast" the return volatility, and thus make better-informed investment decisions.

**Author Contributions:** Conceptualization, D.S.; Formal analysis, Y.Z.; Funding acquisition, D.S. and Y.Z.; Methodology, G.C. and X.L.; Software, G.C.; Writing—Original Draft, X.L.; Writing—Review & Editing, X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by the National Natural Science Foundation of China (71771170, 71801136 and 71701150) and the Young Elite Scientists Sponsorship Program by Tianjin (TJSQNTJ-2017-09).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
