## 2.2.1. Spread Proxies

We choose three spread proxies: Roll's (1984) spread (*ROLL*), Hasbrouck's (2009) Gibbs estimate (*HASB*), and Lesmond et al.'s (1999) *LOT* measure. We select these proxies because they are commonly used and relatively easy to estimate. Roll (1984) develops a spread measure based on the serial covariance of daily returns. Roll's spread is defined as:

$$ROLL = \begin{cases} \ 2\sqrt{-COV(r\_t, r\_{t-1})} , & \text{if } COV(r\_t, r\_{t-1}) \le 0 \\\ \ 2\sqrt{COV(r\_t, r\_{t-1})} , & \text{if } COV(r\_t, r\_{t-1}) > 0 \end{cases}$$

where *COV* denotes covariance, and *r<sub>t</sub>* and *r<sub>t−1</sub>* are the daily returns on days *t* and *t* − 1, respectively. Following Lesmond (2005), we calculate the spread separately depending on the sign of the serial covariance. This avoids the problem that a positive serial return covariance would otherwise leave the spread undefined, because the square root of a negative number is not real.<sup>3</sup>
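As an illustration, the two-case *ROLL* formula can be computed as in the minimal Python sketch below. It assumes a one-dimensional array of daily returns for a single stock with no missing values; the function name `roll_spread` is hypothetical. Because both branches of the formula above reduce to twice the square root of the absolute serial covariance, the sketch uses `abs`.

```python
import numpy as np


def roll_spread(returns):
    """Roll (1984) spread with the sign-dependent rule described in the text."""
    r = np.asarray(returns, dtype=float)
    # First-order serial covariance COV(r_t, r_{t-1}), N-1 denominator.
    cov = np.cov(r[1:], r[:-1])[0, 1]
    # 2*sqrt(-cov) if cov <= 0 and 2*sqrt(cov) if cov > 0, i.e. 2*sqrt(|cov|).
    return float(2.0 * np.sqrt(abs(cov)))
```

With alternating returns such as `[0.01, -0.01, 0.01, -0.01, 0.01]`, the serial covariance is negative and the first branch applies; with steadily rising returns, the covariance is positive and the second branch applies.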

<sup>3</sup> Roll (1984) and Goyenko et al. (2009) assign a value of 0 to the spread when the covariance is positive.

The next proxy we use is a spread estimate generated numerically with the Gibbs sampler, a simulation procedure based on the Markov chain Monte Carlo (MCMC) technique. Hasbrouck (2009) applies Bayesian Gibbs sampling to estimate the effective cost of trading based on the following variant of Roll's (1984) model:

$$r\_t = c\Delta q\_t + \beta\_m r\_{mt} + \mu\_t,$$

where *r<sub>t</sub>* is the change in the observed trade price on day *t*, Δ*q<sub>t</sub>* is the change in trade direction from *t* − 1 to *t* (i.e., *q<sub>t</sub>* − *q<sub>t−1</sub>*), and *r<sub>mt</sub>* is the market return on day *t*. The last term, *μ<sub>t</sub>*, is the innovation in the efficient price *m* (i.e., *m<sub>t</sub>* − *m<sub>t−1</sub>*), that is, the change in the efficient price due to the arrival of new public information. The model has two parameters, *c* and *β<sub>m</sub>*, along with latent data on the trade direction indicators *q* = {*q*<sub>1</sub>, ..., *q<sub>T</sub>*} and the efficient prices *m* = {*m*<sub>1</sub>, ..., *m<sub>T</sub>*}. We use the programs available on Hasbrouck's website.<sup>4</sup> The estimated parameter *c* is the half spread implied by the model. Thus, our proxy for the round-trip spread is:

$$HASB = 2c.$$
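For intuition only, the following is a simplified Python sketch of a Gibbs sampler for the model above; it is not Hasbrouck's program, which is what the study actually uses, and it omits refinements of the published procedure. Assumptions (all hypothetical): `log_prices` holds *T* + 1 daily log trade prices and `market_returns` the *T* aligned market returns; each *q<sub>t</sub>* is +1 or −1 with equal prior probability; *β<sub>m</sub>* has a N(0, `prior_var`) prior and *c* the same prior truncated to *c* > 0; the variance of *μ<sub>t</sub>* has a diffuse inverse-gamma prior. The default 2000 sweeps with a 200-sweep burn-in follow the settings reported in footnote 4.

```python
import numpy as np


def gibbs_half_spread(log_prices, market_returns, n_sweeps=2000, burn_in=200,
                      prior_var=1.0, seed=0):
    """Simplified Gibbs sampler for r_t = c*dq_t + beta*r_mt + u_t.

    Returns the posterior mean of the half spread c (so HASB would be 2c).
    """
    rng = np.random.default_rng(seed)
    r = np.diff(np.asarray(log_prices, dtype=float))  # r_t = p_t - p_{t-1}
    rm = np.asarray(market_returns, dtype=float)      # aligned with r
    T = r.size
    # Start the latent directions q_0..q_T from a tick test (q_0 = +1).
    q = np.ones(T + 1)
    q[1:] = np.where(r < 0, -1.0, 1.0)
    c, beta, sigma2 = 0.01, 0.0, float(np.var(r)) + 1e-12
    c_draws = []
    for sweep in range(n_sweeps):
        dq = np.diff(q)
        # (1) beta | c, q, sigma2: conjugate normal update, prior N(0, prior_var).
        y = r - c * dq
        prec = rm @ rm / sigma2 + 1.0 / prior_var
        beta = rng.normal((rm @ y / sigma2) / prec, np.sqrt(1.0 / prec))
        # (2) c | beta, q, sigma2: same update truncated to c > 0 by rejection;
        #     if no positive draw is found, the previous c is kept.
        y = r - beta * rm
        prec = dq @ dq / sigma2 + 1.0 / prior_var
        mean, sd = (dq @ y / sigma2) / prec, np.sqrt(1.0 / prec)
        for _ in range(1000):
            draw = rng.normal(mean, sd)
            if draw > 0:
                c = draw
                break
        # (3) sigma2 | c, beta, q: inverse-gamma draw under a diffuse prior.
        u = r - c * dq - beta * rm
        sigma2 = (1e-12 + 0.5 * (u @ u)) / rng.gamma(1e-12 + 0.5 * T)
        # (4) each q_t | everything else: compare likelihoods of q_t = +1 and -1.
        for t in range(T + 1):
            loglik = []
            for s in (1.0, -1.0):
                sse = 0.0
                if t >= 1:  # the price change into day t depends on q_t
                    sse += (r[t - 1] - c * (s - q[t - 1]) - beta * rm[t - 1]) ** 2
                if t < T:   # the price change out of day t also depends on q_t
                    sse += (r[t] - c * (q[t + 1] - s) - beta * rm[t]) ** 2
                loglik.append(-sse / (2.0 * sigma2))
            p_buy = 1.0 / (1.0 + np.exp(np.clip(loglik[1] - loglik[0], -700.0, 700.0)))
            q[t] = 1.0 if rng.random() < p_buy else -1.0
        if sweep >= burn_in:
            c_draws.append(c)
    return float(np.mean(c_draws))
```

Under these assumptions, the spread proxy would be `2 * gibbs_half_spread(log_prices, market_returns)`. The single-site update of each *q<sub>t</sub>* uses only the two price changes that involve it, which keeps each step cheap.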

Our third spread proxy is the *LOT* measure developed by Lesmond et al. (1999). The *LOT* measure estimates the effective spread based on the notion that informed trading takes place only on days with nonzero returns. The idea behind it is simple: a zero return on a given day implies that the value of the information accumulated during the day is not large enough to justify the transaction costs of trading on it. Lesmond et al. (1999) assume that the market model is the return-generating process for informed traders. Specifically, the observed return of the firm on day *t*, *r<sub>t</sub>*, and the unobserved true return of the firm on the same day, *r<sub>t</sub><sup>∗</sup>*, are related in the framework of a limited dependent variable model:

$$r\_t^\* = \beta \times r\_{mt} + \varepsilon\_t,$$

where

$$
r\_t = \begin{cases}
\ r\_t^\* - \alpha\_1, & \text{if } r\_t^\* < \alpha\_1 \\
\ 0, & \text{if } \alpha\_1 \le r\_t^\* \le \alpha\_2 \\
\ r\_t^\* - \alpha\_2, & \text{if } r\_t^\* > \alpha\_2.
\end{cases}
$$

In the above equations, *r<sub>mt</sub>* is the market return on day *t*, and *ε<sub>t</sub>* is a random error term representing the public information shock. *α*<sub>1</sub> and *α*<sub>2</sub> are the sell-side and buy-side transaction costs, respectively. The round-trip transaction cost for informed traders is the gap between *α*<sub>2</sub> and *α*<sub>1</sub>:

$$LOT = \alpha\_2 - \alpha\_1.$$

Given *r<sub>t</sub>* and *r<sub>mt</sub>*, the parameters *α*<sub>1</sub>, *α*<sub>2</sub>, *β*, and *σ* can be estimated by maximizing the following log-likelihood function:

$$\begin{split} \ln L(\alpha\_1, \alpha\_2, \beta, \sigma \mid r\_t, r\_{mt}) &= \sum\_{1} \ln \frac{1}{(2\pi \sigma^{2})^{1/2}} - \sum\_{1} \frac{1}{2\sigma^{2}} \left(r\_t + \alpha\_1 - \beta \times r\_{mt}\right)^{2} \\ &\quad + \sum\_{2} \ln \frac{1}{(2\pi \sigma^{2})^{1/2}} - \sum\_{2} \frac{1}{2\sigma^{2}} \left(r\_t + \alpha\_2 - \beta \times r\_{mt}\right)^{2} \\ &\quad + \sum\_{0} \ln \left(\Phi\_2 - \Phi\_1\right), \end{split}$$

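where the summation subscripts 0, 1, and 2 denote the regions in which the measured daily return is zero, negative, and positive, respectively, and *σ* is the standard deviation based on nonzero returns. Lastly, Φ<sub>1</sub> and Φ<sub>2</sub> are the standard normal cumulative distribution function evaluated at (*α*<sub>1</sub> − *β* × *r<sub>mt</sub>*)/*σ* and (*α*<sub>2</sub> − *β* × *r<sub>mt</sub>*)/*σ*, respectively, so that Φ<sub>2</sub> − Φ<sub>1</sub> is the probability of a zero return.<sup>5</sup>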
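The log-likelihood above translates directly into code. Below is a minimal Python sketch, assuming aligned arrays of daily stock returns `r` and market returns `rm`, parameters ordered as (*α*<sub>1</sub>, *α*<sub>2</sub>, *β*, *σ*), and Φ computed with `math.erf`; the function name is hypothetical. In practice, the negative of this function would be passed to a numerical optimizer (for example, `scipy.optimize.minimize`), and *LOT* would be the estimated *α*<sub>2</sub> − *α*<sub>1</sub>.

```python
import math

import numpy as np

# Standard normal CDF; otypes lets it accept an empty array (no zero-return days).
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))),
                         otypes=[float])


def lot_log_likelihood(params, r, rm):
    """ln L(alpha1, alpha2, beta, sigma | r_t, r_mt) for the LOT model."""
    a1, a2, beta, sigma = params
    if sigma <= 0 or a1 >= a2:
        return -np.inf  # outside the admissible parameter region
    r = np.asarray(r, dtype=float)
    rm = np.asarray(rm, dtype=float)
    neg, pos, zero = r < 0, r > 0, r == 0  # regions 1, 2, and 0
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    # Regions 1 and 2: normal density of the implied error term.
    ll = neg.sum() * const - np.sum((r[neg] + a1 - beta * rm[neg]) ** 2) / (2 * sigma ** 2)
    ll += pos.sum() * const - np.sum((r[pos] + a2 - beta * rm[pos]) ** 2) / (2 * sigma ** 2)
    # Region 0: probability that alpha1 <= r*_t <= alpha2.
    mu0 = beta * rm[zero]
    prob0 = _norm_cdf((a2 - mu0) / sigma) - _norm_cdf((a1 - mu0) / sigma)
    ll += np.sum(np.log(np.maximum(prob0, 1e-300)))
    return float(ll)
```

The guard on *σ* and on *α*<sub>1</sub> < *α*<sub>2</sub> keeps an unconstrained optimizer from wandering into regions where the zero-return probability would be negative.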

<sup>4</sup> Gibbs sampler estimation programs are available at www.stern.nyu.edu/~jhasbrou. We run 2000 draws of the Gibbs sampler for each estimation. Like Hasbrouck (2009), we discard the first 200 draws to "burn in the sampler" (Hasbrouck 2009, p. 1451). Hasbrouck points out that 1000 sweeps are sufficient to produce reliable estimates.

<sup>5</sup> Lesmond et al. (1999) also develop measures based on zero-return days (ZEROS and ZEROS2) that are similar to, but much simpler than, the *LOT* measure. ZEROS and ZEROS2 are based on the rationale that low liquidity and less-informed

## 2.2.2. Price Impact Proxies

For price impacts, we examine three well-known low-frequency proxies: the Amihud (2002) measure (*AMIHUD*), the Amivest measure of Cooper et al. (1985) (*AMIVEST*), and the Pástor and Stambaugh (2003) estimate (*PASTOR*). *AMIHUD* captures illiquidity by dividing the absolute daily return by the daily dollar volume, so it measures the price change triggered by one unit of dollar volume. For a given stock, *AMIHUD* is calculated as

$$AMIHUD = \frac{1}{T} \sum\_{t=1}^{T} \frac{|r\_t|}{Dollar\ Volume\_t},$$

where *T* is the number of days with positive trading volume and *r<sub>t</sub>* is the return on day *t*.
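A minimal Python sketch of the *AMIHUD* formula, assuming aligned daily arrays of returns and dollar volumes for one stock, with zero volume marking days without trading; the function name is hypothetical.

```python
import numpy as np


def amihud(returns, dollar_volume):
    """Average of |r_t| / dollar volume over days with positive volume."""
    r = np.asarray(returns, dtype=float)
    dv = np.asarray(dollar_volume, dtype=float)
    traded = dv > 0  # T counts only days with trading volume
    return float(np.mean(np.abs(r[traded]) / dv[traded]))
```

For example, returns of 1% and −2% on dollar volumes of 1 million and 2 million both imply a price change of 10<sup>−8</sup> per dollar traded, and a third, non-trading day is simply dropped.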

*AMIVEST* relates the daily volume, measured as the number of shares traded, to the absolute daily return:

$$AMIVEST = \frac{1}{T} \sum\_{t=1}^{T} \frac{Share\ Volume\_t}{|r\_t|},$$

where *T* includes only the days with nonzero returns. Although constructed in a similar manner, the two measures differ in several respects. First, *AMIHUD* uses dollar volume, whereas *AMIVEST* uses share volume. Second, *AMIHUD* measures illiquidity, whereas *AMIVEST* measures liquidity. Third, *AMIHUD* does not incorporate days without trading, which contain important information about illiquidity. Although *AMIVEST* does not suffer from this particular limitation, it excludes information from days with a zero return. Because the two proxies complement each other, we use both.
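For comparison with *AMIHUD*, here is a minimal Python sketch of *AMIVEST*, assuming aligned daily arrays of returns and share volumes for one stock; the function name is hypothetical. Note that the filter is on nonzero returns rather than on positive volume.

```python
import numpy as np


def amivest(returns, share_volume):
    """Average of share volume / |r_t| over days with nonzero returns."""
    r = np.asarray(returns, dtype=float)
    sv = np.asarray(share_volume, dtype=float)
    moved = r != 0  # T counts only days with a nonzero return
    return float(np.mean(sv[moved] / np.abs(r[moved])))
```

With returns of 1%, −2%, and 0% on 1000, 4000, and 500 shares, the zero-return day is dropped and the measure averages 100,000 and 200,000 shares per unit of return.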

Our third price impact proxy is the measure developed by Pástor and Stambaugh (2003). It is obtained by regressing the next day's excess return (the daily stock return in excess of the daily market index return) on the stock's daily return and its signed daily dollar volume. The Pástor and Stambaugh measure is based on the coefficient *γ* in the following regression model:

$$r\_{t+1}^{e} = \alpha + \beta \times r\_t + \gamma \times \text{Sign}(r\_t^{e}) \times Volume\_t + \varepsilon\_{t+1},$$

where *r<sub>t</sub>* and *r<sub>t</sub><sup>e</sup>* are the stock's return and its excess return net of the market index return on day *t*, respectively. *Sign*(*r<sub>t</sub><sup>e</sup>*) is the sign of the excess return, and *Volume<sub>t</sub>* is the dollar volume on day *t*. The magnitude of the coefficient *γ* proxies for the price impact:<sup>6</sup>

$$\text{PASTOR} = |\gamma|.$$
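A minimal Python sketch of *PASTOR*, assuming aligned daily arrays of stock returns, market index returns, and dollar volumes for one stock, and estimating the regression by ordinary least squares over the whole sample; the function name is hypothetical.

```python
import numpy as np


def pastor_stambaugh(returns, market_returns, dollar_volume):
    """|gamma| from OLS of r^e_{t+1} on [1, r_t, sign(r^e_t) * Volume_t]."""
    r = np.asarray(returns, dtype=float)
    re = r - np.asarray(market_returns, dtype=float)  # excess return r^e_t
    v = np.asarray(dollar_volume, dtype=float)
    y = re[1:]                                         # r^e_{t+1}
    X = np.column_stack([np.ones(y.size), r[:-1], np.sign(re[:-1]) * v[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # [alpha, beta, gamma]
    return float(abs(coef[2]))
```

The last day's volume and the first day's excess return enter only as regressor and regressand, respectively, because the model links volume on day *t* to the excess return on day *t* + 1.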

## 3. Data and Sample

This section describes the data sources and the construction of the sample that we use in this study. The data are derived from three sources. We collect intraday trade and quote data from the real-time data feeds of the *Bloomberg Terminal*. The tick data contain detailed trade and quote information, including quote times to the nearest second, bid and ask prices, bid and ask sizes, trade prices and sizes in number of shares, and the condition codes of the bid and ask quotes. We apply the following filters to ensure that the trade and quote information we use is neither erroneous nor affected by outliers:

(1) Quotes and transactions are used only if they are recorded during the exchange's opening hours and have positive prices and positive sizes.

	- (a) If a quote is not the first quote of the day, its price must be within 50–150% of the previous quote's price.
	- (b) If a trade is not the first trade of the day, its price must be within 50–150% of the price of the preceding trade.

trading lead to a zero daily return. The results using ZEROS and ZEROS2 are slightly weaker than the results using the *LOT* measure.

<sup>6</sup> Pástor and Stambaugh (2003) originally used the coefficient as a measure of liquidity. They expected the coefficient to be negative, with more negative values indicating lower liquidity. In this study, we take its absolute value to measure the degree of illiquidity. We also confirm that the absolute value performs better than the signed coefficient in our analyses.
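The screens in item (1) can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: each record is assumed to carry a trading date, a time stamp in seconds after midnight, a price, and a size; "previous" is taken to mean the previous record that passed the screens on the same day; and the `Record` type, the function name, and the opening and closing times passed in are hypothetical. The same function would be applied separately to trades and to quote prices.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    day: str       # trading date, e.g. "2004-03-01"
    seconds: int   # time stamp in seconds after midnight
    price: float   # trade price or quote price
    size: float    # number of shares traded or quoted


def screen_records(records: Iterable[Record], open_s: int, close_s: int) -> List[Record]:
    """Apply the opening-hours, positivity, and 50-150% price screens."""
    kept: List[Record] = []
    last_price = {}  # last retained price for each trading day
    for rec in sorted(records, key=lambda x: (x.day, x.seconds)):
        if not open_s <= rec.seconds <= close_s:
            continue  # recorded outside exchange opening hours
        if rec.price <= 0 or rec.size <= 0:
            continue  # non-positive price or size
        prev = last_price.get(rec.day)
        if prev is not None and not 0.5 * prev <= rec.price <= 1.5 * prev:
            continue  # outside 50-150% of the previous retained price
        kept.append(rec)
        last_price[rec.day] = rec.price
    return kept
```

Comparing against the previous retained price, rather than the previous raw record, prevents a single bad print from causing the next valid record to be rejected as well.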

We also use Standard and Poor's (S&P) Emerging Markets Database (EMDB) for information on emerging markets. From this database, we retrieve stock codes, security type codes, market capitalization, and industry sector classification codes. Furthermore, using the stock split information available in the EMDB, we exclude stocks that experienced stock splits during our sample period.

We rely on *Datastream International* (DS) data for daily returns. Although DS also provides daily volume information, we use the *Bloomberg Terminal* tick data as the primary source of daily volume, for the following reason. DS reports all volumes in units of 1000 shares, a unit that is sometimes too coarse to capture accurately the daily volumes of many stocks in our sample. Because these stocks trade infrequently, typically only a few hundred shares per day, DS records 0 or 1 in the daily trading volume cells for these firms. This leaves too little variation in daily volume to reliably estimate the liquidity proxies that use daily volume information.

Initially, we collect trade and quote information for 2105 firms in 23 countries from the *Bloomberg Terminal* data feed. However, only 1629 of these firms are covered by the EMDB, and many of these 1629 firms are not covered by DS. Even for firms covered by DS, some liquidity benchmarks or proxy variables cannot be estimated for various reasons. We therefore discard any firm that is not covered by all three databases or for which any liquidity benchmark or proxy cannot be computed. Finally, we exclude two markets, Russia and Turkey, because too few firms survive the screening to support a reasonable within-country analysis. Our final sample consists of 1183 firms from 21 emerging markets.

Among these countries, China has the largest sample with 222 stocks, followed by South Korea with 145 stocks. The Czech Republic has the smallest sample, with only seven stocks. The number of stocks in each country is reported in Table 1. The *Bloomberg Terminal* tick data cover relatively large and liquid firms. This coverage, together with the availability of information from the other data sources and our restrictive data screening and sample selection procedure, tilts our final sample toward firms that are more liquid than the average emerging market firm. Nevertheless, even for these relatively liquid firms, spreads are generally substantially higher than the levels observed in developed markets.








**Table 1.** *Cont.*

Our data span the period from late February to early May 2004, although the exact data periods differ slightly across countries. For most countries, the starting date is 25, 26, or 27 February, and the ending date is 3, 4, or 5 May. The number of trading days ranges from approximately 46 to 51 for most countries.

All variables are measured in U.S. dollars. We obtain exchange rates from FactSet. During our sample period from February to May 2004, exchange rates change little: across the 21 emerging markets, the average monthly exchange rate return ranges from −0.44% for the Korean won to 1.97% for the Indian rupee.

Our study focuses on the cross-sectional relation between high-frequency liquidity benchmarks and low-frequency liquidity proxies. Existing studies of U.S. equity markets show that the cross-sectional patterns of forecasting errors and the correlations among various liquidity proxies have been stable over time (Chung and Zhang 2014; Abdi and Ranaldo 2017). These results suggest that the cross-sectional pattern of the effectiveness of liquidity proxies is largely time invariant. Therefore, we believe that, despite the short sample period, our analysis still provides valid and valuable information to researchers and practitioners.
