**2. Methodology**

This section presents the theoretical framework of our statistical arbitrage strategy. Section 2.1 describes the Barndorff–Nielsen and Shephard jump test (BNS jump test), which we use to detect jumps in our time series. Section 2.2 presents the identification of overnight gaps.

### *2.1. Barndorff–Nielsen and Shephard Jump Test*

We follow the theoretical framework of Barndorff-Nielsen and Shephard (2004) to detect overnight gaps. First, let us denote low-frequency returns as:

$$y\_i = y^\*(i\hbar) - y^\*((i-1)\hbar), \quad i = 1, 2, \ldots,\tag{1}$$

where *y*<sup>∗</sup>(*t*) denotes the log price of an asset at time *t* ≥ 0 and *ħ* represents a fixed time period, e.g., one trading day. These low-frequency returns can be split up into *M* equally spaced high-frequency returns of the following form:

$$y\_{j,i} = y^\*( (i - 1)\hbar + \hbar j M^{-1} ) - y^\*( (i - 1)\hbar + \hbar (j - 1)M^{-1} ), \quad j = 1, 2, \ldots, M. \tag{2}$$

If *i* denotes the *i*th day, the *j*th intra-*ħ* return is expressed as *y*<sub>*j*,*i*</sub>. Therefore, the daily return can be written as:

$$y\_i = \sum\_{j=1}^{M} y\_{j,i}.\tag{3}$$
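The aggregation in Equation (3) can be checked numerically. The following is a minimal sketch that treats returns as differences of consecutive log prices; the price values and the helper name `intraday_log_returns` are hypothetical and serve only as an illustration.

```python
import math

def intraday_log_returns(prices):
    """Log returns between consecutive prices: M + 1 prices give M returns."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical prices of one day, sampled at M + 1 = 5 equally spaced times.
prices = [100.0, 100.5, 99.8, 100.2, 101.0]

y_ji = intraday_log_returns(prices)                # M = 4 intraday returns
y_i = math.log(prices[-1]) - math.log(prices[0])   # daily log return

# The intraday returns telescope to the daily return.
assert math.isclose(sum(y_ji), y_i, abs_tol=1e-12)
```

Because the intermediate log prices cancel, the identity holds for any sampling grid, not only for equally spaced observations.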

The BNS jump test of Barndorff-Nielsen and Shephard (2004) rests on the assumptions that prices follow a semi-martingale, which ensures the absence of arbitrage, and that they are generated by a jump-diffusion process of the following form:

$$y^\*(t) = y^{(1)\*}(t) + y^{(2)\*}(t),\tag{4}$$

where *y*<sup>∗</sup>(*t*) describes the log price and *y*<sup>(1)∗</sup>(*t*) represents the stochastic volatility semi-martingale process:

$$y^{(1)\*} = \alpha^\* + m^\*,\tag{5}$$

with *α*<sup>∗</sup> describing the trend term, a continuous mean process of the security with paths of locally finite variation. The stochastic volatility component is represented by *m*<sup>∗</sup>, which is a local martingale defined as:

$$m^\*(t) = \int\_0^t \sigma(u) dW(u),\tag{6}$$

where *W* denotes a Wiener process. The spot volatility process *σ*<sup>2</sup>(*t*) is locally bounded away from zero and càdlàg, meaning that it is right-continuous everywhere and has left limits. Furthermore, *σ*(*t*) > 0, and the integrated variance (*IV*) process:

$$
\sigma^{2\*}(t) = \int\_0^t \sigma^2(u) du\tag{7}
$$

satisfies *σ*<sup>2∗</sup>(*t*) < ∞, ∀ *t* < ∞. Moreover, *y*<sup>(2)∗</sup>(*t*) defines the discontinuous jump component as:

$$y^{(2)\*}(t) = \sum\_{i=1}^{N(t)} c\_i, \tag{8}$$

with *N* representing a finite counting process, so that *N*(*t*) < ∞, ∀ *t* > 0, and *c*<sub>*i*</sub> denoting nonzero random variables. Putting everything together, the process can be written as:

$$y^\*(t) = \alpha^\*(t) + \int\_0^t \sigma(u)dW(u) + \sum\_{i=1}^{N(t)} c\_i, \tag{9}$$

consisting of a stochastic volatility component that models continuous price motions and a jump term that accounts for sudden, discontinuous price changes. It is assumed that *σ* and *α*<sup>∗</sup> are independent of *W*. From an economic point of view, Rombouts and Stentoft (2011) showed that prices are estimated with large errors when the non-Gaussian features of the data are neglected.
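To build intuition for Equation (9), the process can be simulated on a discrete grid. The sketch below is an illustration under simplifying assumptions that are not part of the model above: the volatility `sigma` is held constant, the drift is linear in time, at most one jump can occur per step, and jump sizes are Gaussian. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_log_price(n_steps, dt, alpha, sigma, jump_prob, jump_scale, seed=0):
    """Euler-type simulation of a jump-diffusion log price starting at 0.

    Assumptions (illustrative only): constant drift rate `alpha`, constant
    volatility `sigma`, a Bernoulli jump indicator per step with probability
    `jump_prob`, and N(0, jump_scale^2) jump sizes.
    """
    rng = random.Random(seed)
    path = [0.0]
    n_jumps = 0
    for _ in range(n_steps):
        # Continuous part: drift plus Brownian increment.
        step = alpha * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Discontinuous part: occasional jump c_i.
        if rng.random() < jump_prob:
            step += rng.gauss(0.0, jump_scale)
            n_jumps += 1
        path.append(path[-1] + step)
    return path, n_jumps

# One hypothetical trading day with M = 78 five-minute intervals.
path, n_jumps = simulate_log_price(
    n_steps=78, dt=1 / 78, alpha=0.0, sigma=0.01,
    jump_prob=0.02, jump_scale=0.02, seed=42,
)
```

Differencing consecutive entries of `path` yields intraday returns of the kind used by the volatility measures in this section.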

To conduct the BNS jump test, three volatility metrics need to be specified: the quadratic variation (*QV*), the realized variance (*RV*), and the bipower variation (*BPV*). *QV* is defined as:

$$[y^\*](t) = \sigma^{2\*}(t) + \sum\_{i=1}^{N(t)} c\_i^2, \tag{10}$$

with *σ*<sup>2∗</sup>(*t*) denoting the integrated variance, which is the quadratic variation of the continuous part of the semi-martingale process, while ∑<sub>*i*=1</sub><sup>*N*(*t*)</sup> *c*<sub>*i*</sub><sup>2</sup> is the quadratic variation of the jump component (see Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002), Andersen et al. (2003), Barndorff-Nielsen and Shephard (2004), Barndorff-Nielsen and Shephard (2006)). Hence, this volatility measure accounts for the total variation of the underlying jump-diffusion process.

The realized variance:

$$[y\_M^\*]\_i^2 = \sum\_{j=1}^M y\_{j,i}^2 \tag{11}$$

serves as a consistent estimator of *QV*, where *M* is the number of intraday returns for day *i*. This volatility measure sums all squared intraday returns over the considered period.

Andersen and Bollerslev (1998), Andersen et al. (2001), and Barndorff-Nielsen and Shephard (2002) showed that *RV* converges in probability to *QV* as *M* grows, yielding the equation:

$$\mathop{\rm plim}\limits\_{M\to\infty}RV\_t = QV\_t = \sigma^{2\*}(t) + \sum\_{i=1}^{N(t)} c\_i^2. \tag{12}$$
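The realized variance is straightforward to compute from a vector of intraday returns. The following sketch uses hypothetical return values, one of which is deliberately large, to illustrate that *RV* absorbs both the continuous variation and any jump.

```python
def realized_variance(intraday_returns):
    """Sum of squared intraday returns for one day, as in Equation (11)."""
    return sum(r * r for r in intraday_returns)

# Hypothetical intraday log returns; the fourth entry contains a large move.
returns = [0.001, -0.002, 0.0015, 0.03, -0.001]
rv = realized_variance(returns)

# Share of RV attributable to the single large return.
share_of_large_move = returns[3] ** 2 / rv
```

In this example the single large return accounts for almost all of `rv`, which is why *RV* alone cannot distinguish diffusive variation from jump variation.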

*BPV* was introduced by Barndorff-Nielsen and Shephard (2004) as:

$$\{y^\*\}^{[r,s]}(t) = \mathop{\rm plim}\limits\_{\delta \to 0} \delta^{1 - (r+s)/2} \sum\_{j=1}^{\lfloor t/\delta \rfloor - 1} |y\_j|^r |y\_{j+1}|^s, \quad r, s \ge 0,\tag{13}$$

where observations are recorded every *δ* > 0 periods of time within the interval [0, *t*]. *BPV* is a consistent estimator of *IV* under the assumption of a semi-martingale stochastic volatility process with a jump component, as described by Equation (4). Under these assumptions, the following holds for *r* > 0 and *s* > 0:

$$\mu\_r^{-1}\mu\_s^{-1}\{y^\*\}^{[r,s]}(t) = \begin{cases} \int\_0^t \sigma^{r+s}(u)du, & \max(r,s) < 2, \\ x^\*(t), & \max(r,s) = 2, \\ \infty, & \max(r,s) > 2, \end{cases} \tag{14}$$

where *x*<sup>∗</sup>(*t*) is a stochastic process, and *μ*<sub>*x*</sub> is defined as:

$$\mu\_x = E|u|^x = 2^{x/2} \frac{\Gamma\left(\frac{1}{2}\left(x + 1\right)\right)}{\Gamma\left(\frac{1}{2}\right)},\tag{15}$$

with *x* > 0, *u* following a standard normal distribution, while Γ denotes the complete gamma function.
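The constant in Equation (15) can be evaluated directly with the gamma function from the Python standard library. The sketch below uses the hypothetical helper name `mu` and checks two values that follow from the formula: *μ*<sub>1</sub> = √(2/π) and *μ*<sub>2</sub> = 1, the latter being the variance of a standard normal variable.

```python
import math

def mu(x):
    """Absolute moment E|u|^x of a standard normal u, as in Equation (15)."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 2 ** (x / 2) * math.gamma((x + 1) / 2) / math.gamma(0.5)

mu_1 = mu(1)          # scaling constant for bipower variation
mu_4_3 = mu(4 / 3)    # scaling constant for tripower quarticity

assert math.isclose(mu_1, math.sqrt(2 / math.pi))
assert math.isclose(mu(2), 1.0)
```

Both constants reappear below: *μ*<sub>1</sub> scales the bipower variation and *μ*<sub>4/3</sub> scales the tripower quarticity.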

Barndorff-Nielsen and Shephard (2004) focused on the special case of *r* = *s* = 1 leading to the following equation:

$$\mu\_1^{-2}\{y\_M^\*\}\_{i}^{[1,1]} = \mu\_1^{-2}\sum\_{j=1}^{M-1} |y\_{j,i}| |y\_{j+1,i}| \stackrel{p}{\rightarrow} \int\_{\hbar(i-1)}^{\hbar i} \sigma^2(u) du. \tag{16}$$

Hence, for *r* = *s* = 1, *BPV* is a consistent estimator of the integrated variance for the *i*th period. Based on this case, the variation of the jump term can be isolated by subtracting *BPV* from *RV*:

$$[y\_M^\*]^2\_i - \mu\_1^{-2} \{ y\_M^\* \}^{[1,1]}\_i \stackrel{p}{\rightarrow} \sum\_{j=N(\hbar(i-1))+1}^{N(\hbar i)} c\_j^2. \tag{17}$$

By calculating the difference between *RV* and *BPV*, we can separate the jump contribution to the variation of the asset price from the *QV*. Therefore, the volatility can be decomposed into its continuous and discontinuous components.
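The decomposition can be sketched in a few lines. The example below computes *RV*, the scaled *BPV* for *r* = *s* = 1, and their difference for hypothetical return vectors. Truncating the difference at zero is an added convention for finite samples, where *BPV* can exceed *RV*; it is an assumption of this sketch and not part of Equation (17).

```python
import math

MU_1 = math.sqrt(2.0 / math.pi)  # E|u| for a standard normal u

def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)

def bipower_variation(returns):
    """Scaled sum of products of adjacent absolute returns (r = s = 1)."""
    adjacent = sum(abs(a) * abs(b) for a, b in zip(returns, returns[1:]))
    return adjacent / MU_1 ** 2

def jump_variation(returns):
    """RV minus BPV, truncated at zero (truncation is an assumption here)."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)

# Hypothetical data: ten small returns, then the same day with one large move.
calm = [0.001] * 10
jumpy = list(calm)
jumpy[5] = 0.02

jv_calm = jump_variation(calm)
jv_jumpy = jump_variation(jumpy)
```

A single large return enters *RV* squared, but enters *BPV* only as a product with its small neighbours, so `jv_jumpy` is far larger than `jv_calm`.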

To identify jumps, we use the basic principles of the non-parametric BNS jump test and apply the ratio *z*-statistic from Huang and Tauchen (2005). This test statistic is adjusted for market noise and has useful properties such as appropriate size and reasonable power. Evidence from Monte Carlo simulations also suggests that this *z*-test is fairly accurate in detecting real jumps and is not easily fooled by market microstructure noise. The ratio test statistic:

$$Z\_{t} = \frac{\frac{RV\_{t} - BPV\_{t}}{RV\_{t}}}{\sqrt{\left(\left(\frac{\pi}{2}\right)^{2} + \pi - 5\right) \frac{1}{M} \max\left(1, \frac{TP\_{t}}{BPV\_{t}^{2}}\right)}} \stackrel{d}{\to} N(0, 1) \quad \text{as} \ M \to \infty \tag{18}$$

is asymptotically standard normally distributed under the null hypothesis of no jumps. Following Huang and Tauchen (2005), the tripower quarticity statistic, with *r*<sub>*t*,*j*</sub> denoting the *j*th intraday return of day *t*, is calculated by the following equation:

$$TP\_t = M\mu\_{4/3}^{-3} \left(\frac{M}{M-2}\right) \sum\_{j=3}^{M} |r\_{t,j}|^{4/3} |r\_{t,j-1}|^{4/3} |r\_{t,j-2}|^{4/3} \to \int\_0^t \sigma^4(u) du. \tag{19}$$

To determine whether at least one jump occurred in an asset, a right-sided hypothesis test with the null hypothesis of no jumps was conducted. A commonly used level of significance is 0.1 percent (see Barndorff-Nielsen and Shephard (2006), Evans (2011), Frömmel et al. (2015)). If the null hypothesis is rejected, we conclude that at least one jump occurred in the underlying security during the considered period.
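The complete test procedure can be sketched as follows. This is a minimal illustration that follows the formulas as written in this section, not a reference implementation: the function names `bns_ratio_z` and `has_jump` and the return data are hypothetical, and the code assumes at least three intraday returns with nonzero *RV* and *BPV*.

```python
import math
from statistics import NormalDist

def mu(x):
    """Absolute moment E|u|^x of a standard normal u."""
    return 2 ** (x / 2) * math.gamma((x + 1) / 2) / math.gamma(0.5)

def bns_ratio_z(returns):
    """Ratio z-statistic built from RV, BPV and tripower quarticity."""
    m = len(returns)
    abs_r = [abs(r) for r in returns]
    rv = sum(r * r for r in returns)
    bpv = mu(1) ** -2 * sum(a * b for a, b in zip(abs_r, abs_r[1:]))
    triples = zip(abs_r, abs_r[1:], abs_r[2:])
    tp = (m * mu(4 / 3) ** -3 * (m / (m - 2))
          * sum((a * b * c) ** (4 / 3) for a, b, c in triples))
    theta = (math.pi / 2) ** 2 + math.pi - 5
    return ((rv - bpv) / rv) / math.sqrt(theta / m * max(1.0, tp / bpv ** 2))

def has_jump(returns, significance=0.001):
    """Right-sided test: reject 'no jumps' if z exceeds the critical value."""
    critical_value = NormalDist().inv_cdf(1.0 - significance)
    return bns_ratio_z(returns) > critical_value

# Hypothetical day with M = 78 returns, then the same day with one large move.
pattern = [0.001, -0.0005, 0.0008, -0.0012, 0.0003, -0.0007]
calm = pattern * 13
jumpy = list(calm)
jumpy[40] += 0.02

calm_flag = has_jump(calm)
jumpy_flag = has_jump(jumpy)
```

With these inputs the test flags only the day containing the large return. At the 0.1 percent level the critical value is roughly 3.09, so the statistic must be well above the usual 5 percent threshold before a jump is declared.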
