**1. Introduction**

Statistical arbitrage is a market-neutral strategy developed by a quantitative group at Morgan Stanley in the mid-1980s (Pole 2011). Following Hogan et al. (2004), this self-financing strategy describes a long-term trading opportunity that exploits persistent capital market anomalies to earn positive expected profits with a Sharpe ratio that increases steadily over time. Arbitrage situations are identified with the aid of data-driven techniques ranging from plain vanilla approaches to state-of-the-art models. In the event of a temporary anomaly, an arbitrageur goes long in the undervalued stock and short in the overvalued stock (see Vidyamurthy (2004), Gatev et al. (2006)). If history repeats itself, prices converge to their long-term equilibrium and the arbitrageur makes a profit. Key contributions are provided by Vidyamurthy (2004), Gatev et al. (2006), Avellaneda and Lee (2010), Bertram (2010), Do and Faff (2012), and Chen et al. (2017).

The available literature divides statistical arbitrage into five sub-streams, including the time-series approach, which concentrates on mean-reverting price dynamics. Since financial data are exposed to more than one source of uncertainty, it is surprising that only a few academic studies use a jump-diffusion model (see Larsson et al. (2013), Göncü and Akyildirim (2016), Stübinger and Endres (2018), Endres and Stübinger (2019a, b)). In addition to mean-reversion, volatility clusters, and drifts, this general and flexible stochastic model is able to capture jumps and fat tails. First, Larsson et al. (2013) used jump-diffusion models to formulate an optimal stopping theory. Göncü and Akyildirim (2016) presented a stochastic model for the daily trading of commodity pairs in which the noise term is driven by a Lévy process. Stübinger and Endres (2018) introduced a holistic pair selection and trading strategy based on a jump-diffusion model. Recently, Endres and Stübinger (2019a, b) derived an optimal pairs trading framework based on a flexible Lévy-driven Ornstein–Uhlenbeck process and applied it to high-frequency data. All these studies deal with intraday price dynamics and therefore cannot account for the impact of overnight price changes, a clear shortcoming given that information is published on media platforms 24 hours a day, seven days a week.
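To make the notion of a mean-reverting jump-diffusion concrete, the following minimal sketch simulates an Ornstein–Uhlenbeck process with occasional Gaussian jumps using a simple Euler scheme. It is an illustration only and not the specification used in any of the studies cited above: the function name `simulate_ou_jump`, all parameter names and values, the Gaussian jump sizes, and the Bernoulli approximation of Poisson jump arrivals are assumptions made for this example.

```python
import math
import random


def simulate_ou_jump(n_steps, dt, theta, mu, sigma, jump_rate, jump_std,
                     x0=0.0, seed=0):
    """Euler simulation of dX = theta * (mu - X) dt + sigma dW + dJ.

    Hypothetical illustration: J is approximated by at most one Gaussian
    jump per step, arriving with probability jump_rate * dt.
    """
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        # Mean-reverting drift pulls the process back toward mu.
        drift = theta * (mu - x) * dt
        # Brownian increment scaled by sqrt(dt).
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Bernoulli approximation of a Poisson jump arrival within dt.
        jump = rng.gauss(0.0, jump_std) if rng.random() < jump_rate * dt else 0.0
        path.append(x + drift + diffusion + jump)
    return path


if __name__ == "__main__":
    # Assumed example values: 390 one-minute steps, time measured in days.
    spread = simulate_ou_jump(n_steps=390, dt=1.0 / 390, theta=5.0, mu=0.0,
                              sigma=0.02, jump_rate=3.0, jump_std=0.05)
    print(min(spread), max(spread))
```

Setting `sigma` and `jump_rate` to zero reduces the sketch to a deterministic decay toward `mu`, which is a convenient sanity check of the mean-reverting drift term.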

This paper enhances the existing research in several aspects. First, our manuscript contributes to the literature by developing a fully fledged statistical arbitrage framework based on a jump-diffusion model, which is able to capture intraday and overnight high-frequency price dynamics. Specifically, we detect overnight price gaps based on the jump test of Barndorff-Nielsen and Shephard (2004) and Andersen et al. (2010) and exploit temporary market anomalies during the first minutes of a trading day. The existence of the assumed mean-reverting property is confirmed by a preliminary analysis on the S&P 500 index; this characteristic is particularly significant 120 min after market opening. Second, the added value of the proposed trading framework is evaluated by benchmarking it against well-known quantitative strategies in the same research area. In particular, we consider the naive S&P 500 buy-and-hold strategy, the fixed threshold strategy, the general volatility strategy, and the reverting volatility strategy. Third, we perform a large-scale empirical study within a sophisticated back-testing framework on high-frequency data of the S&P 500 constituents from January 1998 to December 2015. Our jump-based strategy produces statistically and economically significant returns of 51.47 percent p.a. after transaction costs. This result exceeds the returns of the benchmarks, which range from −6.56 percent for the fixed threshold strategy to 38.85 percent for the reverting volatility strategy; complexity pays off. Fourth, a deep-dive analysis shows that our results are consistently profitable and robust against drawdowns even in the last part of our sample period, which is noteworthy as almost all statistical arbitrage strategies have suffered from negative returns in recent years (see Do and Faff (2010), Stübinger and Endres (2018)). The results pose a major challenge to the semi-strong form of market efficiency.
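As a rough illustration of the idea behind such jump tests, the sketch below computes realized variance and bipower variation from a series of returns and reports the share of realized variance not explained by bipower variation. Jump tests in the spirit of Barndorff-Nielsen and Shephard (2004) compare these two quantities, since bipower variation is robust to rare jumps while realized variance is not. This is not the full test statistic and not the detection procedure of this paper: the function names, the truncation at zero, and the toy return series are assumptions made for this example.

```python
import math


def realized_variance(returns):
    """Sum of squared returns; picks up both diffusive and jump variation."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi / 2) times the sum of products of adjacent absolute returns.

    The factor pi / 2 equals 1 / mu_1**2 with mu_1 = sqrt(2 / pi), the mean
    of the absolute value of a standard normal variable.
    """
    pairs = zip(returns[1:], returns[:-1])
    return (math.pi / 2.0) * sum(abs(a) * abs(b) for a, b in pairs)


def relative_jump(returns):
    """Share of realized variance not explained by bipower variation.

    Truncated at zero (an assumption of this sketch), because the
    difference can turn negative in small samples.
    """
    rv = realized_variance(returns)
    if rv <= 0.0:
        return 0.0
    return max(rv - bipower_variation(returns), 0.0) / rv


if __name__ == "__main__":
    # Toy data: ten small returns, then the same series with one large move.
    calm = [0.001, -0.001] * 5
    jumpy = list(calm)
    jumpy[5] = 0.05
    print(relative_jump(calm), relative_jump(jumpy))
```

A formal test would additionally standardize the difference between the two measures by an estimate of its sampling variability before comparing it with a critical value.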

The remainder of this paper is structured as follows. Section 2 provides the theoretical framework applied in this study. In Section 3, we discuss the event study of the S&P 500 index. After describing the empirical back-testing framework in Section 4, we analyze our results and present key findings in Section 5. Finally, Section 6 concludes and gives an outlook on future work.
