**3. Method**

#### *3.1. Jump Detection*

In the present work, we focus on anomalous movements in the mid-price *pt* and their co-occurrence structure. To do so, we detect price jumps (up and down crashes) similarly to [10], at least in principle, in 1 min non-overlapping returns.

Specifically, we apply the basic jump detection method from [21] and detect jumps at the 5% significance level. The intuition behind this method is simple: we consider changes in *pt* in the form of log-returns *rt* = log(*pt*/*pt*−1). These are normalized so that, in the absence of jumps, their distribution is close to normal and stationary. The method then exploits extreme value theory to obtain thresholds above or below which *rt* can be classified as anomalous (i.e., a jump) at a given confidence level.

To achieve a distribution of log-returns close to normal and stationary in time, we must normalize returns locally to account for two known regularities: daily seasonalities and long memory effects [22–25]. Mid-price returns have been shown to have approximate zero mean but a non-stationary variance due to the above [26]. Hence, the method empirically measures and discounts daily seasonality patterns and autocorrelations in return variance from the data. This yields a time series of almost normally distributed returns with stationary variance. Extreme value theory can then be applied as described above.

In addition to the basic features of the method for robust volatility estimation in intraday patterns, we obtain a robust estimate of intraweek periodicity and adjust the return series and jump detection according to [27].
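The detection pipeline described above can be illustrated with a minimal sketch. It is not the exact procedure of [21] or [27]: it assumes a bipower-variation estimate of local volatility over a trailing window and a Gumbel-type threshold for the maximum of *n* absolute standard normal variables, and it omits the daily and intraweek seasonality adjustments entirely. The function names, the `window` length, and the data layout are hypothetical choices made for illustration.

```python
import math

def log_returns(prices):
    """r_t = log(p_t / p_{t-1}) for a list of positive mid-prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def jump_threshold(n, alpha=0.05):
    """Gumbel-type threshold for the maximum of n |N(0,1)| variables.

    Assumption: normalized returns are i.i.d. standard normal without jumps.
    """
    a = math.sqrt(2.0 * math.log(n))
    location = a - (math.log(math.pi) + math.log(math.log(n))) / (2.0 * a)
    scale = 1.0 / a
    return location - scale * math.log(-math.log(1.0 - alpha))

def detect_jumps(prices, window=60, alpha=0.05):
    """Return +1 (up jump), -1 (down jump) or 0 for each return."""
    r = log_returns(prices)
    n = len(r)
    thr = jump_threshold(n, alpha)
    flags = [0] * n
    for i in range(window, n):
        past = [abs(x) for x in r[i - window:i]]  # trailing window, excludes r[i]
        # Bipower variation: products of adjacent absolute returns are
        # robust to an isolated jump inside the window.
        bipower = sum(u * v for u, v in zip(past, past[1:])) / (window - 1)
        sigma = math.sqrt(math.pi / 2.0 * bipower)  # E|Z1||Z2| = (2/pi) sigma^2
        if sigma > 0 and abs(r[i]) / sigma > thr:
            flags[i] = 1 if r[i] > 0 else -1
    return flags
```

Applied to each stock separately on the aligned 1 min grid, a function of this kind yields one flag series per stock; the first `window` returns are never flagged because no volatility estimate is available for them.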

As per the description above, null models are calibrated and price jumps are detected individually for each stock. As we consider 1 min non-overlapping returns, our sampling allows for aligned timestamps. We then treat jumps detected at the same timestamp across assets in the universe as simultaneous jumps (a single systemic event).

It is important to highlight in the context of risk that crashes are normally associated with negative price returns of anomalous magnitude. The method used here detects both positive and negative anomalous price movements and we consider both as they are "jumps". In our related work [8], we have shown how both up and down jumps are relevant for risk, as market makers can hold inventory and be exposed in either direction. Further, a short squeeze can potentially be more dangerous, as it is often associated with high levels of leverage. Still, we recognize the importance of investigating down jumps (traditionally termed "crashes") and are looking to include a comparison between down and up jump structures in follow-up works.

#### *3.2. Crash Size Distribution and Firm Persistence*

We investigate whether co-jumps which involve different numbers of stocks originate from the same dynamic process and present the same distribution. We also consider whether individual stocks are involved to the same extent across co-crashes of different sizes or if a pattern emerges.

We define the unnormalized crash frequency for stock *x*, in co-crashes with *m* stocks and time range *t* ∈ [0, *T*] as

$$f\_{\mathbf{x},\mathbf{m}} = \sum\_{t=0}^{T} c\_{\mathbf{x},t,\mathbf{m}}$$

with

$$c\_{\mathbf{x},t,\mathbf{m}} = \begin{cases} 1, & \text{if stock } x \text{ is involved in a crash of size } m \text{ at time } t \\ 0, & \text{otherwise} \end{cases}$$

By marginalizing over the ensemble of stocks *x*, we obtain the frequency distribution across co-crash sizes

$$f\_m = \sum\_{\mathbf{x}} f\_{\mathbf{x}, m}$$
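As an illustration of these definitions, the sketch below counts *fx*,*m* and *fm* from per-stock jump timestamps. The input layout (a dictionary mapping each stock to the set of aligned timestamps at which it jumped) and the function name are assumptions made for this example, not part of the original method.

```python
from collections import Counter

def crash_frequencies(jumps):
    """jumps: dict stock -> set of timestamps at which that stock jumped.

    Returns (f_xm, f_m), where f_xm[stock][m] is the unnormalized crash
    frequency of the stock in co-crashes of size m, and f_m[m] is its sum
    over all stocks.
    """
    # Crash size at time t = number of stocks jumping at t.
    size_at = Counter(t for times in jumps.values() for t in times)
    f_xm = {x: Counter(size_at[t] for t in times) for x, times in jumps.items()}
    f_m = Counter()
    for counts in f_xm.values():
        f_m.update(counts)
    return f_xm, f_m
```

Since every crash of size *m* contributes *m* indicators equal to 1, *fm* computed this way equals *m* times the number of crashes of size *m*.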

The changes in the composition of the crashes are investigated by computing the correlation between the involvement of firms across crashes of different sizes. Namely, for each crash size *m*, we assign to each firm *x* a rank in decreasing order by *fx*,*m*. We then compute the Spearman correlations between these ranks.
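The ranking and correlation step can be sketched as follows. The treatment of ties (tied frequencies share their mean rank) is an assumption, since the text does not specify it, and the helper names and the `f_xm` layout (stock mapped to a dictionary of counts per crash size) are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks in decreasing order of value (1 = largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / math.sqrt(va * vb)

def size_correlation(f_xm, m1, m2):
    """Rank correlation of firm involvement between crash sizes m1 and m2."""
    stocks = sorted(f_xm)
    a = [f_xm[x].get(m1, 0) for x in stocks]
    b = [f_xm[x].get(m2, 0) for x in stocks]
    return spearman(a, b)
```

Evaluating `size_correlation` for a fixed *m* against every larger size *m* + *τ* gives one correlation curve per starting size.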

#### *3.3. Statistical Testing*

To support the visual intuition of our results, we apply statistical testing in the form of null models. We use the Spearman correlation to test for rank similarity between the crash frequency distributions across stocks at different crash sizes *m*. As the frequency distributions are noisy and fat-tailed, the standard correlation *p*-value is hard to justify as a valid test. Hence, we follow the idea of Mantegna et al. [28] and build a simple null model of correlation significance.

To do so, we sample without replacement the whole list of stocks *Sm*, with probability proportional to the crash frequencies at size *m* from Section 3.2, to obtain a biased reshuffling *Gi*,*m* of the stocks according to their crash frequency.

For each shuffled list, we calculate the Spearman correlation coefficient between the sample and the original list to form the empirical null distribution as

$$\mathbf{D}\_{m} = \left\{ \operatorname{Spearman}(\mathbf{G}\_{i,m}, \mathbf{S}\_{m}) \right\}\_{i=1}^{10^{8}}$$

We then define the significance of the correlation between sizes *m* and *m* + *τ* as the quantile of *Spearman*(*Sm*+*τ*, *Sm*) in **D***m*.
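A possible reading of this null model is sketched below under several stated assumptions: the reshuffle draws stocks one at a time with probability proportional to their crash frequency at size *m*, a small constant `eps` keeps stocks with zero frequency drawable, the correlation is taken between positions in the two orderings, and the quantile is the fraction of null values not exceeding the observed one. The number of draws is kept far below the value in the definition of **D***m* for practicality. All names are hypothetical.

```python
import random

def biased_shuffle(stocks, weights, rng):
    """Weighted sampling without replacement of the whole list."""
    pool, w = list(stocks), list(weights)
    out = []
    while pool:
        k = rng.choices(range(len(pool)), weights=w)[0]
        out.append(pool.pop(k))
        w.pop(k)
    return out

def spearman_of_orders(order_a, order_b):
    """Spearman correlation between positions in two tie-free orderings."""
    n = len(order_a)
    pos_b = {x: i for i, x in enumerate(order_b)}
    d2 = sum((i - pos_b[x]) ** 2 for i, x in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def null_quantile(freq_m, order_other, n_draws=1000, eps=1e-9, seed=0):
    """Quantile of Spearman(order_other, S_m) in the empirical null D_m.

    freq_m: dict stock -> crash frequency at size m.
    order_other: stocks ordered by decreasing frequency at size m + tau.
    """
    rng = random.Random(seed)
    s_m = sorted(freq_m, key=lambda x: -freq_m[x])
    weights = [freq_m[x] + eps for x in s_m]  # eps: assumption for zero counts
    null = [spearman_of_orders(biased_shuffle(s_m, weights, rng), s_m)
            for _ in range(n_draws)]
    observed = spearman_of_orders(order_other, s_m)
    return sum(v <= observed for v in null) / n_draws
```

Because the reshuffles are biased towards the original order, the null correlations concentrate at positive values, so an observed correlation must exceed this baseline to reach a high quantile.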

#### *3.4. Crash-Weighted Trading Volume*

To investigate the relationship between the crash size and the involvement of highly traded stocks, we define a weighted average daily dollar traded volume for each crash size, where the weighting is given by the normalized crash frequency of each stock.

For crash size *m* and crash frequency distribution *fx*,*m*, as per Section 3.2, we define the crash-weighted dollar traded volume DTV*m* as

$$\text{DTV}\_{\text{m}} = \frac{\sum\_{\mathbf{x}} f\_{\mathbf{x},\mathbf{m}} \text{DTV}\_{\mathbf{x},\mathbf{m}}}{f\_{\mathbf{m}}}$$

This measure aims to represent how more highly traded stocks are involved at different crash sizes.
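For a single crash size, the weighted average above can be computed as in the following sketch. It assumes one average daily dollar traded volume per stock, independent of *m*, and uses hypothetical names and dictionary inputs.

```python
def crash_weighted_dtv(f_x_at_m, dtv):
    """Crash-weighted dollar traded volume for one crash size m.

    f_x_at_m: dict stock -> crash frequency f_{x,m} at this size.
    dtv: dict stock -> average daily dollar traded volume (assumed per stock).
    """
    f_m = sum(f_x_at_m.values())
    if f_m == 0:
        raise ValueError("no crashes of this size")
    return sum(f * dtv[x] for x, f in f_x_at_m.items()) / f_m
```

For example, with frequencies `{"A": 1, "B": 3}` and volumes `{"A": 100.0, "B": 200.0}` the result is (1 · 100 + 3 · 200) / 4 = 175.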

#### **4. Results and Discussion**

The plot in Figure 1a shows the frequency distribution *fm* of the number of stocks involved in each flash crash. Figure 1b plots the cumulative frequency *f*(*M* ≥ *m*). It is evident from both figures that the distributions are heavy-tailed, with a change in slope around *m* ≈ 5 and a finite size effect at ≈ 10<sup>2</sup>, which is when the crash involves a large portion of the system (system size is 3 · 10<sup>2</sup>) [29]. This kind of distribution was already reported in [10], where the authors investigated and modeled flash crash sizes and frequency as a single Hawkes process. The authors there suggest that each security's crash dynamics should be modeled as a self-excitation process, but they point out that this would involve tuning a large number of parameters on very noisy data. They therefore decided to model the collective self-excitation process of securities as the frequency of crashes (or co-crashes) and their size. Hence, all crash sizes are treated as instances of a multi-asset Hawkes process in [10], with no distinction between the assets involved in each crash or their co-occurrence structure.

In the present work, we take a more granular approach and move to investigate the structure of co-crashes and the individual susceptibility of each stock.

To further investigate the difference between small and large crash sizes, we report in Figure 2a the Spearman correlation between the ranks of crash frequency for all stocks. Specifically, each line reports the correlation between the rank of the companies at the initial crash size *m* (correlation 1) and at all larger crash sizes *m* + *τ*. We observe that crashes of smaller sizes (*m* < 5) have a substantially different composition from crashes of larger sizes. For sizes *m* > 5, instead, a steady state is reached, with a large component of the population having similar ranks in frequency across all crash sizes. These steady states for *m* > 5 are significantly higher than those of smaller sizes, as the structure no longer evolves significantly between larger crashes. The plot in Figure 2b provides a clearer visualization of this. We highlight that already at size 5 the correlation transitions directly to the steady state, albeit a lower one than those for crash sizes 6 and above.

To validate the visual results from Figure 2, we apply the null model of correlation significance between crash frequency distributions.

Figure 3 shows the correlation significance between the starting point *m* on the horizontal axis and its steady-state distribution over ∼[*m* + 2, *m* + 10]. We observe the first significant value at the 1% level around *m* = 4, which confirms the intuition from Figure 1a,b that crash sizes up to ≈4 belong to a different process than larger crashes. Indeed, smaller crashes are dominated by less stable stocks and larger ones by very liquid stocks with high market capitalization. This suggests that more influential and systemic stocks are involved in larger crashes and perhaps even trigger them. One reason why this is not the case in small crashes may be that these stocks are systemic enough to mostly be involved in (or perhaps even cause) crashes of a larger size. These are then even more relevant for systemic risk. Alternatively, only larger crashes involve enough activity to influence highly traded stocks.

**Figure 1.** Heterogeneous crash distribution. Log-log plot of the flash crash size distribution. We observe that sizes smaller than 4 follow a different trend, with lower than expected frequency. This suggests that crashes below this size and crashes of this size and above do not belong to the same self-organized process, but that this is rather a heterogeneous distribution. (**a**) *f*(*m*); (**b**) *f*(*M* ≥ *m*).

**Figure 2.** Crash component rank correlation. Evidence that there is a transition around *m* = 5, with crashes involving a small number of companies (*m* < 5) being substantially differently populated with respect to crashes involving a larger number of companies (*m* > 5). The plot in Figure (**a**) reports the Spearman correlations of ranks in frequency between each starting crash size and higher crash sizes. The plot in Figure (**b**) looks at the average correlation in the range [*m* + 2, *m* + 20] for each value of *m* from Figure (**a**), which offers better visual intuition. (**a**) Spearman correlation between all consecutive crash sizes; (**b**) Spearman steady-state correlation mean in [*m* + 2, *m* + 20], ∀*m*.

**Figure 3.** Crash component significance phase transition. Evidence of a transition in the dynamics of crash composition occurring around *m* = 5. The plot reports the steady-state statistical significance of the base crash size's frequency distribution.

This is therefore further evidence of the occurrence of a transition in the process between smaller and larger crashes. The slow decay of smaller crash sizes indicates how these belong to similar distributions of non-systemic events, but as the crash size grows, the steady state gets closer to the large crash level. This suggests that larger crashes have some systemic characteristics.

If we take a closer look at the top ranked stocks at each size, we observe that smaller crash sizes are dominated by very volatile and illiquid stocks, which are subject to large jumps, perhaps due to the lack of a smooth price process in their trading. We would, however, expect this to make them susceptible to larger systemic events as well and, hence, stably ranked. Yet, we observe very low to null rank correlation between individual (and small) crash frequencies and the large crash size steady state. It seems that these crashes are not only non-indicative but also, as indicated by the phase transition in Figure 3, belong to an unrelated ranking and distribution. We highlight that we considered rankings and ranking correlation in order to avoid any sensitivity to large values or outliers at smaller frequencies.

Large crash sizes involve stocks such as Microsoft (MSFT) and Apple (AAPL) as being consistently high ranked. We highlight that these stocks are highly liquid and characterized by a stable price process with very few price jumps. Indeed, the few times they get involved in jumps, they are often part of larger simultaneous crashes, which involve more stable and systemic stocks. Further, when analyzing the co-crash relations between pairs of stocks, we observed a heavy-tailed distribution of centrality for these large systemic stocks, which suggests a community and core-periphery-like structure of the contagion network of co-crashes [30–34].

The above observations prompted us to conduct further analyses on the relation between stock liquidity (where average daily dollar traded volume is used as a proxy) and crash frequency at different crash sizes.

To validate visually and numerically our observation that highly traded stocks are more present in large crashes, we present the plots in Figure 4. The plot in Figure 4a shows the average daily traded volume of a stock per crash size, weighted by its crash frequency, as per the definition in Section 3.4. This is plotted against the crash size to show a clearly increasing trend in crash-weighted traded volume with crash size. This shows how larger crashes see stocks with higher traded volumes being more frequently involved.

This could, however, be the consequence of a subset of crashes which involved highly traded stocks. We therefore test this with the results in Figure 4c, which show that not only is the average crashing stock more "liquid" in larger crashes, but the fraction of crashes involving at least one of the top 20 stocks by traded volume in our universe also increases with crash size.
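The fraction described above can be computed along the following lines. This is a sketch under assumed inputs: each crash is represented as the set of stocks jumping at one timestamp, and `dtv` maps each stock to its average daily dollar traded volume; the function name and the handling of ties in volume are hypothetical.

```python
def top_k_involvement(events, dtv, k=20):
    """Fraction of crashes of each size involving a top-k stock by volume.

    events: iterable of sets of stocks, one set per (co-)crash.
    dtv: dict stock -> average daily dollar traded volume.
    Returns dict crash size m -> fraction in [0, 1].
    """
    top = set(sorted(dtv, key=dtv.get, reverse=True)[:k])
    total, hits = {}, {}
    for members in events:
        m = len(members)
        total[m] = total.get(m, 0) + 1
        if top & set(members):
            hits[m] = hits.get(m, 0) + 1
    return {m: hits.get(m, 0) / total[m] for m in sorted(total)}
```

Note that even under random membership this fraction tends to grow with *m*, since a larger crash has more chances of containing a top-*k* stock, so a comparison against such a baseline may be informative.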

In line with this, we test how the traded volume of each stock correlates with its crash frequency for each crash size. We report results for the Spearman correlation coefficient in Figure 4b, where dots are used for correlations significant at the 5% level and crosses otherwise. We see that co-crashes of size 1 and 2 seem to have an inverse or no relation between the volume traded and crash frequency. At our previously identified phase transition point *m* ≈ 5, we see the first significant positive correlation between the volume traded and crash frequency, which stays somewhat stable or increases slightly with crash size.

This last result is less clear than the previous one, but still shows a positive correlation between the volume traded (a proxy for liquidity) and crash frequency at crash sizes *m* > 5.

The presence of liquid stocks in most large crashes observed in Figure 4c prompts questions around the periphery structure of the different liquid stocks and implications for systemic risk. Further work in this direction is already underway with promising results and will be the topic of a follow-up work. The causality of such co-crash structures is also a very important topic, albeit harder to investigate rigorously, and should be the subject of future work.

**Figure 4.** Relation of Traded Volume to crash size. The figures above show evidence of a relationship between the traded volume of stocks and their involvement in crashes of different sizes. (**a**) shows the general positive relation between crash size and the involvement of highly traded stocks. (**b**,**c**) show how the relationship exists not only on average, but also how "liquid" stocks are more involved throughout crashes at higher crash sizes. (**a**) Positive relation between crash-weighted average daily dollar traded volume and crash size *m*; (**b**) Spearman correlation between traded volume and crash frequency across crash sizes *m*; (**c**) positive relation between fraction of crashes involving liquid stocks and crash size *m*.
