**1. Introduction**

At the turn of the 20th century, Bachelier suggested in his PhD thesis that stock prices follow Brownian motions and worked out some of the consequences [1]. This was a major breakthrough at a time when few expected any theoretical understanding of the stock market. In his thesis, Bachelier assumed that the prices of term contracts follow a normal distribution. Osborne then proposed that it is the rate of return that follows a normal distribution [2]. Later, Mandelbrot and Fama independently found early evidence to suggest that this is not true, and that the return distribution has fat tails better fitted by a Lévy stable distribution with *b* = 1.7 [3,4]. Mandelbrot then proposed modeling financial returns using fractional Brownian motion [5] and, later, multifractals [6]. Parallel efforts to understand the complexity of financial markets using agent-based models and evolutionary computing were also undertaken at the Santa Fe Institute by Palmer et al. [7]. Up until this point in time, physicists studied economics problems sporadically, and this body of knowledge was not yet known as econophysics.

**Citation:** Yen, P.T.-W.; Xia, K.; Cheong, S.A. Understanding Changes in the Topology and Geometry of Financial Market Correlations during a Market Crash. *Entropy* **2021**, *23*, 1211. https://doi.org/10.3390/e23091211

Academic Editor: Ryszard Kutner

Received: 19 July 2021. Accepted: 6 September 2021. Published: 14 September 2021.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Widely recognized as the start of econophysics are the 1991 paper by Mantegna [8] and the 1992 paper by Takayasu and his co-workers [9]. Then, in 1995, Stanley coined the name *econophysics* during the Statphys-Kolkata conference in Kolkata, India [10]. This marked a watershed moment in the field. After 1995, more physicists worked on economic and financial problems, publishing their results and findings in physics journals. These events ushered in the field of econophysics, in which physicists (as well as mathematicians and computer scientists) brought insights from their own fields to the study of economics and finance. Over the next two decades, econophysicists witnessed several breakthroughs. The earliest success of econophysics was the application of random matrix theory (RMT), a statistical theory developed to explain the energy spectra of heavy nuclei, to the stock market [11–14]. In RMT, one treats noise as a kind of symmetry, and thus information represents some form of symmetry breaking. This allows physicists to discriminate between noise and signal in financial markets. The next significant milestone in econophysics was a more compelling demonstration of fat tails in return distributions by Mantegna and Stanley [15,16], and also by Mittnik et al. [17]. These two groups estimated *b* = 1.4 for the Lévy stable distribution.

Many other breakthroughs then followed, including the fitting of price time series to a log-periodic power law (LPPL), which allowed precise predictions of market crashes [18,19], as well as the discovery of dragon kings [20] by Sornette, the understanding and modeling of the Gibbs–Pareto distribution of wealth and income by Chakrabarti et al. [21] and Yakovenko [22], the characterization of the actual Brownian motion in price fluctuations [23,24], and the development of the DebtRank metric for measuring systemic risk in financial networks [25]. Other network approaches have also started appearing in econophysics recently. These include recurrence networks (RNs), visibility graphs (VGs), and transition networks (TNs). Recurrence networks were proposed by Marwan, Donner, and their co-workers in 2009 [26,27] and have been used to study the statistical properties of daily exchange rates [27]. Since the seminal work by Lacasa in 2008 [28], many groups have started using VGs to analyze financial time series, including exchange rates [29], stock indices across different countries [30], the macroeconomic series of China [31], and market indices in the US [32]. A recent article by Antoniades et al. [33] used the TN to investigate the Vosvrda macroeconomic model, but thus far no one has tested the approach on real financial time series data.

Other recent breakthroughs include the application of inverse statistics (IS) in finance. IS is deeply rooted in fluid dynamics, in particular in the study of turbulence, an old yet challenging problem. Over the last two decades, many concepts have been borrowed from past studies of turbulence and applied to financial problems. One of them was the use of forward statistics, which aims to answer the question "given a fixed time horizon, what are the typical returns that an investor will realize in that period?". Jensen [34] then proposed inverse statistics, turning the question around to ask "for a given return on an investment, what is the typical time required to realize it?". This latter question is no less pertinent, and is more relevant to practical financial management: if IS such as the above can be computed, investors could earn market-beating profits.

Using the IS as a probe, Jensen, Simonsen, and Johansen published a series of papers starting in the mid-2000s [35–38] to study many economic phenomena. They focused particularly on the Gain-Loss Asymmetry (GLA) in financial markets. GLA refers to the observation that, in a financial market, positive price movements have different dynamics from negative ones. After testing stock indices in the US such as the DJIA [35], Nasdaq, and the S&P 500 [37,39], indices of other countries such as Austria [40] and Korea [41], 40 other world indices [39], and other instruments such as FOREX [38] and mutual funds [42], it was found empirically that negative returns take shorter average times to realize than positive returns of the same magnitude. To explain how GLA occurs in real markets, models with a fear factor have been developed [43–46]. However, factors other than fear of loss might also explain the GLA [47]. A comprehensive survey on IS can be found in the review article by Ahlgren et al. [48].
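The inverse-statistics question can be made concrete with a toy simulation. The sketch below is our own illustration, not the estimator used in Refs. [35–38]: it generates a synthetic driftless log-price random walk and records, for a fixed return level ρ, the first-passage times needed to realize a gain of +ρ or a loss of −ρ from each starting day. For such a symmetric walk the two mean waiting times are statistically comparable; the empirical gain-loss asymmetry is precisely the observation that in real markets they are not.

```python
import random

def first_passage_times(log_prices, rho):
    """For each start day t, find the waiting time tau until the
    cumulative log-return first reaches +rho (gain) or -rho (loss)."""
    gains, losses = [], []
    n = len(log_prices)
    for t in range(n - 1):
        for tau in range(1, n - t):
            r = log_prices[t + tau] - log_prices[t]
            if r >= rho:
                gains.append(tau)
                break
            if r <= -rho:
                losses.append(tau)
                break
    return gains, losses

random.seed(42)
# Synthetic log-price series: a driftless Gaussian random walk.
log_prices = [0.0]
for _ in range(5000):
    log_prices.append(log_prices[-1] + random.gauss(0.0, 0.01))

gains, losses = first_passage_times(log_prices, rho=0.05)
print(sum(gains) / len(gains), sum(losses) / len(losses))
```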

In this Special Issue, we celebrate the breakthrough that is one of Mantegna's crowning achievements: the application of the *minimal spanning tree* (MST) to unravel hierarchical structures in financial markets [49]. We will start by reviewing the essence of Mantegna's insight, and the body of work that followed (including the systematic embedding of cross correlations onto a hierarchy of surfaces with different genera [50]). We then describe attempts to overcome the limitations of the MST by going to hypergraph approaches [51–54]. A hypergraph is a natural extension of a graph, where instead of each edge joining only two nodes, an edge can join any number of nodes. Unfortunately, the hypergraph approach is difficult to implement starting from pairwise correlations, so we argue that the more promising approach to extract deeper insights into the hierarchical structure of financial markets is through *topological data analysis* (TDA) [55–58]. In TDA, the idea is to go beyond the concepts of nodes (0-simplices), links (1-simplices), and the network that they form, to a *simplicial complex*, which can contain (*k* > 1)-simplices as constituents.

In a recent paper [59], we demonstrated how TDA can be used to understand the topological changes that accompany market crashes. For such extreme events in financial markets, one of the key questions not well answered through the use of MSTs or planar maximally filtered graphs (PMFGs) is how the hierarchy of cross correlations between stocks re-organizes itself. In particular, an important class of topological changes is the merging of disjoint clusters (or its time reversal, the splitting of a cluster into disjoint clusters). By tracking how the Betti numbers *β*0, *β*1, and *β*2 change over market crashes, we found that *β*0 (the number of connected components) is small at the beginning of a market crash and increases as the crash progresses. This tells us that there is a giant connected component in the market just before the crash, which breaks up into many smaller components as the market crashes. The nature of this breaking up can be understood in greater detail through *β*1 (the number of "holes" in the connected components) and *β*2 (the number of "voids" in the connected components) (see Figure 1). Based on *β*1 and *β*2, we realized that a particular crash occurred in two stages. In the first stage, the topology of the giant connected component became more complex, as some "voids" grew outwards to become "holes". In the second stage, the number of "holes" decreased precipitously, presumably the result of handle-breaking events. These handle-breaking events are not simple, because the number of "voids" increased in this stage. Finally, the giant connected component broke up completely into many connected components with simple topologies (few "holes" and "voids").
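The behaviour of *β*0 described above is easy to probe numerically: at a fixed correlation threshold, *β*0 is simply the number of connected components of the network that keeps only the strongly correlated pairs, which a union-find structure counts directly. The sketch below is our own minimal illustration on an invented 4-stock correlation matrix; computing *β*1 and *β*2 requires a persistent-homology library such as GUDHI or Ripser.

```python
def betti_0(n, corr, theta):
    """beta_0 = number of connected components of the network obtained
    by linking stocks i, j whenever corr[i][j] >= theta (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= theta:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(n)})

# Toy 4-stock market: stocks 0,1 and stocks 2,3 form two tight clusters.
corr = [[1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0]]

print(betti_0(4, corr, theta=0.5))   # two clusters  -> beta_0 = 2
print(betti_0(4, corr, theta=0.05))  # all linked    -> beta_0 = 1
```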

**Figure 1.** A manifold with a "hole" as well as a "void".

In addition to the TDA, we found another promising approach for extending the information-filtering paradigm of MSTs and PMFGs: calculating discrete versions of the Ricci curvature, either the Ollivier-Ricci curvature (ORC) for networks, or the Forman-Ricci curvature (FRC) for simplicial complexes. To identify which stocks in a network or simplicial complex make up the neck or bridge region between two densely connected clusters, the naive approach would be to identify them visually. Naturally, this is laborious and inefficient. It turns out that the ORC is ideal for this task, because links in the neck regions have negative ORC. More importantly, the breaking up of a manifold into two involves the stretching and narrowing of the neck region through a process called *Ricci flow*. Physical fission processes closely resemble Ricci flow, even when the objects undergoing fragmentation are networks or simplicial complexes. In such discrete Ricci flows, the ORC or FRC becomes more negative over time to produce finite-time singularities. Our motivation for computing the ORC is threefold: First, we would like to identify the neck regions by looking for where in the network the ORCs are negative. Second, by looking at how the negative ORCs change, we would like to predict when we run into finite-time singularities, which is when the fissions occur. Finally, from the nature of the singularities, we would like to understand the drivers of the different fissions.
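The neck-detection idea can be illustrated with the simplest discrete curvature. For an unweighted graph, a commonly used combinatorial reduction of the Forman-Ricci curvature of an edge (*u*, *v*) is F(*u*, *v*) = 4 − deg(*u*) − deg(*v*). On a toy "barbell" of two 4-cliques joined by a single bridge, the bridge, i.e., the neck, is the most negatively curved edge. This is our own minimal sketch; the ORC involves solving optimal-transport problems and is usually computed with a dedicated library such as GraphRicciCurvature.

```python
from itertools import combinations

def forman_curvature(edges):
    """Combinatorial Forman-Ricci curvature of each edge in an
    unweighted graph: F(u, v) = 4 - deg(u) - deg(v)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return {(u, v): 4 - deg[u] - deg[v] for u, v in edges}

# Barbell graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by (3, 4).
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]

frc = forman_curvature(edges)
neck = min(frc, key=frc.get)
print(neck, frc[neck])  # the bridge (3, 4) is the most negative edge
```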

To make the case for TDA and Ricci curvature analysis, we organize our paper as follows. In Section 2, we review applications of the MST in econophysics. In Section 3, we explain how the PMFG can provide more details on correlations between stocks, by keeping more links than the MST. In fact, there is a hierarchy of maximally filtered networks on closed surfaces with increasing genera (the PMFG being the simplest, on a sphere with genus *g* = 0) that we can explore to understand the structure of correlations between stocks. Unfortunately, the algorithms for obtaining higher-order filtered networks become increasingly difficult to implement, which explains why the PMFG is not as popular as the MST. In fact, we found only one previous work that demonstrated how to filter the weighted links of an artificial complex network onto a torus (with genus *g* = 1) [60]. In Section 4, we describe the ideas behind TDA, and suggest that this is the natural extension beyond the MST and PMFG. To make our case, we explore four toy models for fusions and fissions, and thereafter use their TDA signatures to explain non-trivial topological changes observed in the cross correlations between stocks during a market crash on the Taiwan Stock Exchange (TWSE). In Section 5, we define Ricci curvature for smooth surfaces, and describe how it can be generalized to discrete networks and simplicial complexes, in the form of the Ollivier-Ricci curvature and the Forman-Ricci curvature, respectively. We then explain why we need Ricci curvature analysis to distinguish between different stages of fission processes that are topologically equivalent, before demonstrating this power for one of the toy models. Finally, we use the Ollivier-Ricci curvature to analyze a sequence of PMFGs obtained from the cross correlations of TWSE stocks in overlapping time windows leading up to the market crash of interest, before ending with a comparative case study of two neck regions. In Section 6, we present our conclusions.

#### **2. The Minimal Spanning Tree**

In Figure 2, we show the matrix of Pearson cross correlations

$$C_{ij} = \frac{\frac{1}{T} \sum_{t=1}^{T} (x_{i,t} - \bar{x}_i)(x_{j,t} - \bar{x}_j)}{\sqrt{\frac{1}{T} \sum_{t'=1}^{T} (x_{i,t'} - \bar{x}_i)^2} \sqrt{\frac{1}{T} \sum_{t''=1}^{T} (x_{j,t''} - \bar{x}_j)^2}} \tag{1}$$

between 561 stocks in the Singapore Exchange (SGX) within the period January 2008 to December 2009. In Equation (1), the time series $\mathbf{x}_i = (x_{i,1}, \dots, x_{i,t}, \dots, x_{i,T})$ and $\mathbf{x}_j = (x_{j,1}, \dots, x_{j,t}, \dots, x_{j,T})$ can be the daily prices, daily price differences (also known as the daily returns), or daily log-returns (which are practically identical to the daily fractional returns) of stocks *i* and *j*, and their time averages are $\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{i,t}$ and $\bar{x}_j = \frac{1}{T} \sum_{t=1}^{T} x_{j,t}$. In Section 4.3, we used the daily returns for our topological data analysis. This is acceptable for short time periods, e.g., six months, because the price levels do not change by much. For longer time periods, for example, the two years in the example associated with Figure 2, we used the daily fractional returns, so that we do not have the problem of increasing weights when the price levels become significantly higher at the end of the time period.
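Equation (1) can be computed directly from two price series. The sketch below is a minimal plain-Python illustration with invented toy prices (in practice one would use `numpy.corrcoef` over all 561 stocks): it first converts prices to daily log-returns and then evaluates the cross correlation.

```python
from math import log, sqrt

def pearson(x, y):
    """Pearson cross correlation of Equation (1); the 1/T normalization
    factors cancel between the numerator and the denominator."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    num = sum((xi - xbar) * (yj - ybar) for xi, yj in zip(x, y))
    den = sqrt(sum((xi - xbar) ** 2 for xi in x)) * \
          sqrt(sum((yj - ybar) ** 2 for yj in y))
    return num / den

# Daily log-returns r_t = ln(p_t / p_{t-1}) from two toy price series.
prices_i = [100.0, 102.0, 101.0, 105.0, 107.0]
prices_j = [50.0, 51.0, 50.6, 52.4, 53.5]
r_i = [log(b / a) for a, b in zip(prices_i, prices_i[1:])]
r_j = [log(b / a) for a, b in zip(prices_j, prices_j[1:])]
print(pearson(r_i, r_j))
```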

Before the rows and columns are reordered, it is impossible to discern any correlational structures in the SGX stocks. After reordering the rows and columns, we find the strong correlations organized into diagonal blocks, with weaker correlations between them. We also see that, within the largest diagonal block in Figure 2b, the correlations are not uniform, but are further organized into diagonal sub-blocks. In hindsight, reordering the rows and columns to reveal these correlational structures in the SGX was a straightforward task, since such structures have been shown to exist in other markets [61–65]. Mantegna was the first to suspect that such hierarchical organizations exist in stock markets, and proposed methods to elucidate them. Like us, Mantegna employed hierarchical clustering methods to carry out the reordering of rows and columns. However, clustering methods are based on pairwise distances, so the first problem that he had to solve was mapping the conventional Pearson cross correlations, which do not satisfy the three axioms of a distance metric, to pairwise distances. After discussions with Sornette (see Ref. 14 in [49]), Mantegna adopted the mapping

$$D_{ij} = \sqrt{2(1 - C_{ij})} \tag{2}$$

going from a cross correlation $C_{ij}$ between stock *i* and stock *j* to a pairwise distance $D_{ij}$, which satisfies the *strong triangle inequality* $D_{ij} \leq \max\{D_{ik}, D_{kj}\}$. Mantegna then investigated the correlational structures in the component stocks of the Dow Jones Industrial Average (DJIA) and Standard & Poor's 500 (S&P 500) indices, using single-linkage hierarchical clustering. Based on these results, Mantegna argued that US stocks do not react equally strongly to the various economic factors, but do so in groups synonymous with those discovered by random matrix theory [66]. This corroboration between Mantegna's 1999 MST paper and Plerou et al.'s 1999 RMT paper was an important discovery at that time.
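Equation (2) is a one-line mapping, but its range is worth noting: since −1 ≤ *Cij* ≤ 1, the distance satisfies 0 ≤ *Dij* ≤ 2, with perfectly correlated stocks at distance 0 and perfectly anti-correlated ones at distance 2. A minimal sketch (our own illustration):

```python
from math import sqrt

def corr_to_distance(c):
    """Mantegna's mapping of Equation (2): D_ij = sqrt(2 * (1 - C_ij))."""
    if not -1.0 <= c <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return sqrt(2.0 * (1.0 - c))

print(corr_to_distance(1.0))   # perfectly correlated  -> 0.0
print(corr_to_distance(0.0))   # uncorrelated          -> sqrt(2)
print(corr_to_distance(-1.0))  # anti-correlated       -> 2.0
```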

**Figure 2.** (**a**) The cross-correlation matrix for 561 stocks in the SGX from January 2008 to December 2009. In this figure, red correlations are strongly positive, blue correlations are strongly negative, while green correlations are close to zero. No structures can be discerned in this figure, because the stocks are arranged in alphabetical order. (**b**) After reordering the rows and columns of the cross-correlation matrix, we found strong correlations organized into diagonal blocks, with weaker correlations between them. Material from: Teh et al., Cluster fusion-fission dynamics in the Singapore stock exchange, Euro. Phys. J. B, published 2015 [67], Springer Nature Switzerland AG.

However, the greatest impact of this 1999 paper was the use of the minimal spanning tree (MST) as a caricature of the correlational structures between stocks. A *tree* is a graph with no cycles, and the MST was introduced as early as the 1950s as a special subgraph of a weighted graph containing cycles. In Figure 3a, we show the algorithm attributed to Kruskal [68] for constructing an MST, as well as an example in Appendix A. Following Mantegna's lead, many others (including ourselves) started publishing papers on the MSTs of different markets in, for example, the US [69–75], UK [76], Korea [77,78], Japan [79], China [80], India [81], Indonesia [82], and Africa [83]. We also find the MST applied to different classes of financial instruments: market indices [81,84–86], bonds and interest rates [87–89], currencies [90–95], commodities [96–101], overnight loans in an interbank network [102], and housing market indices of different countries [103], to name just a few. Beyond Mantegna's test of the temporal stability of the MST representation (where he changed the time period slightly, recomputed the cross correlations, and drew the MST again) [69], Onnela et al. also used the MST to visualize the progression of a market crash [70,104]. Other applications include Sun et al. [105,106] and Jiang et al. [107] using the MST to detect insider trading in stock markets, as well as Onnela et al. [70,73], Tola et al. [108], and Coelho et al. [109] using the MST for portfolio selection. The popularity of the MST in econophysics should be clear from this quick survey, and interested readers can refer to the reviews [110,111] for even more references.
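Kruskal's algorithm mentioned above admits a compact sketch: sort all edges by distance, then add each edge whose endpoints are not yet connected, using a union-find structure to reject cycle-forming edges. The code below is our own minimal illustration on invented distances, not the implementation used in the papers surveyed.

```python
def kruskal_mst(n, edges):
    """Minimal spanning tree by Kruskal's algorithm.
    edges: list of (distance, i, j); returns the n-1 tree edges."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for d, i, j in sorted(edges):       # shortest distances first
        ri, rj = find(i), find(j)
        if ri != rj:                    # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((d, i, j))
            if len(tree) == n - 1:
                break
    return tree

# Toy example: 4 stocks with pairwise distances from Equation (2).
edges = [(0.4, 0, 1), (1.3, 0, 2), (1.2, 0, 3),
         (0.6, 1, 2), (1.4, 1, 3), (0.5, 2, 3)]
mst = kruskal_mst(4, edges)
print(mst)  # 3 edges, total length 0.4 + 0.5 + 0.6 = 1.5
```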


