Article

Complex Network Model of Global Financial Time Series Based on Different Distance Functions

1 School of Mathematics and Information Sciences, Yantai University, No. 30, Qingquan Road, Yantai 264005, China
2 Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, No. 17, Chunhui Road, Yantai 264003, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(14), 2210; https://doi.org/10.3390/math12142210
Submission received: 12 June 2024 / Revised: 8 July 2024 / Accepted: 12 July 2024 / Published: 15 July 2024
(This article belongs to the Topic Complex Networks and Social Networks)

Abstract

By constructing a complex network model grounded in time series analysis, this study delves into the intricate relationships between the stock markets of 18 countries worldwide. Utilizing 31 distinct time series distance functions to formulate the networks, we employ the Hamming distance to quantify the resemblance between networks derived from different distance functions. By modulating the network density through distance percentiles (p = 0.1, 0.3, 0.5), we demonstrate the similarity of various distance functions across multiple density levels. Our findings reveal that certain distance functions exhibit high degrees of similarity across varying network densities, suggesting their potential for mutual substitution in network construction. Furthermore, the centroid networks identified via hierarchical cluster analysis highlight the similarities between the stock markets of different nations, mirroring the intricate interconnections within the global financial landscape. The insights gained from this study offer crucial perspectives for comprehending the intricate network structure of global financial time series data, paving the way for further analysis and prediction of global financial market dynamics.

1. Introduction

Networks stand as one of the preeminent tools for modeling complex systems [1]. Complex systems typically manifest as a collection of time series exhibiting profound interdependencies or resemblances [2]. These time series can be mapped onto a network, with nodes symbolizing the individual series and links bridging pairs of correlated nodes. Quantifying the similarity between time series often relies on distance functions, necessitating a judicious choice of these functions to precisely capture the desired similarity characteristics of the time series.
The process of network construction based on time series encompasses two pivotal steps [3]. Firstly, distances between all pairs of time series are calculated and these metrics are stored in a distance matrix. Subsequently, this distance matrix is transformed into an adjacency matrix, determining the connectivity between each pair of nodes in the network. This transformation not only lays the groundwork for network construction but also serves as the fundamental structure for subsequent network analysis. It facilitates not only the examination of individual node attributes within the network but also a profound exploration of the relationships (edges) between nodes. Network models, through this methodology, have emerged as potent tools for studying the internal structure and dynamics of complex systems, offering insights into system operation mechanisms from both macroscopic and microscopic viewpoints.
The evaluation of various distance functions is approached through the lens of network similarity. As a tool for assessing this similarity, the Hamming distance [4] is employed, quantifying the discrepancies between networks created by distinct distance functions. This is achieved by counting the number of edges that must be added or removed to transform one network into another. This method not only measures the similarity between networks but also reveals the interchangeability between different distance functions, thereby providing flexibility in choosing the most suitable method. Additionally, by adjusting the distance percentile p, we explore how different distance functions respond to variations in network density. Heatmaps are used to visually illustrate the changes in network similarity across different p values. This approach reveals that certain distance functions are resilient to alterations in network density, maintaining their stability and accuracy across varying density levels.
Distance functions occupy a pivotal position in the construction of complex networks based on time series. They define the degree of similarity or dissimilarity between nodes (time series) within the network, ultimately shaping the network’s topological structure and thus its overarching architecture. By calculating the distance between time series, these functions determine the connectivity between nodes, thereby uncovering the intricate interactions and dependencies among them.
In this paper, we embark on a comprehensive evaluation of the disparities in network construction resulting from the employment of various distance functions, leveraging the Hamming distance as a benchmark. Subsequently, we delve into a case study utilizing global financial time series, constructing complex networks grounded in diverse distance functions to explore the similarities in financial time series characteristics across different countries. We employ a total of 31 distinct distance functions (as detailed in Table 1) and regulate the number of edges in all networks through the parameter p. This approach allows us to assess the impact of varying p-values on network similarity, leveraging the principles of complex network theory [5] to analyze the networks constructed from the global financial time series data.

2. Data and Methods

2.1. Data

We analyzed daily closing price data from the stock markets of 18 countries or regions, outlined in Table 2. The datasets can be freely accessed from https://stock.eastmoney.com/index.html (accessed on 5 December 2023). Given that stock markets are closed on weekends, we excluded weekend closing prices from the time series. Furthermore, we accounted for the varying national holidays across countries throughout the year. Therefore, our initial step was to consolidate the raw data, select an appropriate time frame, and apply interpolation methods to fill in any missing values. Following this, to mitigate the effects of trends and seasonality in the time series, we performed log differencing on the data. The analyzed period spanned from 23 March 2015 to 23 July 2019, resulting in a dataset comprising 18 time series, each containing 1077 data points. To ensure mathematical convenience, we normalized all time series, scaling all data values to a range between 0 and 1.
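As a rough illustration of this preprocessing pipeline, the following base-R sketch fills gaps by linear interpolation, applies log differencing, and rescales to [0, 1]. The vector prices and the choice of linear interpolation are assumptions for illustration, not necessarily the exact procedure used in the study.

# Hypothetical input: prices, a daily closing-price vector with NA on holidays.
interpolate_na <- function(v) {
  idx <- which(!is.na(v))
  approx(x = idx, y = v[idx], xout = seq_along(v), rule = 2)$y  # linear interpolation of gaps
}
prices  <- interpolate_na(prices)
returns <- diff(log(prices))                                    # log differencing removes trend
scaled  <- (returns - min(returns)) / (max(returns) - min(returns))  # rescale to [0, 1]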

2.2. Time Series Distance Functions

Time series distance functions serve as metrics to quantify the degree of dissimilarity between two time series. For simplicity, we refer to them as “distance functions” throughout the text. The literature has introduced numerous distance functions, which can be broadly grouped into five major categories [3,26]: shape-based, edit-based, feature-based, structure-based, and event-based.
In the following text, an overview of the concepts of some time series distance functions is provided. To facilitate a better understanding of these concepts, Table 3 enumerates the mathematical symbols involved and their meanings.
The first category is the shape-based distance function. This type of measure evaluates the similarity between two time series by comparing their overall shape. The most notable is the L_p distance, with common instances being the L_1, L_2, and L_∞ distances, corresponding to the Manhattan, Euclidean, and Chebyshev distances, respectively. L_p distances are intuitive and parameter-free but can only compare values at fixed positions within the series, often failing to capture similarity, and are thus termed lock-step measures. To overcome this, researchers have introduced various elastic measures that allow for a more flexible comparison by aligning time series through time warping before comparison. One of the best-known methods is Dynamic Time Warping (DTW) [6], which aligns two sequences by finding a path that minimizes the global warping cost. Complexity Invariant Distance (CID) [7] takes the complexity of the time series into account to adjust the Euclidean distance. Squared Euclidean distance [8], being the square of the Euclidean distance, accentuates larger differences, making them more noticeable. Beyond these, other shape-based distance metrics such as DISSIM [9], Short Time Series (STS) [10], and Dice [11] employ different computational methods to assess the similarity or dissimilarity between time series. The relevant information is shown in Table 4.
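A minimal sketch of three of these shape-based measures in base R is given below (x and y denote two equal-length, preprocessed series); DTW itself is omitted here, as it is typically computed with a dedicated package rather than a one-liner.

euclidean <- function(x, y) sqrt(sum((x - y)^2))   # L2 (lock-step)
manhattan <- function(x, y) sum(abs(x - y))        # L1 (lock-step)
cid <- function(x, y) {                            # complexity-invariant correction of L2
  ce <- function(s) sqrt(sum(diff(s)^2))           # complexity estimate of a series
  cf <- max(ce(x), ce(y)) / min(ce(x), ce(y))      # correction factor
  cf * euclidean(x, y)
}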
The second category is the edit-based distance function. The concept of this type of distance function typically involves comparing strings by determining the minimum number of character insertions, deletions, and substitutions required to transform one string into another. When the editing operations are limited to insertions, deletions, and substitutions, it is referred to as the Levenshtein distance [27]. However, if the operations also encompass swapping adjacent characters, the metric is designated as the Damerau–Levenshtein distance. Additionally, the Longest Common Subsequence (LCSS) distance [28] assesses the similarity between two sequences by calculating the length of their longest common subsequence and deriving the distance by subtracting this length from the original sequences’ lengths, while allowing only insertions and deletions. Hamming distances, on the other hand, are specifically tailored to quantify the similarity or dissimilarity between strings of equal length, focusing solely on substitution operations.
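Edit-based measures operate on symbol strings, so a numeric series must first be discretized. The sketch below maps values to quartile-based letters (an illustrative choice, not the paper's procedure) and then uses base R's adist for the Levenshtein distance and a direct comparison for the Hamming distance.

to_symbols <- function(s, k = 4) {                       # discretize into k quantile bins
  breaks <- quantile(s, probs = seq(0, 1, length.out = k + 1))
  paste(letters[cut(s, breaks, labels = FALSE, include.lowest = TRUE)], collapse = "")
}
sx <- to_symbols(x); sy <- to_symbols(y)
levenshtein <- adist(sx, sy)                             # insertions, deletions, substitutions
hamming_str <- sum(strsplit(sx, "")[[1]] != strsplit(sy, "")[[1]])  # substitutions only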
The third category is the feature-based distance function. This type of measure extracts descriptive features from time series and compares these features to assess their similarity. One of the most widely known measures is the Pearson correlation coefficient (PCC), which quantifies the linear relationship between two time series by calculating the ratio of their covariance to the product of their standard deviations. However, it is important to note that the Pearson correlation coefficient assumes a linear relationship between variables, which may not be applicable to nonlinear relationships and is sensitive to outliers. Therefore, its applicability and limitations should be carefully considered in its application. The Temporal Correlation and Raw Values (CORT) method [17] adjusts traditional distance metrics, such as Euclidean distance or Dynamic Time Warping (DTW), based on temporal correlation. Mutual information (MI) [18] is a nonlinear measure that assesses the amount of information shared between two time series. The maximal information coefficient (MIC) [19] extends this concept by plotting a grid on the scatter plot of the two time series to capture their relationship and automatically determining the grid that yields the maximum MI. The periodogram (PER) [21] is a measure that evaluates the periodic characteristics of time series by utilizing the Fourier transform to convert them from the time domain to the frequency domain and comparing their power spectral density in the frequency domain. Other examples of feature-based measures include the cross-correlation function [8], Fourier coefficients [22], autocorrelation coefficient [23], and integrated periodogram (INTPER) [20]. The key details are presented in Table 5.
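For instance, the correlation distance and the CORT adjustment described above can be sketched in a few lines of base R (x and y again denote two preprocessed series):

d_cor <- function(x, y) 1 - abs(cor(x, y))               # Pearson-correlation distance
d_cort <- function(x, y, k = 2) {
  dx <- diff(x); dy <- diff(y)
  cort <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))  # temporal correlation of slopes
  phi  <- 2 / (1 + exp(k * cort))                        # tuning function
  phi * sqrt(sum((x - y)^2))                             # CORT-weighted Euclidean distance
}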
The fourth category is the structure-based distance function. These functions identify and compare high-level structures within sequences. Compression-based measures (CDM) [24] and normalized compression distance (NCD) [25] fall under this category. Both metrics rely on the theory of data compression, assuming that similar data will result in shorter compression codes, whereas dissimilar data will yield longer codes. They utilize compression algorithms like gzip, bzip2, or xz to compress the time series. CDM is defined as the ratio of the length of the joint compression of two time series to the sum of their individual compressed lengths. NCD, on the other hand, calculates the difference between the combined compressed length of the two time series and the minimum of their respective compressed lengths and then normalizes this difference by dividing it by the maximum of their compressed lengths. The key information is presented in Table 6.
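A small sketch of both measures using base R's memCompress (gzip) follows; serializing the numeric series as a rounded comma-separated string is an illustrative simplification.

clen <- function(s) {                                    # compressed length as a proxy for C(.)
  length(memCompress(charToRaw(paste(round(s, 4), collapse = ",")), type = "gzip"))
}
cdm <- function(x, y) clen(c(x, y)) / (clen(x) + clen(y))
ncd <- function(x, y) (clen(c(x, y)) - min(clen(x), clen(y))) / max(clen(x), clen(y))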
The fifth category comprises event-based distance functions [3]. These functions compare specific time points where certain events occur in the time series, aiming to detect synchrony or concurrent occurrences between two time series. The definition of events depends on the domain that generated the time series. Common approaches for defining events include identifying local maxima and minima, values exceeding or falling below thresholds, values above or below percentiles, or outliers in the sequence. Methods such as Event Synchronization (ES) distance and van Rossum (VR) distance are used to quantify the similarity or synchronization level between two event sequences. The ES distance typically involves comparing the time points of the marked events, while the VR distance transforms each event into a function and then compares them. The key details are summarized in Table 7.
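The toy sketch below illustrates the idea behind Event Synchronization: events are taken to be local maxima, and two events count as synchronous when they occur within tau time steps of each other (the tie-handling term of the full definition is omitted for brevity, so this is a simplification rather than the exact measure).

local_maxima <- function(s) which(diff(sign(diff(s))) == -2) + 1   # indices of local peaks
es_distance <- function(x, y, tau = 2) {
  ex <- local_maxima(x); ey <- local_maxima(y)
  cxy <- sum(outer(ex, ey, function(i, j) i - j > 0 & i - j <= tau))  # events in y shortly preceding events in x
  cyx <- sum(outer(ey, ex, function(i, j) i - j > 0 & i - j <= tau))  # events in x shortly preceding events in y
  1 - (cxy + cyx) / sqrt(length(ex) * length(ey))
}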

2.3. From Time Series to Complex Networks

After preprocessing the data, we proceed to compute the distance between each pair of time series. This computation yields a distance matrix D, wherein each element D_ij signifies the distance between the sequences X_i and X_j. To accomplish this, we take into account each of the 31 distance functions enumerated in Table 1. Once the distance matrix D is obtained for each distance function, we normalize it to facilitate comparisons across different metrics and ensure consistency in analysis.
During the network construction process, we use D to construct a binary adjacency matrix A [29], where the element A_ij equals 1 to indicate the presence of an edge between nodes X_i and X_j, and 0 to indicate the absence of an edge. This step applies a threshold τ as follows:
$A = \Theta(\tau - D),$
where Θ(·) is the Heaviside step function, applied element-wise. Choosing the right threshold τ is crucial in the network construction process, as it determines the density or level of fragmentation of the network. For each distance function, we compute the distance matrix D and use the p-th distance percentile as the threshold τ to construct the network. This approach not only avoids selecting a different τ for each distance function but also ensures that all networks have the same number of edges during topological comparisons. The percentile level p, which sets the threshold τ, directly controls the edge density of the network; network density is the ratio of the number of edges actually present to the number of all possible edges. A higher p-value increases the network density, resulting in a more densely connected network. Conversely, a lower p-value decreases the network density, leading to a sparser network.
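A sketch of this thresholding step, assuming D is an 18 × 18 normalized distance matrix already computed for one distance function:

p   <- 0.3                                        # distance percentile
tau <- quantile(D[upper.tri(D)], probs = p)       # p-th percentile of the pairwise distances
A   <- (D <= tau) * 1                             # Heaviside step: edge where the distance is below tau
diag(A) <- 0                                      # no self-loops

Because tau is defined as a percentile of the distances themselves, roughly a fraction p of all possible node pairs end up connected, which is what keeps the edge counts comparable across distance functions.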
In the aforementioned process, by employing various distance functions, each function yielded an undirected weighted network. These networks were composed of nodes and edges, where the nodes represent time series, and the edges denote the similarity between them. The weights of the edges were adjusted according to the corresponding distance values, where thicker edges reflect a higher similarity between the respective time series, indicating a lower distance.

2.4. Network Similarity

By analyzing the similarity between networks generated by diverse distance functions, we have transformed time series data into a collection of complex networks. To compare these networks, we employ the network Hamming distance [4], a metric tailored specifically for network comparisons. This distance counts the number of edges that need to be added or removed to transform one network into another, thus quantifying the dissimilarity between the network structures. Specifically, given two labeled networks G and G′ with respective adjacency matrices A^G and A^{G′}, the Hamming distance is defined as:
$d_H(A^G, A^{G'}) = \frac{1}{2} \sum_{i,j} \mathrm{xor}\left(A_{i,j}^{G}, A_{i,j}^{G'}\right),$
where xor(·,·) is the logical exclusive-or operator. By calculating the Hamming distance, the degree of similarity between networks can be measured. All computations were carried out in the free statistical computing environment R (version 4.1.2).
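In code, Equation (2) reduces to an element-wise exclusive-or of the two adjacency matrices; the division by two compensates for each edge appearing twice in a symmetric adjacency matrix. A minimal R version:

hamming_net <- function(A, B) sum(xor(A == 1, B == 1)) / 2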

3. Results and Discussion

3.1. Network Similarities

Firstly, we investigate the similarity between networks constructed using various time series distance functions, considering three network densities: 0.1, 0.3, and 0.5. For each density level, we utilize 31 distinct distance functions to build the networks. We then compare all the resulting networks using the network Hamming distance (Equation (2)). This comparison results in a distance matrix D^H, where the element D^H_{A,B} represents the Hamming distance d_H(A, B) between networks A and B. Following this, we apply hierarchical clustering, employing the complete linkage method, to categorize the diverse networks based on their structural similarities.
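Assuming DH is the 31 × 31 matrix of pairwise network Hamming distances, the clustering step can be reproduced in outline as follows (the number of clusters shown is illustrative):

hc <- hclust(as.dist(DH), method = "complete")   # complete-linkage hierarchical clustering
plot(hc)                                         # dendrogram over the 31 distance functions
groups <- cutree(hc, k = 4)                      # cut into k clusters (k = 4 assumed here)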
The clustering results are depicted in Figure 1. Our findings reveal that when constructing networks under varying p-values, certain distance functions exhibit notable similarities. For instance, fourierDist, avgL1LInf, Lorentzian, Manhattan, and Gower distance functions show high degrees of similarity across the three p-value conditions mentioned. This indicates that regardless of the p-value, these functions tend to produce similar network structures. Consequently, they can be deemed interchangeable in this context, as swapping them during network construction would not significantly alter the overall network structure. Similarly, Euclidean, sqEuclidean, Tanimoto, Sorensen, and Kulczynski distance functions also display comparable network structures across varying p-values. This further highlights that, in selecting distance functions for network construction, even functions with distinct computational approaches may exhibit similarity and interchangeability, especially when considering different p-value conditions.
It is noteworthy that the similarity among certain distance functions undergoes significant changes as the p-value increases. When p is set to 0.1, resulting in sparse networks, the similarity between networks constructed by most distance functions is relatively low. This may be attributed to the distance functions’ lack of sensitivity to subtle structural differences in sparse networks. However, as the p-value rises to 0.5, leading to denser networks, we observe a marked increase in the similarity between certain functions, such as CID and DTW. This suggests that CID and DTW distances are more adept at capturing large-scale changes in network structures. This observation underscores the importance of considering the performance variation in distance functions across different network densities when selecting them for network construction.

3.2. Global Financial Time Series

In this study, we aim to delve deeply into the intricate connections among global financial markets. To this end, we initially applied hierarchical clustering to 31 networks, all generated using diverse distance functions under the condition of p = 0.3 . From this clustering, we identified and extracted the centroid network for each cluster, ultimately selecting four representative centroid networks: those constructed using VR, CDM, Euclidean, and Bhattacharyya distances.
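One plausible way to extract a centroid per cluster, assuming the centroid is the member with the smallest total Hamming distance to the other members of its cluster (the exact rule is not spelled out here), is sketched below using the groups vector from the clustering step:

centroids <- sapply(unique(groups), function(g) {
  members <- which(groups == g)                              # networks in cluster g
  sums <- rowSums(DH[members, members, drop = FALSE])        # total distance to cluster mates
  members[which.min(sums)]                                   # most central member of the cluster
})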
Subsequently, we utilized the aforementioned four distance functions to calculate the similarity between time series, resulting in respective distance matrices. These matrices served as the foundation for generating heatmaps, each of which visually represents the distance between different time series. Figure 2, Figure 3, Figure 4 and Figure 5 display heatmaps generated specifically by VR, CDM, Euclidean, and Bhattacharyya distances, accompanied by their corresponding network diagrams.
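The visualizations can be sketched with base R's heatmap and, for the network diagrams, the igraph package (assumed available); D and A are the distance and adjacency matrices of one centroid network, and the edge-width rule is an illustrative choice rather than the paper's exact styling.

heatmap(D, symm = TRUE, col = hcl.colors(64, "Blues 3"))   # distance heatmap; use rev() on the palette if the orientation should be flipped
library(igraph)
g  <- graph_from_adjacency_matrix(A, mode = "undirected", diag = FALSE)
el <- as_edgelist(g, names = FALSE)                         # edge endpoints as indices
E(g)$width <- 3 * (1 - D[el])                               # thicker edges for more similar markets
plot(g, vertex.label = rownames(D))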
At a threshold of p = 0.3 , the heatmap generated using the VR distance reveals an insightful pattern along with its corresponding network diagram. In the heatmap, the intensity of the cell color (blue) inversely correlates with the distance between time series. Darker shades indicate a smaller distance and higher similarity between the stock markets of two countries, while lighter colors signify a greater distance and lower similarity.
The network diagram complements this visual representation, where edges symbolize the similarity between time series. Thicker edges reflect a higher degree of similarity, while thinner edges indicate a lower similarity. Specifically, the orange nodes represent developed countries in Europe, green nodes represent emerging countries in Europe, red nodes signify developed countries in North America, blue nodes represent emerging countries in Asia, and pink nodes correspond to developed countries in Asia.
The heatmap generated using the CDM distance reveals that, despite variations in distance values, there are no extreme highs or lows. This signifies a mutual influence and information flow among global stock markets under the CDM distance metric, resulting in a degree of uniformity in their trends. The corresponding network diagram mirrors this, exhibiting a structurally balanced relational network without significant concentrations or dispersions. This pattern indicates a global market structure where each country’s stock market is influenced, to a certain extent, by the stock markets of other countries.
The heatmap based on the Euclidean distance highlights darker cells primarily between the stock markets of developed European countries. Reflecting this, the corresponding network diagram exhibits particularly thick edges connecting the orange nodes, suggesting a remarkable similarity in the price fluctuation patterns among the stock markets of these developed European nations. A combined reading of the heatmap and network diagram further shows that North America and certain developed European countries have close economic and financial ties, and their stock markets therefore exhibit similar volatility patterns. In contrast, stock markets in other countries appear more autonomous and independent, possibly because their economies are less tightly coupled to international markets.
The heatmap of Bhattacharyya distance and its accompanying network diagram uncover profound connections within European developed countries, as well as between emerging Asian countries and select developed European stock markets. The dark cells in the heatmap indicate a shorter Bhattacharyya distance, pointing to a high degree of similarity between these stock markets. In the network diagram, the corresponding thick lines underscore the strong interplay and similarity between these markets. Conversely, the lighter cells in the heatmap and thinner lines in the network diagram signify more distant relationships between other countries’ stock markets, reflecting a relative autonomy. This autonomy could potentially harbor distinct market risks and opportunities, offering investors a diverse range of options.
Our findings demonstrate that various distance functions effectively capture the nuances and relationships between stock markets. The color gradations in the heatmaps elucidate the specifics of market differences, while the network diagrams visually illustrate how these differences shape intricate interactions among global stock markets. Through the mutual reinforcement of these two visualization tools, we are able to identify not only the direct disparities between stock markets but also gain a profound understanding of how these differences intersect at a global level. This reveals that while the global stock markets exhibit a certain level of interconnectedness, they also maintain varying degrees of autonomy. This analysis provides crucial insights into investment decisions and risk assessment in global financial markets, assisting investors and policymakers in gaining a deeper understanding of market dynamics and formulating informed strategies accordingly.

4. Conclusions

In this study, we delve into daily closing price data from stock markets spanning 18 countries, employing 31 distinct time series distance functions to evaluate their effectiveness in network construction and their subsequent impact on the resulting network structures. The selection of an appropriate distance function is paramount in accurately capturing the similarities between time series. We utilize Hamming distance as a metric to assess the similarity between networks constructed using different distance functions.
By adjusting the distance percentile p ( p = 0.1 , 0.3, 0.5), our results indicate that certain distance functions, such as fourierDist, avgL1LInf, Lorentzian, Manhattan, and Gower, exhibit high similarity across varying network density conditions. This suggests that these functions can be used interchangeably in network construction without significantly altering the overall network structure.
Notably, at p = 0.3 , our hierarchical clustering analysis identifies four representative centroid networks, constructed using VR, CDM, Euclidean, and Bhattacharyya distances, respectively. The heatmaps and corresponding network diagrams generated from these distances reveal the underlying similarities between stock markets across different countries. It is evident that the stock markets of developed European countries occupy pivotal positions in the global financial landscape. While there exists a degree of correlation among global stock markets, they also maintain varying degrees of autonomy.
In summary, this work underscores the efficacy and interchangeability of different distance functions in network construction, while exploring the intricate connections within global financial markets through the comparative analysis of networks built with diverse distance functions. In future endeavors, we aim to explore more time series data and distance functions to enhance our analysis and prediction capabilities in the realm of global financial market behavior.

Author Contributions

Conceptualization, M.G. and J.N.; methodology, Z.W.; software, Z.W.; validation, M.G., J.N. and Z.W.; formal analysis, Z.W.; investigation, Z.W.; resources, M.G.; data curation, M.G.; writing—original draft preparation, Z.W.; writing—review and editing, M.G., J.N. and Z.W.; visualization, M.G., J.N. and Z.W.; supervision, M.G.; project administration, M.G.; funding acquisition, M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Key Program of Shandong Natural Science Foundation (No. ZR2020KF031).

Data Availability Statement

All the datasets used in this study could be freely accessed from the following websites: https://stock.eastmoney.com/index.html (accessed on 5 December 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mitchell, M. Complex systems: Network thinking. Artif. Intell. 2006, 170, 1194–1212. [Google Scholar] [CrossRef]
  2. Silva, T.C.; Zhao, L. Machine Learning in Complex Networks; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  3. Ferreira, L.N. From Time Series to Networks in R with the ts2net Package. arXiv 2022, arXiv:2208.09660. [Google Scholar] [CrossRef]
  4. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 1966, 10, 707–710. [Google Scholar]
  5. Barabási, A.L. Network Science; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
  6. Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 31 July–1 August 1994; pp. 359–370. [Google Scholar]
  7. Batista, G.E.A.P.A.; Wang, A.; Keogh, E.J. A complexity-invariant distance measure for time series. In Proceedings of the 2011 SIAM International Conference on Data Mining, Mesa, AZ, USA, 28–30 April 2011; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2011; pp. 699–710. [Google Scholar]
  8. Deza, E.; Deza, M.M.; Deza, M.M.; Deza, E. Encyclopedia of Distances; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  9. Frentzos, E.; Gratsias, K.; Theodoridis, Y. Index-based most similar trajectory search. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 17–20 April 2007; pp. 816–825. [Google Scholar]
  10. Möller-Levet, C.S.; Klawonn, F.; Cho, K.-H.; Wolkenhauer, O. Fuzzy clustering of short time-series and unevenly distributed sampling points. In Proceedings of the International Symposium on Intelligent Data Analysis, Berlin, Germany, 28–30 August 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 330–340. [Google Scholar]
  11. Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
  12. Sørensen, T.J. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons; I kommission hos E. Munksgaard: Copenhagen, Denmark, 1948. [Google Scholar]
  13. Tanimoto, T.T. Elementary Mathematical Theory of Classification and Prediction; International Business Machines Corporation: New York, NY, USA, 1958. [Google Scholar]
  14. Cha, S.H. Comprehensive survey on distance/similarity measures between probability density functions. City 2007, 1, 1. [Google Scholar]
  15. Gower, J.C. A general coefficient of similarity and some of its properties. Biometrics 1971, 27, 857–871. [Google Scholar] [CrossRef]
  16. Bhattacharyya, A. On a measure of divergence between two multinomial populations. Sankhyā Indian J. Stat. 1946, 7, 401–406. [Google Scholar]
  17. Chouakria, A.D.; Nagabhushan, P.N. Adaptive dissimilarity index for measuring time series proximity. Adv. Data Anal. Classif. 2007, 1, 5–21. [Google Scholar] [CrossRef]
  18. Meilă, M. Comparing clusterings by the variation of information. In Proceedings of the Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, 24–27 August 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 173–187. [Google Scholar]
  19. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; Mcvean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting novel associations in large data sets. Science 2011, 334, 1518–1524. [Google Scholar] [CrossRef] [PubMed]
  20. De Lucas, D.C. Classification Techniques for Time Series and Functional Data. Ph.D. Thesis, Universidad Carlos III de Madrid, Madrid, Spain, 2010. [Google Scholar]
  21. Caiado, J.; Crato, N.; Peña, D. A periodogram-based metric for time series classification. Comput. Stat. Data Anal. 2006, 50, 2668–2684. [Google Scholar] [CrossRef]
  22. Agrawal, R.; Faloutsos, C.; Swami, A. Efficient similarity search in sequence databases. In Proceedings of the Foundations of Data Organization and Algorithms: 4th International Conference, FODO’93, Chicago, IL, USA, 13–15 October 1993; Springer: Berlin/Heidelberg, Germany, 1993; pp. 69–84. [Google Scholar]
  23. Galeano, P.; Pena, D.P. Multivariate analysis in vector time series. Resen. Inst. Matemática Estatística Univ. São Paulo 2000, 4, 383–403. [Google Scholar]
  24. Keogh, E.; Lonardi, S.; Ratanamahatana, C.A.; Wei, L.; Lee, S.; Handley, J. Compression-based data mining of sequential data. Data Min. Knowl. Discov. 2007, 14, 99–129. [Google Scholar] [CrossRef]
  25. Cilibrasi, R.; Vitányi, P.M.B. Clustering by compression. IEEE Trans. Inf. Theory 2005, 51, 1523–1545. [Google Scholar] [CrossRef]
  26. Esling, P.; Agon, C. Time-series data mining. ACM Comput. Surv. (CSUR) 2012, 45, 1–34. [Google Scholar] [CrossRef]
  27. Boytsov, L. Indexing methods for approximate dictionary searching: Comparative analysis. J. Exp. Algorithmics (JEA) 2011, 16, 1.1–1.91. [Google Scholar] [CrossRef]
  28. Vlachos, M.; Kollios, G.; Gunopulos, D. Discovering similar multidimensional trajectories. In Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002; pp. 673–684. [Google Scholar]
  29. Fan, J.; Meng, J.; Ludescher, J.; Chen, X.; Ashkenazy, Y.; Kurths, J.; Havlin, S.; Schellnhuber, H.J. Statistical physics approaches to the complex Earth system. Phys. Rep. 2021, 896, 1–84. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The Hamming distances between networks constructed using various time series distance functions under three distinct network densities (0.1, 0.3, and 0.5), along with the corresponding heatmaps generated through hierarchical clustering (complete linkage method). The heatmaps represent the degree of similarity between the distance functions through a color gradient from deep blue to light yellow: deep blue indicates a very high degree of similarity, and the similarity decreases as the color shifts toward light yellow, which marks the lowest similarity.
Figure 2. Heatmap generated by the VR distance and its corresponding network diagram at p = 0.3. In the heatmap, darker (blue) cells indicate a smaller distance between time series and a higher similarity between the stock markets of two countries; lighter cells indicate a greater distance and lower similarity. In the network diagram, edges represent the similarity between time series: thicker edges indicate higher similarity, thinner edges lower similarity. Orange nodes represent developed European countries, green nodes emerging European countries, red nodes developed North American countries, blue nodes emerging Asian countries, and pink nodes developed Asian countries.
Figure 3. Heatmap generated by CDM distance and its corresponding network diagram at p = 0.3.
Figure 4. Heatmap generated by Euclidean distance and its corresponding network diagram at p = 0.3.
Figure 5. Heatmap generated by Bhattacharyya distance and its corresponding network diagram at p = 0.3.
Table 1. The time series distance functions used in this study.
No. | Distance | Reference
01 | Event Synchronization (ES) | [3]
02 | van Rossum (VR) | [3]
03 | Dynamic Time Warping (DTW) | [6]
04 | Complexity Invariant (CID) | [7]
05 | Correlation | [8]
06 | Cross-correlation (crossCor) | [8]
07 | Euclidean (L_2) | [8]
08 | Jaccard | [8]
09 | Kulczynski | [8]
10 | Lorentzian | [8]
11 | Manhattan | [8]
12 | Squared-Euclidean (sqdEuclidean) | [8]
13 | DISSIM | [9]
14 | Short Time Series (STS) | [10]
15 | Dice | [11]
16 | Sorensen | [12]
17 | Tanimoto | [13]
18 | Avg(L_1, L_∞) (avgL1LInf) | [14]
19 | Wave Hedges | [14]
20 | Gower | [15]
21 | Bhattacharyya | [16]
22 | Temporal Correlation and Raw Values (CORT) | [17]
23 | Mutual Information (MI) | [18]
24 | Maximal Information Coefficient (MIC) | [19]
25 | Integrated Periodogram (INTPER) | [20]
26 | Periodogram (PER) | [21]
27 | Fourier Coefficient (fourierDist) | [22]
28 | Autocorrelation Coefficients (ACF) | [23]
29 | Partial Autocorrelation Coefficients (PACF) | [23]
30 | Compression-based (CDM) | [24]
31 | Normalized Compression (NCD) | [25]
Table 2. Global stock indices. Countries labeled 1 to 9 are European developed countries, the country labeled 10 represents a European emerging country, 11 and 12 are North American developed countries, 13 to 16 represent Asian emerging countries (or regions), and 17 and 18 represent Asian developed countries (or regions).
Country/Region | Index
(1) Austria (AUT) | AUSTRIAN TRADED ATX INDEX
(2) Belgium (BEL) | BEL20 INDEX
(3) Denmark (DNK) | KFX COPENHAGEN SHARE INDEX
(4) France (FRA) | CAC 40 INDEX
(5) Germany (DEU) | DAX INDEX
(6) Ireland (IRL) | IRISH OVERALL INDEX
(7) Spain (ESP) | IBEX 35 INDEX
(8) Sweden (SWE) | OMX STOCKHOLM INDEX
(9) United Kingdom (GBR) | FTSE 100 INDEX
(10) Hungary (HUN) | BUDAPEST STOCK EXCHANGE INDEX
(11) Canada (CAN) | S&P/TSX COMPOSITE INDEX
(12) United States (USA) | S&P 500 INDEX
(13) India (IND) | MUMBAI SENSEX 30 INDEX
(14) Indonesia (IDN) | JAKARTA COMPOSITE INDEX
(15) Malaysia (MYS) | KUALA LUMPUR COMP INDEX
(16) Taiwan (TWN) | TAIWAN WEIGHTED INDEX
(17) Hong Kong (HKG) | HANG SENG INDEX
(18) Japan (JPN) | NIKKEI 225 INDEX
Table 3. The meanings of the symbols.
Symbol | Meaning
X | Time series
Y | Time series
T | The length of the time series
X_t | The element at position t in time series X
Y_t | The element at position t in time series Y
X̄ | The average of the time series X
Ȳ | The average of the time series Y
X^τ | X with a lag of τ
P_t | Probability distribution of the element at position t in time series X
Q_t | Probability distribution of the element at position t in time series Y
p(X_t, Y_s) | Joint probability distribution function of X_t and Y_s
a | When X and Y are represented as a scatter plot in two-dimensional space and a grid is drawn over it, a is the number of intervals in the x direction
b | The number of intervals in the y direction
X′ | The event sequence extracted from time series X
Y′ | The event sequence extracted from time series Y
τ | The time scale for determining whether two events are synchronized
N_X | The total number of events in the event sequence X′
N_Y | The total number of events in the event sequence Y′
t_i^X | The time point at which the i-th event occurs in the event sequence X′
t_j^Y | The time point at which the j-th event occurs in the event sequence Y′
Table 4. Shape-based distance functions.
Distance | Equation | Reference
Dynamic Time Warping (DTW) | $d_{DTW}(X,Y) = dtw(t=T, s=T)$, where $dtw(t,s) = \infty$ if $t = 0$ xor $s = 0$; $0$ if $t = s = 0$; $|X_t - Y_s| + \min\{dtw(t-1,s),\, dtw(t,s-1),\, dtw(t-1,s-1)\}$ otherwise | [6]
Complexity Invariant (CID) | $d_{CID}(X,Y) = cf(X,Y) \cdot d_{ED}(X,Y)$, with $cf(X,Y) = \frac{\max(ce(X), ce(Y))}{\min(ce(X), ce(Y))}$ and $ce(X) = \sqrt{\sum_{t=1}^{T-1} (X_t - X_{t+1})^2}$ | [7]
Euclidean | $d_{ED}(X,Y) = \sqrt{\sum_{t=1}^{T} (X_t - Y_t)^2}$ | [8]
Manhattan | $d_{MH}(X,Y) = \sum_{t=1}^{T} |X_t - Y_t|$ | [8]
Jaccard | $d_{JC}(X,Y) = \frac{\sum_{t=1}^{T} (X_t - Y_t)^2}{\sum_{t=1}^{T} X_t^2 + \sum_{t=1}^{T} Y_t^2 - \sum_{t=1}^{T} X_t Y_t}$ | [8]
Squared Euclidean | $d_{SED}(X,Y) = \sum_{t=1}^{T} (X_t - Y_t)^2$ | [8]
Kulczynski | $d_{KS}(X,Y) = \frac{\sum_{t=1}^{T} |X_t - Y_t|}{\sum_{t=1}^{T} \min(X_t, Y_t)}$ | [8]
Lorentzian | $d_{LO}(X,Y) = \sum_{t=1}^{T} \ln(1 + |X_t - Y_t|)$ | [8]
DISSIM | $d_{DISSIM}(X,Y) = \sum_{t=1}^{T-1} \int_{t}^{t+1} |X_t - Y_t| \, dt$ | [9]
Short Time Series (STS) | $d_{STS}(X,Y) = \sqrt{\sum_{t=1}^{T-1} \big((Y_{t+1} - Y_t) - (X_{t+1} - X_t)\big)^2}$ | [10]
Dice | $d_{DC}(X,Y) = \frac{\sum_{t=1}^{T} (X_t - Y_t)^2}{\sum_{t=1}^{T} X_t^2 + \sum_{t=1}^{T} Y_t^2}$ | [11]
Sorensen | $d_{SO}(X,Y) = \frac{\sum_{t=1}^{T} |X_t - Y_t|}{\sum_{t=1}^{T} (X_t + Y_t)}$ | [12]
Tanimoto | $d_{TM}(X,Y) = \frac{\sum_{t=1}^{T} \big(\max(X_t, Y_t) - \min(X_t, Y_t)\big)}{\sum_{t=1}^{T} \max(X_t, Y_t)}$ | [13]
Avg(L_1, L_∞) (avgL1LInf) | $d_{ALL}(X,Y) = \big(\sum_{t=1}^{T} |X_t - Y_t| + \max_t |X_t - Y_t|\big) / 2$ | [14]
Wave Hedges | $d_{WH}(X,Y) = \sum_{t=1}^{T} \frac{|X_t - Y_t|}{\max(X_t, Y_t)}$ | [14]
Gower | $d_{GW}(X,Y) = \frac{1}{T} \sum_{t=1}^{T} |X_t - Y_t|$ | [15]
Bhattacharyya | $d_{BH}(X,Y) = -\ln \sum_{t=1}^{T} \sqrt{X_t Y_t}$ | [16]
Table 5. Feature-based distance functions.
Distance | Equation | Reference
Correlation | $d_{COR}(X,Y) = 1 - |\rho_{XY}|$, where $\rho_{XY} = \frac{\sum_{t=1}^{T} (X_t - \bar{X})(Y_t - \bar{Y})}{\sqrt{\sum_{t=1}^{T} (X_t - \bar{X})^2 \cdot \sum_{t=1}^{T} (Y_t - \bar{Y})^2}}$ | [8]
Cross-correlation | $d_{CCOR}(X,Y) = 1 - \max_{\tau \in [-\tau_{max}, \tau_{max}]} |\rho_{XY}^{\tau}|$, where $\rho_{XY}^{\tau} = \frac{\sum_{t=1}^{T} (X_t - \bar{X})(Y_{t-\tau} - \bar{Y})}{\sqrt{\sum_{t=1}^{T} (X_t - \bar{X})^2 \cdot \sum_{t=1}^{T} (Y_{t-\tau} - \bar{Y})^2}}$ | [8]
Temporal Correlation and Raw Values (CORT) | $d_{CORT}(X,Y) = \Phi(cort(X,Y)) \cdot d_{ED}(X,Y)$, with $\Phi(u) = \frac{2}{1 + e^{ku}}$ ($k = 2$) and $cort(X,Y) = \frac{\sum_{t=1}^{T-1} (X_{t+1} - X_t)(Y_{t+1} - Y_t)}{\sqrt{\sum_{t=1}^{T-1} (X_{t+1} - X_t)^2} \sqrt{\sum_{t=1}^{T-1} (Y_{t+1} - Y_t)^2}}$ | [17]
Mutual Information (MI) | $d_{MI}(X,Y) = H(X) + H(Y) - 2\,MI(X,Y)$, with $H(X) = -\sum_{t=1}^{T} P_t \log P_t$ and $MI(X,Y) = \sum_{t=1}^{T} \sum_{s=1}^{T} p(X_t, Y_s) \log \frac{p(X_t, Y_s)}{P_t Q_s}$ | [18]
Maximal Information Coefficient (MIC) | $d_{MIC}(X,Y) = 1 - MIC(X,Y)$, with $MIC(X,Y) = \max_{ab < T^{\alpha}} M_{ab}$ and $M_{ab} = \frac{\max(MI(G_{ab}))}{\log(\min(a, b))}$ | [19]
Integrated Periodogram (INTPER) | $d_{INTPER}(X,Y) = \int_{-\pi}^{\pi} |F_X(\lambda) - F_Y(\lambda)| \, d\lambda$, with $F_X(\lambda_j) = \frac{\sum_{k=1}^{j} I_X(\lambda_k)}{\sum_{k} I_X(\lambda_k)}$, $I_X(\lambda_k) = \frac{1}{T} \big|\sum_{t=1}^{T} X_t e^{-i \lambda_k t}\big|^2$, and $\lambda_k = \frac{2\pi k}{T}$, $k = -\left[\frac{T-1}{2}\right], \dots, -1, 0, 1, \dots, \left[\frac{T}{2}\right]$ | [20]
Periodogram (PER) | $d_{PER}(X,Y) = \sqrt{\sum_{k=1}^{[T/2]} \big(I_X(\lambda_k) - I_Y(\lambda_k)\big)^2}$, with $\lambda_k = \frac{2\pi k}{T}$ and $I_X(\lambda_k) = \frac{1}{T} \big|\sum_{t=1}^{T} X_t e^{-i \lambda_k t}\big|^2$ | [21]
Fourier Coefficient | $d_{FO}(X,Y) = \sqrt{\sum_{j=1}^{n} |\hat{x}_j - \hat{y}_j|^2}$, with $\hat{x}_j = \sum_{t=1}^{T} X_t \cdot e^{-2\pi i (t-1)(j-1)/T}$, $n = [T/2] + 1$, $i = \sqrt{-1}$ | [22]
Autocorrelation Coefficients (ACF) | $d_{ACF}(X,Y) = \sqrt{\sum_{\tau = -\tau_{max}}^{\tau_{max}} (\rho_{XX}^{\tau} - \rho_{YY}^{\tau})^2}$ | [23]
Partial Autocorrelation Coefficients (PACF) | $d_{PACF}(X,Y) = \sqrt{\sum_{\tau = -\tau_{max}}^{\tau_{max}} (\rho_{XX}^{\tau} - \rho_{YY}^{\tau})^2}$ (computed on the partial autocorrelation coefficients) | [23]
Table 6. Structure-based distance functions.
Distance | Equation | Reference
Compression-based (CDM) | $d_{CDM}(X,Y) = \frac{C(XY)}{C(X) + C(Y)}$ | [24]
Normalized Compression (NCD) | $d_{NCD}(X,Y) = \frac{C(XY) - \min(C(X), C(Y))}{\max(C(X), C(Y))}$ | [25]
Table 7. Event-based distance functions.
Distance | Equation | Reference
Event Synchronization (ES) | $d_{ES}(X,Y) = 1 - \frac{C_{\tau}(X|Y) + C_{\tau}(Y|X)}{\sqrt{N_X N_Y}}$, with $C_{\tau}(X|Y) = \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} J_{ij}^{\tau}$ and $J_{ij}^{\tau} = 1$ if $0 < t_i^X - t_j^Y \le \tau$; $1/2$ if $t_i^X = t_j^Y$; $0$ otherwise | [3]
van Rossum (VR) | $d_{VR}(X,Y) = \int_{0}^{\infty} \big[V(X)(t) - V(Y)(t)\big]^2 \, dt$, with $V(X)(t) = \frac{1}{N_X} \sum_{i=1}^{N_X} h_{\tau}(t - t_i^X)\, u(t - t_i^X)$ and $h_{\tau}(t) = \frac{\exp(-t^2/(2\tau^2))}{\sqrt{2\pi \tau^2}}$ (Gaussian kernel; u is the unit step function) | [3]