*4.3. Distance Matrix and Its Minimum Spanning Tree*

Although the correlation coefficient can explain some aspects of the relationships between cryptocurrencies, it is not a metric [85]. Thus, the connections learned from the correlation matrix lack topological characteristics because they are not placed in a metric space [85]. To tackle this issue, a concept named *Distance Matrix* has been introduced to replace the correlation matrix.

Let **D** be the distance matrix derived from **C**<sub>*cleaned*</sub>; then:

$$d_{ij} = \sqrt{2\left(1 - c_{ij}\right)}\tag{3}$$

where *d<sub>ij</sub>* ∈ [0, 2] is an element of **D** and *c<sub>ij</sub>* is the corresponding correlation coefficient. A distance of 0 indicates complete similarity between two nodes, while a distance of 2 indicates complete dissimilarity. From Equation (3), it follows that (1) *d<sub>ij</sub>* ≥ 0, (2) *d<sub>ij</sub>* = 0 if *i* = *j*, and (3) *d<sub>ij</sub>* = *d<sub>ji</sub>*, i.e., together with the triangle inequality, the requirements of a metric are satisfied [85]. Using the distance matrix, we can derive a network (graph) of cryptocurrencies (nodes) with a specific topology, in which similar cryptocurrencies lie close to each other and cryptocurrencies with different behaviors lie far apart. The link (edge) between each pair of cryptocurrencies is weighted by their distance value. Thanks to this topology, different communities of cryptocurrencies can be observed.
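As an illustration of Equation (3), the following minimal sketch maps a correlation matrix to a distance matrix using only the Python standard library. The function name `correlation_to_distance` and the three-asset matrix `toy_corr` are hypothetical and chosen for illustration only; they are not part of the study's data or code.

```python
import math

def correlation_to_distance(corr):
    """Apply d_ij = sqrt(2 * (1 - c_ij)) to every entry of a correlation matrix."""
    n = len(corr)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # max(..., 0.0) guards against tiny negative values caused by rounding
            dist[i][j] = math.sqrt(max(2.0 * (1.0 - corr[i][j]), 0.0))
    return dist

# Hypothetical correlation matrix (illustration only): asset 2 mirrors asset 0
toy_corr = [
    [1.0, 0.5, -1.0],
    [0.5, 1.0, -0.5],
    [-1.0, -0.5, 1.0],
]
toy_dist = correlation_to_distance(toy_corr)

for row in toy_dist:
    print([round(d, 3) for d in row])
```

In this toy example, the perfectly anti-correlated pair (0, 2) receives the maximum distance of 2, the diagonal is 0, and the matrix is symmetric, in line with the properties listed above.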

One problem with this type of network is that it is dense. That is, for a set of *N* nodes, the graph derived from **D** has *N*(*N* − 1)/2 edges, since each vertex connects to all other vertices. To reduce the complexity of the network, we use a Minimum Spanning Tree (MST) [86], i.e., a tree within the graph that links all vertices together with minimal total length. In particular, it reduces the amount of redundant information because it keeps only the *N* − 1 most important edges, i.e., the *N* − 1 edges that connect all vertices with the shortest total distance. The MST stems from graph theory and is widely applied in different fields [4,87,88], especially in financial markets [89–91]. For example, Huang et al. [92] observed the dynamics of community structures in the stock market by splitting the dataset into consecutive smaller periods and constructing an MST for each of them. Thus, the characteristics of a financial network can be captured by observing the evolution of MSTs. More recently, the cryptocurrency market has emerged and attracted a number of investors, which has created a demand for exploring the correlations between cryptocurrencies. However, this topic is rather new and requires further study [4,93].
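To make the reduction in edges concrete, the sketch below enumerates the edges of the complete graph for the 34 cryptocurrencies considered in this study and compares the count with the *N* − 1 edges kept by a spanning tree. The helper `complete_graph_edges` and the placeholder distances are assumptions made for illustration; only the edge counts matter here.

```python
from itertools import combinations

def complete_graph_edges(dist):
    """Return every undirected edge (i, j, d_ij) with i < j from a symmetric distance matrix."""
    n = len(dist)
    return [(i, j, dist[i][j]) for i, j in combinations(range(n), 2)]

n_assets = 34  # number of cryptocurrencies in this study

# Placeholder distances (all off-diagonal entries equal); not real data
placeholder = [[0.0 if i == j else 1.0 for j in range(n_assets)]
               for i in range(n_assets)]

dense_edges = complete_graph_edges(placeholder)
print(len(dense_edges))  # 561 = 34 * 33 / 2 edges in the dense network
print(n_assets - 1)      # 33 edges in any spanning tree of the same nodes
```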

There are two well-known algorithms for finding the MST, namely Prim [93] and Kruskal [94]. While both methods perform well, Kruskal seems to be better in terms of time complexity. A comparison of the two in [95] shows that the former works well with a large network, while the latter is dominant when the network is small, which is appropriate for this study as we have only 34 cryptocurrencies. Moreover, Kruskal is used more often in finance-related studies than other approaches [96–98], which supports the reliability of this choice. Given these advantages, we choose Kruskal for this study.
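For readers unfamiliar with the algorithm, a minimal sketch of Kruskal's method on a distance matrix is given below. It sorts all edges by distance and adds an edge whenever it joins two components that are not yet connected, using a simple union-find structure. The function name `kruskal_mst` and the four-node matrix `toy_dist` are hypothetical; this is an illustrative sketch, not necessarily the implementation used in this study.

```python
def kruskal_mst(dist):
    """Return the MST edges (i, j, d_ij) of the complete graph given by a symmetric distance matrix."""
    n = len(dist)
    parent = list(range(n))  # union-find forest: each node starts as its own root

    def find(x):
        # Follow parent links to the root, shortening the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All undirected edges, sorted from shortest to longest distance
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    mst = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge joins two separate components
            parent[root_i] = root_j   # merge the components
            mst.append((i, j, d))
            if len(mst) == n - 1:     # a spanning tree has exactly n - 1 edges
                break
    return mst

# Hypothetical 4-node distance matrix (illustration only)
toy_dist = [
    [0.0, 0.4, 1.2, 1.5],
    [0.4, 0.0, 0.9, 1.3],
    [1.2, 0.9, 0.0, 0.6],
    [1.5, 1.3, 0.6, 0.0],
]
toy_mst = kruskal_mst(toy_dist)
print(toy_mst)  # [(0, 1, 0.4), (2, 3, 0.6), (1, 2, 0.9)]
```

Of the six edges in this toy graph, only three are retained, and the edges of length 1.2, 1.3, and 1.5 are discarded because each would close a cycle.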
