*4.1. Correlation Matrix Based on Pearson Coefficients and Random Matrix Theory*

Given that $x_i$ is the price time series of cryptocurrency $i$, we use its return values to find the correlation between cryptocurrencies. This is because return values are expressed as a percentage, which makes them scale-free and, in particular, stationary, an important requirement for many statistical tools, such as *normalization*. Thus, we first calculate the corresponding return time series $r_i$ as follows [56]: $r_i = \log\left(x_i^{t} / x_i^{t-1}\right)$, where $x_i^{t}$ is the price value of cryptocurrency $i$ at timestamp $t$.

Each of these return time series can be normalized as follows [57]: $\hat{r}_i = (r_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the average value and standard deviation of time series $i$, respectively.
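The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual pipeline: the function names and the price list are hypothetical, and the population standard deviation is assumed so that the normalized series has unit variance under a $1/m$ scaling.

```python
import math
import statistics


def log_returns(prices):
    """Log return log(x^t / x^(t-1)) for each pair of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def normalize(returns):
    """Subtract the mean and divide by the standard deviation.

    Assumption: the population standard deviation (divide by m) is used.
    """
    mu = statistics.fmean(returns)
    sigma = statistics.pstdev(returns)
    return [(r - mu) / sigma for r in returns]


# Made-up prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
r = log_returns(prices)
r_hat = normalize(r)
```

Because log returns are additive, the sum of `r` equals the log of the last price divided by the first, which gives a quick sanity check on the return step.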

We form an $m \times n$ matrix $\mathbf{G}$ such that each column represents the normalized return time series of a cryptocurrency and each row represents a timestamp. The corresponding correlation matrix $\mathbf{C}$ can be expressed as follows [56]: $\mathbf{C} = \frac{1}{m}\mathbf{G}^{\top}\mathbf{G}$. In other words, each element $C_{ij}$ of $\mathbf{C}$ shows the correlation strength between cryptocurrencies $i$ and $j$, calculated as the dot product of the two normalized return time series scaled by $1/m$: $C_{ij} = \frac{1}{m}\langle \hat{r}_i, \hat{r}_j \rangle$. Such a correlation matrix is called a *Pearson correlation matrix*.
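As a rough sketch of this construction, the snippet below builds $\mathbf{C}$ directly from a list of return series. The helper names and the toy data are assumptions made for illustration. The second series is an exact multiple of the first and the third is its negation, so the corresponding entries should come out at the extremes of the correlation range.

```python
import statistics


def zscore(series):
    """Normalize one series to zero mean and unit (population) variance."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(v - mu) / sigma for v in series]


def correlation_matrix(return_series):
    """Compute the n x n matrix C = (1/m) G^T G.

    return_series holds n series of equal length m; each one becomes a
    column of G after normalization.
    """
    cols = [zscore(s) for s in return_series]
    m = len(cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / m for cj in cols]
            for ci in cols]


# Toy return series (hypothetical values).
returns = [
    [0.01, -0.02, 0.03, 0.00, -0.01],
    [0.02, -0.04, 0.06, 0.00, -0.02],   # 2x the first series
    [-0.01, 0.02, -0.03, 0.00, 0.01],   # -1x the first series
]
C = correlation_matrix(returns)
```

The diagonal is 1 only because the normalization and the $1/m$ factor use the same convention. Mixing a sample standard deviation with a $1/m$ scaling would shift every entry slightly.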

It should be noted that Pearson correlation has some limitations as described in [58]. In particular, its sensitivity to outliers and inability to capture non-linear relationships both have the potential to cause misleading results. However, we believe that this correlation metric is appropriate to use in our study for the following reasons:


One issue raised by this type of matrix is how reliable these correlations are, in other words, whether the correlation matrix shows genuine relationships between the considered time series. Thanks to RMT [61], this question can be examined. In particular, given an $m \times n$ random matrix $\mathbf{N}$ whose elements are randomly distributed with zero mean and unit variance, the eigenvalue distribution of the correlation matrix $\mathbf{C_N} = \frac{1}{m}\mathbf{N}^{\top}\mathbf{N}$ follows the Marchenko–Pastur probability density function [62] if the Quality Factor $Q = m/n \geq 1$ holds when the number of timestamps $m \to \infty$ and the number of features $n \to \infty$: $P(\lambda) = \frac{Q}{2\pi}\,\frac{\sqrt{(\lambda_{+} - \lambda)(\lambda - \lambda_{-})}}{\lambda}$, where $P$ is the Marchenko–Pastur probability density function, $\lambda$ is an eigenvalue of $\mathbf{C_N}$, and $\lambda_{\pm} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}$ are the upper and lower limits, respectively.
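The bounds and the density can be evaluated directly from these formulas. The following sketch uses hypothetical function names and an illustrative choice of $m = 400$ and $n = 100$, so that $Q = 4$ and the bounds are $\lambda_{-} = 0.25$ and $\lambda_{+} = 2.25$.

```python
import math


def mp_bounds(m, n):
    """Return (lambda_minus, lambda_plus) for Q = m / n >= 1."""
    q = m / n
    if q < 1:
        raise ValueError("the formula assumes Q = m / n >= 1")
    spread = 2.0 * math.sqrt(1.0 / q)
    return 1.0 + 1.0 / q - spread, 1.0 + 1.0 / q + spread


def mp_density(lam, m, n):
    """Marchenko-Pastur density P(lambda); zero outside the bounds."""
    q = m / n
    lo, hi = mp_bounds(m, n)
    if lam <= lo or lam >= hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam


# Illustrative dimensions: 400 timestamps, 100 series, so Q = 4.
lam_minus, lam_plus = mp_bounds(400, 100)
peak_region_value = mp_density(1.0, 400, 100)
```

A useful check on any implementation is that the density integrates to approximately 1 over $[\lambda_{-}, \lambda_{+}]$.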

According to RMT, eigenvalues falling outside $[\lambda_{-}, \lambda_{+}]$ deviate from the predictions for a purely random matrix [63,64]. Hence, we can use this theory to test the reliability of the relationships in our empirical data [65]. That is, if an empirical correlation matrix contains genuine information, it must have eigenvalues outside the bounds $[\lambda_{-}, \lambda_{+}]$. Otherwise, the empirical correlation matrix can be taken to contain mainly random noise. In this study, we used RMT to test our correlation matrices. The results show that none of the correlation matrices is random and that all of them contain valuable information.
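The test described above can be illustrated on synthetic data. The sketch below is not the procedure used in this study: it assumes a simplified check that looks only at the largest eigenvalue, estimates it with power iteration instead of a full eigendecomposition, and uses made-up series that share one common factor. All names and parameters are hypothetical.

```python
import math
import random
import statistics


def zscore(series):
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(v - mu) / sigma for v in series]


def correlation_matrix(series_list):
    """C = (1/m) G^T G, with the normalized series as columns of G."""
    cols = [zscore(s) for s in series_list]
    m = len(cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / m for cj in cols]
            for ci in cols]


def lambda_plus(m, n):
    """Upper Marchenko-Pastur bound for Q = m / n."""
    q = m / n
    return 1.0 + 1.0 / q + 2.0 * math.sqrt(1.0 / q)


def largest_eigenvalue(C, iterations=200):
    """Estimate the top eigenvalue of a symmetric matrix by power iteration."""
    n = len(C)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Cv))  # Rayleigh quotient


# Synthetic data: n series driven by one shared factor plus noise.
random.seed(0)
m, n = 200, 10
factor = [random.gauss(0, 1) for _ in range(m)]
series = [[f + 0.5 * random.gauss(0, 1) for f in factor] for _ in range(n)]

C = correlation_matrix(series)
top = largest_eigenvalue(C)
bound = lambda_plus(m, n)
carries_signal = top > bound  # True when the top eigenvalue exceeds lambda_plus
```

With a strong common factor, the largest eigenvalue sits far above $\lambda_{+}$, so the matrix is flagged as carrying information. A fuller check would compute the entire spectrum and compare it against both bounds.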
