4.2.2. Cleaning Method

In recent studies, different approaches have been proposed to remove noise from a correlation matrix by modifying its eigenspectrum, e.g., Linear shrinkage [73], Eigenvector clipping [74], Non-linear shrinkage [75] and Rotationally invariant, optimal shrinkage [76]. A common obstacle for most existing cleaning methods is that they require parameters to be specified, which raises an obvious question: how do we choose them? Considerable effort has been devoted to finding parameter values that remove the noise completely without losing data information [77,78]. However, these optimization approaches share one issue: they rely on the Frobenius norm in their objective, so they fail on data containing outliers, a known downside of the Frobenius metric [79]. Eigenvector clipping, on the other hand, distinguishes itself [74] in that it requires no training parameters, making its outcome robust and more reliable. Furthermore, this cleaning method is straightforward to implement and guaranteed to preserve the informative part of the matrix, i.e., after the cleaning process, the trace of the correlation matrix remains unchanged [80]. This method has shown good performance in different studies and has been applied widely to topics such as programming education, portfolio optimization and signal processing [70,81,82]. The outstanding performance of Eigenvector clipping encourages us to choose this method for our cleaning scheme.

Given eigenvalues *λ*<sub>1</sub> ≥ *λ*<sub>2</sub> ≥ *λ*<sub>3</sub> ≥ ... ≥ *λ<sub>n</sub>* and corresponding eigenvectors *v*<sub>1</sub>, *v*<sub>2</sub>, ... , *v<sub>n</sub>* of our empirical correlation matrix **C**, we can identify *k* ≤ *n* such that *λ<sub>k</sub>* > *λ*<sup>+</sup> and *λ*<sub>k+1</sub> ≤ *λ*<sup>+</sup>. Eigenvector clipping defines the denoised correlation matrix **C***denoised* by [83]:

$$\mathbf{C}\_{denoised} = \sum\_{i=1}^{n} \lambda\_i^\* v\_i v\_i^\mathsf{T}, \quad \lambda\_i^\* = \begin{cases} \dfrac{\lambda\_{k+1} + \lambda\_{k+2} + \dots + \lambda\_n}{n-k}, & \forall i \ge k+1\\ \lambda\_i, & \forall i \le k \end{cases} \tag{1}$$

Equation (1) uses the same eigenvectors as **C** but modifies the corresponding eigenvalues: those greater than *λ*<sup>+</sup> remain unchanged, while the rest are replaced by their average value. Notably, although the small eigenvalues are replaced, the trace of the denoised correlation matrix equals that of the original.
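As a minimal sketch (not the authors' code), Equation (1) can be implemented with NumPy. Since this section does not define *λ*<sup>+</sup> explicitly, the sketch assumes the Marchenko-Pastur upper edge (1 + √(*n*/*T*))², a common choice when the matrix is estimated from *T* observations of *n* variables:

```python
import numpy as np

def clip_eigenvalues(C, T):
    """Denoise a correlation matrix via eigenvector clipping, Eq. (1).

    C : (n, n) empirical correlation matrix
    T : number of observations used to estimate C; the Marchenko-Pastur
        upper edge (1 + sqrt(n/T))**2 is assumed as the threshold lambda_+.
    """
    n = C.shape[0]
    lambda_plus = (1.0 + np.sqrt(n / T)) ** 2
    eigvals, eigvecs = np.linalg.eigh(C)                 # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending, as in Eq. (1)
    noise = eigvals <= lambda_plus                       # indices i >= k+1
    if noise.any():
        # Replace small eigenvalues by their average: the eigenvalue sum,
        # and hence the trace, is preserved.
        eigvals[noise] = eigvals[noise].mean()
    return (eigvecs * eigvals) @ eigvecs.T               # sum_i lambda*_i v_i v_i^T
```

Replacing the noisy eigenvalues by their mean (rather than zero) is what keeps the trace unchanged, since the eigenvalue sum is untouched.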

Regarding the trend effect, it is explained by the first eigenvalue and eigenvector, referred to as the "market component" [83]. The market component has been shown to influence the outcome of the correlation matrix. In particular, because it carries such a large amount of information, it is involved in all interactions observed in the correlation matrix, which lessens the performance of clustering algorithms [84]. Thus, removing this component is a necessary step to clean the trend effect, so that a greater portion of the correlation can be explained by components that affect specific subsets of the cryptocurrencies, helping clustering algorithms find dissimilarities across clusters. A cleaned correlation matrix **C***cleaned* is obtained by subtracting the market component from the denoised correlation matrix:

$$\mathbf{C}\_{cleaned} = \mathbf{C}\_{denoised} - \lambda\_1 v\_1 v\_1^\mathsf{T} \tag{2}$$
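The subtraction in Equation (2) can be sketched in the same way (again an illustrative implementation, not the authors' code): extract the largest eigenvalue and its eigenvector and remove the corresponding rank-one component.

```python
import numpy as np

def remove_market_component(C_denoised):
    """Subtract the market component (top eigen-pair), Eq. (2)."""
    eigvals, eigvecs = np.linalg.eigh(C_denoised)  # ascending order
    lam1 = eigvals[-1]                             # largest eigenvalue lambda_1
    v1 = eigvecs[:, -1:]                           # its eigenvector, as a column
    return C_denoised - lam1 * (v1 @ v1.T)
```

After this step the cleaned matrix has no weight along *v*<sub>1</sub>, i.e., *v*<sub>1</sub><sup>T</sup> **C***cleaned* *v*<sub>1</sub> = 0.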

We found that the connections between cryptocurrencies decrease greatly once noise and trend effects are removed: large cryptocurrencies such as Bitcoin, Ethereum and Ripple do not seem to affect the cryptocurrency market as they did before the cleaning process, since there is no strong connection between them and other cryptocurrencies. This result is in line with [70], where the Eigenvector clipping method was also used to clean an education-related correlation matrix.
