*3.1. A Note on Data Sampling and Missing Data*

Since prices are collected tick-by-tick, the cryptocurrencies do not share a common time grid, which makes the time series inconsistent with one another. We therefore re-sample the dataset at fixed timescales. In particular, we choose four timescales: 30 min, 6 h, 12 h and 24 h. Each data point of a re-sampled dataset consists of the last transaction price of each of the 34 cryptocurrencies within the corresponding interval. This yields four datasets, one per timescale. Table 2 describes each re-sampled dataset.
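The re-sampling step can be illustrated with a minimal sketch, assuming the tick data are held in a pandas DataFrame with columns `timestamp`, `symbol` and `price`. This layout and the function name `resample_last_price` are illustrative assumptions, not a description of the actual pipeline used in this work.

```python
import pandas as pd


def resample_last_price(ticks: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Return one row per interval and one column per cryptocurrency.

    `ticks` is assumed (hypothetically) to have the columns
    'timestamp' (datetime64), 'symbol' (str) and 'price' (float).
    `rule` is a pandas frequency string such as '30min', '6h', '12h' or '24h'.
    """
    # Sort chronologically so that "last" means the latest transaction.
    ticks = ticks.sort_values("timestamp")

    # One column per cryptocurrency, indexed by tick time.
    wide = ticks.pivot_table(
        index="timestamp", columns="symbol", values="price", aggfunc="last"
    )

    # Keep the last observed price of each cryptocurrency in every interval.
    # Intervals in which a cryptocurrency did not trade are left as NaN.
    return wide.resample(rule).last()
```

Calling `resample_last_price(ticks, "30min")`, and likewise with `"6h"`, `"12h"` and `"24h"`, would give four frames analogous to the four re-sampled datasets. Intervals without a transaction for a given cryptocurrency appear as missing values.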

**Table 2.** Characteristics of the four re-sampled datasets at four different levels of granularity.


Three of the four datasets contain missing values, each at the same rate of 0.8%. A data point of a dataset is considered missing if at least one cryptocurrency has no price value at that data point. Rather than removing the missing values together with the values of all other time series at the same time, we replace each missing value with the average value of the corresponding time series. This technique has been adopted in different research topics with good performance [47–49]. Furthermore, we observe that this does not change the statistical properties of the correlation between time series; instead, it retains more information, which makes the experimental results more reliable and accurate.
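The missing-data rule and the mean replacement can be sketched as follows, assuming a re-sampled dataset is stored as a pandas DataFrame with one row per data point and one column per cryptocurrency. The function names `missing_row_share` and `impute_with_series_mean` are hypothetical and serve only as an illustration.

```python
import pandas as pd


def missing_row_share(prices: pd.DataFrame) -> float:
    """Fraction of data points at which at least one series has no price."""
    return float(prices.isna().any(axis=1).mean())


def impute_with_series_mean(prices: pd.DataFrame) -> pd.DataFrame:
    """Replace each missing value with the mean of its own time series."""
    # DataFrame.mean() skips NaN by default, so each column mean is
    # computed from the observed prices only; fillna then aligns the
    # means with the columns by label.
    return prices.fillna(prices.mean())
```

Unlike removing the affected rows, this keeps the observed prices of all other cryptocurrencies at those data points, and it leaves the mean of each imputed series unchanged.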
