### *3.1.3. 2-Regimes Entropy*

Under SD, a *regime* denotes a qualitative behavior, defined as a growth model or dynamical rule with its own state space, that allows multiple attractors to exist at the same time, whether in equilibrium or not [19]. *Symbolization* (transforming TS values into symbols) enables the study of, for instance, structural changes in a TS, such as sudden changes in trend or changes in the governing rules of a dynamical system (e.g., switching between trends). An adequate symbolization can highlight temporal patterns, improve the signal-to-noise ratio, and improve computational efficacy and efficiency, to mention a few benefits [36].

To illustrate, consider a time series $X = x_1, x_2, \ldots, x_n$ with $x_i \in \mathbb{R}$. Typically, the *symbolization* of a TS is carried out by dividing $\mathbb{R}$ into $q \in \{1, 2, \ldots\}$ non-overlapping bins. Such bins represent the states of the system; hence, each $x_i$ is mapped to its corresponding bin, turning the sequence of points $X$ into a sequence of symbols $Z = z_1, z_2, \ldots, z_n$. In the simplest nontrivial case, $q = 2$, the original TS is mapped into a sequence of two symbols (i.e., 0 or 1) using a threshold that partitions $\mathbb{R}$ into two intervals, namely *2-regimes* symbolization. This representation is useful for studying periods of growth (expansion) or fall (contraction) in a TS, for instance, the *bull* and *bear* regimes of an economic market. In this work, 2-regimes symbolization is carried out using the sign of the first difference, such that

$$z_t = \begin{cases} 1 & \text{if } \operatorname{sgn}(x_t - x_{t-1}) > 0, \\ 0 & \text{otherwise}, \end{cases} \qquad t = 2, \ldots, n, \tag{9}$$

where sgn denotes the sign function. An example of this codification is presented in Table 2, where the first row displays the original observations and the second row the corresponding 2-regimes codification. Note that, because Equation (9) operates on first differences, a TS of length $n$ yields a codified sequence of length $n - 1$.

**Table 2.** Example of a time series and its 2-regimes codification.
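A minimal Python sketch of the codification in Equation (9) follows. The function name `two_regime_symbols` and the sample values are hypothetical illustrations, not the data of Table 2. Zero differences (ties) fall into symbol 0, as the "otherwise" branch of Equation (9) prescribes.

```python
import numpy as np


def two_regime_symbols(x):
    """Codify a series of length n into n - 1 binary symbols (Eq. 9).

    z_t = 1 when x_t - x_{t-1} > 0 (growth) and 0 otherwise
    (fall or no change).
    """
    x = np.asarray(x, dtype=float)
    return (np.sign(np.diff(x)) > 0).astype(int)


if __name__ == "__main__":
    x = [3.0, 4.5, 4.1, 4.1, 5.0, 6.2, 5.9]  # hypothetical observations
    print(two_regime_symbols(x))             # [1 0 0 1 1 0]
```

The seven hypothetical observations produce six symbols, matching the $n - 1$ length noted above.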


Once the TS is symbolized into $Z$, Equation (3) can be employed to calculate a two-regime entropy ($H_{2reg}$). It is worth mentioning that $H_{2reg}$ can be considered a special case of $H_{perm}$: during the symbolization step, a permutation scheme with $d_e! = 2$ (i.e., embedding dimension $d_e = 2$) produces symbols equivalent to those obtained by Equation (9). However, by considering words of $d$ contiguous symbols $z_t, z_{t+1}, \ldots, z_{t+d-1}$, an alphabet of size $2^d$ can be explored, allowing the study of richer 2-regimes alphabets. In this work, the alphabet size is set to $2^d = 256$, i.e., $d = 8$. Finally, $H_{2reg}$ can be normalized by $\log_a(2^d)$ to constrain it to $0 \leq H_{2reg} \leq 1$. In this sense, a lower value is obtained when there is a predominant regime (e.g., a trend or drift), whereas a value closer to one indicates a more random, noisier TS.
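The computation of $H_{2reg}$ with words of $d$ symbols can be sketched in Python as below. This is not the authors' implementation: it assumes that Equation (3) is the Shannon entropy of the empirical word frequencies and that words are taken from overlapping windows of $d$ consecutive symbols; `two_regime_entropy` is a hypothetical name. Because the result is divided by $\log_a(2^d)$, the choice of base $a$ does not change it.

```python
import math
from collections import Counter

import numpy as np


def two_regime_entropy(x, d=8, base=2.0):
    """Normalized 2-regimes entropy H_2reg in [0, 1] (illustrative sketch).

    Assumptions: Equation (3) is read as Shannon entropy, and words are
    overlapping windows of d consecutive symbols z_t, ..., z_{t+d-1}.
    """
    x = np.asarray(x, dtype=float)
    z = (np.diff(x) > 0).astype(int)  # Eq. (9): 1 = growth, 0 = otherwise
    if len(z) < d:
        raise ValueError("series too short for the chosen word length d")
    # Encode each window of d binary symbols as an integer in [0, 2**d).
    counts = Counter(
        int("".join(map(str, z[t:t + d])), 2) for t in range(len(z) - d + 1)
    )
    total = sum(counts.values())
    # Shannon entropy in base `base`; writing p * log(1/p) keeps H >= 0.
    h = sum((c / total) * math.log(total / c, base) for c in counts.values())
    return h / math.log(2 ** d, base)  # normalize by log_a(2^d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trend = np.arange(500.0)                   # always growing: one regime
    walk = np.cumsum(rng.normal(size=5000))    # i.i.d. increments
    print(two_regime_entropy(trend))           # 0.0
    print(round(two_regime_entropy(walk), 2))  # close to 1
```

A monotone trend produces a single word and therefore zero entropy, while a random walk, whose increment signs behave like fair coin flips, approaches the upper bound of one.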

## *3.2. ESC and the Complexity Feature Space*

Regarding the *forecastability* $\Omega(x_t)$ of a TS (i.e., the ability to determine a system's future states), complexity measures such as $H_{spct}$ define it as the complement of the average uncertainty of the process (given by its spectral density), such that $\Omega(x_t) = 1 - H^{*}$, where $H^{*}$ is the normalized version of $H_{spct}$ [24]. Other approaches, in contrast, express forecastability in terms of existing and new patterns. Accordingly, complexity may be defined as the relationship between stability and instability [30], information and disequilibrium [32], redundancy and new information [21], or emergence and self-organization [29]. In particular, emergence, self-organization, and complexity have been established as basic properties of complex systems [27]. Therefore, we extend the *ESC* paradigm using different entropy functions, namely $H_{spct}$, $H_{perm}$, and $H_{2reg}$. In this way, it is possible to (1) measure the average uncertainty given by a probability distribution under multiple quantizations, (2) estimate its complement, associated with *forecastability*, and (3) analyze the interplay between the two.
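As an illustration of $\Omega(x_t) = 1 - H^{*}$, the following Python sketch estimates the spectral density with a raw periodogram computed via `numpy.fft.rfft`. The periodogram estimator, the removal of the zero-frequency term, and the normalization by the logarithm of the number of frequency bins are assumptions made for this sketch; [24] may rely on a different (e.g., smoothed) spectral estimate. `spectral_forecastability` is a hypothetical name.

```python
import numpy as np


def spectral_forecastability(x):
    """Omega(x) = 1 - H*, with H* the normalized spectral entropy (sketch).

    Assumptions: the spectral density is a raw periodogram (squared rfft
    magnitudes with the zero-frequency term dropped) rescaled to sum to one,
    and H* divides its Shannon entropy by log(number of frequency bins).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)[1:]) ** 2
    if psd.sum() == 0:
        raise ValueError("constant series: spectral density is undefined")
    p = psd / psd.sum()
    p = p[p > 0]                                 # 0 * log 0 is taken as 0
    h_star = -np.sum(p * np.log(p)) / np.log(len(psd))
    return float(1.0 - h_star)


if __name__ == "__main__":
    t = np.arange(512)
    sine = np.sin(2 * np.pi * 8 * t / 512)       # 8 full cycles: one dominant bin
    noise = np.random.default_rng(1).normal(size=512)
    print(round(spectral_forecastability(sine), 3))   # close to 1
    print(round(spectral_forecastability(noise), 3))  # close to 0
```

A pure sinusoid concentrates its power in one frequency bin, so $H^{*}$ is near 0 and $\Omega$ near 1, whereas white noise spreads its power across all bins and yields $\Omega$ close to 0.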

Formally, the Emergence (E), Self-organization (S), and Complexity (C) of a TS, irrespective of the chosen entropy, are given by

$$E = K \cdot H^{p}(x_t), \tag{10}$$

$$S = 1 - E, \tag{11}$$

$$C = 4 \cdot E \cdot S, \tag{12}$$

where $H^{p}(x_t)$ is one of $H_{dist}$, $H_{spct}$, $H_{perm}$, or $H_{2reg}$, normalized so that $0 \leq E, S, C \leq 1$. This normalization is carried out by the constant $K = 1/\log_a(b)$, the reciprocal of the entropy of a uniform distribution over an alphabet of size $b$. When required, $E$, $S$, and $C$ for a particular entropy will be referred to with the entropy ID as a subscript; for instance, the $(E, S, C)$ tuple for $H_{spct}$ is denoted $(E_{spct}, S_{spct}, C_{spct})$.
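Equations (10)-(12) can be sketched in Python for any entropy whose symbol probabilities are available. The function name `esc` is hypothetical, and reading $H^{p}(x_t)$ as the Shannon entropy of those probabilities is an assumption of this sketch. With $K = 1/\log_a(b)$ the base $a$ cancels, so natural logarithms suffice.

```python
import math

import numpy as np


def esc(p, b):
    """Emergence, self-organization, and complexity, Eqs. (10)-(12).

    p: empirical probabilities of the symbols produced by an entropy scheme
       (zeros are allowed and ignored, since 0 log 0 = 0).
    b: alphabet size; K = 1 / log_a(b) normalizes E to [0, 1].
    Natural logarithms are used because the base a cancels in K * H.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    h = float(-np.sum(p * np.log(p)))  # Shannon entropy in nats
    E = h / math.log(b)                # Eq. (10): E = K * H
    S = 1.0 - E                        # Eq. (11)
    C = 4.0 * E * S                    # Eq. (12)
    return E, S, C


if __name__ == "__main__":
    print([round(v, 3) for v in esc([0.5, 0.25, 0.25, 0.0], b=4)])  # [0.75, 0.25, 0.75]
```

Since $C = 4E(1 - E)$, complexity reaches its maximum of 1 at $E = S = 0.5$ and vanishes at both extremes: a fully ordered series ($E = 0$) and a maximally uncertain one ($E = 1$).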

The feature space formed by these 12 measures ($E$, $S$, and $C$ for each of the four entropies) is called the Complexity Feature Space (CFS). Hence, any TS is mapped to a point in a 12-D space, and, given the aforementioned definitions of complexity, TS are expected to be grouped into specific regions of the CFS according to their *forecastability*. Notice that such a region will depend, in part, on the model used to forecast [21] as well as on the forecasting horizon. However, extracting information from the CFS about the relationship between forecastability and complexity requires an analysis tool. For that matter, visual tools based on a dimensionality reduction technique such as Principal Components Analysis (PCA) can be employed [18]. In this way, each TS in the CFS is displayed as a 2-D point whose coordinates correspond to two principal component axes. Although PCA leads to a loss of information due to its linear nature, the topological distribution of points is mostly preserved [18]. Finally, this feature space could be enriched by considering other entropy-based complexity measures, such as Transfer Entropy [37] or Tsallis Entropy [38], or other characteristics, such as trend, frequency, or seasonality [2,18,39,40].
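The projection of the 12-D CFS onto two principal components can be sketched in Python with NumPy's SVD. `cfs_to_2d` is a hypothetical helper rather than the authors' code, and the random matrix in the demo is placeholder input, not ESC values from any dataset.

```python
import numpy as np


def cfs_to_2d(features):
    """Project an (n_series, 12) CFS matrix onto two principal components.

    Minimal PCA via SVD of the column-centered matrix. Standardizing the
    columns first is an optional choice that the text does not specify.
    """
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each ESC measure
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                       # coordinates on PC1 and PC2
    explained = s[:2] ** 2 / np.sum(s ** 2)      # variance ratio kept by each PC
    return scores, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cfs = rng.uniform(size=(100, 12))            # placeholder values, not real data
    points, ratio = cfs_to_2d(cfs)
    print(points.shape, ratio.round(3))
```

The returned variance ratios indicate how much of the CFS spread the 2-D view retains, which helps judge the information loss mentioned above.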

## *3.3. Forecasting Methods: Smyl, Theta, ARIMA and ETS*

In the M4 Competition, 61 forecasting methods participated, and the shared dataset also contains the forecast values of the 25 best-performing methods. We selected four of them: Smyl's winning method and three classical benchmark methods (Theta, ARIMA, and ETS). Each is described below, in order of its final position in the competition.


ARIMA is used to forecast all datasets, i.e., both the synthetic and the M4 Competition TS, whereas the other three methods are used only for the M4 Competition TS.

## *3.4. Analyzing the Forecasting Performance in the CFS*

Figure 2 gives an overview of the steps followed to build the CFS and to analyze the relationship between TS forecastability and complexity.

The first step consists of gathering the TS; in our case, we tested the CFS using two types of datasets: synthetic and M4 Competition TS. Second, the parameters of the TS complexity measures, such as the alphabet size, are determined. Third, the *ESC* measures are calculated for every entropy function, producing a total of 12 measures per TS; this is repeated for each TS in the set (either synthetic or the selected M4 Competition TS). Fourth, each TS is forecast over its corresponding forecasting horizon, and the forecast error is measured with a performance metric. Fifth, the *ESC* measures of the dataset are reduced with PCA to build the CFS, which displays each TS in 2-D. Finally, the performance metric is displayed in the 2-D CFS to assess its relationship with the complexity measures. To aid this assessment, the performance metric is plotted by quartiles.
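The error measurement of the fourth step and the quartile grouping of the last step can be sketched in Python as follows. The text does not name its performance metric here, so the sketch assumes the symmetric MAPE (sMAPE), one of the accuracy measures of the M4 Competition; `smape` and `error_quartiles` are hypothetical helper names, and the demo series are made-up values.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE in percent, as used in the M4 Competition.

    Undefined when a true value and its forecast are both zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred))
    return 200.0 * np.mean(ratio)


def error_quartiles(errors):
    """Assign each series the quartile of its error (1 = lowest, 4 = highest)."""
    errors = np.asarray(errors, dtype=float)
    q1, q2, q3 = np.quantile(errors, [0.25, 0.5, 0.75])
    # right=True puts values equal to a cut point in the lower quartile.
    return np.digitize(errors, [q1, q2, q3], right=True) + 1


if __name__ == "__main__":
    errors = [smape([100, 102], [98, 105]), smape([50, 40], [80, 20]),
              smape([10, 11], [10, 11]), smape([7, 9], [6, 12])]
    print(error_quartiles(errors))  # [2 4 1 3]
```

The resulting quartile labels could then serve as the color of each TS's point in the 2-D CFS.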

**Figure 2.** Proposed method.
