**3. Methods**

To analyze the relationship between entropy-based features of a time series (TS) and the prediction errors of classical forecasting methods such as ARIMA, we build a feature space [18] based on Shannon entropy (*H*) features that can presumably be used to identify those TS instances where ARIMA forecasting errors are expected to be higher or lower. These features are based on four entropy-based complexity measures: the frequentist binning approach (*Hdist*) [22], 2-Regimes entropy (*H2reg*) [19], and Permutation entropy (*Hperm*) [23], all three built upon the notions of symbolic dynamics, and Spectral entropy (*Hspct*) [24], based on the analysis of the spectrum of a time series. The main difference between these measures is the *discretization* or *symbolization* used to describe the states of the dynamical system, which has a deep impact on the quantification of entropy [25]. For instance, consider a TS with hundreds of points sampled from a sine wave: with the frequentist binning approach, we will be studying a system whose probability distribution follows an arc-sine distribution, whereas if we represent it by symbols that correspond to 1-period waves, we will be studying a system that follows a Dirac delta probability distribution.
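To make the contrast concrete, the following is a minimal sketch (with illustrative parameter choices, not the exact pipeline of this work) showing how the same sine wave yields different entropy values under two of the symbolizations above:

```python
import numpy as np
from collections import Counter

t = np.linspace(0, 20 * np.pi, 2000)
ts = np.sin(t)

# Hdist: frequentist binning; the value histogram of a sine approaches an
# arc-sine distribution, so many of the 16 bins carry probability mass.
counts, _ = np.histogram(ts, bins=16)
p = counts / counts.sum()
h_dist = -np.sum(p[p > 0] * np.log2(p[p > 0]))

# Hperm-style ordinal patterns of order 3: a noiseless sine visits only a few
# of the 3! = 6 possible rank patterns, so its entropy stays low.
patterns = Counter(tuple(np.argsort(ts[i:i + 3])) for i in range(len(ts) - 2))
q = np.array(list(patterns.values()), dtype=float)
q /= q.sum()
h_perm = -np.sum(q * np.log2(q))

print(f"Hdist ~ {h_dist:.2f} bits, Hperm ~ {h_perm:.2f} bits")
```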

On the other hand, *Hdist* has been used to study a dynamical system in terms of the rate of Emergence (*E*) of new states or information, the rate of Self-organization (*S*) displayed as discernible patterns, and the interplay between these two, called Complexity (*C*), hereafter *ESC* for short [8]. In particular, systems with higher *C* concentrate their dynamics into a few highly probable states with many less frequent ones [8]. In this work, the *ESC* framework is extended to study the interplay between *E* and *S* for *Hspct*, *H2reg*, and *Hperm*.
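For reference, a hedged sketch of the *ESC* measures is given below, assuming the normalization commonly used in [8] (*E* as the normalized entropy, *S* as its complement, and *C* as their scaled product); the framework is formally introduced later in this section:

```python
import numpy as np

def esc(p):
    """Emergence E, Self-organization S and Complexity C of a distribution p,
    assuming E = H / H_max, S = 1 - E, C = 4 * E * S (cf. [8])."""
    p = np.asarray(p, dtype=float)
    n = p.size                        # alphabet size, including empty states
    nz = p[p > 0]                     # convention: 0 * log 0 = 0
    h = -np.sum(nz * np.log2(nz))     # Shannon entropy in bits
    h_max = np.log2(n)                # maximum entropy for n states
    e = h / h_max if h_max > 0 else 0.0
    s = 1.0 - e
    c = 4.0 * e * s                   # maximal when E = S = 0.5
    return e, s, c

# A system dominated by one highly probable state plus less frequent ones:
print(esc([0.7, 0.1, 0.1, 0.1]))
```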

Therefore, first, the Shannon-based complexity measures and the TS symbolization for each are presented; second, the *ESC* framework is introduced along with the Complexity Feature Space; third, the forecasting methods are briefly defined; finally, the proposed analysis methodology is detailed.

## *3.1. A Background on Entropies*

Entropy is a term with many meanings, but in information theory it usually refers to the average rate of uncertainty a process produces, which is measured by the well-known discrete Shannon entropy equation [4,26]:

$$H = -\sum_{i=1}^{n} p_i \log_a p_i, \tag{3}$$

where *H* stands for Shannon entropy, *n* is the number of symbols in the TS alphabet, *a* is the logarithm base, and *pi* is the probability of each symbol of the TS alphabet. It is worth noting that *information* may refer to the capacity of a channel for transmitting messages, the consequences of a message, the semantic meaning conveyed by it, and so on; entropy quantifies information regardless of its specific meaning [27]. Entropy-based measures are the first option when the task at hand is the quantification of the *complexity* of a time series [28]. However, what does *complexity* stand for?
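As a direct reading of Equation (3), a small hedged utility follows; the function name and the zero-probability convention are our own illustrative choices:

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution p, in units set by `base`
    (2 for bits, np.e for nats)."""
    p = np.asarray(p, dtype=float)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("p must sum to 1")
    nz = p[p > 0]                             # convention: 0 * log 0 = 0
    return -np.sum(nz * np.log(nz)) / np.log(base)
```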

Complexity science is a multidisciplinary field in charge of studying dynamical systems composed of several parts whose behavior is nonlinear and that can be studied neither by the laws of linear thermodynamics nor by modelling their parts in isolation [29,30]. A key aspect of these systems is that the interactions of individual parts heavily determine the future states of the overall system and induce spatial, functional, or temporal structures on their own (i.e., self-organization) [27]. Similarly, these systems are considered *open* since they exchange matter, energy, and information with their environments [27,30]. They are observed in a multitude of disciplines such as biology, ecology, economics, and linguistics, and it is common to study their dynamical behavior through the observation of one or more of their variables in the form of TS [31].

There are several measures of *complexity*, but at their core remains the notion of information together with some form of Shannon entropy formulation [8]. These two notions spawn a myriad of complexity measures; among them stand out *Hperm*, the Kolmogorov–Sinai (KS) complexity, *Hspct*, *H2reg*, Transfer entropy, LMC complexity, ε-complexity, *ESC*, and so on [8,27,31–33]. The diversity of such measures stems from the inexorable subjectivity of what shall be considered *complex*, which is translated into a specific *quantization* of a TS according to an observer's point of view [25,27].

In this work, *quantization* stands for the procedure to estimate the discrete probability distribution from a TS; in other words, how we describe the states of the system. In the classical *H*, continuous measurements are typically transformed into discrete states by binning them into non-overlapping ranges. To emphasize this form of entropy estimation *per se*, it will be referred to as *Hdist*, and *H* will be reserved for the concept. However, there are other ways in which we can discretize a time series into a probability distribution, and the choice must go hand in hand with the properties of the time series being analyzed. In Figure 1, a cartoon of the four symbolizations used in this work is shown.
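Two of these symbolic alphabets (the ordinal rank patterns and the first-derivative sign sequences of Figure 1C,D below) can be sketched as follows; the window lengths are illustrative assumptions, not the parameters used in this work:

```python
import numpy as np
from collections import Counter

def ordinal_symbols(ts, order=3):
    """Figure 1C: map each window of `order` points to its rank pattern."""
    return [tuple(np.argsort(ts[i:i + order])) for i in range(len(ts) - order + 1)]

def regime_symbols(ts, word=3):
    """Figure 1D: map each window to the signs of its first differences."""
    signs = ['+' if d >= 0 else '-' for d in np.diff(ts)]
    return [tuple(signs[i:i + word]) for i in range(len(signs) - word + 1)]

def entropy_from_symbols(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

ts = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.1 * np.random.randn(400)
print(entropy_from_symbols(ordinal_symbols(ts)),
      entropy_from_symbols(regime_symbols(ts)))
```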

**Figure 1.** Four possible characterizations of the states of a dynamical system. In (**A**), the frequentist binning approach; in (**B**), the spectral probability density of the TS estimated by the classical Fourier transform of the Auto-Correlation Function (ACF); in (**C**,**D**), symbolic transformations that define the alphabet by ordinal rank patterns and by sequences of the sign of the first derivative, respectively.
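Similarly, a hedged sketch of *Hspct* (Figure 1B) is given below; here the spectrum is estimated directly from the periodogram rather than from the ACF, which is an implementation choice on our part:

```python
import numpy as np

def spectral_entropy(ts, base=2.0):
    """Hspct sketch: Shannon entropy of the normalized power spectrum."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()                       # remove the DC component
    psd = np.abs(np.fft.rfft(ts)) ** 2        # periodogram (up to scaling)
    p = psd / psd.sum()                       # spectral probability density
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz)) / np.log(base)

# A pure sine concentrates its power in one frequency, so Hspct is near zero:
print(spectral_entropy(np.sin(np.linspace(0, 8 * np.pi, 256))))
```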
