1. Introduction
Information entropy was first proposed by Shannon in his seminal paper “A Mathematical Theory of Communication” [1]. This measure effectively assesses the amount of “surprise” (new information) contained in any given outcome of a random variable with a known distribution function. Originally formulated in the context of telecommunications, the technique has since been extended to a wide array of research fields, such as econometrics [2], computer theory [3], set theory [4], and medicine [5].
Several different entropy techniques and variants have been proposed since then. Tsallis’ entropy [6] and Rényi’s entropy [7] offer alternative formulations to Shannon’s entropy by modifying the overall weighting of the probability distribution. Other entropy techniques are characterized by the particular definition of the event set, such as approximate entropy [8] and sample entropy [9]. One noteworthy example is Permutation Entropy (PE) [10], which measures the distribution of the ordinal patterns of the signal instead of its cardinal values. This approach is robust to noise and computationally efficient. Moreover, it requires no prior knowledge of the signal’s internal structure to measure its information content.
Entropy techniques in general, and PE in particular, are almost never applied to the raw signal directly. More often, the signal is first treated by a filtering procedure, most notably for noise reduction or to study the signal’s dynamics in a particular frequency band. Some PE techniques, in particular multiscale PE [11] and its variants [12], implicitly incorporate a filtering step within their algorithms. Nonetheless, the applied filters are not free of unwanted effects, since any prior modification of the time series implies a modification of the final PE measurement. This effect has not been previously characterized in the literature; even the effect of linear filters, which are widely used in signal processing, is not well understood. We therefore need to accurately outline the impact of such preprocessing and separate its contribution to PE from the signal’s inherent information content.
In this work, we outlined the theoretical effect of linear filters on arbitrarily distributed time series. In particular, we proposed the characterization of a linear filter’s intrinsic PE, which explicitly describes the filter’s contribution to the final PE value, regardless of the original raw signal. This allows the correct interpretation of the PE of an arbitrary filtered time series, by discriminating the filter’s effect from the signal’s inherent information content.
The remainder of this article is organized as follows: Section 2 introduces the necessary background, including the definition of PE and the theoretical characteristics of PE for Gaussian time series. In Section 3, we develop the mathematical foundations of the linear filter’s intrinsic PE. In Section 4, we test our theoretical results by applying a variety of linear filters (lowpass, highpass, and bandpass) to different correlated and uncorrelated signal models. Finally, we summarize and discuss our findings in Section 5.
3. Methods
In this section, we develop the theory behind the intrinsic PE of linear filters. First, we briefly discuss the filtering process by means of the convolution operator, both in the time and frequency domains. We then apply an arbitrary linear filter to white Gaussian noise (wGn). By taking advantage of the frequency properties of wGn and the results from Section 2.2 [14], we obtain the theoretical effect of linear filters on the signal’s PE.
3.1. Filters, Power Spectrum, and Autocorrelation
For any given signal $x(t)$, we apply a filter function $h(t)$ by means of the convolution operation:

$y(t) = (x \ast h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau$, (5)

where $\tau$ is a real continuous variable of integration. If we apply the convolution theorem in the context of Fourier transforms [16], we can describe the filtered signal in the frequency domain as follows,

$Y(f) = X(f)\, H(f)$, (6)

where the Fourier transform is defined as:

$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt$, (7)

and $f$ is the frequency. In order to obtain the power spectrum of the signal $x(t)$, it is enough to compute $|X(f)|^2$, the square of the modulus of $X(f)$. Therefore, the filtered signal $y(t)$ will have a power spectrum of $|Y(f)|^2 = |X(f)|^2 |H(f)|^2$.
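As a quick numerical illustration of the convolution theorem in its discrete form (a sketch with an arbitrary input signal and a hypothetical 8-point moving average as the filter):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)  # arbitrary input signal
h = np.ones(8) / 8            # hypothetical filter: 8-point moving average

# Time domain: linear convolution y = x * h
y_time = np.convolve(x, h)

# Frequency domain: pointwise product of the Fourier transforms,
# zero-padded to the full linear-convolution length
n = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real

# Both routes give the same filtered signal
assert np.allclose(y_time, y_freq)
```

The zero-padding to length len(x) + len(h) − 1 is what makes the circular (FFT) convolution coincide with the linear one.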
We were interested in obtaining the normalized autocorrelation function of $y(t)$. By means of the Wiener–Khinchin theorem [17], we know the inverse Fourier transform of the power spectrum is indeed the autocorrelation function of the filtered signal,

$\rho_y(\tau) = k \int_{-\infty}^{\infty} |Y(f)|^2\, e^{2\pi i f \tau}\, df$, (8)

where $\tau$ is the continuous time lag and $k$ is a constant with a chosen value such that $\rho_y(0) = 1$. With the result of this theorem, we have a direct relationship between the Fourier transform of $y(t)$ and the autocorrelation function $\rho_y(\tau)$. If $x(t)$ is a Gaussian process, we can obtain the theoretical PE value of the filtered signal $y(t)$ using Equations (2)–(4) for low embedding dimensions.
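The Wiener–Khinchin relationship above can be checked numerically in its discrete, zero-padded form (a sketch; the signal below simply stands in for a filtered series):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(4096)  # stands in for a filtered signal
y = y - y.mean()
n = len(y)

# Wiener-Khinchin: the inverse Fourier transform of the power spectrum
# is the (unnormalized) autocorrelation; zero-padding to 2n turns the
# circular correlation into the linear one
power = np.abs(np.fft.fft(y, 2 * n)) ** 2
acf_wk = np.fft.ifft(power).real[:n]
acf_wk = acf_wk / acf_wk[0]    # choose k so that rho(0) = 1

# Direct estimator, sum over t of y[t] * y[t+k], for comparison
acf_direct = np.array([np.dot(y[: n - k] if k else y, y[k:])
                       for k in range(n)])
acf_direct = acf_direct / acf_direct[0]

assert np.allclose(acf_wk, acf_direct)
```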
3.2. Linear Filter’s Intrinsic Permutation Entropy
For the case of random signals, the power spectrum becomes a power spectral density by applying the expectation operator $E[\cdot]$. In the particular case of wGn, the power spectral density is flat and constant along all possible frequencies [16]. If $x(t)$ is a wGn process with variance $\sigma^2 = 1$, without loss of generality, the filtered signal’s power spectrum is

$E[|Y(f)|^2] = |H(f)|^2$, (9)

since the filter is a deterministic component. If we compute the filtered signal’s autocorrelation function using Equation (8),

$\rho_y(\tau) = k \int_{-\infty}^{\infty} |H(f)|^2\, e^{2\pi i f \tau}\, df$, (10)

where $k$ is a real number such that $\rho_y(0) = 1$.
The particular form of Equation (10) is revealing: under the assumption of uncorrelated white Gaussian noise, $\rho_y(\tau)$ is exclusively the autocorrelation function of the filter. If we insert $\rho_y(\tau)$ into Equation (3), we obtain Equation (11), where $T$ is the sampling period of the signal. By subsequently computing the remaining pattern probabilities using Equation (4), we obtain a theoretical PE value corresponding solely to the linear filter $h(t)$.
Even though the calculations initially required the use of wGn, the autocorrelation function and the resulting PE value are not functions of x(t). Therefore, any linear filter has a corresponding intrinsic permutation entropy value. Moreover, this PE value can be obtained analytically, as long as the inverse Fourier transform of $|H(f)|^2$ has a closed form.
Since Equation (10) makes no reference to the distribution of x(t), any uncorrelated signal will lead to the same result, even if it is not Gaussian. This implies that the filter’s intrinsic PE corresponds to the maximum permutation entropy possible under the restriction of said filter; therefore, any further reduction from this value must originate from the signal itself. If the effect of the filter is not taken into account, the final PE value can be mistakenly attributed solely to the signal’s dynamics.
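This distribution independence can be illustrated empirically. The sketch below (a hypothetical 5-point moving average filter and an order-3 PE estimator, assuming series long enough for the estimator bias [19] to be negligible) filters white Gaussian and white uniform noise and compares the resulting PE values:

```python
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, m=3):
    """Normalized Bandt-Pompe permutation entropy of order m."""
    counts = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - m + 1):
        counts[tuple(np.argsort(x[i:i + m]))] += 1
    total = sum(counts.values())
    h = -sum(c / total * math.log(c / total) for c in counts.values() if c)
    return h / math.log(math.factorial(m))

rng = np.random.default_rng(2)
h_filt = np.ones(5) / 5  # hypothetical lowpass filter: 5-point moving average

white = rng.standard_normal(20_000)
pe_raw = permutation_entropy(white)  # near the maximum of 1
pe_gauss = permutation_entropy(np.convolve(white, h_filt, "valid"))
uniform = rng.uniform(-1.0, 1.0, 20_000)
pe_unif = permutation_entropy(np.convolve(uniform, h_filt, "valid"))

# The filter lowers PE below the white-noise maximum...
assert pe_gauss < pe_raw
# ...and the reduction is statistically the same for either
# uncorrelated input distribution
assert abs(pe_gauss - pe_unif) < 0.02
```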
3.3. Academic Filter Example
In order to better illustrate the computation of a filter’s intrinsic PE, we used the filter model provided by Farina and Merletti [18]. In particular, for a given wGn time series, the authors applied a bandpass filter whose power spectrum is given in Equation (12), with f being the frequency variable, k a real constant, and $f_l$ and $f_h$ the low- and high-cutoff frequencies, respectively.

The information provided by Equation (12) is enough to provide a theoretical PE for the aforementioned filter. First, we obtained the normalized autocorrelation function (13) by means of the inverse Fourier transform of (12), again choosing the constant k in such a way that $\rho(0) = 1$.
In order to work with a discrete form of Equation (13), we needed to define the sampling period T, the inverse of the sampling rate. We subsequently used the autocorrelation function in (11) to obtain the pattern probability distribution using Equations (3) and (4). Finally, we obtained the corresponding PE using Equation (2).
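For embedding dimension 3, this pipeline can be written out explicitly. The closed form below uses the standard orthant-probability result for the increments of a stationary Gaussian process; it is our reconstruction of the computation, standing in for Equations (2)–(4), which are not reproduced here:

```python
import math

def pe_gauss_m3(rho1, rho2):
    """Theoretical normalized PE (order 3) of a stationary Gaussian
    process with autocorrelation rho(T) = rho1 and rho(2T) = rho2.

    P(X1 < X2 < X3) = P(D1 > 0, D2 > 0) for the increments
    D1 = X2 - X1 and D2 = X3 - X2, which is the Gaussian orthant
    probability 1/4 + arcsin(r) / (2*pi) with r = corr(D1, D2)."""
    r = (2 * rho1 - rho2 - 1) / (2 * (1 - rho1))
    p_mono = 0.25 + math.asin(r) / (2 * math.pi)  # ascending = descending
    p_rest = (1 - 2 * p_mono) / 4                 # remaining four patterns
    probs = [p_mono, p_mono] + [p_rest] * 4
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(6)

# Sanity check: uncorrelated noise recovers the maximum PE of 1
assert abs(pe_gauss_m3(0.0, 0.0) - 1.0) < 1e-9
# A 3-point moving average (rho1 = 2/3, rho2 = 1/3) lowers the PE
assert pe_gauss_m3(2 / 3, 1 / 3) < 1.0
```

Given the filter’s autocorrelation evaluated at lags T and 2T, this function returns the corresponding intrinsic PE for dimension 3.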
The results shown in Figure 1b correspond to the theoretical predictions using Equations (2) and (13). We observed that the lowest PE curves corresponded to filters with narrow bandwidths, since few frequency components imply a signal with low complexity.

In order to validate these results, we generated 1000 uncorrelated white Gaussian noise signals of fixed length. We applied the filter in (12) to each series, using the parameters shown in Figure 1a, and measured their PE directly using Equations (1) and (2). The results closely followed the curves in Figure 1b, within a 95% confidence interval. Note that, for shorter time series, we expected the experimental results to be slightly lower, since the PE estimator has a downward bias for a finite number of data points [19].
4. Results and Discussion
In this section, we test the results obtained in Section 3 by means of time series simulations for different models, including white Gaussian noise, white uniform noise, and two Autoregressive and Moving Average (ARMA) models. Each signal was subjected to a variety of lowpass, highpass, and bandpass linear filters. The resulting filtered signals were subjected to the classic PE procedure (2) and subsequently compared to the theoretical predictions.
4.1. Filters on Simulated White Noise
In order to test the precision of the filter’s intrinsic PE (11), we performed a series of tests on simulated signals. We created 1000 uncorrelated white Gaussian noise series of fixed length, sampled at a fixed sampling frequency. To each signal instance, we applied the filters specified in Table 1.
In Figure 2, we compare the theoretical PE (2), as a function of the cutoff frequency, with the PE values obtained from simulations, using the moving average (which is part of the multiscale PE algorithm [20]), Butterworth, and Chebyshev type I lowpass filters from Table 1. Figure 3 repeats this experiment for highpass filters. Figure 4 shows a Butterworth bandpass filter as a function of the low and high cutoff frequencies.
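A minimal sketch of how such filters can be constructed and applied (filter orders, ripple, window length, and frequencies below are illustrative assumptions, not the values in Table 1):

```python
import numpy as np
from scipy.signal import butter, cheby1, lfilter

fs = 1000.0  # sampling frequency in Hz (assumed)
fc = 100.0   # cutoff frequency in Hz (assumed)

# Lowpass filters of the three families compared here
b_bw, a_bw = butter(4, fc, btype="low", fs=fs)
b_ch, a_ch = cheby1(4, 1.0, fc, btype="low", fs=fs)  # 1 dB passband ripple
h_ma = np.ones(5) / 5  # moving average with window length L = 5

x = np.random.default_rng(4).standard_normal(10_000)
outputs = [lfilter(b_bw, a_bw, x), lfilter(b_ch, a_ch, x),
           lfilter(h_ma, [1.0], x)]

# Each lowpass filter induces positive lag-1 autocorrelation, which is
# the mechanism that lowers the permutation entropy of the output
for y in outputs:
    yc = y - y.mean()
    assert np.dot(yc[:-1], yc[1:]) / np.dot(yc, yc) > 0
```

Note that for the moving average, the effective cutoff can only change in discrete jumps as the integer window length L changes, unlike the continuously tunable Butterworth and Chebyshev designs.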
All proposed filters behaved almost identically with respect to their cutoff frequency, with the exception of the moving average filter in Figure 2b, which presented discontinuities. This was a consequence of the definition of the moving average filter, where the cutoff frequency is not an explicit parameter; instead, it is a discontinuous function of the window length L, which is always a positive integer. In the case of the bandpass filters in Figure 4, all curves behaved similarly, regardless of the filter used (hence, only the Butterworth is shown).
For all types of filters, the theoretical PE (2) curves lay within the confidence interval obtained from the simulations, at 95% confidence. As a general trend, all curves presented lower levels of PE when we reduced the passband bandwidth. This is straightforward to see for lowpass and highpass filters. Nonetheless, the bandpass filters’ PE curves are more difficult to interpret: lowering one cutoff frequency led to lower PE values, but variations of the other cutoff did not necessarily lead to monotonic changes in PE. This implies that both the width of the passband and the actual frequency values play an important role in the filter’s information content.
We obtained similar results with other types of uncorrelated noise, even when the Gaussian assumption of Bandt and Shiha [10] did not hold. For example, we repeated the previous experiment with white uniform noise. As we can see in Figure 5, the behavior of the PE (2) with respect to the cutoff frequency was identical to the white Gaussian noise case. This supported our claim from Section 3.2 that the particular distribution of the signal is not relevant to the computation of the filter’s intrinsic PE, as long as the signal is uncorrelated.
4.2. Filters on Correlated Gaussian Signals
In this section, we compute the PE of filtered correlated signals, in order to assess the contribution of the filter compared to the signal’s original PE. We used correlated Gaussian signals as a benchmark, since we could explicitly obtain theoretical PE values using Equations (4) and (11) for low embedding dimensions.
We proposed here the use of Autoregressive and Moving Average (ARMA) processes, since the resulting PE has a closed expression that depends only on the models’ parameters [21]. Moreover, ARMA processes are widely used to model complex phenomena, such as heart rate variability [22,23]. We simulated 1000 signals of fixed length for each of the following models:

- A moving average process of order two, MA(2), with its corresponding theoretical autocorrelation function;
- An autoregressive process of order two, AR(2), with its corresponding theoretical autocorrelation function.
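Both processes can be simulated as linear filters driven by Gaussian innovations. The sketch below uses hypothetical coefficients (the original models’ parameters are not reproduced here) and checks the simulated MA(2) lag-1 autocorrelation against its closed form:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
e = rng.standard_normal(200_000)  # Gaussian innovations

# Hypothetical coefficients; only the simulation recipe is shown
th1, th2 = 0.5, 0.25   # MA(2): y_t = e_t + th1*e_{t-1} + th2*e_{t-2}
ph1, ph2 = 0.4, 0.2    # AR(2): y_t = ph1*y_{t-1} + ph2*y_{t-2} + e_t

y_ma = lfilter([1.0, th1, th2], [1.0], e)
y_ar = lfilter([1.0], [1.0, -ph1, -ph2], e)

def acf1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# The MA(2) lag-1 autocorrelation has the closed form
# rho_1 = (th1 + th1*th2) / (1 + th1**2 + th2**2)
rho1 = (th1 + th1 * th2) / (1 + th1**2 + th2**2)
assert abs(acf1(y_ma) - rho1) < 0.02
assert acf1(y_ar) > 0  # positively correlated AR(2)
```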
Since these processes comply with the Gaussian conditions in [14], we knew precisely the theoretical PE (2) to expect for both models. The results are shown in Figure 6 and Figure 7. Figure 8 shows the PE difference between each filtered model’s signal and the maximum PE allowed by each filter, as a function of the cutoff frequency.

In Figure 6 and Figure 7, we observe that, in all cases, the PE of each model did not exceed the maximum theoretical curve of the filter’s intrinsic PE. This agreed with our theoretical results in Section 3. Furthermore, for both models, the PE measurements obtained from the simulated series agreed (within a 95% confidence interval) with the theoretical expectations, where the theoretical PE values were obtained by computing the autocorrelation function of the filtered signal with Equation (8) and calculating the pattern probability distribution using Equations (3) and (4) for low embedding dimensions.
Figure 8 shows the explicit PE difference obtained by applying the lowpass filters in Table 1 to the MA(2) and AR(2) models (Figure 8a,b, respectively). For these particular configurations, we clearly observe cutoff frequencies where the PE difference is maximal. By applying lowpass filters with the corresponding normalized cutoff frequency (relative to the Nyquist frequency) to the MA(2) model, we were able to identify the bandwidth that contained the most useful information. The same was true for the AR(2) model, at a different normalized cutoff frequency.
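The procedure behind this comparison can be sketched as follows: sweep the cutoff of a lowpass filter and, at each cutoff, compare the PE of filtered white noise (the filter’s intrinsic, maximum PE) with the PE of the filtered correlated model. All coefficients and cutoffs below are illustrative assumptions:

```python
import math
from itertools import permutations
import numpy as np
from scipy.signal import butter, lfilter

def pe3(x):
    """Normalized order-3 permutation entropy."""
    counts = {p: 0 for p in permutations(range(3))}
    for i in range(len(x) - 2):
        counts[tuple(np.argsort(x[i:i + 3]))] += 1
    total = sum(counts.values())
    h = -sum(c / total * math.log(c / total) for c in counts.values() if c)
    return h / math.log(6)

rng = np.random.default_rng(5)
white = rng.standard_normal(20_000)
# Hypothetical AR(2): y_t = y_{t-1} - 0.5*y_{t-2} + e_t
ar2 = lfilter([1.0], [1.0, -1.0, 0.5], rng.standard_normal(20_000))

# Sweep normalized cutoffs (as a fraction of the Nyquist frequency);
# the gap between the two PE values is the signal's own contribution
diffs = {}
for wn in (0.2, 0.4, 0.6, 0.8):
    b, a = butter(4, wn, btype="low")
    diffs[wn] = pe3(lfilter(b, a, white)) - pe3(lfilter(b, a, ar2))

best = max(diffs, key=diffs.get)  # cutoff with the largest PE difference
assert diffs[best] > 0
```

The cutoff maximizing the difference marks the band where the model’s ordinal information is concentrated, mirroring the analysis in Figure 8.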
4.3. Discussion
When we empirically apply the PE algorithm to an arbitrary signal, we know that the pattern probability distribution is related to the signal’s autocorrelation function. If the signal is Gaussian, we can describe this distribution with a closed expression for low embedding dimensions, using the techniques already discussed [10]. The relationship is explicit only when the Gaussianity condition is satisfied; nonetheless, autocorrelation and pattern distribution remain closely related for non-Gaussian signals, albeit not in closed form. In the absence of a filter, the dynamics captured by the PE correspond exclusively to the intrinsic characteristics of the signal.
The same cannot be said when we apply any type of preprocessing to the signal. By applying a filter, we modify the autocorrelation function and, thus, the final PE result. The current literature acknowledges this effect, but does not offer an explicit estimation of the filter’s contribution to the final PE of the signal.
By obtaining the intrinsic maximum PE of any given linear filter using Equation (10), we can confidently identify how much said filter reduces the PE measurement. Since this reduction describes the entropy of filtered uncorrelated white noise (Gaussian or otherwise), any further reduction in our measured PE must come from the signal itself; the intrinsic PE is, in fact, the maximum PE attainable under the constraint of the defined filter.
The results from Figure 8 require further discussion. The presence of a clear maximum PE difference at specific cutoff frequencies implies the existence of specific bandwidths where most of the ordinal information content can be found. This, of course, depends on the model and characteristics of the signal, as well as the proper choice of the filter. It nevertheless provides a useful reference when we need to identify the most relevant frequency regions for further analysis.
In practice, the signal’s internal dynamics are not the only source of PE deviation from the filter’s intrinsic PE. We still need to identify the contribution of the PE bias [19] to the final PE measurement. As long as we have a sufficiently long signal and a reasonably low embedding dimension, we expect this bias to be low, or even negligible, compared to the contribution from the dynamics of the filtered signal.
5. Conclusions
In this article, we outlined and characterized the effect of linear filter preprocessing on the Permutation Entropy (PE) of time series. First, by means of the Wiener–Khinchin theorem [17] and the ordinal pattern symmetries of Gaussian processes [14], we developed the theory of the general linear filter’s intrinsic PE for low embedding dimensions. Next, to test the theoretical results, we applied a series of lowpass, highpass, and bandpass linear filters to uncorrelated signals, namely white Gaussian noise and white uniform noise. Finally, we applied said filters to two Autoregressive and Moving Average (ARMA) processes of order two, in order to test the resulting PE for signals with intrinsic dynamics.
When plotting PE as a function of the filter’s cutoff frequency, we found our theoretical results to match the simulations in all cases, within a 95% confidence interval. For the uncorrelated noise, we found no difference in the resulting curve, regardless of the signal’s distribution (Gaussian or uniform), for any given filter. This supported the theoretical result establishing that the filter’s intrinsic PE curve is independent of the distribution of an uncorrelated signal. In the case of the filtered ARMA processes, we observed lower PE values than those of the filtered uncorrelated series. Moreover, the difference between the two PE values was dependent on the cutoff frequency. When plotting said entropy difference, we found clear maxima for both ARMA models, with small variations across the different filters used.
The filtered autoregressive and moving average signals merit further attention. We can interpret the intrinsic PE curve as the maximum entropy allowed by the filter, achieved only when the filter is applied to uncorrelated noise. Any downward deviation from this curve corresponds to information contained within the signal, and this PE difference depends on the cutoff frequency. For the particular ARMA models used in the present work, we observed clear maximum PE differences, which suggests an optimal passband containing most of the ordinal information of the system. This may not be the case for an arbitrary signal, but the maximum PE difference, when present, leads naturally to the proper choice of passband for extracting the maximum amount of ordinal information.
With the results presented in this article, we are confident that we can properly interpret the PE of a filtered signal. When using a linear filter, we can separate the different sources of predictability: part comes from the filter preprocessing, and the rest comes from the underlying information contained in the signal. This holds for general preprocessing filters, as well as for cases where the filtering step is part of the PE algorithm itself, such as the multiscale PE variants [12,20]. It is important to note that, in practice, the PE estimator is biased, and this bias acts as an additional, undesired source of apparent information. For sufficiently long signals, the bias is negligible, but in the worst-case scenario, it should also be taken into account for a proper interpretation of the phenomenon. Several challenges remain in the understanding of filters’ effects on ordinal information, in particular the characterization of the forbidden patterns [24] of filtered signals. This is a topic for future research to explore.