1. Introduction
The term “entropy” is used extensively in the modern scientific literature. Originating in the 19th-century theory of thermodynamics [1], the concept is now, to a near bewildering extent, used in one form or another across many sciences. For example, entropy is at the foundation of information theory [2] and is of crucial use in computer science [3,4]. The neurosciences likewise make use of entropy, both as a tool to characterize and interpret data from brain scans [5] and, more fundamentally, in theories of the dynamics of the brain and mind [6]. Generally speaking, entropy is a fundamental notion in complexity science [7]. Here, we present a brief review of some recent mathematical developments in the theory of entropy.
In mathematical terms, an entropy is a functional defined on a space of probability distributions associated with a W-dimensional event space. Thus, we use the word “entropy” with the same meaning it assumes, for instance, in the case of Rényi’s entropy, without direct reference to thermodynamics. From this perspective, the relevance of entropies is clear: they can be regarded as analytic tools that help in the analysis of the interdependencies within the system behind a given event space. Their use in the information-theoretic analysis of time series is likewise natural. The connection between entropies as mathematical functionals and the thermodynamic entropy of Clausius, defined in terms of heat transfer, is much less immediate. Here, we concentrate on the mathematical aspects of entropies as functionals and only make a few comments on the possible connection to thermodynamics.
The first question to tackle is which functional forms yield useful entropies, i.e., how to limit the infinite number of possible choices. It is well known that the Boltzmann–Gibbs–Shannon form
S[p] = −k ∑_{i=1}^{W} p_i ln p_i        (1)
(we assume k = 1) is the unique possibility if one assumes that the entropy must satisfy the four Shannon–Khinchin (SK) axioms [8]:
- (SK1) (Continuity). The function S(p_1, …, p_W) is continuous with respect to all its arguments.
- (SK2) (Maximum principle). The function takes its maximum value on the uniform distribution p_i = 1/W.
- (SK3) (Expansibility). Adding an event of zero probability to a probability distribution does not change its entropy: S(p_1, …, p_W, 0) = S(p_1, …, p_W).
- (SK4) (Additivity). Given two statistically independent subsystems A, B of a statistical system, S(A × B) = S(A) + S(B).
Therefore, to derive entropies of a functional form different from the one in Equation (1), it is necessary to go beyond the four SK axioms. Various strategies in this respect have been adopted in the literature.
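Before moving on, here is a quick numerical illustration of the additivity property (SK4) for the Boltzmann–Gibbs–Shannon entropy in Equation (1); this is a minimal sketch, and the two distributions are arbitrary examples chosen for illustration:

```python
import numpy as np

def shannon_entropy(p):
    """Boltzmann-Gibbs-Shannon entropy S = -sum_i p_i ln p_i (k = 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # expansibility: zero-probability events drop out
    return -np.sum(p * np.log(p))

# Two independent subsystems A and B with arbitrary distributions.
pA = np.array([0.5, 0.3, 0.2])
pB = np.array([0.6, 0.1, 0.1, 0.2])

# Joint distribution of the combined system A x B: p_ij = p_i * p_j.
pAB = np.outer(pA, pB).ravel()

# Additivity (SK4): S(A x B) = S(A) + S(B) for independent systems.
print(shannon_entropy(pAB), shannon_entropy(pA) + shannon_entropy(pB))
```

The two printed values coincide, as SK4 requires for statistically independent subsystems.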
Let us start by recalling Constantino Tsallis’s elegant observation [9] that the formula
S_q[p] = (1/(q − 1)) (1 − ∑_{i=1}^{W} p_i^q)        (2)
provides a possible generalization of Boltzmann’s entropy. This is the case in the sense that S_q is a functional on the space of probability distributions p = (p_1, …, p_W), and in the limit q → 1, the entropy S_q becomes equal to the Boltzmann–Gibbs–Shannon entropy in Equation (1). Tsallis’s 1988 article [9] has inspired a tremendous effort to generalize Boltzmann’s entropy in different scenarios, including what we will review in this paper. Tsallis pointed out that the entropy S_q fulfills a specific procedure for combining independent systems, which can be seen as a generalization of the additivity property (SK4) of the Boltzmann–Gibbs–Shannon entropy. In particular, Tsallis suggested that the free parameter q should be determined by requiring that S_q for a given system is extensive (for a recent reference to Tsallis’s argument, see [10]), i.e., that in the uniform case where the probabilities are p_i = 1/W for all i, the entropy grows in proportion to N as N → ∞, where N denotes the number of components in the system under analysis. For clarity, we note that when considering physical systems, the entropy may become volume-dependent; for example, because the number of available states, W, depends on the volume. Volume dependence can also enter through the probabilities p_i determined by the maximum entropy principle.
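To make Tsallis’s extensivity argument concrete, the following minimal Python sketch assumes a power-law growth W(N) = N^ρ of the state space (ρ and the values of N are arbitrary illustrative choices) and shows that the choice q = 1 − 1/ρ renders S_q extensive on the uniform ensemble, while the Boltzmann–Gibbs–Shannon entropy is not:

```python
import numpy as np

def tsallis_uniform(W, q):
    """Tsallis entropy S_q of the uniform distribution over W states (k = 1)."""
    return (W**(1.0 - q) - 1.0) / (1.0 - q)

def bgs_uniform(W):
    """Boltzmann-Gibbs-Shannon entropy of the uniform distribution: ln W."""
    return np.log(W)

rho = 2.0                      # assumed power-law growth W(N) = N**rho
q = 1.0 - 1.0 / rho            # Tsallis's extensivity choice for this class

for N in [10, 100, 1000, 10000]:
    W = N**rho
    print(N, tsallis_uniform(W, q) / N, bgs_uniform(W) / N)
# S_q / N approaches the constant rho, while S_BGS / N -> 0:
# the q-entropy is extensive on this state space, the BGS entropy is not.
```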
Although the Tsallis entropy does not fulfill axiom SK4 in its original form and hence is non-additive, it does satisfy a composition relation different from addition.
Another set of non-Boltzmann–Gibbs–Shannon entropies was derived by Hanel and Thurner [11] by simply discarding axiom SK4 and then determining the functional form from the asymptotic behavior of the number of events, or states, W, as a function of the number of components in the system. However, in this approach, there is no systematic rule for handling the computation of the entropy of a system consisting of independent parts.
In physics, the investigation of required symmetries has often proved helpful: think of Einstein’s use of the symmetry between different reference frames in deriving the special and general theories of relativity, of the eightfold way and the derivation of QCD, or of the application of symmetry and group theory to atomic spectra. Therefore, rather than discarding the fourth SK axiom, it seems natural to replace it in a way informed by the symmetry properties (and related group-theoretic restrictions) that an entropy must necessarily satisfy. Consequently, the question is: which symmetry cannot be ignored when dealing with entropies?
The fourth SK axiom addresses how the entropy of a system A × B, consisting of two independent parts A and B, can be expressed as a sum of the entropy of A and the entropy of B. The Tsallis entropy also allows the entropy of A × B to be expressed in terms of the entropies of the two parts, not as a sum but as a generalized combination of the entropies of the parts.
The notion of group entropy, introduced in [12,13], goes one step further and exploits the idea that the process of combining two independent systems can be seen as a group operation. The group entropies satisfy the first three SK axioms, as well as a fourth one, which consists of a generalized composition procedure making use of formal group theory [14,15].
This approach leads to axiomatically defined entropies whose composition rule is defined in terms of the generator G(t) of a suitable formal group law, namely S(A × B) = Φ(S(A), S(B)) with Φ(x, y) = G(G⁻¹(x) + G⁻¹(y)). Although this restricts the allowed functional forms available for an entropy, it does not uniquely determine the entropy in the way the four SK axioms do. Below, we will discuss how the analysis of combining independent systems using formal group theory, together with the requirement of extensivity, allows for a systematic classification of entropies into “universality classes”. These classes are defined by taking into account how fast the number of available states W grows with the number of components N. We accomplish this by starting with a more general, non-trace functional form than the one given in Equation (2). For details, see [12].
By generalizing the Tsallis functional form and requiring composability together with extensivity, we are able to regard Tsallis entropy as the composable entropy associated with systems where interdependence between their components forces W to grow slowly, namely as a power of N. Below, we will discuss why composability on the entire probability space is an indispensable property of an entropy. We will also address the need for extensivity in a general sense. We will point out that extensivity can be very relevant even beyond the thermodynamic need for a well-defined limit as the number of components approaches infinity. For example, extensivity is essential for using an entropy as a measure of the complexity of a time series or for handling power-law probability distributions.
2. Why Composability
The need for composability does not arise because real systems can always be regarded as a simple combination of subsystems. The requirement is, in a sense, a logical necessity [16]. When we consider two independent systems with state spaces A and B, we should obtain the same result whether we compute the entropy of the Cartesian combined system A × B directly or first compute the entropies of A and B separately and afterward decide to consider them as one combined system. By Cartesian combination, we mean that the system A × B is given by the set of states (a_i, b_j), with the probabilities for the individual states given by p_{ij} = p_i^A p_j^B. This Cartesian combination immediately suggests familiar properties from group theory. The composition is as follows:
Commutative: Since we have state spaces in mind, we consider A × B and B × A to be the same set of states; the ordering is immaterial.
Associative: (A × B) × C = A × (B × C).
“Neutral” element: A × B ≅ A if B contains a single state. In other words, A × B is essentially the same set as A if B consists of one element only. In terms of state spaces, all sets with one state only are considered to be identical, that is, indistinguishable. In this sense, a unique “neutral” element exists in our composition process. Accordingly, we want the entropy of a probability distribution on a set containing a single element to be zero: indeed, it would correspond to a configuration of certainty. Moreover, we want the entropy of A × B to be equal to the entropy of A if the entropy of B is zero.
The group structure of the Cartesian combination of the event spaces of systems must also be respected by the entropy functional operating on the corresponding probability spaces. This can be ensured by employing formal group theory [16]. Define the entropy using the expression in Equation (3), where the function G is the “group generator”. Here, the formal power series G(t) is said to be the group exponential in the formal group literature [15]. The combination of independent systems is now expressed as S(A × B) = Φ(S(A), S(B)), where the function Φ is given by Φ(x, y) = G(G⁻¹(x) + G⁻¹(y)). Given a formal group law, namely when Φ(x, y) is a suitable formal power series in two variables, it is possible to prove that there exists a one-variable formal power series φ(x) such that Φ(x, φ(x)) = 0. This series represents the formal inverse and completes the group structure.
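As an illustration of this composition machinery, the sketch below assumes the group exponential G(t) = (e^{(1−q)t} − 1)/(1 − q) commonly associated with the Tsallis entropy (an assumption made for the example; the value of q and the distributions are arbitrary). It builds Φ(x, y) = G(G⁻¹(x) + G⁻¹(y)) and checks numerically that, for independent systems, it reproduces the familiar Tsallis composition rule x + y + (1 − q)xy:

```python
import numpy as np

q = 0.7                                   # illustrative value of the deformation parameter

def G(t):
    """Assumed group exponential associated with the Tsallis entropy."""
    return (np.exp((1.0 - q) * t) - 1.0) / (1.0 - q)

def G_inv(s):
    """Inverse series of G (the 'group logarithm')."""
    return np.log(1.0 + (1.0 - q) * s) / (1.0 - q)

def Phi(x, y):
    """Formal group law Phi(x, y) = G(G^{-1}(x) + G^{-1}(y))."""
    return G(G_inv(x) + G_inv(y))

def tsallis(p):
    """Tsallis entropy S_q[p] = (1 - sum_i p_i^q) / (q - 1)."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p**q)) / (q - 1.0)

pA = np.array([0.5, 0.3, 0.2])
pB = np.array([0.6, 0.4])
pAB = np.outer(pA, pB).ravel()            # independent combination, p_ij = p_i^A p_j^B

SA, SB = tsallis(pA), tsallis(pB)
print(tsallis(pAB))                        # entropy of the combined system
print(Phi(SA, SB))                         # composition via the formal group law
print(SA + SB + (1.0 - q) * SA * SB)       # the familiar Tsallis composition rule
```

All three printed values agree, showing that the formal group law encodes how the entropy of the Cartesian combination is obtained from the entropies of the parts.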
3. Why Extensivity
Let us first recall the definitions of extensivity and additivity. We say that an entropy is extensive if the entropy per component is finite in the limit of infinitely many components, i.e., if lim_{N→∞} S(N)/N exists and is finite. An entropy is additive if, for two statistically independent systems A and B, the entropy of the two systems considered as a combined system is equal to the sum of the entropies, i.e., S(A × B) = S(A) + S(B).
When considering the thermodynamics of macroscopic systems with the number of constituents, N, of the order of Avogadro’s number, the usual procedure is to compute quantities such as the thermodynamic free energy F for an arbitrary value of N. The limit of large, essentially infinite systems is then handled by considering intensive quantities, e.g., the free energy per constituent. Hence, for an entropy to be thermodynamically useful, it needs to be extensive, given the fundamental thermodynamic relation F = U − TS. Since the temperature T is an intensive quantity, the entropy must be extensive. Thus, we need the limit lim_{N→∞} S(N)/N to be well defined.
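As a simple worked check of these definitions on the uniform ensemble p_i = 1/W (a standard computation, added here for illustration), the Boltzmann–Gibbs–Shannon entropy is extensive precisely when the state space grows exponentially:

\[
S_{\mathrm{BGS}} = \ln W(N), \qquad
W(N) = k^{N} \;\Rightarrow\; \frac{S_{\mathrm{BGS}}}{N} = \ln k \;\; \text{(finite, extensive)}, \qquad
W(N) = N^{a} \;\Rightarrow\; \frac{S_{\mathrm{BGS}}}{N} = \frac{a \ln N}{N} \to 0 .
\]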
Outside thermodynamics, entropy finds a significant application within information theory as a tool to characterize the complexity of a deterministic or random process generating a time series. More precisely, we can associate with a time series an ordinal representation formed by all the ordinal patterns of length L assigned to it [17]. Assuming that all different patterns are allowed for a process and occur with equal probability, we have W = L! patterns, each occurring with probability p_i = 1/L!. The Boltzmann–Gibbs–Shannon entropy in Equation (1) is then given by S = ln L!. So, we obtain a diverging entropy rate S/L ≈ ln L − 1 as the pattern length L increases. As we will see, this is a common situation, since random processes exhibit super-exponential growth in the number of permitted patterns. Again, extensivity enters into play: we need an entropy that grows in proportion to the pattern length, whatever the number of allowed patterns in the considered time series.
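A quick numerical illustration of this divergence, assuming the white-noise case in which all L! ordinal patterns are equally probable:

```python
import math

# For white noise all L! ordinal patterns of length L are allowed and equiprobable,
# so the Boltzmann-Gibbs-Shannon entropy of the pattern distribution is ln(L!).
for L in [5, 10, 20, 40, 80]:
    S = math.lgamma(L + 1)          # ln(L!) computed stably
    print(L, S / L)                 # the "entropy rate" S/L keeps growing (~ ln L - 1)
```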
The widespread occurrence of power-law probability distributions in nature, either exact or approximate, has long been the focus of self-organized criticality (for an overview, see [18,19]). It is now clear that power-law distributions with fat tails are common, and for this reason, it seems natural to consider the extent to which the workhorse of information theory, the Shannon entropy, can be used as a meaningful entropic measure for such distributions.
Consider a probability distribution of the following form:
p(s) = A s^{−a}.
Here, A is a normalization factor and a is a positive exponent. The variable s denotes the “size” of some process, e.g., an avalanche or a structure such as a spatial cluster. When the number of available states W grows with N, the usual Boltzmann–Gibbs–Shannon entropy will, in general, not allow a well-defined limit of S/N as N → ∞.
4. The Structure of the Group Entropies
Here, we explain why the expression in Equation (3) is a good starting point for deriving generalized entropies. First, we address why we choose the argument of the generating function G to depend on the probabilities only through the quantity ∑_{i=1}^{W} p_i^α. We also comment on the so-called trace form of the group entropies, given in Equation (8). Finally, we briefly recapitulate how the functional form of G is determined by reference to formal group theory and the requirement that the entropy is extensive on the uniform (also denoted as the microcanonical) ensemble, given by p_i = 1/W for i = 1, …, W.
The structure of Equation (3) is used as the starting point because G being a group generator ensures composability for all distributions p, not only the uniform ones. Moreover, this choice of argument enables the functional form to generate a range of well-known entropies, including the Boltzmann–Gibbs–Shannon, Rényi, and Tsallis entropies [12]. More specifically, suitable choices of the generator G recover the Boltzmann–Gibbs–Shannon entropy, Rényi’s entropy, and Tsallis’s entropy in the appropriate (double) limits of the parameters involved.
4.1. Extensivity and the Group Law
Let us now briefly describe how the requirement of extensivity determines the group law in Equation (3). Details can be found in [20,21]. For a given dependence W(N) of the number of available states, we want to ensure that the entropy given in Equation (3) is extensive, i.e., that on the uniform ensemble p_i = 1/W(N) for i = 1, …, W(N) we have S ∝ N as N → ∞.
We can express this requirement explicitly on the uniform ensemble and consider its asymptotic form for large N. Then, we invert the relation between S and G, which, by Equation (12), amounts to inverting the relation between G and N. For G to generate a group law, we must require G(0) = 0 [12,16], so we adjust the expression for G accordingly and conclude that the generator is fixed by the asymptotic behavior of W(N), up to a few free parameters. Hence, given the asymptotic behavior of W(N), we derive different corresponding entropies. In the expressions below, the remaining constants are free parameters.
- (I) Power law, W(N) ∼ N^a.
- (II) Exponential, W(N) ∼ k^N. The corresponding entropy is, of course, the Rényi entropy (a short extensivity check for this class is given after the list).
- (III) Super-exponential, with W(N) growing faster than any exponential. The corresponding entropy was recently studied in relation to a simple model in which the components can form emergent paired states in addition to the combination of single-particle states [22].
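As a brief check (a standard computation on the uniform ensemble p_i = 1/W, added for illustration), the Rényi entropy is indeed extensive in the exponential class:

\[
R_\alpha = \frac{1}{1-\alpha}\,\ln \sum_{i=1}^{W} p_i^{\alpha}
\;\xrightarrow{\,p_i = 1/W\,}\;
\frac{1}{1-\alpha}\,\ln W^{\,1-\alpha} = \ln W ,
\qquad
W(N) = k^{N} \;\Rightarrow\; \frac{R_\alpha}{N} = \ln k .
\]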
So far, we have only considered the so-called non-trace form of the group entropies given in Equation (3). A set of entropies can be constructed in the same manner, starting with the trace-form ansatz in Equation (8).
4.2. Trace-Form Group Entropies
It is interesting to observe that the ansatz in Equation (8) directly leads to either the Boltzmann, the Tsallis, or an entirely new entropy, depending on the asymptotic behavior of W(N). By applying the procedure described in Section 4.1, we obtain the following three classes, corresponding to the ones considered for the non-trace case.
- (I) Power law, W(N) ∼ N^a. To emphasize the relation with the Tsallis q-entropy, we have introduced the parameter q. Note that q is determined by the exponent a, so it is controlled entirely by W(N).
- (II) Exponential, W(N) ∼ k^N. This is the Boltzmann–Gibbs–Shannon entropy.
- (III) Super-exponential, with W(N) growing faster than any exponential.
4.3. Examples of Systems and Corresponding Group Entropies
To illustrate the classification of group entropies based on the asymptotic behavior of W(N), we consider three Ising-type models:
- (a) The Ising model on a random network [11].
- (b) The usual Ising model, for example, with nearest-neighbor interactions on a hyper-cubical lattice.
- (c) The so-called pairing model, in which Ising spins can form paired states [22].
Let E denote the total energy of the system. We are interested in the asymptotic behavior of the number of possible states for the three models as a function of N for fixed energy per component E/N. First, consider (a). As explained in [11], W(N) grows as a power law, W(N) ∼ N^a, when the fraction of interaction links, the connectance, in the considered network is kept constant as the number of nodes N is increased. The exponent a is given by the ratio between the energy density and the connectance. The entropy corresponding to this functional form of W(N) is, for all values of the exponent a, given by the Tsallis entropy [21].
The entropy corresponding to the standard Ising model (case (b)), with exponentially growing W(N), is the Boltzmann–Gibbs–Shannon entropy. The pairing version of the Ising model (case (c)) admits a super-exponential growth in the number of states W(N), leading us to a new functional form of the entropy [22].
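To get a feeling for how different these growth regimes are, the following sketch compares ln W(N) for purely illustrative growth laws; the power-law exponent, the base of the exponential, and the super-exponential form N^{N/2} are stand-ins chosen for illustration, not results taken from the models above:

```python
import math

def log_W_power(N, a=3.0):        # illustrative power-law class, W = N**a
    return a * math.log(N)

def log_W_exp(N, k=2.0):          # illustrative exponential class, W = k**N
    return N * math.log(k)

def log_W_super(N):               # illustrative super-exponential stand-in, W = N**(N/2)
    return 0.5 * N * math.log(N)

for N in [10, 100, 1000]:
    print(N,
          log_W_power(N) / N,     # -> 0        (ln W grows sub-extensively)
          log_W_exp(N) / N,       # -> ln 2     (ln W grows extensively)
          log_W_super(N) / N)     # -> infinity (ln W grows super-extensively)
```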
5. Group Entropies and the Ubiquity of the Q-Exponential Distribution
It is well known that the q-exponential form related to the Tsallis q-entropy provides a very good fit to an impressively broad range of data sets (see, e.g., [10]). This may, at first, appear puzzling, given that we saw in Section 4.1 that the Tsallis entropy corresponds to only one of the three classes considered here, namely systems with an interdependence between the components so strong that W(N) grows only as a power law. The reason that the q-exponential appears to be much more pervasive than one would expect, given that the q-entropy is restricted to the case W(N) ∼ N^a, may be the following.
Consider the maximum entropy principle. For all the classes of entropies considered in Section 4, the probability distribution that maximizes the entropy is a q-exponential. The probability distribution for the specific case of the exponential class is the usual exponential Boltzmann distribution. But since the Boltzmann distribution is the limiting case of the q-exponential for q → 1, we can say that, independently of the asymptotic behavior of W(N), the maximum entropy principle always leads to q-exponential distributions [21].
How can this be? The reason is the functional form of the argument of the ansatz in Equation (3), which involves the probabilities only through ∑_i p_i^α. When one applies Lagrange multipliers and extremizes the entropy in Equation (3), the q-exponential functional form arises from the derivative ∂(∑_j p_j^α)/∂p_i = α p_i^{α−1}. The remaining factors in the expression for the derivative of the entropy depend on the functional form of the group law, but they are formally just constants when evaluated on the maximizing distribution and do not depend explicitly on p_i.
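For reference, the q-exponential mentioned here has the form e_q(x) = [1 + (1 − q)x]_+^{1/(1−q)}, which reduces to the ordinary exponential as q → 1; a minimal numerical check (the values of x and q are arbitrary):

```python
import numpy as np

def q_exponential(x, q):
    """q-exponential e_q(x) = [1 + (1 - q) x]_+^(1/(1 - q)); e_1(x) = exp(x)."""
    if np.isclose(q, 1.0):
        return np.exp(x)
    base = np.clip(1.0 + (1.0 - q) * x, 0.0, None)   # [.]_+ : cut off at zero
    return base ** (1.0 / (1.0 - q))

x = np.linspace(-2.0, 2.0, 5)
for q in [0.5, 0.9, 0.999]:
    print(q, np.max(np.abs(q_exponential(x, q) - np.exp(x))))
# The deviation from exp(x) shrinks as q -> 1, recovering the Boltzmann form.
```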
6. An Entropic Measure of Complexity
Fully interacting complex systems possess a number of microstates W that may be different from the Cartesian exponential case W = k_1 k_2 ⋯ k_N, where k_i is the number of states available to component number i in isolation. When interactions freeze out states, W can grow more slowly than exponentially with increasing N. In contrast, when interactions allow for the creation of new states from the combination of components, W can grow faster than exponentially. As an example, think of hydrogen atoms H that form hydrogen molecules H₂. The states of H₂ are not just the Cartesian product of the states of free single hydrogen atoms.
The possible difference between the cardinality of the state space of the fully interacting system and that of the state space formed as a Cartesian product of the states available to the individual components can be used to construct a new measure of the degree of emergent interdependence among the components of a complex system. We can think of this as a quantitative measure of the degree of complexity of a given system. We imagine the entire system to be divided into two parts, A and B, and compare the entropy of the system A × B, obtained by combining the microstates of the two parts as a Cartesian product, with the entropy of the fully interacting system, obtained by allowing full interaction between the components of A and those of B. The complexity measure is given in [20] by the comparison of these two entropies.
From the dependence of this measure on the number of components in the separate systems A and B, one can, in principle, determine the kind of emergence generated by the interactions in a specific complex system. In [20], we conjectured that the number of available states for the brain grows faster than exponentially in the number of brain regions involved. It might, at first, appear impossible to check this conjecture. However, experiments like those conducted on rat brain tissue, such as the famous avalanche experiment by Beggs and Plenz [23], seem to open up the possibility of studying W(N) as a function of tissue size. We imagine it would be possible to study a piece of tissue of size N and one of twice that size, allowing us, at least in principle, to determine how W(N) behaves for such a neuronal system. A different, although related, notion of complexity, the defect entropy, was proposed in [24].
7. Group Entropy Theory and Data Analysis
The theory of group entropies has recently proved to be relevant in data analysis. One important reason for this relevance is extensivity. When the number of patterns that may occur in a given time sequence depends in a non-exponential way on the length L of the sequence, the Shannon-based entropy of the sequence will not permit a well-defined entropy rate, because the Shannon entropy will not be extensive in L. This may, for example, pose a problem for the widely used Lempel–Ziv complexity measure [25]. This is similar to the discussion above concerning how the Boltzmann–Gibbs–Shannon entropy fails to be extensive on state spaces that grow non-exponentially in the number of constituents. We will see below that time series very often contain a number of patterns that grows super-exponentially in the length of the sequence.
To discuss this fundamental application of group entropies, we start with a brief review of the ordinal analysis of time-series data. We follow the discussion and notation in [17,26,27]. Consider the time series {x_t}, where t represents a discrete time and x_t is a real number. Let L ≥ 2 be an integer. We introduce the sequence of length L (or L-sequence) (x_t, x_{t+1}, …, x_{t+L−1}).
Let π = (π(0), π(1), …, π(L−1)) be the permutation of (0, 1, …, L−1) such that x_{t+π(0)} ≤ x_{t+π(1)} ≤ ⋯ ≤ x_{t+π(L−1)}. We denote the resulting vector (π(0), π(1), …, π(L−1)) as the rank vector of the sequence. The rank vectors are called ordinal patterns of length L (or L-ordinal patterns), and the sequence is said to be “of type” π. In this way, given the original time series, we have constructed an ordinal representation associated with it: the family of all ordinal patterns of length L.
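A minimal Python sketch of this construction (using np.argsort for the sorting permutation; the sample series is an arbitrary toy example):

```python
import numpy as np

def ordinal_pattern(window):
    """Return the permutation that sorts the window in increasing order."""
    return tuple(np.argsort(window, kind="stable"))

def ordinal_patterns(x, L):
    """All L-ordinal patterns of the time series x (one per sliding window)."""
    x = np.asarray(x, dtype=float)
    return [ordinal_pattern(x[t:t + L]) for t in range(len(x) - L + 1)]

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]   # toy series
print(ordinal_patterns(x, 3))
# e.g. the window (9, 10, 6) is of type (2, 0, 1): the smallest value sits at
# relative position 2, the next at 0, the largest at 1.
```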
The group of permutations of (0, 1, …, L−1) represents the set of symbols (also called the “alphabet”) of the ordinal representation.
In the following, we consider discrete-time stationary processes X = {X_t}, both deterministic and random, taking values in a closed interval I ⊂ ℝ. We define a “deterministic process” as a one-dimensional dynamical system (I, ℬ, μ, f), where I is the state space (a bounded interval of ℝ), ℬ is the Borel σ-algebra of I, μ is a measure such that μ(I) = 1, and f is a μ-invariant map of I into itself. In this case, the image of X is the orbit of an initial condition x₀ ∈ I, i.e., X_t = f^t(x₀), where f^0 is the identity and f^{t+1} = f ∘ f^t. First, we associate with an ordinal representation the probability of finding an ordinal pattern of a given rank vector. To this aim, we assume the stationarity condition: for fixed L, the probability of observing a given L-ordinal pattern cannot depend on t. This condition ensures that estimates of the pattern probabilities converge as the amount of data increases. Non-stationary processes with stationary increments, such as the fractional Brownian motion and the fractional Gaussian noise, satisfy the condition above.
7.1. Metric and Topological Permutation Entropy
Let X be a deterministic or random process taking real values. Let p(π) be the probability of a sequence generated by X being of type π, and let p = {p(π)} be the corresponding probability distribution. We define the following:
- (i) If p(π) > 0, then π is a permitted pattern for X.
- (ii) If p(π) = 0, then π is a forbidden pattern.
The permutation metric entropy of order L of p is defined as the Shannon entropy of the distribution p of L-ordinal patterns. The topological entropy of order L of the finite process X is the upper limit of the values of the permutation metric entropy of order L. Formally, we obtain it by assuming that all allowed patterns of length L are equiprobable, so that it is given by the logarithm of the number of allowed patterns of length L for X. It is evident that the permutation metric entropy of order L never exceeds the topological entropy of order L, which in turn is bounded by ln L!. We observe that the bound ln L! is attained if all L-ordinal patterns are allowed.
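These quantities are easy to estimate empirically. The sketch below uses the unnormalized Shannon form (normalization conventions for permutation entropies vary in the literature) and repeats the ordinal-pattern extraction from above in self-contained form:

```python
import math
import numpy as np
from collections import Counter

def pattern_distribution(x, L):
    """Empirical probabilities of the L-ordinal patterns of the series x."""
    x = np.asarray(x, dtype=float)
    pats = [tuple(np.argsort(x[t:t + L])) for t in range(len(x) - L + 1)]
    counts = Counter(pats)
    total = sum(counts.values())
    return {pat: c / total for pat, c in counts.items()}

def permutation_metric_entropy(x, L):
    """Shannon entropy of the empirical L-ordinal-pattern distribution."""
    p = np.array(list(pattern_distribution(x, L).values()))
    return -np.sum(p * np.log(p))

def topological_entropy_of_order(x, L):
    """ln of the number of distinct (allowed) L-ordinal patterns observed."""
    return np.log(len(pattern_distribution(x, L)))

rng = np.random.default_rng(0)
noise = rng.random(20000)                 # white noise: all patterns allowed
L = 4
print(permutation_metric_entropy(noise, L),
      topological_entropy_of_order(noise, L),
      math.log(math.factorial(L)))        # both estimates approach ln(L!)
```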
7.2. Bandt–Pompe Permutation Entropy
In their seminal paper [28], Bandt and Pompe introduced the following notions:
Permutation metric entropy of X: the limit, for large L, of the permutation metric entropy of order L, normalized by the pattern length.
Topological permutation entropy of X: the analogous limit of the topological entropy of order L.
An important question is, what is the relationship between the permutation metric entropy and the standard Kolmogorov–Sinai (KS) entropy of a map?
Let f be a strictly piecewise monotone map on a closed interval I. The vast majority of, if not all, one-dimensional maps used in concrete applications belong to the class of piecewise monotone maps. In [29], it was proved that the permutation metric entropy of f coincides with its Kolmogorov–Sinai entropy. The same relation holds for the topological versions of the two entropies. This is a fundamental result, since it allows us to compute the KS entropy using the ordinal-analysis approach. The above theorem and its generalizations imply that the number of permitted patterns of length L for a deterministic process grows exponentially as L increases, roughly as exp(L h_top), where h_top is the topological KS entropy. In turn, this implies that the number of prohibited patterns grows super-exponentially.
At the other extreme, we have random processes without prohibited patterns. An elementary example is white noise, for which all L! patterns of length L are permitted; according to Stirling’s formula, their number grows super-exponentially, since ln L! ≈ L ln L − L. It is also worth noting that noisy deterministic time series may have no prohibited patterns. For example, in the case of dynamics on a non-trivial attractor where the orbits are dense, observational white noise will “destroy” all prohibited patterns, regardless of how weak the noise is. In general, random processes exhibit super-exponential growth in the number of permitted patterns. Random processes can also have prohibited patterns; in this case, the growth will be “intermediate”, meaning it is still super-exponential but subfactorial.
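The contrast between deterministic and random processes is easy to see numerically. The sketch below uses the logistic map at r = 4 as an arbitrary example of a deterministic chaotic map and counts the distinct ordinal patterns observed in each case; the deterministic series leaves many of the L! patterns unvisited (forbidden), while white noise eventually visits essentially all of them:

```python
import math
import numpy as np

def count_patterns(x, L):
    """Number of distinct L-ordinal patterns occurring in the series x."""
    x = np.asarray(x, dtype=float)
    return len({tuple(np.argsort(x[t:t + L])) for t in range(len(x) - L + 1)})

n, L = 50000, 5
rng = np.random.default_rng(1)

# Logistic map x_{t+1} = 4 x_t (1 - x_t): a deterministic, chaotic process.
logistic = np.empty(n)
logistic[0] = 0.4
for t in range(n - 1):
    logistic[t + 1] = 4.0 * logistic[t] * (1.0 - logistic[t])

noise = rng.random(n)                     # white noise: no forbidden patterns

print(count_patterns(logistic, L), count_patterns(noise, L), math.factorial(L))
# The logistic map realizes far fewer than L! = 120 patterns; white noise
# realizes essentially all of them for a long enough series.
```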
7.3. A Fundamental Problem
For random processes without prohibited patterns, the permutation entropy diverges in the limit L → ∞. Also, in general, the topological permutation entropy diverges as well. Therefore, it is natural to consider the problem of extending the notion of permutation entropy to make it an intrinsically finite quantity. We assume that for a random process X, the number of allowed patterns of length L grows according to a certain function of L that depends on the type of process considered.
Can we find a suitable, generalized permutation entropy that converges as L → ∞?
7.4. Group Entropies and Ordinal Analysis
We can obtain a new solution to this problem through the theory of group entropies.
Philosophy: Instead of using the Shannon-type permutation entropy introduced by Bandt and Pompe as a universal entropy valid for all random processes, we will adapt our entropic measure to the specific problem we wish to address:
We will classify our processes into complexity classes, defined by complexity functions g. These classes, in ordinal analysis, represent a notion entirely analogous to the universality classes described earlier (inspired by statistical mechanics).
Each complexity class will correspond to a group permutation entropy, i.e., a specific information measure designed for the class under consideration.
This measure will be convergent as L → ∞.
Functions and Complexity Classes
A process X is said to belong to the complexity class g if the number of its allowed patterns of length L grows asymptotically as dictated by g. The bi-continuous function g is called the complexity function of X. The process X belongs to the exponential class if its number of allowed patterns grows exponentially in L, and it belongs to the factorial class if this number grows as L!.
Example 1. A deterministic process belongs to the exponential class. A random process like white noise (i.i.d. values) belongs to the factorial class.
A process belongs to the subfactorial class if its number of allowed patterns grows super-exponentially but more slowly than factorially; two sub-cases, (i) and (ii), can be distinguished according to the precise growth rate.
Example 2. Processes belonging to subfactorial class (i) exist, and processes in class (ii) can also be constructed explicitly.
7.5. Group Permutation Entropy
Main Result: The conventional permutation entropy of Bandt–Pompe can be consistently generalized. According to our philosophy, the complexity class g “dictates” its associated permutation entropy, which becomes finite in the limit of large L.
Definition 1. The group entropy of order L for a process of class g is constructed from the probability distribution of the L-ordinal patterns of the process and its Rényi entropy, via the generator associated with the class g. The corresponding topological group entropy of order L is defined analogously from the allowed patterns. The group metric permutation entropy and the topological group permutation entropy are the corresponding large-L limits. The functions defined in this manner are group entropies. Furthermore, they satisfy the analogues of the inequalities given in Section 7.1. The following are various examples of group permutation entropies: