Review

A Brief Review of Generalized Entropies

by José M. Amigó 1,*, Sámuel G. Balogh 2 and Sergio Hernández 3
1 Centro de Investigación Operativa, Universidad Miguel Hernández, Avda. de la Universidad s/n, 03202 Elche, Spain
2 Department of Biological Physics, Eötvös University, H-1117 Budapest, Hungary
3 HCSoft Programación S.L., 30007 Murcia, Spain
* Author to whom correspondence should be addressed.
Entropy 2018, 20(11), 813; https://doi.org/10.3390/e20110813
Submission received: 22 September 2018 / Revised: 18 October 2018 / Accepted: 19 October 2018 / Published: 23 October 2018
(This article belongs to the Special Issue 20th Anniversary of Entropy—Review Papers Collection)

Abstract

Entropy appears in many contexts (thermodynamics, statistical mechanics, information theory, measure-preserving dynamical systems, topological dynamics, etc.) as a measure of different properties (energy that cannot produce work, disorder, uncertainty, randomness, complexity, etc.). In this review, we focus on the so-called generalized entropies, which from a mathematical point of view are nonnegative functions defined on probability distributions that satisfy the first three Shannon–Khinchin axioms: continuity, maximality and expansibility. While these three axioms are expected to be satisfied by all macroscopic physical systems, the fourth axiom (separability or strong additivity) is in general violated by non-ergodic systems with long range forces, this having been the main reason for exploring weaker axiomatic settings. Currently, non-additive generalized entropies are being used also to study new phenomena in complex dynamics (multifractality), quantum systems (entanglement), soft sciences, and more. Besides going through the axiomatic framework, we review the characterization of generalized entropies via two scaling exponents introduced by Hanel and Thurner. In turn, the first of these exponents is related to the diffusion scaling exponent of diffusion processes, as we also discuss. Applications are addressed as the description of the main generalized entropies advances.

1. Introduction

The concept of entropy was introduced by Clausius [1] in thermodynamics to measure the amount of energy in a system that cannot produce work, and it was given an atomic interpretation in the foundational works of statistical mechanics and gas dynamics by Boltzmann [2,3], Gibbs [4], and others. Since then, entropy has played a central role in many-particle physics, notably in the description of non-equilibrium processes through the second principle of thermodynamics and the principle of maximum entropy production [5,6]. Moreover, Shannon made entropy the cornerstone on which he built his theory of information and communication [7]. Entropy and the associated entropic forces also play a leading role in recent innovative approaches to artificial intelligence and collective behavior [8,9]. Our formalism is information-theoretic (i.e., entropic forms are functions of probability distributions) owing to the mathematical properties that we discuss along the way, but it can be translated to a physical context through the concept of microstate.
The prototype of entropy that we are going to consider below is the Boltzmann–Gibbs–Shannon (BGS) entropy,
S_{\mathrm{BGS}}(p_1, \ldots, p_W) = k \sum_{i=1}^{W} p_i \ln \frac{1}{p_i} = -k \sum_{i=1}^{W} p_i \ln p_i. \qquad (1)
In its physical interpretation, k = 1.3807 × 10⁻²³ J/K is the Boltzmann constant, W is the number of microstates consistent with the macroscopic constraints of a given thermodynamical system, and p_i is the probability (i.e., the asymptotic fraction of time) that the system is in the microstate i. In information theory, k is set equal to 1 for mathematical convenience, as we do hereafter, and S_BGS measures the average information conveyed by the outcomes of a random variable with probability distribution {p_1, …, p_W}. We use natural logarithms unless otherwise stated, although the logarithm to base 2 is the natural choice in binary communications (the difference being the units, nats or bits, respectively). Remarkably enough, Shannon proved in Appendix B of his seminal paper [7] that Equation (1) follows necessarily from three properties or axioms (actually, four are needed; more on this below).
BGS entropy was later generalized by other “entropy-like” quantities in dynamical systems (Kolmogorov–Sinai entropy [10], etc.), information theory (Rényi entropy [11], etc.), and statistical physics (Tsallis entropy [12], etc.), to mention the most familiar ones (see, e.g., [13] for an account of some entropy-like quantities and their applications, especially in time series analysis). As with S_BGS, the essence of these new entropic forms was distilled into a small number of properties that allow sorting them out in a more systematic way [13,14]. Currently, the uniqueness of S_BGS is derived from the four Shannon–Khinchin axioms (Section 2). However, the fourth axiom, called the separability or strong additivity axiom (which implies additivity, i.e., S(A_1 + A_2) = S(A_1) + S(A_2), where A_1 + A_2 stands for a system composed of any two probabilistically independent subsystems A_1 and A_2), is violated by physical systems with long-range interactions [15,16]. This raises the question of which mathematical properties are enjoyed by the “generalized entropies” that satisfy only the other three axioms. These are the primary candidates for extensive entropic forms, i.e., functions S such that S(B_1 B_2) = S(B_1) + S(B_2), the shorthand B_1 B_2 standing for the physical system composed of the subsystems B_1 and B_2. Note that B_1 B_2 ≠ B_1 + B_2 in non-ergodic interacting systems simply because the number of states in B_1 B_2 is different from the number of states in B_1 + B_2. A related though different question is how to weaken the separability axiom so as to identify the extensive generalized entropies; we come back briefly to this point in Section 2 when speaking of the composability property.
Along with S B G S , typical examples of generalized entropies are the Tsallis entropy [12],
T_q(p_1, \ldots, p_W) = \frac{1}{1-q} \left( \sum_{i=1}^{W} p_i^q - 1 \right) \qquad (2)
(q ∈ ℝ, q ≠ 1, with the proviso that for q < 0 terms with p_i = 0 are omitted), and the Rényi entropy [11],
R_q(p_1, \ldots, p_W) = \frac{1}{1-q} \ln \sum_{i=1}^{W} p_i^q \qquad (3)
(q ≥ 0, q ≠ 1). The Tsallis and Rényi entropies are related to the BGS entropy through the limits
\lim_{q \to 1} T_q(p_1, \ldots, p_W) = \lim_{q \to 1} R_q(p_1, \ldots, p_W) = S_{\mathrm{BGS}}(p_1, \ldots, p_W), \qquad (4)
this being one of the reasons they are considered generalizations of the BGS entropy. Both T q and R q have found interesting applications [15,17]; in particular, the parametric weighting of the probabilities in their definitions endows data analysis with additional flexibility. Other generalized entropies that we consider in this paper are related to ongoing work on graphs [18]. Further instances of generalized entropies are also referred to below.
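As a quick numerical illustration (ours, not part of the original presentation), the following minimal Python sketch computes S_BGS, T_q and R_q for a sample distribution and checks the limit in Equation (4) by taking q close to 1; all function and variable names are our own.

```python
import numpy as np

def bgs(p):
    """Boltzmann-Gibbs-Shannon entropy, Equation (1), with k = 1 and natural logarithms."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                            # terms with p_i = 0 contribute nothing
    return -np.sum(p * np.log(p))

def tsallis(p, q):
    """Tsallis entropy T_q, Equation (2)."""
    p = np.asarray(p, dtype=float)
    return (np.sum(p[p > 0] ** q) - 1.0) / (1.0 - q)

def renyi(p, q):
    """Renyi entropy R_q, Equation (3)."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p[p > 0] ** q)) / (1.0 - q)

p = [0.5, 0.3, 0.15, 0.05]
for q in (0.999, 1.001):                    # both entropies approach S_BGS as q -> 1
    print(q, tsallis(p, q), renyi(p, q), bgs(p))
```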
Let us remark at this point that S B G S , T q , R q and other generalized entropies considered in this review can be viewed as special cases of the ( h , ϕ ) -entropies introduced in [19] for the study of asymptotic probability distributions. In turn, ( h , ϕ ) -entropies were generalized to quantum information theory in [20]. Quantum ( h , ϕ ) -entropies, which include von Neumann’s entropy [21] as well as the quantum versions of Tsallis’ and Rényi’s entropies, have been applied, for example, to the detection of quantum entanglement (see [20] and references therein). In this review, we do not consider quantum entropies, which would require advanced mathematical concepts, but only entropies defined on classical, discrete and finite probability distributions. If necessary, the transition to continuous distributions is done by formally replacing probability mass functions by densities and sums by integrals. For other approaches to the concept of entropy in more general settings, see [22,23,24,25].
Generalized entropies can be characterized by two scaling exponents in the limit W , which we call Hanel–Thurner exponents [16]. For the simplest generalized entropies, which include T q but not R q (see Section 2), these exponents allow establishing a relationship between the abstract concept of generalized entropy and the physical properties of the system they describe through their asymptotic scaling behavior in the thermodynamic limit. That is, the two exponents label equivalence classes of systems which are universal in that the corresponding entropies have the same thermodynamic limit. In this regard, it is interesting to mention that, for any pair of Hanel–Thurner exponents (at least within certain ranges), there is a generalized entropy with those exponents, i.e., systems with the sought asymptotic behavior. Furthermore, the first Hanel–Thurner exponent allows also establishing a second relation with physical properties, namely, with the diffusion scaling exponents of diffusion processes, under some additional assumptions.
The rest of this review is organized as follows. The concept of generalized entropy, along with some formal preliminaries and its basic properties, is discussed in Section 2. By way of illustration, we discuss in Section 3 the Tsallis and Rényi entropies, as well as more recent entropic forms. The choice of the former ones is justified by their uniqueness properties under quite natural axiomatic formulations. The Hanel–Thurner exponents are introduced in Section 4, where their computation is also exemplified. Their aforementioned relation to diffusion scaling exponents is explained in Section 5. The main messages are recapped in Section 6. There is no section devoted to the applications; rather, these are progressively addressed as the different generalized entropies are presented. The main text has been supplemented with three appendices at the end of the paper.

2. Generalized Entropies

Let P be the set of probability mass distributions {p_1, …, p_W} for all W ≥ 2. For any function H : P → ℝ₊ (ℝ₊ being the nonnegative real numbers), the Shannon–Khinchin axioms for an entropic form H are the following.
SK1 
Continuity. H ( p 1 , , p W ) depends continuously on all variables for each W.
SK2 
Maximality. For all W,
H(p_1, \ldots, p_W) \le H\!\left(\tfrac{1}{W}, \ldots, \tfrac{1}{W}\right).
SK3 
Expansibility: For all W and 1 ≤ i ≤ W,
H(0, p_1, \ldots, p_W) = H(p_1, \ldots, p_i, 0, p_{i+1}, \ldots, p_W) = H(p_1, \ldots, p_i, p_{i+1}, \ldots, p_W).
SK4 
Separability (or strong additivity): For all W , U ,
H(p_{11}, \ldots, p_{1U}, p_{21}, \ldots, p_{2U}, \ldots, p_{W1}, \ldots, p_{WU}) = H(p_{1\cdot}, p_{2\cdot}, \ldots, p_{W\cdot}) + \sum_{i=1}^{W} p_{i\cdot}\, H\!\left( \frac{p_{i1}}{p_{i\cdot}}, \frac{p_{i2}}{p_{i\cdot}}, \ldots, \frac{p_{iU}}{p_{i\cdot}} \right),
where p_{i\cdot} = \sum_{j=1}^{U} p_{ij}.
Let {p_{11}, …, p_{1U}, p_{21}, …, p_{2U}, …, p_{W1}, …, p_{WU}} be the joint probability distribution of the random variables X and Y, with marginal distributions {p_{i·} : 1 ≤ i ≤ W} and {p_{·j} = \sum_{i=1}^{W} p_{ij} : 1 ≤ j ≤ U}, respectively. Then, axiom SK4 can be written as
H(X, Y) = H(X) + H(Y \mid X),
where H(Y | X) is the entropy of Y conditional on X. In particular, if X and Y are independent (i.e., p_{ij} = p_{i·} p_{·j}), then H(Y | X) = H(Y) and
H(X, Y) = H(X) + H(Y). \qquad (5)
A function H such that Equation (5) holds (for independent random variables X and Y) is called additive. Physicists prefer writing X + Y for composed systems with microstate probabilities p i j = p i · p · j ; this condition holds approximately only for weakly interacting systems X and Y.
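For concreteness, the following small Python sketch (our own illustration, not taken from the references) evaluates both sides of axiom SK4 for S_BGS on an arbitrary joint distribution; the specific numbers are only an example.

```python
import numpy as np

def bgs(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# an arbitrary joint distribution p_ij of (X, Y) with W = 2 rows and U = 3 columns
pij = np.array([[0.10, 0.25, 0.05],
                [0.20, 0.10, 0.30]])
pi_ = pij.sum(axis=1)                        # marginal of X, the p_{i.}
cond = sum(pi_[i] * bgs(pij[i] / pi_[i])     # H(Y|X) = sum_i p_{i.} H(p_{i1}/p_{i.}, ..., p_{iU}/p_{i.})
           for i in range(len(pi_)))
print(bgs(pij), bgs(pi_) + cond)             # the two sides of H(X,Y) = H(X) + H(Y|X) coincide
```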
With regard to Equation (5), recall that, for two general random variables X and Y, the difference I(X; Y) = H(X) + H(Y) − H(X, Y) ≥ 0 is the mutual information of X and Y, and I(X; Y) = 0 if and only if X and Y are independent [26].
More generally, a function H such that
H(p_1 q_1, \ldots, p_1 q_U, p_2 q_1, \ldots, p_2 q_U, \ldots, p_W q_1, \ldots, p_W q_U) = H(p_1, \ldots, p_W) + H(q_1, \ldots, q_U) + (1-\alpha)\, H(p_1, \ldots, p_W)\, H(q_1, \ldots, q_U) \qquad (6)
( α > 0 ) is called α -additive. With the same notation as above, we can write this property as
H(X, Y) = H(X) + H(Y) + (1-\alpha)\, H(X)\, H(Y), \qquad (7)
where, again, X and Y are independent random variables. In a statistical mechanical context, X and Y may stand also for two probabilistically independent (or weakly interacting) physical systems. If α = 1 , we recover additivity (Equation (5)).
In turn, additivity and α -additivity are special cases of composability [15,27]:
H(X, Y) = \Phi(H(X), H(Y)), \qquad (8)
with the same caveats for X and Y. Here, Φ is a symmetric function of two variables. Composability was proposed in [15] to replace axiom SK4. Interestingly, it has been proved in [27] that, under some technical assumptions, the only composable generalized entropy of the form in Equation (10) is T q , up to a multiplicative constant.
As mentioned in Section 1, a function F : P → ℝ₊ satisfying axioms SK1–SK4 is necessarily of the form F(p_1, …, p_W) = k S_BGS(p_1, …, p_W) for every W, where k is a positive constant ([28], Theorem 1). The same conclusion can be derived using other equivalent axioms [14,29]. For instance, Shannon used continuity, the property that H(1/n, …, 1/n) increases with n, and a property called grouping [29] or decomposability [30], which he defined graphically in Figure 6 of [7]:
H(p_1, \ldots, p_W) = H\big(p_1 + \cdots + p_r,\; p_{r+1} + \cdots + p_W\big) + (p_1 + \cdots + p_r)\, H\!\left( \frac{p_1}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i} \right) + (p_{r+1} + \cdots + p_W)\, H\!\left( \frac{p_{r+1}}{\sum_{i=r+1}^{W} p_i}, \ldots, \frac{p_W}{\sum_{i=r+1}^{W} p_i} \right) \qquad (9)
(1 ≤ r ≤ W − 1). This property allows reducing the computation of H(p_1, …, p_W) to the computation of the entropy of dichotomic random variables. According to ([15], Section 2.1.2.7), Shannon omitted from his uniqueness theorem the condition in Equation (5), X and Y being independent random variables.
Nonnegative functions defined on P that satisfy axioms SK1–SK3 are called generalized entropies [16]. In the simplest situation, a generalized entropy has the sum property [14], i.e., the algebraic form
F_g(p_1, \ldots, p_W) = \sum_{i=1}^{W} g(p_i), \qquad (10)
with g : [0, 1] → ℝ₊.
The following propositions are immediate.
(i)
Symmetry: F g ( p 1 , , p W ) is invariant under permutation of p 1 , , p W .
(ii)
F g satisfies axiom SK1 if and only if g is continuous.
(iii)
If F g satisfies axiom SK2, then
\sum_{i=1}^{W} g(p_i) \le W\, g\!\left(\tfrac{1}{W}\right)
for all W ≥ 2 and p_1, …, p_W with p_1 + ⋯ + p_W = 1.
(iv)
If g is concave (i.e., ∩-convex), then F g satisfies axiom SK2.
(v)
F g satisfies axiom SK3 if and only if g ( 0 ) = 0 .
Note that Proposition (iv) follows from the symmetry and concavity of F g (since the unique maximum of F g must occur at equal probabilities).
We conclude from Propositions (ii), (iv) and (v) that, for F_g to be a generalized entropy, the following three conditions suffice:
(C1)
g is continuous.
(C2)
g is concave.
(C3)
g ( 0 ) = 0 .
As in [16], we say that a macroscopic statistical system is admissible if it is described by a generalized entropy F g of the form in Equation (10) such that g verifies Conditions (C1)–(C3). By extension, we say also that the generalized entropy F g is admissible. Admissible systems and generalized entropies are the central subject of this review. Clearly, S B G S is admissible because
g(x) = -x \ln x, \qquad (11)
0 ≤ x ≤ 1. On the other hand, T_q corresponds to
g(x) = \frac{1}{1-q}\,(x^q - x). \qquad (12)
For T_q to be admissible, Condition (C1) requires q ≥ 0 and Condition (C3) requires q > 0.
An example of a function F : P R + with the sum property that does not qualify for admissible generalized entropy is
F(p_1, \ldots, p_W) = \sum_{i=1}^{W} \left( p_i - \frac{1}{W} \right)^2 = \sum_{i=1}^{W} p_i^2 - \frac{1}{W}. \qquad (13)
Indeed, g(x) = (x − 1/W)² is not ∩-convex but ∪-convex, and g(0) = 1/W² ≠ 0. This probability functional was used in [31] to classify sleep stages.
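Conditions (C1)–(C3) can also be probed numerically. The sketch below (ours, not from [16]) evaluates g near zero and the discretized second differences of g on a grid, a nonpositive second difference being consistent with concavity; the tolerances are arbitrary choices.

```python
import numpy as np

def check_admissible(g, n=2000):
    """Rough numerical probe of Conditions (C1)-(C3) for a candidate g on [0, 1]."""
    x = np.linspace(1e-6, 1.0, n)
    d2 = np.diff(g(x), 2)                       # discrete second differences of g
    return {"g(0) ~ 0": abs(g(1e-15)) < 1e-6,   # proxy for Condition (C3)
            "concave": bool(np.all(d2 <= 1e-12))}

g_bgs     = lambda x: -x * np.log(x)            # Equation (11), admissible
g_tsallis = lambda x: (x**0.5 - x) / 0.5        # Equation (12) with q = 1/2, admissible
g_square  = lambda x: (x - 0.25)**2             # Equation (13) with W = 4, not admissible

for name, g in [("BGS", g_bgs), ("Tsallis q=0.5", g_tsallis), ("squared", g_square)]:
    print(name, check_admissible(g))
```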
Other generalized entropies that are considered below have the form
F_{G,g}(p_1, \ldots, p_W) = G\!\left( \sum_{i=1}^{W} g(p_i) \right), \qquad (14)
where G is a continuous monotonic function, and g is continuous with g ( 0 ) = 0 . By definition, F G , g is also symmetric, and Proposition (iii) holds with the obvious changes. However, the concavity of g is not a sufficient condition any more for F G , g to be a generalized entropy. Such is the case of the Rényi entropy R q (Equation (3)); here
G(u) = \frac{1}{1-q}\,\ln u \quad \text{and} \quad g(x) = x^q, \qquad (15)
but g(x) (and, hence, \sum_{i=1}^{W} g(p_i)) is not ∩-convex for q > 1. Furthermore, note that axiom SK3 requires q > 0 for R_q to be a generalized entropy.
Since Equation (10) is a special case of Equation (14) (set G to be the identity map i d ( u ) = u ), we can refer to both cases just by using the notation F G , g , as we do hereafter.
We say that two probability distributions {p_i} and {p′_i}, 1 ≤ i ≤ W, are close if
\|\{p_i\} - \{p'_i\}\| = \sum_{i=1}^{W} |p_i - p'_i| \le \delta,
where 0 < δ ≪ 1; other norms, such as the two-norm and the max-norm, will do as well since they are all equivalent in the metric sense. A function F : P → ℝ₊ is said to be Lesche-stable if for all W and ϵ > 0 there exists δ > 0 such that
\|\{p_i\} - \{p'_i\}\| \le \delta \;\Longrightarrow\; \frac{\left| F(\{p_i\}) - F(\{p'_i\}) \right|}{F_{\max}} < \epsilon,
where F_{\max} = \max_{\{p_i\} \in P} F(\{p_i\}). It follows that
\lim_{\delta \to 0} \lim_{W \to \infty} \frac{\left| F(\{p_i\}) - F(\{p'_i\}) \right|}{F_{\max}} = 0.
Lesche stability is called experimental robustness in [15] because it guarantees that similar experiments performed on similar physical systems provide similar results for the function F. According to [16], all admissible systems are Lesche stable.
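As an illustration only (it neither proves nor disproves stability, which involves the limit W → ∞), the following sketch evaluates, for S_BGS on a uniform distribution, the one-norm distance and the normalized entropy difference appearing in the definition above; the perturbation is our own arbitrary choice.

```python
import numpy as np

def bgs(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

W = 1000
p = np.full(W, 1.0 / W)                   # uniform distribution; F_max = ln W for S_BGS
delta = 0.005
pp = p.copy()
pp[0] += delta
pp[1:] -= delta / (W - 1)                 # spread the compensation so that all p_i stay positive

dist = np.sum(np.abs(p - pp))             # one-norm distance between the two distributions
rel = abs(bgs(p) - bgs(pp)) / np.log(W)   # normalized entropy difference from the definition
print(dist, rel)
```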

3. Examples of Generalized Entropies

By way of illustration, we focus in this section on two classical generalized entropies as well as on some newer ones. The classical examples are the Tsallis entropy and the Rényi entropy, because they have been extensively studied in the literature, also from an axiomatic point of view. As it turns out, they are unique under some natural assumptions, such as additivity, α-additivity or composability (see below for details). The newer entropies are related to potential applications of the concept of entropy to graph theory [18]. Other examples of generalized entropies are listed in Appendix A for further reference.

3.1. Tsallis Entropy

A simple way to introduce Tsallis’ entropy as a generalization of the BGS entropy is the following [15]. Given q ∈ ℝ, define the q-logarithm of a real number x > 0 as
\ln_q x = \begin{cases} \ln x & \text{if } q = 1, \\ \dfrac{x^{1-q} - 1}{1-q} & \text{otherwise}. \end{cases}
Note that ln 1 x is defined by continuity since lim q 1 ln q x = ln x . If the logarithm in the definition of S B G S , Equation (1), is replaced by ln q , then we obtain the Tsallis entropy:
T_q(p_1, \ldots, p_W) = \sum_{i=1}^{W} p_i \ln_q(1/p_i) = \frac{1}{1-q} \left( \sum_{i=1}^{W} p_i^q - 1 \right). \qquad (17)
As noted before, q > 0 for T q to be an admissible generalized entropy.
Alternatively, the definition
S_{\mathrm{BGS}}(p_1, \ldots, p_W) = -\left. \frac{d}{dx} \sum_{i=1}^{W} p_i^x \right|_{x=1}
can also be generalized to provide the Tsallis entropy via the q-derivative,
T_q(p_1, \ldots, p_W) = -\left. D_q \sum_{i=1}^{W} p_i^x \right|_{x=1},
where
D_q f(x) := \frac{f(qx) - f(x)}{qx - x}.
Set qx = x + h, i.e., h = (q − 1)x, and let h → 0 to check that D_1 f(x) ≡ \lim_{q \to 1} D_q f(x) = df(x)/dx.
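A minimal sketch (ours) of this q-derivative route: for a probability vector p, the quantity −D_q Σ_i p_i^x evaluated at x = 1 reproduces T_q, and q close to 1 recovers the ordinary derivative and hence S_BGS.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])

def Dq(f, x, q):
    """Jackson q-derivative D_q f(x) = (f(qx) - f(x)) / (qx - x)."""
    return (f(q * x) - f(x)) / (q * x - x)

f = lambda x: np.sum(p ** x)              # the function x -> sum_i p_i^x

q = 2.0
print(-Dq(f, 1.0, q))                     # equals T_q = (sum_i p_i^q - 1)/(1 - q) ...
print((np.sum(p ** q) - 1) / (1 - q))     # ... computed directly from Equation (17)

q = 1.0001                                # q -> 1 recovers the ordinary derivative, i.e., S_BGS
print(-Dq(f, 1.0, q), -np.sum(p * np.log(p)))
```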
Although Tsallis proposed his entropy (Equation (17)) in 1988 to go beyond the standard statistical mechanics [12], basically the same formula had already been proposed in 1967 by Havrda and Charvát (with a different multiplying factor) in the realm of cybernetics and control theory [32].
Some basic properties of T q follow.
(T1)
T 1 = S B G S because ln 1 p i = ln p i (or D 1 f ( x ) = d f ( x ) / d x ).
(T2)
T_q is (strictly) ∩-convex for q > 0. Figure 1 plots T_q(p, 1 − p) for q = 0.5, 1, 2 and 5. Let us mention in passing that T_q is ∪-convex for q < 0.
(T3)
T q is Lesche-stable for all q > 0 [33,34]. Actually, we stated at the end of Section 2 that all admissible systems are Lesche stable.
(T4)
T q is not additive but q-additive (see Equation (6) or (7) with α replaced by q). This property follows from [15]
\ln_q xy = \ln_q x + \ln_q y + (1-q)\,(\ln_q x)(\ln_q y).
(T5)
Similar to what happens with the BGS entropy, Tsallis entropy can be uniquely determined (except for a multiplicative positive constant) by a small number of axioms. Thus, Abe [35] characterized the Tsallis entropy by: (i) continuity; (ii) the increasing monotonicity of T_q(1/W, …, 1/W) with respect to W; (iii) expansibility; and (iv) a property involving conditional entropies. Dos Santos [36], on the other hand, used the previous Axioms (i) and (ii), q-additivity, and a generalization of the grouping axiom (Equation (9)). Suyari [37] derived T_q from the first three Shannon–Khinchin axioms and a generalization of the fourth one. Perhaps the most economical characterization of T_q was given by Furuichi [38]; it consists of continuity, symmetry under the permutation of p_1, …, p_W, and a property called q-recursivity. As mentioned in Section 2, Tsallis entropy was recently shown [27] to be the only composable generalized entropy of the form in Equation (10) under some technical assumptions. Further axiomatic characterizations of the Tsallis entropy can be found in [39].
An observable of a thermodynamical (i.e., many-particle) system, say its energy or entropy, is said to be extensive if (among other characterizations), for a large number N of particles, that observable is (asymptotically) proportional to N. For example, for a system whose particles are weakly interacting (think of a dilute gas), the additive S B G S is extensive, whereas the non-additive T q ( q 1 ) is non-extensive. The same happens with ergodic systems [40]. However, according to [15], for a non-ergodic system with strong correlations, S B G S can be non-extensive while T q can be extensive for a particular value of q; such is the case of a microcanonical spin system on a network with growing constant connectancy [40]. This is why T q represents a physically relevant generalization of the traditional S B G S . Axioms SK1–SK3 are expected to hold true also in strongly interacting systems.
Further applications of the Tsallis entropy include astrophysics [41], fractal random walks [42], anomalous diffusion [43,44], time series analysis [45], classification [46,47], and artificial neural networks [48].

3.2. Rényi Entropy

A simple way to introduce Rényi’s entropy as a generalization of S B G S is the following [17]. By definition, the BGS entropy of the probability distribution { p 1 , , p W } (or of a random variable X with that probability distribution) is the linear average of the information function
I(p_i) = \ln \frac{1}{p_i}, \qquad 1 \le i \le W,
or, equivalently, the expected value of the random variable \ln \frac{1}{p(X)}:
S_{\mathrm{BGS}}(p_1, \ldots, p_W) = E_p\!\left[ \ln \frac{1}{p(X)} \right] = \sum_{i=1}^{W} p_i\, I(p_i).
In the general theory of expected values, for any invertible function ϕ and realizations x 1 , , x W of X in the definition domain of ϕ , an expected value can be defined as
E_{p,\phi}[X] = \phi^{-1}\!\left( \sum_{i=1}^{W} p_i\, \phi(x_i) \right).
Applying this definition to ln 1 p ( X ) , we obtain
E_{p,\phi}\!\left[ \ln \frac{1}{p(X)} \right] = \phi^{-1}\!\left( \sum_{i=1}^{W} p_i\, \phi(I(p_i)) \right).
If this generalized average has to be additive for independent events, i.e., it has to satisfy Equation (6) with α = 1 , then
\phi(x) = c_1\, x \quad \text{or} \quad \phi(x) = c_2^{\,(1-q)x}
must hold, where c 1 , c 2 are positive constants, and q > 0 , q 1 . The first case leads to S B G S , Equation (1), after choosing c 1 = e . The second case leads to the Rényi entropy (actually, a one-parameter family of entropies) R q , Equation (3), after choosing c 2 = e as well.
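The exponential branch can be checked numerically; the sketch below (our illustration) evaluates the Kolmogorov–Nagumo mean of the information function with the choice φ(x) = e^{(1−q)x} (i.e., base e) and compares it with R_q.

```python
import numpy as np

p = np.array([0.4, 0.3, 0.2, 0.1])
q = 2.5

info = -np.log(p)                              # information function I(p_i) = ln(1/p_i)
phi = lambda x: np.exp((1 - q) * x)            # exponential Kolmogorov-Nagumo function
phi_inv = lambda y: np.log(y) / (1 - q)

generalized_mean = phi_inv(np.sum(p * phi(info)))
renyi = np.log(np.sum(p ** q)) / (1 - q)       # R_q from Equation (3)
print(generalized_mean, renyi)                 # the two values coincide
```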
Next, we summarize some important properties of the Rényi entropy.
(R1)
R q is additive by construction.
(R2)
R 1 lim q 1 R q = S B G S . Indeed, use L’Hôpital’s Rule to derive
\lim_{q \to 1} \frac{1}{1-q} \ln \sum_{i=1}^{W} p_i^q = \lim_{q \to 1} \left( -\frac{d}{dq} \ln \sum_{i=1}^{W} p_i^q \right) = -\lim_{q \to 1} \frac{1}{\sum_{i=1}^{W} p_i^q} \sum_{i=1}^{W} p_i^q \ln p_i = -\sum_{i=1}^{W} p_i \ln p_i.
(R3)
R_q is ∩-convex for 0 < q ≤ 1 and it is neither ∩-convex nor ∪-convex for q > 1. Figure 2 plots R_q(p, 1 − p) for q = 0.5, 1, 2 and 5.
(R4)
R q is Lesche-unstable for all q > 0 , q 1 [49].
(R5)
The entropies R q are monotonically decreasing with respect to the parameter q for any distribution of probabilities, i.e.,
q < q' \;\Longrightarrow\; R_q \ge R_{q'}.
This property follows from the formula
\frac{dR_q}{dq} = -\frac{1}{(1-q)^2} \sum_{i=1}^{W} \tilde{p}_i \ln \frac{\tilde{p}_i}{p_i} = -\frac{1}{(1-q)^2}\, D(\{\tilde{p}_i\} \,\|\, \{p_i\}),
where \tilde{p}_i = p_i^q / \sum_{k=1}^{W} p_k^q, and D(\{\tilde{p}_i\} \,\|\, \{p_i\}) ≥ 0 is the Kullback–Leibler divergence of the probability distributions {\tilde{p}_1, …, \tilde{p}_W} and {p_1, …, p_W}. The divergence vanishes only when both probability distributions coincide; otherwise it is positive [26].
(R6)
A straightforward relation between Rényi’s and Tsallis’ entropies is the following [50]:
T_q = \frac{1}{1-q} \left( e^{(1-q) R_q} - 1 \right) \quad \text{or} \quad R_q = \frac{1}{1-q} \ln\!\left( 1 + (1-q)\, T_q \right).
However, the axiomatic characterizations of the Rényi entropy are not as simple as those for the Tsallis entropy. See [27,51,52] for some contributions in this regard.
For some values of q, R_q has particular names. Thus, R_0 = ln W is called Hartley or max-entropy; it coincides numerically with S_BGS for the uniform probability distribution. We saw in (R2) that R_q converges to the BGS entropy in the limit q → 1. R_2 = −\ln \sum_{i=1}^{W} p_i^2 is called collision entropy. In the limit q → ∞, R_q converges to the min-entropy
R_\infty(p_1, \ldots, p_W) = \min_{1 \le i \le W} (-\ln p_i) = -\max_{1 \le i \le W} \ln p_i = -\ln \max_{1 \le i \le W} p_i.
The name of R_∞ is due to property (R5).
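The sketch below (ours) computes some of these special values on a sample distribution, checks the monotonicity stated in (R5), and verifies the Rényi–Tsallis relation of (R6); the distribution and the parameter values are arbitrary.

```python
import numpy as np

def renyi(p, q):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))              # R_1 = S_BGS
    return np.log(np.sum(p ** q)) / (1.0 - q)

p = np.array([0.5, 0.25, 0.125, 0.125])
print(renyi(p, 0), np.log(len(p)))                 # R_0 = ln W, the max-entropy
print(renyi(p, 2), -np.log(np.sum(p ** 2)))        # collision entropy
print(renyi(p, 200), -np.log(np.max(p)))           # large q approximates the min-entropy

qs = np.linspace(0.1, 5.0, 20)
vals = [renyi(p, q) for q in qs]
print(bool(np.all(np.diff(vals) <= 1e-12)))        # R_q is nonincreasing in q, property (R5)

q = 3.0                                            # relation (R6) between T_q and R_q
tsallis = (np.sum(p ** q) - 1) / (1 - q)
print(tsallis, (np.exp((1 - q) * renyi(p, q)) - 1) / (1 - q))
```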
Rényi entropy has found interesting applications in random search [53], information theory (especially in source coding [54,55]), cryptography [56], time series analysis [57], and classification [46,58], as well as in statistical signal processing and machine learning [17].

3.3. Graph Related Entropies

As part of ongoing work on graph entropy [18], the following generalized entropies are defined:
H_1(p_1, \ldots, p_W) = \sum_{i=1}^{W} \left( 1 - (p_i)^{p_i} \right), \qquad (18)
H_2(p_1, \ldots, p_W) = \prod_{i=1}^{W} \left( 2 - (p_i)^{p_i} \right) = \exp\!\left( \sum_{i=1}^{W} \ln\!\left( 2 - (p_i)^{p_i} \right) \right), \qquad (19)
and
H_3(p_1, \ldots, p_W) = 1 + \ln H_2(p_1, \ldots, p_W) = \sum_{i=1}^{W} \left( p_i + \ln\!\left( 2 - (p_i)^{p_i} \right) \right). \qquad (20)
Note that H_1(…, 0, 1, 0, …) = 0, while H_2(…, 0, 1, 0, …) = H_3(…, 0, 1, 0, …) = 1. Other oddities of the above entropies include the terms (p_i)^{p_i} in their definitions, as well as the presence of a product instead of a sum in the definition of H_2.
First, H 1 is of the type in Equation (10) with
g_1(x) = \begin{cases} 0 & \text{if } x = 0, \\ 1 - x^x & \text{if } 0 < x \le 1. \end{cases}
By definition, g_1(x) is continuous (even smooth), concave on the interval [0, 1], and g_1(0) = 0. Therefore (see Conditions (C1)–(C3) in Section 2), H_1 satisfies axioms SK1–SK3; hence, it is a generalized entropy.
As for H 2 , this probability functional is of the type in Equation (14) with
g_2(x) = \begin{cases} 0 & \text{if } x = 0, \\ \ln(2 - x^x) & \text{if } 0 < x \le 1, \end{cases}
and G ( u ) = e u . To prove that H 2 is a generalized entropy, note that
\ln H_2(p_1, \ldots, p_W) = \sum_{i=1}^{W} \ln\!\left( 2 - (p_i)^{p_i} \right)
satisfies axioms SK1–SK3 for the same reasons as H_1 does. Therefore, the same happens with H_2 on account of the exponential function being continuous (SK1), monotonically increasing (SK2), and single-valued (SK3).
Finally, H 3 is of the type in Equation (10) with
g_3(x) = \begin{cases} 0 & \text{if } x = 0, \\ x + g_2(x) & \text{if } 0 < x \le 1. \end{cases}
Since H 3 = 1 + ln H 2 , it is a generalized entropy because, as shown above, ln H 2 satisfies axioms SK1–SK3.
Figure 3 depicts H_1(p, 1 − p), H_2(p, 1 − p), H_3(p, 1 − p), along with S_BGS(p, 1 − p) and H_2 − S_BGS − 1 for comparison. As a curiosity, let us point out that the scaled versions
\tilde{H}_i(p, 1-p) = \frac{H_i(p, 1-p) - H_i(0, 1)}{H_i(\tfrac{1}{2}, \tfrac{1}{2}) - H_i(0, 1)}, \qquad (24)
(i = 1, 2, 3), see Figure 4, approximate S_BGS(p, 1 − p) measured in bits very well. In particular, the relative error in the approximation of S_BGS(p, 1 − p) by \tilde{H}_2(p, 1 − p) is less than 2.9 × 10⁻⁴, so their graphs overlap when plotted.
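A small numerical sketch (ours) of Equations (18)–(20) and of the rescaling in Equation (24) on two-point distributions; it is meant only to illustrate the definitions, not to reproduce the error figure quoted above.

```python
import numpy as np

def pw(x):
    """x**x elementwise with the convention 0**0 = 1, so that the terms vanish at p_i = 0."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    np.power(x, x, out=out, where=(x > 0))
    return out

H1 = lambda p: np.sum(1 - pw(p))               # Equation (18)
H2 = lambda p: np.prod(2 - pw(p))              # Equation (19)
H3 = lambda p: 1 + np.log(H2(p))               # Equation (20)
bgs_bits = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))

def scaled(H, p):                              # Equation (24)
    lo, hi = H(np.array([0.0, 1.0])), H(np.array([0.5, 0.5]))
    return (H(p) - lo) / (hi - lo)

for x in (0.1, 0.25, 0.5):
    p = np.array([x, 1 - x])
    print(x, scaled(H1, p), scaled(H2, p), scaled(H3, p), bgs_bits(p))
```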
A further description of the entropies in Equations (18)–(20) is beyond the scope of this section. Let us only mention in this regard that these entropies can be extended into the realm of acyclic directed graphs.

4. Hanel–Thurner Exponents

All generalized entropies F G , g group in classes labeled by two exponents ( c , d ) introduced by Hanel and Thurner [16], which are determined by the limits
\lim_{W \to \infty} \frac{F_{G,g}(p_1, \ldots, p_{\lambda W})}{F_{G,g}(p_1, \ldots, p_W)} = \lambda^{1-c} \qquad (25)
(W being as before the cardinality of the probability distribution or the total number of microstates in the system, λ > 1 ) and
\lim_{W \to \infty} \frac{F_{G,g}(p_1, \ldots, p_{W^{1+a}})}{F_{G,g}(p_1, \ldots, p_W)\; W^{a(1-c)}} = (1+a)^d \qquad (26)
( a > 0 ). Note that the limit in Equation (26) does not depend actually on c. The limits in Equations (25) and (26) can be computed via the asymptotic equipartition property [26]. Thus,
F_{G,g}(p_1, \ldots, p_{\lambda W}) \approx G\!\left( \lambda W\, g\!\left( \frac{1}{\lambda W} \right) \right)
and
F_{G,g}(p_1, \ldots, p_{W^{1+a}}) \approx G\!\left( W^{1+a}\, g\!\left( \frac{1}{W^{1+a}} \right) \right)
asymptotically with ever larger W (thermodynamic limit). Set now x = 1 / W to derive
\lim_{x \to 0^+} \frac{G\!\left( \frac{\lambda}{x}\, g\!\left( \frac{x}{\lambda} \right) \right)}{G\!\left( \frac{1}{x}\, g(x) \right)} = \lambda^{1-c} \qquad (27)
and
\lim_{x \to 0^+} \frac{G\!\left( \frac{1}{x^{1+a}}\, g\!\left( x^{1+a} \right) \right)}{x^{a(c-1)}\; G\!\left( \frac{1}{x}\, g(x) \right)} = (1+a)^d. \qquad (28)
Clearly, the scaling exponents c, d of a generalized entropy F_{G,g} depend on the behavior of g in an infinitesimal neighborhood (0, ε] of 0 (i.e., on g(ε) with 0 < ε ≪ 1), as well as on the properties of G if G ≠ id. We call (c, d) the Hanel–Thurner (HT) exponents of the generalized entropy F_{G,g}.
When G = id, Equations (27) and (28) reduce to
\lim_{x \to 0^+} \frac{g(zx)}{g(x)} = z^c \qquad (29)
(after replacing λ^{-1} by z), and
\lim_{x \to 0^+} \frac{g(x^{1+a})}{x^{ac}\, g(x)} = (1+a)^d, \qquad (30)
respectively. In this case, 0 < c ≤ 1, while d can be any real number. If c = 1, the concavity of g implies d ≥ 0 [16]. The physical properties of admissible systems are uniquely characterized by their HT exponents, i.e., by their asymptotic properties in the limit W → ∞ [16]. In this sense, we can also speak of the universality class (c, d).
By way of illustration, we now derive the HT exponents of S_BGS, T_q and R_q.
(E1)
For the BGS entropy, g(x) = −x ln x (see Equation (11)), so
\frac{g(zx)}{g(x)} = \frac{zx \ln(zx)}{x \ln x} = \frac{z \ln z + z \ln x}{\ln x} \longrightarrow z
as x → 0⁺. Therefore, c = 1. Furthermore,
\frac{g(x^{1+a})}{x^{ac}\, g(x)} = \frac{x^{1+a} \ln x^{1+a}}{x^{a+1} \ln x} = \frac{(1+a) \ln x}{\ln x} = 1 + a
for all x > 0 , so d = 1 .
(E2)
For the Tsallis entropy, see Equation (12),
g(x) = \begin{cases} \dfrac{1}{1-q}\, x^q + O(x) & \text{if } 0 < q < 1, \\[1mm] \dfrac{1}{q-1}\, x + O(x^q) & \text{if } q > 1. \end{cases}
It follows readily that (c, d) = (q, 0) if 0 < q < 1, and (c, d) = (1, 0) if q > 1. Hence, although \lim_{q \to 1} T_q = S_BGS, there is no parallel convergence concerning the HT exponents.
(E3)
For the Rényi entropy, g(x) = x^q and G(u) = \frac{1}{1-q} \ln u (see Equation (15)), so
\frac{G\!\left( \frac{\lambda}{x}\, g\!\left( \frac{x}{\lambda} \right) \right)}{G\!\left( \frac{1}{x}\, g(x) \right)} = \frac{\ln\!\left[ \frac{\lambda}{x} \left( \frac{x}{\lambda} \right)^q \right]}{\ln\!\left[ \frac{1}{x}\, x^q \right]} = \frac{\ln x^{q-1} - \ln \lambda^{q-1}}{\ln x^{q-1}} \longrightarrow 1
as x → 0⁺ (both for 0 ≤ q < 1 and for q > 1). Therefore, c = 1. Furthermore,
\frac{G\!\left( \frac{1}{x^{1+a}}\, g(x^{1+a}) \right)}{G\!\left( \frac{1}{x}\, g(x) \right)} = \frac{\ln\!\left[ \frac{1}{x^{1+a}}\, x^{q(1+a)} \right]}{\ln\!\left[ \frac{1}{x}\, x^q \right]} = \frac{\ln x^{(q-1)(1+a)}}{\ln x^{q-1}} = 1 + a
for all x > 0 , so that d = 1 . In sum, ( c , d ) = ( 1 , 1 ) for all q.
As for the generalized entropies H_1, H_2, and H_3 considered in Section 3.3, we show in Appendix B that their HT exponents are (1, 1), (0, 0), and (1, 1), respectively. Thus, H_1 and H_3 belong to the same universality class as S_BGS, while the HT exponents of H_2 and R_q (both of the type in Equation (14)) are different. Moreover, the interested reader will find in Table 1 of [16] the HT exponents of the generalized entropies listed in Appendix A.
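For sum-form entropies, the left-hand sides of Equations (29) and (30) can also be evaluated numerically at small but finite x and watched as they approach z^c and (1 + a)^d; a rough sketch (ours) for the BGS and Tsallis cases follows. The convergence is slow because of the logarithmic factors.

```python
import numpy as np

def scaling_ratios(g, c, x, z=0.5, a=1.0):
    """Left-hand sides of Equations (29) and (30) at a small but finite x."""
    r29 = g(z * x) / g(x)                          # should approach z**c as x -> 0+
    r30 = g(x ** (1 + a)) / (x ** (a * c) * g(x))  # should approach (1 + a)**d
    return r29, r30

g_bgs = lambda x: -x * np.log(x)                   # (c, d) = (1, 1)
g_ts = lambda x: (x ** 0.3 - x) / 0.7              # Tsallis with q = 0.3, (c, d) = (0.3, 0)

for x in (1e-4, 1e-8, 1e-16):
    print(x, scaling_ratios(g_bgs, 1.0, x), scaling_ratios(g_ts, 0.3, x))
# BGS ratios approach (0.5**1, 2**1) = (0.5, 2); Tsallis ratios approach (0.5**0.3, 2**0) ~ (0.81, 1)
```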
An interesting issue that arises at this point is the inverse question: Given c ∈ (0, 1] and d ∈ ℝ, is there an admissible system such that its HT exponents are precisely (c, d)? The answer is yes, at least under some restrictions on the values of c and d. Following [16], we show in Appendix C that, if
d > -1 \ \text{ for } 0 < c \le \tfrac{1}{2}, \qquad d \ge 1 - \tfrac{1}{c} \ \text{ for } \tfrac{1}{2} < c \le 1, \qquad (31)
then the “generalized ( c , d ) -entropy”
S_{c,d}(p_1, \ldots, p_W) = e A \sum_{i=1}^{W} \Gamma(d+1,\, 1 - c \ln p_i), \qquad (32)
has HT exponents ( c , d ) . Here, A > 0 and Γ is the incomplete Gamma function (Section 6.5 of [59]), that is,
\Gamma(r, s) = \int_{s}^{\infty} t^{r-1} e^{-t}\, dt \qquad (r > 0). \qquad (33)
Several application cases where generalized ( c , d ) -entropies are relevant have been discussed by Hanel and Thurner in [40] (super-diffusion, spin systems, binary processes, and self-organized critical systems) and [60] (aging random walks, i.e., random walks whose transition rates between states are path- and time-dependent).
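A minimal computational sketch (ours) of Equation (32), using SciPy's regularized upper incomplete gamma function; the normalization A = 1 is an arbitrary choice here.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def S_cd(p, c, d, A=1.0):
    """Generalized (c,d)-entropy, Equation (32); Gamma(r, s) = gamma(r) * gammaincc(r, s) for r > 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    r, s = d + 1.0, 1.0 - c * np.log(p)
    return np.e * A * np.sum(gamma(r) * gammaincc(r, s))

p = np.array([0.5, 0.3, 0.2])
print(S_cd(p, 0.7, 1.5))                       # (c, d) = (0.7, 1.5) is admissible by Equation (31)
# For (c, d) = (1, 1) and A = 1, e*Gamma(2, 1 - ln x) = 2x - x*ln(x), so S_{1,1} = 2 + S_BGS:
print(S_cd(p, 1.0, 1.0), 2 - np.sum(p * np.log(p)))
```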

5. Asymptotic Relation between the HT Exponent c and the Diffusion Scaling Exponent

In contrast to “non-interacting” systems, where both the additivity and extensivity of the BGS entropy S B G S hold, in the case of general interacting statistical systems these properties can no longer be simultaneously satisfied, requiring a more general concept of entropy [16,40]. Following [16] (Section 4), a possible generalization of S B G S for admissible systems is defined via the two asymptotic scaling relations in Equations (29) and (30), i.e., the HT exponents c and d, respectively. These asymptotic exponents can be interpreted as a measure of deviation from the “non-interacting” case regarding the stationary behavior.

5.1. The Non-Stationary Regime

In this section, we describe a relation between the exponent c and a similar macroscopic measure that characterizes the system in the non-stationary regime, thus providing a meaningful interpretation of the exponent. The non-stationary behavior of a system can possibly be described by the Fokker–Planck (FP) equation governing the time evolution of a probability density function p = p(x, t). In this continuous limit, the generalized entropy F_g is assumed to take the form F_g[p] = ∫ g(p(s)) ds, where g is asymptotically characterized by Equation (29) and s = s(x) is a time-independent scalar function of the space coordinate x (for example, a potential) [61,62].
Going beyond the scope of the simplest FP equation, we consider systems for which the correlation among their (sub-)units can be taken into account by replacing the diffusive term ∂²_x p with an effective term ∂²_x Φ[p], where Φ[p] is a pre-defined functional of the probability density. Φ[p] can either be derived directly from the microscopic transition rules or be defined based on macroscopic assumptions. The resulting FP equation can be written as
\partial_t\, p(x,t) = D \beta\, \partial_x\!\left( p(x,t)\, \partial_x u(x) \right) + D\, \partial_x^2\, \Phi[p(x,t)], \qquad (34)
where D , β are constants and u ( x ) is a time-independent external potential.
For simplicity, hereafter we exclusively focus on one dimensional FP equations. In the special case of Φ [ p ] = p and no external forces, Equation (34) reduces to the well-known linear diffusion equation
\partial_t\, p(x,t) = D\, \partial_x^2\, p(x,t). \qquad (35)
The above equation is invariant under the space-time scaling transformation
p(x,t) = \tau^{\gamma}\, p\!\left( \tau^{\gamma} x,\, \tau t \right) \qquad (36)
with γ = 1/2 [63,64]. This scaling property opens up the possibility of a phenomenological and macroscopic characterization of anomalous diffusion processes [15,44] as well, which correspond to more complicated non-stationary processes described by FP equations in the form of Equation (34) with a non-trivial value of γ. With the help of the transformation in Equation (36), we can also classify correlated statistical systems according to the rate of the spread of their probability density functions over time in the asymptotic limit and, thus, quantitatively describe their behavior in the non-stationary regime.

5.2. Relation between the Stationary and Non-Stationary Regime

To reasonably and consistently relate the generalized entropies to the formalism of FP equations—corresponding to the stationary and non-stationary regime, respectively—the functional Φ [ p ] has to be chosen such that the stationary solution of the general FP equation becomes equivalent to the Maximum Entropy (MaxEnt) probability distribution calculated with the generalized entropies. These MaxEnt distributions can be obtained analogously to the results by Hanel and Thurner in [16,40], where they used standard constrained optimization to find the most general form of MaxEnt distributions, which turned out to be p ( ϵ ) = E c , d , r ( ϵ ) with
E_{c,d,r}(x) \propto \exp\!\left( -\frac{d}{1-c}\, W_k\!\left( B\, (1 - x/r)^{1/d} \right) \right). \qquad (37)
Here, B, r are constants depending only on the parameters c, d, and W_k is the kth branch of the Lambert W function (specifically, branch k = 0 for d ≥ 0 and branch k = −1 for d < 0). The consistency criterion imposed above accords with the fact that many physical systems tend to converge towards a maximum-entropy configuration over time; at the same time, it delimits the scope of our assumptions.
Consider systems described by Equation (34) in the absence of external force, i.e.,
\partial_t\, p(x,t) = D\, \partial_x^2\, \Phi[p(x,t)]. \qquad (38)
By assuming that the corresponding stationary solutions can be identified with the MaxEnt distributions in Equation (37), it can be shown that the functional form of the effective density Φ [ p ] must be expressed as
\Phi[p] \propto \int_{0}^{p} q\, \frac{\partial^2 g}{\partial q^2}(q)\, dq, \qquad (39)
where we neglected additive and multiplicative constant factors for the sake of simplicity. Similar implicit equations have already been investigated in [61,62,65]. Provided that the asymptotic phase-space-volume scaling relation in Equation (29) holds, it can also be shown that the generalized FP equation (38) (with Φ as in Equation (39)) obeys the diffusion scaling property in Equation (36) with a non-trivial value of γ in the p → 0 asymptotic limit [66] (assuming additionally the existence of the solution of Equation (38), at least from an appropriate initial condition). A simple algebraic relation between the diffusion scaling exponent γ and the phase space volume scaling exponent c can be established [66], which can be written as
\gamma = \frac{1}{1+c}.
Therefore, this relation between c and γ defines families of FP equations which show asymptotic invariance under the scaling relation in Equation (36).
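As a quick worked example (ours): this relation assigns γ = 1/2 to the BGS class (c = 1), recovering the scaling of ordinary diffusion, whereas a class with c = 1/2 corresponds to γ = 1/(1 + 1/2) = 2/3, i.e., to a superdiffusive spreading of p(x, t).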

6. Conclusions

This review concentrates on the concept of generalized entropy (Section 2), which is relevant in the study of real thermodynamical systems and, more generally, in the theory of complex systems. Possibly the first example of a generalized entropy was introduced by Rényi (Section 3.2), who was interested in the most general information measure which is additive in the sense of Equation (5), with the random variables X and Y being independent. Another very popular generalized entropy was introduced by Tsallis as a generalization of the Boltzmann–Gibbs entropy (Section 3.1) to describe the properties of physical systems with long range forces and complex dynamics in equilibrium. Some more exotic generalized entropies are considered in Section 3.3, while other examples that have been published in the last two decades are gathered in Appendix A. Our approach was to a great extent formal, with special emphasis in Section 2 and Section 3 on axiomatic formulations and mathematical properties. For expository reasons, applications are mentioned and the original references given as our description of the main generalized entropies progressed, rather than addressing them jointly in a separate section.
An alternative approach to generalized entropies, other than the axiomatic one (Section 2), consists in characterizing their asymptotic behavior in the thermodynamic limit W → ∞. Hanel and Thurner showed that two scaling exponents (c, d) suffice for admissible generalized entropies, i.e., those entropies of the form in Equation (10) with g continuous, concave and g(0) = 0 (Section 4); it holds c ∈ (0, 1] and d ∈ ℝ. As a result, the admissible systems fall in equivalence classes labeled by the exponents (c, d) of the corresponding entropies. Conversely, to each (c, d), there is a generalized entropy with those Hanel–Thurner exponents (see Equation (32)), at least for the most interesting value ranges.
It is also remarkable that, at asymptotically large times and volumes, there is a one-to-one relation between the equivalence class of generalized entropies with a given c ∈ (0, 1] and the equivalence class of Fokker–Planck equations for which the invariance in Equation (36) holds with γ = 1/(1 + c) ∈ [1/2, 1) (Section 5). This means that the equivalence classes of admissible systems can generally be mapped into anomalous diffusion processes and vice versa, thus conveying the same information about the system in the asymptotic limit (i.e., when p(x, t) → 0) [66]. A schematic visualization of this relation is provided in Figure 5. Moreover, the above result can actually be understood as a possible generalization of the Tsallis–Bukman relation [44].

Author Contributions

All the authors have contributed to conceptualization, methodology, validation, formal analysis, investigation, writing, review and editing, both of the initial draft and the final version.

Funding

J.M.A. was supported by the Spanish Ministry of Economy, Industry and Competitiveness, grant MTM2016-74921-P (AEI/FEDER, EU). S.G.B. was partially supported by the Hungarian National Research, Development and Innovation Office (grant no. K 128780) and the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 740688.

Acknowledgments

We thank our referees for their helpful and constructive criticism. J.M.A. was supported by the Spanish Ministry of Economy, Industry and Competitiveness, grant MTM2016-74921-P (AEI/FEDER, EU). This research was also partially supported by the Hungarian National Research, Development and Innovation Office (grant no. K 128780) and the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 740688.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We list in this appendix further generalized entropies of the form in Equation (10), together with the original references (notation as in Table 1 of [16]). Γ(·,·) is the incomplete Gamma function defined in Equation (33).
  • S_\eta(\{p_i\}) = \sum_i \left[ \Gamma\!\left( \frac{\eta+1}{\eta},\, -\ln p_i \right) - p_i\, \Gamma\!\left( \frac{\eta+1}{\eta} \right) \right] \quad (\eta > 0) [67].
  • S_\kappa(\{p_i\}) = \sum_i \frac{p_i^{1-\kappa} - p_i^{1+\kappa}}{2\kappa} \quad (0 < \kappa < 1) [68].
  • S_b(\{p_i\}) = \sum_i \left( 1 - e^{-b p_i} \right) + e^{-b} - 1 \quad (b > 0) [69].
  • S_E(\{p_i\}) = \sum_i p_i \left( 1 - e^{(p_i - 1)/p_i} \right) [70].
  • S_\beta(\{p_i\}) = \sum_i p_i^{\beta} \ln(1/p_i) \quad (0 < \beta \le 1) [71].
  • S_\gamma(\{p_i\}) = \sum_i p_i \ln^{1/\gamma}(1/p_i) ([15], page 60).

Appendix B

From Equations (18)–(20), it follows that the functions g of H 1 , H 2 , and H 3 are the following:
g_1(\varepsilon) = 1 - \varepsilon^{\varepsilon} \approx -\varepsilon \ln \varepsilon, \qquad g_2(\varepsilon) = \ln(2 - \varepsilon^{\varepsilon}) \approx 1 - \varepsilon^{\varepsilon} \approx -\varepsilon \ln \varepsilon,
g_3(\varepsilon) = \varepsilon + \ln(2 - \varepsilon^{\varepsilon}) \approx \varepsilon - \varepsilon \ln \varepsilon \approx -\varepsilon \ln \varepsilon.
Since H 1 and H 3 are generalized entropies of the type in Equation (10), we conclude that both belong to the same class as S B G S (see Equation (11)), hence ( c , d ) = ( 1 , 1 ) .
H 2 is a generalized entropy of the type in Equation (14) with G ( u ) = e u . Therefore,
\frac{G\!\left( \frac{\lambda}{\varepsilon}\, g_2\!\left( \frac{\varepsilon}{\lambda} \right) \right)}{G\!\left( \frac{1}{\varepsilon}\, g_2(\varepsilon) \right)} \approx \frac{\exp\!\left( -\frac{\lambda}{\varepsilon} \cdot \frac{\varepsilon}{\lambda} \ln \frac{\varepsilon}{\lambda} \right)}{\exp\!\left( -\ln \varepsilon \right)} = \frac{\lambda/\varepsilon}{1/\varepsilon} = \lambda.
Comparison with Equation (27) shows that c = 0 .
Moreover,
\frac{G\!\left( \frac{1}{\varepsilon^{1+a}}\, g_2(\varepsilon^{1+a}) \right)}{\varepsilon^{a(c-1)}\, G\!\left( \frac{1}{\varepsilon}\, g_2(\varepsilon) \right)} \approx \frac{\exp\!\left( -(1+a) \ln \varepsilon \right)}{\varepsilon^{-a}\, \exp\!\left( -\ln \varepsilon \right)} = \frac{\varepsilon^{-(1+a)}}{\varepsilon^{-(1+a)}} = 1.
Comparison with Equation (28) shows that d = 0 .

Appendix C

1. First, note from Equation (32) that
g_{c,d}(x) = e A\, \Gamma(d+1,\, 1 - c \ln x),
where the incomplete Gamma function Γ(d + 1, 1 − c ln x) exists for d > −1 and all x ∈ (0, 1] (see Equation (33)), with g_{c,d}(0) = \lim_{x \to 0^+} e A\, \Gamma(d+1, 1 - c \ln x) = 0.
Among Conditions (C1)–(C3) on g c , d (Section 2), for the entropy S c , d in Equation (32) to be admissible, only concavity (Condition (C2)) needs to be checked. Since
\frac{d^2}{dx^2}\, g_{c,d}(x) = e A\, \frac{d^2}{dx^2}\, \Gamma(d+1,\, 1 - c \ln x) = e A\, \frac{c^2}{x^2}\, e^{-1 + c \ln x}\, (1 - c \ln x)^{d-1} \left[ 1 - \frac{1}{c} + (1-c) \ln x - d \right],
it holds g''_{c,d}(x) ≤ 0 if and only if d ≥ 1 − 1/c + (1 − c) ln x, where −∞ < (1 − c) ln x ≤ 0 for each c ∈ (0, 1] and x ∈ (0, 1]. Therefore, g''_{c,d}(x) ≤ 0 for all x ∈ (0, 1] if and only if d ≥ 1 − 1/c, where −∞ < 1 − 1/c ≤ 0. On the other hand, d > −1 is needed for the integral Γ(d + 1, 1 − c ln x) to exist. Both restrictions together lead to the condition in Equation (31) on d for S_{c,d} to be a generalized entropy.
2. Use the asymptotic approximation 6.5.32 of [59]
\Gamma(d+1,\, 1 - c \ln \varepsilon) = (1 - c \ln \varepsilon)^d\, e^{\,c \ln \varepsilon - 1} \left[ 1 + O\!\left( \frac{1}{\ln \varepsilon} \right) \right] \approx e^{-1} \varepsilon^{c}\, (1 - c \ln \varepsilon)^d
(d > −1, 0 < c ≤ 1) to obtain the leading approximation of g_{c,d}(x) in an infinitesimal neighborhood of 0:
g_{c,d}(\varepsilon) \approx A\, \varepsilon^{c}\, (1 - c \ln \varepsilon)^d \approx A\, c^d\, \varepsilon^{c} \left( \ln \frac{1}{\varepsilon} \right)^{d}. \qquad \text{(A3)}
Using Equation (A3), the following can be derived:
\frac{g_{c,d}(z\varepsilon)}{g_{c,d}(\varepsilon)} \approx \frac{z^c \varepsilon^c \left( \ln \frac{1}{z\varepsilon} \right)^d}{\varepsilon^c \left( \ln \frac{1}{\varepsilon} \right)^d} = z^c\, \frac{\left( \ln \frac{1}{z} + \ln \frac{1}{\varepsilon} \right)^d}{\left( \ln \frac{1}{\varepsilon} \right)^d} \longrightarrow z^c
(see Equation (29)) and
\frac{g_{c,d}(\varepsilon^{1+a})}{\varepsilon^{ac}\, g_{c,d}(\varepsilon)} \approx \frac{\varepsilon^{c(1+a)}\, (1+a)^d \left( \ln \frac{1}{\varepsilon} \right)^d}{\varepsilon^{ac}\, \varepsilon^{c} \left( \ln \frac{1}{\varepsilon} \right)^d} = (1+a)^d
(see Equation (30)).
3. From Equation (A3), we obtain
g_{1,1}(\varepsilon) \approx -A\, \varepsilon \ln \varepsilon \qquad \text{(A6)}
(see Example (E1)) and
g_{c,0}(\varepsilon) \approx A\, \varepsilon^{c} \qquad \text{(A7)}
(see Example (E2)). Set A = (1 − c + cd)^{-1} [16] in Equations (A6) and (A7) to reproduce the g functions of S_BGS (A = 1) and of T_c, 0 < c < 1 (A = 1/(1 − c)), respectively.

References

  1. Clausius, R. The Mechanical Theory of Heat; McMillan and Co.: London, UK, 1865. [Google Scholar]
  2. Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitz. Ber. Akad. Wiss. Wien (II) 1872, 66, 275–370. [Google Scholar]
  3. Boltzmann, L. Über die Beziehung eines allgemeinen mechanischen Satzes zum zweiten Hauptsatz der Wärmetheorie. Sitz. Ber. Akad. Wiss. Wien (II) 1877, 75, 67–73. [Google Scholar]
  4. Gibbs, J.W. Elementary Principles in Statistical Mechanics—Developed with Especial References to the Rational Foundation of Thermodynamics; C. Scribner’s Sons: New York, NY, USA, 1902. [Google Scholar]
  5. Dewar, R. Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in nonequilibrium stationary state. J. Phys. A Math. Gen. 2003, 36, 631–641. [Google Scholar] [CrossRef]
  6. Martyushev, L.M. Entropy and entropy production: old misconceptions and new breakthroughs. Entropy 2013, 15, 1152–1170. [Google Scholar] [CrossRef]
  7. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  8. Wissner-Gross, A.D.; Freer, C.E. Causal entropic forces. Phys. Rev. Lett. 2013, 110, 168702. [Google Scholar] [CrossRef] [PubMed]
  9. Mann, R.P.; Garnett, R. The entropic basis of collective behaviour. J. R. Soc. Interface 2015, 12, 20150037. [Google Scholar] [CrossRef] [PubMed]
  10. Kolmogorov, A.N. A new metric invariant of transitive dynamical systems and Lebesgue space endomorphisms. Dokl. Acad. Sci. USSR 1958, 119, 861–864. [Google Scholar]
  11. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561. [Google Scholar]
  12. Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
  13. Amigó, J.M.; Keller, K.; Unakafova, V. On entropy, entropy-like quantities, and applications. Disc. Cont. Dyn. Syst. B 2015, 20, 3301–3343. [Google Scholar] [CrossRef] [Green Version]
  14. Csiszár, I. Axiomatic characterization of information measures. Entropy 2008, 10, 261–273. [Google Scholar] [CrossRef]
  15. Tsallis, C. Introduction to Nonextensive Statistical Mechanics; Springer: New York, NY, USA, 2009. [Google Scholar]
  16. Hanel, R.; Thurner, S. A comprehensive classification of complex statistical systems and an axiomatic derivation of their entropy and distribution functions. EPL 2011, 93, 20006. [Google Scholar] [CrossRef]
  17. Principe, J.C. Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives; Springer: New York, NY, USA, 2010. [Google Scholar]
  18. Hernández, S. Introducing Graph Entropy. Available online: http://entropicai.blogspot.com/search/label/Graph%20entropy (accessed on 22 October 2018).
  19. Salicrú, M.; Menéndez, M.L.; Morales, D.; Pardo, L. Asymptotic distribution of (h, ϕ)-entropies. Commun. Stat. Theory Meth. 1993, 22, 2015–2031. [Google Scholar] [CrossRef]
  20. Bosyk, G.M.; Zozor, S.; Holik, F.; Portesi, M.; Lamberti, P.W. A family of generalized quantum entropies: Definition and properties. Quantum Inf. Process. 2016, 15, 3393–3420. [Google Scholar] [CrossRef]
  21. Von Neumann, J. Thermodynamik quantenmechanischer Gesamtheiten. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen 1927, 1927, 273–291. (In German) [Google Scholar]
  22. Hein, C.A. Entropy in Operational Statistics and Quantum Logic. Found. Phys. 1979, 9, 751–786. [Google Scholar] [CrossRef]
  23. Short, A.J.; Wehner, S. Entropy in general physical theories. New J. Phys. 2010, 12, 033023. [Google Scholar] [CrossRef] [Green Version]
  24. Holik, F.; Bosyk, G.M.; Bellomo, G. Quantum information as a non-Kolmogovian generalization of Shannon’s theory. Entropy 2015, 17, 7349–7373. [Google Scholar] [CrossRef]
  25. Portesi, M.; Holik, F.; Lamberti, P.W.; Bosyk, G.M.; Bellomo, G.; Zozor, S. Generalized entropies in quantum and classical statistical theories. Eur. Phys. J. Spec. Top. 2018, 227, 335–344. [Google Scholar] [CrossRef]
  26. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  27. Enciso, A.; Tempesta, P. Uniqueness and characterization theorems for generalized entropies. J. Stat. Mech. 2017, 123101. [Google Scholar] [CrossRef]
  28. Khinchin, A.I. Mathematical Foundations of Information Theory; Dover Publications: New York, NY, USA, 1957. [Google Scholar]
  29. Ash, R.B. Information Theory; Dover Publications: New York, NY, USA, 1990. [Google Scholar]
  30. MacKay, D.J. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  31. Bandt, C. A new kind of permutation entropy used to classify sleep stages from invisible EEG microstructure. Entropy 2017, 19, 197. [Google Scholar] [CrossRef]
  32. Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika 1967, 3, 30–35. [Google Scholar]
  33. Abe, S. Stability of Tsallis entropy and instabilities of Renyi and normalized Tsallis entropies. Phys. Rev. E 2002, 66, 046134. [Google Scholar] [CrossRef] [PubMed]
  34. Tsallis, C.; Brigatti, E. Nonextensive statistical mechanics: A brief introduction. Contin. Mech. Thermodyn. 2004, 16, 223–235. [Google Scholar] [CrossRef] [Green Version]
  35. Abe, S. Tsallis entropy: How unique? Contin. Mech. Thermodyn. 2004, 16, 237–244. [Google Scholar] [CrossRef]
  36. Dos Santos, R.J.V. Generalization of Shannon’s theorem for Tsallis entropy. J. Math. Phys. 1997, 38, 4104–4107. [Google Scholar] [CrossRef]
  37. Suyari, H. Generalization of Shannon–Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy. IEEE Trans. Inf. Theory 2004, 50, 1783–1787. [Google Scholar] [CrossRef]
  38. Furuichi, S. On uniqueness theorems for Tsallis entropy and Tsallis relative entropy. IEEE Trans. Inf. Theory 2005, 51, 3638–3645. [Google Scholar] [CrossRef]
  39. Jäckle, S.; Keller, K. Tsallis entropy and generalized Shannon additivity. Axioms 2016, 6, 14. [Google Scholar] [CrossRef]
  40. Hanel, R.; Thurner, S. When do generalized entropies apply? How phase space volume determines entropy. Europhys. Lett. 2011, 96, 50003. [Google Scholar] [CrossRef] [Green Version]
  41. Plastino, A.R.; Plastino, A. Stellar polytropes and Tsallis’ entropy. Phys. Lett. A 1993, 174, 384–386. [Google Scholar] [CrossRef]
  42. Alemany, P.A.; Zanette, D.H. Fractal random walks from a variational formalism for Tsallis entropies. Phys. Rev. E 1994, 49, R956–R958. [Google Scholar] [CrossRef]
  43. Plastino, A.R.; Plastino, A. Non-extensive statistical mechanics and generalized Fokker–Planck equation. Physica A 1995, 222, 347–354. [Google Scholar] [CrossRef]
  44. Tsallis, C.; Bukman, D.J. Anomalous diffusion in the presence of external forces: Exact time-dependent solutions and their thermostatistical basis. Phys. Rev. E 1996, 54, R2197. [Google Scholar] [CrossRef]
  45. Capurro, A.; Diambra, L.; Lorenzo, D.; Macadar, O.; Martin, M.T.; Mostaccio, C.; Plastino, A.; Rofman, E.; Torres, M.E.; Velluti, J. Tsallis entropy and cortical dynamics: The analysis of EEG signals. Physica A 1998, 257, 149–155. [Google Scholar] [CrossRef]
  46. Maszczyk, T.; Duch, W. Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 22–26 June 2008; Springer: Berlin, Germany, 2008; pp. 643–651. [Google Scholar]
  47. Gajowniczek, K.; Karpio, K.; Łukasiewicz, P.; Orłowski, A.; Zabkowski, T. Q-Entropy approach to selecting high income households. Acta Phys. Pol. A 2015, 127, 38–44. [Google Scholar] [CrossRef]
  48. Gajowniczek, K.; Orłowski, A.; Zabkowski, T. Simulation study on the application of the generalized entropy concept in artificial neural networks. Entropy 2018, 20, 249. [Google Scholar] [CrossRef]
  49. Lesche, B. Instabilities of Renyi entropies. J. Stat. Phys. 1982, 27, 419–422. [Google Scholar] [CrossRef]
  50. Mariz, A.M. On the irreversible nature of the Tsallis and Renyi entropies. Phys. Lett. A 1992, 165, 409–411. [Google Scholar] [CrossRef]
  51. Aczél, J.; Daróczy, Z. Charakterisierung der Entropien positiver Ordnung und der Shannonschen Entropie. Acta Math. Acad. Sci. Hung. 1963, 14, 95–121. (In German) [Google Scholar] [CrossRef]
  52. Jizba, P.; Arimitsu, T. The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 2004, 312, 17–59. [Google Scholar] [CrossRef]
  53. Rényi, A. On the foundations of information theory. Rev. Inst. Int. Stat. 1965, 33, 1–4. [Google Scholar] [CrossRef]
  54. Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429. [Google Scholar] [CrossRef]
  55. Csiszár, I. Generalized cutoff rates and Rényi information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34. [Google Scholar] [CrossRef]
  56. Bennett, C.; Brassard, G.; Crépeau, C.; Maurer, U. Generalized privacy amplification. IEEE Trans. Inf. Theory 1995, 41, 1915–1923. [Google Scholar] [CrossRef]
  57. Kannathal, N.; Choo, M.L.; Acharya, U.R.; Sadasivan, P.K. Entropies for detection of epilepsy in EEG. Comput. Meth. Prog. Biomed. 2005, 80, 187–194. [Google Scholar] [CrossRef] [PubMed]
  58. Contreras-Reyes, J.E.; Cortés, D.D. Bounds on Rényi and Shannon Entropies for Finite Mixtures of Multivariate Skew-Normal Distributions: Application to Swordfish (Xiphias gladius Linnaeus). Entropy 2016, 11, 382. [Google Scholar] [CrossRef]
  59. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Dover Publications: New York, NY, USA, 1972. [Google Scholar]
  60. Hanel, R.; Thurner, S. Generalized (c, d)-entropy and aging random walks. Entropy 2013, 15, 5324–5337. [Google Scholar] [CrossRef] [Green Version]
  61. Chavanis, P.H. Nonlinear mean field Fokker–Planck equations. Application to the chemotaxis of biological populations. Eur. Phys. J. B 2008, 62, 179–208. [Google Scholar] [CrossRef]
  62. Martinez, S.; Plastino, A.R.; Plastino, A. Nonlinear Fokker–Planck equations and generalized entropies. Physica A 1998, 259, 183–192. [Google Scholar] [CrossRef]
  63. Bouchaud, J.P.; Georges, A. Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications. Phys. Rep. 1990, 195, 127–293. [Google Scholar] [CrossRef]
  64. Dubkov, A.A.; Spagnolo, B.; Uchaikin, V.V. Lévy flight superdiffusion: An introduction. Int. J. Bifurcat. Chaos 2008, 18, 2649–2672. [Google Scholar] [CrossRef]
  65. Schwämmle, V.; Curado, E.M.F.; Nobre, F.D. A general nonlinear Fokker–Planck equation and its associated entropy. EPJ B 2007, 58, 159–165. [Google Scholar] [CrossRef]
  66. Czégel, D.; Balogh, S.G.; Pollner, P.; Palla, G. Phase space volume scaling of generalized entropies and anomalous diffusion scaling governed by corresponding nonlinear Fokker–Planck equations. Sci. Rep. 2018, 8, 1883. [Google Scholar] [CrossRef] [PubMed]
  67. Anteneodo, C.; Plastino, A.R. Maximum entropy approach to stretched exponential probability distributions. J. Phys. A Math. Gen. 1999, 32, 1089–1098. [Google Scholar] [CrossRef]
  68. Kaniadakis, G. Statistical mechanics in the context of special relativity. Phys. Rev. E 2002, 66, 056125. [Google Scholar] [CrossRef] [PubMed]
  69. Curado, E.M.; Nobre, F.D. On the stability of analytic entropic forms. Physica A 2004, 335, 94–106. [Google Scholar] [CrossRef]
  70. Tsekouras, G.A.; Tsallis, C. Generalized entropy arising from a distribution of q indices. Phys. Rev. E 2005, 71, 046144. [Google Scholar] [CrossRef] [PubMed]
  71. Shafee, F. Lambert function and a new non-extensive form of entropy. IMA J. Appl. Math. 2007, 72, 785–800. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Tsallis entropy T_q(p, 1 − p) for q = 0.5, 1, 2 and 5.
Figure 2. Rényi entropy R_q(p, 1 − p) for q = 0.5, 1, 2 and 5.
Figure 3. Entropies H_i(p, 1 − p), i = 1, 2, 3, along with S_BGS(p, 1 − p) and H_2 − S_BGS − 1 for comparison.
Figure 4. Scaled entropies H̃_i(p, 1 − p), i = 1, 2, 3; see Equation (24).
Figure 5. Visual summary of the main result presented in Section 5, schematically depicting the relation between the exponents γ and c. Source: [66].
