1. Definition and Properties of Entropic Distance
Dealing with the dynamics of classical probabilities, we propose a general recipe for defining the corresponding entropic divergence between two probability distributions. Our goal is to handle complex systems with stochastic dynamics, generalized to a nonlinear dependence on the probabilities. For the study of quantum state probabilities and their distance measures we refer to a recent paper [1] and references therein.
Entropic distance, more properly called “entropic divergence”, is traditionally interpreted as a relative entropy, as a difference between entropies with and without a prior condition [2]. It is also the Boltzmann–Shannon entropy of a distribution relative to another [3]. Looking at this construction, however, from the viewpoint of a generalized entropy [4], the simple difference or logarithm of a ratio cannot be held as a definition anymore.
Instead, in this paper, we explore a reverse engineering concept: seeking an entropic divergence formula, which is subject to some wanted properties, we consider entropy as a derived quantity. More precisely, we seek entropic divergence formulas appropriate for given stochastic dynamics, shrinking during the approach to a stationary distribution, whenever it exists, and establish the entropy formula from this distance to the uniform distribution. By doing so we serve two goals: (i) having constructed a non-negative entropic distance we derive an entropy formula which is maximal for the uniform distribution; and (ii) we come as near as possible to the classical difference formula for the relative entropy.
Starting from a given master equation, it is far from trivial which is the most suitable entropic divergence formula for analyzing the stability of a stationary solution. In the present paper we provide a general procedure for obtaining an entropic divergence formula also for atypical cases. Although we exemplify only the well-known cases of the logarithmic Kullback–Leibler formula and that of the Rényi divergence, our result readily generalizes to an infinite number of cases, distinguished by the dependence on the initial state probability at each transition term.
We start our discussion by contrasting the definition of the metric distance, known from geometry, to the basic properties of an entropic distance. The metric distance possesses the following properties:
1. $d(P,Q) \ge 0$ for a pair of points P and Q,
2. $d(P,Q) = 0$ only for $P = Q$,
3. $d(P,Q) = d(Q,P)$, symmetric measure,
4. $d(P,R) \le d(P,Q) + d(Q,R)$, the triangle inequality in elliptic spaces.
The entropic divergence, on the other hand, is neither necessarily symmetric, nor does it have to satisfy a triangle inequality. It is, however, subject to the second law of thermodynamics, distinguishing the time arrow from the past to the future. We require for a real functional, $\rho[P,Q]$, depending on the distributions $P_n$ and $Q_n$, the following to hold:
1. $\rho[P,Q] \ge 0$ for a pair of distributions $P_n$ and $Q_n$,
2. $\rho[P,Q] = 0$ only if the distributions coincide, $P_n = Q_n$,
3. $\frac{d}{dt}\rho[P,Q] \le 0$ if $Q_n$ is the stationary distribution,
4. $\frac{d}{dt}\rho[P,Q] = 0$ only for $P_n = Q_n$, i.e., the stationary distribution is unique.
Although this definition is not symmetric in the handling of the normalized distributions $P_n$ and $Q_n$, it is an easy task to consider the symmetrized version, $\rho^{\rm sym}[P,Q] = \rho[P,Q] + \rho[Q,P]$. This symmetrized entropic divergence inherits some properties from the fiducial construction. Considering a scaling trace form entropic divergence,
$$\rho[P,Q] = \sum_n Q_n\, s(\xi_n),$$
with $\xi_n = P_n/Q_n$, to begin with, we identify the following symmetrized kernel function:
$$\sigma(\xi) = s(\xi) + \xi\, s(1/\xi).$$
The only constraint is to start with a core function, $s(\xi)$, with a definite concavity. The Jensen inequality tells for a convex core function, $s''(\xi) > 0$, that
$$\rho[P,Q] = \sum_n Q_n\, s\!\left(\frac{P_n}{Q_n}\right) \;\ge\; s\!\left(\sum_n P_n\right) = s(1).$$
For satisfying properties 1 and 2 one simply sets $s(1) = 0$. Interestingly enough, this setting suffices also for the satisfaction of the second law of thermodynamics, formulated above as the further constraints 3 and 4. As a consequence of the symmetrization, it also follows that $\rho^{\rm sym}[P,Q] = \rho^{\rm sym}[Q,P]$ and $\rho^{\rm sym}[P,Q] \ge 0$.
The symmetrized entropic divergence shows some new, emergent properties. We list the derivatives of its kernel as follows:
$$\sigma'(\xi) = s'(\xi) + s(1/\xi) - \frac{1}{\xi}\, s'(1/\xi), \qquad \sigma''(\xi) = s''(\xi) + \frac{1}{\xi^3}\, s''(1/\xi).$$
The consequences, listed below, can be derived from these general relations:
$\sigma(1) = 2\, s(1) = 0$,
$\sigma'(1) = s(1) = 0$,
$\xi = 1$ is a minimum, since $\sigma''(\xi) > 0$,
$\sigma(\xi) \ge 0$.
In this way the kernel function, and hence each summand in the symmetrized entropic divergence formula, is non-negative, not only the total sum.
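As a quick numerical illustration of this construction (not part of the original derivation; the function names and the test distributions are our own choices), the following Python sketch builds the symmetrized kernel from the logarithmic core function used later in the paper and checks the properties listed above, together with the symmetry and non-negativity of the resulting divergence:

    import numpy as np

    def s(x):
        # core function: s(xi) = -ln(xi), convex, with s(1) = 0
        return -np.log(x)

    def sigma(x):
        # symmetrized kernel: sigma(xi) = s(xi) + xi * s(1/xi)
        return s(x) + x * s(1.0 / x)

    def rho_sym(P, Q):
        # symmetrized scaling trace form: sum_n Q_n sigma(P_n / Q_n)
        return np.sum(Q * sigma(P / Q))

    xi = np.linspace(0.05, 5.0, 200)
    assert abs(sigma(1.0)) < 1e-12          # sigma(1) = 0
    assert np.all(sigma(xi) >= -1e-12)      # the kernel is non-negative everywhere

    rng = np.random.default_rng(0)
    P = rng.random(8); P /= P.sum()
    Q = rng.random(8); Q /= Q.sum()
    print(rho_sym(P, Q), rho_sym(Q, P))     # equal and non-negative, as required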
2. Entropic Distance Evolution Due to Linear Stochastic Dynamics
Now we study properties 3 and 4 by evaluating the rate of change of the entropic divergence in time. This change is based on the dynamics (time evolution) of the evolving distribution, $P_n(t)$, while the targeted stationary distribution, $Q_n$, is, by definition, time independent. First we consider a class of stochastic evolutions governed by differential equations for $P_n(t)$, linear in the distribution [5]. We consider the trace form
$$\rho[P,Q] = \sum_n Q_n\, s\!\left(\frac{P_n}{Q_n}\right)$$
and the background master equation
$$\dot{P}_n = \sum_m \left( w_{nm}\, P_m - w_{mn}\, P_n \right),$$
with $w_{nm}$ denoting the transition rate from state $m$ to state $n$.
The antisymmetrized sum in the above equation is merely to ensure the conservation of the norm, $\sum_n P_n(t) = 1$, during the time evolution. Using again the notation $\xi_n = P_n/Q_n$ we obtain
$$\frac{d\rho}{dt} = \sum_n s'(\xi_n)\, \dot{P}_n = \sum_{n,m} s'(\xi_n) \left( w_{nm}\, Q_m\, \xi_m - w_{mn}\, Q_n\, \xi_n \right).$$
The basic trick is to apply the splitting $\xi_m = \xi_n + (\xi_m - \xi_n)$ in the gain term to get
$$\frac{d\rho}{dt} = \sum_{n,m} \xi_n\, s'(\xi_n) \left( w_{nm}\, Q_m - w_{mn}\, Q_n \right) + \sum_{n,m} w_{nm}\, Q_m\, s'(\xi_n)\, (\xi_m - \xi_n).$$
Here the sum in the first term vanishes due to the very definition of the stationary distribution, $\sum_m \left( w_{nm}\, Q_m - w_{mn}\, Q_n \right) = 0$. For estimating the remaining term, we utilize the Taylor series remainder theorem in the Lagrange form. We recall the Taylor expansion of the kernel function around $\xi_n$,
$$s(\xi_m) = s(\xi_n) + s'(\xi_n)\,(\xi_m - \xi_n) + \frac{1}{2}\, s''(c_{nm})\,(\xi_m - \xi_n)^2,$$
with $c_{nm}$ lying between $\xi_n$ and $\xi_m$. Here the first derivative term has occurred in Equation (6). This construction delivers
$$\frac{d\rho}{dt} = \sum_{n,m} w_{nm}\, Q_m \left[ s(\xi_m) - s(\xi_n) \right] - \frac{1}{2} \sum_{n,m} w_{nm}\, Q_m\, s''(c_{nm})\, (\xi_m - \xi_n)^2.$$
Here the first sum vanishes again: after exchanging the indices $m$ and $n$ in the first summand, the result is proportional to the total balance expression, which is zero for the stationary distribution. With positive transition rates, $w_{nm} > 0$, the approach to the stationary distribution,
$$\frac{d\rho}{dt} = -\frac{1}{2} \sum_{n,m} w_{nm}\, Q_m\, s''(c_{nm})\, (\xi_m - \xi_n)^2 \le 0,$$
is hence proven for all core functions with $s'' > 0$. We note that we never used the detailed balance condition for the transition rates, only the vanishing of the total balance, which defines the stationary distribution.
This proof, without recalling the detailed balance condition as Boltzmann’s famous H-theorem did, is quite general. Any core function with a positive second derivative and the scaling trace form co-act to ensure the correct change in time. By using the traditional choice, $s(\xi) = -\ln \xi$, we have $s'(\xi) = -1/\xi$ and $s''(\xi) = 1/\xi^2 > 0$, satisfying indeed all requirements. The integrated entropic divergence formula (no symmetrization) in this case is given as the Kullback–Leibler divergence:
$$\rho_{\rm KL}[P,Q] = \sum_n Q_n \ln \frac{Q_n}{P_n}.$$
There is a rationale behind using the logarithm function. It is the only one being additive for the product form of its argument, mapping factorizing, and hence statistically independent, distributions onto an additive entropic divergence kernel: for factorizing $P$ and $Q$ the ratio also factorizes, $\xi = \xi_a\, \xi_b$, therefore we need $s(\xi_a \xi_b) = s(\xi_a) + s(\xi_b)$. Aiming at this additivity, the solution is $s(\xi) = c \ln \xi$. For $s'' > 0$ it must be $c < 0$, so without restricting generality one chooses $c = -1$.
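The shrinking of this divergence under a linear master equation is easy to illustrate numerically. The sketch below is our own toy example (the rate matrix, time step, and state-space size are arbitrary choices): it draws a positive rate matrix $w_{nm}$, obtains the stationary distribution as the null vector of the corresponding generator, integrates the master equation with a simple Euler step, and checks that the Kullback–Leibler divergence defined above decreases monotonically:

    import numpy as np

    N = 5
    rng = np.random.default_rng(1)
    w = rng.random((N, N))              # w[n, m]: transition rate from state m to state n
    np.fill_diagonal(w, 0.0)

    G = w - np.diag(w.sum(axis=0))      # generator: dP/dt = G @ P, columns sum to zero

    # stationary distribution Q: null vector of G, normalized to one
    vals, vecs = np.linalg.eig(G)
    Q = np.abs(np.real(vecs[:, np.argmin(np.abs(vals))]))
    Q /= Q.sum()

    def rho_KL(P, Q):
        # entropic divergence of Section 2: sum_n Q_n ln(Q_n / P_n)
        return np.sum(Q * np.log(Q / P))

    P = np.full(N, 1.0 / N)             # start from the uniform distribution
    dt, history = 1e-3, []
    for _ in range(20000):
        history.append(rho_KL(P, Q))
        P = P + dt * (G @ P)

    # the divergence shrinks monotonically towards zero
    assert all(a >= b - 1e-12 for a, b in zip(history, history[1:]))
    print(history[0], history[-1])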
Finally, we would like to treat this entropic divergence as an entropy difference. This is achieved when comparing the stationary distribution to the uniform distribution, $U_n = 1/N$. Using the above Kullback–Leibler divergence formula one easily derives
$$\rho_{\rm KL}[U,Q] = \ln N - S[Q],$$
with
$$S[Q] = -\sum_n Q_n \ln Q_n$$
being the Boltzmann–Gibbs–Planck–Shannon entropy formula. From the Jensen inequality it follows that $\rho_{\rm KL}[U,Q] \ge 0$, so $S[Q] \le \ln N$.
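As a quick check of this identity (our illustration, reusing N, Q, and rho_KL from the previous sketch):

    # divergence from the uniform distribution versus the Shannon entropy of Q
    U = np.full(N, 1.0 / N)
    S = -np.sum(Q * np.log(Q))
    assert np.isclose(rho_KL(U, Q), np.log(N) - S)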
3. Entropic Divergence Evolution for Nonlinear Master Equations
Detailed balance is also not needed for a more general dynamics. We consider Markovian dynamics, with a master equation nonlinear in the distribution $P_n(t)$, as
$$\dot{P}_n = \sum_m \left[ w_{nm}\, a(P_m) - w_{mn}\, a(P_n) \right],$$
with $a(P)$ a nonlinear function of the state probability. The stationarity condition defines the distribution $Q_n$ via
$$\sum_m \left[ w_{nm}\, a(Q_m) - w_{mn}\, a(Q_n) \right] = 0.$$
The entropic distance formula is sought for in the trace form (but this time without the scaling assumption),
$$\rho[P,Q] = \sum_n c_n(P_n);$$
the dependence on $Q_n$ is fixed by requiring $c_n(Q_n) = 0$ together with the identification made below. The change of the entropic divergence in this case is given by
$$\frac{d\rho}{dt} = \sum_n c_n'(P_n)\, \dot{P}_n = \sum_{n,m} c_n'(P_n) \left[ w_{nm}\, a(P_m) - w_{mn}\, a(P_n) \right],$$
with $\xi_n = a(P_n)/a(Q_n)$. We again put $a(P_m) = a(Q_m)\, \xi_m$ and split $\xi_m = \xi_n + (\xi_m - \xi_n)$ in the first summand:
$$\frac{d\rho}{dt} = \sum_{n,m} \xi_n\, c_n'(P_n) \left[ w_{nm}\, a(Q_m) - w_{mn}\, a(Q_n) \right] + \sum_{n,m} w_{nm}\, a(Q_m)\, c_n'(P_n)\, (\xi_m - \xi_n).$$
In order to use the remainder theorem one has to identify
$$c_n'(P_n) = s'\!\left( \frac{a(P_n)}{a(Q_n)} \right).$$
This ensures $d\rho/dt \le 0$ for any core function with $s'' > 0$ and for positive transition rates $w_{nm} > 0$.
We examine the example of the q-Kullback–Leibler or Rényi divergence. Starting with the classical logarithmic kernel, $s(\xi) = -\ln \xi$, we have $c_n'(P_n) = -a(Q_n)/a(P_n)$. Now having a nonlinear stochastic dynamics, $a(P) = P^q$, the integrated entropic divergence formula (without symmetrization) delivers the Tsallis divergence [6,7,8],
$$\rho_q[P,Q] = -\sum_n Q_n \ln_q\!\left( \frac{P_n}{Q_n} \right),$$
with
$$\ln_q(x) = \frac{x^{1-q} - 1}{1-q}$$
being the so-called deformed logarithm with the real parameter q.
We again would like to interpret this entropic divergence as an entropy difference. The entropic divergence of the stationary distribution from the uniform distribution, $U_n = 1/N$, is given by
$$\rho_q[U,Q] = N^{q-1} \left( \ln_q N - S_q[Q] \right),$$
with
$$S_q[Q] = \frac{1}{q-1} \left( 1 - \sum_n Q_n^{\,q} \right)$$
being the Tsallis entropy formula. From the Jensen inequality, it follows that $\rho_q[U,Q] \ge 0$, so $S_q[Q] \le \ln_q N$, i.e., the Tsallis entropy formula is also maximal for the uniform distribution. The factor $N^{q-1}$ signifies non-extensivity, a dependence on the number of states in the relation between the entropic divergence and the relative Tsallis entropy.
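The deformed logarithm and the divergence constructed from it are easy to verify numerically. The following Python sketch (our own illustration; the distributions and the value q = 1.3 are arbitrary) checks that the formula reduces to the Kullback–Leibler expression of Section 2 as q approaches 1 and that the relation to the Tsallis entropy written above, including the N^(q-1) factor, holds:

    import numpy as np

    def ln_q(x, q):
        # deformed logarithm: ln_q(x) = (x**(1-q) - 1) / (1 - q); tends to ln(x) as q -> 1
        return (x**(1.0 - q) - 1.0) / (1.0 - q)

    def rho_q(P, Q, q):
        # entropic divergence of Section 3: -sum_n Q_n ln_q(P_n / Q_n)
        return -np.sum(Q * ln_q(P / Q, q))

    def S_q(Q, q):
        # Tsallis entropy: (1 - sum_n Q_n**q) / (q - 1)
        return (1.0 - np.sum(Q**q)) / (q - 1.0)

    rng = np.random.default_rng(2)
    N, q = 6, 1.3
    Q = rng.random(N); Q /= Q.sum()
    P = rng.random(N); P /= P.sum()
    U = np.full(N, 1.0 / N)

    # q -> 1 reproduces the Kullback-Leibler formula of Section 2
    assert np.isclose(rho_q(P, Q, 1.0 + 1e-6), np.sum(Q * np.log(Q / P)), rtol=1e-3)
    # divergence from the uniform distribution vs. Tsallis entropy, with the N**(q-1) prefactor
    assert np.isclose(rho_q(U, Q, q), N**(q - 1.0) * (ln_q(N, q) - S_q(Q, q)))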
4. Master Equation for Unidirectional Growth and Reset
With the particular choice of the transition rates, $w_{nm} = \mu_m\, \delta_{n,m+1} + \gamma_m\, \delta_{n,0}$, one describes a local growth process augmented with direct resetting transitions from any state to the ground state, labeled by the index zero [9]. The corresponding master equation,
$$\dot{P}_n = \mu_{n-1}\, P_{n-1} - \left( \mu_n + \gamma_n \right) P_n \qquad (n \ge 1),$$
is terminated at $n = 0$ (there is no feeding term from below the ground state), and the equation for the $n = 0$ state takes care of the normalization conservation:
$$\dot{P}_0 = -\left( \mu_0 + \gamma_0 \right) P_0 + \sum_m \gamma_m\, P_m.$$
For the stationary distribution one obtains
$$Q_n = Q_0 \prod_{j=1}^{n} \frac{\mu_{j-1}}{\mu_j + \gamma_j},$$
and $Q_0$ has to be obtained from the normalization, $\sum_n Q_n = 1$.
Table 1 summarizes some well known probability density functions, PDFs $Q_n$, which emerge as stationary distributions of this simplified stochastic dynamics upon different choices of the growth and reset rates, $\mu_n$ and $\gamma_n$. In the continuous limit we obtain
$$\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\left[ \mu(x)\, P(x,t) \right] - \gamma(x)\, P(x,t),$$
supplemented by the boundary feeding condition $\mu(0)\, P(0,t) = \int_0^\infty \gamma(x)\, P(x,t)\, dx$, with the stationary distribution
$$Q(x) = \frac{\mu(0)\, Q(0)}{\mu(x)}\, \exp\!\left( -\int_0^x \frac{\gamma(u)}{\mu(u)}\, du \right).$$
Finally we derive a bound for the entropy production in the continuous model of unidirectional growth with resetting.
First we study the time evolution of the ratio, $\xi(x,t) = P(x,t)/Q(x)$. Using $P(x,t) = \xi(x,t)\, Q(x)$ we get from Equation (25):
$$Q(x)\, \frac{\partial \xi}{\partial t} = -\frac{\partial}{\partial x}\left[ \mu(x)\, Q(x)\, \xi \right] - \gamma(x)\, Q(x)\, \xi.$$
Using the same equation for the stationary $Q(x)$ and dividing by $Q$ we obtain
$$\frac{\partial \xi}{\partial t} = -\mu(x)\, \frac{\partial \xi}{\partial x}.$$
Now we turn to the evolution of the entropic divergence,
$$\rho(t) = \int_0^\infty Q(x)\, \sigma\!\left( \xi(x,t) \right) dx.$$
With the symmetrized kernel, $\sigma(\xi) = s(\xi) + \xi\, s(1/\xi)$, one gets, using the advection equation above and an integration by parts, the following distance evolution, considering the boundary conditions $\mu(0)\, P(0,t) = \int_0^\infty \gamma(x)\, P(x,t)\, dx$ and $\mu(0)\, Q(0) = \int_0^\infty \gamma(x)\, Q(x)\, dx$:
$$\frac{d\rho}{dt} = \mu(0)\, Q(0)\, \sigma\!\left( \xi(0,t) \right) - \int_0^\infty \gamma(x)\, Q(x)\, \sigma\!\left( \xi(x,t) \right) dx.$$
We note that for the Kullback–Leibler divergence the following symmetrized kernel function has to be used: $\sigma(\xi) = -\ln \xi + \xi \ln \xi = (\xi - 1) \ln \xi$; it leads to $\sigma(1) = 0$ being the only minimum and in this way ensures $\sigma(\xi) \ge 0$.
In order to obtain a lower bound for the speed of the approach to stationarity, we use again the Jensen inequality for the convex kernel $\sigma(\xi)$:
$$\int_0^\infty w(x)\, \sigma\!\left( \xi(x,t) \right) dx \;\ge\; \sigma\!\left( \int_0^\infty w(x)\, \xi(x,t)\, dx \right),$$
with any arbitrary weight $w(x) \ge 0$ satisfying $\int_0^\infty w(x)\, dx = 1$. For our purpose we choose $w(x) = \gamma(x)\, Q(x) / \langle \gamma \rangle_Q$, with $\langle \gamma \rangle_Q = \int_0^\infty \gamma(x)\, Q(x)\, dx = \mu(0)\, Q(0)$. This leads to the following result:
$$-\frac{d\rho}{dt} = \langle \gamma \rangle_Q \left[ \int_0^\infty w(x)\, \sigma\!\left( \xi(x,t) \right) dx - \sigma\!\left( \xi(0,t) \right) \right] \;\ge\; 0,$$
since by the boundary conditions $\int_0^\infty w(x)\, \xi(x,t)\, dx = \xi(0,t)$.
Note that the controlling quantity is actually the expectation value of the resetting rate, $\langle \gamma \rangle_Q$. Since $\sigma(\xi)$ reaches its minimum, with the value zero, only at the argument 1, the entropic divergence stops changing only if the stationary distribution is achieved. In all other cases it shrinks.
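To illustrate the discrete growth and reset dynamics of this section, here is a small Python sketch (our own example; the constant rates, the truncation of the state space, and the step size are arbitrary choices). It integrates the master equation written above, compares the long-time result with the stationary product formula, and shows the geometric (exponential) decay obtained for constant rates:

    import numpy as np

    K = 60                        # truncated state space n = 0 .. K-1
    mu = np.full(K, 1.0)          # constant growth rates mu_n
    gamma = np.full(K, 0.2)       # constant reset rates gamma_n

    def rhs(P):
        # growth and reset master equation:
        #   dP_n/dt = mu_{n-1} P_{n-1} - (mu_n + gamma_n) P_n   for n >= 1
        #   dP_0/dt = -(mu_0 + gamma_0) P_0 + sum_m gamma_m P_m
        dP = np.empty_like(P)
        dP[1:] = mu[:-1] * P[:-1] - (mu[1:] + gamma[1:]) * P[1:]
        dP[0] = -(mu[0] + gamma[0]) * P[0] + np.sum(gamma * P)
        return dP

    # stationary distribution from the product formula Q_n = Q_0 prod mu_{j-1}/(mu_j + gamma_j)
    Q = np.ones(K)
    for n in range(1, K):
        Q[n] = Q[n - 1] * mu[n - 1] / (mu[n] + gamma[n])
    Q /= Q.sum()

    P = np.full(K, 1.0 / K)       # arbitrary initial condition
    dt = 1e-3
    for _ in range(200000):       # integrate to t = 200, well past the relaxation time ~ 1/gamma
        P = P + dt * rhs(P)

    print(np.max(np.abs(P - Q)))  # the evolved P approaches the stationary Q (up to truncation effects)
    # for constant rates the stationary distribution is geometric: Q_{n+1}/Q_n = mu/(mu + gamma)
    print(P[5] / P[4], mu[0] / (mu[0] + gamma[0]))

The truncation at n = K introduces a tiny probability leak at the last state, which is negligible here because the stationary distribution decays fast; a larger K or a reflecting last state would remove it.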
5. Conclusions
Summarizing, in this paper we have presented a construction strategy for the entropic distance formula, designed to shrink for a wide class of stochastic dynamics. The entropy formula itself was then derived from inspecting this distance between the uniform distribution and the stationary PDF of the corresponding master equation. In this way, for linear master equations the well-known Kullback–Leibler definition arises, while for a nonlinear dependence on the occupation probabilities one always arrives at an accordingly modified expression. In particular, for a general power-like dependence the Tsallis q-entropy occurs as the “natural” relative entropy interpretation of the proper entropic divergence. In the continuous version of the growth and reset master equation, a dissipative probability flow supported by an inflow at the boundary, a lower bound was given for the shrinking speed of the symmetrized entropic divergence using the Jensen inequality.
To finish this paper we would like to make some remarks on real-world applications of the above discussed mathematical treatment. Among possible applications of the growth and resetting model, we mention the network degree distributions showing exponential behavior for constant rates and a Tsallis–Pareto distribution [10] (in the discrete version a Waring distribution [11,12]) for having a linear preference in the growth rate, $\mu_n$. For high energy particle abundance (hadron multiplicity) distributions the negative binomial PDF is an excellent approximation [13], when both rates, $\mu_n$ and $\gamma_n$, are linear functions of the state label. For middle and small settlement size distributions a log-normal PDF arises, achievable with a linear growth rate, $\mu_n$, and a logarithmic reset rate, $\gamma_n$. Citations of scientific papers and Facebook shares and likes also follow a scaling Tsallis–Pareto distribution [14,15], characteristic of constant resetting and linear growth rates. While wealth seems to be distributed according to a Pareto-law tail, the middle class incomes rather show a gamma distribution, stemming from linear reset and growth rates. For a review of such applications see our forthcoming work.