*Proceeding Paper* **Legendre Transformation and Information Geometry for the Maximum Entropy Theory of Ecology †**

**Pedro Pessoa**

Department of Physics, University at Albany (SUNY), Albany, NY 12222, USA; ppessoa@albany.edu

† Presented at the 40th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, online, 4–9 July 2021.

**Abstract:** Here I investigate some mathematical aspects of the maximum entropy theory of ecology (METE). In particular, I address the geometric structure endowed on METE by information geometry. As a novel result, the macrostate entropy is calculated analytically as the Legendre transformation of the log-normalizer in METE. This result allows for the calculation of the metric terms in the information geometry arising from METE and, consequently, of the covariance matrix between METE variables.

**Keywords:** METE; metabolic rate distributions; information geometry; Legendre transformation; Lambert W function

#### **1. Introduction**

The method of maximum entropy (MaxEnt) is usually associated with Jaynes' work [1–3] connecting statistical physics and the information entropy proposed by Shannon [4], although its mathematics has been known since Gibbs [5]. It consists of selecting probability distributions by maximizing a functional—namely entropy—usually under a set of expected value constraints, arriving at what is known as Gibbs distributions. Since Shore and Johnson [6], MaxEnt has been understood as a general method for inference—see also [7–9]—hence it is not surprising that (i) Gibbs distributions are what is known in statistical theory as the exponential family—the only distributions for which sufficient statistics exist (see e.g., [10]), (ii) MaxEnt encompasses the methods of Bayesian statistics [11], and (iii) MaxEnt has found successful applications in several fields of science (e.g., [12–22]).

One of the scientific fields in which MaxEnt has been successfully applied is macroecology. The work of Harte and collaborators [23–27] presents what is known as the maximum entropy theory of ecology (METE). It consists of finding, through MaxEnt, a joint conditional distribution for the abundance of a species and the metabolic rate of its individuals. From the marginalization and expected values of the MaxEnt distribution, it is possible to obtain (i) the species abundance distribution (Fisher's log series), (ii) the species-area distribution, (iii) the distribution for metabolic rates over individuals, and (iv) the relationship between the metabolic rate of individuals in a species and that species' abundance—for a comprehensive comparison of METE with experimental data see [28]. In a recent article, Harte [29] brings forward the need for dynamical models based on MaxEnt, as METE assumes the variables to be static. It is relevant to say that Jaynes applied dynamical methods based on information theory to nonequilibrium statistical mechanics [30], leading to what is known as maximum caliber [31,32]. However, maximum caliber assumes a Hamiltonian dynamics and, therefore, does not generalize to ecology and other complex systems.

The field known as information geometry (IG) [33–36] assigns a Riemannian geometric structure to probability distributions. In information geometry the distances are given by the Fisher–Rao information metric (FRIM) [37,38], which is the only metric in accordance with the grouping property of probability distributions [39]. IG has found important applications for probabilistic dynamical systems [34,40–43]. Here the FRIM terms for the distributions arising from METE will be calculated. In a future publication I will evolve METE into entropic dynamical models for ecology, as explained in [43]; in order to do so, it is necessary to calculate the macrostate entropy and the FRIM terms—which can be obtained by differentiating the macrostate entropy. Therefore, the present article performs the calculations necessary for an entropic dynamics model for macroecology.

**Citation:** Pessoa, P. Legendre Transformation and Information Geometry for the Maximum Entropy Theory of Ecology. *Phys. Sci. Forum* **2021**, *3*, 1. https://doi.org/10.3390/psf2021003001

Academic Editors: Wolfgang von der Linden and Sascha Ranftl

Published: 3 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The layout of the paper is as follows: The following section (2) presents MaxEnt in general terms, followed by the MaxEnt procedure in METE. In particular, we obtain the macrostate entropy through the Legendre transformation and the Lambert W special function [44,45], which is a novel result to the best of my knowledge. Section 3 presents some general results of IG and calculates the information metric terms for METE. Section 4 concludes the present article by commenting on possible applications and perspectives for IG in a dynamical theory of macroecology.

#### **2. Maximum Entropy**

In information theory, probability distributions encode the available information about a system's variables $x \in \mathcal{X}$. MaxEnt consists of updating from a prior distribution $q(x)$—usually, but not necessarily, taken to be uniform—to a posterior $\rho(x)$ that maximizes the entropy functional under a set of constraints meant to represent the known information about the system. Usually these constraints are the expected values $A^i$ of a set of real-valued functions $\{a^i(x)\}$, namely the sufficient statistics. The distribution $\rho$ is found as the solution to the following optimization problem

$$\max\_{\rho} \qquad H[\rho] = -\int \text{d}x \,\rho(x) \log \left(\frac{\rho(x)}{q(x)}\right),\tag{1a}$$

$$\text{s.t.}\qquad\int \text{d}x\,\rho(x) = 1\tag{1b}$$

$$\int \text{d}x \, a^i(x)\rho(x) = A^i, \tag{1c}$$

where $\int \text{d}x$ refers to the appropriate measure on the set $\mathcal{X}$: if one is interested in a discrete set $\mathcal{X} = \{x_\mu\}$, where $\mu$ corresponds to an enumeration of $\mathcal{X}$, we have $\int \text{d}x = \sum_\mu$; if one is interested in a continuous subset of real variables, e.g., $\mathcal{X} = [a,b]$, we have $\int \text{d}x = \int_a^b \text{d}x$.

The solution of (1) is the Gibbs distribution

$$\rho(\mathbf{x}|\lambda\_1, \lambda\_2, \dots, \lambda\_n) = \frac{q(\mathbf{x})}{Z(\lambda)} \exp\left(-\sum\_{i=1}^n \lambda\_i a^i(\mathbf{x})\right),\tag{2}$$

where *<sup>λ</sup>* <sup>=</sup> {*λi*} is the set of Lagrange multipliers dual to the expected values *<sup>A</sup>* <sup>=</sup> {*A<sup>i</sup>* } and *Z*(*λ*) is a normalization factor given by

$$Z(\lambda) = \int \text{d}x \, q(\mathbf{x}) \exp\left(-\lambda\_i a^i(\mathbf{x})\right) \,. \tag{3}$$

Above, and in the remainder of this article, we use Einstein's summation notation, $A_i B^i = \sum_i A_i B^i$. The expected values can be recovered as

$$A^i = -\frac{1}{Z} \frac{\partial Z}{\partial \lambda\_i} = \frac{\partial F}{\partial \lambda\_i}, \quad \text{where} \quad F(\lambda) \doteq -\log(Z(\lambda))\,. \tag{4}$$

We will refer to *F* as the log-normalizer, which displays a role similar to free energy in statistical mechanics.

If one is able to invert the equations arriving from (4), obtaining in this way $\lambda_i(A)$, they can express the probability distributions in terms of the expected values, $\rho(x|A) = \rho(x|\lambda(A))$. This also allows one to calculate the entropy $H$ at its maximum—that means $H[\rho(x|A)]$ for $\rho$ in (2)—as a function of the expected values, rather than a functional of $\rho$, obtaining

$$H(A) \doteq H[\rho(\mathbf{x}|\lambda(A))] = -\int d\mathbf{x} \,\rho(\mathbf{x}|\lambda(A)) \log \frac{\rho(\mathbf{x}|\lambda(A))}{q(\mathbf{x})} = \lambda\_i(A)A^i - F(\lambda(A))\,. \tag{5}$$

We will refer to $H(A)$ as the macrostate entropy, which is what we refer to in statistical mechanics as the thermodynamical entropy—meaning the one that appears in the laws of thermodynamics. (Since the arguments that identify the macrostate entropy as the thermodynamical entropy assume that the sufficient statistics are conserved quantities in a Hamiltonian dynamics [2], analogous 'laws of thermodynamics'—e.g., conservation of $A^2$ in (12) or the impossibility of $H$ in (15) decreasing—are not expected in ecological systems.) One can see from (5) that $H(A)$ is the Legendre transformation [46] of $F(\lambda)$. It also follows that $\lambda_i = \partial H / \partial A^i$.
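As a concrete illustration of Eqs. (2)–(5), the sketch below (a minimal toy example of my own, not part of METE) builds a Gibbs distribution for a hypothetical four-state system with a single sufficient statistic $a(x) = x$ and an arbitrary multiplier, then checks numerically that $A = \partial F / \partial \lambda$ and that the maximized entropy equals the Legendre transform $\lambda A - F$:

```python
import numpy as np

# Hypothetical toy system: x in {0,1,2,3}, uniform prior q, one statistic a(x) = x
x = np.arange(4)
q = np.full(4, 0.25)
lam = 0.7  # an arbitrary Lagrange multiplier, chosen for illustration

Z = np.sum(q * np.exp(-lam * x))      # normalizer, Eq. (3)
rho = q * np.exp(-lam * x) / Z        # Gibbs distribution, Eq. (2)
F = -np.log(Z)                        # log-normalizer, Eq. (4)
A = np.sum(x * rho)                   # expected value <a(x)>

# Check A = dF/dlambda by central finite differences, Eq. (4)
eps = 1e-6
F_p = -np.log(np.sum(q * np.exp(-(lam + eps) * x)))
F_m = -np.log(np.sum(q * np.exp(-(lam - eps) * x)))
assert abs((F_p - F_m) / (2 * eps) - A) < 1e-8

# Check the Legendre transformation, Eq. (5): H(A) = lambda A - F
H_direct = -np.sum(rho * np.log(rho / q))
assert abs(H_direct - (lam * A - F)) < 1e-12
```

The same identities hold for any number of sufficient statistics; the one-statistic case merely keeps the check short.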

#### *METE*

The first step towards a MaxEnt description involves choosing the appropriate variables for the problem at hand. In METE [24] one assumes an ecosystem of *S* species supporting *N* individuals with a total metabolic rate *E*, meaning in a unit of time the ecosystem consumes a quantity *E* of energy. The state of the system $x$ in MaxEnt is defined, for a single species, as the number of individuals (abundance) $n$, $n \in \{1, 2, \ldots, N\}$, and the metabolic rate $\varepsilon$ of an individual of that species, $\varepsilon \in [1, E]$—note that one can choose a system of units so that the smallest metabolic rate is the unit, $\varepsilon_{min} = 1$. We represent the state as $x = (n, \varepsilon)$.

The second step consists of assigning the sufficient statistics that appropriately capture the information about the system. In METE [24] the statistics chosen are the number of individuals in the species, $a^1(n,\varepsilon) \doteq n$, and the total metabolic rate, $a^2(n,\varepsilon) \doteq n\varepsilon$. Substituting these into the expected value constraints for the sufficient statistics (1), we obtain a constraint on the average abundance per species

$$A^1 = \sum\_{n=1}^{N} \int\_1^E \text{d}\varepsilon \, n \, \rho(n, \varepsilon | \lambda) = \frac{N}{S} \doteq N',\tag{6}$$

and a constraint on the average metabolic consumption per species

$$A^2 = \sum\_{n=1}^{N} \int\_1^E \text{d}\varepsilon \; n\varepsilon \,\rho(n, \varepsilon | \lambda) = \frac{E}{S} \doteq E'. \tag{7}$$

The defined variables $N'$ and $E'$ will replace $A^1$ and $A^2$, respectively, when convenient.

Having the state variables and the sufficient statistics chosen, we can compute all quantities defined in the previous subsection for the specific system defined by METE. With a uniform prior *q*, justified by the fact that at its level of complexity organisms should be considered as distinguishable, this leads to the canonical distribution (2) of the form

$$\rho(n, \varepsilon | \lambda) = \frac{1}{Z(\lambda)} e^{-\lambda\_1 n} e^{-\lambda\_2 n \varepsilon} \,, \tag{8}$$

where the normalization factor (3) is given by

$$Z(\lambda) = \sum\_{n=1}^{N} \int\_{1}^{E} \text{d}\varepsilon \, e^{-\lambda\_{1}n} e^{-\lambda\_{2}n\varepsilon} = \sum\_{n=1}^{N} e^{-\lambda\_{1}n} \left(\frac{e^{-\lambda\_{2}n} - e^{-\lambda\_{2}nE}}{\lambda\_{2}n}\right),\tag{9}$$

from which the expected values (4) can be calculated as

$$A^1 = N' = \frac{1}{\lambda\_2 \ Z(\lambda)} \sum\_{n=1}^{N} e^{-\lambda\_1 n} \left( e^{-\lambda\_2 n} - e^{-\lambda\_2 nE} \right), \tag{10a}$$

$$A^2 = E' = \frac{1}{\lambda\_2} \left[ 1 + \frac{1}{Z(\lambda)} \sum\_{n=1}^{N} e^{-\lambda\_1 n} (e^{-\lambda\_2 n} - E e^{-\lambda\_2 nE}) \right]. \tag{10b}$$

These are complicated equations; however, some approximations may make them more tractable.

A fair assumption, knowing what the variables are supposed to represent, is that there are far more individuals than species, $N \gg S$, and that the average metabolic rate per individual is far greater than the unit of metabolic rate, $E/N = E'/N' \gg 1$. This allows for a sequence of approximations that we will treat as assumptions here, namely (i) $e^{-\lambda_2 n E} \ll e^{-\lambda_2 n}$, (ii) $E e^{-\lambda_2 n E} \ll e^{-\lambda_2 n}$, (iii) $\lambda_1 + \lambda_2 \ll 1$, and (iv) $e^{-(\lambda_1 + \lambda_2)N} \ll 1$. Further explanation on the validity of these assumptions, under $S \ll N \ll E$, can be seen in [24,26] and their confirmation by numerical calculation can be seen in [24]. Under this understanding we can substitute (9) into (10a), obtaining

$$N' = \frac{\sum\_{n=1}^{N} e^{-\lambda\_1 n} (e^{-\lambda\_2 n} - e^{-\lambda\_2 nE})}{\sum\_{n=1}^{N} \frac{1}{n} e^{-\lambda\_1 n} (e^{-\lambda\_2 n} - e^{-\lambda\_2 nE})} \approx \frac{\sum\_{n=1}^{N} e^{-(\lambda\_1 + \lambda\_2)n}}{\sum\_{n=1}^{N} \frac{1}{n} e^{-(\lambda\_1 + \lambda\_2)n}} \tag{11a}$$

$$N' \approx -\left[\frac{1}{\left(\lambda\_1 + \lambda\_2\right)\log\left(\lambda\_1 + \lambda\_2\right)}\right].\tag{11b}$$

We can also rewrite (10b) obtaining

$$E' = \frac{1}{\lambda\_2} + \frac{\sum\_{n=1}^{N} e^{-\lambda\_1 n} (e^{-\lambda\_2 n} - E e^{-\lambda\_2 nE})}{\sum\_{n=1}^{N} \frac{1}{n} e^{-\lambda\_1 n} \left(e^{-\lambda\_2 n} - e^{-\lambda\_2 nE}\right)} \approx \frac{1}{\lambda\_2} + N'. \tag{12}$$
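A quick numerical sanity check of these approximations is straightforward. The sketch below uses hypothetical values $S = 20$, $N = 1000$, $E = 10^4$ (so that $S \ll N \ll E$) and trial multipliers of plausible magnitude (not fitted to any data), comparing the exact sums (9) and (10) against the approximate relations (11b) and (12):

```python
import numpy as np

# Hypothetical ecosystem with S << N << E
S, N, E = 20, 1000, 10_000
lam2 = 1.0 / (E / S - N / S)   # trial multiplier, cf. the scale of Eq. (13)
lam1 = 0.0013                  # trial multiplier of plausible magnitude

n = np.arange(1, N + 1)
inner = np.exp(-lam1 * n) * (np.exp(-lam2 * n) - np.exp(-lam2 * n * E))
Z = np.sum(inner / (lam2 * n))                                  # Eq. (9)
A1 = np.sum(inner) / (lam2 * Z)                                 # Eq. (10a)
A2 = (1.0 + np.sum(np.exp(-lam1 * n)
      * (np.exp(-lam2 * n) - E * np.exp(-lam2 * n * E))) / Z) / lam2  # Eq. (10b)

# Approximation (11b): N' ~ -1/[(lam1+lam2) log(lam1+lam2)]
b = lam1 + lam2
N_approx = -1.0 / (b * np.log(b))
assert abs(N_approx - A1) / A1 < 0.08

# Approximation (12): E' ~ 1/lam2 + N'
assert abs((1.0 / lam2 + A1) - A2) / A2 < 0.01
```

At these values the $E$-dependent exponentials are utterly negligible, so (12) is nearly exact, while (11b) carries a few percent of error from the geometric-sum approximations.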

In order to obtain the macrostate entropy (5) analytically, one needs to perform the Legendre transformation for METE, which includes inverting (11) and (12) to obtain $\lambda_1(N', E')$ and $\lambda_2(N', E')$. On page 149 of [24] this inversion is said to be unfeasible. However, it is possible, obtaining

$$\lambda\_1 = \beta(N') - \frac{1}{E' - N'}, \quad \text{and} \quad \lambda\_2 = \frac{1}{E' - N'}\,,\tag{13}$$

where

$$\beta(N') \doteq -\left[N'\,\mathcal{W}\_{-1}\left(-\frac{1}{N'}\right)\right]^{-1}, \qquad \dot{\beta}(N') \doteq \frac{\mathrm{d}\beta}{\mathrm{d}N'} = \left[N'^2 - \frac{N'}{\beta(N')}\right]^{-1}, \tag{14}$$

and $\mathcal{W}_{-1}$ refers to the secondary real branch of the Lambert W function (see [44,45]). The details on how (13) inverts (11) and (12) are presented in Appendix A. The macrostate entropy can be calculated directly from (5) as

$$H(N',E') = N'\beta(N') + \log\left(E' - N'\right) - \log\left(N'\beta(N')\right) + 1\,\,. \tag{15}$$

With the calculation of the macrostate entropy finished, we can move into a geometric description of METE.
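These closed-form expressions are straightforward to evaluate numerically. The sketch below (with hypothetical values $N' = 50$ and $E' = 500$) computes $\beta$ via `scipy.special.lambertw` on the $\mathcal{W}_{-1}$ branch, the multipliers (13), and the macrostate entropy (15), and verifies that $\beta$ inverts (11b):

```python
import numpy as np
from scipy.special import lambertw

def beta(Np):
    # Eq. (14): beta(N') = -[N' W_{-1}(-1/N')]^{-1}, W_{-1} branch (k=-1)
    return -1.0 / (Np * lambertw(-1.0 / Np, k=-1).real)

def multipliers(Np, Ep):
    # Eq. (13)
    lam2 = 1.0 / (Ep - Np)
    return beta(Np) - lam2, lam2

def macrostate_entropy(Np, Ep):
    # Eq. (15)
    b = beta(Np)
    return Np * b + np.log(Ep - Np) - np.log(Np * b) + 1.0

Np, Ep = 50.0, 500.0           # hypothetical N' = N/S and E' = E/S
b = beta(Np)
lam1, lam2 = multipliers(Np, Ep)

# beta = lam1 + lam2 inverts Eq. (11b): N' = -1/(beta log beta)
assert np.isclose(lam1 + lam2, b)
assert np.isclose(-1.0 / (b * np.log(b)), Np)

H = macrostate_entropy(Np, Ep)
```

Note that $\lambda_1 + \lambda_2 = \beta \ll 1$ at these values, consistent with assumption (iii) above.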

#### **3. Information Geometry**

This section presents the elementary notions of IG—for more in-depth discussion and examples see, e.g., [33–36]—and some useful identities for the IG of Gibbs distributions. IG consists of assigning a Riemannian geometric structure to the space of probability distributions, meaning that if a set of distributions $P(x|\theta)$ is parametrized by a finite number of coordinates, $\theta = \{\theta^i\}$, the distance $\mathrm{d}\ell$—which is a measure of distinguishability—between the neighbouring distributions $P(x|\theta + \mathrm{d}\theta)$ and $P(x|\theta)$ is given by $\mathrm{d}\ell^2 = g_{ij}\,\mathrm{d}\theta^i \mathrm{d}\theta^j$. The work of Cencov [39] demonstrated that the only metric invariant under Markov embeddings—and, therefore, the only one adequate to represent a space of probability distributions—is the metric of the form

$$g\_{ij} = \int \text{d}x \, P(x|\theta) \frac{\partial \log P(x|\theta)}{\partial \theta^{i}} \frac{\partial \log P(x|\theta)}{\partial \theta^{j}} \,, \tag{16}$$

known as the FRIM.

Considering the MaxEnt results presented in the previous section, we can restrict our investigation to the Gibbs distributions, using the expected values $A$ as coordinates, $\theta^i = A^i$, and $P(x|\theta) = \rho(x|A)$ as in (2). Two useful expressions arise in this case—for proofs see, e.g., [33]. First, the metric terms are the Hessian of the negative of the macrostate entropy, meaning

$$g\_{ij} = -\frac{\partial^2 H}{\partial A^i \partial A^{j}} = -\frac{\partial \lambda\_i}{\partial A^{j}} \, , \tag{17}$$

and second, the covariance matrix between the sufficient statistics $a^i(x)$ is the inverse matrix of $g_{ij}$, meaning

$$C^{ij}g\_{jk} = \delta^i\_k \, , \quad \text{where} \quad C^{ij} = \left\langle a^i(x)a^j(x) \right\rangle - A^i A^j \,. \tag{18}$$

We can, then, see how these quantities are calculated for METE.

#### *Information Geometry of METE*

By substituting the macrostate entropy for METE (15) in (17) we obtain the FRIM terms:

$$\begin{split} g\_{11} &= -\dot{\beta}(N') + \frac{1}{(E'-N')^2}, \quad g\_{12} = g\_{21} = -\frac{1}{(E'-N')^2},\\ g\_{22} &= \frac{1}{(E'-N')^2}, \quad \text{and} \quad g = -\frac{\dot{\beta}(N')}{(E'-N')^2}.\end{split} \tag{19}$$

where $g = \det g_{ij}$. Per (18), and from the general form of the inverse of a two-dimensional matrix, the covariance matrix terms can be calculated by directly inverting (19), obtaining

$$\begin{aligned} C^{11} &= \frac{g\_{22}}{g} = \frac{N'}{\beta(N')} - N'^2, \quad C^{12} = C^{21} = -\frac{g\_{12}}{g} = \frac{N'}{\beta(N')} - N'^2, \\ \text{and} \quad C^{22} &= \frac{g\_{11}}{g} = E'^2 - 2E'N' + \frac{N'}{\beta(N')} \end{aligned} \tag{20}$$

completing the calculation. The matrix $C^{ij}$ can be interpreted directly as the covariance between a species' abundance and its total metabolic rate—the METE sufficient statistics. The information metric terms presented in (19) allow for further studies on dynamical ecology from an information-theory background, as we comment in the following section.
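As a numerical check of Eqs. (17)–(20) (a sketch reusing the hypothetical values $N' = 50$, $E' = 500$), one can build the FRIM matrix (19) and the covariance matrix (20) and verify that they are mutual inverses, as (18) requires:

```python
import numpy as np
from scipy.special import lambertw

def beta(Np):
    # Eq. (14), W_{-1} branch
    return -1.0 / (Np * lambertw(-1.0 / Np, k=-1).real)

def beta_dot(Np):
    # Eq. (14): derivative of beta with respect to N'
    return 1.0 / (Np**2 - Np / beta(Np))

def frim(Np, Ep):
    # Metric terms, Eq. (19)
    c = 1.0 / (Ep - Np) ** 2
    return np.array([[-beta_dot(Np) + c, -c],
                     [-c,                 c]])

def covariance(Np, Ep):
    # Covariance matrix, Eq. (20)
    v = Np / beta(Np) - Np**2                    # C^11 = C^12 = C^21
    C22 = Ep**2 - 2 * Ep * Np + Np / beta(Np)
    return np.array([[v, v],
                     [v, C22]])

Np, Ep = 50.0, 500.0   # hypothetical values
g = frim(Np, Ep)
C = covariance(Np, Ep)

assert np.allclose(C @ g, np.eye(2))   # Eq. (18): C is the inverse of g
assert np.linalg.det(g) > 0            # g = -beta_dot/(E'-N')^2 with beta_dot < 0
```

Since $\dot\beta(N') < 0$ for $N' > e$, the determinant $g = -\dot\beta/(E'-N')^2$ is positive, so the metric (and hence the covariance matrix) is well behaved at these values.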

#### **4. Discussion and Perspectives**

The present article calculates the macrostate entropy (15) for METE. This was made possible by the analytical calculation of the Lagrange multipliers (13) as functions of the expected values (10), previously believed to be unfeasible. This allows for a complete description of METE in terms of the average abundance $N'$ and the expected metabolic rate $E'$ of each of the ecosystem's species, opening a broad range of investigations possible by analytical calculations. In particular, the IG arising from METE is presented by calculating the FRIM terms in (19). Independently of any geometric interpretation, this was equivalent to calculating the covariance between the METE sufficient statistics (20).

The variables that define an ecosystem's state are not expected to remain constant. Because of this, and because of the growing relevance of IG in dynamical systems, the calculations made in the present article are an important step in expanding maximum entropy ideas into further investigation in macroecology. The calculations done here allow for evolving METE into an entropic dynamics for ecology, as in the framework developed in [43]; this avenue of research will be explored in a future publication.

**Institutional Review Board Statement:** This study did not involve humans or other animals.

**Informed Consent Statement:** This study did not involve humans.

**Data Availability Statement:** No new data were created or analyzed in this study. Data sharing is not applicable to this article.

**Acknowledgments:** I would like to thank A. Caticha, J. Harte, E.A. Newman, and C. Camargo for insightful discussions.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A. On the Lambert W Function**

In this appendix we will explain how (13) inverts (11) and (12). The Lambert W function is defined as the solution of

$$\mathcal{W}(\mathbf{x})e^{\mathcal{W}(\mathbf{x})} = \mathbf{x} \,. \tag{A1}$$

The Python library SciPy [47] implements the numerical calculation of $\mathcal{W}$. This relates to (11b) in the following manner: by defining the variable $\beta = \lambda_1 + \lambda_2$ we obtain

$$\frac{1}{N'} = -\beta \log \beta \iff \frac{1}{\beta N'} e^{-\frac{1}{\beta N'}} = \frac{1}{N'} \, , \tag{A2}$$

hence $\beta = -\left[N'\,\mathcal{W}\left(-\frac{1}{N'}\right)\right]^{-1}$. It is relevant to say that, per (A1), $\mathcal{W}(x)$ is multivalued, hence the terminology Lambert W 'function' is used loosely. The several single-valued functions that solve (A1) are known as the different 'branches' of the Lambert W. In (13) and (14) only the $\mathcal{W}_{-1}$ branch was taken into account. Given our object of study, we restrict to branches that are guaranteed to give a $\beta$ that is real for large $N'$. As explained in [44], the two branches $\mathcal{W}_0(x)$ and $\mathcal{W}_{-1}(x)$ are real and analytic for $-e^{-1} < x < 0$, or equivalently, $\beta$ is real for $N' > e$, which is coherent with the fact that (11) was derived for large $N'$.

Figure A1 presents the graphs of $\beta$ obtained from the $\mathcal{W}_0(x)$ and $\mathcal{W}_{-1}(x)$ branches, as well as a comparison to the $\beta$ obtained numerically by inverting (11a). Even though, per (A2), the $\beta$ obtained from either branch inverts (11b), it can be seen from Figure A1 that only the one obtained from $\mathcal{W}_{-1}(x)$ approximates the inverse of (11a) for large $N'$ and, therefore, it is the only one appropriate for the present investigation.
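The branch comparison of Figure A1 can be reproduced numerically. The sketch below (using the figure's value $S = 20$ with a hypothetical $N' = 50$) inverts (11a) by root finding and checks that the $\mathcal{W}_{-1}$ branch, not the principal branch $\mathcal{W}_0$, approximates that inverse:

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

S, Np = 20, 50.0
N = int(S * Np)   # total individuals, since S = N/N'

def rhs_11a(b):
    # Right-hand side of Eq. (11a) as a function of beta = lambda_1 + lambda_2
    n = np.arange(1, N + 1)
    w = np.exp(-b * n)
    return np.sum(w) / np.sum(w / n)

# beta_i: numerical inverse of (11a)
b_num = brentq(lambda b: rhs_11a(b) - Np, 1e-8, 1.0)

# beta from the two real branches of the Lambert W
b_m1 = -1.0 / (Np * lambertw(-1.0 / Np, k=-1).real)  # W_{-1}
b_0 = -1.0 / (Np * lambertw(-1.0 / Np, k=0).real)    # W_0

# Only the W_{-1} branch approximates the numerical inverse
assert abs(b_m1 - b_num) < abs(b_0 - b_num)
assert abs(b_m1 - b_num) / b_num < 0.2
```

The $\mathcal{W}_0$ result is off by orders of magnitude here, while the $\mathcal{W}_{-1}$ result lands within a few percent of the numerical inverse, mirroring the behaviour shown in Figure A1.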

To complete the claim that $\lambda_1$ and $\lambda_2$ in (13) are calculated analytically, it is relevant to say that $\mathcal{W}_{-1}\left(-\frac{1}{N'}\right)$ can be calculated using the series expansion (see page 153 in [44])

$$\mathcal{W}\_{-1}\left(-\frac{1}{N'}\right) = -\sum\_{m=0}^{\infty} a\_m z^m \,, \quad \text{where} \quad z = \sqrt{2(\log N' - 1)} \,, \tag{A3}$$

and *am* is defined recursively as *a*<sup>0</sup> = 1, *a*<sup>1</sup> = 1, and

$$a\_m = \frac{1}{m+1} \left( a\_{m-1} - \sum\_{k=2}^{m-1} k \, a\_k \, a\_{m+1-k} \right) . \tag{A4}$$

Note that a real $z$ implies $N' > e$, which is coherent with the condition for $\mathcal{W}_{-1}$ to be real.
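The recursion (A4) is simple to implement. The sketch below sums the series (A3) for a hypothetical $N' = 5$ (safely above $e$, where the series converges quickly) and compares the result against `scipy.special.lambertw`:

```python
import numpy as np
from scipy.special import lambertw

def w_m1_series(Np, M=30):
    # Series (A3) for W_{-1}(-1/N'), coefficients a_m from the recursion (A4)
    a = [1.0, 1.0]                     # a_0 = a_1 = 1
    for m in range(2, M):
        s = sum(k * a[k] * a[m + 1 - k] for k in range(2, m))
        a.append((a[m - 1] - s) / (m + 1))
    z = np.sqrt(2.0 * (np.log(Np) - 1.0))
    return -sum(a[m] * z**m for m in range(M))

Np = 5.0
approx = w_m1_series(Np)
exact = lambertw(-1.0 / Np, k=-1).real
assert abs(approx - exact) < 1e-3 * abs(exact)
```

The first coefficients generated by (A4) are $a_2 = 1/3$, $a_3 = 1/36$, $a_4 = -1/270$, and the partial sums settle to the SciPy value within a handful of terms at this $N'$.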

**Figure A1.** Graphical comparison between the functions defined as $\beta_0(N') \doteq -\left[N'\,\mathcal{W}_0\left(-\frac{1}{N'}\right)\right]^{-1}$, $\beta_{-1}(N') \doteq -\left[N'\,\mathcal{W}_{-1}\left(-\frac{1}{N'}\right)\right]^{-1}$, and $\beta_i(N')$—obtained numerically by inverting (11a), here using $S = N/N' = 20$. $\mathcal{W}_0$ and $\mathcal{W}_{-1}$ have complex values for $N' < e$; the graph above plots only the real part in that region.

#### **References**

