Abstract
Statistical counting ad infinitum is the holographic observable to a statistical dynamics with finite states under independent and identically distributed N sampling. Entropy provides the infinitesimal probability for an observed empirical frequency $\boldsymbol{\nu}$ with respect to a probability prior $\mathbf{p}$, when $\boldsymbol{\nu} \neq \mathbf{p}$ as $N \to \infty$. Following Callen’s postulate and through the Legendre–Fenchel transform, without help from mechanics, we show that an internal energy $\mathbf{u}$ emerges; it provides a linear representation of real-valued observables with full or partial information. Gibbs’ fundamental thermodynamic relation and theory of ensembles follow mathematically. $\mathbf{u}$ is to $\boldsymbol{\nu}$ what chemical potential $\mu$ is to particle number $N$ in Gibbs’ chemical thermodynamics, what $1/T$ is to internal energy $U$ in classical thermodynamics, and what $\omega$ is to $t$ in Fourier analysis.
1. Introduction
It is a pleasure to be a part of this celebration for Signe Kjelstrup. She has made significant contributions to nonequilibrium thermodynamics, in both theory and applications that include electrochemistry, transport in heterogeneous media, and T. L. Hill’s small systems [1,2,3]. In this work we extend Gibbs’ and Hill’s approach to equilibrium thermodynamics [4] and show that the new logical path via the “crucial step” advocated in [5] is in fact a consequence of a limit theorem [6] in the mathematical theory of probability [7,8]. The results perfectly fit P. W. Anderson’s notion of emergent phenomenon [9].
Sometimes a mathematical transform provides a fundamental concept beyond being a mere technique for solving a problem, one through which a new representation of a natural phenomenon emerges. A case in point is the Fourier transform (FT), which leads to the theory of harmonics in musical instruments [10] and to the very concept of an optical spectrum. FT represents a function of time $f(t)$ in terms of $e^{i\omega t}$, where $\omega$ is introduced as a novel notion: the temporal frequency of a sinusoidal oscillatory component in time [11]. The solutions to a large class of problems in differential calculus involving $t$ can be expressed very efficiently through FT.
We show in the present paper that the fundamental notion of internal energy is a concept that can be understood, and generalized, in statistical counting. This notion first appeared in the theory of thermodynamics in the 19th century and was collectively developed by J. R. von Mayer, W. Rankine, R. Clausius, and W. Thomson, among many others [12]. The transformation in question is the Legendre–Fenchel transform (LFT) [13,14], a more refined mathematical formulation of the traditional Legendre transform [15].
When a simple statistical analysis is carried out on a set of data, correlated or not, it is usually supposed that they are from an identical probability distribution. One of the best understood systems that exhibit an invariant probability is the ergodic dynamical system [16]. The ergodic theory of classical Hamiltonian dynamics has been an intense research area in both physics and mathematics for more than a century [17,18]. Even when the data are from seemingly different “objects”, say different individuals within a biological species, it is understood that an ergodic mating or mutational process is behind the statistical practice; and the conclusions drawn are most meaningful in this regard. Such an ergodic stochastic dynamic perspective has transformed cell biology through the notion of phenotypic switching in recent years [19].
2. Energetic Theory of Statistical Counting
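Let us consider repeated statistical samples ad infinitum of a system with a finite state space $\mathscr{S} = \{0, 1, \dots, n\}$. In the present work we restrict our discussion to independent and identically distributed (i.i.d.) samples; more general sampling of Markov data will be published elsewhere. Consider the number counting $\mathbf{m} = (m_0, \dots, m_n)$, with $N = \sum_{k=0}^{n} m_k$, and the counting frequency $\nu_k = m_k/N$, which is not to be confused with the temporal frequency $\omega$ in FT above. They have a neg-entropy function that is homogeneous of degree 1 with respect to a given probability prior $\mathbf{p} = (p_0, \dots, p_n)$ [8,20]:
$$\Phi(m_0,\dots,m_n) = \sum_{k=0}^{n} m_k \ln\frac{m_k}{p_k N} = N \sum_{k=0}^{n} \nu_k \ln\frac{\nu_k}{p_k} = N\,\Phi(\boldsymbol{\nu}). \tag{1}$$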
Appendix A provides the mathematical origin of the non-negative $\Phi$ as a result of statistical counting. In information theory, it is interpreted as the “surprise” in observing $\boldsymbol{\nu}$ under the assumption $\mathbf{p}$ [21,22]. It is a double-edged sword. It measures the rareness of $\boldsymbol{\nu}$ (or $\mathbf{m}$) with respect to $\mathbf{p}$, or equally the error of a model $\mathbf{p}$ with respect to the empirical $\boldsymbol{\nu}$. The Kolmogorov probability has two rather different roles, one in statistical inference and one in statistical physics. In the former, it has been identified as only half of probability theory, the other “half of probability theory as it is needed in current applications—the principles for assigning probabilities by logical analysis of incomplete information—is not present at all in the Kolmogorov system” [20].
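As a quick numerical check of (1), the sketch below evaluates $\Phi$ for a hypothetical prior and hypothetical counts on a three-state space. It confirms three properties: $\Phi$ is non-negative, it is homogeneous of degree 1, and it vanishes when the frequencies equal the prior. The function name `neg_entropy_counts` and all numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

def neg_entropy_counts(counts, p):
    """Phi(m) = sum_k m_k ln(m_k / (p_k N)) with N = sum_k m_k, using 0 ln 0 = 0."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(p, dtype=float)
    total = counts.sum()
    mask = counts > 0                      # states never observed contribute nothing
    return float(np.sum(counts[mask] * np.log(counts[mask] / (p[mask] * total))))

p = np.array([0.5, 0.3, 0.2])              # hypothetical prior on S = {0, 1, 2}
counts = np.array([40, 35, 25])            # hypothetical counts, N = 100

phi = neg_entropy_counts(counts, p)
phi_scaled = neg_entropy_counts(3 * counts, p)       # scale all counts by lambda = 3
phi_at_prior = neg_entropy_counts([50, 30, 20], p)   # frequencies equal to the prior

print(phi >= 0)                            # True: non-negative
print(np.isclose(phi_scaled, 3 * phi))     # True: degree 1 homogeneous
print(np.isclose(phi_at_prior, 0.0))       # True: zero when nu = p
```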
The application of modern probability to statistical physics involves the limit of samples ad infinitum, represented by $N \to \infty$ [6]. In this case, the probability $\mathbf{p}$ applies to all the systems with the same state space $\mathscr{S}$; it is not meant to be realistic for any particular system. It simply provides a “metric” under which each and every particular system has its own representation in terms of its complete information, $\boldsymbol{\nu}$. The function $\Phi$ is introduced to further gauge the differences among systems with different $\boldsymbol{\nu}$’s on the same $\mathscr{S}$; it becomes a “theory of everything.” Because of the limit, there are no uncertainties in $\boldsymbol{\nu}$; it is a definitive characterization of an i.i.d. statistical distribution with state space $\mathscr{S}$.
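Therefore, statistical inference is about the mathematical model of a particular system, and statistical physics is about the mathematical representation of all systems with the same $\mathscr{S}$ under the supposition of i.i.d. data ad infinitum. The entropy in (1) is an emergent characterization in the limit $N \to \infty$, with the starting point in terms of generative models [9]. It provides the relationship between $\mathbf{p}$ and $\boldsymbol{\nu}$ in the sampling process. It is an Eulerian homogeneous function of degree 1 in $\mathbf{m}$: $\Phi(\lambda\mathbf{m}) = \lambda\Phi(\mathbf{m})$ for $\lambda > 0$. This fits naturally with the fundamental thermodynamic postulate formulated by H. B. Callen [23]. Let $\Delta = \{\boldsymbol{\nu} : \nu_k \ge 0,\ \sum_k \nu_k = 1\}$. The LFT of $\Phi$ as a function of the normalized $\boldsymbol{\nu} \in \Delta$ then yields [13,14,24]:
$$\psi(\mathbf{u}) = \inf_{\boldsymbol{\nu} \in \Delta}\Big\{\sum_{k=0}^{n} u_k\nu_k + \Phi(\boldsymbol{\nu})\Big\} = -\ln\sum_{k=0}^{n} p_k e^{-u_k}, \tag{2a}$$
with the corresponding optimal
$$\nu_k^*(\mathbf{u}) = \frac{\partial\psi(\mathbf{u})}{\partial u_k} = \frac{p_k e^{-u_k}}{\sum_{i=0}^{n} p_i e^{-u_i}}. \tag{2b}$$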
Note that the second equality in (2b) is obtained when one uses calculus to solve the infimum in (2a); this recovers the traditional Legendre transform. Normalizing $\boldsymbol{\nu}$ to $\sum_k \nu_k = 1$ induces a gauge freedom in (2), namely an arbitrary additive constant $c$ in $\mathbf{u}$. The reason is that $\psi(\mathbf{u} + c) = \psi(\mathbf{u}) + c$, while $\boldsymbol{\nu}^*$ is unchanged. In statistical thermodynamics, the conjugate variable $u_k$ introduced in Equation (2) has been interpreted as the internal energy of state $k$, in units of $k_BT$ [25]. Then $\sum_k u_k\nu_k$ is the mean internal energy of “the statistical system”.
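The closed form in (2a) and the gauge freedom can be checked numerically. The sketch below assumes NumPy together with a hypothetical three-state prior and energy vector, and performs three checks. First, it compares $\mathbf{u}\cdot\boldsymbol{\nu}^* + \Phi(\boldsymbol{\nu}^*)$ with $-\ln\sum_k p_k e^{-u_k}$. Second, it confirms that randomly drawn points of the simplex never fall below that value. Third, it verifies that shifting every $u_k$ by a constant shifts $\psi$ by the same constant and leaves $\boldsymbol{\nu}^*$ unchanged. The function names are illustrative.

```python
import numpy as np

def Phi(nu, p):
    """Neg-entropy per sample, sum_k nu_k ln(nu_k / p_k), with 0 ln 0 = 0."""
    nu = np.asarray(nu, dtype=float)
    mask = nu > 0
    return float(np.sum(nu[mask] * np.log(nu[mask] / p[mask])))

def psi(u, p):
    """Closed form of the LFT (2a): psi(u) = -ln sum_k p_k exp(-u_k)."""
    return float(-np.log(np.sum(p * np.exp(-u))))

def nu_star(u, p):
    """Optimal frequency (2b): nu*_k = p_k exp(-u_k) / sum_i p_i exp(-u_i)."""
    w = p * np.exp(-u)
    return w / w.sum()

p = np.array([0.5, 0.3, 0.2])      # hypothetical prior
u = np.array([0.0, 1.0, -0.5])     # hypothetical internal energies, in k_B T units

# The infimum in (2a) is attained at nu* ...
attained = float(u @ nu_star(u, p)) + Phi(nu_star(u, p), p)
# ... and no randomly sampled point of the simplex does better.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=10_000)
best_sampled = min(float(u @ nu) + Phi(nu, p) for nu in samples)

# Gauge freedom: u -> u + c shifts psi by c and leaves nu* unchanged.
c = 2.7
print(np.isclose(attained, psi(u, p)), best_sampled >= psi(u, p) - 1e-12)   # True True
print(np.isclose(psi(u + c, p), psi(u, p) + c),
      np.allclose(nu_star(u + c, p), nu_star(u, p)))                      # True True
```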
In a real-world laboratory working on a particular system, each $m_k$ tends to infinity as $N \to \infty$, but $\nu_k = m_k/N$ converges to an intrinsic property of the statistical system. In statistical inference, the assumed prior $\mathbf{p}$ is then expected to be replaced by the observed, real posterior $\boldsymbol{\nu}$, according to conditional probability and/or Bayesian statistical logic [24,25]. This concludes the statistical investigation of the particular system with respect to the type of observations. The neg-entropy function in (1) actually provides a meta-statistical theory for all possible observed $\boldsymbol{\nu}$, assessing their respective infinitesimal probabilities (rates) with respect to the prior $\mathbf{p}$ (see Appendix A).
3. Maximization of Entropy Under Constraint by Empirical Mean Value
However, the complete counting for the entire state space in terms of empirical frequencies $\nu_0, \dots, \nu_n$ is only a gedankenexperiment. The significance of Gibbs’ ensemble theory is in dealing with observations from a small set of real-valued observables $g_1(k), \dots, g_J(k)$, where $k \in \mathscr{S}$ but $J < n$. These $g$’s are random variables on the state space $\mathscr{S}$. In fact, their empirical mean values are linear combinations of the $\nu_k$:
$$x_j = \sum_{k=0}^{n} g_j(k)\,\nu_k, \qquad j = 1, \dots, J. \tag{3}$$
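To fix notation, we append $g_0(k) \equiv 1$ and $x_0 = 1$, which represent the fact that $\boldsymbol{\nu}$ is always normalized. We denote by $G$ the $(J+1)\times(n+1)$ matrix with elements
$$G_{jk} = g_j(k), \qquad 0 \le j \le J,\ 0 \le k \le n, \tag{4}$$
so that (3) reads $\mathbf{x} = G\boldsymbol{\nu}$ with $\mathbf{x} = (x_0, x_1, \dots, x_J)$.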
Equation (3) shows the following. If all the $g$’s are linearly independent and $J = n$, then $G$ is an invertible square matrix, and one can solve for the normalized $\boldsymbol{\nu}$ uniquely from each set of $x$’s: $\boldsymbol{\nu} = G^{-1}\mathbf{x}$. We refer to such a set of observables as holographic, with full information. In the following discussion, we shall always imagine $g_1, \dots, g_J$ as the first $J$ components of a holographic observable $(g_1, \dots, g_n)$. When $J < n$, there is missing information [20,22,25].
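Here is a minimal illustration of a holographic set of observables. It assumes a three-state space $\mathscr{S} = \{0,1,2\}$ (so $n = 2$) and the hypothetical observables $g_1(k) = k$ and $g_2(k) = k^2$. Together with $g_0 \equiv 1$, these form an invertible (Vandermonde) matrix $G$, so the frequencies can be recovered exactly from the mean values.

```python
import numpy as np

k = np.arange(3)                       # S = {0, 1, 2}, so n = 2
G = np.vstack([np.ones(3),             # g_0(k) = 1, encoding normalization (x_0 = 1)
               k,                      # g_1(k) = k    (hypothetical observable)
               k**2])                  # g_2(k) = k^2  (hypothetical observable)

nu = np.array([0.2, 0.5, 0.3])         # a hypothetical empirical frequency
x = G @ nu                             # Equation (3): x_j = sum_k g_j(k) nu_k
nu_recovered = np.linalg.solve(G, x)   # nu = G^{-1} x, possible because J = n

print(x)                               # mean values (x_0, x_1, x_2)
print(np.allclose(nu_recovered, nu))   # True
```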
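Suppose a set of observed values $x_1, \dots, x_J$ is in hand, with $J < n$. The maximum entropy principle (MEP) from classical thermodynamics [23] and the contraction principle from the mathematical theory of probability [8] both assert that the most probable $\boldsymbol{\nu}^*$ consistent with these $x$’s corresponds to minimum neg-entropy:
$$\varphi_J(x_1, \dots, x_J) = \min_{\boldsymbol{\nu} \in \Delta}\Big\{\Phi(\boldsymbol{\nu}) : \sum_{k=0}^{n} g_j(k)\nu_k = x_j,\ j = 1, \dots, J\Big\}. \tag{5}$$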
The entire Gibbs’ ensemble theory arises in solving the mathematical problem posed in Equation (5) through LFT. See Appendix A for its origin.
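To make (5) concrete, the following sketch sets up a three-state space with the hypothetical prior $\mathbf{p} = (0.5, 0.3, 0.2)$. It uses a single observable $g_1(k) = k$ (so $J = 1 < n = 2$) and an observed mean $x_1 = 1$. The sketch assumes the exponential-family (Gibbs) form of the minimizer discussed in Section 4 and finds its single parameter by bisection. It then checks that no other frequency on the same constraint set has a smaller $\Phi$. For this choice, the parameter can also be computed by hand: $\mu_1 = -\tfrac12\ln 2.5 \approx -0.458$.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical prior on S = {0, 1, 2}
g1 = np.array([0.0, 1.0, 2.0])     # single observable g_1(k) = k, so J = 1 < n = 2
x1 = 1.0                           # hypothetical observed mean value of g_1

def Phi(nu):
    """Neg-entropy sum_k nu_k ln(nu_k / p_k) for strictly positive nu."""
    return float(np.sum(nu * np.log(nu / p)))

def gibbs(mu1):
    """Exponential-family candidate: nu_k proportional to p_k exp(-mu1 g_1(k))."""
    w = p * np.exp(-mu1 * g1)
    return w / w.sum()

# The mean of g_1 under gibbs(mu1) decreases monotonically in mu1, so bisect.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if gibbs(mid) @ g1 > x1:
        lo = mid                   # mean too large: increase mu1
    else:
        hi = mid
mu1 = 0.5 * (lo + hi)
nu_opt = gibbs(mu1)

# Any other nu with the same normalization and mean differs by t*d,
# where d is orthogonal to both g_0 = (1, 1, 1) and g_1 = (0, 1, 2).
d = np.array([1.0, -2.0, 1.0])
leaf = [nu_opt + t * d for t in np.linspace(-0.15, 0.15, 61)]
leaf = [v for v in leaf if np.all(v > 0)]

print(mu1)                                                # about -0.458
print(np.isclose(nu_opt @ g1, x1))                        # constraint satisfied
print(min(Phi(v) for v in leaf) >= Phi(nu_opt) - 1e-12)   # nu_opt minimizes Phi on the leaf
```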
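Entropy functions for different observables are different. First, for invertible $G$ (i.e., $J = n$), one has the entropy function for the holographic observable $\mathbf{x} = (x_0, x_1, \dots, x_n)$:
$$\tilde\varphi(\mathbf{x}) = \Phi\big(G^{-1}\mathbf{x}\big). \tag{6}$$
This is simply a change in the independent variables from $\boldsymbol{\nu}$ to $\mathbf{x}$. In terms of this entropy function $\tilde\varphi$, (5) then becomes
$$\varphi_J(x_1, \dots, x_J) = \min_{x_{J+1}, \dots, x_n} \tilde\varphi(1, x_1, \dots, x_n). \tag{7}$$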
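The LFT is intimately related to the generating function of a probability distribution. It provides a powerful mathematical transform of the entropy functions $\Phi(\boldsymbol{\nu})$, $\tilde\varphi(\mathbf{x})$, and $\varphi_J(x_1, \dots, x_J)$ in terms of their conjugates in the energy representation. Parallel to $\psi(\mathbf{u})$ in (2), and with $x_0 = 1$ throughout, are
$$\tilde\psi(\mu_0, \dots, \mu_n) = \inf_{x_1, \dots, x_n}\Big\{\sum_{j=0}^{n}\mu_j x_j + \tilde\varphi(\mathbf{x})\Big\}, \tag{8}$$
$$\psi_J(\mu_0, \dots, \mu_J) = \inf_{x_1, \dots, x_J}\Big\{\sum_{j=0}^{J}\mu_j x_j + \varphi_J(x_1, \dots, x_J)\Big\}. \tag{9}$$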
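These $\psi$’s are now related through a linear transformation:
$$\tilde\psi(\mu_0, \dots, \mu_n) = \psi\big(G^{\mathsf{T}}\boldsymbol{\mu}\big), \qquad u_k = \sum_{j=0}^{n}\mu_j g_j(k), \tag{10}$$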
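and through a projection:
$$\psi_J(\mu_0, \dots, \mu_J) = \tilde\psi(\mu_0, \dots, \mu_J, 0, \dots, 0) = \psi(\mathbf{u}), \qquad u_k = \sum_{j=0}^{J}\mu_j g_j(k), \tag{11a}$$
$$x_j = \frac{\partial\psi_J}{\partial\mu_j} = \sum_{k=0}^{n} g_j(k)\,\nu_k^*(\mathbf{u}). \tag{11b}$$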
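Finally, since $\varphi_J$ is convex, the inverse LFT yields
$$\varphi_J(x_1, \dots, x_J) = \sup_{\mu_0, \dots, \mu_J}\Big\{\psi_J(\mu_0, \dots, \mu_J) - \sum_{j=0}^{J}\mu_j x_j\Big\}, \tag{12a}$$
$$-\varphi_J = \sum_{j=0}^{J}\mu_j x_j - \psi_J(\mu_0, \dots, \mu_J), \qquad x_j = \frac{\partial\psi_J}{\partial\mu_j}. \tag{12b}$$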
The optimization in (5) is thus completely “solved” in closed form, through the LFT and its inverse, as a parametric function in terms of $(\mu_1, \dots, \mu_J)$ given in (12).
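The linear transformation (10), the projection (11a), and the derivative relation (11b) can each be checked with a few lines of NumPy. The sketch assumes a hypothetical three-state example with holographic observables $g_0 = 1$, $g_1(k) = k$, and $g_2(k) = k^2$. The function names `psi_tilde` and `psi_J` are illustrative, and the derivative is approximated by a central finite difference.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])                  # hypothetical prior, S = {0, 1, 2}
k = np.arange(3.0)
G = np.vstack([np.ones(3), k, k**2])           # holographic set: g_0 = 1, g_1 = k, g_2 = k^2

def psi(u):
    """psi(u) = -ln sum_k p_k exp(-u_k), the LFT of Phi in (2a)."""
    return float(-np.log(np.sum(p * np.exp(-u))))

def psi_tilde(mu):
    """Equation (10): psi~(mu_0, ..., mu_n) = psi(G^T mu)."""
    return psi(G.T @ mu)

def psi_J(mu_partial):
    """Equation (11a): pad (mu_0, ..., mu_J) with mu_{J+1} = ... = mu_n = 0."""
    mu = np.zeros(G.shape[0])
    mu[: len(mu_partial)] = mu_partial
    return psi_tilde(mu)

mu = np.array([0.0, 0.4])                      # (mu_0, mu_1), i.e., J = 1
u = mu[0] + mu[1] * k                          # u_k = sum_{j <= J} mu_j g_j(k)
nu_star = p * np.exp(-u) / np.sum(p * np.exp(-u))

# Equation (11b): x_1 = d psi_J / d mu_1 equals the mean of g_1 under nu*.
h = 1e-6
dpsi = (psi_J(mu + np.array([0.0, h])) - psi_J(mu - np.array([0.0, h]))) / (2 * h)
print(np.isclose(psi_J(mu), psi(u)))           # True: projection reproduces psi(u)
print(np.isclose(dpsi, nu_star @ k))           # True: derivative equals the mean value x_1
```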
The equation in (12) should be recognized as a generalization of the celebrated “entropy = mean internal energy − free energy”, where
$$\sum_{j=0}^{J}\mu_j x_j = \sum_{k=0}^{n} u_k\,\nu_k^* \tag{13}$$
is the mean value of $u$, following Equation (11b), and each $x_j$ has $\mu_j$ as its conjugate variable. It seems natural to identify $u_k = \mu_0 + \sum_{j=1}^{J}\mu_j g_j(k)$ in (11a) with the first law of thermodynamics as formulated by Gibbs.
The $\mathbf{u}$ in (11a) has a very clear thermodynamic interpretation. The conjugate variables $\mu_j$ are the partial derivatives of the entropy function $-\tilde\varphi$ with respect to $x_j$. Finding the $x$’s with maximum entropy in Equation (7) is therefore simply setting the corresponding $\mu_{J+1} = \cdots = \mu_n = 0$, i.e., letting the entropic forces be zero. For each independent observable $g_j$, $\mu_j$ is its “custom-designed” conjugate force. It contributes a term $\mu_j g_j(k)$ to the internal energy as the “thermodynamic work” associated with $x_j$. The internal energy is thus a highly flexible, adaptive representation of the observables. When $J = n$, $\mathbf{u} = G^{\mathsf{T}}\boldsymbol{\mu}$, and Equation (10) provides a complete “detailing” of the internal energy in terms of a set of holographic observables. MEP is for missing information [20].
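Below is a numerical sketch of the relation “entropy = mean internal energy − free energy” in (12). It assumes a hypothetical prior, the single observable $g_1(k) = k$, and an arbitrarily chosen $(\mu_0, \mu_1)$. Here the entropy is $-\Phi(\boldsymbol{\nu}^*)$, measured relative to the prior, and the free energy is $\psi_J = -\ln\sum_k p_k e^{-u_k}$ in units of $k_BT$.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical prior
g1 = np.array([0.0, 1.0, 2.0])     # observable g_1(k) = k
mu0, mu1 = 0.3, -0.8               # arbitrary point in the energy representation

u = mu0 + mu1 * g1                 # (11a): internal energy of each state
weights = p * np.exp(-u)
nu = weights / weights.sum()       # Gibbs frequency nu*
psi_J = -np.log(weights.sum())     # free energy, in k_B T units
x1 = nu @ g1                       # (11b): mean value of g_1
mean_u = mu0 * 1.0 + mu1 * x1      # sum_j mu_j x_j, with x_0 = 1

entropy = -np.sum(nu * np.log(nu / p))        # -Phi(nu*)
print(np.isclose(mean_u, u @ nu))             # True: sum_j mu_j x_j is the mean of u
print(np.isclose(entropy, mean_u - psi_J))    # True: entropy = mean energy - free energy
```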
4. Gibbs Distribution and Linear Algebraic Representation
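There is a geometric picture associated with the above “thermodynamic analysis”. As we have stated, counting frequency ad infinitum is a fundamental, intrinsic property of an ergodic dynamical system. The space of all possible frequency distributions $\boldsymbol{\nu}$, with $\sum_{k=0}^{n}\nu_k = 1$, is an $n$-dimensional hyperplane in the positive quadrant of $\mathbb{R}^{n+1}$, known as the probability simplex $\Delta$. For a given set of observables $g_1, \dots, g_J$, $\Delta$ is foliated by the leaves $\{\boldsymbol{\nu} \in \Delta : G\boldsymbol{\nu} = \mathbf{x}\}$ with different $\mathbf{x}$. On each leaf of the foliation there is a most probable $\boldsymbol{\nu}^*$. It is located at the tangent point between the $(n-J)$-dimensional leaf and an $(n-1)$-dimensional level set of the function $\Phi$. At this point, $\mathbf{u}$ is the normal vector to the $(n-J)$-leaf in $\mathbb{R}^{n+1}$, and $\boldsymbol{\mu}$ is its projection onto the $J$-manifold of $\mathbf{x}$:
$$\nu_k^* = \frac{p_k\,e^{-\sum_{j=1}^{J}\mu_j g_j(k)}}{\Xi(\mu_1, \dots, \mu_J)}, \qquad u_k = -\ln\frac{\nu_k^*}{p_k} + \text{const} = \sum_{j=0}^{J}\mu_j g_j(k), \tag{14}$$
in which
$$\Xi(\mu_1, \dots, \mu_J) = \sum_{k=0}^{n} p_k\,e^{-\sum_{j=1}^{J}\mu_j g_j(k)}, \qquad \psi_J(\mu_0, \dots, \mu_J) = \mu_0 - \ln\Xi(\mu_1, \dots, \mu_J).$$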
All the other points on the same $(n-J)$-leaf are no longer relevant: they are deemed statistically impossible under the prior $\mathbf{p}$ and the observed $\mathbf{x}$. The foliation therefore represents a partition of $\Delta$ into macro- and micro-worlds. Traversing between different leaves constitutes macroscopic thermodynamic processes that follow $\boldsymbol{\nu}^*(\mathbf{x})$. According to the logic of Bayesian statistics, one should use the most suitable probability frequency distribution $\boldsymbol{\nu}^*$ to update the prior $\mathbf{p}$ for the particular system with observed $\mathbf{x}$. The microscopic world is still random, due to missing information, but its prior is now updated. This is Gibbs’ statistical ensemble.
With a given set of $g_1, \dots, g_J$, $\Delta$ is collapsed into a $J$-manifold in $\mathbb{R}^{n+1}$, which is parametrized by $(x_1, \dots, x_J)$, or equivalently by $(\mu_1, \dots, \mu_J)$. There is no uncertainty in this “macroscopic” description. For a different set of $g$’s and $J$, there will be a different $J$-manifold, so it is desirable to treat different $g$’s through transformations. We note that even though $\Delta$ is a “plane” in $\mathbb{R}^{n+1}$, it is not a linear Euclidean space: for any $\boldsymbol{\nu}, \boldsymbol{\nu}' \in \Delta$, $\boldsymbol{\nu} + \boldsymbol{\nu}' \notin \Delta$. Neither are the $(n-J)$-leaves linear spaces. Both are affine manifolds [26]. Locating $\boldsymbol{\nu}^*$ is a highly nonlinear procedure in the space of energies.
The LFT, in terms of $\mathbf{u}$, $\boldsymbol{\mu}$, $\psi$, etc., enters as a powerful linear algebraic representation of the MEP procedure. The “collapse” of a holographic $\mathbf{x}$ to $(x_1, \dots, x_J)$ with missing information simply means neglecting all the extra dimensions: $\mu_{J+1} = \cdots = \mu_n = 0$. This works because the convexity of $\Phi$ gives a one-to-one relation between $\boldsymbol{\nu}$ and $\mathbf{u}$ under a proper gauge fixing. Moreover, the constraints of MEP in (5) are all linear, because observables are random variables. Hence each $g_j$ determines a one-dimensional linear subspace in the space of $\mathbf{u}$.
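The geometric statement that $\mathbf{u}$ is normal to the leaf at $\boldsymbol{\nu}^*$ can be checked directly. At the Gibbs frequency $\nu_k^* \propto p_k e^{-u_k}$, the gradient of $\Phi$ is $\ln(\nu_k^*/p_k) + 1 = -u_k + \text{const}$. This gradient should be orthogonal to every direction $\mathbf{d}$ with $G\mathbf{d} = \mathbf{0}$, i.e., every direction that keeps $\boldsymbol{\nu}$ on the leaf. The sketch below assumes a hypothetical four-state example with one observable, and uses the singular value decomposition to obtain those tangent directions.

```python
import numpy as np

p = np.array([0.4, 0.3, 0.2, 0.1])             # hypothetical prior, S = {0, 1, 2, 3}, n = 3
k = np.arange(4.0)
G_J = np.vstack([np.ones(4), k])               # rows g_0 = 1 and g_1(k) = k, so J = 1
mu = np.array([0.0, 0.7])                      # (mu_0, mu_1)

u = G_J.T @ mu                                 # u_k = mu_0 + mu_1 g_1(k)
nu = p * np.exp(-u) / np.sum(p * np.exp(-u))   # Gibbs distribution
grad_Phi = np.log(nu / p) + 1.0                # gradient of Phi(nu) = sum_k nu_k ln(nu_k / p_k)

# Tangent directions of the (n - J)-dimensional leaf: the null space of G_J.
_, _, Vt = np.linalg.svd(G_J)
tangent = Vt[G_J.shape[0]:]                    # valid because G_J has full row rank
print(tangent.shape)                           # (2, 4): n - J = 2 directions in R^4
print(np.allclose(tangent @ grad_Phi, 0.0))    # True: u is normal to the leaf
```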
5. Generalized Clausius Inequality
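Combining Equations (9) and (12a) yields a relation resembling Clausius’ inequality. For any $(x_1, \dots, x_J)$ and any $(\mu_0, \dots, \mu_J)$,
$$-\varphi_J(x_1, \dots, x_J) \le \sum_{j=0}^{J}\mu_j x_j - \psi_J(\mu_0, \dots, \mu_J), \qquad x_0 = 1. \tag{15}$$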
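Thermodynamic equilibrium is between the observed mean value $x_j$ and its conjugate “force” $\mu_j$. When the equality holds, there is a relation between $\mathbf{x}$ and $\boldsymbol{\mu}$, given by $x_j = \partial\psi_J/\partial\mu_j$ and $\mu_j = -\partial\varphi_J/\partial x_j$, which should be identified as “the equation of state”. Consider the case $J = 1$, with $g_1$ the energy of each state and $\mu_1$ the inverse temperature. Then the difference between $x_1$ and its equilibrium value can be interpreted as the nonequilibrium heat, and $-\varphi_J$ again as the entropy, so the inequality in (15) becomes Clausius’ inequality.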
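Inequality (15) can be probed numerically. The sketch below assumes a hypothetical prior, the single observable $g_1(k) = k$, and an observed mean $x_1 = 1.3$. It evaluates $-\varphi_J(x_1)$ through the MEP solution of (5). It then checks that the right-hand side of (15) minus the left-hand side is non-negative for many randomly drawn $(\mu_0, \mu_1)$. Finally, it checks that this difference vanishes at the conjugate value of $\mu_1$, where the equation of state holds. The helper names are illustrative.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical prior
g1 = np.array([0.0, 1.0, 2.0])     # observable g_1(k) = k
x1 = 1.3                           # hypothetical observed mean value

def gibbs(mu1):
    """Gibbs frequency nu_k proportional to p_k exp(-mu1 g_1(k))."""
    w = p * np.exp(-mu1 * g1)
    return w / w.sum()

def psi_J(mu0, mu1):
    """psi_J(mu_0, mu_1) = mu_0 - ln sum_k p_k exp(-mu_1 g_1(k))."""
    return mu0 - np.log(np.sum(p * np.exp(-mu1 * g1)))

def conjugate_mu1(x):
    """Bisection for the mu_1 whose Gibbs frequency has mean x (equation of state)."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gibbs(mid) @ g1 > x else (lo, mid)
    return 0.5 * (lo + hi)

mu1_eq = conjugate_mu1(x1)
nu_eq = gibbs(mu1_eq)
entropy = -float(np.sum(nu_eq * np.log(nu_eq / p)))    # -phi_J(x1), via the MEP in (5)

def gap(mu0, mu1):
    """Right-hand side minus left-hand side of (15); it should never be negative."""
    return mu0 * 1.0 + mu1 * x1 - psi_J(mu0, mu1) - entropy

rng = np.random.default_rng(1)
gaps = [gap(m0, m1) for m0, m1 in rng.normal(scale=2.0, size=(2000, 2))]
print(min(gaps) >= -1e-9)                  # True: inequality (15) holds
print(abs(gap(0.0, mu1_eq)) < 1e-9)        # True: equality at the equation of state
```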
6. Generalized Gibbs–Duhem Equation
7. Conclusions
The mathematical theory of probability deals with a set of elementary events $\mathscr{S}$, on which the probability $\mathbf{p}$ and the random variables $g$ are introduced. When this mathematics is applied to the real world, each ergodic dynamical system with state space $\mathscr{S}$ has its own unique steady-state probability distribution. This distribution can be obtained as the limit of $\boldsymbol{\nu}$ from i.i.d. sampling ad infinitum.
Our present theory is to statistical inference, which obtains particular $\boldsymbol{\nu}$’s, what dynamics is to kinematics in classical mechanics [27]. The entropy function in (1) arises in this context as a measure of the quantitative relationship between the assumed “hypothesis” ($\mathbf{p}$) and the observed “data” ($\mathbf{m}$ and $N$), as “missing information” or “surprise” [21,22]. Motivated by the analogy to Fourier analysis, our generalized Gibbs’ theory suggests that the notion of thermo-energetics is a powerful mathematical transformation of the statistical description. $\boldsymbol{\nu}$ and $\mathbf{u}$ are simply two representations of the same physical reality, the former statistical and the latter thermo-energetic. $\mathbf{u}$ is to $\boldsymbol{\nu}$ what chemical potential $\mu$ is to particle number $N$ in J. W. Gibbs’ chemical thermodynamics. With a fixed $\mathbf{p}$, the theory of probability [8] reveals a powerful, dual energetic representation for various different systems with the same state space $\mathscr{S}$, in terms of their respective internal energy functions $\mathbf{u}$ [25]. This fundamental duality between counting frequency and internal energy was, of course, already recognized by L. Boltzmann in the 1880s. At that time he was developing statistical mechanics as a foundation of classical thermodynamics under the principle of equal probability a priori. The present work shows that probability and statistics are fundamental as the foundation of thermodynamics, while mechanics is not necessary. A similar conclusion was reached in the 1925 thesis of L. Szilard [28,29].
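For sufficiently large $N$, the probability of observing a particular $\boldsymbol{\nu}$ is asymptotically zero except for $\boldsymbol{\nu} = \mathbf{p}$. The significance of $\Phi$ is that it provides a “high-resolution magnifying glass” for these asymptotically small probabilities:
$$P_N(\boldsymbol{\nu}) \asymp e^{-N\Phi(\boldsymbol{\nu})}, \quad\text{i.e.,}\quad \lim_{N\to\infty} -\frac{1}{N}\ln P_N(\boldsymbol{\nu}) = \Phi(\boldsymbol{\nu}). \tag{17}$$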
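This is known as the large deviations rate function in the modern theory of probability [8]. The entropy is a function of both $\boldsymbol{\nu}$ and $\mathbf{p}$, and $\Phi(\boldsymbol{\nu}) \ge 0$, with equality if and only if $\boldsymbol{\nu} = \mathbf{p}$. For a given $\mathbf{p}$, it views each possible $\boldsymbol{\nu}$ from a real system as part of an entire class of systems under a common $\mathbf{p}$: a meta-statistics. If one replaces $\mathbf{p}$ with the true steady-state probability of a particular system, then Equation (17) gives the probability distribution of the uncertainties in the measurement from $N$ samples. The second-order Taylor expansion near $\boldsymbol{\nu} = \mathbf{p}$,
$$P_N(\boldsymbol{\nu}) \propto \exp\Big(-\frac{N}{2}\sum_{k=0}^{n}\frac{(\nu_k - p_k)^2}{p_k}\Big), \tag{18}$$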
is the central limit theorem for the statistics of the counting frequency $\boldsymbol{\nu}$, with $\langle\nu_k\rangle = p_k$ and $\langle(\nu_k - p_k)^2\rangle = p_k(1-p_k)/N$. This does not describe the fluctuations within the system itself. Gibbs’ theory of ensembles is about statistical measurements of a whole system, not about the individuals within it.
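The moments quoted above, and the quadratic approximation of $\Phi$ near $\mathbf{p}$, can be checked by simulation. The sketch below assumes NumPy’s `Generator.multinomial` sampler, a hypothetical prior, and $N = 400$ samples per experiment. It also compares $\Phi(\boldsymbol{\nu})$ with its second-order approximation $\sum_k (\nu_k - p_k)^2/(2p_k)$ for a small perturbation of $\mathbf{p}$ whose components sum to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])                    # hypothetical prior
N, experiments = 400, 20_000

# Each row is one experiment of N i.i.d. samples, converted to frequencies nu = m / N.
nu = rng.multinomial(N, p, size=experiments) / N
mean_nu = nu.mean(axis=0)                        # should be close to p
scaled_var = nu.var(axis=0) * N                  # should be close to p (1 - p)
print(mean_nu, scaled_var, p * (1 - p))

# Second-order Taylor expansion of Phi near p (the Gaussian exponent divided by N).
eps = np.array([0.01, -0.004, -0.006])           # small perturbation, components sum to 0
Phi_exact = float(np.sum((p + eps) * np.log((p + eps) / p)))
Phi_quadratic = float(np.sum(eps**2 / (2 * p)))
print(Phi_exact, Phi_quadratic)                  # agree to leading order in eps
```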
We present our theory with a finite state space for mathematical simplicity. Formal generalization to a continuous state space is straightforward if mathematical rigor is not required. Beyond the finite state space, modern probability and the theory of measures are known to encounter challenges; cf. de Finetti’s treatment of infinite sets and the axiom of choice of nonempty subsets [20]. In addition to a continuous $\mathscr{S}$ [30], there are even larger Hilbert spaces of functions on $\mathscr{S}$ and/or von Neumann algebras of operators acting on a Hilbert space.
Funding
This research received no external funding.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Acknowledgments
I thank Jin Feng, Weishi Liu, Zhang-Ju Liu, Bing Miao, Zhongmin Shen, Xiang Tang, Yong-Shi Wu, and particularly Jun Zhang, for many helpful discussions, and the support from Olga Jung Wan Endowed Professorship.
Conflicts of Interest
The author declares no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| LFT | Legendre–Fenchel transform |
| FT | Fourier transform |
Appendix A. Statistical Counting ad Infinitum
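In this section, we provide the mathematical reasoning behind three statements: “entropy provides the infinitesimal probability for an observed frequency $\boldsymbol{\nu}$ with respect to a probability prior $\mathbf{p}$”, “it characterizes the relationship between $\mathbf{p}$ and $\boldsymbol{\nu}$ in a sampling process”, and the origin of the Legendre–Fenchel transform in entropy analysis. The counting of $N$ independent and identically distributed samples with state space $\mathscr{S} = \{0, 1, \dots, n\}$ yields $\mathbf{m} = (m_0, m_1, \dots, m_n)$, an $(n+1)$-tuple of non-negative integers. We call the set of all $\mathbf{m}$ with $\sum_{k=0}^{n} m_k = N$ a simplex for counting. The simplex for counting grows with $N$, which we shall identify as “time”. With a given prior probability $\mathbf{p}$ on $\mathscr{S}$, statistical counting is a Markov process on a growing simplex, with probability
$$P_{N+1}(\mathbf{m}) = \sum_{k=0}^{n} p_k\,P_N(\mathbf{m} - \mathbf{e}_k), \qquad P_0(\mathbf{0}) = 1, \tag{A1}$$
where $P_N = 0$ whenever a component of its argument is negative,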
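and in which $\mathbf{e}_k$ is the unit vector for the $k$th component. One can easily verify that
$$P_N(\mathbf{m}) = \frac{N!}{m_0!\,m_1!\cdots m_n!}\prod_{k=0}^{n} p_k^{m_k}, \qquad \sum_{k=0}^{n} m_k = N,$$
is a solution to (A1).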
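The claim that the multinomial law solves (A1) can be verified exhaustively for small $N$. The sketch below uses only the Python standard library and a hypothetical three-state prior. It enumerates every count vector on the counting simplex up to $N = 7$ and compares both sides of the recursion. The function names are illustrative.

```python
import math
from itertools import product

p = (0.5, 0.3, 0.2)                 # hypothetical prior on S = {0, 1, 2}

def multinomial_pmf(counts, p):
    """Candidate solution P_N(m) = N! / (m_0! ... m_n!) * prod_k p_k^{m_k}."""
    coef = math.factorial(sum(counts))
    for m in counts:
        coef //= math.factorial(m)          # exact integer division at every step
    return coef * math.prod(pk**m for pk, m in zip(p, counts))

def rhs_A1(counts, p):
    """Right-hand side of (A1): sum_k p_k P_N(m - e_k), with P_N = 0 if a component is negative."""
    total = 0.0
    for k, pk in enumerate(p):
        if counts[k] > 0:
            previous = list(counts)
            previous[k] -= 1
            total += pk * multinomial_pmf(previous, p)
    return total

max_err = 0.0
for N in range(1, 8):
    for counts in product(range(N + 1), repeat=len(p)):
        if sum(counts) == N:
            lhs = multinomial_pmf(counts, p)
            max_err = max(max_err, abs(lhs - rhs_A1(counts, p)))
print(max_err < 1e-12)              # True: the multinomial law satisfies (A1)
```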
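One is interested in the limit of counting ad infinitum, in which all the $m_k$ are expected to tend to infinity as $N \to \infty$. On the increasing simplex for $\mathbf{m}$, the probability $P_N(\mathbf{m}) \to 0$. However, the properly normalized $\mathbf{m}/N$ converges, and $P_N$ as a function of $\mathbf{m}/N$ becomes sharper and sharper, concentrating around $\mathbf{p}$. To characterize this limiting situation more precisely, one introduces the counting frequency $\boldsymbol{\nu} = \mathbf{m}/N$. The space of $\boldsymbol{\nu}$’s is then called the probability simplex $\Delta$. Writing $P_N(\boldsymbol{\nu})$ for the probability of frequency $\boldsymbol{\nu}$ after $N$ samples, Equation (A1) becomes
$$P_{N+1}(\boldsymbol{\nu}) = \sum_{k=0}^{n} p_k\,P_N\Big(\frac{(N+1)\boldsymbol{\nu} - \mathbf{e}_k}{N}\Big). \tag{A2}$$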
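Its limit is a Dirac-$\delta$ function: $P_N(\boldsymbol{\nu}) \to 0$ for all $\boldsymbol{\nu} \neq \mathbf{p}$, and all the probability concentrates at $\boldsymbol{\nu} = \mathbf{p}$. However, a “higher order” infinitesimal analysis shows that [8]
$$\lim_{N\to\infty} -\frac{1}{N}\ln P_N(\boldsymbol{\nu}) = \sum_{k=0}^{n}\nu_k\ln\frac{\nu_k}{p_k} = \Phi(\boldsymbol{\nu}). \tag{A3}$$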
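It is clear that the entropy function represents the infinitesimal prior probability on $\Delta$. Consider two $\boldsymbol{\nu}$’s with different entropy values, $\Phi(\boldsymbol{\nu}_1)$ and $\Phi(\boldsymbol{\nu}_2)$. If $\Phi(\boldsymbol{\nu}_1) > \Phi(\boldsymbol{\nu}_2)$, their probabilities satisfy $P_N(\boldsymbol{\nu}_1)/P_N(\boldsymbol{\nu}_2) \to 0$ as $N \to \infty$. This is the origin of the maximum entropy principle (MEP).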
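The limit (A3) can be observed numerically. The sketch below assumes a hypothetical prior $\mathbf{p} = (0.5, 0.3, 0.2)$ and a fixed frequency $\boldsymbol{\nu} = (0.2, 0.3, 0.5)$. It evaluates $-\frac{1}{N}\ln P_N(\boldsymbol{\nu})$ exactly from the multinomial solution of (A1), using `math.lgamma`. The result approaches $\Phi(\boldsymbol{\nu}) = 0.3\ln 2.5 \approx 0.275$ from above, and the gap shrinks roughly like $\ln N/N$, as Stirling’s formula suggests.

```python
import math

p = (0.5, 0.3, 0.2)                 # hypothetical prior
nu = (0.2, 0.3, 0.5)                # a fixed frequency far from p
Phi = sum(v * math.log(v / pk) for v, pk in zip(nu, p))     # equals 0.3 ln 2.5

def log_multinomial(counts, p):
    """ln P_N(m) = ln N! - sum_k ln m_k! + sum_k m_k ln p_k, the solution of (A1)."""
    N = sum(counts)
    return (math.lgamma(N + 1) - sum(math.lgamma(m + 1) for m in counts)
            + sum(m * math.log(pk) for m, pk in zip(counts, p)))

rates = []
for N in (10, 100, 1_000, 10_000, 100_000):
    counts = [round(N * v) for v in nu]      # exact integers for these choices of N
    rates.append(-log_multinomial(counts, p) / N)
    print(N, rates[-1], rates[-1] - Phi)     # the last column shrinks toward 0
```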
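To understand the limit $N \to \infty$, one can also introduce the probability generating function [8]:
$$Z_N(\mathbf{u}) = \sum_{\mathbf{m}} P_N(\mathbf{m})\,e^{-\mathbf{u}\cdot\mathbf{m}}, \tag{A4}$$
in which $\mathbf{u}\cdot\mathbf{m} = \sum_{k=0}^{n} u_k m_k$. Then, Equation (A1) becomes
$$Z_{N+1}(\mathbf{u}) = \Big(\sum_{k=0}^{n} p_k e^{-u_k}\Big) Z_N(\mathbf{u}), \quad\text{so that}\quad Z_N(\mathbf{u}) = \Big(\sum_{k=0}^{n} p_k e^{-u_k}\Big)^{N}. \tag{A5}$$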
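The free energy function $\psi(\mathbf{u}) = -\frac{1}{N}\ln Z_N(\mathbf{u}) = -\ln\sum_{k=0}^{n} p_k e^{-u_k}$ is meaningful for all finite $N$. This is why the partition function is valid even for small systems in Gibbs’ theory of ensembles [13]. The Legendre–Fenchel transform of $\psi$ is precisely the right-hand side of (A3):
$$\Phi(\boldsymbol{\nu}) = \sup_{\mathbf{u}}\Big\{\psi(\mathbf{u}) - \sum_{k=0}^{n} u_k\nu_k\Big\}, \tag{A6}$$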
in which the optimal $u_k^* = -\ln(\nu_k/p_k) + \text{const}$. The Legendre–Fenchel transform arises in the limit $N \to \infty$ through Laplace’s method of evaluating asymptotic integrals, or through the related Darwin–Fowler method of the maximum term.
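Both the $N$-independence of the free energy and the Legendre–Fenchel duality with $\Phi$ can be checked with a small computation. The sketch below assumes a hypothetical prior, a hypothetical energy vector $\mathbf{u}$, and $N$ small enough that the generating-function sum can be enumerated by brute force. It confirms that $-\frac{1}{N}\ln Z_N(\mathbf{u})$ does not depend on $N$. It also confirms that the supremum defining the transform of $\psi$ is attained at $u_k^* = -\ln(\nu_k/p_k)$, where it equals $\Phi(\boldsymbol{\nu})$, and that random trial energies never exceed that value.

```python
import math
from itertools import product

import numpy as np

p = np.array([0.5, 0.3, 0.2])      # hypothetical prior
u = np.array([0.2, -0.4, 1.1])     # hypothetical energies

def log_pmf(counts):
    """ln P_N(m) for the multinomial solution of (A1)."""
    N = sum(counts)
    return (math.lgamma(N + 1) - sum(math.lgamma(m + 1) for m in counts)
            + sum(m * math.log(pk) for m, pk in zip(counts, p)))

def Z(N):
    """Brute-force generating function: sum over m of P_N(m) exp(-u . m)."""
    return sum(math.exp(log_pmf(m) - float(np.dot(u, m)))
               for m in product(range(N + 1), repeat=len(p)) if sum(m) == N)

psi = -math.log(float(np.sum(p * np.exp(-u))))
per_sample = [-math.log(Z(N)) / N for N in (1, 2, 5, 9)]
print(per_sample, psi)                       # every entry equals psi, independent of N

# Legendre-Fenchel transform of psi: sup_u {psi(u) - u . nu} recovers Phi(nu).
nu = np.array([0.2, 0.3, 0.5])
Phi = float(np.sum(nu * np.log(nu / p)))

def lft_objective(v):
    """psi(v) - v . nu for a trial energy vector v."""
    return -math.log(float(np.sum(p * np.exp(-v)))) - float(v @ nu)

u_opt = -np.log(nu / p)                      # optimal energies, up to an additive constant
rng = np.random.default_rng(3)
print(np.isclose(lft_objective(u_opt), Phi))                                      # True
print(all(lft_objective(v) <= Phi + 1e-12 for v in rng.normal(size=(2000, 3))))   # True
```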
The analysis in this Appendix suggests that $\mathbf{p}$ in Equation (A1) should not be interpreted as an intrinsic property, for example the generative model of data statistics. Rather, it should be interpreted as the choice of a “gauge” in terms of which a set of counting data is represented. Each particular set of data ad infinitum is represented by the energy function $\mathbf{u}$, not by $\mathbf{p}$. The frequency $\boldsymbol{\nu}$ is gauge invariant via the Boltzmann relation $\nu_k = p_k e^{-u_k}/\sum_i p_i e^{-u_i}$, and this yields an i.i.d. generative model. Probability is not for generative models; it is for analyzing empirical measurements on random variables.
References
- Førland, K.S.; Førland, T.; Kjelstrup, S. Irreversible Thermodynamics: Theory and Applications; John Wiley & Sons: Chichester, UK, 1988.
- Kjelstrup, S.; Bedeaux, D. Non-Equilibrium Thermodynamics of Heterogeneous Systems; Series on Advances in Statistical Mechanics; World Scientific: Singapore, 2008; Volume 16.
- Bedeaux, D.; Kjelstrup, S.; Schnell, S.K. Nanothermodynamics Theory and Applications; World Scientific: Singapore, 2023.
- Guggenheim, E.A. Modern Thermodynamics by the Methods of Willard Gibbs; Methuen & Co.: New York, NY, USA, 1933.
- Hill, T.L. A different approach to nanothermodynamics. Nano Lett. 2001, 1, 273–275.
- Khinchin, A.Y. Mathematical Foundations of Statistical Mechanics; Dover: New York, NY, USA, 1949.
- Touchette, H. The large deviation approach to statistical mechanics. Phys. Rep. 2009, 478, 1–69.
- Dembo, A.; Zeitouni, O. Large Deviations Techniques and Applications, 2nd ed.; Springer: New York, NY, USA, 1998.
- Anderson, P.W. More is different: Broken symmetry and the nature of the hierarchical structure of science. Science 1972, 177, 393–396.
- Alm, J.F.; Walker, J.S. Time-frequency analysis of musical instruments. SIAM Rev. 2002, 44, 457–476.
- Fourier, J.B.J. The Analytic Theory of Heat; Freeman, A., Translator; Cambridge University Press: London, UK, 1878.
- Truesdell, C. Rational Thermodynamics; Springer: New York, NY, USA, 1984.
- Lu, Z.; Qian, H. Emergence and breaking of duality symmetry in thermodynamic behavior: Repeated measurements and macroscopic limit. Phys. Rev. Lett. 2022, 128, 150603.
- Galteland, O.; Bering, E.; Kristiansen, K.; Bedeaux, D.; Kjelstrup, S. Legendre-Fenchel transforms capture layering transitions in porous media. Nanoscale Adv. 2022, 4, 2660–2670.
- Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.
- Qian, M.; Xie, J.S.; Zhu, S. Smooth Ergodic Theory for Endomorphisms; Lecture Notes in Mathematics; Springer: Berlin, Germany, 2009; Volume 1978.
- Dorfman, J.R. An Introduction to Chaos in Nonequilibrium Statistical Mechanics; Cambridge Lect. Notes in Phys.; Cambridge University Press: London, UK, 1999.
- Mackey, M.C. The dynamic origin of increasing entropy. Rev. Mod. Phys. 1989, 61, 981–1015.
- Qian, H.; Ge, H. Stochastic Chemical Reaction Systems in Biology; Lect. Notes on Math. Modelling in the Life Sci.; Springer Nature: Cham, Switzerland, 2021.
- Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: London, UK, 2003.
- Levine, R.D. Information theory approach to molecular reaction dynamics. Annu. Rev. Phys. Chem. 1978, 29, 59–92.
- Ben-Naim, A. A Farewell to Entropy: Statistical Thermodynamics Based on Information; World Scientific: Singapore, 2008.
- Callen, H.B. Thermodynamics and an Introduction to Thermostatistics, 2nd ed.; Wiley: New York, NY, USA, 1991.
- Commons, J.; Yang, Y.J.; Qian, H. Duality symmetry, two entropy functions, and an eigenvalue problem in Gibbs’ theory. arXiv 2021.
- Qian, H. Statistical chemical thermodynamics and energetic behavior of counting: Gibbs’ theory revisited. J. Chem. Theory Comput. 2022, 18, 6421–6436.
- Hong, L.; Qian, H.; Thompson, L.F. Representations and divergences in the space of probability measures and stochastic thermodynamics. J. Comput. Appl. Math. 2020, 376, 112842.
- Goldstein, H. Classical Mechanics; Addison-Wesley: New York, NY, USA, 1951.
- Szilard, L. Über die ausdehnung der phänomenologischen thermodynamik auf die schwankungserscheinungen. Z. Physik. 1925, 32, 753–788.
- Mandelbrot, B. On the derivation of statistical thermodynamics from purely phenomenological principles. J. Math. Phys. 1964, 5, 164–171.
- Miao, B.; Qian, H.; Wu, Y.S. Emergence of Newtonian deterministic causality from stochastic motions in continuous space and time. arXiv 2024.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).