Article

Variational Information Principles to Unveil Physical Laws

D. Bernal-Casas and J. M. Oller
1 Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08018 Barcelona, Spain
2 Department of Genetics, Microbiology and Statistics, Faculty of Biology, Universitat de Barcelona, 08028 Barcelona, Spain
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(24), 3941; https://doi.org/10.3390/math12243941
Submission received: 30 October 2024 / Revised: 8 December 2024 / Accepted: 10 December 2024 / Published: 14 December 2024

Abstract: This article demonstrates that applying the variational method to purely information-theoretic models can lead to the discovery of fundamental equations in physics, such as Schrödinger's equation. Our solution, expressed in terms of information parameters rather than physical quantities, suggests a profound implication: Schrödinger's equation can be viewed as a unique physical expression of a deeper informational formalism, inspiring new avenues of research.

1. Introduction

In previous works [1,2], we invoked the time-independent Schrödinger's equation [3], which R. Frieden found to be equivalent to the principle of minimum loss of Fisher's information [4], to compute the stationary states of multiple sources of information. We discovered that the information could be represented and distributed over a set of quantum harmonic oscillators, one for each independent source, whose coordinate was an informative parameter of a smooth statistical manifold. Furthermore, we demonstrated that quantum harmonic oscillators reach the Cramér–Rao lower bound on the estimator's variance at the lowest energy level, which indicates a relationship between energy and information.
While our approach yielded positive results, and we are currently pursuing follow-up work given the flexibility of the mathematical framework, we were concerned about utilizing the time-independent Schrödinger's equation (or the principle of minimum loss of Fisher's information) to compute the stationary states of multiple sources of information, and we were therefore cautious about the findings. Our main concern stemmed from the fact that E. Schrödinger postulated his wave equation to understand the behavior of tiny particles and, in principle, not to interpret information-theoretic quantities per se.
Thus, we embarked on a journey to overcome our concerns. Our quest led us to explore the roots and follow the mathematical approaches of E. Schrödinger and R. Frieden. Our ultimate objective was to arrive at physical laws, such as the wave equation, via a variational principle utilizing information-theoretic models. Yet, the main challenge was to derive a variational principle based on purely informational quantities, in contrast with the previous two formulations, which used space and time as coordinates. We aimed to determine whether information-theoretic models could also give rise to the famous wave equation and further support our initial findings [1,2].
On the other hand, for many years, philosophers and scientists have recognized the critical role of the knowing subject in collecting and processing information and proposing explanatory scientific theories to understand reality; see, for example, the inaugural conference at University College in 1961, Probability, statistics and time, [5], and the reference to K. Pearson’s quote, “Law in the scientific sense is…essentially a product of the human mind and has no meaning apart from man, it owes its existence to the creative power of his intellect. There is more meaning in the statement that man gives laws to Nature than in its converse that Nature gives laws to man”, see [6]. It is a provocative and extreme position that would contrast with other more classical positions of 19th-century physics, but it is not without a good deal of truth.
Surely, at this point, we could adopt a position close to that of N. Bohr when he stated Contraria non contradictoria sed complementa sunt ("opposites are not contradictory but complementary"), since the framework of the theory of evolution with natural selection, real and external to the subject, prevents us from having an excessively subjective vision of reality. In any case, in the present paper, we emphasize the crucial role of the subject in efficiently processing information. More recently, there has been a noticeable surge in studies that highlight the significant role of the subject. We can mention D.J. Chalmers, a philosopher and cognitive scientist known for his work on the nature of consciousness and subjective experience. His ideas present intriguing developments that align well, as far as we know, with the approach presented in this work; see [7].
In summary, the interplay between information theory and physics provides a rich landscape for exploring fundamental questions about the nature of reality, the structure of physical laws, and the role of observers in the universe. As research continues in fields like quantum information, complex systems, and cosmology, the information-theoretic approach remains a vital tool for advancing our understanding of the physical world. We present a novel and intellectually engaging perspective on the interaction between information theory and physical laws. We believe this work is original, interdisciplinary, and of considerable interest and could contribute significantly to ongoing discussions in information theory and theoretical physics.

Historical Review

In the following lines, we review the historical events we consider most significant for the development of this mathematical framework. We regard the variational principle as essential for formulating the laws of physics, and we pursue elucidating these laws from informational principles.
The idea of working via a variational principle has been used in science, especially physics, to solve problems using the calculus of variations. Indeed, E. Schrödinger also derived his wave equation from a variational principle, as pointed out in an addendum to his first paper on “Quantisation as an eigenvalue problem” [8].
Thus, by extremizing the integral

$$I=\iiint_{\mathbb{R}^3}\left\{\frac{K^2}{2m}\left[\left(\frac{\partial\psi}{\partial x}\right)^2+\left(\frac{\partial\psi}{\partial y}\right)^2+\left(\frac{\partial\psi}{\partial z}\right)^2\right]+V\psi^2\right\}dx\,dy\,dz,\tag{1}$$

with respect to functions $\psi$ that satisfy the normalization

$$\iiint_{\mathbb{R}^3}\psi^2\,dx\,dy\,dz=1,\tag{2}$$

we are led directly to Schrödinger's equation

$$\frac{K^2}{2m}\nabla^2\psi+(E-V)\psi=0,\tag{3}$$

provided we denote by E the undetermined Lagrange multiplier of the problem. The reader familiar with quantum mechanics will recognize the identity of K with the well-known $\hbar$.
The resulting Schrödinger Equation (3) became the first successful equation in quantum mechanics.
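As a quick sanity check of this variational route, the following minimal sympy sketch (our own illustration, not part of the original papers) verifies the one-dimensional analog of Equation (3) for a harmonic potential $V=m\omega^2x^2/2$: the Gaussian $\psi=\exp(-m\omega x^2/(2K))$ solves it with the Lagrange multiplier $E=K\omega/2$.

```python
import sympy as sp

# 1D analog of Eq. (3): (K^2/2m) psi'' + (E - V) psi = 0.
# With the harmonic potential V = m*w^2*x^2/2, the Gaussian below
# is a solution with Lagrange multiplier (energy) E = K*w/2.
x, m, w, K = sp.symbols('x m w K', positive=True)
V = m * w**2 * x**2 / 2
psi = sp.exp(-m * w * x**2 / (2 * K))
E = K * w / 2
residual = K**2 / (2 * m) * sp.diff(psi, x, 2) + (E - V) * psi
print(sp.simplify(residual))  # prints 0
```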
After the discovery of Schrödinger's equation, P.A.M. Dirac derived a relativistic version that harnessed the power of the variational (action) principle and its relativistic invariance [9]. A decade later, R. Feynman developed a new formulation of quantum mechanics based on the action principle [10]. More specifically, Feynman interpreted Dirac's approach as a physical recipe for the probability amplitude contributions from every possible path between $(x_1,t_1)$ and $(x_2,t_2)$, resulting in the path integral formulation. A few years later, B.A. Lippmann and J. Schwinger also revisited Dirac's paper to develop variational principles for the approximate calculation of the unitary (collision) operator that describes the connection between the initial and final states of the system [11].
The culmination of this updating process of variational (action) principles is probably the Standard Model, whose formulation has explained almost all experiments. In the meantime, J.A. Wheeler (Feynman's advisor) revolutionized the field by arguing that information is the most fundamental entity, giving rise to the physical [12]. His popular phrase "It from bit" symbolizes the idea that every item of the physical world has at bottom (a very deep bottom, in most instances) an immaterial source and explanation, that all things physical are information-theoretic in origin, and that this is a participatory universe.
Contemporary to Wheeler, R. Frieden realized a deep connection between Schrödinger's equation and Fisher's information, a realization that was the starting point of a very ambitious research program to explain physical phenomena [4]. His original idea was to bring an informative measure into the physics domain. In particular, Fisher information measures the amount of information an observable random variable X carries about an unknown parameter θ of a distribution that models X. This approach paved the way for an information-theoretical reinterpretation of the fundamental equations of physics and of science generally [13,14].
In particular, Frieden postulated that there is a functional related to Fisher's information whose optimization leads to Schrödinger's equation. More specifically, Frieden modeled the physical space as $\mathbb{R}^3$ with the standard Euclidean geometry and assumed a specific probabilistic model whose density is of the form $f(x,\theta)=P(|x-\theta|)$, where $x,\theta\in\mathbb{R}^3$, $|\cdot|$ is the ordinary Euclidean norm, and P is a non-negative real function such that $\int_{\mathbb{R}}P(u)\,du=1$. Additionally, assuming the conditions that guarantee the existence of the Fisher information matrix, which, under these conditions, becomes a scalar constant matrix whose trace is equal to

$$\iiint_{\mathbb{R}^3}\frac{1}{P}\left[\left(\frac{\partial P}{\partial x}\right)^2+\left(\frac{\partial P}{\partial y}\right)^2+\left(\frac{\partial P}{\partial z}\right)^2\right]dx\,dy\,dz,\tag{4}$$
Frieden considers the following functional:
$$A=\iiint_{\mathbb{R}^3}\frac{1}{P}\left[\left(\frac{\partial P}{\partial x}\right)^2+\left(\frac{\partial P}{\partial y}\right)^2+\left(\frac{\partial P}{\partial z}\right)^2\right]dx\,dy\,dz+\lambda\iiint_{\mathbb{R}^3}(E-V)\,P\,dx\,dy\,dz,\tag{5}$$

where $\lambda$ is a negative constant, E is a constant, and V is a potential function. After making the identification $P=\psi^2$, the Euler–Lagrange equation corresponding to the variational principle $\delta A=0$ can be cast as

$$\frac{1}{\lambda}\nabla^2\psi+V\psi=E\psi.\tag{6}$$
Identifying $\lambda$ with $-2m/\hbar^2$, where $\hbar$ stands for the reduced Planck constant, Equation (6) is identical to the time-independent Schrödinger's equation of a quantum particle of mass m and energy E moving in the potential V, i.e., identical to Equation (3).
The first term appearing in the functional A is Fisher’s information, while the second term is proportional to the mean kinetic energy of the particle:
$$E_K=\iiint_{\mathbb{R}^3}(E-V)\,P\,dx\,dy\,dz.\tag{7}$$
The stationary states $\psi$ described by Schrödinger's Equation (6) can be regarded as real functions, and therefore, $|\psi|^2=\psi^2$.
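To make the informational ingredient concrete, here is a minimal numerical sketch (our illustration, using the one-dimensional analog of the trace integral (4)) showing that, for a Gaussian density P with standard deviation σ, the integral evaluates to $1/\sigma^2$, the familiar Fisher information of a location parameter.

```python
import numpy as np

# 1D analog of the integral (4): for a Gaussian density P with standard
# deviation sigma, the integral of (P'(u))^2 / P(u) equals 1/sigma^2.
sigma = 1.7
u = np.linspace(-12 * sigma, 12 * sigma, 200_001)
P = np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
dP = np.gradient(P, u)              # numerical derivative P'(u)
fisher = np.trapz(dP ** 2 / P, u)   # quadrature of the information integral
print(fisher, 1 / sigma ** 2)       # both ~ 0.346
```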
Frieden’s mathematical framework is probably the best attempt to explain physics from informational quantities. Taking this mathematical approach as a reference point, we aim to develop further a fundamental theory based only on information quantities, leaving aside space and time coordinates and considering only a set of informative parameters embedded in a smooth statistical manifold.
The paper implicitly presupposes the study of information sources and observers. A source of information is an object to be studied, external to the observer, who is the true knowing subject.

2. Materials and Methods

In this section, we define the mathematical framework and demonstrate that, given a specific regular parametric statistical model and corresponding sample data, applying a simple variational principle to purely informational quantities yields a partial differential equation on the parameter space, a Riemannian manifold equipped with the metric tensor induced by the Fisher information matrix. The solution of this equation supplies a subjective probability over all the possible probabilistic mechanisms that could have generated the data, mechanisms identifiable with the parameters. At this point, we have only used basic statistical tools, in particular, the information from the data of a given model, a well-known concept in the statistical literature; see, for instance, [15] or [16].
Furthermore, we apply this data-analysis procedure to the simple statistical model corresponding to a multivariate normal distribution with a constant covariance matrix, a model that can be considered at least an approximation of many regular statistical models for large samples, via extensions of the central limit theorem. From the above-mentioned partial differential equation in the parameter space, we then obtain a specific differential equation already studied in physics and known as the stationary (time-independent) Schrödinger equation of a quantum harmonic oscillator.
We divide this section into several parts: the basic setup of modeling information sources in Section 2.1, the formulation of the variational problem in Section 2.2, the solutions in Section 2.3, and first remarks on the solutions in Section 2.4.

2.1. The Information Sources

Let us consider a class of entities, each of which can be seen as some type of information source that provides sequences of independent data we want to analyze. We will assume that each source can be described by a convenient probability space, and the set of all of them through a parametric statistical model. The data may be viewed as simple random samples of arbitrary size. Let us introduce some notation to set a convenient mathematical framework.
Let $\mathcal X$ be a sample or input space, let $\mathcal A$ be a $\sigma$-algebra of subsets of $\mathcal X$, and let $\mu$ be a positive $\sigma$-finite measure on the measurable space $(\mathcal X,\mathcal A)$. In the present paper, a parametric statistical model is defined as the triple $\{(\mathcal X,\mathcal A,\mu);\Theta;f\}$, where $(\mathcal X,\mathcal A,\mu)$ is a measure space, $\Theta$ is a manifold, also called the parameter space, and f is a measurable map $f:\mathcal X\times\Theta\to\mathbb R$ such that $f(x,\theta)\ge 0$ and $P_\theta(dx)=f(x,\theta)\,\mu(dx)$ is a probability measure on $(\mathcal X,\mathcal A)$ for every $\theta\in\Theta$. We shall refer to $\mu$ as the reference measure of the model and to f as the model function.
Observe that, under this framework, each point of the manifold potentially represents a different information source, which supplies sample data obtained under independence assumptions and summarized by $\mathbf x\in\mathcal X^n$, say $\mathbf x=(x_1,\dots,x_n)$, each $x_i$ belonging to a convenient sample or input space $\mathcal X$ that may be partially hidden; i.e., $\mathbf x=(x_1,\dots,x_n)\in\mathcal X^n$ may not necessarily be completely observed, although all of these allow a reasonable and statistically consistent estimate of the true parameter $\theta\in\Theta$ that characterizes the particular information source studied. Given such a sample, the joint density function with respect to the reference measure extended to the Cartesian product $\mathcal X^n$ will be $\tilde f(x_1,\dots,x_n,\theta)=\prod_{i=1}^n f(x_i,\theta)$, where $\tilde f$ stands for the sample joint density function which, for fixed $\mathbf x$, coincides with the likelihood function $L_{\mathbf x}(\theta)$ on $\Theta$.
Moreover, in the present paper, for the sake of simplicity, $\Theta$ will be an m-dimensional $C^\infty$ real manifold, Hausdorff and connected, with or without boundary (denoted, in that case, by $\partial\Theta$), although infinite-dimensional Hilbert or Banach manifolds could also be considered. Furthermore, for many purposes, it will be enough to consider the case where $\Theta$ is a connected open set of $\mathbb R^m$; in this case, it is customary to use the same symbol $(\theta)$ to denote points and coordinates. Considering this remark, we shall adopt this case and notation hereafter to present the results more familiarly, even though they can be written with more generality. Also, we shall assume that the model function f satisfies certain regularity conditions that we will detail when necessary. Additionally, we will incorporate into our mathematical framework the basic developments and results of what in mathematical statistics is known as information geometry, where the parameter space $\Theta$ is a Riemannian manifold, the metric tensor of which will be given through its covariant components,
$$g_{ij}(\theta)=E_\theta\!\left[\frac{\partial\ln f(\cdot,\theta)}{\partial\theta^i}\,\frac{\partial\ln f(\cdot,\theta)}{\partial\theta^j}\right],\qquad i,j=1,\dots,m,\tag{8}$$
where the expectation in (8) is obtained by integrating the product of partial derivatives with respect to the measure $P_\theta(dx)=f(x,\theta)\,\mu(dx)$ on $(\mathcal X,\mathcal A)$. Observe, in particular, that if $g(\theta)=|\det G(\theta)|=|\det(g_{ij}(\theta))|$, the Riemannian volume $V(d\theta)$ will be given by $V(d\theta)=\sqrt{g(\theta)}\,d\theta$. For further details, see, for example, the pioneering work [17], as well as [18,19,20,21], among the works of many other authors. Although we could consider as a measure of the information corresponding to the sample $\mathbf x$, relative to the true parameter $\theta$, simply the log-likelihood $\ln L_{\mathbf x}(\theta)=\sum_{i=1}^n\ln f(x_i,\theta)$, this measure is generally not invariant under injective data transformations, such as rescaling. There are many ways to correct this lack of invariance; a simple one is to choose a reference point on the manifold, say $\theta_0\in\Theta$, and define the information codified by the data $\mathbf x$, from the source, relative to the true parameter $\theta$ referred to an arbitrary point $\theta_0$ as
$$I_{\mathbf x}(\theta)=\ln L_{\mathbf x}(\theta)-\ln L_{\mathbf x}(\theta_0)=\sum_{i=1}^n\ln f(x_i,\theta)-\sum_{i=1}^n\ln f(x_i,\theta_0).\tag{9}$$
The implicit dependence of (9) on $\theta_0$ is omitted from the notation since its choice will not play any further role once we calculate its gradient on the parametric manifold. Additionally, (9) remains invariant under appropriate data changes and, for fixed $\mathbf x$, is also invariant under coordinate changes on the parametric manifold; i.e., it is a scalar field on $\Theta$.
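As a small worked example of (9) (our own sketch; the unit-variance normal model $N(\theta,1)$ is an arbitrary choice), the code below computes $I_{\mathbf x}(\theta)$ from a simulated sample and checks numerically that its derivative with respect to $\theta$ does not depend on the reference point $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true = 50, 2.0
x = rng.normal(theta_true, 1.0, size=n)      # sample from N(theta_true, 1)

def log_lik(theta):
    # log-likelihood of the N(theta, 1) model
    return -0.5 * np.sum((x - theta) ** 2) - 0.5 * n * np.log(2 * np.pi)

def I_x(theta, theta0):
    # information codified by the data, Eq. (9)
    return log_lik(theta) - log_lik(theta0)

# The derivative of I_x w.r.t. theta is the same for any reference theta0:
eps = 1e-6
g_a = (I_x(1.0 + eps, 0.0) - I_x(1.0 - eps, 0.0)) / (2 * eps)
g_b = (I_x(1.0 + eps, -3.0) - I_x(1.0 - eps, -3.0)) / (2 * eps)
print(np.isclose(g_a, g_b))                   # True
```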
The information provided by a source, external to the observer, is represented inside the observer, increasing his understanding of the objects. Although we can consider different levels and types of such representation, we will focus on two important aspects: the parameter space with its natural information geometry, and the observer's ability to construct, once given particular sample data, a plausibility $\Psi$ regarding the true value of the parameter in the parameter space $\Theta$, essentially the square root of a kind of subjective conditional probability density with respect to the Riemannian volume induced by the information metric over the parameter space, up to a normalization constant. Specifically, we shall write at the beginning
$$\int_\Theta\Psi^2(\theta)\,V(d\theta)=a>0,\tag{10}$$
for reasons that will be apparent later, although we shall be particularly interested in the case a = 1. Observe that in (10), we are integrating with respect to the Riemannian measure induced by the metric (8); therefore, this expression is invariant under coordinate changes. Furthermore, if we intend to define on the parametric manifold a probability, interpretable as a plausibility, about the true value of the parameter, we can take the function $\Psi^2$ as the Radon–Nikodym derivative of said probability with respect to the Riemannian volume, also a measure on the same parameter space. Both measures are independent of the coordinate system used; therefore, $\Psi^2$ will be an invariant scalar field on the parametric manifold $\Theta$. Then, we can simply define the information encoded by the subjective plausibility $\Psi$ on the parameter space $\Theta$ relative to the true parameter $\theta$ as
$$\Lambda(\theta)=\ln\!\big(\Psi^2(\theta)\big).\tag{11}$$
This quantity (11) also remains invariant under coordinate changes on the parametric manifold, being another scalar field on Θ .

2.2. The Variational Principle

In this context, trusting that many of the observer's abilities have been efficiently shaped by natural selection in the process of biological evolution, we propose that the subjective information mentioned above adjusts, in some way, to the information provided by the source and, in particular, satisfies the variational principle that

$$\Omega(\Psi)=\int_\Theta\frac1n\,\big|\operatorname{grad}I_{\mathbf x}(\theta)-\operatorname{grad}\Lambda(\theta)\big|^2\,\Psi^2(\theta)\,V(d\theta)\tag{12}$$
is a minimum, or at least stationary, subject to the constraint (10) and assuming that $\Psi$ and its gradient $\operatorname{grad}\Psi$ vanish at the boundary $\partial\Theta$ or at infinity, which is the way to model that we have strong reasons to believe that the true parameter $\theta$ lies far from the boundary, clearly inside $\Theta$. Observe that the functional $\Omega$ is the expected value of the squared norm, corresponding to the Riemannian metric (8), of the difference between the gradients $\operatorname{grad}I_{\mathbf x}(\theta)$ and $\operatorname{grad}\Lambda(\theta)$, the expected value being taken with respect to the probability on $\Theta$ given by the density $\Psi^2$ relative to the Riemannian volume $V(d\theta)$ induced by the information metric. Observe also that (12) is invariant under coordinate changes in $\Theta$, since the squared norm inside the integral is an invariant and so is $\Psi^2(\theta)V(d\theta)=\Psi^2(\theta)\sqrt{g(\theta)}\,d\theta$. Notice, finally, that the source is considered as something objective or, at least, strictly speaking, inter-subjective, while the parameter space, with its geometric properties, is in some sense built by the observer and is therefore subjective, although strongly conditioned by the source.
Any change in the information encoded by $\mathbf x$, caused by considering a shift of the source in the parameter space, should correspond to a change in the subjective information proposed by the observer. For this reason, we propose that the squared difference of both gradients, $\operatorname{grad}I_{\mathbf x}$ and $\operatorname{grad}\Lambda$, divided by the sample size n, should be, on average (i.e., $\Omega$), as small as possible, at least locally.

2.3. Solving the Variational Problem

Since (12) is an optimization problem with, at least, the constraint (10), we may introduce the augmented Lagrangian

$$\mathcal L_{\lambda,a}(\Psi)=\int_\Theta\frac1n\big|\operatorname{grad}I_{\mathbf x}(\theta)-\operatorname{grad}\Lambda(\theta)\big|^2\,\Psi^2(\theta)\,V(d\theta)-\lambda\left(\int_\Theta\Psi^2(\theta)\,V(d\theta)-a\right),\tag{13}$$
where $\lambda$ is a constant Lagrange multiplier. Observe that, since $\operatorname{grad}\big(\ln\Psi^2(\theta)\big)=\frac{2}{\Psi(\theta)}\operatorname{grad}\Psi(\theta)$, we will have

$$\operatorname{grad}I_{\mathbf x}(\theta)-\operatorname{grad}\Lambda(\theta)=\operatorname{grad}\ln L_{\mathbf x}(\theta)-\operatorname{grad}\ln\!\big(\Psi^2(\theta)\big)=\operatorname{grad}\ln L_{\mathbf x}(\theta)-\frac{2}{\Psi(\theta)}\operatorname{grad}\Psi(\theta)=-\left(\frac{2}{\Psi(\theta)}\operatorname{grad}\Psi(\theta)-\operatorname{grad}\ln L_{\mathbf x}(\theta)\right),\tag{14}$$
where the expression (14) is, in its entirety, invariant under coordinate changes in $\Theta$. Therefore, if we let $\eta(\theta)$ be an arbitrary smooth function on $\Theta$ such that $\Psi(\theta)+\epsilon\,\eta(\theta)$ satisfies (10), then, omitting for simplicity the arguments of the functions to be integrated (i.e., writing $I_{\mathbf x}$, $\Psi$, and $\eta$ instead of $I_{\mathbf x}(\theta)$, $\Psi(\theta)$, and $\eta(\theta)$) and taking into account (14), we have
$$\mathcal L_{\lambda,a}(\Psi+\epsilon\eta)=\int_\Theta\frac1n\left|\frac{2}{\Psi+\epsilon\eta}\operatorname{grad}(\Psi+\epsilon\eta)-\operatorname{grad}\ln L_{\mathbf x}\right|^2(\Psi+\epsilon\eta)^2\,V(d\theta)-\lambda\left(\int_\Theta(\Psi+\epsilon\eta)^2\,V(d\theta)-a\right)\tag{15}$$

$$=\int_\Theta\frac1n\Big\{4\,|\operatorname{grad}(\Psi+\epsilon\eta)|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2(\Psi+\epsilon\eta)^2-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}(\Psi+\epsilon\eta)\rangle\,(\Psi+\epsilon\eta)\Big\}\,V(d\theta)-\lambda\left(\int_\Theta(\Psi+\epsilon\eta)^2\,V(d\theta)-a\right)\tag{16}$$

$$=\int_\Theta\Big\{\frac{\epsilon^2}{n}\Big(4\,|\operatorname{grad}\eta|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2\eta^2-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}\eta\rangle\,\eta-\lambda n\,\eta^2\Big)+\frac{\epsilon}{n}\Big(8\,\langle\operatorname{grad}\Psi,\operatorname{grad}\eta\rangle+2\,|\operatorname{grad}\ln L_{\mathbf x}|^2\Psi\eta-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}(\Psi\eta)\rangle-2\lambda n\,\Psi\eta\Big)+\frac1n\Big(4\,|\operatorname{grad}\Psi|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2\Psi^2-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}\Psi\rangle\,\Psi-\lambda n\,\Psi^2\Big)\Big\}\,V(d\theta)+\lambda a,\tag{17}$$
where we have taken into account that $\operatorname{grad}(\eta\Psi)=\eta\operatorname{grad}\Psi+\Psi\operatorname{grad}\eta$. Then, the derivative of (15) with respect to $\epsilon$ will be

$$\frac{d\mathcal L_{\lambda,a}}{d\epsilon}=\int_\Theta\Big\{\frac{2\epsilon}{n}\Big(4\,|\operatorname{grad}\eta|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2\eta^2-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}\eta\rangle\,\eta-\lambda n\,\eta^2\Big)+\frac1n\Big(8\,\langle\operatorname{grad}\Psi,\operatorname{grad}\eta\rangle+2\,|\operatorname{grad}\ln L_{\mathbf x}|^2\Psi\eta-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}(\Psi\eta)\rangle-2\lambda n\,\Psi\eta\Big)\Big\}\,V(d\theta),\tag{18}$$
and the first variation of the augmented Lagrangian $\mathcal L_{\lambda,a}$ is

$$\delta\mathcal L_{\lambda,a}(\Psi,\eta)\equiv\left.\frac{d\mathcal L_{\lambda,a}}{d\epsilon}\right|_{\epsilon=0}=\int_\Theta\frac1n\Big(8\,\langle\operatorname{grad}\Psi,\operatorname{grad}\eta\rangle+2\,|\operatorname{grad}\ln L_{\mathbf x}|^2\Psi\eta-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}(\Psi\eta)\rangle-2\lambda n\,\Psi\eta\Big)\,V(d\theta).\tag{19}$$
Observe, now, that we can use a well-known differential-operator identity on a Riemannian manifold, $\operatorname{div}(\eta\operatorname{grad}\Psi)=\eta\,\Delta\Psi+\langle\operatorname{grad}\eta,\operatorname{grad}\Psi\rangle$, where $\Delta$ stands for the Laplacian operator; for further details, see, for instance, [22]. Thus, we shall have
$$\int_\Theta\langle\operatorname{grad}\Psi,\operatorname{grad}\eta\rangle\,V(d\theta)=\int_\Theta\operatorname{div}(\eta\operatorname{grad}\Psi)\,V(d\theta)-\int_\Theta\eta\,\Delta\Psi\,V(d\theta),\tag{20}$$
but, by the Gauss divergence theorem, we have
$$\int_\Theta\operatorname{div}(\eta\operatorname{grad}\Psi)\,V(d\theta)=\int_{\partial\Theta}\eta\,\langle\operatorname{grad}\Psi,\nu\rangle\,A(d\theta)=0,\tag{21}$$
where $\nu$ is a unit vector field on $\partial\Theta$ pointing outward from $\Theta$, and $A(d\theta)$ is the surface element on $\partial\Theta$ induced by the information Riemannian metric; taking into account that, by the boundary conditions, $\operatorname{grad}\Psi$ vanishes at $\partial\Theta$ or at infinity, we obtain (21). Thus, substituting this result into (20), we obtain
$$\int_\Theta\langle\operatorname{grad}\Psi,\operatorname{grad}\eta\rangle\,V(d\theta)=-\int_\Theta\eta\,\Delta\Psi\,V(d\theta).\tag{22}$$
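Identity (22) is easy to check numerically; the sketch below (our construction, on a flat one-dimensional parameter space, with arbitrarily chosen test functions vanishing at the boundary) confirms the integration by parts.

```python
import numpy as np

t = np.linspace(0.0, np.pi, 40_001)
Psi = np.sin(t) * np.exp(-t / 3)     # vanishes at t = 0 and t = pi
eta = np.sin(2 * t)                  # test function, also vanishing there

dPsi = np.gradient(Psi, t)
deta = np.gradient(eta, t)
lapPsi = np.gradient(dPsi, t)        # second derivative (flat metric)

lhs = np.trapz(dPsi * deta, t)       # integral of <grad Psi, grad eta>
rhs = -np.trapz(eta * lapPsi, t)     # minus integral of eta * Laplacian(Psi)
print(lhs, rhs)                      # agree up to discretization error
```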
Additionally, since $\operatorname{div}\!\big(\eta\Psi\operatorname{grad}\ln L_{\mathbf x}\big)=\eta\Psi\,\Delta\ln L_{\mathbf x}+\langle\operatorname{grad}(\eta\Psi),\operatorname{grad}\ln L_{\mathbf x}\rangle$, we shall have
$$\int_\Theta\langle\operatorname{grad}(\eta\Psi),\operatorname{grad}\ln L_{\mathbf x}\rangle\,V(d\theta)=\int_\Theta\operatorname{div}\!\big(\eta\Psi\operatorname{grad}\ln L_{\mathbf x}\big)\,V(d\theta)-\int_\Theta\eta\Psi\,\Delta\ln L_{\mathbf x}\,V(d\theta),\tag{23}$$
but, by the Gauss divergence theorem and the bi-linearity of the scalar product, we have
$$\int_\Theta\operatorname{div}\!\big(\eta\Psi\operatorname{grad}\ln L_{\mathbf x}\big)\,V(d\theta)=\int_{\partial\Theta}\eta\Psi\,\langle\operatorname{grad}\ln L_{\mathbf x},\nu\rangle\,A(d\theta)=0,\tag{24}$$
since, by the boundary conditions, $\Psi$ vanishes at $\partial\Theta$ or at infinity, and, therefore, we have (24). Then, substituting this equation into (23), we obtain
$$\int_\Theta\langle\operatorname{grad}(\eta\Psi),\operatorname{grad}\ln L_{\mathbf x}\rangle\,V(d\theta)=-\int_\Theta\eta\Psi\,\Delta\ln L_{\mathbf x}\,V(d\theta).\tag{25}$$
Combining (19) with (22) and (25), we obtain
$$\delta\mathcal L_{\lambda,a}(\Psi,\eta)=\int_\Theta\frac{2}{n}\Big[\big(|\operatorname{grad}\ln L_{\mathbf x}|^2-\lambda n\big)\Psi-4\,\Delta\Psi+2\,\Psi\,\Delta\ln L_{\mathbf x}\Big]\,\eta\,V(d\theta),\tag{26}$$
for arbitrary $\eta$. Therefore, setting $\delta\mathcal L_{\lambda,a}(\Psi,\eta)=0$ for every $\eta$, we arrive at the fundamental equation
$$-4\,\Delta\Psi+|\operatorname{grad}\ln L_{\mathbf x}|^2\,\Psi=\big(-2\,\Delta\ln L_{\mathbf x}+\lambda n\big)\,\Psi,\tag{27}$$

or, in terms of (9), the fundamental equation becomes

$$-4\,\Delta\Psi+|\operatorname{grad}I_{\mathbf x}|^2\,\Psi=\big(-2\,\Delta I_{\mathbf x}+\lambda n\big)\,\Psi.\tag{28}$$
This equation supplies the stationary points of the variational problem (12) subject to the constraint (10), assuming that $\Psi$ and its gradient $\operatorname{grad}\Psi$ vanish at the boundary $\partial\Theta$ or at infinity, as previously highlighted. Observe that, for any solution $\Psi$ of (27) or (28), any multiple of this function is also a solution of these partial differential equations, so we can select, among them, the one that satisfies the restriction (10). Furthermore, if we choose a = 1, we obtain a direct probabilistic interpretation, i.e., $\Psi^2$ as a probability density with respect to the Riemannian volume.

2.4. First Remarks on the Solutions

To interpret the Lagrange multiplier, let $(\Psi^*,\lambda^*)$ be a solution of (27) that supplies stationary points of the augmented Lagrangian (13) subject to the constraint (10). This solution may depend on a; to emphasize this, we may write $\Psi^*(a)$, $\lambda^*(a)$. Additionally, defining

$$M^*(a)=\Omega(\Psi^*(a)),\tag{29}$$

by the chain rule, taking into account (10) and (13), it is straightforward to obtain

$$\frac{dM^*}{da}=\lambda^*(a).\tag{30}$$
Taking into account that grad Λ ( θ ) does not change if we multiply Ψ by any constant, we can write
$$\Omega(\Psi)=a\int_\Theta\frac1n\big|\operatorname{grad}I_{\mathbf x}(\theta)-\operatorname{grad}\Lambda(\theta)\big|^2\,\hat\Psi^2(\theta)\,V(d\theta)=a\,\kappa,\tag{31}$$
where $\hat\Psi^2$ is the normalized version of $\Psi^2$, i.e., $\hat\Psi^2=\Psi^2/\int_\Theta\Psi^2\,dV$. Then the constant $\kappa$ is greater than or equal to zero, i.e., $\kappa\ge 0$, and, as a consequence, $\lambda^*(a)\ge 0$.
Moreover, let us suppose that the system (27) or (28) with the constraint (10) admits a solution $\Psi_0$ for $\lambda=0$, where, without loss of generality, we have chosen a = 1. Then, observe that the second derivative of the augmented Lagrangian (13) with respect to $\epsilon$ is
$$\frac{d^2\mathcal L_{0,1}}{d\epsilon^2}=\frac2n\int_\Theta\Big(4\,|\operatorname{grad}\eta|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2\eta^2-4\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}\eta\rangle\,\eta\Big)\,V(d\theta)=\frac2n\int_\Theta\Big(|\operatorname{grad}\ln(\eta^2)|^2+|\operatorname{grad}\ln L_{\mathbf x}|^2-2\,\langle\operatorname{grad}\ln L_{\mathbf x},\operatorname{grad}\ln(\eta^2)\rangle\Big)\,\eta^2\,V(d\theta)=\frac2n\int_\Theta\big|\operatorname{grad}\ln L_{\mathbf x}-\operatorname{grad}\ln(\eta^2)\big|^2\,\eta^2\,V(d\theta)\;\ge\;0.\tag{32}$$
Therefore, by (32), the augmented Lagrangian $\mathcal L_{0,1}$ is convex, and the fundamental Equation (27) provides an absolute minimum $\Psi_0$ of the variational problem (12) subject to the constraint (10), with a = 1.

3. A First Simple Application

In this section, we apply the fundamental Equation (27) to a simple statistical problem. In Section 3.1, we strive to motivate a first simple application, and in Section 3.2, we cover the details of the application.

3.1. Motivation

Let us assume that our regular statistical parametric model has all the usual regularities, which allow the standard asymptotic properties of the maximum likelihood estimator (MLE) of the parameter $\theta\in\Theta$, or, actually, of any other estimator with the same asymptotic properties. For simplicity, let us consider that this manifold $\Theta$ is a simply connected open subset of $\mathbb R^m$. In this case, it is customary to identify the points $\theta\in\Theta$ with their coordinates $(\theta^1,\dots,\theta^m)$, which we may write as $m\times 1$ column vectors $\theta$ when necessary. Let $\theta^*_n$ be the MLE based on a sample of size n, $x_1,\dots,x_n$; then, it is well known that
$$\sqrt n\,\big(\theta^*_n-\theta\big)\ \xrightarrow{\;\mathcal L\;}\ Y,\tag{33}$$
i.e., a convenient standardization of the MLE converges weakly (in law) to a random vector Y that has a multivariate normal distribution with zero mean and covariance matrix equal to the inverse of the Fisher information matrix at the true value of $\theta$, which means, in the present approach, a value determined by the source. Moreover, under mild regularity assumptions, the MLE is also a sufficient statistic, a transformation of the sample space that preserves all the relevant quantities related to the inference of $\theta$, in particular, the likelihood partial derivatives with respect to $\theta^\alpha$, $\alpha=1,\dots,m$, and the information metric (essentially due to the Neyman–Fisher factorization theorem). We may therefore start from the beginning with the model corresponding to the MLE, i.e., with the sample space being $\mathcal X=\Theta$ and $\mathcal A$ being the Borel $\sigma$-algebra (the $\sigma$-algebra generated by the open sets of $\Theta$). Although an exact convenient model to describe a part of reality may be or remain unknown, we may consider, at least as an approximation, the limit model given by (33), which, under regularity conditions, can informally be stated as $\theta^*_n$ following an approximate multivariate normal distribution with mean equal to the true value $\theta$ and covariance matrix equal to $\frac1n G^{-1}(\theta)$. Observe that this asymptotic result is, in fact, an extension of the central limit theorem. Since the MLE is a consistent estimator, and by the continuity of the Fisher information, $\frac1n G^{-1}(\theta)$ could be replaced by $\frac1n G^{-1}(\theta^*_n)$; then, we may focus our attention on an m-variate normal model with an (approximately) constant covariance matrix and a sample summarized by just one value: the parameter estimate.
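The asymptotic statement (33) can be illustrated by simulation; the following Monte-Carlo sketch (our own example, using an exponential model with rate $\theta$, for which the Fisher information is $G(\theta)=1/\theta^2$) shows that the empirical variance of $\sqrt n(\theta^*_n-\theta)$ approaches $G^{-1}(\theta)=\theta^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 400, 20_000

# Exponential(rate = theta) samples; the MLE of the rate is 1 / sample mean.
samples = rng.exponential(scale=1 / theta, size=(reps, n))
mle = 1 / samples.mean(axis=1)

z = np.sqrt(n) * (mle - theta)       # standardization as in Eq. (33)
print(z.var(), theta ** 2)           # empirical variance ~ G^{-1}(theta) = 4
```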

3.2. The Example

Then, we are motivated to apply the variational development to the parametric statistical model corresponding to an m-variate normal distribution with unknown m-dimensional mean vector $\mu$ and known, strictly positive-definite $m\times m$ covariance matrix $\Sigma_0$. In this case, $\mathcal X=\mathbb R^m$; for a sample of size n, the sample space is $\mathcal X^n=\mathbb R^{m\times n}$, and the parameter space is also $\Theta=\mathbb R^m$. The information metric corresponding to this model is given, using the repeated-index summation convention, by
$$ds^2=g_{\alpha\beta}\,d\mu^\alpha\,d\mu^\beta,\tag{34}$$
where the metric tensor coefficients, computed from a sample of size one, are the coefficients of the inverse of the covariance matrix, i.e., $G=(g_{\alpha\beta})=\Sigma_0^{-1}$ and $g=\det(G)=1/\det(\Sigma_0)$, the determinant of the information metric of the model, which is constant on the whole parametric manifold $\Theta$. The parameter space is Euclidean, and the Riemannian distance $\rho$ corresponding to this model is, in this case, the well-known Mahalanobis distance, whose square is given by
$$\rho^2(\mu_2,\mu_1)=d_M^2(\mu_2,\mu_1)=(\mu_2-\mu_1)^t\,\Sigma_0^{-1}\,(\mu_2-\mu_1),\tag{35}$$
where the points of the parameter space are identified through their coordinates and written in matrix notation as m × 1 column vectors.
On the other hand, after some straightforward computation, we may obtain the likelihood corresponding to a random sample of size n, $\mathbf x=(x_1,\dots,x_n)$, as
$$L_{\mathbf x}(\mu)=(2\pi)^{-\frac{mn}{2}}\,\det(\Sigma_0)^{-\frac n2}\,\exp\!\Big(-\frac n2\operatorname{Tr}\{\Sigma_0^{-1}S_n\}\Big)\,\exp\!\Big(-\frac n2\,d_M^2(\bar{\mathbf x}_n,\mu)\Big),\tag{36}$$

where $\bar{\mathbf x}_n=\frac1n\sum_{i=1}^n x_i$ and $S_n=\frac1n\sum_{i=1}^n(x_i-\bar{\mathbf x}_n)(x_i-\bar{\mathbf x}_n)^t$ are the sample mean and the sample covariance matrix, respectively.
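The factorization (36) can be verified numerically; the sketch below (our illustration, with an arbitrary $\Sigma_0$ and simulated data) compares the closed form against a direct evaluation of the log-density.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
m, n = 3, 100
Sigma0 = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, 0.2],
                   [0.0, 0.2, 0.5]])
x = rng.multivariate_normal([1.0, -1.0, 0.5], Sigma0, size=n)

xbar = x.mean(axis=0)                          # sample mean
S = (x - xbar).T @ (x - xbar) / n              # sample covariance S_n
P = np.linalg.inv(Sigma0)

mu = np.array([0.7, -0.9, 0.4])                # arbitrary parameter value
dM2 = (xbar - mu) @ P @ (xbar - mu)            # squared Mahalanobis distance
log_L_closed = (-0.5 * m * n * np.log(2 * np.pi)
                - 0.5 * n * np.log(np.linalg.det(Sigma0))
                - 0.5 * n * np.trace(P @ S)
                - 0.5 * n * dM2)
log_L_direct = multivariate_normal(mu, Sigma0).logpdf(x).sum()
print(np.isclose(log_L_closed, log_L_direct))  # True
```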
Then, the negative of the natural logarithm of (36) will be

$$-\ln L_{\mathbf x}(\mu)=\frac{mn}2\ln(2\pi)+\frac n2\ln\det(\Sigma_0)+\frac n2\operatorname{Tr}\{\Sigma_0^{-1}S_n\}+\frac n2\,d_M^2(\bar{\mathbf x}_n,\mu),\tag{37}$$
and the information codified by the data $\mathbf x$, from the source, relative to the true parameter $\mu$, referred to an arbitrary point $\mu_0$, is given by

$$I_{\mathbf x}(\mu)=-\frac n2\,d_M^2(\mu,\bar{\mathbf x}_n)+\frac n2\,d_M^2(\mu_0,\bar{\mathbf x}_n).\tag{38}$$
Its partial derivatives, taking into account the symmetry of the metric tensor and using the summation convention for repeated indices, will be given in classical notation by
$$(I_{\mathbf x})_{,\alpha}=-\frac n2\,\frac{\partial\,d_M^2(\bar{\mathbf x}_n,\mu)}{\partial\mu^\alpha}=-n\,g_{\alpha\beta}\,(\mu^\beta-\bar x_n^\beta),\tag{39}$$
and the components of the gradient of $I_{\mathbf x}$, a contravariant vector field, observing that the inverse of the metric tensor corresponding to the model is given by $g^{\gamma\alpha}$, since $g^{\gamma\alpha}g_{\alpha\beta}=\delta^\gamma_\beta$, where $\delta^\gamma_\beta$ is the Kronecker delta, will be given in classical notation by
$$(\operatorname{grad}I_{\mathbf x})^\gamma=(I_{\mathbf x})^{,\gamma}=g^{\gamma\alpha}(I_{\mathbf x})_{,\alpha}=-g^{\gamma\alpha}\,n\,g_{\alpha\beta}\,(\mu^\beta-\bar x_n^\beta)=-n\,(\mu^\gamma-\bar x_n^\gamma),\tag{40}$$
and therefore,
$$|\operatorname{grad}I_{\mathbf x}|^2=g_{\gamma\beta}\,(I_{\mathbf x})^{,\gamma}(I_{\mathbf x})^{,\beta}=g_{\gamma\beta}\,n(\mu^\beta-\bar x_n^\beta)\,n(\mu^\gamma-\bar x_n^\gamma)=n^2\,d_M^2(\mu,\bar{\mathbf x}_n).\tag{41}$$
Moreover, the Laplacian of $I_{\mathbf x}$, considering (39) and taking into account that $g=\det(G)$ is, in this example, constant, using the repeated-index summation convention, will be given by
$$\Delta I_{\mathbf x}=\frac1{\sqrt g}\,\frac{\partial}{\partial\mu^i}\!\left(\sqrt g\,g^{ij}\,\frac{\partial I_{\mathbf x}}{\partial\mu^j}\right)=g^{ij}\,\frac{\partial^2 I_{\mathbf x}}{\partial\mu^i\,\partial\mu^j}=-n\,g^{ij}g_{j\beta}\,\frac{\partial}{\partial\mu^i}(\mu^\beta-\bar x_n^\beta)=-n\,\delta^i_i=-mn,\tag{42}$$
while the Laplacian of Ψ will be
$$\Delta\Psi=\frac1{\sqrt g}\,\frac{\partial}{\partial\mu^i}\!\left(\sqrt g\,g^{ij}\,\frac{\partial\Psi}{\partial\mu^j}\right)=g^{ij}\,\frac{\partial^2\Psi}{\partial\mu^i\,\partial\mu^j}.\tag{43}$$
Observe that $g^{ij}$ equals the covariance between variables $X^i$ and $X^j$, the element located in the i-th row and j-th column of the matrix $\Sigma_0$ of the underlying multivariate normal model, where we have again made use of the repeated-index summation convention. Additionally, notice that both expressions (42) and (43) are invariant under coordinate changes on $\Theta=\mathbb R^m$. Furthermore, if we define the symmetric $m\times m$ matrix $\Psi_{\mu\mu}=\big(\frac{\partial^2\Psi}{\partial\mu^i\,\partial\mu^j}\big)_{m\times m}$, the Hessian matrix of $\Psi$ with respect to $\mu=(\mu^1,\dots,\mu^m)$, then (43) becomes
$$\Delta\Psi=\operatorname{Tr}\{\Sigma_0\,\Psi_{\mu\mu}\},\tag{44}$$

and, therefore, Equation (27) becomes
$$-4\operatorname{Tr}\{\Sigma_0\,\Psi_{\mu\mu}\}+n^2\,d_M^2(\mu,\bar{\mathbf x}_n)\,\Psi=(2mn+\lambda n)\,\Psi,\tag{45}$$

which is the Schrödinger equation of m coupled quantum harmonic oscillators; see [2].
We may also consider the particular case m = 1, where Equation (27) becomes

$$-4\,\sigma_0^2\,\Psi''+n^2\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\Psi=(2n+\lambda n)\,\Psi.\tag{46}$$
Dividing this expression by $2n^2$ and renaming $\varepsilon=\frac1n+\frac{\lambda}{2n}$, a positive quantity, we shall have

$$-\frac{2\sigma_0^2}{n^2}\,\Psi''+\frac12\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\Psi=\varepsilon\,\Psi,\tag{47}$$
which is the Schrödinger equation of a quantum harmonic oscillator, reported by the authors in a previous paper, in which infinitely many solutions involving Hermite polynomials are obtained [1]. Observe that Equation (47) is obtained by applying Equation (27) to the model described in Section 3.2 in the univariate case, assuming that $\Psi$ and its gradient $\operatorname{grad}\Psi$ vanish at infinity, as previously highlighted. In particular, the ground-state solution of the quantum harmonic oscillator problem, i.e., the ground-state wave function, is of the form
$$\Psi_0(\mu)=\zeta_0\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right),\tag{48}$$
where $\zeta_0$ is a positive normalization constant such that $\int_{\mathbb R}\Psi_0^2(\mu)\,d\mu=1$; in particular, $\zeta_0=\sqrt[4]{\frac{n}{2\pi\sigma_0^2}}$. Moreover, taking into account that
$$\Psi_0'(\mu)=\zeta_0\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(-\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n)\right)=-\Psi_0(\mu)\,\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n),\tag{49}$$
and

$$\Psi_0''(\mu)=\zeta_0\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(\frac{n^2}{4\sigma_0^4}\,(\mu-\bar x_n)^2-\frac{n}{2\sigma_0^2}\right)=\Psi_0(\mu)\left(\frac{n^2}{4\sigma_0^4}\,(\mu-\bar x_n)^2-\frac{n}{2\sigma_0^2}\right),\tag{50}$$
it is straightforward to verify that $\Psi_0$ defined in (48) satisfies (47) for $\varepsilon=\frac1n$. Therefore, $\lambda=0$, and $\Psi_0$ supplies an absolute minimum of the constrained variational problem (12), attaining the intrinsic Cramér–Rao lower bound for intrinsically unbiased estimators; see [20]. Moreover, as we pointed out in our previous paper [1], $\Psi_0^2$ is the well-known posterior distribution obtained via Bayes' theorem [23] with the non-informative improper Jeffreys prior [24].
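This verification is also easy to reproduce symbolically; the following sympy sketch (our own check; $\zeta_0$ is set to 1 since it cancels) confirms that $\Psi_0$ satisfies (47) with $\varepsilon=1/n$.

```python
import sympy as sp

mu, xbar, s, n = sp.symbols('mu xbar sigma0 n', positive=True)
u = mu - xbar
Psi0 = sp.exp(-(n / 4) * (u / s) ** 2)         # Eq. (48) with zeta0 = 1

# Left-hand side of Eq. (47) minus eps*Psi0 with eps = 1/n:
lhs = (-2 * s**2 / n**2 * sp.diff(Psi0, mu, 2)
       + sp.Rational(1, 2) * (u / s) ** 2 * Psi0)
print(sp.simplify(lhs - Psi0 / n))             # prints 0
```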
But Equation (47), which recalls the Schrödinger equation of a quantum harmonic oscillator, has infinitely many solutions corresponding to different energy levels $\varepsilon$. For instance, if we let

$$\Psi_1(\mu)=\zeta_1\,(\mu-\bar x_n)\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right),\tag{51}$$
where $\zeta_1$ is a positive normalization constant such that $\int_{\mathbb R}\Psi_1^2(\mu)\,d\mu=1$, we obtain $\zeta_1=\sqrt[4]{\frac{n^3}{2\pi\sigma_0^6}}$. Moreover, considering that
$$\Psi_1'(\mu)=\zeta_1\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)+\zeta_1\,(\mu-\bar x_n)\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(-\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n)\right)=\zeta_1\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(1-\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n)^2\right),\tag{52}$$
and, therefore,

$$\Psi_1''(\mu)=\zeta_1\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(-\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n)\right)\left(1-\frac{n}{2\sigma_0^2}\,(\mu-\bar x_n)^2\right)+\zeta_1\,\exp\!\left(-\frac n4\left(\frac{\mu-\bar x_n}{\sigma_0}\right)^2\right)\left(-\frac{n}{\sigma_0^2}\,(\mu-\bar x_n)\right)=\Psi_1(\mu)\left(-\frac{3n}{2\sigma_0^2}+\frac{n^2}{4\sigma_0^4}\,(\mu-\bar x_n)^2\right),\tag{53}$$
it is straightforward to obtain that $\Psi_1$ defined in (51) satisfies (47) for $\varepsilon=\frac3n$, the energy corresponding to the first excited state; therefore, $\lambda=4$. Within this framework, the energy levels $\varepsilon$ are given by

$$\varepsilon=\frac2n\left(\nu+\frac12\right),\qquad \nu\in\mathbb N=\{0,1,2,\dots\}.\tag{54}$$
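The spectrum (54) can also be recovered numerically; the finite-difference sketch below (our construction, with the arbitrary choices $n=20$, $\sigma_0=1$, $\bar x_n=0$) diagonalizes the operator in (47) and reproduces the lowest levels $(2/n)(\nu+1/2)$.

```python
import numpy as np

n, sigma0, xbar = 20, 1.0, 0.0
mu = np.linspace(-4.0, 4.0, 1500)
h = mu[1] - mu[0]
k = 2 * sigma0**2 / n**2                     # kinetic coefficient in Eq. (47)

# Discretize H = -k d^2/dmu^2 + 0.5*((mu - xbar)/sigma0)^2, Dirichlet b.c.
main = 2 * k / h**2 + 0.5 * ((mu - xbar) / sigma0) ** 2
off = -k / h**2 * np.ones(mu.size - 1)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

eps = np.linalg.eigvalsh(H)[:4]
print(eps)            # ~ (2/n)(nu + 1/2): 0.05, 0.15, 0.25, 0.35
```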
The previous example could also be developed for m > 1, making use of the results of the intrinsic analysis of point estimation [20], as in [2], in an analogous and straightforward way, thus substantiating the results of the aforementioned paper.

4. Discussion

In previous works [1,2], we leveraged a fundamental equation of quantum mechanics, Schrödinger's equation (or the principle of minimum loss of Fisher information), to calculate the stationary states of multiple sources of information. We obtained interesting results showing that we could decompose the information into quantum harmonic oscillators, one for each independent source; however, we were concerned that our conclusions relied on a fundamental physics equation. To address this concern, we demonstrated in this paper that applying the variational method to purely informational quantities unveils Schrödinger's equation, supporting the previous results. In the mathematical development, we strictly use information-theoretic and statistical quantities and arrive at the wave equation using information parameters rather than physical quantities, leaving aside space and time coordinates.
This study reiterates the idea that physical objects are information-theoretic in origin [12]. In some way, the human observer's brain acts as a built-in statistical machine, processing the primitive information provided by the senses from the object to be studied, information mediated by variables that remain latent or hidden from the observer's consciousness but that are part of the observation process and shape the properties of the represented objects. The physical observables, the things of nature, may emerge after the sensory systems or organs pre-process the source information, a stage we call pre-physics; only after this stage do the fundamental principles of physics come into play.
It is not surprising, then, that the most basic and appropriate statistical principles can illuminate aspects of fundamental physics and, reciprocally, that fundamental physical laws can help propose methods for adequate information management. In this way, this mathematical framework could also be of significant interest in other research fields, such as neuroscience. For instance, this informational formalism may help explain how the brain represents physical objects and interacts with the environment so that the entire organism can survive. We already argued that this complex system, consisting of several components, including water molecules, cell types, and conduction systems, could harbor a representation of information in terms of the solution of Schrödinger's equation. With this paper's proof, we can now say that it can represent the information in terms of the solution of a variational principle consisting of purely information-theoretic objects.
On the other hand, this result suggests that the famous Schrödinger's equation is a particular physical expression of a deeper informational formalism. In other words, Schrödinger's equation describes the physics of our universe because it embodies a more generic informational equation, fitted with universal constants to satisfy experimental data. In our view, Schrödinger's equation is an information-theoretic object that has been fine-tuned to suit the physics of our universe and to elucidate and forecast certain phenomena occurring at the microscopic level, such as the stationary states of the non-relativistic hydrogen atom.
The world is fundamentally quantum because information is quantized: it comes in discrete observations. This idea is encapsulated in J.A. Wheeler's popular phrase "it from bit" (and not the other way around, "bit from it"), which suggests that information is the basic building block of the universe. In other words, the physical world manifests information quantization. We can observe this feature at the microscopic physical level because of the small number of observation samples; at this tiny level, the solution of the informational variational principle becomes apparent. In other words, the solution of this variational method in our physical world is Schrödinger's equation.
It is interesting to note that our approach may be valid not only for our universe but also for other theoretical universes, should they exist. Adjusting the universal constants of the informational formalism would result in a new, pseudo-Schrödinger equation, which could be the key to unlocking the secrets of such universes. In short, we would need to adjust the new Planck–Einstein constants to fit the new experimental data. Thus, in the hypothetical case that we could travel to another universe, the main instrument would be the solution of a variational principle based on information-theoretic objects rather than physical quantities, as the latter would emerge afterward.
In the same direction, it is essential to emphasize that the method presented applies to any data analysis grounded in a regular parametric statistical model. This approach facilitates the construction of analogs of Bayesian posterior distributions on the parameter space of a statistical model using the available data, and it does so without the need to specify any particular prior distribution. This is achieved through the formulation of a variational problem, which can be interpreted within the framework of natural selection: throughout evolution, this framework has decisively shaped the capacity of the knower to effectively grasp and capture the reality of the object under study.
To conclude, we would like to stress that we plan to develop further variational principles to fully exploit this informational formalism. A crucial point in favor of continuing with this mathematical approach is that the variational principle is flexible: it may incorporate other norms or even restrictions, so that we can model complex information-theoretic quantities. For example, we aim to derive the relativistic version of Schrödinger's equation and other important equations in which space and time may emerge from linear and nonlinear transformations of a set of informative parameters of a specific statistical manifold. Our ultimate goal is an informational derivation of the nature of things: the laws of physics.

Author Contributions

Conceptualization, D.B.-C. and J.M.O.; writing—original draft preparation, D.B.-C. and J.M.O.; writing—review and editing, D.B.-C. and J.M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bernal-Casas, D.; Oller, J.M. Information-Theoretic Models for Physical Observables. Entropy 2023, 25, 1448.
  2. Bernal-Casas, D.; Oller, J.M. Intrinsic Information-Theoretic Models. Entropy 2024, 26, 370.
  3. Schrödinger, E. An Undulatory Theory of the Mechanics of Atoms and Molecules. Phys. Rev. 1926, 28, 1049–1070.
  4. Frieden, B.R. Fisher information as the basis for the Schrödinger wave equation. Am. J. Phys. 1989, 57, 1004–1008.
  5. Bartlett, M.S. Probability, statistics and time. In Probability, Statistics and Time: A Collection of Essays; Bartlett, F., Cox, F., Eds.; Chapman and Hall: London, UK, 1975; pp. 1–20.
  6. Pearson, K. The Grammar of Science, 3rd ed.; Adam and Charles Black: London, UK, 1911.
  7. Chalmers, D. Constructing the World; Oxford University Press: Oxford, UK, 2012.
  8. Schrödinger, E. Quantisierung als Eigenwertproblem. I. Ann. Phys. 1926, 384, 361–376.
  9. Dirac, P.A.M. The Lagrangian in quantum mechanics. Phys. Z. Sowjetunion 1933, 3, 64–72.
  10. Feynman, R.P. The Principle of Least Action in Quantum Mechanics. Ph.D. Thesis, Princeton University, Princeton, NJ, USA, 1942.
  11. Lippmann, B.A.; Schwinger, J. Variational Principles for Scattering Processes. I. Phys. Rev. 1950, 79, 469–480.
  12. Wheeler, J. Information, Physics, Quantum: The Search for Links. In Proceedings of the 3rd International Symposium on Foundations of Quantum Mechanics in the Light of New Technology, Tokyo, Japan, 28–31 August 1989; pp. 354–358.
  13. Frieden, B.R. Physics from Fisher Information; Cambridge University Press: Cambridge, UK, 1998. Available online: https://openlibrary.org/books/OL360406M/Physics_from_Fisher_information (accessed on 1 August 2024).
  14. Frieden, B.R. Science from Fisher Information; Cambridge University Press: Cambridge, UK, 2004.
  15. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
  16. Berger, J.O. Statistical Decision Theory and Bayesian Analysis, 2nd ed.; Springer: New York, NY, USA, 1985.
  17. Rao, C. Information and Accuracy Attainable in Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
  18. Burbea, J.; Rao, C.R. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivar. Anal. 1982, 12, 575–596.
  19. Amari, S.-i. Information Geometry and Its Applications, 1st ed.; Springer: Tokyo, Japan, 2016.
  20. Oller, J.M.; Corcuera, J.M. Intrinsic Analysis of Statistical Estimation. Ann. Stat. 1995, 23, 1562–1581.
  21. Nielsen, F. An Elementary Introduction to Information Geometry. Entropy 2020, 22, 1100.
  22. Chavel, I. Eigenvalues in Riemannian Geometry; Elsevier: Orlando, FL, USA, 1984.
  23. Bayes, T. An essay towards solving a problem in the doctrine of chances. Phil. Trans. R. Soc. Lond. 1763, 53, 370–418.
  24. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 1946, 186, 453–461.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
