Article

Intrinsic Information-Theoretic Models

Department of Genetics, Microbiology and Statistics, Faculty of Biology, Universitat de Barcelona, 08028 Barcelona, Spain
* Author to whom correspondence should be addressed.
Entropy 2024, 26(5), 370; https://doi.org/10.3390/e26050370
Submission received: 1 April 2024 / Revised: 21 April 2024 / Accepted: 24 April 2024 / Published: 28 April 2024

Abstract

With this follow-up paper, we continue developing a mathematical framework based on information geometry for representing physical objects. The long-term goal is to lay down informational foundations for physics, especially quantum physics. We now model information sources as univariate normal probability distributions $N(\mu, \sigma_0)$, as before, but with a constant $\sigma_0$ not necessarily equal to 1. We also relax the independence condition when modeling $m$ sources of information: we model $m$ sources with a multivariate normal probability distribution $N_m(\boldsymbol{\mu}, \Sigma_0)$ with a constant variance–covariance matrix $\Sigma_0$ that is not necessarily diagonal, i.e., with covariance values different from 0, which leads to the concept of modes rather than sources. Invoking Schrödinger’s equation, we can still break the information into $m$ quantum harmonic oscillators, one for each mode, with energy levels independent of the values of $\sigma_0$, altogether leading to the concept of “intrinsic”. Similarly, as in our previous work with the estimator’s variance, we find that the expectation of the quadratic Mahalanobis distance to the sample mean equals the energy levels of the quantum harmonic oscillator, with the minimum quadratic Mahalanobis distance attained at the minimum energy level of the oscillator, reaching the “intrinsic” Cramér–Rao lower bound at the lowest energy level. We also demonstrate that the global probability density function of the collective mode of a set of $m$ quantum harmonic oscillators at the lowest energy level still equals the posterior probability distribution calculated using Bayes’ theorem from the sources of information for all data values, taking as a prior the Riemannian volume of the informative metric. While these new assumptions certainly add complexity to the mathematical framework, the results proven are invariant under transformations, leading to the concept of “intrinsic” information-theoretic models, which are essential for developing physics.

1. Introduction

In this work, we continue developing the mathematical framework introduced in [1] by implementing some variations to better account for reality. In particular, we model information sources as univariate normal probability distributions $N(\mu, \sigma_0)$ as before, but with a constant $\sigma_0$ not necessarily equal to 1. We also relax the independence condition when modeling $m$ sources of information. Thus, we model $m$ dependent sources with a multivariate normal probability distribution $N_m(\boldsymbol{\mu}, \Sigma_0)$ with a constant variance–covariance matrix $\Sigma_0$ not necessarily diagonal, i.e., with covariance values different from 0, which leads to the concept of modes rather than sources when finding the solutions.
As in our initial work, the mathematical approach departs from the supposition that physical objects are information-theoretic in origin, an idea that has recurrently appeared in physics. In the following mathematical developments, we discover that the approach is fundamentally “intrinsic”, which gives the paper its title and is the main feature we want to emphasize in this study. In other words, regardless of how we parametrize the modeling, the approach’s inherent properties, for example, the energy levels, remain the same, even after updating the framework with the above modifications.
This entire work builds upon this finding, which makes the framework well suited for studying the properties of information representation and for developing physics, and our modifications further improve its accuracy and applicability to real-world scenarios. The long-term goal is to provide models that explain the “pre-physics” stage from which everything may emerge. By this, we refer to the initial preprocessing of the source data information performed, in principle, by our sensory systems or organs. The research in this follow-up paper may therefore contribute significantly to the field and guide future work in the area.

2. Mathematical Framework

The plan of this section, which we divide into eleven subsections for didactic purposes, is the following. In Section 2.1, we outline modeling a single source with a single sample and the derivation of Fisher’s information and the Riemannian manifold. In Section 2.2, we describe modeling a single source with n samples. Section 2.3 is devoted to analyzing the stationary states of a single source with n samples in the Riemannian manifold. In Section 2.4, we present the solutions of the stationary states in our formalism. In Section 2.5, we compute the probability density function, the mean quadratic Mahalanobis distance, and the “intrinsic” Cramér–Rao lower bound for a single source with n samples. An extension of this approach to m independent sources is conducted in Section 2.6 to compute the global probability density function at the ground-state level. In Section 2.7, we outline modeling m dependent sources with a single sample, Fisher’s information, and the Riemannian manifold. Section 2.8 describes m dependent sources with n samples. In Section 2.9, we analyze the stationary states of m dependent sources of n samples in the Riemannian manifold. Section 2.10 is devoted to finding the solutions. Finally, in Section 2.11, we use Bayes’ theorem to obtain the posterior probability density function.

2.1. A Single Source with a Single Sample: Fisher’s Information, the Riemannian Manifold, and the Quadratic Mahalanobis Distance

We start our mathematical description by modeling a single source with a univariate normal probability distribution $N(\mu, \sigma_0)$, where $\sigma_0 > 0$ is a known constant. This is a well-known parametric statistical model whose one-dimensional parameter space may be identified with the real line, i.e., $\Theta = \mathbb{R}$. We can compute all the quantities relevant to our purpose. For a single sample, the univariate normal density (with respect to the Lebesgue measure), its natural logarithm, and the partial derivative with respect to the parameter $\mu$ are given by
$$f(x;\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\,e^{-\frac{1}{2\sigma_0^2}\left(x-\mu\right)^2},\tag{1}$$
$$\ln f = -\frac{1}{2}\ln\!\left(2\pi\sigma_0^2\right) - \frac{1}{2\sigma_0^2}\left(x-\mu\right)^2,\tag{2}$$
$$\frac{\partial \ln f}{\partial \mu} = \frac{x-\mu}{\sigma_0^2}.\tag{3}$$
From Equation (3), which is also called the score function, we can calculate Fisher’s information [2] for a single sample as
$$I(\mu) = E_\mu\!\left[\left(\frac{\partial \ln f}{\partial \mu}\right)^2\right] = E_\mu\!\left[\left(\frac{x-\mu}{\sigma_0^2}\right)^2\right] = \frac{1}{\sigma_0^2}\,E_\mu\!\left[\left(\frac{x-\mu}{\sigma_0}\right)^2\right] = \frac{1}{\sigma_0^2},\tag{4}$$
since, with the change of variable $z = \frac{x-\mu}{\sigma_0}$, we have that $E_\mu\!\left[\left(\frac{x-\mu}{\sigma_0}\right)^2\right] = \int_{-\infty}^{+\infty} z^2\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz = 1$.
The Riemannian metric [3] for a single source with a single sample, derived from the Fisher information (4), is a metric tensor whose covariant component, contravariant component, and determinant, respectively, are
$$g_{11}(\mu) = \frac{1}{\sigma_0^2},\tag{5}$$
$$g^{11}(\mu) = \sigma_0^2,\tag{6}$$
$$\det\!\left(g(\mu)\right) = \frac{1}{\sigma_0^2}.\tag{7}$$
The corresponding square of the Riemannian distance induced in the parametric space is the well-known quadratic Mahalanobis distance [4], i.e.,
$$d_M^2(\mu_2,\mu_1) = \frac{1}{\sigma_0^2}\left(\mu_2-\mu_1\right)^2.\tag{8}$$
The quadratic Mahalanobis distance will play a critical role in the next sections.
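As a quick illustration, the sketch below evaluates Equation (8) numerically; the function name and the values used are our own, chosen only for demonstration.

```python
def mahalanobis_sq_1d(mu2: float, mu1: float, sigma0: float) -> float:
    """Quadratic Mahalanobis distance (8): (mu2 - mu1)^2 / sigma0^2."""
    return ((mu2 - mu1) / sigma0) ** 2

# With sigma0 = 2, the points 1 and 5 lie ((5-1)/2)^2 = 4 squared Mahalanobis units apart.
print(mahalanobis_sq_1d(5.0, 1.0, 2.0))  # 4.0
```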

2.2. A Single Source with n Samples: Fisher’s Information, the Riemannian Manifold, and the Square of the Riemannian Distance

If the source generates n samples, $(x_1, \ldots, x_n)$, drawn independently from the univariate normal probability distribution (1), the likelihood of this n-variate sample, its log-likelihood, and the score, respectively, are
$$f_n(x_1,\ldots,x_n;\mu) = \prod_{i=1}^n f(x_i;\mu) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{\frac{n}{2}}\exp\!\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^n \left(x_i-\mu\right)^2\right),\tag{9}$$
$$\ln f_n = -\frac{n}{2}\ln\!\left(2\pi\sigma_0^2\right) - \frac{1}{2\sigma_0^2}\sum_{i=1}^n \left(x_i-\mu\right)^2,\tag{10}$$
$$\frac{\partial \ln f_n}{\partial \mu} = \frac{1}{\sigma_0^2}\sum_{i=1}^n \left(x_i-\mu\right) = \frac{n}{\sigma_0^2}\left(\bar{x}-\mu\right).\tag{11}$$
Likewise, from Equation (11) we can calculate the Fisher information corresponding to a sample of size n as
$$I_n(\mu) = E_\mu\!\left[\left(\frac{\partial \ln f_n}{\partial \mu}\right)^2\right] = \frac{n}{\sigma_0^2}\,E_\mu\!\left[\left(\frac{\bar{x}-\mu}{\sigma_0/\sqrt{n}}\right)^2\right] = \frac{n}{\sigma_0^2},\tag{12}$$
since $\bar{x}$ follows a univariate normal distribution with mean $\mu$ and variance $\sigma_0^2/n$. In other words,
$$I_n(\mu) = n\,I(\mu),\tag{13}$$
which shows the well-known additive property of the Fisher information for independent samples.
The Riemannian metric [3] for a single source with n samples, derived from the Fisher information (12), is a metric tensor whose covariant component, contravariant component, and determinant, respectively, are
$$\tilde{g}_{11}(\mu) = \frac{n}{\sigma_0^2},\tag{14}$$
$$\tilde{g}^{11}(\mu) = \frac{\sigma_0^2}{n},\tag{15}$$
$$\det\!\left(\tilde{g}(\mu)\right) = \frac{n}{\sigma_0^2}.\tag{16}$$
The square of the Riemannian distance, $\rho^2$, induced by the Fisher information matrix corresponding to a sample of arbitrary size n is equal to n times the quadratic Mahalanobis distance (8), i.e.,
$$\rho^2(\mu_2,\mu_1) = n\,d_M^2(\mu_2,\mu_1) = \frac{n}{\sigma_0^2}\left(\mu_2-\mu_1\right)^2.\tag{17}$$
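The additivity (13) can also be checked by simulation: the variance of the score (11) over repeated samples should approach $n/\sigma_0^2$. A minimal Monte Carlo sketch (our own illustration; all numeric values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma0, n = 1.5, 2.0, 10

# Draw many samples of size n and evaluate the score (11) at the true mu.
reps = 200_000
x = rng.normal(mu, sigma0, size=(reps, n))
score = (x - mu).sum(axis=1) / sigma0**2

# The empirical variance of the score estimates I_n(mu) = n / sigma0^2 = 2.5.
print(score.var(), n / sigma0**2)
```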

2.3. Stationary States of a Single Source of n Samples in the Riemannian Manifold

To calculate the stationary states of a single source of n samples, we can invoke the principle of minimum Fisher information [5] or use the time-independent non-relativistic Schrödinger equation [6]. The two approaches have been demonstrated to be equivalent elsewhere [5]. The equation reads as follows:
$$-k\,\nabla^2\psi(\mu) + U(\mu)\,\psi(\mu) = \lambda\,\psi(\mu),\tag{18}$$
where $U(\mu)$ is a potential energy and $k, \lambda > 0$. The solution must also satisfy $\lim_{\mu\to-\infty}\psi(\mu) = \lim_{\mu\to+\infty}\psi(\mu) = 0$ and $\int_{-\infty}^{+\infty}\psi^2(\mu)\,d\mu = 1$. For simplicity, we will write $\psi$ instead of $\psi(\mu)$.
We can use the modulus square of the score function (11) as the potential energy, up to a constant term:
$$\left\|\frac{\partial \ln f_n}{\partial \mu}\right\|^2 = \tilde{g}^{11}(\mu)\left(\frac{\partial \ln f_n}{\partial \mu}\right)^2 \tag{19a}$$
$$= \frac{\sigma_0^2}{n}\left[\frac{n}{\sigma_0^2}\left(\bar{x}-\mu\right)\right]^2 \tag{19b}$$
$$= \frac{n}{\sigma_0^2}\left(\bar{x}-\mu\right)^2. \tag{19c}$$
Alternatively, we can use as the potential energy the difference between the maximum of the log-likelihood attained by the sample, $(x_1, \ldots, x_n)$, and the log-likelihood at an arbitrary point $\mu$, up to a proportionality constant. Since the likelihood is given by (9), we can rewrite it as
$$f_n(x_1,\ldots,x_n;\mu) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{\frac{n}{2}}\exp\!\left(-\frac{n}{2\sigma_0^2}\,s_n^2\right)\exp\!\left(-\frac{n}{2\sigma_0^2}\left(\bar{x}-\mu\right)^2\right),\tag{20}$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $s_n^2 = \frac{1}{n}\sum_{i=1}^n \left(x_i-\bar{x}\right)^2$. The supremum of the likelihood is obviously attained when $\mu = \bar{x}$; the potential mentioned above is then
$$U(\mu) \propto \ln f_n(x_1,\ldots,x_n;\bar{x}) - \ln f_n(x_1,\ldots,x_n;\mu) = \frac{n}{2\sigma_0^2}\left(\bar{x}-\mu\right)^2.\tag{21}$$
This expression equals (19) up to a proportionality constant. Thus, we may choose as the potential energy $U(\mu) = \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2$ with $C > 0$, and Equation (18) reads as
$$-k\,\nabla^2\psi + \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2\psi = \lambda\,\psi.\tag{22}$$
We compute the Laplacian in Equation (22) as:
$$\nabla^2\psi = \frac{1}{\sqrt{|\tilde{g}(\mu)|}}\,\frac{\partial}{\partial\mu}\!\left(\sqrt{|\tilde{g}(\mu)|}\;\tilde{g}^{11}(\mu)\,\frac{\partial\psi}{\partial\mu}\right) \tag{23a}$$
$$= \frac{\sigma_0}{\sqrt{n}}\,\frac{\partial}{\partial\mu}\!\left(\frac{\sqrt{n}}{\sigma_0}\,\frac{\sigma_0^2}{n}\,\frac{\partial\psi}{\partial\mu}\right) \tag{23b}$$
$$= \frac{\sigma_0^2}{n}\,\frac{\partial^2\psi}{\partial\mu^2} = \frac{\sigma_0^2}{n}\,\psi''. \tag{23c}$$
Inserting Equation (23) into Equation (22), we obtain:
$$-k\,\frac{\sigma_0^2}{n}\,\psi'' + \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2\psi = \lambda\,\psi,\tag{24}$$
which is Schrödinger’s equation of the quantum harmonic oscillator [7].

2.4. Solutions of a Single Quantum Harmonic Oscillator in the Riemannian Manifold

Some of the following steps may seem obvious to readers versed in quantum mechanics. Assume that $\psi$ has the form
$$\psi(\mu) = \gamma\,e^{-\eta}, \quad \text{with } \gamma > 0 \text{ real and } \eta \text{ a function of } \mu.\tag{25}$$
Equation (24) then becomes
$$-k\,\frac{\sigma_0^2}{n}\left[\gamma e^{-\eta}\left(\eta'\right)^2 - \gamma e^{-\eta}\eta''\right] + \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2\gamma e^{-\eta} = \lambda\,\gamma e^{-\eta}, \tag{26a}$$
$$-k\,\frac{\sigma_0^2}{n}\left[\left(\eta'\right)^2 - \eta''\right] + \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2 = \lambda. \tag{26b}$$
Assuming a solution for $\eta(\mu)$ of the form
$$\eta(\mu) = \xi\,n\left(\bar{x}-\mu\right)^2, \quad \text{with } \xi > 0,\tag{27}$$
and inserting this expression into Equation (26) gives
$$-k\,\frac{\sigma_0^2}{n}\left[4\xi^2 n^2\left(\bar{x}-\mu\right)^2 - 2\xi n\right] + \frac{nC}{\sigma_0^2}\left(\bar{x}-\mu\right)^2 = \lambda,\tag{28}$$
which implies that
$$4k\sigma_0^2\,\xi^2 = \frac{C}{\sigma_0^2}, \qquad 2k\sigma_0^2\,\xi = \lambda.\tag{29}$$
In other words, $k$, $C$, $\lambda$, and $\xi$ cannot be chosen arbitrarily because they have to satisfy these equations. For example, we can choose $k = \frac{2}{n}$ and $C = \frac{1}{2n}$, which forces $\xi = \frac{1}{4\sigma_0^2}$ and $\lambda = \frac{1}{n}$. Therefore, we can write
$$-\frac{2\sigma_0^2}{n^2}\,\psi'' + \frac{1}{2\sigma_0^2}\left(\bar{x}-\mu\right)^2\psi = \frac{1}{n}\,\psi,\tag{30}$$
whose solution is given by
$$\psi(\mu) = \gamma\,e^{-\frac{n}{4\sigma_0^2}\left(\bar{x}-\mu\right)^2}.\tag{31}$$
With this configuration, we compute the normalization constant $\gamma$:
$$1 = \int_{-\infty}^{+\infty}\psi^2(\mu)\,d\mu = \gamma^2\int_{-\infty}^{+\infty} e^{-\frac{n}{2\sigma_0^2}\left(\bar{x}-\mu\right)^2}d\mu = 2\gamma^2\sigma_0\int_0^{+\infty} e^{-\frac{n}{2}t^2}dt,\tag{32}$$
where we used a first change of variable $\frac{\bar{x}-\mu}{\sigma_0} = t$. Now, using a second change of variable $\frac{n}{2}t^2 = s$, with $dt = \left(\frac{2}{n}\right)^{\frac{1}{2}}\frac{1}{2}\,s^{-\frac{1}{2}}ds$, Equation (32) becomes
$$1 = 2\gamma^2\sigma_0\int_0^{+\infty} e^{-s}\left(\frac{2}{n}\right)^{\frac{1}{2}}\frac{1}{2}\,s^{-\frac{1}{2}}ds = \gamma^2\sqrt{\frac{2}{n}}\,\sigma_0\int_0^{+\infty} s^{-\frac{1}{2}}e^{-s}ds = \gamma^2\sqrt{\frac{2\pi}{n}}\,\sigma_0.\tag{33}$$
Isolating $\gamma$ from Equation (33), we obtain $\gamma = \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{4}}$. Therefore, Equation (31) reads as
$$\psi(\mu) = \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{4}} e^{-\frac{n}{4\sigma_0^2}\left(\bar{x}-\mu\right)^2} = \psi_0(\mu),\tag{34}$$
which is the ground-state solution of the quantum harmonic oscillator problem, i.e., the ground-state wave function. The solutions of the quantum harmonic oscillator involve Hermite polynomials, which were introduced elsewhere [8,9]. In this way, we can prove, after some tedious but straightforward computations, that the wave function
$$\psi_1(\mu) = \gamma_1\,\frac{\bar{x}-\mu}{\sigma_0}\,e^{-\frac{n}{4\sigma_0^2}\left(\bar{x}-\mu\right)^2}, \quad \text{with } \gamma_1 > 0,\tag{35}$$
is also a solution of
$$-\frac{2\sigma_0^2}{n^2}\,\psi_1'' + \frac{1}{2\sigma_0^2}\left(\bar{x}-\mu\right)^2\psi_1 = \lambda_1\,\psi_1,\tag{36}$$
where $\lambda_1 = \frac{3}{n}$ is the energy of the first excited state, and $\gamma_1 = \left(\frac{n^3}{2\pi\sigma_0^2}\right)^{\frac{1}{4}}$ is the normalization constant. With this representation, the $\lambda$’s (energy levels) are given by
$$\lambda_\nu = \frac{2}{n}\left(\nu + \frac{1}{2}\right) = E_\nu, \quad \text{with } \nu = 0, 1, \ldots\tag{37}$$
Looking closely at Equation (37), we see that the energy levels depend on two numbers, $\nu$ and $n$. The ground state at $\nu = 0$ has a finite energy $E_0 = \frac{1}{n}$, which can become arbitrarily close to zero by massive sampling. Notably, the energy levels are independent of $\sigma_0$. In other words, they do not depend on the informative parameters, leading to the concept of “intrinsic” information-theoretic models, which will be discussed in greater detail later.
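The $\sigma_0$-independence of the spectrum (37) can be verified numerically. The sketch below (our own check, not part of the original derivation) discretizes the Hamiltonian in Equation (30) with central finite differences and computes its lowest eigenvalues for several values of $\sigma_0$; they all approximate $\lambda_\nu = \frac{2}{n}\left(\nu+\frac{1}{2}\right)$.

```python
import numpy as np

def oscillator_levels(n, sigma0, xbar=0.0, grid=1500, half_width=10.0):
    """Lowest eigenvalues of -(2 sigma0^2/n^2) psi'' + (xbar-mu)^2/(2 sigma0^2) psi,
    i.e., Equation (30) with lambda as the eigenvalue, via finite differences."""
    # The ground state has spread sigma0/sqrt(n), so scale the grid accordingly.
    mu = np.linspace(xbar - half_width * sigma0 / np.sqrt(n),
                     xbar + half_width * sigma0 / np.sqrt(n), grid)
    h = mu[1] - mu[0]
    lap = (np.diag(-2.0 * np.ones(grid))
           + np.diag(np.ones(grid - 1), 1)
           + np.diag(np.ones(grid - 1), -1)) / h**2
    H = -(2 * sigma0**2 / n**2) * lap + np.diag((xbar - mu)**2 / (2 * sigma0**2))
    return np.linalg.eigvalsh(H)[:3]

for s in (0.5, 1.0, 3.0):
    # For n = 4 the exact levels are (2/4)(nu + 1/2) = 0.25, 0.75, 1.25 for any sigma0.
    print(s, oscillator_levels(n=4, sigma0=s))
```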

2.5. Probability Density Function of a Single Source of n Samples, Mean Quadratic Mahalanobis Distance, and Intrinsic Cramér–Rao Lower Bound

Assuming that the square modulus of the wave function can be interpreted as the probability density function:
$$\left|\psi(\mu)\right|^2 = \psi^*(\mu)\,\psi(\mu) = \rho(\mu),\tag{38}$$
we can compute the performance of the estimations of $\mu$. For instance, we can calculate the expectation of the quadratic Mahalanobis distance (8) to the sample mean $\bar{x}$ at the ground state (34), obtaining
$$E_{\mu,\rho_0(\mu)}\!\left[\left(\frac{\bar{x}-\mu}{\sigma_0}\right)^2\right] = \frac{1}{n}\int_{-\infty}^{+\infty}\left(\frac{\bar{x}-\mu}{\sigma_0/\sqrt{n}}\right)^2\frac{1}{\sqrt{2\pi\left(\sigma_0/\sqrt{n}\right)^2}}\,e^{-\frac{1}{2\left(\sigma_0/\sqrt{n}\right)^2}\left(\bar{x}-\mu\right)^2}d\mu = \frac{1}{n}.\tag{39}$$
Likewise, we can compute the expectation of the quadratic Mahalanobis distance (8) to the sample mean $\bar{x}$ at the first excited state, obtaining
$$E_{\mu,\rho_1(\mu)}\!\left[\left(\frac{\bar{x}-\mu}{\sigma_0}\right)^2\right] = \frac{1}{n}\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi\left(\sigma_0/\sqrt{n}\right)^2}}\left(\frac{\bar{x}-\mu}{\sigma_0/\sqrt{n}}\right)^4 e^{-\frac{1}{2\left(\sigma_0/\sqrt{n}\right)^2}\left(\bar{x}-\mu\right)^2}d\mu = \frac{3}{n}.\tag{40}$$
The expectations of the quadratic Mahalanobis distance to the sample mean $\bar{x}$ at the different states equal the quantum harmonic oscillator’s energy levels; i.e., this quantity is quantized. Interestingly, the expectation of the quadratic Mahalanobis distance to the sample mean $\bar{x}$ at the ground state (39) equals the intrinsic Cramér–Rao lower bound (ICRLB) for unbiased estimators,
$$E_\mu\!\left[\left(\frac{\bar{x}-\mu}{\sigma_0}\right)^2\right] \geq \left.\frac{m}{n}\right|_{m=1} = \frac{1}{n},\tag{41}$$
considering that we are modeling a single source of n samples with a single informative parameter $\mu$, i.e., $m = 1$. For further details, see [10].
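These two expectations are easy to confirm by Monte Carlo. In the sketch below (an illustration under our own sampling scheme), $\mu$ is drawn from $\rho_0$, a normal with variance $\sigma_0^2/n$, and from $\rho_1$, whose standardized density is proportional to $u^2 e^{-u^2/2}$, so $|u|$ follows a chi distribution with 3 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma0, xbar, reps = 8, 2.0, 0.0, 500_000

# Ground state: rho_0 is N(xbar, sigma0^2/n); expect E[((xbar-mu)/sigma0)^2] = 1/n.
mu0 = rng.normal(xbar, sigma0 / np.sqrt(n), size=reps)
print(np.mean(((xbar - mu0) / sigma0) ** 2), 1 / n)       # both ~ 0.125

# First excited state: |u| ~ chi(3), i.e., the norm of 3 standard normals;
# the sign is irrelevant for the squared distance. Expect 3/n.
u = np.linalg.norm(rng.normal(size=(reps, 3)), axis=1)
mu1 = xbar - sigma0 * u / np.sqrt(n)
print(np.mean(((xbar - mu1) / sigma0) ** 2), 3 / n)       # both ~ 0.375
```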

2.6. m Independent Sources of n Samples and Global Probability Density Function

With m independent sources, each generating n samples, a finite set of m quantum harmonic oscillators may represent reality. Assuming independence of the sources of information, the “global” wave function (also called the collective wave function) factors as the product of single wave functions. We can write the global wave function as
$$\psi(\boldsymbol{\mu}) = \psi(\mu_1, \mu_2, \ldots, \mu_m) = \psi(\mu_1)\,\psi(\mu_2)\cdots\psi(\mu_m).\tag{42}$$
It constitutes a many-body system, and we may refer to the vector $\boldsymbol{\mu}$ as the $\boldsymbol{\mu}$ field.
For example, in the case of modeling two independent sources, the global wave function at the ground state will be the product of single wave functions, each of them at the ground state
$$\psi_0(\boldsymbol{\mu}) = \psi_0(\mu_1)\,\psi_0(\mu_2) \tag{43a}$$
$$= \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{4}}\exp\!\left\{-\frac{n}{4\sigma_0^2}\left(\bar{x}_1-\mu_1\right)^2\right\}\left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{4}}\exp\!\left\{-\frac{n}{4\sigma_0^2}\left(\bar{x}_2-\mu_2\right)^2\right\} \tag{43b}$$
$$= \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{2}}\exp\!\left\{-\frac{1}{4\sigma_0^2}\left[n\left(\bar{x}_1-\mu_1\right)^2 + n\left(\bar{x}_2-\mu_2\right)^2\right]\right\} \tag{43c}$$
$$= \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{1}{2}}\exp\!\left\{-\frac{n}{4\sigma_0^2}\begin{pmatrix}\bar{x}_1-\mu_1, & \bar{x}_2-\mu_2\end{pmatrix}\begin{pmatrix}\bar{x}_1-\mu_1\\ \bar{x}_2-\mu_2\end{pmatrix}\right\}. \tag{43d}$$
We can generalize Equation (43) to m independent sources. The global wave function is written as
$$\psi_0(\boldsymbol{\mu}) = \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{m}{4}}\exp\!\left\{-\frac{n}{4\sigma_0^2}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\right\}.\tag{44}$$
Using Equation (38), the probability density function is
$$\rho_0(\boldsymbol{\mu}) = \left(\frac{n}{2\pi\sigma_0^2}\right)^{\frac{m}{2}}\exp\!\left\{-\frac{n}{2\sigma_0^2}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\right\}.\tag{45}$$
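Equation (45) is simply the density of an $N_m(\bar{\mathbf{x}}, (\sigma_0^2/n)\,I_m)$ distribution in the $\boldsymbol{\mu}$ field. A minimal sketch of that identification (all values arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

n, sigma0, m = 5, 1.5, 3
xbar = np.array([0.2, -1.0, 0.7])            # sample means of the m sources

# rho_0 in Equation (45) is the N_m(xbar, (sigma0^2/n) I) density in mu.
rho0 = multivariate_normal(mean=xbar, cov=(sigma0**2 / n) * np.eye(m))

# At mu = xbar the density peaks at (n / (2 pi sigma0^2))^(m/2).
print(rho0.pdf(xbar))
print((n / (2 * np.pi * sigma0**2)) ** (m / 2))  # same value, from Equation (45)
```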

2.7. m Dependent Sources with a Single Sample: Fisher’s Information, the Riemannian Manifold, and the Quadratic Mahalanobis Distance

Consider now m possibly dependent sources generating a multivariate sample of size 1, $\mathbf{x}$. Although a finite set of m univariate quantum harmonic oscillators could also represent reality here, since these sources may be dependent it is convenient to model this situation as a single m-variate source: an m-variate random vector $\mathbf{x}$ following an m-variate normal probability distribution $N_m(\boldsymbol{\mu}, \Sigma_0)$, where $\boldsymbol{\mu} \in \mathbb{R}^m$ and $\Sigma_0$ is a known, constant, strictly positive definite $m \times m$ matrix, the covariance matrix of the random vector $\mathbf{x}$, i.e., $\mathrm{cov}(\mathbf{x}) = \Sigma_0 > 0$. This is a well-known parametric statistical model whose m-dimensional parameter space may be identified with $\Theta = \mathbb{R}^m$; for further details, see [11]. Identifying, as is customary, the points of the manifold $\Theta$ with their coordinates $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_m)$, we can compute all the quantities relevant to our purpose. For a single sample, the m-variate normal density (with respect to the Lebesgue measure), its natural logarithm, and the partial derivative with respect to $\mu_i$ are given by
$$f_m(\mathbf{x};\boldsymbol{\mu}) = (2\pi)^{-\frac{m}{2}}\det(\Sigma_0)^{-\frac{1}{2}}\exp\!\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right),\tag{46}$$
$$\ln f_m = -\frac{m}{2}\ln(2\pi) - \frac{1}{2}\ln\!\left(\det(\Sigma_0)\right) - \frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right),\tag{47}$$
$$\frac{\partial \ln f_m}{\partial \mu_i} = \sum_{\alpha=1}^m \sigma^{i\alpha}\left(x_\alpha - \mu_\alpha\right),\tag{48}$$
where, following standard matrix calculus notation, $\sigma^{i\alpha}$ is the element located in row i and column α of the inverse covariance matrix $\Sigma_0^{-1}$.
The Fisher information matrix $G = \left(g_{ij}(\boldsymbol{\mu})\right)$ is an $m \times m$ matrix whose elements are
$$g_{ij}(\boldsymbol{\mu}) = E_{\boldsymbol{\mu}}\!\left[\frac{\partial \ln f_m}{\partial \mu_i}\,\frac{\partial \ln f_m}{\partial \mu_j}\right] \tag{49a}$$
$$= E_{\boldsymbol{\mu}}\!\left[\sum_{\alpha=1}^m \sigma^{i\alpha}\left(x_\alpha-\mu_\alpha\right)\sum_{\beta=1}^m \sigma^{j\beta}\left(x_\beta-\mu_\beta\right)\right] \tag{49b}$$
$$= \sum_{\alpha=1}^m\sum_{\beta=1}^m \sigma^{i\alpha}\sigma^{j\beta}\,E_{\boldsymbol{\mu}}\!\left[\left(x_\alpha-\mu_\alpha\right)\left(x_\beta-\mu_\beta\right)\right] \tag{49c}$$
$$= \sum_{\beta=1}^m\sum_{\alpha=1}^m \sigma^{i\alpha}\sigma_{\alpha\beta}\sigma^{\beta j} = \sum_{\beta=1}^m \delta^{i}_{\beta}\,\sigma^{\beta j} = \sigma^{ij},\tag{49d}$$
where we have taken into account the symmetry of $\sigma^{j\beta}$ and that $\mathrm{cov}\!\left(x_\alpha, x_\beta\right) = E_{\boldsymbol{\mu}}\!\left[\left(x_\alpha-\mu_\alpha\right)\left(x_\beta-\mu_\beta\right)\right] = \sigma_{\alpha\beta}$; in matrix form, $G = \Sigma_0^{-1}$.
It is well known that the Fisher information matrix is a second-order covariant tensor on the parameter space. It is positive definite and may be considered the metric tensor of this manifold, giving it the structure of a Riemannian manifold. To avoid confusion, we must emphasize that the subscripts and superscripts used to reference the variance–covariance matrix or its inverse do not have a tensor character: the components of the metric tensor $g_{ij}(\boldsymbol{\mu})$ are those of a covariant tensor; in tensor notation, they are written with subscripts and are equal to the components of the inverse variance–covariance matrix given in (49).
The Riemannian geometry induced by the Fisher information matrix in the parameter space is, in this case, Euclidean, and the square of the Riemannian distance, also known as the Rao distance, is the quadratic Mahalanobis distance given by
$$d_M^2(\boldsymbol{\mu}_2,\boldsymbol{\mu}_1) = \left(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1\right)^T\Sigma_0^{-1}\left(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1\right).\tag{50}$$
In this expression, the parameter space points are identified with their coordinates and written in matrix notation as m × 1 column vectors.
All these results correspond to a multivariate model with a sample of size n = 1.
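A short sketch of the multivariate quadratic Mahalanobis distance (50); the covariance matrix and points below are arbitrary illustrations:

```python
import numpy as np

Sigma0 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])              # known, positive definite covariance
mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.0, 2.0])

# (mu2 - mu1)^T Sigma0^{-1} (mu2 - mu1), Equation (50); solve() avoids an explicit inverse.
d = mu2 - mu1
print(d @ np.linalg.solve(Sigma0, d))
```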

2.8. m Dependent Sources with n Samples: Fisher’s Information, the Riemannian Manifold, and the Square of the Riemannian Distance

If each of the m dependent sources generates n samples, $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, drawn independently from the multivariate normal probability distribution (46), the likelihood is
$$f_{m,n}(\mathbf{x}_1,\ldots,\mathbf{x}_n;\boldsymbol{\mu}) = \prod_{i=1}^n (2\pi)^{-\frac{m}{2}}\det(\Sigma_0)^{-\frac{1}{2}}\exp\!\left(-\frac{1}{2}\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right)\right) \tag{51a}$$
$$= (2\pi)^{-\frac{mn}{2}}\det(\Sigma_0)^{-\frac{n}{2}}\exp\!\left(-\frac{1}{2}\sum_{i=1}^n\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right)\right). \tag{51b}$$
The summation term within the exponential function can be decomposed into two terms
$$\sum_{i=1}^n\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right) = \sum_{i=1}^n \mathrm{Tr}\!\left(\Sigma_0^{-1}\left(\mathbf{x}_i-\boldsymbol{\mu}\right)\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^T\right) \tag{52a}$$
$$= \mathrm{Tr}\!\left(\Sigma_0^{-1}\sum_{i=1}^n\left(\mathbf{x}_i-\boldsymbol{\mu}\right)\left(\mathbf{x}_i-\boldsymbol{\mu}\right)^T\right) \tag{52b}$$
$$= \mathrm{Tr}\!\left(\Sigma_0^{-1}\sum_{i=1}^n\left(\mathbf{x}_i-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\left(\mathbf{x}_i-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\right) \tag{52c}$$
$$= \mathrm{Tr}\!\left(\Sigma_0^{-1}\left[\sum_{i=1}^n\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)^T + \sum_{i=1}^n\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\right]\right) \tag{52d}$$
$$= n\,\mathrm{Tr}\!\left(\Sigma_0^{-1}S_n\right) + n\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu}),\tag{52e}$$
where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i$ and $S_n = \frac{1}{n}\sum_{i=1}^n\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)\left(\mathbf{x}_i-\bar{\mathbf{x}}\right)^T$ are the sample mean and the sample covariance matrix of this random sample, and $d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})$ is the quadratic Mahalanobis distance to the sample mean $\bar{\mathbf{x}}$.
Inserting Equation (52) into Equation (51) results in
$$f_{m,n}(\mathbf{x}_1,\ldots,\mathbf{x}_n;\boldsymbol{\mu}) = (2\pi)^{-\frac{mn}{2}}\det(\Sigma_0)^{-\frac{n}{2}}\exp\!\left(-\frac{n}{2}\mathrm{Tr}\{\Sigma_0^{-1}S_n\}\right)\exp\!\left(-\frac{n}{2}\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})\right).\tag{53}$$
The log-likelihood is
$$\ln f_{m,n} = -\frac{mn}{2}\ln(2\pi) - \frac{n}{2}\ln\det(\Sigma_0) - \frac{n}{2}\mathrm{Tr}\{\Sigma_0^{-1}S_n\} - \frac{n}{2}\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu}),\tag{54}$$
and the partial derivative of $\ln f_{m,n}$ with respect to $\mu^\alpha$, using standard classical notation for covariant derivatives together with the repeated-index summation convention, is
$$\frac{\partial \ln f_{m,n}}{\partial \mu^\alpha} = \left(\ln f_{m,n}\right)_{,\alpha} = n\,g_{\alpha\beta}\left(\bar{x}^\beta - \mu^\beta\right).\tag{55}$$
The corresponding Fisher information matrix and the square of Riemannian distance for a sample of size n will be the above Equations (49) and (50) multiplied by n.
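The decomposition (52) is an exact algebraic identity, which the following sketch confirms numerically on simulated data (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 20
Sigma0 = np.array([[1.0, 0.3, 0.1],
                   [0.3, 2.0, 0.4],
                   [0.1, 0.4, 1.5]])
mu = np.array([0.5, -1.0, 2.0])
x = rng.multivariate_normal(mu, Sigma0, size=n)

xbar = x.mean(axis=0)
Sn = (x - xbar).T @ (x - xbar) / n           # biased sample covariance, as in the text

lhs = sum((xi - mu) @ np.linalg.solve(Sigma0, xi - mu) for xi in x)
rhs = (n * np.trace(np.linalg.solve(Sigma0, Sn))
       + n * (xbar - mu) @ np.linalg.solve(Sigma0, xbar - mu))
print(np.isclose(lhs, rhs))                  # True: Equation (52) holds exactly
```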

2.9. Stationary States of m Dependent Sources of n Samples in the Riemannian Manifold

To calculate the stationary states, we can invoke the time-independent non-relativistic Schrödinger equation [6] as above. In the multivariate case, the wave equation reads as follows:
$$-k\,\nabla^2\psi(\boldsymbol{\mu}) + U(\boldsymbol{\mu})\,\psi(\boldsymbol{\mu}) = \lambda\,\psi(\boldsymbol{\mu}),\tag{56}$$
where $U(\boldsymbol{\mu})$ is the potential energy and $k, \lambda > 0$. The solution must also vanish at infinity and satisfy $\int_{\mathbb{R}^m}\psi^2(\boldsymbol{\mu})\,d\boldsymbol{\mu} = 1$. For simplicity, we will write $\psi$ instead of $\psi(\boldsymbol{\mu})$.
We can use the square of the norm of the log-likelihood gradient as the potential energy, up to a constant term. Observing that the inverse of the metric tensor corresponding to a sample of size n is given by $\frac{1}{n}g^{\gamma\alpha}$, since $g^{\gamma\alpha}g_{\alpha\beta} = \delta^\gamma_\beta$, where $\delta^\gamma_\beta$ is the Kronecker delta, the components of the gradient of $\ln f_{m,n}$, a contravariant vector field, will be given in classical notation by
$$\left(\nabla\ln f_{m,n}\right)^\gamma = \left(\ln f_{m,n}\right)^{,\gamma} = \frac{1}{n}\,g^{\gamma\alpha}\left(\ln f_{m,n}\right)_{,\alpha} = \frac{1}{n}\,g^{\gamma\alpha}\,n\,g_{\alpha\beta}\left(\bar{x}^\beta-\mu^\beta\right) = \bar{x}^\gamma - \mu^\gamma,\tag{57}$$
and, therefore, the square of the norm of the log-likelihood gradient will be
$$\left\|\nabla\ln f_{m,n}\right\|^2 = n\,g_{\gamma\beta}\left(\nabla\ln f_{m,n}\right)^\gamma\left(\nabla\ln f_{m,n}\right)^\beta = n\,g_{\gamma\beta}\left(\bar{x}^\gamma-\mu^\gamma\right)\left(\bar{x}^\beta-\mu^\beta\right) = n\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu}).\tag{58}$$
Alternatively, we can use as the potential the difference between the log-likelihood at its maximum $\boldsymbol{\mu}_0 = \bar{\mathbf{x}}$ and the log-likelihood at an arbitrary point $\boldsymbol{\mu}$:
$$U(\boldsymbol{\mu}) \propto \ln f_{m,n}(\bar{\mathbf{x}}) - \ln f_{m,n}(\boldsymbol{\mu}) = \frac{n}{2}\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu}),\tag{59}$$
which equals (58) up to a proportionality constant. Thus, Equations (58) and (59) suggest taking as the potential energy $U(\boldsymbol{\mu}) = nC\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})$ with $C > 0$. In this way, Equation (56) reads as
$$-k\,\nabla^2\psi + nC\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})\,\psi = \lambda\,\psi.\tag{60}$$
To proceed, we must compute the Laplacian in Equation (60). If $g = \det(G)$, with $G$ defined in (49), i.e., the determinant corresponding to the tensor of the information metric for samples of size $n = 1$, then for a sample of arbitrary size n that determinant will be equal to $n^m g$. In this way, the Laplacian of a function $\psi$ will be given by
$$\nabla^2\psi = \frac{1}{\sqrt{n^m g}}\,\frac{\partial}{\partial\mu^i}\!\left(\sqrt{n^m g}\;\frac{1}{n}\,g^{ij}\,\frac{\partial\psi}{\partial\mu^j}\right) = \frac{1}{n}\,g^{ij}\,\frac{\partial^2\psi}{\partial\mu^i\,\partial\mu^j},\tag{61}$$
where we have used the repeated-index summation convention. For further details about this formula, see, for instance, [12]. Notice that $\left(g^{ij}\right)$ equals the variance–covariance matrix $\Sigma_0$. Moreover, if we define the $m \times m$ matrix $\Psi_{\boldsymbol{\mu}} = \left(\frac{\partial^2\psi}{\partial\mu^i\,\partial\mu^j}\right)_{m\times m}$, which is the Hessian matrix of $\psi$ under the coordinates $\boldsymbol{\mu} = (\mu^1, \ldots, \mu^m)$, Equation (61) can be written as
$$\nabla^2\psi = \frac{1}{n}\,\mathrm{Tr}\!\left(\Sigma_0\,\Psi_{\boldsymbol{\mu}}\right).\tag{62}$$
Inserting Equation (62) into (60), we obtain
$$-\frac{k}{n}\,\mathrm{Tr}\!\left(\Sigma_0\,\Psi_{\boldsymbol{\mu}}\right) + nC\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})\,\psi = \lambda\,\psi,\tag{63}$$
which is Schrödinger’s equation for m coupled quantum harmonic oscillators.

2.10. Solutions of m-Coupled Quantum Harmonic Oscillators in the Riemannian Manifold

We observe that both (58) and (61) are invariant under coordinate changes on $\Theta = \mathbb{R}^m$. Therefore, Equation (63) will remain invariant under these changes, particularly linear ones.
Since $\Sigma_0$ is a symmetric $m \times m$ matrix that diagonalizes on an orthonormal basis, it can be written as $\Sigma_0 = U D U^T$, where $U$ is an orthogonal $m \times m$ matrix and $D$ is a diagonal $m \times m$ matrix $D = \mathrm{diag}(\gamma_1^2, \ldots, \gamma_m^2)$, where $\gamma_\alpha > 0$, $\alpha = 1, \ldots, m$, are the eigenvalues of the square root of the variance–covariance matrix $\Sigma_0$.
Thus, by introducing the change of coordinates $\boldsymbol{\eta} = U^T\boldsymbol{\mu}$ and $\bar{\mathbf{y}} = U^T\bar{\mathbf{x}}$, the metric tensor becomes diagonal, i.e., $\hat{G} = D^{-1}$. Taking this coordinate change into account, Equation (58) becomes
$$n\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu}) = n\,d_M^2(\bar{\mathbf{y}},\boldsymbol{\eta}) = n\sum_{i=1}^m \frac{1}{\gamma_i^2}\left(\bar{y}^i - \eta^i\right)^2.\tag{64}$$
Moreover, if we define the symmetric $m \times m$ matrix $\Psi_{\boldsymbol{\eta}} = \left(\frac{\partial^2\psi}{\partial\eta^i\,\partial\eta^j}\right)_{m\times m}$, which is the Hessian matrix of $\psi$ under the new coordinates $\boldsymbol{\eta} = (\eta^1, \ldots, \eta^m)$, Equation (61) becomes
$$\nabla^2\psi = \frac{1}{n}\,\mathrm{Tr}\!\left(\Sigma_0\,\Psi_{\boldsymbol{\mu}}\right) = \frac{1}{n}\,\mathrm{Tr}\!\left(D\,\Psi_{\boldsymbol{\eta}}\right).\tag{65}$$
Making use of Equations (64) and (65) in Equation (63), we obtain
$$-\frac{k}{n}\,\mathrm{Tr}\!\left(D\,\Psi_{\boldsymbol{\eta}}\right) + nC\,d_M^2(\bar{\mathbf{y}},\boldsymbol{\eta})\,\psi = \lambda\,\psi,\tag{66}$$
which is Schrödinger’s equation for m decoupled quantum harmonic oscillators. If we choose $k = \frac{2}{n}$ and $C = \frac{1}{2n}$, Equation (66) can be written as
$$\sum_{i=1}^m\left[-\frac{2\gamma_i^2}{n^2}\,\frac{\partial^2\psi}{\partial\left(\eta^i\right)^2} + \frac{1}{2\gamma_i^2}\left(\bar{y}^i-\eta^i\right)^2\psi\right] = \lambda\,\psi.\tag{67}$$
Additionally, if we define $\lambda = \sum_{\alpha=1}^m \lambda_\alpha$ with $\lambda_\alpha > 0$, we may write
$$\sum_{\alpha=1}^m\left[-\frac{2\gamma_\alpha^2}{n^2}\,\frac{\partial^2\psi}{\partial\left(\eta^\alpha\right)^2} + \frac{1}{2\gamma_\alpha^2}\left(\bar{y}^\alpha-\eta^\alpha\right)^2\psi - \lambda_\alpha\,\psi\right] = 0.\tag{68}$$
Note that if $\psi_\alpha$ is a non-trivial solution of
$$-\frac{2\gamma_\alpha^2}{n^2}\,\frac{\partial^2\psi_\alpha}{\partial\left(\eta^\alpha\right)^2} + \frac{1}{2\gamma_\alpha^2}\left(\bar{y}^\alpha-\eta^\alpha\right)^2\psi_\alpha = \lambda_\alpha\,\psi_\alpha, \quad \alpha = 1, \ldots, m,\tag{69}$$
then, Equation (66) admits a solution of the form
$$\psi(\eta^1, \ldots, \eta^m) = \prod_{\alpha=1}^m \psi_\alpha\left(\eta^\alpha\right).\tag{70}$$
Each of the equations in (69) admits infinitely many solutions for different values of $\lambda_\alpha$, as in our previous work [1]. More specifically, it admits solutions for
$$\lambda_{\alpha,\nu} = \frac{2}{n}\left(\nu + \frac{1}{2}\right), \quad \nu \in \mathbb{N}.\tag{71}$$
In particular, for $\nu = 0$, we have $\lambda_{\alpha,0} = \frac{1}{n}$, and the wave function for the ground state is
$$\psi_{\alpha,0}\left(\eta^\alpha\right) = \left(\frac{n}{2\pi\gamma_\alpha^2}\right)^{\frac{1}{4}} e^{-\frac{n}{4\gamma_\alpha^2}\left(\bar{y}^\alpha-\eta^\alpha\right)^2}.\tag{72}$$
Then, the global wave function at the ground state will be
$$\psi_0(\eta^1, \ldots, \eta^m) = \prod_{\alpha=1}^m \psi_{\alpha,0}\left(\eta^\alpha\right),\tag{73}$$
with $\lambda = \sum_{\alpha=1}^m \frac{1}{n} = \frac{m}{n}$, which is the intrinsic Cramér–Rao lower bound (ICRLB) for m sources of information of n samples, with each source being modeled with an informative parameter, i.e., a total of m informative parameters. For further details, see [10]. The global probability density function at the ground state can be written as
$$\rho_0(\eta^1, \ldots, \eta^m) = \left|\psi_0(\eta^1, \ldots, \eta^m)\right|^2 = \prod_{\alpha=1}^m \psi_{\alpha,0}^2\left(\eta^\alpha\right) \tag{74a}$$
$$= \left(\frac{n}{2\pi}\right)^{\frac{m}{2}}\left(\prod_{\alpha=1}^m \gamma_\alpha^2\right)^{-\frac{1}{2}}\exp\!\left(-\frac{n}{2}\sum_{\alpha=1}^m \frac{1}{\gamma_\alpha^2}\left(\bar{y}^\alpha-\eta^\alpha\right)^2\right) \tag{74b}$$
$$= \left(\frac{n}{2\pi}\right)^{\frac{m}{2}}\det(D)^{-\frac{1}{2}}\exp\!\left(-\frac{n}{2}\left(\bar{\mathbf{y}}-\boldsymbol{\eta}\right)^T D^{-1}\left(\bar{\mathbf{y}}-\boldsymbol{\eta}\right)\right). \tag{74c}$$
Since $\boldsymbol{\eta} = U^T\boldsymbol{\mu}$ and $\bar{\mathbf{y}} = U^T\bar{\mathbf{x}}$, where $U$ is an orthogonal $m \times m$ matrix and, therefore, $|\det U| = 1$, we can express Equation (74) as a function of the original coordinates:
$$\rho_0(\mu^1, \ldots, \mu^m) = \left(\frac{n}{2\pi}\right)^{\frac{m}{2}}\det(\Sigma_0)^{-\frac{1}{2}}\exp\!\left(-\frac{n}{2}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\right).\tag{75}$$
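The equivalence between the product of decoupled mode densities (74) and the density (75) in the original coordinates can be checked directly, as in the sketch below (an illustration with arbitrary values):

```python
import numpy as np

m, n = 2, 10
Sigma0 = np.array([[1.5, 0.7],
                   [0.7, 1.0]])
xbar = np.array([0.3, -0.6])
mu = np.array([0.1, 0.2])

# Diagonalize Sigma0 = U D U^T and pass to the decoupled (mode) coordinates.
evals, U = np.linalg.eigh(Sigma0)            # evals are the gamma_alpha^2
eta, ybar = U.T @ mu, U.T @ xbar

# Product of m univariate ground-state densities, Equation (74).
rho_modes = np.prod(np.sqrt(n / (2 * np.pi * evals))
                    * np.exp(-n * (ybar - eta) ** 2 / (2 * evals)))

# Density (75) in the original coordinates.
d = xbar - mu
rho_mu = ((n / (2 * np.pi)) ** (m / 2) / np.sqrt(np.linalg.det(Sigma0))
          * np.exp(-n / 2 * d @ np.linalg.solve(Sigma0, d)))
print(np.isclose(rho_modes, rho_mu))         # True
```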
However, there are many other solutions of (69), corresponding to different values of $\nu$ in (71). It is well known that the solutions of the quantum harmonic oscillator involve Hermite polynomials, which were introduced elsewhere [8,9]. In particular, for $\nu = 1$, we have $\lambda_{\alpha,1} = \frac{3}{n}$, and the wave function at the first excited state is
$$\psi_{\alpha,1}\left(\eta^\alpha\right) = \left(\frac{n^3}{2\pi\gamma_\alpha^2}\right)^{\frac{1}{4}}\frac{\bar{y}^\alpha-\eta^\alpha}{\gamma_\alpha}\,e^{-\frac{n}{4\gamma_\alpha^2}\left(\bar{y}^\alpha-\eta^\alpha\right)^2}.\tag{76}$$
We can obtain other solutions via the Hermite polynomials, representing excited states of the quantum harmonic oscillator. For instance, we may obtain the solution $\psi_{\alpha,\nu}$ for each of the sources $\alpha = 1, \ldots, m$ and for each energy level $\nu = 0, \ldots, k$. Combining the m sources with the $k+1$ energy levels, we can build up all possible solutions and, therefore, obtain up to $(k+1)^m$ different solutions
$$\psi(\eta^1, \ldots, \eta^m) = \prod_{\alpha=1}^m \psi_{\alpha,\epsilon_\alpha}\left(\eta^\alpha\right),\tag{77}$$
where $\epsilon_\alpha \in \{0, 1, \ldots, k\}$ and $\lambda = \sum_{\alpha=1}^m \frac{2}{n}\left(\epsilon_\alpha + \frac{1}{2}\right) = \frac{m}{n} + \frac{2}{n}\sum_{\alpha=1}^m \epsilon_\alpha$ is the total energy of the m oscillators, such that $\frac{m}{n} \leq \lambda \leq \frac{m(2k+1)}{n}$.
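The counting argument can be made concrete by enumerating all $(k+1)^m$ occupation vectors and their total energies; a small sketch with arbitrary n, m, and k:

```python
from itertools import product

n, m, k = 4, 2, 2                            # 2 modes with levels 0..2: (k+1)^m = 9 states

energies = sorted(sum(2 / n * (eps + 0.5) for eps in occupation)
                  for occupation in product(range(k + 1), repeat=m))

# Lowest energy m/n = 0.5; highest m(2k+1)/n = 2.5.
print(len(energies), energies[0], energies[-1])
```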

2.11. Bayesian Framework and Posterior Probability Density Function

Regardless of having independent or dependent sources of information, we can compute the posterior probability distribution from the sources of information for all data values using Bayes’ theorem [13], taking the Riemannian volume of the metric as a prior. This measure is the Jeffreys prior distribution on the parameter, and it can be considered, in some way, an objective, or at least a reference, choice for a prior distribution [14].
Considering Equation (49), the Riemannian volume $\sqrt{\det(G)}$ is constant on $\Theta$. Then, taking into account the likelihood (51), the posterior probability distribution $f_{m,n}(\boldsymbol{\mu}; \mathbf{x}_1, \ldots, \mathbf{x}_n)$ based on the Jeffreys prior is equal to
$$f_{m,n}(\boldsymbol{\mu}; \mathbf{x}_1, \ldots, \mathbf{x}_n) \propto f_{m,n}(\mathbf{x}_1, \ldots, \mathbf{x}_n; \boldsymbol{\mu})\,\sqrt{g} \tag{78a}$$
$$\propto \exp\!\left(-\frac{n}{2}\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})\right) \tag{78b}$$
$$= \left(\frac{n}{2\pi}\right)^{\frac{m}{2}}\det(\Sigma_0)^{-\frac{1}{2}}\exp\!\left(-\frac{n}{2}\,d_M^2(\bar{\mathbf{x}},\boldsymbol{\mu})\right) \tag{78c}$$
$$= \left(\frac{n}{2\pi}\right)^{\frac{m}{2}}\det(\Sigma_0)^{-\frac{1}{2}}\exp\!\left(-\frac{n}{2}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)^T\Sigma_0^{-1}\left(\bar{\mathbf{x}}-\boldsymbol{\mu}\right)\right). \tag{78d}$$
This posterior probability density function coincides with the global probability density function at the ground state (75). Precisely, the probability density function of m quantum harmonic oscillators at the ground state, given by the square of the wave function, coincides with the Bayesian posterior obtained from m sources of information for all data values using the improper Jeffreys prior. This unexpected and exciting result reveals a plausible relationship between energy and Bayes’ theorem.
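Numerically, the agreement between the Bayesian posterior (78) and the ground-state density (75) is immediate, since with a constant prior and known $\Sigma_0$ the posterior of $\boldsymbol{\mu}$ is $N_m(\bar{\mathbf{x}}, \Sigma_0/n)$. A brief sketch (arbitrary simulated data):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
m, n = 2, 15
Sigma0 = np.array([[1.0, 0.4],
                   [0.4, 2.0]])
x = rng.multivariate_normal([0.0, 1.0], Sigma0, size=n)
xbar = x.mean(axis=0)

# With the constant Jeffreys prior, the posterior (78) is N_m(xbar, Sigma0/n).
posterior = multivariate_normal(mean=xbar, cov=Sigma0 / n)

# Ground-state density (75) evaluated at a test point.
mu_test = np.array([0.1, 0.9])
d = xbar - mu_test
rho0 = ((n / (2 * np.pi)) ** (m / 2) / np.sqrt(np.linalg.det(Sigma0))
        * np.exp(-n / 2 * d @ np.linalg.solve(Sigma0, d)))
print(np.isclose(posterior.pdf(mu_test), rho0))   # True
```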

3. Discussion

This paper aimed to expand and refine the mathematical framework initially presented in [1]. We made specific adjustments to the approach, enabling us to consider real-world scenarios more thoroughly. As we continued with our work, we came to appreciate the “intrinsic” nature of the modeling, which we believe is a crucial aspect of our study. Our ultimate objective was to improve upon the foundation established in the previous paper and create an even more robust and accurate framework.
First, we extended the approach by modeling a single source of information with a univariate normal probability distribution $N(\mu, \sigma_0)$, as before, but with a constant $\sigma_0$ not necessarily equal to 1. We calculated the stationary states in the Riemannian manifold by invoking Schrödinger’s equation and discovered that the information could be broken into quantum harmonic oscillators as before, but with energy levels independent of $\sigma_0$, an unexpected yet relevant result that motivated us to continue exploring this field.
This primitive result led us to title the work “Intrinsic information-theoretic models”, which asserts that the critical features of our modeling process, such as the energy levels, remain independent of the parametrization used and invariant under coordinate changes. This notion of invariance is significant because it implies that the same model can be applied across different parameterizations, allowing for greater consistency and generalizability. Furthermore, this approach can lead to a more robust and reliable modeling process, as it reduces the impact of specific parameter choices on the final model output. As such, the notion of “intrinsic” information-theoretic models has the potential to improve modeling accuracy and reliability significantly.
As in our previous study [1], we evaluated the performance of the estimation of the parameter $\mu$. Instead of calculating the estimator’s variance, we used the expectation of the quadratic Mahalanobis distance to the sample mean and discovered that it equals the energy levels of the quantum harmonic oscillator, with the minimum quadratic Mahalanobis distance attained at the minimum energy level of the oscillator. Interestingly, we demonstrated that quantum harmonic oscillators reach the “intrinsic” Cramér–Rao lower bound on the quadratic Mahalanobis distance at the lowest energy level.
Then, we modeled m independent sources of information and computed the global density function at the ground state as an example. Essentially, we modeled the sources with a multivariate normal probability distribution $N_m(\boldsymbol{\mu}, \Sigma_0)$, with a variance–covariance matrix $\Sigma_0$ different from the m-dimensional identity matrix $I_m$ but initially diagonal, to describe the independence of the sources of information.
We then advanced the mathematical approach by modeling m dependent sources of information with a variance–covariance matrix $\Sigma_0$ not necessarily diagonal, depicting dependent sources. This resulted in Schrödinger’s equation of m coupled quantum harmonic oscillators. We could effectively decouple the oscillators through a coordinate transformation, thereby partitioning the information into independent modes. This enabled us to obtain the same energy levels, albeit now with respect to the modes, which further proves the “intrinsic” property of the mathematical framework.
Finally, as in our previous study, we showed that the global probability density function of a set of m quantum harmonic oscillators at the lowest energy level, calculated as the square modulus of the global wave function at the ground state, equals the posterior probability distribution calculated using Bayes’ theorem from the m sources of information for all data values, taking as a prior the Riemannian volume of the informative metric. This is true regardless of whether the sources are independent or dependent.
Apart from the mathematical discoveries detailed in this paper, this framework offers multiple avenues that we are currently exploring. For example, the informational representation of statistical manifolds in which $\Sigma_0$ is unknown remains an open problem. Also, this approach can be generalized by exploring other statistical manifolds and depicting how physical observables such as space and time may emerge from linear and nonlinear transformations of a set of parameters of a specific statistical manifold. In this way, the laws of physics, including time’s arrow, would appear afterward.
Moreover, several fascinating inquiries warrant further investigation. These involve delving into the relationship between energy and information already highlighted in our initial work. In addition, the very plausible connection between energy and Bayes’ theorem also deserves further exploration. By delving deeper into these topics, we may unlock even more insights into the universe’s fundamental nature and mathematical laws.
The updated framework presented in this study offers a more realistic approach by allowing the modeling of m-dependent sources. In real-world scenarios, information is often distributed over multiple sources that may not be entirely independent. By formulating the problem in terms of modes, we can obtain a solution or set of solutions for the proposed framework. This approach provides a valuable tool for solving complex problems that require a deeper understanding of the underlying dynamics.

Author Contributions

Conceptualization, D.B.-C. and J.M.O.; writing—original draft preparation, D.B.-C. and J.M.O.; writing—review and editing, D.B.-C. and J.M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study did not require ethical approval.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bernal-Casas, D.; Oller, J.M. Information-Theoretic Models for Physical Observables. Entropy 2023, 25, 1448.
  2. Fisher, R. On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1922, 222, 309–368.
  3. Riemann, B. Über die Hypothesen, Welche der Geometrie zu Grunde Liegen. (Mitgetheilt durch R. Dedekind). 1868. Available online: https://eudml.org/doc/135760 (accessed on 15 July 2023).
  4. Mahalanobis, P. On the generalized distance in Statistics. Proc. Nat. Inst. Sci. India 1936, 2, 49–55.
  5. Frieden, B. Science from Fisher Information: A Unification, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
  6. Schrödinger, E. An Undulatory Theory of the Mechanics of Atoms and Molecules. Phys. Rev. 1926, 28, 1049–1070.
  7. Schrödinger, E. Quantisierung als Eigenwertproblem. II. Ann. Phys. 1926, 79, 489–527.
  8. Laplace, P. Mémoire sur les Intégrales Définies et leur Application aux Probabilités, et Spécialement a la Recherche du Milieu Qu’il Faut Choisir Entre les Resultats des Observations. In Mémoires de la Classe des Sciences Mathématiques et Physiques de L’institut Impérial de France; Institut de France: Paris, France, 1811; pp. 297–347.
  9. Hermite, C. Sur un Nouveau Développement en Série de Fonctions; Académie des Sciences and Centre National de la Recherche Scientifique de France: Paris, France, 1864.
  10. Oller, J.M.; Corcuera, J.M. Intrinsic Analysis of Statistical Estimation. Ann. Stat. 1995, 23, 1562–1581.
  11. Muirhead, R. Aspects of Multivariate Statistical Theory; Wiley: Hoboken, NJ, USA, 1982.
  12. Chavel, I. Eigenvalues in Riemannian Geometry; Elsevier: Philadelphia, PA, USA, 1984.
  13. Bayes, T. An essay towards solving a problem in the doctrine of chances. Phil. Trans. R. Soc. Lond. 1763, 53, 370–418.
  14. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 1946, 186, 453–461.