Furthermore, if we apply this data-analysis procedure to the simple statistical model corresponding to a multivariate normal distribution with a constant covariance matrix (a model that, for large samples and via extensions of the central limit theorem, can be regarded as an approximation of many regular statistical models), then the above-mentioned partial differential equation in the parameter space reduces to a specific differential equation already studied in physics: the stationary (time-independent) Schrödinger equation of a quantum harmonic oscillator.
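As an informal preview of why the normal model plays this role (a standard fact, stated in notation chosen here only for illustration): for the $m$–variate normal family with unknown mean $\theta$ and known constant covariance matrix $\Sigma$, the Fisher information metric does not depend on the parameter,

$$g_{ij}(\theta)\;=\;\big(\Sigma^{-1}\big)_{ij},\qquad \theta\in\mathbb{R}^{m},$$

so the parameter space is flat and, since the log-likelihood is quadratic in $\theta$, the resulting equation can plausibly be brought to the harmonic-oscillator form.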
2.1. The Information Sources
Let us consider a class of entities, each of which can be seen as some type of information source that provides sequences of independent data we want to analyze. We will assume that each source can be described by a convenient probability space, and the set of all of them by a parametric statistical model. The data may be viewed as simple random samples of arbitrary size. Let us introduce some notation to set up a convenient mathematical framework for what follows.
Let $\mathcal{X}$ be a sample or input space, let $\mathcal{A}$ be a $\sigma$–algebra of subsets of $\mathcal{X}$, and let $\mu$ be a positive $\sigma$–finite measure on the measurable space $(\mathcal{X},\mathcal{A})$. In the present paper, a parametric statistical model is defined as the triple $\big((\mathcal{X},\mathcal{A},\mu),\,\Theta,\,f\big)$, where $(\mathcal{X},\mathcal{A},\mu)$ is a measure space, $\Theta$ is a manifold, also called the parameter space, and $f$ is a measurable map such that $f(\cdot\,;\theta)\ge 0$ and $f(\cdot\,;\theta)\,d\mu$ is a probability measure on $(\mathcal{X},\mathcal{A})$ for every $\theta\in\Theta$. We shall refer to $\mu$ as the reference measure of the model and to $f$ as the model function.
Observe that, under this framework, each point $\theta$ of the manifold $\Theta$ represents, potentially, different information sources, which supply sample data, obtained under independence assumptions and summarized by $x=(x_{1},\dots,x_{n})$, each $x_{k}$ belonging to a convenient sample or input space which may be partially hidden, i.e., $x_{k}$ may not necessarily be completely observed, although all of these allow a reasonable and statistically consistent estimate of the true parameter $\theta$ which characterizes the particular information source studied. Given such a sample, the joint density function with respect to the reference measure extended to the Cartesian product $\mathcal{X}^{n}$ will be $f(x;\theta)=\prod_{k=1}^{n}f(x_{k};\theta)$, which, regarded as a function of $\theta$ for fixed $x$, coincides with the likelihood function $L(\theta;x)$ on $\Theta$.
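As a concrete illustration of this framework (a standard textbook example, with notation chosen here only for the illustration), consider the univariate normal location model with known variance $\sigma^{2}$:

$$\mathcal{X}=\mathbb{R},\qquad d\mu=dx,\qquad \Theta=\mathbb{R},\qquad f(x;\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\frac{(x-\theta)^{2}}{2\sigma^{2}}\Big),$$

so that, for a sample $x=(x_{1},\dots,x_{n})$, the likelihood is $L(\theta;x)=\prod_{k=1}^{n}f(x_{k};\theta)=(2\pi\sigma^{2})^{-n/2}\exp\!\big(-\tfrac{1}{2\sigma^{2}}\sum_{k=1}^{n}(x_{k}-\theta)^{2}\big)$.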
Moreover, in the present paper, for the sake of simplicity, $\Theta$ will be an $m$–dimensional real manifold, Hausdorff and connected, with or without boundary which, in that case, will be denoted by $\partial\Theta$, although infinite-dimensional Hilbert or Banach manifolds could also be considered. Furthermore, for many purposes, it will be enough to consider the case in which $\Theta$ is a connected open set of $\mathbb{R}^{m}$, and, in this case, it is customary to use the same symbol ($\theta$) to denote points and coordinates. Considering this remark, we shall adopt this case and notation hereafter to present the results more familiarly, even though they can be written with more generality. Also, we shall assume that the model function $f$ satisfies certain regularity conditions, which we will detail when necessary. Additionally, we will incorporate into our mathematical framework the basic developments and results of what in mathematical statistics is known as information geometry, where the parameter space $\Theta$ is a Riemannian manifold, the metric tensor of which will be given through its covariant components in (8), where the expectation in (8) is obtained by integrating the product of the partial derivatives of $\ln f$ with respect to the measure $f(\cdot\,;\theta)\,d\mu$ on $\mathcal{X}$. Observe, in particular, that if $\Theta$ is a connected open set of $\mathbb{R}^{m}$, the Riemannian volume element will be given by $dV=\sqrt{\det\!\big(g_{ij}(\theta)\big)}\,d\theta^{1}\cdots d\theta^{m}$. For further details, see, for example, the pioneering work [17], and other works such as [18,19,20,21], among the works of many other authors.
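In the toy normal location model introduced above, for instance, the single covariant component of the information metric and the induced Riemannian volume are

$$g_{11}(\theta)=\int_{\mathbb{R}}\Big(\frac{\partial\ln f(x;\theta)}{\partial\theta}\Big)^{2}f(x;\theta)\,dx=\int_{\mathbb{R}}\frac{(x-\theta)^{2}}{\sigma^{4}}\,f(x;\theta)\,dx=\frac{1}{\sigma^{2}},\qquad dV=\frac{d\theta}{\sigma},$$

so this one-dimensional parameter space is flat, a fact that anticipates the multivariate normal case mentioned in the introduction.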
Although we can consider, as a measure of information corresponding to the sample $x$ relative to the true parameter $\theta$, simply the log-likelihood $\ln L(\theta;x)$, this measure is generally not invariant under injective data transformations, such as rescaling. There are many ways to correct this lack of invariance; for instance, a simple way is to choose a reference point on the manifold, say $\theta_{0}$, and define the information codified by the data $x$ from the source, relative to the true parameter $\theta$ and referred to the arbitrary point $\theta_{0}$, as in (9). The implicit dependence of (9) on $\theta_{0}$ is omitted from the notation, since its choice will not play any further role once we calculate its gradient on the parametric manifold. Additionally, (9) remains invariant under appropriate data changes and, for fixed $\theta_{0}$, it is also invariant under coordinate changes on the parametric manifold, i.e., it is a scalar field on $\Theta$.
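For instance, a definition with all of the properties just listed (the symbol $I_{x}$ and the expression itself are offered only as a plausible sketch, not as a quotation of (9)) is the log-likelihood ratio

$$I_{x}(\theta)\;=\;\ln\frac{L(\theta;x)}{L(\theta_{0};x)}\;=\;\ln L(\theta;x)-\ln L(\theta_{0};x),$$

for which the Jacobian of an injective data transformation cancels between numerator and denominator, the gradient $\nabla_{\theta}I_{x}$ is free of $\theta_{0}$, and the value at each $\theta$ is independent of the coordinates chosen on $\Theta$.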
The information provided by a source, external to the observer, is represented inside the observer, allowing them to increase their knowledge of the objects. Although we can consider different levels and types of such representation, we will focus on two important aspects: the parameter space with its natural information geometry, and the observer's ability to construct a plausibility regarding the true value of the parameter $\theta$ in the parameter space $\Theta$, once a particular sample is given; essentially, a square root $\psi$ of a kind of subjective conditional probability density with respect to the Riemannian volume induced by the information metric over the parameter space, up to a normalization constant. Specifically, we shall write at the beginning the normalization constraint (10), for reasons that will become apparent later, although we shall be particularly interested in the case in which $\psi^{2}$ itself integrates to one over $\Theta$. Observe that in (10) we are integrating with respect to the Riemannian measure induced by the metric defined in (8) and, therefore, this expression is invariant under coordinate changes. Furthermore, if we intend to define on the parametric manifold a probability, interpretable as a plausibility, about the true value of the parameter, we can take the function $\psi^{2}$ as the Radon–Nikodym derivative of said probability with respect to the Riemannian volume, which is also a measure on the same parameter space. Both measures are independent of the coordinate system used and, therefore, $\psi^{2}$ is an invariant scalar field on the parametric manifold $\Theta$. Then, we can simply define the information encoded by the subjective plausibility $\psi$ on the parameter space $\Theta$ relative to the true parameter $\theta$ as in (11). This quantity (11) also remains invariant under coordinate changes on the parametric manifold, being another scalar field on $\Theta$.
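In the same illustrative spirit (the following expressions are a plausible sketch of (10) and (11), not a quotation), one may take the normalization constraint and the plausibility information to be

$$\int_{\Theta}\psi^{2}(\theta)\,dV\;=\;a\quad(\text{with the case of interest }a=1),\qquad I_{\psi}(\theta)\;=\;\ln\psi^{2}(\theta),$$

where $dV$ is the Riemannian volume element of the information metric; both are coordinate-free, since $\psi^{2}$ is a scalar field and $dV$ an invariant measure.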
2.2. The Variational Principle
In this context, trusting that many of the abilities of the observer have been efficiently shaped by natural selection in the process of biological evolution, we can propose that the subjective information mentioned above adjusts, in some way, to the information provided by the source and, in particular, satisfies the following variational principle: the functional (12) is a minimum, or at least stationary, subject to the constraint (10) and assuming that $\psi$ and its gradient $\nabla\psi$ vanish at the boundary $\partial\Theta$ or at infinity, which is the way to model that we have strong reasons to believe that the true parameter $\theta$ lies far from the boundary, clearly inside $\Theta$. Observe that the functional (12) is equal to the expected value of the square of the norm, corresponding to the Riemannian metric given in (8), of the difference between the gradients of (9) and (11), the expected value corresponding to the probability on $\Theta$ given by the density $\psi^{2}$ with respect to the Riemannian volume induced by the information metric. Observe that (12) is invariant under coordinate changes in $\Theta$, since the square of the norm inside the integral is an invariant, and so is the measure $\psi^{2}\,dV$. Notice also that the source is considered as something objective or, at least, strictly speaking, inter-subjective, while the parameter space, with its geometric properties, is in some sense built by the observer, and is therefore subjective, although strongly conditioned by the source.
Any change in the information encoded by $x$, caused by considering a shift of the source in the parameter space, should correspond to a change in the subjective information proposed by the observer. For this reason, we propose that the squared difference of the two gradients, divided by the sample size $n$, should be, on average, as small as possible, at least locally.
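Putting these requirements together, a functional with exactly these invariance properties (written here as an illustrative sketch in the notation introduced above, not as a quotation of (12)) is

$$J[\psi]\;=\;\int_{\Theta}\frac{1}{n}\,\big\|\nabla I_{x}(\theta)-\nabla I_{\psi}(\theta)\big\|_{g}^{2}\,\psi^{2}(\theta)\,dV,$$

where $\|\cdot\|_{g}$ denotes the norm of the information metric (8); the integrand is a scalar and $\psi^{2}\,dV$ an invariant measure, so $J$ is unchanged by reparametrizations of $\Theta$.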
2.3. Solving the Variational Problem
Since (12) is an optimization problem with, at least, the constraint (10), we may introduce the augmented Lagrangian (13), where $\lambda$ is a constant Lagrange multiplier. Observe that, after rewriting the integrand, we will have (14), where the expression (14) is, in its entirety, invariant under coordinate changes in $\Theta$. Therefore, if we let $\psi+\varepsilon h$, with $h$ an arbitrary smooth function on $\Theta$ and $\varepsilon$ a real parameter, satisfy (10) and, omitting for simplicity the variable names of the functions to be integrated, take (14) into account, we have (15). Then, the derivative of (15) with respect to $\varepsilon$ will be (16), and from it the first variation of the augmented Lagrangian follows. Observe, now, that we can use a well-known differential operator identity on a Riemannian manifold relating a divergence to the Laplacian operator $\Delta$; for further details, see, for instance, the work [22]. Thus, we shall have (20).
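A well-known identity of this kind is the product rule for the divergence; applying it with $u=\psi^{2}$ and $v=I_{x}$ (our illustrative choice, in the notation sketched above) gives

$$\operatorname{div}\!\big(u\,\nabla v\big)\;=\;u\,\Delta v+\big\langle\nabla u,\nabla v\big\rangle,
\qquad\text{so}\qquad
\operatorname{div}\!\big(\psi^{2}\nabla I_{x}\big)\;=\;\psi^{2}\,\Delta I_{x}+2\,\psi\,\big\langle\nabla\psi,\nabla I_{x}\big\rangle.$$

The divergence on the left is precisely the kind of term that the Gauss theorem, applied next, converts into a boundary integral.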
However, by the Gauss divergence theorem, the integral over $\Theta$ of such a divergence equals an integral over the boundary, where $\nu$ is a unitary vector field on $\partial\Theta$ pointing out of $\Theta$ and $dS$ is the surface element on $\partial\Theta$ induced by the information Riemannian metric on $\Theta$; taking into account that, by the boundary conditions, $\psi$ vanishes at $\partial\Theta$ or at infinity, we obtain (21). Thus, substituting this result into (20), we obtain (22).
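In the sketched notation, the vanishing of the boundary contribution reads

$$\int_{\Theta}\operatorname{div}\!\big(h\,\psi^{2}\,\nabla I_{x}\big)\,dV\;=\;\int_{\partial\Theta}h\,\psi^{2}\,\big\langle\nabla I_{x},\nu\big\rangle\,dS\;=\;0,$$

since $\psi$, and hence $\psi^{2}$, vanishes on $\partial\Theta$ or decays at infinity; the exact integrand inside the divergence is our illustrative choice, but any term of this form is eliminated in the same way.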
Additionally, since the gradient of (11) satisfies an analogous identity, we shall have (23); but, by the Gauss divergence theorem and the bi-linearity of the scalar product, and since, by the boundary conditions, $\psi$ vanishes at $\partial\Theta$ or at infinity, we have (24). Then, substituting this equation into (23), we obtain (25).
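If (11) has the logarithmic form sketched earlier (our assumption), the identity that makes this step work is simply

$$\psi^{2}\,\nabla I_{\psi}\;=\;\psi^{2}\,\nabla\ln\psi^{2}\;=\;2\,\psi\,\nabla\psi\;=\;\nabla\big(\psi^{2}\big),$$

so the corresponding term can be integrated by parts exactly as before, with the boundary contribution vanishing because $\psi$ vanishes at $\partial\Theta$ or at infinity.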
Combining (19) with (22) and (25), we obtain (26) for an arbitrary smooth function $h$. Therefore, we arrive at the fundamental equation (27) or, in terms of (9), the fundamental equation becomes (28).
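To convey the shape such an equation takes, carrying out the computation with the forms sketched above for (9)–(12) (all of them our assumptions, including the symbols $I_{x}$, $\psi$ and the factor $1/n$) yields, up to the sign convention with which $\lambda$ enters (13), a linear elliptic equation of Schrödinger type,

$$-\,\frac{4}{n}\,\Delta\psi\;+\;\frac{1}{n}\Big(\big\|\nabla I_{x}\big\|_{g}^{2}\;+\;2\,\Delta I_{x}\Big)\,\psi\;=\;\lambda\,\psi,$$

where $\Delta$ is the Laplace–Beltrami operator of the information metric. For the multivariate normal model with constant covariance, $\|\nabla I_{x}\|_{g}^{2}$ is quadratic in $\theta$ and $\Delta I_{x}$ is constant, which is how the harmonic-oscillator form announced in the introduction can appear.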
The fundamental Equation (27) supplies the stationary points of the variational problem (12) subject to the constraint (10), assuming that $\psi$ and its gradient $\nabla\psi$ vanish at the boundary $\partial\Theta$ or at infinity, as we have previously highlighted. Observe that, for any obtained solution $\psi$ of (27) or (28), any multiple of this function is also a solution of these partial differential equations, so we may select, among them, the one that satisfies the restriction (10). Furthermore, if we choose the normalization for which $\psi^{2}$ integrates to one, we get a direct probabilistic interpretation, i.e., $\psi^{2}$ is a probability density with respect to the Riemannian volume.
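Concretely, under the sketched constraint, if a solution $\tilde\psi$ of the linear equation satisfies $\int_{\Theta}\tilde\psi^{2}\,dV=c^{2}$ with $0<c<\infty$, then

$$\psi\;=\;\frac{\tilde\psi}{c}\qquad\text{also solves the equation and satisfies}\qquad\int_{\Theta}\psi^{2}\,dV\;=\;1,$$

so that $\psi^{2}$ can be read directly as the plausibility density described in Section 2.1.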
2.4. First Remarks on the Solutions
To interpret the Lagrange multiplier, let $\psi_{a}$ be a solution of (27) which supplies stationary points of the augmented Lagrangian (13) subject to the constraint (10); this solution may depend on the value $a$ of the constraint, and the subscript emphasizes this dependence. Additionally, defining the value of (13) at this solution as a function of $a$, by the chain rule and taking into account (10) and (13), it is straightforward to obtain its derivative with respect to $a$. Taking into account that this derivative does not change if we multiply $\psi_{a}$ by any constant, we can write it in terms of the function $\psi_{a}$ normalized, i.e., rescaled so that (10) holds with $a=1$. Then, we have that the constant so obtained is greater than or equal to zero and, as a consequence, we obtain the sign of the Lagrange multiplier $\lambda$.
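As an illustration of the kind of argument involved, and only under the quadratic forms assumed in the earlier sketches, one can write the optimal value of (12) as a function of the constraint level:

$$F(a)\;:=\;J[\psi_{a}],\qquad \psi_{a}\;=\;\sqrt{a}\,\psi_{1}\;\;\Longrightarrow\;\;F(a)\;=\;a\,F(1),\qquad \frac{dF}{da}\;=\;F(1)\;\ge\;0,$$

because $J$ is homogeneous of degree two in $\psi$ while $\nabla\ln\psi^{2}$ is unchanged by rescaling; the Lagrange multiplier then equals $\pm F(1)$, the value of the objective at the normalized solution, with the sign fixed by the convention adopted in (13), which is consistent with the non-negativity just stated.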
Moreover, let us suppose that the system (27) or (28) with the constraint (10) admits a solution for each admissible value of the constraint, where, without loss of generality, a convenient normalization has been chosen. Then, observe that the second derivative of the augmented Lagrangian (13) is given by (32). Therefore, by (32), the augmented Lagrangian is convex, and the fundamental Equation (27) provides an absolute minimum of the variational problem (12) subject to the constraint (10).
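A quick consistency check of this convexity claim, again only under the assumed quadratic forms and with $\psi_{1}$ normalized as above, is to evaluate the augmented Lagrangian along the rescaling direction $c\mapsto c\,\psi_{1}$:

$$\mathcal{L}(c)\;=\;J[c\,\psi_{1}]\;-\;\lambda\Big(c^{2}\!\int_{\Theta}\psi_{1}^{2}\,dV-a\Big)\;=\;c^{2}\big(J[\psi_{1}]-\lambda\big)+\lambda a,$$

a quadratic function of the scale $c$, so convexity along this direction reduces to the sign of the constant coefficient $J[\psi_{1}]-\lambda$; a second-derivative computation such as the one invoked in (32) settles the question in general.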