1. Introduction: Missing Uncertainties
Assume that the available data consist of a sequence of independent observations, $x_1, \ldots, x_n$, each having the same mean. We consider the situation where the unknown variances of the observables cannot be supposed to be equal or restricted in any way. This scenario may seem unusual to the statistical community; however, according to Morris (1983, p. 49) [1], "… almost all applications involve unequal variances". The problem is to estimate the common mean, modeled as a location parameter, without traditional conditions on the standard deviations (scale parameters).
This setting appears in instances of heterogeneous research synthesis where $x_j$ represents the summary estimate of the common mean (e.g., the treatment effect) obtained by the $j$-th study. Commonly, the protocol of such studies demands that $x_j$ be accompanied by its uncertainty estimate, but sometimes these estimates are either unavailable or cannot be trusted. In many applications, the variances of systematic, laboratory-specific errors cannot be reliably estimated, and a scientist cannot place confidence in inferences made under unrealistically low noise. The existing imputation techniques (e.g., Rukhin [2], Templ [3]) may not provide justifiable uncertainty values.
The latest point of view (Spiegelhalter [4]) is that uncertainty is a subjective relationship between an observer and what is observed. The issue of underreported uncertainties, particularly those that stem from asymptotic normal theory, which presupposes large datasets, is prevalent in metrology. The challenge of reproducibility within individual centers may be exacerbated by the nature of the measuring instruments employed, leading to heterogeneous unknown uncertainties (see Possolo [5]). The most striking example is provided by "one-shot" devices in the atomic industry, which are limited to a single use.
An additional contemporary illustration is found in citizen science or crowd-sourcing projects, where participants contribute measurement results of various random phenomena, some of them using relatively imprecise instruments, such as smartphones. These measurements can range from precipitation levels to air quality and biological observations; see Hand [6] for an introduction. Despite the anticipated heterogeneity, the project organizers are faced with the task of synthesizing data in the absence of reliable uncertainty ($\sigma$) estimates.
Our investigation focuses on Bayes estimators obtained from the posterior distribution for an unknown mean, set against a non-informative, objective, or "uniform" prior distribution for both the mean and the independent variances. This line of inquiry, initiated by Rukhin [7] under the assumption of normality, grapples with the complete lack of variance information. Needless to say, this framework introduces several statistical complications. For instance, the classical maximum likelihood estimator becomes undefined, as the likelihood function reaches infinity at each data point. Nevertheless, the problem is well defined, as estimating the common mean requires determining at most $n$ parameters: the mean itself and the variance ratios, $\sigma_j^2 / \sum_k \sigma_k^2$, $j = 1, \ldots, n$, which belong to the unit simplex of dimension $n - 1$.
In Section 2.1, we investigate the Bayes estimators in a setting allowing for a group of homogeneous observations, which have the same unknown variance. Under the normality condition, these procedures turn out to have a surprisingly explicit form. In fact, each of the derived rules is a weighted average with data-dependent weights that are invariant under location–scale transformations, admitting a very clear interpretation. Approximate formulas for the variance of the considered estimators and their limiting behavior are also examined.
Section 3 contains several approaches to the distribution of the Bayes estimator. The orthogonal polynomials are discussed in Section 3.3, with recursive formulas derived in Section 3.4.
2. Non-Informative Priors and Bayes Estimators
Consider the situation where $n$ distinct independent observables $x_1, \ldots, x_n$ are drawn from a location–scale parameter family with an underlying symmetric density $p$, which has all necessary moments.
The principal interest is in the mean $\mu$, while the scale parameters $\sigma_1, \ldots, \sigma_n$ are positive nuisance parameters. For this purpose, one needs to estimate the $(n-1)$-dimensional vector of weights $w = (w_1, \ldots, w_n)$, with $w_j \geq 0$ and $\sum_j w_j = 1$. If $\hat{w} = \hat{w}(x)$ is such an estimator, then one can use $\hat{\mu} = \sum_j \hat{w}_j x_j$ as a $\mu$-statistic. Indeed, if all scale parameters are known, the best unbiased estimator of $\mu$ is the weighted means rule, $\sum_j w_j x_j$ with $w_j \propto \sigma_j^{-2}$.
Commonly, the estimated weights are taken to be location-invariant; i.e., for any real $c$,
$$\hat{w}_j(x_1 + c, \ldots, x_n + c) = \hat{w}_j(x_1, \ldots, x_n), \qquad j = 1, \ldots, n.$$
Then, the corresponding estimator $\hat{\mu}$ is (location) equivariant,
$$\hat{\mu}(x_1 + c, \ldots, x_n + c) = \hat{\mu}(x_1, \ldots, x_n) + c.$$
Most estimators used in practice are also scale-equivariant,
$$\hat{\mu}(c x_1, \ldots, c x_n) = c\,\hat{\mu}(x_1, \ldots, x_n), \qquad c > 0,$$
and this property calls for scale-invariant weights.
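As a quick numerical illustration of these invariance requirements (a minimal sketch; the particular weights below are purely illustrative and are not the estimators studied in this paper):

```python
import numpy as np

def weights(x):
    # Illustrative location- and scale-invariant weights: w_j is large when
    # x_j lies close to the sample mean; a common shift or positive rescaling
    # of the data leaves the normalized weights unchanged.
    d = np.abs(x - x.mean())
    w = 1.0 / d
    return w / w.sum()

def mu_hat(x):
    # Weighted means rule with data-dependent weights.
    return np.sum(weights(x) * x)

rng = np.random.default_rng(0)
x = rng.normal(size=7)
c, s = 3.7, 2.5
print(np.isclose(mu_hat(x + c), mu_hat(x) + c))  # location equivariance: True
print(np.isclose(mu_hat(s * x), s * mu_hat(x)))  # scale equivariance: True
```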
In the normal case, the reduction to the invariant procedures leads to an explicit form of the maximum likelihood estimators and of some Bayes procedures.
To eliminate the nuisance parameters $\sigma_1, \ldots, \sigma_n$ (or $\sigma_1^2, \ldots, \sigma_n^2$), one can use a non-informative prior, which is a classical technique. Under mild regularity conditions on the underlying density $p$, Rukhin [8] derived the Bayes estimator under the quadratic loss (the posterior mean) against the uniform (reference) prior for the mean and the scale parameters. This statistic coincides with the Bayes rule within the class of invariant procedures.
The discrete posterior distribution is supported by all data points, with probabilities
$$\lambda_j = \frac{\left[\prod_{k \neq j} |x_j - x_k|\right]^{-1}}{\sum_{i=1}^{n} \left[\prod_{k \neq i} |x_i - x_k|\right]^{-1}}, \qquad j = 1, \ldots, n. \quad (1)$$
Here and further, $\nu = n \bmod 2$ denotes the parity of the number of observations. Thus, the Bayes estimator has a very explicit form:
$$\tilde{x} = \sum_{j=1}^{n} \lambda_j x_j. \quad (2)$$
The magnitude of the probabilities (1) describes the intrinsic similarity of the observations: the weight of a data point $x_j$ is large if it is close to the bulk of the data, meaning that $\prod_{k \neq j} |x_j - x_k|$ is relatively small.
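A minimal numerical sketch of the weights (1) and of the estimator (2), in the form reconstructed above, makes this similarity interpretation concrete: an observation far from the bulk of the data receives a negligible weight.

```python
import numpy as np

def lambda_weights(x):
    # Probabilities (1): lambda_j is proportional to the reciprocal of the
    # product of distances from x_j to all the other data points.
    x = np.asarray(x, dtype=float)
    inv = np.array([1.0 / np.prod(np.abs(x[j] - np.delete(x, j)))
                    for j in range(len(x))])
    return inv / inv.sum()

def x_tilde(x):
    # The Bayes estimator (2): a weighted average of the observations.
    return float(np.sum(lambda_weights(x) * np.asarray(x, dtype=float)))

x = [9.8, 10.1, 10.0, 12.3, 10.2]
print(np.round(lambda_weights(x), 4))  # the point 12.3 gets almost no weight
print(round(x_tilde(x), 3))            # close to the bulk of the data
```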
The statistic $\tilde{x}$ also appears in approximation theory. According to the Tchebycheff interpolation formula, one has
$$Q(x) = \frac{\sum_j \frac{w_j}{x - x_j}\, Q(x_j)}{\sum_j \frac{w_j}{x - x_j}}, \qquad w_j = \prod_{k \neq j} (x_j - x_k)^{-1},$$
where $Q$ runs through polynomials of degree not exceeding $n - 1$. See Chapter 5 in Trefethen (2013) [9].
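The following sketch checks the barycentric form of the interpolation formula assumed above: with $n$ distinct nodes, it reproduces any polynomial of degree up to $n - 1$.

```python
import numpy as np

def bary_eval(xs, ys, t):
    # Barycentric interpolation with weights w_j = 1 / prod_{k != j}(x_j - x_k);
    # exact for polynomials of degree <= n - 1 (Trefethen, Ch. 5).
    w = np.array([1.0 / np.prod(xs[j] - np.delete(xs, j))
                  for j in range(len(xs))])
    return np.sum(w * ys / (t - xs)) / np.sum(w / (t - xs))

xs = np.array([0.0, 0.7, 1.3, 2.0, 3.1])         # n = 5 distinct nodes
Q = lambda u: 2 * u**4 - u**3 + 5 * u - 1        # degree n - 1 = 4
print(np.isclose(bary_eval(xs, Q(xs), 1.77), Q(1.77)))  # True
```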
The probabilities (1) have their origin in optimization problems involving the discriminant function. Borodin [10] discusses their use in statistical physics. Genest et al. [11] study the remarkable mirror symmetry (persymmetry) of the underlying Jacobi matrix.
2.1. Heterogeneity and Homogeneity
Here, normality of the observations, $x_j \sim N(\mu, \sigma_j^2)$, $j = 1, \ldots, n$, is assumed. We consider the setting where, in addition to the $x$ values, there is a possible group of distinct homogeneous data that have the same unknown standard deviation $\sigma_0$, say, $y_1, \ldots, y_m$. In the context of the citizen science projects mentioned in Section 1, the $y$ values may represent data supplied by smartphone users, while the $x$ values correspond to measurements derived by other means. In metrology applications, a known homogeneous group of laboratories employing the same techniques may participate in interlaboratory studies.
Then, one has $n + m$ independent observations and altogether $n + 2$ unknown parameters, $\mu, \sigma_0, \sigma_1, \ldots, \sigma_n$, with the main interest in $\mu$.
We start with the traditional reference prior density of the form
$$\pi(\mu, \sigma_0, \sigma_1, \ldots, \sigma_n) = \sigma_0^{-1} \prod_{j=1}^{n} \sigma_j^{-1} \quad (3)$$
relative to $d\mu \, d\sigma_0 \, d\sigma_1 \cdots d\sigma_n$. For any continuous bounded function $f$, the posterior expectation of $f(\mu)$ is obtained by integrating out the scale parameters. For any fixed small $\epsilon > 0$, provided that all data points are different, the integral over the $\epsilon$-neighborhoods of the data points dominates the integral over the rest of the real line, so that in the limit the posterior mass concentrates at the points $x_1, \ldots, x_n$.
Thus, we can formulate the first result.
Theorem 1. Under the prior (3), provided that all the data points are different, the posterior distribution of μ is discrete with finite support $\{x_1, \ldots, x_n\}$ and the probabilities
$$\lambda_j \propto \frac{a_j}{\prod_{k \neq j} |x_j - x_k|}, \qquad j = 1, \ldots, n. \quad (4)$$
Here, $a_j$ is an attenuating factor depending on $\bar{y}$ and $s^2$, the estimators of the common mean and the variance based on the homogeneous sub-sample. The Bayes estimator of μ, i.e., the posterior mean, is
$$\hat{\mu} = \sum_{j=1}^{n} \lambda_j x_j \quad (5)$$
with $\lambda_j$ defined by (4).
The probabilities (4) still describe the intrinsic similarity of the observations: the weight of a data point $x_j$ is large if it is close to the greater part of the data. The attenuating factor $a_j$ is small when $x_j$ is far from $\bar{y}$, which encourages an estimate close to $\bar{y}$. When the homogeneous data are absent, this factor is 1, and (5) coincides with (2).
In this situation, the posterior mode, the support point carrying the largest of the probabilities (4), presents the maximum likelihood estimator within the class of invariant procedures.
The prior density (3) for the mean and the variances represents the right Haar measure on the group of linear transformations. In the context of the multivariate normal model, it is known as the Geisser–Cornfield prior; see Geisser [12], Ch. 9.1. This prior is known to be an exact frequentist matching prior, yielding Fisher's fiducial distribution as the posterior (Fernandez and Steel [13], Severini et al. [14]).
Despite this fact, "the prior seems to be quite bad for correlations, predictions and other inferences involving a multivariate normal distribution" (Sun and Berger [15]). The mentioned drawbacks stem from the fact that the marginal (or prior predictive) density does not exist. A related weakness of (5) is its sensitivity to observations which are close to one another.
To mitigate these drawbacks, we now look for other prior distributions.
2.2. Conjugate Priors and Variance Formulas
A wide class of Bayes estimators of $\mu$ arises from conjugate prior densities,
$$\pi(\mu, \sigma_0, \sigma_1, \ldots, \sigma_n) \propto \sigma_0^{-a-1} \exp\!\left(-\frac{b}{2\sigma_0^2}\right) \prod_{j=1}^{n} \sigma_j^{-a-1} \exp\!\left(-\frac{b}{2\sigma_j^2}\right) \quad (6)$$
relative to $d\mu \, d\sigma_0 \, d\sigma_1 \cdots d\sigma_n$. Here, $a$ and $b$ are hyperparameters to be specified in (6), $a, b > 0$.
A slightly modified proof of Theorem 1 shows that the posterior distribution of $\mu$ under the prior (6) is proportional to
$$\prod_{j=1}^{n} \left[b + (x_j - \mu)^2\right]^{-(a+1)/2} \cdot \left[\kappa + m(\bar{y} - \mu)^2\right]^{-(m+a)/2},$$
where $\kappa = b + (m - 1)s^2$, which is treated as a constant in the following discussion. The posterior distribution in this situation is the product of $n$ $t$-densities (with $a$ degrees of freedom) and a $t$-density (with $m + a - 1$ degrees of freedom). Thus, it is a particular case of the poly-$t$ distribution, which is ubiquitous in multivariate analysis. It appears in the posterior analysis of linear models (Box and Tiao [16]) and is popular in econometrics (Bauwens [17]).
The Bayes estimator has the form
$$\hat{\mu} = \frac{\int \mu \prod_j \left[b + (x_j - \mu)^2\right]^{-(a+1)/2} \left[\kappa + m(\bar{y} - \mu)^2\right]^{-(m+a)/2} d\mu}{\int \prod_j \left[b + (x_j - \mu)^2\right]^{-(a+1)/2} \left[\kappa + m(\bar{y} - \mu)^2\right]^{-(m+a)/2} d\mu}. \quad (7)$$
If $m = 0$, i.e., there are no homogeneous observations, (7) is the classical Pitman estimator of the location parameter involving $t$-distributions with $a$ degrees of freedom. It is especially well studied for the Cauchy location/scale parameter family ($a = 1$).
If, in addition, $b \to 0$, one obtains
$$\frac{\int \mu \prod_j |x_j - \mu|^{-a-1}\, d\mu}{\int \prod_j |x_j - \mu|^{-a-1}\, d\mu},$$
which corresponds to the formal Pitman estimator of the location parameter derived from the working family $f(u) \propto |u|^{-a-1}$ employed to estimate the location parameter when the observations are normal.
Needless to say, the functions in this family are not probability densities. Moreover, they have a singularity of the third kind.
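Because the posterior reconstructed above involves only one-dimensional integrals, the estimator (7) can be evaluated by ordinary quadrature. The following sketch is a minimal implementation of that reconstruction; the example data and the integration limits are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

def bayes_poly_t(x, y, a, b):
    # Posterior mean (7) under the conjugate prior (6), as reconstructed above:
    # each heterogeneous x_j contributes a t-type factor with a degrees of
    # freedom; the homogeneous group y contributes one factor with
    # m + a - 1 degrees of freedom.
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, ybar = len(y), y.mean()
    kappa = b + (m - 1) * y.var(ddof=1)   # treated as a constant in the text

    def kernel(mu):
        fx = np.prod((b + (x - mu) ** 2) ** (-(a + 1) / 2))
        fy = (kappa + m * (ybar - mu) ** 2) ** (-(m + a) / 2)
        return fx * fy

    lo = min(x.min(), y.min()) - 10.0
    hi = max(x.max(), y.max()) + 10.0
    num = quad(lambda mu: mu * kernel(mu), lo, hi)[0]
    return num / quad(kernel, lo, hi)[0]

x = [9.8, 10.1, 10.0, 12.3, 10.2]   # heterogeneous observations
y = [10.4, 9.9, 10.3, 10.0]         # homogeneous sub-sample
print(round(bayes_poly_t(x, y, a=2.0, b=0.5), 3))
```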
When $a$ and $b$ are fixed positive numbers, the approximate variance of (7) can be found via the usual argument employed for $M$-estimators, i.e., solutions of a moment-type equation $\sum_j \psi_j(x_j - \mu) = 0$ (or minimizers of $\sum_j \rho_j(x_j - \mu)$). In our case, the contrast functions are
$$\rho(u) = \frac{a+1}{2}\,\log\!\left(b + u^2\right), \qquad \rho_0(u) = \frac{m+a}{2}\,\log\!\left(\kappa + m u^2\right).$$
The $M$-estimator $\hat{\mu}$ satisfies the equation
$$\sum_{j=1}^{n} \frac{(a+1)(x_j - \hat{\mu})}{b + (x_j - \hat{\mu})^2} + \frac{(m+a)\,m\,(\bar{y} - \hat{\mu})}{\kappa + m(\bar{y} - \hat{\mu})^2} = 0.$$
According to well-known results (e.g., Huber and Ronchetti [18]),
$$\mathrm{Var}(\hat{\mu}) \approx \frac{\sum_j \mathbb{E}\,\psi_j^2(x_j - \mu)}{\left[\sum_j \mathbb{E}\,\psi_j'(x_j - \mu)\right]^2}, \quad (8)$$
where $\psi = \rho'$, $\psi_0 = \rho_0'$, and the sums include the term corresponding to $\bar{y} - \mu$.
Here, $\mathbb{E}$ refers to the expectation evaluated under the normal distribution with zero mean and variance $\sigma_j^2$; the distribution of $\bar{y} - \mu$ is also normal, with variance $\sigma_0^2/m$. The main restriction on the variances is that the Central Limit Theorem for $\sum_j \psi_j(x_j - \mu)$ holds. For instance, one can employ (8) if the Liapounov condition for independent non-identically distributed summands is satisfied, i.e.,
$$\frac{\sum_j \mathbb{E}\,|\psi_j(x_j - \mu)|^3}{\left[\sum_j \mathbb{E}\,\psi_j^2(x_j - \mu)\right]^{3/2}} \to 0$$
(e.g., Lehmann [19], Theorem 2.7.3).
To simplify (8), we need the known formula for the standard normal $Z$ and positive $t$,
$$\mathbb{E}\,\frac{1}{Z^2 + t^2} = \frac{R(t)}{t},$$
where $R(t) = [1 - \Phi(t)]/\varphi(t)$ is the familiar Mills ratio (Stuart and Ord, 1994) [20]. With $t_j = \sqrt{b}/\sigma_j$, the differentiation of this identity shows that, for $t > 0$,
$$\mathbb{E}\,\frac{1}{(Z^2 + t^2)^2} = \frac{(1 - t^2)R(t) + t}{2t^3}$$
and
$$\mathbb{E}\,\frac{Z^2}{(Z^2 + t^2)^2} = \frac{(1 + t^2)R(t) - t}{2t},$$
where for the term corresponding to $\bar{y}$ one has to replace $\sigma_j$ by $\sigma_0/\sqrt{m}$.
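A quick Monte Carlo check of the first of these identities (the differentiated versions can be verified in the same way):

```python
import numpy as np
from scipy.stats import norm

def mills(t):
    # Mills ratio R(t) = (1 - Phi(t)) / phi(t) of the standard normal.
    return norm.sf(t) / norm.pdf(t)

rng = np.random.default_rng(1)
Z = rng.standard_normal(2_000_000)
for t in (0.5, 1.0, 2.0):
    mc = np.mean(1.0 / (Z**2 + t**2))    # Monte Carlo for E[1/(Z^2 + t^2)]
    print(t, round(mc, 4), round(mills(t) / t, 4))  # the two columns agree
```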
These identities allow for expressing the approximate variance (8) in terms of the Mills ratio. In the absence of the homogeneous group,
$$\mathrm{Var}(\hat{\mu}) \approx \frac{\sum_j \sigma_j^{-2}\,\dfrac{(1 + t_j^2)R(t_j) - t_j}{2t_j}}{\left[\sum_j \sigma_j^{-2}\left(1 - t_j R(t_j)\right)\right]^2}, \qquad t_j = \frac{\sqrt{b}}{\sigma_j}, \quad (9)$$
and an analogous expression holds when the homogeneous group is present. Since $R(t) < 1/t$ for $t > 0$, all terms in the denominator of (9) are positive. When all the variances are equal, (9) can be compared with the variance of
$$\bar{x} = \frac{1}{n}\sum_j x_j,$$
the best unbiased estimator of $\mu$ when all the variances are equal. If the hyperparameters in (6) are chosen so that $b$ is an adequate approximation of the common variance, then the variance of $\hat{\mu}$ is only moderately larger than that of $\bar{x}$; smaller values of $a$ lead to the larger variance of $\hat{\mu}$.
If $b = b_n \to 0$ for some sequence $b_n$, the corresponding estimator is asymptotically normal, albeit at a slower rate than $n^{-1/2}$. Therefore, there is no surprise that in the case of $b = 0$ (for which the posterior again concentrates at the data points), one has $n\,\mathrm{Var}(\hat{\mu}) \to \infty$ as $n \to \infty$. Indeed, it seems that $\hat{\mu}$ bears more resemblance to the nonparametric estimates of the location parameter, for which the convergence rate is slower than $n^{-1/2}$. Numerical experiments suggest that, in the normal case, $\mathrm{Var}(\hat{\mu})$ indeed decreases more slowly than $n^{-1}$.
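A sketch of such an experiment, assuming the weights (1) in the form reconstructed above (computed on the log scale for numerical stability): if the parametric rate $n^{-1/2}$ held, the printed products $n\,\widehat{\mathrm{Var}}$ would stabilize, while a slower rate makes them grow with $n$.

```python
import numpy as np

rng = np.random.default_rng(2)

def x_tilde(x):
    # Estimator (2) with weights (1), computed via logarithms to avoid
    # underflow in the products of pairwise distances.
    logw = np.array([-np.sum(np.log(np.abs(x[j] - np.delete(x, j))))
                     for j in range(len(x))])
    w = np.exp(logw - logw.max())
    return np.sum(w * x) / w.sum()

for n in (10, 20, 40, 80):
    est = np.array([x_tilde(rng.standard_normal(n)) for _ in range(4000)])
    print(n, round(n * est.var(), 3))   # grows if the rate is slower than n^{-1/2}
```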
We now summarize the main results of this section.
Theorem 2. Under the prior distribution (6), the posterior distribution is the product of $t$-distributions with $a$ degrees of freedom and a $t$-distribution with $m + a - 1$ degrees of freedom. The approximate variance of the Bayes estimator (7) satisfies (8), with Expression (9) via the Mills ratio.
For the remainder of this paper, we will concentrate on the estimator (5).