Article

Estimating Common Mean in Heteroscedastic Variances Model

Department of Mathematics and Statistics, University of Maryland at Baltimore County, Baltimore, MD 21250, USA
Mathematics 2025, 13(8), 1290; https://doi.org/10.3390/math13081290
Submission received: 3 March 2025 / Revised: 26 March 2025 / Accepted: 7 April 2025 / Published: 15 April 2025
(This article belongs to the Section D1: Probability and Statistics)

Abstract

Bayes estimators of the unknown common mean are derived against a reference, non-informative prior distribution for both the mean and the independent variances. I entertain the scenario with two groups of observables sharing the same unknown mean. The unknown variances of the first group are not supposed to be equal or restricted in any way; the second, homogeneous group of observations all have the same unknown variance. Under the normality condition, these procedures turn out to have a very explicit form of a weighted average with data-dependent weights that admit a very clear interpretation. Approximate formulas for the variance of the considered estimators and their limiting behavior are also examined. The related “self-dual” orthogonal polynomials and their properties are examined, and recursive formulas for the estimators on the basis of these polynomials are developed.

1. Introduction: Missing Uncertainties

Assume that the available data consist of a sequence of independent observations, $x_j$, $j = 1, \dots, n$, each having the same mean. We consider the situation where the unknown variances of the observables cannot be supposed to be equal or to be restricted in any way. This scenario is perhaps unusual in the statistical community. However, according to Morris (1983, p. 49) [1], “… almost all applications involve unequal variances”. The problem is to estimate the common mean modeled as a location parameter without traditional conditions on the standard deviations (scale parameters).
This setting appears in instances of heterogeneous research synthesis where $x_j$ represents the summary estimate of the common mean (e.g., the treatment effect) obtained by the j-th study. Commonly, the protocol of such studies demands that $x_j$ be accompanied by its uncertainty estimate, but sometimes these estimates are either unavailable or cannot be trusted. In many applications, the variances of systematic, laboratory-specific errors cannot be reliably estimated, and a scientist cannot place confidence in inferences made under unrealistically low noise. The existing imputation techniques (e.g., Rukhin [2], Templ [3]) may not provide justifiable uncertainty values.
The latest point of view (Spiegelhalter [4]) is that uncertainty is a subjective relationship between an observer and what is observed. The issue of underreported uncertainties, particularly those that stem from asymptotic normal theory, which presupposes large datasets, is prevalent in metrology. The challenge of reproducibility within individual centers may be exacerbated by the nature of the measuring instruments employed, leading to heterogeneous unknown uncertainties (see Possolo [5]). The most striking example is provided by “one-shot” devices in the atomic industry, which are limited to single use.
An additional contemporary illustration is found in citizen science or crowd-sourcing projects, where participants contribute measurement results of various random phenomena, with some of them using relatively imprecise instruments, such as smartphones. These measurements can range from precipitation levels to air quality and biological observations. See Hand [6] for an introduction. Despite anticipated heterogeneity, the project organizers are faced with the task of synthesizing data in the absence of reliable uncertainty ($\sigma_i^2$) estimates.
Our investigation focuses on Bayes estimators obtained from the posterior distribution for an unknown mean, set against a non-informative, objective, or “uniform” prior distribution for both the mean and independent variances. This line of inquiry, initiated by Rukhin [7] under the assumption of normality, grapples with the complete lack of variance information. Needless to say, this framework introduces several statistical complications. For instance, the classical maximum likelihood estimator becomes undefined, as the likelihood function reaches infinity at each data point. Nevertheless, the problem is well defined, as estimating the common mean requires determining at most n parameters: the mean itself and the variance ratios, $\omega_i = \sigma_i^2/\sum_j\sigma_j^2$, which belong to the unit simplex of dimension $n-1$, $\sum_i\omega_i = 1$.
In Section 2.1, we investigate the Bayes estimators in the setting allowing for a group of homogeneous observations, which have the same unknown variance. Under the normality condition, these procedures turn out to have a surprisingly explicit form. In fact, each of the derived rules is a weighted average with data-dependent weights that are invariant under the location–scale transformations, admitting a very clear interpretation. The approximate formulas for the variance of the considered estimators and their limiting behavior are also examined. Section 3 contains several approaches to the distribution of the Bayes estimator. The orthogonal polynomials are discussed in Section 3.3, with recursive formulas derived in Section 3.4.

2. Non-Informative Priors and Bayes Estimators

Consider the situation where distinct independent observables $x_j$ are drawn from a location–scale parameter family with underlying symmetric density p, $x_j \sim \sigma_j^{-1}p((\cdot-\mu)/\sigma_j)$, $j = 1,\dots,n$, which has all necessary moments.
The principal interest is in the mean $\mu$, while $\sigma_j$, $j = 1,\dots,n$, are positive nuisance parameters. For this purpose, one needs to estimate the $(n-1)$-dimensional vector $(\omega_1,\dots,\omega_n)$, with $\omega_j = \sigma_j^{-2}/\sum_k\sigma_k^{-2}$ and $\sum_j\omega_j = 1$. If $(w_1,\dots,w_n)$ is such an estimator, then one can use $\sum_jw_jx_j$ as a $\mu$-statistic. Indeed, if all scale parameters $\sigma_j$ are known, the best unbiased estimator of $\mu$ is the weighted-means rule, $\sum_j\omega_jx_j$.
Commonly, the estimated weights are taken to be location-invariant, i.e., for any real c,
$$w_j(x_1+c,\dots,x_n+c) = w_j(x_1,\dots,x_n).$$
Then, the corresponding estimator $\tilde{\mu}$ is (location) equivariant,
$$\tilde{\mu}(x_1+c,\dots,x_n+c) = \tilde{\mu}(x_1,\dots,x_n)+c.$$
Most estimators used in practice are also scale-equivariant,
$$\tilde{\mu}(ax_1,\dots,ax_n) = a\,\tilde{\mu}(x_1,\dots,x_n),\quad a>0,$$
and this property calls for scale-invariant weights.
In the normal case, the reduction to the invariant procedures leads to an explicit form of the maximum likelihood estimators and of some Bayes procedures.
To eliminate the nuisance parameters $\sigma_j$ (or $\omega_j$), $j = 1,\dots,n$, one can use a non-informative prior, which is a classical technique. Under mild regularity conditions on the underlying density p, Rukhin [8] derived the Bayes estimator under the quadratic loss (the posterior mean) against the uniform (reference) prior $d\mu\,\prod_jd\sigma_j/\sigma_j$. This statistic coincides with the Bayes rule within the class of invariant procedures.
The discrete posterior distribution is supported by all data points with probabilities
$$w_i^0 = \frac{\prod_{j\neq i}|x_i-x_j|^{-1}}{\sum_k\prod_{j\neq k}|x_k-x_j|^{-1}} = \frac{s_i\prod_{j\neq i}(x_i-x_j)^{-1}}{\sum_ks_k\prod_{j\neq k}(x_k-x_j)^{-1}}.\quad (1)$$
Here and further,
$$s_i = \operatorname{sign}\Big(\prod_{j\neq i}(x_i-x_j)\Big),\quad i = 1,\dots,n,$$
denotes the parity of observations.
Thus, the Bayes estimator has a very explicit form:
$$\delta_0 = \sum_jw_j^0x_j = \frac{\sum_jx_j\prod_{i\neq j}|x_i-x_j|^{-1}}{\sum_j\prod_{i\neq j}|x_i-x_j|^{-1}}.\quad (2)$$
The magnitude of the probabilities (1) describes the intrinsic similarity of observations: the weight of a data point $x_j$ is large if it is close to the bulk of the data, meaning that $\prod_{i\neq j}|x_j-x_i|$ is relatively small.
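As a numerical illustration of (1) and (2), the weights and the estimator can be computed directly from the data. The following Python sketch (the helper names are ours, purely illustrative) also makes the location–scale invariance of the weights and the equivariance of $\delta_0$ easy to check.

```python
import math

def bayes_weights(x):
    """Posterior probabilities w_i^0 of eq. (1): normalized inverse products |x_i - x_j|."""
    n = len(x)
    raw = [1.0 / math.prod(abs(x[i] - x[j]) for j in range(n) if j != i)
           for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def delta0(x):
    """Bayes estimator delta_0 of eq. (2): data-dependent weighted average."""
    w = bayes_weights(x)
    return sum(wi * xi for wi, xi in zip(w, x))

# Small example: three clustered points and one remote point.
x = [0.1, 0.2, 1.5, 0.25]
w = bayes_weights(x)
```

In this example the remote point 1.5 receives the smallest weight, in line with the similarity interpretation above.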
Statistic $\delta_0$ also appears in approximation theory. According to the Tchebycheff interpolation formula, one has
$$|\delta_0| = \min_R\max_i|s_ix_i - R(x_i)|,$$
where $R(x)$ runs through polynomials of degree not exceeding $n-2$. See Chapter 5 in Trefethen (2013) [9].
The probabilities (1) have their origin in optimization problems involving the discriminant function. Borodin [10] discusses their use in statistical physics. Genest et al. [11] study the remarkable mirror-symmetry (persymmetry) of the underlying Jacobi matrix.

2.1. Heterogeneity and Homogeneity

Here, normality of observations, $x_j \sim \sigma_j^{-1}\varphi((\cdot-\mu)/\sigma_j)$, $\varphi(x) = \exp(-x^2/2)/\sqrt{2\pi}$, is assumed. We consider the setting where, in addition to these x values, there is a possible group of distinct homogeneous data that have the same unknown standard deviation $\sigma$, say, $x_i \sim \sigma^{-1}\varphi((\cdot-\mu)/\sigma)$, $i = n+1,\dots,n+m$. In the context of the citizen science projects mentioned in Section 1, $x_i$, $i = n+1,\dots,n+m$, may represent data supplied by smartphone users, while the x values correspond to measurements derived by other means. In metrology applications, a known homogeneous group of laboratories employing the same techniques may participate in interlaboratory studies.
Then, one has $m+n$ independent observations and altogether $n+2$ unknown parameters, $\mu,\sigma_1,\dots,\sigma_n,\sigma$, with the main interest in $\mu$.
We start with the traditional reference prior density of the form
$$\pi(\mu,\sigma_1,\dots,\sigma_n,\sigma) = \Big[\prod_j\sigma_j^{-a}\Big]\sigma^{-b},\quad (3)$$
relative to $d\mu\,\big[\prod_jd\sigma_j/\sigma_j\big]\,d\sigma/\sigma$. Here, $a > -1$, $b+m > -1$.
For any continuous bounded function $h(\mu)$,
$$\int h(\mu)\,d\mu\,\prod_{j=1}^n\int_0^\infty\varphi\Big(\frac{x_j-\mu}{\sigma_j}\Big)\frac{d\sigma_j}{\sigma_j^{a+2}}\,\int_0^\infty\prod_{i=n+1}^{n+m}\varphi\Big(\frac{x_i-\mu}{\sigma}\Big)\frac{d\sigma}{\sigma^{b+m+2}}$$
$$= \Big[\int_0^\infty\varphi(u)u^a\,du\Big]^n\,\frac{\int_0^\infty\varphi(u)u^{b+m}\,du}{(2\pi)^{(m-1)/2}}\,\int\frac{h(\mu)\,d\mu}{\big[\sum_i(x_i-\mu)^2\big]^{(b+m+1)/2}\prod_j|x_j-\mu|^{a+1}}.$$
Indeed, for any $j = 1,\dots,n$,
$$\int_0^\infty\varphi\Big(\frac{x_j-\mu}{\sigma_j}\Big)\frac{d\sigma_j}{\sigma_j^{a+2}} = \frac{\int_0^\infty\varphi(u)u^a\,du}{|x_j-\mu|^{a+1}},$$
and
$$\int_0^\infty\prod_{i=n+1}^{n+m}\varphi\Big(\frac{x_i-\mu}{\sigma}\Big)\frac{d\sigma}{\sigma^{b+m+2}} = \frac{\int_0^\infty\varphi(u)u^{b+m}\,du}{(2\pi)^{(m-1)/2}\big[\sum_i(x_i-\mu)^2\big]^{(b+m+1)/2}}.$$
For any fixed small $\epsilon > 0$ and fixed $j = 1,\dots,n$, provided that all data points are different,
$$\lim_{a\to 0}\,a\int_{x_j-\epsilon}^{x_j+\epsilon}\frac{h(\mu)\,d\mu}{\prod_k|x_k-\mu|^{1+a}\big[\sum_i(x_i-\mu)^2\big]^{(b+m+1)/2}} = \frac{2h(x_j)}{\prod_{k\neq j}|x_j-x_k|\,\big[\sum_{i=n+1}^{n+m}(x_i-x_j)^2\big]^{(b+m+1)/2}}.$$
Therefore, as $a\to 0$,
$$\frac{\int h(\mu)\,d\mu\,\prod_j\int_0^\infty\varphi\big(\frac{x_j-\mu}{\sigma_j}\big)\frac{d\sigma_j}{\sigma_j^{a+2}}\int_0^\infty\prod_i\varphi\big(\frac{x_i-\mu}{\sigma}\big)\frac{d\sigma}{\sigma^{b+m+2}}}{\int d\mu\,\prod_j\int_0^\infty\varphi\big(\frac{x_j-\mu}{\sigma_j}\big)\frac{d\sigma_j}{\sigma_j^{a+2}}\int_0^\infty\prod_i\varphi\big(\frac{x_i-\mu}{\sigma}\big)\frac{d\sigma}{\sigma^{b+m+2}}}$$
$$\to \frac{\sum_jh(x_j)\big[\prod_{k\neq j}|x_j-x_k|\big]^{-1}\big[\sum_i(x_i-\bar{x}_m)^2+m(x_j-\bar{x}_m)^2\big]^{-(b+m+1)/2}}{\sum_j\big[\prod_{k\neq j}|x_j-x_k|\big]^{-1}\big[\sum_i(x_i-\bar{x}_m)^2+m(x_j-\bar{x}_m)^2\big]^{-(b+m+1)/2}},$$
where $\bar{x}_m = \sum_ix_i/m$.
Thus, we can formulate the first result.
Theorem 1.
Under the prior (3), as $a\to 0$ the posterior distribution of μ is discrete with finite support $\{x_1,\dots,x_n\}$ and the probabilities
$$w_j = \frac{\big[\prod_{k\neq j}|x_j-x_k|\big]^{-1}\big[1+(x_j-\bar{x}_m)^2/v^2\big]^{-(b+m+1)/2}}{\sum_k\big[\prod_{l\neq k}|x_k-x_l|\big]^{-1}\big[1+(x_k-\bar{x}_m)^2/v^2\big]^{-(b+m+1)/2}}.\quad (4)$$
Here, $\bar{x}_m = \sum_ix_i/m$ and $v^2 = \sum_i(x_i-\bar{x}_m)^2/m$ are estimators of the common mean and variance based on the homogeneous sub-sample.
The Bayes estimator of μ, i.e., the posterior mean, is
$$\delta = \sum_{j=1}^nw_jx_j,\quad (5)$$
with $w_j$ defined by (4).
The probabilities (4) still describe the intrinsic similarity of observations: the weight of a data point $x_j$ is large if it is close to the greater part of the data. The attenuating factor, $[1+(x_j-\bar{x}_m)^2/v^2]^{-(b+m+1)/2}$, when $b+m+1 > 0$, favors values $x_j$ that are close to $\bar{x}_m$. When the homogeneous data are absent, this factor equals 1 and (5) coincides with (2).
In this situation, the posterior mode $\hat{\delta} = x_\iota$, where the index ι satisfies
$$\prod_{k\neq\iota}|x_\iota-x_k|\,\Big[1+\frac{(x_\iota-\bar{x}_m)^2}{v^2}\Big]^{(b+m+1)/2} = \min_j\prod_{k\neq j}|x_j-x_k|\,\Big[1+\frac{(x_j-\bar{x}_m)^2}{v^2}\Big]^{(b+m+1)/2},$$
presents the maximum likelihood estimator within the class of invariant procedures.
The prior density (3) for the mean and the variances represents the right Haar measure on the group of linear transforms. In the context of the multivariate normal model, it is known as the Geisser–Cornfield prior; see Geisser [12], Ch. 9.1. This prior is known to be an exact frequentist matching prior, yielding Fisher's fiducial distribution as the posterior (Fernandez and Steel [13], Severini et al. [14]).
Despite this fact, “the prior seems to be quite bad for correlations, predictions and other inferences involving a multivariate normal distribution” (Sun and Berger [15]). Its mentioned drawbacks stem from the fact that if $n \geq 2$, the marginal (or prior predictive) density does not exist. A related weakness of (5) is its sensitivity to observations which are close to one another.
To mitigate these drawbacks, we look now for other prior distributions.

2.2. Conjugate Priors and Variance Formulas

A wide class of Bayes estimators of μ arises from conjugate prior densities,
$$\pi(\mu,\sigma_1^2,\dots,\sigma_n^2,\sigma^2) = \exp\Big\{-\sum_j\frac{s^2}{2\sigma_j^2}-\frac{t_0^2}{2\sigma^2}\Big\}\Big[\prod_i\sigma_i^{-a}\Big]\sigma^{-b},\quad (6)$$
relative to $d\mu\,\prod_id\sigma_i^2\,d\sigma^2$. Here, $a$, $b$, $s^2$, and $t_0^2$ are hyperparameters to be specified in (6), with $a > -1$, $b+m > -1$, $s^2 \geq 0$, $t_0^2 \geq 0$.
A slightly modified proof of Theorem 1 shows that the posterior distribution of μ under the prior (6) is proportional to
$$\prod_j\big[s^2+(x_j-\mu)^2\big]^{-(a+1)/2}\,\big[t^2+(\bar{x}_m-\mu)^2\big]^{-(b+m+1)/2},$$
where $t^2 = t_0^2/m+v^2$, which is treated as a constant in the following discussion. The posterior distribution in this situation is the product of t-densities (with a degrees of freedom) and a t-density (with b+m degrees of freedom). Thus, it is a particular case of the poly-t distribution, which is ubiquitous in multivariate analysis. It appears in the posterior analysis of linear models (Box and Tiao [16]) and is popular in econometrics (Bauwens [17]).
The Bayes estimator has the form
$$\delta_B = \frac{\int\mu\,d\mu\,\prod_j\big[s^2+(x_j-\mu)^2\big]^{-(a+1)/2}\big[t^2+(\bar{x}_m-\mu)^2\big]^{-(b+m+1)/2}}{\int d\mu\,\prod_j\big[s^2+(x_j-\mu)^2\big]^{-(a+1)/2}\big[t^2+(\bar{x}_m-\mu)^2\big]^{-(b+m+1)/2}}.\quad (7)$$
If $b+m = -1$, (7) is the classical Pitman estimator of the location parameter involving t-distributions with a degrees of freedom. It is especially well studied for the Cauchy location/scale parameter family ($a = 1$).
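Since (7) is a ratio of two one-dimensional integrals, it can be approximated by any quadrature rule. The sketch below (ours; a simple midpoint rule with illustrative default hyperparameters, not values from the paper) computes $\delta_B$ for given data and $\bar{x}_m$.

```python
def delta_B(x, ybar, a=1.0, b=0.0, m=5, s2=0.5, t2=0.5, grid=20001, span=10.0):
    """Posterior mean (7) by midpoint quadrature over mu.
    The (unnormalized) posterior is
    prod_j (s2 + (x_j - mu)^2)^(-(a+1)/2) * (t2 + (ybar - mu)^2)^(-(b+m+1)/2)."""
    lo, hi = min(x) - span, max(x) + span
    h = (hi - lo) / grid
    num = den = 0.0
    for k in range(grid):
        mu = lo + (k + 0.5) * h
        dens = (t2 + (ybar - mu) ** 2) ** (-(b + m + 1) / 2)
        for xj in x:
            dens *= (s2 + (xj - mu) ** 2) ** (-(a + 1) / 2)
        num += mu * dens
        den += dens
    return num / den
```

For data symmetric about a point, the estimate sits at that point of symmetry, and the rule is location-equivariant, as expected of (7).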
If, in addition, $s^2 = 0$,
$$\delta_B = \int\mu\,d\mu\,\prod_j|x_j-\mu|^{-(a+1)}\,\Big[\int d\mu\,\prod_j|x_j-\mu|^{-(a+1)}\Big]^{-1},$$
which corresponds to the formal Pitman estimator of the location parameter derived from the working family $|x-\mu|^{-(a+1)}$ employed to estimate the location parameter μ when the observations are normal.
Needless to say, the functions in this family are not probability densities; moreover, they have a singularity of the third kind.
When $s^2$ and $t^2$ are fixed positive numbers, the approximate variance of (7) can be found via the usual argument employed for M-estimators, i.e., solutions of the moment-type equation $\sum_j\psi_j(x_j-\mu) = 0$ (or minimizers of $\sum_j\rho_j(x_j-\mu)$). In our case, the contrast functions are
$$\psi_j(x_j-\mu) = \frac{(a+1)(x_j-\mu)}{s^2+(x_j-\mu)^2},\quad j = 1,\dots,n,\qquad \psi(\bar{x}_m-\mu) = \frac{(b+m+1)(\bar{x}_m-\mu)}{t^2+(\bar{x}_m-\mu)^2}.$$
The M-estimator $\tilde{\mu}$ satisfies the equation
$$\sum_j\frac{(a+1)(x_j-\tilde{\mu})}{s^2+(x_j-\tilde{\mu})^2}+\frac{(b+m+1)(\bar{x}_m-\tilde{\mu})}{t^2+(\bar{x}_m-\tilde{\mu})^2} = 0.$$
According to well-known results (e.g., Huber and Ronchetti [18]),
$$\operatorname{Var}(\tilde{\mu}) \approx \frac{\sum_jE_j\psi_j^2(x_j)+E\psi^2(\bar{x}_m)}{\big[\sum_jE_j\psi_j'(x_j)+E\psi'(\bar{x}_m)\big]^2}\quad (8)$$
$$= \Big[(a+1)^2\sum_jE_j\frac{x_j^2}{(s^2+x_j^2)^2}+(b+m+1)^2E\frac{\bar{x}_m^2}{(t^2+\bar{x}_m^2)^2}\Big]\times\Big[(a+1)\sum_jE_j\frac{s^2-x_j^2}{(s^2+x_j^2)^2}+(b+m+1)E\frac{t^2-\bar{x}_m^2}{(t^2+\bar{x}_m^2)^2}\Big]^{-2}.$$
Here, $E_j$ refers to the expectation evaluated under the normal distribution with zero mean and variance $\sigma_j^2$, $j = 1,\dots,n$; the distribution of $\bar{x}_m$ is also normal with variance $\sigma^2/m$. The main restriction on $\sigma^2$ is that the Central Limit Theorem for $\sum_j\psi_j(x_j)$ holds. For instance, one can employ (8) if the Liapounov condition for independent non-identically distributed summands is satisfied, i.e., $(\sum_j\sigma_j^6)^2 = o\big((\sum_j\sigma_j^4)^3\big)$ (e.g., Lehmann [19], Theorem 2.7.3).
To simplify (8), we need the known formula for the standard normal Z and positive β,
$$E\,\frac{\beta}{Z^2+\beta^2} = M(\beta),$$
where $M(\beta) = [1-\Phi(\beta)]/\varphi(\beta)$ is the familiar Mills ratio (Stuart and Ord, 1994) [20]. With $\beta_j = s/\sigma_j$, $\beta_{n+1} = m^{1/2}t/\sigma$, $x_{n+1} = \bar{x}_m$, the differentiation of this identity shows that for $i = 1,\dots,n+1$,
$$E_i\frac{x_i^2}{(x_i^2+s^2)^2} = \frac{\beta_i}{2s^2}\big[(1+\beta_i^2)M(\beta_i)-\beta_i\big],$$
and
$$E_i\frac{s^2-x_i^2}{(x_i^2+s^2)^2} = \frac{\beta_i^2}{s^2}\big[1-\beta_iM(\beta_i)\big],$$
where for $i = n+1$ one has to replace $s^2$ by $t^2$.
These identities allow for expressing the approximate variance (8) in terms of the Mills ratio:
$$\operatorname{Var}(\tilde{\mu}) \approx \Big\{\frac{(a+1)^2}{2s^2}\sum_i\big[\beta_i(1+\beta_i^2)M(\beta_i)-\beta_i^2\big]+\frac{(b+m+1)^2}{2t^2}\big[\beta_{n+1}(1+\beta_{n+1}^2)M(\beta_{n+1})-\beta_{n+1}^2\big]\Big\}$$
$$\times\Big\{\frac{a+1}{s^2}\sum_i\beta_i^2\big[1-\beta_iM(\beta_i)\big]+\frac{b+m+1}{t^2}\beta_{n+1}^2\big[1-\beta_{n+1}M(\beta_{n+1})\big]\Big\}^{-2}.\quad (9)$$
When $\sigma_i^2 \equiv \sigma^2$,
$$\operatorname{Var}(\tilde{\mu}) \approx \sigma^2\lambda\,\frac{(1+\beta^2)M(\beta)-\beta}{\beta\big[1-\beta M(\beta)\big]^2},$$
where
$$\lambda = \frac{n(a+1)^2s^{-2}+(b+m+1)^2t^{-2}}{2\big[n(a+1)s^{-2}+(b+m+1)t^{-2}\big]^2}.$$
Since $\lambda \geq \big[2(ns^{-2}+t^{-2})\big]^{-1}$, one has
$$\operatorname{Var}(\tilde{\mu}) \geq \frac{\sigma^2}{ns^{-2}+t^{-2}} = \operatorname{Var}(\tilde{\mu}_0),$$
where
$$\tilde{\mu}_0 = \frac{s^{-2}\sum_jx_j+t^{-2}\bar{x}}{ns^{-2}+t^{-2}}$$
is the best unbiased estimator of μ when all variances are equal.
If $\beta = 1$, i.e., $s^2$ and $t^2$ are adequate approximations of the common variance, then
$$\frac{\operatorname{Var}(\tilde{\mu})}{\operatorname{Var}(\tilde{\mu}_0)} \approx \frac{2M(1)-1}{2[1-M(1)]^2} = 1.31\dots$$
Thus, when all $\sigma_i^2$ are equal, and the hyperparameters in (6) are chosen so that $s^2 \approx \sum(x_j-\bar{x})^2/(n-1)$ and $t^2 \approx v^2$, the variance of $\tilde{\mu}$ is about 1.31 times larger than that of $\tilde{\mu}_0$. Smaller values of $s^2$ lead to a larger variance of $\tilde{\mu}$.
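The constant 1.31… can be checked directly from the definition of the Mills ratio. The brief computation below (ours) evaluates $M(\beta)$ via the complementary error function.

```python
import math

def mills(beta):
    """Mills ratio M(beta) = (1 - Phi(beta)) / phi(beta) for the standard normal."""
    phi = math.exp(-beta * beta / 2) / math.sqrt(2 * math.pi)
    upper = 0.5 * math.erfc(beta / math.sqrt(2))   # upper tail 1 - Phi(beta)
    return upper / phi

# Variance ratio at beta = 1, as in the display above.
ratio = (2 * mills(1.0) - 1) / (2 * (1 - mills(1.0)) ** 2)
```

Here $M(0) = \sqrt{\pi/2} \approx 1.2533$, $M(1) \approx 0.6557$, and the ratio evaluates to about 1.313.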
If $s^2 = s_0^2\gamma_n \to 0$ for some sequence $\gamma_n \to 0$ with $n\gamma_n^2 \to \infty$, the corresponding estimator $\tilde{\mu} = \tilde{\mu}_n$ is asymptotically normal, $\sqrt{n\gamma_n}\,(\tilde{\mu}_n-\mu) \to N\big(0,\sigma^4M(0)/(2s_0^2)\big)$, albeit at a slower rate than $\sqrt{n}$. Therefore, it is no surprise that in the case of $\delta_0$ (for which $s^2 = 0$, $m = 0$), when $n\beta\operatorname{Var}(\tilde{\mu}) \to \sigma^2M(0)/2$ as $\beta \to 0$, one has $n\operatorname{Var}(\delta_0) \to \infty$. Indeed, it seems that $\delta_0$ bears more resemblance to the nonparametric estimates of the location parameter, for which the convergence rate is slower than $\sqrt{n}$. Numerical experiments suggest that in the normal case, $n^{1/2}\operatorname{Var}(\delta_0)/\log(n) \to \pi^2/8$.
We summarize now the main results of this section.
Theorem 2.
Under the prior distribution (6), the posterior distribution is the product of t-distributions with a degrees of freedom and a t-distribution with b+m degrees of freedom. The approximate variance of the Bayes estimator (7) satisfies (8), with Expression (9) given in terms of the Mills ratio.
For the remainder of this paper, we will concentrate on the estimator δ 0 .

3. Distribution of δ 0

3.1. Jacobian and Moments

Let e denote the vector with unit coordinates, $w = (w_1^0,\dots,w_n^0)^T$, and let $x = (x_1,\dots,x_n)^T$ represent a random sample. By location equivariance,
$$\sum_k\frac{\partial\delta_0}{\partial x_k} = e^T\nabla\delta_0 = 1,\quad (10)$$
and by scale equivariance,
$$\sum_kx_k\frac{\partial\delta_0}{\partial x_k} = x^T\nabla\delta_0 = \delta_0.\quad (11)$$
Define the matrix Q by its elements $q_{ij} = \sum_{k\neq i,j}(x_i-x_k)^{-1}$, $i\neq j$, $q_{ii} = 0$, so that
$$Q = re^T-\operatorname{diag}(r)-Z,$$
where the i-th coordinate of the vector r is $r_i = \sum_{j\neq i}(x_i-x_j)^{-1}$, and the skew-symmetric matrix Z has elements $(x_i-x_j)^{-1}$ for $i\neq j$ and a zero diagonal.
Then, the Jacobian $J(w)$ with elements $\partial w_i^0/\partial x_j$, $i,j = 1,\dots,n$, has the form
$$J(w) = \big[\operatorname{diag}(w)-ww^T\big]Q^T,$$
so that $\operatorname{tr}J(w) = -w^TQw$, and
$$\nabla\delta_0 = \big[I+Q(\operatorname{diag}(x)-\delta_0I)\big]w = w-\big[\operatorname{diag}(r)+Z\big]\big[\operatorname{diag}(x)-\delta_0I\big]w.\quad (14)$$
Because of (14), one obtains
$$w^TQ\big[\operatorname{diag}(x)-\delta_0I\big]w = \frac12-w^Tw,$$
so that
$$w^T\nabla\delta_0 = \frac12.\quad (15)$$
We will get an extension of (15) to higher moments: $\delta_m = \sum_ix_i^mw_i$, $m = 1,2,\dots$, $\delta_0 = 1$, $\delta_1 = \delta_0$. It is shown in Rukhin (2023) [8] that for any integer $m \geq 1$,
$$\nabla\delta_m = \big[m\operatorname{diag}(x)^{m-1}+Q(\operatorname{diag}(x)^m-\delta_mI)\big]w = \big[m\operatorname{diag}(x)^{m-1}-(\operatorname{diag}(r)+Z)(\operatorname{diag}(x)^m-\delta_mI)\big]w,\quad (16)$$
so that
$$w^T\nabla\delta_m = \frac{m}{2}\sum_i(w_i^0)^2x_i^{m-1}+\frac12\sum_{p=0}^{m-1}\delta_p\delta_{m-1-p}-\sum_ir_i(w_i^0)^2x_i^m.$$
The coefficients $\tau_m = 2w^T\nabla\delta_m-\sum_{p=0}^{m-1}\delta_p\delta_{m-1-p}$ vanish for $0 \leq m \leq 2n-2$.
Indeed,
$$w^T\operatorname{diag}(r)\big[\operatorname{diag}(x)^m-\delta_mI\big]w = \frac{m}{2}\sum_i(w_i^0)^2x_i^{m-1},$$
because of Hermite's (osculatory) interpolation formula, which gives for $0 \leq m \leq 2n-2$
$$2\sum_ir_ix_i^mw_i^2 = m\sum_ix_i^{m-1}w_i^2.$$
Therefore, for these values of m,
$$2w^T\nabla\delta_m = \sum_{p=0}^{m-1}\delta_p\delta_{m-1-p}.$$
In particular,
$$w^T\nabla(\delta_2-\delta_1^2) = \delta_1-\delta_1 = 0.$$
Let $W = \big[\sum_i\prod_{j\neq i}|x_i-x_j|^{-1}\big]^{-1}$, so that the probabilities (1) can be written as $w_i = W\prod_{j\neq i}|x_i-x_j|^{-1}$. When $m = 2n-1$, $\tau_m = W^2$, and
$$2w^T\nabla\delta_{2n-1} = \sum_{p=0}^{2n-2}\delta_p\delta_{2n-2-p}+W^2.$$
For any positive integer m, the coefficients $\tau_m$ determine the asymptotic expansion in $x^{-1}$ of $W^2\prod_i(x-x_i)^{-2}$, which is
$$W^2\prod_i\frac{1}{(x-x_i)^2} = \sum_i\frac{w_i^2}{(x-x_i)^2}-2\sum_i\frac{r_iw_i^2}{x-x_i} = \sum_{m=1}^\infty m\sum_iw_i^2x_i^{m-1}x^{-m-1}-2\sum_{m=0}^\infty\sum_ir_iw_i^2x_i^mx^{-m-1} = \sum_{m=2n-1}^\infty\tau_mx^{-m-1}.\quad (17)$$
For $m \geq 2n-1$, the values $\tau_m$ can be found from the formula $W^{-2}\tau_{2n-1+m} = \sum_{p_1+\cdots+p_n=m}\prod_1^n(p_i+1)x_i^{p_i}$.
If $s_i = \operatorname{sign}\big(\prod_{j\neq i}(x_i-x_j)\big)$, $i = 1,\dots,n$, is the parity of $x_i$, then
$$\prod_1^n(x-x_k) = P_e(x)P_o(x) = \prod_{i:\,s_i=1}(x-x_i)\,\prod_{j:\,s_j=-1}(x-x_j)$$
is a product of two polynomials of degrees $N_e = N_o = n/2$ for even n, or $N_e = (n+1)/2 = N_o+1$ for odd n; $w_i = W/[P_o(x_i)P_e'(x_i)]$ if $s_i = 1$; if $s_j = -1$, $w_j = -W/[P_e(x_j)P_o'(x_j)]$.
According to the classical Lagrange interpolation formula, one has
$$\frac{1}{\prod_1^n(x-x_k)} = \sum_k\frac{1}{(x-x_k)\prod_{j\neq k}(x_k-x_j)}.$$
For non-negative integer m, define $\kappa_m = \sum_is_iw_ix_i^m$. Then, the same formula implies that for any $m = 0,1,\dots,n-2$,
$$\kappa_m = W\sum_1^n\frac{x_i^m}{\prod_{j\neq i}(x_i-x_j)} = 0.\quad (18)$$
The coefficients $\kappa_m$ determine the asymptotic expansion in $x^{-1}$ of $W\big[\prod_1^n(x-x_k)\big]^{-1} = \sum\kappa_mx^{-m-1}$.
For $m \geq n-1$, the values $\kappa_m$ can be found from the formula $W^{-1}\kappa_{n-1+m} = \sum_{p_1+\cdots+p_n=m}\prod_ix_i^{p_i}$, e.g., $\kappa_{n-1} = W$, $\kappa_n = W\sum x_k$. With $E_r$, $r = 0,1,\dots,n$, denoting the elementary symmetric functions, one obtains for a positive integer m
$$\kappa_{n+m-1} = E_1\kappa_{n+m-2}-E_2\kappa_{n+m-3}+\cdots+(-1)^{n+1}E_n\kappa_{m-1}.$$
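The identities for $\kappa_m$ (vanishing for $m \leq n-2$, the boundary values $\kappa_{n-1} = W$ and $\kappa_n = W\sum x_k$, and the recursion through the elementary symmetric functions) are easy to verify numerically. The sketch below (function names ours) does so for a sample of size n = 5.

```python
import math
from itertools import combinations

def kappa_table(x, mmax):
    """Return W and kappa_0..kappa_mmax, using s_i w_i = W / prod_{j!=i}(x_i - x_j)."""
    n = len(x)
    prods = [math.prod(x[i] - x[j] for j in range(n) if j != i) for i in range(n)]
    W = 1.0 / sum(1.0 / abs(p) for p in prods)
    kap = [W * sum(x[i] ** m / prods[i] for i in range(n)) for m in range(mmax + 1)]
    return W, kap

def elem_sym(x, r):
    """Elementary symmetric function E_r of the sample."""
    return sum(math.prod(c) for c in combinations(x, r)) if r else 1.0

x = [0.3, -1.2, 0.8, 2.1, -0.4]
W, kap = kappa_table(x, 8)
```

For this n = 5 sample, $\kappa_0,\dots,\kappa_3$ vanish to rounding error, $\kappa_4 = W$, $\kappa_5 = W\sum x_k$, and $\kappa_8$ satisfies the recursion above.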
Comparison to (17) shows that
$$\sum_{m=2n-1}^\infty\tau_mx^{-m-1} = \Big[\sum_{m=n-1}^\infty\kappa_mx^{-m-1}\Big]^2,$$
which means that for any m,
$$2w^T\nabla\delta_m = \sum_{p=0}^{m-1}\delta_p\delta_{m-1-p}+\sum_{p=0}^{m-1}\kappa_p\kappa_{m-1-p}.\quad (19)$$
Furthermore, one has
$$\frac{\delta_m+\kappa_m}{W} = 2\sum_{i:\,s_i=1}\frac{x_i^m}{\prod_{j\neq i}(x_i-x_j)},\quad (20)$$
and
$$\frac{\delta_m-\kappa_m}{W} = -2\sum_{i:\,s_i=-1}\frac{x_i^m}{\prod_{j\neq i}(x_i-x_j)}.\quad (21)$$
I will now formulate the results.
Theorem 3.
Formulas (17) and (19)–(21) hold for $\delta_m = \sum_iw_ix_i^m$ and $\kappa_m = \sum_is_iw_ix_i^m$, $m = 0,1,\dots$
The matrix $\operatorname{diag}(x)[\operatorname{diag}(r)+Z]$ is diagonalizable with the eigenvalues $n-j$, $j = 1,\dots,n$.
Proof. 
Let
$$V = \begin{pmatrix}1 & 1 & \cdots & 1 & 1\\ x_1 & x_2 & \cdots & x_{n-1} & x_n\\ \vdots & & & & \vdots\\ x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} & x_n^{n-1}\end{pmatrix}$$
represent the Vandermonde matrix.
We prove that
$$V\operatorname{diag}(x)\big[\operatorname{diag}(r)+Z\big] = \Lambda V.$$
Here, Λ is a lower triangular matrix with the elements $\lambda_{jj} = n-j$, $\lambda_{ij} = \sum_kx_k^{i-j}$, $i > j$, $\lambda_{ij} = 0$, $i < j$, $i,j = 1,\dots,n$.
Indeed, the elements of the matrix on the left-hand side are
$$\big(V\operatorname{diag}(x)[\operatorname{diag}(r)+Z]\big)_{pk} = \sum_{t:\,t\neq k}\frac{x_k^p-x_t^p}{x_k-x_t} = \sum_{t:\,t\neq k}\sum_{i=0}^{p-1}x_k^ix_t^{p-1-i} = \sum_{i=0}^{p-2}x_k^i\sum_tx_t^{p-1-i}+(n-p)x_k^{p-1} = \sum_t\lambda_{pt}x_k^{t-1},\quad 1 \leq p,k \leq n.$$
Since all eigenvalues of diag ( x ) [ diag ( r ) + Z ] are distinct, it is diagonalizable. □

3.2. Differential Equations and Integration by Parts

Let $L = L(x_1,\dots,x_n) = \log\prod_{i<j}|x_j-x_i|$, and
$$F = F(x_1,\dots,x_n) = \log\Big[n^{-1}\sum_k\prod_{i<j;\,i,j\neq k}|x_i-x_j|\Big],$$
where $x_1,\dots,x_n$ form a standard normal random sample.
Then, $\exp\{2L\} = \prod_i\prod_{j\neq i}|x_i-x_j|$ and $F \geq (n-2)L/n$. In this notation, $e^T\nabla L = e^T\nabla F = 0$, $W = n^{-1}\exp\{L-F\}$, $x^T\nabla L = n(n-1)/2$, $x^T\nabla F = (n-1)(n-2)/2$, and
$$w^T\nabla F = w^TQw = w^Tr = w^T\nabla L.$$
According to the celebrated Selberg formula, for $z > -2/n$,
$$E\exp\{zL(x_1,\dots,x_n)\} = \prod_{k=1}^n\frac{\Gamma(zk/2+1)}{\Gamma(z/2+1)},$$
which implies that
$$E\exp\{F(x_1,\dots,x_n)\} = \prod_{k=1}^{n-1}\frac{\Gamma(k/2+1)}{\Gamma(3/2)}.$$
See Lu and Richards [21] for several results related to uses of the Selberg formula in statistics. In particular, these authors elaborate the Central Limit Theorem for L expressed as a U-statistic,
$$\frac{1}{\sqrt{n}}\Big(\frac{2L}{n}+\frac{Cn}{2}\Big) \to N\Big(0,\frac{\pi^2}{18}\Big),$$
where $C = 0.577\dots$ is Euler's constant.
To simplify the formula for the quadratic risk of $\delta_0$, we use integration by parts:
$$E(\delta_0)^2 = \sum_jE\,\delta_0w_jx_j = \sum_jE\,\frac{\partial(\delta_0w_j)}{\partial x_j} = \sum_jE\Big[\frac{\partial\delta_0}{\partial x_j}w_j+\frac{\partial w_j}{\partial x_j}\delta_0\Big],$$
so that
$$E(\delta_0)^2 = \frac12-E(w^Tr)\delta_0.$$
Let $q(t)$ denote the density of the estimator $\delta_0$. This density exists and is differentiable, as $\delta_0$ is the sum of two independent random variables: $\delta_0-\bar{x}$ and the normally distributed $\bar{x}$. Clearly, $q(-t) = q(t)$, and
$$\frac{q'(t)}{q(t)} = -nE(\bar{x}\,|\,\delta_0 = t).$$
This identity holds since the score function of $\delta_0$ is the conditional expected value of the score function of $\bar{x}$ for given $\delta_0$. It also follows from (10) in the same way as the next formula follows from (11).
For any z, $x^T\nabla e^{z\delta_0} = ze^{z\delta_0}\delta_0$, so that
$$E\,e^{z\delta_0}\Big(\sum_ix_i^2-n\Big) = zE\,e^{z\delta_0}\delta_0,$$
or
$$E\Big(\sum_ix_i^2\,\Big|\,\delta_0 = t\Big) = n-1-t\,\frac{q'(t)}{q(t)} = n-1+t\,E\Big(\sum_ix_i\,\Big|\,\delta_0 = t\Big).$$
More generally,
$$e^{nz^2/2}\,\frac{q(t-z)}{q(t)} = E\big(e^{z\sum_ix_i}\,\big|\,\delta_0 = t\big),$$
so that $q''(t)/q(t) = E\big[(\sum_ix_i)^2\,\big|\,\delta_0 = t\big]-n$.
Similarly, it follows from (15) that
$$E\,w^T\nabla e^{z\delta_0} = E\,e^{z\delta_0}\big[\delta_0-\operatorname{tr}J(w)\big] = E\,e^{z\delta_0}\big(\delta_0+w^Tr\big) = \frac{z}{2}E\,e^{z\delta_0},$$
which means that
$$\frac{q'(t)}{q(t)} = -2\big[t+E(w^Tr\,|\,\delta_0 = t)\big].\quad (22)$$
Since $w^Tr = w^TQw = w^T\nabla L$, for any differentiable bounded function $g(t)$ and any z,
$$g(t)\big(w^T\nabla e^{zL}\big) = zg(t)(w^Tr)e^{zL}.$$
Integrating by parts, one obtains for $z > -2/n$
$$zE\,g(\delta_0)(w^Tr)e^{zL} = E\,g(\delta_0)\big(w^T\nabla e^{zL}\big) = E\big[\delta_0g(\delta_0)-g(\delta_0)\operatorname{tr}J(w)-w^T\nabla g(\delta_0)\big]e^{zL},$$
or
$$E\,e^{zL}\big[\delta_0g(\delta_0)-g'(\delta_0)/2\big] = (z-1)E\,e^{zL}g(\delta_0)(w^Tr).$$
It follows that
$$\Big[\frac{q'(t)}{2q(t)}+t\Big]E\big(e^{zL}\,|\,\delta_0 = t\big)+\frac{d}{2\,dt}E\big(e^{zL}\,|\,\delta_0 = t\big) = (z-1)E\big(e^{zL}w^Tr\,|\,\delta_0 = t\big).$$
By putting $z = 1$, one obtains
$$q(t) = \frac{E\,e^L}{\sqrt{\pi}}\,\frac{e^{-t^2}}{E(e^L\,|\,\delta_0 = t)} = \frac{1}{\sqrt{\pi}}\prod_{k=1}^n\frac{\Gamma(k/2+1)}{\Gamma(3/2)}\,\frac{e^{-t^2}}{E(e^L\,|\,\delta_0 = t)}.\quad (23)$$
One can show that $E(e^L|\delta_0)/E\,e^L = E(e^F|\delta_0)/E\,e^F$ by applying the above argument to $w^Tr = w^T\nabla F$. Indeed,
$$\frac{d}{2\,dt}E\big(e^{zF}\,|\,\delta_0 = t\big)-E\big(w^Tr\,|\,\delta_0 = t\big)E\big(e^{zF}\,|\,\delta_0 = t\big) = (z-1)E\big(e^{zF}w^Tr\,|\,\delta_0 = t\big),$$
so that
$$2E\big(w^Tr\,|\,\delta_0 = t\big) = \frac{d}{dt}\log E\big(e^F\,|\,\delta_0 = t\big) = -\frac{q'(t)}{q(t)}-2t.$$
Theorem 4.
Formulas (22) and (23) hold for the density q of the estimator δ 0 .
For $-2/n < z < 1$,
$$E\big(w^Tr\,|\,\delta_0 = t\big)E\big(e^{(1-z)F+zL}\,|\,\delta_0 = t\big) = \frac{d}{2\,dt}E\big(e^{(1-z)F+zL}\,|\,\delta_0 = t\big),$$
which means that
$$E\big(e^F\,|\,\delta_0,W\big)\,E\,e^F = E\big(e^F\,|\,\delta_0\big)\,E\big(e^F\,|\,W\big).$$
To proceed, we need some facts about orthogonal polynomials with regard to (random) weights w .

3.3. Random Orthogonal Polynomials

Using notation from the previous section, for $m = 1,\dots,n$ we define the Hankel moment matrices
$$\begin{pmatrix}1 & \delta_1 & \cdots & \delta_m\\ \vdots & \vdots & & \vdots\\ \delta_m & \delta_{m+1} & \cdots & \delta_{2m}\end{pmatrix}.$$
Their determinants $G_m$ satisfy the condition $G_m = W^{2m-n+2}G_{n-m-2}$. This fact is due to the self-duality of the weights w (Rukhin, 2023 [8]). Then, the sequence $h_m = \sum_j[T_m(x_j)]^2w_j$, $h_0 = 1$, is such that $h_m = G_m/G_{m-1}$ and $h_m = W^2/h_{n-m-1}$.
We consider the sequence of monic polynomials $T_m$, $m = 0,1,\dots,n$: $T_{-1}(x) = 0$, $T_0(x) = 1$, $T_1(x) = x-\delta_1$, …, $T_n(x) = \prod(x-x_k) = P_e(x)P_o(x)$, which are orthogonal in the space $L_2$ of all functions over $\{x_1,\dots,x_n\}$. They are known to satisfy the three-term recurrence
$$T_{m+1}(z) = (z-\alpha_m)T_m(z)-\beta_mT_{m-1}(z),\quad (24)$$
where $\beta_m = h_m/h_{m-1}$, $\beta_m = \beta_{n-m}$, and $\alpha_m = \sum_tx_t[T_m(x_t)]^2w_t/h_m$, $\alpha_m = \alpha_{n-1-m}$. Clearly, $\alpha_m$ depends on $\delta_1,\dots,\delta_{2m+1}$ only, while $\beta_m$ is determined by the first 2m moments, $\delta_1,\dots,\delta_{2m}$. For example, $\alpha_0 = \delta_1$, $\alpha_1 = (\delta_3-2\delta_1\delta_2+\delta_1^3)/(\delta_2-\delta_1^2)$, $\beta_0 = 0$, $\beta_1 = \delta_2-\delta_1^2$.
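The recurrence (24) and the self-duality relation $h_m = W^2/h_{n-m-1}$ can be checked numerically for a small sample. In the sketch below (helper names ours), $T_1$ and $T_2$ are built from the moments $\delta_m$ and tested for orthogonality under the weights $w_i^0$.

```python
import math

def moments_weights(x):
    """Weights w_i^0 of eq. (1) and the normalizing constant W."""
    n = len(x)
    raw = [1.0 / math.prod(abs(x[i] - x[j]) for j in range(n) if j != i)
           for i in range(n)]
    W = 1.0 / sum(raw)
    return [W * r for r in raw], W

def dot(p, q, x, w):
    # inner product <p, q> = sum_i p(x_i) q(x_i) w_i
    return sum(p(t) * q(t) * wi for t, wi in zip(x, w))

x = [0.3, -1.2, 0.8, 2.1]                              # n = 4
w, W = moments_weights(x)
d1 = dot(lambda t: t, lambda t: 1.0, x, w)             # delta_1
d2 = dot(lambda t: t, lambda t: t, x, w)               # delta_2
d3 = dot(lambda t: t * t, lambda t: t, x, w)           # delta_3
T1 = lambda t: t - d1                                  # T_1(z) = z - delta_1
h1 = dot(T1, T1, x, w)                                 # h_1 = beta_1
a1 = dot(lambda t: t * T1(t), T1, x, w) / h1           # alpha_1
T2 = lambda t: (t - a1) * T1(t) - h1                   # T_2 via the recurrence (24)
h2 = dot(T2, T2, x, w)
```

For n = 4 the duality gives $h_1h_2 = W^2$, which the last assertion below confirms.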
Formula (24) shows that
$$T_m(x_i) = s_ih_mT_{n-m-1}(x_i)/W.\quad (25)$$
Thus, if n is odd, then $T_{(n-1)/2} = P_o$ and $h_{(n-1)/2} = W$. The polynomials $P_e$ and $P_o$ are orthogonal for any n.
The polynomial $T_{n-1}(x) = \sum_iw_i^0\prod_{j\neq i}(x-x_j)$ is least deviant from zero in $L_\infty$: $T_{n-1}(x_i) = s_iW$. Therefore, for any monic polynomial $R(x)$ of degree not exceeding $n-1$,
$$W \leq \max_i|R(x_i)|.$$
The ratio $T_{n-1}(z)/T_n(z)$ gives the Stieltjes transform of the discrete measure determined by the weights w,
$$\frac{T_{n-1}(z)}{T_n(z)} = \sum_i\frac{w_i^0}{z-x_i}.$$
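The extremal property of $T_{n-1}$ is straightforward to verify: at the data points it takes the values $\pm W$ with alternating signs, and no monic polynomial of degree at most $n-1$ can be uniformly smaller there. A short check (ours):

```python
import math

def Tn1(x, z):
    """T_{n-1}(z) = sum_i w_i^0 prod_{j != i}(z - x_j); returns (value, W)."""
    n = len(x)
    raw = [1.0 / math.prod(abs(x[i] - x[j]) for j in range(n) if j != i)
           for i in range(n)]
    W = 1.0 / sum(raw)
    w = [W * r for r in raw]
    val = sum(w[i] * math.prod(z - x[j] for j in range(n) if j != i)
              for i in range(n))
    return val, W

x = [0.3, -1.2, 0.8, 2.1]
vals = [Tn1(x, xi)[0] for xi in x]      # should be s_i * W at each data point
W = Tn1(x, x[0])[1]
```

Any competitor monic polynomial, e.g., one vanishing at three of the four data points, is forced to exceed W in absolute value at the remaining point.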
In addition to $x_1,\dots,x_n$, the polynomial $T_{n-1}^2(z)-W^2$ has $n-2$ real roots (which interlace those of $T_{n-1}$). With $R_{n-1}$ denoting the monic polynomial of degree $n-2$ having these roots,
$$T_{n-1}^2(z)-W^2 = T_n(z)R_{n-1}(z).\quad (27)$$
It follows that $R_{n-1}$ coincides with the polynomial associated with $T_{n-1}$:
$$R_{n-1}(z) = \sum_i\frac{\big[T_{n-1}(z)-T_{n-1}(x_i)\big]w_i}{z-x_i}.\quad (28)$$
Associated with $T_m$, the orthogonal monic polynomial $R_m$, $m = 0,1,\dots,n-1$, has degree $m-1$. These polynomials satisfy the same recurrence (24) but with different initial conditions, $R_0 = 0$ and $R_1 = 1$, so that $R_2(z) = z-\alpha_1$ and $T_{n-1}(z) = (z-\delta_1)R_{n-1}(z)-\beta_1R_{n-2}(z)$.
According to (17),
$$R_{n-1}(z) = -2\sum_iw_i^0\,\frac{\partial T_{n-1}(z)}{\partial x_i} = -2w^T\nabla T_{n-1}(z).$$
Since
$$T_{n-1}'(x_i) = \frac{s_iW}{w_i^0}\sum_{j\neq i}\frac{w_i^0+w_j^0}{x_i-x_j},$$
one has
$$R_{n-1}(x_i) = 2T_{n-1}'(x_i)w_i^0 = 2s_iW\sum_{j\neq i}\frac{w_i^0+w_j^0}{x_i-x_j}.$$
One can represent $R_{n-1}(z) = R_e^{(n)}(z)R_o^{(n)}(z)$ as a product of two monic polynomials (with real roots) of degrees $N_e-1$ and $N_o-1$, respectively. Then, $T_{n-1}(z)-W = P_e(z)R_o^{(n)}(z)$ and $T_{n-1}(z)+W = P_o(z)R_e^{(n)}(z)$, so that if $s_i = 1$, $R_e^{(n)}(x_i) = 2W/P_o(x_i)$, and if $s_j = -1$, $R_o^{(n)}(x_j) = -2W/P_e(x_j)$. Hence,
$$R_e^{(n)}(z) = 2W\sum_{s_i=1}\frac{\prod_{s_k=1,\,k\neq i}(z-x_k)}{P_o(x_i)P_e'(x_i)},\quad (29)$$
and
$$R_o^{(n)}(z) = -2W\sum_{s_j=-1}\frac{\prod_{s_\ell=-1,\,\ell\neq j}(z-x_\ell)}{P_e(x_j)P_o'(x_j)}.\quad (30)$$
Clearly,
$$\frac{2W}{T_n(z)} = \frac{R_e^{(n)}(z)}{P_e(z)}-\frac{R_o^{(n)}(z)}{P_o(z)},\quad (31)$$
and
$$\frac{2T_{n-1}(z)}{T_n(z)} = \frac{R_e^{(n)}(z)}{P_e(z)}+\frac{R_o^{(n)}(z)}{P_o(z)}.\quad (32)$$
To specify the coefficients of $R_e^{(n)}(z)$, we use the identity
$$\prod_{s_k=1,\,k\neq i}(z-x_k) = z^{N_e-1}-(E_1-x_i)z^{N_e-2}+\cdots+\big[(-1)^{N_e-1}E_{N_e-1}+(-1)^{N_e-2}E_{N_e-2}x_i+\cdots+x_i^{N_e-1}\big],$$
where $E_r = E_r^{(n)}$, $r = 0,1,\dots,N_e$, denotes the elementary symmetric function based on $x_i$, $s_i = 1$. Because of (18), for $m \leq N_o-1$,
$$R_e^{(n)}(z) = z^{N_e-1}-(E_1-\delta_1)z^{N_e-2}+\cdots+\sum_{r=0}^{N_e-1}(-1)^rE_r\delta_{N_e-1-r},\quad (33)$$
so that
$$zR_e^{(n)}(z)-P_e(z) = \delta_1z^{N_e-1}+(\delta_2-E_1\delta_1)z^{N_e-2}+\cdots+\Big[\sum_{r=0}^{N_e-2}(-1)^rE_r\delta_{N_e-1-r}\Big]z+(-1)^{N_e-1}E_{N_e}.$$
Similarly, one obtains formulas for $R_o^{(n)}$ and $P_o$ via the elementary symmetric functions $D_r = D_r^{(n)}$, $r = 0,\dots,N_o-1$, based on $x_j$, $s_j = -1$.
If $x_i$, $x_i < x_{i+1}$, $i = 1,\dots,n-1$, represent the order statistics, then $R_o^{(3)}(z) = 1$,
$$R_e^{(4)}(z) = z-E_1+\frac{E_2-D_2}{E_1-D_1},\qquad R_o^{(4)}(z) = z-D_1+\frac{E_2-D_2}{E_1-D_1}.$$
The ratio $R_e^{(n)}(z)/P_e(z)$ coincides with the Stieltjes transform of the discrete measure defined by the weights $\{2w_i^0,\,s_i = 1\}$; $R_e^{(n)}(z)$ is associated with $P_e = P_e^{(n)}$. Thus, $R_e^{(n)}(z)/P_e(z)$ can be written as a finite continued fraction whose coefficients can be found from the three-term recurrence (24) for orthogonal polynomials on $\{x_i,\,s_i = 1\}$ for this measure. Similar facts hold for $P_o = P_o^{(n)}$ and the probability distribution given by $\{2w_j^0,\,s_j = -1\}$.
I now summarize the obtained results.
Theorem 5.
Let T m = T m ( n ) , m = 0 , 1 , , n , be monic polynomials which are orthogonal under the norm defined by { w j 0 } . Then, (25) holds. For the polynomial associated with T n 1 , R n 1 ( z ) = R e ( n ) ( z ) R o ( n ) ( z ) , (27) and (28) are valid. The polynomials R e ( n ) and R o ( n ) defined by (29) and (30) satisfy (31)–(33).

3.4. Main Representation

The polynomials R_e^{(n-1)} allow one to express \delta_m = \delta_m^{(n)}, m < N_e^{(n)}, as a rational function of x_n = x_{(n)} = \max_t x_t, namely,
\[
\frac{\delta_m^{(n)}}{W^{(n)}} = \frac{1}{W^{(n-1)}}\left[\frac{x_n^m\,R_e^{(n-1)}(x_n)}{P_e^{(n-1)}(x_n)} - \sum_{r=0}^{m-1}\delta_r^{(n-1)}\,x_n^{m-1-r}\right].
\]
To stress dependence on n, we write N e = N e ( n ) , N o = N o ( n ) , W = W ( n ) , and s j = s j ( n ) . The functions on the right-hand side of (34) correspond to the sample of size n 1 obtained by deleting x n from the original dataset.
For the reduced sample, N_e^{(n-1)} = N_o^{(n)} and P_e^{(n-1)}(x) = P_o^{(n)}(x); if originally s_j = s_j^{(n)} = -1, then s_j^{(n-1)} = 1 and P_o^{(n-1)}(x_j) = P_e^{(n)}(x_j)/(x_j - x_n). Therefore, because of (18),
\[
\frac{1}{W} = \frac{R_e^{(n-1)}(x_n)}{W^{(n-1)}\,P_e^{(n-1)}(x_n)}.
\]
For fixed n and 1 \le m \le N_e - 1,
\[
\frac{z^m\,R_e^{(n)}(z)}{P_e^{(n)}(z)} = \sum_{r=0}^{m-1}\delta_r\,z^{m-1-r} + 2W\sum_{s_i=1}\frac{x_i^m}{(z-x_i)\,P_o(x_i)\,P_e'(x_i)},
\]
implying (34).
Since P_e^{(1)}(z) = z - x_1, P_e^{(2)}(z) = z - x_2, and R_e^{(1)}(z) = R_e^{(2)}(z) = 1, repeated use of (35) gives
\[
\frac{1}{2W} = \prod_{p=0}^{n-2}\frac{R_e^{(n-p-1)}(x_{n-p})}{P_e^{(n-p-1)}(x_{n-p})} = \frac{\prod_{p=0}^{n-2}R_e^{(n-p-1)}(x_{n-p})}{|S|}.
\]
Here,
\[
S = S^{(n)} = \operatorname{Res}(P_e, P_o) = \prod_{s_i=1,\,s_j=-1}(x_i - x_j) = \prod_{s_i=1}P_o(x_i) = (-1)^{N_e N_o}\prod_{s_j=-1}P_e(x_j)
\]
is the resultant of the polynomials P_e and P_o (which are supposed not to have common roots). One can check by using (31) that
\[
S = A(z)\,P_e(z) + B(z)\,P_o(z),
\]
where the polynomial A(z) = -[S/(2W)]\,R_o^{(n)}(z) has degree N_o - 1 and B(z) = [S/(2W)]\,R_e^{(n)}(z) has degree N_e - 1.
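The several expressions for the resultant S, and the telescoping of |S| over nested subsamples used in the product formula above, can be confirmed on a hypothetical sample with alternating signs (largest point carrying s = +1):

```python
import math

x_even, x_odd = [2, 4, 6], [1, 3, 5]      # hypothetical n = 6 sample, alternating signs

S = math.prod(xi - xj for xi in x_even for xj in x_odd)
S_via_Po = math.prod(math.prod(xi - xj for xj in x_odd) for xi in x_even)
S_via_Pe = math.prod(math.prod(xj - xi for xi in x_even) for xj in x_odd)
sign = (-1) ** (len(x_even) * len(x_odd))

def even_part(sample):
    # s_i = +1 on the largest point and then on every second point below it
    n = len(sample)
    return [sample[i] for i in range(n) if (n - 1 - i) % 2 == 0]

xs = [1, 2, 3, 4, 5, 6]
# product of P_e^{(k)} evaluated at the next order statistic, k = 1, ..., n-1
telescoped = math.prod(
    math.prod(xs[k] - v for v in even_part(xs[:k])) for k in range(1, len(xs)))
```

For this sample S = −135, both product groupings agree, and the telescoped product equals |S| = 135.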
Actually, (34) and (35) hold if x_n is replaced by any x_k with s_k = 1, in which case R_e^{(n-1)} refers to the monic polynomial of degree N_o - 1 = N_e^{(n-1)} - 1 proportional to
\[
\sum_{s_j=-1}\ \prod_{\ell\neq j,\ s_\ell=-1}\frac{z-x_\ell}{x_j-x_\ell}\ \cdot\ \frac{1}{\prod_{i\neq k,\ s_i=1}(x_j-x_i)}.
\]
It corresponds to the sample of size n-1 obtained by removing x_k, when P_e^{(n-1)}(x_k) becomes P_o^{(n)}(x_k) = \prod_{s_j=-1}(x_k - x_j).
In an alternative form of (34), valid for x_k with s_k = -1, the monic polynomial is proportional to
\[
\sum_{s_i=1}\ \prod_{t\neq i,\ s_t=1}\frac{z-x_t}{x_i-x_t}\ \cdot\ \frac{1}{\prod_{j\neq k,\ s_j=-1}(x_i-x_j)},
\]
with \prod_{s_i=1}(x_k - x_i) = P_e^{(n)}(x_k) instead of P_o^{(n-1)}(x_k).
Our goal is to prove the following representation of δ m .
Theorem 6.
For any integer m, 0 \le m < N_e^{(n)}, n \ge 3, with S defined in (36),
\[
\delta_m = \frac{2W}{|S|}\,K_m(x_1,\dots,x_n).
\]
Here, K_m = K_m^{(n)} represents a homogeneous symmetric function of x_1, \dots, x_n of degree m + p_n (where p_n = (n-2)^2/4 for n even and p_n = (n-1)(n-3)/4 for n odd), which is a linear combination, with integer coefficients, of products of homogeneous symmetric polynomials in \{x_i,\ s_i = 1,\ i = 1, \dots, N_e\} and in \{x_j,\ s_j = -1,\ j = 1, \dots, N_o\}. One has K_m(-x_n, \dots, -x_1) = (-1)^m K_m(x_1, \dots, x_n), and K_m(x, \dots, x) = 0 if n \ge 4. The recursive Formula (40) relates K_m^{(n)}(x_1, \dots, x_n) to K_r^{(n-1)}(x_1, \dots, x_{n-1}), r = 0, \dots, m-1, based on the reduced sample x_t, 1 \le t \le n-1. One has
\[
\sum_i \frac{\partial K_m(x_1,\dots,x_n)}{\partial x_i} = m\,K_{m-1}(x_1,\dots,x_n),
\]
and
\[
\sum_i x_i\,\frac{\partial K_m(x_1,\dots,x_n)}{\partial x_i} = (m + p_n)\,K_m(x_1,\dots,x_n).
\]
Proof. 
As was already noticed, for the subsample x_t, t = 1, \dots, n-1, the polynomial P_e^{(n-1)}(z) coincides with P_o(z) = P_o^{(n)}(z). The corresponding result for the resultant is |S^{(n-1)}| = |S^{(n)}|/P_e^{(n-1)}(x_n).
Therefore, (35) implies that
\[
K_0^{(n)} = \frac{|S|}{2W^{(n)}} = R_e^{(n-1)}(x_n)\,K_0^{(n-1)},
\]
where the coefficients of the polynomial R_e^{(n-1)} depend only on x_1, \dots, x_{n-1}, and the K_p^{(n-1)} are evaluated on the sample of size n-1 obtained by removing x_k (with s_t^{(n-1)} = -s_t^{(n)} for t \neq k). The homogeneity degree of K_0 is (N_e^{(n-1)} - 1) + (N_e^{(n-2)} - 1) + \cdots + (N_e^{(2)} - 1) = p_n = (N_e^{(n)} - 1)(N_o^{(n)} - 1). If s_k = 1, then
\[
K_0^{(n)} = \sum_{q+r \le N_o-1} (-1)^{N_o-1-q-r}\,D_{N_o-1-q-r}\,x_k^{\,r}\,K_q^{(n-1)}.
\]
Because of (34) for 1 m N e ( n ) 1 ,
K m ( n ) = K 0 ( n 1 ) [ x n m R e ( n 1 ) ( x n ) x n m 1 P e ( n 1 ) ( x n ) ]
P e ( n 1 ) ( x n ) r = 1 m 1 K r ( n 1 ) x n m 1 r .
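The recursion can be exercised for the step n = 3 → n = 4 on a hypothetical ordered sample; here R_e^{(3)}(z) = z − (x_1 + x_3 − x_2) is the reconstructed associated polynomial of the reduced sample (signs flipped, s = +1 on x_1, x_3), and the n = 3 values K_0 = 1, K_1 = x_2 are taken from the examples listed at the end of the section.

```python
x1, x2, x3, x4 = 1.0, 2.5, 4.0, 7.0       # hypothetical ordered sample

# reduced 3-point sample (x4 removed, signs flipped): s = +1 on {x1, x3}
Re3 = lambda z: z - (x1 + x3 - x2)        # reconstructed associated polynomial
Pe3 = lambda z: (z - x1) * (z - x3)
K0_3, K1_3 = 1.0, x2                      # n = 3 values: K_0 = 1, K_1 = x_2

K0_4 = Re3(x4) * K0_3                             # the K_0 recursion
K1_4 = K0_3 * (x4 * Re3(x4) - Pe3(x4))            # (40) with m = 1 (the sum is empty)

K0_4_direct = (x2 + x4) - (x1 + x3)               # E_1 - D_1
K1_4_direct = x2 * x4 - x1 * x3                   # E_2 - D_2
```

For this sample both routes give K_0^{(4)} = 4.5 and K_1^{(4)} = 13.5, and the agreement is in fact a polynomial identity in x_1, \dots, x_4.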
If K^{(n)} = (K_0^{(n)}, \dots, K_{N_e-1}^{(n)})^T is an N_e-dimensional vector, then (39) and (40) mean that K^{(n)} = A^{(n)}K^{(n-1)}, where A^{(n)} is a matrix of size N_e \times N_o with elements
\[
A_{mp}^{(n)} =
\begin{cases}
\displaystyle\sum_{r=m}^{N_o-1+m-p}(-1)^{N_o-1+m-p-r}\,D_{N_o-1+m-p-r}\,x_n^{\,r}, & m \le p \le N_o-1,\\[2mm]
\displaystyle-\sum_{r=m-p-1}^{m-1}(-1)^{N_o-1+m-p-r}\,D_{N_o-1+m-p-r}\,x_n^{\,r}, & 0 \le p \le m-1.
\end{cases}
\]
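A minimal sketch of the matrix recursion, with the sign placements in (41) as reconstructed here: for n = 4 the 2 × 2 matrix A^{(4)} applied to K^{(3)} = (1, x_2)^T should reproduce K^{(4)} = (E_1 − D_1, E_2 − D_2)^T. Sample values are illustrative.

```python
import math
from itertools import combinations

def esym(vals, r):
    # elementary symmetric function D_r of the s = -1 points (D_0 = 1)
    return sum(math.prod(c) for c in combinations(vals, r))

def A_entry(m, p, No, d_pts, xn):
    # reconstructed (41)
    if p >= m:
        return sum((-1) ** (No - 1 + m - p - r) * esym(d_pts, No - 1 + m - p - r) * xn ** r
                   for r in range(m, No + m - p))
    return -sum((-1) ** (No - 1 + m - p - r) * esym(d_pts, No - 1 + m - p - r) * xn ** r
                for r in range(m - p - 1, m))

x1, x2, x3, x4 = 1.0, 2.5, 4.0, 7.0       # hypothetical ordered sample
d_pts, No = [x1, x3], 2                    # s = -1 points of the n = 4 sample
K3 = [1.0, x2]                             # K^{(3)} on the reduced sample
K4 = [sum(A_entry(m, p, No, d_pts, x4) * K3[p] for p in range(No)) for m in range(2)]
K4_direct = [(x2 + x4) - (x1 + x3), x2 * x4 - x1 * x3]
max_err = max(abs(a - b) for a, b in zip(K4, K4_direct))
```

Here A^{(4)} works out to ((x_4 − D_1, 1), (−D_2, x_4)), which indeed sends (1, x_2) to (E_1 − D_1, E_2 − D_2).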
The induction assumption, that for p < n every K_m^{(p)}, m \le N_e^{(p)} - 1, can be represented as a linear combination of products of homogeneous symmetric polynomials in \{x_i,\ s_i^{(p)} = 1\} and \{x_j,\ s_j^{(p)} = -1\} with integer coefficients, implies that K_m^{(n)}, m \le N_e - 1, has the claimed properties. Namely, it is homogeneous of the stated degree, it is symmetric in \{x_i,\ s_i = 1,\ i = 1, \dots, N_e\} and in \{x_j,\ s_j = -1,\ j = 1, \dots, N_o\}, and it can be written as a linear combination of products of homogeneous symmetric polynomials in these variables with integer coefficients.
To complete the proof, note that
\[
K_m(x_1+c,\dots,x_n+c) = \sum_{r=0}^{m}\binom{m}{r}c^r\,K_{m-r}(x_1,\dots,x_n) = K_m(x_1,\dots,x_n) + \cdots + c^m K_0(x_1,\dots,x_n),
\]
so that (37) follows, while (38) holds because of homogeneity. □
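The differential identities (37) and (38) can be spot-checked by finite differences on the n = 4 polynomials listed below, with the sign placements as reconstructed in this rendering (K_0 = E_1 − D_1, K_1 = E_2 − D_2, K_2 = E_2D_1 − D_2E_1); the evaluation point is hypothetical.

```python
K = {
    0: lambda x1, x2, x3, x4: (x2 + x4) - (x1 + x3),
    1: lambda x1, x2, x3, x4: x2 * x4 - x1 * x3,
    2: lambda x1, x2, x3, x4: x2 * x4 * (x1 + x3) - x1 * x3 * (x2 + x4),
}
pt, h, p4 = (1.0, 2.5, 4.0, 7.0), 1e-6, 1

def grad(f, pt, h):
    # central-difference gradient
    g = []
    for i in range(len(pt)):
        up = list(pt); up[i] += h
        dn = list(pt); dn[i] -= h
        g.append((f(*up) - f(*dn)) / (2 * h))
    return g

# (37): sum of partials of K_m equals m K_{m-1}
err37 = max(abs(sum(grad(K[m], pt, h)) - m * K[m - 1](*pt)) for m in (1, 2))
# (38): Euler identity for homogeneity degree m + p_n (p_4 = 1)
err38 = max(abs(sum(g * xi for g, xi in zip(grad(K[m], pt, h), pt)) - (m + p4) * K[m](*pt))
            for m in (0, 1, 2))
```

Both residuals vanish up to the finite-difference truncation error.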
In particular, with the non-negative and shift-invariant K_0 specified in (39) as a product of polynomials (29) evaluated at successive order statistics, one obtains the following representation of (2):
\[
\delta_1^{(n)} = x_n - \frac{P_e^{(n-1)}(x_n)}{R_e^{(n-1)}(x_n)},
\]
with the polynomials R_e^{(n-1)} and P_e^{(n-1)} based on x_i, i = 1, \dots, n-1, with the flipped signs s_i^{(n-1)} = -s_i^{(n)}.
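For n = 4 the representation δ_1 = x_4 − P_e^{(3)}(x_4)/R_e^{(3)}(x_4) can be compared with the ratio K_1/K_0 = (E_2 − D_2)/(E_1 − D_1); R_e^{(3)}(z) = z − (x_1 + x_3 − x_2) is the reconstructed reduced-sample polynomial, and the sample values are hypothetical.

```python
x1, x2, x3, x4 = 1.0, 2.5, 4.0, 7.0       # hypothetical ordered sample
Re3 = x4 - (x1 + x3 - x2)                  # reconstructed R_e^{(3)} at x_4
Pe3 = (x4 - x1) * (x4 - x3)                # P_e^{(3)} at x_4 (roots x_1, x_3)
delta1_rep = x4 - Pe3 / Re3                # representation via the largest point
delta1_ratio = (x2 * x4 - x1 * x3) / ((x2 + x4) - (x1 + x3))   # K_1 / K_0 for n = 4
```

Both give δ_1 = 3.0 here, a value lying strictly inside the sample range, as an estimator of a common mean should.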
Theorem 6 shows that \delta_1 has some resistance to extreme observations:
\[
\lim_{x_n \to \infty} \delta_1^{(n)}(x_1,\dots,x_n) = \delta_1^{(n-1)}(x_1,\dots,x_{n-1}),
\]
and
\[
\lim_{x_1 \to -\infty} \delta_1^{(n)}(x_1,\dots,x_n) = \delta_1^{(n-1)}(x_2,\dots,x_n).
\]
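The two limits can be observed numerically for n = 4, where δ_1^{(4)} = (E_2 − D_2)/(E_1 − D_1) (signs as reconstructed above) and the reduced-sample estimator δ_1^{(3)} is the middle point of the remaining three observations; sample values are illustrative.

```python
def delta1_4(x1, x2, x3, x4):
    # n = 4 estimator K_1 / K_0, with the reconstructed sign placements
    return (x2 * x4 - x1 * x3) / ((x2 + x4) - (x1 + x3))

x1, x2, x3, x4 = 1.0, 2.5, 4.0, 7.0
# sending the largest point to infinity should recover delta_1^{(3)}(x1,x2,x3) = x2
drift_hi = [abs(delta1_4(x1, x2, x3, big) - x2) for big in (1e3, 1e6, 1e9)]
# sending the smallest point to -infinity should recover delta_1^{(3)}(x2,x3,x4) = x3
drift_lo = [abs(delta1_4(-big, x2, x3, x4) - x3) for big in (1e3, 1e6, 1e9)]
```

Both drifts shrink toward zero as the outlying observation moves away, illustrating the claimed resistance.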
Here are examples of the polynomials K_m for smaller values of n, given in terms of elementary symmetric functions:
n = 3, p_3 = 0: K_0(x_1, x_2, x_3) \equiv 1, \quad K_1(x_1, x_2, x_3) = x_2;
n = 4, p_4 = 1: K_0(x_1, \dots, x_4) = E_1^{(4)} - D_1^{(4)},
K_1(x_1, \dots, x_4) = E_2^{(4)} - D_2^{(4)},
K_2(x_1, \dots, x_4) = E_2^{(4)}D_1^{(4)} - D_2^{(4)}E_1^{(4)};
n = 5, p_5 = 2: K_0(x_1, \dots, x_5) = D_2^{(5)} - [D_1^{(5)}]^2 + E_1^{(5)}D_1^{(5)} - E_2^{(5)},
K_1(x_1, \dots, x_5) = D_2^{(5)}E_1^{(5)} - D_2^{(5)}D_1^{(5)} - E_3^{(5)},
K_2(x_1, \dots, x_5) = D_2^{(5)}E_2^{(5)} - D_1^{(5)}E_3^{(5)} - [D_2^{(5)}]^2;
n = 6, p_6 = 4: K_0(x_1, \dots, x_6) = (E_1 - D_1)(E_3 - D_3 + E_2D_1 - E_1D_2) - (E_2 - D_2)^2,
K_1(x_1, \dots, x_6) = E_3(E_1D_1 + D_2 - E_2 - D_1^2) + D_3(E_1D_1 + E_2 - D_2 - E_1^2),
K_2(x_1, \dots, x_6) = (E_1 - D_1)(E_3D_2 - E_2D_3) - (E_3 - D_3)^2.
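As a consistency check on the listed examples (with the sign placements as reconstructed here, and with the alternating assignment s_i = (−1)^{n−i}, so that the largest observation carries s = +1), the estimator δ_1 = K_1/K_0 evaluated on the symmetric samples 1, …, n should return the sample median:

```python
import math
from itertools import combinations

def esym(vals, r):
    return sum(math.prod(c) for c in combinations(vals, r))

samples = {
    4: ([2, 4], [1, 3]),          # (s = +1 points, s = -1 points) for x = 1..n
    5: ([1, 3, 5], [2, 4]),
    6: ([2, 4, 6], [1, 3, 5]),
}

medians = {}
for n, (ev, od) in samples.items():
    E = [esym(ev, r) for r in range(4)]
    D = [esym(od, r) for r in range(4)]
    if n == 4:
        K0, K1 = E[1] - D[1], E[2] - D[2]
    elif n == 5:
        K0 = D[2] - D[1] ** 2 + E[1] * D[1] - E[2]
        K1 = D[2] * E[1] - D[2] * D[1] - E[3]
    else:
        K0 = (E[1] - D[1]) * (E[3] - D[3] + E[2] * D[1] - E[1] * D[2]) - (E[2] - D[2]) ** 2
        K1 = E[3] * (E[1] * D[1] + D[2] - E[2] - D[1] ** 2) \
           + D[3] * (E[1] * D[1] + E[2] - D[2] - E[1] ** 2)
    medians[n] = K1 / K0
```

The computed values are 2.5, 3.0, and 3.5, i.e., the medians of 1..4, 1..5, and 1..6, with K_0 = 2, 3, 18 respectively, all positive as required.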

4. Conclusions

The present work allows for obtaining meaningful consensus values under unreliable or missing uncertainties by using Bayes estimators (5) or (7). This method leads to mathematically challenging properties of self-dual weights (1) and their extension (4). The orthogonal polynomials that involve rank parity exhibit fascinating symmetry.
The approach explores non-traditional statistical estimation in the absence of variance information. The recursive algorithm detailed in Theorem 6 may be useful for practical calculations. The paper presents new mathematical findings and sets the stage for a more detailed exploration of a novel statistical methodology for estimating common parameters in the presence of variance heterogeneity.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Morris, C.N. Parametric empirical Bayes inference: Theory and applications. J. Am. Stat. Assoc. 1983, 78, 47–65.
  2. Rukhin, A.L. Estimating heterogeneity variances to select a random effects model. J. Stat. Plan. Inference 2019, 202, 1–13.
  3. Templ, M. Enhancing precision in large scale data-analysis: An innovative robust imputation algorithm for managing outliers and missing values. Mathematics 2023, 11, 2729.
  4. Spiegelhalter, D. The Art of Uncertainty; Norton: New York, NY, USA, 2025.
  5. Possolo, A. Measurement science meets the reproducibility challenge. Metrologia 2022, 80, 044002.
  6. Hand, E. Citizen science: People power. Nature 2010, 466, 685–687.
  7. Rukhin, A.L. Estimation of the common mean from heterogeneous normal observations with unknown variances. J. R. Stat. Soc. Ser. B 2017, 79, 1601–1618.
  8. Rukhin, A.L. Orthogonal polynomials for self-dual weights. J. Approx. Theory 2023, 288, 105865.
  9. Trefethen, L.N. Approximation Theory and Approximation Practice; SIAM: Philadelphia, PA, USA, 2013.
  10. Borodin, A. Duality of orthogonal polynomials on a finite set. J. Stat. Phys. 2002, 109, 1109–1120.
  11. Genest, V.; Tsujimoto, S.; Vinet, L.; Zhedanov, A. Persymmetric Jacobi matrices, isospectral deformations and orthogonal polynomials. J. Math. Anal. Appl. 2017, 450, 915–928.
  12. Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: New York, NY, USA, 1993.
  13. Fernandez, C.; Steel, M.F.J. Reference priors for the general location-scale model. Stat. Probab. Lett. 1999, 43, 377–384.
  14. Severini, T.A.; Mukherjea, R.; Ghosh, M. On an exact probability matching property of right-invariant priors. Biometrika 2002, 89, 952–957.
  15. Sun, D.; Berger, J.O. Objective Bayesian analysis for the multivariate normal model. In Bayesian Statistics 8; University Press: Oxford, UK, 2007; pp. 525–562.
  16. Box, G.; Tiao, G. Bayesian Inference in Statistical Analysis, 2nd ed.; Wiley: New York, NY, USA, 1992.
  17. Bauwens, L. Bayesian Full Information Analysis of Simultaneous Equation Models Using Integration by Monte Carlo; Springer: Berlin/Heidelberg, Germany, 1994.
  18. Huber, P.J.; Ronchetti, E.M. Robust Statistics, 2nd ed.; Wiley: New York, NY, USA, 2009.
  19. Lehmann, E. Elements of Large-Sample Theory; Springer: New York, NY, USA, 1999.
  20. Stuart, A.; Ord, J.K. Kendall’s Advanced Theory of Statistics, 6th ed.; E. Arnold: London, UK, 1994; Volume 1.
  21. Lu, I.-L.; Richards, D. Random discriminants. Ann. Stat. 1993, 21, 1982–2000.