1. Introduction
The theory of parameter estimation offers a plethora of lower bounds (as well as upper bounds), which characterize the fundamental performance limits of any estimator in a given parametric model. In this context, it is common to distinguish between Bayesian bounds (see, e.g., the Bayesian Cramér–Rao bound [1], the Bayesian Bhattacharyya bound, the Bobrovsky–Zakai bound [2], the Bellini–Tartara bound [3], the Chazan–Zakai–Ziv bound [4], the Weiss–Weinstein bound [5,6], and more; see [7] for a comprehensive overview) and non-Bayesian bounds. In the former, the parameter to be estimated is considered a random variable with a given probability law, whereas in the latter, it is assumed to be an unknown deterministic constant. The category of non-Bayesian bounds is further subdivided into two subclasses: one is associated with local bounds that hold for classes of estimators with certain limitations, such as unbiased estimators (see, e.g., the Cramér–Rao bound [8,9,10,11,12], the Bhattacharyya bound [13], the Barankin bound [14], the Chapman–Robbins bound [15], the Fraser–Guttman bound [16], the Kiefer bound [17], and more), and the other is the subclass of minimax bounds (see, e.g., Ziv and Zakai [18], Hajek [19], Le Cam [20], Assouad [21], Fano [22], Lehmann (Sections 4.2–4.4 in [23]), Nazin [24], Yang and Barron [25], Guntuboyina [26,27], Kim [28], and many more).
In this paper, we focus on the minimax approach, and more concretely, on the local minimax approach. According to the minimax approach, we are given a parametric family of probability density functions (or probability mass functions, in the discrete case), $\{f_\theta(x),\ \theta\in\Theta\}$, where $\theta$ is a d-dimensional parameter vector, $\Theta \subseteq \mathbb{R}^d$ is the parameter space, n is a positive integer designating the number of observations, and we define a loss function, $L(\hat{\theta},\theta)$, where $\hat{\theta} = \hat{\theta}(x)$ is an estimator, which is a function of the observations $x = (x_1,\ldots,x_n)$ only. The minimax performance is defined as
$$R_n = \inf_{\hat{\theta}} \sup_{\theta\in\Theta} E_\theta\{L(\hat{\theta},\theta)\},$$
where $E_\theta\{\cdot\}$ denotes expectation w.r.t. $f_\theta$. As customary, we consider here loss functions with the property that $L(\hat{\theta},\theta)$ depends on $\hat{\theta}$ and $\theta$ only via their difference, that is, $L(\hat{\theta},\theta) = \rho(\hat{\theta}-\theta)$, where the function $\rho$ satisfies certain assumptions (see Section 2). The local asymptotic minimax performance at the point $\theta_0 \in \Theta$ is defined as follows (see also, e.g., [19]). Let $\{a_n\}$ be a positive sequence, tending to infinity, with the property that $R(\theta_0) \stackrel{\triangle}{=} \lim_{n\to\infty} a_n R_n(\theta_0)$, with $R_n(\theta_0)$ being the minimax risk localized at $\theta_0$, is a strictly positive finite constant. Then, we say that $R(\theta_0)$ is the local asymptotic minimax performance with respect to (w.r.t.) $\{a_n\}$ at the point $\theta_0$. Roughly speaking, the significance is that the performance of a good estimator, $\hat{\theta}$, at $\theta_0$ is about $R(\theta_0)/a_n$. For example, in the scalar mean square error (MSE) case, where $\rho(\varepsilon) = \varepsilon^2$, and where the observations are Gaussian, i.i.d., with mean $\theta$ and known variance $\sigma^2$, it is actually shown in Example 2.4, p. 257 in [23] that $R(\theta) = \sigma^2$ w.r.t. $a_n = n$, for all $\theta$, which is attained by the sample-mean estimator, $\hat{\theta} = \frac{1}{n}\sum_{i=1}^n X_i$.
Our focus in this work is on the derivation of some new lower bounds with the following properties: (i) they are essentially free of regularity conditions on the smoothness of the parametric family $\{f_\theta,\ \theta\in\Theta\}$; (ii) they are relatively simple and easy to calculate, at least numerically, which amounts to the property that the bound contains only a small number of auxiliary parameters to be numerically optimized (typically, no more than two or three parameters); (iii) they are tighter than earlier reported bounds that are associated with similar calculation efforts as described in (ii); and (iv) they lend themselves to extensions that yield even stronger bounds (albeit with more auxiliary parameters to be optimized), as well as to extensions to vector parameters. We propose two families of lower bounds on the minimax risk, along with their local versions, with the four above-described properties. The first applies to any convex, symmetric loss function $\rho$, whereas the second is more specifically oriented to the moments of the estimation error, $E_\theta|\hat{\theta}-\theta|^t$, where t is a positive real, not necessarily an integer, with special attention devoted to the MSE case, $t = 2$. For the sake of simplicity and clarity of the exposition, in the first two main sections of the paper (Sections 3 and 4), our focus is on the case of a scalar parameter, as most of our examples are associated with the scalar case. In Section 5, we extend some of our findings to the vector case.
To put this work in the perspective of earlier work on minimax estimation, we next briefly review some of the basic approaches in this problem area. Admittedly, considering the vast amount of literature on the subject, our review below is by no means exhaustive. For a more comprehensive review, the reader is referred to Kim [28].
First, observe the simple fact that the minimax performance is lower bounded by the Bayesian performance under the same loss function (see, e.g., [1,2,3,4,5,6,7]) for any prior on the parameter, $\theta$, and so, every lower bound on the Bayesian performance is automatically a valid lower bound on the minimax performance as well (see Section 4.2 in [23]). Indeed, in Section 2.3 in [26], it is argued that the vast majority of existing minimax lower bounding techniques are based upon bounding the Bayes risk from below w.r.t. some prior. Many of these Bayesian bounds, however, are subject to certain restrictions and regularity conditions concerning the smoothness of the prior and of the family of densities, $\{f_\theta,\ \theta\in\Theta\}$.
The earliest bound of this kind dates back to Ziv and Zakai's 1969 article [18] on parameter estimation, applied mostly in the context of time-delay estimation. There, the prior puts all its mass equally on two values, $\theta_1$ and $\theta_2$, of the parameter $\theta$, and an auxiliary hypothesis testing problem of distinguishing between the two hypotheses, $H_1$ and $H_2$, with equal priors, is considered (see Section 2 for exact definitions). A simple argument regarding the sub-optimality of a decision rule that is based on estimating $\theta$ and deciding on the hypothesis with the closer value of $\theta$, combined with Chebychev's inequality, yields a simple lower bound on the corresponding Bayes risk, and hence also on the minimax risk, in terms of the probability of error of the optimal decision rule. Five years later, Bellini and Tartara [3], and then independently, Chazan, Zakai, and Ziv [4], improved the bound of [18] using somewhat different arguments and obtained Bayesian bounds that apply to the uniform prior. These bounds are also given in terms of the error probability pertaining to the optimal maximum a posteriori (MAP) decision rule of binary hypothesis testing with equal priors, but this time in an integral form. These bounds were demonstrated to be fairly tight in several application examples, but they are rather difficult to calculate in most cases. Shortly before the Bellini–Tartara and the Chazan–Zakai–Ziv articles were published, Le Cam [20] proposed a minimax lower bound, which is also given in terms of the error probability associated with binary hypothesis testing, or equivalently, the total variation distance between $f_{\theta_1}$ and $f_{\theta_2}$, under the postulate that the loss function is a metric. We will refer to Le Cam's bound in a more detailed manner later, in the context of our first proposed bound.
A decade later, Assouad [21] extended Le Cam's two-point testing bound to multiple points: instead of just two test points, $\theta_1$ and $\theta_2$, as before, there are, more generally, m test points, $\theta_1,\ldots,\theta_m$, and correspondingly, the auxiliary hypothesis testing problem consists of m hypotheses, $H_i$, $i = 1,\ldots,m$, with certain priors (again, to be defined in Section 2). Based on such test points, Assouad devised the so-called hypercube method. Another related bounding technique that is based on multiple test points, referred to as Fano's method, amounts to further bounding from below the error probability of multiple hypotheses using Fano's inequality (see Section 2.10 in [29]). Considering the large number of auxiliary parameters to be optimized when multiple hypotheses are present, these bounds demand heavy computational effort. Moreover, Fano's inequality is often loose, even though it is adequate for the purpose of proving converses to coding theorems in information theory [29]. In later years, Le Cam [30] and Yu [31] extended Le Cam's original approach to apply to testing mixtures of densities. More recently, Yang and Barron [25] related the minimax problem to the metric entropy of the parametric family, Cai and Zhou [32] combined Le Cam's and Assouad's methods by considering a larger number of dimensions, and Guntuboyina [26,27] pursued a different direction by deriving minimax lower bounds using f-divergences.
The outline of this article is as follows. In Section 2, we define the problem setting, provide a few formal definitions along with background, and establish the notation. In Section 3, we develop the first family of bounds, and in Section 4, we present the second family, both for the scalar parameter case. Finally, in Section 5, we extend some of our findings to the vector case.
2. Problem Setting, Background, Definitions, and Notation
As outlined in the last paragraph of the Introduction, Sections 3 and 4 address the scalar parameter case, whereas in Section 5, we extend some of the results to the case of a vector parameter of dimension d. In order to avoid repetition, we formalize the problem and define the notation here for the more general vector case, with the understanding that for the scalar case, all definitions remain the same, except that they are confined to the special case of $d = 1$.
Consider a family of probability density functions (PDFs), $\{f_\theta(x),\ \theta\in\Theta\}$, where $\theta = (\theta_1,\ldots,\theta_d)$ is a parameter vector of dimension d to be estimated, and $\Theta \subseteq \mathbb{R}^d$ is the parameter space. We denote by $E_\theta\{\cdot\}$ the expectation operator w.r.t. $f_\theta$. Let X be a random vector of observations, governed by $f_\theta$ for some $\theta\in\Theta$. The support of $f_\theta$ is assumed to be $\mathcal{X}^n$, the nth Cartesian power of the alphabet $\mathcal{X}$ of each component $X_i$, $i = 1,\ldots,n$. The alphabet $\mathcal{X}$ may be a finite set, a countable set, a finite interval, an infinite interval, or the entire real line. In the first two cases, the PDFs should be understood to be replaced by probability mass functions, and integrations over the observation space should be replaced by summations. A realization of X will be denoted by $x = (x_1,\ldots,x_n)$.
An estimator, $\hat{\theta}$, is given by any function of the observation vector, x, that is, $\hat{\theta} = \hat{\theta}(x)$. Since X is random, so is the estimate $\hat{\theta}(X)$, as well as the estimation error, $\varepsilon = \hat{\theta}(X) - \theta$. We associate with every possible vector value, $\varepsilon = (\varepsilon_1,\ldots,\varepsilon_d)$, of the estimation error a certain loss (or "cost", or "price"), $\rho(\varepsilon)$, where $\rho$ is a non-negative function with the following properties: (i) monotonically non-increasing in each component, $\varepsilon_i$, of $\varepsilon$, wherever $\varepsilon_i < 0$, $i = 1,\ldots,d$; (ii) monotonically non-decreasing in each component, $\varepsilon_i$, of $\varepsilon$, wherever $\varepsilon_i > 0$, $i = 1,\ldots,d$; and (iii) $\rho(0) = 0$.
Referring to Sections 3 and 4, which deal with the scalar case ($d = 1$), we further adopt the following assumptions regarding the loss function $\rho$. In Section 3, we assume that $\rho$ is also: (iv) convex, and (v) symmetric, i.e., $\rho(-\varepsilon) = \rho(\varepsilon)$ for every $\varepsilon$. In Section 4, we assume, more specifically, that $\rho(\varepsilon) = |\varepsilon|^t$, where t is a positive constant, not necessarily an integer. This is a special case of the class of loss functions considered in Section 3, except when $t < 1$, in which case $\rho$ is a concave (rather than a convex) function of $|\varepsilon|$. In Section 5, for the vector extension of the results of Section 3, the above-mentioned symmetry assumption (v) is extended to become radial symmetry; namely, $\rho(\varepsilon)$ will be assumed to depend on $\varepsilon$ only via its Euclidean norm, $\|\varepsilon\|$.
The expected cost of an estimator $\hat{\theta}$ at a point $\theta \in \Theta$ is defined as
$$R_n(\hat{\theta};\theta) = E_\theta\{\rho(\hat{\theta}(X) - \theta)\}. \quad (1)$$
The global minimax performance is defined as
$$R_n = \inf_{\hat{\theta}} \sup_{\theta\in\Theta} R_n(\hat{\theta};\theta). \quad (2)$$
Another related notion is that of local asymptotic minimax performance, defined in Section 1, and repeated here for the sake of completeness. Let $\{a_n\}$ be a positive sequence, tending to infinity, with the property that $\lim_{n\to\infty} a_n R_n(\theta)$, with the risk defined as in (2) but localized at $\theta$, is a strictly positive finite constant, $R(\theta)$. Then, we say that $R(\theta)$ is the local asymptotic minimax performance w.r.t. $\{a_n\}$ at $\theta$. The sequence $\{a_n\}$ is referred to as the convergence rate of the minimax estimator.
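The convergence-rate definition can be illustrated with the Gaussian example recalled in Section 1: for i.i.d. $N(\theta,\sigma^2)$ observations and the sample-mean estimator, the risk decays like $\sigma^2/n$, so with $a_n = n$ the normalized risk settles at $\sigma^2$. A minimal Monte Carlo sketch (our own illustration; the parameter value, sample sizes, and seed are arbitrary choices):

```python
import random

def mse_of_sample_mean(n, theta=0.7, sigma=1.0, trials=20000, seed=1):
    # Monte Carlo estimate of E_theta{(thetahat - theta)^2} for the sample mean
    rng = random.Random(seed)
    s = 0.0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        s += (xbar - theta) ** 2
    return s / trials

for n in (5, 20, 80):
    print(n, n * mse_of_sample_mean(n))   # a_n * risk ≈ sigma^2 = 1
```

The product $n \cdot \mathrm{MSE}$ is essentially constant across sample sizes, which is exactly the sense in which $a_n = n$ is the convergence rate here.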
As in earlier articles on minimax lower bounds, many of our proposed bounds are given in terms of the error probability pertaining to certain auxiliary hypothesis testing problems that are associated with two or more test points in the parameter space, where the choice of those test points is subject to optimization. We therefore provide a few definitions, notation conventions, and background associated with elementary hypothesis testing. For further details, the reader is referred to any one of many textbooks that cover the topic, for example, Van Trees (Sections 2.2 and 2.3 in [1]), Helstrom (Chapter III in [33]), or Whalen (Chapter 5 in [34]).
For given $\theta_1$ and $\theta_2$, both in $\Theta$, consider the problem of deciding between two possible hypotheses regarding the probability distribution that governs the given vector of observations, X. Under hypothesis $H_1$, X is governed by $f_{\theta_1}$, and under hypothesis $H_2$, X is governed by $f_{\theta_2}$. Suppose also that the a priori probability of $H_1$ being the actual underlying hypothesis is q, where $0 \le q \le 1$ is given, and so the a priori probability of $H_2$ is the complementary probability, $1-q$. When $q = 1/2$, we say that the priors are equal; otherwise, the priors are unequal. A decision rule $\Omega$ is a partition of the observation space, $\mathcal{X}^n$, into two disjoint decision regions, $\Omega_1$ and its complementary region, $\Omega_2 = \mathcal{X}^n \setminus \Omega_1$: given that $x \in \Omega_1$, we decide in favor of $H_1$; otherwise, we decide in favor of $H_2$. The probability of error, associated with the hypotheses $H_1$ and $H_2$, referring to test points $\theta_1$ and $\theta_2$, respectively, with priors q and $1-q$, respectively, when using the decision rule $\Omega$, is defined as
$$P_e(\Omega) = q \int_{\Omega_2} f_{\theta_1}(x)\,dx + (1-q) \int_{\Omega_1} f_{\theta_2}(x)\,dx.$$
A well-known elementary result in decision theory asserts that the optimal decision rule, $\Omega^*$, in the sense of minimizing $P_e(\Omega)$, is given by
$$\Omega_1^* = \{x:\ q f_{\theta_1}(x) \ge (1-q) f_{\theta_2}(x)\},$$
where it should be pointed out that attributing the case of a tie, $q f_{\theta_1}(x) = (1-q) f_{\theta_2}(x)$, to $H_1$ is completely arbitrary, and could alternatively have been attributed to $H_2$ without affecting the probability of error. Here and in the sequel, the minimum probability of error, associated with $\Omega^*$, namely, $P_e(\Omega^*)$, will be denoted more simply by $P_e(q;\theta_1,\theta_2)$. On substituting $\Omega^*$ into the above definition of the probability of error, we obtain
$$P_e(q;\theta_1,\theta_2) = \int \min\{q f_{\theta_1}(x),\ (1-q) f_{\theta_2}(x)\}\,dx = 1 - \int \max\{q f_{\theta_1}(x),\ (1-q) f_{\theta_2}(x)\}\,dx. \quad (8)$$
The expression with the minimum will appear in some of our lower bounds in the sequel and will be recognized and interpreted as $P_e(q;\theta_1,\theta_2)$. The expression with the maximum is the one that extends to multiple hypothesis testing, as will be detailed below.
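The minimum error probability in (8) is easy to evaluate numerically for simple models. The following sketch (our own illustration, assuming a pair of unit-variance Gaussian densities) integrates the min-expression on a grid and checks it against the closed-form MAP error probability for this Gaussian pair:

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # standard normal cdf
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pe_min_integral(q, th1, th2, lo=-10.0, hi=11.0, steps=200001):
    # numerically integrate  min{ q f_th1(x), (1-q) f_th2(x) }, as in Eq. (8)
    h = (hi - lo) / (steps - 1)
    s = 0.0
    for i in range(steps):
        x = lo + i * h
        s += min(q * phi(x - th1), (1.0 - q) * phi(x - th2)) * h
    return s

def pe_closed_form(q, th1, th2):
    # MAP threshold for two unit-variance Gaussians with priors (q, 1-q)
    xs = 0.5 * (th1 + th2) + math.log(q / (1.0 - q)) / (th2 - th1)
    return q * (1.0 - Phi(xs - th1)) + (1.0 - q) * Phi(xs - th2)

q, th1, th2 = 0.3, 0.0, 1.0
print(pe_min_integral(q, th1, th2), pe_closed_form(q, th1, th2))  # the two agree
```

For $q = 1/2$ and unit separation, both evaluations return the familiar value $\Phi(-1/2) \approx 0.3085$.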
The auxiliary problem of binary hypothesis testing extends from two hypotheses to a general number, m, of hypotheses, associated with m test points, $\theta_1,\ldots,\theta_m \in \Theta$, in the following manner. Under hypothesis $H_i$, the observation vector X is governed by $f_{\theta_i}$, and the a priori probability of $H_i$ being the underlying true hypothesis is denoted $p_i$, $i = 1,\ldots,m$, where $p_1,\ldots,p_m$ are given non-negative numbers summing to unity. Here, a decision rule $\Omega = (\Omega_1,\ldots,\Omega_m)$ is a partition of $\mathcal{X}^n$ into m disjoint regions such that if $x \in \Omega_i$, we decide in favor of $H_i$, $i = 1,\ldots,m$. The probability of error associated with $\Omega$, $\{\theta_i\}$ and $\{p_i\}$ is defined as
$$P_e(\Omega) = \sum_{i=1}^m p_i \int_{\Omega_i^{\mathrm{c}}} f_{\theta_i}(x)\,dx,$$
where $\Omega_i^{\mathrm{c}}$ is complementary to $\Omega_i$. The optimal MAP decision rule $\Omega^*$ selects the hypothesis $H_i$ whose index i maximizes the product $p_i f_{\theta_i}(x)$ among all $i \in \{1,\ldots,m\}$, for the given x, where ties are broken arbitrarily. The probability of error associated with $\Omega^*$ is well known to be given by
$$P_e(\Omega^*) = 1 - \int \max_{1\le i\le m} p_i f_{\theta_i}(x)\,dx.$$
Note that for $m > 2$, this is different from the expression
$$\int \min_{1\le i\le m} p_i f_{\theta_i}(x)\,dx,$$
as the latter can be interpreted as the probability that the index i of the true hypothesis minimizes (rather than maximizes) the product $p_i f_{\theta_i}(x)$ over $i \in \{1,\ldots,m\}$ for the given x. Imagine an observer that, upon observing a realization x of the random vector X, creates a list of the indices with the k largest values of $p_i f_{\theta_i}(x)$ for some $k < m$, and an error is defined as the event where the correct i is not in that list. This is referred to as a list error, which is a term borrowed from the fields of coded communication and information theory. The last expression is the probability of list error for $k = m - 1$. We will encounter this expression later in certain versions of our lower bound. This completes the background needed about hypothesis testing.
As described in Section 1, our objective in this work is to derive relatively simple and easily computable lower bounds on the minimax risk that are as tight as possible. While many existing lower bounds in the literature are satisfactory in terms of yielding the correct rate of convergence, $\{a_n\}$, here we wish also to improve the bound on the constant factor, $R(\theta)$. Many of our examples involve numerical calculations, which include optimization over auxiliary parameters and occasionally also numerical integrations. All these calculations were carried out using MATLAB R2019a (9.6.0.1072779) 64-bit (win64).
3. Lower Bounds for Convex Symmetric Loss Functions
As explained earlier, here and in Section 4, we consider a scalar parameter, namely, $d = 1$.
Theorem 1. Let the assumptions of Section 2 be satisfied for $d = 1$ and let $\rho$ be a symmetric convex loss function. Then,
$$\inf_{\hat{\theta}} \sup_{\theta\in\Theta} E_\theta\{\rho(\hat{\theta}-\theta)\} \ \ge\ \sup_{\theta_1,\theta_2\in\Theta}\ \sup_{0\le q\le 1}\ 2\rho\!\left(\frac{\theta_2-\theta_1}{2}\right) P_e(q;\theta_1,\theta_2), \quad (11)$$
where $P_e(q;\theta_1,\theta_2)$ is defined as in Equation (8).

Proof of Theorem 1. For every $\theta_1, \theta_2 \in \Theta$ and $q \in [0,1]$,
$$\begin{aligned}
\sup_{\theta\in\Theta} E_\theta\{\rho(\hat{\theta}-\theta)\} &\ge q E_{\theta_1}\{\rho(\hat{\theta}-\theta_1)\} + (1-q) E_{\theta_2}\{\rho(\hat{\theta}-\theta_2)\}\\
&\stackrel{\text{(a)}}{=} q E_{\theta_1}\{\rho(\hat{\theta}-\theta_1)\} + (1-q) E_{\theta_2}\{\rho(\theta_2-\hat{\theta})\}\\
&= \int \left[q f_{\theta_1}(x)\rho(\hat{\theta}(x)-\theta_1) + (1-q) f_{\theta_2}(x)\rho(\theta_2-\hat{\theta}(x))\right] dx\\
&\ge \int \min\{q f_{\theta_1}(x),\ (1-q) f_{\theta_2}(x)\}\left[\rho(\hat{\theta}(x)-\theta_1) + \rho(\theta_2-\hat{\theta}(x))\right] dx\\
&\stackrel{\text{(b)}}{\ge} 2\rho\!\left(\frac{\theta_2-\theta_1}{2}\right) \int \min\{q f_{\theta_1}(x),\ (1-q) f_{\theta_2}(x)\}\,dx\\
&= 2\rho\!\left(\frac{\theta_2-\theta_1}{2}\right) P_e(q;\theta_1,\theta_2),
\end{aligned}$$
where (a) is due to the assumed symmetry of $\rho$ and (b) is by its assumed convexity, as $\rho(u) + \rho(v) \ge 2\rho((u+v)/2)$. Since the resulting inequality applies to every $\theta_1$, $\theta_2$ and q, it applies, in particular, also to the supremum over these auxiliary parameters. This completes the proof of Theorem 1. □
Before we proceed, two comments are in order:

Note that $P_e(q;\theta_1,\theta_2)$ is a concave function of q for fixed $(\theta_1,\theta_2)$, as it can be presented as the minimum among a family of affine functions of q, given by $q\int_{\Omega_2} f_{\theta_1}(x)\,dx + (1-q)\int_{\Omega_1} f_{\theta_2}(x)\,dx$, where $\Omega_1$ runs over all possible subsets of the observation space, $\mathcal{X}^n$ (with $\Omega_2$ its complement). Another way to see why this is true is by observing that $P_e(q;\theta_1,\theta_2)$ is given in (8) by an integral whose integrand, $\min\{q f_{\theta_1}(x),\ (1-q) f_{\theta_2}(x)\}$, is concave in q. Clearly, $P_e(q;\theta_1,\theta_2) = 0$ whenever $q = 0$ or $q = 1$. Thus, $P_e(q;\theta_1,\theta_2)$ is maximized by some q between 0 and 1. If $P_e(q;\theta_1,\theta_2)$ is strictly concave in q, then the maximizing q is unique.

Note that the lower bound (11) is tighter than the lower bound of Ziv and Zakai, which was obtained in Equations (6)–(9a) in [18], both because of the factor of 2 and because of the freedom to optimize q rather than setting $q = 1/2$. In a further development of [18], the factor of 2 was accomplished too, but at the price of assuming that the density of the estimation error is symmetric about the origin (see the discussion after (10) therein), which limits the class of estimators to which the bound applies. The factor of 2 and the degree of freedom q are also the two ingredients that make the difference between (11) and the lower bound due to Le Cam [20] (see also [26,28]). In Chapter 2 in [26], Guntuboyina reviews standard bounding techniques, including those of Le Cam, Assouad, and Fano. In particular, in Example 2.3.2 therein, Guntuboyina presents a lower bound in terms of the error probability associated with general priors, which in the case of two hypotheses is given in terms of $P_e(q;\theta_1,\theta_2)$ in our notation. Now, if $\rho$ is symmetric and monotonically non-decreasing in the absolute error, the resulting bound is of the same form as (11), except that it lacks the prefactor of 2.
Our first example demonstrates Theorem 1 on a somewhat technical but simple model, with an emphasis on the point that the optimal q may differ from $1/2$, and that it is therefore useful to maximize w.r.t. q in order to improve the bound relative to the choice $q = 1/2$.
Example 1. Let X be a random variable distributed exponentially according to and , so that the only possibilities to select two different values of θ in the lower bound are and . In terms of the hypothesis testing problem pertaining to the lower bound, the likelihood ratio test (LRT) is by comparison of to . Now, if , or equivalently, , the decision is always in favor of , and then . For , the optimal LRT compares X to . If , one decides in favor of ; otherwise, one decides in favor of . Thus, In summary, It turns out that for , , whereas the maximum is , attained at . Thus, This concludes Example 1.

In the above example, we considered just one observation, $n = 1$. From now on, we will refer to the case of a general number, n, of observations. In particular, the following simple corollary to Theorem 1 yields a local asymptotic minimax lower bound.
Corollary 1. For a given $\theta$ and a constant s, let $\{\delta_n\}$ denote a positive sequence tending to zero with the property that exists and is given by a strictly positive constant, which will be denoted by . Also, let . Then, the local asymptotic minimax performance w.r.t. $\{a_n\}$ is lower bounded by .

Corollary 1 is readily obtained from Theorem 1 by substituting $\theta_1 = \theta$ and $\theta_2 = \theta + \delta_n$ in Equation (11), then multiplying both sides of the inequality by $a_n$, and finally, taking the limit inferior of both sides.
Next, we study a few examples of the use of Corollary 1. As in Example 1, we emphasize again in Example 2 below the importance of having the degree of freedom to maximize over the prior q, rather than fixing $q = 1/2$. Also, in all the examples that were examined, the rate of convergence is the same as the optimal rate of convergence. In other words, it is tight in the sense that there exists an estimator (for example, the maximum likelihood estimator) whose risk tends to zero at the same rate. In some of these examples, we compare our lower bound to those of earlier reported results on the same models.
Example 2. Let be independently, identically distributed (i.i.d.) random variables, uniformly distributed in the range . In the corresponding hypothesis testing problem of Theorem 1, the hypotheses are and , with priors q and . There are two cases: If , or equivalently, , one always decides in favor of , and so the probability of error is q. If, on the other hand, , namely, , we decide in favor of whenever , and then an error occurs only if is true, yet , which happens with probability . Thus, which is readily seen to be maximized by , and then . Now, to apply Corollary 1, we let and , which amounts to and . Then, in the case of the MSE criterion, , we have , and so, w.r.t. . This bound will be further improved upon in Section 4. If, instead of maximizing w.r.t. q, we select , then , and then the resulting bound would become w.r.t. . Therefore, the maximization over q plays an important role here in terms of tightening the lower bound. More generally, for (), , and we obtain w.r.t. , where the supremum, which is in fact a maximum, can always be calculated numerically for every given t. For large t, the maximizing u is approximately , which yields . On the other hand, for , we end up with . For large t, the bound of is inferior to the bound with the optimal q by a factor of about . This concludes Example 2.

Example 3. Let be i.i.d. random variables, uniformly distributed in the interval . For the hypothesis testing problem, let be chosen between and . Clearly, if , the underlying hypothesis is certainly . Likewise, if , the decision is in favor of with certainty. Thus, an error can occur only if all fall in the interval , an event that occurs with probability . In this event, the best one can do is to select the hypothesis with the larger prior, with a probability of error given by . Thus, and so, , achieved by . Now, let us select , which yields . For (), we have w.r.t. . For the case of MSE, , . The constant should be compared with that of (see Example 4.9 in [35]), which is two orders of magnitude smaller. This concludes Example 3.

Example 4. Let , where are i.i.d. Gaussian random variables with zero mean and variance . Here, for the corresponding binary hypothesis testing problem, the optimal value of q is always $1/2$. This can be readily seen from the concavity of in q and its symmetry around , as . Since , where , we select , which yields , and then, for the MSE case, , w.r.t. , and so the asymptotic lower bound is . We now compare this bound (which will be further improved in Section 4) with a few earlier reported results. In one of the versions of Le Cam's bound (see Example 4.7 in [35]) for the same model, the lower bound turns out to be , namely, an order of magnitude smaller. Also, in Example 3.1 in [28], another version of Le Cam's method yields . According to Corollary 4.3 in [36], . Yet another comparison is with Theorem 5.9 in [37], where we find an inequality, which in our notation reads as follows: . Combining it with Chebychev's inequality yields . In [23] (p. 257), it is shown that when , the minimax estimator for this model is the sample mean, and so, in this case, the correct constant in front of is actually 1. This concludes Example 4.

The next example is related to Example 4, as it is based on the use of the central limit theorem (CLT), which means that the Gaussian tail distribution is used here too.
Example 5. Consider an exponential family, where is a given function and is a normalization function given by , assuming that the integral converges. In the auxiliary binary hypothesis problem, the test statistic is . If and , the LRT amounts to examining whether is larger than . In this case, the probability of error can be asymptotically assessed using the CLT, which, after a simple algebraic manipulation, becomes , where is the Fisher information. Thus, for the MSE, w.r.t. . This concludes Example 5.

In several earlier examples, our bound was shown to outperform (sometimes significantly so) earlier reported bounds for the corresponding models. However, to be fair, we should not ignore the fact that there are also situations where our bound may not be tighter than earlier bounds applied to the same model. Such a case is demonstrated in Example 6 below, where our result is compared to those of Ziv and Zakai [18] and Chazan, Zakai, and Ziv [4] in the context of estimating the delay of a known continuous-time signal corrupted by additive white Gaussian noise (for further developments and applications of the Ziv–Zakai and the Chazan–Zakai–Ziv bounds, see, e.g., [38,39,40,41,42,43,44,45] and references therein). Having said that, it should also be kept in mind that our emphasis in this work is on bounds that are relatively simple and easy to calculate (at least numerically), whereas the Chazan–Zakai–Ziv bound, although very tight in many situations, is notoriously difficult to calculate. Indeed, in this setting, the explicit behavior of the resulting complicated bound of [4] is transparent only at high values of the signal-to-noise ratio (SNR); see Equations (11), (12), and (14) in [4].
Example 6. Let , , where is additive white Gaussian noise (AWGN) with double-sided spectral density , and is a deterministic signal that depends on the unknown parameter θ. It is assumed that the signal energy, , does not depend on θ (which is the case, for example, when θ is a delay parameter of a pulse fully contained in the observation interval, or when θ is the frequency or the phase of a sinusoidal waveform). We further assume that is at least twice differentiable w.r.t. θ, and that the energies of the first two derivatives are also independent of θ. Then, as shown in Appendix A, for small , where is the energy . The optimal LRT for deciding between the two hypotheses is based on the comparison between the correlations and . Again, the optimal value of q is $1/2$. Thus, where is the power of . Since we are dealing here with continuous time, instead of a sequence , we use a function, , of the observation time, T, which in this case would be . Let and . Then, which, for the MSE case, yields w.r.t. , which means that the minimax loss is lower bounded by . This has the same form as the Cramér–Rao lower bound (CRLB), except that the multiplicative factor is 0.3314 rather than 1. In Equation (20) in [18], the bound is of the same form, but with a multiplicative constant of 0.16 at high signal-to-noise ratio (SNR). However, in [4], the constant of proportionality was improved to 1 in the high-SNR limit, just like in the Cramér–Rao lower bound for the same model. The constant 0.3314 will be improved later to 0.4549 (in Example 9 in the sequel), but it will still be below 1.

The case where is not everywhere differentiable w.r.t. θ can be handled in a similar manner, but some caution should be exercised. For example, consider the model, where , , is AWGN as before, and is a rectangular pulse with duration Δ and amplitude , E being the signal energy. Here, ; namely, it also includes a linear term in , not just the quadratic one. This changes the asymptotic behavior of the resulting lower bound to , which turns out to be w.r.t. (namely, a minimax lower bound of ). It is interesting to compare this bound to the Chapman–Robbins bound for the same model, which is a local bound of the same form, but with a multiplicative constant of instead of , and which is limited to unbiased estimators. The Chazan–Zakai–Ziv bound [4] for this case is difficult to calculate, but in the high-SNR regime, it behaves like w.r.t. . It is conceivable that the Chazan–Zakai–Ziv bound for estimating the delay of non-differentiable signals in Gaussian white noise has been improved even further since its publication in 1975, but the point of this particular example remains: our bound may not always be the best available bound. Still, since our bound is easy to calculate, it is worth comparing it to other bounds for every given model. This concludes Example 6.
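Under our reading of (11), the constant 0.3314 appearing in Example 6 arises from the one-dimensional maximization of $2u^2 Q(u)$ over $u > 0$, where Q is the Gaussian tail function: taking test points at distance $2\delta$ and normalizing $\delta$ by the noise scale leaves exactly this expression. The following sketch (a hypothesis based on the reconstructed form of the bound, not a derivation from the paper's omitted equations) reproduces the number:

```python
import math

def Q(u):
    # Gaussian tail function Q(u) = P(N(0,1) > u)
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def g(u):
    # the normalized bound constant 2 u^2 Q(u)
    return 2.0 * u * u * Q(u)

us = [i / 1000.0 for i in range(1, 5000)]
u_star = max(us, key=g)
print(u_star, g(u_star))   # maximum near u ≈ 1.19, value ≈ 0.3314
```

The same maximization governs the Gaussian location model of Example 4, since only the normalized separation between the two test points enters the bound.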
5. Extensions to the Vector Case
In this section, we outline extensions of some of our findings in Sections 3 and 4 to the vector case. Let $\theta$ be a parameter vector of dimension d and $\Theta \subseteq \mathbb{R}^d$. Let $\rho$ be a convex loss function that depends on the d-dimensional error vector $\varepsilon$ only via its norm, $\|\varepsilon\|$ (that is, $\rho$ has radial symmetry).
First, observe that Theorem 1 extends verbatim to the vector case, as nothing in the proof of Theorem 1 is based on any assumption or property that holds only when $\theta$ is a scalar parameter.
Corollary 1 can also be extended by letting the two test points of the auxiliary hypothesis testing problem, $\theta_1$ and $\theta_2$, be selected such that the distance between them, $\|\theta_2 - \theta_1\|$, decays at an appropriate rate as a function of n, so as to make the probability of error, $P_e(q;\theta_1,\theta_2)$, converge to a positive constant as $n \to \infty$, as was achieved in Corollary 1 for the scalar case. But in the vector case considered now, there is an additional degree of freedom, which is the direction of the displacement vector, $\theta_2 - \theta_1$. This direction can now be optimized so as to yield the largest (hence the tightest) lower bound. To be specific, in the vector case, Corollary 1 remains the same, except that now s should be thought of as a d-dimensional vector rather than a scalar, the scalar quantity in the denominator of (19) should be replaced by its counterpart involving v, where v is an arbitrary fixed non-zero vector (for example, any unit-norm vector in the case of MSE), and the supremum in (20) should be taken over s. To demonstrate this point, we now revisit Example 5, but this time for the vector case.
Example 11. Consider the case where , with each factor in this product PDF being given by a d-dimensional exponential family, where is the inner product of the d-dimensional parameter vector θ and a d-dimensional vector of statistics, , and , provided that the integral converges. In the above notation, both θ and are understood to be column vectors, and denotes transposition of θ to a row vector. Similarly as in Corollary 1, to obtain a local bound at a given θ, we let and , where . A simple extension of the derivation in Example 5 (using the CLT) yields , where s is considered a column vector, the superscript prime denotes vector transposition as before, and is the Fisher information matrix of the exponential family. Considering the case of the MSE, with v being any unit-norm vector, we have , and then the following lower bound is obtained w.r.t. : , where is the smallest eigenvalue of . In this example, it is apparent that the chosen direction of the vector s is that of the eigenvector corresponding to the smallest eigenvalue of the Fisher information matrix. This completes Example 11.

In the vector case considered in this section, it is also instructive to extend the scope from two test points, $\theta_1$ and $\theta_2$, to multiple test points, $\theta_1,\ldots,\theta_m$, along with the corresponding weights (or priors), $p_1,\ldots,p_m$ (see the exposition at the end of Section 2). Interestingly, this will also lead to new bounds even for the case of $m = 2$ test points.
To this end, we also consider a set of m unitary transformation matrices, $\{T_1,\ldots,T_m\}$, with the following properties: (i) $T_i\varepsilon = T_j\varepsilon$, for all $\varepsilon \ne 0$, if and only if $i = j$, and (ii) $\sum_{i=1}^m T_i = 0$. For example, if $d = 2$, take $T_i$ to be the matrix of rotation by $2\pi i/m$, $i = 1,\ldots,m$.
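For $d = 2$, the two stated properties of the rotation family are easy to verify directly. A sketch (assuming rotation angles $2\pi i/m$, $i = 1,\ldots,m$, and our own matrix notation):

```python
import math

def rotation(angle):
    # 2x2 rotation matrix (an orthogonal, hence norm-preserving, transformation)
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def make_family(m):
    # m rotations by 2*pi*i/m, i = 1..m; an assumed concrete choice for d = 2
    return [rotation(2.0 * math.pi * i / m) for i in range(1, m + 1)]

m = 3
Ts = make_family(m)

# property (ii): the matrices sum to the zero matrix
S = [[sum(T[r][c] for T in Ts) for c in range(2)] for r in range(2)]
print(S)

# property (i): for eps != 0, the images T_i eps are pairwise distinct,
# and each image preserves the Euclidean norm of eps
eps = (1.0, 0.5)
images = [(T[0][0] * eps[0] + T[0][1] * eps[1],
           T[1][0] * eps[0] + T[1][1] * eps[1]) for T in Ts]
print(images)
```

The zero-sum property is what cancels the estimator-dependent terms in the proof of Theorem 4 below, while norm preservation keeps the radially symmetric loss unchanged.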
Theorem 4. Let , and be given, and let $T_1,\ldots,T_m$ be unitary transformations that sum to zero, as described in the above paragraph. Then, for a convex loss function that depends on ε only via $\|\varepsilon\|$, we have:

As explained in Section 2, the integral on the right-hand side can be interpreted as the probability of list error with a list size of $m - 1$.
Proof of Theorem 4. The proof is a direct extension of the proof of Theorem 1:
where in (a), we have used the unitarity (and hence the norm-preserving property) of , as is assumed to depend only on the norm of the error vector; in (b), we used the convexity of ; and in (c), we used the fact that , which implies that , thus making the bound independent of the estimator, . This completes the proof of Theorem 4. □
Theorem 1 is a special case where , , and , where I is the identity matrix. The integral associated with the lower bound of Theorem 4 might not be trivial to evaluate in general for . However, there are some choices of the auxiliary parameters that may facilitate the calculations. One such choice is as follows. For some positive integer , take , for some , , for some , , for some , and finally, . The integrand then becomes the minimum between two functions only, as in Section 3. Denoting , the bound then becomes
Redefining
we have
and the following corollary to Theorem 4 is obtained.
Corollary 2. Let the conditions of Theorem 4 be satisfied. Then,
Note that if m is even, , and ; then, we are actually back to the bound of , and so, the optimal bound for even cannot be worse than our bound of . We do not, however, have precisely the same argument for odd m, but for large m, it becomes immaterial whether m is even or odd. In its general form, the bound of Theorem 4 poses a heavy optimization problem, as we have the freedom to optimize , (under the constraints that they are all unitary and sum to zero), and (under the constraints that they are all non-negative and sum to unity).
Another relatively convenient choice is to take , to obtain another corollary to Theorem 4:
Corollary 3. Let the conditions of Theorem 4 be satisfied. Then,
Example 12. To demonstrate a calculation of the extended lower bound for , consider the following model. We are observing a noisy signal,
where ϑ is the desired parameter to be estimated, ζ is a nuisance parameter, taking values within an interval for some , are i.i.d. Gaussian random variables with zero mean and variance , and and are two given orthogonal waveforms with . Suppose we are interested in estimating ϑ based on the sufficient statistics and , which are jointly Gaussian random variables with the mean vector and the covariance matrix , I being the identity matrix. We denote realizations of by . Let us also denote . Since we are interested only in estimating ϑ, our loss function will depend only on the estimation error of the first component of θ, which is ϑ. Consider the choice and let be counter-clockwise rotation transformations by , . For a given , let us select , , and . Finally, let . In order to calculate the integral
the plane can be partitioned into three slices over which the contributed integrals are equal. In each such region, the smallest is integrated. In other words, each in its turn is integrated over the region whose Euclidean distance to is larger than its distances to the other two values of θ. For , this is the region . The factor of cancels with the three identical contributions from , , and due to the symmetry. Therefore,
Our next mathematical manipulations in this example are in the spirit of the passage from Theorem 1 to Corollary 1, that is, selecting the test points increasingly close to each other as functions of n, so that the probability of list error tends to a positive constant. To this end, we change the integration variable x to and select for some to be optimized later.
Then,
The MSE bound then becomes
This bound is not as tight as the corresponding bound of , which results in , but it should be kept in mind that here, we have not attempted to optimize the choices of , , , , , , , and . Instead, we have chosen these parameter values from considerations of computational convenience, just to demonstrate the calculation. This concludes Example 12.

Finally, a comment is in order regarding the possible extensions of Section 4 to the vector case. Such extensions are conceptually straightforward whenever the loss function of the error vector is given by the sum of losses associated with the different components of the error vector. Most notably, when the loss function is the MSE, , each component of the estimation error can be handled separately by the methods of Section 4. Of course, the two or three test points should be chosen such that they differ in all components of the parameter vector. In the case of three test points, it makes sense to select them equally spaced along a straight line of a general direction in . We will not pursue this extension any further in the framework of this work.
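As a side note, the three-fold symmetry exploited in Example 12 (equal contributions from the three slices of the plane) lends itself to a quick numerical check. The sketch below uses made-up test points obtained by rotating a base point by 2π/3 about the origin, together with a rotationally symmetric Gaussian sample centered at the rotation center, which is what the symmetry argument requires; the three regions of the partition then carry equal probability:

```python
import numpy as np

def rotation(angle):
    """2x2 counter-clockwise rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# hypothetical test points: a base point rotated by 0, 120, and 240 degrees
base = np.array([1.0, 0.0])
thetas = [rotation(2 * np.pi * k / 3) @ base for k in range(3)]

# rotationally symmetric (standard Gaussian) sample centered at the origin
rng = np.random.default_rng(0)
x = rng.standard_normal((200_000, 2))

# for each sample, find the test point FARTHEST from it, matching the
# partition used in Example 12 (each theta_j is integrated over the region
# farther from it than from the other two test points)
dists = np.stack([np.linalg.norm(x - t, axis=1) for t in thetas])
farthest = dists.argmax(axis=0)
probs = np.bincount(farthest, minlength=3) / len(x)

# by the three-fold symmetry, the three regions have equal probability
print(np.round(probs, 2))  # each entry is close to 1/3
```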