1. Introduction
The problems involved in testing statistical hypotheses occupy an important place in applied statistics and arise in such areas as genetics, biology, astronomy, radar, computer graphics, etc. The classical methods for solving these problems are based on a single hypothesis test. There is a sample $X$ of size $m$, and the null hypothesis $H_0$ is tested against the general alternative $H_1$. The hypothesis is tested using a statistic $T$, a function of the sample with a known distribution under the null hypothesis (the null distribution). For a given null distribution, the attained $p$-values are calculated, and the decision to reject the null hypothesis is made on their basis. Errors arising from the application of this single hypothesis testing algorithm are divided into two types, and the probability of falsely rejecting a true null hypothesis (the probability of a type I error) is bounded by a given significance level $\alpha$:
$$\mathrm{P}(T > t \mid H_0) \le \alpha,$$
where $t$ is the critical threshold value.
With this approach, we can often not only find the critical region for which the $\alpha$-constraint on the probability of a type I error is satisfied, but also minimize the probability of a type II error, i.e., maximize the statistical power.
When considering the problem of multiple hypothesis testing, the task becomes more complicated: now we are dealing with $n$ different null hypotheses $H_0^1, \ldots, H_0^n$ and the alternatives $H_1^1, \ldots, H_1^n$. These hypotheses are tested by statistics $T_1, \ldots, T_n$ with given null distributions. Thus, for each hypothesis, the attained $p$-value can be calculated, as well as the type II error probability.
Let us introduce the notation: $I_0$ is the set of indices of the true null hypotheses, and $R$ is the set of indices of the rejected hypotheses. Then $V = |I_0 \cap R|$ is the number of type I errors. The task is to keep $V$ under control through the choice of the rejection set $R$.
There are many statistical procedures that offer different ways to solve the multiple hypothesis testing problem. One of the first measures proposed to generalize the type I error was the family-wise error rate (FWER) [1]. This value is defined as the probability of making at least one type I error; i.e., instead of controlling the probability of a type I error at the level $\alpha$ for each test, the overall FWER is controlled: $\mathrm{FWER} = \mathrm{P}(V \ge 1) \le \alpha$. However, such a strict criterion significantly increases the type II error rate when the number of tested hypotheses is large.
In [2], an alternative measure called the false discovery rate (FDR) was proposed. This measure controls the expected proportion of false rejections among all rejections:
$$\mathrm{FDR} = \mathrm{E}\left[\frac{V}{\max(|R|, 1)}\right].$$
This approach is widely used in situations where the number of tested hypotheses is so large that it is preferable to allow a certain number of type I errors in order to increase the statistical power.
To control the FDR, the Benjamini–Hochberg [2] multiple hypothesis testing algorithm is often used, which, under the condition of independence of the test statistics, bounds the FDR by the controlling parameter $q$, i.e., $\mathrm{FDR} \le q$. In this procedure, the significance levels change linearly: $\alpha_k = qk/n$, $k = 1, \ldots, n$. To apply the Benjamini–Hochberg method, the attained $p$-values are arranged in increasing order:
$$p_{(1)} \le p_{(2)} \le \ldots \le p_{(n)}.$$
All hypotheses corresponding to $p_{(1)}, \ldots, p_{(k)}$ are rejected, where $k$, $1 \le k \le n$, is the maximum index for which $p_{(k)} \le \alpha_k$.
There are other measures to control the total number of type I errors. In [1], a $q$-value is considered that provides control of the positive false discovery rate (pFDR). Controlling the false coverage rate (FCR) involves solving the problem of multiple hypothesis testing in terms of confidence intervals [3]. The papers [4,5] are devoted to the harmonic mean $p$-value (HMP) method. However, in this paper we focus on the properties of the FDR method. The widespread use of the FDR measure is commonly attributed to the development of technologies that allow collecting and analyzing large amounts of data. Computing power makes it easy to perform hundreds or thousands of statistical tests on a given data set, and in this setting the use of the FWER criterion loses its relevance.
In this paper, we study the asymptotic properties of the mean-square risk estimate for the FDR method in the problem of multiple hypothesis testing for the mathematical expectation of a Gaussian vector with independent components. The consistency of this estimate was proved in [6]; here, we prove its asymptotic normality.
The paper is organized as follows. Section 2 provides basic information about the statement of the problem and the vector classes under consideration. Section 3 defines the mean-square risk of the thresholding method and describes the properties of the FDR threshold. Section 4 considers the asymptotic properties of the mean-square risk estimate, and Section 5 contains some concluding remarks.
2. Preliminaries
Consider the problem of estimating the mathematical expectation of a Gaussian vector observed in the model
$$Y_i = \mu_i + \epsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $\epsilon_i$ are independent normally distributed random variables with zero expectation and known variance $\sigma^2$, and $\mu = (\mu_1, \ldots, \mu_n)$ is an unknown vector belonging to some given set (class). The key assumption adopted in this paper is the “sparsity” of the vector $\mu$, i.e., it is assumed that only a relatively small number of its components are significantly large. A similar problem statement arises, for example, in the analysis and processing of signals containing noise. In this case, the sparse or “economical” representation of the signal is achieved using some special preprocessing, for example, a discrete wavelet transform of the signal vector.
In this paper, we consider the following definitions of sparsity. Let $\|\mu\|_0$ denote the number of nonzero components of $\mu$. Fixing $\eta \in (0, 1)$, define the class
$$\ell_0[\eta] = \{\mu \in \mathbb{R}^n : \|\mu\|_0 \le \eta n\}.$$
For small values of $\eta$, only a small number of the vector components are nonzero.
Another possible way to define sparsity is to limit the absolute values of the components of $\mu$. To do this, consider the sorted absolute values $|\mu|_{(1)} \ge |\mu|_{(2)} \ge \ldots \ge |\mu|_{(n)}$ and for $p > 0$ define the class
$$m_p[\eta] = \{\mu \in \mathbb{R}^n : |\mu|_{(k)} \le \sigma \eta \,(n/k)^{1/p},\ k = 1, \ldots, n\}.$$
In addition, sparsity can be modeled using the $\ell_p$-norm
$$\|\mu\|_p = \left(\sum_{i=1}^n |\mu_i|^p\right)^{1/p}.$$
In this case, the sparse class is defined as
$$\ell_p[\eta] = \Big\{\mu \in \mathbb{R}^n : \tfrac{1}{n}\textstyle\sum_{i=1}^n |\mu_i/\sigma|^p \le \eta^p\Big\}.$$
There are important relationships between these classes. As $p \to 0$, the $\ell_p$-norm approaches $\|\mu\|_0$: $\|\mu\|_p^p \to \|\mu\|_0$. The embedding $\ell_p[\eta] \subset m_p[\eta]$ is also valid.
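The three sparsity classes can be sketched as membership tests. The normalizations below (in particular the factor $\sigma$ and the $1/n$ scaling) follow common conventions in the FDR-thresholding literature and should be treated as assumptions where the displayed definitions above are incomplete:

```python
import numpy as np

def in_l0(mu, eta):
    """l0-class: at most eta*n nonzero components."""
    mu = np.asarray(mu, dtype=float)
    return np.count_nonzero(mu) <= eta * mu.size

def in_mp(mu, eta, p, sigma=1.0):
    """Weak-l_p class m_p[eta]: sorted |mu|_(k) <= sigma*eta*(n/k)^(1/p)."""
    a = np.sort(np.abs(np.asarray(mu, dtype=float)))[::-1]
    n = a.size
    k = np.arange(1, n + 1)
    return bool(np.all(a <= sigma * eta * (n / k) ** (1.0 / p)))

def in_lp(mu, eta, p, sigma=1.0):
    """l_p class: (1/n) * sum |mu_i/sigma|^p <= eta^p."""
    mu = np.asarray(mu, dtype=float)
    return np.mean(np.abs(mu / sigma) ** p) <= eta ** p
```

With these normalizations, membership in $\ell_p[\eta]$ implies membership in $m_p[\eta]$: by Markov's inequality, $k\,|\mu|_{(k)}^p \le \sum_i |\mu_i|^p \le n\eta^p\sigma^p$.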
3. Mean-Square Risk and Properties of the FDR Threshold
In the considered problem, one of the widespread and well-proven methods for constructing an estimate of $\mu$ is the method of (hard) thresholding of each vector component:
$$\hat\mu_i = Y_i\,\mathbf{1}(|Y_i| > T), \qquad (2)$$
i.e., a vector component is zeroed if its absolute value does not exceed the critical threshold $T$. This procedure is equivalent to testing the hypothesis of zero mathematical expectation for each component of the vector, and when using the FDR method, the threshold value $T$ is selected according to the following rule. The observations are used to construct the ordered sequence of decreasing absolute values
$$|Y|_{(1)} \ge |Y|_{(2)} \ge \ldots \ge |Y|_{(n)},$$
and these values are compared with the right-tail Gaussian quantiles $t_k = \sigma z\big(\tfrac{q}{2}\cdot\tfrac{k}{n}\big)$, where $z(\alpha)$ denotes the upper $\alpha$-quantile of the standard normal distribution. Let $\hat{k}_F$ be the largest index $k$ for which $|Y|_{(k)} \ge t_k$; then the threshold $\hat{T}_F = t_{\hat{k}_F}$ is chosen.
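The threshold-selection rule can be sketched as follows (NumPy and the standard-library `NormalDist` are assumed; the fallback used when no order statistic crosses its quantile is a convention adopted here, not taken from the text):

```python
import numpy as np
from statistics import NormalDist

def z_upper(a):
    """Upper a-quantile of the standard normal law: P(Z > z) = a."""
    return NormalDist().inv_cdf(1.0 - a)

def fdr_threshold(y, q, sigma=1.0):
    """FDR threshold: compare |y|_(1) >= ... >= |y|_(n) with
    sigma * z(q*k/(2n)) and take the quantile at the last crossing."""
    y = np.asarray(y, dtype=float)
    n = y.size
    abs_sorted = np.sort(np.abs(y))[::-1]
    t = np.array([sigma * z_upper(q * k / (2 * n)) for k in range(1, n + 1)])
    crossings = np.nonzero(abs_sorted >= t)[0]
    if crossings.size == 0:
        # No crossing at all: fall back to the universal threshold
        # (an assumed convention; nothing is rejected in this case).
        return sigma * np.sqrt(2 * np.log(n))
    return float(t[crossings.max()])
```

Taking the *last* crossing (the largest index $k$) is what distinguishes this rule from a step-down procedure and mirrors the step-up character of the Benjamini–Hochberg method.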
In combination with hypothesis testing methods, the penalty method is also widely used, in which the target loss function is minimized with the addition of a penalty term [7,8,9]. In a particular case, this method leads to the so-called soft thresholding: the estimates of the vector components are calculated according to the rule
$$\hat\mu_i = \mathrm{sgn}(Y_i)\,(|Y_i| - T)_+. \qquad (3)$$
This approach is in some cases more adequate than (2), since the function in (3) is continuous in $T$.
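Both estimators are elementary componentwise operations; a minimal sketch (NumPy assumed):

```python
import numpy as np

def hard_threshold(y, t):
    """Hard thresholding: zero every component with |y_i| <= t, keep the rest."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) > t, y, 0.0)

def soft_threshold(y, t):
    """Soft thresholding: shrink each component toward zero by t."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
```

As a function of the data (and of $T$), the soft-thresholding map is continuous, whereas the hard-thresholding map jumps at $|y_i| = T$; this is the continuity property referred to above.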
The mean-square error (or risk) of the considered procedures is defined as
$$R_n(T) = \sum_{i=1}^n \mathrm{E}\,(\hat\mu_i - \mu_i)^2. \qquad (4)$$
Methods for selecting the threshold value $T$ are usually focused on minimizing the risk (4) provided that the vector $\mu$ belongs to a given class. A “perfect” value of the threshold is
$$T_{\min} = \arg\min_T R_n(T).$$
Note that the expression (4) contains the unknown values of $\mu_i$, so it is impossible to calculate $T_{\min}$ and $R_n(T_{\min})$ in practice. Therefore, a minimax approach is used. The threshold $\hat{T}_F$ is calculated based on the observed values of $Y_i$ and has the property of adaptive minimax optimality in the considered sparse classes [7]. In addition, $\hat{T}_F$ has the following important property [7], which we will use later in proving the asymptotic normality of the risk estimate.
Theorem 1. [7] Suppose that or , where for and for , . Then there exists such that for the FDR-threshold with a controlling parameter and large n, where for and for . Thus, if is chosen so that , the value of is the lower bound for the threshold .
Note also that the so-called universal threshold $T_U = \sigma\sqrt{2\ln n}$ is popular as well. This threshold is, in a certain sense, maximal: it was shown in [10,11] that threshold values exceeding $T_U$ can be ignored. Based on this, we will assume everywhere that $T \le \sigma\sqrt{2\ln n}$.
4. Asymptotic Properties of the Risk Estimate
As already mentioned, the expression (4) explicitly depends on the unknown values of $\mu_i$, so it cannot be calculated in practice. However, it is possible to construct an estimate of it that is calculated using only the observed data. This estimate is determined by the expression
$$\hat{R}_n(T) = \sum_{i=1}^n F[Y_i, T], \qquad (7)$$
where
$$F[x, T] = (x^2 - \sigma^2)\,\mathbf{1}(|x| \le T) + \sigma^2\,\mathbf{1}(|x| > T)$$
for the hard thresholding and
$$F[x, T] = (x^2 - \sigma^2)\,\mathbf{1}(|x| \le T) + (\sigma^2 + T^2)\,\mathbf{1}(|x| > T)$$
for the soft thresholding [12].
In [6] it is proved that the estimate (7) is consistent.

Theorem 2. [6] Let the conditions of Theorem 1 be satisfied and as so that , then

Let us now prove a statement about the asymptotic normality of the estimate (7), which, in particular, allows constructing asymptotic confidence intervals for the mean-square risk (4). In the proof, we will use the same letter $C$ for different positive constants that may depend on the parameters of the classes and methods under consideration, but do not depend on $n$.
First, consider the class $\ell_0[\eta_n]$.
Theorem 3. Let , , . Let be the FDR-threshold with a controlling parameter and as , where and are defined in (5). Then

Proof. Let us prove the theorem for the soft thresholding method. In the case of hard thresholding, the proof is similar.
With soft thresholding, $\hat{R}_n(T)$ is an unbiased estimate of $R_n(T)$, and with hard thresholding, under the conditions of the theorem, the bias divided by the normalizing factor tends to zero [12]. For the variance of the numerator, we have [13]
Moreover, since the $Y_i$ are independent, and the number of nonzero $\mu_i$ does not exceed $\eta_n n$, we obtain
Finally, the Lindeberg condition is met: for any $\varepsilon > 0$, as $n \to \infty$,
where . Indeed, due to (9) and (10), and since the summands in are bounded in absolute value by , starting from some $n$ all the indicators in (11) vanish.
Therefore, (
8) holds, and to prove the theorem it remains to show that
Repeating the reasoning of [14,15,16], it can be shown that , where . To shorten the notation without compromising the proof, we can omit and assume that .
Let
,
, where the sum
contains terms with
, and
contains all other terms. By the definition of the class
, the number of terms in
does not exceed
. Moreover, the absolute value of each term is bounded by
. For convenience, we will assume that
contains terms with indices from 1 to
, i.e.,
Next,
and
. Given the definition of the class
and the form of
, it can be shown that for the terms of
the estimate
is valid when
. So
and for
as
.
Next, take
and denote
Divide the segment
into equal parts:
,
,
. Then
where
Applying the Hoeffding inequality [
17] for
, we obtain the estimate
It is easy to show that
. Hence,
Applying the Hoeffding inequality, we obtain
Thus, for an arbitrary
as
.
Let us now consider the sum
. For large
n, the number of terms in this sum is
. Repeating the above reasoning, we divide the segment
into equal parts:
,
,
. Then
where
Taking into account the definition of the class
and the form of
, we can bound the variance of the terms in
(and hence
):
. Then, applying Bernstein’s inequality [
18] for
, we obtain
The variance of the terms in is bounded by .
Applying Bernstein’s inequality, we obtain
Thus, for an arbitrary
as
.
Combining (
8), (
12), (
14) and (
15), we obtain the statement of the theorem. □
A similar statement is true for the class $m_p[\eta_n]$.
Theorem 4. Let , , , . Let be the FDR-threshold with a controlling parameter and as , where and are defined in (6). Then

Proof. The main steps in the proof of this theorem repeat the proof of Theorem 3. We also write
The statement
is proved exactly the same as the statement (
8). Let
,
, where the sum
contains terms with
, and
contains all other terms. By the definition of the class
, the number of terms in
does not exceed
and each term is bounded in absolute value by
. Considering the form of
, it can be shown that the mathematical expectations of the terms in
do not exceed
, and their variances do not exceed
. Next, arguing as in Theorem 3, we see that
and
for an arbitrary
as
. Thus, since
,
as
. □
The above statements demonstrate that the considered method for constructing estimates in the model (1) has properties very similar to those of the method based on minimizing the estimate (7) with respect to the parameter $T$ (see [19]).