1. Introduction
The full Bayesian significance test (FBST) for precise hypotheses is presented in [1] as a Bayesian alternative to the traditional significance tests based on p-values. With the FBST, the authors introduce the e-value as an evidence index in favor of the null hypothesis (H). An important practical issue for the implementation of the FBST is to establish how small the evidence must be to decide to reject H ([2,3]). In that sense, the authors of [4] present loss functions such that the minimization of their posterior expected values characterizes the FBST as a Bayes test under a decision-theoretic approach. This procedure provides a cutoff point for the evidence that depends on the severity of the error of deciding whether to reject or accept H.
In the frequentist significance-test context, it is known that under certain conditions the p-value decreases as the sample size increases, in such a way that, for a single fixed significance level, the comparison of the p-value with that level usually leads to rejection of the null hypothesis ([5,6,7,8,9]). In the FBST procedure, the e-value exhibits similar behavior to the p-value when the sample size increases, which suggests that the cutoff point defining the rejection of H should depend on the sample size and (possibly) on other characteristics of the statistical model under consideration. However, the proposal of [4] does not study a loss function that explicitly takes the sample size into account.
In order to solve the problem, inherent to the usual approach to hypothesis testing, that changing the sample size influences the probability of rejecting or accepting the null hypothesis, the authors of [10], motivated by [11], suggest that the level of significance in hypothesis testing should be a function of the sample size. Instead of setting a single level of significance, the authors of [10] propose fixing the ratio of severity between type-I and type-II error probabilities based on the losses incurred in each case and thus, given a sample size, defining the level of significance that minimizes the linear combination of the decision error probabilities. The authors of [10] show that, proceeding this way, increasing the sample size makes the probabilities of both kinds of errors and their linear combination decrease, while in most cases, when a single level of significance independent of the sample size is set, only the type-II error probability decreases. The tests proposed in [10] rest on the same conceptual grounds as the usual tests for simple hypotheses based on the minimization of a linear combination of probabilities of decision errors, as presented in [12]. The authors of [10] thus extend, in a sense, the idea in [12] to composite and sharp hypotheses, following the initial work in [11].
Following the same line of work, the authors of [13,14] present a new hypothesis-testing procedure formulated from the ideas developed in previous works ([11,15,16,17]) and using a mixture of frequentist and Bayesian tools. This procedure introduces the capital-P P-value as a decision-making evidence measure and also includes an adaptive significance level, i.e., a significance level that is a function of the sample size. Such an adaptive significance level is obtained from the minimization of the linear combination of generalized type-I and type-II error probabilities. According to the authors of [14], the resulting hypothesis tests do not violate the likelihood principle and do not require any constraints on the dimensionalities of the sample space and parameter space. It should be noticed that the new test procedure is precisely the optimal decision rule for the problem of testing the simple hypotheses characterized by the predictive distributions under H and A. For this reason, such a procedure overcomes the drawback that increasing the sample size leads to the rejection of a precise null hypothesis ([12]). Another important way of successfully dealing with this question is to take into account meaningful deviations from the parameter value that specifies the precise null hypothesis in the formulation of the hypothesis-testing problem ([18,19]).
On the other hand, linear models are probably the most widely used statistical models to establish the influence of a set of covariates on a response variable. The proper identification of the relevant variables in the model is thus an important issue in any scientific investigation and becomes even more challenging in the context of Big-Data problems. In addition to high dimensionality, in recent statistical learning problems it is common to find large datasets with thousands of observations, which may cause the hypothesis of nullity of the regression coefficients to be rejected most of the time when the significance level is fixed, simply because of the large sample size.
The main goal of our work is to determine, in the setting of linear regression models, how small the Bayesian evidence in the FBST should be in order to reject the null hypothesis and protect a decision-maker from the drawbacks mentioned above. Therefore, taking into account the concepts in [11,12] associated with optimal hypothesis tests, the conclusions in [10] about the relationship between significance levels and the sample size, and the ideas developed recently by the authors of [13,14] related to adaptive significance levels, we present a method to find a cutoff point for the e-value by minimizing a linear combination of the averaged type-I and type-II error probabilities for a given sample size and a given dimensionality of the parameter space. For that purpose, the scenario of linear regression models with unknown variance under the Bayesian approach is considered. By providing an adaptive level for decision making and controlling the probabilities of both kinds of errors, we intend to avoid the problems associated with the rejection of the hypotheses on the regression coefficients when the sample size is very large. In addition to the e-value, we calculate the P-value as well as its corresponding adaptive significance levels in order to compare the decisions that can be made by performing the tests with each of these measures.
2. The Linear Regression Model with Unknown Variance
The identification of the relevant variables in linear models can be done through hypothesis-testing procedures involving the respective regression coefficients. In the conjugate Bayesian analysis of the normal linear regression model with unknown variance, it is possible to obtain expressions for the posterior distributions of the parameters and their respective marginals. Therefore, in this setting, the FBST can be used for testing whether one or more of the regression coefficients is null, which is the basis of one possible model-selection procedure. We first review the normal linear regression model
$$ y = X\beta + \varepsilon, \qquad (1) $$
where $y$ is an $n \times 1$ vector of $n$ observations, $X$ is an $n \times p$ matrix of covariates, also called the design matrix, with $n > p$, $\beta$ is a $p \times 1$ vector of parameters (regression coefficients), and $\varepsilon$ an $n \times 1$ vector of random errors. The model shows simply that the conditional distribution of $y$ given parameters $(\beta, \sigma^2)$ is the multivariate normal distribution $N_n(X\beta, \sigma^2 I_n)$. Therefore, the likelihood becomes
$$ L(\beta, \sigma^2; y) \propto (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2}\, (y - X\beta)^\top (y - X\beta) \right\}. \qquad (2) $$
The natural conjugate prior distribution of $(\beta, \sigma^2)$ is a p-variate normal-inverse gamma distribution with hyperparameters $\mu_0$, $V_0$, $a_0$, and $b_0$, denoted by $NIG(\mu_0, V_0, a_0, b_0)$. Combining it with the likelihood (2) gives the posterior distribution ([20,21,22]):
$$ (\beta, \sigma^2) \mid y \sim NIG(\mu_1, V_1, a_1, b_1), $$
where
$$ V_1 = (V_0^{-1} + X^\top X)^{-1}, \qquad \mu_1 = V_1 (V_0^{-1}\mu_0 + X^\top y), $$
$$ a_1 = a_0 + \frac{n}{2}, \qquad b_1 = b_0 + \frac{1}{2}\left( y^\top y + \mu_0^\top V_0^{-1} \mu_0 - \mu_1^\top V_1^{-1} \mu_1 \right). $$
If $X^\top X$ is non-singular, we can write
$$ \mu_1 = V_1 (V_0^{-1}\mu_0 + X^\top X \hat{\beta}), $$
where $\hat{\beta} = (X^\top X)^{-1} X^\top y$ is the classical maximum likelihood or least squares estimator of $\beta$. Therefore, the posterior distribution of $(\beta, \sigma^2)$ is fully specified by the updated hyperparameters above. See Appendix A for further explanation of the priors, posteriors, and conditional distributions for the linear regression models with unknown variance.
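The conjugate update above is easy to verify numerically. The following sketch (ours, not part of the paper; the labels mu0, V0, a0, b0 follow the standard normal-inverse-gamma parameterization) computes the posterior hyperparameters for given data:

```python
import numpy as np

def nig_posterior(X, y, mu0, V0, a0, b0):
    """Conjugate normal-inverse-gamma update for y = X beta + eps,
    eps ~ N(0, sigma^2 I), with prior (beta, sigma^2) ~ NIG(mu0, V0, a0, b0)."""
    n = len(y)
    V0_inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0_inv + X.T @ X)       # posterior scale matrix
    mu1 = V1 @ (V0_inv @ mu0 + X.T @ y)        # posterior location of beta
    a1 = a0 + n / 2.0                          # posterior shape
    # posterior rate: b0 + (y'y + mu0' V0^-1 mu0 - mu1' V1^-1 mu1) / 2
    b1 = b0 + 0.5 * (y @ y + mu0 @ V0_inv @ mu0 - mu1 @ np.linalg.inv(V1) @ mu1)
    return mu1, V1, a1, b1
```

With a very vague prior (large $V_0$), $\mu_1$ approaches the least squares estimator $\hat{\beta}$, as the expression above indicates.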
3. Adaptive Significance Levels in Linear Regression Coefficient Hypothesis Testing
In this section, we present the methodology to find a cutoff value for the evidence in the FBST as an adaptive significance level, and we also develop the procedure to calculate the P-value with its corresponding adaptive significance level, both in the context of hypothesis testing on linear regression coefficients in models with unknown variance under the Bayesian point of view. For that purpose, it is first necessary to show how the Bayesian prior predictive densities under the null and alternative hypotheses are defined.
3.1. Prior Predictive Densities in Regression-Coefficient Hypothesis Testing
Let $\beta = (\beta_1^\top, \beta_2^\top)^\top$, with $\beta_1$ and $\beta_2$ having $s$ elements and $r$ elements, respectively. Let $\theta = (\beta, \sigma^2)$; then, $\theta \in \Theta$, where $\Theta$ denotes the whole parameter space. We are interested in testing the hypotheses
$$ H: \beta_2 = 0 \quad \text{versus} \quad A: \beta_2 \neq 0. $$
Let $\Theta_H$ and $\Theta_A$ be the partition of the parameter space defined by the competing hypotheses H and A. Consider the prior density $g(\theta)$ defined over the entire parameter space $\Theta$ and let $f_H$ and $f_A$ be the Bayesian prior predictive densities under the respective hypotheses. Both are probability density functions over the sample space, as follows:
$$ f_H(y) = \int_{\Theta_H} f(y \mid \theta) \, dG_H(\theta). \qquad (4) $$
Additionally,
$$ f_A(y) = \int_{\Theta_A} f(y \mid \theta) \, dG_A(\theta), \qquad (5) $$
where $G_H$ and $G_A$ are the prior probability measures of $\theta$ restricted to the sets $\Theta_H$ and $\Theta_A$, respectively (more details can be seen in Appendix B).
3.2. Evidence Index: e-Value
The full Bayesian significance test (FBST) was proposed in [1] for precise or "sharp" hypotheses (subsets of the parameter space with smaller dimension than the dimension of the whole parameter space and, therefore, with null Lebesgue measure), based on the evidence in favor of the null hypothesis, calculated as the posterior probability of the complement of the highest posterior density (HPD) region tangent to the set that defines the null hypothesis (here we consider the usual HPD region with respect to the Lebesgue measure, even though it could be built by choosing any other dominating measure instead). Considering the concepts in [10,11] and the recent works [13,14] related to adaptive significance levels, we propose to establish a cutoff value $k^*$ for the e-value in the FBST as a function of the sample size n and the dimensionality of the parameter space d, i.e., $k^* = k^*(n, d)$, with $k^* \in (0, 1)$, such that $k^*$ minimizes the linear combination of the averaged type-I and type-II error probabilities, $a\alpha + b\beta$. To describe the procedure in the context of the coefficient hypothesis testing of the linear regression model we are addressing, consider the tangential set to the null hypothesis, which is defined as
$$ T(y) = \{ \theta \in \Theta : g(\theta \mid y) > g(\theta^* \mid y) \}, $$
where $g(\cdot \mid y)$ is the posterior density of $\theta = (\beta, \sigma^2)$ and $\theta^*$ is the point under H for which the posterior attains its maximum value,
$$ \theta^* = \arg\max_{\theta \in \Theta_H} g(\theta \mid y), $$
which can be calculated from the posterior under H, again a normal-inverse gamma distribution. The evidence in favor of H is calculated as the posterior probability of the complement of $T(y)$. That is,
$$ ev(H) = 1 - P(\theta \in T(y) \mid y). \qquad (8) $$
The evidence index, e-value, in favor of a precise hypothesis, considers all points of the parameter space which are less "probable" than some point in $\Theta_H$. A large value of $ev(H)$ means that the subset $\Theta_H$ lies in a high-probability region of $\Theta$, and, therefore, the data support the null hypothesis; on the other hand, a small value of $ev(H)$ means that $\Theta_H$ is in a low-probability region of $\Theta$, and the data would make us discredit the null hypothesis ([23]).
The evidence in (8) can be approximately determined via Monte Carlo simulation. Generating M samples $\theta^{(1)}, \ldots, \theta^{(M)}$ from the posterior distribution of $\theta$, we estimate the evidence through the expression
$$ \widehat{ev}(H) = \frac{1}{M} \sum_{j=1}^{M} \mathbb{I}\left\{ g(\theta^{(j)} \mid y) \leq g(\theta^* \mid y) \right\}. \qquad (9) $$
Now, consider the test that rejects H whenever $ev(H) \leq k$. The averaged error probabilities, expressed in terms of the predictive densities, can be estimated by Monte Carlo simulation through the expressions
$$ \hat{\alpha}(k) = \frac{1}{M} \sum_{j=1}^{M} \mathbb{I}\{ y_H^{(j)} \in C_k \}, \qquad \hat{\beta}(k) = \frac{1}{M} \sum_{j=1}^{M} \mathbb{I}\{ y_A^{(j)} \notin C_k \}, $$
where $y_H^{(j)}$ and $y_A^{(j)}$ are samples generated from $f_H$ and $f_A$, respectively, and $C_k$ is the set
$$ C_k = \{ y : ev(H; y) \leq k \}. $$
So, the adaptive cutoff value for the e-value will be the $k^*$ that minimizes $a\hat{\alpha}(k) + b\hat{\beta}(k)$. The a and b values represent the relative seriousness of the errors of the two types or, equivalently, relative prior preferences for the competing hypotheses. For example, if $a = b$, the two types of error are considered equally severe, whereas if $a > b$, then $\alpha$ undergoes a more intense minimization than $\beta$, which means that the type-I error is considered more serious than the type-II error and also indicates a prior preference for H.
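The Monte Carlo estimator of the evidence can be sketched as follows for a toy bivariate normal posterior with the sharp hypothesis H: θ₁ = 0 (an illustration of the estimator only, not the paper's regression posterior; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy posterior: theta | y ~ N(m, I_2); sharp hypothesis H: theta_1 = 0.
m = np.array([0.5, 1.0])

def log_post(theta):
    # Log-density of N(m, I_2), up to an additive constant.
    d = theta - m
    return -0.5 * (d * d).sum(axis=-1)

# Supremum of the posterior over H, attained at theta* = (0, m[1]).
log_sup_H = log_post(np.array([0.0, m[1]]))

# ev(H): proportion of posterior draws whose density does not exceed the
# supremum on H, i.e., the posterior mass of the complement of the tangential set.
M = 200_000
draws = rng.normal(size=(M, 2)) + m
ev = np.mean(log_post(draws) <= log_sup_H)
```

For this Gaussian toy case the evidence has a closed form, $ev(H) = P(\chi^2_2 \geq m_1^2) = e^{-m_1^2/2}$, which the Monte Carlo estimate reproduces.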
3.3. Significance Index: P-Value
The authors of [13,14] present a new hypothesis-testing procedure using a mixture of frequentist and Bayesian tools. On the one hand, the procedure resembles a frequentist test, as it is based on the comparison of the P-value, as a decision-making evidence measure, with an adaptive significance level. On the other hand, such an adaptive significance level is obtained from the minimization of a linear combination of generalized type-I and type-II error probabilities under a Bayesian perspective. As a result, it generally depends on both the null and alternative hypotheses and on the sample size n, as opposed to standard fixed significance levels. The new proposal may also be seen as a test for the simple hypotheses characterized by the predictive distributions $f_H$ and $f_A$ in Section 3.1 that minimizes a specific linear combination of probabilities of decision errors. It is then formally characterized by a cutoff for the Bayes factor (which takes the place of the likelihood ratio here) and therefore may prevent a decision-maker from rejecting the null hypothesis when the data seem to be clear evidence in its favor ([12]). It should be stressed that under the new proposal, a cutoff value for the Bayes factor (the "likelihood ratio" here) is chosen in advance, and consequently no constraint is imposed exclusively on the probability of the error of the first kind. In this sense, the test in [13,14] completely departs from regular frequentist tests. From another angle, the Bayes factor may be seen as the ratio between the posterior odds in favor of the null hypothesis and its prior odds ([24]). Note that the quantity defined here is a capital-P "P-value" to distinguish it from the small-p "p-value". In the scenario of the linear regression model with unknown variance, the ratio between the two prior predictive densities (4) and (5) will be the Bayes factor,
$$ BF(y) = \frac{f_H(y)}{f_A(y)}. $$
The test that rejects H when $BF(y) < b/a$ minimizes, among all tests, the linear combination of the type-I and type-II error probabilities, $a\alpha + b\beta$. Here again, the a and b values represent the relative seriousness of the errors of the two types. To obtain the P-value at the observed point $y_{obs}$, define the set $C(y_{obs})$ of sample points $y$ for which the Bayes factors are smaller than or equal to the Bayes factor of the observed sample point, that is,
$$ C(y_{obs}) = \{ y : BF(y) \leq BF(y_{obs}) \}. $$
Then, the P-value is the integral of the predictive density under H, $f_H$, over this set:
$$ P(y_{obs}) = \int_{C(y_{obs})} f_H(y) \, dy. $$
, in
Defining the set $C^*$ of sample points $y$ with Bayes factors smaller than or equal to $b/a$, i.e.,
$$ C^* = \{ y : BF(y) \leq b/a \}, $$
the optimal averaged error probabilities from the generalized Neyman–Pearson Lemma, which will depend on the sample size, are given by
$$ \alpha^* = \int_{C^*} f_H(y) \, dy, \qquad \beta^* = \int_{(C^*)^c} f_A(y) \, dy. $$
In order to make a decision, the P-value is compared to the optimal adaptive significance level $\alpha^*$. Then, when $y_{obs}$ is observed, the hypothesis H will be rejected if the P-value is smaller than $\alpha^*$.
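The optimal-test logic behind the adaptive level can be sketched with two simple normal predictive densities standing in for $f_H$ and $f_A$ (a toy illustration under our own assumptions, not the regression predictives of this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in predictives for the mean of n observations:
# under H, ybar ~ N(0, 1/n); under A, ybar ~ N(1, 1/n).
a, b, n = 1.0, 1.0, 50
sd = 1.0 / np.sqrt(n)

def bayes_factor(ybar):
    # BF(ybar) = f_H(ybar) / f_A(ybar) for the two normal predictives.
    return np.exp(-0.5 * (ybar / sd) ** 2 + 0.5 * ((ybar - 1.0) / sd) ** 2)

# Optimal test: reject H when BF < b/a. The resulting error probabilities
# (the adaptive level alpha* and beta*) are estimated by Monte Carlo.
M = 100_000
y_H = rng.normal(0.0, sd, M)                 # samples from f_H
y_A = rng.normal(1.0, sd, M)                 # samples from f_A
alpha = np.mean(bayes_factor(y_H) < b / a)   # adaptive significance level
beta = np.mean(bayes_factor(y_A) >= b / a)   # type-II error probability
```

Increasing n shrinks both predictives around their means, so both error probabilities, and hence the adaptive level, decrease with the sample size, which is the behavior discussed above.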
4. Simulation Study
We developed a simulation study considering two models. The first model was
where
and
. The hypotheses to be tested were
The second model studied was
where
is an
matrix of covariates with
and
is the
vector of coefficients. In this case, the hypotheses of interest were
The averaged error probabilities, $\alpha$ and $\beta$, were calculated using the Monte Carlo method with values generated from the following distributions:
Then, the generated values constitute a random sample from the corresponding conditional distribution of $y$.
In a first stage, we considered model (
11) where
and model (
12) with
. Note that the dimensionality of the parameter space, denoted by
d, is different in the two models: for model (
11), the dimensionality is
and for model (
12), the dimensionality is
. Samples of size
were generated for each model under the respective hypotheses and also for different sample sizes between
and
. In model (
12), the covariate
,
, was generated from a standard normal distribution. Finally, to obtain the adaptive values
and
, the two types of errors were considered as equally severe, that is,
.
Figure 1 shows the averaged error probabilities for the FBST as functions of
k for a sample size
. This was replicated for all sample sizes in order to numerically find the corresponding $k^*$ value that minimizes $a\hat{\alpha}(k) + b\hat{\beta}(k)$.
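The numerical search for the cutoff can be sketched as a grid minimization of $a\hat{\alpha}(k) + b\hat{\beta}(k)$ over k, here with synthetic e-values standing in for those computed from the models (our illustration of the search step only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for e-values computed on samples generated under H and A
# (in the study these come from the fitted regression models).
ev_H = rng.beta(5.0, 1.0, size=5000)   # evidence tends to be large under H
ev_A = rng.beta(1.0, 5.0, size=5000)   # evidence tends to be small under A

a = b = 1.0                            # equally severe errors, as in the study
grid = np.linspace(0.0, 1.0, 1001)

# alpha(k) = P(ev <= k | H): type-I error; beta(k) = P(ev > k | A): type-II.
alpha = np.array([np.mean(ev_H <= k) for k in grid])
beta = np.array([np.mean(ev_A > k) for k in grid])
k_star = grid[np.argmin(a * alpha + b * beta)]
```

For these symmetric stand-in distributions the minimizer sits near 0.5; with the actual e-value distributions it moves with n and d, which is the dependence reported in the tables.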
Table 1 and
Table 2 and
Figure 2 and
Figure 3 present the
and
values as functions of
n for each model. As can be seen, both values have a decreasing trend when the sample size increases. In the case of the cutoff value for the evidence, it is possible to notice the differences in the results when the dimensionality of the parameter space changes. Then, the
value depends not only on the sample size but also on the dimensionality of the parameter space, more specifically, it is greater when
d is higher. However, this does not occur with
, which maintains almost the same values even if
d increases. On the other hand,
Figure 4 and
Figure 5 illustrate that in all these models, the optimal averaged error probabilities and their linear combination also decrease with increasing sample size.
We choose a single random sample
to calculate the
e-value and
P-value for the models.
Table 3 displays the results, with the cases where H is rejected represented by the cells in boldface. It can be observed that the decision remains the same regardless of the index used.
As the second stage in our simulation study, we set two sample sizes
and
to perform the tests for model (
12), increasing the dimensionality of the parameter space. In that scenario, the vector of coefficients was such that
and the hypotheses to be tested were
So, by varying the dimension of vector
, the different models considered for each test were obtained.
Table 4 and
Table 5 and
Figure 6 and
Figure 7 show the
and
values as functions of
d. For
, the values correspond to model (
11). We can say that, for a fixed hypothesis, the larger the dimensionality of the parameter space, the greater the value of
. In the case of the
value, it does not change significantly when the dimensionality of the parameter space increases, except when the number of parameters is very large in relation to the sample size.
Table 6 presents the
e-value and
P-value calculated for a single random sample
. Here, with the
e-value, the null hypothesis is less easily rejected. This may have two causes: approximation error arising from the simulation process, or the fact that the evidence apparently converges to 1 as the dimensionality of the parameter space increases; the latter case would require a more detailed study.
5. Numerical Examples
In this section, we present two applications with real datasets. We choose and as parameters of the inverse gamma prior distribution for . Additionally, in the normal prior for given , and are taken as parameters. The Monte Carlo approximations were made by generating samples of size M = 10,000.
5.1. Budget Shares of British Households Dataset
We select a dataset that draws 1519 observations from the 1980–1982 British Family Expenditure Surveys (FES) ([
25]). In our application, we want to fit the model
We consider as explanatory variables, respectively, the total net household income (rounded to the nearest 10 UK pounds sterling) (), the budget share for alcohol expenditure (), the budget share for fuel expenditure, and the age of household head (). We take the budget share for food expenditure as the dependent variable (). All the expenditures and income are measured in pounds sterling per week.
Table 7 summarizes the results for the hypotheses
,
, by performing the test with the
p-value at
significance level and also the
e-value and the
P-value with their respective adaptive significance levels. The cases where
is rejected are represented by the cells in
boldface.
and
are, respectively, the classical maximum likelihood estimator and the Bayes estimator of
. It can be seen that unlike the
p-value, the
e-value and the
P-value do not reject the hypothesis of nullity of the coefficient associated with the age of household head variable.
Table 8 exposes the optimal averaged error probabilities using the
e-value and the
P-value. It can be noted that the values are very similar with both methodologies.
5.2. Boston Housing Dataset
We also take a dataset that contains information about housing values obtained from census tracts in the Boston Standard Metropolitan Statistical Area (SMSA) in 1970 ([
26]). These data are composed of 506 samples and 14 variables. The regression model we use is
We choose the following explanatory variables to fit our model: per capita crime rate by town (), the proportion of residential land zoned for lots over sq. ft (), the proportion of non-retail business acres per town (), the average number of rooms per dwelling (), the proportion of owner-occupied units built prior to 1940 (), the weighted mean of distances to five Boston employment centers (), the full-value property tax rate per (), the pupil–teacher ratio by town, and , where is the proportion of black people by town (). The dependent variable is the median value of the owner-occupied homes (in 1000s) in the census tract ().
The results for the hypotheses
,
by performing the test with the
p-value, the
e-value and the
P-value, are summarized in
Table 9. In this case, with the
e-value, the null hypotheses are rejected less often. The
e-value does not reject the hypotheses of nullity of the coefficients associated with the proportion of residential land zoned for lots over
sq. ft and proportion of non-retail business acres per town variables, while the
p-value does. On the other hand, the
P-value, unlike the
p-value, does not reject the hypothesis for the proportion of residential land zoned for lots over
sq. ft variable, but it does for the Intercept. As can be observed in
Table 10, for these data, the optimal averaged error probabilities values are also very close.
6. Conclusions
In this work, we present a method to find a cutoff value
for the Bayesian evidence in the FBST by minimizing the linear combination of the averaged type-I and type-II error probabilities for a given sample size
n and also for a given dimensionality
d of the parameter space in the context of linear regression models with unknown variance under the Bayesian perspective. In that sense, we provide a solution to the existing problem in the usual approach of hypothesis-testing procedures based on fixed cutoffs for measures of evidence: the increase of the sample size leads to the rejection of the null hypothesis. Furthermore, we compare our results with those obtained by using the test proposed by the authors of [
13,
14]. With our suggestion of cutoff value for the evidence in the FBST and also with the procedure proposed by the authors of [
13,
14], increasing the sample size implies that the optimal averaged probabilities of both kinds of error and their linear combination decrease, unlike most cases where, by setting a single level of significance independent of the sample size, only the type-II error probability decreases.
A detailed study of more complex models is still needed: the methodology we propose to determine the adaptive cutoff value for the evidence in the FBST could be extended to models with different prior specifications, which would involve, among other things, using approximate methods to find the prior predictive densities under the null and alternative hypotheses.