
From p-Values to Posterior Probabilities of Null Hypotheses

by Daiver Vélez Ramos 1,*, Luis R. Pericchi Guerra 2 and María Eglée Pérez Hernández 2
1 Faculty of Business Administration, Statistical Institute and Computerized Information Systems, Río Piedras Campus, University of Puerto Rico, 15 AVE Universidad STE 1501, San Juan, PR 00925-2535, USA
2 Faculty of Natural Sciences, Department of Mathematics, Río Piedras Campus, University of Puerto Rico, 17 AVE Universidad STE 1701, San Juan, PR 00925-2537, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(4), 618; https://doi.org/10.3390/e25040618
Submission received: 15 February 2023 / Revised: 28 March 2023 / Accepted: 30 March 2023 / Published: 6 April 2023
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)

Abstract:
Minimum Bayes factors are commonly used to transform two-sided p-values to lower bounds on the posterior probability of the null hypothesis, in particular the bound −e · p · log(p). This bound is easy to compute and explain; however, it does not behave as a Bayes factor. For example, it does not change with the sample size. This is a very serious defect, particularly for moderate to large sample sizes, which is precisely the situation in which p-values are the most problematic. In this article, we propose adjusting this minimum Bayes factor with the amount of information to approximate an exact Bayes factor, not only when p is a p-value but also when p is a pseudo-p-value. Additionally, we develop a version of the adjustment for linear models using the recent refinement of the Prior-Based BIC.

1. Introduction

By now, it is well known by practitioners that p-values are not posterior probabilities of a null hypothesis, which is what science would need to declare a scientific finding. So p-values, and particularly the threshold of 0.05, need to be recalibrated. Two widespread practical attempts are (i) the so-called Robust Lower Bound on Bayes factors, BF ≥ −e · p · log(p) [1], and (ii) the replacement of the ubiquitous α = 0.05 by α* = 0.005 [2]. These suggestions, although an improvement on usual practice, fall short of being a real solution, mainly because the dependence of the evidence on the sample size is not considered. Still, the Robust Lower Bound is useful since it is valid for any sample size and only depends on the p-value. It is known that the evidence of a p-value against a point null hypothesis depends on the sample size. In [3], the authors consider p-values in linear models and propose new monotonic minimum Bayes factors that depend on the sample size and converge to −e · p · log(p) as the sample size approaches infinity, which implies they are not consistent, as Bayes factors are. It turns out that the maximum evidence for an exact two-tailed p-value increases with decreasing sample size. There are several proposals in the literature; most do not depend on the sample size, while those that do continue to be Robust Lower Bounds, and none behaves like a real Bayes factor. In this article, we propose to adjust the Robust Lower Bound −e · p · log(p) so that it behaves in a way similar or approximate to actual Bayes factors for any sample size. A further complication arises, however, when the null hypotheses are not simple, that is, when they depend on unknown nuisance parameters. In this situation, what are usually called p-values are only pseudo-p-values [4] (p. 397). So, we first need to extend the validity of the Robust Lower Bound to pseudo-p-values.
The effect of adjusting this minimum Bayes factor with the sample size is shown in a simulation in Section 5.1.
The outline of the article is as follows: In Section 2 we define pseudo-p-values using the p-value definition of [4] (p. 397) and extend for them the validity of the Robust Lower Bound. In Section 3, we present the adaptive significance levels that will be used for incorporating the sample size in the lower bound: the general adaptive significance level presented in [5] and the refined version for linear models developed in [6]; in both cases, we use versions calibrated using the Prior-Based BIC (PBIC) [7]. In Section 4, we derive adaptive approximate Bayes factors and apply them to pseudo-p-values in Section 5. We close in Section 6 with some final comments.

2. Valid p-Values and Robust Lower Bound

Under the null hypothesis, p-values are well known to have a Uniform(0, 1) distribution; in [4] (p. 397), a more general definition is given.
Definition 1. 
A p-value p(X) is a statistic satisfying 0 ≤ p(x) ≤ 1 for every sample point x. Small values of p(X) give evidence that H₁ : θ ∈ Θ₀ᶜ is true, where Θ₀ is some subset of the parameter space and Θ₀ᶜ is its complement. A p-value is valid if, for every θ ∈ Θ₀ and every 0 ≤ α ≤ 1,
\[ P_\theta(p(X) \le \alpha) \le \alpha. \]
Based on this definition, we can say that there are valid p-values that are uniformly distributed in (0, 1), that is,
\[ P_\theta(p(X) \le \alpha) = \alpha \quad \text{for every } \theta \in \Theta_0 \text{ and every } 0 \le \alpha \le 1, \]
and others that are not, that is, for which there is at least one α such that
\[ P_\theta(p(X) \le \alpha) < \alpha \quad \text{for every } \theta \in \Theta_0. \]
Remark 1. 
We consider any valid p-value complying with (2) a pseudo-p-value.
The “Robust Lower Bound” (RLB), as we call it here, proposed by [1], is
\[ B_L(p) = \begin{cases} -e \cdot p \cdot \log(p) & p < e^{-1} \\ 1 & \text{otherwise.} \end{cases} \]
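For concreteness, the bound and the posterior probability floor it induces (used later in Section 5) can be sketched as follows; this is an illustrative Python translation, not the article's own code (Appendix B gives R code):

```python
import math

def robust_lower_bound(p):
    # Robust Lower Bound: B_L(p) = -e * p * log(p) for p < 1/e, and 1 otherwise
    if p >= math.exp(-1):
        return 1.0
    return -math.e * p * math.log(p)

def min_posterior_prob(p):
    # induced lower bound on P(H0 | data): (1 + 1/B_L(p))^(-1)
    b = robust_lower_bound(p)
    return 1.0 / (1.0 + 1.0 / b)
```

For p = 0.05 this gives B_L ≈ 0.407, so P(H₀ | data) is at least about 0.29 — already far less alarming than the 5% the p-value seems to suggest.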
The authors consider that, under the null hypothesis, the distribution of the p-value p(X) is Uniform(0, 1). Alternatives are typically developed by considering alternative models for X, but the results then end up being quite problem-specific. An attractive approach is instead to directly consider alternative distributions for p itself. In effect, they consider that, under H₁, the density of p is f(p | ξ), where ξ is an unknown parameter. So, consider testing
\[ H_0: p \sim \text{Uniform}(0,1) \quad \text{versus} \quad H_1: p \sim f(p \,|\, \xi). \]
If the test statistic T has been appropriately chosen so that large values of T(X) would be evidence in favor of H₁, then the density of p under H₁ should be decreasing in p. A class of decreasing densities for p that is very easy to work with is the class of Beta(ξ, 1) densities for 0 < ξ ≤ 1, given by f(p | ξ) = ξ p^{ξ−1}. The uniform distribution (i.e., H₀) arises from the choice ξ = 1 [1]. The bound satisfies B_L(p) = inf_π B_π(p), where B_π(p) is the Bayes factor of H₀ to H₁ for a given prior density π(ξ) on this alternative.
Note that this calibration had already been proposed in [8]. Another class of decreasing densities is Beta(1, ξ) with ξ > 1. This leads to the “−e · q · log(q)” calibration, where q = 1 − p; see [9].
In contrast with Remark 1, if we consider p(X) a pseudo-p-value under H₀, that is,
\[ p \sim \text{Beta}(\xi_0, 1) \quad \text{with } \xi_0 > 1 \text{ fixed but arbitrary}, \]
under the test
\[ H_0: p \sim \text{Beta}(\xi_0, 1) \quad \text{vs.} \quad H_1: p \sim f(p \,|\, \xi) \]
with f(p | ξ) = Beta(ξ, 1) for 0 < ξ ≤ ξ₀, then a generalized Robust Lower Bound RLB_{ξ₀} can be defined as
\[ B_L(p, \xi_0) = \begin{cases} -e \cdot \xi_0 \cdot p^{\xi_0} \log(p) & p < e^{-1/\xi_0} \\ 1 & \text{otherwise,} \end{cases} \]
where ξ₀ has to be estimated or calculated theoretically (see [10] for a proposal when extending to multiple testing). Any value ξ₀ ≠ 1 corresponds to a pseudo-p-value.
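A small Python sketch of the generalized bound (illustrative only; here ξ₀ is simply passed in rather than estimated):

```python
import math

def rlb_xi(p, xi0):
    # generalized RLB: B_L(p, xi0) = -e * xi0 * p**xi0 * log(p)
    # for p < e**(-1/xi0), and 1 otherwise
    if p >= math.exp(-1.0 / xi0):
        return 1.0
    return -math.e * xi0 * p ** xi0 * math.log(p)
```

With ξ₀ = 1 this reduces to the plain Robust Lower Bound; for ξ₀ > 1 it lies strictly below it.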
On the other hand, since f(p | ξ) = ξ p^{ξ−1} has its maximum at ξ = −1/log(p) < 1 when p < e⁻¹, f(p | ξ) is decreasing for ξ > −1/log(p); thus, for any Bayes factor B₀₁,
\[ B_{01} \ge B_L(p) > B_L(p, \xi_0) \quad \text{with } \xi_0 > 1. \]
See Figure 1.
In the following, we calibrate RLB_{ξ₀} such that RLB_{ξ₀} ≤ B₀₁.
Lemma 1. 
B_L(p_val, ξ) = −e · ξ · p_val^ξ · log(p_val) ≥ e · ξ · p_val^ξ > p_val^ξ, for 0 < p_val < e⁻¹ and ξ ≥ 1. Note that B_L(p_val, 1) = B_L(p_val).
Proof. 
Appendix A.    □
Theorem 1. 
The RLB_ξ is a valid p-value for ξ ≥ 1, that is,
\[ P(B_L(p, \xi) \le \alpha \,|\, p \sim f(p \,|\, \xi)) \le \alpha, \quad \text{for each } 0 \le \alpha \le 1. \]
Proof. 
Appendix A.    □

3. Adaptive α with PBIC Strategy

For several decades, the Bayesian literature has criticized the implementation of hypothesis testing with fixed significance levels and, in particular, the use of the scale p-value < 0.05. An adaptive α allows us to adjust the statistical significance to the amount of information; see [5,11,12]. The adaptive values we work with in this section were calculated so as to yield results equivalent to those obtained with a Bayes factor. In [5], the authors present an adaptive α based on BIC as
\[ \alpha_n(q) = \frac{\left[\chi_\alpha^2(q) + q \log(n)\right]^{q/2 - 1}}{2^{q/2 - 1}\, n^{q/2}\, \Gamma(q/2)} \times C_\alpha, \]
where C_α is a calibration constant, and strategies for calculating it are presented in [5]. It yields a consistent procedure; it alleviates the problem of the divergence between practical and statistical significance; and it makes it possible to perform Bayesian testing by computing intervals with the calibrated α-levels.
An adaptive α is also presented in [6], but this time in a version refined for nested linear models, with calibration based on the Prior-Based Bayesian Information Criterion (PBIC) [7]:
\[ \alpha_{(b,n)}(q) = \frac{\left[g_{n,\alpha}(q) + \log(b) + C\right]^{q/2 - 1}}{b^{\frac{n-j}{2(n-1)}}\, \Gamma(q/2)} \left[\frac{2(n-1)}{n-j}\right]^{q/2 - 1} \times \exp\left\{-\frac{n-j}{2(n-1)}\left(g_{n,\alpha}(q) + C\right)\right\}. \]
Here, b = |X_j^t X_j| / |X_i^t X_i|, where X_i and X_j are the design matrices of the two models, and
\[ C = -2 \sum_{m_i=1}^{q_i} \log\!\left(\frac{1 - e^{-v_{m_i}}}{\sqrt{2}\, v_{m_i}}\right) + 2 \sum_{m_j=1}^{q_j} \log\!\left(\frac{1 - e^{-v_{m_j}}}{\sqrt{2}\, v_{m_j}}\right), \]
with v_{m_l} = ξ̂²_{m_l} / [d_{m_l}(1 + n_{m_l e})] for l = i, j corresponding to each model. Here, n_{m_l e}, with l = i, j, refers to The Effective Sample Size (TESS) corresponding to that parameter; see [7].
The adaptive α in (5) can also be presented using the PBIC strategy (this strategy was not considered in [5]), and the following expression is obtained:
\[ \alpha_n(q) = \frac{\left[\chi_\alpha^2(q) + q \log(n) + C\right]^{q/2 - 1}}{n^{q/2}\, 2^{q/2 - 1}\, \Gamma(q/2)} \times \exp\left\{-\frac{1}{2}\left(\chi_\alpha^2(q) + C\right)\right\}. \]
Note that this adaptive α still has the BIC structure, since the term χ²_α(q) + q log(n) remains.
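This adaptive level is easy to evaluate; a hedged Python sketch (the article's own code is in R; here the χ² quantile is a fixed table value and C is supplied by the caller, with C = 0 as a neutral default):

```python
import math

CHI2_95_1DF = 3.8415  # upper 0.05 quantile of chi-square with 1 df (table value)

def adaptive_alpha(q, n, chi2_q, C=0.0):
    # alpha_n(q) = [chi2 + q log(n) + C]^(q/2 - 1)
    #              / (n^(q/2) * 2^(q/2 - 1) * Gamma(q/2)) * exp(-(chi2 + C)/2)
    t = chi2_q + q * math.log(n) + C
    return (t ** (q / 2 - 1)
            / (n ** (q / 2) * 2 ** (q / 2 - 1) * math.gamma(q / 2))
            * math.exp(-(chi2_q + C) / 2))
```

For q = 1 and C = 0, the nominal α = 0.05 becomes roughly 0.004 at n = 100 and keeps shrinking as n grows, which is the intended behavior.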

Example: Binomial Models

Consider comparing two binomial models S₁ ∼ Binomial(n₁, p₁) and S₂ ∼ Binomial(n₂, p₂) via the test
\[ H_0: p_1 = p_2 \quad \text{vs.} \quad H_1: p_1 \neq p_2. \]
Defining n = n₁ + n₂ and p̂, the MLE of the common value p₁ = p₂, then (7) gives
\[ \alpha_n = \sqrt{\frac{2}{n\pi}}\left(\chi_\alpha^2(1) + \log(n) + C\right)^{-1/2} \times \exp\left\{-\frac{1}{2}\left(\chi_\alpha^2(1) + C\right)\right\}; \]
here, χ²_α(1) is the upper α quantile of the chi-square distribution with df = 1, C = −2 log[(1 − e^{−v}) / (√2 v)], v = p̂² / [d(1 + n_e)], d = σ₁²/n₁ + σ₂²/n₂, and n_e = max{n₁²/σ₁², n₂²/σ₂²} · d.
Table 1 shows the behavior of this adaptive α n for α = 0.05 and different values of n 1 and n 2 .

4. Adjusting RLB_ξ Using Adaptive α

In this section, we combine (3) with the formulas for adaptive α in (6) and (7) to adjust RLB_ξ and obtain an approximation to an objective Bayes factor. Indeed, we adjust the RLB_ξ through the expression B(α) = B_L(α, ξ₀) · g(·), where g is determined in such a way that, when B(α) is evaluated at (6) or (7), it converges to a constant (this allows us to obtain equivalent results from the frequentist and Bayesian points of view; that is, the decision does not change).
Substituting p in (3) by the adaptive α value in (7) results in the following expression:
\[ B(\alpha, q, n, \xi_0) = -\alpha^{\xi_0} \log(\alpha)\, \Gamma(q/2)\, \xi_0\, n^{\xi_0 q/2} \left[\frac{2}{\chi_\alpha^2(q) + q \log(n) + C}\right]^{\xi_0 q/2} e^{\,\xi_0 - 1}. \]
For a Uniform(0, 1) p-value with ξ₀ = 1, this expression simplifies to
\[ B(\alpha, q, n) = -\alpha \log(\alpha)\, \Gamma(q/2)\, n^{q/2} \left[\frac{2}{\chi_\alpha^2(q) + q \log(n) + C}\right]^{q/2}. \]
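In Python, a minimal sketch of this approximate Bayes factor (illustrative: the χ² quantile is a fixed table value for α = 0.05 with q = 1, and C is an input set to 0):

```python
import math

CHI2_95_1DF = 3.8415  # upper 0.05 quantile of chi-square with 1 df (table value)

def bfg(alpha, q, n, chi2_q, C=0.0):
    # B(alpha, q, n) = -alpha*log(alpha)*Gamma(q/2)*n^(q/2)
    #                  * (2 / (chi2 + q*log(n) + C))^(q/2)
    return (-alpha * math.log(alpha) * math.gamma(q / 2) * n ** (q / 2)
            * (2.0 / (chi2_q + q * math.log(n) + C)) ** (q / 2))
```

For α = 0.05, q = 1, n = 100, and C = 0 this gives B ≈ 1.29; that is, a p-value of 0.05 already corresponds to a posterior probability of H₀ slightly above 1/2. Unlike the static bound, the adjustment grows with n.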
The refined version of this calibration for linear models is obtained when (3) is evaluated in (6):
\[ B(\alpha, q, n, b) = -\alpha \log(\alpha)\, \Gamma(q/2)\, b^{\frac{n-j}{2(n-1)}} \left[\frac{2(n-1)}{\left(g_{n,\alpha}(q) + \log(b) + C\right)(n-j)}\right]^{q/2}; \]
in this case, we only consider ξ₀ = 1.

Balanced One-Way ANOVA

Suppose we have k groups with r observations each, for a total sample size of kr, and let H₀: μ₁ = ⋯ = μ_k = μ vs. H₁: at least one μ_i different. Then, the design matrices for the two models are X₁ = 1_{kr} (a single column of ones) and X_k = I_k ⊗ 1_r (one indicator column of r ones per group), and
\[ b = \frac{|X_k^t X_k|}{|X_1^t X_1|} = \frac{r^{k-1}}{k}, \]
and the adaptive α for the linear model, in accordance with what was presented in [6], is
\[ \alpha(k, r) = \frac{\left[g_{r,\alpha}(k-1) - \log(k) + (k-1)\log(r) + C\right]^{\frac{k-3}{2}}}{\left(k^{-1} r^{k-1}\right)^{\frac{r-1}{2(r - 1/k)}}\, \Gamma\!\left(\frac{k-1}{2}\right)} \left[\frac{2(r - 1/k)}{r-1}\right]^{\frac{k-3}{2}} \times \exp\left\{-\frac{r-1}{2(r - 1/k)}\left(g_{r,\alpha}(k-1) + C\right)\right\}. \]
Here, the number of replicates r is The Effective Sample Size (TESS). Therefore, the approximate Bayes factor for this test calculated with (8) is
\[ B(\alpha, k, r) = -\alpha \log(\alpha)\, \Gamma\!\left(\frac{k-1}{2}\right) \left(\frac{r^{k-1}}{k}\right)^{\frac{r-1}{2(r - 1/k)}} \left[\frac{2(r - 1/k)}{\left(g_{r,\alpha}(k-1) - \log(k) + (k-1)\log(r) + C\right)(r-1)}\right]^{\frac{k-1}{2}}. \]
A very important case arises when k = 2. For this situation, the last formula simplifies to
\[ B(\alpha, r) = -\alpha \log(\alpha) \left(\frac{r}{2}\right)^{\frac{r-1}{2(r - 1/2)}} \left[\frac{2(r - 1/2)\,\pi}{\left(g_{r,\alpha}(1) + \log\!\left(\frac{r}{2}\right) + C\right)(r-1)}\right]^{1/2}. \]

5. Obtaining Bounds for P(H₀ | Data)

In this section, we use (9) and (11) to produce bounds for the posterior probability of the null hypothesis H 0 .
Since for any Bayes factor B₀₁,
\[ B_{01} \ge B_L(p, \xi_0) \quad \text{with } \xi_0 \ge 1 \text{ fixed but arbitrary}, \]
a lower bound for the posterior probability of the null hypothesis can be obtained as
\[ \min P(H_0 \,|\, \mathrm{Data}) = \left[1 + \frac{1}{B_L(p, \xi_0)}\right]^{-1}. \]
Figure 2 shows these posterior probabilities (called P_{RLBξ₀}) for different values of ξ₀. To simplify the use of these Bayes factors, we call BF_{Gξ₀} the Bayes factor of Equation (9), BF_G the Bayes factor of Equation (10), and BF_L the Bayes factor of Equation (11).

5.1. Testing Equality of Two Means

Consider comparing two normal means via the test
\[ H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2, \]
where the associated known variances, σ₁² and σ₂², are not necessarily equal. The model is
\[ Y = X\mu + \epsilon = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \epsilon_{11} \\ \vdots \\ \epsilon_{2 n_2} \end{pmatrix}, \quad \epsilon \sim N\!\left(0, \mathrm{diag}\{\underbrace{\sigma_1^2, \ldots, \sigma_1^2}_{n_1}, \underbrace{\sigma_2^2, \ldots, \sigma_2^2}_{n_2}\}\right). \]
Defining ν = (μ₁ + μ₂)/2 and ζ = (μ₁ − μ₂)/2 places this in the linear model comparison framework,
\[ Y = B \begin{pmatrix} \nu \\ \zeta \end{pmatrix} + \epsilon \quad \text{with} \quad B = \begin{pmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{pmatrix}, \]
where we are comparing M₀: ζ = 0 versus M₁: ζ ≠ 0.
So, for BF_G and BF_L,
\[ C = -2 \log\!\left(\frac{1 - e^{-v}}{\sqrt{2}\, v}\right), \quad v = \frac{\hat{\zeta}^2}{d(1 + n_e)}, \quad d = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \quad n_e = \max\left\{\frac{n_1^2}{\sigma_1^2}, \frac{n_2^2}{\sigma_2^2}\right\} \left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \]
A special case is the standard test of equality of means when σ₁² = σ₂² = σ². Then,
\[ n_e = \min\left\{n_1\left(1 + \frac{n_1}{n_2}\right),\ n_2\left(1 + \frac{n_2}{n_1}\right)\right\}. \]
On the other hand, considering μ = μ₁ − μ₂ with σ₁² = σ₂² = σ²:
  • H₀: μ₁ = μ₂ ⇔ μ = 0;
  • H₁: μ₁ ≠ μ₂ ⇔ μ ≠ 0.
Assuming priors:
  • μ | σ², H₁ ∼ Normal(0, σ²/τ₀), τ₀ ∈ (0, ∞);
  • π(σ²) ∝ 1/σ² for both H₀ and H₁.
The Bayes factor is
\[ BF_{01} = \left(\frac{n + \tau_0}{\tau_0}\right)^{1/2} \left(\frac{t^2 \dfrac{\tau_0}{n + \tau_0} + l}{t^2 + l}\right)^{\frac{l+1}{2}}, \]
where
\[ t = \frac{|\bar{Y}|}{s/\sqrt{n}} \]
is a t-statistic with degrees of freedom l = n − 1 and n = n₁ + n₂; see [13].
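This Bayes factor is straightforward to compute; a minimal Python sketch mirroring the formula above (illustrative, with l = n − 1):

```python
def bf01_two_means(t, n, tau0):
    # BF_01 = ((n + tau0)/tau0)^(1/2)
    #         * ((t^2 * tau0/(n + tau0) + l) / (t^2 + l))^((l + 1)/2), l = n - 1
    l = n - 1
    return (((n + tau0) / tau0) ** 0.5
            * ((t * t * tau0 / (n + tau0) + l) / (t * t + l)) ** ((l + 1) / 2))
```

For example, with n = 50 and τ₀ = 6, a t-statistic of 2 gives BF₀₁ ≈ 0.53 (only mild evidence against H₀), while t = 5 drives the Bayes factor essentially to zero.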
Figure 3 shows the posterior probability of the null hypothesis H₀ when n = 50 and n = 100 for the Robust Lower Bound with ξ₀ = 1 (called P_RLB), the Bayes factor BF_L (called P_BFL), the Bayes factor BF_G (called P_BFG), and the Bayes factor BF₀₁ (called P_BF01). Note that the posterior probability with BF₀₁ when τ₀ = 6 looks very similar to the results obtained using the Bayes factors BF_L and BF_G.
We now present a simulation showing that our adjustment, or calibration, of RLB_ξ works quite similarly to an exact Bayes factor. We perform the following experiment: We simulate r data points from each of two normal distributions, N(μ₁, σ) and N(μ₂, σ). We repeat this K times. For all K replicates, μ₁ − μ₂ = 0. For all K replicates, we test the hypotheses H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂, and then we count how many of the p-values lie between 0.05 − ε and 0.05. Note that all of these p-values would be considered sufficient to reject H₀ if α = 0.05 is selected. Finally, we determine the proportion of these “significant” p-values obtained from samples where H₀ is true.
Table 2 presents the mean percentage of these significant p-values coming from samples where H₀ is true for 100 iterations of the simulation scheme with K = 8000, σ = 1, and ε = 0.05 for r = 10, 50, 100, 500, and 1000. As expected, the distribution of the p-values behaves as Uniform(0, 1) under H₀, since H₀ was assumed true in the K replicates. Table 2 also presents the proportion of posterior probabilities of H₀ greater than or equal to 0.5 (50%) when using the RLB_ξ, when corrected according to the method suggested in this document (Equations (10) and (11)), and when an exact Bayes factor (Equation (14)) is used. It is clear that the method suggested here behaves very similarly to an exact Bayes factor.
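A scaled-down, illustrative version of this experiment can be sketched in Python (K and r reduced for speed; a two-sided z-test with known σ = 1 stands in here for the test of two means):

```python
import math
import random

def two_sample_p(x, y, sigma=1.0):
    # two-sided z-test p-value for H0: mu1 = mu2 with known common sigma
    r = len(x)
    z = (sum(x) / r - sum(y) / r) / (sigma * math.sqrt(2.0 / r))
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal cdf
    return 2.0 * (1.0 - phi)

random.seed(1)
r, K = 50, 2000
hits = 0
for _ in range(K):
    x = [random.gauss(0.0, 1.0) for _ in range(r)]
    y = [random.gauss(0.0, 1.0) for _ in range(r)]
    if 0.0 < two_sample_p(x, y) <= 0.05:  # "significant" at alpha = 0.05
        hits += 1
prop = hits / K  # under H0, close to 0.05 by uniformity of the p-value
```

Since H₀ is true in every replicate, about 5% of the p-values fall below 0.05, and every one of those “findings” is a false positive.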

5.2. Fisher’s Exact Test

This is an example where the p-value is a pseudo-p-value (see Example 8.3.30 in [4]). Let S₁ and S₂ be independent observations with S₁ ∼ Binomial(n₁, p₁) and S₂ ∼ Binomial(n₂, p₂). Consider testing H₀: p₁ = p₂ vs. H₁: p₁ ≠ p₂.
Under H₀, if we let p be the common value of p₁ = p₂, the joint pmf of (S₁, S₂) is
\[ f(s_1, s_2 \,|\, p) = \binom{n_1}{s_1} \binom{n_2}{s_2} p^{s_1 + s_2} (1 - p)^{n_1 + n_2 - (s_1 + s_2)}, \]
and the conditional pseudo-p-value is
\[ p(s_1, s_2) = \sum_{j = s_1}^{\min\{n_1, s\}} f(j \,|\, s), \]
the sum of hypergeometric probabilities, with s = s₁ + s₂.
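This conditional pseudo-p-value is easy to evaluate directly; an illustrative Python sketch using exact binomial coefficients:

```python
from math import comb

def fisher_pseudo_p(s1, s2, n1, n2):
    # p(s1, s2) = sum_{j = s1}^{min(n1, s)} f(j | s),  s = s1 + s2,
    # where f(j | s) is the hypergeometric pmf C(n1,j) C(n2,s-j) / C(n1+n2,s)
    s = s1 + s2
    total = comb(n1 + n2, s)
    return sum(comb(n1, j) * comb(n2, s - j)
               for j in range(s1, min(n1, s) + 1)) / total
```

For instance, with n₁ = n₂ = 5 and (s₁, s₂) = (4, 1), the tail probability is 26/252 ≈ 0.103.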
Remark 2. 
It does not seem simple to estimate the appropriate ξ₀ that best fits the pseudo-p-value in (15); in Figure 4, some arbitrary possibilities are given.
It is important to note that in Bayesian tests with a point null hypothesis, it is not possible to use continuous prior densities, because these distributions (as well as the posterior distributions) grant zero probability to the point p₀, the common value of p₁ = p₂. A reasonable approximation is to give p = p₀ a positive probability π₀ and to give p ≠ p₀ the prior distribution π₁ g₁(p), where π₁ = 1 − π₀ and g₁ is proper. One can think of π₀ as the mass that would be assigned to the realistic null hypothesis H₀: p ∈ (p₀ − b, p₀ + b) had it not been preferred to approximate it by the point null hypothesis. Therefore, if
\[ \pi(p) = \begin{cases} \pi_0 & p = p_0 \\ \pi_1\, g_1(p) & p \neq p_0, \end{cases} \]
then
\[ m(s) = \int_\Theta f(s \,|\, p)\, \pi(p)\, dp = f(s \,|\, p_0)\, \pi_0 + \pi_1 \int_{p \neq p_0} f(s \,|\, p)\, g_1(p)\, dp = f(s \,|\, p_0)\, \pi_0 + (1 - \pi_0)\, m_1(s), \]
where m₁(s) = ∫_{p ≠ p₀} f(s | p) g₁(p) dp is the marginal density of S = S₁ + S₂ with respect to g₁.
So,
\[ \pi(p_0 \,|\, s) = \frac{\pi_0\, f(s \,|\, p_0)}{m(s)}, \]
thus
\[ \text{posterior odds} = \frac{\pi(p_0 \,|\, s)}{1 - \pi(p_0 \,|\, s)} = \frac{\pi_0\, f(s \,|\, p_0)}{m(s) - \pi_0\, f(s \,|\, p_0)} = \frac{\pi_0\, f(s \,|\, p_0)}{(1 - \pi_0)\, m_1(s)} = \frac{\pi_0}{\pi_1} \cdot \frac{f(s \,|\, p_0)}{m_1(s)} = \text{prior odds} \cdot \frac{f(s \,|\, p_0)}{m_1(s)}, \]
and the Bayes factor is
\[ B_{01} = \frac{f(s \,|\, p_0)}{m_1(s)}. \]
Now, if we take g₁(p) = Beta(a, b) such that E(p) = a/(a + b) = p₀, then
\[ BF_{Test} = \frac{B(a, b)}{B(s + a,\ n_1 + n_2 - s + b)}\, p_0^{\,s} (1 - p_0)^{\,n_1 + n_2 - s}. \]
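This Bayes factor can be sketched in Python via log-gamma evaluations of the Beta function (an illustrative translation of the formula above, not the article's own code):

```python
import math

def beta_fn(a, b):
    # Beta function computed on the log scale for numerical stability
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def bf_test(s, n1, n2, a, b):
    # BF_Test = B(a, b) / B(s + a, n1 + n2 - s + b) * p0^s (1 - p0)^(n1 + n2 - s),
    # with p0 = a / (a + b), the prior mean under g1 = Beta(a, b)
    p0 = a / (a + b)
    n = n1 + n2
    return (beta_fn(a, b) / beta_fn(s + a, n - s + b)
            * p0 ** s * (1 - p0) ** (n - s))
```

With a = 7, b = 3 (so p₀ = 0.7) and n₁ = n₂ = 25, a count of s = 35 successes (right at n·p₀) favors H₀ with BF ≈ 2.5, while s = 10 yields an essentially null Bayes factor.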
Figure 4 shows the posterior probability of the null hypothesis H₀ when n = n₁ + n₂ = 50 and 100, for the Robust Lower Bound, the Bayes factor BF_{Gξ₀} (called P_{BFGξ₀}), the Bayes factor BF_G (called P_BFG), and the Bayes factor BF_Test (called P_BFTest). We can note that all the P_{BFGξ₀} are comparable, even though the case ξ₀ = 1 (P_BFG) corresponds to a p-value and not a pseudo-p-value.

5.3. Linear Regression Models

Consider comparing two nested linear models, M₃: y_l = λ₁ + λ₂ x_{l2} + λ₃ x_{l3} + ε_l and M₂: y_l = λ₁ + λ₂ x_{l2} + ε_l, via the test
\[ H_0: M_2 \quad \text{versus} \quad H_1: M_3, \]
with 1 ≤ l ≤ n, where the errors ε_l are assumed to be independent and normally distributed with unknown residual variance σ². According to Equation (3) in [6,7],
\[ b = (n - 1)\, s_3^2\, (1 - \rho_{23}^2), \]
where s₃² is the sample variance of x_{l3}, ρ₂₃ is the correlation between x_{l2} and x_{l3}, and
\[ C = -2 \log\!\left(\frac{1 - e^{-v_2}}{\sqrt{2}\, v_2}\right) + 2 \log\!\left(\frac{1 - e^{-v_3}}{\sqrt{2}\, v_3}\right), \]
where v₂ = λ̂₂² / [d₂(1 + n_{2e})], d₂ = σ² / s²_{x₂}, n_{2e} = s²_{x₂} / max_i{(x_{i2} − x̄₂)²}, and v₃ = λ̂₃² / [d₃(1 + n_{3e})], d₃ = σ² (X̃ᵗX̃)⁻¹, n_{3e} = X̃ᵗX̃ / max_i{|X̃_i|²}, with X̃ = (I_n − X*(X*ᵗX*)⁻¹X*ᵗ) x_{l3} and X* = (1_n | x_{l2}).
As an example, we analyze a data set taken from [14], which can be accessed at http://academic.uprm.edu/eacuna/datos.html (accessed on 13 January 2022). We want to predict the average mileage per gallon (denoted by mpg) of a set of n = 82 vehicles using four possible predictor variables: cabin capacity in cubic feet (vol), engine power (hp), maximum speed in miles per hour (sp), and vehicle weight in hundreds of pounds (wt).
Through the Bayes factors BF_G and BF_L, we want to choose the best model for predicting the average mileage per gallon by calculating the posterior probability of the null hypothesis of the following test:
\[ H_0: M_2: \text{mpg}_l = \lambda_1 + \lambda_2\, \text{wt}_l + \epsilon_l \quad \text{vs.} \quad H_1: M_3: \text{mpg}_l = \lambda_1 + \lambda_2\, \text{wt}_l + \lambda_3\, \text{sp}_l + \epsilon_l. \]
With α = 0.05, q = 1, and j = 3, the posterior probabilities of the null hypothesis H₀ are
\[ P_{BFL} = 0.9253192, \quad P_{BFG} = 0.7209449. \]
The use of these posterior probabilities will, in both cases, change the inference, since the p-value of the F test is p = 0.0325, which is smaller than 0.05.

Findley’s Counterexample

Consider the following simple linear model [15]:
\[ Y_i = \frac{1}{\sqrt{i}}\, \theta + \epsilon_i, \quad \text{where } \epsilon_i \sim N(0, 1),\ i = 1, 2, 3, \ldots, n, \]
and we are comparing the models H₀: θ = 0 and H₁: θ ≠ 0. This is a classical and challenging counterexample against BIC and the Principle of Parsimony. In [7], BIC is shown to be inconsistent for this problem, while PBIC is shown to be consistent.
Here, we show through the posterior probabilities of the null hypothesis that the Bayes factor BF_G (based on BIC) is inconsistent, while the Bayes factor BF_L (based on PBIC) is consistent. We perform the analysis in two contexts: first, when n grows and α = 0.05 or α = 0.01 is fixed; second, when n is fixed and 0 < α < 0.05. For the calculations,
\[ C = -2 \log\!\left(\frac{1 - e^{-v}}{\sqrt{2}\, v}\right), \quad v = \frac{\hat{\theta}^2}{d(1 + n_e)}, \quad d = \left(\sum_{i=1}^n \frac{1}{i}\right)^{-1}, \quad n_e = \sum_{i=1}^n \frac{1}{i}. \]
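The quantities d and n_e have a simple closed form here: n_e is the harmonic number, which grows only like log(n). A tiny Python sketch:

```python
import math

def tess_findley(n):
    # TESS for Findley's model: n_e = sum_{i=1}^n 1/i (the harmonic number),
    # which grows like log(n) + 0.5772...; and d = 1 / n_e
    return sum(1.0 / i for i in range(1, n + 1))
```

The information accumulates only logarithmically (n_e ≈ 7.49 at n = 1000), which is exactly why this example is so stressful for BIC-type approximations.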
Figure 5 and Figure 6 show, through the posterior probability of the null hypothesis H₀, the consistency of the Bayes factor based on PBIC (P_BFL), as well as the inconsistency of the Bayes factor based on BIC (P_BFG).

6. Discussion and Final Comments

1.
Lower bounds have been an important development, giving practitioners alternatives to classical testing with fixed α levels. A deep-seated problem with the useful bound −e · p · log(p) is that it depends on the p-value, as it should, but it is static: it is not a function of the sample size n. This limitation makes the bound of little use for moderate to large sample sizes, which is arguably where a correction to p-values is most needed.
2.
The approximation developed here, as a function of the p-value and the sample size, has a distinct advantage over other approximations, such as BIC, in that it is a valid approximation for any sample size.
3.
The (approximate) Bayes factors (9) and (11) are simple to use and provide results comparable to those of exact Bayes factors for hypothesis tests. In this article, we extended the validity of the approximation to “pseudo-p-values,” which are ubiquitous in statistical practice. We hope that this development will give the practice of statistics tools to bring the posterior probability of hypotheses closer to everyday statistical practice, in which p-values (or pseudo-p-values) are calculated routinely. This allows an immediate and useful comparison between raw p-values and (approximate) posterior odds.

Author Contributions

Conceptualization, D.V.R., L.R.P.G. and M.E.P.H.; methodology, D.V.R., L.R.P.G. and M.E.P.H.; software, D.V.R.; validation, D.V.R., L.R.P.G. and M.E.P.H.; formal analysis, D.V.R., L.R.P.G. and M.E.P.H.; investigation, D.V.R., L.R.P.G. and M.E.P.H.; writing—original draft preparation, D.V.R.; writing—review and editing, D.V.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The real datasets are freely available at http://academic.uprm.edu/eacuna/datos.html.

Acknowledgments

The first author gratefully acknowledges financial support from the Faculty of Business Administration of the University of Puerto Rico, Río Piedras Campus. The work of L.R. Pericchi and M.E. Pérez has been partially funded by NIH grants U54CA096300, P20GM103475, and R25MD010399.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1. 
Let h(p_val) = −e · ξ · log(p_val); then dh(p_val)/dp_val = −e · ξ / p_val < 0; thus, h is decreasing, with minimum over (0, e⁻¹] at p_val = e⁻¹. So, h(p_val) ≥ h(e⁻¹) = e · ξ, which implies B_L(p_val, ξ)/p_val^ξ = h(p_val) ≥ e · ξ, so B_L(p_val, ξ) ≥ e · ξ · p_val^ξ > p_val^ξ.    □
Proof of Theorem 1. 
First of all, it can be seen that B_L(p, ξ) = −e · ξ · p^ξ · log(p) is well defined, since 0 ≤ B_L(p, ξ) ≤ 1.
Let α ∈ [0, 1] and denote by D_B the subset of R_p (the range of p) such that
\[ -e \cdot \xi \cdot p^{\xi} \cdot \log(p) \le \alpha; \]
then
\[ (B_L(p, \xi) \le \alpha) = [-e \cdot \xi \cdot p^{\xi} \cdot \log(p) \le \alpha] = (p \in D_B), \]
where (p ∈ D_B) is the event consisting of all the results x such that the point p(x) ∈ D_B. Therefore,
\[ F_B(\alpha) = P(B_L(p, \xi) \le \alpha \,|\, p \sim f(p \,|\, \xi)) = P(p \in D_B \,|\, p \sim f(p \,|\, \xi)) = \int_{D_B} f_p(p)\, dp = \int_0^{\rho} \xi\, p^{\xi - 1}\, dp = \rho^{\xi}, \]
where ρ is determined such that
\[ 0 < \rho < e^{-1} \quad \text{and} \quad \alpha = -e \cdot \xi \cdot \rho^{\xi} \cdot \log(\rho), \]
as shown in Figure A1 for the case ξ = 1.
   
Figure A1. Proof of Theorem 1: graph of the generalized Robust Lower Bound for ξ = 1 (B_L(p, 1)), identifying the value ρ where −e · ρ · log(ρ) = α.
Now, by Lemma 1, F_B(α) = ρ^ξ < −e · ξ · ρ^ξ · log(ρ) = α.    □

Appendix B. Codes

X=function(n1=10, n2=10) { # function header restored (lost in extraction)
I=seq(1, n1+n2, 1)
y=I
for (i in I) {
y[i]=1
}
return(y)
}
Y=function(n1=10, n2=10) {
I=seq(1, n1+n2, 1)
y=rep(-1, n1+n2)
for (i in I) {
y[i]=1
}
return(y)
}
ml=function(n1=10, n2=10) {return(lm(X(n1, n2)~Y(n1, n2)))}
sigma=function(n1=10, n2=10){
return(as.numeric(summary(ml(n1, n2))$sigma^2))}
d=function(n1=10, n2=10){return(sigma(n1, n2)*(1/n1+1/n2))}
ne=function(n1=10, n2=10){return(min(n1*(1+n1/n2), n2*(1+n2/n1)))} # parenthesis fixed
beta.=function(n1=10, n2=10){
return(as.numeric(ml(n1, n2)$coefficients[2]^2))}
v=function(n1=10, n2=10){
return(beta.(n1, n2)/(d(n1, n2)*(1+ne(n1, n2))))} # "1+" restored
C=function(n1=10, n2=10){
return(-2*log((1-exp(-v(n1, n2)))/(sqrt(2)*v(n1, n2))))}
# Adaptive alpha eq.8
alphabinom=function(n1, n2, alpha){
sqrt(2/((n1+n2)*pi*(qchisq(alpha, df=1, lower.tail=F)
+log(n1+n2)
+C(n1, n2))))*exp(-(qchisq(alpha, df=1, lower.tail=F)
+C(n1, n2))/2)
}
# RLB_xi
RLB=function(a,b){
-exp(1)*b*a^b*log(a)}
pval=seq(0.001,0.36,0.00001)
plot(pval,RLB(pval,1),col=4,lty=4,
ylab=expression(paste(B[L](p,xi[0]))),
xlab=expression(paste(p)),type="l")
lines(pval,RLB(pval,1.1),col=5,lty=5)
lines(pval,RLB(pval,1.2),col=6,lty=6)
lines(pval,RLB(pval,1.3),col=7,lty=7)
legend(0.01,1,col =c(4,5,6,7),
c(expression(paste(xi[0]==1)),
expression(paste(xi[0]==1.1)),
expression(paste(xi[0]==1.2)),
expression(paste(xi[0]==1.3))),
lty=c(4,5,6,7),cex = 0.8)
plot(pval,RLB(pval,1),
ylab=expression(paste(B[L](p,1))),
xlab=expression(paste(p)),type="l")
abline(h=RLB(.1,1),lty=2,col="blue")
abline(v=0)
abline(h=0)
segments(0.1,0,0.1,RLB(0.1,1),lty=2)
arrows(0.001,RLB(0.1,1),0.025,0.8,length = 0.1)
arrows(0.1,0,0.125,0.2,length = 0.1)
legend(0.01,0.9,expression(paste(alpha)),bty = "n")
legend(0.11,0.3,expression(paste(rho)),bty = "n")
alpha=seq(0.000000000001,.05,.00001)
# posterior probability of H_0
pP=function(a){
1/(1+1/(a))}
# posteriors probability (RLB_xi)
plot(alpha,pP(RLB(alpha,1)),col=4,lty=4,xlab="p",
ylab=expression(paste(minP(H[0]/x))),type = "l")
lines(alpha,pP(RLB(alpha,1.1)),col=6,lty=6)
lines(alpha,pP(RLB(alpha,1.2)),col=9,lty=9)
lines(alpha,pP(RLB(alpha,1.3)),col=10,lty=10)
legend(0,.28,col =c(4,6,9,10),
c(expression(paste(P[RLB])),
expression(paste(P[RLB[1.1]])),
expression(paste(P[RLB[1.2]])),
expression(paste(P[RLB[1.3]]))),
lty=c(4,6,9,10),cex = 0.8)
Y=function(n1,n2){
c=cbind2(c(rep(1,n1),rep(1,n2)))
return(c)}
Y1=function(n1,n2){
set.seed(2)
a=rnorm(n1+n2,0,.05)
c=cbind2(c(rep(1,n1),rep(3,n2))+a)
return(c)
}
X1=function(n1,n2){
c=cbind2(c(rep(1,n1),rep(-1,n2)))
return(c)
}
X=function(n1,n2){
return(cbind2(Y(n1,n2),X1(n1,n2)))
}
b=function(n1,n2){
return(abs(det(t(X(n1,n2))%*%X(n1,n2))/det(t(Y(n1,n2))%*%
Y(n1,n2))))}
l.model=function(n1,n2){return(lm(Y1(n1,n2)~X1(n1,n2)))}
beta=function(n1,n2){as.numeric(l.model(n1,n2)$coefficient[2])}
d=function(n1,n2){return(2/n1+2/n2)}
ne=function(n1,n2){return(min(n1^2,n2^2)*(1/n1+1/n2))}
v=function(n1,n2){return(beta(n1,n2)^2/(d(n1,n2)*(1+ne(n1,n2))))}
C=function(n1,n2){return(-2*log((1-exp(-v(n1,n2)))/(sqrt(2)*
v(n1,n2))))}
# Bayes Factor Linear Version (Eq.8)
BFL=function(alpha,q,n,b,C,j){
-alpha*log(alpha)*gamma(q/2)*b^((n-j)/(2*(n-1)))*
((2*(n-1))/((qgamma(alpha,shape=q/2,rate=(n-j)/
(2*(n-1)),lower.tail = FALSE)
+log(b)+C)*(n-j)))^(q/2)
}
# Bayes Factor General (E.q 9)
BFG=function(alpha,q,n,C){
-alpha*log(alpha)*gamma(q/2)*n^(q/2)*
(2/(qchisq(alpha,q,lower.tail=FALSE)+q*log(n)+C))^(q/2)
} # closing brace restored

# Bayes Factor $BF_{01}$ (means)
BF=function(t,n1,n2,alpha){ # t here denotes tau0
  n=n1+n2
  l=n-1
  return(((n+t)/t)^(1/2)*(((qt(alpha,l,lower.tail=FALSE))^2*
  (t/(n+t))+l)/((qt(alpha,l,lower.tail = FALSE))^2+l))^((l+1)/2))
}
# Plot posteriors probability
par(mfrow=c(1,2))
plot(alpha,pP(RLB(alpha,1)),col=4,
xlab=expression(paste(alpha)),
ylab=expression(paste(P(H[0]/x))),
main =expression(paste("n=50,","q=1,",
tau[0]==6)),type="l",ylim = c(0,1))
lines(alpha,pP(BFL(alpha,1,50,b(25,25),C(25,25),2)),
col=6)
lines(alpha,pP(BFG(alpha,1,50,C(25,25))),col=3)
lines(alpha,pP(BF(6,25,25,alpha)),col=9)
legend(0.01,1,col =c(4,6,3,9),
c(expression(paste(P[RLB])),
expression(paste(P[BFL])),
expression(paste(P[BFG])),
expression(paste(P[BF["01"]]))),
lty=c(1,1,1,1),cex = 0.9)
abline(.5,0,lty=2)
plot(alpha,pP(RLB(alpha,1)),col=4,
xlab=expression(paste(alpha)),
ylab=expression(paste(P(H[0]/x))),
main = expression(paste("n=100,","q=1,",tau[0]==6)),
type="l",ylim = c(0,1))
lines(alpha,pP(BFL(alpha,1,100,b(50,50),
C(50,50),2)),col=6)
lines(alpha,pP(BFG(alpha,1,100,C(50,50))),col=3)
lines(alpha,pP(BF(6,50,50,alpha)),col=9)
legend(0.01,1,col =c(4,6,3,9),
c(expression(paste(P[RLB])),
expression(paste(P[BFL])),
expression(paste(P[BFG])),
expression(paste(P[BF["01"]]))),
lty=c(1,1,1,1),cex = 0.9)
abline(.5,0,lty=2)
# Bayes factor for Fisher's exact test
B_01=function(p,a,b,alpha,n){
  k=qbinom(alpha,n,p,lower.tail=FALSE)  # upper-tail binomial quantile
  p^k*(1-p)^(n-k)*beta(a,b)/beta(k+a,n-k+b)
}
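For readers working outside R, the Fisher's-exact-test Bayes factor above can be sketched in pure Python. This is an illustrative translation, not the authors' code: `qbinom_upper` reproduces R's `qbinom(alpha, n, p, lower.tail = FALSE)` by accumulating the binomial pmf, and the Bayes factor is assembled on the log scale via `math.lgamma` for numerical stability.

```python
import math

def qbinom_upper(alpha, n, p):
    """Smallest k with P(X > k) <= alpha for X ~ Binomial(n, p),
    matching R's qbinom(alpha, n, p, lower.tail = FALSE)."""
    tail = 1.0
    for k in range(n + 1):
        tail -= math.comb(n, k) * p**k * (1 - p)**(n - k)  # tail = P(X > k)
        if tail <= alpha:
            return k
    return n

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def b01(p, a, b, alpha, n):
    """Bayes factor B_01 evaluated at the upper-tail binomial quantile k,
    as in the R function B_01 above (computed on the log scale)."""
    k = qbinom_upper(alpha, n, p)
    log_bf = (k * math.log(p) + (n - k) * math.log(1 - p)
              + log_beta(a, b) - log_beta(k + a, n - k + b))
    return math.exp(log_bf)
```

For example, `b01(0.7, 7, 3, 0.05, 50)` matches `B_01(.7, 7, 3, 0.05, 50)` in the R listing.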
z=B_01(.7,7,3,alpha,50)
x=B_01(.7,7,3,alpha,100)

# Posterior probabilities (Figure 4)
# Note: BFG is called here with a fifth argument, the adjustment
# xi_0 (1, 1.1, 1.2, 1.3); this is the xi_0-extended version of BFG
# described in the article, not the four-argument BFG defined above.
par(mfrow=c(1,2))
plot(alpha,pP(RLB(alpha,1)),col=4,
     xlab=expression(alpha),ylab=expression(P(H[0]/x)),
     main=expression(paste("n=50,","q=1")),type="l",ylim=c(0,1))
lines(alpha,pP(BFG(1,alpha,25,25,1)),col=2)
lines(alpha,pP(BFG(1,alpha,25,25,1.1)),col=3)
lines(alpha,pP(BFG(1,alpha,25,25,1.2)),col=5)
lines(alpha,pP(BFG(1,alpha,25,25,1.3)),col=6)
lines(alpha,pP(z),col=9)
legend(0.01,1,col=c(4,2,3,5,6,9),
       c(expression(P[RLB]),expression(P[BFG]),
         expression(P[BFG[1.1]]),expression(P[BFG[1.2]]),
         expression(P[BFG[1.3]]),expression(P[BF[Test]])),
       lty=rep(1,6),cex=0.6)
abline(.5,0,lty=2)
plot(alpha,pP(RLB(alpha,1)),col=4,
     xlab=expression(alpha),ylab=expression(P(H[0]/x)),
     main=expression(paste("n=100,","q=1")),type="l",ylim=c(0,1))
lines(alpha,pP(BFG(1,alpha,80,20,1)),col=2)
lines(alpha,pP(BFG(1,alpha,80,20,1.1)),col=3)
lines(alpha,pP(BFG(1,alpha,80,20,1.2)),col=5)
lines(alpha,pP(BFG(1,alpha,80,20,1.3)),col=6)
lines(alpha,pP(x),col=9)
legend(0.01,1,col=c(4,2,3,5,6,9),
       c(expression(P[RLB]),expression(P[BFG]),
         expression(P[BFG[1.1]]),expression(P[BFG[1.2]]),
         expression(P[BFG[1.3]]),expression(P[BF[Test]])),
       lty=rep(1,6),cex=0.6)
abline(.5,0,lty=2)
# Design-dependent quantities C and b for the linear model example
Y=function(n){
  cbind2(rep(1,n))            # intercept column
}

X1=function(n){
  as.matrix(1/seq_len(n))     # covariate x_i = 1/i
}

Y1=function(n){
  set.seed(4)
  a=rnorm(n,0,1)
  a+X1(n)*0.5                 # response generated with slope 0.5
}

X=function(n){
  cbind2(Y(n),X1(n))          # full design matrix
}

b=function(n){
  abs(det(t(X(n))%*%X(n))/det(t(Y(n))%*%Y(n)))
}

l.model=function(n){lm(Y1(n)~X1(n))}
theta=function(n){as.numeric(l.model(n)$coefficient[2])}
d=function(n){1/apply(X1(n),2,sum)}
ne=function(n){apply(X1(n),2,sum)}
v=function(n){theta(n)^2/(d(n)*(1+ne(n)))}
C=function(n){-2*log((1-exp(-v(n)))/(sqrt(2)*v(n)))}
# Posterior probabilities as a function of alpha (Figure 5)
par(mfrow=c(1,3))
plot(alpha,pP(BFL(alpha,1,100,b(100),C(100),2)),col=4,
     xlab=expression(alpha),ylab=expression(P(H[0]/x)),
     main=expression(paste("n=100,","q=1")),type="l",ylim=c(0,1))
lines(alpha,pP(BFG(alpha,1,100,C(100))),col=3)
legend(0.01,1,col=c(4,3),
       c(expression(P[BFL]),expression(P[BFG])),
       lty=c(1,1),cex=0.9)
abline(.5,0,lty=2)

plot(alpha,pP(BFL(alpha,1,1000,b(1000),C(1000),2)),col=4,
     xlab=expression(alpha),ylab=expression(P(H[0]/x)),
     main=expression(paste("n=1000,","q=1")),type="l",ylim=c(0,1))
lines(alpha,pP(BFG(alpha,1,1000,C(1000))),col=3)
legend(0.01,1,col=c(4,3),
       c(expression(P[BFL]),expression(P[BFG])),
       lty=c(1,1),cex=0.9)
abline(.5,0,lty=2)

plot(alpha,pP(BFL(alpha,1,10000,b(10000),C(10000),2)),col=4,
     xlab=expression(alpha),ylab=expression(P(H[0]/x)),
     main=expression(paste("n=10000,","q=1")),type="l",ylim=c(0,1))
lines(alpha,pP(BFG(alpha,1,10000,C(10000))),col=3)
legend(0.01,1,col=c(4,3),
       c(expression(P[BFL]),expression(P[BFG])),
       lty=c(1,1),cex=0.9)
abline(.5,0,lty=2)
# Posterior probabilities as a function of n (Figure 6)
I=seq(1,1000,1)
BL=I
BL1=I
BG=I
BG1=I
for (n in I) {
  i=9+n                       # sample sizes 10, 11, ..., 1009
  BL[n]=BFL(0.05,1,i,b(i),C(i),2)
  BL1[n]=BFL(0.01,1,i,b(i),C(i),2)
  BG[n]=BFG(0.05,1,i,C(i))
  BG1[n]=BFG(0.01,1,i,C(i))
}

m=seq(10,1009,1)
par(mfrow=c(1,2))
plot(m,pP(BL),col=4,
     xlab="n",ylab=expression(P(H[0]/x)),
     main=expression(paste(alpha==0.05,",","q=1")),
     type="l",ylim=c(0,1))
lines(m,pP(BG),col=3)
legend("topleft",col=c(4,3),
       c(expression(P[BFL]),expression(P[BFG])),
       lty=c(1,1),cex=0.8)
abline(.5,0,lty=2)
plot(m,pP(BL1),col=4,
     xlab="n",ylab=expression(P(H[0]/x)),
     main=expression(paste(alpha==0.01,",","q=1")),
     type="l",ylim=c(0,1))
lines(m,pP(BG1),col=3)
legend("topleft",col=c(4,3),
       c(expression(P[BFL]),expression(P[BFG])),
       lty=c(1,1),cex=0.8)
abline(.5,0,lty=2)
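As a cross-check in another language, the general adjusted Bayes factor BFG (Eq. 9) and its conversion to a posterior probability can be sketched in pure Python. This is an illustrative translation under stated assumptions, not the authors' code: it is restricted to q = 1, where the upper-tail chi-square quantile equals the squared normal quantile z²(1 − α/2) (available via `statistics.NormalDist`), and it assumes equal prior odds so that P(H0 | x) = B/(1 + B), as the `pP` function used in the R listing does.

```python
import math
from statistics import NormalDist

def chi2_upper_q1(alpha):
    """Upper-tail chi-square quantile with 1 degree of freedom:
    qchisq(alpha, 1, lower.tail = FALSE) = z_{1 - alpha/2}^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def bfg(alpha, n, C, q=1):
    """General adjusted Bayes factor (Eq. 9), here for q = 1 only."""
    chi = chi2_upper_q1(alpha)
    return (-alpha * math.log(alpha) * math.gamma(q / 2) * n**(q / 2)
            * (2 / (chi + q * math.log(n) + C))**(q / 2))

def posterior_prob(bf):
    """P(H0 | x) = B / (1 + B), assuming equal prior odds."""
    return bf / (1 + bf)
```

For instance, with alpha = 0.05, n = 100 and C = 1, the quantile is the familiar 3.84 and the resulting posterior probability sits slightly above 1/2, illustrating how a p-value of 0.05 can correspond to essentially no evidence against H0.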

References

  1. Sellke, T.; Bayarri, M.J.; Berger, J.O. Calibration of p values for testing precise null hypotheses. Am. Stat. 2001, 55, 62–71.
  2. Benjamin, D.; Berger, J.; Johannesson, M.; Nosek, B.; Wagenmakers, E.-J.; Berk, R.; Bollen, K.; Brembs, B.; Brown, L.; Camerer, C.; et al. Redefine statistical significance. Nat. Hum. Behav. 2018, 2, 6–10.
  3. Held, L.; Ott, M. How the maximal evidence of p-values against point null hypotheses depends on sample size. Am. Stat. 2016, 70, 335–341.
  4. Casella, G.; Berger, R. Statistical Inference, 2nd ed.; Duxbury Resource Center: Belmont, CA, USA, 2017.
  5. Pérez, M.E.; Pericchi, L.R. Changing statistical significance with the amount of information: The adaptive α significance level. Stat. Probab. Lett. 2014, 85, 20–24.
  6. Vélez, D.; Pérez, M.E.; Pericchi, L.R. Increasing the replicability for linear models via adaptive significance levels. Test 2022, 31, 771–789.
  7. Bayarri, M.J.; Berger, J.O.; Jang, W.; Ray, S.; Pericchi, L.R.; Visser, I. Prior-based Bayesian information criterion. Stat. Theory Relat. Fields 2019, 3, 2–13.
  8. Vovk, V. A logic of probability, with application to the foundations of statistics. J. R. Stat. Soc. Ser. B 1993, 55, 317–351.
  9. Held, L.; Ott, M. On p-values and Bayes factors. Annu. Rev. Stat. Appl. 2018, 5, 393–419.
  10. Cabras, S.; Castellanos, M. p-value calibration in multiple hypotheses testing. Stat. Med. 2017, 36, 2875–2886.
  11. Patiño Hoyos, A.E.; Fossaluza, V.; Esteves, L.G.; Bragança Pereira, C.A.d. Adaptive Significance Levels in Tests for Linear Regression Models: The e-Value and p-Value Cases. Entropy 2023, 25, 19.
  12. Pericchi, L.; Pereira, C. Adaptative Significance Levels Using Optimal Decision Rules: Balancing by Weighting the Error Probabilities. Braz. J. Probab. Stat. 2015, 29, 70–90.
  13. Zoh, R.S.; Sarkar, A.; Carroll, R.J.; Mallick, B.K. A powerful Bayesian test for equality of means in high dimensions. J. Am. Stat. Assoc. 2018, 113, 1733–1741.
  14. Acuña, E. Regresión Aplicada Usando R; Universidad de Puerto Rico en Mayagüez, Departamento de Ciencias Matemáticas: Mayagüez, Puerto Rico, 2015.
  15. Findley, D.F. Counterexamples to parsimony and BIC. Ann. Inst. Stat. Math. 1991, 43, 505–514.
Figure 1. Extended Robust Lower Bound RLB_ξ0 as a function of p for different values of ξ0.
Figure 2. Lower bound for the posterior probability of the null hypothesis H0 (in (13)) for ξ0 = 1, 1.1, 1.2, 1.3.
Figure 3. Posterior probability of the null hypothesis H0 for n = 50 and n = 100 using the Bayes factors RLB_ξ0 with ξ0 = 1, BF_01, BF_L, and BF_G.
Figure 4. Posterior probability of the null hypothesis H0 for n = 50 and n = 100 using the Bayes factors RLB_ξ0 with ξ0 = 1, BF_Test, BF_Gξ0, and BF_G.
Figure 5. Posterior probability of the null hypothesis H0 for n = 100, n = 1000 and n = 10,000 using the Bayes factors BF_L and BF_G.
Figure 6. Posterior probability of the null hypothesis H0 for α = 0.05 and α = 0.01 using the Bayes factors BF_L and BF_G as n grows.
Table 1. Adaptive α via PBIC in (8) for testing equality of two proportions for different sample sizes when α = 0.05.

n1     n2     Adaptive α via PBIC (αn)
10     10     0.0068
25     25     0.0040
50     50     0.0027
100    50     0.0021
50     100    0.0021
100    100    0.0018
Table 2. Mean percentage of p-values less than 0.05 (considered significant) coming from data generated under the null hypothesis for 100 experiments, where K = 8000 testing problems are generated under H0: μ1 = μ2. This experiment is performed for different groups with sample sizes r. Corrected and uncorrected Bayes factors are considered, as well as an exact Bayes factor. The last four columns report the percentage of samples with P(H0 | x) ≥ 0.5.

r       % of samples with p < 0.05    RLB_ξ    BF_G    BF_L    BF_01
10      5%                            0%       58%     66%     75%
50      5%                            0%       81%     86%     87%
100     5%                            0%       86%     89%     91%
500     5%                            0%       94%     96%     96%
1000    5%                            0%       95%     96%     97%
