Article

Hypothesis Test to Compare Two Paired Binomial Proportions: Assessment of 24 Methods

by José Antonio Roldán-Nofuentes 1,*, Tulsi Sagar Sheth 1,2 and José Fernando Vera-Vera 3

1 Department of Statistics and Operations Research, School of Medicine, University of Granada, 18016 Granada, Spain
2 Department of Applied Sciences and Humanities, Parul Institute of Engineering and Technology, Parul University, Vadodara 391760, Gujarat, India
3 Department of Statistics and Operations Research, Faculty of Sciences, University of Granada, Fuentenueva s/n, 18071 Granada, Spain
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(2), 190; https://doi.org/10.3390/math12020190
Submission received: 7 November 2023 / Revised: 29 December 2023 / Accepted: 1 January 2024 / Published: 6 January 2024
(This article belongs to the Special Issue Advances in Biostatistics and Applications)

Abstract

The comparison of two paired binomial proportions is a topic of interest in statistics, with important applications in medicine. There are different methods in the statistical literature to solve this problem, and the McNemar test is the best known of all of them. The problem has been solved from a conditioned perspective, only considering the discordant pairs, and from an unconditioned perspective, considering all of the observed values. This manuscript reviews the existing methods to solve the hypothesis test of equality for the two paired proportions and proposes new methods. Monte Carlo simulation methods were carried out to study the asymptotic behaviour of the methods studied, giving some general rules of application depending on the sample size. In general terms, the Wald test, the likelihood-ratio test, and two tests based on association measures in 2 × 2 tables can always be applied, whatever the sample size is, and if the sample size is large, then the McNemar test without a continuity correction and the modified Wald test can also be applied. The results have been applied to a real example on the diagnosis of coronary heart disease.

1. Introduction

The comparison of two proportions is a topic of special interest in statistics [1], with important applications in medicine and health sciences in general. Of special interest is the case in which the two proportions are paired, for example when, in a sample of n individuals, a binary variable is observed before and after a certain treatment, or when the sensitivities (specificities) of two binary diagnostic tests are compared with respect to the same gold standard [2,3]. This problem also frequently arises in clinical trials [4], such as when assessing the effectiveness of a new drug or treatment. These situations give rise to the analysis of a 2 × 2 table, in which the only value set by the researcher is the sample size n. There are numerous methods in the statistical literature to solve these problems. Classically, the problem has been solved by conditioning on the discordant pairs and thus neglecting the frequencies of the concordant pairs. This way of solving the problem has given rise to different methods, and the McNemar test [5] is the best known of all of them [6,7,8]. The problem can be solved with exact tests (conditioned and unconditioned) and with approximate tests (conditioned and unconditioned). All test statistics of the approximate methods are distributed approximately according to a chi-square distribution with one degree of freedom.
In the statistical literature, there are numerous methods to solve the hypothesis test to compare two paired proportions. May and Johnson [9], Park [10], and, more recently, Fagerland et al. [11,12,13] have compared different methods to solve this problem. However, in these works, only some of the existing methods have been studied. This is one of the main motivations for our study together with the proposal of new methods, comparing a large number of different methods to solve the hypothesis testing to compare two paired binomial proportions.
An alternative method to the hypothesis test, one directly related to it, consists of comparing the two paired proportions using confidence intervals for the difference (or ratio) of the two paired proportions. A review of more common confidence intervals can be seen in Pradhan et al. [4] and Tan et al. [14]. In addition, new intervals are proposed in Pradhan et al. [15], more recently in Fay et al. [16], and in Chan et al. [17]. A review of different methods to solve the hypothesis test as well as confidence intervals for the difference and the ratio of two paired proportions can be seen in Fagerland et al. [13].
Therefore, the purpose of this manuscript is to compare the asymptotic behaviour in terms of type I error rates and powers of different methods to solve the hypothesis test to compare two paired binomial proportions and to provide general rules of application for the methods. The rest of the article is structured as follows. Section 2 describes 24 methods to solve the hypothesis test for comparing two paired binomial proportions. Section 3 describes the criteria used to compare the asymptotic behaviour of the 24 methods. Section 4 carries out extensive Monte Carlo simulation experiments to study the type I error rates and the powers of the methods. Section 5 presents general rules of application for the methods to solve the problem posed. In Section 6, the results are applied to a real example on the diagnosis of coronary heart disease, and Section 7 discusses the conclusions obtained.

2. Notation and Methods

In general terms and focusing on common problems in the field of medicine, let us consider a binary random variable, with the categories of ‘success’ and ‘failure’, which is observed in a random sample of n individuals before and after a treatment. This situation gives rise to Table 1, where the only value set by the researcher is the sample size $n$. This table also shows the theoretical probabilities of each cell. The data observed in this table, $\mathbf{n} = (n_{11}, n_{12}, n_{21}, n_{22})^{T}$, are a realization of a multinomial distribution with probability vector $\mathbf{p} = (p_{11}, p_{12}, p_{21}, p_{22})^{T}$, where $\sum_{i,j=1}^{2} p_{ij} = 1$. The variance–covariance matrix of $\hat{\mathbf{p}}$ is as follows:
$$\Sigma_{\hat{\mathbf{p}}} = \frac{\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{T}}{n},$$
and the estimator of $p_{ij}$ is $\hat{p}_{ij} = n_{ij}/n$.
In this situation, the comparison of two paired binomial proportions consists of solving the hypothesis test:
$$H_0: p_{1\cdot} = p_{\cdot 1} \quad \text{vs} \quad H_1: p_{1\cdot} \neq p_{\cdot 1}, \tag{1}$$
which is equivalent to solving the test:
$$H_0: p_{12} = p_{21} \quad \text{vs} \quad H_1: p_{12} \neq p_{21}. \tag{2}$$
The estimators of $p_{1\cdot}$ and $p_{\cdot 1}$ are as follows:
$$\hat{p}_{1\cdot} = \frac{n_{11} + n_{12}}{n} = \frac{n_{1\cdot}}{n} \quad \text{and} \quad \hat{p}_{\cdot 1} = \frac{n_{11} + n_{21}}{n} = \frac{n_{\cdot 1}}{n}.$$
The following describes 24 statistical methods to solve this hypothesis test. Of these 24 methods, two were exact, one was quasi-exact, and 21 were approximate (of which five were new).
1. Conditional exact test (CET)
The probabilities $p_{11}$ and $p_{22}$ do not intervene in hypothesis test (1), so they can be ignored, as can the frequencies $n_{11}$ and $n_{22}$, because they do not influence the result of the test. Conditioning on the sum of the discordant frequencies, an exact test is obtained using the binomial distribution [13,18]. Conditioning on $n' = n_{12} + n_{21}$, the conditional probabilities of the two discordant cells verify $p_{12} + p_{21} = 1$, and therefore $n_{12}$ follows a binomial distribution with parameters $n'$ and $p_{12}$, i.e., $n_{12} \sim Bin(n', p_{12})$. If $H_0: p_{12} = p_{21}$ is true, then $p_{12} = p_{21} = 1/2$, and hypothesis test (1) is equivalent to the test:
$$H_0: p_{12} = 1/2 \quad \text{vs} \quad H_1: p_{12} \neq 1/2.$$
The p-value can be calculated directly from the binomial distribution. If we assume that $n_{12} < n_{21}$, then:
$$\text{two-sided exact p-value} = P(X \le n_{12} \ \text{or} \ X \ge n_{21}) = 2 \times P(X \le n_{12}),$$
where $X \sim Bin(n', 1/2)$. Finally, the two-sided exact p-value for the comparison test of the two paired binomial proportions is as follows:
$$\text{two-sided exact p-value} = 2 \times \sum_{j=0}^{\min(n_{12}, n_{21})} \binom{n'}{j} \left(\frac{1}{2}\right)^{n'}. \tag{3}$$
The conditional exact test is conservative; that is, when $H_0$ is true, the test rejects $H_0$ less than $100\alpha\%$ of the time, where $\alpha$ is the nominal error level.
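The exact p-value described above depends only on the two discordant counts, so it can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation; the function name `cet_p_value` is hypothetical, and returning 1 when there are no discordant pairs and capping the doubled tail at 1 are assumptions of this sketch.

```python
from math import comb


def cet_p_value(n12: int, n21: int) -> float:
    """Two-sided conditional exact p-value based on the discordant counts."""
    m = n12 + n21  # number of discordant pairs (the conditioning total)
    if m == 0:
        return 1.0  # assumption: no discordant pairs, no evidence against H0
    # Lower tail P(X <= min(n12, n21)) for X ~ Bin(m, 1/2)
    lower_tail = sum(comb(m, j) for j in range(min(n12, n21) + 1)) / 2 ** m
    # Doubling the tail can exceed 1 when n12 is close to n21, so cap at 1
    return min(1.0, 2.0 * lower_tail)


print(cet_p_value(0, 5))  # 2 * (1/2)^5 = 0.0625
```

With five discordant pairs all in the same cell, the smallest attainable two-sided p-value is 0.0625, which illustrates why the test cannot reject at the 5% level for very small numbers of discordant pairs.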
2. Conditional exact mid-p test (MidpT)
The conditional exact mid-p test [19] is a modification of the CET that consists of subtracting the probability of the observed outcome $n_{12}$ from (3), where
$$P(X = n_{12}) = \binom{n'}{n_{12}} \left(\frac{1}{2}\right)^{n'}.$$
Then, the mid-p value to compare the two proportions is as follows:
$$\text{mid-p value} = \text{two-sided exact p-value} - \binom{n'}{n_{12}} \left(\frac{1}{2}\right)^{n'}.$$
The conditional exact mid-p test is a less conservative method than the CET.
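The mid-p adjustment can be sketched in the same way as the exact test: compute the doubled binomial tail and subtract the point probability of the observed count. This is an illustrative sketch with a hypothetical function name; the handling of the case with no discordant pairs and the cap at 1 are assumptions.

```python
from math import comb


def mid_p_value(n12: int, n21: int) -> float:
    """Mid-p value: two-sided exact p-value minus P(X = n12), X ~ Bin(m, 1/2)."""
    m = n12 + n21
    if m == 0:
        return 1.0  # assumption: no discordant pairs, no evidence against H0
    exact = 2.0 * sum(comb(m, j) for j in range(min(n12, n21) + 1)) / 2 ** m
    point = comb(m, n12) / 2 ** m  # probability of the observed outcome
    return min(1.0, exact - point)


print(mid_p_value(0, 5))  # 0.0625 - 0.03125 = 0.03125
```

For the same data, the mid-p value is never larger than the exact p-value, which is the sense in which the method is less conservative.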
3. McNemar test (MT)
The McNemar test [4,13,18] is the asymptotic version of the CET. Conditioning on $n' = n_{12} + n_{21}$ and applying the central limit theorem, the test statistic for hypothesis test (1) is as follows:
$$z = \frac{\hat{p}_{12} - \hat{p}_{21}}{\sqrt{Var(\hat{p}_{12} - \hat{p}_{21})}},$$
whose distribution is approximately a standard normal distribution, and where
$$Var(\hat{p}_{12} - \hat{p}_{21}) = \frac{p_{12} + p_{21} - (p_{12} - p_{21})^{2}}{n'}.$$
Since the test conditions on $n' = n_{12} + n_{21}$ (frequencies $n_{11}$ and $n_{22}$ are disregarded), $\hat{p}_{12} = n_{12}/n'$ and $\hat{p}_{21} = n_{21}/n'$. If $H_0: p_{12} = p_{21}$ is true, then
$$Var_0(\hat{p}_{12} - \hat{p}_{21}) = \frac{p_{12} + p_{21}}{n'}$$
and
$$\widehat{Var}_0(\hat{p}_{12} - \hat{p}_{21}) = \frac{n_{12} + n_{21}}{(n')^{2}}.$$
Substituting $p_{ij}$ with $\hat{p}_{ij} = n_{ij}/n'$ and $Var(\hat{p}_{12} - \hat{p}_{21})$ with $\widehat{Var}_0(\hat{p}_{12} - \hat{p}_{21})$ in the expression of the test statistic $z$, the test statistic of the McNemar test (without continuity correction) is as follows:
$$z_{MT} = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}},$$
whose distribution is approximately a standard normal distribution (it is traditionally required that $n_{12} + n_{21} > 10$). Very often, the test statistic is expressed in terms of the chi-square distribution:
$$\chi^{2}_{MT} = \frac{(n_{12} - n_{21})^{2}}{n_{12} + n_{21}},$$
whose distribution is approximately a chi-square distribution with one degree of freedom. MT is a method that has good asymptotic behaviour in terms of type I error rate and power.
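As an illustration of the McNemar statistic and its chi-square p-value, the following sketch uses only the standard library. The function names are hypothetical, and treating a table with no discordant pairs as a statistic of 0 is an assumption. The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def chi2_sf_1df(x: float) -> float:
    """Upper tail of the chi-square distribution with 1 df: P(Z^2 > x)."""
    return erfc(sqrt(x / 2.0))


def mcnemar(n12: int, n21: int) -> tuple[float, float]:
    """McNemar chi-square statistic (no continuity correction) and p-value."""
    m = n12 + n21
    if m == 0:
        return 0.0, 1.0  # assumption: statistic set to 0 without discordant pairs
    stat = (n12 - n21) ** 2 / m
    return stat, chi2_sf_1df(stat)


stat, p = mcnemar(15, 5)
print(stat, p)  # statistic is (15 - 5)^2 / 20 = 5.0
```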
4. McNemar test with Yates continuity correction (MTYcc)
The McNemar test approximates the binomial distribution by the normal distribution. In this situation, it is common to apply a continuity correction (cc), whose objective is to improve the approximation to the normal distribution. Edwards [20] proposed the following test statistic with Yates cc [21]:
$$z_{MTYcc} = \frac{|\hat{p}_{12} - \hat{p}_{21}| - \frac{1}{n'}}{\sqrt{\widehat{Var}_0(\hat{p}_{12} - \hat{p}_{21})}},$$
whose distribution is approximately a standard normal distribution. It is also common to express this test statistic in terms of the chi-square distribution [13,18]:
$$\chi^{2}_{MTYcc} = \frac{(|n_{12} - n_{21}| - 1)^{2}}{n_{12} + n_{21}}.$$
5. McNemar test with continuity correction (MTcc1)
Conditioning on $n' = n_{12} + n_{21}$, the random variable $n_{12} - n_{21}$ changes in steps of 1, so a cc is $1/2$ (half the step) [22]. Therefore, another test statistic of the McNemar test with cc is as follows:
$$z_{MTcc1} = \frac{|\hat{p}_{12} - \hat{p}_{21}| - \frac{1}{2n'}}{\sqrt{\widehat{Var}_0(\hat{p}_{12} - \hat{p}_{21})}} = \frac{|n_{12} - n_{21}| - \frac{1}{2}}{\sqrt{n_{12} + n_{21}}},$$
or, equivalently:
$$\chi^{2}_{MTcc1} = \frac{(|n_{12} - n_{21}| - 0.5)^{2}}{n_{12} + n_{21}}.$$
This cc has been used by Chang et al. [17] to estimate the difference between two paired binomial proportions using confidence intervals. These authors have also proposed other continuity corrections: 0.25 and 0.125. We propose applying these continuity corrections to the McNemar test statistic, obtaining the following new test statistics (called MTcc2 and MTcc3, respectively):
$$\chi^{2}_{MTcc2} = \frac{(|n_{12} - n_{21}| - 0.25)^{2}}{n_{12} + n_{21}} \quad \text{and} \quad \chi^{2}_{MTcc3} = \frac{(|n_{12} - n_{21}| - 0.125)^{2}}{n_{12} + n_{21}}.$$
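The four continuity-corrected McNemar statistics (MTYcc, MTcc1, MTcc2, and MTcc3) differ only in the constant subtracted from the absolute difference of the discordant counts, so a single function with a `cc` argument can illustrate all of them. This is a sketch with hypothetical names; truncating the corrected difference at zero, so that the statistic is 0 when the difference is smaller than the correction, is an assumption and is not stated in the text.

```python
def mcnemar_cc(n12: int, n21: int, cc: float) -> float:
    """McNemar chi-square statistic with continuity correction cc."""
    m = n12 + n21
    if m == 0:
        return 0.0  # assumption: statistic set to 0 without discordant pairs
    # Assumption: truncate at 0 so the correction never inflates the statistic
    corrected = max(abs(n12 - n21) - cc, 0.0)
    return corrected ** 2 / m


# Correction constants for the four variants described in the text
CORRECTIONS = {"MTYcc": 1.0, "MTcc1": 0.5, "MTcc2": 0.25, "MTcc3": 0.125}

for name, cc in CORRECTIONS.items():
    print(name, mcnemar_cc(15, 5, cc))
```

For a fixed table, the statistic decreases as the correction grows, so MTYcc is always the most conservative of the four.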
6. Modified McNemar test (MMT)
Bennett and Underwood [23] proposed a modification of the McNemar test statistic by adding $1/2$ to the observed frequencies, with the aim of improving the approximation to the chi-square distribution. Thus, the test statistic is as follows:
$$\chi^{2}_{MMT} = \frac{(n_{12} - n_{21})^{2}}{n_{12} + n_{21} + 1}.$$
7. Wald test (WT)
Hypothesis test (1) can be solved by applying the Wald method [24,25]. Since $\mathbf{p} = (p_{11}, p_{12}, p_{21}, p_{22})^{T}$ is the probability vector of a multinomial distribution, the variance–covariance matrix of $\hat{\mathbf{p}}$ is as follows:
$$\Sigma_{\hat{\mathbf{p}}} = \frac{\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{T}}{n}.$$
Hypothesis test (2) is equivalent to checking the following:
$$H_0: \Delta^{T}\mathbf{p} = 0 \quad \text{vs} \quad H_1: \Delta^{T}\mathbf{p} \neq 0,$$
where
$$\Delta = (0, 1, -1, 0)^{T}.$$
It is easy to verify that the estimated variance of $\hat{p}_{12} - \hat{p}_{21}$ is as follows:
$$\widehat{Var}(\hat{p}_{12} - \hat{p}_{21}) = \widehat{Var}(\Delta^{T}\hat{\mathbf{p}}) = \widehat{Var}(\hat{p}_{12}) + \widehat{Var}(\hat{p}_{21}) - 2\,\widehat{Cov}(\hat{p}_{12}, \hat{p}_{21}) = \frac{n_{12}(n - n_{12}) + n_{21}(n - n_{21}) + 2 n_{12} n_{21}}{n^{3}}.$$
Applying the central limit theorem, the following is derived:
$$\frac{\hat{p}_{12} - \hat{p}_{21} - (p_{12} - p_{21})}{\sqrt{Var(\hat{p}_{12} - \hat{p}_{21})}} \rightarrow N(0, 1).$$
After some algebra, the Wald test statistic for test (2) is as follows:
$$\chi^{2}_{WT} = \frac{n (n_{12} - n_{21})^{2}}{4 n_{12} n_{21} + (n_{11} + n_{22})(n_{12} + n_{21})},$$
whose distribution is approximately a chi-square distribution with one degree of freedom.
8. Modified Wald test (MWT)
May and Johnson [9] proposed modifying the Wald test statistic by adding $1/2$ to $n_{12}$ and to $n_{21}$. Thus, the modified Wald test statistic is as follows:
$$\chi^{2}_{MWT} = \frac{(n_{12} - n_{21})^{2}}{n_{12} + n_{21} + 1 - \frac{(n_{12} - n_{21})^{2}}{n}}.$$
This method has good asymptotic behaviour and is recommended as one of the best methods to solve the hypothesis test [9].
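The Wald and modified Wald statistics use all four cell counts. The sketch below is illustrative only: the function names are hypothetical, and the handling of degenerate tables (a statistic of 0 when the discordant counts are equal, and an infinite statistic when the Wald variance estimate is zero) is an assumption. The denominator of the Wald statistic can also be written as $n(n_{12} + n_{21}) - (n_{12} - n_{21})^{2}$, which makes the relation with the modified version visible.

```python
def wald(n11: int, n12: int, n21: int, n22: int) -> float:
    """Wald chi-square statistic for H0: p12 = p21, using all four cells."""
    n = n11 + n12 + n21 + n22
    if n12 == n21:
        return 0.0  # assumption: statistic is 0 when the discordant counts agree
    denom = 4 * n12 * n21 + (n11 + n22) * (n12 + n21)
    if denom == 0:
        return float("inf")  # assumption: zero estimated variance
    return n * (n12 - n21) ** 2 / denom


def modified_wald(n11: int, n12: int, n21: int, n22: int) -> float:
    """Modified Wald statistic: 1/2 is added to each discordant count."""
    n = n11 + n12 + n21 + n22
    d2 = (n12 - n21) ** 2
    # The denominator is at least 1 because d2 / n <= n12 + n21
    return d2 / (n12 + n21 + 1 - d2 / n)


print(wald(40, 15, 5, 40))           # 10000 / 1900
print(modified_wald(40, 15, 5, 40))  # 100 / (21 - 1) = 5.0
```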
9. Likelihood-ratio test (LRT)
Hypothesis test (1) can be solved by applying the likelihood-ratio test [26]. The likelihood function of the data is as follows:
$$L(\mathbf{p}, \mathbf{n}) = k\, p_{11}^{n_{11}} p_{12}^{n_{12}} p_{21}^{n_{21}} p_{22}^{n_{22}},$$
where $k = n! / \prod_{i,j=1}^{2} n_{ij}!$. If $H_0: p_{12} = p_{21}$ is true, then the likelihood function of the data is as follows:
$$L_0(\mathbf{p}, \mathbf{n}) = k\, p_{11}^{n_{11}} p_{12}^{n_{12} + n_{21}} p_{22}^{n_{22}}$$
and the following is derived:
$$\hat{p}_{12} = \hat{p}_{21} = \frac{n_{12} + n_{21}}{2n} = \frac{n'}{2n}.$$
Applying the likelihood-ratio test [25,26], the likelihood-ratio test statistic to compare the two proportions is as follows:
$$\chi^{2}_{LRT} = -2 \log\left(\frac{\left(\frac{n'}{2n}\right)^{n_{12} + n_{21}}}{(n_{12}/n)^{n_{12}} (n_{21}/n)^{n_{21}}}\right) = 2 n_{12} \log\left(\frac{2 n_{12}}{n_{12} + n_{21}}\right) + 2 n_{21} \log\left(\frac{2 n_{21}}{n_{12} + n_{21}}\right),$$
whose distribution is approximately a chi-square distribution with one degree of freedom. Therefore, the test statistic of the LRT method only contains the frequencies of the discordant pairs.
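The likelihood-ratio statistic can be computed directly from the two discordant counts. The following sketch is illustrative; the helper `xlogy` and the convention that a term with a zero count contributes 0 (the limit of $x \log x$ as $x \to 0$) are assumptions of this sketch.

```python
from math import log


def xlogy(x: float, y: float) -> float:
    """Return x * log(y), with the convention that the term is 0 when x is 0."""
    return 0.0 if x == 0 else x * log(y)


def lrt(n12: int, n21: int) -> float:
    """Likelihood-ratio chi-square statistic based on the discordant counts."""
    m = n12 + n21
    if m == 0:
        return 0.0  # assumption: statistic set to 0 without discordant pairs
    return 2.0 * (xlogy(n12, 2 * n12 / m) + xlogy(n21, 2 * n21 / m))


print(lrt(15, 5))  # 30 * log(1.5) - 10 * log(2)
```

When one of the discordant counts is 0, the statistic reduces to $2 n' \log 2$, so it remains finite, unlike statistics that divide by an estimated variance that can vanish.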
10. Unconditional exact test (UET)
The CET method conditions on $n' = n_{12} + n_{21}$. Suissa and Shuster [27] proposed, from the McNemar test statistic, an exact test that uses all the observed frequencies and therefore does not condition on $n_{12} + n_{21}$. When the two proportions are compared, the power function of the test is as follows:
$$P(p_{12}, p_{21}) = \sum_{C} \binom{n}{n_{12},\, n_{21},\, n - m} p_{12}^{n_{12}} p_{21}^{n_{21}} (1 - p_{12} - p_{21})^{n - m},$$
where $m = n_{12} + n_{21}$ and $C = \{(n_{12}, m): n_{12} \ge h(m);\ n_{12} = 0, 1, \ldots, m;\ m = 0, 1, \ldots, n\}$, with $h(m) = (z_M \sqrt{m} + m)/2$ and $z_M$ as the calculated value of the McNemar statistic. If $H_0: p_{12} = p_{21}$ is true, then the distribution of $(n_{12}, n_{21}, n - m)$ is a trinomial distribution with parameters $n$ and probability vector $(\delta/2, \delta/2, 1 - \delta)^{T}$, and the power function is as follows:
$$P(\delta) = \sum_{C} \binom{n}{n_{12},\, n_{21},\, n - m} \left(\frac{\delta}{2}\right)^{m} (1 - \delta)^{n - m},$$
where $\delta = p_{12} + p_{21}$ is the nuisance parameter. The nuisance parameter is eliminated by maximizing this function over the range of $\delta$. The function $P(\delta)$ can be simplified as follows:
$$P(\delta) = \sum_{j=k}^{n} \binom{n}{j} \delta^{j} (1 - \delta)^{n - j} F_j(j - i_j - 1),$$
where $k = \mathrm{int}[z_M^{2} + 1]$, $i_j = \mathrm{int}[h(j)]$, $\mathrm{int}[\cdot]$ is the integer part function, and $F_j$ is the cumulative binomial distribution function with parameters $j$ and $1/2$. Finally, the two-sided exact p-value is calculated as follows:
$$\text{two-sided exact p-value} = 2 \times \sup_{0 < \delta < 1} \{P(\delta)\}.$$
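The maximization over the nuisance parameter can be illustrated by brute force: enumerate every pair of discordant counts whose McNemar statistic is at least as large as the observed one, sum their trinomial probabilities under the null hypothesis, and take the largest sum over a grid of nuisance-parameter values. This is a rough sketch under stated assumptions, not the simplified formula of Suissa and Shuster: the function names are hypothetical, the supremum is approximated on an equally spaced grid (999 interior points by default), and the absolute value of the observed statistic is used so that the result does not depend on the direction of the difference.

```python
from math import comb, sqrt


def z_mcnemar(n12: int, n21: int) -> float:
    """Signed McNemar z statistic (0 when there are no discordant pairs)."""
    m = n12 + n21
    return 0.0 if m == 0 else (n12 - n21) / sqrt(m)


def uet_p_value(n: int, n12: int, n21: int, grid: int = 999) -> float:
    """Approximate unconditional exact two-sided p-value by grid search."""
    z_obs = abs(z_mcnemar(n12, n21))
    if z_obs == 0.0:
        return 1.0
    # One-sided critical region: tables at least as extreme as the observed one
    region = [(a, b) for a in range(n + 1) for b in range(n + 1 - a)
              if z_mcnemar(a, b) >= z_obs - 1e-12]
    # Trinomial coefficient n! / (a! b! (n-a-b)!) = C(n, a+b) * C(a+b, a)
    terms = [(comb(n, a + b) * comb(a + b, a), a + b) for a, b in region]
    best = 0.0
    for i in range(1, grid + 1):
        delta = i / (grid + 1)  # nuisance parameter value in (0, 1)
        prob = sum(c * (delta / 2) ** m * (1 - delta) ** (n - m)
                   for c, m in terms)
        best = max(best, prob)
    return min(1.0, 2.0 * best)


print(uet_p_value(10, 5, 0))
```

The grid search is only practical for small or moderate sample sizes; a finer grid or a numerical optimizer would be needed for an accurate supremum.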
11. Unconditional McNemar test (UMT)
Lu [28] proposed a test statistic for the McNemar test that does not condition on $n' = n_{12} + n_{21}$. Hypothesis test (1) is equivalent to the following hypothesis test:
$$H_0: \frac{p_{12}}{p_{12} + p_{21}} = \frac{p_{21}}{p_{12} + p_{21}} \quad \text{vs} \quad H_1: \frac{p_{12}}{p_{12} + p_{21}} \neq \frac{p_{21}}{p_{12} + p_{21}}.$$
If $H_0: p_{12} = p_{21}$ is true, then $n_{12}$ (or $n_{21}$) follows a binomial distribution with parameters $n$ and $\delta = (p_{12} + p_{21})/2$, that is to say, $n_{12} \sim Bin(n, \delta)$. The estimated mean and variance of this binomial distribution are as follows:
$$n\hat{\delta} = (n_{12} + n_{21})/2 \quad \text{and} \quad n\hat{\delta}(1 - \hat{\delta}) = \frac{(n_{12} + n_{21})(n + n_{11} + n_{22})}{4n},$$
respectively. Approximating by the normal distribution and applying the central limit theorem, the unconditional test statistic is as follows:
$$z_{UMT} = \frac{n_{12} - n\hat{\delta}}{\sqrt{n\hat{\delta}(1 - \hat{\delta})}} = \frac{n_{12} - n_{21}}{\sqrt{\frac{(n_{12} + n_{21})(n + n_{11} + n_{22})}{n}}},$$
or, equivalently,
$$\chi^{2}_{UMT} = \frac{n (n_{12} - n_{21})^{2}}{(n_{12} + n_{21})(n + n_{11} + n_{22})},$$
whose distribution is approximately a chi-square distribution with one degree of freedom. In order to apply this method, it is required that $n_{12} + n_{21} \ge 10$, and its asymptotic behaviour is very similar to that of the CET [28].
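The algebra that leads from the binomial mean and variance to the closed-form unconditional McNemar statistic can be checked numerically. The sketch below computes the statistic in both forms; the function names are hypothetical, and returning 0 when there are no discordant pairs is an assumption.

```python
from math import sqrt


def umt_z(n11: int, n12: int, n21: int, n22: int) -> float:
    """z statistic built from the estimated binomial mean and variance."""
    n = n11 + n12 + n21 + n22
    delta_hat = (n12 + n21) / (2 * n)
    if delta_hat == 0:
        return 0.0  # assumption: statistic set to 0 without discordant pairs
    mean = n * delta_hat
    var = n * delta_hat * (1 - delta_hat)
    return (n12 - mean) / sqrt(var)


def umt_chi2(n11: int, n12: int, n21: int, n22: int) -> float:
    """Closed-form chi-square version of the unconditional McNemar statistic."""
    n = n11 + n12 + n21 + n22
    m = n12 + n21
    if m == 0:
        return 0.0
    return n * (n12 - n21) ** 2 / (m * (n + n11 + n22))


print(umt_z(40, 15, 5, 40) ** 2, umt_chi2(40, 15, 5, 40))  # both about 2.78
```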
12. Unconditional likelihood-ratio test (ULRT)
Lu [29] also proposed a likelihood-ratio test statistic to compare two binomial proportions that contains all frequencies. The likelihood-ratio test statistic is obtained in two phases: (I) the likelihood-ratio test statistic is calculated when the four $n_{ij}$ frequencies are combined into two, $n_{12}$ and $n_{11} + n_{21} + n_{22}$; (II) the likelihood-ratio test statistic is calculated when the four $n_{ij}$ frequencies are combined into another two, $n_{21}$ and $n_{11} + n_{12} + n_{22}$. The corresponding test statistics are as follows:
$$\chi^{2}_{I} = 2 \times \left[ n_{12} \log\left(\frac{2 n_{12}}{n_{12} + n_{21}}\right) + (n_{11} + n_{21} + n_{22}) \log\left(\frac{2 (n_{11} + n_{21} + n_{22})}{2n - n_{12} - n_{21}}\right) \right]$$
and
$$\chi^{2}_{II} = 2 \times \left[ n_{21} \log\left(\frac{2 n_{21}}{n_{12} + n_{21}}\right) + (n_{11} + n_{12} + n_{22}) \log\left(\frac{2 (n_{11} + n_{12} + n_{22})}{2n - n_{12} - n_{21}}\right) \right].$$
Finally, the likelihood-ratio test statistic is calculated as the mean of both likelihood-ratio test statistics:
$$\chi^{2}_{ULRT} = \frac{\chi^{2}_{I} + \chi^{2}_{II}}{2} = n_{12} \log\left(\frac{2 n_{12}}{n_{12} + n_{21}}\right) + n_{21} \log\left(\frac{2 n_{21}}{n_{12} + n_{21}}\right) + (n - n_{12}) \log\left[\frac{2 (n - n_{12})}{2n - n_{12} - n_{21}}\right] + (n - n_{21}) \log\left[\frac{2 (n - n_{21})}{2n - n_{12} - n_{21}}\right],$$
and its distribution is approximately a chi-square distribution with one degree of freedom. The ULRT can be applied in most cases, although the test statistic does not fit well to the chi-square distribution when the difference between $n_{12}$ and $n_{21}$ is large, especially when $n_{11} + n_{22}$ is also large, and in this situation, it is a better method than the LRT [29].
13. New revised version of the McNemar test (NMT)
Lu et al. [30] revised the unconditional McNemar test [28]. Under the hypothesis that there is no difference in the number of “success” and “failure” results between “before” and “after”, the estimated probability of obtaining a “success” is as follows:
$$\hat{p} = \frac{n_{12} + n_{21} + 2 n_{11}}{2n},$$
and the estimated probability of obtaining a “failure” is as follows:
$$\hat{q} = 1 - \hat{p} = \frac{n_{12} + n_{21} + 2 n_{22}}{2n}.$$
Frequencies $n_{12} + n_{11}$ and $n_{21} + n_{22}$ correspond to “success” and “failure” in the “before” measurements. The estimated mean is as follows:
$$\hat{\mu} = n\hat{p} = \frac{n_{12} + n_{21} + 2 n_{11}}{2},$$
and the estimated standard deviation is as follows:
$$\hat{\sigma} = \sqrt{n\hat{p}\hat{q}} = \frac{1}{2} \sqrt{\frac{(n_{12} + n_{21} + 2 n_{11})(n_{12} + n_{21} + 2 n_{22})}{n}}.$$
Applying the central limit theorem, the test statistic is as follows:
$$z_{NMT} = \frac{n_{12} - n_{21}}{\sqrt{\frac{(n_{12} + n_{21} + 2 n_{11})(n_{12} + n_{21} + 2 n_{22})}{n}}},$$
and its distribution is approximately a standard normal distribution when $n_{12} + n_{21} + 2 n_{11} \ge 10$ and $n_{12} + n_{21} + 2 n_{22} \ge 10$. Alternatively, the following is derived:
$$\chi^{2}_{NMT} = \frac{n (n_{12} - n_{21})^{2}}{(n_{12} + n_{21} + 2 n_{11})(n_{12} + n_{21} + 2 n_{22})}.$$
This method has an asymptotic behaviour that improves on that of the UMT [30].
14. New revised version of the McNemar test with cc (NMTcc)
Lu et al. [30] revised the unconditional McNemar test and proposed the following unconditional test statistic with cc:
$$\chi^{2}_{NMTcc} = \frac{n (|n_{12} - n_{21}| - 1)^{2}}{(n_{12} + n_{21} + 2 n_{11})(n_{12} + n_{21} + 2 n_{22})}.$$
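The revised McNemar statistic and its continuity-corrected version share the same denominator, so both can be illustrated with one function that takes the correction as an argument. This is a sketch with a hypothetical function name; truncating the corrected difference at zero and returning 0 for tables where the denominator vanishes are assumptions.

```python
def nmt(n11: int, n12: int, n21: int, n22: int, cc: float = 0.0) -> float:
    """Revised McNemar chi-square statistic; cc=1 gives the corrected version."""
    n = n11 + n12 + n21 + n22
    a = n12 + n21 + 2 * n11  # equals 2n times the estimated success probability
    b = n12 + n21 + 2 * n22  # equals 2n times the estimated failure probability
    if a == 0 or b == 0:
        return 0.0  # assumption: degenerate table, statistic set to 0
    # Assumption: truncate at 0 so the correction never inflates the statistic
    corrected = max(abs(n12 - n21) - cc, 0.0)
    return n * corrected ** 2 / (a * b)


print(nmt(40, 15, 5, 40))          # 100 * 100 / (100 * 100) = 1.0
print(nmt(40, 15, 5, 40, cc=1.0))  # 100 * 81 / (100 * 100) = 0.81
```

For this illustrative table, the statistic is much smaller than the classic McNemar value of 5.0, because the denominator involves the concordant counts as well as the discordant ones.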
15. Haber test (HT)
Haber [31] studied the use of continuity corrections in hypothesis testing, with particular attention to 2 × 2 tables. Haber proposed a test statistic with a cc based on the McNemar test statistic:
$$z_{HT} = z_{MT} - \frac{\sqrt{n}}{2m},$$
where $z_{MT} = |n_{12} - n_{21}| / \sqrt{n_{12} + n_{21}}$ is the McNemar test statistic and $m$ is the number of different values that $z_{MT}$ may attain. The number of different achievable values of $z_{MT}$ is very close to $0.9 (n + 1)^{2} / 4$, and since the range of $z_{MT}$ values is $[0, \sqrt{n}]$, the cc based on the average difference of the successive values gives rise to the test statistic:
$$\chi^{2}_{HT} = \left[ \frac{|n_{12} - n_{21}|}{\sqrt{n_{12} + n_{21}}} - \frac{2\sqrt{n}}{0.9 (n + 1)^{2}} \right]^{2},$$
and its distribution is approximately a chi-square distribution with one degree of freedom.
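The Haber correction is half of the average gap between successive attainable values of the McNemar statistic, so it shrinks quickly as the sample size grows. The sketch below makes the two steps explicit; the function name is hypothetical, and truncating the corrected statistic at zero and returning 0 without discordant pairs are assumptions.

```python
from math import sqrt


def haber(n11: int, n12: int, n21: int, n22: int) -> float:
    """Haber chi-square statistic: McNemar z minus half the average gap."""
    n = n11 + n12 + n21 + n22
    discordant = n12 + n21
    if discordant == 0:
        return 0.0  # assumption: statistic set to 0 without discordant pairs
    z_mt = abs(n12 - n21) / sqrt(discordant)
    n_values = 0.9 * (n + 1) ** 2 / 4  # approximate number of attainable values
    cc = sqrt(n) / (2 * n_values)      # half of the average gap on [0, sqrt(n)]
    # Assumption: truncate at 0 so the correction never inflates the statistic
    return max(z_mt - cc, 0.0) ** 2


print(haber(40, 15, 5, 40))  # slightly below the uncorrected value of 5.0
```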
16. Irony et al. test (IT)
Irony et al. [32] studied the comparison of two binomial proportions from a Bayesian perspective. The Dirichlet distribution is the natural conjugate prior for $\mathbf{p} = (p_{11}, p_{12}, p_{21}, p_{22})^{T}$. Therefore, the prior distribution for $\mathbf{p}$ is a Dirichlet distribution with parameter $\mathbf{a} = (a_{11}, a_{12}, a_{21}, a_{22})^{T}$, and its posterior distribution is also Dirichlet with parameter $\mathbf{A} = (A_{11}, A_{12}, A_{21}, A_{22})^{T}$, where $A_{ij} = a_{ij} + n_{ij}$. The objective is to solve the hypothesis test:
$$H_0: \delta = 0 \quad \text{vs} \quad H_1: \delta \neq 0,$$
where $\delta = p_{12} - p_{21}$. This hypothesis test is equivalent to the following:
$$H_0: \theta = \frac{1}{2} \quad \text{vs} \quad H_1: \theta \neq \frac{1}{2},$$
where $\theta = p_{12} / (p_{12} + p_{21})$. Therefore, the only parameters of interest are $p_{12}$ and $p_{21}$, and only the trinomial data $(n_{12}, n_{21}, n_{11} + n_{22})$ are considered. The likelihood function is written as a product of two factors: one depending only on the parameter of interest $\theta$ and the other depending only on the nuisance parameter $\eta$. The distribution of $\theta$ is as follows:
$$Beta(A_{12}, A_{21}),$$
and the distribution of $\eta$ is as follows:
$$Beta(A_{12} + A_{21}, A_{11} + A_{22}).$$
Parameters $\theta$ and $\eta$ are independent. An interval for $\delta$ is constructed by generating a large number of observations from the posterior distribution of $(p_{12}, p_{21}, 1 - \eta)$, that is, a Dirichlet distribution with parameter $(A_{12}, A_{21}, A_{11} + A_{22})$. Irony et al. [32] showed that the posterior mean of $\delta$ is as follows:
$$\frac{A_{12} - A_{21}}{A},$$
where $A = \sum_{i,j} A_{ij}$, and the posterior variance of $\delta$ is as follows:
$$\frac{4 A_{12} A_{21} + (A_{12} + A_{21})(A_{11} + A_{22})}{(A + 1) A^{2}}.$$
A confidence interval for $\delta$ is as follows:
$$\hat{\delta}^{*} \pm q \sqrt{\frac{\hat{\eta}^{*} - \hat{\delta}^{*2}}{n}} \pm \frac{1}{n},$$
where
$$\hat{\delta}^{*} = \frac{\hat{\delta}}{1 + \frac{q^{2}}{n}}, \quad \hat{\delta} = \frac{n_{12} - n_{21}}{n}, \quad \hat{\eta}^{*} = \frac{\hat{\eta}}{1 + \frac{q^{2}}{n}}, \quad \hat{\eta} = \frac{n_{12} + n_{21}}{n},$$
and $q$ is the $100(1 - \alpha/2)\%$ quantile of the standard normal distribution. From the previous equations, the test statistic for hypothesis test (1) is as follows:
$$\chi^{2}_{IT} = \begin{cases} \dfrac{(n_{12} - n_{21} - 1)^{2}}{n_{12} + n_{21}}, & \text{if } n_{12} > n_{21} \\ \dfrac{(n_{12} - n_{21} + 1)^{2}}{n_{12} + n_{21}}, & \text{if } n_{12} < n_{21} \\ 0, & \text{if } n_{12} = n_{21}, \end{cases}$$
whose distribution is approximately a chi-square distribution with one degree of freedom.
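The simulation step of the Bayesian approach, drawing from the posterior Dirichlet distribution of the trinomial probabilities and summarizing the difference between the two discordant probabilities, can be sketched with the standard library alone. This sketch is illustrative only: the function names are hypothetical, the uniform prior with all parameters equal to 1 is an assumption chosen for the example, and Dirichlet draws are obtained by normalizing independent gamma variates. The analytic mean and variance used for comparison are the standard Dirichlet moments of the difference of two components.

```python
import random


def posterior_parameters(counts, prior=(1.0, 1.0, 1.0, 1.0)):
    """Posterior Dirichlet parameters; counts and prior are ordered 11, 12, 21, 22."""
    return tuple(c + a for c, a in zip(counts, prior))


def posterior_delta_draws(counts, prior=(1.0, 1.0, 1.0, 1.0),
                          draws=20000, seed=1):
    """Monte Carlo draws of delta = p12 - p21 from the posterior distribution."""
    A11, A12, A21, A22 = posterior_parameters(counts, prior)
    rng = random.Random(seed)
    shapes = (A12, A21, A11 + A22)  # trinomial: two discordant cells and the rest
    out = []
    for _ in range(draws):
        g = [rng.gammavariate(s, 1.0) for s in shapes]
        out.append((g[0] - g[1]) / sum(g))  # normalized gammas are Dirichlet
    return out


def posterior_moments(counts, prior=(1.0, 1.0, 1.0, 1.0)):
    """Analytic posterior mean and variance of delta."""
    A11, A12, A21, A22 = posterior_parameters(counts, prior)
    A = A11 + A12 + A21 + A22
    mean = (A12 - A21) / A
    var = (4 * A12 * A21 + (A12 + A21) * (A11 + A22)) / ((A + 1) * A ** 2)
    return mean, var


sample = sorted(posterior_delta_draws((40, 15, 5, 40)))
lower = sample[int(0.025 * len(sample))]
upper = sample[int(0.975 * len(sample)) - 1]
print(lower, upper)  # approximate 95% posterior interval for delta
```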
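17. RR test (RRT)
Hypothesis test (1) is equivalent to the hypothesis test:
$$H_0: RR = 1 \quad \text{vs} \quad H_1: RR \neq 1, \tag{4}$$
where
$$RR = \frac{p_{11} + p_{12}}{p_{11} + p_{21}} = \frac{p_{1\cdot}}{p_{\cdot 1}}.$$
Lui [33] solved this hypothesis test by applying weighted least squares. The estimator of $RR$ is as follows:
$$\widehat{RR} = \frac{n_{1\cdot}}{n_{\cdot 1}},$$
and applying the delta method, the estimated variance of $\log(\widehat{RR})$ is as follows:
$$\widehat{Var}[\log(\widehat{RR})] = \left(\frac{\partial RR}{\partial \mathbf{p}}\right)_{\mathbf{p} = \hat{\mathbf{p}}} \hat{\Sigma}_{\hat{\mathbf{p}}} \left(\frac{\partial RR}{\partial \mathbf{p}}\right)^{T}_{\mathbf{p} = \hat{\mathbf{p}}} = \frac{2 (1 - \hat{\bar{p}})}{n \hat{\bar{p}}} - \frac{2 [\hat{p}_{11} \hat{p}_{22} - (\hat{p}_{d})^{2}]}{n (\hat{\bar{p}})^{2}},$$
where
$$\hat{\Sigma}_{\hat{\mathbf{p}}} = \frac{\mathrm{diag}(\hat{\mathbf{p}}) - \hat{\mathbf{p}} \hat{\mathbf{p}}^{T}}{n}, \quad \hat{\bar{p}} = \frac{n_{1\cdot} + n_{\cdot 1}}{2n}, \quad \text{and} \quad \hat{p}_{d} = \frac{n_{12} + n_{21}}{2n}.$$
Applying the central limit theorem, the test statistic for hypothesis test (4) is as follows:
$$z_{RRT} = \frac{\log(\widehat{RR})}{\sqrt{\widehat{Var}[\log(\widehat{RR})]}} \xrightarrow[n \to \infty]{} N(0, 1),$$
or equivalently
$$\chi^{2}_{RRT} = \frac{[\log(\widehat{RR})]^{2}}{\widehat{Var}[\log(\widehat{RR})]},$$
whose distribution is approximately a chi-square distribution with one degree of freedom.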
18. OD test (ODT)
Hypothesis test (1) is also equivalent to the following:
$$H_0: OD = 1 \quad \text{vs} \quad H_1: OD \neq 1, \tag{5}$$
where
$$OD = \frac{p_{12}}{p_{21}}.$$
Lui [33] solved this hypothesis test by applying the same method as the one used in the RR test. Following an analogous procedure, the test statistic for hypothesis test (5) is as follows:
$$\chi^{2}_{ODT} = \frac{[\log(\widehat{OD})]^{2}}{\widehat{Var}[\log(\widehat{OD})]},$$
where
$$\widehat{OD} = \frac{n_{12}}{n_{21}} \quad \text{and} \quad \widehat{Var}[\log(\widehat{OD})] = \frac{1}{n \hat{p}_{d}}, \quad \text{with} \quad \hat{p}_{d} = \frac{n_{12} + n_{21}}{2n}.$$
The distribution of the test statistic $\chi^{2}_{ODT}$ is the same as the one in the previous case.
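19. ODM test (ODMT)
Hypothesis test (1) is also equivalent to the following:
$$H_0: ODM = 1 \quad \text{vs} \quad H_1: ODM \neq 1,$$
where
$$ODM = \frac{(p_{11} + p_{12}) / (p_{21} + p_{22})}{(p_{11} + p_{21}) / (p_{12} + p_{22})} = \frac{p_{1\cdot} / p_{2\cdot}}{p_{\cdot 1} / p_{\cdot 2}}.$$
Applying the same method as in the two previous cases, Lui [33] proposed the following test statistic:
$$\chi^{2}_{ODMT} = \frac{[\log(\widehat{ODM})]^{2}}{\widehat{Var}[\log(\widehat{ODM})]},$$
where
$$\widehat{ODM} = \frac{n_{1\cdot} / n_{2\cdot}}{n_{\cdot 1} / n_{\cdot 2}} \quad \text{and} \quad \widehat{Var}[\log(\widehat{ODM})] = \frac{2}{n \hat{\bar{p}} (1 - \hat{\bar{p}})} - \frac{2 [\hat{p}_{11} \hat{p}_{22} - (\hat{p}_{d})^{2}]}{n (\hat{\bar{p}})^{2} (1 - \hat{\bar{p}})^{2}},$$
with $\hat{\bar{p}} = (n_{1\cdot} + n_{\cdot 1}) / (2n)$ and $\hat{p}_{d} = (n_{12} + n_{21}) / (2n)$. The distribution of the test statistic $\chi^{2}_{ODMT}$ is the same as in the previous cases.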
20. RR, OD, and ODM tests with cc (RRTcc, ODTcc, and ODMTcc)
The previous three methods can also be applied with a cc. We propose adding $1/2$ to each one of the observed frequencies, i.e.,
$$\tilde{n}_{ij} = n_{ij} + 1/2.$$
Thus, in the expressions of the test statistics $\chi^{2}_{RRT}$, $\chi^{2}_{ODT}$, and $\chi^{2}_{ODMT}$, the estimators $\hat{p}_{ij}$, $\hat{\bar{p}}$, and $\hat{p}_{d}$ are computed from the corrected frequencies as follows:
$$\hat{p}_{ij} = \frac{\tilde{n}_{ij}}{n}, \quad \hat{\bar{p}} = \frac{\tilde{n}_{1\cdot} + \tilde{n}_{\cdot 1}}{2n}, \quad \text{and} \quad \hat{p}_{d} = \frac{\tilde{n}_{12} + \tilde{n}_{21}}{2n},$$
respectively. In this way, the new test statistics $\chi^{2}_{RRTcc}$, $\chi^{2}_{ODTcc}$, and $\chi^{2}_{ODMTcc}$ are obtained, and their distributions are the same as in the previous cases.
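The uncorrected RR and ODM statistics can be illustrated from the four cell counts, assuming that the two pooled estimates in the variance expressions are the mean of the two marginal success proportions and half of the discordant proportion, and that ODM is the ratio of the "before" odds to the "after" odds. The function and variable names below are hypothetical, the sketch assumes that all marginal totals are positive, and it does not cover the continuity-corrected versions.

```python
from math import log


def _pooled(n11, n12, n21, n22):
    """Sample size, pooled marginal proportion, pooled discordant proportion."""
    n = n11 + n12 + n21 + n22
    p_bar = (2 * n11 + n12 + n21) / (2 * n)  # (n1. + n.1) / (2n)
    p_d = (n12 + n21) / (2 * n)
    bracket = (n11 / n) * (n22 / n) - p_d ** 2
    return n, p_bar, bracket


def rrt(n11, n12, n21, n22):
    """Chi-square statistic based on the log of the ratio of marginal proportions."""
    if n12 == n21:
        return 0.0  # equal margins: log(RR) = 0
    n, p_bar, bracket = _pooled(n11, n12, n21, n22)
    log_rr = log((n11 + n12) / (n11 + n21))
    var = 2 * (1 - p_bar) / (n * p_bar) - 2 * bracket / (n * p_bar ** 2)
    return log_rr ** 2 / var


def odmt(n11, n12, n21, n22):
    """Chi-square statistic based on the log of the ratio of marginal odds."""
    if n12 == n21:
        return 0.0  # equal margins: log(ODM) = 0
    n, p_bar, bracket = _pooled(n11, n12, n21, n22)
    odds_before = (n11 + n12) / (n21 + n22)
    odds_after = (n11 + n21) / (n12 + n22)
    q = p_bar * (1 - p_bar)
    var = 2 / (n * q) - 2 * bracket / (n * q ** 2)
    return log(odds_before / odds_after) ** 2 / var


print(rrt(40, 15, 5, 40), odmt(40, 15, 5, 40))
```

In this illustrative table the pooled marginal proportion is exactly 0.5, which is why the two statistics coincide; in general they differ.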

3. Criteria for Comparing Methods

The comparison of the asymptotic behaviour of the methods presented in the previous section was made by comparing their type I error rates and their powers, taking as the nominal error level α = 5 % . Based on the type I error rates and the powers, the criteria in order to choose the methods with best asymptotic behaviour were as follows:
1. The type I error rate fluctuates around $\alpha = 5\%$ without being much higher than this value, which is considered to be the case when the type I error rate is $< 7\%$.
2. The power is higher, as long as the type I error rate does not exceed $\alpha = 5\%$.
Criterion 1 establishes that the type I error rate must be lower than 7%. Let $\Delta\alpha = \alpha - \alpha^{*}$, where $\alpha = 5\%$ and $\alpha^{*}$ is the type I error rate of the method. If there is a confidence interval (CI) related to a test statistic, then $\Delta\alpha = \gamma^{*} - \gamma$, where $\gamma = 1 - \alpha = 0.95$ is the nominal confidence of the CI and $\gamma^{*}$ is the coverage probability of the CI. Under this criterion, a test statistic is too liberal if $\alpha^{*} \ge 7\%$ ($\Delta\alpha \le -2\%$), or, what amounts to the same, if $\gamma^{*} \le 93\%$, in which case it is said that the CI fails [34,35,36]. If a CI fails, then the type I error rate of the corresponding hypothesis test is $\ge 7\%$, and therefore the hypothesis test is very liberal and leads to too many false significances.

4. Simulation Experiments

Extensive Monte Carlo simulation experiments were carried out in order to study the asymptotic behaviour, measured in terms of type I error rates and powers, of the test statistics presented in Section 2. These experiments, carried out with R [37], consisted of generating $N = 50{,}000$ random samples of sizes $n = \{20, 30, 50, 100, 200\}$ from multinomial distributions with the probabilities given in Table 1. Following the idea of Fagerland et al. [12], the probabilities $(p_{11}, p_{12}, p_{21}, p_{22})$ were re-parameterized as $(p_{12}, p_{21}, \theta)$, where $\theta = p_{11} p_{22} / (p_{12} p_{21})$ is the odds ratio. In order to study type I error rates, it was considered that $p_{12} = p_{21}$, and to study the powers, it was considered that $p_{12} \neq p_{21}$. The values $\{0.1, 0.2, \ldots, 0.8, 0.9\}$ were taken for $p_{1\cdot}$ and $p_{\cdot 1}$, and the values $\{1, 2, 5, 10\}$ were considered for $\theta$. Therefore, a wide range of values was considered to reveal the asymptotic behaviour of each test statistic. In order to calculate the type I error rates and the powers, $\alpha = 5\%$ was set. Initial simulation experiments were carried out, generating $N = \{10{,}000;\ 20{,}000;\ 50{,}000;\ 100{,}000\}$ random samples for several scenarios. The results for $N = \{50{,}000;\ 100{,}000\}$ were stable, so $N = 50{,}000$ was finally chosen in order to save computing time.
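The structure of such an experiment can be sketched for two of the test statistics. The code below is not the simulation code of the study, which was written in R. It is a small Python illustration with hypothetical names, a much smaller number of replicates (2000), an arbitrary seed, and cell probabilities passed directly rather than derived from the marginal probabilities and the odds ratio. Tables without discordant pairs are counted as non-rejections, which is an assumption.

```python
import random

CRITICAL = 3.841458820694124  # 95th percentile of the chi-square with 1 df


def mcnemar_stat(n12, n21):
    m = n12 + n21
    return 0.0 if m == 0 else (n12 - n21) ** 2 / m


def wald_stat(n11, n12, n21, n22):
    if n12 == n21:
        return 0.0
    n = n11 + n12 + n21 + n22
    denom = 4 * n12 * n21 + (n11 + n22) * (n12 + n21)
    return float("inf") if denom == 0 else n * (n12 - n21) ** 2 / denom


def rejection_rates(probs, n, n_samples=2000, seed=2024):
    """Estimate rejection rates of MT and WT for cell probabilities
    probs = (p11, p12, p21, p22) and sample size n."""
    rng = random.Random(seed)
    rejections = {"MT": 0, "WT": 0}
    for _ in range(n_samples):
        counts = [0, 0, 0, 0]
        # One multinomial sample: classify n individuals into the four cells
        for cell in rng.choices(range(4), weights=probs, k=n):
            counts[cell] += 1
        n11, n12, n21, n22 = counts
        if mcnemar_stat(n12, n21) > CRITICAL:
            rejections["MT"] += 1
        if wald_stat(n11, n12, n21, n22) > CRITICAL:
            rejections["WT"] += 1
    return {name: count / n_samples for name, count in rejections.items()}


# Null scenario (p12 = p21), so the rates estimate type I error rates
print(rejection_rates((0.4, 0.1, 0.1, 0.4), n=100))
```

Passing probabilities with unequal discordant cells, for example `(0.35, 0.2, 0.1, 0.35)`, turns the same function into a power estimate. Because the Wald statistic is never smaller than the McNemar statistic for the same table, the Wald test rejects at least as often in every simulated scenario.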

4.1. Type I Error Rates

Table 2, Table 3, Table 4 and Table 5 show some of the results obtained for the type I error rates of the test statistics in different scenarios. Each scenario also shows basic descriptive statistics of n 12 + n 21 (mean and standard deviation). By analyzing the result, the following conclusions can be drawn:
  • Both the exact tests (CET and UET) and the quasi-exact test (MidpT) are conservative methods, and their type I error rates never exceed the nominal error level $\alpha = 5\%$.
  • All of the McNemar test statistics (MT, MTYcc, MTcc1, MTcc2, MTcc3, and MMT) are conservative when, in general terms, $E(n_{12} + n_{21})$ is not high. The value of $E(n_{12} + n_{21})$ decreases as the value of the odds ratio $\theta$ increases, so if $\theta = 1$, these methods are conservative when $E(n_{12} + n_{21}) \le 21$ (rounding up to the nearest whole value), and when $\theta = 10$, they are conservative when $E(n_{12} + n_{21}) \le 12$. In each scenario, in general terms, the type I error rates of these methods fluctuate around $\alpha = 5\%$ when $E(n_{12} + n_{21})$ is higher than each one of the previous values. Likewise, continuity corrections do not improve the asymptotic behaviour of the type I error rates, especially when $E(n_{12} + n_{21})$ is high ($> 20$). When $E(n_{12} + n_{21})$ is small ($\le 10$) or moderate ($> 10$ and $\le 20$), continuity corrections do not have a clear effect on the type I error rate, as sometimes it improves and sometimes it gets worse.
  • MidpT, MT, and UET have practically the same type I error rates when $n \ge 30$.
  • Test statistics ODT and ODTcc lead to many false significances, since their type I error rates greatly exceed $\alpha = 5\%$. Therefore, neither method should be used.
  • The other approximate methods (which are unconditioned methods) are conservative when $n \le 50$, and, in very general terms, their type I error rates fluctuate around $\alpha = 5\%$ (without being too much higher) when $n \ge 100$. Some of these methods (WT, LRT, RRT, and ODMT) have type I error rates that fluctuate around $\alpha = 5\%$ (without being too much higher) when $n = 50$. The continuity corrections of the RRT and ODMT methods do not improve the asymptotic behaviour of their type I error rates.

4.2. Powers

Table 6, Table 7, Table 8 and Table 9 show some of the results obtained for the power of the test statistics in different scenarios. These tables do not show the results for the test statistics ODT and ODTcc, since their type I error rates are very clearly higher than $\alpha = 5\%$. For each scenario, the tables also report the mean and the standard deviation of $n_{12}+n_{21}$. From the analysis of the results, the following conclusions are obtained:
  • UET and MidpT have very similar powers, and both are a little more powerful than CET, especially when the sample size is small ($n \in \{20, 30, 50\}$).
  • The classic McNemar test statistic without cc (MT) has practically the same power as the three McNemar test statistics with cc (MTcc1, MTcc2, and MTcc3), and all of them are more powerful than the McNemar test statistic with Yates cc (MTYcc).
  • Methods MT, MTcc1, MTcc2, and MTcc3 have the same power as UET and as MidpT when $n \le 30$.
  • Regarding the approximate tests, in general terms, the WT, LRT, RRT, and ODMT methods have more power than the other approximate tests, especially when $n \le 50$. When $n \ge 100$, if the difference between $p_{1\cdot}$ and $p_{\cdot 1}$ is small (for example, $|p_{1\cdot} - p_{\cdot 1}| = 10\%$), then the WT, LRT, RRT, and ODMT methods have more power than the rest of the approximate methods; if the difference between $p_{1\cdot}$ and $p_{\cdot 1}$ is greater (for example, $|p_{1\cdot} - p_{\cdot 1}| \ge 40\%$), then all of the approximate methods have very similar powers. The continuity corrections in the RRT and ODMT methods do not improve their powers.
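For a statistic that depends only on the discordant pairs, such as MT, the power can also be computed exactly by enumeration instead of by simulation. The sketch below (Python, with hypothetical function names) assumes multinomial sampling, so that $n_{12}+n_{21}$ is binomial with parameters $n$ and $p_{12}+p_{21}$, and $n_{12}$ given $n_{12}+n_{21}$ is binomial as well; a sample without discordant pairs is counted as a non-rejection. Because these assumptions may differ from the authors' simulation design, the results need not coincide with the simulated powers in the tables, particularly for small $n$.

```python
import math

CHI2_CRIT_1DF_5PCT = 3.841458820694124  # 95th percentile of chi-square, 1 df


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def mcnemar_exact_power(n, p12, p21):
    """P(MT > critical value), where MT = (n12 - n21)^2 / (n12 + n21)."""
    p_disc = p12 + p21    # probability that a pair is discordant
    share = p12 / p_disc  # P(pair counts towards n12 | pair is discordant)
    power = 0.0
    for m in range(1, n + 1):  # m = n12 + n21; m = 0 never rejects
        prob_m = binom_pmf(m, n, p_disc)
        reject_given_m = sum(
            binom_pmf(k, m, share)
            for k in range(m + 1)
            if (2 * k - m) ** 2 / m > CHI2_CRIT_1DF_5PCT  # n12 - n21 = 2k - m
        )
        power += prob_m * reject_given_m
    return power


if __name__ == "__main__":
    # theta = 1 with p1. = 0.10 and p.1 = 0.20 gives p11 = 0.02, p12 = 0.08, p21 = 0.18.
    for n in (20, 50, 100, 200):
        print(n, round(100 * mcnemar_exact_power(n, 0.08, 0.18), 2))
```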

5. General Rules of Application

From the results obtained in the simulation experiments and only considering sample size n (as it is the only parameter set by the researcher), one can provide the following general rules of application for the test statistics:
  • When the sample size is small, use the WT, LRT, RRT, or ODMT methods, since they are the least conservative methods, they have the greatest power, and their powers are similar.
  • When the sample size is moderate, use the WT, LRT, RRT, or ODMT methods, since their type I error rates fluctuate around $\alpha = 5\%$, they have the greatest power, and their powers are similar.
  • When the sample size is large, use the MT, WT, MWT, LRT, RRT, or ODMT methods, since their type I error rates fluctuate around $\alpha = 5\%$, they have the greatest power, and their powers are very similar.
The graphs in Figure 1 show the type I error rates of the selected methods, and the graphs in Figure 2 show the powers of these methods for different scenarios. The graphs in Figure 1 show how the WT, LRT, RRT, and ODMT methods have a type I error rate with better behaviour than the MT and MWT methods when the sample size is small or moderate, with their values being very similar when the sample is large. In the graphs in Figure 2, it can be seen that the power of MT is a little lower than that of the other methods when the sample size is small. Likewise, the powers of these methods are very similar when the sample size is moderate or large.

6. Example

The results have been applied to the diagnosis of coronary artery disease using dobutamine echocardiography (DE, test 1) and myocardial perfusion scintigraphy (MPS, test 2) as diagnostic tests and coronary angiography (CA) as the gold standard. The objective of this study is to compare the sensitivities (specificities) of the two diagnostic tests. Table 10 shows the frequencies observed in the study and the results of each method for the respective comparisons. The comparison of the two sensitivities (specificities) has been carried out using the function “pairedProp”, which is a function written in R that allows for comparing two paired binomial proportions using the methods recommended in Section 5. This function is attached as Supplementary Material to the manuscript. The call to compare the two sensitivities is as follows:
pairedProp(152, 17, 7, 36),
and the call to compare the two specificities is as follows:
pairedProp(25, 10, 11, 290).
In this example, the number of patients with coronary artery disease and the number of patients without coronary artery disease are large, and therefore, all the methods indicated in Section 5 can be applied. The estimates of the sensitivities and specificities of the diagnostic tests are as follows: $\widehat{Se}_1 = 0.797$, $\widehat{Se}_2 = 0.750$, $\widehat{Sp}_1 = 1 - 0.104 = 0.896$, and $\widehat{Sp}_2 = 1 - 0.107 = 0.893$. With $\alpha = 5\%$ fixed, the equality of the two sensitivities is rejected, and the equality of the two specificities is not rejected. It is concluded that the sensitivity of the DE test is significantly greater than the sensitivity of the MPS test.
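The estimates can be checked directly from the frequencies in Table 10. The following Python sketch is only an illustration (the function name and the argument layout are assumptions, not part of the authors' R function); each group is given as the counts (both tests positive, only DE positive, only MPS positive, both tests negative).

```python
def paired_accuracy(diseased, healthy):
    """Sensitivity and specificity of two tests applied to the same patients.

    Each argument is (both positive, only test 1 positive,
    only test 2 positive, both negative).
    """
    a, b, c, d = diseased
    e, f, g, h = healthy
    n_diseased = a + b + c + d
    n_healthy = e + f + g + h
    return {
        "Se1": (a + b) / n_diseased,       # test 1 positive among diseased
        "Se2": (a + c) / n_diseased,       # test 2 positive among diseased
        "Sp1": 1 - (e + f) / n_healthy,    # 1 - false positive fraction of test 1
        "Sp2": 1 - (e + g) / n_healthy,    # 1 - false positive fraction of test 2
    }


if __name__ == "__main__":
    estimates = paired_accuracy((152, 17, 7, 36), (25, 10, 11, 290))
    for name, value in estimates.items():
        print(name, round(value, 3))
```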
In this example, it can be seen that the p-values of all the methods to compare the two sensitivities (specificities) are very similar to each other, and therefore, the conclusions are the same.
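Three of the statistics reported in Table 10 can be reproduced with textbook formulas. The Python sketch below is not the authors' `pairedProp` function: the function name `paired_tests` is hypothetical, the argument order $(n_{11}, n_{12}, n_{21}, n_{22})$ is assumed from the calls above, and the formulas used are the usual McNemar statistic, the Wald statistic for the difference of two paired proportions, and the likelihood-ratio statistic based on the discordant pairs. With the data of the example these formulas return the MT, WT, and LRT values of Table 10; the remaining methods are not sketched here.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def paired_tests(n11, n12, n21, n22):
    """Return {method: (statistic, p-value)} for MT, WT and LRT."""
    n = n11 + n12 + n21 + n22
    m = n12 + n21  # number of discordant pairs
    d = n12 - n21
    if m == 0:
        raise ValueError("no discordant pairs: the statistics are undefined")

    mt = d ** 2 / m                # McNemar, no continuity correction
    wt = d ** 2 / (m - d ** 2 / n)  # Wald (undefined if all n pairs are discordant in one direction)

    def term(x):
        # x * log(x / (m / 2)), with the convention 0 * log(0) = 0
        return x * math.log(2.0 * x / m) if x > 0 else 0.0

    lrt = 2.0 * (term(n12) + term(n21))  # likelihood ratio on discordant pairs

    return {name: (stat, chi2_sf_1df(stat))
            for name, stat in (("MT", mt), ("WT", wt), ("LRT", lrt))}


if __name__ == "__main__":
    for label, counts in (("sensitivities", (152, 17, 7, 36)),
                          ("specificities", (25, 10, 11, 290))):
        for name, (stat, p) in paired_tests(*counts).items():
            print(label, name, round(stat, 3), round(p, 3))
```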

7. Discussion

The comparison of two paired binomial proportions is a problem that appears frequently in medical and clinical studies. In the statistical literature, there are diverse methods proposed to solve this hypothesis test, and therefore, it is necessary to determine which methods have the best asymptotic behaviour in terms of the type I error rate and power. We reviewed 19 existing methods and proposed 5 new ones, and we carried out broad simulation experiments to study their asymptotic behaviour. From the results obtained, we have given some general rules of application for the methods studied.
May and Johnson [9] compared through simulation experiments the asymptotic behaviour of eight methods (CET, MidpT, MT, MTYcc, MMT, WT, MWT, and LRT) and recommended using the MidpT, MWT, and MT methods when it is verified that n 12 + n 21 40 . May and Johnson used the criterion that the type I error rate must not be higher than α = 5 % .
Park [10] compared, using the same criterion as May and Johnson, the asymptotic behaviour of the CET, MT, LRT, and WT methods, concluding that the method with the best behaviour is MT.
Fagerland et al. [11,12,13] also compared through simulation experiments the asymptotic behaviour of five methods: CET, MidpT, UET, MT, and MTYcc. These authors used the same criterion as May and Johnson and recommended using the MidpT and MT methods.
The studies of May and Johnson [9] and Fagerland et al. [11,12,13] used the same criterion to assess the type I error rates, and both studies recommended the MidpT and MT methods. Park [10] recommends the MT method.
Our criterion to assess the type I error rate of each method is more flexible: it allows the type I error rate of a method to be higher than $\alpha = 5\%$, provided that the method is not too liberal. Regarding the asymptotic behaviour of an approximate test, its type I error rate is expected to fluctuate around the nominal error level when the sample size is large, and therefore, it can be higher than the nominal level. With our criterion, it can be slightly higher than the nominal error level. Regarding an exact test, its type I error rate must not be higher than the nominal error level, as happens with the results obtained for CET and UET (Table 2, Table 3, Table 4 and Table 5).
The simulation experiments carried out allowed us to establish some general rules of application for the methods. The WT, LRT, RRT, and ODMT methods can be used whatever the sample size is, and if the sample size is large, then the MT and MWT methods can also be applied. Of these six methods, two are conditioned methods (MT and LRT), and four are unconditioned (WT, MWT, RRT, and ODMT); therefore, the problem can be addressed from both perspectives (conditioned and unconditioned), obtaining results that are very similar. Another important conclusion obtained from the simulation experiments is that continuity corrections do not improve the asymptotic behaviour of the studied methods. Therefore, although the statistical literature contains different methods that incorporate continuity corrections, their application is not justified.
In this manuscript, we have studied the comparison of two paired proportions using hypothesis tests. An alternative is to carry out this comparison using confidence intervals instead of hypothesis testing. In this context, there are also numerous intervals (exact and approximate) that can be used [4,13,14,15,16]. Fagerland et al. [12,13] compared the behaviour of some of the most used intervals, but that comparison may now be somewhat incomplete. Therefore, given that new confidence intervals have been investigated in recent years [14,15,16], it is of great interest from a practical point of view to determine which intervals have the best asymptotic behaviour.
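As a simple illustration of the confidence interval approach, the sketch below computes the standard Wald interval for the difference $p_{1\cdot} - p_{\cdot 1}$. It is written in Python with a hypothetical function name, it is only one of the many intervals mentioned above, and it is not claimed to be one of the intervals recommended in the cited references. It uses the same standard error as the textbook Wald test, so, under that assumption, the 95% interval excludes zero exactly when the Wald statistic exceeds the 5% critical value.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_ci_paired_difference(n11, n12, n21, n22, z=Z_975):
    """Wald confidence interval for p1. - p.1 from a paired 2 x 2 table."""
    n = n11 + n12 + n21 + n22
    d_hat = (n12 - n21) / n  # estimate of p1. - p.1
    se = math.sqrt((n12 + n21) - (n12 - n21) ** 2 / n) / n
    return d_hat - z * se, d_hat + z * se


if __name__ == "__main__":
    # Difference of the two sensitivities in the example (patients with disease).
    lower, upper = wald_ci_paired_difference(152, 17, 7, 36)
    print(round(lower, 4), round(upper, 4))
```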

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12020190/s1.

Author Contributions

J.A.R.-N., T.S.S., and J.F.V.-V. have collaborated equally in the realization of this work. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by Spanish Ministry of Science and Innovation, Grant “PID2021-126095NB-100” funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the anonymous referees and the Academic Editors (Elvira Di Nardo and José Luis Vicente Villardón) for their helpful comments that improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Fay, M.P.; Hunsberger, S.A. Practical valid inferences for the two-sample binomial problem. Stat. Surv. 2021, 15, 72–110.
  2. Pepe, M.S. The Statistical Evaluation of Medical Tests for Classification and Prediction, 1st ed.; Oxford University Press: New York, NY, USA, 2003.
  3. Zhou, X.H.; Obuchowski, N.A.; McClish, D.K. Statistical Methods in Diagnostic Medicine, 2nd ed.; Wiley: Hoboken, NJ, USA, 2011.
  4. Pradhan, V.; Gangopadhyay, A.K.; Menon, S.M.; Basu, C.; Banerjee, T. Confidence Intervals for Discrete Data in Clinical Research, 1st ed.; Chapman & Hall/CRC: New York, NY, USA, 2021.
  5. McNemar, Q. Note on the sampling error of the differences between correlated proportions or percentages. Psychometrika 1947, 12, 153–157.
  6. Davis, C.S. Matched pairs with categorical data. In Encyclopedia of Biostatistics; Armitage, P., Colton, T., Eds.; Wiley: New York, NY, USA, 1998; Volume 3, pp. 2437–2441.
  7. Lachenbruch, P.A. McNemar test. In Encyclopedia of Biostatistics; Armitage, P., Colton, T., Eds.; Wiley: New York, NY, USA, 1998; Volume 3, pp. 2486–2487.
  8. Pembury Smith, M.Q.R.; Ruxton, G.D. Effective use of the McNemar test. Behav. Ecol. Sociobiol. 2020, 74, 133.
  9. May, W.L.; Johnson, W.D. The validity and power of tests for equality of two correlated proportions. Stat. Med. 1997, 16, 1081–1096.
  10. Park, T. Is the exact test better than the asymptotic test for testing marginal homogeneity in 2 × 2 tables? Biom. J. 2002, 44, 571–583.
  11. Fagerland, M.W.; Lydersen, S.; Laake, P. The McNemar test for binary matched-pairs data: Mid-p and asymptotic are better than exact conditional. BMC Med. Res. Methodol. 2013, 13, 91.
  12. Fagerland, M.W.; Lydersen, S.; Laake, P. Recommended tests and confidence intervals for paired binomial proportions. Stat. Med. 2014, 33, 2850–2875.
  13. Fagerland, M.W.; Lydersen, S.; Laake, P. Statistical Analysis of Contingency Tables; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.
  14. Tang, M.L.; Ling, M.H.; Ling, L.; Tian, G. Confidence intervals for a difference between proportions based on paired data. Stat. Med. 2010, 29, 86–96.
  15. Pradhan, V.; Saha, K.K.; Banerjee, T.; Evans, J.C. Weighted profile likelihood-based confidence interval for the difference between two proportions with paired binomial data. Stat. Med. 2014, 33, 2984–2997.
  16. Fay, M.P.; Lumbard, K. Confidence intervals for difference in proportions for matched pairs compatible with exact McNemar’s or sign tests. Stat. Med. 2021, 40, 1147–1159.
  17. Chang, P.; Liu, R.; Hou, T.; Yan, X.; Shan, G. Continuity corrected score confidence interval for the difference in proportions in paired data. J. Appl. Stat. 2024, 51, 139–152.
  18. Agresti, A. Categorical Data Analysis, 3rd ed.; Wiley: New York, NY, USA, 2013; pp. 416–417.
  19. Lancaster, H.O. Significance tests in discrete distribution. J. Am. Stat. Assoc. 1961, 56, 223–234.
  20. Edwards, A.L. Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika 1948, 13, 185–187.
  21. Yates, F. Contingency table involving small numbers and the χ² test. J. R. Stat. Soc. 1934, 1, 217–235.
  22. Martín-Andrés, A.; de Dios Luna del Castillo, J. 40 ± 10 Horas de Bioestadística; Norma-Capitel: Madrid, Spain, 2013.
  23. Bennett, B.M.; Underwood, R.E. On McNemar’s test for the 2×2 table and its power function. Biometrics 1970, 26, 339–343.
  24. Wald, A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. Soc. 1943, 54, 426–482.
  25. Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 4th ed.; Springer: Cham, Switzerland, 2022; Chapter 14.
  26. Wilks, S.S. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 1938, 9, 60–62.
  27. Suissa, S.; Shuster, J.J. The 2 × 2 matched-pairs trial: Exact unconditional design and analysis. Biometrics 1991, 47, 361–372.
  28. Lu, Y. A revised version of McNemar’s test for paired binary data. Commun. Stat.-Theory Methods 2010, 39, 3525–3539.
  29. Lu, Y. Considering the concordant observations in likelihood ratio test for paired binary data. Commun. Stat.-Theory Methods 2011, 39, 4214–4232.
  30. Lu, Y.; Wang, M.; Zhang, G. A new revised version of McNemar’s test for paired binary data. Commun. Stat.-Theory Methods 2017, 46, 10010–10024.
  31. Haber, M. The continuity correction and statistical testing. Int. Stat. Rev. 1982, 50, 135–144.
  32. Irony, T.Z.; Pereira, C.A.; Tiwari, R.C. Analysis of opinion swing: Comparison of two correlated proportions. Am. Stat. 2000, 54, 57–62.
  33. Lui, K.J. Notes on testing equality in dichotomous data with matched pairs. Biom. J. 2001, 43, 313–321.
  34. Price, R.M.; Bonett, D.G. An improved confidence interval for a linear function of binomial proportions. Comput. Stat. Data. Anal. 2004, 45, 449–456.
  35. Martín-Andrés, A.; Álvarez-Hernández, M. Two-tailed asymptotic inferences for a proportion. J. Appl. Stat. 2014, 41, 1516–1529.
  36. Martín-Andrés, A.; Álvarez-Hernández, M. Two-tailed approximate confidence intervals for the ratio of proportions. Stat. Comput. 2014, 24, 65–75.
  37. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 4 December 2023).
Figure 1. Type I error rates of the methods for $p_{1\cdot} = p_{\cdot 1} = 20\%$.
Figure 2. Powers of the methods for different scenarios.
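Table 1. Observed frequencies and theoretical probabilities.

Observed frequencies

| | After: Success | After: Failure | Total |
|---|---|---|---|
| Before: Success | $n_{11}$ | $n_{12}$ | $n_{1\cdot}$ |
| Before: Failure | $n_{21}$ | $n_{22}$ | $n_{2\cdot}$ |
| Total | $n_{\cdot 1}$ | $n_{\cdot 2}$ | $n$ |

Theoretical probabilities

| | After: Success | After: Failure | Total |
|---|---|---|---|
| Before: Success | $p_{11}$ | $p_{12}$ | $p_{1\cdot}$ |
| Before: Failure | $p_{21}$ | $p_{22}$ | $p_{2\cdot}$ |
| Total | $p_{\cdot 1}$ | $p_{\cdot 2}$ | 1 |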
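Table 2. Type I error rates (in %) for $\theta = 1$ and different scenarios.

| Method | n = 20, Scen. 1 | n = 20, Scen. 2 | n = 20, Scen. 3 | n = 30, Scen. 1 | n = 30, Scen. 2 | n = 30, Scen. 3 | n = 50, Scen. 1 | n = 50, Scen. 2 | n = 50, Scen. 3 | n = 100, Scen. 1 | n = 100, Scen. 2 | n = 100, Scen. 3 | n = 200, Scen. 1 | n = 200, Scen. 2 | n = 200, Scen. 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CET | 0.02 | 1.86 | 0.01 | 0.29 | 1.67 | 2.41 | 1.45 | 2.87 | 3.00 | 2.89 | 3.18 | 3.58 | 3.34 | 3.80 | 3.71 |
| MidpT | 0.16 | 3.64 | 0.12 | 0.93 | 3.44 | 4.18 | 3.08 | 4.61 | 4.53 | 4.62 | 4.48 | 4.87 | 4.70 | 4.98 | 4.71 |
| MT | 0.16 | 3.64 | 0.12 | 0.93 | 3.44 | 4.18 | 3.08 | 4.82 | 5.28 | 5.05 | 5.08 | 4.91 | 4.99 | 5.04 | 4.95 |
| UET | 0.16 | 3.62 | 0.12 | 0.93 | 3.44 | 4.18 | 3.08 | 4.61 | 4.53 | 4.13 | 4.27 | 4.38 | 4.70 | 4.82 | 4.63 |
| MTYcc | 0.02 | 1.86 | 0.01 | 0.29 | 1.65 | 2.21 | 1.43 | 2.46 | 2.76 | 2.52 | 3.17 | 3.42 | 3.27 | 3.64 | 3.67 |
| MTcc1 | 0.16 | 3.64 | 0.12 | 0.93 | 4.60 | 0.87 | 3.08 | 4.52 | 3.08 | 4.62 | 5.00 | 4.55 | 4.93 | 4.95 | 5.07 |
| MTcc2 | 0.16 | 3.64 | 0.12 | 0.93 | 4.67 | 0.87 | 3.08 | 4.80 | 3.08 | 4.98 | 5.00 | 4.94 | 4.93 | 5.05 | 5.08 |
| MTcc3 | 0.16 | 3.64 | 0.12 | 0.93 | 4.67 | 0.87 | 3.08 | 4.80 | 3.08 | 4.98 | 5.00 | 4.94 | 4.93 | 5.05 | 5.08 |
| MMT | 0.16 | 3.62 | 0.12 | 0.93 | 3.40 | 3.83 | 3.04 | 3.98 | 4.30 | 4.13 | 4.46 | 4.59 | 4.59 | 4.84 | 4.63 |
| WT | 0.72 | 6.62 | 0.66 | 2.72 | 6.38 | 6.39 | 5.92 | 5.68 | 5.42 | 5.39 | 5.24 | 5.38 | 5.27 | 5.19 | 5.01 |
| MWT | 0.16 | 5.69 | 0.13 | 0.93 | 3.89 | 5.17 | 3.08 | 4.82 | 5.28 | 4.69 | 5.05 | 4.91 | 4.93 | 5.04 | 4.95 |
| LRT | 0.72 | 6.49 | 0.66 | 2.72 | 6.38 | 6.25 | 5.92 | 5.68 | 5.35 | 5.49 | 5.23 | 5.12 | 5.27 | 5.15 | 4.95 |
| UMT | 0.00 | 0.87 | 0.00 | 0.01 | 0.42 | 1.08 | 0.26 | 0.88 | 1.30 | 0.70 | 1.07 | 1.33 | 0.70 | 1.07 | 1.28 |
| ULRT | 0.00 | 0.87 | 0.00 | 0.01 | 0.42 | 1.08 | 0.26 | 0.88 | 1.30 | 0.70 | 1.07 | 1.33 | 0.70 | 1.07 | 1.28 |
| NMT | 0.00 | 0.53 | 0.00 | 0.01 | 0.30 | 0.51 | 0.19 | 0.49 | 0.53 | 0.54 | 0.54 | 0.56 | 0.52 | 0.55 | 0.53 |
| NMTcc | 0.00 | 0.18 | 0.00 | 0.00 | 0.07 | 0.19 | 0.05 | 0.19 | 0.26 | 0.20 | 0.29 | 0.34 | 0.32 | 0.37 | 0.38 |
| HT | 0.16 | 3.64 | 0.12 | 0.93 | 3.44 | 4.18 | 3.08 | 4.61 | 4.54 | 4.98 | 4.88 | 4.91 | 4.99 | 5.04 | 4.95 |
| IT | 0.02 | 1.86 | 0.01 | 0.29 | 1.65 | 2.21 | 1.43 | 2.46 | 2.76 | 2.52 | 3.17 | 3.42 | 3.27 | 3.64 | 3.67 |
| RRT | 2.53 | 4.83 | 0.12 | 5.38 | 6.61 | 6.13 | 6.88 | 6.24 | 5.49 | 6.34 | 5.49 | 5.37 | 5.67 | 5.35 | 5.01 |
| ODT | 9.77 | 18.43 | 9.83 | 15.94 | 18.50 | 18.47 | 18.49 | 17.72 | 16.96 | 17.44 | 17.43 | 17.52 | 17.49 | 16.78 | 17.00 |
| ODMT | 0.72 | 5.39 | 0.66 | 2.72 | 6.28 | 5.83 | 5.92 | 5.92 | 5.37 | 6.21 | 5.33 | 5.27 | 5.60 | 5.26 | 4.99 |
| RRTcc | 0.16 | 3.15 | 0.01 | 0.93 | 3.88 | 0.29 | 3.31 | 4.38 | 3.04 | 5.33 | 4.62 | 3.98 | 5.17 | 4.85 | 4.73 |
| ODTcc | 2.96 | 13.91 | 2.94 | 9.30 | 15.01 | 9.38 | 15.88 | 15.60 | 15.92 | 16.61 | 16.43 | 16.64 | 16.50 | 16.37 | 16.85 |
| ODMTcc | 0.16 | 3.16 | 0.12 | 0.93 | 3.89 | 0.87 | 3.31 | 4.38 | 3.28 | 5.15 | 4.62 | 5.07 | 5.10 | 4.85 | 5.26 |
| $E(n_{12}+n_{21})$ | 4.20 | 10.02 | 4.19 | 5.72 | 9.64 | 12.61 | 9.08 | 16.01 | 21.01 | 18.02 | 32.02 | 42.01 | 36.02 | 63.98 | 83.97 |
| $SD(n_{12}+n_{21})$ | 1.51 | 2.22 | 1.50 | 1.97 | 2.53 | 2.70 | 2.68 | 3.30 | 3.49 | 3.85 | 4.67 | 4.93 | 5.43 | 6.59 | 6.97 |

Scen. 1: $p_{1\cdot} = p_{\cdot 1} = 10\%$. Scen. 2: $p_{1\cdot} = p_{\cdot 1} = 50\%$. Scen. 3: $p_{1\cdot} = p_{\cdot 1} = 90\%$.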
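Table 3. Type I error rates (in %) for $\theta = 2$ and different scenarios.

| Method | n = 20, Scen. 1 | n = 20, Scen. 2 | n = 20, Scen. 3 | n = 30, Scen. 1 | n = 30, Scen. 2 | n = 30, Scen. 3 | n = 50, Scen. 1 | n = 50, Scen. 2 | n = 50, Scen. 3 | n = 100, Scen. 1 | n = 100, Scen. 2 | n = 100, Scen. 3 | n = 200, Scen. 1 | n = 200, Scen. 2 | n = 200, Scen. 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CET | 0.01 | 0.31 | 0.72 | 0.21 | 1.27 | 1.98 | 1.22 | 2.58 | 2.97 | 2.83 | 3.04 | 3.35 | 3.20 | 3.70 | 3.76 |
| MidpT | 0.10 | 1.02 | 2.09 | 0.73 | 2.86 | 3.90 | 2.76 | 4.44 | 4.66 | 4.57 | 4.53 | 4.73 | 4.63 | 4.91 | 4.87 |
| MT | 0.10 | 1.02 | 2.09 | 0.73 | 2.86 | 3.90 | 2.76 | 4.51 | 5.08 | 4.84 | 5.27 | 5.00 | 5.09 | 4.91 | 5.01 |
| UET | 0.10 | 1.02 | 2.09 | 0.73 | 2.86 | 3.90 | 2.76 | 4.44 | 4.66 | 3.98 | 4.44 | 4.35 | 4.62 | 4.75 | 4.77 |
| MTYcc | 0.01 | 0.31 | 0.72 | 0.21 | 1.27 | 1.94 | 1.22 | 2.29 | 2.53 | 2.46 | 3.03 | 3.29 | 3.17 | 3.65 | 3.63 |
| MTcc1 | 0.10 | 2.88 | 0.07 | 0.73 | 4.24 | 0.62 | 2.76 | 4.50 | 2.68 | 4.57 | 4.91 | 4.72 | 4.88 | 5.01 | 5.20 |
| MTcc2 | 0.10 | 2.88 | 0.07 | 0.73 | 4.25 | 0.62 | 2.76 | 5.08 | 2.68 | 4.82 | 4.91 | 4.95 | 4.90 | 5.01 | 5.23 |
| MTcc3 | 0.10 | 2.88 | 0.07 | 0.73 | 4.25 | 0.62 | 2.76 | 5.08 | 2.68 | 4.82 | 4.91 | 4.95 | 4.90 | 5.01 | 5.23 |
| MMT | 0.10 | 1.02 | 2.09 | 0.73 | 2.85 | 3.78 | 2.73 | 3.90 | 4.15 | 3.98 | 4.52 | 4.64 | 4.58 | 4.84 | 4.77 |
| WT | 0.53 | 3.01 | 4.83 | 2.19 | 5.77 | 6.55 | 5.59 | 6.02 | 5.48 | 5.38 | 5.31 | 5.27 | 5.25 | 5.22 | 5.04 |
| MWT | 0.11 | 1.23 | 2.78 | 0.73 | 3.10 | 4.61 | 2.76 | 4.51 | 5.08 | 4.59 | 5.16 | 5.00 | 4.88 | 4.91 | 5.01 |
| LRT | 0.53 | 3.01 | 4.83 | 2.19 | 5.77 | 6.51 | 5.59 | 6.02 | 5.47 | 5.57 | 5.31 | 5.24 | 5.25 | 5.08 | 5.03 |
| UMT | 0.01 | 0.09 | 0.25 | 0.01 | 0.27 | 0.65 | 0.18 | 0.72 | 1.07 | 0.65 | 0.96 | 1.15 | 0.71 | 0.95 | 1.12 |
| ULRT | 0.01 | 0.09 | 0.25 | 0.01 | 0.27 | 0.65 | 0.18 | 0.72 | 1.07 | 0.65 | 0.96 | 1.15 | 0.71 | 0.95 | 1.12 |
| NMT | 0.00 | 0.05 | 0.11 | 0.01 | 0.14 | 0.23 | 0.10 | 0.28 | 0.26 | 0.37 | 0.32 | 0.27 | 0.38 | 0.33 | 0.26 |
| NMTcc | 0.00 | 0.01 | 0.03 | 0.00 | 0.03 | 0.08 | 0.02 | 0.09 | 0.11 | 0.11 | 0.15 | 0.14 | 0.19 | 0.20 | 0.16 |
| HT | 0.10 | 1.02 | 2.09 | 0.73 | 2.86 | 3.90 | 2.76 | 4.44 | 4.66 | 4.82 | 4.90 | 4.94 | 5.09 | 4.91 | 5.01 |
| IT | 0.01 | 0.31 | 0.72 | 0.21 | 1.27 | 1.94 | 1.22 | 2.29 | 2.53 | 2.46 | 3.03 | 3.29 | 3.17 | 3.65 | 3.63 |
| RRT | 1.95 | 4.11 | 4.82 | 4.21 | 5.95 | 5.81 | 6.42 | 6.13 | 5.49 | 6.24 | 5.38 | 5.27 | 5.58 | 5.24 | 5.03 |
| ODT | 8.58 | 16.63 | 18.42 | 14.85 | 18.63 | 18.44 | 18.45 | 18.13 | 17.45 | 17.60 | 17.24 | 17.64 | 17.53 | 17.01 | 17.09 |
| ODMT | 0.53 | 2.98 | 4.37 | 2.19 | 5.49 | 5.40 | 5.55 | 5.81 | 5.41 | 6.08 | 5.32 | 5.25 | 5.48 | 5.14 | 5.03 |
| RRTcc | 0.10 | 2.18 | 0.01 | 0.73 | 3.77 | 0.19 | 2.88 | 4.23 | 2.66 | 5.08 | 4.59 | 4.09 | 5.11 | 4.67 | 4.88 |
| ODTcc | 2.47 | 12.33 | 3.00 | 7.83 | 15.55 | 8.06 | 15.52 | 16.26 | 15.48 | 16.38 | 16.45 | 16.52 | 16.61 | 15.57 | 16.95 |
| ODMTcc | 0.10 | 2.19 | 0.07 | 0.73 | 3.88 | 0.62 | 2.88 | 4.23 | 2.78 | 4.86 | 4.59 | 4.99 | 5.06 | 4.67 | 5.37 |
| $E(n_{12}+n_{21})$ | 3.99 | 5.86 | 7.25 | 5.36 | 8.49 | 10.72 | 8.41 | 14.04 | 17.83 | 16.63 | 28.08 | 35.66 | 33.26 | 56.13 | 71.25 |
| $SD(n_{12}+n_{21})$ | 1.44 | 1.89 | 2.08 | 1.88 | 2.42 | 2.61 | 2.58 | 3.17 | 3.38 | 3.73 | 4.49 | 4.79 | 5.27 | 6.35 | 6.80 |

Scen. 1: $p_{1\cdot} = p_{\cdot 1} = 10\%$. Scen. 2: $p_{1\cdot} = p_{\cdot 1} = 50\%$. Scen. 3: $p_{1\cdot} = p_{\cdot 1} = 90\%$.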
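Table 4. Type I error rates (in %) for $\theta = 5$ and different scenarios.

| Method | n = 20, Scen. 1 | n = 20, Scen. 2 | n = 20, Scen. 3 | n = 30, Scen. 1 | n = 30, Scen. 2 | n = 30, Scen. 3 | n = 50, Scen. 1 | n = 50, Scen. 2 | n = 50, Scen. 3 | n = 100, Scen. 1 | n = 100, Scen. 2 | n = 100, Scen. 3 | n = 200, Scen. 1 | n = 200, Scen. 2 | n = 200, Scen. 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CET | 0.01 | 0.08 | 0.26 | 0.09 | 0.61 | 1.18 | 0.74 | 2.04 | 2.55 | 2.58 | 2.99 | 3.00 | 3.05 | 3.57 | 3.69 |
| MidpT | 0.04 | 0.43 | 0.88 | 0.38 | 1.72 | 2.78 | 2.02 | 3.93 | 4.44 | 4.48 | 4.46 | 4.45 | 4.54 | 4.94 | 4.91 |
| MT | 0.04 | 0.43 | 0.88 | 0.38 | 1.72 | 2.78 | 2.02 | 3.93 | 4.51 | 4.56 | 5.28 | 5.28 | 5.26 | 4.96 | 4.92 |
| UET | 0.04 | 0.43 | 0.88 | 0.38 | 1.72 | 2.78 | 2.02 | 3.93 | 4.44 | 3.99 | 4.29 | 4.38 | 4.54 | 4.82 | 4.71 |
| MTYcc | 0.01 | 0.08 | 0.26 | 0.09 | 0.61 | 1.17 | 0.74 | 1.98 | 2.28 | 2.29 | 2.81 | 2.98 | 3.03 | 3.40 | 3.63 |
| MTcc1 | 0.04 | 1.40 | 0.03 | 0.38 | 3.27 | 0.35 | 2.02 | 4.58 | 1.86 | 4.48 | 4.89 | 4.35 | 4.77 | 4.95 | 4.94 |
| MTcc2 | 0.04 | 1.40 | 0.03 | 0.38 | 3.27 | 0.35 | 2.02 | 4.72 | 1.86 | 4.56 | 4.94 | 4.42 | 4.94 | 4.95 | 5.08 |
| MTcc3 | 0.04 | 1.40 | 0.03 | 0.38 | 3.27 | 0.35 | 2.02 | 4.72 | 1.86 | 4.56 | 4.94 | 4.42 | 4.94 | 4.95 | 5.08 |
| MMT | 0.04 | 0.43 | 0.88 | 0.38 | 1.72 | 2.77 | 2.01 | 3.73 | 3.93 | 3.99 | 4.29 | 4.43 | 4.52 | 4.73 | 4.83 |
| WT | 0.31 | 1.52 | 2.71 | 1.40 | 4.22 | 5.56 | 4.62 | 6.42 | 6.23 | 5.32 | 5.34 | 5.30 | 5.30 | 5.26 | 5.31 |
| MWT | 0.04 | 0.47 | 1.06 | 0.38 | 1.76 | 2.96 | 2.02 | 3.93 | 4.51 | 4.48 | 4.78 | 5.10 | 4.76 | 4.96 | 4.92 |
| LRT | 0.31 | 1.52 | 2.71 | 1.40 | 4.22 | 5.56 | 4.62 | 6.42 | 6.23 | 6.01 | 5.34 | 5.30 | 5.30 | 5.16 | 5.06 |
| UMT | 0.01 | 0.01 | 0.07 | 0.00 | 0.07 | 0.24 | 0.09 | 0.46 | 0.70 | 0.54 | 0.84 | 0.95 | 0.63 | 0.79 | 0.94 |
| ULRT | 0.01 | 0.01 | 0.07 | 0.00 | 0.07 | 0.24 | 0.09 | 0.46 | 0.70 | 0.54 | 0.84 | 0.95 | 0.63 | 0.79 | 0.94 |
| NMT | 0.00 | 0.00 | 0.02 | 0.00 | 0.04 | 0.04 | 0.03 | 0.07 | 0.06 | 0.15 | 0.10 | 0.07 | 0.15 | 0.08 | 0.06 |
| NMTcc | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.03 | 0.03 | 0.06 | 0.05 | 0.04 | 0.08 | 0.06 | 0.05 |
| HT | 0.04 | 0.43 | 0.88 | 0.38 | 1.72 | 2.78 | 2.02 | 3.93 | 4.44 | 4.56 | 5.03 | 4.88 | 5.26 | 4.96 | 4.92 |
| IT | 0.01 | 0.08 | 0.26 | 0.09 | 0.61 | 1.17 | 0.74 | 1.98 | 2.28 | 2.29 | 2.81 | 2.98 | 3.03 | 3.40 | 3.63 |
| RRT | 1.02 | 2.05 | 2.57 | 2.58 | 4.16 | 4.09 | 5.21 | 5.54 | 5.44 | 6.12 | 5.34 | 5.30 | 5.36 | 5.17 | 5.05 |
| ODT | 6.36 | 13.18 | 16.36 | 12.35 | 17.83 | 18.61 | 17.69 | 18.38 | 18.23 | 18.07 | 17.07 | 17.23 | 17.23 | 17.44 | 17.11 |
| ODMT | 0.31 | 1.46 | 2.17 | 1.39 | 3.57 | 3.57 | 4.49 | 5.14 | 5.35 | 6.00 | 5.32 | 5.30 | 5.33 | 5.08 | 5.00 |
| RRTcc | 0.04 | 0.83 | 0.00 | 0.38 | 2.71 | 0.09 | 2.04 | 3.93 | 1.86 | 4.72 | 4.50 | 3.80 | 4.96 | 4.73 | 4.59 |
| ODTcc | 1.51 | 9.05 | 1.34 | 5.39 | 16.40 | 5.47 | 13.45 | 16.20 | 13.71 | 16.14 | 16.96 | 15.91 | 16.82 | 16.25 | 17.20 |
| ODMTcc | 0.04 | 0.82 | 0.03 | 0.38 | 3.09 | 0.35 | 2.04 | 3.93 | 1.90 | 4.55 | 4.51 | 4.41 | 4.89 | 4.73 | 5.05 |
| $E(n_{12}+n_{21})$ | 3.61 | 4.86 | 5.71 | 4.72 | 6.83 | 8.25 | 7.21 | 11.12 | 13.62 | 14.07 | 22.19 | 27.22 | 28.12 | 44.36 | 54.41 |
| $SD(n_{12}+n_{21})$ | 1.31 | 1.69 | 1.86 | 1.72 | 2.18 | 2.38 | 2.37 | 2.92 | 3.14 | 3.48 | 4.16 | 4.45 | 4.92 | 5.87 | 6.31 |

Scen. 1: $p_{1\cdot} = p_{\cdot 1} = 10\%$. Scen. 2: $p_{1\cdot} = p_{\cdot 1} = 50\%$. Scen. 3: $p_{1\cdot} = p_{\cdot 1} = 90\%$.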
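Table 5. Type I error rates (in %) for $\theta = 10$ and different scenarios.

| Method | n = 20, Scen. 1 | n = 20, Scen. 2 | n = 20, Scen. 3 | n = 30, Scen. 1 | n = 30, Scen. 2 | n = 30, Scen. 3 | n = 50, Scen. 1 | n = 50, Scen. 2 | n = 50, Scen. 3 | n = 100, Scen. 1 | n = 100, Scen. 2 | n = 100, Scen. 3 | n = 200, Scen. 1 | n = 200, Scen. 2 | n = 200, Scen. 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CET | 0.01 | 0.02 | 0.07 | 0.04 | 0.29 | 0.52 | 0.44 | 1.44 | 1.98 | 2.17 | 2.87 | 2.99 | 2.96 | 3.35 | 3.55 |
| MidpT | 0.01 | 0.15 | 0.35 | 0.19 | 0.90 | 1.56 | 1.26 | 3.00 | 3.85 | 4.05 | 4.68 | 4.56 | 4.50 | 4.68 | 4.88 |
| MT | 0.01 | 0.15 | 0.35 | 0.19 | 0.90 | 1.56 | 1.26 | 3.00 | 3.85 | 4.07 | 5.10 | 5.33 | 5.31 | 4.98 | 4.91 |
| UET | 0.01 | 0.15 | 0.35 | 0.19 | 0.90 | 1.56 | 1.26 | 3.00 | 3.85 | 3.76 | 4.19 | 4.30 | 4.50 | 4.67 | 4.79 |
| MTYcc | 0.01 | 0.02 | 0.07 | 0.04 | 0.29 | 0.52 | 0.44 | 1.42 | 1.93 | 2.04 | 2.49 | 2.74 | 2.86 | 3.28 | 3.40 |
| MTcc1 | 0.01 | 0.55 | 0.01 | 0.19 | 2.11 | 0.13 | 1.26 | 4.17 | 1.13 | 4.05 | 4.57 | 3.98 | 4.61 | 4.96 | 4.69 |
| MTcc2 | 0.01 | 0.55 | 0.01 | 0.19 | 2.11 | 0.13 | 1.26 | 4.18 | 1.13 | 4.07 | 4.96 | 3.99 | 5.00 | 4.96 | 5.09 |
| MTcc3 | 0.01 | 0.55 | 0.01 | 0.19 | 2.11 | 0.13 | 1.26 | 4.18 | 1.13 | 4.07 | 4.96 | 3.99 | 5.00 | 4.96 | 5.09 |
| MMT | 0.01 | 0.15 | 0.35 | 0.19 | 0.90 | 1.56 | 1.26 | 2.97 | 3.68 | 3.76 | 4.19 | 4.30 | 4.37 | 4.60 | 4.69 |
| WT | 0.13 | 0.74 | 1.37 | 0.79 | 2.60 | 3.96 | 3.42 | 5.82 | 6.45 | 4.83 | 5.49 | 5.41 | 5.34 | 5.22 | 5.20 |
| MWT | 0.01 | 0.16 | 0.38 | 0.19 | 0.91 | 1.59 | 1.26 | 3.00 | 3.85 | 4.05 | 4.74 | 4.82 | 4.48 | 4.89 | 4.91 |
| LRT | 0.13 | 0.74 | 1.37 | 0.79 | 2.60 | 3.96 | 3.42 | 5.82 | 6.45 | 6.21 | 5.61 | 5.42 | 5.34 | 5.22 | 5.14 |
| UMT | 0.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.06 | 0.04 | 0.22 | 0.41 | 0.40 | 0.68 | 0.79 | 0.56 | 0.72 | 0.81 |
| ULRT | 0.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.06 | 0.04 | 0.22 | 0.41 | 0.40 | 0.68 | 0.79 | 0.56 | 0.72 | 0.81 |
| NMT | 0.00 | 0.00 | 0.02 | 0.00 | 0.01 | 0.01 | 0.02 | 0.03 | 0.02 | 0.06 | 0.04 | 0.03 | 0.06 | 0.03 | 0.01 |
| NMTcc | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.04 | 0.02 | 0.00 |
| HT | 0.01 | 0.15 | 0.35 | 0.19 | 0.90 | 1.56 | 1.26 | 3.00 | 3.85 | 4.07 | 5.04 | 5.11 | 5.31 | 4.98 | 4.91 |
| IT | 0.01 | 0.02 | 0.07 | 0.04 | 0.29 | 0.52 | 0.44 | 1.42 | 1.93 | 2.04 | 2.49 | 2.74 | 2.86 | 3.28 | 3.40 |
| RRT | 0.46 | 0.99 | 1.23 | 1.48 | 2.39 | 2.33 | 3.78 | 4.11 | 4.53 | 5.68 | 5.42 | 5.33 | 5.36 | 5.14 | 4.92 |
| ODT | 4.52 | 9.56 | 12.62 | 9.57 | 15.69 | 17.51 | 16.66 | 18.49 | 18.42 | 18.24 | 17.51 | 17.19 | 17.04 | 17.61 | 17.68 |
| ODMT | 0.13 | 0.68 | 0.95 | 0.78 | 1.97 | 1.98 | 3.11 | 3.74 | 4.42 | 5.58 | 5.33 | 5.33 | 5.34 | 5.04 | 4.91 |
| RRTcc | 0.01 | 0.30 | 0.00 | 0.19 | 1.52 | 0.02 | 1.27 | 3.86 | 1.13 | 4.13 | 4.38 | 3.71 | 4.64 | 4.59 | 4.47 |
| ODTcc | 0.82 | 5.75 | 0.78 | 3.41 | 3.98 | 3.35 | 10.83 | 15.99 | 10.70 | 16.01 | 17.28 | 15.86 | 16.95 | 16.53 | 17.47 |
| ODMTcc | 0.01 | 0.30 | 0.01 | 0.19 | 1.93 | 0.13 | 1.27 | 3.86 | 1.13 | 4.04 | 4.38 | 3.97 | 4.59 | 4.61 | 4.68 |
| $E(n_{12}+n_{21})$ | 3.30 | 4.16 | 4.73 | 4.19 | 5.66 | 6.62 | 6.17 | 8.97 | 10.74 | 11.80 | 17.79 | 21.41 | 23.55 | 35.57 | 42.79 |
| $SD(n_{12}+n_{21})$ | 1.18 | 1.50 | 1.66 | 1.55 | 1.95 | 2.14 | 2.15 | 2.66 | 2.88 | 3.21 | 3.82 | 4.10 | 4.65 | 5.41 | 5.82 |

Scen. 1: $p_{1\cdot} = p_{\cdot 1} = 10\%$. Scen. 2: $p_{1\cdot} = p_{\cdot 1} = 50\%$. Scen. 3: $p_{1\cdot} = p_{\cdot 1} = 90\%$.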
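Table 6. Powers (in %) for $p_{1\cdot} = 0.10$, $p_{\cdot 1} = 0.20$, and $\theta = 1$.

| Method | n = 20 | n = 30 | n = 50 | n = 100 | n = 200 |
|---|---|---|---|---|---|
| CET | 0.89 | 5.49 | 18.58 | 42.45 | 77.04 |
| MidpT | 2.63 | 10.38 | 25.74 | 48.94 | 80.68 |
| MT | 2.63 | 10.38 | 25.83 | 52.02 | 80.68 |
| UET | 2.63 | 10.38 | 25.74 | 48.71 | 80.15 |
| MTYcc | 0.89 | 5.49 | 17.59 | 42.27 | 76.62 |
| MTcc1 | 2.63 | 10.38 | 25.74 | 49.61 | 80.68 |
| MTcc2 | 2.63 | 10.38 | 25.83 | 50.47 | 80.68 |
| MTcc3 | 2.63 | 10.38 | 25.83 | 50.47 | 80.68 |
| MMT | 2.63 | 10.36 | 24.12 | 48.81 | 80.37 |
| WT | 6.85 | 17.79 | 30.93 | 52.08 | 81.42 |
| MWT | 3.04 | 10.86 | 25.83 | 51.16 | 80.68 |
| LRT | 6.85 | 17.78 | 30.93 | 52.08 | 80.92 |
| UMT | 0.22 | 1.40 | 8.22 | 26.72 | 57.77 |
| ULRT | 0.22 | 1.40 | 8.22 | 26.72 | 57.77 |
| NMT | 0.19 | 1.24 | 6.16 | 21.50 | 51.22 |
| NMTcc | 0.05 | 0.36 | 3.13 | 14.74 | 45.96 |
| HT | 2.63 | 10.38 | 25.74 | 50.47 | 80.68 |
| IT | 0.89 | 5.49 | 17.59 | 42.27 | 76.62 |
| RRT | 11.43 | 20.05 | 31.65 | 53.80 | 81.73 |
| ODMT | 6.85 | 17.77 | 31.38 | 52.89 | 81.72 |
| RRTcc | 2.63 | 10.66 | 26.81 | 51.66 | 80.78 |
| ODMTcc | 2.63 | 10.38 | 26.76 | 51.18 | 80.70 |
| $E(n_{12}+n_{21})$ | 5.57 | 7.99 | 13.06 | 26.01 | 51.92 |
| $SD(n_{12}+n_{21})$ | 1.84 | 2.34 | 3.08 | 4.39 | 6.19 |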
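Table 7. Powers (in %) for $p_{1\cdot} = 0.20$, $p_{\cdot 1} = 0.80$, and $\theta = 2$.

| Method | n = 20 | n = 30 | n = 50 | n = 100 | n = 200 |
|---|---|---|---|---|---|
| CET | 93.70 | 99.81 | 100 | 100 | 100 |
| MidpT | 96.72 | 99.92 | 100 | 100 | 100 |
| MT | 96.72 | 99.92 | 100 | 100 | 100 |
| UET | 96.66 | 99.92 | 100 | 100 | 100 |
| MTYcc | 93.67 | 99.74 | 100 | 100 | 100 |
| MTcc1 | 96.72 | 99.92 | 100 | 100 | 100 |
| MTcc2 | 96.72 | 99.92 | 100 | 100 | 100 |
| MTcc3 | 96.72 | 99.92 | 100 | 100 | 100 |
| MMT | 96.66 | 99.89 | 100 | 100 | 100 |
| WT | 98.46 | 99.95 | 100 | 100 | 100 |
| MWT | 98.15 | 99.94 | 100 | 100 | 100 |
| LRT | 98.40 | 99.94 | 100 | 100 | 100 |
| UMT | 88.88 | 99.48 | 100 | 100 | 100 |
| ULRT | 88.88 | 99.48 | 100 | 100 | 100 |
| NMT | 84.18 | 98.56 | 100 | 100 | 100 |
| NMTcc | 71.75 | 96.96 | 100 | 100 | 100 |
| HT | 96.72 | 99.92 | 100 | 100 | 100 |
| IT | 93.67 | 99.74 | 100 | 100 | 100 |
| RRT | 97.60 | 99.94 | 100 | 100 | 100 |
| ODMT | 97.94 | 99.94 | 100 | 100 | 100 |
| RRTcc | 96.20 | 99.89 | 100 | 100 | 100 |
| ODMTcc | 96.20 | 99.89 | 100 | 100 | 100 |
| $E(n_{12}+n_{21})$ | 13.26 | 19.72 | 32.64 | 65.00 | 129.90 |
| $SD(n_{12}+n_{21})$ | 2.09 | 2.57 | 3.35 | 4.76 | 6.74 |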
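Table 8. Powers (in %) for $p_{1\cdot} = 0.10$, $p_{\cdot 1} = 0.50$, and $\theta = 5$.

| Method | n = 20 | n = 30 | n = 50 | n = 100 | n = 200 |
|---|---|---|---|---|---|
| CET | 53.17 | 91.78 | 99.87 | 100 | 100 |
| MidpT | 68.95 | 95.66 | 99.92 | 100 | 100 |
| MT | 68.95 | 95.66 | 99.92 | 100 | 100 |
| UET | 68.95 | 95.66 | 99.92 | 100 | 100 |
| MTYcc | 53.17 | 91.73 | 99.83 | 100 | 100 |
| MTcc1 | 68.95 | 95.66 | 99.92 | 100 | 100 |
| MTcc2 | 68.95 | 95.66 | 99.92 | 100 | 100 |
| MTcc3 | 68.95 | 95.66 | 99.92 | 100 | 100 |
| MMT | 68.95 | 95.56 | 99.91 | 100 | 100 |
| WT | 82.00 | 97.80 | 99.94 | 100 | 100 |
| MWT | 72.05 | 96.13 | 99.92 | 100 | 100 |
| LRT | 82.00 | 97.79 | 99.94 | 100 | 100 |
| UMT | 36.37 | 80.43 | 99.50 | 100 | 100 |
| ULRT | 36.37 | 80.43 | 99.50 | 100 | 100 |
| NMT | 25.52 | 67.48 | 97.21 | 100 | 100 |
| NMTcc | 14.07 | 52.93 | 94.56 | 100 | 100 |
| HT | 68.95 | 95.66 | 99.92 | 100 | 100 |
| IT | 53.17 | 91.73 | 99.83 | 100 | 100 |
| RRT | 82.31 | 97.51 | 99.95 | 100 | 100 |
| ODMT | 80.95 | 97.22 | 99.94 | 100 | 100 |
| RRTcc | 68.91 | 95.65 | 99.92 | 100 | 100 |
| ODMTcc | 68.69 | 95.60 | 99.91 | 100 | 100 |
| $E(n_{12}+n_{21})$ | 9.22 | 13.54 | 22.20 | 43.89 | 87.49 |
| $SD(n_{12}+n_{21})$ | 2.15 | 2.66 | 3.48 | 4.94 | 7.02 |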
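Table 9. Powers (in %) for $p_{1\cdot} = 0.30$, $p_{\cdot 1} = 0.70$, and $\theta = 10$.

| Method | n = 20 | n = 30 | n = 50 | n = 100 | n = 200 |
|---|---|---|---|---|---|
| CET | 53.03 | 91.52 | 99.87 | 100 | 100 |
| MidpT | 69.13 | 95.38 | 99.93 | 100 | 100 |
| MT | 69.13 | 95.38 | 99.93 | 100 | 100 |
| UET | 69.13 | 95.38 | 99.93 | 100 | 100 |
| MTYcc | 53.03 | 91.44 | 99.83 | 100 | 100 |
| MTcc1 | 69.13 | 95.38 | 99.93 | 100 | 100 |
| MTcc2 | 69.13 | 95.38 | 99.93 | 100 | 100 |
| MTcc3 | 69.13 | 95.38 | 99.93 | 100 | 100 |
| MMT | 69.13 | 95.32 | 99.92 | 100 | 100 |
| WT | 81.78 | 97.60 | 99.96 | 100 | 100 |
| MWT | 72.10 | 95.93 | 99.93 | 100 | 100 |
| LRT | 81.78 | 97.60 | 99.95 | 100 | 100 |
| UMT | 36.28 | 80.49 | 99.39 | 100 | 100 |
| ULRT | 36.28 | 80.49 | 99.39 | 100 | 100 |
| NMT | 22.60 | 56.57 | 94.04 | 100 | 100 |
| NMTcc | 11.43 | 42.46 | 90.09 | 100 | 100 |
| HT | 69.13 | 95.38 | 99.93 | 100 | 100 |
| IT | 53.03 | 91.44 | 99.83 | 100 | 100 |
| RRT | 71.63 | 95.93 | 99.94 | 100 | 100 |
| ODMT | 72.24 | 95.93 | 99.93 | 100 | 100 |
| RRTcc | 60.79 | 94.49 | 99.92 | 100 | 100 |
| ODMTcc | 60.82 | 95.07 | 99.92 | 100 | 100 |
| $E(n_{12}+n_{21})$ | 9.21 | 13.56 | 22.23 | 43.95 | 87.58 |
| $SD(n_{12}+n_{21})$ | 2.16 | 2.67 | 3.47 | 4.96 | 6.96 |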
Table 10. Diagnosis of coronary artery disease: frequencies and results of comparisons of sensitivities and specificities.

Observed frequencies

| | Positive DE, Positive MPS | Positive DE, Negative MPS | Negative DE, Positive MPS | Negative DE, Negative MPS | Total |
|---|---|---|---|---|---|
| Positive CA | 152 | 17 | 7 | 36 | 212 |
| Negative CA | 25 | 10 | 11 | 290 | 336 |
| Total | 177 | 27 | 18 | 326 | 548 |

Comparison of sensitivities: $H_0: Se_1 = Se_2$ vs. $H_1: Se_1 \neq Se_2$

| | MT | WT | MWT | LRT | RRT | ODMT |
|---|---|---|---|---|---|---|
| $\chi^2$ | 4.167 | 4.250 | 4.077 | 4.296 | 4.169 | 4.191 |
| p-value | 0.041 | 0.039 | 0.0403 | 0.038 | 0.041 | 0.041 |

Comparison of specificities: $H_0: Sp_1 = Sp_2$ vs. $H_1: Sp_1 \neq Sp_2$

| | MT | WT | MWT | LRT | RRT | ODMT |
|---|---|---|---|---|---|---|
| $\chi^2$ | 0.048 | 0.048 | 0.045 | 0.048 | 0.048 | 0.048 |
| p-value | 0.827 | 0.827 | 0.831 | 0.827 | 0.827 | 0.827 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Roldán-Nofuentes, J.A.; Sheth, T.S.; Vera-Vera, J.F. Hypothesis Test to Compare Two Paired Binomial Proportions: Assessment of 24 Methods. Mathematics 2024, 12, 190. https://doi.org/10.3390/math12020190
