Article

Joint Statistical Inference for the Area under the ROC Curve and Youden Index under a Density Ratio Model

by Siyan Liu 1, Qinglong Tian 2, Yukun Liu 1,* and Pengfei Li 2
1 KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China
2 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 2118; https://doi.org/10.3390/math12132118
Submission received: 27 May 2024 / Revised: 28 June 2024 / Accepted: 3 July 2024 / Published: 5 July 2024
(This article belongs to the Special Issue Statistical Analysis and Data Science for Complex Data)

Abstract: The receiver operating characteristic (ROC) curve is a valuable statistical tool in medical research. It assesses a biomarker's ability to distinguish between diseased and healthy individuals. The area under the ROC curve ($AUC$) and the Youden index ($J$) are common summary indices used to evaluate a biomarker's diagnostic accuracy. Simultaneously examining $AUC$ and $J$ offers a more comprehensive understanding of the ROC curve's characteristics. In this paper, we utilize a semiparametric density ratio model to link the distributions of a biomarker for healthy and diseased individuals. Under this model, we establish the joint asymptotic normality of the maximum empirical likelihood estimator of $(AUC, J)$ and construct an asymptotically valid confidence region for $(AUC, J)$. Furthermore, we propose a new test to determine whether a biomarker simultaneously exceeds prespecified target values $AUC_0$ and $J_0$, with the null hypothesis $H_0: AUC \le AUC_0$ or $J \le J_0$ against the alternative hypothesis $H_a: AUC > AUC_0$ and $J > J_0$. Simulation studies and a real data example on Duchenne Muscular Dystrophy demonstrate the effectiveness of our proposed method and highlight its advantages over existing methods.

1. Introduction

The ROC curve is a valuable statistical tool in medical research for evaluating the performance of binary classifiers across different thresholds. It finds wide application in fields such as radiology, oncology, and genomics [1,2]. In medical studies, ROC curves are particularly useful when a continuous biomarker is used to classify individuals as diseased or healthy. Graphically, the ROC curve plots sensitivity (the true positive rate) against one minus specificity (the false positive rate) at all possible biomarker thresholds. Statistical inference for ROC curves has been studied extensively, offering valuable insight into how these curves are used to evaluate the performance of classification models; for detailed reviews, refer to [3,4,5,6].
Let $F_0$ denote the cumulative distribution function (CDF) of the healthy population and $F_1$ denote that of the diseased population. Without loss of generality, assume that biomarker values tend to be higher in the diseased group than in the healthy group, and that an individual is classified as diseased when their biomarker value exceeds a given threshold $x$. Under this assumption, the sensitivity is $1 - F_1(x)$ and the specificity is $F_0(x)$. The ROC curve is then given by
$$ROC(s) = 1 - F_1\{F_0^{-1}(1-s)\}$$
for $s \in (0,1)$, where $F_0^{-1}(1-s) = \inf\{x \in \mathbb{R} : F_0(x) \ge 1-s\}$.
In ROC analysis, two summary indices are commonly used to assess a biomarker's diagnostic accuracy: the $AUC$ [7,8] and the Youden index ($J$) [9,10,11]. They are defined mathematically as
$$AUC = \int_0^1 ROC(s)\,ds \quad \text{and} \quad J = \max_x\{1 - F_1(x) + F_0(x) - 1\} = \max_x\{F_0(x) - F_1(x)\}, \tag{1}$$
where $c = \arg\max_x\{F_0(x) - F_1(x)\}$ is the "optimal" threshold. By definition, the $AUC$ summarizes the overall performance of a classifier across all possible thresholds. While valuable, it does not directly provide an "optimal" threshold. On the other hand, $J$ is the maximum value of the sensitivity plus the specificity minus 1. Not only does $J$ quantify the biomarker's effectiveness (with $J = 1$ indicating complete separation of the biomarker's distributions for the diseased and healthy populations, and $J = 0$ indicating complete overlap), but it also offers a distinct advantage over the $AUC$ by providing a criterion for selecting the "optimal" threshold $c$. However, $J$ measures the diagnostic accuracy only at the "optimal" threshold $c$ and not at other thresholds.
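The empirical (fully nonparametric) counterparts of these indices can be computed directly from two samples. The following Python sketch is our own illustration (`empirical_auc_youden` is a hypothetical helper, not from the paper); it uses the Mann–Whitney form of the $AUC$ and scans the pooled observations as candidate thresholds for $J$:

```python
import numpy as np

def empirical_auc_youden(x, y):
    """Nonparametric estimates of AUC and the Youden index J.

    x: biomarker values for the healthy group (CDF F0)
    y: biomarker values for the diseased group (CDF F1)
    AUC = P(Y > X) + 0.5 P(Y = X); J = max_c {F0(c) - F1(c)}.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    # AUC via the Mann-Whitney statistic, with a 0.5 correction for ties
    auc = (y[:, None] > x[None, :]).mean() + 0.5 * (y[:, None] == x[None, :]).mean()
    # Youden index: evaluate F0 - F1 at every observed value
    thresholds = np.unique(np.concatenate([x, y]))
    F0 = (x[None, :] <= thresholds[:, None]).mean(axis=1)
    F1 = (y[None, :] <= thresholds[:, None]).mean(axis=1)
    j_idx = np.argmax(F0 - F1)
    return auc, (F0 - F1)[j_idx], thresholds[j_idx]
```

With fully separated samples, both indices attain 1, reflecting complete separation of the two distributions.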
In practical scenarios where medical practitioners encounter multiple biomarkers, they often use the $AUC$ to choose the most diagnostically useful biomarker [12,13]. However, relying solely on the $AUC$ has limitations: the biomarker with the highest $AUC$ might not have the best overall accuracy at the "optimal" threshold. Similarly, focusing only on the Youden index selects the biomarker with the highest total accuracy at the "optimal" threshold, but this "best" biomarker may not perform well overall; if the threshold changes, it may no longer maintain satisfactory diagnostic accuracy. For real examples and further discussion, refer to [1,14]. In summary, both the $AUC$ and the Youden index are valuable tools for evaluating a biomarker's effectiveness, each emphasizing a distinct aspect of its performance. Simultaneously examining $AUC$ and $J$, which provide complementary information, may help us make better decisions [1]. This motivates us to develop joint inference procedures for $AUC$ and $J$ in this paper.
In the literature, ref. [1] considered both parametric and nonparametric methods for constructing confidence regions for $AUC$ and $J$. Later, ref. [2] proposed both parametric and nonparametric tests to determine whether a biomarker exceeds predefined target values, with hypotheses $H_0: AUC \le AUC_0$ or $J \le J_0$ versus $H_a: AUC > AUC_0$ and $J > J_0$. For the parametric inference procedures, it is assumed that the original biomarkers, or the biomarkers after the Box–Cox transformation, follow normal distributions in both the healthy and diseased groups. For the nonparametric inference procedures, the empirical CDF or a kernel method is used to estimate $F_0$ and $F_1$.
Generally, parametric joint inference procedures are highly efficient when the underlying parametric models are correct: the resulting confidence region for $(AUC, J)$ has a smaller area, and the joint testing procedure has greater power. However, these procedures may not be robust to misspecification of the models for $F_0$ and $F_1$; see Section 4 for more details. On the other hand, nonparametric methods are free from assumptions about the models of $F_0$ and $F_1$. In medical research, it has been observed that healthy and diseased populations often share certain common characteristics [4,15,16,17]. Fully nonparametric methods ignore this information, potentially leading to inefficient inference procedures.
In this paper, we develop new semiparametric joint inference procedures for $(AUC, J)$ based on a semiparametric density ratio model (DRM; refs. [18,19,20]), which effectively utilizes information from both the healthy and diseased populations. Let $dF_0$ and $dF_1$ be the probability density functions of $F_0$ and $F_1$, respectively. The DRM assumes that
$$dF_1(x) = \exp\{\alpha + \beta^T q(x)\}\,dF_0(x) = \exp\{\theta^T Q(x)\}\,dF_0(x), \tag{2}$$
where $q(x)$ is a prespecified, $p$-variate, vector-valued, nontrivial function of $x$, $Q(x) = (1, q(x)^T)^T$, and $\theta = (\alpha, \beta^T)^T$ is a vector of unknown parameters. The unspecified baseline distribution $F_0$ makes the DRM a semiparametric model. This flexibility allows the DRM to encompass many distributions commonly used in studying ROC curves [21]. For instance, if we set $q(x) = \log(x)$, the DRM encompasses the lognormal distributions (with equal variance on the log scale) and the beta distributions (sharing the same power parameter for $(1-x)$). Similarly, setting $q(x) = x$, it includes the normal distributions with the same variance and the exponential distributions. The DRM has a close relationship with the logistic regression model. To illustrate this point, let us define $D = 0$ and $1$ as indicators for individuals from the healthy and diseased populations, respectively. As shown by [18,19], the DRM is equivalent to the logistic regression model through the following equation:
$$P(D = 1 \mid x) = \frac{\exp\{\alpha^* + \beta^T q(x)\}}{1 + \exp\{\alpha^* + \beta^T q(x)\}},$$
where $\alpha^* = \alpha + \log\{P(D=1)/P(D=0)\}$.
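As a quick numerical check of the claim that the DRM with $q(x) = x$ covers normal distributions with a common variance: the log density ratio of $N(\mu_1, \sigma^2)$ to $N(\mu_0, \sigma^2)$ is linear in $x$, with slope $\beta = (\mu_1 - \mu_0)/\sigma^2$ and intercept $\alpha = (\mu_0^2 - \mu_1^2)/(2\sigma^2)$. A short Python verification (our own illustration, not from the paper):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Example pair of distributions: F0 = N(0, 1), F1 = N(1, 1).
mu0, mu1, sigma = 0.0, 1.0, 1.0
beta = (mu1 - mu0) / sigma ** 2                     # slope of the log density ratio
alpha = (mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2)    # intercept of the log density ratio
for x in [-2.0, -0.3, 0.0, 1.7]:
    ratio = normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma)
    # dF1/dF0 = exp(alpha + beta * x), i.e., the DRM holds with q(x) = x
    assert abs(ratio - math.exp(alpha + beta * x)) < 1e-12
```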
The DRM has proven to be a valuable tool for inference on ROC curves and their summary indices [4,15,17,22]. Existing theoretical and numerical studies have shown that point estimators of $AUC$ and $J$ under the DRM are more efficient than fully nonparametric estimators. However, as far as we are aware, semiparametric joint inference procedures for $(AUC, J)$, such as confidence regions and joint hypothesis testing procedures, remain uninvestigated under the DRM (2). This paper aims to fill this gap.
Our contributions are three-fold. First, we establish the joint asymptotic normality of the maximum empirical likelihood estimator (MELE) of $(AUC, J)$ under the DRM (2). This allows us to construct an asymptotically valid Wald-type confidence region for $(AUC, J)$. We further propose a nonparametric bootstrap procedure to improve the coverage accuracy of the Wald-type confidence region. Second, we develop a joint testing procedure for the null hypothesis $H_0: AUC \le AUC_0$ or $J \le J_0$ versus the alternative hypothesis $H_a: AUC > AUC_0$ and $J > J_0$. We introduce a novel bootstrap procedure to obtain its $p$-value. Finally, we evaluate the performance of the proposed methods through simulation studies and an application to real data on Duchenne Muscular Dystrophy. The numerical studies demonstrate that the proposed method produces more precise confidence regions for $(AUC, J)$ with smaller areas. Additionally, the newly proposed joint testing procedure maintains controlled type-I error rates while achieving satisfactory power.
The rest of the paper is structured as follows. In Section 2, we introduce the MELE of $(AUC, J)$ and prove its joint asymptotic normality. Section 3 details the proposed joint inference procedures, including constructing confidence regions and conducting joint hypothesis tests for $(AUC, J)$. Section 4 presents simulation results, and Section 5 contains a real data application. A summary and a discussion are given in Section 6 and Section 7, respectively.

2. Methodology

Let $x_1, \ldots, x_{n_0}$ and $y_1, \ldots, y_{n_1}$ denote independent random samples from the healthy and diseased populations, respectively. We define a combined sample of size $n = n_0 + n_1$ by setting $z_i = x_i$ for $1 \le i \le n_0$ and $z_{n_0+i} = y_i$ for $1 \le i \le n_1$.

2.1. Maximum Empirical Likelihood Estimators of AUC and J

We begin by developing the empirical likelihood (EL) function. By the EL principle [23] and under the DRM (2), the likelihood function based on the observed data is
$$L_n = \prod_{i=1}^{n_0} dF_0(x_i) \prod_{i=1}^{n_1} dF_1(y_i) = \left\{\prod_{i=1}^{n} dF_0(z_i)\right\} \prod_{i=1}^{n_1} \exp\{\theta^T Q(y_i)\} = \prod_{i=1}^{n} p_i \cdot \prod_{i=1}^{n_1} \exp\{\theta^T Q(y_i)\},$$
where $p_i = dF_0(z_i)$ for $i = 1, \ldots, n$, and the $p_i$ satisfy
$$p_i \ge 0, \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i \exp\{\theta^T Q(z_i)\} = 1. \tag{3}$$
The MELEs of $(\theta, p_1, \ldots, p_n)$, denoted $(\hat\theta, \hat p_1, \ldots, \hat p_n)$, are defined as the maximizer of $L_n$ subject to the constraints in (3).
Let $\rho = n_1/n$. Following [19,24], we obtain the MELE of $\theta$ as
$$\hat\theta = \arg\max_{\theta} \ell_n(\theta),$$
where
$$\ell_n(\theta) = \sum_{i=1}^{n_1} \theta^T Q(y_i) - \sum_{i=1}^{n} \log\left[1 - \rho + \rho\exp\{\theta^T Q(z_i)\}\right]$$
is the dual empirical log-likelihood function.
Once we have $\hat\theta$, we calculate the MELEs of the $p_i$ as
$$\hat p_i = \frac{1}{n}\cdot\frac{1}{1 - \rho + \rho\exp\{\hat\theta^T Q(z_i)\}}, \quad i = 1, \ldots, n.$$
Subsequently, the MELEs of $F_0$ and $F_1$ are given by
$$\hat F_0(x) = \sum_{i=1}^{n} \hat p_i\,\mathbb{1}(z_i \le x) \quad \text{and} \quad \hat F_1(x) = \sum_{i=1}^{n} \hat p_i\exp\{\hat\theta^T Q(z_i)\}\,\mathbb{1}(z_i \le x),$$
where $\mathbb{1}(\cdot)$ is the indicator function.
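Computing $\hat\theta$ and the weights $\hat p_i$ amounts to maximizing the smooth, concave dual log-likelihood $\ell_n(\theta)$. The following Newton-type Python sketch is our own illustration for the special case $q(x) = x$ (the paper's actual implementation is the R code in its Supplementary Materials):

```python
import numpy as np

def drm_mele(x, y, q=lambda t: t, iters=50):
    """MELE of theta = (alpha, beta) and baseline weights p_i under the DRM,
    by Newton's method on the dual empirical log-likelihood (sketch; q(x) = x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = np.concatenate([x, y])
    n0, n1 = len(x), len(y)
    n = n0 + n1
    rho = n1 / n
    Qz = np.column_stack([np.ones(n), q(z)])   # Q(z_i) = (1, q(z_i))^T as rows
    Qy_sum = np.array([n1, q(y).sum()])        # sum of Q(y_i) over the diseased sample
    theta = np.zeros(2)
    for _ in range(iters):
        e = np.exp(Qz @ theta)
        s = rho * e / (1 - rho + rho * e)      # h_1-type weights, each in (0, 1)
        grad = Qy_sum - Qz.T @ s               # gradient of the dual log-likelihood
        hess = -(Qz * (s * (1 - s))[:, None]).T @ Qz   # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        theta = theta - step                   # Newton ascent step
        if np.abs(step).max() < 1e-10:
            break
    p = 1.0 / (n * (1 - rho + rho * np.exp(Qz @ theta)))
    return theta, p, z
```

At the maximizer, the fitted weights automatically satisfy the constraints in (3): $\sum_i \hat p_i = 1$ and $\sum_i \hat p_i \exp\{\hat\theta^T Q(z_i)\} = 1$.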
Recall the definition of the $AUC$ in (1). It can be verified that
$$AUC = \int F_0(x)\,dF_1(x) = \int \{1 - F_1(x)\}\,dF_0(x).$$
The MELE of the $AUC$ is then given by
$$\widehat{AUC} = \frac{1}{2}\int \hat F_0(x)\,d\hat F_1(x) + \frac{1}{2}\int \{1 - \hat F_1(x)\}\,d\hat F_0(x) = \frac{1}{2}\sum_{i=1}^{n}\hat p_i\exp\{\hat\theta^T Q(z_i)\}\hat F_0(z_i) + \frac{1}{2}\sum_{i=1}^{n}\hat p_i\{1 - \hat F_1(z_i)\}.$$
Again, recall the definition of the Youden index $J$ in (1). The optimal threshold $c$ satisfies $dF_1(c) = dF_0(c)$ [17], which is equivalent to
$$\theta^T Q(c) = 0.$$
With $\hat\theta$, the MELE $\hat c$ of $c$ solves
$$\hat\theta^T Q(\hat c) = 0.$$
The MELE of $J$ is then defined as
$$\hat J = \hat F_0(\hat c) - \hat F_1(\hat c).$$
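Given fitted values of $\theta$ and the baseline weights from an upstream fit, the plug-in estimators above are direct sums. A sketch for $q(x) = x$, where $\hat c = -\hat\alpha/\hat\beta$ solves $\hat\theta^T Q(\hat c) = 0$ (the function name is ours; we assume $\hat\beta \neq 0$):

```python
import numpy as np

def auc_j_from_mele(theta, p, z):
    """Plug-in MELEs of (AUC, J) given fitted theta = (alpha, beta),
    baseline weights p, and the pooled sample z (sketch with q(x) = x)."""
    z = np.asarray(z, float)
    alpha, beta = theta
    omega = np.exp(alpha + beta * z)                 # fitted dF1/dF0 at each z_i
    F0 = np.array([(p * (z <= t)).sum() for t in z])            # F0-hat at z_i
    F1 = np.array([(p * omega * (z <= t)).sum() for t in z])    # F1-hat at z_i
    auc = 0.5 * (p * omega * F0).sum() + 0.5 * (p * (1 - F1)).sum()
    c = -alpha / beta                                # solves alpha + beta * c = 0
    youden = (p * (z <= c)).sum() - (p * omega * (z <= c)).sum()
    return auc, youden, c
```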

2.2. Joint Asymptotic Normality of $(\widehat{AUC}, \hat J)$

In this section, we establish the joint asymptotic normality of $(\widehat{AUC}, \hat J)$. We begin by introducing some notation. Let $\theta^*$ denote the true value of $\theta$,
$$\omega(x) = \exp\{(\theta^*)^T Q(x)\}, \quad h(x) = 1 - \rho + \rho\,\omega(x), \quad h_1(x) = \frac{\rho\,\omega(x)}{h(x)}, \quad h_0(x) = 1 - h_1(x),$$
and
$$U_n = \begin{pmatrix} -n^{-1}\sum_{i=1}^{n} h_1(z_i)Q(z_i) + n^{-1}\sum_{i=n_0+1}^{n} Q(z_i) \\ \{n(1-\rho)\}^{-1}\sum_{i=1}^{n} h_0(z_i)F_1(z_i) - (1 - AUC^*) \\ (n\rho)^{-1}\sum_{i=1}^{n} h_1(z_i)F_0(z_i) - AUC^* \\ \{n(1-\rho)\}^{-1}\sum_{i=1}^{n} h_0(z_i)\mathbb{1}(z_i \le c^*) - F_0(c^*) \\ (n\rho)^{-1}\sum_{i=1}^{n} h_1(z_i)\mathbb{1}(z_i \le c^*) - F_1(c^*) \end{pmatrix}.$$
In Lemma A2 of Appendix A, we show that
$$E(U_n) = 0 \quad \text{and} \quad \mathrm{Var}(U_n) = n^{-1}V,$$
where the detailed form of $V$ is given in Lemma A2.
We denote the true values of the $AUC$, the Youden index $J$, and the optimal threshold $c$ by $AUC^*$, $J^*$, and $c^*$, respectively. The asymptotic results in this section rely on the following regularity conditions.
C1.
$J_\epsilon = \sup_{|x - c^*| > \epsilon}\{F_0(x) - F_1(x)\} < J^*$ for any $\epsilon > 0$.
C2.
The derivatives $F_0'(x)$ and $F_1'(x)$ are continuous in a neighborhood of $c^*$, with $F_0'(c^*) - F_1'(c^*) = 0$ and $F_0''(c^*) - F_1''(c^*) < 0$.
C3.
The total sample size $n = n_0 + n_1 \to \infty$, and $\rho = n_1/n$ remains constant.
C4.
The DRM (2) is satisfied by $F_0$ and $F_1$. Additionally, $\int Q(x)Q(x)^T\,dF_0(x)$ is positive definite, and for $\theta$ in a neighborhood of $\theta^*$,
$$\int \exp\{\theta^T Q(x)\}\,dF_0(x) < \infty.$$
We note that Conditions C1 and C2 are from [25]; these conditions ensure the identifiability of $c^*$. Condition C4 ensures that the components of $q(x)$ are linearly independent under both $F_0(x)$ and $F_1(x)$. Conditions C3 and C4 guarantee the asymptotic normality of $\hat\theta$.
The following theorem establishes the joint asymptotic normality of $(\widehat{AUC}, \hat J)$. The proof is provided in Appendix A.
Theorem 1. 
Suppose Conditions C1–C4 are satisfied. As the total sample size $n \to \infty$, we have
$$\sqrt{n}\begin{pmatrix} \widehat{AUC} - AUC^* \\ \hat J - J^* \end{pmatrix} \to N(0, \Sigma)$$
in distribution, where $\Sigma = HVH^T$, with
$$H = \begin{pmatrix} B^T A^{-1} & -1 & 1 & 0 & 0 \\ C^T A^{-1} & 0 & 0 & 1 & -1 \end{pmatrix},$$
where
$$A = \int h_1(x)Q(x)Q(x)^T\,dF_0(x), \quad B^T = \int h_1(x)\left\{\frac{1}{\rho}F_0(x) + \frac{1}{1-\rho}F_1(x)\right\}Q(x)^T\,dF_0(x), \quad C^T = \frac{1}{\rho(1-\rho)}\int_{-\infty}^{c^*} h_1(x)Q(x)^T\,dF_0(x).$$
Remark 1. 
Our method relies on the DRM assumption in (2). To assess the validity of this model assumption in practice, we can use the goodness-of-fit test statistics proposed by [19]:
$$\Delta_{n0} = \sup_x\left|\hat F_0(x) - \tilde F_0(x)\right| \quad \text{or} \quad \Delta_{n1} = \sup_x\left|\hat F_1(x) - \tilde F_1(x)\right|,$$
and apply the bootstrap method to perform the test. Here,
$$\tilde F_0(x) = \frac{1}{n_0}\sum_{i=1}^{n_0}\mathbb{1}(x_i \le x) \quad \text{and} \quad \tilde F_1(x) = \frac{1}{n_1}\sum_{i=1}^{n_1}\mathbb{1}(y_i \le x).$$
It can be shown that $\Delta_{n0} = \rho\,\Delta_{n1}/(1-\rho)$. Therefore, tests based on $\Delta_{n0}$ and $\Delta_{n1}$ are equivalent; consequently, we only need to consider one statistic, for example $\Delta_{n0}$, in practical applications.
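The statistic $\Delta_{n0}$ compares the DRM-based estimator $\hat F_0$ (weights on the pooled sample) with the empirical CDF $\tilde F_0$ of the healthy sample. A small sketch of its computation (our own illustration; the bootstrap calibration of the test is omitted):

```python
import numpy as np

def gof_statistic(p_hat, z, x):
    """Kolmogorov-type distance between the DRM-based baseline CDF estimate
    (weights p_hat on the pooled sample z) and the empirical CDF of x (sketch)."""
    z, x = np.asarray(z, float), np.asarray(x, float)
    grid = np.sort(np.concatenate([z, x]))           # jump points of both CDFs
    F_drm = np.array([(p_hat * (z <= t)).sum() for t in grid])
    F_emp = np.array([(x <= t).mean() for t in grid])
    return np.abs(F_drm - F_emp).max()
```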

3. Joint Inference Procedures for $(AUC, J)$ under the DRM

3.1. Confidence Region of $(AUC, J)$

The variance–covariance matrix $\Sigma$ in Theorem 1 depends on $\theta^*$, $AUC^*$, $J^*$, $c^*$, $F_0$, and $F_1$. Replacing these by their MELEs $\hat\theta$, $\widehat{AUC}$, $\hat J$, $\hat c$, $\hat F_0$, and $\hat F_1$ leads to the variance estimator
$$\hat\Sigma = \begin{pmatrix} \hat\sigma_{11} & \hat\sigma_{12} \\ \hat\sigma_{21} & \hat\sigma_{22} \end{pmatrix}.$$
It can easily be shown that $\hat\Sigma$ is a consistent estimator of $\Sigma$.
Theorem 2. 
Under the conditions of Theorem 1, $\hat\Sigma \to \Sigma$ in probability as $n \to \infty$.
For notational convenience, let $\eta = (AUC, J)^T$, $\eta^* = (AUC^*, J^*)^T$, and $\hat\eta = (\widehat{AUC}, \hat J)^T$. Building upon the asymptotic results in Theorems 1 and 2, we conclude that
$$T_n(\eta^*) = n(\hat\eta - \eta^*)^T\hat\Sigma^{-1}(\hat\eta - \eta^*) \to \chi_2^2$$
in distribution as $n \to \infty$. Therefore, a $100(1-a)\%$ asymptotic Wald-type confidence region for $\eta$ is
$$\left\{\eta : T_n(\eta) \le \chi^2_{2,1-a}\right\}, \tag{6}$$
where $\chi^2_{2,1-a}$ is the $100(1-a)\%$ quantile of the chi-square distribution with two degrees of freedom. Our simulation results in Section 4 demonstrate that this approach yields liberal confidence regions when sample sizes are not sufficiently large. To enhance coverage accuracy, we propose a bootstrap method. Throughout the subsequent discussion, we denote quantities derived from the $l$-th bootstrap sample with the subscript "$B,l$".
Step 1.
Calculate $\hat\eta$ and the corresponding $\hat\Sigma$ based on the observed data $\{x_i\}_{i=1}^{n_0}$ and $\{y_i\}_{i=1}^{n_1}$.
Step 2.
For $l = 1, \ldots, L$, draw a bootstrap sample of size $n_1$ with replacement from $\{y_i\}_{i=1}^{n_1}$ and another bootstrap sample of size $n_0$ with replacement from $\{x_i\}_{i=1}^{n_0}$.
Step 3.
For $l = 1, \ldots, L$, based on the $l$-th bootstrap two-sample data in Step 2, calculate the estimate $\hat\eta_{B,l}$ of $\eta$ and the corresponding $\hat\Sigma_{B,l}$ of $\Sigma$, and compute
$$T_{n,B,l} = n(\hat\eta_{B,l} - \hat\eta)^T\hat\Sigma_{B,l}^{-1}(\hat\eta_{B,l} - \hat\eta).$$
Step 4.
Obtain the $100(1-a)\%$ quantile of $\{T_{n,B,l}\}_{l=1}^{L}$, denoted $q_{1-a,T}$.
Step 5.
The $100(1-a)\%$ bootstrap confidence region for $\eta$ is given by
$$\left\{\eta : T_n(\eta) \le q_{1-a,T}\right\}. \tag{7}$$
In our simulation study, we set $L = 500$. The resulting bootstrap confidence region for $\eta$ offers improved coverage accuracy, as will be discussed in Section 4.
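Steps 1–5 above can be sketched generically: given any routine that returns $(\hat\eta, \hat\Sigma)$, the bootstrap quantile $q_{1-a,T}$ calibrates the Wald statistic. The Python illustration below is ours, with a hypothetical `estimator(x, y)` interface; in the paper's setting the estimator would be the MELE of Section 2:

```python
import numpy as np

def bootstrap_wald_region(x, y, estimator, L=500, a=0.05, rng=None):
    """Bootstrap calibration of the Wald statistic T_n (sketch of Steps 1-5).
    `estimator(x, y)` must return (eta_hat, Sigma_hat); this interface is assumed."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n0, n1 = len(x), len(y)
    n = n0 + n1
    eta_hat, Sigma_hat = estimator(x, y)             # Step 1
    T = np.empty(L)
    for l in range(L):                               # Steps 2-3
        xb = rng.choice(x, size=n0, replace=True)
        yb = rng.choice(y, size=n1, replace=True)
        eb, Sb = estimator(xb, yb)
        d = eb - eta_hat
        T[l] = n * d @ np.linalg.solve(Sb, d)
    q = np.quantile(T, 1 - a)                        # Step 4
    # Step 5: the region is {eta : n (eta_hat - eta)' Sigma_hat^{-1} (eta_hat - eta) <= q}
    return eta_hat, Sigma_hat, q
```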

3.2. Joint Hypothesis Testing on $(AUC, J)$

In this section, we examine a testing procedure to determine whether a biomarker simultaneously exceeds prespecified target values $AUC_0$ and $J_0$. We define the hypotheses as follows:
$$H_0: AUC \le AUC_0 \ \text{or}\ J \le J_0 \quad \text{versus} \quad H_a: AUC > AUC_0 \ \text{and}\ J > J_0. \tag{8}$$
It is important to note that the null hypothesis represents a multivariate order-restrictive hypothesis within a non-convex space.
Our testing procedure is motivated by the results in Proposition 1. The proof is provided in Appendix B.
Proposition 1. 
Suppose that $(X_1, X_2)^T$ is a bivariate normal random vector with unknown mean $\mu = (\mu_1, \mu_2)^T$ and known variance–covariance matrix $\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$. Let
$$\Theta_0 = \{\mu \mid \mu_1 \le \mu_{10} \ \text{or}\ \mu_2 \le \mu_{20}\}.$$
(a) 
The maximum likelihood estimator of $\mu$ over $\Theta_0$ based on $(X_1, X_2)^T$ is given by
$$\tilde\mu^T = \begin{cases} (X_1, X_2), & \text{if } X_1 \le \mu_{10} \text{ or } X_2 \le \mu_{20}, \\[4pt] \left(X_1 - \dfrac{\rho\sigma_1(X_2 - \mu_{20})}{\sigma_2},\ \mu_{20}\right), & \text{if } X_1 > \mu_{10},\ X_2 > \mu_{20}, \text{ and } \dfrac{X_1 - \mu_{10}}{\sigma_1} \ge \dfrac{X_2 - \mu_{20}}{\sigma_2}, \\[4pt] \left(\mu_{10},\ X_2 - \dfrac{\rho\sigma_2(X_1 - \mu_{10})}{\sigma_1}\right), & \text{if } X_1 > \mu_{10},\ X_2 > \mu_{20}, \text{ and } \dfrac{X_1 - \mu_{10}}{\sigma_1} < \dfrac{X_2 - \mu_{20}}{\sigma_2}. \end{cases}$$
(b) 
The likelihood ratio test statistic for testing $H_0: \mu \in \Theta_0$ versus $H_a: \mu \notin \Theta_0$ is
$$\left[\min\left\{\left(\frac{X_1 - \mu_{10}}{\sigma_1}\right)_+,\ \left(\frac{X_2 - \mu_{20}}{\sigma_2}\right)_+\right\}\right]^2,$$
where $x_+$ denotes the positive part of $x$.
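Proposition 1 translates directly into code. The following sketch (function names are ours) implements the restricted MLE of part (a) and the likelihood ratio statistic of part (b):

```python
def restricted_mle(x1, x2, mu10, mu20, s1, s2, rho):
    """MLE of mu over Theta0 = {mu1 <= mu10 or mu2 <= mu20}, Proposition 1(a)."""
    if x1 <= mu10 or x2 <= mu20:
        return (x1, x2)                               # already inside Theta0
    if (x1 - mu10) / s1 >= (x2 - mu20) / s2:
        return (x1 - rho * s1 * (x2 - mu20) / s2, mu20)   # project onto mu2 = mu20
    return (mu10, x2 - rho * s2 * (x1 - mu10) / s1)       # project onto mu1 = mu10

def lrt_statistic(x1, x2, mu10, mu20, s1, s2):
    """Likelihood ratio test statistic of Proposition 1(b)."""
    w1 = max((x1 - mu10) / s1, 0.0)                   # positive part
    w2 = max((x2 - mu20) / s2, 0.0)
    return min(w1, w2) ** 2
```

The statistic is zero whenever the observation already lies in $\Theta_0$, so the test can only reject when both standardized excesses are positive.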
Define
$$W_{n1} = \frac{\sqrt{n}\,(\widehat{AUC} - AUC_0)}{\sqrt{\hat\sigma_{11}}} \quad \text{and} \quad W_{n2} = \frac{\sqrt{n}\,(\hat J - J_0)}{\sqrt{\hat\sigma_{22}}}.$$
With the asymptotic results in Theorems 1 and 2, we propose, following Proposition 1(b), to test (8) with the test statistic
$$W_n = \left[\min\left\{(W_{n1})_+,\ (W_{n2})_+\right\}\right]^2.$$
We reject the null hypothesis in (8) when $W_n$ exceeds a critical value.
The distribution of $W_n$ depends on the true null model, which is unknown. Motivated by Proposition 1, we suggest a bootstrap procedure based on $W_n$ for the hypothesis testing problem in (8). For ease of presentation, let $\hat F_{0,AUC_0}(x)$ and $\hat F_{1,AUC_0}(x)$ denote the MELEs of $F_0(x)$ and $F_1(x)$ subject to the constraint that the $AUC$ is fixed at $AUC_0$. Similarly, let $\hat F_{0,J_0}(x)$ and $\hat F_{1,J_0}(x)$ denote the MELEs of $F_0(x)$ and $F_1(x)$ subject to the constraint that the Youden index $J$ is fixed at $J_0$. Their numerical calculation is discussed in Appendix C. Let $a$ be the significance level.
Step 1.
Calculate the test statistic $W_n$ based on the observed data $\{x_i\}_{i=1}^{n_0}$ and $\{y_i\}_{i=1}^{n_1}$.
Step 2.
For $l = 1, \ldots, L$, generate the $l$-th bootstrap two-sample data as follows:
(a)
If $\widehat{AUC} \le AUC_0$ or $\hat J \le J_0$, draw a bootstrap sample of size $n_1$ from $\hat F_1(x)$ and another bootstrap sample of size $n_0$ from $\hat F_0(x)$.
(b)
If $\widehat{AUC} > AUC_0$, $\hat J > J_0$, and $W_{n1} \ge W_{n2}$, draw a sample of size $n_1$ from $\hat F_{1,J_0}(x)$ and another sample of size $n_0$ from $\hat F_{0,J_0}(x)$.
(c)
If $\widehat{AUC} > AUC_0$, $\hat J > J_0$, and $W_{n1} < W_{n2}$, draw a sample of size $n_1$ from $\hat F_{1,AUC_0}(x)$ and another sample of size $n_0$ from $\hat F_{0,AUC_0}(x)$.
Step 3.
For $l = 1, \ldots, L$, calculate the test statistic $W_{n,B,l}$ based on the $l$-th bootstrap two-sample data in Step 2 (using the same method as in Step 1).
Step 4.
Calculate the $p$-value of $W_n$ as
$$p\text{-value} = \frac{1}{L}\sum_{l=1}^{L}\mathbb{1}\left(W_{n,B,l} > W_n\right).$$
Step 5.
Reject the null hypothesis in (8) at the significance level $a$ if the $p$-value of $W_n$ is smaller than $a$.
We note that Cases (a)–(c) in Step 2 correspond to the three cases outlined in Proposition 1(a). For instance, consider Step 2(b), in which $\widehat{AUC} > AUC_0$, $\hat J > J_0$, and $W_{n1} \ge W_{n2}$. By the second case of Proposition 1(a), we set the MELE of $J$ to $J_0$ under the null model in (8). Hence, we generate the bootstrap two-sample data from $\hat F_{1,J_0}(x)$ and $\hat F_{0,J_0}(x)$.
In our simulation study, we set $L = 500$. The simulation results in Section 4 demonstrate that the bootstrap procedure effectively controls the type-I error.

4. Simulation Study

This section employs simulation examples to compare the finite-sample performances of our proposed joint inference procedures and some existing competitors.

4.1. Simulation Parameter Settings

We consider two distributional settings:
(1)
$f_0 \sim LN(\mu_0, \sigma_0^2)$ and $f_1 \sim LN(\mu_1, \sigma_1^2)$;
(2)
$f_0 \sim Beta(a_0, b_0)$ and $f_1 \sim Beta(a_1, b_1)$.
Here, $LN(\mu, \sigma^2)$ denotes the lognormal distribution with mean $\mu$ and variance $\sigma^2$ on the log scale; $Beta(a, b)$ denotes the beta distribution with power parameters $a$ for $x$ and $b$ for $(1-x)$.
We comment that Setting (1) corresponds to the case where the model assumption for the Box–Cox method is satisfied, while Setting (2) pertains to the case where this assumption is violated. In both settings, we consider three true values of the Youden index, 0.3, 0.5, and 0.7, to cover low, moderate, and high levels of diagnostic accuracy. The details of the parameter settings are given in Table 1.
When specifying the parameter settings in Table 1, we first specify the parameter values (Columns 4 and 5) of $f_0$ and keep them fixed. Then, for each $J^*$ (Column 3), we choose the parameter values (Columns 6 and 7) of $f_1$ for the designated distribution (Column 1) such that the $J^*$ value is achieved. Finally, we calculate the corresponding $AUC^*$ (Column 2) under the pair of $f_0$ and $f_1$. This approach to selecting parameter values has been used in [1,17,21].
Throughout this section, our proposed joint inference procedures use the correctly specified $q(x) = \log(x)$.

4.2. Simulation for Confidence Regions

We compare the proposed confidence regions for ( A U C , J ) with four methods from [1] using simulation studies. The six methods are listed below:
  • Empirical likelihood method (proposed in (6)), which is denoted as “EL”;
  • Bootstrap empirical likelihood method with L = 500 (proposed in (7)), which is denoted as “BEL”;
  • Parametric Box–Cox asymptotic delta method, which is denoted as “AD”;
  • Parametric Box–Cox generalized inference approach, which is denoted as “GPQ”;
  • Nonparametric bootstrap confidence region, which is denoted as “BTI”;
  • Nonparametric bootstrap confidence region with the arcsin-square-root transformation, which is denoted as “BTAT”.
We consider three different combinations of sample sizes: $(n_0, n_1) = (50, 50)$, $(100, 100)$, and $(150, 50)$. This results in nine combinations of parameters and sample sizes for each of the two distributional settings of $f_0$ and $f_1$. We repeat the simulation 2000 times for each combination.
We now examine the behavior of the 95% confidence region of $\eta = (AUC, J)^T$. The performance of a confidence region is evaluated by the coverage probability (CP), in percentage, and the area of the confidence region (ACR), which are computed as
$$CP = \frac{1}{L}\sum_{l=1}^{L}\mathbb{1}\left(\eta^* \in I^{(l)}\right) \times 100, \quad ACR = \frac{1}{L}\sum_{l=1}^{L} C^{(l)},$$
where $I^{(l)}$ is the confidence region of $\eta$ computed from the $l$-th simulation run, $C^{(l)}$ is its area, and $L$ here denotes the number of simulation repetitions. The simulation results are presented in Table 2 and Table 3.
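For the Wald-type regions in (6) and (7), the ACR has a closed form: the region $\{\eta : n(\hat\eta - \eta)^T\hat\Sigma^{-1}(\hat\eta - \eta) \le q\}$ is an ellipse with area $\pi q \sqrt{\det\hat\Sigma}/n$. A one-line sketch (our own illustration):

```python
import math
import numpy as np

def wald_region_area(Sigma_hat, q, n):
    """Area of the elliptical Wald region
    {eta : n (eta_hat - eta)' Sigma_hat^{-1} (eta_hat - eta) <= q} in 2D."""
    return math.pi * q * math.sqrt(np.linalg.det(Sigma_hat)) / n
```

As expected, the area shrinks proportionally to $1/n$ for a fixed cutoff $q$.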
Table 2 presents simulation results at the nominal level of 95% under the lognormal distributional setting. The BEL method improves the coverage of the EL method at the cost of a larger region. Nonetheless, the proposed BEL method demonstrates the most stable performance, with CPs reasonably close to 95% in nearly all scenarios. As sample sizes become larger, the AD method generally overestimates the coverage probabilities, resulting in the largest confidence-region areas among the six methods. The GPQ method is generally comparable to the BEL method, maintaining satisfactory coverage probabilities with a relatively small area, especially when the true value of $J$ is large; this is likely because the model assumption for the Box–Cox method is satisfied in this setting. Both the BTI and BTAT methods underestimate the coverage probabilities, especially when the true value of $J$ is 0.3, and they produce similar areas in many cases.
Table 3 presents simulation results for the beta distributional setting at the nominal level of 95%. The EL, BEL, BTI, and BTAT methods exhibit trends similar to those in the lognormal setting. However, the performance of the AD and GPQ methods is quite different: both struggle to attain the nominal CPs when the true value of $J$ is 0.5 or 0.7, and the GPQ method performs particularly poorly. For example, when the true value of $J$ is 0.7 and $(n_0, n_1) = (100, 100)$, the CP of the GPQ method is only 39.05%.
In summary, when $q(x)$ is correctly specified and the Box–Cox model is satisfied, the BEL and GPQ methods have comparable performance and outperform the other methods. When the Box–Cox model is not satisfied, the BEL method performs better than both the parametric and nonparametric methods.

4.3. Simulation for Joint Hypothesis Testing

This section presents a simulation study comparing the performance of the proposed bootstrap joint test procedure with L = 500 (denoted as “BELT”), introduced in Section 3.2, with two recommended joint test methods from [2]:
  • Parametric bootstrap joint test method, which is denoted as “PBA”;
  • Nonparametric kernel-smoothed-based joint test method, which is denoted as “NKS”.
All tests were carried out at the significance level $a = 0.05$. We consider three different combinations of sample sizes: $(n_0, n_1) = (50, 50)$, $(75, 75)$, and $(75, 50)$. For both the lognormal and beta distributional settings, the model with $J = 0.5$ is chosen as the null model and that with $J = 0.7$ as the alternative model. The number of replications is 2000. The simulated type-I errors and powers of the three tests are shown in Table 4 and Table 5.
The first block of Table 4 presents the simulated type-I errors of the three tests under the lognormal distribution. We observe that all three methods keep the type-I error rate controlled at the 0.05 level, although the PBA and NKS methods tend to be conservative. The second block of Table 4 presents the simulated powers of the three tests; the proposed BELT method exhibits the largest power.
Table 5 presents the results for beta distributions. While BELT and NKS effectively control type-I errors, the PBA test’s errors significantly exceed 5%. When data come from the alternative model, BELT exhibits greater power compared to NKS.
In conclusion, the BELT method effectively controls type-I errors across both distributional settings (lognormal and beta) and demonstrates superior power compared to the nonparametric method.

5. Real Data Analysis

In this section, we evaluate the performance of the proposed methods using a dataset on Duchenne Muscular Dystrophy (DMD). DMD is a genetic disorder characterized by progressive muscle weakness and wasting. It is caused by mutations in the dystrophin gene, the largest human gene, located on the X chromosome (Xp21). DMD primarily affects males in early childhood. Interestingly, females with one copy of the mutated gene typically do not show symptoms. Therefore, identifying potential female carriers is crucial.
According to [26], individuals carrying the DMD gene mutation may not exhibit symptoms but often have elevated levels of specific biomarkers. The authors of [27] compiled a dataset encompassing four biomarkers: Creatine Kinase (CK), Hemopexin (H), Lactate Dehydrogenase (LD), and Pyruvate Kinase (PK). These biomarkers were measured in blood serum samples from a healthy control group ($n_0 = 127$) and a group of DMD carriers ($n_1 = 67$).
For illustration, we consider the biomarkers PK and H. We choose $q(x) = \log(x)$ in the proposed methods for each biomarker. We perform the goodness-of-fit test suggested in Remark 1; the $p$-values based on 1000 bootstrap samples are 0.215 and 0.780 for PK and H, respectively. This suggests that the DRM in (2) with $q(x) = \log(x)$ provides a reasonable fit for both biomarkers.
Table 6 presents the point estimates (PEs) of $(AUC, J)$ and the ACRs for $(AUC, J)$ at the 95% confidence level based on the BEL, GPQ, and BTAT methods. We omit the results for the EL, AD, and BTI methods because, as shown in Section 4, the BEL method achieves better coverage than the EL method, the AD method has larger ACRs than the GPQ method, and the BTI method performs similarly to the BTAT method. Clearly, the BEL method gives the confidence region with the smallest area. As an illustration, we further plot the 95% confidence regions of $(AUC, J)$ for biomarker H based on the BEL, GPQ, and BTAT methods in Figure 1, which leads to observations similar to those in Table 6.
To illustrate the proposed joint test BELT, we assess whether the biomarker PK simultaneously exceeds the prespecified target values $AUC_0 = 0.753$ and $J_0 = 0.371$. These values are the PEs of $(AUC, J)$ based on the BEL method for biomarker H. Our BELT method with $L = 500$ gives a $p$-value of 0.022, which provides strong evidence against the null hypothesis $H_0: AUC \le 0.753$ or $J \le 0.371$ at the 5% significance level. In contrast, both the PBA and NKS tests from [2] fail to reject the same null hypothesis. In conclusion, our BELT method provides stronger evidence against this null hypothesis, implying that biomarker PK has better discriminatory ability than biomarker H in terms of both the $AUC$ and the Youden index $J$.

6. Summary

In this paper, we proposed a bootstrap confidence region for ( A U C , J ) and a bootstrap joint testing procedure for the hypothesis testing problem in (8) based on the MELE of ( A U C , J ) . We conducted extensive simulations to evaluate the performance of our proposed semiparametric approaches. The results demonstrate that the BEL method accurately constructs confidence regions for ( A U C , J ) with the desired coverage probability. Additionally, the proposed bootstrap testing method, BELT, consistently maintains the type-I error rate and exhibits satisfactory power compared to existing joint tests.
Theoretically, we established the joint asymptotic normality of the MELE of ( A U C , J ) , providing the theoretical foundation for the proposed confidence region and joint testing procedure for ( A U C , J ) . Practically, we developed R functions to implement the proposed methods, which are available in the Supplementary Materials.
To use the proposed methods, it is necessary to specify q ( x ) in (2). Common choices for q ( x ) include q ( x ) = log ( x ) and q ( x ) = x . We recommend that practitioners first use the goodness-of-fit test described in Remark 1 to assess the suitability of a prespecified choice of q ( x ) . The R function for implementing the goodness-of-fit test for the DRM with commonly used q ( x ) is included in the Supplementary Materials. If practitioners do not have a suitable choice for q ( x ) , the nonparametric method outlined in Section 4 may be preferable.
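As an informal illustration of how the choice of $q(x)$ matters (this is not the formal goodness-of-fit test of Remark 1), one can exploit the logistic-regression representation of the DRM: regress disease status on $q(x)$ and compare the maximized log-likelihoods of candidate bases. The Python sketch below uses simulated lognormal data with a log-scale location shift, for which $q(x) = \log(x)$ is the correctly specified choice; all names are ours.

```python
import numpy as np

def logistic_loglik(q_vals, labels, iters=25):
    """Maximized log-likelihood of a logistic regression of disease
    status on q(x), fitted by Newton's method (IRLS). Under the DRM,
    the logistic slope coincides with beta up to an intercept shift."""
    q = (q_vals - q_vals.mean()) / q_vals.std()  # affine rescaling: same max log-lik
    X = np.column_stack([np.ones_like(q), q])
    y = np.asarray(labels, float)
    theta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        hess = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(2)
        theta += np.linalg.solve(hess, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ theta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(1)
x0 = rng.lognormal(0.0, 1.0, 200)                  # healthy sample
x1 = rng.lognormal(1.0, 1.0, 200)                  # diseased sample
x = np.concatenate([x0, x1])
d = np.concatenate([np.zeros(200), np.ones(200)])  # case/control labels
ll_log = logistic_loglik(np.log(x), d)             # q(x) = log(x): correct here
ll_lin = logistic_loglik(x, d)                     # q(x) = x: misspecified here
```

For data of this kind, the correctly specified basis should typically attain the larger maximized log-likelihood; the formal test of Remark 1 remains the recommended diagnostic.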

7. Discussion

We observe that the proposed methods have the potential to be applied to other research problems. For example, we may extend them to compare paired or multiple markers based on both A U C and the Youden index J [28]. This paper considers two-sample/two-group data only. Multiple sample/group data are also commonly seen [29], and we may explore the proposed methods in this scenario. Furthermore, although widely used, the A U C has its limitations. A major drawback is that it summarizes the entire ROC curve, including regions that may not be directly relevant to clinical applications. To address this issue while retaining some of the A U C ’s beneficial properties, one can use the partial area under the ROC curve ( p A U C ). Considering a clinically relevant range of false-positive or true-positive rates, the p A U C focuses on a specific portion of the curve [30,31,32,33,34]. We can extend the proposed method to study statistical inference for the p A U C . We leave these research problems for future investigation.
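For concreteness, the $pAUC$ mentioned above can be computed nonparametrically by trapezoidal integration of the empirical ROC curve over a clinically relevant false-positive range. The sketch below is our own baseline illustration; the semiparametric DRM analogue is the extension left open above.

```python
import numpy as np

def empirical_pauc(x0, x1, fpr_max=0.2):
    """Empirical partial AUC over FPR in [0, fpr_max], by trapezoidal
    integration of the empirical ROC curve (larger marker values
    indicate disease)."""
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    cuts = np.unique(np.concatenate([x0, x1]))[::-1]   # thresholds, high to low
    fpr = np.concatenate([[0.0], [(x0 > c).mean() for c in cuts], [1.0]])
    tpr = np.concatenate([[0.0], [(x1 > c).mean() for c in cuts], [1.0]])
    keep = fpr <= fpr_max
    # close the region exactly at fpr_max by interpolating the TPR there
    grid_f = np.append(fpr[keep], fpr_max)
    grid_t = np.append(tpr[keep], np.interp(fpr_max, fpr, tpr))
    # trapezoid rule; duplicated abscissae contribute zero width
    return float(np.sum(np.diff(grid_f) * (grid_t[1:] + grid_t[:-1]) / 2.0))
```

With `fpr_max=1.0` this reduces to the ordinary empirical $AUC$, which gives a quick consistency check of the implementation.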

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/math12132118/s1.

Author Contributions

Conceptualization, S.L., Q.T., Y.L. and P.L.; methodology, S.L., Q.T., Y.L. and P.L.; software, S.L.; validation, Q.T., Y.L. and P.L.; formal analysis, S.L.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, Q.T., Y.L. and P.L.; visualization, S.L.; supervision, Q.T., Y.L. and P.L.; project administration, Q.T., Y.L. and P.L.; funding acquisition, Q.T., Y.L. and P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2021YFA1000100 and 2021YFA1000101), the National Natural Science Foundation of China (12171157 and 32030063), the 111 project (B14019), and the Natural Sciences and Engineering Research Council of Canada (RGPIN-2023-03479 and RGPIN-2020-04964).

Data Availability Statement

The R functions for implementing the proposed methods and the goodness-of-fit test for the DRM with commonly used q ( x ) , as well as the data supporting the findings of this study, are available in Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Theorem 1

Appendix A.1. Some Preparation

Recall that $\omega(x) = \exp\{(\theta^*)^T Q(x)\}$ and $h(x) = 1 - \rho + \rho\,\omega(x)$. The following lemma is helpful for our subsequent calculation, in which we use $E_0$ to denote the expectation with respect to $F_0(x)$.
Lemma A1. 
Let $g_1(x)$ and $g_2(x)$ be arbitrary functions of $x$ such that the expectations below are all finite. Then, we have
$$E\Big\{\sum_{i=1}^{n_1} g_1(y_i)\Big\} = n\rho\, E_0\{\omega(Z) g_1(Z)\}, \tag{A1}$$
$$E\Big\{\sum_{i=1}^{n} g_1(z_i)\Big\} = n\, E_0\{h(Z) g_1(Z)\}, \tag{A2}$$
$$\mathrm{Cov}\Big\{\sum_{i=1}^{n_1} g_1(y_i),\; \sum_{i=1}^{n_1} g_2(y_i)\Big\} = n\rho\, E_0\{\omega(Z) g_1(Z) g_2(Z)\} - n\rho\, E_0\{\omega(Z) g_1(Z)\}\, E_0\{\omega(Z) g_2(Z)\}, \tag{A3}$$
$$\mathrm{Cov}\Big\{\sum_{i=1}^{n} g_1(z_i),\; \sum_{i=1}^{n} g_2(z_i)\Big\} = n\, E_0\{h(Z) g_1(Z) g_2(Z)\} - n\rho\, E_0\{\omega(Z) g_1(Z)\}\, E_0\{\omega(Z) g_2(Z)\} - n(1-\rho)\, E_0\{g_1(Z)\}\, E_0\{g_2(Z)\}. \tag{A4}$$
Proof. 
We first consider (A1). Using the DRM (2), we have
$$E\Big\{\sum_{i=1}^{n_1} g_1(y_i)\Big\} = n_1 E\{g_1(y_1)\} = n_1 \int g_1(z)\,\omega(z)\,dF_0(z) = n\rho\, E_0\{\omega(Z) g_1(Z)\}.$$
Equation (A2) can be similarly proved.
Next, we consider (A3). Note that
$$\mathrm{Cov}\Big\{\sum_{i=1}^{n_1} g_1(y_i),\; \sum_{i=1}^{n_1} g_2(y_i)\Big\} = n_1\,\mathrm{Cov}\{g_1(y_1), g_2(y_1)\} = n\rho\, E_0\{\omega(Z) g_1(Z) g_2(Z)\} - n\rho\, E_0\{\omega(Z) g_1(Z)\}\, E_0\{\omega(Z) g_2(Z)\}.$$
Equation (A4) can be similarly proved. □
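The identities in Lemma A1 are easy to check numerically. The sketch below is an illustration assuming $F_0 = N(0,1)$ and $F_1 = N(1,1)$, for which $\omega(x) = \exp(x - 1/2)$ and the DRM holds with $Q(x) = (1, x)^T$; it verifies (A1) by Monte Carlo with $g_1(x) = x^2$, where both sides equal $2 n_1$.

```python
import numpy as np

# Numerical check of Lemma A1, identity (A1):
# F0 = N(0,1), F1 = N(1,1), so omega(x) = dF1/dF0 = exp(x - 1/2),
# i.e. the DRM holds with Q(x) = (1, x)^T. Take g1(x) = x^2.
rng = np.random.default_rng(0)
n0, n1 = 100_000, 100_000
n, rho = n0 + n1, n1 / (n0 + n1)

z = rng.normal(0.0, 1.0, n0)           # draws from F0
y = rng.normal(1.0, 1.0, n1)           # draws from F1
omega = np.exp(z - 0.5)                # density ratio evaluated at the F0 draws

lhs = np.mean(y**2) * n1               # estimates E{sum g1(y_i)} = n1 * E1[Y^2] = 2 n1
rhs = n * rho * np.mean(omega * z**2)  # estimates n * rho * E0{omega(Z) g1(Z)} = 2 n1
# Both sides should be close to 2 * n1 = 200000 up to Monte Carlo error.
```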
Next, we rewrite the $U_n$ in (5) as
$$
U_n = \begin{pmatrix} U_{n1}\\ U_{n2}\\ U_{n3}\\ U_{n4}\\ U_{n5} \end{pmatrix}
= \begin{pmatrix}
-n^{-1}\sum_{i=1}^{n} h_1(z_i) Q(z_i) + n^{-1}\sum_{i=n_0+1}^{n} Q(z_i)\\[2pt]
\{n(1-\rho)\}^{-1}\sum_{i=1}^{n} h_0(z_i) F_1(z_i) - (1 - AUC^*)\\[2pt]
(n\rho)^{-1}\sum_{i=1}^{n} h_1(z_i) F_0(z_i) - AUC^*\\[2pt]
\{n(1-\rho)\}^{-1}\sum_{i=1}^{n} h_0(z_i)\,\mathbb{1}(z_i \le c^*) - F_0(c^*)\\[2pt]
(n\rho)^{-1}\sum_{i=1}^{n} h_1(z_i)\,\mathbb{1}(z_i \le c^*) - F_1(c^*)
\end{pmatrix}. \tag{A5}
$$
The next lemma presents the expectation and variance of U n .
Lemma A2. 
For $U_n$ defined in (A5), we have
$$E(U_n) = 0 \quad\text{and}\quad \mathrm{Var}(U_n) = n^{-1} V,$$
where
$$V = \begin{pmatrix} V_{11} & V_{12} & V_{13} & V_{14} & V_{15}\\ V_{21} & V_{22} & V_{23} & V_{24} & V_{25}\\ V_{31} & V_{32} & V_{33} & V_{34} & V_{35}\\ V_{41} & V_{42} & V_{43} & V_{44} & V_{45}\\ V_{51} & V_{52} & V_{53} & V_{54} & V_{55} \end{pmatrix},$$
with
$$\begin{aligned}
V_{11} &= (1-\rho)E_0\{h_1(Z)Q(Z)Q^T(Z)\} - \tfrac{1-\rho}{\rho}E_0\{h_1(Z)Q(Z)\}\,E_0\{h_1(Z)Q^T(Z)\},\\
V_{21} &= V_{12}^T = E_0[\{1-\omega(Z)\}h_0(Z)F_1(Z)]\,E_0\{h_1(Z)Q^T(Z)\},\\
V_{22} &= (1-\rho)^{-1}E_0\{h_0(Z)F_1^2(Z)\} - \rho^{-1}\big[E_0\{h_1(Z)F_1(Z)\}\big]^2 - (1-\rho)^{-1}\big[E_0\{h_0(Z)F_1(Z)\}\big]^2,\\
V_{31} &= V_{13}^T = \tfrac{1-\rho}{\rho}E_0[\{1-\omega(Z)\}h_1(Z)F_0(Z)]\,E_0\{h_1(Z)Q^T(Z)\},\\
V_{32} &= V_{23} = \rho^{-1}E_0\{h_1(Z)F_1(Z)F_0(Z)\} - \rho^{-1}E_0\{\omega(Z)h_1(Z)F_0(Z)\}E_0\{h_1(Z)F_1(Z)\} - \rho^{-1}E_0\{h_1(Z)F_0(Z)\}E_0\{h_0(Z)F_1(Z)\},\\
V_{33} &= \rho^{-1}E_0\{\omega(Z)h_1(Z)F_0^2(Z)\} - \tfrac{1-\rho}{\rho^2}\big[E_0\{h_1(Z)F_0(Z)\}\big]^2 - \rho^{-1}\big[E_0\{\omega(Z)h_1(Z)F_0(Z)\}\big]^2,\\
V_{41} &= V_{14}^T = E_0[\{1-\omega(Z)\}h_0(Z)\,\mathbb{1}(Z\le c^*)]\,E_0\{h_1(Z)Q^T(Z)\},\\
V_{42} &= V_{24} = (1-\rho)^{-1}E_0\{h_0(Z)F_1(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_1(Z)F_1(Z)\} - (1-\rho)^{-1}E_0\{h_0(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_0(Z)F_1(Z)\},\\
V_{43} &= V_{34} = \rho^{-1}E_0\{h_1(Z)F_0(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{\omega(Z)h_1(Z)F_0(Z)\}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{h_1(Z)F_0(Z)\}E_0\{h_0(Z)\,\mathbb{1}(Z\le c^*)\},\\
V_{44} &= (1-\rho)^{-1}E_0\{h_0(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}\big[E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}\big]^2 - (1-\rho)^{-1}\big[E_0\{h_0(Z)\,\mathbb{1}(Z\le c^*)\}\big]^2,\\
V_{51} &= V_{15}^T = \tfrac{1-\rho}{\rho}E_0[\{1-\omega(Z)\}h_1(Z)\,\mathbb{1}(Z\le c^*)]\,E_0\{h_1(Z)Q^T(Z)\},\\
V_{52} &= V_{25} = \rho^{-1}E_0\{h_1(Z)F_1(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{\omega(Z)h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_1(Z)F_1(Z)\} - \rho^{-1}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_0(Z)F_1(Z)\},\\
V_{53} &= V_{35} = \rho^{-1}E_0\{\omega(Z)h_1(Z)F_0(Z)\,\mathbb{1}(Z\le c^*)\} - \tfrac{1-\rho}{\rho^2}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_1(Z)F_0(Z)\} - \rho^{-1}E_0\{\omega(Z)h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{\omega(Z)h_1(Z)F_0(Z)\},\\
V_{54} &= V_{45} = \rho^{-1}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{\omega(Z)h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\} - \rho^{-1}E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}E_0\{h_0(Z)\,\mathbb{1}(Z\le c^*)\},\\
V_{55} &= \rho^{-1}E_0\{\omega(Z)h_1(Z)\,\mathbb{1}(Z\le c^*)\} - \tfrac{1-\rho}{\rho^2}\big[E_0\{h_1(Z)\,\mathbb{1}(Z\le c^*)\}\big]^2 - \rho^{-1}\big[E_0\{\omega(Z)h_1(Z)\,\mathbb{1}(Z\le c^*)\}\big]^2.
\end{aligned}$$
Proof. 
For $E(U_n)$, we show only that $E(U_{n2}) = 0$. The other parts can be verified using Lemma A1 directly. Recall that
$$U_{n2} = \{n(1-\rho)\}^{-1}\sum_{i=1}^n h_0(z_i)F_1(z_i) - (1 - AUC^*).$$
Using (A2), we have
$$E(U_{n2}) = \frac{n}{n(1-\rho)}E_0\{h(Z)h_0(Z)F_1(Z)\} - (1-AUC^*) = E_0\{F_1(Z)\} - (1-AUC^*) = 0,$$
where the second equality uses $h(z)h_0(z) = 1-\rho$ and the last equality uses $E_0\{F_1(Z)\} = 1 - AUC^*$.
For $\mathrm{Var}(U_n)$, we only verify
$$V_{31} = n\,\mathrm{Cov}(U_{n3}, U_{n1}) = \frac{1-\rho}{\rho}\,E_0[\{1-\omega(Z)\}h_1(Z)F_0(Z)]\,E_0\{h_1(Z)Q^T(Z)\}.$$
The other parts, again, can be similarly checked.
Note that
$$\begin{aligned}
\mathrm{Cov}(U_{n3}, U_{n1}) &= \mathrm{Cov}\Big(\frac{1}{n\rho}\sum_{i=1}^n h_1(z_i)F_0(z_i),\; -\frac{1}{n}\sum_{i=1}^n h_1(z_i)Q(z_i) + \frac{1}{n}\sum_{i=n_0+1}^{n} Q(z_i)\Big)\\
&= -\frac{1}{n^2\rho}\,\mathrm{Cov}\Big(\sum_{i=1}^n h_1(z_i)F_0(z_i),\, \sum_{i=1}^n h_1(z_i)Q(z_i)\Big) + \frac{1}{n^2\rho}\,\mathrm{Cov}\Big(\sum_{i=n_0+1}^n h_1(z_i)F_0(z_i),\, \sum_{i=n_0+1}^n Q(z_i)\Big).
\end{aligned}\tag{A6}$$
By Lemma A1, we have
$$\mathrm{Cov}\Big(\sum_{i=n_0+1}^n h_1(z_i)F_0(z_i),\, \sum_{i=n_0+1}^n Q(z_i)\Big) = n\rho\, E_0\{\omega(Z)h_1(Z)F_0(Z)Q^T(Z)\} - n\rho\, E_0\{\omega(Z)h_1(Z)F_0(Z)\}\,E_0\{\omega(Z)Q^T(Z)\} \tag{A7}$$
and
$$\begin{aligned}
\mathrm{Cov}\Big(\sum_{i=1}^n h_1(z_i)F_0(z_i),\, \sum_{i=1}^n h_1(z_i)Q(z_i)\Big) = {}& n\rho\, E_0\{\omega(Z)h_1(Z)F_0(Z)Q^T(Z)\} - n\rho\, E_0\{\omega(Z)h_1(Z)F_0(Z)\}\,E_0\{\omega(Z)h_1(Z)Q^T(Z)\}\\
&- n(1-\rho)\, E_0\{h_1(Z)F_0(Z)\}\,E_0\{h_1(Z)Q^T(Z)\}.
\end{aligned}\tag{A8}$$
Combining (A6)–(A8), after some algebraic manipulation, we obtain
$$\mathrm{Cov}(U_{n3}, U_{n1}) = \frac{1-\rho}{n\rho}\,E_0[\{1-\omega(Z)\}h_1(Z)F_0(Z)]\,E_0\{h_1(Z)Q^T(Z)\}.$$
This finishes the proof. □

Appendix A.2. Proof of Theorem 1

Proof. 
Recall that the MELEs of $p_i$ are given by
$$\hat p_i = \frac{1}{n}\cdot\frac{1}{1-\rho+\rho\exp\{\hat\theta^T Q(z_i)\}}, \quad i = 1,\dots,n.$$
Let
$$\hat q_i = \hat p_i \exp\{\hat\theta^T Q(z_i)\}.$$
It follows that the MELEs of $F_0$ and $F_1$ are given by
$$\hat F_0(x) = \sum_{i=1}^n \hat p_i\,\mathbb{1}(z_i \le x) \quad\text{and}\quad \hat F_1(x) = \sum_{i=1}^n \hat q_i\,\mathbb{1}(z_i \le x).$$
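In code, these weights and fitted CDFs are direct to compute once $\hat\theta$ is available. The sketch below takes $\hat\theta$ as given (here the true DRM parameter of a normal location example, rather than the maximizer of the empirical likelihood) and forms $\hat p_i$, $\hat q_i$, $\hat F_0$, and $\hat F_1$; at the true parameter the two sets of weights sum to one only approximately, whereas at the MELE both sums equal one by the empirical-likelihood constraints. The function name is ours.

```python
import numpy as np

def mele_cdfs(z, theta_hat, rho, q=lambda x: x):
    """Weights and fitted CDFs under the DRM with Q(x) = (1, q(x))^T.

    z: pooled sample (controls then cases), theta_hat = (alpha, beta),
    rho = n1 / n."""
    n = len(z)
    w = np.exp(theta_hat[0] + theta_hat[1] * q(z))  # omega(z_i; theta_hat)
    p = 1.0 / (n * (1.0 - rho + rho * w))           # p_i weights
    qwts = p * w                                    # q_i weights
    order = np.argsort(z)
    F0 = np.cumsum(p[order])                        # fitted F0 at sorted z
    F1 = np.cumsum(qwts[order])                     # fitted F1 at sorted z
    return z[order], F0, F1, p, qwts

# Example: F0 = N(0,1), F1 = N(1,1), so omega(x) = exp(-0.5 + x)
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0, 1, 300), rng.normal(1, 1, 300)])
zs, F0, F1, p, qw = mele_cdfs(z, theta_hat=(-0.5, 1.0), rho=0.5)
```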
The joint asymptotic normality of ( A U C ^ , J ^ ) relies on linear approximations of ( A U C ^ , J ^ ) . According to [4,17], we have
$$\begin{aligned}
\widehat{AUC} - AUC^* &= \sum_{i=1}^n \hat q_i F_0(z_i) + \sum_{i=1}^n \hat p_i\{1 - F_1(z_i)\} - 2AUC^* + o_p(n^{-1/2})\\
&= \Big\{\sum_{i=1}^n \hat q_i F_0(z_i) - AUC^*\Big\} - \Big[\sum_{i=1}^n \hat p_i F_1(z_i) - (1 - AUC^*)\Big] + o_p(n^{-1/2})
\end{aligned}\tag{A9}$$
and
$$\begin{aligned}
\hat J - J^* &= \{\hat F_0(c^*) - F_0(c^*)\} - \{\hat F_1(c^*) - F_1(c^*)\} + o_p(n^{-1/2})\\
&= \Big\{\sum_{i=1}^n \hat p_i\,\mathbb{1}(z_i \le c^*) - F_0(c^*)\Big\} - \Big\{\sum_{i=1}^n \hat q_i\,\mathbb{1}(z_i \le c^*) - F_1(c^*)\Big\} + o_p(n^{-1/2}).
\end{aligned}$$
Recall the form of U n in (A5). Applying a first-order Taylor expansion to (A9), we have
$$\begin{aligned}
\widehat{AUC} - AUC^* = {}& -U_{n2} + U_{n3} + \Big\{\frac{1}{n\rho}\sum_{i=1}^n h_0(z_i)h_1(z_i)F_0(z_i)Q^T(z_i)\Big\}(\hat\theta - \theta^*)\\
&+ \Big\{\frac{1}{n(1-\rho)}\sum_{i=1}^n h_0(z_i)h_1(z_i)F_1(z_i)Q^T(z_i)\Big\}(\hat\theta - \theta^*) + o_p(n^{-1/2}).
\end{aligned}$$
Using the weak law of large numbers and Lemma A1, we further obtain
$$\widehat{AUC} - AUC^* = -U_{n2} + U_{n3} + \frac{1-\rho}{\rho}E_0\{h_1(Z)F_0(Z)Q^T(Z)\}(\hat\theta - \theta^*) + E_0\{h_1(Z)F_1(Z)Q^T(Z)\}(\hat\theta - \theta^*) + o_p(n^{-1/2}). \tag{A10}$$
By [19], we have
$$\hat\theta - \theta^* = (1-\rho)^{-1} A^{-1} U_{n1} + o_p(n^{-1/2}),$$
which, together with (A10), implies that
$$\widehat{AUC} - AUC^* = -U_{n2} + U_{n3} + \frac{1}{\rho}E_0\{h_1(Z)F_0(Z)Q^T(Z)\}A^{-1}U_{n1} + \frac{1}{1-\rho}E_0\{h_1(Z)F_1(Z)Q^T(Z)\}A^{-1}U_{n1} + o_p(n^{-1/2}). \tag{A11}$$
Similarly, we obtain
$$\begin{aligned}
\hat J - J^* &= U_{n4} - U_{n5} - \frac{1}{\rho}E_0\{h_1(Z)\,\mathbb{1}(Z \le c^*)Q^T(Z)\}(\hat\theta - \theta^*) + o_p(n^{-1/2})\\
&= U_{n4} - U_{n5} - \frac{1}{\rho(1-\rho)}E_0\{h_1(Z)\,\mathbb{1}(Z \le c^*)Q^T(Z)\}A^{-1}U_{n1} + o_p(n^{-1/2}).
\end{aligned}\tag{A12}$$
Combining (A11) and (A12) leads to
$$\sqrt{n}\begin{pmatrix}\widehat{AUC} - AUC^*\\ \hat J - J^*\end{pmatrix} = \sqrt{n}\, H U_n + o_p(1). \tag{A13}$$
Using the central limit theorem, Lemma A2, and Slutsky's theorem, we have
$$\sqrt{n}\begin{pmatrix}\widehat{AUC} - AUC^*\\ \hat J - J^*\end{pmatrix} \to N\big(0,\ \Sigma = H V H^T\big)$$
in distribution, as claimed in Theorem 1. This finishes the proof. □

Appendix B. Proof of Proposition 1

Proof. 
The log-likelihood function of $\mu = (\mu_1, \mu_2)^T$ based on $(X_1, X_2)^T$, up to a constant not depending on $\mu$, is
$$\ell(\mu_1, \mu_2) = -\frac{1}{2(1-\rho^2)}\left\{\frac{(X_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(X_1-\mu_1)(X_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(X_2-\mu_2)^2}{\sigma_2^2}\right\}.$$
For (a), it is easy to see that if $X_1 \le \mu_{10}$ or $X_2 \le \mu_{20}$, we have
$$\tilde\mu = (X_1, X_2)^T.$$
Next, we concentrate on the cases when $X_1 > \mu_{10}$ and $X_2 > \mu_{20}$. Denote
$$\mu_1^* = \arg\max_{\mu_1} \ell(\mu_1, \mu_{20}) \quad\text{and}\quad \mu_2^* = \arg\max_{\mu_2} \ell(\mu_{10}, \mu_2).$$
Then, $\mu_1^*$ and $\mu_2^*$ satisfy
$$\frac{\partial \ell(\mu_1^*, \mu_{20})}{\partial \mu_1} = -\frac{1}{2(1-\rho^2)}\left\{\frac{2(\mu_1^*-X_1)}{\sigma_1^2} - \frac{2\rho(\mu_{20}-X_2)}{\sigma_1\sigma_2}\right\} = 0, \qquad \frac{\partial \ell(\mu_{10}, \mu_2^*)}{\partial \mu_2} = -\frac{1}{2(1-\rho^2)}\left\{\frac{2(\mu_2^*-X_2)}{\sigma_2^2} - \frac{2\rho(\mu_{10}-X_1)}{\sigma_1\sigma_2}\right\} = 0.$$
It can be easily verified that
$$\mu_1^* = X_1 - \frac{\rho\sigma_1(X_2-\mu_{20})}{\sigma_2}, \qquad \mu_2^* = X_2 - \frac{\rho\sigma_2(X_1-\mu_{10})}{\sigma_1},$$
and
$$\ell(\mu_1^*, \mu_{20}) = -\frac{(X_2-\mu_{20})^2}{2\sigma_2^2}, \qquad \ell(\mu_{10}, \mu_2^*) = -\frac{(X_1-\mu_{10})^2}{2\sigma_1^2}.$$
Therefore, when $X_1 > \mu_{10}$ and $X_2 > \mu_{20}$, and further $\ell(\mu_1^*, \mu_{20}) \ge \ell(\mu_{10}, \mu_2^*)$, or equivalently $(X_1-\mu_{10})/\sigma_1 \ge (X_2-\mu_{20})/\sigma_2$, we have
$$\tilde\mu = (\mu_1^*, \mu_{20})^T.$$
When $X_1 > \mu_{10}$ and $X_2 > \mu_{20}$, and further $\ell(\mu_1^*, \mu_{20}) < \ell(\mu_{10}, \mu_2^*)$, or equivalently $(X_1-\mu_{10})/\sigma_1 < (X_2-\mu_{20})/\sigma_2$, we have
$$\tilde\mu = (\mu_{10}, \mu_2^*)^T.$$
This finishes the proof of Part (a).
For (b), the likelihood ratio test statistic for testing $H_0: \mu \in \Theta_0$ versus $H_a: \mu \notin \Theta_0$ is
$$2\{\ell(X_1, X_2) - \ell(\tilde\mu_1, \tilde\mu_2)\},$$
where $\ell(X_1, X_2) = 0$. The statistic is equal to
$$\begin{cases}
0, & \text{if } X_1 \le \mu_{10} \text{ or } X_2 \le \mu_{20},\\[4pt]
\left(\dfrac{X_2-\mu_{20}}{\sigma_2}\right)^2, & \text{if } X_1 > \mu_{10},\ X_2 > \mu_{20},\ \text{and } \dfrac{X_1-\mu_{10}}{\sigma_1} \ge \dfrac{X_2-\mu_{20}}{\sigma_2},\\[8pt]
\left(\dfrac{X_1-\mu_{10}}{\sigma_1}\right)^2, & \text{if } X_1 > \mu_{10},\ X_2 > \mu_{20},\ \text{and } \dfrac{X_1-\mu_{10}}{\sigma_1} < \dfrac{X_2-\mu_{20}}{\sigma_2}.
\end{cases}$$
After some calculation, the likelihood ratio test statistic can be equivalently written as
$$\min\left\{\left(\frac{X_1-\mu_{10}}{\sigma_1}\right)_+,\ \left(\frac{X_2-\mu_{20}}{\sigma_2}\right)_+\right\}^2,$$
where $(a)_+ = \max(a, 0)$.
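The closed form in Part (b) is straightforward to implement. A small Python helper (the function name is ours, not from the paper's code):

```python
def lrt_statistic(x1, x2, mu10, mu20, sigma1, sigma2):
    """Likelihood ratio statistic of Proposition 1(b):
    min( ((X1 - mu10)/sigma1)_+ , ((X2 - mu20)/sigma2)_+ )^2,
    where (a)_+ = max(a, 0). The statistic is 0 unless both
    components exceed their null values."""
    t1 = max((x1 - mu10) / sigma1, 0.0)
    t2 = max((x2 - mu20) / sigma2, 0.0)
    return min(t1, t2) ** 2
```

For instance, with $\mu_{10} = \mu_{20} = 1$ and unit standard deviations, the observation $(X_1, X_2) = (0.5, 2.0)$ yields statistic 0 because $X_1$ does not exceed its null value, while $(3.0, 2.0)$ yields $\min(2, 1)^2 = 1$.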
This completes the proof of Part (b). □

Appendix C. Numerical Calculations of $\hat F_{0,AUC_0}(x)$, $\hat F_{1,AUC_0}(x)$, $\hat F_{0,J_0}(x)$, and $\hat F_{1,J_0}(x)$

For convenience, we write
$$\ell_n(\alpha, \beta) = \ell_n(\theta),$$
where $\ell_n(\theta)$ is the dual empirical log-likelihood function in (4). Further, let
$$\hat\alpha(\beta) = \arg\max_\alpha \ell_n(\alpha, \beta)$$
and
$$\hat p_i(\beta) = \frac{1}{n}\cdot\frac{1}{1-\rho+\rho\exp\{\hat\alpha(\beta)+\beta^T q(z_i)\}}, \quad i = 1,\dots,n,$$
which are the MELEs of $\alpha$ and the $p_i$'s with $\beta$ fixed, respectively. Define
$$\hat F_0(x; \beta) = \sum_{i=1}^n \hat p_i(\beta)\,\mathbb{1}(z_i \le x) \quad\text{and}\quad \hat F_1(x; \beta) = \sum_{i=1}^n \hat q_i(\beta)\,\mathbb{1}(z_i \le x),$$
where $\hat q_i(\beta) = \hat p_i(\beta)\exp\{\hat\alpha(\beta)+\beta^T q(z_i)\}$.
For fixed $\beta$, the MELE of $AUC$ is
$$\widehat{AUC}(\beta) = \frac{1}{2}\sum_{i=1}^n \hat q_i(\beta)\hat F_0(z_i; \beta) + \frac{1}{2}\sum_{i=1}^n \hat p_i(\beta)\{1 - \hat F_1(z_i; \beta)\}.$$
We use the following three steps to find $\hat F_{0,AUC_0}(x)$ and $\hat F_{1,AUC_0}(x)$:
Step 1.
Find all $\beta$'s such that $\widehat{AUC}(\beta) = AUC_0$.
Step 2.
Obtain
$$\tilde\beta_{AUC_0} = \arg\max_\beta \ell_n(\hat\alpha(\beta), \beta),$$
where the maximization is over all $\beta$'s found in Step 1.
Step 3.
Calculate $\hat F_{0,AUC_0}(x) = \hat F_0(x; \tilde\beta_{AUC_0})$ and $\hat F_{1,AUC_0}(x) = \hat F_1(x; \tilde\beta_{AUC_0})$.
Note that $\hat F_{0,J_0}(x)$ and $\hat F_{1,J_0}(x)$ can be obtained similarly. For fixed $\beta$, we obtain $\hat c(\beta)$ by solving the equation
$$\hat\alpha(\beta) + \beta^T q\{\hat c(\beta)\} = 0.$$
With $\hat c(\beta)$, the estimator of $J$ for fixed $\beta$ is defined as
$$\hat J(\beta) = \hat F_0\{\hat c(\beta); \beta\} - \hat F_1\{\hat c(\beta); \beta\}.$$
We use the following three steps to find $\hat F_{0,J_0}(x)$ and $\hat F_{1,J_0}(x)$:
Step 1.
Find all $\beta$'s such that $\hat J(\beta) = J_0$.
Step 2.
Obtain
$$\tilde\beta_{J_0} = \arg\max_\beta \ell_n(\hat\alpha(\beta), \beta),$$
where the maximization is over all $\beta$'s found in Step 1.
Step 3.
Calculate $\hat F_{0,J_0}(x) = \hat F_0(x; \tilde\beta_{J_0})$ and $\hat F_{1,J_0}(x) = \hat F_1(x; \tilde\beta_{J_0})$.

References

  1. Yin, J.; Tian, L. Joint confidence region estimation for area under ROC curve and Youden index. Stat. Med. 2014, 33, 985–1000. [Google Scholar] [CrossRef] [PubMed]
  2. Yin, J.; Mutiso, F.; Tian, L. Joint hypothesis testing of the area under the receiver operating characteristic curve and the Youden index. Pharm. Stat. 2021, 20, 657–674. [Google Scholar] [CrossRef] [PubMed]
  3. Pepe, M.S. Receiver operating characteristic methodology. J. Am. Stat. Assoc. 2000, 95, 308–311. [Google Scholar] [CrossRef]
  4. Qin, J.; Zhang, B. Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 2003, 90, 585–596. [Google Scholar] [CrossRef]
  5. Zhou, X.H.; Obuchowski, N.A.; McClish, D.K. Statistical Methods in Diagnostic Medicine, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
  6. Chen, B.; Li, P.; Qin, J.; Yu, T. Using a monotonic density ratio model to find the asymptotically optimal combination of multiple diagnostic tests. J. Am. Stat. Assoc. 2016, 111, 861–874. [Google Scholar] [CrossRef]
  7. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef]
  8. Faraggi, D.; Reiser, B. Estimation of the area under the ROC curve. Stat. Med. 2002, 21, 3093–3106. [Google Scholar] [CrossRef] [PubMed]
  9. Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35. [Google Scholar] [CrossRef]
  10. Fluss, B.; Faraggi, D.; Reiser, B. Estimation of the Youden Index and its associated cutoff point. Biom. J. 2005, 47, 458–472. [Google Scholar] [CrossRef] [PubMed]
  11. Schisterman, E.F.; Perkins, N.J.; Liu, A.; Bond, H. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 2005, 16, 73–81. [Google Scholar] [CrossRef] [PubMed]
  12. Lavrentieva, A.; Kontakiotis, T.; Lazaridis, L.; Tsotsolis, N.; Koumis, J.; Kyriazis, G.; Bitzani, M. Inflammatory markers in patients with severe burn injury: What is the best indicator of sepsis? Burns 2007, 33, 189–194. [Google Scholar] [CrossRef] [PubMed]
  13. Bantis, L.E.; Nakas, C.T.; Reiser, B. Construction of confidence regions in the ROC space after the estimation of the optimal Youden index-based cut-off point. Biometrics 2014, 70, 212–223. [Google Scholar] [CrossRef] [PubMed]
  14. Wotschofsky, Z.; Busch, J.; Jung, M.; Kempkensteffen, C.; Weikert, S.; Schaser, K.D.; Melcher, I.; Kilic, E.; Miller, K.; Kristiansen, G. Diagnostic and prognostic potential of differentially expressed miRNAs between metastatic and non-metastatic renal cell carcinoma at the time of nephrectomy. Clin. Chim. Acta 2013, 416, 5–10. [Google Scholar] [CrossRef] [PubMed]
  15. Jiang, S.; Tu, D. Inference on the probability P(T1<T2) as a measurement of treatment effect under a density ratio model and random censoring. Comput. Stat. Data Anal. 2012, 56, 1069–1078. [Google Scholar]
  16. Wang, C.; Marriott, P.; Li, P. Testing homogeneity for multiple nonnegative distributions with excess zero observations. Comput. Stat. Data Anal. 2017, 114, 146–157. [Google Scholar] [CrossRef]
  17. Yuan, M.; Li, P.; Wu, C. Semiparametric inference of the Youden index and the optimal cut-off point under density ratio models. Can. J. Stat. 2021, 49, 965–986. [Google Scholar] [CrossRef]
  18. Anderson, J.A. Multivariate logistic compounds. Biometrika 1979, 66, 17–26. [Google Scholar] [CrossRef]
  19. Qin, J.; Zhang, B. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 1997, 84, 609–618. [Google Scholar] [CrossRef]
  20. Qin, J. Biased Sampling, Over-Identified Parameter Problems and Beyond; Springer: Singapore, 2017. [Google Scholar]
  21. Hu, D.; Yuan, M.; Yu, T.; Li, P. Statistical inference for the two-sample problem under likelihood ratio ordering, with application to the ROC curve estimation. Stat. Med. 2023, 42, 3649–3664. [Google Scholar] [CrossRef] [PubMed]
  22. Zhang, B. A semiparametric hypothesis testing procedure for the ROC curve area under a density ratio model. Comput. Stat. Data Anal. 2006, 50, 1855–1876. [Google Scholar] [CrossRef]
  23. Owen, A.B. Empirical Likelihood; Chapman and Hall/CRC: New York, NY, USA, 2001. [Google Scholar]
  24. Cai, S.; Chen, J.; Zidek, J.V. Hypothesis testing in the presence of multiple samples under density ratio models. Stat. Sin. 2017, 27, 761–783. [Google Scholar] [CrossRef]
  25. Hsieh, F.; Turnbull, B.W. Nonparametric methods for evaluating diagnostic tests. Stat. Sin. 1996, 6, 47–62. [Google Scholar]
  26. Percy, M.E.; Andrews, D.F.; Thompson, M.W. Duchenne muscular dystrophy carrier detection using logistic discrimination: Serum creatine kinase, hemopexin, pyruvate kinase, and lactate dehydrogenase in combination. Am. J. Med. Genet. 1982, 13, 27–38. [Google Scholar] [CrossRef] [PubMed]
  27. Andrews, D.F.; Herzberg, A.M. Data: A Collection of Problems from Many Fields for the Student and Research Worker; Springer: New York, NY, USA, 2012. [Google Scholar]
  28. Yin, J.; Samawi, H.; Tian, L. Joint inference about the AUC and Youden index for paired biomarkers. Stat. Med. 2022, 41, 37–64. [Google Scholar] [CrossRef] [PubMed]
  29. Wang, J.; Yin, J.; Tian, L. Evaluating joint confidence region of hypervolume under ROC manifold and generalized Youden index. Stat. Med. 2024, 43, 869–889. [Google Scholar] [CrossRef]
  30. McClish, D.K. Analyzing a portion of the ROC curve. Med. Decis. Mak. 1989, 9, 190–195. [Google Scholar] [CrossRef] [PubMed]
  31. Jiang, Y.; Metz, C.E.; Nishikawa, R.M. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996, 201, 745–750. [Google Scholar] [CrossRef] [PubMed]
  32. Zhang, D.D.; Zhou, X.H.; Freeman, D.H., Jr.; Free, J.L. A non-parametric method for the comparison of partial areas under ROC curves and its application to large health care data sets. Stat. Med. 2002, 21, 701–715. [Google Scholar] [CrossRef]
  33. Dodd, L.E.; Pepe, M.S. Partial AUC estimation and regression. Biometrics 2003, 59, 614–623. [Google Scholar] [CrossRef] [PubMed]
  34. Ma, H.; Bandos, A.I.; Rockette, H.E.; Gur, D. On use of partial area under the ROC curve for evaluation of diagnostic performance. Stat. Med. 2013, 32, 3449–3458. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The 95% confidence regions of $(AUC, J)$ for biomarker H in the DMD dataset based on the BEL (black solid), GPQ (red dashed), and BTAT (blue dot–dashed) methods. The BEL point estimate of $(AUC, J)$ is $(0.753, 0.371)$, which is indicated by the black hollow point at the center of the ellipse.
Table 1. Simulation settings.

Distribution   AUC*   J*   μ0   σ0²   μ1    σ1²
lognormal      0.707  0.3  0    1     0.77  1
               0.830  0.5  0    1     1.35  1
               0.928  0.7  0    1     2.07  1

Distribution   AUC*   J*   a0   b0    a1    b1
beta           0.704  0.3  1.5  3     2.77  3
               0.824  0.5  1.5  3     4.25  3
               0.922  0.7  1.5  3     7.09  3
Table 2. Summary of CP (%) and ACR (×100, in parentheses) of 95% confidence regions for $(AUC, J)$ under lognormal distributions.

(n0, n1)    AUC*   J*   EL            BEL           AD            GPQ           BTI           BTAT
(50, 50)    0.707  0.3  90.20 (0.50)  95.85 (0.71)  98.90 (5.08)  96.70 (1.28)  52.80 (3.08)  55.10 (3.18)
            0.830  0.5  90.80 (0.60)  94.45 (0.81)  97.70 (3.00)  96.40 (0.79)  87.90 (2.29)  89.05 (2.35)
            0.928  0.7  88.55 (0.45)  94.85 (0.85)  96.45 (2.52)  95.50 (0.46)  92.10 (1.22)  93.65 (1.22)
(100, 100)  0.707  0.3  93.40 (0.24)  95.50 (0.29)  99.30 (1.94)  98.20 (0.48)  53.30 (1.67)  54.05 (1.71)
            0.830  0.5  92.65 (0.29)  94.45 (0.34)  98.75 (1.39)  97.85 (0.28)  83.80 (1.26)  84.85 (1.28)
            0.928  0.7  91.35 (0.22)  95.05 (0.29)  97.30 (2.67)  95.70 (0.17)  91.20 (0.68)  92.60 (0.69)
(150, 50)   0.707  0.3  92.60 (0.28)  95.80 (0.37)  99.50 (3.81)  98.20 (0.72)  46.45 (2.13)  48.70 (2.18)
            0.830  0.5  93.10 (0.34)  95.90 (0.43)  99.35 (2.58)  98.05 (0.43)  83.00 (1.58)  84.15 (1.61)
            0.928  0.7  89.85 (0.25)  94.60 (0.38)  97.10 (1.87)  96.60 (0.25)  90.55 (0.82)  92.00 (0.83)
Table 3. Summary of CP (%) and ACR (×100, in parentheses) of 95% confidence regions for $(AUC, J)$ under beta distributions.

(n0, n1)    AUC*   J*   EL            BEL           AD            GPQ           BTI           BTAT
(50, 50)    0.704  0.3  92.00 (0.49)  96.50 (0.64)  96.70 (3.48)  95.00 (1.51)  60.85 (3.15)  65.80 (3.45)
            0.824  0.5  91.25 (0.62)  94.60 (0.80)  89.85 (1.67)  87.20 (1.21)  87.55 (2.44)  88.65 (2.50)
            0.922  0.7  89.85 (0.53)  96.45 (0.93)  62.90 (0.86)  58.90 (0.82)  90.25 (1.42)  91.80 (1.43)
(100, 100)  0.704  0.3  92.80 (0.24)  94.70 (0.27)  96.30 (0.96)  94.70 (0.67)  59.55 (1.72)  60.10 (1.76)
            0.824  0.5  93.15 (0.30)  94.50 (0.34)  86.10 (0.62)  82.40 (0.57)  86.10 (1.35)  86.70 (1.37)
            0.922  0.7  92.65 (0.26)  95.50 (0.32)  44.40 (0.38)  39.05 (0.38)  92.10 (0.81)  93.90 (0.81)
(150, 50)   0.704  0.3  91.90 (0.27)  94.85 (0.33)  96.05 (1.42)  94.25 (0.86)  58.40 (2.03)  58.60 (2.07)
            0.824  0.5  94.20 (0.33)  96.05 (0.39)  85.90 (0.82)  81.85 (0.70)  85.10 (1.50)  85.75 (1.52)
            0.922  0.7  93.20 (0.28)  95.40 (0.34)  43.10 (0.44)  39.85 (0.44)  93.25 (0.81)  94.75 (0.82)
Table 4. Simulated type-I errors and powers (%) of three tests for testing $H_0: AUC \le 0.830$ or $J \le 0.5$ versus $H_a: AUC > 0.830$ and $J > 0.5$ under lognormal distributions at the 0.05 significance level.

(AUC*, J*)    (n0, n1)  BELT   PBA    NKS
(0.830, 0.5)  (50, 50)  5.60   2.65   0.25
              (75, 50)  5.60   2.40   0.10
              (75, 75)  5.50   2.35   0.05
(0.928, 0.7)  (50, 50)  90.85  58.35  24.45
              (75, 50)  96.35  66.25  23.10
              (75, 75)  98.25  70.75  37.25
Table 5. Simulated type-I errors and powers (%) of three tests for testing $H_0: AUC \le 0.824$ or $J \le 0.5$ versus $H_a: AUC > 0.824$ and $J > 0.5$ under beta distributions at the 0.05 significance level.

(AUC*, J*)    (n0, n1)  BELT   PBA    NKS
(0.824, 0.5)  (50, 50)  5.65   7.30   3.05
              (75, 50)  5.40   8.75   2.30
              (75, 75)  5.05   7.90   2.45
(0.922, 0.7)  (50, 50)  91.00  94.00  81.55
              (75, 50)  95.70  97.60  90.25
              (75, 75)  97.70  99.05  93.90
Table 6. Point estimates (PEs) of $(AUC, J)$ and the ACRs (×100) for $(AUC, J)$ at the 95% confidence level based on the BEL, GPQ, and BTAT methods.

Biomarker        BEL             GPQ             BTAT
PK   PE          (0.824, 0.483)  (0.803, 0.481)  (0.814, 0.511)
     ACR         0.442           0.836           1.743
H    PE          (0.753, 0.371)  (0.748, 0.374)  (0.760, 0.421)
     ACR         0.332           0.573           1.772

