*2.2. The Discrepancy Comparison Probability and Bootstrap Discrepancy Comparison Probability*

Suppose that we have two nested models that are formulated to characterize the sample *y*, and we designate one of the models as the null, represented by $\theta_1$, and the other as the alternative, represented by $\theta_2$. The discrepancies under the fitted null and alternative models are given by $d(g, \hat{\theta}_1)$ and $d(g, \hat{\theta}_2)$, respectively. We can use these discrepancies to define the Kullback–Leibler discrepancy comparison probability (KLDCP), which is given by

$$P = \Pr[d(g, \hat{\theta}_1) < d(g, \hat{\theta}_2)].$$

The KLDCP evaluates the probability that the fitted null model is closer to the true data-generating model than the fitted alternative. The values of $d(g, \hat{\theta}_1)$ and $d(g, \hat{\theta}_2)$ are calculated from the same sample. For example, a KLDCP of 0.8 means that the fitted null has a smaller discrepancy than the fitted alternative in 80% of samples of the same size drawn from the same distribution. The development and interpretation of the KLDCP are presented in depth by Riedle, Neath and Cavanaugh [1].

We can estimate the KLDCP using the bootstrap approximation of the joint distribution of $d(g, \hat{\theta}_1)$ and $d(g, \hat{\theta}_2)$. The bootstrap joint distribution is based on the discrepancy estimators that arise from the "plug-in" principle, as described by Efron and Tibshirani [2], which replaces all the elements of the KLD by their bootstrap analogues. Specifically, we replace *g* by the empirical distribution $\hat{g}$; *y* by the bootstrap sample from $\hat{g}$, which we call $y^*$; and finally, $\hat{\theta}$ by the maximum likelihood estimate (MLE) derived under the bootstrap sample $y^*$, which we call $\hat{\theta}^*$. With these replacements, the bootstrap version of the KLD is given by

$$\begin{aligned} d(\hat{g}, \hat{\theta}^*) &= \left. E_{\hat{g}}[-2\,\ell(\theta \mid y)] \right|_{\theta=\hat{\theta}^*} \\ &= \sum_{i=1}^{n} -2\,\ell_i(\hat{\theta}^* \mid y_i) \quad \text{(because each } y_i \text{ is independent)} \\ &= -2\,\ell(\hat{\theta}^* \mid y), \end{aligned}$$

where $\ell_i$ represents the contribution to the log-likelihood based on the *i*th response $y_i$.
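To make the plug-in computation concrete, the following minimal Python sketch evaluates $d(\hat{g}, \hat{\theta}^*) = -2\,\ell(\hat{\theta}^* \mid y)$ for a single bootstrap sample, assuming (purely for illustration) a normal model with unknown mean and standard deviation; the helper names `neg2_loglik_normal` and `mle_normal` are hypothetical and not part of the original development. The point to note is that the bootstrap MLE $\hat{\theta}^*$ is evaluated against the original sample *y*, not against the bootstrap sample itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def neg2_loglik_normal(theta, y):
    """-2 * log-likelihood of an i.i.d. normal sample y at theta = (mu, sigma)."""
    mu, sigma = theta
    return -2.0 * np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

def mle_normal(y):
    """MLE of (mu, sigma) under the normal model (np.std uses ddof=0, the MLE)."""
    return np.mean(y), np.std(y)

# Observed sample y and one bootstrap sample y* drawn from the empirical distribution g-hat.
y = rng.normal(loc=1.0, scale=2.0, size=100)
y_star = rng.choice(y, size=y.size, replace=True)

# Plug-in bootstrap discrepancy: d(g-hat, theta-hat*) = -2 * l(theta-hat* | y),
# i.e., the bootstrap MLE is scored against the ORIGINAL sample y.
theta_star = mle_normal(y_star)
d_star = neg2_loglik_normal(theta_star, y)
print(d_star)
```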

Now, in order to build a bootstrap distribution, we must draw various bootstrap samples from *y*. Suppose that we draw $j = 1, 2, \dots, J$ bootstrap samples, and for each of these samples, we calculate the MLE of $\theta$, which we denote as $\hat{\theta}^*(j)$. This allows us to obtain a set of *J* different bootstrap discrepancies; this set is defined as

$$\{ d(\hat{g}, \hat{\theta}^*(j)) : j = 1, \dots, J \},$$

and these variates can be used to construct the bootstrap analogue of the discrepancy distribution.
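As an illustration of this step, the sketch below (again assuming a normal model chosen only for demonstration) draws *J* bootstrap samples and collects the corresponding set of bootstrap discrepancies; variable names such as `boot_discrepancies` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def neg2_loglik_normal(theta, y):
    """-2 * log-likelihood of an i.i.d. normal sample y at theta = (mu, sigma)."""
    mu, sigma = theta
    return -2.0 * np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

y = rng.normal(loc=1.0, scale=2.0, size=100)  # observed sample
J = 1000                                      # number of bootstrap samples

boot_discrepancies = np.empty(J)
for j in range(J):
    y_star = rng.choice(y, size=y.size, replace=True)          # bootstrap sample from g-hat
    theta_star = (np.mean(y_star), np.std(y_star))             # bootstrap MLE theta-hat*(j)
    boot_discrepancies[j] = neg2_loglik_normal(theta_star, y)  # d(g-hat, theta-hat*(j))

# boot_discrepancies now characterizes the bootstrap analogue of the discrepancy distribution.
```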

Finally, we can extend this procedure to the setting of the null and alternative models. For each bootstrap sample, we calculate $\hat{\theta}_1^*(j)$ and $\hat{\theta}_2^*(j)$, which are the bootstrap sample MLEs of $\theta_1$ and $\theta_2$, respectively. We then compute the discrepancies $d(\hat{g}, \hat{\theta}_1^*(j))$ and $d(\hat{g}, \hat{\theta}_2^*(j))$ for the null and alternative models, respectively. This collection of *J* pairs of null and alternative bootstrap discrepancies defines the set

$$\{ (d(\hat{g}, \hat{\theta}_1^*(j)), d(\hat{g}, \hat{\theta}_2^*(j))) : j = 1, \dots, J \},$$

which characterizes the bootstrap analogue of the joint distribution of $d(g, \hat{\theta}_1)$ and $d(g, \hat{\theta}_2)$. This bootstrap distribution can be used to estimate the bootstrap analogue of the DCP, given by

$$P^* = \Pr{}^*[d(\hat{g}, \hat{\theta}_1^*) < d(\hat{g}, \hat{\theta}_2^*)].$$

By the law of large numbers, we can approximate $P^*$ by calculating the proportion of times that $d(\hat{g}, \hat{\theta}_1^*(j)) < d(\hat{g}, \hat{\theta}_2^*(j))$ in the *J* bootstrap samples that were drawn. Thus, if *I* is an indicator function, we can define an estimator of the DCP, which we call the bootstrap discrepancy comparison probability (BDCP), as follows:

$$\text{BDCP} = \frac{1}{J} \sum_{j=1}^{J} I\big[d(\hat{g}, \hat{\theta}_1^*(j)) < d(\hat{g}, \hat{\theta}_2^*(j))\big]. \tag{1}$$
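The following sketch puts the pieces together for a hypothetical nested pair of normal models, with the null fixing the mean at zero and the alternative leaving it free; this model choice and the helper names (`mle_null`, `mle_alt`) are illustrative assumptions, not part of the original development. The BDCP of equation (1) is then simply the proportion of bootstrap replicates in which the null discrepancy is smaller.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def neg2_loglik(mu, sigma, y):
    """-2 * log-likelihood of an i.i.d. normal sample y."""
    return -2.0 * np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

# Hypothetical nested pair: the null fixes mu = 0, the alternative leaves mu free.
def mle_null(y):
    return 0.0, np.sqrt(np.mean(y**2))   # MLE of sigma with mu fixed at 0

def mle_alt(y):
    return np.mean(y), np.std(y)          # unrestricted MLEs (ddof=0)

y = rng.normal(loc=0.3, scale=1.0, size=50)  # observed sample
J = 2000                                     # number of bootstrap samples

indicator = np.empty(J, dtype=bool)
for j in range(J):
    y_star = rng.choice(y, size=y.size, replace=True)  # bootstrap sample from g-hat
    mu1, s1 = mle_null(y_star)                         # theta-hat*_1(j)
    mu2, s2 = mle_alt(y_star)                          # theta-hat*_2(j)
    d1 = neg2_loglik(mu1, s1, y)                       # d(g-hat, theta-hat*_1(j))
    d2 = neg2_loglik(mu2, s2, y)                       # d(g-hat, theta-hat*_2(j))
    indicator[j] = d1 < d2

bdcp = indicator.mean()                                # equation (1)
print(f"BDCP = {bdcp:.3f}")
```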
