**3. Bias Corrections for the BDCP**

An important issue that arises in the bootstrap estimation of the KLD is the negative bias of the discrepancy estimators induced by the "plug-in" principle. The following lemma establishes and quantifies this bias in large-sample settings under an appropriately specified candidate model.

**Lemma 1.** *For a large sample size, assuming that the candidate model subsumes the true model, we have*

$$E\_{\mathfrak{F}}\left\{E\_{\*}\left[-2\ell(\hat{\theta}^{\*}|y)\right]\right\} \approx E\_{\mathfrak{F}}[d(\mathfrak{g},\hat{\theta})] - k,$$

*where* $E\_\*$ *is the expectation with respect to the bootstrap distribution, and* $k$ *is the dimension of the model.*

**Proof.** For a maximum likelihood estimator $\hat{\theta}$, it is well known that for a large sample size and under certain regularity conditions, we have

$$(\hat{\theta} - \theta\_0)^T I(\hat{\theta}|y)(\hat{\theta} - \theta\_0) \sim \chi^2\_{k} \tag{2}$$

provided that the model is adequately specified. In the preceding, $\chi^2\_k$ denotes a centrally distributed chi-square random variable with $k$ degrees of freedom, and $\theta\_0$ denotes the true value of the parameter.
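As a quick sanity check on the approximation in (2), the following sketch (not from the paper; the normal model and all settings are arbitrary illustrative choices) fits a normal distribution with unknown mean and variance, so that $k = 2$, and verifies that the quadratic form averages to roughly $k$ across repeated samples:

```python
# Simulation sketch: the quadratic form in (2) should average to about k = 2
# for a correctly specified normal model with unknown mean and variance.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
mu0, sig2_0 = 1.0, 4.0                            # true parameter theta_0

qforms = np.empty(reps)
for r in range(reps):
    y = rng.normal(mu0, np.sqrt(sig2_0), n)
    mu_hat, sig2_hat = y.mean(), y.var()          # MLEs of (mu, sigma^2)
    # Fisher information of (mu, sigma^2), evaluated at the MLE
    info = np.diag([n / sig2_hat, n / (2 * sig2_hat**2)])
    d = np.array([mu_hat - mu0, sig2_hat - sig2_0])
    qforms[r] = d @ info @ d                      # quadratic form in (2)

print(round(qforms.mean(), 2))                    # should be close to k = 2
```

The empirical mean of the quadratic form tracks the $\chi^2\_k$ mean of $k$, consistent with (2).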

Now, consider the second-order Taylor series expansion of $-2\ell(\hat{\theta}^\*|y)$ about $\hat{\theta}$ (the first-order term vanishes because $\hat{\theta}$ maximizes $\ell(\cdot|y)$), which results in

$$-2\ell(\hat{\theta}^\*|y) \approx -2\ell(\hat{\theta}|y) + (\hat{\theta}^\* - \hat{\theta})^T I(\hat{\theta}|y)(\hat{\theta}^\* - \hat{\theta}).\tag{3}$$

By taking the expected value of both sides of (3) with respect to the bootstrap distribution of $\hat{\theta}^\*$, and noting that the bootstrap analogue of (2) holds for $\hat{\theta}^\*$ centered at $\hat{\theta}$, we obtain

$$\begin{split} E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right) &\approx -2\ell(\hat{\theta}|y) + E\_\*\left((\hat{\theta}^\*-\hat{\theta})^T I(\hat{\theta}|y)(\hat{\theta}^\*-\hat{\theta})\right) \\ &\approx -2\ell(\hat{\theta}|y) + k \quad \text{(by the approximation in (2))} \\ &= \mathrm{AIC} - k, \end{split}$$

where $\mathrm{AIC} = -2\ell(\hat{\theta}|y) + 2k$ denotes the Akaike information criterion.

Finally, it has been established that if the true model is contained in the candidate class at hand, and if the large sample properties of MLEs hold, then AIC serves as an asymptotically unbiased estimator of the KLD. Thus,

$$\begin{aligned} E\_{\mathfrak{F}}\left(E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right)\right) &\approx E\_{\mathfrak{F}}(\mathrm{AIC}) - k\\ &\approx E\_{\mathfrak{F}}(d(\mathfrak{g}, \hat{\theta})) - k. \end{aligned}$$

The preceding expression can be re-written as

$$E\_{\mathfrak{F}}(d(\mathfrak{g}, \hat{\theta})) \approx E\_{\mathfrak{F}}\left(E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right)\right) + k,$$

which implies that the bias correction *k* must be added to the bootstrap discrepancy in the estimation of the KLD. The BD estimator corrected by the addition of *k* will be called BDk.
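The relation $E\_\*(-2\ell(\hat{\theta}^\*|y)) \approx \mathrm{AIC} - k$ underlying this correction can be checked numerically. The sketch below is illustrative only (a normal mean model with known unit variance, so $k = 1$; the sample size and number of replicates are hypothetical choices), using a parametric bootstrap:

```python
# Parametric-bootstrap check that the average of -2 l(theta*_hat | y)
# is close to AIC - k for a correctly specified N(mu, 1) model (k = 1).
import numpy as np

rng = np.random.default_rng(1)
n, J = 100, 5000
y = rng.normal(0.5, 1.0, n)

def neg2loglik(mu, data):
    # -2 log-likelihood under N(mu, 1)
    return len(data) * np.log(2 * np.pi) + np.sum((data - mu) ** 2)

mu_hat = y.mean()
aic = neg2loglik(mu_hat, y) + 2 * 1          # AIC = -2 l(theta_hat | y) + 2k

# Refit on each parametric bootstrap sample y*, then evaluate -2 l at the
# bootstrap MLE on the ORIGINAL data y
boot = np.empty(J)
for j in range(J):
    y_star = rng.normal(mu_hat, 1.0, n)
    boot[j] = neg2loglik(y_star.mean(), y)

print(round(boot.mean() - (aic - 1), 2))     # difference should be near 0
```

The bootstrap average lands near $\mathrm{AIC} - k$, so adding $k$ back recovers an (approximately) unbiased KLD estimate, as the lemma asserts.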

Now, focus again on Equation (3). By subtracting $-2\ell(\hat{\theta}|y)$ from both sides of the equation, we obtain

$$-2\ell(\hat{\theta}^\*|y) - (-2\ell(\hat{\theta}|y)) \approx (\hat{\theta}^\* - \hat{\theta})^T I(\hat{\theta}|y)(\hat{\theta}^\* - \hat{\theta}).\tag{4}$$

As mentioned previously, if the candidate model is adequately specified, then the distributional approximation in (2) holds true. However, if this model specification assumption is not met, then we can utilize the approximation in (4) to find a suitable bias correction via the bootstrap. The bootstrap has been used for bias corrections in similar problem contexts [3,4].

By applying the expected value with respect to the bootstrap distribution of $\hat{\theta}^\*$ to both sides of (4), we obtain

$$E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right) - \left(-2\ell(\hat{\theta}|y)\right) \approx E\_\*\left((\hat{\theta}^\*-\hat{\theta})^T I(\hat{\theta}|y)(\hat{\theta}^\*-\hat{\theta})\right).\tag{5}$$

The goal is then to find an approximation of $E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right) - \left(-2\ell(\hat{\theta}|y)\right)$. Note that by the law of large numbers, as $J \to \infty$,

$$\frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) \to E\_\*\left(-2\ell(\hat{\theta}^\*|y)\right).$$

Thus, for *J* → ∞, we can assert

$$\frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) - (-2\ell(\hat{\theta}|y)) \to E\_\*( -2\ell(\hat{\theta}^\*|y)) - (-2\ell(\hat{\theta}|y)).$$

The preceding result shows that $\frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) - (-2\ell(\hat{\theta}|y))$ serves as an asymptotically unbiased estimator of $E\_\*(-2\ell(\hat{\theta}^\*|y)) - (-2\ell(\hat{\theta}|y))$. We therefore propose using

$$k\_b = \frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) - \left(-2\ell(\hat{\theta}|y)\right)$$

as a bootstrap-based correction of the BD. A more in-depth derivation and exploration of the $k\_b$ correction can be found in Cavanaugh and Shumway [5].

Subsequently, the bootstrap approximation of the KLD with a bootstrap-based bias correction is expressed by $E\_\*(-2\ell(\hat{\theta}^\*|y)) + k\_b$, and is estimated by

$$\text{BDb} = \frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) + k\_b.$$
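The appeal of $k\_b$ is that it adapts when the candidate model is misspecified. The sketch below (a hypothetical setup, not the paper's code) fits $N(\mu, 1)$, so $k = 1$, to data whose true standard deviation is 1.5; the bootstrap correction then tracks the larger actual variability rather than the nominal parameter count:

```python
# Sketch of the bootstrap-based correction k_b and the corrected BDb.
# The fitted model N(mu, 1) is deliberately misspecified (true sd = 1.5),
# so k_b need not be close to k = 1.
import numpy as np

rng = np.random.default_rng(2)
n, J = 200, 4000
y = rng.normal(0.0, 1.5, n)                     # true sd 1.5, model assumes 1

def neg2loglik(mu, data):
    # -2 log-likelihood under N(mu, 1)
    return len(data) * np.log(2 * np.pi) + np.sum((data - mu) ** 2)

mu_hat = y.mean()
reps = np.empty(J)
for j in range(J):
    ys = y[rng.integers(0, n, n)]               # nonparametric bootstrap sample
    reps[j] = neg2loglik(ys.mean(), y)          # -2 l(theta*_hat(j) | y)

k_b = reps.mean() - neg2loglik(mu_hat, y)       # bootstrap bias correction
bd_b = reps.mean() + k_b                        # BDb
print(round(k_b, 2))                            # typically near 1.5**2, not k = 1
```

Under correct specification $k\_b$ would concentrate near $k$; here it inflates toward the true error variance, which is exactly the robustness motivating the bootstrap correction.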

It follows that the bootstrap bias-corrected BDCP would be defined as

$$\text{BDCPb} = \frac{1}{J} \sum\_{j=1}^{J} I \left[ d(\mathfrak{g}, \hat{\theta}\_1^\*(j)) + k\_{1b} < d(\mathfrak{g}, \hat{\theta}\_2^\*(j)) + k\_{2b} \right],\tag{6}$$

where $k\_{1b}$ and $k\_{2b}$ correspond to the bootstrap-based corrections for the null and alternative models, respectively.
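A sketch of (6) follows. The two nested normal models, the use of $-2\ell\_i(\hat{\theta}\_i^\*(j)|y)$ as the working per-replicate estimate of $d(\mathfrak{g}, \hat{\theta}\_i^\*(j))$, and all settings are hypothetical illustrative choices, not the paper's implementation:

```python
# Sketch of BDCPb: Model 1 is N(mu, 1), Model 2 is N(mu, sigma^2).
# Each replicate's discrepancy d(g, .) is approximated by the bootstrap
# -2 log-likelihood evaluated on the original data y.
import numpy as np

rng = np.random.default_rng(3)
n, J = 100, 1000
y = rng.normal(0.3, 1.0, n)                    # Model 1 is correctly specified

def neg2ll_m1(mu, data):                       # -2 l under N(mu, 1)
    return len(data) * np.log(2 * np.pi) + np.sum((data - mu) ** 2)

def neg2ll_m2(mu, s2, data):                   # -2 l under N(mu, sigma^2)
    return len(data) * np.log(2 * np.pi * s2) + np.sum((data - mu) ** 2) / s2

d1 = np.empty(J); d2 = np.empty(J)
for j in range(J):
    ys = y[rng.integers(0, n, n)]              # nonparametric bootstrap sample
    d1[j] = neg2ll_m1(ys.mean(), y)
    d2[j] = neg2ll_m2(ys.mean(), ys.var(), y)

k1b = d1.mean() - neg2ll_m1(y.mean(), y)       # bootstrap correction, model 1
k2b = d2.mean() - neg2ll_m2(y.mean(), y.var(), y)
bdcp_b = np.mean(d1 + k1b < d2 + k2b)          # proportion favoring model 1
print(round(bdcp_b, 2))
```

The resulting proportion estimates the probability that the corrected discrepancy prefers the null model over the alternative.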

Similarly, the *k* bias-corrected BD is expressed as

$$\text{BDk} = \frac{1}{J} \sum\_{j=1}^{J} -2\ell(\hat{\theta}^\*(j)|y) + k,$$

and the *k* bias-corrected BDCP is given by

$$\text{BDCPk} = \frac{1}{J} \sum\_{j=1}^{J} I \left[ d(\mathfrak{g}, \hat{\theta}\_1^\*(j)) + k\_1 < d(\mathfrak{g}, \hat{\theta}\_2^\*(j)) + k\_2 \right],\tag{7}$$

where $k\_1$ and $k\_2$ are the numbers of functionally independent parameters that define the null and alternative models, respectively.
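For comparison with (6), the BDCPk variant in (7) can be sketched with the same hypothetical nested normal models, swapping the bootstrap corrections for the fixed parameter counts $k\_1 = 1$ and $k\_2 = 2$:

```python
# Compact sketch of BDCPk: corrections are the parameter counts,
# k1 = 1 for N(mu, 1) and k2 = 2 for N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(4)
n, J = 100, 1000
y = rng.normal(0.0, 1.0, n)

def neg2ll_m1(mu, data):                        # -2 l under N(mu, 1)
    return len(data) * np.log(2 * np.pi) + np.sum((data - mu) ** 2)

def neg2ll_m2(mu, s2, data):                    # -2 l under N(mu, sigma^2)
    return len(data) * np.log(2 * np.pi * s2) + np.sum((data - mu) ** 2) / s2

wins = 0
for j in range(J):
    ys = y[rng.integers(0, n, n)]               # nonparametric bootstrap sample
    d1 = neg2ll_m1(ys.mean(), y) + 1            # + k1
    d2 = neg2ll_m2(ys.mean(), ys.var(), y) + 2  # + k2
    wins += d1 < d2
bdcp_k = wins / J                               # proportion favoring model 1
print(round(bdcp_k, 2))
```

The only change from the BDCPb sketch is the correction term, which makes BDCPk cheaper but tied to the adequacy of the $\chi^2\_k$ approximation in (2).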
