*2.1. Background*

When faced with the task of choosing amongst competing models, statisticians often use discrepancy or divergence functions. One of the most flexible and ubiquitous divergence measures is the Kullback–Leibler information. To introduce this measure in the present context, consider a vector of independent observations $y = (y_1, y_2, \ldots, y_n)^T$ such that $y$ is generated from an unknown distribution $g(y)$. Suppose that a candidate
model *f*(*y*|*θ*) is proposed as an approximation for *g*(*y*), and that this model belongs to the parametric class of densities

$$F = \{\, f(y|\theta) : \theta \in \Theta \,\},$$

where Θ is the parameter space for *θ*. The Kullback–Leibler information, given by

$$I_{KL}(g, \theta) = E_g\left[ \log \frac{g(y)}{f(y|\theta)} \right],$$

captures the separation between the proposed model *f*(*y*|*θ*) and the true data-generating model *g*(*y*).

Although not a formal metric, $I_{KL}(g, \theta)$ is characterized by two desirable properties. First, by Jensen's inequality, $I_{KL}(g, \theta) \ge 0$, with equality if and only if $g(y) = f(y|\theta)$. Second, as the dissimilarity between $g(y)$ and $f(y|\theta)$ increases, $I_{KL}(g, \theta)$ increases accordingly.
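To make these two properties concrete, the following sketch (illustrative only; the choice of a standard normal $g$ and normal candidates $f$ is our assumption, not part of the development) evaluates $I_{KL}(g, \theta)$ by numerical quadrature in Python:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def kl_information(g_pdf, f_pdf):
    """I_KL(g, theta) = E_g[log(g(y) / f(y|theta))], by numerical quadrature."""
    integrand = lambda y: g_pdf(y) * (np.log(g_pdf(y)) - np.log(f_pdf(y)))
    value, _ = quad(integrand, -10.0, 10.0)  # N(0,1) mass outside (-10, 10) is negligible
    return value

g_pdf = stats.norm(0.0, 1.0).pdf                  # assumed true model g = N(0, 1)
for mu in [0.0, 0.5, 1.0, 2.0]:
    f_pdf = stats.norm(mu, 1.0).pdf               # candidate model f = N(mu, 1)
    print(f"mu = {mu:3.1f}   I_KL = {kl_information(g_pdf, f_pdf):.4f}")
# Matches the closed form KL(N(0,1) || N(mu,1)) = mu^2 / 2: zero when the
# candidate coincides with g, and increasing as the candidate mean drifts away.
```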

Note that we can write

$$\begin{aligned} 2I_{KL}(g, \theta) &= E_g[-2\log(f(y|\theta))] - E_g[-2\log(g(y))] \\ &= E_g[-2\,\ell(\theta|y)] - E_g[-2\log(g(y))], \end{aligned}$$

where $\ell(\theta|y) = \log(f(y|\theta))$ denotes the log-likelihood. In the preceding relation, for any proposed candidate model, the quantity $E_g[-2\log(g(y))]$ is constant. Only the quantity $E_g[-2\,\ell(\theta|y)]$ changes across different models, which means it is the only quantity needed to distinguish among various models. The expression

$$d(g, \theta) = E_g[-2\,\ell(\theta|y)]$$

is known as the Kullback–Leibler discrepancy (KLD) and is often used as a substitute for $I_{KL}(g, \theta)$.
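The practical consequence of this decomposition can be checked numerically. In the hedged sketch below (again assuming a known $g = N(0,1)$ and normal candidates), the KLD is estimated by Monte Carlo, and subtracting the model-independent constant $E_g[-2\log(g(y))]$ recovers $2I_{KL}(g, \theta)$, so both criteria rank the candidate models identically:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 25, 20_000
y = rng.normal(0.0, 1.0, size=(reps, n))   # Monte Carlo draws from g = N(0, 1)

def kld_hat(mu):
    """Monte Carlo estimate of d(g, theta) = E_g[-2 l(theta|y)] for f = N(mu, 1)."""
    loglik = stats.norm(mu, 1.0).logpdf(y).sum(axis=1)  # l(theta|y), one value per draw
    return np.mean(-2.0 * loglik)

# E_g[-2 log g(y)] is the same for every candidate; subtracting it gives 2 I_KL.
const = np.mean(-2.0 * stats.norm(0.0, 1.0).logpdf(y).sum(axis=1))
for mu in [0.0, 0.5, 1.0]:
    d = kld_hat(mu)
    print(f"mu = {mu:3.1f}   d(g, theta) = {d:7.2f}   2 I_KL ~= {d - const:5.2f}")
# Here 2 I_KL is close to n * mu^2, and both quantities order the candidates identically.
```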

In practice, the goal is to determine the propriety of fitted models of the form $f(y|\hat{\theta})$, where $\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta}\, \ell(\theta|y)$. The KL discrepancy for the fitted model is given by

$$d(g, \hat{\theta}) = E_g[-2\,\ell(\theta|y)]\big|_{\theta=\hat{\theta}}.$$
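A minimal sketch of this fitted-model discrepancy follows, under the toy assumption that $g$ is known so the outer expectation can be simulated; note that $\theta$ is held fixed at $\hat{\theta}$ inside the expectation, which is taken over fresh data from $g$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 25
y = rng.normal(0.0, 1.0, size=n)        # observed sample from the true model g = N(0, 1)

# MLEs for the assumed normal candidate f = N(mu, sigma^2).
mu_hat, sigma_hat = y.mean(), y.std(ddof=0)

# Outer expectation over fresh data y* ~ g, with theta held fixed at theta_hat.
reps = 20_000
y_star = rng.normal(0.0, 1.0, size=(reps, n))
loglik = stats.norm(mu_hat, sigma_hat).logpdf(y_star).sum(axis=1)
print(f"theta_hat = ({mu_hat:.3f}, {sigma_hat:.3f}),  "
      f"d(g, theta_hat) ~= {np.mean(-2.0 * loglik):.2f}")
# Simulating from g is possible only in this toy setting; in applications g is
# unknown, so d(g, theta_hat) must itself be estimated (e.g., via the bootstrap).
```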
