**Abbreviations**

The following abbreviations are used in this manuscript:


### **Appendix A. Proofs**

### *Appendix A.1. Proof for Proposition 1*

**Proof.** By construction of the criterion, as stated in Assumption 1, $\arg\min\_{\theta \in \Theta} Q\_{\infty}(\theta)$ is its minimizer and, since $\theta\_{0} \in \Theta$ by assumption, it is also equal to $\theta\_{0}$. Hence, item 2 is equivalent to item 1 by definition under correct specification.

The equivalence of the deterministic limit criterion (item 2) and a function describing the divergence of the underlying probability measures of $\mathbf{w}$ and $\hat{\mathbf{w}}$ (item 4) is assumed. However, given a limit criterion function $Q\_{\infty} : \Theta \to \mathbb{R}$ and a flexible definition of divergence (e.g., a pre-metric, such as the KL-divergence), it is often possible to find a divergence $d\_{\mathcal{P}} : \mathcal{P}\_{\Theta} \times \mathcal{P}\_{\Theta} \to \mathbb{R}\_{\geq 0}$ on the space of probability measures satisfying $\arg\min\_{\theta \in \Theta} d\_{\mathcal{P}}(P^{\mathbf{w}}, P^{\hat{\mathbf{w}}\_{\theta}}) = \arg\min\_{\theta \in \Theta} Q\_{\infty}(\theta)$. The KL-divergence example is provided in this paper in the context of the maximum likelihood criterion.

By the assumption that $r$ exists, the minimizer of the deterministic limit criterion, which minimizes the divergence, is also the minimizer of a distance metric $d^{\ast}\_{\mathcal{P}}(P^{\mathbf{w}}, P^{\hat{\mathbf{w}}\_{\theta}})$; hence item 4 is also equivalent to item 2.

Finally, since $f\_{W} : \Theta \to \mathcal{F}\_{\Theta}(W)$ is injective, $d^{\ast}\_{\mathcal{P}}(P^{\mathbf{w}}, P^{\hat{\mathbf{w}}\_{\theta}}) \equiv d^{\ast}\_{\mathcal{F}}(f^{\mathbf{w}}, f(\cdot, \theta))$ for all $\theta \in \Theta$, and $d^{\ast}\_{\mathcal{F}}$ is a metric on $\mathcal{F}\_{\Theta}(W)$; therefore $\theta\_{0}$ is also the minimizer of $d^{\ast}\_{\mathcal{F}}(f^{\mathbf{w}}, f(\cdot, \theta))$ over $\theta \in \Theta$, so that item 3 is equivalent to item 2.
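As an illustrative numerical sketch (not part of the proof, and under an assumed Gaussian specification chosen only for this example), the equivalence of the criterion minimizer and the divergence minimizer can be checked for the maximum likelihood criterion: with $\mathbf{w} \sim N(0, 1)$ and model family $N(\theta, 1)$, the limit criterion $Q\_{\infty}(\theta) = E[-\ln p(\mathbf{w}|\theta)]$ and the KL-divergence differ only by the entropy of $P^{\mathbf{w}}$, a constant in $\theta$, so they share the minimizer $\theta\_{0} = 0$.

```python
import math

# Hypothetical example: w ~ N(0, 1), model family N(theta, 1).
# Q_inf(theta) = E[-ln p(w | theta)] and KL(P_w || P_w_hat_theta) differ
# only by the (theta-independent) entropy of P_w, so they share one argmin.

def q_inf(theta, mu0=0.0):
    # Closed form of E[-ln N(w; theta, 1)] under w ~ N(mu0, 1):
    # 0.5*ln(2*pi) + 0.5*(1 + (mu0 - theta)^2)
    return 0.5 * math.log(2 * math.pi) + 0.5 * (1.0 + (mu0 - theta) ** 2)

def kl(theta, mu0=0.0):
    # Closed form of KL(N(mu0, 1) || N(theta, 1)) = (mu0 - theta)^2 / 2
    return (mu0 - theta) ** 2 / 2.0

grid = [i / 100.0 for i in range(-300, 301)]
argmin_q = min(grid, key=q_inf)
argmin_kl = min(grid, key=kl)
assert argmin_q == argmin_kl == 0.0  # both recover theta_0 = 0
```

The grid search is crude but suffices here: the two objectives differ by a constant in $\theta$, so any common grid yields the same minimizer.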

### *Appendix A.2. Proof for Proposition 2*

**Proof.** The result follows immediately from the arguments used in the proof of Proposition 1, dropping only the first equivalence.

### *Appendix A.3. Proof for Proposition 3*

**Proof.** First, the Hellinger distance is

$$H(P^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}}), P^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})) = \sqrt{\frac{1}{2} \int \left(\sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})} - \sqrt{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}\right)^{2} d\mathbf{w}};$$

hence,

$$\left( H(P^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}}), P^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})) \right)^2 = \frac{1}{2} \int \left( \sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})} - \sqrt{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})} \right)^2 d\mathbf{w}.$$

Now, expanding the square, the R.H.S. can be written as

$$\frac{1}{2}\int p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})d\mathbf{w} + \frac{1}{2}\int p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})d\mathbf{w} - \int \sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}d\mathbf{w}.$$

The integral of a probability density over its domain equals 1, so the first two terms sum to 1 and the expression can be rewritten as

$$1 - \int \sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}d\mathbf{w}.$$

This has an upper bound, provided by the inequality $1 - x \leq -\ln x$ for $x > 0$:

$$1 - \int \sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}d\mathbf{w} \leq -\ln\int \sqrt{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}d\mathbf{w}.$$

Write the R.H.S. as

$$-\ln\int\left[\sqrt{\frac{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}}\,p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})\right]d\mathbf{w}$$

to obtain the upper bound

$$-\ln\int\left[\sqrt{\frac{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}}\,p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})\right]d\mathbf{w} \leq -\int\left[\ln\sqrt{\frac{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}}\,p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})\right]d\mathbf{w}$$

by applying Jensen's inequality to the convex function $-\ln(\cdot)$. The inequality applies in this integral form because, for any random variable whose distribution admits a probability density function, the expected value is represented by the integral against that density over its full range.

Finally, rewrite the R.H.S., using $-\ln\sqrt{x} = \frac{1}{2}\ln\frac{1}{x}$, as

$$-\int \left[ \ln \sqrt{\frac{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}}\, p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}}) \right] d\mathbf{w} = \frac{1}{2}\int \left[ \ln \frac{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}\, p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}}) \right] d\mathbf{w}$$

and conclude that the last expression equals one half of the Kullback–Leibler divergence, by the definition of the latter:

$$\frac{1}{2}\int \left[\ln \frac{p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})}{p^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})}\, p^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}})\right] d\mathbf{w} \equiv \frac{1}{2}\,KL\left(P^{\mathbf{w}}(\mathbf{w}|\boldsymbol{\theta}\_{\mathbf{w}}) \,||\, P^{\hat{\mathbf{w}}}(\mathbf{w}|\boldsymbol{\theta}\_{\hat{\mathbf{w}}})\right).$$
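As a numerical sanity check (illustration only; the two Gaussian densities below are assumed for the example and stand in for $p^{\mathbf{w}}$ and $p^{\hat{\mathbf{w}}}$), each step of the chain of bounds in the proof, squared Hellinger distance $\leq -\ln\int\sqrt{p\,q}\,d\mathbf{w} \leq \frac{1}{2}KL(p\,||\,q)$, can be verified by Riemann-sum integration:

```python
import math

# Example densities (assumed for illustration): p = N(0, 1) plays the role of
# p^w, q = N(1, 1.5^2) plays the role of p^w_hat. Verify the chain
#   H^2 = 1 - int sqrt(p*q) dw  <=  -ln int sqrt(p*q) dw  <=  (1/2) KL(p || q)
# via a Riemann sum on a wide grid.

def normal_pdf(w, mu, sigma):
    return math.exp(-0.5 * ((w - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

dw = 0.001
grid = [-20.0 + i * dw for i in range(40001)]  # covers [-20, 20]

# Bhattacharyya coefficient: int sqrt(p * q) dw
bc = sum(math.sqrt(normal_pdf(w, 0, 1) * normal_pdf(w, 1, 1.5)) for w in grid) * dw
h2 = 1.0 - bc        # squared Hellinger distance
bhat = -math.log(bc)  # the -ln upper bound (Bhattacharyya distance)
half_kl = 0.5 * sum(
    normal_pdf(w, 0, 1) * math.log(normal_pdf(w, 0, 1) / normal_pdf(w, 1, 1.5))
    for w in grid
) * dw               # (1/2) KL(p || q) on the same grid

assert h2 <= bhat <= half_kl  # the chain of bounds holds
```

The first inequality is $1 - x \leq -\ln x$ applied to the Bhattacharyya coefficient; the second is the Jensen step of the proof.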
