Set 4: *Both null and alternative models are equally underspecified.*

Consider the true data-generating model given by

$$y\_i = \beta\_{0,1} + \beta\_{0,2}\mathbf{x}\_{i2} + \beta\_{0,3}\mathbf{x}\_{i3} + \beta\_{0,4}\mathbf{x}\_{i4} + \beta\_{0,5}\mathbf{x}\_{i5} + \beta\_{0,6}\mathbf{x}\_{i6} + \beta\_{0,7}\mathbf{x}\_{i7} + \varepsilon\_i$$

with $\varepsilon\_i \sim N(0, 50)$, $\beta\_{0,1} = 1$, $\beta\_{0,2} = \beta\_{0,3} = \beta\_{0,6} = \beta\_{0,7} = 0.5$, $\beta\_{0,4} = \beta\_{0,5} = -0.5$, and $[x\_{i1}\ x\_{i2}\ \cdots\ x\_{i7}]^T$ is sampled as indicated in (8).

For the hypothesis testing setting in Set 4, the null and alternative models are

$$\begin{aligned} H\_1: y\_i &= \beta\_1 + \beta\_2 x\_{i2} + \beta\_3 x\_{i3}, \\ H\_2: y\_i &= \beta\_1 + \beta\_4 x\_{i4} + \beta\_5 x\_{i5}. \end{aligned}$$

Here, the null and alternative candidate models are equally underspecified: each includes two explanatory variables whose true effects have the same magnitude (0.5 in absolute value), and each omits four variables that appear in the true data-generating model.
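The symmetry between the two candidate models can be checked numerically. The following is a minimal sketch, not the authors' simulation code: the covariate distribution in (8) is not reproduced in this section, so the sketch assumes i.i.d. standard normal covariates as a placeholder, and the helper names `simulate_set4` and `residual_variance` are hypothetical.

```python
import numpy as np

# True coefficients (beta_{0,1}, ..., beta_{0,7}) as stated for Set 4.
BETA_TRUE = np.array([1.0, 0.5, 0.5, -0.5, -0.5, 0.5, 0.5])


def simulate_set4(n, rng):
    """Draw (X, y) from the Set 4 data-generating model.

    ASSUMPTION: x_{i2}, ..., x_{i7} are i.i.d. standard normal. The source
    samples them as indicated in its equation (8), which is not available
    here, so this choice is only a placeholder.
    """
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 6))])
    eps = rng.normal(0.0, np.sqrt(50.0), size=n)  # N(0, 50): variance 50
    return X, X @ BETA_TRUE + eps


def residual_variance(X, y, cols):
    """Least-squares fit of y on the chosen columns of X; return RSS / n."""
    design = X[:, cols]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid / len(y))


rng = np.random.default_rng(0)
X, y = simulate_set4(20_000, rng)
s2_h1 = residual_variance(X, y, [0, 1, 2])  # H1: intercept, x_{i2}, x_{i3}
s2_h2 = residual_variance(X, y, [0, 3, 4])  # H2: intercept, x_{i4}, x_{i5}
print(s2_h1, s2_h2)
```

Under the placeholder covariates, each model omits four independent regressors with coefficient magnitude 0.5, so both residual variances should be close to 50 + 4(0.25) = 51. With the covariate structure in (8) the common value may differ.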

Set 5: *Null model has correct mean specification and alternative model is overspecified, but both are misspecified with respect to the error distribution, which is a Student's t distribution.*

Consider the true data-generating model given by

$$y\_i = \beta\_{0,1} + \epsilon\_i,$$

with $\epsilon\_i \sim t\_{df=5}$ and $\beta\_{0,1} = 1$. Because a Student's t distribution with $\nu > 2$ degrees of freedom has variance $\nu/(\nu - 2)$, it follows that $\sigma\_0^2 = \frac{5}{3}$. For the hypothesis testing setting in Set 5, the null and alternative models are

$$\begin{aligned} H\_1: y\_i &= \beta\_1, \\ H\_2: y\_i &= \beta\_1 + \beta\_2 x\_{i2}, \end{aligned}$$

where $x\_{i2} \sim N(1, 100)$. This setting is similar to the one displayed in Set 1, where the null is properly specified while the alternative is overspecified. However, the models in the setting at hand inadequately specify the distribution of the errors.
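As an illustration of the Set 5 design, the sketch below draws Student's t errors with 5 degrees of freedom and compares the sample variance with the stated value of 5/3. It is a hedged example rather than the authors' code: the function names `t_variance` and `simulate_set5` are hypothetical, and $N(1, 100)$ is read as mean 1 and variance 100, consistent with the variance parameterization used elsewhere in this section.

```python
import numpy as np


def t_variance(df):
    """Variance of a Student's t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return df / (df - 2)


def simulate_set5(n, rng, df=5, beta01=1.0):
    """Draw (x2, y) for Set 5: y_i = beta_{0,1} + eps_i with eps_i ~ t_df."""
    eps = rng.standard_t(df, size=n)
    # ASSUMPTION: N(1, 100) means variance 100, i.e. standard deviation 10.
    x2 = rng.normal(1.0, 10.0, size=n)
    return x2, beta01 + eps


rng = np.random.default_rng(1)
x2, y = simulate_set5(200_000, rng)
sample_var = float(np.var(y))           # should be near 5/3
slope = float(np.polyfit(x2, y, 1)[0])  # fitted beta_2 under H2
print(t_variance(5), sample_var, slope)
```

Since $x\_{i2}$ plays no role in generating $y\_i$, the fitted slope under $H\_2$ should be close to zero, which is what makes the alternative overspecified.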

Set 6: *Null model has correct mean specification, and the alternative model is overspecified, but both are misspecified with respect to the error distribution, which is a mixture of normals.*

Consider the true data-generating model given by

$$y\_i = \beta\_{0,1} + \epsilon\_i,$$

with $\epsilon\_i \sim Z \cdot N(0, 1) + (1 - Z) \cdot N(0, 50)$, where $Z \sim \text{Bernoulli}(\pi)$ with $\pi = 0.85$. Therefore,

$$\begin{split} \sigma\_0^2 &= 0.85(1) + 0.15(50) \\ &= 8.35. \end{split}$$

For the hypothesis testing setting in Set 6, the null and alternative models are

$$\begin{aligned} H\_1: y\_i &= \beta\_1, \\ H\_2: y\_i &= \beta\_1 + \beta\_2 x\_{i2}, \end{aligned}$$

where $x\_{i2} \sim N(1, 100)$. This setting is similar to the one featured in Set 5. However, the errors in the setting at hand are generated from a mixture of normal distributions.
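The mixture error distribution of Set 6 and its variance of 8.35 can be reproduced with a short simulation. This is a minimal sketch under the reading that the second argument of $N(\cdot, \cdot)$ is a variance. The function names `mixture_variance` and `simulate_mixture_errors` are hypothetical and not taken from the source.

```python
import numpy as np


def mixture_variance(pi=0.85, var_a=1.0, var_b=50.0):
    """Variance of a two-component, zero-mean normal mixture."""
    return pi * var_a + (1.0 - pi) * var_b


def simulate_mixture_errors(n, rng, pi=0.85, var_a=1.0, var_b=50.0):
    """Draw eps_i = Z * N(0, var_a) + (1 - Z) * N(0, var_b), Z ~ Bernoulli(pi)."""
    z = rng.random(n) < pi  # True with probability pi
    sd = np.where(z, np.sqrt(var_a), np.sqrt(var_b))
    return rng.standard_normal(n) * sd


rng = np.random.default_rng(2)
eps = simulate_mixture_errors(200_000, rng)
y = 1.0 + eps  # beta_{0,1} = 1, as in Set 5
sample_var = float(np.var(y))
print(mixture_variance(), sample_var)
```

Roughly 15% of the draws come from the high-variance component, so the errors are heavier-tailed than a single normal distribution with variance 8.35.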
