*Asymptotic Theory of the Dual Divergence Test Statistic*

Having established the two main results of this work, namely the decomposition of the proposed restricted estimator (Theorem 1) together with its asymptotic properties (Theorem 2), as well as the asymptotic distribution of the associated test statistic under the class of Csiszár *ϕ*-functions (Theorem 3), we continue below by extending in a natural way the results of [11] for the dual divergence test statistic. The extensions presented in this section are vital due to their practical implication for the cross tabulations discussed in Section 4. The proofs are omitted since both results (Theorems 4 and 5) follow along the lines of previous results (see Theorems 3.4 and 3.9 of [11]). In what follows we adopt the following notation:

$$b = m^{-\alpha\_1},\ p\_{(1)}^{\alpha\_1} = \min\_{i \in \{1, \dots, m\}} p\_i(\theta\_0)^{\alpha\_1},\ p\_{(m)}^{\alpha\_1} = \max\_{i \in \{1, \dots, m\}} p\_i(\theta\_0)^{\alpha\_1},\ k = m - 1 - s + \nu.$$

**Theorem 4.** *Under Assumptions* (*A*0)*–*(*A*7) *we have*

$$T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)}) \xrightarrow[N\to\infty]{L} b\chi^2\_k.$$

**Remark 3.** *Consider the case where Assumption* (*A*6) *is relaxed. Then, the asymptotic distribution of the test statistic $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ is estimated to be approximately $b\chi^2\_k$ where*

$$b = \frac{p\_{(1)}^{\alpha\_1} + p\_{(m)}^{\alpha\_1}}{2} \tag{15}$$

*as long as α*<sup>2</sup> = 0 *or α*<sup>2</sup> → 0*. For further elaboration of this remark we refer to [11].*

**Remark 4.** *Observe that if $\alpha\_1 \to 0$ then $b \to 1$ and the asymptotic distribution becomes $\chi^2\_k$, while for $\alpha\_1$ away from* 0 *the distribution is proportional to $\chi^2\_k$ with proportionality index $b \neq 1$. However, for non-equiprobable models these statements hold true as long as $\alpha\_2$ is close to zero.*
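The behaviour of the proportionality constant $b$ described in Remarks 3 and 4 can be checked numerically. The following Python sketch evaluates $b$ of (15) for several values of $\alpha\_1$; the probability vector is an illustrative choice, not a model from this work:

```python
# Proportionality constant b of Remark 3 (Equation (15)):
# b = (p_(1)^a1 + p_(m)^a1) / 2, where p_(1) and p_(m) are the smallest
# and largest cell probabilities p_i(theta_0).
def prop_constant(p, alpha1):
    return (min(p) ** alpha1 + max(p) ** alpha1) / 2

p = [0.1, 0.2, 0.3, 0.4]  # illustrative non-equiprobable model

for a1 in (1e-7, 0.05, 0.50):
    print(a1, prop_constant(p, a1))

# As alpha_1 -> 0 we get b -> 1 (Remark 4), recovering the plain chi-squared
# limit; for the equiprobable model p_i = 1/m the formula gives b = m^(-a1),
# matching the notation of Theorem 4.
m = 8
assert abs(prop_constant([1 / m] * m, 0.3) - m ** (-0.3)) < 1e-12
```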

Consider now the hypothesis with contiguous alternatives [25,26]

$$H\_0 \colon \mathbf{p} = \mathbf{p}(\theta\_0) \text{ vs. } H\_{1,N} \colon \mathbf{p} = \mathbf{p}(\theta\_0) + \frac{\mathbf{d}}{\sqrt{N}} \tag{16}$$

where **d** is an *m*-dimensional vector of known real values with components $d\_i$ satisfying the assumption $\sum\_{i=1}^{m} d\_i = 0$.

Observe that as *N* tends to infinity, the local contiguous alternative converges to the null hypothesis at the rate $O(N^{-1/2})$. Alternatives, such as those in (16), are known as *Pitman transition alternatives* or *Pitman (local) alternatives* or *local contiguous alternatives* to the null hypothesis $H\_0$ [25].

**Theorem 5.** *Under Assumptions* (*A*0)*–*(*A*7) *and for the hypothesis (16) we have*

$$T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)}) \xrightarrow[N\to\infty]{L} b\chi^2\_k(\xi^{\top}\xi)$$

*which represents a non-central chi-squared distribution with k degrees of freedom and non-centrality parameter $\xi^{\top}\xi$, where $\xi = \mathrm{diag}(\mathbf{p}(\theta\_0)^{-1/2})(\mathbf{I} - \mathbf{J}(\theta\_0)\mathbf{W}(\theta\_0))\mathbf{d}$.*

**Remark 5.** *Observe that under Assumption* (*A*6) *($p\_i = 1/m$) the asymptotic distribution is independent of $\Phi$, $\alpha\_1$ and $\alpha\_2$. As a result, the associated power of the test is $\Pr(\chi^2\_k(\xi^{\top}\xi) \geq \chi^2\_{k,\alpha})$ where $\chi^2\_{k,\alpha}$ is the $100(1-\alpha)\%$ percentile of the distribution. If Assumption* (*A*6) *is relaxed, then the distribution is approximately non-central chi-squared with proportionality index $b = \frac{p\_{(1)}^{\alpha\_1} + p\_{(m)}^{\alpha\_1}}{2}$.*

#### **4. Cross Tabulations and Dual Divergence Test Statistic**

In this section, we take advantage of the methodology proposed earlier for the analysis of cross tabulations. In particular, we focus on the case of three categorical variables, say *X*, *Y*, and *Z*, with corresponding numbers of categories *I*, *J*, and *K*. Then, assume that the probability mass of a realization of a randomly selected subject is denoted by $p\_{ijk}(\theta) = \Pr(X = i, Y = j, Z = k) > 0$, where here and in what follows $i = 1, \dots, I$, $j = 1, \dots, J$, $k = 1, \dots, K$ unless otherwise stated. The associated probability vector is given as $\mathbf{p}(\theta) = \{p\_{ijk}(\theta)\}$ where

$$p\_{ijk}(\boldsymbol{\theta}) = \begin{cases} \theta\_{ijk}, & (i,j,k) \neq (I,J,K) \\ 1 - \sum\_{(i,j,k) \neq (I,J,K)} \theta\_{ijk}, & (i,j,k) = (I,J,K) \end{cases}$$

and the parameter space as $\Theta = \{\theta\_{ijk},\ (i,j,k) \neq (I,J,K)\}$. The sample estimator of $p\_{ijk}(\theta)$ is $\hat{p}\_{ijk} = n\_{ijk}/N$, where $n\_{ijk}$ is the frequency of the corresponding $(i,j,k)$ cell.

In this setup, the dual divergence test statistic is given as

$$T^{\alpha\_1}\_{\Phi\_1}\left(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)}\right) = \frac{2N}{\Phi\_1''(1)} \sum\_{i=1}^{I}\sum\_{j=1}^{J}\sum\_{k=1}^{K} p\_{ijk}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})^{1+\alpha\_1}\, \Phi\_1\left(\frac{\hat{p}\_{ijk}}{p\_{ijk}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})}\right) \tag{17}$$

where $\hat{p}\_{ijk}$ is as above and the rMD estimator is given as

$$\hat{\boldsymbol{\theta}}^{r}\_{(\Phi\_2,\alpha\_2)} = \arg \inf\_{\{\boldsymbol{\theta} \in \Theta \subset \mathbb{R}^{s} \colon f\_{k}(\boldsymbol{\theta}) = 0,\ k = 1, \dots, \nu\}} \sum\_{i=1}^{I}\sum\_{j=1}^{J}\sum\_{k=1}^{K} p\_{ijk}(\boldsymbol{\theta})^{1+\alpha\_2}\, \Phi\_2\left(\frac{\hat{p}\_{ijk}}{p\_{ijk}(\boldsymbol{\theta})}\right). \tag{18}$$

For $\alpha\_1 = \alpha\_2 = 0$ and special cases of the functions $\Phi\_1$ and $\Phi\_2$, classical restricted minimum divergence estimators and associated test statistics can be derived from (18) and (17), respectively. For example, for $\alpha\_1 = \alpha\_2 = 0$ and $\Phi\_1 = \Phi\_2 = \Phi\_{KL}$ the likelihood ratio test statistic with the restricted maximum likelihood estimator ($G^2(\hat{\theta}^{r})$) can be derived, while for $\Phi\_1 = \Phi\_2 = \Phi\_{\lambda}$ and $\lambda = 1$ we obtain the chi-squared test statistic with the restricted minimum chi-squared estimator ($X^2(\hat{\theta}^{r}\_{X^2})$). For $\Phi\_1 = \Phi\_2 = \Phi\_{\lambda}$ and $\lambda = 2/3$ the dual divergence test statistic reduces to the power divergence test statistic with the restricted minimum power divergence estimator ($CR(\hat{\theta}^{r}\_{CR})$), whereas for $\lambda = -1/2$ it reduces to the Freeman–Tukey test statistic with the restricted minimum Freeman–Tukey estimator ($FT(\hat{\theta}^{r}\_{FT})$).
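As a numerical illustration of how (17) collapses to a classical statistic, the following Python sketch evaluates the dual divergence statistic with $\Phi\_1 = \Phi\_{KL}(x) = x\log x - x + 1$ (so that $\Phi\_1''(1) = 1$) and a value of $\alpha\_1$ close to zero; the sample proportions and fitted probabilities are illustrative placeholders, not fitted values from the paper:

```python
import math

def phi_kl(x):
    # Csiszar phi-function for the Kullback-Leibler divergence; Phi''(1) = 1.
    return x * math.log(x) - x + 1

def dual_stat(p_hat, p_fit, N, phi, phi_dd_at_1, alpha1):
    # Test statistic (17): (2N / Phi_1''(1)) * sum over cells of
    # p_ijk(theta_hat)^(1 + alpha_1) * Phi_1( p_hat_ijk / p_ijk(theta_hat) ).
    return (2 * N / phi_dd_at_1) * sum(
        pf ** (1 + alpha1) * phi(ph / pf) for ph, pf in zip(p_hat, p_fit)
    )

# Illustrative sample proportions and fitted cell probabilities (placeholders).
p_hat = [0.30, 0.25, 0.25, 0.20]
p_fit = [0.28, 0.27, 0.24, 0.21]
N = 50

t = dual_stat(p_hat, p_fit, N, phi_kl, 1.0, alpha1=1e-9)
g2 = 2 * N * sum(ph * math.log(ph / pf) for ph, pf in zip(p_hat, p_fit))
print(t, g2)  # for alpha_1 -> 0 and Phi_KL, (17) reduces to the G^2 statistic
```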

The hypothesis of conditional independence between *X*, *Y*, and *Z* is given for any triplet *i*, *j*, *k* by

$$H\_0: p\_{ijk}(\theta\_0) = \frac{p\_{i\*k}(\theta\_0)\, p\_{\*jk}(\theta\_0)}{p\_{\*\*k}(\theta\_0)},\ \theta\_0 \in \Theta \text{ unknown},$$

where

$$p\_{i\*k}(\theta\_0) = \sum\_{j=1}^{J} p\_{ijk}(\theta\_0),\ p\_{\*jk}(\theta\_0) = \sum\_{i=1}^{I} p\_{ijk}(\theta\_0) \text{ and } p\_{\*\*k}(\theta\_0) = \sum\_{i=1}^{I}\sum\_{j=1}^{J} p\_{ijk}(\theta\_0).$$

Under the $(I-1)(J-1)K$ constraint functions

$$f\_{ijk}(\theta) = p\_{11k}(\theta)p\_{ijk}(\theta) - p\_{1jk}(\theta)p\_{i1k}(\theta) = 0$$

$i = 2, \dots, I$, $j = 2, \dots, J$, $k = 1, \dots, K$, the above $H\_0$ hypothesis with $\theta\_0$ unknown becomes

$$H\_0: \mathbf{p} = \mathbf{p}(\theta\_0), \text{ for } \theta\_0 \in \Theta\_0,$$

where $\Theta\_0 = \{\theta \in \Theta : f\_{ijk}(\theta) = 0,\ i = 2, \dots, I,\ j = 2, \dots, J,\ k = 1, \dots, K\}$.

**Remark 6.** *For practical purposes, the choice of the values of the indices is motivated by the work of [8] where, in an attempt to achieve a compromise between robustness and efficiency of estimators, the authors recommended the use of small values in the* (0, 1) *region. In the following subsection, our analysis reconfirms their findings since, as will be seen, values of both indices closer to zero than to one are associated with good performance not only in terms of estimation but also in terms of goodness of fit, as reflected in the size and the power of the test.*

#### *Simulation Study*

In this simulation study, we use the rMD estimator and the associated dual divergence test statistic for the analysis of cross tabulations. Specifically, we compare, in terms of size and power, classical tests with those that can be derived through the proposed methodology, for the problem of conditional independence of three random variables in contingency tables. We test the hypothesis of conditional independence for a $2 \times 2 \times 2$ contingency table; thus, in this case we have $m = 8$ probabilities of the multinomial model, $s = 7$ unknown parameters to estimate, and two constraint functions ($\nu = 2$), which are given by

$$f\_{221}(\theta) = \theta\_{111}\theta\_{221} - \theta\_{121}\theta\_{211} \text{ and } f\_{222}(\theta) = \theta\_{112} \left(1 - \sum\_{i=1}^{2} \sum\_{j=1}^{2} \sum\_{k=1}^{2} \theta\_{ijk} \right) - \theta\_{122}\theta\_{212}.$$
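The two constraint functions above can be coded directly, and the rMD estimator (18) can then be obtained by generic constrained optimisation. The paper uses the **auglag** routine of R's **nloptr**; the Python sketch below substitutes scipy's SLSQP solver, takes $\Phi\_2 = \Phi\_{KL}$, and uses an illustrative table of counts, so it is a rough sketch rather than a reproduction of the study:

```python
import numpy as np
from scipy.optimize import minimize

# Free parameters, in the order theta_111, theta_112, theta_121, theta_122,
# theta_211, theta_212, theta_221; the last cell theta_222 = 1 - sum of rest.
def cell_probs(theta):
    return np.append(theta, 1.0 - theta.sum())

def f221(theta):
    t111, t112, t121, t122, t211, t212, t221 = theta
    return t111 * t221 - t121 * t211

def f222(theta):
    t111, t112, t121, t122, t211, t212, t221 = theta
    t222 = 1.0 - theta.sum()
    return t112 * t222 - t122 * t212

def rmd_objective(theta, p_hat, alpha2):
    # Criterion of (18) with Phi_2 = Phi_KL(x) = x log x - x + 1.
    p = cell_probs(theta)
    r = p_hat / p
    return np.sum(p ** (1.0 + alpha2) * (r * np.log(r) - r + 1.0))

n = np.array([4, 3, 2, 3, 3, 2, 2, 1])  # illustrative 2x2x2 counts, N = 20
p_hat = n / n.sum()

res = minimize(
    rmd_objective, x0=np.full(7, 1 / 8), args=(p_hat, 0.05), method="SLSQP",
    bounds=[(1e-6, 1 - 1e-6)] * 7,
    constraints=[{"type": "eq", "fun": f221}, {"type": "eq", "fun": f222}],
)
p_fit = cell_probs(res.x)
print(p_fit)                     # fitted conditionally independent probabilities
print(f221(res.x), f222(res.x))  # both constraints approximately zero
```

The equiprobable starting point is feasible (both constraints vanish there), which helps SLSQP converge on this small problem.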

For a better understanding of the behaviour of the dual divergence test statistic given in (17), we compare it with the four classical tests-of-fit mentioned earlier in Section 4, namely $G^2(\hat{\theta}^{r})$, $X^2(\hat{\theta}^{r}\_{X^2})$, $CR(\hat{\theta}^{r}\_{CR})$ and $FT(\hat{\theta}^{r}\_{FT})$. The proposed test $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ is applied for $\Phi\_1 = \Phi\_{\alpha\_1}$, $\Phi\_2 = \Phi\_{\alpha\_2}$ and six different values of $\alpha\_1$ and $\alpha\_2$: $\alpha\_1, \alpha\_2 = 10^{-7}, 0.01, 0.05, 0.10, 0.50$, and $1.50$. Note that the critical values used in this simulation study are the asymptotic critical values based on the asymptotic distribution $b\chi^2\_2$, with $b$ as in (15), for the double index family of test statistics, and on $\chi^2\_2$ for the classical test statistics. For the analysis we used 100,000 simulations and sample sizes equal to $n = 20, 25$ (small sample sizes) and $n = 40, 45$ (moderate sample sizes).

In this study, we have used the model previously considered by [27] given by


where $0 \le w < 1$ and $\pi\_{ijk} = p\_{i\*\*} \times p\_{\*j\*} \times p\_{\*\*k}$, $i, j, k = 1, 2$, with


For $w = 0$ we obtain the model under the null hypothesis of conditional independence, while for values $w \neq 0$ we obtain the models under the alternative hypotheses. We considered the values $w = 0.00, 0.30, 0.60$, and $0.90$. Note that the larger the value of $w$, the more we deviate from the null model. For the simulation study, we used the R software [28], while for the constrained optimization we used the **auglag** function from the **nloptr** package [29].

From Table 1, we can observe that in terms of size the performance of $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ is adequate for values of $\alpha\_1, \alpha\_2 \le 0.5$, both for small and moderate sample sizes. In addition, we can see that for $\alpha\_1 \le 0.10$, $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ appears to be liberal, while for $\alpha\_1 \ge 0.5$ it appears to be conservative. We also note that the size becomes smaller as $\alpha\_1$ and $\alpha\_2$ increase with $\alpha\_1 \ge \alpha\_2$. Table 2 provides the size of the classical tests-of-fit, from which we can observe that $CR(\hat{\theta}^{r}\_{CR})$ has the best performance among all competing tests for every sample size. In contrast, $FT(\hat{\theta}^{r}\_{FT})$ has the worst performance among all competing tests and appears to be very liberal. Furthermore, $X^2(\hat{\theta}^{r}\_{X^2})$ appears to be conservative while $G^2(\hat{\theta}^{r})$ appears to be liberal. Note that for $\alpha\_1 \in [0.01, 0.5]$ and $\alpha\_2 \le 0.10$, $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ behaves better than the $G^2(\hat{\theta}^{r})$ test statistic and its performance is quite close to that of $X^2(\hat{\theta}^{r}\_{X^2})$.

**Table 1.** Size ($w = 0.00$) calculations (%) of the $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$. Sizes that satisfy Dale's criterion are presented in bold.


In order to examine the closeness of the estimated (true) size to the nominal size *α* = 0.05 we consider the criterion given by Dale [30]. The criterion involves the following inequality

$$|\text{logit}(1-\hat{\alpha}\_n) - \text{logit}(1-\alpha)| \le d \tag{19}$$

where $\text{logit}(p) = \log(p/(1-p))$ and $\hat{\alpha}\_n$ is the estimated (true) size. The estimated (true) size is considered to be close to the nominal size if (19) is satisfied with $d = 0.35$. Note that in this situation the estimated (true) size is close to the nominal one if $\hat{\alpha}\_n \in [0.0357, 0.0695]$; such sizes are presented in Tables 1 and 2 in bold. This criterion has been used previously, among others, by [27,31].
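Dale's criterion is straightforward to apply in code. The following Python sketch recovers the interval $[0.0357, 0.0695]$ quoted above for $\alpha = 0.05$ and $d = 0.35$:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def dale_ok(alpha_hat, alpha=0.05, d=0.35):
    # Criterion (19): |logit(1 - alpha_hat) - logit(1 - alpha)| <= d.
    return abs(logit(1.0 - alpha_hat) - logit(1.0 - alpha)) <= d

def dale_interval(alpha=0.05, d=0.35):
    # Inverting (19): the admissible estimated (true) sizes form an interval.
    lo = 1.0 / (1.0 + math.exp(logit(1.0 - alpha) + d))
    hi = 1.0 / (1.0 + math.exp(logit(1.0 - alpha) - d))
    return lo, hi

lo, hi = dale_interval()
print(round(lo, 4), round(hi, 4))  # approximately 0.0358 and 0.0695
```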

Regarding the proposed test, we can see that for small sample sizes the estimated (true) size is close to the nominal for $\alpha\_1 \in [0.10, 0.50]$ and $\alpha\_2 \le 0.10$, while for moderate sample sizes for $\alpha\_1 \in [10^{-7}, 0.50]$ and $\alpha\_2 \le 0.10$. With reference to the classical tests-of-fit, we can observe that the size of $CR(\hat{\theta}^{r}\_{CR})$ is close to the nominal for every sample size, whereas the sizes of $G^2(\hat{\theta}^{r})$ and $X^2(\hat{\theta}^{r}\_{X^2})$ are close only for moderate sample sizes. Finally, we note that the estimated (true) size of $FT(\hat{\theta}^{r}\_{FT})$ fails to be close to the nominal both for small and moderate sample sizes.

In Tables 3–5, we provide the results regarding the power of the proposed family of test statistics for the three alternatives and sample sizes *n* = 20, 25, 40, 45, while Table 2 provides the results regarding the power of the classical tests-of-fit. The performance tends to be better as we deviate from the null model and as the sample size increases both for the classical and the proposed tests.

**Table 2.** Size (*w* = 0.00) and power (*w* = 0.30, 0.60, 0.90) calculations (%) for the classical tests-of-fit. Sizes that satisfy Dale's criterion are presented in bold.


**Table 3.** Power ($w = 0.30$) calculations (%) of the $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.



**Table 4.** Power ($w = 0.60$) calculations (%) of the $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

**Table 5.** Power ($w = 0.90$) calculations (%) of the $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.


As general comments regarding the behaviour of the proposed and the classical tests-of-fit in terms of power, we state that the best results for $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ are obtained for small values of $\alpha\_1$ in the range $(0, 0.1]$ and large values of $\alpha\_2$ with $\alpha\_1 \le \alpha\_2$. Note that although in terms of power the results become better as $\alpha\_2$ increases, in terms of size they are adequate only for $\alpha\_2 \le 0.5$. In addition, we can observe that the performance of $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ is better than that of $CR(\hat{\theta}^{r}\_{CR})$ and $X^2(\hat{\theta}^{r}\_{X^2})$ for every alternative and every sample size for $\alpha\_1 \le 0.1$ and $\alpha\_2 \le 0.5$, and slightly better than $G^2(\hat{\theta}^{r})$ for small values of $\alpha\_1$ and large values of $\alpha\_2$, for example for $\alpha\_1 = 0.01$ and $\alpha\_2 = 0.50$. Furthermore, we can observe that for $\alpha\_1 = 0.1$ and $\alpha\_2 \le 0.1$ the size of the test is better than the size of $G^2(\hat{\theta}^{r})$ and slightly worse than the sizes of the $CR(\hat{\theta}^{r}\_{CR})$ and $X^2(\hat{\theta}^{r}\_{X^2})$ test statistics, while its power is quite better than the power of $CR(\hat{\theta}^{r}\_{CR})$ and $X^2(\hat{\theta}^{r}\_{X^2})$ and slightly worse than that of $G^2(\hat{\theta}^{r})$. Additionally, we can see that as $\alpha\_1$ and $\alpha\_2$ tend to 0, the behaviour of the $T^{\alpha\_1}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,\alpha\_2)})$ test statistic coincides with that of the $G^2(\hat{\theta}^{r})$ test both in terms of size and power, as expected.

In order to attain a better insight into the behaviour of the test statistics, we apply Dale's criterion not only for the nominal size $\alpha = 0.05$ but also for a range of nominal sizes of interest. Based on the previous analysis, besides the classical tests, we focus our interest on $T^{0.05}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.05)})$, $T^{0.10}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.10)})$, and $T^{0.20}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.20)})$. The following simplified notation is used in every figure: FT $\equiv FT(\hat{\theta}^{r}\_{FT})$, ML $\equiv G^2(\hat{\theta}^{r})$, CR $\equiv CR(\hat{\theta}^{r}\_{CR})$, Pe $\equiv X^2(\hat{\theta}^{r}\_{X^2})$, T1 $\equiv T^{0.05}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.05)})$, T2 $\equiv T^{0.10}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.10)})$, and T3 $\equiv T^{0.20}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.20)})$. From Figure 1a, we can see that for small sample sizes ($n = 25$) $T^{0.20}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.20)})$ and $CR(\hat{\theta}^{r}\_{CR})$ satisfy Dale's criterion for every nominal size, while $T^{0.10}\_{\Phi\_1}(\hat{\theta}^{r}\_{(\Phi\_2,0.10)})$ and $X^2(\hat{\theta}^{r}\_{X^2})$ do so for nominal sizes greater than 0.03 and 0.06, respectively. Note that the dashed line in Figure 1 denotes the situation in which the estimated (true) size equals the nominal size; thus, lines that lie above this reference line refer to liberal tests while those that lie below refer to conservative ones. On the other hand, for moderate sample sizes ($n = 45$) all chosen test statistics satisfy Dale's criterion except $FT(\hat{\theta}^{r}\_{FT})$.

Taking into account the fact that the actual size of each test differs from the targeted nominal size, we have to make an adjustment in order to proceed further with the comparison of the tests in terms of power. We focus our interest on those tests that satisfy Dale's criterion and follow the method proposed in [32], which involves the so-called receiver operating characteristic (ROC) curves. In particular, let $G(t) = \Pr(T \ge t)$ be the survivor function of a general test statistic $T$, and $c = \inf\{t : G(t) \le \alpha\}$ be the critical value; then ROC curves can be formulated by plotting the power $G\_1(c)$ against the size $G\_0(c)$ for various values of the critical value $c$. Note that by $G\_0(t)$ we denote the survivor function of the test statistic under the null hypothesis and by $G\_1(t)$ under the alternative.
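The size-adjusted comparison described above can be sketched in a few lines: simulate the statistic under the null and under an alternative, then trace (size, power) pairs over a grid of critical values. In the Python sketch below the normal draws are mere placeholders for simulated test-statistic values:

```python
import random

def empirical_roc(t0, t1):
    # t0: simulated values of the statistic under H0; t1: under H1.
    # Each candidate critical value c gives one point
    # (size, power) = (G0(c), G1(c)), with G(t) = Pr(T >= t) estimated
    # by the corresponding empirical survivor function.
    n0, n1 = len(t0), len(t1)
    return [
        (sum(t >= c for t in t0) / n0, sum(t >= c for t in t1) / n1)
        for c in sorted(t0)
    ]

random.seed(1)
null_draws = [random.gauss(0.0, 1.0) for _ in range(1000)]  # placeholder H0
alt_draws = [random.gauss(1.0, 1.0) for _ in range(1000)]   # placeholder H1
roc = empirical_roc(null_draws, alt_draws)
# A stochastically larger alternative should give power >= size (up to noise),
# i.e., an ROC curve above the diagonal.
print(roc[0], roc[-1])
```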

**Figure 1.** Estimated (true) sizes against nominal sizes. The shaded area refers to Dale's criterion. (**a**) *n* = 25. (**b**) *n* = 45.

Since results are similar for every alternative, we restrict ourselves to $w = 0.60$, which refers to an alternative that is neither too close to nor too far from the null. For small sample sizes ($n = 25$) results are presented in Figure 2, where we can see that the proposed test is superior to the classical tests-of-fit in terms of power. However, for moderate sample sizes ($n = 45$) we can observe in Figure 3 that $G^2(\hat{\theta}^{r})$ has the best performance among all competing tests, followed by the proposed test-of-fit.

**Figure 2.** (**a**) Empirical ROC curves for *n* = 25. (**b**) The same curves magnified over a relevant range of empirical sizes.

**Figure 3.** (**a**) Empirical ROC curves for *n* = 45. (**b**) The same curves magnified over a relevant range of empirical sizes.

From the conducted analysis we conclude that, regarding the proposed test, there is a trade-off between size and power for different choices of the indices $\alpha\_1$ and $\alpha\_2$. In particular, we can see that as $\alpha\_1$ increases the size becomes smaller at the expense of smaller power, while as $\alpha\_2$ increases the power becomes better and the tests more liberal. In conclusion, we could state that for values of $\alpha\_1$ and $\alpha\_2$ in the range $(0.05, 0.25)$ the resulting test statistic provides a fair balance between size and power, which makes it an attractive alternative to the classical tests-of-fit; for small sample sizes larger values of the indices are preferable, whereas for moderate sample sizes smaller ones are recommended.

#### **5. Conclusions**

In this work, a general divergence family of test statistics is presented for hypothesis testing problems as in (3), under constraints. For estimating purposes, we introduce, discuss and use the rMD (restricted minimum divergence) estimator presented in (8). The proposed double index (dual) divergence test statistic involves two pairs of elements, namely (Φ2, *α*2) to be used for the estimation problem and (Φ1, *α*1) to be used for the testing problem. The duality refers to the fact that the two pairs may or may not be the same providing the researcher with the greatest possible flexibility.

The asymptotic distribution of the dual divergence test statistic is found to be proportional to the chi-squared distribution irrespective of the nature of the multinomial model, as long as the values of the two indices involved are relatively close to zero (less than 0.5). Such values are known to provide a satisfactory balance between efficiency and robustness (see, for instance, [8] or [3]).

The methodology developed in this work can be used in the analysis of contingency tables which is applicable in various scientific fields: biosciences, such as genetics [33] and epidemiology [34]; finance, such as the evaluation of investment effectiveness or business performance [35]; insurance science [36]; or socioeconomics [37]. This work concludes with a comparative simulation study between classical test statistics and members of the proposed family, where the focus is placed on the conditional independence of three random variables. Results indicate that, by selecting wisely the values of the *α*<sup>1</sup> and *α*<sup>2</sup> indices, we can derive a test statistic that can be thought of as a powerful and reliable alternative to the classical tests-of-fit especially for small sample sizes.

**Author Contributions:** Conceptualization, A.K. and C.M.; data curation, C.M.; methodology, A.K. and C.M.; software, C.M.; formal analysis, A.K. and C.M.; writing—original draft preparation, C.M.; writing—review and editing, A.K. and C.M.; supervision, A.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors wish to express their appreciation to the anonymous referees and the Associate Editor for their valuable comments and suggestions. The authors also wish to express their appreciation to Professor A. Batsidis of the University of Ioannina for bringing to their attention citation [31], which helped greatly the comparative analysis performed in this work. This work was completed as part of the first author's PhD thesis and falls within the research activities of the Laboratory of Statistics and Data Analysis of the University of the Aegean.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

The Birch regularity conditions mentioned in Assumption (A5) of Section 2 are stated below (for details, see [22]):


$$p\_i(\boldsymbol{\theta}) = p\_i(\boldsymbol{\theta}\_0) + \sum\_{j=1}^s (\theta\_j - \theta\_{0j}) \frac{\partial p\_i(\theta\_0)}{\partial \theta\_j} + o(\|\boldsymbol{\theta} - \theta\_0\|), \ i = 1, \dots, m$$

as *θ* → *θ*0.

4. The Jacobian matrix

$$\mathbf{J}(\theta\_0) = \left(\frac{\partial \mathbf{p}(\theta)}{\partial \theta}\right)\_{\theta = \theta\_0} = \left(\frac{\partial p\_i(\theta\_0)}{\partial \theta\_j}\right)\_{\substack{i = 1, \dots, m \\ j = 1, \dots, s}}$$

is of full rank;


## **References**

