**1. Introduction**

The concept of distance or divergence has been known since at least the time of Pearson, who, in 1900, addressed the classical goodness-of-fit (gof) problem by considering the distance between observed and expected frequencies. The problem, for both discrete and discretized continuous distributions, has been at the center of attention for more than 100 years. The classical set-up is the one considered by Pearson, where a hypothesized *m*-dimensional multinomial distribution, say $Mult(N, p_1, \dots, p_m)$, is examined as the underlying distributional mechanism producing a given sample of size *N*. The problem can be extended to examine the homogeneity (in terms of the distributional mechanisms) between two independent samples or the independence between two population characteristics. In all such problems we are dealing with cross tabulations or crosstabs (or contingency tables). Problems of this nature appear frequently in a great variety of fields including the biosciences, socio-economic and political sciences, actuarial science, finance, business, accounting, and marketing. The need to establish, for instance, whether the mechanisms producing two phenomena are the same is vital for altering economic policies, preventing socio-economic crises, or enforcing the same economic or financial decisions for groups with similar underlying mechanisms (e.g., retaining the insurance premium in the case of similarity or setting different premiums in the case of diversity). It is important to note that divergence measures also play a pivotal role in statistical inference in continuous settings; for example, the authors of [1] investigate the multivariate normal case, while in the recent work [2] the modified skew-normal-Cauchy (MSNC) distribution is tested against normality.

Let us consider the general case of two *m*-dimensional multinomial distributions for which each probability depends on an *s*-dimensional unknown parameter, say $\theta = (\theta_1, \dots, \theta_s)$. A general family of measures introduced by [3] is the $d^{\alpha}_{\Phi}$ family defined by

$$d^{\alpha}_{\Phi}(\mathbf{p}(\theta), \mathbf{q}(\theta)) = \sum_{i=1}^{m} q_i^{1+\alpha}(\theta)\, \Phi\!\left(\frac{p_i(\theta)}{q_i(\theta)}\right); \quad \alpha > 0, \ \Phi \in F^{*} \tag{1}$$


where *α* is a positive indicator (index) value, $\mathbf{p}(\theta) = (p_1(\theta), \dots, p_m(\theta))$ and $\mathbf{q}(\theta) = (q_1(\theta), \dots, q_m(\theta))$, and $F^{*}$ is the class of functions $F^{*} = \{\Phi(\cdot) : \Phi(x) \ \text{strictly convex}, \ x \in \mathbb{R}_{+}, \ \Phi(1) = \Phi'(1) = 0, \ \Phi''(1) \neq 0\}$, with the conventions $\Phi(0/0) = 0$ and $0\,\Phi(p/0) = p \lim_{x\to\infty}[\Phi(x)/x]$.
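For concreteness, definition (1) can be written as a short Python sketch; the function name and the simplifying assumption that all $q_i(\theta) > 0$ (so that the conventions of $F^{*}$ are not needed) are ours.

```python
import numpy as np

def d_alpha_phi(p, q, phi, alpha):
    """Double index divergence of Equation (1): sum_i q_i^(1+alpha) * phi(p_i / q_i).

    p, q  : probability vectors of equal length (all q_i > 0 assumed here,
            so the 0/0 conventions of the class F* are not required);
    phi   : a member of F* (strictly convex, phi(1) = phi'(1) = 0);
    alpha : positive index value.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q ** (1.0 + alpha) * phi(p / q))
```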

Note that the well-known Csiszár family of measures [4] is obtained for the special case where the indicator is taken to be equal to 0, while the classical Kullback–Leibler (KL) distance [5] is obtained if the indicator *α* is equal to 0 and, at the same time, the function Φ(·) is taken to be $\Phi(x) \equiv \Phi_{KL}(x) = x \log(x)$ or $x \log(x) - x + 1$.
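Indeed, setting $\alpha = 0$ and $\Phi = \Phi_{KL}$ in (1) gives, in one line,

$$d^{0}_{\Phi_{KL}}(\mathbf{p}(\theta), \mathbf{q}(\theta)) = \sum_{i=1}^{m} q_i(\theta)\, \frac{p_i(\theta)}{q_i(\theta)} \log\frac{p_i(\theta)}{q_i(\theta)} = \sum_{i=1}^{m} p_i(\theta) \log\frac{p_i(\theta)}{q_i(\theta)},$$

which is the KL distance; the alternative generator $x\log(x) - x + 1$ yields the same value, since the extra terms $-p_i(\theta) + q_i(\theta)$ sum to zero.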

The function

$$\Phi_{\lambda}(x) = \frac{1}{\lambda(\lambda+1)}\left[x(x^{\lambda} - 1) - \lambda(x - 1)\right] \in F^{*}, \quad \lambda \neq 0, -1$$

is associated with the Freeman–Tukey test when *λ* = −1/2, with the Cressie and Read (CR) power divergence [6] at its recommended value *λ* = 2/3, with Pearson's chi-squared divergence [7] when *λ* = 1, and with the classical KL distance when *λ* → 0.
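To make these special cases concrete, the following short Python check (a sketch; the function names are ours) evaluates $\Phi_{\lambda}$ at a point, verifying that $\lambda = 1$ gives the Pearson generator $(x-1)^2/2$ and that $\lambda \to 0$ approaches the KL generator.

```python
import numpy as np

def phi_lambda(x, lam):
    """Power-divergence generator Phi_lambda(x) for lam not in {0, -1}."""
    x = np.asarray(x, dtype=float)
    return (x * (x ** lam - 1.0) - lam * (x - 1.0)) / (lam * (lam + 1.0))

def phi_kl(x):
    """KL generator x*log(x) - x + 1 (the lam -> 0 limit of phi_lambda)."""
    x = np.asarray(x, dtype=float)
    return x * np.log(x) - x + 1.0

x = 1.7
print(phi_lambda(x, -0.5))                 # Freeman-Tukey
print(phi_lambda(x, 2.0 / 3.0))            # Cressie-Read recommended value
print(phi_lambda(x, 1.0), (x - 1)**2 / 2)  # Pearson chi-squared generator
print(phi_lambda(x, 1e-6), phi_kl(x))      # lam -> 0 approaches the KL generator
```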

Finally, the function

$$\Phi_{\alpha}(x) \equiv (\lambda + 1)\Phi_{\lambda}(x)\big|_{\lambda=\alpha} = \frac{1}{\alpha}\left[x(x^{\alpha} - 1) - \alpha(x - 1)\right], \quad \alpha \neq 0$$

produces the BHHJ or $\Phi_{\alpha}$-power divergence [8] given by

$$d^{\alpha}_{\Phi_{\alpha}}(\mathbf{p}(\theta), \mathbf{q}(\theta)) = \sum_{i=1}^{m} q_i^{\alpha}(\theta)\left\{q_i(\theta) - p_i(\theta)\right\} + \frac{1}{\alpha}\sum_{i=1}^{m} p_i(\theta)\left\{p_i^{\alpha}(\theta) - q_i^{\alpha}(\theta)\right\}.$$
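As a quick numerical sanity check (a sketch with our own function names and arbitrary illustrative vectors), the explicit BHHJ form above coincides with the general expression (1) evaluated with $\Phi = \Phi_{\alpha}$ and index $\alpha$:

```python
import numpy as np

def phi_a(x, a):
    """Generator Phi_a(x) = (1/a) * [x(x^a - 1) - a(x - 1)], a != 0."""
    x = np.asarray(x, dtype=float)
    return (x * (x ** a - 1.0) - a * (x - 1.0)) / a

def bhhj(p, q, a):
    """Explicit BHHJ (Phi_alpha-power) divergence between probability vectors p and q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q ** a * (q - p)) + np.sum(p * (p ** a - q ** a)) / a

p = np.array([0.20, 0.50, 0.30])
q = np.array([0.25, 0.45, 0.30])
a = 0.1
# The explicit form equals Equation (1) evaluated with Phi = Phi_a and index alpha = a:
print(bhhj(p, q, a), np.sum(q ** (1.0 + a) * phi_a(p / q, a)))
```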

Assume that the underlying true distribution of an *m*-dimensional multinomial random variable with *N* experiments is

$$\mathbf{X} = (X_1, \dots, X_m)^\top \sim Mult\big(N, \mathbf{p} = (p_1, \dots, p_m)^\top\big)$$

where **p** is, in general, unknown, belonging to the parametric family

$$\mathcal{P} = \left\{ \mathbf{p}(\boldsymbol{\theta}) = (p\_1(\boldsymbol{\theta}), \dots, p\_m(\boldsymbol{\theta}))^\top \colon \boldsymbol{\theta} = (\theta\_1, \dots, \theta\_s)^\top \in \boldsymbol{\Theta} \subset \mathbb{R}^s \right\}.\tag{2}$$

The sample estimate $\hat{\mathbf{p}} = (\hat{p}_1, \dots, \hat{p}_m)$ of $\mathbf{p}$ is easily obtained by $\hat{p}_i = x_i/N$, where $x_i$ is the observed frequency for the *i*-th category (or class).

Divergence measures can also be used for estimation purposes by minimizing the associated measure. The classical estimating technique arises when, in (1), we take *α* = 0 and $\Phi(x) = \Phi_{KL}(x)$; the resulting KL minimization is equivalent to the classical maximization of the likelihood, producing the well-known Maximum Likelihood Estimator (MLE, see ([9], Section 5.2)). In general, the minimization of the divergence measure with respect to the parameter of interest gives rise to the corresponding minimum divergence estimator (see, e.g., [6,10,11]). The case where constraints are involved was recently investigated for Csiszár's family of measures [12]. For further references, please refer to [13–21].
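As a minimal illustration of this equivalence, consider the following Python sketch; the one-parameter trinomial model $p(\theta) = ((1-\theta)^2,\, 2\theta(1-\theta),\, \theta^2)$ and all function names are ours and serve only as a toy example. Minimizing the KL divergence between the sample frequencies and the model reproduces the MLE.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p_theta(theta):
    """Hypothetical one-parameter trinomial model (illustration only)."""
    return np.array([(1 - theta)**2, 2 * theta * (1 - theta), theta**2])

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i), assuming p_i > 0."""
    return np.sum(p * np.log(p / q))

x = np.array([36, 48, 16])   # observed cell frequencies, N = 100
p_hat = x / x.sum()          # sample estimate p_hat_i = x_i / N

# Minimum KL divergence estimator: minimize d(p_hat, p(theta)) over theta in (0, 1).
res = minimize_scalar(lambda t: kl(p_hat, p_theta(t)),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)                 # ~0.40, i.e., the MLE (x[1] + 2 * x[2]) / (2 * N) for this model
```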

Consider the hypothesis

$$H\_0 \colon \mathbf{p} = \mathbf{p}(\theta\_0) \text{ vs. } H\_1 \colon \mathbf{p} \neq \mathbf{p}(\theta\_0), \ \theta\_0 = (\theta\_{01}, \dots, \theta\_{0s})^\top \in \Theta \subset \mathbb{R}^s \tag{3}$$

where **p** is the vector of the true but unknown probabilities of the underlying distribution and $\mathbf{p}(\theta_0)$ is the vector of the corresponding probabilities of the hypothesized distribution, which is unknown and falls within the family $\mathcal{P}$ in (2), with the unknown parameters satisfying, in general, certain constraints, e.g., of the form *c*(*θ*) = 0, under which the estimation of the parameter will be performed. The purpose of this work is twofold: having as a reference the divergence measure given in (1), we first propose a general double index divergence class of measures and make inference regarding the parameter estimators involved. We then proceed to the hypothesis testing problem, with emphasis on the concept of conditional independence. The innovative idea proposed in this work is the duality in choosing among the members of the general class of divergences, one for estimation and one for testing purposes, which need not necessarily be the same. In that sense, we propose a double index divergence test statistic offering the greatest possible range of options, both for the strictly convex function Φ and for the indicator value *α* > 0.

Thus, the estimation problem can be examined by considering expression (1) with a function $\Phi_2 \in F^{*}$ and an indicator $\alpha_2 > 0$:

$$d_{\Phi_2}^{\alpha_2}\left(\mathbf{p}, \mathbf{p}(\theta)\right) = \sum_{i=1}^{m} p_i^{1+\alpha_2}(\theta)\, \Phi_2\!\left(\frac{p_i}{p_i(\theta)}\right) \tag{4}$$

the minimization of which, with respect to the unknown parameter, produces the restricted minimum $(\Phi_2, \alpha_2)$ divergence (rMD) estimator

$$\hat{\boldsymbol{\theta}}^{r}_{(\Phi_2, \alpha_2)} = \arg\inf_{\boldsymbol{\theta} \in \boldsymbol{\Theta} : \, \boldsymbol{c}(\boldsymbol{\theta}) = \boldsymbol{0}} d_{\Phi_2}^{\alpha_2}\big(\hat{\mathbf{p}}, \mathbf{p}(\boldsymbol{\theta})\big) \tag{5}$$

for some constraints *c*(*θ*) = 0. Observe that the unknown vector of underlying probabilities has been replaced by the vector of the corresponding sample frequencies $\hat{\mathbf{p}}$. Then, the testing problem will be based on

$$d_{\Phi_1}^{\alpha_1}\left(\hat{\mathbf{p}}, \mathbf{p}\big(\hat{\boldsymbol{\theta}}^{r}_{(\Phi_2, \alpha_2)}\big)\right) = \sum_{i=1}^{m} p_i^{1+\alpha_1}\big(\hat{\boldsymbol{\theta}}^{r}_{(\Phi_2, \alpha_2)}\big)\, \Phi_1\!\left(\frac{\hat{p}_i}{p_i\big(\hat{\boldsymbol{\theta}}^{r}_{(\Phi_2, \alpha_2)}\big)}\right) \tag{6}$$

where $\Phi_1(\cdot)$ and $\alpha_1$ may be different from the corresponding quantities used for the estimation problem in (4). Finally, the duality of the proposed methodology surfaces when the testing problem is explored via the dual divergence test statistic formulated on the basis of the double-$\alpha$-double-$\Phi$ divergence given by

$$d_{\Phi_1}^{\alpha_1}\left(\hat{\mathbf{p}}, \mathbf{p}\big(\hat{\boldsymbol{\theta}}^{r}_{(\Phi_2, \alpha_2)}\big)\right) \tag{7}$$

where $\Phi_1, \Phi_2 \in F^{*}$ and $\alpha_1, \alpha_2 > 0$.
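A schematic, unconstrained Python sketch of this dual use, employing the same toy trinomial model as above and omitting the scaling constants and asymptotic results developed in the following sections, might look as follows; the pair $(\Phi_2, \alpha_2)$ is used for estimation and a possibly different pair $(\Phi_1, \alpha_1)$ for testing, here with both generators taken from the $\Phi_{\alpha}$ family.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def phi_a(x, a):
    """Generator Phi_a(x) = (1/a) * [x(x^a - 1) - a(x - 1)], a != 0."""
    x = np.asarray(x, dtype=float)
    return (x * (x ** a - 1.0) - a * (x - 1.0)) / a

def d(p, q, a, phi):
    """Double index divergence (1): sum_i q_i^(1+a) * phi(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q ** (1.0 + a) * phi(p / q))

def p_theta(theta):
    """Hypothetical one-parameter trinomial model (illustration only)."""
    return np.array([(1 - theta)**2, 2 * theta * (1 - theta), theta**2])

x = np.array([36, 48, 16])
p_hat = x / x.sum()

# Step 1 (estimation): minimum (Phi_2, alpha_2) divergence estimator, unconstrained for simplicity.
a2 = 0.2
res = minimize_scalar(lambda t: d(p_hat, p_theta(t), a2, lambda u: phi_a(u, a2)),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")
theta_r = res.x

# Step 2 (testing): evaluate the divergence with a possibly different pair (Phi_1, alpha_1).
a1 = 0.5
stat = d(p_hat, p_theta(theta_r), a1, lambda u: phi_a(u, a1))
print(theta_r, stat)  # scaling and null distribution of the resulting test are given in Sections 2-3
```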

The remainder of this work is organized as follows: Section 2 presents the formal definition and the asymptotic properties of the rMD estimator (rMDE). Section 3 deals with the general testing problem with the use of the rMDE. The associated set-up for the case of three-way contingency tables is developed in Section 4, together with a simulation study emphasizing the conditional independence of three random variables. We close this work with some conclusions.
