**2. Restricted Minimum (Φ, α)-Power Divergence Estimator**

In what follows, we provide the formal definition and the expansion of the rMD estimator and prove its asymptotic normality. The assumptions required for establishing the results of this section for the rMD estimator under constraints are provided below:

#### **Assumption 1.**

*The matrices*

$$\mathbf{Q}(\boldsymbol{\theta}_{0}) = \left(\frac{\partial f_{k}(\boldsymbol{\theta}_{0})}{\partial \theta_{j}}\right)_{\substack{k=1,\ldots,\nu\\ j=1,\ldots,s}} \quad\text{and}\quad \mathbf{J}(\boldsymbol{\theta}_{0}) = \left(\frac{\partial p_{i}(\boldsymbol{\theta}_{0})}{\partial \theta_{j}}\right)_{\substack{i=1,\ldots,m\\ j=1,\ldots,s}}$$

*are of full rank.*


**Definition 1.** *Under Assumptions (A0)–(A3), the rMD estimator of $\boldsymbol{\theta}_0$ is any vector in $\Theta$ such that*

$$\hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)} = \arg\inf_{\{\boldsymbol{\theta} \in \Theta \subset \mathbb{R}^{s} \,:\, f_{k}(\boldsymbol{\theta}) = 0,\; k = 1, \dots, \nu\}} d^{u}_{\Phi}(\hat{\mathbf{p}}, \mathbf{p}(\boldsymbol{\theta})). \tag{8}$$
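For intuition, the constrained minimisation in (8) can be carried out numerically with a general-purpose optimiser. The sketch below is a minimal illustration only, assuming a hypothetical three-cell model $\mathbf{p}(\boldsymbol{\theta}) = (\theta_1, \theta_2, 1-\theta_1-\theta_2)$, a single constraint $f_1(\boldsymbol{\theta}) = \theta_1 - \theta_2 = 0$, and the Kullback–Leibler divergence as a stand-in for $d^{u}_{\Phi}$; none of these concrete choices come from the text.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy model: p(theta) = (theta1, theta2, 1 - theta1 - theta2).
def p_model(theta):
    return np.array([theta[0], theta[1], 1.0 - theta[0] - theta[1]])

# Observed relative frequencies (hypothetical data).
p_hat = np.array([0.5, 0.3, 0.2])

# Kullback-Leibler divergence d(p_hat, p(theta)), a stand-in for d^u_Phi.
def divergence(theta):
    p = p_model(theta)
    return np.sum(p_hat * np.log(p_hat / p))

# Single equality constraint f_1(theta) = theta1 - theta2 = 0.
constraints = [{"type": "eq", "fun": lambda t: t[0] - t[1]}]

res = minimize(divergence, x0=np.array([0.3, 0.35]),
               method="SLSQP", constraints=constraints,
               bounds=[(1e-4, 0.49)] * 2)
theta_hat = res.x  # restricted minimum divergence estimate
```

For this particular configuration the restricted minimiser is available in closed form ($\theta_1 = \theta_2 = 0.4$), which makes the sketch easy to verify.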

In order to derive the decomposition of $\hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)}$, the Implicit Function Theorem (IFT) is exploited, according to which, if a function has an invertible derivative at a point, then it is invertible in a neighbourhood of this point, although the inverse cannot, in general, be expressed in closed form [23].

**Theorem 1.** *Under Assumptions (A0)–(A5), the rMD estimator of $\boldsymbol{\theta}_0$ is such that*

$$\begin{split} \hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)} &= \boldsymbol{\theta}_{0} + \mathbf{H}(\boldsymbol{\theta}_{0}) \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \operatorname{diag}\left( \mathbf{p}(\boldsymbol{\theta}_{0})^{\alpha/2} \right) \times \\ &\quad \times \operatorname{diag}\left( \mathbf{p}(\boldsymbol{\theta}_{0})^{-1/2} \right) (\hat{\mathbf{p}} - \mathbf{p}(\boldsymbol{\theta}_{0})) + o\left(\left\| \hat{\mathbf{p}} - \mathbf{p}(\boldsymbol{\theta}_{0}) \right\|\right) \end{split} \tag{9}$$

*where $\hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)}$ is unique in a neighbourhood of $\boldsymbol{\theta}_0$ and*

$$\begin{aligned} \mathbf{H}(\boldsymbol{\theta}_{0}) &= \mathbf{I} - \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{Q}(\boldsymbol{\theta}_{0})^{\top} \times \\ & \quad \times \left( \mathbf{Q}(\boldsymbol{\theta}_{0}) \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{Q}(\boldsymbol{\theta}_{0})^{\top} \right)^{-1} \mathbf{Q}(\boldsymbol{\theta}_{0}), \end{aligned}$$

$$\mathbf{B}(\boldsymbol{\theta}_{0}) = \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{\alpha/2}\big)\, \mathbf{A}(\boldsymbol{\theta}_{0}), \quad \text{while} \quad \mathbf{A}(\boldsymbol{\theta}_{0}) = \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{-1/2}\big)\, \mathbf{J}(\boldsymbol{\theta}_{0}).$$
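The matrix $\mathbf{H}(\boldsymbol{\theta}_0)$ is an oblique projection: a direct computation shows that it is idempotent ($\mathbf{H}^2 = \mathbf{H}$) and that $\mathbf{Q}(\boldsymbol{\theta}_0)\mathbf{H}(\boldsymbol{\theta}_0) = \mathbf{0}$, so the expansion (9) moves only along directions compatible with the constraints. The sketch below checks both properties numerically, with random full-rank matrices standing in for $\mathbf{B}(\boldsymbol{\theta}_0)$ and $\mathbf{Q}(\boldsymbol{\theta}_0)$ (the dimensions are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
m, s, nu = 6, 4, 2          # cells, parameters, constraints (illustrative sizes)

B = rng.standard_normal((m, s))    # stands in for B(theta_0), full column rank
Q = rng.standard_normal((nu, s))   # stands in for Q(theta_0), full row rank

M = np.linalg.inv(B.T @ B)
# H = I - M Q^T (Q M Q^T)^{-1} Q, as in Theorem 1.
H = np.eye(s) - M @ Q.T @ np.linalg.inv(Q @ M @ Q.T) @ Q

idempotent = np.allclose(H @ H, H)     # H is a (generally oblique) projection
annihilated = np.allclose(Q @ H, 0.0)  # constraint directions are removed by H
```

Both flags hold for any full-rank choice of the two matrices, which follows by expanding $\mathbf{H}^2$ and $\mathbf{Q}\mathbf{H}$ symbolically.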

**Proof.** Let $V$ be a neighbourhood of $\boldsymbol{\theta}_0$ on which $\mathbf{p}(\cdot)\colon \Theta \to \mathcal{P} \subset l^{m}$ has continuous second partial derivatives, where $l^{m}$ is the interior of the unit cube of dimension $m$. Let

$$\mathbf{F} = (F_{1}, \dots, F_{\nu+s}) \colon l^{m} \times \mathbb{R}^{\nu+s} \to \mathbb{R}^{\nu+s}$$

with

$$F_{j}(\mathbf{p}, \boldsymbol{\lambda}, \boldsymbol{\theta}) = \begin{cases} f_{j}(\boldsymbol{\theta}), & j = 1, \dots, \nu, \\[1ex] \dfrac{\partial d^{u}_{\Phi}(\mathbf{p}, \mathbf{p}(\boldsymbol{\theta}))}{\partial \theta_{j-\nu}} + \displaystyle\sum_{k=1}^{\nu} \lambda_{k} \dfrac{\partial f_{k}(\boldsymbol{\theta})}{\partial \theta_{j-\nu}}, & j = \nu + 1, \dots, \nu + s, \end{cases}$$

where $(\mathbf{p}, \boldsymbol{\lambda}, \boldsymbol{\theta}) = (p_1, \ldots, p_m, \lambda_1, \ldots, \lambda_\nu, \theta_1, \ldots, \theta_s)$ and $\lambda_k$, $k = 1, \ldots, \nu$, are the Lagrange multipliers associated with the constraints.

It holds that

$$F_j(p_1(\boldsymbol{\theta}_0), \dots, p_m(\boldsymbol{\theta}_0), 0, \dots, 0, \theta_{01}, \dots, \theta_{0s}) = 0, \quad j = 1, \dots, \nu + s,$$

and, denoting $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_{\nu+s}) = (\lambda_1, \ldots, \lambda_\nu, \theta_1, \ldots, \theta_s)$, the matrix

$$\frac{\partial \mathbf{F}}{\partial \boldsymbol{\gamma}} = \left(\frac{\partial F_{j}}{\partial \gamma_{k}}\right)_{j,k=1,\ldots,\nu+s} = \begin{pmatrix} \mathbf{0}_{\nu\times\nu} & \mathbf{Q}(\boldsymbol{\theta}_{0}) \\ \mathbf{Q}(\boldsymbol{\theta}_{0})^{\top} & \Phi''(1)\,\mathbf{B}(\boldsymbol{\theta}_{0})^{\top}\mathbf{B}(\boldsymbol{\theta}_{0}) \end{pmatrix}$$

is nonsingular at $(\mathbf{p}, \boldsymbol{\lambda}, \boldsymbol{\theta}) = (\mathbf{p}(\boldsymbol{\theta}_0), \boldsymbol{\gamma}_0) = (p_1(\boldsymbol{\theta}_0), \ldots, p_m(\boldsymbol{\theta}_0), 0, \ldots, 0, \theta_{01}, \ldots, \theta_{0s})$ with $\boldsymbol{\gamma}_0 = (\mathbf{0}_\nu, \boldsymbol{\theta}_0)$.

By the IFT, there exist a neighbourhood $U$ of $(\mathbf{p}(\boldsymbol{\theta}_0), \boldsymbol{\gamma}_0)$ on which $\partial \mathbf{F}/\partial \boldsymbol{\gamma}$ is nonsingular and a unique differentiable function $\boldsymbol{\gamma}^{*} = (\boldsymbol{\lambda}^{*}, \boldsymbol{\theta}^{*}) \colon A \subset l^{m} \to \mathbb{R}^{\nu+s}$ such that $\mathbf{p}(\boldsymbol{\theta}_0) \in A$, $\{(\mathbf{p}, \boldsymbol{\gamma}) \in U : \mathbf{F}(\mathbf{p}, \boldsymbol{\gamma}) = \mathbf{0}\} = \{(\mathbf{p}, \boldsymbol{\gamma}^{*}(\mathbf{p})) : \mathbf{p} \in A\}$ and $\boldsymbol{\gamma}^{*}(\mathbf{p}(\boldsymbol{\theta}_0)) = (\boldsymbol{\lambda}^{*}(\mathbf{p}(\boldsymbol{\theta}_0)), \boldsymbol{\theta}^{*}(\mathbf{p}(\boldsymbol{\theta}_0))) = \boldsymbol{\gamma}_0$. By the chain rule, at $\mathbf{p} = \mathbf{p}(\boldsymbol{\theta}_0)$ we obtain

$$\frac{\partial \mathbf{F}}{\partial \mathbf{p}(\boldsymbol{\theta}_0)} + \frac{\partial \mathbf{F}}{\partial \boldsymbol{\gamma}_0} \frac{\partial \boldsymbol{\gamma}^{*}}{\partial \mathbf{p}(\boldsymbol{\theta}_0)} = \mathbf{0}.$$

Then

$$\frac{\partial \boldsymbol{\gamma}^{*}}{\partial \mathbf{p}(\boldsymbol{\theta}_0)} = \begin{pmatrix} \mathbf{E}(\boldsymbol{\theta}_0) \\ \mathbf{W}(\boldsymbol{\theta}_0) \end{pmatrix},$$

where

$$\begin{split} \mathbf{E}(\boldsymbol{\theta}_{0}) &= \Phi''(1) \left( \mathbf{Q}(\boldsymbol{\theta}_{0}) \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{Q}(\boldsymbol{\theta}_{0})^{\top} \right)^{-1} \times \\ &\quad \times \mathbf{Q}(\boldsymbol{\theta}_{0}) \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{\alpha/2}\big) \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{-1/2}\big) \end{split}$$

and

$$\mathbf{W}(\boldsymbol{\theta}_{0}) = \mathbf{H}(\boldsymbol{\theta}_{0}) \left( \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \mathbf{B}(\boldsymbol{\theta}_{0}) \right)^{-1} \mathbf{B}(\boldsymbol{\theta}_{0})^{\top} \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{\alpha/2}\big) \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{-1/2}\big) \tag{10}$$

since

$$\frac{\partial \mathbf{F}}{\partial \mathbf{p}(\boldsymbol{\theta}_{0})} = \begin{pmatrix} \mathbf{0}_{\nu \times m} \\ -\Phi''(1)\,\mathbf{B}(\boldsymbol{\theta}_{0})^{\top}\operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{\alpha/2}\big)\operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_{0})^{-1/2}\big) \end{pmatrix}.$$

Expanding $\boldsymbol{\theta}^{*}(\mathbf{p})$ around $\mathbf{p}(\boldsymbol{\theta}_0)$ and using (10), since $\boldsymbol{\theta}^{*}(\mathbf{p}(\boldsymbol{\theta}_0)) = \boldsymbol{\theta}_0$, gives

$$\begin{split} \boldsymbol{\theta}^{*}(\mathbf{p}) &= \boldsymbol{\theta}_0 + \mathbf{H}(\boldsymbol{\theta}_0) \left( \mathbf{B}(\boldsymbol{\theta}_0)^{\top} \mathbf{B}(\boldsymbol{\theta}_0) \right)^{-1} \mathbf{B}(\boldsymbol{\theta}_0)^{\top} \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_0)^{\alpha/2}\big) \times \\ &\quad \times \operatorname{diag}\big(\mathbf{p}(\boldsymbol{\theta}_0)^{-1/2}\big) (\mathbf{p} - \mathbf{p}(\boldsymbol{\theta}_0)) + o\left(\|\mathbf{p} - \mathbf{p}(\boldsymbol{\theta}_0)\|\right). \end{split}$$

Since $\hat{\mathbf{p}} \xrightarrow{\,p\,} \mathbf{p}(\boldsymbol{\theta}_0)$, eventually $\hat{\mathbf{p}} \in A$, and then $\boldsymbol{\gamma}^{*}(\hat{\mathbf{p}}) = (\boldsymbol{\lambda}^{*}(\hat{\mathbf{p}}), \boldsymbol{\theta}^{*}(\hat{\mathbf{p}}))$ is the unique solution of the system

$$\begin{aligned} f_k(\boldsymbol{\theta}) &= 0, & k &= 1, \dots, \nu, \\ \frac{\partial d^{u}_{\Phi}(\hat{\mathbf{p}}, \mathbf{p}(\boldsymbol{\theta}))}{\partial \theta_j} + \sum_{k=1}^{\nu} \lambda_k \frac{\partial f_k(\boldsymbol{\theta})}{\partial \theta_j} &= 0, & j &= 1, \dots, s, \end{aligned}$$

and $(\hat{\mathbf{p}}, \boldsymbol{\gamma}^{*}(\hat{\mathbf{p}})) \in U$. Hence, $\boldsymbol{\theta}^{*}(\hat{\mathbf{p}})$ coincides with the rMDE $\hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)}$ given in (9).

The theorem below establishes the asymptotic normality of the rMDE; it is a straightforward extension of Theorem 2.4 of [11], since by the Central Limit Theorem we know that

$$\sqrt{N}(\hat{\mathbf{p}} - \mathbf{p}(\boldsymbol{\theta}_0)) \xrightarrow[N \to \infty]{\mathcal{L}} \mathrm{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{p}(\boldsymbol{\theta}_0)}) \tag{11}$$

with the asymptotic variance-covariance matrix $\boldsymbol{\Sigma}_{\mathbf{p}(\boldsymbol{\theta}_0)} = \operatorname{diag}(\mathbf{p}(\boldsymbol{\theta}_0)) - \mathbf{p}(\boldsymbol{\theta}_0)\mathbf{p}(\boldsymbol{\theta}_0)^{\top}$.
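The covariance in (11) is the exact multinomial one, so it is easy to check by simulation. The sketch below (with hypothetical cell probabilities and sample sizes) compares the Monte Carlo covariance of $\sqrt{N}(\hat{\mathbf{p}} - \mathbf{p})$ against $\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{\top}$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.3])      # hypothetical cell probabilities p(theta_0)
N, R = 500, 20000                  # sample size and Monte Carlo replications

counts = rng.multinomial(N, p, size=R)       # R multinomial samples of size N
Z = np.sqrt(N) * (counts / N - p)            # sqrt(N) (p_hat - p(theta_0))
S_emp = np.cov(Z, rowvar=False)              # empirical covariance matrix
Sigma = np.diag(p) - np.outer(p, p)          # diag(p) - p p^T from (11)

max_err = np.max(np.abs(S_emp - Sigma))      # should be small (MC error only)
```

Since $\operatorname{Cov}(\sqrt{N}\hat{\mathbf{p}}) = \boldsymbol{\Sigma}_{\mathbf{p}}$ holds exactly for the multinomial distribution, the discrepancy is pure Monte Carlo noise of order $R^{-1/2}$.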

**Theorem 2.** *Under Assumptions (A0)–(A5), by (11) and for $\mathbf{W}(\boldsymbol{\theta}_0)$ given in (10), the asymptotic distribution of the rMDE is the s-dimensional normal distribution given by*

$$\sqrt{N}(\hat{\boldsymbol{\theta}}^{r}_{(\Phi,\alpha)} - \boldsymbol{\theta}_{0}) \xrightarrow[N \to \infty]{\mathcal{L}} \mathrm{N}_{s}\big(\mathbf{0},\, \mathbf{W}(\boldsymbol{\theta}_{0})\, \boldsymbol{\Sigma}_{\mathbf{p}(\boldsymbol{\theta}_{0})}\, \mathbf{W}(\boldsymbol{\theta}_{0})^{\top}\big).$$
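Given the building blocks of Theorem 1, the asymptotic covariance $\mathbf{W}(\boldsymbol{\theta}_0)\boldsymbol{\Sigma}_{\mathbf{p}(\boldsymbol{\theta}_0)}\mathbf{W}(\boldsymbol{\theta}_0)^{\top}$ can be assembled directly. The sketch below does so for illustrative random stand-ins for $\mathbf{J}(\boldsymbol{\theta}_0)$ and $\mathbf{Q}(\boldsymbol{\theta}_0)$ and a hypothetical $\alpha = 0.5$ (none of these choices come from the text), and checks that the result is symmetric and positive semi-definite, as any covariance matrix must be:

```python
import numpy as np

rng = np.random.default_rng(2)
m, s, nu, alpha = 5, 3, 1, 0.5             # illustrative dimensions and alpha

p0 = rng.dirichlet(np.ones(m))             # stands in for p(theta_0)
J = rng.standard_normal((m, s))            # stands in for J(theta_0)
Q = rng.standard_normal((nu, s))           # stands in for Q(theta_0)

A = np.diag(p0 ** -0.5) @ J                # A = diag(p^{-1/2}) J
B = np.diag(p0 ** (alpha / 2)) @ A         # B = diag(p^{alpha/2}) A
M = np.linalg.inv(B.T @ B)
H = np.eye(s) - M @ Q.T @ np.linalg.inv(Q @ M @ Q.T) @ Q
W = H @ M @ B.T @ np.diag(p0 ** (alpha / 2)) @ np.diag(p0 ** -0.5)  # eq. (10)

Sigma_p = np.diag(p0) - np.outer(p0, p0)   # multinomial covariance from (11)
V = W @ Sigma_p @ W.T                      # asymptotic covariance of the rMDE

symmetric = np.allclose(V, V.T)
psd = np.min(np.linalg.eigvalsh(V)) > -1e-10
```

Both properties hold for any inputs, since $\mathbf{V} = \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^{\top}$ with $\boldsymbol{\Sigma}$ positive semi-definite is a sandwich form.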

**Remark 1.** *The proposed class of estimators forms a family that goes beyond the index α: it is easy to see that the estimators associated with Csiszár's φ-divergence family are obtained for α* = 0 *in (1), as is the standard equiprobable model.*
