*3.1. Definitions of New Estimators*

In this section, we define robust versions of the estimators $\widehat{t}\_\theta$ from (18) and robust versions of the minimum empirical divergence estimators $\widehat{\theta}\_\varphi$ from (20). First, we define robust estimators of $t\_\theta$ by using a truncated version of the function $x \mapsto \frac{\partial}{\partial t} m(x, \theta, t)$, and then we insert such a robust estimator into the estimating equation corresponding to the minimum empirical divergence estimator. The truncated function is based on the multidimensional Huber function and contains a shift vector $\tau\_\theta$ and a scale matrix $A\_\theta$ for calibration, so that $t\_\theta$, which realizes the supremum in the duality formula, is also the solution of a new equation based on the truncated function.

For simplicity, for fixed $\theta \in \Theta$, we also use the notation $m\_{\theta}(x, t) := m(x, \theta, t)$. With this notation, $t\_{\theta} = t\_{\theta}(P\_0)$ defined in (16) is the unique solution of the equation:

$$\int \frac{\partial}{\partial t} m\_{\theta}(\mathbf{x}, t\_{\theta}(P\_0))dP\_0(\mathbf{x}) = 0. \tag{26}$$

Consider the system

$$\int \frac{\partial}{\partial t} m\_{\theta}(y, t) dP\_{0}(y) = 0 \tag{27}$$

$$\int H\_{c}(A[\frac{\partial}{\partial t}m\_{\theta}(y,t) - \tau])dP\_{0}(y) = 0 \tag{28}$$

$$\int H\_{c}(A[\frac{\partial}{\partial t}m\_{\theta}(y,t)-\tau])H\_{c}(A[\frac{\partial}{\partial t}m\_{\theta}(y,t)-\tau])^{\top}dP\_{0}(y) = I\_{\ell+1} \tag{29}$$

where

$$H\_c(y) := \begin{cases} \ y \cdot \min\left(1, \frac{c}{\|y\|}\right) \text{ if } y \neq 0\\ 0 \text{ if } y = 0 \end{cases} \tag{30}$$

is the multidimensional Huber function, with $c > 0$, $I\_{\ell+1}$ the identity matrix, $A$ an $(\ell+1) \times (\ell+1)$ matrix, and $\tau \in \mathbb{R}^{\ell+1}$. For fixed $\theta$, this system admits a unique solution $(t, A, \tau) = (t\_{\theta}(P\_0), A\_{\theta}(P\_0), \tau\_{\theta}(P\_0))$ (according to [18], p. 17).

The multidimensional Huber function is useful for defining robust estimators; it maps each point outside the hypersphere of radius $c$ to the nearest point on it and leaves the points inside unchanged (see [26], p. 239, [27]). By applying the multidimensional Huber function to the function $y \mapsto \frac{\partial}{\partial t} m\_{\theta}(y, t)$, together with the scale matrix $A\_{\theta}$ and the shift vector $\tau\_{\theta}$, a modification is produced only where the norm exceeds the bound $c$, while the original $t\_{\theta}$ remains the solution of the equation based on the new truncated function. For parametric models, the multidimensional Huber function has also been used in other contexts, for example to define optimal $B\_s$-robust estimators or optimal $B\_i$-robust estimators (see [26], p. 244).
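The action of the multidimensional Huber function (30) can be illustrated in a few lines of code. The following is a minimal sketch (not code from the paper), projecting any point outside the ball of radius $c$ onto its boundary and leaving interior points unchanged:

```python
import numpy as np

def huber_multidim(y, c):
    """Multidimensional Huber function H_c of (30): leaves points with
    ||y|| <= c unchanged and maps the others to the nearest point of
    the hypersphere of radius c."""
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(y)
    if norm == 0.0:
        return np.zeros_like(y)       # H_c(0) = 0
    return y * min(1.0, c / norm)     # y * min(1, c / ||y||)
```

For instance, `huber_multidim([3.0, 4.0], 2.5)` returns `[1.5, 2.0]` (the point of norm 5 is rescaled onto the sphere of radius 2.5), while any point of norm at most 2.5 is returned unchanged.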

The above arguments can be used for each probability measure $P$ from the moment condition model $\mathcal{M}^1$. This context allows defining the truncated version of the function $y \mapsto \frac{\partial}{\partial t} m\_{\theta}(y, t)$, which we denote by $\psi\_{\theta}(y, t)$, such that the original $t\_{\theta}(P\_0)$, the solution of Equation (26), is also the solution of the equation $\int \psi\_{\theta}(y, t\_{\theta}(P\_0)) dP\_0(y) = 0$.

For fixed $\theta$ and a probability measure $P$, the equation $\int \frac{\partial}{\partial t} m\_{\theta}(y, t) dP(y) = 0$ has a unique solution $t = t\_{\theta}(P) \in \Lambda\_{\theta}(P)$, attaining the supremum in the dual form of the divergence $D\_{\varphi}(\mathcal{M}\_{\theta}, P)$ (see [12]). For each $t$, we define $A\_{\theta}(t)$ and $\tau\_{\theta}(t)$ as the solutions of the system:

$$\int H\_{c}(A\_{\theta}(t)[\frac{\partial}{\partial t} m\_{\theta}(y, t) - \tau\_{\theta}(t)]) dP(y) = 0 \tag{31}$$

$$\int H\_{c}(A\_{\theta}(t)[\frac{\partial}{\partial t}m\_{\theta}(y,t)-\tau\_{\theta}(t)])H\_{c}(A\_{\theta}(t)[\frac{\partial}{\partial t}m\_{\theta}(y,t)-\tau\_{\theta}(t)])^{\top}dP(y) = I\_{\ell+1}. \tag{32}$$
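For an empirical measure, the calibration pair $(\tau\_{\theta}(t), A\_{\theta}(t))$ solving (31) and (32) can be approximated by a simple fixed-point iteration, alternating a weighted-mean update for the shift with a Cholesky-based update for the scale matrix. The sketch below is one possible scheme under our own simplifications (the paper does not prescribe an algorithm); here `z` stands for the sample of values $\frac{\partial}{\partial t} m\_{\theta}(y\_i, t)$ at a fixed $t$:

```python
import numpy as np

def huber_multidim(y, c):
    # multidimensional Huber function H_c of (30)
    n = np.linalg.norm(y)
    return y * min(1.0, c / n) if n > 0 else np.zeros_like(y)

def calibrate_shift_scale(z, c, n_iter=300):
    """Fixed-point iteration for (tau, A): at convergence, the empirical
    analogues of (31) and (32) hold, i.e., the Huberized residuals
    H_c(A(z_i - tau)) average to zero and their outer products average
    to the identity.  z has shape (n_samples, dim)."""
    n_samples, dim = z.shape
    tau = z.mean(axis=0)
    A = np.linalg.inv(np.linalg.cholesky(np.cov(z, rowvar=False)))
    for _ in range(n_iter):
        r = (z - tau) @ A.T                                    # A(z_i - tau)
        w = np.minimum(1.0, c / np.maximum(np.linalg.norm(r, axis=1), 1e-12))
        tau = (w[:, None] * z).sum(axis=0) / w.sum()           # zeroes the weighted residuals, cf. (31)
        d = z - tau
        S = ((w ** 2)[:, None] * d).T @ d / n_samples          # mean H_c H_c^T equals A S A^T
        A = np.linalg.inv(np.linalg.cholesky(S))               # forces A S A^T = I, cf. (32)
    return tau, A
```

The Cholesky step uses that $A S A^{\top} = I$ whenever $A = L^{-1}$ with $S = L L^{\top}$; at the returned pair, the sample versions of (31) and (32) are satisfied up to the convergence tolerance of the iteration.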

We define a new estimator $\widehat{t}^{\,c}\_{\theta}$ of $t\_{\theta} = t\_{\theta}(P\_0)$ as the Z-estimator corresponding to the $\psi$-function:

$$\psi\_{\theta}(\mathbf{x},t) := H\_c\left(A\_{\theta}(t)[\frac{\partial}{\partial t}m\_{\theta}(\mathbf{x},t) - \tau\_{\theta}(t)]\right);\tag{33}$$

more precisely, $\widehat{t}^{\,c}\_{\theta}$ is defined by

$$\int \psi\_{\theta}(y,\widehat{t}^{\,c}\_{\theta})dP\_{n}(y) = 0 \quad \text{or} \quad \sum\_{i=1}^{n} H\_{c}(A\_{\theta}(\widehat{t}^{\,c}\_{\theta})[\frac{\partial}{\partial t}m\_{\theta}(X\_{i},\widehat{t}^{\,c}\_{\theta}) - \tau\_{\theta}(\widehat{t}^{\,c}\_{\theta})]) = 0,\tag{34}$$

the theoretical counterpart of this estimating equation being

$$
\int \psi\_{\theta}(y, t\_{\theta}(P\_0))dP\_0(y) = 0. \tag{35}
$$
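In dimension one, the structure of the estimating Equation (34) reduces to a clipped-residual equation that can be solved by bisection. The sketch below is a deliberately simplified illustration (a toy location problem with $\frac{\partial}{\partial t} m\_{\theta}(x, t)$ replaced by the hypothetical stand-in $x - t$, and the calibration frozen at $A = 1$, $\tau = 0$), intended only to show how the truncation bounds the influence of outliers:

```python
import numpy as np

def robust_t_hat(x, c, n_iter=200):
    """Solve sum_i H_c(x_i - t) = 0 by bisection; in one dimension,
    H_c is just clipping to [-c, c].  The left-hand side is decreasing
    in t, positive at min(x) and negative at max(x)."""
    lo, hi = x.min(), x.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.clip(x - mid, -c, c).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On a sample with a gross outlier, the clipped equation keeps the solution near the bulk of the data, whereas its unclipped counterpart (the sample mean) is dragged towards the outlier.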

For a given probability measure $P$, the statistical functional $t^{c}\_{\theta}(P)$ associated with the estimator $\widehat{t}^{\,c}\_{\theta}$, whenever it exists, is defined by

$$\int \psi\_{\theta}(y, t^{c}\_{\theta}(P))dP(y) = \int H\_{c}(A\_{\theta}(t^{c}\_{\theta}(P)) [\frac{\partial}{\partial t} m\_{\theta}(y, t^{c}\_{\theta}(P)) - \tau\_{\theta}(t^{c}\_{\theta}(P))]) dP(y) = 0. \tag{36}$$

Note that

$$t\_\theta^c(P\_0) = t\_\theta(P\_0),\tag{37}$$

by construction.

**Remark 1.** *We notice a similarity between the Z-estimator defined in* (34) *and the classical optimal $B\_s$-robust estimator for parametric models from [26]. In the case of parametric models, the M-estimator corresponding to the $\psi$-function* (33)*, but defined with the classical score function $\frac{\partial}{\partial t} \ln f\_t(x) = \frac{\partial f\_t(x)/\partial t}{f\_t(x)}$ instead of the function $\frac{\partial}{\partial t} m\_{\theta}(x, t)$ (including in the system* (31) *and* (32) *defining $A\_{\theta}(t)$ and $\tau\_{\theta}(t)$), is the classical optimal $B\_s$-robust estimator ($f\_t(x)$ denotes the density corresponding to a parametric model indexed by the parameter $t$). The classical optimal $B\_s$-robust estimator for parametric models has the optimality property of minimizing a measure of the asymptotic mean-squared error among all Fisher-consistent estimators with a self-standardized sensitivity smaller than the positive constant $c$.*

In the following, for a given divergence, using the estimators $\widehat{t}^{\,c}\_{\theta}$ of $t\_{\theta}(P\_0)$, we construct new estimators of the parameter $\theta\_0$ of the model. In Section 3.3, we prove that all the estimators $\widehat{t}^{\,c}\_{\theta}$ are robust, and this property transfers to the new estimators that we define for the parameter $\theta\_0$.

To define new estimators of $\theta\_0$, we use the dual representation (15) of the divergence $D\_{\varphi}(\mathcal{M}\_{\theta}, P\_0)$. Since

$$\theta\_0 = \arg\inf\_{\theta \in \Theta} D\_{\varphi}(\mathcal{M}\_{\theta}, P\_0) = \arg\inf\_{\theta \in \Theta} \sup\_{t \in \Lambda\_{\theta}} \int m\_{\theta}(y, t) dP\_0(y) \tag{38}$$

$$=\arg\inf\_{\theta\in\Theta}\int m\_{\theta}(y, t\_{\theta}(P\_0))dP\_0(y),\tag{39}$$

$\theta = \theta\_0$ is the solution of the equation:

$$\int \frac{\partial}{\partial \theta} [m(y, \theta, t(\theta, P\_0))] dP\_0(y) = 0,\tag{40}$$

where we used the notation $t(\theta, P) := t\_{\theta}(P)$. Equation (40) may be written as

$$\int \frac{\partial}{\partial \theta} m(y, \theta\_0, t(\theta\_0, P\_0)) dP\_0(y) + \frac{\partial}{\partial \theta} t(\theta\_0, P\_0)^\top \int \frac{\partial}{\partial t} m(y, \theta\_0, t(\theta\_0, P\_0)) dP\_0(y) = 0. \tag{41}$$

On the basis of the definition of $t\_{\theta}(P\_0) = t(\theta, P\_0)$, for $\theta = \theta\_0$, we have

$$\int \frac{\partial}{\partial t} m(y, \theta\_0, t(\theta\_0, P\_0)) dP\_0(y) = 0;\tag{42}$$

therefore, we deduce that $\theta = \theta\_0$ is the solution of the equation:

$$\int \frac{\partial}{\partial \theta} m(y, \theta, t(\theta, P\_0)) dP\_0(y) = 0. \tag{43}$$

Using (37), namely $t^{c}(\theta, P\_0) = t(\theta, P\_0)$, we obtain that $\theta = \theta\_0$ is in fact the solution of the equation:

$$\int \frac{\partial}{\partial \theta} m(y, \theta, t^c(\theta, P\_0)) dP\_0(y) = 0. \tag{44}$$

Then, we define a new estimator $\widehat{\theta}^{\,c}\_{\varphi}$ of $\theta\_0$ as a plug-in estimator, the solution of the equation:

$$\int \frac{\partial}{\partial \theta} m(y, \widehat{\theta}^{\,c}\_{\varphi}, t^{c}(\widehat{\theta}^{\,c}\_{\varphi}, P\_{n})) dP\_{n}(y) = 0. \tag{45}$$
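The plug-in construction in (45) is a nested solve: for each candidate $\theta$, an inner Z-equation yields $t^{c}(\theta, P\_n)$, and the outer equation in $\theta$ is then solved with this inner solution plugged in. The sketch below shows only this nesting pattern on a toy one-dimensional problem, with hypothetical stand-ins `inner_t` and `plug_in_theta` for the inner and outer estimating functions (it does not implement the paper's $m$):

```python
import numpy as np

def inner_t(theta, x, c, n_iter=200):
    """Toy inner Z-equation: find t with sum_i clip(x_i - theta - t, -c, c) = 0,
    playing the role of t^c(theta, P_n)."""
    lo, hi = (x - theta).min(), (x - theta).max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.clip(x - theta - mid, -c, c).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def plug_in_theta(x, c, n_iter=200):
    """Toy outer equation: find theta at which the plugged-in inner
    solution vanishes, mimicking the nesting of Equation (45)."""
    lo, hi = x.min(), x.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if inner_t(mid, x, c) > 0:      # inner_t is decreasing in theta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The design mirrors (45): the outer root-finder never sees the inner estimating equation, only its solution as a function of $\theta$.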

For a probability measure $P$, the statistical functional $T^{c}$ corresponding to the estimator $\widehat{\theta}^{\,c}\_{\varphi}$, whenever it exists, is defined by

$$\int \frac{\partial}{\partial \theta} m(y, T^c(P), t^c(T^c(P), P)) dP(y) = 0. \tag{46}$$

The functional $T^{c}$ is Fisher-consistent, because

$$T^{c}(P\_0) = \theta\_0. \tag{47}$$

This equality is obtained by using (46) for $P = P\_0$, the fact that $t^{c}(T^{c}(P\_0), P\_0) = t(T^{c}(P\_0), P\_0)$, and the definition of $t\_{\theta}(P\_0) = t(\theta, P\_0)$ for $\theta = T^{c}(P\_0)$, all of which lead to

$$\int \frac{\partial}{\partial \theta} m(y, T^c(P\_0), t(T^c(P\_0), P\_0)) dP\_0(y) + \frac{\partial}{\partial \theta} t(T^c(P\_0), P\_0)^\top \int \frac{\partial}{\partial t} m(y, T^c(P\_0), t(T^c(P\_0), P\_0)) dP\_0(y) = 0. \tag{48}$$

Since $\theta\_0$ is the unique solution of Equation (41) and, according to (48), $T^{c}(P\_0)$ would be another solution of the same equation, we deduce (47).

From (34) and (45), we have

$$\int \psi\_{\widehat{\theta}^{\,c}\_{\varphi}}(y, t^{c}(\widehat{\theta}^{\,c}\_{\varphi}, P\_{n}))dP\_{n}(y) = 0, \quad \int \frac{\partial}{\partial \theta} m(y, \widehat{\theta}^{\,c}\_{\varphi}, t^{c}(\widehat{\theta}^{\,c}\_{\varphi}, P\_{n}))dP\_{n}(y) = 0,$$

and then,

$$\begin{cases} \int \psi(y, \widehat{\theta}^{\,c}\_{\varphi}, \widehat{t}^{\,c}\_{\widehat{\theta}^{\,c}\_{\varphi}}) dP\_n(y) = 0, \\ \int \frac{\partial}{\partial \theta} m(y, \widehat{\theta}^{\,c}\_{\varphi}, \widehat{t}^{\,c}\_{\widehat{\theta}^{\,c}\_{\varphi}}) dP\_n(y) = 0, \end{cases}$$

with $\psi(y, \theta, t) := \psi\_{\theta}(y, t)$. The pair of estimators $(\widehat{\theta}^{\,c}\_{\varphi}, \widehat{t}^{\,c}\_{\widehat{\theta}^{\,c}\_{\varphi}})$ can be viewed as a Z-estimator solution of the above system. Denoting

$$\Psi(y,\theta,t) := (\psi(y,\theta,t)^\top, (\frac{\partial}{\partial\theta}m(y,\theta,t))^\top)^\top,\tag{49}$$

the Z-estimators $(\widehat{\theta}^{\,c}\_{\varphi}, \widehat{t}^{\,c}\_{\widehat{\theta}^{\,c}\_{\varphi}})$ are the solutions of the system:

$$\int \Psi(y, \widehat{\theta}^{\,c}\_{\varphi}, \widehat{t}^{\,c}\_{\widehat{\theta}^{\,c}\_{\varphi}}) dP\_n(y) = 0,\tag{50}$$

and the theoretical counterpart is given by

$$\int \Psi(y,\theta\_0,t\_{\theta\_0})dP\_0(y) = 0.\tag{51}$$
