*Article* **Robust Model Selection Criteria Based on Pseudodistances**

#### **Aida Toma <sup>1,2,\*</sup>, Alex Karagrigoriou <sup>3</sup> and Paschalini Trentou <sup>3</sup>**


Received: 17 February 2020; Accepted: 3 March 2020; Published: 6 March 2020

**Abstract:** In this paper, we introduce a new class of robust model selection criteria. These criteria are defined by estimators of the expected overall discrepancy using pseudodistances and the minimum pseudodistance principle. Theoretical properties of these criteria are proved, namely asymptotic unbiasedness, robustness and consistency, as well as the limit laws. The case of linear regression models is studied and a specific pseudodistance-based criterion is proposed. Monte Carlo simulations and applications to real data are presented in order to illustrate the performance of the new methodology. These examples show that the new selection criterion for regression models is a good competitor of some well-known criteria and may have superior performance, especially in the case of small and contaminated samples.

**Keywords:** model selection; minimum pseudodistance estimation; robustness

#### **1. Introduction**

Model selection is fundamental to the practical applications of statistics, and there is a substantial literature on this issue. Classical model selection criteria include, among others, the *Cp*-criterion, the Akaike Information Criterion (AIC), based on the Kullback-Leibler divergence, the Bayesian Information Criterion (BIC), and the General Information Criterion (GIC), which corresponds to a general class of criteria that also estimate the Kullback-Leibler divergence. These criteria were proposed respectively in [1–4], and represent powerful tools for choosing the best model among different candidate models that can be used to fit a given data set. On the other hand, many classical procedures for model selection are extremely sensitive to outliers and to other departures from the distributional assumptions of the model. Robust versions of classical model selection criteria, which are not strongly affected by outliers, have been proposed for example in [5–7]. Some recent proposals for robust model selection are criteria based on divergences and minimum divergence estimators. We recall here the Divergence Information Criteria (DIC) based on the density power divergences introduced in [8], the Modified Divergence Information Criteria (MDIC) introduced in [9] and the criteria based on minimum dual divergence estimators introduced in [10].

Interest in statistical methods based on divergence measures has grown significantly in recent years. For a wide variety of models, statistical methods based on divergences have high model efficiency and are also robust, representing attractive alternatives to the classical methods. We refer to the monographs [11,12] for an excellent presentation of such methods, their importance and their applications. The pseudodistances that we use in the present paper were originally introduced in [13], where they are called "type-0" divergences and corresponding minimum divergence estimators have been studied. They are also presented and extensively studied in [14], where they are called *γ*-divergences, as well as in [15] in the context of decomposable pseudodistances. Like divergences, the pseudodistances are not mathematical metrics in the strict sense of the term. They satisfy two properties, namely nonnegativity and the fact that the pseudodistance between two probability measures equals zero if and only if the two measures are equal. The divergences are moreover characterized by the information processing property, that is, complete invariance with respect to statistically sufficient transformations of the observation space. In general, a pseudodistance may not satisfy this property. We have adopted the term pseudodistance for this reason, but in the literature we can also encounter the other terms mentioned above.

The pseudodistances that we consider in this paper have also been used to define robustness and efficiency measures, as well as the corresponding optimal robust M-estimators following Hampel's infinitesimal approach in [16]. The minimum pseudodistance estimators for general parametric models have been studied in [15] and consist of minimizing an empirical version of a pseudodistance between the assumed theoretical model and the true model underlying the data. These estimators have the advantage of not requiring any prior smoothing and combine robustness with high efficiency, providing a high degree of stability under model misspecification, often with a minimal loss in model efficiency. Such estimators are also defined and studied in the case of the multivariate normal model, as well as for linear regression models, in [17,18], where applications to portfolio optimization models are also presented.

In the present paper we propose new criteria for model selection, based on pseudodistances and on minimum pseudodistance estimators. These new criteria have robustness properties, are asymptotically unbiased, consistent and compare well with some other known model selection criteria, even for small samples.

The paper is organized as follows. Section 2 is devoted to minimum pseudodistance estimators and to their asymptotic properties, which will be needed in the subsequent sections. Section 3 presents new estimators of the expected overall discrepancy using pseudodistances, together with the corresponding theoretical properties, including robustness, consistency and limit laws. The new asymptotically unbiased model selection criteria are presented in Section 3.3, where the case of the univariate normal model and the case of linear regression models are investigated. Applications based on Monte Carlo simulations and on real data, illustrating the performance of the new methodology in the case of linear regression models, are included in Section 4.

#### **2. Minimum Pseudodistance Estimators**

The construction of the new model selection criteria is based on the following family of pseudodistances (see [15]). For two probability measures $P$ and $Q$ admitting densities $p$ and $q$, respectively, with respect to the Lebesgue measure, the family of pseudodistances of order $\gamma > 0$ is defined by

$$R_{\gamma}(P, Q) = \frac{1}{\gamma + 1} \ln \left( \int p^{\gamma} \mathrm{d}P \right) + \frac{1}{\gamma(\gamma + 1)} \ln \left( \int q^{\gamma} \mathrm{d}Q \right) - \frac{1}{\gamma} \ln \left( \int p^{\gamma} \mathrm{d}Q \right) \tag{1}$$

and satisfies the limit relation

$$\lim_{\gamma \to 0} R_{\gamma}(P, Q) = R_0(P, Q), \tag{2}$$

where $R_0(P, Q) := \int \ln\left(\frac{q}{p}\right) \mathrm{d}Q$ is the modified Kullback-Leibler divergence.
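As a quick numerical illustration (ours, not part of the paper), the pseudodistance (1) can be approximated by a midpoint rule. The sketch below, under the assumption of two univariate normal densities, checks nonnegativity, that the pseudodistance vanishes when the two measures coincide, and the limit relation (2); all function names are hypothetical.

```python
import math

def normal_pdf(mu, sigma):
    """Density of N(mu, sigma^2)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: c * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def r_gamma(p, q, gamma, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of the pseudodistance R_gamma(P, Q) of
    Equation (1) for densities p, q essentially supported on [lo, hi].
    Note that int p^gamma dP = int p^(gamma+1) dlambda, and similarly for Q."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    ip = sum(p(x) ** (gamma + 1.0) for x in xs) * dx    # int p^gamma dP
    iq = sum(q(x) ** (gamma + 1.0) for x in xs) * dx    # int q^gamma dQ
    ipq = sum(p(x) ** gamma * q(x) for x in xs) * dx    # int p^gamma dQ
    return (math.log(ip) / (gamma + 1.0)
            + math.log(iq) / (gamma * (gamma + 1.0))
            - math.log(ipq) / gamma)

p = normal_pdf(0.0, 1.0)
q = normal_pdf(1.0, 1.0)
r_pq = r_gamma(p, q, 0.5)      # positive for distinct laws
r_pp = r_gamma(p, p, 0.5)      # zero when the two measures are equal
r_small = r_gamma(p, q, 0.01)  # near the modified KL divergence (1/2 here)
```

For this mean-shifted pair the modified Kullback-Leibler divergence is $\Delta^2/2 = 1/2$, and `r_small` is already close to it at $\gamma = 0.01$.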

Let $(P_\theta)$ be a parametric model indexed by $\theta \in \Theta$, where $\Theta$ is a $d$-dimensional parameter space, and let $p_\theta$ be the corresponding densities with respect to the Lebesgue measure $\lambda$. Let $X_1, \ldots, X_n$ be a random sample from $P_{\theta_0}$, $\theta_0 \in \Theta$. For fixed $\gamma > 0$, a minimum pseudodistance estimator of the unknown parameter $\theta_0$ of the law $P_{\theta_0}$ is defined by replacing the measure $P_{\theta_0}$ in the pseudodistance $R_\gamma(P_\theta, P_{\theta_0})$ by the empirical measure $P_n$ pertaining to the sample, and then minimizing this empirical quantity with respect to $\theta$ over the parameter space. Since the middle term in $R_\gamma(P_\theta, P_{\theta_0})$ does not depend on $\theta$, these estimators are defined by

$$\widehat{\theta}_{n} = \arg\min_{\theta \in \Theta} \left\{ \frac{1}{\gamma + 1} \ln \left( \int p_{\theta}^{\gamma + 1} \mathrm{d}\lambda \right) - \frac{1}{\gamma} \ln \left( \frac{1}{n} \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i) \right) \right\}, \tag{3}$$

or equivalently as

$$\widehat{\theta}_{n} = \arg\max_{\theta \in \Theta} \left\{ C_{\gamma}(\theta)^{-1} \cdot \frac{1}{n} \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i) \right\}, \tag{4}$$

where $C_\gamma(\theta) = \left( \int p_\theta^{\gamma+1} \mathrm{d}\lambda \right)^{\gamma/(\gamma+1)}$. Denoting $h(x, \theta) := C_\gamma(\theta)^{-1} \cdot p_\theta^{\gamma}(x)$, these estimators can be written as

$$\widehat{\theta}_{n} = \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} h(X_i, \theta). \tag{5}$$

The optimum given above need not be uniquely defined. On the other hand,

$$\arg\max\_{\theta \in \Theta} \int h(\mathbf{x}, \theta)dP\_{\theta\_0}(\mathbf{x}) = \theta\_0 \tag{6}$$

and here $\theta_0$ is the unique optimizer, since $R_\gamma(P_\theta, P_{\theta_0}) = 0$ implies $\theta = \theta_0$.
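To make (5) concrete, here is a minimal grid-search sketch (ours, not from the paper) for the normal location model $N(\theta, 1)$. For a location family, $C_\gamma(\theta)$ does not depend on $\theta$, so maximizing the sample mean of $h(X_i, \theta)$ reduces to maximizing $\sum_i p_\theta^\gamma(X_i)$, and constant factors of the density can be dropped as well. The data values are hypothetical, with one gross outlier added to hint at the robustness studied later.

```python
import math

def mpd_location(xs, gamma, lo=-3.0, hi=3.0, step=0.001):
    """Grid-search version of the argmax in Equation (5) for the N(theta, 1)
    model: maximizes sum_i exp(-gamma * (x_i - theta)^2 / 2), which is
    sum_i p_theta(X_i)^gamma up to a constant factor."""
    def objective(theta):
        return sum(math.exp(-0.5 * gamma * (x - theta) ** 2) for x in xs)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=objective)

# Hypothetical sample centered near 0, plus one gross outlier.
data = [-1.2, -0.5, 0.3, 0.8, -0.1, 0.6, -0.7, 0.2, 1.1, 50.0]
theta_hat = mpd_location(data, gamma=0.5)   # stays near 0
sample_mean = sum(data) / len(data)         # dragged toward the outlier
```

The outlier contributes essentially nothing to the objective, so the estimate remains close to the center of the clean observations, while the sample mean is pulled to about 5.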

Define

$$R\_{\gamma}(\theta\_0) := \max\_{\theta \in \Theta} \int h(\mathbf{x}, \theta) dP\_{\theta\_0}(\mathbf{x}) = \int h(\mathbf{x}, \theta\_0) dP\_{\theta\_0}(\mathbf{x}).$$

An estimator of *Rγ*(*θ*0) is defined by

$$\widehat{R}_{\gamma}(\theta_0) := \max_{\theta \in \Theta} \int h(\mathbf{x}, \theta) \mathrm{d}P_n(\mathbf{x}) = \max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n h(X_i, \theta) = \frac{1}{n} \sum_{i=1}^n h(X_i, \widehat{\theta}_n). \tag{7}$$
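The plug-in quantity (7) is easy to compute once $h$ is available in closed form. A sketch (ours) for the illustrative $N(\theta, 1)$ model, where $\int p_\theta^{\gamma+1} \mathrm{d}\lambda = (2\pi)^{-\gamma/2} (\gamma+1)^{-1/2}$, evaluated at a hypothetical fitted location:

```python
import math

GAMMA = 0.5

def h(x, theta):
    """h(x, theta) = C_gamma(theta)^(-1) * p_theta(x)^gamma for the N(theta, 1)
    model, where int p_theta^(gamma+1) dlambda = (2 pi)^(-gamma/2) (gamma+1)^(-1/2)."""
    int_pg1 = (2.0 * math.pi) ** (-GAMMA / 2.0) / math.sqrt(GAMMA + 1.0)
    c_gamma = int_pg1 ** (GAMMA / (GAMMA + 1.0))
    p_pow = (2.0 * math.pi) ** (-GAMMA / 2.0) * math.exp(-0.5 * GAMMA * (x - theta) ** 2)
    return p_pow / c_gamma

def r_hat(xs, theta_hat):
    """Plug-in estimator of Equation (7): the sample mean of h(X_i, theta_hat)."""
    return sum(h(x, theta_hat) for x in xs) / len(xs)

# Hypothetical sample and fitted location value.
data = [-1.2, -0.5, 0.3, 0.8, -0.1, 0.6, -0.7, 0.2, 1.1, -0.4]
val = r_hat(data, 0.05)
```

For this model the population value is $R_\gamma(\theta_0) = \left( \int p_{\theta_0}^{\gamma+1} \mathrm{d}\lambda \right)^{1/(\gamma+1)}$, which the sample mean above approximates.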

The following regularity conditions of the model will be assumed throughout the rest of the paper.

(C1) The density $p_\theta(x)$ has continuous partial derivatives with respect to $\theta$ up to the third order (for $\lambda$-almost all $x$).

(C2) There exists a neighborhood $N_{\theta_0}$ of $\theta_0$ such that the first-, second- and third-order partial derivatives with respect to $\theta$ of $h(x, \theta)$ are dominated on $N_{\theta_0}$ by some $P_{\theta_0}$-integrable functions.

(C3) The integrals $\int \left[ \frac{\partial^2}{\partial \theta^2} h(x, \theta) \right]_{\theta=\theta_0} \mathrm{d}P_{\theta_0}(x)$ and $\int \left[ \frac{\partial}{\partial \theta} h(x, \theta) \right]_{\theta=\theta_0} \left[ \frac{\partial}{\partial \theta} h(x, \theta) \right]^t_{\theta=\theta_0} \mathrm{d}P_{\theta_0}(x)$ exist.

**Theorem 1.** *Assume that conditions (C1), (C2) and (C3) are fulfilled. Then:*

*(a) there exists a sequence of minimum pseudodistance estimators $\widehat{\theta}_n$ converging in probability to $\theta_0$;*

*(b) $\sqrt{n}\,(\widehat{\theta}_n - \theta_0)$ converges in distribution to a centered $d$-dimensional normal variable with covariance matrix*

$$V = S^{-1} M S^{-1}, \tag{8}$$

*where $S := -\int \left[ \frac{\partial^2}{\partial \theta^2} h(x, \theta) \right]_{\theta=\theta_0} \mathrm{d}P_{\theta_0}(x)$ and $M := \int \left[ \frac{\partial}{\partial \theta} h(x, \theta) \right]_{\theta=\theta_0} \left[ \frac{\partial}{\partial \theta} h(x, \theta) \right]^t_{\theta=\theta_0} \mathrm{d}P_{\theta_0}(x)$;*

*(c) $\sqrt{n}\left( \widehat{R}_\gamma(\theta_0) - R_\gamma(\theta_0) \right)$ converges in distribution to a centered normal variable with variance $\sigma^2(\theta_0) = \int h(x, \theta_0)^2 \, \mathrm{d}P_{\theta_0}(x) - \left( \int h(x, \theta_0) \, \mathrm{d}P_{\theta_0}(x) \right)^2$.*

We refer to [15] for details regarding these estimators and for the proofs of the above asymptotic properties.

#### **3. Model Selection Criteria Based on Pseudodistances**

Model selection is a method for selecting the best model among candidate models that can be used to fit a given data set. A model selection criterion can be regarded as an approximately unbiased estimator of the expected overall discrepancy, a nonnegative quantity which measures the distance between the true unknown model and a fitted approximating model. The candidate model yielding a small value of the criterion is the one to be chosen. In the following, by applying the same methodology as used for AIC, we construct new criteria for model selection using the pseudodistances (1) and minimum pseudodistance estimators.

Let $X_1, \ldots, X_n$ be a random sample from the distribution associated with the true model $Q$ with density $q$, and let $p_\theta$ be the density of a candidate model $P_\theta$ from a parametric family $(P_\theta)$, where $\theta \in \Theta \subset \mathbb{R}^d$.

#### *3.1. The Expected Overall Discrepancy*

For *γ* > 0 fixed, we consider the quantity

$$\mathcal{W}_{\theta} = \frac{1}{\gamma + 1} \ln \left( \int p_{\theta}^{\gamma + 1} \mathrm{d}\lambda \right) - \frac{1}{\gamma} \ln \left( \int p_{\theta}^{\gamma} q \, \mathrm{d}\lambda \right), \tag{9}$$

which is the same as the pseudodistance $R_\gamma(P_\theta, Q)$ without the middle term, since that term remains constant irrespective of the model $(P_\theta)$ used.

The target theoretical quantity that will be approximated by an asymptotically unbiased estimator is given by

$$E[\mathcal{W}_{\widehat{\theta}_n}] = E[\mathcal{W}_{\theta} \,|\, \theta = \widehat{\theta}_n], \tag{10}$$

where $\widehat{\theta}_n$ is a minimum pseudodistance estimator defined as in (3). The same pseudodistance is used for both $\mathcal{W}_\theta$ and $\widehat{\theta}_n$. The quantity (10) can be seen as an average distance between $Q$ and $(P_\theta)$ up to a constant and is called *the expected overall discrepancy* between $Q$ and $(P_\theta)$.

The next lemma gives the gradient vector and the Hessian matrix of $\mathcal{W}_\theta$ and is useful for the evaluation of $E[\mathcal{W}_{\widehat{\theta}_n}]$ through a Taylor expansion.

Throughout this paper, for a scalar function $\varphi_\theta(\cdot)$, the quantity $\frac{\partial}{\partial \theta} \varphi_\theta(\cdot)$ denotes the $d$-dimensional gradient vector of $\varphi_\theta(\cdot)$ with respect to the vector $\theta$, and $\frac{\partial^2}{\partial \theta^2} \varphi_\theta(\cdot)$ denotes the corresponding $d \times d$ Hessian matrix. We also use the notations $\dot{\varphi}_\theta$ and $\ddot{\varphi}_\theta$ for the first- and second-order derivatives of $\varphi_\theta$ with respect to $\theta$.

We assume the following conditions allowing derivation under the integral sign:

(C4) There exists a neighborhood *N<sup>θ</sup>* of *θ* such that

$$\int \sup_{t \in N_\theta} \left\| \frac{\partial}{\partial t} p_t^{\gamma+1} \right\| \mathrm{d}\lambda < \infty, \quad \int \sup_{t \in N_\theta} \left\| \frac{\partial}{\partial t} \left[ p_t^{\gamma} \dot{p}_t \right] \right\| \mathrm{d}\lambda < \infty.$$

(C5) There exists a neighborhood *N<sup>θ</sup>* of *θ* such that

$$\int \sup_{t \in N_\theta} \left\| \frac{\partial}{\partial t} p_t^{\gamma} \right\| q \, \mathrm{d}\lambda < \infty, \quad \int \sup_{t \in N_\theta} \left\| \frac{\partial}{\partial t} \left[ p_t^{\gamma-1} \dot{p}_t \right] \right\| q \, \mathrm{d}\lambda < \infty.$$

**Lemma 1.** *Under (C4) and (C5), the gradient vector and the Hessian matrix of W<sup>θ</sup> are*

$$\frac{\partial}{\partial \theta} \mathcal{W}_{\theta} = \frac{\int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda}{\int p_{\theta}^{\gamma + 1} \mathrm{d}\lambda} - \frac{\int p_{\theta}^{\gamma - 1} \dot{p}_{\theta} q \, \mathrm{d}\lambda}{\int p_{\theta}^{\gamma} q \, \mathrm{d}\lambda} \tag{11}$$

$$\begin{split} \frac{\partial^2}{\partial\theta^2} \mathcal{W}_{\theta} &= \frac{\left[ \gamma \int p_{\theta}^{\gamma-1} \dot{p}_{\theta} \dot{p}_{\theta}^t \mathrm{d}\lambda + \int p_{\theta}^{\gamma} \ddot{p}_{\theta} \mathrm{d}\lambda \right] \int p_{\theta}^{\gamma+1} \mathrm{d}\lambda - (\gamma+1) \int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda \left( \int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda \right)^t}{\left( \int p_{\theta}^{\gamma+1} \mathrm{d}\lambda \right)^2} \\ &\quad - \frac{\left[ (\gamma-1) \int p_{\theta}^{\gamma-2} \dot{p}_{\theta} \dot{p}_{\theta}^t q \, \mathrm{d}\lambda + \int p_{\theta}^{\gamma-1} \ddot{p}_{\theta} q \, \mathrm{d}\lambda \right] \int p_{\theta}^{\gamma} q \, \mathrm{d}\lambda - \gamma \int p_{\theta}^{\gamma-1} \dot{p}_{\theta} q \, \mathrm{d}\lambda \left( \int p_{\theta}^{\gamma-1} \dot{p}_{\theta} q \, \mathrm{d}\lambda \right)^t}{\left( \int p_{\theta}^{\gamma} q \, \mathrm{d}\lambda \right)^2}. \end{split}$$

When the true model *Q* belongs to the parametric model (*P<sup>θ</sup>* ), hence *Q* = *Pθ*<sup>0</sup> and *q* = *pθ*<sup>0</sup> , the gradient vector and the Hessian matrix of *W<sup>θ</sup>* simplify to

$$\left[ \frac{\partial}{\partial\theta} \mathcal{W}_\theta \right]_{\theta=\theta_0} = 0 \tag{12}$$

$$\left[ \frac{\partial^2}{\partial\theta^2} \mathcal{W}_\theta \right]_{\theta=\theta_0} = M_{\gamma}(\theta_0), \tag{13}$$

where

$$M\_{\gamma}(\theta\_0) := \frac{(\int p\_{\theta\_0}^{\gamma - 1} \dot{p}\_{\theta\_0} \dot{p}\_{\theta\_0}^t \, \mathrm{d}\lambda)(\int p\_{\theta\_0}^{\gamma + 1} \mathrm{d}\lambda) - (\int p\_{\theta\_0}^{\gamma} \dot{p}\_{\theta\_0} \mathrm{d}\lambda)(\int p\_{\theta\_0}^{\gamma} \dot{p}\_{\theta\_0} \mathrm{d}\lambda)^t}{(\int p\_{\theta\_0}^{\gamma + 1} \mathrm{d}\lambda)^2}. \tag{14}$$

In the following propositions we suppose that the true model $Q$ belongs to the parametric model $(P_\theta)$; hence $Q = P_{\theta_0}$ and $q = p_{\theta_0}$, where $\theta_0$ is the value of the parameter corresponding to the true model $Q = P_{\theta_0}$. We also say that $\theta_0$ is the true value of the parameter.

**Proposition 1.** *When the true model Q belongs to the parametric model* (*P<sup>θ</sup>* )*, assuming that (C4) and (C5) are fulfilled for q* = *pθ*<sup>0</sup> *and θ* = *θ*0*, the expected overall discrepancy is given by*

$$E[\mathcal{W}_{\widehat{\theta}_n}] = \mathcal{W}_{\theta_0} + \frac{1}{2} E\left[ (\widehat{\theta}_n - \theta_0)^t M_{\gamma}(\theta_0) (\widehat{\theta}_n - \theta_0) \right] + E[R_n], \tag{15}$$

*where $R_n = o(\|\widehat{\theta}_n - \theta_0\|^2)$ and $M_\gamma(\theta_0)$ is given by (14).*

#### *3.2. Estimation of the Expected Overall Discrepancy*

In this section, we introduce an estimator of the expected overall discrepancy, under the hypothesis that the true model $Q$ belongs to the parametric model $(P_\theta)$. Hence, $Q = P_{\theta_0}$ and the unknown parameter $\theta_0$ will be estimated by a minimum pseudodistance estimator $\widehat{\theta}_n$.

For a given *θ* ∈ Θ, a natural estimator of *W<sup>θ</sup>* is defined by

$$Q_{\theta} := \frac{1}{\gamma + 1} \ln \left( \int p_{\theta}^{\gamma + 1} \mathrm{d}\lambda \right) - \frac{1}{\gamma} \ln \left( \frac{1}{n} \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i) \right). \tag{16}$$
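The estimator (16) is directly computable whenever $\int p_\theta^{\gamma+1} \mathrm{d}\lambda$ is known. A small sketch (ours, with hypothetical data) for the $N(\theta, 1)$ model shows that $Q_\theta$ is smaller at parameter values closer to the data, which is what drives the selection rule:

```python
import math

def q_criterion(theta, xs, gamma):
    """Q_theta of Equation (16) for the N(theta, 1) candidate model, where
    int p_theta^(gamma+1) dlambda = (2 pi)^(-gamma/2) (gamma+1)^(-1/2)
    is available in closed form."""
    log_int = -0.5 * gamma * math.log(2.0 * math.pi) - 0.5 * math.log(gamma + 1.0)
    c = (2.0 * math.pi) ** (-gamma / 2.0)
    # Empirical mean (1/n) sum_i p_theta(X_i)^gamma.
    emp_mean = sum(c * math.exp(-0.5 * gamma * (x - theta) ** 2) for x in xs) / len(xs)
    return log_int / (gamma + 1.0) - math.log(emp_mean) / gamma

# Hypothetical sample roughly centered at 0.
data = [-1.2, -0.5, 0.3, 0.8, -0.1, 0.6, -0.7, 0.2, 1.1, -0.4]
q_near = q_criterion(0.0, data, gamma=0.5)   # candidate close to the data
q_far = q_criterion(2.0, data, gamma=0.5)    # misspecified candidate
```

Here `q_near` is smaller than `q_far`, so the candidate closer to the data would be selected.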

**Lemma 2.** *Assuming (C4), the gradient vector and the Hessian matrix of Q<sup>θ</sup> are given by*

$$\frac{\partial}{\partial \theta} Q_{\theta} = \frac{\int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda}{\int p_{\theta}^{\gamma+1} \mathrm{d}\lambda} - \frac{\sum_{i=1}^{n} p_{\theta}^{\gamma-1}(X_i) \dot{p}_{\theta}(X_i)}{\sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i)}$$

$$\begin{split} \frac{\partial^2}{\partial\theta^2} Q_{\theta} &= \frac{\left[ \gamma \int p_{\theta}^{\gamma-1} \dot{p}_{\theta} \dot{p}_{\theta}^t \mathrm{d}\lambda + \int p_{\theta}^{\gamma} \ddot{p}_{\theta} \mathrm{d}\lambda \right] \int p_{\theta}^{\gamma+1} \mathrm{d}\lambda - (\gamma+1) \int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda \left( \int p_{\theta}^{\gamma} \dot{p}_{\theta} \mathrm{d}\lambda \right)^t}{\left( \int p_{\theta}^{\gamma+1} \mathrm{d}\lambda \right)^2} \\ &\quad - \frac{\left[ (\gamma-1) \sum_{i=1}^{n} p_{\theta}^{\gamma-2}(X_i) \dot{p}_{\theta}(X_i) \dot{p}_{\theta}(X_i)^t + \sum_{i=1}^{n} p_{\theta}^{\gamma-1}(X_i) \ddot{p}_{\theta}(X_i) \right] \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i)}{\left( \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i) \right)^2} \\ &\quad + \frac{\gamma \left( \sum_{i=1}^{n} p_{\theta}^{\gamma-1}(X_i) \dot{p}_{\theta}(X_i) \right) \left( \sum_{i=1}^{n} p_{\theta}^{\gamma-1}(X_i) \dot{p}_{\theta}(X_i) \right)^t}{\left( \sum_{i=1}^{n} p_{\theta}^{\gamma}(X_i) \right)^2}. \end{split}$$

**Proposition 2.** *When the true model Q belongs to the parametric model* (*P<sup>θ</sup>* )*, by imposing the conditions (C1)-(C5), it holds*

$$E[Q\_{\theta\_0}] = E[Q\_{\widehat{\theta}\_n}] + \frac{1}{2}E[(\theta\_0 - \widehat{\theta}\_n)^t M\_\gamma(\theta\_0)(\theta\_0 - \widehat{\theta}\_n)] + E[R\_n],\tag{17}$$

*where $R_n = o(\|\widehat{\theta}_n - \theta_0\|^2)$.*

The following result allows us to define an asymptotically unbiased estimator of the expected overall discrepancy.

**Proposition 3.** *When the true model Q belongs to the parametric model* (*P<sup>θ</sup>* )*, under (C1)-(C5), it holds*

$$\begin{split} E[\mathcal{W}_{\widehat{\theta}_n}] &= E[Q_{\widehat{\theta}_n}] + E\left[ (\theta_0 - \widehat{\theta}_n)^t M_{\gamma}(\theta_0) (\theta_0 - \widehat{\theta}_n) \right] \\ &\quad + \frac{1}{2\gamma n} \left[ 1 - \frac{\int p_{\theta_0}^{2\gamma+1} \mathrm{d}\lambda}{\left( \int p_{\theta_0}^{\gamma+1} \mathrm{d}\lambda \right)^2} \right] + E[R_n] + E[R'_n], \end{split} \tag{18}$$

*where $R_n = o(\|\widehat{\theta}_n - \theta_0\|^2)$ and $R'_n = o\left( \left\| \frac{1}{n} \sum_{i=1}^{n} p_{\theta_0}^{\gamma}(X_i) - \int p_{\theta_0}^{\gamma+1} \mathrm{d}\lambda \right\|^2 \right)$.*
