**Robust Change Point Test for General Integer-Valued Time Series Models Based on Density Power Divergence**

**Byungsoo Kim <sup>1,\*</sup> and Sangyeol Lee <sup>2</sup>**


Received: 17 February 2020; Accepted: 24 April 2020; Published: 24 April 2020

**Abstract:** In this study, we consider the problem of testing for a parameter change in general integer-valued time series models whose conditional distribution belongs to the one-parameter exponential family when the data are contaminated by outliers. In particular, we use a robust change point test based on density power divergence (DPD) as the objective function of the minimum density power divergence estimator (MDPDE). The results show that under regularity conditions, the limiting null distribution of the DPD-based test is a function of a Brownian bridge. Monte Carlo simulations are conducted to evaluate the performance of the proposed test and show that the test inherits the robust properties of the MDPDE and DPD. Lastly, we demonstrate the proposed test using a real data analysis of the return times of extreme events related to Goldman Sachs Group stock.

**Keywords:** integer-valued time series; one-parameter exponential family; minimum density power divergence estimator; density power divergence; robust change point test

#### **1. Introduction**

Integer-valued time series models have received widespread attention from researchers and practitioners in diverse research areas. Since the works of McKenzie [1] as well as Al-Osh and Alzaid [2], integer-valued autoregressive (INAR) models have gained popularity in the analysis of correlated time series of counts. Later, as an alternative, Ferland et al. [3] proposed using Poisson integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) models (see Engle [4] and Bollerslev [5]). Since then, INGARCH models have been studied by many authors, such as Fokianos et al. [6], who developed Poisson autoregressive (Poisson AR) models with nonlinear specifications for their intensity processes. The Poisson assumption on INGARCH models has been extended to include negative binomial INGARCH (NB-INGARCH) models (Davis and Wu [7] and Christou and Fokianos [8]), zero-inflated generalized Poisson INGARCH models (Zhu [9,10] and Lee et al. [11]), and one-parameter exponential distribution AR models (Davis and Liu [12]). The latter are also known as general integer-valued time series models and have been studied by, among others, Diop and Kengne [13] and Lee and Lee [14], who considered change point tests for these models.

The change point problem is a core issue in time series analysis because changes can occur in underlying model parameters owing to critical events or policy changes, and ignoring such changes can lead to false conclusions. Numerous studies exist on change point analysis in time series models; refer to Kang and Lee [15] and Lee and Lee [14], and the articles cited therein, for the background and history of change points in integer-valued time series models. Lee and Lee [14] compared the performance of various cumulative sum (CUSUM) tests based on score vectors and residuals through Monte Carlo simulations. In their work, the conditional maximum likelihood estimator (CMLE) is used both for parameter estimation and for the construction of the CUSUM tests. However, the CMLE is easily damaged by outliers, and so is the performance of the CMLE-based CUSUM test. In general, outliers can mislead the CUSUM test because they may be mistaken for abrupt changes; conversely, their presence can mask genuine change points in the series. Among the available robust estimation methods, we adopt the minimum density power divergence estimator (MDPDE) approach proposed by Basu et al. [16] as a remedy and propose the density power divergence (DPD)-based test as a robust change point test.

The MDPDE method is well known for producing consistent, robust inferences in various situations, with the trade-off between efficiency and robustness managed via a tuning parameter. Basu et al. [16] introduced the MDPDE for independent and identically distributed observations, and later, Ghosh and Basu [17] extended the method to independent but not identically distributed samples. For earlier works in the context of time series, see Lee and Song [18], Kim and Lee [19], Kang and Lee [20], and Kim and Lee [21], who deal with the MDPDE for GARCH models, multivariate time series, and (zero-inflated) Poisson AR models. Kim and Lee [22] demonstrated that the MDPDE for general integer-valued time series models has strong robustness properties, with little loss in asymptotic efficiency relative to the CMLE. More precisely, we anticipate that the robustness of the MDPDE is inherited by the proposed change point test, so that the influence of outliers is reduced when testing for a parameter change in their presence. Although the problem of testing for a parameter change in integer-valued time series models has been investigated by many researchers, the testing procedure for observations with outliers has not been widely studied. This motivates us to develop an MDPDE-based robust change point test for general integer-valued time series models.

Kang and Song [23] proposed an estimate-based robust CUSUM test that uses the MDPDE to detect parameter changes in Poisson AR models. However, this type of test is known to suffer from severe size distortions, especially when the true parameter lies near the boundary of the parameter space. Thus, we use a test based on an empirical version of the DPD, which is the objective function of the MDPDE. Song and Kang [24] and Kang and Song [25] applied DPD-based change point tests to GARCH models and Poisson AR models, respectively. The DPD approach shares the same spirit as the score-based CUSUM test of Lee and Lee [14] (see Remark 3 in Section 2.2), in that both are based on derivatives of objective functions; thus, the idea is easily adapted to one-parameter exponential family AR models. For parameter change tests for independent samples based on divergence measures, see Batsidis et al. [26,27], who consider the *φ*-divergence. We also refer to Martín and Pardo [28], who point out the importance of a Wald-type test based on the DPD in dealing with the change point problem.

Monte Carlo simulations are conducted to evaluate the performance of the proposed test. Here, we compare the DPD-based test and the score-based CUSUM test to demonstrate the superiority of the proposed test in the presence of outliers. Then, we provide a real data analysis of the return times of extreme events related to Goldman Sachs Group (GS) stock to illustrate the proposed test. The paper proceeds as follows. Section 2 constructs the DPD-based change point test for general integer-valued time series models and states its weak convergence theorem. Section 3 presents a simulation study and a real data analysis. Section 4 concludes the paper. All proofs are provided in Appendix A.

#### **2. Construction of the MDPDE and Change Point Test**

#### *2.1. MDPDE for General Integer-Valued Time Series Models*

Let $Y\_1, Y\_2, \ldots$ be observations generated from a general integer-valued time series model whose conditional distribution belongs to the one-parameter exponential family:

$$Y\_t|\mathcal{F}\_{t-1} \sim p(y|\eta\_t), \quad X\_t := E(Y\_t|\mathcal{F}\_{t-1}) = f\_\theta(X\_{t-1}, Y\_{t-1}), \tag{1}$$

where $\mathcal{F}\_{t-1}$ is the $\sigma$-field generated by $Y\_{t-1}, Y\_{t-2}, \ldots$, and $f\_\theta(x, y)$ is a non-negative bivariate function defined on $[0, \infty) \times \mathbb{N}\_0$, $\mathbb{N}\_0 = \mathbb{N} \cup \{0\}$, depending on the parameter $\theta \in \Theta \subset \mathbb{R}^d$ and satisfying $\inf\_{\theta \in \Theta} f\_\theta(x, y) \ge x^\*$ for some $x^\* > 0$ for all $x, y$. Here, $p(\cdot|\cdot)$ is a probability mass function, given by

$$p(y|\eta) = \exp\{\eta y - A(\eta)\}h(y), \quad y \ge 0,$$

where $\eta$ is the natural parameter and $A(\eta)$ and $h(y)$ are known functions. This distribution family includes several well-known discrete distributions, such as the Poisson, negative binomial, and binomial distributions. If $B(\eta) = A'(\eta)$, then $B(\eta\_t)$ and $B'(\eta\_t)$ become the conditional mean and variance of $Y\_t$, and $X\_t = B(\eta\_t)$. The derivative of $A(\eta)$ exists for the exponential family; see Lehmann and Casella [29]. Since $B'(\eta\_t) = Var(Y\_t|\mathcal{F}\_{t-1}) > 0$, $B(\eta)$ is strictly increasing, and since $B(\eta\_t) = E(Y\_t|\mathcal{F}\_{t-1}) > 0$, $A(\eta)$ is also strictly increasing. To emphasize the role of $\theta$, we also use $X\_t(\theta)$ and $\eta\_t(\theta) = B^{-1}(X\_t(\theta))$ to stand for $X\_t$ and $\eta\_t$, respectively.
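As a concrete check of this parametrization (our own illustration, not part of the original study), the sketch below writes the Poisson pmf in the form $p(y|\eta) = \exp\{\eta y - A(\eta)\}h(y)$ with $\eta = \log\lambda$, $A(\eta) = e^\eta$, and $h(y) = 1/y!$, and compares it with `scipy.stats.poisson.pmf`:

```python
import math

from scipy.stats import poisson

def pmf_expfam(y, lam):
    # Poisson(lam) in one-parameter exponential family form:
    # p(y|eta) = exp{eta*y - A(eta)} h(y), eta = log(lam), A(eta) = e^eta, h(y) = 1/y!
    eta = math.log(lam)
    return math.exp(eta * y - math.exp(eta)) / math.factorial(y)

lam = 3.5
for y in range(20):
    assert abs(pmf_expfam(y, lam) - poisson.pmf(y, lam)) < 1e-12

# B(eta) = A'(eta) = e^eta = lam gives the conditional mean, and
# B'(eta) = e^eta = lam the conditional variance, as the text states.
```

Here $B(\eta) = A'(\eta)$ recovers the mean and $B'(\eta)$ the variance, matching the well-known fact that both equal $\lambda$ for the Poisson distribution.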

Davis and Liu [12] showed that the assumption below ensures the strict stationarity and ergodicity of $\{(X\_t, Y\_t)\}$:

**(A0)** For all $x, x' \ge 0$ and $y, y' \in \mathbb{N}\_0$,

$$\sup\_{\theta \in \Theta} |f\_\theta(x, y) - f\_\theta(x', y')| \le \omega\_1 |x - x'| + \omega\_2 |y - y'|,$$

where $\omega\_1, \omega\_2 \ge 0$ satisfy $\omega\_1 + \omega\_2 < 1$.

They also demonstrated that there exists a measurable function $f\_\infty^\theta : \mathbb{N}\_0^\infty \to [0, \infty)$ such that $X\_t(\theta) = f\_\infty^\theta(Y\_{t-1}, Y\_{t-2}, \ldots)$ almost surely (a.s.).

Meanwhile, the DPD $d\_\alpha$ between two density functions $g$ and $h$ is defined as

$$d\_\alpha(g,h) := \begin{cases} \int \left\{ g^{1+\alpha}(y) - \left(1 + \frac{1}{\alpha}\right) h(y) g^\alpha(y) + \frac{1}{\alpha} h^{1+\alpha}(y) \right\} dy, & \alpha > 0, \\ \int h(y) \left( \log h(y) - \log g(y) \right) dy, & \alpha = 0. \end{cases}$$

For a parametric family $\{G\_\theta, \theta \in \Theta\}$ with densities $\{g\_\theta\}$ and a distribution $H$ with density $h$, the minimum DPD functional $T\_\alpha(H)$ is defined by $d\_\alpha(g\_{T\_\alpha(H)}, h) = \min\_{\theta \in \Theta} d\_\alpha(g\_\theta, h)$. In particular, if $H = G\_{\theta\_0} \in \{G\_\theta\}$, then $T\_\alpha(G\_{\theta\_0}) = \theta\_0$. Then, given a random sample $Y\_1, \ldots, Y\_n$ with unknown density $h$, the MDPDE is defined by

$$\hat{\theta}\_{\alpha,n} = \operatorname\*{argmin}\_{\theta \in \Theta} L\_{\alpha,n}(\theta),$$

where $L\_{\alpha,n}(\theta) = \frac{1}{n} \sum\_{t=1}^{n} l\_{\alpha,t}(\theta)$ and

$$l\_{\alpha,t}(\theta) = \begin{cases} \int g\_\theta^{1+\alpha}(y)\, dy - \left(1 + \frac{1}{\alpha}\right) g\_\theta^\alpha(Y\_t), & \alpha > 0, \\ -\log g\_\theta(Y\_t), & \alpha = 0. \end{cases}$$

When $\alpha = 0$ and $\alpha = 1$, the MDPDE becomes the MLE and the $L\_2$-distance estimator, respectively. Basu et al. [16] showed that $\hat{\theta}\_{\alpha,n}$ is consistent for $T\_\alpha(H)$ and asymptotically normal. Furthermore, the estimator is robust against outliers, yet exhibits high efficiency when the true distribution belongs to the parametric family $\{G\_\theta\}$ and $\alpha$ is close to zero. The tuning parameter $\alpha$ controls the trade-off between robustness and asymptotic efficiency: a large $\alpha$ enhances robustness, while a small $\alpha$ yields greater efficiency. The conditional version of the MDPDE is defined similarly (cf. Section 2 of Kim and Lee [22]).
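For intuition, the following sketch (our illustration; the names `l_alpha` and `mdpde` are ours) computes $L\_{\alpha,n}$ for an iid Poisson sample, where the integral $\int g\_\theta^{1+\alpha}(y)\,dy$ becomes a sum over counts, and minimizes it numerically. At $\alpha = 0$ the minimizer reduces to the MLE, i.e., the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def l_alpha(lam, y, alpha, y_max=200):
    # l_{alpha,t} for the Poisson family; the integral becomes a truncated sum
    if alpha == 0:
        return -poisson.logpmf(y, lam)
    grid = np.arange(y_max + 1)
    p = poisson.pmf(grid, lam)
    return np.sum(p**(1 + alpha)) - (1 + 1 / alpha) * poisson.pmf(y, lam)**alpha

def mdpde(sample, alpha):
    # minimize L_{alpha,n}(lam) = (1/n) sum_t l_{alpha,t}(lam)
    obj = lambda lam: np.mean([l_alpha(lam, y, alpha) for y in sample])
    return minimize_scalar(obj, bounds=(1e-3, 50.0), method="bounded").x

rng = np.random.default_rng(1)
sample = rng.poisson(4.0, size=300)
assert abs(mdpde(sample, 0.0) - sample.mean()) < 1e-3  # alpha = 0: the MLE
```

Appending a few gross outliers to `sample` shifts the sample mean substantially but barely moves the $\alpha = 0.5$ estimate, which is the robustness property described above.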

For $Y\_1, \ldots, Y\_n$ generated from (1), the MDPDE for general integer-valued time series models is defined as

$$\hat{\theta}\_{\alpha,n} = \operatorname\*{argmin}\_{\theta \in \Theta} \tilde{L}\_{\alpha,n}(\theta) = \operatorname\*{argmin}\_{\theta \in \Theta} \frac{1}{n} \sum\_{t=1}^{n} \tilde{l}\_{\alpha,t}(\theta), \tag{2}$$

where

$$\tilde{l}\_{\alpha,t}(\theta) = \begin{cases} \displaystyle\sum\_{y=0}^{\infty} p^{1+\alpha}(y|\tilde{\eta}\_t(\theta)) - \left(1 + \frac{1}{\alpha}\right) p^\alpha(Y\_t|\tilde{\eta}\_t(\theta)), & \alpha > 0, \\ -\log p(Y\_t|\tilde{\eta}\_t(\theta)), & \alpha = 0, \end{cases} \tag{3}$$

and $\tilde{\eta}\_t(\theta) = B^{-1}(\tilde{X}\_t(\theta))$ is updated recursively using the following equations:

$$\tilde{X}\_t(\theta) = f\_\theta(\tilde{X}\_{t-1}(\theta), Y\_{t-1}), \quad t = 2, 3, \ldots, \qquad \tilde{X}\_1(\theta) = \tilde{X}\_1,$$

with an arbitrarily chosen initial value $\tilde{X}\_1$. From (3), the MDPDE with $\alpha = 0$ becomes the CMLE.
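To make the recursion concrete, the sketch below (our illustration under the Poisson INGARCH(1,1) specification $X\_t = d + aX\_{t-1} + bY\_{t-1}$; the function name is ours) evaluates $\tilde{L}\_{\alpha,n}(\theta)$ by updating $\tilde{X}\_t$ from an arbitrary initial value; at $\alpha = 0$ it is exactly the negative conditional log-likelihood minimized by the CMLE:

```python
import numpy as np
from scipy.stats import poisson

def dpd_objective(theta, y, alpha, x1, y_max=200):
    # \tilde{L}_{alpha,n}(theta) for a Poisson INGARCH(1,1) model,
    # with X~_t = d + a*X~_{t-1} + b*Y_{t-1} updated from the initial value x1
    d, a, b = theta
    grid = np.arange(y_max + 1)
    x, total = x1, 0.0
    for t in range(len(y)):
        if t > 0:
            x = d + a * x + b * y[t - 1]
        if alpha == 0:
            total += -poisson.logpmf(y[t], x)  # CMLE objective term
        else:
            p = poisson.pmf(grid, x)
            total += np.sum(p**(1 + alpha)) - (1 + 1 / alpha) * poisson.pmf(y[t], x)**alpha
    return total / len(y)

# sanity check: with a = b = 0 and x1 = d, the model is iid Poisson(d),
# so the alpha = 0 objective equals the mean negative log-likelihood
y = np.array([2, 0, 3, 1, 4, 2, 2, 1])
val = dpd_objective((2.0, 0.0, 0.0), y, alpha=0, x1=2.0)
assert abs(val - (-poisson.logpmf(y, 2.0).mean())) < 1e-12
```

Minimizing this objective over $\theta$ (e.g., with a general-purpose optimizer) yields the MDPDE of (2) for this model.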

Kim and Lee [22] showed that under the regularity conditions **(A0)**–**(A9)** stated below, the MDPDE is strongly consistent and asymptotically normal. Conditions **(A10)** and **(A11)** are imposed to derive the limiting null distribution of the DPD-based change point test in Section 2.2. Below, $V$ and $\rho \in (0, 1)$ represent a generic integrable random variable and a constant, respectively; the symbol $\|\cdot\|$ denotes the $L\_2$-norm for matrices and vectors; and $E(\cdot)$ is taken under $\theta\_0$, where $\theta\_0$ denotes the true value of $\theta$.

**(A1)** $\theta\_0$ is an interior point of the compact parameter space $\Theta \subset \mathbb{R}^d$.

**(A2)** $E\sup\_{\theta \in \Theta} X\_1(\theta)^4 < \infty$.

**(A3)** $\inf\_{\theta \in \Theta} \inf\_{0 \le \delta \le 1} B'((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta)) \ge c$ for some $c > 0$.

**(A4)** $EY\_1^4 < \infty$.

**(A5)** If there exists $t \ge 1$ such that $X\_t(\theta) = X\_t(\theta\_0)$ a.s., then $\theta = \theta\_0$.

**(A6)** $\sup\_{\theta \in \Theta} \sup\_{0 \le \delta \le 1} \left| \frac{B''((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))}{B'((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))^{5/2}} \right| \le K$ for some $K > 0$.

**(A7)** The mapping $\theta \mapsto f\_\infty^\theta$ is twice continuously differentiable with respect to $\theta$ and satisfies

$$E\left(\sup\_{\theta \in \Theta} \left\| \frac{\partial f\_\infty^\theta(Y\_0, Y\_{-1}, \ldots)}{\partial \theta} \right\|\right)^4 < \infty \quad \text{and} \quad E\left(\sup\_{\theta \in \Theta} \left\| \frac{\partial^2 f\_\infty^\theta(Y\_0, Y\_{-1}, \ldots)}{\partial \theta \partial \theta^T} \right\|\right)^2 < \infty.$$

**(A8)** $\sup\_{\theta \in \Theta} \left\| \frac{\partial X\_t(\theta)}{\partial \theta} - \frac{\partial \tilde{X}\_t(\theta)}{\partial \theta} \right\| \le V\rho^t$ a.s.

**(A9)** $\nu^T \frac{\partial X\_t(\theta\_0)}{\partial \theta} = 0$ a.s. implies $\nu = 0$.

**(A10)** $\sup\_{\theta \in \Theta} \left\| \frac{\partial^2 X\_t(\theta)}{\partial \theta \partial \theta^T} - \frac{\partial^2 \tilde{X}\_t(\theta)}{\partial \theta \partial \theta^T} \right\| \le V\rho^t$ a.s.

**(A11)** $\sup\_{\theta \in \Theta} \sup\_{0 \le \delta \le 1} \left| \frac{B^{(3)}((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))}{B'((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))^4} \right| \le M$ for some $M > 0$.

**Proposition 1.** *Under* **(A0)**–**(A5)**, $\hat{\theta}\_{\alpha,n} \longrightarrow \theta\_0$ *a.s. as* $n \to \infty$*, and further, under* **(A0)**–**(A9)**,

$$\sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0) \stackrel{d}{\longrightarrow} N(0, J\_\alpha^{-1} K\_\alpha J\_\alpha^{-1}) \text{ as } n \to \infty,$$

*where*

$$J\_\alpha = -E\left(\frac{\partial^2 l\_{\alpha,t}(\theta\_0)}{\partial \theta \partial \theta^T}\right), \quad K\_\alpha = E\left(\frac{\partial l\_{\alpha,t}(\theta\_0)}{\partial \theta} \frac{\partial l\_{\alpha,t}(\theta\_0)}{\partial \theta^T}\right),$$

*and* $l\_{\alpha,t}(\theta)$ *is defined by substituting* $\eta\_t(\theta)$ *for* $\tilde{\eta}\_t(\theta)$ *in (3).*

**Remark 1.** *In our empirical study, discussed in Section 3.2, we select an optimal $\alpha$ using the method of Warwick [30] and Warwick and Jones [31]. We choose the $\alpha$ that minimizes the trace of the estimated asymptotic mean squared error (AMSE):*

$$\widehat{AMSE} = (\hat{\theta}\_{\alpha,n} - \hat{\theta}\_{1,n})(\hat{\theta}\_{\alpha,n} - \hat{\theta}\_{1,n})^T + \widehat{As.var}(\hat{\theta}\_{\alpha,n}),$$

*where* $\hat{\theta}\_{1,n}$ *is the MDPDE with* $\alpha = 1$ *and* $\widehat{As.var}(\hat{\theta}\_{\alpha,n})$ *is the estimate of the asymptotic variance of* $\hat{\theta}\_{\alpha,n}$*, computed as*

$$\widehat{As.var}(\hat{\theta}\_{\alpha,n}) = \left(\sum\_{t=1}^{n} \frac{\partial^2 \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta \partial \theta^T}\right)^{-1} \left(\sum\_{t=1}^{n} \frac{\partial \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta} \frac{\partial \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta^T}\right) \left(\sum\_{t=1}^{n} \frac{\partial^2 \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta \partial \theta^T}\right)^{-1}.$$
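A minimal sketch of this computation (our own illustration; the per-observation gradients and Hessians of $\tilde{l}\_{\alpha,t}$ are assumed precomputed, e.g., by numerical differentiation, and the function names are ours):

```python
import numpy as np

def as_var(grads, hessians):
    # sandwich estimate: (sum H_t)^{-1} (sum g_t g_t^T) (sum H_t)^{-1}
    H = hessians.sum(axis=0)        # (d, d): summed second derivatives
    G = grads.T @ grads             # (d, d): summed outer products of gradients
    Hinv = np.linalg.inv(H)
    return Hinv @ G @ Hinv

def amse(theta_a, theta_1, grads, hessians):
    # estimated AMSE = (theta_a - theta_1)(theta_a - theta_1)^T + As.var
    diff = np.asarray(theta_a - theta_1).reshape(-1, 1)
    return diff @ diff.T + as_var(grads, hessians)

# the Warwick-Jones rule then picks the alpha minimizing np.trace(amse(...))
# over a grid of candidate alpha values.

# toy check: unit per-observation Hessians and orthonormal gradients (n = 2)
# give As.var = (2I)^{-1} I (2I)^{-1} = I/4
g = np.array([[1.0, 0.0], [0.0, 1.0]])
h = np.array([np.eye(2), np.eye(2)])
assert np.allclose(as_var(g, h), np.eye(2) / 4)
```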

**Remark 2.** *Instead of* **(A6)***, Kim and Lee [22] assumed*

$$\sup\_{\theta \in \Theta} \sup\_{0 \le \delta \le 1} \left| \frac{B''((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))}{B'((1-\delta)\eta\_t(\theta) + \delta\tilde{\eta}\_t(\theta))^3} \right| \le K \text{ for some } K > 0$$

*to prove Proposition 1. Note that this condition is satisfied directly if* **(A3)** *and* **(A6)** *hold. In our study, we alter the above condition to* **(A6)** *to prove Lemma A1 in the Appendix A, which is needed to obtain the limiting null distribution of the DPD-based change point test in Section 2.2.*

The following INGARCH(1,1) models are typical examples of general integer-valued time series models:

$$Y\_t|\mathcal{F}\_{t-1} \sim p(y|\eta\_t), \quad X\_t = d + aX\_{t-1} + bY\_{t-1},$$

where $X\_t = B(\eta\_t) = E(Y\_t|\mathcal{F}\_{t-1})$, $\theta = (d, a, b)^T \in \Theta \subset (0, \infty) \times [0, \infty)^2$ with $a + b < 1$, and $\Theta$ is compact. Condition **(A0)** trivially holds, and the process $\{(X\_t, Y\_t), t \ge 1\}$ has a strictly stationary and ergodic solution. Condition **(A1)** can be replaced with the following:

**(A1)′** The true parameter $\theta\_0$ lies in a compact neighborhood $\Theta \subset \mathbb{R}\_+^3$ of $\theta\_0$, where

$$\Theta \subset \{ \theta = (d, a, b)^T \in \mathbb{R}\_+^3 : 0 < d\_L \le d \le d\_U, \ \epsilon \le a + b \le 1 - \epsilon \} \text{ for some } d\_L, d\_U, \epsilon > 0.$$

Moreover, we can express

$$X\_t(\theta) = \frac{d}{1-a} + b \sum\_{k=0}^{\infty} a^k Y\_{t-k-1} \quad \text{and} \quad \tilde{X}\_t(\theta) = \frac{d}{1-a} + b \sum\_{k=0}^{t-2} a^k Y\_{t-k-1},$$

where the initial value $\tilde{X}\_1$ is taken as $d/(1-a)$ for simplicity. Based on the above and **(A4)**, conditions **(A2)**, **(A5)**, and **(A7)**–**(A10)** are all satisfied for INGARCH(1,1) models, as proven in Theorem 3 of Kang and Lee [15]. Kim and Lee [22] recently showed that the following Poisson and negative binomial INGARCH(1,1) models satisfy **(A3)** and **(A4)**. Furthermore, following the arguments presented in Section 3.2 of their study, **(A6)** holds for these models as well. Below, we show that **(A11)** also holds for Poisson and negative binomial INGARCH(1,1) models.
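The equivalence between the recursion and the truncated series above can be checked numerically; the sketch below is our own illustration:

```python
import numpy as np

def x_tilde_recursive(theta, y):
    # X~_1 = d/(1-a); X~_t = d + a*X~_{t-1} + b*Y_{t-1} for t >= 2
    d, a, b = theta
    x = [d / (1 - a)]
    for t in range(1, len(y)):
        x.append(d + a * x[-1] + b * y[t - 1])
    return np.array(x)

def x_tilde_series(theta, y, t):
    # X~_t = d/(1-a) + b * sum_{k=0}^{t-2} a^k * Y_{t-k-1}   (t is 1-indexed)
    d, a, b = theta
    return d / (1 - a) + b * sum(a**k * y[t - k - 2] for k in range(t - 1))

theta = (1.0, 0.2, 0.4)
y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
rec = x_tilde_recursive(theta, y)
for t in range(1, len(y) + 1):
    assert abs(rec[t - 1] - x_tilde_series(theta, y, t)) < 1e-12
```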

• *Poisson INGARCH(1,1) model:*

$$Y\_t|\mathcal{F}\_{t-1} \sim \text{Poisson}(X\_t), \quad X\_t = d + aX\_{t-1} + bY\_{t-1}.$$

In this model, $\eta\_t(\theta) = \log(X\_t(\theta))$ and $A(\eta\_t(\theta)) = e^{\eta\_t(\theta)}$. Since $B'(\eta) = B^{(3)}(\eta)$, **(A11)** holds owing to **(A3)**.

• *NB-INGARCH(1,1) model:*

$$Y\_t|\mathcal{F}\_{t-1} \sim \text{NB}(r, p\_t), \quad X\_t = \frac{r(1 - p\_t)}{p\_t} = d + aX\_{t-1} + bY\_{t-1},$$

where NB$(r, p)$ denotes a negative binomial distribution with parameters $r \in \mathbb{N}$ and $p \in (0, 1)$; that is, it counts the number of failures before the $r$-th success in a sequence of Bernoulli trials with success probability $p$. Here, $r$ is assumed to be known. In this model, $\eta\_t(\theta) = \log(X\_t(\theta)/(X\_t(\theta) + r))$ and $A(\eta\_t(\theta)) = r\log(r/(1 - e^{\eta\_t(\theta)}))$. From the fact that $B'(\eta) = re^\eta/(1 - e^\eta)^2$ and $B^{(3)}(\eta) = re^\eta(e^{2\eta} + 4e^\eta + 1)/(1 - e^\eta)^4$, we have $B^{(3)}(\eta)/B'(\eta)^4 = (1 - e^\eta)^4(e^{2\eta} + 4e^\eta + 1)/(r^3 e^{3\eta})$, which is positive and strictly decreasing on $\eta < 0$. Moreover, since $d\_L/(d\_L + r) \le e^{\eta\_t(\theta)} < 1$, it holds that

$$\frac{B^{(3)}(\eta\_t(\theta))}{B'(\eta\_t(\theta))^4} \le \frac{6(1 - d\_L/(d\_L + r))^4}{r^3 (d\_L/(d\_L + r))^3} = \frac{6r}{d\_L^3 (d\_L + r)},$$

and $B^{(3)}(\tilde{\eta}\_t(\theta))/B'(\tilde{\eta}\_t(\theta))^4$ has the same upper bound. Hence, **(A11)** is satisfied.
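The bound can also be verified numerically; the sketch below (our illustration, with $r = 10$ and $d\_L = 1$ chosen arbitrarily) evaluates the ratio over the admissible range of $u = e^{\eta\_t(\theta)}$:

```python
import numpy as np

r, dL = 10.0, 1.0
u0 = dL / (dL + r)                       # lower bound of u = e^{eta_t(theta)}
u = np.linspace(u0, 1.0 - 1e-9, 100_000)
# B^{(3)}(eta) / B'(eta)^4 expressed in u = e^eta
ratio = (1 - u)**4 * (u**2 + 4 * u + 1) / (r**3 * u**3)
bound = 6 * r / (dL**3 * (dL + r))       # = 6r / (d_L^3 (d_L + r))
assert np.all(ratio <= bound)            # (A11) holds over the whole range
```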

In addition to the above models, general integer-valued time series models also include nonlinear models, such as the integer-valued threshold GARCH (INTGARCH) model:

$$Y\_t|\mathcal{F}\_{t-1} \sim \text{Poisson}(X\_t), \quad X\_t = d + aX\_{t-1} + b\_1 \max(Y\_{t-1} - l, 0) + b\_2 \min(Y\_{t-1}, l),$$

where $\theta = (d, a, b\_1, b\_2)^T \in \Theta \subset (0, \infty) \times [0, \infty)^3$ with $a + \max(b\_1, b\_2) < 1$, $\Theta$ is compact, and $l$ is a non-negative integer. For more details, see Remark 3 in Kim and Lee [22].

#### *2.2. DPD-Based Change Point Test*

As a robust test for parameter changes in general integer-valued time series models, we propose a DPD-based test for the following hypotheses:

$H\_0$: $\theta$ does not change over $Y\_1, \ldots, Y\_n$ vs. $H\_1$: not $H\_0$.

To construct the test, we employ the objective function of the MDPDE; that is, our test is constructed using the empirical version of the DPD. Let $\tilde{L}\_{\alpha,k}$ be defined as in (2) with $n$ replaced by $k$. To implement our test, we employ the following test statistic:

$$\hat{T}\_n^{\alpha} := \max\_{1 \le k \le n} \frac{k^2}{n} \frac{\partial \tilde{L}\_{\alpha,k}(\hat{\theta}\_{\alpha,n})}{\partial \theta^T} \hat{K}\_{\alpha}^{-1} \frac{\partial \tilde{L}\_{\alpha,k}(\hat{\theta}\_{\alpha,n})}{\partial \theta},$$

where

$$\hat{K}\_{\alpha} = \frac{1}{n} \sum\_{t=1}^{n} \frac{\partial \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta} \frac{\partial \tilde{l}\_{\alpha,t}(\hat{\theta}\_{\alpha,n})}{\partial \theta^T}$$

is a consistent estimator of $K\_\alpha$. For the consistency of $\hat{K}\_\alpha$, see Lemma A5 in Appendix A.
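In practice, the statistic is a CUSUM of the per-observation DPD scores $\partial \tilde{l}\_{\alpha,t}/\partial\theta$ evaluated at $\hat{\theta}\_{\alpha,n}$, which sum to approximately zero at the full-sample estimate. A sketch (our illustration; the score matrix is assumed precomputed and the function name is ours):

```python
import numpy as np

def dpd_cusum(scores):
    # scores: (n, d) array whose rows are d l~_{alpha,t}/d theta at the
    # full-sample MDPDE (so the rows sum to approximately zero)
    n = scores.shape[0]
    K_hat = scores.T @ scores / n                 # \hat{K}_alpha
    Kinv = np.linalg.inv(K_hat)
    S = np.cumsum(scores, axis=0)                 # S_k = k * dL~_{alpha,k}/dtheta
    stats = np.einsum('ki,ij,kj->k', S, Kinv, S) / n
    k_hat = int(np.argmax(stats)) + 1             # estimated change location
    return stats.max(), k_hat

# toy check: a mean shift in the (1-dimensional) scores at t = 50 is located exactly
s = np.r_[np.full(50, -0.5), np.full(50, 0.5)].reshape(-1, 1)
t_max, k_hat = dpd_cusum(s)
assert k_hat == 50 and abs(t_max - 25.0) < 1e-9
```

One would then compare `t_max` with the appropriate critical value from Lee et al. [32] and, when it exceeds that value, take `k_hat` as the change point estimate.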

Using the mean value theorem (MVT), we have, for each $s \in [0, 1]$,

$$\frac{[ns]}{\sqrt{n}} \frac{\partial \tilde{L}\_{\alpha,[ns]}(\hat{\theta}\_{\alpha,n})}{\partial \theta} = \frac{[ns]}{\sqrt{n}} \frac{\partial \tilde{L}\_{\alpha,[ns]}(\theta\_0)}{\partial \theta} + \frac{[ns]}{n} \frac{\partial^2 \tilde{L}\_{\alpha,[ns]}(\theta^\*\_{\alpha,n,s})}{\partial \theta \partial \theta^T} \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0), \tag{4}$$

where $\theta^\*\_{\alpha,n,s}$ is an intermediate point between $\hat{\theta}\_{\alpha,n}$ and $\theta\_0$. From $\partial \tilde{L}\_{\alpha,n}(\hat{\theta}\_{\alpha,n})/\partial \theta = 0$, we obtain, for $s = 1$,

$$0 = \sqrt{n} \frac{\partial \tilde{L}\_{\alpha,n}(\theta\_0)}{\partial \theta} + \frac{\partial^2 \tilde{L}\_{\alpha,n}(\theta^\*\_{\alpha,n,1})}{\partial \theta \partial \theta^T} \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0).$$

Furthermore, since $J\_\alpha$ is nonsingular (cf. the proof of Lemma 7 in Kim and Lee [22]), this can be expressed as

$$\begin{aligned} \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0) &= J\_\alpha^{-1} \sqrt{n} \frac{\partial \tilde{L}\_{\alpha,n}(\theta\_0)}{\partial \theta} + J\_\alpha^{-1} \frac{\partial^2 \tilde{L}\_{\alpha,n}(\theta^\*\_{\alpha,n,1})}{\partial \theta \partial \theta^T} \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0) + \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0) \\ &= J\_\alpha^{-1} \sqrt{n} \frac{\partial \tilde{L}\_{\alpha,n}(\theta\_0)}{\partial \theta} + J\_\alpha^{-1} \left( \frac{\partial^2 \tilde{L}\_{\alpha,n}(\theta^\*\_{\alpha,n,1})}{\partial \theta \partial \theta^T} + J\_\alpha \right) \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0). \end{aligned}$$

Substituting the above into (4) yields

$$\begin{aligned} \frac{[ns]}{\sqrt{n}} \frac{\partial \tilde{L}\_{\alpha,[ns]}(\hat{\theta}\_{\alpha,n})}{\partial \theta} &= \frac{[ns]}{\sqrt{n}} \frac{\partial \tilde{L}\_{\alpha,[ns]}(\theta\_0)}{\partial \theta} + \frac{[ns]}{n} \frac{\partial^2 \tilde{L}\_{\alpha,[ns]}(\theta^\*\_{\alpha,n,s})}{\partial \theta \partial \theta^T} J\_\alpha^{-1} \sqrt{n} \frac{\partial \tilde{L}\_{\alpha,n}(\theta\_0)}{\partial \theta} \\ &\quad + \frac{[ns]}{n} \frac{\partial^2 \tilde{L}\_{\alpha,[ns]}(\theta^\*\_{\alpha,n,s})}{\partial \theta \partial \theta^T} J\_\alpha^{-1} \left( \frac{\partial^2 \tilde{L}\_{\alpha,n}(\theta^\*\_{\alpha,n,1})}{\partial \theta \partial \theta^T} + J\_\alpha \right) \sqrt{n}(\hat{\theta}\_{\alpha,n} - \theta\_0). \end{aligned} \tag{5}$$

In Appendix A, we show that the first two terms on the right-hand side of (5) converge weakly to $K\_\alpha^{1/2} B\_d^o(s)$, where $B\_d^o$ is a $d$-dimensional standard Brownian bridge, and that the last term is asymptotically negligible. Therefore, we obtain the following theorem.

**Theorem 1.** *Suppose that conditions* **(A0)**–**(A11)** *hold. Then, under* $H\_0$*, we have*

$$K\_{\alpha}^{-1/2} \frac{[ns]}{\sqrt{n}} \frac{\partial \tilde{L}\_{\alpha,[ns]}(\hat{\theta}\_{\alpha,n})}{\partial \theta} \stackrel{w}{\longrightarrow} B\_d^o(s).$$

*Therefore,*

$$
\hat{T}\_n^{\alpha} \stackrel{d}{\longrightarrow} \sup\_{0 \le s \le 1} \|B\_d^o(s)\|^2.
$$

We reject $H\_0$ if $\hat{T}\_n^\alpha$ is large; see Table 1 of Lee et al. [32] for the critical values. When a change point is detected, its location is estimated as

$$\operatorname\*{argmax}\_{1 \le k \le n} \frac{k^2}{n} \frac{\partial \tilde{L}\_{\alpha,k}(\hat{\theta}\_{\alpha,n})}{\partial \theta^T} \hat{K}\_{\alpha}^{-1} \frac{\partial \tilde{L}\_{\alpha,k}(\hat{\theta}\_{\alpha,n})}{\partial \theta}.$$

**Remark 3.** *The proposed test* $\hat{T}\_n^\alpha$ *with* $\alpha = 0$ *coincides with the score-vector-based CUSUM test proposed by Lee and Lee [14], given by*

$$\hat{T}\_{n}^{score} = \max\_{1 \le k \le n} \frac{1}{n} \left( \sum\_{t=1}^{k} \frac{\partial \tilde{l}\_{0,t}(\hat{\theta}\_{0,n})}{\partial \theta^T} \right) \hat{I}\_{n}^{-1} \left( \sum\_{t=1}^{k} \frac{\partial \tilde{l}\_{0,t}(\hat{\theta}\_{0,n})}{\partial \theta} \right),$$

*where* $\tilde{l}\_{0,t}(\theta)$ *is defined in (3),* $\hat{\theta}\_{0,n}$ *is the CMLE, and* $\hat{I}\_n = n^{-1} \sum\_{t=1}^{n} \partial^2 \tilde{l}\_{0,t}(\hat{\theta}\_{0,n})/\partial \theta \partial \theta^T$*. In the next section, we compare the performance of* $\hat{T}\_n^\alpha$ *with that of* $\hat{T}\_n^{score}$ *in the presence of outliers.*

#### **3. Empirical Studies**

#### *3.1. Simulation*

In this section, we evaluate the performance of the proposed test $\hat{T}\_n^\alpha$ (with $\alpha > 0$) through simulations, focusing on the comparison with $\hat{T}\_n^{score}$. First, we consider the Poisson INGARCH models:

$$Y\_t|\mathcal{F}\_{t-1} \sim \text{Poisson}(X\_t), \quad X\_t = d + aX\_{t-1} + bY\_{t-1}, \tag{6}$$

where $X\_1$ is set to 0 for the data generation and $\tilde{X}\_1$ is set to the sample mean of the data. The sample sizes considered are $n = 500$ and $1000$, with 1000 repetitions for each simulation. For the comparison, we examine the empirical size and power at the nominal level of 0.05, which has a corresponding critical value of 3.004. To calculate the empirical size and power of each test, we consider cases with $\theta = (d, a, b) = (1, 0.2, 0.2)$, $(1, 0.2, 0.4)$, $(1, 0.2, 0.7)$ and those in which $\theta = (1, 0.2, 0.2)$ changes to $\theta' = (d', a', b') = (1.5, 0.2, 0.2)$, $(1, 0.4, 0.2)$, $(1, 0.2, 0.4)$ at the middle time $t = [n/2]$, respectively.

Table 1 presents the results when the data are not contaminated by outliers, showing that both tests ($\hat{T}\_n^{score}$ and $\hat{T}\_n^\alpha$) exhibit reasonable size, even when $a + b$ is close to 1. When $n = 500$, $\hat{T}\_n^{score}$ outperforms $\hat{T}\_n^\alpha$ in terms of power; however, as the sample size increases to $n = 1000$, $\hat{T}\_n^\alpha$ exhibits power similar to that of $\hat{T}\_n^{score}$, particularly when $\alpha$ is small. The power of $\hat{T}\_n^\alpha$ tends to decrease as $\alpha$ increases, confirming that an MDPDE with large $\alpha$ entails a loss of efficiency.


**Table 1.** Empirical sizes and powers for Poisson integer-valued generalized autoregressive conditional heteroscedastic (INGARCH)(1,1) models when no outliers exist.

To evaluate the robustness of the proposed test, we assume that contaminated data $Y\_{c,t}$ are observed instead of $Y\_t$ in (6) (cf. Fried et al. [33]):

$$Y\_{c,t} = Y\_t + P\_t Y\_{o,t}, \tag{7}$$

where the $P\_t$ are independent and identically distributed (iid) Bernoulli random variables with success probability $p$ and the $Y\_{o,t}$ are iid Poisson random variables with mean $\gamma$; $Y\_t$, $P\_t$, and $Y\_{o,t}$ are assumed to be mutually independent. In this simulation, we consider the cases $p = 0.01, 0.03$ and $\gamma = 5, 10$. The results are reported in Tables 2–5, showing that $\hat{T}\_n^{score}$ suffers from size distortions that become more severe as either $p$ or $\gamma$ increases. In contrast, $\hat{T}\_n^\alpha$ compensates for this defect remarkably well, yielding power comparable to that of $\hat{T}\_n^{score}$ when $n = 1000$. This indicates that as more data are contaminated by outliers, $\hat{T}\_n^\alpha$ increasingly outperforms $\hat{T}\_n^{score}$.
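The contamination scheme (7) is straightforward to simulate; below is our own sketch for the Poisson INGARCH(1,1) case with $X\_1 = 0$, as in the design above (the function name is ours):

```python
import numpy as np

def simulate_contaminated(theta, n, p, gamma, seed=0):
    # generate Y_t from (6) with X_1 = 0, then contaminate via (7):
    # Y_{c,t} = Y_t + P_t * Y_{o,t}
    d, a, b = theta
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=np.int64)
    x = 0.0
    for t in range(n):
        if t > 0:
            x = d + a * x + b * y[t - 1]
        y[t] = rng.poisson(x)
    P = rng.binomial(1, p, size=n)        # iid Bernoulli(p) indicators
    y_o = rng.poisson(gamma, size=n)      # iid Poisson(gamma) outlier sizes
    return y, y + P * y_o                 # clean and contaminated series

y, y_c = simulate_contaminated((1.0, 0.2, 0.2), 500, p=0.03, gamma=10, seed=42)
assert y_c.shape == (500,) and np.all(y_c >= y)
```

Applying both tests to `y` and `y_c` over repeated draws reproduces the kind of size/power comparison summarized in Tables 1–5.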


**Table 2.** Empirical sizes and powers for Poisson INGARCH(1,1) models when *p* = 0.01 and *γ* = 5.

**Table 3.** Empirical sizes and powers for Poisson INGARCH(1,1) models when *p* = 0.01 and *γ* = 10.


**Table 4.** Empirical sizes and powers for Poisson INGARCH(1,1) models when *p* = 0.03 and *γ* = 5.



**Table 5.** Empirical sizes and powers for Poisson INGARCH(1,1) models when *p* = 0.03 and *γ* = 10.

Next, we consider the following NB-INGARCH(1,1) models:

$$Y_t|\mathcal{F}_{t-1} \sim \text{NB}(r, p_t), \quad X_t = \frac{r(1 - p_t)}{p_t} = d + aX_{t-1} + bY_{t-1}, \tag{8}$$

where $X_1$ and $\tilde{X}_1$ are set to 0 and the sample mean of the data, respectively. We set $r = 10$ and use the same parameter settings as in the Poisson INGARCH case. To evaluate the robustness of the test, we observe contaminated data $Y_{c,t}$, as in (7), where $Y_t$ are generated from (8), $P_t$ are iid Bernoulli random variables with success probability $p$, and $Y_{o,t}$ are iid NB(10, $\kappa$) random variables. We consider the cases $p = 0.01, 0.03$ and $\kappa = 0.6, 0.5$. The results, reported in Tables 6–10, are similar to those in Tables 1–5. Our findings show that the DPD-based test performs reasonably well in terms of both size and power, regardless of the existence of outliers. In addition, we confirm that the proposed test outperforms the score-based CUSUM test when the data are contaminated by outliers.
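Under the parameterization in (8), the success probability is recovered from the conditional mean as $p_t = r/(r + X_t)$. The sketch below (our own illustration; the stationary-mean initialization is an assumption) uses this to simulate the NB-INGARCH(1,1) model:

```python
import numpy as np

def simulate_nb_ingarch(n, r, d, a, b, seed=0):
    """Simulate the NB-INGARCH(1,1) model in (8):
    Y_t | F_{t-1} ~ NB(r, p_t) with conditional mean
    X_t = r(1 - p_t)/p_t = d + a*X_{t-1} + b*Y_{t-1},
    so that p_t = r / (r + X_t)."""
    rng = np.random.default_rng(seed)
    X = np.empty(n)
    Y = np.empty(n, dtype=int)
    X[0] = d / (1 - a - b)   # stationary mean as a starting value (a + b < 1)
    Y[0] = rng.negative_binomial(r, r / (r + X[0]))
    for t in range(1, n):
        X[t] = d + a * X[t - 1] + b * Y[t - 1]
        Y[t] = rng.negative_binomial(r, r / (r + X[t]))
    return Y

y = simulate_nb_ingarch(1000, r=10, d=1.0, a=0.2, b=0.2)
```

NumPy's `negative_binomial(r, p)` counts failures before the $r$-th success, whose mean is $r(1-p)/p$, consistent with (8).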

| | $\theta = (d, a, b)$ | $n$ | $\widehat{T}_n^{\mathrm{score}}$ | $\widehat{T}_n^{0.1}$ | $\widehat{T}_n^{0.2}$ | $\widehat{T}_n^{0.3}$ | $\widehat{T}_n^{0.5}$ | $\widehat{T}_n^{1}$ |
|---|---|---|---|---|---|---|---|---|
| Sizes | (1, 0.2, 0.2) | 500 | 0.076 | 0.050 | 0.052 | 0.054 | 0.061 | 0.071 |
| | | 1000 | 0.061 | 0.055 | 0.052 | 0.052 | 0.055 | 0.059 |
| | (1, 0.2, 0.4) | 500 | 0.040 | 0.041 | 0.038 | 0.040 | 0.045 | 0.048 |
| | | 1000 | 0.049 | 0.053 | 0.056 | 0.057 | 0.062 | 0.060 |
| | (1, 0.2, 0.7) | 500 | 0.047 | 0.046 | 0.043 | 0.038 | 0.042 | 0.043 |
| | | 1000 | 0.041 | 0.044 | 0.048 | 0.048 | 0.047 | 0.043 |
| | $\theta = (1, 0.2, 0.2)$ *changes to* $\theta^0 = (d^0, a^0, b^0)$ | | | | | | | |
| Powers | (1.5, 0.2, 0.2) | 500 | 0.821 | 0.759 | 0.735 | 0.706 | 0.640 | 0.505 |
| | | 1000 | 0.953 | 0.942 | 0.936 | 0.932 | 0.919 | 0.881 |
| | (1, 0.4, 0.2) | 500 | 0.759 | 0.689 | 0.646 | 0.611 | 0.558 | 0.454 |
| | | 1000 | 0.967 | 0.964 | 0.959 | 0.955 | 0.940 | 0.881 |
| | (1, 0.2, 0.4) | 500 | 0.733 | 0.719 | 0.718 | 0.702 | 0.650 | 0.544 |
| | | 1000 | 0.984 | 0.984 | 0.981 | 0.975 | 0.961 | 0.908 |

**Table 6.** Empirical sizes and powers for negative binomial INGARCH (NB-INGARCH)(1,1) models when no outliers exist.


**Table 7.** Empirical sizes and powers for NB-INGARCH(1,1) models when *p* = 0.01 and *κ* = 0.6.

**Table 8.** Empirical sizes and powers for NB-INGARCH(1,1) models when *p* = 0.01 and *κ* = 0.5.


**Table 9.** Empirical sizes and powers for NB-INGARCH(1,1) models when *p* = 0.03 and *κ* = 0.6.



**Table 10.** Empirical sizes and powers for NB-INGARCH(1,1) models when *p* = 0.03 and *κ* = 0.5.

#### *3.2. Real Data Analysis*

In this section, we demonstrate the validity of $\widehat{T}_n^{\alpha}$ using a real data analysis. To this end, we analyze the return times of extreme events related to GS stock, constructed from the daily log-returns over the period from 5 May 1999 to 15 March 2012. Davis and Liu [12] and Kim and Lee [22] previously investigated this data set in their work on geometric INGARCH(1,1) models (i.e., NB-INGARCH(1,1) models with $r = 1$).

We first compute the hitting times $\tau_1, \tau_2, \ldots$ at which the log-returns of GS stock fall outside the 0.05 and 0.95 quantiles of the data. The return times of these extreme events are calculated as $Y_t = \tau_t - \tau_{t-1}$. Figure 1 plots $Y_t$, $t = 1, \ldots, 323$. The figure shows that the data include large observations; for example, a sample variance of 1106 against a sample mean of 10.01 indicates the existence of aberrant observations.
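This construction can be illustrated as follows (a sketch on synthetic data, not the actual GS series; the function name is ours):

```python
import numpy as np

def extreme_return_times(log_returns, lo=0.05, hi=0.95):
    """Return times of extreme events: find the hitting times tau_1, tau_2, ...
    at which a log-return falls outside the empirical lo/hi quantiles,
    then difference them, Y_t = tau_t - tau_{t-1}."""
    r = np.asarray(log_returns)
    q_lo, q_hi = np.quantile(r, [lo, hi])
    tau = np.flatnonzero((r < q_lo) | (r > q_hi))  # hitting times (0-based indices)
    return np.diff(tau)

# illustrative input: 2000 synthetic "log-returns"
rng = np.random.default_rng(1)
y = extreme_return_times(rng.standard_normal(2000))
```

By construction about 10% of the observations are extreme, and the resulting return times $Y_t$ are integers greater than or equal to 1, which motivates the trial-counting geometric distribution used below.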

**Figure 1.** Plot of the return times of extreme events for Goldman Sachs Group (GS) stock.

Since $Y_t \ge 1$, we consider a geometric distribution that counts the total number of trials, rather than the number of failures, and fit the following geometric INGARCH(1,1) model to the data:

$$Y_t|\mathcal{F}_{t-1} \sim \text{Geo}(p_t), \quad X_t = \frac{1}{p_t} = d + aX_{t-1} + bY_{t-1},$$

where $\tilde{X}_1$ is set to the sample mean of the data. Kim and Lee [22] showed, using the criterion provided in Remark 1, that the optimal $\alpha$ for the MDPDE is 0.25. The parameter estimation results are summarized in Table 11 for $\alpha = 0$ (CMLE) and $\alpha = 0.25$ (MDPDE with optimal $\alpha$); figures in parentheses denote the standard errors of the corresponding estimates. We observe that, compared with the CMLE, the MDPDE with $\alpha = 0.25$ is quite different and has smaller standard errors.

**Table 11.** Parameter estimates for geometric INGARCH(1,1) models.
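For the trial-counting geometric pmf $P(Y = y) = (1-p)^{y-1}p$, $y \ge 1$, the infinite sum in the DPD objective has the closed form $\sum_{y \ge 1} p(y)^{1+\alpha} = p^{1+\alpha}/(1-(1-p)^{1+\alpha})$. The following sketch of the per-sample DPD objective for $\alpha > 0$ (our illustration of the standard Basu et al. form with the $\theta$-free constant $1/\alpha$ dropped, not the authors' code) could be minimized numerically, e.g., with `scipy.optimize.minimize`, to obtain the MDPDE:

```python
import numpy as np

def dpd_objective(theta, y, alpha):
    """Average DPD objective (alpha > 0) for the geometric INGARCH(1,1) model
    Y_t | F_{t-1} ~ Geo(p_t), X_t = 1/p_t = d + a*X_{t-1} + b*Y_{t-1},
    where Geo counts the total number of trials (support y = 1, 2, ...).
    Uses l_{alpha,t} = sum_y p(y)^(1+alpha) - (1 + 1/alpha) * p(Y_t)^alpha,
    with the infinite sum in closed form for the geometric pmf."""
    d, a, b = theta
    y = np.asarray(y)
    n = len(y)
    X = np.empty(n)
    X[0] = y.mean()                  # initialization X~_1 = sample mean, as in the text
    for t in range(1, n):
        X[t] = d + a * X[t - 1] + b * y[t - 1]
    p = 1.0 / X                      # conditional success probability
    pmf = (1 - p) ** (y - 1) * p     # P(Y_t = y_t | F_{t-1})
    term1 = p ** (1 + alpha) / (1 - (1 - p) ** (1 + alpha))
    return np.mean(term1 - (1 + 1 / alpha) * pmf ** alpha)

# toy usage on synthetic geometric counts (not the GS data)
rng = np.random.default_rng(0)
y = rng.geometric(0.1, size=200)
val = dpd_objective((1.0, 0.2, 0.3), y, alpha=0.25)
```

Setting $\alpha = 0.25$ corresponds to the optimal tuning reported in Table 11; small $\alpha$ keeps the objective close to the conditional likelihood, while larger $\alpha$ downweights observations with small conditional probability.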


Next, we use $\widehat{T}_n^{\mathrm{score}}$ and $\widehat{T}_n^{0.25}$ ($\widehat{T}_n^{\alpha}$ with $\alpha = 0.25$) to perform a parameter change test at the nominal level of 0.05 (the corresponding critical value is 3.004). Let $\widehat{T}_n^{\mathrm{score}} = \max_{1 \le k \le n} SCORE_{k,n}$ and $\widehat{T}_n^{0.25} = \max_{1 \le k \le n} DPD_{k,n}$. The left and right panels of Figure 2 display $SCORE_{k,n}$ and $DPD_{k,n}$, respectively. For most $k$, $DPD_{k,n}$ is smaller than $SCORE_{k,n}$, which is attributable to the robustness of the MDPDE and DPD. We obtain $\widehat{T}_n^{\mathrm{score}} = 5.136$, which suggests the existence of a parameter change. In Figures 1 and 2, the red vertical dashed line marks the location of the change detected when $\widehat{T}_n^{\mathrm{score}}$ is applied. However, this result is not reliable: as seen in the previous section, $\widehat{T}_n^{\mathrm{score}}$ can signal a change point induced by outliers, and here the detected change point indeed coincides with the occurrence time of an outlier. In contrast, $\widehat{T}_n^{0.25}$ yields a value of 1.219, indicating that no change point exists. This clearly demonstrates that outliers can severely distort parameter estimates and change point tests by falsely identifying a change point. Our findings confirm that the DPD-based change point test provides a practical and robust alternative to the score-based CUSUM test in the presence of outliers.
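Operationally, both procedures are max-type tests compared against the same 5% critical value of 3.004. A minimal decision-rule sketch (the statistic sequences $SCORE_{k,n}$ or $DPD_{k,n}$ are assumed given; the function name is ours):

```python
import numpy as np

CRITICAL_VALUE = 3.004  # 5% critical value quoted in the text

def change_point_decision(stats):
    """Max-type change point test: given the sequence of statistics
    (e.g., DPD_{k,n} for k = 1, ..., n), reject H0 of no parameter
    change when the maximum exceeds the critical value, and report
    the argmax as the estimated change location."""
    stats = np.asarray(stats, dtype=float)
    k_hat = int(np.argmax(stats)) + 1   # 1-based index of the maximizer
    return bool(stats.max() > CRITICAL_VALUE), k_hat

reject, k_hat = change_point_decision([0.2, 1.1, 5.136, 0.9])
# reject is True since 5.136 > 3.004; k_hat = 3
```

Applied to the GS data, this rule rejects for $\widehat{T}_n^{\mathrm{score}} = 5.136$ but not for $\widehat{T}_n^{0.25} = 1.219$.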

**Figure 2.** Plots of *SCOREk*,*<sup>n</sup>* and *DPDk*,*<sup>n</sup>* .

#### **4. Conclusions**

In this study, we developed a DPD-based robust change point test for general integer-valued time series models whose conditional distribution belongs to the one-parameter exponential family. We provided regularity conditions under which the proposed test converges weakly to a function of a Brownian bridge. The simulation study showed that the DPD-based test produces reasonable sizes and powers regardless of the existence of outliers, whereas the score-based CUSUM test suffers from severe size distortions when the data are contaminated by outliers. In the real data analysis of the return times of extreme events related to GS stock, the score-based CUSUM test supported the existence of a parameter change, owing to the influence of outliers, while the DPD-based test did not detect a change point because of its robustness. This result confirms the validity of the proposed test as a robust test in practice. It is noteworthy that the DPD-based test can feasibly be extended to other parametric models, as long as the asymptotic properties of the MDPDE for those models are validated. We leave such extensions for future study.

**Author Contributions:** Conceptualization, B.K. and S.L.; software, B.K.; methodology, B.K. and S.L.; formal analysis, B.K. and S.L.; data curation, B.K.; writing—original draft preparation, B.K. and S.L.; funding acquisition, B.K. and S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1C1C1004662) (B. Kim) and (NRF-2018R1A2A2A05019433) (S. Lee).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

In this appendix, we prove Theorem 1 for $\alpha > 0$; refer to Lee and Lee [14] for the case $\alpha = 0$. The following properties of the probability mass function of the non-negative integer-valued exponential family are useful for proving Lemma A1. For all $y \in \mathbb{N}_0$ and $\eta \in \mathbb{R}$:

(E1) $0 < p(y|\eta) < 1$,
(E2) $\sum_{y=0}^{\infty} p(y|\eta) = 1$,
(E3) $\sum_{y=0}^{\infty} y\,p(y|\eta) = B(\eta)$,
(E4) $\sum_{y=0}^{\infty} y^2 p(y|\eta) = B'(\eta) + B(\eta)^2$,
(E5) $\sum_{y=0}^{\infty} y^3 p(y|\eta) = B''(\eta) + 3B'(\eta)B(\eta) + B(\eta)^3$.

Throughout this section, we denote $L_{\alpha,n}(\theta) = n^{-1}\sum_{t=1}^{n} l_{\alpha,t}(\theta)$ and, for brevity, write $\eta_t = \eta_t(\theta)$, $\tilde{\eta}_t = \tilde{\eta}_t(\theta)$, and $\eta_t^0 = \eta_t(\theta_0)$. Furthermore, if we define the two functions $h_{\alpha}(\eta)$ and $m_{\alpha}(\eta)$ as

$$\begin{aligned} h_{\alpha}(\eta) &= \sum_{y=0}^{\infty} p(y|\eta)^{1+\alpha} \frac{y - B(\eta)}{B'(\eta)} - p(Y_t|\eta)^{\alpha} \frac{Y_t - B(\eta)}{B'(\eta)}, \\ m_{\alpha}(\eta) &= \sum_{y=0}^{\infty} p(y|\eta)^{1+\alpha} \left[ (1+\alpha) \left( \frac{y - B(\eta)}{B'(\eta)} \right)^{2} - \frac{B''(\eta)}{B'(\eta)^{2}} \frac{y - B(\eta)}{B'(\eta)} - \frac{1}{B'(\eta)} \right] \\ &\quad - p(Y_t|\eta)^{\alpha} \left[ \alpha \left( \frac{Y_t - B(\eta)}{B'(\eta)} \right)^{2} - \frac{B''(\eta)}{B'(\eta)^{2}} \frac{Y_t - B(\eta)}{B'(\eta)} - \frac{1}{B'(\eta)} \right], \end{aligned}$$

we obtain

$$\begin{aligned}\frac{\partial l_{\alpha,t}(\theta)}{\partial \theta} &= (1+\alpha)\,h_{\alpha}(\eta_{t})\,\frac{\partial X_{t}(\theta)}{\partial \theta},\\ \frac{\partial^{2} l_{\alpha,t}(\theta)}{\partial \theta \partial \theta^{T}} &= (1+\alpha)\left(h_{\alpha}(\eta_{t})\,\frac{\partial^{2} X_{t}(\theta)}{\partial \theta \partial \theta^{T}} + m_{\alpha}(\eta_{t})\,\frac{\partial X_{t}(\theta)}{\partial \theta}\frac{\partial X_{t}(\theta)}{\partial \theta^{T}}\right). \end{aligned}$$

**Lemma A1.** *Suppose that conditions* **(A3)***,* **(A6)***, and* **(A11)** *hold. Then, we have*

$$\begin{aligned}
|h_{\alpha}(\eta_t)| &\le \frac{1}{c}\big(Y_t + 3X_t(\theta)\big), \qquad |h_{\alpha}(\tilde{\eta}_t)| \le \frac{1}{c}\big(Y_t + 3X_t(\theta) + 3|X_t(\theta) - \tilde{X}_t(\theta)|\big),\\
|m_{\alpha}(\eta_t)| &\le \frac{\alpha}{c^2}Y_t^2 + \frac{K}{c^{1/2}}Y_t + \frac{\alpha}{c^2}X_t(\theta)^2 + \frac{3K}{c^{1/2}}X_t(\theta) + \frac{3+\alpha}{c},\\
|h_{\alpha}(\eta_t) - h_{\alpha}(\tilde{\eta}_t)| &\le \bigg[\frac{\alpha}{c^2}Y_t^2 + \frac{K}{c^{1/2}}Y_t + \frac{2\alpha}{c^2}\big(X_t(\theta)^2 + |X_t(\theta) - \tilde{X}_t(\theta)|^2\big)\\
&\qquad + \frac{3K}{c^{1/2}}\big(X_t(\theta) + |X_t(\theta) - \tilde{X}_t(\theta)|\big) + \frac{3+\alpha}{c}\bigg]\,|X_t(\theta) - \tilde{X}_t(\theta)|,\\
|m_{\alpha}(\tilde{\eta}_t)| &\le \frac{\alpha}{c^2}Y_t^2 + \frac{K}{c^{1/2}}Y_t + \frac{2\alpha}{c^2}\big(X_t(\theta)^2 + |X_t(\theta) - \tilde{X}_t(\theta)|^2\big)\\
&\qquad + \frac{3K}{c^{1/2}}\big(X_t(\theta) + |X_t(\theta) - \tilde{X}_t(\theta)|\big) + \frac{3+\alpha}{c},\\
|m_{\alpha}(\eta_t) - m_{\alpha}(\tilde{\eta}_t)| &\le \bigg[\frac{\alpha^2}{c^3}Y_t^3 + \frac{3\alpha K}{c^{3/2}}Y_t^2 + 3\Big(\frac{\alpha}{c^2} + M + 3K^2\Big)Y_t + \frac{4(3\alpha^2 + 4\alpha + 2)}{c^3}\big(X_t(\theta)^3 + |X_t(\theta) - \tilde{X}_t(\theta)|^3\big)\\
&\qquad + \frac{6\alpha K}{c^{3/2}}\big(X_t(\theta)^2 + |X_t(\theta) - \tilde{X}_t(\theta)|^2\big) + 3\Big(\frac{\alpha^2 + 5\alpha + 3}{c^2} + M + 3K^2\Big)\big(X_t(\theta) + |X_t(\theta) - \tilde{X}_t(\theta)|\big)\\
&\qquad + \frac{(\alpha^2 + 5\alpha + 8)K}{c^{1/2}}\bigg]\,|X_t(\theta) - \tilde{X}_t(\theta)|.
\end{aligned}$$

**Proof.** The proofs of the first four parts of the lemma can be found in Lemma 4 of Kim and Lee [22]. The fifth part follows directly from the third, together with the fact that $\tilde{X}_t(\theta) \le |X_t(\theta) - \tilde{X}_t(\theta)| + X_t(\theta)$.

By the MVT, (E1)–(E5), **(A3)**, **(A6)**, and **(A11)**, we have


where $X_t^*(\theta)$ is an intermediate point between $X_t(\theta)$ and $\tilde{X}_t(\theta)$, and $\eta_t^* = B^{-1}(X_t^*(\theta))$. Note that since $B^{-1}$ is strictly increasing, $\eta_t^*$ lies between $B^{-1}(X_t(\theta)) = \eta_t$ and $B^{-1}(\tilde{X}_t(\theta)) = \tilde{\eta}_t$. Then, because $B(\eta_t^*) \le B(\eta_t) + |B(\eta_t) - B(\tilde{\eta}_t)|$, the last part of the lemma is established.

**Lemma A2.** *Suppose that conditions* **(A0)***–***(A11)** *hold. Then, under H*0*, we have as n* → ∞*,*

$$\frac{1}{n} \sum_{t=1}^{n} \sup_{\theta \in \Theta} \left\| \frac{\partial^2 l_{\alpha,t}(\theta)}{\partial \theta \partial \theta^T} - \frac{\partial^2 \tilde{l}_{\alpha,t}(\theta)}{\partial \theta \partial \theta^T} \right\| = o(1) \text{ a.s.}$$

*and*

$$\frac{1}{n} \sum_{t=1}^{n} \sup_{\theta \in \Theta} \left\| \frac{\partial l_{\alpha,t}(\theta)}{\partial \theta} \frac{\partial l_{\alpha,t}(\theta)}{\partial \theta^T} - \frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial \theta} \frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial \theta^T} \right\| = o(1) \text{ a.s.}$$

**Proof.** It is sufficient to show that as *t* → ∞,

$$\sup\_{\theta \in \Theta} \left\| \frac{\partial^2 l\_{\alpha,t}(\theta)}{\partial \theta \partial \theta^T} - \frac{\partial^2 \tilde{l}\_{\alpha,t}(\theta)}{\partial \theta \partial \theta^T} \right\| = o(1) \text{ a.s.}$$

and

$$\sup_{\theta \in \Theta} \left\| \frac{\partial l_{\alpha,t}(\theta)}{\partial \theta} \frac{\partial l_{\alpha,t}(\theta)}{\partial \theta^T} - \frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial \theta} \frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial \theta^T} \right\| = o(1) \text{ a.s.}$$

Note that we can write

$$\begin{aligned}
\frac{1}{1+\alpha}\sup_{\theta\in\Theta}&\left\|\frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 \tilde{l}_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T}\right\|\\
&\le \sup_{\theta\in\Theta}\left\|h_{\alpha}(\tilde{\eta}_t)\left(\frac{\partial^2 X_t(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 \tilde{X}_t(\theta)}{\partial\theta\partial\theta^T}\right)\right\| + \sup_{\theta\in\Theta}\left\|\big(h_{\alpha}(\eta_t) - h_{\alpha}(\tilde{\eta}_t)\big)\frac{\partial^2 X_t(\theta)}{\partial\theta\partial\theta^T}\right\|\\
&\quad + \sup_{\theta\in\Theta}\left\|\big(m_{\alpha}(\eta_t) - m_{\alpha}(\tilde{\eta}_t)\big)\frac{\partial X_t(\theta)}{\partial\theta}\frac{\partial X_t(\theta)}{\partial\theta^T}\right\| + \sup_{\theta\in\Theta}\left\|m_{\alpha}(\tilde{\eta}_t)\frac{\partial X_t(\theta)}{\partial\theta}\left(\frac{\partial X_t(\theta)}{\partial\theta^T} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta^T}\right)\right\|\\
&\quad + \sup_{\theta\in\Theta}\left\|m_{\alpha}(\tilde{\eta}_t)\left(\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right)\left(\frac{\partial \tilde{X}_t(\theta)}{\partial\theta^T} - \frac{\partial X_t(\theta)}{\partial\theta^T}\right)\right\| + \sup_{\theta\in\Theta}\left\|m_{\alpha}(\tilde{\eta}_t)\left(\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right)\frac{\partial X_t(\theta)}{\partial\theta^T}\right\|\\
&\le \sup_{\theta\in\Theta}|h_{\alpha}(\tilde{\eta}_t)|\sup_{\theta\in\Theta}\left\|\frac{\partial^2 X_t(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 \tilde{X}_t(\theta)}{\partial\theta\partial\theta^T}\right\| + \sup_{\theta\in\Theta}|h_{\alpha}(\eta_t) - h_{\alpha}(\tilde{\eta}_t)|\sup_{\theta\in\Theta}\left\|\frac{\partial^2 X_t(\theta)}{\partial\theta\partial\theta^T}\right\|\\
&\quad + \sup_{\theta\in\Theta}|m_{\alpha}(\eta_t) - m_{\alpha}(\tilde{\eta}_t)|\left(\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta}\right\|\right)^2 + 2\sup_{\theta\in\Theta}|m_{\alpha}(\tilde{\eta}_t)|\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta}\right\|\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right\|\\
&\quad + \sup_{\theta\in\Theta}|m_{\alpha}(\tilde{\eta}_t)|\left(\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right\|\right)^2.
\end{aligned}$$

Using Lemma 2.1 of Straumann and Mikosch [34], together with Lemma A1, **(A2)**, **(A4)**, **(A7)**, **(A8)**, **(A10)**, and Lemma 1 of Kim and Lee [22], the right-hand side of the last inequality converges to 0 a.s. as *t* → ∞. Hence, the first part of the lemma is verified.

Similarly, we have

$$\begin{aligned}
\frac{1}{(1+\alpha)^2}\sup_{\theta\in\Theta}&\left\|\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta^T} - \frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial\theta^T}\right\|\\
&\le \sup_{\theta\in\Theta}\left\|\big(h_{\alpha}(\eta_t)^2 - h_{\alpha}(\tilde{\eta}_t)^2\big)\frac{\partial X_t(\theta)}{\partial\theta}\frac{\partial X_t(\theta)}{\partial\theta^T}\right\| + \sup_{\theta\in\Theta}\left\|h_{\alpha}(\tilde{\eta}_t)^2\frac{\partial X_t(\theta)}{\partial\theta}\left(\frac{\partial X_t(\theta)}{\partial\theta^T} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta^T}\right)\right\|\\
&\quad + \sup_{\theta\in\Theta}\left\|h_{\alpha}(\tilde{\eta}_t)^2\left(\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right)\left(\frac{\partial \tilde{X}_t(\theta)}{\partial\theta^T} - \frac{\partial X_t(\theta)}{\partial\theta^T}\right)\right\| + \sup_{\theta\in\Theta}\left\|h_{\alpha}(\tilde{\eta}_t)^2\left(\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right)\frac{\partial X_t(\theta)}{\partial\theta^T}\right\|\\
&\le \sup_{\theta\in\Theta}|h_{\alpha}(\eta_t) - h_{\alpha}(\tilde{\eta}_t)|\left(\sup_{\theta\in\Theta}|h_{\alpha}(\eta_t)| + \sup_{\theta\in\Theta}|h_{\alpha}(\tilde{\eta}_t)|\right)\left(\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta}\right\|\right)^2\\
&\quad + 2\sup_{\theta\in\Theta}h_{\alpha}(\tilde{\eta}_t)^2\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta}\right\|\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right\| + \sup_{\theta\in\Theta}h_{\alpha}(\tilde{\eta}_t)^2\left(\sup_{\theta\in\Theta}\left\|\frac{\partial X_t(\theta)}{\partial\theta} - \frac{\partial \tilde{X}_t(\theta)}{\partial\theta}\right\|\right)^2,
\end{aligned}$$

and the right-hand side of the last inequality also converges to 0 a.s. from Lemma 2.1 of Straumann and Mikosch [34]. Therefore, the lemma is asserted.

**Lemma A3.** *Suppose that conditions* **(A0)***–***(A11)** *hold. Then, under H*0*, we have as n* → ∞*,*

$$K_{\alpha}^{-1/2}\frac{[ns]}{\sqrt{n}}\frac{\partial L_{\alpha,[ns]}(\theta_0)}{\partial\theta} \stackrel{w}{\longrightarrow} B_d(s),$$

*where $B_d$ is a $d$-dimensional Brownian motion.*

**Proof.** First, we show that $K_{\alpha}$ is nonsingular. Since $Var[h_{\alpha}(\eta_t^0)\,|\,\mathcal{F}_{t-1}] = Var\big[p(Y_t|\eta_t^0)^{\alpha}(Y_t - B(\eta_t^0))/B'(\eta_t^0)\,\big|\,\mathcal{F}_{t-1}\big] > 0$, we have $E(h_{\alpha}(\eta_t^0)^2\,|\,\mathcal{F}_{t-1}) > [E(h_{\alpha}(\eta_t^0)\,|\,\mathcal{F}_{t-1})]^2 = 0$. Hence, it holds that for $\nu \in \mathbb{R}^d \setminus \{0\}$,

$$\nu^T K_{\alpha}\nu = (1+\alpha)^2 E\left[h_{\alpha}(\eta_t^0)^2\left(\nu^T\frac{\partial X_t(\theta_0)}{\partial\theta}\right)^2\right] = (1+\alpha)^2 E\left[E\big(h_{\alpha}(\eta_t^0)^2\,\big|\,\mathcal{F}_{t-1}\big)\left(\nu^T\frac{\partial X_t(\theta_0)}{\partial\theta}\right)^2\right] > 0,$$

from **(A9)**, which implies that *K<sup>α</sup>* is nonsingular.

Note that

$$E\left(\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\,\Big|\,\mathcal{F}_{t-1}\right) = (1+\alpha)\frac{\partial X_t(\theta_0)}{\partial\theta}\,E\big(h_{\alpha}(\eta_t^0)\,\big|\,\mathcal{F}_{t-1}\big) = 0,$$

and $K_{\alpha}$ is finite from Lemma 5 of Kim and Lee [22]. Since $\partial l_{\alpha,t}(\theta_0)/\partial\theta$ is stationary and ergodic, it follows from the functional central limit theorem for martingales (cf. Section 18 of Billingsley [35]) that

$$K_{\alpha}^{-1/2}\frac{[ns]}{\sqrt{n}}\frac{\partial L_{\alpha,[ns]}(\theta_0)}{\partial\theta} = K_{\alpha}^{-1/2}\frac{1}{\sqrt{n}}\sum_{t=1}^{[ns]}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta} \stackrel{w}{\longrightarrow} B_d(s).$$

Furthermore, we can show that

$$\sup_{0\le s\le 1}\frac{[ns]}{\sqrt{n}}\left\|\frac{\partial L_{\alpha,[ns]}(\theta_0)}{\partial\theta} - \frac{\partial \widetilde{L}_{\alpha,[ns]}(\theta_0)}{\partial\theta}\right\| \le \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\left\|\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta} - \frac{\partial \tilde{l}_{\alpha,t}(\theta_0)}{\partial\theta}\right\| = o(1) \text{ a.s.}$$

from Lemma 6 of Kim and Lee [22]. Hence, the lemma is verified.

**Lemma A4.** *Suppose that conditions* **(A0)***–***(A11)** *hold. Then, under H*0*, we have as n* → ∞*,*

$$\max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\bar{\theta}_{\alpha,n,k})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| = o(1) \text{ a.s.},$$

*where* $\{\bar{\theta}_{\alpha,n,k} \,|\, 1\le k\le n,\ n\ge 1\}$ *is any double array of* $\Theta$*-valued random vectors satisfying* $\|\bar{\theta}_{\alpha,n,k} - \theta_0\| \le \|\hat{\theta}_{\alpha,n} - \theta_0\|$*.*

**Proof.** From Lemma 5 of Kim and Lee [22], it holds that

$$E\left(\sup_{\theta\in\Theta}\left\|\frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 l_{\alpha,t}(\theta_0)}{\partial\theta\partial\theta^T}\right\|\right) < \infty.$$

Since $\partial^2 l_{\alpha,t}(\theta)/\partial\theta\partial\theta^T$ is continuous in $\theta$, for any $\epsilon > 0$, we can take a neighborhood $N_{\epsilon}(\theta_0)$ such that

$$E\left(\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 l_{\alpha,t}(\theta_0)}{\partial\theta\partial\theta^T}\right\|\right) < \epsilon \tag{A1}$$

by shrinking the neighborhood around $\theta_0$. Since $\hat{\theta}_{\alpha,n}$ converges to $\theta_0$ a.s. by Proposition 1, we can write that for sufficiently large $n$,

$$\begin{aligned}
\max_{1\le k\le n}\frac{k}{n}&\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\bar{\theta}_{\alpha,n,k})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\|\\
&\le \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\bar{\theta}_{\alpha,n,k})}{\partial\theta\partial\theta^T} - \frac{\partial^2 L_{\alpha,k}(\bar{\theta}_{\alpha,n,k})}{\partial\theta\partial\theta^T}\right\| + \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 L_{\alpha,k}(\bar{\theta}_{\alpha,n,k})}{\partial\theta\partial\theta^T} - \frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T}\right\|\\
&\quad + \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\|\\
&\le \frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial^2 \tilde{l}_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T}\right\| + \frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 l_{\alpha,t}(\theta_0)}{\partial\theta\partial\theta^T}\right\|\\
&\quad + \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| := I_n + II_n + III_n \quad \text{a.s.}
\end{aligned}$$

By Lemma A2, $I_n = o(1)$ a.s. Using (A1) and the stationarity and ergodicity of $\partial^2 l_{\alpha,t}(\theta)/\partial\theta\partial\theta^T$, we have

$$\lim_{n\to\infty} II_n = E\left(\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial^2 l_{\alpha,t}(\theta)}{\partial\theta\partial\theta^T} - \frac{\partial^2 l_{\alpha,t}(\theta_0)}{\partial\theta\partial\theta^T}\right\|\right) < \epsilon \text{ a.s.}$$

Finally, since $\partial^2 L_{\alpha,n}(\theta_0)/\partial\theta\partial\theta^T + J_{\alpha}$ converges to 0 a.s., we can show that

$$\max_{1\le k\le \sqrt{n}}\frac{k}{n}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| \le \frac{1}{\sqrt{n}}\sup_{k\ge 1}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| = o(1) \text{ a.s.}$$

and

$$\max_{\sqrt{n}\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| \le \max_{\sqrt{n}\le k\le n}\left\|\frac{\partial^2 L_{\alpha,k}(\theta_0)}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| = o(1) \text{ a.s.},$$

which together imply that $III_n = o(1)$ a.s. Therefore, the lemma is established.

**Proof of Theorem 1.** First, we show that

$$\frac{[ns]}{\sqrt{n}}\frac{\partial \widetilde{L}_{\alpha,[ns]}(\theta_0)}{\partial\theta} + \frac{[ns]}{n}\frac{\partial^2 \widetilde{L}_{\alpha,[ns]}(\theta^*_{\alpha,n,s})}{\partial\theta\partial\theta^T}J_{\alpha}^{-1}\sqrt{n}\frac{\partial \widetilde{L}_{\alpha,n}(\theta_0)}{\partial\theta} \stackrel{w}{\longrightarrow} K_{\alpha}^{1/2}B_d^o(s). \tag{A2}$$

From Lemma A3, we have

$$\frac{[ns]}{\sqrt{n}}\frac{\partial L_{\alpha,[ns]}(\theta_0)}{\partial\theta} - \frac{[ns]}{n}\sqrt{n}\frac{\partial L_{\alpha,n}(\theta_0)}{\partial\theta} \stackrel{w}{\longrightarrow} K_{\alpha}^{1/2}B_d^o(s).$$

Since $\sqrt{n}\,\partial\widetilde{L}_{\alpha,n}(\theta_0)/\partial\theta = O_p(1)$ by Lemma A3 with $s = 1$, using Lemma A4, it holds that

$$\begin{aligned}
\sup_{0\le s\le 1}\frac{[ns]}{n}&\left\|\frac{\partial^2 \widetilde{L}_{\alpha,[ns]}(\theta^*_{\alpha,n,s})}{\partial\theta\partial\theta^T}J_{\alpha}^{-1}\sqrt{n}\frac{\partial \widetilde{L}_{\alpha,n}(\theta_0)}{\partial\theta} + \sqrt{n}\frac{\partial \widetilde{L}_{\alpha,n}(\theta_0)}{\partial\theta}\right\|\\
&\le \left\|J_{\alpha}^{-1}\sqrt{n}\frac{\partial \widetilde{L}_{\alpha,n}(\theta_0)}{\partial\theta}\right\|\max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\theta^*_{\alpha,n,k})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| = o_p(1),
\end{aligned}$$

where $\theta^*_{\alpha,n,k}$ denotes the value corresponding to $\theta^*_{\alpha,n,s}$ when $[ns] = k$. Hence, (A2) is verified.

Next, from Lemma A4, we have

$$\sup_{0\le s\le 1}\frac{[ns]}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,[ns]}(\theta^*_{\alpha,n,s})}{\partial\theta\partial\theta^T}\right\| \le \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\theta^*_{\alpha,n,k})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| + \|J_{\alpha}\| = O_p(1)$$

and

$$\left\|\frac{\partial^2 \widetilde{L}_{\alpha,n}(\theta^*_{\alpha,n,1})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| \le \max_{1\le k\le n}\frac{k}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,k}(\theta^*_{\alpha,n,k})}{\partial\theta\partial\theta^T} + J_{\alpha}\right\| = o(1) \text{ a.s.}$$

Then, since $\sqrt{n}(\hat{\theta}_{\alpha,n} - \theta_0) = O_p(1)$ by Proposition 1, we have

$$\sup_{0\le s\le 1}\frac{[ns]}{n}\left\|\frac{\partial^2 \widetilde{L}_{\alpha,[ns]}(\theta^*_{\alpha,n,s})}{\partial\theta\partial\theta^T}\left(\frac{\partial^2 \widetilde{L}_{\alpha,n}(\theta^*_{\alpha,n,1})}{\partial\theta\partial\theta^T} + J_{\alpha}\right)\sqrt{n}(\hat{\theta}_{\alpha,n} - \theta_0)\right\| = o_p(1). \tag{A3}$$

Therefore, from (5), (A2), and (A3), the theorem is validated.

**Lemma A5.** *Suppose that conditions* **(A0)***–***(A11)** *hold. Then, under H*0*, we have as n* → ∞*,*

$$\frac{1}{n}\sum_{t=1}^{n}\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta}\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta^T} \stackrel{a.s.}{\longrightarrow} K_{\alpha}.$$

**Proof.** As in the proof of Lemma A4, from Lemma 5 of Kim and Lee [22], we can take a neighborhood $N_{\epsilon}(\theta_0)$ such that

$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in N_{\epsilon}(\theta_0)}&\left\|\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta^T} - \frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right\|\\
&= E\left(\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta^T} - \frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right\|\right) < \epsilon \quad \text{a.s.}
\end{aligned} \tag{A4}$$

Note that we can write

$$\begin{aligned}
\bigg\|\frac{1}{n}\sum_{t=1}^{n}&\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta}\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta^T} - E\left(\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right)\bigg\|\\
&\le \left\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta}\frac{\partial \tilde{l}_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta^T} - \frac{1}{n}\sum_{t=1}^{n}\frac{\partial l_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta}\frac{\partial l_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta^T}\right\|\\
&\quad + \left\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial l_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta}\frac{\partial l_{\alpha,t}(\hat{\theta}_{\alpha,n})}{\partial\theta^T} - \frac{1}{n}\sum_{t=1}^{n}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right\|\\
&\quad + \left\|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T} - E\left(\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right)\right\|\\
&:= I_n + II_n + III_n.
\end{aligned}$$

By Lemma A2,

$$I_n \le \frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\left\|\frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial \tilde{l}_{\alpha,t}(\theta)}{\partial\theta^T} - \frac{\partial l_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta^T}\right\| = o(1) \text{ a.s.}$$

Since $\hat{\theta}_{\alpha,n}$ converges to $\theta_0$ a.s. by Proposition 1, from (A4), we have

$$\lim_{n\to\infty} II_n \le \lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in N_{\epsilon}(\theta_0)}\left\|\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta)}{\partial\theta^T} - \frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta}\frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T}\right\| < \epsilon \text{ a.s.}$$

Finally, by the ergodic theorem, $III_n = o(1)$ a.s. Therefore, the lemma is established.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
