*Article* **Statistical Inference for Periodic Self-Exciting Threshold Integer-Valued Autoregressive Processes**

**Congmin Liu <sup>1</sup>, Jianhua Cheng <sup>1</sup> and Dehui Wang <sup>2,</sup>\***


**\*** Correspondence: wangdh@jlu.edu.cn

**Abstract:** This paper considers the periodic self-exciting threshold integer-valued autoregressive processes under a weaker condition in which the second moment is finite, instead of the innovation distribution being fully specified. The basic statistical properties of the model are discussed, the quasi-likelihood inference of the parameters is investigated, and the asymptotic behaviors of the estimators are obtained. Threshold estimates based on quasi-likelihood and least squares methods are given. Simulation studies show that the quasi-likelihood methods perform well with realistic sample sizes and may be superior to least squares and maximum likelihood methods. The practical application of the processes is illustrated by a time series dataset concerning the monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB). In addition, the forecasting problem of this dataset is addressed.

**Keywords:** periodic autoregression; integer-valued threshold models; parameter estimation

**Citation:** Liu, C.; Cheng, J.; Wang, D. Statistical Inference for Periodic Self-Exciting Threshold Integer-Valued Autoregressive Processes. *Entropy* **2021**, *23*, 765. https://doi.org/10.3390/e23060765

Academic Editor: Christian H. Weiss

Received: 29 April 2021 Accepted: 13 June 2021 Published: 17 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

There has been considerable interest in integer-valued time series because of their wide range of applications, including epidemiology, finance, and disease modeling. Examples of such data include the number of major global earthquakes per year, monthly crimes in a particular country or region, and monthly patient counts in a hospital over a period of time. Following the first-order integer-valued autoregressive (INAR(1)) models introduced by Al-Osh and Alzaid [1], INAR models have been widely used; see Du and Li [2], Jung et al. [3], Weiß [4], Ristić et al. [5], Zhang et al. [6], Li et al. [7], Kang et al. [8] and Yu et al. [9], among others. However, for so-called piecewise phenomena such as high thresholds, sudden bursts of large values, and time volatility, the INAR model does not work well. The threshold models (Tong [10]; Tong and Lim [11]) have attracted much attention and have been widely used to model nonlinear phenomena. To capture the piecewise behavior of integer-valued time series, Monteiro et al. [12] introduced a class of self-exciting threshold integer-valued autoregressive (SETINAR) models driven by independent Poisson-distributed random variables. Wang et al. [13] proposed a self-excited threshold Poisson autoregressive (SETPAR) model. Yang et al. [14] considered a class of SETINAR processes that properly capture flexible asymmetric and nonlinear responses without assuming distributions for the errors. Yang et al. [15] introduced an integer-valued threshold autoregressive process based on a negative binomial thinning operator (NBTINAR(1)).

In addition, many sources of business, economic, and meteorological time series data show a periodically varying phenomenon that repeats itself after a regular period of time; it may be driven by seasonal factors and human activities. For dealing with processes exhibiting periodic patterns, Bennett [16] and Gladyshev [17] proposed periodically correlated random processes. Then, Bentarzi and Hallin [18], Lund and Basawa [19], Basawa and Lund [20], and Shao [21], among other authors, studied the periodic autoregressive moving-average (PARMA) models in some detail. To capture the periodic behavior of integer-valued time series, Monteiro et al. [22] proposed the periodic integer-valued autoregressive models of order one (PINAR(1)) with period *T*, driven by a periodic sequence of independent Poisson-distributed random variables. Hall et al. [23] considered the extremal behavior of periodic integer-valued moving-average sequences. Santos et al. [24] introduced a multivariate PINAR model with time-varying parameters. The analysis of periodic self-exciting threshold integer-valued autoregressive (PSETINAR(2; 1, 1)<sub>*T*</sub>) processes was introduced by Pereira et al. [25]. Manaa and Bentarzi [26] established the existence of higher-order moments and the strict periodic stationarity of the PSETINAR(2; 1, 1)<sub>*T*</sub> processes; there, the CLS and CML methods were applied to estimate the parameters, while the nested sub-sample search (NeSS) algorithm proposed by Li and Tong [27] was used to estimate the periodic threshold parameters. A drawback of this PSETINAR(2; 1, 1)<sub>*T*</sub> model is that the mean and variance of the Poisson distribution are equal, which is not always true for real data. Therefore, in this paper, we remove the assumption of a Poisson distribution and specify only the relationship between the mean and variance of the observations; we develop quasi-likelihood inference for the PSETINAR(2; 1, 1)<sub>*T*</sub> processes and consider the estimation of the thresholds.

Quasi-likelihood is a non-parametric inference method proposed by Wedderburn [28]. It is very useful in cases where exact distributional information is not available and only the relation between the mean and variance of the observations is given, and it enjoys a certain robustness of validity. Quasi-likelihood has been widely applied. For example, Azrak and Mélard [29] proposed a simple and efficient algorithm to evaluate the exact quasi-likelihood of ARMA models with time-dependent coefficients; Christou and Fokianos [30] studied probabilistic properties and quasi-likelihood estimation for negative binomial time series models; Li et al. [31] studied quasi-likelihood inference for the self-exciting threshold integer-valued autoregressive (SETINAR(2,1)) processes under a weaker condition; Yang et al. [32] modeled overdispersed or underdispersed count data with generalized Poisson integer-valued autoregressive (GPINAR(1)) processes and investigated the maximum quasi-likelihood estimators.

The remainder of this paper is organized as follows. In Section 2, we redefine the PSETINAR(2; 1, 1)<sub>*T*</sub> processes under weak conditions and discuss their basic properties. In Section 3, we consider quasi-likelihood inference for the unknown parameters; threshold estimation is also discussed. Section 4 presents some simulation results for the estimates. In Section 5, we give an application of the proposed processes to a real dataset, and the forecasting problem of this dataset is addressed. Concluding remarks are given in Section 6. All proofs are deferred to Appendix A.

#### **2. The Model and Its Properties**

The periodic self-exciting threshold integer-valued autoregressive model of order one with two regimes (PSETINAR(2; 1, 1)*T*) (originally proposed by Pereira et al. [25], and further studied by Manaa and Bentarzi [26]) is defined by the recursive equation:

$$X_t = \begin{cases} \alpha_t^{(1)} \circ X_{t-1} + Z_t, & X_{t-1} \le r_t, \\ \alpha_t^{(2)} \circ X_{t-1} + Z_t, & X_{t-1} > r_t, \end{cases} \tag{1}$$

with threshold parameters $r_t = r_j$ and autoregressive coefficients $\alpha_t^{(k)} = \alpha_j^{(k)} \in (0, 1)$, for $k = 1, 2$, $t = j + sT$, $j = 1, 2, \ldots, T$, $s \in \mathbb{Z}$, and $T \in \mathbb{N}_0$. Note that Equation (1) admits the representation

$$X_{j+sT} = \left(\alpha_j^{(1)} \circ X_{j+sT-1}\right) I_{j+sT-1}^{(1)} + \left(\alpha_j^{(2)} \circ X_{j+sT-1}\right) I_{j+sT-1}^{(2)} + Z_{j+sT}, \tag{2}$$

where

(i) $I_{j+sT-1}^{(1)} = \mathbf{1}\{X_{j+sT-1} \le r_j\}$ and $I_{j+sT-1}^{(2)} = 1 - I_{j+sT-1}^{(1)}$ are the regime indicator functions implied by (1);

(ii) "$\circ$" denotes the binomial thinning operator, defined by

$$\alpha_j^{(k)} \circ X_{j+sT-1} = \sum_{i=1}^{X_{j+sT-1}} U_{i,j+sT}\left(\alpha_j^{(k)}\right), \tag{3}$$

in which $\{U_{i,j+sT}(\alpha_j^{(k)}),\ j = 1, 2, \ldots, T,\ s \in \mathbb{Z}\}$ is a sequence of independent periodic Bernoulli random variables with $P\big(U_{i,j+sT}(\alpha_j^{(k)}) = 1\big) = 1 - P\big(U_{i,j+sT}(\alpha_j^{(k)}) = 0\big) = \alpha_j^{(k)}$, $k = 1, 2$;

(iii) $\{Z_{j+sT},\ j = 1, 2, \ldots, T,\ s \in \mathbb{Z}\}$ constitutes a sequence of independent periodic random variables with $E(Z_{j+sT}) = \lambda_j$, $Var(Z_{j+sT}) = \sigma_{z,j}^2$, which is assumed to be independent of $\{X_{j+sT-1}\}$ and $\{\alpha_j^{(k)} \circ X_{j+sT-1}\}$.

**Remark 1.** *The innovation of the PSETINAR*(2; 1, 1)<sub>*T*</sub> *process defined by Pereira et al. [25] and Manaa and Bentarzi [26] is a sequence of independent periodic Poisson-distributed random variables with mean $\lambda_j$, that is, $\{Z_t\} \sim P(\lambda_j)$, where $t = j + sT$, $j = 1, 2, \ldots, T$, $s \in \mathbb{Z}$. In this paper, we assume only $E(Z_{j+sT}) = \lambda_j$ and $Var(Z_{j+sT}) = \sigma_{z,j}^2$ instead of a periodic Poisson distribution for $\{Z_{j+sT}\}$, so that the model is more flexible.*
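To make the data-generating mechanism concrete, the recursion (1)–(3) can be simulated directly: the thinning $\alpha_j^{(k)} \circ X_{t-1}$ is a binomial draw with $X_{t-1}$ trials and success probability $\alpha_j^{(k)}$. The following minimal sketch (function name and interface are our own; Poisson innovations are used only as one admissible finite-variance choice, in line with Remark 1) generates a path of $N$ cycles:

```python
import numpy as np

def simulate_psetinar(alpha1, alpha2, lam, r, n_cycles, x0=0, rng=None):
    """Simulate a PSETINAR(2; 1, 1)_T path of length N*T via binomial thinning.

    alpha1, alpha2, lam, r : length-T sequences of the periodic parameters
    (regime-1 and regime-2 thinning probabilities, innovation means,
    thresholds).  Innovations are Poisson(lam_j) here purely for
    illustration -- the model only requires a finite-variance innovation.
    """
    rng = np.random.default_rng(rng)
    T = len(lam)
    x = np.empty(n_cycles * T, dtype=int)
    prev = x0
    for t in range(n_cycles * T):
        j = t % T                        # season index (0-based here)
        # regime chosen by comparing the previous value with the threshold r_j
        a = alpha1[j] if prev <= r[j] else alpha2[j]
        # alpha o X: sum of X i.i.d. Bernoulli(alpha), i.e. a binomial draw
        x[t] = rng.binomial(prev, a) + rng.poisson(lam[j])
        prev = x[t]
    return x
```

For instance, `simulate_psetinar([0.2, 0.8], [0.1, 0.1], [3, 7], [8, 4], 50)` draws a path from the two-season configuration used later for Figure 1.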

The following proposition establishes the conditional mean and the conditional variance of the PSETINAR(2; 1, 1)<sub>*T*</sub> process, which play an important role in the study of the process properties and parameter estimation.

**Proposition 1.** *For any fixed $j = 1, 2, \ldots, T$, with $T \in \mathbb{N}_0$, the conditional mean and conditional variance of the process $\{X_t\}$ for $t = j + sT$ and $s \in \mathbb{Z}$ defined in (2) are given by*

$$(i)\quad E\left(X_{j+sT}\mid X_{j+sT-1}\right) = \alpha_j^{(1)}X_{j+sT-1}I_{j+sT-1}^{(1)} + \alpha_j^{(2)}X_{j+sT-1}I_{j+sT-1}^{(2)} + \lambda_j;$$

$$(ii)\quad Var\left(X_{j+sT}\mid X_{j+sT-1}\right) = \sum_{k=1}^{2} \alpha_j^{(k)}\left(1 - \alpha_j^{(k)}\right) X_{j+sT-1} I_{j+sT-1}^{(k)} + \sigma_{z,j}^2.$$

The following theorem states the ergodicity of the PSETINAR(2; 1, 1)<sub>*T*</sub> process (2). This property is useful for deriving the asymptotic properties of the parameter estimators.

**Theorem 1.** *For any fixed $j = 1, 2, \ldots, T$, with $T \in \mathbb{N}_0$, the process $\{X_t\}$ for $t = j + sT$ and $s \in \mathbb{Z}$ defined in (2) is an ergodic Markov chain.*

#### **3. Parameters Estimation**

Suppose we have a series of observations $\{X_{j+sT},\ j = 1, 2, \ldots, T,\ s \in \mathbb{N}_0\}$ generated from the PSETINAR(2; 1, 1)<sub>*T*</sub> process. The goal of this section is to estimate the unknown parameter vector $\beta = (\beta_1, \ldots, \beta_{3T})' = (\alpha_1^{(1)}, \alpha_1^{(2)}, \lambda_1, \alpha_2^{(1)}, \alpha_2^{(2)}, \lambda_2, \ldots, \alpha_T^{(1)}, \alpha_T^{(2)}, \lambda_T)'$ and the threshold parameter vector $r = (r_1, r_2, \ldots, r_T)'$. This section is divided into two subsections. In Section 3.1, we estimate the parameter vector $\beta$ by the maximum quasi-likelihood (MQL) method when the threshold values are known. We consider the maximum quasi-likelihood (MQL) and conditional least squares (CLS) estimators of the thresholds $r$ in Section 3.2.

#### *3.1. Estimation of Parameters β*

As described in Proposition 1 (ii), we have the variance of $X_t$ conditional on $X_{t-1}$. Let $\theta_j = \left(\theta_j^{(1)}, \theta_j^{(2)}, \sigma_{z,j}^2\right)'$ with $\theta_j^{(k)} = \alpha_j^{(k)}\left(1 - \alpha_j^{(k)}\right)$, $k = 1, 2$, $j = 1, 2, \ldots, T$; then $Var\left(X_{j+sT}\mid X_{j+sT-1}\right)$ admits the representation

$$V_{\theta_j}\left(X_{j+sT}\mid X_{j+sT-1}\right) \triangleq Var\left(X_{j+sT}\mid X_{j+sT-1}\right) = \theta_j^{(1)}X_{j+sT-1}I_{j+sT-1}^{(1)} + \theta_j^{(2)}X_{j+sT-1}I_{j+sT-1}^{(2)} + \sigma_{z,j}^2,$$

for all $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$.

As discussed in Wedderburn [28], we have the set of standard quasi-likelihood estimating equations:

$$L_i(\beta) = \sum_{s=0}^{N-1} \sum_{j=1}^{T} \frac{X_{j+sT} - E\left(X_{j+sT}\mid X_{j+sT-1}\right)}{V_{\theta_j}\left(X_{j+sT}\mid X_{j+sT-1}\right)} \frac{\partial E\left(X_{j+sT}\mid X_{j+sT-1}\right)}{\partial \beta_i} = 0, \tag{4}$$

for *i* = 1, ... , 3*T*, where *N* is the total number of cycles. By solving (4), the quasi-likelihood estimator can be obtained.

This method is essentially a two-step estimation. If $\theta_j$ is unknown, we propose substituting a suitable consistent estimator of $\theta_j$ obtained by other means, yielding modified quasi-likelihood estimating equations, and then solving them for the primary parameters of interest. In the modified quasi-likelihood estimating equations, we replace $\theta_j$ with a suitable consistent estimator $\hat{\theta}_j$. For simplicity of notation, we write $V_{\hat{\theta}_j}^{-1} = V_{\hat{\theta}_j}^{-1}\left(X_{j+sT}\mid X_{j+sT-1}\right)$. This approach leads to the modified quasi-likelihood estimator $\hat{\beta}_{MQL}$ of $\beta$ (see Zheng, Basawa and Datta [33]):

$$\hat{\beta}_{MQL} = Q_N^{-1} q_N, \tag{5}$$

where

$$Q_N = \begin{bmatrix} Q_{1,N} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & Q_{2,N} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & Q_{T,N} \end{bmatrix}$$

and

$$q_N = \left(q_{1,N}',\, q_{2,N}',\, \ldots,\, q_{T,N}'\right)',$$

moreover, the **0**'s are (3 × 3) null matrices, and $Q_{j,N}$ and $q_{j,N}$ ($j = 1, 2, \ldots, T$) are given by

$$Q_{j,N} = \begin{bmatrix} \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1}^{2} I_{j+sT-1}^{(1)} & 0 & \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1} I_{j+sT-1}^{(1)} \\ 0 & \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1}^{2} I_{j+sT-1}^{(2)} & \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1} I_{j+sT-1}^{(2)} \\ \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1} I_{j+sT-1}^{(1)} & \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1} I_{j+sT-1}^{(2)} & \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} \end{bmatrix},$$

$$q_{j,N} = \left(\sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT} X_{j+sT-1} I_{j+sT-1}^{(1)},\ \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT} X_{j+sT-1} I_{j+sT-1}^{(2)},\ \sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT}\right)'.$$

Note that we use the consistent estimator $\hat{\theta}_j = \left(\hat{\alpha}_j^{(1)}\left(1 - \hat{\alpha}_j^{(1)}\right), \hat{\alpha}_j^{(2)}\left(1 - \hat{\alpha}_j^{(2)}\right), \hat{\sigma}_{z,j}^2\right)'$ instead of $\theta_j$.
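Since $Q_N$ is block diagonal, (5) decouples into $T$ separate $3 \times 3$ linear systems, one per season. A sketch of the season-$j$ solve follows (function name is ours; the season-$j$ subsample, regime indicator, and weights $V_{\hat{\theta}_j}^{-1}$ are assumed to have been prepared beforehand):

```python
import numpy as np

def mql_beta_season(x_prev, x_curr, in_regime1, v_inv):
    """Solve the season-j MQL normal equations beta_j = Q_{j,N}^{-1} q_{j,N}.

    x_prev, x_curr : arrays of the pairs (X_{j+sT-1}, X_{j+sT}), s = 0..N-1
    in_regime1     : boolean array, True where X_{j+sT-1} <= r_j
    v_inv          : weights 1 / V_theta(X_{j+sT} | X_{j+sT-1})
    Returns (alpha1_hat, alpha2_hat, lambda_hat) for season j.
    """
    i1 = in_regime1.astype(float)
    i2 = 1.0 - i1
    Q = np.array([
        [np.sum(v_inv * x_prev**2 * i1), 0.0, np.sum(v_inv * x_prev * i1)],
        [0.0, np.sum(v_inv * x_prev**2 * i2), np.sum(v_inv * x_prev * i2)],
        [np.sum(v_inv * x_prev * i1), np.sum(v_inv * x_prev * i2), np.sum(v_inv)],
    ])
    q = np.array([
        np.sum(v_inv * x_curr * x_prev * i1),
        np.sum(v_inv * x_curr * x_prev * i2),
        np.sum(v_inv * x_curr),
    ])
    return np.linalg.solve(Q, q)
```

With data that satisfy the conditional mean in Proposition 1 (i) exactly, the solve recovers $(\alpha_j^{(1)}, \alpha_j^{(2)}, \lambda_j)$ exactly, which is a convenient sanity check.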

Next, the following proposition gives consistent estimators $\hat{\sigma}_{z,j}^2$ of $\sigma_{z,j}^2$, which depend on consistent estimators $\hat{\alpha}_j^{(k)}$ and $\hat{\lambda}_j$ with $k = 1, 2$, $j = 1, 2, \ldots, T$.

**Proposition 2.** *The following variance estimators for $\{Z_{j+sT}\}$ with $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$ are consistent:*

$$\begin{split}(i)\quad \hat{\sigma}_{1,z,j}^{2} &= \frac{1}{N} \sum_{s=0}^{N-1} \left( X_{j+sT} - \sum_{k=1}^{2} \hat{\alpha}_{j}^{(k)} X_{j+sT-1} I_{j+sT-1}^{(k)} - \hat{\lambda}_{j} \right)^{2} \\ &\quad - \frac{1}{N} \sum_{k=1}^{2} \sum_{s=0}^{N-1} \hat{\alpha}_{j}^{(k)} \left( 1 - \hat{\alpha}_{j}^{(k)} \right) X_{j+sT-1} I_{j+sT-1}^{(k)}; \end{split} \tag{6}$$

$$\begin{split}(ii)\quad \hat{\sigma}_{2,z,j}^{2} &= \hat{\sigma}_{x,j}^{2} - \hat{p}_{j}\left[\hat{\alpha}_{j}^{(1)^{2}} \hat{\sigma}_{j}^{2(1)} + \hat{\alpha}_{j}^{(1)}\left(1 - \hat{\alpha}_{j}^{(1)}\right)\hat{\mu}_{j}^{(1)}\right] \\ &\quad - \left(1 - \hat{p}_{j}\right)\left[\hat{\alpha}_{j}^{(2)^{2}} \hat{\sigma}_{j}^{2(2)} + \hat{\alpha}_{j}^{(2)}\left(1 - \hat{\alpha}_{j}^{(2)}\right)\hat{\mu}_{j}^{(2)}\right] \\ &\quad - \hat{p}_{j}\left(1 - \hat{p}_{j}\right)\left(\hat{\alpha}_{j}^{(1)}\hat{\mu}_{j}^{(1)} - \hat{\alpha}_{j}^{(2)}\hat{\mu}_{j}^{(2)}\right)^{2}, \end{split} \tag{7}$$

*for $k = 1, 2$, $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$, in which $\hat{\alpha}_j^{(k)}$ and $\hat{\lambda}_j$ are consistent estimators of $\alpha_j^{(k)}$ and $\lambda_j$ (for example, the CLS estimators given in Theorem 3.1 of Pereira et al. [25]); furthermore,*

$$\bar{X}_{j} = \frac{1}{N} \sum_{s=0}^{N-1} X_{j+sT}, \quad \hat{\sigma}_{x,j}^{2} = \frac{1}{N} \sum_{s=0}^{N-1} \left(X_{j+sT} - \bar{X}_{j}\right)^{2},$$

$$N_{j}^{(k)} = \sum_{s=0}^{N-1} I_{j+sT-1}^{(k)}, \quad \hat{\mu}_{j}^{(k)} = \frac{1}{N_{j}^{(k)}} \sum_{s \in \left\{I_{j+sT-1}^{(k)} = 1\right\}} X_{j+sT-1},$$

$$\hat{p}_{j} = \frac{1}{N} \sum_{s=0}^{N-1} I_{j+sT-1}^{(1)}, \quad \hat{\sigma}_{j}^{2(k)} = \frac{1}{N_{j}^{(k)}} \sum_{s \in \left\{I_{j+sT-1}^{(k)} = 1\right\}} \left(X_{j+sT-1} - \hat{\mu}_{j}^{(k)}\right)^{2}.$$

*The two estimators are based on the conditional variance $Var\left(X_{j+sT}\mid X_{j+sT-1}\right)$ and the variance $Var\left(X_{j+sT}\right)$, respectively. The details can be found in Appendix A.*
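Estimator (6) is cheap to compute once preliminary estimates are available: it is the mean squared one-step residual at season $j$ minus the average thinning variance. A sketch under these definitions (function name is ours; the inputs are the season-$j$ subsample and preliminary CLS-type estimates):

```python
import numpy as np

def sigma2_z_residual(x_prev, x_curr, in_regime1, alpha1, alpha2, lam):
    """Estimator (6) for season j: the innovation variance is the mean
    squared residual of the one-step conditional mean, minus the mean
    of the conditional thinning variance alpha(1-alpha)X.

    alpha1, alpha2, lam are (consistent estimates of) the season-j
    parameters alpha_j^(1), alpha_j^(2), lambda_j.
    """
    i1 = in_regime1.astype(float)
    i2 = 1.0 - i1
    # residual of the conditional mean in Proposition 1 (i)
    resid = x_curr - alpha1 * x_prev * i1 - alpha2 * x_prev * i2 - lam
    # conditional thinning variance from Proposition 1 (ii)
    thinning_var = (alpha1 * (1.0 - alpha1) * x_prev * i1
                    + alpha2 * (1.0 - alpha2) * x_prev * i2)
    return np.mean(resid**2) - np.mean(thinning_var)
```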

To study the asymptotic behavior of the estimator $\hat{\beta}_{MQL}$, we make the following assumptions about the process $\{X_t\}$:

(C1) *By Proposition 1 in Pereira et al. [25], we assume that $\{X_t\}$ is a strictly cyclostationary process;*

(C2) $E|X_t|^4 < \infty$.

We now give the asymptotic distribution of the quasi-likelihood estimator $\hat{\beta}_{MQL}$ defined in (5).

**Theorem 2.** *Let $\{X_t\}$ be a PSETINAR*(2; 1, 1)<sub>*T*</sub> *process defined in (2); then, under assumptions (C1) and (C2), the estimator $\hat{\beta}_{MQL}$ given by (5) is asymptotically normal:*

$$\sqrt{N}\left(\hat{\beta}_{MQL} - \beta\right) \xrightarrow{d} N\left(\mathbf{0}, H^{-1}(\theta)\right),$$

*where*

$$H(\theta) = \begin{bmatrix} H_1(\theta) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & H_2(\theta) & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & H_T(\theta) \end{bmatrix}$$

*with matrices $H_j(\theta)$ ($j = 1, 2, \ldots, T$) given by*

$$H_j(\theta) = \begin{bmatrix} E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}^{2}I_{j-1}^{(1)}\right) & 0 & E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}I_{j-1}^{(1)}\right) \\ 0 & E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}^{2}I_{j-1}^{(2)}\right) & E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}I_{j-1}^{(2)}\right) \\ E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}I_{j-1}^{(1)}\right) & E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})X_{j-1}I_{j-1}^{(2)}\right) & E\left(V_{\theta_j}^{-1}(X_j\mid X_{j-1})\right) \end{bmatrix}.$$

It is worth mentioning that this theorem implies the consistency of the estimator $\hat{\beta}_{MQL}$.

#### *3.2. Estimation of Thresholds Vector r*

Note that in real data applications the threshold values are also unknown. In this subsection, we estimate the threshold vector $r = (r_1, r_2, \ldots, r_T)'$. Here, we adapt the nested sub-sample search (NeSS) algorithm (see, e.g., Yang et al. [15], Li and Tong [27], and Li et al. [31]) and use the conditional least squares (CLS) and modified quasi-likelihood (MQL) principles to estimate $r$.

For some fixed $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_T)'$, the application of the conditional least squares principle yields the sum of squared errors:

$$\begin{split} &S\_{N}(\mathbf{r},\lambda) \\ &=\sum\_{s=0}^{N-1} \sum\_{j=1}^{T} \left( X\_{j+sT} - \sum\_{k=1}^{2} \frac{\sum\_{s=0}^{N-1} \left( X\_{j+sT}X\_{j+sT-1}I\_{j+sT-1}^{(k)} - \lambda\_{j}X\_{j+sT-1}I\_{j+sT-1}^{(k)} \right)}{\sum\_{s=0}^{N-1} X\_{j+sT-1}^{2} I\_{j+sT-1}^{(k)}} X\_{j+sT-1} I\_{j+sT-1}^{(k)} - \lambda\_{j} \right)^{2}, \end{split}$$

and then the thresholds vector *r* can be estimated by minimizing *SN*(*r*, *λ*),

$$\hat{r} = \arg\min_{r \in [\underline{r},\, \bar{r}]} S_N(r, \lambda), \tag{8}$$

where $\underline{r}$ and $\bar{r}$ are known lower and upper bounds of $r$. In practice, they can be selected as the minimum and maximum values in each cycle of the sample. For convenience, we consider the alternative objective function

$$J_N(r, \lambda) = S_N - S_N(r, \lambda),$$

where

$$S\_N = \sum\_{s=0}^{N-1} \sum\_{j=1}^{T} \left( X\_{j+sT} - \frac{\sum\_{s=0}^{N-1} \left( X\_{j+sT} X\_{j+sT-1} - \lambda\_j X\_{j+sT-1} \right)}{\sum\_{s=0}^{N-1} X\_{j+sT-1}^2} X\_{j+sT-1} - \lambda\_j \right)^2.$$

Now, the optimization in (8) is equivalent to

$$\hat{r}_{CLS} = \arg\max_{r \in [\underline{r},\, \bar{r}]} J_N(r, \lambda), \tag{9}$$

where $\hat{r}_{CLS}$ is the conditional least squares estimator of the threshold vector $r$.
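Because the regime indicators for season $j$ depend only on $r_j$, the minimization in (8) decouples across seasons, and on a modest candidate range an exhaustive grid search is a serviceable stand-in for the NeSS refinement. A per-season sketch (function name is ours; $\lambda_j$ is held fixed, as in (8)):

```python
import numpy as np

def cls_threshold_season(x_prev, x_curr, lam, r_grid):
    """Grid-search CLS estimate of the season-j threshold r_j by
    minimising the season-j contribution to S_N(r, lambda) in (8).

    lam    : the fixed lambda_j (e.g. the seasonal sample mean)
    r_grid : candidate thresholds, e.g. range(x.min(), x.max() + 1)
    """
    def sse(r):
        total = 0.0
        for regime in (x_prev <= r, x_prev > r):
            xp, xc = x_prev[regime], x_curr[regime]
            denom = np.sum(xp**2)
            # closed-form CLS slope for this regime, lambda_j fixed
            a = np.sum(xc * xp - lam * xp) / denom if denom > 0 else 0.0
            total += np.sum((xc - a * xp - lam)**2)
        return total
    errors = [sse(r) for r in r_grid]
    return r_grid[int(np.argmin(errors))]
```

On data generated with a sharp change of slope at the true threshold, the squared-error curve is minimized exactly there, which mirrors the behavior of $J_N(r, \lambda)$ in Figure 1.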

Inspired by the conditional least squares method, we also estimate $r$ using the quasi-likelihood principle. The modified quasi-likelihood estimator $\hat{r}_{MQL}$ of $r$ is obtained by maximizing the expression

$$\tilde{J}_N(r, \lambda) = \tilde{S}_N - \tilde{S}_N(r, \lambda),$$

which yields

$$\hat{r}_{MQL} = \arg\max_{r \in [\underline{r},\, \bar{r}]} \tilde{J}_N(r, \lambda), \tag{10}$$

where

$$\tilde{S}_N(r, \lambda) = \sum_{s=0}^{N-1} \sum_{j=1}^{T} V_{\hat{\theta}_j}^{-1}\left( X_{j+sT} - \sum_{k=1}^{2} \frac{\sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1}\left( X_{j+sT} X_{j+sT-1} I_{j+sT-1}^{(k)} - \lambda_j X_{j+sT-1} I_{j+sT-1}^{(k)} \right)}{\sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1}^2 I_{j+sT-1}^{(k)}}\, X_{j+sT-1} I_{j+sT-1}^{(k)} - \lambda_j \right)^2,$$

and

$$\tilde{S}_N = \sum_{s=0}^{N-1} \sum_{j=1}^{T} V_{\hat{\theta}_j}^{-1}\left( X_{j+sT} - \frac{\sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1}\left( X_{j+sT} X_{j+sT-1} - \lambda_j X_{j+sT-1} \right)}{\sum_{s=0}^{N-1} V_{\hat{\theta}_j}^{-1} X_{j+sT-1}^2}\, X_{j+sT-1} - \lambda_j \right)^2.$$

It is worth mentioning that the unknown parameters $\lambda_j$, $j = 1, \ldots, T$, appear when we use (9) and (10) to estimate the threshold vector $r$. As argued in Li and Tong [27], Yang et al. [14], and Yang et al. [15], when $\lambda$ and $r$ are one-dimensional parameters, we can choose any positive number as the value of $\lambda$ without obtaining a wrong result for $\hat{r}$. Fortunately, we also find by simulation that the estimates of $r$ obtained by maximizing $\tilde{J}_N(r, \lambda)$ and $J_N(r, \lambda)$ do not depend on the value of $\lambda$. To give an intuitive impression of $\tilde{J}_N(r, \lambda)/N$, we generate a set of data from Model I (given in Section 4, i.e., $T = 2$, $N = 50$, $\beta = (0.2, 0.1, 3, 0.8, 0.1, 7)'$, $r = (8, 4)'$) and plot the shapes of $\tilde{J}_N(r, \lambda)/N$. From Figure 1, we can see that the shape of $\tilde{J}_N(r, \lambda)/N$ changes with the value of $\lambda$, but the maximum in each subfigure is attained at the true threshold vector $r = (8, 4)'$. In practice, we can choose the mean in each cycle of the sample for $\lambda_j$, $j = 1, 2, \ldots, T$.

Using the quasi-likelihood method to estimate the thresholds is thus a three-step estimation procedure, which we implement as follows:

Step 1: *Choose the upper bound $\bar{r}$ and lower bound $\underline{r}$ of $r$, and solve (9) to get $\hat{r}_{CLS}$ with $\lambda_j = \bar{X}_j = \frac{1}{N}\sum_{s=0}^{N-1} X_{j+sT}$, $j = 1, 2, \ldots, T$;*

Step 2: *Fix $\hat{r}_{CLS}$ at the current value and solve (6) or (7) to get $\hat{\sigma}_{z,j}^2$, $j = 1, 2, \ldots, T$, where $\alpha_j^{(k)}$ and $\lambda_j$ with $k = 1, 2$ can be estimated by other methods; then solve (5) to get $\hat{\beta}_{MQL}$;*

Step 3: *Fix $\hat{\theta}_j = \left(\hat{\alpha}_{j,MQL}^{(1)}\left(1 - \hat{\alpha}_{j,MQL}^{(1)}\right), \hat{\alpha}_{j,MQL}^{(2)}\left(1 - \hat{\alpha}_{j,MQL}^{(2)}\right), \hat{\sigma}_{z,j}^2\right)'$, $j = 1, 2, \ldots, T$, at its estimated value from Step 2, choose the same upper bound $\bar{r}$ and lower bound $\underline{r}$ as in Step 1, and solve (10) to get $\hat{r}_{MQL}$.*

**Figure 1.** The shapes of $\tilde{J}_N(r, \lambda)/N$.

#### **4. Simulation Study**

In this section, we conduct simulation studies to illustrate the finite-sample performance of the estimates. The initial value $X_0$ is fixed at 0. In order to illustrate the characteristics of data from the PSETINAR(2; 1, 1)<sub>*T*</sub> process, we first generate a set of data with the distribution of the innovations $\{Z_t\}$ given by Model I (defined below in this section) and parameters $\beta = (0.2, 0.45, 1, 0.2, 0.45, 2, 0.65, 0.45, 1, 0.65, 0.45, 2, 0.2, 0.45, 3, 0.2, 0.45, 7, 0.8, 0.45, 7, 0.2, 0.1, 3, 0.8, 0.1, 7, 0.2, 0.1, 7, 0.8, 0.45, 2)'$, $r = (3, 3, 3, 1, 3, 3, 5, 9, 3, 6, 7)'$, $T = 11$, $N = 50$. The parameter vectors chosen here are randomly selected, with slight differences between the parameters of each cycle; the threshold vector $r$ was chosen such that there are enough data in each regime. We plot the sample path of the first six cycles ($N = 6$) in Figure 2. We can see that even though there are slight differences between the parameters of each cycle, the dataset still exhibits periodic characteristics.

To report the performances of the estimates, we conduct simulation studies under the following three models:

Model I. Assume that $\{Z_t\}$ is a sequence of i.i.d. periodic Poisson-distributed random variables with $E(Z_t) = Var(Z_t) = \lambda_j$ for $t = j + sT$, $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$.

Model II. Assume that $\{Z_t\}$ is a sequence of i.i.d. periodic geometrically distributed random variables with p.m.f. given by

$$p(Z_{j+sT} = z) = \frac{\lambda_j^z}{\left(1 + \lambda_j\right)^{1+z}},\quad z = 0, 1, 2, \ldots,$$

with $E(Z_t) = \lambda_j$, $Var(Z_t) = \lambda_j\left(1 + \lambda_j\right)$ for $t = j + sT$, $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$.

Model III. Assume that $\{Z_t\}$ is a sequence of i.i.d. mixed-distributed random variables,

$$Z_t = \Delta_t Z_{1t} + (1 - \Delta_t) Z_{2t},$$

where $\{\Delta_t\}$ is a sequence of i.i.d. periodic Bernoulli-distributed random variables with $P(\Delta_t = 1) = 1 - P(\Delta_t = 0) = \rho_j$, $\rho = (\rho_1, \rho_2, \ldots, \rho_T)'$ for $t = j + sT$, $j = 1, 2, \ldots, T$, $s \in \mathbb{N}_0$, which is independent of $\{Z_{it}\}$, $i = 1, 2$.

For $\{Z_{1t}\}$ given in Model I and $\{Z_{2t}\}$ given in Model II, we can easily see that $E(Z_t) = \lambda_j$, $Var(Z_t) = \lambda_j^2\left(1 - \rho_j\right) + \lambda_j$.
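For the simulations it is convenient to have a single sampler covering Models I–III. The sketch below (a helper of our own) draws season-$j$ innovations; note that NumPy's geometric sampler counts trials rather than failures, hence the shift by one to match the p.m.f. of Model II:

```python
import numpy as np

def innovations(model, lam_j, rho_j=0.5, size=1, rng=None):
    """Draw season-j innovations for Models I-III.

    Model 1: Poisson(lam_j).
    Model 2: geometric with mean lam_j, success prob 1/(1+lam_j),
             shifted to count failures so the support is {0, 1, 2, ...}.
    Model 3: Bernoulli(rho_j) mixture of the two, with mean lam_j and
             variance lam_j + (1 - rho_j) * lam_j**2.
    """
    rng = np.random.default_rng(rng)
    if model == 1:
        return rng.poisson(lam_j, size)
    if model == 2:
        # numpy's geometric counts trials until success; subtract 1
        return rng.geometric(1.0 / (1.0 + lam_j), size) - 1
    delta = rng.binomial(1, rho_j, size)
    return np.where(delta == 1,
                    rng.poisson(lam_j, size),
                    rng.geometric(1.0 / (1.0 + lam_j), size) - 1)
```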

For each model, we generate the data with $X_0 = 0$, set $T = 3$, and use sample sizes $n = NT = 150, 300, 900$. All calculations are performed in R 3.6.2 with 1000 replications. We use the function *constrOptim* to optimize the objective function of the maximum likelihood estimation. The threshold vector is calculated by the algorithms discussed in Section 3.2. The other estimators are based on the explicit expressions.

**Figure 2.** Sample path of the first six cycles.

#### *4.1. Performances of $\hat{\beta}_{CLS}$, $\hat{\beta}_{MQL}$ and $\hat{\beta}_{CML}$*

Pereira et al. [25] provided a theoretical basis for the conditional least squares (CLS) and conditional maximum likelihood (CML) estimators of the parameter vector $\beta$ in the PSETINAR(2; 1, 1)<sub>*T*</sub> process but did not conduct simulation research. Manaa and Bentarzi [26] provided the asymptotic properties of the estimators and compared their performance through a simulation study. To compare the performance of the three estimators $\hat{\beta}_{CLS}$, $\hat{\beta}_{CML}$ and $\hat{\beta}_{MQL}$ (given in Section 3), we conduct simulation studies under Models I to III. The parameters are selected as follows:

Series A. $\beta = (0.2, 0.45, 1, 0.2, 0.45, 2, 0.8, 0.45, 2)'$, $r = (3, 2, 2)'$.

Series B. $\beta = (0.65, 0.45, 1, 0.65, 0.45, 2, 0.35, 0.45, 2)'$, $r = (2, 2, 3)'$.

Series C. $\beta = (0.2, 0.45, 3, 0.2, 0.45, 7, 0.8, 0.45, 7)'$, $r = (12, 7, 9)'$.

To separate the influence of individual parameter changes on the estimates, the series are chosen so that between series either the coefficients $\alpha^{(k)}$, $k = 1, 2$, or the means $\lambda$ change while the others remain fixed. The selected thresholds ensure there are enough data in each regime.

Spectral analysis begins with the detection of hidden periodicities and is an important subject of frequency-domain time series analysis. The standard frequency-domain approach for detecting hidden periods is the periodogram method, proposed by Schuster [34]; a rigorous treatment is given in Fisher [35]. For a series of observations {*Xt*}, *t* = 1, 2, . . . , *n*, the periodogram is defined as

$$I_n(f_k) = \frac{1}{n}\left|\sum_{t=1}^n X_t e^{-i2\pi f_k t}\right|^2 = a_k^2 + b_k^2, \tag{11}$$

where

$$a_k = \begin{cases} \frac{1}{\sqrt{n}} \sum_{t=1}^n X_t \cos(2\pi f_k t), & k = 1, 2, \dots, \left[\frac{n-1}{2}\right], \\[4pt] \frac{1}{\sqrt{n}} \sum_{t=1}^n (-1)^t X_t, & k = \frac{n}{2}, \end{cases}$$

$$b_k = \begin{cases} \frac{1}{\sqrt{n}} \sum_{t=1}^n X_t \sin(2\pi f_k t), & k = 1, 2, \dots, \left[\frac{n-1}{2}\right], \\[4pt] 0, & k = \frac{n}{2}, \end{cases}$$

and the period $T = \left[1/\arg\max_{f_k} I_n(f_k)\right]$, where $[\cdot]$ denotes the integer part of a number.
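The periodogram in (11) and the period rule above can be sketched as follows; `periodogram_period` is a hypothetical helper name, and the final division is rounded to the nearest integer to avoid floating-point edge cases in the integer-part rule.

```python
import numpy as np

def periodogram_period(x):
    """Estimate a hidden period via the Schuster periodogram:
    I_n(f_k) = |sum_t x_t exp(-i 2 pi f_k t)|^2 / n at Fourier
    frequencies f_k = k/n, returning T = [1 / argmax_f I_n(f_k)]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1)
    ks = np.arange(1, n // 2 + 1)
    I = np.array([abs(np.sum(x * np.exp(-2j * np.pi * (k / n) * t))) ** 2 / n
                  for k in ks])
    f_max = ks[np.argmax(I)] / n
    return int(round(1.0 / f_max))   # rounded to nearest integer

# A noisy series with period 12 (illustrative data, not from the paper)
rng = np.random.default_rng(3)
y = 5 + 3 * np.cos(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 0.5, 120)
print(periodogram_period(y))   # 12
```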

The sample paths and periodograms of Series A, B and C under Model I are plotted in Figure 3 to show the periodic characteristics. Because the period, three, is short, it is difficult to see it from the sample path, but the periodogram reveals it clearly. The simulation results are summarized in Tables 1–9.

As expected, the biases and MSEs of the estimators decrease as the sample size *N* increases, in agreement with the asymptotic unbiasedness and consistency of the estimators. Most of the biases and MSEs in Model II are larger than those in Model I, presumably because the variance of {*Zt*} in Model II is larger, which makes the data more volatile.

Tables 1–6 summarize the simulation results for the different series under Models I and II. From these tables, we can see that most of the biases and MSEs of *β***ˆ** *MQL* are smaller than those of *β***ˆ** *CLS*, perhaps because the MQL method uses more information about the data than the CLS method and can therefore locate the optimum more accurately. In addition, most of the biases of *β***ˆ** *MQL* are smaller than those of *β***ˆ** *CML*, while the MSEs are larger, because CML exploits the full distribution: when the distribution is correctly specified, CML is indeed better than MQL. It is worth mentioning that the CML method is more complicated and time-consuming than the MQL method in the simulation procedure. We conclude that the MQL estimators are better than the CLS estimators, while the CML estimators are not uniformly better than the MQL estimators.

**Table 1.** Bias and MSE for Series A of Model I (MSE in parentheses): CLS, MQL and CML.



**Table 2.** Bias and MSE for Series B of Model I (MSE in parentheses): CLS, MQL and CML.

**Table 3.** Bias and MSE for Series C of Model I (MSE in parentheses): CLS, MQL and CML.


**Table 4.** Bias and MSE for Series A of Model II (MSE in parentheses): CLS, MQL and CML.




**Table 5.** Bias and MSE for Series B of Model II (MSE in parentheses): CLS, MQL and CML.


**Table 6.** Bias and MSE for Series C of Model II (MSE in parentheses): CLS, MQL and CML.


To demonstrate the robustness of the MQL method, we run simulations for Model III with the different series using the CLS, MQL and CML methods, setting *N* = 300 and *ρ* = (0.9, 0.9, 0.9) and (0.8, 0.8, 0.8), respectively. From Tables 7–9, we can see that when *ρ* decreases from (0.9, 0.9, 0.9) to (0.8, 0.8, 0.8), the effect on the CLS and MQL estimators is slight, and most of the biases and MSEs of the MQL estimators remain smaller than those of CLS. However, because an incorrect distribution is used, the biases and MSEs of the CML estimators increase. This indicates that the MQL method is more robust than the CLS and CML methods.


**Table 7.** Bias and MSE for Series A of Model III with *N* = 300 (MSE in parentheses).

**Table 8.** Bias and MSE for Series B of Model III with *N* = 300 (MSE in parentheses).


**Table 9.** Bias and MSE for Series C of Model III with *N* = 300 (MSE in parentheses).


**Figure 3.** The sample path and periodogram of Series A (top), B (middle) and C (bottom) in Model I.

#### *4.2. Performances of r̂_MQL and r̂_CLS*

As discussed in Section 3.2, we estimate the threshold vector using the conditional least squares and modified quasi-likelihood methods. The performances of *r*ˆ*MQL* and *r*ˆ*CLS* are compared in this subsection through simulation studies. The simulation results in Section 4.1 show that the contaminated data generated from Model III have little influence on the least squares and quasi-likelihood estimators, so we only simulate threshold estimation for the different series under Models I and II. We assess the performance of *r* by the bias, MSE and bias median, where the bias median is defined by:

$$\text{Bias median} = \operatorname*{median}_{i \in \{1, 2, \dots, K\}} \left(\hat{r}_{ij} - r_{0j}\right), \quad j = 1, 2, \dots, T,$$

where *r*ˆ*ij* is the *i*-th estimate of *r*0*j*, *r*0*<sup>j</sup>* is the true value, *j* = 1, 2, ... , *T*, and *K* is the number of replications. The simulation results are summarized in Tables 10–15.
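The bias median is straightforward to compute; a short sketch with illustrative numbers (not the paper's results):

```python
import numpy as np

def bias_median(estimates, true_value):
    """Bias median of threshold estimates over K replications:
    median_i (r_hat_ij - r_0j), a robust summary for integer-valued thresholds."""
    return float(np.median(np.asarray(estimates) - true_value))

# Five hypothetical replications of a threshold estimate with true value 2
print(bias_median([2, 2, 3, 2, 4], true_value=2))   # 0.0
```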


**Table 10.** Bias, bias median and MSE for Series A of Model I.

**Table 11.** Bias, bias median and MSE for Series B of Model I.



**Table 12.** Bias, bias median and MSE for Series C of Model I.

**Table 13.** Bias, bias median and MSE for Series A of Model II.


**Table 14.** Bias, bias median and MSE for Series B of Model II.



**Table 15.** Bias, bias median and MSE for Series C of Model II.

From Tables 10–15, we can see that all estimators perform better as the sample size *N* increases, which implies that they are consistent. The results in Tables 10–12 show smaller biases, bias medians and MSEs than those in Tables 13–15, which might be because the variance of Model II is larger than that of Model I for each series. Moreover, almost all the biases, bias medians and MSEs of the MQL estimators are smaller than those of the CLS estimators, and the MSEs of some MQL estimators are even half those of CLS. Because the thresholds are integer-valued, the bias median is a more reasonable measure of estimation accuracy. We conclude that the MQL method is much better than CLS for estimating the thresholds.

In the simulations above, we generate the data with *X*<sup>0</sup> = 0; however, 0 is not the mean of the process, so we also generate a longer series, discard the initial portion, and use the remaining data for inference ("burn-in" samples). Here, we generate a series of length 1800. We run the simulations for Series A under Model I, Model II and Model III (*ρ* = (0.8, 0.8, 0.8)); the other settings are the same as before. The results are listed in Tables 16–20. From these tables, we can see that with "burn-in" samples, the estimates are similar to those obtained with initial value 0, which indicates that the initial value does not affect our results.

**Table 16.** Bias and MSE for Series A of Model I with "burn in" samples (MSE in parentheses): CLS, MQL and CML.



**Table 17.** Bias and MSE for Series A of Model II with "burn in" samples (MSE in parentheses): CLS, MQL and CML.

**Table 18.** Bias and MSE for Series A of Model III (*ρ* = (0.8, 0.8, 0.8)) with "burn in" samples (MSE in parentheses): CLS, MQL and CML.



**Table 19.** Bias, bias median and MSE for Series A of Model I with "burn in" samples.

**Table 20.** Bias, bias median and MSE for Series A of Model II with "burn in" samples.


#### **5. Real Data Example**

In this section, we use the PSETINAR(2; 1, 1)*<sup>T</sup>* process to fit the series of monthly counts of claimants collecting short-term disability benefits. In the dataset, all the claimants are male, have cuts, lacerations or punctures, are between the ages of 35 and 54, work in the logging industry, and collect benefits from the Workers' Compensation Board (WCB) of British Columbia. The dataset consists of 120 observations, from 1985 to 1994 (Freeland [36]). The computations were performed in R 3.6.2. The threshold vector was calculated by the algorithms described in Section 3.2 (the three-step NeSS algorithm combined with the quasi-likelihood principle, and the NeSS algorithm combined with the least squares principle). We use the command *constrOptim* to optimize the objective function of the maximum likelihood estimation. Figure 4 shows the sample path, ACF and PACF plots of the observations. It can be seen from Figure 4 that this dataset is a dependent counting time series with a periodic characteristic.


**Figure 4.** The sample path plot (**a**), ACF and PACF plots (**b**,**c**) for the counts of claimants.

We use the periodogram method to determine the period of this dataset; Figure 5 shows that *In*(*fk*) reaches its maximum at *fk* = 1/12, from which we conclude that *T* = 12. This confirms the periodic characteristic of the data, which exhibits a yearly pattern.

**Figure 5.** The periodogram plot for the monthly counts of claimants.

Table 21 displays the descriptive statistics for the monthly counts of claimants collecting short-term disability benefits from the WCB. From Table 21, we can see that the mean and variance are approximately equal in some months, so we may assume that the distribution of the innovations is periodic Poisson. However, some months and the data as a whole indicate overdispersion. Moreover, the dataset contains no zeros and its minimum value is one. This leads us to consider the periodic Poisson, periodic geometric, zero-truncated periodic Poisson and zero-truncated periodic geometric distributions for the innovations, respectively. Before the model fitting, we first estimate the threshold vector: *r*ˆ*CLS* is calculated by (9) and *r*ˆ*MQL* is calculated through (10) by the three-step algorithm. Table 22 summarizes the resulting *r*ˆ*CLS* and *r*ˆ*MQL*. Because the dataset is small, to obtain a better fit, when the number of observations in a regime is smaller than two, or when the estimated threshold equals the maximum or minimum of its admissible range, we conclude that the corresponding month does not exhibit a piecewise phenomenon; this is the case for March, July and August.

**Table 21.** Summary statistics for the monthly counts of claimants.


**Table 22.** Threshold estimators for the monthly counts of claimants.


To assess the benefit of capturing the piecewise phenomenon of this time series, we fit both the PINAR(1)*<sup>T</sup>* and PSETINAR(2; 1, 1)*<sup>T</sup>* models with period *T* = 12 to the dataset. The PINAR(1) process, proposed by Monteiro et al. [22], is defined by the recursive equation

$$X_t = \alpha_t \circ X_{t-1} + Z_t, \tag{12}$$

with $\alpha_t = \alpha_j \in (0, 1)$ for $t = j + sT$ ($j = 1, \ldots, T$, $s \in \mathbb{N}_0$); the thinning operator "◦" and the innovation process $\{Z_t\}$ are defined as in the PSETINAR(2; 1, 1)$_T$ process.
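Recursion (12) can be simulated directly. The sketch below assumes binomial thinning and periodic Poisson innovations (one of the four innovation laws considered for this dataset); `sim_pinar` and its parameter values are illustrative, not code or estimates from the paper.

```python
import numpy as np

def sim_pinar(alpha, lam, n, x0=0, seed=0):
    """Simulate the PINAR(1)_T recursion X_t = a_t o X_{t-1} + Z_t with
    binomial thinning and periodic Poisson innovations (assumed innovation
    law; the paper also considers geometric and zero-truncated variants)."""
    rng = np.random.default_rng(seed)
    T = len(alpha)
    x = np.empty(n + 1, dtype=int)
    x[0] = x0
    for t in range(1, n + 1):
        j = (t - 1) % T                       # season index: t = j + sT
        x[t] = rng.binomial(x[t - 1], alpha[j]) + rng.poisson(lam[j])
    return x[1:]

x = sim_pinar(alpha=[0.2, 0.5, 0.8], lam=[1.0, 2.0, 2.0], n=300)
print(x.mean())
```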

It is worth mentioning that for this dataset the conditional least squares and quasi-likelihood methods produce non-admissible estimates for some months, so we use the conditional maximum likelihood approach to estimate the parameters. We then fit the PSETINAR(2; 1, 1)<sup>12</sup> and PINAR(1)<sup>12</sup> models in combination with the four innovation distributions mentioned above, with the threshold vectors based on *r*ˆ*MQL*. The AIC and BIC are listed in Table 23; smaller values indicate a better fit. From the results in Table 23, we conclude that the PSETINAR(2; 1, 1)<sup>12</sup> model with zero-truncated periodic Poisson innovations is the most suitable. The conditional maximum likelihood estimates are listed in Table 24. Some of the estimates in Table 24, for example *α*(2) for January, May, June, September, October and November, are not statistically significant, suggesting that in those months the number of claims is mainly driven by the innovation process.


**Table 23.** The AIC and BIC of the claims data.

**Table 24.** CML estimators in the dataset.


Remark: "-" stands for not available.

To check the predictability of the PSETINAR(2; 1, 1)*<sup>T</sup>* model, we carry out *h*-step-ahead forecasting for varying *h*. The *h*-step-ahead conditional expectation point predictor of the PSETINAR(2; 1, 1)*<sup>T</sup>* model is given by

$$\hat{X}_{j+sT+h} = E\left[X_{j+sT+h} \mid X_{j+sT}\right], \ h = 1, 2, \dots.$$

Specifically, the one-step-ahead conditional expectation point predictor is given by

$$\hat{X}_{j+sT+1} = E\left[X_{j+sT+1} \mid X_{j+sT}\right] = \alpha_{j+1}^{(1)} X_{j+sT} I_{j+sT}^{(1)} + \alpha_{j+1}^{(2)} X_{j+sT} I_{j+sT}^{(2)} + \lambda_{j+1}.$$

However, the conditional expectation will seldom produce integer-valued forecasts. Recently, coherent forecasting techniques have been recommended, which only produce forecasts in N0. This is achieved by computing the *h*-step-ahead forecasting conditional distribution. As pointed out by Möller et al. [37], forecasts are then easily obtained as the median or mode of the forecasting distribution. Li et al. [38] and Kang et al. [8] have applied this method to forecast integer-valued processes, and Homburg et al. [39] discussed and compared prediction methods based on conditional distributions and Gaussian approximations for several integer-valued processes. For the PSETINAR(2; 1, 1)*<sup>T</sup>* process, the one-step-ahead conditional distribution of *Xj*+*sT*+<sup>1</sup> given *Xj*+*sT* is given by

$$\begin{split} &P\left(X_{j+sT+1} = x_{j+sT+1} \mid X_{j+sT} = x_{j+sT}\right) \\ &= \sum_{i=0}^{\min\left\{x_{j+sT},\, x_{j+sT+1}\right\}} \sum_{k=1}^{2} \binom{x_{j+sT}}{i} \left(\alpha_{j+1}^{(k)}\right)^{i} \left(1 - \alpha_{j+1}^{(k)}\right)^{x_{j+sT}-i} I_{j+sT}^{(k)}\, P\left(Z_{j+sT+1} = x_{j+sT+1} - i\right). \end{split}$$
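This one-step conditional pmf is a finite convolution and is cheap to evaluate. The sketch below assumes Poisson innovations and collapses the regime indicators into an if-branch; the helper name and argument order are our own conventions.

```python
import math

def one_step_pmf(x_next, x_prev, a1, a2, lam, r):
    """One-step-ahead conditional pmf of the PSETINAR(2;1,1)_T process,
    P(X_{t+1} = x_next | X_t = x_prev): a binomial-thinning survivor count
    convolved with the innovation pmf (Poisson(lam) assumed here)."""
    a = a1 if x_prev <= r else a2           # regime selected by threshold r
    p = 0.0
    for i in range(min(x_prev, x_next) + 1):
        surv = math.comb(x_prev, i) * a**i * (1 - a)**(x_prev - i)
        innov = math.exp(-lam) * lam**(x_next - i) / math.factorial(x_next - i)
        p += surv * innov
    return p

# The pmf sums to one over the (effectively finite) support
total = sum(one_step_pmf(x, 5, 0.2, 0.8, 2.0, r=3) for x in range(60))
print(round(total, 6))   # 1.0
```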

Due to the existence of the threshold, when we use the conditional expectation method to predict *Xj*<sup>+</sup>*sT*+*h*, *h* > 1, we must first predict *Xj*<sup>+</sup>*sT*+*h*−<sup>1</sup> and compare it with the corresponding threshold before making the next prediction; we do the same for the conditional distribution method. (To prevent confusion, we call this method the point-wise conditional distribution forecast. Predictors based entirely on the *h*-step-ahead conditional distribution, without intermediate predictions, are discussed later.) The mode of the *h*-step-ahead point-wise conditional distribution can be viewed as a point prediction. To compare the two forecasting methods, we adopt a standard descriptive measure of forecasting accuracy, the *h*-step-ahead predicted root mean squared error (PRMSE), given by

$$PRMSE = \sqrt{\frac{1}{K - K_0} \sum_{t = K_0 + 1}^{K} \left( X_{t+h} - \hat{X}_{t+h} \right)^2}, \ h = 1, 2, \dots,$$

where *K* is the full sample size; we split the data into two parts and use the last *K* − *K*<sup>0</sup> observations as the forecasting evaluation sample. We forecast the values of the last year for *h* = 1, 2, 3, 12.
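PRMSE itself is a one-liner; the sketch below uses illustrative numbers rather than the claims data.

```python
import numpy as np

def prmse(actual, predicted):
    """Predicted root mean squared error over the evaluation sample:
    sqrt(mean((X - X_hat)^2)) for the held-out observations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

print(prmse([10, 7, 9, 12], [8, 7, 10, 9]))   # sqrt((4+0+1+9)/4) ~ 1.871
```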

The PRMSEs of the *h*-step-ahead point predictors are listed in Table 25. For the conditional expectation point predictors, the PRMSEs of PSETINAR(2; 1, 1)<sup>12</sup> with zero-truncated periodic Poisson innovations are smaller than those of PINAR(1)<sup>12</sup> with periodic Poisson and zero-truncated periodic Poisson innovations, which further shows the superiority of our model. The PRMSEs of the one-step-ahead point predictors are smaller than the others; this is natural because the value of the previous moment is used as the explanatory variable. For PSETINAR(2; 1, 1)<sup>12</sup> with zero-truncated periodic Poisson innovations, the PRMSEs of the twelve-step-ahead predictors are smaller than those of the other *h*-step-ahead predictors except one-step-ahead, which may be because the period is 12. The PRMSE of the one-step-ahead conditional expectation point predictor is smaller than that of the point-wise conditional distribution point predictor; thus, the former method is better for this dataset.

The PRMSEs of the one-step-ahead fitted series calculated by conditional expectation and conditional distribution are 2.434 and 3.565, respectively. This further illustrates that, for our dataset, one-step-ahead forecasting by conditional expectation is better than by conditional distribution. The original data and the fitted series (calculated by the one-step-ahead conditional expectation based on the observations of the previous moments) under the PSETINAR(2; 1, 1)<sup>12</sup> model with zero-truncated periodic Poisson innovations are plotted in Figure 6. The fitted trend is similar to the original data. Except at points with large values (where unexpected predictions may be due to wrong judgement of the regime), the model fits the data well.

**Figure 6.** Plot of fitted curves of the claims data.


**Table 25.** PRMSE of the *h*-step-ahead point predictors.

In fact, we can obtain the *h*-step-ahead conditional distribution; here we give the two-step-ahead and three-step-ahead conditional distributions as examples,

$$\begin{aligned} &P\left(X_{j+sT+2} = x_{j+sT+2} \mid X_{j+sT} = x_{j+sT}\right) \\ &= \sum_{m=0}^{n} P\left(X_{j+sT+1} = m \mid X_{j+sT} = x_{j+sT}\right) P\left(X_{j+sT+2} = x_{j+sT+2} \mid X_{j+sT+1} = m\right), \end{aligned}$$

and

$$\begin{aligned} &P\left(X_{j+sT+3} = x_{j+sT+3} \mid X_{j+sT} = x_{j+sT}\right) \\ &= \sum_{m=0}^{n} P\left(X_{j+sT+2} = m \mid X_{j+sT} = x_{j+sT}\right) P\left(X_{j+sT+3} = x_{j+sT+3} \mid X_{j+sT+2} = m\right), \end{aligned}$$

where *m* ∈ {0, 1, ... , *n*} ranges over the possible values of *Xj*+*sT*, *j* = 1, ... , *T*, *s* ∈ N0. For *h* = 1, 2, 3, the *h*-step-ahead conditional distributions are plotted in Figure 7, conditional on the counts of claimants in December 1993 and February 1994, respectively. The mode of the *h*-step-ahead conditional distribution can be viewed as a point prediction. The PRMSEs of the two-step-ahead and three-step-ahead point predictors for the last year are 3.227 and 3.215, respectively, both larger than those of the point-wise conditional distribution method described before. For other datasets or models, the *h*-step-ahead forecasting conditional distribution may show some advantages; we do not go into details here.
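A practical way to obtain these *h*-step distributions is to truncate the state space and iterate the one-step transition kernel, exactly as in the Chapman–Kolmogorov sums above. The sketch below keeps the Poisson-innovation assumption; the truncation level `m` and all parameter values are illustrative.

```python
import math
import numpy as np

def transition_matrix(a1, a2, lam, r, m):
    """Truncated one-step transition matrix P[x, y] = P(X_{t+1}=y | X_t=x)
    on states {0, ..., m-1} (Poisson(lam) innovations assumed); the h-step
    conditional distribution is then a row of the h-th matrix power."""
    P = np.zeros((m, m))
    for x in range(m):
        a = a1 if x <= r else a2             # regime by threshold r
        for y in range(m):
            s = 0.0
            for i in range(min(x, y) + 1):   # survivors of the thinning
                s += (math.comb(x, i) * a**i * (1 - a)**(x - i)
                      * math.exp(-lam) * lam**(y - i) / math.factorial(y - i))
            P[x, y] = s
    return P

P = transition_matrix(0.2, 0.8, 2.0, r=3, m=40)
dist2 = np.linalg.matrix_power(P, 2)[5]      # two-step distribution given X_t = 5
print(dist2.sum())                           # ~1 up to truncation error
print(int(dist2.argmax()))                   # the two-step mode given X_t = 5
```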

**Figure 7.** The *h*-step-ahead forecasting conditional distribution for the counts of claimants: (**a**–**c**) conditional on the count of claimants in December 1993; (**d**–**f**) conditional on the count of claimants in February 1994.

#### **6. Conclusions**

This paper extended the PSETINAR(2; 1, 1)*<sup>T</sup>* process proposed by Pereira et al. [25] by removing the assumption that {*Zt*} is Poisson and considering the process under the weaker condition that the second moment of {*Zt*} is finite. The ergodicity of the process was established, and MQL estimators of the model parameter vector *β* as well as MQL and CLS estimators of the threshold vector *r* were obtained. Moreover, simulations demonstrated the advantages of the quasi-likelihood method in comparison with the conditional maximum likelihood and conditional least squares methods. An application to a real dataset was presented, and the forecasting problem for this dataset was addressed.

In this paper, we only discuss the PSETINAR(2; 1, 1)*<sup>T</sup>* process for univariate time series. Hence, an extension to a multivariate PSETINAR(2; 1, 1)*<sup>T</sup>* process with a diagonal or cross-correlation autoregressive matrix is a topic for future investigation. Beyond this extension, there are a number of other interesting problems in this area. For example, even a simple periodic model can have an inordinately large number of parameters; this is true for PSETINAR(2; 1, 1)*<sup>T</sup>* models and even more so for multi-period models. Therefore, the development of dimensionality-reduction procedures to overcome the computational difficulties is a pressing problem and remains a topic for future research.

**Author Contributions:** Conceptualization, C.L. and D.W.; methodology, C.L.; software, C.L.; validation, C.L., J.C. and D.W.; formal analysis, C.L.; investigation, C.L. and D.W.; resources, C.L. and D.W.; data curation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, J.C. and D.W.; visualization, C.L.; supervision, J.C. and D.W.; project administration, D.W.; funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (No. 11871028, 11731015, 11901053, 12001229), and the Natural Science Foundation of Jilin Province (No. 20180101216JC).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset is available in the book Freeland [36].

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Proof of Theorem 1.** According to Theorem 2 of Tweedie [40] (see also Zheng and Basawa [41]), for the process defined by (2) and for all $j = 1, 2, \ldots, T$, $s \in \mathbb{Z}$, we have

$$E\left(X_{j+sT} \mid X_{j+sT-1} = x\right) = \alpha_j^{(1)} x I_{j+sT-1}^{(1)} + \alpha_j^{(2)} x I_{j+sT-1}^{(2)} + \lambda_j \le \alpha_{j,\max}\, x + \lambda_j,$$

where $\alpha_{j,\max} = \max\{\alpha_j^{(1)}, \alpha_j^{(2)}\} < 1$. Let $K = \left[\frac{1+\lambda_j}{1-\alpha_{j,\max}}\right] + 1$, where $[\cdot]$ denotes the integer part of a number. Then for $x \ge K$, we have

$$E\left(X\_{j+sT}|X\_{j+sT-1}=\mathfrak{x}\right)\leq \mathfrak{x}-1,$$

and for *x* < *K*,

$$E\left(X\_{j+sT}|X\_{j+sT-1}=\mathbf{x}\right) \le \mathfrak{a}\_{j,\max}\mathfrak{x} + \lambda\_j \le K + \lambda\_j < \infty.$$

Therefore, the process {*Xt*} for *t* = *j* + *sT* defined in (2) is an ergodic Markov chain.

**Proof of Proposition 2.** (i) From Proposition 1, we have

$$\begin{split} Var\left(X\_{j+sT}|X\_{j+sT-1}\right) &= E\left\{X\_{j+sT} - E\left(X\_{j+sT}|X\_{j+sT-1}\right)\right\}^2 \\ &= E\left(X\_{j+sT} - a\_j^{(1)}X\_{j+sT-1}I\_{j+sT-1}^{(1)} - a\_j^{(2)}X\_{j+sT-1}I\_{j+sT-1}^{(2)} - \lambda\_j\right)^2, \end{split}$$

and

$$\mathrm{Var}\left(X_{j+sT} \mid X_{j+sT-1}\right) = \sum_{k=1}^{2} \alpha_j^{(k)}\left(1 - \alpha_j^{(k)}\right) X_{j+sT-1} I_{j+sT-1}^{(k)} + \sigma_{z,j}^2,$$

with $k = 1, 2$, $j = 1, \ldots, T$, $s \in \mathbb{N}_0$, so by substituting suitable consistent estimators of $\alpha_j^{(k)}$ and $\lambda_j$, we obtain a consistent estimator of $\sigma^2_{z,j}$,

$$\begin{split} \hat{\sigma}_{1,z,j}^{2} &= \frac{1}{N} \sum_{s=0}^{N-1} \left( X_{j+sT} - \sum_{k=1}^{2} \hat{\alpha}_{j}^{(k)} X_{j+sT-1} I_{j+sT-1}^{(k)} - \hat{\lambda}_{j} \right)^{2} \\ &\quad - \frac{1}{N} \sum_{k=1}^{2} \sum_{s=0}^{N-1} \hat{\alpha}_{j}^{(k)} \left( 1 - \hat{\alpha}_{j}^{(k)} \right) X_{j+sT-1} I_{j+sT-1}^{(k)}. \end{split}$$

(ii) Moreover, from model (2), we have

$$\begin{split} \mathrm{Var}\left(X_{j+sT}\right) &= \sum_{k=1}^{2} \mathrm{Var}\left\{ \left(\alpha_{j}^{(k)} \circ X_{j+sT-1}\right) I_{j+sT-1}^{(k)} \right\} + \mathrm{Var}\left(Z_{j+sT}\right) \\ &\quad + 2\,\mathrm{Cov}\left\{ \left(\alpha_{j}^{(1)} \circ X_{j+sT-1}\right) I_{j+sT-1}^{(1)},\ \left(\alpha_{j}^{(2)} \circ X_{j+sT-1}\right) I_{j+sT-1}^{(2)} \right\}, \end{split}$$

where

$$\begin{split} &\mathrm{Var}\left\{ \left( \alpha_{j}^{(k)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(k)} \right\} \\ &= \mathrm{Var}\left\{ E\left[ \left( \alpha_{j}^{(k)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(k)} \mid X_{j+sT-1} \right] \right\} + E\left\{ \mathrm{Var}\left[ \left( \alpha_{j}^{(k)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(k)} \mid X_{j+sT-1} \right] \right\} \\ &= \left(\alpha_{j}^{(k)}\right)^{2} \mathrm{Var}\left( X_{j+sT-1} I_{j+sT-1}^{(k)} \right) + \alpha_{j}^{(k)} \left( 1 - \alpha_{j}^{(k)} \right) E\left( X_{j+sT-1} I_{j+sT-1}^{(k)} \right), \end{split}$$

and

$$\begin{split} &2\,\mathrm{Cov}\left\{ \left( \alpha_{j}^{(1)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(1)},\ \left( \alpha_{j}^{(2)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(2)} \right\} \\ &= -2 E\left\{ \left( \alpha_{j}^{(1)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(1)} \right\} E\left\{ \left( \alpha_{j}^{(2)} \circ X_{j+sT-1} \right) I_{j+sT-1}^{(2)} \right\} \\ &= -2 \alpha_{j}^{(1)} \alpha_{j}^{(2)} E\left\{ X_{j+sT-1} I_{j+sT-1}^{(1)} \right\} E\left\{ X_{j+sT-1} I_{j+sT-1}^{(2)} \right\}, \end{split}$$

since the product of the two indicators vanishes, $I_{j+sT-1}^{(1)} I_{j+sT-1}^{(2)} = 0$, so the cross-expectation term in the covariance is zero.

Note that

$$\begin{aligned} &E\left\{X\_{j+sT-1}I\_{j+sT-1}^{(1)}\right\} \\ &=E\left\{E\left[X\_{j+sT-1}I\_{j+sT-1}^{(1)}|I\_{j+sT-1}^{(1)}=1\right]\right\} \\ &=E\left\{X\_{j+sT-1}I\_{j+sT-1}^{(1)}|I\_{j+sT-1}^{(1)}=1\right\}P\left(I\_{j+sT-1}^{(1)}=1\right) \\ &=E\left\{X\_{j+sT-1}|X\_{j+sT-1}\le r\_{j}\right\}P\left(I\_{j+sT-1}^{(1)}=1\right). \end{aligned}$$

Let $p_j = P\left(I^{(1)}_{j+sT-1} = 1\right)$ with $j = 1, \ldots, T$, $s \in \mathbb{N}_0$; we can estimate it by $\hat{p}_j = \frac{1}{N}\sum_{s=0}^{N-1} I^{(1)}_{j+sT-1}$. Therefore, by substituting suitable consistent moment estimators of $\alpha_j^{(k)}$, we obtain the estimator $\hat{\sigma}^2_{2,z,j}$ in Proposition 2.

**Proof of Theorem 2.** Let $\mathcal{F}_{j+sT} = \sigma\left(X_0, X_1, \ldots, X_{j+sT}\right)$ with $j = 1, \ldots, T$, $s \in \mathbb{N}_0$. First, suppose $\theta$ is known. For the estimating equation

$$S_{N,j}^{(1)}(\theta_j, \beta_j) = \sum_{s=0}^{N-1} V_{\theta_j}^{-1}\left(X_{j+sT} - \alpha_j^{(1)} X_{j+sT-1} I_{j+sT-1}^{(1)} - \alpha_j^{(2)} X_{j+sT-1} I_{j+sT-1}^{(2)} - \lambda_j\right) X_{j+sT-1} I_{j+sT-1}^{(1)},$$

we have

$$\begin{split} &E\left[V_{\theta_j}^{-1}\left(X_{j+sT} - \alpha_j^{(1)} X_{j+sT-1} I_{j+sT-1}^{(1)} - \alpha_j^{(2)} X_{j+sT-1} I_{j+sT-1}^{(2)} - \lambda_j\right) X_{j+sT-1} I_{j+sT-1}^{(1)} \,\middle|\, \mathcal{F}_{j+sT-1}\right] \\ &= V_{\theta_j}^{-1} X_{j+sT-1} I_{j+sT-1}^{(1)}\, E\left[\left(X_{j+sT} - E\left(X_{j+sT} \mid X_{j+sT-1}\right)\right) \,\middle|\, \mathcal{F}_{j+sT-1}\right] \\ &= 0, \end{split}$$

and

$$\begin{split} &E\left[S_{s,j}^{(1)}(\theta_j, \beta_j) \,\middle|\, \mathcal{F}_{j+sT-1}\right] \\ &= E\left[V_{\theta_j}^{-1}\left(X_{j+sT} - \alpha_j^{(1)} X_{j+sT-1} I_{j+sT-1}^{(1)} - \alpha_j^{(2)} X_{j+sT-1} I_{j+sT-1}^{(2)} - \lambda_j\right) X_{j+sT-1} I_{j+sT-1}^{(1)} + S_{s-1,j}^{(1)}(\theta_j, \beta_j) \,\middle|\, \mathcal{F}_{j+sT-1}\right] \\ &= E\left[S_{s-1,j}^{(1)}(\theta_j, \beta_j) \,\middle|\, \mathcal{F}_{j+sT-1}\right] \\ &= S_{s-1,j}^{(1)}(\theta_j, \beta_j), \end{split}$$

so $\left\{S_{s,j}^{(1)}(\theta_j, \beta_j), \mathcal{F}_{j+sT},\ j = 1, 2, \ldots, T,\ s \in \mathbb{N}_0\right\}$ is a martingale. By Theorem 1.1 of Billingsley [42], we have

$$\begin{split} &\frac{1}{N} \sum_{s=0}^{N-1} V_{\theta_j}^{-2}\left(X_{j+sT} \mid X_{j+sT-1}\right)\left(X_{j+sT} - \alpha_j^{(1)} X_{j+sT-1} I_{j+sT-1}^{(1)} - \alpha_j^{(2)} X_{j+sT-1} I_{j+sT-1}^{(2)} - \lambda_j\right)^2 X_{j+sT-1}^2 I_{j+sT-1}^{(1)} \\ &\xrightarrow{a.s.} E\left[V_{\theta_j}^{-2}\left(X_j \mid X_{j-1}\right)\left(X_j - \alpha_j^{(1)} X_{j-1} I_{j-1}^{(1)} - \alpha_j^{(2)} X_{j-1} I_{j-1}^{(2)} - \lambda_j\right)^2 X_{j-1}^2 I_{j-1}^{(1)}\right] \\ &= E\left\{E\left[V_{\theta_j}^{-2}\left(X_j \mid X_{j-1}\right)\left(X_j - \alpha_j^{(1)} X_{j-1} I_{j-1}^{(1)} - \alpha_j^{(2)} X_{j-1} I_{j-1}^{(2)} - \lambda_j\right)^2 X_{j-1}^2 I_{j-1}^{(1)} \,\middle|\, X_{j-1}\right]\right\} \\ &= E\left\{V_{\theta_j}^{-2}\left(X_j \mid X_{j-1}\right) X_{j-1}^2 I_{j-1}^{(1)}\, E\left[\left(X_j - \alpha_j^{(1)} X_{j-1} I_{j-1}^{(1)} - \alpha_j^{(2)} X_{j-1} I_{j-1}^{(2)} - \lambda_j\right)^2 \,\middle|\, X_{j-1}\right]\right\} \\ &= E\left\{V_{\theta_j}^{-1}\left(X_j \mid X_{j-1}\right) X_{j-1}^2 I_{j-1}^{(1)}\right\} \triangleq H_{j,11}(\theta_j). \end{split}$$

Thus, by the central limit theorem of martingale, we get

$$\frac{1}{\sqrt{N}}S^{(1)}\_{N,j}(\boldsymbol{\theta}\_{j\prime}\boldsymbol{\beta}\_{j}) \xrightarrow{L} \mathcal{N}\left(0, H\_{j,11}(\boldsymbol{\theta}\_{j})\right).$$

Similarly,

$$\frac{1}{\sqrt{N}}S^{(2)}\_{N,\boldsymbol{j}}(\boldsymbol{\theta}\_{\boldsymbol{j}},\boldsymbol{\beta}\_{\boldsymbol{j}}) \xrightarrow{L} \mathcal{N}\left(0, H\_{\boldsymbol{j},22}(\boldsymbol{\theta}\_{\boldsymbol{j}})\right).$$

$$\frac{1}{\sqrt{N}}S^{(3)}\_{N,\boldsymbol{j}}(\boldsymbol{\theta}\_{\boldsymbol{j}},\boldsymbol{\beta}\_{\boldsymbol{j}}) \xrightarrow{L} \mathcal{N}(0,H\_{\boldsymbol{j},33}(\boldsymbol{\theta}\_{\boldsymbol{j}})),$$

where

$$S\_{N,j}^{(2)}(\\\theta\_{\hat{\boldsymbol{\nu}}},\\\theta\_{\hat{\boldsymbol{\nu}}}) = \sum\_{s=0}^{N-1} V\_{\theta\_{\hat{\boldsymbol{\nu}}}}^{-1} \left( \mathbf{X}\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}} - a\_{\hat{\boldsymbol{\nu}}}^{(1)} \mathbf{X}\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1} I\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1}^{(1)} - a\_{\hat{\boldsymbol{\nu}}}^{(2)} \mathbf{X}\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1} I\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1}^{(2)} - \lambda\_{\hat{\boldsymbol{\nu}}} \right) \mathbf{X}\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1} I\_{\hat{\boldsymbol{\nu}}+s\boldsymbol{T}-1}^{(2)}$$
 
$$\text{and}$$

$$S\_{N,j}^{(3)}(\\\theta\_{\boldsymbol{\upbeta}}, \boldsymbol{\upbeta}\_{\boldsymbol{\upbeta}}) = \sum\_{s=0}^{N-1} V\_{\boldsymbol{\uptheta}\_{\boldsymbol{\upbeta}}}^{-1} \left( X\_{\boldsymbol{\upbeta}+sT} - a\_{\boldsymbol{\upbeta}}^{(1)} X\_{\boldsymbol{\upbeta}+sT-1} I\_{\boldsymbol{\upbeta}+sT-1}^{(1)} - a\_{\boldsymbol{\upbeta}}^{(2)} X\_{\boldsymbol{\upbeta}+sT-1} I\_{\boldsymbol{\upbeta}+sT-1}^{(2)} - \lambda\_{\boldsymbol{\upbeta}} \right).$$

For any *c* = (*c*1, *c*2,..., *cT*) <sup>∈</sup> <sup>R</sup>3*<sup>T</sup>* \ (0, 0, . . . , 0), *<sup>c</sup><sup>j</sup>* <sup>=</sup> *c* (1) *<sup>j</sup>* , *c* (2) *<sup>j</sup>* , *c* (3) *j* with *j* = 1, 2, ... , *T*, to simplify, let

$$j + sT \neq i + kT, i, j = 1, 2, \dots, T, s, k \in \mathbb{N}\_0.$$

$$u\_{j,s} = c\_j^{(1)} X\_{j + sT - 1} I\_{j + sT - 1}^{(1)} + c\_j^{(2)} X\_{j + sT - 1} I\_{j + sT - 1}^{(2)} + c\_j^{(3)},$$

$$w\_{j,s} = X\_{j + sT} - E\left(X\_{j + sT} | X\_{j + sT - 1}\right) = X\_{j + sT} - a\_j^{(1)} X\_{j + sT - 1} I\_{j + sT - 1}^{(1)} - a\_j^{(2)} X\_{j + sT - 1} I\_{j + sT - 1}^{(2)} - \lambda\_{j + sT}$$

and *n*(*T*, *N*) is a constant associated with *N* and *T*, then

$$E\left[\sum\_{j=1}^{T}\sum\_{s=0}^{N-1}V\_{\theta\_j}^{-1}(\mathbf{X}\_{j+sT}-E\{\mathbf{X}\_{j+sT}|\mathbf{X}\_{j+sT-1}\})\left(c\_j^{(1)}\mathbf{X}\_{j+sT-1}\mathbf{I}\_{j+sT-1}^{(1)}+c\_j^{(2)}\mathbf{X}\_{j+sT-1}\mathbf{I}\_{j+sT-1}^{(2)}+c\_j^{(3)}\right)\right]^2$$

$$=\sum\_{j=1}^{T}\sum\_{s=0}^{N-1}E\left[V\_{\theta\_j}^{-2}w\_{j,s}^2u\_{j,s}^2\right]+n(T,N)E\left[V\_{\theta\_j}^{-1}V\_{\theta\_i}^{-1}w\_{j,s}w\_{i,k}u\_{j,s}u\_{i,k}\right],\tag{A1}$$

for the first item in the right side of Equation (A1), we have

$$\begin{split} &E\left[\boldsymbol{V}\_{\boldsymbol{\theta}\_{j}^{\*}}^{-2}\boldsymbol{w}\_{j,s}^{2}\boldsymbol{u}\_{j,s}^{2}\right] \\ &=E\left\{E\left[\boldsymbol{V}\_{\boldsymbol{\theta}\_{j}^{\*}}^{-2}\boldsymbol{w}\_{j,s}^{2}\boldsymbol{u}\_{j,s}^{2}\mid\boldsymbol{X}\_{j+sT-1}\right]\right\} \\ &=E\left\{\boldsymbol{V}\_{\boldsymbol{\theta}\_{j}^{\*}}^{-2}\boldsymbol{u}\_{j,s}^{2}\boldsymbol{E}\left[\boldsymbol{w}\_{j,s}^{2}\mid\boldsymbol{X}\_{j+sT-1}\right]\right\} \\ &=E\left[\boldsymbol{V}\_{\boldsymbol{\theta}\_{j}^{\*}}^{-1}\left(\boldsymbol{c}\_{j}^{(1)}\boldsymbol{X}\_{j+sT-1}\boldsymbol{I}\_{j+sT-1}^{(1)}+\boldsymbol{c}\_{j}^{(2)}\boldsymbol{X}\_{j+sT-1}\boldsymbol{I}\_{j+sT-1}^{(2)}+\boldsymbol{c}\_{j}^{(3)}\right)^{2}\right], \end{split}$$

for the second item in the right side of Equation (A1), we have

$$\begin{aligned} &E\left[V^{-1}\_{\mathfrak{G}\_j}V^{-1}\_{\mathfrak{G}\_i}w\_{j,s}w\_{i,k}u\_{j,s}u\_{i,k}\right] \\ &=E\left\{V^{-1}\_{\mathfrak{G}\_j}V^{-1}\_{\mathfrak{G}\_i}u\_{j,s}u\_{i,k}E\left[w\_{j,s}w\_{i,k}|X\_{j+sT-1},X\_{i+kT-1}\right]\right\} \\ &=0, \end{aligned}$$

which imply that *Cov*- *SN*,*j*, *SN*,*<sup>i</sup>* = **0**, where *SN*,*<sup>j</sup>* = *S*(1) *N*,*j* - *θj*, *β<sup>j</sup>* , *S*(2) *N*,*j* - *θj*, *β<sup>j</sup>* , *S*(3) *N*,*j* - *θj*, *β<sup>j</sup>* , ∀*i*, *j* = 1, 2, . . . , *T*, *i* = *j*, then we have

$$\begin{split} &\frac{\mathbf{c}\_{j}^{T}}{\sqrt{N}} \left( S\_{N,j}^{(1)}(\boldsymbol{\theta}\_{j},\boldsymbol{\theta}\_{j}), S\_{N,j}^{(2)}(\boldsymbol{\theta}\_{j},\boldsymbol{\theta}\_{j}), S\_{N,j}^{(3)}(\boldsymbol{\theta}\_{j},\boldsymbol{\theta}\_{j}) \right)' \\ &= \frac{1}{\sqrt{N}} \sum\_{i=1}^{3} c\_{j}^{(i)} S\_{N,j}^{(i)}(\boldsymbol{\theta}\_{j},\boldsymbol{\theta}\_{j}) \\ &= \frac{1}{\sqrt{N}} \sum\_{s=0}^{N-1} V\_{\theta\_{j}}^{-1} \left[ X\_{\boldsymbol{\beta}+sT} - E \left( X\_{\boldsymbol{\beta}+sT} | X\_{\boldsymbol{\beta}+sT-1} \right) \right] \left( c\_{j}^{(1)} X\_{\boldsymbol{\beta}+sT-1} I\_{j+sT-1}^{(1)} + c\_{j}^{(2)} X\_{\boldsymbol{\beta}+sT-1} I\_{j+sT-1}^{(2)} + c\_{j}^{(3)} \right) \\ &\overset{\mathcal{L}}{\rightarrow} \mathcal{N} \left( 0, E \left[ V\_{\theta\_{j}}^{-1} (X\_{j} | X\_{j-1}) \left( c\_{j}^{(1)} X\_{j-1} I\_{j-1}^{(1)} + c\_{j}^{(2)} X\_{j-1} I\_{j-1}^{(2)} + c\_{j}^{(3)} \right)^{2} \right] \right), \text{as } \mathcal{N} \to \infty, \end{split}$$

therefore, the *<sup>c</sup><sup>T</sup> j* <sup>√</sup>*<sup>N</sup> S*(1) *N*,*j* - *θj*, *β<sup>j</sup>* , *S*(2) *N*,*j* - *θj*, *β<sup>j</sup>* , *S*(3) *N*,*j* - *θj*, *β<sup>j</sup>* converges to a normal distribution with mean zero and variance *E <sup>V</sup>*−<sup>1</sup> *<sup>θ</sup><sup>j</sup>* - *Xj*|*Xj*−<sup>1</sup> *c* (1) *<sup>j</sup> Xj*−<sup>1</sup> *I* (1) *<sup>j</sup>*−<sup>1</sup> <sup>+</sup> *<sup>c</sup>* (2) *<sup>j</sup> Xj*−<sup>1</sup> *I* (2) *<sup>j</sup>*−<sup>1</sup> <sup>+</sup> *<sup>c</sup>* (3) *j* 2 . Thus, by Cramer-wold device, it follows that

$$\frac{1}{\sqrt{N}} \begin{pmatrix} \mathcal{S}\_{N,1}^{(1)}(\boldsymbol{\theta}\_{1},\boldsymbol{\beta}\_{1}) \\ \mathcal{S}\_{N,1}^{(2)}(\boldsymbol{\theta}\_{1},\boldsymbol{\beta}\_{1}) \\ \mathcal{S}\_{N,1}^{(3)}(\boldsymbol{\theta}\_{1},\boldsymbol{\beta}\_{1}) \\ \vdots \\ \mathcal{S}\_{N,T}^{(1)}(\boldsymbol{\theta}\_{T},\boldsymbol{\beta}\_{T}) \\ \mathcal{S}\_{N,T}^{(2)}(\boldsymbol{\theta}\_{T},\boldsymbol{\beta}\_{T}) \\ \mathcal{S}\_{N,T}^{(3)}(\boldsymbol{\theta}\_{T},\boldsymbol{\beta}\_{T}) \\ \end{pmatrix} \xrightarrow{L} \mathcal{N} \left(\mathbf{0}, \begin{bmatrix} H\_{1}(\boldsymbol{\theta}) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & H\_{2}(\boldsymbol{\theta}) & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & H\_{T}(\boldsymbol{\theta}) \end{bmatrix}\right) \\ \mathcal{S}\_{N,T}^{(3)}(\boldsymbol{\theta}\_{T},\boldsymbol{\beta}\_{T}) \end{pmatrix} /$$

the **0**'s are (3 × 3)-null matrices. Now, we replace *Vθ<sup>j</sup>* - *Xj*+*sT*|*Xj*+*sT*−<sup>1</sup> by *Vθ*ˆ*<sup>j</sup>* - *Xj*+*sT*|*Xj*+*sT*−<sup>1</sup> , where *θ*ˆ*<sup>j</sup>* is a consistent estimator of *θj*. We aim to get the result

$$\frac{1}{\sqrt{N}} \begin{pmatrix} S\_{N,1}^{(1)}(\boldsymbol{\Phi}\_{1},\boldsymbol{\beta}\_{1}) \\ S\_{N,1}^{(2)}(\boldsymbol{\Phi}\_{1},\boldsymbol{\beta}\_{1}) \\ S\_{N,1}^{(3)}(\boldsymbol{\Phi}\_{1},\boldsymbol{\beta}\_{1}) \\ \vdots \\ S\_{N,T}^{(1)}(\boldsymbol{\Phi}\_{T},\boldsymbol{\beta}\_{T}) \\ S\_{N,T}^{(2)}(\boldsymbol{\Phi}\_{T},\boldsymbol{\beta}\_{T}) \\ S\_{N,T}^{(3)}(\boldsymbol{\Phi}\_{T},\boldsymbol{\beta}\_{T}) \end{pmatrix} \stackrel{L}{\rightarrow} \mathcal{N} \left( \mathbf{0}, \begin{bmatrix} H\_{1}(\boldsymbol{\theta}) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & H\_{2}(\boldsymbol{\theta}) & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & H\_{T}(\boldsymbol{\theta}) \end{bmatrix} \right) . \tag{A2}$$

To prove (A2), we need to check the following conclusion

$$\frac{1}{\sqrt{N}}S\_{N,j}^{(i)}(\theta\_{\dot{\gamma}},\beta\_{\dot{\gamma}}) - \frac{1}{\sqrt{N}}S\_{N,j}^{(i)}(\theta\_{\dot{\gamma}},\beta\_{\dot{\gamma}}) \stackrel{P}{\rightarrow} 0, j = 1,2,\dots,T,\\i = 1,2,3,\\N \rightarrow \infty. \tag{A3}$$

For ∀ > 0, ∃*δ* > 0, we have

$$\begin{split} P\left( |\frac{1}{\sqrt{N}} S\_N^{(1)}(\boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}, \boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}) - \frac{1}{\sqrt{N}} S\_N^{(1)}(\boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}, \boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}})| > \epsilon \right) \leq \sum\_{k=1}^2 P\left( |\boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}^{(k)} - \boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}^{(k)}| > \delta \right) + P\left( |\boldsymbol{\sigma}\_{\boldsymbol{\hat{\boldsymbol{\gamma}}} \mid \boldsymbol{\sigma}}^2 - \boldsymbol{\sigma}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}^2| > \delta \right) \\ &+ P\left( \sup\_{D} |\frac{1}{\sqrt{N}} S\_N^{(1)}(\boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}, \boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}) - \frac{1}{\sqrt{N}} S\_N^{(1)}(\boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}}, \boldsymbol{\theta}\_{\boldsymbol{\hat{\boldsymbol{\beta}}}})| > \epsilon \right) . \end{split}$$

where *θj*<sup>1</sup> = *θ* (1) *<sup>j</sup>*<sup>1</sup> , *<sup>θ</sup>* (2) *<sup>j</sup>*<sup>1</sup> , *<sup>σ</sup>*<sup>2</sup> *z*,*j*<sup>1</sup> , *D* = {*θj*<sup>1</sup> : |*θ* (1) *<sup>j</sup>*<sup>1</sup> − *<sup>θ</sup>* (1) *<sup>j</sup>* | < *δ*, |*θ* (2) *<sup>j</sup>*<sup>1</sup> − *<sup>θ</sup>* (2) *<sup>j</sup>* <sup>|</sup> <sup>&</sup>lt; *<sup>δ</sup>*, *<sup>σ</sup>*<sup>2</sup> *<sup>z</sup>*,*j*<sup>1</sup> <sup>−</sup> *<sup>σ</sup>*<sup>2</sup> *z*,*j* | < *δ*}. If *θ*ˆ*<sup>j</sup>* is a consistent estimator of *θj*, then we just need to prove that

$$P\left(\sup\_{D}|\frac{1}{\sqrt{N}}S^{(1)}\_{N}\left(\boldsymbol{\theta}\_{\boldsymbol{\hat{\jmath}}\_{1}},\boldsymbol{\beta}\_{\boldsymbol{\hat{\jmath}}}\right)-\frac{1}{\sqrt{N}}S^{(1)}\_{N}\left(\boldsymbol{\theta}\_{\boldsymbol{\hat{\jmath}}\_{1}},\boldsymbol{\beta}\_{\boldsymbol{\hat{\jmath}}\_{1}}\right)|>\epsilon\right) \xrightarrow{P} 0, N \to \infty.$$

By the Markov inequality,

*P* sup *D* | 1 <sup>√</sup>*<sup>N</sup> S*(1) *N* - *θj*<sup>1</sup> , *β<sup>j</sup>* <sup>−</sup> <sup>1</sup> <sup>√</sup>*<sup>N</sup> S*(1) *N* - *θj*, *β<sup>j</sup>* | > ≤ 1 <sup>2</sup> *<sup>E</sup>* \$ sup *D* 1 <sup>√</sup>*<sup>N</sup> S*(1) *N* - *θj*<sup>1</sup> , *β<sup>j</sup>* <sup>−</sup> <sup>1</sup> <sup>√</sup>*<sup>N</sup> S*(1) *N* - *θj*, *β* <sup>2</sup> % ≤ 1 <sup>2</sup> *<sup>E</sup>*{sup *D* 1 *N N*−<sup>1</sup> ∑ *s*=0 *<sup>V</sup>*−<sup>1</sup> *<sup>θ</sup><sup>j</sup>* 1 <sup>−</sup> *<sup>V</sup>*−<sup>1</sup> *<sup>θ</sup><sup>j</sup>* <sup>2</sup>- *Xj*+*sT* − *E* - *Xj*+*sT*|*Xj*+*sT*−<sup>1</sup> <sup>2</sup> *X*2 *<sup>j</sup>*+*sT*−<sup>1</sup> *<sup>I</sup>* (1) *j*+*sT*−1 } <sup>=</sup> <sup>1</sup> <sup>2</sup> *<sup>E</sup>* sup *D <sup>V</sup>*−<sup>1</sup> *<sup>θ</sup><sup>j</sup>* 1 - *Xj*|*Xj*−<sup>1</sup> <sup>−</sup> *<sup>V</sup>*−<sup>1</sup> *<sup>θ</sup><sup>j</sup>* - *Xj*|*Xj*−<sup>1</sup> <sup>2</sup> *Xj* <sup>−</sup> *<sup>α</sup>*(1) *<sup>j</sup> Xj*−<sup>1</sup> *I* (1) *<sup>j</sup>*−<sup>1</sup> <sup>−</sup> *<sup>α</sup>*(2) *<sup>j</sup> Xj*−<sup>1</sup> *I* (2) *<sup>j</sup>*−<sup>1</sup> <sup>−</sup> *<sup>λ</sup><sup>j</sup>* 2 *X*2 *<sup>j</sup>*−<sup>1</sup> *<sup>I</sup>* (1) *j*−1 <sup>=</sup> <sup>1</sup> <sup>2</sup> *<sup>E</sup>*{sup *D <sup>θ</sup>* (1) *<sup>j</sup>* − *θ* (1) *j*1 *Xj*−<sup>1</sup> *I* (1) *<sup>j</sup>*−<sup>1</sup> <sup>+</sup> *θ* (1) *<sup>j</sup>* − *θ* (2) *j*1 *Xj*−<sup>1</sup> *I* (2) *<sup>j</sup>*−<sup>1</sup> <sup>+</sup> *σ*2 *<sup>z</sup>*,*<sup>j</sup>* <sup>−</sup> *<sup>σ</sup>*<sup>2</sup> *z*,*j*<sup>1</sup> <sup>2</sup> *V*2 *θj* 1 - *Xj*|*Xj*−<sup>1</sup> *Vθj* - *Xj*|*Xj*−<sup>1</sup> *<sup>X</sup>*<sup>2</sup> *<sup>j</sup>*−<sup>1</sup> *<sup>I</sup>* (1) *<sup>j</sup>*−1} ≤ 1 <sup>2</sup> sup *D* {[ *θ* (1) *<sup>j</sup>* − *θ* (1) *j*1 2 *m*<sup>1</sup> + *θ* (2) *<sup>j</sup>* − *θ* (2) *j*1 2 *m*<sup>2</sup> + *σ*2 
*<sup>z</sup>*,*<sup>j</sup>* <sup>−</sup> *<sup>σ</sup>*<sup>2</sup> *z*,*j*<sup>1</sup> 2 *m*<sup>3</sup> + 2*m*4|*θ* (1) *<sup>j</sup>* − *θ* (1) *<sup>j</sup>*<sup>1</sup> ||*<sup>θ</sup>* (2) *<sup>j</sup>* − *θ* (2) *<sup>j</sup>*<sup>1</sup> | + 2*m*5|*θ* (1) *<sup>j</sup>* − *θ* (1) *<sup>j</sup>*<sup>1</sup> ||*σ*<sup>2</sup> *<sup>z</sup>*,*<sup>j</sup>* <sup>−</sup> *<sup>σ</sup>*<sup>2</sup> *z*,*j*<sup>1</sup> | + 2*m*6|*θ* (2) *<sup>j</sup>* − *θ* (2) *<sup>j</sup>*<sup>1</sup> ||*σ*<sup>2</sup> *<sup>z</sup>*,*<sup>j</sup>* <sup>−</sup> *<sup>σ</sup>*<sup>2</sup> *z*,*j*<sup>1</sup> <sup>|</sup>]*X*<sup>2</sup> *<sup>j</sup>*−<sup>1</sup> *<sup>I</sup>* (1) *<sup>j</sup>*−1} <sup>≤</sup> *<sup>c</sup>δ*<sup>2</sup> 2 ,

where *m*1, *m*2, ... , *m*<sup>6</sup> are some finite moments of process {*Xt*} under assumption (C2), and *c* is a positive constant. A similar argument can be used for <sup>√</sup><sup>1</sup> *<sup>N</sup> <sup>S</sup>*(2) *N*,*j* - *θj*, *β<sup>j</sup>* and √1 *<sup>N</sup> <sup>S</sup>*(3) *N*,*j* - *θj*, *β<sup>j</sup>* , *j* = 1, . . . , *T*. Let *δ* → 0, we can get (A3). By the ergodic theorem, we have

$$\frac{1}{N}\mathbf{Q}\_N \xrightarrow{P} H(\boldsymbol{\theta}).$$

After some calculation, we have

$$\begin{split} & \mathcal{Q}\_{N} \Big( \mathfrak{F}\_{MQL} - \mathfrak{F} \Big) \\ &= \Big( \mathcal{S}\_{N,1}^{(1)} (\hat{\mathfrak{G}}\_{1}, \mathfrak{F}\_{1}), \mathcal{S}\_{N,1}^{(2)} (\hat{\mathfrak{G}}\_{1}, \mathfrak{F}\_{1}), \mathcal{S}\_{N,1}^{(3)} (\hat{\mathfrak{G}}\_{1}, \mathfrak{F}\_{1}), \dots, \mathcal{S}\_{N,T}^{(1)} (\hat{\mathfrak{G}}\_{T}, \mathfrak{F}\_{T}), \mathcal{S}\_{N,T}^{(2)} (\hat{\mathfrak{G}}\_{T}, \mathfrak{F}\_{T}), \mathcal{S}\_{N,T}^{(3)} (\hat{\mathfrak{G}}\_{T}, \mathfrak{F}\_{T}) \Big)' \Big), \end{split}$$

Therefore,

$$\sqrt{N}\left(\hat{\mathfrak{k}}\_{MQL} - \mathfrak{k}\right) \xrightarrow{L} \mathcal{N}\left(\mathbf{0}, H^{-1}(\mathfrak{k})\right).$$

This completes the proof.

#### **References**


