*Article* **A New Extension of Thinning-Based Integer-Valued Autoregressive Models for Count Data**

**Zhengwei Liu and Fukang Zhu \***

School of Mathematics, Jilin University, 2699 Qianjin Street, Changchun 130012, China; zhengweil18@mails.jlu.edu.cn
\* Correspondence: fzhu@jlu.edu.cn

**Abstract:** Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial thinning is introduced, which contains binomial thinning as a special case. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.

**Keywords:** extended binomial distribution; INAR; thinning operator; time series of counts

**Citation:** Liu, Z.; Zhu, F. A New Extension of Thinning-Based Integer-Valued Autoregressive Models for Count Data. *Entropy* **2021**, *23*, 62. https://doi.org/10.3390/ e23010062

Received: 29 October 2020 Accepted: 28 December 2020 Published: 31 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Count time series occur naturally in many contexts, including actuarial science, epidemiology, finance and economics. The last few years have witnessed rapid development in modeling time series of counts. One of the most common approaches for modeling integer-valued autoregressive (INAR) time series is based on thinning operators. Many operators have been developed to fit different kinds of situations; see [1] for a detailed discussion of thinning-based INAR models.

The most popular thinning operator is the binomial thinning operator introduced by [2]. Let *X* be a non-negative integer-valued random variable and let *α* ∈ (0, 1); the binomial thinning operator is defined as

$$\alpha \circ X = \sum_{i=1}^{X} B_i, \quad X > 0, \tag{1}$$

and 0 otherwise, where {*Bi*} is a sequence of independent identically distributed (i.i.d.) Bernoulli random variables with fixed success probability *α*, and *Bi* is independent of *X*. Based on the binomial thinning operator, [3,4] independently proposed an INAR(1) model as follows

$$X_t = \alpha \circ X_{t-1} + \epsilon_t, \quad t \in \mathbb{Z}, \tag{2}$$

where $\{\epsilon_t\}$ is a sequence of i.i.d. integer-valued random variables with finite mean and variance. Since this seminal work, INAR-type models have received considerable attention. For recent literature on this topic, see [5,6], among others.
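
To fix ideas, the recursion (2) is easy to simulate. The following Python sketch is an illustration only (the paper's own computations use R), and the Poisson choice for the innovations is an assumption made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def binomial_thin(x, alpha):
    # alpha ∘ X of (1): each of the X units survives independently w.p. alpha
    return int(rng.binomial(x, alpha)) if x > 0 else 0

def simulate_inar1(n, alpha, lam, x0=0):
    # sample path of (2); Poisson(lam) innovations are an assumed example here
    path, prev = [], x0
    for _ in range(n):
        prev = binomial_thin(prev, alpha) + int(rng.poisson(lam))
        path.append(prev)
    return np.array(path)
```

Note that `binomial_thin(x, alpha)` can never exceed `x`, which is precisely the limitation of binomial thinning discussed next.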

Note that $B_i$ in (1) follows a Bernoulli distribution, so *α* ◦ *X* is always less than or equal to *X*; in other words, the first term on the right-hand side of (2) cannot exceed $X_{t-1}$, which limits the flexibility of the model. Despite this shortcoming, the simple form makes the parameter easy to estimate, and the operator shares many properties with the multiplication operator in the continuous case. For this reason, there have been many extensions of the binomial thinning operator since its emergence. Zhu and Joe [7] proposed the expectation thinning operator, which generalizes binomial thinning from the perspective of the probability generating function (pgf). Although this extension is very successful, its estimation procedure is somewhat complicated; compared with this extension, the thinning operator we propose is simpler and more intuitive. For recent developments, Yang et al. [8] proposed the generalized Poisson (GP) thinning operator, which is defined by replacing $B_i$ with a GP counting series. Although the GP thinning operator is flexible and adaptable, we argue that it has a potential drawback: the GP distribution is not a strict probability distribution in the conventional sense. Recently, Aly and Bouzar [9] introduced a two-parameter expectation thinning operator based on a linear fractional probability generating function, which can be regarded as a general case of at least nine thinning operators. Kang et al. [10] proposed a new flexible thinning operator, named GSC after the three originators of its counting series: Gómez-Déniz, Sarabia and Calderín-Ojeda.

Although the binomial thinning operator is very popular, it may not perform well on count time series with large values. Under such circumstances the predicted values are often volatile, and the data are more likely to be non-stationary when the counts are large. We intend to establish a new thinning operator that meets the following requirements: (i) it is an extension of the binomial thinning operator; (ii) it contains two parameters to achieve flexibility; and (iii) it has a simple structure and is easy to implement.

Based on the above considerations, we propose a new thinning operator based on the extended binomial (EB) distribution. The operator has two parameters: a real-valued *α* and an integer-valued *m* (0 ≤ *α* ≤ 1, *m* ≥ 2), which makes it more flexible than single-parameter thinning operators; the binomial thinning operator (1) is recovered as the special case *m* = 2 of EB thinning. The case *m* > 2 usually performs better than *m* = 2 on data sets with large counts. In other words, EB thinning alleviates the main defect of binomial thinning to some extent. Since EB thinning is not a special case of the expectation thinning in [9], we further extend the framework of thinning-based INAR models and provide a new option for practical application. Accordingly, an INAR(1) model is proposed based on the EB thinning operator, which is an extension of model (2) and can more accurately and flexibly capture the dispersion features of real data.

This paper is organized as follows. In Section 2, we review the properties of the EB distribution, introduce the EB thinning operator and, based on the new operator, propose a new INAR(1) model. In Section 3, two-step conditional least squares estimation is investigated for the innovation-free case of the model and the asymptotic property of the estimator is obtained; conditional maximum likelihood estimation and numerical simulations are also discussed. In Section 4, we focus on forecasting and introduce two criteria to compare the prediction results for three overdispersed or underdispersed real data sets, which illustrate the better performance of the proposed model. In Section 5, we give some conclusions and related discussions.

#### **2. A New INAR(1) Model**

The EB distribution arises from the theory of extended Pascal triangles and can be regarded as a generalization of the binomial distribution; see [11] for more details. Based on this distribution, we introduce the EB thinning operator and propose a corresponding INAR(1) model.

#### *2.1. EB Distribution*

The EB random variable $X_n(m, \alpha)$, denoted $EB(m, n, \alpha)$, is defined as follows:

$$P(X_n(m, \alpha) = r) = C_m(n, r)\, \alpha^r \beta^{(m-1)n - r}, \quad 0 \le r \le (m-1)n, \tag{3}$$

where *m* and *n* are both integers satisfying *m* ≥ 2 and *n* ≥ 1; $C_m(n, r)$ can be calculated as

$$C_m(n, r) = \sum_{s=0}^{s_1} (-1)^s \binom{n}{s} \binom{r + n - sm - 1}{n - 1},$$

where $s_1 = \min\{n, \lfloor r/m \rfloor\}$; *α* and *β* in (3) satisfy the following restriction:

$$\beta^{m-1} + \alpha\beta^{m-2} + \alpha^2\beta^{m-3} + \dots + \alpha^{m-1} = 1, \quad 0 \le \alpha \le 1, \; 0 \le \beta \le 1. \tag{4}$$

The above restriction is equivalent to $\beta^m - \alpha^m = \beta - \alpha$. The mean and variance of the EB random variable $X_n(m, \alpha)$ are

$$E(X_n(m, \alpha)) = n\alpha\, \frac{1 - m\alpha^{m-1}}{\beta - \alpha}, \quad \mathrm{Var}(X_n(m, \alpha)) = n\alpha\beta\, \frac{1 - m^2(\alpha\beta)^{m-1}}{(\beta - \alpha)^2},$$

respectively. The pgf of $X_n(m, \alpha)$ can be written as

$$G(t) = E(t^{X_n(m, \alpha)}) = \left( \frac{\beta^m - \alpha^m t^m}{\beta - \alpha t} \right)^n.$$

As $X_n(m, \alpha)$ can be expressed as a convolution, the EB distribution has the reproductive property. Specifically, if $Y_1, Y_2, \dots, Y_k$ are independent random variables with $Y_i \sim EB(m, n_i, \alpha)$ in (3), then $\sum_{i=1}^k Y_i \sim EB(m, \sum_{i=1}^k n_i, \alpha)$. Notice that a random variable $Y_i \sim EB(2, 1, \alpha)$ is a Bernoulli random variable satisfying $P(Y_i = 1) = 1 - P(Y_i = 0) = \alpha$.

#### *2.2. EB Thinning Operator*

According to the discussion in Section 2.1, we construct the EB thinning operator based on the configuration $n = 1$. Let $\{U_i(m, \alpha)\}$ be a sequence of i.i.d. random variables with common distribution $EB(m, 1, \alpha)$, i.e., $P(U_i(m, \alpha) = z) = \alpha^z \beta^{(m-1)-z}$, $z = 0, \dots, m-1$, where *α* and *β* satisfy (4). Note that the mean and variance of $U_i$ are

$$\mu = E(U_i) := \alpha\, \frac{1 - m\alpha^{m-1}}{\beta - \alpha}, \quad \sigma^2 = \mathrm{Var}(U_i) := \alpha\beta\, \frac{1 - m^2(\alpha\beta)^{m-1}}{(\beta - \alpha)^2}. \tag{5}$$

One can easily see that *μ* < *σ*<sup>2</sup> if and only if

$$m\alpha^{m-2}(m\beta^m - \beta + \alpha) < 1. \tag{6}$$

For any $3 \le m < \infty$, the left-hand side of (6) approaches 0 as $\alpha \to 0$ and approaches *m* as $\alpha \to 1$. Hence, both $\mu < \sigma^2$ and $\mu \ge \sigma^2$ are possible. When $m = 2$, which corresponds to the binomial distribution, (5) gives $\mu = \alpha$ and $\sigma^2 = \alpha\beta$ with $\alpha + \beta = 1$, so $\mu > \sigma^2$ for all $0 < \alpha < 1$. When $\alpha < \beta$ and $m \to \infty$, one can easily show that $\mu = \theta/(1-\theta)$ and $\sigma^2 = \theta/(1-\theta)^2$, where $\theta = \alpha/\beta$; therefore, $\mu \le \sigma^2$ for all $0 < \theta < 1$ in this case.

Based on the reproductive property of the EB distribution, we define the EB thinning operator "$\circledcirc$" as follows: for a non-negative integer-valued random variable *X*,

$$(m, \alpha) \circledcirc X = \sum_{i=1}^{X} U_i(m, \alpha), \quad X > 0,$$

where $(m, \alpha) \circledcirc X = 0$ if $X = 0$. Note that the EB thinning operator reduces to the binomial thinning operator (1) when $m = 2$. Since $(m, \alpha) \circledcirc X$ may be less than, equal to, or greater than *X*, the EB thinning operator is quite flexible when dealing with overdispersed or underdispersed data sets.

**Remark 1.** *The computation of $(\alpha, m)$ for a given $(\mu, \sigma^2)$ is based on* (4) *and* (5)*; the solution can be obtained by solving these nonlinear equations. When* $m = 3$*,* $\beta = (-\alpha + \sqrt{4 - 3\alpha^2})/2$*, and when* $m = 4$*,*

$$\begin{split} \beta &= \sqrt[3]{-\left(\frac{10\alpha^3}{27} - \frac{1}{2}\right) + \sqrt{\left(\frac{10\alpha^3}{27} - \frac{1}{2}\right)^2 + \left(\frac{2\alpha^2}{9}\right)^3}} \\ &\quad + \sqrt[3]{-\left(\frac{10\alpha^3}{27} - \frac{1}{2}\right) - \sqrt{\left(\frac{10\alpha^3}{27} - \frac{1}{2}\right)^2 + \left(\frac{2\alpha^2}{9}\right)^3}} - \frac{\alpha}{3}. \end{split}$$

*For more complex cases ($m \ge 5$), we can obtain the solution $(\alpha, \beta)$ by solving these nonlinear systems numerically; a more detailed calculation procedure is given in Section 3.3.*
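
As an illustration of Remark 1, the restriction (4), equivalently $\beta^m - \alpha^m = \beta - \alpha$, can be solved numerically for any *m*. The Python sketch below is ours (the paper itself uses the R package BB); it assumes SciPy is available, brackets the non-trivial root away from the trivial root $\beta = \alpha$, and then evaluates the moments (5):

```python
import numpy as np
from scipy.optimize import brentq

def eb_beta(m, alpha):
    # Non-trivial root of beta^m - alpha^m = beta - alpha on [0, 1].
    # beta = alpha is always a root; the other root lies on the opposite
    # side of the minimum of f, located at beta* = (1/m)^(1/(m-1)).
    f = lambda b: b**m - b - (alpha**m - alpha)
    bstar = (1.0 / m) ** (1.0 / (m - 1))
    if abs(alpha - bstar) < 1e-12:      # double root: beta = alpha
        return alpha
    lo, hi = (bstar, 1.0) if alpha < bstar else (0.0, bstar)
    return brentq(f, lo, hi)

def eb_moments(m, alpha):
    # mean and variance of U_i ~ EB(m, 1, alpha), Equation (5)
    beta = eb_beta(m, alpha)
    mu = alpha * (1 - m * alpha**(m - 1)) / (beta - alpha)
    var = alpha * beta * (1 - m**2 * (alpha * beta)**(m - 1)) / (beta - alpha)**2
    return mu, var
```

For $m = 2$ this recovers $\beta = 1 - \alpha$, and for $m = 3$ it agrees with the closed form $\beta = (-\alpha + \sqrt{4 - 3\alpha^2})/2$ above.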

#### *2.3. EB-INAR(1) Model*

Based on the EB thinning operator, we define the EB-INAR(1) model as follows:

$$X_t = (m, \alpha) \circledcirc X_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, \tag{7}$$

where $\alpha \in (0, 1)$; $\{X_t\}$ is a sequence of non-negative integer-valued random variables; the innovation process $\{\epsilon_t\}$ is a sequence of i.i.d. integer-valued random variables with finite mean and variance; and $\epsilon_t$ is independent of $\{X_s, s < t\}$.
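
A sample path of (7) can be generated directly from the counting-series pmf $P(U = z) = \alpha^z \beta^{m-1-z}$. A minimal sketch (ours, not the paper's R code), taking Poisson innovations as an assumed example:

```python
import numpy as np

rng = np.random.default_rng(1)

def eb_pmf(m, alpha, beta):
    # pmf of U ~ EB(m, 1, alpha): P(U = z) = alpha^z beta^(m-1-z), z = 0..m-1;
    # it sums to 1 under restriction (4); renormalize to guard against rounding
    z = np.arange(m)
    p = alpha**z * beta**(m - 1 - z)
    return p / p.sum()

def eb_thin(x, m, alpha, beta):
    # (m, alpha) ⊚ x: sum of x i.i.d. EB(m, 1, alpha) counting variables
    if x == 0:
        return 0
    return int(rng.choice(m, size=x, p=eb_pmf(m, alpha, beta)).sum())

def simulate_eb_inar1(n, m, alpha, beta, lam, x0=0):
    # sample path of the EB-INAR(1) recursion (7); Poisson(lam) innovations
    # are an illustrative choice, not part of the model definition
    path, prev = [], x0
    for _ in range(n):
        prev = eb_thin(prev, m, alpha, beta) + int(rng.poisson(lam))
        path.append(prev)
    return np.array(path)
```

For $m = 3$, $\beta$ can be taken from the closed form in Remark 1; unlike binomial thinning, `eb_thin` can return a value larger than its input.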

In order to obtain the estimation equations, we give some conditional or unconditional moments of the EB-INAR(1) model in the following proposition.

**Proposition 1.** *Suppose* $\{X_t\}$ *is a stationary process defined by* (7) *and let* $\mu < 1$*; then for* $t \ge 1$*,*

*1.* $E(X_t|X_{t-1}) = \mu X_{t-1} + \mu_\epsilon$*;*

*2.* $\mathrm{Var}(X_t|X_{t-1}) = \sigma^2 X_{t-1} + \sigma_\epsilon^2$*,*

*where* $\mu_\epsilon$ *and* $\sigma_\epsilon^2$ *are the expectation and variance of the innovation* $\epsilon_t$*, respectively.*

The proof of some of these properties mentioned above is given in Appendix A.

**Remark 2.** *Inspired by the INAR(p) model in [12], we can further extend this model to INAR(p); the EB-INAR(p) model is defined as follows:*

$$X_t = (m, \alpha_1) \circledcirc X_{t-1} + \dots + (m, \alpha_p) \circledcirc X_{t-p} + \epsilon_t, \quad t = 2, 3, \dots,$$

*where* $\alpha_1, \dots, \alpha_p \in (0, 1)$*; m is an integer satisfying* $m \ge 2$*;* $\{X_t\}$ *is a sequence of non-negative integer-valued random variables; the innovation process* $\{\epsilon_t\}$ *is a sequence of i.i.d. integer-valued random variables with finite mean and variance; and* $\epsilon_t$ *is independent of* $\{X_s, s < t\}$*.*

We will show that the new model can accurately and flexibly capture the dispersion features of real data in Section 4.

#### **3. Estimation**

We use the two-step conditional least squares estimation proposed by [13] to investigate the innovation-free case, and the asymptotic properties of the estimators are obtained. Conditional maximum likelihood estimation for the parametric case is also discussed. Finally, we demonstrate the finite sample performance via simulation studies.

*3.1. Two-Step Conditional Least Squares Estimation*

Denote $\theta_1 = (\mu, \mu_\epsilon)^\top$, $\theta_2 = (\sigma^2, \sigma_\epsilon^2)^\top$ and $\theta = (\theta_1^\top, \theta_2^\top)^\top$. The two-step CLS estimation is conducted in the following steps.

**Step 1.1**. The estimator for *θ*1.

Let $g_1(\theta_1, X_{t-1}) = E(X_t|X_{t-1}) = \mu X_{t-1} + \mu_\epsilon$ and $q_{1t}(\theta_1) = (X_t - g_1(\theta_1, X_{t-1}))^2$. Let

$$Q_1(\theta_1) = \sum_{t=1}^{n} q_{1t}(\theta_1)$$

be the CLS criterion function. Then the CLS estimator $\hat{\theta}_{1,CLS} := (\hat{\mu}_{CLS}, \hat{\mu}_{\epsilon,CLS})^\top$ of $\theta_1$ can be obtained by solving the score equation $\partial Q_1(\theta_1)/\partial \theta_1 = 0$, which yields the closed-form solution:

$$
\hat{\theta}_{1,CLS} = \begin{pmatrix}
\sum_{t=1}^{n} X_{t-1}^2 & \sum_{t=1}^{n} X_{t-1} \\
\sum_{t=1}^{n} X_{t-1} & n
\end{pmatrix}^{-1} \begin{pmatrix}
\sum_{t=1}^{n} X_t X_{t-1} \\
\sum_{t=1}^{n} X_t
\end{pmatrix}.
$$

**Step 1.2**. The estimator for *θ*2.

Let $Y_t = X_t - E(X_t|X_{t-1})$ and $g_2(\theta_2, X_{t-1}) = \mathrm{Var}(X_t|X_{t-1}) = \sigma^2 X_{t-1} + \sigma_\epsilon^2$. Then

$$E(Y_t^2|X_{t-1}) = E\big((X_t - E(X_t|X_{t-1}))^2 \,\big|\, X_{t-1}\big) = \mathrm{Var}(X_t|X_{t-1}) = g_2(\theta_2, X_{t-1}).$$

Let $q_{2t}(\theta_2) = (Y_t^2 - g_2(\theta_2, X_{t-1}))^2$; then the CLS criterion function for $\theta_2$ can be written as

$$Q\_2(\theta\_2) = \sum\_{t=1}^{n} q\_{2t}(\theta\_2).$$

By solving the score equation $\partial Q_2(\theta_2)/\partial \theta_2 = 0$, we obtain the CLS estimator $\hat{\theta}_{2,CLS} := (\hat{\sigma}^2_{CLS}, \hat{\sigma}^2_{\epsilon,CLS})^\top$ of $\theta_2$, which also has a closed-form solution:

$$
\hat{\theta}_{2,CLS} = \begin{pmatrix}
\sum_{t=1}^{n} X_{t-1}^2 & \sum_{t=1}^{n} X_{t-1} \\
\sum_{t=1}^{n} X_{t-1} & n
\end{pmatrix}^{-1} \begin{pmatrix}
\sum_{t=1}^{n} Y_t^2 X_{t-1} \\
\sum_{t=1}^{n} Y_t^2
\end{pmatrix}.
$$
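
Both closed-form solutions above are ordinary least squares regressions on the design $(X_{t-1}, 1)$: of $X_t$ in Step 1.1 and of the squared residuals $Y_t^2$ in Step 1.2. A compact sketch (function and argument names are ours):

```python
import numpy as np

def cls_two_step(x):
    # Step 1.1: regress X_t on (X_{t-1}, 1) -> (mu, mu_eps);
    # Step 1.2: regress the squared residuals Y_t^2 on (X_{t-1}, 1)
    #           -> (sigma^2, sigma_eps^2).
    xt, xlag = x[1:], x[:-1]
    A = np.column_stack([xlag, np.ones_like(xlag)])
    (mu, mu_eps), *_ = np.linalg.lstsq(A, xt, rcond=None)
    y2 = (xt - mu * xlag - mu_eps) ** 2
    (sig2, sig2_eps), *_ = np.linalg.lstsq(A, y2, rcond=None)
    return mu, mu_eps, sig2, sig2_eps
```

Solving the normal equations via `lstsq` is numerically equivalent to the explicit matrix inverses displayed above.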

**Step 2**. Estimating parameters (*m*, *α*) via the method of moments.

The estimator $(\hat{m}, \hat{\alpha})$ of $(m, \alpha)$, called the two-step CLS estimator, can be obtained by solving the following estimating equations:

$$\begin{cases} \hat{\mu}_{CLS} = \alpha \dfrac{1 - m\alpha^{m-1}}{\beta - \alpha}, \\[1ex] \hat{\sigma}^2_{CLS} = \alpha\beta \dfrac{1 - m^2(\alpha\beta)^{m-1}}{(\beta - \alpha)^2}, \end{cases} \tag{8}$$

where *α* and *β* satisfy (4).
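
Step 2 can be implemented as a small grid search over the integer *m* combined with one-dimensional root-finding for *α*. The sketch below (ours; the paper uses the R package BB) assumes the mean equation in (8) has at most one root in (0, 1) for each fixed *m*, and keeps the candidate whose implied variance best matches $\hat{\sigma}^2_{CLS}$:

```python
import numpy as np
from scipy.optimize import brentq

def eb_beta(m, alpha):
    # non-trivial root of beta^m - alpha^m = beta - alpha (restriction (4))
    f = lambda b: b**m - b - (alpha**m - alpha)
    bstar = (1.0 / m) ** (1.0 / (m - 1))
    if abs(alpha - bstar) < 1e-12:
        return alpha
    lo, hi = (bstar, 1.0) if alpha < bstar else (0.0, bstar)
    return brentq(f, lo, hi)

def eb_mean_var(m, alpha):
    # right-hand sides of the estimating equations (8)
    beta = eb_beta(m, alpha)
    mu = alpha * (1 - m * alpha**(m - 1)) / (beta - alpha)
    var = alpha * beta * (1 - m**2 * (alpha * beta)**(m - 1)) / (beta - alpha)**2
    return mu, var

def invert_moments(mu_hat, sig2_hat, m_grid=range(2, 9)):
    # for each candidate integer m, solve the mean equation for alpha,
    # then keep the (m, alpha) whose implied variance is closest to sig2_hat
    best = None
    for m in m_grid:
        g = lambda a, m=m: eb_mean_var(m, a)[0] - mu_hat
        lo, hi = 1e-6, 1.0 - 1e-6
        if g(lo) * g(hi) > 0:   # mu_hat outside the attainable range (0, m-1)
            continue
        a = brentq(g, lo, hi)
        err = abs(eb_mean_var(m, a)[1] - sig2_hat)
        if best is None or err < best[0]:
            best = (err, m, a)
    return best[1], best[2]
```

With exact moments as input, the true $(m, \alpha)$ is recovered; with estimated moments, the returned pair is the two-step CLS estimate.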

Therefore, the resulting CLS estimator is $\hat{\Theta}_{CLS} = (\hat{m}_{CLS}, \hat{\alpha}_{CLS}, \hat{\mu}_{\epsilon,CLS}, \hat{\sigma}^2_{\epsilon,CLS})$. To study the asymptotic behaviour of the estimator, we make the following assumptions:

**Assumption 1.** {*Xt*} *is a stationary and ergodic process;*

**Assumption 2.** $EX_t^4 < \infty$*.*

**Proposition 2.** *Under Assumptions 1 and 2, the CLS estimator* $\hat{\theta}_{1,CLS}$ *is strongly consistent and asymptotically normal:*

$$\sqrt{n}(\hat{\theta}_{1,CLS} - \theta_{1,0}) \xrightarrow{L} N(0, V_1^{-1} W_1 V_1^{-1}),$$

*where* $V_1 := E\big[\tfrac{\partial}{\partial \theta_1} g_1(\theta_{1,0}, X_0)\, \tfrac{\partial}{\partial \theta_1^\top} g_1(\theta_{1,0}, X_0)\big]$, $W_1 := E\big[q_{11}(\theta_{1,0})\, \tfrac{\partial}{\partial \theta_1} g_1(\theta_{1,0}, X_0)\, \tfrac{\partial}{\partial \theta_1^\top} g_1(\theta_{1,0}, X_0)\big]$*, and* $\theta_{1,0} = (\mu_0, \mu_{\epsilon,0})^\top$ *denotes the true value of* $\theta_1$*.*

To obtain the asymptotic normality of $\hat{\theta}_{2,CLS}$, we make a further assumption:

**Assumption 3.** $EX_t^6 < \infty$*.*

Then we have the following proposition.

**Proposition 3.** *Under Assumptions 1 and 3, the CLS estimator* $\hat{\theta}_{2,CLS}$ *is strongly consistent and asymptotically normal:*

$$\sqrt{n}(\hat{\theta}_{2,CLS} - \theta_{2,0}) \xrightarrow{L} N(0, V_2^{-1} W_2 V_2^{-1}),$$

*where* $V_2 := E\big[\tfrac{\partial}{\partial \theta_2} g_2(\theta_{2,0}, X_0)\, \tfrac{\partial}{\partial \theta_2^\top} g_2(\theta_{2,0}, X_0)\big]$, $W_2 := E\big[q_{21}(\theta_{2,0})\, \tfrac{\partial}{\partial \theta_2} g_2(\theta_{2,0}, X_0)\, \tfrac{\partial}{\partial \theta_2^\top} g_2(\theta_{2,0}, X_0)\big]$*, and* $\theta_{2,0} = (\sigma^2_0, \sigma^2_{\epsilon,0})^\top$ *denotes the true value of* $\theta_2$*.*

Based on Propositions 2 and 3 and Theorem 3.2 in [14], we have the following proposition.

**Proposition 4.** *Under Assumptions 1 and 3, the CLS estimator* $\hat{\theta}_{CLS} = (\hat{\theta}_{1,CLS}^\top, \hat{\theta}_{2,CLS}^\top)^\top$ *is strongly consistent and asymptotically normal:*

$$
\sqrt{n}(\hat{\theta}_{CLS} - \theta_0) \xrightarrow{L} N(0, \Omega),
$$

*where*

$$
\Omega = \begin{pmatrix}
V\_1^{-1} W\_1 V\_1^{-1} & V\_1^{-1} M V\_2^{-1} \\
V\_2^{-1} M^\top V\_1^{-1} & V\_2^{-1} W\_2 V\_2^{-1}
\end{pmatrix},
$$

$M = E\big[q_{11}(\theta_{1,0})\, q_{21}(\theta_{2,0})\, \tfrac{\partial}{\partial \theta_1} g_1(\theta_{1,0}, X_0)\, \tfrac{\partial}{\partial \theta_2^\top} g_2(\theta_{2,0}, X_0)\big]$*, and* $\theta_0 = (\theta_{1,0}^\top, \theta_{2,0}^\top)^\top$ *denotes the true value of* $\theta$*.*

We make the following preparation to establish Proposition 5. Based on (5), solve the equations for $(m, \alpha)$ and denote the solution as $(h_1(\mu, \sigma^2), h_2(\mu, \sigma^2))$. Let

$$D = D(\mu, \sigma^2) = \begin{pmatrix} \partial h\_1 / \partial \mu & \partial h\_1 / \partial \sigma^2 \\ \partial h\_2 / \partial \mu & \partial h\_2 / \partial \sigma^2 \end{pmatrix}. \tag{9}$$

Based on Proposition 4, we state the strong consistency and asymptotic normality of (*m*ˆ , *α*ˆ) in the following proposition.

**Proposition 5.** *Under Assumptions 1 and 3, the CLS estimator* $(\hat{m}_{CLS}, \hat{\alpha}_{CLS})$ *is strongly consistent and asymptotically normal:*

$$
\sqrt{n}\begin{pmatrix}
\hat{m}_{CLS} - m_0 \\
\hat{\alpha}_{CLS} - \alpha_0
\end{pmatrix} \xrightarrow{L} N(0, D \Sigma D^\top),
$$

*where D is given in* (9)*;* $\Sigma = \mathrm{diag}(I V_1^{-1} W_1 V_1^{-1} I^\top,\; I V_2^{-1} W_2 V_2^{-1} I^\top)$ *with* $I = (1, 0)$*;* $m_0$ *and* $\alpha_0$ *denote the true values of m and α, respectively.*

The brief proofs of Propositions 2–5 are given in Appendix A.

#### *3.2. Conditional Maximum Likelihood Estimation*

We maximize the likelihood function with respect to the model parameters $\theta = (m, \alpha, \delta)$ to obtain the conditional maximum likelihood (CML) estimate in the parametric case:

$$\begin{aligned} L(X_1 = x_1, \dots, X_N = x_N | \theta) &= P_\theta(X_1 = x_1) \prod_{i=2}^N P_\theta(X_i = x_i | X_{i-1} = x_{i-1}, \dots, X_1 = x_1) \\ &= P_\theta(X_1 = x_1) \prod_{i=2}^N P_\theta(X_i = x_i | X_{i-1} = x_{i-1}), \end{aligned}$$

where *δ* is the parameter of $\epsilon_i$, $P_\theta(X_1)$ is the pmf of $X_1$ and $P_\theta(X_i | X_{i-1})$ is the conditional pmf. Since the marginal distribution is difficult to obtain in general, a simple approach is to condition on the observed $X_1$. Essentially ignoring the dependence on the initial value, we take as an estimate of *θ* the CML estimate given $X_1$, obtained by maximizing the conditional log-likelihood

$$cl(X_1 = x_1, \dots, X_N = x_N | \theta) = \sum_{i=2}^N \log P_\theta(X_i = x_i | X_{i-1} = x_{i-1})$$

over Θ; we denote the CML estimate by $\hat{\theta} = (\hat{m}, \hat{\alpha}, \hat{\delta})$. The conditional log-likelihood function is as follows:

$$cl(X_1 = x_1, \dots, X_N = x_N | \theta) = \sum_{i=2}^N \log \left\{ \sum_{w=0}^{\min\{(m-1)x_{i-1},\, x_i\}} C_m(x_{i-1}, w)\, \alpha^w \beta^{(m-1)x_{i-1} - w}\, P(\epsilon_i = x_i - w) \right\},$$

where *α* and *β* satisfy (4), and $\epsilon_i$ follows a non-negative discrete distribution with parameter *δ*. In what follows, we consider the two cases $m = 3, 4$.

**Case 1**: $m = 3$ with Poisson innovations, i.e., $\epsilon_t \sim P(\delta)$:

$$\begin{split} cl(X_1 = x_1, \dots, X_N = x_N | \theta) = \sum_{i=2}^N \log \Bigg\{ \sum_{w=0}^{\min\{2x_{i-1},\, x_i\}} \; \sum_{t_1 = \max\{0,\, x_{i-1} - w\}}^{\lfloor x_{i-1} - w/2 \rfloor} & \binom{x_{i-1}}{t_1} \binom{x_{i-1} - t_1}{2x_{i-1} - 2t_1 - w} \\ & \cdot (\beta^2)^{t_1} (\alpha\beta)^{2x_{i-1} - 2t_1 - w} (\alpha^2)^{t_1 - x_{i-1} + w}\, \frac{\delta^{x_i - w}}{(x_i - w)!}\, e^{-\delta} \Bigg\}, \end{split}$$

where *β* is given in Remark 1.

**Case 2**: $m = 4$ with geometric innovations, i.e., $\epsilon_t \sim Ge(\delta)$ with $P(\epsilon_t = k) = (1 - \delta)^k \delta$ for $k = 0, 1, 2, \dots$:

$$\begin{split} cl(X_1 = x_1, \dots, X_N = x_N | \theta) = \sum_{i=2}^N \log \Bigg\{ & \sum_{w=0}^{\min\{3x_{i-1},\, x_i\}} \; \sum_{t_1 = \max\{0,\, x_{i-1} - w\}}^{x_{i-1}} \; \sum_{t_2 = \max\{0,\, 2x_{i-1} - 2t_1 - w\}}^{\lfloor (3x_{i-1} - 3t_1 - w)/2 \rfloor} \binom{x_{i-1}}{t_1} \binom{x_{i-1} - t_1}{t_2} \\ & \cdot \binom{x_{i-1} - t_1 - t_2}{3x_{i-1} - 3t_1 - 2t_2 - w} (\beta^3)^{t_1} (\alpha\beta^2)^{t_2} (\alpha^2\beta)^{3x_{i-1} - 3t_1 - 2t_2 - w} \\ & \cdot (\alpha^3)^{2t_1 + t_2 - 2x_{i-1} + w} (1 - \delta)^{x_i - w}\, \delta \Bigg\}, \end{split}$$

where *β* is given in Remark 1. For higher orders of *m*, the formula is rather tedious and is omitted here. For EB-INAR(*p*), CML estimation is too complicated, but the two-step CLS estimation remains quite feasible; the procedure is similar to the case $p = 1$. For this reason, we only consider EB-INAR(1) in the simulation studies.
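
Rather than coding the $C_m(n, r)$ sums explicitly, the transition probabilities can be built numerically by convolving the counting-series pmf $x_{i-1}$ times with the innovation pmf. A sketch of the conditional negative log-likelihood for Poisson innovations (names are ours; SciPy is assumed), which can then be passed to a numerical optimizer such as `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import poisson

def eb_beta(m, alpha):
    # non-trivial root of beta^m - alpha^m = beta - alpha (restriction (4))
    f = lambda b: b**m - b - (alpha**m - alpha)
    bstar = (1.0 / m) ** (1.0 / (m - 1))
    if abs(alpha - bstar) < 1e-12:
        return alpha
    lo, hi = (bstar, 1.0) if alpha < bstar else (0.0, bstar)
    return brentq(f, lo, hi)

def negloglik(params, x, m):
    # conditional negative log-likelihood of EB-INAR(1) with Poisson(delta)
    # innovations; the transition pmf is the convolution of x_{i-1} copies of
    # the EB(m, 1, alpha) counting-series pmf with the innovation pmf
    alpha, delta = params
    if not (0.0 < alpha < 1.0) or delta <= 0.0:
        return np.inf
    beta = eb_beta(m, alpha)
    z = np.arange(m)
    u = alpha**z * beta**(m - 1 - z)          # pmf of one counting variable
    kmax = int(x.max()) + 1
    eps = poisson.pmf(np.arange(kmax + 1), delta)
    ll = 0.0
    for xp, xc in zip(x[:-1].astype(int), x[1:].astype(int)):
        thin = np.array([1.0])                # pmf of (m, alpha) ⊚ xp
        for _ in range(xp):
            thin = np.convolve(thin, u)
        p = np.convolve(thin, eps)            # add the innovation
        ll += np.log(max(p[xc], 1e-300))
    return -ll
```

For example, `minimize(negloglik, [0.3, 1.0], args=(x, 3), method="Nelder-Mead")` would play the role of the R package maxLik used in the paper.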

#### *3.3. Simulation*

A Monte Carlo simulation study was conducted to evaluate the finite sample performance of the estimators. For CLS estimation, we used the R package BB, designed for solving and optimizing large-scale nonlinear systems, to solve Equations (4) and (8). For CML estimation, we used the R package maxLik to maximize the log-likelihood function.

We considered the following configurations of the parameters:


In the simulations, we chose sample sizes $n = 100, 200$ and $400$ with $M = 500$ replications for each choice of parameters. The root mean squared error (RMSE) was calculated to evaluate the performance of the estimators according to the formula

$$\mathrm{RMSE} = \sqrt{\frac{1}{M-1}\sum_{j=1}^{M}(\hat{f}_j - f_0)^2},$$

where $\hat{f}_j$ is the estimate of $f_0$ in the *j*th replication.

For the CLS estimates, the solutions of (4) and (8) are sensitive to $\hat{\mu}$ and $\hat{\sigma}^2$, so we adopted the following procedure: first, calculate 500 groups of $\hat{\mu}$ and $\hat{\sigma}^2$ estimates; then use their mean values to solve Equations (4) and (8). The simulation results for CLS are summarized in Table 1. We found that the estimates get closer to the true values and the RMSE values gradually decrease as the sample size increases.


**Table 1.** Means of estimates, RMSEs (within parentheses) by CLS.

Note: RMSE, root mean squared error.

As it is somewhat difficult to estimate the parameter *m* in CML estimation, we treated *m* as known. The simulation results for the CML estimators are given in Table 2. In all cases, the estimates generally show small RMSE values, and the RMSE values gradually decrease as the sample size increases.


**Table 2.** Means of estimates, RMSEs (within parentheses) by CML.

Note: RMSE, root mean squared error.

#### **4. Real Data Examples**

In this section, three real data sets, covering both overdispersed and underdispersed settings, are considered to illustrate the better performance of the proposed model. The first example is overdispersed crime data from Pittsburgh; the second is overdispersed stock data from the New York Stock Exchange (NYSE); and the third is underdispersed crime data from Pittsburgh, which was also analyzed by [15]. As is well known, forecasting plays an important role in evaluating time series models, so we first introduce two forecasting criteria and other preparations.

#### *4.1. Forecasting*

Before introducing the evaluation criteria, we briefly describe the basic procedure. First, we divide the $n_1 + n_2$ observations into two parts: the training set with the first $n_1$ observations and the prediction set with the last $n_2$ observations. The training set is used to estimate the parameters and evaluate the fit of the model. We can then evaluate the efficiency of each model by comparing the following criteria between the forecasts and the real data in the prediction set.

Similar to the procedure in [16], which performs an out-of-sample experiment to compare the forecasting performance of two model-based bootstrap approaches, the forecasting procedure is as follows. For each $t = n_1 + 1, \dots, n_1 + n_2 - 5$, we estimate an INAR(1) model on the data $x_1, \dots, x_t$ and use the fitted model to generate the next five forecasts, the 5-step-ahead forecasts $x^F_{t+1}, \dots, x^F_{t+5}$, where $x^F_t$ denotes the forecast at time *t*. In this way we obtain many sequences of 1, 2, ..., 5 step-ahead forecasts; finally, we replicate the whole procedure *P* times. We can then evaluate the point forecast accuracy by the forecast mean square error (FMSE), defined as

$$\text{FMSE} = \frac{1}{P} \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x}_i^F)^2,$$

and forecast mean absolute error (FMAE) defined as

$$\text{FMAE} = \frac{1}{P} \sum_{i=n_1+1}^{n_1+n_2} |x_i - \bar{x}_i^F|,$$

where $x_i$ is the true value of the data, $\bar{x}_i^F$ is the mean of all forecasts at time *i*, and *P* is the number of replicates.
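
Given the *P* replicated forecast paths, both criteria follow in a few lines. The sketch below (names are ours) applies the formulas above literally, with the sum over the prediction set scaled by $1/P$:

```python
import numpy as np

def fmse_fmae(actual, forecasts):
    # actual: the n2 values in the prediction set; forecasts: shape (P, n2),
    # one row per replicate; xbar_i is the mean forecast at time i
    P = forecasts.shape[0]
    xbar = forecasts.mean(axis=0)
    fmse = np.sum((actual - xbar) ** 2) / P
    fmae = np.sum(np.abs(actual - xbar)) / P
    return fmse, fmae
```

Lower values of either criterion indicate better point forecast accuracy.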

#### *4.2. Overdispersed Cases*

We consider two overdispersed data sets. The first contains 144 observations and represents monthly tallies of crime data from the Forecasting Principles website (http://www.forecastingprinciples.com); these crimes were reported in police car beats in Pittsburgh from January 1990 to December 2001. The second is the Empire District Electric Company (EDE) data set from the Trades and Quotes (TAQ) database of the NYSE, which contains 300 observations and was also analyzed by [17].

#### 4.2.1. P1V Data

The 45th P1V (Part 1 Violent Crimes) series contains crimes of murder, rape, robbery and other kinds; see the data dictionary on the Forecasting Principles website for more details. Figure 1 shows the time series plot, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the 45th P1V series. The maximum value of the data is 15 and the minimum is 0; the mean is 4.3333 and the variance is 7.4685. The ACF plot shows that the data are dependent, and in the PACF plot only the first lag is significant, which strongly suggests an INAR(1) model.

First, we divided the data set into two parts: the training set with the first $n_1 = 134$ observations and the prediction set with the last $n_2 = 10$. We fitted the training set with the following models: the expectation thinning INAR(1) (ETINAR(1)) model in [9], the GSC thinning INAR(1) (GSCINAR(1)) model in [10], the binomial thinning INAR(1) model and the EB thinning EB-INAR(1) models with $m = 3, 4$. According to the mean and variance of the P1V data, we used one of the most common settings, the geometric distribution, for the innovations in the above models.

In order to compare the effectiveness of the models, we consider the following evaluation criteria: (1) the AIC; (2) the mean and standard error of the Pearson residuals $r_t$ and the related Ljung–Box statistics, where the Pearson residuals are defined as

$$r_t = \frac{X_t - \hat{\mu} X_{t-1} - \hat{\mu}_\epsilon}{(\hat{\sigma}^2 X_{t-1} + \hat{\sigma}_\epsilon^2)^{1/2}}, \quad t = 1, 2, \dots,$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ are the estimated expectation and variance of the corresponding thinning operator; (3) three goodness-of-fit statistics: RMS (root mean square error), MAE (mean absolute error) and MdAE (median absolute error), where the error is defined by $X_t - E(X_t|X_{t-1})$, $t = 1, \dots, n_1$; and (4) the mean $\hat{x}$ of the training set implied by the estimated model.
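
Once the conditional mean and variance parameters have been estimated, the Pearson residuals above are immediate to compute; a small helper (argument names are ours):

```python
import numpy as np

def pearson_residuals(x, mu_hat, mu_eps_hat, sig2_hat, sig2_eps_hat):
    # r_t: one-step prediction errors standardized by the fitted
    # conditional mean and variance of the INAR(1)-type model
    xt, xlag = x[1:], x[:-1]
    return (xt - mu_hat * xlag - mu_eps_hat) / np.sqrt(sig2_hat * xlag + sig2_eps_hat)
```

The Ljung–Box statistic mentioned above can then be applied to the resulting residual series to check for remaining autocorrelation.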

Next, focusing on forecasting, we generated $P = 100$ replicates of the prediction set based on the training set for each model, and then calculated the FMSE and FMAE for each model.
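This forecasting exercise can be sketched as below. The paper does not restate the FMSE/FMAE formulas here, so the scoring function assumes both criteria average the squared/absolute forecast errors over replicates and horizons; the simulator uses the classical binomial thinning INAR(1) with Poisson innovations as a stand-in for the fitted models:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_binomial_inar1(x0, alpha, lam, n, rng):
    """One simulated path of length n from a binomial-thinning INAR(1)
    with Poisson(lam) innovations, started at x0.  A stand-in simulator;
    for the paper's models the EB thinning sampler would be used."""
    out = np.empty(n, dtype=int)
    prev = x0
    for t in range(n):
        prev = rng.binomial(prev, alpha) + rng.poisson(lam)
        out[t] = prev
    return out

def forecast_scores(sim_paths, actual):
    """FMSE/FMAE over P simulated forecast paths, assumed here to be the
    mean squared/absolute error over all replicates and horizons."""
    err = np.asarray(sim_paths, dtype=float) - np.asarray(actual, dtype=float)
    return {"FMSE": np.mean(err ** 2), "FMAE": np.mean(np.abs(err))}

# P = 100 replicates of an n2 = 10 step prediction set (placeholder values).
paths = [simulate_binomial_inar1(4, 0.5, 2.0, 10, rng) for _ in range(100)]
print(forecast_scores(paths, [4] * 10))
```
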

All results of the fitted models are given in Table 3. There is no evidence of any correlation within the residuals of any of the five models, which is also supported by the Ljung–Box statistic based on 15 lags (since the upper 0.05 critical value is $\chi^2_{0.05}(14) = 23.6847$). There were no significant differences in the RMS, MAE, MdAE and $\hat{x}$ values (the true mean of the 134 training observations was 4.3880) among the models. In other words, no model performed best in terms of these four criteria, so we also considered the AIC. Since the CML estimator cannot be adopted for the GSCINAR(1), only the other criteria can be compared for it.

Considering the fit on the training set, the EB-INAR(1) with *m* = 3 has the smallest AIC, and the EB-INAR(1) with *m* = 4 has almost the same AIC. Turning to forecasting, the EB-INAR(1) with *m* = 4 has the smallest FMSE and the second smallest FMAE among all models, while the EB-INAR(1) with *m* = 3 has the second smallest FMSE and the smallest FMAE. Based on these results, we conclude that the EB-INAR(1) with *m* = 3, 4 performs better than the INAR(1), ETINAR(1) and GSCINAR(1).

**Figure 1.** The data, autocorrelation function (ACF) and partial autocorrelation function (PACF) of the 45th P1V series.

#### 4.2.2. Stock Data

We analyzed another overdispersed data set, from the Empire District Electric Company (EDE), taken from the Trades and Quotes (TAQ) data set of the NYSE. The data are the numbers of trades in 5 min intervals between 9:45 a.m. and 4:00 p.m. in the first quarter of 2005 (3 January–31 March 2005, 61 trading days). Here we analyze the portion of the data covering the first to fourth trading days. As there are 75 5 min intervals per day, the sample size is $T = 300$.

Figure 2 shows the time series plot, the ACF and the PACF of the EDE series. The maximum value of the data is 25 and the minimum is 0; the mean is 4.6933 and the variance is 14.1665. Based on the time series plot, the series does not seem to be completely stationary, with several outliers or influential observations. Zhu et al. [18] analyzed Poisson autoregression for stock transaction data with extreme values, which could also be considered in the current setting. From the ACF plot, we found that the data are serially dependent. From the PACF plot, we can see that only the first lag is significant, which strongly suggests an INAR(1) model. We used the same procedures and criteria as before, and the geometric distribution as the distribution of the innovation in the above models.

We first divided the data set into two parts: the training set with the first $n_1 = 270$ observations and the prediction set with the last $n_2 = 30$ observations. All results of the fitted models are given in Table 4. Among all models, the EB-INAR(1) with *m* = 4 has the smallest AIC, and there is no evidence of any correlation within the residuals of any of the five models, which is also supported by the Ljung–Box statistic based on 15 lags. There are no significant differences in the RMS, MAE, MdAE and $\hat{x}$ values (the true mean of the 270 training observations was 4.3407) among the considered models. For the prediction results, the EB-INAR(1) with *m* = 4 has the smallest FMSE and FMAE among all models. Based on the above results, we conclude that the EB-INAR(1) with *m* = 4 performs best for this data set.


**Figure 2.** The data, ACF and PACF of the first to fourth trading days of the EDE series.




#### *4.3. Underdispersed Case*

The 11th FAMVIOL data set contains crimes of family violence and can also be obtained from the Forecasting Principles website. Figure 3 shows the time series plot, the ACF and the PACF of the 11th FAMVIOL series. The maximum value of the data is 3 and the minimum is 0; the mean is 0.4027 and the variance is 0.3820, so the series is underdispersed. We use the procedures and criteria of Section 4.2.1 to compare the different models. According to the mean and variance of the FAMVIOL data, we use one of the most common settings, the Poisson distribution, as the distribution of the innovation in the above models.

All results of the fitted models are given in Table 5. There is no evidence of any correlation within the residuals of any of the five models, which is also supported by the Ljung–Box statistic based on 15 lags. There are no significant differences among the models in the criteria for fit and forecasting. The ETINAR(1), with the biggest AIC, performed the worst among these models.

Now let us give a brief summary. For the P1V data and the stock data, which are overdispersed with moderately high counts, the EB-INAR(1) with *m* > 2 is clearly better than that with *m* = 2. For the FAMVIOL data, which are underdispersed with small counts, the EB-INAR(1) with *m* > 2 is also competitive.


**Figure 3.** The data, ACF and PACF of the 11th data set of the FAMVIOL series.



**Table 5.** Fitting results, AIC and some characteristics of FAMVIOL data.

#### **5. Conclusions**

This paper proposed an EB-INAR(1) model based on the newly constructed EB thinning operator, which extends the thinning-based INAR models. We gave an estimation method for the parameters and established the asymptotic properties of the estimators in the innovation-free case. The simulations and real data analyses showed that the EB-INAR(1) model can accurately and flexibly capture the dispersion features of the data, which demonstrates its effectiveness and practicality. Compared with other models, such as the ETINAR(1) and GSCINAR(1), our model is competitive.

We point out that many existing integer-valued models can be generalized by replacing the binomial thinning operator with the EB thinning operator, such as the models in [19–23]. In addition, the first-order INAR model considered here can be extended to higher-order ones. These topics will be studied in future research.

**Author Contributions:** Conceptualization, Z.L. and F.Z.; methodology, F.Z.; software, Z.L.; validation, Z.L. and F.Z.; formal analysis, Z.L.; investigation, Z.L. and F.Z.; resources, F.Z.; data curation, Z.L.; writing—original draft preparation, Z.L.; writing—review and editing, F.Z.; visualization, Z.L.; supervision, F.Z.; project administration, F.Z.; funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** Zhu's work is supported by National Natural Science Foundation of China grant numbers 11871027 and 11731015, and Cultivation Plan for Excellent Young Scholar Candidates of Jilin University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Proof of Proposition 1.** Since parts 1–4 are easy to verify, we only prove part 5. By the law of total covariance, we have

$$\begin{split} \operatorname{Cov}(X_t, X_{t-h}) &= \operatorname{Cov}\big(E(X_t|X_{t-1},\dots),\, E(X_{t-h}|X_{t-1},\dots)\big) + E\big(\operatorname{Cov}(X_t, X_{t-h}|X_{t-1},\dots)\big) \\ &= \operatorname{Cov}(\mu X_{t-1} + \mu_\varepsilon,\, X_{t-h}) \\ &\quad + E\Big(E\big[(X_t - E(X_t|X_{t-1},\dots))(X_{t-h} - E(X_{t-h}|X_{t-1},\dots))\,\big|\,X_{t-1},\dots\big]\Big) \\ &= \mu \cdot \operatorname{Cov}(X_{t-1}, X_{t-h}) + 0 \\ &= \cdots \\ &= \mu^{h} \cdot \operatorname{Var}(X_{t-h}). \end{split}$$

Thus, the autocorrelation function satisfies $\operatorname{Corr}(X_t, X_{t-h}) = \mu^h$.
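This geometric decay is easy to check numerically. The sketch below uses the classical binomial thinning INAR(1) with Poisson innovations (for which the thinning mean $\mu$ equals the thinning parameter $\alpha$) rather than the EB operator, whose sampler is defined earlier in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_inar1(alpha, lam, n, burn=500):
    """Binomial-thinning INAR(1) with Poisson(lam) innovations.  For this
    operator the thinning mean mu equals alpha, so part 5 of Proposition 1
    predicts Corr(X_t, X_{t-h}) = alpha ** h."""
    x = 0
    out = np.empty(n, dtype=int)
    for t in range(-burn, n):  # discard a burn-in to reach stationarity
        x = rng.binomial(x, alpha) + rng.poisson(lam)
        if t >= 0:
            out[t] = x
    return out

def sample_acf(x, h):
    """Sample autocorrelation of the series x at lag h."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(xc[h:], xc[:-h]) / np.dot(xc, xc)

x = simulate_inar1(alpha=0.5, lam=2.0, n=100_000)
for h in (1, 2, 3):
    print(h, round(sample_acf(x, h), 3), 0.5 ** h)  # sample ACF vs. mu**h
```
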

**Proof of Propositions 2 and 3.** Propositions 2 and 3 are analogous to Theorems 1 and 2 in [8] and can be proved by verifying the regularity conditions of Theorems 3.1 and 3.2 in [24]. For instance, in the proof of Proposition 2, the requirement in [24] that the partial derivatives $\partial g(\alpha, F_{m-1})/\partial \alpha_i$ have finite fourth moments corresponds to Assumption 2 in Section 3.1, and $u_m^2(\alpha)$ in [24] corresponds to $q_{1t}(\theta_1)$ in Step 1.1. Hence, Proposition 2 can be regarded as a direct consequence of Theorem 3.2.

In addition, the proof of Proposition 3 is similar to that of Proposition 2; the procedure is almost the same as that of Theorem 3.2 in [24].

**Proof of Proposition 4.** Similarly to the theorem in [25] and based on Theorem 3.2 in [14], we have

$$
\sqrt{n}(\theta_{CLS} - \theta_0) \xrightarrow{L} N(0, \Omega),
$$

where

$$
\Omega = \begin{pmatrix}
V_1^{-1} W_1 V_1^{-1} & V_1^{-1} M V_2^{-1} \\
V_2^{-1} M^\top V_1^{-1} & V_2^{-1} W_2 V_2^{-1}
\end{pmatrix}.
$$

Based on the proof of the theorem in [25], $\dot{q}_{1t}(\theta_1)$ corresponds to $u_t$ and $\dot{q}_{2t}(\theta_2)$ corresponds to $U_t$ in [25]. Based on the results for $V_1$, $W_1$ in Proposition 2 and $V_2$, $W_2$ in Proposition 3, we can obtain
$$
M = E\left[\dot{q}_{11}(\theta_{1,0})\,\dot{q}_{21}(\theta_{2,0})\,\frac{\partial}{\partial \theta_1} g_1(\theta_{1,0}, X_0)\,\frac{\partial}{\partial \theta_2^\top} g_2(\theta_{2,0}, X_0)\right].
$$

**Proof of Proposition 5.** Since the solutions $(h_1(\mu, \sigma^2), h_2(\mu, \sigma^2))$ for $(m, \alpha)$ in (3.1) are real-valued and have nonzero differentials, Proposition 5 is an application of the $\delta$-method; see, for example, Theorem A on p. 122 of [26].
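For reference, the first-order $\delta$-method invoked here reads, in its standard multivariate form (generic notation, not tied to the paper's symbols, with $J_h$ the Jacobian of $h$):

```latex
% If \sqrt{n}\,(T_n - \theta) \xrightarrow{L} N(0, \Sigma) and h is
% differentiable at \theta with Jacobian J_h(\theta), then
\sqrt{n}\,\bigl(h(T_n) - h(\theta)\bigr)
  \xrightarrow{L} N\bigl(0,\; J_h(\theta)\,\Sigma\,J_h(\theta)^\top\bigr).
```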

#### **References**

