*Article* **A New Overdispersed Integer-Valued Moving Average Model with Dependent Counting Series**

**Kaizhi Yu and Huiqiao Wang \***

School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China; yukz@swufe.edu.cn

**\*** Correspondence: 118020209004@smail.swufe.edu.cn

**Abstract:** A new integer-valued moving average model is introduced. The assumption of independent counting series in the model is relaxed to allow dependence between them, which leads to overdispersion in the model. Statistical properties were established for this new integer-valued moving average model with dependent counting series. The Yule–Walker method was applied to estimate the model parameters. The estimator's performance was evaluated using simulations, and the overdispersion test of the INMA(1) process was applied to examine the dependence between counting series.

**Keywords:** integer-valued moving average model; counting series; dispersion test

#### **1. Introduction**

Integer-valued time series can be encountered in numerous fields, such as epidemiology, insurance, and intraday stock transactions. The most widely used model is the integer-valued autoregressive (INAR) model, a recursive model first introduced by Alzaid and Al-Osh [1] that is similar to the traditional autoregressive (AR) model. Du and Li [2] generalized the model to the *p*-th order (called the INAR(*p*) model) and proved the ergodic and Markov properties of the model. Similar to the continuous-valued moving average model, the *q*-th order integer-valued moving average model INMA(*q*) was introduced by Al-Osh and Alzaid [3], in a slightly different form from that proposed by McKenzie [4].

Many researchers generalize the INAR model to deal with different real-life situations. Weiß [5] presented a new INAR(*p*) model showing possible marginal distributions of the DSD family. This model overcomes the difficulty of choosing the appropriate marginal distribution. Monteiro and Scotto [6] defined the periodic integer-valued autoregressive model, driven by a periodic sequence of independent Poisson-distributed random variables. Weiß [7] proposed the extended Poisson INAR(1) model, where the innovations are assumed to be serially dependent. Zhu [8] introduced a negative binomial INGARCH model to handle integer-valued time series with overdispersion and potential extreme observations. The study by Weiß [9] discussed threshold models for integer-valued time series with infinite range and briefly discussed new models for counting data time series with a finite range. Kang and Wang [10] generalized the mixture INAR(1) model based on mixing Pegram and binomial thinning operator. Li and Wang [11] proposed the first-order mixed integer-valued autoregressive process with zero-inflated generalized power series innovations, which contains the commonly used zero-inflated Poisson and geometric distributions. To handle the non-stationary integer-valued time series with a large dispersion, Kim and Park [12] introduced an integer-valued autoregressive process with a signed binomial thinning operator (INARS(*p*)).

Various modified thinning operators have been proposed to capture the specificity of real data, and many new INAR-type models have been defined. For example, Zheng and Basawa [13] introduced the random coefficient thinning operator, while Ristić and Bakouch [14] proposed the negative binomial thinning operator. The *p*-th order integer-valued autoregressive process with signed generalized power series thinning operator was proposed by Zhang et al. [15].

**Citation:** Yu, K.; Wang, H. A New Overdispersed Integer-Valued Moving Average Model with Dependent Counting Series. *Entropy* **2021**, *23*, 706. https://doi.org/10.3390/e23060706

Academic Editor: Christian H. Weiss

Received: 30 April 2021; Accepted: 31 May 2021; Published: 2 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The most significant generalization of the thinning operator was made by Ristić et al. [16], who called it the dependent Bernoulli thinning operator. They constructed a new sequence of Bernoulli random variables that allows correlation between the counting series. Based on this, Miletić Ilić et al. [17] proposed a new model based on a mix of the regular binomial thinning and the dependent thinning operator. For more details on thinning operators, refer to Weiß [18].

MA-type models are very important in time series analysis, and the method of moving averages is widely popular in statistical and mathematical analyses. Some researchers study forecasting with moving average methods. Winters [19] analyzed the exponentially weighted moving average method for forecasting sales. Cox [20] proposed the weighted moving average method to predict a Markov series. Landauskas et al. [21] introduced an algebraic approach to selecting weight coefficients for weighted moving averages that perform better than classical moving average predictors. Wind speed prediction using a combined time series model and neural network was studied by Nan et al. [22]. Others applied weighted moving average methods to control charts. Alevizakos et al. [23] proposed the triple exponentially weighted moving average (TEWMA) control chart, which improves the detection ability of the classical control chart. Capizzi and Masarotto [24] proposed the adaptive exponentially weighted moving average (AEWMA) control chart, which weights past observations of the monitored process using a suitable function of the current error. Adegoke et al. [25] studied the multivariate homogeneously weighted moving average (MHWMA) control chart for monitoring a process mean vector.

Some researchers constructed MA-type models from the perspective of count time series. Brännäs and Quoreshi [26] used the INMA process to model the number of transactions for intraday stocks and extended the model to include explanatory variables. Brännäs and Hall [27] mainly focused on estimation in the INMA model. The construction of the thinning operation in these studies is based on independent counting series, which is a strong assumption. Thus, to make the model more flexible in capturing the specificity of different data types, this assumption should be relaxed.

For small counts of intraday stock transactions per minute, the decision of a buyer or seller can be affected by public news, which means decisions from different individuals may be correlated. The change in inventories can sometimes be described as an INMA process; however, during a certain period, the change can be influenced by the same external factor. When the INMA model is applied to a discrete risk model, the numbers of claims for certain insurance products will be affected by the same factors, such as natural disasters, so these claims are no longer independent. The independence between counting series should therefore be relaxed. Allowing correlation between counting series is natural, and an INMA model based on dependent counting series can be derived to handle different real data situations.

The rest of the paper is organized as follows. Section 2 presents the model construction and discusses some relevant statistical properties, Section 3 discusses the estimation of the unknown parameters, Section 4 shows the numerical simulation results and gives a dispersion test for dependence between counting series, and the conclusions are given in Section 5.

#### **2. The Model and Basic Properties**

#### *2.1. The Model Construction*

The counting series $\{U_i\}_{i\in\mathbb{N}}$ of the integer-valued model is defined as:

$$U_i = (1 - V_i)W_i + V_i Z$$

$\{W_i\}_{i\in\mathbb{N}}$ is a sequence of i.i.d. random variables with $W_i \sim B(1, \beta)$, $\beta \in [0, 1]$; $\{V_i\}_{i\in\mathbb{N}}$ is a sequence of i.i.d. random variables with $V_i \sim B(1, \theta)$, $\theta \in [0, 1]$; and $Z$ is a single random variable, shared by all $i$, with $Z \sim B(1, \beta)$. The operator $\circ_\theta$ is defined by $\beta \circ_\theta \varepsilon_t = \sum_{i=1}^{\varepsilon_t} U_i$; it is the dependent Bernoulli thinning operator, where $\varepsilon_t$ is a non-negative integer-valued random variable. Based on this construction, we can easily verify that $E(U_i) = \beta$, $Var(U_i) = \beta(1-\beta)$, and $corr(U_i, U_j) = \theta^2$, so the counting series are dependent, as desired. We now generalize the dependent counting series to the INMA(*q*) process. For convenience, we write $\circ$ instead of $\circ_\theta$ to simplify the notation.
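As a quick numerical check of this construction, the following sketch (assuming NumPy; the seed and parameter values are illustrative, not from the paper) draws many pairs $(U_1, U_2)$ with a shared $Z$ and verifies the stated moments $E(U_i) = \beta$, $Var(U_i) = \beta(1-\beta)$, and $corr(U_i, U_j) = \theta^2$.

```python
import numpy as np

# Draw many pairs (U_1, U_2) from U_i = (1 - V_i) * W_i + V_i * Z,
# where Z is a single B(1, beta) draw shared within each pair.
rng = np.random.default_rng(1)
beta, theta, reps = 0.4, 0.6, 200_000

W = rng.binomial(1, beta, (reps, 2))   # i.i.d. B(1, beta)
V = rng.binomial(1, theta, (reps, 2))  # i.i.d. B(1, theta)
Z = rng.binomial(1, beta, (reps, 1))   # shared Z, broadcast over the pair
U = (1 - V) * W + V * Z

print(U.mean())                # ~ beta = 0.4
print(U.var())                 # ~ beta * (1 - beta) = 0.24
print(np.corrcoef(U.T)[0, 1])  # ~ theta**2 = 0.36
```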

**Definition 1** (Dependent Counting Series Integer-Valued Moving Average Model (DCINMA))**.** *The DCINMA(q) model is defined as:*

$$X_t = \varepsilon_t + \beta_1 \circ \varepsilon_{t-1} + \cdots + \beta_q \circ \varepsilon_{t-q}$$

*where each* $\beta_j \circ \varepsilon_{t-j}$, $j = 1, 2, \ldots, q$, *is a dependent Bernoulli thinning operation, and the following conditions are satisfied:*

*A1.* $\{\varepsilon_t\}_{t\in\mathbb{N}}$ *is a sequence of i.i.d. non-negative integer-valued random variables.*

*A2. The counting variables* $\{U_i\}_{i\in\mathbb{N}}$ *are independent of* $\varepsilon_t$ *for any* $i, t$.

*A3. The thinnings* $\beta_j \circ \varepsilon_{t-j}$, $j = 1, 2, \ldots, q$, *are mutually independent.*
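The definition can be simulated directly. The sketch below (NumPy assumed; the function names, seed, and parameter values are our own, for illustration only) generates a Poisson DCINMA(*q*) path, performing each thinning independently as condition A3 requires, and checks the sample mean against $\lambda(1 + \sum_i \beta_i)$.

```python
import numpy as np

def thin(beta, theta, eps, rng):
    """Dependent Bernoulli thinning beta ∘ eps: a sum of eps dependent
    B(1, beta) counting variables that share a single Z draw."""
    if eps == 0:
        return 0
    W = rng.binomial(1, beta, eps)
    V = rng.binomial(1, theta, eps)
    Z = rng.binomial(1, beta)
    return int(((1 - V) * W + V * Z).sum())

def simulate_dcinma(n, betas, theta, lam, rng):
    """Simulate X_t = eps_t + sum_j beta_j ∘ eps_{t-j} with Poisson(lam)
    innovations; each thinning is drawn independently (condition A3)."""
    q = len(betas)
    eps = rng.poisson(lam, n + q)
    path = []
    for t in range(q, n + q):
        x = int(eps[t])
        for j, b in enumerate(betas):
            x += thin(b, theta, eps[t - 1 - j], rng)
        path.append(x)
    return np.array(path)

rng = np.random.default_rng(7)
x = simulate_dcinma(5_000, betas=[0.2], theta=0.6, lam=1.0, rng=rng)
print(x.mean())  # ~ lam * (1 + beta) = 1.2
```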

*2.2. The Numerical Properties for DCINMA(q) Model*

We denote by $\mu_\varepsilon$ and $\sigma_\varepsilon^2$ the mean and variance of the innovation term $\varepsilon_t$.

**Theorem 1.** *The numerical characteristics of* {*Xt*} *in Definition 1 are as follows:*

$$\begin{aligned}
(i)\quad & E(X_t) = \mu_\varepsilon\Big(1 + \sum_{i=1}^{q}\beta_i\Big) \\
(ii)\quad & Var(X_t) = \sigma_\varepsilon^2 + \sum_{i=1}^{q}\Big[\mu_\varepsilon\beta_i + (\mu_\varepsilon^2 + \sigma_\varepsilon^2)\,\theta^2\beta_i(1-\beta_i) - \mu_\varepsilon\big(\theta^2\beta_i - \theta^2\beta_i^2 + \beta_i^2\big) + \beta_i^2\sigma_\varepsilon^2\Big] \\
(iii)\quad & cov(X_t, X_{t-k}) = \begin{cases} \sigma_\varepsilon^2\sum_{i=0}^{q-k}\beta_i\beta_{i+k}, & k = 1,\ldots,q \\ 0, & k \ge q+1 \end{cases} \qquad (\beta_0 = 1)
\end{aligned}$$

**Proof.** See Appendix A.

**Theorem 2.** *Let* $\{X_t\}$ *be the process defined in Definition 1; then* $\{X_t\}$ *is a covariance stationary process.*

**Proof.** It can be seen from Theorem 1 that the unconditional mean and unconditional variance of $X_t$ are finite constants given the distribution of $\varepsilon_t$, and that the autocovariance depends only on the lag $k$. Thus, $X_t$ is a covariance stationary process.

**Theorem 3.** *Let* $\{X_t\}$ *be the process defined in Definition 1; then* $\{X_t\}$ *is ergodic in the mean and the autocovariance function.*

**Proof.** See Appendix A.

*2.3. The Probability Generating Functions for DCINMA Model*

Ristić et al. [16] derived the probability generating function of $\sum_{i=1}^{n} U_i$ as follows:

$$\Phi_U(s) = E\big[s^{U_1+U_2+\cdots+U_n}\big] = (1-\beta)\big(1-\beta(1-\theta)(1-s)\big)^n + \beta\big(1-(\beta+\theta-\beta\theta)(1-s)\big)^n$$

The above equation implies that $\sum_{i=1}^{n} U_i$ is distributed as:

$$U_1 + U_2 + \cdots + U_n \sim \begin{cases} Bin\big(n,\ \beta(1-\theta)\big) & w.p.\ 1-\beta \\ Bin\big(n,\ \beta+\theta-\beta\theta\big) & w.p.\ \beta \end{cases}$$
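A small Monte Carlo sketch (NumPy assumed; the seed and values of $n$, $\beta$, $\theta$ are illustrative) confirms that simulated sums of the dependent counting variables match this two-component binomial mixture.

```python
import numpy as np
from math import comb

def mixture_pmf(k, n, beta, theta):
    """P(U_1 + ... + U_n = k): the binomial mixture implied by the PGF."""
    p1, p2 = beta * (1 - theta), beta + theta - beta * theta
    binom = lambda p: comb(n, k) * p**k * (1 - p) ** (n - k)
    return (1 - beta) * binom(p1) + beta * binom(p2)

rng = np.random.default_rng(3)
n, beta, theta, reps = 5, 0.3, 0.5, 200_000

W = rng.binomial(1, beta, (reps, n))
V = rng.binomial(1, theta, (reps, n))
Z = rng.binomial(1, beta, (reps, 1))   # one shared Z per replication
sums = ((1 - V) * W + V * Z).sum(axis=1)

emp = np.bincount(sums, minlength=n + 1) / reps
theo = np.array([mixture_pmf(k, n, beta, theta) for k in range(n + 1)])
print(np.abs(emp - theo).max())  # close to zero
```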

Then, for the DCINMA(1) process, the probability generating function (PGF) of $\{X_t\}$ is:

$$\Phi_{X_t}(s) = \Big[(1-\beta)\cdot\phi_\varepsilon\big(1-\beta(1-\theta)(1-s)\big) + \beta\cdot\phi_\varepsilon\big(1-(\beta+\theta-\beta\theta)(1-s)\big)\Big]\cdot\phi_\varepsilon(s)$$

where $\phi_\varepsilon(s)$ is the probability generating function of $\varepsilon_t$. Given the distribution of $\varepsilon_t$, the explicit expression of the probability generating function can be derived. Suppose $\varepsilon_t$ follows the Poisson($\lambda$) distribution; then

$$\Phi_{X_t}(s) = \Big[(1-\beta)\cdot e^{-\lambda\beta(1-\theta)(1-s)} + \beta\cdot e^{-\lambda(\beta+\theta-\beta\theta)(1-s)}\Big]\cdot e^{\lambda(s-1)}$$

The probability generating function is defined by the probabilities. The uniqueness of a power series expansion implies that the probability generating function in turn defines probabilities.

Therefore, we can derive the probability distribution of $\{X_t\}$. For $X_t = j$, the probability mass function is:

$$P(X_t = j) = \frac{1}{j!}\left[\frac{d^j \Phi_{X_t}(s)}{ds^j}\right]_{s=0}$$

The bivariate probability generating function of $\{X_t\}$ is $\Phi_{X_t, X_{t-1}}(s_1, s_2)$. For the DCINMA(1) process with Poisson innovations, its explicit expression is easy to derive.

$$\begin{aligned} E\big(s_1^{X_t} s_2^{X_{t-1}}\big) &= E\big(s_1^{\beta\circ\varepsilon_{t-1}+\varepsilon_t}\cdot s_2^{\beta\circ\varepsilon_{t-2}+\varepsilon_{t-1}}\big) \\ &= E\big(s_1^{\beta\circ\varepsilon_{t-1}}\cdot s_1^{\varepsilon_t}\cdot s_2^{\beta\circ\varepsilon_{t-2}}\cdot s_2^{\varepsilon_{t-1}}\big) \\ &= E\big(s_1^{\varepsilon_t}\big)\cdot E\big(s_2^{\beta\circ\varepsilon_{t-2}}\big)\cdot E\big(s_1^{\beta\circ\varepsilon_{t-1}}\cdot s_2^{\varepsilon_{t-1}}\big) \end{aligned}$$

Given the Poisson distribution of the innovation term, we can obtain:

$$E\big(s_1^{\varepsilon_t}\big) = e^{\lambda(s_1-1)}, \qquad E\big(s_2^{\beta\circ\varepsilon_{t-2}}\big) = (1-\beta)\cdot e^{-\lambda\beta(1-\theta)(1-s_2)} + \beta\cdot e^{-\lambda(\beta+\theta-\beta\theta)(1-s_2)}$$

$$E\big(s_1^{\beta\circ\varepsilon_{t-1}}\cdot s_2^{\varepsilon_{t-1}}\big) = (1-\beta)\cdot e^{\lambda\big[s_2\big(1-\beta(1-\theta)(1-s_1)\big)-1\big]} + \beta\cdot e^{\lambda\big[s_2\big(1-(\beta+\theta-\beta\theta)(1-s_1)\big)-1\big]}$$

#### *2.4. Comparison with the INMA(q) Model*

The *q*-th order integer-valued moving average model has the same expressions for the mean and the covariance, while the variance of the INMA(*q*) process is:

$$V_{inma} = \sigma_\varepsilon^2 + \sum_{i=1}^{q}\big[\sigma_\varepsilon^2\beta_i^2 + \mu_\varepsilon\beta_i(1-\beta_i)\big]$$

From Theorem 1, the variance of the DCINMA process has a more complicated expression than that of the INMA(*q*) process, due to the correlation between the counting series introduced by the parameter *θ*. For Poisson innovations, the dispersion indices of the INMA(1) and DCINMA(1) models are as follows:

$$I_{inma} = 1, \qquad I_{dinma} = \frac{1+\beta+\beta\lambda\theta^2(1-\beta)}{1+\beta}$$

Since the values of $\lambda$, $\beta$, and $\theta$ are all non-negative, $1+\beta+\beta\lambda\theta^2(1-\beta) \ge 1+\beta$, so $I_{dinma} \ge 1$. When the two models (INMA(1) and DCINMA(1)) share the same $\lambda$ and $\beta$, the parameter $\theta$ determines whether there is dependence between the counting series ($\theta \ne 0$). Thus, testing $\theta = 0$ is equivalent to evaluating whether the model is overdispersed. If $\theta = 0$, the model degenerates to the INMA model.
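The index $I_{dinma}$ is simple to evaluate numerically; the snippet below (illustrative parameter values of our own choosing) shows that any $\theta > 0$ pushes the index above 1, while $\theta = 0$ recovers the equidispersed INMA(1) case.

```python
def dispersion_index_dcinma(lam, beta, theta):
    """Overdispersion index of the Poisson DCINMA(1) process."""
    return (1 + beta + beta * lam * theta**2 * (1 - beta)) / (1 + beta)

print(dispersion_index_dcinma(lam=1.0, beta=0.2, theta=0.6))  # ~ 1.048 > 1
print(dispersion_index_dcinma(lam=1.0, beta=0.2, theta=0.0))  # 1.0: INMA(1) case
```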

#### *2.5. Comparison of Entropy with the INMA(1) Model*

Entropy is an important concept in physics, but it can also be applied to other disciplines, including cosmology and economics. Entropy is a measure of the randomness or disorder of a system; in our case, it can be seen as a dispersion measure for the model. Thus, we evaluate the model from the perspective of entropy. The definition of Shannon entropy is as follows:

$$H(\mathcal{Y}) = -\sum\_{i=1}^{n} p(y\_i) \cdot \ln p(y\_i)$$

where *Y* is a discrete random variable with probability mass function taking values on $y_1, \ldots, y_n$. We denote by $\phi_{dinma}(s)$ and $\phi_{inma}(s)$ the probability generating functions of the DCINMA(1) and INMA(1) processes, respectively. Suppose the innovation terms of the two models follow the same Poisson distribution. We can rewrite their probability generating functions as:

$$\begin{aligned} \phi_{dinma}(s) &= (1-\beta)\cdot e^{(s-1)\cdot[\lambda\beta(1-\theta)+\lambda]} + \beta\cdot e^{(s-1)\cdot[\lambda(\beta+\theta-\beta\theta)+\lambda]} \\ \phi_{inma}(s) &= e^{(s-1)\cdot(\lambda\beta+\lambda)} \end{aligned}$$

Thus, based on the definition of the probability generating function, we can deduce the distributions of the DCINMA(1) and INMA(1) processes. Let $X_t^{inma(1)}$ and $X_t^{dinma(1)}$ denote observations from the INMA(1) and DCINMA(1) models, respectively.

$$X_t^{inma(1)} \sim Poi(\lambda\beta+\lambda), \qquad X_t^{dinma(1)} \sim \begin{cases} Poi\big(\lambda\beta(1-\theta)+\lambda\big) & w.p.\ 1-\beta \\ Poi\big(\lambda(\beta+\theta-\beta\theta)+\lambda\big) & w.p.\ \beta \end{cases}$$
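Because the DCINMA(1) marginal is a two-component Poisson mixture, its pmf can be evaluated directly. The sketch below (illustrative parameter values; the truncation point is our own choice) computes the truncated pmf and checks that its mean $\lambda(1+\beta)$ and variance $\beta\lambda + \lambda^2\beta(1-\beta)\theta^2 + \lambda$ agree with the moments derived earlier.

```python
import numpy as np
from math import exp, factorial

def dcinma1_pmf(k, lam, beta, theta):
    """Marginal pmf of the Poisson DCINMA(1): a two-component Poisson mixture."""
    l1 = lam * beta * (1 - theta) + lam
    l2 = lam * (beta + theta - beta * theta) + lam
    poi = lambda l: exp(-l) * l**k / factorial(k)
    return (1 - beta) * poi(l1) + beta * poi(l2)

lam, beta, theta = 1.0, 0.2, 0.6
ks = np.arange(60)  # truncation point; the Poisson tails are negligible here
p = np.array([dcinma1_pmf(k, lam, beta, theta) for k in range(60)])

mean = float((ks * p).sum())
var = float(((ks - mean) ** 2 * p).sum())
print(mean)  # ~ lam * (1 + beta) = 1.2
print(var)   # ~ beta*lam + lam**2 * beta * (1 - beta) * theta**2 + lam = 1.2576
```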

The Shannon entropy for both models can be derived as follows:

$$\begin{aligned} H\big(X_t^{inma(1)}\big) &= (\lambda\beta+\lambda)\big[1-\log(\lambda\beta+\lambda)\big] + e^{-(\lambda\beta+\lambda)}\cdot\sum_{k=0}^{\infty}\frac{(\lambda\beta+\lambda)^k\log(k!)}{k!} \\ H\big(X_t^{dinma(1)}\big) &= (1-\beta)\cdot\Big\{\big(\lambda\beta(1-\theta)+\lambda\big)\big[1-\log\big(\lambda\beta(1-\theta)+\lambda\big)\big] \\ &\quad + e^{-(\lambda\beta(1-\theta)+\lambda)}\cdot\sum_{k=0}^{\infty}\frac{(\lambda\beta(1-\theta)+\lambda)^k\log(k!)}{k!}\Big\} \\ &\quad + \beta\cdot\Big\{\big(\lambda(\beta+\theta-\beta\theta)+\lambda\big)\big[1-\log\big(\lambda(\beta+\theta-\beta\theta)+\lambda\big)\big] \\ &\quad + e^{-(\lambda(\beta+\theta-\beta\theta)+\lambda)}\cdot\sum_{k=0}^{\infty}\frac{(\lambda(\beta+\theta-\beta\theta)+\lambda)^k\log(k!)}{k!}\Big\} \end{aligned}$$

The expression of entropy for the DCINMA(1) model is more complicated than the INMA(1) model due to the additional parameter *θ*.
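Rather than evaluating the infinite series, both entropies can be computed numerically from truncated pmfs. The sketch below (parameter values and truncation point are illustrative choices of our own) does so for the two marginal distributions, in nats.

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def entropy(pmf_values):
    """Shannon entropy -sum p ln p over the positive pmf values."""
    p = np.array([v for v in pmf_values if v > 0])
    return float(-(p * np.log(p)).sum())

lam, beta, theta = 1.0, 0.2, 0.6
K = range(80)  # truncation; the Poisson tails beyond 80 are negligible here

h_inma = entropy([poisson_pmf(k, lam * beta + lam) for k in K])

l1 = lam * beta * (1 - theta) + lam
l2 = lam * (beta + theta - beta * theta) + lam
h_dinma = entropy([(1 - beta) * poisson_pmf(k, l1) + beta * poisson_pmf(k, l2)
                   for k in K])

print(h_inma, h_dinma)  # entropies (in nats) of the two marginal distributions
```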

#### **3. Parameter Estimation**

The estimation of the INMA model is complicated. Brännäs and Hall [27] discussed the Yule–Walker estimator, the generalized method of moments (GMM) based on the probability generating function (PGF), and the conditional least squares method. Here, we did not attempt maximum likelihood estimation, which requires density functions that are generally not easily obtained for the INMA model, especially in this dependent situation. The results of the GMM estimator based on the PGF are highly sensitive to the values of $z_1$ and $z_2$ in $\Phi_k(z_1, z_2)$, so they are not stable. On the other hand, for the conditional least squares method, the number of estimating equations is less than the number of parameters, since there is an additional parameter $\theta$ to be estimated.

Therefore, we derive the Yule–Walker estimator to obtain the unknown parameters of the Poisson DCINMA(*q*) model. We use the following notation: $\mu_X$ is the sample mean of $\{X_t\}$, $\mu_\varepsilon$ is the mean of the innovation term $\varepsilon_t$, $\gamma_0$ is the sample variance of $\{X_t\}$, and $\gamma_1, \gamma_2, \ldots, \gamma_q$ are the sample autocovariances of orders $1, 2, \ldots, q$. In the equations below, $\beta_0 = 1$.

Then, the unknown parameters of the DCINMA(*q*) model can be solved from the sample moment equations, which are as follows:

$$\begin{cases} \mu_X = \mu_\varepsilon(1+\beta_1+\beta_2+\cdots+\beta_q) \\ \gamma_0 = \sigma_\varepsilon^2 + \sum_{i=1}^{q}\big[\mu_\varepsilon\beta_i + (\mu_\varepsilon^2+\sigma_\varepsilon^2)\theta^2\beta_i(1-\beta_i) - \mu_\varepsilon(\theta^2\beta_i - \theta^2\beta_i^2 + \beta_i^2) + \beta_i^2\sigma_\varepsilon^2\big] \\ \gamma_1 = \lambda(\beta_0\beta_1 + \beta_1\beta_2 + \cdots + \beta_{q-1}\beta_q) \\ \gamma_2 = \lambda(\beta_0\beta_2 + \beta_1\beta_3 + \cdots + \beta_{q-2}\beta_q) \\ \quad\vdots \\ \gamma_{q-1} = \lambda(\beta_0\beta_{q-1} + \beta_1\beta_q) \\ \gamma_q = \lambda\beta_0\beta_q \end{cases}$$
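For $q = 1$ with Poisson innovations ($\mu_\varepsilon = \sigma_\varepsilon^2 = \lambda$), the system reduces to three equations with a closed-form solution: $\hat\lambda = \mu_X - \gamma_1$, $\hat\beta = \gamma_1/\hat\lambda$, and $\hat\theta^2$ from the variance equation $\gamma_0 = \lambda + \lambda\beta + \lambda^2\theta^2\beta(1-\beta)$ derived in the Appendix. The sketch below (NumPy assumed; sample size, seed, and parameter values are illustrative) implements this and recovers the true parameters from a simulated path.

```python
import numpy as np

def thin(beta, theta, eps, rng):
    # dependent Bernoulli thinning with one shared Z draw
    if eps == 0:
        return 0
    W = rng.binomial(1, beta, eps)
    V = rng.binomial(1, theta, eps)
    Z = rng.binomial(1, beta)
    return int(((1 - V) * W + V * Z).sum())

def simulate_dcinma1(n, lam, beta, theta, rng):
    eps = rng.poisson(lam, n + 1)
    return np.array([int(eps[t]) + thin(beta, theta, eps[t - 1], rng)
                     for t in range(1, n + 1)])

def yule_walker_dcinma1(x):
    """Closed-form Yule-Walker solution for the Poisson DCINMA(1):
    mu_X = lam*(1+beta), gamma_1 = lam*beta,
    gamma_0 = lam + lam*beta + lam**2 * theta**2 * beta * (1-beta)."""
    x = np.asarray(x, dtype=float)
    mu, g0 = x.mean(), x.var()
    g1 = np.mean((x[:-1] - mu) * (x[1:] - mu))
    lam = mu - g1
    beta = g1 / lam
    theta2 = (g0 - lam - lam * beta) / (lam**2 * beta * (1 - beta))
    return lam, beta, float(np.sqrt(max(theta2, 0.0)))

rng = np.random.default_rng(2)
x = simulate_dcinma1(30_000, lam=1.0, beta=0.2, theta=0.6, rng=rng)
lam_hat, beta_hat, theta_hat = yule_walker_dcinma1(x)
print(lam_hat, beta_hat, theta_hat)  # roughly (1.0, 0.2, 0.6)
```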

#### **4. Simulation Study**

*4.1. Estimation of the Model Parameters*

In this section, we present some simulation results to show the performance of the estimator using different sample sizes (*n* = 100, 300, 700, 1000). We focused on the Poisson DCINMA(1) model. The three groups of parameter values considered are as follows:

> **Model A**: (*λ* = 1, *θ* = 0.6, *β* = 0.2)
> **Model B**: (*λ* = 4, *θ* = 0.7, *β* = 0.1)
> **Model C**: (*λ* = 5, *θ* = 0.5, *β* = 0.1)

We used the above parameter groups to generate the data and applied Yule–Walker method, then computed the bias and standard error based on 10,000 replications for each parameter group. The estimation results and their performance are reported in Table 1.

As shown in Table 1, the estimates are close to the true values in all cases, and the bias decreases as the sample size increases.

**Table 1.** Some numerical results of the estimates for true values of the parameters *λ*, *θ* and *β*.




#### *4.2. Testing for Dependence between Counting Series*

From Section 2.5, when the two processes have the same *λ* and *β*, the DCINMA model always presents overdispersion due to *θ*. We tested whether the value of *θ* is 0, which is equivalent to assessing whether the DCINMA process presents overdispersion. Aleksandrov and Weiß [28] proposed diagnostic tests for the INMA process; in this section, only the overdispersion test was applied. Under the null hypothesis $H_0: \theta = 0$ (the process is not overdispersed), the index of dispersion has the following asymptotic distribution:

$$\hat{I}_{disp} \stackrel{d}{\longrightarrow} N\!\left(1 - \frac{1}{T}\,\frac{1+3\beta}{1+\beta},\ \frac{1}{T}\Big(2 + 4\Big(\frac{\beta}{1+\beta}\Big)^{2}\Big)\right)$$
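The test is easy to implement from this asymptotic law. The sketch below (NumPy assumed; $\beta$ is treated as known for simplicity, and the seed and parameter values are illustrative) computes the dispersion index $\hat I_{disp} = S^2/\bar X$, the standardized statistic, and a one-sided p-value, applied to one simulated null ($\theta = 0$) path and one alternative path.

```python
import numpy as np
from math import sqrt, erf

def thin(beta, theta, eps, rng):
    # theta = 0 reduces to ordinary independent binomial thinning (INMA case)
    if eps == 0:
        return 0
    W = rng.binomial(1, beta, eps)
    V = rng.binomial(1, theta, eps)
    Z = rng.binomial(1, beta)
    return int(((1 - V) * W + V * Z).sum())

def simulate_dcinma1(T, lam, beta, theta, rng):
    eps = rng.poisson(lam, T + 1)
    return np.array([int(eps[t]) + thin(beta, theta, eps[t - 1], rng)
                     for t in range(1, T + 1)])

def overdispersion_test(x, beta):
    """Upper-tail test of H0: theta = 0 using the asymptotic null law of
    the dispersion index I = S^2 / Xbar; beta is treated as known here."""
    T = len(x)
    I = x.var(ddof=1) / x.mean()
    mu0 = 1 - (1 / T) * (1 + 3 * beta) / (1 + beta)
    sd0 = sqrt((2 + 4 * (beta / (1 + beta)) ** 2) / T)
    z = (I - mu0) / sd0
    return I, z, 0.5 * (1 - erf(z / sqrt(2)))  # one-sided p-value

rng = np.random.default_rng(5)
x0 = simulate_dcinma1(1000, lam=7, beta=0.8, theta=0.0, rng=rng)  # null
x1 = simulate_dcinma1(1000, lam=7, beta=0.8, theta=0.7, rng=rng)  # alternative
print(overdispersion_test(x0, 0.8))  # index near 1 under H0
print(overdispersion_test(x1, 0.8))  # inflated index, small p-value
```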

We then analyzed the simulation results to assess the performance of the overdispersion test. The nominal level *α* = 0.05 was employed for sample sizes *n* = 100, 150, 250, and 1000 replications were used to calculate the size and power of the test. The following parameter groups were considered:

> **Model D**: (*λ* = 3, *θ* = 0.4, *β* = 0.5)
> **Model E**: (*λ* = 5, *θ* = 0.4, *β* = 0.3)
> **Model F**: (*λ* = 6, *θ* = 0.3, *β* = 0.4)
> **Model G**: (*λ* = 7, *θ* = 0.7, *β* = 0.8)

From Table 2, given the same values of *λ* and *β*, under the null hypothesis *θ* = 0, the size of $\hat{I}_{disp}$ is close to the nominal level. Under the alternative *θ* ≠ 0, the power quickly increases to 1 as *n* increases in all cases.


**Table 2.** Size and power of DCINMA(1) and INMA(1) model.

#### **5. Real Data Example**

We then applied the proposed model to a real dataset. Unlike for the integer-valued autoregressive model, the density function of the DCINMA process is hard to obtain, so the Akaike information criterion (AIC) is difficult to use in measuring the fit of such an INMA-type model. To assess the performance of the model, we adopted the parametric resampling method of Jung et al. [29].

We used crime data available from the Forecasting Principles site. The data consist of 144 observations of monthly larceny counts for the City of Pittsburgh from January 1990 to December 2001. It would be highly improbable for criminals to remain in the same place for a long time; they would probably flee in various directions to commit offenses, so an INMA-type model is appropriate in this case.

The calculated sample mean (4.73) and variance (6.40) suggest that the dataset is overdispersed. In addition, the sample autocorrelation function (ACF) in Figure 1 shows that the order of the model should be set to one. We fitted the dataset with the DCINMA(1) model and the INMA(1) model to evaluate model performance for overdispersed data. The estimation results for the two models are presented in Table 3.

**Table 3.** Estimation results of DCINMA(1) and INMA(1) model.


The parametric resampling method can be performed in several steps:

**Step 1:** Generate *R* samples, each with length equal to that of the original dataset, from the fitted model.

**Step 2:** Compute the autocorrelation function (ACF) for each sample series, then derive the empirical distribution of the sample ACF at the different lag orders.

**Step 3:** Using the results from **Step 2**, compute the 100(1 − *α*/2)% and 100(*α*/2)% quantiles at each lag.
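The three steps above can be sketched as follows (NumPy assumed; the parameter values are illustrative placeholders, not the fitted Table 3 estimates, and $R$ is kept small for speed).

```python
import numpy as np

def thin(beta, theta, eps, rng):
    if eps == 0:
        return 0
    W = rng.binomial(1, beta, eps)
    V = rng.binomial(1, theta, eps)
    Z = rng.binomial(1, beta)
    return int(((1 - V) * W + V * Z).sum())

def simulate_dcinma1(n, lam, beta, theta, rng):
    eps = rng.poisson(lam, n + 1)
    return np.array([int(eps[t]) + thin(beta, theta, eps[t - 1], rng)
                     for t in range(1, n + 1)])

def acf(x, max_lag):
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def acceptance_envelope(n, lam, beta, theta, R=300, max_lag=10,
                        alpha=0.05, seed=0):
    """Steps 1-3: draw R series of length n from the fitted model, compute
    each sample ACF, and take pointwise alpha/2 and 1 - alpha/2 quantiles."""
    rng = np.random.default_rng(seed)
    acfs = np.array([acf(simulate_dcinma1(n, lam, beta, theta, rng), max_lag)
                     for _ in range(R)])
    return (np.quantile(acfs, alpha / 2, axis=0),
            np.quantile(acfs, 1 - alpha / 2, axis=0))

# illustrative parameter values, not the fitted Table 3 estimates
lower, upper = acceptance_envelope(n=144, lam=4.0, beta=0.3, theta=0.4)
print(lower[0], upper[0])  # lag-1 band; the model ACF there is about 0.21
```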

For the given example, *R* was set to 5000 and *α* to 0.05. We plotted the acceptance envelopes for the DCINMA(1) and INMA(1) models; the results are shown in Figures 2 and 3. The sample autocorrelations of the real dataset lie within the acceptance envelope of the DCINMA(1) model except for a single point, while the important second-order autocorrelation exceeds the upper bound of the acceptance envelope of the INMA(1) model. The acceptance envelope of the DCINMA(1) model clearly performs better than that of the INMA(1) model. Thus, the proposed model is suitable for this dataset, adequately representing the autocorrelation of the real data.

**Figure 1.** Time series plot (top panel) and autocorrelation function (bottom panel) for the actual larceny dataset.

**Figure 2.** Acceptance envelope of the DCINMA(1) model for the autocorrelation function for the larceny dataset.

**Figure 3.** Acceptance envelope of the INMA(1) model for the autocorrelation function for the larceny dataset.

#### **6. Conclusions**

In this paper, we constructed a new integer-valued moving average model with dependent counting series. The statistical properties of the proposed model were discussed and evaluated. The parameter estimation of the proposed model is based on the Yule–Walker method. The new model presents overdispersion due to the dependence parameter *θ*, which means the dependence between counting series can be verified by an overdispersion test. Numerical simulation results were used to evaluate the performance of the estimation and overdispersion test.

Future extensions of this study are as follows. First, we only focused on the stationary DCINMA(1) model in this paper; however, non-stationary series are more common. The switched system is an important model in the study of hybrid systems, particularly from the perspective of control science and engineering; refer to [30–32] for more detailed discussion. The switching mechanism conducts the transformation between different subsystems, providing an approach to generalizing the DCINMA process to a non-stationary case: a function can be introduced to control the change between different parameter values of the stationary case, which characterizes non-stationarity in the series. Second, the parameter *θ* in our model provides a probability for overdispersion. In a switched system, the switching signal is a piecewise constant function depending on time, external signals, the output, and its own past values. Thus, a weighted switching signal function with weights *θ* and (1 − *θ*) can be considered, where the weight *θ* gives the probability of switching to a different subsystem.

**Author Contributions:** Conceptualization, K.Y. and H.W.; methodology, K.Y.; software, H.W.; validation, K.Y. and H.W.; formal analysis, K.Y.; investigation, H.W.; resources, K.Y.; data curation, K.Y.; writing—original draft preparation, H.W.; writing—review and editing, K.Y.; visualization, H.W.; supervision, K.Y.; project administration, K.Y.; funding acquisition, K.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The data used to support the findings of this study are included within the article.

**Acknowledgments:** This research was supported by the National Social Science Fund of China (No. 18BTJ039) and Fundamental Research Funds for the Central Universities (No. JBK2102021). In addition, the authors will be grateful for the editor and reviewers' comments and suggestions, which led to improvements of the paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Proof of Theorem 1.** We give the detailed derivation for the first-order case; the *q*-th order can be derived in the same manner. Terms (*i*) and (*iii*) of Theorem 1 are easy to verify, so we focus on term (*ii*).

$$Var(X_t) = Var(\beta\circ\varepsilon_{t-1}) + Var(\varepsilon_t) = V\big[E[\beta\circ\varepsilon_{t-1}\,|\,\varepsilon_{t-1}]\big] + E\big[V[\beta\circ\varepsilon_{t-1}\,|\,\varepsilon_{t-1}]\big] + Var(\varepsilon_t)$$

Then, we focus on the term $V[\beta\circ\varepsilon_{t-1}\,|\,\varepsilon_{t-1}]$.

$$\begin{aligned} V[\beta\circ\varepsilon_{t-1}\,|\,\varepsilon_{t-1}] &= E\Big[\Big(\sum_{i=1}^{\varepsilon_{t-1}} U_i\Big)^{2}\Big|\,\varepsilon_{t-1}\Big] - \Big(E\Big[\sum_{i=1}^{\varepsilon_{t-1}} U_i\Big|\,\varepsilon_{t-1}\Big]\Big)^{2} \\ &= \varepsilon_{t-1}\cdot E[U_1^2] + \varepsilon_{t-1}(\varepsilon_{t-1}-1)\cdot E[U_1 U_2] - (\varepsilon_{t-1}\cdot\beta)^2 \\ &= \varepsilon_{t-1}\cdot\big(\beta(1-\beta)+\beta^2\big) + \varepsilon_{t-1}(\varepsilon_{t-1}-1)\cdot\big(\theta^2\beta(1-\beta)+\beta^2\big) - \beta^2\varepsilon_{t-1}^2 \end{aligned}$$

$$\begin{aligned} E\big[V[\beta\circ\varepsilon_{t-1}\,|\,\varepsilon_{t-1}]\big] &= \big(\beta(1-\beta)+\beta^2\big)\cdot E[\varepsilon_{t-1}] + \big(\theta^2\beta(1-\beta)+\beta^2\big)\cdot E[\varepsilon_{t-1}(\varepsilon_{t-1}-1)] - \beta^2 E[\varepsilon_{t-1}^2] \\ &= \lambda\beta + \big(\theta^2\beta(1-\beta)+\beta^2\big)\cdot\lambda^2 - \beta^2(\lambda+\lambda^2) \\ &= \lambda^2\beta(1-\beta)\theta^2 + \beta\lambda - \beta^2\lambda \end{aligned}$$

$$\begin{aligned} Var(X_t) &= V[\beta\cdot\varepsilon_{t-1}] + \lambda^2\beta(1-\beta)\theta^2 + \beta\lambda - \beta^2\lambda + \lambda \\ &= \beta^2\lambda + \lambda^2\beta(1-\beta)\theta^2 + \beta\lambda - \beta^2\lambda + \lambda \\ &= \beta\lambda + \lambda^2\beta(1-\beta)\theta^2 + \lambda \end{aligned}$$
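A Monte Carlo check of this variance (NumPy assumed; the seed and parameter values are illustrative) draws $X_t = \varepsilon_t + \beta\circ\varepsilon_{t-1}$ directly from the construction and compares the sample variance with $\beta\lambda + \lambda^2\beta(1-\beta)\theta^2 + \lambda$.

```python
import numpy as np

# Monte Carlo check of Var(X_t) = beta*lam + lam**2*beta*(1-beta)*theta**2 + lam
lam, beta, theta = 2.0, 0.3, 0.5   # illustrative values
rng = np.random.default_rng(11)
reps = 50_000

eps_prev = rng.poisson(lam, reps)  # eps_{t-1}
eps_now = rng.poisson(lam, reps)   # eps_t
X = np.empty(reps)
for r in range(reps):
    m = eps_prev[r]
    W = rng.binomial(1, beta, m)
    V = rng.binomial(1, theta, m)
    Z = rng.binomial(1, beta)      # shared Z for this thinning
    X[r] = ((1 - V) * W + V * Z).sum() + eps_now[r]

print(X.var())  # ~ 0.3*2 + 4*0.3*0.7*0.25 + 2 = 2.81
```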

**Proof of Theorem 3.** Let $Y_t = X_t - \mu_X$, where $\{X_t\}$ is the process defined in Section 2. To prove that $\{Y_t\}$ is ergodic in the autocovariance function, it is sufficient to show that:

$$\lim\_{T \to \infty} \frac{1}{T} \sum\_{l=0}^{T} \left\{ E\left(\mathbf{Y}\_t \mathbf{Y}\_{t+v} \mathbf{Y}\_{t+l} \mathbf{Y}\_{t+v+l}\right) - R^2(v) \right\} = 0$$

It is obvious that *E*(*Yt<sup>i</sup>*) < ∞ for *i* = 1, 2, 3, 4. Let *E*(*Yt*<sup>2</sup>) = *c*2 and *E*(*Yt*<sup>4</sup>) = *c*4.

**The case** *v* = 0**:**

$$E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) = E(Y\_t^2 Y\_{t+l}^2)$$

If *l* = 1,

$$E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) = E(Y\_t^2 Y\_{t+l}^2) \le \left( E(Y\_t^4)\, E(Y\_{t+l}^4) \right)^{1/2} = c\_4$$

If *l* > 1, *Yt* and *Yt*+*l* are independent, then

$$E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) = E(Y\_t^2)\, E(Y\_{t+l}^2) = c\_2^2 = \gamma\_Y^2(0)$$

Only the terms with *l* = 0, 1 contribute, each bounded by *c*4 − *γY*<sup>2</sup>(0), so

$$\frac{1}{T} \sum\_{l=0}^T \left\{ E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) - R^2(v) \right\} \le \frac{2}{T} \left\{ c\_4 - \gamma\_Y^2(0) \right\} \to 0$$

**The case** *v* ≥ 1**:**

If *l* ≤ *v* + 1,

$$E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) \le \left[E(Y\_t^2 Y\_{t+v}^2)\, E(Y\_{t+l}^2 Y\_{t+l+v}^2)\right]^{\frac{1}{2}} \le \left[E(Y\_t^4)\, E(Y\_{t+v}^4)\, E(Y\_{t+l}^4)\, E(Y\_{t+l+v}^4)\right]^{\frac{1}{4}} = c\_4$$

If *l* > *v* + 1, *Yt Yt*+*v* and *Yt*+*l Yt*+*l*+*v* are independent, then

$$E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) = E(Y\_t Y\_{t+v})\, E(Y\_{t+l} Y\_{t+l+v}) = R^2(v)$$

$$\frac{1}{T} \sum\_{l=0}^T \left\{ E(Y\_t Y\_{t+v} Y\_{t+l} Y\_{t+v+l}) - R^2(v) \right\} \le \frac{1}{T} (v + 2) \left\{ c\_4 - R^2(v) \right\} \to 0$$

We can obtain:

$$\lim\_{T \to \infty} E\left[\left(\hat{R}(v) - R(v)\right)^2\right] = 0$$

which implies:

$$\hat{R}(v) \stackrel{p}{\to} R(v).$$

The above equation holds for the variable *Yt* = *Xt* − *μX*:

$$\hat{R}\_Y(v) \stackrel{p}{\to} R\_Y(v).$$

We need to prove that

$$P(|\hat{R}\_X(v) - R\_X(v)| \ge \epsilon) \to 0$$

Because *RX*(*v*) = *RY*(*v*), we can rewrite the above equation as

$$\begin{aligned} P(|\hat{R}\_X(v) - R\_X(v)| \ge \epsilon) &= P(|\hat{R}\_X(v) - \hat{R}\_Y(v) + \hat{R}\_Y(v) - R\_X(v)| \ge \epsilon) \\ &\le P(|\hat{R}\_X(v) - \hat{R}\_Y(v)| \ge \epsilon/2) + P(|\hat{R}\_Y(v) - R\_Y(v)| \ge \epsilon/2) \end{aligned}$$

*R*ˆ*X*(*v*) and *R*ˆ*Y*(*v*) can be expressed as

$$\hat{R}\_X(v) = \frac{1}{T} \sum\_{t=1}^{T-v} (X\_{t+v} - \bar{X})(X\_t - \bar{X}), \qquad \hat{R}\_Y(v) = \frac{1}{T} \sum\_{t=1}^{T-v} Y\_{t+v} Y\_t = \frac{1}{T} \sum\_{t=1}^{T-v} (X\_{t+v} - \mu\_X)(X\_t - \mu\_X)$$

Expanding the difference of the two sums, with $\bar{X}\_{t+v} = \frac{1}{T}\sum\_{t=1}^{T-v} X\_{t+v}$ and $\bar{X}\_{t} = \frac{1}{T}\sum\_{t=1}^{T-v} X\_{t}$, the following bound can be derived:

$$P(|\hat{R}\_X(v) - \hat{R}\_Y(v)| \ge \epsilon/2) \le P\left(|\bar{X}^2 - \mu\_X^2| \ge \epsilon/4\right) + P\left(\left|(\mu\_X - \bar{X})(\bar{X}\_{t+v} + \bar{X}\_{t})\right| \ge \epsilon/4\right)$$

Since *X*¯ → *μX* in probability, by Slutsky's theorem,

$$\bar{X}^2 - \mu\_X^2 \stackrel{p}{\to} 0, \qquad (\mu\_X - \bar{X})(\bar{X}\_{t+v} + \bar{X}\_{t}) \stackrel{p}{\to} 0$$

Consequently,

$$P(|\hat{R}\_X(v) - R\_X(v)| \ge \epsilon) \to 0, \quad T \to \infty. \qquad \Box$$
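The consistency just proved can be illustrated numerically. For the INMA(1) process, *RX*(1) = Cov(*Xt*, *Xt*+1) = Cov(*εt*, *β*∘*εt*) = *β*·*λ*, which is unaffected by the dependence within the counting series, and both *R*ˆ*X*(1) (centred at *X*¯) and *R*ˆ*Y*(1) (centred at *μX*) approach it. A minimal sketch, again assuming the equicorrelated-Bernoulli construction for the dependent thinning:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, beta, theta = 2.0, 0.5, 0.5
T = 400_000

# simulate X_t = beta o eps_{t-1} + eps_t (assumed equicorrelated thinning)
eps = rng.poisson(lam, size=T + 1)
K = rng.binomial(eps[:-1], theta)           # indices sharing the common V
V = rng.binomial(1, beta, size=T)
X = K * V + rng.binomial(eps[:-1] - K, beta) + eps[1:]

mu_X = beta * lam + lam                     # true mean of X_t
v = 1
Xbar = X.mean()

# hat{R}_X(v): sample autocovariance centred at the sample mean
R_hat_X = ((X[v:] - Xbar) * (X[:-v] - Xbar)).sum() / T
# hat{R}_Y(v): the same sum centred at the true mean mu_X
R_hat_Y = ((X[v:] - mu_X) * (X[:-v] - mu_X)).sum() / T

print(R_hat_X, R_hat_Y)  # both should be close to beta * lam = 1.0
```

The gap between the two estimates is exactly the term bounded via Slutsky's theorem in the proof, and it shrinks as *T* grows.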

#### **References**

