1. Introduction
In practice, we often observe integer-valued time series with their own distinguishing characteristics, and many models have been proposed for them, such as the integer-valued autoregressive (INAR) process introduced by McKenzie (1985) [1] and Al-Osh and Alzaid (1987) [2]; the integer-valued moving average process proposed by Al-Osh and Alzaid (1988) [3]; the integer-valued autoregressive moving-average model defined by McKenzie (1988) [4]; and the integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) model proposed by Ferland et al. (2006) [5], among others. Here we focus on two of these models. The first is the INAR process, which was introduced as a convenient way to transfer the usual autoregressive structure to discrete-valued time series. The INAR model of order p is defined as

$$X_t = \alpha_1 \circ X_{t-1} + \alpha_2 \circ X_{t-2} + \cdots + \alpha_p \circ X_{t-p} + \varepsilon_t,$$

where $\alpha_i \in [0, 1)$ for $i = 1, \ldots, p$, and $\{\varepsilon_t\}$ is a sequence of independent and identically distributed (i.i.d.) non-negative integer-valued random variables with $E(\varepsilon_t) = \mu_\varepsilon$ and $\operatorname{Var}(\varepsilon_t) = \sigma_\varepsilon^2$. The binomial thinning operator ∘ is defined by Steutel and Van Harn (1979) [6] as

$$\alpha \circ X = \sum_{i=1}^{X} B_i,$$

where $\{B_i\}$ are i.i.d. Bernoulli random variables, independent of $X$, with success probability $P(B_i = 1) = \alpha$. This model has been generalized by Qian and Zhu (2022) [7] and Huang et al. (2023) [8], among others.
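To make the thinning mechanism concrete, here is a minimal simulation sketch of the INAR(p) recursion. The function names and the choice of Poisson innovations are illustrative assumptions, not part of the cited works. With Poisson(λ) innovations and $\sum_i \alpha_i < 1$, taking expectations of both sides gives the stationary mean $\lambda / (1 - \sum_i \alpha_i)$, which the sample mean should approach.

```python
import math
import random


def binomial_thinning(alpha: float, x: int, rng: random.Random) -> int:
    """alpha ∘ x: the sum of x i.i.d. Bernoulli(alpha) variables."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar(alphas, lam, n, seed=0, burn_in=500):
    """Simulate X_t = sum_i alpha_i ∘ X_{t-i} + eps_t with eps_t ~ Poisson(lam).

    Every thinning uses fresh Bernoulli draws, so the thinnings are mutually
    independent and independent of the innovations.
    """
    rng = random.Random(seed)
    x = [0] * len(alphas)  # zero start values, discarded with the burn-in
    for _ in range(burn_in + n):
        # x[-i] is X_{t-i}; alpha_i is paired with lag i.
        survivors = sum(
            binomial_thinning(a, x[-i], rng) for i, a in enumerate(alphas, start=1)
        )
        x.append(survivors + poisson_draw(lam, rng))
    return x[-n:]


series = simulate_inar([0.3, 0.2], lam=1.0, n=20000, seed=1)
print(sum(series) / len(series))  # theoretical mean: 1.0 / (1 - 0.5) = 2.0
```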
The second is the INGARCH model, proposed by Ferland et al. (2006) [5] to model integer-valued time series that exhibit conditional heteroscedasticity. The INGARCH(p, q) model with Poisson deviates is defined as

$$X_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t), \qquad \lambda_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i} + \sum_{j=1}^{q} \beta_j \lambda_{t-j},$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$, and $\mathcal{F}_{t-1}$ is the $\sigma$-field generated by $\{X_{t-1}, X_{t-2}, \ldots\}$. This model has been generalized by Hu (2016) [9], Liu et al. (2022) [10], and Weiß et al. (2022) [11], among others; Weiß (2018) [12] and Davis et al. (2021) [13] give recent reviews. By definition, the INAR model is thinning-based, whereas the INGARCH model is specified through a conditional distribution whose time-varying mean depends on past observations. Combining thinning-based stochastic equations with the INGARCH model, Aknouche and Scotto (2022) [14] proposed a multiplicative thinning-based INGARCH (MthINGARCH) model for integer-valued time series with high overdispersion and persistence. Furthermore, it fits heavy-tailed data well regardless of the choice of innovation distribution and does not require recourse to complex random coefficient equations. The MthINGARCH model is defined by:
where the symbol ∘ stands for the binomial thinning operator, and
,
and
,
m is a fixed positive integer introduced for additional flexibility. Since the probability mass function of $\{X_t\}$ has no explicit form, traditional maximum likelihood estimation (MLE) cannot be applied to estimate the parameters; Aknouche and Scotto (2022) [14] therefore used two-stage weighted least squares estimation instead.
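The Poisson INGARCH recursion defined earlier in this section can likewise be simulated in a few lines. The sketch below takes p = q = 1 and arbitrary illustrative parameter values (assumptions for demonstration only). When $\alpha_1 + \beta_1 < 1$, the stationary mean is $\alpha_0 / (1 - \alpha_1 - \beta_1)$, and the variance-to-mean ratio is $1 + \alpha_1^2 / \{1 - (\alpha_1 + \beta_1)^2\} > 1$, which is the overdispersion the model is designed to capture.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_ingarch11(alpha0, alpha1, beta1, n, seed=0, burn_in=500):
    """X_t | F_{t-1} ~ Poisson(lam_t), lam_t = alpha0 + alpha1*X_{t-1} + beta1*lam_{t-1}."""
    rng = random.Random(seed)
    lam = alpha0 / (1.0 - alpha1 - beta1)  # start the intensity at its stationary mean
    x_prev = 0
    out = []
    for t in range(burn_in + n):
        lam = alpha0 + alpha1 * x_prev + beta1 * lam
        x_prev = poisson_draw(lam, rng)
        if t >= burn_in:
            out.append(x_prev)
    return out


xs = simulate_ingarch11(1.0, 0.3, 0.4, n=20000, seed=2)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)
# Theoretical values: mean 1/0.3 ≈ 3.33, variance-to-mean ratio 1 + 0.09/0.51 ≈ 1.18.
print(mean, var / mean)
```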
In some cases, the probability mass function of the random variables is not available in closed form, so the likelihood function cannot be written down directly; the saddlepoint approximation offers a way around this problem. Daniels (1954) [15] introduced saddlepoint techniques into statistics, and they were later extended by Field and Ronchetti (1990) [16], Jensen (1995) [17], and Butler (2007) [18]. Saddlepoint techniques have been used successfully in many applications because they approximate intractable densities and tail probabilities with high accuracy. Pedeli et al. (2015) [19] proposed an alternative approach based on a saddlepoint approximation to the log-likelihood and used the resulting saddlepoint maximum likelihood estimation (SPMLE) to estimate the parameters of the INAR model, demonstrating the usefulness of this technique. Combining the MthINGARCH model of Aknouche and Scotto (2022) [14] with the saddlepoint approximation, we propose a modified multiplicative thinning-based INARCH model for modeling high overdispersion and apply the saddlepoint method to estimate its parameters. Although two-stage weighted least squares estimation could also be used for our modified model, we adopt the SPMLE because it is expected to perform better in practice. We consider only the INARCH model rather than the INGARCH model because, for the latter, the conditional cumulant-generating function required by the saddlepoint approximation is difficult and complex to derive.
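As a hedged illustration of the saddlepoint idea in the INAR setting of Pedeli et al. (2015) [19], consider an INAR(1) model with Poisson(λ) innovations. Given $X_{t-1} = n$, $X_t$ is the sum of a Binomial(n, α) and a Poisson(λ) variable, with conditional cumulant-generating function $K(s) = n \log(1 - \alpha + \alpha e^{s}) + \lambda (e^{s} - 1)$. The basic (unnormalized) saddlepoint approximation solves $K'(\hat s) = x$ and sets $\hat p(x) = \exp\{K(\hat s) - \hat s x\} / \sqrt{2 \pi K''(\hat s)}$. The sketch below compares it with the exact convolution. Bisection for the root and returning the exact value at $x = 0$ (where $K'(s) = 0$ has no finite root) are choices made for this sketch, not necessarily those of Pedeli et al.

```python
import math


def cgf(s, n, alpha, lam):
    """Return K(s), K'(s), K''(s) for Binomial(n, alpha) + Poisson(lam)."""
    es = math.exp(s)
    d = 1.0 - alpha + alpha * es
    k0 = n * math.log(d) + lam * (es - 1.0)
    k1 = n * alpha * es / d + lam * es
    k2 = n * alpha * (1.0 - alpha) * es / d ** 2 + lam * es
    return k0, k1, k2


def saddlepoint_pmf(x, n, alpha, lam):
    """Unnormalized saddlepoint approximation to P(X_t = x | X_{t-1} = n).

    Assumes 0 < alpha < 1 and lam > 0, so K' increases from 0 to infinity.
    """
    if x == 0:
        # K'(s) = 0 has no finite root; fall back to the exact probability.
        return (1.0 - alpha) ** n * math.exp(-lam)
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on the increasing map s -> K'(s)
        mid = 0.5 * (lo + hi)
        if cgf(mid, n, alpha, lam)[1] < x:
            lo = mid
        else:
            hi = mid
    s_hat = 0.5 * (lo + hi)
    k0, _, k2 = cgf(s_hat, n, alpha, lam)
    return math.exp(k0 - s_hat * x) / math.sqrt(2.0 * math.pi * k2)


def exact_pmf(x, n, alpha, lam):
    """Exact convolution of Binomial(n, alpha) and Poisson(lam)."""
    return sum(
        math.comb(n, k) * alpha ** k * (1.0 - alpha) ** (n - k)
        * math.exp(-lam) * lam ** (x - k) / math.factorial(x - k)
        for k in range(min(n, x) + 1)
    )


for x in (1, 5, 10, 15):
    approx, exact = saddlepoint_pmf(x, 10, 0.5, 2.0), exact_pmf(x, 10, 0.5, 2.0)
    print(x, round(approx, 5), round(exact, 5), round(approx / exact - 1, 3))
```

In a likelihood, the logarithms of such approximations would be summed over t; renormalizing over the support is a common refinement that this sketch omits.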
This article is structured as follows. Section 2 introduces a modified multiplicative thinning-based INARCH model and some related properties, using the Poisson and geometric distributions for the innovations. Section 3 discusses the SPMLE and its asymptotic properties and presents simulation studies for both models with the SPMLE. Section 4 analyzes a real data example with our modified models and compares them with existing models; in-sample and out-of-sample forecasts are used to show the superiority of the SPMLE and our modified model. Section 5 concludes. Details of the SPMLE and proofs of some theorems are presented in Appendix A.
2. A Multiplicative Thinning-Based INARCH Model
Let $\mathbb{N}_0$ and $\mathbb{Z}$ denote the sets of non-negative integers and integers, respectively. Suppose that $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ is a sequence of i.i.d. random variables with mean one and finite variance $\sigma^2$. The modified multiplicative thinning-based INARCH model of order p (denoted by MthINARCH(p)), which we study in this paper, is defined by
where
,
,
m is a fixed positive integer. In real applications, m can be set to the upper integer part of the sample mean. It is assumed that the Bernoulli terms corresponding to the binomial variables
and
are mutually independent and independent of the sequence $\{\varepsilon_t\}$. We define the new model in this way for the following reason. The additive term 1 in
and in (1) is unnatural and is imposed to ensure
, but we can achieve this by adjusting the range of
; therefore, we adopt a simpler version of
in (2).
We now discuss the conditional mean and conditional variance of
. Note that
is the
-field generated by
. For
, let
. Then we can obtain the conditional variance; first, let
and
. For
, so
. Therefore,
Proposition 1. A necessary and sufficient condition for the first-order stationarity of $\{X_t\}$ defined in (2) is that all roots of should lie outside the unit circle.

Proposition 2. A necessary and sufficient condition for the second-order stationarity of $\{X_t\}$ defined in (2) is that

The proofs of Propositions 1 and 2 are similar to those of Theorems 2.1 and 2.2 in Aknouche and Scotto (2022) [14], so we omit the details.
To proceed, we need to specify the distribution of $\varepsilon_t$ in (2). First, let $\varepsilon_t \sim \mathrm{Poisson}(1)$; then $\sigma^2 = 1$, and this model is denoted by PMthINARCH(p). It is easy to obtain
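Second, let $\varepsilon_t$ follow a geometric distribution with $P(\varepsilon_t = k) = \theta (1 - \theta)^k$, $k \in \mathbb{N}_0$. The mean of $\varepsilon_t$ is $(1 - \theta)/\theta$, so the unit-mean requirement gives $\theta = 1/2$, and the variance is $\sigma^2 = (1 - \theta)/\theta^2 = 2$. This model is denoted by GMthINARCH(p), and we have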
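The two unit-mean innovation choices can be checked numerically. The sketch below sums each probability mass function over a truncated support (the cut-off of 100 terms is an assumption that leaves a negligible tail for both distributions) to confirm that Poisson(1) has mean 1 and variance 1, while the geometric distribution with $\theta = 1/2$ has mean 1 and variance 2, so geometric innovations contribute more dispersion.

```python
import math


def moments(pmf, support):
    """Mean and variance of a pmf over a finite (truncated) support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var


def poisson_pmf(k, lam=1.0):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def geometric_pmf(k, theta=0.5):
    # P(eps = k) = theta * (1 - theta)^k on {0, 1, 2, ...}
    return theta * (1.0 - theta) ** k


support = range(100)  # truncation; the neglected tail mass is negligible here
print(moments(poisson_pmf, support))    # approximately (1.0, 1.0)
print(moments(geometric_pmf, support))  # approximately (1.0, 2.0)
```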
4. A Real Example
Here, we considered the number of tick changes per minute of the euro to British pound exchange rate (ExRate for short) on December 12th from 9.00 a.m. to 9.00 p.m. The dataset is available at http://www.histdata.com/ (accessed on 17 January 2023). The series comprises 720 observations with a sample mean of 13.2153 and a sample variance of 224.2498. The sample variance is much larger than the sample mean, which indicates high overdispersion; this can also be seen in Figure 1a. Figure 1b,c show the sample autocorrelation function (ACF) and partial autocorrelation function (PACF), which indicate that the tick changes are serially correlated.
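The degree of overdispersion can be quantified by the index of dispersion (sample variance divided by sample mean), whose population value is 1 under a Poisson model. The sketch below computes it from the summary statistics reported above, along with the value of m obtained by the upper-integer-part rule used later in this section, and a standard sample ACF estimator. The ExRate series itself is not reproduced here, so the ACF function is only demonstrated on a hypothetical toy sequence.

```python
import math


def sample_acf(x, max_lag):
    """Standard sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
        for h in range(max_lag + 1)
    ]


sample_mean, sample_var = 13.2153, 224.2498  # reported for the 720 observations
dispersion_index = sample_var / sample_mean  # population value 1 under a Poisson model
m = math.ceil(sample_mean)                   # upper integer part of the sample mean

print(round(dispersion_index, 2), m)         # 16.97 14
print(sample_acf([1, 2, 3, 4, 5], 2))        # [1.0, 0.4, -0.1]
```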
We analyzed the data using the PMthINARCH model, the GMthINARCH model, the Poisson INAR model (denoted by PINAR for short), and the INARCH model. The Poisson INAR model was considered by Pedeli et al. (2015) [19], and the SPMLE was used to estimate its parameters; here, the innovations of the PINAR model were assumed to be Poisson with a mean of one. The INARCH model with Poisson deviates was proposed by Ferland et al. (2006) [5], and its parameters were estimated by MLE. Following Aknouche and Scotto (2022) [14], in real applications we can set m as the upper integer part of the sample mean; since the sample mean is 13.2153, m is set to 14.
Table 3 reports the parameter estimates together with the values of the Akaike information criterion (AIC) and Bayesian information criterion (BIC). As Table 3 shows, the AIC and BIC values of the PMthINARCH and GMthINARCH models are smaller than those of the PINAR and INARCH models, and the AIC and BIC values of the INARCH model are smaller than those of the PINAR model. Moreover, the AIC and BIC values of the PMthINARCH model are smaller than those of the GMthINARCH model. In summary, by both criteria the PMthINARCH model fits best, followed by the GMthINARCH, INARCH, and PINAR models.
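For reference, the criteria compared in Table 3 are conventionally $\mathrm{AIC} = -2\hat\ell + 2k$ and $\mathrm{BIC} = -2\hat\ell + k \log n$, where $\hat\ell$ is the maximized log-likelihood, k the number of estimated parameters, and n = 720 here. For the SPMLE fits, $\hat\ell$ would presumably be the saddlepoint-approximated log-likelihood, although that is an assumption. The helpers below are a minimal sketch; the log-likelihood values in the usage lines are hypothetical and are not the values in Table 3.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion; smaller is better."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion; penalizes each parameter by log(n)."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical fits (loglik, number of parameters): the richer model is
# preferred only if its likelihood gain outweighs the extra penalty.
fits = {"model_a": (-2100.0, 2), "model_b": (-2095.0, 3)}
for name, (ll, k) in fits.items():
    print(name, aic(ll, k), round(bic(ll, k, 720), 2))
```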
Aknouche and Scotto (2022) [14] used two-stage weighted least squares estimation (2SWLSE) to estimate the parameters of the MthINGARCH model. To compare 2SWLSE with SPMLE, and to compare the PMthINARCH, GMthINARCH, and PINAR models, we considered in-sample and out-of-sample forecasts for both estimation methods and all three models. For the in-sample forecasts, we used all of the observations to estimate each model and then forecast the last 10 observations (711–720), the last 15 observations (706–720), and the last 20 observations (701–720); these three horizons are denoted by C1, C2, and C3, respectively. For the out-of-sample forecasts, we split the observations in three ways: estimating on 1–710 and forecasting 711–720, estimating on 1–705 and forecasting 706–720, and estimating on 1–700 and forecasting 701–720; these are denoted by D1, D2, and D3, respectively.
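The six evaluation settings can be written down explicitly. The sketch below encodes them as 1-based, inclusive (estimation range, forecast range) pairs, mirroring the description above; the function names and dictionary layout are illustrative choices.

```python
def forecast_windows(n: int = 720, horizons=(10, 15, 20)):
    """Return {label: (estimation_range, forecast_range)} with 1-based inclusive ranges."""
    windows = {}
    for i, h in enumerate(horizons, start=1):
        target = (n - h + 1, n)
        windows[f"C{i}"] = ((1, n), target)      # in-sample: fit on all observations
        windows[f"D{i}"] = ((1, n - h), target)  # out-of-sample: hold out the target
    return windows


def slice_1based(series, bounds):
    """Extract observations start..end (inclusive, 1-based) from a Python list."""
    start, end = bounds
    return series[start - 1:end]


w = forecast_windows()
print(w["C1"], w["D3"])  # ((1, 720), (711, 720)) ((1, 700), (701, 720))
```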
We compare the considered models through the mean absolute deviation errors (MADEs) of their forecasts. Table 4 reports the MADEs of the in-sample and out-of-sample forecasts of the three models with SPMLE, Table 5 reports those of the PMthINARCH model with 2SWLSE and SPMLE, and Table 6 reports those of the GMthINARCH model with 2SWLSE and SPMLE. According to Table 4, the MADEs of the PMthINARCH and GMthINARCH models are smaller than those of the PINAR model. Table 5 and Table 6 show that, for both the PMthINARCH and GMthINARCH models, the MADEs under SPMLE are smaller than those under 2SWLSE. In all three tables, the MADEs of the in-sample forecasts are smaller than those of the out-of-sample forecasts. In summary, the PMthINARCH and GMthINARCH models were superior to the PINAR model for this data set, the PMthINARCH model performed better than the GMthINARCH model, and SPMLE performed better than 2SWLSE for the MthINARCH models.
5. Conclusions
In this paper, we proposed a modified multiplicative thinning-based INARCH model whose probability mass function is approximated by the saddlepoint approximation. We used the SPMLE to estimate the parameters and derived the asymptotic distribution of the SPMLE. To show the advantages of the MthINARCH models and the SPMLE, we used the PMthINARCH and GMthINARCH processes for discussion and comparison. The SPMLE performs well in the simulation studies. A real data example indicates that the PMthINARCH and GMthINARCH models can describe overdispersed integer-valued data and perform better than the PINAR and INARCH models. In addition, the results show that SPMLE outperforms 2SWLSE.
Several aspects call for further research. Here we used the Poisson and geometric distributions for $\varepsilon_t$; the negative binomial distribution or zero-inflated distributions could be used as well. Moreover, we considered only the INARCH model, so the corresponding INGARCH model should also be studied.