1. Introduction
In this paper, we consider a cumulative sum (CUSUM) monitoring procedure for detecting a parameter change in integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) models. Integer-valued time series analysis is a core area of time series analysis, with applications across the social, physical, engineering, and medical sciences. Both integer-valued autoregressive (INAR) models and INGARCH models have been widely studied in the literature and applied to various practical problems. Refer to McKenzie [1], Al-Osh and Alzaid [2], Ferland, Latour and Oraichi [3], Fokianos, Rahbek and Tjøstheim [4], and Weiß [5] for a general review. Poisson, negative binomial (NB), and one-parameter exponential family distributions have been widely used as underlying distributions, as seen in Davis and Wu [6], Zhu [7], Zhu [8], Jazi, Jones and Lai [9], Christou and Fokianos [10], Davis and Liu [11], Lee, Lee and Chen [12], and Chen, Khamthong and Lee [13].
Since Page [14], the CUSUM test has been a conventional tool for detecting a structural change in underlying models. For its history and background, we refer to Csörgö and Horváth [15], Chen and Gupta [16], Lee, Ha, Na and Na [17], and the papers cited therein. Several authors have studied change point tests for INGARCH models, including Fokianos and Fried [18], Fokianos and Fried [19], Franke, Kirch and Kamgaing [20], Fokianos, Gombay and Hussein [21], Hudecová [22], Hudecová, Hušková and Meintanis [23], Kang and Lee [24], Lee, Lee and Chen [12], Lee, Lee and Tjøstheim [25], and Lee and Lee [26]. The CUSUM scheme has been applied not only to retrospective change point tests but also to on-line monitoring and statistical process control (SPC) problems, which are designed to monitor abnormal phenomena in manufacturing processes and health care surveillance. The CUSUM control chart has been popular owing to its effectiveness in the early detection of anomalies. Refer to Weiß [27], Rakitzis, Maravelakis and Castagliola [28], Kim and Lee [29], and the papers cited therein. Meanwhile, Gombay and Serban [30] used a CUSUM approach based on score vectors for independent observations and later extended it to autoregressive processes, wherein the Type I error probability, instead of the conventional average run length (ARL), is used to obtain control limits. Their CUSUM monitoring process relies on the asymptotic behavior of the partial sum process generated from the score vectors. Later, Huh, Kim and Lee [31] adopted their method for Poisson INGARCH models and compared its performance with the likelihood ratio (LR)-based control chart originally considered by Weiß and Testik [32].
In this work, following the approach of Gombay and Serban [30] and Huh, Kim and Lee [31], we design a robust monitoring process based on the minimum density power divergence estimator (MDPDE) proposed by Basu, Harris, Hjort and Jones [33]. We adopt the MDPDE because it is well known to be suitable for robust inference in various models: a tuning parameter controls the trade-off between efficiency and robustness, with little loss in asymptotic efficiency relative to the maximum likelihood estimator (MLE) (Riani, Atkinson, Corbellini and Perrotta [34]). The MDPDE method has been successfully applied to various time series models, in particular INGARCH models (Kim and Lee [35], Kim and Lee [36]). Recently, Lee and Lee [26] and Kim and Lee [37] considered CUSUM tests based on score vectors for the MLE and MDPDE in INGARCH models with exponential family conditional distributions. See also Kang and Song [38]. Using their results within the framework of Gombay and Serban [30] and Huh, Kim and Lee [31], we design an MDPDE-based monitoring process to detect a parameter change in INGARCH models. Monte Carlo simulations are conducted to assess the performance of the proposed monitoring procedure. We focus on comparing the MDPDE-based CUSUM test with the MLE-based CUSUM test for Poisson INGARCH models, to demonstrate the superiority of the former over the latter in the presence of outliers. A real data analysis of the return times of extreme events of Goldman Sachs Group (GS) stock prices is also provided to illustrate the usefulness of the proposed test.
The rest of the paper is organized as follows. Section 2 reviews the MDPDE for INGARCH models, and Section 3 constructs the monitoring procedure for these models and investigates its asymptotic properties. Section 4 presents a simulation study, and Section 5 provides a real data analysis. Section 6 concludes the paper. The proof of the main theorem is provided in Appendix A.
2. MDPDE for INGARCH Model: An Overview
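In this section, we briefly review the MDPDE for INGARCH models in [36]. Let $\{X_t, t \ge 1\}$ be the observations generated from integer-valued time series models with the conditional distribution of the one-parameter exponential family:
$$X_t \mid \mathcal{F}_{t-1} \sim p(y \mid \eta_t), \qquad \lambda_t := E(X_t \mid \mathcal{F}_{t-1}) = f_\theta(\lambda_{t-1}, X_{t-1}), \tag{1}$$
where $\mathcal{F}_t$ is the $\sigma$-field generated by $\eta_1, X_1, \ldots, X_t$, $f_\theta(x, y)$ is a non-negative bivariate function, depending on the parameter $\theta \in \Theta \subset \mathbb{R}^d$, that satisfies $\inf_{\theta \in \Theta} f_\theta(x, y) \ge y^*$ for some $y^* > 0$ for all $x$ and $y$, and $p(\cdot \mid \cdot)$ is a probability mass function given by
$$p(y \mid \eta) = \exp\{\eta y - A(\eta)\}\, h(y), \qquad y \in \mathbb{N}_0,$$
where $\eta$ is the natural parameter, $A(\eta)$ and $h(y)$ are known functions, and both $A$ and $B = A'$ are strictly increasing. In particular, $\eta_t = B^{-1}(\lambda_t)$, and $B'(\eta_t)$ is the conditional variance of $X_t$. In what follows, the symbols $\lambda_t(\theta)$ and $\eta_t(\theta)$ are also used to stand for $\lambda_t$ and $\eta_t$, respectively.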
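The Poisson distribution is the simplest member of this one-parameter exponential family: taking $\eta = \log\lambda$, $A(\eta) = e^\eta$ and $h(y) = 1/y!$ gives $B(\eta) = A'(\eta) = e^\eta = \lambda$ and $B'(\eta) = \lambda$, matching the Poisson mean and variance. The short Python sketch below checks this parametrization numerically; the function names are illustrative only and are not taken from any existing package.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Poisson(lam) pmf, evaluated on the log scale for numerical stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def expfam_pmf(y: int, eta: float) -> float:
    """Exponential-family form exp{eta*y - A(eta)} h(y) with A(eta) = e^eta, h(y) = 1/y!."""
    return math.exp(eta * y - math.exp(eta) - math.lgamma(y + 1))

def B(eta: float) -> float:
    """B = A' maps the natural parameter to the conditional mean (for the Poisson, also the variance)."""
    return math.exp(eta)

lam = 3.7
eta = math.log(lam)  # natural parameter eta = B^{-1}(lambda) = log(lambda)
max_gap = max(abs(expfam_pmf(y, eta) - poisson_pmf(y, lam)) for y in range(40))
print(max_gap < 1e-12, B(eta))  # B(eta) recovers lambda up to rounding
```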
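Davis and Liu [11] showed that, under the contraction condition
$$|f_\theta(x, y) - f_\theta(x', y')| \le \omega_1 |x - x'| + \omega_2 |y - y'| \quad \text{for all } x, x' \ge 0 \text{ and } y, y' \in \mathbb{N}_0,$$
with constants $\omega_1, \omega_2 \ge 0$ satisfying $\omega_1 + \omega_2 < 1$, the process $\{(X_t, \lambda_t)\}$ is strictly stationary and ergodic, and $\lambda_t$ admits the expression $\lambda_t = f_\infty^\theta(X_{t-1}, X_{t-2}, \ldots)$ for some non-negative measurable function $f_\infty^\theta$ defined on $\mathbb{N}_0^\infty$.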
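As an illustration, assume the linear specification $f_\theta(x, y) = \omega + a x + b y$ with $a, b \ge 0$, which is the form most often used in practice. The contraction condition can then be verified directly by the triangle inequality:

```latex
\begin{aligned}
|f_\theta(x, y) - f_\theta(x', y')| &= |a(x - x') + b(y - y')| \\
&\le a\,|x - x'| + b\,|y - y'|,
\end{aligned}
```

so one may take $\omega_1 = a$ and $\omega_2 = b$, and the requirement $\omega_1 + \omega_2 < 1$ reduces to $a + b < 1$. Here the intercept $\omega$ should not be confused with the contraction constants $\omega_1$ and $\omega_2$.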
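Meanwhile, Basu, Harris, Hjort and Jones [33] considered the minimum density power divergence estimator (MDPDE) for model parameters using the density power divergence $d_\alpha$ between two density functions $g$ and $h$, defined by
$$d_\alpha(g, h) = \begin{cases} \displaystyle\int \left\{ h^{1+\alpha}(y) - \left(1 + \frac{1}{\alpha}\right) g(y)\, h^\alpha(y) + \frac{1}{\alpha}\, g^{1+\alpha}(y) \right\} dy, & \alpha > 0, \\[2ex] \displaystyle\int g(y) \left\{ \log g(y) - \log h(y) \right\} dy, & \alpha = 0. \end{cases}$$
Kim and Lee [36] studied the MDPDE for one-parameter exponential family INGARCH models. Given $X_1, \ldots, X_n$ generated from (1), the MDPDE is defined by
$$\hat\theta_{\alpha,n} = \operatorname*{arg\,min}_{\theta \in \Theta} \tilde H_{\alpha,n}(\theta) = \operatorname*{arg\,min}_{\theta \in \Theta} \frac{1}{n} \sum_{t=1}^n \tilde l_{\alpha,t}(\theta), \qquad \tilde l_{\alpha,t}(\theta) = \begin{cases} \displaystyle\sum_{y=0}^\infty p^{1+\alpha}(y \mid \tilde\eta_t(\theta)) - \left(1 + \frac{1}{\alpha}\right) p^\alpha(X_t \mid \tilde\eta_t(\theta)), & \alpha > 0, \\[2ex] -\log p(X_t \mid \tilde\eta_t(\theta)), & \alpha = 0, \end{cases} \tag{3}$$
where $\tilde\eta_t(\theta) = B^{-1}(\tilde\lambda_t(\theta))$ and $\tilde\lambda_t(\theta)$ is updated recursively through the equations
$$\tilde\lambda_t(\theta) = f_\theta(\tilde\lambda_{t-1}(\theta), X_{t-1}), \qquad t = 2, 3, \ldots,$$
with an initial value $\tilde\lambda_1$.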
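For count data, the integral in the density power divergence becomes a sum over $y \in \mathbb{N}_0$, and the MDPDE objective in effect replaces the cross term $\sum_y g(y)h^\alpha(y)$ by its empirical counterpart $h^\alpha(X_t)$ and drops the term that does not involve $\theta$. The Python sketch below illustrates both pieces for a Poisson INGARCH(1,1) model with $\lambda_t = \omega + a\lambda_{t-1} + bX_{t-1}$. It is a hedged illustration, not the authors' code: the infinite sum over $y$ is truncated at a data-dependent bound, and the initial value $\tilde\lambda_1$ defaults to the sample mean; both choices are assumptions of this sketch.

```python
import math

def poisson_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def dpd(g, h, alpha, support):
    """Density power divergence d_alpha(g, h) between two pmfs on a (truncated) support."""
    if alpha == 0.0:  # alpha = 0: Kullback-Leibler divergence
        return sum(g(y) * (math.log(g(y)) - math.log(h(y))) for y in support if g(y) > 0)
    return sum(h(y) ** (1 + alpha) - (1 + 1 / alpha) * g(y) * h(y) ** alpha
               + g(y) ** (1 + alpha) / alpha for y in support)

def mdpd_objective(x, theta, alpha, lam1=None):
    """H_{alpha,n}(theta) = (1/n) sum_t l_{alpha,t}(theta) for a Poisson INGARCH(1,1)."""
    omega, a, b = theta
    lam = sum(x) / len(x) if lam1 is None else lam1  # initial value (an assumption)
    total = 0.0
    for t, xt in enumerate(x):
        if t > 0:  # recursion lambda_t = omega + a*lambda_{t-1} + b*X_{t-1}
            lam = omega + a * lam + b * x[t - 1]
        if alpha == 0.0:
            total += -poisson_logpmf(xt, lam)  # alpha = 0: negative log-likelihood
        else:
            upper = int(lam + 20 * math.sqrt(lam) + 20)  # truncation point of the sum over y
            s = sum(math.exp((1 + alpha) * poisson_logpmf(y, lam)) for y in range(upper + 1))
            total += s - (1 + 1 / alpha) * math.exp(alpha * poisson_logpmf(xt, lam))
    return total / len(x)

g = lambda y: math.exp(poisson_logpmf(y, 2.0))
h = lambda y: math.exp(poisson_logpmf(y, 3.0))
print(dpd(g, h, 0.5, range(60)), dpd(g, h, 0.0, range(60)))
```

Minimizing `mdpd_objective` over $\theta$ (for example with any bounded numerical optimizer) gives an MDPDE sketch; larger $\alpha$ down-weights observations with small model probability, which is the source of robustness.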
Below, $\theta_0$ denotes the true value of $\theta$ and is assumed to be an interior point of the compact parameter space $\Theta \subset \mathbb{R}^d$
. Moreover, it is assumed that
,
,
a.s. implies
, and
a.s. implies
. Furthermore,
is twice continuously differentiable with respect to
and satisfies
Assuming
for some
, Kim and Lee [36] verified that the MDPDE is strongly consistent. Additionally, they showed that, provided
and
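where $V$ and $\rho \in (0, 1)$ denote a generic integrable random variable and a constant, respectively, the symbol $\|\cdot\|$ denotes the $l_1$-norm for matrices and vectors, and the expectation $E(\cdot)$ is taken under $\theta_0$, the MDPDE is asymptotically normal with asymptotic variance $J_\alpha^{-1} K_\alpha J_\alpha^{-1}$, where
$$J_\alpha = E\left( \frac{\partial^2 l_{\alpha,t}(\theta_0)}{\partial\theta\, \partial\theta^T} \right), \qquad K_\alpha = E\left( \frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta} \frac{\partial l_{\alpha,t}(\theta_0)}{\partial\theta^T} \right), \tag{4}$$
and $l_{\alpha,t}(\theta)$ is the same as $\tilde l_{\alpha,t}(\theta)$ with $\tilde\eta_t(\theta)$ in (3) replaced by $\eta_t(\theta)$.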
Moreover, additionally assuming
Kim and Lee [37] showed that the CUSUM test statistics designed for detecting a change in $\theta$ have the supremum of a Brownian bridge as their limiting null distribution. In practice, $\alpha \in [0, 1]$ is often used, and an optimal $\alpha$ can be selected through the method of Warwick [39] and Warwick and Jones [40]; see Remark 1 of Kim and Lee [36].
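As a sanity check on the sandwich form $J_\alpha^{-1} K_\alpha J_\alpha^{-1}$, take $J_\alpha$ to be the expected Hessian of $l_{\alpha,t}$ at $\theta_0$ and $K_\alpha$ the expected outer product of its gradient, and consider the Poisson case with $\alpha = 0$, where $l_{0,t}(\theta) = \lambda_t(\theta) - X_t \log \lambda_t(\theta) + \log X_t!$. Assuming the model is correctly specified and using $E(X_t \mid \mathcal{F}_{t-1}) = \mathrm{Var}(X_t \mid \mathcal{F}_{t-1}) = \lambda_t$, a short derivation gives (all quantities evaluated at $\theta_0$):

```latex
\begin{aligned}
\frac{\partial l_{0,t}(\theta)}{\partial\theta}
  &= \Big(1 - \frac{X_t}{\lambda_t(\theta)}\Big)\frac{\partial \lambda_t(\theta)}{\partial\theta}, \\
K_0 &= E\Big[\frac{(X_t - \lambda_t)^2}{\lambda_t^2}\,
        \frac{\partial \lambda_t}{\partial\theta}\frac{\partial \lambda_t}{\partial\theta^T}\Big]
     = E\Big[\frac{1}{\lambda_t}\,
        \frac{\partial \lambda_t}{\partial\theta}\frac{\partial \lambda_t}{\partial\theta^T}\Big], \\
J_0 &= E\Big[\frac{X_t}{\lambda_t^2}\,
        \frac{\partial \lambda_t}{\partial\theta}\frac{\partial \lambda_t}{\partial\theta^T}
        + \Big(1 - \frac{X_t}{\lambda_t}\Big)\frac{\partial^2 \lambda_t}{\partial\theta\,\partial\theta^T}\Big]
     = K_0 .
\end{aligned}
```

Hence $J_0^{-1} K_0 J_0^{-1} = K_0^{-1}$, the inverse of the conditional Fisher information, which is the familiar MLE result. For $\alpha > 0$, $J_\alpha \ne K_\alpha$ in general, and the sandwich form reflects the efficiency given up in exchange for robustness.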
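In the literature, the following linear INGARCH model has been frequently used:
$$\lambda_t = f_\theta(\lambda_{t-1}, X_{t-1}) = \omega + a\lambda_{t-1} + bX_{t-1},$$
where $\theta = (\omega, a, b)^T$, and $\omega > 0$ and $a, b \ge 0$ satisfy $a + b < 1$. Here, we assume that $\theta_0$ is an interior point of a compact neighborhood $\Theta \subset \{(\omega, a, b)^T : \omega \ge \delta,\ a, b \ge 0,\ a + b \le 1 - \delta\}$ for some $\delta \in (0, 1)$. Moreover, the Poisson INGARCH(1,1) model with $X_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$ and the NB-INGARCH(1,1) model with $X_t \mid \mathcal{F}_{t-1} \sim \mathrm{NB}(r, p_t)$, $p_t = r/(r + \lambda_t)$, where $\mathrm{NB}(r, p)$ denotes a negative binomial (NB) distribution with parameters $r$ and $p$, satisfy the aforementioned regularity conditions. These conditions should be checked analytically when one aims to use a specific distribution as the conditional distribution of the INGARCH model. In addition, a goodness-of-fit test can be conducted to check the adequacy of the assumed underlying distribution (Fokianos and Neumann [41]).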
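To make the linear Poisson INGARCH(1,1) model concrete, the following Python sketch simulates it with standard-library tools only. The Poisson sampler uses Knuth's multiplication method, which is adequate for the moderate intensities used here but slow for large $\lambda_t$; the burn-in length and the starting value (the stationary mean $\omega/(1-a-b)$) are illustrative choices, not part of the model, and the function names are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_poisson_ingarch(n, omega, a, b, seed=0, burn=500):
    """Simulate X_t | F_{t-1} ~ Poisson(lambda_t), lambda_t = omega + a*lambda_{t-1} + b*X_{t-1}."""
    if not (omega > 0 and a >= 0 and b >= 0 and a + b < 1):
        raise ValueError("need omega > 0, a, b >= 0 and a + b < 1")
    rng = random.Random(seed)
    lam = omega / (1 - a - b)          # start at the stationary mean
    x_prev = rpois(lam, rng)
    xs, lams = [], []
    for t in range(burn + n):
        lam = omega + a * lam + b * x_prev
        x_prev = rpois(lam, rng)
        if t >= burn:                  # discard the burn-in period
            xs.append(x_prev)
            lams.append(lam)
    return xs, lams

x, lam = simulate_poisson_ingarch(20000, 1.0, 0.3, 0.3, seed=1)
print(sum(x) / len(x))  # should be near omega / (1 - a - b) = 2.5
```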
3. MDPDE-Based Monitoring Process
In this section, we consider a monitoring process for detecting a significant change in the underlying model based on a sequentially observed time series
following Model (1), given a training sample
from Model (1), where
is a sequence of positive integers that diverges to ∞ as n tends to ∞. For this task, we set up the following hypotheses:
We first consider the case in which
is known a priori from past experience. Then we consider the monitoring process using the process
,
, constructed as
where
is the score vector as in (3) based on
and
where
is the score vector based on the training sample. Here, the notation
with
is defined to be the vector whose
jth entry equals
for
, and
for
. Similar versions of
and
based on the MLE have been considered by Gombay and Serban [30] and Huh, Kim and Lee [31] for the AR and Poisson INGARCH models, respectively, while
is newly considered here. An anomaly is signaled at k when
,
, or
crosses the control limit for some
, and the control limit can be determined using the convergence result in Theorem 1 below.
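The mechanism behind these score-based processes can be illustrated with a deliberately simplified, hypothetical detector. The sketch below is not the statistic defined in this section: it assumes a known $\theta_0$, uses the MLE score ($\alpha = 0$) of a Poisson INGARCH(1,1) model, and standardizes each component of the partial score sum $\sum_{t=m+1}^{m+k} s_t$ by $\sqrt{m \hat K_{jj}}$, where $\hat K_{jj}$ is the training-sample second moment of the $j$th score component. Under no change the path stays of order one; after a change the score acquires a drift and the path grows roughly linearly in $k$.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(n, theta_before, theta_after, change_at, seed=0):
    """Poisson INGARCH(1,1) whose parameters switch at index change_at."""
    rng = random.Random(seed)
    omega, a, b = theta_before
    lam = omega / (1 - a - b)
    x_prev, xs = rpois(lam, rng), []
    for t in range(n):
        omega, a, b = theta_before if t < change_at else theta_after
        lam = omega + a * lam + b * x_prev
        x_prev = rpois(lam, rng)
        xs.append(x_prev)
    return xs

def scores(x, theta):
    """Log-likelihood scores (X_t / lambda_t - 1) * d lambda_t / d theta, theta = (omega, a, b)."""
    omega, a, b = theta
    lam, grad, out = omega / (1 - a - b), [0.0, 0.0, 0.0], []
    for t in range(len(x)):
        if t > 0:
            # derivative recursion uses lambda_{t-1} before it is overwritten
            grad = [1 + a * grad[0], lam + a * grad[1], x[t - 1] + a * grad[2]]
            lam = omega + a * lam + b * x[t - 1]
        out.append([(x[t] / lam - 1.0) * g for g in grad])
    return out

def monitor(x, theta0, m):
    """Hypothetical detector: max_j |sum_{t=m+1}^{m+k} s_tj| / sqrt(m * K_jj), k = 1, 2, ..."""
    s = scores(x, theta0)
    k_diag = [sum(v[j] ** 2 for v in s[:m]) / m for j in range(3)]  # training-sample K_jj
    partial, path = [0.0, 0.0, 0.0], []
    for v in s[m:]:
        partial = [p + vj for p, vj in zip(partial, v)]
        path.append(max(abs(p) / math.sqrt(m * kj) for p, kj in zip(partial, k_diag)))
    return path

theta0, m, n = (1.0, 0.3, 0.3), 500, 1000
null_path = monitor(simulate(m + n, theta0, theta0, m + n, seed=1), theta0, m)
alt_path = monitor(simulate(m + n, theta0, (3.0, 0.3, 0.3), m + 200, seed=1), theta0, m)
print(max(null_path), max(alt_path))  # the second is typically much larger
```

An alarm would be raised at the first $k$ where the path exceeds a control limit; replacing the MLE score with the MDPDE score of $l_{\alpha,t}$ is what makes the procedure in this section robust.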
Next, we consider the situation in which
is unknown and must be estimated in the construction of the monitoring process in (5). We employ a monitoring process constructed based on
, where
is the MDPDE of
obtained from the training sample, and
which is obtained by substituting
in
in (6) with
, namely,
An anomaly is detected at k when any of the three resulting monitoring processes crosses its control limit. The control limit can be determined theoretically using the asymptotic result in Theorem 1 below. To this end, we investigate the asymptotic behavior of the monitoring processes defined below.
Let
, where
and
are the ones in (4), and
Using Donsker’s invariance principle for martingale differences (Billingsley [42]) and the fact that
in distribution for any standard Brownian motion B, we obtain
where
and denote a d-dimensional standard Brownian motion, so that
as
behaves asymptotically similarly to
. Meanwhile, we can see that
where
is a d-dimensional Brownian bridge.
These facts lead to the following theorem, whose proof is provided in Appendix A.
Theorem 1. Assume that (A.1)–(A.11) hold. Then, under , as , and converge to T in distribution, and the same holds for and if . Moreover, converges to in distribution as , and so does if .
The result in Theorem 1 can be used to determine a control limit for the monitoring process. Given significance level , we take c and satisfying . In particular, , so that c can be obtained from the fact that . The performance of the proposed CUSUM monitoring methods is evaluated in our simulation study, focusing on , , and . (We do not report the results for and , as these do not perform well compared to the others in most cases.) Therein, a parametric bootstrap is adopted to obtain control limits, in order to reduce the parameter estimation effect, which can be more problematic when m is not large compared to n; the MDPDE from the training sample is used to generate the bootstrap samples.
4. Simulation Results
In this section, we compare the performance of the CUSUM monitoring processes
,
, and
in three different experimental environments for the Poisson INGARCH(1,1) model as follows:
For the comparison, we compute the empirical sizes and powers at the nominal level of 0.05 for
with 1000 repetitions. For the critical value of
, we use 2.633, which is the 0.95 quantile of
. However, for
and
, we use critical values obtained from a parametric bootstrap method, as the MDPDE
might cause some size distortions. In the implementation, the warp-speed bootstrap method is used to save computing time (Giacomini, Politis and White [43]).
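The bootstrap step can be sketched generically. A parametric bootstrap critical value is the empirical $(1 - \text{level})$ quantile of the monitoring statistic computed on series simulated from the fitted model. In the warp-speed method of Giacomini, Politis and White, each Monte Carlo replication contributes a single bootstrap draw generated from that replication's own estimate, and the pooled draws across replications serve as the bootstrap distribution. The `simulate(theta, rng)` and `statistic(series)` callables below are hypothetical interfaces chosen for this sketch, and the order-statistic quantile rule is one of several conventions.

```python
import math
import random

def _upper_quantile(draws, level):
    """ceil((1 - level) * B)-th smallest of B draws."""
    draws = sorted(draws)
    idx = min(len(draws) - 1, max(0, math.ceil((1 - level) * len(draws)) - 1))
    return draws[idx]

def bootstrap_critical_value(simulate, statistic, theta_hat, n_boot=500, level=0.05, seed=0):
    """Standard parametric bootstrap: n_boot series simulated from one fitted theta_hat."""
    rng = random.Random(seed)
    return _upper_quantile([statistic(simulate(theta_hat, rng)) for _ in range(n_boot)], level)

def warp_speed_critical_value(estimates, simulate, statistic, level=0.05, seed=0):
    """Warp-speed idea: one bootstrap draw per Monte Carlo replication, pooled across replications."""
    rng = random.Random(seed)
    return _upper_quantile([statistic(simulate(th, rng)) for th in estimates], level)

# Toy check: the "statistic" is a single N(theta, 1) draw, so the 0.95 quantile is about 1.645.
toy_sim = lambda th, rng: rng.gauss(th, 1.0)
print(bootstrap_critical_value(toy_sim, lambda v: v, 0.0, n_boot=4000, seed=7))
```

The warp-speed variant replaces an inner bootstrap loop of size $B$ per replication with a single draw, which is why it saves computing time in large Monte Carlo studies.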
-Part 1. We compare the performance of MLE- and MDPDE-based monitoring processes () by calculating the size and power for the four different cases of changing parameter from to when the parameter change is assumed to occur at .
Case 1: , , ; that is, all parameters change;
Case 2: , , ; that is, only changes;
Case 3: , , ; that is, only a changes;
Case 4: , , ; that is, only b changes.
-Part 2. We examine the size and power for the same settings as in Part 1 when the change occurs at .
-Part 3. We compare the performance of MLE- and MDPDE-based monitoring processes () for the same settings as in Part 1 when outliers exist in the time series, wherein the parameter change is assumed to occur at . In this case, time series samples are generated from where is the INGARCH process with the parameters as in Part 1, are iid Bernoulli random variables with success probability p, and are iid Poisson variables with intensity . Here, , and are all independent.
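A contaminated series with the three independent ingredients described in Part 3 can be generated as follows. This is a sketch assuming additive outliers of the form $Y_t = X_t + P_t X_{o,t}$, where $X_t$ is a Poisson INGARCH(1,1) process, $P_t \sim \text{Bernoulli}(p)$ and $X_{o,t} \sim \text{Poisson}(\gamma)$; the symbols $Y_t$, $P_t$, $X_{o,t}$, $\gamma$ and the function name are ours. The outliers are added to the observations only and do not feed back into the intensity recursion.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def contaminated_ingarch(n, theta, p, gamma, seed=0):
    """Y_t = X_t + P_t * Xo_t; X_t Poisson INGARCH(1,1), P_t ~ Bernoulli(p), Xo_t ~ Poisson(gamma)."""
    rng = random.Random(seed)
    omega, a, b = theta
    lam = omega / (1 - a - b)
    x_prev = rpois(lam, rng)
    ys = []
    for _ in range(n):
        lam = omega + a * lam + b * x_prev       # clean INGARCH(1,1) recursion
        x_prev = rpois(lam, rng)                 # clean observation X_t
        p_t = 1 if rng.random() < p else 0       # Bernoulli(p) outlier indicator P_t
        xo_t = rpois(gamma, rng) if p_t else 0   # outlier size Xo_t, drawn only when needed
        ys.append(x_prev + p_t * xo_t)
    return ys

clean = contaminated_ingarch(20000, (1.0, 0.3, 0.3), p=0.0, gamma=10.0, seed=3)
dirty = contaminated_ingarch(20000, (1.0, 0.3, 0.3), p=0.05, gamma=10.0, seed=3)
print(sum(clean) / len(clean), sum(dirty) / len(dirty))  # second is roughly p * gamma = 0.5 higher
```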
Figure 1 shows how the parameter change affects the pattern of the Poisson INGARCH(1,1) time series (Case 3) with
,
, and
for the left panel and
for the right panel. As $E(X_t) = \omega/(1 - a - b)$ for the stationary linear Poisson INGARCH(1,1) process, we can see that the parameter change causes a mean shift.
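The mean-shift effect follows from a one-line calculation, assuming the linear specification $\lambda_t = \omega + a\lambda_{t-1} + bX_{t-1}$ and stationarity with $a + b < 1$. Since $E(X_t \mid \mathcal{F}_{t-1}) = \lambda_t$, we have $E(X_t) = E(\lambda_t) =: \mu$, and taking expectations on both sides of the recursion gives

```latex
\mu = \omega + a\mu + b\mu
\quad\Longrightarrow\quad
\mu = \frac{\omega}{1 - a - b}.
```

Hence a change in $a$ alone (Case 3) moves the mean from $\omega/(1 - a - b)$ to $\omega/(1 - a' - b)$. For instance, with $\omega = 1$ and $b = 0.3$, moving $a$ from 0.3 to 0.5 would raise the mean from 2.5 to 5; these numbers are purely illustrative and are not the settings used in Figure 1.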
Table 1, Table 2 and Table 3 list the sizes and powers for Part 1 (
therein stands for the location of the change point) and show no severe size distortions and reasonably good powers for
. In particular,
and
largely outperform
in terms of power. However, as seen in Table 4, Table 5, Table 6, Table 7 and Table 8, the power of
in Part 2 appears to approach that of
. In both Part 1 and Part 2, different values of $\alpha$ do not affect the size much, but a larger $\alpha$ tends to diminish the power. This agrees with intuition, since the MLE is more efficient when no outliers are present.
Meanwhile, Table 9, Table 10, Table 11 and Table 12 show that outliers undermine the performance of the MLE-based monitoring processes in terms of both size and power: size distortions are notable, and the power decreases to a certain extent. In particular, this result indicates that
is improved when the MDPDE with
is used, which demonstrates the efficacy of the MDPDE in the monitoring process. By contrast, the size of
increases significantly when
, indicating that
is unstable; see Figure 2. Although not reported here, we also examined the performance of the same monitoring processes for NB INGARCH(1,1) models, and the results showed a pattern similar to that of the Poisson INGARCH(1,1) case. All our findings indicate that
is the most favorable among the monitoring methods considered in this study.
5. Real Data Analysis
In this section, we apply
to a real dataset, using the extreme events of the daily log-returns of GS stock from 2 July 2007 to 28 February 2020. Davis and Liu [11] and Kim and Lee [37] used GS stock datasets covering different periods, but their work focused on parameter estimation and retrospective change point testing. For online monitoring, we first calculated the hitting times
at which the log-returns of GS stock fall outside the 0.05 and 0.95 quantiles of the data, and generated the time series of counts
,
.
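The construction of the count series can be sketched as follows, assuming the counts are the gaps between consecutive hitting times, $X_i = \tau_i - \tau_{i-1}$ (our reading of "return times of extreme events"). The quantile rule (the $\lceil qn \rceil$-th order statistic) and the function names are hypothetical choices for this sketch; other quantile conventions can shift the thresholds slightly.

```python
import math

def empirical_quantile(values, q):
    """Order-statistic quantile: the ceil(q*n)-th smallest value (one of several conventions)."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[idx]

def return_times(log_returns, lower=0.05, upper=0.95):
    """Gaps between consecutive days whose log-return falls outside the [lower, upper] quantiles."""
    lo = empirical_quantile(log_returns, lower)
    hi = empirical_quantile(log_returns, upper)
    hits = [t for t, r in enumerate(log_returns, start=1) if r < lo or r > hi]  # hitting times tau_i
    return [t2 - t1 for t1, t2 in zip(hits, hits[1:])]  # X_i = tau_i - tau_{i-1}

r = [0.0] * 100
r[9], r[29], r[59] = -0.1, 0.1, -0.1  # three extreme days at t = 10, 30, 60
print(return_times(r))  # [20, 30]
```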
Figure 3 plots
and reveals a number of outliers. Fitting the Poisson INGARCH(1,1) model to the whole set of observations, we obtain the MLE
and the MDPDE
=
when
is used. The substantial difference between the two estimates appears to be due to the presence of outliers. Using
as a training sample and viewing
,
as sequentially observed test data, we implement the monitoring process
with
to detect a parameter change. As a result, an anomaly is detected when
for
(blue vertical line) and
for
(red vertical line). This indicates that the MLE-based monitoring process is sensitive to relatively small outliers lying around
, whereas the MDPDE-based process is robust to those outliers and detects a more significant change around
, ignoring the smaller ones. Indeed, Figure 3 shows that
fluctuates more after
. This finding supports the adequacy of the MDPDE-based monitoring process in the presence of outliers.