1. Introduction
Categorical time series data are collected in many fields of application, and the statistical research focusing on such data structures has evolved considerably in recent years. As an important special case, binary time series, which correspond to categorical data with two categories, occur in many different contexts. Often, binary time series are obtained by binarization of observed real-valued data. Such processes are considered, e.g., in Kedem and Fokianos (2002). In
Figure 1, we show three real data examples of binary time series from different fields of research. For example, in Figure 1a, the eruption duration of the Old Faithful Geyser in Yellowstone National Park is binarized using a threshold: an eruption is coded as one if it lasts longer than three minutes and as zero if it is shorter. In economics, the two states of recession and economic growth are of interest, as discussed, e.g., in Bellégo (2009). One example of a recession/no-recession time series is shown in Figure 1b, which indicates for every quarter whether Italy is in a recession (zero) or not (one). Recently, there has been great interest in air pollution in European cities, where an exceedance of the threshold of 50 μg/m³ of PM₁₀ (fine dust) triggers a fine dust alarm. The resulting sequence of states, with no exceedance corresponding to zero and exceedance corresponding to one, is shown in Figure 1c. Further examples can be found, e.g., in geography, where sequences with the two states of dry and wet days are considered, e.g., in Buishand (1978). In biomedical studies, binary time series occur when participants keep daily diaries of their disease. For example, in clinical trials, as in Fitzmaurice et al. (1995), the binary self-assessments of participants regarding their arthritis are collected, where poor is indicated by zero and good by one. In natural language processing, the occurrence of vowels as a sequence can be of interest, as considered in Weiß (2009b), where a text is binarized by detecting consonants and vowels as the two states. The binarization of a time series by a threshold, as, e.g., in the PM₁₀ example, or by categorizing the time series into two states, as, e.g., into dry and wet days, indeed simplifies the real-valued time series to a binary version. As mentioned in Kedem (1980), the transformation nevertheless keeps the random mechanism from which the data are generated. For the example of PM₁₀ data, it is often of more interest whether a certain threshold is crossed (or not) rather than the actual amount. In general, the rhythm within the binarized time series contains a great amount of information about the original data.
As discussed in Kedem (1980), binary Markov chains are typically used for modelling the dependence structure due to their great flexibility. However, the number of parameters to be estimated from the data grows exponentially with the order of the Markov model, leading to over-parametrization (see, e.g., McKenzie (2003)).
To avoid the estimation of a large number of parameters, Jacobs and Lewis (1983) proposed the class of (new) discrete autoregressive moving-average (NDARMA) models for categorical time series. More precisely, for processes with discrete and finite state space, a parsimonious model is suggested. The idea is to choose the current value $X_t$ randomly either from the past values $X_{t-1},\ldots,X_{t-p}$ of the time series or from one of the innovations $e_t,\ldots,e_{t-q}$, with certain probabilities, respectively. This random selection mechanism is described by independent and identically distributed (i.i.d.) random vectors $D_t=(a_t^{(1)},\ldots,a_t^{(p)},b_t^{(0)},\ldots,b_t^{(q)})'$ with
$$D_t \sim \text{Mult}(1;\alpha_1,\ldots,\alpha_p,\beta_0,\ldots,\beta_q), \qquad (1)$$
where $\text{Mult}$ denotes the multinomial distribution with parameter 1 and probability vector $(\alpha_1,\ldots,\alpha_p,\beta_0,\ldots,\beta_q)'$ with $\alpha_i\geq 0$, $i=1,\ldots,p$, and $\beta_j\geq 0$, $j=0,\ldots,q$, such that $\sum_{i=1}^{p}\alpha_i+\sum_{j=0}^{q}\beta_j=1$. Then, the NDARMA(p,q) model equation is given by
$$X_t = a_t^{(1)}X_{t-1}+\cdots+a_t^{(p)}X_{t-p}+b_t^{(0)}e_t+\cdots+b_t^{(q)}e_{t-q}, \qquad (2)$$
where $(e_t,\,t\in\mathbb{Z})$ is an i.i.d. process taking values in a discrete and finite state space $S$. Since for each time point $t$ only one entry of the random vector $D_t$ is realized to be one while all others become zero, $X_t$ takes either one of the values $X_{t-i}$ for $i=1,\ldots,p$ or one of the error terms $e_{t-j}$ for $j=0,\ldots,q$. This sampling mechanism assures that the time series takes values in the state space $S$, such that, e.g., for a binary time series with $S=\{0,1\}$, the process stays binary. In contrast to the real-valued ARMA model, the lagged time series values and errors are not weighted according to the model coefficients and summed up, since, based on the realization of $D_t$, only one of them is actually multiplied by one and all the others by zero.
The model parameters are the probabilities of the multinomial distribution, summarized in the parameter vector $P=(\alpha_1,\ldots,\alpha_p,\beta_0,\ldots,\beta_q)'$, where all entries of $P$ lie in the unit interval and sum up to one. In comparison to Markov chains, NDARMA models maintain the nicely interpretable ARMA-type structure and have a parsimonious parameterization. Furthermore, NDARMA models fulfill certain Yule–Walker-type equations, as shown in Weiß and Göb (2008).
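To illustrate the random selection mechanism behind Equations (1) and (2), the following Python sketch simulates an NDARMA(p,q) path on $S=\{0,1\}$; the Bernoulli(1/2) innovation distribution, the initialization, and the function name are illustrative choices, not part of the original model specification.

```python
import numpy as np

def simulate_ndarma(n, alphas, betas, rng=None):
    """Simulate an NDARMA(p,q) path on the binary state space {0,1}.

    alphas: probabilities alpha_1..alpha_p for selecting X_{t-1}..X_{t-p}
    betas:  probabilities beta_0..beta_q for selecting e_t..e_{t-q}
    All entries must be non-negative and sum to one.
    """
    rng = np.random.default_rng(rng)
    p, q = len(alphas), len(betas) - 1
    probs = np.concatenate([alphas, betas])
    assert np.all(probs >= 0) and np.isclose(probs.sum(), 1.0)

    e = rng.integers(0, 2, size=n + q)   # i.i.d. Bernoulli(1/2) innovations
    x = np.empty(n + p, dtype=int)
    x[:p] = rng.integers(0, 2, size=p)   # arbitrary start values (burn-in omitted)
    for t in range(p, n + p):
        k = rng.choice(p + q + 1, p=probs)   # D_t ~ Mult(1; alpha_1..alpha_p, beta_0..beta_q)
        x[t] = x[t - (k + 1)] if k < p else e[t - k + q]
    return x[p:]

# NDAR(1) with alpha_1 = 0.8: long runs of identical values, as in Figure 2
path = simulate_ndarma(200, alphas=[0.8], betas=[0.2], rng=1)
```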
In Figure 2, one realization of an NDARMA(1,0) process, denoted by NDAR(1), with binary state space is shown. NDAR(1) models are probably the simplest members of the NDARMA class, but Figure 2 nicely illustrates the limited flexibility of the whole NDARMA class. The sampling mechanism of choosing the predecessor with some probability $\alpha_1$ tends to generate long runs of the same value, in particular when the parameter $\alpha_1$ is large. A switch from one state to the other, e.g., from $X_{t-1}=0$ to $X_t=1$, can only occur if the error term $e_t$ is selected (with probability $\beta_0$) and the error term takes the value $e_t=1$. Hence, the NDARMA class does not allow systematically selecting the opposite value of $X_{t-1}$ for $X_t$.
As all model parameters of the NDARMA class are restricted to be non-negative, the NDARMA class can model exclusively non-negative autocorrelations in the data. For the example of an NDAR(1) process, the autocorrelation at lag one is equal to $\rho(1)=\alpha_1\geq 0$, such that any alternating pattern that corresponds to negative model parameters, as, e.g., observed in Figure 1a, cannot be captured. For a more detailed discussion of the properties of NDARMA models, we refer also to Jacobs and Lewis (1983) or Weiß (2009a). To increase its flexibility, Gouveia et al. (2018) proposed an extension of the NDARMA model class by using a variation function, but the resulting models do not allow for negative model parameters either and, hence, no negative dependence structure. Hence, whenever a negative dependence structure is present in binary time series data, the NDARMA model is not suitable. In fact, in all three data examples of Figure 1, a straightforward estimation based on Yule–Walker estimators leads to at least some negative coefficients, such that NDAR models turn out not to be applicable.
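The claim $\rho(1)=\alpha_1$ can be verified by a short covariance computation; the following display is a sketch under the NDAR(1) notation above, using that $D_t$ is independent of the past and that $e_t$ is independent of $X_{t-1}$:

```latex
% Lag-one autocovariance of the NDAR(1) model X_t = a_t^{(1)} X_{t-1} + b_t^{(0)} e_t
\begin{align*}
\operatorname{Cov}(X_t, X_{t-1})
  &= \operatorname{Cov}\bigl(a_t^{(1)} X_{t-1},\, X_{t-1}\bigr)
   + \operatorname{Cov}\bigl(b_t^{(0)} e_t,\, X_{t-1}\bigr) \\
  &= \alpha_1 \operatorname{Var}(X_{t-1}) + 0,
\end{align*}
% so that rho(1) = alpha_1 >= 0: alternating patterns cannot be captured.
```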
To address this lacking flexibility of the NDARMA model class, we propose a simple and straightforward extension of the original idea of Jacobs and Lewis (1983) that also allows for negative serial dependence. The resulting generalized binary ARMA (gbARMA) model class maintains the nicely interpretable model structure. Furthermore, no additional parameters are required to handle the negative dependence, preserving the parsimonious parameterization as well. In Figure 3, a realization of a gbARMA(1,0) process, denoted as gbAR(1), is shown. As a straightforward extension of the NDAR(1) model in Figure 2, gbAR(1) models allow for negative serial dependence. In fact, the range of the autocorrelation at lag one is extended from $[0,1)$ for NDAR(1) to $(-1,1)$ for gbAR(1) models.
To allow for negative autocorrelation to some limited extent, Kanter (1975) proposed the binary ARMA model class, where he applied the modulo 2 operator in an ARMA-type model equation. Using the modulo operation assures that the process stays in the binary state space, but the nice interpretability of the dependence structure in the model is lost, since the past values of the time series are summed up prior to the modulo operation; see also McKenzie (1981). We follow a different path in this paper and propose a much simpler operation that enables modeling a systematic change of the state from one time point to the other.
The idea of allowing for negative serial dependence resulting in the gbARMA class is as follows: a negative model parameter $\alpha_1$ (and hence a negative autocorrelation $\rho(1)$) in binary time series data corresponds to the time series $(X_t)$ systematically changing from one state to the other over time. Hence, the natural idea to incorporate negative serial dependence in the binary NDAR(1) model (Equation (2)) is to replace $X_{t-1}$ by $1-X_{t-1}$, as $1-X_{t-1}\in\{0,1\}$ holds. This leads to the model equation
$$X_t = a_t^{(1)}\,(1-X_{t-1}) + b_t^{(0)}\,e_t. \qquad (3)$$
This process has negative autocorrelation $\rho(1)=\alpha_1<0$ at lag one. Note that, in comparison to Equation (2), as $\alpha_1<0$ here, we have to use its absolute value $|\alpha_1|$ as the probability to select the switched predecessor $1-X_{t-1}$. Altogether, for $\alpha_1\in(-1,1)$, we can define the generalized binary AR(1) (gbAR(1)) process by the model equation
$$X_t = \begin{cases} a_t^{(1)}\,X_{t-1} + b_t^{(0)}\,e_t, & \alpha_1\geq 0,\\ a_t^{(1)}\,(1-X_{t-1}) + b_t^{(0)}\,e_t, & \alpha_1<0. \end{cases} \qquad (4)$$
Note that Equation (4) extends the parameter space from $[0,1)$ for NDAR(1) models to $(-1,1)$ for gbAR(1) models. Further, note that, for identification of the model, we have to assume $\beta_0>0$. Using indicator variables, Equation (4) can be compactly written as
$$X_t = a_t^{(1)}\,\big(\mathbb{1}(\alpha_1\geq 0)\,X_{t-1} + \mathbb{1}(\alpha_1<0)\,(1-X_{t-1})\big) + b_t^{(0)}\,e_t \qquad (5)$$
with $(a_t^{(1)},b_t^{(0)})'\sim\text{Mult}(1;|\alpha_1|,\beta_0)$, $|\alpha_1|+\beta_0=1$, and $\mathbb{1}(\cdot)$ denoting the indicator function.
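To illustrate Equations (4) and (5), the following Python sketch simulates a gbAR(1) path; as before, the Bernoulli(1/2) innovations and the function name are illustrative assumptions:

```python
import numpy as np

def simulate_gbar1(n, alpha1, rng=None):
    """Simulate a gbAR(1) path on {0,1}; alpha1 in (-1, 1), beta0 = 1 - |alpha1| > 0.

    With probability |alpha1|, the (switched, if alpha1 < 0) predecessor is
    selected; otherwise the innovation e_t is taken.
    """
    rng = np.random.default_rng(rng)
    e = rng.integers(0, 2, size=n)            # i.i.d. Bernoulli(1/2) innovations
    x = np.empty(n, dtype=int)
    x[0] = e[0]
    for t in range(1, n):
        if rng.random() < abs(alpha1):        # a_t^(1) = 1: take (switched) predecessor
            x[t] = x[t - 1] if alpha1 >= 0 else 1 - x[t - 1]
        else:                                 # b_t^(0) = 1: take innovation e_t
            x[t] = e[t]
    return x

x = simulate_gbar1(10_000, alpha1=-0.7, rng=1)
rho1 = np.corrcoef(x[:-1], x[1:])[0, 1]       # sample rho(1), close to alpha1 = -0.7
```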
In Figure 3, a realization of a gbAR(1) process with negative parameter $\alpha_1$ is shown, where the time series tends to systematically take the opposite state of its predecessor. The corresponding autocorrelation plot reflects the negative serial dependence leading to an alternating pattern. Runs of the same state can only occur when the error term $e_t$ is selected (with probability $\beta_0$) and the error term takes the same value as the predecessor, that is, $e_t=X_{t-1}$. The empirical autocorrelations for the Old Faithful Geyser data can be found in Figure 4a, where the pronounced alternating behavior clearly indicates negative linear dependence to be present in the data.
The idea of allowing for a negative model coefficient by replacing $X_{t-1}$ by $1-X_{t-1}$ in gbAR(1) processes (Equation (5)) can also be employed for each parameter of $p$th-order gbAR processes, where each $X_{t-i}$, $i=1,\ldots,p$, may be replaced by $1-X_{t-i}$ in the model equation, as sketched below.
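The per-lag switching just described leads to the following Python sketch of a gbAR(p) simulator; only the constraint $\sum_{i=1}^{p}|\alpha_i|+\beta_0=1$ with $\beta_0>0$ is taken from the text, while the innovation distribution and the names are again illustrative:

```python
import numpy as np

def simulate_gbar(n, alphas, rng=None):
    """Simulate a gbAR(p) path on {0,1}; alphas may contain negative entries.

    Lag i is selected with probability |alpha_i|; if alpha_i < 0, the switched
    value 1 - X_{t-i} is used. The innovation is taken with beta0 = 1 - sum|alpha_i|.
    """
    rng = np.random.default_rng(rng)
    alphas = np.asarray(alphas, dtype=float)
    p = len(alphas)
    beta0 = 1.0 - np.abs(alphas).sum()
    assert beta0 > 0, "identification requires beta_0 > 0"
    probs = np.append(np.abs(alphas), beta0)  # Mult(1; |alpha_1|,...,|alpha_p|, beta_0)
    x = np.empty(n + p, dtype=int)
    x[:p] = rng.integers(0, 2, size=p)
    for t in range(p, n + p):
        k = rng.choice(p + 1, p=probs)
        if k < p:                             # lag k+1 selected, switched if alpha < 0
            x[t] = x[t - (k + 1)] if alphas[k] >= 0 else 1 - x[t - (k + 1)]
        else:                                 # innovation e_t ~ Bernoulli(1/2)
            x[t] = rng.integers(0, 2)
    return x[p:]

# Example: gbAR(2) with one negative and one positive coefficient
x2 = simulate_gbar(500, alphas=[-0.5, 0.3], rng=2)
```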
The paper is organized as follows. In Section 2, generalized binary AR processes of order $p$ are defined, where we also give stationarity conditions and state the stationary solution. Further, stochastic properties are derived that include formulas for the transition probabilities, the marginal distribution, and Yule–Walker equations. As a real data example, we illustrate the applicability of our model class using the geyser eruption data from Section 1. In Section 3, we present several simulation experiments. First, in Section 3.1, for the example of a gbAR(2) model, we illustrate the generality of the resulting gbAR model class in comparison to natural competitors, including AR, NDAR, and Markov models of order two, respectively. In Section 3.2.1, we examine the estimation performance of Yule–Walker estimators in gbAR models. In Section 3.2.2, we investigate the benefit of using the parsimonious gbAR models in comparison to Markov models and their robustness in cases where the model is misspecified. By adding a moving-average part to gbAR models in Section 4, ARMA-type extensions of gbAR models leading to gbARMA processes are discussed. We conclude in Section 5. All proofs are deferred to Appendix A.
4. Further Extension: The Generalized Binary ARMA Class
In this section, we extend the gbAR model class and give a definition of generalized binary ARMA (gbARMA) models that additionally contain a moving-average part in their model equations. In the spirit of the gbAR model as an extension of the NDAR model class, we also allow for negative parameters in the moving-average part of the model.
First, we provide the definition of the gbARMA(p,q) model, derive its stationary solution, and state some basic properties of marginal, joint, and transition probabilities of gbARMA(p,q) processes. We conclude this section with an example of a gbARMA(1,1) process.
4.1. gbARMA Models
To be most flexible, the gbARMA model class additionally allows for negative parameters to capture negative dependence structure also in the moving-average part. As before, we assume $\beta_0>0$ for identification reasons. In the gbARMA(p,q) model class, the parameters $\alpha_i$, $i=1,\ldots,p$, and $\beta_j$, $j=1,\ldots,q$, are allowed to be either positive or negative. To modify the parameter vector $P=(\alpha_1,\ldots,\alpha_p,\beta_0,\ldots,\beta_q)'$ again such that it contains the probabilities, we define
$$|P| = (|\alpha_1|,\ldots,|\alpha_p|,\beta_0,|\beta_1|,\ldots,|\beta_q|)'. \qquad (21)$$
Definition 2 (Generalized binary ARMA processes). Let $(X_t,\,t\in\mathbb{Z})$ be a stationary process which takes values in $\{0,1\}$. Let $(e_t,\,t\in\mathbb{Z})$ be an i.i.d. binary error process such that $e_t$ is independent of $(X_s,\,s<t)$, with mean $\mu_e$ and variance $\sigma_e^2$. Let $P=(\alpha_1,\ldots,\alpha_p,\beta_0,\ldots,\beta_q)'$ be the parameter vector with $|P|$ as in Equation (21) such that $\sum_{i=1}^{p}|\alpha_i|+\sum_{j=0}^{q}|\beta_j|=1$. Further, let
$$D_t = (a_t^{(1)},\ldots,a_t^{(p)},b_t^{(0)},\ldots,b_t^{(q)})' \sim \text{Mult}(1;|P|)$$
be i.i.d. random vectors, which are independent of $(X_t)$ and $(e_t)$. Then, the process $(X_t,\,t\in\mathbb{Z})$ is said to be a generalized binary ARMA(p,q) process, if it follows the recursion
$$X_t = \sum_{i=1}^{p} a_t^{(i)}\,\tilde{X}_{t-i} + b_t^{(0)}\,e_t + \sum_{j=1}^{q} b_t^{(j)}\,\tilde{e}_{t-j} \qquad (22)$$
with $\tilde{X}_{t-i} = \mathbb{1}(\alpha_i\geq 0)\,X_{t-i} + \mathbb{1}(\alpha_i<0)\,(1-X_{t-i})$ and the analogous definition $\tilde{e}_{t-j} = \mathbb{1}(\beta_j\geq 0)\,e_{t-j} + \mathbb{1}(\beta_j<0)\,(1-e_{t-j})$ for the lagged error terms. The model parameters are contained in the vector $P$ with entries $\alpha_i$ for $i=1,\ldots,p$, and $\beta_j$ for $j=0,\ldots,q$. Note that, as $\beta_0\geq 0$ holds, no switching indicator for $e_t$ is contained in the model equation.

With probability $|\alpha_i|$, a (possibly switched) predecessor $\tilde{X}_{t-i}$ is chosen, whereas, with probability $|\beta_j|$, the process takes the value of a (possibly switched) error term, from which it follows that $X_t\in\{0,1\}$.
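A minimal Python sketch of the recursion in Definition 2 (again with illustrative Bernoulli(1/2) innovations and a hypothetical function name) may look as follows:

```python
import numpy as np

def simulate_gbarma(n, alphas, betas, rng=None):
    """Simulate a gbARMA(p,q) path on {0,1}.

    alphas: alpha_1..alpha_p (entries may be negative)
    betas:  beta_0..beta_q (beta_0 > 0; beta_1..beta_q may be negative)
    Requires sum(|alpha_i|) + sum(|beta_j|) == 1.
    """
    rng = np.random.default_rng(rng)
    params = np.concatenate([alphas, betas])
    probs = np.abs(params)                    # |P| of Equation (21)
    assert np.isclose(probs.sum(), 1.0) and betas[0] > 0
    p, q = len(alphas), len(betas) - 1

    e = rng.integers(0, 2, size=n + q)        # innovations, padded for the q lags
    x = np.empty(n + p, dtype=int)
    x[:p] = rng.integers(0, 2, size=p)
    for t in range(p, n + p):
        k = rng.choice(p + q + 1, p=probs)    # D_t ~ Mult(1; |P|)
        v = x[t - (k + 1)] if k < p else e[t - k + q]
        x[t] = v if params[k] >= 0 else 1 - v # switch if the selected parameter < 0
    return x[p:]

# gbARMA(1,1) with alpha_1 = -0.4, beta_0 = 0.3, beta_1 = 0.3
x = simulate_gbarma(500, alphas=[-0.4], betas=[0.3, 0.3], rng=3)
```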
4.2. Stochastic Properties of gbARMA Models
When dealing with possibly negative parameters also in the moving-average part of gbARMA models, the idea of Equation (
4) is employed also for the lagged error terms. Hence, this allows modeling negative dependence in the moving average part as well. In the multinomial distribution, all values of the parameter vector
P have to be considered in absolute value, thus we have to use
as defined in Equation (
21). For the expectation of gbARMA processes, two additional sums show up in comparison to the NDARMA case. Precisely, we have
The construction of the stationary solution of the gbARMA time series is similar to the construction of the gbAR(p) process introduced in Section 2.1 and (Lütkepohl 2005, chp. 11.3.2). The vector representation of the process is equipped with a moving-average part and thus the dimension of the corresponding random matrices becomes $(p+q)\times(p+q)$. Precisely, the gbARMA(p,q) model can be written as a $(p+q)$-dimensional gbVAR(1) process with the following matrices and vectors, such that the first entry of
$$Y_t = (X_t,\ldots,X_{t-p+1},e_t,\ldots,e_{t-q+1})'$$
is equal to the gbARMA(p,q) process. To obtain a vector autoregressive representation for $Y_t$, we define directly matrices that contain the random variables of the multinomial distribution. Precisely, for $t\in\mathbb{Z}$, let $A_t^{+}$ and $A_t^{-}$ be $(p+q)\times(p+q)$ matrices collecting the selection variables $a_t^{(i)}$ and $b_t^{(j)}$ that correspond to non-negative and negative model parameters, respectively, where their upper-left, upper-right, lower-left, and lower-right blocks are $p\times p$, $p\times q$, $q\times p$, and $q\times q$ matrices, respectively, and $b_t$ is the random vector attached to the innovation $e_t$. Based on the notation introduced above, gbARMA(p,q) processes can be represented as a vector-valued gbAR model of first order (gbVAR(1)) as follows
$$Y_t = A_t^{+}\,Y_{t-1} + A_t^{-}\,(\mathbf{1}_{p+q}-Y_{t-1}) + b_t\,e_t \qquad (23)$$
with $\mathbf{1}_{p+q}$ being the one vector of length $p+q$.
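To fix ideas, for $p=q=1$, one possible arrangement of these quantities (a sketch consistent with Equation (23) as reconstructed above, not necessarily the exact matrices of the original paper) is the following:

```latex
% gbVAR(1) arrangement for a gbARMA(1,1) process with Y_t = (X_t, e_t)'
\[
A_t^{+} =
\begin{pmatrix}
a_t^{(1)}\mathbb{1}(\alpha_1 \ge 0) & b_t^{(1)}\mathbb{1}(\beta_1 \ge 0) \\
0 & 0
\end{pmatrix},
\quad
A_t^{-} =
\begin{pmatrix}
a_t^{(1)}\mathbb{1}(\alpha_1 < 0) & b_t^{(1)}\mathbb{1}(\beta_1 < 0) \\
0 & 0
\end{pmatrix},
\quad
b_t =
\begin{pmatrix}
b_t^{(0)} \\ 1
\end{pmatrix}.
\]
% The first row of Y_t = A_t^+ Y_{t-1} + A_t^-(1_2 - Y_{t-1}) + b_t e_t reproduces
% X_t = a_t^{(1)} \tilde{X}_{t-1} + b_t^{(0)} e_t + b_t^{(1)} \tilde{e}_{t-1};
% the second row returns the innovation e_t itself.
```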
To derive a suitable stationarity condition for the process, we know from Lütkepohl (2005) that it corresponds to the characteristic polynomial of the parameter matrix of the gbVAR(1) representation. From the block structure of this matrix, the polynomial can be reduced to the determinant of the autoregressive and the moving-average blocks. Hence, a gbARMA(p,q) process is stationary if the roots of the characteristic polynomial of the autoregressive part lie outside the unit circle, that is, if
$$1-\sum_{i=1}^{p}\alpha_i z^{i} \neq 0 \quad \text{for all } |z|\leq 1$$
holds. The assumption is fulfilled whenever an error term has a positive probability, such that there exists a $\beta_j\neq 0$ for some $j\in\{0,\ldots,q\}$. Therefore, the sum over all probabilities of choosing a predecessor fulfills $\sum_{i=1}^{p}|\alpha_i|<1$. Without any restriction, we assume that $\beta_0$ is strictly positive for a stationary gbARMA process, i.e., $\beta_0>0$.
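As a quick numerical illustration of this root condition (a hedged sketch; the helper name is hypothetical), one can check candidate parameter vectors with `numpy`:

```python
import numpy as np

def is_stationary_gbarma(alphas):
    """Check the root condition 1 - sum_i alpha_i z^i != 0 for |z| <= 1.

    numpy.roots expects coefficients ordered from the highest power down:
    [-alpha_p, ..., -alpha_1, 1].
    """
    coeffs = np.concatenate([-np.asarray(alphas, dtype=float)[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary_gbarma([-0.4]))        # True: single root at z = -2.5
print(is_stationary_gbarma([0.6, -0.35]))  # True: complex roots with |z| > 1
```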
For a stationary gbARMA(p,q) process, a moving average representation can be derived using the vectors and matrices defined above.
Theorem 3 (Moving average representation of gbARMA processes). Let $(X_t,\,t\in\mathbb{Z})$ be a stationary gbARMA(p,q) process with gbVAR(1) representation (Equation (23)). Then, $Y_t$ admits a moving average representation in terms of the innovations $(e_s,\,s\leq t)$ and products of the random matrices defined above, where $\mathbf{e}_1$ denotes the first unit vector. The univariate moving average representation is obtained from the multivariate formula by multiplying it with the first unit vector, because of $X_t=\mathbf{e}_1' Y_t$.
Considering the autocorrelation structure, Jacobs and Lewis (1983) and Weiß (2011) showed that the NDARMA(p,q) model fulfils a set of Yule–Walker-type equations, which were also derived by Möller and Weiß (2018) for the GenDARMA class of categorical processes. The following result shows that this property is maintained for the gbARMA class.
Theorem 4 (Yule–Walker-type equations). Let $(X_t,\,t\in\mathbb{Z})$ be a stationary gbARMA(p,q) process. Set $\alpha_i:=0$ for $i>p$ and $\beta_j:=0$ for $j>q$. Then, with suitable coefficients defined recursively from the model parameters to account for the moving-average part, the autocovariance function fulfills Yule–Walker-type equations; in particular, for lags $h>q$,
$$\gamma(h)=\sum_{i=1}^{p}\alpha_i\,\gamma(h-i).$$
The autocovariances of the NDARMA and GenDARMA processes can only be non-negative, whereas the Yule–Walker-type equations of gbARMA processes allow for possibly negative model parameters $\alpha_i$ for $i=1,\ldots,p$ and $\beta_j$ for $j=1,\ldots,q$.
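For the pure autoregressive case, these equations suggest a simple moment estimator: replace the theoretical autocorrelations by their sample counterparts and solve the resulting linear system. The following Python sketch (an illustration of this idea, not the authors' exact implementation) computes Yule–Walker estimates for a gbAR(p) path:

```python
import numpy as np

def yule_walker_gbar(x, p):
    """Yule-Walker estimates for a gbAR(p) path; estimates may be negative.

    Solves R a = r with R the Toeplitz matrix of sample autocorrelations
    rho(0), ..., rho(p-1) and r = (rho(1), ..., rho(p))'.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    acov = np.array([xc[: n - h] @ xc[h:] for h in range(p + 1)]) / n
    rho = acov / acov[0]
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, rho[1:])

# For the gbAR(1) path simulated earlier, the estimate is close to alpha_1 = -0.7
# alpha_hat = yule_walker_gbar(x, p=1)
```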
For the generalized binary ARMA model, formulas for the marginal, joint, and transition probabilities can be calculated, extending the results from Lemma 1.
Lemma 2 (Marginal, joint, and transition probabilities of gbARMA processes). Let $(X_t,\,t\in\mathbb{Z})$ be a stationary gbARMA(p,q) process. Then, the following properties hold:
- (i)
- (ii)
Defining then it follows
- (iii)
- (iv)
The flexibility of gbARMA models obtained by allowing for negative parameters also shows in the transition probabilities and in the joint and marginal distributions. Hence, more complex structures can be captured, since systematic changes in the error terms are allowed as well.
We conclude this section with an example of a gbARMA(1,1) model.
Example 3 (gbARMA(1,1) process). Let $(X_t,\,t\in\mathbb{Z})$ be a stationary gbARMA(1,1) process. Then, the process follows the recursion
$$X_t = a_t^{(1)}\,\tilde{X}_{t-1} + b_t^{(0)}\,e_t + b_t^{(1)}\,\tilde{e}_{t-1}.$$
Four sign combinations of the parameter pair $(\alpha_1,\beta_1)$ are possible, and the corresponding model equations are given as follows:
$$X_t = \begin{cases} a_t^{(1)}\,X_{t-1} + b_t^{(0)}\,e_t + b_t^{(1)}\,e_{t-1}, & \alpha_1\geq 0,\ \beta_1\geq 0,\\ a_t^{(1)}\,X_{t-1} + b_t^{(0)}\,e_t + b_t^{(1)}\,(1-e_{t-1}), & \alpha_1\geq 0,\ \beta_1<0,\\ a_t^{(1)}\,(1-X_{t-1}) + b_t^{(0)}\,e_t + b_t^{(1)}\,e_{t-1}, & \alpha_1<0,\ \beta_1\geq 0,\\ a_t^{(1)}\,(1-X_{t-1}) + b_t^{(0)}\,e_t + b_t^{(1)}\,(1-e_{t-1}), & \alpha_1<0,\ \beta_1<0. \end{cases}$$
Whereas, for identification purposes, $\beta_0$ only takes positive values, the predecessors $X_{t-1}$ and $e_{t-1}$ are systematically switched if the corresponding model parameters $\alpha_1$ and $\beta_1$ are negative, respectively.
For a stationary gbARMA(1,1) process, the moving average representation consists of three parts; from the stationarity assumption, we have $|\alpha_1|<1$ and $\beta_0>0$. The first part is a sum over all terms for the potential case of $\alpha_1<0$; this part accounts for the choosing of a predecessor and its switching. Since $\beta_0$ is strictly positive, the second part is a sum over all error terms to which no modification occurs. In the third sum, the random variable controlling the case of $\beta_1<0$ appears.