**4. High Frequency Data**

In this section we further elaborate on high frequency data and introduce the series that will be analyzed later. High frequency data are very important in finance, mainly because large price movements can occur over short intervals of time, which represents an interesting opportunity for trading. Furthermore, it is well known that volatilities measured at different frequencies have significant cross-correlation. Indeed, coarse volatility predicts fine volatility better than the reverse, as shown in Dacorogna et al. (2001).

As an example, take the tick-by-tick foreign exchange (FX) time series Euro-Dollar, from January 1, 1999 to December 31, 2002. Returns are calculated from the bid and ask prices as

$$r\_t = \ln\left(\left(p\_t^{\text{bid}} + p\_t^{\text{ask}}\right)/2\right) - \ln\left(\left(p\_{t-1}^{\text{bid}} + p\_{t-1}^{\text{ask}}\right)/2\right). \tag{6}$$
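As a minimal sketch of Equation (6), the following Python function computes log returns of the bid-ask midpoint. The function name `mid_log_returns` and the sample quotes are hypothetical and serve only as an illustration; they are not taken from the Euro-Dollar data set.

```python
import math

def mid_log_returns(bids, asks):
    """Log returns of the bid-ask midpoint, following Equation (6)."""
    if len(bids) != len(asks):
        raise ValueError("bids and asks must have the same length")
    # Logarithm of the mid price (p_bid + p_ask) / 2 at each time stamp.
    log_mid = [math.log((b + a) / 2.0) for b, a in zip(bids, asks)]
    # r_t = log_mid[t] - log_mid[t - 1]; the first quote has no return.
    return [log_mid[t] - log_mid[t - 1] for t in range(1, len(log_mid))]

# Hypothetical quotes, for illustration only.
bids = [1.1000, 1.1010, 1.1005]
asks = [1.1002, 1.1012, 1.1007]
returns = mid_log_returns(bids, asks)
```

Using the midpoint rather than the bid or the ask alone avoids attributing bid-ask bounce to genuine price movement.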

We discard Saturdays and Sundays, and we replace holidays with the means of the last ten observations of the returns for each respective hour and day. After cleaning the data (see Dacorogna et al. (2001) for details), we consider equally spaced returns with sampling interval Δ*t* = 15 min. This seems to be adequate, as many studies indicate.

Figure 2 shows Euro-Dollar returns calculated as above. The length of this time series is 95,317. The figure shows that the absolute returns present a seasonal pattern, which arises because physical time does not necessarily follow the same pattern as business time. This is typical behavior for a financial time series, and we will use a seasonal adjustment procedure similar to that of Martens et al. (2002). However, we will use absolute returns instead of squared returns; that is, we will compute the seasonal pattern as

$$S\_{d,s,h} = \frac{1}{s} \sum\_{j=1}^{s} |r\_{d,j,h}|, \tag{7}$$

where *r*<sub>*d*,*s*,*h*</sub> is the return on weekday *d*, week *s* and hour *h*, and *s* is the number of weeks from the beginning of the series. Therefore, *S*<sub>*d*,*s*,*h*</sub> is the mean of the absolute returns over a window whose starting point is fixed and whose end point rolls forward.

Figure 3 shows the autocorrelation function of the seasonally adjusted returns (see Equation (8)) and of their squares. The seasonal pattern is no longer present.

FX data have some distinct characteristics, mainly because they are produced twenty-four hours a day, seven days a week. In particular, Euro-Dollar is the most liquid FX rate in the world. However, there are periods in which activity is higher or lower, causing seasonal patterns to occur, as seen above.

Let us analyze some facts about these returns, which we will denote simply by *r*<sub>*t*</sub>. Figure 4 shows the histogram together with a nonparametric kernel density estimate, with the bandwidth chosen by unbiased cross-validation. The distribution has fat tails and a high kurtosis of 121, while its skewness coefficient is −0.079, indicating that it is almost symmetric. A normality test (Jarque-Bera) rejects the hypothesis that these returns are normal.
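The summary statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the paper: the function name `moments_and_jarque_bera` is hypothetical, the moments are the simple (biased) sample moments, and kurtosis is reported in raw form, so that a normal distribution has kurtosis 3. The source does not state whether the reported value of 121 is raw or excess kurtosis.

```python
def moments_and_jarque_bera(x):
    """Return (skewness, kurtosis, Jarque-Bera statistic) of a sample."""
    n = len(x)
    mean = sum(x) / n
    # Central sample moments of order 2, 3 and 4 (divisor n).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # raw kurtosis; equals 3 for a normal law
    # JB = n/6 * (S^2 + (K - 3)^2 / 4), asymptotically chi-square with 2 df.
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

# Small symmetric toy sample, for illustration only.
skew, kurt, jb = moments_and_jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Under the null hypothesis of normality the statistic is compared with a chi-square distribution with two degrees of freedom, whose 5% critical value is about 5.99. With a kurtosis far above 3 and nearly one hundred thousand observations, the statistic exceeds that threshold by a wide margin.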

**Figure 2.** Euro-Dollar returns: acf of returns, acf of absolute returns and acf of squared returns.

The seasonally adjusted returns are then given by

$$
\tilde{r}\_t = \tilde{r}\_{d,s,h} = \frac{r\_{d,s,h}}{S\_{d,s,h}}.\tag{8}
$$
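The seasonal pattern of Equation (7) and the adjustment of Equation (8) can be sketched as follows. The record layout `(weekday, week, slot, ret)`, the function name `seasonally_adjust`, and the zero-pattern guard are assumptions made for this illustration. The sketch also assumes the records are ordered by week and that each (weekday, slot) pair is observed once per week, so that the running count equals *s*.

```python
from collections import defaultdict

def seasonally_adjust(records):
    """Divide each return by the fixed-start mean of absolute returns.

    records: iterable of (weekday, week, slot, ret), ordered by week.
    Returns a list of (weekday, week, slot, adjusted_ret).
    """
    abs_sum = defaultdict(float)  # running sum of |r| per (weekday, slot)
    count = defaultdict(int)      # weeks seen so far per (weekday, slot)
    adjusted = []
    for d, s, h, r in records:
        key = (d, h)
        abs_sum[key] += abs(r)
        count[key] += 1
        pattern = abs_sum[key] / count[key]  # S_{d,s,h} of Equation (7)
        # Assumption: if every return so far is zero, emit 0.0 rather
        # than dividing by zero.
        adjusted.append((d, s, h, r / pattern if pattern > 0 else 0.0))
    return adjusted

# Hypothetical returns for the same weekday and slot over three weeks.
toy = [("Mon", 1, 0, 0.002), ("Mon", 2, 0, -0.004), ("Mon", 3, 0, 0.006)]
adj = [row[3] for row in seasonally_adjust(toy)]
```

Because the window start is fixed, the pattern for week *s* uses only information available up to week *s*, so the adjustment does not look ahead.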

We may assume, for example, that the errors of a GARCH model fitted to these returns follow a Student's *t* distribution or a generalized error distribution, both of which capture the fat tails better than the normal distribution.

Optimizing the likelihood function is often very difficult, mainly because the likelihood surface is flat, as can be seen in Zumbach (2000). Bayesian methods are an alternative, and in the next section we will use Griddy-Gibbs sampling to estimate the parameters of a PHARCH model.

Figure 4 also shows that the Euro-Dollar series has some clusters of volatility. This is a typical behavior of financial time series. A problem is that we do not know how many clusters there are and what their sizes are. The reason for this is that the information arriving is different for each sampling frequency.

We can view these clusters as *market components*, which depend on the heterogeneity of the market. These market components are considered in our PHARCH model, as seen in Equation (3). Unlike GARCH-type models, PHARCH models have a variance equation with returns aggregated over intervals of different sizes. Therefore PHARCH models take into account the sign of the returns and not only their absolute value, as GARCH models do. Two subsequent returns of similar size in the same direction will have a greater impact on the variance than two subsequent returns of similar size but opposite signs.
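The effect of the sign can be illustrated with a small numerical sketch. It assumes, in line with the description above, that one component contributes a coefficient times the squared sum of the returns over its interval; the exact form is given by Equation (3), and the function names and numbers below are hypothetical.

```python
def aggregated_term(returns, size, coef):
    """coef * (sum of the last `size` returns)^2: assumed PHARCH-type term."""
    return coef * sum(returns[-size:]) ** 2

def squared_term(returns, size, coef):
    """coef * sum of the last `size` squared returns: GARCH-type contrast."""
    return coef * sum(r ** 2 for r in returns[-size:])

same_direction = [0.01, 0.01]
opposite_signs = [0.01, -0.01]

# The aggregated term distinguishes the two cases ...
agg_same = aggregated_term(same_direction, size=2, coef=0.1)
agg_opp = aggregated_term(opposite_signs, size=2, coef=0.1)
# ... while the sum of squares treats them identically.
sq_same = squared_term(same_direction, size=2, coef=0.1)
sq_opp = squared_term(opposite_signs, size=2, coef=0.1)
```

In this toy case the aggregated term is positive for the two same-direction returns and zero for the offsetting pair, whereas the sum of squared returns is the same in both cases.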

Now we need to determine the number and the size of the market components for the Euro-Dollar FX series. Ruilova (2007) proposed some technical rules to determine these market components, and Dacorogna et al. (2001) proposed some empirical rules.

To help determine whether the chosen component sizes are appropriate, we can use the *impact of the component*.

We define the impact *I*<sub>*i*</sub> of the *i*th component as

$$I\_i = a\_i C\_i, \quad \forall i. \tag{9}$$

Note that the stationarity condition for PHARCH(*m*) models can be written in terms of these impacts; namely,

$$\sum\_{i=1}^{m} I\_i < 1.$$

We also notice that if we consider the Student's *t* distribution with *ν* degrees of freedom, the impact should be defined as

$$I\_i = \frac{\nu}{\nu - 2} a\_i C\_i, \quad \forall i \geq 1. \tag{10}$$
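A minimal sketch of Equations (9) and (10) and of the stationarity check is given below. The function names, component sizes and coefficients are hypothetical values chosen for illustration, not estimates from the Euro-Dollar series. The sketch also assumes that the same condition, a sum of impacts below one, is applied to the rescaled impacts in the Student's *t* case.

```python
def impacts(sizes, coefs, nu=None):
    """Impacts I_i = a_i * C_i, rescaled by nu / (nu - 2) for Student's t."""
    if len(sizes) != len(coefs):
        raise ValueError("sizes and coefs must have the same length")
    scale = 1.0
    if nu is not None:
        if nu <= 2:
            raise ValueError("nu must exceed 2 for a finite variance")
        scale = nu / (nu - 2.0)
    return [scale * a * c for a, c in zip(sizes, coefs)]

def is_stationary(impact_values):
    """Stationarity condition: the impacts must sum to less than one."""
    return sum(impact_values) < 1.0

# Hypothetical component sizes a_i and coefficients C_i.
sizes = [1, 4, 16]
coefs = [0.10, 0.05, 0.02]

gauss_impacts = impacts(sizes, coefs)    # Equation (9)
t_impacts = impacts(sizes, coefs, nu=4)  # Equation (10) with nu = 4
```

With these illustrative values the Gaussian impacts sum to 0.62, which satisfies the condition, while with *ν* = 4 every impact doubles and the sum of 1.24 violates it. This shows how heavier tails tighten the admissible region for the coefficients.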

As remarked above, the number of components in a financial series can vary depending on how the underlying asset is traded in its market. That is, a liquid series can have a structure with more components than an illiquid series.
