A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators †

Jochen Heberle and Cristina Sattarhoff
Faculty of Business Administration, Universität Hamburg, 20146 Hamburg, Germany
* Author to whom correspondence should be addressed.
In memoriam Kostas Kyriakoulis.
Econometrics 2017, 5(1), 9; https://doi.org/10.3390/econometrics5010009
Submission received: 10 August 2016 / Revised: 5 January 2017 / Accepted: 9 January 2017 / Published: 25 January 2017

Abstract:
This paper considers the algorithmic implementation of the heteroskedasticity and autocorrelation consistent (HAC) estimation problem for covariance matrices of parameter estimators. We introduce a new algorithm, mainly based on the fast Fourier transform, and show via computer simulation that our algorithm is up to 20 times faster than well-established alternative algorithms. The cumulative effect is substantial if the HAC estimation problem has to be solved repeatedly. Moreover, the bandwidth parameter has no impact on this performance. We provide a general description of the new algorithm as well as code for a reference implementation in R.
JEL Classification:
C01; C55; C58; C63; G17

1. Introduction

This paper considers the heteroskedasticity and autocorrelation consistent (HAC) estimation of covariance matrices. This estimation problem arises in the construction of large-sample tests for the parameters in linear and nonlinear models. The HAC estimator for the covariance matrix of parameter estimators applies to a variety of model frameworks and estimation methods, for example ordinary least squares (OLS), maximum likelihood, the generalized method of moments (GMM) and instrumental variables (see Andrews [1] and Zeileis [2]). It also corresponds to the so-called sandwich estimator in the context of quasi-maximum likelihood (QML) estimation (cf. White [3], Chapter 8.3). We thus combine two topics that are essential to applied econometrics: robust covariance matrix estimation and the fast computation of covariance matrix estimators.
Over the last decades, several techniques for HAC covariance matrix estimation have been proposed in the literature, e.g., Andrews [1], Newey and West [4], White [5], MacKinnon and White [6]. These statistical methods go back to earlier literature such as Jowett [7], Hannan [8] and Brillinger [9], and are nowadays widely used in econometric analysis. Moreover, researchers require covariance matrices not only for hypothesis tests, but also as stand-alone inputs to various statistical methods, a need that is becoming increasingly relevant, as pointed out in Cribari-Neto and Zarkos [10]. In spite of the vast econometric literature on this subject, little econometric software is available for the computation of HAC covariance matrix estimators.
The aim of this paper is to show that the computing time for HAC covariance matrix estimators can be decreased massively by using given information about the structure of the HAC covariance matrix estimators together with some matrix algebra. Particularly, we exploit the evaluation of a circulant matrix product, which can be efficiently calculated using the fast Fourier transform. The same calculation idea is employed by Wood and Chan [11] for the simulation of stationary Gaussian processes with prescribed covariance matrix as well as by Jensen and Nielsen [12] for the calculation of fractional differences.
We compare our new algorithm with two popular algorithms: the algorithm by Roncalli [13], written for the statistical software GAUSS (cf. Aptech Systems [14]), and the algorithm by Zeileis [15], written for R (cf. R Development Core Team [16]), as well as with the lesser-known algorithm by Kyriakoulis [17] for MATLAB (cf. MathWorks [18]). According to our results, our new algorithm is up to ~20 times faster than the algorithms by Roncalli and Zeileis. Depending on the settings of the estimation problem, the time saved can amount to several minutes for a single HAC estimation. This is particularly relevant for the QML estimation of generalized autoregressive conditional heteroskedasticity (GARCH) models (cf. Zivot [19]) and the estimation of stochastic volatility (SV) models via QML (cf. Ruiz [20]) or GMM (cf. Renault [21]) based on large financial datasets with high-frequency sampling or a multivariate structure. Another application area is the GMM estimation of multifractal volatility models (cf. Bacry et al. [22], Lux [23]), which proves to be time-consuming even in the univariate case with daily data. Reliable estimation results for the Multifractal Random Walk model require a data sample of size N > 2000, and the asymptotic normality of the GMM estimates is reached only for sample sizes of roughly 16,000 data points or more (cf. Bacry et al. [24]). The cumulative effect is substantial if the HAC estimation problem has to be solved repeatedly (e.g., in the case of an iterated GMM estimation or for the purpose of simulation and forecast studies).
Moreover, our algorithm does not employ the bandwidth parameter explicitly; its performance is independent of the value of the bandwidth parameter, in contrast to the algorithms by Roncalli and Zeileis.
The paper is organized as follows. In Section 2 we give an overview of the HAC estimation problem and some of its fields of application and introduce the notation we use. Section 3 combines some matrix algebra results with the structure of HAC estimators in order to introduce the new algorithm. In Section 4 we discuss the alternative algorithms. Section 5 compares the performance of our new algorithm with the alternative algorithms. We replicate the HAC computation steps in Chaussé and Xu [25] for the estimation of an SV model with high-frequency data, as well as those in Lux et al. [26] for the purpose of a forecast study, and report the computing times. The new algorithm outperforms the other algorithms in the majority of the cases we analyse. There are some isolated cases where our algorithm performs more slowly; however, the computational cost there is below 1 millisecond, which should be irrelevant in practice. The R code for the different HAC covariance matrix estimators is given in the Appendix.

2. HAC Covariance Matrix Estimation

2.1. The Estimation Problem

We consider $(a_t)_{t \in \mathbb{Z}}$ a stationary ergodic $q$-dimensional stochastic process with mean zero and $(\Gamma_\tau)_{\tau \in \mathbb{Z}}$ its autocovariance matrices

$$\Gamma_\tau = E\left[ a_t a_{t+\tau}' \right] . \quad (1)$$

We want to estimate the quantity

$$S_N = \frac{1}{N} \sum_{s=1}^{N} \sum_{t=1}^{N} E\left[ a_t a_s' \right] , \quad (2)$$

where N denotes the number of given observations. $S_N$ can also be written as (cf. Smith [27])

$$S_N = \Gamma_0 + \sum_{\tau=1}^{N-1} \frac{N-\tau}{N} \left( \Gamma_\tau + \Gamma_\tau' \right) . \quad (3)$$

This estimation problem can be solved in the limit for $N \to \infty$, i.e.,

$$S = \lim_{N \to \infty} S_N = \sum_{\tau=-\infty}^{\infty} \Gamma_\tau = f(0) \quad (4)$$

with $f(0)$ the spectral density matrix of the process $(a_t)$ at frequency 0. The estimation of S is a nonparametric spectral estimation problem with the corresponding lag window spectral estimator

$$\hat{S}_N = \hat{\Gamma}_0 + \sum_{\tau=1}^{N-1} \omega_{\tau,N} \left( \hat{\Gamma}_\tau + \hat{\Gamma}_\tau' \right) . \quad (5)$$

$\hat{\Gamma}_\tau$ denotes the empirical autocovariance matrix of lag $\tau$,

$$\hat{\Gamma}_\tau = \frac{1}{N} \sum_{t=1}^{N-\tau} a_t a_{t+\tau}' \quad (6)$$

with $0 \le \tau \le N-1$, and $\omega_{\tau,N}$ is a function of weights (cf. Newey and West [4]). $\hat{S}_N$ is weakly consistent for a given choice of $\omega_{\tau,N}$ and is called a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator for reasons explained below.

Let the bandwidth parameter $b_N$ control the number of nonzero weights, with $\omega_{\tau,N} = 0$ for $\tau > b_N$. Then we can also write Equation (5) as follows:

$$\hat{S}_N = \hat{\Gamma}_0 + \sum_{\tau=1}^{b_N} \omega_{\tau,N} \left( \hat{\Gamma}_\tau + \hat{\Gamma}_\tau' \right) . \quad (7)$$

Note that the algorithm considered in this paper does not require the specification of a bandwidth parameter. In the following we suppress the index N for simplicity and write $\omega_\tau$ and b, respectively.
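For illustration, Equation (7) can be transcribed directly into R. The following brute-force sketch (function name and interface are illustrative only, not from the published code) serves as a reference point for the fast algorithm introduced in Section 3:

```r
# Brute-force HAC estimator following Equation (7).
# A: N x q matrix with rows a_t'; w: weights, w[tau] = omega_tau for tau = 1, ..., b.
hac_brute_force <- function(A, w) {
  N <- nrow(A)
  S <- crossprod(A) / N                  # Gamma_hat_0 = (1/N) sum_t a_t a_t'
  for (tau in seq_along(w)) {
    G <- crossprod(A[1:(N - tau), , drop = FALSE],       # Gamma_hat_tau =
                   A[(tau + 1):N, , drop = FALSE]) / N   # (1/N) sum_t a_t a_{t+tau}'
    S <- S + w[tau] * (G + t(G))         # add omega_tau * (Gamma_tau + Gamma_tau')
  }
  S
}
```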

2.2. Application

This estimation problem arises in various econometric fields, depending on the choice of $(a_t)$. Its main interest resides in the construction of large-sample tests. Many parameter estimators $\hat{\theta}_N$ in nonlinear dynamic models satisfy

$$\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \xrightarrow{d} N\left( 0, M S M' \right) \quad (8)$$

with $\theta_0$ the true parameter value to be estimated, M a non-random matrix and S given in (4). See Andrews [1] on the estimation of M. One can construct tests about the value of $\theta_0$ based on the approximate distribution of $\hat{\theta}_N$ in large samples,

$$\hat{\theta}_N \overset{\cdot}{\sim} N\left( \theta_0, \tfrac{1}{N} M S M' \right) , \quad (9)$$

where S can be estimated by $\hat{S}_N$ in (5). It is now obvious why we call $\hat{S}_N$ a covariance matrix estimator.

2.3. The Case of the OLS Estimator

Consider the linear model $Y = X\theta + u$ with the OLS estimator $\hat{\theta} = (X'X)^{-1} X'Y$ and

$$\mathrm{Cov}\left[ \hat{\theta} \right] = (X'X)^{-1} X' \, \mathrm{Cov}[Y] \, X (X'X)^{-1} \quad (10)$$
$$= (X'X)^{-1} X' E[uu'] X (X'X)^{-1} . \quad (11)$$

In the case of homoskedastic and uncorrelated errors, $E[uu'] = \sigma^2 I$, the covariance matrix of $\hat{\theta}$ simplifies to

$$\mathrm{Cov}\left[ \hat{\theta} \right] = \sigma^2 (X'X)^{-1} \quad (12)$$

and it can be easily estimated by

$$\widehat{\mathrm{Cov}}\left[ \hat{\theta} \right] = s^2 (X'X)^{-1} \quad (13)$$

with $s^2$ an unbiased estimator for $\sigma^2$. In the general case of heteroskedasticity and dependence of unknown form in the error term u, one can estimate the following asymptotic covariance matrix:

$$\lim_{N \to \infty} \mathrm{Cov}\left[ \sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \right] = \lim_{N \to \infty} N (X'X)^{-1} X' E[uu'] X (X'X)^{-1} \quad (14)$$
$$= \lim_{N \to \infty} \left( \tfrac{1}{N} X'X \right)^{-1} S_N \left( \tfrac{1}{N} X'X \right)^{-1} \quad (15)$$
$$= \lim_{N \to \infty} \left( \tfrac{1}{N} X'X \right)^{-1} S \, \lim_{N \to \infty} \left( \tfrac{1}{N} X'X \right)^{-1} \quad (16)$$

with

$$S = \lim_{N \to \infty} S_N = \lim_{N \to \infty} \frac{1}{N} X' E[uu'] X = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \sum_{s=1}^{N} E\left[ X_t u_t (X_s u_s)' \right] . \quad (17)$$

This estimation can be performed using the HAC covariance matrix estimator (cf. formula (5)) based on the process $(a_t) = (X_t u_t)$, where $X_t$ denotes the vector of regressors in observation t.

The OLS estimator satisfies

$$\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \xrightarrow{d} N\left( 0, Q^{-1} S Q^{-1} \right) \quad (18)$$

with $Q = \lim_{N \to \infty} \frac{1}{N} X'X$ a finite nonsingular matrix, and in large samples, respectively,

$$\hat{\theta}_N \overset{\cdot}{\sim} N\left( \theta_0, \tfrac{1}{N} Q^{-1} S Q^{-1} \right) . \quad (19)$$
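To make the mapping concrete, the following sketch (function names are ours; hac_brute_force() is the illustrative sketch from Section 2.1) assembles the large-sample covariance of the OLS estimator from the process $(a_t) = (X_t u_t)$:

```r
# Sandwich covariance for OLS with a HAC middle matrix, cf. Equation (19).
# X: N x k design matrix, y: response vector, w: HAC weights up to the bandwidth.
ols_hac_cov <- function(X, y, w) {
  N     <- nrow(X)
  u     <- y - X %*% solve(crossprod(X), crossprod(X, y))  # OLS residuals
  A     <- X * as.vector(u)            # row t equals (X_t u_t)'
  S_hat <- hac_brute_force(A, w)       # HAC estimate of S
  Qinv  <- solve(crossprod(X) / N)     # (X'X / N)^{-1}, estimate of Q^{-1}
  (Qinv %*% S_hat %*% Qinv) / N        # approximate Cov(theta_hat)
}
```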

2.4. The Case of the GMM Estimator

Consider the model-free GMM estimation of $\theta_0$ using q moment conditions. In this case, the process $(a_t)$ contains the q-dimensional deviations of the empirical moments $m_t = (m_{i,t})_{1 \le i \le q}$ from their theoretical counterparts $M_t(\theta) = (M_{i,t}(\theta))_{1 \le i \le q}$, with

$$a_t(\theta) = M_t(\theta) - m_t . \quad (20)$$

The GMM estimator is given by

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \left( \frac{1}{N} \sum_t a_t(\theta) \right)' W \left( \frac{1}{N} \sum_t a_t(\theta) \right) \quad (21)$$

with W some weighting matrix (Hall [28]). Under some regularity conditions, the GMM estimator is weakly consistent and asymptotically normally distributed,

$$\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \xrightarrow{d} N(0, M S M') , \quad (22)$$

where M is a non-random matrix and

$$S = \lim_{N \to \infty} N \cdot \mathrm{Cov}\left[ \frac{1}{N} \sum_{t=1}^{N} a_t \right] \quad (23)$$
$$= \lim_{N \to \infty} N \cdot E\left[ \frac{1}{N^2} \sum_{t=1}^{N} a_t \left( \sum_{s=1}^{N} a_s \right)' \right] \quad (24)$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \sum_{s=1}^{N} E\left[ a_t a_s' \right] . \quad (25)$$

Again we employ the HAC covariance matrix estimator (cf. formula (5)) in order to estimate S.

3. The Algorithm

In this paper, we introduce a fast algorithm for the computation of the HAC covariance matrix estimator $\hat{S}_N$ in (5). It is based on the equivalent representation (cf. Kyriakoulis [17])

$$\hat{S}_N = \frac{1}{N} A' T(\omega) A \quad (26)$$

with $A \in \mathbb{R}^{N \times q}$ and

$$A = \left( a_1 \; a_2 \; \cdots \; a_N \right)' . \quad (27)$$

The matrix $T(\omega)$ denotes the symmetric $N \times N$ Toeplitz matrix whose first column $\omega$ is given by the weights,

$$\omega = \left( 1 \; \omega_1 \; \omega_2 \; \cdots \; \omega_{N-1} \right)' . \quad (28)$$

For a more memory-efficient computation of the matrix product $T(\omega) A$ we use a special circulant matrix (cf. Van Loan [29]). To this end, we embed the Toeplitz matrix $T(\omega)$ in a symmetric circulant matrix $C(\omega^*) \in \mathbb{R}^{2N \times 2N}$ with first column

$$\omega^* = \left( 1 \; \omega_1 \; \omega_2 \; \cdots \; \omega_{N-1} \; 0 \; \omega_{N-1} \; \omega_{N-2} \; \cdots \; \omega_1 \right)' . \quad (29)$$

Furthermore, we construct the $2N \times q$ matrix $A^*$,

$$A^* = \begin{pmatrix} A \\ 0_{N \times q} \end{pmatrix} , \quad (30)$$

by appending an $N \times q$ matrix of zeros at the bottom of A.

Remark 1.
  • The Toeplitz matrix $T(\omega)$ is given by the first N rows and first N columns of $C(\omega^*)$, i.e.,

    $$T(\omega) = C_{1:N,\,1:N}(\omega^*) . \quad (31)$$

    Generally, for $M \in \mathbb{R}^{m \times n}$ we denote by $M_{a:b,\,c:d}$ the sub-matrix of M containing rows a to b and columns c to d ($a, b, c, d \in \mathbb{N}$, $1 \le a \le b \le m$ and $1 \le c \le d \le n$); a dot in place of an index range selects all rows or columns.
  • The required product $T(\omega) A$ is given by

    $$C_{1:N,\,\cdot}(\omega^*) \, A^* = C_{1:N,\,1:N}(\omega^*) \, A = T(\omega) A . \quad (32)$$

    Thus, the fast evaluation of $C(\omega^*) A^*$ permits the fast evaluation of $T(\omega) A$.
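The embedding can be verified numerically for a small example. The following sketch (ours, for illustration only) builds $T(\omega)$ and $C(\omega^*)$ explicitly, which is feasible only for small N, and checks the identity (32):

```r
# Verify T(omega) %*% A == (C(omega_star) %*% A_star)[1:N, ] for a small N.
N <- 5; q <- 2
w      <- c(1, 0.8, 0.5, 0.2, 0.1)          # (1, omega_1, ..., omega_{N-1})
w_star <- c(w, 0, rev(w[-1]))               # circulant first column, length 2N
A      <- matrix(rnorm(N * q), N, q)
A_star <- rbind(A, matrix(0, N, q))

T_mat <- toeplitz(w)                        # symmetric Toeplitz from first column
C_mat <- sapply(seq_len(2 * N) - 1,         # circulant: column j is w_star cyclically
                function(s) w_star[((seq_len(2 * N) - 1 - s) %% (2 * N)) + 1])
all.equal(T_mat %*% A, (C_mat %*% A_star)[1:N, ])  # TRUE
```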
The following theorem explains how the matrix product $T(\omega) A$ (cf. formula (26)) can be computed efficiently by means of the discrete Fourier transform (DFT) and its inverse. It provides the basis for our new algorithm.
Theorem 1 (Circulant matrix and its eigenvalues and eigenvectors).
Let $C(c) \in \mathbb{R}^{n \times n}$ be a circulant matrix with first column $c = (c_1 \; \cdots \; c_n)'$ and let $V \Lambda V^*$ be the eigendecomposition of $C(c)$, i.e.,

$$C(c) = V \Lambda V^* . \quad (33)$$

Here $\lambda_k$ ($k = 1, \ldots, n$) are the eigenvalues of $C(c)$ and $v_k$ ($k = 1, \ldots, n$) the corresponding eigenvectors. The matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is diagonal, $V = (v_1 \; \cdots \; v_n)$ contains the eigenvectors, and $V^*$ is the conjugate transpose of V. Then the following properties hold:
1. The eigenvalues $\lambda_k$ are the discrete Fourier transform (DFT) of the column vector c, i.e.,

$$\lambda_k = \sum_{j=1}^{n} \exp\left( -\frac{2 (j-1)(k-1) \pi i}{n} \right) c_j \quad (34)$$

for $k = 1, \ldots, n$.
2. The orthonormal eigenvectors $v_k$ ($k = 1, \ldots, n$) are given by

$$v_k = n^{-1/2} \left( 1 \; r_k \; r_k^2 \; \cdots \; r_k^{n-1} \right)' \quad (35)$$

with $r_k = \exp\left( \frac{2 (k-1) \pi i}{n} \right)$.
3. The product $V^* x$, for any $x \in \mathbb{R}^n$, is given by the DFT of x.
4. The product $y = V x$, for any $x \in \mathbb{R}^n$, is given by the inverse discrete Fourier transform (IDFT) of x, i.e.,

$$y_k = \frac{1}{n} \sum_{j=1}^{n} \exp\left( \frac{2 (j-1)(k-1) \pi i}{n} \right) x_j \quad (36)$$

for $k = 1, \ldots, n$. (Properties 3 and 4 hold up to the normalization factor $n^{-1/2}$ inherited from V; these factors cancel in the product $V \Lambda V^*$, so the DFT/IDFT pair can be used directly.)

Proof. 
See Brockwell and Davis [30], Gray [31] or Golub and Van Loan [32]. ☐
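Property 1 is easy to check in R, whose fft() implements exactly the unnormalized DFT in Equation (34); a small sketch (ours, for illustration):

```r
# Eigenvalues of a circulant matrix coincide with the DFT of its first column.
n <- 4
c_vec <- c(2, 1, 0, 1)                      # symmetric first column
C_mat <- sapply(seq_len(n) - 1,             # build the circulant explicitly
                function(s) c_vec[((seq_len(n) - 1 - s) %% n) + 1])
sort(Re(eigen(C_mat)$values))               # 0 2 2 4, via dense eigendecomposition
sort(Re(fft(c_vec)))                        # 0 2 2 4, via the DFT
```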
We now introduce the new algorithm (cf. Algorithm 1) for the computation of HAC covariance matrix estimators on the basis of Theorem 1. In the following we assume that the weights $\omega_\tau$ ($\tau = 1, \ldots, N-1$) are known.
Algorithm 1.
The algorithm consists of five steps, with step three subdivided into three sub-steps.
1. Compute the eigenvalues $\lambda_i$ ($i = 1, \ldots, 2N$) of $C(\omega^*)$ using Equation (34) with

$$\omega^* = \left( 1 \; \omega_1 \; \omega_2 \; \cdots \; \omega_{N-1} \; 0 \; \omega_{N-1} \; \omega_{N-2} \; \cdots \; \omega_1 \right)' . \quad (37)$$

2. Construct the $2N \times q$ matrix

$$A^* = \begin{pmatrix} A \\ 0_{N \times q} \end{pmatrix} \quad (38)$$

using A from Equation (27).
3. For all $j \in \{1, \ldots, q\}$ compute the columns of the matrix $C(\omega^*) A^*$. These columns can be written as $C(\omega^*) A_j^* = V \Lambda V^* A_j^*$, where $A_j^*$ is the j-th column of $A^*$. This computation is done in three steps:
(a) Determine $V^* A_j^*$, given by the DFT of $A_j^*$.
(b) For all $i \in \{1, \ldots, 2N\}$, multiply the i-th entry of the vector $V^* A_j^*$ by the eigenvalue $\lambda_i$ in order to obtain $\Lambda V^* A_j^*$.
(c) Determine $C(\omega^*) A_j^* = V \Lambda V^* A_j^*$, given by the IDFT of $\Lambda V^* A_j^*$.
4. Select the upper $N \times q$ block of $C(\omega^*) A^*$. This upper block equals $T(\omega) A$, i.e.,

$$\left( C(\omega^*) A^* \right)_{1:N,\,\cdot} = T(\omega) A . \quad (39)$$

5. Determine $\hat{S}_N = \frac{1}{N} A' T(\omega) A$.
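Transcribed compactly into R, the five steps might read as follows; this is a minimal sketch using base R's fft() (the fuller reference implementation with the mcond/method/bw interface is given in Appendix A.1). The weights vector w must contain $\omega_1, \ldots, \omega_{N-1}$, with zeros beyond the bandwidth:

```r
# Fast HAC core: S_hat = (1/N) A' T(omega) A via FFT, following Algorithm 1.
# A: N x q matrix of moment contributions; w: weights (omega_1, ..., omega_{N-1}).
hac_fft <- function(A, w) {
  N      <- nrow(A)
  w_star <- c(1, w, 0, rev(w))                  # Step 1: first column of C(omega_star)
  lambda <- fft(w_star)                         # Step 1: eigenvalues via the DFT
  A_star <- rbind(A, matrix(0, N, ncol(A)))     # Step 2: zero-padded 2N x q matrix
  TA <- sapply(seq_len(ncol(A)), function(j) {  # Step 3: columns of V Lambda V* A_j
    Re(fft(lambda * fft(A_star[, j]), inverse = TRUE)) / (2 * N)  # (a)-(c)
  })
  crossprod(A, TA[1:N, , drop = FALSE]) / N     # Steps 4-5: (1/N) A' T(omega) A
}
```

With weights padded by zeros beyond the bandwidth, hac_fft(A, w) agrees with the truncated sum in Equation (7) up to numerical roundoff.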

4. Alternative Algorithms

In this paper, we compare our new algorithm with three alternative algorithms currently used. The first algorithm that we consider was developed by Roncalli [13] and can be found in the time series library TSM (Time Series and Wavelets for Finance) for the statistical software GAUSS (cf. Aptech Systems [14]). Here the computation of the HAC estimator $\hat{S}_N$ is implemented by means of a for-loop according to expression (7), combined with an ingenious matrix product which enables the fast computation of the autocovariance matrices $\hat{\Gamma}_\tau$:

$$\hat{\Gamma}_\tau = \frac{1}{N} A' \begin{pmatrix} 0_{\tau \times q} \\ A_{1:(N-\tau),\,\cdot} \end{pmatrix} . \quad (40)$$
Algorithm 2 (Roncalli).
1. Determine $\hat{\Gamma}_0 = \frac{1}{N} A'A$ and set $L = \hat{\Gamma}_0$.
2. For $\tau$ from 1 to b, determine $\hat{\Gamma}_\tau$ according to (40) and update $L = L + \omega_\tau \left( \hat{\Gamma}_\tau + \hat{\Gamma}_\tau' \right)$.
3. Set $\hat{S}_N = L$.
The second algorithm, by Zeileis [15], is part of the “sandwich” package for the statistical software R (cf. R Development Core Team [16]). It is similar to Roncalli’s algorithm except for the calculation of $\hat{\Gamma}_\tau$ in Step 2, which is less efficient than Roncalli’s as it requires the sequential updating of two matrices instead of one.
Algorithm 3 (Zeileis).
1. Determine $\hat{\Gamma}_0 = \frac{1}{N} A'A$ and set $L = \hat{\Gamma}_0$.
2. For $\tau$ from 1 to b, determine $\hat{\Gamma}_\tau = \frac{1}{N} \left( A_{(\tau+1):N,\,\cdot} \right)' A_{1:(N-\tau),\,\cdot}$ and update $L = L + \omega_\tau \left( \hat{\Gamma}_\tau + \hat{\Gamma}_\tau' \right)$.
3. Set $\hat{S}_N = L$.
Finally, the algorithm by Kyriakoulis [17] for MATLAB (cf. MathWorks [18]) excels in terms of its elegance and simplicity. It avoids the loop-based summations employed above by using expression (26) directly, and it is the basis for the new algorithm introduced in the previous section. It consists of only two steps:
Algorithm 4 (Kyriakoulis).
1. Construct the symmetric Toeplitz matrix $T(\omega)$ with first column

$$\omega = \left( 1 \; \omega_1 \; \omega_2 \; \cdots \; \omega_{N-1} \right)' . \quad (41)$$

2. Determine $\hat{S}_N = \frac{1}{N} A' T(\omega) A$.
A major drawback of this algorithm is its memory-inefficient handling of the $N \times N$ matrix $T(\omega)$. On account of this, the program runs out of memory and fails to compute $\hat{S}_N$ for series longer than N = 10,000 data points. Our algorithm avoids this problem, as it does not form the matrix $T(\omega)$ explicitly.

5. Comparing Different Algorithms for the Computation of HAC Covariance Matrix Estimators

In this section we present the gains in absolute and relative computing times that are achieved by our new algorithm compared with the three alternative algorithms discussed in the previous section.
All four algorithms were programmed and run in R (cf. R Development Core Team [16]) for reasons of comparability.
Remark 2.
  • We used the “fftwtools” package of R for the fft function. The four algorithms run slightly faster when using the “compiler” package of R, but the relative computing times are nearly the same.
  • The results of the algorithm pairs {Roncalli [13], Zeileis [15]} and {NEW, Kyriakoulis [17]} differ slightly. The difference is close to machine precision ($\exp(-16)$) and should be of no practical relevance.
We used the following hard- and software:
  • Intel i5, 2.90 GHz
  • 8 GB RAM
  • R 3.3.2
  • Windows 10 Professional, 64-bit
We measured the computing time for different values of b, N and q. The matrix A was randomly generated for every set of (b, N, q) using normally distributed random numbers with mean 0 and standard deviation 10; neither the distribution nor the parameters of the random number generator influenced the results significantly. All four algorithms were applied to the same matrix A (given b, N and q). After each application, all variables in R except for A, b, N and q were deleted.
We used the weights $\omega_\tau$ of the quadratic spectral kernel function, since this kernel is probably the most frequently used in the literature (cf. Zeileis [15]). An overview of different weights can be found in Andrews [1]. The computing times are given in Table 1 in absolute terms (in milliseconds) and in Table 2 relative to our new algorithm.
Table 1 shows that the bandwidth b has no impact on the performance of our new algorithm, whereas it clearly does for the Roncalli and Zeileis algorithms. Depending on the settings of the estimation problem, the time saved can amount to several minutes for a single HAC estimation.
Figure 1 plots the absolute computation times of our new algorithm against those of the algorithms of Roncalli [13] and Zeileis [15] for different bandwidths b ($N = 10^6$ and $q = 10$). One can see again that our new algorithm is independent of the bandwidth b, while the algorithms of Roncalli [13] and Zeileis [15] are not. This encouraging performance opens up new possibilities of using large bandwidths in combination with large datasets. We leave this issue for future exploration.
As a reading example for Table 2, consider the case $b = 60$, $N = 10^5$ and $q = 30$: the algorithm proposed by Roncalli [13] then needs 8.19 times the computation time of our new algorithm, and the algorithm proposed by Zeileis [15] needs 8.51 times.
Table 2 can be summarized as follows:
  • Compared with the algorithm proposed by Roncalli [13], our new algorithm is between 1.83 and 15.03 times faster.
  • Compared with the algorithm proposed by Zeileis [15], our new algorithm is between 2.04 and 15.82 times faster.
  • Compared with the algorithm proposed by Kyriakoulis [17], our new algorithm is between 7.68 and 140.40 times faster; moreover, the algorithm proposed by Kyriakoulis [17] runs out of memory for $N > 10^4$.
Overall, our new algorithm is faster than any of the compared algorithms. The time saved by the new algorithm can add up considerably, especially if the HAC estimation problem has to be solved repeatedly. For example, the iterated GMM estimation procedure requires an update of the estimated covariance matrix in each step. If we consider 50 estimation steps, then our new algorithm can save up to ~95 min compared with the algorithms proposed by Roncalli [13] or Zeileis [15] ($b = 100$, $N = 10^6$ and $q = 30$). Even in the case of a shorter dataset ($N = 10^5$) we would still save up to ~9 min compared with the alternative algorithms ($b = 100$, $N = 10^5$ and $q = 30$).
Figure 2 shows the relative computing times as a function of the sample size ($N \in \{10^2, 5 \cdot 10^2, 10^3, 5 \cdot 10^3, 10^4, 5 \cdot 10^4, 10^5, 2 \cdot 10^5, 5 \cdot 10^5, 10^6\}$). One can see that our new algorithm outperforms the alternatives in the majority of parameter constellations. Only in the case of a small sample ($N \in \{100, 500\}$) combined with a small bandwidth ($b = 30$) and few moment conditions ($q = 10$) does our algorithm perform more slowly. This is in accordance with the performance pattern in Jensen and Nielsen [12]. However, the computational cost in these cases is below 1 millisecond, which should be irrelevant in practice. Our algorithm reaches its highest relative performance approximately for N between 500 and 1000. Beyond N = 1000 the relative advantage of our new algorithm shrinks, but it still remains faster than Roncalli [13] or Zeileis [15]. At the same time, the absolute computing times increase significantly, which leads to a considerable difference in computational speed between the competing algorithms, as illustrated below.
We replicate the HAC estimation problem in two empirical applications and report the computing time for the three competing algorithms.
Remark 3.
We used the “fftwtools” package of R for the fft function. Additionally, the time series was padded with zeros such that the total length of the series was a power of two.
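One way to implement such padding (a sketch under our assumptions; the authors' exact padding scheme is not shown in the extracted text) is to enlarge the circulant embedding of Section 3 to the next power-of-two size $M \ge 2N$, inserting the extra zeros in the centre of $\omega^*$ so that the top-left $N \times N$ block remains $T(\omega)$:

```r
# Pad the circulant embedding to a power-of-two size M >= 2N; fft() runs fastest
# at such sizes, and the top-left N x N block of C(w_star) is still T(omega).
# w = (omega_1, ..., omega_{N-1}); A is the N x q matrix of moment contributions.
pad_embedding <- function(A, w) {
  N <- nrow(A)
  M <- stats::nextn(2 * N, factors = 2)            # smallest power of two >= 2N
  w_star <- c(1, w, rep(0, M - 2 * N + 1), rev(w)) # zero padding in the centre
  A_star <- rbind(A, matrix(0, M - N, ncol(A)))    # A padded with M - N zero rows
  list(w_star = w_star, A_star = A_star)
}
```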
The first empirical application is the estimation of a generalized asymmetric SV model with realized volatility (GASV-RV) in Chaussé and Xu [25] based on high-frequency financial data. The estimation sample spans 5 years (2003–2008) with a total of N = 1,456,650 observations. The authors consider four different GMM estimation procedures, each of them using HAC covariance matrix estimation and various sets of moment conditions with q = 36 moments at most. We replicated this estimation problem and computed only the corresponding HAC covariance matrices based on randomly generated data, so that our new algorithm can be compared directly with the other ones. According to our results in Table 3, the computation time is substantial even for a single HAC estimation due to the large N. Altogether (four estimations, six assets), our new algorithm can save up to ~26 min compared with Roncalli [13] or Zeileis [15]. It is important to note that Chaussé and Xu [25] use one-step GMM; the estimation problem would be all the more time-consuming for iterated estimations.
The empirical application in Chaussé and Xu [25] is a comparative study between the GASV-RV model and the GARCH model with realized volatility of Hansen et al. [33]. The original dataset in Hansen et al. [33] comprises 29 assets over a time period of 6 years (N = 1,747,980). However, Chaussé and Xu [25] restricted their analysis to 6 assets and 5 years, respectively, most likely due to the enormous computation time required (see Table 3 for the computation times based on the original dataset).
The second empirical application is the forecast study in Lux et al. [26]. The authors consider three forecast problems with different out-of-sample periods: the “full” sample (July 2005–April 2009), the “tranquil” sample (July 2005–July 2007) and the “turbulent” sample (July 2007–April 2009) including the financial crisis. From an estimation point of view, the “tranquil” sample scenario is redundant, since the relevant estimation results can simply be borrowed from the “full” sample problem. On account of this, we replicated the estimation problem only for the “full” sample and the “turbulent” sample and estimated only the corresponding HAC covariance matrices based on randomly generated data. We assumed recursive estimation with a rolling time window after each forecast. Consider the S&P 500 over the period from roughly 1983 to 2009. The “turbulent” sample forecast problem then requires 454 estimations with sample sizes from N = 6067 to N = 6520, whereas the “full” sample problem requires 949 estimations with sample sizes from N = 5572 to N = 6520. In each estimation step, three models (the Binomial Markov-switching multifractal (MSM) model, the Log-normal MSM model and the Log-normal MSM model with realized volatility) were considered together with the iterated GMM procedure (approx. 30 iterations with q = 9 and b = 30). The gain in time as well as the overall computation time for the S&P 500 is given in Table 3. This time saving accumulates rapidly when considering five assets, as in Lux et al. [26].

Author Contributions

Jochen Heberle and Cristina Sattarhoff wrote the paper and programmed the algorithms.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. R Codes

In this section we present the reference R code for the four algorithms examined in this paper. Our functions require three arguments: mcond, which corresponds to the matrix A, method, which specifies the weights function, and the bandwidth bw. An auxiliary function for the computation of different weights ω τ is also provided.

Appendix A.1. The R Code for Our New Algorithm

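The code published with the article is embedded as an image in the source and could not be carried over; in its place we give a reconstruction of Algorithm 1 under the argument convention stated above (mcond, method, bw). It uses base R's fft(), whereas the authors' version relies on the “fftwtools” package (cf. Remark 2); the helper hac_weights() is the one sketched in Appendix A.5.

```r
# HAC estimation via FFT (Algorithm 1); a reconstruction, not the authors'
# verbatim code. mcond: N x q matrix A of moment contributions; method:
# kernel name passed to hac_weights(); bw: bandwidth b.
hac_new <- function(mcond, method = "QS", bw) {
  N      <- nrow(mcond)
  q      <- ncol(mcond)
  w      <- hac_weights(N, method, bw)          # (omega_1, ..., omega_{N-1})
  w_star <- c(1, w, 0, rev(w))                  # first column of C(omega_star)
  lambda <- fft(w_star)                         # Step 1: eigenvalues via the DFT
  A_star <- rbind(mcond, matrix(0, N, q))       # Step 2: zero-padded 2N x q matrix
  TA <- sapply(seq_len(q), function(j) {        # Step 3: V Lambda V* column-wise
    Re(fft(lambda * fft(A_star[, j]), inverse = TRUE)) / (2 * N)
  })
  crossprod(mcond, TA[1:N, , drop = FALSE]) / N # Steps 4-5: (1/N) A' T(omega) A
}
```

For example, hac_new(mcond, "QS", bw = 100) computes $\hat{S}_N$ with the quadratic spectral weights used in Section 5.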

Appendix A.2. The R Code for the Algorithm Proposed by Roncalli

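Again, the published listing is an image; the following is a reconstruction of Algorithm 2, using the zero-padded matrix product of Equation (40):

```r
# Loop-based HAC estimation in the style of Roncalli (Algorithm 2);
# a reconstruction, not the authors' verbatim code.
hac_roncalli <- function(mcond, method = "QS", bw) {
  N <- nrow(mcond)
  q <- ncol(mcond)
  w <- hac_weights(N, method, bw)
  L <- crossprod(mcond) / N                     # Gamma_hat_0 = (1/N) A'A
  for (tau in seq_len(bw)) {
    shifted <- rbind(matrix(0, tau, q),         # zeros on top shift the sample
                     mcond[1:(N - tau), , drop = FALSE])
    G <- crossprod(mcond, shifted) / N          # Equation (40)
    L <- L + w[tau] * (G + t(G))                # add omega_tau * (G + G')
  }
  L
}
```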

Appendix A.3. The R Code for the Algorithm Proposed by Zeileis

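A reconstruction following Algorithm 3, which computes $\hat{\Gamma}_\tau$ from two sub-matrices of mcond in each pass:

```r
# Loop-based HAC estimation in the style of Zeileis (Algorithm 3);
# a reconstruction, not the authors' verbatim code.
hac_zeileis <- function(mcond, method = "QS", bw) {
  N <- nrow(mcond)
  w <- hac_weights(N, method, bw)
  L <- crossprod(mcond) / N                     # Gamma_hat_0 = (1/N) A'A
  for (tau in seq_len(bw)) {
    G <- crossprod(mcond[(tau + 1):N, , drop = FALSE],
                   mcond[1:(N - tau), , drop = FALSE]) / N
    L <- L + w[tau] * (G + t(G))                # add omega_tau * (G + G')
  }
  L
}
```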

Appendix A.4. The R Code for the Algorithm Proposed by Kyriakoulis

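A reconstruction following Algorithm 4; note that toeplitz() materializes the full $N \times N$ matrix, which is what exhausts memory for long series:

```r
# Toeplitz-based HAC estimation in the style of Kyriakoulis (Algorithm 4);
# a reconstruction, not the authors' verbatim code.
hac_kyriakoulis <- function(mcond, method = "QS", bw) {
  N     <- nrow(mcond)
  w     <- hac_weights(N, method, bw)
  T_mat <- toeplitz(c(1, w))                    # full symmetric N x N Toeplitz matrix
  crossprod(mcond, T_mat %*% mcond) / N         # (1/N) A' T(omega) A
}
```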

Appendix A.5. The R Code for the Computation of the Weights

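The published weights helper is likewise an image. The reconstruction below covers the quadratic spectral kernel used in Section 5 and, as a second example, the Bartlett kernel of Newey and West [4]; truncating the weights at the bandwidth reflects Equation (7) and is our assumption about how the helper behaved:

```r
# Weights omega_tau, tau = 1, ..., N - 1; a reconstruction of the helper
# described above, not the authors' verbatim code.
hac_weights <- function(N, method = c("QS", "Bartlett"), bw) {
  method <- match.arg(method)
  tau <- seq_len(N - 1)
  w <- switch(method,
    QS = {                                      # quadratic spectral kernel, Andrews (1991)
      x <- tau / bw
      25 / (12 * pi^2 * x^2) *
        (sin(6 * pi * x / 5) / (6 * pi * x / 5) - cos(6 * pi * x / 5))
    },
    Bartlett = pmax(0, 1 - tau / (bw + 1))      # Newey-West kernel
  )
  w[tau > bw] <- 0                              # truncate at the bandwidth, cf. Equation (7)
  w
}
```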

References

  1. D.W.K. Andrews. “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation.” Econometrica 59 (1991): 817–858.
  2. A. Zeileis. “Object-oriented Computation of Sandwich Estimators.” J. Stat. Softw. 16 (2006): 1–16.
  3. H. White. Estimation, Inference and Specification Analysis, 1st ed. Cambridge, UK: Cambridge University Press, 1994.
  4. W.K. Newey, and K.D. West. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55 (1987): 703–708.
  5. H. White. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica 48 (1980): 817–838.
  6. J.G. MacKinnon, and H. White. “Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties.” J. Econom. 29 (1985): 305–325.
  7. G.H. Jowett. “The Comparison of Means of Sets of Observations from Sections of Independent Stochastic Series.” J. R. Stat. Soc. 17 (1955): 208–227.
  8. E.J. Hannan. “The Variance of the Mean of a Stationary Process.” J. R. Stat. Soc. 19 (1957): 282–285.
  9. D.R. Brillinger. “Confidence Intervals for the Crosscovariance Function.” Sel. Stat. Can. 5 (1979): 1–16.
  10. F. Cribari-Neto, and S.G. Zarkos. “Econometric and Statistical Computing Using Ox.” Comput. Econ. 21 (2003): 277–295.
  11. A.T.A. Wood, and G. Chan. “Simulation of Stationary Gaussian Processes in [0, 1]^d.” J. Comput. Graph. Stat. 3 (1994): 409–432.
  12. A.N. Jensen, and M.O. Nielsen. “A Fast Fractional Difference Algorithm.” J. Time Ser. Anal. 35 (2014): 428–436.
  13. T. Roncalli. TSM—Time Series and Wavelets for Finance. Paris, France: Ritme Informatique, 1996.
  14. Aptech Systems. GAUSS. Chandler, AZ, USA: Aptech Systems Inc., 2014.
  15. A. Zeileis. “Econometric Computing with HC and HAC Covariance Matrix Estimators.” J. Stat. Softw. 11 (2004): 1–17.
  16. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2016.
  17. K. Kyriakoulis. The GMM Toolbox. 2005. Available online: http://personalpages.manchester.ac.uk/staff/Alastair.Hall/GMMGUI.html (accessed on 13 January 2017).
  18. MathWorks. MATLAB. Natick, MA, USA: The MathWorks Inc., 2014.
  19. E. Zivot. “Practical Issues in the Analysis of Univariate GARCH Models.” In Handbook of Financial Time Series. Edited by T.G. Andersen, R.A. Davis, J.P. Kreiß and T. Mikosch. Berlin/Heidelberg, Germany: Springer-Verlag, 2009, pp. 113–155.
  20. E. Ruiz. “Quasi-Maximum Likelihood Estimation of Stochastic Volatility Models.” J. Econom. 63 (1994): 289–306.
  21. E. Renault. “Moment-Based Estimation of Stochastic Volatility Models.” In Handbook of Financial Time Series. Edited by T.G. Andersen, R.A. Davis, J.P. Kreiß and T. Mikosch. Berlin/Heidelberg, Germany: Springer-Verlag, 2009, pp. 269–311.
  22. E. Bacry, A. Kozhemyak, and J.F. Muzy. “Continuous Cascade Models for Asset Returns.” J. Econ. Dyn. Control 32 (2008): 156–199.
  23. T. Lux. “The Markov-Switching Multifractal Model of Asset Returns: GMM Estimation and Linear Forecasting of Volatility.” J. Bus. Econ. Stat. 26 (2008): 194–210.
  24. E. Bacry, A. Kozhemyak, and J.F. Muzy. “Log-Normal Continuous Cascade Model of Asset Returns: Aggregation Properties and Estimation.” Quant. Finance 13 (2013): 795–818.
  25. P. Chaussé, and D. Xu. “GMM Estimation of a Realized Stochastic Volatility Model: A Monte Carlo Study.” Econom. Rev., 2016.
  26. T. Lux, L. Morales-Arias, and C. Sattarhoff. “Forecasting Daily Variations of Stock Index Returns with a Multifractal Model of Realized Volatility.” J. Forecast. 33 (2014): 532–541.
  27. R.J. Smith. “Automatic positive semidefinite HAC covariance matrix and GMM estimation.” Econom. Theory 21 (2005): 158–170.
  28. A.R. Hall. Generalized Method of Moments, 1st ed. Advanced Texts in Econometrics; Oxford, UK: Oxford University Press, 2005.
  29. C.F. Van Loan. Computational Frameworks for the Fast Fourier Transform. Frontiers in Applied Mathematics; Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 1992.
  30. P.J. Brockwell, and R.A. Davis. Time Series: Theory and Methods, 2nd ed. Springer Series in Statistics; New York, NY, USA: Springer, 2006.
  31. R.M. Gray. “Toeplitz and Circulant Matrices: A Review.” Found. Trends Commun. Inf. Theory 2 (2006): 155–239.
  32. G.H. Golub, and C.F. van Loan. Matrix Computations, 3rd ed. Johns Hopkins Series in the Mathematical Sciences; Baltimore, MD, USA: Johns Hopkins University Press, 1996.
  33. P.R. Hansen, Z. Huang, and H.H. Shek. “Realized GARCH: A Joint Model for Returns and Realized Measures of Volatility.” J. Appl. Econom. 27 (2012): 877–906.
Figure 1. Absolute computation times of our new algorithm and of the algorithms of Roncalli [13] and Zeileis [15] as a function of the bandwidth b ($N = 10^6$ and $q = 10$).
Figure 2. Relative computing times of Roncalli [13] and Zeileis [15] compared with our new algorithm for different parameter constellations (N ranges from $10^2$ to $10^6$). The green line is at “y = 1”; the x-axis is logarithmic. Reading example: in the comparison “Zeileis vs. NEW” with the parameter set $N = 10^3$, $q = 30$ and $b = 100$ (blue dotted line), the NEW algorithm is about 20 times faster than the algorithm proposed by Zeileis [15].
Table 1. Absolute computing time (in milliseconds) for different values of b, N and q. Each cell lists the times for q = 10 / 20 / 30. (Note: R runs out of memory in the blank cells.)

b = 30
| N         | New Algorithm      | Roncalli                 | Zeileis                  | Kyriakoulis        |
|-----------|--------------------|--------------------------|--------------------------|--------------------|
| 5000      | 11 / 24 / 31       | 21 / 76 / 156            | 24 / 89 / 165            | 1404 / 1603 / 1806 |
| 10,000    | 25 / 54 / 74       | 49 / 166 / 334           | 54 / 173 / 340           | 5654 / 6827 / 7398 |
| 50,000    | 125 / 319 / 447    | 267 / 905 / 1738         | 291 / 947 / 1797         |                    |
| 100,000   | 287 / 642 / 892    | 571 / 1893 / 3521        | 635 / 2066 / 3685        |                    |
| 200,000   | 628 / 1195 / 1768  | 1185 / 3855 / 7313       | 1280 / 4321 / 7538       |                    |
| 500,000   | 1523 / 3006 / 4485 | 2963 / 9628 / 18,145     | 3260 / 9849 / 18,929     |                    |
| 1,000,000 | 3201 / 6727 / 9809 | 5862 / 18,497 / 36,687   | 6545 / 19,627 / 37,678   |                    |

b = 60
| N         | New Algorithm      | Roncalli                 | Zeileis                  | Kyriakoulis        |
|-----------|--------------------|--------------------------|--------------------------|--------------------|
| 5000      | 9 / 22 / 31        | 46 / 149 / 320           | 56 / 160 / 342           | 1388 / 1606 / 1807 |
| 10,000    | 27 / 49 / 75       | 95 / 324 / 646           | 108 / 344 / 672          | 6067 / 6526 / 7375 |
| 50,000    | 121 / 319 / 446    | 547 / 1850 / 3530        | 595 / 1950 / 3704        |                    |
| 100,000   | 330 / 579 / 851    | 1108 / 3746 / 6971       | 1238 / 4016 / 7242       |                    |
| 200,000   | 626 / 1254 / 1823  | 2326 / 7765 / 14,474     | 2550 / 8255 / 14,974     |                    |
| 500,000   | 1502 / 2977 / 4682 | 6081 / 19,021 / 36,469   | 6427 / 19,807 / 37,544   |                    |
| 1,000,000 | 3150 / 6398 / 9833 | 11,565 / 36,816 / 72,326 | 12,972 / 39,166 / 74,822 |                    |

b = 100
| N         | New Algorithm      | Roncalli                   | Zeileis                    | Kyriakoulis        |
|-----------|--------------------|----------------------------|----------------------------|--------------------|
| 5000      | 10 / 25 / 34       | 78 / 248 / 512             | 88 / 266 / 539             | 1382 / 1609 / 1907 |
| 10,000    | 27 / 52 / 76       | 156 / 546 / 1065           | 178 / 589 / 1148           | 5770 / 6832 / 7699 |
| 50,000    | 121 / 319 / 454    | 950 / 2990 / 5917          | 1025 / 3336 / 6380         |                    |
| 100,000   | 331 / 580 / 927    | 1832 / 6204 / 11,650       | 2063 / 6532 / 12,068       |                    |
| 200,000   | 630 / 1250 / 1749  | 3884 / 12,687 / 24,201     | 4172 / 13,598 / 25,489     |                    |
| 500,000   | 1511 / 2991 / 4872 | 9763 / 31,210 / 60,453     | 10,753 / 33,047 / 62,455   |                    |
| 1,000,000 | 3177 / 6441 / 9855 | 19,424 / 62,593 / 121,815  | 21,667 / 65,241 / 125,517  |                    |
Table 2. Relative computing times (compared to our new algorithm) for different values of b, N and q. Each cell lists the factors for q = 10 / 20 / 30. (Note: R runs out of memory in the blank cells.)

b = 30
| N         | Roncalli           | Zeileis            | Kyriakoulis            |
|-----------|--------------------|--------------------|------------------------|
| 5000      | 1.99 / 3.09 / 5.09 | 2.24 / 3.65 / 5.38 | 45.73 / 74.84 / 23.89  |
| 10,000    | 1.94 / 3.08 / 4.50 | 2.17 / 3.22 / 4.58 | 76.24 / 140.40 / 44.61 |
| 50,000    | 2.14 / 2.83 / 3.89 | 2.33 / 2.96 / 4.02 |                        |
| 100,000   | 1.99 / 2.95 / 3.95 | 2.21 / 3.22 / 4.13 |                        |
| 200,000   | 1.89 / 3.23 / 4.14 | 2.04 / 3.61 / 4.26 |                        |
| 500,000   | 1.95 / 3.20 / 4.05 | 2.14 / 3.28 / 4.22 |                        |
| 1,000,000 | 1.83 / 2.75 / 3.74 | 2.04 / 2.92 / 3.84 |                        |

b = 60
| N         | Roncalli            | Zeileis             | Kyriakoulis           |
|-----------|---------------------|---------------------|-----------------------|
| 5000      | 4.92 / 6.90 / 10.35 | 6.00 / 7.39 / 11.06 | 44.84 / 35.22 / 12.10 |
| 10,000    | 3.51 / 6.67 / 8.59  | 4.01 / 7.06 / 8.93  | 80.65 / 68.75 / 22.75 |
| 50,000    | 4.54 / 5.80 / 7.92  | 4.94 / 6.11 / 8.31  |                       |
| 100,000   | 3.35 / 6.47 / 8.19  | 3.75 / 6.93 / 8.51  |                       |
| 200,000   | 3.72 / 6.19 / 7.94  | 4.08 / 6.58 / 8.21  |                       |
| 500,000   | 4.05 / 6.39 / 7.79  | 4.28 / 6.65 / 8.02  |                       |
| 1,000,000 | 3.67 / 5.75 / 7.36  | 4.12 / 6.12 / 7.61  |                       |

b = 100
| N         | Roncalli             | Zeileis              | Kyriakoulis           |
|-----------|----------------------|----------------------|-----------------------|
| 5000      | 7.85 / 9.84 / 15.03  | 8.78 / 10.54 / 15.82 | 40.56 / 20.50 / 7.68  |
| 10,000    | 5.70 / 10.58 / 14.08 | 6.48 / 11.42 / 15.17 | 76.24 / 43.68 / 14.11 |
| 50,000    | 7.88 / 9.37 / 13.02  | 8.50 / 10.45 / 14.04 |                       |
| 100,000   | 5.54 / 10.70 / 12.57 | 6.24 / 11.27 / 13.02 |                       |
| 200,000   | 6.17 / 10.15 / 13.83 | 6.62 / 10.88 / 14.57 |                       |
| 500,000   | 6.46 / 10.44 / 12.41 | 7.11 / 11.05 / 12.82 |                       |
| 1,000,000 | 6.11 / 9.72 / 12.36  | 6.82 / 10.13 / 12.74 |                       |
Table 3. Computation times and gain in time (in minutes) for our new algorithm compared with the algorithms proposed by Roncalli [13] and Zeileis [15] for the empirical applications in Chaussé and Xu [25] and Lux et al. [26].

|                        | Chaussé and Xu: one est. | Chaussé and Xu: full est. | Hansen et al.: one est. | Hansen et al.: full est. | Lux et al.: “turbulent” | Lux et al.: “full” |
|------------------------|--------------------------|---------------------------|-------------------------|--------------------------|-------------------------|--------------------|
| Roncalli               | 1.38                     | 33.03                     | 1.67                    | 193.40                   | 17.31                   | 35.80              |
| Zeileis                | 1.47                     | 35.23                     | 1.68                    | 195.20                   | 19.71                   | 40.41              |
| NEW                    | 0.39                     | 9.32                      | 0.66                    | 76.42                    | 10.55                   | 22.84              |
| Gain, NEW vs. Roncalli | 0.99                     | 23.70                     | 1.01                    | 116.98                   | 6.76                    | 12.96              |
| Gain, NEW vs. Zeileis  | 1.08                     | 25.91                     | 1.02                    | 118.78                   | 9.16                    | 17.57              |
