Next Article in Journal
Response-Based Sampling for Binary Choice Models With Sample Selection
Next Article in Special Issue
Econometric Fine Art Valuation by Combining Hedonic and Repeat-Sales Information
Previous Article in Journal
Top Incomes, Heavy Tails, and Rank-Size Regressions
Previous Article in Special Issue
Bayesian Analysis of Bubbles in Asset Prices
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Jackknife Bias Reduction in the Presence of a Near-Unit Root

by
Marcus J. Chambers
1,* and
Maria Kyriacou
2
1
Department of Economics, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
2
Department of Economics, University of Southampton, Southampton SO17 1BJ, UK
*
Author to whom correspondence should be addressed.
Econometrics 2018, 6(1), 11; https://doi.org/10.3390/econometrics6010011
Submission received: 29 September 2017 / Revised: 9 February 2018 / Accepted: 22 February 2018 / Published: 5 March 2018
(This article belongs to the Special Issue Celebrated Econometricians: Peter Phillips)

Abstract

:
This paper considers the specification and performance of jackknife estimators of the autoregressive coefficient in a model with a near-unit root. The limit distributions of sub-sample estimators that are used in the construction of the jackknife estimator are derived, and the joint moment generating function (MGF) of two components of these distributions is obtained and its properties explored. The MGF can be used to derive the weights for an optimal jackknife estimator that removes fully the first-order finite sample bias from the estimator. The resulting jackknife estimator is shown to perform well in finite samples and, with a suitable choice of the number of sub-samples, is shown to reduce the overall finite sample root mean squared error, as well as bias. However, the optimal jackknife weights rely on knowledge of the near-unit root parameter and a quantity that is related to the long-run variance of the disturbance process, which are typically unknown in practice, and so, this dependence is characterised fully and a discussion provided of the issues that arise in practice in the most general settings.
JEL Classification:
C22

1. Introduction

Throughout his career, Peter Phillips has made important contributions to knowledge across the broad spectrum of econometrics and statistics, providing inspiration to many other researchers along the way. This paper builds on two of the strands of Peter’s research, namely jackknife bias reduction and the analysis of nonstationary time series. Indeed, our own work on the jackknife (Chambers 2013, 2015; Chambers and Kyriacou 2013) was inspired by Peter’s work on this topic with Jun Yu, published as Phillips and Yu (2005), and the current contribution also extends the results on moment generating functions (MGFs) contained in Phillips (1987a).
The jackknife has been proven to be an easy-to-implement method of eliminating first-order estimation bias in a wide variety of applications in statistics and econometrics. Its genesis can be traced to Quenouille (1956) and Tukey (1958) in the case of independently and identically distributed (iid) samples, while it has been adapted more recently to accommodate more general time series settings. Within the class of stationary autoregressive time series models, Phillips and Yu (2005) show that the jackknife can effectively reduce bias in the pricing of bond options in finance, while Chambers (2013) analyses the performance of jackknife methods based on a variety of sub-sampling procedures. In subsequent work, Chambers and Kyriacou (2013) demonstrate that the usual jackknife construction in the time series case has to be amended when a unit root is present, while Chen and Yu (2015) show that a variance-minimising jackknife can be constructed in a unit root setting that also retains its bias reduction properties. In addition, Kruse and Kaufmann (2015) compare bootstrap, jackknife and indirect inference estimators in mildly explosive autoregressions, finding that the indirect inference estimator dominates in terms of root mean squared error, but that the jackknife excels for bias reduction in stationary and unit root situations.
The usual motivation for a jackknife estimator relies on the existence of a Nagar-type expansion of the original full-sample estimator’s bias. Its construction proceeds by finding a set of weights that, when applied to a full-sample estimator and a set of sub-sample estimators, is able to eliminate fully the first-order term in the resulting jackknife estimator’s bias expansion. In stationary time series settings, the bias expansions are common to both the full-sample and sub-sample estimators, but Chambers and Kyriacou (2013) pointed out that this property no longer holds in the case of a unit root. This is because the initial values in the sub-samples are no longer negligible in the asymptotics and have a resulting effect on the bias expansions, thereby affecting the optimal weights. Construction of a fully-effective jackknife estimator relies, therefore, on knowledge of the presence (or otherwise) of a unit root.
In this paper, we explore the construction of jackknife estimators that eliminate fully the first-order bias in the near-unit root setting. Near-unit root models have attracted a great deal of interest in time series owing, amongst other things, to their ability to capture better the effects of sample size in the vicinity of a unit root, to explore analytically the power properties of unit root tests and to allow the development of an integrated asymptotic theory for both stationary and non-stationary autoregressions; see Phillips (1987a) and Chan and Wei (1987) for details. We find that jackknife estimators can be constructed in the presence of a near-unit root that achieve this aim of bias reduction. Jackknife estimators have the advantage of incurring only a very slight additional computational burden, unlike alternative resampling and simulation-based methods such as the bootstrap and indirect inference. Furthermore, they are applicable in a wide variety of estimation frameworks and work well in finite sample situations in which the prime objective is bias reduction. Although the bootstrap is often a viable candidate for bias reduction, it was shown by Park (2006) that the bootstrap is inconsistent in the presence of a near-unit root, and hence, jackknife methods offer a useful alternative in these circumstances.
The development of a jackknife estimator that achieves bias reduction in the near-unit root case is not simply a straightforward application of previous results. While Chambers and Kyriacou (2013) first pointed out that under unit root non-stationarity, the effect of the sub-sample initial conditions does not vanish asymptotically, thereby affecting asymptotic expansions of sub-sample estimator bias and the resulting jackknife weights as compared to the stationary case, the extension of these results to a local-to-unity setting is not obvious. With a near-unit root, the autoregressive parameter plays an important role, and it is therefore necessary to derive the appropriate asymptotic expansion of sub-sample estimator bias for this more general case, as well as the MGFs of the relevant limiting distributions that can be used to construct the appropriate jackknife weights. The derivation of such results is challenging in itself and is a major reason why we focus on the bias-minimising jackknife, rather than attempting to derive results for the variance-minimising jackknife of Chen and Yu (2015).
The paper is organised as follows. Section 2 defines the near-unit root model of interest and focuses on the limit distributions of sub-sample estimators, demonstrating that these limit distributions are sub-sample dependent. An asymptotic expansion of these limit distributions demonstrates the source of the failure of the standard jackknife weights in a near-unit root setting by showing that the bias expansion is also sub-sample dependent. In order to define a successful jackknife estimator, it is necessary to compute the mean of these limit distributions, and so, Section 3 derives the moment generating function of two random variables that determine the limit distributions over an arbitrary sub-interval of the unit interval. Expressions for the computation of the mean of the ratio of the two random variables are derived using the MGF. Various properties of the MGF are established, and it is shown that results obtained in Phillips (1987a) arise as a special case, including those that emerge as the near-unit root parameter tends to minus infinity.
Based on the results in Section 2 and Section 3, the optimal weights for the jackknife estimator are defined in Section 4, which then goes on to explore, via simulations, the performance of the proposed estimator in finite samples. Consideration is given to the choice of the appropriate number of sub-samples to use when either bias reduction or root mean squared error (RMSE) minimisation is the objective. It is found that greatest bias reduction can be achieved using just two sub-samples, while minimisation of RMSE, which, it should be stressed, is not the objective of the jackknife estimator, requires a larger number of sub-samples, which increases with sample size. Section 5 contains some concluding comments, and all proofs are contained in the Appendix A.
The following notation will be used throughout the paper. The symbol = d denotes equality in distribution; d denotes convergence in distribution; p denotes convergence in probability; ⇒ denotes weak convergence of the relevant probability measures; W ( r ) denotes a Wiener process on C [ 0 , 1 ] , the space of continuous real-valued functions on the unit interval; and J c ( r ) = 0 r e ( r s ) c d W ( s ) denotes the Ornstein–Uhlenbeck process, which satisfies d J c ( r ) = c J c ( r ) d r + d W ( r ) for some constant parameter c. Functionals of W ( r ) and J c ( r ) , such as 0 1 J c ( r ) 2 d r , are denoted 0 1 J c 2 for notational convenience where appropriate, and in stochastic integrals of the form e c r J c , it is to be understood that integration is carried out with respect to r. Finally, L denotes the lag operator such that L j y t = y t j for a random variable y t .

2. Jackknife Estimation with a Near-Unit Root

2.1. The Model and the Standard Jackknife Estimator

The model with a near-unit root is defined as follows.
Assumption 1.
The sequence y 1 , , y n satisfies:
y t = ρ y t 1 + u t , t = 1 , , n ,
where ρ = e c / n = 1 + c / n + O ( n 2 ) for some constant c, y 0 is an observable O p ( 1 ) random variable and u t is the stationary linear process:
u t = δ ( L ) ϵ t = j = 0 δ j ϵ t j , t = 1 , , n ,
where ϵ t iid ( 0 , σ ϵ 2 ) , E ( ϵ t 4 ) < , δ ( z ) = j = 0 δ j z j , δ 0 = 1 and j = 0 j | δ j | < .
The parameter c controls the extent to which the near-unit root deviates from unity; when c < 0 , the process is (locally) stationary, whereas it is (locally) explosive when c > 0 . Strictly speaking, the autoregressive parameter should be denoted ρ n to emphasise its dependence on the sample size, n, but we use ρ for notational convenience. The linear process specification for the innovations is consistent with u t being a stationary ARMA(p, q) process of the form ϕ ( L ) u t = θ ( L ) ϵ t , where ϕ ( z ) = j = 0 p ϕ j z j , θ ( z ) = j = 0 q θ j z j , and all roots of the equation ϕ ( z ) = 0 lie outside the unit circle. In this case, δ ( z ) = θ ( z ) / ϕ ( z ) , but Assumption 1 also allows for more general forms of linear processes and is not restricted solely to the ARMA class. Under Assumption 1, u t satisfies the functional central limit theorem:
1 n t = 1 [ n r ] u t σ W ( r ) a s n
on C [ 0 , 1 ] , where σ 2 = σ ϵ 2 δ ( 1 ) 2 denotes the long-run variance.
Equations of the form (1) have been used extensively in the literature on testing for an autoregressive unit root (corresponding to c = 0 ) and for examining the power properties of the resulting tests (by allowing c to deviate from zero). In economic and financial time series, they offer a flexible mechanism of modelling highly persistent series whose autoregressive roots are generally close, but not exactly equal, to unity. Ordinary least squares (OLS) regression on (1) yields:
y t = ρ ^ y t 1 + u ^ t , t = 1 , , n ,
where u ^ t denotes the regression residual, and it can be shown see Phillips (1987a) that ρ ^ satisfies:
n ( ρ ^ ρ ) = 1 n t = 1 n y t 1 u t 1 n 2 t = 1 n y t 1 2 Z c ( η ) = 0 1 J c d W + 1 2 ( 1 η ) 0 1 J c 2 a s n ,
where η = σ u 2 / σ 2 , σ u 2 = E ( u t 2 ) = σ ϵ 2 j = 0 δ j 2 and the functional Z c ( η ) is implicitly defined. The limit distribution in (5) is skewed, and the estimator suffers from significant negative bias in finite samples; see Perron (1989) for properties of the limit distribution for the case where σ 2 = σ u 2 and (hence) η = 1 .
The jackknife estimator offers a computationally simple method of bias reduction by combining the full-sample estimator, ρ ^ , with a set of m sub-sample estimators, ρ ^ j ( j = 1 , , m ) , the weights assigned to these components depending on the type of sub-sampling method employed. Phillips and Yu (2005) find the use of non-overlapping sub-samples to perform well in reducing bias in the estimation of stationary diffusions, while the analysis of Chambers (2013) supports this result in the setting of stationary autoregressions. In this approach, the full sample of n observations is divided into m sub-samples, each of length , so that n = m × . The generic form of jackknife estimator is given by:
ρ ^ J = w 1 ρ ^ + w 2 1 m j = 1 m ρ ^ j ,
where the weights are determined so as to eliminate the first-order finite sample bias. Assuming that the full-sample estimator and each sub-sample estimator satisfy a (Nagar-type) bias expansion of the form:
E ( ρ ^ ρ ) = a n + O 1 n 2 , E ( ρ ^ j ρ ) = a + O 1 2 ,
it can be shown that the appropriate weights are given by w 1 = m / ( m 1 ) and w 2 = 1 / ( m 1 ) , in which case:
E ( ρ ^ J ρ ) = m m 1 E ( ρ ^ ρ ) 1 m 1 1 m j = 1 m E ( ρ ^ j ρ ) = m m 1 a n + O 1 n 2 1 m 1 a + O 1 2 = a m 1 m n 1 + O 1 n 2 = O 1 n 2 ,
using the fact that m / n = 1 / . Under such circumstances, the jackknife estimator is capable of completely eliminating the O ( 1 / n ) bias term in the estimator as compared to ρ ^ . However, in the pure unit root setting ( c = 0 ), Chambers and Kyriacou (2013) demonstrated that the sub-sample estimators do not share the same limit distribution as the full-sample estimator, which means that the expansions for the bias of the sub-sample estimators are incorrect, and hence, the weights defined above do not eliminate fully the first-order bias. It is therefore important to investigate this issue in the more general setting of a near-unit root with a view toward deriving the appropriate weights for eliminating the first-order bias term.

2.2. Sub-Sample Properties

In order to explore the sub-sample properties, let:
τ j = { ( j 1 ) + 1 , , j } , j = 1 , , m ,
denote the set of integers indexing the observations in each sub-sample. The sub-sample estimators can then be written, in view of (5), as:
ρ ^ j ρ = 1 t τ j y t 1 u t 1 2 t τ j y t 1 2 , j = 1 , , m .
Theorem 1 (below) determines the limiting properties of the quantities appearing in (7), as well as the limit distribution of ( ρ ^ j ρ ) itself.
Theorem 1.
Let y 1 , , y n satisfy Assumption 1. Then, if m is fixed as n (and hence, ):
(a) 
1 2 t τ j y t 2 σ 2 m 2 ( j 1 ) / m j / m J c 2 ;
(b) 
1 t τ j y t 1 u t σ 2 m ( j 1 ) / m j / m J c d W + 1 2 ( σ 2 σ u 2 ) ;
(c) 
( ρ ^ j ρ ) Z c , j ( η ) = ( j 1 ) / m j / m J c d W + 1 2 m ( 1 η ) m ( j 1 ) / m j / m J c 2 , j = 1 , , m ,
η = σ u 2 / σ 2 .
where the functional Z c , j ( η ) is implicitly defined.
The limit distribution in Part (c) of Theorem 1 is of the same form as that of the full-sample estimator in (5), except that the integrals are over the subset [ ( j 1 ) / m , j / m ] of [ 0 , 1 ] rather than the unit interval itself. Note, too, that the first component of the numerator of Z c , j ( η ) also has the representation:
( j 1 ) / m j / m J c d W = d 1 2 J c j m 2 J c j 1 m 2 2 c ( j 1 ) / m j / m J c 2 1 m ,
which follows from the Itô calculus and is demonstrated in the proof of Part (b) of Theorem 1 in the Appendix. The familiar result, 0 1 W d W = [ W ( 1 ) 2 1 ] / 2 , follows as a special case by setting j = m = 1 and c = 0 .
The fact that the distributions Z c , j ( η ) in Theorem 1 depend on j implies that the expansions for E ( ρ ^ j ρ ) that are used to derive the jackknife weights defined following (6) may not be correct under a near-unit root. When the process (1) has a near-unit root, we can expect the expansions for E ( ρ ^ j ρ ) to be of the form:
E ρ ^ j ρ = μ c , j + O 1 2 , j = 1 , , m ;
indeed, we later justify this expansion and characterise μ c , j precisely. Such expansions have been shown to hold in the unit root ( c = 0 ) case, as well as more generally when c 0 . For example, Phillips (1987b, Theorem 7.1) considered the Gaussian random walk (corresponding to (1) with c = 0 , δ ( z ) = 1 , y 0 = 0 and u t Gaussian) and demonstrated the validity of an asymptotic expansion for the normalised coefficient estimator; it is given by:
n ( ρ ^ 1 ) = d 0 1 W d W 0 1 W 2 ξ 2 n 0 1 W 2 + O p 1 n ,
where ξ is a standard normal random variable distributed independently of W. Taking expectations in (9), using the independence of ξ and W and noting that the expected value of the leading term is 1.7814 (see, for example, Table 7.1 of Tanaka 1996), the bias satisfies:
E ( ρ ^ 1 ) = 1 . 7814 n + o 1 n ;
see, also, Phillips (2012, 2014). In the more general setting of the model in Assumption 1, and assuming that u t is Gaussian, Theorem 1 of Perron (1996) established that:
n ( ρ ^ ρ ) = d 0 1 J c d W + 1 2 ( 1 η ) + y 0 σ n 0 1 e c r d W v f 2 σ 2 n ξ 0 1 J c 2 + 2 y 0 σ n 0 1 e c r J c + O p 1 n ,
where v f 2 = 2 π f u 2 ( 0 ) and f u 2 ( 0 ) denotes the spectral density of u t 2 σ u 2 at the origin. The following result extends the type of expansion in (11) to the sub-sample estimators.
Theorem 2.
Let y 1 , , y n satisfy Assumption 1 and, in addition, assume that u t is Gaussian. Then, for j = 1 , , m ,
( ρ ^ j ρ ) = d ( j 1 ) / m j / m J c d W + 1 2 m ( 1 η ) + 2 y 0 σ ξ 1 j v f σ 2 m ξ 2 j m ( j 1 ) / m j / m J c 2 + 2 m y 0 σ ( j 1 ) / m j / m e c r J c + O p 1 ,
where ξ i j N ( 0 , s j 2 ) , ξ 2 j N ( 0 , 1 ) and:
s j 2 = ( m 1 ) 2 2 c m e c j / m e c ( j 1 ) / m 2 + ( m 2 m 2 ) 2 c m e c j / m e c ( j 1 ) / m + 2 ( 1 + m ) m 2 e c j / m .
The form of the expansion for ( ρ ^ j ρ ) in Theorem 2 is similar to that for the full-sample estimator, but also depends on m and j. Use of these expansions to derive expressions for the biases of ρ ^ and ρ ^ j would be complicated due to the dependence on y 0 . We therefore take y 0 = 0 1, which results in the following expectations:
E ( ρ ^ ρ ) = E ( Z c ( η ) ) n + O 1 n 2 , E ( ρ ^ j ρ ) = E ( Z c , j ( η ) ) + O 1 2 ,
these results utilising the independence of the normally distributed random variables ( ξ 1 j and ξ 2 j ) and the Wiener process W. The next section provides the form of the moment generating function that enables expectations of the functionals Z c ( η ) and Z c , j ( η ) to be computed, thereby enabling the construction of a jackknife estimator that eliminates the first-order bias.

3. A Moment Generating Function and Its Properties

The following result provides the joint moment generating function (MGF) of two relevant functionals of J c defined over a subinterval [ a , b ] of [ 0 , b ] where 0 a < b . Although our focus is on sub-intervals of [ 0 , 1 ] , we leave b unconstrained for greater generality than is required for our specific purposes because the results may have more widespread use beyond our particular application.
Theorem 3.
Let N c = a b J c ( r ) d W ( r ) and D c = a b J c ( r ) 2 d r , where J c ( r ) is an Ornstein–Uhlenbeck process on r [ 0 , b ] with parameter c, and 0 a < b . Then:
(a) 
The joint MGF of N c and D c is given by:
M c ( θ 1 , θ 2 ) = E exp ( θ 1 N c + θ 2 D c ) = exp ( θ 1 + c ) 2 ( b a ) H c ( θ 1 , θ 2 ) 1 / 2 ,
where, defining λ = ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 and v 2 = ( e 2 a c 1 ) / ( 2 c ) ,
H c ( θ 1 , θ 2 ) = cosh ( b a ) λ 1 λ θ 1 + c + v 2 θ 1 2 + 2 θ 2 sinh ( b a ) λ .
(b) 
The individual MGFs for N c and D c are given by, respectively,
M N c ( θ 1 ) = exp ( θ 1 + c ) 2 ( b a ) × cosh ( ( b a ) λ 1 ) 1 λ 1 θ 1 + c + v 2 θ 1 2 sinh ( ( b a ) λ 1 ) 1 / 2
M D c ( θ 2 ) = exp c 2 ( b a ) cosh ( b a ) λ 2 1 λ 2 c + 2 v 2 θ 2 sinh ( b a ) λ 2 1 / 2
where λ 1 = ( c 2 + 2 c θ 1 ) 1 / 2 and λ 2 = ( c 2 2 θ 2 ) 1 / 2 .
(c) 
Let:
g ( θ 2 ) = cosh ( b a ) ( c 2 + 2 θ 2 ) 1 / 2 c 2 v 2 θ 2 sinh ( b a ) ( c 2 + 2 θ 2 ) 1 / 2 ( c 2 + 2 θ 2 ) 1 / 2 .
Then, the expectation of N c / D c is given by:
E N c D c = 0 M c ( θ 1 , θ 2 ) θ 1 θ 1 = 0 d θ 2 = I 1 ( a , b ) + I 2 ( a , b ) + I 3 ( a , b ) + I 4 ( a , b ) ,
where:
I 1 ( a , b ) = ( b a ) 2 exp c ( b a ) 2 0 1 g ( θ 2 ) 1 / 2 d θ 2 , I 2 ( a , b ) = c ( b a ) 1 2 exp c ( b a ) 2 0 sinh ( b a ) ( c 2 + 2 θ 2 ) 1 / 2 ( c 2 + 2 θ 2 ) 1 / 2 g ( θ 2 ) 3 / 2 d θ 2 , I 3 ( a , b ) = c 2 exp c ( b a ) 2 0 c 2 v 2 θ 2 sinh ( b a ) ( c 2 + 2 θ 2 ) 1 / 2 ( c 2 + 2 θ 2 ) 3 / 2 g ( θ 2 ) 3 / 2 d θ 2 , I 4 ( a , b ) = c ( b a ) 2 exp c ( b a ) 2 0 c 2 v 2 θ 2 cosh ( b a ) ( c 2 + 2 θ 2 ) 1 / 2 ( c 2 + 2 θ 2 ) g ( θ 2 ) 3 / 2 d θ 2 .
The MGFs for the two functionals in Theorem 3 have potential applications in a wide range of sub-sampling problems with near-unit root processes. A potential application of the joint MGF in Part (a) of Theorem 3 is in the computation of the cumulative and probability density functions of the distributions Z c , j ( η ) when setting a = ( j 1 ) / m and b = j / m . For example, the probability density function of m Z c , j ( 1 ) is given by (with i 2 = 1 ):
p d f ( z ) = 1 2 π i lim ϵ 1 0 , ϵ 2 ϵ 1 < | θ 1 | < ϵ 2 M c ( i θ 1 , i θ 2 ) θ 2 θ 2 = θ 1 z d θ 1 ;
see, for example, Perron (1991, p. 221), who performs this type of calculation for the distribution Z c , while Abadir (1993) derives a representation for the density function of Z c in terms of a parabolic cylinder function.
The result in Part (b) of Theorem 3 is obtained by differentiating the MGF and constructing the appropriate integrals. When c = 0 , the usual (full-sample) result, where a = 0 and b = 1 , can be obtained as a special case. Noting that v 2 = 0 in this case and making the substitution w = ( c 2 + 2 θ 2 ) 1 / 2 results in:
I 1 ( 0 , 1 ) = 1 2 0 w cosh ( w ) 1 / 2 d w , I 2 ( 0 , 1 ) = 1 2 0 sinh ( w ) cosh ( w ) 3 / 2 d w , I 3 ( 0 , 1 ) = 0 , I 4 ( 0 , 1 ) = 0 ;
these expressions can be found; for example, Gonzalo and Pitarakis (1998, Lemma 3.1). Some further special cases of interest that follow from Theorem 3 are presented below.
Corollary to Theorem 3.
(a) Let [ a , b ] = [ 0 , 1 ] so that N c = 0 1 J c ( r ) d W ( r ) and D c = 0 1 J c ( r ) 2 d r . Then:
M c ( θ 1 , θ 2 ) = exp ( θ 1 + c ) 2 cosh ( λ ) 1 λ θ 1 + c sinh ( λ ) 1 / 2 , M N c ( θ 1 ) = exp ( θ 1 + c ) 2 cosh ( λ 1 ) 1 λ 1 θ 1 + c sinh ( λ 1 ) 1 / 2 , M D c ( θ 2 ) = exp c 2 cosh λ 2 c λ 2 sinh λ 2 1 / 2 ,
while taking the limit as c 0 yields:
M 0 ( θ 1 , θ 2 ) = exp θ 1 2 cosh λ 0 θ 1 λ 0 sinh λ 0 1 / 2 , M N 0 ( θ 1 ) = e θ 1 / 2 ( 1 θ 1 ) 1 / 2 , M D 0 ( θ 2 ) = cosh λ 0 1 / 2 ,
where λ 0 = 2 θ 2 .
(b) Let [ a , b ] = [ ( j 1 ) / m , j / m ] so that N c = ( j 1 ) / m j / m J c ( r ) d W ( r ) and D c = ( j 1 ) / m j / m J c ( r ) 2 d r   ( j = 1 , , m ) . Then:
M c ( θ 1 , θ 2 ) = exp ( θ 1 + c ) 2 m cosh λ m 1 λ θ 1 + c + v j 1 2 θ 1 2 + 2 θ 2 sinh λ m 1 / 2 , M N c ( θ 1 ) = exp ( θ 1 + c ) 2 m cosh λ 1 m 1 λ 1 θ 1 + c + v j 1 2 θ 1 2 sinh λ 1 m 1 / 2 ,
M D c ( θ 2 ) = exp c 2 m cosh λ 2 m 1 λ 2 c + 2 v j 1 2 θ 2 sinh λ 2 m 1 / 2 ,
where v j 1 2 = ( exp ( 2 ( j 1 ) c / m ) 1 ) / ( 2 c ) . Taking the limit as c 0 results in:
M 0 ( θ 1 , θ 2 ) = exp θ 1 2 m cosh λ 0 m 1 λ 0 θ 1 + ( j 1 ) m θ 1 2 + 2 θ 2 sinh λ 0 m 1 / 2 , M N 0 ( θ 1 ) = exp θ 1 2 m 1 θ 1 m ( j 1 ) θ 1 2 m 2 1 / 2 , M D 0 ( θ 2 ) = cosh λ 0 m 2 ( j 1 ) θ 2 m λ 0 sinh λ 0 m 1 / 2 .
The results in Part (a) of the corollary are relevant in the full-sample case, and the result for M 0 ( θ 1 , θ 2 ) goes back to White (1958). The results in Part (b) of the corollary are pertinent to the sub-sampling issues being investigated here in the case of a near-unit root, with the unit root ( c = 0 ) result for M 0 ( θ 1 , θ 2 ) having been first derived by Chambers and Kyriacou (2013).
It is also possible to use the above results to explore the relationship between the sub-sample distributions and the full-sample distribution. For example, it is possible to show that M N c / m ( θ 1 / m ) on [ 0 , 1 ] is equal to M N c ( θ 1 ) for j = 1 in the sub-samples, while M D c / m ( θ 2 / m 2 ) on [ 0 , 1 ] is equal to M D c ( θ 2 ) for j = 1 in the sub-samples; an implication of this is that:
0 1 / m J c d W = d 1 m 0 1 J c / m d W , 0 1 / m J c 2 = d 1 m 2 0 1 J c / m 2 .
Furthermore, this implies that the limit distribution of the first sub-sample estimator, ( ρ ^ 1 ρ ) , when ρ = e c / n = e c / m , is the same as that of the full-sample estimator, n ( ρ ^ ρ ) , when ρ = e c / m n .
The sub-sample results with a near-unit root can be related to the full-sample results of Phillips (1987a). For example, the MGF in Theorem 3 has the equivalent representation:
M c ( θ 1 , θ 2 ) = 1 2 exp ( θ 1 + c ) ( b a ) λ 1 ( 2 λ + δ ) ( 1 + δ v 2 ) exp ( z ) 2 λ δ v 2 + δ ( 1 + δ v 2 ) exp ( z )
where λ and v 2 are defined in the theorem, z = ( b a ) λ and δ = θ 1 + c λ . When a = 0 , b = 1 , it follows that v 2 = 0 , and the above expression nests the MGF in Phillips (1987a), i.e.,
M c ( θ 1 , θ 2 ) = 1 2 exp ( θ 1 + c ) ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 × θ 1 + c + ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 exp ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 θ 1 + c ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 exp ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 ;
this follows straightforwardly from (14). It is also of interest to examine what happens when the local-to-unity parameter c , as in Phillips (1987a) and other recent work on autoregression, e.g., Phillips (2012). We present the results in Theorem 4 below.
Theorem 4.
Let J c ( r ) denote an Ornstein–Uhlenbeck process on r [ 0 , b ] with parameter c, and let 0 a < b . Furthermore, define the functional:
K ( c ) = g ( c ) 1 / 2 a b J c 2 1 a b J c d W + 1 2 ( 1 η ) ,
where η = σ u 2 / σ 2 and:
g ( c ) = E a b J c 2 = 1 4 c 2 exp ( 2 b c ) exp ( 2 a c ) + 1 2 c ( b a ) .
Then, as c :
(a) 
( 2 c ) 1 / 2 a b J c d W N ( 0 , ( b a ) ) ;
(b) 
( 2 c ) a b J c 2 p ( b a ) ;
(c) 
K ( c ) N ( 0 , 1 ) if σ u 2 = σ 2 (and hence η = 1 ) and diverges otherwise.
The functional K ( c ) in Theorem 4 represents the limit distribution of the normalised estimator g ( c ) 1 / 2 ( ρ ^ a , b ρ ) , where denotes the number of observations in the sub-sample [ b n + 1 , b n ] (so that a = b ( 1 / m ) in this case) and ρ ^ a , b is the corresponding estimator. However, as pointed out by Phillips (1987a), the sequential limits (large sample for fixed c, followed by c ) are only indicative of the results one might expect in the stationary case and do not constitute a rigorous demonstration. The results in Theorem 4 also encompass the related results in Phillips (1987a) obtained when a = 0 and b = 1 .

4. An Optimal Jackknife Estimator

The discussion following Theorem 2 indicates that the weights defining an optimal jackknife estimator, which removes first-order bias in the local-to-unity setting, depend on the quantities:
μ c ( η ) = E ( Z c ( η ) ) = μ c ( 1 ) + λ 1 ( η ) E 1 D c , μ c , j ( η ) = E ( Z c , j ( η ) ) = μ c , j ( 1 ) + λ 2 ( η , m ) E 1 D c , j , j = 1 , , m ,
where λ 1 ( η ) = ( 1 η ) / 2 , λ 2 ( η , m ) = λ 1 ( η ) / m 2 and:
D c = 0 1 J c 2 , D c , j = ( j 1 ) / m j / m J c 2 , j = 1 , , m .
In particular, Part (c) of Theorem 3 can be used to evaluate the quantities:
μ c ( 1 ) = E N c D c , μ c , j ( 1 ) = E N c , j m D c , j , j = 1 , , m ,
where we have defined:
N c = 0 1 J c d W , N c , j = ( j 1 ) / m j / m J c d W , j = 1 , , m .
The relevant MGFs for evaluating μ c ( 1 ) and μ c , j ( 1 ) are given in the corollary to Theorem 3.
Table 1 contains the values of μ c , j ( 1 ) for values of m and c as follows: m = { 1 , 2 , 3 , 4 , 6 , 8 , 12 } and c = { 50 , 20 , 10 , 5 , 1 , 0 , 1 } .2 The entries for m = 1 correspond to μ c ( 1 ) = E ( Z c ( 1 ) ) in view of the distributional equivalence of Z c ( 1 ) and Z c , 1 ( 1 ) discussed following the corollary. For a given combination of j and m, it can be seen that the expectations increase as c increases, while for given c and j the expectations increase with m. A simple explanation for the different properties of the sub-samples beyond j = 1 is that the initial values are of the same order of magnitude as the partial sums of the innovations. The values of the sub-sample expectations when c = 0 are seen from Table 1 to be independent of m and to increase with j. Note that μ 0 , 1 ( 1 ) = 1.7814 corresponds to the expected value of the limit distribution of the full-sample estimator ρ ^ under a unit root; see, for example, (10) and the associated commentary. The values of μ 0 , j ( η ) can be used to define jackknife weights under a unit root for different values of m; see, for example, Chambers and Kyriacou (2013). More generally, the values of μ c , j ( η ) can be used to define optimal weights for the jackknife estimator that achieve the aim of first-order bias removal in the presence of a near-unit root. The result is presented in Theorem 5.
Theorem 5.
Let μ ¯ c ( η ) = μ c ( η ) j = 1 m μ c , j ( η ) . Then, under Assumption 1, an optimal jackknife estimator is given by:
ρ ^ J = w 1 , c ( η ) ρ ^ + w 2 , c ( η ) 1 m j = 1 m ρ ^ j ,
where w 1 , c ( η ) = j = 1 m μ c , j ( η ) / μ ¯ c ( η ) and w 2 , c ( η ) = 1 w 1 , c ( η ) = μ c ( η ) / μ ¯ c ( η ) .
Theorem 5 shows the form of the optimal weights for the jackknife estimator when the process (1) has a near-unit root. It can be seen that the weights depend not only on the value of c, but also on the value of η , both of which are unknown in practice. The authors in Chambers and Kyriacou (2013) and Chen and Yu (2015) have emphasised the case c = 0 and η = 1 and have reported simulation results highlighting the good bias-reduction properties of appropriate jackknife estimators in that case. When c 0 and η = 1 , the optimal weights in Theorem 5 simplify to:
w 1 , c ( 1 ) = j = 1 m μ c , j ( 1 ) / μ ¯ c ( 1 ) , w 2 , c ( 1 ) = μ c ( 1 ) / μ ¯ c ( 1 ) .
The values of μ c , j ( 1 ) in Table 1 can be utilised to derive these optimal weights for the jackknife estimator in this case; these are reported in Table 2 for the values of m and c used in Table 1, along with the values of the standard weights that are applicable in stationary autoregressions. The entries in Table 1 show that the optimal weights are larger in (absolute) value than the standard weights that would apply if all the sub-sample distributions were the same and that they increase with c for given m. The optimal weights also converge towards the standard weights as c becomes more negative; this could presumably be demonstrated analytically using the properties of the MGF in constructing the μ c , j ( 1 ) by examining the appropriate limits as c , although we do not pursue such an investigation here.
The relationship between the optimal weights when η = 1 and when η 1 is not straightforward. Noting that:
μ ¯ c ( η ) = μ ¯ c ( 1 ) + λ 1 ( η ) E 1 D c λ 2 ( η , m ) j = 1 m E 1 D c , j
and that:
j = 1 m μ c , j ( η ) = j = 1 m μ c , j ( 1 ) + λ 2 ( η , m ) j = 1 m E 1 D c , j
we find that:
w 1 , c ( η ) = j = 1 m μ c , j ( 1 ) + λ 2 ( η , m ) j = 1 m E 1 D c , j μ ¯ c ( 1 ) + λ 1 ( η ) E 1 D c λ 2 ( η , m ) j = 1 m E 1 D c , j .
This expression can be manipulated to write w 1 , c ( η ) explicitly in terms of w 1 , c ( 1 ) as follows:
w 1 , c ( η ) = w 1 , c ( 1 ) λ 2 ( η , m ) μ ¯ c ( 1 ) j = 1 m E 1 D c , j 1 + λ 1 ( η ) μ ¯ c ( 1 ) E 1 D c λ 2 ( η , m ) μ ¯ c ( 1 ) j = 1 m E 1 D c , j .
The second weight is obtained simply as w 2 , c ( η ) = 1 w 1 , c ( η ) . In situations where η 1 , which essentially reflects cases where u t does not have a white noise structure, the optimal weights can be obtained from the entries in Table 2 (at least for the relevant values of c) using (15), but knowledge is still required not only of c and η , but also the expectations of the inverses of D c and D c , j . The latter can be computed numerically; Equation (2.3) of Meng (2005) shows that:
E 1 D c = 0 M D c ( θ 2 ) d θ 2 , E 1 D c , j = 0 M D c , j ( θ 2 ) d θ 2 , j = 1 , , m ,
where M D c ( θ 2 ) and M D c , j ( θ 2 ) are the MGFs of D c and D c , j , respectively, which can be obtained from the corollary to Theorem 3.
In practice, however, the values of c and η are still required in order to construct the optimal estimator. Although the localization parameter c from the model defined under Assumption 1 is identifiable, it is not possible to estimate it consistently, and attempts to do so require a completely different formulation of the model; see, for example, Phillips et al. (2001), who propose a block local-to-unity framework to consistently estimate c, although this approach does not appear to have been pursued subsequently. Furthermore η depends on an estimator of the long-run variance σ 2 , which is a notoriously difficult quantity to estimate in finite samples. In view of these unresolved challenges and following earlier work on jackknife estimation of autoregressive models with a (near-)unit root, we focus on the case η = 1 , but allow c 0 with particular attention paid to unit root and locally-stationary processes, i.e., c 0 .
Our simulations examine the performance of five estimators3 of the parameter ρ = e c / n . The baseline estimator is the OLS estimator in (4), the bias of which the jackknife estimators aim to reduce. Three jackknife estimators with the generic form:
ρ ^ J = w 1 ρ ^ + w 2 1 m j = 1 m ρ ^ j
are also considered, each differing in the choice of weights w 1 ; in all cases, w 2 = 1 w 1 . The standard jackknife sets w 1 = m / ( m 1 ) ; the optimal jackknife sets w 1 = w 1 , c ; and the unit root jackknife sets w 1 = w 1 , 0 . The standard jackknife removes fully the first-order bias in stationary autoregressions, but does not do so in the near-unit root framework, in which the optimal estimator achieves this goal. However, the optimal estimator is infeasible because it relies on the unknown parameter c.4 We therefore also consider the feasible unit root jackknife obtained by setting c = 0 . In addition, we consider the jackknife estimator of Chen and Yu (2015), which is of the form:
ρ ^ J C Y = w 1 ρ ^ + j = 1 m w 2 , j ρ ^ j .
The weights are chosen so as to minimise the variance of the estimator in addition to providing bias reduction in the case c = 0 . Because the choice of weights is a more complex problem for this type of jackknife estimator, Chen and Yu only provide results for the cases m = 2 and m = 3 , in which case the weights are w 1 = 2 . 8390 , w 2 , 1 = 0.6771 , w 2 , 2 = 1.1619 and w 1 = 2.0260 , w 2 , 1 = 0.2087 , w 2 , 2 = 0.3376 , w 2 , 3 = 0.4797 , respectively; see Table 1 of Chen and Yu (2015).
Table 3 reports the bias of the five estimators obtained from 100,000 replications of the model in Assumption 1 with u t i i d N ( 0 , 1 ) and y 0 = 0 using m = 2 for each of the jackknife estimators; this value has been found to provide particularly good bias reduction in a number of studies, including Phillips and Yu (2005), Chambers (2013), Chambers and Kyriacou (2013) and Chen and Yu (2015). The particular values of c are c = { 10 , 5 , 1 , 0 } , which focus on the pure unit root case, as well as locally stationary processes, and four sample sizes are considered, these being n = 24, 48, 96 and 192. The corresponding values of ρ are: { 0.5833 , 0.7917 , 0.8958 , 0.9479 }   ( c = 10 ) ; { 0.7917 , 0.8958 , 0.9479 , 0.9740 }   ( c = 5 ) ; { 0.9583 , 0.9792 , 0.9896 , 0.9948 }   ( c = 1 ) ; and ρ = 1 for all values of n when c = 0 . The values of ρ when c = 10 are some way from unity for the smaller sample sizes, which suggests that the standard jackknife might perform well in these cases.
The value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold in Table 3. The results show the substantial reduction in bias that can be achieved with jackknife estimators, the superiority of the optimal estimator being apparent as c becomes more negative, although the unit root jackknife also performs well in terms of bias reduction.
Table 4 contains the corresponding RMSE values for the jackknife estimators using m = 2 , as well as the RMSE corresponding to the RMSE-minimising values of m, which are typically larger than m = 2 and are also reported in the table. The RMSE value of the estimator producing the minimum RMSE for each c and n is highlighted in bold. In fact, the optimal jackknife estimator, although constructed to eliminate first order bias, manages to reduce the OLS estimator’s RMSE and outperforms the Chen and Yu (2015) jackknife estimator in both bias and RMSE reduction, although the latter occurs at a larger number of sub-samples. The results show that use of larger values of m tends to produce smaller RMSE than when m = 2 , and again, the optimal jackknife performs particularly well when c becomes more negative. The performance of the unit root jackknife is also impressive, suggesting that it is a feasible alternative to the optimal estimator when the value of c is unknown.
Although in itself important, bias is not the only feature of a distribution that is of interest, and hence, the RMSE values in Table 4 should also be taken into account when assessing the performance of the estimators. The substantial bias reductions obtained with the bias-minimising value of m = 2 come at the cost of a larger variance that ultimately feeds through into a larger RMSE compared with the OLS estimator ρ ^ . This can be offset, however, by using the larger RMSE-minimising values of m that, despite having a larger bias than when m = 2 , are nevertheless able to reduce the variance sufficiently to result in a smaller RMSE than ρ ^ .
In order to assess the robustness of the jackknife estimators, some additional bias results are presented in Table 5 that correspond to values of η < 1 , while the estimators are based on the assumption that η = 1 , as in the preceding simulations.5 The results correspond to two different specifications for u t that enable data to be generated that are consistent with different values of η . The first specifies u t to be a first-order moving average (MA(1)) process, so that u t = ϵ t + θ ϵ t 1 where ϵ t i i d N ( 0 , 1 ) ; in this case η = ( 1 + θ 2 ) / ( 1 + θ ) 2 . The second specification is a first-order autoregressive (AR(1)) process of the form u t = ϕ u t 1 + ϵ t , in which case η = ( 1 ϕ ) 2 / ( 1 ϕ 2 ) . In the MA(1) case, we have chosen θ = 0 . 5 in order to give an intermediate value of η = 0.5556 , while in the AR(1) case, we have chosen ϕ = 0.9 to give a small value of η = 0.0526 . As in Table 3, the value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold. Table 5 shows, in the MA case, that the jackknife estimators are able to reduce bias when c = 0 , but none of them is able to do so when c = 1 or c = 5 . In the AR case, with a smaller value of η , the jackknife estimators are still able to deliver bias reduction, albeit to a lesser extent than when η = 1 , and it is the unit root jackknife of Chambers and Kyriacou (2013) that achieves the greatest bias reduction in this case. These results are indicative of the importance of knowing η and suggest that developing methods to allow for η 1 is important from an empirical viewpoint.

5. Conclusions

This paper has analysed the specification and performance of jackknife estimators of the autoregressive coefficient in a model with a near-unit root. The limit distributions of sub-sample estimators that are used in the construction of the jackknife estimator are derived, and the joint MGF of two components of that distribution is obtained and its properties explored. The MGF can then be used to derive the weights for an optimal jackknife estimator that removes fully the first-order finite sample bias from the OLS estimator. The resulting jackknife estimator is shown to perform well at finite sample bias reduction in a simulation study and, with a suitable choice of the number of sub-samples, is shown to be able to reduce the overall finite sample RMSE, as well.
The theoretical findings in Section 3 and Section 4 show how first-order approximations on sub-sample estimators can be used along with the well-known full-sample results of Phillips (1987a) for finite-sample refinements. The jackknife uses analytical (rather than simulation-based) results to achieve bias reduction at minimal computational cost along the same lines as indirect inference methods based on analytical approximations in Phillips (2012) and Kyriacou et al. (2017). Apart from computational simplicity, an evident advantage of analytical-based methods over simulation-based alternatives such as bootstrap or (traditional, simulation-based) indirect inference methods is that they require no distributional assumptions on the error term.
Despite its success in achieving substantial bias reduction in finite samples, as shown in the simulations, a shortcoming of the jackknife estimator, and an impediment to its use in practice, is the dependence of the optimal weights on the unknown near-unit root parameter, as well as on a quantity related to the long-run variance of the disturbances.6 However, our theoretical results in Section 3 and Section 4 reveal precisely how these quantities affect the optimal weights and therefore can, in principle, be used to guide further research into the development of a feasible data-driven version of the jackknife within this framework. Such further work is potentially useful in view of the simulations in Table 3 and Table 4 highlighting that (feasible) jackknife estimators are an effective bias and RMSE reduction tool in a local unit root setting, even if they do not fully remove first order bias. Moreover, the results obtained in Theorems 1–4 can be utilised in a wide range of sub-sampling situations outside that of jackknife estimation itself.
The results in this paper could be utilised and extended in a number of directions. An obvious application would be in the use of jackknife estimators as the basis for developing unit root test statistics, the local-to-unity framework being particularly well suited to the analysis of the power functions of such tests. It would also be possible to develop, fully, a variance-minimising jackknife estimator along the lines of Chen and Yu (2015) who derived analytic results for c = 0 and m = 2 or 3, although extending their approach to arbitrary c and m represents a challenging task. However, considerable progress has been made in this direction by Stoykov (2017), who builds upon our results and also proposes a two-step jackknife estimator that incorporates an estimate of c to determine the jackknife weights. The estimation model could also be extended to include an intercept and/or a time trend. The presence of an intercept will affect the limit distributions by replacing the Ornstein–Uhlenbeck processes by demeaned versions thereof, which will also have an effect on the finite sample biases. Such effects have been investigated by Stoykov (2017), who shows that substantial reductions in bias can still be achieved by jackknife methods. Applications of jackknife methods in multivariate time series settings are also possible, a recent example being Chambers (2015) in the case of a cointegrated system, but other multivariate possibilities could be envisaged.

Acknowledgments

We thank the Editors, two anonymous referees and Marian Stoykov for helpful comments on this piece of work. The initial part of the first author’s research was funded by the Economic and Social Research Council under Grant Number RES-000-22-3082.

Author Contributions

Both authors contributed equally to the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A.

Proof of Theorem 1.
The proofs of Parts (a) and (b) rely on the solution to the stochastic difference equation generating y t , which is given by:
y t = j = 1 t e c ( t j ) / n u j + e c t / n y 0 .
The normalised partial sums of u t , S t = j = 1 t u j , are also important, as is the functional:
X n ( r ) = 1 n S [ n r ] = 1 n S j 1 , j 1 n r < j n .
Under the conditions on u t , it follows that X n ( r ) σ W ( r ) as n . Taking each part in turn:
(a) In view of (A1) and (A2), the object of interest can be written:
1 2 t τ j y t 2 = 1 2 t τ j j = 1 t e c ( t j ) / n u j + e c t / n y 0 2 = 1 2 t τ j j = 1 t e c ( t j ) / n u j 2 + 2 e c t / n y 0 j = 1 t e c ( t j ) / n u j + e 2 c t / n y 0 2 = n 2 2 t τ j ( t 1 ) / n t / n j = 1 t e c ( t j ) / n ( j 1 ) / n j / n d X n ( s ) 2 d r + o p ( 1 ) = m 2 ( j 1 ) / m j / m 0 r e c ( r s ) d X n ( s ) 2 d r + o p ( 1 ) σ 2 m 2 ( j 1 ) / m j / m J c 2
where, in the penultimate line, we note that j / n = j / m and ( j 1 ) / n = ( j 1 ) / m to give the limits of the outer integral.
(b) Squaring the difference equation for y t , summing over t τ j and noting that e 2 c / n = 1 + ( 2 c / n ) + O ( n 2 ) , we obtain:
1 t τ j y t 2 = e 2 c / n t τ j y t 1 2 + 2 e c / n t τ j y t 1 u t + 1 t τ j u t 2 = 1 t τ j y t 1 2 + 2 c n t τ j y t 1 2 + 2 t τ j y t 1 u t + 1 t τ j u t 2 + o p ( 1 ) .
Solving for the quantity of interest yields:
1 t τ j y t 1 u t = 1 2 1 t τ j y t 2 1 t τ j y t 1 2 2 c n t τ j y t 1 2 1 t τ j u t 2 + o p ( 1 ) = 1 2 1 y j 2 1 y ( j 1 ) 2 2 c m 2 t τ j y t 1 2 1 t τ j u t 2 + o p ( 1 ) .
Now, as n ,
1 y j = m n y j σ m J c j m , 1 y ( j 1 ) = m n y ( j 1 ) σ m J c j 1 m ,
noting that j = ( j / m ) n and ( j 1 ) = ( ( j 1 ) / m ) n . It follows that:
1 t τ j y t 1 u t 1 2 σ 2 m J c j m 2 σ 2 m J c j 1 m 2 2 c m σ 2 m 2 ( j 1 ) / m j / m J c 2 σ u 2 2 = σ 2 m 2 J c j m 2 J c j 1 m 2 2 c ( j 1 ) / m j / m J c 2 σ u 2 2 .
Using the Itô calculus (see, for example, Tanaka (1996, p. 58), we obtain the following stochastic differential equation for J c ( t ) 2 :
d [ J c ( t ) 2 ] = 2 J c ( t ) d J c ( t ) + d t ;
substituting d J c ( t ) = c J c ( t ) d t + d W ( t ) then yields:
d [ J c ( t ) 2 ] = 2 J c ( t ) d W ( t ) + 1 + 2 c J c ( t ) 2 d t .
Integrating the above over [ ( j 1 ) / m , j / m ] , we find that:
J c j m 2 J c j 1 m 2 = 2 ( j 1 ) / m j / m J c d W + 2 c ( j 1 ) / m j / m J c 2 + 1 m ,
and hence, we obtain:
1 t τ j y t 1 u t σ 2 m 2 2 ( j 1 ) / m j / m J c d W + 1 m σ u 2 2 = σ 2 m ( j 1 ) / m j / m J c d W + 1 2 ( σ 2 σ u 2 )
as required.
(c) The result follows immediately from Parts (a) and (b) in view of (7). ☐
Proof of Theorem 2.
Proceeding as in the proof of Theorem 1, but retaining higher order terms, we find that:
1 2 t τ j y t 2 = n 2 2 t τ j ( t 1 ) / n t / n j = 1 t e c ( t j ) / n ( j 1 ) / n j / n d X n ( s ) 2 d r + 2 n 3 / 2 y 0 2 t τ j ( t 1 ) / n t / n e c t / n j = 1 t e c ( t j ) / n ( j 1 ) / n j / n d X n ( s ) d r + n 2 t τ j ( t 1 ) / n t / n e 2 c t / n d r y 0 2 = m 2 ( j 1 ) / m j / m 0 r e c ( r s ) d X n ( s ) 2 d r + 2 m 3 / 2 y 0 ( j 1 ) / m j / m e c r 0 r e c ( r s ) d X n ( s ) d r + m 2 ( j 1 ) / m j / m e 2 c r d r y 0 2 = d σ 2 m 2 ( j 1 ) / m j / m J c 2 + 2 σ m 3 / 2 y 0 ( j 1 ) / m j / m e c r J c + O p 1 .
Next, as before, we have:
1 t τ j y t 1 u t = 1 2 1 y j 2 1 y ( j 1 ) 2 2 c m 2 t τ j y t 1 2 1 t τ j u t 2 + O p 1 .
Now:
1 y j 2 = d σ m J c j m + m e c j / m y 0 2 + O p 1 = d σ 2 m J c j m 2 + 2 σ m e c j / m y 0 J c j m + O p 1 ;
a similar result holds for ( y ( j 1 ) / ) 2 . Furthermore,
1 t τ j u t 2 = 1 1 t τ j ( u t 2 σ u 2 ) + σ u 2 = d v f ξ 2 j + σ u 2 + O p 1
where ξ 2 j N ( 0 , 1 ) ( j = 1 , , m ) . Combining with the result for ( 1 / 2 ) t τ j y t 2 , we find that:
1 t τ j y t 1 u t = d 1 2 σ 2 m J c j m 2 J c j 1 m 2 2 c ( j 1 ) / m j / m J c 2 1 m + σ 2 + 2 σ m y 0 e c j / m J c j m e c ( j 1 ) / m J c j 1 m 4 c σ m y 0 ( j 1 ) / m j / m e c r J c 1 ξ 2 j σ u 2 + O p 1 = d σ 2 m ( j 1 ) / m j / m J c d W + 1 2 ( σ 2 σ u 2 ) + 2 σ m y 0 ξ 1 j v f ξ 2 j + O p 1
where ξ 1 j   ( j = 1 , , m ) is defined by:
ξ 1 j = e c j / m J c j m e c ( j 1 ) / m J c j 1 m 2 c m ( j 1 ) / m j / m e c r J c .
The stated distribution of ξ 1 j then follows using the property that:
E J c ( r ) J c ( s ) = e c ( r + s ) e c ( max ( r , s ) min ( r , s ) ) 2 c
to calculate the variances and covariances; see Perron (1991, p. 234). In particular:
E e c j / m J c j m e c ( j 1 ) / m J c j 1 m 2 = 1 2 c ( e 2 c j / m e 2 c ( j 1 ) / m ) 2 + e 2 c j / m e 2 c ( j 1 ) / m ,
E ( j 1 ) / m j / m e c r J c 2 = ( e 2 c j / m e 2 c ( j 1 ) / m ) 2 8 c 3 ( e 2 c j / m e 2 c ( j 1 ) / m ) 4 c 3 + e 2 c j / m 2 m c 2 ,
E e c j / m J c j m e c ( j 1 ) / m J c j 1 m ( j 1 ) / m j / m e c r J c = ( e 2 c j / m e 2 c ( j 1 ) / m ) 2 4 c 2 + ( e 2 c j / m e 2 c ( j 1 ) / m ) 4 c 2 e 2 c j / m 2 m c ,
which combine to determine s 2 . The result for ( ρ ^ j ρ ) follows from the above results. ☐
Proof of Theorem 3.
(a) The aim is to derive the joint MGF:
M c ( θ 1 , θ 2 ) = E   exp θ 1 a b J c d W + θ 2 a b J c 2 .
We begin by noting that:
a b J c d W = 1 2 J c ( b ) 2 J c ( a ) 2 2 c a b J c 2 ( b a )
so that the function of interest becomes:
M c ( θ 1 , θ 2 ) = exp θ 1 ( b a ) 2 E   exp θ 1 2 J c ( b ) 2 J c ( a ) 2 + ( θ 2 c θ 1 ) a b J c 2 .
Evaluation of this expectation is aided by introducing the auxiliary O-U process Y ( t ) on t [ 0 , b ] with parameter λ , defined by:
d Y ( t ) = λ Y ( t ) d t + d W ( t ) , Y ( 0 ) = 0 .
Let μ J c and μ Y denote the probability measures induced by J c and Y, respectively. These measures are equivalent and, by Girsanov’s theorem (see, for example, Theorem 4.1 of Tanaka 1996),
d μ J c d μ Y ( s ) = exp ( c λ ) 0 b s ( t ) d s ( t ) ( c 2 λ 2 ) 2 0 b s ( t ) 2 d t
is the Radon–Nikodym derivative evaluated at s ( t ) , a random process on [ 0 , b ] with s ( 0 ) = 0 . The above change of measure will be used because, for a function f ( J c ) ,
E f ( J c ) = E f ( Y ) d μ J c d μ Y ( Y ) .
Using the change of measure, we obtain:
M c ( θ 1 , θ 2 ) = exp θ 1 ( b a ) 2 E   exp θ 1 2 Y ( b ) 2 Y ( a ) 2 + ( θ 2 c θ 1 ) a b Y 2 + ( c λ ) 0 b Y d Y ( c 2 λ 2 ) 2 0 b Y 2 .
Now, using the Itô calculus, 0 b Y d Y = ( 1 / 2 ) [ Y ( b ) 2 b ] , and so:
θ 1 2 Y ( b ) 2 Y ( a ) 2 + ( c λ ) 0 b Y d Y = ( θ 1 + c λ ) 2 Y ( b ) 2 θ 1 2 Y ( a ) 2 ( c λ ) 2 b ,
while splitting the second integral involving Y 2 yields:
( θ 2 c θ 1 ) a b Y 2 ( c 2 λ 2 ) 2 0 b Y 2 = ( λ 2 c 2 2 c θ 1 + 2 θ 2 ) 2 a b Y 2 ( c 2 λ 2 ) 2 0 a Y 2 .
Hence, defining δ = θ 1 + c λ ,
M c ( θ 1 , θ 2 ) = exp θ 1 a δ b 2 E   exp δ 2 Y ( b ) 2 θ 1 2 Y ( a ) 2 + ( λ 2 c 2 2 c θ 1 + 2 θ 2 ) 2 a b Y 2 ( c 2 λ 2 ) 2 0 a Y 2 .
As the parameter λ is arbitrary, it is convenient to set λ = ( c 2 + 2 c θ 1 2 θ 2 ) 1 / 2 so as to eliminate the term a b Y 2 . We shall then proceed in two steps:
(i)
Take the expectation in M c ( θ 1 , θ 2 ) conditional on F 0 a , the sigma field generated by W on [ 0 , a ] .
(ii)
Introduce another O-U process V and apply Girsanov’s theorem again to take the expectation with respect to F 0 a .
Step (i). Conditional on F 0 a , we obtain:
M c ( θ 1 , θ 2 ; F 0 a ) = exp θ 1 a δ b 2 exp θ 1 2 Y ( a ) 2 ( c 2 λ 2 ) 2 0 a Y 2 E exp δ 2 Y ( b ) 2 F 0 a .
Now, from the representation Y ( b ) = exp ( ( b a ) λ ) Y ( a ) + a b exp ( ( b r ) λ ) d W ( r ) , it follows that Y ( b ) | F 0 a N ( μ , ω 2 ) , where:
μ = E Y ( b ) | F 0 a = exp ( ( b a ) λ ) Y ( a ) ,
ω 2 = E Y ( b ) E Y ( b ) | F 0 a 2 | F 0 a = exp ( 2 ( b a ) λ ) 1 2 λ .
Hence, using Lemma 5 of Magnus (1986), for example,
E exp δ 2 Y ( b ) 2 F 0 a = exp δ 2 k Y ( a ) 2 1 δ ω 2 1 / 2 ,
where k = exp ( 2 ( b a ) λ ) / ( 1 δ ω 2 ) , and so:
M c ( θ 1 , θ 2 ; F 0 a ) = exp θ 1 a δ b 2 1 δ ω 2 1 / 2 exp δ k θ 1 2 Y ( a ) 2 ( c 2 λ 2 ) 2 0 a Y 2 .
Step (ii). We now introduce a new auxiliary process, V ( t ) , on [ 0 , a ] , given by:
d V ( t ) = η V ( t ) d t + d W ( t ) , V ( 0 ) = 0 ,
and will make use of the change of measure:
d μ Y d μ V ( s ) = exp ( λ η ) 0 a s ( t ) d s ( t ) ( λ 2 η 2 ) 2 0 a s ( t ) 2 d t
in order to eliminate 0 a Y 2 . We have M c ( θ 1 , θ 2 ) = E M c ( θ 1 , θ 2 ; F 0 a ) , and so:
M c ( θ 1 , θ 2 ) = exp θ 1 a δ b 2 1 δ ω 2 1 / 2 E   exp δ k θ 1 2 Y ( a ) 2 ( c 2 λ 2 ) 2 0 a Y 2 .
With the change of measure, the expectation of interest becomes:
E   exp δ k θ 1 2 V ( a ) 2 + ( λ η ) 0 a V d V + η 2 c 2 2 0 a V 2 .
However, η is arbitrary, and so, we set η = c in order to eliminate 0 a V 2 . Furthermore, noting that 0 a V d V = ( 1 / 2 ) [ V ( a ) 2 a ] , we obtain:
E   exp δ k θ 1 2 V ( a ) 2 + ( λ c ) 0 a V d V = exp ( λ c ) 2 a E   exp δ ( k 1 ) 2 V ( a ) 2 .
Now, V ( a ) = 0 a e c ( a r ) d W ( r ) , and so, V ( a ) N ( 0 , v 2 ) where v 2 = ( e 2 a c 1 ) / ( 2 c ) , hence:
E   exp δ ( k 1 ) 2 V ( a ) 2 = 1 δ ( k 1 ) v 2 1 / 2 .
It follows that M c ( θ 1 , θ 2 ) = exp ( ( θ 1 + c ) ( b a ) / 2 ) H c ( θ 1 , θ 2 ) 1 / 2 where:
H c ( θ 1 , θ 2 ) = exp ( ( b a ) λ ) ( 1 δ ω 2 ) ( 1 δ ( k 1 ) v 2 ) .
Let z = ( b a ) λ . Then:
e z ( 1 δ ω 2 ) = e z δ e z ( e 2 z 1 ) 2 λ = e z θ 1 + c λ 1 ( e z e z ) 2 = ( e z + e z ) 2 ( θ 1 + c ) λ ( e z e z ) 2 = cosh z ( θ 1 + c ) λ sinh z .
The second term involves the expression ( k 1 ) ( 1 δ ω 2 ) = e 2 z 1 + δ ω 2 , and so, we obtain:
e z ( k 1 ) ( 1 δ ω 2 ) = e z e z + δ e z ( e 2 z 1 ) 2 λ = e z e z + θ 1 + c λ 1 ( e z e z ) 2 = 1 + θ 1 + c λ ( e z e z ) 2 = 1 λ λ + θ 1 + c sinh z .
Noting that δ ( θ 1 + c + λ ) = ( θ 1 + c λ ) ( θ 1 + c + λ ) = ( θ 1 + c ) 2 λ 2 = θ 1 2 + 2 θ 2 and combining these components yields the required expression for H c ( θ 1 , θ 2 ) .
(b) The individual MGFs follow straightforwardly from Part (a), noting that M N c ( θ 1 ) = M c ( θ 1 , 0 ) and M D c ( θ 2 ) = M c ( 0 , θ 2 ) .
(c) From the definition of M c ( θ 1 , θ 2 ) , we obtain:
M c ( θ 1 , θ 2 ) θ 1 = ( b a ) 2 exp ( θ 1 + c ) 2 ( b a ) H c ( θ 1 , θ 2 ) 1 / 2 1 2 exp ( θ 1 + c ) 2 ( b a ) H c ( θ 1 , θ 2 ) 3 / 2 H c ( θ 1 , θ 2 ) θ 1 .
Partial differentiation of H c ( θ 1 , θ 2 ) yields:
H c ( θ 1 , θ 2 ) θ 1 = c ( b a ) sinh ( b a ) λ λ + c θ 1 + c + v 2 ( θ 1 2 + 2 θ 2 ) sinh ( b a ) λ λ 3 1 + 2 θ 1 v 2 sinh ( b a ) λ λ c ( b a ) θ 1 + c + v 2 ( θ 1 2 + 2 θ 2 ) cosh ( b a ) λ λ 2 ,
which makes use of the results:
cosh ( b a ) λ θ 1 = c ( b a ) sinh ( b a ) λ λ , sinh ( b a ) λ θ 1 = c ( b a ) cosh ( b a ) λ λ .
We need to evaluate H c ( θ 1 , θ 2 ) / θ 1 at θ 1 = 0 and at θ 2 , and this is facilitated by defining x = ( c 2 + 2 θ 2 ) 1 / 2 to replace λ ; this results in:
H c ( θ 1 , θ 2 ) θ 1 θ 1 = 0 = c ( b a ) 1 sinh ( b a ) x x + c c 2 v 2 θ 2 sinh ( b a ) x x 3 c ( b a ) c 2 v 2 θ 2 cosh ( b a ) x x 2 .
It is also convenient to define:
g ( x ) = H c ( 0 , θ 2 ) = cosh ( b a ) x c 2 v 2 θ 2 sinh ( b a ) x x .
Combining the results above yields:
M c ( θ 1 , θ 2 ) θ 1 θ 1 = 0 = ( b a ) 2 exp c ( b a ) 2 g ( x ) 1 / 2 1 2 exp c ( b a ) 2 g ( x ) 3 / 2 c ( b a ) 1 sinh ( b a ) x x + c c 2 v 2 θ 2 sinh ( b a ) x x 3 c ( b a ) c 2 v 2 θ 2 cosh ( b a ) x x 2 .
Integrating with respect to θ 2 yields the result in the theorem.
Proof of Corollary to Theorem 3.
The results follow from Theorem 3 noting that:
(a)
b a = 1 and v 2 = 0 ;
(b)
b a = 1 / m and lim c 0 v j 1 2 = ( j 1 ) / m .
Derivation of (14). From (A3), we can write:
M c ( θ 1 , θ 2 ) = exp ( ( θ 1 + c ) ( b a ) ) exp ( z ) ( 1 δ ω 2 ) ( 1 δ ( k 1 ) v 2 ) 1 / 2 ,
where z = ( b a ) λ . It can be shown that:
1 δ ( k 1 ) v 2 = 1 δ ω 2 δ v 2 [ exp ( 2 z ) ( 1 δ ω 2 ) ] 1 δ ω 2
so that:
( 1 δ ω 2 ) ( 1 δ ( k 1 ) v 2 ) = ( 1 + δ v 2 ) ( 1 δ ω 2 ) δ v 2 exp ( 2 z ) .
Multiplying by exp ( z ) and noting that:
exp ( z ) ( 1 δ ω 2 ) = 2 λ + δ 2 λ exp ( z ) δ 2 λ exp ( z )
results in the expression for M c ( θ 1 , θ 2 ) in (14). ☐
Proof of Theorem 4.
We can examine what happens to the quantities in Parts (a) and (b) by considering the joint MGF of $(-2c)^{1/2}\int_a^b J_c\,dW$ and $(-2c)\int_a^b J_c^2$, which is given by:
$$L_c(p, q) = M_c\left((-2c)^{1/2}p,\; -2cq\right).$$
Using (14), we need to examine the asymptotic properties of $\lambda$, $\delta$ and $v^2$ as $c \to -\infty$. The following asymptotic expansions facilitate this:
$$\lambda = \left(c^2 + 2c(-2c)^{1/2}p + 4cq\right)^{1/2} = \left(c^2 - 2^{3/2}(-c)^{3/2}p + 4cq\right)^{1/2} = -c - 2^{1/2}(-c)^{1/2}p - p^2 - 2q + O\left(|c|^{-1/2}\right);$$
$$\delta = (-2c)^{1/2}p + c - \lambda = 2^{1/2}(-c)^{1/2}p + c + c + 2^{1/2}(-c)^{1/2}p + p^2 + 2q + O\left(|c|^{-1/2}\right) = 2^{3/2}(-c)^{1/2}p + 2c + p^2 + 2q + O\left(|c|^{-1/2}\right);$$
$$2\lambda + \delta = (-2c)^{1/2}p + c + \lambda = (-2c)^{1/2}p + c - c - 2^{1/2}(-c)^{1/2}p - p^2 - 2q + O\left(|c|^{-1/2}\right) = -p^2 - 2q + O\left(|c|^{-1/2}\right);$$
$$\delta v^2 = \frac{\left[\exp(2ac) - 1\right]\left[2^{3/2}(-c)^{1/2}p + 2c + p^2 + 2q + O\left(|c|^{-1/2}\right)\right]}{-2(-c)} = \left[\exp(2ac) - 1\right]\left[-2^{1/2}(-c)^{-1/2}p + 1 - 2^{-1}(-c)^{-1}p^2 - (-c)^{-1}q + O\left(|c|^{-3/2}\right)\right] \to -1 \quad \text{as } c \to -\infty.$$
Combining these results, we find that:
$$L_c(p, q) \to \exp\left[\left(\tfrac{1}{2}p^2 + q\right)(b-a)\right] \quad \text{as } c \to -\infty,$$
from which the results in (a) and (b) follow immediately. To establish (c), note that:
$$K(c) = (b-a)^{1/2}\left[(-2c)\int_a^b J_c^2\right]^{-1}\left[(-2c)^{1/2}\int_a^b J_c\,dW + \frac{1}{2}(1 - \eta)\right] + o_p(1).$$
The result then follows using (a) and (b). ☐
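A quick numerical illustration of the leading terms in the expansion of $\lambda$ used above; the substitutions $\theta_1 = (-2c)^{1/2}p$ and $\theta_2 = -2cq$ and the form $\lambda = (c^2 + 2c\theta_1 - 2\theta_2)^{1/2}$ are pieced together from the proof above and are assumptions to that extent:

```python
# Sketch: check that lambda = sqrt(c^2 + 2*c*theta1 - 2*theta2), with theta1 = sqrt(-2c)*p
# and theta2 = -2c*q, behaves like -c - sqrt(2)*sqrt(-c)*p - p^2 - 2q as c -> -infinity.
import numpy as np

p, q = 0.3, 0.2
for c in [-1e2, -1e4, -1e6]:
    theta1 = np.sqrt(-2.0 * c) * p
    theta2 = -2.0 * c * q
    lam = np.sqrt(c**2 + 2.0 * c * theta1 - 2.0 * theta2)
    approx = -c - np.sqrt(2.0) * np.sqrt(-c) * p - p**2 - 2.0 * q
    print(c, lam - approx)   # the error shrinks at the O(|c|^{-1/2}) rate
```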
Proof of Theorem 5.
To determine the weights for $\hat\rho_J$, note that:
$$E(\hat\rho) = \rho + \frac{\mu_c(\eta)}{n} + O\left(\frac{1}{n^2}\right), \qquad E(\hat\rho_j) = \rho + \frac{\mu_{c,j}(\eta)}{\ell} + O\left(\frac{1}{\ell^2}\right), \quad j = 1, \ldots, m,$$
where $\mu_c(\eta)$ is defined in the theorem and $\ell = n/m$ denotes the sub-sample length. From the definition of $\hat\rho_J$, taking expectations yields:
$$E(\hat\rho_J) = \left[w_{1,c}(\eta) + w_{2,c}(\eta)\right]\rho + \frac{1}{n}\left[w_{1,c}(\eta)\,\mu_c(\eta) + w_{2,c}(\eta)\sum_{j=1}^m \mu_{c,j}(\eta)\right] + O\left(\frac{1}{n^2}\right).$$
In order that $E(\hat\rho_J) = \rho + O(1/n^2)$, the requirements are that:
(i)
$w_{1,c}(\eta) + w_{2,c}(\eta) = 1$, and
(ii)
$w_{1,c}(\eta)\,\mu_c(\eta) + w_{2,c}(\eta)\sum_{j=1}^m \mu_{c,j}(\eta) = 0$.
Solving these two conditions simultaneously yields the stated weights. ☐
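Solving (i) and (ii) explicitly gives $w_{1,c}(\eta) = S/\left(S - \mu_c(\eta)\right)$ and $w_{2,c}(\eta) = -\mu_c(\eta)/\left(S - \mu_c(\eta)\right)$, where $S = \sum_{j=1}^m \mu_{c,j}(\eta)$. The short Python sketch below (illustrative only, not the authors' Gauss code) applies this to the $c = 0$, $m = 2$ constants in Table 1 and reproduces the corresponding optimal weights in Table 2:

```python
# Illustrative sketch: solve conditions (i) and (ii) for the optimal jackknife weights,
# using the c = 0, m = 2 bias constants from Table 1 (eta = 1).
mu_c = -1.7814                 # full-sample constant mu_c(1) at c = 0 (Table 1, m = 1)
mu_cj = [-1.7814, -1.1382]     # sub-sample constants for m = 2, c = 0 (Table 1)

S = sum(mu_cj)
w1 = S / (S - mu_c)            # from (i): w1 + w2 = 1, and (ii): w1*mu_c + w2*S = 0
w2 = -mu_c / (S - mu_c)

print(round(w1, 4), round(w2, 4))   # 2.5651 -1.5651, matching Table 2 (c = 0, m = 2)
```

Repeating the calculation with the other columns of Table 1 reproduces the remaining optimal weights in Table 2.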

References

  1. Abadir, Karim M. 1993. The limiting distribution of the autocorrelation coefficient under a unit root. Annals of Statistics 21: 1058–70. [Google Scholar] [CrossRef]
  2. Chambers, Marcus J. 2013. Jackknife estimation and inference in stationary autoregressive models. Journal of Econometrics 172: 142–57. [Google Scholar] [CrossRef]
  3. Chambers, Marcus J. 2015. A jackknife correction to a test for cointegration rank. Econometrics 3: 355–75. [Google Scholar] [CrossRef]
  4. Chambers, Marcus J., and Maria Kyriacou. 2013. Jackknife estimation with a unit root. Statistics and Probability Letters 83: 1677–82. [Google Scholar] [CrossRef]
  5. Chan, Ngai H., and Ching-Zong Wei. 1987. Asymptotic inference for nearly nonstationary AR(1) processes. Annals of Statistics 15: 1050–63. [Google Scholar] [CrossRef]
  6. Chen, Ye, and Jun Yu. 2015. Optimal jackknife for unit root models. Statistics and Probability Letters 99: 135–42. [Google Scholar] [CrossRef]
  7. Gonzalo, Jesus, and Jean-Yves Pitarakis. 1998. On the exact moments of nonstandard asymptotic distributions in an unstable AR(1) with dependent errors. International Economic Review 39: 71–88. [Google Scholar] [CrossRef]
  8. Kruse, Robinson, and Hendrik Kaufmann. 2015. Bias-corrected estimation in mildly explosive autoregressions. Paper presented at Annual Conference 2015: Economic Development—Theory and Policy, Verein für Socialpolitik/German Economic Association, Muenster, Germany, September 6–9. [Google Scholar]
  9. Kyriacou, Maria. 2011. Jackknife Estimation and Inference in Non-stationary Autoregression. Ph.D. thesis, University of Essex, Colchester, UK. [Google Scholar]
  10. Kyriacou, Maria, Peter C. B. Phillips, and Francesca Rossi. 2017. Indirect inference in spatial autoregression. The Econometrics Journal 20: 168–89. [Google Scholar] [CrossRef]
  11. Magnus, Jan R. 1986. The exact moments of a ratio of quadratic forms in normal variables. Annales d’Économie et de Statistique 4: 95–109. [Google Scholar] [CrossRef]
  12. Meng, Xiao-Li. 2005. From unit root to Stein’s estimator to Fisher’s k statistics: If you have a moment, I can tell you more. Statistical Science 20: 141–62. [Google Scholar] [CrossRef]
  13. Park, Joon. 2006. A bootstrap theory for weakly integrated processes. Journal of Econometrics 133: 639–72. [Google Scholar] [CrossRef]
  14. Perron, Pierre. 1989. The calculation of the limiting distribution of the least-squares estimator in a near-integrated model. Econometric Theory 5: 241–55. [Google Scholar] [CrossRef]
  15. Perron, Pierre. 1991. A continuous time approximation to the unstable first-order autoregressive process: The case without an intercept. Econometrica 59: 211–36. [Google Scholar] [CrossRef]
  16. Perron, Pierre. 1996. The adequacy of asymptotic approximations in the near-integrated autoregressive model with dependent errors. Journal of Econometrics 70: 317–50. [Google Scholar] [CrossRef]
  17. Phillips, Peter C. B. 1987a. Towards a unified asymptotic theory for autoregression. Biometrika 74: 535–47. [Google Scholar] [CrossRef]
  18. Phillips, Peter C. B. 1987b. Time series regression with a unit root. Econometrica 55: 277–301. [Google Scholar] [CrossRef]
  19. Phillips, Peter C. B. 2012. Folklore theorems, implicit maps, and indirect inference. Econometrica 80: 425–54. [Google Scholar]
  20. Phillips, Peter C. B. 2014. On confidence intervals for autoregressive roots and predictive regression. Econometrica 82: 1177–95. [Google Scholar] [CrossRef]
  21. Phillips, Peter C. B., Hyungsik Roger Moon, and Zhijie Xiao. 2001. How to estimate autoregressive roots near unity. Econometric Theory 17: 26–69. [Google Scholar] [CrossRef]
  22. Phillips, Peter C. B., and Jun Yu. 2005. Jackknifing bond option prices. Review of Financial Studies 18: 707–42. [Google Scholar] [CrossRef]
  23. Quenouille, Maurice H. 1956. Notes on bias in estimation. Biometrika 43: 353–60. [Google Scholar] [CrossRef]
  24. Stoykov, Marian Z. 2017. Optimal Jackknife Estimation of Local to Unit Root Models. Colchester: Essex Business School, preprint. [Google Scholar]
  25. Tanaka, Katsuto. 1996. Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. New York: Wiley. [Google Scholar]
  26. Tukey, John W. 1958. Bias and confidence in not-quite large samples. Annals of Mathematical Statistics 29: 614. [Google Scholar]
  27. White, John S. 1958. The limiting distribution of the serial correlation coefficient in the explosive case. Annals of Mathematical Statistics 29: 1188–97. [Google Scholar] [CrossRef]
1. If $y_0 \neq 0$, then additional random variables appear in the numerator and denominator of the bias, thereby complicating the derivation of the required expectation. In the case of $c = 0$, the simulation results reported in Table 2.3 of Kyriacou (2011) indicate that the bias of $\hat\rho$ increases with $y_0/\sigma$, but that the jackknife continues to be an effective method of bias reduction.
2. The integrals were computed numerically using an adaptive quadrature method in the integrate1d routine in Gauss 17.
3. The Gauss codes used for the jackknife estimators are available from the authors on request.
4. In the simulations, we are taking $\eta = 1$ as known.
5. We thank a referee for suggesting that we investigate the performance of the estimators when $\eta < 1$.
6. It should also be noted that we have assumed $y_0 = 0$ in deriving the jackknife weights, whereas the expansion in Theorem 2 suggests that the bias functions (and, hence, the jackknife weights) will depend in a non-trivial and more complicated way on $y_0$ when $y_0 \neq 0$. Although we have not investigated the issue further here, the results in Kyriacou (2011) suggest that jackknife methods can still provide bias reduction even when $y_0 \neq 0$.
Table 1. Values of $\mu_{c,j}(1) = E(Z_{c,j}(1))$.

j \ c:    −50        −20        −10        −5         −1         0          1
m = 1
1        −1.9995    −1.9972    −1.9912    −1.9758    −1.8818    −1.7814    −1.5811
m = 2
1        −1.9981    −1.9912    −1.9758    −1.9439    −1.8408    −1.7814    −1.6969
2        −1.9604    −1.9043    −1.8214    −1.6891    −1.3295    −1.1382    −0.8920
m = 3
1        −1.9962    −1.9838    −1.9595    −1.9175    −1.8234    −1.7814    −1.7283
2        −1.9412    −1.8613    −1.7502    −1.5921    −1.2722    −1.1382    −0.9791
3        −1.9412    −1.8613    −1.7500    −1.5845    −1.1515    −0.9319    −0.6759
m = 4
1        −1.9939    −1.9758    −1.9439    −1.8973    −1.8138    −1.7814    −1.7427
2        −1.9225    −1.8214    −1.6891    −1.5210    −1.2411    −1.1382    −1.0210
3        −1.9225    −1.8214    −1.6879    −1.5021    −1.1016    −0.9319    −0.7410
4        −1.9225    −1.8214    −1.6879    −1.5006    −1.0396    −0.8143    −0.5643
m = 6
1        −1.9884    −1.9594    −1.9175    −1.8698    −1.8037    −1.7814    −1.7564
2        −1.8867    −1.7502    −1.5921    −1.4268    −1.2085    −1.1382    −1.0616
3        −1.8867    −1.7500    −1.5845    −1.3812    −1.0482    −0.9319    −0.8059
4        −1.8867    −1.7500    −1.5843    −1.3732    −0.9697    −0.8143    −0.6472
5        −1.8867    −1.7500    −1.5842    −1.3717    −0.9243    −0.7348    −0.5331
6        −1.8867    −1.7500    −1.5842    −1.3715    −0.8958    −0.6761    −0.4450
m = 8
1        −1.9823    −1.9439    −1.8973    −1.8526    −1.7984    −1.7814    −1.7629
2        −1.8530    −1.6891    −1.5210    −1.3686    −1.1915    −1.1382    −1.0813
3        −1.8530    −1.6879    −1.5021    −1.2991    −1.0203    −0.9319    −0.8381
4        −1.8530    −1.6879    −1.5006    −1.2815    −0.9326    −0.8143    −0.6893
5        −1.8530    −1.6879    −1.5005    −1.2766    −0.8795    −0.7348    −0.5829
6        −1.8530    −1.6879    −1.5005    −1.2752    −0.8444    −0.6761    −0.5008
7        −1.8530    −1.6879    −1.5005    −1.2748    −0.8201    −0.6302    −0.4345
8        −1.8530    −1.6879    −1.5005    −1.2747    −0.8027    −0.5931    −0.3793
m = 12
1        −1.9693    −1.9175    −1.8698    −1.8324    −1.7929    −1.7814    −1.7693
2        −1.7916    −1.5921    −1.4268    −1.3016    −1.1742    −1.1382    −1.1007
3        −1.7916    −1.5845    −1.3812    −1.1979    −0.9916    −0.9319    −0.8698
4        −1.7916    −1.5842    −1.3732    −1.1612    −0.8943    −0.8143    −0.7313
5        −1.7916    −1.5842    −1.3717    −1.1464    −0.8328    −0.7348    −0.6335
6        −1.7916    −1.5842    −1.3715    −1.1403    −0.7904    −0.6761    −0.5585
7        −1.7916    −1.5842    −1.3714    −1.1376    −0.7595    −0.6302    −0.4981
8        −1.7916    −1.5842    −1.3714    −1.1365    −0.7362    −0.5931    −0.4477
9        −1.7916    −1.5842    −1.3714    −1.1360    −0.7183    −0.5622    −0.4047
10       −1.7916    −1.5842    −1.3714    −1.1358    −0.7041    −0.5358    −0.3674
11       −1.7916    −1.5842    −1.3714    −1.1357    −0.6928    −0.5131    −0.3346
12       −1.7916    −1.5842    −1.3714    −1.1356    −0.6837    −0.4931    −0.3055
Table 2. Values of standard and optimal jackknife weights ($\eta = 1$).

m:               2          3          4          6          8          12
Standard weights
w_1             2.0000     1.5000     1.3333     1.2000     1.1429     1.0909
w_2            −1.0000    −0.5000    −0.3333    −0.2000    −0.1429    −0.0909
Optimal weights: c = −50
w_{1,c}(1)      2.0206     1.5156     1.3470     1.2122     1.1544     1.1016
w_{2,c}(1)     −1.0206    −0.5156    −0.3470    −0.2122    −0.1544    −0.1016
Optimal weights: c = −20
w_{1,c}(1)      2.0521     1.5385     1.3670     1.2292     1.1698     1.1151
w_{2,c}(1)     −1.0521    −0.5385    −0.3670    −0.2292    −0.1698    −0.1151
Optimal weights: c = −10
w_{1,c}(1)      2.1026     1.5741     1.3969     1.2535     1.1909     1.1325
w_{2,c}(1)     −1.1026    −0.5741    −0.3969    −0.2535    −0.1909    −0.1325
Optimal weights: c = −5
w_{1,c}(1)      2.1923     1.6336     1.4445     1.2898     1.2213     1.1565
w_{2,c}(1)     −1.1923    −0.6336    −0.4445    −0.2898    −0.2213    −0.1565
Optimal weights: c = −1
w_{1,c}(1)      2.4605     1.7956     1.5678     1.3788     1.2937     1.2117
w_{2,c}(1)     −1.4605    −0.7956    −0.5678    −0.3788    −0.2937    −0.2117
Optimal weights: c = 0
w_{1,c}(1)      2.5651     1.8605     1.6176     1.4147     1.3228     1.2337
w_{2,c}(1)     −1.5651    −0.8605    −0.6176    −0.4147    −0.3228    −0.2337
Optimal weights: c = 1
w_{1,c}(1)      2.5689     1.8773     1.6355     1.4311     1.3373     1.2455
w_{2,c}(1)     −1.5689    −0.8773    −0.6355    −0.4311    −0.3373    −0.2455
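The standard weights in Table 2 depend only on the number of sub-samples; they correspond to $w_1 = m/(m-1)$ and $w_2 = -1/(m-1)$, so that, for example, with $m = 8$,
$$w_1 = \frac{8}{7} \approx 1.1429, \qquad w_2 = -\frac{1}{7} \approx -0.1429,$$
in agreement with the corresponding column. The optimal weights additionally vary with $c$ (and, more generally, with $\eta$) through the bias constants tabulated in Table 1.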
Table 3. Bias of OLS and jackknife estimators ($m = 2$) when $\eta = 1$.

Estimator                        n = 24      n = 48      n = 96      n = 192
c = 0
OLS                             −0.0663     −0.0351     −0.0180     −0.0091
Standard jackknife              −0.0343     −0.0155     −0.0072     −0.0035
Optimal/unit root jackknife     −0.0163     −0.0045     −0.0011     −0.0003
Chen-Yu jackknife               −0.0264     −0.0144     −0.0110     −0.0102
c = −1
OLS                             −0.0675     −0.0365     −0.0188     −0.0096
Standard jackknife              −0.0323     −0.0145     −0.0067     −0.0032
Optimal jackknife               −0.0161     −0.0044     −0.0011     −0.0002
Unit root jackknife             −0.0124     −0.0021      0.0002      0.0004
Chen-Yu jackknife               −0.0190     −0.0099     −0.0087     −0.0089
c = −5
OLS                             −0.0589     −0.0350     −0.0188     −0.0098
Standard jackknife              −0.0193     −0.0088     −0.0038     −0.0017
Optimal jackknife               −0.0116     −0.0037     −0.0009     −0.0001
Unit root jackknife              0.0031      0.0061      0.0048      0.0029
Chen-Yu jackknife                0.0037      0.0027     −0.0017     −0.0051
c = −10
OLS                             −0.0437     −0.0310     −0.0178     −0.0095
Standard jackknife              −0.0114     −0.0055     −0.0022     −0.0009
Optimal jackknife               −0.0080     −0.0029     −0.0006     −0.0001
Unit root jackknife              0.0069      0.0089      0.0066      0.0039
Chen-Yu jackknife                0.0090      0.0071      0.0013     −0.0035
Table 4. RMSE of OLS and jackknife estimators when $\eta = 1$.

Estimator                                        n = 24     n = 48     n = 96     n = 192
c = 0
OLS                                              0.1366     0.0719     0.0371     0.0187
Standard jackknife (m = 2)                       0.1482     0.0766     0.0396     0.0199
Standard jackknife (m = 4, 6, 6, 8)              0.1310     0.0659     0.0333     0.0165
Optimal/unit root jackknife (m = 2)              0.1753     0.0915     0.0479     0.0242
Optimal/unit root jackknife (m = 4, 8, 12, 12)   0.1383     0.0642     0.0312     0.0154
Chen-Yu jackknife (m = 2)                        0.1640     0.0864     0.0462     0.0249
Chen-Yu jackknife (m = 3)                        0.1392     0.0719     0.0374     0.0188
c = −1
OLS                                              0.1428     0.0762     0.0396     0.0200
Standard jackknife (m = 2)                       0.1524     0.0797     0.0415     0.0209
Standard jackknife (m = 4, 6, 6, 8)              0.1368     0.0698     0.0355     0.0176
Optimal jackknife (m = 2)                        0.1724     0.0908     0.0477     0.0242
Optimal jackknife (m = 4, 8, 12, 12)             0.1416     0.0679     0.0334     0.0165
Unit root jackknife (m = 2)                      0.1778     0.0939     0.0495     0.0251
Unit root jackknife (m = 4, 8, 12, 12)           0.1435     0.0680     0.0333     0.0165
Chen-Yu jackknife (m = 2)                        0.1710     0.0916     0.0489     0.0262
Chen-Yu jackknife (m = 3)                        0.1477     0.0778     0.0408     0.0207
c = −5
OLS                                              0.1626     0.0901     0.0476     0.0243
Standard jackknife (m = 2)                       0.1745     0.0944     0.0498     0.0253
Standard jackknife (m = 6, 6, 8, 8)              0.1615     0.0855     0.0442     0.0223
Optimal jackknife (m = 2)                        0.1813     0.0982     0.0520     0.0265
Optimal jackknife (m = 6, 8, 12, 12)             0.1641     0.0847     0.0432     0.0216
Unit root jackknife (m = 2)                      0.1975     0.1078     0.0576     0.0295
Unit root jackknife (m = 6, 12, 12, 12)          0.1710     0.0852     0.0433     0.0219
Chen-Yu jackknife (m = 2)                        0.2066     0.1138     0.0610     0.0318
Chen-Yu jackknife (m = 3)                        0.1857     0.1014     0.0540     0.0279
c = −10
OLS                                              0.1809     0.1037     0.0558     0.0288
Standard jackknife (m = 2)                       0.1971     0.1096     0.0584     0.0300
Standard jackknife (m = 8, 8, 12, 12)            0.1853     0.1016     0.0534     0.0272
Optimal jackknife (m = 2)                        0.2003     0.1114     0.0595     0.0306
Optimal jackknife (m = 4, 12, 12, 12)            0.1877     0.1015     0.0530     0.0270
Unit root jackknife (m = 2)                      0.2175     0.1217     0.0656     0.0340
Unit root jackknife (m = 6, 12, 12, 12)          0.1985     0.1032     0.0539     0.0277
Chen-Yu jackknife (m = 2)                        0.2328     0.1315     0.0711     0.0372
Chen-Yu jackknife (m = 3)                        0.2151     0.1202     0.0649     0.0338
Table 5. Bias of OLS and jackknife estimators ($m = 2$) when $\eta \neq 1$.

Estimator                              n = 24     n = 48     n = 96     n = 192
η = 0.5556 (MA case)
c = 0
OLS                                   −0.0191    −0.0105    −0.0054    −0.0028
Standard jackknife                    −0.0059    −0.0018    −0.0006    −0.0002
Optimal/unit root jackknife (η = 1)    0.0016     0.0031     0.0022     0.0012
Chen-Yu jackknife                      0.0062     0.0054     0.0034     0.0018
c = −1
OLS                                   −0.0060    −0.0040    −0.0022    −0.0012
Standard jackknife                     0.0110     0.0068     0.0038     0.0020
Optimal jackknife (η = 1)              0.0188     0.0118     0.0065     0.0034
Unit root jackknife                    0.0206     0.0129     0.0072     0.0037
Chen-Yu jackknife                      0.0275     0.0167     0.0091     0.0048
c = −5
OLS                                    0.0631     0.0304     0.0150     0.0074
Standard jackknife                     0.0885     0.0457     0.0234     0.0118
Optimal jackknife (η = 1)              0.0934     0.0486     0.0250     0.0127
Unit root jackknife                    0.1028     0.0543     0.0281     0.0143
Chen-Yu jackknife                      0.1129     0.0601     0.0312     0.0159
η = 0.0526 (AR case)
c = 0
OLS                                    0.0524     0.0237     0.0107     0.0049
Standard jackknife                     0.0306     0.0137     0.0066     0.0033
Optimal/unit root jackknife (η = 1)    0.0183     0.0081     0.0043     0.0024
Chen-Yu jackknife                      0.0293     0.0143     0.0076     0.0039
c = −1
OLS                                    0.0801     0.0386     0.0184     0.0089
Standard jackknife                     0.0624     0.0305     0.0153     0.0077
Optimal jackknife (η = 1)              0.0542     0.0268     0.0139     0.0071
Unit root jackknife                    0.0524     0.0260     0.0135     0.0070
Chen-Yu jackknife                      0.0645     0.0331     0.0172     0.0088
c = −5
OLS                                    0.2116     0.1086     0.0547     0.0273
Standard jackknife                     0.2096     0.1070     0.0541     0.0271
Optimal jackknife (η = 1)              0.2092     0.1067     0.0539     0.0271
Unit root jackknife                    0.2084     0.1061     0.0537     0.0270
Chen-Yu jackknife                      0.2212     0.1138     0.0575     0.0289
