Article

Higher-Order Expansions for Estimators in the Presence of Nuisance Parameters

Department of Economics, York University, Toronto, ON M3J 1P3, Canada
Mathematics 2025, 13(2), 179; https://doi.org/10.3390/math13020179
Submission received: 7 August 2024 / Revised: 21 December 2024 / Accepted: 22 December 2024 / Published: 7 January 2025

Abstract

Higher-order asymptotic methods for nonlinear models with nuisance parameters are developed. We allow for both one-step estimators, in which the nuisance parameters and the parameters of interest are jointly estimated, and two-step (or iterated) estimators, in which the nuisance parameters are estimated first. The properties of the former, although in principle simpler to conceptualize, are more difficult to establish explicitly. The iterated estimators allow for a variety of scenarios. The results indicate when second-order considerations should be taken into account when conducting inferences with two-step estimators. The results in the paper accomplish three objectives: (i) provide simpler methods for deriving higher-order moments when nuisance parameters are present; (ii) indicate more explicitly the sources of deviations of estimators’ sampling distributions from those given by standard first-order asymptotic theory; and, in turn, (iii) indicate in which situations corrections (either analytical or by a resampling method such as the bootstrap or jackknife) should be made when making inferences. We illustrate using several popular examples in econometrics. We also provide a numerical example which highlights how a simple analytical bias correction can improve inferences.

1. Introduction

Standard higher-order asymptotics are usually applicable in situations which focus on method-of-moments estimators of k parameters, which solve a set of k estimating equations (some specific counter-examples are Rilstone and Ullah’s [1] derivation of the second-order bias of Heckman’s estimator and Newey and Smith’s [2] derivation of the second-order properties of the Generalized Method of Moments (GMM) and Generalized Empirical Likelihood (GEL) estimators). It is often the case that estimators are not expressed this way. Examples are generalized least squares estimators, in which variance estimates are obtained using residuals from a preliminary regression. GMM estimators typically depend on an estimated weighting matrix, which can itself be seen as an estimate of its probability limit. Two-stage least squares (2SLS) estimators use a preliminary linear regression to estimate instruments. In these cases, an estimated parameter appears in the estimating equation defining the parameter of interest. Often, one’s interest is simply in a subset of all the parameters in a model. In this paper, we develop a few generic tools that help in these situations. In each of these cases, we distinguish between the “parameter of interest”, β_0, about which we want to make inferences, and the “nuisance parameter”, η_0, which needs to be estimated in order to make inferences about β_0, but in which we have no intrinsic interest. Thus, in this paper we develop methods for deriving stochastic expansions and approximate moments for the estimators of the parameters of interest when there are additional nuisance parameters.
In this paper, we distinguish between two types of estimation situations. In the first case, the more general, the nuisance parameter cannot be estimated without also estimating the parameter of interest. In the second case, the nuisance parameter can be estimated consistently by itself. Here, we equate this with situations in which the subset of moment conditions germane to the nuisance parameter are not a function of the parameter of interest. This corresponds to the case of two-step estimation. We refer to the more general case as one-step estimation. We spend time on both situations here, developing some tools to assist in higher-order asymptotics.
There is an extensive literature on higher-order expansions of nonlinear estimators, and specifically their second-order or approximate moments. We highlight some related examples. Nagar [3] examined the second-order bias and mean squared error (MSE) of k-class estimators. Firth [4] examined the bias of maximum likelihood estimators. Rilstone et al. [5] derived the second-order bias and MSE of method of moment estimators. Newey and Smith [2] examined the bias and MSE of GMM and GEL estimators. Hahn and Newey [6] examined bias reduction for nonlinear panel models. Bao and Ullah [7] examined the second-order skewness and kurtosis of estimators. Iglesias [8] derived the approximate bias of the smoothed maximum score estimator. Chen and Giles [9] derived the approximate bias and MSE of the binary logit model. Khundi and Rilstone [10] provided simplified methods for deriving higher-order moments. Rilstone [11] derived higher-order expansions and moments for nonlinear models with heterogeneous observations.
We proceed in a few steps. In Section 2, we review the standard setup for estimation with nuisance parameters and first-order asymptotic results. In Section 3 and Section 4, we develop techniques which streamline the derivation of higher-order stochastic expansions and approximate moments for the estimator of the parameter of interest. In Section 5, these results are then illustrated with three popular models used in econometrics: ordinary least squares (OLS) when the focus is on a subset of parameters; estimation of the variance in a regression model; and 2SLS/instrumental variables (IV) estimation. In Section 6, we outline an alternative approach to two-step estimation. In Section 7, we provide a numerical example based on the 2SLS model, which highlights how a simple analytical bias correction can improve inferences. The paper has three objectives: (i) provide simpler methods for deriving higher-order moments when nuisance parameters are present; (ii) indicate more explicitly the sources of deviations of estimators’ sampling distributions compared to that given by standard first-order asymptotic theory; and, in turn, (iii) indicate in which situations higher-order corrections (either analytically or by a resampling method such as bootstrap or jackknife) should be made when making inferences. Section 8 concludes and Section 9 provides detailed derivations of the results.

2. Joint Estimation

2.1. Estimating Equations and Their Derivatives

The approach we use is based on a set of  m = l + k  estimating equations (this is identical to the setup in Rilstone et al. ([5], Equation 2.2), except here we decompose the parameter vector and the set of estimating equations into two components for the nuisance parameter and the parameter of interest):
$$\psi(\theta) = \frac{1}{N}\sum_{i=1}^{N} q_i(\theta), \qquad q_i(\theta) = \begin{pmatrix} q_{\eta,i}(\theta) \\ q_{\beta,i}(\theta) \end{pmatrix}, \qquad \theta = \begin{pmatrix} \eta \\ \beta \end{pmatrix}$$
and
$$\hat{\theta} = \begin{pmatrix} \hat{\eta} \\ \hat{\beta} \end{pmatrix}$$
is the solution to ψ(θ̂) = 0, where η is l × 1 and β is k × 1; q_{η,i}(θ) and q_{β,i}(θ) denote the first l and last k elements of q_i(θ). For example, in a regression framework y_i = η′X_{1i} + β′X_{2i} + ϵ_i, where we have two sets of regressors, we would have q_{ηi}(θ) = ϵ_i X_{1i} and q_{βi}(θ) = ϵ_i X_{2i}. True values are indicated by θ_0 = (η_0′, β_0′)′; ϵ_i is a regression residual, orthogonal to the regressors, such that E[ϵ_i X_i] = 0, with X_i = (X_{1i}′, X_{2i}′)′.
With two-step estimation (and, for our purposes, what distinguishes it),  q η , i ( θ ) = q η , i ( η )  (with a slight abuse of notation) is only a function of  η  and the estimate of  η 0  is obtained by solving  1 N q η , i ( η ^ ) = 0 . There are many advantages to such models and we focus much of our attention in this paper on them.
We suppose in this paper that the regularity conditions used in Rilstone et al. [5] to obtain the stochastic expansions and approximate moments for the estimator of the full parameter vector,  θ ^ , are satisfied. It is convenient to introduce some notation when we have partitioned the parameter space in the way that we have. We define the selection matrices,
$$S_\eta = \begin{pmatrix} I_l & 0_{l\times k} \end{pmatrix}, \qquad S_\beta = \begin{pmatrix} 0_{k\times l} & I_k \end{pmatrix},$$
so that  η = S η θ  and  β = S β θ . These selection matrices will prove useful in a number of ways. Throughout this paper, we denote the component of a vector or matrix germane to  β  (or  η ) by affixing an additional subscript  β  (or  η ). This will be clear in the context. Formally, we have, e.g.,  q β , i = S β q i .
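As a small illustration of this bookkeeping, the following sketch (ours, not from the paper) builds the two selection matrices in NumPy and checks that they pick out η and β from a stacked θ:

```python
import numpy as np

def selection_matrices(l, k):
    """Build S_eta (l x (l+k)) and S_beta (k x (l+k)) so that
    eta = S_eta @ theta and beta = S_beta @ theta."""
    S_eta = np.hstack([np.eye(l), np.zeros((l, k))])
    S_beta = np.hstack([np.zeros((k, l)), np.eye(k)])
    return S_eta, S_beta

# Example: theta stacks a 2-vector eta on top of a 3-vector beta.
l, k = 2, 3
S_eta, S_beta = selection_matrices(l, k)
theta = np.arange(1.0, l + k + 1)          # [1, 2, 3, 4, 5]
assert np.allclose(S_eta @ theta, theta[:l])
assert np.allclose(S_beta @ theta, theta[l:])
```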
We use notation which indicates differentiation and cross-differentiation with respect to η and β. In this paper, we are interested in first- through third-order derivatives, indicating these by using parameters in parentheses as superscripts. Temporarily, let θ_1, θ_2, and θ_3 denote m_1 × 1, m_2 × 1, and m_3 × 1 sub-vectors of θ, with 1 ≤ m_j ≤ m, j = 1, 2, 3. In this paper, each θ_j is either θ, η, or β. Then, we define
$$q_i^{(\theta_1)}(\theta) = \frac{\partial q_i(\theta)}{\partial \theta_1'}, \qquad q_i^{(\theta_1\theta_2)}(\theta) = \frac{\partial q_i^{(\theta_1)}(\theta)}{\partial \theta_2'}, \qquad q_i^{(\theta_1\theta_2\theta_3)}(\theta) = \frac{\partial q_i^{(\theta_1\theta_2)}(\theta)}{\partial \theta_3'}$$
as the row-wise m × m_1, m × m_1m_2, and m × m_1m_2m_3 matrices of derivatives of q_i(θ) with respect to the given arguments. As examples we have
$$q_i^{(\theta\eta)}(\theta) = \frac{\partial q_i^{(\theta)}(\theta)}{\partial \eta'}, \qquad q_i^{(\theta\eta\beta)}(\theta) = \frac{\partial q_i^{(\theta\eta)}(\theta)}{\partial \beta'},$$
indicating the row-wise  m × m l  and  m × m l k  matrices of derivatives of  q i ( θ ) ( θ )  with respect to  η , and   q i ( θ η )  with respect to  β . We could establish more generality, but that is not required here and would obscure some of the results.

2.2. First-Order Asymptotics

With the assumptions of Rilstone et al. [5] satisfied, we can define the Jacobian of the estimating equations as
$$q_i^{(\theta)}(\theta) = \begin{pmatrix} q_{\eta,i}^{(\eta)}(\theta) & q_{\eta,i}^{(\beta)}(\theta) \\ q_{\beta,i}^{(\eta)}(\theta) & q_{\beta,i}^{(\beta)}(\theta) \end{pmatrix}.$$
We denote  Q = ( E [ q 1 ( θ ) ( θ 0 ) ] ) 1  and partition this:
$$Q \equiv \begin{pmatrix} Q_{\eta\eta} & Q_{\eta\beta} \\ Q_{\beta\eta} & Q_{\beta\beta} \end{pmatrix}$$
where
$$\begin{aligned}
Q_{\eta\eta} &= \bar{q}_{\eta,1}^{(\eta)\,-1} + \bar{q}_{\eta,1}^{(\eta)\,-1}\,\bar{q}_{\eta,1}^{(\beta)}\, Q_{\beta\beta}\, \bar{q}_{\beta,1}^{(\eta)}\,\bar{q}_{\eta,1}^{(\eta)\,-1} \\
Q_{\eta\beta} &= -\bar{q}_{\eta,1}^{(\eta)\,-1}\,\bar{q}_{\eta,1}^{(\beta)}\, Q_{\beta\beta} \\
Q_{\beta\eta} &= -Q_{\beta\beta}\,\bar{q}_{\beta,1}^{(\eta)}\,\bar{q}_{\eta,1}^{(\eta)\,-1} \\
Q_{\beta\beta} &= \big(\bar{q}_{\beta,1}^{(\beta)} - \bar{q}_{\beta,1}^{(\eta)}\,\bar{q}_{\eta,1}^{(\eta)\,-1}\,\bar{q}_{\eta,1}^{(\beta)}\big)^{-1}.
\end{aligned}$$
The formulae we use for the components of Q are standard (the elements of Q correspond to the elements of the partitioned inverse given in Greene ([12], Equation A-74)). Note that  Q η η Q η β Q β η , and  Q β β  are  l × l l × k k × l , and  k × k , respectively. (When a function’s argument is suppressed, that argument is being evaluated at its true value. For emphasis we sometimes include it.)
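For readers who prefer to verify the algebra numerically, the following sketch (our own; an arbitrary well-conditioned matrix stands in for the expected Jacobian) assembles Q from the partitioned-inverse formulas above and checks it against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
l, k = 3, 2
# A hypothetical expected Jacobian E[q_1^(theta)(theta_0)], partitioned as in the text.
J = rng.normal(size=(l + k, l + k))
J = J + (l + k) * np.eye(l + k)              # keep it well conditioned / invertible
A, B = J[:l, :l], J[:l, l:]                   # q_eta^(eta), q_eta^(beta) blocks
C, D = J[l:, :l], J[l:, l:]                   # q_beta^(eta), q_beta^(beta) blocks

Ainv = np.linalg.inv(A)
Q_bb = np.linalg.inv(D - C @ Ainv @ B)        # partitioned-inverse formulas (Greene A-74)
Q_hh = Ainv + Ainv @ B @ Q_bb @ C @ Ainv
Q_hb = -Ainv @ B @ Q_bb
Q_bh = -Q_bb @ C @ Ainv

Q_blocks = np.block([[Q_hh, Q_hb], [Q_bh, Q_bb]])
assert np.allclose(Q_blocks, np.linalg.inv(J))   # agrees with the direct inverse
```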
We have the ongoing assumption that  E [ q 1 ] = 0 , and we let
$$\Omega \equiv E[q_1 q_1'] = \begin{pmatrix} E[q_{\eta,1}q_{\eta,1}'] & E[q_{\eta,1}q_{\beta,1}'] \\ E[q_{\beta,1}q_{\eta,1}'] & E[q_{\beta,1}q_{\beta,1}'] \end{pmatrix} \equiv \begin{pmatrix} \Omega_{\eta\eta} & \Omega_{\eta\beta} \\ \Omega_{\beta\eta} & \Omega_{\beta\beta} \end{pmatrix}.$$
We first state some results from standard first-order asymptotics. The usual (joint) influence function for  θ ^  is written
d i = Q q i
with
$$\sqrt{N}(\hat\theta - \theta_0) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} d_i + o_P(1) \;\xrightarrow{d}\; N(0, V_1)$$
where
V 1 = E [ d 1 d 1 ] = Q Ω Q .
Our focus is on  β ^ β ^ β 0  is of course asymptotically normal with the first-order variance matrix given by the lower right  k × k  sub-matrix of  V 1 :
$$\begin{aligned}
V_{\beta,1} = S_\beta V_1 S_\beta' &= \begin{pmatrix} Q_{\beta\eta} & Q_{\beta\beta}\end{pmatrix}
\begin{pmatrix} \Omega_{\eta\eta} & \Omega_{\eta\beta} \\ \Omega_{\beta\eta} & \Omega_{\beta\beta}\end{pmatrix}
\begin{pmatrix} Q_{\beta\eta}' \\ Q_{\beta\beta}'\end{pmatrix} \\
&= Q_{\beta\eta}\Omega_{\eta\eta}Q_{\beta\eta}' + Q_{\beta\beta}\Omega_{\beta\eta}Q_{\beta\eta}' + Q_{\beta\eta}\Omega_{\eta\beta}Q_{\beta\beta}' + Q_{\beta\beta}\Omega_{\beta\beta}Q_{\beta\beta}'.
\end{aligned}$$
Since each of the  Q ’s contain a number of terms,  V β , 1  can be rather complicated, at least analytically. The complexity carries over to higher-order asymptotics. Fortunately, additional information on the structure of models often simplifies matters. By inspection of Equations (7) and (9) it is useful to see under which conditions  V β , 1  simplifies.
Note that if  Ω η β = Ω β η  is zero, the two middle terms in  V β , 1  vanish. In general, this is not the case. However, it does hold for some very important situations. This will occur with a linear regression model with orthogonal regressors. It occurs when estimating the mean and variance of a random variable when  Ω η β  is proportional to the third moment of the random variable and is zero if the third moment is zero.
In the case of two-step estimators, we take as given that β does not appear in q_{η,i}(θ), so that q_{η,i}^{(β)}(θ) = 0. (We are largely interested in the resulting impact on the first-order variance. In this case, the important condition can be weakened: E[q_{η,1}^{(β)}(θ_0)] = 0.) The form of Q is critical to the properties of θ̂ and, by implication, to the properties of β̂. Using the formula for inverting a partitioned matrix, we see that Q is block triangular in the two-step case and that the other elements of Q simplify as well, with Q_{ββ} = (q̄_{β,1}^{(β)})^{-1}. Typically, q̄_{η,1}^{(η)} and q̄_{β,1}^{(β)} are symmetric Hessian matrices, so that
$$Q = \begin{pmatrix} Q_{\eta\eta} & 0 \\ Q_{\beta\eta} & Q_{\beta\beta} \end{pmatrix} \equiv \begin{pmatrix} \bar{q}_{\eta,1}^{(\eta)\,-1} & 0 \\ -\bar{q}_{\beta,1}^{(\beta)\,-1}\,\bar{q}_{\beta,1}^{(\eta)}\,\bar{q}_{\eta,1}^{(\eta)\,-1} & \bar{q}_{\beta,1}^{(\beta)\,-1} \end{pmatrix}$$
and the components of  V β , 1  simplify:
Q β η Ω η η Q β η = q ¯ β , 1 ( β ) 1 q ¯ β , 1 ( η ) q ¯ η , 1 ( η ) 1 Ω η η q ¯ η , 1 ( η ) 1 q ¯ β , 1 ( η ) q ¯ β , 1 ( β ) 1 Q β β Ω β η Q β η = q ¯ β , 1 ( β ) 1 Ω β η q ¯ η , 1 ( η ) 1 q ¯ β , 1 ( η ) q ¯ β , 1 ( β ) 1 Q β η Ω η β Q β β = q ¯ β , 1 ( β ) 1 q ¯ β , 1 ( η ) q ¯ η , 1 ( η ) 1 Ω η β q ¯ β , 1 ( β ) 1 Q β η Ω β β Q β β = q ¯ β , 1 ( β ) 1 Ω β β q ¯ β , 1 ( β ) 1 .
Suppose  η 0  is known. Under regularity conditions, standard results would inform us that the corresponding estimating equations
$$\psi^*(\beta) = \frac{1}{N}\sum_{i=1}^{N} q_{\beta,i}(\eta_0, \beta)$$
would result in an estimator,  β ^ * , which would be approximated as
$$\sqrt{N}(\hat\beta^* - \beta_0) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} d_i^* + o_P(1) \;\xrightarrow{d}\; N(0, V_{\beta,1}^*)$$
where d_i^* = Q^* q_{β,i}, Q^* = (E[q_{β,1}^{(β)}])^{-1}, and V_{β,1}^* = Q^* Ω_{ββ} Q^{*′}. In the case that the estimating equations q_{η,i} and q_{β,i} are uncorrelated, the two middle terms of V_{β,1} drop out. In this case, V_{β,1} − V_{β,1}^* = Q_{βη} Ω_{ηη} Q_{βη}′ is positive definite. There are cases (when Ω_{ηβ} ≠ 0) when V_{β,1} − V_{β,1}^* is negative definite, e.g., with a fully and correctly specified maximum likelihood estimator. See Wooldridge [13] for a discussion.

3. Stochastic Expansions with Nuisance Parameters

Whether the estimation method is one or two steps (or more), we can adapt the methodology of Rilstone et al. [5] in a common manner. We suppose that the standard first- through third-order stochastic expansions and approximate moments for the  m × 1  estimator  θ ^  are valid.
We recall the standard form of the influence functions. In order to systematically derive the moments of the various estimators, we write the stochastic expansions for  θ ^ , as in Rilstone et al. [5], as
$$\begin{aligned}
\hat\theta - \theta_0 &= a_{1/2} + o_P(N^{-1/2}) \\
&= a_{1/2} + a_1 + o_P(N^{-1}) \\
&= a_{1/2} + a_1 + a_{3/2} + o_P(N^{-3/2})
\end{aligned}$$
and
$$a_{1/2} = \frac{1}{N}\sum_{i_1} d_{i_1}, \qquad a_1 = \frac{1}{N^2}\sum_{i_1}\sum_{i_2} d_{i_1 i_2}, \qquad a_{3/2} = \frac{1}{N^3}\sum_{i_1}\sum_{i_2}\sum_{i_3} d_{i_1 i_2 i_3}$$
where
$$\begin{aligned}
d_i &= Q q_i, \\
d_{i_1 i_2} &= \sum_{s=1}^{2} A^{(s)}_{i_1 i_2}, \qquad
A^{(1)}_{i_1 i_2} = \tilde d^{(\theta)}_{i_1} d_{i_2}, \qquad
A^{(2)}_{i_1 i_2} = \tfrac{1}{2}\, \bar d^{(\theta\theta)}_{1}\big(d_{i_1}\otimes d_{i_2}\big), \\
d_{i_1 i_2 i_3} &= \sum_{s=1}^{6} E^{(s)}_{i_1 i_2 i_3}, \\
E^{(1)}_{i_1 i_2 i_3} &= \bar d^{(\theta\theta)}_{1}\big(d_{i_1}\otimes \tilde d^{(\theta)}_{i_2} d_{i_3}\big), \qquad
E^{(2)}_{i_1 i_2 i_3} = \tfrac{1}{2}\,\bar d^{(\theta\theta)}_{1}\big(d_{i_1}\otimes \bar d^{(\theta\theta)}_{1}(d_{i_2}\otimes d_{i_3})\big), \\
E^{(3)}_{i_1 i_2 i_3} &= \tfrac{1}{6}\,\bar d^{(\theta\theta\theta)}_{1}\big(d_{i_1}\otimes d_{i_2}\otimes d_{i_3}\big), \qquad
E^{(4)}_{i_1 i_2 i_3} = \tilde d^{(\theta)}_{i_1}\tilde d^{(\theta)}_{i_2} d_{i_3}, \\
E^{(5)}_{i_1 i_2 i_3} &= \tfrac{1}{2}\,\tilde d^{(\theta)}_{i_1}\bar d^{(\theta\theta)}_{1}\big(d_{i_2}\otimes d_{i_3}\big), \qquad
E^{(6)}_{i_1 i_2 i_3} = \tfrac{1}{2}\,\tilde d^{(\theta\theta)}_{i_1}\big(d_{i_2}\otimes d_{i_3}\big)
\end{aligned}$$
and ⊗ denotes the usual Kronecker product where, for the matrices A = [a_{ij}] of dimension m × n and B of dimension p × q, we have A ⊗ B = [a_{ij}B] of dimension mp × nq.
Note here that each of the terms is an  m × 1  vector. We single out the k rows of these pertinent to  β ^  using the following notation:
$$d_{\beta,i_1} = S_\beta d_{i_1}, \quad d_{\beta,i_1 i_2} = S_\beta d_{i_1 i_2}, \quad d_{\beta,i_1 i_2 i_3} = S_\beta d_{i_1 i_2 i_3}, \quad A_{\beta,i_1 i_2}^{(s)} = S_\beta A_{i_1 i_2}^{(s)},\; s = 1, 2, \quad E_{\beta,i_1 i_2 i_3}^{(s)} = S_\beta E_{i_1 i_2 i_3}^{(s)},\; s = 1, 2, \dots, 6.$$
Note that each A_{β,i_1i_2}^{(s)} and E_{β,i_1i_2i_3}^{(s)} is constructed from the bottom k rows of the corresponding A_{i_1i_2}^{(s)} and E_{i_1i_2i_3}^{(s)}, which depend on the entire vector d_i and its derivatives with respect to θ.
Defining
$$a_{\beta,1/2} = \frac{1}{N}\sum_{i_1} d_{\beta,i_1}, \qquad a_{\beta,1} = \frac{1}{N^2}\sum_{i_1}\sum_{i_2} d_{\beta,i_1 i_2}, \qquad a_{\beta,3/2} = \frac{1}{N^3}\sum_{i_1}\sum_{i_2}\sum_{i_3} d_{\beta,i_1 i_2 i_3},$$
we see that the stochastic expansion for  β ^  can thus be written as
$$\begin{aligned}
\hat\beta - \beta_0 &= a_{\beta,1/2} + o_P(N^{-1/2}) \\
&= a_{\beta,1/2} + a_{\beta,1} + o_P(N^{-1}) \\
&= a_{\beta,1/2} + a_{\beta,1} + a_{\beta,3/2} + o_P(N^{-3/2}).
\end{aligned}$$
These expressions are then used in order to obtain the approximate moments of  β ^ .
Recall also the manner in which derivatives are defined recursively using this notation, so we can also decompose matrices of mixed derivatives in a similar manner. In many cases, models will be such that certain subsets of derivatives are zero. In these cases, and particularly in the cases of the GMM and GEL estimators we consider in future work, it will be convenient to decompose the derivatives into sub-blocks. With respect to the first-order derivatives, we see immediately that we can write the Jacobian of  q i ( θ )  as
$$q_i^{(\theta)}(\theta) = q_i^{(\eta)}(\theta)\, S_\eta + q_i^{(\beta)}(\theta)\, S_\beta.$$
With respect to the second-order derivatives, we have the following:
$$\begin{aligned}
q_i^{(\theta\theta)}(\theta) &= q_i^{(\eta\eta)}(\theta)(S_\eta\otimes S_\eta) + q_i^{(\eta\beta)}(\theta)(S_\eta\otimes S_\beta) + q_i^{(\beta\eta)}(\theta)(S_\beta\otimes S_\eta) + q_i^{(\beta\beta)}(\theta)(S_\beta\otimes S_\beta) \\
&= q_i^{(\eta\eta)}(\theta)\, S_\eta^{\otimes 2} + q_i^{(\eta\beta)}(\theta)(S_\eta\otimes S_\beta) + q_i^{(\beta\eta)}(\theta)(S_\beta\otimes S_\eta) + q_i^{(\beta\beta)}(\theta)\, S_\beta^{\otimes 2}.
\end{aligned}$$
Similarly, with respect to the third-order derivatives we have the following:
$$\begin{aligned}
q_i^{(\theta\theta\theta)}(\theta) ={}& q_i^{(\eta\eta\eta)}(\theta)\, S_\eta^{\otimes 3} + q_i^{(\eta\eta\beta)}(\theta)(S_\eta^{\otimes 2}\otimes S_\beta) + q_i^{(\eta\beta\eta)}(\theta)(S_\eta\otimes S_\beta\otimes S_\eta) + q_i^{(\eta\beta\beta)}(\theta)(S_\eta\otimes S_\beta^{\otimes 2}) \\
&+ q_i^{(\beta\eta\eta)}(\theta)(S_\beta\otimes S_\eta^{\otimes 2}) + q_i^{(\beta\eta\beta)}(\theta)(S_\beta\otimes S_\eta\otimes S_\beta) + q_i^{(\beta\beta\eta)}(\theta)(S_\beta^{\otimes 2}\otimes S_\eta) + q_i^{(\beta\beta\beta)}(\theta)\, S_\beta^{\otimes 3}.
\end{aligned}$$
These expressions may seem overly long. However, in many situations, blocks of the cross-derivatives are equal to zero (this may hold, e.g., at the true and/or expected values and/or in deviations from expectations). Also, these matrices are often post-multiplied by a block diagonal matrix, which will automatically nullify the resulting product. A formal proof that these matrices can be decomposed in this way can be obtained by application of the product rule of calculus.
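The reassembly identities above are purely algebraic, so they can be checked mechanically. A minimal sketch (ours; arbitrary matrices stand in for the actual derivative matrices) verifies the first-order decomposition and its Kronecker analogue one level up:

```python
import numpy as np

rng = np.random.default_rng(1)
l, k = 2, 2
m = l + k
S_eta = np.hstack([np.eye(l), np.zeros((l, k))])
S_beta = np.hstack([np.zeros((k, l)), np.eye(k)])

# A placeholder for a Jacobian q_i^(theta); any m x m matrix works, the identity is algebraic.
J = rng.normal(size=(m, m))
J_eta = J @ S_eta.T           # columns of J corresponding to eta, i.e. q_i^(eta)
J_beta = J @ S_beta.T         # columns of J corresponding to beta, i.e. q_i^(beta)
assert np.allclose(J, J_eta @ S_eta + J_beta @ S_beta)

# One Kronecker level up: with H an m x m^2 matrix of second derivatives,
# H = sum over (a, b) of H_ab (S_a kron S_b), where H_ab picks out the (a, b) block of columns.
H = rng.normal(size=(m, m * m))
blocks = [(S_eta, S_eta), (S_eta, S_beta), (S_beta, S_eta), (S_beta, S_beta)]
recon = sum(H @ np.kron(Sa, Sb).T @ np.kron(Sa, Sb) for Sa, Sb in blocks)
assert np.allclose(H, recon)
```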
As a practical matter, it is often simpler to compute derivatives by decomposing them into their, say,  η  and  β  (by inspection, many of these are often zero) components. Also note that it is often convenient to decompose the derivatives row-wise so that
$$q_i^{(\theta)}(\theta) = \begin{pmatrix} q_{\eta i}^{(\theta)}(\theta) \\ q_{\beta i}^{(\theta)}(\theta) \end{pmatrix}, \qquad
q_i^{(\theta\theta)}(\theta) = \begin{pmatrix} q_{\eta i}^{(\theta\theta)}(\theta) \\ q_{\beta i}^{(\theta\theta)}(\theta) \end{pmatrix}, \qquad
q_i^{(\theta\theta\theta)}(\theta) = \begin{pmatrix} q_{\eta i}^{(\theta\theta\theta)}(\theta) \\ q_{\beta i}^{(\theta\theta\theta)}(\theta) \end{pmatrix}.$$

4. Approximate Moments with Nuisance Parameters

Once we have obtained the stochastic expansions for the estimators of interest, we define the approximate moments for them in the same manner as we do the moments for the entire parameter, typically affixing a subscript  β . We can obtain the approximate moments in a couple of ways. One is by constructing these from the appropriate moments of the “marginal” influence functions  d β , i d β , i 1 , i 2 , and  d β , i 1 , i 2 , i 3 .
Alternatively, if we have obtained the approximate moment for  θ ^ , it may be possible to obtain that for  β ^  directly. Note that if we have obtained, say, the lth approximate (Kronecker) moment of  θ ^  using  E [ θ ^ j / 2 l ] , then we can obtain the lth approximate (Kronecker) moment of  β ^  using
$$E[\hat\beta_{j/2}^{\otimes l}] = E[(S_\beta \hat\theta_{j/2})^{\otimes l}] = S_\beta^{\otimes l}\, E[\hat\theta_{j/2}^{\otimes l}].$$
In each of these cases, there will often be some simplification possible. Note that the approximate bias for  β ^  is obtained as
$$\mathrm{Bias}_2[\hat\beta] = S_\beta\, \mathrm{Bias}_2[\hat\theta] = S_\beta\, E[\hat\theta_1].$$
The approximate second, third, and fourth (Kronecker) moments of  β ^  can be obtained from Equation (28) by setting, respectively,  j = 3 , l = 2 j = 2 , l = 3 ; and  j = 3 , l = 4 . In matrix form, the approximate MSE of  β ^  is obtained by evaluating
$$E[\hat\beta_{3/2}\hat\beta_{3/2}'] = E[S_\beta \hat\theta_{3/2}(S_\beta\hat\theta_{3/2})'] = S_\beta\, E[\hat\theta_{3/2}\hat\theta_{3/2}']\, S_\beta',$$
so that
$$\mathrm{MSE}_2[\hat\beta] = S_\beta\, \mathrm{MSE}_2[\hat\theta]\, S_\beta' = S_\beta\, E[\hat\theta_{3/2}\hat\theta_{3/2}']\, S_\beta',$$
retaining terms up to  O ( N 2 ) . In the illustrations that follow, we use a variety of these techniques.
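In code, extracting the β-components of the approximate moments amounts to pre- and post-multiplying by S_β. A minimal sketch with hypothetical bias and MSE values for θ̂ (the numbers are placeholders, not results from the paper):

```python
import numpy as np

l, k = 2, 1
S_beta = np.hstack([np.zeros((k, l)), np.eye(k)])

bias_theta = np.array([0.03, -0.01, 0.05])     # hypothetical Bias_2[theta-hat]
mse_theta = np.diag([0.20, 0.15, 0.30])        # hypothetical MSE_2[theta-hat]

bias_beta = S_beta @ bias_theta                # Bias_2[beta-hat]
mse_beta = S_beta @ mse_theta @ S_beta.T       # MSE_2[beta-hat]
print(bias_beta, mse_beta)                     # [0.05] [[0.3]]
```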

5. Illustrations

In this section, we examine in detail three examples. First, the case of OLS when we are only interested in a subset of the regression parameters. This is typically a case of one-step estimation: neither parameter can generally be estimated consistently without also estimating the other. The other two examples in this section are two-step estimators. In each case, we derive the first through fourth approximate moments. It is useful for researchers to examine these higher moments, including skewness and kurtosis, when conducting inferences as they can indicate when deviations from standard first-order results can impact on the conclusions of their inferences. For example, Hansen et al. [14] conducted simulations using estimators which fall into our framework (various GMM estimators). They documented extensive deviations from normality in the sampling distributions, notably in higher moments, of these estimators.

5.1. Ordinary Least Squares with Nuisance Parameters

We reconsider the OLS estimator of the linear regression model with the focus now on a subset of the parameters. We can obtain the relevant stochastic expansions and approximate moments for these in a variety of ways. Here, we achieve this in a manner to highlight the approach in this paper. To put this in the current context we modify the notation slightly. Again, we have a model and estimating equations that can be written as
y i = X i θ 0 + ϵ i , q i ( θ ) = X i ( y i X i θ ) .
Note that we have labeled the entire regression parameter  θ , which is  m × 1 , and partitions such that  θ = ( η , β ) , where  η  is the  l × 1  nuisance parameter and  β  is the  k × 1  parameter of interest. Denote the OLS estimators as  θ ^ = ( η ^ , β ^ ) . Denote the influence functions for  θ ^  as  d i d i 1 i 2 , and  d i 1 i 2 i 3 . The influence functions for  β ^  are obtained simply by pre-multiplying the influence functions for  θ ^  by  S β :
$$\begin{aligned}
d_{\beta,i} &= S_\beta d_i = S_\beta V_X^{-1} X_i \epsilon_i, \\
d_{\beta,i_1 i_2} &= S_\beta d_{i_1 i_2} = -S_\beta V_X^{-1} W_{i_1} V_X^{-1} X_{i_2}\, \epsilon_{i_2}, \\
d_{\beta,i_1 i_2 i_3} &= S_\beta d_{i_1 i_2 i_3} = S_\beta V_X^{-1} W_{i_1} V_X^{-1} W_{i_2} V_X^{-1} X_{i_3}\, \epsilon_{i_3},
\end{aligned}$$
where V_X = E[X_1X_1′] and W_i = X_iX_i′ − V_X. We note the following, some of which are very familiar as they arise extensively with the linear regression model. We also see here that
$$\Omega = \sigma^2 \begin{pmatrix} E[X_{\eta,1}X_{\eta,1}'] & E[X_{\eta,1}X_{\beta,1}'] \\ E[X_{\beta,1}X_{\eta,1}'] & E[X_{\beta,1}X_{\beta,1}'] \end{pmatrix} = \sigma^2 V_X.$$
The approximate bias (and exact bias) for  β ^  is of course zero, noting that  E [ d β , 11 ] = S β E [ d 11 ] = 0 .
The decomposition of  V X 1  will be familiar to most readers. Using the partitioned inverse formula we have
$$V_X^{-1} = \begin{pmatrix} V_{\eta\eta} & V_{\eta\beta} \\ V_{\beta\eta} & V_{\beta\beta} \end{pmatrix}$$
$$\begin{aligned}
V_{\eta\eta} &= E[X_{\eta,1}X_{\eta,1}']^{-1} + E[X_{\eta,1}X_{\eta,1}']^{-1} E[X_{\eta,1}X_{\beta,1}']\, V_{\beta\beta}\, E[X_{\beta,1}X_{\eta,1}']\, E[X_{\eta,1}X_{\eta,1}']^{-1} \\
V_{\eta\beta} &= -E[X_{\eta,1}X_{\eta,1}']^{-1} E[X_{\eta,1}X_{\beta,1}']\, V_{\beta\beta} \\
V_{\beta\eta} &= -V_{\beta\beta}\, E[X_{\beta,1}X_{\eta,1}']\, E[X_{\eta,1}X_{\eta,1}']^{-1} \\
V_{\beta\beta} &= \big(E[X_{\beta,1}X_{\beta,1}'] - E[X_{\beta,1}X_{\eta,1}']\, E[X_{\eta,1}X_{\eta,1}']^{-1}\, E[X_{\eta,1}X_{\beta,1}']\big)^{-1}.
\end{aligned}$$
The usual first-order asymptotic variance for  θ ^  is given by
V 1 = E [ d 1 d 1 ] = V X 1 Ω V X 1 = σ 2 V X 1 .
and that for  β ^  is
V β , 1 = S β V 1 S β = σ 2 V β β
Let  X i * = V X 1 / 2 X i . With respect to the approximate MSE, we now obtain the approximate MSE of  β ^  by pre-multiplying the standard formula by  S β  and post-multiplying by  S β . In this way, we see that the approximate MSE of  β ^  can be written as
$$\mathrm{MSE}_{\beta,2}[\hat\beta] = \frac{1}{N}\, S_\beta V_1 S_\beta' + \frac{1}{N^2}\, S_\beta \Big( V_1^{1/2}\, E\big[(X_1^* X_1^{*\prime})^2\big]\, V_1^{1/2} - V_1 \Big) S_\beta'.$$
As a special case, consider when  X i N ( 0 , V X ) . Then,  X 1 * N ( 0 , I m ) . The second and fourth moments of the pth element of  X 1 * , say  X 1 p * , are 1 and 3. The  p q th element of  S = ( X 1 * X 1 * ) 2 = X 1 * ( X 1 * X 1 * ) X 1 *  can be written as
$$S_{pq} = X_{1p}^* X_{1q}^* \sum_{r=1}^{m} X_{1r}^{*\,2}.$$
For  p q E [ S p q ] = 0 . For  p = q  we have
$$E[S_{pp}] = E\Big[X_{1p}^{*\,2}\Big(X_{1p}^{*\,2} + \sum_{r\ne p} X_{1r}^{*\,2}\Big)\Big] = 3 + (m - 1) = m + 2,$$
so that  E [ ( X 1 * X 1 * ) 2 ] = ( m + 2 ) I m  and
$$\mathrm{MSE}_{\beta,2}[\hat\beta] = \frac{1}{N}\Big(1 + \frac{m+1}{N}\Big)\, S_\beta V_1 S_\beta' = \frac{1}{N}\Big(1 + \frac{m+1}{N}\Big)\, V_{\beta,1}.$$
Note here the presence of m in the second-order term.
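A quick Monte Carlo check of this special case (our own sketch, not a simulation from the paper): with Gaussian regressors, the finite-sample MSE of the OLS coefficients should exceed the first-order variance by roughly the factor 1 + (m + 1)/N.

```python
import numpy as np

rng = np.random.default_rng(42)
N, m, reps = 50, 4, 20000
sigma2 = 1.0
V_X = np.eye(m)
theta0 = np.zeros(m)

err = np.zeros(m)
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(m), V_X, size=N)
    y = X @ theta0 + rng.normal(scale=np.sqrt(sigma2), size=N)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    err += (theta_hat - theta0) ** 2
mc_mse = err / reps                                  # Monte Carlo MSE, element-wise

V_1 = sigma2 * np.linalg.inv(V_X)                    # first-order variance
approx_mse = (1 / N) * (1 + (m + 1) / N) * np.diag(V_1)
print(mc_mse)       # roughly 0.022 per coefficient
print(approx_mse)   # 0.022 = (1/50) * (1 + 5/50) * 1.0
```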
With regards to the approximate skewness of  β ^ , it follows from Equation (28) that
Skew 2 [ β ^ ] = S β 3 Skew 2 [ θ ^ ] = 1 N 2 ( S β V X 1 ) 3 E [ X 1 3 ] E [ ϵ 1 3 ] .
Note the presence of  S β V X 1 . Typically, this does not simplify. The notable case in which it does is when  X β , 1  and  X η , 1  are orthogonal and  E [ X β , 1 X η , 1 ] = 0 . Then,  S β V X 1 = 0 E [ X β , 1 X β , 1 ] 1  and
Skew 2 [ β ^ ] = 1 N 2 ( E [ X β , 1 X β , 1 ] 1 ) 3 E [ X β , 1 3 ] E [ ϵ 1 3 ] .
To obtain the approximate kurtosis of  β ^ , it follows from Equation (28) that
Kurt 2 β ^ = S β 4 Kurt 2 θ ^ = 1 N 3 S β 4 V X 1 4 E [ X 1 4 ] E [ ϵ 1 4 ] 3 V 1 2 + 6 [ E [ V 1 1 / 2 W 1 * 2 V 1 1 / 2 ] V 1 = 1 N 3 ( ( S β V X 1 ) 4 E [ X 1 4 ] E [ ϵ 1 4 ] 3 V β , 1 2 + 6 S β 2 [ E [ V 1 1 / 2 W 1 * 2 V 1 1 / 2 ] V β , 1 )
As we saw with the approximate MSE and skewness, we can verify that this corresponds to the results with the full OLS estimator. This is simply achieved here and above by setting  m = k  and  S β = I m .
We now consider examples of estimation of the parameter of interest with two-step estimators.

5.2. Regression-Variance Problem

A simple illustration with two-step estimators is the estimation of the variance of a linear regression model. The first step estimates the regression parameters and these are then used in the second step to estimate the variance. This example is useful pedagogically and it also serves as an important segue into other very common and more complex models which use regression coefficient estimates from a first stage. These include various two-stage least squares estimators which we examine, also weighted and generalized least squares estimators, where OLS estimates are used in a first stage and then used to estimate weights in a second stage. Generated regressor models and treatment effects parameters also fall into this category. Before going on to those extensions, however, the focus here is on the second moment of the random variable,  ϵ i :
β 0 = E [ ϵ i 2 ]
where  ϵ i  is the residual from the regression model
y i = X i η 0 + ϵ i E [ X i ϵ i ] = 0
and  X i  is  l × 1 . Framing this as a two-step estimator, we treat  η  as the first step or “nuisance” parameter and  β  as the second step or parameter “of interest”. The joint estimating equations are given by
$$q_i(\theta) = \begin{pmatrix} q_{\eta,i} \\ q_{\beta,i} \end{pmatrix} = \begin{pmatrix} X_i(y_i - X_i'\eta) \\ (y_i - X_i'\eta)^2 - \beta \end{pmatrix}, \qquad q_i = \begin{pmatrix} X_i\epsilon_i \\ \mu_i \end{pmatrix},$$
where μ_i = (y_i − X_i′η_0)² − β_0. Put V_X = E[X_1X_1′] and W_i = X_iX_i′ − V_X.
q i ( θ ) = X i X i 0 l × 1 2 ( y i η ) X i 1 , q ¯ 1 ( θ ) = V X 0 l × 1 0 1 × l 1 ,
q ˜ i ( θ ) = W i 0 l × 1 2 ϵ i X i 0 1 × l q i ( θ θ ) = 0 l × l 2 0 0 0 2 X i 2 0 0 0 , q ¯ 1 ( θ θ ) = 0 l × l 2 0 0 0 2 V X 0 0 0
Noting that the derivative matrices of  q i q ˜ i , and  q ¯ 1  are quite sparse, it is useful to use the partitioned matrix notation to write
$$q_i^{(\theta)} = q_i^{(\eta)} S_\eta + q_i^{(\beta)} S_\beta,$$
q i ( η ) = X i X i 2 ( y i X i η ) X i , q i ( β ) = 0 l × 1 1
This partitioning becomes particularly useful with higher-order derivatives, here and with more complex models, whence we see immediately that
q i ( θ θ ) = q i ( η η ) S η 2 , q ˜ i ( θ θ ) = q ˜ i ( η η ) S η 2 , q ¯ 1 ( θ θ ) = q ¯ 1 ( η η ) S η 2 q i ( η η ) = 0 l × l 2 2 X i 2 , q ˜ i ( η η ) = 0 l × l 2 2 W i , q ¯ 1 ( η η ) = 0 l × l 2 2 V X
At the end of the paper we show that the influence functions for  β ^  are given by
$$d_{\beta,i} = \mu_i, \qquad d_{\beta,i_1 i_2} = -X_{i_1}' V_X^{-1} X_{i_2}\, \epsilon_{i_1}\epsilon_{i_2}, \qquad d_{\beta,i_1 i_2 i_3} = X_{i_2}' V_X^{-1} W_{i_1} V_X^{-1} X_{i_3}\, \epsilon_{i_2}\epsilon_{i_3}.$$
These are then used to obtain the approximate moments of  β ^ .
The approximate moments are derived at the end of the paper. There, we see that the second-order bias is
$$\mathrm{Bias}_2[\hat\beta] = -\frac{1}{N}\, l\,\beta_0.$$
We note that the second-order bias is the usual −lβ_0/N, which is an alternative way of obtaining the result that the method-of-moments estimator of the variance in a linear regression model has expected value β_0(N − l)/N.
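The following sketch (ours, assuming a classical homoskedastic linear model) illustrates the −lβ_0/N bias of the two-step variance estimator and the familiar degrees-of-freedom correction that removes it:

```python
import numpy as np

rng = np.random.default_rng(7)
N, l, beta0 = 40, 3, 2.0                     # beta0 = error variance
reps = 20000
est_mm, est_corrected = 0.0, 0.0
for _ in range(reps):
    X = rng.normal(size=(N, l))
    eps = rng.normal(scale=np.sqrt(beta0), size=N)
    y = X @ np.ones(l) + eps
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    est_mm += resid @ resid / N              # two-step method-of-moments estimator
    est_corrected += resid @ resid / (N - l) # degrees-of-freedom corrected estimator
print(est_mm / reps)          # close to beta0 * (N - l) / N = 1.85
print(est_corrected / reps)   # close to beta0 = 2.0
```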
The components of the approximate MSE are as follows.
$$V_{\beta,1} = E[d_{\beta,1}^2] = E[\mu_1^2], \qquad V_{\beta,21} = -l\, E[\mu_1^2], \qquad V_{\beta,22} = l^2\beta_0^2 + 2l\beta_0^2, \qquad V_{\beta,23} = 0,$$
and the approximate MSE is given by
$$\mathrm{MSE}_2[\hat\beta] = \frac{1}{N}\, V_{\beta,1} + \frac{1}{N^2}\big(2V_{\beta,21} + V_{\beta,22} + 2V_{\beta,23}\big) = \frac{1}{N}\, E[\mu_1^2] + \frac{1}{N^2}\big(-2l\, E[\mu_1^2] + l^2\beta_0^2 + 2l\beta_0^2\big)$$
The approximate skewness is given by
Skew 2 [ β ^ ] = S β , 1 + 3 S β , 2
where
S β , 1 = E [ μ 1 3 ] , S β , 2 = l V β , 1 β 0 2 E [ ϵ 1 3 ] 2 E [ X 1 ] V X 1 E [ X 1 ]
The approximate kurtosis is
Kurt 2 [ β ^ ] = 1 N 3 K β , 1 + 4 K β , 2 + 4 K β , 3 + 6 K β , 4
with
K β , 1 = E [ μ 1 4 ] 3 V β , 1 2 K β , 2 = 3 l E [ ϵ 1 2 μ 1 ] E [ μ 1 2 ] l β 0 E [ μ 1 3 ] 6 E [ ϵ 1 μ 1 2 ] E [ ϵ 1 3 ] E [ X 1 ] V X 1 E [ X 1 ] K β , 3 = 0 K β , 4 = ( l 2 + 2 l ) E [ μ 1 2 ] β 0 2 + ( 4 l + 8 ) β 0 E [ μ 1 ϵ 1 ] 2 E [ X 1 ] V X 1 E [ X 1 ]
Note that, under symmetry, some terms simplify:  E [ μ 1 ϵ 1 ] = 0 = E [ ϵ 2 μ 2 2 ] . Also, when  X i = 1 , the approximate moments correspond to those of the sample variance estimator.

5.3. Two-Stage Least Squares

The 2SLS estimator of a linear model with endogenous regressors fits nicely within this framework. For simplicity, we consider the case where there is only one (endogenous) variable:
y i = Y i β 0 + ϵ i
where  ϵ i  is possibly correlated with  Y i . There is an  l × 1  “instrument”  X i  available, with
Y i = π 0 X i + η i
such that  E [ X i ( y i Y i β ) ] = 0  if  β = β 0 , the “true” value of the parameter. We use  π  to denote the nuisance parameter here, with
θ = π β .
Let θ̂ = (π̂′, β̂′)′ solve the sample moments:
1 N q i ( θ ^ ) = 0
where
$$q_i(\theta) = \begin{pmatrix} q_{\pi,i}(\theta) \\ q_{\beta,i}(\theta) \end{pmatrix} = \begin{pmatrix} X_i(Y_i - \pi' X_i) \\ \pi' X_i (y_i - Y_i\beta) \end{pmatrix}, \qquad q_i = \begin{pmatrix} X_i \eta_i \\ g_i \epsilon_i \end{pmatrix},$$
and
g i = π 0 X i .
β ^  is the usual 2SLS estimator (in this case also an instrumental variables, or IV, estimator). The properties of  π ^  are those of the usual OLS estimator from the regression of  Y i  on  X i . Note that l here denotes the dimension of  π , the number of instruments. l factors into the approximate moments of the 2SLS estimator as it did with the moments of the variance estimator associated with a regression model. Let
σ ϵ 2 = E [ ϵ 1 2 ] , σ η 2 = E [ η 1 2 ] , ϱ = E [ ϵ 1 η 1 ] , V X = E [ X 1 X 1 ] , V g = E [ g 1 2 ] = π 0 V X π 0 .
For simplicity, we suppose the  X i s are independent of  ϵ i , η i . (This assumption can be relaxed by making moments of the disturbances conditional on the  X i s, in which case the results would then be modified with terms such as  σ ϵ 2 V X  changed to  E [ E [ ϵ i 2 | X i ] X i X i ] , and so on. This makes the derivations substantially messier without providing much more insight.) We see that
$$\Omega = E[q_1 q_1'] = \begin{pmatrix} \sigma_\eta^2 E[X_1X_1'] & \varrho\, E[X_1 g_1] \\ \varrho\, E[g_1 X_1'] & \sigma_\epsilon^2 E[g_1^2] \end{pmatrix} = \begin{pmatrix} \sigma_\eta^2 V_X & \varrho\, V_X \pi_0 \\ \varrho\, \pi_0' V_X & \sigma_\epsilon^2\, \pi_0' V_X \pi_0 \end{pmatrix}.$$
We place many of the derivations at the end of the paper. Note that the influence functions are based on the first three derivatives of  q i ( θ )  (evaluated at  θ 0 ):
q i ( π ) ( θ ) = X i X i X i ( y i Y i β ) , q i ( β ) ( θ ) = 0 l × 1 π X i Y i
so that  q i ( θ ) = q i ( π ) S π + q i ( β ) S β , i.e.,
q i ( θ ) = X i X i 0 X i ϵ i g i Y i , q ¯ 1 ( θ ) = V X 0 0 V g , Q = V X 1 0 0 V g 1
We have  q i ( π π ) = 0  and  q i ( β β ) = 0 :
q i ( π β ) = q i ( β π ) = 0 l × l X i Y i ,
so that
q i ( θ θ ) = 0 l × l X i Y i ( S π S β ) + ( S β S π )
and  q i ( θ θ θ ) ( θ ) = 0 .
The influence function for  β ^  is
d β , i = S β Q q i = V g 1 g i ϵ i
Note that the first-order asymptotic variance for  β ^  is
V β , 1 = E [ d β , 1 2 ] = σ ϵ 2 V g 1 .
We derive the higher-order influence functions and approximate moments at the end of the paper. Some additional notation facilitates the presentation and also permits some additional intuition along the way. For a random variable e_i, put ė_i = e_i / V_g^{1/2}. Also, denote the scaled second moments: σ̇_ϵ² = E[ϵ̇_1²], σ̇_η² = E[η̇_1²], and ϱ̇ = E[ϵ̇_1 η̇_1]. We note in advance that the approximate moments of β̂ can all be written in terms of these (and third and fourth) scaled moments. Scaling streamlines and simplifies the notation.
Scaling the moments also underlines one of the fundamental problems in 2SLS estimation: weak instruments. Since each of these moments is being scaled by multiples of  1 / V g  and  V g = π V X π , if  π 0 , as is the case with weak instruments, each of the approximate moments (not just the variance) will be large in absolute value. This will make traditional corrections based directly on approximate moments, including bias and variance corrections and Edgeworth and similar analytical approximations, unreliable. Also, to the extent that resampling-based methods such as bootstrap and jackknife are seen as mimicking the properties of analytical corrections, these too will be unreliable. This is to say that the methods explored here, while providing insight into the weak instruments problem, may not be the best approach for inference in that context.
For the 2SLS estimator, the terms entering into the approximate bias are given by
$$B_{\beta,21} = (l - 1)\,\dot\varrho, \qquad B_{\beta,22} = -\dot\varrho,$$
so that using the standard formula,
$$\mathrm{Bias}_2[\hat\beta] = \frac{1}{N}\big(B_{\beta,21} + B_{\beta,22}\big) = (l - 2)\,\frac{1}{N}\,\frac{\varrho}{V_g}.$$
This result has been known since at least Nagar’s [3] paper. We note its dependence on (a), the degree of endogeneity as measured by the correlation  ϱ  between the two equations in the model; (b) the number of instruments, l; (c) the strength of the instruments as measured by  V g = π 0 V X π 0 ; and (d) the sample size. We will see that each of these factors impacts on the other approximate moments. The components of the approximate bias are trivially estimated from the corresponding sample moments: the cross-moment of the residuals from the structural equation and reduced-form equation for the endogenous variable.
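As a concrete illustration, a sample-analogue bias correction along these lines might be coded as follows (a sketch under the single-endogenous-regressor setup of this section; the function and variable names are ours, not the paper's):

```python
import numpy as np

def approx_2sls_bias(X, Y, y, beta_hat, pi_hat):
    """Sample analogue of the Nagar-type approximate bias (l - 2) * rho / (N * V_g).
    X is the N x l instrument matrix, Y the endogenous regressor, y the outcome."""
    N, l = X.shape
    eps_hat = y - Y * beta_hat            # structural residuals
    eta_hat = Y - X @ pi_hat              # reduced-form residuals
    rho_hat = np.mean(eps_hat * eta_hat)  # cross-moment of the two sets of residuals
    Vg_hat = pi_hat @ (X.T @ X / N) @ pi_hat
    return (l - 2) * rho_hat / (N * Vg_hat)

# Usage (sketch): beta_corrected = beta_2sls - approx_2sls_bias(X, Y, y, beta_2sls, pi_ols)
```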
The approximate MSE is given by
MSE 2 [ β ^ ] = 1 N V β , 1 + 1 N 2 ( 2 V β , 21 + V β , 23 + 2 V β , 22 )
with
$$V_{\beta,1} = \dot\sigma_\epsilon^2 = \frac{\sigma_\epsilon^2}{V_g},$$
being the usual first-order variance for this model, and the higher-order terms are given by
V β , 21 = E [ ϵ ˙ 1 2 η ˙ 1 ] ( E [ g ˙ 1 X 1 V X 1 X 1 ] 2 E [ g ˙ 1 3 ] ) V β , 1 Var [ g ˙ 1 2 ] , V β , 22 = ( l 2 ) σ ˙ ϵ 2 σ ˙ η 2 2 ( l 2 ) ϱ ˙ 2 + V β , 1 Var [ g ^ 1 2 ] V β , 23 = B β , 2 2 + l ( σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 ) + V β , 1 Var [ g ˙ 1 2 ] .
We note a few things here. Under the cross-symmetry condition (we use the term “cross-symmetry” to underscore that there is no assumption that, say, E[ϵ_1³] = 0 or E[η_1³] = 0) that E[ϵ̇_1²η̇_1] = 0, the first expression in V_{β,21} drops out. Second, the Var[ġ_1²] terms in V_{β,21} and V_{β,22} cancel, leaving one V_{β,1}Var[ġ_1²] term in the approximate MSE. This term may be attributed to the randomness of the X_i’s: if the regressors are treated as “fixed”, they have no variance and the term drops out (the same holds if we derive results conditional on the X_i’s). Rothenberg [15] derived an expression for the approximate MSE under normality and fixed regressors. The expression here for the MSE conforms with his under cross-symmetry and fixed regressors. It is useful to note, therefore, that the second-order variance result in Rothenberg has robustness against departures from normality, so long as the departures satisfy the cross-symmetry condition.
The approximate skewness is given by
Skew 2 [ β ^ ] = S β , 1 + 3 S β , 2
with
S β , 1 = E [ g ˙ 1 3 ] E [ ϵ ˙ 1 3 ] , S β , 2 = V β , 1 B β , 2 2 σ ˙ ϵ 2 ϱ ˙ .
Note that the first term is simply the third moment of the influence function, which can be simply estimated from sample moments. The other terms are comprised of quantities which already occur in the first-order variance and second-order bias. Hence, estimating skewness is a trivial additional step when conducting standard inferences.
The approximate kurtosis is
Kurt 2 [ β ^ ] = 1 N 3 K β , 1 + 4 K β , 2 + 4 K β , 3 + 6 K β , 4
with
K β , 1 = Var [ g ˙ 1 2 ] E [ ϵ ˙ 1 4 ] + ( E [ ϵ ˙ 1 4 ] 3 ( σ ˙ ϵ 2 ) 2 ) K β , 2 = S β , 1 B β , 2 + 3 ( σ ˙ ϵ 2 V β , 21 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 S β , 1 ϱ ˙ Var [ g ˙ 1 2 ] σ ˙ ϵ 2 ϱ ˙ ) K β , 3 = 3 V β , 1 ( V β , 22 + 2 ρ ˙ 2 ) K β , 4 = V β , 1 ( V β , 23 4 ϱ ˙ B β , 2 + 2 σ ˙ ϵ 2 σ ˙ η 2 + 6 ϱ ˙ 2 )
Note that all but one of the terms in the approximate kurtosis appear in the lower-order approximate moments. The only other term,  E [ ϵ ˙ 1 4 ] , can be simply estimated by its empirical analogue of the residuals.

6. An Alternative Approach

In this section, we outline an alternative approach which can be used in the case of two-step estimation. This alternative method is useful if we wish to remain agnostic with respect to the manner in which the nuisance parameter is estimated. This may be seen as an advantage, although it begs the question as to the provenance of the first-step estimator.
The setup in this case is as follows. The parameter of interest is the solution to the subset of moment conditions applying to the estimation of the parameters of interest. Let
ψ β ( η , β ) = 1 N q β , i ( θ )
q β , i ( θ ) = q ( Z i ; θ ) , θ = ( η , β )
where  E [ q β , i ( θ ) ] = 0  only at  θ = θ 0 = ( η 0 , β 0 ) . The estimator,  β ^ , is the solution to the  k × 1  set of equations  ψ β ( η ^ , β ^ ) = 0 . This may appear to be the same setup as previously, and to all intents and purposes it is, except we simply make additional assumptions on  η ^  and on  q β , i ( θ )  in some neighborhood of  θ 0 . If we were simply interested in consistency, we might, e.g., assume that  η ^ p η 0 . To obtain first-order asymptotic results we may additionally assume that
η ^ η 0 = c 1 / 2 + o P ( N 1 / 2 ) , c 1 / 2 = 1 N W i
where the W_i are mean-zero, independent and identically distributed vectors or matrices, and that we are able to perform a linearization of ψ_β(η̂, β) uniformly in some neighborhood of θ_0. For first-order asymptotics, this may be expedient. For example, if we have established the consistency of η̂ and β̂, we can often write
$$\begin{aligned}
0 = \psi_\beta(\hat\eta, \hat\beta) &= \psi_\beta(\eta_0, \hat\beta) + F_0\, c_{1/2} + o_P(N^{-1/2}) \\
&= \psi_\beta(\eta_0, \beta_0) + \psi_\beta^{(\beta)}(\eta_0, \beta_0)(\hat\beta - \beta_0) + F_0\, c_{1/2} + o_P(N^{-1/2}),
\end{aligned}$$
where  F 0  is a constant  k × l  matrix which leads to the first-order approximation
β ^ β 0 = 1 N d i + o P ( N 1 / 2 )
where
d i = Q ( q i + F 0 W i ) , Q = ( q ¯ β , 1 ( β ) ( η 0 , β 0 ) ) 1 , q i = q i ( η 0 , β 0 )
Under standard regularity conditions  N ( β ^ β 0 ) d N ( 0 , V ) , where
$$V = E[d_1 d_1'] = Q\big( E[q_i q_i'] + E[q_i W_i'] F_0' + F_0 E[W_i q_i'] + F_0 E[W_i W_i'] F_0' \big) Q'.$$
The exact form of V depends on the variance of  W i , the correlation of  W i  with  q i , and the form of  F 0 . To say anything meaningful, one needs to be more specific about  W i , which typically requires a more complete statement regarding how  η ^  was obtained. As Wooldridge [13] points out, this generally results in a statement corresponding to a set of moment conditions (first-order conditions) such as we started out with in the second section. Wooldridge has a good summary of the issues involved. Typically,  F 0  is defined as the limiting value of the derivative:  ψ β ( η ) ( η , β )  evaluated at  η 0 , β 0 F 0 = q ¯ β , 1 ( η ) . Also, since  η ^  can typically be framed as the solution to a set of l estimating equations:  1 N r i ( η ^ ) = 0 , then  W i  can be thought of as the influence function for  η ^ W i = ( E [ r i ( η ) ( η ) ] ) 1 r i  if we are willing to assume standard regularity conditions hold. We thus have
F 0 = q ¯ β , 1 ( η ) ( η 0 , β 0 ) W i = ( E [ r i ( η ) ( η 0 ) ] ) 1 r i
Now, when we substitute these expressions into V, we see that, with   r i = q η , i , we obtain the same expression for the variance as in Equation (13). Thus, from a practical perspective, nothing is gained in the models we examine by using a generic approach, at least from the perspective of first-order asymptotics.
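For completeness, a sketch of how the two-step variance V would be assembled from estimated pieces (sample averages replace expectations; the function and its argument names are our own labels, not the paper's):

```python
import numpy as np

def two_step_variance(q, W, F0, Qmat):
    """Assemble V = Q (E[qq'] + E[qW']F0' + F0 E[Wq'] + F0 E[WW']F0') Q'.
    q (N x k) and W (N x l) hold sample influence-function values;
    F0 is k x l and Qmat is k x k."""
    N = q.shape[0]
    Eqq = q.T @ q / N
    EqW = q.T @ W / N
    EWW = W.T @ W / N
    middle = Eqq + EqW @ F0.T + F0 @ EqW.T + F0 @ EWW @ F0.T
    return Qmat @ middle @ Qmat.T
```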
In terms of conducting higher-order asymptotics, it is possible to obtain “generic” higher-order stochastic expansions in the following way. First, assume that we have expansions η̂ − η_0 = η̂_j + O_P(N^{-(j+1)/2}):
$$\hat\eta_j = c_{1/2} + \cdots + c_{j/2}$$
where each  c r / 2  is an rth-order V-statistic,  c 1 / 2  is as above, and
c 1 = 1 N 2 i 1 i 2 W i 1 i 2 , c 3 / 2 = 1 N 3 i 1 i 2 i 3 W i 1 i 2 i 3
Second, write down a higher-order Taylor series expansion of ψ(η̂, β̂) as a function of η̂ and β̂, iteratively solving for β̂ − β_0.
Third, replace those terms in the Taylor series containing η̂ − η_0 with the approximations in Equation (90). This results in a generic stochastic expansion. Some explicit results can be teased out of this once one is more specific about the nature of the W’s. Newey and Smith [2] use this approach to obtain the bias of GMM estimators. We find that it becomes more difficult to examine higher-order moments in this manner, and it is simpler to specify the first step as a method-of-moments estimator. The method here for two-step estimation is naturally extended to three- or more-step estimation, which permits examination of iterated estimators, as are common with GMM estimation.

7. Numerical Illustration

We provide a brief numerical illustration based on the 2SLS framework in Section 5.3. The 2SLS estimator in the case examined here is also an IV estimator. Variations of the bias result in Equation (75) were obtained by Nagar [3] for fixed X_i’s and Rilstone et al. [5] for random X_i’s. Nagar’s result has the sample analog, with (1/N)ΣX_i² rather than E[X_i²], say, appearing in the denominator. The importance of this simple result and others related to it cannot be over-emphasized. The sampling properties of 2SLS/IV estimators, and the difference of these from those predicted by first-order asymptotic theory in different contexts, have been examined extensively in econometrics. Note that the approximate bias in the model is a function of (i) the degree of endogeneity, measured by ϱ; and (ii) the explanatory power of the instruments, as measured by V_g = π_0′V_Xπ_0 (low values of π_0′V_Xπ_0 correspond to “weak” instruments), as well as the number of instruments.
With respect to the first problem, the methods explored here can perform well in correcting bias and other moments. Numerous studies have performed detailed simulations to examine the performance of bias approximations. To illustrate the approximate bias, we conducted simulations following the basic 2SLS/IV framework with one instrument, X_i. (Note that this is for illustrative purposes only. Moments such as the actual bias may not exist, and the approximate moments of, say, β̂ are based on those of the random variable given by the stochastic expansion, which is close in the probabilistic sense to β̂.) We drew N = 50 observations with the X_i’s from the fixed design X_i = 1 + i/N, the η_i from N(0, σ_η²), and ϵ_i = ρη_i + u_i, with u_i ~ N(0, σ_u²), σ_ϵ = 1, σ_η = 1, β_0 = 0, and π_0 = 1. We varied the measure of endogeneity, ρ, as per the table. The table reports the average over M = 10,000 replications of the usual 2SLS/IV estimator; the corrected version, using the analytic formula; and a bias-corrected version, where actual moments have been replaced with sample moments. We see that both bias corrections can yield substantial improvements in the point estimates. We also provide plots of the standardized (estimates are divided by the asymptotic standard error) distribution of these estimates (these densities are actually kernel estimates using a standard normal kernel and window width γ = M^{-1/5}σ_β, σ_β² = Var[β̂]) for the case with ρ = 1 (plots of the corresponding densities of the 2SLS/IV estimators with values of ρ = 0.5, 0.75 are similar) against N(0, 1).
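A sketch of this Monte Carlo design (our implementation; details not stated in the text, such as the seed and the exact feasible correction, are our own choices):

```python
import numpy as np

rng = np.random.default_rng(123)
N, M, rho = 50, 10_000, 1.0
l = 1                                                 # one instrument
X = 1.0 + np.arange(1, N + 1) / N                     # fixed design X_i = 1 + i/N
beta0, pi0 = 0.0, 1.0
raw, corrected = [], []
for _ in range(M):
    eta = rng.normal(size=N)
    u = rng.normal(size=N)
    eps = rho * eta + u
    Y = pi0 * X + eta
    y = Y * beta0 + eps
    beta_iv = (X @ y) / (X @ Y)                       # 2SLS/IV estimator with one instrument
    # Feasible bias correction using the sample analogue of (l - 2) * rho / (N * V_g).
    pi_hat = (X @ Y) / (X @ X)
    eps_hat, eta_hat = y - Y * beta_iv, Y - pi_hat * X
    rho_hat = np.mean(eps_hat * eta_hat)
    Vg_hat = pi_hat ** 2 * np.mean(X ** 2)
    raw.append(beta_iv)
    corrected.append(beta_iv - (l - 2) * rho_hat / (N * Vg_hat))
print(np.mean(raw), np.mean(corrected))               # the corrected mean should sit nearer beta0
```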
On the other hand, when the underlying problem is that of weak instruments, the issue is more fundamental (one way to see this is that when  π 0 , (1) estimates based on  B 2  will be unreliable and (2) we will see that the same issue crops up in all approximate moments of  β ^ , making them all unreliable and in fact undermining standard analytical asymptotic expansions and resampling methods which are justified by standard analytical results) and the analytical methods in this paper are not necessarily that helpful. The problem of weak instruments was emphasized in the influential articles by Nelson and Startz [16] and Bound et al. [17]. In the case when  π 0 = 0 β 0  is in fact unidentified, and the sorts of corrections here are not appropriate (note that even the usual asymptotic variance of the 2SLS estimator explodes as  π 0 V X π 0 ). Much subsequent literature followed, including Staiger and Stock [18], who considered situations of near non-identification ( π 0 0 ). In the Staiger and Stock [18] approach, the concentration parameter  μ 2  is defined (in our notation) as  μ 0 2 = π 0 V X π N / σ v 2 . Their expansions are then performed as  μ  increases, setting  μ = C N 1 / 2 , with C a positive constant, similar to expansions for local power. When the primary concern to a researcher is weak instruments, these alternatives and various extensions are more likely to provide improved inferences rather than simple moment corrections (although traditional analytical techniques (and resampling methods as well) may perform poorly in the weak instrument case when applied to standard inference procedures, Kleibergen [19] has shown that this is not the case when these corrections are applied to tests which are not dependent on strong instruments) (see Table 1 and Figure 1).

8. Conclusions

This paper has developed methods for deriving higher-order stochastic expansions and approximate moments for the estimators of the parameters of interest in models in which there are additional nuisance parameters. These methods can be useful in a number of ways. They can be used to compute the approximate moments. An inspection of the expansions and approximate moments may indicate when these should be taken into account: for example, the nuisance parameter estimate may not affect the first-order distribution of the estimator of the parameter of interest, yet may affect its higher-order moments. In any case, the results may well indicate when a first-order approximation may be deficient. The Edgeworth and related expansions for the estimators of the parameters of interest can be derived immediately from the approximate moments.
This paper also sets the stage for the examination of some more difficult estimators which may not appear to fit into the framework of estimating m parameters with some estimating equations. Two-stage least squares and certain other forms of GMM estimators based on an initial consistent estimate can be fit into the two-step paradigm. Other estimators, such as three-stage least squares and other iterated GMM estimators, require more stages; the two-step approach can be extended to these by stacking equations and creating additional nuisance parameters. The notation developed here is helpful in that regard. Somewhat different are certain “harder” problems such as GEL, where there may be equal numbers of parameters and estimating equations, but the relevant information regarding the parameters of interest is not easily retrieved. The techniques developed in this paper set the stage for that.

9. Derivations

This section provides the details for the two-step illustrations in the paper. Throughout, when summations are taken over a set denoted by  σ ( ) , this is taken to refer to the set of unique permutations of the given arguments. Specifically, the following sets of permutations of subscripts are used below.
σ ( 1122 ) = { ( 1122 ) , ( 1221 ) , ( 1212 ) } , σ ( 11122 ) = { ( 11122 ) , ( 11212 ) , ( 11221 ) , ( 12112 ) , ( 12121 ) , ( 12211 ) } , σ ( 11222 ) = { ( 11222 ) , ( 12122 ) , ( 12212 ) , ( 12221 ) } , σ ( 112233 ) = { ( 112233 ) , ( 112323 ) , ( 112332 ) , ( 121233 ) , ( 121323 ) , ( 121332 ) , ( 122133 ) , ( 123123 ) , ( 123132 ) , ( 123213 ) , ( 122313 ) , ( 123312 ) , ( 123231 ) , ( 123321 ) , ( 122331 ) } .
A more detailed discussion can be found in Rilstone [11].
  • Regression-Variance Problem
We first derive the influence functions for the parameter of interest, β, recalling the definition μ_i = ϵ_i² − β_0. Note here that S_βQ = (0  1) and S_ηQ = (V_X^{-1}  0). The influence function for θ̂ is
d i = Q q i = V X 1 0 0 1 X i ϵ i μ i = V X 1 X i ϵ i μ i , d i ( θ ) = Q q i ( θ ) = V X 1 0 0 1 X i X i 0 2 ϵ i X i 1 = V X 1 X i X i 0 2 ϵ i X i 1 d ˜ i ( θ ) = V X 1 W i 0 2 ϵ i X i 0
where W_i = X_iX_i′ − V_X.
d i ( θ θ ) = Q q i ( θ θ ) = V X 1 0 0 1 0 0 0 0 2 X i 2 0 0 0 = 0 0 0 0 2 X i 2 0 0 0 d ¯ 1 ( θ θ ) = 0 0 0 0 2 V X 0 0 0 , d ˜ 1 ( θ θ ) = 0 0 0 0 2 W i 0 0 0 d i ( θ θ θ ) = 0 .
So,
d β , i = S β d i = μ i
d β , i 1 i 2 = S β d β i 1 i 2 = s = 1 2 A β , i 1 i 2 ( s ) A β , i 1 i 2 ( 1 ) = S β d ˜ i 1 ( θ ) d i 2 = S β V X 1 W i 1 0 2 ϵ i 1 X i 1 0 V X 1 X i 2 ϵ i 2 μ i 2 = S β V X 1 W i 1 V X 1 X i 2 ϵ i 2 2 ϵ i 1 X i 1 V X 1 X i 2 ϵ i 2 = 2 X i 1 V X 1 X i 2 ϵ i 1 ϵ i 2 A β , i 1 i 2 ( 2 ) = S β A i 1 i 2 ( 2 ) = S β 1 2 d ¯ 1 ( θ θ ) ( d i 1 d i 2 ) = 1 2 S β 0 2 V X S η 2 V X 1 X i 1 ϵ i 1 μ i 1 V X 1 X i 2 ϵ i 2 μ i 2 = V X V X 1 X i 1 ϵ i 1 V X 1 X i 2 ϵ i 2 = [ X i 1 V X 1 V X V X 1 X i 2 ] ϵ i 1 ϵ i 2 = X i 1 V X 1 X i 2 ϵ i 1 ϵ i 2 .
Therefore,
d β , i 1 i 2 = X i 1 V X 1 X i 2 ϵ i 1 ϵ i 2
and
d β , i 1 i 2 i 3 = s = 1 6 E β , i 1 i 2 i 3 ( s ) ,
E β , i 1 i 2 i 3 ( 1 ) = S β d ¯ 1 ( θ θ ) ( d i 1 d ˜ i 2 ( θ ) d i 3 ) , = S β 0 2 V X S η 2 V X 1 X i 1 ϵ i 1 μ i 1 V X 1 W i 2 V X 1 X i 3 ϵ i 3 2 ϵ i 2 X i 2 V X 1 X i 3 ϵ i 3 = 2 V X V X 1 X i 1 ϵ i 1 ( V X 1 W i 2 V X 1 X i 3 ϵ i 3 ) = 2 V X V X 1 X i 1 V X 1 W i 2 V X 1 X i 3 ϵ i 1 ϵ i 3 = 2 [ X i 1 V X 1 V X V X 1 W i 2 V X 1 X i 3 ] ϵ i 1 ϵ i 3 = 2 X i 1 V X 1 W i 2 V X 1 X i 3 ϵ i 1 ϵ i 3
E β , i 1 i 2 i 3 ( 2 ) = 1 2 S β d ¯ 1 ( θ θ ) ( d i 1 d ¯ 1 ( θ θ ) ( d i 2 d i 3 ) ) = 1 2 2 V X S η 2 d i 1 0 V X S η 2 d i 2 d i 3 = 0
E β , i 1 i 2 i 3 ( 3 ) = S β 1 6 d ¯ 1 ( θ θ θ ) ( d i 1 d i 2 d i 3 ) = 0 .
E β , i 1 i 2 i 3 ( 4 ) = S β d ˜ i 1 ( θ ) d ˜ i 2 ( θ ) d i 3 = S β V X 1 W i 1 0 2 ϵ i 1 X i 1 0 V X 1 W i 2 0 2 ϵ i 2 X i 2 0 V X 1 X i 3 ϵ i 3 μ i 3 = 2 ϵ i 1 X i 1 0 V X 1 W i 2 V X 1 X i 3 ϵ i 3 2 ϵ i 2 X i 2 V X 1 X i 3 ϵ i 3 = 2 ϵ i 1 X i 1 V X 1 W i 2 V X 1 X i 3 ϵ i 3 = 2 E β , i 1 i 2 i 3 ( 1 ) .
E β , i 1 i 2 i 3 ( 5 ) = S β 1 2 d ˜ i 1 ( θ ) d ¯ 1 ( θ θ ) ( d i 2 d i 3 ) = 1 2 2 ϵ i 1 X i 1 0 0 V X S η 2 ( d i 2 d i 3 ) = 0 .
E β , i 1 i 2 i 3 ( 6 ) = S β 1 2 d ˜ i 1 ( θ θ ) ( d i 2 d i 3 ) = S β 1 2 0 2 W i 1 S η 2 V X 1 X i 2 ϵ i 2 μ i 2 V X 1 X i 3 ϵ i 3 μ i 3 . = W i 1 V X 1 X i 2 ϵ i 2 V X 1 X i 3 ϵ i 3 . = X i 2 V X 1 W i 1 V X 1 X i 3 ϵ i 2 ϵ i 3 .
We thus have
d β , i 1 i 2 i 3 = s = 1 6 E β , i 1 i 2 i 3 ( s ) = X i 2 V X 1 W i 1 V X 1 X i 3 ϵ i 2 ϵ i 3 .
We examine now the approximate moments of  β ^ . We obtain the approximate bias as ( N 1  times) the expected value of the second-order influence function.
E [ d β , 11 ] = E [ X 1 V X 1 X 1 ] E [ ϵ 1 2 ] = l σ 2 .
We obtain the components of the approximate MSE as follows.
V β , 1 = E [ d β , 1 2 ] = E [ μ 1 2 ] .
V β , 21 = E [ d β , 1 d β , 11 ] = E [ μ 1 ( X 1 V X 1 X 1 ϵ 1 ϵ 1 ) ] = E [ μ 1 ϵ 1 2 ] E [ X 1 V X 1 X 1 ] = l E [ μ 1 ϵ 1 2 ] = l E [ μ 1 2 ] .
V β , 22 = E [ d β , 11 ] E [ d β , 11 ] + E [ d β , 12 d β , 12 ] + E [ d β , 12 d β , 21 ] = E [ X 1 V X 1 X 1 ϵ 1 ϵ 1 ] E [ X 1 V X 1 X 1 ϵ 1 ϵ 1 ] + E [ X 1 V X 1 X 2 ϵ 1 ϵ 2 X 1 V X 1 X 2 ϵ 1 ϵ 2 ] + E [ X 1 V X 1 X 2 ϵ 1 ϵ 2 X 2 V X 1 X 1 ϵ 2 ϵ 1 ] = l 2 β 0 2 + E [ X 2 V X 1 X 1 X 1 V X 1 X 2 ] β 0 2 + E [ X 1 V X 1 X 1 ] β 0 2 = l 2 β 0 2 + 2 l β 0 2 .
V β , 23 = E [ d β , 1 d β , 122 ] + E [ d β , 1 d β , 212 ] + E [ d β , 1 d β , 221 ] = E [ μ 1 X 2 V X 1 W 1 V X 1 X 2 ϵ 2 ϵ 2 ] + E [ μ 1 X 1 V X 1 W 2 V X 1 X 2 ϵ 1 ϵ 2 ] + E [ μ 1 X 2 V X 1 W 2 V X 1 X 1 ϵ 2 ϵ 1 ] = 0 .
so that the approximate variance is given by
MSE 2 [ β ^ ] = 1 N V β , 1 + 1 N 2 2 V β , 21 + V β , 22 + 2 V β , 23 = 1 N E [ μ 1 2 ] + 1 N 2 2 l E [ μ 1 2 ] + l 2 β 0 2 + 2 l β 0 2 .
We obtain the components of the skewness as follows.
S β , 1 = E [ d β , 1 3 ] = E [ μ 1 3 ] .
To obtain  S β , 2 , we have
E [ d β , 1 2 d β , 22 ] = V β , 1 E [ X 2 V X 1 X 2 ϵ 2 ϵ 2 ] = l V β , 1 β 0 E [ d β , 1 d β , 2 d β , 12 ] = E [ d β , 1 d β , 2 d β , 21 ] = E [ μ 1 ϵ 1 ] E [ μ 2 ϵ 2 ] E [ X 1 V X 1 X 2 ] = E [ μ 1 ϵ 1 ] 2 E [ X 1 ] V X 1 E [ X 1 ] = E [ ϵ 1 3 ] 2 E [ X 1 ] V X 1 E [ X 1 ]
so that
S β , 2 = σ ( 1122 ) E [ d β , i 1 d β , i 2 d β , i 3 i 4 ] = l V β , 1 β 0 2 E [ ϵ 1 3 ] 2 E [ X 1 ] V X 1 E [ X 1 ] .
We obtain the components of the approximate kurtosis as follows.
K β , 1 = E [ d β , 1 4 ] 3 σ ( 1122 ) E [ d β , i 1 d β , i 2 d β , i 3 d β , i 4 ] = E [ μ 1 4 ] 3 E [ μ 1 2 ] 2 = E [ μ 1 4 ] 3 V β , 1 2 .
K β , 2 = σ ( 11122 ) σ ( 11222 ) E d β , i 1 i 2 d β , i 3 d β , i 4 d β , i 5 = σ ( 11122 ) σ ( 11222 ) E X i 1 V X 1 X i 2 ϵ i 1 ϵ i 2 μ i 3 μ i 4 μ i 5 .
We see by symmetry that many terms are the same. We have
E d β , 11 d β , 1 d β , 2 d β , 2 = E d β , 11 d β , 2 d β , 1 d β , 2 = E d β , 11 d β , 2 d β , 2 d β , 1 = E X 1 V X 1 X 1 ϵ 1 ϵ 1 μ 1 μ 2 μ 2 = l E [ ϵ 1 2 μ 1 ] E [ μ 1 2 ] , E d β , 12 d β , 1 d β , 1 d β , 2 = E d β , 12 d β , 1 d β , 2 d β , 1 = E d β , 12 d β , 2 d β , 1 d β , 1 = E X 1 V X 1 X 2 ϵ 1 ϵ 2 μ 1 μ 2 μ 1 = E [ X 1 ] V X 1 E [ X 2 ] E [ ϵ 1 μ 1 2 ] E [ ϵ 2 μ 2 ] = E [ X 1 ] V X 1 E [ X 1 ] E [ ϵ 1 μ 1 2 ] E [ ϵ 1 3 ] , E d β , 12 d β , 1 d β , 2 d β , 2 = E d β , 12 d β , 2 d β , 1 d β , 2 = E d β , 12 d β , 2 d β , 2 d β , 1 = E X 1 V X 1 X 2 ϵ 1 ϵ 2 μ 1 μ 2 μ 2 = E [ X 1 ] V X 1 E [ X 2 ] E [ ϵ 1 μ 1 2 ] E [ ϵ 2 μ 2 ] = E [ X 1 ] V X 1 E [ X 1 ] E [ ϵ 1 μ 1 2 ] E [ ϵ 1 3 ] , E d β , 11 d β , 2 d β , 2 d β , 2 = E X 1 V X 1 X 1 ϵ 1 ϵ 1 μ 2 μ 2 μ 2 = l β 0 E [ μ 1 3 ] ,
so
K β , 2 = 3 l E [ ϵ 1 2 μ 1 ] E [ μ 1 2 ] l β 0 E [ μ 1 3 ] 6 E [ ϵ 1 μ 1 2 ] E [ ϵ 1 3 ] E [ X 1 ] V X 1 E [ X 1 ] .
K β , 3 = σ ( 112233 ) E [ d i 1 d i 2 d i 3 d i 4 i 5 i 6 ] = σ ( 112233 ) E [ μ i 1 μ i 2 μ i 3 X i 5 V X 1 W i 4 V X 1 X i 6 ϵ i 5 ϵ i 6 ] = 0 .
which we have obtained by noting that in any of the summands in  K β , 3 , there is a term containing either  E [ ϵ 1 W 1 ]  or  E [ μ 1 W 1 ] , which are both zero due to the independence of the residuals and the  X i s. Note that this also holds if independence is replaced with a conditional mean-zero assumption. Note that the corresponding term in the sample variance example is also zero.
K β , 4 = σ ( 112233 ) E [ d i 1 d i 2 d i 3 i 4 d i 5 i 6 ] = σ ( 112233 ) E [ μ i 1 μ i 2 X i 3 V X 1 X i 4 ϵ i 3 ϵ i 4 X i 5 V X 1 X i 6 ϵ i 5 ϵ i 6 ]
E [ d 1 d 1 d 22 d 33 ] = E [ μ 1 μ 1 X 2 V X 1 X 2 ϵ 2 ϵ 2 X 3 V X 1 X 3 ϵ 3 ϵ 3 ] = l 2 E [ μ 1 2 ] β 0 2 , E [ d 1 d 1 d 23 d 23 ] = E [ d 1 d 1 d 23 d 32 ] = E [ μ 1 μ 1 X 2 V X 1 X 3 ϵ 2 ϵ 3 X 2 V X 1 X 3 ϵ 2 ϵ 3 ] = l E [ μ 1 2 ] β 0 2 , E [ d 1 d 2 d 12 d 33 ] = E [ d 1 d 2 d 21 d 33 ] = E [ d 1 d 2 d 33 d 21 ] = E [ d 1 d 2 d 33 d 12 ] = E [ μ 1 μ 2 X 1 V X 1 X 2 ϵ 1 ϵ 2 X 3 V X 1 X 3 ϵ 3 ϵ 3 ] = l E [ μ 1 ϵ 1 ] 2 E [ X 1 ] V X 1 E [ X 1 ] β 0 .
E [ d 1 d 2 d 13 d 23 ] = E [ d 1 d 2 d 13 d 32 ] = E [ d 1 d 2 d 31 d 23 ] = E [ d 1 d 2 d 31 d 32 ] = E [ d 1 d 2 d 32 d 13 ] = E [ d 1 d 2 d 23 d 13 ] = E [ d 1 d 2 d 32 d 31 ] = E [ d 1 d 2 d 23 d 31 ] = E [ μ 1 μ 2 X 1 V X 1 X 3 ϵ 1 ϵ 3 X 2 V X 1 X 3 ϵ 2 ϵ 3 ] = E [ μ 1 ϵ 1 ] 2 E [ X 1 ] V X 1 E [ X 1 ] β 0 ,
so
K β , 4 = ( l 2 + 2 l ) E [ μ 1 2 ] β 0 2 + ( 4 l + 8 ) β 0 E [ μ 1 ϵ 1 ] 2 E [ X 1 ] V X 1 E [ X 1 ]
  • Derivations for Two-Stage Least Squares
We derive the influence functions and moments for the 2SLS estimator. Define  W i = X i X i V X  and  Z i = g ˙ i Y ˙ i E [ g ˙ i Y ˙ i ] = g ˙ i Y ˙ i 1 = g ˙ i 2 + ˙ g ˙ i η ˙ i 1 . Some intermediate expressions are useful. We easily see that
d i = Q q i = V X 1 0 0 V g 1 X i η i g i ϵ i = V X 1 X i η i g ˙ i ϵ ˙ i , d i ( θ ) = Q q i ( θ ) = V X 1 X i X i 0 X ˙ i ϵ ˙ i π 0 X ˙ i Y ˙ i , d ˜ i ( θ ) = Q q i ( θ ) = V X 1 W i 0 X ˙ i ϵ ˙ i Z i , d i ( θ θ ) = Q q i ( θ θ ) = V X 1 0 0 V g 1 0 l × l X i Y i ( S π S β ) + ( S β S π ) = 0 l × l X ˙ i Y ˙ i ( S π S β ) + ( S β S π ) , d ¯ 1 ( θ θ ) = 0 l × l E [ X ˙ 1 Y ˙ 1 ] ( S π S β ) + ( S β S π ) d ˜ i ( θ θ ) = 0 l × l X ˙ i Y ˙ i E [ X ˙ i Y ˙ i ] ( S π S β ) + ( S β S π ) .
Note that the first-order asymptotic variance for  θ ^  is
V 1 = E [ d 1 d 1 ] = σ η 2 V X 1 ϱ V X 1 E [ X ˙ 1 g ˙ 1 ] ϱ E [ g ˙ 1 X ˙ 1 ] V X 1 σ ϵ 2 V g 1 = σ η 2 V X 1 ϱ π 0 V g 1 ϱ π 0 V g 1 σ ϵ 2 V g 1
and the influence function for  β ^  is
d β , i = S β d i = g ˙ i ϵ ˙ i
which are standard representations. The two terms making up the second-order influence function for  β ^  can be determined as follows. First,
A β , i 1 i 2 ( 1 ) = S β d ˜ i 1 ( θ ) d i 2 = S β V X 1 W i 1 0 X ˙ i 1 ϵ ˙ i 1 Z i 1 V X 1 X i 2 η i 2 g ˙ i 2 ϵ ˙ i 2 = X ˙ i 1 ϵ ˙ i 1 Z i 1 V X 1 X i η i g ˙ i 2 ϵ ˙ i 2 = X i 1 ϵ ˙ i 1 V X 1 X i 2 η ˙ i 2 Z i 1 g ˙ i 2 ϵ ˙ i 2 .
Second, noting that we can write
S β d ¯ 1 ( θ θ ) = V g 1 π 0 V X { ( S π S β ) + ( S β S π ) } ,
we have
2 A β , i 1 i 2 ( 2 ) = S β d ¯ 1 ( θ θ ) ( d i 1 d i 2 ) = V g 1 π 0 V X { ( S π d i 1 S β d i 2 ) + ( S β d i 1 S π d i 2 ) } = V g 1 π 0 V X { ( V X 1 X i 1 η i 1 g ˙ i 2 ϵ ˙ i 2 ) + ( g ˙ i 1 ϵ ˙ i 1 V X 1 X i 2 η i 2 } = V g 1 π 0 { ( X i 1 g ˙ i 2 η i 1 ϵ ˙ i 2 ) + X i 2 g ˙ i 1 η i 2 ϵ ˙ i 1 } = { g ˙ i 1 g ˙ i 2 η ˙ i 1 ϵ ˙ i 2 + g ˙ i 2 g ˙ i 1 η ˙ i 2 ϵ ˙ i 1 } .
The terms making up the third-order influence function are derived as follows:
E β , i 1 i 2 i 3 ( 1 ) = S β d ¯ 1 ( θ θ ) ( d i 1 d ˜ i 2 ( θ ) d i 3 ) = V g 1 π 0 V X ( S π S β ) + ( S β S π ) ( d i 1 d ˜ i 2 ( θ ) d i 3 ) = V g 1 π 0 V X ( S π d i 1 S β d ˜ i 2 ( θ ) d i 3 ) + ( S β d i 1 S π d ˜ i 2 ( θ ) d i 3 ) = V g 1 π 0 V X { V X 1 X i 1 η i 1 X ˙ i 2 ϵ ˙ i 2 Z i 2 V X 1 X i 3 η i 3 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 1 ϵ ˙ i 1 V X 1 W i 2 0 V X 1 X i 3 η i 3 g ˙ i 3 ϵ ˙ i 3 } = V g 1 π 0 { X i 1 η i 1 X ˙ i 2 ϵ ˙ i 2 V X 1 X i 3 η i 3 Z i 2 g ˙ i 3 ϵ ˙ i 3 g ˙ i 1 ϵ ˙ i 1 W i 2 V X 1 X i 3 η i 3 } = { g ˙ i 1 η ˙ i 1 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 Z i 2 g ˙ i 3 ϵ ˙ i 3 g ˙ i 1 ϵ ˙ i 1 π 0 W i 2 V X 1 X ˙ i 3 η ˙ i 3 } = g ˙ i 1 η ˙ i 1 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 Z i 2 g ˙ i 3 ϵ ˙ i 3 ϵ ˙ i 1 π 0 W i 2 V X 1 X ˙ i 3 η ˙ i 3 = g ˙ i 1 η ˙ i 1 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 + g ˙ i 1 η ˙ i 1 Z i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 1 ϵ ˙ i 1 g ˙ i 2 X i 2 V X 1 X ˙ i 3 η ˙ i 3 g ˙ i 1 ϵ ˙ i 1 g ˙ i 3 η ˙ i 3 ,
2 E β , i 1 i 2 i 3 ( 2 ) = S β d ¯ 1 ( θ θ ) ( d i 1 d ¯ 1 ( θ θ ) ( d i 2 d i 3 ) )
Note first that
d ¯ 1 ( θ θ ) ( d i 2 d i 3 ) = 0 l × l V g 1 π 0 V X ( S π S β ) + ( S β S π ) × d i 2 d i 3 = 0 l × l V g 1 π 0 V X { ( V X 1 X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 ) + ( g ˙ i 2 ϵ ˙ i 2 V X 1 X i 3 η i 3 ) } = 0 l × 1 g ˙ i 2 η ˙ i 2 g ˙ i 3 ϵ ˙ i 3 0 l × l g ˙ i 2 ϵ ˙ i 2 g ˙ i 3 η ˙ i 3 = 0 l × 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 )
so that
2 E β , i 1 i 2 i 3 ( 2 ) = S β d ¯ 1 ( θ θ ) d i 1 0 l × 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) = V g 1 π 0 V X ( S π S β ) + ( S β S π ) × V X 1 X i 1 η i 1 g ˙ i 1 ϵ ˙ i 1 0 l × 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) = V g 1 π 0 V X × V X 1 X i 1 η i 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) = g ˙ i 1 η ˙ i 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) .
Since third derivatives of  q i ( θ )  are zero,
E β , i 1 i 2 i 3 ( 3 ) = 0 .
Next,
E β , i 1 i 2 i 3 ( 4 ) = S β d ˜ i 1 ( θ ) d ˜ i 2 ( θ ) d i 3 = S β V X 1 W i 1 0 X ˙ i 1 ϵ ˙ i 1 Z i 1 V X 1 W i 2 0 X ˙ i 2 ϵ ˙ i 2 Z i 2 V X 1 X i 3 η i 3 g ˙ i 3 ϵ ˙ i 3 = X ˙ i 1 ϵ ˙ i 1 Z i 1 V X 1 W i 2 V X 1 X i 3 η i 3 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 Z i 2 g ˙ i 3 ϵ ˙ i 3 = X i 1 ϵ ˙ i 1 V X 1 W i 2 V X 1 X i 3 η ˙ i 3 Z i 1 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 Z i 2 g ˙ i 3 ϵ ˙ i 3 ,
2 E β , i 1 i 2 i 3 ( 5 ) = S β d ˜ i 1 ( θ ) d ¯ 1 ( θ θ ) ( d i 2 d i 3 ) = X ˙ i 1 ϵ ˙ i 1 Z i 1 0 l × 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) = Z i 1 g ˙ i 2 g ˙ i 3 ( η ˙ i 2 ϵ ˙ i 3 + ϵ ˙ i 2 η ˙ i 3 ) ,
2 E β , i 1 i 2 i 3 ( 6 ) = S β d ˜ i 1 ( θ θ ) ( d i 2 d i 3 ) = ( X ˙ i 1 Y ˙ i 1 E [ X ˙ i Y ˙ i ] ) { ( V X 1 X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 ) + ( g ˙ i 2 ϵ ˙ i 2 V X 1 X i 3 η i 3 } = V g 1 ( π 0 X i 1 X i 1 + X i 1 η i 1 π 0 E [ X i X i ] ) V X 1 { X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 2 ϵ ˙ i 2 X i 3 η i 3 } = V g 1 ( π 0 X i 1 X i 1 ) V X 1 { X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 2 ϵ ˙ i 2 X i 3 η i 3 } + V g 1 ( X i 1 η i 1 ) V X 1 { X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 2 ϵ ˙ i 2 X i 3 η i 3 } V g 1 ( π 0 E [ X i X i ] ) V X 1 { X i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 2 ϵ ˙ i 2 X i 3 η i 3 } = g ˙ i 1 X i 1 V X 1 X i 2 η ˙ i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 1 X i 1 V X 1 g ˙ i 2 ϵ ˙ i 2 X i 3 η ˙ i 3 + X i 1 η i 1 V X 1 X i 2 η ˙ i 2 g ˙ i 3 ϵ ˙ i 3 + X i 1 η i 1 V X 1 g ˙ i 2 ϵ ˙ i 2 X i 3 η ˙ i 3 ( g ˙ i 2 η i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 2 ϵ ˙ i 2 g ˙ i 3 η ˙ i 3 )
The approximate bias (times N) is given by the expected value of  A β , 11 ( 1 ) + A β , 11 ( 2 ) . We have
B β , 21 = A ¯ β , 11 ( 1 ) = E [ X 1 ϵ ˙ 1 V X 1 X 1 η ˙ 1 + ( g ˙ 1 Y ˙ 1 + 1 ) g ˙ 1 ϵ ˙ 1 ] = E [ X 1 V X 1 X 1 ] E [ η ˙ 1 ϵ ˙ 1 ] E [ g ˙ 1 g ˙ 1 ] E [ η ˙ 1 ϵ ˙ 1 ] = l ϱ ˙ ϱ ˙ ,
A ¯ β , 11 ( 1 ) = ( l 1 ) ϱ V g 1 ,
2 B β , 22 = 2 A ¯ β , 11 ( 2 ) = E { g ˙ 1 g ˙ 1 η ˙ 1 ϵ ˙ 1 + g ˙ 1 g ˙ 1 η ˙ 1 ϵ ˙ 1 } = 2 E [ g ˙ 1 2 ] E [ η ˙ 1 ϵ ˙ 1 ] = 2 ϱ ˙
so
A ¯ 11 ( 1 ) + A ¯ 11 ( 2 ) = ( l 2 ) ϱ V g 1 .
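As an informal check on this expression, the following Monte Carlo sketch (my own illustration, not code from the paper) compares the average 2SLS estimation error in a simple scalar design with the approximation obtained by reading the result above as an O(1/N) bias of roughly (l − 2)·ϱ/(N·V_g), with ϱ interpreted as the covariance between the structural and first-stage errors and V_g as the variance of X′π₀, as the notation suggests. The design parameters (N, l, π₀, the error covariance) and the normal data-generating process are illustrative assumptions, and the displayed expressions may carry standardizations or sign conventions not visible here, so the comparison is indicative only.

```python
# Minimal Monte Carlo sketch: average 2SLS error vs. the approximate bias
# (l - 2) * rho / (N * Vg).  All design choices below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, l, reps = 200, 5, 10_000
beta0 = 1.0
pi0 = np.full(l, 0.3)                      # first-stage coefficients (assumed)
rho = 0.5                                  # Cov(eps, eta) (assumed)
cov = np.array([[1.0, rho], [rho, 1.0]])   # Var(eps) = Var(eta) = 1

errors = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((N, l))                    # instruments, E[XX'] = I_l
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    Y = X @ pi0 + eta                                  # first stage
    y = beta0 * Y + eps                                # structural equation
    Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]    # fitted first stage
    errors[r] = (Yhat @ y) / (Yhat @ Y) - beta0        # 2SLS error

Vg = pi0 @ pi0                             # Var(X'pi0) when E[XX'] = I_l
print("simulated bias     :", errors.mean())
print("(l-2)*rho/(N*Vg)   :", (l - 2) * rho / (N * Vg))
```

With moderately strong instruments the two numbers should be of the same order; the approximation can be expected to deteriorate as the concentration parameter N·V_g shrinks.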
To find the third approximate moment of  β ^ , we need to evaluate the two terms from the standard formula, which we denote here as  S β , 1  and  S β , 2 . First,
S β , 1 = E [ d β , 1 3 ] = E [ g ˙ 1 3 ] E [ ϵ ˙ 1 3 ] .
Second, by symmetry,
S β , 2 = σ ( 1122 ) E [ d β , i 1 d β , i 2 d β , i 3 i 4 ] = E [ d β , 1 d β , 1 d β , 22 ] + E [ d β , 1 d β , 2 d β , 12 ] + E [ d β , 1 d β , 2 d β , 21 ] = E [ d β , 1 2 ] E [ d β , 22 ] + 2 E [ d β , 1 d β , 2 d β , 12 ] .
We see immediately that
E [ d β , 1 2 ] E [ d β , 22 ] = V β , 1 B β , 2
and almost as immediately that
E [ d β , 1 d β , 2 d β , 12 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) / 2 ) ] = E [ ϵ ˙ 1 2 ] E [ ϵ ˙ 2 η ˙ 2 ] E [ ϵ ˙ 1 2 ] E [ ϵ ˙ 2 η ˙ 2 ] ( E [ ϵ ˙ 1 2 ] E [ ϵ ˙ 2 η ˙ 2 ] + E [ ϵ ˙ 1 2 ] E [ ϵ ˙ 2 η ˙ 2 ] ) / 2 = σ ˙ ϵ 2 ϱ ˙
noting that  g 1 X 1 = π 0 X 1 X 1  so  E [ g 1 X 1 ] = π 0 V X . We thus have
S β , 2 = V β , 1 B β , 2 σ ˙ ϵ 2 ϱ ˙ .
The components of the higher-order terms in the approximate MSE of  β ^  can be written as
V β , 21 = E [ d β , 1 d β , 11 ] , V β , 22 = σ ( 1122 ) E [ d β , i 1 d β , i 2 i 3 i 4 ] , V β , 23 = σ ( 1122 ) E [ d β , i 1 i 2 d i 3 i 4 ] .
Evaluating these in turn we have
V β , 21 = E [ d β , 1 d β , 11 ] = E [ g ˙ 1 ϵ ˙ 1 ( X 1 ϵ ˙ 1 V X 1 X 1 η ˙ 1 Z 1 g ˙ 1 ϵ ˙ 1 ( g ˙ 1 g ˙ 1 η ˙ 1 ϵ ˙ 1 + g ˙ 1 g ˙ 1 η ˙ 1 ϵ ˙ 1 ) / 2 ) ] = E [ g ˙ 1 X 1 V X 1 X 1 ] E [ ϵ ˙ 1 2 η ˙ 1 ] E [ g ˙ 1 ϵ ˙ 1 ( V g 1 g 1 ( g 1 + η 1 ) 1 ) g ˙ 1 ϵ ˙ 1 ] E [ g ˙ 1 3 ϵ ˙ 1 2 η ˙ 1 ] = E [ g ˙ 1 X 1 V X 1 X 1 ] E [ ϵ ˙ 1 2 η ˙ 1 ] E [ g ˙ 1 4 ϵ ˙ 1 2 ] E [ g ˙ 1 3 ϵ ˙ 1 2 η ˙ 1 ] + E [ g ˙ 1 2 ϵ ˙ 1 2 ] E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] = E [ g ˙ 1 X 1 V X 1 X 1 ] E [ ϵ ˙ 1 2 η ˙ 1 ] E [ g ˙ 1 4 ] E [ ϵ ˙ 1 2 ] 2 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] + E [ ϵ ˙ 1 2 ] = E [ ϵ ˙ 1 2 η ˙ 1 ] ( E [ g ˙ 1 X 1 V X 1 X 1 ] 2 E [ g ˙ 1 3 ] ) V β , 1 Var [ g ˙ 1 2 ] .
To evaluate  V β , 23  we break it up into its components:
V β , 23 = σ ( 1122 ) E [ d β , i 1 i 2 d β , i 3 i 4 ] = E [ d β , 11 ] E [ d β , 22 ] + E [ d β , 12 2 ] + E [ d β , 12 d β , 21 ] .
First, we see that  E [ d β , 11 ] E [ d β , 22 ] = B β , 2 2 . With
E [ d β , i 1 i 2 d β , i 3 i 4 ] = E [ ( A β , i 1 i 2 ( 1 ) + A β , i 1 i 2 ( 2 ) ) ( A β , i 3 i 4 ( 1 ) + A β , i 3 i 4 ( 2 ) ) ] ,
we evaluate
E [ d β , 12 2 ] = E [ A β , 12 ( 1 ) 2 ] + 2 E [ A β , 12 ( 1 ) A β , 12 ( 2 ) ] + E [ A β , 12 ( 2 ) 2 ]
term by term. The first term in  E [ d β , 12 2 ]  is
E [ A β , 12 ( 1 ) 2 ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) 2 ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 ) 2 ] 2 E [ X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ] + E [ ( Z 1 g ˙ 2 ϵ ˙ 2 ) 2 ] = l σ ˙ ϵ 2 σ ˙ η 2 2 ϱ ˙ 2 + ( Var [ g ˙ 1 2 ] + σ ˙ η 2 ) σ ˙ ϵ 2 = ( l + 1 ) σ ˙ ϵ 2 σ ˙ η 2 2 ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ˙ 1 2 ] ,
where we have used
E ( X 1 V X 1 X 2 ) 2 ] = E [ X 1 V X 1 X 2 X 2 V X 1 X 1 ] = l , E [ X 1 V ˙ X 1 X 2 g ˙ 1 g ˙ 2 ] = E X ˙ 1 V ˙ X 1 X 2 X 2 π 0 g ˙ 1 ] = E [ g ˙ 1 : g 1 ] = 1 , E [ Z 1 2 ] = E [ ( g ˙ 1 ( g ˙ 1 + η ˙ 1 ) 1 ) 2 ] = E [ g ˙ 1 2 ( g ˙ 1 + η ˙ 1 ) 2 2 g ˙ 1 ( g ˙ 1 + η ˙ 1 ) ] + 1 = E [ g ˙ 1 4 ] + E [ g ˙ 1 2 ] E [ η ˙ 1 2 ] 1 = Var [ g ˙ 1 2 ] + E [ η ˙ 1 2 ] .
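The first of these identities is a pure second-moment calculation and is easy to verify numerically. The sketch below (illustrative, not from the paper) draws independent X₁, X₂ with second-moment matrix V_X = E[XX′] and checks that the Monte Carlo mean of (X₁′V_X⁻¹X₂)² is close to l; the particular distribution used is an arbitrary assumption, since only independence and E[XX′] = V_X are needed.

```python
# Numeric check of E[(X1' VX^{-1} X2)^2] = l for independent draws with E[XX'] = VX.
import numpy as np

rng = np.random.default_rng(1)
l, reps = 4, 200_000
A = rng.standard_normal((l, l))
VX = A @ A.T + l * np.eye(l)               # an arbitrary positive-definite V_X
L = np.linalg.cholesky(VX)
VXinv = np.linalg.inv(VX)

X1 = rng.standard_normal((reps, l)) @ L.T  # mean-zero draws with E[XX'] = V_X
X2 = rng.standard_normal((reps, l)) @ L.T  # independent of X1
q = np.einsum('ij,jk,ik->i', X1, VXinv, X2)            # X1' VX^{-1} X2, row by row
print("Monte Carlo E[(X1' VX^-1 X2)^2]:", np.mean(q ** 2))   # close to l = 4
```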
Next, the middle term in  E [ d β , 12 2 ]  is
2 E [ A β , 12 ( 1 ) A β , 12 ( 2 ) ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 g ˙ 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 ) ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] = E [ g ˙ 1 g ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 ) ( η ˙ 1 ϵ ˙ 2 + η ˙ 2 ϵ ˙ 1 ) ] + E [ g ˙ 1 g ˙ 2 g ˙ 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 ( η ˙ 1 ϵ ˙ 2 + η ˙ 2 ϵ ˙ 1 ) ] = ϱ ˙ 2 E [ ϵ ˙ 1 2 ] E [ η ˙ 1 2 ] ) + E [ η ˙ 2 2 ] E [ ϵ ˙ 2 2 ] + ϱ ˙ 2 = 0 .
The third component of  E [ d β , 12 2 ] , namely  E [ A β , 12 ( 2 ) 2 ] , can be written
4 E [ A β , 12 ( 2 ) 2 ] = E [ { g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 1 g ˙ 2 η ˙ 2 ϵ ˙ 1 } { g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 1 g ˙ 2 η ˙ 2 ϵ ˙ 1 } ] = E [ ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 ) 2 ] + 2 E [ g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 g ˙ 1 g ˙ 2 η ˙ 2 ϵ ˙ 1 ] + E [ ( g ˙ 1 g ˙ 2 η ˙ 2 ϵ ˙ 1 ) 2 ] = σ ˙ η 2 σ ˙ ϵ 2 + 2 ϱ ˙ 2 + σ ˙ η 2 σ ˙ ϵ 2 = 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 ) .
Putting these three results together we have
E [ d β , 12 2 ] = ( l + 1 ) σ ˙ ϵ 2 σ ˙ η 2 2 ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ˙ 1 2 ] + 1 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 )
Similarly, we evaluate the components of
E [ d β , 12 d β , 21 ] = E [ A β , 12 ( 1 ) A β , 21 ( 1 ) ] + E [ A β , 12 ( 1 ) A β , 21 ( 2 ) ] + E [ A β , 12 ( 2 ) A β , 21 ( 1 ) ] + E [ A β , 12 ( 2 ) A β , 21 ( 2 ) ]
as follows. The first term in  E [ d β , 12 d β , 21 ]  is
E [ A β , 12 ( 1 ) A β , 21 ( 1 ) ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) ( X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 Z 2 g ˙ 1 ϵ ˙ 1 ) ] = E [ X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 ] 2 E [ X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 2 g ˙ 1 ϵ ˙ 1 ] + E [ Z 1 g ˙ 2 ϵ ˙ 2 Z 2 g ˙ 1 ϵ ˙ 1 ] = l ϱ ˙ 2 2 E [ g ˙ 2 2 σ ˙ η 2 σ ˙ ϵ 2 ] + E [ g ˙ 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 2 η ˙ 2 g ˙ 1 ϵ ˙ 1 ] = ( l + 1 ) ϱ ˙ 2 2 σ ˙ η 2 σ ˙ ϵ 2
By symmetry, the second and third terms in  E [ d β , 12 d β , 21 ]  are
2 E [ A β , 12 ( 1 ) A β , 21 ( 2 ) ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] = E [ ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 g ˙ 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 ) ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] = E [ A β , 12 ( 2 ) A β , 12 ( 1 ) ] = 0 .
Also by symmetry, the fourth term in  E [ d β , 12 d β , 21 ]  is
4 E [ ( A β , 12 ( 2 ) A β , 21 ( 2 ) ) 2 ] = 4 E [ ( A β , 12 ( 2 ) ) 2 ] = 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 ) ,
so that
E [ d β , 12 d β , 21 ] = ( l + 1 ) ϱ ˙ 2 2 σ ˙ η 2 σ ˙ ϵ 2 + 0 + 0 + 1 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 )
and
V β , 23 = B β , 2 2 + ( l + 1 ) σ ˙ ϵ 2 σ ˙ η 2 2 ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ˙ 1 2 ] + 1 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 ) + ( l + 1 ) ϱ ˙ 2 2 σ ˙ η 2 σ ˙ ϵ 2 + 1 2 ( σ ˙ η 2 σ ˙ ϵ 2 + ϱ ˙ 2 ) ] = B β , 2 2 + l ( σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 ) + σ ˙ ϵ 2 Var [ g ˙ 1 2 ] .
To evaluate  V β , 22  we break it up into its components:
V β , 22 = σ ( 1122 ) E [ d β , i 1 d β , i 2 i 3 i 4 ] = σ ( 1122 ) l = 1 6 E [ d β , i 1 E β , i 2 i 3 i 4 ]
For  l = 1 ,
E [ d β , 1 E β , 122 ( 1 ) ] = E [ g ˙ 1 ϵ ˙ 1 ( g ˙ 1 η ˙ 1 X 2 ϵ ˙ 2 V X 1 X 2 η ˙ 2 + g ˙ 1 η ˙ 1 Z 2 g ˙ 2 ϵ ˙ 2 + g ˙ 1 ϵ ˙ 1 g ˙ 2 X 2 V X 1 X ˙ 2 η ˙ 2 g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 ) ] = l ϱ ˙ 2 + ϱ ˙ 2 + 0 0 = ( l 1 ) ϱ ˙ 2 E [ d β , 1 E β , 212 ( 1 ) ] = E [ g ˙ 1 ϵ ˙ 1 ( g ˙ 2 η ˙ 2 X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 + g ˙ 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 + g ˙ 2 ϵ ˙ 2 g ˙ 1 X 1 V X 1 X ˙ 2 η ˙ 2 g ˙ 2 ϵ ˙ 2 g ˙ 2 η ˙ 2 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 1 X 1 V X 1 X ˙ 2 η ˙ 2 ] E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 2 η ˙ 2 ] = σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 + 0 0 = σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 E [ d β , 1 E β , 221 ( 1 ) ] = E [ g ˙ 1 ϵ ˙ 1 ( g ˙ 2 η ˙ 2 X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 + g ˙ 2 η ˙ 2 Z 2 g ˙ 1 ϵ ˙ 1 + g ˙ 2 ϵ ˙ 2 g ˙ 2 X 2 V X 1 X ˙ 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 1 η ˙ 1 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 Z 2 g ˙ 1 ϵ ˙ 1 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 2 X 2 V X 1 X ˙ 1 η ˙ 1 ] E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 1 η ˙ 1 ] = ϱ ˙ 2 + σ ˙ ϵ 2 σ ˙ 2 + 0 0 = ϱ ˙ 2 + σ ˙ ϵ 2 σ ˙ 2
so that
σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( 1 ) ] = ( l 1 ) ϱ ˙ 2
For  l = 2 ,
2 E [ d β , 1 E β , 122 ( 2 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 1 η ˙ 1 g ˙ 2 g ˙ 2 ( η ˙ 2 ϵ ˙ 2 + ϵ ˙ 2 η ˙ 2 ) = 2 ϱ ˙ 2 2 E [ d β , 1 E β , 212 ( 2 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 g ˙ 1 g ˙ 2 ( η ˙ 1 ϵ ˙ 2 + ϵ ˙ 1 η ˙ 2 ) = ϱ ˙ 2 + σ ˙ ϵ 2 σ ˙ η 2 2 E [ d β , 1 E β , 221 ( 2 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 g ˙ 2 g ˙ 1 ( η ˙ 2 ϵ ˙ 1 + ϵ ˙ 2 η ˙ 1 ) = σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2
so that
σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( 2 ) ] = σ ˙ ϵ 2 σ ˙ η 2 + 2 ϱ ˙ 2 .
For  l = 3 , the contribution is zero since  E β , i 1 i 2 i 3 ( 3 ) = 0 , so  E [ d β , ⋅ E β , ⋅ ( 3 ) ] = 0 .
For  l = 4 , we have
E [ d β , 1 E β , 122 ( 4 ) ] = E [ g ˙ 1 ϵ ˙ 1 { X 1 ϵ ˙ 1 V X 1 W 2 V X 1 X 2 η ˙ 2 Z 1 ( X 2 ϵ ˙ 2 V X 1 X 2 η ˙ 2 Z 2 g ˙ 2 ϵ ˙ 2 ) } ] = 0 l ϱ ˙ 2 + ϱ ˙ 2 = ( l 1 ) ϱ ˙ 2 , E [ d β , 1 E β , 212 ( 4 ) ] = E [ g ˙ 1 ϵ ˙ 1 { X 2 ϵ ˙ 2 V X 1 W 1 V X 1 X 2 η ˙ 2 Z 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) } ] = 0 σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 = ϱ ˙ 2 σ ˙ ϵ 2 σ ˙ η 2 , E [ d β , 1 E β , 221 ( 4 ) ] = E [ g ˙ 1 ϵ ˙ 1 { X 2 ϵ ˙ 2 V X 1 W 2 V X 1 X 1 η ˙ 1 Z 2 ( X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 Z 2 g ˙ 1 ϵ ˙ 1 ) } ] = 0 ϱ ˙ 2 + σ ˙ ϵ 2 σ ˙ η 2 + σ ˙ ϵ 2 Var [ g ^ 1 2 ] = σ ˙ ϵ 2 σ ˙ η 2 ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ^ 1 2 ] ,
using the expression  E [ Z 1 2 ] = σ ˙ η 2 + Var [ g ^ 1 2 ] , so that
σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( 4 ) ] = ( l 1 ) ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ^ 1 2 ] .
For  l = 5 , we have
2 E [ d β , 1 E β , 122 ( 5 ) ] = E [ g ˙ 1 ϵ ˙ 1 Z 1 g ˙ 2 g ˙ 2 ( η ˙ 2 ϵ ˙ 2 + ϵ ˙ 2 η ˙ 2 ) ] = 2 ϱ ˙ 2 , 2 E [ d β , 1 E β , 212 ( 5 ) ] = E [ g ˙ 1 ϵ ˙ 1 Z 2 g ˙ 1 g ˙ 2 ( η ˙ 1 ϵ ˙ 2 + ϵ ˙ 1 η ˙ 2 ) ] = ϱ ˙ 2 + σ ˙ ϵ 2 σ ˙ η 2 , 2 E [ d β , 1 E β , 221 ( 5 ) ] = E [ g ˙ 1 ϵ ˙ 1 Z 2 g ˙ 2 g ˙ 1 ( η ˙ 2 ϵ ˙ 1 + ϵ ˙ 2 η ˙ 1 ) ] = σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2
so that
σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( 5 ) ] = σ ˙ ϵ 2 σ ˙ η 2 + 2 ϱ ˙ 2 .
For  l = 6 , we have
2 E [ d β , 1 E β , 122 ( 6 ) ] = E [ g ˙ 1 ϵ ˙ 1 { g ˙ 1 X 1 V X 1 X 2 η ˙ 2 g ˙ 2 ϵ ˙ 2 + g ˙ 1 X 1 V X 1 g ˙ 2 ϵ ˙ 2 X 2 η ˙ 2 + X 1 η 1 V X 1 X 2 η ˙ 2 g ˙ 2 ϵ ˙ 2 + X 1 η 1 V X 1 g ˙ 2 ϵ ˙ 2 X 2 η ˙ 2 ( g ˙ 2 η 2 g ˙ 2 ϵ ˙ 2 + g ˙ 2 ϵ ˙ 2 g ˙ 2 η ˙ 2 ) } ] = 0 + 0 + ϱ ˙ 2 + ϱ ˙ 2 + 0 + 0 = 2 ϱ ˙ 2 , 2 E [ d β , 1 E β , 212 ( 6 ) ] = E [ g ˙ 1 ϵ ˙ 1 { g ˙ 2 X 2 V X 1 X 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 + g ˙ 2 X 2 V X 1 g ˙ 1 ϵ ˙ 1 X 2 η ˙ 2 + X 2 η 2 V X 1 X 1 η ˙ 1 g ˙ 2 ϵ ˙ 2 + X 2 η 2 V X 1 g ˙ 1 ϵ ˙ 1 X 2 η ˙ 2 ( g ˙ 1 η 1 g ˙ 2 ϵ ˙ 2 + g ˙ 1 ϵ ˙ 1 g ˙ 2 η ˙ 2 ) } ] = 0 + 0 + ϱ ˙ 2 + l σ ˙ ϵ 2 σ ˙ η 2 0 0 = ϱ ˙ 2 + l σ ˙ ϵ 2 σ ˙ η 2 , 2 E [ d β , 1 E β , 221 ( 6 ) ] = E [ g ˙ 1 ϵ ˙ 1 { g ˙ 2 X 2 V X 1 X 2 η ˙ 2 g ˙ 1 ϵ ˙ 1 + g ˙ 2 X 2 V X 1 g ˙ 2 ϵ ˙ 2 X 1 η ˙ 1 + X 2 η 2 V X 1 X 2 η ˙ 2 g ˙ 1 ϵ ˙ 1 + X 2 η 2 V X 1 g ˙ 2 ϵ ˙ 2 X 1 η ˙ 1 ( g ˙ 2 η 2 g ˙ 1 ϵ ˙ 1 + g ˙ 2 ϵ ˙ 2 g ˙ 1 η ˙ 1 ) } ] = 0 + 0 + l σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 0 0 = l σ ˙ ϵ 2 σ ˙ η 2 + ϱ ˙ 2 ,
so that
σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( 6 ) ] = l σ ˙ ϵ 2 σ ˙ η 2 2 ϱ ˙ 2 .
Putting together these results, we have
V β , 22 = l = 1 6 σ ( 1122 ) E [ d β , i 1 E β , i 2 i 3 i 4 ( l ) ] = ( l 2 ) σ ˙ ϵ 2 σ ˙ η 2 2 ( l 2 ) ϱ ˙ 2 + σ ˙ ϵ 2 Var [ g ^ 1 2 ]
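The practical content of the V β,2j terms is that the finite-sample variance of √N(β̂ − β₀) differs from its first-order value by an O(1/N) amount. The sketch below (my own illustration, using the same assumed design and caveats as the bias check earlier) makes that gap visible by simulation; σ_ε²/Var(X′π₀) is used as the first-order variance for the scalar, homoskedastic design assumed here.

```python
# Minimal sketch: simulated variance of sqrt(N)*(2SLS error) vs. its first-order value.
import numpy as np

rng = np.random.default_rng(2)
N, l, reps, beta0, rho = 100, 5, 10_000, 1.0, 0.5
pi0 = np.full(l, 0.3)
cov = np.array([[1.0, rho], [rho, 1.0]])

root_n_errors = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((N, l))
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    Y = X @ pi0 + eta
    y = beta0 * Y + eps
    Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    root_n_errors[r] = np.sqrt(N) * ((Yhat @ y) / (Yhat @ Y) - beta0)

Vg = pi0 @ pi0
print("simulated Var[sqrt(N)(betahat - beta0)] :", root_n_errors.var())
print("first-order value sigma_eps^2 / Vg      :", 1.0 / Vg)
```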
We find the components of the approximate kurtosis for the 2SLS estimator as follows:
K β , 1 = E [ d β , 1 4 ] 3 V β , 1 2 = E [ ( g ˙ 1 ϵ ˙ 1 ) 4 ] 3 E [ ( g ˙ 1 ϵ ˙ 1 ) 2 ] 2 = E [ g ˙ 1 4 ] E [ ϵ ˙ 1 4 ] 3 ( σ ˙ ϵ 2 ) 2 = Var [ g ˙ 1 2 ] E [ ϵ ˙ 1 4 ] + ( E [ ϵ ˙ 1 4 ] 3 ( σ ˙ ϵ 2 ) 2 ) .
K β , 2 = σ ( 11122 ) σ ( 11222 ) E [ d β , i 1 d β , i 2 d β , i 3 d β , i 4 i 5 ] .
We note from symmetry that several of the summands in  K β , 2  are equal across permutations of the indices; using this, we evaluate the summands in  K β , 2  as follows. Recall that
d β , i 1 , i 2 = A β , i 1 i 2 ( 1 ) + A β , i 1 i 2 ( 2 ) .
A β , i 1 i 2 ( 1 ) = X i 1 ϵ ˙ i 1 V X 1 X i 2 η ˙ i 2 Z i 1 g ˙ i 2 ϵ ˙ i 2 , 2 A β , i 1 i 2 ( 2 ) = { g ˙ i 1 g ˙ i 2 η ˙ i 1 ϵ ˙ i 2 + g ˙ i 2 g ˙ i 1 η ˙ i 2 ϵ ˙ i 1 } .
E [ d β , 1 d β , 1 d β , 1 d β , 22 ] = E [ d β , 1 3 ] E [ d β , 22 ] = S β , 1 B β , 2 .
E [ d β , 1 d β , 1 d β , 2 d β , 12 ] = E [ d β , 1 d β , 2 d β , 1 d β , 12 ] = E [ d β , 1 d β , 2 d β , 2 d β , 21 ] = E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 ) ] E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] / 2 = E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 ( g ˙ 1 2 + g ˙ 1 η ˙ 1 1 ) g ˙ 2 ϵ ˙ 2 ) ] E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 ) ] / 2 = S β , 1 ϱ ˙ E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 E [ g ˙ 1 4 ] σ ˙ ϵ 2 ϱ ˙ + σ ˙ ϵ 2 ϱ ˙ E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 / 2 S β , 1 ϱ ˙ / 2 = S β , 1 ϱ ˙ / 2 3 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 / 2 Var [ g ˙ 1 2 ] σ ˙ ϵ 2 ϱ ˙ .
E [ d β , 1 d β , 1 d β , 2 d β , 21 ] = E [ d β , 1 d β , 2 d β , 1 d β , 21 ] = E [ d β , 1 d β , 2 d β , 2 d β , 12 ] = E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 Z 2 g ˙ 1 ϵ ˙ 1 ) ] E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 + g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 ) ] / 2 = E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( X 2 ϵ ˙ 2 V X 1 X 1 η ˙ 1 ( g ˙ 2 2 + g ˙ 2 η ˙ 2 1 ) g ˙ 1 ϵ ˙ 1 ) ] E [ ( g ˙ 1 ϵ ˙ 1 ) 2 g ˙ 2 ϵ ˙ 2 ( g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 + g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 ) ] / 2 = E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 0 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 3 ] ϱ ˙ + 0 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 3 ] ϱ ˙ / 2 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 = E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 / 2 3 S β , 1 ϱ ˙ / 2 .
E [ d β , 1 d β , 2 d β , 2 d β , 11 ] = E [ d β , 1 d β , 1 d β , 2 d β , 22 ] = E [ d β , 1 d β , 2 d β , 1 d β , 22 ] = E [ d β , 2 2 ] E [ d β , 1 d β , 1 ] = V β , 1 V β , 21 .
Putting these results together, we have
K β , 2 = S β , 1 B β , 2 + 3 ( S β , 1 ϱ ˙ / 2 3 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 / 2 ) + 3 ( E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 / 2 3 S β , 1 ϱ ˙ / 2 ) + 3 σ ˙ ϵ 2 V β , 21 = S β , 1 B β , 2 + 3 ( σ ˙ ϵ 2 V β , 21 E [ g ˙ 1 3 ] E [ ϵ ˙ 1 2 η ˙ 1 ] σ ˙ ϵ 2 S β , 1 ϱ ˙ Var [ g ˙ 1 2 ] σ ˙ ϵ 2 ϱ ˙ ) .
K β , 3 = σ ( 112233 ) E [ d i 1 d i 2 d i 3 d i 4 i 5 i 6 ] .
We note from symmetry that many of the summands in  K β , 3  are equal for different permutations of the indices. These are
σ a = { ( 112233 ) , ( 121233 ) , ( 122133 ) } , σ b = { ( 112323 ) , ( 121323 ) , ( 122313 ) } . σ c = { ( 112332 ) , ( 121332 ) , ( 122331 ) } , σ d = { ( 123123 ) , ( 123132 ) , ( 123213 ) , ( 123312 ) , ( 123231 ) , ( 123321 ) } .
First, we see that
E [ d β , 1 d β , 1 d β , 2 d β , 233 ] = E [ d β , 1 d β , 2 d β , 1 d β , 233 ] = E [ d β , 1 d β , 2 d β , 2 d β , 133 ] = V β , 1 E [ d 2 d 233 ] , E [ d β , 1 d β , 1 d β , 2 d β , 323 ] = E [ d β , 1 d β , 2 d β , 1 d β , 323 ] = E [ d β , 1 d β , 2 d β , 2 d β , 313 ] = V β , 1 E [ d β , 2 d β , 323 ] , E [ d β , 1 d β , 1 d β , 2 d β , 332 ] = E [ d β , 1 d β , 2 d β , 1 d β , 332 ] = E [ d β , 1 d β , 2 d β , 2 d β , 331 ] = V β , 1 E [ d β , 2 d β , 332 ]
By inspection and comparison with (161) we see that
V β , 1 V β , 22 = E [ d β , 1 d β , 1 d β , 2 d β , 233 ] + E [ d β , 1 d β , 1 d β , 2 d β , 323 ] + E [ d β , 1 d β , 1 d β , 2 d β , 332 ]
so that
σ a σ b σ c E [ d β , i 1 d β , i 2 d β , i 3 d β , i 4 i 5 i 6 ] = 3 V β , 1 V β , 22 .
With respect to the six terms in  σ d  we have
E [ d β , 1 d β , 2 d β , 3 d β , 123 ] = E [ d β , 1 d β , 2 d β , 3 d β , 132 ] = E [ d β , 1 d β , 2 d β , 3 d β , 213 ] = E [ d β , 1 d β , 2 d β , 3 d β , 312 ] = E [ d β , 1 d β , 2 d β , 3 d β , 231 ] = E [ d β , 1 d β , 2 d β , 3 d β , 321 ] = l = 1 6 E [ d β , 1 d β , 2 d β , 3 E β , 123 ( l ) ] .
We evaluate the summands for  l = 1 , 2 , … , 6 . Recall that  Z i = g ˙ i 2 + g ˙ i η ˙ i − 1 .
E β , i 1 i 2 i 3 ( 1 ) = g ˙ i 1 η ˙ i 1 X i 2 ϵ ˙ i 2 V X 1 X i 3 η ˙ i 3 + g ˙ i 1 η ˙ i 1 Z i 2 g ˙ i 3 ϵ ˙ i 3 + g ˙ i 1 ϵ ˙ i 1 g ˙ i 2 X i 2 V X 1 X ˙ i 3 η ˙ i 3 g ˙ i 1 ϵ ˙ i 1 g ˙ i 3 η ˙ i 3 .
E [ d β , 1 d β , 2 d β , 3 E β , 123 ( 1 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 η ˙ 1 X 2 ϵ ˙ 2 V X 1 X 3 η ˙ 3 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 η ˙ 1 Z 2 g ˙ 3 ϵ ˙ 3 ] + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 ϵ ˙ 1 g ˙ 2 X 2 V X 1 X ˙ 3 η ˙ 3 ] E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 ϵ ˙ 1 g ˙ 3 η ˙ 3 ] = ρ ˙ 2 σ ˙ ϵ 2 + ρ ˙ 2 σ ˙ ϵ 2 + 0 0 = 0 .
2 E [ d 1 d 2 d 3 E β , 123 ( 2 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 η ˙ 1 g ˙ 2 g ˙ 3 ( η ˙ 2 ϵ ˙ 3 + ϵ ˙ 2 η ˙ 3 ) ] = 2 ρ ˙ 2 σ ˙ ϵ 2 .
E [ d 1 d 2 d 3 E β , 123 ( 3 ) ] = 0 .
E [ d 1 d 2 d 3 E β , 123 ( 4 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 X 1 ϵ ˙ 1 V X 1 W 2 V X 1 X 3 η ˙ 3 ] E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 Z 1 ( X 2 ϵ ˙ 2 V X 1 X 3 η ˙ 3 Z 2 g ˙ 3 ϵ ˙ 3 ) ] = 0 ρ ˙ 2 σ ˙ ϵ 2 + ρ ˙ 2 σ ˙ ϵ 2 = 0 .
2 E [ d 1 d 2 d 3 E β , 123 ( 5 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 Z 1 g ˙ 2 g ˙ 3 ( η ˙ 2 ϵ ˙ 3 + ϵ ˙ 2 η ˙ 3 ) ] = 2 ρ ˙ 2 σ ˙ ϵ 2 .
2 E [ d 1 d 2 d 3 E β , 123 ( 6 ) ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 X 1 V X 1 X 2 η ˙ 2 g ˙ 3 ϵ ˙ 3 + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 g ˙ 1 X 1 V X 1 g ˙ 2 ϵ ˙ 2 X 3 η ˙ 3 + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 X 1 η 1 V X 1 X 2 η ˙ 2 g ˙ 3 ϵ ˙ 3 + E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 X 1 η 1 V X 1 g ˙ 2 ϵ ˙ 2 X 3 η ˙ 3 E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 ( g ˙ 2 η 2 g ˙ 3 ϵ ˙ 3 + g ˙ 2 ϵ ˙ 2 g ˙ 3 η ˙ 3 ) = 0 + 0 + ρ ˙ 2 σ ˙ ϵ 2 + ρ ˙ 2 σ ˙ ϵ 2 + 0 = 2 ρ ˙ 2 σ ˙ ϵ 2 .
Thus,
E [ d β , 1 d β , 2 d β , 3 d β , 123 ] = l = 1 6 E [ d β , 1 d β , 2 d β , 3 E β , 123 ( l ) ] = ρ ˙ 2 σ ˙ ϵ 2
and so
σ d E [ d i 1 d i 2 d i 3 d i 4 i 5 i 6 ] = 6 l = 1 6 g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 g ˙ 3 ϵ ˙ 3 E β , 123 ( l ) = 6 ρ ˙ 2 σ ˙ ϵ 2
so that
K β , 3 = 3 V β , 1 V β , 22 + 6 ρ ˙ 2 σ ˙ ϵ 2 = 3 V β , 1 ( V β , 22 + 2 ρ ˙ 2 ) .
Next, we evaluate
K β , 4 = σ ( 112233 ) E [ d β , i 1 d β , i 2 d β , i 3 i 4 d β , i 5 i 6 ] .
Examining the terms in  K β , 4 , we see that the first three terms (as we go through the set of indices in  σ ( 112233 ) ) are
E [ d β , 1 2 ( d β , 22 d β , 33 + d β , 23 d β , 23 + d β , 23 d β , 32 ) ] = E [ d β , 1 2 ] × E [ d β , 22 d β , 33 + d β , 23 d β , 23 + d β , 23 d β , 32 ] = V β , 1 V β , 23 .
Next, we see that
E [ d 1 d 2 d 12 d 33 ] = E [ d 1 d 2 d 21 d 33 ] = E [ d 1 d 2 d 33 d 12 ] = E [ d 1 d 2 d 33 d 21 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 2 η ˙ 2 Z 1 g ˙ 2 ϵ ˙ 2 { g ˙ 1 g ˙ 2 η ˙ 1 ϵ ˙ 2 + g ˙ 2 g ˙ 1 η ˙ 2 ϵ ˙ 1 } / 2 ) ] E [ d 33 ] = ( σ ˙ ϵ 2 ϱ ˙ σ ˙ ϵ 2 ϱ ˙ { σ ˙ ϵ 2 ϱ ˙ + σ ˙ ϵ 2 ϱ ˙ } / 2 ) B β , 2
= σ ˙ ϵ 2 ϱ ˙ B β , 2 .
There are eight terms remaining, which can be treated two-by-two and then summed together neatly as follows.
E [ d 1 d 2 d 13 d 23 ] = E [ d 1 d 2 d 23 d 13 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 3 η ˙ 3 Z 1 g ˙ 3 ϵ ˙ 3 { g ˙ 1 g ˙ 3 η ˙ 1 ϵ ˙ 3 + g ˙ 3 g ˙ 1 η ˙ 3 ϵ ˙ 1 } / 2 ) ( X 2 ϵ ˙ 2 V X 1 X 3 η ˙ 3 Z 2 g ˙ 3 ϵ ˙ 3 { g ˙ 2 g ˙ 3 η ˙ 2 ϵ ˙ 3 + g ˙ 3 g ˙ 2 η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 1 g ˙ 3 η ˙ 3 η ˙ 1 g ˙ 3 ϵ ˙ 3 { g ˙ 3 η ˙ 1 ϵ ˙ 3 + g ˙ 3 η ˙ 3 ϵ ˙ 1 } / 2 ) ( ϵ ˙ 2 g ˙ 3 η ˙ 3 η ˙ 2 g ˙ 3 ϵ ˙ 3 { g ˙ 3 η ˙ 2 ϵ ˙ 3 + g ˙ 3 η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 1 η ˙ 3 η ˙ 1 ϵ ˙ 3 { η ˙ 1 ϵ ˙ 3 + η ˙ 3 ϵ ˙ 1 } / 2 ) ( ϵ ˙ 2 η ˙ 3 η ˙ 2 ϵ ˙ 3 { η ˙ 2 ϵ ˙ 3 + η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ( σ ˙ ϵ 2 η ˙ 3 ϱ ˙ ϵ ˙ 3 { ϱ ˙ ϵ ˙ 3 + η ˙ 3 σ ˙ ϵ 2 } / 2 ) ( σ ˙ ϵ 2 η ˙ 3 ϱ ˙ ϵ ˙ 3 { ϱ ˙ ϵ ˙ 3 + η ˙ 3 σ ˙ ϵ 2 } / 2 ) ] .
E [ d 1 d 2 d 13 d 32 ] = E [ d 1 d 2 d 32 d 13 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 1 ϵ ˙ 1 V X 1 X 3 η ˙ 3 Z 1 g ˙ 3 ϵ ˙ 3 { g ˙ 1 g ˙ 3 η ˙ 1 ϵ ˙ 3 + g ˙ 3 g ˙ 1 η ˙ 3 ϵ ˙ 1 } / 2 ) ( X 3 ϵ ˙ 3 V X 1 X 2 η ˙ 2 Z 3 g ˙ 2 ϵ ˙ 2 { g ˙ 3 g ˙ 2 η ˙ 3 ϵ ˙ 2 + g ˙ 2 g ˙ 3 η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 1 g ˙ 3 η ˙ 3 η ˙ 1 g ˙ 3 ϵ ˙ 3 { g ˙ 3 η ˙ 1 ϵ ˙ 3 + g ˙ 3 η ˙ 3 ϵ ˙ 1 } / 2 ) ( ϵ ˙ 3 g ˙ 3 η ˙ 2 η ˙ 3 g ˙ 3 ϵ ˙ 2 { g ˙ 3 η ˙ 3 ϵ ˙ 2 + g ˙ 3 η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 1 η ˙ 3 η ˙ 1 ϵ ˙ 3 { η ˙ 1 ϵ ˙ 3 + η ˙ 3 ϵ ˙ 1 } / 2 ) ( ϵ ˙ 3 η ˙ 2 η ˙ 3 ϵ ˙ 2 { η ˙ 3 ϵ ˙ 2 + η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ( σ ˙ ϵ 2 η ˙ 3 ϱ ˙ ϵ ˙ 3 { ϱ ˙ ϵ ˙ 3 + η ˙ 3 σ ˙ ϵ 2 } / 2 ) ( ϱ ˙ ϵ ˙ 3 σ ˙ ϵ 2 η ˙ 3 { σ ˙ ϵ 2 ϵ ˙ 3 + η ˙ 3 ϱ ˙ } / 2 ) ] .
E [ d 1 d 2 d 31 d 23 ] = E [ d 1 d 2 d 23 d 31 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 3 ϵ ˙ 3 V X 1 X 1 η ˙ 1 Z 3 g ˙ 1 ϵ ˙ 1 { g ˙ 3 g ˙ 1 η ˙ 3 ϵ ˙ 1 + g ˙ 1 g ˙ 3 η ˙ 1 ϵ ˙ 3 } / 2 ) ( X 2 ϵ ˙ 2 V X 1 X 3 η ˙ 3 Z 2 g ˙ 3 ϵ ˙ 3 { g ˙ 2 g ˙ 3 η ˙ 2 ϵ ˙ 3 + g ˙ 3 g ˙ 2 η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 3 g ˙ 3 η ˙ 1 η ˙ 3 g ˙ 3 ϵ ˙ 1 { g ˙ 3 η ˙ 3 ϵ ˙ 1 + g ˙ 3 η ˙ 1 ϵ ˙ 3 } / 2 ) ( ϵ ˙ 2 g ˙ 3 η ˙ 3 η ˙ 2 g ˙ 3 ϵ ˙ 3 { g ˙ 3 η ˙ 2 ϵ ˙ 3 + g ˙ 3 η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 3 η ˙ 1 η ˙ 3 ϵ ˙ 1 { η ˙ 3 ϵ ˙ 1 + η ˙ 1 ϵ ˙ 3 } / 2 ) ( ϵ ˙ 2 η ˙ 3 η ˙ 2 ϵ ˙ 3 { η ˙ 2 ϵ ˙ 3 + η ˙ 3 ϵ ˙ 2 } / 2 ) ] = E [ ( ϱ ˙ ϵ ˙ 3 σ ˙ ϵ 2 η ˙ 3 { σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ } / 2 ) ( σ ˙ ϵ 2 η ˙ 3 ϱ ˙ ϵ ˙ 3 { ϱ ˙ ϵ ˙ 3 + η ˙ 3 σ ˙ ϵ 2 } / 2 ) ] .
E [ d 1 d 2 d 31 d 32 ] = E [ d 1 d 2 d 32 d 31 ] = E [ g ˙ 1 ϵ ˙ 1 g ˙ 2 ϵ ˙ 2 ( X 3 ϵ ˙ 3 V X 1 X 1 η ˙ 1 Z 3 g ˙ 1 ϵ ˙ 1 { g ˙ 3 g ˙ 1 η ˙ 3 ϵ ˙ 1 + g ˙ 1 g ˙ 3 η ˙ 1 ϵ ˙ 3 } / 2 ) ( X 3 ϵ ˙ 3 V X 1 X 2 η ˙ 2 Z 3 g ˙ 2 ϵ ˙ 2 { g ˙ 3 g ˙ 2 η ˙ 3 ϵ ˙ 2 + g ˙ 2 g ˙ 3 η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 3 g ˙ 3 η ˙ 1 η ˙ 3 g ˙ 3 ϵ ˙ 1 { g ˙ 3 η ˙ 3 ϵ ˙ 1 + g ˙ 3 η ˙ 1 ϵ ˙ 3 } / 2 ) ( ϵ ˙ 3 g ˙ 2 η ˙ 2 η ˙ 3 g ˙ 2 ϵ ˙ 2 { g ˙ 2 η ˙ 3 ϵ ˙ 2 + g ˙ 2 η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ϵ ˙ 1 ϵ ˙ 2 ( ϵ ˙ 3 η ˙ 1 η ˙ 3 ϵ ˙ 1 { η ˙ 3 ϵ ˙ 1 + η ˙ 1 ϵ ˙ 3 } / 2 ) ( ϵ ˙ 3 η ˙ 2 η ˙ 3 ϵ ˙ 2 { η ˙ 3 ϵ ˙ 2 + η ˙ 2 ϵ ˙ 3 } / 2 ) ] = E [ ( ϱ ˙ ϵ ˙ 3 σ ˙ ϵ 2 η ˙ 3 { σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ } / 2 ) ( ϱ ˙ ϵ ˙ 3 σ ˙ ϵ 2 η 3 { σ ˙ ϵ 2 η 3 + ϵ ˙ 3 ϱ ˙ } / 2 ) ] .
Temporarily, put
C 1 = E [ d 1 d 2 d 13 d 23 ] + E [ d 1 d 2 d 13 d 32 ] , C 2 = E [ d 1 d 2 d 31 d 23 ] + E [ d 1 d 2 d 31 d 32 ] , C 3 = E [ d 1 d 2 d 23 d 13 ] + E [ d 1 d 2 d 32 d 13 ] , C 4 = E [ d 1 d 2 d 23 d 31 ] + E [ d 1 d 2 d 32 d 31 ] .
We see that
C 1 = E [ ( σ ˙ ϵ 2 η ˙ 3 ϱ ˙ ϵ ˙ 3 { ϱ ˙ ϵ ˙ 3 + η ˙ 3 σ ˙ ϵ 2 } / 2 ) ( σ ˙ ϵ 2 ϵ ˙ 3 + η ˙ 3 ϱ ˙ ) ] C 2 = E [ ( ϱ ˙ ϵ ˙ 3 σ ˙ ϵ 2 η ˙ 3 { σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ } / 2 ) ( σ ˙ ϵ 2 η 3 + ϵ ˙ 3 ϱ ˙ ) ] ,
so that
C 1 + C 2 = E [ ( σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ ) ( σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ ) ] = E [ ( σ ˙ ϵ 2 η ˙ 3 + ϵ ˙ 3 ϱ ˙ ) 2 ] = ( σ ˙ ϵ 2 ) 2 σ ˙ η 2 + 2 σ ˙ ϵ 2 ϱ ˙ 2 + σ ˙ ϵ 2 ϱ ˙ 2 = σ ˙ ϵ 2 ( σ ˙ ϵ 2 σ ˙ η 2 + 3 ϱ ˙ 2 )
Noting the equalities between  C 1  and  C 3 , and between  C 2  and  C 4 , the last eight terms in  K β , 4  can be written as
C 1 + C 2 + C 3 + C 4 = 2 σ ˙ ϵ 2 ( σ ˙ ϵ 2 σ ˙ η 2 + 3 ϱ ˙ 2 )
and we have
K β , 4 = V β , 1 V β , 23 4 σ ˙ ϵ 2 ϱ ˙ B β , 2 + 2 σ ˙ ϵ 2 ( σ ˙ ϵ 2 σ ˙ η 2 + 3 ϱ ˙ 2 ) = V β , 1 ( V β , 23 4 ϱ ˙ B β , 2 + 2 σ ˙ ϵ 2 σ ˙ η 2 + 6 ϱ ˙ 2 ) .
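The S β,j and K β,j components above approximate the skewness and excess kurtosis of the estimator's sampling distribution, which the first-order normal approximation sets to zero. The following sketch (again my own illustrative design, not the paper's) simulates the standardized 2SLS error and reports its sample skewness and excess kurtosis, the quantities these expansions are meant to capture.

```python
# Minimal sketch: skewness and excess kurtosis of the simulated 2SLS error.
import numpy as np

rng = np.random.default_rng(3)
N, l, reps, beta0, rho = 100, 5, 20_000, 1.0, 0.5
pi0 = np.full(l, 0.3)
cov = np.array([[1.0, rho], [rho, 1.0]])

err = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((N, l))
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    Y = X @ pi0 + eta
    y = beta0 * Y + eps
    Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    err[r] = (Yhat @ y) / (Yhat @ Y) - beta0

z = (err - err.mean()) / err.std()
print("sample skewness        :", np.mean(z ** 3))        # 0 under the normal limit
print("sample excess kurtosis :", np.mean(z ** 4) - 3.0)   # 0 under the normal limit
```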

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the author.

Acknowledgments

The author is grateful for comments from the editor, two referees, and participants at the African Meetings of the Econometric Society in Nairobi, Kenya, 2023. Any errors are the responsibility of the author.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Rilstone, P.; Ullah, A. The sampling bias of Heckman’s sample bias estimator. In Recent Advances in Statistical Methods; Chaubey, Y.P., Ed.; World Scientific Publishing: London, UK, 2001.
2. Newey, W.K.; Smith, R. Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica 2004, 72, 219–255.
3. Nagar, A.L. The Bias and Moments Matrix of the General k-Class Estimators of the Parameters in Structural Equations. Econometrica 1959, 27, 575–595.
4. Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 1993, 80, 27–38.
5. Rilstone, P.; Srivastava, V.K.; Ullah, A. The second-order bias and mean squared error of nonlinear estimators. J. Econom. 1996, 75, 369–395.
6. Hahn, J.; Newey, W. Jackknife and Analytical Bias Reduction for Nonlinear Panel Models. Econometrica 2004, 72, 1295–1319.
7. Bao, Y.; Ullah, A. On skewness and kurtosis of econometric estimators. Econom. J. 2009, 12, 232–247.
8. Iglesias, E. First and Second Order Asymptotic Bias Correction of Nonlinear Estimators in a Non-Parametric Setting and an Application to the Smoothed Maximum Score Estimator. Stud. Nonlinear Dyn. Econom. 2010, 14, 1–28.
9. Chen, Q.; Giles, D.E. Finite-sample properties of the maximum likelihood estimator for the binary logit model with random covariates. Stat. Pap. 2012, 53, 409–426.
10. Khundi, G.; Rilstone, P. Simplified Matrix Methods for Multivariate Edgeworth Expansions. J. Quant. Econ. 2020, 18, 293–326.
11. Rilstone, P. Higher-Order Stochastic Expansions and Approximate Moments for Non-linear Models with Heterogeneous Observations. J. Quant. Econ. 2021, 19, 99–120.
12. Greene, W.H. Econometric Analysis; Prentice Hall: New York, NY, USA, 2012.
13. Wooldridge, J.M. Econometric Analysis of Cross Section and Panel Data, 2nd ed.; MIT Press: Cambridge, MA, USA, 2010.
14. Hansen, L.P.; Heaton, J.; Yaron, A. Finite-Sample Properties of Some Alternative GMM Estimators. J. Bus. Econ. Stat. 1996, 14, 262–280.
15. Rothenberg, T.J. Approximating the distributions of econometric estimators and test statistics. In Handbook of Econometrics; Griliches, Z., Intriligator, M.D., Eds.; Elsevier Science Inc.: New York, NY, USA, 1984; Volume 2, Chapter 15.
16. Nelson, C.R.; Startz, R. Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator. Econometrica 1990, 58, 967–976.
17. Bound, J.; Jaeger, D.A.; Baker, R.M. Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak. J. Am. Stat. Assoc. 1995, 90, 443–450.
18. Staiger, D.; Stock, J.H. Instrumental variables regression with weak instruments. Econometrica 1997, 65, 557–586.
19. Kleibergen, F. Improved Accuracy of Weak Instrument Robust GMM Statistics Through Bootstrap and Edgeworth Approximations; University of Amsterdam: Amsterdam, The Netherlands, 2019.
Figure 1. Densities of IV and Bias-Corrected IV Estimators.
Table 1. Average of OLS, IV, and two bias-corrected IV estimators.

ρ       OLS       β̂          β̂_BC*      β̂_BC
0.50    0.3733    −0.0342    −0.0050    0.0234
0.75    0.5591    −0.0531    −0.0095    0.0341
1.00    0.7449    −0.0721    −0.0139    0.0447
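For readers who want to reproduce the flavour of Table 1, the sketch below computes OLS, IV, and one simple plug-in bias-corrected IV estimate of the form β̂ − (l − 2)ϱ̂/(N·V̂_g) on simulated data. All design parameters in the code are illustrative assumptions, and the two corrections β̂_BC* and β̂_BC reported in the table are the ones defined in the body of the paper, so the numbers produced here are not expected to match the table.

```python
# Generic sketch (not the paper's design): average OLS, IV, and a plug-in
# bias-corrected IV estimate over simulated samples.
import numpy as np

def one_sample(rng, N=100, l=3, beta0=0.0, rho=0.5, pi_scale=0.3):
    """Simulate one sample; return (OLS, IV, plug-in bias-corrected IV)."""
    pi0 = np.full(l, pi_scale)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.standard_normal((N, l))
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
    Y = X @ pi0 + eta
    y = beta0 * Y + eps
    b_ols = (Y @ y) / (Y @ Y)
    Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    b_iv = (Yhat @ y) / (Yhat @ Y)
    # plug-in ingredients for a Nagar-type correction (illustrative form only)
    eps_hat = y - b_iv * Y
    eta_hat = Y - Yhat
    rho_hat = np.mean(eps_hat * eta_hat)       # estimate of Cov(eps, eta)
    Vg_hat = np.mean(Yhat ** 2)                # estimate of Var(X'pi0); assumes E[X] = 0
    b_bc = b_iv - (l - 2) * rho_hat / (N * Vg_hat)
    return b_ols, b_iv, b_bc

rng = np.random.default_rng(4)
draws = np.array([one_sample(rng) for _ in range(10_000)])
print("average OLS, IV, bias-corrected IV:", draws.mean(axis=0))
```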