
Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator

1 Department of Economics, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark
2 CREATES, Department of Economics and Business, Aarhus University, Fuglesangs Alle 4, 8210 Aarhus, Denmark
3 Department of Economics, University of Oxford & Nuffield College, OX1 1NF, Oxford, UK
* Author to whom correspondence should be addressed.
Econometrics 2013, 1(1), 53-70; https://doi.org/10.3390/econometrics1010053
Submission received: 28 January 2013 / Revised: 3 April 2013 / Accepted: 3 April 2013 / Published: 13 May 2013

Abstract

In regression, one can delete outliers selected by a preliminary estimator and then re-estimate the parameters by least squares on the retained observations. We study the properties of an iteratively defined sequence of estimators based on this idea, and relate the sequence to the Huber-skip estimator. We provide a stochastic recursion equation for the estimation error in terms of a kernel, the previous estimation error, and a uniformly small error term. The main contribution is the analysis of the solution of the stochastic recursion equation as a fixed point, together with the results that the normalized estimation errors are tight and are close to a linear function of the kernel. This provides a stochastic expansion of the estimators, which is the same as for the Huber-skip, so the iterated estimator is a close approximation of the Huber-skip.


1. Introduction and Main Results

Outlier detection in regression is an important topic in econometrics. The idea is to find an estimation method that is robust to the presence of outliers, and the statistical literature abounds in robust methods, since the introduction of M-estimators by Huber [1], see also the monographs Maronna, Martin, and Yohai [2], Huber and Ronchetti [3], and Jurečková, Sen, and Picek [4]. Recent contributions are the impulse indicator saturation method, see Hendry, Johansen, and Santos [5] and Johansen and Nielsen [6], and the Forward Search, see Atkinson, Riani, and Cerioli [7].
The present paper is a contribution to the theory of robust estimators. We focus on the Huber [1] skip-estimator that minimizes
$$\sum_{i=1}^{n} \rho(y_i - \beta'X_i),$$
where the objective function, $\rho$, is given by
$$\rho(z) = \tfrac{1}{2}\min(z^2, c^2) = \tfrac{1}{2}\bigl\{z^2 1_{(|z|\le c)} + c^2 1_{(|z|>c)}\bigr\}.$$
This estimator removes the observations with large residuals, something that, at least in the analysis of economic time series, appears to be a reasonable method.
It is seen that $\rho$ is absolutely continuous with derivative $\rho'(z) = z 1_{(|z|\le c)}$, but $\rho'$ is neither monotone nor absolutely continuous, which makes the calculation of the minimizer somewhat tricky and the asymptotic analysis rather difficult.
Thus the estimator is often replaced by the Winsorized estimator, which has the convex objective function
$$\rho_1(z) = \tfrac{1}{2}z^2 1_{(|z|\le c)} + c\bigl(|z| - \tfrac{1}{2}c\bigr) 1_{(|z|>c)}$$
with derivative
$$\rho_1'(z) = z 1_{(|z|\le c)} + c\,\mathrm{sign}(z) 1_{(|z|>c)},$$
which is both monotone and absolutely continuous and hence much easier to analyse, see Huber [1]. Note, however, that the function $\rho_1$ replaces the large residuals by $\pm c$, instead of removing the observation. This is a less common method in time series econometrics.
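The contrast between the two objective functions can be made concrete with a small numerical sketch; the cut-off value below is purely illustrative and not taken from the paper.

```python
import math

C = 1.345  # illustrative cut-off; the choice of c is left to the user

def rho_skip(z, c=C):
    # Huber-skip objective: (1/2) min(z^2, c^2)
    return 0.5 * min(z * z, c * c)

def drho_skip(z, c=C):
    # derivative z*1(|z| <= c): drops to zero at |z| = c, so it is
    # neither monotone nor absolutely continuous
    return z if abs(z) <= c else 0.0

def rho_winsor(z, c=C):
    # Winsorized objective: quadratic centre, linear tails
    return 0.5 * z * z if abs(z) <= c else c * (abs(z) - 0.5 * c)

def drho_winsor(z, c=C):
    # monotone, absolutely continuous derivative: clamps at +/- c
    return z if abs(z) <= c else c * math.copysign(1.0, z)
```

The skip's derivative sets large residuals to zero (the observation is removed), whereas Winsorizing clamps them at $\pm c$.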
An alternative simplification was formulated by Bickel [8], who suggested applying a preliminary estimator $\hat\beta_{n0}$ and defining the one-step estimator, $\hat\beta_{n1}$, by linearising the first order condition. He also suggested iterating this by using $\hat\beta_{n1}$ as initial estimator for $\hat\beta_{n2}$ and so on, but no results were given.
In the analysis of the Huber-skip, derived from $\rho$, we replace $\beta$ in the indicator function by a preliminary estimator, which eliminates the outlying observations, and then run a regression on the retained observations. We do so iteratively and study the sequence of recursively defined estimators $\hat\beta_{nm}$. We prove, under fairly general assumptions on the regressors and the distribution, that for $(m,n)\to\infty$ the estimator $\hat\beta_{nm}$ has the same asymptotic expansion as the Huber-skip; in this sense $\hat\beta_{nm}$, which is easy to calculate, is a very good approximation to the Huber-skip.
One-step M-estimators have been analysed previously in various situations. Apart from Bickel [8], who considered a situation with fixed regressors and weight functions satisfying certain smoothness and integrability conditions, Ruppert and Carroll [9] considered one-step Huber-skip L-estimators. Welsh and Ronchetti analysed the one-step Huber-skip estimator when the initial estimator is the least squares estimator, as well as one-step M-estimators with a general initial estimator but with a function $\rho$ with absolutely continuous derivative [10]. Recently Cavaliere and Georgiev analysed a sequence of Huber-skip estimators for the parameter of an AR(1) model with infinite variance errors when the autoregressive coefficient is 1 [11]. Johansen and Nielsen analysed one-step Huber-skip estimators for general $n^{1/2}$-consistent initial estimators and stationary as well as some non-stationary regressors [6].
Iterated one-step M-estimators are related to iteratively reweighted least squares estimators. Indeed the one-step Huber-skip estimator corresponds to a reweighted least squares estimator with weights of zero or unity. Dollinger and Staudte considered a situation with smooth weights, hence ruling out Huber-skips, and gave conditions for convergence [12]. Their argument was cast in terms of influence functions. Our result for iteration of Huber-skip estimators is similar, but the employed tightness argument is different because of the non-smooth weight function.
Notation: The Euclidean norm for vectors $x$ is denoted $|x|$. We write $(m,n)\to\infty$ if both $m$ and $n$ tend to infinity. We use the notation $o_P(1)$ and $O_P(1)$, implicitly assuming that $n\to\infty$; $\overset{P}{\to}$ means convergence in probability and $\overset{D}{\to}$ denotes convergence in distribution. For matrices $M$ we choose the spectral norm $\|M\| = \max\{\mathrm{eigen}(M'M)\}^{1/2}$, so that $\|x\| = |x|$ for vectors $x$.

2. The Model and the Definition of the One-step Huber-skip

We consider the multiple regression model with $p$ regressors $X_i$,
$$y_i = \beta'X_i + \varepsilon_i, \qquad i = 1,\dots,n, \tag{2.1}$$
where $\varepsilon_i$ is assumed independent of $(X_1,\dots,X_i,\varepsilon_1,\dots,\varepsilon_{i-1})$ with known density $f$, which does not have to be symmetric. These assumptions allow for both deterministic and stochastic regressors. In particular $X_i$ can contain lagged dependent variables as for an autoregressive process, and the process can be stationary or non-stationary.
We consider estimation of both $\beta$ and $\sigma^2$. Thus we start with some preliminary estimator $(\hat\beta_{n0}, \hat\sigma_{n0}^2)$ and seek to improve it through an iterative procedure: the current estimator is used to identify outliers, which are discarded, and a regression is run on the remaining observations. The technical assumptions are listed in Assumption A, see §2.2 below, and allow the regressors to be deterministic or stochastic, and stationary or trending.
The preliminary estimator $(\hat\beta_{n0}, \hat\sigma_{n0}^2)$ could be a least squares estimator on the full sample, although that is not a good idea from a robustness viewpoint, see Welsh and Ronchetti [10]. Alternatively, the initial estimator, $\hat\beta_{n0}$, could be chosen as a robust estimator, for instance the least trimmed squares estimator of Rousseeuw [13], Rousseeuw and Leroy [14] (p. 180). When the trimming proportion is at most a half, this converges in distribution at the usual $n^{1/2}$ rate, see Víšek [15,16,17], and as $\hat\sigma_{n0}^2$ we would choose the least squares residual variance among the trimmed observations, bias corrected as in (2.7) below.
The outliers are identified by first choosing a $\psi$ giving the proportion of good, central observations and then, because $f$ is not assumed symmetric, introducing two critical values $\underline{c}$ and $\bar{c}$ so that
$$\int_{\underline{c}}^{\bar{c}} f(v)\,dv = \psi \quad\text{and}\quad \int_{\underline{c}}^{\bar{c}} v f(v)\,dv = 0. \tag{2.2}$$
This can also be written as $\tau_0 = \psi$ and $\tau_1 = 0$, where $\tau_k$ are the truncated moments
$$\tau_k = \int_{\underline{c}}^{\bar{c}} v^k f(v)\,dv \quad\text{for } k \in \mathbb{N}_0. \tag{2.3}$$
If $f$ is symmetric we find $c = \bar{c} = -\underline{c}$ and $\tau_{2k+1} = 0$ for $k \in \mathbb{N}_0$. Observations are retained based on $(\hat\beta_{n0}, \hat\sigma_{n0}^2)$ if their residuals $y_i - \hat\beta_{n0}'X_i$ are in the interval $[\underline{c}\,\hat\sigma_{n0},\ \bar{c}\,\hat\sigma_{n0}]$ and otherwise deleted from the sample.
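For a symmetric density the two conditions in (2.2) reduce to a single equation for $c$, which can be solved by bisection. The following sketch, for the standard normal density and using only the Python standard library, is our illustration and not part of the paper.

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cutoff_symmetric(psi, lo=0.0, hi=10.0, tol=1e-12):
    """Solve Phi(c) - Phi(-c) = psi by bisection.  For a symmetric f
    the second condition tau_1 = 0 holds automatically, so a single
    cut-off c with c_bar = -c_underline = c suffices."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Phi(mid) - Phi(-mid) < psi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = cutoff_symmetric(0.95)  # roughly the familiar 1.96
```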
The Huber-skip, $\hat\beta_{nH}$, is defined by minimizing
$$\tfrac{1}{2}\sum_{i=1}^{n}\bigl[(y_i - X_i'\beta)^2 1_{(\underline{c}\sigma \le y_i - X_i'\beta \le \bar{c}\sigma)} + \underline{c}^2 1_{(y_i - X_i'\beta \le \underline{c}\sigma)} + \bar{c}^2 1_{(\bar{c}\sigma \le y_i - X_i'\beta)}\bigr],$$
for a given $\sigma$. If the minimum is attained at a point of differentiability of the objective function, then the solution solves the equation
$$\hat\beta_{nH} = \Bigl(\sum_{i=1}^{n} X_iX_i' 1_{(\underline{c}\sigma \le y_i - X_i'\hat\beta_{nH} \le \bar{c}\sigma)}\Bigr)^{-1} \sum_{i=1}^{n} X_iy_i 1_{(\underline{c}\sigma \le y_i - X_i'\hat\beta_{nH} \le \bar{c}\sigma)} = g_n(\hat\beta_{nH}). \tag{2.4}$$
We apply this to propose a sequence of recursively defined estimators $(\hat\beta_{nm}, \hat\sigma_{nm}^2)$ by starting with $(\hat\beta_{n0}, \hat\sigma_{n0}^2)$ and defining for $m, n = 1, 2, \dots$
$$S_{n,m-1} = \{i:\ \underline{c}\,\hat\sigma_{n,m-1} \le y_i - X_i'\hat\beta_{n,m-1} \le \bar{c}\,\hat\sigma_{n,m-1}\}, \tag{2.5}$$
$$\hat\beta_{nm} = \Bigl(\sum_{i\in S_{n,m-1}} X_iX_i'\Bigr)^{-1} \sum_{i\in S_{n,m-1}} X_iy_i, \tag{2.6}$$
$$\hat\sigma_{nm}^2 = \psi\tau_2^{-1}\Bigl(\sum_{i\in S_{n,m-1}} 1\Bigr)^{-1} \sum_{i\in S_{n,m-1}} (y_i - X_i'\hat\beta_{nm})^2. \tag{2.7}$$
Thus, the iterated one-step Huber-skip estimators $\hat\beta_{nm}$ and $\hat\sigma_{nm}^2$ are the least squares estimators of $y_i$ on $X_i$ among the observations in $S_{n,m-1}$, retained based upon $\hat\beta_{n,m-1}$ and $\hat\sigma_{n,m-1}^2$. The bias correction factor $\psi\tau_2^{-1}$ in $\hat\sigma_{nm}^2$ is needed to obtain consistency.
Note that if $\hat\beta_{n,m-1}$ and $\hat\sigma_{n,m-1}$ are regression- and scale-equivariant, then the updated estimators $\hat\beta_{nm}$ and $\hat\sigma_{nm}$ are also regression- and scale-equivariant. Indeed, if $y_i$ is replaced by $sy_i + X_i'd$ for all $i$, for a scalar $s > 0$ and a vector $d$, then $\hat\beta_{n,m-1}$ and $\hat\sigma_{n,m-1}$ are replaced by $s\hat\beta_{n,m-1} + d$ and $s\hat\sigma_{n,m-1}$, so that the sets $S_{n,m-1}$ are unaltered, which in turn leads to regression- and scale-equivariance of $\hat\beta_{nm}$ and $\hat\sigma_{nm}$.
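The iteration (2.5)-(2.7) is straightforward to implement. The sketch below is ours, not the authors' code: it assumes a symmetric standard Gaussian reference density, for which the cut-off solves $P(|\varepsilon|\le c)=\psi$ and $\tau_2 = \psi - 2c\varphi(c)$, and the function name and defaults are our own choices.

```python
import numpy as np

def iterate_huber_skip(y, X, beta0, sigma0, psi=0.95, iters=20):
    """Iterated one-step Huber-skip of (2.5)-(2.7), sketched for a
    symmetric standard Gaussian reference density."""
    from math import erf, exp, pi, sqrt
    Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
    lo, hi = 0.0, 10.0                      # bisection for the cut-off c
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) - Phi(-mid) < psi else (lo, mid)
    c = 0.5 * (lo + hi)
    tau2 = psi - 2 * c * exp(-0.5 * c * c) / sqrt(2 * pi)
    beta, sigma = np.asarray(beta0, dtype=float), float(sigma0)
    for _ in range(iters):
        keep = np.abs(y - X @ beta) <= c * sigma         # the set S, (2.5)
        Xs, ys = X[keep], y[keep]
        beta = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)     # (2.6)
        resid = ys - Xs @ beta
        sigma = np.sqrt(psi / tau2 * (resid @ resid) / keep.sum())  # (2.7)
    return beta, sigma
```

Starting, for instance, from the full-sample least squares fit, the iteration typically settles within a handful of steps when $\Gamma$ is a contraction.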

2.1. Asymptotic Results

To obtain asymptotic results we need a normalisation matrix $N$ for the regressors. If $X_i$ is stationary then $N = n^{-1/2}I_p$. If $X_i$ is trending, a different normalisation is needed: for a linear trend component the normalisation is $n^{-3/2}$ and for a random walk component it is $n^{-1}$. We assume that $N$ has been chosen such that matrices $\Sigma$ and $\mu$ exist for which
$$\hat\Sigma_n = N\sum_{i=1}^{n} X_iX_i'N \overset{D}{\to} \Sigma \overset{a.s.}{>} 0, \qquad \hat\mu_n = n^{-1/2}N\sum_{i=1}^{n} X_i \overset{D}{\to} \mu.$$
Note that $\Sigma$ and $\mu$ may be stochastic, as for instance when $X_i$ is a random walk and $N = n^{-1}$.
The estimation errors are denoted
$$\hat{u}_{nm} = \begin{pmatrix} N^{-1}(\hat\beta_{nm} - \beta) \\ n^{1/2}(\hat\sigma_{nm} - \sigma) \end{pmatrix}, \tag{2.8}$$
and the recursion defined in (2.5), (2.6), and (2.7) can be expressed as
$$\hat{u}_{nm} = G_n(\hat{u}_{n,m-1}). \tag{2.9}$$
We introduce coefficient matrices
$$\hat\Psi_{n1} = \begin{pmatrix} \psi\hat\Sigma_n & 0 \\ 0 & 2\tau_2 \end{pmatrix}, \qquad \hat\Psi_{n2} = \begin{pmatrix} \xi_1\hat\Sigma_n & \xi_2\hat\mu_n \\ \zeta_2\hat\mu_n' & \zeta_3 \end{pmatrix}, \tag{2.10}$$
where
$$\xi_k = \bar{c}^{\,k} f(\bar{c}) - \underline{c}^{\,k} f(\underline{c}), \quad k = 0,\dots,3, \qquad \zeta_k = \xi_k - \xi_{k-2}\tau_2/\psi, \quad k = 2, 3, \tag{2.11}$$
and $\tau_2$ is defined in (2.3), and define
$$\hat\Gamma_n = \hat\Psi_{n1}^{-1}\hat\Psi_{n2} = \begin{pmatrix} \psi^{-1}\xi_1 I_p & \psi^{-1}\xi_2\hat\Sigma_n^{-1}\hat\mu_n \\ (2\tau_2)^{-1}\zeta_2\hat\mu_n' & (2\tau_2)^{-1}\zeta_3 \end{pmatrix}. \tag{2.12}$$
Here $(\hat\Gamma_n, \hat\Psi_{n1}, \hat\Psi_{n2}) \overset{D}{\to} (\Gamma, \Psi_1, \Psi_2)$, where the limits are defined similarly in terms of $\Sigma$ and $\mu$.
When $f$ is symmetric we let $c = \bar{c} = -\underline{c}$ and find $\zeta_2 = \xi_2 = 0$, so that $\Gamma$ is diagonal. Moreover, from $\xi_{2k+1} = 2c^{2k+1}f(c)$ we find $\xi_1/\psi = 2cf(c)/\psi$ and $\zeta_3/(2\tau_2) = c^3f(c)/\tau_2 - cf(c)/\psi$, and therefore $\Gamma = \mathrm{diag}\{2cf(c)\psi^{-1}I_p,\ cf(c)(c^2/\tau_2 - 1/\psi)\}$.
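As an illustration of these formulas (ours, not the paper's), the two diagonal entries of $\Gamma$ can be evaluated for the standard normal density; for, say, $\psi = 0.95$ both lie well below one, so $\Gamma$ is a contraction.

```python
import math

def gamma_factors(psi):
    """Diagonal entries of Gamma in the symmetric standard Gaussian case:
    xi_1/psi = 2*c*f(c)/psi and zeta_3/(2*tau_2) = c*f(c)*(c^2/tau_2 - 1/psi),
    where c solves P(|eps| <= c) = psi and tau_2 = psi - 2*c*f(c)."""
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    lo, hi = 0.0, 10.0                    # bisection for the cut-off c
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) - Phi(-mid) < psi else (lo, mid)
    c = 0.5 * (lo + hi)
    f_c = math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)
    tau2 = psi - 2 * c * f_c              # Gaussian truncated second moment
    return 2 * c * f_c / psi, c * f_c * (c * c / tau2 - 1 / psi)
```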
Finally, we define a kernel
$$K_n = \hat\Psi_{n1}^{-1}\sum_{i=1}^{n} \begin{pmatrix} NX_i\varepsilon_i \\ n^{-1/2}(\varepsilon_i^2 - \sigma^2\tau_2/\psi) \end{pmatrix} 1_{(\underline{c}\sigma \le \varepsilon_i \le \bar{c}\sigma)}. \tag{2.13}$$
The analysis of the one-step estimator in Johansen and Nielsen [6] shows that, by linearising $G_n$, the one-step estimation errors $\hat{u}_{nm}$ satisfy the recursion equation
$$\hat{u}_{nm} = G_n(\hat{u}_{n,m-1}) = \hat\Gamma_n\hat{u}_{n,m-1} + K_n + R_n(\hat{u}_{n,m-1}), \tag{2.14}$$
for some remainder term $R_n(\hat{u}_{n,m-1})$. The notation emphasizes that the remainder term is a function of the previous estimation error $\hat{u}_{n,m-1}$; see Lemma 5.1 in the Appendix for a precise formulation.
It will be shown in Section 3 that if $\max|\mathrm{eigen}(\Gamma)| < 1$ a.s., so that $\Gamma$ is a contraction, then
$$\hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n \overset{P}{\to} 0 \quad\text{for } (m,n)\to\infty,$$
that is, for any $\eta, \epsilon > 0$ there exist $m_0$ and $n_0$ such that for $m \ge m_0$ and $n \ge n_0$ it holds that
$$\mathrm{P}\bigl\{|\hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n| \ge \eta\bigr\} \le \epsilon.$$
We therefore define $\hat{u}_n^* = (I_{p+1} - \hat\Gamma_n)^{-1}K_n$ and note that it satisfies the equation
$$\hat{u}_n^* = \hat\Gamma_n\hat{u}_n^* + K_n, \tag{2.15}$$
and in this sense the estimation error of $(\beta, \sigma)$ has the same limit distribution as the fixed point of the linear function $u \mapsto \hat\Gamma_n u + K_n$.
Moreover, it follows from Johansen and Nielsen [19] that, for the case of known $\sigma = 1$ and symmetric density, the Huber-skip has the stochastic expansion
$$N^{-1}(\hat\beta_{nH} - \beta) = (I_p, 0)(I_{p+1} - \hat\Gamma_n)^{-1}K_n + o_P(1)$$
and hence the same asymptotic distribution as $(I_p, 0)\hat{u}_n^*$. It then also holds that
$$n^{1/2}(\hat\beta_{nH} - \hat\beta_{nm}) \overset{P}{\to} 0 \quad\text{for } (m,n)\to\infty.$$
The asymptotic distribution of $K_n$, and therefore of $\hat{u}_n^*$, is discussed in Section 4.

2.2. Assumptions for the Asymptotic Analysis

The assumptions are fairly general; in particular we do not assume that $f$ is symmetric.

Assumption A. Consider model (2.1). Assume:

(i) The density $f$ has continuous derivative $f'$ and satisfies
(a) $\sup_{v\in\mathbb{R}}\{(1 + v^4)f(v) + (1 + v^2)|f'(v)|\} < \infty$;
(b) it has mean zero, variance one, and finite fourth moment;
(c) $\bar{c}, \underline{c}$ are chosen so that $\tau_0 = \psi$ and $\tau_1 = 0$.

(ii) For a suitable normalization matrix $N \to 0$, the regressors satisfy, jointly,
(a) $\hat\Sigma_n = N\sum_{i=1}^{n} X_iX_i'N \overset{D}{\to} \Sigma \overset{a.s.}{>} 0$;
(b) $\hat\mu_n = n^{-1/2}N\sum_{i=1}^{n} X_i \overset{D}{\to} \mu$;
(c) $\max_{i\le n} \mathrm{E}|n^{1/2}NX_i|^4 = O(1)$.

(iii) The initial estimator error satisfies
$$\bigl\{N^{-1}(\hat\beta_{n0} - \beta),\ n^{1/2}(\hat\sigma_{n0} - \sigma)\bigr\} = O_P(1).$$

3. The Fixed Point Result

The fixed point result is primarily a tightness result. Thus, for the moment, only tightness of the kernel $K_n$ is needed, and it is not necessary to establish the limit distribution, which is discussed in Section 4. The first result is a tightness result for the kernel, see (2.13).
Theorem 3.1. Suppose Assumption A(ib, iic) holds. Then $K_n$, see (2.10) and (2.13), is tight, that is,
$$K_n = \hat\Psi_{n1}^{-1}\sum_{i=1}^{n} \begin{pmatrix} NX_i\varepsilon_i \\ n^{-1/2}(\varepsilon_i^2 - \sigma^2\tau_2/\psi) \end{pmatrix} 1_{(\underline{c}\sigma \le \varepsilon_i \le \bar{c}\sigma)} = O_P(1).$$
The proof follows from Chebyshev’s inequality and the details are given in the appendix.
The next result concerns one step of the iteration (2.14); it shows that the remainder term $R_n(u)$ in (2.14) vanishes in probability uniformly in $|u| \le U$.
Theorem 3.2. Let $m$ be fixed. Suppose Assumption A holds for the initial estimator $\hat{u}_{n,m-1}$, see (2.8). Then, for all $U > 0$, it holds that
$$\hat{u}_{nm} = \hat\Gamma_n\hat{u}_{n,m-1} + K_n + R_n(\hat{u}_{n,m-1}),$$
where the remainder term satisfies
$$\sup_{|u|\le U} |R_n(u)| = o_P(1).$$
The proof involves a chaining argument that was given in Johansen and Nielsen [6], although there the result was written up in a slightly different way as discussed in the appendix.
The iterated estimators start with an initial estimator $(\hat\beta_{n0}, \hat\sigma_{n0})$ with tight estimation error, see Assumption A(iii). This is iterated through the one-step equation (2.14), defining the sequence of estimation errors $\hat{u}_{nm}$. We next show that this sequence is tight uniformly in $m$.
Theorem 3.3. Suppose Assumption A holds and that $\max|\mathrm{eigen}(\Gamma)| < 1$ a.s., so that $\Gamma$ is a contraction. Then the sequence of estimation errors $\hat{u}_{nm}$ is tight uniformly in $m$,
$$\sup_{0\le m<\infty} |\hat{u}_{nm}| = O_P(1).$$
That is, for all $\epsilon > 0$ there exist $U > 0$ and $n_0 > 0$ such that for all $n \ge n_0$ it holds that
$$\mathrm{P}\bigl(\sup_{0\le m<\infty} |\hat{u}_{nm}| > U\bigr) < \epsilon.$$
The proof is given in the appendix, but the idea is to write the solution of the recursive relation (2.14) as
$$\hat{u}_{nm} = \hat\Gamma_n^m\hat{u}_{n0} + \sum_{\ell=1}^{m}\hat\Gamma_n^{\ell-1}\{K_n + R_n(\hat{u}_{n,m-\ell})\}. \tag{3.1}$$
Then, if the initial estimator $\hat{u}_{n0}$ takes values in a large compact set with large probability, it follows from (3.1), by finite induction, that $\hat{u}_{nm}$ takes values in the same compact set for all $m$, and therefore $\hat{u}_{nm}$ is tight uniformly in $m$.
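The mechanics of the contraction argument can be illustrated with a toy linear recursion (the numbers below are arbitrary, not from the paper): ignoring the remainder term, iterating $u_m = \Gamma u_{m-1} + K$ with a contraction $\Gamma$ converges geometrically to the fixed point $(I - \Gamma)^{-1}K$.

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = 3                                   # dimension p + 1
Gamma = 0.4 * np.eye(p1)                 # toy contraction, eigenvalues 0.4
K = rng.normal(size=p1)                  # plays the role of the kernel K_n
u_star = np.linalg.solve(np.eye(p1) - Gamma, K)   # fixed point (I - Gamma)^{-1} K

u = rng.normal(size=p1) * 10.0           # arbitrary (tight) starting error
for m in range(50):
    u = Gamma @ u + K                    # one linearised iteration step
# u is now numerically indistinguishable from u_star: the distance to the
# fixed point shrinks by the factor 0.4 in every step
```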
Finally we give the fixed point result. Theorem 3.4 shows that the estimator has the same limit distribution as the solution of equation (2.15), $\hat{u}_n^* = (I_{p+1} - \hat\Gamma_n)^{-1}K_n$, which is a fixed point of the linear function $u \mapsto \hat\Gamma_n u + K_n$.

Theorem 3.4. Suppose Assumption A holds and that $\max|\mathrm{eigen}(\Gamma)| < 1$ a.s., so that $\Gamma$ is a contraction. Then
$$\hat{u}_{nm} - \hat{u}_n^* = \hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n \overset{P}{\to} 0 \quad\text{for } (m,n)\to\infty.$$
That is, for all $\epsilon, \eta > 0$, an $n_0 > 0$ and $m_0 > 0$ exist such that for all $n \ge n_0$ and $m \ge m_0$ it holds that
$$\mathrm{P}\bigl\{|\hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n| > \eta\bigr\} < \epsilon.$$
Using $\sum_{\ell=1}^{m}\hat\Gamma_n^{\ell-1} = (I_{p+1} - \hat\Gamma_n)^{-1}(I_{p+1} - \hat\Gamma_n^m)$ we find from (3.1) that
$$\hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n = \hat\Gamma_n^m\bigl\{\hat{u}_{n0} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n\bigr\} + \sum_{\ell=1}^{m}\hat\Gamma_n^{\ell-1}R_n(\hat{u}_{n,m-\ell}). \tag{3.2}$$
From (3.2) it can be seen that $|\hat{u}_{nm} - (I_{p+1} - \hat\Gamma_n)^{-1}K_n|$ is the sum of two terms vanishing in probability, where the first decreases exponentially. The details are given in the Appendix.
In the special case where $\sigma$ is known, $\hat{u}_{nm}$ reduces to $\hat{b}_{nm} = N^{-1}(\hat\beta_{nm} - \beta)$ and $\Gamma = \psi^{-1}\xi_1 I_p$, and $\hat\beta_{nH}$ becomes a fixed point of the mapping $g_n$ defined in (2.4). The estimator $\hat{b}_n^* = (\psi - \xi_1)^{-1}\hat\Sigma_n^{-1}\sum_{i=1}^{n} NX_i\varepsilon_i 1_{(\underline{c}\sigma < \varepsilon_i \le \bar{c}\sigma)}$ appears as the leading term for other robust estimators, such as the least trimmed squares estimator discussed later on.
A necessary condition for the result is that the autoregressive coefficient matrix $\Gamma$ is a contraction, so we analyse $\Gamma$ next.
Theorem 3.5. The autoregressive coefficient matrix $\Gamma$ in (2.12) has $p-1$ eigenvalues equal to $\xi_1/\psi$ and two eigenvalues solving
$$\lambda^2 - \Bigl(\frac{\zeta_3}{2\tau_2} + \frac{\xi_1}{\psi}\Bigr)\lambda + \frac{1}{2\tau_2\psi}\bigl(\zeta_3\xi_1 - \zeta_2\xi_2\,\mu'\Sigma^{-1}\mu\bigr) = 0,$$
where the coefficients $\zeta_k$ and $\xi_k$ are given in (2.11).
Further results can be given about the eigenvalues of $\Gamma$ for symmetric densities, where $\xi_2 = 0$ and $\Gamma = \mathrm{diag}\{\xi_1\psi^{-1}I_p,\ \zeta_3/(2\tau_2)\}$. Note that the quantities $(c, \tau_k, \xi_k, \zeta_k)$ all depend on $\psi$, see (2.2), (2.3), and (2.11). If $f$ is symmetric, we show below, in $(a)$, that $\xi_1 < \psi$, and a condition, $(c)$, is given for $\zeta_3 < 2\tau_2$, in which case the eigenvalues of $\Gamma$ are less than one and $\Gamma$ is a contraction. Finally, $(d)$ shows that $\Gamma$ is a contraction if $f$ is log-concave.
Theorem 3.6. Suppose $f$ is symmetric with finite third moment, $f'(c) \le 0$ for $c > 0$, and $\lim_{c\downarrow 0} f'(c) < 0$. Then
$(a)$ $0 < \xi_1/\psi < 1$ for $0 < \psi < 1$, while $\lim_{\psi\to 0}\xi_1/\psi = 1$ and $\lim_{\psi\to 1}\xi_1/\psi = 0$;
$(b)$ $0 < \zeta_3/(2\tau_2)$ for $0 < \psi < 1$, and $\lim_{\psi\to 0}\zeta_3/(2\tau_2) = 1$ and $\lim_{\psi\to 1}\zeta_3/(2\tau_2) = 0$;
$(c)$ if $[c\{\log\int_0^c f(x)dx\}']' < 0$ for $c > 0$, then $\zeta_3/(2\tau_2) < 1$ for $0 < \psi < 1$;
$(d)$ $\{\log f(c)\}'' < 0 \ \Rightarrow\ [c\{\log f(c)\}']' < 0 \ \Rightarrow\ [c\{\log\int_0^c f(x)dx\}']' < 0$.
The condition $[c\{\log\int_0^c f(x)dx\}']' < 0$ is satisfied by the Gaussian density, which is log-concave, and by t-densities, which are not log-concave but satisfy $[c\{\log f(c)\}']' < 0$. In the robust statistics literature, Rousseeuw uses the condition $[c\{\log f(c)\}']' < 0$ when discussing change-of-variance curves for M-estimators and assumes log-concave densities [18].
A consequence of Theorem 3.6 is that if $f$ is symmetric, the roots of the coefficient matrix $\Gamma$ are bounded away from unity for $\psi_0 \le \psi \le 1$, for any $\psi_0 > 0$. The uniform distribution on $[-a, a]$ provides an example where $\Gamma$ is not a contraction, since in this situation $\xi_1 = \psi$ over the entire support. However, the weak unimodality condition $f'(c) \le 0$ in Theorem 3.6 is not necessary, as long as the mode at the origin is large in comparison with other modes.

4. Distribution of the Kernel

It follows from Theorem 3.4 that $\hat{u}_n^* = (I_{p+1} - \hat\Gamma_n)^{-1}K_n$ has the same limit as $\hat{u}_{nm}$, and we therefore find the limit distribution of the kernel $K_n$ in a few situations.

4.1. Stationary Case

Suppose the regressors are a stationary time series. Then the limits $\Sigma$ and $\mu$ in Assumption A(iia, iib) are deterministic and $(\hat\Sigma_n, \hat\mu_n) \overset{P}{\to} (\Sigma, \mu)$. The central limit theorem then shows that
$$K_n \overset{D}{\to} \mathrm{N}_{p+1}(0, \Phi), \tag{4.1}$$
where
$$\Phi = \begin{pmatrix} \psi^{-2}\sigma^2\tau_2\Sigma^{-1} & (2\psi\tau_2)^{-1}\sigma^3\tau_3\Sigma^{-1}\mu \\ (2\psi\tau_2)^{-1}\sigma^3\tau_3\mu'\Sigma^{-1} & 4^{-1}\tau_2^{-2}\sigma^4(\tau_4 - \tau_2^2\psi^{-1}) \end{pmatrix}. \tag{4.2}$$
As a consequence, the fully iterated estimator has limit distribution
$$\hat{u}_n^* = (I_{p+1} - \hat\Gamma_n)^{-1}K_n \overset{D}{\to} (I_{p+1} - \Gamma)^{-1}\mathrm{N}_{p+1}(0, \Phi). \tag{4.3}$$
In the special case where the errors are symmetric, we find
$$N^{-1}(\hat\beta_n^* - \beta) = \frac{1}{\psi - \xi_1}\,\Sigma^{-1}\sum_{i=1}^{n} NX_i\varepsilon_i 1_{(|\varepsilon_i|\le\sigma c)} + o_P(1) \ \overset{D}{\to}\ \mathrm{N}_p\bigl\{0,\ \sigma^2\tau_2(\psi - \xi_1)^{-2}\Sigma^{-1}\bigr\}, \tag{4.4}$$
$$n^{1/2}(\hat\sigma_n^{*2} - \sigma^2)\tau_2/\psi = \bigl\{1 - \zeta_3(2\tau_2)^{-1}\bigr\}^{-1}\sum_{i=1}^{n} n^{-1/2}(\varepsilon_i^2 - \sigma^2\tau_2\psi^{-1})1_{(|\varepsilon_i|\le\sigma c)} + o_P(1) \ \overset{D}{\to}\ \mathrm{N}\bigl\{0,\ 4\sigma^4\tau_2^2(\tau_4 - \psi^{-1}\tau_2^2)(2\tau_2 - \zeta_3)^{-2}\bigr\},$$
noting that $\psi > \xi_1$ and $2\tau_2 > \zeta_3$ are satisfied for symmetric, unimodal distributions by Theorem 3.6.
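One practical implication (our illustration, not a computation from the paper): under standard Gaussian errors, where least squares is efficient, the asymptotic variance of $\hat\beta_n^*$ exceeds that of least squares by the factor $\tau_2/(\psi - \xi_1)^2$, which for $\psi = 0.95$ is about 1.39, a 39% variance inflation as the price of robustness.

```python
import math

def efficiency_loss(psi):
    """Variance inflation tau_2/(psi - xi_1)^2 of the iterated Huber-skip
    relative to OLS, evaluated for standard Gaussian errors; psi is the
    retained fraction.  Our sketch, not the authors' code."""
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    lo, hi = 0.0, 10.0                  # bisection for the cut-off c
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) - Phi(-mid) < psi else (lo, mid)
    c = 0.5 * (lo + hi)
    f_c = math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)
    xi1 = 2 * c * f_c                   # xi_1 = 2 c f(c)
    tau2 = psi - 2 * c * f_c            # Gaussian truncated second moment
    return tau2 / (psi - xi1) ** 2
```

As $\psi \to 1$ the factor tends to one, recovering full least squares efficiency.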
The limiting distribution of $N^{-1}(\hat\beta_n^* - \beta)$ is also seen elsewhere in the robust statistics literature.
First, Víšek [15] (Theorem 1, p. 215) analysed the least trimmed squares estimator of Rousseeuw [13]. The estimator is given by
$$\hat\beta_{nLTS} = \arg\min_{\beta\in\mathbb{R}^p} \sum_{i=1}^{\mathrm{int}(n\psi)} r_{(i)}^2(\beta),$$
where $r_{(1)}^2(\beta) \le \dots \le r_{(n)}^2(\beta)$ are the ordered squared residuals $r_i = y_i - X_i'\beta$. The estimator has the property that it does not depend on the scale of the problem. Víšek argued that in the symmetric case, the least trimmed squares estimator satisfies
$$N^{-1}(\hat\beta_{nLTS} - \beta) = \frac{1}{\psi - \xi_1}\,\Sigma^{-1}\sum_{i=1}^{n} NX_i\varepsilon_i 1_{(|\varepsilon_i|\le c\sigma)} + o_P(1),$$
that is, the main term is the same as for $\hat\beta_n^*$. Because $\hat\beta_{nLTS}$ and $\hat\beta_n^*$ have the same expansions, it follows from Theorem 3.4 that
$$|N^{-1}(\hat\beta_{nm} - \hat\beta_{nLTS})| \overset{P}{\to} 0$$
for $(m,n)\to\infty$. Thus $\hat\beta_{nm}$ can be seen as an approximation to the LTS estimator when there are no outliers.
Second, Jurečková, Sen, and Picek [4] (Theorem 5.5, p. 176) considered a pure location problem with regressor $X_i = 1$ and known $\sigma = 1$, and found an asymptotic expansion like (4.4) for the Huber-skip, and Johansen and Nielsen [19] showed the similar result for the general regression model. A consequence is that the iterated one-step Huber-skip has the same limit distribution as the Huber-skip, and because $\hat\beta_{nm}$ and $\hat\beta_{nH}$ have the same expansion, it follows from Theorem 3.4 that
$$n^{1/2}|\hat\beta_{nm} - \hat\beta_{nH}| \overset{P}{\to} 0 \quad\text{for } (m,n)\to\infty, \tag{4.5}$$
so the iterated estimator is in this sense an approximation to the Huber-skip.

4.2. Deterministic Trends

As a simple example with i.i.d. errors, consider the regression
$$y_i = \beta_1 + \beta_2 i + \varepsilon_i,$$
where $\varepsilon_i \in \mathbb{R}$ satisfies Assumption A(i). Define the normalisation
$$N = \begin{pmatrix} n^{-1/2} & 0 \\ 0 & n^{-3/2} \end{pmatrix}.$$
Then Assumption A(ii) is met with $X_i = (1, i)'$ and
$$\Sigma = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{pmatrix}, \qquad \mu = \begin{pmatrix} 1 \\ 1/2 \end{pmatrix}, \tag{4.6}$$
and $\max_{i\le n}\mathrm{E}|n^{1/2}NX_i|^4 \le 4$. The kernel has a limit distribution given by (4.1), where the matrix $\Phi$ in (4.2) is computed in terms of the $\Sigma$ and $\mu$ derived in (4.6).
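The limits in (4.6) are easy to verify numerically; the quick check below is ours.

```python
import numpy as np

# With N = diag(n^{-1/2}, n^{-3/2}) and X_i = (1, i)', the normalised
# moment matrices approach Sigma and mu of (4.6) as n grows.
n = 10_000
i = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), i])
N = np.diag([n ** -0.5, n ** -1.5])
Sigma_n = N @ X.T @ X @ N                # ~ [[1, 1/2], [1/2, 1/3]]
mu_n = n ** -0.5 * N @ X.sum(axis=0)     # ~ (1, 1/2)'
```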
If the errors are autoregressive, the derivation is in principle similar, but involves a notationally tedious detrending argument. The argument is similar to that of Johansen and Nielsen [6] (Section 1.5.1), and (4.5) holds.

4.3. Unit Roots

Consider as an example the autoregression $y_i = \beta y_{i-1} + \varepsilon_i$, $i = 1,\dots,n$. If $\beta = 1$ then $X_i = y_{i-1} = y_0 + \sum_{s=1}^{i-1}\varepsilon_s$ and we have to choose $N = n^{-1}$. By the functional central limit theorem,
$$n^{-1/2}\sum_{i=1}^{\mathrm{int}(nu)} \begin{pmatrix} \varepsilon_i \\ \varepsilon_i 1_{(\underline{c}\sigma \le \varepsilon_i \le \bar{c}\sigma)} \\ (\varepsilon_i^2 - \sigma^2\tau_2/\psi)1_{(\underline{c}\sigma \le \varepsilon_i \le \bar{c}\sigma)} \end{pmatrix} \overset{D}{\to} \begin{pmatrix} W_{x,u} \\ W_{1,u} \\ W_{2,u} \end{pmatrix},$$
where the limit is a Brownian motion with zero mean and variance
$$\Phi_W = \begin{pmatrix} \sigma^2 & \sigma^2\tau_2 & \sigma^3\tau_3 \\ \sigma^2\tau_2 & \sigma^2\tau_2 & \sigma^3\tau_3 \\ \sigma^3\tau_3 & \sigma^3\tau_3 & \sigma^4(\tau_4 - \tau_2^2/\psi) \end{pmatrix}.$$
Thus the limit variables $\Sigma$ and $\mu$ in Assumption A(ii) are
$$\Sigma = \int_0^1 W_{x,u}^2\,du, \qquad \mu = \int_0^1 W_{x,u}\,du,$$
while the kernel has limit distribution
$$K_n \overset{D}{\to} \Psi_1^{-1}\begin{pmatrix} \int_0^1 W_{x,u}\,dW_{1,u} \\ W_{2,1} \end{pmatrix},$$
and (4.5) holds. Thus, when the density of $\varepsilon_i$ is symmetric, $\hat\beta_n^*$ has limit distribution
$$n(\hat\beta_n^* - \beta) \overset{D}{\to} \frac{\int_0^1 W_{x,u}\,dW_{1,u}}{(\psi - \xi_1)\int_0^1 W_{x,u}^2\,du}.$$
When $\psi \to 1$ then $\xi_1 \to 0$ and $\tau_2 \to 1$, so $W_{1,u}$ and $W_{x,u}$ become identical and the limit distribution becomes the usual Dickey–Fuller distribution. See also Johansen and Nielsen [6] (Section 1.5.4) for a related and more detailed derivation.

5. Discussion of Possible Extensions

The iteration result in Theorem 3.4 has a variety of extensions. An issue of interest in the literature is whether a slow initial convergence rate can be improved upon through iteration. This would open the door to using robust estimators converging, for instance, at an $n^{1/3}$ rate as initial estimators. Such a result would complement the result of He and Portnoy, who find that the convergence rate cannot be improved in a single step by this procedure that applies least squares to the retained observations [20].
The key is to show that the remainder term of the one-step estimator in Theorem 3.2 remains small in an appropriately larger neighbourhood. The proof of Theorem 3.4 then applies the same way leading to the same fixed point result. The necessary techniques are developed by Johansen and Nielsen [21].
A related algorithm is the Forward Search of Atkinson, Riani, and Cerioli [7,22]. This involves finding an initial set of "good" observations, using for instance the least trimmed squares estimator of Rousseeuw [13], and then increasing the number of "good" observations using a recursive test procedure. The algorithm involves iteration of one-step Huber-skip estimators, see Johansen and Nielsen [23]. Again the key to its analysis is to improve Theorem 3.2, in this instance to hold uniformly in the cut-off fraction $\psi$, see Johansen and Nielsen for details [21].
Another topic of interest would be to analyse algorithms such as Autometrics of Hendry and Krolzig [24] and Doornik [25], which involve selection over observations as well as regressors.
In practice it is not a trivial matter to compute the least trimmed squares estimator of Rousseeuw [13]. A number of algorithms have been suggested in the literature, see for instance Hawkins and Olive [26]. Algorithms based on a “concentration” approach start with an initial trial fit that is iterated towards a final fit. It is possible that the abovementioned results will extend to shed some further light on the properties of such resampling algorithms.

Acknowledgments

The authors would like to thank the two referees for their useful comments. Søren Johansen is grateful to CREATES—Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. Bent Nielsen gratefully acknowledges financial support from the Programme of Economic Modelling, Oxford.

References

  1. P.J. Huber. “Robust estimation of a location parameter.” Ann. Math. Stat. 35 (1964): 73–101. [Google Scholar]
  2. R.A. Maronna, D.R. Martin, and V.J. Yohai. Robust Statistics: Theory and Methods. New York, NY, USA: Wiley, 2006. [Google Scholar]
  3. P.J. Huber, and E.M. Ronchetti. Robust Statistics, 2nd ed. New York, NY, USA: Wiley, 2009. [Google Scholar]
  4. J. Jurečková, P.K. Sen, and J. Picek. Methodological Tools in Robust and Nonparametric Statistics. London, UK: Chapman & Hall/CRC Press, 2012. [Google Scholar]
  5. D.F. Hendry, S. Johansen, and C. Santos. “Automatic selection of indicators in a fully saturated regression.” Computation. Stat. 23 (2008): 317–335, and Erratum 337-339. [Google Scholar]
  6. S. Johansen, and B. Nielsen. “An analysis of the indicator saturation estimator.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 1–36. [Google Scholar]
  7. A.C. Atkinson, M. Riani, and A. Cerioli. Exploring Multivariate Data with the Forward Search. New York, NY, USA: Springer, 2004. [Google Scholar]
  8. P.J. Bickel. “One-step Huber estimates in the linear model.” J. Am. Statist. Assoc. 70 (1975): 428–434. [Google Scholar] [CrossRef]
  9. D. Ruppert, and R.J. Carroll. “Trimmed least squares estimation in the linear model.” J. Am. Statist. Assoc. 75 (1980): 828–838. [Google Scholar] [CrossRef]
  10. A.H. Welsh, and E. Ronchetti. “A journey in single steps: robust one step M-estimation in linear regression.” J. Stat. Plan. Infer. 103 (2002): 287–310. [Google Scholar] [CrossRef]
  11. G. Cavaliere, and I. Georgiev. Exploiting Infinite Variance Through Dummy Variables in an AR Model. Discussion paper; Lisbon, Portugal: Universidade Nova de Lisboa, 2011. [Google Scholar]
  12. M.B. Dollinger, and R.G. Staudte. “Influence functions of iteratively reweighted least squares estimators.” J. Am. Statist. Assoc. 86 (1991): 709–716. [Google Scholar] [CrossRef]
  13. P.J. Rousseeuw. “Least median of squares regression.” J. Am. Statist. Assoc. 79 (1984): 871–880. [Google Scholar] [CrossRef]
  14. P.J. Rousseeuw, and A.M. Leroy. Robust Regression and Outlier Detection. New Jersey, NJ, USA: Wiley, 1987. [Google Scholar]
  15. J.Á. Víšek. “The least trimmed squares. Part I: Consistency.” Kybernetika 42 (2006): 1–36. [Google Scholar]
  16. J.Á. Víšek. “The least trimmed squares. Part II: √n-consistency.” Kybernetika 42 (2006): 181–202. [Google Scholar]
  17. J.Á. Víšek. “The least trimmed squares. Part III: Asymptotic normality.” Kybernetika 42 (2006): 203–224. [Google Scholar]
  18. P.J. Rousseeuw. “Most robust M-estimators in the infinitesimal sense.” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 61 (1982): 541–551. [Google Scholar] [CrossRef]
  19. S. Johansen, and B. Nielsen. A stochastic expansion of the Huber-skip estimator for regression analysis. discussion paper; Copenhagen, Denmark: University of Copenhagen, work in progress; 2013. [Google Scholar]
  20. X. He, and S. Portnoy. “Reweighted LS estimators converge at the same rate as the initial estimator.” Ann. Stat. 20 (1992): 2161–2167. [Google Scholar]
  21. S. Johansen, and B. Nielsen. Asymptotic analysis of the Forward Search. discussion paper 13-01; Copenhagen, Denmark: University of Copenhagen, 2013. [Google Scholar]
  22. A.C. Atkinson, M. Riani, and A. Cerioli. “The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 117–134. [Google Scholar]
  23. S. Johansen, and B. Nielsen. “Discussion: The forward search: Theory and data analysis.” J. Korean Stat. Soc. 39 (2010): 137–145. [Google Scholar] [CrossRef]
  24. D.F. Hendry, and H.-M. Krolzig. “The properties of automatic Gets modelling.” Economic J. 115 (2005): C32–C61. [Google Scholar] [CrossRef]
  25. J.A. Doornik. “Autometrics.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Edited by J.L. Castle and N. Shephard. Oxford, UK: Oxford University Press, 2009, pp. 88–121. [Google Scholar]
  26. D.M. Hawkins, and D.J. Olive. “Inconsistency of resampling algorithms for high-breakdown regression estimators and a new algorithm.” J. Am. Statist. Assoc. 97 (2002): 136–148. [Google Scholar] [CrossRef]
  27. R.S. Varga. Matrix Iterative Analysis, 2nd ed. Berlin, Germany: Springer, 2000. [Google Scholar]

Appendix

Proof of Theorem 3.1. The process
$$\tilde{K}_n = \sum_{i=1}^{n} \begin{pmatrix} NX_i\varepsilon_i \\ n^{-1/2}(\varepsilon_i^2 - \sigma^2\tau_2/\psi) \end{pmatrix} 1_{(\underline{c}\sigma \le \varepsilon_i \le \bar{c}\sigma)}$$
is a martingale, and we find that
$$\mathrm{E}\tilde{K}_n\tilde{K}_n' = \begin{pmatrix} \sigma^2\tau_2\sum_{i=1}^{n}\mathrm{E}(NX_iX_i'N) & n^{-1/2}\sigma^3\tau_3\sum_{i=1}^{n}\mathrm{E}(NX_i) \\ n^{-1/2}\sigma^3\tau_3\sum_{i=1}^{n}\mathrm{E}(NX_i') & \sigma^4(\tau_4 - \tau_2^2\psi^{-1}) \end{pmatrix}.$$
Due to Assumption A(ii) this is bounded in $n$. Chebyshev's inequality gives $\mathrm{P}(|\tilde{K}_n| > C) \le C^{-2}\mathrm{E}|\tilde{K}_n|^2$. Thus both $\tilde{K}_n$ and $\hat\Psi_{n1}^{-1}$, and hence their product, are tight. ■
The key to proving Theorem 3.2 is to understand the remainder terms of the moment matrices. This was done by Johansen and Nielsen [6]. As that paper was concerned only with the convergence of the one-step estimator, its main Theorem 1.1 simply stated that the remainder terms vanish as $n\to\infty$. A more detailed result can, however, be extracted from the proof. To draw that out, let $b$ and $a$ be the location and scale coordinates of $u = (b', a)'$, and define, for $g_i, h_i \in \{1, X_i, \varepsilon_i\}$, the product moment matrices
$$\tilde{S}_{gh}(u) = \sum_{i=1}^{n} g_ih_i' 1_{\{(\sigma + n^{-1/2}a)\underline{c} \,<\, \varepsilon_i - X_i'Nb \,\le\, (\sigma + n^{-1/2}a)\bar{c}\}}.$$
Lemma 5.1. Suppose Assumption A holds. Define the remainder terms $R_{11}(u)$, $R_{xx}(u)$, $R_{x1}(u)$, $R_{x\varepsilon}(u)$, and $R_{\varepsilon\varepsilon}(u)$ by the equations
$$n^{-1}\tilde S_{11}(u) = \psi + R_{11}(u), \qquad N'\tilde S_{xx}(u)N = \psi\hat\Sigma_n + R_{xx}(u), \qquad n^{-1/2}N'\tilde S_{x1}(u) = \psi\hat\mu_n + R_{x1}(u),$$
$$\begin{pmatrix} N'\tilde S_{x\varepsilon}(u) \\ n^{-1/2}\{\tilde S_{\varepsilon\varepsilon}(u) - \sigma^2\tau_2\psi^{-1}\tilde S_{11}(u)\} \end{pmatrix} = \sum_{i=1}^n \begin{pmatrix} N'X_i\varepsilon_i \\ n^{-1/2}(\varepsilon_i^2 - \sigma^2\tau_2\psi^{-1}) \end{pmatrix} 1_{(\underline c\sigma < \varepsilon_i \le \bar c\sigma)} + \begin{pmatrix} \xi_1\hat\Sigma_n & \xi_2\hat\mu_n \\ \sigma\zeta_2\hat\mu_n' & \sigma\zeta_3 \end{pmatrix}\begin{pmatrix} b \\ a \end{pmatrix} + \begin{pmatrix} R_{x\varepsilon}(u) \\ R_{\varepsilon\varepsilon}(u) \end{pmatrix},$$
where, for notational convenience, the dependence on $n$ in the remainder terms is suppressed. Then, for all $U>0$, it holds as $n\to\infty$ that
$$\sup_{|u|\le U}\{|R_{11}(u)| + |R_{xx}(u)| + |R_{x1}(u)| + |R_{x\varepsilon}(u)| + |R_{\varepsilon\varepsilon}(u)|\} = o_P(1). \tag{5.1}$$
Proof of Lemma 5.1. Theorem 1.1 in Johansen and Nielsen [6] states that $|R_{11}(u)|$, $|R_{xx}(u)|$, $|R_{x1}(u)|$, $|R_{x\varepsilon}(u)|$, $|R_{\varepsilon\varepsilon}(u)|$ vanish when $u$ is evaluated at $\hat u = \{N^{-1}(\hat\beta-\beta), n^{1/2}(\hat\sigma-\sigma)\}$ under the assumption that $\hat u = O_P(1)$ as $n\to\infty$. The proof of that result progresses by noting that the assumption $\hat u = O_P(1)$ means that for all $\epsilon>0$ a $U$ exists so that $\mathsf{P}(|\hat u|\ge U)<\epsilon$, and that it therefore suffices to prove that (5.1) holds. The proof of that theorem thus establishes precisely the statement (5.1), which is the desired result here. ■
Proof of Theorem 3.2. The updated estimator $(\hat\beta_{nm}, \hat\sigma^2_{nm})$ is defined in (2.6) and (2.7) in terms of the initial estimator $(\hat\beta_{n,m-1}, \hat\sigma^2_{n,m-1})$. We express it in terms of $S_{gh} = \tilde S_{gh}(\hat u_{n,m-1})$, where $\hat u_{n,m-1} = \{N^{-1}(\hat\beta_{n,m-1}-\beta), n^{1/2}(\hat\sigma_{n,m-1}-\sigma)\}$, as follows:
$$N^{-1}(\hat\beta_{nm}-\beta) = (N'S_{xx}N)^{-1}N'S_{x\varepsilon},$$
$$n^{1/2}(\hat\sigma^2_{nm}-\sigma^2) = \psi\tau_2^{-1}(n^{-1}S_{11})^{-1}n^{-1/2}\{S_{\varepsilon\varepsilon} - S_{\varepsilon x}N(N'S_{xx}N)^{-1}N'S_{x\varepsilon} - \sigma^2\tau_2\psi^{-1}S_{11}\}.$$
For $\hat u_{n,m-1} = (\hat b_{n,m-1}', \hat a_{n,m-1})'$ we get, by inserting the definitions from Lemma 5.1,
$$\hat b_{nm} = \{\psi\hat\Sigma_n + R_{xx}(\hat u_{n,m-1})\}^{-1}\Big\{\sum_{i=1}^n N'X_i\varepsilon_i\, 1_{(\underline c\sigma < \varepsilon_i \le \bar c\sigma)} + \xi_1\hat\Sigma_n\hat b_{n,m-1} + \xi_2\hat\mu_n\hat a_{n,m-1} + R_{x\varepsilon}(\hat u_{n,m-1})\Big\}.$$
Since $\sum_{i=1}^n N'X_i\varepsilon_i\, 1_{(\underline c\sigma < \varepsilon_i \le \bar c\sigma)}$ is tight by Theorem 3.1, $\hat u_{n,m-1}$ is $O_P(1)$, and the remainder terms vanish by Lemma 5.1 as $n\to\infty$, it follows that
$$\hat b_{nm} = (\psi\hat\Sigma_n)^{-1}\sum_{i=1}^n N'X_i\varepsilon_i\, 1_{(\underline c\sigma < \varepsilon_i \le \bar c\sigma)} + (\psi\hat\Sigma_n)^{-1}(\xi_1\hat\Sigma_n\hat b_{n,m-1} + \xi_2\hat\mu_n\hat a_{n,m-1}) + R_{b,n}(\hat u_{n,m-1}),$$
where $\sup_{|u|\le U}|R_{b,n}(u)| = o_P(1)$. From $n^{1/2}(\hat\sigma^2_{nm}-\sigma^2) = (\hat\sigma_{nm}+\sigma)n^{1/2}(\hat\sigma_{nm}-\sigma) = 2\sigma\hat a_{nm}\{1+o_P(1)\}$ we find that a similar argument shows
$$\hat a_{nm} = (2\sigma\tau_2)^{-1}n^{-1/2}\sum_{i=1}^n(\varepsilon_i^2 - \psi^{-1}\sigma^2\tau_2)1_{(\sigma\underline c < \varepsilon_i \le \sigma\bar c)} + (2\tau_2)^{-1}(\zeta_2\hat\mu_n'\hat b_{n,m-1} + \zeta_3\hat a_{n,m-1}) + R_{a,n}(\hat u_{n,m-1}),$$
where $\sup_{|u|\le U}|R_{a,n}(u)| = o_P(1)$. ■
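The iterated 1-step update analysed above can be illustrated numerically. The following is a minimal sketch, not the paper's exact procedure: it uses a symmetric cut-off $\underline c = -\bar c = -c$, omits the consistency correction for the truncated error variance, and the function name and defaults are ours.

```python
import numpy as np

def iterated_huber_skip(y, X, c=2.576, m=10):
    """Iterate the 1-step Huber-skip update: delete observations whose
    absolute residual exceeds c * sigma_hat, then re-estimate by OLS.
    Minimal sketch with symmetric cut-off; no truncation correction."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # full-sample OLS start
    sigma = np.sqrt(np.sum((y - X @ beta) ** 2) / n)
    for _ in range(m):
        keep = np.abs(y - X @ beta) <= c * sigma       # skip apparent outliers
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        sigma = np.sqrt(np.sum((y[keep] - X[keep] @ beta) ** 2) / keep.sum())
    return beta, sigma
```

Here $c = 2.576$ would retain roughly 99% of Gaussian errors; under the theorems above the iterates converge to a close approximation of the Huber-skip estimator.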
Proof of Theorem 3.3. We want to show that for all $\epsilon>0$ there exist $U>0$ and $n_0$ so that for $n\ge n_0$ it holds that
$$\mathsf{P}\Big(\sup_{0\le m<\infty}|\hat u_{nm}| \le U\Big) \ge 1-\epsilon. \tag{5.2}$$
From the recursion (2.14) we find the representation
$$\hat u_{nm} = \hat\Gamma_n^m\hat u_{n0} + \sum_{\ell=1}^m\hat\Gamma_n^{\ell-1}\{K_n + R_n(\hat u_{n,m-\ell})\}. \tag{5.3}$$
The spectral norm and the Euclidean norm are compatible, $|Mx| \le \|M\|\,|x|$, see Varga [27] (Theorem 1.5). Therefore it holds that
$$|\hat u_{nm}| \le \|\hat\Gamma_n^m\|\,|\hat u_{n0}| + \Big(|K_n| + \max_{0\le\ell\le m-1}|R_n(\hat u_{n\ell})|\Big)\sum_{\ell=1}^m\|\hat\Gamma_n^{\ell-1}\|.$$
By assumption a $\delta$ exists so that the spectral radius satisfies $\max|\mathrm{eigen}(\Gamma)| < \delta < 1$. Because $\hat\Gamma_n \overset{D}{\to} \Gamma$, an $n_0>0$ and a $\delta < \delta_0 < 1$ exist so that for all $n\ge n_0$ we have $\max|\mathrm{eigen}(\hat\Gamma_n)| < \delta_0 < 1$ with probability larger than $1-\epsilon/2$. Then Gelfand's formula, Varga [27] (Theorem 3.4), shows that there is an $m_0>0$ so that $\|\hat\Gamma_n^m\| \le \delta_0^m$ for all $m>m_0$. This in turn implies, for some $c>1$, that $\max_{0\le m<\infty}\|\hat\Gamma_n^m\| \le \sum_{\ell=0}^\infty\|\hat\Gamma_n^\ell\| < c$, and hence
$$|\hat u_{nm}| \le c\Big\{|\hat u_{n0}| + |K_n| + \max_{0\le\ell\le m-1}|R_n(\hat u_{n\ell})|\Big\}. \tag{5.4}$$
Because $\hat u_{n0}$ is assumed to be tight, the sequence $\{K_n\}$ is tight by Theorem 3.1, and $\max_{|u|\le U}|R_n(u)| = o_P(1)$ for any $U$ by Theorem 3.2, constants $U_0 > \eta/2$ and $n_0>0$ exist so that for $n\ge n_0$ the set
$$A_n = \Big(\max|\mathrm{eigen}(\hat\Gamma_n)| < \delta_0\Big)\cap\Big(c|\hat u_{n0}| \le U_0\Big)\cap\Big(c|K_n| \le U_0\Big)\cap\Big(c\max_{|u|\le 3U_0}|R_n(u)| \le \eta/2\Big) \tag{5.5}$$
has probability larger than $1-\epsilon$.
An induction over $m$ is now used to show that $\sup_{0\le m<\infty}|\hat u_{nm}| \le 3U_0$ on the set $A_n$. As induction start, for $m=0$, we have $|\hat u_{n0}| \le c^{-1}U_0 < 3U_0$ on $A_n$ since $c>1$. The induction assumption is that $\max_{0\le\ell\le m-1}|\hat u_{n\ell}| \le 3U_0$. This implies that on the set $A_n$ we have $c\max_{0\le\ell\le m-1}|R_n(\hat u_{n\ell})| \le c\max_{|u|\le 3U_0}|R_n(u)| \le \eta/2$. Thus, the bound (5.4) becomes $|\hat u_{nm}| \le 2U_0 + \eta/2 \le 3U_0$. It follows that $\max_{0\le\ell\le m}|\hat u_{n\ell}| \le 3U_0$. This proves (5.2) with $U = 3U_0$. ■
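The spectral-radius step in the proof above can be checked numerically: even when the one-step norm $\|\hat\Gamma_n\|$ exceeds one, Gelfand's formula guarantees that $\|\hat\Gamma_n^m\|$ is eventually dominated by $\delta_0^m$ for any $\delta_0$ above the spectral radius. A small sketch, with an illustrative matrix not taken from the paper:

```python
import numpy as np

# Gelfand's formula: ||G^m||^(1/m) -> spectral radius rho(G), so for any
# delta0 with rho(G) < delta0 < 1 we eventually have ||G^m|| <= delta0**m.
G = np.array([[0.5, 2.0], [0.0, 0.5]])   # rho(G) = 0.5, but ||G|| > 1
delta0 = 0.8

# spectral (2-)norms of the powers of G
norms = [np.linalg.norm(np.linalg.matrix_power(G, m), 2) for m in range(1, 40)]

assert norms[0] > 1                       # one step can expand...
assert all(norms[m - 1] <= delta0 ** m for m in range(25, 40))  # ...powers decay
```

This is exactly why the geometric bound in the proof only holds for $m$ beyond some $m_0$.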
Proof of Theorem 3.4. We want to show that for all $\eta, \epsilon > 0$ there exist $n_0$ and $m_0$ so that for $n\ge n_0$ and $m\ge m_0$ it holds that
$$\mathsf{P}\{|\hat u_{nm} - (I_{p+1}-\hat\Gamma_n)^{-1}K_n| > \eta\} < \epsilon. \tag{5.6}$$
In order to show (5.6), note that on the set $A_n$ we have $\sum_{\ell=1}^m\hat\Gamma_n^{\ell-1} = (I_{p+1}-\hat\Gamma_n^m)(I_{p+1}-\hat\Gamma_n)^{-1}$, where $(I_{p+1}-\hat\Gamma_n)^{-1} = \sum_{\ell=0}^\infty\hat\Gamma_n^\ell$. Therefore Equation (5.3) shows that
$$\hat u_{nm} - (I_{p+1}-\hat\Gamma_n)^{-1}K_n = \hat\Gamma_n^m\{\hat u_{n0} - (I_{p+1}-\hat\Gamma_n)^{-1}K_n\} + \sum_{\ell=1}^m\hat\Gamma_n^{\ell-1}R_n(\hat u_{n,m-\ell}).$$
To bound this, note first that $\|(I_{p+1}-\hat\Gamma_n)^{-1}\| = \|\sum_{\ell=0}^\infty\hat\Gamma_n^\ell\| \le \sum_{\ell=0}^\infty\|\hat\Gamma_n^\ell\| < c$. Thus, on the set $A_n$, see (5.5), it holds that
$$|\hat u_{nm} - (I_{p+1}-\hat\Gamma_n)^{-1}K_n| \le \|\hat\Gamma_n^m\|(c^{-1}U_0 + U_0) + c\max_{0\le\ell\le m-1}|R_n(\hat u_{n\ell})| \le \|\hat\Gamma_n^m\|\,2U_0 + \eta/2.$$
Now, for $m\ge m_0$ we have $\|\hat\Gamma_n^m\| \le \delta_0^m$. Since $\delta_0^m$ declines exponentially, $m_0$ can be chosen so large that also $\|\hat\Gamma_n^m\|\,2U_0 \le \eta/2$. Thus $\mathsf{P}\{|\hat u_{nm} - (I_{p+1}-\hat\Gamma_n)^{-1}K_n| \ge \eta\} < \epsilon$ for $m\ge m_0$ and $n\ge n_0$, which proves (5.6). ■
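The fixed-point conclusion of Theorem 3.4 is easy to visualize numerically: dropping the remainder term, the linear recursion $u_m = \Gamma u_{m-1} + K$ converges geometrically to $(I - \Gamma)^{-1}K$ whenever the spectral radius of $\Gamma$ is below one, whatever the starting point. A sketch with illustrative values of $\Gamma$ and $K$ (not from the paper):

```python
import numpy as np

# Fixed-point behaviour of the linearized recursion u_m = G u_{m-1} + K.
# With spectral radius rho(G) < 1 the iterates converge geometrically to
# the fixed point (I - G)^{-1} K, regardless of the initial value u_0.
G = np.array([[0.4, 0.1], [0.0, 0.3]])
K = np.array([1.0, -2.0])

u = np.array([10.0, 10.0])          # arbitrary start, far from the fixed point
for _ in range(60):
    u = G @ u + K

fixed_point = np.linalg.solve(np.eye(2) - G, K)
assert np.allclose(u, fixed_point)
```

The contraction rate is governed by the spectral radius, matching the choice of $m_0$ in the proof.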
Proof of Theorem 3.5. The matrices $\Gamma$ and $\Gamma - \lambda I_{p+1}$ are of the form
$$\begin{pmatrix} aI_p & b \\ c' & d \end{pmatrix},$$
and the result follows from the identity
$$a\det\begin{pmatrix} aI_p & b \\ c' & d \end{pmatrix} = \det\begin{pmatrix} I_p & 0 \\ -c' & a \end{pmatrix}\det\begin{pmatrix} aI_p & b \\ c' & d \end{pmatrix} = \det\begin{pmatrix} aI_p & b \\ 0' & ad - c'b \end{pmatrix} = a^p(ad - c'b). \;\blacksquare$$
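The block-determinant identity used in this proof, which gives $\det = a^{p-1}(ad - c'b)$, can be verified numerically; the vectors $b$ and $c$ below are random illustrative values:

```python
import numpy as np

# Numerical check of the block-determinant identity:
# det([[a*I_p, b], [c', d]]) = a**(p-1) * (a*d - c'b).
rng = np.random.default_rng(1)
p, a, d = 4, 2.5, -1.2
b, c = rng.normal(size=p), rng.normal(size=p)

M = np.zeros((p + 1, p + 1))
M[:p, :p] = a * np.eye(p)       # upper-left block a*I_p
M[:p, p] = b                    # upper-right column b
M[p, :p] = c                    # lower-left row c'
M[p, p] = d

assert np.isclose(np.linalg.det(M), a ** (p - 1) * (a * d - c @ b))
```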
Proof of Theorem 3.6. $(a)$ For $c>0$ we have $f(x)1_{(|x|\le c)} \ge f(c)1_{(|x|\le c)}$ because $f$ is symmetric and non-increasing for $x\ge0$. Integration gives
$$\psi = \int_{-c}^c f(x)dx \ge 2cf(c) = \xi_1,$$
where, by continuity of $f$, equality holds only if $f(x) = f(c)$ for $|x|\le c$. This is, however, ruled out by assuming $f'(c) < 0$ for $c>0$. It holds that $\lim_{c\to0}c^{-1}\int_0^c f(x)dx = f(0)$ and $\lim_{c\to0}\xi_1/(2c) = f(0)$, so $\lim_{c\to0}\xi_1/\psi = 1$. Similarly, $\int_{-\infty}^\infty f(x)dx = 1$ and $\lim_{c\to\infty}cf(c) = 0$, so $\lim_{\psi\to1}\xi_1/\psi = 0$.
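Part $(a)$ can be checked for a concrete density. The sketch below takes $f$ to be the standard normal density (an illustrative choice, not imposed by the theorem), computes $\psi = \mathsf{P}(|x|\le c)$ and $\xi_1 = 2cf(c)$ in closed form, and confirms the two limits of $\xi_1/\psi$:

```python
import math

# For the standard normal density phi: psi = P(|x| <= c) = erf(c/sqrt(2))
# and xi_1 = 2*c*phi(c); the ratio xi_1/psi lies in (0, 1), tends to 1 as
# c -> 0 and to 0 as c -> infinity, as claimed in part (a).
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def ratio(c):
    psi = math.erf(c / math.sqrt(2))
    return 2 * c * phi(c) / psi

assert 0 < ratio(1.0) < 1
assert ratio(0.01) > 0.99      # ratio -> 1 as c -> 0
assert ratio(5.0) < 0.01       # ratio -> 0 as c -> infinity
```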
$(b)$ We find
$$g(c) = \frac{\zeta_3}{2\tau_2} = \frac{\xi_3}{2\tau_2} - \frac{\xi_1}{2\tau_0} = \frac{2cf(c)\{\int_0^c(c^2-x^2)f(x)dx\}}{\tau_2\tau_0} > 0. \tag{5.7}$$
For $c\to0$, or $\psi\to0$, we find the approximations, for $k=0,1$: $\tau_{2k} = 2\int_0^c x^{2k}f(x)dx \approx 2c^{2k+1}f(0)/(2k+1)$, which show that $g(c)\to1$.
For $c\to\infty$, or $\psi\to1$, we find $\tau_0\to1$, $\tau_2\to1$ and $g(c) \approx cf(c)(c^2-1) \to 0$ because $f$ is assumed to have a finite third moment.
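The behaviour of $g(c)$ in part $(b)$ can also be checked numerically for the standard normal density (again an illustrative choice), assuming the representation $g(c) = 2cf(c)\{\int_0^c(c^2-x^2)f(x)dx\}/(\tau_2\tau_0)$ with $\tau_0 = 2\int_0^c f$ and $\tau_2 = 2\int_0^c x^2 f$; for the normal density both integrals are available in closed form via integration by parts:

```python
import math

# For f = standard normal phi: int_0^c phi = erf(c/sqrt(2))/2 and
# int_0^c x^2 phi = int_0^c phi - c*phi(c). Check that g(c) lies in (0, 1),
# tends to 1 as c -> 0 and to 0 as c -> infinity.
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def g(c):
    F = 0.5 * math.erf(c / math.sqrt(2))        # int_0^c phi(x) dx
    tau0 = 2 * F
    tau2 = 2 * (F - c * phi(c))                 # by parts: int_0^c x^2 phi
    integral = c * c * F - (F - c * phi(c))     # int_0^c (c^2 - x^2) phi
    return 2 * c * phi(c) * integral / (tau2 * tau0)

assert 0 < g(1.0) < 1
assert g(0.01) > 0.99          # g -> 1 as c -> 0
assert g(6.0) < 0.01           # g -> 0 as c -> infinity
```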
$(c)$ Using $c\tau_0' = 2cf(c)$ we find from (5.7) that $g(c) < 1$ if
$$h(c) = \frac{c\tau_0'}{\tau_0}(c^2\tau_0 - \tau_2) - 2\tau_2 = \frac{2cf(c)}{\tau_0}\Big\{2\int_0^c(c^2-x^2)f(x)dx\Big\} - 2\tau_2 < 0,$$
since $c^2\tau_0 - \tau_2 = 2\int_0^c(c^2-x^2)f(x)dx$ and hence $h(c) = 2\tau_2\{g(c)-1\}$. Because the limit of $h$ for $c\to0$ is zero, it is enough to show that $h'(c) < 0$.
We find
$$h'(c) = \Big(\frac{c\tau_0'}{\tau_0}\Big)'(c^2\tau_0 - \tau_2) + \frac{c\tau_0'}{\tau_0}(2c\tau_0 + c^2\tau_0' - \tau_2') - 2\tau_2' = \Big(\frac{c\tau_0'}{\tau_0}\Big)'(c^2\tau_0 - \tau_2),$$
because the extra term vanishes: with $\tau_0' = 2f(c)$ and $\tau_2' = 2c^2f(c)$,
$$\frac{c\tau_0'}{\tau_0}(2c\tau_0 + c^2\tau_0' - \tau_2') - 2\tau_2' = 4c^2f(c) + \frac{c^3\{2f(c)\}^2}{\tau_0} - \frac{2c^3f(c)\cdot2f(c)}{\tau_0} - 4c^2f(c) = 0.$$
Because $c^2\tau_0 - \tau_2 > 0$ and $(c\tau_0'/\tau_0)' = [c\{\log\int_0^c f(x)dx\}']' < 0$ by assumption, we find $h'(c) < 0$ and hence $g(c) < 1$.
$(d)$ First, assume $\{\log f(c)\}'' < 0$ and $f'(c) < 0$ for $c>0$. Then
$$[c\{\log f(c)\}']' = \{\log f(c)\}' + c\{\log f(c)\}'' = \frac{f'(c)}{f(c)} + c\{\log f(c)\}'' < 0.$$
Secondly, assume $[c\{\log f(c)\}']' < 0$. Denote $F(c) = \int_0^c f(x)dx$. Then
$$[c\{\log F(c)\}']' = \frac{\{cf(c)\}'F(c) - c\{f(c)\}^2}{\{F(c)\}^2} = \frac{f(c)}{\{F(c)\}^2}L,$$
where $L = [1 + c\{\log f(c)\}']F(c) - cf(c)$. Since $f(c) \ge 0$ and $F(c) > 0$ for $c>0$, it has to be argued that $L < 0$. Now $\lim_{c\to0}L = 0$, so it suffices to argue that $L' < 0$ for $c>0$. But $L' = [c\{\log f(c)\}']'F(c)$, which is negative by assumption. ■
