Article

Robustness Property of Robust-BD Wald-Type Test for Varying-Dimensional General Linear Models

1 Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
2 Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
* Author to whom correspondence should be addressed.
Entropy 2018, 20(3), 168; https://doi.org/10.3390/e20030168
Submission received: 12 January 2018 / Revised: 1 March 2018 / Accepted: 1 March 2018 / Published: 5 March 2018

Abstract
An important issue for robust inference is to examine the stability of the asymptotic level and power of a test statistic in the presence of contaminated data. Most existing results are derived in finite-dimensional settings with particular choices of loss functions. This paper re-examines the issue by allowing for a diverging number of parameters combined with a broader array of robust error measures, called “robust-BD”, for the class of “general linear models”. Under regularity conditions, we derive the influence function of the robust-BD parameter estimator and demonstrate that the robust-BD Wald-type test enjoys robustness of validity and efficiency asymptotically. Specifically, the asymptotic level of the test is stable under a small amount of contamination of the null hypothesis, whereas the asymptotic power remains large under contaminated distributions in a neighborhood of the contiguous alternatives, thus lending support to the utility of the proposed robust-BD Wald-type test.

1. Introduction

The class of varying-dimensional “general linear models” [1], which includes the conventional generalized linear model (GLM; [2]), is flexible and powerful for modeling a large variety of data and plays an important role in many statistical applications. It has been extensively documented in the literature that the conventional maximum likelihood estimator for the GLM is nonrobust; see, for example, [3,4]. To enhance resistance to outliers in applications, many efforts have been made to obtain robust estimators. For example, Noh et al. [5] and Künsch et al. [6] developed robust estimators for the GLM, and Stefanski et al. [7], Bianco et al. [8] and Croux et al. [9] studied robust estimation for the logistic regression model with the deviance loss as the error measure.
Besides robust estimation for the GLM, robust inference is another important issue, which, however, has received relatively little attention. Basically, the study of robust testing includes two aspects: (i) establishing the stability of the asymptotic level under small departures from the null hypothesis (i.e., robustness of “validity”); and (ii) demonstrating that the asymptotic power is sufficiently large under small departures from specified alternatives (i.e., robustness of “efficiency”). In the literature, robust inference has been conducted for different models. For example, Heritier et al. [10] studied the robustness properties of the Wald, score and likelihood ratio tests based on M-estimators for general parametric models. Cantoni et al. [11] developed a test statistic based on the robust deviance and conducted robust inference for the GLM using the quasi-likelihood as the loss function. A robust Wald-type test for the logistic regression model is studied in [12]. Ronchetti et al. [13] studied the robustness properties of generalized method of moments estimators. Basu et al. [14] proposed robust tests based on the density power divergence (DPD) measure for the equality of two normal means. Robust tests for parameter change have been studied using the density-based divergence method in [15,16]. However, the aforementioned methods based on the GLM mostly focus on situations where the number of parameters is fixed and the loss function is specific.
Zhang et al. [1] developed robust estimation and testing for the “general linear model” based on a broader array of error measures, namely Bregman divergence, allowing for a diverging number of parameters. The Bregman divergence includes a wide class of error measures as special cases, e.g., the (negative) quasi-likelihood in regression, and the deviance loss and exponential loss in machine learning practice, among many other commonly used loss functions. Zhang et al. [1] studied the consistency and asymptotic normality of their proposed robust-BD parameter estimator and derived the asymptotic distribution of the Wald-type test constructed from robust-BD estimators. Naturally, it remains an important issue to examine the robustness property of the robust-BD Wald-type test [1] in the varying-dimensional case, i.e., whether the test still has stable asymptotic level and power in the presence of contaminated data.
This paper aims to demonstrate the robustness property of the robust-BD Wald-type test in [1]. Nevertheless, it is a nontrivial task to address this issue. Although the local stability of Wald-type tests has been established for M-estimators [10], generalized method of moments estimators [13], the minimum density power divergence estimator [17] and general M-estimators under random censoring [18], these results for finite-dimensional settings are not directly applicable to our situation with a diverging number of parameters. Under certain regularity conditions, we provide a rigorous theoretical derivation for robust testing based on the Wald-type test statistic. The essential results are approximations of the asymptotic level and power under contaminated distributions of the data in a small neighborhood of the null and alternative hypotheses, respectively.
  • Specifically, we show in Theorem 1 that, if the influence function of the estimator is bounded, then the asymptotic level of the test is also bounded under a small amount of contamination.
  • We also demonstrate in Theorem 2 that, if the contamination belongs to a neighborhood of the contiguous alternatives, then the asymptotic power is also stable.
Hence, we establish the robustness of validity and efficiency of the robust-BD Wald-type test for the “general linear model” with a diverging number of parameters.
The rest of the paper is organized as follows. Section 2 reviews the Bregman divergence (BD), robust-BD estimation and the Wald-type test statistic proposed in [1]. Section 3 derives the influence function of the robust-BD estimator and studies the robustness properties of the asymptotic level and power of the Wald-type test under a small amount of contamination. Section 4 conducts simulation studies. The technical conditions and proofs are given in Appendix A. A list of notations and symbols is provided in Appendix B.
We now introduce some necessary notation. Throughout, $C$ and $c$ denote generic finite constants that may vary from place to place but do not depend on the sample size $n$. Denote by $E_K(\cdot)$ the expectation with respect to the underlying distribution $K$. For a positive integer $q$, let $\mathbf{0}_q = (0, \ldots, 0)^T \in \mathbb{R}^q$ be the $q \times 1$ zero vector and $I_q$ the $q \times q$ identity matrix. For a vector $\mathbf{v} = (v_1, \ldots, v_q)^T \in \mathbb{R}^q$, the $L_1$ norm is $\|\mathbf{v}\|_1 = \sum_{i=1}^q |v_i|$, the $L_2$ norm is $\|\mathbf{v}\|_2 = (\sum_{i=1}^q v_i^2)^{1/2}$, and the $L_\infty$ norm is $\|\mathbf{v}\|_\infty = \max_{i=1,\ldots,q} |v_i|$. For a $q \times q$ matrix $A$, the $L_2$ and Frobenius norms of $A$ are $\|A\|_2 = \{\lambda_{\max}(A^T A)\}^{1/2}$ and $\|A\|_F = \sqrt{\mathrm{tr}(A A^T)}$, respectively, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
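For concreteness, these norm definitions can be checked directly with a small numerical sketch (illustrative only; the numbers are not from the paper):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])
print(np.abs(v).sum(), np.sqrt((v**2).sum()), np.abs(v).max())   # L1, L2, L-infinity

A = np.array([[1.0, 2.0], [3.0, 4.0]])
l2  = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # {lambda_max(A^T A)}^{1/2}
fro = np.sqrt(np.trace(A @ A.T))                   # {tr(A A^T)}^{1/2}
print(l2, fro)                                     # spectral norm <= Frobenius norm
```

Note that the spectral ($L_2$) norm never exceeds the Frobenius norm, as the output illustrates.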

2. Review of Robust-BD Estimation and Inference for “General Linear Models”

This section briefly reviews the robust-BD estimation and inference methods for the “general linear model” developed in [1]. Let $\{(\mathbf{X}_{n1}, Y_1), \ldots, (\mathbf{X}_{nn}, Y_n)\}$ be i.i.d. observations from some underlying distribution of $(\mathbf{X}_n, Y)$, with $\mathbf{X}_n = (X_1, \ldots, X_{p_n})^T \in \mathbb{R}^{p_n}$ the explanatory variables and $Y$ the response variable. The dimension $p_n$ is allowed to diverge with the sample size $n$. The “general linear model” is given by
$$m(\mathbf{x}_n) \equiv E(Y \mid \mathbf{X}_n = \mathbf{x}_n) = F^{-1}(\tilde{\mathbf{x}}_n^T \tilde\beta_{n,0}), \qquad (1)$$
and
$$\mathrm{var}(Y \mid \mathbf{X}_n = \mathbf{x}_n) = V(m(\mathbf{x}_n)), \qquad (2)$$
where $F$ is a known link function, $\tilde\beta_{n,0} \in \mathbb{R}^{p_n+1}$ is the vector of unknown true regression parameters, $\tilde{\mathbf{x}}_n = (1, \mathbf{x}_n^T)^T$ and $V(\cdot)$ is a known function. Note that the conventional generalized linear model (GLM) satisfying Equations (1) and (2) assumes that $Y \mid \mathbf{X}_n = \mathbf{x}_n$ follows a particular distribution in the exponential family. However, our “general linear model” does not require an explicit form of the distribution of the response. Hence, the “general linear model” includes the GLM as a special case. For notational simplicity, denote $\mathbf{Z}_n = (\mathbf{X}_n^T, Y)^T$ and $\tilde{\mathbf{Z}}_n = (\tilde{\mathbf{X}}_n^T, Y)^T$.
Bregman divergence (BD) is a class of error measures introduced in [19], covering a wide range of loss functions. Specifically, Bregman divergence is defined as the bivariate function
$$Q_q(\nu, \mu) = -q(\nu) + q(\mu) + (\nu - \mu)\, q'(\mu),$$
where $q(\cdot)$ is the concave generating $q$-function. For example, $q(\mu) = a\mu - \mu^2$ for a constant $a$ corresponds to the quadratic loss $Q_q(Y, \mu) = (Y - \mu)^2$. For a binary response variable $Y$, $q(\mu) = \min\{\mu, 1-\mu\}$ gives the misclassification loss $Q_q(Y, \mu) = I\{Y \ne I(\mu > 0.5)\}$; $q(\mu) = -2\{\mu\log(\mu) + (1-\mu)\log(1-\mu)\}$ gives the Bernoulli deviance loss $Q_q(Y, \mu) = -2\{Y\log(\mu) + (1-Y)\log(1-\mu)\}$; $q(\mu) = 2\min\{\mu, 1-\mu\}$ gives the hinge loss $Q_q(Y, \mu) = \max\{1 - (2Y-1)\,\mathrm{sign}(\mu - 0.5),\, 0\}$ for the support vector machine; $q(\mu) = 2\{\mu(1-\mu)\}^{1/2}$ yields the exponential loss $Q_q(Y, \mu) = \exp[-(Y - 0.5)\log\{\mu/(1-\mu)\}]$ used in AdaBoost [20]. Furthermore, Zhang et al. [21] showed that if
$$q(\mu) = \int_a^\mu \frac{s - \mu}{V(s)}\, ds, \qquad (3)$$
where $a$ is a finite constant such that the integral is well defined, then $Q_q(y, \mu)$ is the “classical (negative) quasi-likelihood” function $Q_{\mathrm{QL}}(y, \mu)$ with $\partial Q_{\mathrm{QL}}(y, \mu)/\partial\mu = -(y - \mu)/V(\mu)$.
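The BD identity above can be verified numerically: a generating $q$-function plugged into $Q_q(\nu, \mu) = -q(\nu) + q(\mu) + (\nu - \mu)q'(\mu)$ should reproduce the stated loss. A minimal Python sketch (illustrative; `scipy.special.xlogy` handles the $0\log 0$ limit at $\mu \in \{0, 1\}$):

```python
import numpy as np
from scipy.special import xlogy          # xlogy(x, y) = x*log(y), with 0*log(0) = 0

def bregman(q, dq, nu, mu):
    # Q_q(nu, mu) = -q(nu) + q(mu) + (nu - mu) * q'(mu)
    return -q(nu) + q(mu) + (nu - mu) * dq(mu)

# Quadratic: q(mu) = a*mu - mu^2 gives Q_q(y, mu) = (y - mu)^2
a = 3.0
q_quad  = lambda m: a * m - m**2
dq_quad = lambda m: a - 2.0 * m
print(bregman(q_quad, dq_quad, 0.7, 0.2))     # (0.7 - 0.2)^2 = 0.25

# Bernoulli deviance: q(mu) = -2{mu log(mu) + (1-mu) log(1-mu)}
q_dev  = lambda m: -2.0 * (xlogy(m, m) + xlogy(1.0 - m, 1.0 - m))
dq_dev = lambda m: -2.0 * (np.log(m) - np.log(1.0 - m))
print(bregman(q_dev, dq_dev, 1.0, 0.3))       # equals -2*log(0.3), about 2.408
```

For $y = 1$ the identity collapses to the deviance loss $-2\log(\mu)$, as the second print illustrates.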
To obtain a robust estimator based on BD, Zhang et al. [1] developed the robust-BD loss function
$$\rho_q(y, \mu) = \int_y^\mu \psi(r(y, s))\,\{q''(s)\sqrt{V(s)}\}\, ds - G(\mu), \qquad (4)$$
where $\psi(\cdot)$ is a bounded odd function, such as the Huber $\psi$-function [22], $r(y, s) = (y - s)/\sqrt{V(s)}$ denotes the Pearson residual, and $G(\mu)$ is the bias-correction term satisfying
$$G'(\mu) = G_1(\mu)\,\{q''(\mu)\sqrt{V(\mu)}\},$$
with
$$G_1(m(\mathbf{x}_n)) = E\{\psi(r(Y, m(\mathbf{x}_n))) \mid \mathbf{X}_n = \mathbf{x}_n\}.$$
Based on the robust-BD, the estimator of $\tilde\beta_{n,0}$ proposed in [1] is defined as
$$\hat{\tilde\beta} = \arg\min_{\tilde\beta} \frac{1}{n}\sum_{i=1}^n \rho_q(Y_i, F^{-1}(\tilde{\mathbf{X}}_{ni}^T \tilde\beta))\, w(\mathbf{X}_{ni}), \qquad (5)$$
where $w(\cdot) \ge 0$ is a known bounded weight function which downweights high-leverage points.
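To illustrate the estimator in Equation (5) in the simplest setting, consider the Gaussian linear model with quadratic $q$-function, $V \equiv 1$ and identity link; there the robust-BD loss reduces, up to the bias-correction term (which vanishes for symmetric errors), to a weighted Huber loss. A minimal Python sketch, with simulation settings that are purely illustrative and not taken from [1]:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.uniform(-0.5, 0.5, (n, p))
beta_true = np.array([0.5, 2.0, -1.0, 0.0])    # intercept first
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)
y[:10] += 20.0                                 # a few gross response outliers

def huber_rho(r, c=1.345):
    # integral of the Huber psi: quadratic near 0, linear in the tails
    return np.where(np.abs(r) <= c, 0.5 * r**2, c * np.abs(r) - 0.5 * c**2)

w = 1.0 / (1.0 + np.linalg.norm(X, axis=1))    # downweight high-leverage points

def objective(beta):
    r = y - beta[0] - X @ beta[1:]
    return np.mean(huber_rho(r) * w)

fit = minimize(objective, np.zeros(p + 1), method="BFGS").x
ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
print(np.round(fit, 2), np.round(ols, 2))      # robust fit resists the outliers
```

The non-robust least-squares intercept is pulled upward by the contaminated responses, while the Huber-type robust-BD fit stays near the true parameter.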
In [11], the “robust quasi-likelihood estimator” of $\tilde\beta_{n,0}$ is formulated according to the “robust quasi-likelihood function” defined as
$$Q_{\mathrm{RQL}}(\mathbf{x}_n, y, \mu) = \int_{\mu_0}^{\mu} \frac{\psi(r(y, s))}{\sqrt{V(s)}}\, ds\; w(\mathbf{x}_n) - \frac{1}{n}\sum_{j=1}^n \int_{\mu_0}^{\mu_j} \frac{E\{\psi(r(Y_j, s)) \mid \mathbf{X}_{nj}\}}{\sqrt{V(s)}}\, ds\; w(\mathbf{X}_{nj}),$$
where $\mu = F^{-1}(\tilde{\mathbf{x}}_n^T \tilde\beta)$ and $\mu_j = \mu_j(\tilde\beta) = F^{-1}(\tilde{\mathbf{X}}_{nj}^T \tilde\beta)$, $j = 1, \ldots, n$. To describe the intuition behind the “robust-BD”, we reproduce the following diagram from [1], which illustrates the relations among the “robust-BD”, “classical-BD”, “robust quasi-likelihood” and “classical (negative) quasi-likelihood”.
[Diagram from [1]: relations among the robust-BD, classical-BD, robust quasi-likelihood and classical (negative) quasi-likelihood.]
For the robust-BD, assume that the derivatives
$$p_j(y; \theta) = \frac{\partial^j}{\partial\theta^j}\, \rho_q(y, F^{-1}(\theta)), \qquad j = 0, 1, \ldots,$$
exist finitely up to any order required. For example, for $j = 1$,
$$p_1(y; \theta) = \{\psi(r(y, \mu)) - G_1(\mu)\}\,\{q''(\mu)\sqrt{V(\mu)}\}\, /\, F'(\mu), \qquad (6)$$
where $\mu = F^{-1}(\theta)$. Explicit expressions for $p_j(y; \theta)$ $(j = 2, 3)$ can be found in Equation (3.7) of [1]. Then, the estimating equation for $\hat{\tilde\beta}$ is
$$\frac{1}{n}\sum_{i=1}^n \psi_{\mathrm{RBD}}(\mathbf{Z}_{ni}; \tilde\beta) = \mathbf{0},$$
where the score vector is
$$\psi_{\mathrm{RBD}}(\mathbf{z}_n; \tilde\beta) = p_1(y; \theta)\, w(\mathbf{x}_n)\, \tilde{\mathbf{x}}_n, \qquad (7)$$
with $\theta = \tilde{\mathbf{x}}_n^T \tilde\beta$. The consistency and asymptotic normality of $\hat{\tilde\beta}$ have been studied in [1]; see Theorems 1 and 2 therein.
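The identity $p_1(y; \theta) = \partial\rho_q(y, F^{-1}(\theta))/\partial\theta$ can be checked numerically. In the Gaussian case ($q'' = -2$, $V \equiv 1$, $F$ the identity, $G_1 \equiv 0$ for symmetric errors), Equation (6) gives $p_1(y; \theta) = -2\psi(y - \theta)$; a short finite-difference sketch (illustrative, with assumed values):

```python
import numpy as np

c = 1.345
psi = lambda r: np.clip(r, -c, c)              # Huber psi

# Gaussian case: q(mu) = -mu^2 (q'' = -2), V = 1, F = identity, G_1 = 0,
# so rho_q(y, mu) = int_y^mu psi(y - s) * (-2) ds (bias-correction term vanishes).
def rho_q(y, mu, m=20001):
    s = np.linspace(y, mu, m)
    f = psi(y - s) * (-2.0)
    ds = (mu - y) / (m - 1)
    return ds * (f.sum() - 0.5 * (f[0] + f[-1]))   # composite trapezoid rule

y, theta, h = 1.3, 0.4, 1e-5
num = (rho_q(y, theta + h) - rho_q(y, theta - h)) / (2 * h)
print(num, -2.0 * psi(y - theta))              # both approximately -1.8
```

The central difference of the integrated loss matches the closed-form score $-2\psi(y - \theta)$.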
Furthermore, to conduct statistical inference for the “general linear model”, the following hypotheses are considered:
$$H_0: A_n \tilde\beta_{n,0} = \mathbf{g}_0 \quad \text{versus} \quad H_1: A_n \tilde\beta_{n,0} \ne \mathbf{g}_0, \qquad (8)$$
where $A_n$ is a given $k \times (p_n + 1)$ matrix such that $A_n A_n^T \to G$ with $G$ a $k \times k$ positive-definite matrix, and $\mathbf{g}_0$ is a known $k \times 1$ vector.
To perform the test of Equation (8), Zhang et al. [1] proposed the Wald-type test statistic
$$W_n = n\, (A_n \hat{\tilde\beta} - \mathbf{g}_0)^T (A_n \hat H_n^{-1} \hat\Omega_n \hat H_n^{-1} A_n^T)^{-1} (A_n \hat{\tilde\beta} - \mathbf{g}_0), \qquad (9)$$
constructed from the robust-BD estimator $\hat{\tilde\beta}$ in Equation (5), where
$$\hat\Omega_n = \frac{1}{n}\sum_{i=1}^n p_1^2(Y_i; \tilde{\mathbf{X}}_{ni}^T \hat{\tilde\beta})\, w^2(\mathbf{X}_{ni})\, \tilde{\mathbf{X}}_{ni}\tilde{\mathbf{X}}_{ni}^T, \qquad \hat H_n = \frac{1}{n}\sum_{i=1}^n p_2(Y_i; \tilde{\mathbf{X}}_{ni}^T \hat{\tilde\beta})\, w(\mathbf{X}_{ni})\, \tilde{\mathbf{X}}_{ni}\tilde{\mathbf{X}}_{ni}^T.$$
The asymptotic distributions of W n under the null and alternative hypotheses have been developed in [1]; see Theorems 4–6 therein.
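The sandwich construction of $W_n$ in Equation (9) can be sketched end-to-end for the Gaussian case, where $p_1(y;\theta) = -2\psi(y-\theta)$ and $p_2(y;\theta) = 2\psi'(y-\theta)$. The following Python sketch (settings illustrative, not those of [1]) fits the robust estimator by iteratively reweighted least squares and forms $W_n$ for a single linear constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 2000, 1.345
X = rng.normal(size=(n, 2))
Xt = np.column_stack([np.ones(n), X])          # tilde-X: prepend intercept
beta0 = np.array([0.2, 1.0, 0.0])              # last coefficient is truly 0
y = Xt @ beta0 + rng.normal(size=n)

psi  = lambda r: np.clip(r, -c, c)             # Huber psi
dpsi = lambda r: (np.abs(r) <= c).astype(float)
w = 1.0 / (1.0 + np.linalg.norm(X, axis=1))    # leverage downweighting

# Solve the robust score equation by iteratively reweighted least squares
beta = np.zeros(3)
for _ in range(200):
    r = y - Xt @ beta
    r = np.where(np.abs(r) < 1e-8, 1e-8, r)
    wt = w * psi(r) / r
    beta_new = np.linalg.solve(Xt.T @ (wt[:, None] * Xt), Xt.T @ (wt * y))
    if np.max(np.abs(beta_new - beta)) < 1e-12:
        break
    beta = beta_new

theta = Xt @ beta
p1 = -2.0 * psi(y - theta)                     # Gaussian-case p_1
p2 =  2.0 * dpsi(y - theta)                    # Gaussian-case p_2
Omega = (Xt * (p1**2 * w**2)[:, None]).T @ Xt / n
H     = (Xt * (p2 * w)[:, None]).T @ Xt / n
A  = np.array([[0.0, 0.0, 1.0]])               # H_0: last coefficient = 0
g0 = np.zeros(1)
Hinv = np.linalg.inv(H)
U = A @ Hinv @ Omega @ Hinv @ A.T
d = A @ beta - g0
Wn = float(n * d @ np.linalg.inv(U) @ d)
print(Wn)                                      # approximately chi^2_1 under H_0
```

With no contamination and a true null constraint, $W_n$ behaves like a $\chi^2_1$ draw.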
On the other hand, the robustness of $W_n$ when applied to possibly contaminated data remains unknown. Section 3 of this paper addresses this issue with detailed derivations.

3. Robustness Properties of $W_n$ in Equation (9)

This section derives the influence function of the robust-BD estimator and studies the influence of a small amount of contamination on the asymptotic level and power of the Wald-type test. The proofs of the theoretical results are given in Appendix A.
Denote by $K_{n,0}$ the true distribution of $\mathbf{Z}_n$ following the “general linear model” characterized by Equations (1) and (2). To facilitate the discussion of robustness properties, we consider the $\epsilon$-contamination
$$K_{n,\epsilon} = \left(1 - \frac{\epsilon}{\sqrt{n}}\right) K_{n,0} + \frac{\epsilon}{\sqrt{n}}\, J, \qquad (10)$$
where $J$ is an arbitrary distribution and $\epsilon > 0$ is a constant. Then, $K_{n,\epsilon}$ is a contaminated distribution of $\mathbf{Z}_n$ with the amount of contamination converging to 0 at rate $1/\sqrt{n}$. Denote by $K_n$ the empirical distribution of $\{\mathbf{Z}_{ni}\}_{i=1}^n$.
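Sampling from the contaminated distribution $K_{n,\epsilon}$ amounts to replacing each clean draw with a draw from $J$, independently with probability $\epsilon/\sqrt{n}$; a minimal sketch (the function names and distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_contaminated(n, eps, draw_clean, draw_junk):
    # K_{n,eps} = (1 - eps/sqrt(n)) K_{n,0} + (eps/sqrt(n)) J
    prob = eps / np.sqrt(n)
    mask = rng.random(n) < prob
    z = draw_clean(n)
    z[mask] = draw_junk(int(mask.sum()))
    return z, mask.mean()

z, frac = sample_contaminated(
    10000, 1.0,
    lambda m: rng.normal(0.0, 1.0, m),
    lambda m: np.full(m, 8.0))                 # point mass Delta_z at z = 8
print(frac)                                    # approximately 1/sqrt(10000) = 0.01
```

With $\epsilon = 1$ and $n = 10{,}000$, about 1% of observations are contaminated, matching the $1/\sqrt{n}$ rate.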
For a generic distribution $K$ of $\mathbf{Z}_n$, define
$$\ell_K(\tilde\beta) = E_K\{\rho_q(Y, F^{-1}(\tilde{\mathbf{X}}_n^T \tilde\beta))\, w(\mathbf{X}_n)\}, \qquad S_K = \{\tilde\beta : E_K\{\psi_{\mathrm{RBD}}(\mathbf{Z}_n; \tilde\beta)\} = \mathbf{0}\}, \qquad (11)$$
where $\rho_q(\cdot, \cdot)$ and $\psi_{\mathrm{RBD}}(\cdot; \cdot)$ are defined in Equations (4) and (7), respectively. It is worth noting that the solution to $E_K\{\psi_{\mathrm{RBD}}(\mathbf{Z}_n; \tilde\beta)\} = \mathbf{0}$ may not be unique, i.e., $S_K$ may contain more than one element. We therefore define a functional for the estimator of $\tilde\beta_{n,0}$ as follows:
$$T(K) = \arg\min_{\tilde\beta \in S_K} \|\tilde\beta - \tilde\beta_{n,0}\|. \qquad (12)$$
From the result of Lemma A1 in Appendix A, $T(K_{n,\epsilon})$ is the unique local minimizer of $\ell_{K_{n,\epsilon}}(\tilde\beta)$ in the $\sqrt{p_n/n}$-neighborhood of $\tilde\beta_{n,0}$. In particular, $T(K_{n,0}) = \tilde\beta_{n,0}$. Similarly, from Lemma A2 in Appendix A, $T(K_n)$ is the unique local minimizer of $\ell_{K_n}(\tilde\beta)$, which satisfies $\|T(K_n) - \tilde\beta_{n,0}\| = O_P(\sqrt{p_n/n})$.
Following [23] (Equation (2.1.6), p. 84), the influence function of $T(\cdot)$ at $K_{n,0}$ is defined as
$$\mathrm{IF}(\mathbf{z}_n; T, K_{n,0}) = \frac{\partial}{\partial t}\, T((1-t)K_{n,0} + t\Delta_{\mathbf{z}_n})\Big|_{t=0} = \lim_{t \downarrow 0} \frac{T((1-t)K_{n,0} + t\Delta_{\mathbf{z}_n}) - \tilde\beta_{n,0}}{t},$$
where $\Delta_{\mathbf{z}_n}$ is the probability measure putting mass 1 at the point $\mathbf{z}_n$. Since the dimension of $T(\cdot)$ diverges with $n$, its influence function is defined for each fixed $n$. From Lemma A8 in Appendix A, under certain regularity conditions, the influence function exists and has the expression
$$\mathrm{IF}(\mathbf{z}_n; T, K_{n,0}) = H_n^{-1}\, \psi_{\mathrm{RBD}}(\mathbf{z}_n; \tilde\beta_{n,0}), \qquad (13)$$
where $H_n = E_{K_{n,0}}\{p_2(Y; \tilde{\mathbf{X}}_n^T \tilde\beta_{n,0})\, w(\mathbf{X}_n)\, \tilde{\mathbf{X}}_n\tilde{\mathbf{X}}_n^T\}$. The form of the influence function for diverging $p_n$ in Equation (13) coincides with that in [23,24] for fixed $p_n$.
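The limit definition above can be checked numerically for a one-dimensional instance of the robust score: the Huber location functional under $K_0 = N(0,1)$, where the analog of Equation (13) reads $\mathrm{IF}(z; T, K_0) = \psi_c(z)/E_{K_0}\psi_c'(Z)$. A finite-$t$ sketch (illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

c = 1.345
psi = lambda r: np.clip(r, -c, c)              # Huber psi

# K_0 = N(0,1); expectations approximated by weights on a fine grid
s = np.linspace(-10.0, 10.0, 200001)
wgt = norm.pdf(s)
wgt /= wgt.sum()

def T(t, z):
    # root of (1 - t) E_K psi(Z - th) + t psi(z - th) = 0
    g = lambda th: (1.0 - t) * np.sum(psi(s - th) * wgt) + t * psi(z - th)
    return brentq(g, -5.0, 5.0)

z, t = 3.0, 1e-4
empirical_if = (T(t, z) - T(0.0, z)) / t
analytic_if = psi(z) / (norm.cdf(c) - norm.cdf(-c))   # psi_c(z) / E psi_c'(Z)
print(empirical_if, analytic_if)               # both about 1.64
```

The boundedness of $\psi_c$ makes the influence function bounded in $z$, which is exactly the property the theorems below exploit.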
In our theoretical derivations, approximations of the asymptotic level and power of $W_n$ will involve the following matrices:
$$\Omega_n = E_{K_{n,0}}\{p_1^2(Y; \tilde{\mathbf{X}}_n^T \tilde\beta_{n,0})\, w^2(\mathbf{X}_n)\, \tilde{\mathbf{X}}_n\tilde{\mathbf{X}}_n^T\}, \qquad U_n = A_n H_n^{-1} \Omega_n H_n^{-1} A_n^T.$$

3.1. Asymptotic Level of $W_n$ under Contamination

We now investigate the asymptotic level of the Wald-type test W n under the ϵ -contamination.
Theorem 1.
Assume Conditions A0–A9 and B4 in Appendix A. Suppose $p_n^6/n \to 0$ as $n \to \infty$ and $\sup_n E_J\{w(\mathbf{X}_n)\|\tilde{\mathbf{X}}_n\|\} \le C$. Denote by $\alpha(K_{n,\epsilon})$ the level of $W_n = n\{A_n T(K_n) - \mathbf{g}_0\}^T (A_n \hat H_n^{-1}\hat\Omega_n\hat H_n^{-1}A_n^T)^{-1}\{A_n T(K_n) - \mathbf{g}_0\}$ when the underlying distribution is $K_{n,\epsilon}$ in Equation (10), and by $\alpha_0$ the nominal level. Under $H_0$ in Equation (8), it follows that
$$\limsup_{n \to \infty} \alpha(K_{n,\epsilon}) \le \alpha_0 + \epsilon^2 \mu_k D + o(\epsilon^2) \quad \text{as } \epsilon \downarrow 0,$$
where
$$D = \limsup_{n \to \infty} \|U_n^{-1/2} A_n E_J\{\mathrm{IF}(\mathbf{Z}_n; T, K_{n,0})\}\|^2 < \infty,$$
$\mu_k = -\frac{\partial}{\partial\delta} H_k(\eta_{1-\alpha_0}; \delta)\big|_{\delta=0}$, $H_k(\cdot; \delta)$ is the cumulative distribution function of a $\chi_k^2(\delta)$ distribution, and $\eta_{1-\alpha_0}$ is the $(1-\alpha_0)$-quantile of the central $\chi_k^2$ distribution.
Theorem 1 indicates that, if the influence function of $T(\cdot)$ is bounded, then the asymptotic level of $W_n$ under the $\epsilon$-contamination is also bounded and close to the nominal level when $\epsilon$ is sufficiently small. As a comparison, the robustness property of the Wald-type test in [10] is studied based on M-estimators for general parametric models with a fixed dimension $p_n$. They assumed conditions guaranteeing Fréchet differentiability, which in turn implies the existence of the influence function and the asymptotic normality of the corresponding estimator. In the set-up of our paper, however, it is difficult to check those conditions, due to the use of Bregman divergence and the diverging dimension $p_n$. Hence, the assumptions we make in Theorem 1 differ from those in [10], and are comparatively mild and easy to check. Moreover, the result of Theorem 1 cannot be easily derived from that of [10].
In Theorem 1, $p_n$ is allowed to diverge subject to $p_n^6/n = o(1)$, which requires $p_n$ to grow more slowly than in [1], where $p_n^5/n = o(1)$ suffices. Theoretically, the assumption $p_n^5/n = o(1)$ is required to obtain the asymptotic distribution of $W_n$ in [1]; to derive the limit distribution of $W_n$ under the $\epsilon$-contamination, the stronger assumption $p_n^6/n = o(1)$ is needed (see Lemma A7 in Appendix A). Hence, the reason our assumption is stronger than that in [1] is the $\epsilon$-contamination of the data. Practically, owing to advances in technology and diverse forms of data gathering, large dimension has become a common characteristic, and hence the varying-dimensional model has a wide range of applications, e.g., brain imaging data, financial data, web term-document data and gene expression data. Even some classical settings, e.g., the Framingham heart study with $n = 25{,}000$ and $p_n = 100$, can be viewed as varying-dimensional cases.
As an illustration, we apply the general result of Theorem 1 to the special case of a point mass contamination.
Corollary 1.
With the notation of Theorem 1, assume Conditions A0–A9 in Appendix A, $\sup_{\mathbf{x}_n \in \mathbb{R}^{p_n}} \|w(\mathbf{x}_n)\mathbf{x}_n\| \le C$ and $\sup_{\mu \in \mathbb{R}} |q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \le C$.
(i) If $p_n \equiv p$, $A_n \equiv A$, $\tilde\beta_{n,0} \equiv \tilde\beta_0$, $K_{n,0} \equiv K_0$ and $U_n \equiv U$ are fixed, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n}) K_0 + (\epsilon/\sqrt{n})\Delta_{\mathbf{z}}$ with $\mathbf{z} \in \mathbb{R}^p$ a fixed point, under $H_0$ in Equation (8), it follows that
$$\sup_{\mathbf{z} \in \mathbb{R}^p}\, \lim_{n \to \infty} \alpha(K_{n,\epsilon}) \le \alpha_0 + \epsilon^2 \mu_k D_1 + o(\epsilon^2) \quad \text{as } \epsilon \downarrow 0,$$
where
$$D_1 = \sup_{\mathbf{z} \in \mathbb{R}^p} \|U^{-1/2} A\, \mathrm{IF}(\mathbf{z}; T, K_0)\|^2 < \infty.$$
(ii) If $p_n$ diverges with $p_n^6/n \to 0$, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n}) K_{n,0} + (\epsilon/\sqrt{n})\Delta_{\mathbf{z}_n}$ with $\mathbf{z}_n \in \mathbb{R}^{p_n}$ a sequence of deterministic points, under $H_0$ in Equation (8),
$$\sup_{C_0 > 0}\, \sup_{\mathbf{z}_n \in S_{C_0}}\, \limsup_{n \to \infty} \alpha(K_{n,\epsilon}) \le \alpha_0 + \epsilon^2 \mu_k D_2 + o(\epsilon^2) \quad \text{as } \epsilon \downarrow 0,$$
where $S_{C_0} = \{\mathbf{z}_n = (\mathbf{x}_n^T, y)^T : \|\mathbf{x}_n\| \le C_0\}$, $C_0 > 0$ is a constant and
$$D_2 = \sup_{C_0 > 0}\, \sup_{\mathbf{z}_n \in S_{C_0}}\, \limsup_{n \to \infty} \|U_n^{-1/2} A_n\, \mathrm{IF}(\mathbf{z}_n; T, K_{n,0})\|^2 < \infty.$$
In Corollary 1, the conditions $\sup_{\mathbf{x}_n \in \mathbb{R}^{p_n}} \|w(\mathbf{x}_n)\mathbf{x}_n\| \le C$ and $\sup_{\mu \in \mathbb{R}} |q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \le C$ are needed to guarantee the boundedness of the score function in Equation (7). In particular, the function $w(\mathbf{x}_n)$ downweights high-leverage points and can be chosen as, e.g., $w(\mathbf{x}_n) = 1/(1 + \|\mathbf{x}_n\|)$. The condition $\sup_{\mu \in \mathbb{R}} |q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \le C$ is needed to bound Equation (6), and is satisfied in many situations.
  • For example, for the linear model with $q(\mu) = a\mu - \mu^2$, $V(\mu) = \sigma^2$ and $F(\mu) = \mu$, where $a$ and $\sigma^2$ are constants, we observe $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 2\sigma \le C$.
  • Another example is the logistic regression model with binary response and $q(\mu) = -2\{\mu\log(\mu) + (1-\mu)\log(1-\mu)\}$ (corresponding to the Bernoulli deviance loss), $V(\mu) = \mu(1-\mu)$ and $F(\mu) = \log\{\mu/(1-\mu)\}$. In this case, $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 2\{\mu(1-\mu)\}^{1/2} \le C$, since $\mu \in [0, 1]$. Likewise, if $q(\mu) = 2\{\mu(1-\mu)\}^{1/2}$ (for the exponential loss), then $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 1/2$.
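The closed forms in the two bullets above can be confirmed numerically over a grid of $\mu \in (0, 1)$ (a small sketch using the derivatives $q''$ stated in the text):

```python
import numpy as np

mu = np.linspace(1e-6, 1.0 - 1e-6, 100001)
u = mu * (1.0 - mu)

# Bernoulli deviance: q''(mu) = -2/u, V(mu) = u, F'(mu) = 1/u for the logit link
val_dev = np.abs((-2.0 / u) * np.sqrt(u) / (1.0 / u))
# Exponential loss: q''(mu) = -(1/2) u**(-3/2), with the same V and F'
val_exp = np.abs(-0.5 * u**(-1.5) * np.sqrt(u) / (1.0 / u))

print(val_dev.max(), val_exp.max())            # 1.0 (at mu = 1/2) and 0.5
```

The deviance-loss bound $2\{\mu(1-\mu)\}^{1/2}$ peaks at $\mu = 1/2$, while the exponential-loss ratio is identically $1/2$.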
Furthermore, the boundedness of $\psi(\cdot)$ is useful to control deviations in the $Y$-space, which ensures the stability of the robust-BD test even if $Y$ is arbitrarily contaminated.
Concerning the dimensionality $p_n$, Corollary 1 reveals the following implications. If $p_n$ is fixed, then the asymptotic level of $W_n$ under the $\epsilon$-contamination is uniformly bounded over all $\mathbf{z} \in \mathbb{R}^p$, which implies the robustness of validity of the test. This result coincides with Proposition 5 of [10]. When $p_n$ diverges, the asymptotic level is still stable if the point contamination satisfies $\|\mathbf{x}_n\| \le C_0$, where $C_0 > 0$ is an arbitrary constant. Although this condition may not be the weakest possible, it covers a wide range of point-mass contaminations.

3.2. Asymptotic Power of $W_n$ under Contamination

We now study the asymptotic power of $W_n$ under a sequence of contiguous alternatives of the form
$$H_{1n}: A_n \tilde\beta_{n,0} - \mathbf{g}_0 = n^{-1/2}\mathbf{c}, \qquad (14)$$
where $\mathbf{c} = (c_1, \ldots, c_k)^T \ne \mathbf{0}$ is fixed.
Theorem 2.
Assume Conditions A0–A9 and B4 in Appendix A. Suppose $p_n^6/n \to 0$ as $n \to \infty$ and $\sup_n E_J\{w(\mathbf{X}_n)\|\tilde{\mathbf{X}}_n\|\} \le C$. Denote by $\beta(K_{n,\epsilon})$ the power of $W_n = n\{A_n T(K_n) - \mathbf{g}_0\}^T (A_n \hat H_n^{-1}\hat\Omega_n\hat H_n^{-1}A_n^T)^{-1}\{A_n T(K_n) - \mathbf{g}_0\}$ when the underlying distribution is $K_{n,\epsilon}$ in Equation (10), and by $\beta_0$ the nominal power. Under $H_{1n}$ in Equation (14), it follows that
$$\liminf_{n \to \infty} \beta(K_{n,\epsilon}) \ge \beta_0 + \epsilon\, \nu_k B + o(\epsilon) \quad \text{as } \epsilon \downarrow 0,$$
where
$$B = \liminf_{n \to \infty} 2\mathbf{c}^T U_n^{-1} A_n E_J\{\mathrm{IF}(\mathbf{Z}_n; T, K_{n,0})\},$$
with $|B| < \infty$, $\nu_k = -\frac{\partial}{\partial\delta} H_k(\eta_{1-\alpha_0}; \delta)\big|_{\delta = \mathbf{c}^T U_n^{-1}\mathbf{c}}$, and $H_k(\cdot; \delta)$ and $\eta_{1-\alpha_0}$ as defined in Theorem 1.
The result for the asymptotic power is similar in spirit to that for the level. From Theorem 2, if the influence function is bounded, the asymptotic power is also bounded from below and close to the nominal power under a small amount of contamination. This means that the robust-BD Wald-type test enjoys the robustness of efficiency. In addition, an analogous property of the asymptotic power can be obtained for a point-mass contamination.
Corollary 2.
With the notation of Theorem 2, assume Conditions A0–A9 in Appendix A, $\sup_{\mathbf{x}_n \in \mathbb{R}^{p_n}} \|w(\mathbf{x}_n)\mathbf{x}_n\| \le C$ and $\sup_{\mu \in \mathbb{R}} |q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \le C$.
(i) If $p_n \equiv p$, $A_n \equiv A$, $\tilde\beta_{n,0} \equiv \tilde\beta_0$, $K_{n,0} \equiv K_0$ and $U_n \equiv U$ are fixed, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n}) K_0 + (\epsilon/\sqrt{n})\Delta_{\mathbf{z}}$ with $\mathbf{z} \in \mathbb{R}^p$ a fixed point, under $H_{1n}$ in Equation (14), it follows that
$$\inf_{\mathbf{z} \in \mathbb{R}^p}\, \lim_{n \to \infty} \beta(K_{n,\epsilon}) \ge \beta_0 + \epsilon\, \nu_k B_1 + o(\epsilon) \quad \text{as } \epsilon \downarrow 0,$$
where
$$B_1 = \inf_{\mathbf{z} \in \mathbb{R}^p} 2\mathbf{c}^T U^{-1} A\, \mathrm{IF}(\mathbf{z}; T, K_0),$$
with $|B_1| < \infty$.
(ii) If $p_n$ diverges with $p_n^6/n \to 0$, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n}) K_{n,0} + (\epsilon/\sqrt{n})\Delta_{\mathbf{z}_n}$ with $\mathbf{z}_n \in \mathbb{R}^{p_n}$ a sequence of deterministic points, under $H_{1n}$ in Equation (14),
$$\inf_{C_0 > 0}\, \inf_{\mathbf{z}_n \in S_{C_0}}\, \liminf_{n \to \infty} \beta(K_{n,\epsilon}) \ge \beta_0 + \epsilon\, \nu_k B_2 + o(\epsilon) \quad \text{as } \epsilon \downarrow 0,$$
where $S_{C_0} = \{\mathbf{z}_n = (\mathbf{x}_n^T, y)^T : \|\mathbf{x}_n\| \le C_0\}$, $C_0 > 0$ is a constant and
$$B_2 = \inf_{C_0 > 0}\, \inf_{\mathbf{z}_n \in S_{C_0}}\, \liminf_{n \to \infty} 2\mathbf{c}^T U_n^{-1} A_n\, \mathrm{IF}(\mathbf{z}_n; T, K_{n,0}),$$
with $|B_2| < \infty$.

4. Simulation

Regarding the practical utility of W n , numerical studies concerning the empirical level and power of W n under a fixed amount of contamination have been conducted in Section 6 of [1]. To support the theoretical results in our paper, we conduct new simulations to check the robustness of validity and efficiency of W n . Specifically, we will examine the empirical level and power of the test statistic as ϵ varies.
The robust-BD estimation utilizes the Huber $\psi$-function $\psi_c(\cdot)$ with $c = 1.345$ and the weight function $w(\mathbf{X}_n) = 1/(1 + \|\mathbf{X}_n\|)$. Comparisons are made with the classical non-robust counterparts, corresponding to $\psi(r) = r$ and $w(\mathbf{x}_n) \equiv 1$. For each situation below, we set $n = 1000$ and conduct 400 replications.

4.1. Overdispersed Poisson Responses

Overdispersed Poisson counts $Y$, satisfying $\mathrm{var}(Y \mid \mathbf{X}_n = \mathbf{x}_n) = 2m(\mathbf{x}_n)$, are generated from a negative binomial $(m(\mathbf{x}_n), 1/2)$ distribution. Let $p_n = \lfloor 4(n^{1/5.5} - 1)\rfloor$ and $\tilde\beta_{n,0} = (0, 2, 0, \ldots, 0)^T$, where $\lfloor\cdot\rfloor$ denotes the floor function. Generate $\mathbf{X}_{ni} = (X_{i,1}, \ldots, X_{i,p_n})^T$ with $X_{i,j} \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}[-0.5, 0.5]$. The log link function is used, and the (negative) quasi-likelihood is utilized as the BD, generated by the $q$-function in Equation (3) with $V(\mu) = \mu$. The estimator and test statistic are calculated by assuming that $Y$ follows a Poisson distribution.
The data are contaminated by setting $X_{i,\, \mathrm{mod}(i,\, p_n-1)+1} = 3\,\mathrm{sign}(U_i - 0.5)$ and $Y_i^* = Y_i I(Y_i > 20) + 20\, I(Y_i \le 20)$ for $i = 1, \ldots, k$, with $k \in \{2, 4, 6, 8, 10, 12, 14, 16\}$ the number of contaminated data points, where $\mathrm{mod}(a, b)$ is the modulo operation “$a$ modulo $b$” and $U_i \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}(0, 1)$. The proportion of contaminated data, $k/n$, equals $\epsilon/\sqrt{n}$ in Equation (10), which implies $\epsilon = k/\sqrt{n}$.
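This data-generating scheme can be sketched as follows (illustrative Python; the column-selection detail for the contaminated covariate is approximated, and the negative binomial is parameterized so that the variance equals twice the mean):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
p_n = int(4 * (n ** (1 / 5.5) - 1))            # floor(4(n^{1/5.5} - 1)) = 10
beta = np.zeros(p_n + 1)
beta[1] = 2.0                                  # (0, 2, 0, ..., 0)^T

X = rng.uniform(-0.5, 0.5, (n, p_n))
m = np.exp(beta[0] + X @ beta[1:])             # log link
# Negative binomial with size m and prob 1/2: mean m, variance 2m
Y = rng.negative_binomial(m, 0.5).astype(float)

k = 8                                          # number of contaminated cases
for i in range(k):
    # covariate contamination; column rotation approximates mod(i, p_n-1)+1
    X[i, i % (p_n - 1)] = 3.0 * np.sign(rng.random() - 0.5)
    Y[i] = max(Y[i], 20.0)                     # Y_i I(Y_i > 20) + 20 I(Y_i <= 20)
eps = k / np.sqrt(n)
print(p_n, round(eps, 4))
```

The covariate entries are forced to $\pm 3$, far outside $[-0.5, 0.5]$, and the responses are forced to be at least 20, so each contaminated case is simultaneously a leverage point and a response outlier.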
Consider the null hypothesis $H_0: A_n \tilde\beta_{n,0} = 0$ with $A_n = (0, 0, 0, 1, 0, \ldots, 0)$. Figure 1 plots the empirical level of $W_n$ versus $\epsilon$. We observe that the asymptotic nominal level 0.05 is approximately retained by the robust Wald-type test. In contrast, under contamination, the non-robust Wald-type test breaks down in level, showing high sensitivity to the presence of outliers.
To assess the stability of the power of the test, we generate the original data from the true model, but with the true parameter $\tilde\beta_{n,0}$ replaced by $\tilde\beta_n = \tilde\beta_{n,0} + \delta\mathbf{c}$, with $\delta \in \{-0.4, 0.4, -0.6, 0.6\}$ and $\mathbf{c} = (1, \ldots, 1)^T$ a vector of ones. Figure 2 plots the empirical rejection rates of the null model, which shows that the robust Wald-type test has sufficiently large power to detect the alternative hypothesis. In addition, the power of the robust method is generally larger than that of the non-robust method.

4.2. Bernoulli Responses

We generate data with two classes from the model $Y \mid \mathbf{X}_n = \mathbf{x}_n \sim \mathrm{Bernoulli}\{m(\mathbf{x}_n)\}$, where $\mathrm{logit}\{m(\mathbf{x}_n)\} = \tilde{\mathbf{x}}_n^T \tilde\beta_{n,0}$. Let $p_n = 2$, $\tilde\beta_{n,0} = (0, 1, 1)^T$ and $\mathbf{X}_{ni} \overset{\text{i.i.d.}}{\sim} N(\mathbf{0}, I_{p_n})$. The null hypothesis is $H_0: \tilde\beta_{n,0} = (0, 1, 1)^T$. Both the deviance loss and the exponential loss are employed as the BD. We contaminate the data by setting $X_{i,1} = 2 + i/8$ and $Y_i = 0$ for $i = 1, \ldots, k$ with $k \in \{2, 4, 6, 8, 10, 12, 14, 16\}$. To investigate the robustness of validity of $W_n$, we plot the observed level versus $\epsilon$ in Figure 3. The level of the non-robust method diverges quickly as $\epsilon$ increases, whereas the empirical level of the robust method stays close to the nominal level when $\epsilon$ is small and increases only slightly with $\epsilon$, which agrees with our results in Theorem 1.
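The generation and contamination scheme of this subsection can be sketched as follows (illustrative Python, with $k = 8$ contaminated cases; the seed and case count are assumptions, not settings from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p_n = 1000, 2
beta = np.array([0.0, 1.0, 1.0])               # (intercept, beta_1, beta_2)
X = rng.normal(size=(n, p_n))
logits = beta[0] + X @ beta[1:]
Y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

k = 8                                          # number of contaminated cases
X[:k, 0] = 2.0 + np.arange(1, k + 1) / 8.0     # X_{i,1} = 2 + i/8
Y[:k] = 0.0                                    # mislabel as class 0
print(Y.mean())
```

Each contaminated case combines an extreme covariate with a forced label of 0, which is exactly the kind of joint leverage-and-misclassification outlier that drives the non-robust test's level upward.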
To assess the stability of the power of $W_n$, we generate the original data from the true model, but with the true parameter $\tilde\beta_{n,0}$ replaced by $\tilde\beta_n = \tilde\beta_{n,0} + \delta\mathbf{c}$, with $\delta \in \{0.1, 0.2, 0.3, 0.4\}$ and $\mathbf{c} = (1, \ldots, 1)^T$ a vector of ones. Figure 4 plots the power of the Wald-type test versus $\epsilon$, which shows that the robust method retains sufficiently large power, supporting the theoretical results in Theorem 2.

Acknowledgments

We thank the two referees for insightful comments and suggestions. Chunming Zhang's research is supported by U.S. NSF Grants DMS-1712418 and DMS-1521761, the Wisconsin Alumni Research Foundation, and National Natural Science Foundation of China Grant 11690014. Xiao Guo's research is supported by the Fundamental Research Funds for the Central Universities and National Natural Science Foundation of China Grants 11601500, 11671374 and 11771418.

Author Contributions

Chunming Zhang conceived and designed the experiments; Xiao Guo performed the experiments; Xiao Guo analyzed the data; Chunming Zhang contributed to analysis tools; Chunming Zhang and Xiao Guo wrote the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Conditions and Proofs of Main Results

We first introduce some additional notation used in the proofs.
Notations. 
For arbitrary distributions $K$ and $K'$ of $\mathbf{Z}_n$, define
$$\Omega_{n,K,T(K')} = E_K\{p_1^2(Y; \tilde{\mathbf{X}}_n^T T(K'))\, w^2(\mathbf{X}_n)\, \tilde{\mathbf{X}}_n\tilde{\mathbf{X}}_n^T\}, \qquad H_{n,K,T(K')} = E_K\{p_2(Y; \tilde{\mathbf{X}}_n^T T(K'))\, w(\mathbf{X}_n)\, \tilde{\mathbf{X}}_n\tilde{\mathbf{X}}_n^T\}.$$
Therefore, $\Omega_n = \Omega_{n,K_{n,0},\tilde\beta_{n,0}}$, $H_n = H_{n,K_{n,0},\tilde\beta_{n,0}}$, $\hat\Omega_n = \Omega_{n,K_n,T(K_n)}$ and $\hat H_n = H_{n,K_n,T(K_n)}$. For notational simplicity, let $\Omega_{n,\epsilon} = \Omega_{n,K_{n,\epsilon},T(K_{n,\epsilon})}$ and $H_{n,\epsilon} = H_{n,K_{n,\epsilon},T(K_{n,\epsilon})}$.
Define the following matrices:
$$U(K_{n,\epsilon}) = A_n H_{n,\epsilon}^{-1}\Omega_{n,\epsilon}H_{n,\epsilon}^{-1}A_n^T, \qquad U(K_n) = A_n \hat H_n^{-1}\hat\Omega_n\hat H_n^{-1}A_n^T.$$
The following conditions are needed in the proof, which are adopted from [1].
Condition A.
A0. $\sup_{n \ge 1} \|\tilde\beta_{n,0}\|_1 < \infty$.
A1. $w(\cdot)$ is a bounded function. Assume that $\psi(r)$ is a bounded, odd, twice-differentiable function such that $\psi'(r)$, $\psi'(r)r$, $\psi''(r)$, $\psi''(r)r$ and $\psi''(r)r^2$ are bounded; $V(\cdot) > 0$ and $V^{(2)}$ is continuous.
A2. $q^{(4)}(\cdot)$ is continuous, and $q^{(2)}(\cdot) < 0$; $G_1^{(3)}$ is continuous.
A3. $F(\cdot)$ is a monotone bijection, $F^{(3)}(\cdot)$ is continuous, and $F^{(1)}(\cdot) \ne 0$.
A4. $\|\mathbf{X}_n\|_\infty \le C$ almost surely if the underlying distribution is $K_{n,0}$.
A5. $E_{K_{n,0}}(\tilde{\mathbf{X}}_n\tilde{\mathbf{X}}_n^T)$ exists and is nonsingular.
A6. There is a large enough open subset of $\mathbb{R}^{p_n+1}$ containing $\tilde\beta_{n,0}$ such that $F^{-1}(\tilde{\mathbf{x}}_n^T\tilde\beta)$ is bounded for all $\tilde\beta$ in the subset and all $\tilde{\mathbf{x}}_n$ with $\|\tilde{\mathbf{x}}_n\|_\infty \le C$, where $C > 0$ is a large enough constant.
A7. $H_n$ is positive definite, with eigenvalues uniformly bounded away from 0.
A8. $\Omega_n$ is positive definite, with eigenvalues uniformly bounded away from 0.
A9. $\|H_n^{-1}\Omega_n\|$ is bounded away from $\infty$.
Condition B.
B4. $\|\mathbf{X}_n\|_\infty \le C$ almost surely if the underlying distribution is $J$.
The following Lemmas A1–A9 are needed to prove the main theoretical results in this paper.
Lemma A1 (convergence of $T(K_{n,\epsilon})$ to $\tilde\beta_{n,0}$).
Assume Conditions A0–A7 and B4. For $K_{n,\epsilon}$ in Equation (10), $\ell_K(\cdot)$ in Equation (11) and $T(\cdot)$ in Equation (12), if $p_n^4/n \to 0$ as $n \to \infty$, then $T(K_{n,\epsilon})$ is a local minimizer of $\ell_{K_{n,\epsilon}}(\tilde\beta)$ such that $\|T(K_{n,\epsilon}) - \tilde\beta_{n,0}\| = O(\sqrt{p_n/n})$. Furthermore, $T(K_{n,\epsilon})$ is unique.
Proof. 
We follow the idea of the proof in [25]. Let $r_n = \sqrt{p_n/n}$ and $\tilde{\mathbf{u}}_n = (u_0, u_1, \ldots, u_{p_n})^T \in \mathbb{R}^{p_n+1}$. First, we show that there exists a sufficiently large constant $C$ such that, for large $n$, we have
$$\inf_{\|\tilde{\mathbf{u}}_n\| = C} \ell_{K_{n,\epsilon}}(\tilde\beta_{n,0} + r_n\tilde{\mathbf{u}}_n) > \ell_{K_{n,\epsilon}}(\tilde\beta_{n,0}). \qquad (A1)$$
To show Equation (A1), consider
$$\ell_{K_{n,\epsilon}}(\tilde\beta_{n,0} + r_n\tilde{\mathbf{u}}_n) - \ell_{K_{n,\epsilon}}(\tilde\beta_{n,0}) = E_{K_{n,\epsilon}}\{\rho_q(Y, F^{-1}(\tilde{\mathbf{X}}_n^T\tilde\beta_{n,0} + r_n\tilde{\mathbf{X}}_n^T\tilde{\mathbf{u}}_n))\, w(\mathbf{X}_n) - \rho_q(Y, F^{-1}(\tilde{\mathbf{X}}_n^T\tilde\beta_{n,0}))\, w(\mathbf{X}_n)\} \equiv I_1,$$
where $\|\tilde{\mathbf{u}}_n\| = C$.
By Taylor expansion,
$$I_1 = I_{1,1} + I_{1,2} + I_{1,3}, \qquad (A2)$$
where
$$I_{1,1} = r_n\, E_{K_{n,\epsilon}}\{p_1(Y; \tilde{\mathbf{X}}_n^T\tilde\beta_{n,0})\, w(\mathbf{X}_n)\, \tilde{\mathbf{X}}_n^T\}\, \tilde{\mathbf{u}}_n,$$
$$I_{1,2} = \frac{r_n^2}{2}\, E_{K_{n,\epsilon}}\{p_2(Y; \tilde{\mathbf{X}}_n^T\tilde\beta_{n,0})\, w(\mathbf{X}_n)\, (\tilde{\mathbf{X}}_n^T\tilde{\mathbf{u}}_n)^2\},$$
$$I_{1,3} = \frac{r_n^3}{6}\, E_{K_{n,\epsilon}}\{p_3(Y; \tilde{\mathbf{X}}_n^T\tilde\beta_n^*)\, w(\mathbf{X}_n)\, (\tilde{\mathbf{X}}_n^T\tilde{\mathbf{u}}_n)^3\},$$
for $\tilde\beta_n^*$ located between $\tilde\beta_{n,0}$ and $\tilde\beta_{n,0} + r_n\tilde{\mathbf{u}}_n$. Hence,
| I 1 , 1 | r n E K n , ϵ { p 1 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n } u ˜ n = r n ϵ n E J { p 1 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n } u ˜ n C r n p n / n u ˜ n ,
since ∥E_J{p_1(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n}∥ = O(√p_n) and E_{K_{n,0}}{p_1(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n} = 0. For I_{1,2} in Equation (A2),
I 1 , 2 = r n 2 2 E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) ( X ˜ n T u ˜ n ) 2 } + r n 2 2 [ E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) ( X ˜ n T u ˜ n ) 2 } E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) ( X ˜ n T u ˜ n ) 2 } ] I 1 , 2 , 1 + I 1 , 2 , 2 ,
where I 1 , 2 , 1 = 2 1 r n 2 u ˜ n T H n u ˜ n . Meanwhile, we have
| I 1 , 2 , 2 | r n 2 E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } F u ˜ n 2 = r n 2 ϵ n E J { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } F u ˜ n 2 C r n 2 p n u ˜ n 2 / n ,
where E J { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } F = O ( p n ) and E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } F = O ( p n ) . Thus,
I 1 , 2 = 2 1 r n 2 u ˜ n T H n u ˜ n + O ( r n 2 p n / n ) u ˜ n 2 .
For I 1 , 3 in Equation (A2), we observe that
| I 1 , 3 | C r n 3 E K n , ϵ { | p 3 ( Y ; X ˜ n T β ˜ n ) | w ( X n ) | X ˜ n T u ˜ n | 3 } = O ( r n 3 p n 3 / 2 ) u ˜ n 3 .
We can choose some large C such that I_{1,1}, I_{1,2,2} and I_{1,3} are all dominated by the first term of I_{1,2} in Equation (A3), which is positive by the eigenvalue assumption. This implies Equation (A1). Therefore, there exists a local minimizer of K_{n,ϵ}(β̃) in the √(p_n/n)-neighborhood of β̃_{n,0}; denote this minimizer by β̃_{n,ϵ}.
Next, we show that the local minimizer β̃_{n,ϵ} of K_{n,ϵ}(β̃) is unique in the √(p_n/n)-neighborhood of β̃_{n,0}. For all β̃ such that ∥β̃ − β̃_{n,0}∥ = O(n^{−1/4} p_n^{−1/2}),
E K n , ϵ β ˜ ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) = E K n , ϵ p 1 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n C p n E K n , ϵ 2 β ˜ 2 ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) = E K n , ϵ p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X n T C p n
and hence,
β ˜ E K n , ϵ { ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) } = E K n , ϵ β ˜ ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) 2 β ˜ 2 E K n , ϵ { ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) } = E K n , ϵ 2 β ˜ 2 ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) .
Therefore,
2 β ˜ 2 E K n , ϵ { ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) } = E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T } = E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } + E K n , 0 [ { p 2 ( Y ; X ˜ n T β ˜ ) p 2 ( Y ; X ˜ n T β ˜ n , 0 ) } w ( X n ) X ˜ n X ˜ n T ] + [ E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T } E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T } ] = I 1 + I 2 + I 3 .
We know that the minimum eigenvalues of I 1 are uniformly bounded away from 0,
I 2 = E K n , 0 { p 3 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T X ˜ n T ( β ˜ β ˜ n , 0 ) } C p n / n 1 / 4 = o ( 1 ) I 3 ϵ / n [ E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T } + E J { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T } ] C p n / n = o ( 1 ) .
Hence, for n large enough, (∂²/∂β̃²) E_{K_{n,ϵ}}{ρ_q(Y, F^{−1}(X̃_n^T β̃)) w(X_n)} is positive definite for all β̃ such that ∥β̃ − β̃_{n,0}∥ = O(n^{−1/4} p_n^{−1/2}). Therefore, there exists a unique minimizer of K_{n,ϵ}(β̃) in the n^{−1/4} p_n^{−1/2}-neighborhood of β̃_{n,0}, which covers β̃_{n,ϵ}. From
0 = β ˜ E K n , ϵ { ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) } | β ˜ = β ˜ n , ϵ = E K n , ϵ β ˜ ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) | β ˜ = β ˜ n , ϵ w ( X n ) = E K n , ϵ { p 1 ( Y ; X ˜ n T β ˜ n , ϵ ) w ( X n ) X ˜ n } ,
we know T(K_{n,ϵ}) = β̃_{n,ϵ}. From the definition of T(·), it is easy to see that T(K_{n,ϵ}) is unique. ☐
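The mechanism of Lemma A1 — an ϵ-contamination of the underlying distribution moves the population minimizer of the expected robust loss only slightly — can be illustrated numerically. The following sketch is purely illustrative and is our own simplification, not the paper's robust-BD GLM setting: a one-dimensional Huber location model with K_0 = N(0,1), contamination J a point mass at z = 5, and T(K_ϵ) the minimizer of the contaminated expected loss.

```python
# Illustrative sketch (assumed setup: Huber loss, K_0 = N(0,1), J = point mass
# at z = 5): the minimizer of the contaminated expected loss stays close to the
# uncontaminated minimizer theta_0 = 0, with drift roughly linear in eps.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def huber(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def contaminated_risk(theta, eps, z=5.0):
    # E_{K_eps} rho(Y - theta) with K_eps = (1 - eps) N(0,1) + eps Delta_z
    clean, _ = quad(lambda y: huber(y - theta) * np.exp(-y**2 / 2) / np.sqrt(2 * np.pi),
                    -np.inf, np.inf)
    return (1 - eps) * clean + eps * huber(z - theta)

def T(eps):
    # population M-functional: minimizer of the contaminated expected loss
    return minimize_scalar(lambda t: contaminated_risk(t, eps),
                           bounds=(-2.0, 2.0), method="bounded").x

drift = [abs(T(e)) for e in (0.0, 0.01, 0.02, 0.04)]
```

Because the Huber score is bounded, the drift |T(K_ϵ)| grows roughly linearly in ϵ, the one-dimensional analogue of ∥T(K_{n,ϵ}) − β̃_{n,0}∥ = O(√(p_n/n)) along the ϵ/√n contamination path.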
Lemma A2 (T(K_n) − T(K_{n,ϵ})).
Assume Conditions A0–A7 and B4. For K_{n,ϵ} in Equation (10), K(·) in Equation (11) and T(·) in Equation (12), if p_n^4/n → 0 as n → ∞ and the distribution of (X_n, Y) is K_{n,ϵ}, then there exists a unique local minimizer β̃̂_n of K_n(β̃) such that ∥β̃̂_n − T(K_{n,ϵ})∥ = O_P(√(p_n/n)). Furthermore, ∥β̃̂_n − β̃_{n,0}∥ = O_P(√(p_n/n)) and T(K_n) = β̃̂_n.
Proof. 
Let r_n = √(p_n/n) and ũ_n = (u_0, u_1, …, u_{p_n})^T ∈ R^{p_n+1}. To show the existence of the estimator, it suffices to show that, for any given κ > 0, there exists a sufficiently large constant C_κ such that, for large n, we have
P( inf_{∥ũ_n∥ = C_κ} K_n(T(K_{n,ϵ}) + r_n ũ_n) > K_n(T(K_{n,ϵ})) ) ≥ 1 − κ.
This implies that with probability at least 1 κ , there exists a local minimizer β ˜ ^ n of K n ( β ˜ ) in the ball { T ( K n , ϵ ) + r n u ˜ n : u ˜ n C κ } . To show Equation (A4), consider
K n ( T ( K n , ϵ ) + r n u ˜ n ) K n ( T ( K n , ϵ ) ) = 1 n i = 1 n { ρ q ( Y i , F 1 ( X ˜ n i T ( T ( K n , ϵ ) + r n u ˜ n ) ) ) w ( X n i ) ρ q ( Y i , F 1 ( X ˜ n i T T ( K n , ϵ ) ) ) w ( X n i ) } I 1 ,
where ∥ũ_n∥ = C_κ.
By Taylor expansion,
I 1 = I 1 , 1 + I 1 , 2 + I 1 , 3 ,
where
I 1 , 1 = r n / n i = 1 n p 1 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) X ˜ n i T u ˜ n , I 1 , 2 = r n 2 / ( 2 n ) i = 1 n p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) ( X ˜ n i T u ˜ n ) 2 , I 1 , 3 = r n 3 / ( 6 n ) i = 1 n p 3 ( Y i ; X ˜ n i T β ˜ n ) w ( X n i ) ( X ˜ n i T u ˜ n ) 3
for β ˜ n located between T ( K n , ϵ ) and T ( K n , ϵ ) + r n u ˜ n .
Since ∥T(K_{n,ϵ}) − β̃_{n,0}∥ = O(√(p_n/n)) = o(1), the large open set considered in Condition A6 contains T(K_{n,ϵ}) when n is large enough, say n ≥ N, where N is a positive constant. Therefore, for any fixed n ≥ N, there exists a bounded open subset of R^{p_n+1} containing T(K_{n,ϵ}) such that, for all β̃ in this set, ∥p_1(Y; X̃_n^T β̃) w(X_n) X̃_n∥ ≤ C ∥X̃_n∥, which is integrable with respect to K_{n,ϵ}, where C is a positive constant. Thus, for n ≥ N,
0 = β ˜ E K n , ϵ { ρ q ( Y , F 1 ( X ˜ n T β ˜ ) ) w ( X n ) } | β ˜ = T ( K n , ϵ ) = E K n , ϵ { p 1 ( Y ; X ˜ n T T ( K n , ϵ ) ) w ( X n ) X ˜ n } .
Hence,
| I 1 , 1 | r n 1 n i = 1 n p 1 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) X ˜ n i u ˜ n = O P ( r n p n / n ) u ˜ n .
For I 1 , 2 in Equation (A5),
I 1 , 2 = r n 2 2 n i = 1 n E K n , ϵ { p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) ( X ˜ n i T u ˜ n ) 2 } + r n 2 2 n i = 1 n [ p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) ( X ˜ n i T u ˜ n ) 2 E K n , ϵ { p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) ( X ˜ n i T u ˜ n ) 2 } ] I 1 , 2 , 1 + I 1 , 2 , 2 ,
where I 1 , 2 , 1 = 2 1 r n 2 u ˜ n T H n , ϵ u ˜ n . Meanwhile, we have
| I 1 , 2 , 2 | r n 2 2 1 n i = 1 n [ p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) X ˜ n i X ˜ n i T E K n , ϵ { p 2 ( Y i ; X ˜ n i T T ( K n , ϵ ) ) w ( X n i ) X ˜ n i X ˜ n i T } ] F u ˜ n 2 = r n 2 O P ( p n / n ) u ˜ n 2 .
Thus,
I 1 , 2 = 2 1 r n 2 u ˜ n T H n , ϵ u ˜ n + O P ( r n 2 p n / n ) u ˜ n 2 .
For I 1 , 3 in Equation (A5), we observe that
| I 1 , 3 | C r n 3 1 n i = 1 n | p 3 ( Y i ; X ˜ n i T β ˜ n ) | w ( X n i ) | X ˜ n i T u ˜ n | 3 = O P ( r n 3 p n 3 / 2 ) u ˜ n 3 .
We will show that the minimum eigenvalue of H_{n,ϵ} is uniformly bounded away from 0. Note that H_{n,ϵ} = (1 − ϵ/√n) H_{n,K_{n,0},T(K_{n,ϵ})} + (ϵ/√n) H_{n,J,T(K_{n,ϵ})} and
∥H_{n,K_{n,0},T(K_{n,ϵ})} − H_n∥ = ∥E_{K_{n,0}}[{p_2(Y; X̃_n^T T(K_{n,ϵ})) − p_2(Y; X̃_n^T β̃_{n,0})} w(X_n) X̃_n X̃_n^T]∥ = ∥E_{K_{n,0}}[p_3(Y; X̃_n^T β̃_n) w(X_n) X̃_n X̃_n^T X̃_n^T {T(K_{n,ϵ}) − β̃_{n,0}}]∥ = O(p_n²/√n).
Since the eigenvalues of H n are uniformly bounded away from 0, so are those of H n , K n , 0 , T ( K n , ϵ ) and H n , ϵ .
We can choose some large C κ such that I 1 , 1 and I 1 , 3 are both dominated by the first term of I 1 , 2 in Equation (A7), which is positive by the eigenvalue assumption. This implies Equation (A4).
Next, we show the uniqueness of β̃̂_n. For all β̃ such that ∥β̃ − T(K_{n,ϵ})∥ = O(n^{−1/4} p_n^{−1/2}),
1 n i = 1 n p 2 ( Y i ; X ˜ n i T β ˜ ) w ( X n i ) X ˜ n i X ˜ n i T = E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X n T } + E K n , 0 [ { p 2 ( Y ; X ˜ n T β ˜ ) p 2 ( Y ; X ˜ n T β ˜ n , 0 ) } w ( X n ) X ˜ n X n T ] + [ E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X n T } E K n , 0 { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X n T } ] + 1 n i = 1 n p 2 ( Y i ; X ˜ n i T β ˜ ) ) w ( X n i ) X ˜ n i X ˜ n i T E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X n T } = I 1 + I 2 + I 3 + I 4 .
We know that the minimum eigenvalues of I_1 are uniformly bounded away from 0. Following the proof of Lemma A1, we have ∥I_2∥ = o(1) and ∥I_3∥ = o(1). It is easy to see that ∥I_4∥ = O_P(p_n/√n) = o_P(1).
Hence, for n large enough, (∂²/∂β̃²) K_n(β̃) is positive definite with high probability for all β̃ such that ∥β̃ − β̃_{n,0}∥ = O(n^{−1/4} p_n^{−1/2}). Therefore, there exists a unique minimizer of K_n(β̃) in the n^{−1/4} p_n^{−1/2}-neighborhood of T(K_{n,ϵ}), which covers β̃̂_n. ☐
Lemma A3 (A_n{T(K_{n,ϵ}) − β̃_{n,0}}).
Assume Conditions A0–A7 and B4. For K_{n,ϵ} in Equation (10) and T(·) in Equation (12), if p_n^5/n → 0 as n → ∞, the distribution of (X_n, Y) is K_{n,ϵ} and E_J(w(X_n)∥X_n∥) ≤ C, then
√n ∥A_n{T(K_{n,ϵ}) − β̃_{n,0}}∥ = O(1),
where A_n is any given k × (p_n + 1) matrix such that A_n A_n^T → G, with G being a k × k positive-definite matrix and k is a fixed integer.
Proof. 
Taylor’s expansion yields
0 = E_{K_{n,ϵ}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} = E_{K_{n,ϵ}}{p_1(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n} + E_{K_{n,ϵ}}{p_2(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n X̃_n^T}{T(K_{n,ϵ}) − β̃_{n,0}} + (1/2) E_{K_{n,ϵ}}[p_3(Y; X̃_n^T β̃_n) w(X_n) X̃_n [X̃_n^T{T(K_{n,ϵ}) − β̃_{n,0}}]²] ≡ I_1 + I_2{T(K_{n,ϵ}) − β̃_{n,0}} + I_3,
where β ˜ n lies between T ( K n , ϵ ) and β ˜ n , 0 . Below, we will show
∥I_1∥ = O(1/√n), ∥I_2 − H_n∥ = O(p_n/√n), ∥I_3∥ = O(p_n^{5/2}/n).
First, ∥I_1∥ = (ϵ/√n) ∥E_J{p_1(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n}∥ ≤ C (ϵ/√n) E_J(w(X_n)∥X_n∥) = O(1/√n). Following the proof of I_3 in Lemma A1, ∥I_2 − H_n∥ = O(p_n/√n). Since ∥T(K_{n,ϵ}) − β̃_{n,0}∥ = O(√(p_n/n)), we have ∥I_3∥ = O(p_n^{5/2}/n).
Therefore, √n A_n{T(K_{n,ϵ}) − β̃_{n,0}} = −√n A_n H_n^{−1} I_1 + o(1), which completes the proof. ☐
Lemma A4 (asymptotic normality of T ( K n ) T ( K n , ϵ ) ).
Assume Conditions A0–A8 and B4. If p_n^5/n → 0 as n → ∞ and the distribution of (X_n, Y) is K_{n,ϵ}, then
√n {U(K_{n,ϵ})}^{−1/2} A_n {T(K_n) − T(K_{n,ϵ})} →L N(0, I_k),
where U(K_{n,ϵ}) = A_n H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} A_n^T, A_n is any given k × (p_n + 1) matrix such that A_n A_n^T → G, with G being a k × k positive-definite matrix, and k is a fixed integer.
Proof. 
We will first show that
T(K_n) − T(K_{n,ϵ}) = −(1/n) H_{n,ϵ}^{−1} Σ_{i=1}^n p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} + o_P(n^{−1/2}).
From (∂K_n(β̃)/∂β̃)|_{β̃=T(K_n)} = 0, Taylor's expansion yields
0 = (1/n) Σ_{i=1}^n p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} + (1/n) Σ_{i=1}^n p_2(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} X̃_{ni}^T {T(K_n) − T(K_{n,ϵ})} + (1/(2n)) Σ_{i=1}^n p_3(Y_i; X̃_{ni}^T β̃_n) w(X_{ni}) [X̃_{ni}^T{T(K_n) − T(K_{n,ϵ})}]² X̃_{ni} ≡ (1/n) Σ_{i=1}^n p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} + I_2{T(K_n) − T(K_{n,ϵ})} + I_3,
where β ˜ n lies between T ( K n , ϵ ) and T ( K n ) . Below, we will show
∥I_2 − H_{n,ϵ}∥ = O_P(p_n/√n), ∥I_3∥ = O_P(p_n^{5/2}/n).
First, by arguments similar to those for I_{1,2} in the proof of Lemma A2, we have ∥I_2 − H_{n,ϵ}∥ = O_P(p_n/√n).
Second, a proof similar to that used for I_{1,3} in Equation (A5) gives ∥I_3∥ = O_P(p_n^{5/2}/n).
Third, by Equation (A9) and ∥T(K_n) − T(K_{n,ϵ})∥ = O_P(√(p_n/n)), we see that
H_{n,ϵ}{T(K_n) − T(K_{n,ϵ})} = −(1/n) Σ_{i=1}^n p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} + u_n,
where ∥u_n∥ = O_P(p_n^{5/2}/n) = o_P(n^{−1/2}). From the proof of Lemma A2, the eigenvalues of H_{n,ϵ} are uniformly bounded away from 0, which completes the proof of Equation (A8).
Following the proof for the bounded eigenvalues of H_{n,ϵ} in Lemma A2, we can show that the eigenvalues of Ω_{n,ϵ} are uniformly bounded away from 0. Hence, the eigenvalues of H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} are uniformly bounded away from 0, as are the eigenvalues of U(K_{n,ϵ}). From Equation (A8), we see that
A_n{T(K_n) − T(K_{n,ϵ})} = −(1/n) A_n H_{n,ϵ}^{−1} Σ_{i=1}^n p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni} + o_P(n^{−1/2}).
It follows that
√n {U(K_{n,ϵ})}^{−1/2} A_n{T(K_n) − T(K_{n,ϵ})} = Σ_{i=1}^n R_{ni} + o_P(1),
where R_{ni} = −n^{−1/2} {U(K_{n,ϵ})}^{−1/2} A_n H_{n,ϵ}^{−1} p_1(Y_i; X̃_{ni}^T T(K_{n,ϵ})) w(X_{ni}) X̃_{ni}. Following (A6) in Lemma A2, one can show that E_{K_{n,ϵ}}(R_{ni}) = 0 for n large enough.
To show Σ_{i=1}^n R_{ni} →L N(0, I_k), we apply the Lindeberg–Feller central limit theorem in [26]. Specifically, we check (I) Σ_{i=1}^n cov_{K_{n,ϵ}}(R_{ni}) → I_k; (II) Σ_{i=1}^n E_{K_{n,ϵ}}(∥R_{ni}∥^{2+δ}) = o(1) for some δ > 0. Condition (I) is straightforward, since Σ_{i=1}^n cov_{K_{n,ϵ}}(R_{ni}) = {U(K_{n,ϵ})}^{−1/2} U(K_{n,ϵ}) {U(K_{n,ϵ})}^{−1/2} = I_k. To check condition (II), we can show that E_{K_{n,ϵ}}(∥R_{ni}∥^{2+δ}) = O((p_n/n)^{(2+δ)/2}). This yields Σ_{i=1}^n E_{K_{n,ϵ}}(∥R_{ni}∥^{2+δ}) ≤ O(p_n^{(2+δ)/2}/n^{δ/2}) = o(1). Hence,
√n {U(K_{n,ϵ})}^{−1/2} A_n{T(K_n) − T(K_{n,ϵ})} →L N(0, I_k).
Thus, we complete the proof. ☐
Lemma A5 (asymptotic covariance matrices U ( K n , ϵ ) and U n ).
Assume Conditions A0–A9 and B4. If p_n^4/n → 0 as n → ∞, then
∥U_n^{−1/2}{U(K_{n,ϵ})}^{1/2} − I_k∥ = O(p_n/n^{1/4}),
where U(K_{n,ϵ}) = A_n H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} A_n^T, A_n is any given k × (p_n + 1) matrix such that A_n A_n^T → G, with G being a k × k positive-definite matrix, and k is a fixed integer.
Proof. 
Note that
∥{U(K_{n,ϵ})}^{1/2} − U_n^{1/2}∥² ≤ ∥U(K_{n,ϵ}) − U_n∥ ≤ ∥H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} − H_n^{−1} Ω_n H_n^{−1}∥ ∥A_n∥_F².
Since ∥A_n∥_F² → tr(G), it suffices to prove that ∥H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} − H_n^{−1} Ω_n H_n^{−1}∥ = O(p_n²/√n).
First, we prove ∥H_{n,ϵ} − H_n∥ = O(p_n²/√n). Note that
H n , ϵ H n = E K n , ϵ [ { p 2 ( Y ; X ˜ n T T ( K n , ϵ ) ) p 2 ( Y ; X ˜ n T β ˜ n , 0 ) } w ( X n ) X ˜ n X ˜ n T ] + [ E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } H n ] = E K n , ϵ [ p 3 ( Y ; X ˜ n T β ˜ ) w ( X n ) X ˜ n X ˜ n T X ˜ n T { T ( K n , ϵ ) β ˜ n , 0 } ] + [ E K n , ϵ { p 2 ( Y ; X ˜ n T β ˜ n , 0 ) w ( X n ) X ˜ n X ˜ n T } H n ] I 1 + I 2 .
We know that ∥I_1∥ = O(p_n²/√n) and ∥I_2∥ = O(p_n/√n). Thus, ∥H_{n,ϵ} − H_n∥ = O(p_n²/√n).
Second, we show ∥Ω_{n,ϵ} − Ω_n∥ = O(p_n²/√n). It is easy to see that
Ω n , ϵ Ω n = E K n , ϵ [ { p 1 2 ( Y ; X ˜ n T T ( K n , ϵ ) ) p 1 2 ( Y ; X ˜ n T β ˜ n , 0 ) } w 2 ( X n ) X ˜ n X ˜ n T ] + [ E K n , ϵ { p 1 2 ( Y ; X ˜ n T β ˜ n , 0 ) w 2 ( X n ) X ˜ n X ˜ n T } Ω n ] = Δ 1 , 1 + Δ 1 , 2 ,
where ∥Δ_{1,1}∥ = O(p_n²/√n) and ∥Δ_{1,2}∥ = O(p_n/√n). We conclude that ∥Ω_{n,ϵ} − Ω_n∥ = O(p_n²/√n).
Third, we show ∥H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} − H_n^{−1} Ω_n H_n^{−1}∥ = O(p_n²/√n). Note H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} − H_n^{−1} Ω_n H_n^{−1} = L_1 + L_2 + L_3, where L_1 = H_{n,ϵ}^{−1}(Ω_{n,ϵ} − Ω_n) H_{n,ϵ}^{−1}, L_2 = H_{n,ϵ}^{−1}(H_n − H_{n,ϵ}) H_n^{−1} Ω_n H_{n,ϵ}^{−1} and L_3 = H_n^{−1} Ω_n H_{n,ϵ}^{−1}(H_n − H_{n,ϵ}) H_n^{−1}. Under Conditions A7 and A9, it is straightforward to see that ∥H_{n,ϵ}^{−1}∥ = O(1), ∥H_n^{−1}∥ = O(1) and ∥H_n^{−1} Ω_n∥ = O(1). Since ∥L_1∥ ≤ ∥H_{n,ϵ}^{−1}∥ ∥Ω_{n,ϵ} − Ω_n∥ ∥H_{n,ϵ}^{−1}∥, we conclude ∥L_1∥ = O(p_n²/√n), and similarly ∥L_2∥ = O(p_n²/√n) and ∥L_3∥ = O(p_n²/√n). Hence, ∥H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} − H_n^{−1} Ω_n H_n^{−1}∥ = O(p_n²/√n).
Thus, we can conclude that ∥U(K_{n,ϵ}) − U_n∥ = O(p_n²/√n) and that the eigenvalues of U(K_{n,ϵ}) and U_n are uniformly bounded away from 0 and ∞. Consequently, ∥{U(K_{n,ϵ})}^{1/2} − U_n^{1/2}∥ = O(p_n/n^{1/4}) and the proof is finished. ☐
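The matrices H_n, Ω_n and the resulting sandwich covariance U_n = A_n H_n^{−1} Ω_n H_n^{−1} A_n^T have direct empirical counterparts. As a hedged illustration (our own simplification: ordinary least squares with squared-error loss and weight w ≡ 1, standing in for the robust-BD loss), the sandwich H^{−1} Ω H^{−1} can be assembled as follows:

```python
# Minimal sketch (assumed setup: classical linear model, w == 1) of the
# sandwich form H^{-1} Omega H^{-1} underlying U_n.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta_hat

H_hat = X.T @ X / n                          # empirical H (second-derivative term)
Omega_hat = (X * r[:, None] ** 2).T @ X / n  # empirical Omega (score outer product)
sandwich = np.linalg.inv(H_hat) @ Omega_hat @ np.linalg.inv(H_hat)

A = np.eye(2, p + 1)                         # contrast on the first two coordinates
U_hat = A @ sandwich @ A.T / n               # estimated covariance of A beta_hat
```

Under homoskedastic errors the sandwich estimate agrees asymptotically with the model-based covariance; under contamination or heteroskedasticity the two differ, which is precisely why the sandwich form is used in the Wald-type statistic.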
Lemma A6 (asymptotic covariance matrices U ( K n ) and U ( K n , ϵ ) ).
Assume Conditions A0–A9 and B4. If p_n^4/n → 0 as n → ∞ and the distribution of (X_n, Y) is K_{n,ϵ}, then
∥{U(K_n)}^{−1/2}{U(K_{n,ϵ})}^{1/2} − I_k∥ = O_P(p_n/n^{1/4}),
where U ( K n , ϵ ) = A n H n , ϵ 1 Ω n , ϵ H n , ϵ 1 A n T , U ( K n ) = A n H ^ n 1 Ω ^ n H ^ n 1 A n T , A n is any given k × ( p n + 1 ) matrix such that A n A n T G , with G being a k × k positive-definite matrix, and k is a fixed integer.
Proof. 
Note that ∥{U(K_n)}^{1/2} − {U(K_{n,ϵ})}^{1/2}∥² ≤ ∥U(K_n) − U(K_{n,ϵ})∥ ≤ ∥Ĥ_n^{−1} Ω̂_n Ĥ_n^{−1} − H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1}∥ ∥A_n∥_F². Since ∥A_n∥_F² → tr(G), it suffices to prove that ∥Ĥ_n^{−1} Ω̂_n Ĥ_n^{−1} − H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1}∥ = O_P(p_n²/√n).
Following the proof of Proposition 1 in [1], we can show that ∥Ĥ_n − H_{n,ϵ}∥ = O_P(p_n²/√n) and ∥Ω̂_n − Ω_{n,ϵ}∥ = O_P(p_n²/√n).
To show ∥Ĥ_n^{−1} Ω̂_n Ĥ_n^{−1} − H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1}∥ = O_P(p_n²/√n), note Ĥ_n^{−1} Ω̂_n Ĥ_n^{−1} − H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1} = L_1 + L_2 + L_3, where L_1 = Ĥ_n^{−1}(Ω̂_n − Ω_{n,ϵ})Ĥ_n^{−1}, L_2 = Ĥ_n^{−1}(H_{n,ϵ} − Ĥ_n)H_{n,ϵ}^{−1} Ω_{n,ϵ} Ĥ_n^{−1} and L_3 = H_{n,ϵ}^{−1} Ω_{n,ϵ} Ĥ_n^{−1}(H_{n,ϵ} − Ĥ_n)H_{n,ϵ}^{−1}. Following the proof in Lemma A2, it is straightforward to verify that ∥H_{n,ϵ}^{−1}∥ = O(1) and ∥Ĥ_n^{−1}∥ = O_P(1). In addition, ∥H_{n,ϵ}^{−1} Ω_{n,ϵ}∥ = ∥(H_{n,ϵ}^{−1} − H_n^{−1})Ω_{n,ϵ} + H_n^{−1}(Ω_{n,ϵ} − Ω_n) + H_n^{−1} Ω_n∥ ≤ ∥H_{n,ϵ}^{−1}∥ ∥H_{n,ϵ} − H_n∥ ∥H_n^{−1} Ω_{n,ϵ}∥ + ∥H_n^{−1}∥ ∥Ω_{n,ϵ} − Ω_n∥ + ∥H_n^{−1} Ω_n∥ = O(1).
Since ∥L_1∥ ≤ ∥Ĥ_n^{−1}∥ ∥Ω̂_n − Ω_{n,ϵ}∥ ∥Ĥ_n^{−1}∥, we conclude ∥L_1∥ = O_P(p_n²/√n), and similarly ∥L_2∥ = O_P(p_n²/√n) and ∥L_3∥ = O_P(p_n²/√n). Hence, ∥Ĥ_n^{−1} Ω̂_n Ĥ_n^{−1} − H_{n,ϵ}^{−1} Ω_{n,ϵ} H_{n,ϵ}^{−1}∥ = O_P(p_n²/√n).
Thus, we can conclude that ∥U(K_n) − U(K_{n,ϵ})∥ = O_P(p_n²/√n) and that the eigenvalues of U(K_n) are uniformly bounded away from 0 and ∞ with probability tending to 1. The claim then follows by noting that ∥{U(K_n)}^{1/2} − {U(K_{n,ϵ})}^{1/2}∥² ≤ ∥U(K_n) − U(K_{n,ϵ})∥. ☐
Lemma A7 (asymptotic distribution of test statistic).
Assume Conditions A0–A9 and B4. If p_n^6/n → 0 as n → ∞ and the distribution of (X_n, Y) is K_{n,ϵ}, then
√n [{U(K_n)}^{−1/2} A_n{T(K_n) − β̃_{n,0}} − U_n^{−1/2} A_n{T(K_{n,ϵ}) − β̃_{n,0}}] →L N(0, I_k),
where A n is any given k × ( p n + 1 ) matrix such that A n A n T G , with G being a k × k positive-definite matrix, and k is a fixed integer.
Proof. 
Note that
√n [{U(K_n)}^{−1/2} A_n{T(K_n) − β̃_{n,0}} − U_n^{−1/2} A_n{T(K_{n,ϵ}) − β̃_{n,0}}] = √n {U(K_n)}^{−1/2} A_n{T(K_n) − T(K_{n,ϵ})} + √n [{U(K_n)}^{−1/2} − {U(K_{n,ϵ})}^{−1/2}] A_n{T(K_{n,ϵ}) − β̃_{n,0}} + √n [{U(K_{n,ϵ})}^{−1/2} − U_n^{−1/2}] A_n{T(K_{n,ϵ}) − β̃_{n,0}} ≡ I + II + III.
For term I, we obtain from Lemma A4 that √n {U(K_{n,ϵ})}^{−1/2} A_n(T(K_n) − T(K_{n,ϵ})) →L N(0, I_k). From Lemma A6, we get ∥{U(K_n)}^{−1/2}{U(K_{n,ϵ})}^{1/2} − I_k∥ = o_P(1). Thus, by Slutsky's theorem,
I →L N(0, I_k).
For term II, we see from Lemma A6 that
∥{U(K_n)}^{−1/2} − {U(K_{n,ϵ})}^{−1/2}∥ = O_P(p_n/n^{1/4}).
Moreover,
∥A_n{T(K_{n,ϵ}) − β̃_{n,0}}∥ ≤ ∥A_n∥ ∥T(K_{n,ϵ}) − β̃_{n,0}∥ = O(√(p_n/n)).
Thus,
∥II∥ ≤ √n ∥{U(K_n)}^{−1/2} − {U(K_{n,ϵ})}^{−1/2}∥ ∥A_n∥ ∥T(K_{n,ϵ}) − β̃_{n,0}∥ = O_P(p_n^{3/2}/n^{1/4}).
Similarly, ∥III∥ = o_P(1). Combining (A10) and (A11) with Slutsky's theorem completes the proof. ☐
Lemma A8 (Influence Function IF).
Assume Conditions A1–A8 and B4. For any fixed sample size n,
(∂/∂t) T((1 − t)K_{n,0} + tJ)|_{t=t_0} ≡ lim_{t→t_0} [T((1 − t)K_{n,0} + tJ) − T((1 − t_0)K_{n,0} + t_0 J)]/(t − t_0) = −H_{n,K_{t_0},T(K_{t_0})}^{−1} [E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}],
where K_{t_0} = (1 − t_0)K_{n,0} + t_0 J and t_0 is a positive constant such that t_0 ≤ c/p_n² with c > 0 a sufficiently small constant. In addition, ∥H_{n,K_{t_0},T(K_{t_0})}^{−1}∥ ≤ C uniformly for all n and t_0 such that t_0 ≤ c/p_n² with c > 0 a sufficiently small constant.
Proof. 
We follow the proof of Theorem 5.1 in [27]. Note
lim_{t→t_0} [T((1 − t)K_{n,0} + tJ) − T((1 − t_0)K_{n,0} + t_0 J)]/(t − t_0) = lim_{Δ→0} [T(K_{t_0} + Δ(J − K_{n,0})) − T(K_{t_0})]/Δ,
where Δ = t − t_0.
It suffices to prove that, for any sequence {Δ_j}_{j=1}^∞ such that lim_{j→∞} Δ_j = 0, we have
lim_{j→∞} [T(K_{t_0} + Δ_j(J − K_{n,0})) − T(K_{t_0})]/Δ_j = −H_{n,K_{t_0},T(K_{t_0})}^{−1} [E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}].
Following similar proofs in Lemma A1, we can show that for t_0 sufficiently small,
∥β̃_{n,0} − T(K_{t_0})∥ ≤ C t_0 √p_n.
Next we will show that the eigenvalues of H n , K t 0 , T ( K t 0 ) are bounded away from 0.
H_{n,K_{t_0},T(K_{t_0})} = (1 − t_0) H_{n,K_{n,0},T(K_{t_0})} + t_0 H_{n,J,T(K_{t_0})} = (1 − t_0) H_n + t_0 H_{n,J,β̃_{n,0}} + (1 − t_0){H_{n,K_{n,0},T(K_{t_0})} − H_n} + t_0{H_{n,J,T(K_{t_0})} − H_{n,J,β̃_{n,0}}} ≡ (1 − t_0) I_1 + t_0 I_2 + I_3 + I_4.
First,
∥I_3∥ ≤ C ∥E_{K_{n,0}}[{p_2(Y; X̃_n^T T(K_{t_0})) − p_2(Y; X̃_n^T β̃_{n,0})} w(X_n) X̃_n X̃_n^T]∥ ≤ C p_n^{3/2} ∥T(K_{t_0}) − β̃_{n,0}∥ ≤ C p_n² t_0.
Similarly, ∥t_0 I_2∥ ≤ C p_n t_0 and ∥I_4∥ ≤ C p_n² t_0². Since the eigenvalues of I_1 are bounded away from zero and t_0 I_2, I_3 and I_4 can be made sufficiently small, we conclude that, for t_0 ≤ c/p_n² with c sufficiently small, the eigenvalues of H_{n,K_{t_0},T(K_{t_0})} are uniformly bounded away from 0.
Define K_j = K_{t_0} + Δ_j(J − K_{n,0}). Following arguments similar to those for (A6) in Lemma A2, for j large enough, E_{K_j}{ψ_RBD(Z_n; T(K_j))} = 0. We consider only such large j below. A two-term Taylor expansion yields
0 = E_{K_j}{ψ_RBD(Z_n; T(K_j))} = E_{K_j}{ψ_RBD(Z_n; T(K_{t_0}))} + H_{n,K_j,β̃_j}{T(K_j) − T(K_{t_0})},
where β ˜ j lies between T ( K t 0 ) and T ( K j ) .
Thus, from (A13) and the fact that E_{K_j}{ψ_RBD(Z_n; T(K_{t_0}))} = Δ_j[E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}], we have
0 = E_{K_j}{ψ_RBD(Z_n; T(K_{t_0}))} + H_{n,K_{t_0},T(K_{t_0})}{T(K_j) − T(K_{t_0})} + {H_{n,K_j,β̃_j} − H_{n,K_{t_0},T(K_{t_0})}}{T(K_j) − T(K_{t_0})} = Δ_j[E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}] + H_{n,K_{t_0},T(K_{t_0})}{T(K_j) − T(K_{t_0})} + {H_{n,K_j,β̃_j} − H_{n,K_{t_0},T(K_{t_0})}}{T(K_j) − T(K_{t_0})},
and we obtain that
T(K_j) − T(K_{t_0}) = −Δ_j H_{n,K_{t_0},T(K_{t_0})}^{−1} [E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}] − H_{n,K_{t_0},T(K_{t_0})}^{−1} {H_{n,K_j,β̃_j} − H_{n,K_{t_0},T(K_{t_0})}} {T(K_j) − T(K_{t_0})}.
Next, we will show that ∥H_{n,K_j,β̃_j} − H_{n,K_{t_0},T(K_{t_0})}∥ = o(1) as j → ∞ for any fixed n. Since ∥β̃_j − T(K_{t_0})∥ ≤ ∥T(K_j) − T(K_{t_0})∥ = O(Δ_j),
∥H_{n,K_j,β̃_j} − H_{n,K_{t_0},β̃_j}∥ = Δ_j ∥E_J{p_2(Y; X̃_n^T β̃_j) w(X_n) X̃_n X̃_n^T} − E_{K_{n,0}}{p_2(Y; X̃_n^T β̃_j) w(X_n) X̃_n X̃_n^T}∥ = O(Δ_j) = o(1) as j → ∞,
and also,
∥H_{n,K_{t_0},β̃_j} − H_{n,K_{t_0},T(K_{t_0})}∥ = ∥E_{K_{t_0}}[{p_2(Y; X̃_n^T β̃_j) − p_2(Y; X̃_n^T T(K_{t_0}))} w(X_n) X̃_n X̃_n^T]∥ = o(1) as j → ∞.
From Equations (A15) and (A16),
∥H_{n,K_j,β̃_j} − H_{n,K_{t_0},T(K_{t_0})}∥ = o(1) as j → ∞,
which, together with Equations (A12) and (A14), implies that
∥T(K_j) − T(K_{t_0}) + Δ_j H_{n,K_{t_0},T(K_{t_0})}^{−1} [E_J{ψ_RBD(Z_n; T(K_{t_0}))} − E_{K_{n,0}}{ψ_RBD(Z_n; T(K_{t_0}))}]∥ = o(Δ_j).
This completes the proof. ☐
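Lemma A8 identifies the Gâteaux derivative of T along the contamination path with −H^{−1}[E_J ψ − E_{K_{n,0}} ψ]. A one-dimensional sanity check (our own illustrative setup, not the paper's: Huber location M-estimation under K_0 = N(0,1), with J = Δ_z a point mass) compares a finite-difference derivative of t ↦ T((1 − t)K_0 + tΔ_z) with the closed-form influence function ψ(z)/E[ψ′(Y)]:

```python
# Illustrative check (assumed setup: Huber location model, K_0 = N(0,1),
# J = Delta_z): finite-difference Gateaux derivative of the M-functional
# versus the scalar analogue of -H^{-1}[E_J psi - E_{K_0} psi].
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

C = 1.345                              # Huber tuning constant
psi = lambda r: np.clip(r, -C, C)      # Huber score function

def E_psi(theta, t=0.0, z=3.0):
    # E_{(1-t) N(0,1) + t Delta_z} psi(Y - theta)
    clean, _ = quad(lambda y: psi(y - theta) * norm.pdf(y), -np.inf, np.inf)
    return (1 - t) * clean + t * psi(z - theta)

def T(t, z=3.0):
    # M-functional: root of the contaminated estimating equation
    return brentq(lambda th: E_psi(th, t, z), -5.0, 5.0)

z, t = 3.0, 1e-4
fd = (T(t, z) - T(0.0, z)) / t                       # finite-difference derivative
theory = psi(z) / (norm.cdf(C) - norm.cdf(-C))       # psi(z) / E psi'(Y)
```

The two quantities agree up to the order of the step size t, mirroring the matrix identity of the lemma with H replaced by the scalar E[ψ′(Y)] = P(|Y| ≤ C).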
Lemma A9.
Assume Conditions A1–A8 and B4, and sup_n E_J(w(X_n)∥X̃_n∥) ≤ C. Let H_k(·; δ) be the cumulative distribution function of the χ_k²(δ) distribution with noncentrality parameter δ. Denote δ(ϵ) = n ∥U_n^{−1/2}{A_n T(K_{n,ϵ}) − g_0}∥² and let b(ϵ) = H_k(x; δ(ϵ)). Then, for any fixed x > 0, sup_{ϵ∈[0,C]} lim sup_{n→∞} |b^{(3)}(ϵ)| ≤ C under H_0 and sup_{ϵ∈[0,C]} lim sup_{n→∞} |b′(ϵ)| ≤ C under H_{1n}.
Proof. 
Since b ( ϵ ) = H k ( x ; δ ( ϵ ) ) , we have
b′(ϵ) = (∂/∂ϵ) H_k(x; δ(ϵ)) = (∂H_k(x; δ)/∂δ)|_{δ=δ(ϵ)} · ∂δ(ϵ)/∂ϵ,
b″(ϵ) = (∂²H_k(x; δ)/∂δ²)|_{δ=δ(ϵ)} · {∂δ(ϵ)/∂ϵ}² + (∂H_k(x; δ)/∂δ)|_{δ=δ(ϵ)} · ∂²δ(ϵ)/∂ϵ²,
b^{(3)}(ϵ) = (∂³H_k(x; δ)/∂δ³)|_{δ=δ(ϵ)} · {∂δ(ϵ)/∂ϵ}³ + 3 (∂²H_k(x; δ)/∂δ²)|_{δ=δ(ϵ)} · (∂δ(ϵ)/∂ϵ)(∂²δ(ϵ)/∂ϵ²) + (∂H_k(x; δ)/∂δ)|_{δ=δ(ϵ)} · ∂³δ(ϵ)/∂ϵ³.
To complete the proof, we only need to show that (∂^i/∂δ^i) H_k(x; δ)|_{δ=δ(ϵ)} and ∂^i δ(ϵ)/∂ϵ^i (i = 1, 2, 3) are bounded as n → ∞ for all ϵ ∈ [0, C]. Note that
H k ( x ; δ ) = e δ / 2 j = 0 ( δ / 2 ) j j ! γ ( j + k / 2 , x / 2 ) Γ ( j + k / 2 ) ,
where Γ(·) is the Gamma function and γ(·,·) is the lower incomplete gamma function γ(s, x) = ∫_0^x t^{s−1} e^{−t} dt, which satisfies γ(s, x) = (s − 1)γ(s − 1, x) − x^{s−1} e^{−x}. Therefore,
∂H_k(x; δ)/∂δ = −(e^{−δ/2}/2) Σ_{j=0}^∞ (δ/2)^j/j! · γ(j + k/2, x/2)/Γ(j + k/2) + (e^{−δ/2}/2) Σ_{j=1}^∞ (δ/2)^{j−1}/(j − 1)! · γ(j + k/2, x/2)/Γ(j + k/2) = (1/2) e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · [−γ(j + k/2, x/2)/Γ(j + k/2) + γ(j + 1 + k/2, x/2)/Γ(j + 1 + k/2)].
Since
γ(j + 1 + k/2, x/2)/Γ(j + 1 + k/2) = [(j + k/2) γ(j + k/2, x/2) − (x/2)^{j+k/2} e^{−x/2}]/Γ(j + 1 + k/2) = γ(j + k/2, x/2)/Γ(j + k/2) − (x/2)^{j+k/2} e^{−x/2}/Γ(j + 1 + k/2),
we have
∂H_k(x; δ)/∂δ = −(1/2) e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^{j+k/2} e^{−x/2}/Γ(j + 1 + k/2),
∂²H_k(x; δ)/∂δ² = (1/4) e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^{j+k/2} e^{−x/2}/Γ(j + 1 + k/2) − (1/4) e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^{j+1+k/2} e^{−x/2}/Γ(j + 2 + k/2) = (1/4) (x/2)^{k/2} e^{−x/2} e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · [(x/2)^j/Γ(j + 1 + k/2) − (x/2)^{j+1}/Γ(j + 2 + k/2)] = (1/4) (x/2)^{k/2} e^{−x/2} e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^j/Γ(j + 1 + k/2) · [1 − (x/2)/(j + 1 + k/2)],
∂³H_k(x; δ)/∂δ³ = −(1/8) (x/2)^{k/2} e^{−x/2} e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^j/Γ(j + 1 + k/2) · [1 − (x/2)/(j + 1 + k/2)] + (1/8) (x/2)^{k/2} e^{−x/2} e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^{j+1}/Γ(j + 2 + k/2) · [1 − (x/2)/(j + 2 + k/2)] = −(1/8) (x/2)^{k/2} e^{−x/2} e^{−δ/2} Σ_{j=0}^∞ (δ/2)^j/j! · (x/2)^j/Γ(j + 1 + k/2) · {[1 − (x/2)/(j + 1 + k/2)] − (x/2)/(j + 1 + k/2) · [1 − (x/2)/(j + 2 + k/2)]}.
From the results of Lemma A3, |δ(ϵ)| is bounded as n → ∞ for all ϵ ∈ [0, C] under both H_0 and H_{1n}, and so are (∂^i/∂δ^i) H_k(x; δ)|_{δ=δ(ϵ)} (i = 1, 2, 3). Now, we consider the derivatives of δ(ϵ):
∂δ(ϵ)/∂ϵ = 2n {A_n ∂T(K_{n,ϵ})/∂ϵ}^T U_n^{−1} {A_n T(K_{n,ϵ}) − g_0},
∂²δ(ϵ)/∂ϵ² = 2n {A_n ∂T(K_{n,ϵ})/∂ϵ}^T U_n^{−1} A_n ∂T(K_{n,ϵ})/∂ϵ + 2n {A_n ∂²T(K_{n,ϵ})/∂ϵ²}^T U_n^{−1} {A_n T(K_{n,ϵ}) − g_0},
∂³δ(ϵ)/∂ϵ³ = 6n {A_n ∂²T(K_{n,ϵ})/∂ϵ²}^T U_n^{−1} A_n ∂T(K_{n,ϵ})/∂ϵ + 2n {A_n ∂³T(K_{n,ϵ})/∂ϵ³}^T U_n^{−1} {A_n T(K_{n,ϵ}) − g_0}.
To complete the proof, we only need to show that √n ∥∂^i T(K_{n,ϵ})/∂ϵ^i∥ (i = 1, 2, 3) are bounded as n → ∞ for all ϵ ∈ [0, C], and that √n ∥A_n T(K_{n,ϵ}) − g_0∥ is bounded under H_0 and H_{1n} as n → ∞ for all ϵ ∈ [0, C]. The result for √n ∥A_n T(K_{n,ϵ}) − g_0∥ is straightforward from Lemma A3.
First, for the first order derivative of T ( K n , ϵ ) ,
√n ∂T(K_{n,ϵ})/∂ϵ = −H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} [E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}].
Since ∥H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}∥ ≤ C, ∥E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}∥ ≤ C E_J{w(X_n)∥X̃_n∥} ≤ C and
∥E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}∥ = ∥E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T β̃_{n,0}) w(X_n) X̃_n}∥ = ∥E_{K_{n,0}}[p_2(Y; X̃_n^T β̃) w(X_n) X̃_n X̃_n^T {T(K_{n,ϵ}) − β̃_{n,0}}]∥ ≤ C p_n^{3/2}/√n,
we conclude that √n ∥∂T(K_{n,ϵ})/∂ϵ∥ is uniformly bounded for all ϵ ∈ [0, C] as n → ∞.
Second, for the second order derivative of T ( K n , ϵ ) ,
√n ∂²/∂ϵ² T((1 − ϵ/√n)K_{n,0} + (ϵ/√n)J) = −{∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ} · [E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}] − H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} · (∂/∂ϵ)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}]
with
∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ = −H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} {∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ} H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1},
∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ = −(1/√n) E_{K_{n,0}}{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T} + (1/√n) E_J{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T} + (1 − ϵ/√n) E_{K_{n,0}}{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T X̃_n^T ∂T(K_{n,ϵ})/∂ϵ} + (ϵ/√n) E_J{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T X̃_n^T ∂T(K_{n,ϵ})/∂ϵ}.
Therefore, ∥∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ∥ ≤ C ∥∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ∥ ≤ C p_n^{3/2}/√n. In addition,
∥(∂/∂ϵ)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}]∥ = ∥E_J{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂T(K_{n,ϵ})/∂ϵ} − E_{K_{n,0}}{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂T(K_{n,ϵ})/∂ϵ}∥ ≤ C p_n/√n.
Therefore, √n ∥∂²/∂ϵ² T((1 − ϵ/√n)K_{n,0} + (ϵ/√n)J)∥ = o(1) for all ϵ ∈ [0, C].
Finally, for the third order derivative of T ( K n , ϵ ) ,
√n ∂³/∂ϵ³ T((1 − ϵ/√n)K_{n,0} + (ϵ/√n)J) = −{∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ²} · [E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}] − 2 {∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ} · (∂/∂ϵ)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}] − H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} · (∂²/∂ϵ²)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}].
Note:
∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ² = −{∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ} {∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ} H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} − H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} {∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ²} H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} − H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1} {∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ} {∂H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ},
where
∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ² = −(2/√n) E_{K_{n,0}}{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T X̃_n^T ∂T(K_{n,ϵ})/∂ϵ} + (2/√n) E_J{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T X̃_n^T ∂T(K_{n,ϵ})/∂ϵ} + (1 − ϵ/√n) E_{K_{n,0}}{p_4(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T (X̃_n^T ∂T(K_{n,ϵ})/∂ϵ)²} + (ϵ/√n) E_J{p_4(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T (X̃_n^T ∂T(K_{n,ϵ})/∂ϵ)²}.
Hence, ∥∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}/∂ϵ²∥ ≤ C p_n²/n, which implies that ∥∂²H_{n,K_{n,ϵ},T(K_{n,ϵ})}^{−1}/∂ϵ²∥ = o(1) for all ϵ ∈ [0, C]. In addition,
(∂²/∂ϵ²)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}] = (∂/∂ϵ)[E_J{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂T(K_{n,ϵ})/∂ϵ} − E_{K_{n,0}}{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂T(K_{n,ϵ})/∂ϵ}] = E_J{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n (X̃_n^T ∂T(K_{n,ϵ})/∂ϵ)²} + E_J{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂²T(K_{n,ϵ})/∂ϵ²} − E_{K_{n,0}}{p_3(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n (X̃_n^T ∂T(K_{n,ϵ})/∂ϵ)²} − E_{K_{n,0}}{p_2(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n X̃_n^T ∂²T(K_{n,ϵ})/∂ϵ²}.
Hence, ∥(∂²/∂ϵ²)[E_J{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n} − E_{K_{n,0}}{p_1(Y; X̃_n^T T(K_{n,ϵ})) w(X_n) X̃_n}]∥ ≤ C p_n/√n. Therefore, √n ∥∂³/∂ϵ³ T((1 − ϵ/√n)K_{n,0} + (ϵ/√n)J)∥ = o(1) for all ϵ ∈ [0, C]. This completes the proof. ☐
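The series identity for ∂H_k(x; δ)/∂δ derived above can be checked numerically. The following sketch (our own verification aid, using scipy's noncentral chi-square CDF) evaluates the series −(1/2) e^{−δ/2} Σ_j (δ/2)^j/j! · (x/2)^{j+k/2} e^{−x/2}/Γ(j + 1 + k/2) in log space and compares it against a central finite difference of H_k(x; δ):

```python
# Numerical check of the series for dH_k(x; delta)/d delta against a
# finite difference of the noncentral chi-square CDF.
import numpy as np
from scipy.stats import ncx2
from scipy.special import gammaln

def dH_ddelta(x, k, delta, terms=200):
    # series computed in log space via gammaln to avoid overflow in j!
    j = np.arange(terms)
    log_terms = (j * np.log(delta / 2) - gammaln(j + 1)
                 + (j + k / 2) * np.log(x / 2) - x / 2 - gammaln(j + 1 + k / 2))
    return -0.5 * np.exp(-delta / 2) * np.exp(log_terms).sum()

x, k, delta, h = 5.0, 3, 2.0, 1e-5
fd = (ncx2.cdf(x, k, delta + h) - ncx2.cdf(x, k, delta - h)) / (2 * h)
```

The derivative is negative, reflecting that a larger noncentrality shifts χ_k²(δ) mass to the right and lowers the CDF at a fixed x; this is the quantity whose boundedness Lemma A9 exploits.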
Proof of Theorem 1.
We follow the idea of the proof in [10]. Lemma A7 implies that the Wald-type test statistic $W_{n}$ is asymptotically noncentral $\chi_{k}^{2}$ with noncentrality parameter $\delta(\epsilon)=n\|U_{n}^{-1/2}\{A_{n}T(K_{n,\epsilon})-g_{0}\}\|^{2}$. Therefore, $\alpha(K_{n,\epsilon})=P(W_{n}>\eta_{1-\alpha_{0}}\mid H_{0})=1-H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))+h(n,\epsilon)$, where $h(n,\epsilon)=\alpha(K_{n,\epsilon})-1+H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))\to 0$ as $n\to\infty$ for any fixed $\epsilon$. Let $b(\epsilon)=-H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))$. Then, for $\epsilon$ close to 0, we have
$$
\alpha(K_{n,\epsilon})-\alpha_{0}=b(\epsilon)-b(0)+h(n,\epsilon)-h(n,0)
=\epsilon\,b'(0)+\tfrac{1}{2}\epsilon^{2}b''(0)+\tfrac{1}{6}\epsilon^{3}b^{(3)}(\epsilon^{*})+h(n,\epsilon)-h(n,0),
$$
where $0<\epsilon^{*}<\epsilon$. Note that under $H_{0}$,
$$
b'(0)=\mu_{k}\,\frac{\partial\delta(\epsilon)}{\partial\epsilon}\Big|_{\epsilon=0}
=2\mu_{k}\,n\Big\{A_{n}\frac{\partial T(K_{n,\epsilon})}{\partial\epsilon}\Big|_{\epsilon=0}\Big\}^{T}U_{n}^{-1}\{A_{n}\widetilde{\beta}_{n,0}-g_{0}\}=0.
$$
From Lemma A8, under H 0
$$
\frac{\partial T(K_{n,\epsilon})}{\partial\epsilon}\Big|_{\epsilon=0}=\frac{1}{\sqrt{n}}\,E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\}.
$$
Thus,
$$
b''(0)=\mu_{k}\,\frac{\partial^{2}\delta(\epsilon)}{\partial\epsilon^{2}}\Big|_{\epsilon=0}
=2\mu_{k}\,\big\|U_{n}^{-1/2}A_{n}E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\}\big\|^{2}.
$$
Since, from Lemma A8, $\mathrm{IF}(z_{n};T,K_{n,0})=H_{n}^{-1}\psi_{\mathrm{RBD}}(z_{n};\widetilde{\beta}_{n,0})$ is uniformly bounded, we have
$$
D=\limsup_{n\to\infty}\big\|U_{n}^{-1/2}A_{n}E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\}\big\|^{2}<\infty.
$$
From Equation (A17),
$$
\limsup_{n\to\infty}\alpha(K_{n,\epsilon})=\alpha_{0}+\epsilon^{2}\mu_{k}D+o(\epsilon^{2}),
$$
since $\sup_{\epsilon\in[0,C]}\limsup_{n\to\infty}|b^{(3)}(\epsilon)|\le C$ by Lemma A9. This completes the proof. ☐
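The level expansion just derived, $\limsup_{n\to\infty}\alpha(K_{n,\epsilon})=\alpha_{0}+\epsilon^{2}\mu_{k}D+o(\epsilon^{2})$, can be illustrated numerically through the noncentral $\chi^{2}$ distribution function $H_{k}(\cdot;\delta)$. A minimal sketch, where the rank $k$ and the limit $D$ are hypothetical stand-in values:

```python
from scipy.stats import chi2, ncx2

alpha0, k = 0.05, 3                       # nominal level and rank k (hypothetical)
eta = chi2.ppf(1 - alpha0, df=k)          # critical value eta_{1-alpha0}

# mu_k = -d H_k(eta; delta)/d delta at delta = 0, via a one-sided difference
h = 1e-4
mu_k = (chi2.cdf(eta, df=k) - ncx2.cdf(eta, df=k, nc=h)) / h

D = 2.0                                   # hypothetical D = ||U^{-1/2} A E_J{IF}||^2
eps = 0.1
exact = ncx2.sf(eta, df=k, nc=eps**2 * D)   # 1 - H_k(eta; delta(eps)), delta(eps) ~ eps^2 D
approx = alpha0 + eps**2 * mu_k * D         # second-order expansion of the level
assert exact > alpha0                       # contamination inflates the level
assert abs(exact - approx) < 1e-3           # expansion is accurate for small eps
```

For small $\epsilon$, the quadratic approximation tracks the exact noncentral tail probability closely, which is the content of the robustness-of-validity statement.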
Proof of Corollary 1.
For Part (i), following the proof of Theorem 1, for any fixed $z$,
$$
\lim_{n\to\infty}\alpha(K_{n,\epsilon})=\alpha_{0}+\epsilon^{2}\mu_{k}\big\|U^{-1/2}A\,\mathrm{IF}(z;T,K_{0})\big\|^{2}+d(z,\epsilon),
$$
where $d(z,\epsilon)=o(\epsilon^{2})$. From the assumptions that $\sup_{x\in\mathbb{R}^{p}}\|w(x)x\|\le C$ and $\sup_{\mu\in\mathbb{R}}|q(\mu)V(\mu)/F(\mu)|\le C$, we know that $D<\infty$. Following the proof of Lemma A9, $\sup_{z}|d(z,\epsilon)|=o(\epsilon^{2})$. This finishes the proof of Part (i).
Part (ii) follows directly by applying Theorem 1 with $J=\Delta_{z_{n}}$. ☐
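Corollary 1 says that with a uniformly bounded influence function, the level inflation $\epsilon^{2}\mu_{k}\|U^{-1/2}A\,\mathrm{IF}(z;T,K_{0})\|^{2}$ stays bounded over the contaminating point $z$, whereas an unbounded score lets a single outlier drive the quadratic term arbitrarily large. A scalar illustration with hypothetical influence functions (a Huber-type clipped score versus an unclipped identity score; the expansion is only meaningful for small $\epsilon^{2}\|\mathrm{IF}\|^{2}$, so the unbounded values merely show divergence of the leading term):

```python
import numpy as np
from scipy.stats import chi2, ncx2

alpha0, k = 0.05, 1
eta = chi2.ppf(1 - alpha0, df=k)
h = 1e-4
mu_k = (chi2.cdf(eta, df=k) - ncx2.cdf(eta, df=k, nc=h)) / h   # mu_k > 0

# Hypothetical scalar influence functions: bounded (Huber-type) vs. unbounded
z = np.linspace(-50.0, 50.0, 2001)
if_bounded = np.clip(z, -1.345, 1.345)
if_unbounded = z

eps = 0.1
level_bounded = alpha0 + eps**2 * mu_k * if_bounded**2      # stays near alpha0
level_unbounded = alpha0 + eps**2 * mu_k * if_unbounded**2  # grows without bound in z

assert level_bounded.max() <= alpha0 + eps**2 * mu_k * 1.345**2 + 1e-12
assert level_unbounded.max() > 10 * level_bounded.max()
```

The bounded-influence curve is uniformly close to $\alpha_{0}$ over all $z$, which is exactly the robustness of validity claimed in Part (i).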
Proof of Theorem 2.
Lemma A7 implies that
$$
\sqrt{n}\,\Big[\{U(K_{n})\}^{-1/2}\{A_{n}T(K_{n})-g_{0}\}-\{U(K_{n})\}^{-1/2}(A_{n}\widetilde{\beta}_{n,0}-g_{0})-U_{n}^{-1/2}A_{n}\{T(K_{n,\epsilon})-\widetilde{\beta}_{n,0}\}\Big]\stackrel{\mathcal{L}}{\longrightarrow}N(0,I_{k}).
$$
From Lemmas A5 and A6,
$$
\sqrt{n}\,\Big[\{U(K_{n})\}^{-1/2}\{A_{n}T(K_{n})-g_{0}\}-U_{n}^{-1/2}\{A_{n}T(K_{n,\epsilon})-g_{0}\}\Big]\stackrel{\mathcal{L}}{\longrightarrow}N(0,I_{k}).
$$
Then, $W_{n}$ is asymptotically $\chi_{k}^{2}(\delta(\epsilon))$ with $\delta(\epsilon)=n\|U_{n}^{-1/2}\{A_{n}T(K_{n,\epsilon})-g_{0}\}\|^{2}$ under $H_{1n}$. Therefore, $\beta(K_{n,\epsilon})=P(W_{n}>\eta_{1-\alpha_{0}}\mid H_{1n})=1-H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))+h(n,\epsilon)$, where $h(n,\epsilon)=\beta(K_{n,\epsilon})-1+H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))\to 0$ as $n\to\infty$ for any fixed $\epsilon$. Let $b(\epsilon)=-H_{k}(\eta_{1-\alpha_{0}};\delta(\epsilon))$. Then, for $\epsilon$ close to 0, we have
$$
\beta(K_{n,\epsilon})-\beta_{0}=b(\epsilon)-b(0)+h(n,\epsilon)-h(n,0)
=\epsilon\,b'(0)+\tfrac{1}{2}\epsilon^{2}b''(\epsilon^{*})+h(n,\epsilon)-h(n,0),
$$
where $0<\epsilon^{*}<\epsilon$. Note that under $H_{1n}$, $\delta(0)=n\|U_{n}^{-1/2}(A_{n}\widetilde{\beta}_{n,0}-g_{0})\|^{2}=c^{T}U_{n}^{-1}c$. Then,
$$
b'(0)=-\frac{\partial H_{k}(\eta_{1-\alpha_{0}};\delta)}{\partial\delta}\Big|_{\delta=\delta(0)}\,\frac{\partial\delta(\epsilon)}{\partial\epsilon}\Big|_{\epsilon=0}
=2\nu_{k}\,n\Big\{A_{n}\frac{\partial T(K_{n,\epsilon})}{\partial\epsilon}\Big|_{\epsilon=0}\Big\}^{T}U_{n}^{-1}\{A_{n}\widetilde{\beta}_{n,0}-g_{0}\}
=2\nu_{k}\,\sqrt{n}\,\Big\{A_{n}\frac{\partial T(K_{n,\epsilon})}{\partial\epsilon}\Big|_{\epsilon=0}\Big\}^{T}U_{n}^{-1}c.
$$
From Lemma A8,
$$
\frac{\partial T(K_{n,\epsilon})}{\partial\epsilon}\Big|_{\epsilon=0}=\frac{1}{\sqrt{n}}\,E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\},
$$
and hence,
$$
b'(0)=2\nu_{k}\,c^{T}U_{n}^{-1}A_{n}E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\}.
$$
Since $\sup_{\epsilon\in[0,C]}\limsup_{n\to\infty}|b''(\epsilon)|\le C$ under $H_{1n}$ by Lemma A9, we have $\liminf_{n\to\infty}\tfrac{1}{2}\epsilon^{2}b''(\epsilon^{*})=o(\epsilon)$ as $\epsilon\to 0$.
Since, from Lemma A8, $\mathrm{IF}(z_{n};T,K_{n,0})=H_{n}^{-1}\psi_{\mathrm{RBD}}(z_{n};\widetilde{\beta}_{n,0})$ is uniformly bounded,
$$
|B|=\Big|\liminf_{n\to\infty}2\,c^{T}U_{n}^{-1}A_{n}E_{J}\{\mathrm{IF}(Z_{n};T,K_{n,0})\}\Big|<\infty.
$$
From Equation (A18), we complete the proof. ☐
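Under the contiguous alternatives $H_{1n}$, the asymptotic power is governed by the noncentrality $\delta(0)=c^{T}U_{n}^{-1}c$ through $1-H_{k}(\eta_{1-\alpha_{0}};\delta(0))$. A minimal numerical sketch, where the drift vector $c$ and the matrix standing in for $U_{n}$ are hypothetical:

```python
import numpy as np
from scipy.stats import chi2, ncx2

alpha0, k = 0.05, 2
eta = chi2.ppf(1 - alpha0, df=k)

# Hypothetical ingredients of the contiguous alternative H_1n:
c = np.array([1.0, -0.5])                  # drift vector c in H_1n
U = np.array([[2.0, 0.3], [0.3, 1.5]])     # positive-definite stand-in for U_n
delta0 = float(c @ np.linalg.solve(U, c))  # delta(0) = c^T U_n^{-1} c

power = ncx2.sf(eta, df=k, nc=delta0)      # 1 - H_k(eta; delta(0))
assert alpha0 < power < 1.0
```

The power exceeds the nominal level $\alpha_{0}$ for any nonzero drift $c$, and small contamination perturbs it only through the $O(\epsilon)$ term $b'(0)$ derived above.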
Proof of Corollary 2.
The proof is similar to that for Corollary 1, using the results in Theorem 2. ☐

Appendix B. List of Notations and Symbols

  • $A_n$: $k\times(p_n+1)$ matrix in the hypotheses of Equations (8) and (14)
  • $c$: $k$-dimensional vector in $H_{1n}$ in Equation (14)
  • $F(\cdot)$: link function
  • $G$: bias-correction term in the "robust-$\mathrm{BD}$"
  • $G$: limit of $A_nA_n^T$, i.e., $A_nA_n^T\to G$ as $n\to\infty$
  • $H_n$: $H_n=E_{K_{n,0}}\{p_2(Y;\widetilde{X}_n^T\widetilde{\beta}_{n,0})w(X_n)\widetilde{X}_n\widetilde{X}_n^T\}$
  • $\mathrm{IF}(\cdot;\cdot,\cdot)$: influence function
  • $J$: an arbitrary distribution in the contamination of Equation (10)
  • $K_{n,0}$: true parametric distribution of $Z_n$
  • $K_{n,\epsilon}$: $K_{n,\epsilon}=(1-\epsilon/\sqrt{n})K_{n,0}+(\epsilon/\sqrt{n})J$, the $\epsilon$-contamination in Equation (10)
  • $K_n$: empirical distribution of $\{Z_{ni}\}_{i=1}^{n}$
  • $\mathrm{K}(\cdot)$: expectation of the robust-$\mathrm{BD}$ in Equation (11)
  • $m(\cdot)$: conditional mean of $Y$ given $X_n$ in Equation (1)
  • $n$: sample size
  • $p_n$: dimension of $\beta$
  • $p_i(\cdot;\cdot)$: $i$th-order derivative of the robust-$\mathrm{BD}$
  • $q(\cdot)$: generating $q$-function of the $\mathrm{BD}$
  • $T(\cdot)$: vector-valued functional of the estimator in Equation (12)
  • $U_n$: $U_n=A_nH_n^{-1}\Omega_nH_n^{-1}A_n^T$
  • $V(\cdot)$: conditional variance of $Y$ given $X_n$ in Equation (2)
  • $W_n$: Wald-type test statistic in Equation (9)
  • $w(\cdot)$: weight function
  • $X_n$: explanatory variables
  • $Y$: response variable
  • $Z_n=(X_n^T,Y)^T$
  • $\alpha(\cdot)$: level of the test
  • $\beta(\cdot)$: power of the test
  • $\widetilde{\beta}_{n,0}$: true regression parameter
  • $\Delta_{z_n}$: probability measure putting mass 1 at the point $z_n$
  • $\epsilon$: amount of contamination in Equation (10), a positive constant
  • $\psi_{\mathrm{RBD}}(\cdot;\cdot)$: score vector in Equation (7)
  • $\Omega_n$: $\Omega_n=E_{K_{n,0}}\{p_1^2(Y;\widetilde{X}_n^T\widetilde{\beta}_{n,0})w^2(X_n)\widetilde{X}_n\widetilde{X}_n^T\}$
  • $\rho_q(\cdot,\cdot)$: robust-$\mathrm{BD}$ in Equation (4)
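The quantities $H_n$, $\Omega_n$, $U_n$, and $W_n$ above fit together as a sandwich-type Wald construction. The sketch below assembles them for a deliberately simple hypothetical case (Gaussian response, identity link, unit weights $w\equiv 1$, and the classical quadratic loss, so that $p_1(y;\theta)=-(y-\theta)$ and $p_2\equiv 1$); it is an illustration of the plug-in construction, not the robust-$\mathrm{BD}$ estimator of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4

# Design X-tilde = (1, X^T)^T and a Gaussian response; last two slopes are zero,
# so the null hypothesis tested below is true
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta0 = np.array([0.5, 1.0, -1.0, 0.0, 0.0])
y = X @ beta0 + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit

resid = y - X @ beta_hat
Omega_hat = (X * (resid**2)[:, None]).T @ X / n    # Omega_n: outer-product score moments
H_hat = X.T @ X / n                                # H_n: p_2-weighted design moments

A = np.zeros((2, p + 1)); A[0, 3] = 1.0; A[1, 4] = 1.0   # H_0: beta_3 = beta_4 = 0
g0 = np.zeros(2)
Hinv = np.linalg.inv(H_hat)
U_hat = A @ Hinv @ Omega_hat @ Hinv @ A.T          # U_n = A H^{-1} Omega H^{-1} A^T
r = A @ beta_hat - g0
W = n * r @ np.linalg.solve(U_hat, r)              # Wald-type statistic W_n
assert W >= 0.0
```

Under this true $H_0$, $W$ is approximately $\chi_2^2$-distributed, matching the null limit used throughout the proofs.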

References

  1. Zhang, C.M.; Guo, X.; Cheng, C.; Zhang, Z.J. Robust-BD estimation and inference for varying-dimensional general linear models. Stat. Sin. 2014, 24, 653–673.
  2. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman & Hall: London, UK, 1989.
  3. Morgenthaler, S. Least-absolute-deviations fits for generalized linear models. Biometrika 1992, 79, 747–754.
  4. Ruckstuhl, A.F.; Welsh, A.H. Robust fitting of the binomial model. Ann. Stat. 2001, 29, 1117–1136.
  5. Noh, M.; Lee, Y. Robust modeling for inference from generalized linear model classes. J. Am. Stat. Assoc. 2007, 102, 1059–1072.
  6. Künsch, H.R.; Stefanski, L.A.; Carroll, R.J. Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Am. Stat. Assoc. 1989, 84, 460–466.
  7. Stefanski, L.A.; Carroll, R.J.; Ruppert, D. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 1986, 73, 413–424.
  8. Bianco, A.M.; Yohai, V.J. Robust estimation in the logistic regression model. In Robust Statistics, Data Analysis, and Computer Intensive Methods; Springer: New York, NY, USA, 1996; pp. 17–34.
  9. Croux, C.; Haesbroeck, G. Implementing the Bianco and Yohai estimator for logistic regression. Comput. Stat. Data Anal. 2003, 44, 273–295.
  10. Heritier, S.; Ronchetti, E. Robust bounded-influence tests in general parametric models. J. Am. Stat. Assoc. 1994, 89, 897–904.
  11. Cantoni, E.; Ronchetti, E. Robust inference for generalized linear models. J. Am. Stat. Assoc. 2001, 96, 1022–1030.
  12. Bianco, A.M.; Martínez, E. Robust testing in the logistic regression model. Comput. Stat. Data Anal. 2009, 53, 4095–4105.
  13. Ronchetti, E.; Trojani, F. Robust inference with GMM estimators. J. Econom. 2001, 101, 37–69.
  14. Basu, A.; Mandal, N.; Martin, N.; Pardo, L. Robust tests for the equality of two normal means based on the density power divergence. Metrika 2015, 78, 611–634.
  15. Lee, S.; Na, O. Test for parameter change based on the estimator minimizing density-based divergence measures. Ann. Inst. Stat. Math. 2005, 57, 553–573.
  16. Kang, J.; Song, J. Robust parameter change test for Poisson autoregressive models. Stat. Probab. Lett. 2015, 104, 14–21.
  17. Basu, A.; Ghosh, A.; Martin, N.; Pardo, L. Robust Wald-type tests for non-homogeneous observations based on minimum density power divergence estimator. arXiv 2017, arXiv:1707.02333.
  18. Ghosh, A.; Basu, A.; Pardo, L. Robust Wald-type tests under random censoring. arXiv 2017, arXiv:1708.09695.
  19. Brègman, L.M. A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. U.S.S.R. Comput. Math. Math. Phys. 1967, 7, 620–631.
  20. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: Berlin, Germany, 2001.
  21. Zhang, C.M.; Jiang, Y.; Shang, Z. New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation. Can. J. Stat. 2009, 37, 119–139.
  22. Huber, P. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73–101.
  23. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley: New York, NY, USA, 1986.
  24. Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393.
  25. Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961.
  26. Van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
  27. Clarke, B.R. Uniqueness and Fréchet differentiability of functional solutions to maximum likelihood type equations. Ann. Stat. 1983, 11, 1196–1205.
Figure 1. Observed level of W n versus ϵ for overdispersed Poisson responses. The dotted line indicates the 5% significance level.
Figure 2. Observed power of W n versus ϵ for overdispersed Poisson responses. The statistics in the left panel correspond to the non-robust method and those in the right panel are for the robust method. The asterisk line indicates the 5% significance level.
Figure 3. Observed level of W n versus ϵ for Bernoulli responses. The statistics in (a) use deviance loss and those in (b) use exponential loss. The dotted line indicates the 5% significance level.
Figure 4. Observed power of W n versus ϵ for Bernoulli responses. The top panels correspond to deviance loss while the bottom panels are for exponential loss. The statistics in the left panels are calculated using the non-robust method and those in the right panels are from the robust method. The asterisk line indicates the 5% significance level.
