Overview of High-Dimensional Measurement Error Regression Models

Luo, Jingxuan; Yue, Lili; Li, Gaorong

doi:10.3390/math11143202

Open AccessReview

Overview of High-Dimensional Measurement Error Regression Models

by

Jingxuan Luo

¹,

Lili Yue

² and

Gaorong Li

^1,*

¹

School of Statistics, Beijing Normal University, Beijing 100875, China

²

School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(14), 3202; https://doi.org/10.3390/math11143202

Submission received: 14 June 2023 / Revised: 10 July 2023 / Accepted: 19 July 2023 / Published: 21 July 2023

(This article belongs to the Special Issue Computational Statistics and Data Analysis, 2nd Edition)

Download Versions Notes

Abstract

:

High-dimensional measurement error data are becoming more prevalent across various fields. Research on measurement error regression models has gained momentum due to the risk of drawing inaccurate conclusions if measurement errors are ignored. When the dimension p is larger than the sample size n, it is challenging to develop statistical inference methods for high-dimensional measurement error regression models due to the existence of bias, nonconvexity of the objective function, high computational cost and many other difficulties. Over the past few years, some works have overcome the aforementioned difficulties and proposed several novel statistical inference methods. This paper mainly reviews the current development on estimation, hypothesis testing and variable screening methods for high-dimensional measurement error regression models and shows the theoretical results of these methods with some directions worthy of exploring in future research.

Keywords:

convex optimization; estimation; high-dimensional data; hypothesis testing; measurement error; variable selection

MSC:

62J05; 62H15

1. Introduction

Measurement error data inevitably exist in applications, causing significant concern in various fields including biology, medicine, epidemiology, economics, finance and remote sensing. So far, there have been a wealth of research achievements on classical low-dimensional measurement error regression models under various assumptions. Numerous studies focus on parameter estimation for low-dimensional measurement error regression models, with the primary techniques listed below: (1) Corrected regression estimation methods [1]; (2) Simulation–Extrapolation (SIMEX) estimation methods [2,3]; (3) Deconvolution methods [4]; (4) Corrected empirical likelihood methods [5,6]. For more detailed discussions on other estimation and hypothesis testing methods for classical low-dimensional measurement error models, please refer to the literature [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29], as well as the monographs [30,31,32,33,34,35].

As one of the most popular research fields in statistics, high-dimensional regression has been widely used in various fields including genetics, economics, medical imaging, meteorology and sensor networks. Over the past two decades, various high-dimensional regression methods have been widely proposed such as Lasso [36], smoothly clipped absolute deviation (SCAD) [37], Elastic Net [38], Adaptive Lasso [39], Dantzig Selector [40], smooth integration of counting and absolute deviation (SICA) [41], and minimax concave penalty (MCP) [42], among many others. These methods have been widely applied to estimate regression coefficients while also achieving the goal of variable selection by adding penalties to objective functions; please refer to the literature review [43,44,45,46] as well as the monographs [47,48,49].

For the variable screening methods of ultrahigh-dimensional regression models where dimension p and sample size n satisfy

log p = O (n^{κ}), κ > 0

, Fan and Lv [50] proposed the sure independence screening (SIS) method, which is a pioneering method in this field. For the estimation and variable selection of ultrahigh-dimensional regression models, it is suggested to apply the SIS method for variable screening first. Then, based on the variables screened in the first step, we can utilize regularization methods with penalties to estimate the regression coefficients and identify the significant variables simultaneously. Due to the operability and effectiveness of the SIS method in applications, numerous works have extended the method; see [51,52,53,54,55,56,57,58,59,60].

However, most of the aforementioned theories and applications for high-dimensional regression models focused on clean data. In the era of big data, researchers frequently collect high-dimensional data with measurement errors. Typical instances include gene expression data [61] and sensor network data [62]. The imprecise measurements are the result of poorly managed and defective data collection processes as well as the imprecise measuring instruments. It is well known that ignoring the influence of measurement errors will result in biased estimators and erroneous conclusions. Therefore, developing statistical inference methods for high-dimensional measurement error regression models has drawn a lot of interest.

Based on the types of measurement errors, research on high-dimensional measurement error regression models can be divided into the following three categories: covariates containing measurement errors; response variables containing measurement errors; both covariates and response variables containing measurement errors. In this paper, we mainly focus on the category where covariates contain measurement errors. When the dimension p is larger than the sample size n, parameter estimation can be challenging due to the nonconvexity of the penalized objective function caused by correction for the bias. This further makes it impossible to obtain the optimal solution of optimization problem. We utilize the following linear regression model to illustrate this problem

y = X β + ε,

(1)

where

y = {(y_{1}, \dots, y_{n})}^{T} \in R^{n}

is the

n \times 1

response vector,

X = {(X_{1}, \dots, X_{n})}^{T} \in R^{n \times p}

is the

n \times p

fixed design matrix with

X_{i} = {(x_{i 1}, \dots, x_{i p})}^{T}

,

β = {(β_{1}, \dots, β_{p})}^{T} \in R^{p}

is the sparse regression coefficient vector with only s nonzero components, and assume that model error vector

ε = {(ε_{1}, \dots, ε_{n})}^{T} \in R^{n}

is independent of

X

. In order to obtain a sparse estimator of the true regression coefficient vector

β_{0} = {(β_{01}, \dots, β_{0 p})}^{T} \in R^{p}

, we can minimize the following penalized least-square objective function

\frac{1}{2 n} {∥ y - X β ∥}_{2}^{2} + {∥ p_{λ} (β) ∥}_{1},

(2)

which is equivalent to minimizing

\frac{1}{2} β^{T} Σ β - ρ^{T} β + {∥ p_{λ} (β) ∥}_{1},

(3)

where

Σ = n^{- 1} X^{T} X

,

ρ = n^{- 1} X^{T} y

,

p_{λ} (\cdot)

is a penalty function with regularization parameter

λ \geq 0

. If the covariates matrix

X

can be precisely measured, the penalized objective functions (2) and (3) are convex. Thus, we can obtain a sparse estimator of

β_{0}

by minimizing the penalized objective function (2) or (3).

However, it is common that the covariates matrix

X

cannot be accurately observed in practice. Let

W = {(W_{1}, \dots, W_{n})}^{T} = {(w_{i j})}_{n \times p}

be the observed covariates matrix with additive measurement errors satisfying

W = X + U

, where

U = {(U_{1}, \dots, U_{n})}^{T}

is the matrix of measurement errors,

U_{i} = {(u_{i 1}, \dots, u_{i p})}^{T}

follows a sub-Gaussian distribution with mean zero and covariance matrix

Σ_{u}

, and it is assumed to be independent of

(X, y)

. To reduce the influence of measurement errors, Loh and Wainwright [63] proposed replacing

Σ

and

ρ

in the penalized objective function (3) by their consistent estimators

\hat{Σ} = n^{- 1} W^{T} W - Σ_{u}

and

\tilde{ρ} = n^{- 1} W^{T} y

, respectively. Then, we can obtain the sparse estimator of

β_{0}

by minimizing the following penalized objective function

\frac{1}{2} β^{T} \hat{Σ} β - {\tilde{ρ}}^{T} β + {∥ p_{λ} (β) ∥}_{1} .

(4)

Note that when the dimension p is fixed or smaller than the sample size n, it can be guaranteed that

\hat{Σ}

is a positive definite or semi positive-definite matrix. It further ensures that the penalized objective function (4) remains convex. Thus, the global optimal solution of

β

can be obtained by minimizing the penalized objective function (4).

However, for high-dimensional or ultrahigh-dimensional regression models, i.e.,

p > n

or

p ≫ n

, there are two key problems: (i) the penalized objective function (4) is no longer convex and unbounded from below because the corrected estimator

\hat{Σ}

of

Σ

is no longer a semi-positive definite matrix. This further makes it impossible to obtain the estimator of

β_{0}

by minimizing the penalty objective function (4). (ii) In order to construct an objective function similar to that of standard Lasso and solve the corresponding optimization problem using R package “glmnet” or “lars”, it is necessary to decompose

\hat{Σ}

by Cholesky’s decomposition method and obtain the substitution of response vector and covariates matrix. However, this process results in an error accumulation and makes it challenging to guarantee valid theoretical results; please see the detailed discussions in [64,65].

For problem (i), Loh and Wainwright [63] changed the unconstrained optimization problem into a constrained optimization problem by adding restrictions to

β

. They suggested applying the projected gradient descent algorithm to solve the restricted optimization problem and acquire the global optimal solution of true regression coefficient vector

β_{0}

. Nevertheless, the penalized objective function of the optimization problem is still nonconvex. To address this issue, Datta and Zou [64] suggested substituting

\hat{Σ}

by its semi-positive definite projection matrix

\tilde{Σ}

, and they proposed convex conditioned Lasso (CoCoLasso). Furthermore, Zheng et al. [65] introduced a balanced estimation that prevented overfitting while maintaining the estimation accuracy by combining

l_{1}

and concave penalty. Zhang et al. [66] further proposed an estimation method based on

L_{0}

regularization. Tao et al. [67] constructed a modified least-squares loss function using a semi-positive definite projection matrix for the estimated covariance matrix and proposed a calibrated zero-norm regularized least squares (CaZnRLS) estimation of regression coefficients. Rosenbaum and Tsybakov [68,69] proposed a matrix uncertainty (MU) selector and its improved version compensated MU selector for high-dimensional linear models with additive measurement errors in covariates. Sørensen et al. [70] extended the MU selector to generalized linear models and developed the generalized matrix uncertainty (GMU) selector. Sørensen et al. [71] showed the theoretical results of relevant variable selection methods. Based on the MU selector, Belloni et al. [72] introduced an estimator that can achieve the minimax efficiency bound. They proved that the corresponding optimization problem can be converted into a second-order cone programming problem, which can be solved in polynomial time. Romeo and Thoresen [73] evaluated the performance of the MU selector in [68], nonconvex Lasso in [63], and CoCoLasso in [64] using simulation studies. Brown et al. [74] proposed a path-following iterative algorithm called Measurement Error Boosting (MEBoost), which is a computationally effective method for variable selection in high-dimensional measurement error regression models. Nghiem and Potgieter [75] introduced a new estimation method called simulation–selection–extrapolation (SIMSELEX), which used Lasso in the simulation step and group Lasso in the selection step. Li and Wu [76] established minimax convergence rates for the estimation of regression coefficients under a more general situation. Bai et al. [77] proposed a variable selection method for ultrahigh-dimensional linear quantile regression models with measurement errors. Jiang and Ma [78] drew on the idea of nonconvex Lasso in [63] and proposed an estimator of the regression coefficients for high-dimensional Poisson models with measurement errors. Byrd and McGee [79] developed an iterative estimation method for high-dimensional generalized linear models with additive measurement errors based on the imputation-regularized optimization (IRO) algorithm in [80]. However, the error accumulation issue mentioned in problem (ii) has not been addressed in the literature.

The aforementioned works place more emphasis on estimation and variable selection problems rather than hypothesis testing. For high-dimensional regression models with clean data, research on hypothesis testing problems has made significant progress under various settings in [81,82,83,84,85,86,87,88]. For high-dimensional measurement error models, the hypothesis testing methods are equally crucial. However, the bias and instability caused by measurement errors make hypothesis testing extremely difficult. Recently, some progress has been achieved in statistical inference methods. Based on a multiplier bootstrap, Belloni [89] constructed simultaneous confidence intervals for the target parameters in high-dimensional linear measurement error models. Focused on the case where a fixed number of covariates contain measurement errors, Li et al. [90] proposed a corrected decorrelated score test for parameters corresponding to the error-prone covariates and created asymptotic confidence intervals for them. Huang et al. [91] proposed a new variable selection method based on debiased CoCoLasso and proved that it can achieve false discovery rate (FDR) control. Jiang et al. [92] developed Wald and score tests for high-dimensional Poisson measurement error models.

Compared to the above estimation and hypothesis testing methods, there are relatively few screening techniques for ultrahigh-dimensional measurement error models. Nghiem et al. [93] introduced two screening methods named corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc) for ultrahigh-dimensional linear measurement error models.

This paper gives an overview of the estimation and hypothesis testing methods for high-dimensional measurement error regression models as well as the variable screening methods for ultrahigh-dimensional measurement error models. The rest of this paper is organized as follows. In Section 2, we review some estimation methods for linear models. We survey the estimation methods for generalized linear models in Section 3. Section 4 presents the recent advances in hypothesis testing methods for high-dimensional measurement error models. Section 5 introduces the variable screening techniques for ultrahigh-dimensional linear measurement error models. We conclude the paper with some discussions in Section 6.

Notation 1.

Let

S^{p}

be the set of all

p \times p

real symmetric matrices and

S_{+}^{p}

be the subset of

S^{p}

containing all positive semi-definite matrix in

S^{p}

. We use

| A |

to denote the cardinality of set

A

. Let

S = {j : β_{0 j} \neq 0, j = 1, \dots, p}

be the index set of nonzero parameters. For a vector

a = (a_{1}, \dots, a_{m}) \in R^{m}

, let

{∥ a ∥}_{q} = (\sum_{ℓ = 1}^{m} | a_{ℓ} {|^{q})}^{1 / q}, 1 \leq q < \infty

denote its

l_{q}

norm, and write

{∥ a ∥}_{\infty} = {max}_{1 \leq ℓ \leq m} | a_{ℓ} |

. Denote by

a_{A} \in R^{| A |}

the subvector of

a

with index set

A \subset {1, \dots, m}

. Denote by

e

the vector of all ones. For a matrix

B = (b_{i j})

, let

{∥ B ∥}_{1} = {max}_{j} \sum_{i} |b_{i j}|, {∥ B ∥}_{max} = {max}_{i, j} |b_{i j}|

and

{∥ B ∥}_{\infty} = {max}_{i} \sum_{j} |b_{i j}| .

For constants a and b, define

a \lor b = max {a, b}

. We use c and C to denote positive constants that may vary throughout the paper. Finally, let

\overset{d}{\to}

denote convergence in distribution.

2. Estimation Methods for Linear Models

This section mainly focuses on the linear model (1) with high-dimensional settings where the dimension p is larger than the sample size n. When the data can be observed precisely, we can estimate the true regression coefficient vector

β_{0}

by minimizing the penalized objective function (2) or (3). However, we frequently come across cases where the measured covariates contain measurement errors. There are various types of measurement error data, and we primarily focus on the two categories below.

(1) Covariates with additive errors. The observed error-prone covariate

W_{i} = X_{i} + U_{i}

, where the measurement error

U_{i}

is independent of

X_{i}

and independently generated from a distribution with mean zero and known covariance matrix

Σ_{u}

.

(2) Covariates with multiplicative errors. The observed error-prone covariates

W_{i} = X_{i} ⊙ M_{i}

, where ⊙ denotes the Hadamard product, and the measurement error

M_{i}

is independent of

X_{i}

and follows from a distribution with mean

μ_{M}

and known covariance matrix

Σ_{M}

.

Our main goal is to obtain the sparse estimator

\hat{β}

of true regression coefficient vector

β_{0}

in the presence of measurement errors. As we introduced in Section 1, we will run into the issue of the penalized objective function being nonconvex and unbounded from below after correcting the bias caused by measurement errors. This prevents us solving the optimization problem. Several works focused on this issue and proposed some estimation methods.

2.1. Nonconvex Lasso

In order to resolve the issue of the objective function being unbounded from below and unsolvable in the presence of measurement errors, Loh and Wainwright [63] added restrictions to regression coefficients

β

and adopted an

l_{1}

penalty. Then, the estimator of

β_{0}

can be obtained by the following

l_{1}

-constrained quadratic program

{\hat{β}}_{NCL} \in arg min_{{∥ β ∥}_{1} \leq c_{0} \sqrt{s}} \{\frac{1}{2} β^{T} \hat{Σ} β - {\tilde{ρ}}^{T} β + λ {∥ β ∥}_{1}\} = : arg min_{{∥ β ∥}_{1} \leq c_{0} \sqrt{s}} \{L (β) + {λ ∥ β ∥}_{1}\},

(5)

where

c_{0} > 0

is a constant,

s = | S |

denotes the number of nonzero components of

β_{0}

,

L (β) = 2^{- 1} β^{T} \hat{Σ} β - {\tilde{ρ}}^{T} β

is the loss function,

\hat{Σ}

and

\tilde{ρ}

are the consistent estimators of covariance matrix

Σ

of

X_{i}

and marginal correlation coefficient vector

ρ

of

(X_{i}, y_{i})

, and they may differ in terms of various kinds of measurement error data. Under the additive error setting,

{\hat{Σ}}_{add} = n^{- 1} W^{T} W - Σ_{u}, {\tilde{ρ}}_{add} = n^{- 1} W^{T} y .

(6)

Under the multiplicative error setting,

{\hat{Σ}}_{mul} = n^{- 1} W^{T} W ⊘ (Σ_{m} + μ_{m} μ_{m}^{T}), {\tilde{ρ}}_{mul} = n^{- 1} W^{T} y ⊘ μ_{m},

(7)

where ⊘ denotes the elementwise division operator, and let

\hat{Σ} = {\hat{Σ}}_{add}

or

{\hat{Σ}}_{mul}

throughout the sequel. The reason for using “∈” rather than “=” in (5) is that several local minima might exist in the objective function. Note that this method still relies on a nonconvex objective function to obtain the estimator of

β_{0}

. Thus, we refer to it as “nonconvex Lasso”. It can be implemented by the R package “hdme” [94] at https://cran.r-project.org/web/packages/hdme/vignettes/hdme.html (accessed on 13 June 2023).

The nonconvexity of the penalized objective function makes it challenging to obtain the global minimum of the optimization problem (5). To solve the optimization problem (5), Loh and Wainwright [63] applied the projected gradient descent algorithm and demonstrated that even if the penalized objective function is nonconvex, the solution produced by this algorithm can reach the global minimum with high probability. The algorithm finds the global minimum in an iterative way as follows. At the

(k + 1)

th iteration,

β_{NCL}^{(k + 1)} = arg min_{{∥ β ∥}_{1} \leq c_{0} \sqrt{s}} \{L (β_{NCL}^{(k)}) + \nabla L {(β_{NCL}^{(k)})}^{T} (β - β_{NCL}^{(k)}) + \frac{η}{2} ∥ β - β_{NCL}^{(k)} ∥_{2}^{2} + λ {∥ β ∥}_{1}\},

(8)

where

\nabla L (β) = \hat{Σ} β - \tilde{ρ}

is the gradient of loss function

L (β)

,

η > 0

denotes the step-size parameter. For details of this algorithm, please see [63,95,96,97]. Loh and Wainwright [63] proved that the solution obtained by iteration (8) is quite near to the global minimum in both

l_{1}

-norm and

l_{2}

-norm under some conditions. Specifically, for all

t \geq 0

,

\begin{matrix} ∥ β_{NCL}^{(k)} - {\hat{β}}_{NCL} ∥_{2}^{2} & \leq γ^{k} ∥ β_{NCL}^{(0)} - {\hat{β}}_{NCL} ∥_{2}^{2} + C_{1} \frac{log p}{n} ∥ {\hat{β}}_{NCL} - β_{0} ∥_{1}^{2} + C_{2} {∥ {\hat{β}}_{NCL} - β_{0} ∥}_{2}^{2}, \\ ∥ β_{NCL}^{(k)} - {\hat{β}}_{NCL} ∥_{1} & \leq 2 \sqrt{k} ∥ β_{NCL}^{(k)} - {\hat{β}}_{NCL} ∥_{2} + 2 \sqrt{k} ∥ {\hat{β}}_{NCL} - β_{0} ∥_{2} + 2 {∥ {\hat{β}}_{NCL} - β_{0} ∥}_{1}, \end{matrix}

where

C_{1}

and

C_{2}

are positive constants, and

γ \in (0, 1)

is a contraction coefficient independent of

(n, p, k)

. For the estimator

{\hat{β}}_{NCL}

of the true regression coefficient vector

β_{0}

, Loh and Wainwright [63] showed that with any

c_{0} \geq {∥ β_{0} ∥}_{2}

and

λ = O (\sqrt{log p / n})

, the

l_{q}

-estimation error of

{\hat{β}}_{NCL}

satisfies the bounds

∥ {\hat{β}}_{NCL} - β_{0} ∥_{q} = O (s^{1 / q} \sqrt{\frac{log p}{n}}), q = 1, 2 .

When

q = 1

, the

l_{1}

-estimation error can reach the convergence rate

s \sqrt{log p / n}

; when

q = 2

, the

l_{2}

-estimation error can reach the convergence rate

\sqrt{s log p / n}

. However, Loh and Wainwright [63] did not establish the variable selection consistency and oracle inequality for the prediction error of the nonconvex Lasso estimator.

2.2. Convex Conditioned Lasso

The nonconvex Lasso [63] overcomes the problem of unsolvability caused by the nonconvex objective function in the presence of measurement errors. However, there are some drawbacks to this method. First, the nonconvex Lasso solves the problem by adding constraints to

β

, but the penalized objective function remains nonconvex. It is well recognized that the convexity of the penalized objective function will be incredibly useful for theoretical analysis and computation. Second, two important unknown parameters

c_{0}

and s are included in the optimization problem (5). These two parameters have a direct impact on the estimation results, but we are not sure about their magnitudes in applications. Third, Loh and Wainwright [63] have not established the variable selection results of the nonconvex Lasso estimator. To remedy these issues, Datta and Zou [64] proposed Convex Conditioned Lasso (CoCoLasso) based on a convex objective function, which possesses computational and theoretical superiority brought by convexity.

In order to construct the convex objective function, Datta and Zou [64] introduced a nearest positive semi-definite matrix projection operator for the square matrix, which is defined as

{(A)}_{+} = arg min_{A_{1} \geq 0} {∥ A - A_{1} ∥}_{max},

(9)

where

A

is a square matrix. Let

\tilde{Σ} = {(\hat{Σ})}_{+}

, and the alternating direction method of multipliers (ADMM) algorithm [98] can be utilized to derive

\tilde{Σ}

from

\hat{Σ}

. Based on

\tilde{Σ}

, the following convex objective function can be constructed, and it yields the CoCoLasso estimator

{\hat{β}}_{coco} = arg min_{β} \{\frac{1}{2} β^{T} \tilde{Σ} β - {\tilde{ρ}}^{T} β + λ {∥ β ∥}_{1}\} .

(10)

When the covariates contain additive measurement errors,

{\tilde{Σ}}_{add} = {({\hat{Σ}}_{add})}_{+}, {\tilde{ρ}}_{add} = n^{- 1} W^{T} y, {\hat{Σ}}_{add} = n^{- 1} W^{T} W - Σ_{u} .

(11)

When the covariates contain multiplicative measurement errors,

{\tilde{Σ}}_{mul} = {({\hat{Σ}}_{mul})}_{+}, {\tilde{ρ}}_{mul} = n^{- 1} W^{T} y ⊘ μ_{m}, {\hat{Σ}}_{mul} = n^{- 1} W^{T} W ⊘ (Σ_{m} + μ_{m} μ_{m}^{T}) .

(12)

Note that

\tilde{Σ}

not only contributes to the construction of the convex objective function but also possesses the same level of estimation accuracy as

\hat{Σ}

in [63]. It can be guaranteed by the following equation

∥ \tilde{Σ} {- Σ ∥}_{max} \leq ∥ \tilde{Σ} - \hat{Σ} ∥_{max} + ∥ \hat{Σ} {- Σ ∥}_{max} \leq 2 {∥ \hat{Σ} - Σ ∥}_{max} .

Since

\tilde{Σ}

is semi-positive definite, we can perform Cholesky decomposition on

\tilde{Σ}

. Then, the Cholesky factor of

\tilde{Σ}

can be used to simplify computations by rewriting (10) as

{\hat{β}}_{coco} = arg min_{β} \frac{1}{2 n} ∥ \tilde{y} - \tilde{W} {β ∥}_{2}^{2} + λ {∥ β ∥}_{1},

(13)

where

\tilde{W}

denotes the Cholesky factor of

\tilde{Σ}

satisfying

n^{- 1} {\tilde{W}}^{T} \tilde{W} = \tilde{Σ}

, and

\tilde{y}

is the vector satisfying

n^{- 1} {\tilde{W}}^{T} \tilde{y} = \tilde{ρ}

. The penalized objective function in (13) is similar to that of the standard Lasso. Thus, we can utilize the coordinate descent algorithm to obtain the CoCoLasso estimator; please see the details in [64,99,100]. Theoretically, Datta and Zou [64] established the

l_{q}

-estimation

(q = 1, 2)

and prediction error bounds of the CoCoLasso estimator. Suppose that

ψ = min_{δ \neq 0, ∥ δ_{S^{c}} ∥_{1} \leq 3 {∥ δ_{S} ∥}_{1}} \frac{δ^{T} Σ δ}{{∥ δ ∥}_{2}^{2}} > 0 .

For

s \sqrt{ζ log p / n} < λ \leq min {ϵ_{0}, 12 ϵ_{0} ∥ β_{0 S} ∥_{\infty}}

, where

ζ = max {σ_{ε}^{4}, σ_{U}^{4}, 1}

,

ϵ_{0} = σ_{U}^{2}

,

σ_{ε}^{2}

and

σ_{U}^{2}

are sub-Gaussian parameters of model error and measurement error, respectively, the CoCoLasso estimator

{\hat{β}}_{coco}

satisfies that with probability at least

1 - C exp (- c log p)

,

\begin{matrix} ∥ {\hat{β}}_{coco} - β_{0} ∥_{q} = & O (\frac{λ s^{1 / q}}{ψ}), q = 1, 2, \end{matrix}

(14)

\begin{matrix} n^{- 1 / 2} {∥ X ({\hat{β}}_{coco} - β_{0}) ∥}_{2} = & O (λ \sqrt{\frac{s}{ψ}}) . \end{matrix}

(15)

The fomulas (14) and (15) show the oracle inequalities for the

l_{q}

-estimation error with

q = 1, 2

and prediction error. Furthermore, Datta and Zou [64] established the sign consistency of the CoCoLasso estimator under an additional irrepresentable condition and minimum signal strength condition. Meanwhile, there was no variable selection result provided for the nonconvex Lasso estimator

{\hat{β}}_{NCL}

in [63]. Thus, the CoCoLasso estimation method not only enjoys the computational convenience of convexity but also possesses excellent theoretical results. However, when the dimension of covariates p is large, the computation of

\tilde{Σ}

is expensive. To improve the computational efficiency, Escribe et al. [101] applied a two-step block descent algorithm and proposed a block coordinate descent convex conditioned Lasso (BDCoCoLasso), which is designed for the case in which the covariate matrix is only partially corrupted. CoCoLasso and BDCoCoLasso are now available in R package “BDcocolasso” at https://github.com/celiaescribe/BDcocolasso (accessed on 13 June 2023).

2.3. Balanced Estimation

CoCoLasso is effective in the parameter estimation of high-dimensional measurement error models, but it suffers from overfitting. To overcome this drawback, Zheng et al. [65] replaced the Lasso penalty in CoCoLasso with the combined

l_{1}

and concave penalty and developed the balanced estimator, which can be obtained by

{\hat{β}}_{bal} = arg min_{β} \{\frac{1}{2} β^{T} \tilde{Σ} β - {\tilde{ρ}}^{T} β + λ_{0} {∥ β ∥}_{1} + {∥ p_{λ} (β) ∥}_{1}\},

(16)

where

λ_{0} = c_{1} \sqrt{log p / n}

is the regularization parameter for the

l_{1}

penalty with

c_{1}

being a positive constant,

p_{λ} (β) = [p_{λ} (| β_{1} |), \dots, p_{λ} (| β_{p} {|)]}^{T}

, and

p_{λ} (u), u \in [0, + \infty)

is a concave penalty function with the tuning parameter

λ \geq 0

. The definitions of

\tilde{Σ}

and

\tilde{ρ}

are the same as those in (11) and (12) with the two kinds of measurement error data. In contrast to the CoCoLasso estimator, the balanced estimator strikes a perfect balance between prediction and variable selection. Surprisingly, excellent variable selection results promote the estimation and prediction accuracy of the balanced estimator. The simulation studies in [65] demonstrate the estimation and prediction accuracy as well as the better variable selection results of the balanced estimator. As for the asymptotic properties of

{\hat{β}}_{bal}

, Zheng et al. [65] established the oracle inequalities for the

l_{q}

-estimation and prediction error,

\begin{matrix} ∥ {\hat{β}}_{bal} - β_{0} ∥_{q} = & O_{p} (\frac{λ_{0} s^{1 / q}}{ϕ^{2}}), q = 1, 2, \end{matrix}

(17)

\begin{matrix} n^{- 1 / 2} {∥ X ({\hat{β}}_{bal} - β_{0}) ∥}_{2} = & O_{p} (\frac{λ_{0} \sqrt{s}}{ϕ}), \end{matrix}

(18)

where

ϕ = min_{δ \neq 0, ∥ δ_{S^{c}} ∥_{1} \leq 7 {∥ δ_{S} ∥}_{1}} \frac{n^{- 1 / 2} {∥ X δ ∥}_{2}}{∥ δ_{S} ∥_{2} \lor {∥ δ_{S^{c}}^{*} ∥}_{2}} > 0,

and

δ_{S^{c}}^{*} \in R^{s}

contains the s largest absolute vaules of

δ_{S^{c}}

. It can be seen from (17) and (18) that the bounds of

l_{q}

-estimation (

q = 1, 2

) and prediction error are free of regularization parameter

λ

for the concave penalty. Also, the upper bound of falsely discovered signs is provided in [65]. Denote

FS (\hat{β}) = | {1 \leq j \leq p : sgn ({\hat{β}}_{j}) \neq sgn (β_{0, j})} |

; then

FS (\hat{β}) = O_{p} (\frac{λ_{0}^{2} s}{λ^{2} ϕ^{4}}) .

(19)

From (19), we can see that if

{min}_{j \in S} | β_{0 j} | ≫ \sqrt{s log p / n}

such that

λ^{2} ≫ λ_{0}^{2} s

, a balanced estimator can achieve sign consistency, which is stronger than the variable selection consistency. Compared with the balanced estimator, the CoCoLasso estimator requires an additional irrepresentable condition to achieve this property.

2.4. Calibrated Zero-Norm Regularized Least Square Estimation

The nearest positive semi-definite matrix projection operator defined in [64] solves the problem that the penalized objective function is nonconvex in high-dimensional measurement error models. However, with the constraint of the positive semi-definite matrix, the computation cost of

\tilde{Σ}

is high. Tao et al. [67] demonstrated that as the dimension p increases, the time required to calculate

\tilde{Σ}

using the ADMM algorithm will increase significantly. Thus, Tao et al. [67] suggested replacing

\tilde{Σ}

with an approximation of

\hat{Σ}

that is easy to obtain but less precise. To achieve this purpose, consider the eigendecomposition of

\hat{Σ}

as follows

\hat{Σ} = V diag (θ_{1}, \dots, θ_{p}) V^{T},

where

diag (θ_{1}, \dots, θ_{p})

is a diagonal matrix containing the eigenvalues of

\hat{Σ}

with

θ_{1} \geq θ_{2} \geq \dots \geq θ_{p}

, and

V \in R^{p \times p}

is an orthonormal matrix consisting of the corresponding eigenvectors. Then, Tao et al. [67] substituted the Frobenius norm for the elementwise maximum norm in (9) and obtained a positive definite approximation of

\hat{Σ}

as follows

{\tilde{Σ}}_{F} = arg min_{W \geq ξ I} {∥ \hat{Σ} - W ∥}_{F} for some ξ > 0 .

(20)

Note that the optimal solution of (20) is the same as that of the problem

min_{W \geq ξ I} {∥ \hat{Σ} - W ∥}_{F}^{2} .

(21)

Thus, we have

{\tilde{Σ}}_{F} = ξ I + Π_{S_{+}^{p}} (\hat{Σ} - ξ I) = V diag [max (θ_{1}, ξ), \dots, max (θ_{p}, ξ)] V^{T},

(22)

where

Π_{S_{+}^{p}} (\cdot)

denotes the projection of a matrix on

S_{+}^{p}

. Similar to

\tilde{Σ}

, we have

{\tilde{Σ}}_{F} = n^{- 1} {\tilde{W}}_{F}^{T} {\tilde{W}}_{F}

, where

n^{- 1 / 2} {\tilde{W}}_{F}

is the Cholesky factor of

{\tilde{Σ}}_{F}

. Let

{\tilde{y}}_{F}

be the vector satisfying

n^{- 1} {\tilde{W}}_{F}^{T} {\tilde{y}}_{F} = \tilde{ρ}

. By some simple calculation, we can obtain that

\{\begin{matrix} {\tilde{W}}_{F} & = \sqrt{n} V diag (\sqrt{max (θ_{1}, ξ)}, \dots, \sqrt{max (θ_{p}, ξ)}) V^{T}, \\ {\tilde{y}}_{F} & = \sqrt{n} V diag (\frac{1}{\sqrt{max (θ_{1}, ξ)}}, \dots, \frac{1}{\sqrt{max (θ_{p}, ξ)}}) V^{T} \tilde{ρ} . \end{matrix}

(23)

Based on Equation (22),

{\tilde{Σ}}_{F}

can be obtained easily. This implies that computing

{\tilde{Σ}}_{F}

requires substantially less time than computing

\tilde{Σ}

. However, the approximation accuracy of

{\tilde{Σ}}_{F}

to

\hat{Σ}

is not as good as that of

\tilde{Σ}

because minimizing the Frobenius norm may yield larger components compared with the elementwise maximum norm. To obtain an excellent estimator of

β_{0}

, it is reasonable to find a more effective regression method to replace Lasso. Tao et al. [67] considered the zero norm penalty and defined the following calibrated zero-norm regularized least squares (CaZnRLS) estimator

{\hat{β}}_{zn} \in arg min_{β \in R^{p}} \{\frac{1}{2 n λ} ∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥_{2}^{2} + {∥ β ∥}_{0}\} .

(24)

However, it is difficult to solve (24) directly. Thus, to give an equivalent form for (24) that can be solved, Tao et al. [67] defined

ϕ (u) : = \frac{a - 1}{a + 1} u^{2} + \frac{2}{a + 1} u (a > 1), u \in R .

It is easy to verify that for any

β \in R^{p}

,

{∥ β ∥}_{0} = min_{w \in R^{p}} \{\sum_{i = 1}^{p} ϕ (w_{i}) : {(e - w)}^{T} | β | = 0, 0 \leq w \leq e\},

(25)

where

| β | = (| β_{1} |, \dots, | β_{p} {|)}^{T}

. The Formula (25) implies that the optimization problem (24) can be rewritten as the following mathematical program with equilibrium constraints (MPEC)

min_{β, w \in R^{p}} \{\frac{1}{2 n λ} ∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥_{2}^{2} + \sum_{i = 1}^{p} ϕ (w_{i}) : {(e - w)}^{T} | β | = 0, 0 \leq w \leq e\} .

(26)

Note that if the optimal solution of optimization problem (24) is

{\hat{β}}^{*}

, then the corresponding optimal solution of optimization problem (26) is

({\hat{β}}^{*}, sign (| {\hat{β}}^{*} |))

.

However, it can be seen that the annoying nonconvexity is introduced by the restriction

{(e - w)}^{T} | β | = 0

in (26), and it is the cause of the difficulty in obtaining the estimator

{\hat{β}}_{zn}

. Accordingly, Tao et al. [67] considered the following penalized version of optimization problem (26)

min_{β, w \in R^{p}} \{\frac{1}{2 n λ} ∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥_{2}^{2} + \sum_{i = 1}^{p} ϕ (w_{i}) + ρ {(e - w)}^{T} | β |, 0 \leq w \leq e\},

(27)

where

ρ > 0

is the penalty parameter. Tao et al. [67] proved that the global optimal solution of optimization problem (27) with

ρ \geq \bar{ρ} : = (4 a L_{f}) {[(a + 1) λ]}^{- 1}

is the same as that of optimization problem (26), where

L_{f}

is the Lipschitz constant of the function

f (β) : = {(2 n)}^{- 1} {∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥}_{2}^{2}

on the ball

{β \in R^{p} : ∥ β ∥_{2} \leq R}

, and R is a constant. Thus,

{\hat{β}}_{zn}

can be obtained by solving the following optimization problem with

ρ \geq \bar{ρ}

{\hat{β}}_{zn} \in arg min_{β \in R^{p}, w \in [0, e]} \{\frac{1}{2 n} {∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥}_{2}^{2} + \sum_{i = 1}^{p} λ [ϕ (w_{i}) + ρ (1 - w_{i}) | β_{i} |]\} .

(28)

Tao et al. [67] recommended using the multi-stage convex relaxation approach (GEP–MSCRA) to obtain

{\hat{β}}_{zn}

. This approach solves (28) in an iterative way with the main steps summarized as follows.

Step 1. Initialize the algorithm with

w^{(0)} \in [0, 2^{- 1} e]

,

ρ^{(0)} = 1

,

λ > 0

,

k = 1

.

Step 2. Solve the following optimization problem and obtain

{\hat{β}}_{zn}^{(k)}

{\hat{β}}_{zn}^{(k)} = arg min_{β \in R^{p}} \{\frac{1}{2 n} ∥ {\tilde{W}}_{F} β - {\tilde{y}}_{F} ∥_{2}^{2} + λ \sum_{i = 1}^{p} (1 - w_{i}^{(k - 1)}) | β_{i} |\} .

Step 3. If

k = 1

, choose an appropriate

ρ^{(1)} > ρ^{(0)}

using the information from

∥ {\hat{β}}_{zn}^{(1)} ∥_{\infty}

; if

1 < k \leq 3

, choose

ρ^{(k)}

satisfying

ρ^{(k)} > ρ^{(k - 1)}

; if

k > 3

, let

ρ^{(k)} = ρ^{(k - 1)}

.

Step 4. Obtain

w_{i}^{(k)} (i = 1, \dots, p)

through the following optimization problem

w_{i}^{(k)} = arg min_{0 \leq w_{i} \leq 1} \{ϕ (w_{i}) - ρ^{(k)} w_{i} | {\hat{β}}_{zn, i}^{(k)} |\} .

Step 5. Let

k \leftarrow k + 1

and repeat Steps 2–4 until the stopping conditions are satisfied.

Note that the initial

w^{(0)}

in Step 1 is an arbitrary vector from the interval

[0, 2^{- 1} e]

rather than the feasible set

[0, e]

in (28). The reason is to obtain a better initial estimator

{\hat{β}}_{zn}^{(1)}

. In addition,

w_{i}^{(k)}

in Step 4 has the following closed form based on the convexity of

ϕ

w_{i}^{(k)} = min [1, max (\frac{(a + 1) ρ^{(k)} | β_{i}^{(k)} | - 2}{2 (a + 1)}, 0)], i = 1, \dots, p .

Consequently, the primary calculation in each iteration is to solve a weighted

l_{1}

-norm regularized least square problem. Under some regularity conditions,

{\hat{β}}_{zn}^{(k)}

satisfies

∥ {\hat{β}}_{zn}^{(k)} - β_{0} ∥_{2} = O_{p} (λ \sqrt{s}) \forall k \in N^{+} .

(29)

It can be seen from (29) that the

l_{2}

-estimation error bound of CaZnRLS estimator possesses the same order as those of nonconvex Lasso and CoCoLasso estimators. Tao et al. [67] further showed that the error bound of

{\hat{β}}_{zn}^{(k + 1)}

is better than that of

{\hat{β}}_{zn}^{(k)}

for all

k \in N^{+}

. Furthermore, Tao et al. [67] demonstrated that GEP-MSCRA will produce a

{\hat{β}}_{zn}^{(k)}

such that

supp ({\hat{β}}_{zn}^{(k)}) = supp (β_{0})

in a finite number of iterations if the minimum nonzero value of the smallest nonzero entries of

β_{0}

is not too small.

2.5. Linear and Conic Programming Estimation

In addition to the approaches mentioned above, another class of methods is based on the idea of the Dantzig selector to acquire an estimator of true regression coefficients

β_{0}

. Rosenbaum and Tsybakov [68] proposed the following matrix uncertainty (MU) selector

{\hat{β}}_{MU} = arg min_{β} \{{∥ β ∥}_{1} : ∥ n^{- 1} W^{T} {(y - W β) ∥}_{\infty} \leq δ {∥ β ∥}_{1} + λ\},

(30)

where

δ \geq 0

and

λ \geq 0

are tuning parameters depending on the level of measurement error U and model error

ε

, respectively. The MU selector is available in R package “hdme” [94].

However, the

n^{- 1} W^{T} W

is included in (30) rather than

n^{- 1} X^{T} X

due to the unobservability of

X

. Obviously, the matrix

n^{- 1} W^{T} W

contains bias caused by measurement errors. To address this issue, Rosenbaum and Tsybakov [69] proposed an improved version of the MU selector called the compensated MU selector. It is applicable to the case that the entries of measurement error

U_{i}

are independent such that

σ_{U, j}^{2} = n^{- 1} \sum_{i = 1}^{n} E (U_{i j}^{2})

is finite for

j = 1, \dots, p

. The compensated MU selector is defined as

{\hat{β}}_{CMU} = arg min_{β} \{{∥ β ∥}_{1} : ∥ n^{- 1} W^{T} (y - W β) + \hat{D} {β ∥}_{\infty} \leq δ {∥ β ∥}_{1} + λ\},

(31)

where

\hat{D}

is a diagonal matrix consisting of

{\hat{σ}}_{U, j}^{2}, j = 1, \dots, p

, and constants

δ

and

λ

are the same as those in (30). Rosenbaum and Tsybakov [69] showed that the

l_{q}

-estimation error of the estimator

{\hat{β}}_{CMU}

satisfies

∥ {\hat{β}}_{CMU} - β_{0} ∥_{q} = O_{p} (s^{1 / q} (∥ β_{0} ∥_{1} + 1) \sqrt{\frac{log p}{n}}), 1 \leq q \leq \infty .

The MU selector and compensated MU selector provide two alternative estimation methods for high-dimensional measurement error models, but there remains an issue. The optimization problem in (31) may be nonconvex, and Rosenbaum and Tsybakov [69] did not offer a suitable algorithm to the general case. To remedy this issue, Belloni et al. [72] proposed the conic-programming-based estimator

{\hat{β}}_{cp}

. Consider the following optimization problem

\begin{matrix} min_{β, t} \{{∥ β ∥}_{1} + κ t\}, \\ s . t . ∥ n^{- 1} W^{T} (y - W β) & + \hat{D} {β ∥}_{\infty} \leq δ t + λ, {∥ β ∥}_{2} \leq t, t \in R^{+}, \end{matrix}

(32)

where

κ, δ

and

λ

are positive tuning parameters. Suppose that the solution of (32) is

({\hat{β}}_{cp}; \hat{t})

, then,

{\hat{β}}_{cp}

is defined as the conic-programming-based estimator of true regression coefficients

β_{0}

. It is easy to verify that the optimization problem (32) can be solved efficiently in a polynomial time as it is a second-order cone programming problem. To analyze the asymptotic properties of

{\hat{β}}_{cp}

, assume that

κ \in [2^{- 1}, 2]

,

δ = O (\sqrt{log p / n})

, and

λ = O (\sqrt{log p / n})

. Then, Belloni et al. [72] showed that the

l_{q}

-estimation

(1 \leq q \leq \infty)

and prediction error of

{\hat{β}}_{cp}

satisfy

\begin{matrix} ∥ {\hat{β}}_{cp} - β_{0} ∥_{q} & = O_{p} (s^{1 / q} (∥ β_{0} ∥_{2} + 1) \sqrt{\frac{log p}{n}}), 1 \leq q \leq \infty, \end{matrix}

(33)

\begin{matrix} n^{1 / 2} {∥ X ({\hat{β}}_{cp} - β_{0}) ∥}_{2} & = O_{p} (s^{1 / 2} (∥ β_{0} ∥_{2} + 1) \sqrt{\frac{log p}{n}}) . \end{matrix}

(34)

In contrast to nonconvex Lasso in [63], the conic-programming-based estimator

{\hat{β}}_{cp}

can achieve the convergence rate in (33) and (34) without any information of the parameters

∥ β_{0} ∥_{1}

,

∥ β_{0} ∥_{2}

or s. Compared with the compensated MU selector in [69], the conic-programming based estimator

{\hat{β}}_{cp}

can be obtained in the general case without the computational difficulty of nonconvexity.

3. Estimation Methods for Generalized Linear Models

The above methods are mainly for linear models. This section introduces the estimation methods for high-dimensional generalized linear models with measurement errors.

3.1. Estimation Method for Poisson Models

Count data are commonly encountered in various fields including finance, economics and social sciences. In order to analyze count data, Poisson regression models are a popular choice in practice. Jiang and Ma [78] studied the high-dimensional Poisson regression models with additive measurement errors and proposed a novel optimization algorithm to obtain the estimator of true regression coefficient vector

β_{0}

. Suppose that

Y_{i}

is the response variable following a Poisson distribution satisfying

E (Y_{i} | X_{i}) = exp (X_{i}^{T} β)

, where

X_{i} \in R^{p}

is an unobservable covariate. Its error-prone surrogate

W_{i} = X_{i} + U_{i}

, and the measurement error

U_{i}

follows from a sub-Gaussian distribution with known covariance matrix

Σ_{u}

. It is easy to verify that

E \{Y_{i} W_{i}^{T} β - exp (β^{T} W_{i} - β^{T} Σ_{u} β / 2) ∣ X_{i}, Y_{i}\} = Y_{i} X_{i}^{T} β - exp (β^{T} X_{i}) .

(35)

From (35), Jiang and Ma [78] imposed a restriction on

β

similar to it in [63] and estimated

β

by solving the following optimization problem

{\hat{β}}_{p} = arg min_{{∥ β ∥}_{1} \leq c_{p} \sqrt{s}, {∥ β ∥}_{2} \leq c_{p}} \{L (β) + {λ ∥ β ∥}_{1}\},

(36)

where

L (β) = - \frac{1}{n} \sum_{i = 1}^{n} \{Y_{i} W_{i}^{T} β - exp (β^{T} W_{i} - β^{T} Σ_{u} β / 2)\} .

(37)

The estimator

{\hat{β}}_{p}

can be obtained by the composite gradient descent algorithm. Specifically, at the

(k + 1)

th iteration, first solve the following optimization problem without any restrictions on

β

{\tilde{β}}_{p}^{(k + 1)} = arg min_{β} \{\partial L (β_{p}^{(k)}) / \partial β^{T} (β - β^{(k)}) + η / 2 ∥ β - β^{(k)} ∥_{2}^{2} + λ {∥ β ∥}_{1}\},

where

η > 0

is a stepsize parameter. Next, apply the projection method in [95] to project

{\tilde{β}}_{p}^{(k + 1)}

onto the

l_{1}

ball with radius

c_{p} \sqrt{s}

and produce

{\overset{˘}{β}}_{p}^{(k + 1)}

. If

∥ {\overset{˘}{β}}_{p}^{(k + 1)} ∥_{2} > c_{p}

, let

{\hat{β}}_{p}^{(k + 1)} = {\overset{˘}{β}}_{p}^{(k + 1)} c_{p} / {∥ {\overset{˘}{β}}_{p}^{(k + 1)} ∥}_{2}

; otherwise, let

{\hat{β}}_{p}^{(k + 1)} = {\overset{˘}{β}}_{p}^{(k + 1)}

. Repeat the above steps until the stopping condition is satisfied. Jiang and Ma [78] proved the convergence of this algorithm. Under some regularity conditions, they further showed that the global minimum

{\hat{β}}_{p}

of (36) satisfies

∥ {\hat{β}}_{p} - β_{0} ∥_{q} = O (s^{1 / q} λ) .

(38)

There is an usual requirement that

λ ⩾ 2 {∥\partial L (β) / \partial β∥}_{\infty}

in Poisson models, and the term

{∥\partial L (β) / \partial β∥}_{\infty} = O (\sqrt{n / log p})

. Thus, the convergence rate of

{\hat{β}}_{p}

is slower than those of nonconvex Lasso, CoCoLasso and balanced estimators in linear models.

3.2. Generalized Matrix Uncertainty Selector

The method in [78] is only designed for high-dimensional Poisson models with measurement errors. To develop a method that is applicable to generalized linear models, Sørensen et al. [70] drew on the idea of the MU selector and proposed the generalized matrix uncertainty (GMU) selector for high-dimensional generalized linear models with additive measurement errors.

Consider a generalized linear model with response variable Y distributed according to

f_{Y} (y; θ, ϕ) = exp \{\frac{y θ - b (θ)}{a (ϕ)} + c (y, ϕ)\},

where

θ = X^{T} β_{0}

,

X \in R^{p}

are the covariates. The expected response is given by the mean function

μ (θ) = b^{'} (θ)

, and the Taylor expansion of the mean function

μ (X_{i}^{T} β_{0})

at point

W_{i}^{T} β_{0}

is

μ (X_{i}^{T} β_{0}) = \sum_{ℓ = 0}^{\infty} \frac{μ^{(ℓ)} (W_{i}^{T} β_{0})}{ℓ!} {(- U_{i}^{T} β_{0})}^{ℓ},

(39)

where

μ^{(ℓ)} (\cdot)

is the ℓth derivative of function

μ (\cdot)

. With the Taylor expansion (39) of the mean function, the generalized matrix uncertainty selector can be defined as

\begin{matrix} {\hat{β}}_{GMU}^{L} = arg min_{β} \{{∥ β ∥}_{1} : β \in Θ^{L}\}, \\ Θ^{L} = \{β \in R^{p} : max_{1 \leq j \leq p} & |\frac{1}{n} w_{i j} [Y_{i} - μ (W_{i}^{T} β)]| \leq λ + \sum_{ℓ = 1}^{L} \frac{δ^{ℓ}}{ℓ! \sqrt{n}} {∥ β ∥}_{1}^{ℓ} {∥ μ^{(ℓ)} (W β) ∥}_{2}\}, \end{matrix}

(40)

where

δ

is the positive parameter satisfying

{∥ U ∥}_{\infty} \leq δ, μ^{(ℓ)} (W β) = {[μ^{(ℓ)} (W_{1}^{T} β), \dots, μ^{(ℓ)} (W_{n}^{T} β)]}^{T} .

In practice, Sørensen et al. [70] recommended using

L = 1

for computational convenience and demonstrated that the first-order approximation produces satisfactory results.

To solve the optimation problem (40) and obtain the estimator

{\hat{β}}_{GMU}^{L}

, we can utilize the iterative reweighing algorithm. The main iteration step of the algorithm is stated as follows

{\hat{β}}_{GMU}^{(k + 1)} = arg min_{β} \{{∥ β ∥}_{1} : \frac{1}{n} {∥{\tilde{W}}_{g}^{(k) T} ({\tilde{z}}^{(k)} - {\tilde{W}}_{g}^{(k)} β)∥}_{\infty} \leq λ + \sum_{ℓ = 1}^{L} \frac{δ^{ℓ}}{ℓ! \sqrt{n}} {∥ β ∥}_{1}^{ℓ} {∥ V^{(ℓ, k)} ∥}_{2}\},

(41)

where

{\tilde{W}}_{g} \in R^{n \times p}

is a matrix of the weighted error-prone surrogate of covariates with elements

{\tilde{w}}_{g, i j}^{(k)} = w_{i j} \sqrt{V_{i}^{(1, k)}}

,

{\tilde{z}}^{(k)} \in R^{n}

is a vector with the elements

{\tilde{z}}_{i}^{(k)} = z_{i}^{(k)} \sqrt{V_{i}^{(1, k)}}

,

z_{i}^{(k)} = W_{i}^{T} {\hat{β}}_{GMU}^{(k)} + [Y_{i} - μ \{W_{i}^{T} {\hat{β}}_{GMU}^{(k)}\}] μ^{'} {\{W_{i}^{T} {\hat{β}}_{GMU}^{(k)}\}}^{- 1}, i = 1, \dots, n,

and

V^{(ℓ, k)} = {[μ^{(ℓ)} \{W_{1}^{T} {\hat{β}}_{GMU}^{(k)}\}, \dots, μ^{(ℓ)} \{W_{n}^{T} {\hat{β}}_{GMU}^{(k)}\}]}^{T} = {(V_{1}^{(ℓ, k)}, \dots, V_{n}^{(ℓ, k)})}^{T}, ℓ = 1, \dots, L

is the weight vector in Taylor expansion with L terms. When

L = 1

is applied, it is easy to verify that (41) is a linear program. For more details about the algorithm, please see [70,102]. The GMU selector can be implemented by R package “hdme” [94]. However, Sørensen et al. [70] did not establish any asymptotic properties of the GMU selector.

4. Hypothesis Testing Methods

The aforementioned works on high-dimensional measurement error models mainly investigate estimation problems and numerical algorithms of optimization problems as well as the theoretical properties of estimators. Recently, some works have studied the hypothesis testing problems for high-dimensional measurement error regression models, which will be introduced in this section.

4.1. Corrected Decorrelated Score Test

The above methods are proposed under the setting that all covariates are corrupted. In practice, it is common that not all covariates are measured with errors. Thus, Li et al. [90] investigated high-dimensional measurement error models where a fixed number of covariates contain measurement errors and proposed statistical inference methods for the regression coefficients corresponding to these covariates.

Consider the following high-dimensional linear model with one of the covariates containing additive errors

\{\begin{matrix} y_{i} = β_{0} X_{i} + γ_{0}^{T} Z_{i} + ε_{i}, \\ W_{i} = X_{i} + U_{i}, i = 1, \dots, n, \end{matrix}

where

X_{i} \in R

is an unobservable covariate,

W_{i}

is its error-prone surrogate, and

Z_{i} \in R^{p - 1}

is an observed covariate precisely. The measurement error

U_{i}

follows from sub-Gaussian distribution with mean zero and variance

σ_{U}^{2}

, and

U_{i}

is independent of

(X_{i}, Z_{i}, ε_{i})

. Denote

y = {(y_{1}, \dots, y_{n})}^{T}, X = {(X_{1}, \dots, X_{n})}^{T}, W = {(W_{1}, \dots, W_{n})}^{T}

and

Z = {(Z_{1}, \dots, Z_{n})}^{T} .

This subsection aims to test the hypothesis:

H_{0} : β_{0} = β^{*} ⟷ H_{1} : β_{0} \neq β^{*} (β^{*} \in R),

and construct a confidence interval for

β_{0}

under high-dimensional settings.

Since we are only concerned with the inference of the parameter

β

, then the parameter

γ

is regarded as a nuisance. Following the idea in [85], Li et al. [90] defined the corrected score function as

S_{θ} (θ) = \hat{Σ} θ - \hat{ρ} = \frac{1}{n} \sum_{i = 1}^{n} S_{i θ} (θ) = (\begin{matrix} S_{β} (β, γ) \\ S_{γ} (β, γ) \end{matrix}) = (\begin{matrix} {\hat{Σ}}_{11} β + {\hat{Σ}}_{12} γ - {\hat{ρ}}_{1} \\ {\hat{Σ}}_{21} β + {\hat{Σ}}_{22} γ - {\hat{ρ}}_{2} \end{matrix}),

where

θ = {(β, γ^{T})}^{T}

,

\hat{Σ} = (\begin{matrix} {\hat{Σ}}_{11} & {\hat{Σ}}_{12} \\ {\hat{Σ}}_{21} & {\hat{Σ}}_{22} \end{matrix}) = (\begin{matrix} W^{T} W / n - σ_{U}^{2} & W^{T} Z / n \\ Z^{T} W / n & Z^{T} Z / n \end{matrix}) and \hat{ρ} = (\begin{matrix} {\hat{ρ}}_{1} \\ {\hat{ρ}}_{2} \end{matrix}) = (\begin{matrix} W^{T} y / n \\ Z^{T} y / n \end{matrix})

are consistent estimators of

Σ = {(X, Z)}^{T} (X, Z) / n

and

ρ = {(X, Z)}^{T} y / n

, respectively. The corrected score covariance matrix is defined as

I (θ) = E \{S_{i θ} (θ) S_{i θ} {(θ)}^{T}\} = (\begin{matrix} I_{β β} & I_{β γ} \\ I_{γ β} & I_{γ γ} \end{matrix}) .

To conduct statistical inference on the target parameter

β

, it is crucial to eliminate the influence of nuisance parameter

γ

. Thus, Li et al. [90] developed the corrected decorrelated score function for the target parameter

β

as

S (β, γ) = S_{β} (β, γ) - ω^{T} S_{γ} (β, γ),

where

ω^{T} = I_{β γ} I_{γ γ}^{- 1} = E (X_{i} Z_{i}^{T}) E {(Z_{i} Z_{i}^{T})}^{- 1}

. It easy to verify that

E [S (β_{0}, γ_{0}) S_{γ} (β_{0}, γ_{0})] = 0

, which indicates that

S (β, γ)

and nuisance score function

S_{γ} (β, γ)

are uncorrelated. Obviously, we can obtain that

Var [S (β, γ)] = I_{β β} - I_{β γ} I_{γ γ}^{- 1} I_{γ β} = : σ_{β ∣ γ}^{2}

. Then, Li et al. [90] constructed the test statistic and the confidence interval for

β_{0}

based on the estimated decorrelated score function. This statistical inference procedure is summarized as follows.

Step 1. Apply the CoCoLasso estimation method in [64] to calculate initial estimator

\tilde{θ} = {(\tilde{β}, {\tilde{γ}}^{T})}^{T}

, and utilize the following Dantzig-type estimator to estimate

ω

\hat{ω} = arg min_{ω} {∥ ω ∥}_{1}, s . t . {∥ {\hat{Σ}}_{12} - ω^{T} {\hat{Σ}}_{22} ∥}_{\infty} \leq λ^{'},

where

λ^{'} = O (\sqrt{log p / n})

.

Step 2. Estimate the decorrelated score function by

\hat{S} (β, \tilde{γ}) = S_{β} (β, \tilde{γ}) - {\hat{ω}}^{T} S_{γ} (β, \tilde{γ}),

and calculate the test statistic

\hat{T} = \sqrt{n} \hat{S} (β^{*}, \tilde{γ}) {({\hat{σ}}_{β ∣ γ, H_{0}}^{2})}^{- 1 / 2}

, where

\begin{matrix} {\hat{σ}}_{β ∣ γ, H_{0}}^{2} & = {\{{\hat{I}}_{β β} - {\hat{ω}}^{T} {\hat{I}}_{γ β}\}|}_{β = β^{*}} \\ = ({\hat{σ}}_{ε, H_{0}}^{2} + β^{* 2} σ_{U}^{2}) (1 - {\hat{ω}}^{T} {\hat{Σ}}_{21}) + β^{* 2} E (U_{i}^{4}) + {\hat{σ}}_{ε, H_{0}}^{2} σ_{U}^{2} - β^{* 2} σ_{U}^{4} . \end{matrix}

Step 3. Estimate

β

as

\hat{β} = \tilde{β} - \hat{S} (\tilde{θ}) / ({\hat{Σ}}_{11} - {\hat{ω}}^{T} {\hat{Σ}}_{21}),

and construct the

(1 - α) 100 %

confidence interval for

β_{0}

as

[\hat{β} - u_{1 - α / 2} \sqrt{{\hat{σ}}_{β}^{2} / n}, \hat{β} + u_{1 - α / 2} \sqrt{{\hat{σ}}_{β}^{2} / n}],

where

u_{1 - α / 2}

is the

(1 - α / 2)

quantile of standard normal distribution,

{\hat{σ}}_{β}^{2} = {(1 - {\hat{ω}}^{T} {\hat{Σ}}_{21})}^{- 2} \{({\hat{σ}}_{ε}^{2} + {\hat{β}}^{2} σ_{U}^{2}) (1 - {\hat{ω}}^{T} {\hat{Σ}}_{21}) + {\hat{β}}^{2} E (U_{i}^{4}) + {\hat{σ}}_{ε}^{2} σ_{U}^{2} - {\hat{β}}^{2} σ_{U}^{4}\}

is the estimator of the asymptotic variance

σ_{β}^{2}

of

\hat{β}

, and

{\hat{σ}}_{ε}^{2} = n^{- 1} \sum_{i = 1}^{n} {(y_{i} - \hat{β} W_{i} - {\tilde{γ}}^{T} Z_{i})}^{2} - {\hat{β}}^{2} σ_{U}^{2}

is the estimator of the variance

σ_{ε}^{2}

of

ε_{i}

.

Note that the methods used to estimate

θ

and

ω

in Step 1 can be varying, as long as the corresponding estimators are consistent; please see more discussions in [90]. Li et al. [90] showed that under some regularity conditions,

\sqrt{n} \hat{S} (β^{*}, \tilde{γ}) {({\hat{σ}}_{β ∣ γ, H_{0}}^{2})}^{- 1 / 2} \overset{d}{\to} N (0, 1) as n \to \infty .

Furthermore, the asymptotic normality of the test statistic

{\hat{T}}_{n}

at local alternatives was also established in [90] without any additional condition. Li et al. [90] also constructed the asymptotic confidence interval for target parameter

β

in Step 3 based on the asymptotic normality of

\hat{β}

, which is given as follows

\sqrt{n} (\hat{β} - β_{0}) = - {[E \{{\frac{\partial S (β, γ_{0})}{\partial β}|}_{β = β_{0}}\}]}^{- 1} \sqrt{n} S (β_{0}, γ_{0}) + o_{P} (1) \overset{d}{\to} N (0, σ_{β}^{2}) as n \to \infty,

where

σ_{β}^{2} = {\{E (X_{i}^{2}) - ω^{T} E (X_{i} Z_{i})\}}^{- 2} σ_{β ∣ γ, 0}^{2}

, and

σ_{β ∣ γ, 0}^{2} = (σ_{ε}^{2} + β_{0}^{2} σ_{U}^{2}) \{1 - ω^{T} E (X_{i} Z_{i})\} + β_{0}^{2} E (U_{i}^{4}) + σ_{ε}^{2} σ_{U}^{2} - β_{0}^{2} σ_{U}^{4} .

4.2. Wald and Score Tests for Poisson Models

In addition to linear models, researchers have made some progress on hypothesis-testing problems for Poisson models. Jiang et al. [92] studied hypothesis-testing problems for high-dimensional Poisson measurement error models, and they proposed Wald and score tests for the linear function of regression coefficients.

Consider the following hypothesis test

H_{0} : C β_{0 M} = b ⟷ H_{1} : C β_{0 M} = b + h_{n} for some h_{n} \in R^{r},

where

C \in R^{r \times m}

is a matrix with

r \leq m

, and

β_{0 M} \in R^{m \times 1}

is a subvector of the true regression coefficient vector

β_{0} = {(β_{01}, \dots, β_{0 p})}^{T}

formed by

β_{0 j} (j \in M)

. To construct a valid test statistic, Jiang et al. [92] drew on the idea of the estimation method in [78] and suggested estimating regression coefficients under the null hypothesis by

{\hat{β}}_{p n} = arg min_{{∥ β ∥}_{1} ⩽ R_{1}, {∥ β ∥}_{2} ⩽ R_{2}} \{L (β) + p_{λ} (β_{M^{c}})\}, s . t . C β_{M} = b,

(42)

where

p_{λ} (\cdot)

is a penalty function, and

L (β)

is defined in (37). Similarly, the following estimator of

β_{0}

can be considered without assuming the null hypothesis

{\hat{β}}_{p w} = arg min_{{∥ β ∥}_{1} ⩽ R_{1}, {∥ β ∥}_{2} ⩽ R_{2}} \{L (β) + p_{λ} (β_{M^{c}})\} .

(43)

The estimators

{\hat{β}}_{p n}

and

{\hat{β}}_{p w}

can be obtained by the ADMM algorithm; for more details, please see [92]. It can be seen that optimization problems (42) and (43) can be distinguished from the method in (36) because we do not impose penalties on the components of the target parameter

β_{M}

to avoid forcing them to be zeros. Then, based on the above estimators of

β_{0}

, Jiang et al. [92] proposed the following score statistic and Wald statistic to test whether

C β_{0 M} = b

or not

\begin{matrix} T_{S} = & n {\{\frac{\partial L (\hat{β})}{\partial β^{T}}\}}_{M \cup S} A^{T} Ψ^{- 1} ({\hat{Σ}}^{r}, \hat{Q}, \hat{β}) A {\{\frac{\partial L (\hat{β})}{\partial β}\}}_{M \cup S}, \\ T_{W} = & n {(C {\hat{β}}_{pw, M} - b)}^{T} Ψ {({\hat{Σ}}^{r}, \hat{Q}, {\hat{β}}_{pw})}^{- 1} (C {\hat{β}}_{pw, M} - b), \end{matrix}

where

A = C [I_{m \times m}, 0_{m \times k}] {\hat{Q}}_{M \cup S, M \cup S}^{- 1} (\hat{β})

,

Ψ (Σ, Q, β) \equiv C [I_{m \times m}, 0_{m \times k}] Q_{M \cup S, M \cup S}^{- 1} (β) Σ_{M \cup S, M \cup S} (β) Q_{M \cup S, M \cup S}^{- 1} (β) {[I_{m \times m}, 0_{m \times k}]}^{T} C^{T},

{\hat{Σ}}^{r} (β)

and

\hat{Q} (β)

are estimators of

Σ^{r} (β)

and

Q (β) = E \{exp (β^{T} X) X X^{T}\}

, respectively, and

Σ^{r} (β) = E [{\{Y_{i} W_{i} - exp (β^{T} W_{i} - β^{T} Σ_{u} β / 2) (W_{i} - Σ_{u} β)\}}^{\otimes 2}]

is the covariance of the residuals.

Jiang et al. [92] established the consistency of

{\hat{β}}_{p n}

and

{\hat{β}}_{p w}

with

λ

larger than

O ({log p / n}^{1 / 4})

,

m = o ({log p / n}^{1 / 2})

and

s = o ({log p / n}^{1 / 2})

. Furthermore, the asymptotic distributions of the two test statistics are established; specifically, as

n \to \infty

, we have

T_{S} \overset{d}{\to} χ^{2} (r, n h_{n}^{T} Ψ^{- 1} (Σ, Q, β_{t}) h_{n}), T_{W} \overset{d}{\to} χ^{2} (r, n h_{n}^{T} Ψ^{- 1} (Σ, Q, β_{t}) h_{n}) .

Thus, we reject the null hypothesis if

T_{S} > χ_{1 - α}^{2} (r)

for the score test with the nominal significance level

α > 0

, and we reject the null hypothesis if

T_{W} > χ_{1 - α}^{2} (r)

for the Wald test, where

χ_{1 - α}^{2} (r)

is the

(1 - α)

quantile of the chi-square distribution

χ^{2} (r)

.

5. Screening Methods

As the dimensions of data become higher and higher, we often encounter ultrahigh-dimensional data. For the ultrahigh-dimensional models, we frequently reduce the dimensions using variable screening techniques and then apply other estimation or hypothesis-testing methods. The variable screening technique SIS [50] designed for ultrahigh-dimensional clean data has achieved great success and has been extended to various settings. SIS screens the variables according to the magnitudes of their marginal correlations with the response variable. Nghiem et al. [93] drew inspiration from the ideas of SIS in [50] and marginal bridge estimation in [103], and they proposed the corrected sure independence screening (SISc) method and corrected penalized marginal screening method (PMSc). Consider the following optimization problem

\begin{matrix} {\tilde{β}}_{sc} = & arg min_{β} L (β) = arg min_{β} \{\sum_{j = 1}^{p} L_{j} (β_{j})\} \\ = & arg min_{β} \{\frac{1}{n} \sum_{j = 1}^{p} [\sum_{i = 1}^{n} {(y_{i} - w_{i j} β_{j})}^{2} - σ_{j}^{2} β_{j}^{2} + p_{λ} (|β_{j}|)]\}, \end{matrix}

(44)

where

p_{λ} (\cdot)

is a penalty function, and the bridge penalty is adopted in [93]. Based on (44), Nghiem et al. [93] proposed PMSc and SISc methods. For the PMSc method, it suggested taking the selected submodel as

{\hat{S}}_{PMSc} = \{j : {\tilde{β}}_{sc, j} \neq 0\} .

Under some regularity conditions, Nghiem et al. [93] showed that

P (S \subset {\hat{S}}_{PMSc}) \to 1

. Furthermore, when

λ = 0

, we can obtain that

{\tilde{β}}_{sc, j} = \frac{\sum_{i = 1}^{n} w_{i j} y_{i}}{\sum_{i = 1}^{n} w_{i j}^{2} - n σ_{u, j}^{2}}, j = 1, \dots, p,

which measures the marginal correlation between the jth variable and the response variable. The SISc selects the variable according to the magnitude of

{\tilde{β}}_{sc, j}

. The corresponding selected set is

{\hat{S}}_{SISc} = \{1 \leq j \leq p : | {\tilde{β}}_{j} | is among the d largest of all\} .

Nghiem et al. [93] proved that

P (S \subset {\hat{S}}_{SISc}) = 1 - O {p exp (- C n)}

for some constant

C > 0

under some regularity conditions.

6. Conclusions

With the advent of the big data era, high-dimensional measurement error data have proliferated in various fields. Over the past few years, many statistical inference methods for high-dimensional measurement error regression models have been developed to overcome the difficulties in scientific research and provide effective approaches for tackling problems in applications. This paper reviews the research advances in estimation and hypothesis testing methods for high-dimensional measurement error models as well as variable screening methods for ultrahigh-dimensional measurement error models. The aforementioned estimation methods can be classified into the following three categories: (i) methods based on a nonconvex objective function with restrictions on the regression coefficients, such as the nonconvex Lasso and the estimation method for Poisson models in [78]; (ii) methods with a convex objective function including CoCoLasso, the balanced estimation method and the CaZnRLS estimation method; (iii) methods that draw on the idea of a Dantzig selector, such as the MU selector, compensated MU selector, GMU selector, and conic-programming-based estimation method. Many methods are now available in R packages “hdme” and “BDcocolasso”. Thus, we can apply these methods to analyze high-dimensional measurement error data. For the use of estimation methods, it is recommended to use CoCoLasso and balanced estimation methods due to their operability. If a higher computational efficiency is required, the CaZnRLS estimation method can be considered. If covariates are only partially corrupted by measurement errors, it is better to apply BDCoCoLasso.

Due to the prevalence of high-dimensional measurement error data in daily life and the growing demand for the statistical inference methods of measurement error regression models in applications, the related research is still one of the crucial aspects in statistical research. At present, the statistical inference methods and the theoretical system of high-dimensional measurement error models are far from complete. To the best of our knowledge, the study of high-dimensional measurement error regression models is currently limited to linear models and generalized linear models. However, it is common that covariates and response variables show a complicated relationship rather than a simple linear relationship in practice. Therefore, in order to meet the urgent needs of applications, it is necessary to develop more general statistical inference methods for high-dimensional nonlinear measurement error models. Further research in this area includes the following aspects.

Existing estimation methods for high-dimensional measurement error regression models are mainly for linear or generalized linear models. Therefore, it is urgent to develop estimation methods for nonlinear models with high-dimensional measurement error data such as nonparametric and semiparametric models.
Existing works mainly focus on independent and identically distributed data. It is worthwhile to extend the estimation and hypothesis-testing methods to measurement error models with complex data such as panel data and functional data.
In most studies of high-dimensional measurement error models, it is assumed that the covariance structure of the measurement errors is specific or the covariance matrix of measurement errors is known. Thus, it is a challenging problem to develop estimation and hypothesis-testing methods in the case that the covariance matrix of measurement errors is completely unknown.

Author Contributions

Conceptualization, G.L.; methodology, J.L.; validation, L.Y.; formal analysis, G.L.; investigation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, G.L. and L.Y.; supervision, G.L.; project administration, G.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (grant numbers: 12271046, 11971001, 12131006 and 12001277).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors sincerely thank the editor, the associate editor, and two reviewers for their constructive comments that have led to a substantial improvement of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SIMEX	Simulation–extrapolation
SCAD	Smoothly clipped absolute deviation
SICA	Smooth integration of counting and absolute deviation
MCP	Minimax concave penalty
SIS	Sure independence screening
CoCoLasso	Convex conditioned Lasso
CaZnRLS	Calibrated zero-norm regularized least squares
MU	Matrix uncertainty
MEBoost	Measurement error boosting
SIMSELEX	Simulation--selection--extrapolation
IRO	Imputation-regularized optimization
FDR	False discovery rate
PMSc	Corrected penalized marginal screening
SISc	Corrected sure independence screening
ADMM	Alternating direction method of multipliers
BDCoCoLasso	Block coordinate descent convex conditioned Lasso
MPEC	Mathematical program with equilibrium constraints
GEP–MSCRA	Multi-stage convex relaxation approach
GMU	Generalized matrix uncertainty

References

Liang, H.; Härdle, W.; Carroll, R.J. Estimation in a semiparametric partially linear errors-in-variables model. Ann. Stat. 1999, 27, 1519–1535. [Google Scholar] [CrossRef]
Cook, J.; Stefanski, L.A. Simulation-extrapolation estimation in parametric measurement error models. J. Am. Stat. Assoc. 1994, 89, 1314–1328. [Google Scholar] [CrossRef]
Carroll, R.J.; Lombard, F.; Kuchenhoff, H.; Stefanski, L.A. Asymptotics for the SIMEX estimator in structural measurement error models. J. Am. Stat. Assoc. 1996, 91, 242–250. [Google Scholar] [CrossRef]
Fan, J.Q.; Truong, Y.K. Nonparametric regression with errors in variables. Ann. Stat. 1993, 21, 1900–1925. [Google Scholar] [CrossRef]
Cui, H.J.; Chen, S.X. Empirical likelihood confidence region for parameter in the errors-in-variables models. J. Multivar. Anal. 2003, 84, 101–115. [Google Scholar] [CrossRef] [Green Version]
Cui, H.J.; Kong, E.F. Empirical likelihood confidence region for parameters in semi-linear errors-in-variables models. Scand. J. Stat. 2006, 33, 153–168. [Google Scholar] [CrossRef] [Green Version]
Cheng, C.L.; Tsai, J.R.; Schneeweiss, H. Polynomial regression with heteroscedastic measurement errors in both axes: Estimation and hypothesis testing. Stat. Methods Med. Res. 2019, 28, 2681–2696. [Google Scholar] [CrossRef]
He, X.M.; Liang, H. Quantile regression estimates for a class of linear and partially linear errors-in-variables models. Stat. Sin. 2000, 10, 129–140. [Google Scholar]
Carroll, R.J.; Delaigle, A.; Hall, P. Nonparametric prediction in measurement error models. J. Am. Stat. Assoc. 2009, 104, 993–1003. [Google Scholar] [CrossRef] [Green Version]
Jeon, J.M.; Park, B.U.; Keilegom, I.V. Nonparametric regression on lie groups with measurement errors. Ann. Stat. 2022, 50, 2973–3008. [Google Scholar] [CrossRef]
Chen, L.P.; Yi, G.Y. Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron. J. Stat. 2020, 14, 4054–4109. [Google Scholar] [CrossRef]
Shi, P.X.; Zhou, Y.C.; Zhang, A.R. High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis. Biometrika 2022, 109, 405–420. [Google Scholar] [CrossRef]
Li, B.; Yin, X.R. On surrogate dimension reduction for measurement error regression: An invariance law. Ann. Stat. 2007, 35, 2143–2172. [Google Scholar] [CrossRef]
Staudenmayer, J.; Buonaccorsi, J.P. Measurement error in linear autoregressive models. J. Am. Stat. Assoc. 2005, 100, 841–852. [Google Scholar] [CrossRef]
Wei, Y.; Carroll, R.J. Quantile regression with measurement error. J. Am. Stat. Assoc. 2009, 104, 1129–1143. [Google Scholar] [CrossRef] [Green Version]
Liang, H.; Li, R.Z. Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 2009, 104, 234–248. [Google Scholar] [CrossRef] [Green Version]
Hall, P.; Ma, Y.Y. Testing the suitability of polynomial models in errors-in-variables problems. Ann. Stat. 2007, 35, 2620–2638. [Google Scholar] [CrossRef]
Hall, P.; Ma, Y.Y. Semiparametric estimators of functional measurement error models with unknown error. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2007, 69, 429–446. [Google Scholar] [CrossRef]
Ma, Y.Y.; Carroll, R.J. Locally efficient estimators for semiparametric models with measurement error. J. Am. Stat. Assoc. 2006, 101, 1465–1474. [Google Scholar] [CrossRef]
Ma, Y.Y.; Li, R.Z. Variable selection in measurement error models. Bernoulli 2010, 16, 274–300. [Google Scholar] [CrossRef] [Green Version]
Ma, Y.Y.; Hart, J.D.; Janicki, R.; Carroll, R.J. Local and omnibus goodness-of-fit tests in classical measurement error models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2011, 73, 81–98. [Google Scholar] [CrossRef] [Green Version]
Wang, L.Q. Estimation of nonlinear models with Berkson measurement errors. Ann. Stat. 2004, 32, 2559–2579. [Google Scholar] [CrossRef] [Green Version]
Nghiem, L.H.; Byrd, M.C.; Potgieter, C.J. Estimation in linear errors-in-variables models with unknown error distribution. Biometrika 2020, 107, 841–856. [Google Scholar] [CrossRef]
Pan, W.Q.; Zeng, D.L.; Lin, X.H. Estimation in semiparametric transition measurement error models for longitudinal data. Biometrics 2009, 65, 728–736. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhang, J.; Zhou, Y. Calibration procedures for linear regression models with multiplicative distortion measurement errors. Braz. J. Probab. Stat. 2020, 34, 519–536. [Google Scholar] [CrossRef]
Zhang, J. Estimation and variable selection for partial linear single-index distortion measurement errors models. Stat. Pap. 2021, 62, 887–913. [Google Scholar] [CrossRef]
Wang, L.Q.; Hsiao, C. Method of moments estimation and identifiability of semiparametric nonlinear errors-in-variables models. J. Econom. 2011, 165, 30–44. [Google Scholar] [CrossRef]
Schennach, S.M.; Hu, Y.Y. Nonparametric identification and semiparametric estimation of classical measurement error models without side information. J. Am. Stat. Assoc. 2013, 108, 177–186. [Google Scholar] [CrossRef] [Green Version]
Zhang, X.Y.; Ma, Y.Y.; Carroll, R.J. MALMEM: Model averaging in linear measurement error models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2019, 81, 763–779. [Google Scholar] [CrossRef]
Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models, 2nd ed.; Chapman and Hall: New York, NY, USA, 2006. [Google Scholar]
Cheng, C.L.; Van Ness, J.W. Statistical Regression With Measurement Error; Oxford University Press: New York, NY, USA, 1999. [Google Scholar]
Fuller, W.A. Measurement Error Models; John Wiley & Sons: New York, NY, USA, 1987. [Google Scholar]
Li, G.R.; Zhang, J.; Feng, S.Y. Modern Measurement Error Models; Science Press: Beijing, China, 2016. [Google Scholar]
Yi, G.Y. Statistical Analysis with Measurement Error or Misclassification; Springer: New York, NY, USA, 2017. [Google Scholar]
Yi, G.Y.; Delaigle, A.; Gustafson, P. Handbook of Measurement Error Models; Chapman and Hall: New York, NY, USA, 2021. [Google Scholar]
Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef] [Green Version]
Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
Candès, E.J.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351. [Google Scholar]
Lv, J.C.; Fan, Y.Y. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 2009, 37, 3498–3528. [Google Scholar] [CrossRef]
Zhang, C.-H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef] [Green Version]
Fan, J.Q.; Lv, J.C. A selective overview of variable selection in high dimensional feature space. Stat. Sin. 2010, 20, 101–148. [Google Scholar]
Wu, Y.N.; Wang, L. A survey of tuning parameter selection for high-dimensional regression. Annu. Rev. Stat. Its Appl. 2020, 7, 209–226. [Google Scholar] [CrossRef] [Green Version]
Yang, E.; Lozano, A.C.; Ravikumar, P. Elementary estimators for high-dimensional linear regression. In Proceedings of the 31st International Conference on International Conference on Machine Learning, Beijing, China, 21 June 2014. [Google Scholar]
Kuchibhotla, A.K.; Kolassa, J.E.; Kuffner, T.A. Post-selection inference. Annu. Rev. Stat. Its Appl. 2022, 9, 505–527. [Google Scholar] [CrossRef]
Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer: Heidelberg, Germany, 2011. [Google Scholar]
Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; Taylor & Francis Group, CRC: Boca Raton, FL, USA, 2015. [Google Scholar]
Fan, J.Q.; Li, R.Z.; Zhang, C.-H.; Zou, H. Statistical Foundations of Data Science; Chapman and Hall: Boca Raton, FL, USA, 2020. [Google Scholar]
Fan, J.Q.; Lv, J.C. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2008, 70, 849–911. [Google Scholar] [CrossRef] [Green Version]
Barut, E.; Fan, J.Q.; Verhasselt, A. Conditional sure independence screening. J. Am. Stat. Assoc. 2016, 111, 1266–1277. [Google Scholar] [CrossRef] [PubMed]
Fan, J.Q.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604. [Google Scholar] [CrossRef]
Fan, J.Q.; Feng, Y.; Song, R. Nonparametric independence screening in sparse ultrahigh-dimensional additive models. J. Am. Stat. Assoc. 2011, 106, 544–557. [Google Scholar] [CrossRef] [Green Version]
Li, G.R.; Peng, H.; Zhang, J.; Zhu, L.X. Robust rank correlation based screening. Ann. Stat. 2012, 40, 1846–1877. [Google Scholar] [CrossRef] [Green Version]
Ma, S.J.; Li, R.Z.; Tsai, C.L. Variable screening via quantile partial correlation. J. Am. Stat. Assoc. 2017, 112, 650–663. [Google Scholar] [CrossRef]
Pan, W.L.; Wang, X.Q.; Xiao, W.N.; Zhu, H.T. A generic sure independence screening procedure. J. Am. Stat. Assoc. 2019, 114, 928–937. [Google Scholar] [CrossRef]
Tong, Z.X.; Cai, Z.R.; Yang, S.S.; Li, R.Z. Model-free conditional feature screening with FDR control. J. Am. Stat. Assoc. 2022, in press. [Google Scholar] [CrossRef]
Wen, C.H.; Pan, W.L.; Huang, M.; Wang, X.Q. Sure independence screening adjusted for confounding covariates with ultrahigh dimensional data. Stat. Sin. 2018, 28, 293–317. [Google Scholar]
Wang, L.M.; Li, X.X.; Wang, X.Q.; Lai, P. Unified mean-variance feature screening for ultrahigh-dimensional regression. Comput. Stat. 2022, 37, 1887–1918. [Google Scholar] [CrossRef]
Zhao, S.F.; Fu, G.F. Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation. J. Multivar. Anal. 2022, 192, 105081. [Google Scholar] [CrossRef]
Purdom, E.; Holmes, S.P. Error distribution for gene expression data. Stat. Appl. Genet. Mol. Biol. 2005, 4, 16. [Google Scholar] [CrossRef] [Green Version]
Slijepcevic, S.; Megerian, S.; Potkonjak, M. Location errors in wireless embedded sensor networks: Sources, models, and effects on applications. Mob. Comput. Commun. Rev. 2002, 6, 67–78. [Google Scholar] [CrossRef]
Loh, P.-L.; Wainwright, M.J. High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Stat. 2012, 40, 1637–1664. [Google Scholar] [CrossRef]
Datta, A.; Zou, H. CoCoLasso for high-dimensional error-in-variables regression. Ann. Stat. 2017, 45, 2400–2426. [Google Scholar] [CrossRef] [Green Version]
Zheng, Z.M.; Li, Y.; Yu, C.X.; Li, G.R. Balanced estimation for high-dimensional measurement error models. Comput. Stat. Data Anal. 2018, 126, 78–91. [Google Scholar] [CrossRef]
Zhang, J.; Li, Y.; Zhao, N.; Zheng, Z.M. L₀ regularization for high-dimensional regression with corrupted data. Commun. Stat. Theory Methods 2022, in press. [Google Scholar] [CrossRef]
Tao, T.; Pan, S.H.; Bi, S.J. Calibrated zero-norm regularized LS estimator for high-dimensional error-in-variables regression. Stat. Sin. 2018, 31, 909–933. [Google Scholar] [CrossRef]
Rosenbaum, M.; Tsybakov, A. Sparse recovery under matrix uncertainty. Ann. Stat. 2010, 38, 2620–2651. [Google Scholar] [CrossRef]
Rosenbaum, M.; Tsybakov, A. Improved matrix uncertainty selector. Probab. Stat. Back-High-Dimens. Model. Processes 2013, 9, 276–290. [Google Scholar]
Sørensen, Ø.; Hellton, K.H.; Frigessi, A.; Thoresen, M. Covariate selection in high-dimensional generalized linear models with measurement error. J. Comput. Graph. Stat. 2018, 27, 739–749. [Google Scholar] [CrossRef]
Sørensen, Ø.; Frigessi, A.; Thoresen, M. Measurement error in Lasso: Impact and likelihood bias correction. Stat. Sin. 2015, 25, 809–829. [Google Scholar] [CrossRef] [Green Version]
Belloni, A.; Rosenbaum, M.; Tsybakov, A.B. Linear and conic programming estimators in high dimensional errors-in-variables models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2017, 79, 939–956. [Google Scholar] [CrossRef] [Green Version]
Romeo, G.; Thoresen, M. Model selection in high-dimensional noisy data: A simulation study. J. Stat. Comput. Simul. 2019, 89, 2031–2050. [Google Scholar] [CrossRef]
Brown, B.; Weaver, T.; Wolfson, J. Meboost: Variable selection in the presence of measurement error. Stat. Med. 2019, 38, 2705–2718. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Nghiem, L.H.; Potgieter, C.J. Simulation-selection-extrapolation: Estimation in high-dimensional errors-in-variables models. Biometrics 2019, 75, 1133–1144. [Google Scholar] [CrossRef] [PubMed]
Li, X.; Wu, D.Y. Minimax rates of l_p-losses for high-dimensional linear errors-in-variables models over l_q-balls. Entropy 2021, 23, 722. [Google Scholar] [CrossRef]
Bai, Y.X.; Tian, M.Z.; Tang, M.-L.; Lee, W.-Y. Variable selection for ultra-high dimensional quantile regression with missing data and measurement error. Stat. Methods Med. Res. 2021, 30, 129–150. [Google Scholar] [CrossRef]
Jiang, F.; Ma, Y.Y. Poisson regression with error corrupted high dimensional features. Stat. Sin. 2022, 32, 2023–2046. [Google Scholar] [CrossRef]
Byrd, M.; McGee, M. A simple correction procedure for high-dimensional generalized linear models with measurement error. arXiv 2019, arXiv:1912.11740. [Google Scholar]
Liang, F.M.; Jia, B.C.; Xue, J.N.; Li, Q.Z.; Luo, Y. An imputation–regularized optimization algorithm for high dimensional missing data problems and beyond. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2018, 80, 899–926. [Google Scholar] [CrossRef]
van de Geer, S.; Bühlmann, P.; Ritov, Y.; Dezeure, R. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 2014, 42, 1166–1202. [Google Scholar] [CrossRef]
Zhang, C.-H.; Zhang, S.S. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2014, 76, 217–242. [Google Scholar] [CrossRef] [Green Version]
Ma, S.J.; Carroll, R.J.; Liang, H.; Xu, S.Z. Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates. Ann. Stat. 2015, 43, 2102–2131. [Google Scholar] [CrossRef] [PubMed]
Dezeure, R.; Bühlmann, P.; Meier, L.; Meinshausen, N. High-dimensional inference: Confidence intervals, p-values and R-software hdi. Stat. Sci. 2015, 30, 533–558. [Google Scholar] [CrossRef] [Green Version]
Ning, Y.; Liu, H. A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann. Stat. 2017, 45, 158–195. [Google Scholar] [CrossRef]
Zhang, X.Y.; Cheng, G. Simultaneous inference for high-dimensional linear models. J. Am. Stat. Assoc. 2017, 112, 757–768. [Google Scholar] [CrossRef] [Green Version]
Vandekar, S.N.; Reiss, P.T.; Shinohara, R.T. Interpretable high-dimensional inference via score projection with an application in neuroimaging. J. Am. Stat. Assoc. 2019, 114, 820–830. [Google Scholar] [CrossRef]
Ghosh, S.; Tan, Z.Q. Doubly robust semiparametric inference using regularized calibrated estimation with high-dimensional data. Bernoulli 2022, 28, 1675–1703. [Google Scholar] [CrossRef]
Belloni, A.; Chernozhukov, V.; Kaul, A. Confidence bands for coefficients in high dimensional linear models with error-in-variables. arXiv 2017, arXiv:1703.00469. [Google Scholar]
Li, M.Y.; Li, R.Z.; Ma, Y.Y. Inference in high dimensional linear measurement error models. J. Multivar. Anal. 2021, 184, 104759. [Google Scholar] [CrossRef]
Huang, X.D.; Bao, N.N.; Xu, K.; Wang, G.P. Variable selection in high-dimensional error-in-variables models via controlling the false discovery proportion. Commun. Math. Stat. 2022, 10, 123–151. [Google Scholar] [CrossRef]
Jiang, F.; Zhou, Y.Q.; Liu, J.X.; Ma, Y.Y. On high dimensional Poisson models with measurement error: Hypothesis testing for nonlinear nonconvex optimization. Ann. Stat. 2023, 51, 233–259. [Google Scholar] [CrossRef]
Nghiem, L.H.; Hui, F.K.C.; Müller, S.; Welsh, A.H. Screening methods for linear errors-in-variables models in high dimensions. Biometrics 2023, 79, 926–939. [Google Scholar] [CrossRef] [PubMed]
Sørensen, Ø. hdme: High-dimensional regression with measurement error. J. Open Source Softw. 2019, 4, 1404. [Google Scholar] [CrossRef] [Green Version]
Duchi, J.; Shalev-Shwartz, S.; Singer, Y.; Chandra, T. Efficient projections onto the l₁-ball for learning in high dimensions. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 5–9 July 2008. [Google Scholar]
Agarwal, A.; Negahban, S.; Wainwright, M.J. Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Stat. 2012, 40, 2452–2482. [Google Scholar] [CrossRef]
Chen, Y.D.; Caramanis, C. Noisy and missing data regression: Distribution-oblivious support recovery. J. Mach. Learn. Res. 2013, 28, 383–391. [Google Scholar]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef] [Green Version]
Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef]
Escribe, C.; Lu, T.Y.; Keller-Baruch, J.; Forgetta, V.; Xiao, B.W.; Richards, J.B.; Bhatnagar, S.; Oualkacha, K.; Greenwood, C.M.T. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genet. Epidemiol. 2021, 45, 874–890. [Google Scholar] [CrossRef]
James, G.M.; Radchenko, P. A generalized Dantzig selector with shrinkage tuning. Biometrika 2009, 96, 323–337. [Google Scholar] [CrossRef] [Green Version]
Huang, J.; Horowitz, J.L.; Ma, S.G. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Stat. 2008, 36, 587–613. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luo, J.; Yue, L.; Li, G. Overview of High-Dimensional Measurement Error Regression Models. Mathematics 2023, 11, 3202. https://doi.org/10.3390/math11143202

AMA Style

Luo J, Yue L, Li G. Overview of High-Dimensional Measurement Error Regression Models. Mathematics. 2023; 11(14):3202. https://doi.org/10.3390/math11143202

Chicago/Turabian Style

Luo, Jingxuan, Lili Yue, and Gaorong Li. 2023. "Overview of High-Dimensional Measurement Error Regression Models" Mathematics 11, no. 14: 3202. https://doi.org/10.3390/math11143202

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Overview of High-Dimensional Measurement Error Regression Models

Abstract

1. Introduction

2. Estimation Methods for Linear Models

2.1. Nonconvex Lasso

2.2. Convex Conditioned Lasso

2.3. Balanced Estimation

2.4. Calibrated Zero-Norm Regularized Least Square Estimation

2.5. Linear and Conic Programming Estimation

3. Estimation Methods for Generalized Linear Models

3.1. Estimation Method for Poisson Models

3.2. Generalized Matrix Uncertainty Selector

4. Hypothesis Testing Methods

4.1. Corrected Decorrelated Score Test

4.2. Wald and Score Tests for Poisson Models

5. Screening Methods

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI