1. Introduction
With the continuous development of data collection and storage technology, data sets that exhibit high dimensionality and high correlations within blocks of variables raise new research problems in economics, finance, genomics, statistics, machine learning, and other fields, because such data require variable selection among highly correlated variables.
There has been significant research into variable selection, and many variable selection methods have been developed, such as regularized M-estimation, which includes the LASSO [1], SCAD [2], the elastic net [3], and the Dantzig selector [4]. Many references study the theoretical properties and algorithms of regularized M-estimation, including [5,6,7,8,9,10,11,12,13,14].
Most existing variable selection methods assume that the covariates are cross-sectionally weakly correlated, or even independent. However, these assumptions are easily violated in data sets that exhibit high dimensionality and high correlations within blocks of covariates, such as economic and financial data sets. For example, economic studies [15,16,17] show strong correlations within blocks of covariates. To deal with this problem, Fan et al. [18] proposed factor-adjusted variable selection for mean regression.
However, mean regression cannot fit skewed and heavy-tailed data well, and it is not robust against outliers. Koenker and Bassett [19] proposed quantile regression (QR) to model the relationship between the response $y$ and the covariates $\mathbf{x}$. Compared to mean regression, QR has two significant advantages. (i) QR can be used to model the entire conditional distribution of $y$ given $\mathbf{x}$, and thus it provides insightful information about the relationship between $y$ and $\mathbf{x}$. Let $F(y \mid \mathbf{x})$ denote the conditional distribution function of $Y$ given $\mathbf{x}$. For $\tau \in (0, 1)$, the $\tau$th conditional quantile of $Y$ given $\mathbf{x}$ is defined as $Q_{\tau}(Y \mid \mathbf{x}) = \inf\{y : F(y \mid \mathbf{x}) \ge \tau\}$. (ii) QR is robust against outliers and can be used to model responses whose distribution is skewed or heavy-tailed, without requiring a correct error assumption. These two advantages make QR an appealing method for revealing information in the data that is difficult for mean regression to capture. Readers can refer to Koenker [20] and Koenker et al. [21] for a comprehensive overview of the methods, theory, computation, and many extensions of QR.
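To make the check-loss mechanics concrete, the following minimal Python sketch (illustrative only, not code from the paper) verifies numerically that minimizing the average check loss $\rho_{\tau}(u) = u(\tau - \mathbf{1}\{u < 0\})$ over a constant recovers the sample quantile, and that the resulting estimate, unlike the mean, is unaffected by a single large outlier:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def quantile_via_check_loss(x, tau):
    """The minimizer of the average check loss over a constant is attained
    at a data point, so it suffices to search over the observations."""
    losses = [np.mean(check_loss(x - c, tau)) for c in x]
    return x[int(np.argmin(losses))]

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one large outlier
print(quantile_via_check_loss(x, 0.5))       # 3.0: matches the sample median
print(np.mean(x))                            # 22.0: the mean is dragged by the outlier
```

The contrast between the two printed values illustrates the robustness property stated in (ii): the check-loss minimizer depends on the ordering of the observations, not on the magnitude of extreme values.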
Ando and Tsay [22] proposed factor-augmented predictors for quantile regression, but their model did not contain the idiosyncratic components of the covariates, which causes a loss of explanatory-variable information. Following Fan et al. [18], we therefore propose factor-augmented regularized variable selection for quantile regression (Farvsqr) to overcome the problems caused by correlations within the covariates. As usual, we assume that the covariates of the $i$-th observation, $\mathbf{x}_i$, follow an approximate factor model,
$$\mathbf{x}_i = \mathbf{B}\mathbf{f}_i + \mathbf{u}_i,$$
where $\mathbf{f}_i$ is a $K \times 1$ vector of latent factors, $\mathbf{B}$ is a $p \times K$ loading matrix, and $\mathbf{u}_i$ is a $p \times 1$ vector of idiosyncratic components or errors that are independent of $\mathbf{f}_i$.
The factor model has become one of the most popular and powerful tools in multivariate statistics and has deeply impacted biology [23,24,25], economics, and finance [15,16,26]. Chamberlain and Rothschild [27] first proposed using principal component analysis (PCA) to solve for the latent factors and loading matrix of the approximate factor model. Subsequently, much of the literature has explored the factor model using the PCA method [28,29,30,31,32]. In this paper, we use PCA to obtain the estimators of $\mathbf{B}$, $\mathbf{f}_i$, and $\mathbf{u}_i$.
The Farvsqr procedure first estimates model (1) and obtains the independent or weakly correlated estimators of $\mathbf{f}_i$ and $\mathbf{u}_i$; we then replace the highly correlated covariates $\mathbf{x}_i$ with the estimators $\hat{\mathbf{f}}_i$ and $\hat{\mathbf{u}}_i$. The second step is to solve a common regularized loss function. In this paper, we study Farvsqr by giving the specific parameter-solving process and the theoretical properties. Moreover, both simulation and real-data application studies are presented.
The main contribution of our paper is to generalize factor-adjusted regularized variable selection from mean regression to quantile regression so as to accommodate skewed and heavy-tailed data.
Section 2 introduces the smoothed quantile regression and the approximate factor models.
Section 3 introduces the variable selection methodology of Farvsqr.
Section 4 presents the general theoretical results.
Section 5 provides simulation studies, and
Section 6 applies our model to the Quarterly Database for Macroeconomic Research (FRED-QD).
5. Simulation Study
In this section, we assess the performance of the method proposed in this paper through simulation. We compare Farvsqr with LASSO and SCAD on different simulated data.
We generate the response from the sparse linear model $y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}^* + \varepsilon_i$, where $\boldsymbol{\beta}^*$ is the vector of true coefficients and the error $\varepsilon_i$ follows one of three models:
- (i)
;
- (ii)
;
- (iii)
.
The covariates are generated from one of the following two models:
- (i)
Factor model. The covariates follow the approximate factor model (1), where the factors are generated from a stationary autoregressive model, and the loadings and idiosyncratic errors are drawn from the i.i.d. standard normal distribution.
- (ii)
Equal correlation case. We draw $\mathbf{x}_i$ i.i.d. from $N(\mathbf{0}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}$ has diagonal elements 1 and off-diagonal elements 0.4.
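Both covariate designs are straightforward to simulate. In the sketch below, the equal-correlation design uses the $\boldsymbol{\Sigma}$ stated above (unit variances, constant correlation 0.4); for the factor design, the number of factors and the AR(1) coefficient (`K=3`, `phi=0.5`) are illustrative assumptions, since the paper's exact simulation parameters are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def equal_corr_covariates(n, p, rho=0.4):
    """Draw x_i i.i.d. from N(0, Sigma), Sigma with 1 on the diagonal and rho off it."""
    Sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, p)) @ L.T

def factor_covariates(n, p, K=3, phi=0.5):
    """Covariates from the approximate factor model with AR(1) factor dynamics.
    phi = 0.5 and K = 3 are illustrative choices, not the paper's values."""
    B = rng.standard_normal((p, K))
    F = np.zeros((n, K))
    for t in range(1, n):
        F[t] = phi * F[t - 1] + rng.standard_normal(K)   # stationary AR(1) factors
    U = rng.standard_normal((n, p))                       # idiosyncratic errors
    return F @ B.T + U

X_eq = equal_corr_covariates(500, 5)
C = np.corrcoef(X_eq, rowvar=False)
off = C[~np.eye(5, dtype=bool)]
print(round(off.mean(), 2))   # close to the target correlation 0.4
```

The empirical off-diagonal correlation check confirms that the Cholesky construction reproduces the intended dependence structure.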
For the factor model, in order to comprehensively evaluate Farvsqr at a given quantile level $\tau$, we compare the influence of different sample sizes and different explanatory-variable dimensions under different error distributions. We use the estimation error $\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|_2$, the average model size, the percentage of true positives (TP), the percentage of true negatives (TN), and the elapsed time to compare Farvsqr and LASSO. The percentages of TP and TN are defined as follows:
$$\mathrm{TP} = \frac{\#\{j : \hat{\beta}_j \neq 0 \ \text{and} \ \beta^*_j \neq 0\}}{\#\{j : \beta^*_j \neq 0\}}, \qquad \mathrm{TN} = \frac{\#\{j : \hat{\beta}_j = 0 \ \text{and} \ \beta^*_j = 0\}}{\#\{j : \beta^*_j = 0\}}.$$
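Concretely, the TP and TN percentages can be computed from the estimated and true coefficient vectors as in the following small Python sketch (the numerical tolerance for "nonzero" is an implementation choice):

```python
import numpy as np

def tp_tn(beta_hat, beta_true, tol=1e-8):
    """TP: fraction of truly nonzero coefficients that are selected.
       TN: fraction of truly zero coefficients that are excluded."""
    sel = np.abs(beta_hat) > tol     # selected variables
    nz = np.abs(beta_true) > tol     # truly nonzero variables
    tp = sel[nz].mean()
    tn = (~sel[~nz]).mean()
    return tp, tn

beta_true = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
beta_hat = np.array([0.9, 0.0, 0.2, 0.0, 0.0])
# TP = 0.5 (one of two true variables missed), TN = 2/3 (one false positive).
print(tp_tn(beta_hat, beta_true))
```

A TP of 100% with a TN below 100% is exactly the pattern reported below for LASSO: the true variables are found, but redundant correlated variables are also selected.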
We compare the model performance of Farvsqr with LASSO under different error distributions and explanatory variable relationships; for each situation, we simulate 500 replications.
We first compare the models with the explanatory variable's dimensionality $p$ fixed and several sample sizes $n$. For each sample size, we simulate 500 replications and calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 1, Table 2 and Table 3. From the results, we can see that under the three error distributions, for each $\tau$ and $n$, the average estimation error of Farvsqr is smaller than that of LASSO; for example, in one setting under the normal distribution, the average estimation errors of Farvsqr and LASSO are 0.127 and 2.586, respectively. As for the average model size, almost all the values of Farvsqr are smaller than those of LASSO, with one exception. For TP, all scenarios are the same for Farvsqr and LASSO, so both methods select the true non-zero variables. For elapsed time, all the values of Farvsqr are smaller than those of LASSO, so our method is more efficient. From all of the above, Farvsqr outperforms LASSO. For every quantile $\tau$, the estimation error of Farvsqr gradually decreases as the sample size increases, but for LASSO the impact of sample size is not obvious. This may be because, for the factor model, LASSO is not appropriate, so even a larger sample size cannot remedy the defects of the LASSO method.
We then compare the models with the sample size $n$ fixed and the explanatory variable's dimensionality $p$ taking several values up to 600. For each dimensionality, we simulate 500 replications and calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 4, Table 5 and Table 6. From the results, we can see that under the three error distributions, for each $\tau$ and $p$, the average estimation error of Farvsqr is smaller than that of LASSO; for example, in one setting under the normal distribution, the average estimation errors of Farvsqr and LASSO are 0.124 and 2.059, respectively. As for the average model size, all the values of Farvsqr are smaller than those of LASSO. For TP, all scenarios are the same for Farvsqr and LASSO, so both methods select the true non-zero variables. For TN, all the values of Farvsqr are larger than those of LASSO, so LASSO tends to select redundant variables. For elapsed time, all the values of Farvsqr are smaller than those of LASSO, so our method is more efficient. From all of the above, Farvsqr outperforms LASSO. For every quantile $\tau$, the average estimation error increases with the dimension, which is consistent with common sense; however, the increase for Farvsqr is smaller than that for LASSO. For example, under the normal distribution, the values of Farvsqr are 0.124 and 0.158 at the smallest and largest dimensions, respectively, a relative increase of about 27%, while the relative increase for LASSO is much larger; therefore, LASSO is vulnerable to increases in the variable dimension.
We also compare our model with LASSO under different sample sizes and explanatory-variable dimensions for the equal correlation case. Simulating 500 replications, we calculate the average estimation error, average model size, TP, TN, and elapsed time. The results are presented in Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12. From the tables, we can see that essentially all the elapsed times of Farvsqr are shorter than those of LASSO, while its estimation error is slightly larger in most situations. For a fixed explanatory-variable dimensionality, the elapsed time gradually increases with the sample size for both Farvsqr and LASSO, but the relative increase is more significant for LASSO. For example, in one setting the elapsed times of the two methods are 0.687 and 1.099 at the smaller sample size and 1.965 and 3.856 at the larger one, a relative increase of about 186% for Farvsqr versus about 251% for LASSO. So, the efficiency of LASSO is easily affected by the sample size, and it is not appropriate for large-sample data. Thus, Farvsqr pays only a small cost in the equal correlation case.
From all the results above, we can draw the following conclusions:
- (i)
When the covariates are high-dimensional with high correlations within blocks, i.e., when the covariates are generated from the factor model, our method Farvsqr is better than LASSO on all the evaluation indicators, including the average estimation error, average model size, TP, TN, and elapsed time.
- (ii)
For the factor model, the parameter-estimation accuracy of LASSO is easily degraded as the explanatory variable's dimension increases.
- (iii)
For the equal correlation case, Farvsqr pays only a small cost.
- (iv)
In all scenarios, the efficiency of LASSO is easily affected by the sample size.
To further illustrate that our method is better for data that are high-dimensional with high correlations within blocks, we also compare our method with SCAD and reach the same conclusions as for LASSO. Here, we only report the results under the normal distribution. Table 13 and Table 14 correspond, respectively, to the fixed explanatory-variable dimensionality and the fixed sample size. Note that the Farvsqr method first replaces the highly dependent covariates with weakly dependent or uncorrelated ones via the latent factor model and then minimizes (12) with the LASSO or SCAD penalty, whereas LASSO and SCAD directly minimize Formula (5), in which the covariates are highly correlated.
6. Real Data Application
In this section, we use the quarterly U.S. macroeconomic variables in the FRED-QD database [17]. The data set includes 247 variables, and the covariates in FRED-QD are strongly correlated. We choose 88 complete observations from the first quarter of 2000 to the last quarter of 2021. FRED-QD is a quarterly economic database updated by the Federal Reserve Bank of St. Louis, which is publicly available at http://research.stlouisfed.org/econ/mccracken/sel/ (accessed on 28 June 2022). Detailed information about the data can be found on the website. In this paper, we choose the variable GDP as the response and the other 246 variables as the explanatory variables. The density of the response is shown in Figure 1. We compare the proposed Farvsqr with LASSO in variable selection, estimation, and elapsed time. The estimation performance is evaluated by $R^2$, which is defined as
$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2},$$
where $y_i$ is the observed value at time $i$, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the sample mean. We model the data at given quantile levels $\tau$, and we evaluate the models by $R^2$, model size, and elapsed time.
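The goodness-of-fit criterion used here, one minus the ratio of the squared prediction error to the total variation about the sample mean, can be computed as in the following minimal sketch with toy numbers:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: predictions compared against the sample-mean baseline."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.1, 1.9, 3.2, 3.8]))   # close to 1: predictions track y
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))   # 0.0: no better than the sample mean
```

Values near 1 indicate a good fit, 0 means the model is no better than predicting the sample mean, and negative values indicate a fit worse than the mean baseline.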
The results are presented in Table 15. From the results, we can see that the model sizes of Farvsqr are 18, 19, 38, and 38 at the four quantile levels, respectively, whereas the model sizes of LASSO are 241, 176, 207, and 222. LASSO tends to choose many correlated variables. For instance, all LASSO models include Real PCE expenditures: durable goods, Real PCE: services, Real PCE: nondurable goods, Real gross private domestic investment, Real private fixed investment, Real gross private domestic investment: fixed investment: nonresidential: equipment, and Real private fixed investment: nonresidential because of the strong correlations among them. Moreover, all LASSO models also include Number of civilians unemployed for less than 5 weeks, Number of civilians unemployed from 5 to 14 weeks, and Number of civilians unemployed from 15 to 26 weeks because of the strong correlations among them. Many other correlated variables are included by LASSO. The elapsed times of Farvsqr are 7.6209, 8.2036, 8.3589, and 8.3493 at the four quantile levels, respectively, while those of LASSO are 9.8736, 13.8031, 10.6616, and 10.1012; thus, the algorithmic efficiency of LASSO on this real data set is much lower than that of Farvsqr. This may be because LASSO selects too many redundant explanatory variables, which affects not only the estimation accuracy of the model but also the efficiency of the algorithm. In terms of $R^2$, Farvsqr is better than LASSO except at one quantile level. Therefore, Farvsqr is more suitable for this data set; more generally, for data sets with strong correlations between explanatory variables, Farvsqr is the more suitable choice.