2.1. Quantile Regression
Consider a dataset of $n$ independent subjects. For the $i$th subject, let $y_i$ be the response, while $x_i = (1, x_{i1}, \ldots, x_{ir})^{\top}$ is an $(r+1) \times 1$ predictor vector. A simple linear regression model is defined as follows:
$$y_i = x_i^{\top}\beta + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_r)^{\top}$ is the regression coefficient vector with $\beta_0$ corresponding to the intercept term, and $\varepsilon_i$ represents the error term with unknown distribution. It is usual to assume that the $\tau$th quantile of the random error term is 0; that is, $P(\varepsilon_i \le 0 \mid x_i) = \tau$ for $\tau \in (0, 1)$. According to this assumption, the $\tau$th quantile regression form of model (1) is specified as follows:
$$Q_{y_i}(\tau \mid x_i) = x_i^{\top}\beta(\tau), \qquad (2)$$
where $Q_{y_i}(\tau \mid x_i) = F_{y_i}^{-1}(\tau \mid x_i)$ is the inverse cumulative distribution function of $y_i$, given $x_i$, evaluated at $\tau$. The estimate of the regression coefficient vector $\beta(\tau)$ in Equation (2) is
$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}(y_i - x_i^{\top}\beta), \qquad (3)$$
where the loss function $\rho_{\tau}(u) = u\{\tau - I(u < 0)\}$ with the indicator function $I(\cdot)$.
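For concreteness, the following minimal sketch (ours, not from the paper; the data-generating step and all variable names are illustrative assumptions) estimates $\hat{\beta}(\tau)$ by numerically minimizing the check loss in Equation (3):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def fit_quantile_regression(X, y, tau):
    """Minimize sum_i rho_tau(y_i - x_i' beta); X carries the intercept column."""
    objective = lambda beta: np.sum(check_loss(y - X @ beta, tau))
    beta0 = np.zeros(X.shape[1])
    # The loss is convex but nondifferentiable at 0, so use a derivative-free method.
    return minimize(objective, beta0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)
print(fit_quantile_regression(X, y, tau=0.5))  # roughly [1.0, 2.0]
```

At scale, a dedicated solver (for instance, the linear-programming formulation used by statsmodels' QuantReg) is preferable; the sketch is only meant to make the loss in Equation (3) concrete.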
In light of [5,24], minimizing Equation (3) is equivalent to maximizing the likelihood of $n$ independent individuals with the $i$th one distributed as an asymmetric Laplace distribution (ALD), specified as
$$f(y_i \mid \mu_i, \sigma, \tau) = \frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{y_i - \mu_i}{\sigma}\right)\right\}, \qquad (4)$$
where the location parameter $\mu_i = x_i^{\top}\beta$, the scale parameter $\sigma > 0$, and the skewness parameter $\tau$ is between 0 and 1; obviously, the ALD reduces to a symmetric Laplace distribution when $\tau = 0.5$. However, it is computationally infeasible to carry out statistical inference based directly on Equation (4), which involves the nondifferentiable point $y_i = \mu_i$. Following [25], Equation (4) can be rewritten in the following hierarchical fashion:
$$y_i = x_i^{\top}\beta + k_1 e_i + k_2\sqrt{\sigma e_i}\, z_i, \quad z_i \sim N(0, 1), \quad e_i \sim \mathrm{Exp}(\sigma), \qquad (5)$$
where $k_1 = \frac{1 - 2\tau}{\tau(1-\tau)}$, $k_2^2 = \frac{2}{\tau(1-\tau)}$, $z_i$ and $e_i$ are mutually independent, and $\mathrm{Exp}(\sigma)$ denotes the exponential distribution with mean $\sigma$, whose specific density function is $f(e_i) = \sigma^{-1}\exp(-e_i/\sigma)$ for $e_i > 0$. Equation (5) illustrates that an asymmetric Laplace distribution can also be represented as a mixture of exponential and standard normal distributions, which allows us to express a quantile regression model as a normal regression model, in which the response has the following conditional distribution:
$$y_i \mid e_i \sim N\!\left(x_i^{\top}\beta + k_1 e_i,\; k_2^2\,\sigma e_i\right).$$
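As a quick numerical illustration (ours, not the paper's), the following sketch samples from the hierarchy in Equation (5), with $k_1$ and $k_2$ as defined above, and confirms the defining ALD property that the $\tau$th quantile of $y_i$ equals $\mu_i$, i.e., $P(y_i \le \mu_i) = \tau$:

```python
import numpy as np

rng = np.random.default_rng(1)
tau, sigma, mu = 0.3, 2.0, 1.5
k1 = (1 - 2 * tau) / (tau * (1 - tau))
k2 = np.sqrt(2 / (tau * (1 - tau)))

m = 1_000_000
e = rng.exponential(scale=sigma, size=m)   # e_i ~ Exp with mean sigma
z = rng.normal(size=m)                     # z_i ~ N(0, 1)
y = mu + k1 * e + k2 * np.sqrt(sigma * e) * z

print(np.mean(y <= mu))  # approximately tau = 0.3
```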
For the above-defined quantile regression model with a high-dimensional covariate vector ($r$ is large), it is of interest to estimate the parameter vector and to identify the critical covariates. To this end, we consider Bayesian quantile regression based on the spike-and-slab lasso, as follows:
2.2. Bayesian Quantile Regression Based on a Spike-and-Slab Lasso
As early as 2016, Xi et al. [23] applied a spike-and-slab prior to Bayesian quantile regression, but their proposed prior was a mixture of a point mass at zero and a normal distribution with large variance, and the estimate of the posterior density was obtained using a Gibbs sampler. To offer novel theoretical insights into a class of continuous spike-and-slab priors, Rockova (2018) [26] introduced a novel family of spike-and-slab priors, namely a mixture of two density functions weighted by the spike and slab probabilities. In this paper, we adopt a spike-and-slab lasso prior, a mixture of two Laplace distributions with large and small variance, respectively [26], which facilitates the variational Bayesian technique for approximating the posterior density of the parameters and improves the efficiency of the algorithm. In light of ref. [26], given the indicator $\gamma_j = 0$ or 1, the prior of the regression coefficient $\beta_j$ in the Bayesian quantile regression model (5) can be written as
$$\pi(\beta_j \mid \gamma_j) = \gamma_j\,\psi(\beta_j \mid \lambda_1) + (1 - \gamma_j)\,\psi(\beta_j \mid \lambda_0), \quad j = 1, \ldots, r, \qquad (6)$$
where the Laplace density $\psi(\beta \mid \lambda) = \frac{\lambda}{2}\exp(-\lambda|\beta|)$, with precision parameters $\lambda_0$ and $\lambda_1$ satisfying $\lambda_0 \gg \lambda_1$. Given the indicator variable set $\gamma = (\gamma_1, \ldots, \gamma_r)^{\top}$, the $j$th variable is active when $\gamma_j = 1$, and inactive otherwise. Similarly to [27], the Laplace distribution for the regression coefficient $\beta_j$ can be represented as a mixture of a normal distribution and an exponential distribution; specifically, the distribution of $\beta_j$ can be expressed as a hierarchical structure, as follows:
$$\beta_j \mid s_j \sim N(0, s_j), \quad p(s_j \mid \gamma_j, \lambda_0^2, \lambda_1^2) = \gamma_j\,\frac{\lambda_1^2}{2}\exp\!\left(-\frac{\lambda_1^2 s_j}{2}\right) + (1 - \gamma_j)\,\frac{\lambda_0^2}{2}\exp\!\left(-\frac{\lambda_0^2 s_j}{2}\right), \quad \gamma_j \mid \pi_0 \sim \mathrm{Bernoulli}(\pi_0), \qquad (7)$$
where $\mathrm{Bernoulli}(\pi_0)$ denotes the Bernoulli distribution, with $\pi_0$ being the probability that the indicator variable $\gamma_j$ equals one for $j = 1, \ldots, r$; we specify the prior of $\pi_0$ as a Beta distribution $\mathrm{Beta}(a_{\pi}, b_{\pi})$ with hyperparameters $a_{\pi}$ and $b_{\pi}$. The precision parameters $\lambda_0^2$ and $\lambda_1^2$ are regularization parameters used to identify important variables, for which we consider the following conjugate priors:
$$\lambda_0^2 \sim \mathrm{Gamma}(a_0, b_0), \quad \lambda_1^2 \sim \mathrm{Gamma}(a_1, b_1),$$
where $\mathrm{Gamma}(a, b)$ denotes the gamma distribution with shape parameter $a$ and scale parameter $b$. As mentioned above, $\lambda_0$ and $\lambda_1$ should satisfy $\lambda_0 \gg \lambda_1$; to this end, we select the hyperparameters $(a_0, b_0)$ and $(a_1, b_1)$ so that this constraint holds in expectation. The prior of the scale parameter $\sigma$ in (5) is an inverse gamma distribution $\mathrm{IG}(a_{\sigma}, b_{\sigma})$, with the hyperparameters $a_{\sigma}$ and $b_{\sigma}$ taken to be small values in this paper, leading to an almost non-informative prior.
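To make the prior concrete, the following quick simulation check (ours, with hypothetical values for $\pi_0$, $\lambda_0$, and $\lambda_1$) verifies that the hierarchy in Equation (7) marginally recovers the spike-and-slab lasso mixture of two Laplace densities in Equation (6):

```python
import numpy as np

rng = np.random.default_rng(3)
pi0, lam0, lam1 = 0.2, 10.0, 1.0   # spike precision lam0 >> slab precision lam1
m = 500_000

gamma = rng.random(m) < pi0                       # gamma_j ~ Bernoulli(pi0)
rate = np.where(gamma, lam1**2 / 2, lam0**2 / 2)  # exponential rate for s_j
s = rng.exponential(scale=1.0 / rate)             # s_j | gamma_j
beta = rng.normal(0.0, np.sqrt(s))                # beta_j | s_j ~ N(0, s_j)

# The marginal tail P(|beta| > t) should match the two-Laplace mixture tail
# pi0 * exp(-lam1 * t) + (1 - pi0) * exp(-lam0 * t).
t = 1.0
print(np.mean(np.abs(beta) > t),
      pi0 * np.exp(-lam1 * t) + (1 - pi0) * np.exp(-lam0 * t))
```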
Under the Bayesian statistical paradigm, based on the above priors and the likelihood of quantile regression, it is required to induce the posterior distribution $\pi(\theta \mid D) \propto p(\theta, D)$, where $\theta = \{\beta, e, s, \gamma, \pi_0, \lambda_0^2, \lambda_1^2, \sigma\}$, the latent variable sets $e = \{e_1, \ldots, e_n\}$, $s = \{s_1, \ldots, s_r\}$, and $\gamma = \{\gamma_1, \ldots, \gamma_r\}$, and the observed set $D = \{y, X\}$ with the response set $y = \{y_1, \ldots, y_n\}$ and covariate set $X = \{x_1, \ldots, x_n\}$. Based on the hierarchical structure (5) of the quantile regression likelihood and the hierarchical structure (7) of the spike-and-slab prior on the regression coefficient vector $\beta$, we derive the joint density
$$p(\theta, D) = \prod_{i=1}^{n} p(y_i \mid \beta, e_i, \sigma)\,p(e_i \mid \sigma)\,\prod_{j=1}^{r} p(\beta_j \mid s_j)\,p(s_j \mid \gamma_j, \lambda_0^2, \lambda_1^2)\,p(\gamma_j \mid \pi_0)\,\pi(\pi_0)\,\pi(\lambda_0^2)\,\pi(\lambda_1^2)\,\pi(\sigma).$$
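This factorization translates directly into code. The sketch below (our illustration; the hyperparameter defaults are placeholders rather than values from the paper) evaluates $\log p(\theta, D)$ termwise, following the factorization above:

```python
import numpy as np
from scipy import stats

def log_joint(y, X, beta, e, s, gamma, lam0_sq, lam1_sq, pi0, sigma, tau,
              a0=1.0, b0=1.0, a1=1.0, b1=1.0, a_pi=1.0, b_pi=1.0,
              a_sig=0.01, b_sig=0.01):  # placeholder hyperparameters
    k1 = (1 - 2 * tau) / (tau * (1 - tau))
    k2_sq = 2 / (tau * (1 - tau))
    lp = stats.norm.logpdf(y, X @ beta + k1 * e,
                           np.sqrt(k2_sq * sigma * e)).sum()   # y_i | e_i
    lp += stats.expon.logpdf(e, scale=sigma).sum()             # e_i ~ Exp, mean sigma
    lp += stats.norm.logpdf(beta, 0.0, np.sqrt(s)).sum()       # beta_j | s_j
    rate = gamma * lam1_sq / 2 + (1 - gamma) * lam0_sq / 2     # mixture rate in (7)
    lp += stats.expon.logpdf(s, scale=1.0 / rate).sum()        # s_j | gamma_j
    lp += (gamma * np.log(pi0) + (1 - gamma) * np.log(1 - pi0)).sum()
    lp += stats.gamma.logpdf(lam0_sq, a0, scale=b0)            # conjugate priors
    lp += stats.gamma.logpdf(lam1_sq, a1, scale=b1)
    lp += stats.beta.logpdf(pi0, a_pi, b_pi)
    lp += stats.invgamma.logpdf(sigma, a_sig, scale=b_sig)
    return lp
```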
Although sampling from the aforementioned posterior is straightforward, it becomes increasingly time-consuming for higher-dimensional quantile models. To tackle this issue, we develop a faster and more efficient alternative method based on variational Bayes.
2.3. Quantile Regression with a Spike-and-Slab Lasso Penalty Based on Variational Bayes
At present, the most commonly used variational Bayesian methods for approximating posterior distributions rely on mean-field approximation theory [28], which offers the highest efficiency among variational methods, especially for parameters or parameter blocks with conjugate priors. Bayesian quantile regression must take into account that the variance of each observation differs, and each response $y_i$ corresponds to a latent variable $e_i$, which results in the algorithmic efficiency of quantile regression being lower than that of ordinary mean regression. Therefore, in this paper, we use the variational Bayesian algorithm based on the mean-field approximation, which is the most efficient such algorithm, to fit the quantile regression model with the spike-and-slab lasso penalty.
Based on variational theory, we choose densities for the random variables $\theta$ from a variational family $\mathcal{Q}$, which has the same support $\Theta$ as the posterior density $\pi(\theta \mid D)$. We approximate the posterior density $\pi(\theta \mid D)$ by a variational density $q(\theta) \in \mathcal{Q}$. The variational Bayesian method seeks the optimal approximation to $\pi(\theta \mid D)$ by minimizing the Kullback-Leibler (KL) divergence between $q(\theta)$ and $\pi(\theta \mid D)$, an optimization problem that can be expressed as
$$q^{*}(\theta) = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\{q(\theta)\,\|\,\pi(\theta \mid D)\},$$
where $\mathrm{KL}\{q(\theta)\,\|\,\pi(\theta \mid D)\} = E_q[\log q(\theta)] - E_q[\log \pi(\theta \mid D)]$, which is not less than zero and equal to zero if, and only if, $q(\theta) = \pi(\theta \mid D)$. The posterior density $\pi(\theta \mid D) = p(\theta, D)/p(D)$, with the joint distribution $p(\theta, D)$ of the parameter $\theta$ and data $D$ and the marginal distribution $p(D)$ of the data. Since the KL divergence depends on $\pi(\theta \mid D)$ and hence on $p(D)$, which does not have an analytic expression for our considered model, it is rather difficult to implement the optimization problem presented above directly. It is easy to show that
$$\log p(D) = \mathrm{ELBO}(q) + \mathrm{KL}\{q(\theta)\,\|\,\pi(\theta \mid D)\},$$
in which the evidence lower bound (ELBO) is $\mathrm{ELBO}(q) = E_q[\log p(\theta, D)] - E_q[\log q(\theta)]$, with $E_q$ representing the expectation taken with respect to the variational density $q(\theta)$. Thus, minimizing $\mathrm{KL}\{q(\theta)\,\|\,\pi(\theta \mid D)\}$ is equivalent to maximizing $\mathrm{ELBO}(q)$ because $\log p(D)$ does not depend on $q(\theta)$. That is,
$$q^{*}(\theta) = \arg\max_{q \in \mathcal{Q}} \mathrm{ELBO}(q),$$
which indicates that seeking the optimal approximation to $\pi(\theta \mid D)$ becomes maximizing $\mathrm{ELBO}(q)$ over the variational family $\mathcal{Q}$. The complexity of the approximation problem is heavily related to the variational family $\mathcal{Q}$. Therefore, choosing a comparatively simple variational family $\mathcal{Q}$ over which to optimize the objective function $\mathrm{ELBO}(q)$ with respect to $q(\theta)$ is attractive.
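The decomposition $\log p(D) = \mathrm{ELBO}(q) + \mathrm{KL}$ can be checked numerically on a toy conjugate model where all three quantities are computable. The sketch below (our illustration, unrelated to the quantile model) uses $y_i \sim N(\theta, 1)$ with $\theta \sim N(0, 1)$ and a deliberately misspecified Gaussian $q$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
y = rng.normal(loc=1.0, size=n)

# Exact posterior theta | y ~ N(mu_post, v_post) and exact marginal likelihood.
v_post = 1.0 / (n + 1.0)
mu_post = v_post * y.sum()
log_pD = stats.multivariate_normal.logpdf(y, mean=np.zeros(n),
                                          cov=np.eye(n) + np.ones((n, n)))

# A deliberately imperfect variational density q(theta) = N(m, s2).
m, s2 = 0.5, 0.2
theta = rng.normal(m, np.sqrt(s2), size=100_000)
log_joint = (stats.norm.logpdf(theta, 0.0, 1.0)
             + stats.norm.logpdf(y[:, None], theta, 1.0).sum(axis=0))
elbo = np.mean(log_joint - stats.norm.logpdf(theta, m, np.sqrt(s2)))

# KL divergence between two univariate normals, in closed form.
kl = 0.5 * (s2 / v_post + (m - mu_post) ** 2 / v_post - 1.0 + np.log(v_post / s2))
print(elbo + kl, log_pD)  # the two values agree up to Monte Carlo error
```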
Following the commonly used approach for choosing a tractable variational family $\mathcal{Q}$ in variational studies, we consider the frequently used mean-field theory, which assumes that blocks of $\theta$ are mutually independent, each governed by the parameters of its own variational density. Accordingly, the variational density $q(\theta)$ is assumed to factorize across the blocks of $\theta$:
$$q(\theta) = \prod_{m=1}^{M} q_m(\theta_m), \qquad (8)$$
in which the form of each variational density $q_m(\theta_m)$ is unknown, but the above assumed factorization across components is predetermined. Moreover, the best solutions for the $q_m(\theta_m)$'s are achieved by maximizing $\mathrm{ELBO}(q)$ with respect to the variational densities $q_m(\theta_m)$ by the coordinate ascent method, where $\theta = (\theta_1^{\top}, \ldots, \theta_M^{\top})^{\top}$ and each $\theta_m$ can be either a scalar or a vector. This means that when the correlation between several unknown parameters or latent variables cannot be ignored, they should be put in the same block and merged into a single $\theta_m$.
Following the idea of the coordinate ascent method given in ref. [29], when fixing the other variational factors $q_l(\theta_l)$ for $l \neq m$, the optimal density $q_m^{*}(\theta_m)$, which maximizes $\mathrm{ELBO}(q)$ with respect to $q_m(\theta_m)$, is shown to take the form
$$q_m^{*}(\theta_m) \propto \exp\{E_{-m}[\log p(\theta, D)]\}, \qquad (9)$$
where $\log p(\theta, D)$ is the logarithm of the joint density function and $E_{-m}$ is the expectation taken with respect to the density $\prod_{l \neq m} q_l(\theta_l)$ of all blocks except $\theta_m$.
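Structurally, coordinate ascent cycles through the factors, replacing each $q_m$ by the optimal form in Equation (9) while holding the others fixed, until the ELBO stabilizes. The skeleton below is a generic sketch of that loop; the update functions are placeholders standing in for the closed-form expressions that Equation (10) supplies for our model:

```python
def cavi(params, update_fns, elbo_fn, tol=1e-6, max_iter=1000):
    """Generic coordinate-ascent variational inference (CAVI) loop: apply each
    closed-form factor update q_m* ∝ exp(E_{-m}[log p(theta, D)]) in turn,
    stopping when the increase in the ELBO falls below tol."""
    elbo_old = -float("inf")
    for _ in range(max_iter):
        for update in update_fns:      # one update per factor q_m
            params = update(params)
        elbo_new = elbo_fn(params)     # non-decreasing under exact CAVI updates
        if abs(elbo_new - elbo_old) < tol:
            break
        elbo_old = elbo_new
    return params
```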
According to Equations (8) and (9), we can derive the variational posterior for each parameter block as follows (see Appendix A for the details):
$$q^{*}(\beta_j) = N(\mu_{\beta_j}, \sigma_{\beta_j}^2), \quad q^{*}(\gamma_j) = \mathrm{Bernoulli}(\pi_{\gamma_j}), \quad q^{*}(s_j) = \mathrm{GIG}(\nu_{s_j}, a_{s_j}, b_{s_j}), \quad q^{*}(e_i) = \mathrm{GIG}(\nu_{e_i}, a_{e_i}, b_{e_i}),$$
$$q^{*}(\pi_0) = \mathrm{Beta}(a_{\pi}^{*}, b_{\pi}^{*}), \quad q^{*}(\lambda_0^2) = \mathrm{Gamma}(a_0^{*}, b_0^{*}), \quad q^{*}(\lambda_1^2) = \mathrm{Gamma}(a_1^{*}, b_1^{*}), \quad q^{*}(\sigma) = \mathrm{IG}(a_{\sigma}^{*}, b_{\sigma}^{*}), \qquad (10)$$
where $\mathrm{GIG}(\nu, a, b)$ denotes the generalized inverse Gaussian distribution; the explicit expressions for the variational parameters above are given in Appendix A. Throughout, $\beta_{-j}$ denotes the vector $\beta$ with its $j$th component deleted, and expectations of the form $E_q(\cdot)$ are taken with respect to the current variational densities.
In the derivation above, we obtained the variational posterior of each parameter block. Using the idea of coordinate-axis optimization, we can update each variational distribution iteratively until convergence. To this end, we present the variational Bayesian spike-and-slab lasso quantile regression (VBSSLQR) procedure in Algorithm 1:
Algorithm 1 Variational Bayesian spike-and-slab lasso quantile regression (VBSSLQR).
Input: Data $y$, predictors $X$, prior hyperparameters $a_0$, $b_0$, $a_1$, $b_1$, $a_{\pi}$, $b_{\pi}$, $a_{\sigma}$, $b_{\sigma}$, precision $\epsilon$, and quantile $\tau$;
Output: Optimized variational parameters $\mu_{\beta_j}$ and $\sigma_{\beta_j}^2$, for $j = 1, \ldots, r$, and the corresponding Bayesian confidence intervals.
Initialize: all variational parameters and the required expectations;
while the change in the ELBO exceeds $\epsilon$ do
    for $j = 1$ to $r$ do
        Update the variational parameters of $q(\beta_j)$ according to Equation (10).
        Update the variational parameters of $q(s_j)$ according to its variational posterior.
        Update the variational parameters of $q(\gamma_j)$ according to its variational posterior.
        Update the expectations involving $\beta_j$, $s_j$, and $\gamma_j$.
    end for
    Update the variational parameters of $q(\lambda_0^2)$ according to its variational posterior.
    Update the variational parameters of $q(\lambda_1^2)$ according to its variational posterior.
    Update the variational parameters of $q(\pi_0)$ according to its variational posterior.
    for $i = 1$ to $n$ do
        Update the variational parameters of $q(e_i)$ according to its variational posterior.
    end for
    Update the variational parameters of $q(\sigma)$ according to its variational posterior.
    Compute the ELBO.
end while
In Algorithm 1 above, $\Psi(\cdot)$ is the digamma function, and the expectations involving $e_i$ and $s_j$ are taken with respect to the generalized inverse Gaussian distribution. Thus, we assume that $x \sim \mathrm{GIG}(\nu, a, b)$, whose density is proportional to $x^{\nu - 1}\exp\{-(ax + b/x)/2\}$; then:
$$E(x^h) = \left(\frac{b}{a}\right)^{h/2} \frac{K_{\nu + h}(\sqrt{ab})}{K_{\nu}(\sqrt{ab})},$$
where $K_{\nu}(\cdot)$ represents the modified Bessel function of the second kind. Note that there is no analytic expression for the derivative of the modified Bessel function with respect to its order. Therefore, we approximate the corresponding expectation using a second-order Taylor expansion. This paper lists the expectations of some parameter functions with respect to the variational posteriors involved in Algorithm 1; see Appendix B for details.
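For reference, the moment formula above can be evaluated directly with standard special-function routines. The following sketch (ours) computes $E(x)$ and $E(x^{-1})$ for a GIG variate under the parameterization given above, the two expectations that the $e_i$ and $s_j$ updates require:

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moment(h, nu, a, b):
    """E[x^h] for x ~ GIG(nu, a, b) with density ∝ x^(nu-1) exp(-(a*x + b/x)/2)."""
    root = np.sqrt(a * b)
    return (b / a) ** (h / 2) * kv(nu + h, root) / kv(nu, root)

# E(x) and E(1/x) for illustrative parameter values.
print(gig_moment(1, nu=0.5, a=2.0, b=3.0), gig_moment(-1, nu=0.5, a=2.0, b=3.0))
```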
Based on our proposed VBSSLQR algorithm, in the next section we randomly generate high-dimensional data, conduct simulation studies, and compare its performance with that of other methods. Notably, the asymptotic variance of a quantile regression estimator is inversely proportional to the density of the errors at the quantile point. In cases where $n$ is small and we estimate extreme quantiles, the corresponding asymptotic variance will be large, resulting in less precise estimates [23]. Therefore, the regression coefficients are difficult to estimate at extreme quantiles, and reliable estimation there requires the sample size to be increased appropriately.