1. Introduction
Regression analysis is a statistical technique concerned with the prediction of a response variable from one or more regressors. The standard linear regression model assumes that there is no strong correlation among the predictors, as such correlation leads to the problem of so-called multicollinearity [1]. However, in many fields of study, the regressors can be highly intercorrelated. For instance, when studying the effect of weight and age on the sugar level of a human, both predictors can be highly correlated, as weight increases with age. Similarly, if the age and price of a car are used as regressors in predicting a car's average selling time, then both predictors can be highly intercorrelated. In the presence of highly correlated regressors, ordinary least squares estimation can lead to wrong inferences. For example, the ordinary least squares (OLS) estimators become unstable in the presence of multicollinearity, as they have inflated standard errors. In the case of perfect multicollinearity, the OLS estimators cannot be estimated uniquely [2].
In such situations, a very popular and successful approach in statistical modeling is to use regularized regression techniques, such as ridge regression or the Least Absolute Shrinkage and Selection Operator (LASSO) [3,4]. The main idea behind these techniques is to introduce biased estimators by penalizing the OLS estimators, which decreases the overall standard error of the estimator. By minimizing both the empirical error and the penalty, one can find a model that fits well and is also "simple", avoiding the large variance that can occur when estimating complex models. However, ridge regression cannot generate a parsimonious model because it still retains all predictors in the model [5]. On the other hand, best-subset selection generates a sparse model, but because of its inherent discreteness, it is extremely variable, as discussed in [6]. To cope with these problems, the LASSO [4] is a good compromise. Its popularity stems in part from the regularization induced by the LASSO's $L_1$ penalty, which results in sparse solutions. The LASSO shrinks the estimated coefficient vector toward the origin (in the $L_1$ sense), with the amount of shrinkage governed by the value of k, typically setting some of the coefficients to zero. As a result, the LASSO blends the characteristics of ridge regression and subset selection, making it a useful method for variable selection. The main idea behind the LASSO is to introduce biased estimators in order to decrease the standard error of the estimator. However, bias and variance are complementary: decreasing one increases the other, and vice versa. This trade-off between bias and variance can be controlled by the LASSO parameter k, which is known as the tuning parameter. References [4,7] compared the predictive performance of the LASSO, ridge, and bridge regression and found that none of them uniformly dominated the other two [8]. Although the LASSO has proven effective in various scenarios, it has some limitations. Consider the three scenarios below:
1. In the case of $p > n$, the LASSO selects at most $n$ variables before it saturates, because of the nature of the convex optimization problem. This seems to be a limiting feature for a variable selection method. Moreover, the LASSO is not well defined unless the bound on the $L_1$-norm of the coefficients is smaller than a certain value [9].
2. The LASSO cannot perform group selection. If a group of predictors is highly correlated with one another, the LASSO tends to pick only one of them and shrink the others to zero [10] (illustrated in the sketch after this list).
3. For usual $n > p$ situations, if there are high correlations between the predictors, then it has been empirically observed that the prediction performance of the LASSO is dominated by ridge regression [4].
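As a quick illustration of the second scenario, the following sketch (assuming scikit-learn and numpy; the data are illustrative and not taken from the paper) fits the LASSO and ridge to two nearly identical predictors. The LASSO typically keeps one predictor and drops the other, while ridge splits the weight roughly evenly between them.

```python
# Minimal sketch of scenario 2: with two nearly identical predictors, the
# LASSO keeps one and shrinks the other to exactly zero, while ridge splits
# the weight between them. Illustrative data, not from the paper.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # x2 is almost perfectly correlated with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 2.0 * x2 + rng.standard_normal(n)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)
print("LASSO coefficients:", lasso.coef_)   # typically one nonzero, one exactly 0
print("Ridge coefficients:", ridge.coef_)   # weight split roughly evenly, near (2, 2)
```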
To address these limitations, several modifications of the LASSO have been proposed in the literature, namely the adaptive LASSO [10], the fused LASSO [11], the group LASSO [12], the elastic net [9], the degrees of freedom of the LASSO [13], and the square-root LASSO [14]. In addition, different researchers have proposed different methods for the estimation of the LASSO parameter k; see, for example, [15,16,17,18,19,20,21,22] and the references cited therein.
This paper aims to study the performance of the LASSO and adaptive LASSO in handling severe multicollinearity among independent variables in the context of multiple regression analysis, using Monte Carlo simulations and real-life examples. Furthermore, we propose some new estimators of the LASSO parameter k using a quantile-based approach and compare them with existing estimators to assess the performance of the proposals.
The rest of this paper is structured as follows. The general methodology, as well as the proposed methods for the estimation of the LASSO parameter k, is described in Section 2. Section 3 contains information about the simulation settings, while the simulation results are discussed in Section 4. In Section 5, the performance of the proposals, as well as that of the existing LASSO methods, is evaluated using real data. Finally, some concluding remarks are given in Section 6.
2. Methodology
Consider the following linear regression model:
$$y = X\beta + \varepsilon, \qquad (1)$$
where $y$ is an $n \times 1$ vector of the response variable, $X$ is an $n \times p$ matrix (also known as the design matrix) of the observed regressors, $\beta$ is a $p \times 1$ vector of unknown regression parameters, and $\varepsilon$ is an $n \times 1$ vector of random errors. It is assumed that $\varepsilon$ is normally distributed with a zero mean and a covariance matrix $\sigma^2 I_n$, where $I_n$ represents an identity matrix of order $n$. In general, the parameter $\beta$ is estimated using the OLS, which minimizes the following squared differences:
$$\hat{\beta}_{OLS} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2, \qquad (2)$$
where $\lVert \cdot \rVert_2$ denotes the $L_2$ norm. As a result, the OLS estimator $\hat{\beta}_{OLS}$ can be estimated as follows:
$$\hat{\beta}_{OLS} = (X'X)^{-1} X'y,$$
and its covariance can be computed as follows:
$$\operatorname{Cov}(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}.$$
Note that both the estimator and its covariance depend heavily on the matrix $X'X$. However, it is well known that in the presence of high correlation among the predictors, the matrix $X'X$ is ill-conditioned; consequently, $\hat{\beta}_{OLS}$ is highly unstable and has a large variance [5]. To cope with this issue, the LASSO is an alternative estimation procedure, defined as
$$\hat{\beta}_{L} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 + k \sum_{j=1}^{p} |\beta_j|, \qquad (3)$$
where the first term assesses the fit while the second term penalizes the parameter $\beta$. The parameter $k$ is called the LASSO parameter, and it governs the trade-off between the fit and the penalty. Thus, the choice of $k$ is an important task in conducting LASSO regression.
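To make the role of $k$ concrete, here is a minimal sketch assuming scikit-learn, whose `alpha` plays the role of $k$ up to the $1/(2n)$ scaling of the fit term in its objective; the data and true coefficients are illustrative.

```python
# Sketch of the fit-penalty trade-off in Equation (3): as the LASSO parameter
# k (scikit-learn's `alpha`, up to scaling) grows, more coefficients are set
# to exactly zero. Illustrative data, not from the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # sparse true coefficients
y = X @ beta + rng.standard_normal(n)

for k in [0.01, 0.1, 0.5, 1.0]:
    fit = Lasso(alpha=k).fit(X, y)
    print(f"k={k:4}: nonzero coefficients = {np.sum(fit.coef_ != 0)}")
```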
Different extensions have been introduced for the LASSO. For example, the adaptive LASSO seeks to minimize
$$\hat{\beta}_{AL} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 + k \sum_{j=1}^{p} \hat{w}_j |\beta_j|, \qquad (4)$$
where $k$ is the adaptive LASSO parameter, $\hat{\beta}_{AL}$ represents the estimated coefficients, and $\hat{w} = (\hat{w}_1, \ldots, \hat{w}_p)'$ is called the adaptive weights vector, which is defined as follows:
$$\hat{w}_j = \frac{1}{|\hat{\beta}_j^{(0)}|^{\gamma}},$$
where $\hat{\beta}^{(0)}$ refers to an initial estimate of the coefficients and $\gamma$ is a positive constant for adjustment of the adaptive weights vector [10].
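A common way to compute the adaptive LASSO is to absorb the weights into the design matrix and solve a plain LASSO. The sketch below, assuming scikit-learn and an OLS initial estimate, follows this route; it is one standard computational device, not the paper's own code.

```python
# Sketch of the adaptive LASSO of Equation (4) via column rescaling: absorb the
# weights w_j = 1/|beta_init_j|^gamma into X, solve a plain LASSO, scale back.
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, k=0.1, gamma=1.0):
    # Initial OLS estimate used to build the adaptive weights.
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)  # small offset avoids division by zero
    X_scaled = X / w                # dividing column j by w_j absorbs the weight
    fit = Lasso(alpha=k).fit(X_scaled, y)
    return fit.coef_ / w            # transform back to the original scale
```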
As the LASSO estimate is often a nonlinear and non-differentiable function of the response values, it is difficult to obtain a reliable estimate of its standard error. One approach is the bootstrap: either $k$ can be fixed, or we can optimize over $k$ for each bootstrap sample. Fixing $k$ is analogous to selecting the best subset and then using the least squares standard error for that subset.
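A minimal sketch of the fixed-$k$ bootstrap just described, assuming scikit-learn; the number of bootstrap samples is an illustrative choice.

```python
# Fixed-k bootstrap standard errors for the LASSO coefficients: resample rows
# of (X, y) with replacement, refit with the same k, take coefficient SDs.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_bootstrap_se(X, y, k=0.1, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    boot_coefs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample observations
        boot_coefs[b] = Lasso(alpha=k).fit(X[idx], y[idx]).coef_
    return boot_coefs.std(axis=0, ddof=1)         # bootstrap standard errors
```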
Although a closed-form solution for the LASSO estimator is not possible because the solution is nonlinear in the response variable, an approximate closed-form estimate may be derived by writing the penalty $\sum_{j=1}^{p} |\beta_j|$ as $\sum_{j=1}^{p} \beta_j^2 / |\beta_j|$. Hence, at the LASSO estimate $\tilde{\beta}$, we may approximate the solution by a ridge regression of the form $\hat{\beta}^{*} = (X'X + kW^{-})^{-1} X'y$, where $W$ is a diagonal matrix with diagonal elements $|\tilde{\beta}_j|$, $W^{-}$ denotes the generalized inverse of $W$, and $k$ is chosen so that the bound on the $L_1$-norm of the coefficients, $\sum_{j=1}^{p} |\hat{\beta}_j^{*}| = t$, is satisfied [4]. Thus, the LASSO estimation problem becomes a ridge estimation.
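This ridge approximation suggests a simple iterative scheme: refit the ridge form with $W$ recomputed from the current estimate until the coefficients stabilize. The sketch below assumes numpy; the tolerance and the floor on $|\beta_j|$ (used in place of an explicit generalized inverse) are illustrative choices, not from [4].

```python
# Iteratively reweighted ridge approximation of the LASSO: repeatedly solve
# beta <- (X'X + k * W^-)^{-1} X'y with W = diag(|beta_j|), which are the
# normal equations of Equation (3) under the quadratic approximation of |b|.
import numpy as np

def lasso_via_ridge(X, y, k=1.0, n_iter=100, tol=1e-8):
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.linalg.solve(XtX + k * np.eye(p), Xty)       # plain ridge start
    for _ in range(n_iter):
        # Flooring |beta_j| at tol makes the penalty huge for near-zero
        # coefficients, keeping them pinned at (numerically) zero.
        w_inv = 1.0 / np.maximum(np.abs(beta), tol)
        beta_new = np.linalg.solve(XtX + k * np.diag(w_inv), Xty)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta
```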
To understand how the estimator is built, suppose that there exists an orthogonal matrix $D$ such that $D'X'XD = \Lambda$ and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, where $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the eigenvalues of the matrix $X'X$. Then, the modified form of Equation (1) is
$$y = Z\alpha + \varepsilon, \qquad (5)$$
where $Z = XD$ and $\alpha = D'\beta$. Consequently, the generalized LASSO regression estimator can be written as
$$\hat{\alpha}(k) = (\Lambda + kI_p)^{-1} \Lambda \hat{\alpha}, \qquad (6)$$
where $k = \hat{\sigma}^2 / \hat{\alpha}_i^2$ and $\hat{\alpha} = (Z'Z)^{-1} Z'y$ is the OLS estimate of $\alpha$.
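The canonical form is easy to reproduce numerically. The following numpy sketch eigendecomposes $X'X$, forms $Z$ and the OLS estimate of $\alpha$, and applies the shrinkage in Equation (6) for a given $k$; function and variable names are illustrative.

```python
# Canonical-form shrinkage estimator: D'X'XD = Lambda, Z = XD, alpha = D'beta,
# then alpha(k) = (Lambda + k I)^{-1} Lambda alpha_hat, back-transformed to beta.
import numpy as np

def canonical_shrinkage_estimate(X, y, k):
    lam, D = np.linalg.eigh(X.T @ X)       # eigenvalues lam and orthogonal D
    Z = X @ D                              # transformed regressors
    alpha_ols = (Z.T @ y) / lam            # OLS estimate of alpha: (Z'Z)^{-1} Z'y
    alpha_k = lam / (lam + k) * alpha_ols  # (Lambda + kI)^{-1} Lambda alpha_hat
    return D @ alpha_k                     # back-transform to the beta scale
```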
Modified LASSO Estimators
Using Equation (6), some new modified estimators are proposed in this section. In particular, we use different percentiles, namely the 5th percentile ($P_5$), the 25th percentile ($Q_1$), the 50th percentile ($Q_2$), the 75th percentile ($Q_3$), and the 90th percentile ($P_{90}$), of the vector $H = (H_1, H_2, \ldots, H_p)'$ with $H_i = \lambda_{\max}\hat{\sigma}^2 / \hat{\alpha}_i^2$, where $\lambda_{\max}$ is the maximum eigenvalue of the matrix $C = X'X$ and $\hat{\alpha}_i$ is the $i$th element of $\hat{\alpha}$. In particular, for the vector $H$, the $q$th percentile $P_q$ is defined as the value below which $q\%$ of the ordered elements of $H$ fall or, equivalently, the $\lceil nq/100 \rceil$th order statistic of $H$. An approximate value of the percentile is obtained through a linear interpolation of the modes of the order statistics from the uniform distribution on $[0, 1]$, as the R function quantile() does [23]. Mathematically, one can use
$$P_q = H_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\left(H_{(\lceil h \rceil)} - H_{(\lfloor h \rfloor)}\right), \quad h = (n-1)\frac{q}{100} + 1,$$
where $H_{(j)}$ denotes the $j$th order statistic of $H$, $\lfloor \cdot \rfloor$ denotes the largest integer not exceeding the specified value, $\lceil \cdot \rceil$ returns the lowest integer that is greater than or equal to the given number, and $n$ is the size of the vector $H$.
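For reference, the interpolation rule above can be written out directly. The sketch below mirrors R's default quantile() (type 7), which is also numpy's default, and checks the two against each other.

```python
# Type-7 percentile (R's default quantile() and numpy's default interpolation):
# linear interpolation between the two order statistics bracketing h.
import numpy as np

def percentile_type7(v, q):
    """qth percentile (0 <= q <= 100) of vector v via linear interpolation."""
    v = np.sort(np.asarray(v, dtype=float))
    n = v.size
    h = (n - 1) * q / 100.0                    # fractional order-statistic index
    j = int(np.floor(h))
    g = h - j                                  # interpolation weight
    return v[j] if j + 1 >= n else (1 - g) * v[j] + g * v[j + 1]

# Sanity check against numpy's default (also type 7):
v = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
assert np.isclose(percentile_type7(v, 25), np.percentile(v, 25))
```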
The main reason for considering percentiles is their robustness against outliers. In particular, the following are the proposed modified estimators.
1: The first proposal to estimate $k$ is IHS1, which is defined as
$$\hat{k}_{IHS1} = P_5(H),$$
where $H = (H_1, H_2, \ldots, H_p)'$ and $P_5$ denotes the fifth percentile.
2: The second proposal is IHS2, defined as
$$\hat{k}_{IHS2} = Q_1(H),$$
where $Q_1$ denotes the first quartile.
3: To estimate $k$, the next proposal is IHS3, defined as
$$\hat{k}_{IHS3} = Q_2(H),$$
where $Q_2$ denotes the second quartile (the median).
4: The fourth proposal is
$$\hat{k}_{IHS4} = Q_3(H),$$
where $Q_3$ denotes the third quartile.
5: Next, we propose IHS5, which is defined as
$$\hat{k}_{IHS5} = P_{90}(H),$$
where $P_{90}$ denotes the 90th percentile.
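Putting the pieces together, here is a minimal numpy sketch of the five proposals under the definition of $H$ given above; the residual-variance estimator $\hat{\sigma}^2 = \mathrm{RSS}/(n-p)$ is an assumption on our part, and np.percentile reproduces the R-style (type 7) interpolation used in the definitions.

```python
# Compute the five quantile-based proposals IHS1-IHS5 from the canonical form,
# using H_i = lambda_max * sigma2_hat / alpha_hat_i^2 as defined above.
import numpy as np

def ihs_estimators(X, y):
    n, p = X.shape
    lam, D = np.linalg.eigh(X.T @ X)
    Z = X @ D
    alpha_hat = (Z.T @ y) / lam                   # OLS estimate of alpha
    resid = y - Z @ alpha_hat
    sigma2_hat = resid @ resid / (n - p)          # residual variance (assumed RSS/(n-p))
    H = lam.max() * sigma2_hat / alpha_hat**2     # elements H_i of the vector H
    qs = {"IHS1": 5, "IHS2": 25, "IHS3": 50, "IHS4": 75, "IHS5": 90}
    return {name: np.percentile(H, q) for name, q in qs.items()}
```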
It is worth mentioning that many other types of shrinkage estimators could be considered for comparison purposes (e.g., [24]). However, we restricted the comparison of our proposed LASSO estimators to the classical LASSO estimators that are widely used in the literature.
3. Simulation Settings
This section discusses a comprehensive simulation study that involved varying the number of independent variables, the sample size, the correlation coefficient, and the residual variance. For each simulation case, all covariates were centered and standardized to have a mean of zero and a standard deviation of one. The predictors were generated as follows:
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i(p+1)}, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, p,$$
where $z_{ij}$ represents random numbers generated from the standard normal distribution and $\rho$ is a high correlation coefficient value, indicating strong correlation among the predictors. We considered three different correlation coefficient values (i.e., $\rho$ = 0.90, 0.95, and 0.99). In addition, to evaluate the effect of the sample size, different sample sizes such as n = 50, 100, and 150 were considered. Furthermore, we considered p = 4, 8, and 16 with variances $\sigma^2$ = 1, 3, 5, 7, and 9 for the error terms to evaluate their effects. Thus, the errors were generated as follows:
1. $\varepsilon_i$ followed the independent normal (0, 1);
2. $\varepsilon_i$ followed the independent normal (0, 3);
3. $\varepsilon_i$ followed the independent normal (0, 5);
4. $\varepsilon_i$ followed the independent normal (0, 7);
5. $\varepsilon_i$ followed the independent normal (0, 9).
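A sketch of this data-generating scheme, assuming numpy; the true coefficient vector is an illustrative choice, as the paper's $\beta$ values are not reproduced here. Note that under this scheme the pairwise correlation between distinct predictors is $\rho^2$.

```python
# Generate one simulated data set: correlated predictors (centered and
# standardized) and normal errors with variance sigma2.
import numpy as np

def generate_data(n, p, rho, sigma2, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p + 1))
    # Shared component z[:, p] induces correlation rho**2 between predictors.
    X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # center and standardize
    beta = np.ones(p)                              # illustrative true coefficients
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n) # errors with variance sigma2
    y = X @ beta + eps
    return X, y, beta
```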
To study the performance of the OLS, LASSO, adaptive LASSO, and proposed LASSO estimators, we computed the MSE using the following equation:
$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{N} \sum_{r=1}^{N} (\hat{\beta}_{(r)} - \beta)'(\hat{\beta}_{(r)} - \beta),$$
where $\hat{\beta}_{(r)}$ is the estimate of $\beta$ obtained from a given estimator at the $r$th replication and $N$ is the number of replicates used in the Monte Carlo simulation. To achieve reliable estimates, the simulations were repeated $N$ = 2000 times; a squared error was computed at every replication and then averaged over the replications.
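The MSE criterion can then be computed by averaging the squared estimation error over the replicates. The sketch below reuses generate_data() from the previous sketch; the OLS example setting is illustrative.

```python
# Monte Carlo MSE: average the squared estimation error over n_rep replicates.
# `estimator` is any function mapping (X, y) to a coefficient vector.
import numpy as np

def monte_carlo_mse(estimator, n, p, rho, sigma2, n_rep=2000):
    errors = np.empty(n_rep)
    for r in range(n_rep):
        X, y, beta = generate_data(n, p, rho, sigma2, seed=r)  # fresh data per replicate
        beta_hat = estimator(X, y)
        errors[r] = np.sum((beta_hat - beta) ** 2)             # squared error, replicate r
    return errors.mean()                                       # average over N replicates

# Example: the OLS estimator under severe multicollinearity (illustrative setting).
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
print(monte_carlo_mse(ols, n=50, p=4, rho=0.99, sigma2=1, n_rep=200))
```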
Simulation Table Settings
Different combinations of varying values for $\rho$, $n$, and $\sigma^2$ are considered in Table 1, Table 2 and Table 3, with p = 4. In particular, high values of the correlation coefficient were considered (i.e., $\rho$ = 0.90, 0.95, and 0.99). To assess the effect of the sample size, n = 50, 100, and 150 were considered. Furthermore, the error variances $\sigma^2$ = 1, 3, 5, 7, and 9 were used to evaluate their effect on the performance of the proposed estimators.
In Table 4, Table 5 and Table 6, the values of $\rho$, $n$, and $\sigma^2$ were the same as those used in the first case. However, the number of variables was increased to p = 8 to assess the effect of the number of variables on the simulation studies.
Table 7, Table 8 and Table 9 report the results when considering the number of explanatory variables to be 16 (i.e., p = 16). The choices for the other parameters remained the same as those used in the first two cases.
4. Simulation Results
The simulation results for the proposed estimators, along with some existing estimators (OLS, LASSO, and adaptive LASSO), are given in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9.
Table 1 reports the results for the different estimators used in the study, considering $\rho$ = 0.90, 0.95, and 0.99, n = 50, p = 4, and $\sigma^2$ = 1, 3, 5, 7, and 9. From these results, it is evident that the proposed estimators outperformed the OLS and the existing LASSO estimators. The poor performance of the OLS estimator was evident from the results, as it provided higher MSE values. Furthermore, one can see that when $\rho$ = 0.90 and $\sigma^2$ = 1, 3, 5, or 7, the proposed estimator IHS1 performed relatively well, whereas when $\sigma^2$ = 9, IHS2 outperformed all other estimators. For $\rho$ = 0.90, the smallest obtained MSE value was 0.336, which was obtained by IHS1. In the case of $\rho$ = 0.95, when $\sigma^2$ = 1, 3, 5, or 7, IHS1 outperformed all other estimators, whereas for $\sigma^2$ = 9, the performance of IHS3 was better than that of the other estimators. When the correlation coefficient $\rho$ was increased to 0.99, IHS1 performed relatively well compared with all other estimators, especially for small values of $\sigma^2$. As the value of $\sigma^2$ increased, IHS2 and IHS3 outperformed the other estimators. For a given value of $\rho$, the increases in the MSEs of the proposed estimators were significantly smaller than those of the OLS and the existing LASSO estimators.
When considering $\rho$ = 0.90, 0.95, and 0.99, p = 4, and $\sigma^2$ = 1, 3, 5, 7, and 9, the results are listed in Table 2 with n = 100. These results suggest that as the sample size increased, the proposed estimator IHS1 outperformed all other estimators. With increasing $\sigma^2$, the MSEs increased for the OLS and the existing LASSO estimators; however, they decreased for the proposed LASSO estimators. The smallest MSE values when considering $\rho$ = 0.90, 0.95, and 0.99 were 0.306, 0.239, and 0.151, respectively, all of which were produced by the IHS1 proposal. Note that the existing estimators produced relatively large MSE values, especially for large values of $\sigma^2$. The table also shows the OLS's poor performance.
Table 3 reports the results for the different estimators used in the study when considering the same values for $\rho$, p, and $\sigma^2$ with n = 150. This table shows that the proposed estimator IHS1 uniformly outperformed all other estimators. Note that for all combinations of the different parameters, IHS1, IHS2, and IHS3 were the best three estimators, indicating the good performance of the proposed estimators. The MSEs decreased with the increase in the value of $\sigma^2$ for the proposed estimators; however, they increased for the existing LASSO and OLS estimators. Furthermore, the existing LASSO and OLS estimators produced significantly larger MSEs than the proposed estimators.
The number of regressors in the simulation study was increased from p = 4 to p = 8 to assess the effect of the number of explanatory variables, with the results listed in Table 4, Table 5 and Table 6. The parameter values $\rho$ = 0.90, 0.95, and 0.99 and $\sigma^2$ = 1, 3, 5, 7, and 9 were considered in these tables. However, the sample size varied by table, as Table 4, Table 5, and Table 6 considered n = 50, 100, and 150, respectively. According to these tables, the proposed estimators produced significantly lower MSEs than the existing estimators. As previously noted, the proposed estimator IHS1 outperformed all others by producing extremely low MSEs. In comparison with the earlier outcomes, the proposed estimators IHS4 and IHS5 improved their rankings. This suggests that as p and n increased, the proposed estimators were more accurate and reliable than the existing LASSO and OLS estimators. Furthermore, the poor performance of the OLS estimators is shown in these tables, demonstrating that multicollinearity prevents their use.
To evaluate the effectiveness of the proposals, the number of regressors was increased from 8 to 16, and the results are shown in Table 7, Table 8 and Table 9. In Table 7, $\rho$ = 0.90, 0.95, and 0.99, n = 50, p = 16, and $\sigma^2$ = 1, 3, 5, 7, and 9 are taken into account. The same parameter values are specified in Table 8 with n = 100. According to these tables, the proposed estimators outperformed the existing LASSO and OLS estimators. The first five rankings in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 were attained by the five proposals, which shows that they provided superior results in a high-dimensional context. IHS1 produced the best results, and as the sample size increased, the corresponding MSEs were reduced.
6. Conclusions
The multicollinearity issue in linear regression occurs when the regressors have a high degree of correlation. In the presence of this issue, the OLS estimators are unstable and do not produce accurate estimates. To resolve this concern, penalized regression approaches such as the LASSO estimator are extensively used. For estimating the LASSO parameter k, this work provided five new estimators. The estimators' performance was affected by the standard deviation of the random error, the correlation ($\rho$) between the explanatory variables, the sample size (n), and the number of variables (p). To evaluate the effectiveness of the estimators using the MSE criterion, we conducted comprehensive Monte Carlo simulation studies and used real data sets. The findings revealed that the OLS estimator performed poorly when there was substantial predictor correlation.
The findings also revealed that in terms of the MSEs, the recommended estimators consistently outperformed the OLS estimator, the conventional LASSO estimator, the adaptive LASSO estimator, and others. The MSE decreased as the sample size increased, even for high values of the correlation coefficient ($\rho$) and the error variance ($\sigma^2$). Furthermore, in both the simulation studies and the real-world data cases, the suggested estimator IHS1 outperformed all other estimators. On the other hand, it is evident that the MSEs of the existing estimators increased as the number of variables (p), the error variance ($\sigma^2$), and the correlation coefficient ($\rho$) between the independent variables increased.
The simulation and real data analyses led to the conclusion that the quantile-based estimates of the LASSO parameter had lower mean squared errors than those of the OLS and other regularization-type estimators. This work can be expanded in the future by assuming a multivariate response variable. More research can be performed to determine how well the suggested estimators perform when the response variable follows another distribution from the exponential family.