Article

Adaptive Penalized Regression for High-Efficiency Estimation in Correlated Predictor Settings: A Data-Driven Shrinkage Approach

by Muhammad Shakir Khan 1,* and Amirah Saeed Alharthi 2
1 Directorate General Livestock & Dairy Development Department (Research Wing), Khyber Pakhtunkhwa, P.O. Box 367, Peshawar 25000, Pakistan
2 Department of Mathematics and Statistics, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(17), 2884; https://doi.org/10.3390/math13172884
Submission received: 25 July 2025 / Revised: 29 August 2025 / Accepted: 3 September 2025 / Published: 6 September 2025
(This article belongs to the Special Issue Statistical Machine Learning: Models and Its Applications)

Abstract

Penalized regression estimators have become widely adopted alternatives to ordinary least squares when analyzing collinear data, despite introducing some bias. However, no existing penalized method is universally superior across diverse data conditions. To address this limitation, we propose a novel adaptive ridge estimator that automatically adjusts its penalty structure based on key data characteristics: (1) the degree of predictor collinearity, (2) the error variance, and (3) the model dimensionality. Through comprehensive Monte Carlo simulations and real-world applications, we evaluate the estimator’s performance using the mean squared error (MSE) as the primary criterion. Our results demonstrate that the proposed method consistently outperforms existing approaches across all considered scenarios, with particularly strong performance in challenging high-collinearity settings. The real-data applications further confirm the estimator’s practical utility and robustness.

1. Introduction

The multiple linear regression model (MLRM) remains a cornerstone of statistical analysis due to its mathematical elegance, interpretability, and predictive performance [1,2]. While ordinary least squares (OLS) estimation provides optimal results under ideal conditions, also known as the Gauss–Markov assumptions, practical applications often violate these requirements [2]. A particularly common challenge is multicollinearity among predictors. To address these limitations, researchers have developed several alternative estimation approaches, including ridge regression (RR) [3,4], principal component regression [5], elastic net regression [6], raised regression [7], and residualization [8]. Among these alternatives, ridge regression has emerged as particularly popular due to its computational efficiency, mathematical tractability, straightforward interpretation, and, more importantly, its ability to retain all predictors while stabilizing estimates through coefficient shrinkage [9]. Although RR introduces some bias, it often substantially reduces variance, providing overall improved estimation in the presence of multicollinearity [10]. This bias–variance tradeoff makes RR particularly valuable for practical applications where predictor correlations are non-negligible. Consider the following classical MLRM:
$y = M\Upsilon + \varepsilon$ (1)
where $y$ ($n \times 1$) is a vector of responses, $M$ ($n \times p$) is a design matrix of predictors, $\Upsilon$ ($p \times 1$) is a vector of unknown regression coefficients, i.e., $\Upsilon = (\Upsilon_0, \Upsilon_1, \Upsilon_2, \ldots, \Upsilon_p)'$, where $\Upsilon_0$ is assumed to be zero, and $\varepsilon$ ($n \times 1$) is a vector of error terms. The error terms follow a multivariate normal distribution with mean vector $0$ and variance–covariance matrix $\sigma^2 I_n$; $n$ is the number of observations; $p$ is the number of predictors in the model; and $I_n$ is an identity matrix of order $n$. The OLS estimator and the covariance matrix of $\hat{\Upsilon}$ are defined as follows:
$\hat{\Upsilon} = (M'M)^{-1}M'y \quad \text{and} \quad \mathrm{Cov}(\hat{\Upsilon}) = \sigma^2(M'M)^{-1}$ (2)
As evident from Equation (2), the OLS estimates and the covariance matrix of $\hat{\Upsilon}$ depend heavily on the characteristics of the $M'M$ matrix. Collinearity amongst predictors makes the $M'M$ matrix ill-conditioned, driving some of its eigenvalues toward zero and significantly inflating the variances of the OLS estimates, compromising their efficiency and stability. To address the problem of multicollinearity, refs. [3,4] proposed the ridge regression (RR) estimator as
$\hat{\Upsilon}(k) = (M'M + kI)^{-1}M'y$ (3)
where $I$ ($p \times p$) is an identity matrix and $k$ is any positive scalar, known as the “ridge parameter” or “ridge penalty”. The spirit of RR is to obtain stable estimates at the cost of some bias, introduced through $k$. The existing literature provides ample evidence that no ridge estimator performs uniformly best; rather, its performance varies with important features of the data, i.e., the level of multicollinearity, the error variance, and the number of predictors. Consequently, ref. [11] remarked that selecting the optimum value of the ridge penalty is both art and science. To find such a superior value of the ridge penalty, several experts have proposed different methods; for instance, Hocking et al. [12] proposed a generalized ridge estimator and showed that it is superior in terms of minimum MSE. Similarly, Hoerl et al. [13] proposed their own version of the ridge estimator and compared it with existing estimators, including OLS, through a simulation study. Quantile-based ridge estimators were proposed by Suhail et al. [14]. Lipovetsky and Conklin [1] noticed that there is limited liberty in the selection of the ridge penalty owing to the inverse relation between the ridge penalty and the goodness of fit of RR. Hence, to improve the goodness of fit of RR, they proposed a two-parameter ridge (TPR) estimator as follows:
$\hat{\Upsilon}(q,k) = q(M'M + kI)^{-1}M'y$ (4)
where
$\hat{q} = \dfrac{(M'y)'(M'M + kI)^{-1}M'y}{(M'y)'(M'M + kI)^{-1}M'M(M'M + kI)^{-1}M'y}$ (5)
They showed that the TPR estimator not only improves the goodness of fit but also yields a better orthogonality property between the predicted values of the response variable and the residuals. Subsequently, numerous researchers contributed improvements to two-parameter ridge regression; see, for example, [15,16,17,18,19]. Although existing ridge estimators often excel in specific scenarios, they lack robustness and adaptability when applied to diverse data sets. To narrow this gap, this study proposes an auto-adjusted two-parameter ridge (AATPR) estimator that is based on a dynamic ridge penalty and provides practitioners with an automatic adjustment option for diverse data types. The performance of the proposed estimator is evaluated in a range of scenarios through an extensive Monte Carlo simulation using the minimum mean squared error (MSE) criterion. The applications of the proposed estimator are also evaluated using two real-life data sets.
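To make the contrast between Equations (2) and (3) concrete, the following minimal R sketch (our own illustration, not code from any of the cited works) fits OLS and ridge to simulated collinear data; the penalty value k = 0.5 and all other settings are arbitrary choices for demonstration.

```r
set.seed(1)
n <- 50; p <- 4; rho <- 0.99
w <- matrix(rnorm(n * (p + 1)), n, p + 1)
M <- sqrt(1 - rho^2) * w[, 1:p] + rho * w[, p + 1]  # collinear predictors
y <- M %*% rep(1, p) + rnorm(n)                     # true coefficients all equal to 1

MtM <- crossprod(M)                                 # M'M
ols <- solve(MtM, crossprod(M, y))                  # Eq. (2): (M'M)^{-1} M'y
k   <- 0.5                                          # an arbitrary ridge penalty
rdg <- solve(MtM + k * diag(p), crossprod(M, y))    # Eq. (3): (M'M + kI)^{-1} M'y

eigen(MtM)$values                                   # near-zero eigenvalues signal ill-conditioning
cbind(OLS = as.vector(ols), Ridge = as.vector(rdg)) # ridge coefficients are shrunk and more stable
```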
The remainder of this article unfolds as follows: Section 2 includes statistical methodology, along with a brief review of some popular and widely used existing ridge estimators, followed by our proposed estimator. Simulation design is discussed in Section 3, while Section 4 provides a comprehensive discussion on simulation results. Section 5 assesses the application of proposed estimators on real-life data sets. Finally, some concluding remarks are given in Section 6.

2. Statistical Methodology

The model (1) may be rewritten in canonical form as
$y = \psi\alpha + \varepsilon$ (6)
where $\psi = MD$, $\alpha = D'\Upsilon$, and $D'D = I_p$. Matrix $D$ is an orthogonal matrix containing the eigenvectors of the $M'M$ matrix, and $I_p$ is an identity matrix. Moreover, $\Lambda = D'M'MD$ such that $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > 0$ are the ordered eigenvalues (in descending order) of the matrix $M'M$.
Equations (2)–(4) may be written in canonical form, respectively, as follows:
$\hat{\alpha} = \Lambda^{-1}\psi' y$ (7)
$\hat{\alpha}_k = (\Lambda + kI_p)^{-1}\psi' y$ (8)
$\hat{\alpha}(q,k) = q(\Lambda + kI_p)^{-1}\psi' y$ (9)
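The canonical quantities are easy to compute numerically. The following R sketch (again our own illustration, reusing M and y from the sketch in Section 1) constructs ψ, Λ, and the estimators (7) and (8); the penalty value is an arbitrary placeholder.

```r
ev  <- eigen(crossprod(M))   # spectral decomposition of M'M
D   <- ev$vectors            # orthogonal eigenvector matrix, D'D = I_p
Lam <- ev$values             # eigenvalues in descending order
psi <- M %*% D               # psi = MD, so psi'psi = diag(Lam)

alpha_hat <- as.vector(crossprod(psi, y)) / Lam        # Eq. (7): Lambda^{-1} psi'y
k <- 0.5
alpha_k   <- as.vector(crossprod(psi, y)) / (Lam + k)  # Eq. (8)
```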

2.1. Existing Estimators

This section provides a brief discussion of some popular ridge estimators, while our proposed estimator is discussed in the subsequent section. The pioneering work on ridge regression was conducted by [3], who proposed the following generalized ridge estimator as an alternative to the OLS estimator for circumventing the multicollinearity issue in regression modeling:
$k_x = \dfrac{\hat{\sigma}^2}{\hat{\alpha}_x^2}, \quad x = 1, 2, \ldots, p$ (10)
where $\hat{\sigma}^2$ and $\hat{\alpha}_x$ are the unbiased estimators of the error variance and the regression coefficients, respectively. The following ridge estimators are considered in this study (an R sketch computing several of these penalties follows the list).
1. Hoerl and Kennard Estimator
Hoerl and Kennard introduced RR using a single optimum value of the ridge parameter. In their subsequent work [4], they proposed the following single value for the ridge penalty:
$\hat{k}_{HK} = \dfrac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2}$ (11)
where $\hat{\alpha}_{max}^2$ is the maximum squared OLS regression coefficient.
2. Hoerl, Kennard, and Baldwin Estimator
The first improvement on the foremost ridge estimator was suggested in [13], where the ridge penalty is defined as follows:
$\hat{k}_{HKB} = \dfrac{p\hat{\sigma}^2}{\sum_{i=1}^{p}\hat{\alpha}_i^2}$ (12)
3. Kibria Estimators
Ref. [20] suggested three ridge estimators obtained by taking the arithmetic mean, geometric mean, and median of the generalized estimator of Hoerl and Kennard defined in Equation (10), and concluded that, amongst the ridge estimators considered in that research, the arithmetic-mean estimator performed best. Thus, in this study, we considered this best-performing estimator, which is expressed as follows:
$\hat{k}_{AM} = \dfrac{1}{p}\sum_{i=1}^{p}\dfrac{\hat{\sigma}^2}{\hat{\alpha}_i^2}$ (13)
4. Suhail, Chand, and Kibria Estimator
The idea of Kibria [20] was further improved by [14], who suggested six quantile-based versions of the generalized estimator of Hoerl and Kennard. According to their simulation results, the 95th quantile performed best on the majority of occasions; hence, in this study, we considered this superior estimator as follows:
$\hat{k}_{Q_{.95}} = Q_{.95}\!\left(\dfrac{\hat{\sigma}^2}{\hat{\alpha}_x^2}\right), \quad x = 1, 2, \ldots, p$ (14)
where $Q_{.95}(\cdot)$ denotes the 95th sample quantile of the generalized Hoerl–Kennard values in Equation (10).
5. Lipovetsky and Conklin Two-Parameter Ridge Estimator
The pioneering work on the two-parameter ridge estimator by Lipovetsky and Conklin [1] is included in this paper. They used Equation (11) as their first ridge parameter, i.e., the ridge penalty (k), while their second ridge parameter (q) is computed from Equation (5).
6. Toker and Kaciranlar Two-Parameter Ridge Estimator
To improve the work of [1], Toker and Kaciranlar [16] proposed optimum values of “q” and “k”. The optimum value $\hat{q}_{opt}$ is calculated as follows:
$\hat{q}_{opt} = \dfrac{\sum_{i=1}^{p}\dfrac{\hat{\alpha}_i^2\lambda_i}{\lambda_i + \hat{k}_{HK}}}{\sum_{i=1}^{p}\dfrac{\hat{\sigma}^2\lambda_i + \hat{\alpha}_i^2\lambda_i^2}{(\lambda_i + \hat{k}_{HK})^2}}$ (15)
Subsequently, $\hat{q}_{opt}$ is utilized in Equation (16) to compute $\hat{k}_{opt}$ as
$\hat{k}_{opt} = \dfrac{\hat{q}_{opt}\sum_{i=1}^{p}\hat{\sigma}^2\lambda_i + (\hat{q}_{opt} - 1)\sum_{i=1}^{p}\hat{\alpha}_i^2\lambda_i^2}{\sum_{i=1}^{p}\hat{\alpha}_i^2\lambda_i}$ (16)
7. Akhtar and Alharthi Estimators
More recently, Akhtar and Alharthi [18] proposed some modifications to two-parameter ridge estimation by suggesting the following three condition-adjusted ridge estimators (CARE):
$\hat{k}_{CARE1} = \dfrac{1}{p}\sum_{i=1}^{p}\dfrac{\lambda_i|\hat{\alpha}_i|}{1 + \mathrm{Cond}(M'M)}$ (17)
$\hat{k}_{CARE2} = \dfrac{2}{p}\sum_{i=1}^{p}\dfrac{\lambda_i|\hat{\alpha}_i|}{\left(1 + \mathrm{Cond}(M'M)\right)^2}$ (18)
$\hat{k}_{CARE3} = \dfrac{1}{p}\sum_{i=1}^{p}\dfrac{\lambda_i^2|\hat{\alpha}_i|}{\left(1 + \mathrm{Cond}(M'M)\right)^3}$ (19)
where $\mathrm{Cond}(M'M) = \lambda_{max}/\lambda_{min}$ is the condition number of the $M'M$ matrix.
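Continuing the earlier sketches (which define y, psi, Lam, alpha_hat, n, and p), the following R lines compute several of the penalties reviewed above. This is our own hedged illustration: the formulas follow the equations as reconstructed in this section and have not been verified against the original sources.

```r
sigma2 <- sum((y - psi %*% alpha_hat)^2) / (n - p)  # unbiased estimate of sigma^2

k_i   <- sigma2 / alpha_hat^2              # generalized Hoerl-Kennard values, Eq. (10)
k_HK  <- sigma2 / max(alpha_hat^2)         # Hoerl-Kennard, Eq. (11)
k_HKB <- p * sigma2 / sum(alpha_hat^2)     # Hoerl-Kennard-Baldwin, Eq. (12)
k_AM  <- mean(k_i)                         # Kibria's arithmetic-mean rule, Eq. (13)
k_Q95 <- unname(quantile(k_i, 0.95))       # 95th-quantile rule, Eq. (14)

# Toker-Kaciranlar optimal (q, k), Eqs. (15)-(16)
q_opt <- sum(alpha_hat^2 * Lam / (Lam + k_HK)) /
  sum((sigma2 * Lam + alpha_hat^2 * Lam^2) / (Lam + k_HK)^2)
k_opt <- (q_opt * sum(sigma2 * Lam) + (q_opt - 1) * sum(alpha_hat^2 * Lam^2)) /
  sum(alpha_hat^2 * Lam)
```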

2.2. Proposed Estimators

As established, existing ridge estimators demonstrate variable performance across different data conditions. This limitation stems from their dependence on four critical factors: (1) the severity of multicollinearity, (2) model dimensionality, (3) error variance, and (4) sample size. To address this challenge, we propose an adaptive ridge estimator that dynamically adjusts its penalty parameter based on the degree of multicollinearity in the data and the model dimensionality. It is important to note that the MSE of the RR estimator generally exhibits a U-shaped curve with respect to k. Initially, as k increases, the MSE decreases as overfitting is reduced. However, beyond a certain point, increasing k too much leads to underfitting, causing the MSE to rise again. It is well established that the MSE of a ridge estimator is strongly influenced by multicollinearity and model dimensionality. The degree of multicollinearity is precisely diagnosed using eigenvalues. Metrics like the Condition Number $\lambda_{Max}/\lambda_{min}$ and Condition Index $\sqrt{\lambda_{Max}/\lambda_{min}}$ offer a robust framework for its detection. In the absence of multicollinearity, the eigenvalues are balanced and moderate. Under multicollinearity, however, this balance is disrupted, inflating the largest eigenvalue while the others shrink toward zero. Moreover, multicollinearity may adversely impact the regression coefficients ($\hat{\alpha}_i$) by significantly inflating their values, with incorrect signs. Our proposed estimator synthesizes these insights by formulating the ridge penalty k as a function of model dimensionality, the regression coefficients, and the condition indices of the data, achieving a balance that mitigates both overfitting and underfitting and ensures robust performance. The numerator addresses overfitting using a function of the eigenvalues, regression coefficients, and data dimensionality. Simultaneously, the denominator safeguards against underfitting induced by an overly aggressive ridge penalty parameter. The generalized form of our auto-adjusted two-parameter ridge (AATPR) estimator is expressed in mathematical form as follows:
$\hat{k}_i(k,q) = \dfrac{p\,\sqrt[p]{\lambda_i}\,|\hat{\alpha}_i|}{1 + \sqrt[p]{\lambda_{Max}/\lambda_{min}}}$ (20)
The existing literature strongly emphasizes the critical role of selecting a single optimal ridge penalty parameter. For our proposed estimator, we adopt the penalty selection framework introduced by [20] to obtain the final estimator from the generalized estimator (20) as follows:
$\hat{k}_{AATPR}(k,q) = \mathrm{Med}\!\left[\dfrac{\sqrt[p]{\lambda_i}\,|\hat{\alpha}_i|}{1 + \sqrt[p]{\lambda_{Max}/\lambda_{min}}}\right]$ (21)
where $\lambda_{Max}$ is the maximum eigenvalue and $\lambda_{min}$ is the minimum eigenvalue of the $M'M$ matrix.
Although deriving the exact probability distribution of our proposed estimator is theoretically complex, previous work [22] has shown that the ridge estimator has a tractable asymptotic sampling distribution, which in our case is approximately normal.
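As a rough illustration only, the sketch below computes the AATPR penalty under our reconstruction of Equation (21) and plugs it into the canonical ridge estimator (8). Because the reconstruction of Equations (20) and (21) is uncertain, the published form in the original article should be treated as authoritative.

```r
cond_root <- (max(Lam) / min(Lam))^(1 / p)   # p-th root of the condition number
k_AATPR   <- median(Lam^(1 / p) * abs(alpha_hat) / (1 + cond_root))  # Eq. (21), as reconstructed
alpha_AATPR <- as.vector(crossprod(psi, y)) / (Lam + k_AATPR)        # plug into Eq. (8)
```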

2.3. Performance Evaluation Criteria

Generally, ridge estimators contain some amount of bias, in contrast to OLS estimators, which are unbiased. As mentioned by [23], the mean squared error criterion is an appropriate tool for comparing one or more biased estimators. Moreover, the available literature also unanimously advocates for using the minimum MSE criterion in choosing the best estimator; see, e.g., refs. [4,20,24,25], among others.
The MSE is defined as follows:
$\mathrm{MSE}(\hat{\alpha}) = E\left[(\hat{\alpha} - \alpha)'(\hat{\alpha} - \alpha)\right]$ (22)
Since the theoretical comparison of the estimators mentioned in Section 2.1 and Section 2.2 is intractable, Monte Carlo simulations are performed to empirically evaluate the estimators using the minimum MSE criterion.

3. Simulation Study

In this section, the data generation process for the empirical evaluation of the considered estimators is explained. Data are generated for different values of the important varying factors, i.e., the pair-wise correlation amongst predictors (ρ), the error variance (σ²), the sample size (n), and the number of predictors in the model, to examine the performance of all the considered estimators in a range of situations. Specifically, four levels of ρ = 0.90, 0.95, 0.99, 0.999; four levels of σ² = 0.5, 1, 5, and 10; three levels of sample size; and two levels of the number of predictors are considered to generate the data. The predictors are generated following [10,18,26,27] as follows:
$m_{ji} = (1 - \rho^2)^{1/2} w_{ji} + \rho\, w_{(p+1)i}, \quad j = 1, 2, \ldots, p, \quad i = 1, 2, \ldots, n$ (23)
where wji is a pseudo-random number generated from the standard normal distribution.
The response variable is generated as follows:
$y_i = \alpha_0 + \alpha_1 m_{1i} + \alpha_2 m_{2i} + \cdots + \alpha_p m_{pi} + \varepsilon_i, \quad i = 1, 2, \ldots, n$ (24)
where the $\alpha_i$ are computed, following [26,28,29], based on the most favorable (MF) direction. Moreover, without loss of generality, the intercept term $\alpha_0$ is set to zero in this study. The $\varepsilon_i$ are random error terms drawn from a normal distribution with mean 0 and variance $\sigma^2$. The simulations are replicated 5000 times; hence, the estimated MSE (EMSE) is computed as follows:
$\mathrm{EMSE}(\hat{\alpha}) = \dfrac{1}{5000}\sum_{j=1}^{5000}(\hat{\alpha}_j - \alpha)'(\hat{\alpha}_j - \alpha)$ (25)
where $\hat{\alpha}_j$ is the estimate obtained in the $j$-th replication.
All the necessary calculations are performed using the R programming language. The EMSEs of all the estimators are summarized in Table 1, Table 2 and Table 3, while their graphical displays are provided in Figure 1, Figure 2 and Figure 3, respectively.
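A compressed version of this design can be sketched in R as follows. This is our illustration, not the authors' code: it uses fewer replications than the 5000 reported here, a fixed unit-norm coefficient vector as a stand-in for the most favorable (MF) direction of [26,28,29], and only the OLS and Hoerl–Kennard estimators.

```r
set.seed(123)
n <- 50; p <- 4; rho <- 0.99; sigma <- 1; R <- 1000
alpha <- rep(1 / sqrt(p), p)   # unit-norm stand-in for the MF direction
emse <- matrix(0, R, 2, dimnames = list(NULL, c("OLS", "HK")))
for (r in 1:R) {
  w <- matrix(rnorm(n * (p + 1)), n, p + 1)
  M <- sqrt(1 - rho^2) * w[, 1:p] + rho * w[, p + 1]  # Eq. (23)
  y <- M %*% alpha + rnorm(n, sd = sigma)             # Eq. (24), alpha_0 = 0
  ev  <- eigen(crossprod(M)); Lam <- ev$values
  psi <- M %*% ev$vectors
  a_true <- as.vector(crossprod(ev$vectors, alpha))   # canonical true coefficients
  a_ols  <- as.vector(crossprod(psi, y)) / Lam        # Eq. (7)
  s2     <- sum((y - psi %*% a_ols)^2) / (n - p)
  k_hk   <- s2 / max(a_ols^2)                         # Eq. (11)
  a_hk   <- as.vector(crossprod(psi, y)) / (Lam + k_hk)
  emse[r, ] <- c(sum((a_ols - a_true)^2), sum((a_hk - a_true)^2))
}
colMeans(emse)   # Eq. (25): estimated MSE over replications
```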

4. Simulation Result Discussion

Our comprehensive simulation studies yield the following significant findings:
1. Superior Performance of AATPR:
The proposed AATPR estimator consistently outperforms all competing ridge estimators in terms of minimum mean squared error (MSE) across all simulated scenarios. This superior performance persists regardless of sample size, error variance, or number of predictors. As expected, the OLS estimator demonstrates the least robustness to multicollinearity, consistent with previous findings [2,18,20].
2. Robustness Against Multicollinearity:
Figure 1, Figure 2 and Figure 3 (derived from Table 1, Table 2 and Table 3) clearly demonstrate that while MSE values for OLS and existing ridge estimators increase with rising multicollinearity [14,18,19,30], our AATPR estimator shows an inverse relationship. This remarkable performance stems from the estimator’s dynamic ability to automatically adapt its penalty structure in response to varying levels of predictor correlation.
3. Stability Across Error Variance Levels:
Simulation results confirm a strong positive association between error variance and MSE for most estimators. However, AATPR maintains exceptional stability, showing only marginal MSE increases even under high error variance conditions combined with multicollinearity.
4. Dimensionality Effects:
As model complexity increases (particularly in the presence of multicollinearity), all estimators exhibit rising MSE values. However, OLS shows particularly rapid deterioration compared to ridge-type estimators, aligning with established literature [30,31].
5. Sample Size Considerations:
Consistent with general large-sample properties, increasing the sample size improves estimation accuracy for all methods. However, ridge estimators (including AATPR) demonstrate consistently better performance than OLS across all sample sizes, corroborating previous findings [20].

5. Applications

To demonstrate the real-world applications of our proposed estimator and methodology, we considered two published data sets: the manufacturing sector data set adopted by [32] and the Pakistan GDP Growth data set [24]. These data sets possess the same features that were considered earlier in our simulation work.

5.1. Analysis of Manufacturing Sector Data

The data set contains 31 observations on three predictor variables for the period from 1960 to 1990, and the following regression model is considered:
$y = \Upsilon_0 + \Upsilon_1 m_1 + \Upsilon_2 m_2 + \Upsilon_3 m_3 + \varepsilon$ (26)
In model (26), the response variable (y) is the product value in the manufacturing sector, m1 represents the value of imported intermediate commodities, m2 is the value of imported capital commodities, and m3 is the value of imported raw materials. The eigenvalues of the $M'M$ matrix are 2.9851, 0.00989, and 0.0050, respectively. The condition number is computed as 600.2598. Similarly, the variance inflation factor (VIF) for each of the predictors is significantly greater than 10; i.e., for m1, m2, and m3, the computed VIFs are 128.2639, 103.4284, and 70.8708, respectively. The pair-wise correlations amongst the predictors are provided in Figure 4. All these indicate a severe multicollinearity problem in the data.
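These diagnostics are straightforward to reproduce for any design matrix M with the following minimal R sketch (ours; the VIFs are obtained from the diagonal of the inverse correlation matrix, a standard identity).

```r
Ms  <- scale(M)                      # column-standardized predictors
lam <- eigen(crossprod(Ms))$values   # eigenvalues (proportional to those of cor(M))
cond_number <- max(lam) / min(lam)   # condition number of M'M
vif <- diag(solve(cor(M)))           # VIF_j = 1 / (1 - R_j^2)
round(c(cond_number = cond_number, vif), 2)
```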
The EMSE and regression coefficients of all the considered estimators are given in Table 4. The results reveal that all the ridge estimators performed much better than OLS, as reported by [14,20]. The proposed AATPR estimator recorded the minimum MSE amongst all the considered estimators. Moreover, as mentioned by [33], multicollinearity may cause the OLS regression coefficients to alter their signs; in our case, this happens with $\Upsilon_1$ and $\Upsilon_2$.

5.2. Analysis of Pakistan GDP Growth Data

This data set covers the financial years 2008 to 2021, and the following linear regression model is considered:
$y = \Upsilon_0 + \Upsilon_1 m_1 + \Upsilon_2 m_2 + \Upsilon_3 m_3 + \Upsilon_4 m_4 + \Upsilon_5 m_5 + \Upsilon_6 m_6 + \Upsilon_7 m_7 + \Upsilon_8 m_8 + \varepsilon$ (27)
In model (27), “y” is the GDP growth rate, m1 is milk production, m2 is meat production, m3 is the number of buffalo, m4 is the number of cattle, m5 is the number of poultry, m6 is the Consumer Price Index, m7 is the tax-to-GDP ratio, and m8 is the investment-to-GDP ratio. The eigenvalues of the $M'M$ matrix are 6.04, 1.13, 0.74, 0.08, 0.0018, 0.00003, 0.000004, and 0.000001, respectively. The condition number is computed as 5,210,223. Similarly, the variance inflation factor (VIF) for most of the predictors is significantly greater than 10; i.e., for m1, m2, m3, m4, m5, m6, m7, and m8, the computed VIFs are 132,350, 633,144, 21,894, 194,158, 144,521, 6, 32, and 8, respectively. The pair-wise correlations amongst the predictors are provided in Figure 5. All these indicate a severe multicollinearity problem in the data. The EMSE and regression coefficients for the Pakistan GDP Growth Data are given in Table 5.

6. Conclusions

In this paper, we propose an auto-adjusted two-parameter ridge (AATPR) estimator that dynamically adjusts its shrinkage parameters in response to the condition number of the design matrix (multicollinearity), the model dimensionality, and the error variance. The estimator’s adaptive mechanism optimizes the bias–variance tradeoff, yielding superior performance compared to existing ridge methods, as demonstrated through Monte Carlo simulations and empirical applications. Potential extensions to models with concurrent multicollinearity and heteroscedasticity are identified as valuable future research directions.

Author Contributions

Conceptualization, M.S.K. and A.S.A.; Methodology, M.S.K.; Software, M.S.K.; Validation, A.S.A.; Formal analysis, M.S.K.; Investigation, M.S.K.; Resources, A.S.A.; Writing—original draft, M.S.K.; Writing—review & editing, A.S.A.; Visualization, M.S.K.; Supervision, A.S.A.; Project administration, A.S.A.; Funding acquisition, A.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Taif University, Saudi Arabia, Project No. (TU-DSPP-2025-39).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2025-39).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Lipovetsky, S.; Conklin, W.M. Ridge regression in two-parameter solution. Appl. Stoch. Models Bus. Ind. 2005, 21, 525–540. [Google Scholar] [CrossRef]
  2. Khan, M.S.; Ali, A.; Suhail, M.; Alotaibi, E.S.; Alsubaie, N.E. On the estimation of ridge penalty in linear regression: Simulation and application. Kuwait J. Sci. 2024, 51, 100273. [Google Scholar] [CrossRef]
3. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar]
4. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Applications to Nonorthogonal Problems. Technometrics 1970, 12, 69–82. [Google Scholar] [CrossRef]
  5. Massy, W.F. Principal Components Regression in Exploratory Statistical Research. J. Am. Stat. Assoc. 1965, 60, 234–256. [Google Scholar] [CrossRef]
  6. Zou, H.; Trevor, H. Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  7. Garcia, C.G.; Pérez, J.G.; Liria, J.S. The raise method. An alternative procedure to estimate the parameters in presence of collinearity. Qual. Quant. 2011, 45, 403–423. [Google Scholar] [CrossRef]
  8. Garcia, C.B.; Salmeron, R.; Claudia, G.; Jose, G. Residualization: Justification, Properties and Application. J. Appl. Stat. 2020, 47, 1990–2010. [Google Scholar] [CrossRef]
  9. Belsley, D. A Guide to Using the Collinearity Diagnostics. Comput. Sci. Econ. Manag. 1991, 4, 33–50. [Google Scholar] [CrossRef]
  10. Dar, I.S.; Chand, S. Bootstrap-quantile ridge estimator for linear regression with applications. PLoS ONE 2024, 19, e0302221. [Google Scholar] [CrossRef]
  11. McDonald, G.C. Ridge Regression. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 93–100. [Google Scholar] [CrossRef]
12. Hocking, R.R.; Speed, F.M.; Lynn, M.J. A Class of Biased Estimators in Linear Regression. Technometrics 1976, 18, 425–437. [Google Scholar] [CrossRef]
  13. Hoerl, A.E.; Kannard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat. 1975, 4, 105–123. [Google Scholar] [CrossRef]
  14. Suhail, M.; Chand, S.; Kibria, B.M.G. Quantile based estimation of biasing parameters in ridge regression model. Commun. Stat. Simul. Comput. 2020, 49, 2732–2744. [Google Scholar] [CrossRef]
  15. Lipovetsky, S. Two-parameter ridge regression and its convergence to the eventual pairwise model. Math. Comput. Model. 2006, 44, 304–318. [Google Scholar] [CrossRef]
  16. Toker, S.; Kaçiranlar, S. On the performance of two parameter ridge estimator under the mean square error criterion. Appl. Math. Comput. 2013, 219, 4718–4728. [Google Scholar] [CrossRef]
  17. Kuran, Ö.; Özbay, N. Improving prediction by means of a two parameter approach in linear mixed models. J. Stat. Comput. Simul. 2021, 91, 3721–3743. [Google Scholar] [CrossRef]
  18. Akhtar, N.; Alharthi, M.F. Enhancing accuracy in modelling highly multicollinear data using alternative shrinkage parameters for ridge regression methods. Sci. Rep. 2025, 15, 10774. [Google Scholar] [CrossRef]
  19. Alharthi, M.F.; Akhtar, N. Newly Improved Two-Parameter Ridge Estimators: A Better Approach for Mitigating Multicollinearity in Regression Analysis. Axioms 2025, 14, 186. [Google Scholar] [CrossRef]
  20. Kibria, B.M.G. Performance of some New Ridge regression estimators. Commun. Stat. Part B Simul. Comput. 2003, 32, 419–435. [Google Scholar] [CrossRef]
  21. Khalaf, G.; Mansson, K.; Shukur, G. Modified ridge regression estimators. Commun. Stat. Theory Methods 2013, 42, 1476–1487. [Google Scholar] [CrossRef]
  22. Sengupta, N.; Sowell, F. On the Asymptotic Distribution of Ridge Regression Estimators Using Training and Test Samples. Econometrics 2020, 8, 39. [Google Scholar] [CrossRef]
  23. Cochran, W.G. Sampling Techniques; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
24. Khan, M.S.; Ali, A.; Suhail, M.; Awwad, F.A.; Ismail, E.A.A.; Ahmad, H. On the performance of two-parameter ridge estimators for handling multicollinearity problem in linear regression: Simulation and application. AIP Adv. 2023, 13, 115208. [Google Scholar] [CrossRef]
25. Haq, M.S.; Kibria, B.M.G. A Shrinkage Estimator for the Restricted Linear Regression Model: Ridge Regression Approach. J. Appl. Stat. Sci. 1996, 3, 301–316. [Google Scholar]
26. McDonald, G.C.; Galarneau, D.I. A Monte Carlo Evaluation of Some Ridge-Type Estimators. J. Am. Stat. Assoc. 1975, 70, 407–416. [Google Scholar] [CrossRef]
  27. Suhail, M.; Chand, S.; Aslam, M. New quantile based ridge M-estimator for linear regression models with multicollinearity and outliers. Commun. Stat. Simul. Comput. 2021, 52, 1417–1434. [Google Scholar] [CrossRef]
  28. Halawa, A.M.; El Bassiouni, M.Y. Tests of regression coefficients under ridge regression models. J. Stat. Comput. Simul. 2000, 65, 341–356. [Google Scholar] [CrossRef]
  29. Newhouse, J.P.; Oman, S.D. An Evaluation of Ridge Estimators; Rand Corporation: Santa Monica, CA, USA, 1971; 716p. [Google Scholar]
  30. Yasin, S.; Kamal, S.; Suhail, M. Performance of Some New Ridge Parameters in Two-Parameter Ridge Regression Model. Iran. J. Sci. Technol. Trans. A Sci. 2021, 45, 327–341. [Google Scholar] [CrossRef]
  31. Majid, A.; Ahmad, S.; Aslam, M.; Kashif, M. A robust Kibria–Lukman estimator for linear regression model to combat multicollinearity and outliers. Concurr. Comput. 2022, 35, e7533. [Google Scholar] [CrossRef]
32. Eledum, H.; Zahri, M. Relaxation Method for Two Stages Ridge Regression Estimator. Int. J. Pure Appl. Math. 2013, 85, 653–667. [Google Scholar] [CrossRef]
  33. Gujarati, D.N.; Porter, D.C. Basic Econometrics, 5th ed.; McGraw Hill: Columbus, OH, USA, 2009. [Google Scholar]
Figure 1. Graphical display of Table 1.
Figure 2. Graphical display of Table 2.
Figure 3. Graphical display of Table 3.
Figure 4. Pair-wise correlation for manufacturing sector data.
Figure 5. Pair-wise correlation for Pakistan GDP Growth Data.
Table 1. Estimated MSE values.

n = 20, p = 4
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 1.8362 | 3.5113 | 15.8826 | 152.2112 | 45.0622 | 87.6211 | 417.4361 | 3833.6983 | 179.9323 | 355.0864 | 1645.8552 | 15,732.7017 |
| HK | 0.8333 | 1.1711 | 4.8481 | 47.9894 | 14.3550 | 27.0577 | 129.1389 | 1122.3609 | 56.4732 | 114.0342 | 502.9027 | 4881.4484 |
| HKB | 0.6055 | 0.9527 | 3.9860 | 34.7457 | 10.8778 | 20.9157 | 101.7892 | 911.0331 | 43.4185 | 88.4750 | 396.3767 | 3795.4800 |
| KAM | 0.3290 | 0.5092 | 1.5810 | 8.1366 | 3.4673 | 5.1092 | 16.5697 | 72.6402 | 10.2609 | 15.9687 | 46.8799 | 203.0419 |
| SCKQ0.95 | 1.2378 | 2.2402 | 9.4886 | 88.5608 | 26.4717 | 51.2289 | 247.3543 | 2233.1539 | 105.9365 | 210.3253 | 967.9444 | 9300.4372 |
| LCTPR | 0.0995 | 0.1578 | 0.3129 | 0.1310 | 1.4465 | 1.1395 | 0.6955 | 0.4573 | 6.8328 | 6.0188 | 3.9294 | 4.3786 |
| TKTPR | 1.1215 | 0.7556 | 0.4024 | 0.2646 | 7.2641 | 7.8466 | 7.4716 | 6.2473 | 26.9412 | 30.4845 | 31.2074 | 38.1610 |
| CARE1 | 0.1322 | 0.0906 | 0.0654 | 0.0628 | 4.1509 | 2.7174 | 0.9605 | 0.8109 | 25.4370 | 21.6214 | 9.5261 | 6.5962 |
| CARE2 | 0.1429 | 0.1367 | 0.1381 | 0.1316 | 0.9858 | 0.7929 | 0.6212 | 0.5622 | 6.3830 | 5.6146 | 3.8261 | 4.4869 |
| CARE3 | 0.0867 | 0.0824 | 0.0817 | 0.0810 | 0.8174 | 0.6891 | 0.5634 | 0.5032 | 5.4058 | 5.0510 | 3.6864 | 4.4278 |
| AATPR | 0.0174 | 0.0149 | 0.0137 | 0.0127 | 0.7082 | 0.6205 | 0.4977 | 0.4343 | 5.1251 | 4.9128 | 3.6279 | 4.3467 |

n = 50, p = 4
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 0.5258 | 1.0576 | 5.3571 | 54.5584 | 13.2727 | 26.3312 | 135.4197 | 1366.6038 | 53.6220 | 106.5930 | 533.8377 | 5300.7284 |
| HK | 0.2632 | 0.4217 | 1.9482 | 17.8778 | 4.5097 | 8.7267 | 45.5868 | 444.5844 | 17.8144 | 35.8653 | 172.0658 | 1717.6850 |
| HKB | 0.2491 | 0.3661 | 1.3790 | 13.2187 | 3.4504 | 6.5286 | 32.6977 | 325.2639 | 13.7083 | 25.5072 | 126.9709 | 1228.5303 |
| KAM | 0.1554 | 0.2478 | 0.8421 | 4.4578 | 1.6966 | 2.7746 | 8.3490 | 38.5316 | 4.7656 | 7.4755 | 19.9668 | 85.8998 |
| SCKQ0.95 | 0.4268 | 0.7900 | 3.4892 | 33.7170 | 8.3959 | 16.3983 | 83.2759 | 837.3630 | 33.6236 | 65.8549 | 326.6111 | 3221.9677 |
| LCTPR | 0.0405 | 0.0717 | 0.2386 | 0.1515 | 0.6154 | 0.5127 | 0.2898 | 0.1448 | 2.2945 | 1.8433 | 1.4823 | 2.9325 |
| TKTPR | 0.8953 | 0.4800 | 0.2729 | 0.0150 | 3.7143 | 3.3983 | 3.9441 | 2.4280 | 12.6473 | 14.4394 | 13.7694 | 10.6069 |
| CARE1 | 0.1505 | 0.1050 | 0.0586 | 0.0557 | 3.6419 | 2.0650 | 0.3689 | 0.1798 | 22.5815 | 19.1777 | 7.1728 | 4.2852 |
| CARE2 | 0.1286 | 0.1289 | 0.1267 | 0.1328 | 0.4092 | 0.2916 | 0.2758 | 0.2524 | 3.1855 | 2.1821 | 1.5991 | 3.0491 |
| CARE3 | 0.0739 | 0.0731 | 0.0731 | 0.0730 | 0.2659 | 0.2221 | 0.2224 | 0.1973 | 1.9000 | 1.5492 | 1.4489 | 2.9763 |
| AATPR | 0.0062 | 0.0055 | 0.0052 | 0.0052 | 0.1874 | 0.1555 | 0.1541 | 0.1278 | 1.5158 | 1.3634 | 1.3647 | 2.9175 |
Table 2. Estimated MSE values.

n = 100, p = 4
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 0.2570 | 0.5065 | 2.4764 | 24.6227 | 6.4083 | 12.4000 | 60.6824 | 612.6839 | 25.1861 | 49.8313 | 254.2551 | 2426.8792 |
| HK | 0.2059 | 0.3419 | 1.0234 | 7.5382 | 2.1968 | 4.0531 | 18.5939 | 188.4008 | 8.1679 | 14.9283 | 80.7834 | 772.3286 |
| HKB | 0.1353 | 0.2162 | 0.6778 | 5.9931 | 1.6787 | 2.9598 | 13.8958 | 139.5057 | 6.0805 | 11.6390 | 58.6887 | 546.1121 |
| KAM | 0.0827 | 0.1196 | 0.3737 | 2.0074 | 0.8739 | 1.3584 | 3.9091 | 19.2621 | 2.3715 | 3.7187 | 11.0485 | 43.1370 |
| SCKQ0.95 | 0.2201 | 0.3991 | 1.6230 | 14.6176 | 4.0146 | 7.5134 | 35.3834 | 357.6920 | 15.2078 | 29.4572 | 150.0062 | 1414.2695 |
| LCTPR | 0.0161 | 0.0259 | 0.1009 | 0.2247 | 0.3586 | 0.4036 | 0.2664 | 0.0978 | 1.0976 | 0.8775 | 0.5415 | 0.3817 |
| TKTPR | 0.4442 | 0.2361 | 0.1544 | 0.0944 | 2.5330 | 2.3568 | 1.4519 | 1.8318 | 6.4530 | 6.3541 | 6.9808 | 9.4470 |
| CARE1 | 0.1308 | 0.1125 | 0.0608 | 0.0527 | 2.3889 | 1.8403 | 0.3252 | 0.1176 | 11.9054 | 10.6805 | 3.9030 | 1.6847 |
| CARE2 | 0.1293 | 0.1250 | 0.1233 | 0.1240 | 0.3033 | 0.2172 | 0.1912 | 0.1846 | 1.6227 | 0.8353 | 0.5337 | 0.4855 |
| CARE3 | 0.0720 | 0.0708 | 0.0707 | 0.0705 | 0.1587 | 0.1447 | 0.1367 | 0.1323 | 0.7362 | 0.5431 | 0.4390 | 0.4312 |
| AATPR | 0.0034 | 0.0030 | 0.0025 | 0.0026 | 0.0845 | 0.0733 | 0.0673 | 0.0646 | 0.5099 | 0.4331 | 0.3601 | 0.3627 |

n = 20, p = 10
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 6.6317 | 13.6982 | 70.4922 | 716.8289 | 169.3331 | 346.0870 | 1782.2582 | 17,600.3318 | 677.8317 | 1383.1872 | 6913.3797 | 71,815.1929 |
| HK | 2.9524 | 5.7521 | 28.0653 | 286.5865 | 69.5586 | 140.8331 | 720.7539 | 7032.2026 | 281.7812 | 552.9646 | 2719.7440 | 28,930.2631 |
| HKB | 1.2841 | 2.6443 | 13.2939 | 132.2263 | 31.2095 | 64.5457 | 331.3636 | 3094.3952 | 129.5274 | 248.3826 | 1233.6230 | 12,950.3538 |
| KAM | 0.4444 | 0.7910 | 3.4732 | 26.1944 | 7.6442 | 14.2231 | 59.9381 | 464.2617 | 26.9998 | 46.8631 | 191.5989 | 1504.4142 |
| SCKQ0.95 | 4.8117 | 9.6667 | 48.8921 | 494.5384 | 118.3918 | 241.1137 | 1238.0082 | 12,147.2933 | 478.4283 | 957.7400 | 4751.4170 | 49,598.4219 |
| LCTPR | 0.0258 | 0.0401 | 0.1614 | 0.2428 | 0.6670 | 0.6399 | 0.4452 | 0.1843 | 5.4863 | 4.1882 | 2.7955 | 1.7175 |
| TKTPR | 0.7078 | 0.7119 | 0.3592 | 0.9986 | 17.6250 | 17.6573 | 18.3583 | 7.0454 | 93.8814 | 102.1094 | 104.3437 | 104.6398 |
| CARE1 | 0.1847 | 0.1459 | 0.1321 | 0.1331 | 2.8140 | 1.0801 | 0.4880 | 0.2621 | 31.0289 | 19.4319 | 7.6854 | 3.8560 |
| CARE2 | 0.3099 | 0.3051 | 0.3186 | 0.3077 | 0.5926 | 0.5236 | 0.5095 | 0.4324 | 4.9636 | 3.8462 | 2.8812 | 1.9929 |
| CARE3 | 0.1785 | 0.1767 | 0.1751 | 0.1752 | 0.4419 | 0.3802 | 0.3728 | 0.3040 | 4.3917 | 3.5482 | 2.7100 | 1.8518 |
| AATPR | 0.0079 | 0.0062 | 0.0054 | 0.0055 | 0.2612 | 0.2063 | 0.2032 | 0.1317 | 4.1705 | 3.3554 | 2.5351 | 1.6818 |
Table 3. Estimated MSE values.

n = 50, p = 10
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 2.9814 | 5.9091 | 30.2173 | 293.9743 | 74.9081 | 149.7640 | 749.5377 | 7316.5106 | 299.1009 | 605.8387 | 3001.4525 | 29,655.1073 |
| HK | 1.5839 | 2.7282 | 12.4960 | 118.4449 | 31.2931 | 62.7277 | 302.8598 | 2991.1756 | 125.9605 | 256.1529 | 1214.7661 | 12,230.3712 |
| HKB | 0.6930 | 1.2022 | 5.5818 | 54.0048 | 13.8436 | 27.6069 | 133.7457 | 1321.6812 | 55.2758 | 110.6987 | 537.9695 | 5258.8791 |
| KAM | 0.2326 | 0.4288 | 1.7510 | 12.8683 | 4.1124 | 7.4044 | 29.2760 | 205.6524 | 13.5885 | 25.4599 | 94.2310 | 668.3306 |
| SCKQ0.95 | 2.3001 | 4.4079 | 21.7677 | 209.4354 | 54.3347 | 107.4687 | 532.1998 | 5198.5170 | 216.1079 | 434.7755 | 2135.2160 | 21,030.5045 |
| LCTPR | 0.0104 | 0.0180 | 0.0650 | 0.1659 | 0.2569 | 0.2742 | 0.2039 | 0.0778 | 0.9744 | 1.0692 | 0.4458 | 0.2233 |
| TKTPR | 0.2106 | 0.1638 | 0.2925 | 0.1781 | 2.8498 | 2.8324 | 2.7760 | 1.0026 | 30.1299 | 31.5015 | 26.6009 | 27.5152 |
| CARE1 | 0.1816 | 0.1475 | 0.1267 | 0.1264 | 1.8683 | 0.6985 | 0.2031 | 0.1764 | 15.4161 | 7.3045 | 2.2079 | 0.6906 |
| CARE2 | 0.3075 | 0.3089 | 0.3063 | 0.3055 | 0.3819 | 0.3656 | 0.3749 | 0.3525 | 1.0329 | 1.1241 | 0.6468 | 0.5157 |
| CARE3 | 0.1720 | 0.1715 | 0.1717 | 0.1720 | 0.2359 | 0.2297 | 0.2233 | 0.2213 | 0.7951 | 0.9410 | 0.5097 | 0.3786 |
| AATPR | 0.0026 | 0.0023 | 0.0020 | 0.0021 | 0.0636 | 0.0580 | 0.0522 | 0.0505 | 0.6045 | 0.7614 | 0.3405 | 0.2081 |

n = 100, p = 10
| Estimators | σ = 1, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 5, ρ = 0.90 | 0.95 | 0.99 | 0.999 | σ = 10, ρ = 0.90 | 0.95 | 0.99 | 0.999 |
| OLS | 1.1983 | 2.4045 | 12.6309 | 124.5025 | 29.5003 | 60.0620 | 305.8644 | 3126.2703 | 118.6617 | 246.1632 | 1271.5450 | 12,568.4173 |
| HK | 0.7792 | 0.9844 | 5.2113 | 48.9885 | 12.0080 | 24.6107 | 120.1917 | 1213.0038 | 47.7277 | 99.6658 | 516.0402 | 5001.0816 |
| HKB | 0.3454 | 0.5436 | 2.3016 | 22.3026 | 5.3537 | 10.6730 | 54.3188 | 551.2782 | 21.5560 | 44.3498 | 219.9894 | 2096.1307 |
| KAM | 0.1025 | 0.1854 | 0.7939 | 6.0506 | 1.7590 | 3.1817 | 12.8908 | 95.7961 | 5.7921 | 10.8001 | 44.0593 | 309.8799 |
| SCKQ0.95 | 0.9588 | 1.8151 | 8.9589 | 86.5187 | 20.9018 | 42.0719 | 211.3100 | 2154.8682 | 83.7742 | 171.9511 | 879.9181 | 8666.4791 |
| LCTPR | 0.0043 | 0.0071 | 0.0264 | 0.1534 | 0.1175 | 0.1545 | 0.1855 | 0.0693 | 0.4038 | 0.3576 | 0.2261 | 0.1205 |
| TKTPR | 0.0898 | 0.3280 | 0.1531 | 0.0233 | 1.0682 | 1.1046 | 0.8237 | 0.3165 | 10.2925 | 9.8648 | 10.0298 | 10.0114 |
| CARE1 | 0.2147 | 0.1599 | 0.1262 | 0.1234 | 2.5535 | 1.0911 | 0.2041 | 0.1526 | 14.4240 | 6.0731 | 0.7572 | 0.2278 |
| CARE2 | 0.3010 | 0.3059 | 0.2990 | 0.2998 | 0.3479 | 0.3348 | 0.3305 | 0.3304 | 0.5328 | 0.4458 | 0.4052 | 0.4110 |
| CARE3 | 0.1711 | 0.1707 | 0.1706 | 0.1706 | 0.2046 | 0.1969 | 0.1973 | 0.1952 | 0.3511 | 0.2932 | 0.2750 | 0.2693 |
| AATPR | 0.0013 | 0.0012 | 0.0010 | 0.0010 | 0.0334 | 0.0279 | 0.0266 | 0.0259 | 0.1719 | 0.1240 | 0.1040 | 0.1006 |
Table 4. EMSE and regression coefficients of manufacturing sector data.

| Estimators | OLS | HK | HKB | KAM | SCKQ0.95 | LCTPR | TKTPR | CARE1 | CARE2 | CARE3 | AATPR |
| Amount of biasedness | - | 0.09 | 0.04 | 0.05 | 0.08 | 0.03 | 3.89 | 25.07 | 1258.76 | 18,464.79 | 44.46 |
| MSE | 3.484 | 0.501 | 0.469 | 0.475 | 0.498 | 0.480 | 6.308 | 0.824 | 0.885 | 0.887 | 0.215 |
| 𝛶1 | 0.2079 | −0.5738 | −0.5741 | −0.5740 | −0.5738 | −0.5745 | −0.5741 | −0.5743 | −0.5743 | −0.5743 | −0.5743 |
| 𝛶2 | 0.9205 | −0.5169 | −0.5915 | −0.5727 | −0.5239 | −0.6142 | 0.0527 | −0.0100 | −0.0024 | −0.0022 | −0.0022 |
| 𝛶3 | −0.134 | −0.2314 | −0.2911 | −0.2750 | −0.2366 | −0.3116 | 0.0139 | −0.0028 | −0.0007 | −0.0006 | −0.0006 |
Table 5. EMSE and regression coefficients of Pakistan GDP Growth Data.

| Estimators | OLS | HK | HKB | KAM | SCKQ0.95 | LCTPR | TKTPR | CARE1 | CARE2 | CARE3 | AATPR |
| Amount of biasedness | - | 0.08 | 0.06 | 437.08 | 2265.26 | 0.01 | 1.61 | 2282.73 | 10,421,695.90 | 11,898.13 | 5.89 |
| MSE | 1,343,764 | 17,916.52 | 15,019.98 | 14,639.78 | 14,639.79 | 17,962.52 | 14,645.46 | 14,639.76 | 14,639.76 | 14,639.76 | 14,139.21 |
| 𝛶1 | −48.1081 | −0.0185 | −0.0185 | −0.0028 | −0.0006 | −0.0186 | −0.0136 | −0.0994 | −0.1015 | −0.1015 | −0.1291 |
| 𝛶2 | 100.2213 | −0.4462 | −0.4462 | −0.0145 | −0.0029 | −0.4492 | −0.3602 | −0.4605 | −0.4573 | −0.4573 | −0.6795 |
| 𝛶3 | −3.1256 | 0.5814 | 0.5814 | 0.0126 | 0.0025 | 0.5852 | 0.5014 | 0.3955 | 0.3918 | 0.3918 | 0.4053 |
| 𝛶4 | −2.5076 | 0.4883 | 0.4880 | 0.0012 | 0.0002 | 0.4914 | −0.6805 | 0.0367 | 0.0362 | 0.0362 | 0.3600 |
| 𝛶5 | −47.5806 | 3.4464 | 3.3672 | 0.0002 | 0.0000 | 3.4688 | −0.0373 | 0.0058 | 0.0058 | 0.0058 | 0.0061 |
| 𝛶6 | −0.9127 | −1.7387 | −0.7667 | 0.0000 | 0.0000 | −1.7500 | 0.0003 | −0.0001 | −0.0001 | −0.0001 | −0.0333 |
| 𝛶7 | 0.4716 | −14.1422 | −2.9307 | 0.0000 | 0.0000 | −14.2342 | 0.0009 | −0.0001 | −0.0001 | −0.0001 | −0.0181 |
| 𝛶8 | −0.2461 | 16.6068 | 2.6074 | 0.0000 | 0.0000 | 16.7149 | −0.0008 | 0.0001 | 0.0001 | 0.0001 | 0.1644 |
