Article

Modified Liu Parameters for Scaling Options of the Multiple Regression Model with Multicollinearity Problem

by
Autcha Araveeporn
Department of Statistics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
Mathematics 2024, 12(19), 3139; https://doi.org/10.3390/math12193139
Submission received: 22 August 2024 / Revised: 27 September 2024 / Accepted: 5 October 2024 / Published: 7 October 2024
(This article belongs to the Special Issue Application of Regression Models, Analysis and Bayesian Statistics)

Abstract:
The multiple regression model is a statistical technique employed to analyze the relationship between a dependent variable and several independent variables. Multicollinearity, which arises from relationships among the independent variables, is one of the issues affecting the multiple regression model. Ordinary least squares (OLS) is the standard method for estimating the parameters of the regression model, but multicollinearity renders the OLS estimator unstable. Liu regression, which approximates the Liu estimator based on the Liu parameter, has been proposed to overcome multicollinearity. In this paper, we propose modified Liu parameters to estimate the biasing parameter under several scaling options, comparing the ordinary least squares estimator with two modified Liu parameters and six standard Liu parameters. To assess the performance of the modified Liu parameters, independent variables were generated from a multivariate normal distribution with a Toeplitz correlation pattern as the multicollinearity data, and the dependent variable was obtained by multiplying the independent variables by regression coefficients and adding errors from the normal distribution. The mean absolute percentage error was computed as the evaluation criterion of the estimation. For application, a real Hepatitis C patient dataset was used to investigate the benefit of the modified Liu parameters. Through simulation and real dataset analysis, the results indicate that the modified Liu parameter outperformed the other Liu parameters and the ordinary least squares estimator. The modified Liu parameter can therefore be recommended for estimating parameters when the independent variables exhibit multicollinearity.
MSC:
62J02; 62J05; 62J07; 62J20; 62P25

1. Introduction

Regression analysis is a potent statistical tool that reveals the connections between one or more independent variables and a dependent variable. Essential in data analysis and predictive modeling, it finds broad application across fields such as economics, finance, healthcare, and social sciences. However, regression models must meet certain assumptions to provide reliable and valid results. These assumptions form the foundation of regression analysis and guide researchers in interpreting results accurately. One violation to avoid is a linear relationship among the independent variables, called multicollinearity, which occurs when two or more independent variables are correlated and inflates the standard errors of the coefficients. This escalation in standard errors can render the coefficients of certain independent variables statistically insignificant despite their potential importance. In essence, multicollinearity distorts the interpretation of variables by inflating their standard errors [1]. Shrestha [2] discussed the primary techniques for investigating multicollinearity using questionnaire-based survey data on customer satisfaction.
The Toeplitz correlation structure is a specific type of correlation pattern frequently appearing in real-world datasets such as financial time series data, spatial models, climate data, correlation in DNA sequences, and time-dependent traffic patterns. The properties of Toeplitz covariance matrices have been extensively applied across various fields, with early examples found in psychometric and medical research [3]. Furthermore, the Toeplitz correlation structure is a part of multicollinearity, often arising in datasets with variables exhibiting inherent relationships. Qi et al. [4] utilized a multiple-Toeplitz matrix reconstruction method with quadratic spatial smoothing to enhance direction-of-arrival estimation performance for coherent signals under low signal-to-noise ratio conditions.
Traditional regression techniques often struggle to handle multicollinearity effectively, leading to biased results and unreliable predictions. Researchers have developed various methods to mitigate these challenges, including Liu regression, a technique designed to address multicollinearity in regression analysis by combining the principles of ridge regression with orthogonalization. Dawoud et al. [5] devised a novel modified Liu estimator to address multicollinearity in a regression model with a single parameter, incorporating two biasing parameters, with at least one designed to mitigate this issue. Jahufer [6], on the other hand, employed the Liu estimator to alleviate the impact of multicollinearity and the influence of specific observations, devising approximate deletion formulas for identifying influential points.
In predictive analytics, the search for accurate models that can efficiently handle complex datasets while offering robust predictions is perpetual. Among the array of methodologies, the Liu regression model enables better control over the trade-off between bias and variance, leading to more stable and reliable parameter estimates. The flexibility of the Liu estimator makes it a valuable tool in the modern statistician’s toolkit, particularly in fields where predictive accuracy is critical. Karlsson et al. [7] introduced a Liu estimator tailored for the beta regression model with a fixed dispersion parameter, applicable in various practical scenarios where the correlation level among the regressors varies.
Liu regression [8] involves selecting a Liu estimator to balance the bias–variance trade-off. The optimal value of the Liu estimator is typically chosen through techniques such as cross-validation. The Liu estimator, named after its developer, is essential in managing multicollinearity and is particularly associated with methodologies like ridge regression with orthogonalization, often referred to as Liu regression. Liu [9] enhanced the Liu estimator within the linear regression model by considering the biasing parameter under the prediction sum-of-squares criterion. Yang and Xu [10] proposed an alternative stochastic restricted Liu estimator for the parameter vector in a linear regression model, incorporating additional stochastic linear restrictions. Hubert and Wijekoon [11] investigated a novel Liu-type biased estimator, termed the stochastic restricted Liu estimator, and examined its efficiency.
To improve the Liu estimator, the multiple regression model is transformed to canonical form [12] in order to select a biasing parameter called the Liu parameter. Appropriate Liu parameters have been developed to obtain the minimum mean squares error in estimation. Liu [8,9] applied an iterative method to choose the Liu parameter that minimizes the mean squares error of the Liu estimator. Özkale and Kaçiranlar [13] proposed a new restricted Liu parameter by computing the predicted residual error sum of squares to determine the biasing parameter. Dawoud et al. [5] proposed a new Liu estimator using the known mean squares error criterion to handle the multicollinearity problem. Suhail et al. [14] developed a new method of biasing parameters to mitigate multicollinear data. Lukman et al. [15] introduced a modified Liu estimator to address multicollinearity issues within the linear regression model.
In this paper, we propose two competing Liu parameters, following mean squares error and R-squared approaches, to estimate the Liu estimator via a multiple regression model with the multicollinearity problem. We measure this performance using the minimum average mean absolute percentage errors for the simulation and real dataset. We also consider the scale option of independent variables including the center, correlation form, and standardization.
This paper is structured as follows: Section 2 presents the multiple regression estimators and discusses the Liu estimator through the reparameterization of Liu regression in canonical form, then compares it with the OLS estimator. Section 3 describes the generation of the independent and dependent variables used to evaluate the estimators' performance. Section 4 applies a real dataset to validate the simulation results. Section 5 discusses the findings, followed by the conclusion in Section 6.

2. Liu Regression

The multiple regression model is expressed in matrix form as follows:
$$ y = X\beta + \varepsilon, $$
where $y$ is the $n \times 1$ column vector of the dependent variable, $X$ is the $n \times (p+1)$ matrix of independent variables, $\beta$ is the $(p+1) \times 1$ vector of multiple regression parameters, and $\varepsilon$ is the $n \times 1$ error vector. The errors are assumed to satisfy $E(\varepsilon) = 0$ and $Var(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n$. The parameters $\beta$ in (1) are commonly estimated by the ordinary least squares (OLS) estimator in (2), as follows:
$$ \hat{\beta}_{OLS} = (X'X)^{-1}X'y. $$
The estimation error of β ^ O L S is evaluated by computing:
$$ \hat{\beta}_{OLS} - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon. $$
The bias, variance (Var), and mean squares error (MSE) of the OLS estimator are computed from (3) as follows:
$$ \begin{aligned} Bias(\hat{\beta}_{OLS}) &= E(\hat{\beta}_{OLS} - \beta) = E[(X'X)^{-1}X'\varepsilon] = 0, \\ Var(\hat{\beta}_{OLS}) &= E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = \sigma^2 (X'X)^{-1}, \\ MSE(\hat{\beta}_{OLS}) &= Var(\hat{\beta}_{OLS}) + Bias(\hat{\beta}_{OLS})'\,Bias(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}. \end{aligned} $$
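As a concrete illustration (a Python/NumPy sketch; the paper's own computations were run in R), the OLS estimator in (2) and the residual-based error-variance estimate can be computed on synthetic, purely illustrative data:

```python
import numpy as np

# Synthetic data: intercept plus three regressors (illustrative only).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)

# OLS estimator beta_hat = (X'X)^{-1} X'y; solve the normal equations
# rather than explicitly inverting X'X.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Estimated error variance from the residual sum of squares.
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - X.shape[1])
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the inverse, although under severe multicollinearity $X'X$ becomes near-singular and the solution is unstable, which is exactly the problem the Liu estimator targets.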
Hoerl and Kennard [16] proposed ridge regression, a powerful technique for handling multicollinearity in linear regression models. Ridge regression addresses the issue by adding a penalty term to the ordinary least squares (OLS) estimation process, shrinking the coefficients towards zero. This regularization helps reduce model complexity and improve prediction accuracy. The ridge regression estimator is expressed as follows:
$$ \hat{\beta}_{Ridge} = (X'X + \lambda I_p)^{-1}X'y, $$
where λ is the regularization parameter controlling the shrinkage amount.
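The ridge estimator can be sketched in the same way ($\lambda = 1$ is an arbitrary illustrative choice, not a recommended value):

```python
import numpy as np

# Ridge estimator beta_ridge = (X'X + lambda I)^{-1} X'y on synthetic data.
rng = np.random.default_rng(1)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.0, 0.8]) + rng.normal(size=n)

lam = 1.0  # regularization parameter controlling the shrinkage amount
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# The penalty shrinks the coefficients: ||beta_ridge|| < ||beta_ols||.
```

The shrinkage property follows from the spectral decomposition of $X'X$: each canonical component of the OLS solution is multiplied by $\lambda_j/(\lambda_j + \lambda) < 1$.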
From the above computation, the OLS estimator is unbiased, but multicollinearity among the independent variables degrades its performance: the near-singularity of $X'X$ inflates the estimated variance and mean squares error. To overcome this problem, Liu [8] proposed the Liu estimator, which performs better than the OLS estimator [13,17]. The Liu estimator is written in terms of the OLS estimator as follows:
$$ \hat{\beta}_{Liu} = (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)\hat{\beta}_{OLS}, \quad 0 < d_{Liu} < 1, $$
where $d_{Liu}$ is the Liu parameter, serving as the biasing parameter, and $I_p$ is the identity matrix. Both the OLS estimator in (2) and the Liu estimator in (5) are affected by multicollinearity among the independent variables, because the Liu estimator depends on the OLS estimator.
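A sketch of the Liu estimator in (5), built on top of the OLS estimator as above (Python, synthetic data; $d = 0.5$ is an arbitrary illustrative value):

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu estimator: (X'X + I)^{-1} (X'X + d I) beta_OLS."""
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ beta_ols)

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
beta_liu = liu_estimator(X, y, d=0.5)
```

Note that setting $d = 1$ recovers the OLS estimator exactly, since $(X'X + I_p)^{-1}(X'X + I_p) = I_p$; smaller $d$ means stronger shrinkage.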
The estimation error of $\hat{\beta}_{Liu}$ is evaluated, as for the OLS estimator, by comparing the Liu estimator with the parameter of the multiple regression model:
$$ \hat{\beta}_{Liu} - \beta = (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)\hat{\beta}_{OLS} - \beta. $$
The bias [18], variance (Var), and mean square error (MSE) of the Liu estimator from (6) are proposed in the following:
$$ \begin{aligned} Bias(\hat{\beta}_{Liu}) &= E(\hat{\beta}_{Liu} - \beta) = (d_{Liu} - 1)(X'X + I_p)^{-1}\beta, \\ Var(\hat{\beta}_{Liu}) &= E[(\hat{\beta}_{Liu} - \beta)(\hat{\beta}_{Liu} - \beta)'] = \sigma^2 (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)(X'X)^{-1}(X'X + d_{Liu} I_p)(X'X + I_p)^{-1}, \\ MSE(\hat{\beta}_{Liu}) &= Var(\hat{\beta}_{Liu}) + Bias(\hat{\beta}_{Liu})'\,Bias(\hat{\beta}_{Liu}) \\ &= \sigma^2 (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)(X'X)^{-1}(X'X + d_{Liu} I_p)(X'X + I_p)^{-1} + (d_{Liu} - 1)^2\,\beta'(X'X + I_p)^{-2}\beta. \end{aligned} $$
The Liu estimator is thus a biased estimator, and its variance is smaller than that of the OLS estimator when $d_{Liu}$ lies between zero and one. Subsequently, Liu [9] developed the shrinkage factor [19] to create a Liu parameter that may lie outside the range between zero and one. In the following subsection, the multiple regression model is transformed into canonical form to estimate the OLS and Liu estimators.

2.1. The Reparameterization of Liu Regression

The reparameterization of Liu regression transforms a multiple regression model into a canonical form, offering valuable insights into variable relationships and enhancing predictive accuracy [19]. The optimal Liu parameter is determined by minimizing the mean squares error. Akdeniz and Kaçıranlar [20] introduced a new biased estimator and assessed its performance against a restricted least squares estimator regarding mean squares error. The canonical form used to compare the Liu estimator's performance is expressed as follows:
$$ y = Z\alpha + \varepsilon, $$
where $Z = XG$, $\alpha = G'\beta$, and $Z'Z = G'X'XG = \Lambda$, with $G$ the orthogonal matrix of eigenvectors of $X'X$ and $\Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_p)$ the diagonal matrix of its eigenvalues. The OLS estimator in canonical form can be defined as follows:
$$ \hat{\alpha}_{OLS} = \Lambda^{-1}Z'y. $$
Similarly, the Liu estimator [21] can be written as follows:
$$ \hat{\alpha}_{R.Liu} = (\Lambda + I_p)^{-1}(Z'y + d_{R.Liu}\,\hat{\alpha}_{OLS}) = (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\hat{\alpha}_{OLS}. $$
The bias, variance (Var), and mean square error (MSE) of the reparameterization of the OLS estimator from (8) are expressed as follows:
$$ \begin{aligned} Bias(\hat{\alpha}_{OLS}) &= E(\hat{\alpha}_{OLS} - \alpha) = E[(Z'Z)^{-1}Z'\varepsilon] = 0, \\ Var(\hat{\alpha}_{OLS}) &= E[(\hat{\alpha}_{OLS} - \alpha)(\hat{\alpha}_{OLS} - \alpha)'] = E[(Z'Z)^{-1}Z'\varepsilon\varepsilon'Z(Z'Z)^{-1}] = \sigma^2 (Z'Z)^{-1} = \sigma^2\Lambda^{-1}, \\ MSE(\hat{\alpha}_{OLS}) &= Var(\hat{\alpha}_{OLS}) + Bias(\hat{\alpha}_{OLS})'\,Bias(\hat{\alpha}_{OLS}) = \sigma^2\Lambda^{-1}. \end{aligned} $$
The bias, variance (Var), and mean square error (MSE) of the reparameterization of the Liu estimator from (9) are proposed in the following:
$$ \begin{aligned} Bias(\hat{\alpha}_{R.Liu}) &= E(\hat{\alpha}_{R.Liu} - \alpha) = (d_{R.Liu} - 1)(\Lambda + I_p)^{-1}\alpha, \\ Var(\hat{\alpha}_{R.Liu}) &= E[(\hat{\alpha}_{R.Liu} - \alpha)(\hat{\alpha}_{R.Liu} - \alpha)'] = \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1}, \\ MSE(\hat{\alpha}_{R.Liu}) &= Var(\hat{\alpha}_{R.Liu}) + Bias(\hat{\alpha}_{R.Liu})'\,Bias(\hat{\alpha}_{R.Liu}) \\ &= \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1} + (d_{R.Liu} - 1)^2\,\alpha'(\Lambda + I_p)^{-2}\alpha. \end{aligned} $$
Furthermore, the bias, variance, and mean squares error are given by Equations (13), (14), and (15), respectively:
$$ Bias(\hat{\alpha}_{R.Liu}) = (d_{R.Liu} - 1)\sum_{j=1}^{p}\frac{\alpha_j}{\lambda_j + 1}, $$
$$ Var(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}, $$
$$ MSE(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2} + (d_{R.Liu} - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}. $$
The OLS and Liu estimators in canonical form were compared by considering the variance and MSE.
Given $\hat{\alpha}_{OLS}$ and $\hat{\alpha}_{R.Liu}$, $\hat{\alpha}_{R.Liu}$ is the better estimator, that is, $MSE(\hat{\alpha}_{OLS}) - MSE(\hat{\alpha}_{R.Liu}) > 0$, if and only if $Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) > 0$.
Recall that:
$$ Var(\hat{\alpha}_{OLS}) = \sigma^2\Lambda^{-1} \quad \text{and} \quad Var(\hat{\alpha}_{R.Liu}) = \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1}. $$
Then:
$$ Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) = \sigma^2\Lambda^{-1} - \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1} = \sigma^2\,diag\!\left[\frac{1}{\lambda_j} - \frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}\right] > 0, \quad j = 1, \ldots, p. $$
It can be observed that $\frac{1}{\lambda_j} > \frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}$ when $0 < d_{R.Liu} < 1$, since then $(\lambda_j + d_{R.Liu})^2 < (\lambda_j + 1)^2$. It can be concluded that $Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) > 0$, so the Liu estimator outperforms the OLS estimator.
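The elementwise variance comparison above can be checked numerically; the eigenvalues below are arbitrary illustrative values, with a very small eigenvalue mimicking severe multicollinearity:

```python
import numpy as np

# Var(alpha_OLS) - Var(alpha_R.Liu), evaluated per eigenvalue of X'X.
lam = np.array([5.0, 1.0, 0.2, 0.01])  # illustrative eigenvalues; small ones signal multicollinearity
sigma2 = 1.0
d = 0.5  # any value in (0, 1)

var_ols = sigma2 / lam
var_liu = sigma2 * (lam + d) ** 2 / (lam * (lam + 1) ** 2)
diff = var_ols - var_liu  # positive for every eigenvalue when 0 < d < 1
```

The gain is largest for the smallest eigenvalues, which is precisely where multicollinearity inflates the OLS variance.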

2.2. Liu Parameter

As shown in the above subsection, the reparameterization of Liu regression yields an estimator that outperforms OLS. However, using the Liu estimator requires selecting an appropriate Liu parameter, a problem first addressed by Liu [8] and developed for other models by Suhail et al. [14], Lukman et al. [15], Abdelwahab et al. [22], and Babar et al. [23]. The optimal Liu parameter is the one that minimizes the mean squares error (MSE), which is inflated by collinearity among the independent variables; tracing the diagonal matrix of the transformation is useful for calculating it. In this article, we first review versions of the original Liu parameter proposed by Liu [8], defined according to the optimum (opt), the minimum MSE (mm), and the CL criterion (cl), respectively, as follows:
From (15), the mean squares error (MSE) of the Liu estimator as a function of $d_{R.Liu}$ is given by:
$$ MSE(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2} + (d_{R.Liu} - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}. $$
Now, we differentiate the MSE with respect to $d = d_{R.Liu}$. This involves differentiating both the variance and bias terms as follows:
$$ g(d) = \frac{\partial}{\partial d}\left(\sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} + (d - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}\right) = 2\sigma^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} + 2(d - 1)\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} = 0. $$
Solving Equation (17) for $d$ yields the optimal value:
$$ \begin{aligned} 2\sigma^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} &= 2(1 - d)\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} \\ \sigma^2\sum_{j=1}^{p}\frac{1}{(\lambda_j + 1)^2} + d\,\sigma^2\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)^2} &= \sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} - d\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} \\ d\sum_{j=1}^{p}\frac{\sigma^2 + \lambda_j\alpha_j^2}{\lambda_j(\lambda_j + 1)^2} &= \sum_{j=1}^{p}\frac{\alpha_j^2 - \sigma^2}{(\lambda_j + 1)^2} \\ d &= \frac{\displaystyle\sum_{j=1}^{p}\frac{\alpha_j^2 - \sigma^2}{(\lambda_j + 1)^2}}{\displaystyle\sum_{j=1}^{p}\frac{\sigma^2 + \lambda_j\alpha_j^2}{\lambda_j(\lambda_j + 1)^2}}. \end{aligned} $$
After solving, the d o p t is given by:
$$ d_{opt} = \frac{\displaystyle\sum_{j=1}^{p}\frac{\hat{\alpha}_j^2 - \hat{\sigma}^2}{(\lambda_j + 1)^2}}{\displaystyle\sum_{j=1}^{p}\frac{\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2}}, $$
where $\hat{\sigma}^2 = \frac{(y - X\hat{\beta}_{OLS})'(y - X\hat{\beta}_{OLS})}{n - p}$ is the estimated variance of the error term in the regression model and $\hat{\alpha} = \Lambda^{-1}Z'y$ is the vector of estimated coefficients in the canonical form of the Liu regression model.
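The expression for $d_{opt}$ above can be sketched directly from its ingredients; the eigenvalues, canonical coefficients, and error variance below are arbitrary illustrative inputs, not estimates from any real dataset:

```python
import numpy as np

def d_opt(lam, alpha_hat, sigma2_hat):
    """d_opt = sum((a_j^2 - s2)/(l_j+1)^2) / sum((s2 + l_j a_j^2)/(l_j (l_j+1)^2))."""
    num = np.sum((alpha_hat ** 2 - sigma2_hat) / (lam + 1) ** 2)
    den = np.sum((sigma2_hat + lam * alpha_hat ** 2) / (lam * (lam + 1) ** 2))
    return num / den

lam = np.array([4.0, 1.0, 0.05])       # eigenvalues of X'X (illustrative)
alpha_hat = np.array([2.0, 1.0, 0.5])  # canonical coefficients (illustrative)
d = d_opt(lam, alpha_hat, sigma2_hat=1.0)
```

Note that $d_{opt}$ always lies below one (the numerator minus the denominator is strictly negative) but may itself be negative, consistent with the earlier remark that the Liu parameter can fall outside $(0, 1)$.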
For the minimum MSE criterion, the unbiased estimators $\alpha_j^2 = \hat{\alpha}_j^2 - \hat{\sigma}^2/\lambda_j$ and $\sigma^2 = \hat{\sigma}^2$ are substituted into (17), and the derivative of the MSE with respect to $d$ is set to zero:
$$ \begin{aligned} 2\hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} &= 2(1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2 - \hat{\sigma}^2}{\lambda_j(\lambda_j + 1)^2} \\ \hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j}{\lambda_j(\lambda_j + 1)^2} + \hat{\sigma}^2\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)^2} &= (1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2} \\ (1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2} &= \hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j + 1}{\lambda_j(\lambda_j + 1)^2}. \end{aligned} $$
Minimizing the MSE leads to the following expression for d m m :
$$ d_{mm} = 1 - \hat{\sigma}^2\left[\frac{\displaystyle\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)}}{\displaystyle\sum_{j=1}^{p}\frac{\hat{\alpha}_j^2}{(\lambda_j + 1)^2}}\right]. $$
The CL criterion is used to find the optimal biasing parameter $d$, balancing the trade-off between fitting the data well and keeping the model's complexity under control. The criterion is given by the following formula:
$$ C_{L}(d) = \frac{SS_{Res,d}}{\hat{\sigma}^2} + 2\,\mathrm{trace}(H_d) - (n - 2), $$
where $SS_{Res,d} = (y - Z\hat{\alpha}_{R.Liu})'(y - Z\hat{\alpha}_{R.Liu})$ is the residual sum of squares at a given $d$, $H_d = X(X'X + I_p)^{-1}(X'X + d I_p)(X'X)^{-1}X'$ is the Liu hat matrix, and $\hat{\sigma}^2$ is the estimated variance of the errors.
To find the optimal $d$, we take the derivative of the CL criterion with respect to $d$ and set it to zero: $\frac{\partial C_L(d)}{\partial d} = 0$.
After calculating the derivative and rearranging, as was done for $d_{opt}$ and $d_{mm}$, we obtain the following equation:
$$ d_{cl} = 1 - \hat{\sigma}^2\left[\frac{\displaystyle\sum_{j=1}^{p}\frac{1}{\lambda_j + 1}}{\displaystyle\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{(\lambda_j + 1)^2}}\right]. $$
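The two closed-form parameters $d_{mm}$ and $d_{cl}$ above can be sketched from the same ingredients as $d_{opt}$; the numeric inputs are again arbitrary illustrative values:

```python
import numpy as np

def d_mm(lam, alpha_hat, sigma2_hat):
    """d_mm = 1 - s2 * [sum 1/(l_j(l_j+1))] / [sum a_j^2/(l_j+1)^2]."""
    return 1 - sigma2_hat * np.sum(1 / (lam * (lam + 1))) \
        / np.sum(alpha_hat ** 2 / (lam + 1) ** 2)

def d_cl(lam, alpha_hat, sigma2_hat):
    """d_cl = 1 - s2 * [sum 1/(l_j+1)] / [sum l_j a_j^2/(l_j+1)^2]."""
    return 1 - sigma2_hat * np.sum(1 / (lam + 1)) \
        / np.sum(lam * alpha_hat ** 2 / (lam + 1) ** 2)

lam = np.array([4.0, 1.0, 0.05])       # illustrative eigenvalues
alpha_hat = np.array([2.0, 1.0, 0.5])  # illustrative canonical coefficients
dm = d_mm(lam, alpha_hat, 1.0)
dc = d_cl(lam, alpha_hat, 1.0)
```

Both formulas subtract a positive quantity from one, so each parameter is strictly below one; with very small eigenvalues (severe multicollinearity) they can become strongly negative.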
Furthermore, Liu [9] improved the Liu parameter in multiple linear regression under the approximation of the predicted residual error sum of squares criterion via the improved Liu estimator (ILE) as follows:
$$ d_{ILE} = \frac{\displaystyle\sum_{i=1}^{n}\frac{\tilde{e}_i}{1 - g_{ii}}\left(\frac{\tilde{e}_i}{1 - g_{ii}} - \frac{\hat{e}_i}{1 - h_{ii}}\right)}{\displaystyle\sum_{i=1}^{n}\left(\frac{\tilde{e}_i}{1 - g_{ii}} - \frac{\hat{e}_i}{1 - h_{ii}}\right)^2}, $$
where
$$ \hat{e}_i = y_i - x_i'(X'X - x_i x_i')^{-1}(X'y - x_i y_i), \quad \tilde{e}_i = y_i - x_i'(X'X + I_p - x_i x_i')^{-1}(X'y - x_i y_i), $$
$$ G = X(X'X + I_p)^{-1}X', \quad H = X(X'X)^{-1}X', $$
and $g_{ii} = x_i'(X'X + I_p)^{-1}x_i$, $h_{ii} = x_i'(X'X)^{-1}x_i$.
Özkale and Kaçiranlar [13] introduced a new two-parameter approach by incorporating the contraction estimator, encompassing well-known methods such as restricted least squares, restricted ridge, restricted contraction estimators, and a novel modified, restricted Liu estimator (RLE), which can be written as follows:
$$ d_{RLE} = \sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2, $$
where $h_{ii}$ represents the diagonal elements of the matrix $H$; $g_{ii}$ represents the diagonal elements of the matrix $G$; $\tilde{H}_{d,ii}$ represents the diagonal elements of the Liu hat matrix from (5), with cross-validation implemented to evaluate the MSE for $d$ [24]; and $\hat{e}_{d,i} = y_i - \hat{y}_{d,i}$ is the $i$th residual at a specific value of $d$.
Mallows [25] discussed the interpretation of Cp plots, using the display as a basis for formally selecting a subset-regression model; this approach extends to estimating the Liu estimator. The Liu parameter is defined as follows:
$$ d_{C_p} = \frac{SSR}{\hat{\sigma}^2} + 2\,\mathrm{trace}(\tilde{H}_{d,ii}) - (n - 2), $$
where $SSR = \sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2$.
In this paper, we modify the Liu parameter from Mallows [25] by introducing the mean squares error, obtained as the mean of the sum of squares residual (SSR), as follows:
$$ d_{MSE} = \frac{1}{p}\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2. $$
Furthermore, the coefficient of determination, denoted R-squared ($R^2$), is a critical metric in regression analysis. It quantifies the proportion of the variance in the dependent variable that can be predicted from the independent variables. Given the significance of R-squared, we propose a new Liu parameter by computing $R^2 = 1 - SSR/SST$, which lies in the range between zero and one, rewritten as follows:
$$ d_{R^2} = 1 - \frac{\displaystyle\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2}{\displaystyle\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{1,ii}} - \frac{e_i}{(1 - g_{1,ii})(1 - h_{ii})}(g_{1,ii} - \tilde{G}_{d,ii})\right]^2}, $$
where $\tilde{G}_{d,ii}$ represents the diagonal elements of $I - \tilde{H}_{d,ii}$ and $g_{1,ii}$ represents the diagonal elements of $1 - h_{ii}$.
Scaling options are utilized to standardize the independent variables and assess their performance via the Liu estimator. The initial method, used by Liu [8], is the centered option, which gives the independent variables zero mean. The scaled option further standardizes the independent variables to unit variance. Lastly, the SC option scales the independent variables into correlation form, a concept explored by Belsley [26].
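The three scaling options can be sketched as follows. The paper names the options but not their formulas, so these are the usual textbook definitions, assumed here (Python; the paper's own computations used R):

```python
import numpy as np

def center(X):
    """Centered option: subtract the column means (zero-mean columns)."""
    return X - X.mean(axis=0)

def scale(X):
    """Scaled option: center, then divide by the sample standard deviation."""
    return center(X) / X.std(axis=0, ddof=1)

def correlation_form(X):
    """Correlation-form (SC) option: center, then give each column unit
    length, so that Z'Z is the correlation matrix of the columns of X."""
    Z = center(X)
    return Z / np.sqrt((Z ** 2).sum(axis=0))

# Illustrative data with nonzero mean and non-unit variance.
X = np.random.default_rng(5).normal(loc=3.0, scale=2.0, size=(50, 4))
Z = correlation_form(X)
R = Z.T @ Z  # correlation matrix of the columns of X
```

Working in correlation form is convenient for the canonical decomposition, since the eigenvalues of $Z'Z$ then directly reflect the strength of the collinearity among the standardized regressors.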

3. Simulation Study

In line with the previous section's theoretical comparison among Liu estimators, a Monte Carlo simulation study was conducted using the R 4.2.1 programming language. The objective of the simulation study was to estimate and compare the Liu parameters' performance in the multiple regression model. The independent variables ($\tilde{x}_i$) were generated from a multivariate normal distribution with five, ten, and fifteen independent variables, based on Toeplitz correlation ($\rho$) values of 0.1 and 0.9. The multivariate normal distribution with mean vector $\tilde{\mu}$ and covariance matrix $\Sigma$ was used to simulate multicollinearity between independent variables. The probability density is defined as follows:
$$ f(\tilde{x}_i \mid \tilde{\mu}, \Sigma) = \frac{\exp\left\{-\frac{1}{2}(\tilde{x}_i - \tilde{\mu})^{T}\Sigma^{-1}(\tilde{x}_i - \tilde{\mu})\right\}}{\sqrt{(2\pi)^{p}\,|\Sigma|}}, $$
where $\tilde{x}_i = [x_{i1}\; x_{i2}\; \cdots\; x_{ip}]'$, $\tilde{\mu} = [\mu_1\; \mu_2\; \cdots\; \mu_p]'$, and $i = 1, 2, \ldots, n$.
This type of covariance matrix is mentioned in the Toeplitz correlation model, which implies that closely located independent variables have a high correlation and the correlation decreases for independent variables that are farther apart. A matrix with the following pattern characterizes the relationship:
$$ \Sigma = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{p-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{bmatrix}, $$
where the correlation coefficient or level of multicollinearity is given by 0.1 or 0.9.
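The data-generating step above can be sketched as follows: build the Toeplitz matrix $\Sigma$ with entries $\rho^{|j-k|}$ and draw multivariate normal regressors ($n$, $p$, and $\rho$ here are illustrative choices from the ranges the paper uses):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, rho = 100, 5, 0.9

# Toeplitz correlation matrix: Sigma[j, k] = rho ** |j - k|.
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

# Multicollinear regressors drawn from N(0, Sigma).
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n)
```

With $\rho = 0.9$, adjacent columns of `X` are strongly correlated while distant columns are only weakly correlated, matching the decay pattern described above.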
The observations on the dependent variable are obtained from the multiple regression model as
$$ y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \varepsilon_i, \quad i = 1, 2, \ldots, n, $$
where $\varepsilon_i$ is generated from the normal distribution with mean zero and variance one, and the regression coefficients $(\beta_0, \beta_1, \ldots, \beta_p)$ are set to constant values.
The data generated by the regression model were randomly split into 70% training data and 30% testing data. The data were then randomly sampled, and the training and testing data were used to calculate the MAPE (mean absolute percentage error). The performance criterion was used to judge the performance of different Liu parameters in estimating the Liu estimator. The evaluated MAPE is defined as follows:
$$ \text{Mean Absolute Percentage Error} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100, $$
where $y_i$ is the observed value and $\hat{y}_i$ is the estimated value. The average mean absolute percentage errors of OLS, ridge regression, and the eight Liu parameters for five, ten, and fifteen variables are presented in Table 1, Table 2 and Table 3 according to the correlation coefficient (0.1 and 0.9). Table 4 presents the Liu parameter values used to estimate the Liu estimator. An average over 1000 replications was employed to approximate the average mean absolute percentage error. The minimum average mean absolute percentage error is shown in bold.
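The evaluation protocol above (a random 70/30 train/test split scored by MAPE) can be sketched as follows; plain OLS stands in for the compared Liu estimators, and the data are synthetic and illustrative:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, as in the criterion above.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
# A large intercept keeps y away from zero, so the MAPE denominator is safe.
y = X @ np.array([20.0, 1.0, 1.0, 1.0, 1.0, 1.0]) + rng.normal(size=n)

# Random 70/30 train/test split.
idx = rng.permutation(n)
tr, te = idx[:140], idx[140:]

# Fit on the training part (OLS as a stand-in) and score on the test part.
beta = np.linalg.solve(X[tr].T @ X[tr], X[tr].T @ y[tr])
score = mape(y[te], X[te] @ beta)
```

In the paper's design, this split-fit-score loop is repeated 1000 times per configuration and the resulting MAPE values are averaged.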
Table 1, Table 2 and Table 3 describe the simulated average mean absolute percentage error for the two levels of Toeplitz correlation, with the smallest MAPE value highlighted in bold. The simulation results showed that the modified Liu parameter based on R-squared (dR2) had the smallest MAPE values, so it outperformed the other methods, especially under the scaled option in Table 2, whereas dCp had the weakest performance in all cases. Furthermore, the MAPE of dmm, dcl, and dopt was equal to that of dR2 under the center and scaled options in Table 1 and Table 2. The influence of sample size was also observed: the MAPE decreased as the sample size increased, and it was further reduced as the number of independent variables increased. The Liu parameters used to estimate the Liu estimator are presented in Table 4; they varied with sample size, the number of independent variables, and the level of correlation.
Table 4 presents the mean Liu parameters for multiple regression models with low (0.1) and high (0.9) Toeplitz correlation, comparing the various methods and sample sizes across different numbers of independent variables. For low correlation, the Liu parameters were relatively stable across methods as the sample size increased. However, for high correlation, methods such as dRLE and dILE showed significant increases in their Liu parameters as the sample size and the number of independent variables increased. Methods like dCp and dMSE also exhibited higher Liu parameters with more independent variables and higher correlation values. Overall, dRLE and dILE tended to perform best, especially when correlation was high, while dmm, dopt, and dcl showed less variation across sample sizes. For a better understanding, the Liu parameters for dmm, dcl, dopt, dILE, dRLE, dMSE, and dR2 are plotted for multicollinearity levels 0.1 and 0.9 in Figure 1 and Figure 2, respectively.
The Liu parameters based on dmm, dcl, dopt, dILE, dRLE, and dR2 demonstrated stable values in all situations, as shown in Figure 1 and Figure 2. Furthermore, dMSE decreased as the sample size increased, especially at high correlation. In contrast, dmm, dcl, dopt, dILE, and dRLE closely tracked one another at high correlation, and the other Liu parameters differed only slightly from them. The modified Liu parameter (dR2) stayed close to a stable value of 1 in all cases.

4. Application in Actual Data

We employed Liu regression to model patients' ages from blood donors' laboratory values using the Hepatitis C patients dataset sourced from the UCI Machine Learning Repository. The dataset was retrieved from https://archive.ics.uci.edu/dataset/503/hepatitis+c+virus+hcv+for+egyptian+patients (accessed on 26 September 2024) and contains 589 records. The dependent variable was the age of the patients, and the independent variables included albumin (ALB), total protein (PROT), cholinesterase (CHE), cholesterol (CHOL), alkaline phosphatase (ALP), alanine aminotransferase (ALT), creatinine (CREA), bilirubin (BIL), aspartate aminotransferase (AST), and gamma-glutamyl transferase (GGT).
To check for multicollinearity, Pearson's correlation analysis was employed to ascertain any potential relationships among the ten continuous independent variables; the Pearson correlation coefficients of the independent variables are listed in Table 5 and illustrated in Figure 3. The null hypothesis stated that there was no relationship between two variables, and the alternative hypothesis asserted a significant relationship. A p-value below 0.05 for the t-statistic signified rejection of the null hypothesis and a significant relationship between the two variables, as demonstrated in Table 5.
Our findings showed that a moderately significant relationship (correlation between 0.41 and 0.6) was observed in most cases, while a weaker but still significant relationship (between 0.2 and 0.4) was evident in some instances. Most of the independent variables exhibited significant relationships, the exceptions being between total protein (PROT) and alkaline phosphatase (ALP), alanine aminotransferase (ALT), creatinine (CREA), bilirubin (BIL), aspartate aminotransferase (AST), and gamma-glutamyl transferase (GGT).
The Pearson correlation matrix in Figure 3, derived from Table 5, uses varying shades to enhance clarity: light shading indicates moderate correlations, while dark shading represents strong correlations. Most of the independent variables are depicted with moderate and light shading, suggesting inter-variable correlations, i.e., multicollinearity issues. The entire dataset was divided into 70% training and 30% testing data, which were then randomly sampled. The average mean absolute percentage errors shown in Table 6 were computed using OLS, ridge regression, and the eight Liu parameters with the three scaling options, over 1000 replications. The sample sizes of 50, 100, 150, and 200 mirrored those in the simulation study.
Table 6 reveals that the modified Liu parameters (dMSE and dR2) exhibited consistent and often superior prediction accuracy across all scenarios. The dMSE and dR2 methods notably demonstrated commendable estimation at all sample sizes, better than OLS and ridge regression. Consequently, the Liu parameter adjustment using the dMSE and dR2 methods for ten independent variables consistently surpassed expectations and aligned closely with the simulation outcomes. Although there were slight discrepancies in estimation as the sample sizes increased, substantial performance improvements were evident for small samples drawn from the Hepatitis C dataset. Using a sampled subset is also more efficient than using the entire dataset, in both estimation accuracy and processing time.

5. Discussion

The simulated results presented in Table 1, Table 2, Table 3 and Table 4 revealed that the mean absolute percentage error was affected by the number of independent variables and the sample size. The modified Liu estimator (dR2) performed best across all numbers of independent variables, correlation levels, and sample sizes, with dMSE differing only slightly from dR2. The mean absolute percentage error for models with more independent variables was lower than for models with fewer. Increasing the correlation coefficient had only a weak impact on estimation for most methods, as indicated by the slight variation in the mean absolute percentage error.
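The simulation design behind these tables (independent variables drawn from a multivariate normal with a Toeplitz correlation pattern, and a response built from regression coefficients plus normal error) can be sketched as follows. The AR(1)-style entries rho**|i-j| and the unit coefficient vector are illustrative assumptions standing in for the paper's exact settings.

```python
import numpy as np

def toeplitz_corr(p, rho):
    """Toeplitz correlation matrix with entries R[i, j] = rho**|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, p, rho, sigma, rng):
    """Draw X ~ N(0, R) with Toeplitz correlation R; y = X @ beta + normal error."""
    L = np.linalg.cholesky(toeplitz_corr(p, rho))   # correlate iid normals
    X = rng.normal(size=(n, p)) @ L.T
    beta = np.ones(p)                               # illustrative coefficient vector
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return X, y

rng = np.random.default_rng(2)
X, y = simulate(n=50, p=5, rho=0.9, sigma=1.0, rng=rng)
```

Varying n over {50, 100, 150, 200}, p over {5, 10, 15}, and rho over {0.1, 0.9} reproduces the grid of scenarios reported in the tables.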
Consistently, the real data results in Table 6 show that the proposed Liu parameters (dMSE and dR2) achieved the smallest mean absolute percentage error for the dataset with ten independent variables. The real data's independent variables exhibited skewed distributions, as illustrated in Figure 4 and confirmed by the Shapiro–Wilk test [27], indicating non-normality. Altukhaes et al. [28] introduced robust Liu estimators to combat multicollinearity and outlier problems in the linear regression model. The dCp parameter, by contrast, estimated effectively only for large sample sizes under the center option. Notably, the discrepancy between the simulated and real data results emphasizes the importance of considering the data source when selecting the Liu parameter.
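The non-normality diagnosis cited above can be reproduced with the Shapiro–Wilk test in `scipy.stats`; the lognormal draw below is only a stand-in for a right-skewed predictor such as GGT, not the actual Hepatitis C data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(size=200)          # stand-in for a right-skewed predictor

# H0 of the Shapiro-Wilk test: the sample comes from a normal distribution.
w, p_value = stats.shapiro(skewed)
reject = bool(p_value < 0.05)             # True here: normality is rejected
```

A small p-value, as obtained for the skewed predictors in Figure 4, rejects the null hypothesis of normality.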
The proposed Liu parameters (dMSE and dR2) emerged as the most suitable for estimation. Medical datasets are widely used to enhance predictive medical diagnosis and patient classification. The Hepatitis C dataset used here is a medical dataset in which the patients' age is modeled by multiple regression with multicollinearity among the independent variables. Oladapo et al. [29] introduced a novel modified Liu ridge-type estimator for the general linear model, employing Portland cement data as a case study akin to medical data; their estimator demonstrated superior performance under certain conditions. Babar et al. [23] adapted Liu estimators to address multicollinearity in linear regression using tobacco data, advocating adoption of the new estimators by practitioners facing high to severe multicollinearity among independent variables. Hammood et al. [30] employed a Liu estimator for inverse Gaussian regression, tackling multicollinearity in chemistry datasets. Among the Liu estimators considered for addressing multicollinearity in multiple regression, the proposed Liu estimator outperformed the others. In summary, we recommend the Liu estimator with the modified Liu parameter under high multicollinearity.
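For reference, the Liu estimator discussed throughout this literature takes the closed form from Liu (1993): beta_Liu(d) = (X'X + I)^{-1}(X'X + dI) beta_OLS with biasing parameter 0 <= d <= 1. A minimal numpy sketch (on simulated data, not the paper's):

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu (1993): (X'X + I)^-1 (X'X + d*I) @ beta_OLS; d = 1 recovers OLS."""
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ beta_ols)

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_liu = liu_estimator(X, y, 0.5)      # shrinks the OLS solution
```

Setting d = 1 reproduces the OLS coefficients exactly, while d < 1 scales each component of beta_OLS along the eigenvectors of X'X by (lambda + d)/(lambda + 1) < 1, which is the shrinkage that stabilizes the estimate under multicollinearity.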
The modified Liu parameters have some critical limitations and challenges. They require more processing power and time than traditional methods, which could be problematic for large-scale applications or users with limited computing resources. Furthermore, the methods might be too closely tailored to the specific datasets used in this study, risking overfitting: good performance on those datasets but poor performance on new, unseen data. More testing on diverse datasets is needed to ensure the methods generalize.

6. Conclusions

This paper introduces a Liu parameter designed to enhance the estimation of the Liu estimator in multiple regression models affected by multicollinearity among independent variables. The selection of this Liu parameter was carefully examined and compared to other methods to determine its effectiveness. Simulation studies demonstrated that the modified Liu parameter based on R-squared consistently achieved the lowest mean absolute percentage error, particularly under the scaled option, outperforming the alternative approaches. Sample size, the number of independent variables, and correlation level all influenced the Liu parameter: smaller sample sizes and more independent variables yielded more efficient estimators, while small correlations showed positive effects and large correlations led to higher parameter values. Furthermore, the modified Liu parameter outperformed the ordinary least squares method in both simulation and real data scenarios, substantially improving the estimator in regression models with multicollinearity at varying correlation levels. As a result, a Liu parameter within the (0, 1) range is recommended, as it consistently provided the most accurate estimation.
Accurate estimation of the correlation structure within the data is crucial to the reliability of the proposed methods. This can be challenging in practice, particularly with noisy, incomplete, or outlier-contaminated datasets, which may affect overall performance. Further research should therefore address these estimation challenges.

Funding

This work was financially supported by King Mongkut’s Institute of Technology Ladkrabang [2567-02-05-010].

Data Availability Statement

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Daoud, J.I. Multicollinearity and regression analysis. J. Phys. Conf. Ser. 2017, 949, 012009. [Google Scholar] [CrossRef]
  2. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. 2020, 8, 39–42. [Google Scholar] [CrossRef]
  3. Liang, Y.; Rosen, D.V.; Rosen, T.V. On properties of Toeplitz-type covariance matrices in models with nested random effects. Stat. Pap. 2021, 62, 2509–2528. [Google Scholar] [CrossRef]
  4. Qi, B.; Xu, L.; Liu, X. Improved multiple-Toeplitz matrices reconstruction method using quadratic spatial smoothing for coherent signals DOA estimation. Eng. Comput. 2024, 41, 333–346. [Google Scholar] [CrossRef]
  5. Dawoud, I.; Abonazel, M.R.; Awwad, F.A. Modified Liu estimator to address the multicollinearity problem in regression models: A new biased estimation class. Sci. Afr. 2022, 17, e01372. [Google Scholar] [CrossRef]
  6. Jahufer, A. Detecting global influential observations in Liu regression model. Open J. Stat. 2013, 3, 5–11. [Google Scholar] [CrossRef]
  7. Karlsson, P.; Månsson, K.; Golam Kibria, B.M. A Liu estimator for the beta regression model and its application to chemical data. J. Chemom. 2020, 24, 2–16. [Google Scholar] [CrossRef]
  8. Liu, K. A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 1993, 22, 393–402. [Google Scholar]
  9. Liu, X.-Q. Improved Liu Estimation in a linear regression model. J. Stat. Plan. Inference 2011, 141, 189–196. [Google Scholar] [CrossRef]
  10. Yang, H.; Xu, J. An alternative stochastic restricted Liu estimator in linear regression. Stat. Pap. 2009, 50, 639–647. [Google Scholar] [CrossRef]
  11. Hubert, M.H.; Wijekoon, P. Improvement of the Liu estimator in linear regression model. Stat. Pap. 2006, 47, 471–479. [Google Scholar] [CrossRef]
  12. Akdeniz, F.; Erol, H. Mean squared error matrix comparison of some biased estimators in linear regression. Commun. Stat. Theory Methods 2003, 32, 2389–2413. [Google Scholar] [CrossRef]
  13. Özkale, M.R.; Kaçiranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Stat. Theory Methods 2007, 36, 2707–2725. [Google Scholar] [CrossRef]
  14. Suhail, M.; Babar, I.; Khan, Y.A.; Imran, M.; Nawaz, Z. Quantile-based estimation of Liu parameter in the linear regression model: Applications to Portland cement and US crime data. Math. Probl. Eng. 2021, 2021, 1–11. [Google Scholar] [CrossRef]
  15. Lukman, A.F.; Golam Kibria, B.M.; Ayinde, K.; Jegede, S.L. Modified one-parameter Liu estimator for the linear regression model. Model. Simul. Eng. 2020, 2020, 1–17. [Google Scholar] [CrossRef]
  16. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  17. Lukman, A.F.; Ayinde, K.; Kun, S.S.; Adewuyi, E.T. A modified new two-parameter estimator in a linear regression model. Model. Simul. Eng. 2019, 2019, 1–10. [Google Scholar] [CrossRef]
  18. Filzmoser, P.; Kurnaz, F.S. A robust Liu regression estimator. Commun. Stat. Simul. Comput. 2018, 47, 432–443. [Google Scholar] [CrossRef]
  19. Druilhet, P.; Mom, A. Shrinkage Structure in Biased Regression. J. Multivar. Anal. 2008, 99, 232–244. [Google Scholar] [CrossRef]
  20. Akdeniz, F.; Kaçiranlar, S. More on the new biased estimator in linear regression. Sankhya Indian J. Stat. Ser. B 2001, 63, 321–325. [Google Scholar]
  21. Duran, E.R.; Akdeniz, F.; Hu, H. Efficiency of a Liu-type estimator in semiparametric regression models. J. Comput. Appl. Math. 2011, 235, 1418–1428. [Google Scholar] [CrossRef]
  22. Abdelwahab, M.M.; Abonazel, M.R.; Hammad, A.T.; El-Masry, A.M. Modified two-parameter Liu estimator for addressing multicollinearity in the Poisson regression model. Axioms 2024, 13, 46. [Google Scholar] [CrossRef]
  23. Babar, I.; Ayed, H.; Chand, S.; Suhail, M.; Khan, Y.A.; Marzouki, R. Modified Liu estimators in the linear regression model: An application to Tobacco data. PLoS ONE 2021, 16, e0259991. [Google Scholar] [CrossRef]
  24. Özkale, M.R.; Kaçiranlar, S. A Prediction-Oriented criterion for choosing the biasing parameter in Liu estimation. Commun. Stat. Theory Methods 2007, 36, 1889–1903. [Google Scholar] [CrossRef]
  25. Mallows, C.L. Some Comments on Cp. Technometrics 2000, 42, 87–94. [Google Scholar]
  26. Belsley, D.A. A guide to using the collinearity diagnostics. Comput. Sci. Econ. Manag. 1991, 4, 33–50. [Google Scholar] [CrossRef]
  27. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52, 591–611. [Google Scholar] [CrossRef]
  28. Altukhaes, W.D.; Roozbeh, M.; Mohamed, N.A. Robust Liu estimator used to combat some challenges in partially linear regression model by improving LTS algorithm using semidefinite programming. Mathematics 2024, 12, 2787. [Google Scholar] [CrossRef]
  29. Oladapo, O.J.; Owolabi, A.T.; Idowu, J.I.; Ayinde, K. A new modified Liu Ridge-Type estimator for the linear regression model: Simulation and application. Int. J. Clin. Biostat. Biom. 2022, 8, 1–14. [Google Scholar]
  30. Hammood, N.M.; Jabur, D.M.; Algamal, Z.Y. A Liu estimator in inverse Gaussian regression model with application in chemometrics. Math. Stat. Eng. Appl. 2022, 71, 248–266. [Google Scholar]
Figure 1. Estimated Liu parameter values for p = 5, 10, and 15, and the level of correlation at 0.1.
Figure 2. Estimated Liu parameter values for p = 5, 10, and 15, and the level of correlation at 0.9.
Figure 3. Correlation graph for the ten independent variables.
Figure 4. The histogram of ten independent variables.
Table 1. The average mean absolute percentage error of Liu estimators for the Toeplitz correlation of the center option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.600 | 0.671 | 0.694 | 0.700 | 0.705 | 0.725 | 0.787 | 0.758 |
| | dcl | 0.596 | 0.671 | 0.694 | 0.700 | 0.737 | 0.725 | 0.726 | 0.718 |
| | dopt | 0.599 | 0.671 | 0.694 | 0.700 | 0.650 | 0.700 | 0.720 | 0.719 |
| | dILE | 0.620 | 0.689 | 0.735 | 0.758 | 0.742 | 0.758 | 0.782 | 0.796 |
| | dRLE | 0.630 | 0.752 | 0.775 | 0.792 | 0.785 | 0.824 | 0.847 | 0.886 |
| | dCp | 1.150 | 0.823 | 0.761 | 0.737 | 0.901 | 0.753 | 0.733 | 0.721 |
| | dMSE | 0.611 | 0.674 | 0.696 | 0.700 | 0.599 | 0.672 | 0.697 | 0.704 |
| | dR2 | 0.593 | 0.670 | 0.694 | 0.700 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 0.465 | 0.570 | 0.604 | 0.619 | 0.658 | 0.648 | 0.654 | 0.682 |
| | dcl | 0.450 | 0.569 | 0.604 | 0.619 | 0.736 | 0.735 | 0.768 | 0.783 |
| | dopt | 0.461 | 0.570 | 0.604 | 0.619 | 0.545 | 0.627 | 0.657 | 0.667 |
| | dILE | 0.635 | 0.705 | 0.725 | 0.760 | 0.687 | 0.739 | 0.765 | 0.783 |
| | dRLE | 0.720 | 0.731 | 0.743 | 0.768 | 0.732 | 0.789 | 0.852 | 0.876 |
| | dCp | 2.580 | 1.320 | 0.969 | 0.833 | 1.630 | 1.020 | 0.811 | 0.737 |
| | dMSE | 0.460 | 0.575 | 0.607 | 0.620 | 0.572 | 0.577 | 0.614 | 0.631 |
| | dR2 | 0.439 | 0.567 | 0.604 | 0.619 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 0.337 | 0.478 | 0.530 | 0.552 | 0.490 | 0.625 | 0.721 | 0.752 |
| | dcl | 0.286 | 0.476 | 0.529 | 0.552 | 0.365 | 0.610 | 0.725 | 0.785 |
| | dopt | 0.315 | 0.478 | 0.530 | 0.552 | 0.489 | 0.576 | 0.674 | 0.712 |
| | dILE | 0.652 | 0.685 | 0.696 | 0.714 | 0.718 | 0.752 | 0.786 | 0.816 |
| | dRLE | 0.723 | 0.758 | 0.793 | 0.812 | 0.768 | 0.823 | 0.876 | 0.917 |
| | dCp | 4.690 | 2.130 | 1.410 | 1.100 | 2.790 | 1.520 | 1.090 | 0.861 |
| | dMSE | 0.288 | 0.483 | 0.533 | 0.554 | 1.150 | 0.505 | 0.546 | 0.570 |
| | dR2 | 0.255 | 0.474 | 0.528 | 0.551 | 0.262 | 0.488 | 0.545 | 0.570 |
Table 2. The average mean absolute percentage error of Liu estimators for Toeplitz correlation of the scaled option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.599 | 0.671 | 0.694 | 0.700 | 0.631 | 0.742 | 0.780 | 0.795 |
| | dcl | 0.595 | 0.671 | 0.694 | 0.700 | 0.628 | 0.745 | 0.768 | 0.783 |
| | dopt | 0.599 | 0.671 | 0.694 | 0.700 | 0.655 | 0.671 | 0.732 | 0.720 |
| | dILE | 0.624 | 0.738 | 0.762 | 0.787 | 0.632 | 0.747 | 0.764 | 0.725 |
| | dRLE | 0.615 | 0.725 | 0.714 | 0.773 | 0.628 | 0.732 | 0.751 | 0.738 |
| | dCp | 1.100 | 0.816 | 0.758 | 0.736 | 0.878 | 0.749 | 0.732 | 0.721 |
| | dMSE | 0.609 | 0.674 | 0.696 | 0.700 | 0.599 | 0.672 | 0.697 | 0.704 |
| | dR2 | 0.593 | 0.670 | 0.694 | 0.699 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 0.463 | 0.570 | 0.604 | 0.619 | 0.587 | 0.658 | 0.782 | 0.798 |
| | dcl | 0.449 | 0.569 | 0.604 | 0.619 | 0.548 | 0.645 | 0.659 | 0.673 |
| | dopt | 0.459 | 0.570 | 0.604 | 0.619 | 0.523 | 0.627 | 0.649 | 0.655 |
| | dILE | 0.524 | 0.568 | 0.587 | 0.628 | 0.647 | 0.658 | 0.671 | 0.690 |
| | dRLE | 0.641 | 0.678 | 0.689 | 0.690 | 0.745 | 0.758 | 0.765 | 0.778 |
| | dCp | 2.480 | 1.290 | 0.959 | 0.827 | 1.590 | 1.000 | 0.812 | 0.739 |
| | dMSE | 0.458 | 0.574 | 0.607 | 0.620 | 0.570 | 0.577 | 0.614 | 0.631 |
| | dR2 | 0.439 | 0.567 | 0.604 | 0.619 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 0.335 | 0.478 | 0.530 | 0.552 | 0.481 | 0.597 | 0.647 | 0.672 |
| | dcl | 0.285 | 0.476 | 0.529 | 0.552 | 0.358 | 0.527 | 0.641 | 0.687 |
| | dopt | 0.313 | 0.477 | 0.530 | 0.552 | 0.457 | 0.558 | 0.627 | 0.690 |
| | dILE | 0.541 | 0.572 | 0.652 | 0.687 | 0.625 | 0.687 | 0.754 | 0.798 |
| | dRLE | 0.651 | 0.745 | 0.788 | 0.790 | 0.725 | 0.784 | 0.852 | 0.886 |
| | dCp | 4.570 | 2.080 | 1.390 | 1.090 | 2.730 | 1.510 | 1.080 | 0.865 |
| | dMSE | 0.287 | 0.482 | 0.532 | 0.554 | 1.130 | 0.505 | 0.546 | 0.570 |
| | dR2 | 0.255 | 0.474 | 0.528 | 0.551 | 0.232 | 0.488 | 0.545 | 0.570 |
Table 3. The average mean absolute percentage error of Liu estimators for Toeplitz correlation of the SC option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.955 | 0.950 | 0.929 | 0.944 | 1.143 | 1.121 | 1.105 | 1.165 |
| | dcl | 0.784 | 0.840 | 0.847 | 0.858 | 0.854 | 0.963 | 0.971 | 0.985 |
| | dopt | 0.931 | 0.944 | 0.927 | 0.942 | 0.952 | 0.968 | 0.974 | 0.957 |
| | dILE | 0.857 | 0.869 | 0.874 | 0.898 | 0.874 | 0.885 | 0.893 | 0.901 |
| | dRLE | 0.747 | 0.765 | 0.778 | 0.785 | 0.783 | 0.787 | 0.798 | 0.801 |
| | dCp | 8.770 | 9.300 | 9.470 | 9.560 | 5.470 | 6.420 | 6.830 | 7.070 |
| | dMSE | 1.380 | 1.590 | 1.670 | 1.690 | 0.726 | 0.937 | 1.090 | 1.170 |
| | dR2 | 0.596 | 0.673 | 0.697 | 0.702 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 1.060 | 1.030 | 1.030 | 1.050 | 1.145 | 1.136 | 1.154 | 1.165 |
| | dcl | 0.785 | 0.856 | 0.882 | 0.900 | 0.813 | 0.865 | 0.899 | 0.923 |
| | dopt | 0.997 | 1.020 | 1.020 | 1.040 | 1.140 | 1.158 | 1.169 | 1.752 |
| | dILE | 0.864 | 0.875 | 0.887 | 0.914 | 0.932 | 0.956 | 0.978 | 0.992 |
| | dRLE | 0.753 | 0.775 | 0.784 | 0.798 | 0.841 | 0.852 | 0.861 | 0.887 |
| | dCp | 19.60 | 21.70 | 22.20 | 22.40 | 10.20 | 12.30 | 13.40 | 13.90 |
| | dMSE | 1.060 | 1.720 | 1.900 | 1.980 | 2.090 | 0.695 | 0.691 | 0.829 |
| | dR2 | 0.441 | 0.569 | 0.605 | 0.620 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 1.220 | 1.060 | 1.090 | 1.110 | 1.324 | 1.214 | 1.365 | 1.427 |
| | dcl | 0.763 | 0.858 | 0.901 | 0.924 | 0.825 | 0.932 | 0.935 | 0.957 |
| | dopt | 1.040 | 1.050 | 1.080 | 1.100 | 1.154 | 1.189 | 1.192 | 2.014 |
| | dILE | 0.962 | 0.974 | 0.985 | 0.990 | 1.015 | 1.117 | 1.157 | 1.187 |
| | dRLE | 0.857 | 0.864 | 0.872 | 0.887 | 0.921 | 0.938 | 0.957 | 0.979 |
| | dCp | 29.80 | 34.70 | 36.10 | 36.70 | 14.60 | 18.60 | 20.20 | 21.30 |
| | dMSE | 0.737 | 1.570 | 1.920 | 2.070 | 5.810 | 1.540 | 0.772 | 0.613 |
| | dR2 | 0.256 | 0.475 | 0.529 | 0.552 | 0.262 | 0.488 | 0.545 | 0.570 |
Table 4. The mean Liu parameters for Toeplitz correlation in the multiple regression models.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | dmm | 0.577 | 0.598 | 0.633 | 0.616 | 0.613 | 0.679 | 0.754 | 0.736 |
| | dcl | 0.690 | 0.692 | 0.707 | 0.695 | 0.748 | 0.789 | 0.898 | 0.754 |
| | dopt | 0.574 | 0.602 | 0.635 | 0.618 | 0.687 | 0.747 | 0.796 | 0.723 |
| | dILE | 0.374 | 0.451 | 0.540 | 0.633 | 0.380 | 0.561 | 0.635 | 0.745 |
| | dRLE | 0.452 | 0.563 | 0.632 | 0.741 | 0.478 | 0.598 | 0.693 | 0.774 |
| | dCp | 6.590 | 6.810 | 6.880 | 6.910 | 5.220 | 5.940 | 6.250 | 6.410 |
| | dMSE | 0.206 | 0.094 | 0.059 | 0.044 | 0.906 | 0.500 | 0.356 | 0.281 |
| | dR2 | 0.964 | 0.961 | 0.960 | 0.960 | 0.990 | 0.989 | 0.989 | 0.989 |
| 10 | dmm | 0.529 | 0.593 | 0.606 | 0.599 | 0.668 | 0.697 | 0.721 | 0.632 |
| | dcl | 0.680 | 0.692 | 0.693 | 0.689 | 0.745 | 0.768 | 0.771 | 0.740 |
| | dopt | 0.562 | 0.599 | 0.609 | 0.601 | 0.651 | 0.675 | 0.687 | 0.625 |
| | dILE | 0.391 | 0.473 | 0.550 | 0.685 | 0.412 | 0.587 | 0.678 | 0.787 |
| | dRLE | 0.484 | 0.698 | 0.673 | 0.787 | 0.584 | 0.635 | 0.789 | 0.874 |
| | dCp | 11.00 | 11.60 | 11.70 | 11.80 | 7.940 | 9.560 | 10.30 | 10.70 |
| | dMSE | 0.520 | 0.205 | 0.129 | 0.093 | 2.370 | 1.160 | 0.821 | 0.637 |
| | dR2 | 0.984 | 0.982 | 0.981 | 0.981 | 0.997 | 0.997 | 0.997 | 0.997 |
| 15 | dmm | 0.451 | 0.591 | 0.596 | 0.595 | 0.587 | 0.625 | 0.647 | 0.654 |
| | dcl | 0.669 | 0.690 | 0.688 | 0.686 | 0.725 | 0.741 | 0.758 | 0.787 |
| | dopt | 0.536 | 0.599 | 0.600 | 0.598 | 0.674 | 0.668 | 0.685 | 0.674 |
| | dILE | 0.387 | 0.412 | 0.563 | 0.694 | 0.432 | 0.598 | 0.689 | 0.814 |
| | dRLE | 0.554 | 0.715 | 0.768 | 0.879 | 0.623 | 0.789 | 0.823 | 0.880 |
| | dCp | 15.10 | 16.30 | 16.60 | 16.70 | 10.40 | 13.00 | 14.20 | 14.90 |
| | dMSE | 1.020 | 0.343 | 0.205 | 0.147 | 4.720 | 1.940 | 1.320 | 1.020 |
| | dR2 | 0.991 | 0.989 | 0.988 | 0.988 | 0.999 | 0.999 | 0.999 | 0.998 |
Table 5. Pearson correlation matrix for the relationships between ten independent variables.

| Variables | ABL | PROT | CHE | CHOL | ALP | ALT | CREA | BIL | AST | GGT |
|-----------|-----|------|-----|------|-----|-----|------|-----|-----|-----|
| ABL | 1.00 | 0.57 * (<0.05) | 0.36 * (<0.05) | 0.21 * (<0.05) | −0.15 * (0.01) | 0.04 (1.00) | 0.00 (1.00) | −0.17 * (<0.05) | −0.18 * (<0.05) | −0.15 * (0.01) |
| PROT | – | 1.00 | 0.31 * (<0.05) | 0.25 * (<0.05) | −0.06 (1.00) | 0.02 (1.00) | −0.03 (1.00) | −0.05 (1.00) | 0.02 (1.00) | −0.04 (1.00) |
| CHE | – | – | 1.00 | 0.43 * (<0.05) | 0.03 (1.00) | 0.22 * (<0.05) | −0.01 (1.00) | −0.32 * (<0.05) | −0.20 * (<0.05) | −0.10 (0.36) |
| CHOL | – | – | – | 1.00 | 0.13 (0.05) | 0.15 * (0.01) | −0.05 (1.00) | −0.18 * (<0.05) | −0.20 * (<0.05) | 0.01 (1.00) |
| ALP | – | – | – | – | 1.00 | 0.22 * (<0.05) | 0.15 * (<0.05) | 0.06 (1.00) | 0.07 (1.00) | 0.46 * (<0.05) |
| ALT | – | – | – | – | – | 1.00 | −0.04 (1.00) | −0.11 (0.18) | 0.20 (<0.05) | 0.22 (<0.05) |
| CREA | – | – | – | – | – | – | 1.00 | 0.02 (1.00) | −0.02 (1.00) | 0.13 (0.05) |
| BIL | – | – | – | – | – | – | – | 1.00 | 0.31 * (<0.05) | 0.21 * (<0.05) |
| AST | – | – | – | – | – | – | – | – | 1.00 | 0.14 * (<0.05) |
| GGT | – | – | – | – | – | – | – | – | – | 1.00 |

Note. Each cell shows the Pearson correlation with its p-value in parentheses; *, multicollinearity between two variables.
Table 6. The average mean absolute percentage error estimated for sample sizes of 50, 100, 150, 200, and 589.

| Sample size | OLS | Ridge | Scale option | dmm | dcl | dopt | dILE | dRLE | dCp | dMSE | dR2 |
|-------------|-----|-------|--------------|-----|-----|------|------|------|-----|------|-----|
| n = 50 | | | Liu parameter | 0.547 | 0.635 | 0.714 | 0.883 | 0.854 | 11.8 | 6.63 | 0.436 |
| | 24 | 19.01 | Centered | 17.25 | 18.14 | 15.13 | 16.13 | 17.69 | 12.2 | 11 | 10.4 |
| | | | Scaled | 17.79 | 18.36 | 16.39 | 16.99 | 18.75 | 29.5 | 18.4 | 10.5 |
| | | | SC | 18.65 | 18.98 | 17.58 | 17.58 | 18.96 | 88.6 | 47.5 | 11.8 |
| n = 100 | | | Liu parameter | 0.698 | 0.741 | 0.896 | 0.721 | 0.785 | 11.9 | 2.69 | 0.271 |
| | 19.8 | 18.32 | Centered | 16.25 | 16.33 | 15.06 | 16.05 | 16.96 | 13.8 | 13.8 | 13.8 |
| | | | Scaled | 16.98 | 17.52 | 16.35 | 16.37 | 18.69 | 16.3 | 13.8 | 13.8 |
| | | | SC | 16.69 | 17.85 | 16.78 | 16.69 | 18.23 | 67.7 | 17.4 | 14.9 |
| n = 150 | | | Liu parameter | 0.785 | 0.896 | 0.741 | 0.624 | 0.693 | 12 | 1.63 | 0.215 |
| | 18.7 | 18.03 | Centered | 15.37 | 16.15 | 15.83 | 15.86 | 16.74 | 14.9 | 15 | 15 |
| | | | Scaled | 15.54 | 16.98 | 16.65 | 15.32 | 17.16 | 18 | 15 | 15 |
| | | | SC | 15.62 | 16.96 | 16.15 | 15.87 | 17.36 | 56.8 | 15.2 | 15.8 |
| n = 200 | | | Liu parameter | 0.632 | 0.669 | 0.725 | 0.693 | 0.627 | 12 | 1.19 | 0.185 |
| | 18.2 | 17.96 | Centered | 15.73 | 16.12 | 15.52 | 15.95 | 16.05 | 15.4 | 15.4 | 15.5 |
| | | | Scaled | 15.69 | 16.21 | 16.20 | 15.44 | 17.05 | 15.5 | 15.4 | 15.5 |
| | | | SC | 15.54 | 16.28 | 16.87 | 15.68 | 17.13 | 50.9 | 15.4 | 16.2 |
| n = 589 | | | Liu parameter | 0.614 | 0.658 | 0.698 | 0.635 | 0.602 | 11.99 | 0.377 | 0.1269 |
| | 17.41 | 17.50 | Centered | 16.70 | 16.75 | 16.87 | 17.28 | 17.69 | 16.57 | 16.59 | 16.59 |
| | | | Scaled | 16.87 | 16.98 | 16.87 | 17.14 | 17.98 | 16.72 | 16.59 | 16.59 |
| | | | SC | 17.58 | 17.25 | 17.32 | 17.58 | 17.85 | 37.55 | 16.90 | 17.09 |