Article

Modified Liu Parameters for Scaling Options of the Multiple Regression Model with Multicollinearity Problem

by
Autcha Araveeporn
Department of Statistics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
Mathematics 2024, 12(19), 3139; https://doi.org/10.3390/math12193139
Submission received: 22 August 2024 / Revised: 27 September 2024 / Accepted: 5 October 2024 / Published: 7 October 2024
(This article belongs to the Special Issue Application of Regression Models, Analysis and Bayesian Statistics)

Abstract:
The multiple regression model is a statistical technique employed to analyze the relationship between a dependent variable and several independent variables. Multicollinearity, which arises from relationships among the independent variables, is one of the issues affecting the multiple regression model. Ordinary least squares (OLS) is the standard method for estimating the parameters of the regression model, but multicollinearity renders the OLS estimator unstable. Liu regression, which approximates the Liu estimator based on the Liu parameter, has been proposed to overcome multicollinearity. In this paper, we propose modified Liu parameters to estimate the biasing parameter under several scaling options, comparing the ordinary least squares estimator with two modified Liu parameters and six standard Liu parameters. To assess the performance of the modified Liu parameters, independent variables were generated from a multivariate normal distribution with a Toeplitz correlation pattern as the multicollinearity data, and the dependent variable was obtained by multiplying the independent variables by regression coefficients and adding errors from the normal distribution. The mean absolute percentage error was computed as the evaluation criterion of the estimation. For application, a real Hepatitis C patient dataset was used to investigate the benefit of the modified Liu parameters. Through simulation and real dataset analysis, the results indicate that the modified Liu parameter outperformed the other Liu parameters and the ordinary least squares estimator. The modified Liu parameter can therefore be recommended for estimating parameters when the independent variables exhibit multicollinearity.
MSC:
62J02; 62J05; 62J07; 62J20; 62P25

1. Introduction

Regression analysis is a potent statistical tool that reveals the connections between one or more independent variables and a dependent variable. Essential in data analysis and predictive modeling, it finds broad application across fields such as economics, finance, healthcare, and social sciences. However, regression models must meet certain assumptions to provide reliable and valid results. These assumptions form the foundation of regression analysis and guide researchers in interpreting results accurately. One violation to avoid is a linear relationship among the independent variables, called multicollinearity, which occurs when two or more independent variables are correlated and inflates the standard errors of the coefficients. This escalation in standard errors can render the coefficients of certain independent variables statistically insignificant despite their potential importance. In essence, multicollinearity distorts the interpretation of variables by inflating their standard errors [1]. Shrestha [2] discussed the primary techniques for investigating multicollinearity using questionnaire-based survey data on customer satisfaction.
The Toeplitz correlation structure is a specific type of correlation pattern frequently appearing in real-world datasets such as financial time series data, spatial models, climate data, correlation in DNA sequences, and time-dependent traffic patterns. The properties of Toeplitz covariance matrices have been extensively applied across various fields, with early examples found in psychometric and medical research [3]. Furthermore, the Toeplitz correlation structure is a part of multicollinearity, often arising in datasets with variables exhibiting inherent relationships. Qi et al. [4] utilized a multiple-Toeplitz matrix reconstruction method with quadratic spatial smoothing to enhance direction-of-arrival estimation performance for coherent signals under low signal-to-noise ratio conditions.
Traditional regression techniques often struggle to handle multicollinearity effectively, leading to biased results and unreliable predictions. Researchers have developed various methods to mitigate these challenges, including Liu regression, a technique designed to address multicollinearity in regression analysis by combining the principles of ridge regression with orthogonalization. Dawoud et al. [5] devised a novel modified Liu estimator to address multicollinearity in a regression model with a single parameter, incorporating two biasing parameters, with at least one designed to mitigate this issue. Jahufer [6], on the other hand, employed the Liu estimator to alleviate the impact of multicollinearity and the influence of specific observations, devising approximate deletion formulas for identifying influential points.
In predictive analytics, the search for accurate models that can efficiently handle complex datasets while offering robust predictions is perpetual. Among the array of methodologies, the Liu regression model enables better control over the trade-off between bias and variance, leading to more stable and reliable parameter estimates. The flexibility of the Liu estimator makes it a valuable tool in the modern statistician’s toolkit, particularly in fields where predictive accuracy is critical. Karlsson et al. [7] introduced a Liu estimator tailored for the beta regression model with a fixed dispersion parameter, applicable in various practical scenarios where the correlation level among the regressors varies.
Liu regression [8] involves selecting a Liu estimator to balance the bias–variance trade-off. The optimal value of the Liu estimator is typically chosen through techniques such as cross-validation. The Liu estimator, named after its developer, is essential in managing multicollinearity and is particularly associated with methodologies like ridge regression with orthogonalization, often referred to as Liu regression. Liu [9] enhanced the Liu estimator within the linear regression model by considering the biasing parameter under the prediction sum-of-squares criterion. Yang and Xu [10] proposed an alternative stochastic restricted Liu estimator for the parameter vector in a linear regression model, incorporating additional stochastic linear restrictions. Hubert and Wijekoon [11] investigated a novel Liu-type biased estimator, termed the stochastic restricted Liu estimator, and examined its efficiency.
To improve the Liu estimator, the multiple regression model is transformed to canonical form [12] in order to select a biasing parameter called the Liu parameter. Appropriate Liu parameters have been developed to obtain the minimum mean squares error in estimation. Liu [8,9] applied an iterative method to choose the Liu parameter that minimizes the mean squares error of the Liu estimator. Özkale and Kaçiranlar [13] proposed a new restricted Liu parameter by computing the predicted residual error sum of squares to determine the biasing parameter. Dawoud et al. [5] proposed a new Liu estimator using the known mean squares error criterion to handle the multicollinearity problem. Suhail et al. [14] developed a new method of biasing parameters to mitigate multicollinear data. Lukman et al. [15] introduced a modified Liu estimator to address multicollinearity issues within the linear regression model.
In this paper, we propose two competing Liu parameters, following mean squares error and R-squared approaches, to estimate the Liu estimator via a multiple regression model with the multicollinearity problem. We measure this performance using the minimum average mean absolute percentage errors for the simulation and real dataset. We also consider the scale option of independent variables including the center, correlation form, and standardization.
This paper is structured as follows: Section 2 presents the multiple regression estimators and discusses the Liu estimator through the reparameterization of Liu regression in canonical form, then compares it with the OLS estimator. Section 3 describes the generation of the independent and dependent variables used to evaluate the estimators' performance. Section 4 applies a real dataset to validate the simulation results. Section 5 discusses the findings, followed by the conclusion in Section 6.

2. Liu Regression

The multiple regression model is expressed in matrix form as follows:
$$ y = X\beta + \varepsilon, $$
where $y$ is the $n \times 1$ column vector of the dependent variable, $X$ is the $n \times (p+1)$ matrix of independent variables, $\beta$ is the $(p+1) \times 1$ vector of multiple regression parameters, and $\varepsilon$ is the $n \times 1$ error vector. The errors are assumed to satisfy $E(\varepsilon) = 0$ and $Var(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n$. The parameters $\beta$ in (1) are commonly estimated by the ordinary least squares (OLS) estimator in (2), as follows:
$$ \hat{\beta}_{OLS} = (X'X)^{-1}X'y. $$
The estimation error of β ^ O L S is evaluated by computing:
$$ \hat{\beta}_{OLS} - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon. $$
The bias, variance (Var), and mean squares error (MSE) of the OLS estimator are computed from (3) as follows:
$$ \begin{aligned} Bias(\hat{\beta}_{OLS}) &= E(\hat{\beta}_{OLS} - \beta) = E[(X'X)^{-1}X'\varepsilon] = 0, \\ Var(\hat{\beta}_{OLS}) &= E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = \sigma^2 (X'X)^{-1}, \\ MSE(\hat{\beta}_{OLS}) &= Var(\hat{\beta}_{OLS}) + Bias(\hat{\beta}_{OLS})'\,Bias(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}. \end{aligned} $$
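As a concrete illustration (a Python/NumPy sketch; the paper's own computations were run in R), the OLS estimator in (2) and the residual-based error-variance estimate can be computed on synthetic, purely illustrative data:

```python
import numpy as np

# Synthetic data: intercept plus three regressors (illustrative only).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)

# OLS estimator beta_hat = (X'X)^{-1} X'y; solve the normal equations
# rather than explicitly inverting X'X.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Estimated error variance from the residual sum of squares.
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - X.shape[1])
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the inverse, although under severe multicollinearity $X'X$ becomes near-singular and the solution is unstable, which is exactly the problem the Liu estimator targets.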
Hoerl and Kennard [16] proposed ridge regression, a powerful technique for handling multicollinearity in linear regression models. Ridge regression addresses the issue by adding a penalty term to the ordinary least squares (OLS) estimation process, shrinking the coefficients towards zero. This regularization helps reduce model complexity and improve prediction accuracy. The ridge regression estimator is expressed as follows:
$$ \hat{\beta}_{Ridge} = (X'X + \lambda I_p)^{-1}X'y, $$
where λ is the regularization parameter controlling the shrinkage amount.
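The ridge estimator can be sketched in the same way ($\lambda = 1$ is an arbitrary illustrative choice, not a recommended value):

```python
import numpy as np

# Ridge estimator beta_ridge = (X'X + lambda I)^{-1} X'y on synthetic data.
rng = np.random.default_rng(1)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.0, 0.8]) + rng.normal(size=n)

lam = 1.0  # regularization parameter controlling the shrinkage amount
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# The penalty shrinks the coefficients: ||beta_ridge|| < ||beta_ols||.
```

The shrinkage property follows from the spectral decomposition of $X'X$: each canonical component of the OLS solution is multiplied by $\lambda_j/(\lambda_j + \lambda) < 1$.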
From the above computation, the OLS estimator is unbiased, but multicollinearity among the independent variables degrades its performance: the near-singularity of $X'X$ inflates the estimated variance and mean squares error. To overcome this problem, Liu [8] proposed the Liu estimator, which performs better than the OLS estimator [13,17]. The Liu estimator is written in terms of the OLS estimator as follows:
$$ \hat{\beta}_{Liu} = (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)\hat{\beta}_{OLS}, \quad 0 < d_{Liu} < 1, $$
where $d_{Liu}$ is the Liu parameter, serving as the biasing parameter, and $I_p$ is the identity matrix. Both the OLS estimator in (2) and the Liu estimator in (5) are affected by multicollinearity among the independent variables, because the Liu estimator depends on the OLS estimator.
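A sketch of the Liu estimator in (5), built on top of the OLS estimator as above (Python, synthetic data; $d = 0.5$ is an arbitrary illustrative value):

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu estimator: (X'X + I)^{-1} (X'X + d I) beta_OLS."""
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ beta_ols)

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
beta_liu = liu_estimator(X, y, d=0.5)
```

Note that setting $d = 1$ recovers the OLS estimator exactly, since $(X'X + I_p)^{-1}(X'X + I_p) = I_p$; smaller $d$ means stronger shrinkage.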
The estimation error of $\hat{\beta}_{Liu}$ is evaluated, as for the OLS estimator, by comparing the Liu estimator with the parameter of the multiple regression model:
$$ \hat{\beta}_{Liu} - \beta = (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)\hat{\beta}_{OLS} - \beta. $$
The bias [18], variance (Var), and mean square error (MSE) of the Liu estimator from (6) are proposed in the following:
$$ \begin{aligned} Bias(\hat{\beta}_{Liu}) &= E(\hat{\beta}_{Liu} - \beta) = (d_{Liu} - 1)(X'X + I_p)^{-1}\beta, \\ Var(\hat{\beta}_{Liu}) &= E[(\hat{\beta}_{Liu} - \beta)(\hat{\beta}_{Liu} - \beta)'] = \sigma^2 (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)(X'X)^{-1}(X'X + d_{Liu} I_p)(X'X + I_p)^{-1}, \\ MSE(\hat{\beta}_{Liu}) &= Var(\hat{\beta}_{Liu}) + Bias(\hat{\beta}_{Liu})'\,Bias(\hat{\beta}_{Liu}) \\ &= \sigma^2 (X'X + I_p)^{-1}(X'X + d_{Liu} I_p)(X'X)^{-1}(X'X + d_{Liu} I_p)(X'X + I_p)^{-1} + (d_{Liu} - 1)^2\,\beta'(X'X + I_p)^{-2}\beta. \end{aligned} $$
The Liu estimator is thus a biased estimator, and its variance is smaller than that of the OLS estimator when $d_{Liu}$ lies between zero and one. Subsequently, Liu [9] developed the shrinkage factor [19] to create a Liu parameter that may lie outside the range between zero and one. In the following subsection, the multiple regression model is transformed into canonical form to estimate the OLS and Liu estimators.

2.1. The Reparameterization of Liu Regression

The reparameterization of Liu regression transforms a multiple regression model into a canonical form, offering valuable insights into variable relationships and enhancing predictive accuracy [19]. The optimal Liu parameter is determined by minimizing the mean squares error. Akdeniz and Kaçıranlar [20] introduced a new biased estimator and assessed its performance against a restricted least squares estimator regarding mean squares error. The canonical form used to compare the Liu estimator's performance is expressed as follows:
$$ y = Z\alpha + \varepsilon, $$
where $Z = XG$, $\alpha = G'\beta$, and $Z'Z = G'X'XG = \Lambda$, with $G$ the orthogonal matrix of eigenvectors of $X'X$ and $\Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_p)$ the diagonal matrix of its eigenvalues. The OLS estimator in canonical form can be defined as follows:
$$ \hat{\alpha}_{OLS} = \Lambda^{-1}Z'y. $$
Similarly, the Liu estimator [21] can be written as follows:
$$ \hat{\alpha}_{R.Liu} = (\Lambda + I_p)^{-1}(Z'y + d_{R.Liu}\,\hat{\alpha}_{OLS}) = (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\hat{\alpha}_{OLS}. $$
The bias, variance (Var), and mean square error (MSE) of the reparameterization of the OLS estimator from (8) are expressed as follows:
$$ \begin{aligned} Bias(\hat{\alpha}_{OLS}) &= E(\hat{\alpha}_{OLS} - \alpha) = E[(Z'Z)^{-1}Z'\varepsilon] = 0, \\ Var(\hat{\alpha}_{OLS}) &= E[(\hat{\alpha}_{OLS} - \alpha)(\hat{\alpha}_{OLS} - \alpha)'] = E[(Z'Z)^{-1}Z'\varepsilon\varepsilon'Z(Z'Z)^{-1}] = \sigma^2 (Z'Z)^{-1} = \sigma^2\Lambda^{-1}, \\ MSE(\hat{\alpha}_{OLS}) &= Var(\hat{\alpha}_{OLS}) + Bias(\hat{\alpha}_{OLS})'\,Bias(\hat{\alpha}_{OLS}) = \sigma^2\Lambda^{-1}. \end{aligned} $$
The bias, variance (Var), and mean square error (MSE) of the reparameterization of the Liu estimator from (9) are proposed in the following:
$$ \begin{aligned} Bias(\hat{\alpha}_{R.Liu}) &= E(\hat{\alpha}_{R.Liu} - \alpha) = (d_{R.Liu} - 1)(\Lambda + I_p)^{-1}\alpha, \\ Var(\hat{\alpha}_{R.Liu}) &= E[(\hat{\alpha}_{R.Liu} - \alpha)(\hat{\alpha}_{R.Liu} - \alpha)'] = \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1}, \\ MSE(\hat{\alpha}_{R.Liu}) &= Var(\hat{\alpha}_{R.Liu}) + Bias(\hat{\alpha}_{R.Liu})'\,Bias(\hat{\alpha}_{R.Liu}) \\ &= \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1} + (d_{R.Liu} - 1)^2\,\alpha'(\Lambda + I_p)^{-2}\alpha. \end{aligned} $$
Furthermore, the bias, variance, and mean squares error are given by Equations (13), (14), and (15), respectively:
$$ Bias(\hat{\alpha}_{R.Liu}) = (d_{R.Liu} - 1)\sum_{j=1}^{p}\frac{\alpha_j}{\lambda_j + 1}, $$
$$ Var(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}, $$
$$ MSE(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2} + (d_{R.Liu} - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}. $$
The OLS and Liu estimators in canonical form were compared by considering the variance and MSE.
Given $\hat{\alpha}_{OLS}$ and $\hat{\alpha}_{R.Liu}$, $\hat{\alpha}_{R.Liu}$ is the better estimator, that is, $MSE(\hat{\alpha}_{OLS}) - MSE(\hat{\alpha}_{R.Liu}) > 0$, if and only if $Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) > 0$.
Recall that:
$$ Var(\hat{\alpha}_{OLS}) = \sigma^2\Lambda^{-1} \quad \text{and} \quad Var(\hat{\alpha}_{R.Liu}) = \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1}. $$
Then:
$$ Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) = \sigma^2\Lambda^{-1} - \sigma^2 (\Lambda + I_p)^{-1}(\Lambda + d_{R.Liu} I_p)\Lambda^{-1}(\Lambda + d_{R.Liu} I_p)(\Lambda + I_p)^{-1} = \sigma^2\,diag\!\left[\frac{1}{\lambda_j} - \frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}\right] > 0, \quad j = 1, \ldots, p. $$
It can be observed that $\frac{1}{\lambda_j} > \frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2}$ when $0 < d_{R.Liu} < 1$, since then $(\lambda_j + d_{R.Liu})^2 < (\lambda_j + 1)^2$. It can be concluded that $Var(\hat{\alpha}_{OLS}) - Var(\hat{\alpha}_{R.Liu}) > 0$, so the Liu estimator outperforms the OLS estimator.
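The elementwise variance comparison above can be checked numerically; the eigenvalues below are arbitrary illustrative values, with a very small eigenvalue mimicking severe multicollinearity:

```python
import numpy as np

# Var(alpha_OLS) - Var(alpha_R.Liu), evaluated per eigenvalue of X'X.
lam = np.array([5.0, 1.0, 0.2, 0.01])  # illustrative eigenvalues; small ones signal multicollinearity
sigma2 = 1.0
d = 0.5  # any value in (0, 1)

var_ols = sigma2 / lam
var_liu = sigma2 * (lam + d) ** 2 / (lam * (lam + 1) ** 2)
diff = var_ols - var_liu  # positive for every eigenvalue when 0 < d < 1
```

The gain is largest for the smallest eigenvalues, which is precisely where multicollinearity inflates the OLS variance.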

2.2. Liu Parameter

As shown in the above subsection, the reparameterization of Liu regression yields an estimator that outperforms OLS. However, using the Liu estimator requires selecting an appropriate Liu parameter, a problem first addressed by Liu [8] and developed for other models by Suhail et al. [14], Lukman et al. [15], Abdelwahab et al. [22], and Babar et al. [23]. The optimal Liu parameter is the one that minimizes the mean squares error (MSE), which is inflated by collinearity among the independent variables; tracing the diagonal matrix of the transformation is useful for calculating it. In this article, we first review versions of the original Liu parameter proposed by Liu [8], defined according to the optimum (opt), the minimum MSE (mm), and the CL criterion (cl), respectively, as follows:
From (15), the mean squares error (MSE) of the Liu estimator as a function of $d_{R.Liu}$ is given by:
$$ MSE(\hat{\alpha}_{R.Liu}) = \sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d_{R.Liu})^2}{\lambda_j(\lambda_j + 1)^2} + (d_{R.Liu} - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}. $$
Now, we differentiate the MSE with respect to $d = d_{R.Liu}$. This involves differentiating both the variance and bias terms as follows:
$$ g(d) = \frac{\partial}{\partial d}\left(\sigma^2\sum_{j=1}^{p}\frac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} + (d - 1)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2}\right) = 2\sigma^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} + 2(d - 1)\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} = 0. $$
Solving Equation (17) for $d$ yields the optimal value:
$$ \begin{aligned} 2\sigma^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} &= 2(1 - d)\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} \\ \sigma^2\sum_{j=1}^{p}\frac{1}{(\lambda_j + 1)^2} + d\,\sigma^2\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)^2} &= \sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} - d\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + 1)^2} \\ d\sum_{j=1}^{p}\frac{\sigma^2 + \lambda_j\alpha_j^2}{\lambda_j(\lambda_j + 1)^2} &= \sum_{j=1}^{p}\frac{\alpha_j^2 - \sigma^2}{(\lambda_j + 1)^2} \\ d &= \frac{\displaystyle\sum_{j=1}^{p}\frac{\alpha_j^2 - \sigma^2}{(\lambda_j + 1)^2}}{\displaystyle\sum_{j=1}^{p}\frac{\sigma^2 + \lambda_j\alpha_j^2}{\lambda_j(\lambda_j + 1)^2}}. \end{aligned} $$
After solving, the d o p t is given by:
$$ d_{opt} = \frac{\displaystyle\sum_{j=1}^{p}\frac{\hat{\alpha}_j^2 - \hat{\sigma}^2}{(\lambda_j + 1)^2}}{\displaystyle\sum_{j=1}^{p}\frac{\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2}}, $$
where $\hat{\sigma}^2 = \frac{(y - X\hat{\beta}_{OLS})'(y - X\hat{\beta}_{OLS})}{n - p}$ is the estimated variance of the error term in the regression model and $\hat{\alpha} = \Lambda^{-1}Z'y$ is the vector of estimated coefficients in the canonical form of the Liu regression model.
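The expression for $d_{opt}$ above can be sketched directly from its ingredients; the eigenvalues, canonical coefficients, and error variance below are arbitrary illustrative inputs, not estimates from any real dataset:

```python
import numpy as np

def d_opt(lam, alpha_hat, sigma2_hat):
    """d_opt = sum((a_j^2 - s2)/(l_j+1)^2) / sum((s2 + l_j a_j^2)/(l_j (l_j+1)^2))."""
    num = np.sum((alpha_hat ** 2 - sigma2_hat) / (lam + 1) ** 2)
    den = np.sum((sigma2_hat + lam * alpha_hat ** 2) / (lam * (lam + 1) ** 2))
    return num / den

lam = np.array([4.0, 1.0, 0.05])       # eigenvalues of X'X (illustrative)
alpha_hat = np.array([2.0, 1.0, 0.5])  # canonical coefficients (illustrative)
d = d_opt(lam, alpha_hat, sigma2_hat=1.0)
```

Note that $d_{opt}$ always lies below one (the numerator minus the denominator is strictly negative) but may itself be negative, consistent with the earlier remark that the Liu parameter can fall outside $(0, 1)$.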
For the minimum MSE criterion, the unbiased estimators $\alpha_j^2 = \hat{\alpha}_j^2 - \hat{\sigma}^2/\lambda_j$ and $\sigma^2 = \hat{\sigma}^2$ are substituted into (17), and the derivative of the MSE with respect to $d$ is set to zero:
$$ \begin{aligned} 2\hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j + d}{\lambda_j(\lambda_j + 1)^2} &= 2(1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2 - \hat{\sigma}^2}{\lambda_j(\lambda_j + 1)^2} \\ \hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j}{\lambda_j(\lambda_j + 1)^2} + \hat{\sigma}^2\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)^2} &= (1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2} \\ (1 - d)\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{\lambda_j(\lambda_j + 1)^2} &= \hat{\sigma}^2\sum_{j=1}^{p}\frac{\lambda_j + 1}{\lambda_j(\lambda_j + 1)^2}. \end{aligned} $$
Minimizing the MSE leads to the following expression for d m m :
$$ d_{mm} = 1 - \hat{\sigma}^2\left[\frac{\displaystyle\sum_{j=1}^{p}\frac{1}{\lambda_j(\lambda_j + 1)}}{\displaystyle\sum_{j=1}^{p}\frac{\hat{\alpha}_j^2}{(\lambda_j + 1)^2}}\right]. $$
The CL criterion is used to find the optimal biasing parameter $d$, balancing the trade-off between fitting the data well and keeping the model's complexity under control. The criterion is given by the following formula:
$$ C_{L}(d) = \frac{SS_{Res,d}}{\hat{\sigma}^2} + 2\,\mathrm{trace}(H_d) - (n - 2), $$
where $SS_{Res,d} = (y - Z\hat{\alpha}_{R.Liu})'(y - Z\hat{\alpha}_{R.Liu})$ is the residual sum of squares at a given $d$, $H_d = X(X'X + I_p)^{-1}(X'X + d I_p)(X'X)^{-1}X'$ is the Liu hat matrix, and $\hat{\sigma}^2$ is the estimated variance of the errors.
To find the optimal $d$, we take the derivative of the CL criterion with respect to $d$ and set it to zero: $\frac{\partial C_L(d)}{\partial d} = 0$.
After calculating the derivative and rearranging, as was done for $d_{opt}$ and $d_{mm}$, we obtain the following equation:
$$ d_{cl} = 1 - \hat{\sigma}^2\left[\frac{\displaystyle\sum_{j=1}^{p}\frac{1}{\lambda_j + 1}}{\displaystyle\sum_{j=1}^{p}\frac{\lambda_j\hat{\alpha}_j^2}{(\lambda_j + 1)^2}}\right]. $$
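The two closed-form parameters $d_{mm}$ and $d_{cl}$ above can be sketched from the same ingredients as $d_{opt}$; the numeric inputs are again arbitrary illustrative values:

```python
import numpy as np

def d_mm(lam, alpha_hat, sigma2_hat):
    """d_mm = 1 - s2 * [sum 1/(l_j(l_j+1))] / [sum a_j^2/(l_j+1)^2]."""
    return 1 - sigma2_hat * np.sum(1 / (lam * (lam + 1))) \
        / np.sum(alpha_hat ** 2 / (lam + 1) ** 2)

def d_cl(lam, alpha_hat, sigma2_hat):
    """d_cl = 1 - s2 * [sum 1/(l_j+1)] / [sum l_j a_j^2/(l_j+1)^2]."""
    return 1 - sigma2_hat * np.sum(1 / (lam + 1)) \
        / np.sum(lam * alpha_hat ** 2 / (lam + 1) ** 2)

lam = np.array([4.0, 1.0, 0.05])       # illustrative eigenvalues
alpha_hat = np.array([2.0, 1.0, 0.5])  # illustrative canonical coefficients
dm = d_mm(lam, alpha_hat, 1.0)
dc = d_cl(lam, alpha_hat, 1.0)
```

Both formulas subtract a positive quantity from one, so each parameter is strictly below one; with very small eigenvalues (severe multicollinearity) they can become strongly negative.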
Furthermore, Liu [9] improved the Liu parameter in multiple linear regression under the approximation of the predicted residual error sum of squares criterion via the improved Liu estimator (ILE) as follows:
$$ d_{ILE} = \frac{\displaystyle\sum_{i=1}^{n}\frac{\tilde{e}_i}{1 - g_{ii}}\left(\frac{\tilde{e}_i}{1 - g_{ii}} - \frac{\hat{e}_i}{1 - h_{ii}}\right)}{\displaystyle\sum_{i=1}^{n}\left(\frac{\tilde{e}_i}{1 - g_{ii}} - \frac{\hat{e}_i}{1 - h_{ii}}\right)^2}, $$
where
$$ \hat{e}_i = y_i - x_i'(X'X - x_i x_i')^{-1}(X'y - x_i y_i), \quad \tilde{e}_i = y_i - x_i'(X'X + I_p - x_i x_i')^{-1}(X'y - x_i y_i), $$
$$ G = X(X'X + I_p)^{-1}X', \quad H = X(X'X)^{-1}X', $$
and $g_{ii} = x_i'(X'X + I_p)^{-1}x_i$, $h_{ii} = x_i'(X'X)^{-1}x_i$.
Özkale and Kaçiranlar [13] introduced a new two-parameter approach by incorporating the contraction estimator, encompassing well-known methods such as restricted least squares, restricted ridge, restricted contraction estimators, and a novel modified, restricted Liu estimator (RLE), which can be written as follows:
$$ d_{RLE} = \sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2, $$
where $h_{ii}$ represents the diagonal elements of the matrix $H$; $g_{ii}$ represents the diagonal elements of the matrix $G$; $\tilde{H}_{d,ii}$ represents the diagonal elements of the Liu hat matrix from (5), with cross-validation implemented to evaluate the MSE for $d$ [24]; and $\hat{e}_{d,i} = y_i - \hat{y}_{d,i}$ is the $i$th residual at a specific value of $d$.
Mallows [25] discussed the interpretation of Cp plots, using the display as a basis for formally selecting a subset-regression model; this approach extends to estimating the Liu estimator. The Liu parameter is defined as follows:
$$ d_{C_p} = \frac{SSR}{\hat{\sigma}^2} + 2\,\mathrm{trace}(\tilde{H}_{d,ii}) - (n - 2), $$
where $SSR = \sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2$.
In this paper, we modify the Liu parameter from Mallows [25] by introducing the mean squares error, obtained as the mean of the sum of squares residual (SSR), as follows:
$$ d_{MSE} = \frac{1}{p}\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2. $$
Furthermore, the coefficient of determination, denoted R-squared ($R^2$), is a critical metric in regression analysis. It quantifies the proportion of the variance in the dependent variable that can be predicted from the independent variables. Given the significance of R-squared, we propose a new Liu parameter by computing $R^2 = 1 - SSR/SST$, which lies in the range between zero and one, rewritten as follows:
$$ d_{R^2} = 1 - \frac{\displaystyle\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{ii}} - \frac{e_i}{(1 - g_{ii})(1 - h_{ii})}(g_{ii} - \tilde{H}_{d,ii})\right]^2}{\displaystyle\sum_{i=1}^{n}\left[\frac{\hat{e}_{d,i}}{1 - g_{1,ii}} - \frac{e_i}{(1 - g_{1,ii})(1 - h_{ii})}(g_{1,ii} - \tilde{G}_{d,ii})\right]^2}, $$
where $\tilde{G}_{d,ii}$ represents the diagonal elements of $I - \tilde{H}_{d,ii}$ and $g_{1,ii}$ represents the diagonal elements of $1 - h_{ii}$.
Scaling options are utilized to standardize the independent variables and assess their performance via the Liu estimator. The initial method, used by Liu [8], is the centered option, which gives the independent variables zero mean. The scaled option further standardizes the independent variables to unit variance. Lastly, the SC option scales the independent variables into correlation form, a concept explored by Belsley [26].
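The three scaling options can be sketched as follows. The paper names the options but not their formulas, so these are the usual textbook definitions, assumed here (Python; the paper's own computations used R):

```python
import numpy as np

def center(X):
    """Centered option: subtract the column means (zero-mean columns)."""
    return X - X.mean(axis=0)

def scale(X):
    """Scaled option: center, then divide by the sample standard deviation."""
    return center(X) / X.std(axis=0, ddof=1)

def correlation_form(X):
    """Correlation-form (SC) option: center, then give each column unit
    length, so that Z'Z is the correlation matrix of the columns of X."""
    Z = center(X)
    return Z / np.sqrt((Z ** 2).sum(axis=0))

# Illustrative data with nonzero mean and non-unit variance.
X = np.random.default_rng(5).normal(loc=3.0, scale=2.0, size=(50, 4))
Z = correlation_form(X)
R = Z.T @ Z  # correlation matrix of the columns of X
```

Working in correlation form is convenient for the canonical decomposition, since the eigenvalues of $Z'Z$ then directly reflect the strength of the collinearity among the standardized regressors.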

3. Simulation Study

In line with the previous section's theoretical comparison among Liu estimators, a Monte Carlo simulation study was conducted using the R 4.2.1 programming language. The objective of the simulation study was to estimate and compare the Liu parameters' performance in the multiple regression model. The independent variables ($\tilde{x}_i$) were generated from a multivariate normal distribution with five, ten, and fifteen independent variables, based on Toeplitz correlation ($\rho$) values of 0.1 and 0.9. The multivariate normal distribution with mean vector $\tilde{\mu}$ and covariance matrix $\Sigma$ was used to simulate multicollinearity between independent variables. The probability density is defined as follows:
$$ f(\tilde{x}_i \mid \tilde{\mu}, \Sigma) = \frac{\exp\left\{-\frac{1}{2}(\tilde{x}_i - \tilde{\mu})^{T}\Sigma^{-1}(\tilde{x}_i - \tilde{\mu})\right\}}{\sqrt{(2\pi)^{p}\,|\Sigma|}}, $$
where $\tilde{x}_i = [x_{i1}\; x_{i2}\; \cdots\; x_{ip}]'$, $\tilde{\mu} = [\mu_1\; \mu_2\; \cdots\; \mu_p]'$, and $i = 1, 2, \ldots, n$.
This type of covariance matrix is mentioned in the Toeplitz correlation model, which implies that closely located independent variables have a high correlation and the correlation decreases for independent variables that are farther apart. A matrix with the following pattern characterizes the relationship:
$$ \Sigma = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{p-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{bmatrix}, $$
where the correlation coefficient or level of multicollinearity is given by 0.1 or 0.9.
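The data-generating step above can be sketched as follows: build the Toeplitz matrix $\Sigma$ with entries $\rho^{|j-k|}$ and draw multivariate normal regressors ($n$, $p$, and $\rho$ here are illustrative choices from the ranges the paper uses):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, rho = 100, 5, 0.9

# Toeplitz correlation matrix: Sigma[j, k] = rho ** |j - k|.
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

# Multicollinear regressors drawn from N(0, Sigma).
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n)
```

With $\rho = 0.9$, adjacent columns of `X` are strongly correlated while distant columns are only weakly correlated, matching the decay pattern described above.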
The observations on the dependent variable are obtained from the multiple regression model as
$$ y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \varepsilon_i, \quad i = 1, 2, \ldots, n, $$
where $\varepsilon_i$ is generated from the normal distribution with mean zero and variance one, and the regression coefficients $(\beta_0, \beta_1, \ldots, \beta_p)$ are set to constant values.
The data generated by the regression model were randomly split into 70% training data and 30% testing data. The data were then randomly sampled, and the training and testing data were used to calculate the MAPE (mean absolute percentage error). The performance criterion was used to judge the performance of different Liu parameters in estimating the Liu estimator. The evaluated MAPE is defined as follows:
$$ \text{Mean Absolute Percentage Error} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100, $$
where $y_i$ is the observed value and $\hat{y}_i$ is the estimated value. The average mean absolute percentage errors of OLS, ridge regression, and the eight Liu parameters for five, ten, and fifteen variables are presented in Table 1, Table 2 and Table 3 according to the correlation coefficient (0.1 and 0.9). Table 4 presents the Liu parameter values used to estimate the Liu estimator. An average over 1000 replications was employed to approximate the average mean absolute percentage error. The minimum average mean absolute percentage error is shown in bold.
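The evaluation protocol above (a random 70/30 train/test split scored by MAPE) can be sketched as follows; plain OLS stands in for the compared Liu estimators, and the data are synthetic and illustrative:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, as in the criterion above.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
# A large intercept keeps y away from zero, so the MAPE denominator is safe.
y = X @ np.array([20.0, 1.0, 1.0, 1.0, 1.0, 1.0]) + rng.normal(size=n)

# Random 70/30 train/test split.
idx = rng.permutation(n)
tr, te = idx[:140], idx[140:]

# Fit on the training part (OLS as a stand-in) and score on the test part.
beta = np.linalg.solve(X[tr].T @ X[tr], X[tr].T @ y[tr])
score = mape(y[te], X[te] @ beta)
```

In the paper's design, this split-fit-score loop is repeated 1000 times per configuration and the resulting MAPE values are averaged.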
Table 1, Table 2 and Table 3 describe the simulated average mean absolute percentage error for the two levels of Toeplitz correlation, with the smallest MAPE value highlighted in bold. The simulation results showed that the modified Liu parameter based on R-squared (dR2) had the smallest MAPE values, so it outperformed the other methods, especially under the scaled option in Table 2, whereas dCp had the weakest performance in all cases. Furthermore, the MAPE of dmm, dcl, and dopt was equal to that of dR2 under the center and scaled options in Table 1 and Table 2. The influence of sample size was also observed: the MAPE decreased as the sample size increased, and it was further reduced as the number of independent variables increased. The Liu parameters used to estimate the Liu estimator are presented in Table 4; they varied with sample size, the number of independent variables, and the level of correlation.
Table 4 presents the mean Liu parameters for multiple regression models with low (0.1) and high (0.9) Toeplitz correlation, comparing the various methods and sample sizes across different numbers of independent variables. For low correlation, the Liu parameters were relatively stable across methods as the sample size increased. However, for high correlation, methods such as dRLE and dILE showed significant increases in their Liu parameters as the sample size and the number of independent variables increased. Methods like dCp and dMSE also exhibited higher Liu parameters with more independent variables and higher correlation values. Overall, dRLE and dILE tended to perform best, especially when correlation was high, while dmm, dopt, and dcl showed less variation across sample sizes. For a better understanding, the Liu parameters for dmm, dcl, dopt, dILE, dRLE, dMSE, and dR2 are plotted for multicollinearity levels 0.1 and 0.9 in Figure 1 and Figure 2, respectively.
The Liu parameters based on dmm, dcl, dopt, dILE, dRLE, and dR2 demonstrated stable values in all situations, as shown in Figure 1 and Figure 2. Furthermore, dMSE decreased as the sample size increased, especially at high correlation. In contrast, dmm, dcl, dopt, dILE, and dRLE closely tracked one another at high correlation, and the other Liu parameters differed only slightly from them. The modified Liu parameter (dR2) stayed close to a stable value of 1 in all cases.

4. Application in Actual Data

We employed Liu regression to model patients' ages from blood donors' laboratory values using the Hepatitis C patients dataset sourced from the UCI Machine Learning Repository. The dataset was retrieved from https://archive.ics.uci.edu/dataset/503/hepatitis+c+virus+hcv+for+egyptian+patients (accessed on 26 September 2024) and contains 589 records. The dependent variable was the age of the patients, and the independent variables included albumin (ALB), total protein (PROT), cholinesterase (CHE), cholesterol (CHOL), alkaline phosphatase (ALP), alanine aminotransferase (ALT), creatinine (CREA), bilirubin (BIL), aspartate aminotransferase (AST), and gamma-glutamyl transferase (GGT).
To check for multicollinearity, Pearson's correlation analysis was employed to ascertain any potential relationships among the ten continuous independent variables; the Pearson correlation coefficients of the independent variables are listed in Table 5 and illustrated in Figure 3. The null hypothesis stated that there was no relationship between two variables, and the alternative hypothesis asserted a significant relationship. A p-value below 0.05 for the t-statistic signified rejection of the null hypothesis and a significant relationship between the two variables, as demonstrated in Table 5.
Our findings showed that a moderately significant relationship (correlation between 0.41 and 0.6) was observed in most cases, while a weaker but still significant relationship (between 0.2 and 0.4) was evident in some instances. Most of the independent variables exhibited significant relationships, the exceptions being between total protein (PROT) and alkaline phosphatase (ALP), alanine aminotransferase (ALT), creatinine (CREA), bilirubin (BIL), aspartate aminotransferase (AST), and gamma-glutamyl transferase (GGT).
The Pearson correlation matrix in Figure 3, derived from Table 5, uses varying shades to enhance clarity: light shading indicates moderate correlations, while dark shading represents strong correlations. Most of the independent variables are depicted with moderate and light shading, suggesting inter-variable correlations, i.e., multicollinearity issues. The entire dataset was divided into 70% training and 30% testing data, which were then randomly sampled. The average mean absolute percentage errors shown in Table 6 were computed using OLS, ridge regression, and the eight Liu parameters with the three scaling options, over 1000 replications. The sample sizes of 50, 100, 150, and 200 mirrored those in the simulation study.
Table 6 reveals that the modified Liu parameters (dMSE and dR2) exhibited consistent and often superior prediction accuracy across all scenarios. The dMSE and dR2 methods notably demonstrated commendable estimation at all sample sizes, better than OLS and ridge regression. Consequently, the Liu parameter adjustment using the dMSE and dR2 methods for ten independent variables consistently surpassed expectations and aligned closely with the simulation outcomes. Although there were slight discrepancies in estimation as the sample sizes increased, substantial performance improvements were evident for small samples drawn from the Hepatitis C dataset. Using a sampled subset is also more efficient than using the entire dataset, in both estimation accuracy and processing time.

5. Discussion

The simulated results presented in Table 1, Table 2, Table 3 and Table 4 revealed that the mean absolute percentage error was affected by the number of independent variables and the sample size. The modified Liu estimator (dR2) performed best across all numbers of independent variables, correlation levels, and sample sizes, with dMSE differing only slightly from dR2. The mean absolute percentage error for models with more independent variables was lower than for models with fewer. Increasing the correlation coefficient had only a weak impact on estimation for most methods, as indicated by the slight variation in the mean absolute percentage error.
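The simulation design behind these tables (independent variables drawn from a multivariate normal with a Toeplitz correlation pattern, and a response built from regression coefficients plus normal error) can be sketched as follows. The AR(1)-style entries rho**|i-j| and the unit coefficient vector are illustrative assumptions standing in for the paper's exact settings.

```python
import numpy as np

def toeplitz_corr(p, rho):
    """Toeplitz correlation matrix with entries R[i, j] = rho**|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, p, rho, sigma, rng):
    """Draw X ~ N(0, R) with Toeplitz correlation R; y = X @ beta + normal error."""
    L = np.linalg.cholesky(toeplitz_corr(p, rho))   # correlate iid normals
    X = rng.normal(size=(n, p)) @ L.T
    beta = np.ones(p)                               # illustrative coefficient vector
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return X, y

rng = np.random.default_rng(2)
X, y = simulate(n=50, p=5, rho=0.9, sigma=1.0, rng=rng)
```

Varying n over {50, 100, 150, 200}, p over {5, 10, 15}, and rho over {0.1, 0.9} reproduces the grid of scenarios reported in the tables.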
Consistently, the real data results in Table 6 show that the proposed Liu parameters (dMSE and dR2) achieved the smallest mean absolute percentage error for the dataset with ten independent variables. The real data's independent variables exhibited skewed distributions, as illustrated in Figure 4 and confirmed by the Shapiro–Wilk test [27], indicating non-normality. Altukhaes et al. [28] introduced robust Liu estimators to combat multicollinearity and outlier problems in the linear regression model. The dCp parameter, by contrast, estimated effectively only for large sample sizes under the center option. Notably, the discrepancy between the simulated and real data results emphasizes the importance of considering the data source when selecting the Liu parameter.
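The non-normality diagnosis cited above can be reproduced with the Shapiro–Wilk test in `scipy.stats`; the lognormal draw below is only a stand-in for a right-skewed predictor such as GGT, not the actual Hepatitis C data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(size=200)          # stand-in for a right-skewed predictor

# H0 of the Shapiro-Wilk test: the sample comes from a normal distribution.
w, p_value = stats.shapiro(skewed)
reject = bool(p_value < 0.05)             # True here: normality is rejected
```

A small p-value, as obtained for the skewed predictors in Figure 4, rejects the null hypothesis of normality.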
The proposed Liu parameters (dMSE and dR2) emerged as the most suitable for estimation. Medical datasets are widely used to enhance predictive medical diagnosis and patient classification. The Hepatitis C dataset used here is a medical dataset in which the patients' age is modeled by multiple regression with multicollinearity among the independent variables. Oladapo et al. [29] introduced a novel modified Liu ridge-type estimator for the general linear model, employing Portland cement data as a case study akin to medical data; their estimator demonstrated superior performance under certain conditions. Babar et al. [23] adapted Liu estimators to address multicollinearity in linear regression using tobacco data, advocating adoption of the new estimators by practitioners facing high to severe multicollinearity among independent variables. Hammood et al. [30] employed a Liu estimator for inverse Gaussian regression, tackling multicollinearity in chemistry datasets. Among the Liu estimators considered for addressing multicollinearity in multiple regression, the proposed Liu estimator outperformed the others. In summary, we recommend the Liu estimator with the modified Liu parameter under high multicollinearity.
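For reference, the Liu estimator discussed throughout this literature takes the closed form from Liu (1993): beta_Liu(d) = (X'X + I)^{-1}(X'X + dI) beta_OLS with biasing parameter 0 <= d <= 1. A minimal numpy sketch (on simulated data, not the paper's):

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu (1993): (X'X + I)^-1 (X'X + d*I) @ beta_OLS; d = 1 recovers OLS."""
    p = X.shape[1]
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), (XtX + d * np.eye(p)) @ beta_ols)

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_liu = liu_estimator(X, y, 0.5)      # shrinks the OLS solution
```

Setting d = 1 reproduces the OLS coefficients exactly, while d < 1 scales each component of beta_OLS along the eigenvectors of X'X by (lambda + d)/(lambda + 1) < 1, which is the shrinkage that stabilizes the estimate under multicollinearity.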
The modified Liu parameters have some critical limitations and challenges. They require more processing power and time than traditional methods, which could be problematic for large-scale applications or users with limited computing resources. Furthermore, the methods might be too closely tailored to the specific datasets used in this study, risking overfitting: good performance on those datasets but poor performance on new, unseen data. More testing on diverse datasets is needed to ensure the methods generalize.

6. Conclusions

This paper introduces a Liu parameter designed to enhance the estimation of the Liu estimator in multiple regression models affected by multicollinearity among independent variables. The selection of this Liu parameter was carefully examined and compared to other methods to determine its effectiveness. Simulation studies demonstrated that the modified Liu parameter based on R-squared consistently achieved the lowest mean absolute percentage error, particularly under the scaled option, outperforming the alternative approaches. Sample size, the number of independent variables, and correlation level all influenced the Liu parameter: smaller sample sizes and more independent variables yielded more efficient estimators, while small correlations showed positive effects and large correlations led to higher parameter values. Furthermore, the modified Liu parameter outperformed the ordinary least squares method in both simulation and real data scenarios, substantially improving the estimator in regression models with multicollinearity at varying correlation levels. As a result, a Liu parameter within the (0, 1) range is recommended, as it consistently provided the most accurate estimation.
Accurate estimation of the correlation structure within the data is crucial to the reliability of the proposed methods. This can be challenging in practice, particularly with noisy, incomplete, or outlier-contaminated datasets, which may affect overall performance. Further research should therefore address these estimation challenges.

Funding

This work was financially supported by King Mongkut’s Institute of Technology Ladkrabang [2567-02-05-010].

Data Availability Statement

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Daoud, J.I. Multicollinearity and regression analysis. J. Phys. Conf. Ser. 2017, 949, 012009. [Google Scholar] [CrossRef]
  2. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. 2020, 8, 39–42. [Google Scholar] [CrossRef]
  3. Liang, Y.; Rosen, D.V.; Rosen, T.V. On properties of Toeplitz-type covariance matrices in models with nested random effects. Stat. Pap. 2021, 62, 2509–2528. [Google Scholar] [CrossRef]
  4. Qi, B.; Xu, L.; Liu, X. Improved multiple-Toeplitz matrices reconstruction method using quadratic spatial smoothing for coherent signals DOA estimation. Eng. Comput. 2024, 41, 333–346. [Google Scholar] [CrossRef]
  5. Dawoud, I.; Abonazel, M.R.; Awwad, F.A. Modified Liu estimator to address the multicollinearity problem in regression models: A new biased estimation class. Sci. Afr. 2022, 17, e01372. [Google Scholar] [CrossRef]
  6. Jahufer, A. Detecting global influential observations in Liu regression model. Open J. Stat. 2013, 3, 5–11. [Google Scholar] [CrossRef]
  7. Karlsson, P.; Månsson, K.; Golam Kibria, B.M. A Liu estimator for the beta regression model and its application to chemical data. J. Chemom. 2020, 24, 2–16. [Google Scholar] [CrossRef]
  8. Liu, K. A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 1993, 22, 393–402. [Google Scholar]
  9. Liu, X.-Q. Improved Liu Estimation in a linear regression model. J. Stat. Plan. Inference 2011, 141, 189–196. [Google Scholar] [CrossRef]
  10. Yang, H.; Xu, J. An alternative stochastic restricted Liu estimator in linear regression. Stat. Pap. 2009, 50, 639–647. [Google Scholar] [CrossRef]
  11. Hubert, M.H.; Wijekoon, P. Improvement of the Liu estimator in linear regression model. Stat. Pap. 2006, 47, 471–479. [Google Scholar] [CrossRef]
  12. Akdeniz, F.; Erol, H. Mean squared error matrix comparison of some biased estimators in linear regression. Commun. Stat. Theory Methods 2003, 32, 2389–2413. [Google Scholar] [CrossRef]
  13. Özkale, M.R.; Kaçiranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Stat. Theory Methods 2007, 36, 2707–2725. [Google Scholar] [CrossRef]
  14. Suhail, M.; Babar, I.; Khan, Y.A.; Imran, M.; Nawaz, Z. Quantile-based estimation of Liu parameter in the linear regression model: Applications to Portland cement and US crime data. Math. Probl. Eng. 2021, 2021, 1–11. [Google Scholar] [CrossRef]
  15. Lukman, A.F.; Golam Kibria, B.M.; Ayinde, K.; Jegede, S.L. Modified one-parameter Liu estimator for the linear regression model. Model. Simul. Eng. 2020, 2020, 1–17. [Google Scholar] [CrossRef]
  16. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  17. Lukman, A.F.; Ayinde, K.; Kun, S.S.; Adewuyi, E.T. A modified new two-parameter estimator in a linear regression model. Model. Simul. Eng. 2019, 2019, 1–10. [Google Scholar] [CrossRef]
  18. Filzmoser, P.; Kurnaz, F.S. A robust Liu regression estimator. Commun. Stat. Simul. Comput. 2018, 47, 432–443. [Google Scholar] [CrossRef]
  19. Druilhet, P.; Mom, A. Shrinkage Structure in Biased Regression. J. Multivar. Anal. 2008, 99, 232–244. [Google Scholar] [CrossRef]
  20. Akdeniz, F.; Kaçiranlar, S. More on the new biased estimator in linear regression. Sankhya Indian J. Stat. Ser. B 2001, 63, 321–325. [Google Scholar]
  21. Duran, E.R.; Akdeniz, F.; Hu, H. Efficiency of a Liu-type estimator in semiparametric regression models. J. Comput. Appl. Math. 2011, 235, 1418–1428. [Google Scholar] [CrossRef]
  22. Abdelwahab, M.M.; Abonazel, M.R.; Hammad, A.T.; El-Masry, A.M. Modified two-parameter Liu estimator for addressing multicollinearity in the Poisson regression model. Axioms 2024, 13, 46. [Google Scholar] [CrossRef]
  23. Babar, I.; Ayed, H.; Chand, S.; Suhail, M.; Khan, Y.A.; Marzouki, R. Modified Liu estimators in the linear regression model: An application to Tobacco data. PLoS ONE 2021, 16, e0259991. [Google Scholar] [CrossRef]
  24. Özkale, M.R.; Kaçiranlar, S. A Prediction-Oriented criterion for choosing the biasing parameter in Liu estimation. Commun. Stat. Theory Methods 2007, 36, 1889–1903. [Google Scholar] [CrossRef]
  25. Mallows, C.L. Some Comments on Cp. Technometrics 2000, 42, 87–94. [Google Scholar]
  26. Belsley, D.A. A guide to using the collinearity diagnostics. Comput. Sci. Econ. Manag. 1991, 4, 33–50. [Google Scholar] [CrossRef]
  27. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52, 591–611. [Google Scholar] [CrossRef]
  28. Altukhaes, W.D.; Roozbeh, M.; Mohamed, N.A. Robust Liu estimator used to combat some challenges in partially linear regression model by improving LTS algorithm using semidefinite programming. Mathematics 2024, 12, 2787. [Google Scholar] [CrossRef]
  29. Oladapo, O.J.; Owolabi, A.T.; Idowu, J.I.; Ayinde, K. A new modified Liu Ridge-Type estimator for the linear regression model: Simulation and application. Int. J. Clin. Biostat. Biom. 2022, 8, 1–14. [Google Scholar]
  30. Hammood, N.M.; Jabur, D.M.; Algamal, Z.Y. A Liu estimator in inverse Gaussian regression model with application in chemometrics. Math. Stat. Eng. Appl. 2022, 71, 248–266. [Google Scholar]
Figure 1. Estimated Liu parameter values for p = 5, 10, and 15, and the level of correlation at 0.1.
Figure 2. Estimated Liu parameter values for p = 5, 10, and 15, and the level of correlation at 0.9.
Figure 3. Correlation graph for the ten independent variables.
Figure 4. The histogram of ten independent variables.
Table 1. The average mean absolute percentage error of Liu estimators for the Toeplitz correlation of the center option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.600 | 0.671 | 0.694 | 0.700 | 0.705 | 0.725 | 0.787 | 0.758 |
| | dcl | 0.596 | 0.671 | 0.694 | 0.700 | 0.737 | 0.725 | 0.726 | 0.718 |
| | dopt | 0.599 | 0.671 | 0.694 | 0.700 | 0.650 | 0.700 | 0.720 | 0.719 |
| | dILE | 0.620 | 0.689 | 0.735 | 0.758 | 0.742 | 0.758 | 0.782 | 0.796 |
| | dRLE | 0.630 | 0.752 | 0.775 | 0.792 | 0.785 | 0.824 | 0.847 | 0.886 |
| | dCp | 1.150 | 0.823 | 0.761 | 0.737 | 0.901 | 0.753 | 0.733 | 0.721 |
| | dMSE | 0.611 | 0.674 | 0.696 | 0.700 | 0.599 | 0.672 | 0.697 | 0.704 |
| | dR2 | 0.593 | 0.670 | 0.694 | 0.700 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 0.465 | 0.570 | 0.604 | 0.619 | 0.658 | 0.648 | 0.654 | 0.682 |
| | dcl | 0.450 | 0.569 | 0.604 | 0.619 | 0.736 | 0.735 | 0.768 | 0.783 |
| | dopt | 0.461 | 0.570 | 0.604 | 0.619 | 0.545 | 0.627 | 0.657 | 0.667 |
| | dILE | 0.635 | 0.705 | 0.725 | 0.760 | 0.687 | 0.739 | 0.765 | 0.783 |
| | dRLE | 0.720 | 0.731 | 0.743 | 0.768 | 0.732 | 0.789 | 0.852 | 0.876 |
| | dCp | 2.580 | 1.320 | 0.969 | 0.833 | 1.630 | 1.020 | 0.811 | 0.737 |
| | dMSE | 0.460 | 0.575 | 0.607 | 0.620 | 0.572 | 0.577 | 0.614 | 0.631 |
| | dR2 | 0.439 | 0.567 | 0.604 | 0.619 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 0.337 | 0.478 | 0.530 | 0.552 | 0.490 | 0.625 | 0.721 | 0.752 |
| | dcl | 0.286 | 0.476 | 0.529 | 0.552 | 0.365 | 0.610 | 0.725 | 0.785 |
| | dopt | 0.315 | 0.478 | 0.530 | 0.552 | 0.489 | 0.576 | 0.674 | 0.712 |
| | dILE | 0.652 | 0.685 | 0.696 | 0.714 | 0.718 | 0.752 | 0.786 | 0.816 |
| | dRLE | 0.723 | 0.758 | 0.793 | 0.812 | 0.768 | 0.823 | 0.876 | 0.917 |
| | dCp | 4.690 | 2.130 | 1.410 | 1.100 | 2.790 | 1.520 | 1.090 | 0.861 |
| | dMSE | 0.288 | 0.483 | 0.533 | 0.554 | 1.150 | 0.505 | 0.546 | 0.570 |
| | dR2 | 0.255 | 0.474 | 0.528 | 0.551 | 0.262 | 0.488 | 0.545 | 0.570 |
Table 2. The average mean absolute percentage error of Liu estimators for Toeplitz correlation of the scaled option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.599 | 0.671 | 0.694 | 0.700 | 0.631 | 0.742 | 0.780 | 0.795 |
| | dcl | 0.595 | 0.671 | 0.694 | 0.700 | 0.628 | 0.745 | 0.768 | 0.783 |
| | dopt | 0.599 | 0.671 | 0.694 | 0.700 | 0.655 | 0.671 | 0.732 | 0.720 |
| | dILE | 0.624 | 0.738 | 0.762 | 0.787 | 0.632 | 0.747 | 0.764 | 0.725 |
| | dRLE | 0.615 | 0.725 | 0.714 | 0.773 | 0.628 | 0.732 | 0.751 | 0.738 |
| | dCp | 1.100 | 0.816 | 0.758 | 0.736 | 0.878 | 0.749 | 0.732 | 0.721 |
| | dMSE | 0.609 | 0.674 | 0.696 | 0.700 | 0.599 | 0.672 | 0.697 | 0.704 |
| | dR2 | 0.593 | 0.670 | 0.694 | 0.699 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 0.463 | 0.570 | 0.604 | 0.619 | 0.587 | 0.658 | 0.782 | 0.798 |
| | dcl | 0.449 | 0.569 | 0.604 | 0.619 | 0.548 | 0.645 | 0.659 | 0.673 |
| | dopt | 0.459 | 0.570 | 0.604 | 0.619 | 0.523 | 0.627 | 0.649 | 0.655 |
| | dILE | 0.524 | 0.568 | 0.587 | 0.628 | 0.647 | 0.658 | 0.671 | 0.690 |
| | dRLE | 0.641 | 0.678 | 0.689 | 0.690 | 0.745 | 0.758 | 0.765 | 0.778 |
| | dCp | 2.480 | 1.290 | 0.959 | 0.827 | 1.590 | 1.000 | 0.812 | 0.739 |
| | dMSE | 0.458 | 0.574 | 0.607 | 0.620 | 0.570 | 0.577 | 0.614 | 0.631 |
| | dR2 | 0.439 | 0.567 | 0.604 | 0.619 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 0.335 | 0.478 | 0.530 | 0.552 | 0.481 | 0.597 | 0.647 | 0.672 |
| | dcl | 0.285 | 0.476 | 0.529 | 0.552 | 0.358 | 0.527 | 0.641 | 0.687 |
| | dopt | 0.313 | 0.477 | 0.530 | 0.552 | 0.457 | 0.558 | 0.627 | 0.690 |
| | dILE | 0.541 | 0.572 | 0.652 | 0.687 | 0.625 | 0.687 | 0.754 | 0.798 |
| | dRLE | 0.651 | 0.745 | 0.788 | 0.790 | 0.725 | 0.784 | 0.852 | 0.886 |
| | dCp | 4.570 | 2.080 | 1.390 | 1.090 | 2.730 | 1.510 | 1.080 | 0.865 |
| | dMSE | 0.287 | 0.482 | 0.532 | 0.554 | 1.130 | 0.505 | 0.546 | 0.570 |
| | dR2 | 0.255 | 0.474 | 0.528 | 0.551 | 0.232 | 0.488 | 0.545 | 0.570 |
Table 3. The average mean absolute percentage error of Liu estimators for Toeplitz correlation of the SC option.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | OLS | 0.810 | 0.722 | 0.758 | 0.747 | 0.815 | 0.766 | 0.760 | 0.752 |
| | Ridge | 0.834 | 0.781 | 0.773 | 0.762 | 0.776 | 0.763 | 0.754 | 0.755 |
| | dmm | 0.955 | 0.950 | 0.929 | 0.944 | 1.143 | 1.121 | 1.105 | 1.165 |
| | dcl | 0.784 | 0.840 | 0.847 | 0.858 | 0.854 | 0.963 | 0.971 | 0.985 |
| | dopt | 0.931 | 0.944 | 0.927 | 0.942 | 0.952 | 0.968 | 0.974 | 0.957 |
| | dILE | 0.857 | 0.869 | 0.874 | 0.898 | 0.874 | 0.885 | 0.893 | 0.901 |
| | dRLE | 0.747 | 0.765 | 0.778 | 0.785 | 0.783 | 0.787 | 0.798 | 0.801 |
| | dCp | 8.770 | 9.300 | 9.470 | 9.560 | 5.470 | 6.420 | 6.830 | 7.070 |
| | dMSE | 1.380 | 1.590 | 1.670 | 1.690 | 0.726 | 0.937 | 1.090 | 1.170 |
| | dR2 | 0.596 | 0.673 | 0.697 | 0.702 | 0.596 | 0.671 | 0.696 | 0.703 |
| 10 | OLS | 0.842 | 0.740 | 0.713 | 0.698 | 0.859 | 0.753 | 0.726 | 0.715 |
| | Ridge | 0.875 | 0.767 | 0.735 | 0.722 | 0.763 | 0.727 | 0.716 | 0.713 |
| | dmm | 1.060 | 1.030 | 1.030 | 1.050 | 1.145 | 1.136 | 1.154 | 1.165 |
| | dcl | 0.785 | 0.856 | 0.882 | 0.900 | 0.813 | 0.865 | 0.899 | 0.923 |
| | dopt | 0.997 | 1.020 | 1.020 | 1.040 | 1.140 | 1.158 | 1.169 | 1.752 |
| | dILE | 0.864 | 0.875 | 0.887 | 0.914 | 0.932 | 0.956 | 0.978 | 0.992 |
| | dRLE | 0.753 | 0.775 | 0.784 | 0.798 | 0.841 | 0.852 | 0.861 | 0.887 |
| | dCp | 19.60 | 21.70 | 22.20 | 22.40 | 10.20 | 12.30 | 13.40 | 13.90 |
| | dMSE | 1.060 | 1.720 | 1.900 | 1.980 | 2.090 | 0.695 | 0.691 | 0.829 |
| | dR2 | 0.441 | 0.569 | 0.605 | 0.620 | 0.448 | 0.576 | 0.614 | 0.631 |
| 15 | OLS | 0.884 | 0.722 | 0.685 | 0.665 | 0.910 | 0.746 | 0.707 | 0.687 |
| | Ridge | 1.010 | 0.761 | 0.709 | 0.691 | 0.748 | 0.696 | 0.689 | 0.673 |
| | dmm | 1.220 | 1.060 | 1.090 | 1.110 | 1.324 | 1.214 | 1.365 | 1.427 |
| | dcl | 0.763 | 0.858 | 0.901 | 0.924 | 0.825 | 0.932 | 0.935 | 0.957 |
| | dopt | 1.040 | 1.050 | 1.080 | 1.100 | 1.154 | 1.189 | 1.192 | 2.014 |
| | dILE | 0.962 | 0.974 | 0.985 | 0.990 | 1.015 | 1.117 | 1.157 | 1.187 |
| | dRLE | 0.857 | 0.864 | 0.872 | 0.887 | 0.921 | 0.938 | 0.957 | 0.979 |
| | dCp | 29.80 | 34.70 | 36.10 | 36.70 | 14.60 | 18.60 | 20.20 | 21.30 |
| | dMSE | 0.737 | 1.570 | 1.920 | 2.070 | 5.810 | 1.540 | 0.772 | 0.613 |
| | dR2 | 0.256 | 0.475 | 0.529 | 0.552 | 0.262 | 0.488 | 0.545 | 0.570 |
Table 4. The mean Liu parameters for Toeplitz correlation in the multiple regression models.

| p | Method | ρ = 0.1, n = 50 | n = 100 | n = 150 | n = 200 | ρ = 0.9, n = 50 | n = 100 | n = 150 | n = 200 |
|---|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| 5 | dmm | 0.577 | 0.598 | 0.633 | 0.616 | 0.613 | 0.679 | 0.754 | 0.736 |
| | dcl | 0.690 | 0.692 | 0.707 | 0.695 | 0.748 | 0.789 | 0.898 | 0.754 |
| | dopt | 0.574 | 0.602 | 0.635 | 0.618 | 0.687 | 0.747 | 0.796 | 0.723 |
| | dILE | 0.374 | 0.451 | 0.540 | 0.633 | 0.380 | 0.561 | 0.635 | 0.745 |
| | dRLE | 0.452 | 0.563 | 0.632 | 0.741 | 0.478 | 0.598 | 0.693 | 0.774 |
| | dCp | 6.590 | 6.810 | 6.880 | 6.910 | 5.220 | 5.940 | 6.250 | 6.410 |
| | dMSE | 0.206 | 0.094 | 0.059 | 0.044 | 0.906 | 0.500 | 0.356 | 0.281 |
| | dR2 | 0.964 | 0.961 | 0.960 | 0.960 | 0.990 | 0.989 | 0.989 | 0.989 |
| 10 | dmm | 0.529 | 0.593 | 0.606 | 0.599 | 0.668 | 0.697 | 0.721 | 0.632 |
| | dcl | 0.680 | 0.692 | 0.693 | 0.689 | 0.745 | 0.768 | 0.771 | 0.740 |
| | dopt | 0.562 | 0.599 | 0.609 | 0.601 | 0.651 | 0.675 | 0.687 | 0.625 |
| | dILE | 0.391 | 0.473 | 0.550 | 0.685 | 0.412 | 0.587 | 0.678 | 0.787 |
| | dRLE | 0.484 | 0.698 | 0.673 | 0.787 | 0.584 | 0.635 | 0.789 | 0.874 |
| | dCp | 11.00 | 11.60 | 11.70 | 11.80 | 7.940 | 9.560 | 10.30 | 10.70 |
| | dMSE | 0.520 | 0.205 | 0.129 | 0.093 | 2.370 | 1.160 | 0.821 | 0.637 |
| | dR2 | 0.984 | 0.982 | 0.981 | 0.981 | 0.997 | 0.997 | 0.997 | 0.997 |
| 15 | dmm | 0.451 | 0.591 | 0.596 | 0.595 | 0.587 | 0.625 | 0.647 | 0.654 |
| | dcl | 0.669 | 0.690 | 0.688 | 0.686 | 0.725 | 0.741 | 0.758 | 0.787 |
| | dopt | 0.536 | 0.599 | 0.600 | 0.598 | 0.674 | 0.668 | 0.685 | 0.674 |
| | dILE | 0.387 | 0.412 | 0.563 | 0.694 | 0.432 | 0.598 | 0.689 | 0.814 |
| | dRLE | 0.554 | 0.715 | 0.768 | 0.879 | 0.623 | 0.789 | 0.823 | 0.880 |
| | dCp | 15.10 | 16.30 | 16.60 | 16.70 | 10.40 | 13.00 | 14.20 | 14.90 |
| | dMSE | 1.020 | 0.343 | 0.205 | 0.147 | 4.720 | 1.940 | 1.320 | 1.020 |
| | dR2 | 0.991 | 0.989 | 0.988 | 0.988 | 0.999 | 0.999 | 0.999 | 0.998 |
Table 5. Pearson correlation matrix for the relationships between ten independent variables.

| Variables | ABL | PROT | CHE | CHOL | ALP | ALT | CREA | BIL | AST | GGT |
|-----------|-----|------|-----|------|-----|-----|------|-----|-----|-----|
| ABL | 1.00 | 0.57 * (<0.05) | 0.36 * (<0.05) | 0.21 * (<0.05) | −0.15 * (0.01) | 0.04 (1.00) | 0.00 (1.00) | −0.17 * (<0.05) | −0.18 * (<0.05) | −0.15 * (0.01) |
| PROT | – | 1.00 | 0.31 * (<0.05) | 0.25 * (<0.05) | −0.06 (1.00) | 0.02 (1.00) | −0.03 (1.00) | −0.05 (1.00) | 0.02 (1.00) | −0.04 (1.00) |
| CHE | – | – | 1.00 | 0.43 * (<0.05) | 0.03 (1.00) | 0.22 * (<0.05) | −0.01 (1.00) | −0.32 * (<0.05) | −0.20 * (<0.05) | −0.10 (0.36) |
| CHOL | – | – | – | 1.00 | 0.13 (0.05) | 0.15 * (0.01) | −0.05 (1.00) | −0.18 * (<0.05) | −0.20 * (<0.05) | 0.01 (1.00) |
| ALP | – | – | – | – | 1.00 | 0.22 * (<0.05) | 0.15 * (<0.05) | 0.06 (1.00) | 0.07 (1.00) | 0.46 * (<0.05) |
| ALT | – | – | – | – | – | 1.00 | −0.04 (1.00) | −0.11 (0.18) | 0.20 (<0.05) | 0.22 (<0.05) |
| CREA | – | – | – | – | – | – | 1.00 | 0.02 (1.00) | −0.02 (1.00) | 0.13 (0.05) |
| BIL | – | – | – | – | – | – | – | 1.00 | 0.31 * (<0.05) | 0.21 * (<0.05) |
| AST | – | – | – | – | – | – | – | – | 1.00 | 0.14 * (<0.05) |
| GGT | – | – | – | – | – | – | – | – | – | 1.00 |

Note. Each cell shows the Pearson correlation with its p-value in parentheses; *, multicollinearity between two variables.
Table 6. The average mean absolute percentage error estimated for sample sizes of 50, 100, 150, 200, and 589.

| Sample size | OLS | Ridge | Scale option | dmm | dcl | dopt | dILE | dRLE | dCp | dMSE | dR2 |
|-------------|-----|-------|--------------|-----|-----|------|------|------|-----|------|-----|
| n = 50 | | | Liu parameter | 0.547 | 0.635 | 0.714 | 0.883 | 0.854 | 11.8 | 6.63 | 0.436 |
| | 24 | 19.01 | Centered | 17.25 | 18.14 | 15.13 | 16.13 | 17.69 | 12.2 | 11 | 10.4 |
| | | | Scaled | 17.79 | 18.36 | 16.39 | 16.99 | 18.75 | 29.5 | 18.4 | 10.5 |
| | | | SC | 18.65 | 18.98 | 17.58 | 17.58 | 18.96 | 88.6 | 47.5 | 11.8 |
| n = 100 | | | Liu parameter | 0.698 | 0.741 | 0.896 | 0.721 | 0.785 | 11.9 | 2.69 | 0.271 |
| | 19.8 | 18.32 | Centered | 16.25 | 16.33 | 15.06 | 16.05 | 16.96 | 13.8 | 13.8 | 13.8 |
| | | | Scaled | 16.98 | 17.52 | 16.35 | 16.37 | 18.69 | 16.3 | 13.8 | 13.8 |
| | | | SC | 16.69 | 17.85 | 16.78 | 16.69 | 18.23 | 67.7 | 17.4 | 14.9 |
| n = 150 | | | Liu parameter | 0.785 | 0.896 | 0.741 | 0.624 | 0.693 | 12 | 1.63 | 0.215 |
| | 18.7 | 18.03 | Centered | 15.37 | 16.15 | 15.83 | 15.86 | 16.74 | 14.9 | 15 | 15 |
| | | | Scaled | 15.54 | 16.98 | 16.65 | 15.32 | 17.16 | 18 | 15 | 15 |
| | | | SC | 15.62 | 16.96 | 16.15 | 15.87 | 17.36 | 56.8 | 15.2 | 15.8 |
| n = 200 | | | Liu parameter | 0.632 | 0.669 | 0.725 | 0.693 | 0.627 | 12 | 1.19 | 0.185 |
| | 18.2 | 17.96 | Centered | 15.73 | 16.12 | 15.52 | 15.95 | 16.05 | 15.4 | 15.4 | 15.5 |
| | | | Scaled | 15.69 | 16.21 | 16.20 | 15.44 | 17.05 | 15.5 | 15.4 | 15.5 |
| | | | SC | 15.54 | 16.28 | 16.87 | 15.68 | 17.13 | 50.9 | 15.4 | 16.2 |
| n = 589 | | | Liu parameter | 0.614 | 0.658 | 0.698 | 0.635 | 0.602 | 11.99 | 0.377 | 0.1269 |
| | 17.41 | 17.50 | Centered | 16.70 | 16.75 | 16.87 | 17.28 | 17.69 | 16.57 | 16.59 | 16.59 |
| | | | Scaled | 16.87 | 16.98 | 16.87 | 17.14 | 17.98 | 16.72 | 16.59 | 16.59 |
| | | | SC | 17.58 | 17.25 | 17.32 | 17.58 | 17.85 | 37.55 | 16.90 | 17.09 |