Article

K-L Estimator: Dealing with Multicollinearity in the Logistic Regression Model

Adewale F. Lukman, B. M. Golam Kibria, Cosmas K. Nziku, Muhammad Amin, Emmanuel T. Adewuyi and Rasha Farghali
1 Department of Epidemiology and Biostatistics, University of Medical Sciences, Ondo 220282, Nigeria
2 Department of Mathematics and Statistics, Florida International University, Miami, FL 33199, USA
3 Department of Statistics, University of Dar es Salaam, Dar es Salaam 65015, Tanzania
4 Department of Statistics, University of Sargodha, Sargodha 40100, Pakistan
5 Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso 210214, Nigeria
6 Department of Mathematics, Insurance and Applied Statistics, Helwan University, Cairo 11732, Egypt
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(2), 340; https://doi.org/10.3390/math11020340
Submission received: 11 November 2022 / Revised: 23 December 2022 / Accepted: 29 December 2022 / Published: 9 January 2023
(This article belongs to the Special Issue Statistical Theory and Application)

Abstract

Multicollinearity negatively affects the efficiency of the maximum likelihood estimator (MLE) in both the linear and generalized linear models. The Kibria-Lukman estimator (KLE) was developed as an alternative to the MLE to handle multicollinearity in the linear regression model. In this study, we propose the logistic Kibria-Lukman estimator (LKLE) to handle multicollinearity in the logistic regression model. We theoretically establish the conditions under which this new estimator is superior to the MLE, the logistic ridge estimator (LRE), the logistic Liu estimator (LLE), the logistic Liu-type estimator (LLTE) and the logistic two-parameter estimator (LTPE) using the mean squared error criterion. The theoretical conditions were validated using a real-life dataset, and the results showed that the conditions were satisfied. Finally, the simulation and the real-life results showed that the new estimator outperformed the other estimators considered. However, the performance of the estimators was contingent on the adopted shrinkage parameter estimators.

1. Introduction

Frisch [1] coined the term "multicollinearity" to describe the problem that arises when the explanatory variables in a model are linearly related. This problem poses a severe threat to many regression models, e.g., the linear regression model (LRM) and the logistic, Poisson and gamma regression models. The parameters of the linear and logistic regression models are commonly estimated with the ordinary least squares (OLS) estimator and the maximum likelihood estimator (MLE), respectively. Under multicollinearity, however, both estimators have inflated standard errors, and the estimated regression coefficients occasionally exhibit the wrong signs, making conclusions doubtful [2,3]. The ridge regression estimator (RRE) and the logistic ridge estimator are notable alternatives to the OLS estimator and the MLE in the LRM and the logistic regression model, respectively [4,5]. The Liu estimator is an alternative to the ridge estimator that accounts for multicollinearity in the LRM and the logistic regression model [6,7]. The modified ridge-type estimator is a two-parameter estimator that competes favorably with the ridge and Liu estimators [8,9]. Recently, the K-L estimator emerged as another single-biasing-parameter estimator in the class of the ridge and Liu estimators [10]. The K-L estimator is a one-parameter form of the Liu-type estimator that minimizes the residual sum of squares subject to an L2-norm constraint incorporating prior information, and it outperforms the RRE and the Liu estimator under the corresponding theoretical conditions. In this study, we develop the K-L estimator for parameter estimation in the logistic regression model, derive its statistical properties, compare it theoretically with other estimators, and validate its performance through a simulation study and a real-life application.
The organization of this paper is as follows. The proposed estimator is discussed in Section 2. A theoretical comparison of various estimators is presented in Section 3. A simulation study is conducted in Section 4. Real-life data are analyzed in Section 5. Finally, some concluding remarks are given in Section 6.

2. Proposed Estimator

Given that $y_i$ is a binary response variable following a Bernoulli distribution, $y_i \sim Be(\pi_i)$, the logistic regression model is defined as
$$p(y_i) = \pi_i^{y_i}(1 - \pi_i)^{(1 - y_i)},$$
where $\pi_i = \frac{e^{x_i^T\beta}}{1 + e^{x_i^T\beta}} = \frac{1}{1 + e^{-x_i^T\beta}}$, $i = 1, 2, \ldots, n$, $x_i$ is the $i$th row of $X$, an $n \times (p+1)$ matrix of explanatory variables, and $\beta$ is a $(p+1) \times 1$ vector of regression coefficients. The parameters of the logistic regression model are estimated by the method of maximum likelihood. The MLE of $\beta$ is
$$\hat{\beta}_{MLE} = (X^T\hat{G}_nX)^{-1}X^T\hat{G}_n\hat{z},$$
where $\hat{G}_n = \mathrm{diag}\{\hat{\pi}_i(1 - \hat{\pi}_i)\}$ and $\hat{z}_i = \log\!\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) + \frac{y_i - \hat{\pi}_i}{\hat{\pi}_i(1 - \hat{\pi}_i)}$.
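These quantities are typically obtained by iteratively reweighted least squares (IRLS). The following is a minimal sketch in Python/NumPy, not the authors' code; the helper name logistic_mle_irls, the tolerance, the iteration cap and the probability clipping are our own choices.

```python
import numpy as np

def logistic_mle_irls(X, y, tol=1e-8, max_iter=100):
    """Logistic MLE via iteratively reweighted least squares (IRLS).

    Returns beta_hat, the diagonal weight matrix G_hat and the working
    response z_hat, matching the quantities defined above.
    """
    n, q = X.shape
    beta = np.zeros(q)
    for _ in range(max_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))            # pi_i
        pi = np.clip(pi, 1e-10, 1 - 1e-10)         # guard against 0/1 weights
        w = pi * (1.0 - pi)                        # diagonal of G_hat
        z = eta + (y - pi) / w                     # working response z_hat
        XtG = X.T * w                              # X^T G_hat
        beta_new = np.linalg.solve(XtG @ X, XtG @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, np.diag(w), z
```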
Multicollinearity among the explanatory variables affects the MLE: the variances of the estimated regression parameters are inflated in its presence [11,12]. The RRE is an alternative to the MLE in linear and logistic regression models [4,5]. The logistic ridge estimator (LRE) is defined as
$$\hat{\beta}_{LRE} = (X^T\hat{G}_nX + kI_p)^{-1}X^T\hat{G}_nX\,\hat{\beta}_{MLE},$$
where $I_p$ is the identity matrix, $k$ ($k > 0$) is the ridge parameter and $\hat{G}_n$ is the estimate of $G$ evaluated at $\hat{\beta}_{MLE}$. The ridge parameter of [13] is defined as
$$k = \frac{(p+1)\sigma^2}{\sum_{j=1}^{p+1}\alpha_j^2},$$
while its logistic version [14] is
$$k = \frac{p+1}{\sum_{j=1}^{p}\alpha_j^2}.$$
The Liu estimator [6] is an alternative to the ridge estimator in the linear regression model, and the logistic Liu estimator (LLE) [7] is expressed as
$$\hat{\beta}_{LLE} = (X^T\hat{G}_nX + I_p)^{-1}(X^T\hat{G}_nX + dI_p)\hat{\beta}_{MLE},$$
where $d$ ($0 < d < 1$) is the Liu parameter. Following [15], we compute the Liu parameter $d$ as
$$d = \min\left(\frac{\alpha_j^2}{\frac{1}{\lambda_j} + \alpha_j^2}\right),$$
where $\min$ denotes the minimum over $j$, $\lambda_j$ is the $j$th eigenvalue of $X^T\hat{G}_nX$ and $\alpha = Q^T\hat{\beta}_{MLE}$, with $Q$ the matrix of eigenvectors of $X^T\hat{G}_nX$.
Liu [16] proposed a two-parameter estimator called the Liu-type estimator, and Inan and Erdogan [17] extended it to the logistic regression model. The logistic Liu-type estimator (LLTE) is
$$\hat{\beta}_{LLTE} = (X^T\hat{G}_nX + kI_p)^{-1}(X^T\hat{G}_nX - dI_p)\hat{\beta}_{MLE},$$
where $k$ ($k > 0$) and $d$ ($-\infty < d < \infty$) are the biasing parameters of the LLTE.
Özkale and Kaciranlar [15] developed the two-parameter estimator (TPE) to mitigate multicollinearity in the LRM, and Huang [18] developed its logistic version (LTPE), defined as
$$\hat{\beta}_{LTPE} = (X^T\hat{G}_nX + kI_p)^{-1}(X^T\hat{G}_nX + kdI_p)\hat{\beta}_{MLE},$$
where $k$ ($k > 0$) and $d$ ($-\infty < d < \infty$) are the biasing parameters, computed from the ridge and Liu formulas given above.
Recently, the K-L estimator (KLE) [10] has shown better performance than the OLS estimator, the RRE and the Liu estimator for parameter estimation in the LRM. The KLE is defined as
$$\hat{\beta}_{KLE} = (X^TX + kI_p)^{-1}(X^TX - kI_p)\hat{\beta}_{OLS},$$
where $k$ ($k > 0$) is the KLE biasing parameter, which, as will be discussed in Section 3.6, is obtained by minimizing the mean squared error (MSE). In this study, we propose the logistic K-L estimator (LKLE) as
$$\hat{\beta}_{LKLE} = (X^T\hat{G}_nX + kI_p)^{-1}(X^T\hat{G}_nX - kI_p)\hat{\beta}_{MLE}.$$
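As a minimal illustration of the estimators defined above (the helper names are ours, not the authors'; the biasing parameters $k$ and $d$ must be supplied, e.g., from the rules in Section 3.6), the LKLE, LRE and LLE can be formed directly from the fitted MLE quantities:

```python
import numpy as np

def logistic_kl(X, G_hat, beta_mle, k):
    """Logistic K-L estimator: (X'GX + kI)^{-1}(X'GX - kI) beta_MLE."""
    S = X.T @ G_hat @ X
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - k * I) @ beta_mle)

def logistic_ridge(X, G_hat, beta_mle, k):
    """Logistic ridge estimator: (X'GX + kI)^{-1} X'GX beta_MLE."""
    S = X.T @ G_hat @ X
    return np.linalg.solve(S + k * np.eye(S.shape[0]), S @ beta_mle)

def logistic_liu(X, G_hat, beta_mle, d):
    """Logistic Liu estimator: (X'GX + I)^{-1}(X'GX + dI) beta_MLE."""
    S = X.T @ G_hat @ X
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + I, (S + d * I) @ beta_mle)
```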
The bias and the matrix mean squared error (MMSE) of the LKLE are obtained as follows. The bias of the LKLE is
$$B(\hat{\beta}_{LKLE}) = -2kQ\Lambda_k\alpha,$$
where $\Lambda_k = (\Lambda + kI_p)^{-1}$. The covariance matrix of the LKLE is
$$\mathrm{Cov}(\hat{\beta}_{LKLE}) = Q(\Lambda - kI_p)\Lambda_k\Lambda^{-1}\Lambda_k(\Lambda - kI_p)Q^T.$$
Therefore, the MMSE and the scalar mean squared error (MSE) are, respectively, defined by
$$\mathrm{MMSE}(\hat{\beta}_{LKLE}) = Q(\Lambda - kI_p)\Lambda_k\Lambda^{-1}\Lambda_k(\Lambda - kI_p)Q^T + 4k^2Q\Lambda_k\alpha\alpha^T\Lambda_kQ^T$$
and
$$\mathrm{MSE}(\hat{\beta}_{LKLE}) = \sum_{j=1}^{p}\frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2} + 4k^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + k)^2}.$$
The MMSE and MSE of the MLE, LRE, LLE, LLTE and LTPE are given, respectively, as follows:
$$\mathrm{MMSE}(\hat{\beta}_{MLE}) = Q\Lambda^{-1}Q^T, \qquad \mathrm{MSE}(\hat{\beta}_{MLE}) = \sum_{j=1}^{p}\frac{1}{\lambda_j},$$
$$\mathrm{MMSE}(\hat{\beta}_{LRE}) = Q\Lambda_k\Lambda\Lambda_kQ^T + k^2Q\Lambda_k\alpha\alpha^T\Lambda_kQ^T, \qquad \mathrm{MSE}(\hat{\beta}_{LRE}) = \sum_{j=1}^{p}\frac{\lambda_j}{(\lambda_j + k)^2} + k^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + k)^2},$$
$$\mathrm{MMSE}(\hat{\beta}_{LLE}) = Q\Lambda_d\Lambda^{-1}\Lambda_d^TQ^T + Q(\Lambda_d - I_p)\alpha\alpha^T(\Lambda_d - I_p)^TQ^T,$$
where $\Lambda_d = (\Lambda + I_p)^{-1}(\Lambda + dI_p)$,
$$\mathrm{MSE}(\hat{\beta}_{LLE}) = \sum_{j=1}^{p}\left[\frac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} + \frac{(d - 1)^2\alpha_j^2}{(\lambda_j + 1)^2}\right],$$
$$\mathrm{MMSE}(\hat{\beta}_{LLTE}) = Q\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^TQ^T + Q(d + k)^2\Lambda_k\alpha\alpha^T\Lambda_kQ^T,$$
where $\Lambda_{kd} = (\Lambda + kI_p)^{-1}(\Lambda - dI_p)$,
$$\mathrm{MSE}(\hat{\beta}_{LLTE}) = \sum_{j=1}^{p}\left[\frac{(\lambda_j - d)^2}{\lambda_j(\lambda_j + k)^2} + \frac{(d + k)^2\alpha_j^2}{(\lambda_j + k)^2}\right],$$
$$\mathrm{MMSE}(\hat{\beta}_{LTPE}) = Q\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^TQ^T + Qk^2(1 - d)^2\Lambda_k\alpha\alpha^T\Lambda_kQ^T,$$
where, for the LTPE, $\Lambda_{kd} = (\Lambda + kI_p)^{-1}(\Lambda + kdI_p)$, and
$$\mathrm{MSE}(\hat{\beta}_{LTPE}) = \sum_{j=1}^{p}\left[\frac{(\lambda_j + kd)^2}{\lambda_j(\lambda_j + k)^2} + \frac{k^2(1 - d)^2\alpha_j^2}{(\lambda_j + k)^2}\right].$$
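For a quick numerical comparison, the scalar MSE expressions above can be evaluated directly once the eigenvalues $\lambda_j$, the canonical coefficients $\alpha_j$ and the biasing parameters are available. A small sketch (the function names and the illustrative input values are ours, not taken from the paper's data):

```python
import numpy as np

def mse_mle(lam):
    return np.sum(1.0 / lam)

def mse_lre(lam, alpha, k):
    return np.sum(lam / (lam + k) ** 2) + k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)

def mse_lkle(lam, alpha, k):
    return (np.sum((lam - k) ** 2 / (lam * (lam + k) ** 2))
            + 4 * k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2))

# Arbitrary illustrative values, including one near-zero eigenvalue:
lam = np.array([9.3, 3.8, 3.1, 2.3, 0.03])
alpha = np.array([0.4, -0.2, 0.1, 0.3, -0.1])
print(mse_mle(lam), mse_lre(lam, alpha, 0.5), mse_lkle(lam, alpha, 0.5))
```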
The following lemmas are needed to prove the statistical properties of the proposed estimator.
Lemma 1.
Let $M$ be a positive definite matrix, that is, $M > 0$, and let $\alpha$ be a vector. Then $M - \alpha\alpha^T \geq 0$ if and only if $\alpha^TM^{-1}\alpha \leq 1$ [19].
Lemma 2.
Let $\hat{\beta}_j = A_jy$, $j = 1, 2$, be two linear estimators of $\beta$ [20]. Suppose that $D = \mathrm{Cov}(\hat{\beta}_1) - \mathrm{Cov}(\hat{\beta}_2) > 0$, where $\mathrm{Cov}(\hat{\beta}_j)$, $j = 1, 2$, denotes the covariance matrix of $\hat{\beta}_j$ and $b_j = \mathrm{Bias}(\hat{\beta}_j) = (A_jX - I)\beta$, $j = 1, 2$. Consequently,
$$\Delta(\hat{\beta}_1, \hat{\beta}_2) = \mathrm{MMSE}(\hat{\beta}_1) - \mathrm{MMSE}(\hat{\beta}_2) = \sigma^2D + b_1b_1^T - b_2b_2^T > 0$$
if and only if $b_2^T(\sigma^2D + b_1b_1^T)^{-1}b_2 < 1$, where $\mathrm{MMSE}(\hat{\beta}_j) = \mathrm{Cov}(\hat{\beta}_j) + b_jb_j^T$.

3. Comparison among the Estimators

In this section, we will perform a theoretical comparison of the proposed estimator with the available estimators in terms of MMSEs.

3.1. Comparison between $\hat{\beta}_{MLE}$ and $\hat{\beta}_{LKLE}$

Theorem 1.
If $k > 0$, the estimator $\hat{\beta}_{LKLE}$ is preferable to the estimator $\hat{\beta}_{MLE}$ in the MMSE sense if and only if $b^T[\Lambda^{-1} - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)]^{-1}b < 1$, where $b = -2k\Lambda_k\alpha$.
Proof. 
$$\mathrm{MMSE}(\hat{\beta}_{MLE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) = \Lambda^{-1} - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) - 4k^2\Lambda_k\alpha\alpha^T\Lambda_k.$$
The covariance difference
$$\mathrm{Cov}(\hat{\beta}_{MLE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = \Lambda^{-1} - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)$$
can be written in scalar form as
$$\mathrm{Cov}(\hat{\beta}_{MLE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = Q\,\mathrm{diag}\!\left\{\frac{1}{\lambda_j} - \frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T = Q\,\mathrm{diag}\!\left\{\frac{\big(\lambda_j + k - (\lambda_j - k)\big)\big(\lambda_j + k + (\lambda_j - k)\big)}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T,$$
which is positive definite since $4k\lambda_j > 0$. Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{MLE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) > 0$ if and only if
$$4k^2\alpha^T\Lambda_k\left[\Lambda^{-1} - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)\right]^{-1}\Lambda_k\alpha < 1,$$
which simplifies to $4k^2\alpha^T\Lambda\left[(\Lambda + kI_p)^2 - (\Lambda - kI_p)^2\right]^{-1}\alpha < 1$. This condition is verified for the real dataset in Section 5. □

3.2. Comparison between $\hat{\beta}_{LRE}$ and $\hat{\beta}_{LKLE}$

Theorem 2.
If $k > 0$, the estimator $\hat{\beta}_{LKLE}$ is preferable to the estimator $\hat{\beta}_{LRE}$ in the MMSE sense if and only if $b^T[\Lambda_k\Lambda\Lambda_k - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_1b_1^T]^{-1}b < 1$, where $b = -2k\Lambda_k\alpha$ and $b_1 = -k\Lambda_k\alpha$.
Proof. 
$$\mathrm{MMSE}(\hat{\beta}_{LRE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) = \Lambda_k\Lambda\Lambda_k - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_1b_1^T - bb^T.$$
The covariance difference
$$\mathrm{Cov}(\hat{\beta}_{LRE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = \Lambda_k\Lambda\Lambda_k - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)$$
can be written in scalar form as
$$\mathrm{Cov}(\hat{\beta}_{LRE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = Q\,\mathrm{diag}\!\left\{\frac{\lambda_j}{(\lambda_j + k)^2} - \frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T = Q\,\mathrm{diag}\!\left\{\frac{\lambda_j^2 - (\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T,$$
which is positive definite since $\frac{\lambda_j^2}{(\lambda_j + k)^2} > \frac{(\lambda_j - k)^2}{(\lambda_j + k)^2}$ for $k > 0$. The bias difference is
$$B(\hat{\beta}_{LRE}) - B(\hat{\beta}_{LKLE}) = -k(\Lambda + kI_p)^{-1}\alpha + 2k(\Lambda + kI_p)^{-1}\alpha = k(\Lambda + kI_p)^{-1}\alpha > 0.$$
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{LRE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) > 0$ if and only if
$$b^T\left[\Lambda_k\Lambda\Lambda_k - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_1b_1^T\right]^{-1}b < 1,$$
which simplifies to $4k^2\alpha^T\left[\Lambda + k^2\alpha\alpha^T - \Lambda^{-1}(\Lambda - kI_p)^2\right]^{-1}\alpha < 1$. This condition is verified for the real dataset in Section 5. □

3.3. Comparison between $\hat{\beta}_{LLE}$ and $\hat{\beta}_{LKLE}$

Theorem 3.
If $k > 0$ and $0 < d < 1$, the estimator $\hat{\beta}_{LKLE}$ is preferable to the estimator $\hat{\beta}_{LLE}$ in the MMSE sense if and only if $b^T[\Lambda_d\Lambda^{-1}\Lambda_d^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$, where $b = -2k\Lambda_k\alpha$ and $b_2 = -(1 - d)(\Lambda + I_p)^{-1}\alpha$.
Proof. 
$$\mathrm{MMSE}(\hat{\beta}_{LLE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) = \Lambda_d\Lambda^{-1}\Lambda_d^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T - bb^T.$$
The covariance difference
$$\mathrm{Cov}(\hat{\beta}_{LLE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = \Lambda_d\Lambda^{-1}\Lambda_d^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)$$
can be written in scalar form as
$$\mathrm{Cov}(\hat{\beta}_{LLE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j + d)^2}{\lambda_j(\lambda_j + 1)^2} - \frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j + d)^2(\lambda_j + k)^2 - (\lambda_j - k)^2(\lambda_j + 1)^2}{\lambda_j(\lambda_j + k)^2(\lambda_j + 1)^2}\right\}_{j=1}^{p}Q^T,$$
which is positive definite since $(\lambda_j + k)^2(\lambda_j + d)^2 - (\lambda_j - k)^2(\lambda_j + 1)^2 > 0$ for $k > \frac{\lambda_j(1 - d)}{d + \lambda_j + 1}$ and $d > \frac{\lambda_j(1 - k)}{k\lambda_j + k}$. The bias difference is
$$B(\hat{\beta}_{LLE}) - B(\hat{\beta}_{LKLE}) = -(1 - d)(\Lambda + I_p)^{-1}\alpha + 2k(\Lambda + kI_p)^{-1}\alpha = \mathrm{diag}\!\left\{\frac{k + d\lambda_j + kd + 2k\lambda_j - \lambda_j}{(\lambda_j + 1)(\lambda_j + k)}\right\}_{j=1}^{p}\alpha > 0.$$
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{LLE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) > 0$ if and only if
$$b^T\left[\Lambda_d\Lambda^{-1}\Lambda_d^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T\right]^{-1}b < 1.$$
This condition is verified for the real dataset in Section 5. □

3.4. Comparison between $\hat{\beta}_{LLTE}$ and $\hat{\beta}_{LKLE}$

Theorem 4.
If $k > 0$ and $-\infty < d < \infty$, the estimator $\hat{\beta}_{LKLE}$ is preferable to the estimator $\hat{\beta}_{LLTE}$ in the MMSE sense if and only if $b^T[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$, where $\Lambda_{kd} = (\Lambda + kI_p)^{-1}(\Lambda - dI_p)$, $b = -2k\Lambda_k\alpha$ and $b_2 = -(d + k)(\Lambda + kI_p)^{-1}\alpha$.
Proof. 
$$\mathrm{MMSE}(\hat{\beta}_{LLTE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) = \Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T - bb^T.$$
The covariance difference
$$\mathrm{Cov}(\hat{\beta}_{LLTE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = \Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)$$
can be written in scalar form as
$$\mathrm{Cov}(\hat{\beta}_{LLTE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j - d)^2}{\lambda_j(\lambda_j + k)^2} - \frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j - d)^2 - (\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T,$$
which is non-negative since $(\lambda_j - d)^2 > (\lambda_j - k)^2$. The bias difference is
$$B(\hat{\beta}_{LLTE}) - B(\hat{\beta}_{LKLE}) = -(d + k)(\Lambda + kI_p)^{-1}\alpha + 2k(\Lambda + kI_p)^{-1}\alpha = (k - d)(\Lambda + kI_p)^{-1}\alpha > 0 \text{ for } k > d.$$
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{LLTE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) > 0$ if and only if
$$b^T\left[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T\right]^{-1}b < 1.$$
This condition is verified for the real dataset in Section 5. □

3.5. Comparison between $\hat{\beta}_{LTPE}$ and $\hat{\beta}_{LKLE}$

Theorem 5.
If $k > 0$ and $-\infty < d < \infty$, the estimator $\hat{\beta}_{LKLE}$ is preferable to the estimator $\hat{\beta}_{LTPE}$ in the MMSE sense if and only if $b^T[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$, where $\Lambda_{kd} = (\Lambda + kI_p)^{-1}(\Lambda + kdI_p)$, $b = -2k\Lambda_k\alpha$ and $b_2 = -k(1 - d)(\Lambda + kI_p)^{-1}\alpha$.
Proof. 
$$\mathrm{MMSE}(\hat{\beta}_{LTPE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) = \Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T - bb^T.$$
The covariance difference
$$\mathrm{Cov}(\hat{\beta}_{LTPE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = \Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)$$
can be written in scalar form as
$$\mathrm{Cov}(\hat{\beta}_{LTPE}) - \mathrm{Cov}(\hat{\beta}_{LKLE}) = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j + kd)^2}{\lambda_j(\lambda_j + k)^2} - \frac{(\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T = Q\,\mathrm{diag}\!\left\{\frac{(\lambda_j + kd)^2 - (\lambda_j - k)^2}{\lambda_j(\lambda_j + k)^2}\right\}_{j=1}^{p}Q^T,$$
which is non-negative since $(\lambda_j + kd)^2 > (\lambda_j - k)^2$. The bias difference is
$$B(\hat{\beta}_{LTPE}) - B(\hat{\beta}_{LKLE}) = -k(1 - d)(\Lambda + kI_p)^{-1}\alpha + 2k(\Lambda + kI_p)^{-1}\alpha = k(1 + d)(\Lambda + kI_p)^{-1}\alpha > 0.$$
Hence, using Lemma 2, $\mathrm{MMSE}(\hat{\beta}_{LTPE}) - \mathrm{MMSE}(\hat{\beta}_{LKLE}) > 0$ if and only if
$$b^T\left[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T\right]^{-1}b < 1.$$
This condition is verified for the real dataset in Section 5. □
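In practice, the conditions in Theorems 1 to 5 can be checked numerically once $\lambda$, $\alpha$ and the biasing parameters have been estimated from the data, as is done for the cancer data in Section 5. As an illustration, the quantity appearing in Theorem 1 can be evaluated as follows (a sketch under the notation above; the function name is ours):

```python
import numpy as np

def theorem1_condition(lam, alpha, k):
    """Evaluate b'[Cov(MLE) - Cov(LKLE)]^{-1} b for Theorem 1.

    The LKLE is preferable to the MLE in the MMSE sense when the
    returned value is less than 1.
    """
    p = len(lam)
    Lam = np.diag(lam)
    Lam_k = np.linalg.inv(Lam + k * np.eye(p))        # (Lambda + kI)^{-1}
    A = Lam - k * np.eye(p)                           # (Lambda - kI)
    cov_diff = np.linalg.inv(Lam) - Lam_k @ A @ np.linalg.inv(Lam) @ Lam_k @ A
    b = -2.0 * k * (Lam_k @ alpha)                    # bias of the LKLE
    return float(b @ np.linalg.solve(cov_diff, b))
```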

3.6. Selection of k

Since the shrinkage parameter plays a significant role in the performance of biased estimators such as the LRE, LLE and LKLE, several researchers have introduced shrinkage parameter estimation methods for the different regression models [21,22,23,24,25,26,27,28,29]. Based on these studies, we propose some shrinkage estimators of the parameter k for the LKLE.
To estimate the parameter k, following [4], we consider the generalized version of the K-L estimator, given as
$$\hat{\beta}_{LKLE} = (X^T\hat{G}_nX + K)^{-1}(X^T\hat{G}_nX - K)\hat{\beta}_{MLE},$$
where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$.
The MSE of this generalized estimator is
$$\mathrm{MSE}(\hat{\beta}_{LKLE}) = \sum_{j=1}^{p}\frac{(\lambda_j - k_j)^2}{\lambda_j(\lambda_j + k_j)^2} + 4\sum_{j=1}^{p}\frac{\alpha_j^2k_j^2}{(\lambda_j + k_j)^2}.$$
Differentiating this expression with respect to $k_j$ (all terms not involving $k_j$ vanish) and equating to zero, we have
$$-\frac{2(\lambda_j - k_j)}{\lambda_j(\lambda_j + k_j)^2} - \frac{2(\lambda_j - k_j)^2}{\lambda_j(\lambda_j + k_j)^3} + \frac{8k_j\alpha_j^2}{(\lambda_j + k_j)^2} - \frac{8\alpha_j^2k_j^2}{(\lambda_j + k_j)^3} = 0,$$
$$\frac{-2(\lambda_j - k_j)(\lambda_j + k_j) - 2(\lambda_j - k_j)^2 + 8\lambda_jk_j\alpha_j^2(\lambda_j + k_j) - 8\lambda_j\alpha_j^2k_j^2}{\lambda_j(\lambda_j + k_j)^3} = 0,$$
$$-2(\lambda_j - k_j)(\lambda_j + k_j) - 2(\lambda_j - k_j)^2 + 8\lambda_jk_j\alpha_j^2(\lambda_j + k_j) - 8\lambda_j\alpha_j^2k_j^2 = 0,$$
$$(\lambda_j - k_j)(\lambda_j + k_j) + (\lambda_j - k_j)^2 - 4\lambda_jk_j\alpha_j^2(\lambda_j + k_j) + 4\lambda_j\alpha_j^2k_j^2 = 0.$$
Simplifying further, we obtain
$$2\lambda_j^2 - 2\lambda_jk_j - 4k_j\lambda_j^2\alpha_j^2 = 0.$$
Dividing both sides by $2\lambda_j$ and solving for $k_j$, we obtain
$$k_j = \frac{\lambda_j}{1 + 2\lambda_j\alpha_j^2}.$$
Replacing $\alpha_j$ and $\lambda_j$ with their unbiased estimates gives
$$\hat{k}_j = \frac{\hat{\lambda}_j}{1 + 2\hat{\lambda}_j\hat{\alpha}_j^2}, \qquad j = 1, 2, \ldots, p.$$
Following Hoerl et al. [13], and based on the studies of Mansson et al. [7], Lukman and Ayinde [3] and Qasim et al. [22,30], we suggest the following biasing parameter estimators for the logistic regression model (a computational sketch is given after this list):
  • LKLE 1: $k = \min\left(\frac{1}{\hat{\alpha}_j^2}\right)$
  • LKLE 2: $k = \min(\hat{k}_j)$
  • LRE 1: $k = \min\left(\frac{1}{\hat{\alpha}_j^2}\right)$
  • LRE 2: $k = \frac{p}{\sum_{j=1}^{p}\hat{\alpha}_j^2}$
  • LLE: $d = \min\left(\frac{\hat{\alpha}_j^2}{\frac{1}{\lambda_j} + \hat{\alpha}_j^2}\right)$
  • LLTE: $k = \frac{p}{\sum_{j=1}^{p}\hat{\alpha}_j^2}$, $d = \min\left(\frac{\hat{\alpha}_j^2}{\frac{1}{\lambda_j} + \hat{\alpha}_j^2}\right)$
  • LTPE: $k = \frac{p}{\sum_{j=1}^{p}\hat{\alpha}_j^2}$, $d = \min\left(\frac{\hat{\alpha}_j^2}{\frac{1}{\lambda_j} + \hat{\alpha}_j^2}\right)$
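A minimal computational sketch of these rules, assuming the eigenvalues $\lambda_j$ of $X^T\hat{G}_nX$ and the canonical coefficients $\hat{\alpha}_j = (Q^T\hat{\beta}_{MLE})_j$ have already been obtained as described above (the function name is ours):

```python
import numpy as np

def shrinkage_parameters(lam, alpha_hat):
    """Biasing parameters listed above, from eigenvalues lam and
    canonical coefficients alpha_hat = Q' beta_MLE."""
    p = len(lam)
    k_lkle1 = np.min(1.0 / alpha_hat ** 2)                        # LKLE 1 and LRE 1
    k_hat_j = lam / (1.0 + 2.0 * lam * alpha_hat ** 2)            # optimal k_j
    k_lkle2 = np.min(k_hat_j)                                     # LKLE 2
    k_lre2 = p / np.sum(alpha_hat ** 2)                           # LRE 2, LLTE, LTPE
    d_liu = np.min(alpha_hat ** 2 / (1.0 / lam + alpha_hat ** 2))  # LLE, LLTE, LTPE
    return {"LKLE1": k_lkle1, "LKLE2": k_lkle2, "LRE2": k_lre2, "d": d_liu}
```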

4. Monte Carlo Simulation

In this section, we compare the performance of the logistic regression estimators using a simulation study. A significant number of simulation studies have been conducted to compare the performance of estimators for both linear and logistic regression models [24,25,26,27,28,29,30,31,32,33,34,35]. Since the MSE is a function of $\beta$, $\sigma^2$ and $p$, the true coefficient vector is chosen subject to the constraint $\beta^T\beta = 1$, for which the MSE is minimized [36,37]. Schaefer [14] showed that the simulation design for the logistic regression model can follow an approach similar to that used for the linear regression model. The correlated explanatory variables are generated using the procedure given in [38,39]:
$$x_{ij} = (1 - \rho^2)^{1/2}w_{ij} + \rho w_{i(j+1)}, \qquad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,$$
where the $w_{ij}$ are independent standard normal pseudo-random numbers and $\rho$ is the correlation between the explanatory variables. The values of $\rho$ are chosen as 0.9, 0.95, 0.99 and 0.999. The response variable is generated from the Bernoulli distribution, i.e., $y_i \sim Be(\pi_i)$, where $\pi_i = \frac{e^{x_i^T\beta}}{1 + e^{x_i^T\beta}}$. The sample size $n$ is varied over 50, 100, 250 and 300. The experiment is replicated 2000 times, and the estimated MSE is calculated as
$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{2000}\sum_{i=1}^{2000}(\hat{\beta}_i - \beta)^T(\hat{\beta}_i - \beta),$$
where $\hat{\beta}_i$ denotes the vector of estimated regression coefficients in the $i$th replication and $\beta$ is the vector of true parameter values. The estimated MSEs and biases of the estimators are reported for p = 3 in Tables 1 and 2 (for n = 50, 100 and n = 250, 300, respectively) and for p = 7 in Tables 3 and 4. The following observations were obtained from the simulation results. Increasing the sample size decreased the MSE values in every case, whereas the MSE values increased with the degree of correlation and the number of explanatory variables. The LKLE performed best at most levels of multicollinearity, sample sizes and numbers of explanatory variables, with few exceptions, and the LTPE competed favorably in most cases. Comparing the shrinkage parameters within the LKLE, LKLE 1 performed well except in a few cases. The MLE performed worst when multicollinearity was present in the data. Of the two-parameter estimators (LTPE and LLTE), the LTPE performed better. Additionally, the bias of the proposed estimator was the lowest in most cases. Generally, the LKLE is preferred over the two-parameter estimators.
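A compact sketch of this simulation design is given below. It reuses the hypothetical helpers logistic_mle_irls and logistic_kl sketched earlier; the fixed value k = 0.5, the choice of a constant coefficient vector satisfying $\beta^T\beta = 1$ and the reduced number of replications are arbitrary illustrative choices rather than the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho, k, reps = 100, 3, 0.95, 0.5, 200

beta = np.ones(p) / np.sqrt(p)                   # satisfies beta'beta = 1
sse_mle, sse_kl = 0.0, 0.0
for _ in range(reps):
    W = rng.standard_normal((n, p + 1))
    # x_ij = sqrt(1 - rho^2) w_ij + rho w_i(j+1): correlated regressors
    X = np.sqrt(1 - rho ** 2) * W[:, :p] + rho * W[:, 1:p + 1]
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    y = rng.binomial(1, pi)
    b_mle, G_hat, _ = logistic_mle_irls(X, y)    # sketched in Section 2
    b_kl = logistic_kl(X, G_hat, b_mle, k)       # sketched in Section 2
    sse_mle += np.sum((b_mle - beta) ** 2)
    sse_kl += np.sum((b_kl - beta) ** 2)

print("MSE(MLE) =", sse_mle / reps, " MSE(LKLE) =", sse_kl / reps)
```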

5. Application: Cancer Data

The performance of the LKLE and the other estimators was evaluated using a cancer remission dataset [34,40]. In this dataset, the binary response variable $y_i$ equals 1 if the patient experienced complete cancer remission and 0 otherwise. There are five explanatory variables: cell index (x1), smear index (x2), infil index (x3), blast index (x4) and temperature (x5). There were 27 patients, of which nine experienced complete remission. The eigenvalues of the $X^T\hat{G}_nX$ matrix were found to be $\lambda_1 = 9.2979$, $\lambda_2 = 3.8070$, $\lambda_3 = 3.0692$, $\lambda_4 = 2.2713$ and $\lambda_5 = 0.0314$. To assess the multicollinearity among the explanatory variables, we use the condition index (CI), computed as $CI = \sqrt{\max(\lambda_j)/\min(\lambda_j)} = 17.2$. There is moderate collinearity when the CI lies between 10 and 30 and severe multicollinearity when it exceeds 30 [41]; thus, the result provides evidence of moderate multicollinearity among the explanatory variables. Next, we compared the performance of the estimators using this dataset. The estimated regression coefficients and the corresponding scalar MSE values are given in Table 6, where the scalar MSE of each estimator was obtained from the corresponding MSE expression given in Section 2. The proposed LKLE surpassed the other estimators in this study in terms of MSE.
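As a quick check, the reported CI can be reproduced from the eigenvalues listed above:

```python
import numpy as np

lam = np.array([9.2979, 3.8070, 3.0692, 2.2713, 0.0314])
ci = np.sqrt(lam.max() / lam.min())
print(round(ci, 1))  # 17.2
```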
Moreover, we also evaluated the theoretical conditions stated in Theorems 1 to 5 on this dataset. The validation results are given in Table 5. As shown, all the theorem conditions hold for the cancer data, since each of the quantities appearing in the theorem inequalities is less than one, as expected.
The logistic ridge estimator competed favorably in both the simulation and the real-life application, and the real-life results agreed with the simulation study. However, the performance of the estimators in both settings was a function of the biasing parameter: for instance, LKLE 1 performed best in the simulation study, while in the real-life analysis, LKLE 2 outperformed LKLE 1. Among the two-parameter estimators, the logistic two-parameter estimator (LTPE) performed best; among the one-parameter estimators, the LKLE outperformed the ridge and Liu estimators. Generally, the LKLE dominated both the one- and two-parameter estimators. The performance of these estimators is a function of the biasing parameters k and d. Additionally, as shown in Table 6, the estimates of $\hat{\beta}_2$ and $\hat{\beta}_3$ were not well behaved for the following estimators: MLE, LLE, LLTE, LTPE and LKLE 1.

6. Some Concluding Remarks

Kibria and Lukman [10] developed the K-L estimator to circumvent the multicollinearity problem in the linear regression model. In this paper, we proposed the logistic Kibria-Lukman estimator (LKLE) to address the challenge of multicollinearity in the logistic regression model. We theoretically determined the conditions under which the LKLE is superior to the other existing estimators in terms of the MSE. The performance of the estimators was evaluated using a Monte Carlo simulation study in which factors such as the degree of correlation, the sample size and the number of explanatory variables were varied; the results showed that the performance of the estimators was highly dependent on these factors. Finally, to illustrate the efficiency of the proposed estimator, we applied it to a cancer dataset and observed that the results agreed with those of the simulation study to some extent. The findings of this study will be helpful for practitioners and applied researchers who use a logistic regression model with correlated explanatory variables.

Author Contributions

A.F.L.: Conceptualization, Methodology, Formal analysis, Software, Writing—original draft. B.M.G.K.: Conceptualization, Supervision, Review. R.F.: Writing—original draft, Resources, Review. M.A.: Methodology, Supervision, Writing—original draft. E.T.A.: Methodology, Formal analysis, Software, Writing—original draft. C.K.N.: Writing—original draft, Review. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no funding for this work.

Data Availability Statement

The data used in this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Frisch, R. Statistical Confluence Analysis by Means of Complete Regression Systems; University Institute of Economics: Oslo, Norway, 1934. [Google Scholar]
  2. Kibria, B.M.G.; Mansson, K.; Shukur, G. Performance of some logistic ridge regression estimators. Comp. Econ. 2012, 40, 401–414. [Google Scholar] [CrossRef]
  3. Lukman, A.F.; Ayinde, K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017, 46, 953–967. [Google Scholar] [CrossRef]
  4. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  5. Schaefer, R.L.; Roi, L.D.; Wolfe, R.A. A ridge logistic estimator. Commun. Stat. Theory Methods 1984, 13, 99–113. [Google Scholar] [CrossRef]
  6. Liu, K. A new class of biased estimate in linear regression. Commun. Stat. 1993, 22, 393–402. [Google Scholar]
  7. Mansson, K.; Kibria, B.M.G.; Shukur, G. On Liu estimators for the logit regression model. Econ. Model. 2012, 29, 1483–1488. [Google Scholar] [CrossRef] [Green Version]
  8. Lukman, A.F.; Ayinde, K.; Binuomote, S.; Onate, A.C. Modified ridge-type estimator to combat multicollinearity: Application to chemical data. J. Chemomet. 2019, 33, e3125. [Google Scholar] [CrossRef]
  9. Lukman, A.F.; Adewuyi, E.; Onate, A.C.; Ayinde, K. A Modified Ridge-Type Logistic Estimator. Iran. J. Sci. Technol. Trans. A Sci. 2020, 44, 437–443. [Google Scholar] [CrossRef]
  10. Kibria, B.M.G.; Lukman, A.F. A new ridge type estimator for the linear regression model: Simulations and applications. Scientifica 2020, 2020, 9758378. [Google Scholar] [CrossRef]
  11. Lukman, A.F.; Ayinde, K.; Aladeitan, B.; Bamidele, R. An unbiased estimator with prior information. Arab. J. Basic Appl. Sci. 2020, 27, 45–55. [Google Scholar] [CrossRef]
  12. Dawoud, I.; Lukman, A.F.; Haadi, A. A new biased regression estimator: Theory, simulation and application. Sci. Afr. 2022, 15, e01100. [Google Scholar] [CrossRef]
  13. Hoerl, A.E.; Kennard, R.W.; Baldwin, K.F. Ridge regression: Some simulation. Commun. Stat. Theory Methods 1975, 4, 105–123. [Google Scholar] [CrossRef]
  14. Schaefer, R.L. Alternative estimators in logistic regression when the data is collinear. J. Stat. Comput. Simul. 1986, 25, 75–91. [Google Scholar] [CrossRef]
  15. Özkale, M.R.; Kaciranlar, S. The restricted and unrestricted two-parameter estimators. Commun. Statist. Theor. Meth 2007, 36, 2707–2725. [Google Scholar] [CrossRef]
  16. Liu, K. Using Liu-type estimator to combat collinearity. Commun. Stat.-Theory Methods 2003, 32, 1009–2003. [Google Scholar] [CrossRef]
  17. Inan, D.; Erdogan, B.E. Liu-Type logistic estimator. Commun. Stat. Simul. Comput. 2013, 42, 1578–1586. [Google Scholar] [CrossRef]
  18. Huang, J. A Simulation Research on a Biased Estimator in Logistic Regression Model. In Computational Intelligence and Intelligent Systems. ISICA 2012. Communications in Computer and Information Science; Li, Z., Li, X., Liu, Y., Cai, Z., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 316. [Google Scholar] [CrossRef]
  19. Farebrother, R.W. Further results on the mean square error of ridge regression. J. R. Stat. Soc. Ser. B 1976, 38, 248–250. [Google Scholar] [CrossRef]
  20. Trenkler, G.; Toutenburg, H. Mean squared error matrix comparisons between biased estimators—An overview of recent results. Stat. Pap. 1990, 31, 165–179. [Google Scholar] [CrossRef]
  21. Kibria, B.M.G. Performance of some new ridge regression estimators. Commun. Stat. Simul. Comput. 2003, 32, 419–435. [Google Scholar] [CrossRef]
  22. Qasim, M.; Amin, M.; Ullah, M.A. On the performance of some new liu parameters for the gamma regression model. J. Stat. Comput Simul 2018, 88, 3065–3080. [Google Scholar] [CrossRef]
  23. Amin, M.; Akram, M.N.; Majid, A. On the estimation of Bell regression model using ridge estimator. Commun. Stat. Simul. Comput. 2021. [Google Scholar] [CrossRef]
  24. Lukman, A.F.; Zakariya, A.; Kibria, B.M.G.; Ayinde, K. The KL estimator for the inverse gaussian regression model. Concurrency Computat. Pract. Exper. 2021, 33, e6222. [Google Scholar] [CrossRef]
  25. Lukman, A.F.; Aladeitan, B.; Ayinde, K.; Abonazel, M.R. Modified ridge-type for the Poisson Regression Model: Simulation and Application. J. Appl. Stat. 2021, 49, 2124–2136. [Google Scholar] [CrossRef] [PubMed]
  26. Lukman, A.F.; Adewuyi, E.; Månsson, K.; Kibria, B.M.G. A new estimator for the multicollinear poisson regression model: Simulation and application. Sci. Rep. 2021, 11, 3732. [Google Scholar] [CrossRef]
  27. Amin, M.; Qasim, M.; Amanullah, M.; Afzal, S. Performance of some ridge estimators for the gamma regression model. Stat. Pap. 2020, 61, 997–1026. [Google Scholar] [CrossRef]
  28. Amin, M.; Qasim, M.; Afzal, S.; Naveed, M. New ridge estimators in the inverse Gaussian regression: Monte Carlo simulation and application to chemical data. Commun. Stat. Simul. Comput. 2020, 51, 6170–6187. [Google Scholar] [CrossRef]
  29. Naveed, M.; Amin, M.; Afzal, S.; Qasim, M. New shrinkage parameters for the inverse Gaussian liu regression. Commun. Stat. Theory Methods 2020, 51, 3216–3236. [Google Scholar] [CrossRef]
  30. Qasim, M.; Amin, M.; Omer, T. Performance of some new Liu parameters for the linear regression model. Commun. Stat. Theory Methods 2020, 49, 4178–4196. [Google Scholar] [CrossRef]
  31. Ayinde, K.; Lukman, A.F.; Samuel, O.O.; Ajiboye, S.A. Some new adjusted ridge estimators of linear regression model. Int. J. Civil Eng. Technol. 2018, 9, 2838–2852. [Google Scholar]
  32. Asar, Y.; Genç, A. Two-parameter ridge estimator in the binary logistic regression. Commun. Stat.-Simul. Comput. 2017, 46, 7088–7099. [Google Scholar] [CrossRef]
  33. Kibria, B.M.G.; Banik, S. Some ridge regression estimators and their performances. J. Mod. Appl. Stat. Methods 2016, 15, 206–238. [Google Scholar] [CrossRef]
  34. Özkale, M.R.; Arıcan, E. A new biased estimator in logistic regression model. Statistics 2016, 50, 233–253. [Google Scholar] [CrossRef]
  35. Varathan, N.; Wijekoon, P. Optimal generalized logistic estimator. Commun. Stat. Theory Methods 2018, 47, 463–474. [Google Scholar] [CrossRef]
  36. Saleh, A.K.Md.E.; Arashi, M.; Kibria, B.M.G. Theory of Ridge Regression Estimation with Applications; John Wiley: Hoboken, NJ, USA, 2019. [Google Scholar]
  37. Newhouse, J.P.; Oman, S.D. An Evaluation of Ridge Estimators; P-716-PR; Rand Corporation: Santa Monica, CA, USA, 1971; pp. 1–28. [Google Scholar]
  38. Gibbons, D.G. A simulation study of some ridge estimators. J. Am. Stat. Assoc. 1981, 76, 131–139. [Google Scholar] [CrossRef]
  39. McDonald, G.; Galarneau, D.I. A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 1975, 70, 407–416. [Google Scholar] [CrossRef]
  40. Lesaffre, E.; Marx, B.D. Collinearity in generalized linear regression. Commun. Stat. Theory Methods 1993, 22, 1933–1952. [Google Scholar] [CrossRef]
  41. Gujarati, D.N. Basic Econometrics; McGraw-Hill: New York, NY, USA, 1995. [Google Scholar]
Table 1. Estimated MSEs and bias of the estimators for p = 3 (n = 50 and 100).
| n | 50 | 50 | 50 | 50 | 100 | 100 | 100 | 100 |
| ρ | 0.9 | 0.95 | 0.99 | 0.999 | 0.9 | 0.95 | 0.99 | 0.999 |
| MLE MSE | 1.4837 | 2.5668 | 11.6012 | 112.132 | 0.8648 | 1.3979 | 5.6404 | 53.8 |
| LLE MSE | 1.0597 | 1.4893 | 5.0029 | 47.674 | 0.7492 | 1.0409 | 2.6449 | 21.6 |
| LLE BIAS | −0.7965 | −0.8000 | −0.8029 | −0.8034 | −0.8014 | −0.8030 | −0.8080 | −0.8126 |
| LRE 1 MSE | 0.9867 | 1.4492 | 5.4291 | 49.507 | 0.6677 | 0.9090 | 2.7611 | 23.9 |
| LRE 1 BIAS | −0.8349 | −0.8190 | −0.7934 | −0.7802 | −0.8323 | −0.8234 | −0.8098 | −0.8039 |
| LRE 2 MSE | 0.8332 | 1.1187 | 3.7215 | 32.719 | 0.6040 | 0.7538 | 1.9624 | 15.9 |
| LRE 2 BIAS | −0.8967 | −0.8651 | −0.8150 | −0.7906 | −0.8695 | −0.8517 | −0.8225 | −0.8089 |
| LLTE MSE | 0.6927 | 0.8063 | 7.3947 | 50.125 | 0.5983 | 0.6394 | 2.6679 | 48 |
| LLTE BIAS | −0.9207 | −0.9011 | −0.9448 | −1.9698 | −0.8751 | −0.8595 | −0.8447 | −1.0067 |
| LTPE MSE | 0.7238 | 0.8732 | 2.3002 | 18.585 | 0.5911 | 0.6384 | 1.3563 | 9.82 |
| LTPE BIAS | −0.9322 | −0.8974 | −0.8384 | −0.8075 | −0.8897 | −0.8697 | −0.8342 | −0.8164 |
| LKLE 1 MSE | 0.7458 | 0.8043 | 1.6510 | 11.563 | 0.5918 | 0.6303 | 1.0533 | 6.21 |
| LKLE 1 BIAS | −1.0436 | −0.9752 | −0.8734 | −0.8256 | −0.9502 | −0.9131 | −0.8536 | −0.8251 |
| LKLE 2 MSE | 0.9598 | 1.4017 | 5.2142 | 47.268 | 0.6582 | 0.8916 | 2.6610 | 22.7 |
| LKLE 2 BIAS | −0.8186 | −0.8032 | −0.7848 | −0.7790 | −0.8224 | −0.8136 | −0.8038 | −0.8029 |
Table 2. Estimated MSEs and bias of the estimators for p = 3 (n = 250 and 300).
| n | 250 | 250 | 250 | 250 | 300 | 300 | 300 | 300 |
| ρ | 0.9 | 0.95 | 0.99 | 0.999 | 0.9 | 0.95 | 0.99 | 0.999 |
| MLE MSE | 0.5112 | 0.7083 | 2.2958 | 20.4 | 0.4834 | 0.6532 | 1.9918 | 16.931 |
| LLE MSE | 0.4997 | 0.6530 | 1.4819 | 8.36 | 0.4757 | 0.6119 | 1.3344 | 6.708 |
| LLE BIAS | −0.8273 | −0.8256 | −0.8239 | −0.8271 | −0.8189 | −0.8180 | −0.8187 | −0.8198 |
| LRE 1 MSE | 0.4847 | 0.5844 | 1.2803 | 9.07 | 0.4659 | 0.5550 | 1.1523 | 7.558 |
| LRE 1 BIAS | −0.8497 | −0.8417 | −0.8290 | −0.8244 | −0.8389 | −0.8329 | −0.8247 | −0.8182 |
| LRE 2 MSE | 0.4825 | 0.5421 | 0.9858 | 6.14 | 0.4656 | 0.5192 | 0.9011 | 5.118 |
| LRE 2 BIAS | −0.8746 | −0.8612 | −0.8381 | −0.8275 | −0.8603 | −0.8501 | −0.8331 | −0.8207 |
| LLTE MSE | 0.4794 | 0.5239 | 0.6785 | 10.23 | 0.4634 | 0.5059 | 0.6580 | 7.865 |
| LLTE BIAS | −0.8766 | −0.8637 | −0.8438 | −0.8609 | −0.8618 | −0.8519 | −0.8370 | −0.8414 |
| LTPE MSE | 0.4785 | 0.5083 | 0.7690 | 3.98 | 0.4631 | 0.4894 | 0.7201 | 3.398 |
| LTPE BIAS | −0.8873 | −0.8723 | −0.8456 | −0.8314 | −0.8726 | −0.8608 | −0.8404 | −0.8241 |
| LKLE 1 MSE | 0.2344 | 0.3576 | 0.6250 | 2.81 | 0.3087 | 0.3285 | 0.5771 | 2.364 |
| LKLE 1 BIAS | −0.9275 | −0.9024 | −0.8594 | −0.8367 | −0.9053 | −0.8856 | −0.8515 | −0.8281 |
| LKLE 2 MSE | 0.4344 | 0.5027 | 1.2531 | 8.66 | 0.4625 | 0.5533 | 1.1303 | 7.200 |
| LKLE 2 BIAS | −0.8469 | −0.8385 | −0.8261 | −0.8237 | −0.8358 | −0.8291 | −0.8211 | −0.8170 |
Table 3. Estimated MSEs and bias of the estimators for p = 7 (n = 50 and 100).
n50505050100100100100
ρ0.90.950.990.9990.90.950.990.999
MLEMSE6.275411.525161.55852.58654.925123.922236
LLEMSE1.93362.358110.21171.55052.08024.72040.9
BIAS−1.4187−1.4699−1.5091−1.5071−1.3487−1.3548−1.4039−1.4231
LRE 1MSE3.50136.005631.82971.61692.888713.218128
BIAS−1.3122−1.3329−1.3042−1.3101−1.3431−1.3225−1.3280−1.3272
LRE 2MSE1.64202.678813.21220.93221.50266.27359.3
BIAS−1.4628−1.4578−1.4198−1.4181−1.4314−1.3944−1.3825−1.3761
LLTEMSE1.26821.840912004770.84721.216315.024188
BIAS−1.5043−1.5429−2.2742−10.5093−1.4396−1.4161−1.5587−3.0291
LTPEMSE1.34882.05368.96780.80761.23584.74743.2
BIAS−1.4979−1.4971−1.4701−1.4802−1.4483−1.4162−1.4102−1.4045
LKLE 1MSE0.71490.99883.43310.69390.70741.71014.5
BIAS−1.7501−1.7118−1.6656−1.6547−1.5857−1.5257−1.4894−1.4755
LKLE 2MSE3.36465.729630.12811.58562.823612.878125
BIAS−1.3135−1.3373−1.3132−1.3202−1.3416−1.3222−1.3293−1.3294
Table 4. Estimated MSEs and bias of the estimators for p = 7 (n = 250 and 300).
n250250250250300300300300
ρ0.90.950.990.9990.90.950.990.999
MLEMSE1.02841.76627.5872.90.88941.48416.227460.1
LLEMSE0.88941.30012.6413.50.79441.16112.43469.43
BIAS−1.4056−1.4033−1.4287−1.4415−1.3846−1.3865−1.3990−1.4195
LRE 1MSE0.76321.18344.4040.50.68151.01953.652333.6
BIAS−1.4242−1.4107−1.4121−1.4083−1.4047−1.3981−1.3904−1.3917
LRE 2MSE0.54740.73252.2319.20.50880.65721.891516.2
BIAS−1.4755−1.4483−1.4333−1.4245−1.4518−1.4331−1.4089−1.4063
LLTEMSE0.53920.70051.621150.50330.63411.463747
BIAS−1.4768−1.4502−1.4422−1.5435−1.4529−1.4353−1.4163−1.4908
LTPEMSE0.51170.65301.7814.20.48080.58921.536112.6
BIAS−1.4849−1.4559−1.4403−1.4332−1.4619−1.4433−1.4169−1.4139
LKLE 1MSE0.53440.55760.72502.810.50870.52850.67712.364
BIAS−1.5577−1.5105−1.4735−1.4582−1.5285−1.4925−1.4471−1.4375
LKLE 2MSE0.48440.58271.25318.660.46550.55331.13037.200
BIAS−1.4217−1.4089−1.4117−1.4085−1.4026−1.3968−1.3900−1.3919
Table 5. Validation of the theoretical conditions for the cancer data.
| Theorem | Condition | Value |
| 1 | $b^T[\Lambda^{-1} - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p)]^{-1}b < 1$ | 0.2413 |
| 2 | $b^T[\Lambda_k\Lambda\Lambda_k - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_1b_1^T]^{-1}b < 1$ | 0.8866 |
| 3 | $b^T[\Lambda_d\Lambda^{-1}\Lambda_d^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$ | 0.6443 |
| 4 | $b^T[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$ | 0.0958 |
| 5 | $b^T[\Lambda_{kd}\Lambda^{-1}\Lambda_{kd}^T - \Lambda_k(\Lambda - kI_p)\Lambda^{-1}\Lambda_k(\Lambda - kI_p) + b_2b_2^T]^{-1}b < 1$ | 0.7540 |
Table 6. Regression coefficients and MSEs of the logistic regression estimators for the cancer dataset *.
| Estimator | $\hat{\beta}_1$ | $\hat{\beta}_2$ | $\hat{\beta}_3$ | $\hat{\beta}_4$ | $\hat{\beta}_5$ | Estimated MSE |
| MLE | −0.1966 | −1.5957 | 1.8139 | 1.3073 | −0.4208 | 32.9393 |
|  | (0.328) | (0.513) | (0.571) | (0.664) | (5.639) |  |
| LLE | 0.1940 | −0.4751 | 0.5808 | 1.0259 | −0.3078 | 5.0989 |
|  | (0.305) | (0.498) | (0.571) | (0.592) | (1.645) |  |
| LRE 1 | 0.3706 | −0.2218 | 0.2266 | 1.1366 | −0.3457 | 1.4278 |
|  | (0.318) | (0.489) | (0.544) | (0.605) | (0.652) |  |
| LRE 2 | 0.3503 | −0.0999 | 0.1472 | 0.9838 | −0.2883 | 1.2544 |
|  | (0.303) | (0.504) | (0.584) | (0.596) | (0.461) |  |
| LLTE | 0.5373 | 0.4116 | −0.4227 | 0.8732 | −0.2430 | 4.1372 |
|  | (0.295) | (0.536) | (0.643) | (0.624) | (1.720) |  |
| LTPE | 0.4949 | 0.2955 | −0.2933 | 0.8983 | −0.2533 | 3.8161 |
|  | (0.310) | (0.491) | (0.556) | (0.592) | (1.679) |  |
| LKLE 1 | 0.8972 | 1.3960 | −1.5195 | 0.6604 | −0.1558 | 28.9236 |
|  | (0.278) | (0.400) | (0.446) | (0.418) | (5.208) |  |
| LKLE 2 | 0.4696 | −0.1212 | 0.0576 | 1.2298 | −0.3815 | 1.1350 |
|  | (0.326) | (0.505) | (0.560) | (0.646) | (0.084) |  |
* Standard errors for each of the estimators are in parentheses.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
