Article

A Generalized Linear Transformation and Its Effects on Logistic Regression

1 Independent Researcher, Plano, TX 75024, USA
2 School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, UK
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(2), 467; https://doi.org/10.3390/math11020467
Submission received: 20 November 2022 / Revised: 8 January 2023 / Accepted: 10 January 2023 / Published: 15 January 2023
(This article belongs to the Section Probability and Statistics)

Abstract:
Linear transformations such as min–max normalization and z-score standardization are commonly used in logistic regression for the purpose of scaling. However, the work in the literature on linear transformations in logistic regression has two major limitations. First, most work focuses on improving the fit of the regression model. Second, the effects of transformations are rarely discussed. In this paper, we first generalized a linear transformation for a single variable to multiple variables by matrix multiplication. We then studied various effects of a generalized linear transformation in logistic regression. We showed that an invertible generalized linear transformation has no effects on predictions, multicollinearity, pseudo-complete separation and complete separation. We also showed that multiple linear transformations do not affect the variance inflation factor (VIF). Numeric examples with real data were presented to validate our results. These results of no effects justify the rationality of linear transformations in logistic regression.

1. Introduction

Logistic regression is one of the most commonly used techniques for modeling the relationship between the dependent variable and one or more independent variables.
In data analysis and machine learning, a transformation refers to a mapping of a variable into a new variable. A transformation can be linear or nonlinear, depending on whether the mapping is linear or nonlinear. Linear transformations can be used to improve interpretability of coefficients in linear regression and make a fitted model easier to understand [1], whereas nonlinear transformations are often used to improve the fit of the model on the data [2].
Three types of linear transformations are commonly used in machine learning prior to model fitting, namely, min–max normalization, z-score standardization and simple scaling. Since different variables measured on different scales may not contribute equally to model fitting, min–max normalization transforms all continuous variables into the same range [0, 1] to avoid a possible bias. Essentially, min–max normalization subtracts the minimum value of a continuous variable from each value and then divides by the range of the variable. z-score standardization rescales a continuous variable to the standard scale, i.e., the number of standard deviations each value lies from the mean. Mathematically, z-score standardization subtracts the mean of a continuous variable from each value and then divides by the standard deviation of the variable. Simple scaling multiplies a continuous variable by a constant, shrinking variables with large values and expanding variables with small values. All three types of linear transformations are discussed by Adeyemo, Wimmer and Powell [3] for logistic regression.
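As a small illustration (our own sketch, not taken from [3]), the three transformations can be written in R for a single numeric variable as follows; the example values are arbitrary.

x <- c(18, 24, 36, 12, 48, 30)                 # e.g., loan durations in months

min_max <- (x - min(x)) / (max(x) - min(x))    # min-max normalization into [0, 1]
z_score <- (x - mean(x)) / sd(x)               # z-score standardization
scaled  <- x / 12                              # simple scaling by a constant

rbind(min_max, z_score, scaled)

All three are linear transformations of the form a x + b, which motivates the generalization studied in this paper.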
However, the work in the literature on transformations in regression has some limitations. First, most work focuses on improving the fit of the regression model [4,5,6,7,8,9]. Second, the effects of transformations are rarely discussed. Morrell, Pearson, and Brant [10] examined how linear transformations affect a linear mixed-effects model and the tests of significance of fixed effects in the model. They showed how linear transformations modify the random effects, their covariance matrix and the value of the restricted log-likelihood. Zeng [11] studied invariant properties of some statistical measures under monotonic transformations for univariate logistic regression. Zeng [12] derived analytic properties of some well-known category encodings such as ordinal encoding, order encoding and one-hot encoding in multivariate logistic regression by means of linear transformations. Adeyemo, Wimmer and Powell [3] compared the prediction accuracy of the three types of linear transformations, min–max normalization, z-score standardization and simple scaling, in logistic regression by means of simulation.
In this paper, we first generalized a linear transformation for a single variable to multiple variables by matrix multiplication. We then studied various effects of a generalized linear transformation in logistic regression. We showed that an invertible generalized linear transformation has no effects on predictions, multicollinearity, pseudo-complete separation, and complete separation. We also showed that multiple linear transformations do not affect the variance inflation factor (VIF). Numeric examples with randomly generated transformations applied to real data were presented to illustrate our theoretic results.
The remainder of this paper is organized as follows. In Section 2, we give two definitions of a generalized linear transformation and show that they are equivalent. In Section 3, we study the effects of a generalized linear transformation on logistic regression. In Section 4, we present numeric examples to validate our theoretic results. Finally, the paper is concluded in Section 5.
Throughout the paper, we concentrate on transformations of independent variables, which are also sometimes called explanatory variables.

2. A Generalized Linear Transformation in Logistic Regression

Let $x = (x_1, x_2, \ldots, x_p)$ be the vector of $p$ independent variables and $y$ be the dependent variable. Let us consider a sample of $n$ independent observations $(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$, $i = 1, 2, \ldots, n$, where $y_i$ is the value of $y$ and $x_{i1}, x_{i2}, \ldots, x_{ip}$ are the values of the $p$ independent variables $x_1, x_2, \ldots, x_p$ for the $i$-th observation. Without loss of generality, we assume $x_1, x_2, \ldots, x_p$ are all continuous variables, since otherwise they can be converted into continuous variables.
Let us adopt the matrix notation:
$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_{10} & x_{11} & \cdots & x_{1p} \\ x_{20} & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n0} & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
$$
where $x_{i0} = 1$ for all $i$ (used for the intercept $\beta_0$), and the matrix $X$ is called the design matrix. Here, $\beta_0, \beta_1, \ldots, \beta_p$ are called regression coefficients or regression parameters.
Without causing confusion, we also use $x_0, x_1, \ldots, x_p$ to denote the $(p+1)$ columns (column vectors) of $X$. We further use the capital letter $X_i$ to denote the row vector $(1, x_{i1}, x_{i2}, \ldots, x_{ip})$ for $i = 1, 2, \ldots, n$.
Definition 1.
A linear transformation is a linear function of a variable which maps or transforms the variable into a new one. Specifically, a linear transformation of a variable $x$ can be defined as $x^t = ax + b$, where $a$ and $b$ are constants and $a$ is nonzero. For convenience, let us call a linear transformation of a single variable a simple linear transformation. By multiple linear transformations, we mean a set of simple linear transformations. Here, we use the letter $t$ in the superscript to denote the new variable after a transformation.
Note that $a$ and $b$ in Definition 1 are not vectors since $x$ is a variable.
Definition 1 can be generalized naturally by matrix multiplication to transform a set of variables to a new set of variables.
Definition 2.
A generalized linear transformation is a linear matrix-vector expression
$$
\begin{pmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_p^t \end{pmatrix} = A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix}
$$
that transforms or maps the independent variables $x_1, \ldots, x_p$ into new independent variables $x_1^t, x_2^t, \ldots, x_p^t$, where $A = (a_{ij})$ is a $p \times p$ matrix of real numbers and $b_1, b_2, \ldots, b_p$ are real constants. Here, $x_1, \ldots, x_p$ are variables, not vectors.
It should not be confused with a linear transformation between two vector spaces, in which there is no vector $b = (b_1, b_2, \ldots, b_p)'$. Here and hereafter, we use the prime symbol $'$ in the superscript for the transpose of a vector or a matrix. The new variables $x_1^t, x_2^t, \ldots, x_p^t$ in component form are
$$
b_1 + \sum_{j=1}^{p} a_{1j} x_j, \quad b_2 + \sum_{j=1}^{p} a_{2j} x_j, \quad \ldots, \quad b_p + \sum_{j=1}^{p} a_{pj} x_j.
$$
Consider a simple linear transformation $x_i^t = a x_i + b$ for some $x_i$ with $1 \le i \le p$. Without loss of generality, assume $i = 1$. Let $A$ be the $p$-dimensional diagonal matrix with $a_{11} = a$ and $a_{22} = a_{33} = \cdots = a_{pp} = 1$, and let $b = (b, 0, \ldots, 0)'$ be a $p$-dimensional column vector. Then $x_1, x_2, \ldots, x_p$ are transformed into $x_1^t, x_2, \ldots, x_p$ according to Definition 2. Similarly, consider a set of simple linear transformations, say $x_i^t = a_i x_i + b_i$ for $1 \le i \le r$ with $2 \le r \le p$. Let $A$ be the $p$-dimensional diagonal matrix with $a_{ii} = a_i$ for $1 \le i \le r$ and $a_{ii} = 1$ for $i = r+1, r+2, \ldots, p$, and let $b = (b_1, b_2, \ldots, b_r, 0, \ldots, 0)'$ be a $p$-dimensional column vector. Then $x_1, x_2, \ldots, x_p$ are transformed into $x_1^t, x_2^t, \ldots, x_r^t, x_{r+1}, x_{r+2}, \ldots, x_p$ according to Definition 2. Hence, both a simple linear transformation and multiple linear transformations are special cases of a generalized linear transformation.
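The special cases above can be written out concretely. The following R sketch (our own illustration; all object names are ours) builds the diagonal matrix A and the vector b of Definition 2 for a set of simple linear transformations and applies them to a small data matrix.

set.seed(123)
p <- 4; n <- 6
X_vars <- matrix(rnorm(n * p), nrow = n)   # n observations of the p variables
a <- c(2, 0.5, 1, 1)                       # a_i = 1 leaves a variable unchanged
b_vec <- c(3, -1, 0, 0)
A <- diag(a)                               # p x p diagonal matrix of Definition 2
X_new <- t(A %*% t(X_vars) + b_vec)        # x^t = A x + b, applied observation by observation
# The same result, written as the simple transformations a_i * x_i + b_i:
all.equal(X_new, sweep(X_vars, 2, a, `*`) + matrix(b_vec, n, p, byrow = TRUE))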
However, Definition 2 is not convenient to use since the new design matrix is somewhat complicated. Therefore, we give another definition incorporating the design matrix.
Definition 3.
A generalized linear transformation is a matrix multiplication $XC$ that transforms $x_1, \ldots, x_p$ into $x_1^t, x_2^t, \ldots, x_p^t$, where $x_1^t, x_2^t, \ldots, x_p^t$ are the 2nd to the last columns of $XC$ and $C$ is a $(p+1) \times (p+1)$ matrix of real numbers as follows:
$$
C = \begin{pmatrix}
1 & c_{11} & c_{12} & \cdots & c_{1p} \\
0 & c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & c_{p1} & c_{p2} & \cdots & c_{pp} \\
0 & c_{p+1,1} & c_{p+1,2} & \cdots & c_{p+1,p}
\end{pmatrix}. \tag{2}
$$
Note that we require the first column of $C$ to be 0 except for the first entry (which is 1) in order for $XC$ to be a new design matrix.
For convenience, let us partition $C$ into 4 blocks such that $C = \begin{pmatrix} 1 & C_1 \\ 0 & C_{11} \end{pmatrix}$, where $C_1$ is the $p$-dimensional row vector $(c_{11}\ c_{12}\ \cdots\ c_{1p})$, $0$ is the $p$-dimensional column vector of all 0's, and $C_{11}$ is the $p \times p$ submatrix obtained by deleting the first column and the first row of $C$, that is,
$$
C_{11} = \begin{pmatrix}
c_{21} & c_{22} & \cdots & c_{2p} \\
c_{31} & c_{32} & \cdots & c_{3p} \\
\vdots & \vdots & & \vdots \\
c_{p+1,1} & c_{p+1,2} & \cdots & c_{p+1,p}
\end{pmatrix}. \tag{3}
$$
In the following, we prove that the two definitions of a generalized linear transformation are equivalent.
Theorem 1.
Definition 2 and Definition 3 are equivalent.
Proof. 
Let us begin with Definition 2. Its new design matrix is
$$
\begin{pmatrix}
1 & b_1 + \sum_{j=1}^{p} a_{1j} x_{1j} & b_2 + \sum_{j=1}^{p} a_{2j} x_{1j} & \cdots & b_p + \sum_{j=1}^{p} a_{pj} x_{1j} \\
1 & b_1 + \sum_{j=1}^{p} a_{1j} x_{2j} & b_2 + \sum_{j=1}^{p} a_{2j} x_{2j} & \cdots & b_p + \sum_{j=1}^{p} a_{pj} x_{2j} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & b_1 + \sum_{j=1}^{p} a_{1j} x_{nj} & b_2 + \sum_{j=1}^{p} a_{2j} x_{nj} & \cdots & b_p + \sum_{j=1}^{p} a_{pj} x_{nj}
\end{pmatrix}
= X \begin{pmatrix} 1 & b_1 & b_2 & \cdots & b_p \\ 0 & a_{11} & a_{21} & \cdots & a_{p1} \\ 0 & a_{12} & a_{22} & \cdots & a_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & a_{1p} & a_{2p} & \cdots & a_{pp} \end{pmatrix}.
$$
Hence, the new design matrix of Definition 2 is in the form of Definition 3 with
$$
C = \begin{pmatrix} 1 & b_1 & b_2 & \cdots & b_p \\ 0 & a_{11} & a_{21} & \cdots & a_{p1} \\ 0 & a_{12} & a_{22} & \cdots & a_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & a_{1p} & a_{2p} & \cdots & a_{pp} \end{pmatrix}.
$$
Note that the submatrix obtained by deleting the first row and first column of matrix $C$ above is the transpose of $A$, that is, $A'$.
Next, let us begin with Definition 3.
$$
XC = X \begin{pmatrix} 1 & c_{11} & \cdots & c_{1p} \\ 0 & c_{21} & \cdots & c_{2p} \\ \vdots & \vdots & & \vdots \\ 0 & c_{p+1,1} & \cdots & c_{p+1,p} \end{pmatrix}
= \begin{pmatrix}
1 & c_{11} + \sum_{j=1}^{p} c_{j+1,1} x_{1j} & c_{12} + \sum_{j=1}^{p} c_{j+1,2} x_{1j} & \cdots & c_{1p} + \sum_{j=1}^{p} c_{j+1,p} x_{1j} \\
1 & c_{11} + \sum_{j=1}^{p} c_{j+1,1} x_{2j} & c_{12} + \sum_{j=1}^{p} c_{j+1,2} x_{2j} & \cdots & c_{1p} + \sum_{j=1}^{p} c_{j+1,p} x_{2j} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & c_{11} + \sum_{j=1}^{p} c_{j+1,1} x_{nj} & c_{12} + \sum_{j=1}^{p} c_{j+1,2} x_{nj} & \cdots & c_{1p} + \sum_{j=1}^{p} c_{j+1,p} x_{nj}
\end{pmatrix}.
$$
The second, third, …, last columns of the matrix above result from the linear transformations
$$
c_{11} + \sum_{j=1}^{p} c_{j+1,1} x_j, \quad c_{12} + \sum_{j=1}^{p} c_{j+1,2} x_j, \quad \ldots, \quad c_{1p} + \sum_{j=1}^{p} c_{j+1,p} x_j,
$$
respectively. Hence, Definition 3 is in the form of Definition 2 with
$$
A = \begin{pmatrix} c_{21} & c_{31} & \cdots & c_{p+1,1} \\ c_{22} & c_{32} & \cdots & c_{p+1,2} \\ \vdots & \vdots & & \vdots \\ c_{2p} & c_{3p} & \cdots & c_{p+1,p} \end{pmatrix}
$$
and
$$
(b_1, b_2, \ldots, b_p) = (c_{11}, c_{12}, \ldots, c_{1p}).
$$
We have concluded our proof. □
If we expand along the first column to find the determinant of $C$ in (2), we immediately see that the determinant of $C$ is equal to the determinant of $C_{11}$. Therefore, $C$ is nonsingular (or invertible) if and only if $C_{11}$ in (3) is nonsingular. In addition, it follows from the proof of Theorem 1 that $C$ is nonsingular if and only if $A$ in Definition 2 is nonsingular.
Moreover, it is easy to see that if $C_{11}$ is nonsingular then the inverse of $C$ can be written as
$$
C^{-1} = \begin{pmatrix} 1 & -C_1 C_{11}^{-1} \\ 0 & C_{11}^{-1} \end{pmatrix}. \tag{9}
$$
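The block form of the inverse can be checked numerically; the following R sketch (ours) verifies it for a random nonsingular C_11.

set.seed(1)
p <- 4
C_11 <- matrix(runif(p * p), nrow = p)
C_1  <- runif(p)
C <- rbind(c(1, C_1), cbind(0, C_11))            # C = [1 C_1; 0 C_11]
C_inv_block <- rbind(c(1, -C_1 %*% solve(C_11)), # [1 -C_1 C_11^{-1}; 0 C_11^{-1}]
                     cbind(0, solve(C_11)))
all.equal(solve(C), C_inv_block, check.attributes = FALSE)   # TRUE up to rounding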
From now on we will use Definition 3 unless otherwise specified. For convenience, let us call the generalized linear transformation X C invertible if C is invertible.

3. Effects of a Generalized Linear Transformation

In logistic regression, the dependent variable $y$ is binary with the two values 0 and 1. Let the conditional probability that $y = 1$ be denoted by $\mathrm{Prob}(y = 1 \mid x) = \pi(x)$.
Logistic regression assumes logit linearity between the log odds and the independent variables $x_1, x_2, \ldots, x_p$:
$$
\ln\!\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p. \tag{10}
$$
Equation (10) above can be written as
$$
\pi(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}. \tag{11}
$$
The following log likelihood is used in logistic regression
$$
l(\beta, Y, X) = \sum_{i=1}^{n} y_i \ln\!\left(\frac{e^{X_i \beta}}{1 + e^{X_i \beta}}\right) + \sum_{i=1}^{n} (1 - y_i) \ln\!\left(\frac{1}{1 + e^{X_i \beta}}\right). \tag{12}
$$
The maximum likelihood method is used to estimate the parameters in logistic regression. Specifically, the maximum likelihood estimators (MLE) are the values of the parameters $\beta_0, \beta_1, \ldots, \beta_p$ that maximize (12). The vector $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)'$ of the MLE estimators of $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)'$ satisfies [13]
$$
\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \pi_i, \qquad \sum_{i=1}^{n} x_{ij} y_i = \sum_{i=1}^{n} x_{ij} \pi_i, \quad j = 1, 2, \ldots, p, \tag{13}
$$
or in matrix-vector form
$$
X'Y = X'\pi \tag{14}
$$
where $\pi = (\pi_1, \pi_2, \ldots, \pi_n)'$ and $\pi_i = \pi(X_i) = \dfrac{e^{X_i \hat{\beta}}}{1 + e^{X_i \hat{\beta}}}$ for $i = 1, 2, \ldots, n$. Note that after a generalized linear transformation $XC$, (12) and (14) hold with the design matrix $X$ replaced by the new design matrix $XC$.
Equation (13) or (14) represents $(p+1)$ nonlinear equations in $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ and cannot be solved explicitly in general [14]. Rather, they can be solved numerically by the Newton-Raphson algorithm [15] as follows:
$$
\beta^{(i+1)} = \beta^{(i)} + (X'VX)^{-1} g, \tag{15}
$$
where $V$ is the $n \times n$ diagonal matrix with diagonal elements $\pi_1(1-\pi_1), \pi_2(1-\pi_2), \ldots, \pi_n(1-\pi_n)$, and $g = X'(Y - \pi)$. Both $V$ and $g$ are evaluated at $\beta^{(i)}$ in (15).
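As an illustration of the iteration (15), the following R sketch (our own code; names such as logistic_mle are ours) implements the Newton-Raphson update and compares it with glm on simulated data.

logistic_mle <- function(X, y, max_iter = 25, tol = 1e-10) {
  beta <- rep(0, ncol(X))                          # starting value beta^(0)
  for (i in seq_len(max_iter)) {
    pi_i <- 1 / (1 + exp(-as.vector(X %*% beta)))  # fitted probabilities
    V    <- diag(pi_i * (1 - pi_i))                # n x n diagonal weight matrix
    g    <- t(X) %*% (y - pi_i)                    # g = X'(Y - pi)
    step <- solve(t(X) %*% V %*% X, g)             # (X'VX)^{-1} g
    beta <- beta + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  beta
}

set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(2 * n), ncol = 2))      # design matrix with intercept column
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1, -2)))
cbind(newton = logistic_mle(X, y),
      glm    = coef(glm(y ~ X[, -1], family = binomial)))   # the two columns agree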
If $X'X$ is nonsingular and the data is not completely separable or pseudo-completely separable [16], then the MLE estimator $\hat{\beta}$ exists and is unique.
The MLE estimator $\hat{\beta}$ can be used to predict $\mathrm{Prob}(y = 1 \mid x)$ through the linear combination of the variables $x_1, \ldots, x_p$:
$$
\hat{\pi}(x) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p}} = \frac{e^{(1, x_1, x_2, \ldots, x_p)\hat{\beta}}}{1 + e^{(1, x_1, x_2, \ldots, x_p)\hat{\beta}}}. \tag{16}
$$
In particular, we have n fitted values
$$
\hat{\pi}_i = \frac{e^{(1, x_{i1}, x_{i2}, \ldots, x_{ip})\hat{\beta}}}{1 + e^{(1, x_{i1}, x_{i2}, \ldots, x_{ip})\hat{\beta}}}, \quad i = 1, 2, \ldots, n.
$$

3.1. Effects on MLE Estimator and Predictions

Theorem 2.
For logistic regression, if the MLE estimator of $\beta$ is $\hat{\beta}$, then after a generalized linear transformation $XC$ with nonsingular $C$ the MLE estimator of $\beta$ is $C^{-1}\hat{\beta}$. Moreover, the generalized linear transformation does not affect predictions.
Proof. 
Since $\hat{\beta}$ is the maximum likelihood estimator of $\beta$, (14) is satisfied by $\hat{\beta}$. Multiplying both sides of (14) by $C'$, we obtain
$$
C'(X'Y) = C'(X'\pi). \tag{17}
$$
Clearly, (17) can be rewritten as
$$
(XC)'Y = (XC)'\pi. \tag{18}
$$
Writing $X_i\hat{\beta}$ as $(X_iC)(C^{-1}\hat{\beta})$ for $i = 1, 2, \ldots, n$, we have
$$
\pi_i = \frac{e^{X_i\hat{\beta}}}{1 + e^{X_i\hat{\beta}}} = \frac{e^{(X_iC)(C^{-1}\hat{\beta})}}{1 + e^{(X_iC)(C^{-1}\hat{\beta})}}. \tag{19}
$$
It follows from (18) and (19) that $C^{-1}\hat{\beta}$ satisfies (14) for the new design matrix $XC$. Hence, the linear combination $C^{-1}\hat{\beta}$ of $\hat{\beta}$ is the new MLE estimator after the generalized linear transformation $XC$.
Let us now predict $\mathrm{Prob}(y = 1)$ for a set of values of the variables $x_1, \ldots, x_p$ in the new system after the generalized linear transformation $XC$, using the new MLE estimator $C^{-1}\hat{\beta}$. Let $v_1, v_2, \ldots, v_p$ be specific values of $x_1, \ldots, x_p$, respectively. Then the row vector $(1, v_1, \ldots, v_p)$ in the original system becomes $(1, v_1, \ldots, v_p)C$ in the new system. By (16), the predicted conditional probability of $y = 1$ when $x = (1, v_1, \ldots, v_p)C$ in the new system is
$$
\frac{e^{(1, v_1, \ldots, v_p)C(C^{-1}\hat{\beta})}}{1 + e^{(1, v_1, \ldots, v_p)C(C^{-1}\hat{\beta})}} = \frac{e^{(1, v_1, \ldots, v_p)\hat{\beta}}}{1 + e^{(1, v_1, \ldots, v_p)\hat{\beta}}}. \tag{20}
$$
The right-hand side of (20) is the predicted conditional probability of $y = 1$ when $x = (1, v_1, \ldots, v_p)$ in the original system. □
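Theorem 2 can be checked quickly on simulated data; the following R sketch (ours, separate from the real-data validation in Section 4) fits logistic regressions before and after a random invertible transformation.

set.seed(42)
n <- 500; p <- 3
X  <- cbind(1, matrix(rnorm(n * p), ncol = p))                  # design matrix
y  <- rbinom(n, 1, plogis(X %*% c(-1, 0.8, -0.5, 0.3)))
C  <- rbind(c(1, runif(p)), cbind(0, matrix(runif(p * p), p)))  # random nonsingular C
XC <- X %*% C                                                   # transformed design matrix

fit1 <- glm(y ~ X[, -1],  family = binomial)                    # original variables
fit2 <- glm(y ~ XC[, -1], family = binomial)                    # transformed variables

all.equal(as.vector(solve(C) %*% coef(fit1)), as.vector(coef(fit2)),
          check.attributes = FALSE)                             # new MLE is C^{-1} beta-hat
all.equal(fitted(fit1), fitted(fit2))                           # identical predictions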

3.2. Effects on Multicollinearity

Perfect multicollinearity or complete multicollinearity or multicollinearity, in short, refers to a situation in logistic regression in which two or more independent variables are linearly related [17]. In particular, if two independent variables are linearly related, then it is called collinearity.
Mathematically, multicollinearity means there exist constants $a_0, a_1, \ldots, a_p$ such that
$$
a_0 x_0 + \sum_{i=1}^{p} a_i x_i = 0
$$
where at least two of $a_1, \ldots, a_p$ are nonzero. If we treat $x_0$ as an independent variable, then we only require that at least one of $a_1, \ldots, a_p$ is nonzero.
Multicollinearity is a common issue in logistic regression. If there is multicollinearity, the design matrix $X$ will not have the full column rank $p+1$. Hence, the $(p+1) \times (p+1)$ matrix $\hat{I} = X'VX$ in (15) will have rank less than $p+1$. Thus, the inverse matrix $\hat{I}^{-1}$ in (15) does not exist, which makes the iteration in (15) impossible.
If there is near multicollinearity and there is no separation of the data points, then theoretically $\hat{I} = X'VX$ in (15) has an inverse and the iteration in (15) can proceed. Yet, the iteration (15) may compute the inverse of $\hat{I} = X'VX$ only approximately and hence may produce unstable estimates and inaccurate variances [18].
Some authors define multicollinearity in logistic regression as a high correlation between independent variables [19,20,21]. Let us call multicollinearity with high correlation near multicollinearity and reserve multicollinearity for perfect or complete multicollinearity.
Let us now define the VIF. Let $R_j^2$ be the R-squared that results when $x_j$ is linearly regressed against the other $(p-1)$ independent variables. Then the VIF for $x_j$ is defined as
$$
VIF_j = \frac{1}{1 - R_j^2}, \quad j = 1, 2, \ldots, p.
$$
Near multicollinearity can be detected by using the VIF [2]. The larger the VIF of an independent variable, the larger the correlation between this independent variable and the others. However, there is no standard for acceptable levels of the VIF. Multicollinearity can be combated by a generalized cross-validation (GCV) criterion in partially linear regression models [22,23].
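The definition of the VIF translates directly into code. The following R sketch (our own; the helper name vif_manual is ours) regresses each variable on the others and applies $VIF_j = 1/(1 - R_j^2)$; the results agree with car::vif applied to a fitted model.

vif_manual <- function(data) {
  vars <- names(data)
  sapply(vars, function(v) {
    r2 <- summary(lm(reformulate(setdiff(vars, v), response = v), data = data))$r.squared
    1 / (1 - r2)                                # VIF_j = 1 / (1 - R_j^2)
  })
}

vif_manual(mtcars[, c("disp", "hp", "wt")])     # illustrative built-in data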

3.2.1. Preliminary Results in Linear Regression

As the VIF is related to linear regression, let us briefly introduce some preliminary results in linear regression. As in logistic regression, we consider $p$ independent variables $x_1, x_2, \ldots, x_p$. Unlike logistic regression, the dependent variable $y$ in linear regression is a continuous variable. We shall adopt the same notation as in logistic regression unless otherwise specified. In particular, $X$ is the design matrix.
In linear regression, the relationship between $y$ and $x_1, \ldots, x_p$ is formulated as a linear combination
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon
$$
where $\epsilon$ is a random error, or in matrix notation
$$
Y = X\beta + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}.
$$
The ordinary least squares (OLS) estimator $\hat{\beta}$ of $\beta$ satisfies [2]
$$
X'X\hat{\beta} = X'Y.
$$
Assuming the $(p+1)$-dimensional square matrix $X'X$ is nonsingular, the OLS estimator $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)'$ is unique and can be written explicitly as
$$
\hat{\beta} = (X'X)^{-1}X'Y. \tag{26}
$$
The OLS estimator $\hat{\beta}$ can be used to predict $y$ through the linear combination of the variables $x_1, \ldots, x_p$ as follows:
$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p = (1, x_1, x_2, \ldots, x_p)\hat{\beta}. \tag{27}
$$
Like Gelman and Hill [1] and Chatterjee and Hadi [2], we will call a predicted value a fitted value if the values of $x_1, \ldots, x_p$ come from one of the $n$ observations. So, we have the $n$ fitted values
$$
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_p x_{ip}, \quad i = 1, 2, \ldots, n.
$$
Therefore, the $n$-dimensional column vector $\hat{Y}$ of the $n$ fitted values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$ can be expressed as
$$
\hat{Y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)' = X\hat{\beta}. \tag{28}
$$
It is easy to show that the OLS estimator is $C^{-1}\hat{\beta}$ after an invertible generalized linear transformation $XC$. Moreover, the generalized linear transformation does not affect predictions. Indeed, let us predict $y$ for a set of values of the variables $x_1, \ldots, x_p$, which could be any set of values, not necessarily from one of the $n$ observations. We first transform the values of $x_1, \ldots, x_p$ into $(1, v_1, \ldots, v_p)C$, where $v_1, v_2, \ldots, v_p$ are the values of $x_1, \ldots, x_p$. Next, we apply (27) and obtain $(1, v_1, \ldots, v_p)C(C^{-1}\hat{\beta}) = (1, v_1, \ldots, v_p)\hat{\beta}$, which is the predicted value under the original model.
In linear regression, the coefficient of determination, denoted by $R^2$ and also called R-squared, is given by Chatterjee and Hadi [2] as
$$
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{29}
$$
where $\bar{y}$ is the mean of the dependent variable $y$, that is, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, and $\hat{y}_i$ is the fitted value
$$
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_p x_{ip}, \quad i = 1, 2, \ldots, n.
$$
The coefficient of determination $R^2$ can be related to the square of the correlation between $Y$ and $\hat{Y}$ as follows [2]:
$$
R^2 = \left[\mathrm{Cor}(Y, \hat{Y})\right]^2
$$
where
$$
\mathrm{Cor}(Y, \hat{Y}) = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}\, \sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}.
$$
Theorem 3.
$R^2$ in linear regression is invariant under invertible generalized linear transformations.
Proof. 
Expressing $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ in the numerator of the second equation in (29) in matrix form and applying (26), we obtain
$$
\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - Y'X\hat{\beta}. \tag{32}
$$
Substituting (32) into (29) yields
$$
R^2 = 1 - \frac{Y'Y - Y'X\hat{\beta}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{33}
$$
Now let $XC$ be an invertible generalized linear transformation. Then the OLS estimator after the transformation becomes $C^{-1}\hat{\beta}$. In this case, $R^2$ in (33) becomes
$$
1 - \frac{Y'Y - Y'(XC)C^{-1}\hat{\beta}}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{Y'Y - Y'X\hat{\beta}}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
$$
which equals $R^2$ in (29) before the generalized linear transformation. □
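A quick numeric check of Theorem 3 (our own sketch; all names are ours): the R-squared of a linear regression is unchanged by a random invertible generalized linear transformation.

set.seed(7)
n <- 100; p <- 3
X  <- cbind(1, matrix(rnorm(n * p), ncol = p))
y  <- as.vector(X %*% c(2, 1, -1, 0.5) + rnorm(n))
C  <- rbind(c(1, runif(p)), cbind(0, matrix(runif(p * p), p)))  # nonsingular with probability 1
XC <- X %*% C

summary(lm(y ~ X[, -1]))$r.squared    # R^2 with the original variables
summary(lm(y ~ XC[, -1]))$r.squared   # identical R^2 with the transformed variables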

3.2.2. Effects on Logistic Regression

In Definitions 2 and 3, we defined a generalized linear transformation only for independent variables. Since an independent variable is used as the dependent variable in order to find its VIF, we consider a simple linear transformation for the dependent variable in the following result.
Lemma 1.
Consider a linear regression with $y$ as the dependent variable and $x_1, x_2, \ldots, x_p$ as the independent variables. If we make a simple linear transformation on $y$ such as $y^t = ay + b$ and a generalized linear transformation $XC$ on the independent variables with nonsingular $C$, then $\hat{\beta^t} = C^{-1}\big(a\hat{\beta} + (b, 0, \ldots, 0)'\big)$ is the OLS estimator of the new linear regression after the transformations, where $X$ is the design matrix, $(b, 0, \ldots, 0)'$ is a $(p+1)$-dimensional column vector and $\hat{\beta} = (X'X)^{-1}X'Y$ is the OLS estimator of the original linear regression.
Proof. 
Since the new linear regression has design matrix $XC$ and its dependent variable can be expressed as $aY + (b, b, \ldots, b)'$, where $(b, b, \ldots, b)'$ is an $n$-dimensional column vector, it is sufficient to show that $\hat{\beta^t} = C^{-1}\big(a\hat{\beta} + (b, 0, \ldots, 0)'\big)$ satisfies
$$
(XC)'(XC)\beta = (XC)'\big(aY + (b, b, \ldots, b)'\big). \tag{34}
$$
Substituting $\hat{\beta^t} = C^{-1}\big(a\hat{\beta} + (b, 0, \ldots, 0)'\big)$ into the left-hand side of (34) and replacing $\hat{\beta}$ with $(X'X)^{-1}X'Y$, we obtain
$$
(XC)'(XC)\hat{\beta^t} = (XC)'X\big(a\hat{\beta} + (b, 0, \ldots, 0)'\big) = (XC)'\big(aY + (b, b, \ldots, b)'\big),
$$
which is the right-hand side of (34). □
Theorem 4.
VIF for each independent variable is invariant under multiple linear transformations in logistic regression.
Proof. 
Without loss of generality, we assume multiple linear transformations $x_i^t = a_i x_i + b_i$ for the first $r$ independent variables, $i = 1, 2, \ldots, r$, where $r \le p$. To find the VIFs, we run a linear regression for each $i = 1, 2, \ldots, r$ with $x_i^t$ as the dependent variable and $x_1^t, x_2^t, \ldots, x_{i-1}^t, x_{i+1}^t, \ldots, x_r^t, x_{r+1}, x_{r+2}, \ldots, x_p$ as the independent variables. Similarly, we run a linear regression for each $i = r+1, r+2, \ldots, p$ with $x_i$ as the dependent variable and $x_1^t, x_2^t, \ldots, x_r^t, x_{r+1}, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p$ as the independent variables. We only prove the invariance of the VIF for $x_1^t$ and for $x_{r+1}$, as the invariance of the VIF for $x_i^t$, $i = 2, 3, \ldots, r$, can be proved similarly to $x_1^t$, and the invariance of the VIF for $x_i$, $i = r+2, \ldots, p$, can be proved similarly to $x_{r+1}$.
To find the VIF for $x_1^t$, we run a linear regression with $x_1^t$ as the dependent variable and $x_2^t, x_3^t, \ldots, x_r^t, x_{r+1}, x_{r+2}, \ldots, x_p$ as the independent variables. In this case, the dependent variable is $x_1^t = y^t = a_1 x_1 + b_1$, and the independent variables $x_2^t, x_3^t, \ldots, x_r^t, x_{r+1}, x_{r+2}, \ldots, x_p$ result from a generalized linear transformation $XC$, where $X$ is the design matrix with independent variables $x_2, x_3, \ldots, x_p$ and $C$ is the upper triangular matrix
$$
C = \begin{pmatrix}
1 & b_2 & b_3 & \cdots & b_r & 0 & \cdots & 0 \\
0 & a_2 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 0 & a_3 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & & \vdots \\
0 & 0 & 0 & \cdots & a_r & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{pmatrix}.
$$
Since the determinant of $C$ equals $a_2 a_3 \cdots a_r \ne 0$, by Lemma 1 the OLS estimator after the multiple linear transformations is $\hat{\beta^t} = C^{-1}\big(a_1\hat{\beta} + (b_1, 0, \ldots, 0)'\big)$. By (29), it is sufficient to prove the following identity:
$$
\frac{\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \hat{y}_i^t\big)^2}{\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \overline{(a_1 x_1 + b_1)}\big)^2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{35}
$$
Since the denominator of the left-hand side of (35) is
$$
\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \overline{(a_1 x_1 + b_1)}\big)^2 = \sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - a_1\bar{x}_1 - b_1\big)^2 = a_1^2 \sum_{i=1}^{n} \big(x_{i1} - \bar{x}_1\big)^2 = a_1^2 \sum_{i=1}^{n} (y_i - \bar{y})^2,
$$
it is sufficient to show that
$$
\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \hat{y}_i^t\big)^2 = a_1^2 \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \tag{36}
$$
Expressing the left-hand side of (36) as a product of vectors gives
$$
\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \hat{y}_i^t\big)^2 = \big(a_1 x_1 + B_1 - \hat{Y^t}\big)'\big(a_1 x_1 + B_1 - \hat{Y^t}\big) \tag{37}
$$
where $B_1$ is the $n$-dimensional column vector with all elements equal to $b_1$ and $\hat{Y^t}$ is the $n$-dimensional vector of the fitted values $\hat{y}_i^t$ for $i = 1, 2, \ldots, n$.
Applying (28) to the vector $\hat{Y^t}$ of fitted values with the design matrix $XC$ and applying Lemma 1, we obtain
$$
\hat{Y^t} = (XC)\hat{\beta^t} = (XC)C^{-1}\big(a_1\hat{\beta} + (b_1, 0, \ldots, 0)'\big) = a_1 X\hat{\beta} + B_1.
$$
Hence, $a_1 x_1 + B_1 - \hat{Y^t} = a_1(x_1 - X\hat{\beta})$, and so (37) becomes
$$
\sum_{i=1}^{n} \big(a_1 x_{i1} + b_1 - \hat{y}_i^t\big)^2 = a_1^2 \big(x_1 - X\hat{\beta}\big)'\big(x_1 - X\hat{\beta}\big),
$$
which is the right-hand side of (36).
To find the VIF for $x_{r+1}$, we run a linear regression with $x_{r+1}$ as the dependent variable and $x_1^t, x_2^t, \ldots, x_r^t, x_{r+2}, \ldots, x_p$ as the independent variables. In this case, the independent variables result from a generalized linear transformation $ZD$, where $Z$ is the design matrix of the independent variables $x_1, x_2, \ldots, x_r, x_{r+2}, \ldots, x_p$ and $D$ is the upper triangular matrix
$$
D = \begin{pmatrix}
1 & b_1 & b_2 & \cdots & b_r & 0 & \cdots & 0 \\
0 & a_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 0 & a_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & & \vdots \\
0 & 0 & 0 & \cdots & a_r & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{pmatrix}.
$$
Since the determinant of $D$ equals $a_1 a_2 \cdots a_r \ne 0$, by Theorem 3 the VIF for $x_{r+1}$ after the generalized linear transformation $ZD$ is the same as the VIF for $x_{r+1}$ prior to the generalized linear transformation. □
Remark 1.
VIFs are not necessarily invariant under an invertible generalized linear transformation $XC$. For instance, let $x_1^t = x_2$ and $x_2^t = x_1$ and keep $x_3, x_4, \ldots, x_p$ unchanged. Then $x_1^t, x_2^t, x_3, x_4, \ldots, x_p$ result from the generalized linear transformation with
$$
D = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.
$$
Since the determinant of $D$ is $-1$, $D$ is nonsingular. However, the VIF for $x_1^t$ after the generalized linear transformation $XD$ equals the VIF for $x_2$ prior to the transformation, and these are unequal in general.
The following result is immediate.
Theorem 5.
Multicollinearity exists in logistic regression if, and only if, it exists after an invertible generalized linear transformation.
Remark 2.
All the results about multicollinearity and VIF also apply to machine learning algorithms in which multicollinearity is applicable such as linear regression.

3.3. Effects on Linear Separation

Albert and Anderson [16] first assumed the design matrix $X$ to have full column rank, that is, no multicollinearity. They then introduced the concepts of separation (including complete separation and quasi-complete separation) and overlap in logistic regression with an intercept. They showed that separation leads to nonexistence of a (finite) MLE and that overlap leads to a finite and unique MLE. Therefore, like multicollinearity, separation is a common issue in logistic regression.
Definition 4.
There is a complete separation of the data points if there exists a vector $b = (b_0, b_1, \ldots, b_p)'$ that correctly allocates all observations to their response groups; that is,
$$
\begin{cases}
\sum_{j=0}^{p} b_j x_{ij} = X_i b = b'X_i' > 0, & y_i = 1, \\
\sum_{j=0}^{p} b_j x_{ij} = X_i b = b'X_i' < 0, & y_i = 0.
\end{cases} \tag{38}
$$
Definition 5.
There is a quasi-complete separation if the data are not completely separable, but there exists a vector $b = (b_0, b_1, \ldots, b_p)'$ such that
$$
\begin{cases}
\sum_{j=0}^{p} b_j x_{ij} = X_i b = b'X_i' \ge 0, & y_i = 1, \\
\sum_{j=0}^{p} b_j x_{ij} = X_i b = b'X_i' \le 0, & y_i = 0,
\end{cases}
$$
and equality holds for at least one subject in each response group.
Definition 6.
If neither a complete nor a quasi-complete separation exists, then the data is said to have overlap.
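Definition 4 can be checked mechanically for a candidate direction b; the following R sketch (ours) does so, and the direction with $b_0 = -3$ and $b_8 = 2$ used in Section 4.1 below can be verified the same way.

is_complete_separation <- function(X, y, b) {
  s <- as.vector(X %*% b)                   # X_i b for every observation
  all(s[y == 1] > 0) && all(s[y == 0] < 0)  # the two response groups lie on opposite sides
}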
Theorem 6.
An invertible generalized linear transformation does not affect the data configuration of logistic regression.
Proof. 
We consider three cases.
Case 1. There is a complete separation of the data points in the original system. Then (38) holds for some vector $b = (b_0, b_1, \ldots, b_p)'$. After the invertible generalized linear transformation $XC$, row $i$ of the new design matrix is $X_iC$ for $i = 1, 2, \ldots, n$. Let $b^t = C^{-1}b$, which is a constant column vector of dimension $(p+1)$. Since $(X_iC)b^t = X_ib$, (38) holds after the generalized linear transformation. Therefore, there is also a complete separation of the data points after the generalized linear transformation $XC$.
Case 2. There is a quasi-complete separation of the data points in the original system. This case can be proved similarly to Case 1.
Case 3. The original data points have overlap. Then the new data points after the generalized linear transformation $XC$ also have overlap. We prove this by contradiction. Assume otherwise that the new data points after the generalized linear transformation do not have overlap. Then there is either a complete separation or a quasi-complete separation of the data points. Let us first assume there is a complete separation of the data points after the generalized linear transformation $XC$. Then there is a vector $b = (b_0, b_1, \ldots, b_p)'$ such that (38) holds with the rows $X_iC$ of the new design matrix. Let $b^t = Cb$; then (38) holds in the original system with $b^t$, since $X_ib^t = (X_iC)b$, which is a contradiction. Next, let us assume there is a quasi-complete separation after the generalized linear transformation $XC$. This case can be proved similarly. □

4. Numeric Examples

In this section, we use real data, the well-known German Credit Data from a German bank, to validate our theoretical results. The German Credit Data can be found in the UCI Machine Learning Repository [24]. The original dataset is in the file "german.data", which contains categorical/symbolic attributes. It has 1000 observations representing 1000 loan applicants. The statistical software package R (version 3.4.2) and RStudio will be employed for our analyses. Since there are only 1000 records, we will not split them into training and test sets. We read german.data using R's read_table function, call the result german_credit_raw, and use the colnames() function to rename the columns.
There are 21 variables or attributes in german_credit_raw, including the following 8 numerical ones, which are denoted by $x_1, x_2, \ldots, x_8$, respectively:
  • Duration: Duration in month;
  • credit_amount: Credit amount;
  • installment_rate: Installment rate in percentage of disposable income;
  • current_address_length: Present residence since;
  • age: Age in years;
  • num_credits: Number of existing credits at this bank;
  • num_dependents: Number of people being liable to provide maintenance for;
  • credit_status: Credit status: 1 for good loans and 2 for bad loans.
Let us define a new variable called default as $y = \mathrm{default} = \mathrm{credit\_status} - 1$. With the new variable default, 0 stands for good loans and 1 for bad loans. Since it is not easy to interpret categorical variables, we will only consider the numerical variables. A sketch of this data preparation is given below.
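The following R code is a hedged sketch of the data preparation described above (our own code, not necessarily identical to the authors'); the column positions of the numerical attributes are taken from the UCI documentation of german.data.

library(readr)

german_credit_raw <- read_table("german.data", col_names = FALSE)
# Columns 2, 5, 8, 11, 13, 16, 18 and 21 of german.data hold the numerical
# attributes listed above; the remaining columns are categorical/symbolic.
colnames(german_credit_raw)[c(2, 5, 8, 11, 13, 16, 18, 21)] <-
  c("duration", "credit_amount", "installment_rate", "current_address_length",
    "age", "num_credits", "num_dependents", "credit_status")

# Recode the response: default = credit_status - 1, so 0 = good loan, 1 = bad loan.
german_credit_raw$default <- german_credit_raw$credit_status - 1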

4.1. Validation of Invariance of Separation

Let us first build a logistic regression model logit_model_1 using all the 8 numerical variables and the glm function in R. In the following, we italicize statements in R, use ">" for the R prompt and make outputs from R bold.
> logit_model_1 <- glm(default ~ duration + credit_amount + installment_rate + current_address_length + age + num_credits + num_dependents + credit_status, data = german_credit_raw, family = "binomial")
Warning message:
glm.fit: algorithm did not converge
We see a warning message as above. It indicates a separation in the data. Indeed, this separation comes from the variable credit_status: (38) holds with $b_0 = -3$, $b_1 = b_2 = \cdots = b_7 = 0$ and $b_8 = 2$. By Definition 4, there is a complete separation of the data points.
Now let us make a generalized linear transformation. We randomly generate the $8 \times 8$ matrix $C_{11}$ shown in Table 1 and the 8-dimensional row vector $C_1$ in (3) by calling the R function runif, which generates random values from a uniform distribution, by default between 0 and 1. We set a seed for reproducibility. We denote $C_{11}$ and $C_1$ by C_11 and C_1 in R, respectively. We call R's function det to calculate the determinant of $C_{11}$.
> set.seed(1)
> C_11 <- matrix(runif(64),nrow = 8)
We use the R function det and find that the determinant of C_11 is 0.01433565.
The vector $C_1$ is generated as follows:
> set.seed(10)
> C_1 = runif(n = 8, min = 1, max = 20)
[1] 10.642086 6.828602 9.111246 14.168940 2.617583 5.283296
6.216080 6.173796
Since $C_{11}$ is nonsingular, so is $C = \begin{pmatrix} 1 & C_1 \\ 0 & C_{11} \end{pmatrix}$ by (9). Now $x_1, \ldots, x_8$ can be transformed into $x_1^t, x_2^t, \ldots, x_8^t$ as in (6). Let us denote $x_1^t, x_2^t, \ldots, x_8^t$ by duration2, credit_amount2, …, credit_status2 in R. A sketch of this step is shown below.
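The following is a hedged sketch (our own code) of how the transformed variables can be created: build $C$ from C_1 and C_11, form the new design matrix $XC$, and append its 2nd to last columns to the data. The object names below (num_vars, X, XC) are ours.

num_vars <- c("duration", "credit_amount", "installment_rate", "current_address_length",
              "age", "num_credits", "num_dependents", "credit_status")
C  <- rbind(c(1, C_1), cbind(0, C_11))                    # C = [1 C_1; 0 C_11]
X  <- cbind(1, as.matrix(german_credit_raw[, num_vars]))  # original design matrix
XC <- X %*% C                                             # new design matrix
german_credit_raw[paste0(num_vars, "2")] <- as.data.frame(XC[, -1])  # duration2, ..., credit_status2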
Let us build a logistic regression model logit_model_2 for the eight transformed variables.
We see the same warning message as for the eight original variables. Therefore, after a nonsingular generalized linear transformation, the separation in the data remains.

4.2. Validation of MLE

Let us drop credit_status and rebuild a logistic regression model called logit_model_3. The main output is shown in Table 2.
The output also indicates that the data still have overlap after the transformation. Hence, we have validated Theorem 6.
We see that the variables current_address_length, num_credits and num_dependents are not significant at the 0.05 level. Since we are not focused on building a model, let us still keep these variables. Let us extract the coefficients and put them in a vector called model_coef_3 as follows:
> model_coef_3 <- data.frame(coef(logit_model_3))
> model_coef_3 <- as.matrix(model_coef_3)
Next, let us make a generalized linear transformation. We use the letter D rather than C to distinguish this case from Section 4.1. We randomly generate the $7 \times 7$ matrix $D_{11}$ and the 7-dimensional row vector $D_1$ in (3) by calling the R function runif. Again, we denote $D_{11}$ (shown in Table 3) and $D_1$ by D_11 and D_1 in R, respectively.
> set.seed(2)
> D_11 <- matrix(runif(49),nrow = 7)
> det(D_11)
[1] 0.2851758
> set.seed(20)
> D_1 = runif(n = 7, min = 1, max = 20)
[1] 17.672906 15.602131 6.300300 11.054110 19.295234 19.626737
2.735319
Since the determinant of $D_{11}$ is nonzero, $D = \begin{pmatrix} 1 & D_1 \\ 0 & D_{11} \end{pmatrix}$ is nonsingular by (9), as shown in Table 4.
We use the R function solve to find its inverse $D^{-1}$ and call it inv_D in R (see Table 5).
Now $x_1, \ldots, x_7$ can be transformed into $x_1^t, x_2^t, \ldots, x_7^t$ as in (6). Let us denote $x_1^t, x_2^t, \ldots, x_7^t$ by duration_4, credit_amount_4, …, num_dependents_4 in R. Let us build a logistic regression model for the seven transformed variables and call it logit_model_4. The main output is shown in Table 6.
Let us extract the coefficients of logit_model_4 into model_coef_4 to get more digits, as shown in Table 7:
> model_coef_4 <- data.frame(coef(logit_model_4))
Let us compute the product of $D^{-1}$ and the vector model_coef_3 in R as follows:
> inv_D%*%model_coef_3
The result of the product is shown in Table 8 below.
This is exactly the same as model_coef_4. Next, we calculate the predicted values for all the 1000 records using both models logit_model_3 and logit_model_4 by calling the R function predict, and then use the all.equal utility to check that these two sets of predictions are nearly equal:
> model_3_predictions = predict(logit_model_3, german_credit_raw, type = "response")
> model_4_predictions = predict(logit_model_4, german_credit_raw, type = "response")
> all.equal(model_3_predictions, model_4_predictions, tolerance = 1e-13)
[1] “Mean relative difference: 0.0000000000005060054”
We see that the two predictions are identical taking rounding errors into consideration. Thus, we have validated Theorem 2.
Note that a nonlinear transformation, even a one-to-one correspondence, will not have the properties in Theorem 2, even for a single variable. For instance, let us define a one-to-one correspondence for the variable age as follows: $\mathrm{age\_6} = \ln(\mathrm{age})$, which is log(age) in R. Let us build a univariate logistic regression model called logit_model_5 for age and a univariate logistic regression model called logit_model_6 for age_6. Next, we apply these two models to predict the values for german_credit_raw.
> model_5_predictions = predict(logit_model_5, german_credit_raw, type = "response")
> model_6_predictions = predict(logit_model_6, german_credit_raw, type = "response")
> all.equal(model_5_predictions, model_6_predictions, scale=1)
[1] “Mean absolute difference: 0.008512868”
We see that the predictions from logit_model_5 are in general different from the predictions from logit_model_6.

4.3. Validation of Invariance of VIF

For the logistic regression model logit_model_3 in Section 4.2, we use the vif function in the car package of R to find the VIF for all the 7 variables. The result is shown in Table 9.
> car::vif(logit_model_3)
Next, we randomly generate multiple simple transformations as follows
> set.seed(30)
> A = runif(n = 7)
> set.seed(40)
> B = runif(n = 7, min = 1, max = 10)
> german_credit_raw$duration_7 = A[1] * german_credit_raw$duration + B[1]
> german_credit_raw$credit_amount_7 = A[2] * german_credit_raw$credit_amount + B[2]
> german_credit_raw$num_dependents_7 = A[7] * german_credit_raw$num_dependents + B[7]
We build a logistic regression for the variables after multiple simple linear transformations and call it logit_model_7. We then find VIF as follows and display the result in Table 10
> car::vif(logit_model_7)
Hence, we have validated Theorem 4. There is no need to validate Theorem 5 (the invariance of multicollinearity) as its analytical proof is straightforward.

5. Conclusions

In this paper, we first generalized a linear transformation for a single variable to multiple variables by a matrix multiplication. We then studied various effects of a generalized linear transformation in logistic regression. We showed that an invertible generalized linear transformation has no effects on predictions, multicollinearity, pseudo-complete separation, and complete separation. We also showed that multiple linear transformations do not have effects on the variance inflation factor (VIF). Numeric examples with real data were presented to validate our theoretic results.

Author Contributions

Writing—original draft, G.Z.; Writing—review & editing, S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Cambridge University Press: Cambridge, UK, 2006.
  2. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example, 5th ed.; John Wiley & Sons: New York, NY, USA, 2013.
  3. Adeyemo, A.; Wimmer, H.; Powell, L.M. Effects of normalization techniques on logistic regression in data science. J. Inf. Syst. Appl. Res. 2019, 12, 37–44.
  4. Box, G.E.P.; Tidwell, P.W. Transformation of the Independent Variables. Technometrics 1962, 4, 531–550.
  5. Whittemore, A.S. Transformations to Linearity in Binary Regression. SIAM J. Appl. Math. 1983, 43, 703–710.
  6. Kay, R.; Little, S. Transformations of the explanatory variables in the logistic regression model for binary data. Biometrika 1987, 74, 495–501.
  7. Feng, C.; Wang, H.; Lu, N.; Chen, T.; He, H.; Lu, Y.; Tu, X.M. Log-transformation and its implications for data analysis. Shanghai Arch. Psychiatry 2014, 26, 105–109.
  8. Zhang, M.; Chen, S.; Rain, S.C. Evaluating Continuous Variable Transformations in Logistic Regression. In Proceedings of the Midwest SAS User Group Conference 2015, Omaha, NE, USA, 18–20 October 2015.
  9. Lee, D.K. Data transformation: A focus on the interpretation. Korean J. Anesthesiol. 2020, 73, 503–508.
  10. Morrell, C.H.; Pearson, J.D.; Brant, L.J. Linear Transformations of Linear Mixed-Effects Models. Am. Stat. 1997, 51, 338–343.
  11. Zeng, G. Invariant Properties of Logistic Regression Model in Credit Scoring under Monotonic Transformations. Commun. Stat. Theory Methods 2017, 46, 8791–8807.
  12. Zeng, G. On the analytical properties of category encodings in logistic regression. Commun. Stat. Theory Methods 2021, advance online publication.
  13. Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression, 3rd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2013.
  14. Zeng, G. On the Existence of an Analytical Solution in Multiple Logistic Regression. Int. J. Appl. Math. Stat. 2021, 60, 53–67.
  15. Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS; Lulu.com: Raleigh, NC, USA, 2011.
  16. Albert, A.; Anderson, J.A. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984, 71, 1–10.
  17. Zeng, G.; Zeng, E. On the Relationship between Multicollinearity and Separation in Logistic Regression. Commun. Stat. Simul. Comput. 2021, 50, 1989–1997.
  18. Shen, L.; Gao, Y.; Xiao, J. Simulation of Hydrogen Production from Biomass Gasification in Interconnected Fluidized Beds. Biomass Bioenergy 2008, 32, 120–127.
  19. Vatcheva, K.P.; Lee, M.; McCormick, J.B.; Rahbar, M.H. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies. Epidemiology 2016, 6, 227.
  20. Cincotta, K. Multicollinearity in Zero Intercept Regression: They Are Not Who We Thought They Were. In Proceedings of the Society of Cost Estimating and Analysis (SCEA) Conference, Albuquerque, NM, USA, 6–10 June 2011.
  21. Dohoo, I.R.; Ducrot, C.; Fourichon, C.; Donald, A.; Hurnik, D. An overview of techniques for dealing with large numbers of independent variables in epidemiologic studies. Prev. Vet. Med. 1997, 29, 221–239.
  22. Amini, M.; Roozbeh, M. Optimal partial ridge estimation in restricted semiparametric regression models. J. Multivar. Anal. 2015, 136, 26–40.
  23. Roozbeh, M. Optimal QR-based estimation in partially linear regression models with correlated errors using GCV criterion. Comput. Stat. Data Anal. 2018, 117, 45–61.
  24. Lichman, M. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2013. Available online: http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/ (accessed on 16 December 2022).
Table 1. Matrix C_11.
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]
[1,] 0.2655087 0.6291140 0.7176185 0.2672207 0.4935413 0.8209463 0.7323137 0.3162717
[2,] 0.3721239 0.0617863 0.9919061 0.3861141 0.1862176 0.6470602 0.6927316 0.5186343
[3,] 0.5728534 0.2059746 0.3800352 0.0133903 0.8273733 0.7829328 0.4776196 0.6620051
[4,] 0.9082078 0.1765568 0.7774452 0.3823880 0.6684667 0.5530363 0.8612095 0.4068302
[5,] 0.2016819 0.6870228 0.9347052 0.8696908 0.7942399 0.5297196 0.4380971 0.9128759
[6,] 0.8983897 0.3841037 0.2121425 0.3403490 0.1079436 0.7893562 0.2447973 0.2936034
[7,] 0.9446753 0.7698414 0.6516738 0.4820801 0.7237109 0.0233312 0.0706790 0.4590657
[8,] 0.6607978 0.4976992 0.1255551 0.5995658 0.4112744 0.4772301 0.0994662 0.3323947
Table 2. Coefficients and statistics for model 3.
Coefficients:
                           Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)             -1.56979765   0.42997660   -3.651  0.000261 ***
duration                 0.02621174   0.00770330    3.403  0.000667 ***
credit_amount            0.00007060   0.00003404    2.074  0.038053 *
installment_rate         0.20355992   0.07251671    2.807  0.004999 **
current_address_length   0.04090933   0.06690897    0.611  0.540923
age                     -0.02143075   0.00708337   -3.026  0.002482 **
num_credits             -0.15689020   0.13049965   -1.202  0.229276
num_dependents           0.12800328   0.20131338    0.636  0.52488
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.
Table 3. Matrix D_11.
          [,1]      [,2]       [,3]      [,4]       [,5]      [,6]      [,7]
[1,] 0.1848823 0.8334488 0.40528218 0.3875495 0.96264405 0.6271963 0.1150841
[2,] 0.7023740 0.4680185 0.85354845 0.8368892 0.13237200 0.8444290 0.1632009
[3,] 0.5733263 0.5499837 0.97639849 0.1505014 0.01041453 0.2848706 0.9440418
[4,] 0.1680519 0.5526741 0.22582546 0.3472722 0.16464224 0.6672256 0.7948638
[5,] 0.9438393 0.2388948 0.44480923 0.4887732 0.81019214 0.1504698 0.9746879
[6,] 0.9434750 0.7605133 0.07497942 0.1492469 0.86886104 0.9817279 0.3490884
[7,] 0.1291590 0.1808201 0.66189876 0.3570626 0.51428176 0.2970107 0.5019699
Table 4. Matrix D.
     [,1]       [,2]       [,3]       [,4]       [,5]        [,6]       [,7]      [,8]
[1,]    1 17.6729064 15.6021310 6.30029963 11.0541102 19.29523359 19.6267371 2.7353192
[2,]    0  0.1848823  0.8334488 0.40528218  0.3875495  0.96264405  0.6271963 0.1150841
[3,]    0  0.7023740  0.4680185 0.85354845  0.8368892  0.13237200  0.8444290 0.1632009
[4,]    0  0.5733263  0.5499837 0.97639849  0.1505014  0.01041453  0.2848706 0.9440418
[5,]    0  0.1680519  0.5526741 0.22582546  0.3472722  0.16464224  0.6672256 0.7948638
[6,]    0  0.9438393  0.2388948 0.44480923  0.4887732  0.81019214  0.1504698 0.9746879
[7,]    0  0.9434750  0.7605133 0.07497942  0.1492469  0.86886104  0.9817279 0.3490884
[8,]    0  0.1291590  0.1808201 0.66189876  0.3570626  0.51428176  0.2970107 0.5019699
Table 5. Inverse matrix of D.
     [,1]       [,2]        [,3]        [,4]       [,5]       [,6]         [,7]       [,8]
[1,]    1 -7.9987564 -8.13990103  4.21471891  1.9118435 -3.7229455 -10.65351722  2.7150084
[2,]    0 -0.3069549  0.33838941  0.27554240 -0.6317051  0.4987514   0.42693364 -0.8228946
[3,]    0  1.6396322 -0.06252947  0.79806766  0.2807233  0.2680550  -0.76439514 -2.2899087
[4,]    0 -0.1062208  0.10961077  0.64555254 -0.8488122 -0.5625429   0.10509829  1.1379418
[5,]    0  0.6460218  0.86107501 -0.79420393  0.7167836  0.9646552  -1.22876225 -1.0880130
[6,]    0  0.2595927 -0.43338042 -0.38521513 -0.4283198  0.1580233   0.29055578  0.9751883
[7,]    0 -1.1460418  0.12889195 -0.48351108  0.4179223 -1.0293070   1.14940313  1.6676878
[8,]    0 -0.4189740 -0.45383398  0.03608131  0.8623441  0.2778314  -0.07680999  0.3163327
Table 6. Coefficients and statistics of model 4.
Coefficients:
                           Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                 1.25487     1.65128    0.760     0.447
duration_4                 -0.16078     0.18457   -0.871     0.384
credit_amount_4             0.03798     0.46388    0.082     0.935
installment_rate_4          0.23513     0.24465    0.961     0.337
current_address_length_4   -0.08251     0.27582   -0.299     0.765
age_4                      -0.01331     0.19983   -0.067     0.947
num_credits_4              -0.05616     0.35614   -0.158     0.875
num_dependents_4            0.07820     0.08644    0.905     0.366
Table 7. Coefficients of model 4.
                           coef.logit_model_4.
(Intercept)                          1.2548745
duration_4                          -0.1607787
credit_amount_4                      0.0379776
installment_rate_4                   0.2351349
current_address_length_4            -0.0825126
age_4                               -0.0133075
num_credits_4                       -0.0561589
num_dependents_4                     0.0781968
Table 8. Product of the inverse matrix of D and the coefficients of model 3.
     coef.logit_model_3.
[1,]          1.25487449
[2,]         -0.16077871
[3,]          0.03797764
[4,]          0.23513490
[5,]         -0.08251257
[6,]         -0.01330746
[7,]         -0.05615895
[8,]          0.07819677
Table 9. VIF for model 3.
                              VIF
duration                 1.781992
credit_amount            1.991715
installment_rate         1.223520
current_address_length   1.069032
age                      1.111486
num_credits              1.031390
num_dependents           1.033490
Table 10. VIF for model 7.
                                VIF
duration_7                 1.781992
credit_amount_7            1.991715
installment_rate_7         1.223520
current_address_length_7   1.069032
age_7                      1.111486
num_credits_7              1.031390
num_dependents_7           1.033490