Article

James Stein Estimator for the Beta Regression Model with Application to Heat-Treating Test and Body Fat Datasets

1 Department of Statistics, University of Sargodha, Sargodha 40162, Pakistan
2 Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia
3 Department of Mathematics, Faculty of Science, Tanta University, Tanta 31111, Egypt
4 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
* Author to whom correspondence should be addressed.
Axioms 2023, 12(6), 526; https://doi.org/10.3390/axioms12060526
Submission received: 2 April 2023 / Revised: 22 May 2023 / Accepted: 24 May 2023 / Published: 27 May 2023

Abstract

The beta regression model (BRM) is used when the dependent variable may take continuous values and be bounded in the interval (0, 1), such as rates, proportions, percentages and fractions. Generally, the parameters of the BRM are estimated by the method of maximum likelihood estimation (MLE). However, the MLE does not offer accurate and reliable estimates when the explanatory variables in the BRM are correlated. To solve this problem, the ridge and Liu estimators for the BRM were proposed by different authors. In the current study, the James Stein Estimator (JSE) for the BRM is proposed. The matrix mean squared error (MSE) and the scalar MSE properties are derived and then compared to the available ridge estimator, Liu estimator and MLE. The performance of the proposed estimator is evaluated by conducting a simulation experiment and analyzing two real-life applications. The MSE of the estimators is considered as a performance evaluation criterion. The findings of the simulation experiment and applications indicate the superiority of the suggested estimator over the competitive estimators for estimating the parameters of the BRM.

1. Introduction

Ferrari and Cribari-Neto [1] introduced the beta regression model (BRM), which allows the response variable to take continuous values in the interval (0, 1), such as rates, proportions and percentages. The BRM is used in the fields of economics and medicine [2]. Estimating the parameters of linear and generalized linear models rests on several assumptions. One of them is that the explanatory variables should be uncorrelated; in practice, this assumption is rarely fulfilled, and its violation is called multicollinearity. Frisch [3] was the first to indicate the seriousness of this problem and its consequences for regression estimation. Multicollinearity can inflate the variances and standard errors of the coefficients, widen the confidence intervals and may give the wrong signs to the model coefficients [4]. To check for the presence of multicollinearity, researchers have proposed different measures, including the condition index and the variance inflation factor [5].
When multicollinearity is present in linear models, the ordinary least squares (OLS) method performs poorly. Many authors have proposed alternative methods to address the problem, such as the Stein estimator [6], the principal components estimator [7] and the ordinary ridge regression estimator (ORRE) [8]. The most popular of these biased estimation methods is ridge regression, in which the biasing parameter k controls the bias of the regression coefficients. Many authors have discussed methods for choosing k so that the MSE of the ridge estimator is as small as possible [8,9,10,11,12,13,14,15,16,17,18]. In addition to linear regression models, the ridge estimator has also been developed for other models; see, for example, [19,20,21,22].
Liu [23] proposed another estimator, known as the Liu estimator (LE), which combines the advantages of the Stein estimator and the ORRE. Qasim et al. [24] suggested new Liu shrinkage parameters for the linear regression model. Several studies have also addressed the effect of multicollinearity in generalized linear models, including the logistic ridge estimator [14,25], the Poisson ridge regression estimator [20], the Poisson Liu regression estimator [26], the negative binomial ridge and Liu estimators [21] and the gamma ridge regression estimator [22,27]. Further, Qasim et al. [24] proposed the LE for the gamma regression model. Stein [6] suggested an estimator, the JSE, for dealing with multicollinearity in the linear regression model, but few researchers have worked on this method. The literature has shown that the JSE is not a better estimator than other biased estimators.
In the literature, the JSE for the logistic regression model was given by [28], which showed that for some situations, the JSE performed better than other biased estimators. Amin et al. [27] have worked on the JSE for the Poisson regression model and showed that the JSE has a better performance than the other considered biased estimators. Recently, Akram et al. [29] have studied the JSE for the inverse Gaussian regression model.
In the BRM, when the explanatory variables are correlated, the maximum likelihood estimator (MLE) does not give reliable estimates of the unknown parameters. Three studies are available in the literature that deal with multicollinearity in the BRM. Qasim et al. [30] proposed some beta ridge regression estimators. The LE was developed by [31] for the BRM. Abonazel et al. [2] suggested some ridge regression estimators for the BRM. However, to date, no one has considered the JSE for the BRM in dealing with multicollinearity. Therefore, in this study, we adapt the JSE to the BRM. This study also compares the JSE with the other existing estimators for the BRM based on the mean squared error (MSE) criterion, as the literature has shown that the JSE performs differently for different models.

2. Methodology

Suppose Y is a continuous random variable that follows a beta distribution with parameters $a$ and $b$, with the probability density function given by
$$f(y; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, y^{a-1} (1-y)^{b-1}, \quad 0 < y < 1, \tag{1}$$
where $\Gamma(\cdot)$ represents the gamma function and $a, b > 0$. The expected value of Y is $E(Y) = \frac{a}{a+b}$ and the variance of Y is $Var(Y) = \frac{ab}{(a+b)^2 (a+b+1)}$.
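Ferrari and Cribari-Neto [1] offered another parameterization of Equation (1) by supposing that $\mu = \frac{a}{a+b}$ and $\delta = a + b$, so that $a = \mu\delta$ and $b = \delta - \delta\mu$. The new formulation of Equation (1) is given as follows:
$$f(y; \mu, \delta) = \frac{\Gamma(\delta)}{\Gamma(\mu\delta)\,\Gamma(\delta - \delta\mu)}\, y^{\mu\delta - 1} (1-y)^{\delta - \delta\mu - 1}, \quad 0 < y < 1, \tag{2}$$
where $\mu$ denotes the mean of the dependent variable and $\delta$ is known as the precision parameter.
Under this re-parameterization, $E(Y) = \mu$ and $Var(Y) = \mu(1-\mu)/(1+\delta)$, which follows from substituting $a = \mu\delta$ and $b = \delta - \delta\mu$ into the variance of the beta distribution. In terms of the dispersion parameter $\phi = \delta^{-1}$, this is $Var(Y) = \phi\,\mu(1-\mu)/(1+\phi)$.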
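The link between the two parameterizations can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values of $\mu$ and $\delta$ are illustrative assumptions, not part of the original study.

```python
import math

def beta_shape_from_mean_precision(mu, delta):
    """Map (mu, delta) to the shape parameters a = mu*delta, b = delta - delta*mu."""
    if not (0.0 < mu < 1.0) or delta <= 0.0:
        raise ValueError("need 0 < mu < 1 and delta > 0")
    return mu * delta, delta - delta * mu

def beta_pdf(y, a, b):
    """Beta density of Equation (1), evaluated through log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def beta_moments(a, b):
    """Mean and variance of the beta distribution in the (a, b) form."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

# Illustrative values (assumed, not taken from the paper).
mu, delta = 0.3, 5.0
a, b = beta_shape_from_mean_precision(mu, delta)
mean, var = beta_moments(a, b)

# The variance from the (a, b) form should agree with mu(1 - mu)/(1 + delta).
print("a, b:", a, b)
print("mean:", mean)
print("variance (a, b form):", var)
print("variance (mu, delta form):", mu * (1.0 - mu) / (1.0 + delta))
```

Both variance expressions are printed so that they can be compared directly for any admissible choice of $\mu$ and $\delta$.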
To estimate the model parameters via MLE, the log-likelihood function based on Equation (2) is given as follows:
$$l(\beta) = \sum_{i=1}^{n} l_i(\mu_i, \delta) = \sum_{i=1}^{n} \left[ \log\Gamma(\delta) - \log\Gamma(\mu_i\delta) - \log\Gamma((1-\mu_i)\delta) + (\delta\mu_i - 1)\log y_i + (\delta - \mu_i\delta - 1)\log(1-y_i) \right]. \tag{3}$$
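As an illustration of Equation (3), the sketch below evaluates the log-likelihood for a given coefficient vector under the logit link. It is a plain-Python sketch under stated assumptions: `beta_loglik` and `logit_inverse` are hypothetical helper names, the data are made up, and the precision $\delta$ is treated as known.

```python
import math

def logit_inverse(eta):
    """Inverse of the logit link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_loglik(beta, X, y, delta):
    """Log-likelihood of Equation (3).

    beta  : list of q coefficients (intercept first)
    X     : list of rows, each of length q (first entry 1 for the intercept)
    y     : list of responses strictly inside (0, 1)
    delta : precision parameter, assumed known here
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta_i = sum(b_j * x_ij for b_j, x_ij in zip(beta, x_i))
        mu_i = logit_inverse(eta_i)
        total += (math.lgamma(delta)
                  - math.lgamma(mu_i * delta)
                  - math.lgamma((1.0 - mu_i) * delta)
                  + (delta * mu_i - 1.0) * math.log(y_i)
                  + (delta - mu_i * delta - 1.0) * math.log(1.0 - y_i))
    return total

# Made-up data: three observations, intercept plus one regressor.
X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0]]
y = [0.4, 0.3, 0.7]
print(beta_loglik([0.1, 0.5], X, y, delta=4.0))
```

A quick sanity check is that with $\beta = 0$ and $\delta = 2$ every $\mu_i = 0.5$, the density is uniform on (0, 1), and the log-likelihood is zero.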
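Let the unbiased estimator of $\beta$ be $\hat{\beta}$. The score function can be computed as:
$$S(\beta) = \delta X^t T (y^* - \mu^*), \tag{4}$$
where $y^* = \log\frac{y}{1-y}$, $\mu^* = \psi(\mu\delta) - \psi((1-\mu)\delta)$, $X$ is the design matrix of order $n \times q$, $T = \operatorname{diag}\left(\frac{1}{g'(\mu_1)}, \ldots, \frac{1}{g'(\mu_n)}\right)$, $\psi$ is the digamma function and $g(\cdot)$ is the logit link function.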
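Let $\eta_i = g(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = x_i^t\beta$, where $x_i$ is the i-th row of the data matrix, $\beta$ represents the $q \times 1$ vector of regression coefficients including the intercept and $q = p + 1$, with $p$ the number of explanatory variables. As Equation (4) is non-linear in $\beta$, it must be solved by an iteratively reweighted method. According to Abonazel and Taha [2], using the iterative method, $\beta$ can be computed as
$$\beta^{(r+1)} = \beta^{(r)} + \left(I^{(r)}_{\beta\beta}\right)^{-1} S\left(\beta^{(r)}\right), \tag{5}$$
where $r = 0, 1, 2, \ldots$ indexes the iterations, which are repeated until convergence, and $I^{(r)}_{\beta\beta}$ is the information matrix of $\beta$. At the final iteration [2], Equation (5) can be written as
$$\hat{\beta}_{ML} = S^{-1} X^t Z v, \tag{6}$$
where $S = X^t Z X$, $Z = \operatorname{diag}(z_1, z_2, \ldots, z_n)$, $v = \eta + Z^{-1} T (y^* - \mu^*)$ and $z_i = \frac{\delta\left[\psi'(\mu_i\delta) + \psi'((1-\mu_i)\delta)\right]}{\left[g'(\mu_i)\right]^2}$. From Equation (6) onward, $S$ denotes the matrix $X^t Z X$ rather than the score function of Equation (4).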
The matrix MSE (MMSE) and the scalar MSE can be calculated by defining $\alpha = \xi^t \hat{\beta}_{ML}$ and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_q) = \xi^t S \xi$, where $\xi = (\xi_1, \ldots, \xi_q)$ is the orthogonal matrix whose columns are the eigenvectors of $S$, $\lambda_1 > \lambda_2 > \cdots > \lambda_q > 0$ are the eigenvalues of $S$, and $\alpha_j$, $j = 1, 2, \ldots, q$, is the j-th element of $\xi^t \hat{\beta}_{ML}$. Then, the covariance and MMSE of $\hat{\beta}_{ML}$ are defined as
$$\operatorname{Cov}(\hat{\beta}_{ML}) = \hat{\phi}\, S^{-1}, \qquad \operatorname{MMSE}(\hat{\beta}_{ML}) = \hat{\phi}\, \xi \Lambda^{-1} \xi^t. \tag{7}$$
The scalar MSE of the MLE can be obtained by the following equation
$$\operatorname{MSE}(\hat{\beta}_{ML}) = E\left[(\hat{\beta}_{ML} - \beta)^t (\hat{\beta}_{ML} - \beta)\right] = \hat{\phi}\, \operatorname{tr}\left(\xi \Lambda^{-1} \xi^t\right) = \hat{\phi} \sum_{j=1}^{q} \frac{1}{\lambda_j}, \tag{8}$$
where $\lambda_j$ represents the j-th eigenvalue of the matrix $S$.

2.1. The Beta Ridge Regression Estimator

When the explanatory variables are correlated, the matrix $S$ becomes ill-conditioned: some of its eigenvalues become very small, and the variance of the MLE of the BRM becomes inflated. Multicollinearity makes the results unreliable, as it increases the variances and widens the confidence intervals of the BRM estimates, which leads to wrong inferences. To solve this issue, Abonazel and Taha [2] introduced the use of a ridge estimator for the BRM and, later on, the beta ridge regression estimator (BRRE) was developed by Qasim et al. [30]. The BRRE is an extension of the Hoerl and Kennard [9] estimator and is defined as follows:
$$\hat{\beta}_{BRRE} = Q_k \hat{\beta}_{ML}, \tag{9}$$
where $Q_k = (S + k I_q)^{-1} S$, $k > 0$ is the biasing parameter and $I_q$ is the identity matrix of order $q \times q$; $\hat{\beta}_{BRRE} \to \hat{\beta}_{ML}$ as $k \to 0$. The bias, covariance and MMSE of Equation (9) are obtained using the following formulas:
$$\operatorname{Bias}(\hat{\beta}_{BRRE}) = -k\, \xi \Lambda_k^{-1} \alpha, \tag{10}$$
$$\operatorname{Cov}(\hat{\beta}_{BRRE}) = \hat{\phi}\, \xi \Lambda_k^{-1} \Lambda \Lambda_k^{-1} \xi^t, \tag{11}$$
$$\operatorname{MMSE}(\hat{\beta}_{BRRE}) = \hat{\phi}\, Q_k S^{-1} Q_k^t + \operatorname{Bias}(\hat{\beta}_{BRRE}) \operatorname{Bias}(\hat{\beta}_{BRRE})^t = \hat{\phi}\, \xi \Lambda_k^{-1} \Lambda \Lambda_k^{-1} \xi^t + k^2\, \xi \Lambda_k^{-1} \alpha \alpha^t \Lambda_k^{-1} \xi^t, \tag{12}$$
where $\Lambda_k = \operatorname{diag}(\lambda_1 + k, \lambda_2 + k, \ldots, \lambda_q + k)$, $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_q) = \xi^t S \xi$ and $\xi$ is the orthogonal matrix whose columns are the eigenvectors of $S$. Finally, the scalar MSE of the BRRE is obtained by applying the trace to Equation (12):
$$\operatorname{MSE}(\hat{\beta}_{BRRE}) = \operatorname{tr}\left[\operatorname{MMSE}(\hat{\beta}_{BRRE})\right] = \hat{\phi} \sum_{j=1}^{q} \frac{\lambda_j}{(\lambda_j + k)^2} + k^2 \sum_{j=1}^{q} \frac{\alpha_j^2}{(\lambda_j + k)^2}, \tag{13}$$
where $\alpha = \xi^t \hat{\beta}_{ML}$ (used in place of the unknown $\xi^t \beta$) and k is a biasing parameter that is always greater than 0.
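To see how the biasing parameter trades variance against bias, the scalar MSE of the ridge estimator can be evaluated directly from the eigenvalues. The sketch below assumes the form $\hat{\phi}\sum_j \lambda_j/(\lambda_j+k)^2 + k^2\sum_j \alpha_j^2/(\lambda_j+k)^2$ for Equation (13) and $\hat{\phi}\sum_j 1/\lambda_j$ for Equation (8); the eigenvalues, $\alpha_j$ and $\hat{\phi}$ are made-up illustrative numbers with one near-zero eigenvalue to mimic multicollinearity.

```python
def mse_mle(lam, phi_hat):
    """Scalar MSE of the MLE: phi_hat * sum_j 1 / lambda_j."""
    return phi_hat * sum(1.0 / l for l in lam)

def mse_ridge(lam, alpha, phi_hat, k):
    """Scalar MSE of the ridge estimator: variance term plus squared-bias term."""
    variance = phi_hat * sum(l / (l + k) ** 2 for l in lam)
    squared_bias = k ** 2 * sum(a ** 2 / (l + k) ** 2 for l, a in zip(lam, alpha))
    return variance + squared_bias

# Illustrative inputs (assumed): the smallest eigenvalue is close to zero.
lam = [10.0, 1.0, 0.01]
alpha = [0.5, 0.5, 0.5]
phi_hat = 0.5

print("MLE:", mse_mle(lam, phi_hat))
for k in (0.0, 0.1, 1.0):
    print("ridge, k =", k, ":", mse_ridge(lam, alpha, phi_hat, k))
```

At $k = 0$ the ridge expression reduces to the MSE of the MLE, which is a convenient check on the implementation.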

2.2. The Beta Liu Estimator

Different kinds of shrinkage parameters have been proposed by different authors for estimating model parameters when the explanatory variables are correlated. The ridge regression estimator (RRE) was proposed by Hoerl and Kennard [9] for the linear regression model. Liu [23] introduced an alternative to the RRE that solves the problem of multicollinearity with a different biasing parameter. Karlsson et al. [31] introduced the LE for the BRM, called the beta Liu estimator (BLE), which is defined as follows:
$$\hat{\beta}_{BLE} = L_d \hat{\beta}_{ML}, \tag{14}$$
where $L_d = (S + I)^{-1}(S + dI)$ and d is the biasing parameter of the BLE, which is restricted to the interval [0, 1]. The $\operatorname{Bias}(\hat{\beta}_{BLE})$ and $\operatorname{Cov}(\hat{\beta}_{BLE})$ of Equation (14), respectively, are given as
$$\operatorname{Bias}(\hat{\beta}_{BLE}) = (d - 1)\, \xi \Lambda_I^{-1} \alpha \tag{15}$$
and
$$\operatorname{Cov}(\hat{\beta}_{BLE}) = \hat{\phi}\, \xi \Lambda_I^{-1} \Lambda_d \Lambda^{-1} \Lambda_d \Lambda_I^{-1} \xi^t. \tag{16}$$
In this context, the MMSE and scalar MSE of the BLE can be obtained as shown below:
$$\operatorname{MMSE}(\hat{\beta}_{BLE}) = \hat{\phi}\, L_d S^{-1} L_d^t + \operatorname{Bias}(\hat{\beta}_{BLE}) \operatorname{Bias}(\hat{\beta}_{BLE})^t = \hat{\phi}\, \xi \Lambda_I^{-1} \Lambda_d \Lambda^{-1} \Lambda_d \Lambda_I^{-1} \xi^t + (d - 1)^2\, \xi \Lambda_I^{-1} \alpha \alpha^t \Lambda_I^{-1} \xi^t, \tag{17}$$
$$\operatorname{MSE}(\hat{\beta}_{BLE}) = \operatorname{tr}\left[\operatorname{MMSE}(\hat{\beta}_{BLE})\right] = \hat{\phi} \sum_{j=1}^{q} \frac{(\lambda_j + d)^2}{\lambda_j (\lambda_j + 1)^2} + (d - 1)^2 \sum_{j=1}^{q} \frac{\alpha_j^2}{(\lambda_j + 1)^2}, \tag{18}$$
where $\Lambda_I = \operatorname{diag}(\lambda_1 + 1, \lambda_2 + 1, \ldots, \lambda_q + 1)$ and $\Lambda_d = \operatorname{diag}(\lambda_1 + d, \lambda_2 + d, \ldots, \lambda_q + d)$.
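The scalar MSE of the BLE can be explored in the same way. The sketch below assumes the form $\hat{\phi}\sum_j (\lambda_j+d)^2/[\lambda_j(\lambda_j+1)^2] + (d-1)^2\sum_j \alpha_j^2/(\lambda_j+1)^2$ for Equation (18) and scans a grid of $d$ values in [0, 1]; all numerical inputs are made up for illustration.

```python
def mse_liu(lam, alpha, phi_hat, d):
    """Scalar MSE of the Liu estimator: variance term plus squared-bias term."""
    variance = phi_hat * sum((l + d) ** 2 / (l * (l + 1.0) ** 2) for l in lam)
    squared_bias = (d - 1.0) ** 2 * sum(a ** 2 / (l + 1.0) ** 2 for l, a in zip(lam, alpha))
    return variance + squared_bias

# Illustrative inputs (assumed): one eigenvalue close to zero.
lam = [10.0, 1.0, 0.01]
alpha = [0.5, 0.5, 0.5]
phi_hat = 0.5

# Scan d over a grid on [0, 1] and keep the value with the smallest scalar MSE.
grid = [i / 100.0 for i in range(101)]
best_d = min(grid, key=lambda d: mse_liu(lam, alpha, phi_hat, d))

print("MSE at d = 1 (equals the MLE):", mse_liu(lam, alpha, phi_hat, 1.0))
print("best d on the grid:", best_d)
print("MSE at best d:", mse_liu(lam, alpha, phi_hat, best_d))
```

At $d = 1$ the expression collapses to $\hat{\phi}\sum_j 1/\lambda_j$, the scalar MSE of the MLE, so any grid value that improves on $d = 1$ improves on the MLE for these inputs.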

2.3. The JSE for the BRM

To solve the problem of ill-conditioned explanatory variables, Stein [6] proposed another estimation method, called the JSE. The current study considers this estimator for the BRM, names it the beta JSE (BJSE) and expects it to provide better estimates than the BRRE and the BLE. The suggested estimator is defined as shown below:
$$\hat{\beta}_{BJSE} = c\, \hat{\beta}_{ML}, \tag{19}$$
where $0 < c < 1$ and $\hat{\beta}_{ML}$ is the unbiased estimate of $\beta$. If c = 1, then $\hat{\beta}_{BJSE} = \hat{\beta}_{ML}$. For selecting the value of c, we took into account the work of Stein [6], which is as follows
$$c = \frac{\hat{\beta}_{ML}^t \hat{\beta}_{ML}}{\hat{\beta}_{ML}^t \hat{\beta}_{ML} + \operatorname{tr}(S^{-1})}. \tag{20}$$
The bias of $\hat{\beta}_{BJSE}$ may be computed as
$$\operatorname{Bias}(\hat{\beta}_{BJSE}) = E(\hat{\beta}_{BJSE}) - \beta. \tag{21}$$
Using Equation (20) in Equation (21), we obtained
$$\operatorname{Bias}(\hat{\beta}_{BJSE}) = E\left[\frac{\hat{\beta}_{ML}^t \hat{\beta}_{ML}}{\hat{\beta}_{ML}^t \hat{\beta}_{ML} + \operatorname{tr}(S^{-1})}\, \hat{\beta}_{ML}\right] - \beta. \tag{22}$$
We simplified Equation (22) and developed the following expression
$$\operatorname{Bias}(\hat{\beta}_{BJSE}) = -\frac{\operatorname{tr}(S^{-1})}{\beta^t \beta + \operatorname{tr}(S^{-1})}\, \beta. \tag{23}$$
The MMSE of $\hat{\beta}_{BJSE}$ by using Equation (19) was obtained as
$$\operatorname{MMSE}(\hat{\beta}_{BJSE}) = \operatorname{Cov}(\hat{\beta}_{BJSE}) + \operatorname{Bias}(\hat{\beta}_{BJSE}) \operatorname{Bias}(\hat{\beta}_{BJSE})^t = \hat{\phi}\, c\, S^{-1} c^t + b_{BJSE}\, b_{BJSE}^t, \tag{24}$$
where $b_{BJSE} = \operatorname{Bias}(\hat{\beta}_{BJSE})$.
For obtaining the scalar MSE of the BJSE, after applying the trace operator, we found the following:
$$\operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi} \left(\frac{\hat{\beta}_{ML}^t \hat{\beta}_{ML}}{\hat{\beta}_{ML}^t \hat{\beta}_{ML} + \operatorname{tr}(S^{-1})}\right)^{t} \operatorname{tr}(S^{-1}) \left(\frac{\hat{\beta}_{ML}^t \hat{\beta}_{ML}}{\hat{\beta}_{ML}^t \hat{\beta}_{ML} + \operatorname{tr}(S^{-1})}\right) + \left(\frac{\operatorname{tr}(S^{-1})}{\beta^t \beta + \operatorname{tr}(S^{-1})}\, \beta\right)^{t} \left(\frac{\operatorname{tr}(S^{-1})}{\beta^t \beta + \operatorname{tr}(S^{-1})}\, \beta\right). \tag{25}$$
On simplification, it was easy to obtain the following equation:
$$\operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi} \sum_{j=1}^{q} \frac{\alpha_j^4 \lambda_j}{(\alpha_j^2 \lambda_j + 1)^2} + \sum_{j=1}^{q} \frac{\alpha_j^2}{(\alpha_j^2 \lambda_j + 1)^2}. \tag{26}$$
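The shrinkage factor and the scalar MSE of the BJSE can be sketched in a few lines. The code assumes that $\operatorname{tr}(S^{-1}) = \sum_j 1/\lambda_j$ in Equation (20) and that Equation (26) has the form $\hat{\phi}\sum_j \alpha_j^4\lambda_j/(\alpha_j^2\lambda_j+1)^2 + \sum_j \alpha_j^2/(\alpha_j^2\lambda_j+1)^2$, that is, a per-component shrinkage $c_j = \alpha_j^2\lambda_j/(\alpha_j^2\lambda_j+1)$. The function names and numerical inputs are hypothetical.

```python
def stein_shrinkage(beta_ml, lam):
    """Shrinkage factor c = b'b / (b'b + tr(S^-1)), with tr(S^-1) = sum_j 1/lambda_j."""
    bb = sum(b * b for b in beta_ml)
    trace_s_inv = sum(1.0 / l for l in lam)
    return bb / (bb + trace_s_inv)

def mse_bjse(lam, alpha, phi_hat):
    """Scalar MSE of the BJSE under the per-component form assumed above."""
    variance = phi_hat * sum(a ** 4 * l / (a ** 2 * l + 1.0) ** 2
                             for l, a in zip(lam, alpha))
    squared_bias = sum(a ** 2 / (a ** 2 * l + 1.0) ** 2 for l, a in zip(lam, alpha))
    return variance + squared_bias

def mse_mle(lam, phi_hat):
    """Scalar MSE of the MLE, used as the reference value."""
    return phi_hat * sum(1.0 / l for l in lam)

# Illustrative inputs (assumed): one eigenvalue close to zero.
lam = [10.0, 1.0, 0.01]
alpha = [0.5, 0.5, 0.5]
beta_ml = [0.5, 0.5, 0.5]
phi_hat = 0.5

print("c:", stein_shrinkage(beta_ml, lam))
print("MSE of the MLE:", mse_mle(lam, phi_hat))
print("MSE of the BJSE:", mse_bjse(lam, alpha, phi_hat))
```

With a near-zero eigenvalue, $\operatorname{tr}(S^{-1})$ is large, so $c$ is pulled well below 1 and the estimator shrinks the MLE strongly toward zero.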

2.4. Theoretical Comparison among the BRM’s Estimators

Lemma 1.
Let $G$ be a positive definite (p.d.) matrix, $\alpha$ a vector of non-zero constants and $h$ a positive constant. Then $hG - \alpha\alpha^t > 0$ if, and only if, $\alpha^t G^{-1} \alpha < h$ [32].

2.4.1. The MLE versus the BJSE

Theorem 1.
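Under the BRM, consider $b_{BJSE} = \operatorname{Bias}(\hat{\beta}_{BJSE})$. Then,
$$\operatorname{MSE}(\hat{\beta}_{ML}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) > 0. \tag{27}$$
Proof. 
The difference between the scalar MSE functions of the MLE and the BJSE is given as
$$\operatorname{MSE}(\hat{\beta}_{ML}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{1}{\lambda_j} - \frac{\alpha_j^4 \lambda_j}{(\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t - b_{BJSE}^t b_{BJSE} = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{(\alpha_j^2 \lambda_j + 1)^2 - \alpha_j^4 \lambda_j^2}{\lambda_j (\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t - b_{BJSE}^t b_{BJSE}. \tag{28}$$
Further, Equation (28) can also be written as
$$\operatorname{MSE}(\hat{\beta}_{ML}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{1 + 2\alpha_j^2 \lambda_j}{\lambda_j (\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t - b_{BJSE}^t b_{BJSE}. \tag{29}$$
Equation (29) is p.d. if $(\alpha_j^2 \lambda_j + 1)^2 - \alpha_j^4 \lambda_j^2 > 0$ or, equivalently, $1 + 2\alpha_j^2 \lambda_j > 0$, which holds because $\lambda_j > 0$.
Thus, it is proved that the BJSE dominates the MLE in the sense of scalar MSE for the BRM for all $j = 1, 2, \ldots, q$. □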
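The algebraic step in the proof, $\frac{1}{\lambda_j} - \frac{\alpha_j^4\lambda_j}{(\alpha_j^2\lambda_j+1)^2} = \frac{1 + 2\alpha_j^2\lambda_j}{\lambda_j(\alpha_j^2\lambda_j+1)^2}$, can be checked with exact rational arithmetic. The sketch below only verifies this identity and the positivity of the diagonal entries for a few made-up values of $\alpha_j^2$ and $\lambda_j$; it does not address the bias term $b_{BJSE}^t b_{BJSE}$.

```python
from fractions import Fraction

def diag_entry_difference(alpha_sq, lam):
    """Left-hand side: 1/lambda - alpha^4 * lambda / (alpha^2 * lambda + 1)^2."""
    return Fraction(1) / lam - alpha_sq ** 2 * lam / (alpha_sq * lam + 1) ** 2

def diag_entry_simplified(alpha_sq, lam):
    """Right-hand side: (1 + 2 alpha^2 lambda) / (lambda (alpha^2 lambda + 1)^2)."""
    return (1 + 2 * alpha_sq * lam) / (lam * (alpha_sq * lam + 1) ** 2)

# Exact check on a small grid of illustrative values (assumed).
checked = 0
for alpha_sq in (Fraction(1, 4), Fraction(2), Fraction(9)):
    for lam in (Fraction(1, 100), Fraction(1), Fraction(50)):
        left = diag_entry_difference(alpha_sq, lam)
        right = diag_entry_simplified(alpha_sq, lam)
        assert left == right      # the two forms agree exactly
        assert left > 0           # each diagonal entry is positive
        checked += 1

print("identity verified on", checked, "grid points")
```

Using `Fraction` avoids floating-point rounding, so the equality test is exact rather than approximate.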

2.4.2. The BRRE versus the BJSE

Theorem 2.
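Under the BRM, consider $k > 0$, $b_{BRRE} = \operatorname{Bias}(\hat{\beta}_{BRRE})$ and $b_{BJSE} = \operatorname{Bias}(\hat{\beta}_{BJSE})$. Then $\operatorname{MSE}(\hat{\beta}_{BRRE}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) > 0$.
Proof. 
The difference between the scalar MSE functions of the BRRE and the BJSE is given by
$$\operatorname{MSE}(\hat{\beta}_{BRRE}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{\lambda_j}{(\lambda_j + k)^2} - \frac{\alpha_j^4 \lambda_j}{(\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t + b_{BRRE}^t b_{BRRE} - b_{BJSE}^t b_{BJSE} = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{\lambda_j (\alpha_j^2 \lambda_j + 1)^2 - \alpha_j^4 \lambda_j (\lambda_j + k)^2}{(\lambda_j + k)^2 (\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t + b_{BRRE}^t b_{BRRE} - b_{BJSE}^t b_{BJSE}. \tag{30}$$
After simplifying the above expression, we obtained
$$\operatorname{MSE}(\hat{\beta}_{BRRE}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{\lambda_j + 2\alpha_j^2 \lambda_j^2 - \alpha_j^4 \lambda_j k^2 - 2\alpha_j^4 \lambda_j^2 k}{(\lambda_j + k)^2 (\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t + b_{BRRE}^t b_{BRRE} - b_{BJSE}^t b_{BJSE}. \tag{31}$$
Equation (31) is p.d. if $\lambda_j + 2\alpha_j^2 \lambda_j^2 - \alpha_j^4 \lambda_j k^2 - 2\alpha_j^4 \lambda_j^2 k > 0$. By simplifying Equation (31), we observed that this condition is $\lambda_j (1 + 2\alpha_j^2 \lambda_j) - \alpha_j^4 \lambda_j k (k + 2\lambda_j) > 0$. Thus, for $k > 0$ satisfying this inequality, the proof is complete. □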

2.4.3. The BLE versus the BJSE

Theorem 3.
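Under the BRM, consider $0 < d < 1$, $b_{BLE} = \operatorname{Bias}(\hat{\beta}_{BLE})$ and $b_{BJSE} = \operatorname{Bias}(\hat{\beta}_{BJSE})$. Then $\operatorname{MSE}(\hat{\beta}_{BLE}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) > 0$.
Proof. 
The difference between the scalar MSE functions of the BLE and the BJSE can be obtained as follows:
$$\operatorname{MSE}(\hat{\beta}_{BLE}) - \operatorname{MSE}(\hat{\beta}_{BJSE}) = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{(\lambda_j + d)^2}{\lambda_j (\lambda_j + 1)^2} - \frac{\alpha_j^4 \lambda_j}{(\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t + b_{BLE}^t b_{BLE} - b_{BJSE}^t b_{BJSE} = \hat{\phi}\, \xi \operatorname{diag}\left\{\frac{(\lambda_j + d)^2 (\alpha_j^2 \lambda_j + 1)^2 - \alpha_j^4 \lambda_j^2 (\lambda_j + 1)^2}{\lambda_j (\lambda_j + 1)^2 (\alpha_j^2 \lambda_j + 1)^2}\right\}_{j=1}^{q} \xi^t + b_{BLE}^t b_{BLE} - b_{BJSE}^t b_{BJSE}. \tag{32}$$
Equation (32) is p.d. if $(\lambda_j + d)^2 (\alpha_j^2 \lambda_j + 1)^2 - \alpha_j^4 \lambda_j^2 (\lambda_j + 1)^2 > 0$. Thus, for $0 < d < 1$ satisfying this inequality, the proof is complete. □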

2.4.4. Computation of the Biasing Parameters

An appropriate value of the biasing parameter is necessary for obtaining better estimates of the BRM. The most desirable values of k, d and c are derived for the considered estimators. The optimum value of k was selected by considering the work of Hoerl and Kennard [9] as
$$k = \frac{\hat{\phi}}{\sum_{j=1}^{q} \alpha_j^2}. \tag{33}$$
We consider the following optimum value of d for the BLE
$$d = \frac{\alpha_{max}^2 - \hat{\phi}}{\dfrac{\hat{\phi}}{\lambda_{max}} + \alpha_{max}^2}. \tag{34}$$
The Stein parameter c was selected by considering the work of Stein [6] as
$$c = \min_j \left(\frac{\alpha_j^2 \lambda_j}{\hat{\phi} + \alpha_j^2 \lambda_j}\right). \tag{35}$$
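The three biasing parameters can be computed from the eigenvalues $\lambda_j$, the elements $\alpha_j$ and the estimate $\hat{\phi}$. The sketch below assumes the forms $k = \hat{\phi}/\sum_j \alpha_j^2$, $d = (\alpha_{max}^2 - \hat{\phi})/(\hat{\phi}/\lambda_{max} + \alpha_{max}^2)$ and $c = \min_j\, \alpha_j^2\lambda_j/(\hat{\phi} + \alpha_j^2\lambda_j)$, and it interprets $\alpha_{max}^2$ as the largest $\alpha_j^2$, which is an assumption. The function names and input values are hypothetical.

```python
def ridge_k(alpha, phi_hat):
    """Ridge parameter: k = phi_hat / sum_j alpha_j^2."""
    return phi_hat / sum(a * a for a in alpha)

def liu_d(lam, alpha, phi_hat):
    """Liu parameter: d = (a_max^2 - phi_hat) / (phi_hat / lambda_max + a_max^2).

    a_max^2 is taken as the largest alpha_j^2 (an assumed reading of the notation).
    The value can fall outside [0, 1] when a_max^2 < phi_hat; no clipping is applied.
    """
    a_max_sq = max(a * a for a in alpha)
    lam_max = max(lam)
    return (a_max_sq - phi_hat) / (phi_hat / lam_max + a_max_sq)

def stein_c(lam, alpha, phi_hat):
    """Stein parameter: c = min_j alpha_j^2 lambda_j / (phi_hat + alpha_j^2 lambda_j)."""
    return min(a * a * l / (phi_hat + a * a * l) for l, a in zip(lam, alpha))

# Illustrative inputs (assumed).
lam = [10.0, 1.0, 0.01]
alpha = [0.9, 0.5, 0.2]
phi_hat = 0.5

print("k:", ridge_k(alpha, phi_hat))
print("d:", liu_d(lam, alpha, phi_hat))
print("c:", stein_c(lam, alpha, phi_hat))
```

Each per-component ratio in `stein_c` lies strictly between 0 and 1 whenever $\alpha_j \neq 0$, $\lambda_j > 0$ and $\hat{\phi} > 0$, so the minimum also satisfies $0 < c < 1$.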

3. Simulation Study

This section presents a numerical evaluation of the proposed estimator. A simulation experiment was performed to examine the performance of the BJSE with different simulation conditions and also to compare the BJSE with the MLE, the BRRE and the BLE.

3.1. Simulation Layout

In the simulation design, the response variable was generated from a beta distribution whose mean is linked to the regressors through the logit link function:
$$\log\left(\frac{\mu_i}{1 - \mu_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, n, \tag{36}$$
where $x_{ij}$ represents the correlated independent variables and $\beta_j$ are the elements of the true parameter vector of the BRM. The true parameters are selected such that $\sum_{j=1}^{q} \beta_j^2 = 1$, and the correlated explanatory variables are generated by the following formula
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho\, z_{i,p+1}, \quad i = 1, \ldots, n; \; j = 1, \ldots, p, \tag{37}$$
where $\rho^2$ is the degree of correlation among the regressors and $z_{ij}$ are independent standard normal pseudo-random numbers; the component $z_{i,p+1}$ is shared by all regressors of observation i. The performance of the suggested estimator was examined by varying the degree of correlation, the number of regressors, the sample size and the precision parameter. We considered four values of $\rho$: 0.80, 0.90, 0.95 and 0.99. The sample sizes were n = 25, 50, 100 and 200. The number of regressors $p$ was 4, 8 and 16. Three values of the precision parameter $\phi$ were taken into consideration, namely 0.5, 2 and 4. We replicated the generated data 1000 times for each combination of $n$, $\rho$, $p$ and $\phi$. To evaluate the performance of the suggested estimator, the MSE was used as the evaluation criterion, which is defined as follows:
$$\operatorname{MSE}(\hat{\beta}) = \frac{\sum_{i=1}^{R} (\hat{\beta}_i - \beta)^t (\hat{\beta}_i - \beta)}{R}, \tag{38}$$
where $\hat{\beta}_i - \beta$ is the difference between the estimated and true parameter vectors for a given estimator at the i-th replication and R is the number of replications. All computations were performed in the R programming language with the help of the betareg R package.
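The data-generating step and the evaluation criterion can be sketched as follows. The study itself used R with the betareg package; this is a plain-Python illustration only. It assumes the shared-component form $x_{ij} = (1-\rho^2)^{1/2} z_{ij} + \rho\, z_{i,p+1}$, under which any two regressors have correlation $\rho^2$, and it does not fit a beta regression. The function names are hypothetical.

```python
import math
import random

def correlated_regressors(n, p, rho, rng):
    """Generate an n x p matrix with x_ij = sqrt(1 - rho^2) * z_ij + rho * z_i,p+1."""
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        shared = z[p]  # the component common to all regressors of this observation
        rows.append([math.sqrt(1.0 - rho ** 2) * z[j] + rho * shared for j in range(p)])
    return rows

def sample_correlation(u, v):
    """Pearson correlation of two equally long lists."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def simulated_mse(estimates, beta_true):
    """Average over replications of the squared distance between estimate and truth."""
    total = 0.0
    for est in estimates:
        total += sum((e - b) ** 2 for e, b in zip(est, beta_true))
    return total / len(estimates)

rng = random.Random(2023)  # fixed seed so the sketch is reproducible
X = correlated_regressors(n=5000, p=3, rho=0.9, rng=rng)
col0 = [row[0] for row in X]
col1 = [row[1] for row in X]
print("sample correlation of x1 and x2:", sample_correlation(col0, col1))
print("target rho^2:", 0.9 ** 2)

# Toy use of the criterion with two made-up replicate estimates.
print("simulated MSE:", simulated_mse([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0]))
```

In a full replication loop, each replicate would fit the BRM to freshly generated data, compute every estimator, and pass the collected estimates to `simulated_mse`.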

3.2. Simulation Results Discussion

In this section, we discuss the results of the simulation study under different factors, namely multicollinearity, sample size, dispersion and the number of explanatory variables. The estimated MSEs of the MLE, BRRE, BLE and BJSE are given in Tables 1–9. The general summary of the simulation results is given in the following points.
  • The first factor that affected the MSEs of the BRM estimators was multicollinearity, which had a direct effect on the estimators' performance: as the level of multicollinearity increased, the MSEs of the considered estimators increased. Under multicollinearity, the increase in the MSE of the BJSE was much smaller than the increases in the MSEs of the MLE, BRRE and BLE. These results show that the proposed estimator performed better than the available estimators.
  • The second factor that affected the performance of the BRM estimators was the sample size. The estimated MSEs of the considered estimators decreased as the sample size increased. For all considered sample sizes, the BJSE performed better than the other BRM estimators.
  • The number of explanatory variables also affected the simulation results. There was a direct relationship between the MSEs of the estimators and the number of explanatory variables: increasing the number of explanatory variables increased the MSEs of the BRM estimators. This increase was very small for the BJSE as compared to the other biased estimators. Again, the BJSE showed a more efficient and more consistent performance than the other biased estimators in dealing with multicollinearity for larger p and precision.
  • The last factor affecting the performance of the biased estimators was the dispersion parameter. As the dispersion parameter increased, the MSEs of the estimators decreased because the dispersion parameter is the reciprocal of the precision parameter.

4. Empirical Applications

In this section, we analyze two real-life applications to evaluate the performance of the suggested estimator. We use the MSE as the evaluation criterion.

4.1. Application 1: Heat-Treating Test Data

This dataset consists of the results of the pitch carbon analysis test and was taken from Montgomery and Runger [33]. There are 32 observations in the dataset, where pitch is the response variable and there are five independent variables. These variables are explained as follows: y = pitch, which represents a brand’s introduction to the client’s heart, implying that the amount of vibrations produced controls the sound quality. The explanatory variables are $x_1$ = furnace temperature, $x_2$ = carbon concentration (soak time), $x_3$ = carburizing cycle (soak pct), $x_4$ = carbon concentration (Diff time) and $x_5$ = duration of the defuse time (Diff pct). The response variable is continuous and in the form of a ratio, so the BRM is an appropriate model for it. In this study, we propose the BJSE for estimating the model parameters when multicollinearity exists among the regressors. To compare the performance of the proposed estimator, we also estimated the model parameters with the MLE, the BRRE and the BLE. The coefficients of the MLE, the BRRE, the BLE and the BJSE were calculated using Equations (6), (9), (14) and (19), whereas the MSEs of these estimators were obtained from Equations (8), (13), (18) and (26), respectively. Table 10 shows the coefficients and MSEs of the MLE, the BRRE, the BLE and the BJSE. As shown in Table 10, the MLE has a high MSE, which indicates that the MLE is not an appropriate method of estimation in the case of high but imperfect multicollinearity. The other estimators yield smaller MSEs than the MLE. Among these estimators, the proposed BJSE achieves a lower MSE than the others, which indicates its superiority over them. Therefore, the suggested estimator appears to be the best option among those considered for estimating BRM parameters in the presence of multicollinearity.

4.2. Application 2: Body Fat Data

In this section, we look at another real-world application (body fat) to demonstrate the performance of the proposed estimator, the BJSE. Johnson [34] used this dataset for predicting body fat and found that the distribution of this dataset is non-linear in nature. It was also used by Bailey [35] to predict body fat based on age and several skinfold measurements. This dataset consists of n = 252 observations, with one response and 14 independent variables. The response variable, y, is the percentage of body fat, and the independent variables include density determined from underwater weighing $x_1$, age $x_2$, weight $x_3$, height $x_4$, neck circumference $x_5$, chest circumference $x_6$, abdomen circumference $x_7$, hip circumference $x_8$, thigh circumference $x_9$, knee circumference $x_{10}$, ankle circumference $x_{11}$, biceps extended circumference $x_{12}$, forearm circumference $x_{13}$ and wrist circumference $x_{14}$. Given that the body fat data are in the form of a percentage, we used the BRM to evaluate the impact of these factors on the response variable y. Table 11 reports the estimated coefficients and MSEs of the MLE, the BRRE, the BLE and the BJSE. Table 11 indicates that the MLE has a high MSE, which suggests that the MLE becomes unstable in the case of high but imperfect multicollinearity. The other estimators show smaller MSEs, and among all the estimators, the BJSE has the minimum MSE, which indicates its superiority over the other estimators. Therefore, based on the simulation results and the example findings, we conclude that the proposed estimator is the best option among those considered for estimating the BRM parameters when multicollinearity exists among the explanatory variables.

5. Limitations of the Proposed Model

The BRM is suitable only for a response variable that lies strictly between zero and one; it is not suitable when the response takes the value zero or one. Like other biased estimators (ridge and LE), the JSE depends on its shrinkage parameter (c). In the current study, we focused on only one Stein parameter. Just as various biasing parameters have been proposed in the literature for the ridge estimator and the LE, one can consider several Stein parameters to find the optimum value for the JSE in reducing the effect of multicollinearity in the BRM.

6. Conclusions

The BRM is an appropriate model to use for predicting the response variable when it is in the form of ratios or proportions and follows the beta distribution. Sometimes, the explanatory variables of the model are linearly correlated and this is known as multicollinearity. In the case of multicollinearity, using the MLE method of estimating model parameters becomes unreliable. To address this, alternative methods, such as ridge, Liu and other estimation methods have been considered. The current study addressed the issue of multicollinearity by proposing the use of the JSE for the BRM. Furthermore, we derived its mathematical properties and compared its performance theoretically with the available methods (MLE, BRRE and BLE) in terms of MSE. A simulation experiment was conducted by varying different factors to evaluate the efficiency of the proposed estimator over other estimators. Two real-life applications were also analyzed to illustrate the findings of the simulation experiment. From the results of the simulation, it was observed that for all the scenarios, the suggested estimator outperformed its competitive estimators in the context of smaller MSE. Moreover, the findings of both applications revealed the efficiency of the proposed estimator over other considered estimators. This whole study gives evidence that, in the case of severe multicollinearity, biased estimation methods performed well. Results of both the simulation study and empirical applications provide evidence that the JSE is superior to other estimators due to its smaller MSE as compared to other considered estimators. Hence, based on the findings of the simulation experiment and real-life applications, we recommend practitioners utilize the BJSE for estimating BRM parameters due to its better results in the presence of multicollinearity.

Author Contributions

Conceptualization, M.A. and H.A.; methodology, M.A. and H.A.; software, M.A. and H.A.; writing—original draft preparation, M.A. and H.A.; writing—review and editing, M.A., H.A., H.S.B. and N.Q.; visualization, M.A., H.A. and H.S.B.; funding acquisition, N.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R376), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The data are available from the authors upon reasonable request.

Acknowledgments

The authors gratefully acknowledge Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R376), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, for the financial support for this project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  2. Abonazel, M.R.; Taha, I.M. Beta ridge regression estimators: Simulation and application. Commun. Stat.-Simul. Comput. 2021, 1–13.
  3. Brambilla, F. Statistical confluence analysis by means of complete regression systems. G. Econ. Riv. Stat. 1937, 77, 160–163.
  4. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42.
  5. Imdadullah, M.; Aslam, M.; Altaf, S. mctest: An R Package for Detection of Collinearity among Regressors. R J. 2016, 8, 495.
  6. Stein, C. Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution; Stanford University: Stanford, CA, USA, 1956.
  7. Massy, W.F. Principal components regression in exploratory statistical research. J. Am. Stat. Assoc. 1965, 60, 234–256.
  8. Hoerl, A.E.; Kennard, R.W.; Baldwin, K.F. Ridge regression: Some simulations. Commun. Stat.-Theory Methods 1975, 4, 105–123.
  9. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
  10. McDonald, G.C.; Galarneau, D.I. A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 1975, 70, 407–416.
  11. Hocking, R.R.; Speed, F.; Lynn, M. A class of biased estimators in linear regression. Technometrics 1976, 18, 425–437.
  12. Lawless, J.F.; Wang, P. A simulation study of ridge and other regression estimators. Commun. Stat.-Theory Methods 1976, 5, 307–323.
  13. Gunst, R.F.; Mason, R.L. Advantages of examining multicollinearities in regression analysis. Biometrics 1977, 33, 249–260.
  14. Kibria, B.G. Performance of some new ridge regression estimators. Commun. Stat.-Simul. Comput. 2003, 32, 419–435.
  15. Khalaf, G.; Shukur, G. Choosing ridge parameter for regression problems. Commun. Stat.-Theory Methods 2005, 34, 1177–1182.
  16. Dorugade, A.V. New ridge parameters for ridge regression. J. Assoc. Arab Univ. Basic Appl. Sci. 2014, 15, 94–99.
  17. Asar, Y.; Genç, A. A note on some new modifications of ridge estimators. Kuwait J. Sci. 2017, 44, 75–82.
  18. Alkhamisi, M.A.; Shukur, G. A Monte Carlo study of recent ridge parameters. Commun. Stat.-Simul. Comput. 2007, 36, 535–547.
  19. Alkhamisi, M.A.; Shukur, G. Developing ridge parameters for SUR model. Commun. Stat.-Theory Methods 2008, 37, 544–564.
  20. Månsson, K.; Shukur, G. On ridge parameters in logistic regression. Commun. Stat.-Theory Methods 2011, 40, 3366–3381.
  21. Månsson, K. Developing a Liu estimator for the negative binomial regression model: Method and application. J. Stat. Comput. Simul. 2013, 83, 1773–1780.
  22. Algamal, Z.Y. Developing a ridge estimator for the gamma regression model. J. Chemom. 2018, 32, e3054.
  23. Kejian, L. A new class of blased estimate in linear regression. Commun. Stat.-Theory Methods 1993, 22, 393–402.
  24. Qasim, M.; Amin, M.; Amanullah, M. On the performance of some new Liu parameters for the gamma regression model. J. Stat. Comput. Simul. 2018, 88, 3065–3080.
  25. Schaefer, R.; Roi, L.; Wolfe, R. A ridge logistic estimator. Commun. Stat.-Theory Methods 1984, 13, 99–113.
  26. Qasim, M.; Amin, M.; Omer, T. Performance of some new Liu parameters for the linear regression model. Commun. Stat.-Theory Methods 2020, 49, 4178–4196.
  27. Amin, M.; Qasim, M.; Amanullah, M.; Afzal, S. Performance of some ridge estimators for the gamma regression model. Stat. Pap. 2020, 61, 997–1026.
  28. Schaefer, R.L. Alternative estimators in logistic regression when the data are collinear. J. Stat. Comput. Simul. 1986, 25, 75–91.
  29. Akram, M.N.; Amin, M.; Amanullah, M. James Stein estimator for the inverse Gaussian regression model. Iran. J. Sci. Technol. Trans. A Sci. 2021, 45, 1389–1403.
  30. Qasim, M.; Månsson, K.; Golam Kibria, B. On some beta ridge regression estimators: Method, simulation and application. J. Stat. Comput. Simul. 2021, 91, 1699–1712.
  31. Karlsson, P.; Månsson, K.; Kibria, B.G. A Liu estimator for the beta regression model and its application to chemical data. J. Chemom. 2020, 34, e3300.
  32. Farebrother, R.W. Further results on the mean square error of ridge regression. J. R. Stat. Soc. Ser. B 1976, 38, 248–250.
  33. Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers; John Wiley & Sons: Hoboken, NJ, USA, 2010.
  34. Johnson, R.W. Fitting percentage of body fat to simple body measurements. J. Stat. Educ. 1996, 4, 1–7.
  35. Bailey, C. Smart Exercise: Burning Fat, Getting Fit; Houghton Mifflin Harcourt: Boston, MA, USA, 1996.

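Table 1. MSE values when p = 4 and 𝜙 = 0.5.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 7.4720 | 5.2695 | 4.3458 | 0.3620 |
| 25 | 0.90 | 13.8808 | 9.4073 | 7.2598 | 0.6329 |
| 25 | 0.95 | 27.9617 | 18.8364 | 13.8743 | 1.3023 |
| 25 | 0.99 | 132.5772 | 88.1338 | 60.7300 | 6.5650 |
| 50 | 0.80 | 6.7380 | 4.9260 | 4.1176 | 0.4236 |
| 50 | 0.90 | 13.2774 | 9.5429 | 7.7423 | 0.7180 |
| 50 | 0.95 | 25.2382 | 17.9623 | 14.0631 | 1.3564 |
| 50 | 0.99 | 120.9114 | 85.1745 | 63.8230 | 6.1307 |
| 100 | 0.80 | 6.2525 | 4.6966 | 3.9740 | 0.3585 |
| 100 | 0.90 | 10.9847 | 8.0504 | 6.6582 | 0.6836 |
| 100 | 0.95 | 20.0590 | 14.4570 | 11.7781 | 1.3142 |
| 100 | 0.99 | 97.8141 | 70.2476 | 56.4144 | 5.1157 |
| 200 | 0.80 | 5.7034 | 4.2754 | 3.6733 | 0.2895 |
| 200 | 0.90 | 9.9486 | 7.2555 | 6.0527 | 0.5565 |
| 200 | 0.95 | 18.2090 | 13.2011 | 10.9296 | 1.1866 |
| 200 | 0.99 | 84.7205 | 60.3296 | 48.4312 | 6.4092 |
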
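The MSE columns are easier to compare as percentage reductions relative to the MLE. The following is a minimal sketch of that arithmetic using the n = 25 rows of Table 1; the names `TABLE1_N25`, `percent_reduction` and `reductions_for_row` are illustrative and do not come from the paper.

```python
# MSE values copied from Table 1 (p = 4, phi = 0.5), rows for n = 25.
# Keys are rho; values are (MLE, BRRE, BLE, BJSE).
TABLE1_N25 = {
    0.80: (7.4720, 5.2695, 4.3458, 0.3620),
    0.90: (13.8808, 9.4073, 7.2598, 0.6329),
    0.95: (27.9617, 18.8364, 13.8743, 1.3023),
    0.99: (132.5772, 88.1338, 60.7300, 6.5650),
}


def percent_reduction(mse_mle, mse_other):
    """Percentage by which mse_other falls below the MLE's MSE."""
    return 100.0 * (mse_mle - mse_other) / mse_mle


def reductions_for_row(row):
    """Map each biased estimator to its percentage MSE reduction vs. the MLE."""
    mle, brre, ble, bjse = row
    return {
        "BRRE": percent_reduction(mle, brre),
        "BLE": percent_reduction(mle, ble),
        "BJSE": percent_reduction(mle, bjse),
    }


if __name__ == "__main__":
    for rho, row in TABLE1_N25.items():
        summary = ", ".join(
            f"{name}: {value:.1f}%"
            for name, value in reductions_for_row(row).items()
        )
        print(f"rho = {rho:.2f} -> {summary}")
```

For the n = 25, ρ = 0.80 row, this gives a reduction of roughly 95% for the BJSE, against roughly 42% for the BLE and 29% for the BRRE.
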
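Table 2. MSE values when p = 4 and 𝜙 = 2.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 6.4627 | 1.3987 | 1.1296 | 0.0133 |
| 25 | 0.90 | 10.7622 | 1.8489 | 1.2480 | 0.0176 |
| 25 | 0.95 | 18.9248 | 2.5871 | 1.4515 | 0.0309 |
| 25 | 0.99 | 80.5334 | 6.9978 | 2.8172 | 0.0558 |
| 50 | 0.80 | 5.3949 | 1.3988 | 1.1867 | 0.0102 |
| 50 | 0.90 | 8.3746 | 1.7093 | 1.2997 | 0.0114 |
| 50 | 0.95 | 15.8659 | 2.4445 | 1.5502 | 0.0171 |
| 50 | 0.99 | 72.1370 | 7.6020 | 3.5494 | 0.0754 |
| 100 | 0.80 | 4.7050 | 1.3869 | 1.1943 | 0.0105 |
| 100 | 0.90 | 7.2863 | 1.6856 | 1.3161 | 0.0127 |
| 100 | 0.95 | 13.4238 | 2.3021 | 1.5348 | 0.0232 |
| 100 | 0.99 | 62.8020 | 6.6832 | 3.3142 | 0.0614 |
| 200 | 0.80 | 4.6520 | 1.4179 | 1.2159 | 0.0097 |
| 200 | 0.90 | 7.4753 | 1.7021 | 1.3140 | 0.0132 |
| 200 | 0.95 | 13.1924 | 2.2385 | 1.5303 | 0.0163 |
| 200 | 0.99 | 59.9683 | 6.3953 | 3.1602 | 0.0421 |
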
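Table 3. MSE values when p = 4 and 𝜙 = 4.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 6.2490 | 1.5284 | 1.8145 | 0.0064 |
| 25 | 0.90 | 9.4013 | 1.7942 | 1.8423 | 0.0068 |
| 25 | 0.95 | 15.5545 | 2.0603 | 1.8816 | 0.0069 |
| 25 | 0.99 | 70.1703 | 3.0807 | 2.0481 | 0.0166 |
| 50 | 0.80 | 5.6431 | 1.9438 | 2.0962 | 0.0066 |
| 50 | 0.90 | 8.2453 | 2.1329 | 2.1448 | 0.0066 |
| 50 | 0.95 | 13.5659 | 2.3679 | 2.1994 | 0.0070 |
| 50 | 0.99 | 53.3069 | 2.8795 | 2.2477 | 0.0113 |
| 100 | 0.80 | 5.1760 | 2.1259 | 2.2223 | 0.0064 |
| 100 | 0.90 | 7.3100 | 2.2707 | 2.2698 | 0.0067 |
| 100 | 0.95 | 11.3598 | 2.3785 | 2.2594 | 0.0067 |
| 100 | 0.99 | 45.4312 | 2.8511 | 2.3588 | 0.0095 |
| 200 | 0.80 | 5.0151 | 2.2288 | 2.2571 | 0.0067 |
| 200 | 0.90 | 7.3051 | 2.3307 | 2.3098 | 0.0066 |
| 200 | 0.95 | 11.3282 | 2.4420 | 2.3336 | 0.0073 |
| 200 | 0.99 | 45.0837 | 2.8612 | 2.4236 | 0.0104 |
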
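Table 4. MSE values when p = 8 and 𝜙 = 0.5.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 14.4602 | 10.2698 | 7.0766 | 0.1385 |
| 25 | 0.90 | 27.5441 | 19.2596 | 12.0479 | 0.2453 |
| 25 | 0.95 | 54.3278 | 38.3086 | 23.2843 | 0.4821 |
| 25 | 0.99 | 267.7602 | 189.9188 | 103.5674 | 2.4785 |
| 50 | 0.80 | 12.0083 | 9.5397 | 7.1170 | 0.1480 |
| 50 | 0.90 | 21.7311 | 17.1241 | 12.1802 | 0.2245 |
| 50 | 0.95 | 46.5936 | 36.8466 | 25.6372 | 0.5748 |
| 50 | 0.99 | 229.2447 | 181.7785 | 120.6328 | 2.7580 |
| 100 | 0.80 | 9.4963 | 7.9802 | 6.2299 | 0.1224 |
| 100 | 0.90 | 18.5056 | 15.5946 | 11.9021 | 0.2080 |
| 100 | 0.95 | 36.4193 | 30.3996 | 22.0716 | 0.4994 |
| 100 | 0.99 | 171.0047 | 143.1465 | 101.6485 | 2.2256 |
| 200 | 0.80 | 7.8956 | 6.6087 | 5.1605 | 0.0876 |
| 200 | 0.90 | 15.1933 | 12.7549 | 9.8112 | 0.1711 |
| 200 | 0.95 | 28.6629 | 24.1113 | 18.3379 | 0.3373 |
| 200 | 0.99 | 140.9234 | 118.2390 | 87.3198 | 1.6646 |
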
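Table 5. MSE values when p = 8 and 𝜙 = 2.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 7.9364 | 1.6766 | 0.9151 | 0.0063 |
| 25 | 0.90 | 14.2907 | 2.4639 | 0.9736 | 0.0068 |
| 25 | 0.95 | 26.0463 | 3.8414 | 0.9782 | 0.0073 |
| 25 | 0.99 | 122.6603 | 16.0591 | 1.7971 | 0.0138 |
| 50 | 0.80 | 7.6179 | 1.8736 | 1.1142 | 0.0068 |
| 50 | 0.90 | 12.8157 | 2.5630 | 1.1970 | 0.0072 |
| 50 | 0.95 | 26.0388 | 4.5593 | 1.5642 | 0.0070 |
| 50 | 0.99 | 121.9319 | 18.8122 | 3.9769 | 0.0139 |
| 100 | 0.80 | 6.9840 | 1.9318 | 1.1872 | 0.0068 |
| 100 | 0.90 | 13.5102 | 3.0154 | 1.4046 | 0.0072 |
| 100 | 0.95 | 24.7111 | 4.9733 | 1.8341 | 0.0088 |
| 100 | 0.99 | 115.6346 | 19.3958 | 5.0523 | 0.0134 |
| 200 | 0.80 | 6.4306 | 1.9172 | 1.2011 | 0.0067 |
| 200 | 0.90 | 11.5315 | 2.8416 | 1.4470 | 0.0071 |
| 200 | 0.95 | 21.5192 | 4.7065 | 1.9236 | 0.0073 |
| 200 | 0.99 | 101.4062 | 18.2504 | 5.1877 | 0.0130 |
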
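Table 6. MSE values when p = 8 and 𝜙 = 4.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 13.2816 | 2.3022 | 1.7579 | 0.0060 |
| 25 | 0.90 | 25.0109 | 2.8321 | 1.7549 | 0.0061 |
| 25 | 0.95 | 47.8323 | 3.5103 | 1.7259 | 0.0061 |
| 25 | 0.99 | 242.7295 | 8.8577 | 1.8665 | 0.0083 |
| 50 | 0.80 | 8.9790 | 2.3142 | 2.0626 | 0.0060 |
| 50 | 0.90 | 16.4034 | 2.5588 | 2.0357 | 0.0060 |
| 50 | 0.95 | 30.5502 | 3.0320 | 2.0670 | 0.0060 |
| 50 | 0.99 | 132.2195 | 5.2522 | 2.1638 | 0.0068 |
| 100 | 0.80 | 7.5304 | 2.3190 | 2.1842 | 0.0060 |
| 100 | 0.90 | 11.8247 | 2.4652 | 2.2112 | 0.0060 |
| 100 | 0.95 | 23.2899 | 2.7410 | 2.2456 | 0.0061 |
| 100 | 0.99 | 101.3828 | 4.0067 | 2.3584 | 0.0062 |
| 200 | 0.80 | 6.0690 | 2.3196 | 2.2345 | 0.0060 |
| 200 | 0.90 | 9.6305 | 2.5019 | 2.2806 | 0.0061 |
| 200 | 0.95 | 16.4462 | 2.6550 | 2.3043 | 0.0061 |
| 200 | 0.99 | 70.2664 | 4.0088 | 2.5135 | 0.0062 |
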
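Table 7. MSE values when p = 16 and 𝜙 = 0.5.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 43.4944 | 20.1412 | 6.2921 | 0.0219 |
| 25 | 0.90 | 68.9946 | 30.5975 | 8.1108 | 0.0368 |
| 25 | 0.95 | 122.1327 | 54.1582 | 12.0353 | 0.0627 |
| 25 | 0.99 | 512.1863 | 227.8365 | 34.4323 | 0.3919 |
| 50 | 0.80 | 11.5400 | 9.5442 | 5.6648 | 0.0206 |
| 50 | 0.90 | 20.7754 | 17.2050 | 9.1853 | 0.0325 |
| 50 | 0.95 | 36.8046 | 30.2893 | 14.3016 | 0.0435 |
| 50 | 0.99 | 185.5982 | 153.1086 | 58.5106 | 0.2491 |
| 100 | 0.80 | 8.1218 | 7.3256 | 5.4739 | 0.0168 |
| 100 | 0.90 | 14.4091 | 12.9940 | 9.3324 | 0.0298 |
| 100 | 0.95 | 25.7923 | 23.2135 | 15.8513 | 0.0437 |
| 100 | 0.99 | 127.09 | 114.0686 | 70.6642 | 0.1653 |
| 200 | 0.80 | 7.8645 | 7.2567 | 5.7357 | 0.0133 |
| 200 | 0.90 | 14.0656 | 12.9589 | 9.9166 | 0.0248 |
| 200 | 0.95 | 25.9437 | 23.8470 | 17.5901 | 0.0406 |
| 200 | 0.99 | 138.5874 | 127.3191 | 90.0634 | 0.2117 |
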
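Table 8. MSE values when p = 16 and 𝜙 = 2.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 21.1242 | 3.4145 | 0.6161 | 0.0059 |
| 25 | 0.90 | 40.7154 | 6.3938 | 0.5819 | 0.0059 |
| 25 | 0.95 | 78.2061 | 11.5353 | 0.5017 | 0.0060 |
| 25 | 0.99 | 407.9165 | 57.3339 | 0.4968 | 0.0062 |
| 50 | 0.80 | 9.9701 | 3.1735 | 1.0571 | 0.0060 |
| 50 | 0.90 | 19.6400 | 5.3931 | 1.0852 | 0.0060 |
| 50 | 0.95 | 36.6894 | 9.9315 | 1.2121 | 0.0064 |
| 50 | 0.99 | 180.6976 | 45.1333 | 2.8547 | 0.0064 |
| 100 | 0.80 | 7.7951 | 3.0197 | 1.2312 | 0.0060 |
| 100 | 0.90 | 13.6867 | 4.8287 | 1.4071 | 0.0060 |
| 100 | 0.95 | 25.4187 | 8.4758 | 1.6711 | 0.0061 |
| 100 | 0.99 | 125.8874 | 38.7256 | 4.3898 | 0.0066 |
| 200 | 0.80 | 7.1972 | 2.9197 | 1.2672 | 0.0060 |
| 200 | 0.90 | 13.6888 | 5.0768 | 1.6303 | 0.0060 |
| 200 | 0.95 | 25.7351 | 8.8829 | 2.1168 | 0.0061 |
| 200 | 0.99 | 123.1280 | 41.1498 | 6.9120 | 0.0066 |
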
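Table 9. MSE values when p = 16 and 𝜙 = 4.

| n | ρ | MLE | BRRE | BLE | BJSE |
|---|---|---|---|---|---|
| 25 | 0.80 | 40.3836 | 1.8424 | 1.1611 | 0.00587 |
| 25 | 0.90 | 69.3023 | 2.2815 | 1.0925 | 0.00587 |
| 25 | 0.95 | 129.7419 | 2.7962 | 1.0744 | 0.00586 |
| 25 | 0.99 | 597.6512 | 7.0223 | 1.1204 | 0.00586 |
| 50 | 0.80 | 9.2863 | 2.2942 | 1.8401 | 0.00588 |
| 50 | 0.90 | 14.6709 | 2.6118 | 1.8407 | 0.00588 |
| 50 | 0.95 | 26.0316 | 3.0045 | 1.8312 | 0.00588 |
| 50 | 0.99 | 111.4912 | 6.6920 | 1.8686 | 0.00589 |
| 100 | 0.80 | 6.1727 | 2.4344 | 2.1374 | 0.00589 |
| 100 | 0.90 | 9.1661 | 2.5938 | 2.1182 | 0.00590 |
| 100 | 0.95 | 15.8667 | 3.0121 | 2.1030 | 0.00590 |
| 100 | 0.99 | 60.4369 | 5.7592 | 2.1862 | 0.00590 |
| 200 | 0.80 | 6.2040 | 2.5333 | 2.2356 | 0.00589 |
| 200 | 0.90 | 9.6210 | 2.7732 | 2.2366 | 0.00590 |
| 200 | 0.95 | 15.6217 | 3.1793 | 2.2388 | 0.00590 |
| 200 | 0.99 | 67.9239 | 6.8229 | 2.3653 | 0.00590 |
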
Table 10. Estimated coefficients and MSEs of the MLE, the BRRE, the BLE and the BJSE for heat-treating test data.

| Variable | β̂_ML | β̂_BRRE | β̂_BLE | β̂_BJSE |
|---|---|---|---|---|
| Intercept | −8.4172 | −0.0021 | −0.0102 | −1.4020 |
| x1 | 0.0023 | −0.0026 | −0.0025 | 0.0004 |
| x2 | 0.0500 | 0.0634 | 0.0686 | 0.0083 |
| x3 | 0.5213 | 0.0087 | −0.0192 | 0.0868 |
| x4 | 0.4528 | 0.5091 | 0.4068 | 0.0754 |
| x5 | −0.2220 | −0.0485 | −0.1349 | −0.0370 |
| MSE | 1813.3358 | 29.6055 | 3.6792 | 0.4215 |

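In Table 10, each BJSE coefficient appears to be close to the same multiple of the corresponding MLE coefficient, which is what one would expect if the estimator shrinks the whole MLE vector by a single scalar factor. The sketch below checks this numerically from the tabulated values only. It does not use the paper's formula for the shrinkage factor, and the names `shrinkage_ratios` and `pooled_shrinkage` are illustrative.

```python
# Coefficients copied from Table 10 (heat-treating test data).
LABELS = ["Intercept", "x1", "x2", "x3", "x4", "x5"]
BETA_ML = [-8.4172, 0.0023, 0.0500, 0.5213, 0.4528, -0.2220]
BETA_BJSE = [-1.4020, 0.0004, 0.0083, 0.0868, 0.0754, -0.0370]


def shrinkage_ratios(shrunk, unshrunk):
    """Element-wise ratio of shrunken to unshrunken coefficients."""
    return [s / u for s, u in zip(shrunk, unshrunk)]


def pooled_shrinkage(shrunk, unshrunk):
    """Least-squares scalar c that minimises ||shrunk - c * unshrunk||^2."""
    numerator = sum(s * u for s, u in zip(shrunk, unshrunk))
    denominator = sum(u * u for u in unshrunk)
    return numerator / denominator


if __name__ == "__main__":
    for label, ratio in zip(LABELS, shrinkage_ratios(BETA_BJSE, BETA_ML)):
        print(f"{label:>9}: BJSE / MLE = {ratio:.4f}")
    print(f"pooled c = {pooled_shrinkage(BETA_BJSE, BETA_ML):.4f}")
```

The ratios all fall near 0.167. The slightly larger value for x1 is consistent with rounding, since both of its coefficients are reported to only one or two significant digits.
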
Table 11. Estimated coefficients and MSEs of MLE, BRRE, BLE and BJSE for Body Fat Data.

| Variable | β̂_ML | β̂_BRRE | β̂_BLE | β̂_BJSE |
|---|---|---|---|---|
| Intercept | 30.6659 | −0.0013 | 28.4802 | 2.0444 |
| x1 | −30.6599 | −0.0023 | −28.5600 | −2.0440 |
| x2 | 0.0018 | 0.0055 | 0.0020 | 0.0001 |
| x3 | 0.0005 | 0.0043 | 0.0005 | 0.0000 |
| x4 | 0.0048 | −0.0132 | 0.0038 | 0.0003 |
| x5 | −0.0040 | −0.0515 | −0.0069 | −0.0003 |
| x6 | −0.0010 | −0.0159 | −0.0018 | −0.0001 |
| x7 | −0.0036 | 0.0689 | 0.0014 | −0.0002 |
| x8 | −0.0056 | −0.0442 | −0.0079 | −0.0004 |
| x9 | 0.0087 | 0.0212 | 0.0097 | 0.0006 |
| x10 | 0.0058 | −0.0064 | 0.0052 | 0.0004 |
| x11 | −0.0074 | 0.0024 | −0.0066 | −0.0005 |
| x12 | 0.0047 | 0.0207 | 0.0059 | 0.0003 |
| x13 | −0.0025 | 0.0270 | −0.0005 | −0.0002 |
| x14 | 0.0085 | −0.1368 | −0.0008 | 0.0006 |
| MSE | 119.4920 | 106.9982 | 103.4527 | 0.2352 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Amin, M.; Ashraf, H.; Bakouch, H.S.; Qarmalah, N. James Stein Estimator for the Beta Regression Model with Application to Heat-Treating Test and Body Fat Datasets. Axioms 2023, 12, 526. https://doi.org/10.3390/axioms12060526

