
Model Selection in Generalized Linear Models

1 Department of Mathematics, Gonzaga University, Spokane, WA 99258-0102, USA
2 Department of Mathematics and Statistics, University of Windsor, Windsor, ON N9B 3P4, Canada
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(10), 1905; https://doi.org/10.3390/sym15101905
Submission received: 24 August 2023 / Revised: 25 September 2023 / Accepted: 9 October 2023 / Published: 11 October 2023

Abstract

The problem of model selection in regression analysis through forward selection, backward elimination, and stepwise selection has been well explored in the literature. The main assumption there, of course, is that the data are normally distributed, and the main tool is either a t test or an F test. However, the properties of these model selection procedures are not well known. The purpose of this paper is to study the properties of these procedures within generalized linear regression models, with the normal linear regression model as a special case. The main tool used is the score test. For comparison, the F test, the other large sample tests (the likelihood ratio test and the Wald test), and the AIC and BIC criteria are also included. A systematic simulation study of the level and power properties of these procedures was conducted for symmetric and asymmetric distributions, namely the normal, Poisson, and binomial regression models. Extensions to skewed distributions, namely over-dispersed Poisson (negative binomial) and over-dispersed binomial (beta-binomial) regression models, are also given and evaluated. The methods are applied to analyze two health datasets.

1. Introduction

In modern scientific studies, a primary focus is on selecting appropriate models. Researchers typically gather data by measuring various characteristics of the subjects under observation and then analyze how these variables affect a specific outcome. It is essential to determine which measures are relevant to the outcome, identify any irrelevant measures, and evaluate any potential interactions between the variables that require consideration [1].
In particular, model selection in regression analysis with normally distributed response variables is very familiar and widely applied in many areas of study, including engineering, the biomedical sciences, and the social sciences. The overriding interest of researchers in these fields is to obtain a regression model with as few regression parameters as possible (a parsimonious model). In practice, the popular methods are forward selection, backward elimination, and stepwise selection, each based on a test of significance of a single regression coefficient. An example is a test of $H_0\colon \beta_j = 0$ using the F test ([2,3]), where $\beta_j$ is the $j$th regression parameter in a multiple linear regression model. Other model selection procedures, such as the Akaike information criterion (AIC) [4] and the Bayesian information criterion (BIC) [5], are also available. However, the properties of these model selection procedures are not well known.
The purpose of this paper is to study the properties of these procedures in generalized linear models (GLMs). In GLMs, the choice of probability distribution is not limited to symmetric distributions like the normal distribution. It encompasses a range of asymmetric probability distributions, including the binomial and Poisson distributions. In this paper, a model selection procedure is developed to accommodate both symmetric and asymmetric regression models.
The history of regression starts with Gauss and Legendre, who introduced the method of least squares in the early 1800s. Later, in the late nineteenth and early twentieth centuries, Galton and Pearson developed the concept of regression. It was Fisher who combined the works of Gauss and Pearson into a complete theory of the properties of least squares estimation. Fisher's contribution made regression analysis useful for prediction and for understanding correlations, as well as for making inferences about the relationship between a response and a covariate. Later, nonparametric and semiparametric regression methods were developed based on kernels by Fan [6], splines by Eilers and Marx [7], and wavelets by Bock and Pliego [8]. In this paper, we work with generalized linear regression models.
In the ordinary linear regression model, fitted by ordinary least squares (OLS), we assume that the error terms are normally distributed with a common variance. In binary (binomial) regression models, however, each response can only be 0 or 1. Each error term can therefore take only two values, and its variance, $p(1-p)$, changes with the mean. Similarly, in the Poisson regression model, the responses are non-negative integers whose variance equals the mean, whereas in OLS the error terms can take any value on the real line. As a result, without further investigation, it cannot be assumed that model selection procedures that work for normally distributed data will also work for non-symmetric data such as Poisson and binomial data.
There have been recent advances in variable selection for GLMs that are suitable for large or high-dimensional datasets. References for variable selection in big data include methods based on elastic net regularization paths [9], the debiased lasso [10], reference models [11], and a regularized version of the least-squares criterion [12]. For variable selection in high-dimensional GLMs with binary outcomes, see [13]; for temporally dependent data, see [14]; and for knowledge transfer, see [15]. Model selection has also received significant attention in the Bayesian approach to generalized linear mixed models [16]. In this context, it is worth noting that there is a large body of literature on goodness-of-fit tests, for example [17,18,19,20,21,22,23,24]. In this paper, however, we investigate variable selection in GLMs for non-symmetric data, such as binomial and Poisson regression models, when the dataset is small or moderate in size. To the best of our knowledge, this type of investigation does not exist in the literature.
There are two aspects to the model selection procedure:

(a) finding a suitable test statistic for testing the significance of a single regression coefficient, for example $H_0\colon \beta_j = 0$. The statistic should best hold an appropriate level of significance, say 5%, and should have superior power properties.

(b) finding a model selection procedure, based on this test statistic, that again has the best properties with respect to level and power.

For (a), we develop three large sample test statistics, namely the score test, the likelihood ratio test (LRT), and the Wald test. These three tests, along with the usual F test, are compared in a simulation study.
The score test [25] is a special case of the $C(\alpha)$ test [26] in which the nuisance parameters are replaced by maximum likelihood estimates, which are $\sqrt{n}$-consistent; here, $n$ denotes the number of observations used in estimating the parameters. The score test is particularly appealing because only the distribution of the test statistic under the null hypothesis, that is, under the basic model, needs to be studied. It often maintains, at least approximately, a preassigned level of significance, and it often produces a statistic that is simple to calculate. In contrast, the other two asymptotically equivalent tests (the LRT and the Wald test) require estimates of the parameters under the alternative hypothesis. They also often show liberal or conservative behavior in small samples. For further discussion, see [27].

For (b), an extensive simulation study was conducted to compare forward selection and backward elimination, using the best statistic found in (a), with the AIC and the BIC. This is discussed further in Section 3.1.1.
In Section 2, we develop the three large sample test statistics, which are then specialized for data from the normal, Poisson, and binomial distributions. The F statistic used in model selection for data from a normal distribution is also discussed. The results of an extensive simulation study are reported in Section 3. Extensions for asymmetric distributions, such as over-dispersed Poisson (the negative binomial) and over-dispersed binomial (the beta-binomial) regression models, are presented and evaluated in Section 4. Two examples are presented in Section 5; a discussion follows in Section 6.

2. Generalized Linear Model and the Test Statistics

2.1. Generalized Linear Model

The Generalized Linear Model (GLM) was developed by Nelder and Wedderburn [28]. A GLM is the generalization of ordinary linear regression models to encompass non-normal response distributions and nonlinear functions of the mean. It is composed of three components:
(i) The random component: This describes the response variable $y$ (categorical or continuous) and its probability distribution.
(ii) The systematic component: This connects a set of covariates with a linear predictor of the form
$$\eta = \sum_{j=1}^{p} X_j \beta_j .$$
(iii) The link function: This is a monotone differentiable function $f$, applied to each component of $E(y)$, that connects the random and systematic components through $\eta = f(E(y))$. For more details, see [28,29].
The random variable $Y$ has a distribution of the GLM form if
$$f(y;\theta) = \exp\{a(\theta)\, y - g(\theta) + c(y)\},$$
where $\theta = \eta = X\beta$. In GLMs, $V(\mu)$ is a variance function that characterizes a particular GLM family of distributions. Apart from the normal distribution, the discrete binomial and Poisson models belong to this family. A set of covariates $x_1, x_2, \ldots, x_p$ is related to the mean $\mu$ by $\theta(\mu) = X\beta$, where:
- $\theta(\cdot)$ is the link function;
- $X = [x_{ir}]$ is an $n \times (p+1)$ matrix;
- $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^{\top}$ is the vector of regression parameters.

Furthermore, we assume that $x_{i0} = 1$, so that $\beta_0$ is the intercept parameter.
Inference about the mean $\mu$ or the regression parameters $\beta_0, \beta_1, \ldots, \beta_p$ is based on the log-likelihood function $l(y, \mu)$. The log-likelihood for $Y_i$ $(i = 1, 2, \ldots, n)$ can be written as
$$l = \sum_{i=1}^{n} \{a(\theta_i)\, y_i - g(\theta_i) + c(y_i)\}. \tag{1}$$
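The following Python sketch illustrates the GLM form and Equation (1). It writes the Poisson distribution with the canonical choice $\theta = \log\lambda$, so that $a(\theta) = \theta$, $g(\theta) = e^{\theta}$, and $c(y) = -\log y!$. With the log link, this gives $\theta_i = x_i^{\top}\beta$. The sketch then checks that the GLM-form log-likelihood agrees with the one computed directly from the Poisson probability mass function. The function names and the toy values of $\beta$, the covariate, and the responses are illustrative assumptions, not part of the original study.

```python
import math

# Poisson written in the GLM form f(y; theta) = exp{a(theta) y - g(theta) + c(y)}
# with the canonical choice theta = log(lambda):
#   a(theta) = theta,  g(theta) = exp(theta),  c(y) = -log(y!).
def a(theta):
    return theta

def g(theta):
    return math.exp(theta)

def c(y):
    return -math.lgamma(y + 1)  # lgamma(y + 1) = log(y!)

def glm_form_pmf(y, theta):
    return math.exp(a(theta) * y - g(theta) + c(y))

def poisson_pmf(y, lam):
    return lam ** y * math.exp(-lam) / math.factorial(y)

def glm_loglik(ys, thetas):
    """l = sum_i {a(theta_i) y_i - g(theta_i) + c(y_i)}, as in Equation (1)."""
    return sum(a(t) * y - g(t) + c(y) for y, t in zip(ys, thetas))

# Log link: theta_i = x_i' beta with x_i = (1, x_i1); hypothetical beta = (0.5, 0.8).
beta = (0.5, 0.8)
xs = [-1.0, 0.0, 1.5]
ys = [0, 2, 5]
thetas = [beta[0] + beta[1] * x for x in xs]
direct = sum(math.log(poisson_pmf(y, math.exp(t))) for y, t in zip(ys, thetas))
print(glm_loglik(ys, thetas), direct)  # the two values agree up to rounding
```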

2.2. The Test Statistics

Our interest is in developing a test statistic for testing the hypothesis that one of the $\beta$ parameters is zero. As such, we consider the null hypothesis $H_0\colon \beta_j = 0$, with $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, against $H_a\colon \beta_j \neq 0$.
To develop test statistics for data with a distribution of the GLM form, we need the maximum likelihood estimates of the $\beta$ parameters under both the null and the alternative hypotheses. These are obtained from the log-likelihood in Equation (1), whose first derivative is
$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{V_i}\, \frac{\partial \mu_i}{\partial \theta_i}\, x_{ij},$$
where $\mu_i = E(y_i) = g'(X_i\beta)/a'(X_i\beta)$, $\partial \mu_i/\partial \eta_i = h'(X_i\beta)$, and
$$V_i = \mathrm{var}(y_i) = \frac{g''(X_i\beta)\, a'(X_i\beta) - a''(X_i\beta)\, g'(X_i\beta)}{\{a'(X_i\beta)\}^{3}},$$
where $'$ denotes differentiation with respect to $\theta_i$. To estimate the parameters $\beta_k$, $k = 0, 1, \ldots, p$, we need to solve $\partial l/\partial \beta_k = 0$. These equations are non-linear in $\beta_k$, so they must be solved iteratively.
Note that under the null hypothesis, we estimate $\beta_k$ for $k = 0, 1, \ldots, j-1, j+1, \ldots, p$; we denote these estimates by $\hat\beta_k$. Under the alternative hypothesis, we estimate $\beta_k$ for $k = 0, 1, \ldots, p$; we denote these estimates by $\tilde\beta_k$.
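The score equations $\partial l/\partial \beta_k = 0$ are usually solved by Fisher scoring, also known as iteratively reweighted least squares. Below is a minimal pure-Python sketch for Poisson regression with log link. For this canonical link, $V_i = \partial\mu_i/\partial\eta_i = \mu_i$, so the score is $\sum_i (y_i - \mu_i)x_{ik}$ and the expected information is $\sum_i \mu_i x_{ik}x_{il}$. The helper names (`solve`, `fit_poisson`), the intercept-only starting value, and the tolerance are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Fisher scoring for a Poisson GLM with log link; column 0 of X is the intercept.
    Returns (beta_hat, mu_hat)."""
    p = len(X[0])
    # Start from the intercept-only fit: beta_0 = log(mean(y)), other coefficients 0.
    beta = [math.log(max(sum(y) / len(y), 1e-8))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Score: dl/dbeta_u = sum_i (y_i - mu_i) x_iu.
        score = [sum((yi - mi) * row[u] for yi, mi, row in zip(y, mu, X)) for u in range(p)]
        # Expected information: I_uv = sum_i mu_i x_iu x_iv.
        info = [[sum(mi * row[u] * row[v] for mi, row in zip(mu, X)) for v in range(p)]
                for u in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    return beta, mu

# Toy two-group data (hypothetical): x_1 = 0 for the first two units, 1 for the last two.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 3, 4, 8]
beta_hat, mu_hat = fit_poisson(X, y)
```

In this toy example the fitted means are the group means, 2 and 6. Hence $\hat\beta_0 = \log 2$ and $\hat\beta_1 = \log 3$.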

2.2.1. The Likelihood Ratio Test and the Wald Test

In general, the likelihood ratio statistic for testing a null hypothesis against an alternative is based on the ratio of the maximized likelihood under the null hypothesis to that under the alternative hypothesis. In practice, we maximize the log-likelihoods to find the maximum likelihood estimates of the parameters under the null and the alternative hypotheses. Let $\hat l$ and $\tilde l$ be the maximized log-likelihoods under the null and the alternative hypotheses, respectively. The likelihood ratio statistic for testing $H_0\colon \beta_j = 0$ against $H_a\colon \beta_j \neq 0$, with $\beta_0, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, is then $LRT_j = 2(\tilde l - \hat l)$.
Similarly, the Wald test statistic is the ratio of the maximum likelihood estimate of the parameter of interest under the alternative hypothesis to its standard error. For testing $H_0\colon \beta_j = 0$ against $H_a\colon \beta_j \neq 0$, with $\beta_0, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, it is $W_j = \tilde\beta_j / \sqrt{\widehat{\mathrm{var}}(\tilde\beta_j)}$. Here, $\widehat{\mathrm{var}}(\tilde\beta_j)$ is the corresponding diagonal element of the inverse of the negative Hessian matrix at the end of the iterative process.
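To make $LRT_j$ and $W_j$ concrete, the sketch below computes both for a Poisson log-link model. It fits the model with and without column $j$ using a Fisher-scoring routine. It assumes that column 0 of the design is the intercept and that $j \ge 1$. The Wald variance is taken from the inverse expected information at the full-model estimates; for the canonical log link, this coincides with the inverse negative Hessian. All function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Fisher scoring for a Poisson log-link GLM; returns (beta, mu)."""
    p = len(X[0])
    beta = [math.log(max(sum(y) / len(y), 1e-8))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        score = [sum((yi - mi) * row[u] for yi, mi, row in zip(y, mu, X)) for u in range(p)]
        info = [[sum(mi * row[u] * row[v] for mi, row in zip(mu, X)) for v in range(p)]
                for u in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    return beta, mu

def poisson_loglik(y, mu):
    """sum_i {y_i log(mu_i) - mu_i - log(y_i!)}."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def lrt_and_wald(X, y, j):
    """LRT_j = 2(l_tilde - l_hat) and W_j = beta_tilde_j / se(beta_tilde_j) for H0: beta_j = 0."""
    X0 = [row[:j] + row[j + 1:] for row in X]   # null design: drop column j
    _, mu_hat = fit_poisson(X0, y)
    beta_tilde, mu_tilde = fit_poisson(X, y)
    lrt = 2.0 * (poisson_loglik(y, mu_tilde) - poisson_loglik(y, mu_hat))
    p = len(X[0])
    info = [[sum(m * r[u] * r[v] for m, r in zip(mu_tilde, X)) for v in range(p)]
            for u in range(p)]
    unit_j = [1.0 if u == j else 0.0 for u in range(p)]
    var_j = solve(info, unit_j)[j]              # (j, j) element of the inverse information
    return lrt, beta_tilde[j] / math.sqrt(var_j)

X = [[1, 0], [1, 0], [1, 1], [1, 1]]   # hypothetical two-group toy data
y = [1, 3, 4, 8]
lrt, wald = lrt_and_wald(X, y, 1)
```

Each statistic ($LRT_j$ and $W_j^2$) is then compared with the $\chi^2(1)$ critical value.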

2.2.2. The Score Test

The score test is based on the partial derivatives of the log-likelihood function with respect to the nuisance parameters and the parameter of interest, evaluated under the null hypothesis. The score test statistic can be shown to be
$$S_j = \frac{\hat P_j^{2}}{x_j^{\top} \hat W \left\{ I_n - X_j \left( X_j^{\top} \hat W X_j \right)^{-1} X_j^{\top} \hat W \right\} x_j} .$$
For the derivation of the score test statistic and the definitions of $\hat P_j$, $x_j$, $\hat W$, and $X_j$, see Appendix A. The above score test can also be obtained from Pregibon [30], who developed the score test for the generalized linear interactive modeling (GLIM) system; the proof is presented in Appendix B. Note that the symbol $\hat{\ }$ denotes the MLE under the null hypothesis. Asymptotically (for large $n$), the distribution of each of the test statistics $LRT_j$, $W_j^2$, and $S_j$ converges to $\chi^2(1)$ [26]. Therefore, for a fixed significance level $\alpha$, we reject the null hypothesis if the value of the test statistic exceeds $\chi^2_{\alpha}(1)$, the upper $\alpha$ quantile of the $\chi^2(1)$ distribution.
To save space, the expressions for the three test statistics, L R T j , W j , and S j , for the special cases for which the data distribution is normal, Poisson, and binomial, respectively, are presented in Appendix A.
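For the Poisson log-link case, the score statistic $S_j$ needs only the null fit. The sketch assumes the standard canonical-link forms $\hat P_j = \sum_i (y_i - \hat\mu_i)x_{ij}$ and $\hat W = \mathrm{diag}(\hat\mu_1, \ldots, \hat\mu_n)$; the general definitions are in Appendix A and are not reproduced here. Let $X_j$ be the design without column $j$ and $b = X_j^{\top}\hat W x_j$. The denominator can then be computed as $x_j^{\top}\hat W x_j - b^{\top}(X_j^{\top}\hat W X_j)^{-1}b$. This is an illustrative sketch with hypothetical helper names.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Fisher scoring for a Poisson log-link GLM; returns (beta, mu)."""
    p = len(X[0])
    beta = [math.log(max(sum(y) / len(y), 1e-8))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        score = [sum((yi - mi) * row[u] for yi, mi, row in zip(y, mu, X)) for u in range(p)]
        info = [[sum(mi * row[u] * row[v] for mi, row in zip(mu, X)) for v in range(p)]
                for u in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    return beta, mu

def score_test_poisson(X, y, j):
    """S_j for H0: beta_j = 0 (j >= 1); only the null model is fitted."""
    X0 = [row[:j] + row[j + 1:] for row in X]   # design without column j
    xj = [row[j] for row in X]
    _, mu = fit_poisson(X0, y)                   # mu_hat under H0, W_hat = diag(mu_hat)
    P = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, xj))
    q = len(X0[0])
    A = [[sum(m * r[u] * r[v] for m, r in zip(mu, X0)) for v in range(q)] for u in range(q)]
    b = [sum(m * r[u] * xi for m, r, xi in zip(mu, X0, xj)) for u in range(q)]
    # x_j' W (I - X0 (X0' W X0)^{-1} X0' W) x_j = x_j' W x_j - b' (X0' W X0)^{-1} b
    denom = sum(m * xi * xi for m, xi in zip(mu, xj)) - sum(s * t for s, t in zip(b, solve(A, b)))
    return P * P / denom

X = [[1, 0], [1, 0], [1, 1], [1, 1]]   # hypothetical toy data
y = [1, 3, 4, 8]
S = score_test_poisson(X, y, 1)
```

On this toy data, $\hat\mu_i = 4$ for every observation, $\hat P_j = 4$, and the denominator equals 4. Hence $S_j = 4$, which exceeds $\chi^2_{0.05}(1) \approx 3.84$.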

2.2.3. The F Test

The F statistic used in model selection for data from a normal distribution is
$$NF = \frac{\mathrm{SSR}(x_j \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p)/\mathrm{df}_1}{\mathrm{SSE}(x_1, \ldots, x_p)/\mathrm{df}_2},$$
where
$$\mathrm{SSR}(x_j \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p) = \mathrm{SSE}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p) - \mathrm{SSE}(x_1, \ldots, x_p),$$
$\mathrm{df}_1 = 1$, and $\mathrm{df}_2 = n - p - 1$. Here, $NF \sim F(1, n - p - 1)$ if $H_0$ holds ([3], p. 267).
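The partial F statistic $NF$ can be computed from two least squares fits, one with and one without $x_j$. The sketch below does this with the normal equations in pure Python. The function names are hypothetical, and the first column of the design is assumed to be the intercept. No p-value is computed; if SciPy is available, `scipy.stats.f.sf(NF, 1, n - p - 1)` gives the upper-tail probability.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols_sse(X, y):
    """Residual sum of squares from least squares via X'X b = X'y."""
    q = len(X[0])
    XtX = [[sum(r[u] * r[v] for r in X) for v in range(q)] for u in range(q)]
    Xty = [sum(r[u] * yi for r, yi in zip(X, y)) for u in range(q)]
    b = solve(XtX, Xty)
    return sum((yi - sum(x * bk for x, bk in zip(r, b))) ** 2 for r, yi in zip(X, y))

def partial_f(X, y, j):
    """NF = [SSE(reduced) - SSE(full)] / 1  divided by  SSE(full) / (n - p - 1)."""
    n, p = len(y), len(X[0]) - 1            # p regressors besides the intercept
    X0 = [row[:j] + row[j + 1:] for row in X]
    sse_full, sse_reduced = ols_sse(X, y), ols_sse(X0, y)
    ssr = sse_reduced - sse_full            # SSR(x_j | other regressors), df1 = 1
    return (ssr / 1) / (sse_full / (n - p - 1))

X = [[1, 0], [1, 0], [1, 1], [1, 1]]         # hypothetical toy data
y = [1.0, 3.0, 4.0, 8.0]
NF = partial_f(X, y, 1)
```

Here the two SSEs are 26 (reduced) and 10 (full), with $n - p - 1 = 2$. This gives $NF = 16/5 = 3.2$.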

2.3. Simulation Study

A simulation study is now conducted to compare the behavior of the four test statistics, namely the score, the LRT, the Wald, and the F statistics, in terms of empirical level and power, for testing the significance of a single regression coefficient. We consider a two-variable regression model for each of three distributions:
- $N(\mu, \sigma^2)$, with $\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
- Poisson$(\lambda)$, with $\lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$;
- Bin$(m, p)$, with $p/(1-p) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$.

The covariates $x_1$ and $x_2$ are generated from the standard normal distribution, and $\sigma = 2$ is used for the normal distribution.

Suppose our interest is to test $H_0\colon \beta_2 = 0$ against $H_a\colon \beta_2 \neq 0$ in each case. For empirical levels, we take $\beta_0 = 1$, $\beta_1 = 1$, and $\beta_2 = 0$. For power, we take $\beta_0 = 1$, $\beta_1 = 1$, and different values of $\beta_2$. These values are presented in Table 1 for normal and Poisson data and in Table 2 for binomial data.
For data from the binomial distribution, the level and power results may be affected by the binomial index $m$. To check this, we conduct simulations for $m = 10$, $m = 30$, and $m = 40$. For both level and power, we consider sample sizes $n = 10, 20, 30$, and $50$ for all distributions. Each simulation experiment is based on 10,000 replicated samples. The level and power results are presented in Table 1 for the normal and Poisson distributions and in Table 2 for the binomial distribution.

The results in Table 1 show that for normally distributed data, the score test and the F test maintain the level reasonably well. The score test, however, shows a somewhat inflated level and, as a result, somewhat inflated power. The other two statistics (Wald and LRT) show liberal behavior, which is why they have higher power than the other two tests.

The results in Table 1 and Table 2 show that for data from the Poisson and binomial distributions, the F test performs very badly. The other three statistics hold the level very well, and their power performances are similar. Furthermore, the results in Table 2 show that the binomial index $m$ does not affect the level or power of the tests. So, in subsequent sections, we use $m = 40$ as the binomial index.

We also conducted a simulation study with correlated covariates $x_1$ and $x_2$ for the Poisson and binomial distributions. The results (not included in the paper) show similar empirical level and power properties.
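A scaled-down version of the level and power experiment for the Poisson case can be sketched as follows. The sketch:
1. draws $x_1, x_2 \sim N(0,1)$;
2. generates Poisson responses with $\log\lambda = 1 + x_1 + \beta_2 x_2$, using the values $\beta_0 = \beta_1 = 1$ as printed in the text;
3. applies the score test for $H_0\colon\beta_2 = 0$ at the 5% level, with the $\chi^2(1)$ critical value 3.8415;
4. reports the rejection rate.

The replicate count (300 instead of 10,000), the seed, and the Poisson sampler (Knuth's multiplication method, applied in chunks) are illustrative assumptions. The numbers this sketch produces are not the values reported in Table 1.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Fisher scoring for a Poisson log-link GLM; returns (beta, mu)."""
    p = len(X[0])
    beta = [math.log(max(sum(y) / len(y), 1e-8))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        score = [sum((yi - mi) * row[u] for yi, mi, row in zip(y, mu, X)) for u in range(p)]
        info = [[sum(mi * row[u] * row[v] for mi, row in zip(mu, X)) for v in range(p)]
                for u in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    return beta, mu

def score_stat(X0, xj, y):
    """Score statistic for adding covariate xj to the null design X0."""
    _, mu = fit_poisson(X0, y)
    P = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, xj))
    q = len(X0[0])
    A = [[sum(m * r[u] * r[v] for m, r in zip(mu, X0)) for v in range(q)] for u in range(q)]
    b = [sum(m * r[u] * xi for m, r, xi in zip(mu, X0, xj)) for u in range(q)]
    denom = sum(m * xi * xi for m, xi in zip(mu, xj)) - sum(s * t for s, t in zip(b, solve(A, b)))
    return P * P / denom

def rpois(lam, rng):
    """Poisson draw: Knuth's method on chunks of size <= 30, since Poisson(a + b) = Poisson(a) + Poisson(b)."""
    total = 0
    while lam > 30.0:
        total += rpois(30.0, rng)
        lam -= 30.0
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return total + k
        k += 1

def rejection_rate(beta2, n=50, reps=300, seed=2023, crit=3.841458820694124):
    """Share of replicates in which the 5% score test rejects H0: beta2 = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rpois(math.exp(1.0 + a + beta2 * b), rng) for a, b in zip(x1, x2)]
        X0 = [[1.0, a] for a in x1]              # design under H0: intercept and x1
        if score_stat(X0, x2, y) > crit:
            rejections += 1
    return rejections / reps
```

Calling `rejection_rate(0.0)` estimates the empirical level. Calling it with a non-zero `beta2` estimates the power.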
It is reassuring that the F test does well for data from the normal distribution, so in Section 3 we use this test to study the performance of the model selection procedures for normally distributed data. For data from the Poisson and binomial distributions, we use the score test for three reasons:
- it has a very simple form;
- it does not need estimates of the regression parameters under the alternative hypothesis;
- its level and power properties are at least as good as those of the LRT and the Wald test.

3. Model Selection

3.1. Empirical Level and Power

Following the findings in Section 2.3, our model selection criterion for normally distributed data is based on testing the significance of a single regression coefficient $\beta_j$ using the F test presented in Section 2.2.3. As also discussed in Section 2.3, for data from the Poisson and binomial distributions, we use the score test statistics $S_{Pj}$ and $S_{Bj}$, respectively, presented in Appendix A. Our purpose here is to compare the performance of forward selection, backward elimination, the AIC, and the BIC with respect to level and power.
Although these model selection procedures are well known, we provide brief descriptions of them below for the reader's convenience.
Forward Selection Procedure: Forward selection adds variables to the model one at a time.
1. If the model has $p$ regression variables apart from the intercept, we first fit $p$ regression models, each containing one variable, and calculate the score test statistic for each. The variable with the largest value of the statistic is kept in the model, provided that it is significant at the specified level.
2. Next, we fit $p-1$ regression models, each containing the variable selected in step 1 and one of the remaining $p-1$ variables, and proceed as in step 1.
3. We continue this process, adding one variable at a time, until no more variables can be included in the model.

The final model will have $q \le p$ variables.

Backward Elimination Procedure: Backward elimination starts with the full model.
1. We calculate the $p$ score test statistics for testing $H_0\colon \beta_j = 0$, $j = 1, 2, \ldots, p$.
2. If the variable with the smallest value of the score test statistic is not significant at the specified level, we remove it from the model.
3. We continue this process, removing one variable at a time, until no more variables can be deleted from the model.

AIC and BIC Criteria: The AIC judges a model by how close its fitted values tend to be to the true values, in terms of a certain expected value. It can be written as $AIC = -2l + 2p$, where $l$ is the maximized log-likelihood and $p$ here denotes the number of parameters in the model. Forward selection through the AIC starts from the null model. At each step, the variable outside the current model whose addition most reduces the AIC is added, until no addition improves the AIC. The BIC, an alternative to the AIC, is motivated by a Bayesian argument and takes the sample size into account: $BIC = -2l + \ln(n)\, p$. Forward selection through the BIC proceeds in the same way as through the AIC.
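The three procedures above can be written generically, with the statistic or criterion supplied as a function. In this hypothetical sketch, `stat(model, j)` returns the test statistic for $H_0\colon \beta_j = 0$ when $x_j$ is added to the variables in `model`. For the score test, this requires only the fit of `model` itself. Likewise, `criterion(model)` returns the AIC or BIC of a fitted model. The toy statistic and criterion values are invented to show the control flow and do not come from any data.

```python
import math

CHI2_1_UPPER_5PCT = 3.841458820694124  # upper 5% point of the chi-square(1) distribution

def forward_selection(candidates, stat, crit=CHI2_1_UPPER_5PCT):
    """Repeatedly add the candidate with the largest statistic while it exceeds crit."""
    model, remaining = [], list(candidates)
    while remaining:
        values = {j: stat(model, j) for j in remaining}
        best = max(values, key=values.get)
        if values[best] <= crit:          # best candidate is not significant: stop
            break
        model.append(best)
        remaining.remove(best)
    return model

def backward_elimination(candidates, stat, crit=CHI2_1_UPPER_5PCT):
    """Start from the full model; drop the least significant variable while it is not significant."""
    model = list(candidates)
    while model:
        # Statistic for beta_j = 0, computed from the model without x_j.
        values = {j: stat([k for k in model if k != j], j) for j in model}
        worst = min(values, key=values.get)
        if values[worst] > crit:          # every remaining variable is significant: stop
            break
        model.remove(worst)
    return model

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n):
    return -2.0 * loglik + math.log(n) * n_params

def forward_selection_ic(candidates, criterion):
    """Add the variable that most reduces the criterion until no addition improves it."""
    model, remaining = [], list(candidates)
    current = criterion(model)
    while remaining:
        values = {j: criterion(model + [j]) for j in remaining}
        best = min(values, key=values.get)
        if values[best] >= current:
            break
        model.append(best)
        remaining.remove(best)
        current = values[best]
    return model

def toy_stat(model, j):
    # Hypothetical statistics that do not depend on the current model.
    return {1: 10.0, 2: 5.0, 3: 0.5}[j]

def toy_ic(model):
    # Hypothetical criterion: variables 1 and 2 lower it, variable 3 raises it.
    return 100 - 10 * (1 in model) - 5 * (2 in model) + 1 * (3 in model)

print(forward_selection([1, 2, 3], toy_stat))     # [1, 2]
print(backward_elimination([1, 2, 3], toy_stat))  # [1, 2]
print(forward_selection_ic([1, 2, 3], toy_ic))    # [1, 2]
```

Note that the two test-based procedures use a fixed critical value, and hence a fixed $\alpha$. The AIC and BIC versions have no such tuning constant.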
As mentioned earlier, our purpose is to find the most parsimonious model. Here, we illustrate how the empirical level is calculated, using a $p$-variable Poisson regression model with $\ln(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.
1. For given values of the regression parameters and simulated values of the regression variables, we obtain a sample of size $n$ from the Poisson$(\mu)$ distribution.
2. We then apply a model selection procedure, for example, forward selection with the score test for $H_0\colon \beta_j = 0$, and obtain a model containing a subset of the regression variables.
3. We repeat this process 10,000 times and obtain 10,000 models.

When the true value of $\beta_j$ is zero, the regression variable $x_j$ should not appear in the final model. We therefore count the number of models in which $x_j$ is (wrongly) included; let this number be $s$. The empirical level for rejecting $H_0\colon \beta_j = 0$ is then $s/10{,}000$. Empirical power is calculated similarly by taking a non-zero value of $\beta_j$ during the simulation process.
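Once the final models from the replicated samples are available, the empirical level (or power) is the proportion of final models that contain $x_j$. Below is a minimal sketch; the function name is hypothetical, and the selected models are made up for illustration.

```python
def empirical_rate(final_models, j):
    """Proportion of replicated final models that contain variable j.

    If the data were generated with beta_j = 0, this is the empirical level s / R;
    if they were generated with beta_j != 0, it is the empirical power."""
    s = sum(1 for model in final_models if j in model)
    return s / len(final_models)

# Hypothetical final models from 4 replicates (variables indexed 1..4).
models = [[2, 3], [1, 2, 3], [2, 4], [2, 3]]
print(empirical_rate(models, 1))  # 0.25: x1 was wrongly kept in 1 of 4 replicates
```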

3.1.1. Simulation Study

We conduct a simulation study to compare the model selection procedures (forward selection, backward elimination, AIC, and BIC) with respect to empirical level and power. We consider a four-variable regression model. Data are drawn from the normal $N(\mu, \sigma^2)$, Poisson$(\lambda)$, and binomial$(m, p)$ regression models, with
$$\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4, \qquad \lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4), \qquad \frac{p}{1-p} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4),$$
respectively. Suppose we would like to test $H_0\colon \beta_1 = 0$. To calculate the empirical level for each distribution, we choose $\beta_1 = 0$; for empirical power, we take the values of $\beta_1$ presented in Table 3. The remaining parameters are set as follows:
- normal and Poisson distributions: $\sigma^2 = 2$, $\beta_2 = 0.3$, $\beta_3 = 0.2$, and $\beta_4 = 0.3$;
- binomial distribution: $m = 40$, $\beta_2 = 0.2$, $\beta_3 = 0.1$, and $\beta_4 = 0.2$.

For each distribution, 10,000 replicated samples are taken for sample sizes of $n = 10, 20, 30$, and $50$.
For the forward selection and backward elimination procedures, we consider α = 0.05 . Note that for the other two procedures, α cannot be fixed.
The level and power results are presented in Table 3 and Table 4. Two procedure groups always produce a reasonable empirical level (close to the nominal level), irrespective of the sample size: forward selection using the F test for normally distributed data, and both forward selection and backward elimination using the score test for Poisson and binomial data. The other two procedures, the AIC and BIC, produce highly inflated type I error rates. The BIC, however, does well for a large sample size ($n = 50$), where its power is also comparable to that of forward selection and backward elimination using the score test.
Thus, for normal regression models, we recommend forward selection using the F test. For Poisson and binomial regression models with small to moderate sample sizes, we recommend forward selection using the score test. For large samples ($n > 50$), the BIC should be used, as it is computationally much simpler.

4. Over-Dispersed Poisson and Over-Dispersed Binomial Regression Models

4.1. Introduction and Motivation

Discrete data in the form of proportions are commonly encountered in toxicology and related areas. When the experimental unit is a litter, there tends to be a litter effect: littermates respond more similarly to each other than to animals from other litters, so fetuses from the same litter tend to have similar responses to the treatment. The probability of success may therefore vary across litters, and a binomial model may not fit such proportion data well. The two-parameter beta-binomial (BB) model assumes that the binomial parameter varies between litters and is widely used for analyzing data of this nature. It was originally proposed by Williams [31] and later applied by Paul [32] and others.
Discrete data in the form of counts arise in many health science disciplines, such as biology and epidemiology. For examples of discrete count data, see [19,20,33,34,35,36,37].
The Poisson distribution has the property that the mean and the variance are equal. In practice, however, count data often display extra-Poisson variation, that is, over- or under-dispersion relative to a Poisson model. Thus, the Poisson distribution is not an ideal choice for analyzing count data in many applications. One very convenient and common model that accommodates this extra dispersion is the two-parameter negative binomial distribution. For applications of the negative binomial distribution, see, for example, [38,39,40,41].
In Section 4.2 and Section 4.3, we extend the methods and ideas developed in Section 2 and Section 3 for model selection for Poisson and binomial regression models to over-dispersed Poisson and over-dispersed binomial regression models, respectively. Specifically, we deal with model selection procedures in negative binomial regression and beta-binomial regression models. Here, we first develop the score, the LRT, and the Wald tests for testing the significance of a single regression variable, and then for model selection, we compare the forward selection, the AIC, and the BIC procedures.

4.2. Negative Binomial Regression Model

Consider the negative binomial (NB) distribution with probability mass function
$$f(y; m, c) = \frac{\Gamma(y + c^{-1})}{\Gamma(c^{-1})\, y!} \left( \frac{cm}{cm + 1} \right)^{y} \left( \frac{1}{cm + 1} \right)^{c^{-1}}, \qquad y = 0, 1, 2, \ldots, \tag{7}$$
with mean $E(y) = m$ and variance $\mathrm{var}(y) = m(1 + cm)$ (see [42]). We denote this distribution by NB$(m, c)$. In Equation (7), $c$ is the dispersion parameter, which is constant. Clearly, as $c \to 0$, the NB distribution reduces to the Poisson distribution with parameter $m$.
Let $y_i$, $i = 1, \ldots, n$, be a random sample from the NB$(m_i, c)$ distribution with $m_i = \exp(x_i^{\top}\beta) = \exp(\beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p)$, so that $\partial m_i/\partial \beta_j = m_i x_{ij}$. The term $-\sum_{i} \log(y_i!)$ does not involve the parameters; apart from it, the log-likelihood of the NB regression model is
$$l = \sum_{i=1}^{n} \left\{ y_i \log(m_i) - (y_i + c^{-1}) \log(c m_i + 1) + \sum_{j=1}^{y_i} \log[1 + c(j-1)] \right\}.$$
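The sketch below is a quick numerical check of Equation (7) and of the log-likelihood above. It evaluates the NB probability mass function with `math.lgamma` and confirms three things:
- the probabilities sum to one;
- the mean is $m$ and the variance is $m(1+cm)$;
- each log-likelihood term equals $\log f(y; m, c) + \log y!$, which reflects the identity $\Gamma(y + c^{-1})/\Gamma(c^{-1}) = c^{-y}\prod_{j=1}^{y}[1 + c(j-1)]$.

The values $m = 3$ and $c = 0.5$, and the function names, are arbitrary choices for illustration.

```python
import math

def nb_logpmf(y, m, c):
    """log f(y; m, c) for NB(m, c) as in Equation (7): mean m, variance m(1 + c m)."""
    r = 1.0 / c
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * (math.log(c * m) - math.log1p(c * m))
            - r * math.log1p(c * m))

def nb_loglik(ys, ms, c):
    """NB regression log-likelihood with the constant -sum log(y_i!) dropped."""
    total = 0.0
    for y, m in zip(ys, ms):
        total += y * math.log(m) - (y + 1.0 / c) * math.log1p(c * m)
        total += sum(math.log(1.0 + c * (j - 1)) for j in range(1, y + 1))
    return total

m, c = 3.0, 0.5
probs = [math.exp(nb_logpmf(y, m, c)) for y in range(400)]  # tail beyond 400 is negligible
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(sum(probs), 8), round(mean, 6), round(var, 6))  # 1.0 3.0 7.5  (7.5 = m(1 + c m))
```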
The first and second-order partial derivatives of the log-likelihood function with respect to the parameters β and c are presented in Appendix B.

4.2.1. Derivation of the Test Statistics

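We follow the procedure described in Appendix A to find the score test for testing $H_0\colon \beta_j = 0$. Omitting the details, the score statistic is
$$S_{NBj} = S \left( D - A_1^{\top} B_{11}^{-1} A_1 \right)^{-1} S = \frac{\left\{ \sum_{i=1}^{n} \dfrac{(y_i - \hat m_i)\, x_{ij}}{1 + \hat c \hat m_i} \right\}^{2}}{x_j^{\top} W \left\{ I_n - X_j \left( X_j^{\top} W X_j \right)^{-1} X_j^{\top} W \right\} x_j},$$
the Wald statistic is
$$W_{NBj} = \frac{\tilde\beta_j}{\sqrt{\widehat{\mathrm{var}}(\tilde\beta_j)}} = \tilde\beta_j \Big/ \left\{ \sum_{i=1}^{n} \frac{\tilde m_i}{1 + \tilde c \tilde m_i}\, x_{ij}^{2} \right\}^{-1/2},$$
and the LRT statistic is
$$L_{NBj} = 2 \sum_{i=1}^{n} \left\{ y_i \log\frac{\tilde m_i}{\hat m_i} - (y_i + \tilde c^{-1}) \log(\tilde c \tilde m_i + 1) + (y_i + \hat c^{-1}) \log(\hat c \hat m_i + 1) + \sum_{l=1}^{y_i} \log\frac{1 + \tilde c (l-1)}{1 + \hat c (l-1)} \right\}.$$
Here:
- $w_i = \hat m_i/(1 + \hat c \hat m_i)$ and $W = \mathrm{diag}(w_1, \ldots, w_n)$;
- $\hat m_i = \exp(\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_{j-1} x_{i(j-1)} + \hat\beta_{j+1} x_{i(j+1)} + \cdots + \hat\beta_p x_{ip})$;
- $\tilde m_i = \exp(x_i^{\top}\tilde\beta)$, where $\tilde\beta$ is the maximum likelihood estimate of $\beta$ under the alternative hypothesis.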
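Given null-model estimates $\hat m_i$ and $\hat c$, the score statistic $S_{NBj}$ is a weighted version of the Poisson one. The sketch below evaluates the closed form above. It does not estimate $\beta$ and $c$ under $H_0$; that step is assumed to be done elsewhere. The function name is hypothetical. With $\hat c = 0$, the statistic reduces to the Poisson score statistic.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def nb_score_stat(X0, xj, y, m_hat, c_hat):
    """S_NBj from null-model fitted means m_hat and dispersion c_hat (both supplied).

    X0 is the design without column j; xj holds the values x_ij; w_i = m_i / (1 + c m_i)."""
    w = [m / (1.0 + c_hat * m) for m in m_hat]
    num = sum((yi - m) * xi / (1.0 + c_hat * m) for yi, m, xi in zip(y, m_hat, xj))
    q = len(X0[0])
    A = [[sum(wi * r[u] * r[v] for wi, r in zip(w, X0)) for v in range(q)] for u in range(q)]
    b = [sum(wi * r[u] * xi for wi, r, xi in zip(w, X0, xj)) for u in range(q)]
    # x_j' W (I - X0 (X0' W X0)^{-1} X0' W) x_j = x_j' W x_j - b' (X0' W X0)^{-1} b
    denom = sum(wi * xi * xi for wi, xi in zip(w, xj)) - sum(s * t for s, t in zip(b, solve(A, b)))
    return num * num / denom

X0 = [[1], [1], [1], [1]]     # hypothetical intercept-only null model
xj = [0, 0, 1, 1]
y = [1, 3, 4, 8]
m_hat = [4.0] * 4             # null MLE of the mean here is the sample mean
S_c01 = nb_score_stat(X0, xj, y, m_hat, 0.1)
S_c0 = nb_score_stat(X0, xj, y, m_hat, 0.0)
```

In this toy example the null model is intercept-only, so $\hat m_i = \bar y = 4$ is the null MLE of the mean. The value $\hat c = 0.1$ is purely illustrative, not an estimate. The statistic is then $4/1.4 \approx 2.857$, compared with 4 when $\hat c = 0$.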

4.2.2. Simulation Study

We conducted two simulation studies; the first compared the performances of three test statistics, the score, the Wald, and the LRT; and the other compared the performances of model selection through forward selection, AIC, and BIC.
Empirical level and power of the score, Wald, and LRT tests: Data are simulated from the negative binomial regression model NB$(m, c)$ with $m = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$, and we test the null hypothesis $H_0\colon \beta_2 = 0$ against $H_a\colon \beta_2 \neq 0$.
For empirical levels, we simulate response data from the negative binomial regression model with $c = 0.03$, $\beta_0 = 2$, $\beta_1 = 0.3$, and $\beta_2 = 0$. For power, different values of $\beta_2$ are taken, as presented in Table 5. The independent variables $x_1$ and $x_2$ are generated from the standard normal distribution.
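Negative binomial responses for such a simulation can be generated as a gamma mixture of Poissons. If $\lambda$ follows a gamma distribution with shape $c^{-1}$ and scale $cm$, and $Y \mid \lambda \sim \mathrm{Poisson}(\lambda)$, then $Y \sim \mathrm{NB}(m, c)$, with mean $m$ and variance $m(1+cm)$. The sketch uses Python's `random.gammavariate` and a simple Poisson sampler. The helper names and seeds are assumptions, and the regression parameters are the level-setting values printed in the text.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw: Knuth's method on chunks of size <= 30."""
    total = 0
    while lam > 30.0:
        total += rpois(30.0, rng)
        lam -= 30.0
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return total + k
        k += 1

def rnb(m, c, rng):
    """NB(m, c) draw: lam ~ Gamma(shape 1/c, scale c m), then Y | lam ~ Poisson(lam)."""
    return rpois(rng.gammavariate(1.0 / c, c * m), rng)

def simulate_nb_data(n, beta, c, rng):
    """One sample: x1, x2 ~ N(0, 1) and y ~ NB(m, c) with m = exp(b0 + b1 x1 + b2 x2)."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        m = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)
        rows.append((x1, x2, rnb(m, c, rng)))
    return rows

# Empirical-level setting from the text: c = 0.03, beta = (2, 0.3, 0).
data = simulate_nb_data(20, (2.0, 0.3, 0.0), 0.03, random.Random(7))
```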
The level and power results are presented in Table 5. The score test has the best level property (empirical level close to the nominal level). The other two statistics show some inflation of the empirical level relative to the nominal level, which results in higher power for the Wald and LRT statistics. So, here too, we use the score test statistic for model selection with the forward selection procedure.
Empirical level and power in model selection through the forward selection procedure: A simulation study similar to that in Section 3.1 compares model selection through forward selection using the score test with the other two criteria (AIC and BIC).
Data are taken from the NB$(m, c)$ distribution with $m = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$. For the empirical level, $\beta_0 = 2$, $\beta_1 = 0$, $\beta_2 = 0.3$, and $\beta_4 = 0.1$; the values of $\beta_1$ used for power are presented in Table 6. The dispersion parameter is $c = 0.03$. As in Section 3, the sample sizes are $n = 10, 20, 30$, and $50$, and the nominal level is $\alpha = 0.05$. The level and power results are presented in Table 6.
The results in Table 6 show that forward selection with the score test performs as it did in Table 3: its empirical level is close to the nominal level. The other procedures show highly inflated empirical levels, even for large $n$ ($n = 50$).
Study of model misspecification: A small study of model misspecification was conducted. Specifically, we studied the performance of test statistics developed under the assumption of Poisson-distributed data when the data are in fact negative binomial, and vice versa. The results are presented in Table 7. When data are generated from the negative binomial distribution but the statistics are developed from the Poisson probability mass function, the statistics show inflated level and power. However, when data are generated from the Poisson distribution but the statistics are developed from the negative binomial probability mass function, the statistics do not show inflated level and power. This is reasonable, as the Poisson distribution is a special case of the negative binomial distribution.

4.3. Beta-Binomial Regression Model

Suppose that $Y$ follows a beta-binomial distribution with mean parameter $\mu$ and dispersion parameter $\theta$, denoted by $Y \sim \mathrm{BB}(k, \mu, \theta)$; that is, $Y$ has the probability mass function
$$P(Y = y) = \binom{k}{y} \frac{\prod_{r=0}^{y-1} (\mu + r\theta) \prod_{r=0}^{k-y-1} (1 - \mu + r\theta)}{\prod_{r=0}^{k-1} (1 + r\theta)},$$
for $y = 0, 1, \ldots, k$, with $0 \le \mu \le 1$ and $\theta \ge \max\{-\mu/(k-1), -(1-\mu)/(k-1)\}$. Its mean is $E(Y) = k\mu$ and its variance is $\mathrm{var}(Y) = k\mu(1-\mu)[1 + (k-1)\phi]$, where $\phi = \theta/(1+\theta)$ (see [31,32]).
Note that, as $\theta \to 0$, BB$(k, \mu, \theta)$ tends to the binomial$(k, \mu)$ distribution. For $\theta = 0$, $\mathrm{var}(Y) = k\mu(1-\mu)$, and BB$(k, \mu, \theta)$ is exactly the binomial$(k, \mu)$ distribution.
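The product form of the BB probabilities is easy to check numerically. The sketch below evaluates it with `math.prod` and `math.comb` (Python 3.8 or later). It compares the mean and variance with $k\mu$ and $k\mu(1-\mu)[1 + (k-1)\phi]$, and with $\theta = 0$ it reproduces the binomial probabilities. The values $k = 10$, $\mu = 0.3$, and $\theta = 0.2$ are arbitrary, and the function name is hypothetical.

```python
import math

def bb_pmf(y, k, mu, theta):
    """BB(k, mu, theta) probability from the product form; theta = 0 gives binomial(k, mu)."""
    num = math.prod(mu + r * theta for r in range(y))
    num *= math.prod(1.0 - mu + r * theta for r in range(k - y))
    den = math.prod(1.0 + r * theta for r in range(k))
    return math.comb(k, y) * num / den

k, mu, theta = 10, 0.3, 0.2
probs = [bb_pmf(y, k, mu, theta) for y in range(k + 1)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
phi = theta / (1.0 + theta)
# Total probability, mean, variance, and the closed-form variance k mu (1 - mu)[1 + (k - 1) phi].
print(sum(probs), mean, var, k * mu * (1 - mu) * (1 + (k - 1) * phi))
```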
Let $y_i$, $i = 1, \ldots, n$, be a random sample from BB$(k_i, \mu_i, \theta)$. Then, apart from a term not involving the parameters, the log-likelihood is
$$l = \sum_{i=1}^{n} \left\{ \sum_{r=0}^{y_i - 1} \log(\mu_i + r\theta) + \sum_{r=0}^{k_i - y_i - 1} \log(1 - \mu_i + r\theta) - \sum_{r=0}^{k_i - 1} \log(1 + r\theta) \right\}.$$
The mean $\mu_i$ is assumed to follow the logistic model $\mu_i(x_i, \beta) = \exp(x_i^{\top}\beta)/\{1 + \exp(x_i^{\top}\beta)\}$, so that $\partial \mu_i/\partial \beta_j = \mu_i(1-\mu_i)\, x_{ij}$. The first- and second-order partial derivatives of $l$ with respect to the parameters $\beta$ and $\theta$ are presented in Appendix B.

4.3.1. Derivation of the Test Statistics

Using the same procedure as in Appendix A, the score test statistic for testing $H_0: \beta_j = 0$ is
$$S_{BBj} = \frac{\left\{\sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\frac{1}{\hat\mu_i + r\hat\theta} - \sum_{r=0}^{k_i-y_i-1}\frac{1}{1-\hat\mu_i+r\hat\theta}\right]\hat\mu_i(1-\hat\mu_i)x_{ij}\right\}^2}{\hat V_j},$$
where $\hat\mu_i/(1-\hat\mu_i) = \exp(\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_{j-1}x_{i(j-1)} + \hat\beta_{j+1}x_{i(j+1)} + \cdots + \hat\beta_p x_{ip})$. Here $\hat\beta$ and $\hat\theta$ are the maximum likelihood estimates of $\beta$ and $\theta$ under the null hypothesis, and $\hat V_j = V_j(\hat\mu, \hat\theta)$ is presented in Appendix C.
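The Wald and likelihood ratio test (LRT) statistics are as follows:
$$W_{BBj} = \frac{\tilde\beta_j}{\sqrt{\widehat{\mathrm{var}}(\tilde\beta_j)}} = \tilde\beta_j\left[\sum_{i=1}^{n}(p_{1i} + p_{2i})\tilde\mu_i^2(1-\tilde\mu_i)^2x_{ij}^2\right]^{1/2}$$
and
$$L_{BBj} = 2\sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\log\frac{\tilde\mu_i + r\tilde\theta}{\hat\mu_i + r\hat\theta} + \sum_{r=0}^{k_i-y_i-1}\log\frac{1-\tilde\mu_i+r\tilde\theta}{1-\hat\mu_i+r\hat\theta} - \sum_{r=0}^{k_i-1}\log\frac{1+r\tilde\theta}{1+r\hat\theta}\right],$$
where $\tilde\beta$ and $\tilde\theta$ are the maximum likelihood estimates of $\beta$ and $\theta$ under the alternative hypothesis, with $\tilde\mu_i/(1-\tilde\mu_i) = \exp(x_i'\tilde\beta)$. Here $p_{1i}$ and $p_{2i}$ are as defined in Appendix C, evaluated at $\tilde\mu_i$ and $\tilde\theta$.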

4.3.2. Simulation Study

Two simulation studies are conducted in this subsection. The first compares the performance of the three test statistics. The second compares model selection by forward selection through the Wald test with model selection using the AIC and the BIC.
Empirical level and power of the score, Wald, and LRT tests: We generate data from the beta-binomial regression model $\mathrm{BB}(k, \mu, \theta)$ with logit link, $\log[\mu/(1-\mu)] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. We test the null hypothesis $H_0: \beta_2 = 0$ against $H_a: \beta_2 \ne 0$.
For empirical levels, we simulate data from this model with $k = 40$, $\theta = 0.2$, $\beta_0 = 1$, $\beta_1 = 0.5$, and $\beta_2 = 0$. For powers, we take the values of $\beta_2$ shown in Table 8. The independent variables $x_1$ and $x_2$ are generated from the standard normal distribution. As in the previous studies, we consider sample sizes $n = 10$, 20, 30, and 50. The level and power results are presented in Table 8.
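One simulated sample from this design can be drawn using the standard beta-mixture representation of the beta-binomial distribution. In that representation, $p_i \sim \mathrm{Beta}(\mu_i/\theta, (1-\mu_i)/\theta)$ and then $y_i \mid p_i \sim \mathrm{Binomial}(k, p_i)$, which gives mean $k\mu_i$ and intra-class correlation $\theta/(1+\theta)$. The following Python sketch is our illustration of that approach. It assumes $\theta > 0$, and the function name `simulate_bb_regression` is hypothetical.

```python
import numpy as np

def simulate_bb_regression(n, k=40, theta=0.2, beta=(1.0, 0.5, 0.0), rng=None):
    """One sample from BB(k, mu_i, theta) with logit(mu_i) = b0 + b1*x1 + b2*x2."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((n, 2))                   # x1, x2 ~ N(0, 1)
    eta = beta[0] + x @ np.asarray(beta[1:])
    mu = 1.0 / (1.0 + np.exp(-eta))
    # Beta mixture: p_i ~ Beta(mu_i/theta, (1 - mu_i)/theta), then y_i ~ Binomial(k, p_i).
    p = rng.beta(mu / theta, (1.0 - mu) / theta)
    y = rng.binomial(k, p)
    return y, x, mu

y, x, mu = simulate_bb_regression(30, rng=np.random.default_rng(2023))
```

For an empirical level, one would repeat this with $\beta_2 = 0$, test $H_0: \beta_2 = 0$ on each replicate, and report the rejection rate. For power, one would do the same with $\beta_2 \ne 0$.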
The results in Table 8 show that, here, the score test does not share the favorable level property it showed for data from the Poisson, binomial, and negative binomial distributions: its levels are, in general, somewhat liberal for all sample sizes. However, its level is stable, in the sense that it stays at a similar value irrespective of the sample size. The other two statistics are liberal for small sample sizes, and as the sample size increases, their level behavior becomes closer to that of the score test. For the larger sample sizes (30 and 50), the empirical levels of all three statistics approach the nominal level, although they remain somewhat liberal. Furthermore, for these sample sizes, the Wald and LRT statistics show similar power, which is much higher than that of the score test statistic. Thus, for small sample sizes, none of the statistics can be recommended for testing the significance of a single regression coefficient. For large sample sizes, we recommend the Wald test, although it is somewhat liberal, because it has a simple form and a substantial power advantage over the score test.
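To judge whether an empirical level is really different from the nominal 5%, one can compare the gap with the binomial Monte Carlo standard error $\sqrt{\alpha(1-\alpha)/R}$ for $R$ replicates. This is a standard calculation added here for illustration; the paper does not report it. The sketch assumes $R = 10{,}000$, the number of replicates stated for the forward selection study. It uses as its example the Wald level of 5.83% reported for $n = 100$ in that study.

```python
import math

def mc_level_se(alpha=0.05, replicates=10_000):
    """Monte Carlo standard error (in percentage points) of an empirical rejection rate near alpha."""
    return 100.0 * math.sqrt(alpha * (1.0 - alpha) / replicates)

se = mc_level_se()
observed = 5.83                      # an empirical level in %, e.g. the Wald level for n = 100
z = (observed - 5.0) / se            # gap from 5% in units of Monte Carlo standard error
print(f"SE = {se:.3f} points, z = {z:.2f}")
```

With 10,000 replicates, a level within about two standard errors of 5% (roughly 4.6% to 5.4%) cannot be distinguished from the nominal level. A level of 5.83% lies several standard errors above it.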
Empirical level and power in model selection through the forward selection procedure: We conducted a simulation study similar to that in Section 3.1 to investigate the model selection behavior through the forward selection procedure using the Wald test, the AIC, and the BIC, in terms of level and power, but only for large sample sizes (see below).
To calculate the empirical level, we generate data from the beta-binomial regression model $\mathrm{BB}(k, \mu, \theta)$ with $\mu/(1-\mu) = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$, choosing $k = 40$, $\theta = 0.2$, $\beta_0 = 1$, $\beta_1 = 0$, $\beta_2 = 0.5$, $\beta_3 = 0.4$, and $\beta_4 = 0.5$. For empirical power, we take the nonzero values of $\beta_1$ presented in Table 9. For each simulation experiment, 10,000 replicated samples are drawn for sample sizes $n = 30$ and 50. For the forward selection procedure, we use $\alpha = 0.05$. The level and power results are presented in Table 9.
The results in Table 9 show that the empirical levels of the forward selection procedure using the Wald test and the BIC approach the nominal level (5%) as $n$ increases, but they are still not very close to it. We therefore extended this simulation study to $n = 70$ and $n = 100$ to determine the sample size needed to achieve an empirical level close to the nominal level. The empirical level of the Wald test moves closer to the nominal level as $n$ increases to 70 and 100, although it is still not very close. Overall, however, this procedure performs best in terms of both level and power. For example, for $n = 100$, the empirical levels of the Wald, AIC, and BIC procedures are 5.83%, 16.57%, and 3.27%, respectively. The corresponding powers for $\beta_1 = 0.25$ are 66.68%, 83.24%, and 60.44%. Compared with the Wald test, the AIC shows an inflated empirical level and the BIC a deflated one. This is reflected in the power results: the AIC shows higher power, and the BIC lower power, than the Wald test.
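One way to see why the AIC is liberal and the BIC conservative relative to a 5% test is to look at a single step. When one parameter is added to a nested model, the AIC ($-2l + 2p$) prefers the larger model when the likelihood ratio statistic exceeds 2, and the BIC ($-2l + p\log n$) prefers it when the statistic exceeds $\log n$. Under the null hypothesis, with a $\chi^2_1$ reference distribution, these thresholds correspond to the per-step levels computed below. This single-comparison approximation is our simplification for illustration. It ignores the fact that forward selection examines several candidates at each step.

```python
import math

def chi2_1_sf(x):
    """P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def implied_level(criterion, n):
    """Per-step significance level implied when one parameter is added:
    threshold 2 for "AIC", log(n) otherwise (BIC)."""
    threshold = 2.0 if criterion == "AIC" else math.log(n)
    return chi2_1_sf(threshold)

for crit in ("AIC", "BIC"):
    print(crit, round(100 * implied_level(crit, 100), 2))
```

For $n = 100$ these implied levels are roughly 15.7% (AIC) and 3.2% (BIC). This ordering matches the empirical levels of 16.57% and 3.27% reported above.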
The empirical powers of the procedures based on the Wald test and on the BIC are also similar. Thus, for model selection, we recommend the forward selection procedure through either the Wald test or the BIC, as both are easy to compute for large sample sizes.
All of the simulation studies were also repeated with other parameter values. These gave similar empirical levels and powers, which are omitted to save space.

5. Real Data Analysis

To demonstrate the practical application of the model selection procedures discussed in this paper, we examine two real datasets that have small sample sizes.
Example 1: The Lower Respiratory Illness Count Dataset.
This is a dataset provided by LaVange et al. [43], consisting of information on lower respiratory illness in 284 children during their first year of life. Each child was examined every two weeks over a period of one year.
There are eight covariates, namely:
$x_1$: Risk, the number of weeks in that year during which the child was at risk;
$x_2$: Passive, a dummy variable indicating whether the child was exposed to cigarette smoking;
$x_3$: Crowding, a variable indicating whether or not the child's home is crowded;
$x_4$: Race, an indicator variable for race (1 = white, 0 = not white);
$(x_5, x_6)$: socioeconomic status, coded (1, 0), (0, 1), and (0, 0) for low, medium, and high, respectively; and
$(x_7, x_8)$: age group, coded (1, 0), (0, 1), and (0, 0) for under four months, four to six months, and more than six months, respectively.
We find this dataset appealing because it comes from a real study. However, it is a large dataset, so we constructed a small dataset consisting of a random sample of 50 children with their lower respiratory illness counts and covariates. This dataset is presented in Appendix D and analyzed below.
These are count data, which are usually analyzed with a Poisson regression model. So, we first fit a Poisson regression model and apply the model selection procedures discussed earlier. However, the data may be overdispersed, since children who have one infection are more likely to have others. To check this, we apply the score test statistic $T_{LM}$ of Cameron and Trivedi ([44], p. 49) for testing $H_0: c = 0$ versus $H_a: c > 0$, where $c$ is the negative binomial dispersion parameter (see Appendix B). This statistic has an asymptotic standard normal distribution under $H_0$. For the sample data, $T_{LM} = 1.9943$, with a one-sided p-value of about 0.023.
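The reported p-value is the upper tail of the standard normal distribution at $T_{LM} = 1.9943$. The sketch below computes it. It also shows one commonly used form of the Lagrange multiplier statistic for overdispersion against a negative binomial alternative with variance $\mu + c\mu^2$: $T = \sum_i[(y_i - \hat\mu_i)^2 - y_i]/\sqrt{2\sum_i\hat\mu_i^2}$, where $\hat\mu_i$ are the Poisson fitted means. We have not verified that this is exactly the form on the cited page of [44], so treat it as an assumption. The function names and the toy counts are hypothetical and are not the Appendix D data.

```python
import math

def upper_normal_p(z):
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lm_overdispersion(y, mu_hat):
    """LM statistic sum[(y - mu)^2 - y] / sqrt(2 * sum mu^2); approx N(0, 1) under H0."""
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu_hat))
    den = math.sqrt(2.0 * sum(mi ** 2 for mi in mu_hat))
    return num / den

print(round(upper_normal_p(1.9943), 4))      # p-value for the reported T_LM, about 0.023

# Toy illustration with an intercept-only Poisson fit (mu_hat = sample mean).
y = [0, 0, 1, 0, 4, 0, 2, 5, 0, 1]
mu_bar = sum(y) / len(y)
t = lm_overdispersion(y, [mu_bar] * len(y))
print(round(t, 3), round(upper_normal_p(t), 3))
```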
We consider a negative binomial regression model to accommodate the overdispersion. Thus, the full model considered here for model selection is
$$\log E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_8 x_8.$$
In Table 10, we list the variables that enter the model at each step of the forward selection procedure using the score test, the Wald test, the LRT, the AIC, and the BIC.
Table 10 shows that, of the eight covariates, two (passive smoking and crowding) are found significant by the forward selection procedure through the score test and through the Wald test, for both the Poisson and negative binomial regression models. In contrast, forward selection through the AIC and the BIC yields different parsimonious models. We select the final model given by forward selection through the score test.
Thus, the final model for these data is
$$\log E(y) = \beta_0 + \beta_2 x_2 + \beta_3 x_3.$$
Example 2: The Coronary Heart Disease Dataset.
The data presented here consist of 50 observations (Rousseauw et al. [45]) from a retrospective sample of 3357 males in a coronary heart disease high-risk region of the Western Cape, South Africa. The response variable $y$ is coronary heart disease status, coded 1 for a case and 0 for a control. There are nine covariates, namely, $x_1$: systolic blood pressure; $x_2$: cumulative tobacco (kg); $x_3$: low-density lipoprotein cholesterol; $x_4$: adiposity; $x_5$: family history of heart disease; $x_6$: type-A behavior; $x_7$: obesity; $x_8$: current alcohol consumption; and $x_9$: age at onset.
We consider a logistic regression model. Thus, the full model considered here for model selection is
$$\log\frac{E(y)}{1 - E(y)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_9 x_9.$$
In Table 11, we list the variables that enter the model at each step of the forward selection procedure using the score test, the Wald test, the LRT, the AIC, and the BIC.
Table 11 shows that, of the nine covariates, two (low-density lipoprotein cholesterol and family history of heart disease) are found significant by the forward selection procedure through the score test and the Wald test. However, forward selection through the LRT and the BIC yields different parsimonious models.
Thus, the final model for these data is
$$\log\frac{E(y)}{1 - E(y)} = \beta_0 + \beta_3 x_3 + \beta_5 x_5.$$

6. Discussion

In this paper, we first develop a score test procedure for testing the significance of a single covariate in generalized linear models, which encompass a range of symmetric and asymmetric probability distributions. Through extensive simulations, this score test is compared with the Wald test, the likelihood ratio test, and the F test.
The F test does well for data from the normal distribution. For data from Poisson and binomial distributions, the score test performs best.
Next, we conducted a comparative study of the level and power of several model selection procedures: forward selection, the AIC, and the BIC. The other two procedures, backward elimination and stepwise selection, are not included in our study, as in practice they tend to produce final models similar to those obtained by forward selection. Although these model selection procedures are well known, we provide a brief description of them in this paper for the reader's convenience.
The F test is well known, and since it performs best for normally distributed data, it is used for model selection with data from this distribution. The score test performs best for data from the Poisson and binomial distributions, and it has a very simple form. So, for data from these distributions, the score test is recommended for model selection.
Simulation studies show that the forward selection procedure using the score test performs best in terms of the level and power for data from all three distributions, although model selection using the F test performs very well for normally distributed data.
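The forward selection procedure discussed above can be sketched for the normal linear model. There, the score statistic $S_{Nj}$ of Appendix A reduces to a closed form in the least-squares residuals $r$ of the current model: $S = (r'x_j)^2/[\hat\sigma^2\, x_j'(I - H)x_j]$, with $\hat\sigma^2 = r'r/n$ and $H$ the hat matrix of the current design. At each step, the candidate with the largest score statistic enters if its $\chi^2_1$ p-value is below $\alpha$. The function names and stopping rule below are our illustrative assumptions, not the authors' code.

```python
import math
import numpy as np

def score_stat_normal(y, X_cur, x_new):
    """Score statistic S_Nj for adding x_new to a normal linear model with design X_cur."""
    coef, *_ = np.linalg.lstsq(X_cur, y, rcond=None)
    r = y - X_cur @ coef                       # residuals under H0 (x_new excluded)
    sigma2 = r @ r / len(y)                    # MLE of sigma^2 under H0
    g, *_ = np.linalg.lstsq(X_cur, x_new, rcond=None)
    x_perp = x_new - X_cur @ g                 # (I - H) x_new
    return (r @ x_new) ** 2 / (sigma2 * (x_perp @ x_perp))

def chi2_1_pvalue(s):
    """Upper-tail p-value of a chi-square(1) statistic."""
    return math.erfc(math.sqrt(s / 2.0))

def forward_select_score(y, X, alpha=0.05):
    """Start from the intercept-only model and repeatedly add the candidate with the
    largest score statistic while its chi-square(1) p-value is below alpha."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        X_cur = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        stats = {j: score_stat_normal(y, X_cur, X[:, j]) for j in remaining}
        best = max(stats, key=stats.get)
        if chi2_1_pvalue(stats[best]) >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 5))
y = 1.0 + 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.standard_normal(n)
print(forward_select_score(y, X))
```

Replacing `score_stat_normal` with a Poisson or binomial score statistic from Appendix A, a Wald statistic, or an AIC/BIC comparison gives the other forward selection variants compared in this paper.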
The development of the score test procedure for testing the significance of a single covariate and, subsequently, using it in model selection, is extended to over-dispersed Poisson and over-dispersed binomial models, specifically for the negative binomial and beta-binomial models.

7. Conclusions

Our recommendation is to use the forward selection procedure with the F test for normal regression models. For Poisson, binomial, and negative binomial regression models, we recommend the forward selection procedure with the score test for small to moderate sample sizes. For large samples ($n > 50$), the BIC procedure is recommended, as it is computationally much simpler. For the beta-binomial regression model, we recommend the forward selection procedure with the Wald test for moderate sample sizes and the BIC for large sample sizes.

Author Contributions

Conceptualization, S.P.; methodology, A.M. and S.P.; formal analysis, A.M.; writing—original draft preparation, S.P. and A.M.; writing—review and editing, S.P. and A.M.; supervision, S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada, grant account number 875700, awarded to Sudhir Paul at the University of Windsor.

Data Availability Statement

Datasets are available in Appendix D.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike information criterion
BB: beta-binomial
BIC: Bayesian information criterion
GLM: generalized linear model
LRT: likelihood ratio test
OLS: ordinary least squares (ordinary linear regression model)

Appendix A. Derivation of the Score Test Statistic

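Suppose that $\delta = \beta_j$ and $\theta = (\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p)'$. We define the partial derivatives of the log-likelihood evaluated at $\delta = 0$ as
$$\psi = \left.\frac{\partial l}{\partial\delta}\right|_{\delta=0} = \left.\frac{\partial l}{\partial\beta_j}\right|_{\delta=0} \quad\text{and}\quad \gamma = \left.\frac{\partial l}{\partial\theta}\right|_{\delta=0} = \left(\frac{\partial l}{\partial\beta_0}, \frac{\partial l}{\partial\beta_1}, \ldots, \frac{\partial l}{\partial\beta_{j-1}}, \frac{\partial l}{\partial\beta_{j+1}}, \ldots, \frac{\partial l}{\partial\beta_p}\right)'\Bigg|_{\delta=0}.$$
The $C(\alpha)$ test is based on the adjusted score $S = \psi - AB^{-1}\gamma$, where $AB^{-1}$ is the vector of partial regression coefficients obtained by regressing $\psi$ on $\gamma$. The variance of $S$ is $D - AB^{-1}A'$, where:
- $D = E(-\partial^2 l/\partial\beta_j^2)|_{\delta=0}$;
- $A = E(-\partial^2 l/\partial\beta_j\partial\beta_k)|_{\delta=0}$ ($k \ne j$), which is a $1 \times p$ vector; and
- $B = E(-\partial^2 l/\partial\beta_k\partial\beta_t)|_{\delta=0}$ ($k, t \ne j$), which is a $p \times p$ matrix.

We then replace $\theta$ in $S$, $A$, $B$, and $D$ with its maximum likelihood estimate $\hat\theta$ under $H_0$. Because $\gamma = 0$ at $\hat\theta$, the adjusted score reduces to $\psi$ evaluated at the null estimates. The $C(\alpha)$ statistic then takes the form $S_j = S'(D - AB^{-1}A')^{-1}S$, which is approximately distributed as chi-squared with 1 degree of freedom under $H_0$.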
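Now, we define
$$w_i = \left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2 V_i^{-1}, \quad W = \mathrm{diag}(w_1, \ldots, w_n)\big|_{\beta_j=0}, \quad x_j = (x_{1j}, \ldots, x_{nj})',$$
and
$$X_j = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1(j-1)} & x_{1(j+1)} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2(j-1)} & x_{2(j+1)} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n(j-1)} & x_{n(j+1)} & \cdots & x_{np} \end{pmatrix}.$$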
Then
$$P_j = \left.\frac{\partial l}{\partial\beta_j}\right|_{\beta_j=0} = \sum_{i=1}^n w_i(y_i - \mu_i)\frac{\partial\eta_i}{\partial\mu_i}x_{ij}\bigg|_{\beta_j=0}, \qquad D = \sum_{i=1}^n w_i x_{ij}^2\bigg|_{\beta_j=0} = x_j'Wx_j,$$
$$A = \sum_{i=1}^n x_{ij}w_i\left(1, x_{i1}, \ldots, x_{i(j-1)}, x_{i(j+1)}, \ldots, x_{ip}\right)\bigg|_{\beta_j=0} = x_j'WX_j, \qquad B = X_j'WX_j,$$
and
$$D - AB^{-1}A' = x_j'W\left[I_n - X_j(X_j'WX_j)^{-1}X_j'W\right]x_j.$$
Substitute $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ with their MLEs under the null hypothesis. Then the score test statistic is
$$S_j = \frac{\hat P_j^2}{x_j'\hat W\left[I_n - X_j(X_j'\hat WX_j)^{-1}X_j'\hat W\right]x_j}.$$
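As a concrete instance, consider the Poisson case. With the log link, $w_i = \lambda_i$ and $P_j = \sum_i(y_i - \lambda_i)x_{ij}$, as in special case (ii) below, so $S_j$ takes only a few lines of NumPy. The sketch below is our own illustration, not the authors' implementation. It fits the null model by iteratively reweighted least squares and assumes that the first column of the null design is the intercept. The function names are hypothetical.

```python
import math
import numpy as np

def poisson_irls(y, X, iters=50, tol=1e-10):
    """Poisson log-link MLE by iteratively reweighted least squares.
    Assumes the first column of X is the intercept and mean(y) > 0."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                      # start from the intercept-only fit
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                     # working response
        beta_new = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def poisson_score_test(y, X_null, x_j):
    """S_j = P_j^2 / (x_j' W [I - X (X'WX)^{-1} X'W] x_j), W = diag(lambda_hat) under H0."""
    lam = np.exp(X_null @ poisson_irls(y, X_null))  # fitted means with beta_j = 0
    P = np.sum((y - lam) * x_j)                     # P_j for the log link
    WX = lam[:, None] * X_null
    Wx = lam * x_j
    denom = x_j @ Wx - Wx @ X_null @ np.linalg.solve(X_null.T @ WX, X_null.T @ Wx)
    return P ** 2 / denom

# Example: test H0: beta_2 = 0 when x2 has no effect.
rng = np.random.default_rng(7)
n = 50
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = rng.poisson(np.exp(0.5 + 0.3 * x1))
s = poisson_score_test(y, np.column_stack([np.ones(n), x1]), x2)
p_value = math.erfc(math.sqrt(s / 2.0))             # chi-square(1) upper tail
print(round(s, 3), round(p_value, 3))
```

Only the null model is fitted, which is the practical appeal of the score test in forward selection: one fit per step serves every candidate covariate.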
The above score test can also be obtained from [30]. For testing that a subset of $q$ regression parameters equals zero, Pregibon [30] obtains the score test
$$PS_q = s'X_q\left(X_q'W^{1/2}M_pW^{1/2}X_q\right)^{-1}X_q's,$$
where $M_p = I - W^{1/2}X_p(X_p'WX_p)^{-1}X_p'W^{1/2}$, $W = W^{1/2}W^{1/2}$, and $S = s'X_q$.
Using $q = 1$ (so that $X_q = x_j$), the score test becomes $PS_1 = S\left(x_j'W^{1/2}M_pW^{1/2}x_j\right)^{-1}S$.
Now
$$\begin{aligned} x_j'W^{1/2}M_pW^{1/2}x_j &= x_j'W^{1/2}\left[I - W^{1/2}X_p(X_p'WX_p)^{-1}X_p'W^{1/2}\right]W^{1/2}x_j \\ &= x_j'W^{1/2}W^{1/2}x_j - x_j'W^{1/2}W^{1/2}X_p(X_p'WX_p)^{-1}X_p'W^{1/2}W^{1/2}x_j \\ &= x_j'Wx_j - x_j'WX_p(X_p'WX_p)^{-1}X_p'Wx_j \\ &= x_j'W\left[I_n - X_p(X_p'WX_p)^{-1}X_p'W\right]x_j. \end{aligned}$$
Therefore,
$$PS_1 = S\left(x_j'W^{1/2}M_pW^{1/2}x_j\right)^{-1}S = \frac{S^2}{x_j'W\left[I_n - X_p(X_p'WX_p)^{-1}X_p'W\right]x_j},$$
which, after replacing $S$ with $\hat S$ and taking $X_p = X_j$, is identical to $S_j$.
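The matrix identity in the last step can be confirmed numerically. The sketch below is our illustration, using random positive weights and a random design rather than any fitted model. It evaluates Pregibon's form $x_j'W^{1/2}M_pW^{1/2}x_j$ and the $C(\alpha)$ form $x_j'W[I_n - X_p(X_p'WX_p)^{-1}X_p'W]x_j$ side by side.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 3
X_p = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])  # null design
x_j = rng.standard_normal(n)                                          # tested covariate
w = rng.uniform(0.5, 2.0, n)                                          # positive GLM weights
W, W_half = np.diag(w), np.diag(np.sqrt(w))
XtWX_inv = np.linalg.inv(X_p.T @ W @ X_p)

# Pregibon form: x_j' W^{1/2} M_p W^{1/2} x_j
M_p = np.eye(n) - W_half @ X_p @ XtWX_inv @ X_p.T @ W_half
lhs = x_j @ W_half @ M_p @ W_half @ x_j

# C(alpha) form: x_j' W [I - X_p (X_p' W X_p)^{-1} X_p' W] x_j
rhs = x_j @ W @ (np.eye(n) - X_p @ XtWX_inv @ X_p.T @ W) @ x_j
print(lhs, rhs)
```

Both quantities are positive whenever $x_j$ is not in the column space of $X_p$, because $M_p$ is an orthogonal projection.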
Special Cases:
Expressions for the likelihood ratio, Wald, and score statistics ($LR_j$, $W_j$, and $S_j$) are given below for the special cases in which the data are normal, Poisson, and binomial.
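(i)
For the $N(\mu, \sigma^2)$ distribution with link function $\eta_i = \mu_i$, these statistics are
$$LR_{Nj} = \frac{1}{\sigma^2}\left[\sum_{i=1}^n(y_i - \hat\mu_i)^2 - \sum_{i=1}^n(y_i - \tilde\mu_i)^2\right], \qquad W_{Nj} = \tilde\beta_j\left(\frac{1}{\tilde\sigma^2}\sum_{i=1}^n x_{ij}^2\right)^{1/2},$$
and
$$S_{Nj} = \frac{\left[\sum_{i=1}^n\frac{y_i - \hat\mu_i}{\hat\sigma^2}x_{ij}\right]^2}{x_j'W\left[I - X_j(X_j'WX_j)^{-1}X_j'W\right]x_j},$$
where $\tilde\mu_i = \tilde\beta_0 + \tilde\beta_1x_{i1} + \cdots + \tilde\beta_px_{ip}$, $\hat\mu_i = \hat\beta_0 + \hat\beta_1x_{i1} + \cdots + \hat\beta_{j-1}x_{i(j-1)} + \hat\beta_{j+1}x_{i(j+1)} + \cdots + \hat\beta_px_{ip}$, $W = \mathrm{diag}(1/\hat\sigma^2, \ldots, 1/\hat\sigma^2)$, and $\hat\sigma^2 = \sum_{i=1}^n(y_i - \hat\mu_i)^2/n$.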
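(ii)
For the Poisson$(\lambda)$ distribution, the link function is $\eta_i = \log(\lambda_i)$. After derivation and simplification, we obtain the corresponding test statistics for Poisson-distributed data as
$$LR_{Pj} = 2\left[\sum_{i=1}^n(y_i\log\tilde\lambda_i - \tilde\lambda_i) - \sum_{i=1}^n(y_i\log\hat\lambda_i - \hat\lambda_i)\right], \qquad W_{Pj} = \tilde\beta_j\left(\sum_{i=1}^n\tilde\lambda_ix_{ij}^2\right)^{1/2},$$
and
$$S_{Pj} = \frac{\left[\sum_{i=1}^n(y_i - \hat\lambda_i)x_{ij}\right]^2}{x_j'W\left[I - X_j(X_j'WX_j)^{-1}X_j'W\right]x_j},$$
where $\tilde\lambda_i = \exp(\tilde\beta_0 + \tilde\beta_1x_{i1} + \cdots + \tilde\beta_px_{ip})$, $\hat\lambda_i = \exp(\hat\beta_0 + \hat\beta_1x_{i1} + \cdots + \hat\beta_{j-1}x_{i(j-1)} + \hat\beta_{j+1}x_{i(j+1)} + \cdots + \hat\beta_px_{ip})$, and $W = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_n)$.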
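(iii)
Finally, for the binomial$(m, p)$ distribution with the link function $\eta_i = \log[p_i/(1-p_i)]$, the corresponding statistics are
$$LR_{Bj} = 2\left\{\sum_{i=1}^n\left[y_i\log\frac{\tilde p_i}{1-\tilde p_i} + m_i\log(1-\tilde p_i)\right] - \sum_{i=1}^n\left[y_i\log\frac{\hat p_i}{1-\hat p_i} + m_i\log(1-\hat p_i)\right]\right\}, \qquad W_{Bj} = \tilde\beta_j\left(\sum_{i=1}^n m_i\tilde p_i(1-\tilde p_i)x_{ij}^2\right)^{1/2},$$
and
$$S_{Bj} = \frac{\left[\sum_{i=1}^n(y_i - m_i\hat p_i)x_{ij}\right]^2}{x_j'W\left[I - X_j(X_j'WX_j)^{-1}X_j'W\right]x_j},$$
where $\tilde p_i/(1-\tilde p_i) = \exp(\tilde\beta_0 + \tilde\beta_1x_{i1} + \cdots + \tilde\beta_px_{ip})$, $\hat p_i/(1-\hat p_i) = \exp(\hat\beta_0 + \hat\beta_1x_{i1} + \cdots + \hat\beta_{j-1}x_{i(j-1)} + \hat\beta_{j+1}x_{i(j+1)} + \cdots + \hat\beta_px_{ip})$, and $W = \mathrm{diag}\{m_1\hat p_1(1-\hat p_1), \ldots, m_n\hat p_n(1-\hat p_n)\}$.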

Appendix B. First- and Second-Order Partial Derivatives of the Log-likelihood of the Negative Binomial Regression Model with Respect to Parameters β and c

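$$\frac{\partial l}{\partial\beta_j} = \sum_{i=1}^n\frac{y_i - m_i}{1 + cm_i}x_{ij}, \qquad \frac{\partial l}{\partial c} = \sum_{i=1}^n\left[\frac{\log(1+cm_i)}{c^2} - \frac{m_i(y_i + c^{-1})}{1+cm_i} + \sum_{l=1}^{y_i}\frac{l-1}{1+c(l-1)}\right],$$
$$\frac{\partial^2 l}{\partial\beta_j\partial\beta_k} = -\sum_{i=1}^n\left[\frac{y_i + 2cm_iy_i - cm_i^2}{(1+cm_i)^2} - \frac{y_i - m_i}{1+cm_i}\right]x_{ij}x_{ik} = -\sum_{i=1}^n\frac{m_i(1 + cy_i)}{(1+cm_i)^2}x_{ij}x_{ik},$$
$$\frac{\partial^2 l}{\partial\beta_j\partial c} = -\sum_{i=1}^n\frac{m_i(y_i - m_i)}{(1+cm_i)^2}x_{ij},$$
and
$$\frac{\partial^2 l}{\partial c^2} = -\sum_{i=1}^n\left[\sum_{l=1}^{y_i}\left(\frac{l-1}{1+c(l-1)}\right)^2 + \frac{2}{c^3}\log(1+cm_i) - \frac{2m_i}{c^2(1+cm_i)} - \frac{(y_i + c^{-1})m_i^2}{(1+cm_i)^2}\right].$$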
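A finite-difference check is a cheap way to confirm derivative expressions like these. The sketch below makes two assumptions: the log link $m_i = \exp(x_i'\beta)$ and the negative binomial log-likelihood $l = \sum_i[\sum_{l=1}^{y_i}\log(1 + c(l-1)) + y_i\log m_i - (y_i + c^{-1})\log(1 + cm_i) - \log y_i!]$. The expressions above follow from that form. The data, the parameter values, and the function names are hypothetical.

```python
import math
import numpy as np

def nb_loglik(beta, c, y, X):
    """NB log-likelihood, log link m_i = exp(x_i'beta), var(y_i) = m_i + c * m_i**2."""
    m = np.exp(X @ beta)
    total = 0.0
    for yi, mi in zip(y, m):
        total += (np.log1p(c * np.arange(yi)).sum()    # sum_{l=1}^{y_i} log(1 + c(l - 1))
                  + yi * np.log(mi) - (yi + 1.0 / c) * np.log1p(c * mi)
                  - math.lgamma(yi + 1))
    return total

def nb_dl_dbeta(beta, c, y, X):
    """dl/dbeta_j = sum (y_i - m_i) x_ij / (1 + c m_i)."""
    m = np.exp(X @ beta)
    return X.T @ ((y - m) / (1 + c * m))

def nb_dl_dc(beta, c, y, X):
    """dl/dc as written above."""
    m = np.exp(X @ beta)
    total = 0.0
    for yi, mi in zip(y, m):
        lm1 = np.arange(yi)                               # l - 1 = 0, ..., y_i - 1
        total += (np.log1p(c * mi) / c**2 - mi * (yi + 1.0 / c) / (1 + c * mi)
                  + (lm1 / (1 + c * lm1)).sum())
    return total

def nb_hessian_beta(beta, c, y, X):
    """Simplified form: -sum m_i (1 + c y_i) / (1 + c m_i)^2 x_i x_i'."""
    m = np.exp(X @ beta)
    return -(X.T * (m * (1 + c * y) / (1 + c * m) ** 2)) @ X

rng = np.random.default_rng(11)
n = 30
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.poisson(2.0, size=n)
beta, c, h = np.array([0.5, 0.2]), 0.4, 1e-6
E = np.eye(2)

# Central differences of the log-likelihood and of the analytic gradient.
num_grad = np.array([(nb_loglik(beta + h * e, c, y, X)
                      - nb_loglik(beta - h * e, c, y, X)) / (2 * h) for e in E])
num_dc = (nb_loglik(beta, c + h, y, X) - nb_loglik(beta, c - h, y, X)) / (2 * h)
num_hess = np.column_stack([(nb_dl_dbeta(beta + h * e, c, y, X)
                             - nb_dl_dbeta(beta - h * e, c, y, X)) / (2 * h) for e in E])

print(np.max(np.abs(num_grad - nb_dl_dbeta(beta, c, y, X))),
      abs(num_dc - nb_dl_dc(beta, c, y, X)),
      np.max(np.abs(num_hess - nb_hessian_beta(beta, c, y, X))))
```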

Appendix C. First- and Second-Order Partial Derivatives of the Log-likelihood of the Beta-Binomial Regression Model with Respect to Parameters β and θ, and the Denominator Term V_j of the Score Test in Section 4.3.1

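$$\frac{\partial l}{\partial\beta_j} = \sum_{i=1}^n\left[\sum_{r=0}^{y_i-1}\frac{1}{\mu_i + r\theta} - \sum_{r=0}^{k_i-y_i-1}\frac{1}{1-\mu_i+r\theta}\right]\mu_i(1-\mu_i)x_{ij},$$
$$\frac{\partial l}{\partial\theta} = \sum_{i=1}^n\left[\sum_{r=0}^{y_i-1}\frac{r}{\mu_i+r\theta} + \sum_{r=0}^{k_i-y_i-1}\frac{r}{1-\mu_i+r\theta} - \sum_{r=0}^{k_i-1}\frac{r}{1+r\theta}\right],$$
$$\frac{\partial^2 l}{\partial\beta_j\partial\beta_k} = -\sum_{i=1}^n\left[\sum_{r=0}^{y_i-1}\frac{1}{(\mu_i+r\theta)^2} + \sum_{r=0}^{k_i-y_i-1}\frac{1}{(1-\mu_i+r\theta)^2}\right]\mu_i^2(1-\mu_i)^2x_{ij}x_{ik} + \sum_{i=1}^n\left[\sum_{r=0}^{y_i-1}\frac{1}{\mu_i+r\theta} - \sum_{r=0}^{k_i-y_i-1}\frac{1}{1-\mu_i+r\theta}\right]\mu_i(1-\mu_i)(1-2\mu_i)x_{ij}x_{ik},$$
$$\frac{\partial^2 l}{\partial\beta_j\partial\theta} = \sum_{i=1}^n\left[-\sum_{r=0}^{y_i-1}\frac{r}{(\mu_i+r\theta)^2} + \sum_{r=0}^{k_i-y_i-1}\frac{r}{(1-\mu_i+r\theta)^2}\right]\mu_i(1-\mu_i)x_{ij},$$
and
$$\frac{\partial^2 l}{\partial\theta^2} = \sum_{i=1}^n\left[-\sum_{r=0}^{y_i-1}\frac{r^2}{(\mu_i+r\theta)^2} - \sum_{r=0}^{k_i-y_i-1}\frac{r^2}{(1-\mu_i+r\theta)^2} + \sum_{r=0}^{k_i-1}\frac{r^2}{(1+r\theta)^2}\right].$$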
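The $\theta$-derivatives can be checked by central differences in the same way, holding $\mu$ fixed, since these are partial derivatives at fixed $\beta$. The sketch below is our illustration. The counts, the means, and the value $\theta = 0.25$ are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def bb_loglik_mu(mu, theta, y, k):
    """BB log-likelihood (binomial constant dropped) as a function of the mean vector mu."""
    return sum(np.log(m + np.arange(yi) * theta).sum()
               + np.log(1 - m + np.arange(ki - yi) * theta).sum()
               - np.log(1 + np.arange(ki) * theta).sum()
               for yi, ki, m in zip(y, k, mu))

def bb_dl_dtheta(mu, theta, y, k):
    """dl/dtheta as written above."""
    total = 0.0
    for yi, ki, m in zip(y, k, mu):
        r1, r2, r3 = np.arange(yi), np.arange(ki - yi), np.arange(ki)
        total += ((r1 / (m + r1 * theta)).sum() + (r2 / (1 - m + r2 * theta)).sum()
                  - (r3 / (1 + r3 * theta)).sum())
    return total

def bb_d2l_dtheta2(mu, theta, y, k):
    """d2l/dtheta2 as written above."""
    total = 0.0
    for yi, ki, m in zip(y, k, mu):
        r1, r2, r3 = np.arange(yi), np.arange(ki - yi), np.arange(ki)
        total += (-(r1**2 / (m + r1 * theta) ** 2).sum()
                  - (r2**2 / (1 - m + r2 * theta) ** 2).sum()
                  + (r3**2 / (1 + r3 * theta) ** 2).sum())
    return total

y, k = np.array([3, 0, 7, 5, 10]), np.array([10, 10, 10, 10, 10])
mu, theta, h = np.array([0.3, 0.1, 0.6, 0.5, 0.8]), 0.25, 1e-5
num1 = (bb_loglik_mu(mu, theta + h, y, k) - bb_loglik_mu(mu, theta - h, y, k)) / (2 * h)
num2 = (bb_dl_dtheta(mu, theta + h, y, k) - bb_dl_dtheta(mu, theta - h, y, k)) / (2 * h)
print(num1 - bb_dl_dtheta(mu, theta, y, k), num2 - bb_d2l_dtheta2(mu, theta, y, k))
```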
The denominator term of the score test in Section 4.3.1 is
$$V_j = x_j'\left[W - \left(W - a^{-1}UU'\right)X_jV_1^{-1}X_j'W - \left\{I - WX_j(X_j'WX_j)^{-1}X_j'\right\}UV_2^{-1}U'\right]x_j,$$
where $W = \mathrm{diag}(w_1, \ldots, w_n)$, $U = (u_1, \ldots, u_n)'$,
$$w_i = (p_{1i} + p_{2i})\mu_i^2(1-\mu_i)^2, \qquad u_i = \frac{1}{\theta}\left[-\mu_ip_{1i} + (1-\mu_i)p_{2i}\right]\mu_i(1-\mu_i),$$
$$V_1 = X_j'\left(W - a^{-1}UU'\right)X_j, \qquad V_2 = a - U'X_j(X_j'WX_j)^{-1}X_j'U, \qquad a = \frac{1}{\theta^2}\sum_{i=1}^n\left[\mu_i^2p_{1i} + (1-\mu_i)^2p_{2i} - p_{3i}\right],$$
$$p_{1i} = \sum_{r=1}^{k_i}\frac{\Pr(y_i \ge r)}{[\mu_i + (r-1)\theta]^2}, \qquad p_{2i} = \sum_{r=1}^{k_i}\frac{\Pr(y_i \le k_i - r)}{[1 - \mu_i + (r-1)\theta]^2}, \qquad\text{and}\qquad p_{3i} = \sum_{r=1}^{k_i}\frac{1}{[1 + (r-1)\theta]^2}.$$
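This expression is easy to mistype, so a numerical check against its definition is useful. $V_j$ should equal the efficient information for $\beta_j$. That is the Schur complement of the nuisance block $(\beta_{-j}, \theta)$ in the expected information matrix, whose blocks are $x_j'Wx_j$, $x_j'WX_j$, $x_j'U$, $X_j'WX_j$, $X_j'U$, and $a$. The sketch below compares the two. It is our illustration and uses arbitrary positive weights and an arbitrary $U$ rather than actual beta-binomial values. Choosing $a > U'W^{-1}U$ keeps the information matrix positive definite.

```python
import numpy as np

def v_j_formula(x_j, X_j, w, U, a):
    """V_j as written above (U is an n-vector, a is a scalar)."""
    n = len(x_j)
    W = np.diag(w)
    W_adj = W - np.outer(U, U) / a                   # W - a^{-1} U U'
    B_inv = np.linalg.inv(X_j.T @ W @ X_j)           # (X_j' W X_j)^{-1}
    V1 = X_j.T @ W_adj @ X_j
    V2 = a - U @ X_j @ B_inv @ X_j.T @ U             # scalar
    term1 = W_adj @ X_j @ np.linalg.inv(V1) @ X_j.T @ W
    term2 = np.outer((np.eye(n) - W @ X_j @ B_inv @ X_j.T) @ U, U) / V2
    return x_j @ (W - term1 - term2) @ x_j

def v_j_schur(x_j, X_j, w, U, a):
    """Efficient information I_jj - I_jN I_NN^{-1} I_Nj with nuisance N = (beta_-j, theta)."""
    W = np.diag(w)
    I_NN = np.block([[X_j.T @ W @ X_j, (X_j.T @ U)[:, None]],
                     [(U @ X_j)[None, :], np.array([[a]])]])
    I_jN = np.concatenate([x_j @ W @ X_j, [x_j @ U]])
    return x_j @ W @ x_j - I_jN @ np.linalg.solve(I_NN, I_jN)

rng = np.random.default_rng(5)
n = 15
X_j = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
x_j = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, n)
U = rng.standard_normal(n)
a = U @ (U / w) + 1.0              # a > U'W^{-1}U keeps the information matrix positive definite
print(v_j_formula(x_j, X_j, w, U, a), v_j_schur(x_j, X_j, w, U, a))
```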

Appendix D. Dataset

Table A1. Example 1 dataset: the lower respiratory illness count.
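| LRI ($y$) | Risk ($x_1$) | Passive ($x_2$) | Crowding ($x_3$) | Race ($x_4$) | Socioeconomic status: low ($x_5$) | Socioeconomic status: medium ($x_6$) | Age < 4 ($x_7$) | Age 4–6 ($x_8$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 45 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 34 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0 | 38 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 44 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4 | 30 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 42 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 11 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 0 | 38 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 40 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 37 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 42 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 5 | 35 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 38 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 41 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 27 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1 | 31 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 41 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 39 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 2 | 23 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 43 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 36 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 7 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 41 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 3 | 37 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 1 | 30 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 4 | 38 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0 | 31 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 4 | 39 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0 | 29 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 0 | 40 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 35 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 38 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 0 | 36 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0 | 5 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 3 | 40 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 0 | 14 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 27 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0 | 40 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 33 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 29 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0 | 43 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 37 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 36 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0 | 43 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 2 | 37 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 44 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 18 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0 | 43 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |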
Table A2. Example 2 dataset: coronary heart disease.
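| CHD ($y$) | sbp ($x_1$) | Tobacco ($x_2$) | ldl ($x_3$) | Adiposity ($x_4$) | Famhist ($x_5$) | Typea ($x_6$) | Obesity ($x_7$) | Alcohol ($x_8$) | Age ($x_9$) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 118 | 1.62 | 9.01 | 21.70 | Absent | 59 | 25.89 | 21.19 | 40 |
| 0 | 162 | 2.92 | 3.63 | 31.33 | Absent | 62 | 31.59 | 18.51 | 42 |
| 0 | 124 | 0.61 | 2.69 | 17.15 | Present | 61 | 22.76 | 11.55 | 20 |
| 1 | 134 | 1.10 | 3.54 | 20.41 | Present | 58 | 24.54 | 39.91 | 39 |
| 1 | 154 | 2.40 | 5.63 | 42.17 | Present | 59 | 35.07 | 12.86 | 50 |
| 0 | 136 | 1.36 | 3.16 | 14.97 | Present | 56 | 24.98 | 7.30 | 24 |
| 1 | 130 | 0.08 | 5.59 | 25.42 | Present | 50 | 24.98 | 6.27 | 43 |
| 0 | 128 | 0.73 | 3.97 | 23.52 | Absent | 54 | 23.81 | 19.20 | 64 |
| 0 | 112 | 1.44 | 2.71 | 22.92 | Absent | 59 | 24.81 | 0.00 | 52 |
| 0 | 132 | 0.10 | 3.28 | 10.73 | Absent | 73 | 20.42 | 0.00 | 17 |
| 0 | 120 | 0.00 | 2.42 | 16.66 | Absent | 46 | 20.16 | 0.00 | 17 |
| 0 | 128 | 0.40 | 6.17 | 26.35 | Absent | 64 | 27.86 | 11.11 | 34 |
| 0 | 124 | 1.80 | 3.74 | 16.64 | Present | 42 | 22.26 | 10.49 | 20 |
| 0 | 158 | 13.50 | 5.04 | 30.79 | Absent | 54 | 24.79 | 21.50 | 62 |
| 0 | 128 | 0.00 | 3.22 | 26.55 | Present | 39 | 26.59 | 16.71 | 49 |
| 1 | 148 | 8.20 | 7.75 | 34.46 | Present | 46 | 26.53 | 6.04 | 64 |
| 1 | 174 | 3.50 | 5.26 | 21.97 | Present | 36 | 22.04 | 8.33 | 59 |
| 0 | 152 | 10.10 | 4.71 | 24.65 | Present | 65 | 26.21 | 24.53 | 57 |
| 1 | 122 | 4.18 | 9.05 | 29.27 | Present | 44 | 24.05 | 19.34 | 52 |
| 1 | 110 | 2.35 | 3.36 | 26.72 | Present | 54 | 26.08 | 109.80 | 58 |
| 0 | 123 | 0.05 | 4.61 | 13.69 | Absent | 51 | 23.23 | 2.78 | 16 |
| 1 | 134 | 8.08 | 1.55 | 17.50 | Present | 56 | 22.65 | 66.65 | 31 |
| 1 | 132 | 12.30 | 5.96 | 32.79 | Present | 57 | 30.12 | 21.50 | 62 |
| 1 | 168 | 9.00 | 8.53 | 24.48 | Present | 69 | 26.18 | 4.63 | 54 |
| 0 | 194 | 2.55 | 6.89 | 33.88 | Present | 69 | 29.33 | 0.00 | 41 |
| 0 | 110 | 4.64 | 4.55 | 30.46 | Absent | 48 | 30.90 | 15.22 | 46 |
| 0 | 130 | 4.00 | 2.40 | 17.42 | Absent | 60 | 22.05 | 0.00 | 40 |
| 0 | 124 | 0.00 | 3.04 | 17.33 | Absent | 49 | 22.04 | 0.00 | 18 |
| 1 | 176 | 6.00 | 3.98 | 17.20 | Present | 52 | 21.07 | 4.11 | 61 |
| 0 | 130 | 4.50 | 5.86 | 37.43 | Absent | 61 | 31.21 | 32.30 | 58 |
| 0 | 114 | 0.00 | 2.99 | 9.74 | Absent | 54 | 46.58 | 0.00 | 17 |
| 0 | 176 | 5.76 | 4.89 | 26.10 | Present | 46 | 27.30 | 19.44 | 57 |
| 0 | 124 | 4.00 | 6.65 | 30.84 | Present | 54 | 28.40 | 33.51 | 60 |
| 0 | 142 | 7.44 | 5.52 | 33.97 | Absent | 47 | 29.29 | 24.27 | 54 |
| 0 | 148 | 0.00 | 5.32 | 26.71 | Present | 52 | 32.21 | 32.78 | 27 |
| 0 | 114 | 0.00 | 3.83 | 19.40 | Present | 49 | 24.86 | 2.49 | 29 |
| 0 | 140 | 0.00 | 2.40 | 27.89 | Present | 70 | 30.74 | 144.00 | 29 |
| 1 | 124 | 1.60 | 7.22 | 39.68 | Present | 36 | 31.50 | 0.00 | 51 |
| 1 | 164 | 5.60 | 3.17 | 30.98 | Present | 44 | 25.99 | 43.20 | 53 |
| 0 | 162 | 5.60 | 4.24 | 22.53 | Absent | 29 | 22.91 | 5.66 | 60 |
| 0 | 152 | 12.18 | 4.04 | 37.83 | Present | 63 | 34.57 | 4.17 | 64 |
| 0 | 132 | 0.00 | 3.30 | 21.61 | Absent | 42 | 24.92 | 32.61 | 33 |
| 1 | 144 | 0.76 | 10.53 | 35.66 | Absent | 63 | 34.35 | 0.00 | 55 |
| 0 | 118 | 0.08 | 3.48 | 32.28 | Present | 52 | 29.14 | 3.81 | 46 |
| 1 | 134 | 8.80 | 7.41 | 26.84 | Absent | 35 | 29.44 | 29.52 | 60 |
| 1 | 128 | 0.00 | 8.41 | 28.82 | Present | 60 | 26.86 | 0.00 | 59 |
| 0 | 154 | 5.53 | 3.20 | 28.81 | Present | 61 | 26.15 | 42.79 | 42 |
| 0 | 138 | 0.00 | 3.96 | 24.70 | Present | 53 | 23.80 | 0.00 | 45 |
| 0 | 120 | 0.00 | 3.98 | 13.19 | Present | 47 | 21.89 | 0.00 | 16 |
| 0 | 154 | 4.20 | 5.59 | 25.02 | Absent | 58 | 25.02 | 1.54 | 43 |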

References

1. Kadane, J.; Lazar, N. Methods and Criteria for Model Selection. J. Am. Stat. Assoc. 2004, 99, 279–290.
2. Beale, E.M.L. Note on Procedures for Variable Selection in Multiple Regression. Technometrics 1970, 12, 909–914.
3. Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill: New York, NY, USA, 2013.
4. Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
5. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
6. Fan, J. Design-adaptive Nonparametric Regression. J. Am. Stat. Assoc. 1992, 87, 998–1004.
7. Eilers, P.H.C.; Marx, B.D. Flexible Smoothing with B-splines and Penalties. Stat. Sci. 1996, 11, 89–121.
8. Bock, M.E.; Pliego, G. Estimating Functions with Wavelets Part II: Using a Daubechies Wavelet in Nonparametric Regression. Stat. Comput. Stat. Graph. Newsl. 1992, 3, 27–34.
9. Tay, J.K.; Narasimhan, B.; Hastie, T. Elastic Net Regularization Paths for All Generalized Linear Models. J. Stat. Softw. 2023, 106, 1–31.
10. Xia, L.; Nan, B.; Li, Y. Debiased Lasso for Generalized Linear Models with a Diverging Number of Covariates. Biometrics 2023, 79, 344–357.
11. Pavone, F.; Piironen, J.; Bürkner, P.C.; Vehtari, A. Using Reference Models in Variable Selection. Comput. Stat. 2023, 38, 349–371.
12. Mazumder, R.; Radchenko, P.; Dedieu, A. Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is Low. Oper. Res. 2023, 71, 129–147.
13. Cai, T.T.; Guo, Z.; Ma, R. Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes. J. Am. Stat. Assoc. 2023, 118, 1319–1332.
14. Han, Y.; Tsay, R.S.; Wu, W.B. High Dimensional Generalized Linear Models for Temporal Dependent Data. Bernoulli 2023, 29, 105–131.
15. Li, S.; Zhang, L.; Cai, T.T.; Li, H. Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer. J. Am. Stat. Assoc. 2023.
16. Xu, S.; Ferreira, M.A.R.; Porter, E.M.P.; Franck, C.T. Bayesian Model Selection for Generalized Linear Mixed Models. Biometrics 2023, 2023, 1–13.
17. Arnastauskaite, J.; Ruzgas, T.; Bražėnas, M. A New Goodness of Fit Test for Multivariate Normality and Comparative Simulation Study. Mathematics 2021, 9, 3003.
18. Di Noia, A.; Barabesi, L.; Marcheselli, M.; Pisani, C.; Pratelli, L. Goodness-of-fit Test for Count Distributions with Finite Second Moment. J. Nonparametric Stat. 2022, 35, 19–37.
19. Deng, D.; Paul, S.R. Score Tests for Zero-inflation in Generalized Linear Models. Can. J. Stat. 2000, 27, 563–570.
20. Deng, D.; Paul, S.R. Score Tests for Zero-inflation and Over-dispersion in Generalized Linear Models. Stat. Sin. 2005, 15, 257–276.
21. Deng, D.; Paul, S.R. Goodness of Fit of Product Multinomial Regression Models to Sparse Data. Sankhya B 2016, 78, 78–95.
22. Erlemann, R.; Lindqvist, B.H. Conditional Goodness-of-fit Tests for Discrete Distributions. J. Stat. Theory Pract. 2022, 16, 8.
23. Ozonur, D.; Paul, S. Goodness of Fit Tests of the Two-Parameter Gamma Distribution against the Three-Parameter Generalized Gamma Distribution. Commun. Stat.-Simul. Comput. 2022, 51, 687–697.
24. Paul, S.R.; Deng, D. Assessing Goodness of Fit of Generalized Linear Models to Sparse Data using Higher Order Moment Corrections. Sankhya B 2012, 74, 195–210.
25. Rao, C.R. Large Sample Tests of Statistical Hypotheses Concerning Several Parameters with Applications to Problems of Estimation. Proc. Camb. Philos. Soc. 1947, 44, 50–57.
26. Neyman, J. Optimal Asymptotic Tests for Composite Hypothesis. In Probability and Statistics: The Harald Cramér Volume; Grenander, U., Ed.; Wiley: New York, NY, USA, 1959.
27. Rao, C.R. Score Test: Historical Review and Recent Developments. In Advances in Ranking and Selection, Multiple Comparisons, and Reliability-Methodology and Applications; Balakrishnan, N., Kannan, N., Nagaraja, H.N., Eds.; Statistics for Industry and Technology; Springer: Berlin/Heidelberg, Germany, 2005; pp. 3–20.
28. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. A 1972, 135, 370–384.
29. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
30. Pregibon, D. Score Tests in GLIM with Applications. Lect. Notes Stat. 1982, 14, 87–97.
31. Williams, D.A. The Analysis of Binary Responses from Toxicological Experiments Involving Reproduction and Teratogenicity. Biometrics 1975, 31, 949–952.
32. Paul, S.R. Analysis of Proportions of Affected Foetuses in Teratological Experiments. Biometrics 1982, 38, 361–370.
33. Anscombe, F.J. The Statistical Analysis of Insect Counts Based on the Negative Binomial Distribution. Biometrics 1949, 5, 165–173.
34. Bliss, C.I.; Fisher, R.A. Fitting the Negative Binomial Distribution to Biological Data. Biometrics 1953, 9, 176–200.
35. Böhning, D.; Dietz, E.; Schlattmann, P.; Mendonca, L.; Kirchner, U. The Zero-Inflated Poisson Model and the Decayed, Missing and Filled Teeth Index in Dental Epidemiology. J. R. Stat. Soc. Ser. A 1999, 162, 195–209.
36. Margolin, B.H.; Kaplan, N.; Zeiger, E. Statistical Analysis of the Ames Salmonella/microsome Test. Proc. Nat. Acad. Sci. USA 1981, 76, 3779–3783.
37. McCaughran, D.A.; Arnold, D.W. Statistical Models for Numbers of Implantation Sites and Embryonic Deaths in Mice. Toxicol. Appl. Pharmacol. 1976, 38, 325–333.
38. Breslow, N.E. Extra-Poisson Variation in Log-linear Models. Appl. Stat. 1984, 33, 38–44.
39. Engel, J. Models for Response Data Showing Extra-Poisson Variation. Stat. Neerl. 1984, 38, 159–167.
40. Lawless, J.F. Negative Binomial and Mixed Poisson Regression. Can. J. Stat. 1987, 15, 209–225.
41. Margolin, B.H.; Kim, B.S.; Risko, K.J. The Ames Salmonella/microsome Mutagenicity Assay: Issues of Inference and Validation. J. Am. Stat. Assoc. 1989, 84, 651–661.
42. Piegorsch, W.W. Maximum Likelihood Estimation for the Negative Binomial Dispersion Parameter. Biometrics 1990, 46, 863–867.
43. LaVange, L.M.; Keyes, L.L.; Koch, G.G.; Margolis, P.E. Application of Sample Survey Methods for Modelling Ratios to Incidence Densities. Stat. Med. 1994, 13, 343–355.
44. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 1998.
45. Rousseauw, J.; du Plessis, J.; Benade, A.; Jordaan, P.; Kotze, J.; Ferreira, J. Coronary Risk Factor Screening in Three Rural Communities. S. Af. Med. J. 1983, 64, 430–436.
Table 1. Empirical level (EL) and power (in %) of the four test statistics, based on 10,000 replications and α = 0.05 .
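Distribution | Size (n) | Test | EL, β2 = 0.00 | Empirical power, β2 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
Normal | 10 | Score | 7.80 | 7.85 | 7.90 | 8.17 | 8.66 | 9.06 | 9.76 | 10.54 | 11.19 | 12.09 | 13.34
Normal | 10 | Wald | 9.29 | 9.37 | 9.64 | 9.87 | 10.31 | 10.87 | 11.62 | 12.50 | 12.99 | 14.36 | 15.45
Normal | 10 | LRT | 11.57 | 11.63 | 11.89 | 12.19 | 12.92 | 13.26 | 14.38 | 14.98 | 15.49 | 16.87 | 18.61
Normal | 10 | F | 5.15 | 5.28 | 5.45 | 5.55 | 5.70 | 6.12 | 6.66 | 7.46 | 7.86 | 8.53 | 9.64
Normal | 20 | Score | 6.57 | 6.58 | 6.80 | 7.47 | 8.01 | 9.14 | 10.38 | 13.23 | 14.31 | 16.39 | 19.35
Normal | 20 | Wald | 7.17 | 7.11 | 7.45 | 8.11 | 8.68 | 9.96 | 11.19 | 14.36 | 15.34 | 17.58 | 20.46
Normal | 20 | LRT | 7.92 | 7.99 | 8.30 | 8.94 | 9.57 | 11.08 | 12.22 | 15.56 | 16.81 | 19.03 | 22.14
Normal | 20 | F | 5.32 | 5.52 | 5.64 | 6.26 | 6.82 | 7.82 | 8.82 | 11.25 | 12.38 | 14.31 | 17.23
Normal | 30 | Score | 5.96 | 6.01 | 6.45 | 7.73 | 8.67 | 10.77 | 12.85 | 16.16 | 19.26 | 22.98 | 25.66
Normal | 30 | Wald | 6.42 | 6.35 | 6.75 | 8.15 | 9.17 | 11.43 | 13.38 | 16.78 | 20.08 | 23.76 | 26.58
Normal | 30 | LRT | 6.83 | 6.78 | 7.42 | 8.71 | 9.84 | 12.17 | 14.18 | 17.78 | 21.15 | 24.95 | 27.76
Normal | 30 | F | 5.33 | 5.45 | 5.89 | 6.91 | 7.83 | 9.67 | 11.61 | 15.12 | 17.94 | 21.10 | 23.72
Normal | 50 | Score | 5.61 | 6.14 | 6.60 | 8.20 | 11.25 | 14.64 | 17.61 | 22.10 | 28.57 | 34.03 | 40.51
Normal | 50 | Wald | 5.77 | 6.42 | 6.79 | 8.43 | 11.62 | 15.18 | 18.05 | 22.56 | 29.11 | 34.70 | 41.09
Normal | 50 | LRT | 6.09 | 6.75 | 7.09 | 8.80 | 12.05 | 15.71 | 18.48 | 23.25 | 29.88 | 35.45 | 41.96
Normal | 50 | F | 5.28 | 5.60 | 6.19 | 7.77 | 10.46 | 13.84 | 16.70 | 21.12 | 27.56 | 32.80 | 39.32
Poisson | 10 | Score | 5.09 | 5.96 | 7.80 | 11.02 | 17.16 | 22.56 | 30.25 | 37.84 | 47.75 | 54.68 | 62.69
Poisson | 10 | Wald | 4.59 | 5.37 | 7.22 | 10.22 | 16.24 | 21.53 | 29.00 | 36.56 | 46.37 | 53.54 | 61.65
Poisson | 10 | LRT | 5.42 | 6.33 | 8.04 | 11.39 | 16.41 | 23.16 | 30.59 | 38.42 | 48.64 | 55.56 | 63.54
Poisson | 10 | F | 0.09 | 0.14 | 0.22 | 0.28 | 0.32 | 0.40 | 0.53 | 0.73 | 1.05 | 1.13 | 1.41
Poisson | 20 | Score | 4.74 | 6.00 | 11.56 | 19.42 | 31.01 | 44.85 | 58.27 | 70.13 | 79.45 | 87.37 | 91.98
Poisson | 20 | Wald | 4.61 | 5.84 | 11.41 | 19.11 | 30.64 | 44.53 | 57.90 | 69.80 | 79.14 | 87.11 | 91.86
Poisson | 20 | LRT | 4.80 | 6.14 | 11.66 | 19.58 | 31.19 | 45.12 | 58.73 | 70.39 | 79.58 | 87.53 | 92.25
Poisson | 20 | F | 0.02 | 0.00 | 0.01 | 0.04 | 0.04 | 0.14 | 0.22 | 0.57 | 0.97 | 1.24 | 1.98
Poisson | 30 | Score | 4.83 | 8.50 | 15.38 | 27.95 | 45.02 | 62.09 | 76.17 | 86.93 | 93.34 | 96.69 | 98.59
Poisson | 30 | Wald | 4.79 | 8.44 | 15.30 | 27.70 | 44.89 | 61.93 | 76.06 | 86.85 | 93.28 | 96.65 | 98.55
Poisson | 30 | LRT | 4.82 | 8.59 | 15.37 | 28.00 | 45.24 | 62.10 | 76.31 | 87.08 | 93.45 | 96.74 | 98.58
Poisson | 30 | F | 0.01 | 0.01 | 0.01 | 0.01 | 0.12 | 0.23 | 0.31 | 0.80 | 1.66 | 2.20 | 3.70
Poisson | 50 | Score | 4.85 | 9.42 | 22.10 | 44.88 | 66.33 | 83.94 | 93.96 | 97.85 | 99.51 | 99.89 | 99.98
Poisson | 50 | Wald | 4.84 | 9.43 | 22.04 | 44.78 | 66.27 | 83.91 | 93.93 | 97.83 | 99.50 | 99.89 | 99.98
Poisson | 50 | LRT | 4.81 | 9.45 | 22.12 | 44.94 | 66.40 | 84.08 | 93.98 | 97.87 | 99.51 | 99.88 | 99.98
Poisson | 50 | F | 0.00 | 0.00 | 0.01 | 0.02 | 0.11 | 0.30 | 0.96 | 2.45 | 5.05 | 9.43 | 13.82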
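The EL and power entries are Monte Carlo rejection rates: the percentage of simulated datasets in which a test rejects H0: β2 = 0 at α = 0.05. The sketch below illustrates that computation for the score test in the simplest Poisson case, a null model containing only an intercept plus one candidate covariate x. In that case the score statistic has the closed form S = U² / (ȳ Σ(xᵢ − x̄)²), where U = Σ xᵢ(yᵢ − ȳ), referred to a χ²(1) distribution. The design (x uniform on (0, 1)), the intercept value, the seed, the replication count, and all function names are illustrative assumptions, not the authors' simulation settings.

```python
import math
import random

CHI2_1_CRIT_05 = 3.841458820694124  # upper 5% point of the chi-square(1) distribution


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def score_statistic(x, y) -> float:
    """Score test of H0: beta2 = 0 in log(mu) = beta0 + beta2 * x, fitted under H0."""
    n = len(y)
    ybar = sum(y) / n  # MLE of mu under the intercept-only null model
    if ybar == 0:
        return 0.0  # all-zero response: no evidence against H0
    xbar = sum(x) / n
    u = sum(xi * (yi - ybar) for xi, yi in zip(x, y))  # score for beta2 at the null MLE
    info = ybar * sum((xi - xbar) ** 2 for xi in x)  # efficient information for beta2
    return u * u / info


def rejection_rate(n: int, beta0: float, beta2: float, reps: int, seed: int = 2023) -> float:
    """Percentage of simulated datasets in which the score test rejects at alpha = 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [poisson_draw(rng, math.exp(beta0 + beta2 * xi)) for xi in x]
        if score_statistic(x, y) > CHI2_1_CRIT_05:
            rejections += 1
    return 100.0 * rejections / reps


level = rejection_rate(n=50, beta0=1.0, beta2=0.0, reps=2000)  # data generated under H0
power = rejection_rate(n=50, beta0=1.0, beta2=1.0, reps=2000)  # data generated under H1
print(f"empirical level {level:.2f}%, empirical power {power:.2f}%")
```

With β2 = 0 the rejection rate estimates the level; with β2 ≠ 0 it estimates power. Sweeping β2 over a grid such as 0.05, 0.10, ..., 0.50 gives a row with the same shape as those in the table.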
Table 2. Empirical level and power (in %) of the four test statistics in the binomial distribution, based on 10,000 replications and α = 0.05.
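Size (m, n) | Test | EL, β2 = 0.00 | Empirical power, β2 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
(10, 10) | Score | 4.87 | 5.58 | 7.03 | 9.23 | 12.38 | 16.21 | 21.11 | 27.23 | 33.16 | 40.14 | 46.17
(10, 10) | Wald | 4.44 | 5.02 | 6.32 | 8.44 | 11.47 | 15.10 | 19.56 | 25.73 | 31.54 | 38.25 | 44.46
(10, 10) | LRT | 5.20 | 6.05 | 7.39 | 9.66 | 12.95 | 16.83 | 21.88 | 28.03 | 33.95 | 40.80 | 46.80
(10, 10) | F | 0.55 | 0.80 | 0.96 | 1.01 | 1.47 | 2.16 | 2.80 | 3.75 | 5.08 | 6.72 | 8.14
(30, 10) | Score | 4.94 | 6.77 | 10.71 | 17.53 | 26.81 | 37.90 | 50.07 | 60.89 | 70.26 | 77.76 | 83.17
(30, 10) | Wald | 4.81 | 6.58 | 10.49 | 17.19 | 26.38 | 37.43 | 49.57 | 60.50 | 69.78 | 77.40 | 82.99
(30, 10) | LRT | 5.05 | 6.88 | 10.88 | 17.76 | 26.96 | 38.15 | 50.34 | 61.11 | 70.45 | 78.02 | 83.37
(30, 10) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.04 | 0.13 | 0.07 | 0.01
(40, 10) | Score | 4.96 | 6.79 | 12.59 | 21.55 | 33.67 | 47.59 | 60.52 | 71.35 | 79.71 | 85.84 | 89.87
(40, 10) | Wald | 4.83 | 6.57 | 12.38 | 21.26 | 33.33 | 47.18 | 60.11 | 71.01 | 79.42 | 85.69 | 89.78
(40, 10) | LRT | 5.03 | 6.89 | 12.68 | 21.69 | 33.86 | 47.66 | 60.71 | 71.50 | 79.89 | 85.88 | 89.99
(40, 10) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.02 | 0.01
(10, 20) | Score | 5.37 | 6.12 | 9.71 | 14.26 | 21.50 | 30.95 | 41.20 | 51.76 | 61.75 | 70.84 | 78.90
(10, 20) | Wald | 5.04 | 5.89 | 9.27 | 13.74 | 20.94 | 30.15 | 40.49 | 51.07 | 60.94 | 70.00 | 78.30
(10, 20) | LRT | 5.47 | 6.27 | 9.90 | 14.58 | 21.78 | 31.22 | 41.64 | 52.40 | 62.17 | 71.10 | 79.27
(10, 20) | F | 0.19 | 0.23 | 0.45 | 0.79 | 1.55 | 2.85 | 4.70 | 7.35 | 11.37 | 16.63 | 22.80
(30, 20) | Score | 5.30 | 8.02 | 17.95 | 33.59 | 51.18 | 68.57 | 82.74 | 90.62 | 95.47 | 97.97 | 98.99
(30, 20) | Wald | 5.20 | 7.94 | 17.81 | 33.32 | 50.86 | 68.33 | 82.59 | 90.45 | 95.42 | 97.96 | 98.98
(30, 20) | LRT | 5.33 | 8.12 | 18.10 | 33.73 | 51.35 | 68.67 | 82.87 | 90.75 | 95.49 | 97.97 | 99.00
(30, 20) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.06 | 0.12
(40, 20) | Score | 5.37 | 9.14 | 22.10 | 42.02 | 62.80 | 79.42 | 90.62 | 95.66 | 98.24 | 99.40 | 99.71
(40, 20) | Wald | 5.31 | 9.10 | 21.96 | 41.84 | 62.70 | 79.31 | 90.54 | 95.64 | 98.24 | 99.39 | 99.71
(40, 20) | LRT | 5.37 | 9.23 | 22.16 | 42.16 | 62.92 | 79.48 | 90.68 | 95.69 | 98.26 | 99.41 | 99.71
(40, 20) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00
(10, 30) | Score | 5.35 | 7.04 | 11.01 | 19.80 | 31.08 | 44.40 | 58.13 | 69.15 | 79.91 | 88.09 | 92.52
(10, 30) | Wald | 5.23 | 6.82 | 10.74 | 19.33 | 30.47 | 43.89 | 57.64 | 68.81 | 79.55 | 87.88 | 92.31
(10, 30) | LRT | 5.40 | 7.18 | 11.18 | 19.98 | 31.33 | 44.61 | 58.45 | 69.51 | 80.09 | 88.27 | 92.59
(10, 30) | F | 0.08 | 0.19 | 0.36 | 1.05 | 2.26 | 4.53 | 8.67 | 13.76 | 22.16 | 32.39 | 42.40
(30, 30) | Score | 5.21 | 10.21 | 24.62 | 47.11 | 70.88 | 86.07 | 94.95 | 98.05 | 99.37 | 99.85 | 99.94
(30, 30) | Wald | 5.17 | 10.12 | 24.51 | 46.89 | 70.73 | 85.97 | 94.84 | 98.04 | 99.37 | 99.85 | 99.94
(30, 30) | LRT | 5.26 | 10.19 | 24.67 | 47.23 | 70.95 | 86.17 | 94.98 | 98.07 | 99.39 | 99.85 | 99.94
(30, 30) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.04 | 0.02
(40, 30) | Score | 5.29 | 11.95 | 31.16 | 57.93 | 81.60 | 93.48 | 98.17 | 99.49 | 99.88 | 99.98 | 99.98
(40, 30) | Wald | 5.22 | 11.90 | 30.99 | 57.81 | 81.56 | 93.45 | 98.16 | 99.49 | 99.88 | 99.98 | 99.98
(40, 30) | LRT | 5.30 | 11.97 | 31.24 | 57.94 | 81.68 | 93.52 | 98.18 | 99.49 | 99.88 | 99.98 | 99.98
(40, 30) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.01
(10, 50) | Score | 5.24 | 7.77 | 16.28 | 30.41 | 46.97 | 65.41 | 79.71 | 89.45 | 95.40 | 98.27 | 99.32
(10, 50) | Wald | 5.12 | 7.66 | 16.15 | 30.08 | 46.73 | 65.09 | 79.53 | 89.30 | 95.30 | 98.24 | 99.31
(10, 50) | LRT | 5.26 | 7.79 | 16.43 | 30.59 | 47.16 | 65.53 | 79.80 | 89.54 | 95.48 | 98.28 | 99.33
(10, 50) | F | 0.03 | 0.05 | 0.52 | 1.89 | 4.12 | 10.02 | 19.47 | 33.85 | 49.03 | 64.54 | 78.18
(30, 50) | Score | 4.91 | 12.89 | 37.83 | 69.89 | 89.30 | 97.70 | 99.88 | 99.98 | 100 | 100 | 100
(30, 50) | Wald | 4.89 | 12.88 | 37.76 | 69.77 | 89.22 | 97.68 | 99.88 | 99.98 | 100 | 100 | 100
(30, 50) | LRT | 4.96 | 12.95 | 37.93 | 69.89 | 89.31 | 97.70 | 99.88 | 99.98 | 100 | 100 | 100
(30, 50) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.10 | 0.53 | 2.02
(40, 50) | Score | 4.92 | 15.75 | 47.95 | 81.18 | 95.42 | 99.37 | 99.96 | 99.99 | 100 | 100 | 100
(40, 50) | Wald | 4.92 | 15.72 | 47.87 | 81.16 | 95.42 | 99.37 | 99.96 | 99.99 | 100 | 100 | 100
(40, 50) | LRT | 4.97 | 15.78 | 47.91 | 81.20 | 95.44 | 99.37 | 99.96 | 99.99 | 100 | 100 | 100
(40, 50) | F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.06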
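Each entry is a proportion estimated from 10,000 replications, so it carries Monte Carlo error. A rough yardstick for judging whether an EL entry really departs from the nominal 5% (a standard binomial calculation, not one reported by the authors) is the standard error sqrt(α(1 − α)/R). For α = 0.05 and R = 10,000 this is about 0.22 percentage points, so empirical levels outside roughly 4.57% to 5.43% are unlikely to be explained by simulation noise alone. The function names below are illustrative.

```python
import math


def mc_standard_error_pct(alpha: float, reps: int) -> float:
    """Binomial standard error (in percentage points) of an empirical rejection rate."""
    return 100.0 * math.sqrt(alpha * (1.0 - alpha) / reps)


def level_band_pct(alpha: float, reps: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Monte Carlo band around the nominal level, in percent."""
    half_width = z * mc_standard_error_pct(alpha, reps)
    return 100.0 * alpha - half_width, 100.0 * alpha + half_width


lo, hi = level_band_pct(0.05, 10_000)
print(f"band: {lo:.2f}% to {hi:.2f}%")  # prints: band: 4.57% to 5.43%

# Two EL entries from the (10, 10) rows above: the score test and the F test.
for label, el in [("Score, (10, 10)", 4.87), ("F, (10, 10)", 0.55)]:
    print(label, "inside band" if lo <= el <= hi else "outside band")
```

By this yardstick the score-test level of 4.87% is consistent with 5%, while the F-test level of 0.55% is clearly conservative.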
Table 3. Empirical level and power (in %) of model selection by forward selection using the score test (Forward-S), forward selection using the F test (Forward-F), AIC, and BIC; based on 10,000 replications.
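Distribution | Size (n) | Method | EL, β1 = 0.00 | Empirical power, β1 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
Normal | 10 | Forward-F | 5.09 | 5.48 | 5.34 | 5.82 | 6.10 | 6.11 | 6.77 | 7.86 | 7.95 | 9.11 | 9.56
Normal | 10 | AIC | 30.94 | 31.66 | 32.31 | 32.47 | 33.05 | 33.12 | 34.20 | 35.41 | 36.08 | 37.18 | 40.22
Normal | 10 | BIC | 27.17 | 27.83 | 28.32 | 28.56 | 28.86 | 29.19 | 30.09 | 31.34 | 32.35 | 33.20 | 36.25
Normal | 20 | Forward-F | 4.96 | 5.58 | 5.67 | 6.22 | 7.32 | 7.93 | 9.61 | 10.84 | 12.85 | 15.20 | 16.50
Normal | 20 | AIC | 21.40 | 21.53 | 22.72 | 23.69 | 25.72 | 27.66 | 29.55 | 31.41 | 35.01 | 39.22 | 41.26
Normal | 20 | BIC | 12.32 | 12.53 | 13.19 | 13.77 | 15.72 | 17.21 | 18.82 | 21.18 | 23.85 | 27.16 | 29.55
Normal | 30 | Forward-F | 4.68 | 5.25 | 5.76 | 6.79 | 8.10 | 10.39 | 11.57 | 14.40 | 17.55 | 21.24 | 23.58
Normal | 30 | AIC | 18.87 | 20.39 | 20.30 | 22.35 | 25.01 | 28.12 | 30.94 | 35.33 | 39.38 | 44.07 | 48.34
Normal | 30 | BIC | 8.23 | 8.94 | 9.59 | 10.83 | 12.68 | 15.27 | 17.33 | 20.97 | 24.54 | 28.57 | 31.46
Normal | 50 | Forward-F | 5.18 | 5.41 | 6.24 | 8.36 | 10.35 | 13.80 | 17.54 | 21.67 | 26.71 | 32.20 | 38.93
Normal | 50 | AIC | 18.28 | 18.94 | 19.99 | 23.94 | 27.39 | 33.15 | 38.08 | 43.62 | 49.93 | 57.08 | 63.04
Normal | 50 | BIC | 5.87 | 6.19 | 6.98 | 9.34 | 11.46 | 14.98 | 18.97 | 23.36 | 28.91 | 34.45 | 41.03
Poisson | 10 | Forward-S | 6.93 | 8.53 | 9.84 | 12.27 | 16.58 | 22.27 | 27.73 | 32.98 | 41.26 | 47.11 | 53.98
Poisson | 10 | AIC | 19.08 | 21.77 | 23.40 | 26.61 | 32.98 | 39.36 | 46.25 | 51.88 | 59.80 | 64.21 | 70.92
Poisson | 10 | BIC | 16.28 | 18.64 | 20.73 | 23.63 | 29.57 | 35.66 | 42.66 | 48.44 | 56.68 | 61.13 | 68.34
Poisson | 20 | Forward-S | 7.01 | 8.33 | 12.20 | 19.26 | 28.33 | 38.91 | 49.33 | 62.48 | 71.04 | 79.04 | 86.55
Poisson | 20 | AIC | 18.06 | 19.59 | 26.85 | 36.34 | 48.18 | 59.23 | 70.53 | 80.46 | 86.63 | 91.11 | 94.96
Poisson | 20 | BIC | 10.87 | 12.16 | 17.63 | 25.56 | 36.63 | 47.58 | 59.69 | 71.45 | 79.01 | 85.75 | 91.56
Poisson | 30 | Forward-S | 6.52 | 8.53 | 15.21 | 25.88 | 40.22 | 54.97 | 69.85 | 80.33 | 89.06 | 93.48 | 96.82
Poisson | 30 | AIC | 17.13 | 20.23 | 31.01 | 44.98 | 62.01 | 75.49 | 85.95 | 92.23 | 96.60 | 98.16 | 99.20
Poisson | 30 | BIC | 8.08 | 10.41 | 18.21 | 30.06 | 45.41 | 60.83 | 74.85 | 84.68 | 92.06 | 95.46 | 97.89
Poisson | 50 | Forward-S | 5.75 | 9.46 | 21.49 | 41.32 | 62.66 | 79.40 | 90.61 | 96.69 | 98.83 | 99.62 | 99.87
Poisson | 50 | AIC | 15.71 | 22.87 | 40.92 | 63.18 | 80.95 | 91.90 | 97.02 | 99.16 | 99.80 | 99.9 | 99.98
Poisson | 50 | BIC | 5.47 | 9.23 | 21.42 | 41.32 | 62.73 | 79.56 | 91.02 | 96.79 | 99.02 | 99.64 | 99.90
Binomial | 10 | Forward-S | 6.93 | 9.14 | 14.03 | 20.82 | 30.86 | 41.57 | 53.03 | 64.12 | 71.59 | 78.82 | 84.68
Binomial | 10 | AIC | 18.54 | 21.34 | 28.43 | 38.00 | 49.14 | 60.52 | 70.59 | 78.97 | 84.12 | 88.78 | 92.23
Binomial | 10 | BIC | 16.03 | 18.54 | 25.18 | 34.54 | 45.64 | 57.23 | 67.65 | 76.69 | 82.38 | 87.15 | 91.19
Binomial | 20 | Forward-S | 6.73 | 9.99 | 20.98 | 37.45 | 56.55 | 73.18 | 84.48 | 92.07 | 95.86 | 98.36 | 99.06
Binomial | 20 | AIC | 17.38 | 22.81 | 39.18 | 59.48 | 75.77 | 88.02 | 94.06 | 97.41 | 98.81 | 99.70 | 99.86
Binomial | 20 | BIC | 10.43 | 14.57 | 28.20 | 47.29 | 66.26 | 80.88 | 89.85 | 95.08 | 97.84 | 99.22 | 99.64
Binomial | 30 | Forward-S | 6.20 | 12.21 | 29.28 | 53.98 | 75.26 | 90.12 | 96.24 | 98.97 | 99.77 | 99.90 | 99.97
Binomial | 30 | AIC | 16.88 | 26.33 | 49.26 | 74.42 | 89.40 | 96.81 | 99.14 | 99.80 | 99.95 | 100 | 100
Binomial | 30 | BIC | 7.71 | 14.84 | 33.73 | 59.68 | 80.04 | 92.56 | 97.47 | 99.34 | 99.83 | 99.97 | 99.99
Binomial | 50 | Forward-S | 5.47 | 16.04 | 46.67 | 77.80 | 94.77 | 99.22 | 99.82 | 100 | 100 | 100 | 100
Binomial | 50 | AIC | 16.47 | 33.71 | 68.36 | 90.77 | 98.64 | 99.87 | 99.99 | 100 | 100 | 100 | 100
Binomial | 50 | BIC | 5.46 | 15.90 | 46.31 | 77.76 | 94.90 | 99.30 | 99.84 | 100 | 100 | 100 | 100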
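Forward selection, as compared in Table 3, adds covariates one at a time. At each step every candidate not yet in the model is tested against the current model; the candidate with the smallest p-value enters if that p-value is below the entry level, and the search stops otherwise. The sketch below shows only that control flow. The p-value function is a hypothetical stand-in (a fixed lookup table of made-up p-values); in practice it would be a score, Wald, LRT, or F test computed from fitted models, and the 0.05 entry level is an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# pvalue(current_model, candidate) -> p-value for adding `candidate` to `current_model`.
PValueFn = Callable[[FrozenSet[str], str], float]


def forward_selection(candidates: List[str], pvalue: PValueFn, alpha_enter: float = 0.05) -> List[str]:
    """Return covariates in the order in which they enter the model."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        current = frozenset(selected)
        # Test each remaining candidate given the covariates already selected.
        best = min(remaining, key=lambda c: pvalue(current, c))
        if pvalue(current, best) >= alpha_enter:
            break  # no candidate is significant at the entry level
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical p-values, keyed by (covariates already in the model, candidate).
TOY_PVALUES: Dict[Tuple[FrozenSet[str], str], float] = {
    (frozenset(), "x1"): 0.30,
    (frozenset(), "x2"): 0.001,
    (frozenset(), "x3"): 0.02,
    (frozenset({"x2"}), "x1"): 0.40,
    (frozenset({"x2"}), "x3"): 0.01,
    (frozenset({"x2", "x3"}), "x1"): 0.20,
}


def toy_pvalue(current: FrozenSet[str], candidate: str) -> float:
    return TOY_PVALUES[(current, candidate)]


print(forward_selection(["x1", "x2", "x3"], toy_pvalue))  # ['x2', 'x3']
```

Because each step conditions on the covariates already selected, the same candidate can have different p-values at different steps. This is why the lookup is keyed on the current model as well as the candidate.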
Table 4. Empirical level and power (in %) of model selection by the backward elimination using the score test (Backward-S), AIC, and BIC; based on 10,000 replications.
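Distribution | Size (n) | Method | EL, β1 = 0.00 | Empirical power, β1 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
Poisson | 10 | Backward-S | 6.59 | 7.25 | 8.25 | 9.90 | 12.15 | 16.09 | 19.75 | 22.56 | 26.59 | 29.76 | 33.67
Poisson | 10 | AIC | 35.48 | 37.7 | 39.87 | 43.03 | 48.86 | 54.29 | 61.07 | 65.17 | 70.79 | 74.85 | 79.51
Poisson | 10 | BIC | 34.35 | 36.51 | 38.77 | 42.07 | 47.77 | 53.03 | 59.91 | 63.91 | 69.77 | 73.80 | 78.79
Poisson | 20 | Backward-S | 6.66 | 7.15 | 9.79 | 13.71 | 18.82 | 23.73 | 29.53 | 35.50 | 40.57 | 44.49 | 48.54
Poisson | 20 | AIC | 23.46 | 25.19 | 32.43 | 42.1 | 53.71 | 63.91 | 75.02 | 83.75 | 89.28 | 93.09 | 96.12
Poisson | 20 | BIC | 18.87 | 20.04 | 26.49 | 35.34 | 46.71 | 56.64 | 68.2 | 78.5 | 84.57 | 90.13 | 94.28
Poisson | 30 | Backward-S | 6.99 | 7.08 | 11.56 | 16.74 | 23.67 | 30.16 | 37.95 | 42.89 | 48.23 | 51.69 | 54.39
Poisson | 30 | AIC | 18.93 | 22.24 | 32.84 | 46.96 | 63.86 | 77.16 | 87.24 | 93.00 | 97.03 | 98.46 | 99.36
Poisson | 30 | BIC | 11.29 | 13.78 | 22.39 | 34.51 | 50.31 | 65.56 | 78.82 | 87.31 | 93.67 | 96.67 | 98.6
Poisson | 50 | Backward-S | 6.06 | 8.20 | 12.94 | 21.34 | 30.77 | 39.20 | 45.37 | 51.57 | 55.14 | 55.27 | 57.57
Poisson | 50 | AIC | 15.89 | 23.24 | 41.25 | 63.51 | 81.14 | 92.08 | 97.13 | 99.18 | 99.8 | 99.93 | 99.98
Poisson | 50 | BIC | 5.95 | 10.13 | 22.58 | 42.86 | 64.18 | 80.7 | 91.72 | 97.13 | 99.15 | 99.73 | 99.93
Binomial | 10 | Backward-S | 6.74 | 8.12 | 11.36 | 16.68 | 22.07 | 28.85 | 36.36 | 42.39 | 47.11 | 51.64 | 55.65
Binomial | 10 | AIC | 33.33 | 36.93 | 43.45 | 52.19 | 62.19 | 71.31 | 79.35 | 85.67 | 89.08 | 92.54 | 94.67
Binomial | 10 | BIC | 32.25 | 35.80 | 42.18 | 50.71 | 61.03 | 70.03 | 78.41 | 84.8 | 88.44 | 91.94 | 94.23
Binomial | 20 | Backward-S | 6.48 | 8.15 | 15.13 | 23.96 | 34.61 | 44.79 | 53.02 | 59.69 | 65.46 | 69.08 | 72.63
Binomial | 20 | AIC | 21.8 | 27.19 | 44.05 | 63.96 | 79.23 | 90.16 | 95.44 | 97.96 | 99.11 | 99.79 | 99.90
Binomial | 20 | BIC | 16.83 | 21.59 | 36.70 | 55.78 | 72.89 | 86.00 | 93.01 | 96.57 | 98.60 | 99.54 | 99.76
Binomial | 30 | Backward-S | 6.20 | 9.55 | 18.07 | 30.22 | 43.49 | 53.57 | 61.60 | 67.77 | 72.89 | 76.92 | 79.85
Binomial | 30 | AIC | 18.30 | 27.81 | 50.87 | 75.69 | 90.16 | 97.16 | 99.25 | 99.83 | 99.96 | 100 | 100
Binomial | 30 | BIC | 10.48 | 17.95 | 37.48 | 63.68 | 83.03 | 94.09 | 98.04 | 99.54 | 99.87 | 99.98 | 99.99
Binomial | 50 | Backward-S | 5.13 | 10.18 | 23.20 | 38.71 | 51.39 | 61.85 | 68.37 | 75.06 | 79.83 | 84.1 | 87.91
Binomial | 50 | AIC | 16.65 | 34.05 | 68.59 | 90.88 | 98.67 | 99.87 | 99.99 | 100 | 100 | 100 | 100
Binomial | 50 | BIC | 5.75 | 16.59 | 47.32 | 78.74 | 95.36 | 99.36 | 99.86 | 1001000100100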
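Backward elimination, compared in Table 4, runs the other way. Starting from the full model, the covariate with the largest p-value is dropped if that p-value exceeds the stay level, and the process repeats on the reduced model until every remaining covariate is significant. As with any sketch of this kind, the p-value function here is a hypothetical lookup table of made-up values standing in for a test computed from fitted models, and the 0.05 stay level is an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# pvalue(current_model, covariate) -> p-value for `covariate` given the rest of `current_model`.
PValueFn = Callable[[FrozenSet[str], str], float]


def backward_elimination(full_model: List[str], pvalue: PValueFn, alpha_stay: float = 0.05) -> List[str]:
    """Return the covariates that survive elimination, in their original order."""
    kept = list(full_model)
    while kept:
        current = frozenset(kept)
        # The covariate with the weakest evidence of being needed, given the others.
        worst = max(kept, key=lambda c: pvalue(current, c))
        if pvalue(current, worst) <= alpha_stay:
            break  # every remaining covariate is significant
        kept.remove(worst)
    return kept


# Hypothetical p-values, keyed by (covariates currently in the model, covariate tested).
TOY_PVALUES: Dict[Tuple[FrozenSet[str], str], float] = {
    (frozenset({"x1", "x2", "x3"}), "x1"): 0.60,
    (frozenset({"x1", "x2", "x3"}), "x2"): 0.002,
    (frozenset({"x1", "x2", "x3"}), "x3"): 0.08,
    (frozenset({"x2", "x3"}), "x2"): 0.001,
    (frozenset({"x2", "x3"}), "x3"): 0.03,
}


def toy_pvalue(current: FrozenSet[str], covariate: str) -> float:
    return TOY_PVALUES[(current, covariate)]


print(backward_elimination(["x1", "x2", "x3"], toy_pvalue))  # ['x2', 'x3']
```

Note that x3 is not significant in the full model (p = 0.08) but becomes significant once x1 is dropped. Backward and forward searches can therefore end at different models on the same data.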
Table 5. Empirical level and power (in %) of the three test statistics in the negative binomial distribution, based on 10,000 replications and α = 0.05.
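Size (n) | Test | EL, β2 = 0.00 | Empirical power, β2 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
10 | Score | 4.58 | 5.53 | 9.24 | 15.84 | 23.66 | 34.23 | 45.39 | 55.89 | 64.35 | 71.95 | 79.30
10 | Wald | 6.74 | 7.87 | 13.19 | 20.99 | 31.70 | 44.16 | 55.27 | 66.22 | 74.06 | 80.57 | 86.99
10 | LRT | 6.07 | 7.04 | 11.84 | 19.66 | 29.60 | 41.73 | 52.85 | 63.78 | 72.03 | 78.74 | 85.78
20 | Score | 4.85 | 8.07 | 17.01 | 32.94 | 51.67 | 68.32 | 81.37 | 90.17 | 95.25 | 97.66 | 98.83
20 | Wald | 6.43 | 10.09 | 20.59 | 38.04 | 56.92 | 73.49 | 85.09 | 92.65 | 96.72 | 98.41 | 99.26
20 | LRT | 5.82 | 9.30 | 19.09 | 36.14 | 55.04 | 71.87 | 84.03 | 91.87 | 96.33 | 98.2 | 99.12
30 | Score | 4.70 | 10.18 | 25.29 | 47.87 | 69.99 | 87.17 | 94.84 | 98.37 | 99.46 | 99.82 | 99.93
30 | Wald | 5.80 | 11.98 | 28.29 | 52.41 | 73.44 | 89.53 | 95.97 | 98.81 | 99.62 | 99.89 | 99.95
30 | LRT | 5.35 | 11.18 | 26.89 | 50.50 | 72.06 | 88.64 | 95.46 | 98.65 | 99.56 | 99.87 | 99.95
50 | Score | 4.96 | 12.95 | 40.10 | 71.07 | 91.21 | 97.96 | 99.62 | 99.93 | 100 | 100 | 100
50 | Wald | 5.78 | 14.40 | 42.59 | 73.56 | 92.28 | 98.20 | 99.71 | 99.94 | 100 | 100 | 100
50 | LRT | 5.37 | 13.73 | 41.57 | 72.42 | 91.80 | 98.12 | 99.68 | 99.94 | 100 | 100 | 100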
Table 6. Empirical level and power (in %) of model selection by forward selection using the score test, AIC, and BIC in the negative binomial distribution; based on 10,000 replications.
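Size (n) | Method | EL, β1 = 0.00 | Empirical power, β1 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
10 | Forward | 3.80 | 4.05 | 5.04 | 7.06 | 9.57 | 13.23 | 16.84 | 22.06 | 27.63 | 32.55 | 38.05
10 | AIC | 23.55 | 24.18 | 30.50 | 37.17 | 45.99 | 56.40 | 64.41 | 72.19 | 78.25 | 83.23 | 87.64
10 | BIC | 20.53 | 21.22 | 27.51 | 33.70 | 42.32 | 52.79 | 60.92 | 69.25 | 75.76 | 80.95 | 85.77
20 | Forward | 4.81 | 5.49 | 8.90 | 13.82 | 21.25 | 29.95 | 40.27 | 51.03 | 61.39 | 70.03 | 79.00
20 | AIC | 19.08 | 24.46 | 38.04 | 54.20 | 71.16 | 83.52 | 90.73 | 95.47 | 97.63 | 99.03 | 99.41
20 | BIC | 11.30 | 15.37 | 26.39 | 41.39 | 59.74 | 74.27 | 84.46 | 91.46 | 95.12 | 97.66 | 98.73
30 | Forward | 4.86 | 6.68 | 11.77 | 20.47 | 33.26 | 46.77 | 61.55 | 72.86 | 83.32 | 89.71 | 93.74
30 | AIC | 18.26 | 26.68 | 46.99 | 69.74 | 85.26 | 94.46 | 98.03 | 99.46 | 99.77 | 99.95 | 100
30 | BIC | 8.59 | 14.62 | 30.52 | 53.73 | 74.19 | 87.76 | 94.80 | 98.08 | 99.20 | 99.76 | 99.99
50 | Forward | 4.85 | 8.15 | 18.64 | 34.59 | 54.58 | 73.04 | 86.24 | 94.13 | 97.85 | 99.00 | 99.78
50 | AIC | 18.26 | 26.68 | 46.40 | 26.61 | 32.98 | 39.36 | 46.25 | 51.88 | 59.80 | 64.21 | 70.92
50 | BIC | 8.59 | 14.62 | 20.73 | 23.63 | 29.57 | 35.66 | 42.66 | 48.44 | 56.68 | 61.13 | 68.34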
Table 7. Empirical level and power (in %) of the three test statistics, based on 10,000 replications and α = 0.05.
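Distribution | Size (n) | Test | EL, β2 = 0.00 | Empirical power, β2 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
NB to Pois. | 10 | Score | 7.92 | 9.26 | 15.19 | 23.46 | 34.74 | 47.84 | 58.98 | 69.68 | 77.17 | 83.44 | 89.08
NB to Pois. | 10 | Wald | 7.73 | 9.11 | 14.90 | 23.17 | 34.33 | 47.53 | 58.61 | 69.26 | 76.85 | 83.16 | 88.84
NB to Pois. | 10 | LRT | 7.98 | 9.31 | 15.23 | 23.78 | 34.91 | 48.34 | 59.44 | 69.99 | 77.33 | 83.62 | 89.32
NB to Pois. | 20 | Score | 8.33 | 12.17 | 24.19 | 42.71 | 62.02 | 77.80 | 88.14 | 94.48 | 97.58 | 98.83 | 99.53
NB to Pois. | 20 | Wald | 8.28 | 12.11 | 24.10 | 42.58 | 61.90 | 77.64 | 88.11 | 94.42 | 97.56 | 98.83 | 99.53
NB to Pois. | 20 | LRT | 8.28 | 12.20 | 24.21 | 42.76 | 62.01 | 77.91 | 88.31 | 94.49 | 97.63 | 98.86 | 99.53
NB to Pois. | 30 | Score | 7.69 | 14.74 | 33.00 | 57.59 | 77.87 | 91.86 | 97.23 | 99.15 | 99.76 | 99.96 | 99.96
NB to Pois. | 30 | Wald | 7.67 | 14.73 | 32.94 | 57.50 | 77.80 | 91.84 | 97.22 | 99.15 | 99.76 | 99.96 | 99.96
NB to Pois. | 30 | LRT | 7.62 | 14.78 | 32.96 | 57.74 | 78.05 | 91.86 | 97.24 | 99.18 | 99.78 | 99.96 | 99.96
NB to Pois. | 50 | Score | 7.92 | 18.15 | 49.02 | 78.27 | 94.37 | 98.83 | 99.85 | 99.95 | 100 | 100 | 100
NB to Pois. | 50 | Wald | 7.91 | 18.15 | 49.00 | 78.28 | 94.36 | 98.83 | 99.85 | 99.95 | 100 | 100 | 100
NB to Pois. | 50 | LRT | 7.92 | 18.13 | 49.08 | 78.34 | 94.36 | 98.83 | 99.85 | 99.95 | 100 | 100 | 100
Pois. to NB | 10 | Score | 3.83 | 5.07 | 9.42 | 16.01 | 26.59 | 38.92 | 51.19 | 62.05 | 71.11 | 78.51 | 84.43
Pois. to NB | 10 | Wald | 5.05 | 6.54 | 11.97 | 19.91 | 32.07 | 45.73 | 58.95 | 69.46 | 78.07 | 84.68 | 89.49
Pois. to NB | 10 | LRT | 4.65 | 6.14 | 11.24 | 18.81 | 30.96 | 44.46 | 57.66 | 68.34 | 76.92 | 83.88 | 88.83
Pois. to NB | 20 | Score | 3.65 | 6.71 | 18.19 | 35.68 | 57.81 | 75.53 | 86.97 | 93.93 | 97.18 | 98.80 | 99.51
Pois. to NB | 20 | Wald | 4.43 | 7.68 | 20.59 | 39.13 | 61.52 | 78.51 | 88.95 | 95.19 | 97.86 | 99.21 | 99.65
Pois. to NB | 20 | LRT | 4.09 | 7.36 | 19.67 | 38.01 | 60.26 | 77.71 | 88.50 | 94.85 | 97.64 | 99.11 | 99.63
Pois. to NB | 30 | Score | 3.85 | 9.31 | 26.83 | 53.71 | 78.41 | 91.98 | 97.28 | 99.22 | 99.84 | 99.96 | 99.99
Pois. to NB | 30 | Wald | 4.39 | 10.46 | 28.85 | 56.44 | 80.19 | 92.86 | 97.79 | 99.39 | 99.86 | 99.97 | 99.99
Pois. to NB | 30 | LRT | 4.22 | 10.08 | 28.21 | 55.37 | 79.54 | 92.49 | 97.59 | 99.34 | 99.86 | 99.97 | 99.99
Pois. to NB | 50 | Score | 4.17 | 14.03 | 44.50 | 78.06 | 94.87 | 99.23 | 99.93 | 99.97 | 99.99 | 100 | 100
Pois. to NB | 50 | Wald | 4.55 | 14.75 | 46.05 | 79.27 | 95.24 | 99.34 | 99.94 | 99.97 | 99.99 | 100 | 100
Pois. to NB | 50 | LRT | 4.43 | 14.39 | 45.41 | 78.72 | 95.09 | 99.30 | 99.94 | 99.97 | 99.99 | 100 | 100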
Table 8. Empirical level and power (in %) of the three test statistics in the beta-binomial distribution, based on 10,000 replications and α = 0.05.
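Size (n) | Test | EL, β2 = 0.00 | Empirical power, β2 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
10 | Score | 8.07 | 8.85 | 11.69 | 13.26 | 15.86 | 20.09 | 22.95 | 27.64 | 32.08 | 36.80 | 40.88
10 | Wald | 13.93 | 15.11 | 17.76 | 19.56 | 23.39 | 27.78 | 32.58 | 37.63 | 42.07 | 48.50 | 53.38
10 | LRT | 10.47 | 11.79 | 14.42 | 16.20 | 19.25 | 23.86 | 27.99 | 33.90 | 38.49 | 44.55 | 49.44
20 | Score | 7.47 | 8.60 | 10.95 | 16.12 | 21.30 | 27.54 | 34.97 | 40.67 | 47.27 | 52.61 | 56.11
20 | Wald | 8.27 | 9.69 | 12.85 | 18.56 | 26.55 | 34.81 | 43.85 | 53.47 | 62.57 | 69.59 | 75.50
20 | LRT | 7.32 | 8.64 | 11.54 | 17.16 | 24.94 | 32.72 | 42.01 | 51.20 | 60.89 | 67.81 | 74.05
30 | Score | 6.96 | 8.34 | 12.24 | 18.16 | 26.18 | 35.61 | 44.60 | 51.94 | 43.45 | 48.16 | 52.51
30 | Wald | 7.00 | 8.90 | 13.82 | 21.43 | 32.45 | 45.17 | 57.16 | 68.25 | 70.52 | 79.41 | 84.83
30 | LRT | 6.38 | 8.26 | 12.93 | 20.42 | 31.25 | 43.81 | 55.75 | 67.02 | 69.66 | 78.48 | 84.08
50 | Score | 6.71 | 8.92 | 15.22 | 25.40 | 37.17 | 49.66 | 59.78 | 67.86 | 73.32 | 75.47 | 77.70
50 | Wald | 5.96 | 8.85 | 16.96 | 30.61 | 47.57 | 63.78 | 77.83 | 87.73 | 93.88 | 96.92 | 98.77
50 | LRT | 5.79 | 8.46 | 16.44 | 29.99 | 46.63 | 63.05 | 77.15 | 87.28 | 93.56 | 96.68 | 98.70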
Table 9. Empirical level and power (in %) of model selection by forward selection using the Wald test, AIC, and BIC in the beta-binomial distribution, based on 10,000 replications.
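Size (n) | Method | EL, β1 = 0.00 | Empirical power, β1 = 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50
30 | Wald | 8.00 | 8.60 | 11.05 | 15.63 | 19.55 | 26.44 | 33.20 | 40.07 | 47.89 | 55.62 | 62.26
30 | AIC | 20.27 | 21.83 | 25.90 | 32.61 | 38.57 | 47.83 | 55.36 | 63.74 | 70.63 | 77.03 | 82.12
30 | BIC | 9.92 | 10.52 | 13.60 | 18.90 | 23.42 | 31.26 | 38.88 | 46.96 | 55.07 | 62.05 | 68.82
50 | Wald | 6.81 | 8.23 | 12.29 | 20.45 | 28.20 | 39.97 | 50.62 | 60.60 | 71.56 | 79.17 | 84.76
50 | AIC | 18.14 | 21.01 | 28.53 | 38.35 | 50.43 | 60.29 | 71.55 | 80.06 | 86.90 | 91.73 | 94.95
50 | BIC | 6.25 | 8.03 | 12.23 | 19.71 | 29.17 | 38.47 | 50.31 | 61.24 | 70.99 | 79.57 | 85.19
70 | Wald | 6.04 | 8.36 | 14.36 | 24.21 | 37.05 | 51.27 | 64.91 | 76.67 | 85.00 | 91.12 | 95.13
70 | AIC | 17.69 | 21.19 | 30.15 | 44.09 | 58.35 | 72.09 | 82.02 | 89.70 | 94.61 | 97.24 | 98.47
70 | BIC | 4.72 | 7.02 | 11.54 | 21.50 | 33.59 | 48.09 | 60.94 | 73.40 | 83.64 | 89.92 | 94.26
100 | Wald | 5.83 | 8.79 | 17.88 | 32.00 | 49.54 | 66.68 | 79.74 | 89.67 | 95.59 | 98.10 | 99.19
100 | AIC | 16.57 | 22.11 | 36.75 | 53.64 | 71.18 | 83.24 | 91.81 | 96.37 | 98.57 | 99.42 | 99.84
100 | BIC | 3.27 | 6.14 | 13.75 | 26.19 | 42.38 | 60.44 | 74.74 | 86.19 | 93.02 | 96.69 | 98.56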
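A standard identity, which is not taken from the paper, helps when reading the AIC and BIC rows. When two nested models differ by a single parameter, choosing the larger model under AIC = -2 log L + 2k is equivalent to a likelihood ratio test that rejects when the LR statistic exceeds 2. Under BIC = -2 log L + k log n, the corresponding threshold is log n. The sketch below converts those thresholds into implied χ²(1) significance levels using only the standard library; the function names are illustrative. It considers a single candidate at a single step, whereas the procedures in the tables screen several candidates, so their observed rates are typically higher and need not match these numbers.

```python
import math


def chi2_1_upper_tail(c: float) -> float:
    """P(chi-square with 1 df > c), using P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))."""
    return math.erfc(math.sqrt(c / 2.0))


def implied_alpha_aic() -> float:
    # AIC prefers the larger model when the LR statistic exceeds 2 (penalty 2 per parameter).
    return chi2_1_upper_tail(2.0)


def implied_alpha_bic(n: int) -> float:
    # BIC prefers the larger model when the LR statistic exceeds log(n).
    return chi2_1_upper_tail(math.log(n))


print(f"AIC: {100 * implied_alpha_aic():.1f}%")  # AIC: 15.7%
for n in (10, 30, 50, 100):
    print(f"BIC, n = {n}: {100 * implied_alpha_bic(n):.1f}%")
```

The AIC threshold does not depend on n, whereas the BIC threshold grows with log n. This is consistent with the pattern above: the AIC level stays well above 5% as n grows, while the BIC level falls.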
Table 10. Analysis of the data on the number of lower respiratory infections: variables entering the model, step by step, under forward selection using the score test, Wald test, LRT, AIC, and BIC for the Poisson and negative binomial regression models.
Method | Poisson regression model: Step 1 | Step 2 | Step 3 | Step 4 | Negative binomial regression model: Step 1 | Step 2 | Step 3
Score | x2 | x3 | - | - | x2 | x3 | -
Wald | x2 | x3 | - | - | x2 | x3 | -
LRT | x2 | x3 | x8 | - | x2 | x3 | x8
AIC | x2 | x3 | x8 | x6 | x2 | x3 | x8
BIC | x2 | x3 | x8 | - | x2 | x3 | x8
Table 11. Variables entering the model, step by step, under forward selection using the score test, Wald test, LRT, AIC, and BIC for the binomial regression model.
Method | Binomial regression model: Step 1 | Step 2 | Step 3 | Step 4 | Step 5
Score | x3 | x5 | - | - | -
Wald | x3 | x5 | - | - | -
LRT | x9 | x5 | x3 | - | -
AIC | x9 | x5 | x3 | x8 | x6
BIC | x9 | x5 | x3 | - | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mamun, A.; Paul, S. Model Selection in Generalized Linear Models. Symmetry 2023, 15, 1905. https://doi.org/10.3390/sym15101905

AMA Style

Mamun A, Paul S. Model Selection in Generalized Linear Models. Symmetry. 2023; 15(10):1905. https://doi.org/10.3390/sym15101905

Chicago/Turabian Style

Mamun, Abdulla, and Sudhir Paul. 2023. "Model Selection in Generalized Linear Models" Symmetry 15, no. 10: 1905. https://doi.org/10.3390/sym15101905

APA Style

Mamun, A., & Paul, S. (2023). Model Selection in Generalized Linear Models. Symmetry, 15(10), 1905. https://doi.org/10.3390/sym15101905

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
