Article

Model Selection in Generalized Linear Models

1 Department of Mathematics, Gonzaga University, Spokane, WA 99258-0102, USA
2 Department of Mathematics and Statistics, University of Windsor, Windsor, ON N9B 3P4, Canada
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(10), 1905; https://doi.org/10.3390/sym15101905
Submission received: 24 August 2023 / Revised: 25 September 2023 / Accepted: 9 October 2023 / Published: 11 October 2023

Abstract

The problem of model selection in regression analysis through the use of forward selection, backward elimination, and stepwise selection has been well explored in the literature. The main assumption here, of course, is that the data are normally distributed, and the main tool used is either a t test or an F test. However, the properties of these model selection procedures are not well known. The purpose of this paper is to study the properties of these procedures within generalized linear regression models, considering the normal linear regression model as a special case. The main tool used is the score test; however, the F test, other large-sample tests such as the likelihood ratio and the Wald test, the AIC, and the BIC are included for comparison. A systematic simulation study of the properties of these procedures, in terms of level and power, was conducted for symmetric and asymmetric distributions, using normal, Poisson, and binomial regression models. Extensions for skewed distributions, the over-dispersed Poisson (negative binomial), and the over-dispersed binomial (beta-binomial) regression models are also given and evaluated. The methods are applied to analyze two health datasets.

1. Introduction

In modern scientific studies, a primary focus is on selecting the appropriate models to use. Researchers typically gather data by measuring various aspects of the subjects being observed and then analyze how these variables affect a specific outcome. It is essential to determine which measures are relevant to the outcome, identify any irrelevant measures, and evaluate any potential interactions between the variables that require consideration [1].
In particular, the importance of model selection in regression analysis when dealing with normally distributed response variables is very familiar and widely applied in many areas of study, including engineering, biomedical sciences, and social sciences. The overriding interest of researchers in these fields is to obtain a regression model with as few regression parameters as possible (a parsimonious model). The popular method, in practice, is one of forward selection, backward elimination, or stepwise selection through a test of significance of a single regression coefficient, for example, a test of $H_0: \beta_j = 0$, using the F test ([2,3]), where $\beta_j$ is the jth regression parameter in a multiple linear regression model. Other model selection procedures, such as the Akaike information criterion (AIC) [4] and the Bayesian information criterion (BIC) [5], are also available. However, the properties of these model selection procedures are not well known.
The purpose of this paper is to study the properties of these procedures in generalized linear models (GLMs). In GLMs, the choice of probability distribution is not limited to symmetric distributions like the normal distribution. It encompasses a range of asymmetric probability distributions, including the binomial and Poisson distributions. In this paper, a model selection procedure is developed to accommodate both symmetric and asymmetric regression models.
The history of regression starts with Gauss and Legendre, who introduced the method of least squares in the early 1800s. Later, in the late 1800s and early 1900s, Galton and Pearson developed the concept of regression. It was Fisher who combined the works of Gauss and Pearson to form a complete theory of the properties of least squares estimation. Fisher's contribution to this field made regression analysis useful for predicting and understanding correlations, as well as for making inferences about the relationship between a response and a covariate. Later, nonparametric and semiparametric regression methods were developed based on kernels by Fan [6], splines by Eilers and Marx [7], and wavelets by Bock and Pliego [8]. In this paper, we work on generalized linear regression models.
In an ordinary linear regression model (OLS), we assume that the error terms are normally distributed with a common variance. However, in binomial regression models, each response is a count of successes (0 or 1 for each observation in the Bernoulli case) and the variance is not constant. Moreover, in the Poisson regression model, the responses can only be non-negative integers, whereas in OLS the errors can take any value on the real line. As a result, it cannot be assumed, without further investigation, that the model selection procedures that work for normally distributed data will also work for non-symmetric data, such as Poisson and binomial data.
There have been recent advancements in selecting variables for GLMs that are suitable for either large datasets or those with high dimensions. Some references for variable selection in big data are based on elastic net regularization paths [9], the debiased lasso [10], reference models [11], and a regularized version of the least-squares criterion [12]. For variable selection in high-dimensional GLMs with binary outcomes, see [13]; for temporally dependent data, refer to [14]; and for knowledge transfer, refer to [15]. Model selection has also garnered significant attention in the Bayesian approach to generalized linear mixed models [16]. In this context, it is worth noting that there is a large amount of literature on goodness-of-fit tests, for example [17,18,19,20,21,22,23,24]. However, in this paper, we investigate variable selection in GLMs for non-symmetric data, such as binomial and Poisson regression models, when the dataset is small or moderate. This type of investigation, to the best of our knowledge, does not exist in the literature.
There are two aspects to the model selection procedure: (a) finding a suitable test statistic for testing the significance of a single regression coefficient, for example, for testing $H_0: \beta_j = 0$, that performs best in holding an appropriate level of significance, say 5%, and has superior power properties; and (b) finding a model selection procedure, using this test statistic, that again has the best properties with respect to level and power.
For (a), we developed three large sample test statistics, namely, the score test, the likelihood ratio test, and the Wald test. These three tests, along with the usual F test, are compared using a simulation study.
The score test [25] is a special case of the $C(\alpha)$ test [26], in which the nuisance parameters are replaced by maximum likelihood estimates, which are $\sqrt{n}$-consistent; here, n denotes the number of observations used in estimating the parameters. The score test is particularly appealing, as we only have to study the distribution of the test statistic under the null hypothesis, which is that of the basic model. It often maintains, at least approximately, a preassigned level of significance, and it often produces a statistic that is simple to calculate. In contrast, the other two asymptotically equivalent tests (the likelihood ratio test and the Wald test) require estimates of the parameters under the alternative hypothesis and often show liberal or conservative behavior in small samples. For further discussion, see [27].
For (b), an extensive simulation study was conducted to compare the properties of the forward selection and the backward elimination procedure using the best statistic found in (a), with the AIC and the BIC. Further discussion on this is provided in Section 3.1.1.
In Section 2, we develop the three large sample test statistics, which are then specialized for data from the normal, Poisson, and binomial distributions. The F statistic used in model selection for data from a normal distribution is also discussed. The results of an extensive simulation study are reported in Section 3. Extensions for asymmetric distributions, such as over-dispersed Poisson (the negative binomial) and over-dispersed binomial (the beta-binomial) regression models, are presented and evaluated in Section 4. Two examples are presented in Section 5; a discussion follows in Section 6.

2. Generalized Linear Model and the Test Statistics

2.1. Generalized Linear Model

The Generalized Linear Model (GLM) was developed by Nelder and Wedderburn [28]. A GLM is the generalization of ordinary linear regression models to encompass non-normal response distributions and nonlinear functions of the mean. It is composed of three components:
(i) The random component: this describes the response variable y (categorical or continuous) and its probability distribution.
(ii) The systematic component: this connects a set of covariates with a linear predictor of the form
$\eta = \sum_{j=1}^{p} X_j \beta_j.$
(iii) The link function: a monotone differentiable function f applied to each component of $E(y)$, which connects the random and systematic components through $\eta = f(E(y))$. For more details, see [28,29].
The random variable Y has a distribution of the GLM form if
$f(y; \theta) = \exp\{a(\theta)\,y - g(\theta) + c(y)\},$
where $\theta = \eta = X\beta$. In GLMs, $V(\mu)$ is a variance function that characterizes a particular GLM family of distributions. Apart from the normal distribution, the discrete models, namely, the binomial model and the Poisson model, belong to this family. A set of covariates $x_1, x_2, \ldots, x_p$ is related to the mean $\mu$ by $\theta(\mu) = X\beta$, where $\theta$ is the link function, $X = [x_{ir}]$ is an $n \times (p+1)$ matrix, and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is the vector of regression parameters. Furthermore, we assume that $x_{i0} = 1$, so that $\beta_0$ is the intercept parameter.
Inference procedures regarding the mean $\mu$ or the regression parameters $\beta_0, \beta_1, \ldots, \beta_p$ are made using the log-likelihood function $l(y, \mu)$. The log-likelihood for $Y_i$ $(i = 1, 2, \ldots, n)$ can be written as
$l = \sum_{i=1}^{n}\left\{a(\theta_i)\,y_i - g(\theta_i) + c(y_i)\right\}.$     (1)

2.2. The Test Statistics

Our interest is to develop a test statistic for testing the hypothesis that one of the $\beta$ parameters is zero. As such, we consider the null hypothesis $H_0: \beta_j = 0$, with $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, against $H_a: \beta_j \neq 0$.
In order to develop the test statistics for data with a distribution of the GLM form, we need to obtain the maximum likelihood estimates of the $\beta$ parameters under the null as well as under the alternative hypothesis, using the log-likelihood in Equation (1) developed above, the first derivative of which is
$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \frac{(y_i - \mu_i)}{V_i}\,\frac{\partial \mu_i}{\partial \eta_i}\, x_{ij},$
where $\mu_i = E(y_i) = g'(X_i\beta)/a'(X_i\beta)$, $\partial\mu_i/\partial\eta_i = h(X_i\beta)$, and $V_i = \mathrm{var}(y_i) = \{g''(X_i\beta)\,a'(X_i\beta) - a''(X_i\beta)\,g'(X_i\beta)\}/\{a'(X_i\beta)\}^3$, where $'$ denotes differentiation with respect to $\theta_i$. To estimate the parameters $\beta_k$, $k = 0, 1, \ldots, p$, we need to solve $\partial l/\partial \beta_k = 0$, which is non-linear in $\beta_k$ and so must be solved iteratively.
Note that, under the null hypothesis, we estimate $\beta_k$ for $k = 0, 1, \ldots, j-1, j+1, \ldots, p$. We denote these estimates by $\hat{\beta}_k$. Furthermore, under the alternative hypothesis, we estimate $\beta_k$ for $k = 0, 1, \ldots, p$. We denote these estimates by $\tilde{\beta}_k$.

2.2.1. The Likelihood Ratio Test and the Wald Test

Generally, the likelihood ratio statistic used to test a null hypothesis against an alternative is the ratio of the maximum likelihood under the null hypothesis to that under the alternative hypothesis. In practice, we maximize the log-likelihoods to find maximum likelihood estimates of the parameters under the null and the alternative hypotheses. Let $\hat{l}$ and $\tilde{l}$ be the maximized log-likelihood under the null and the alternative hypotheses, respectively. Then, it can be shown that the likelihood ratio statistic for testing the null hypothesis $H_0: \beta_j = 0$, with $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, against $H_a: \beta_j \neq 0$, is $LRT_j = 2(\tilde{l} - \hat{l})$.
Similarly, the Wald test statistic is the ratio of the maximum likelihood estimate of the parameter of interest under the alternative hypothesis to its standard error. Thus, the Wald test statistic for testing the null hypothesis $H_0: \beta_j = 0$, with $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ unspecified, against $H_a: \beta_j \neq 0$, is given by $W_j = \tilde{\beta}_j / \sqrt{\mathrm{var}(\tilde{\beta}_j)}$, where $\mathrm{var}(\tilde{\beta}_j)$ is obtained from the Hessian matrix at the end of the iterative process.
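As an illustration, the following is a minimal sketch (not the authors' code) of how $LRT_j$ and $W_j$ could be computed for a Poisson log-linear model from two maximum likelihood fits; the helper name `lrt_wald_poisson` and the use of statsmodels are assumptions made for this sketch.

```python
# Hedged sketch: LRT and Wald statistics for H0: beta_j = 0 in a Poisson GLM,
# obtained from maximum-likelihood fits under the null and the alternative.
import numpy as np
import statsmodels.api as sm

def lrt_wald_poisson(y, X, j):
    """X is the n x (p+1) design matrix (intercept included); j is the tested column."""
    X0 = np.delete(X, j, axis=1)                                # design matrix under H0
    fit0 = sm.GLM(y, X0, family=sm.families.Poisson()).fit()   # MLE under H0
    fit1 = sm.GLM(y, X, family=sm.families.Poisson()).fit()    # MLE under Ha
    lrt = 2.0 * (fit1.llf - fit0.llf)                           # compare to chi-square(1)
    wald = fit1.params[j] / fit1.bse[j]                         # square to compare to chi-square(1)
    return lrt, wald
```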

2.2.2. The Score Test

The score test is based on the partial derivatives of the log-likelihood function with respect to the nuisance parameters and the parameters of interest evaluated at the null hypothesis. The score test statistic can be shown to be
$S_j = \dfrac{\hat{P}_j^{\,2}}{x_j'\hat{W}\left[I_n - X_j\left(X_j'\hat{W}X_j\right)^{-1}X_j'\hat{W}\right]x_j}.$
For the derivation of the score test statistic and the definitions of $\hat{P}_j$, $x_j$, $\hat{W}$, and $X_j$, see Appendix A. The above score test can also be obtained from Pregibon [30], who developed the score test for the generalized linear interactive modeling system; the proof is presented in Appendix A. Note that the symbol $\hat{\ }$ represents the MLE under the null hypothesis. Asymptotically (for large n), the distribution of each test statistic, $LRT_j$, $W_j^2$, and $S_j$, converges to $\chi^2(1)$ [26]. Therefore, for a fixed significance level $\alpha > 0$, we reject the null hypothesis if the value of the test statistic is greater than $\chi^2_{\alpha}(1)$.
To save space, the expressions for the three test statistics, L R T j , W j , and S j , for the special cases for which the data distribution is normal, Poisson, and binomial, respectively, are presented in Appendix A.
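To make the computation concrete, here is a minimal sketch of the Poisson special case $S_{Pj}$ given in Appendix A; the function name `score_stat_poisson` and the use of statsmodels to fit the null model are assumptions of this sketch, not part of the paper.

```python
# Hedged sketch of the score statistic S_j for a Poisson log-linear model,
# with W = diag(lambda_hat_i) and the null model fitted by maximum likelihood.
import numpy as np
import statsmodels.api as sm

def score_stat_poisson(y, X, j):
    """X is the n x (p+1) design matrix including the intercept; j indexes the tested column."""
    X_mj = np.delete(X, j, axis=1)                               # design matrix without column j
    x_j = X[:, j]
    fit0 = sm.GLM(y, X_mj, family=sm.families.Poisson()).fit()  # MLE under H0: beta_j = 0
    lam_hat = fit0.mu                                            # fitted means under H0
    W = np.diag(lam_hat)
    P_j = np.sum((y - lam_hat) * x_j)                            # score component for beta_j
    # Denominator: x_j' W [I_n - X_j (X_j' W X_j)^{-1} X_j' W] x_j
    proj = X_mj @ np.linalg.inv(X_mj.T @ W @ X_mj) @ X_mj.T @ W
    denom = x_j @ W @ (np.eye(len(y)) - proj) @ x_j
    return P_j**2 / denom                                        # compare to chi-square(1)
```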

2.2.3. The F Test

The F statistic used in model selection for data from a normal distribution is
$NF = \dfrac{\mathrm{SSR}(x_j \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p)/df_1}{\mathrm{SSE}(x_1, \ldots, x_p)/df_2},$
where
$\mathrm{SSR}(x_j \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p) = \mathrm{SSE}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p) - \mathrm{SSE}(x_1, \ldots, x_p),$
$df_1 = 1$, and $df_2 = n - p - 1$. Here, $NF \sim F(1, n-p-1)$ if $H_0$ holds ([3], p. 267).
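A minimal sketch (not from the paper) of this partial F statistic, computed from the residual sums of squares of the reduced and full least-squares fits; the helper name `partial_f` is an assumption of this sketch.

```python
# Hedged sketch of the partial F statistic NF for H0: beta_j = 0 in a normal
# linear model, using SSE from the reduced (without x_j) and full fits.
import numpy as np

def partial_f(y, X, j):
    """X is the n x (p+1) design matrix including the intercept; j indexes the tested column."""
    n, p1 = X.shape                                   # p1 = p + 1 columns
    X_red = np.delete(X, j, axis=1)
    def sse(A):
        beta = np.linalg.lstsq(A, y, rcond=None)[0]   # ordinary least-squares fit
        return np.sum((y - A @ beta) ** 2)
    ssr = sse(X_red) - sse(X)                         # SSR(x_j | other covariates), df1 = 1
    return ssr / (sse(X) / (n - p1))                  # compare to F(1, n - p - 1)
```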

2.3. Simulation Study

A simulation study is now conducted to compare the behaviors of the four test statistics, namely, the score, the LRT, the Wald, and the F tests, in terms of empirical level and power, for testing the significance of a single regression coefficient. We consider a two-variable regression model with link functions $\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, $\lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$, and $\frac{p}{1-p} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$ for the $N(\mu, \sigma^2)$, Poisson$(\lambda)$, and Bin$(m, p)$ distributions, respectively. The covariates $x_1$ and $x_2$ are generated from the standard normal distribution, and $\sigma = 2$ is used for the normal distribution.
Suppose our interest is to test $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$ in each case. For empirical levels, we take $\beta_0 = 1$, $\beta_1 = 1$, and $\beta_2 = 0$. For power, we take $\beta_0 = 1$ and $\beta_1 = 1$, and different values of $\beta_2$, as presented in Table 1 for normal and Poisson-distributed data and in Table 2 for binomial-distributed data.
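For concreteness, a minimal Monte Carlo sketch (Poisson case) of how such an empirical level can be estimated is given below; it reuses the hypothetical helper `score_stat_poisson` sketched in Section 2.2.2 and is not the authors' simulation code.

```python
# Hedged sketch: Monte Carlo estimate of the empirical level of the score test
# for H0: beta_2 = 0 in the Poisson model (beta_2 = 0 in the generating model).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, n_rep, alpha = 30, 10_000, 0.05
beta0, beta1, beta2 = 1.0, 1.0, 0.0
rejections = 0
for _ in range(n_rep):
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = rng.poisson(np.exp(beta0 + beta1 * x1 + beta2 * x2))
    X = np.column_stack([np.ones(n), x1, x2])
    rejections += score_stat_poisson(y, X, j=2) > chi2.ppf(1 - alpha, df=1)
print("empirical level:", rejections / n_rep)
```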
For data from the binomial distribution, the level and power results may be affected by the binomial index m. To check this, we conduct simulations for m = 10 , m = 30 , and m = 40 . For both level and power, we consider sample sizes n = 10 , 20 , 30 , and 50 for all distributions. Each simulation experiment is based on 10,000 replicated samples. The level and power results are presented in Table 1 for normal and Poisson distributions and in Table 2 for binomial distribution. Results in Table 1 show that for normally distributed data, the score test and the F test maintain the level reasonably well, although the score test shows some inflated level. As a result, it shows some inflated power. The other two statistics (Wald and LRT) show liberal behavior. Because of this, these two statistics show higher power than the other two tests.
Results in Table 1 and Table 2 show that for data from the Poisson and binomial distributions, the F test performs very badly. The other three statistics hold the level very well and their power performances are also similar. Furthermore, results in Table 2 show that the size of the binomial index m does not have any effect on the size and power of the tests. So, in subsequent sections, we choose m = 40 as the binomial index.
We further conducted a simulation study where the covariates x 1 and x 2 are correlated for Poisson and binomial distributions, and the results (not included in the paper) show similar empirical level and empirical power properties.
It is reassuring that the F test does well for data from the normal distribution. So, in Section 3, we use this test in the study of the performance of the model selection procedures for normally distributed data. For data from Poisson and binomial distributions, we use the score test as it has a very simple form, it does not need estimates of the regression parameters under the alternative hypothesis, and its level and power properties are at least as good as those of the LRT and the Wald tests.

3. Model Selection

3.1. Empirical Level and Power

Following the findings in Section 2.3, our model selection criterion for normally distributed data is based on testing the significance of a single regression coefficient $\beta_j$ using the F test presented in Section 2.2.3. Also, as discussed in Section 2.3, for data from the Poisson and the binomial distributions, we use the score test statistics $S_{Pj}$ and $S_{Bj}$, respectively, presented in Appendix A. Our purpose here is to make a comparative study of the performance of forward selection, backward elimination, the AIC, and the BIC, with respect to level and power.
Although these model selection procedures are well known, to help the readers, we provide brief descriptions of them below.
Forward Selection Procedure: The forward selection procedure starts with only one variable in the model. So, if the model has p regression variables apart from the intercept, in the first step, we fit p regression models and calculate the value of the score test statistic for each model. Then, the variable corresponding to the largest value of the score test statistic, provided it is significant at a specified level of significance, is kept in the model. In step 2, we fit p − 1 regression models, each containing the regression variable selected at step 1 and one of the remaining p − 1 regression variables, and follow the procedure as in step 1. We then continue this process, adding one more variable each time, until no more variables can be included in the model. In the end, the final model will have $q \le p$ variables.
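A minimal sketch of this procedure for the Poisson case, again relying on the hypothetical helper `score_stat_poisson`; the function name `forward_select_score` is an assumption of this sketch.

```python
# Hedged sketch of forward selection via the score test for a Poisson GLM.
# Column 0 of X is the intercept and is always kept; the remaining columns
# are candidate regression variables.
import numpy as np
from scipy.stats import chi2

def forward_select_score(y, X, alpha=0.05):
    selected = [0]                                    # start from the intercept-only model
    remaining = list(range(1, X.shape[1]))
    while remaining:
        stats = {}
        for j in remaining:
            cols = selected + [j]                     # candidate model with one added variable
            stats[j] = score_stat_poisson(y, X[:, cols], j=len(cols) - 1)
        best = max(stats, key=stats.get)              # largest score statistic
        if stats[best] > chi2.ppf(1 - alpha, df=1):   # keep it only if significant
            selected.append(best)
            remaining.remove(best)
        else:
            break
    return selected                                   # column indices in the final model
```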
Backward Elimination Procedure: The backward elimination procedure starts with the full model. We calculate the p score test statistics for testing $H_0: \beta_j = 0$, $j = 1, 2, \ldots, p$. Then, if the variable with the smallest value of the score test statistic is found to be insignificant at a specified level of significance, we remove that variable from the model. We then continue this process, removing one more variable each time, until no more variables can be deleted from the model.
AIC and BIC Criteria: The AIC judges a model by how close its fitted values tend to be to the true values, in terms of a certain expected value. The AIC can be written as $AIC = -2l + 2p$. Forward selection through the AIC starts from the null model, and at each step a variable outside the current model can be added, one at a time, until the AIC no longer improves. A Bayesian argument motivates the BIC, an alternative to the AIC. It takes the sample size into account, and the forward selection process through the BIC is similar to that of the AIC, where $BIC = -2l + \ln(n)\,p$.
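A minimal sketch of forward selection driven by the AIC for a Poisson GLM (the BIC version is analogous, with the penalty $\ln(n)\,p$); the helper name `forward_select_aic` and the use of statsmodels' reported AIC are assumptions of this sketch.

```python
# Hedged sketch of forward selection by AIC for a Poisson GLM; at each step the
# candidate that lowers the AIC the most is added, stopping when no candidate helps.
import numpy as np
import statsmodels.api as sm

def forward_select_aic(y, X):
    selected = [0]                                    # intercept-only starting model
    remaining = list(range(1, X.shape[1]))
    best_aic = sm.GLM(y, X[:, selected], family=sm.families.Poisson()).fit().aic
    improved = True
    while remaining and improved:
        improved = False
        for j in sorted(remaining):
            aic = sm.GLM(y, X[:, selected + [j]],
                         family=sm.families.Poisson()).fit().aic
            if aic < best_aic:                        # track the best improving candidate
                best_aic, best_j, improved = aic, j, True
        if improved:
            selected.append(best_j)
            remaining.remove(best_j)
    return selected
```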
As mentioned earlier, our purpose is to find the most parsimonious model. Here, we illustrate a method of calculating the empirical level using a p-variable Poisson regression model with $\ln(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$. For given values of the regression parameters and simulated values of the regression variables, we obtain a sample of size n from the Poisson$(\mu)$ distribution. We then use the score test statistic for testing $H_0: \beta_j = 0$ and a model selection procedure, for example, the forward selection procedure, and find a model based on a subset of the regression variables. We repeat this process 10,000 times and find 10,000 models. If the given value of $\beta_j$ is zero (or very small), we check whether the regression variable $x_j$ nevertheless appears in the final model. We count the number of models in which the variable $x_j$ is included; let this number be s. Then, the empirical level for rejecting $H_0: \beta_j = 0$ is s/10,000. Empirical power is calculated similarly, by taking a larger value of $\beta_j$ in the simulation process.

3.1.1. Simulation Study

We conduct a simulation study to compare the performance of the model selection procedures, forward selection, backward elimination, AIC, and BIC, with respect to empirical level and power. We consider a four-variable regression model. Data are drawn from the normal N ( μ , σ 2 ) regression model, the Poisson ( λ ) regression model, and the Binomial ( m , p ) regression model with
$\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$, $\lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$, and $\frac{p}{1-p} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$,
respectively. Suppose we would like to test H 0 : β 1 = 0 . To calculate the empirical level for each distribution, we choose β 1 = 0 , and for empirical power, we take different values of β 1 , as presented in Table 3. The rest of the parameters are set at σ 2 = 2 , β 2 = 0.3 , β 3 = 0.2 , and β 4 = 0.3 for normal and Poisson distributions, and m = 40 , β 2 = 0.2 , β 3 = 0.1 , and β 4 = 0.2 for binomial distributions. For each distribution, 10,000 replicated samples are taken for sample sizes of n = 10, 20, 30, and 50.
For the forward selection and backward elimination procedures, we consider α = 0.05 . Note that for the other two procedures, α cannot be fixed.
The level and power results are presented in Table 3 and Table 4, which show that the forward selection method using the F test for normally distributed data, and both forward selection and backward elimination using the score test for Poisson- and binomial-distributed data, always produce a reasonable empirical level (close to the nominal level), irrespective of the sample size. The other two procedures, the AIC and the BIC, produce highly inflated type I errors. The BIC, however, does well for a large sample size (n = 50), where its power performance is also comparable to that of the forward selection and backward elimination procedures using the score test.
Thus, for normal regression models, our recommendation is to use the forward selection procedure with the F test. For Poisson and binomial regression models, our recommendation is to use the forward selection procedure with the score test for small to moderate sample sizes, while for large sample sizes (n > 50), the BIC should be used, as it is computationally much simpler.

4. Over-Dispersed Poisson and Over-Dispersed Binomial Regression Models

4.1. Introduction and Motivation

Discrete data, in terms of proportions, are commonly encountered in toxicology and related areas. When the experimental unit is a litter, there tends to be a litter effect, meaning that littermates respond more similarly to each other than to animals from other litters; that is, fetuses from the same litter tend to have similar responses to the treatment. The probability of success may therefore vary across litters, indicating that a binomial model may not be a good fit for such proportion data. The two-parameter beta-binomial (BB) model, proposed originally by Williams [31] and later applied by Paul [32] and others, is widely used for analyzing data of this nature; it assumes that the binomial parameter varies between litters.
Discrete data in the form of counts arise in many health science disciplines, such as biology and epidemiology. For examples of discrete count data, see [19,20,33,34,35,36,37].
The Poisson distribution has the property that the mean and the variance are equal. However, in practice, count data often display extra-Poisson variation, or over/under-dispersion relative to a Poisson model. Thus, the Poisson distribution is not an ideal choice for analyzing count data in many applications. One very convenient and common model to accommodate this extra dispersion is the two-parameter negative binomial distribution. For applications of the negative binomial distribution, see, for example, [38,39,40,41].
In Section 4.2 and Section 4.3, we extend the methods and ideas developed in Section 2 and Section 3 for model selection for Poisson and binomial regression models to over-dispersed Poisson and over-dispersed binomial regression models, respectively. Specifically, we deal with model selection procedures in negative binomial regression and beta-binomial regression models. Here, we first develop the score, the LRT, and the Wald tests for testing the significance of a single regression variable, and then for model selection, we compare the forward selection, the AIC, and the BIC procedures.

4.2. Negative Binomial Regression Model

Consider the negative binomial (NB) distribution with probability density function
$f(y; m, c) = \dfrac{\Gamma(y + c^{-1})}{\Gamma(c^{-1})\, y!}\left(\dfrac{cm}{cm+1}\right)^{y}\left(\dfrac{1}{cm+1}\right)^{c^{-1}},$     (7)
with mean $E(y) = m$ and variance $\mathrm{var}(y) = m(1 + cm)$ (see [42]). We denote this distribution as NB$(m, c)$. In Equation (7), the term c represents the dispersion parameter, which is constant. Clearly, as $c \to 0$, the NB distribution reduces to the Poisson distribution with parameter m.
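As a side note, one can draw from NB(m, c) in this parameterization with a standard negative binomial generator; a minimal sketch (assuming NumPy's `negative_binomial(n, p)`, which has mean $n(1-p)/p$) is given below.

```python
# Hedged sketch: sampling from NB(m, c) with mean m and variance m(1 + c*m)
# via numpy's negative_binomial(n, p), using n = 1/c and p = 1/(1 + c*m).
import numpy as np

rng = np.random.default_rng(1)
m, c = 5.0, 0.3
y = rng.negative_binomial(n=1.0 / c, p=1.0 / (1.0 + c * m), size=100_000)
print(y.mean(), y.var())   # close to m and m * (1 + c * m), respectively
```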
Let $y_i$, $i = 1, \ldots, n$, be a random sample from the NB$(m_i, c)$ distribution with $m_i = \exp(x_i\beta) = \exp(\beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p)$; then $\partial m_i/\partial\beta_j = m_i x_{ij}$. The log-likelihood of the NB regression model is
$l = \sum_{i=1}^{n}\left\{ y_i\log(m_i) - (y_i + c^{-1})\log(cm_i + 1) + \sum_{j=1}^{y_i}\log[1 + c(j-1)]\right\}.$
The first and second-order partial derivatives of the log-likelihood function with respect to the parameters β and c are presented in Appendix B.
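A minimal sketch evaluating this log-likelihood for given data (the helper name `nb_loglik` is an assumption of this sketch; additive constants in the data are omitted, as in the expression above):

```python
# Hedged sketch: the NB regression log-likelihood l(beta, c) given above,
# up to an additive constant in the data.
import numpy as np

def nb_loglik(beta, c, y, X):
    """y: integer counts; X: n x (p+1) design matrix including the intercept."""
    m = np.exp(X @ beta)                              # m_i = exp(x_i' beta)
    ll = 0.0
    for yi, mi in zip(y, m):
        ll += yi * np.log(mi) - (yi + 1.0 / c) * np.log(c * mi + 1.0)
        ll += sum(np.log(1.0 + c * (j - 1)) for j in range(1, int(yi) + 1))
    return ll
```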

4.2.1. Derivation of the Test Statistics

We follow the same procedure to find the score test for testing $H_0: \beta_j = 0$ as described in Appendix A. Omitting the details, the score, the Wald, and the LRT statistics are
$S_{NBj} = S'\left(D - A_1'B_{11}^{-1}A_1\right)^{-1}S = \dfrac{\left[\sum_{i=1}^{n}\dfrac{(y_i - \hat{m}_i)\,x_{ij}}{1+\hat{c}\hat{m}_i}\right]^2}{x_j'\hat{W}\left[I_n - X_j\left(X_j'\hat{W}X_j\right)^{-1}X_j'\hat{W}\right]x_j},$
$W_{NBj} = \tilde{\beta}_j\Big/\sqrt{\widehat{\mathrm{var}}(\tilde{\beta}_j)}$, with $\widehat{\mathrm{var}}(\tilde{\beta}_j) = \left[\sum_{i=1}^{n}\dfrac{\tilde{m}_i}{1+\tilde{c}\tilde{m}_i}\,x_{ij}^2\right]^{-1}$, and
$L_{NBj} = 2\sum_{i=1}^{n}\left\{ y_i\log\dfrac{\tilde{m}_i}{\hat{m}_i} - (y_i+\tilde{c}^{-1})\log(\tilde{c}\tilde{m}_i+1) + (y_i+\hat{c}^{-1})\log(\hat{c}\hat{m}_i+1) + \sum_{l=1}^{y_i}\log\dfrac{1+\tilde{c}(l-1)}{1+\hat{c}(l-1)}\right\},$
where $w_i = \hat{m}_i/(1+\hat{c}\hat{m}_i)$, $\hat{W} = \mathrm{diag}(w_1, \ldots, w_n)$, $\hat{m}_i = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{j-1} x_{i(j-1)} + \hat{\beta}_{j+1} x_{i(j+1)} + \cdots + \hat{\beta}_p x_{ip})$, and $\tilde{m}_i = \exp(x_i\tilde{\beta})$, where $\tilde{\beta}$ and $\tilde{c}$ are the maximum likelihood estimates of $\beta$ and c under the alternative hypothesis, and $\hat{\beta}$ and $\hat{c}$ are those under the null hypothesis.

4.2.2. Simulation Study

We conducted two simulation studies; the first compared the performances of three test statistics, the score, the Wald, and the LRT; and the other compared the performances of model selection through forward selection, AIC, and BIC.
Empirical level and power of the score, the Wald, and the LRT tests: Data are simulated from the negative binomial regression model NB ( m , c ) with link function m = exp ( β 0 + β 1 x 1 + β 2 x 2 ) and we would like to test the null hypothesis H 0 : β 2 = 0 against H a : β 2 0 .
For empirical levels: We simulate response data from the negative binomial regression model with c = 0.03 , β 0 = 2 , β 1 = 0.3 , and β 2 = 0 . For powers, different values of β 2 are taken, as represented in Table 5. The independent variables, x 1 and x 2 , are generated from the standard normal distribution.
The level and power results are presented in Table 5. The results show that the score test has the best level property (empirical level close to the nominal level). The other two statistics show some inflation of the empirical level compared to the nominal level, which results in a higher power for the Wald and the LRT statistics. So, here, we also use the score test statistic in model selection with the forward selection procedure.
Empirical level and power in model selection through the forward selection procedure: A simulation study is conducted similar to that in Section 3.1 to compare the property of the model selection procedure through forward selection using the score test with the other two criteria (AIC and BIC).
Data are taken from the NB$(m, c)$ distribution with $m = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$, with $\beta_0 = 2$, $\beta_1 = 0$, $\beta_2 = 0.3$, and $\beta_4 = 0.1$ for the empirical level; the values of $\beta_1$ used for power are presented in Table 6. The value of c is $c = 0.03$. Furthermore, as in Section 3, the sample sizes and nominal level are $n = 10$, 20, 30, 50 and $\alpha = 0.05$. The level and power results are presented in Table 6.
The results in Table 6 show that the forward selection procedure using the score test performs similarly to what was observed in Table 3; specifically, its empirical level is close to the nominal level. The other procedures show a highly inflated empirical level, even for large n (n = 50).
Study of misspecification of models: A small study of the misspecification of models was conducted. Specifically, we studied the performance of test statistics developed under the assumption of Poisson-distributed data when the data are actually distributed as negative binomial, and vice versa. The results of this study are presented in Table 7. When data are generated from the negative binomial distribution but the statistics are developed using the Poisson probability density function, the statistics show an inflated level and power. However, when data are generated from the Poisson distribution and the statistics are developed using the negative binomial probability density function, the statistics do not show an inflated level and power. This is reasonable, as the Poisson distribution is a special case of the negative binomial distribution.

4.3. Beta-Binomial Regression Model

Suppose that Y follows a beta-binomial distribution with mean $\mu$ and dispersion parameter $\theta$, denoted by $Y \sim \mathrm{BB}(k, \mu, \theta)$, if Y has the following probability mass function
$P(Y = y) = \binom{k}{y}\dfrac{\prod_{r=0}^{y-1}(\mu + r\theta)\,\prod_{r=0}^{k-y-1}(1 - \mu + r\theta)}{\prod_{r=0}^{k-1}(1 + r\theta)},$
for $y = 0, 1, \ldots, k$, $0 \le \mu \le 1$, and $\theta \ge \max\{-\mu/(k-1), -(1-\mu)/(k-1)\}$, with mean $E(Y) = k\mu$ and variance $\mathrm{var}(Y) = k\mu(1-\mu)[1 + (k-1)\phi]$, where $\phi = \theta/(1+\theta)$ (see [31,32]).
Note that, as $\theta \to 0$, BB$(k, \mu, \theta)$ tends to the binomial$(k, \mu)$ distribution; for $\theta = 0$, we have $\mathrm{var}(Y) = k\mu(1-\mu)$, and BB$(k, \mu, \theta)$ becomes the binomial$(k, \mu)$ distribution.
Let y i , i = 1 , , n be a random sample from the BB ( k i , μ i , θ ) . Then the log-likelihood is
$l = \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\log(\mu_i + r\theta) + \sum_{r=0}^{k_i-y_i-1}\log(1 - \mu_i + r\theta) - \sum_{r=0}^{k_i-1}\log(1 + r\theta)\right].$
The mean $\mu_i$ is assumed to follow the logistic model $\mu_i(x_i, \beta) = \dfrac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}$, so that $\partial\mu_i/\partial\beta_j = \mu_i(1-\mu_i)x_{ij}$. The first- and second-order partial derivatives of l with respect to the parameters $\beta$ and $\theta$ are presented in Appendix C.
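A minimal sketch evaluating this beta-binomial regression log-likelihood (the helper name `bb_loglik` is an assumption of this sketch):

```python
# Hedged sketch: the beta-binomial regression log-likelihood l(beta, theta) above,
# with the logistic mean model mu_i = exp(x_i' beta) / (1 + exp(x_i' beta)).
import numpy as np

def bb_loglik(beta, theta, y, k, X):
    """y: counts; k: binomial indices k_i; X: n x (p+1) design matrix with intercept."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))            # logistic mean model
    ll = 0.0
    for yi, ki, mi in zip(y, k, mu):
        ll += sum(np.log(mi + r * theta) for r in range(int(yi)))
        ll += sum(np.log(1.0 - mi + r * theta) for r in range(int(ki - yi)))
        ll -= sum(np.log(1.0 + r * theta) for r in range(int(ki)))
    return ll
```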

4.3.1. Derivation of the Test Statistics

Using the same procedure as presented in Appendix A, the score test statistic for testing $H_0: \beta_j = 0$ is
$S_{BBj} = \dfrac{\left\{\sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\dfrac{1}{\hat{\mu}_i + r\hat{\theta}} - \sum_{r=0}^{k_i-y_i-1}\dfrac{1}{1-\hat{\mu}_i + r\hat{\theta}}\right]\hat{\mu}_i(1-\hat{\mu}_i)x_{ij}\right\}^2}{\hat{V}_j},$
where $\dfrac{\hat{\mu}_i}{1-\hat{\mu}_i} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{j-1} x_{i(j-1)} + \hat{\beta}_{j+1} x_{i(j+1)} + \cdots + \hat{\beta}_p x_{ip})$, $\hat{\beta}$ and $\hat{\theta}$ are the maximum likelihood estimates of $\beta$ and $\theta$ under the null hypothesis, and $\hat{V}_j = V_j(\hat{\mu}, \hat{\theta})$ is presented in Appendix C.
The Wald test and LRT test statistics are as follows:
$W_{BBj} = \tilde{\beta}_j\Big/\sqrt{\widehat{\mathrm{var}}(\tilde{\beta}_j)}$, with $\widehat{\mathrm{var}}(\tilde{\beta}_j) = \left[\sum_{i=1}^{n}(p_{1i} + p_{2i})\,\tilde{\mu}_i^2(1-\tilde{\mu}_i)^2 x_{ij}^2\right]^{-1}$, and
$L_{BBj} = 2\sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\log\dfrac{\tilde{\mu}_i + r\tilde{\theta}}{\hat{\mu}_i + r\hat{\theta}} + \sum_{r=0}^{k_i-y_i-1}\log\dfrac{1-\tilde{\mu}_i + r\tilde{\theta}}{1-\hat{\mu}_i + r\hat{\theta}} - \sum_{r=0}^{k_i-1}\log\dfrac{1 + r\tilde{\theta}}{1 + r\hat{\theta}}\right],$
where $\tilde{\beta}$ and $\tilde{\theta}$ are the maximum likelihood estimates of $\beta$ and $\theta$ under the alternative hypothesis, with $\dfrac{\tilde{\mu}_i}{1-\tilde{\mu}_i} = \exp(x_i\tilde{\beta})$, and $p_{1i}$ and $p_{2i}$ are defined in Appendix C.

4.3.2. Simulation Study

Two simulation studies are conducted in this subsection: the first compares the performances of the three test statistics, and the other compares the performance of model selection by forward selection using the Wald test with that of the AIC and the BIC.
Empirical level and power of the score, the Wald, and the LRT tests: We take data from the beta-binomial regression model BB$(k, \mu, \theta)$ with the link function $\dfrac{\mu}{1-\mu} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$ and test the null hypothesis $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$.
For empirical levels: We simulate data from the beta-binomial regression model BB ( k , μ , θ ) with k = 40 , θ = 0.2 , β 0 = 1 , β 1 = 0.5 , and β 2 = 0 . For powers, different values of β 2 are taken as represented in Table 8. The independent variables x 1 and x 2 are generated from the standard normal distribution. As in previous studies, we consider sample sizes n = 10 , 20 , 30 , and 50. The level and power results are presented in Table 8.
The results in Table 8 show that the score test here does not enjoy the favorable level property observed for data from the Poisson, binomial, and negative binomial distributions. In this case, the levels are, in general, somewhat liberal for all sample sizes. However, this level property of the score test is consistent, in the sense that it holds a similar level irrespective of the sample size. The other two statistics are liberal for small sample sizes, and as the sample size increases, their level behavior becomes closer to that of the score test. For larger sample sizes (30 and 50), the empirical levels of all three statistics are closer to the nominal level, although they remain somewhat liberal. Furthermore, for these sample sizes, the Wald and the LRT statistics show similar power, which is much better than that of the score test statistic. Thus, for small sample sizes, none of the statistics can be recommended for testing the significance of a single regression coefficient. For large sample sizes, although it is somewhat liberal, we recommend using the Wald test, as it has a simple form and a significant power advantage over the score test.
Empirical level and power in model selection through the forward selection procedure: We conducted a simulation study similar to that in Section 3.1 to investigate the model selection behavior through the forward selection procedure using the Wald test, the AIC, and the BIC, in terms of level and power, but only for large sample sizes (see below).
To calculate the empirical level, we generate data from the beta-binomial regression model BB$(k, \mu, \theta)$ with $\dfrac{\mu}{1-\mu} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)$. We choose $k = 40$, $\theta = 0.2$, $\beta_0 = 1$, $\beta_1 = 0$, $\beta_2 = 0.5$, $\beta_3 = 0.4$, and $\beta_4 = 0.5$. For empirical power, we take different values of $\beta_1$, as presented in Table 9. For each simulation experiment, 10,000 replicated samples are taken for sample sizes n = 30 and 50. For the forward selection procedure, we consider $\alpha = 0.05$. The level and power results are presented in Table 9.
The results in Table 9 show that empirical levels of the forward selection procedure using the Wald test and the BIC are approaching the nominal level (5%) as n increases. However, the empirical level is still not very close to the nominal level. So, we extend this simulation study to n = 70 and n = 100 to determine the sample size needed to achieve an empirical level proximate to the nominal level. It shows that the empirical level of the Wald test moves closer to the nominal level as n increases to 70 and 100, although it is still not very close. However, overall, its property is the best, both in terms of level and power. For example, for n = 100 , empirical levels of the Wald, AIC, and BIC are 5.83, 16.57, and 3.27, respectively. The corresponding powers for β 1 = 0.25 are 66.68, 83.24, and 60.44. Compared to the Wald test, AIC shows an inflated empirical level and BIC shows a deflated level. This is reflected in the power results, namely that AIC shows higher power and BIC shows lower power than that of the Wald test.
Also, the empirical power of the procedure using either the Wald test or BIC is similar. Thus, for model selection, our recommendation is to use the forward selection procedure through either the Wald test or the BIC, as both are easy to compute for large sample sizes.
Note that all of the simulation studies were also conducted with other parameter values; the empirical levels and powers were similar and are not included in the paper, to save space.

5. Real Data Analysis

To demonstrate the practical application of the model selection procedures discussed in this paper, we examine two real datasets that have small sample sizes.
Dataset 1: The Lower Respiratory Illness Count Dataset.
This is a dataset provided by LaVange et al. [43], consisting of information on lower respiratory illness in 284 children during their first year of life. Each child was examined every two weeks over a period of one year.
There were eight covariates, namely
$x_1$: Risk: the number of weeks in that year during which the child is at risk.
$x_2$: Passive: a dummy variable indicating whether the child was exposed to cigarette smoking.
$x_3$: Crowding: a variable indicating whether or not the child's home is crowded.
$x_4$: Race: an indicator variable for race (1 = white, 0 = not white).
($x_5$, $x_6$): Socioeconomic status: (1, 0), (0, 1), and (0, 0) for low, medium, and high class, respectively.
($x_7$, $x_8$): Age group: (1, 0), (0, 1), and (0, 0) for under four, four to six, and more than six months, respectively.
We find this dataset appealing as it comes from a real experiment. However, this is a large dataset. So, we construct a small dataset consisting of a random sample of 50 children with their respective lower respiratory illness status and the covariates. The dataset is presented in Appendix D, and is analyzed below.
This is a count dataset. The usual model for analyzing such count data is a Poisson regression model. So, we first use a Poisson regression model and apply the model selection procedures discussed earlier. However, there may be overdispersion in the data, since children who have an infection are more likely to have other infections. To test this, we apply the score test statistic $T_{LM}$ given by Cameron and Trivedi ([44], p. 49) to test $H_0: c = 0$ versus $H_a: c > 0$. This statistic has an asymptotic standard normal distribution, and for the sample data, $T_{LM} = 1.9943$, with a p-value of less than 0.023.
We consider a negative binomial regression model to accommodate overdispersion. Thus, the full model considered here for model selection is
$\log y = \beta_0 + \beta_1 x_1 + \cdots + \beta_8 x_8.$
In Table 10, we provide variables that enter into the model in each step of the forward selection procedure using the score test, Wald test, LRT test, AIC, and BIC.
Table 10 shows that two covariates (passive smoking and crowding) are significant out of the eight covariates using the forward selection procedure through the score test, and through the Wald test for the Poisson and negative binomial regression models. In contrast, the forward selection procedure through the AIC and the BIC provides different parsimonious models. We select the final model using the forward selection through the score test.
Thus, the final model for these data is
$\log y = \beta_0 + \beta_2 x_2 + \beta_3 x_3.$
Example 2: The Coronary Heart Disease Dataset.
The data presented here consist of 50 data points (Rousseauw et al. [45]) from a retrospective sample of 3357 males in a coronary heart disease high-risk region of the Western Cape, South Africa. The response variable y is coronary heart disease status, which has two categories (presence or absence). There are nine covariates, namely, $x_1$: systolic blood pressure; $x_2$: cumulative tobacco (kg); $x_3$: low-density lipoprotein cholesterol; $x_4$: adiposity; $x_5$: family history of heart disease; $x_6$: type-A behavior; $x_7$: obesity; $x_8$: current alcohol consumption; and $x_9$: age at onset.
We consider a logistic regression model. Thus, the full model considered here for model selection is
$\log\dfrac{E(y)}{1 - E(y)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_9 x_9.$
In Table 11, we present variables that enter into the model in each step of the forward selection procedure using the score test, Wald test, LRT test, AIC, and BIC.
Table 11 shows that two covariates (low-density lipoprotein cholesterol and family history of heart disease) are significant out of the nine covariates using the forward selection procedure through the score test and the Wald test. However, the forward selection procedure through the LRT test and the BIC provides different parsimonious models.
Thus, the final model for these data is
$\log\dfrac{E(y)}{1 - E(y)} = \beta_0 + \beta_3 x_3 + \beta_5 x_5.$

6. Discussion

In this paper, we first develop a score test procedure for testing the significance of a single covariate in generalized linear models that encompasses a range of symmetric and asymmetric probability distributions. This score test is compared—by extensive simulations—with the Wald test, the likelihood ratio test, and the F test.
The F test does well for data from the normal distribution. For data from Poisson and binomial distributions, the score test performs best.
Next, a comparative study of the performance of a few model selection procedures, such as the forward selection, the AIC, and the BIC, with respect to level and power, was conducted. The other two procedures, backward elimination and stepwise selection, are not included in our study, as in practice, these produce similar final models as those obtained by the forward selection procedure. Furthermore, although these model selection procedures are well-known, to be helpful to the readers, we provide a brief description in this paper.
The F test is well-known, and as it does best for normally distributed data, it is used in model selection for data from this distribution. The score test performs the best for data from Poisson and binomial data and it has a very simple form. So, for data from these distributions, the score test is recommended for model selection.
Simulation studies show that the forward selection procedure using the score test performs best in terms of the level and power for data from all three distributions, although model selection using the F test performs very well for normally distributed data.
The development of the score test procedure for testing the significance of a single covariate and, subsequently, using it in model selection, is extended to over-dispersed Poisson and over-dispersed binomial models, specifically for the negative binomial and beta-binomial models.

7. Conclusions

Our recommendation is to use the forward selection procedure using the F test for normal regression models. For Poisson, binomial, and negative binomial regression models, our recommendation is to use the forward selection procedure with the score test for small to moderate sample sizes; for large n ( n > 50 ), the BIC procedure is recommended as it is computationally much simpler. However, for the beta-binomial regression model, our recommendation is to use the forward selection procedure with the Wald test for a moderate sample size and to use the BIC for a large sample size.

Author Contributions

Conceptualization, S.P.; methodology, A.M. and S.P.; formal analysis, A.M.; writing—original draft preparation, S.P. and A.M.; writing—review and editing, S.P. and A.M.; supervision, S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Science and Engineering Research Council of Canada, grant account number 875700 given to Sudhir Paul at the University of Windsor.

Data Availability Statement

Datasets are available in Appendix D.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC   Akaike information criterion
BB    beta-binomial
BIC   Bayesian information criterion
GLM   generalized linear model
LRT   likelihood ratio test
OLS   ordinary linear regression model

Appendix A. Derivation of the Score Test Statistic

Suppose that $\delta = \beta_j$ and $\theta = (\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p)'$. We define the partial derivatives of the log-likelihood evaluated at $\delta = 0$ as
$\psi = \dfrac{\partial l}{\partial \delta}\Big|_{\delta=0} = \dfrac{\partial l}{\partial \beta_j}\Big|_{\delta=0}$ and $\gamma = \dfrac{\partial l}{\partial \theta}\Big|_{\delta=0} = \left(\dfrac{\partial l}{\partial \beta_0}, \dfrac{\partial l}{\partial \beta_1}, \ldots, \dfrac{\partial l}{\partial \beta_{j-1}}, \dfrac{\partial l}{\partial \beta_{j+1}}, \ldots, \dfrac{\partial l}{\partial \beta_p}\right)'\Big|_{\delta=0}.$
The $C(\alpha)$ test is based on the adjusted score $S = \dfrac{\partial l}{\partial \delta} - B\dfrac{\partial l}{\partial \theta}$, where B is the matrix of partial regression coefficients obtained by regressing $\partial l/\partial \delta$ on $\partial l/\partial \theta$. The variance–covariance of S is $D - AB^{-1}A'$, where $D = E\left(-\dfrac{\partial^2 l}{\partial \beta_j^2}\right)\Big|_{\delta=0}$, $A = E\left(-\dfrac{\partial^2 l}{\partial \beta_j\,\partial \beta_k}\right)\Big|_{\delta=0}$ ($k \neq j$), which is a $1 \times p$ vector, and $B = E\left(-\dfrac{\partial^2 l}{\partial \beta_k\,\partial \beta_t}\right)\Big|_{\delta=0}$ ($k, t \neq j$), which is a $p \times p$ matrix. After replacing $\theta$ in S, A, B, and D with $\hat{\theta}$, the $C(\alpha)$ statistic takes the form $S_j = S'(D - AB^{-1}A')^{-1}S$, which is approximately distributed as chi-squared with 1 degree of freedom.
Now, we define
$w_i = \left(\dfrac{\partial \mu_i}{\partial \eta_i}\right)^2 V_i^{-1}$, $W = \mathrm{diag}(w_1, \ldots, w_n)\big|_{\beta_j=0}$, $x_j = (x_{1j}, \ldots, x_{nj})'$, and $X_j = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1(j-1)} & x_{1(j+1)} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2(j-1)} & x_{2(j+1)} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n(j-1)} & x_{n(j+1)} & \cdots & x_{np} \end{pmatrix}.$
Then
$P_j = \dfrac{\partial l}{\partial \beta_j}\Big|_{\beta_j=0} = \sum_{i=1}^{n} w_i (y_i - \mu_i)\dfrac{\partial \eta_i}{\partial \mu_i} x_{ij}\Big|_{\beta_j=0},$ $D = \sum_{i=1}^{n} w_i x_{ij}^2\Big|_{\beta_j=0} = x_j'Wx_j,$ $A = \sum_{i=1}^{n} x_{ij} w_i \left(1, x_{i1}, \ldots, x_{i(j-1)}, x_{i(j+1)}, \ldots, x_{ip}\right)\Big|_{\beta_j=0} = x_j'WX_j,$ $B = X_j'WX_j$, and $D - AB^{-1}A' = x_j'W\left[I_n - X_j\left(X_j'WX_j\right)^{-1}X_j'W\right]x_j.$
Substituting $\beta_0, \beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_p$ with their MLEs under the null hypothesis, the score test statistic is
$S_j = \dfrac{\hat{P}_j^{\,2}}{x_j'\hat{W}\left[I_n - X_j\left(X_j'\hat{W}X_j\right)^{-1}X_j'\hat{W}\right]x_j}.$
The above score test can also be obtained from [30]. For testing that a subset q of the regression parameters is equal to zero, Pregibon [30] obtains a score test given by
$PS_q = s'X_q\left[X_q'W^{1/2}M_pW^{1/2}X_q\right]^{-1}X_q's,$
where $M_p = I - W^{1/2}X_p\left(X_p'WX_p\right)^{-1}X_p'W^{1/2}$, $W = W^{1/2}W^{1/2}$, and $S = s'X$.
Using $q = 1$ in the above, the score test becomes $PS_1 = S\left[x_j'W^{1/2}M_pW^{1/2}x_j\right]^{-1}S$.
Now,
$x_j'W^{1/2}M_pW^{1/2}x_j = x_j'W^{1/2}\left[I - W^{1/2}X_p\left(X_p'WX_p\right)^{-1}X_p'W^{1/2}\right]W^{1/2}x_j = x_j'Wx_j - x_j'WX_p\left(X_p'WX_p\right)^{-1}X_p'Wx_j = x_j'W\left[I_n - X_p\left(X_p'WX_p\right)^{-1}X_p'W\right]x_j.$
Therefore,
$PS_1 = S\left[x_j'W^{1/2}M_pW^{1/2}x_j\right]^{-1}S = \dfrac{S^2}{x_j'W\left[I_n - X_p\left(X_p'WX_p\right)^{-1}X_p'W\right]x_j},$
which, after replacing S with $\hat{S}$, is identical to $S_j$.
Special Cases:
Expressions for the three test statistics, $LR_j$, $W_j$, and $S_j$, are provided for the special cases in which the data distribution is normal, Poisson, and binomial, respectively.
(i) For the $N(\mu, \sigma^2)$ distribution with link function $\eta_i = \mu_i$, these statistics are
$LR_{Nj} = \dfrac{1}{\sigma^2}\left[\sum (y_i - \hat{\mu}_i)^2 - \sum (y_i - \tilde{\mu}_i)^2\right]$, $W_{Nj} = \tilde{\beta}_j\sqrt{\dfrac{1}{\tilde{\sigma}^2}\sum_{i=1}^{n} x_{ij}^2}$, and $S_{Nj} = \dfrac{\left[\sum_{i=1}^{n}\dfrac{(y_i - \hat{\mu}_i)}{\hat{\sigma}^2}x_{ij}\right]^2}{x_j'W\left[I - X_j\left(X_j'WX_j\right)^{-1}X_j'W\right]x_j},$
where $\tilde{\mu}_i = \tilde{\beta}_0 + \tilde{\beta}_1 x_{i1} + \cdots + \tilde{\beta}_p x_{ip}$, $\hat{\mu}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{j-1} x_{i(j-1)} + \hat{\beta}_{j+1} x_{i(j+1)} + \cdots + \hat{\beta}_p x_{ip}$, $W = \mathrm{diag}(1/\hat{\sigma}^2, \ldots, 1/\hat{\sigma}^2)$, and $\hat{\sigma}^2 = \sum_{i=1}^{n}(y_i - \hat{\mu}_i)^2/n$.
(ii) For the Poisson$(\lambda)$ distribution, the link function is $\eta_i = \log(\lambda_i)$. After derivation and simplification, we obtain the corresponding test statistics for Poisson-distributed data as
$LR_{Pj} = 2\left[\sum (y_i\log\tilde{\lambda}_i - \tilde{\lambda}_i) - \sum (y_i\log\hat{\lambda}_i - \hat{\lambda}_i)\right]$, $W_{Pj} = \tilde{\beta}_j\sqrt{\sum_{i=1}^{n}\tilde{\lambda}_i x_{ij}^2}$, and $S_{Pj} = \dfrac{\left[\sum_{i=1}^{n}(y_i - \hat{\lambda}_i)x_{ij}\right]^2}{x_j'W\left[I - X_j\left(X_j'WX_j\right)^{-1}X_j'W\right]x_j},$
where $\tilde{\lambda}_i = \exp(\tilde{\beta}_0 + \tilde{\beta}_1 x_{i1} + \cdots + \tilde{\beta}_p x_{ip})$, $\hat{\lambda}_i = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{j-1} x_{i(j-1)} + \hat{\beta}_{j+1} x_{i(j+1)} + \cdots + \hat{\beta}_p x_{ip})$, and $W = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_n)$.
(iii) Finally, for the binomial$(m, p)$ distribution with the link function $\eta_i = \log\left(\dfrac{p_i}{1-p_i}\right)$, the corresponding statistics are
$LR_{Bj} = 2\left\{\sum\left[y_i\log\dfrac{\tilde{p}_i}{1-\tilde{p}_i} + m_i\log(1-\tilde{p}_i)\right] - \sum\left[y_i\log\dfrac{\hat{p}_i}{1-\hat{p}_i} + m_i\log(1-\hat{p}_i)\right]\right\}$, $W_{Bj} = \tilde{\beta}_j\sqrt{\sum_{i=1}^{n} m_i\tilde{p}_i(1-\tilde{p}_i)x_{ij}^2}$, and $S_{Bj} = \dfrac{\left[\sum_{i=1}^{n}(y_i - m_i\hat{p}_i)x_{ij}\right]^2}{x_j'W\left[I - X_j\left(X_j'WX_j\right)^{-1}X_j'W\right]x_j},$
where $\dfrac{\tilde{p}_i}{1-\tilde{p}_i} = \exp(\tilde{\beta}_0 + \tilde{\beta}_1 x_{i1} + \cdots + \tilde{\beta}_p x_{ip})$, $\dfrac{\hat{p}_i}{1-\hat{p}_i} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_{j-1} x_{i(j-1)} + \hat{\beta}_{j+1} x_{i(j+1)} + \cdots + \hat{\beta}_p x_{ip})$, and $W = \mathrm{diag}\left(m_1\hat{p}_1(1-\hat{p}_1), \ldots, m_n\hat{p}_n(1-\hat{p}_n)\right)$.

Appendix B. First- and Second-Order Partial Derivatives of the Log-likelihood of the Negative Binomial Regression Model with Respect to Parameters β and c

$\dfrac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n}\dfrac{(y_i - m_i)}{1 + cm_i}x_{ij},$
$\dfrac{\partial l}{\partial c} = \sum_{i=1}^{n}\left[\dfrac{\log(1 + cm_i)}{c^2} - \dfrac{m_i(y_i + c^{-1})}{1 + cm_i} + \sum_{l=1}^{y_i}\dfrac{l-1}{1 + c(l-1)}\right],$
$\dfrac{\partial^2 l}{\partial \beta_j\,\partial \beta_k} = -\sum_{i=1}^{n}\left[\dfrac{y_i + 2cm_iy_i - cm_i^2}{(1 + cm_i)^2} - \dfrac{y_i - m_i}{1 + cm_i}\right]x_{ij}x_{ik},$
$\dfrac{\partial^2 l}{\partial \beta_j\,\partial c} = -\sum_{i=1}^{n}\dfrac{m_i(y_i - m_i)}{(1 + cm_i)^2}x_{ij}$, and
$\dfrac{\partial^2 l}{\partial c^2} = \sum_{i=1}^{n}\left[-\sum_{l=1}^{y_i}\left(\dfrac{l-1}{1 + c(l-1)}\right)^2 - \dfrac{2}{c^3}\log(1 + cm_i) + \dfrac{2m_i}{c^2(1 + cm_i)} + \dfrac{(y_i + c^{-1})m_i^2}{(1 + cm_i)^2}\right].$

Appendix C. First- and Second-Order Partial Derivatives of the Log-likelihood of the Beta-Binomial Regression Model with Respect to Parameters β and θ. The Denominator Term Vj of the Score Test in Section 4.3.1

$\dfrac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\dfrac{1}{\mu_i + r\theta} - \sum_{r=0}^{k_i-y_i-1}\dfrac{1}{1-\mu_i + r\theta}\right]\mu_i(1-\mu_i)x_{ij},$
$\dfrac{\partial l}{\partial \theta} = \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\dfrac{r}{\mu_i + r\theta} + \sum_{r=0}^{k_i-y_i-1}\dfrac{r}{1-\mu_i + r\theta} - \sum_{r=0}^{k_i-1}\dfrac{r}{1 + r\theta}\right],$
$\dfrac{\partial^2 l}{\partial \beta_j\,\partial \beta_k} = -\sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\dfrac{1}{(\mu_i + r\theta)^2} + \sum_{r=0}^{k_i-y_i-1}\dfrac{1}{(1-\mu_i + r\theta)^2}\right]\mu_i^2(1-\mu_i)^2x_{ij}x_{ik} + \sum_{i=1}^{n}\left[\sum_{r=0}^{y_i-1}\dfrac{1}{\mu_i + r\theta} - \sum_{r=0}^{k_i-y_i-1}\dfrac{1}{1-\mu_i + r\theta}\right]\mu_i(1-\mu_i)(1-2\mu_i)x_{ij}x_{ik},$
$\dfrac{\partial^2 l}{\partial \beta_j\,\partial \theta} = \sum_{i=1}^{n}\left[-\sum_{r=0}^{y_i-1}\dfrac{r}{(\mu_i + r\theta)^2} + \sum_{r=0}^{k_i-y_i-1}\dfrac{r}{(1-\mu_i + r\theta)^2}\right]\mu_i(1-\mu_i)x_{ij}$, and
$\dfrac{\partial^2 l}{\partial \theta^2} = \sum_{i=1}^{n}\left[-\sum_{r=0}^{y_i-1}\dfrac{r^2}{(\mu_i + r\theta)^2} - \sum_{r=0}^{k_i-y_i-1}\dfrac{r^2}{(1-\mu_i + r\theta)^2} + \sum_{r=0}^{k_i-1}\dfrac{r^2}{(1 + r\theta)^2}\right].$
The denominator term of the score test in Section 4.3.1 is
$V_j = x_j'\left(W - \dfrac{1}{a}UU'\right)x_j - x_j'\left(W - \dfrac{1}{a}UU'\right)X_jV_1^{-1}X_j'\left(W - \dfrac{1}{a}UU'\right)x_j,$
where $W = \mathrm{diag}(w_1, \ldots, w_n)$, $U = (u_1, \ldots, u_n)'$, $w_i = (p_{1i} + p_{2i})\,\mu_i^2(1-\mu_i)^2$, $u_i = \dfrac{1}{\theta}\left[\mu_i p_{1i} - (1-\mu_i)p_{2i}\right]\mu_i(1-\mu_i)$, $V_1 = X_j'\left(W - \dfrac{1}{a}UU'\right)X_j$, $V_2 = a - U'X_j\left(X_j'WX_j\right)^{-1}X_j'U$, $a = \dfrac{1}{\theta^2}\sum_{i=1}^{n}\left[\mu_i^2 p_{1i} + (1-\mu_i)^2 p_{2i} - p_{3i}\right]$, $p_{1i} = \sum_{r=1}^{k_i}\dfrac{\Pr(y_i \ge r)}{[\mu_i + (r-1)\theta]^2}$, $p_{2i} = \sum_{r=1}^{k_i}\dfrac{\Pr(y_i \le k_i - r)}{[1-\mu_i + (r-1)\theta]^2}$, and $p_{3i} = \sum_{r=1}^{k_i}\dfrac{1}{(1 + r\theta)^2}$.

Appendix D. Dataset

Table A1. Example 1 dataset: the lower respiratory illness count. Columns: LRI count (y), Risk (x1), Passive (x2), Crowding (x3), Race (x4), Socioeconomic status low (x5), Socioeconomic status medium (x6), Age < 4 (x7), Age 4–6 (x8).
y x1 x2 x3 x4 x5 x6 x7 x8
0 45 0 0 0 1 0 0 1
0 34 1 1 1 0 1 0 1
0 38 0 0 0 1 0 0 1
0 44 0 0 0 1 0 0 1
4 30 1 1 0 0 1 0 1
0 42 0 0 0 1 0 0 1
0 11 0 0 1 0 1 1 0
0 38 0 0 0 0 1 0 1
0 40 1 0 0 0 1 0 1
0 37 0 0 0 1 0 0 1
0 42 1 1 1 0 1 0 1
5 35 1 1 1 0 0 0 1
0 40 0 0 0 0 0 0 1
0 38 0 0 0 1 0 0 1
0 41 0 0 0 1 0 0 1
1 27 1 0 0 0 1 0 1
1 31 1 0 0 0 1 0 1
0 41 0 0 0 1 0 0 1
0 39 0 1 0 0 1 0 1
2 23 1 1 1 0 1 0 1
1 43 1 0 0 1 0 0 1
0 36 1 1 0 0 1 0 1
0 7 0 1 1 1 0 0 0
1 41 1 1 1 0 1 0 1
3 37 1 1 1 0 0 0 1
1 30 0 1 1 1 0 0 1
4 38 1 0 1 0 0 0 1
0 31 0 0 0 0 1 0 1
4 39 1 1 0 1 0 0 1
0 29 1 1 0 0 0 0 1
0 40 1 0 0 1 0 0 1
1 35 1 1 0 0 1 0 1
0 38 1 1 1 1 0 0 1
0 36 1 0 1 1 0 0 1
0 5 0 1 1 0 1 0 0
3 40 0 1 0 0 0 0 1
0 14 1 1 1 0 1 1 0
1 27 1 0 1 0 1 0 1
0 40 0 0 0 1 0 0 1
0 33 0 1 1 0 1 0 1
0 4 0 0 0 0 0 0 0
1 29 1 1 1 0 1 0 1
0 43 0 0 0 1 0 0 1
1 37 1 0 0 1 0 0 1
1 36 1 1 1 0 1 0 1
0 43 0 1 0 1 0 0 1
2 37 0 1 0 0 1 0 1
0 44 1 0 0 1 0 0 1
0 18 1 1 1 0 0 0 1
0 43 1 0 0 1 0 0 1
Table A2. Example 2 dataset: coronary heart disease. Columns: Count (y), sbp (x1), Tobacco (x2), ldl (x3), Adiposity (x4), Famhist (x5), Typea (x6), Obesity (x7), Alcohol (x8), Age (x9).
y x1 x2 x3 x4 x5 x6 x7 x8 x9
0 118 1.62 9.01 21.70 Absent 59 25.89 21.19 40
0 162 2.92 3.63 31.33 Absent 62 31.59 18.51 42
0 124 0.61 2.69 17.15 Present 61 22.76 11.55 20
1 134 1.10 3.54 20.41 Present 58 24.54 39.91 39
1 154 2.40 5.63 42.17 Present 59 35.07 12.86 50
0 136 1.36 3.16 14.97 Present 56 24.98 7.30 24
1 130 0.08 5.59 25.42 Present 50 24.98 6.27 43
0 128 0.73 3.97 23.52 Absent 54 23.81 19.20 64
0 112 1.44 2.71 22.92 Absent 59 24.81 0.00 52
0 132 0.10 3.28 10.73 Absent 73 20.42 0.00 17
0 120 0.00 2.42 16.66 Absent 46 20.16 0.00 17
0 128 0.40 6.17 26.35 Absent 64 27.86 11.11 34
0 124 1.80 3.74 16.64 Present 42 22.26 10.49 20
0 158 13.50 5.04 30.79 Absent 54 24.79 21.50 62
0 128 0.00 3.22 26.55 Present 39 26.59 16.71 49
1 148 8.20 7.75 34.46 Present 46 26.53 6.04 64
1 174 3.50 5.26 21.97 Present 36 22.04 8.33 59
0 152 10.10 4.71 24.65 Present 65 26.21 24.53 57
1 122 4.18 9.05 29.27 Present 44 24.05 19.34 52
1 110 2.35 3.36 26.72 Present 54 26.08 109.80 58
0 123 0.05 4.61 13.69 Absent 51 23.23 2.78 16
1 134 8.08 1.55 17.50 Present 56 22.65 66.65 31
1 132 12.30 5.96 32.79 Present 57 30.12 21.50 62
1 168 9.00 8.53 24.48 Present 69 26.18 4.63 54
0 194 2.55 6.89 33.88 Present 69 29.33 0.00 41
0 110 4.64 4.55 30.46 Absent 48 30.90 15.22 46
0 130 4.00 2.40 17.42 Absent 60 22.05 0.00 40
0 124 0.00 3.04 17.33 Absent 49 22.04 0.00 18
1 176 6.00 3.98 17.20 Present 52 21.07 4.11 61
0 130 4.50 5.86 37.43 Absent 61 31.21 32.30 58
0 114 0.00 2.99 9.74 Absent 54 46.58 0.00 17
0 176 5.76 4.89 26.10 Present 46 27.30 19.44 57
0 124 4.00 6.65 30.84 Present 54 28.40 33.51 60
0 142 7.44 5.52 33.97 Absent 47 29.29 24.27 54
0 148 0.00 5.32 26.71 Present 52 32.21 32.78 27
0 114 0.00 3.83 19.40 Present 49 24.86 2.49 29
0 140 0.00 2.40 27.89 Present 70 30.74 144.00 29
1 124 1.60 7.22 39.68 Present 36 31.50 0.00 51
1 164 5.60 3.17 30.98 Present 44 25.99 43.20 53
0 162 5.60 4.24 22.53 Absent 29 22.91 5.66 60
0 152 12.18 4.04 37.83 Present 63 34.57 4.17 64
0 132 0.00 3.30 21.61 Absent 42 24.92 32.61 33
1 144 0.76 10.53 35.66 Absent 63 34.35 0.00 55
0 118 0.08 3.48 32.28 Present 52 29.14 3.81 46
1 134 8.80 7.41 26.84 Absent 35 29.44 29.52 60
1 128 0.00 8.41 28.82 Present 60 26.86 0.00 59
0 154 5.53 3.20 28.81 Present 61 26.15 42.79 42
0 138 0.00 3.96 24.70 Present 53 23.80 0.00 45
0 120 0.00 3.98 13.19 Present 47 21.89 0.00 16
0 154 4.20 5.59 25.02 Absent 58 25.02 1.54 43

References

1. Kadane, J.; Lazar, N. Methods and Criteria for Model Selection. J. Am. Stat. Assoc. 2004, 99, 279–290.
2. Beale, E.M.L. Note on Procedures for Variable Selection in Multiple Regression. Technometrics 1970, 12, 909–914.
3. Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill: New York, NY, USA, 2013.
4. Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
5. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
6. Fan, J. Design-adaptive Nonparametric Regression. J. Am. Stat. Assoc. 1992, 87, 998–1004.
7. Eilers, P.H.C.; Marx, B.D. Flexible Smoothing with B-splines and Penalties. Statist. Sci. 1996, 11, 89–121.
8. Bock, M.E.; Pliego, G. Estimating Functions with Wavelets Part II: Using a Daubechies Wavelet in Nonparametric Regression. Stat. Comput. Stat. Graph. Newsl. 1992, 3, 27–34.
9. Tay, J.K.; Narasimhan, B.; Hastie, T. Elastic Net Regularization Paths for All Generalized Linear Models. J. Stat. Softw. 2023, 106, 1–31.
10. Xia, L.; Nan, B.; Li, Y. Debiased Lasso for Generalized Linear Models with a Diverging Number of Covariates. Biometrics 2023, 79, 344–357.
11. Pavone, F.; Piironen, J.; Bürkner, P.C.; Vehtari, A. Using Reference Models in Variable Selection. Comput. Stat. 2023, 38, 349–371.
12. Mazumder, R.; Radchenko, P.; Dedieu, A. Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is Low. Oper. Res. 2023, 71, 129–147.
13. Cai, T.T.; Guo, Z.; Ma, R. Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes. J. Am. Stat. Assoc. 2023, 118, 1319–1332.
14. Han, Y.; Tsay, R.S.; Wu, W.B. High Dimensional Generalized Linear Models for Temporal Dependent Data. Bernoulli 2023, 29, 105–131.
15. Li, S.; Zhang, L.; Cai, T.T.; Li, H. Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer. J. Am. Stat. Assoc. 2023.
16. Xu, S.; Ferreira, M.A.R.; Porter, E.M.P.; Franck, C.T. Bayesian Model Selection for Generalized Linear Mixed Models. Biometrics 2023, 1–13.
17. Arnastauskaitė, J.; Ruzgas, T.; Bražėnas, M. A New Goodness of Fit Test for Multivariate Normality and Comparative Simulation Study. Mathematics 2021, 9, 3003.
18. Di Noia, A.; Barabesi, L.; Marcheselli, M.; Pisani, C.; Pratelli, L. Goodness-of-fit Test for Count Distributions with Finite Second Moment. J. Nonparametric Stat. 2022, 35, 19–37.
19. Deng, D.; Paul, S.R. Score Tests for Zero-inflation in Generalized Linear Models. Can. J. Stat. 2000, 27, 563–570.
20. Deng, D.; Paul, S.R. Score Tests for Zero-inflation and Over-dispersion in Generalized Linear Models. Stat. Sin. 2005, 15, 257–276.
21. Deng, D.; Paul, S.R. Goodness of Fit of Product Multinomial Regression Models to Sparse Data. Sankhya B 2016, 78, 78–95.
22. Erlemann, R.; Lindqvist, B.H. Conditional Goodness-of-fit Tests for Discrete Distributions. J. Stat. Theory Pract. 2022, 16, 8.
23. Ozonur, D.; Paul, S. Goodness of Fit Tests of the Two-Parameter Gamma Distribution against the Three-Parameter Generalized Gamma Distribution. Commun. Stat.-Simul. Comput. 2022, 51, 687–697.
24. Paul, S.R.; Deng, D. Assessing Goodness of Fit of Generalized Linear Models to Sparse Data using Higher Order Moment Corrections. Sankhya B 2012, 74, 195–210.
25. Rao, C.R. Large Sample Tests of Statistical Hypotheses Concerning Several Parameters with Applications to Problems of Estimation. Proc. Camb. Philos. Soc. 1948, 44, 50–57.
26. Neyman, J. Optimal Asymptotic Tests of Composite Statistical Hypotheses. In Probability and Statistics: The Harald Cramér Volume; Grenander, U., Ed.; Wiley: New York, NY, USA, 1959.
27. Rao, C.R. Score Test: Historical Review and Recent Developments. In Advances in Ranking and Selection, Multiple Comparisons, and Reliability—Methodology and Applications; Balakrishnan, N., Kannan, N., Nagaraja, H.N., Eds.; Statistics for Industry and Technology; Springer: Berlin/Heidelberg, Germany, 2005; pp. 3–20.
28. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. A 1972, 135, 370–384.
29. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
30. Pregibon, D. Score Tests in GLIM with Applications. Lect. Notes Stat. 1982, 14, 87–97.
31. Williams, D.A. The Analysis of Binary Responses from Toxicological Experiments Involving Reproduction and Teratogenicity. Biometrics 1975, 31, 949–952.
32. Paul, S.R. Analysis of Proportions of Affected Foetuses in Teratological Experiments. Biometrics 1982, 38, 361–370.
33. Anscombe, F.J. The Statistical Analysis of Insect Counts Based on the Negative Binomial Distribution. Biometrics 1949, 5, 165–173.
34. Bliss, C.I.; Fisher, R.A. Fitting the Negative Binomial Distribution to Biological Data. Biometrics 1953, 9, 176–200.
35. Böhning, D.; Dietz, E.; Schlattmann, P.; Mendonça, L.; Kirchner, U. The Zero-Inflated Poisson Model and the Decayed, Missing and Filled Teeth Index in Dental Epidemiology. J. R. Stat. Soc. Ser. A 1999, 162, 195–209.
36. Margolin, B.H.; Kaplan, N.; Zeiger, E. Statistical Analysis of the Ames Salmonella/microsome Test. Proc. Natl. Acad. Sci. USA 1981, 78, 3779–3783.
37. McCaughran, D.A.; Arnold, D.W. Statistical Models for Numbers of Implantation Sites and Embryonic Deaths in Mice. Toxicol. Appl. Pharmacol. 1976, 38, 325–333.
38. Breslow, N.E. Extra-Poisson Variation in Log-linear Models. Appl. Stat. 1984, 33, 38–44.
39. Engel, J. Models for Response Data Showing Extra-Poisson Variation. Stat. Neerl. 1984, 38, 159–167.
40. Lawless, J.F. Negative Binomial and Mixed Poisson Regression. Can. J. Stat. 1987, 15, 209–225.
41. Margolin, B.H.; Kim, B.S.; Risko, K.J. The Ames Salmonella/microsome Mutagenicity Assay: Issues of Inference and Validation. J. Am. Stat. Assoc. 1989, 84, 651–661.
42. Piegorsch, W.W. Maximum Likelihood Estimation for the Negative Binomial Dispersion Parameter. Biometrics 1990, 46, 863–867.
43. LaVange, L.M.; Keyes, L.L.; Koch, G.G.; Margolis, P.E. Application of Sample Survey Methods for Modelling Ratios to Incidence Densities. Stat. Med. 1994, 13, 343–355.
44. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 1998.
45. Rousseauw, J.; du Plessis, J.; Benade, A.; Jordaan, P.; Kotze, J.; Ferreira, J. Coronary Risk Factor Screening in Three Rural Communities. S. Afr. Med. J. 1983, 64, 430–436.
Table 1. Empirical level (EL) and power (in %) of the four test statistics, based on 10,000 replications and α = 0.05.

Test     β2 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

Normal, n = 10:
Score    7.80    7.85    7.90    8.17    8.66    9.06    9.76   10.54   11.19   12.09   13.34
Wald     9.29    9.37    9.64    9.87   10.31   10.87   11.62   12.50   12.99   14.36   15.45
LRT     11.57   11.63   11.89   12.19   12.92   13.26   14.38   14.98   15.49   16.87   18.61
F        5.15    5.28    5.45    5.55    5.70    6.12    6.66    7.46    7.86    8.53    9.64

Normal, n = 20:
Score    6.57    6.58    6.80    7.47    8.01    9.14   10.38   13.23   14.31   16.39   19.35
Wald     7.17    7.11    7.45    8.11    8.68    9.96   11.19   14.36   15.34   17.58   20.46
LRT      7.92    7.99    8.30    8.94    9.57   11.08   12.22   15.56   16.81   19.03   22.14
F        5.32    5.52    5.64    6.26    6.82    7.82    8.82   11.25   12.38   14.31   17.23

Normal, n = 30:
Score    5.96    6.01    6.45    7.73    8.67   10.77   12.85   16.16   19.26   22.98   25.66
Wald     6.42    6.35    6.75    8.15    9.17   11.43   13.38   16.78   20.08   23.76   26.58
LRT      6.83    6.78    7.42    8.71    9.84   12.17   14.18   17.78   21.15   24.95   27.76
F        5.33    5.45    5.89    6.91    7.83    9.67   11.61   15.12   17.94   21.10   23.72

Normal, n = 50:
Score    5.61    6.14    6.60    8.20   11.25   14.64   17.61   22.10   28.57   34.03   40.51
Wald     5.77    6.42    6.79    8.43   11.62   15.18   18.05   22.56   29.11   34.70   41.09
LRT      6.09    6.75    7.09    8.80   12.05   15.71   18.48   23.25   29.88   35.45   41.96
F        5.28    5.60    6.19    7.77   10.46   13.84   16.70   21.12   27.56   32.80   39.32

Poisson, n = 10:
Score    5.09    5.96    7.80   11.02   17.16   22.56   30.25   37.84   47.75   54.68   62.69
Wald     4.59    5.37    7.22   10.22   16.24   21.53   29.00   36.56   46.37   53.54   61.65
LRT      5.42    6.33    8.04   11.39   16.41   23.16   30.59   38.42   48.64   55.56   63.54
F        0.09    0.14    0.22    0.28    0.32    0.40    0.53    0.73    1.05    1.13    1.41

Poisson, n = 20:
Score    4.74    6.00   11.56   19.42   31.01   44.85   58.27   70.13   79.45   87.37   91.98
Wald     4.61    5.84   11.41   19.11   30.64   44.53   57.90   69.80   79.14   87.11   91.86
LRT      4.80    6.14   11.66   19.58   31.19   45.12   58.73   70.39   79.58   87.53   92.25
F        0.02    0.00    0.01    0.04    0.04    0.14    0.22    0.57    0.97    1.24    1.98

Poisson, n = 30:
Score    4.83    8.50   15.38   27.95   45.02   62.09   76.17   86.93   93.34   96.69   98.59
Wald     4.79    8.44   15.30   27.70   44.89   61.93   76.06   86.85   93.28   96.65   98.55
LRT      4.82    8.59   15.37   28.00   45.24   62.10   76.31   87.08   93.45   96.74   98.58
F        0.01    0.01    0.01    0.01    0.12    0.23    0.31    0.80    1.66    2.20    3.70

Poisson, n = 50:
Score    4.85    9.42   22.10   44.88   66.33   83.94   93.96   97.85   99.51   99.89   99.98
Wald     4.84    9.43   22.04   44.78   66.27   83.91   93.93   97.83   99.50   99.89   99.98
LRT      4.81    9.45   22.12   44.94   66.40   84.08   93.98   97.87   99.51   99.88   99.98
F        0.00    0.00    0.01    0.02    0.11    0.30    0.96    2.45    5.05    9.43   13.82
Table 2. Empirical level and power (in %) of the four test statistics in binomial distribution; based on 10,000 replications and α = 0.05.

Test     β2 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

(m, n) = (10, 10):
Score    4.87    5.58    7.03    9.23   12.38   16.21   21.11   27.23   33.16   40.14   46.17
Wald     4.44    5.02    6.32    8.44   11.47   15.10   19.56   25.73   31.54   38.25   44.46
LRT      5.20    6.05    7.39    9.66   12.95   16.83   21.88   28.03   33.95   40.80   46.80
F        0.55    0.80    0.96    1.01    1.47    2.16    2.80    3.75    5.08    6.72    8.14

(m, n) = (30, 10):
Score    4.94    6.77   10.71   17.53   26.81   37.90   50.07   60.89   70.26   77.76   83.17
Wald     4.81    6.58   10.49   17.19   26.38   37.43   49.57   60.50   69.78   77.40   82.99
LRT      5.05    6.88   10.88   17.76   26.96   38.15   50.34   61.11   70.45   78.02   83.37
F        0.00    0.00    0.00    0.00    0.00    0.00    0.05    0.04    0.13    0.07    0.01

(m, n) = (40, 10):
Score    4.96    6.79   12.59   21.55   33.67   47.59   60.52   71.35   79.71   85.84   89.87
Wald     4.83    6.57   12.38   21.26   33.33   47.18   60.11   71.01   79.42   85.69   89.78
LRT      5.03    6.89   12.68   21.69   33.86   47.66   60.71   71.50   79.89   85.88   89.99
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.01    0.02    0.01

(m, n) = (10, 20):
Score    5.37    6.12    9.71   14.26   21.50   30.95   41.20   51.76   61.75   70.84   78.90
Wald     5.04    5.89    9.27   13.74   20.94   30.15   40.49   51.07   60.94   70.00   78.30
LRT      5.47    6.27    9.90   14.58   21.78   31.22   41.64   52.40   62.17   71.10   79.27
F        0.19    0.23    0.45    0.79    1.55    2.85    4.70    7.35   11.37   16.63   22.80

(m, n) = (30, 20):
Score    5.30    8.02   17.95   33.59   51.18   68.57   82.74   90.62   95.47   97.97   98.99
Wald     5.20    7.94   17.81   33.32   50.86   68.33   82.59   90.45   95.42   97.96   98.98
LRT      5.33    8.12   18.10   33.73   51.35   68.67   82.87   90.75   95.49   97.97   99.00
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.06    0.12

(m, n) = (40, 20):
Score    5.37    9.14   22.10   42.02   62.80   79.42   90.62   95.66   98.24   99.40   99.71
Wald     5.31    9.10   21.96   41.84   62.70   79.31   90.54   95.64   98.24   99.39   99.71
LRT      5.37    9.23   22.16   42.16   62.92   79.48   90.68   95.69   98.26   99.41   99.71
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.00

(m, n) = (10, 30):
Score    5.35    7.04   11.01   19.80   31.08   44.40   58.13   69.15   79.91   88.09   92.52
Wald     5.23    6.82   10.74   19.33   30.47   43.89   57.64   68.81   79.55   87.88   92.31
LRT      5.40    7.18   11.18   19.98   31.33   44.61   58.45   69.51   80.09   88.27   92.59
F        0.08    0.19    0.36    1.05    2.26    4.53    8.67   13.76   22.16   32.39   42.40

(m, n) = (30, 30):
Score    5.21   10.21   24.62   47.11   70.88   86.07   94.95   98.05   99.37   99.85   99.94
Wald     5.17   10.12   24.51   46.89   70.73   85.97   94.84   98.04   99.37   99.85   99.94
LRT      5.26   10.19   24.67   47.23   70.95   86.17   94.98   98.07   99.39   99.85   99.94
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.04    0.02

(m, n) = (40, 30):
Score    5.29   11.95   31.16   57.93   81.60   93.48   98.17   99.49   99.88   99.98   99.98
Wald     5.22   11.90   30.99   57.81   81.56   93.45   98.16   99.49   99.88   99.98   99.98
LRT      5.30   11.97   31.24   57.94   81.68   93.52   98.18   99.49   99.88   99.98   99.98
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.01

(m, n) = (10, 50):
Score    5.24    7.77   16.28   30.41   46.97   65.41   79.71   89.45   95.40   98.27   99.32
Wald     5.12    7.66   16.15   30.08   46.73   65.09   79.53   89.30   95.30   98.24   99.31
LRT      5.26    7.79   16.43   30.59   47.16   65.53   79.80   89.54   95.48   98.28   99.33
F        0.03    0.05    0.52    1.89    4.12   10.02   19.47   33.85   49.03   64.54   78.18

(m, n) = (30, 50):
Score    4.91   12.89   37.83   69.89   89.30   97.70   99.88   99.98   100     100     100
Wald     4.89   12.88   37.76   69.77   89.22   97.68   99.88   99.98   100     100     100
LRT      4.96   12.95   37.93   69.89   89.31   97.70   99.88   99.98   100     100     100
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.10    0.53    2.02

(m, n) = (40, 50):
Score    4.92   15.75   47.95   81.18   95.42   99.37   99.96   99.99   100     100     100
Wald     4.92   15.72   47.87   81.16   95.42   99.37   99.96   99.99   100     100     100
LRT      4.97   15.78   47.91   81.20   95.44   99.37   99.96   99.99   100     100     100
F        0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.00    0.06
Table 3. Empirical level and power (in %) of model selection by forward selection using the score test (Forward-S), forward selection using the F test (Forward-F), AIC, and BIC; based on 10,000 replications.

Method      β1 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

Normal, n = 10:
Forward-F    5.09    5.48    5.34    5.82    6.10    6.11    6.77    7.86    7.95    9.11    9.56
AIC         30.94   31.66   32.31   32.47   33.05   33.12   34.20   35.41   36.08   37.18   40.22
BIC         27.17   27.83   28.32   28.56   28.86   29.19   30.09   31.34   32.35   33.20   36.25

Normal, n = 20:
Forward-F    4.96    5.58    5.67    6.22    7.32    7.93    9.61   10.84   12.85   15.20   16.50
AIC         21.40   21.53   22.72   23.69   25.72   27.66   29.55   31.41   35.01   39.22   41.26
BIC         12.32   12.53   13.19   13.77   15.72   17.21   18.82   21.18   23.85   27.16   29.55

Normal, n = 30:
Forward-F    4.68    5.25    5.76    6.79    8.10   10.39   11.57   14.40   17.55   21.24   23.58
AIC         18.87   20.39   20.30   22.35   25.01   28.12   30.94   35.33   39.38   44.07   48.34
BIC          8.23    8.94    9.59   10.83   12.68   15.27   17.33   20.97   24.54   28.57   31.46

Normal, n = 50:
Forward-F    5.18    5.41    6.24    8.36   10.35   13.80   17.54   21.67   26.71   32.20   38.93
AIC         18.28   18.94   19.99   23.94   27.39   33.15   38.08   43.62   49.93   57.08   63.04
BIC          5.87    6.19    6.98    9.34   11.46   14.98   18.97   23.36   28.91   34.45   41.03

Poisson, n = 10:
Forward-S    6.93    8.53    9.84   12.27   16.58   22.27   27.73   32.98   41.26   47.11   53.98
AIC         19.08   21.77   23.40   26.61   32.98   39.36   46.25   51.88   59.80   64.21   70.92
BIC         16.28   18.64   20.73   23.63   29.57   35.66   42.66   48.44   56.68   61.13   68.34

Poisson, n = 20:
Forward-S    7.01    8.33   12.20   19.26   28.33   38.91   49.33   62.48   71.04   79.04   86.55
AIC         18.06   19.59   26.85   36.34   48.18   59.23   70.53   80.46   86.63   91.11   94.96
BIC         10.87   12.16   17.63   25.56   36.63   47.58   59.69   71.45   79.01   85.75   91.56

Poisson, n = 30:
Forward-S    6.52    8.53   15.21   25.88   40.22   54.97   69.85   80.33   89.06   93.48   96.82
AIC         17.13   20.23   31.01   44.98   62.01   75.49   85.95   92.23   96.60   98.16   99.20
BIC          8.08   10.41   18.21   30.06   45.41   60.83   74.85   84.68   92.06   95.46   97.89

Poisson, n = 50:
Forward-S    5.75    9.46   21.49   41.32   62.66   79.40   90.61   96.69   98.83   99.62   99.87
AIC         15.71   22.87   40.92   63.18   80.95   91.90   97.02   99.16   99.80   99.9    99.98
BIC          5.47    9.23   21.42   41.32   62.73   79.56   91.02   96.79   99.02   99.64   99.90

Binomial, n = 10:
Forward-S    6.93    9.14   14.03   20.82   30.86   41.57   53.03   64.12   71.59   78.82   84.68
AIC         18.54   21.34   28.43   38.00   49.14   60.52   70.59   78.97   84.12   88.78   92.23
BIC         16.03   18.54   25.18   34.54   45.64   57.23   67.65   76.69   82.38   87.15   91.19

Binomial, n = 20:
Forward-S    6.73    9.99   20.98   37.45   56.55   73.18   84.48   92.07   95.86   98.36   99.06
AIC         17.38   22.81   39.18   59.48   75.77   88.02   94.06   97.41   98.81   99.70   99.86
BIC         10.43   14.57   28.20   47.29   66.26   80.88   89.85   95.08   97.84   99.22   99.64

Binomial, n = 30:
Forward-S    6.20   12.21   29.28   53.98   75.26   90.12   96.24   98.97   99.77   99.90   99.97
AIC         16.88   26.33   49.26   74.42   89.40   96.81   99.14   99.80   99.95   100     100
BIC          7.71   14.84   33.73   59.68   80.04   92.56   97.47   99.34   99.83   99.97   99.99

Binomial, n = 50:
Forward-S    5.47   16.04   46.67   77.80   94.77   99.22   99.82   100     100     100     100
AIC         16.47   33.71   68.36   90.77   98.64   99.87   99.99   100     100     100     100
BIC          5.46   15.90   46.31   77.76   94.90   99.30   99.84   100     100     100     100
Table 4. Empirical level and power (in %) of model selection by the backward elimination using the score test (Backward-S), AIC, and BIC; based on 10,000 replications.

Method       β1 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

Poisson, n = 10:
Backward-S    6.59    7.25    8.25    9.90   12.15   16.09   19.75   22.56   26.59   29.76   33.67
AIC          35.48   37.7    39.87   43.03   48.86   54.29   61.07   65.17   70.79   74.85   79.51
BIC          34.35   36.51   38.77   42.07   47.77   53.03   59.91   63.91   69.77   73.80   78.79

Poisson, n = 20:
Backward-S    6.66    7.15    9.79   13.71   18.82   23.73   29.53   35.50   40.57   44.49   48.54
AIC          23.46   25.19   32.43   42.1    53.71   63.91   75.02   83.75   89.28   93.09   96.12
BIC          18.87   20.04   26.49   35.34   46.71   56.64   68.2    78.5    84.57   90.13   94.28

Poisson, n = 30:
Backward-S    6.99    7.08   11.56   16.74   23.67   30.16   37.95   42.89   48.23   51.69   54.39
AIC          18.93   22.24   32.84   46.96   63.86   77.16   87.24   93.00   97.03   98.46   99.36
BIC          11.29   13.78   22.39   34.51   50.31   65.56   78.82   87.31   93.67   96.67   98.6

Poisson, n = 50:
Backward-S    6.06    8.20   12.94   21.34   30.77   39.20   45.37   51.57   55.14   55.27   57.57
AIC          15.89   23.24   41.25   63.51   81.14   92.08   97.13   99.18   99.8    99.93   99.98
BIC           5.95   10.13   22.58   42.86   64.18   80.7    91.72   97.13   99.15   99.73   99.93

Binomial, n = 10:
Backward-S    6.74    8.12   11.36   16.68   22.07   28.85   36.36   42.39   47.11   51.64   55.65
AIC          33.33   36.93   43.45   52.19   62.19   71.31   79.35   85.67   89.08   92.54   94.67
BIC          32.25   35.80   42.18   50.71   61.03   70.03   78.41   84.8    88.44   91.94   94.23

Binomial, n = 20:
Backward-S    6.48    8.15   15.13   23.96   34.61   44.79   53.02   59.69   65.46   69.08   72.63
AIC          21.8    27.19   44.05   63.96   79.23   90.16   95.44   97.96   99.11   99.79   99.90
BIC          16.83   21.59   36.70   55.78   72.89   86.00   93.01   96.57   98.60   99.54   99.76

Binomial, n = 30:
Backward-S    6.20    9.55   18.07   30.22   43.49   53.57   61.60   67.77   72.89   76.92   79.85
AIC          18.30   27.81   50.87   75.69   90.16   97.16   99.25   99.83   99.96   100     100
BIC          10.48   17.95   37.48   63.68   83.03   94.09   98.04   99.54   99.87   99.98   99.99

Binomial, n = 50:
Backward-S    5.13   10.18   23.20   38.71   51.39   61.85   68.37   75.06   79.83   84.1    87.91
AIC          16.65   34.05   68.59   90.88   98.67   99.87   99.99   100     100     100     100
BIC           5.75   16.59   47.32   78.74   95.36   99.36   99.86   100     100     100     100
Table 5. Empirical level and power (in %) of the three test statistics in negative binomial distribution; based on 10,000 replications and α = 0.05.

Test     β2 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

n = 10:
Score    4.58    5.53    9.24   15.84   23.66   34.23   45.39   55.89   64.35   71.95   79.30
Wald     6.74    7.87   13.19   20.99   31.70   44.16   55.27   66.22   74.06   80.57   86.99
LRT      6.07    7.04   11.84   19.66   29.60   41.73   52.85   63.78   72.03   78.74   85.78

n = 20:
Score    4.85    8.07   17.01   32.94   51.67   68.32   81.37   90.17   95.25   97.66   98.83
Wald     6.43   10.09   20.59   38.04   56.92   73.49   85.09   92.65   96.72   98.41   99.26
LRT      5.82    9.30   19.09   36.14   55.04   71.87   84.03   91.87   96.33   98.2    99.12

n = 30:
Score    4.70   10.18   25.29   47.87   69.99   87.17   94.84   98.37   99.46   99.82   99.93
Wald     5.80   11.98   28.29   52.41   73.44   89.53   95.97   98.81   99.62   99.89   99.95
LRT      5.35   11.18   26.89   50.50   72.06   88.64   95.46   98.65   99.56   99.87   99.95

n = 50:
Score    4.96   12.95   40.10   71.07   91.21   97.96   99.62   99.93   100     100     100
Wald     5.78   14.40   42.59   73.56   92.28   98.20   99.71   99.94   100     100     100
LRT      5.37   13.73   41.57   72.42   91.80   98.12   99.68   99.94   100     100     100
Table 6. Empirical level and power (in %) of model selection by forward selection using the score test, AIC, and BIC in the negative binomial distribution; based on 10,000 replications.

Method    β1 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

n = 10:
Forward    3.80    4.05    5.04    7.06    9.57   13.23   16.84   22.06   27.63   32.55   38.05
AIC       23.55   24.18   30.50   37.17   45.99   56.40   64.41   72.19   78.25   83.23   87.64
BIC       20.53   21.22   27.51   33.70   42.32   52.79   60.92   69.25   75.76   80.95   85.77

n = 20:
Forward    4.81    5.49    8.90   13.82   21.25   29.95   40.27   51.03   61.39   70.03   79.00
AIC       19.08   24.46   38.04   54.20   71.16   83.52   90.73   95.47   97.63   99.03   99.41
BIC       11.30   15.37   26.39   41.39   59.74   74.27   84.46   91.46   95.12   97.66   98.73

n = 30:
Forward    4.86    6.68   11.77   20.47   33.26   46.77   61.55   72.86   83.32   89.71   93.74
AIC       18.26   26.68   46.99   69.74   85.26   94.46   98.03   99.46   99.77   99.95   100
BIC        8.59   14.62   30.52   53.73   74.19   87.76   94.80   98.08   99.20   99.76   99.99

n = 50:
Forward    4.85    8.15   18.64   34.59   54.58   73.04   86.24   94.13   97.85   99.00   99.78
AIC       18.26   26.68   46.40   26.61   32.98   39.36   46.25   51.88   59.80   64.21   70.92
BIC        8.59   14.62   20.73   23.63   29.57   35.66   42.66   48.44   56.68   61.13   68.34
Table 7. Empirical level and power (in %) of the three test statistics; based on 10,000 replications and α = 0.05.

Test     β2 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

NB to Pois., n = 10:
Score    7.92    9.26   15.19   23.46   34.74   47.84   58.98   69.68   77.17   83.44   89.08
Wald     7.73    9.11   14.90   23.17   34.33   47.53   58.61   69.26   76.85   83.16   88.84
LRT      7.98    9.31   15.23   23.78   34.91   48.34   59.44   69.99   77.33   83.62   89.32

NB to Pois., n = 20:
Score    8.33   12.17   24.19   42.71   62.02   77.80   88.14   94.48   97.58   98.83   99.53
Wald     8.28   12.11   24.10   42.58   61.90   77.64   88.11   94.42   97.56   98.83   99.53
LRT      8.28   12.20   24.21   42.76   62.01   77.91   88.31   94.49   97.63   98.86   99.53

NB to Pois., n = 30:
Score    7.69   14.74   33.00   57.59   77.87   91.86   97.23   99.15   99.76   99.96   99.96
Wald     7.67   14.73   32.94   57.50   77.80   91.84   97.22   99.15   99.76   99.96   99.96
LRT      7.62   14.78   32.96   57.74   78.05   91.86   97.24   99.18   99.78   99.96   99.96

NB to Pois., n = 50:
Score    7.92   18.15   49.02   78.27   94.37   98.83   99.85   99.95   100     100     100
Wald     7.91   18.15   49.00   78.28   94.36   98.83   99.85   99.95   100     100     100
LRT      7.92   18.13   49.08   78.34   94.36   98.83   99.85   99.95   100     100     100

Pois. to NB, n = 10:
Score    3.83    5.07    9.42   16.01   26.59   38.92   51.19   62.05   71.11   78.51   84.43
Wald     5.05    6.54   11.97   19.91   32.07   45.73   58.95   69.46   78.07   84.68   89.49
LRT      4.65    6.14   11.24   18.81   30.96   44.46   57.66   68.34   76.92   83.88   88.83

Pois. to NB, n = 20:
Score    3.65    6.71   18.19   35.68   57.81   75.53   86.97   93.93   97.18   98.80   99.51
Wald     4.43    7.68   20.59   39.13   61.52   78.51   88.95   95.19   97.86   99.21   99.65
LRT      4.09    7.36   19.67   38.01   60.26   77.71   88.50   94.85   97.64   99.11   99.63

Pois. to NB, n = 30:
Score    3.85    9.31   26.83   53.71   78.41   91.98   97.28   99.22   99.84   99.96   99.99
Wald     4.39   10.46   28.85   56.44   80.19   92.86   97.79   99.39   99.86   99.97   99.99
LRT      4.22   10.08   28.21   55.37   79.54   92.49   97.59   99.34   99.86   99.97   99.99

Pois. to NB, n = 50:
Score    4.17   14.03   44.50   78.06   94.87   99.23   99.93   99.97   99.99   100     100
Wald     4.55   14.75   46.05   79.27   95.24   99.34   99.94   99.97   99.99   100     100
LRT      4.43   14.39   45.41   78.72   95.09   99.30   99.94   99.97   99.99   100     100
Table 8. Empirical level and power (in %) of the three test statistics in the beta-binomial distribution; based on 10,000 replications and α = 0.05.

Test     β2 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

n = 10:
Score    8.07    8.85   11.69   13.26   15.86   20.09   22.95   27.64   32.08   36.80   40.88
Wald    13.93   15.11   17.76   19.56   23.39   27.78   32.58   37.63   42.07   48.50   53.38
LRT     10.47   11.79   14.42   16.20   19.25   23.86   27.99   33.90   38.49   44.55   49.44

n = 20:
Score    7.47    8.60   10.95   16.12   21.30   27.54   34.97   40.67   47.27   52.61   56.11
Wald     8.27    9.69   12.85   18.56   26.55   34.81   43.85   53.47   62.57   69.59   75.50
LRT      7.32    8.64   11.54   17.16   24.94   32.72   42.01   51.20   60.89   67.81   74.05

n = 30:
Score    6.96    8.34   12.24   18.16   26.18   35.61   44.60   51.94   43.45   48.16   52.51
Wald     7.00    8.90   13.82   21.43   32.45   45.17   57.16   68.25   70.52   79.41   84.83
LRT      6.38    8.26   12.93   20.42   31.25   43.81   55.75   67.02   69.66   78.48   84.08

n = 50:
Score    6.71    8.92   15.22   25.40   37.17   49.66   59.78   67.86   73.32   75.47   77.70
Wald     5.96    8.85   16.96   30.61   47.57   63.78   77.83   87.73   93.88   96.92   98.77
LRT      5.79    8.46   16.44   29.99   46.63   63.05   77.15   87.28   93.56   96.68   98.70
Table 9. Empirical level and power (in %) of model selection by forward selection using the Wald test, AIC, and BIC in beta-binomial distribution; based on 10,000 replications.

Method    β1 = 0.00 (EL)   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40    0.45    0.50

n = 30:
Wald     8.00    8.60   11.05   15.63   19.55   26.44   33.20   40.07   47.89   55.62   62.26
AIC     20.27   21.83   25.90   32.61   38.57   47.83   55.36   63.74   70.63   77.03   82.12
BIC      9.92   10.52   13.60   18.90   23.42   31.26   38.88   46.96   55.07   62.05   68.82

n = 50:
Wald     6.81    8.23   12.29   20.45   28.20   39.97   50.62   60.60   71.56   79.17   84.76
AIC     18.14   21.01   28.53   38.35   50.43   60.29   71.55   80.06   86.90   91.73   94.95
BIC      6.25    8.03   12.23   19.71   29.17   38.47   50.31   61.24   70.99   79.57   85.19

n = 70:
Wald     6.04    8.36   14.36   24.21   37.05   51.27   64.91   76.67   85.00   91.12   95.13
AIC     17.69   21.19   30.15   44.09   58.35   72.09   82.02   89.70   94.61   97.24   98.47
BIC      4.72    7.02   11.54   21.50   33.59   48.09   60.94   73.40   83.64   89.92   94.26

n = 100:
Wald     5.83    8.79   17.88   32.00   49.54   66.68   79.74   89.67   95.59   98.10   99.19
AIC     16.57   22.11   36.75   53.64   71.18   83.24   91.81   96.37   98.57   99.42   99.84
BIC      3.27    6.14   13.75   26.19   42.38   60.44   74.74   86.19   93.02   96.69   98.56
Table 10. Analysis of the lower respiratory infection count data: variables entering the model under the forward selection procedure using the score test, Wald test, LRT, AIC, and BIC, for the Poisson and negative binomial regression models.

Method   Poisson regression model          Negative binomial regression model
         Step 1   Step 2   Step 3   Step 4   Step 1   Step 2   Step 3
Score    x2       x3       –        –        x2       x3       –
Wald     x2       x3       –        –        x2       x3       –
LRT      x2       x3       x8       –        x2       x3       x8
AIC      x2       x3       x8       x6       x2       x3       x8
BIC      x2       x3       x8       –        x2       x3       x8
Table 11. Variables entering the model under the forward selection procedure using the score test, Wald test, LRT, AIC, and BIC, for the binomial regression model.

Method   Binomial regression model
         Step 1   Step 2   Step 3   Step 4   Step 5
Score    x3       x5       –        –        –
Wald     x3       x5       –        –        –
LRT      x9       x5       x3       –        –
AIC      x9       x5       x3       x8       x6
BIC      x9       x5       x3       –        –
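For readers who wish to repeat the kind of analysis summarized in Table 11, the coronary heart disease data listed above can be treated as a binomial (logistic) regression and passed through a forward pass such as the forward_select_aic sketch given earlier. The file name and column labels below are illustrative assumptions about how the data might be stored; they are not part of the paper.

```python
# Hypothetical usage sketch: AIC-based forward selection for the coronary
# heart disease data (binary response chd), reusing forward_select_aic above.
# The file name and column names are assumptions, not the authors' code.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("heart.csv")          # assumed: chd plus nine covariates per row
y = df["chd"].to_numpy()

covariates = {c: df[c].to_numpy(dtype=float)
              for c in ["sbp", "tobacco", "ldl", "adiposity",
                        "typea", "obesity", "alcohol", "age"]}
covariates["famhist"] = (df["famhist"] == "Present").to_numpy(dtype=float)

order = forward_select_aic(y, covariates, family=sm.families.Binomial())
print(order)                           # covariates in their order of entry
```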
