1. Introduction
The (reproductive) TBE class (known as the Tweedie class, cf. [1]) is composed of all exponential dispersion models (EDMs) with variance functions (VFs) of the form
$$\operatorname{Var}(Y) = \varphi V(m) = \varphi\, m^{p}, \qquad m \in M,$$
where $m$ is the mean, $M$ is the mean parameter space, $\varphi > 0$ is the dispersion parameter, and $p$ is the power parameter (cf. [2,3] and the references cited therein).
Let $F$ be an EDM belonging to the TBE class. Also, let $C$ and $M$ denote, respectively, the convex support and the mean parameter space of $F$. Among the TBE class, the subclasses containing all absolutely continuous (with respect to the Lebesgue measure) models comprise the following cases (cf. [2]):
When $p < 0$, $F$ is generated by an extreme stable distribution with a stable index in $(1,2)$, supported on $\mathbb{R}$, with $M = (0,\infty)$, i.e., a proper subset of $\operatorname{int} C = \mathbb{R}$ (the interior of $C$), for all $\varphi$.
When $p = 0$, $F$ is the normal EDM with $C = M = \mathbb{R}$.
When $p = 2$, $F$ is the gamma EDM with $M = (0,\infty)$.
When $p > 2$, $F$ is generated by a positive stable distribution with a stable index in $(0,1)$, supported on $(0,\infty)$, with $M = (0,\infty)$ for all $\varphi$.
When $p = \infty$, $F$ is the EDM generated by the Landau distribution, supported on $\mathbb{R}$ with $M = \mathbb{R}$. It is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ and is the limit of EDMs having power VFs (see [2,3] for further details).
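As a compact summary of the cases above, the following sketch (a hypothetical helper, not part of any package cited in this paper) maps a power parameter to the generating distribution, support, and steepness of the corresponding absolutely continuous TBE model; `math.inf` stands for the Landau limit:

```python
import math

def classify_tweedie(p):
    """Map a power parameter p to (generating distribution, support, steep?)
    for the absolutely continuous TBE cases; math.inf stands for the Landau
    limit. (Hypothetical helper, not part of any cited package.)"""
    if p < 0:
        return ("extreme stable", "R", False)      # M = (0, inf), int C = R
    if p == 0:
        return ("normal", "R", True)
    if p == 2:
        return ("gamma", "(0, inf)", True)
    if p == 3:
        return ("inverse Gaussian", "(0, inf)", True)
    if 2 < p < math.inf:
        return ("positive stable", "(0, inf)", True)
    if p == math.inf:
        return ("Landau", "R", True)
    # remaining p values correspond to discrete, mixed, or nonexistent EDMs
    raise ValueError("p is not in the absolutely continuous TBE range")
```

For instance, `classify_tweedie(-1.0)` reports a non-steep model on the real line, while `classify_tweedie(math.inf)` reports the steep Landau case.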
Two important aspects related to the above TBE models should be remarked at this point:
Complexity of the density function. Except for the normal ($p = 0$), gamma ($p = 2$), and inverse Gaussian ($p = 3$) EDMs, no other TBE model possesses an explicit density in terms of algebraic functions. All such densities can only be expressed in integral form or as power series; hence, their evaluation becomes rather complicated. To resolve the problem, several studies have directly employed the saddlepoint approximation for density estimation for TBE models, as discussed in [4,5,6,7,8]. The saddlepoint approximation does so by substituting the part of the density that lacks a closed-form representation with a simple analytic expression. Additionally, the saddlepoint approximation can be utilized instead of traditional likelihood methods to derive the maximum likelihood estimates (MLEs) of the model parameters (cf. [6,9]). Dunn created and maintained the tweedie R package [10], while [11] contributed to and maintained the statmod R package. In this frame, the function tweedie.profile in the tweedie R package practically enables fitting TBE models. These packages can be extended to include the Landau EDM as well.
Steepness. The model $F$ is called steep if $M = \operatorname{int} C$. Steepness is an essential property in two respects: (1) First, it is related to the existence of the MLE of $m$. Indeed, if $F$ is steep and $Y_1, \ldots, Y_n$ are $n$ i.i.d. random variables drawn from $F$, then the MLE of $m$, denoted by $\hat{m} = \bar{Y}_n$ (the sample average), exists with probability one and is given by the gradient of the log-likelihood (cf. [12], Theorem 9.29). (2) Second, steepness is a necessary condition for applying generalized linear models (GLMs) methodology to EDMs (cf. [2,6,13]). Consequently, of the absolutely continuous TBE models described above, only those with $p = 0$, $p \geq 2$, or $p = \infty$ are steep, as their mean parameter space equals the interior of their convex support (i.e., $M = \mathbb{R}$ for $p \in \{0, \infty\}$ and $M = (0,\infty)$ for $p \geq 2$). For any $p < 0$, $F$ is not steep, as its mean parameter space $M = (0,\infty)$ is a proper subset of the interior of its convex support, $\operatorname{int} C = \mathbb{R}$.
GLM applications for the normal EDM ($p = 0$) are straightforward and have been analyzed in various references (cf. [13] and the references cited therein). GLM applications for the TBE models with $p \geq 2$ (the gamma, inverse Gaussian, and positive stable EDMs) are discussed and presented by [6], who also maintained an R package (see [10]) for these EDMs. Consequently, we are left with the absolutely continuous TBE models supported on the whole real line ($p < 0$ and $p = \infty$). As already noted, the TBE models with $p < 0$ are not steep, a fact which precludes them from being candidates for GLM analysis. This is quite unfortunate, as this subclass comprises an infinite set of absolutely continuous EDMs (with respect to the Lebesgue measure) that are supported on the whole real line. Thus, the only remaining steep EDMs supported on the whole real line are the normal EDM ($p = 0$) and the EDM generated by the Landau distribution ($p = \infty$), both of which are suitable for GLM applications. The normal EDM constitutes the classical linear regression model, whereas the Landau EDM requires further analysis by GLM methodology; this analysis establishes the core of this paper. Such an analysis complements the results of [6] and accomplishes a complete analysis of all absolutely continuous TBE models.
The paper is organized as follows.
Section 2 presents some preliminaries on natural exponential families (NEFs), additive, and reproductive EDMs.
Section 3 introduces the Landau EDM, namely the EDM generated by the Landau distribution, together with the GLM ingredients needed for its analysis; mainly, we present its link function and its total and scaled deviances. In Section 4, we study its analysis of deviance, derive the asymptotic properties of the MLEs of the covariate parameters $\beta$, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation.
Section 5 includes the estimation algorithm, a brief description of our R package, and simulation studies. In Section 6, we provide analyses of three real datasets; it is demonstrated there that the GLM based on the Landau EDM performs better than the linear model based on the normal distribution. Some concluding remarks are presented in Section 7. Proofs of the statements (propositions, corollaries, and theorems) in this paper are relegated to Appendix A.
2. Preliminaries: NEFs, Mean Value Representation, and Additive and Reproductive EDMs
NEFs. The preliminaries in the sequel hold for any positive Radon measure $\nu$ on $\mathbb{R}$. Without loss of generality, we confine our introduction to the case in which $\nu$ is an absolutely continuous positive Radon measure with respect to the Lebesgue measure on the real line, with density $h$. The Laplace transform of $\nu$ and its effective domain are defined, respectively, by
$$L(\theta) = \int_{\mathbb{R}} e^{\theta y} h(y)\, dy \quad \text{and} \quad \mathcal{D} = \{\theta \in \mathbb{R} : L(\theta) < \infty\}.$$
Let $\Theta = \operatorname{int} \mathcal{D}$, and assume $\Theta$ is non-empty. Then, the NEF generated by $h$ is defined by the densities of the form
$$f(y; \theta) = h(y) \exp\{\theta y - k(\theta)\}, \qquad \theta \in \Theta, \quad (1)$$
where $k(\theta) = \log L(\theta)$ is the cumulant transform of $L$. The cumulant transform $k$ is real analytic on $\Theta$, implying that the $r$-th cumulant of (1) is given by $k^{(r)}(\theta)$, $r = 1, 2, \ldots$. In particular, the mean, mean parameter space, and variance corresponding to (1) are given, respectively, by $m = k'(\theta)$, $M = k'(\Theta)$, and $k''(\theta)$. As $k'$ is strictly increasing on $\Theta$, its inverse mapping $\psi = (k')^{-1} : M \to \Theta$ is well-defined. So, we denote by $V(m) = k''(\psi(m))$ the variance function (VF) corresponding to (1). The pair $(V, M)$ uniquely defines the NEF generated by $h$ within the class of NEFs (cf. [14]). Also, $V$ is called the unit VF.
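To make the objects $k$, $k'$, and $k''$ concrete, the following numerical sketch (an illustration using the exponential generating density $h(y) = e^{-y}$ on $(0,\infty)$, chosen here purely for convenience) checks by finite differences that the cumulant transform $k(\theta) = -\log(1-\theta)$ yields the mean $1/(1-\theta)$ and a variance equal to the squared mean, i.e., the gamma unit VF $V(m) = m^2$:

```python
import math

def k(theta):
    """Cumulant transform of h(y) = exp(-y) on (0, inf): k = -log(1 - theta)."""
    return -math.log(1.0 - theta)

def num_deriv(f, x, h=1e-5, order=1):
    """Central finite differences for the first or second derivative."""
    if order == 1:
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

theta = -0.5
m = num_deriv(k, theta, order=1)   # mean: k'(theta) = 1/(1 - theta) = 2/3
v = num_deriv(k, theta, order=2)   # variance: k''(theta) = 1/(1 - theta)^2
# v equals m**2, i.e., the gamma unit VF V(m) = m^2
```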
Mean value parameterization. For GLM applications and various other statistical aspects, it is necessary to express the NEF with densities (1) in terms of its mean rather than in terms of the artificial parameter $\theta$ (for details, see [3]). Indeed, given a VF $V$ on $M$, the maps $\psi(m)$ and $k(\psi(m))$ are the primitives of $1/V(m)$ and $m/V(m)$, respectively, and thus are given by
$$\psi(m) = \int \frac{dm}{V(m)} \quad \text{and} \quad k(\psi(m)) = \int \frac{m\, dm}{V(m)},$$
implying that the mean value representation of (1) is given by
$$f(y; m) = h(y) \exp\{y\, \psi(m) - k(\psi(m))\}, \qquad m \in M.$$
Additive EDMs. The Jorgensen set related to (1) is defined by (cf. [2])
$$\Lambda = \{\lambda > 0 : \lambda k(\theta) \text{ is the cumulant transform of some positive measure } \nu_{\lambda}\}.$$
The set $\Lambda$ is not empty, due to convolution ($\mathbb{N} \subseteq \Lambda$). Moreover, $\Lambda = (0,\infty)$ if $h$ is infinitely divisible, a property valid for all TBE members. Accordingly, the additive EDM (cf. [2]) is defined by densities of the form
$$f(y; \theta, \lambda) = h_{\lambda}(y) \exp\{\theta y - \lambda k(\theta)\}, \qquad \theta \in \Theta,\ \lambda \in \Lambda, \quad (3)$$
where $h_{\lambda}$ denotes the density of $\nu_{\lambda}$, and the VF corresponding to the additive EDM is given by $\lambda V(m/\lambda)$, $m \in \lambda M$.
Reproductive EDMs. In general, for various statistical aspects, and particularly for GLM applications, it is more effective to represent (3) in a form resembling the normal structure. Such a representation, called the reproductive EDM, is obtained by the mapping $Y = X/\lambda$, where $X$ has the additive-EDM density (3) and $\varphi = 1/\lambda$. Then, the densities of this mapping have the form (cf. [2,6,15])
$$f(y; \theta, \varphi) = h^{*}(y; \varphi) \exp\{(\theta y - k(\theta))/\varphi\}, \quad (4)$$
where $h^{*}(y; \varphi) = \lambda h_{\lambda}(\lambda y)$, and $S$ is the support of $h$. It is crucial to note that the structure in (4) is not suitable for the discrete case (counting measures on $\mathbb{N}_0$). This is because, for different $\lambda$'s, the scaling alters the support $S$ of $h$. In contrast, for the absolutely continuous case, the structure in (4) is appropriate. The VF of the reproductive EDM (4) is given by
$$\operatorname{Var}(Y) = \varphi V(m), \qquad m \in M,$$
where, if $S = \mathbb{R}$, $(0,\infty)$, or $[0,\infty)$, then the support of (4) is $\mathbb{R}$, $(0,\infty)$, or $[0,\infty)$, respectively.
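A quick Monte Carlo sketch illustrates the relation $\operatorname{Var}(Y) = \varphi V(m)$, using the gamma EDM as a stand-in (an assumption made here only because its reproductive form, a gamma with shape $1/\varphi$ and scale $m\varphi$, is easy to simulate):

```python
import random

random.seed(1)
m, phi, n = 3.0, 0.1, 200_000
# Reproductive gamma EDM: Gamma(shape = 1/phi, scale = m*phi) has mean m.
ys = [random.gammavariate(1.0 / phi, m * phi) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / (n - 1)
# mean should be near 3.0 and var near phi * V(m) = 0.1 * 3**2 = 0.9
```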
4. Asymptotic Properties
This section deals with the saddlepoint approximation and asymptotic behavior of the MLEs of the parameters involved. The section establishes the central core of the asymptotic behavior of all the statistics required for the appropriate analysis of the deviance.
4.1. Asymptotic Properties of MLE
Let us start with the saddlepoint approximation (14) below, which is essential in the asymptotic theory of GLMs. The exact distribution (9) is challenging to handle, due to the cumbersome form of the normalizing function in (7); the saddlepoint approximation neatly gets rid of it. For more details on this point, see Sections 1.5.3 and 3.5.1 in [2] and Section 5.4.3 in [6]. The following proposition presents the saddlepoint approximation for the density of the Landau EDM.

Proposition 1. Let $Y \sim \mathrm{ED}(m, \varphi)$. Then, for sufficiently small $\varphi$, the saddlepoint approximation for the density of $Y$ is given by
$$f(y; m, \varphi) \approx (2\pi \varphi V(y))^{-1/2} \exp\{-d(y; m)/(2\varphi)\}, \quad (14)$$
where $V(\cdot)$ is the unit VF and $d(y; m)$ is the unit deviance.

The following corollary, an immediate consequence of Proposition 1, implies convergence to normality:
Corollary 1. Let $Y \sim \mathrm{ED}(m, \varphi)$; then,
$$\frac{Y - m}{\sqrt{\varphi V(m)}} \xrightarrow{d} N(0, 1) \quad \text{as } \varphi \to 0,$$
where $N(0,1)$ is the standard normal distribution and $\xrightarrow{d}$ denotes convergence in distribution.

Corollary 1 provides the asymptotic normality for a single observation $y$. For independent observations $Y_1, \ldots, Y_n$ with $Y_i \sim \mathrm{ED}(m_i, \varphi)$, we have
$$\frac{Y_i - m_i}{\sqrt{\varphi V(m_i)}} \xrightarrow{d} N(0, 1) \quad \text{as } \varphi \to 0, \qquad i = 1, \ldots, n. \quad (16)$$
Using (16), the following theorem shows that the MLE of $\beta$ is asymptotically normally distributed.
Theorem 1. Let $\hat{\beta}$ be the MLE of $\beta$ and let $X$ be the design matrix. If $X^{\top} W X$ has bounded eigenvalues, where $W$ is the usual GLM weight matrix, then
$$\hat{\beta} \xrightarrow{d} N\big(\beta_0,\ \varphi\, (X^{\top} W X)^{-1}\big) \quad \text{as } \varphi \to 0,$$
where $\beta_0$ is the true parameter.

4.2. Analysis of the Deviance
With $\beta$ and $\varphi$ known, we consider the distribution of the deviance. We claim that, when the saddlepoint approximation holds (and it does for small $\varphi$), the scaled deviance approximately follows a chi-square distribution.

Theorem 2. For the scaled deviance (12), we have
$$D^{*}(y; m) = D(y; m)/\varphi \xrightarrow{d} \chi^{2}_{n} \quad \text{as } \varphi \to 0,$$
at the true values of $\beta$ and $\varphi$.

When $\beta$ is unknown, it is replaced by its MLE $\hat{\beta}$, and $m$ by the fitted mean $\hat{m}$. Thus, we define the residual and scaled residual deviances as $D(y; \hat{m})$ and $D^{*}(y; \hat{m}) = D(y; \hat{m})/\varphi$, respectively. As the GLM considered in Section 3 involves $q$ regression parameters, it follows that
$$D^{*}(y; \hat{m}) \xrightarrow{d} \chi^{2}_{n-q} \quad \text{as } \varphi \to 0. \quad (18)$$
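The chi-square behavior of the scaled deviance can be illustrated by simulation. The sketch below again uses the gamma EDM as a stand-in (its unit deviance is $d(y;m) = 2(y/m - 1 - \log(y/m))$, an illustrative choice, not the Landau deviance) and checks that, for small $\varphi$, the scaled unit deviance of a single observation behaves like $\chi^2_1$, i.e., has mean about 1 and variance about 2:

```python
import math
import random

random.seed(7)
m, phi, n = 2.0, 0.01, 100_000
vals = []
for _ in range(n):
    y = random.gammavariate(1.0 / phi, m * phi)                # gamma stand-in, mean m
    vals.append(2.0 * (y / m - 1.0 - math.log(y / m)) / phi)   # d(y; m) / phi
mean = sum(vals) / n
var = sum((v - mean) ** 2 for v in vals) / (n - 1)
# mean is close to 1 and var close to 2, matching chi-square with 1 df
```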
Generally, the deviance is most useful not as an absolute measure of goodness-of-fit, but rather for comparing two nested models. For example, one may want to test whether incorporating an additional covariate significantly improves the model fit. In this case, the deviance can be employed to compare two nested GLMs that are based on the same EDM but have different fitted systematic components:
$$\text{Model A: } g(m) = X_A \beta_A \quad \text{and} \quad \text{Model B: } g(m) = X_A \beta_A + \gamma x,$$
where $\hat{m}_A$ denotes the MLE of $m$ under Model A, $\hat{m}_B$ denotes the MLE of $m$ under Model B, and $x$ is a covariate with coefficient $\gamma$. Note that Model A is a special case of Model B, with $\gamma = 0$. Accordingly, we consider the following hypotheses, to determine whether the simpler Model A is adequate to model the data:
$$H_0: \gamma = 0 \quad \text{versus} \quad H_1: \gamma \neq 0. \quad (19)$$
We have previously observed that the total deviance captures the part of the log-likelihood that depends on $m$. Therefore, the following theorem holds, from which it can be seen that (18) is a special case of Theorem 3:

Theorem 3. If $\varphi$ is known, the likelihood ratio test (LRT) statistic for comparing Models A and B is
$$T = \frac{D(y; \hat{m}_A) - D(y; \hat{m}_B)}{\varphi}.$$
Then, under the null hypothesis in (19), $T \xrightarrow{d} \chi^{2}_{q_B - q_A}$ as $\varphi \to 0$, where $q_A$ and $q_B$ denote the numbers of regression parameters in Models A and B, respectively.

Consider the two models in Theorem 3 with both $\beta$ and $\varphi$ unknown. Then, an estimate of $\varphi$ is required. This is done in Theorem 4, which is deduced from Theorem 3:
Theorem 4. If $\varphi$ is unknown, the appropriate statistic for comparing Model A with Model B is
$$\frac{[D(y; \hat{m}_A) - D(y; \hat{m}_B)]/(q_B - q_A)}{\tilde{\varphi}_B},$$
where $q_A$ and $q_B$ are the numbers of regression parameters in Models A and B, and $\tilde{\varphi}_B$ is an estimate of $\varphi$ based on Model B. Then, under the null hypothesis in (19), this statistic converges in distribution to an $F$ distribution with $q_B - q_A$ and $n - q_B$ degrees of freedom as $\varphi \to 0$.

Proof. It suffices to prove the asymptotic independence of the numerator and the denominator of the statistic. The proof is similar to that of Theorem 4.3 in [17]. □
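To see the mechanics of such nested-model comparisons in the simplest setting, the sketch below uses the normal EDM, where the deviance reduces to the residual sum of squares (RSS). Model B adds one deliberately uninformative covariate to an intercept-only Model A, so the resulting F-style statistic should be of order 1 (this is a simplified illustration, not the Landau-EDM computation):

```python
import random

random.seed(3)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.5 + random.gauss(0.0, 1.0) for _ in range(n)]  # covariate has no true effect

# Model A: intercept only; its deviance (RSS) uses the plain mean.
ybar = sum(y) / n
rss_a = sum((yi - ybar) ** 2 for yi in y)

# Model B: intercept + slope, fitted by simple least squares.
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
rss_b = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Deviance difference (1 extra parameter) over the Model-B dispersion estimate.
f_stat = (rss_a - rss_b) / (rss_b / (n - 2))
```

Under the null hypothesis, `f_stat` follows (approximately) an F distribution with 1 and n − 2 degrees of freedom.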
Note that our above statements about asymptotic distributions are all based on the assumption that $\varphi \to 0$. Such results are called small-dispersion asymptotics and hold regardless of the sample size $n$. Large-sample ($n \to \infty$) asymptotics are also well-known and, hence, no further explanations are provided.
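As a numerical illustration of the small-dispersion regime, the sketch below compares the saddlepoint form $(2\pi\varphi V(y))^{-1/2}\exp\{-d(y;m)/(2\varphi)\}$ with an exact density. The gamma EDM is used as a stand-in (an assumption made because its exact density is available in closed form, unlike the Landau case):

```python
import math

def exact_logpdf(y, m, phi):
    """Exact gamma log-density with mean m and dispersion phi."""
    a, s = 1.0 / phi, m * phi                 # shape and scale
    return (a - 1.0) * math.log(y) - y / s - math.lgamma(a) - a * math.log(s)

def saddle_logpdf(y, m, phi):
    """Saddlepoint form with V(y) = y**2 and d(y; m) = 2(y/m - 1 - log(y/m))."""
    d = 2.0 * (y / m - 1.0 - math.log(y / m))
    return -0.5 * math.log(2.0 * math.pi * phi * y * y) - d / (2.0 * phi)

m, phi = 3.0, 0.05
ratios = [math.exp(saddle_logpdf(y, m, phi) - exact_logpdf(y, m, phi))
          for y in (2.0, 3.0, 4.5)]
# each ratio is within about 0.5% of 1 for this value of phi
```

The discrepancy shrinks further as $\varphi$ decreases, matching the small-dispersion theory above.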
6. Real Data Analysis
We present the proposed estimation procedure through applications to three real datasets. The first and second datasets, grazing and hcrabs, are both from the R package ‘GLMsData’ (see [
6,
20]). The third is the Boston housing dataset.
6.1. Dataset “Grazing”
This dataset reveals the density of understorey birds across a series of sites located on either side of a stockproof fence, in two distinct areas. It has the potential to provide insights into the impact of habitat fragmentation on bird populations (cf., [
20]):
To verify the appropriateness of the GLM for these data, we evaluated the prediction performance of the GLM and compared it with that of a linear model. We conducted 500 random splits of the 62 observations. In each split, we randomly selected 80% of the observations as the training set and the remaining 13 as the testing set, where 13 is the result of multiplying 62 by 20% and rounding up. We applied both the GLM and the linear model to the training set and estimated the coefficients.
By applying Algorithm 1 for the Landau-EDM GLM, we obtained the estimates of $\beta$ and $\varphi$. We then predicted each response in the testing set by $\hat{y}_i = g^{-1}(x_i^{\top} \hat{\beta})$, where $g$ denotes the link function, and calculated the mean squared error (MSE) over the 13 test observations as $\mathrm{MSE} = \frac{1}{13}\sum_{i=1}^{13}(y_i - \hat{y}_i)^2$. In the linear model, we estimated $\beta$ by least squares, predicted the test responses by $\hat{y}_i = x_i^{\top} \hat{\beta}$, and calculated the MSE in the same way.
Thus, we could compute the average and standard deviation (sd) of the prediction MSEs of both models over the 500 random splits. For the GLM, the average and sd of the MSEs were 0.111 and 0.017, respectively; for the linear model, they were 0.760 and 0.238. Thus, the GLM performed much better than the linear model in terms of both average and sd. Additionally, we calculated the Bayesian information criterion (BIC) for both models; the BIC for the GLM was significantly lower than that for the linear model (LM), indicating a better model fit.
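The split-and-score protocol above can be sketched as follows. This is a simplified Python stand-in: an ordinary least-squares line plays the role of the fitted model, and synthetic data replace the grazing dataset, since the actual Landau-EDM fit lives in our R package:

```python
import math
import random

random.seed(42)

def fit_ols(xs, ys):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xb) ** 2 for x in xs)
    b1 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    return yb - b1 * xb, b1

def split_mses(xs, ys, n_splits=200, test_frac=0.2):
    """Repeated random splits; returns (average, sd) of the test MSEs."""
    n = len(xs)
    n_test = math.ceil(n * test_frac)          # e.g., 62 * 20% rounded up = 13
    mses = []
    for _ in range(n_splits):
        idx = list(range(n))
        random.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test) / n_test
        mses.append(mse)
    avg = sum(mses) / n_splits
    sd = (sum((v - avg) ** 2 for v in mses) / (n_splits - 1)) ** 0.5
    return avg, sd

# Synthetic stand-in data with 62 observations, mimicking the split sizes above.
xs = [i / 10.0 for i in range(62)]
ys = [0.5 + 0.8 * xv + random.gauss(0.0, 0.3) for xv in xs]
avg_mse, sd_mse = split_mses(xs, ys)
```

Replacing `fit_ols` with any other fitting routine (and its predictor) reproduces the comparison reported for the three datasets.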
6.2. Dataset “Hcrabs”
This dataset describes the number of male crabs attached to female horseshoe crabs (cf., [
20]):
As with the first dataset, we conducted 500 random splits of the 173 observations. In each split, we randomly selected 80% of the observations as the training set and the remaining 35 as the testing set, where 35 is the result of multiplying 173 by 20% and rounding up. We then applied both the GLM and the linear model.
For the Landau-EDM GLM, we obtained the estimates of $\beta$ and $\varphi$ via Algorithm 1. We then predicted each response in the testing set by $\hat{y}_i = g^{-1}(x_i^{\top} \hat{\beta})$, where $g$ denotes the link function, and calculated $\mathrm{MSE} = \frac{1}{35}\sum_{i=1}^{35}(y_i - \hat{y}_i)^2$. In the linear model, we estimated $\beta$ by least squares, predicted the test responses by $\hat{y}_i = x_i^{\top} \hat{\beta}$, and calculated the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.011 and 0.004, respectively; for the linear model, they were 0.837 and 0.276. Here again, the GLM performed much better than the linear model in terms of both average and sd. We also calculated the BIC for both models; the BIC for the GLM was lower than that for the linear model, indicating a superior model fit.
6.3. Dataset “Boston Housing”
This dataset is taken from Harrison Jr. and Rubinfeld (1978) and includes 14 variables measured across 506 census tracts in the Boston area. The response variable is the logarithm of the median value of the houses in those census tracts of the Boston Standard Metropolitan Statistical Area:
Again, we conducted 500 random splits of the 506 observations. In each split, we randomly selected 80% of the observations as the training set and the remaining 102 as the testing set, where 102 is the result of multiplying 506 by 20% and rounding up. We applied both the GLM and the linear model and compared their performance.
For the Landau-EDM GLM, we obtained the estimates of $\beta$ and $\varphi$ via Algorithm 1. We then predicted each response in the testing set by $\hat{y}_i = g^{-1}(x_i^{\top} \hat{\beta})$, where $g$ denotes the link function, and calculated $\mathrm{MSE} = \frac{1}{102}\sum_{i=1}^{102}(y_i - \hat{y}_i)^2$. In the linear model, we estimated $\beta$ by least squares, predicted the test responses by $\hat{y}_i = x_i^{\top} \hat{\beta}$, and calculated the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.031 and 0.009, respectively; for the linear model, they were 0.041 and 0.010. This dataset is thus well suited to the linear model, and our GLM also fitted it well, which, to some extent, reflects the wide applicability of the GLM. Moreover, the results for the GLM were slightly better than those for the linear model in terms of both average and sd. We also computed the BIC for both models; the lower BIC of the GLM compared to the linear model indicated a superior model fit.
7. Conclusions
In this paper, we were interested in GLM methodology applied to the Landau EDM, the EDM generated by the Landau distribution and supported on the whole real line. We introduced its density function, deviance, and link function. We considered the saddlepoint approximation for its density and then deduced the convergence of $Y$ to normality. Based on small dispersion and the saddlepoint approximation, we derived the asymptotic normality of the MLE of $\beta$. The analysis of deviance was also studied, considering different situations of $\beta$ and $\varphi$ (known or unknown). In the numerical studies, we first estimated $\beta$ and $\varphi$ using Algorithm 1 and then evaluated its estimation performance, reporting averages of biases, standard deviations (SDs), and standard errors (SEs) in a simulation study. We demonstrated that the biases and SDs were relatively small and that the SDs were close to the SEs. As for the applications to three real datasets, the GLM showed much better performance than the linear models, which, to some extent, indicates the wide applicability of the Landau EDM. We also composed an R package for GLM applications of the Landau EDM.
We trust that the proposed GLM will be well utilized in modeling further real data and for various statistical purposes.