Article

Generalized Linear Model (GLM) Applications for the Exponential Dispersion Model Generated by the Landau Distribution

1 Faculty of Industrial Engineering and Technology Management, Holon Institute of Technology, Holon 5810201, Israel
2 School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
3 Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650091, China
4 School of Business and Economics, Vrije University of Amsterdam, 1081 HV Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 2021; https://doi.org/10.3390/math12132021
Submission received: 6 June 2024 / Revised: 26 June 2024 / Accepted: 27 June 2024 / Published: 28 June 2024
(This article belongs to the Section Probability and Statistics)

Abstract:
The exponential dispersion model (EDM) generated by the Landau distribution, denoted by EDM-EVF (exponential variance function), belongs to the Tweedie scale with power infinity. Its density function does not have an explicit form and has not yet been used for statistical purposes. Of all EDMs belonging to the Tweedie scale, only two are steep and supported on the whole real line: the normal EDM, with constant variance function, and the EDM-EVF. All other absolutely continuous steep EDMs in the Tweedie scale are supported on the positive real line. This paper aims to complete the picture of generalized linear model (GLM) applications within the Tweedie scale by including the EDM-EVF. It introduces all the GLM ingredients needed for this analysis, including the respective link function and the total and scaled deviances. We study the analysis of deviance, derive the asymptotic properties of the maximum likelihood estimates (MLEs) of the covariate parameters, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. We provide numerical studies, comprising an estimation algorithm, simulation studies, and applications to three real datasets, and demonstrate that the GLM based on the EDM-EVF performs better than the linear model based on the normal EDM. An R package accompanies all of these.

1. Introduction

The (reproductive) TBE_γ, γ ∈ (−∞, 0) ∪ {0, 1} ∪ [2, ∞], class (known as the Tweedie class, cf. [1]) is composed of all exponential dispersion models (EDMs) with variance functions (VFs) of the form
V(m) = φ m^γ, φ > 0, m ∈ M_γ,
where m is the mean, M_γ is the mean parameter space, φ is the dispersion parameter, and γ is the power parameter (cf. [2,3] and the references cited therein).
Let F_γ ∈ TBE_γ be an EDM belonging to the TBE class. Also, let C_γ and M_γ denote, respectively, the convex support and mean parameter space of F_γ. Among the TBE_γ class, the subclasses containing all absolutely continuous (with respect to the Lebesgue measure) models comprise the following cases (cf. [2]):
  • When γ < 0, F_γ is generated by a stable distribution with a stable index in (1, 2), supported on C_γ = R with M_γ = R+; i.e., M_γ = R+ is a proper subset of int C_γ = R (the interior of C_γ) for all γ < 0.
  • When γ = 0, F_0 is the normal EDM with M_0 = int C_0 = R.
  • When γ = 2, F_2 is the gamma EDM with M_2 = int C_2 = R+.
  • When 2 < γ < ∞, F_γ is generated by a positive stable distribution with a stable index in (0, 1), supported on C_γ = R+ with M_γ = R+ for all γ > 2.
  • When γ = ∞, F_∞ is the EDM generated by the Landau distribution, supported on C_∞ = R with M_∞ = R. It is absolutely continuous with respect to the Lebesgue measure on R and is the limit of EDMs having power VFs (see [2,3] for further details).
Two important aspects related to the above TBE models should be remarked at this point:
  • Complexity of the density function. Except for the normal (γ = 0), gamma (γ = 2), and inverse Gaussian (γ = 3) EDMs, no other TBE_γ possesses an explicit density in terms of algebraic functions. All such densities can only be expressed in integral form or as power series; hence, their evaluation becomes rather complicated. To resolve this, several studies have directly employed the saddlepoint approximation for density estimation on the TBE_γ scale for 2 < γ < ∞, as discussed in [4,5,6,7,8]. The saddlepoint approximation does so by substituting the part of the density that lacks a closed-form representation with a simple analytic expression. Additionally, it can be utilized instead of traditional likelihood methods to derive the maximum likelihood estimate (MLE) of φ (cf. [6,9]). Dunn created and maintains the tweedie R package [10], while [11] contributed to and maintains the statmod R package. In this frame, the function tweedie.profile in the tweedie package practically enables the fitting of TBE_γ models. These packages can be extended to include the TBE_∞ as well.
  • Steepness. The model F_γ ∈ TBE_γ is called steep if M_γ = int C_γ. Steepness is essential in two respects: (1) It is related to the existence of the MLE of m. Indeed, if F_γ is steep and Y_1, …, Y_n are n i.i.d. random variables drawn from F_γ, then the MLE of m, namely m̂ = Ȳ_n (the sample average), exists with probability one and is given by the gradient of the log-likelihood (cf. [12], Theorem 9.29). (2) Steepness is a necessary condition for applying generalized linear model (GLM) methodology to EDMs (cf. [2,6,13]). Consequently, of the absolutely continuous TBE_γ models described above, only those with γ ∈ {0} ∪ [2, ∞] are steep, as their mean parameter space M_γ equals the interior of their convex support (i.e., M_γ = R for γ = 0 and γ = ∞, and M_γ = R+ for γ ∈ [2, ∞)). For any γ < 0, TBE_γ is not steep, as its mean parameter space M_γ = R+ is a proper subset of the interior R of its convex support.
GLM applications for TBE_γ, γ = 2, 3, are straightforward and have been analyzed in various references (cf. [13] and the references cited therein). GLM applications for TBE_γ (γ > 2, γ ≠ 3) are discussed and presented by [6], who also maintains an R package (see [10]) for these EDMs. Consequently, we are left with the absolutely continuous TBE_γ models supported on the whole real line (γ < 0, γ = 0, γ = ∞). As already noted, the TBE_γ models for γ < 0 are not steep, a fact which precludes them from being candidates for GLM analysis. This is quite unfortunate, as this subclass comprises an infinite set of absolutely continuous EDMs (with respect to the Lebesgue measure) supported on the whole real line. Thus, the only remaining steep EDMs supported on the whole real line are the normal (TBE_0) and the EDM generated by the Landau distribution (TBE_∞), both of which are suitable for GLM applications. The normal EDM constitutes the classical linear regression model, whereas the TBE_∞ requires further analysis by GLM methodology, an analysis that establishes the core of this paper. Such an analysis complements the results of [6] and completes the analysis of all absolutely continuous TBE models.
The paper is organized as follows. Section 2 presents some preliminaries on natural exponential families (NEFs) and on additive and reproductive EDMs. Section 3 introduces the TBE_∞, the EDM generated by the Landau distribution, and the GLM ingredients needed for its analysis; mainly, we present its link function and total and scaled deviances. In Section 4, we study its analysis of deviance, derive the asymptotic properties of the MLEs of the covariate parameters β, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. Section 5 includes the estimation algorithm, a brief description of our R package, and simulation studies. In Section 6, we provide analyses of real data, with applications to three real datasets; it is demonstrated there that the GLM using the TBE_∞ performs better than the linear model based on the normal distribution. Some concluding remarks are presented in Section 7. Proofs of the statements (propositions, corollaries, and theorems) in this paper are relegated to Appendix A.

2. Preliminaries: NEFs, Mean Value Representation, and Additive and Reproductive EDMs

NEFs. The preliminaries in the sequel hold for any positive Radon measure μ ( d x ) on R . Without loss of generality, we confine our introduction to μ ( d x ) = h ( x ) d x , where h ( x ) d x is an absolutely continuous positive Radon measure with respect to the Lebesgue measure on the real line. The Laplace transform of h ( x ) and its effective domain are defined, respectively, by
L(θ) = ∫_R h(x) e^{θx} dx and D_h = {θ ∈ R : L(θ) < ∞}.
Let Θ = int D_h, and assume that Θ is non-empty. Then, the NEF generated by h is defined by the densities of the form
h(x; θ) = h(x) exp{θx − k(θ)}, θ ∈ Θ ⊆ R, (1)
where k(θ) = ln L(θ) is the cumulant transform of L. The cumulant transform k(θ) is real analytic on Θ, implying that the r-th cumulant of h(x; θ) is given by κ_r(θ) = d^r k(θ)/dθ^r. In particular, the mean, mean parameter space, and variance corresponding to (1) are given, respectively, by m = k′(θ), M = k′(Θ), and k″(θ), θ ∈ Θ. As k′ is strictly increasing, its inverse mapping ψ : M → Θ is well-defined. So, we denote by V(m) = k″(ψ(m)) the variance function (VF) corresponding to (1). The pair (V, M) uniquely defines the NEF generated by h within the class of NEFs (cf. [14]). Also, V is called the unit VF.
Mean value parameterization. For GLM applications and various other statistical aspects, it is necessary to express the NEF with densities (1) in terms of its mean rather than in terms of the artificial parameter θ (for details, see [3]). Indeed, given a VF (V, M), θ(m) and k(θ(m)) are the primitives of 1/V(m) and m/V(m), respectively, and thus are given by
θ(m) = ∫ dm/V(m) and k(θ(m)) = ∫ m dm/V(m), (2)
implying that the mean value representation of (1) is given by
h(x; θ(m)) = h(x) exp{θ(m)x − k(θ(m))}, m ∈ M.
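To make the mean-value parameterization concrete, here is a small numeric check (our illustration, not part of the paper's package): for the normal unit VF V(m) = 1, the primitives give θ(m) = m and k(θ(m)) = m²/2, and for the gamma unit VF V(m) = m², they give θ(m) = −1/m and k(θ(m)) = ln m. A finite-difference test confirms that θ′(m) = 1/V(m) and d k(θ(m))/dm = m/V(m):

```python
# Finite-difference check of the mean-value parameterization:
# theta(m) and k(theta(m)) are primitives of 1/V(m) and m/V(m).
import math

def check(theta, ktheta, V, m, h=1e-6, tol=1e-5):
    dtheta = (theta(m + h) - theta(m - h)) / (2 * h)
    dk = (ktheta(m + h) - ktheta(m - h)) / (2 * h)
    assert abs(dtheta - 1 / V(m)) < tol
    assert abs(dk - m / V(m)) < tol

# Normal NEF: V(m) = 1  =>  theta(m) = m, k(theta(m)) = m^2 / 2
check(lambda m: m, lambda m: m * m / 2, lambda m: 1.0, m=0.7)

# Gamma NEF: V(m) = m^2  =>  theta(m) = -1/m, k(theta(m)) = log(m)
check(lambda m: -1 / m, lambda m: math.log(m), lambda m: m * m, m=0.7)

print("mean-value parameterization checks passed")
```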
Additive EDMs. The Jorgensen set related to (1) is defined by (cf. [2])
Λ = {p ∈ R+ : p k(θ) is a cumulant transform of some density h(·; p) on R}.
The set Λ is not empty, due to convolution. Moreover, Λ = R+ if h is infinitely divisible, a property valid for all TBE members. Accordingly, the additive EDM (cf. [2]) is defined by densities of the form
h(x; θ, p) = h(x; p) exp{θx − p k(θ)}, θ ∈ Θ, p ∈ Λ = R+, (3)
where the VF corresponding to the additive EDM is given by (pV(m/p), pM).
Reproductive EDMs. In general, for various statistical aspects, and particularly for GLM applications, it is more effective to represent (3) in a form resembling the normal structure. Such a representation, called the reproductive EDM, is obtained by the mapping x ↦ y = φx, where φ = 1/p. The densities of this mapping then have the form (cf. [2,6,15])
f_φ(y; θ) = φ^{−1} h(yφ^{−1}; θ, φ^{−1}) = φ^{−1} h(yφ^{−1}; φ^{−1}) exp{φ^{−1}(θy − k(θ))}, (4)
where y ∈ φ C_h, θ ∈ Θ, φ ∈ R+, and C_h is the support of h. It is crucial to note that the structure in (4) is not suitable for the discrete case (counting measures on N), because, for different φ's, it alters the support C_h of h. In contrast, for the absolutely continuous case, the structure in (4) is appropriate. The VF of the reproductive EDM (4) is given by
(φV(m), φM),
where if int C_h = R, R+, or R−, then φM = R, R+, or R−, respectively.

3. GLM Applications for the EDM Generated by the Landau Distribution—Some Basics

In this section, we provide the components required for GLM applications. We first give an expression for the TBE_∞ density, and then present the related link function and scaled deviance.

3.1. Density Function

The TBE_∞ distribution is the EDM generated by the Landau distribution, known as the Tweedie model with power infinity, and it possesses a simple unit VF of the form (e^m, R). It is steep (M_∞ = int C_∞ = R), infinitely divisible, skewed to the right, leptokurtic (i.e., it has fatter tails), and absolutely continuous, supported on the whole real line. It was surveyed in detail and further developed by [3], where it was named the EDM-EVF (exponential VF). Its reproductive EDM density, of the form (4), is
f(y; θ, φ) = h_φ(y) exp{φ^{−1}(θy − k(θ))}, y ∈ R, (θ, φ) ∈ R− × R+, (6)
where
h_φ(y) = (1/π) ∫_0^∞ e^{(1−y)t − t log t − t log φ} sin(πt) dt, (7)
k(θ) = θ − θ ln(−θ),
and VF
(V, M) = (φ e^m, R).
The expressions for θ(m) and k(θ(m)), needed for its mean value parameterization, are
θ(m) = ∫ dm/V(m) = −e^{−m} and k(θ(m)) = ∫ m dm/V(m) = −e^{−m}(m + 1).
Thus, the density (6) can be written as
f(y; m, φ) = h_φ(y) exp{φ^{−1}(−e^{−m} y + e^{−m}(m + 1))}, y ∈ R, (m, φ) ∈ R × R+. (9)
If Y ∼ f(y; m, φ), then we write Y ∼ TBE_∞, or we use the standard EDM notation and write Y ∼ EDM-EVF(m, φ). The mean, variance, and cumulants of such a Y are
E(Y) = m, Var(Y) = φ e^m, κ_r(m) = (r − 2)! φ^{r−1} e^{(r−1)m}, r ≥ 3.
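As a quick illustration of these cumulants (our sketch, not from the paper's package), the standardized skewness works out to κ_3/κ_2^{3/2} = (φe^m)^{1/2} and the excess kurtosis to κ_4/κ_2² = 2φe^m, both vanishing as φ → 0, consistent with the small-dispersion normal limit derived in Section 4:

```python
# Skewness and excess kurtosis of Y ~ EDM-EVF(m, phi) from its cumulants:
# kappa_2 = phi*e^m and kappa_r = (r-2)! * phi^(r-1) * e^((r-1)m), r >= 3.
import math

def cumulant(r, m, phi):
    if r == 1:
        return m
    if r == 2:
        return phi * math.exp(m)
    return math.factorial(r - 2) * phi ** (r - 1) * math.exp((r - 1) * m)

m, phi = 0.5, 0.02
k2, k3, k4 = (cumulant(r, m, phi) for r in (2, 3, 4))

skewness = k3 / k2 ** 1.5          # simplifies to sqrt(phi * e^m)
excess_kurtosis = k4 / k2 ** 2     # simplifies to 2 * phi * e^m

assert abs(skewness - math.sqrt(phi * math.exp(m))) < 1e-12
assert abs(excess_kurtosis - 2 * phi * math.exp(m)) < 1e-12
print(skewness, excess_kurtosis)   # both shrink to 0 as phi -> 0
```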

3.2. Scaled Deviance and Link Function

We shall now consider two essential ingredients needed for GLM applications of the EDM-EVF (9): the scaled deviance and the link function. These were introduced by [2,15] (see also [6,16]). Consider
t(y, m) = yθ(m) − k(θ(m)) = −e^{−m} y + e^{−m}(m + 1).
It is evident that argmax_m f(y; m, φ) = argmax_m t(y, m). By taking the partial derivative of t(y, m) with respect to m and setting it to zero, we obtain
∂t(y, m)/∂m = y e^{−m} − m e^{−m} = 0,
implying that m = y maximizes t(y, m) (since TBE_∞ is steep). Hence, the unit deviance
d(y, m) = 2[t(y, y) − t(y, m)] = 2[e^{−y} + e^{−m}(y − m − 1)]
can be considered a distance measure with two properties: d(y, y) = 0 and d(y, m) > 0 for y ≠ m. GLMs assume a systematic component with the linear predictor
η = β_0 + Σ_{j=1}^p β_j x_j.
This is linked to the mean m through a link function g, such that g(m) = η. For TBE_∞, we choose the canonical (and simple) link function
η = g(m) = θ(m) = −e^{−m}.
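The canonical link and the unit deviance are easy to sanity-check numerically (our sketch, not part of the paper's package; the link inverse m = −log(−η) is well-defined because η = −e^{−m} < 0):

```python
import math

def g(m):          # canonical link: eta = -exp(-m), always negative
    return -math.exp(-m)

def g_inv(eta):    # inverse link: m = -log(-eta), defined for eta < 0
    return -math.log(-eta)

def unit_deviance(y, m):
    return 2 * (math.exp(-y) + math.exp(-m) * (y - m - 1))

# Round trip through the link.
for m in (-2.0, 0.0, 1.5):
    assert abs(g_inv(g(m)) - m) < 1e-12

# d(y, y) = 0 and d(y, m) > 0 for y != m.
assert abs(unit_deviance(0.3, 0.3)) < 1e-12
assert unit_deviance(0.3, 1.1) > 0 and unit_deviance(0.3, -0.8) > 0
print("link and unit-deviance checks passed")
```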
Let y = (y_1, …, y_n)ᵀ be a set of independent observations, where y_i ∼ EDM-EVF(m_i, φ) (assuming a single dispersion parameter) is associated with the linear predictor η_i = β_0 + Σ_{j=1}^p x_{ij} β_j, i = 1, …, n. Write X ∈ R^{n×(p+1)} for the matrix of covariates, in which case η = Xβ, and the total and scaled deviances are given, respectively, by
D(y, m) = Σ_{i=1}^n d(y_i, m_i)
and
D*(y, m) = D(y, m)/φ = (2/φ) Σ_{i=1}^n [e^{−y_i} + e^{−m_i}(y_i − m_i − 1)]. (12)
Consequently, the log-likelihood is
ℓ(m, φ; y) = Σ_{i=1}^n ln f(y_i; y_i, φ) − (1/2) D*(y, m).
Let β̂ be the MLE of β = (β_0, β_1, …, β_p)ᵀ. As in linear models, we aim to estimate β and obtain its asymptotic behavior.

4. Asymptotic Properties

This section deals with the saddlepoint approximation and the asymptotic behavior of the MLEs of the parameters involved. It establishes the core asymptotic results for all the statistics required for a proper analysis of the deviance.

4.1. Asymptotic Properties of MLE

Let us start with the saddlepoint approximation (14) below, which is essential in the asymptotic theory of GLMs. The exact distribution (9) is challenging to handle, due to the cumbersome form of h_φ(y) in (7). The saddlepoint approximation neatly sidesteps it. For more details on this point, see Sections 1.5.3 and 3.5.1 in [2] and Section 5.4.3 in [6]. The following proposition presents the saddlepoint approximation for TBE_∞.
Proposition 1.
Let Y ∼ TBE_∞. Then, for sufficiently small φ, the saddlepoint approximation for the density of Y is given by
f(y; m, φ) ≈ (2πφV(y))^{−1/2} exp{−d(y, m)/(2φ)}, (14)
where V(y) = e^y and d(y, m) = 2[e^{−y} + e^{−m}(y − m − 1)].
Proof. 
See Appendix A.    □
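The quality of (14) for small φ can be checked by numerical integration (our sketch; the paper's dTBEinf function provides the saddlepoint density via method = "saddle"): the approximate density should integrate to nearly 1, with mean close to m.

```python
import math

def saddle_density(y, m, phi):
    # Saddlepoint approximation (14): (2*pi*phi*V(y))^(-1/2) * exp(-d(y,m)/(2*phi)),
    # with V(y) = e^y and d(y,m) = 2*(e^{-y} + e^{-m}*(y - m - 1)).
    d = 2 * (math.exp(-y) + math.exp(-m) * (y - m - 1))
    return math.exp(-d / (2 * phi)) / math.sqrt(2 * math.pi * phi * math.exp(y))

m, phi = 0.0, 0.01          # sd of Y is roughly sqrt(phi * e^m) = 0.1
lo, hi, n = m - 0.8, m + 0.8, 16000
h = (hi - lo) / n

mass = mean = 0.0
for i in range(n + 1):       # trapezoidal rule over ~8 standard deviations
    y = lo + i * h
    w = 0.5 if i in (0, n) else 1.0
    f = saddle_density(y, m, phi)
    mass += w * f * h
    mean += w * y * f * h

assert abs(mass - 1.0) < 0.05    # nearly normalized for small phi
assert abs(mean / mass - m) < 0.05
print(mass, mean / mass)
```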
The following corollary, an immediate consequence of Proposition 1, implies convergence to normality:
Corollary 1.
Let Y ∼ TBE_∞; then,
(Y − m)/√φ →_d N(0, V(m)), as φ → 0,
where V(m) = e^m and →_d denotes convergence in distribution.
Proof. 
See Appendix A.    □
Corollary 1 provides the asymptotic normality for a single observation y. For y = (y_1, …, y_n)ᵀ with y_i ∼ TBE_∞, we have
(Y − m)/√φ →_d N(0_n, C), as φ → 0, (16)
where
C = diag(V(m_1), V(m_2), …, V(m_n)) = diag(e^{m_1}, e^{m_2}, …, e^{m_n}).
Using (16), the following theorem shows that the MLE of β is asymptotically normally distributed.
Theorem 1.
Let β̂ be the MLE of β, and let X ∈ R^{n×(p+1)} be the design matrix. If XᵀX has bounded eigenvalues, then
(β̂ − β_0)/√φ →_d N(0_{p+1}, (XᵀCX)^{−1}), as φ → 0, (17)
where β_0 is the true parameter.
Proof. 
See Appendix A.    □

4.2. Analysis of the Deviance

With m and φ known, we consider the distribution of the deviance. We claim that when the saddlepoint approximation holds (and it does for TBE_∞), the scaled deviance approximately follows a chi-square distribution.
Theorem 2.
For the scaled deviance (12), we have
D*(y, m) →_d χ²_n, as φ → 0,
at the true values of m = (m_1, …, m_n)ᵀ.
Proof. 
See Appendix A.    □
When m is unknown, it is replaced by its MLE m̂. Thus, we define the residual and scaled residual deviances as
D(y, m̂) = Σ_{i=1}^n d(y_i, m̂_i)
and
D*(y, m̂) = D(y, m̂)/φ.
As the GLM considered in Section 3 involves p + 1 regression parameters, it follows that
D*(y, m̂) →_d χ²_{n−p−1}, as φ → 0. (18)
Generally, the deviance is most useful not as an absolute measure of goodness-of-fit, but rather for comparing two nested models. For example, one may want to test whether incorporating an additional covariate significantly improves the model fit. In this case, the deviance can be employed to compare two nested GLMs that are based on the same EDM but have different fitted systematic components:
Model A: g(m̂_A) = −e^{−m̂_A} = β̂_0^A 1_n + β̂_1^A x_1 + ⋯ + β̂_{p_A}^A x_{p_A},
and
Model B: g(m̂_B) = −e^{−m̂_B} = β̂_0^B 1_n + β̂_1^B x_1 + ⋯ + β̂_{p_A}^B x_{p_A} + ⋯ + β̂_{p_B}^B x_{p_B},
where β̂_i^A denotes the MLE of β_i under Model A, β̂_j^B denotes the MLE of β_j under Model B, and x_j is a covariate, i = 0, 1, …, p_A, j = 0, 1, …, p_B. Note that Model A is a special case of Model B, with p_B > p_A. Accordingly, to determine whether the simpler Model A is adequate for the data, we consider the following hypotheses:
H_0: β_{p_A+1} = ⋯ = β_{p_B} = 0   versus   H_1: β_j ≠ 0 for some j ∈ {p_A+1, …, p_B}. (19)
We have previously observed that the total deviance captures the part of the log-likelihood that depends on m. Therefore, the following theorem holds, from which it can be seen that (18) is a special case of Theorem 3:
Theorem 3.
If φ is known, the likelihood ratio test (LRT) statistic for comparing Models A and B is
L = 2{ℓ_B − ℓ_A} = [D(y, m̂_A) − D(y, m̂_B)]/φ.
Then, under the null hypothesis in (19), L →_d χ²_{p_B−p_A} as φ → 0.
Proof. 
See Appendix A.    □
Consider the two models in Theorem 3 with both m and φ unknown. Then, an estimate of φ is required. This is done in Theorem 4, which is deduced from Theorem 3:
Theorem 4.
If φ is unknown, the appropriate statistic for comparing Model A with Model B is
F = {[D(y, m̂_A) − D(y, m̂_B)]/(p_B − p_A)} / {D(y, m̂_B)/(n − p_B − 1)},
where D(y, m̂_B)/(n − p_B − 1) is an estimate of φ based on Model B. Then, under the null hypothesis in (19),
F →_d F(p_B − p_A, n − p_B − 1), as φ → 0.
Proof. 
It suffices to prove the asymptotic independence of D(y, m̂_A) − D(y, m̂_B) and D(y, m̂_B). The proof is similar to that of Theorem 4.3 in [17].    □
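As a worked toy example of the statistics in Theorems 3 and 4 (the deviance values below are hypothetical, not taken from the paper's datasets): suppose n = 100, p_A = 2, p_B = 4, D(y, m̂_A) = 12.40, and D(y, m̂_B) = 9.10.

```python
# Nested-model comparison with the statistics of Theorems 3 and 4.
# The deviance values below are hypothetical, for illustration only.
n, p_A, p_B = 100, 2, 4
D_A, D_B = 12.40, 9.10
phi = 0.05                      # known-dispersion case (Theorem 3)

L = (D_A - D_B) / phi           # compare with chi-square, df = p_B - p_A
F = ((D_A - D_B) / (p_B - p_A)) / (D_B / (n - p_B - 1))  # Theorem 4

assert abs(L - 66.0) < 1e-6
print(L, F)   # L ~ chi2(2), F ~ F(2, 95) under H0 as phi -> 0
```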
Note that our statements above about asymptotic distributions all assume φ → 0. Such results are called small-dispersion asymptotics, and they hold regardless of the sample size n. Large-sample asymptotics are also well known and, hence, are not discussed further.

5. Simulation Studies

5.1. Implementation

Herein, we discuss the estimation of the unknown parameters in the TBE_∞ GLM: the covariate coefficients β and the dispersion parameter φ. For the estimation of β, we use iteratively reweighted least squares (IRLS). The score vector U for β is
U(β) = (1/φ) Xᵀ W M (y − m),
where W = diag(W_1, …, W_n), with W_i = [V(m_i)(dη_i/dm_i)²]^{−1} called the working weights, and M is the diagonal matrix of the link derivatives dη_i/dm_i = e^{−m_i}. The Fisher information matrix for β is
I(β) = (1/φ) Xᵀ W X.
Thus, an iterative technique using the Newton–Raphson method yields
β̂^{(r+1)} = β̂^{(r)} + I(β̂^{(r)})^{−1} U(β̂^{(r)}),
where the Fisher information matrix I(·) is used in place of the observed information matrix, and the superscript (r) denotes the r-th iterate. The iteration can be re-organized as IRLS (cf. [6]):
β̂^{(r+1)} = (Xᵀ W X)^{−1} Xᵀ W z,
where z, the working response vector, is given by
z = η̂ + M(y − m̂),
and all other quantities on the right-hand side are evaluated at β̂^{(r)}.
For the estimation of φ, we use the mean deviance estimator of [6]. Under the saddlepoint approximation density, it is easy to show that the MLE of φ is the simple mean deviance D(y, m̂)/n. Taking into account the estimation of β and the residual degrees of freedom, we obtain the mean deviance estimator of φ as
φ̂ = D(y, m̂)/(n − p − 1).
We summarize all of the above as Algorithm 1.
Algorithm 1 Estimating β and φ Based on Iteratively Reweighted Least Squares Estimation (IRLSE) and Mean Deviance
1: Input: data {(y, X)}, initial value of β, threshold τ = 10^{−8}.
2: Repeat:
3:   Step 1. Obtain η = Xβ and m = −log(−η).
4:   Step 2. Calculate
       β_new = (Xᵀ W X)^{−1} Xᵀ W z,
     where W = diag(e^{m_1}, e^{m_2}, …, e^{m_n}) and z = (z_1, …, z_n)ᵀ with z_i = η_i + (y_i − m_i)/e^{m_i}, i = 1, …, n.
5:   Step 3. Set β = β_new.
6: Until: ‖β_new − β‖ ≤ τ.
7: Step 4. Calculate the total deviance
     D(y, m) = 2 Σ_{i=1}^n [e^{−y_i} + e^{−m_i}(y_i − m_i − 1)],
   and φ = D(y, m)/(n − p − 1).
8: Output: β and φ.
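Algorithm 1 can be sketched compactly (our Python illustration; the paper's implementation is the R package TBEinf). For a single covariate, the weighted normal equations solve in closed form. Since exact TBE_∞ sampling is not needed for the sketch, the responses are drawn from the small-dispersion normal approximation of Corollary 1, a labeled simplification:

```python
import math, random

def fit_tbe_inf_glm(y, x, b0=-1.0, b1=0.0, iters=100, tol=1e-10):
    """IRLS for a TBE-infinity GLM with one covariate and the canonical
    link eta = -exp(-m); the 2x2 weighted normal equations are solved
    in closed form (a compact stand-in for Algorithm 1)."""
    for _ in range(iters):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                  # linear predictor (stays < 0 here)
            m = -math.log(-eta)                 # inverse link
            w = math.exp(m)                     # working weight e^{m_i}
            z = eta + (yi - m) * math.exp(-m)   # working response
            s_w += w; s_wx += w * xi; s_wxx += w * xi * xi
            s_wz += w * z; s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        nb0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        nb1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if done:
            break
    # Mean deviance estimate of phi (n - p - 1 with p = 1 covariate).
    m_hat = [-math.log(-(b0 + b1 * xi)) for xi in x]
    dev = 2 * sum(math.exp(-yi) + math.exp(-mi) * (yi - mi - 1)
                  for yi, mi in zip(y, m_hat))
    return b0, b1, dev / (len(y) - 2)

random.seed(1)
true_b0, true_b1, phi = -1.0, 0.4, 0.01
x = [random.random() for _ in range(2000)]
m_true = [-math.log(-(true_b0 + true_b1 * xi)) for xi in x]
# Stand-in simulator: small-dispersion normal approximation of Corollary 1.
y = [random.gauss(mi, math.sqrt(phi * math.exp(mi))) for mi in m_true]

b0_hat, b1_hat, phi_hat = fit_tbe_inf_glm(y, x)
assert abs(b0_hat - true_b0) < 0.1 and abs(b1_hat - true_b1) < 0.1
assert 0.005 < phi_hat < 0.02
print(b0_hat, b1_hat, phi_hat)
```

The closed-form 2×2 solve replaces the generic (XᵀWX)^{−1}XᵀWz step of Algorithm 1; for p > 1 covariates one would solve the full weighted normal equations instead.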
We have developed an R package named TBEinf [18], which is used in our numerical experiments and is publicly available at https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
The package includes programs for computing the density of TBE_∞ by direct calculation (cf. [3]), saddlepoint approximation (cf. [6]), Fourier inversion (cf. [2,16]), and the modified W-transformation (cf. [16,19]). Specifically, the function dTBEinf in the package calculates the density by direct calculation when method = “real”, by saddlepoint approximation when method = “saddle”, by Fourier inversion when method = “finverse”, and by the modified W-transformation when method = “mWtrans”.
Also, the package applies GLM methodology to TBE_∞ for estimation and prediction. The estimates of the covariate coefficients β and the dispersion parameter φ are obtained through Algorithm 1.

5.2. Simulation Studies

Firstly, we generated simulated data using (16). We let the sample sizes be n = 100, 200, 400, 800, and the true values be β_0 = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)ᵀ and φ_0 = 0.01. The first column of X was a vector of ones, and all the other elements were random numbers sampled from U(0, 1). We used 1000 repetitions, generating 1000 response vectors y for each n, and estimated β and φ according to Algorithm 1.
By applying Algorithm 1, the average value of the estimated φ was φ̄ = 3.59 × 10^{−5} for n = 100, 3.77 × 10^{−5} for n = 200, 3.66 × 10^{−5} for n = 400, and 3.73 × 10^{−5} for n = 800. These values are relatively small compared to the true value φ_0 = 0.01, because the total deviance was very small.
Table 1 lists the simulation results obtained by applying Algorithm 1 with varying sample sizes n = 100, 200, 400, 800. Herein, sd denotes the standard deviation (SD), computed as sd(β̂_{0,k}) = [Σ_{j=1}^{1000} (β̂_{0,k}^{(j)} − β̄̂_{0,k})²/999]^{1/2} for k = 1, …, 10, where β̂_{0,k}^{(j)} is the estimate of β_{0,k} at the j-th repetition and β̄̂_{0,k} is the average of {β̂_{0,k}^{(1)}, …, β̂_{0,k}^{(1000)}}. Also, se denotes the standard error (SE) of β̂, calculated using (17) as se(β̂_{0,k}) = (φ̄ b_k)^{1/2}, where φ̄ is the average value of the estimated φ and b_k is the k-th diagonal element of (XᵀCX)^{−1}. From Table 1, we can see that the average bias was around 10^{−5} and the SD around 10^{−3}, demonstrating that the estimation procedure performs well and stably. The SDs were all close to the SEs, and both decreased as n increased, which verifies that the asymptotic properties are reasonable.

6. Real Data Analysis

We present the proposed estimation procedure through applications to three real datasets. The first and second datasets, grazing and hcrabs, are both from the R package ‘GLMsData’ (see [6,20]). The last one is the Boston housing data.

6.1. Dataset “Grazing”

This dataset reveals the density of understorey birds across a series of sites located on either side of a stockproof fence, in two distinct areas. It has the potential to provide insights into the impact of habitat fragmentation on bird populations (cf., [20]):
  • Sample size: n = 62 ;
  • The number of variables: p = 3 ;
  • Variable descriptions: see Table 2.
To verify the appropriateness of the TBE_∞ GLM for the data, we evaluated its prediction performance and compared it with a linear model. We conducted 500 random splits of the 62 observations. In each split, we randomly selected 80% as the training set and the rest as the testing set {(y_test,i, x_test,i), i = 1, …, 13}, where 13 is 20% of 62, rounded up. We applied both the TBE_∞ GLM and the linear model to the training set and estimated the coefficients.
By applying Algorithm 1 for the TBE_∞ GLM, the estimates of β and φ were
β̂ = (0.392, 0.013, 0.017)ᵀ
and φ̂ = 0.040. Then, we predicted y_test,i by ŷ_i = −log(−η̂_i), where η̂_i = x_test,iᵀ β̂. We let ε̂_i = y_test,i − ŷ_i and calculated the mean squared error (MSE) as MSE(ŷ) = (1/13) Σ_{i=1}^{13} ε̂_i², where ŷ = (ŷ_1, …, ŷ_13)ᵀ. In the linear model, we estimated β as β̃ using least squares (without φ). Then, we predicted y_test,i by ỹ_i = x_test,iᵀ β̃, let ε̃_i = y_test,i − ỹ_i, and calculated MSE(ỹ) = (1/13) Σ_{i=1}^{13} ε̃_i², where ỹ = (ỹ_1, …, ỹ_13)ᵀ.
Thus, we could compute the average and sd of the prediction MSEs of both models under the 500 random splits. For the TBE_∞ GLM, the average and sd of the MSEs were 0.111 and 0.017, respectively. For the linear model, they were 0.760 and 0.238. Thus, the TBE_∞ GLM performed much better than the linear model in terms of both average and sd. Additionally, we calculated the Bayesian information criterion (BIC) for both models, resulting in BIC(TBE_∞) = 267.028 and BIC(LM) = 385.949, where LM denotes the linear model. The BIC for the TBE_∞ GLM was significantly lower than that for the linear model, indicating a better fit.

6.2. Dataset “Hcrabs”

This dataset describes the number of male crabs attached to female horseshoe crabs (cf., [20]):
  • Sample size: n = 173 ;
  • The number of variables: p = 5 ;
  • Variable descriptions: see Table 3.
As with the first dataset, we conducted 500 random splits of the 173 observations. In each split, we selected 80% as the training set and the rest as the testing set {(y_test,i, x_test,i), i = 1, …, 35}, where 35 is 20% of 173, rounded up. We applied both the TBE_∞ GLM and the linear model.
The estimates of β and φ for the TBE_∞ GLM were
β̂ = (1.276, 0.001, 0.002, 0.885, 0.009)ᵀ
and φ̂ = 0.004. Then, we predicted y_test,i by ŷ_i = −log(−η̂_i), where η̂_i = x_test,iᵀ β̂. We let ε̂_i = y_test,i − ŷ_i and calculated MSE(ŷ) = (1/35) Σ_{i=1}^{35} ε̂_i², where ŷ = (ŷ_1, …, ŷ_35)ᵀ. In the linear model, we estimated β as β̃ using least squares, predicted y_test,i by ỹ_i = x_test,iᵀ β̃, let ε̃_i = y_test,i − ỹ_i, and calculated MSE(ỹ) = (1/35) Σ_{i=1}^{35} ε̃_i², where ỹ = (ỹ_1, …, ỹ_35)ᵀ.
For the TBE_∞ GLM, the average and sd of the MSEs were 0.011 and 0.004, respectively. For the linear model, they were 0.837 and 0.276. Here, again, the TBE_∞ GLM performed much better than the linear model in terms of both average and sd. We calculated the BIC for both models, obtaining BIC(TBE_∞) = 48.704 and BIC(LM) = 150.366. The BIC for the TBE_∞ GLM was again lower, indicating a superior fit.

6.3. Dataset “Boston Housing”

This dataset is taken from Harrison Jr. and Rubinfeld (1978) and includes 14 variables measured across 506 census tracts in the Boston area. The response variable is the logarithm of the median value of the houses in those census tracts of the Boston Standard Metropolitan Statistical Area:
  • Sample size: n = 506 ;
  • The number of variables: p = 14 ;
  • Variable descriptions: see Table 4.
Again, we conducted 500 random splits of the 506 observations. In each split, we selected 80% as the training set and the rest as the testing set {(y_test,i, x_test,i), i = 1, …, 102}, where 102 is 20% of 506, rounded up. We applied both the TBE_∞ GLM and the linear model and compared their performance.
For the TBE_∞ GLM, the estimates of β and φ were
β̂ = (0.142, 0.027, 0.010, 0.009, 0.008, 0.148, 0.270, 0.007, 0.078, 0.068, 0.084, 0.241, 0.062, 0.155)ᵀ
and φ̂ = 0.010. Then, we predicted y_test,i by ŷ_i = −log(−η̂_i), where η̂_i = x_test,iᵀ β̂. We let ε̂_i = y_test,i − ŷ_i and calculated MSE(ŷ) = (1/102) Σ_{i=1}^{102} ε̂_i², where ŷ = (ŷ_1, …, ŷ_102)ᵀ. In the linear model, we estimated β as β̃ using least squares, predicted y_test,i by ỹ_i = x_test,iᵀ β̃, let ε̃_i = y_test,i − ỹ_i, and calculated MSE(ỹ) = (1/102) Σ_{i=1}^{102} ε̃_i², where ỹ = (ỹ_1, …, ỹ_102)ᵀ.
For the TBE_∞ GLM, the average and sd of the MSEs were 0.031 and 0.009, respectively. For the linear model, they were 0.041 and 0.010. This dataset is thus well suited to the linear model, and the TBE_∞ GLM also fits well, which, to some extent, reflects the wide applicability of the TBE_∞ GLM; indeed, its results were slightly better than those of the linear model for both average and sd. We computed the BIC for both models, yielding BIC(TBE_∞) = −332.293 and BIC(LM) = −140.174. The lower BIC of the TBE_∞ GLM compared to the linear model indicated a superior fit.

7. Conclusions

In this paper, we were interested in GLM methodology applied to the TBE —the EDM generated by the Landau distribution, an EDM supported on the real line. We introduced its density function, deviance, and link function. We considered the saddlepoint approximation approach for Y   TBE and then deduced the convergence of Y to normality. Based on the small dispersion and saddlepoint approximation, we derived that the asymptotic distribution of MLE for β ^ was normal. The analysis of deviance was also studied, considering different situations of φ and m . In numerical studies, we first estimated β and φ using Algorithm 1 and then evaluated its estimation performance. We reported averages of bias, standard deviations (SDs), and standard errors (SEs) in a simulation study. We demonstrated that the biases and SDs were relatively small and that the SDs were close to the SEs. As for applications to the three datasets of real data, the results for TBE GLM showed much better performance than the linear models. To some extent, this indicated the widespread applications of TBE . We also composed an R package for GLM applications of TBE .
We trust that the proposed TBE GLM will prove useful for modeling further real data and for various statistical purposes.

Author Contributions

Conceptualization, S.K.B.-L.; methodology, S.K.B.-L., X.L. and Z.X.; software, X.L. and Z.X.; validation, X.L. and Z.X.; formal analysis, X.L. and Z.X.; investigation, S.K.B.-L., X.L., A.R. and Z.X.; resources, S.K.B.-L. and X.L.; data curation, S.K.B.-L., X.L. and Z.X.; writing—original draft preparation, S.K.B.-L., X.L. and Z.X.; writing—review and editing, S.K.B.-L., X.L., A.R. and Z.X.; visualization, X.L. and Z.X.; supervision, S.K.B.-L. and X.L.; project administration, S.K.B.-L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Liu and Xiang was funded by the National Natural Science Foundation of China (12271329, 72331005), the Program for Innovative Research Team of SUFE, the Shanghai Research Center for Data Science and Decision Technology, the Open Research Fund of the Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, and the Open Research Fund of the Key Laboratory of Analytical Mathematics and Applications (Fujian Normal University), Ministry of Education, P. R. China. The research of Bar-Lev and Ridder was funded by STAR (Stochastics—Theoretical and Applied Research), one of the four mathematics clusters within the Dutch Research Council (NWO).

Data Availability Statement

All real datasets used in this manuscript are explicitly displayed in the paper.

Acknowledgments

We thank two reviewers for helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
EDM: exponential dispersion model
EVF: exponential variance function
VF: variance function
TBE: Tweedie, Bar-Lev, and Enis
NEF: natural exponential family
LRT: likelihood ratio test
IRLS: iteratively reweighted least squares
SD: standard deviation
SE: standard error

Appendix A. Proofs

Proof of Proposition 1.
For Y ∼ TBE, by steepness and (6), the characteristic function of Y is

Φ(t; θ, φ) = E[exp(itY)] = ∫ exp(ity) h_φ(y) exp{φ⁻¹[θy − k(θ)]} dy = ∫ h_φ(y) exp{φ⁻¹[(θ + itφ)y − k(θ)]} dy = exp{[k(θ*) − k(θ)]/φ} ∫ h_φ(y) exp{[θ*y − k(θ*)]/φ} dy = exp{[k(θ*) − k(θ)]/φ},

where θ* = θ + itφ. The last equality holds since the integrand is an EDM density function written in terms of θ* rather than θ. If Φ(t; θ, φ) is absolutely integrable, then by the Fourier inversion theorem the probability density function of Y is

f(y; m, φ) = (1/2π) ∫ Φ(t; θ, φ) exp(−ity) dt = (1/2π) ∫ exp{[k(θ + itφ) − k(θ)]/φ − ity} dt = (1/(2πφ)) ∫ exp{[k(θ + is) − k(θ) − isy]/φ} ds,

where i is the complex imaginary unit and s = tφ.
Since m = k′(θ) and ψ is the inverse mapping of k′, we have θ = ψ(m). Let θ̃ = ψ(y); then y = k′(θ̃). Since the integrand is analytic, we may shift the path of integration from (−∞, ∞) to i(θ − θ̃) + (−∞, ∞). The density then becomes

f(y; m, φ) = (1/(2πφ)) ∫ exp{[k(θ̃ + is) − (θ̃ + is)y + θy − k(θ)]/φ} ds.     (A1)
We introduce the unit deviance,

d(y, m) = 2[t(y, y) − t(y, m)] = 2[yθ(y) − k(θ(y)) − yθ(m) + k(θ(m))] = 2[yψ(y) − k(ψ(y)) − yθ + k(θ)] = 2[yθ̃ − k(θ̃) − yθ + k(θ)].

Let φ → 0; then, for every fixed t, s = tφ → 0. By expanding k around θ̃, we obtain

k(θ̃ + is) = k(θ̃) + is·k′(θ̃) + ½(is)²k″(θ̃) + o(s²) = k(θ̃) + isy − ½s²k″(θ̃) + o(s²) = k(θ̃) + isy − ½s²V(y) + o(s²),

where k″(θ̃) = V(y), since V(m) = k″(ψ(m)) and ψ(y) = θ̃.
We now consider the term in curly brackets in (A1). By introducing the unit deviance and expanding k around θ̃, this term becomes

φ⁻¹[k(θ̃ + is) − (θ̃ + is)y + θy − k(θ)] = φ⁻¹[k(θ̃ + is) − (θ̃ + is)y + θ̃y − k(θ̃) − d(y, m)/2] ≈ φ⁻¹[−½s²V(y) − ½d(y, m)],

where higher-order terms in s² are discarded. From the result

∫ exp{−V(y)s²/(2φ)} ds = √(2πφ/V(y)),

we obtain the approximation, for φ > 0 small enough,

f(y; m, φ) ≈ (1/(2πφ)) ∫ exp{−s²V(y)/(2φ) − d(y, m)/(2φ)} ds = (1/(2πφ))·√(2πφ/V(y))·exp{−d(y, m)/(2φ)} = (2πφV(y))^{−1/2} exp{−d(y, m)/(2φ)}.
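As a quick numerical sanity check (a sketch, using the explicit TBE unit deviance d(y, m) = 2[e^{−y} + e^{−m}(y − m − 1)] and variance function V(y) = e^y that appear in the proofs below), the saddlepoint approximation should integrate to approximately 1 over y when φ is small:

```python
import math

def unit_deviance(y, m):
    # TBE unit deviance for the exponential variance function V(m) = exp(m)
    return 2.0 * (math.exp(-y) + math.exp(-m) * (y - m - 1.0))

def saddlepoint_density(y, m, phi):
    # f(y; m, phi) ~ (2*pi*phi*V(y))^(-1/2) * exp(-d(y, m) / (2*phi))
    V = math.exp(y)
    return math.exp(-unit_deviance(y, m) / (2.0 * phi)) / math.sqrt(2.0 * math.pi * phi * V)

# Riemann-sum normalization check: the approximation integrates to ~1 as phi -> 0
phi, m, step = 0.01, 0.0, 1e-4
grid = [-1.0 + k * step for k in range(int(2.0 / step) + 1)]
total = sum(saddlepoint_density(y, m, phi) for y in grid) * step
print(total)  # close to 1 for small phi
```

The deviance vanishes at y = m and is positive elsewhere, so the mass concentrates near m and the integral approaches 1 as φ → 0.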
This completes the proof of Proposition 1 (cf., [2]). □
Proof of Corollary 1.
Before proving this, we prove a lemma: the unit scaled deviance behaves approximately as the normal unit deviance near its minimum m = y (cf., [2]). Let

d*(y, m) = d(y, m)/φ = 2[e^{−y} + e^{−m}(y − m − 1)]/φ.

For Y ∼ TBE, we first show that

∂²d*/∂y²(m, m) = ∂²d*/∂m²(m, m) = −∂²d*/∂m∂y(m, m).     (A2)

By a simple calculation, we have

∂d*/∂y(y, m) = (2/φ)(e^{−m} − e^{−y}), i.e., ∂d*/∂y(m, m) = 0;
∂d*/∂m(y, m) = −(2/φ)e^{−m}(y − m), i.e., ∂d*/∂m(m, m) = 0;
∂²d*/∂y²(y, m) = (2/φ)e^{−y}, i.e., ∂²d*/∂y²(m, m) = (2/φ)e^{−m};
∂²d*/∂m²(y, m) = (2/φ)e^{−m}(y − m + 1), i.e., ∂²d*/∂m²(m, m) = (2/φ)e^{−m};
∂²d*/∂m∂y(y, m) = −(2/φ)e^{−m}, i.e., ∂²d*/∂m∂y(m, m) = −(2/φ)e^{−m}.

Thus, (A2) holds. Then, the unit variance function satisfies the relationship

φV(m) = φe^m = 2/[∂²d*/∂y²(m, m)] = 2/[∂²d*/∂m²(m, m)] = −2/[∂²d*/∂m∂y(m, m)].     (A3)

Furthermore, (A3) implies the following second-order Taylor expansion of d* near its minimum (δ² → 0):

d*(m₀ + aδ, m₀ + bδ) = [δ²/(φV(m₀))](a − b)² + o(δ²).     (A4)

This expansion shows that the unit deviance behaves approximately as does the normal unit deviance near its minimum.
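The quadratic behavior of this expansion is easy to check numerically for the TBE unit deviance d(y, m) = 2[e^{−y} + e^{−m}(y − m − 1)] with V(m) = e^m (a sketch; taking φ = 1 so that d* = d):

```python
import math

def unit_deviance(y, m):
    # TBE unit deviance, d(y, m) = 2[exp(-y) + exp(-m)(y - m - 1)]
    return 2.0 * (math.exp(-y) + math.exp(-m) * (y - m - 1.0))

# Compare d(m0 + a*delta, m0 + b*delta) with its leading term delta^2 (a-b)^2 / V(m0)
m0, a, b, delta = 0.3, 2.0, -1.0, 1e-4
lhs = unit_deviance(m0 + a * delta, m0 + b * delta)
rhs = delta ** 2 / math.exp(m0) * (a - b) ** 2
print(lhs, rhs)  # agree to leading order in delta
```

The agreement improves as δ shrinks, which is exactly the statement of the expansion above.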
Let the mean of Y be m₀ and Z = (Y − m₀)/√φ. From the saddlepoint approximation (14) of Y, we ascertain, for φ > 0 small enough, that

f(y; m₀, φ) ≈ (2πφV(y))^{−1/2} exp{−½d*(y, m₀)}.

Then, substituting (A4) with δ = √φ and b = 0, we ascertain, for φ > 0 small enough, that

f(z; φ) = √φ·f(y; m₀, φ) ≈ √φ·(2πφV(√φz + m₀))^{−1/2} exp{−½d*(√φz + m₀, m₀)} = (2πV(√φz + m₀))^{−1/2} exp{−(1/(2φ))[φz²/V(m₀) + o(φ)]} ≈ (2πV(m₀))^{−1/2} exp{−z²/(2V(m₀))}.

The last “≈” holds by the continuity of V(x) = e^x. □
Proof of Theorem 1.
First, we prove the consistency of β̂, i.e., β̂ →^P β⁰ as φ → 0, where β⁰ is the true parameter. We consider the behavior of the log-likelihood l(β) on the sphere Q_h with center at the true point β⁰ and radius h. We will show that, for any sufficiently small h, the probability of

l(β) < l(β⁰)

tends to 1 at all points β on the surface of Q_h, i.e., with ‖β − β⁰‖₂ = h (cf., [17]). Note that this method also handles the proof of the MLE's consistency in large-sample asymptotics.
By (13), we have

l(β) = l(m(β); φ, y) = Σ_{i=1}^n ln f(y_i; y_i, φ) − ½D*(y, m).

Thus, we obtain

∂l/∂β_j = Σ_{i=1}^n (∂l/∂m_i)(∂m_i/∂η_i)(∂η_i/∂β_j) = Σ_{i=1}^n φ⁻¹(y_i − m_i)e^{−m_i}·(−η_i⁻¹)·x_ij = Σ_{i=1}^n φ⁻¹(y_i − m_i)e^{−m_i}e^{m_i}x_ij = Σ_{i=1}^n φ⁻¹(y_i − m_i)x_ij.

This can be written in matrix form as

∂l/∂β = φ⁻¹Xᵀ(y − m).     (A5)
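Setting this score to zero gives the likelihood equation Xᵀ(y − m(β)) = 0. Because the expected information is proportional to XᵀCX with C = diag(exp(m_i)) (as shown later in this proof), the dispersion φ cancels from a Fisher-scoring update, just as in standard IRLS. The following is a minimal sketch, not the paper's Algorithm 1: it assumes the inverse link m_i = −log(−x_iᵀβ) with a single covariate plus intercept, and simulates responses with small Gaussian noise of variance φ·exp(m_i) as a stand-in for TBE responses (whose density has no closed form).

```python
import math
import random

random.seed(7)
n, phi = 200, 1e-4
beta_true = (-1.0, -0.5)
xs = [i / n for i in range(n)]                             # covariate in [0, 1)
eta_true = [beta_true[0] + beta_true[1] * x for x in xs]   # eta < 0 everywhere
m_true = [-math.log(-e) for e in eta_true]
# Gaussian stand-in for TBE data: Var(y_i) = phi * V(m_i) = phi * exp(m_i)
y = [m + random.gauss(0.0, math.sqrt(phi * math.exp(m))) for m in m_true]

def fisher_scoring(xs, y, b0, b1, iters=25):
    for _ in range(iters):
        eta = [b0 + b1 * x for x in xs]
        if any(e >= 0 for e in eta):
            raise ValueError("link m = -log(-eta) requires eta < 0")
        m = [-math.log(-e) for e in eta]
        c = [math.exp(mi) for mi in m]                     # C = diag(exp(m_i))
        r = [yi - mi for yi, mi in zip(y, m)]
        s0, s1 = sum(r), sum(ri * x for ri, x in zip(r, xs))  # score X'(y - m)
        a = sum(c)                                         # entries of X'CX (2 x 2)
        bq = sum(ci * x for ci, x in zip(c, xs))
        d = sum(ci * x * x for ci, x in zip(c, xs))
        det = a * d - bq * bq
        b0 += (d * s0 - bq * s1) / det                     # beta += (X'CX)^{-1} X'(y - m)
        b1 += (a * s1 - bq * s0) / det
    return b0, b1

b0_hat, b1_hat = fisher_scoring(xs, y, -1.0, 0.0)
print(b0_hat, b1_hat)  # close to beta_true for small phi
```

With small dispersion, the iterates stay in the admissible region η < 0 and converge quickly to a point near β_true, in line with the consistency result proved here.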
Additionally, we have

∂²l/∂β∂βᵀ = (∂m/∂βᵀ)ᵀ(∂²l/∂m∂mᵀ)(∂m/∂βᵀ).

We denote

m⁰ = m|_{β=β⁰},
M(β) = ∂m/∂βᵀ, M(β⁰) = M(β)|_{β=β⁰},
K(m) = ∂²l/∂m∂mᵀ, K(m⁰) = K(m)|_{β=β⁰} = K(m)|_{m=m⁰}.

Through differentiation, we obtain

K(m) = diag(φ⁻¹(m₁ − y₁ − 1)e^{−m₁}, …, φ⁻¹(m_n − y_n − 1)e^{−m_n}).

Let

W(m⁰) = K(m⁰)|_{y=m⁰} = diag(−φ⁻¹exp(−m₁⁰), …, −φ⁻¹exp(−m_n⁰)),

where m⁰ = (m₁⁰, …, m_n⁰)ᵀ. Obviously, W(m⁰) is negative definite. By (16), we know Y →^P m⁰ as φ → 0 (by the Chebyshev inequality); then

K(m⁰) →^P W(m⁰).     (A6)
For sufficiently small h, by expanding l(β) around the true point β⁰ and multiplying by φ, we have

φ[l(β) − l(β⁰)] = φ(β − β⁰)ᵀ(∂l/∂β)|_{β=β⁰} + ½φ(β − β⁰)ᵀ(∂²l/∂β∂βᵀ)|_{β=β⁰}(β − β⁰) + o(φ‖β − β⁰‖₂²) = φ(β − β⁰)ᵀ(∂l/∂β)|_{β=β⁰} + ½φ(β − β⁰)ᵀM(β⁰)ᵀK(m⁰)M(β⁰)(β − β⁰) + o(h²) := S₁ + S₂ + o(h²).     (A7)

We now consider S₁. Suppose that XXᵀ has bounded eigenvalues, with maximum eigenvalue λ_max. By (A5) and the Cauchy–Schwarz inequality, we have

|S₁| = |(β − β⁰)ᵀXᵀ(y − m⁰)| ≤ ‖β − β⁰‖₂·‖Xᵀ(y − m⁰)‖₂ = h[(y − m⁰)ᵀXXᵀ(y − m⁰)]^{1/2} ≤ hλ_max^{1/2}‖y − m⁰‖₂ ≤ hλ_max^{1/2}‖y − m⁰‖₁.

Consider ‖y − m⁰‖₁. For each i ∈ {1, 2, …, n}, by (16) and the Chebyshev inequality we obtain, letting φ → 0,

P(φ^{−1/2}|y_i − m_i⁰| ≥ h²/(nφ^{1/2})) ≤ n²exp(m_i⁰)φ/h⁴;

that is,

P(|y_i − m_i⁰| ≥ h²/n) ≤ φn²exp(m_i⁰)/h⁴,

implying that

P(‖y − m⁰‖₁ ≥ h²) = P(Σ_{i=1}^n |y_i − m_i⁰| ≥ h²) ≤ P(∪_{i=1}^n {|y_i − m_i⁰| ≥ h²/n}) ≤ Σ_{i=1}^n P(|y_i − m_i⁰| ≥ h²/n) ≤ (φn²/h⁴)·Σ_{i=1}^n exp(m_i⁰).

The last term of the above inequality tends to 0, since φ → 0 and the other quantities are constants. Thus, we have

P(‖y − m⁰‖₁ < h²) ≥ 1 − (φn²/h⁴)·Σ_{i=1}^n exp(m_i⁰).

That is, ‖y − m⁰‖₁ < h² with probability tending to 1. Returning to |S₁|, we have, with probability tending to 1,

|S₁| ≤ λ_max^{1/2}h³.     (A8)
The above argument about ‖y − m⁰‖₁ is based on the convergence in distribution (Y − m⁰)/√φ →^d N(0_n, C). In fact, the same conclusion can be reached from the convergence in probability Y →^P m⁰. It is crucial to note that the argument above shows how convergence in probability follows from convergence in distribution as φ → 0; this allows us to work directly with convergence in probability, and to omit any mention of convergence in distribution when treating S₂ below.
We now consider S₂. We have

2S₂ = φ(β − β⁰)ᵀM(β⁰)ᵀ[K(m⁰) − W(m⁰)]M(β⁰)(β − β⁰) + φ(β − β⁰)ᵀM(β⁰)ᵀW(m⁰)M(β⁰)(β − β⁰).

For the first term, we note that, by (A6), φM(β⁰)ᵀ[K(m⁰) − W(m⁰)]M(β⁰) →^P 0. We use an argument analogous to that used for S₁, but replace the Chebyshev inequality with the definition of convergence in probability. Thus, the absolute value of the first term is less than a constant multiple of h⁴ with probability tending to 1. The second term is a negative quadratic form in M(β⁰)(β − β⁰). Let M(β⁰)(β − β⁰) = (a₁, a₂, …, a_n)ᵀ. Then, by a straightforward calculation, we have

φ(β − β⁰)ᵀM(β⁰)ᵀW(m⁰)M(β⁰)(β − β⁰) = −Σ_{i=1}^n a_i² exp(−m_i⁰) = O(h²).

Thus, 2S₂ is negative and we obtain

|S₂| = O(h²).     (A9)
So, with (A7)–(A9), we have

l(β) − l(β⁰) < 0     (A10)

for sufficiently small h.
Because l(β) is continuous and differentiable on Q_h, there must be a local maximum point β̂ that satisfies

(∂l/∂β)|_{β=β̂} = 0.

Combining this with (A10), we obtain

P(‖β̂ − β⁰‖₂ < h) → 1.

So, when φ → 0, we have

β̂ →^P β⁰.
Now, we will show that (β̂ − β⁰)/√φ →^d N(0_{p+1}, S), where S is a covariance matrix. Denote

l′(β) = ∂l/∂β.

By expanding l′(β) around the true point β⁰, we obtain

l′(β) = l′(β⁰) + (∂²l/∂β∂βᵀ)|_{β=β⁰}(β − β⁰) + ⋯,

where higher-order terms are ignored. Replace β with β̂ (we may do so since β̂ →^P β⁰), and note that the left-hand side then equals 0_{p+1}. Rearranging this equation, we have

−l′(β⁰) = M(β⁰)ᵀK(m⁰)M(β⁰)(β̂ − β⁰).     (A11)

Consider l′(β⁰) and M(β⁰)ᵀK(m⁰)M(β⁰) separately. From (A5) and (16), we have

√φ·l′(β⁰) →^d N(0_{p+1}, XᵀCX).     (A12)

By a simple calculation, we find that

M(β⁰) = [e^{m_i⁰}, e^{m_i⁰}x_{i1}, …, e^{m_i⁰}x_{ip}]_{i=1,…,n} = CX,

where C = diag(exp(m₁⁰), …, exp(m_n⁰)). Thus, we obtain

M(β⁰)ᵀW(m⁰)M(β⁰) = XᵀCW(m⁰)CX = −φ⁻¹XᵀCX.

By (A6), we obtain

φM(β⁰)ᵀK(m⁰)M(β⁰) →^P −XᵀCX.     (A13)

By (A11)–(A13), we obtain

(β̂ − β⁰)/√φ →^d N(0_{p+1}, (XᵀCX)⁻¹).

This completes the proof of Theorem 1. □
Proof of Theorem 2.
First, we show that the unit deviance follows an approximate χ₁² distribution. By Proposition 1, the moment-generating function (MGF) of the unit deviance is approximately

M_{d(y,m)}(t) = E[exp(d(y, m)t)] ≈ ∫ exp(d(y, m)t)·(2πφV(y))^{−1/2} exp{−d(y, m)/(2φ)} dy = ∫ (2πφV(y))^{−1/2} exp{−[(1 − 2φt)/(2φ)]d(y, m)} dy = (1 − 2φt)^{−1/2} ∫ (1 − 2φt)^{1/2}(2πφV(y))^{−1/2} exp{−d(y, m)/(2φ′)} dy = (1 − 2φt)^{−1/2} ∫ (2πφ′V(y))^{−1/2} exp{−d(y, m)/(2φ′)} dy,

where φ′ = φ/(1 − 2φt). Since the integrand is the (saddlepoint) density of the distribution with dispersion φ′ = φ/(1 − 2φt), we have

M_{d(y,m)}(t) ≈ (1 − 2φt)^{−1/2}, i.e., M_{d(y,m)/φ}(t) ≈ (1 − 2t)^{−1/2},

which is identical to the MGF of a χ₁². So, as φ → 0, we have

d(y, m)/φ →^d χ₁².

For the set of observations y = (y₁, …, y_n)ᵀ, where the y_i's are independent with y_i ∼ TBE, we have

d(y_i, m_i)/φ →^d χ₁², as φ → 0.

Then, by independence, we obtain

D*(y, m) →^d χ_n², as φ → 0,
which completes the proof of Theorem 2 (cf., [6]). □
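The key identity in this proof—that (1 − 2t)^{−1/2} is the MGF of χ₁²—can be checked by Monte Carlo, since a χ₁² variate is the square of a standard normal (a quick sketch):

```python
import math
import random

random.seed(0)
t, n = 0.2, 400_000                        # any fixed t < 1/2 works
# chi^2_1 is Z^2 for Z ~ N(0, 1); estimate E[exp(t * Z^2)] by simple averaging
mgf_mc = sum(math.exp(t * random.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n
mgf_exact = (1.0 - 2.0 * t) ** -0.5        # the chi^2_1 MGF at t
print(mgf_mc, mgf_exact)
```

The Monte Carlo average agrees with (1 − 2t)^{−1/2} up to sampling error, which is the distributional fact driving the χ² limit of the deviance.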
Proof of Theorem 3.
We consider the four nested hypotheses (cf., [17]):
  • H₀: m ∈ M (the saturated hypothesis);
  • H₁: m = m(β₀, β₁, …, β_p);
  • H₂: m = m(β₀, β₁, …, β_{p_B});
  • H₃: m = m(β₀, β₁, …, β_{p_A});
of dimensions n, p + 1, p_B + 1, p_A + 1, respectively, where n > p + 1 ≥ p_B + 1 > p_A + 1.
Since we proved the asymptotic normality of β̂ in Theorem 1, just as in Theorems 10.3.1 and 10.3.3 of [21], we can prove that the likelihood ratio test (LRT) statistic asymptotically follows a chi-square distribution, proceeding from the simple hypothesis to the composite hypothesis. That is, for the LRT statistic λ(Y), we have −2 ln λ(Y) →^d χ_q², where q is the corresponding number of degrees of freedom.
For testing H₀ vs. H₁ (H₁ being the null hypothesis), we ascertain that (18) holds. For testing H₂ vs. H₃ (H₃ being the null hypothesis), we ascertain that (20) holds. In each case, the degrees of freedom equal the difference between the dimensions of the two hypotheses. □

References

  1. Bar-Lev, S.K. Independent tough identical results: The Tweedie class on power variance functions and the class of Bar-Lev and Enis on reproducible natural exponential families. Int. J. Stat. Probab. 2020, 9, 30–35. [Google Scholar] [CrossRef]
  2. Jørgensen, B. The Theory of Dispersion Models; Chapman and Hall: London, UK, 1997. [Google Scholar]
  3. Bar-Lev, S.K. The Exponential Dispersion Model Generated by the Landau Distribution—A Comprehensive Review and Further Developments. Mathematics 2023, 11, 4343. [Google Scholar] [CrossRef]
  4. Dunn, P.K.; Smyth, G.K. Tweedie Family Densities: Methods of Evaluation. In Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July 2001. [Google Scholar]
  5. Dunn, P.K.; Smyth, G.K. Series Evaluation of Tweedie Exponential Dispersion Model Densities. Stat. Comput. 2005, 15, 267–280. [Google Scholar] [CrossRef]
  6. Dunn, P.K.; Smyth, G.K. Generalized Linear Models with Examples in R; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  7. Hougaard, P. Nonlinear Regression and Curved Exponential Families. Improvement of the Approximation to the Asymptotic Distribution. Metrika 1995, 42, 191–202. [Google Scholar] [CrossRef]
  8. Chen, Z.; Pan, E.; Xia, T.; Li, Y. Optimal degradation-based burn-in policy using Tweedie exponential-dispersion process model with measurement errors. Reliab. Eng. Syst. Saf. 2020, 195, 106748. [Google Scholar] [CrossRef]
  9. Ricci, L.; Martínez, R. Adjusted R2-type measures for Tweedie models. Comput. Stat. Data Anal. 2008, 52, 1650–1660. [Google Scholar] [CrossRef]
  10. Dunn, P.K. Tweedie: Evaluation of Tweedie Exponential Family Models, R Package Version 2.3.5; 2022. Available online: https://cran.r-project.org/web/packages/tweedie/tweedie.pdf (accessed on 12 September 2023).
  11. Smyth, G.K. Statmod: Statistical Modeling, R Package Version 1.4.30; 2017. Available online: https://CRAN.R-project.org/package=statmod (accessed on 3 April 2024).
  12. Barndorff-Nielsen, O. Information and Exponential Families in Statistical Theory; Wiley: New York, NY, USA, 1978. [Google Scholar]
  13. Merz, M.; Wüthrich, M.V. Statistical Foundations of Actuarial Learning and Its Applications; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
  14. Morris, C.N. Natural exponential families with quadratic variance functions. Ann. Statist. 1982, 10, 65–80. [Google Scholar] [CrossRef]
  15. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1989. [Google Scholar]
  16. Dunn, P.K.; Smyth, G.K. Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Stat. Comput. 2008, 18, 73–86. [Google Scholar] [CrossRef]
  17. Jørgensen, B. Small dispersion asymptotics. Braz. J. Probab. Stat. 1987, 1, 59–90. [Google Scholar]
  18. Liu, X.; Xiang, Z.; Bar-Lev, S.K.; Ridder, A. TBEinf, R Package Version 0.0.1; 2024. Available online: https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
  19. Sidi, A. A user-friendly extrapolation method for oscillatory infinite integrals. Math. Comput. 1988, 51, 249–266. [Google Scholar] [CrossRef]
  20. Dunn, P.K.; Smyth, G.K. GLMsData: Generalized Linear Model Data Sets, R Package Version 1.0.0; 2017. Available online: https://CRAN.R-project.org/package=GLMsData (accessed on 12 April 2024).
  21. Casella, G.; Berger, R.L. Statistical Inference; Thomson Learning Inc.: Duxbury, MA, USA, 2002. [Google Scholar]
Table 1. Average of bias (×10⁻⁵), sd, and se of estimated β for n = 100, 200, 400, 800. β⁰ is the true value of β; β⁰,₁ is the intercept term.

β⁰           |       n = 100        |       n = 200        |       n = 400        |       n = 800
             | bias    sd      se   | bias    sd      se   | bias    sd      se   | bias    sd      se
β⁰,₁ = 0.1   |  2.0   0.0056  0.0048| −16    0.0036  0.0032| −2.8   0.0029  0.0026| −7.7   0.0020  0.0018
β⁰,₂ = 0.2   | −11    0.0037  0.0036| −2.7   0.0026  0.0026| −1.2   0.0018  0.0018| −10    0.0013  0.0012
β⁰,₃ = 0.3   | −3.8   0.0040  0.0039| −12    0.0026  0.0026| −2.0   0.0017  0.0017| −0.43  0.0013  0.0012
β⁰,₄ = 0.4   |  4.5   0.0036  0.0036|  11    0.0025  0.0024| −5.1   0.0018  0.0017| −0.94  0.0012  0.0012
β⁰,₅ = 0.5   | −6.6   0.0040  0.0038|  14    0.0026  0.0024| −6.6   0.0018  0.0018| −4.1   0.0012  0.0012
β⁰,₆ = 0.6   | −7.3   0.0036  0.0034| −2.5   0.0028  0.0026| −0.22  0.0018  0.0017| −2.5   0.0012  0.0012
β⁰,₇ = 0.7   | −3.1   0.0036  0.0034| −5.8   0.0025  0.0025| −2.1   0.0018  0.0017| −6.2   0.0012  0.0012
β⁰,₈ = 0.8   | −6.8   0.0038  0.0036|  14    0.0027  0.0026| −0.61  0.0017  0.0018| −1.1   0.0013  0.0012
β⁰,₉ = 0.9   |  12    0.0033  0.0032| −7.6   0.0025  0.0025| −2.8   0.0018  0.0017| −2.8   0.0013  0.0012
β⁰,₁₀ = 1.0  |  9.3   0.0038  0.0036| −0.84  0.0025  0.0024| −3.2   0.0018  0.0017| −3.6   0.0013  0.0012
Table 2. Variable descriptions for the grazing dataset.

Birds: the number of understorey birds; a numeric vector
When: when the bird count was conducted; a factor with levels Before (before herbivores were removed) and After (after herbivores were removed)
Grazed: which side of the stockproof fence; a factor with levels Reference (grazed by native herbivores) and Feral (grazed by feral herbivores, mainly horses)
Table 3. Variable descriptions for the hcrabs dataset.

Col: the color of the female; a factor with levels LM (light medium), M (medium), DM (dark medium), or D (dark)
Spine: the spine condition; a factor with levels BothOK, OneOK, or NoneOK
Width: the carapace width of the female crab in cm; a numeric vector
Wt: the weight of the female crab in grams; a numeric vector
Sat: the number of male crabs attached to the female (‘satellites’); a numeric vector
Table 4. Variable descriptions for the Boston housing dataset.

CRIM: crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq. ft
INDUS: proportion of nonretail business acres per town
CHAS: Charles River dummy variable (1 if tract bounds the river)
NOX: concentration of nitrogen oxides in parts per 10 million
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built prior to 1940
DIS: weighted mean of distances to five Boston employment centres
RAD: index of accessibility to radial highways
TAX: full-value property-tax rate per $10,000
PTRATIO: pupil/teacher ratio by town
B: the proportion of black people by town
LSTAT: percentage of people of lower status
