Article

Limit Distributions for the Estimates of the Digamma Distribution Parameters Constructed from a Random Size Sample

by Alexey Kudryavtsev 1,2,* and Oleg Shestakov 1,2,3,*
1 Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, Moscow 119991, Russia
2 Moscow Center for Fundamental and Applied Mathematics, Moscow 119991, Russia
3 Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow 119333, Russia
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(8), 1778; https://doi.org/10.3390/math11081778
Submission received: 9 March 2023 / Revised: 31 March 2023 / Accepted: 6 April 2023 / Published: 7 April 2023

Abstract

In this paper, we study a new type of distribution that generalizes distributions from the gamma and beta classes that are widely used in applications. The estimators for the parameters of the digamma distribution obtained by the method of logarithmic cumulants are considered. Based on the previously proved asymptotic normality of the estimators for the characteristic index and the shape and scale parameters of the digamma distribution constructed from a fixed-size sample, we obtain a statement about the convergence of these estimators to the scale mixtures of the normal law in the case of a random sample size. Using this result, asymptotic confidence intervals for the estimated parameters are constructed. A number of examples of the limit laws for sample sizes with special forms of negative binomial distributions are given. The results of this paper can be widely used in the study of probabilistic models based on continuous distributions with an unbounded non-negative support.

1. Introduction

Distributions belonging to beta and gamma classes play an essential role in probability theory and mathematical statistics. Such distributions have proven themselves as convenient and efficient tools in modeling a large number of real processes and phenomena [1,2,3,4,5,6]. Special cases of the generalized beta distribution of the second kind and the generalized gamma distribution can have the properties of infinite divisibility and stability, which makes it possible to use them as asymptotic approximations in various limit theorems. Ref. [7] proposed a new probability distribution closely related to both beta and gamma classes.
Definition 1.
We say that the random variable $\zeta$ has the digamma distribution $\mathrm{DiG}(r, \nu, p, q, \delta)$ with a characteristic index $r \in \mathbb{R}$ and the parameters of shape $\nu \neq 0$, concentration $p, q > 0$, and scale $\delta > 0$, if its Mellin transform is
$$\mathcal{M}_\zeta(z) = \frac{\delta^z\, \Gamma(p + z/\nu)\, \Gamma(q - rz/\nu)}{\Gamma(p)\,\Gamma(q)}, \quad p + \frac{\mathrm{Re}(z)}{\nu} > 0, \quad q - \frac{r\,\mathrm{Re}(z)}{\nu} > 0, \tag{1}$$
where $\mathrm{Re}(z)$ is the real part of a complex number $z$, and $\Gamma(z)$ is Euler’s gamma function.
Particular types of digamma distribution include the generalized gamma distribution (also known as the Amoroso distribution with zero shift) [8], the generalized beta distribution of the second kind (also known as the McDonald distribution) [9], and the gamma-exponential distribution [10].
The digamma distribution (1) can be represented as a scale mixture of two generalized gamma-distributed random variables, i.e., for $\zeta \sim \mathrm{DiG}(r, \nu, p, q, \delta)$ and independent random variables $\lambda \sim \Gamma(p, 1)$ and $\mu \sim \Gamma(q, 1)$ with gamma distributions,
$$\zeta \overset{d}{=} \delta \left( \frac{\lambda}{\mu^{r}} \right)^{1/\nu}. \tag{2}$$
This representation makes it possible [11] to use the digamma distribution for an adequate description of the Bayesian balance models proposed in [12].
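Representation (2) also gives a direct way to simulate the digamma distribution. The following is a minimal sketch using only the Python standard library; the function name `sample_digamma` and the parameter values in the usage line are illustrative assumptions, not part of the original paper.

```python
import random


def sample_digamma(n, r, nu, p, q, delta, rng=None):
    """Draw n values of zeta = delta * (lam / mu**r) ** (1 / nu), as in (2)."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        lam = rng.gammavariate(p, 1.0)  # lambda ~ Gamma(p, 1)
        mu = rng.gammavariate(q, 1.0)   # mu ~ Gamma(q, 1), independent of lambda
        out.append(delta * (lam / mu ** r) ** (1.0 / nu))
    return out


# Illustrative parameter values (assumed for the example only)
sample = sample_digamma(5, r=0.5, nu=2.5, p=2.4, q=1.9, delta=1.0,
                        rng=random.Random(0))
```

With $r = 0$ and $\nu = 1$ the sketch reduces to a scaled gamma variable, which gives a quick sanity check of the sampler.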
Assuming that the process is modeled using the digamma distribution, the problem of statistical estimation of its unknown parameters inevitably arises [5,13,14]. As shown in Ref. [7], the density of the digamma distribution is expressed in terms of the special Fox’s H-function. This significantly complicates the application of the maximum likelihood method. The form of the Mellin transform (1) of the digamma distribution also indicates the infeasibility of using the direct method of moments. Refs. [15,16,17] originally proposed a modified method for estimating the parameters of the gamma-exponential distribution based on logarithmic moments and cumulants. Due to the fact that the digamma distribution and the gamma-exponential distribution have the Mellin transform of the same type (up to the range of the parameter r), all previously obtained conclusions about the form of estimates by the method of logarithmic cumulants for the gamma-exponential distribution automatically remain valid for the digamma distribution, taking into account the formal expansion of the characteristic index range from a unit interval to the entire real line.
The traditional statistical approach based on the analysis of fixed-size samples is often impractical when conditions change rapidly. For instance, in the context of the global crisis caused by the COVID-19 epidemic, it is necessary to have a mechanism to respond to negative impacts using only the currently available data. Since the accumulation of a sufficient fixed amount of statistics can take an indefinite time, methods are needed that allow one to draw adequate conclusions from an a priori indefinite number of observations. This approach leads to models with randomized sample sizes. Such models arise not only in medicine but also in other fields where statistical data are accumulated over a given period of time rather than up to a certain amount. For example, in insurance, a different number of insurance events (insurance payments and/or insurance contracts) occurs during different reporting periods of the same length (say, months). It is therefore natural to study the asymptotic behavior of the distributions of fairly general statistics based on samples of random size. When a non-random sample size is replaced by a random variable, the asymptotic properties of statistics can change radically. This fact was apparently first noted by B.V. Gnedenko in 1989 [18,19]. It was shown that if the sample size is a geometrically distributed random variable, then the asymptotic distribution of the sample median is not the normal law expected from the classical theory but a Student distribution with two degrees of freedom, whose tails are so heavy that it has no second-order moments. The heaviness of the tails of asymptotic distributions is of critical importance, in particular, in problems of testing hypotheses.
The distributions from the gamma class and their derivatives have become very popular for modeling random non-negative parameters. When modeling a random number of events and studying an a priori unknown number of observations, their discrete analogs are widely used; these are mixed Poisson distributions with the corresponding continuous structural distributions.
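The discrete analog of the gamma distribution $\Gamma(p, \delta)$ is the negative binomial distribution, whose partial probabilities for $n = 0, 1, \ldots$ are
$$\mathsf{P}(N = n) = \int_0^\infty \frac{\lambda^{n+p-1} e^{-(1 + 1/\delta)\lambda}}{\delta^{p}\, \Gamma(p)\, n!}\, d\lambda = \frac{\Gamma(n + p)}{\Gamma(n + 1)\,\Gamma(p)} \left( \frac{\delta}{\delta + 1} \right)^{n} \left( \frac{1}{\delta + 1} \right)^{p}. \tag{3}$$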
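As a quick numerical illustration of (3), the closed form on the right-hand side can be evaluated through log-gamma functions and checked against two basic properties: the probabilities sum to one, and the mean equals $p\delta$ (the mean of the structural gamma law). This is a sketch only; the name `neg_binomial_pmf` and the values $p = 2$, $\delta = 1.5$ are assumptions chosen for the example.

```python
import math


def neg_binomial_pmf(n, p, delta):
    """P(N = n) from (3): a Poisson law mixed over a Gamma(p, delta) intensity."""
    log_pmf = (math.lgamma(n + p) - math.lgamma(n + 1) - math.lgamma(p)
               + n * math.log(delta / (delta + 1.0))
               - p * math.log(delta + 1.0))
    return math.exp(log_pmf)


# Truncate the support far into the tail, where the probabilities are negligible
probs = [neg_binomial_pmf(n, p=2.0, delta=1.5) for n in range(400)]
total = sum(probs)                                 # should be close to 1
mean = sum(n * pr for n, pr in enumerate(probs))   # should be close to p * delta
```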
A natural generalization of the distribution (3) is the mixed Poisson distribution whose structure is given by the generalized gamma distribution $GG(\nu, p, \delta)$ with the density
$$f(x) = \frac{|\nu|\, x^{\nu p - 1} e^{-(x/\delta)^{\nu}}}{\delta^{\nu p}\, \Gamma(p)}, \quad \nu \neq 0, \quad p > 0, \quad x > 0. \tag{4}$$
Such distributions are called generalized negative binomial distributions and are widely used in insurance, financial mathematics, physics, and other fields [20,21,22,23,24].
The purpose of this article is to study the asymptotic behavior of digamma distribution parameter estimates under conditions of an a priori unknown sample size.
The article has the following structure. Section 2 describes a method for obtaining digamma distribution parameter estimates; auxiliary relations are given. Section 3 contains the main statement of this paper on the asymptotic behavior of the digamma distribution parameter estimates constructed from random size samples. Section 4 discusses special cases of limit distributions. Section 5 contains our conclusions.

2. Auxiliary Relations

This section describes a method based on logarithmic cumulants for obtaining estimators for the parameters $r$, $\nu$, and $\delta$ of the digamma distribution (1) with fixed concentration parameters $p$ and $q$ and a sample of a non-random size $n$. Estimating the parameters $p$ and $q$ is a separate problem due to the analytical complexity of inverting the polygamma function.
The results and relations of this section were published in Ref. [17] and are provided as auxiliary statements.
To obtain an explicit form of theoretical logarithmic cumulants, consider the polygamma functions
$$\psi(z) = \frac{d}{dz} \ln \Gamma(z), \quad \psi^{(m)}(z) = \frac{d^{m+1}}{dz^{m+1}} \ln \Gamma(z), \quad m = 1, 2, \ldots$$
The theoretical cumulants of the random variable $\ln \zeta$ for $\zeta \sim \mathrm{DiG}(r, \nu, p, q, \delta)$ have the form
$$\kappa_1(r, \nu, \delta) = \mathsf{E} \ln \zeta = \frac{\nu \ln \delta + \psi(p) - r\,\psi(q)}{\nu};$$
$$\kappa_m(r, \nu) = (-i)^m \left. \frac{d^m}{dy^m} \ln \mathsf{E}\, \zeta^{iy} \right|_{y=0} = \frac{\psi^{(m-1)}(p) + (-r)^m \psi^{(m-1)}(q)}{\nu^m}, \quad m > 1.$$
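The cumulant formulas above are straightforward to evaluate numerically. The sketch below avoids third-party libraries by computing the polygamma functions from the recurrence in the argument plus a short asymptotic (Euler-Maclaurin) tail; this helper is an assumption of the example, and in practice a library routine such as SciPy's `polygamma` could be substituted. The names `polygamma` and `log_cumulant` and the parameter values in the last line are illustrative.

```python
import math


def polygamma(m, x, shift=20):
    """psi^(m)(x) for x > 0: shift the argument up, then use an asymptotic tail."""
    y = x + shift
    if m == 0:
        tail = math.log(y) - 1 / (2 * y) - 1 / (12 * y ** 2) + 1 / (120 * y ** 4)
        return tail - sum(1.0 / (x + k) for k in range(shift))
    s = m + 1
    # Hurwitz zeta(s, x): first `shift` terms plus an Euler-Maclaurin correction
    zeta = sum((x + k) ** -s for k in range(shift))
    zeta += y ** (1 - s) / (s - 1) + y ** -s / 2 + s * y ** (-s - 1) / 12
    zeta -= s * (s + 1) * (s + 2) * y ** (-s - 3) / 720
    return (-1) ** (m + 1) * math.factorial(m) * zeta


def log_cumulant(m, r, nu, p, q, delta):
    """kappa_m of ln(zeta) for zeta ~ DiG(r, nu, p, q, delta), m = 1, 2, ..."""
    if m == 1:
        return math.log(delta) + (polygamma(0, p) - r * polygamma(0, q)) / nu
    return (polygamma(m - 1, p) + (-r) ** m * polygamma(m - 1, q)) / nu ** m


# Illustrative parameter values (assumed for the example only)
kappa = [log_cumulant(m, 0.5, 2.5, 2.4, 1.9, 1.0) for m in range(1, 5)]
```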
The moments of the random variable $\ln \zeta$ can be represented as [25]
$$\mu_m(r, \nu, \delta) \equiv \mathsf{E} \ln^m \zeta = B_m(\kappa_1(r, \nu, \delta), \kappa_2(r, \nu), \ldots, \kappa_m(r, \nu)), \tag{5}$$
where $B_m$ is a complete (exponential) Bell polynomial that can be recurrently defined as
$$B_{m+1}(x_1, \ldots, x_{m+1}) = \sum_{k=0}^{m} C_m^k\, B_{m-k}(x_1, \ldots, x_{m-k})\, x_{k+1}, \quad B_0 = 1.$$
An explicit form of the necessary relations connecting moments and cumulants can be found in Ref. [25].
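The Bell polynomial recurrence translates almost literally into code. The sketch below is illustrative (the function names are assumptions) and converts a list of cumulants into the corresponding raw moments as in (5).

```python
from math import comb


def complete_bell(x):
    """Return [B_0, ..., B_m] for x = [x_1, ..., x_m] using the recurrence."""
    bell = [1.0]
    for m in range(len(x)):
        # B_{m+1} = sum_{k=0}^{m} C(m, k) * B_{m-k} * x_{k+1}
        bell.append(sum(comb(m, k) * bell[m - k] * x[k] for k in range(m + 1)))
    return bell


def moments_from_cumulants(kappa):
    """Raw moments mu_1..mu_m from cumulants kappa_1..kappa_m, as in (5)."""
    return complete_bell(kappa)[1:]


# Standard normal law: cumulants (0, 1, 0, 0) give raw moments (0, 1, 0, 3)
normal_moments = moments_from_cumulants([0.0, 1.0, 0.0, 0.0])
```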
In addition, we will need the following moment characteristics of the logarithm of a random variable with a digamma distribution:
$$\sigma_m^2(r, \nu, \delta) \equiv \mathsf{D} \ln^m \zeta = \mu_{2m}(r, \nu, \delta) - \mu_m^2(r, \nu, \delta); \quad \sigma_{ml}(r, \nu, \delta) \equiv \mathrm{cov}(\ln^m \zeta, \ln^l \zeta) = \mu_{m+l}(r, \nu, \delta) - \mu_m(r, \nu, \delta)\, \mu_l(r, \nu, \delta). \tag{6}$$
To define the sample logarithmic cumulants, we introduce a notation for the sample logarithmic moments of the random variable $\zeta$:
$$L_m(\mathbb{X}_n) = \frac{1}{n} \sum_{i=1}^{n} \ln^m X_i,$$
where $\mathbb{X}_n = (X_1, \ldots, X_n)$ is a sample from the distribution of $\zeta$ of non-random size $n$.
Let us denote $l = (l_1, l_2, l_3, l_4)$. Consider the functions
$$\begin{aligned}
K_1(l) &\equiv K_1(l_1) = (\psi(q))^{-1}\, l_1; \\
K_2(l) &\equiv K_2(l_1, l_2) = (\psi'(q))^{-1} (l_2 - l_1^2); \\
K_3(l) &\equiv K_3(l_1, l_2, l_3) = (\psi''(q))^{-1} (l_3 - 3 l_2 l_1 + 2 l_1^3); \\
K_4(l) &\equiv K_4(l_1, l_2, l_3, l_4) = (\psi'''(q))^{-1} (l_4 - 4 l_3 l_1 - 3 l_2^2 + 12 l_2 l_1^2 - 6 l_1^4).
\end{aligned}$$
Consider the statistics
$$\begin{aligned}
K_1(\mathbb{X}_n) &\equiv K_1(L_1(\mathbb{X}_n)); \\
K_2(\mathbb{X}_n) &\equiv K_2(L_1(\mathbb{X}_n), L_2(\mathbb{X}_n)); \\
K_3(\mathbb{X}_n) &\equiv K_3(L_1(\mathbb{X}_n), L_2(\mathbb{X}_n), L_3(\mathbb{X}_n)); \\
K_4(\mathbb{X}_n) &\equiv K_4(L_1(\mathbb{X}_n), L_2(\mathbb{X}_n), L_3(\mathbb{X}_n), L_4(\mathbb{X}_n)).
\end{aligned}$$
Note that the statistics $\psi^{(m-1)}(q)\, K_m(\mathbb{X}_n)$ are the $m$-th sample logarithmic cumulants of the digamma distribution.
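The sample logarithmic moments and cumulants can be computed directly from the data. The following sketch returns the unnormalized cumulants $\psi^{(m-1)}(q) K_m$, so no polygamma evaluation is needed; the function names are illustrative assumptions.

```python
import math


def sample_log_moments(xs, order=4):
    """L_m = (1/n) * sum(ln(x_i) ** m) for m = 1..order."""
    logs = [math.log(x) for x in xs]
    return [sum(v ** m for v in logs) / len(logs) for m in range(1, order + 1)]


def sample_log_cumulants(xs):
    """First four sample cumulants of ln X (moment-to-cumulant polynomials)."""
    l1, l2, l3, l4 = sample_log_moments(xs)
    return [
        l1,
        l2 - l1 ** 2,
        l3 - 3 * l2 * l1 + 2 * l1 ** 3,
        l4 - 4 * l3 * l1 - 3 * l2 ** 2 + 12 * l2 * l1 ** 2 - 6 * l1 ** 4,
    ]


# Two observations with ln X = -1 and ln X = +1
cumulants = sample_log_cumulants([math.exp(-1.0), math.exp(1.0)])
```

Dividing the $m$-th entry by $\psi^{(m-1)}(q)$ gives the statistic $K_m$ defined above.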
The method for estimating the unknown parameters considered in this paper is based on solving the system for logarithmic cumulants:
$$\kappa_m(r, \nu, \delta) = \psi^{(m-1)}(q)\, K_m(\mathbb{X}_n), \quad m = 1, 2, 3, 4.$$
To describe the solution of this system, we introduce a number of functions of sample logarithmic cumulants with the arguments $k = (k_1, k_2, k_3, k_4)$:
$$\phi_m = \frac{\psi^{(m)}(p)}{\psi^{(m)}(q)}; \quad \tau(k) \equiv \tau(k_2, k_4) = \sqrt{\phi_1^2 k_4 + \phi_3 (k_4 - k_2^2)}; \tag{9}$$
$$R_\pm(k) \equiv R_\pm(k_2, k_4) = \sqrt{\frac{\phi_1 k_4 \pm k_2 \tau(k)}{k_2^2 - k_4}}; \quad V_\pm(k) \equiv V_\pm(k_2, k_4) = \sqrt{\frac{\phi_1 k_2 \pm \tau(k)}{k_2^2 - k_4}}; \quad D_\pm(k) \equiv D_\pm(k_1, k_2, k_4) = \exp\left\{ \psi(q)\, k_1 + \frac{\psi(q) R_\pm(k) - \psi(p)}{V_\pm(k)} \right\}. \tag{10}$$
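A round-trip check illustrates how the functions in (10) recover the parameters. In the sketch below, exact normalized cumulants $k_1 = \kappa_1/\psi(q)$, $k_2 = \kappa_2/\psi'(q)$, $k_4 = \kappa_4/\psi'''(q)$ are built from chosen $r$, $\nu$, $\delta$ and then inverted. The function names and all numerical values of $\phi_1$, $\phi_3$, $\psi(p)$, $\psi(q)$ are hypothetical placeholders (any positive $\phi_1$, $\phi_3$ and nonzero $\psi(q)$ would do), not values taken from the paper.

```python
import math


def theoretical_k(r, nu, delta, phi1, phi3, psi_p, psi_q):
    """Exact normalized cumulants (k1, k2, k4) implied by r, nu, delta."""
    k1 = (math.log(delta) + (psi_p - r * psi_q) / nu) / psi_q
    k2 = (phi1 + r ** 2) / nu ** 2
    k4 = (phi3 + r ** 4) / nu ** 4
    return k1, k2, k4


def solve_rvd(k1, k2, k4, phi1, phi3, psi_p, psi_q, sign=+1):
    """Evaluate (R, V, D) from (10) for the branch sign = +1 or -1."""
    tau = math.sqrt(phi1 ** 2 * k4 + phi3 * (k4 - k2 ** 2))
    denom = k2 ** 2 - k4
    r = math.sqrt((phi1 * k4 + sign * k2 * tau) / denom)
    v = math.sqrt((phi1 * k2 + sign * tau) / denom)
    d = math.exp(psi_q * k1 + (psi_q * r - psi_p) / v)
    return r, v, d


# Hypothetical constants standing in for phi_1, phi_3, psi(p), psi(q)
consts = (0.75, 0.43, 0.65, 0.35)
k = theoretical_k(0.5, 2.5, 1.0, *consts)
r_hat, nu_hat, delta_hat = solve_rvd(*k, *consts, sign=-1)  # here r**2 < phi3/phi1
```

In this example $r^2 = 0.25$ lies below $\phi_3/\phi_1$, so the minus branch returns the original parameters; for $r^2 > \phi_3/\phi_1$ the plus branch does.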
In what follows, we will need the derivatives of functions (10), expressed in terms of the functions $\phi_m$ and $\tau$, defined in (9). Note that
$$\begin{aligned}
R_{k_2,\pm}(k) &\equiv \frac{\partial R_\pm}{\partial k_2}(k_2, k_4) = \mp \frac{k_4 \left( \phi_1^2 k_2^2 + \tau^2(k) \pm 2 \phi_1 k_2 \tau(k) \right)}{2 (k_2^2 - k_4)^{3/2}\, \tau(k) \sqrt{\phi_1 k_4 \pm k_2 \tau(k)}}; \\
R_{k_4,\pm}(k) &\equiv \frac{\partial R_\pm}{\partial k_4}(k_2, k_4) = \pm \frac{k_2 \left( \phi_1^2 k_2^2 + \tau^2(k) \pm 2 \phi_1 k_2 \tau(k) \right)}{4 (k_2^2 - k_4)^{3/2}\, \tau(k) \sqrt{\phi_1 k_4 \pm k_2 \tau(k)}}; \\
V_{k_2,\pm}(k) &\equiv \frac{\partial V_\pm}{\partial k_2}(k_2, k_4) = \mp \frac{k_2 \left( \phi_1^2 k_4 + \tau^2(k) \right) \pm \phi_1 (k_2^2 + k_4)\, \tau(k)}{2 (k_2^2 - k_4)^{3/2}\, \tau(k) \sqrt{\phi_1 k_2 \pm \tau(k)}}; \\
V_{k_4,\pm}(k) &\equiv \frac{\partial V_\pm}{\partial k_4}(k_2, k_4) = \pm \frac{\phi_1^2 k_2^2 + \tau^2(k) \pm 2 \phi_1 k_2 \tau(k)}{4 (k_2^2 - k_4)^{3/2}\, \tau(k) \sqrt{\phi_1 k_2 \pm \tau(k)}}; \\
D_{k_1,\pm}(k) &\equiv \frac{\partial D_\pm}{\partial k_1}(k_1, k_2, k_4) = \psi(q) \exp\left\{ \psi(q)\, k_1 + \frac{\psi(q) R_\pm(k) - \psi(p)}{V_\pm(k)} \right\}; \\
D_{k_2,\pm}(k) &\equiv \frac{\partial D_\pm}{\partial k_2}(k_1, k_2, k_4) = \exp\left\{ \psi(q)\, k_1 + \frac{\psi(q) R_\pm(k) - \psi(p)}{V_\pm(k)} \right\} \times \\
&\quad \times \frac{\psi(p) V_{k_2,\pm}(k) + \psi(q) R_{k_2,\pm}(k) V_\pm(k) - \psi(q) R_\pm(k) V_{k_2,\pm}(k)}{V_\pm^2(k)}; \\
D_{k_4,\pm}(k) &\equiv \frac{\partial D_\pm}{\partial k_4}(k_1, k_2, k_4) = \exp\left\{ \psi(q)\, k_1 + \frac{\psi(q) R_\pm(k) - \psi(p)}{V_\pm(k)} \right\} \times \\
&\quad \times \frac{\psi(p) V_{k_4,\pm}(k) + \psi(q) R_{k_4,\pm}(k) V_\pm(k) - \psi(q) R_\pm(k) V_{k_4,\pm}(k)}{V_\pm^2(k)}.
\end{aligned}$$
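Closed-form derivatives of this kind are easy to get wrong by a sign, so a finite-difference comparison is a useful safeguard. The sketch below compares the partial derivatives of $R_\pm$ and $V_\pm$ with central differences; the constants `PHI1` and `PHI3` are hypothetical stand-ins for $\phi_1$ and $\phi_3$, and the evaluation point is an assumed example.

```python
import math

PHI1, PHI3 = 0.75, 0.43  # hypothetical values of phi_1 and phi_3


def tau(k2, k4):
    return math.sqrt(PHI1 ** 2 * k4 + PHI3 * (k4 - k2 ** 2))


def R(k2, k4, s):
    return math.sqrt((PHI1 * k4 + s * k2 * tau(k2, k4)) / (k2 ** 2 - k4))


def V(k2, k4, s):
    return math.sqrt((PHI1 * k2 + s * tau(k2, k4)) / (k2 ** 2 - k4))


def closed_form_gradients(k2, k4, s):
    """(dR/dk2, dR/dk4, dV/dk2, dV/dk4) for the branch s = +1 or s = -1."""
    t = tau(k2, k4)
    square = PHI1 ** 2 * k2 ** 2 + t ** 2 + s * 2 * PHI1 * k2 * t
    base = (k2 ** 2 - k4) ** 1.5 * t
    root_r = math.sqrt(PHI1 * k4 + s * k2 * t)
    root_v = math.sqrt(PHI1 * k2 + s * t)
    v_num = k2 * (PHI1 ** 2 * k4 + t ** 2) + s * PHI1 * (k2 ** 2 + k4) * t
    return (-s * k4 * square / (2 * base * root_r),
            s * k2 * square / (4 * base * root_r),
            -s * v_num / (2 * base * root_v),
            s * square / (4 * base * root_v))


def numeric_gradients(k2, k4, s, h=1e-6):
    """Central finite differences of R and V with respect to k2 and k4."""
    return ((R(k2 + h, k4, s) - R(k2 - h, k4, s)) / (2 * h),
            (R(k2, k4 + h, s) - R(k2, k4 - h, s)) / (2 * h),
            (V(k2 + h, k4, s) - V(k2 - h, k4, s)) / (2 * h),
            (V(k2, k4 + h, s) - V(k2, k4 - h, s)) / (2 * h))


exact = closed_form_gradients(0.16, 0.012608, -1)
approx = numeric_gradients(0.16, 0.012608, -1)
```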
Using the formula for the derivative of a composite function, we obtain
$$\begin{aligned}
\frac{\partial R_\pm}{\partial l_1}(l) &= -\frac{2 l_1}{\psi'(q)} R_{k_2,\pm}(K_2(l), K_4(l)) - \frac{4 l_3 - 24 l_2 l_1 + 24 l_1^3}{\psi'''(q)} R_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial R_\pm}{\partial l_2}(l) &= \frac{1}{\psi'(q)} R_{k_2,\pm}(K_2(l), K_4(l)) - \frac{6 l_2 - 12 l_1^2}{\psi'''(q)} R_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial R_\pm}{\partial l_3}(l) &= -\frac{4 l_1}{\psi'''(q)} R_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial R_\pm}{\partial l_4}(l) &= \frac{1}{\psi'''(q)} R_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial V_\pm}{\partial l_1}(l) &= -\frac{2 l_1}{\psi'(q)} V_{k_2,\pm}(K_2(l), K_4(l)) - \frac{4 l_3 - 24 l_2 l_1 + 24 l_1^3}{\psi'''(q)} V_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial V_\pm}{\partial l_2}(l) &= \frac{1}{\psi'(q)} V_{k_2,\pm}(K_2(l), K_4(l)) - \frac{6 l_2 - 12 l_1^2}{\psi'''(q)} V_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial V_\pm}{\partial l_3}(l) &= -\frac{4 l_1}{\psi'''(q)} V_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial V_\pm}{\partial l_4}(l) &= \frac{1}{\psi'''(q)} V_{k_4,\pm}(K_2(l), K_4(l)); \\
\frac{\partial D_\pm}{\partial l_1}(l) &= \frac{1}{\psi(q)} D_{k_1,\pm}(K_1(l), K_2(l), K_4(l)) - \frac{2 l_1}{\psi'(q)} D_{k_2,\pm}(K_1(l), K_2(l), K_4(l)) \\
&\quad - \frac{4 l_3 - 24 l_2 l_1 + 24 l_1^3}{\psi'''(q)} D_{k_4,\pm}(K_1(l), K_2(l), K_4(l)); \\
\frac{\partial D_\pm}{\partial l_2}(l) &= \frac{1}{\psi'(q)} D_{k_2,\pm}(K_1(l), K_2(l), K_4(l)) - \frac{6 l_2 - 12 l_1^2}{\psi'''(q)} D_{k_4,\pm}(K_1(l), K_2(l), K_4(l)); \\
\frac{\partial D_\pm}{\partial l_3}(l) &= -\frac{4 l_1}{\psi'''(q)} D_{k_4,\pm}(K_1(l), K_2(l), K_4(l)); \\
\frac{\partial D_\pm}{\partial l_4}(l) &= \frac{1}{\psi'''(q)} D_{k_4,\pm}(K_1(l), K_2(l), K_4(l)).
\end{aligned} \tag{11}$$
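To formulate the statement about the asymptotic normality of estimators for the parameters $r$, $\nu$, and $\delta$ with fixed concentration parameters $p$ and $q$ for a fixed sample size $n$, we introduce some notations. Let
$$\begin{aligned}
\Sigma &= \begin{pmatrix}
\sigma_1^2(r, \nu, \delta) & \sigma_{12}(r, \nu, \delta) & \sigma_{13}(r, \nu, \delta) & \sigma_{14}(r, \nu, \delta) \\
\sigma_{12}(r, \nu, \delta) & \sigma_2^2(r, \nu, \delta) & \sigma_{23}(r, \nu, \delta) & \sigma_{24}(r, \nu, \delta) \\
\sigma_{13}(r, \nu, \delta) & \sigma_{23}(r, \nu, \delta) & \sigma_3^2(r, \nu, \delta) & \sigma_{34}(r, \nu, \delta) \\
\sigma_{14}(r, \nu, \delta) & \sigma_{24}(r, \nu, \delta) & \sigma_{34}(r, \nu, \delta) & \sigma_4^2(r, \nu, \delta)
\end{pmatrix}; \\
dR_\pm &= \left( \left. \frac{\partial R_\pm}{\partial l_1}(l) \right|_{l=\mu}, \left. \frac{\partial R_\pm}{\partial l_2}(l) \right|_{l=\mu}, \left. \frac{\partial R_\pm}{\partial l_3}(l) \right|_{l=\mu}, \left. \frac{\partial R_\pm}{\partial l_4}(l) \right|_{l=\mu} \right); \\
dV_\pm &= \left( \left. \frac{\partial V_\pm}{\partial l_1}(l) \right|_{l=\mu}, \left. \frac{\partial V_\pm}{\partial l_2}(l) \right|_{l=\mu}, \left. \frac{\partial V_\pm}{\partial l_3}(l) \right|_{l=\mu}, \left. \frac{\partial V_\pm}{\partial l_4}(l) \right|_{l=\mu} \right); \\
dD_\pm &= \left( \left. \frac{\partial D_\pm}{\partial l_1}(l) \right|_{l=\mu}, \left. \frac{\partial D_\pm}{\partial l_2}(l) \right|_{l=\mu}, \left. \frac{\partial D_\pm}{\partial l_3}(l) \right|_{l=\mu}, \left. \frac{\partial D_\pm}{\partial l_4}(l) \right|_{l=\mu} \right),
\end{aligned} \tag{12}$$
where the variances $\sigma_m^2(r, \nu, \delta)$ and the covariances $\sigma_{ml}(r, \nu, \delta)$ are defined in the relations (6), the partial derivatives $\partial R_\pm / \partial l_k(l)$, $\partial V_\pm / \partial l_k(l)$, and $\partial D_\pm / \partial l_k(l)$ are defined in (11), and $\mu = (\mu_1, \mu_2, \mu_3, \mu_4)$ is the vector of moments (5).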
Previously, in Ref. [17], the following result was obtained for the gamma-exponential distribution.
Theorem 1.
Let $0 \le r < 1$ and $\nu > 0$. Assume that the concentration parameters $p$ and $q$ of the digamma distribution $\mathrm{DiG}(r, \nu, p, q, \delta)$ are fixed. Then, for $r > \sqrt{\phi_3/\phi_1}$, the estimators $\hat r(\mathbb{X}_n) = R_+(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ for the unknown characteristic index $r$, $\hat\nu(\mathbb{X}_n) = V_+(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ for the unknown shape parameter $\nu$, and $\hat\delta(\mathbb{X}_n) = D_+(K_1(\mathbb{X}_n), K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ for the unknown scale parameter $\delta$ have the property of asymptotic normality when $n \to \infty$:
$$\sqrt{n}\, \frac{\hat r(\mathbb{X}_n) - r}{\sqrt{dR_+ \Sigma\, dR_+^T}} \Longrightarrow N(0, 1), \quad \sqrt{n}\, \frac{\hat\nu(\mathbb{X}_n) - \nu}{\sqrt{dV_+ \Sigma\, dV_+^T}} \Longrightarrow N(0, 1), \quad \sqrt{n}\, \frac{\hat\delta(\mathbb{X}_n) - \delta}{\sqrt{dD_+ \Sigma\, dD_+^T}} \Longrightarrow N(0, 1). \tag{13}$$
Remark 1.
In addition to the property of asymptotic normality, the estimators listed in Theorem 1 have the property of strong consistency [16].
Remark 2.
In Theorem 1, if $0 \le r < \sqrt{\phi_3/\phi_1}$, then one should choose the statistics $\hat r(\mathbb{X}_n) = R_-(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$, $\hat\nu(\mathbb{X}_n) = V_-(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$, and $\hat\delta(\mathbb{X}_n) = D_-(K_1(\mathbb{X}_n), K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ with a corresponding modification of the normalizing constants in (13) [17].
Remark 3.
In Theorem 1, if $\nu < 0$, then one should choose as an estimator for the unknown parameter $\nu$ the statistics $\hat\nu(\mathbb{X}_n) = -V_+(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ if $r > \sqrt{\phi_3/\phi_1}$, and $\hat\nu(\mathbb{X}_n) = -V_-(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ if $0 \le r < \sqrt{\phi_3/\phi_1}$.
Remark 4.
Since the gamma-exponential distribution and the digamma distribution have the Mellin transform of the same type (1), the results of Theorem 1 and Remark 1 remain valid for all $r \ge 0$. In the case when $r < 0$, one should consider as an estimator for the parameter $r$ the statistics $\hat r(\mathbb{X}_n) = -R_+(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ for $r < -\sqrt{\phi_3/\phi_1}$ and $\hat r(\mathbb{X}_n) = -R_-(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ for $-\sqrt{\phi_3/\phi_1} < r \le 0$.
Remark 5.
When processing real data, one should first choose one of the statistics $\pm R_\pm(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ and $\pm V_\pm(K_2(\mathbb{X}_n), K_4(\mathbb{X}_n))$ as the estimators $\hat r(\mathbb{X}_n)$ and $\hat\nu(\mathbb{X}_n)$, using the algorithm for eliminating unnecessary solutions described in Ref. [17]. The estimator for the unknown parameter $\delta$ is always defined by the formula
$$\hat\delta(\mathbb{X}_n) = \exp\left\{ \psi(q)\, K_1(\mathbb{X}_n) + \frac{\psi(q)\, \hat r(\mathbb{X}_n) - \psi(p)}{\hat\nu(\mathbb{X}_n)} \right\}.$$
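The sign conventions in Theorem 1 and Remarks 2–4 can be traced back to the cumulant equations for $m = 2$ and $m = 4$. The derivation below is a reader's sketch under the definitions of this section, not a reproduction of the argument in Ref. [17]; the shorthand $x = r^2$, $y = \nu^{-2}$ is introduced here only for illustration, and $k_2^2 - k_4 > 0$ is assumed.

```latex
% Normalized theoretical cumulants in terms of x = r^2 and y = 1/nu^2
k_2 = \frac{\kappa_2(r,\nu)}{\psi'(q)} = (\phi_1 + x)\,y, \qquad
k_4 = \frac{\kappa_4(r,\nu)}{\psi'''(q)} = (\phi_3 + x^2)\,y^2 .

% Eliminating y = k_2 / (\phi_1 + x) gives a quadratic equation in x
(k_2^2 - k_4)\,x^2 - 2\phi_1 k_4\,x - (\phi_1^2 k_4 - \phi_3 k_2^2) = 0,
\qquad
x = \frac{\phi_1 k_4 \pm k_2\sqrt{\phi_1^2 k_4 + \phi_3\,(k_4 - k_2^2)}}{k_2^2 - k_4}.

% At the exact cumulants the radical simplifies
\tau = \sqrt{\phi_1^2 k_4 + \phi_3\,(k_4 - k_2^2)} = y\,\lvert \phi_1 x - \phi_3 \rvert ,
\qquad
\frac{\phi_1 k_4 + k_2\, y\,(\phi_1 x - \phi_3)}{k_2^2 - k_4} = x .
```

Hence the root with the plus sign returns $r^2$ when $r^2 > \phi_3/\phi_1$, and the root with the minus sign returns $r^2$ when $r^2 < \phi_3/\phi_1$, which matches the thresholds $\pm\sqrt{\phi_3/\phi_1}$ appearing in the remarks.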

3. Main Result

Everywhere below we will assume that the sample size is random. To obtain asymptotic approximations, it is reasonable to consider a situation in which the random size of the sample increases in some sense. We will consider a sequence $N_n$ such that $N_n \to \infty$ in probability as $n \to \infty$.
Let the non-random size sample $\mathbb{X}_n = (X_1, \ldots, X_n)$ and the random size sample $\mathbb{X}_{N_n} = (X_1, \ldots, X_{N_n})$ be from the digamma distribution $\mathrm{DiG}(r, \nu, p, q, \delta)$ with the known concentration parameters $p$ and $q$.
Using the functions (7) and (8), we construct the statistics
$$K_2(\mathbb{X}_{N_n}) \equiv K_2(L_1(\mathbb{X}_{N_n}), L_2(\mathbb{X}_{N_n}));$$
$$K_4(\mathbb{X}_{N_n}) \equiv K_4(L_1(\mathbb{X}_{N_n}), L_2(\mathbb{X}_{N_n}), L_3(\mathbb{X}_{N_n}), L_4(\mathbb{X}_{N_n})),$$
based on sample logarithmic moments
$$L_m(\mathbb{X}_{N_n}) = \frac{1}{N_n} \sum_{i=1}^{N_n} \ln^m X_i.$$
Let $N_n$ be a sequence of natural-valued random variables independent of $X_1, X_2, \ldots$ for each $n$, and let $N_n$ tend toward infinity in probability as $n \to \infty$.
The following statement holds.
Theorem 2.
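Let $r > \sqrt{\phi_3/\phi_1}$ and $\nu > 0$. Suppose that the concentration parameters $p$ and $q$ of the digamma distribution $\mathrm{DiG}(r, \nu, p, q, \delta)$ are fixed. Assume that there exists a numerical sequence $\{b_n > 0\}$ and a random variable $U$ such that
$$\frac{N_n}{b_n} \Longrightarrow U \tag{14}$$
when $n \to \infty$. Then, the estimators $\hat r(\mathbb{X}_{N_n}) = R_+(K_2(\mathbb{X}_{N_n}), K_4(\mathbb{X}_{N_n}))$ for the unknown characteristic index $r$, $\hat\nu(\mathbb{X}_{N_n}) = V_+(K_2(\mathbb{X}_{N_n}), K_4(\mathbb{X}_{N_n}))$ for the unknown shape parameter $\nu$, and $\hat\delta(\mathbb{X}_{N_n}) = D_+(K_1(\mathbb{X}_{N_n}), K_2(\mathbb{X}_{N_n}), K_4(\mathbb{X}_{N_n}))$ for the unknown scale parameter $\delta$ converge in distribution when $n \to \infty$:
$$\sqrt{b_n}\, \frac{\hat r(\mathbb{X}_{N_n}) - r}{\sqrt{dR_+ \Sigma\, dR_+^T}} \Longrightarrow \frac{Y}{\sqrt{U}}, \quad \sqrt{b_n}\, \frac{\hat\nu(\mathbb{X}_{N_n}) - \nu}{\sqrt{dV_+ \Sigma\, dV_+^T}} \Longrightarrow \frac{Y}{\sqrt{U}}, \quad \sqrt{b_n}\, \frac{\hat\delta(\mathbb{X}_{N_n}) - \delta}{\sqrt{dD_+ \Sigma\, dD_+^T}} \Longrightarrow \frac{Y}{\sqrt{U}}, \tag{15}$$
where $Y$ has a standard normal distribution, and $U$ can be considered independent of $Y$.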
Proof of Theorem 2.
We consider the statement of the theorem for estimating the characteristic index r. The argument is based on the method proposed in Ref. [26].
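Denote
$$a_n = \sqrt{\frac{dR_+ \Sigma\, dR_+^T}{b_n}}, \quad c_n = \sqrt{\frac{dR_+ \Sigma\, dR_+^T}{n}}.$$
Let $h_n(t)$ be the characteristic function of the random variable
$$Y_n \equiv \sqrt{n}\, \frac{\hat r(\mathbb{X}_n) - r}{\sqrt{dR_+ \Sigma\, dR_+^T}} \equiv \frac{\hat r(\mathbb{X}_n) - r}{c_n},$$
and $f_n(t)$ be the characteristic function of
$$Z_n \equiv \sqrt{b_n}\, \frac{\hat r(\mathbb{X}_{N_n}) - r}{\sqrt{dR_+ \Sigma\, dR_+^T}} \equiv \frac{\hat r(\mathbb{X}_{N_n}) - r}{a_n}.$$
Theorem 1 implies that when $n \to \infty$
$$Y_n \Longrightarrow Y \sim N(0, 1).$$
Denote by $h(t)$ the characteristic function of a standard normal random variable $Y$. Define the random variables
$$U_n \equiv \frac{c_{N_n}}{a_n}.$$
Let
$$g_n(t) = \mathsf{E}\, h(t U_n).$$
Let us show that for any $t \in \mathbb{R}$
$$\lim_{n \to \infty} | f_n(t) - g_n(t) | = 0.$$
For some positive number $\gamma$ and positive integer $m$, we define
$$K_{1,n} \equiv K_{1,n}(\gamma) = \{ m \mid c_m \le \gamma a_n \}, \quad K_{2,n} \equiv K_{2,n}(\gamma) = \{ m \mid c_m > \gamma a_n \}.$$
For $t = 0$, the statement is obvious. Fix an arbitrary $t \neq 0$. Then,
$$\begin{aligned}
| f_n(t) - g_n(t) | &= \left| \mathsf{E} \exp\{ i t Z_n \} - \mathsf{E}\, h(t U_n) \right| \\
&= \left| \sum_{m=1}^{\infty} \mathsf{P}(N_n = m) \left[ \mathsf{E} \exp\left\{ i t\, \frac{\hat r(\mathbb{X}_m) - r}{a_n} \right\} - h\!\left( \frac{t c_m}{a_n} \right) \right] \right| \\
&= \left| \sum_{m=1}^{\infty} \mathsf{P}(N_n = m) \left[ \mathsf{E} \exp\left\{ i t\, \frac{c_m}{a_n} \cdot \frac{\hat r(\mathbb{X}_m) - r}{c_m} \right\} - h\!\left( \frac{t c_m}{a_n} \right) \right] \right| \\
&= \left| \sum_{m=1}^{\infty} \mathsf{P}(N_n = m) \left[ h_m\!\left( \frac{t c_m}{a_n} \right) - h\!\left( \frac{t c_m}{a_n} \right) \right] \right| \\
&\le \sum_{m \in K_{1,n}} \mathsf{P}(N_n = m) \left| h_m\!\left( \frac{t c_m}{a_n} \right) - h\!\left( \frac{t c_m}{a_n} \right) \right| + \sum_{m \in K_{2,n}} \mathsf{P}(N_n = m) \left| h_m\!\left( \frac{t c_m}{a_n} \right) - h\!\left( \frac{t c_m}{a_n} \right) \right| \equiv I_1 + I_2.
\end{aligned}$$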
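Fix an arbitrary $\epsilon > 0$. Consider $I_2$.
$$I_2 = \sum_{m \in K_{2,n}} \mathsf{P}(N_n = m) \left| h_m\!\left( \frac{t c_m}{a_n} \right) - h\!\left( \frac{t c_m}{a_n} \right) \right| \le 2 \sum_{m \in K_{2,n}} \mathsf{P}(N_n = m) = 2\, \mathsf{P}(U_n > \gamma) < \epsilon / 2$$
for all $\gamma > \gamma_2(\epsilon)$, due to the convergence $U_n \Longrightarrow 1/\sqrt{U}$.
Now, consider $I_1$. Let $\gamma > \gamma_2(\epsilon)$. Since $| t c_m / a_n | \le |t| \gamma$ for $m \in K_{1,n}$,
$$I_1 = \sum_{m \in K_{1,n}} \mathsf{P}(N_n = m) \left| h_m\!\left( \frac{t c_m}{a_n} \right) - h\!\left( \frac{t c_m}{a_n} \right) \right| \le \sum_{m=1}^{\infty} \mathsf{P}(N_n = m) \sup_{|\tau| \le \gamma |t|} | h_m(\tau) - h(\tau) | = \mathsf{E} \sup_{|\tau| \le \gamma |t|} | h_{N_n}(\tau) - h(\tau) |.$$
Due to the uniform convergence of the sequence of characteristic functions $h_n(t)$ to $h(t)$ on any finite interval and the convergence $N_n \to \infty$ in probability,
$$\mathsf{E} \sup_{|\tau| \le \gamma |t|} | h_{N_n}(\tau) - h(\tau) | < \epsilon / 2$$
starting from some $n$.
Since $I_1 + I_2 < \epsilon$ starting from some $n$, we conclude that for any $t$
$$\lim_{n \to \infty} | f_n(t) - g_n(t) | = 0.$$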
Note that the function
$$\phi_t(x) = h(t x)$$
is bounded and continuous. Therefore, the weak convergence condition $U_n \Longrightarrow 1/\sqrt{U}$ implies
$$\lim_{n \to \infty} \mathsf{E}\, \phi_t(U_n) = \mathsf{E}\, \phi_t(1/\sqrt{U}) = \mathsf{E}\, h(t / \sqrt{U}).$$
By the Fubini theorem, the right-hand side of the last equality is the characteristic function of the random variable $Y / \sqrt{U}$ for a copy of the standard normal random variable $Y$ independent of $U$.
Since
$$| f_n(t) - \mathsf{E}\, h(t / \sqrt{U}) | \le | f_n(t) - g_n(t) | + | g_n(t) - \mathsf{E}\, h(t / \sqrt{U}) | < 2 \epsilon$$
for all $\epsilon > 0$ starting from some $n$,
$$\lim_{n \to \infty} f_n(t) = \mathsf{E}\, h(t / \sqrt{U}),$$
which completes the proof of the theorem for the estimator of the characteristic index $r$.
The statements of the theorem for the estimators of the shape parameter $\nu$ and the scale parameter $\delta$ are proved in a completely similar way. The theorem is proved. □
Remark 6.
Similarly to Remarks 2–5, the statement of Theorem 2 remains valid in the cases $r < -\sqrt{\phi_3/\phi_1}$, $-\sqrt{\phi_3/\phi_1} < r \le 0$, $0 \le r < \sqrt{\phi_3/\phi_1}$, and $\nu < 0$ for the estimators $\hat r(\mathbb{X}_{N_n}) = \pm R_\pm(K_2(\mathbb{X}_{N_n}), K_4(\mathbb{X}_{N_n}))$ and $\hat\nu(\mathbb{X}_{N_n}) = \pm V_\pm(K_2(\mathbb{X}_{N_n}), K_4(\mathbb{X}_{N_n}))$ with the corresponding modification of the normalizing constants in (15). The choice of the “correct” signs of the estimators is carried out using the algorithm for eliminating unnecessary solutions from Ref. [17]. The estimator for the unknown parameter $\delta$ is always defined by the formula
$$\hat\delta(\mathbb{X}_{N_n}) = \exp\left\{ \psi(q)\, K_1(\mathbb{X}_{N_n}) + \frac{\psi(q)\, \hat r(\mathbb{X}_{N_n}) - \psi(p)}{\hat\nu(\mathbb{X}_{N_n})} \right\}.$$
Let us introduce additional notation
$$\begin{aligned}
s_{mm}(\mathbb{X}_{N_n}) &\equiv \sigma_m^2(\hat r(\mathbb{X}_{N_n}), \hat\nu(\mathbb{X}_{N_n}), \hat\delta(\mathbb{X}_{N_n})); \\
s_{ml}(\mathbb{X}_{N_n}) &= s_{lm}(\mathbb{X}_{N_n}) \equiv \sigma_{ml}(\hat r(\mathbb{X}_{N_n}), \hat\nu(\mathbb{X}_{N_n}), \hat\delta(\mathbb{X}_{N_n})); \\
d_r^{[m]}(\mathbb{X}_{N_n}) &\equiv \frac{\partial \hat r(\mathbb{X}_{N_n})}{\partial l_m}; \quad d_\nu^{[m]}(\mathbb{X}_{N_n}) \equiv \frac{\partial \hat\nu(\mathbb{X}_{N_n})}{\partial l_m}; \quad d_\delta^{[m]}(\mathbb{X}_{N_n}) \equiv \frac{\partial \hat\delta(\mathbb{X}_{N_n})}{\partial l_m},
\end{aligned} \tag{16}$$
where $\sigma_m^2(r, \nu, \delta)$ and $\sigma_{ml}(r, \nu, \delta)$ are defined in (6) and $\hat r(\mathbb{X}_{N_n})$, $\hat\nu(\mathbb{X}_{N_n})$, and $\hat\delta(\mathbb{X}_{N_n})$ satisfy the conditions of Theorem 2.
Theorem 2 implies a statement about the form of the asymptotic confidence intervals for unknown parameters of the digamma distribution. Denote by $u_\gamma$ the $(1 + \gamma)/2$-quantile of the limiting random variable $Y / \sqrt{U}$.
Corollary 1.
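Suppose that the conditions of Theorem 2 are met; then the asymptotic confidence intervals with a confidence level $\gamma$ based on the estimators $\hat r(\mathbb{X}_{N_n})$, $\hat\nu(\mathbb{X}_{N_n})$, and $\hat\delta(\mathbb{X}_{N_n})$ for the unknown parameters $r$, $\nu$, and $\delta$ have the form
$$(A_r(\mathbb{X}_{N_n}), B_r(\mathbb{X}_{N_n})) = \left( \hat r(\mathbb{X}_{N_n}) - \frac{u_\gamma}{\sqrt{n}}\, C_r(\mathbb{X}_{N_n}),\ \hat r(\mathbb{X}_{N_n}) + \frac{u_\gamma}{\sqrt{n}}\, C_r(\mathbb{X}_{N_n}) \right);$$
$$(A_\nu(\mathbb{X}_{N_n}), B_\nu(\mathbb{X}_{N_n})) = \left( \hat\nu(\mathbb{X}_{N_n}) - \frac{u_\gamma}{\sqrt{n}}\, C_\nu(\mathbb{X}_{N_n}),\ \hat\nu(\mathbb{X}_{N_n}) + \frac{u_\gamma}{\sqrt{n}}\, C_\nu(\mathbb{X}_{N_n}) \right);$$
$$(A_\delta(\mathbb{X}_{N_n}), B_\delta(\mathbb{X}_{N_n})) = \left( \hat\delta(\mathbb{X}_{N_n}) - \frac{u_\gamma}{\sqrt{n}}\, C_\delta(\mathbb{X}_{N_n}),\ \hat\delta(\mathbb{X}_{N_n}) + \frac{u_\gamma}{\sqrt{n}}\, C_\delta(\mathbb{X}_{N_n}) \right),$$
where
$$C_r(\mathbb{X}_{N_n}) = \sqrt{ \sum_{m=1}^{4} \sum_{l=1}^{4} d_r^{[m]}(\mathbb{X}_{N_n})\, s_{ml}(\mathbb{X}_{N_n})\, d_r^{[l]}(\mathbb{X}_{N_n}) };$$
$$C_\nu(\mathbb{X}_{N_n}) = \sqrt{ \sum_{m=1}^{4} \sum_{l=1}^{4} d_\nu^{[m]}(\mathbb{X}_{N_n})\, s_{ml}(\mathbb{X}_{N_n})\, d_\nu^{[l]}(\mathbb{X}_{N_n}) };$$
$$C_\delta(\mathbb{X}_{N_n}) = \sqrt{ \sum_{m=1}^{4} \sum_{l=1}^{4} d_\delta^{[m]}(\mathbb{X}_{N_n})\, s_{ml}(\mathbb{X}_{N_n})\, d_\delta^{[l]}(\mathbb{X}_{N_n}) },$$
and $s_{ml}(\mathbb{X}_{N_n})$, $d_r^{[m]}(\mathbb{X}_{N_n})$, $d_\nu^{[m]}(\mathbb{X}_{N_n})$, $d_\delta^{[m]}(\mathbb{X}_{N_n})$ are defined in (16).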
The proof is completely analogous to the proof of Corollary 2 from Ref. [17].
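To see how much the quantile $u_\gamma$ can differ from the familiar normal one, the sketch below computes the interval half-width for two limit laws. It assumes that the interval has the form estimate $\mp u_\gamma C / \sqrt{n}$, with $C$ the estimated standard-deviation factor from the corollary. The Student case relies on the standard fact that $Y/\sqrt{U}$ is Student with two degrees of freedom when $U$ is standard exponential, whose quantile function has a closed form. Function names and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, c_hat, n, u_gamma):
    """Symmetric interval: estimate -/+ u_gamma * c_hat / sqrt(n)."""
    half_width = u_gamma * c_hat / math.sqrt(n)
    return estimate - half_width, estimate + half_width


def normal_quantile(gamma):
    """u_gamma when U is degenerate at 1, so that Y / sqrt(U) is standard normal."""
    return NormalDist().inv_cdf((1 + gamma) / 2)


def student2_quantile(gamma):
    """u_gamma when U ~ Gamma(1, 1): Y / sqrt(U) is Student with 2 d.f."""
    return gamma * math.sqrt(2.0 / (1.0 - gamma ** 2))


# Hypothetical inputs: estimate 0.5, C = 2.0, n = 100, confidence level 0.95
narrow = confidence_interval(0.5, 2.0, 100, normal_quantile(0.95))
wide = confidence_interval(0.5, 2.0, 100, student2_quantile(0.95))
```

At the same confidence level the Student-based interval is roughly twice as wide as the normal-based one, which reflects the heavier tails of the limit law.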

4. Examples of Limit Distributions

Let us give a number of examples of possible limit distributions in Theorem 2.
As noted in Section 1, special forms of the negative binomial distribution have gained great popularity in modeling a random number of events. Since the negative binomial distribution is concentrated on non-negative integers, it cannot be directly used as a random sample size. We will consider such distributions with a shift by one, which will ensure the natural value of the sample size. According to the generalized Slutsky theorem, all conclusions concerning the asymptotic behavior of “shifted” distributions are equivalent to the statements about the asymptotics for sequences of random variables that have a classical negative binomial distribution or a mixed Poisson distribution with a structural gamma distribution.
Note that the gamma distribution belongs to the class of distributions with a scale parameter. It means that if $\Lambda \sim \Gamma(s, \theta)$, then $\hat\Lambda \overset{d}{=} \Lambda / \theta \sim \Gamma(s, 1)$. The following statements are based on the fact that for a standard Poisson process $N_1(t)$ independent of the random variable $\Lambda$,
$$\frac{N_1(\Lambda n)}{\theta n} \Longrightarrow \hat\Lambda, \quad n \to \infty.$$
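This convergence can be observed in a small simulation. The sketch below generates a sample size of the form $1 + N_1(\Lambda n)$ (the shift by one keeps the size a natural number) by counting unit-rate exponential inter-arrival times; the function names and the values $s = 2$, $\theta = 1$, $n = 50$ are assumptions made for the example.

```python
import random


def poisson_process_count(t, rng):
    """N_1(t): number of arrivals of a unit-rate Poisson process in [0, t]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= t:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def random_sample_size(n, s, theta, rng):
    """Shifted mixed Poisson size: 1 + N_1(Lambda * n), Lambda ~ Gamma(s, theta)."""
    lam = rng.gammavariate(s, theta)  # shape s, scale theta
    return 1 + poisson_process_count(lam * n, rng)


rng = random.Random(3)
n, s, theta = 50, 2.0, 1.0
ratios = [random_sample_size(n, s, theta, rng) / (theta * n) for _ in range(2000)]
average_ratio = sum(ratios) / len(ratios)  # should be near E[Gamma(s, 1)] = s
```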
Note also that if a random variable $\xi$ has a generalized gamma distribution $GG(v, s, \theta)$ with the density (4), then
$$\frac{1}{\sqrt{\xi}} \sim GG\left( -2v, s, \frac{1}{\sqrt{\theta}} \right).$$
Denote by $\Pi(\Lambda)$ the mixed Poisson distribution whose structure is given by the random variable $\Lambda$. To specify particular cases of Theorem 2, we consider the distribution $D(\theta)$ degenerate at the point $\theta$, the gamma distribution $\Gamma(s, \theta)$, the exponential distribution $E(\theta) \equiv \Gamma(1, \theta)$, and the scaled $\chi^2$-distribution $\chi^2(k, \theta) \equiv \Gamma(k/2, \theta)$, $k \in \mathbb{N}$, as the structural one. To determine the corresponding mixed Poisson distributions, consider the negative binomial distribution $NB(p, 1/(1 + \theta))$ whose partial probabilities are given by (3), and the geometric distribution $G(1/(1 + \theta)) \equiv NB(1, 1/(1 + \theta))$. To determine the limit distributions, consider the type VII Pearson distribution $P_7(m, \alpha)$, $m > 1/2$, $\alpha > 0$, with the density
$$f_{P_7}(x) = \frac{\alpha^{2m - 1}}{B(m - 1/2, 1/2)\, (\alpha^2 + x^2)^m};$$
the Student distribution $St(n) \equiv P_7((n + 1)/2, \sqrt{n})$; and the Cauchy distribution $K(\alpha) \equiv P_7(1, \alpha)$.
For $b_n = \theta n$, let us list several examples of limit distributions of the random variable $Y / \sqrt{U}$ from (15).
Let $N_n - 1 \overset{d}{=} N_1(\Lambda n) \sim \Pi(\Lambda n)$. Then, the limit random variable $U$ in (14) coincides in distribution with $\hat\Lambda$, and the distributions of the random variable $Y / \sqrt{U}$ have the form shown in Table 1.
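The limit variable $Y/\sqrt{U}$ is simple to simulate when $U$ is gamma distributed. The sketch below draws from this scale mixture for $U \sim \Gamma(s, 1)$ and, for $s = 1$, compares the empirical probability of $|Y/\sqrt{U}| \le 1$ with the value $1/\sqrt{3} \approx 0.577$ of the Student distribution with two degrees of freedom (for the standard normal law the same probability is about 0.683). The function name and the simulation size are assumptions of the example, which is not meant to reproduce Table 1.

```python
import random


def sample_limit_law(size, s, rng):
    """Draw Y / sqrt(U) with independent Y ~ N(0, 1) and U ~ Gamma(s, 1)."""
    return [rng.gauss(0.0, 1.0) / rng.gammavariate(s, 1.0) ** 0.5
            for _ in range(size)]


rng = random.Random(7)
draws = sample_limit_law(100_000, s=1.0, rng=rng)

# Empirical P(|Y / sqrt(U)| <= 1); for s = 1 the exact value is 1 / sqrt(3)
share_within_one = sum(abs(v) <= 1.0 for v in draws) / len(draws)
```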
Let us give some numerical examples of calculating the estimates of the parameters $r$, $\nu$, and $\delta$ of the digamma distribution $\mathrm{DiG}(r, \nu, p, q, \delta)$ from the model samples. The concentration parameters $p$ and $q$ are fixed. The data given in Table 2 are obtained using the algorithm described in Ref. [17].
The pseudorandom sample size $N_n$ for each $n$ is generated for the distributions $\Pi(\Lambda n)$ from Table 1. The simulation of pseudorandom samples from the digamma distribution is based on Relation (2).
Table 2 lists the values of the estimates $\hat r(\mathbb{X}_{N_n})$, $\hat\nu(\mathbb{X}_{N_n})$, and $\hat\delta(\mathbb{X}_{N_n})$ of the parameters $r$, $\nu$, and $\delta$, obtained by simulating a sample from the digamma distribution $\mathrm{DiG}(0.5; 2.5; 2.4; 1.9; 1.0)$, and the corresponding boundaries of the confidence intervals. The distributions of the random sample size are taken from Table 1 with $\theta = 1$ and $s = 2$.

5. Conclusions

This paper has considered the problem of estimating the parameters of the digamma distribution with a random sample size. The consideration of a random sample size is very important since the accumulation of a sufficient fixed amount of statistical data can often take an indefinite amount of time, and, sometimes, it is impossible, in principle. Therefore, it becomes natural to study the asymptotic behavior of statistics based on random size samples.
The digamma distribution is a generalization of popular distributions from the gamma and beta classes, as well as the gamma-exponential distribution. This paper has discussed a method for estimating unknown parameters of the digamma distribution based on the logarithmic cumulants. Assuming that the sample size is random, the weak convergence of the studied estimators to the scale mixtures of the normal law is proved. This result allows for the construction of asymptotic confidence intervals for the estimated parameters. It is shown that the asymptotic properties of the statistics can change radically when passing from a fixed sample size to a random one. In particular, it leads to heavier tails of the limit distribution. For example, the type VII Pearson distribution may appear to be a limiting distribution whose representatives may not have a mathematical expectation.
The results proposed in this paper concern the estimation of the characteristic index and the shape and scale parameters of the digamma distribution assuming that the concentration parameters are known. Naturally, the question arises about the form of statistical estimates in the case in which all five parameters are unknown. The equations for constructing the estimates contain polygamma functions with arguments depending on the concentration parameters. Theoretical methods for inverting polygamma functions are being actively developed at the present time, but, apparently, there are currently no effective tools suitable for use in the method under consideration. At the same time, polygamma functions have nice properties that make their inversion easy using numerical methods. The authors plan to continue their studies in this direction.

Author Contributions

Conceptualization, A.K. and O.S.; methodology, A.K. and O.S.; formal analysis, A.K. and O.S.; investigation, A.K. and O.S.; writing—original draft preparation, A.K. and O.S.; writing—review and editing, A.K. and O.S.; supervision, A.K. and O.S.; funding acquisition, O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Russian Science Foundation, project no. 22-11-00212.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Feng, M.; Qu, H.; Yi, Z.; Kurths, J. Subnormal Distribution Derived From Evolving Networks with Variable Elements. IEEE Trans. Cybern. 2018, 48, 2556–2568. [Google Scholar]
  2. Iriarte, Y.A.; Varela, H.; Gómez, H.J.; Gómez, H.W. A Gamma-Type Distribution with Applications. Symmetry 2020, 12, 870. [Google Scholar] [CrossRef]
  3. Feng, M.; Deng, L.-J.; Chen, F.; Perc, M.; Kurths, J. The accumulative law and its probability model: An extension of the Pareto distribution and the log-normal distribution. Proc. R. Soc. 2020, 476, 20200019. [Google Scholar] [CrossRef] [PubMed]
  4. Barranco-Chamorro, I.; Iriarte, Y.A.; Gómez, Y.M.; Astorga, J.M.; Gómez, H.W. A Generalized Rayleigh Family of Distributions Based on the Modified Slash Model. Symmetry 2021, 13, 1226. [Google Scholar] [CrossRef]
  5. López-Rodríguez, F.; García-Sanz-Calcedo, J.; Moral-García, F.J.; García-Conde, A.J. Statistical Study of Rainfall Control: The Dagum Distribution and Applicability to the Southwest of Spain. Water 2019, 11, 453. [Google Scholar] [CrossRef] [Green Version]
  6. Santoro, K.I.; Gómez, H.J.; Barranco-Chamorro, I.; Gómez, H.W. Extended Half-Power Exponential Distribution with Applications to COVID-19 Data. Mathematics 2022, 10, 942. [Google Scholar] [CrossRef]
  7. Kudryavtsev, A.A.; Nedolivko, Y.N.; Shestakov, O.V. Main Probabilistic Characteristics of the Digamma Distribution and the Method of Estimating Its Parameters. Moscow Univ. Comput. Math. Cybern. 2022, 46, 79–86. [Google Scholar] [CrossRef]
  8. Amoroso, L. Ricerche intorno alla curva dei redditi. Ann. Mat. Pura Appl. 1925, 21, 123–159. [Google Scholar] [CrossRef]
  9. McDonald, J.B. Some Generalized Functions for the Size Distribution of Income. Econometrica 1984, 52, 647–665. [Google Scholar] [CrossRef]
  10. Kudryavtsev, A.A. On the representation of gamma-exponential and generalized negative binomial distributions. Inform. Appl. 2019, 13, 78–82. [Google Scholar]
  11. Kudryavtsev, A.A.; Shestakov, O.V. Digamma Distribution as a Limit for the Integral Balance Index. Moscow Univ. Comput. Math. Cybern. 2022, 46, 133–139. [Google Scholar] [CrossRef]
  12. Kudryavtsev, A.A. Bayesian balance models. Inform. Appl. 2018, 12, 18–27. [Google Scholar]
  13. Combes, C.; Ng, H.K.T. On parameter estimation for Amoroso family of distributions. Math. Comp. Sim. 2021, 191, 309–327. [Google Scholar] [CrossRef]
  14. Liu, S.; Gui, W. Estimating the Parameters of the Two-Parameter Rayleigh Distribution Based on Adaptive Type II Progressive Hybrid Censored Data with Competing Risks. Mathematics 2020, 8, 1783. [Google Scholar] [CrossRef]
  15. Kudryavtsev, A.; Shestakov, O. Asymptotically Normal Estimators for the Parameters of the Gamma-Exponential Distribution. Mathematics 2021, 9, 273. [Google Scholar] [CrossRef]
  16. Kudryavtsev, A.A.; Shestakov, O.V. A Method for Estimating Bent, Shape and Scale Parameters of the Gamma-Exponential Distribution. Inform. Appl. 2021, 15, 57–62. [Google Scholar]
  17. Kudryavtsev, A.; Shestakov, O. The Estimators of the Bent, Shape and Scale Parameters of the Gamma-Exponential Distribution and their Asymptotic Normality. Mathematics 2022, 10, 619. [Google Scholar] [CrossRef]
  18. Korolev, V.Y. Product representations for random variables with Weibull distributions and their applications. J. Math. Sci. 2016, 218, 298–313. [Google Scholar] [CrossRef]
  19. Gnedenko, B.V. On the estimation of unknown distribution parameters with a random number of independent observations. Tr. Tbilis. Mat. Inst. 1989, 92, 146–150. [Google Scholar]
  20. Korolev, V.Y.; Zeifman, A.I. Generalized negative binomial distributions as mixed geometric laws and related limit theorems. Lith. Math. J. 2019, 59, 366–388. [Google Scholar] [CrossRef] [Green Version]
  21. Wang, X.; Zhao, X.; Sun, J. A compound negative binomial distribution with mutative termination conditions based on a change point. J. Comput. Appl. Math. 2019, 351, 237–249. [Google Scholar] [CrossRef]
  22. Bhati, D.; Ahmed, I.S. On uniform-negative binomial distribution including Gauss hypergeometric function and its application in count regression modeling. Commun. Stat. Theory Methods 2021, 50, 3106–3122. [Google Scholar] [CrossRef]
  23. Zhang, J.; Wang, D.; Yang, K. A study of RCINAR(1) process with generalized negative binomial marginals. Commun. Stat. B Simul. Comput. 2020, 49, 1487–1510. [Google Scholar] [CrossRef]
  24. Mangiola, S.; Thomas, E.A.; Modrák, M.; Vehtari, A.; Papenfuss, A.T. Probabilistic outlier identification for RNA sequencing generalized linear models. NAR Genom. Bioinform. 2021, 3, lqab005. [Google Scholar] [CrossRef]
  25. Kendall, M.G.; Stuart, A. The Advanced Theory of Statistics, 3rd ed.; Griffin: London, UK, 1969; Volume 1. [Google Scholar]
  26. Korolev, V.Y.; Zeifman, A.I. On Convergence of the Distributions of Random Sequences with Independent Random Indexes to Variance–Mean Mixtures. Stoch. Model. 2016, 32, 414–432. [Google Scholar] [CrossRef] [Green Version]
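Table 1. Special cases of the limit distribution.

| $\Lambda$ | $\Pi(\Lambda n)$ | $Y/\sqrt{U}$ |
|---|---|---|
| $D(\theta)$ | $\Pi(\theta n)$ | $N(0, 1)$ |
| $E(\theta)$ | $G\left(\frac{1}{1+\theta n}\right)$ | $St(2)$ |
| $\chi^2(1, \theta)$ | $NB\left(\frac{1}{2}, \frac{1}{1+\theta n}\right)$ | $K(\sqrt{2})$ |
| $\Gamma(s, \theta)$ | $NB\left(s, \frac{1}{1+\theta n}\right)$ | $P7\left(s+\frac{1}{2}, \sqrt{2}\right)$ |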
Table 2. Examples of parameter estimates and boundaries of confidence intervals for a model distribution for $r = 0.5$, $\nu = 2.5$, and $\delta = 1.0$.

| $N_n - 1$ | $\hat{r}(X_{N_n})$ | $A_r(X_{N_n})$ | $B_r(X_{N_n})$ | $\hat{\nu}(X_{N_n})$ | $A_\nu(X_{N_n})$ | $B_\nu(X_{N_n})$ | $\hat{\delta}(X_{N_n})$ | $A_\delta(X_{N_n})$ | $B_\delta(X_{N_n})$ |
|---|---|---|---|---|---|---|---|---|---|
| $\Pi(10^4)$ | 0.5754 | 0.0458 | 1.1051 | 2.5877 | 1.8475 | 3.3278 | 1.0159 | 0.8915 | 1.1403 |
| $\Pi(10^5)$ | 0.4693 | 0.3526 | 0.5859 | 2.4633 | 2.3205 | 2.6061 | 0.9912 | 0.9630 | 1.0195 |
| $\Pi(10^6)$ | 0.5032 | 0.4631 | 0.5433 | 2.5039 | 2.4525 | 2.5554 | 1.0005 | 0.9909 | 1.0101 |
| $G\left(\frac{1}{1+10^4}\right)$ | 0.4073 | −0.2401 | 1.0549 | 2.3678 | 1.6620 | 3.0735 | 0.9767 | 0.8180 | 1.1355 |
| $G\left(\frac{1}{1+10^5}\right)$ | 0.5613 | 0.1982 | 0.9243 | 2.5793 | 2.077 | 3.0808 | 1.0140 | 0.9284 | 1.0995 |
| $G\left(\frac{1}{1+10^6}\right)$ | 0.4942 | 0.4392 | 0.5493 | 2.4927 | 2.4229 | 2.5626 | 0.9985 | 0.9853 | 1.0118 |
| $NB\left(\frac{1}{2}, \frac{1}{1+10^4}\right)$ | 0.4575 | −2.9511 | 3.8662 | 2.4297 | −1.6338 | 6.4933 | 0.9884 | 0.1551 | 1.8217 |
| $NB\left(\frac{1}{2}, \frac{1}{1+10^5}\right)$ | 0.4137 | −0.7791 | 1.6065 | 2.4056 | 1.0728 | 3.7384 | 0.9780 | 0.6893 | 1.2668 |
| $NB\left(\frac{1}{2}, \frac{1}{1+10^6}\right)$ | 0.5107 | 0.2524 | 0.7690 | 2.5133 | 2.1781 | 2.8485 | 1.0024 | 0.9404 | 1.0644 |
| $NB\left(2, \frac{1}{1+10^4}\right)$ | 0.5700 | 0.2545 | 0.8855 | 2.5935 | 2.1529 | 3.0341 | 1.0148 | 0.9407 | 1.0888 |
| $NB\left(2, \frac{1}{1+10^5}\right)$ | 0.5317 | 0.3867 | 0.6767 | 2.5418 | 2.3483 | 2.7353 | 1.0079 | 0.9733 | 1.0424 |
| $NB\left(2, \frac{1}{1+10^6}\right)$ | 0.5052 | 0.4848 | 0.5256 | 2.5056 | 2.4793 | 2.5319 | 1.0009 | 0.9960 | 1.0058 |
Share and Cite

MDPI and ACS Style

Kudryavtsev, A.; Shestakov, O. Limit Distributions for the Estimates of the Digamma Distribution Parameters Constructed from a Random Size Sample. Mathematics 2023, 11, 1778. https://doi.org/10.3390/math11081778
