2.1. Model and Data
Suppose there are $N$ individuals in a population. Each individual is characterized by the number of captures, denoted by $Y$, a nonnegative integer-valued variable. A naive approach to modeling $Y$ is to assume a Poisson distribution, $Y \sim \mathrm{Poisson}(\lambda)$, where $\lambda$ represents the rate or expected number of captures. The Poisson model inherently assumes that all individuals in the population are homogeneous, meaning they share the same rate parameter $\lambda$. In addition, as noted in the introduction, a key limitation of the Poisson model is its inability to handle overdispersed data, where the variance exceeds the mean.
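The mean–variance diagnosis is easy to illustrate numerically. The following minimal R sketch (purely simulated, hypothetical data) contrasts homogeneous Poisson counts with gamma-mixed counts, whose sample variance far exceeds the sample mean:

```r
# Under a Poisson model the mean and variance coincide, so a sample variance
# far above the sample mean signals overdispersion. Illustrative values only.
set.seed(1)
y_pois <- rpois(1000, lambda = 2)                                     # homogeneous Poisson counts
y_mix  <- rpois(1000, lambda = rgamma(1000, shape = 0.5, scale = 4))  # gamma-mixed rates
c(mean(y_pois), var(y_pois))   # roughly equal
c(mean(y_mix),  var(y_mix))    # variance well above the mean
```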
To address the heterogeneity and overdispersion, one can assume that individuals have varying rates. In practice, a common approach is to model the distribution of these rates using a gamma distribution as a prior. Specifically, $\lambda \sim \mathrm{Gamma}(r, \theta)$, where $r > 0$ is the shape parameter and $\theta > 0$ is the scale parameter. With this prior, the marginal probability mass function of $Y$ can be derived in a closed form:
$$\Pr(Y = y) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{y} (1 - p)^{r}, \quad y = 0, 1, 2, \ldots,$$
where $p = \theta/(1 + \theta)$. This corresponds to the Poisson–gamma or negative binomial distribution found in probability textbooks. This distribution models the number of successes before the $r$th failure occurs in a sequence of independent Bernoulli trials, with $p$ representing the probability of success in each trial.
As highlighted in ref. [9], a common reparameterization of the negative binomial distribution is often used to interpret the counting processes in ecological and biodiversity studies. This reparameterization expresses the distribution in terms of its mean $\mu$ and a dispersion or aggregation parameter $k$, which controls the variation in counts. By setting $k = r$ and $\mu = r\theta$, so that $p = \mu/(\mu + k)$, the probability mass function can be reformulated as:
$$\Pr(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left(\frac{\mu}{\mu + k}\right)^{y} \left(\frac{k}{\mu + k}\right)^{k}, \quad y = 0, 1, 2, \ldots$$
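For computation, this $(\mu, k)$ parameterization coincides with the one implemented by R's built-in dnbinom() through its mu argument. A quick numerical check (illustrative values only):

```r
# dnbinom(y, size = k, mu = mu) evaluates
# Gamma(y + k) / (Gamma(k) * y!) * (mu/(mu + k))^y * (k/(mu + k))^k.
y <- 0:5; mu <- 2; k <- 0.8
manual <- gamma(y + k) / (gamma(k) * factorial(y)) *
  (mu / (mu + k))^y * (k / (mu + k))^k
all.equal(manual, dnbinom(y, size = k, mu = mu))  # TRUE
```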
When individual covariates, denoted by $\mathbf{x}$, are available, it becomes necessary to account for the heterogeneity induced by these covariates. To do so, a parametric model is used, specifically $\mu(\mathbf{x}) = \exp(\boldsymbol{\beta}^{\top}\mathbf{x})$, which relates the mean parameter to the individual covariates. Thus, given $\mathbf{x}$, the conditional probability mass function of $Y$ is expressed by:
$$f(y \mid \mathbf{x}; \boldsymbol{\beta}, k) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left\{\frac{\mu(\mathbf{x})}{\mu(\mathbf{x}) + k}\right\}^{y} \left\{\frac{k}{\mu(\mathbf{x}) + k}\right\}^{k}, \quad (1)$$
where $\boldsymbol{\beta}$ represents the unknown regression coefficients and $k$ represents the dispersion parameter. This formulation is referred to as the negative binomial regression model. As a special case, Equation (1) reduces to the negative binomial distribution without covariates when all coefficients, except the intercept, are zero. The negative binomial regression model also includes the geometric regression model when $k = 1$ and reduces to the Poisson regression model as $k \to \infty$.
Given $\mathbf{x}$, the conditional expectation of $Y$ is equal to $\mu(\mathbf{x})$, while the conditional variance is expressed as $\mu(\mathbf{x}) + \mu^{2}(\mathbf{x})/k$, indicating a quadratic relationship. The parameter $k$ controls the degree of overdispersion: as $k$ decreases, the variance increases, leading to greater overdispersion. Overdispersion is commonly observed in capture–recapture studies, where the variance significantly exceeds the mean. Consequently, the negative binomial regression model is often more appropriate than the Poisson model for capture–recapture data under severe overdispersion.
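A quick simulation (hypothetical values of $\mu$ and $k$) makes the quadratic relationship and the role of $k$ concrete:

```r
# Empirical check of Var(Y | x) = mu + mu^2 / k: smaller k, more overdispersion.
mu <- 3
for (k in c(0.25, 1, 10)) {
  y <- rnbinom(1e5, size = k, mu = mu)
  cat(sprintf("k = %5.2f  sample variance = %7.2f  mu + mu^2/k = %7.2f\n",
              k, var(y), mu + mu^2 / k))
}
```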
Because the event $\{Y = 0\}$ is unobservable in capture–recapture studies, the zero-truncated version of model (1) is considered:
$$f_{+}(y \mid \mathbf{x}; \boldsymbol{\beta}, k) = \frac{f(y \mid \mathbf{x}; \boldsymbol{\beta}, k)}{1 - \phi(\mathbf{x}; \boldsymbol{\beta}, k)}, \quad y = 1, 2, \ldots, \quad (2)$$
where $\phi(\mathbf{x}; \boldsymbol{\beta}, k) = f(0 \mid \mathbf{x}; \boldsymbol{\beta}, k) = \{k/(\mu(\mathbf{x}) + k)\}^{k}$ represents the conditional probability that an individual with covariate $\mathbf{x}$ is not captured at all.
Consider a study that captured $n$ distinct individuals, with $\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}$ and $y_{1}, \ldots, y_{n}$ denoting their individual covariates and capture frequencies, respectively. Under model (2), ref. [8] proposed a maximum conditional likelihood estimator $(\hat{\boldsymbol{\beta}}_{c}, \hat{k}_{c})$ by maximizing:
$$\ell_{c}(\boldsymbol{\beta}, k) = \sum_{i=1}^{n} \log\left\{ \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)}{1 - \phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k)} \right\}.$$
According to the principle of inverse probability weighting, the Horvitz–Thompson type estimator of $N$ is defined as $\hat{N}_{\mathrm{HT}} = \sum_{i=1}^{n} \{1 - \phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}_{c}, \hat{k}_{c})\}^{-1}$. However, $\hat{N}_{\mathrm{HT}}$ might be inflated due to small detection probabilities.
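As an illustration of how the conditional likelihood and the Horvitz–Thompson type estimator can be computed, the following R sketch fits the zero-truncated model by direct optimization; the data x, y, the helper name ztnb_fit(), and the scalar-covariate log-linear mean are all hypothetical assumptions:

```r
# Conditional log-likelihood of the zero-truncated negative binomial model,
# maximized with optim(); x, y are hypothetical observed data with y >= 1.
ztnb_fit <- function(x, y) {
  nll <- function(par) {
    mu  <- exp(par[1] + par[2] * x)   # log-linear mean mu(x)
    k   <- exp(par[3])                # dispersion k > 0 via log scale
    phi <- (k / (mu + k))^k           # phi(x) = P(Y = 0 | x)
    -sum(dnbinom(y, size = k, mu = mu, log = TRUE) - log(1 - phi))
  }
  optim(c(0, 0, 0), nll, method = "BFGS")
}
# Horvitz-Thompson type estimate from the fitted parameters:
# fit     <- ztnb_fit(x, y)
# mu_hat  <- exp(fit$par[1] + fit$par[2] * x)
# k_hat   <- exp(fit$par[3])
# phi_hat <- (k_hat / (mu_hat + k_hat))^k_hat
# N_HT    <- sum(1 / (1 - phi_hat))   # sum of inverse capture probabilities
```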
2.2. Semiparametric Empirical Likelihood
The semiparametric empirical likelihood, originally developed in ref. [16], is an appealing technique for implementing the full likelihood method when capture–recapture data contain individual covariates. Taking the negative binomial regression model as an example, we provide a brief introduction to this technique.
Considering that $n$ distinct individuals out of a total of $N$ individuals were captured, $n$ follows a binomial distribution, and the corresponding probability is as follows:
$$\binom{N}{n} \alpha^{N - n} (1 - \alpha)^{n},$$
where $\alpha = \Pr(Y = 0) = E\{\phi(\mathbf{x}; \boldsymbol{\beta}, k)\}$ represents the probability that a generic individual was not captured at all. For the given $n$ individuals, the conditional probability of their covariates and capture counts is as follows:
$$\prod_{i=1}^{n} \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i})}{1 - \alpha},$$
where $F$ denotes the marginal distribution of the covariates.
Multiplying these two expressions yields the full likelihood function:
$$\binom{N}{n} \alpha^{N - n} (1 - \alpha)^{n} \times \prod_{i=1}^{n} \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i})}{1 - \alpha} = \binom{N}{n} \alpha^{N - n} \prod_{i=1}^{n} \left\{ f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i}) \right\}. \quad (3)$$
In Equation (3), the marginal probability $\mathrm{d}F(\mathbf{x}_{i})$ is unknown and shall be addressed by the empirical likelihood method; see refs. [21,22] for more details. Technically, we assume that $p_{i} = \mathrm{d}F(\mathbf{x}_{i})$ for $i = 1, \ldots, n$, where $(p_{1}, \ldots, p_{n})$ is subject to the constraints $p_{i} \geq 0$ and $\sum_{i=1}^{n} p_{i} = 1$. With this substitution, we call the full likelihood the semiparametric empirical likelihood and refer to its logarithm as the empirical log-likelihood function, namely:
$$\ell(N, \boldsymbol{\beta}, k, \{p_{i}\}) = \log\binom{N}{n} + (N - n)\log\alpha + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\}.$$
By the definition of $\alpha$ and the iterated expectation theorem, it follows that $\alpha = E\{\phi(\mathbf{x}; \boldsymbol{\beta}, k)\} = \int \phi(\mathbf{x}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x})$, or equivalently:
$$\sum_{i=1}^{n} p_{i} \left\{ \phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha \right\} = 0.$$
With the constraints on the $p_{i}$'s, the profile empirical log-likelihood function can be derived by applying the Lagrange multiplier method to Equation (3):
$$\ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha) = \log\binom{N}{n} + (N - n)\log\alpha - \sum_{i=1}^{n} \log\left[ 1 + \lambda\{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha\} \right] + \sum_{i=1}^{n} \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k),$$
where $\lambda$ is the Lagrange multiplier, satisfying:
$$\sum_{i=1}^{n} \frac{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha}{1 + \lambda\{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha\}} = 0.$$
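For fixed $(\boldsymbol{\beta}, k, \alpha)$, the left-hand side of the multiplier equation is strictly decreasing in $\lambda$, so its root is unique. A minimal R sketch (assuming the fitted values phi and alpha are available, with both positive and negative deviations $\phi_{i} - \alpha$) locates it with uniroot():

```r
# Solve sum_i (phi_i - alpha) / (1 + lam * (phi_i - alpha)) = 0 for lam.
# The search interval keeps every denominator, and hence every p_i,
# strictly positive.
solve_lambda <- function(phi, alpha, eps = 1e-8) {
  d  <- phi - alpha                      # assumed to contain both signs
  g  <- function(lam) sum(d / (1 + lam * d))
  lo <- (-1 + eps) / max(d)              # requires lam * d_i > -1 for all i
  hi <- (-1 + eps) / min(d)
  uniroot(g, c(lo, hi))$root
}
# The probabilities then follow as p_i = 1 / (n * (1 + lambda * (phi_i - alpha))).
```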
Notice that there are only a finite number of unknown parameters in the profile empirical log-likelihood function. By maximizing this function, we obtain the maximum empirical likelihood estimator, expressed as $(\hat{N}_{\mathrm{el}}, \hat{\boldsymbol{\beta}}_{\mathrm{el}}, \hat{k}_{\mathrm{el}}, \hat{\alpha}_{\mathrm{el}}) = \arg\max_{N, \boldsymbol{\beta}, k, \alpha} \ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha)$.
2.3. Penalized Empirical Likelihood Inference
When the number of captures exhibits severe overdispersion, both the estimators $\hat{N}_{\mathrm{HT}}$ and $\hat{N}_{\mathrm{el}}$, proposed in Section 2.1 and Section 2.2, respectively, may exhibit spuriously large values, potentially leading to misleading conclusions. This issue has been addressed in ref. [13] (p. 84) for the Horvitz–Thompson type estimator. Our simulation studies further confirm that the empirical likelihood estimators may also suffer from the boundary problem. This issue may arise due to the limited information available about the population size, causing the profile empirical log-likelihood to fail to distinguish between different large values of $N$.
To mitigate this problem, we incorporate prior information on the population size to reduce the probability of spuriously large estimates. We achieve this by augmenting the likelihood functions with an appropriate penalty term. Correspondingly, the penalized empirical log-likelihood function and its profile version are defined as:
$$\ell_{P}(N, \boldsymbol{\beta}, k, \{p_{i}\}) = \ell(N, \boldsymbol{\beta}, k, \{p_{i}\}) + f_{p}(N), \qquad \ell^{\ast}_{P}(N, \boldsymbol{\beta}, k, \alpha) = \ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha) + f_{p}(N), \quad (4)$$
where the penalty term $f_{p}(N)$ takes the form
$$f_{p}(N) = -\frac{C}{2}\,(N - N_{L})^{2}\, I(N > N_{L}),$$
where $N_{L}$ is a lower bound of $N$, $C$ is a tuning parameter, and $I(\cdot)$ is the indicator function. For specific values of $N_{L}$ and $C$, a maximum penalized empirical likelihood estimator is proposed, namely,
$$(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}}) = \arg\max_{N, \boldsymbol{\beta}, k, \alpha} \ell^{\ast}_{P}(N, \boldsymbol{\beta}, k, \alpha).$$
From a Bayesian perspective, adding the penalty term $f_{p}(N)$ to the log-likelihood is equivalent to imposing a prior on $N$ that is a mixture of a half-normal distribution, proportional to $\exp\{-C(N - N_{L})^{2}/2\}$, for $N > N_{L}$ and a uniform distribution for $N \leq N_{L}$. In other words, this penalty has no effect on the likelihood when $N \leq N_{L}$ and gradually decreases the likelihood when $N > N_{L}$. The larger the population size, the more pronounced the decrease. Consequently, the penalized method encourages large values of $\hat{N}_{\mathrm{pel}}$ to shrink towards $N_{L}$. In practice, we recommend using the Chao estimator as the lower bound $N_{L}$; see ref. [23] for details about this estimator. Alternatively, the generalized Chao estimator proposed in ref. [24] can also be considered.
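The following R sketch (assumed names; the penalty form is the quadratic one displayed above) computes Chao's lower bound from the capture frequencies and evaluates the penalty term:

```r
# Chao's lower bound: n + f1^2 / (2 * f2), with f1 and f2 the numbers of
# individuals captured exactly once and exactly twice (assumes f2 > 0).
chao_lower_bound <- function(y) {
  f1 <- sum(y == 1); f2 <- sum(y == 2)
  length(y) + f1^2 / (2 * f2)
}
# Assumed penalty form: flat below N_L, half-normal-type decay beyond it.
penalty <- function(N, N_L, C) {
  -(C / 2) * (N - N_L)^2 * (N > N_L)   # zero for N <= N_L
}
```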
To derive the large-sample properties of the estimator $\hat{N}_{\mathrm{pel}}$, we introduce some notation evaluated at the true value of the parameter vector $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, k)^{\top}$, namely $\boldsymbol{\theta}_{0}$; the true values of $N$ and $\alpha$ are denoted by $N_{0}$ and $\alpha_{0}$. In terms of these quantities, two matrices, $\mathbf{V} = (\sigma_{ij})$ and $\mathbf{W}$, are defined; they are the counterparts, under the negative binomial regression model, of the corresponding matrices in the asymptotic analysis of ref. [16], and their entries are derived in the proof of Theorem 1 below.
The following theorem presents the large-sample properties of the maximum penalized empirical likelihood estimator, together with the limiting distribution of the penalized empirical likelihood ratio statistic of $N$.
Theorem 1. Suppose that the regularity conditions of ref. [16] hold under the negative binomial regression model, that $0 < \alpha_{0} < 1$, and that the tuning parameter $C$ is suitably small. If $\mathbf{W}$ is nonsingular, then as $N \to \infty$:
- (a)
the maximum penalized empirical likelihood estimator $(\hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}})$ is consistent;
- (b)
$\sqrt{N}\,(\hat{N}_{\mathrm{pel}}/N - 1)$ converges in distribution to a zero-mean normal distribution, where the asymptotic variance is determined by the matrices $\mathbf{V}$ and $\mathbf{W}$;
- (c)
the penalized empirical likelihood ratio statistic $R_{p}(N_{0}) = 2\{\ell^{\ast}_{P}(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}}) - \max_{\boldsymbol{\beta}, k, \alpha} \ell^{\ast}_{P}(N_{0}, \boldsymbol{\beta}, k, \alpha)\}$ converges in distribution to $\chi^{2}_{1}$, where $\chi^{2}_{1}$ denotes the standard chi-square distribution with one degree of freedom.
Proof. As the proposed semiparametric empirical likelihood approach can be seen as an extension of the EL method of ref. [16] to the negative binomial regression model, the proof of Theorem 1 is very similar to those of Theorem 1 and Corollary 1 in ref. [16]. Here, we only highlight the differences and the formulae of $\mathbf{V}$ and $\mathbf{W}$ in our framework.
We first argue that $\alpha_{0}$ is the limit of $\hat{\alpha}$, where $\hat{\alpha}$ is the solution to the score equations of the profile empirical log-likelihood with respect to $(\alpha, \lambda)$. For this purpose, we define the function:
$$H(\alpha, \lambda) = (N - n)\log\alpha - \sum_{i=1}^{n} \log\left[ 1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\} \right].$$
It can be seen that $\hat{\alpha}$ maximizes $H(\alpha, \hat{\lambda})$, where $\hat{\lambda}$ is the solution to $\partial H/\partial \lambda = 0$. The first partial derivatives of $H$ with respect to $\alpha$ and $\lambda$ are:
$$\frac{\partial H}{\partial \alpha} = \frac{N - n}{\alpha} + \lambda \sum_{i=1}^{n} \frac{1}{1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\}}, \qquad \frac{\partial H}{\partial \lambda} = -\sum_{i=1}^{n} \frac{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha}{1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\}}.$$
Setting the above equations to zero gives $\hat{\lambda} = -(N - n)/(n\hat{\alpha})$.
Since $\alpha_{0}$ is the probability of never being captured, $n$ follows a binomial distribution $\mathrm{Bi}(N, 1 - \alpha_{0})$. When $(\hat{\boldsymbol{\beta}}, \hat{k})$ is consistent, it follows from the strong law of large numbers that, as $N \to \infty$:
$$\hat{\alpha} \stackrel{p}{\longrightarrow} \alpha_{0},$$
where $\stackrel{p}{\longrightarrow}$ denotes convergence in probability.
Below, we derive the formulae of $\mathbf{V}$ and $\mathbf{W}$. According to Lemma 2 in the Supplementary Material of ref. [16], deriving the formula of $\mathbf{V}$ is equivalent to calculating the first two partial derivatives of the profile empirical log-likelihood with respect to the parameters at their true values. It follows from the law of large numbers and the central limit theorem that these derivatives converge to the corresponding entries of $\mathbf{V}$: the first convergence uses result (a) of Lemma A1 and Equation (A1) of Lemma A2 in Appendix A, and the second uses result (b) of Lemma A1 and Equation (A2) of Lemma A2. Similarly, the convergence of the remaining derivatives can be verified. With the arguments of Lemma A3 in Appendix A, the leading term of the second-order derivative matrix follows, where the $\sigma_{ij}$'s are the same as those defined above.
Next, we derive the formula of $\mathbf{W}$. Define the random vector $\mathbf{Z}_{i}$, whose components are the score contributions of the $i$th observation. It can be verified that these components have mean zero; here, we only verify this for two representative components. In fact, it follows from Lemma A3 that their expectations vanish, where the second-to-last equality uses Equation (A3) of Lemma A4 in Appendix A. The covariance matrix of $\mathbf{Z}_{i}$ likewise follows from Lemma A4.
By the central limit theorem, as $N \to \infty$, the normalized sum of the $\mathbf{Z}_{i}$'s converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $\boldsymbol{\Sigma}$. Since the covariance matrix $\boldsymbol{\Sigma}$ has the same form as its counterpart in Lemma 3 of the Supplementary Material of ref. [16], so does the matrix $\mathbf{W}$. The rest of the proof is similar and is omitted. This completes the proof of Theorem 1. □
When $C = 0$, there is no penalty term, and the likelihood functions with and without penalty coincide. This implies that the asymptotic results in Theorem 1 also hold for the empirical likelihood estimators without penalty. Utilizing result (c), a penalized empirical likelihood ratio interval estimator can be constructed, namely:
$$\mathcal{I}_{\mathrm{pel}} = \left\{ N : R_{p}(N) \leq \chi^{2}_{1, 1-a} \right\},$$
where $\chi^{2}_{1, 1-a}$ stands for the $(1 - a)$th quantile of $\chi^{2}_{1}$. Correspondingly, the empirical likelihood ratio interval estimator derived without penalty is as follows:
$$\mathcal{I}_{\mathrm{el}} = \left\{ N : R(N) \leq \chi^{2}_{1, 1-a} \right\},$$
where $R(N)$ is defined analogously from the unpenalized profile empirical log-likelihood. Despite both interval estimators asymptotically yielding the correct coverage probability $1 - a$, our simulation studies indicate that $\mathcal{I}_{\mathrm{pel}}$ generally outperforms $\mathcal{I}_{\mathrm{el}}$ in terms of interval width.
Remark 1. One might question whether overdispersion exists or, equivalently, whether the zero-truncated Poisson regression model adequately fits the data. Various methods have been proposed to address this question; see, for instance, refs. [8,11,25,26].

2.4. Numerical Implementation
In this section, we aim to develop an EM algorithm to facilitate the proposed estimation method described in Section 2.3. For clarity of presentation, we begin by considering the special case in which $N$ is fixed. Our primary objective is to maximize the profile penalized empirical log-likelihood function for a given $N$, as specified in Equation (4). In other words, we shall design an EM algorithm to calculate the maximum penalized empirical likelihood estimator of $(\boldsymbol{\beta}, k, \alpha)$ and the $p_{i}$'s when $N$ is fixed.
In this case, the observed data can be represented as $\{(\mathbf{x}_{i}, y_{i}) : i = 1, \ldots, n\}$, where each $y_{i}$ is positive. Additionally, the complete data include the counts $y_{n+1}, \ldots, y_{N}$ of the $N - n$ uncaptured individuals, all of which are zero. For these individuals not captured, their covariate information is missing and represented as $\mathbf{x}_{n+1}, \ldots, \mathbf{x}_{N}$. According to the principle of empirical likelihood, the potential values of the missing $\mathbf{x}_{i}$'s are drawn from $\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\}$, where the associated probabilities are $p_{1}, \ldots, p_{n}$.
The observed and missing data constitute the complete data, whose likelihood is as follows:
$$L_{c} = \binom{N}{n} \prod_{i=1}^{n} \left\{ p_{i}\, f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} \times \prod_{i=n+1}^{N} \prod_{j=1}^{n} \left\{ p_{j}\, \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\}^{z_{ij}},$$
where $z_{ij}$ indicates that the $i$th (uncaptured) individual takes the covariate value $\mathbf{x}_{j}$. Correspondingly, the log-likelihood of the complete data becomes:
$$\ell_{c} = \log\binom{N}{n} + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} + \sum_{i=n+1}^{N} \sum_{j=1}^{n} z_{ij} \left\{ \log p_{j} + \log \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\}.$$
The core of the EM algorithm is its iterative process, which consists of an expectation step (E-step) followed by a maximization step (M-step) in each iteration. Before these two steps, we use $(\boldsymbol{\beta}^{(t)}, k^{(t)}, \{p_{i}^{(t)}\})$ to denote the current values of the parameters. In the E-step, we need to compute the expectation of $\ell_{c}$ conditional on the observed data and the current parameter values. For this purpose, we calculate the conditional expectation of the indicator $z_{ij}$, which is equal to:
$$w_{j} = E\left( z_{ij} \mid \text{observed data} \right) = \frac{p_{j}^{(t)}\, \phi(\mathbf{x}_{j}; \boldsymbol{\beta}^{(t)}, k^{(t)})}{\sum_{l=1}^{n} p_{l}^{(t)}\, \phi(\mathbf{x}_{l}; \boldsymbol{\beta}^{(t)}, k^{(t)})},$$
where $\phi(\mathbf{x}_{j}; \boldsymbol{\beta}^{(t)}, k^{(t)})$ denotes the current value of $\phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k)$. Correspondingly, the conditional expectation of the log-likelihood $\ell_{c}$ is equal to:
$$Q = \log\binom{N}{n} + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} + (N - n) \sum_{j=1}^{n} w_{j} \left\{ \log p_{j} + \log \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\},$$
where $w_{j}$ represents the weight for $\mathbf{x}_{j}$.
The M-step consists of maximizing the function $Q$. The separation of the parameters $(\boldsymbol{\beta}, k)$ and the $p_{i}$'s makes the maximization procedure much more elegant; it can be implemented using the following steps, and a minimal R sketch of one complete iteration is given after the list.
- Step 1.
Update $(\boldsymbol{\beta}^{(t)}, k^{(t)})$ to $(\boldsymbol{\beta}^{(t+1)}, k^{(t+1)})$ by maximizing the part of $Q$ that involves $(\boldsymbol{\beta}, k)$. Given that this part can be interpreted as a weighted log-likelihood function, maximizing it is analogous to fitting a negative binomial regression model to the observed counts $(y_{1}, \ldots, y_{n})$ and the $n$-dimensional zero vector with covariates $(\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}, \mathbf{x}_{1}, \ldots, \mathbf{x}_{n})$ and weights $(1, \ldots, 1, (N - n)w_{1}, \ldots, (N - n)w_{n})$. This step can be readily implemented through the glm.nb() function from the MASS package in R.
- Step 2.
Update the $p_{i}$ values by maximizing $Q$ under the positive and sum-to-one constraints. This step yields a closed form, namely, $p_{i}^{(t+1)} = \{1 + (N - n)w_{i}\}/N$ for $i = 1, \ldots, n$.
- Step 3.
Update $\alpha$ by calculating $\alpha^{(t+1)} = \sum_{i=1}^{n} p_{i}^{(t+1)}\, \phi(\mathbf{x}_{i}; \boldsymbol{\beta}^{(t+1)}, k^{(t+1)})$.
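Under the stated updates, one complete EM iteration for fixed $N$ can be sketched in R as follows; the data x, y, the crude initialization, and the helper name em_step() are hypothetical, and the fractional prior weights passed to glm.nb() are intentional:

```r
library(MASS)  # for glm.nb()

# One EM iteration for fixed N. Inputs: observed covariate x and positive
# counts y, population size N, current dispersion k, probabilities p, and the
# previous fit (NULL on the first call).
em_step <- function(x, y, N, k, p, fit = NULL) {
  n  <- length(y)
  mu <- if (is.null(fit)) rep(mean(y), n) else
    predict(fit, newdata = data.frame(x = x), type = "response")
  phi <- (k / (mu + k))^k                 # P(Y = 0 | x_i) at current values
  w   <- p * phi / sum(p * phi)           # E-step weights w_j
  # Step 1: weighted NB regression on the observed counts plus n zeros
  dat <- data.frame(x = c(x, x), y = c(y, rep(0, n)),
                    wt = c(rep(1, n), (N - n) * w))
  fit <- glm.nb(y ~ x, data = dat, weights = wt)
  # Steps 2 and 3: closed-form updates of the p_i's and alpha
  p_new  <- (1 + (N - n) * w) / N
  mu_new <- predict(fit, newdata = data.frame(x = x), type = "response")
  k_new  <- fit$theta
  alpha  <- sum(p_new * (k_new / (mu_new + k_new))^k_new)
  list(fit = fit, k = k_new, p = p_new, alpha = alpha)
}
```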
The E- and M-steps are repeated until the sequence of parameter estimates or penalized log-likelihood values converges. The EM algorithm outlined above exhibits a desirable property under very general circumstances: the penalized empirical likelihood does not decrease across successive iterations. Given that the penalized empirical log-likelihood is bounded above by zero, the convergence of the sequence of likelihood values to a local maximum is always guaranteed.
To compute the maximum penalized empirical likelihood estimator $(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}})$, the aforementioned EM algorithm remains applicable after some modifications. In this scenario, the current parameter vector is denoted by $(N^{(t)}, \boldsymbol{\beta}^{(t)}, k^{(t)}, \{p_{i}^{(t)}\})$, and the weights $w_{j}$ in the E-step are evaluated at these current values. In addition, the M-step incorporates a maximization step for the population size parameter.
- Step 4.
Calculate the updated value $N^{(t+1)}$ by maximizing the partial log-likelihood function relevant to $N$, expressed as $\log\binom{N}{n} + (N - n)\log\alpha^{(t+1)} + f_{p}(N)$. This optimization can be efficiently performed using the optimize() function available in the R software (version 4.3.1, https://www.r-project.org/).
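A minimal R sketch of Step 4 (assuming the objective displayed above and the penalty() helper from the earlier sketch) uses lchoose() for numerical stability:

```r
# Step 4: update N by maximizing the N-dependent part of the penalized
# log-likelihood; lchoose() evaluates log choose(N, n) for non-integer N.
update_N <- function(n, alpha, N_L, C, M = 100 * n) {
  obj <- function(N) lchoose(N, n) + (N - n) * log(alpha) + penalty(N, N_L, C)
  optimize(obj, interval = c(n, M), maximum = TRUE)$maximum
}
```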
The penalized empirical likelihood ratio confidence interval for $N$ is computed by identifying the two zeros of the modified penalized likelihood ratio function:
$$G(N) = R_{p}(N) - \chi^{2}_{1, 1-a},$$
where the search for these zeros is conducted within the intervals $[n, \hat{N}_{\mathrm{pel}}]$ and $[\hat{N}_{\mathrm{pel}}, M]$, and $M$ is a sufficiently large user-specified value ensuring that $G(M) > 0$. This can be implemented via the uniroot() function available in the R software. In summary, the pseudocodes outlined in Algorithms A1–A3 (Appendix B) offer the procedures for calculating both the maximum penalized empirical likelihood estimator and the corresponding penalized empirical likelihood ratio confidence interval for $N$.
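To make the root-finding step concrete, the following R sketch (assuming a user-supplied function ratio_fn() that evaluates the penalized empirical likelihood ratio statistic $R_{p}(N)$, and the point estimate N_hat) returns the two interval endpoints:

```r
# Endpoints of the penalized EL ratio interval: the zeros of
# R_p(N) - qchisq(1 - a, 1) on either side of the point estimate N_hat.
pel_ci <- function(ratio_fn, N_hat, n, M, a = 0.05) {
  f  <- function(N) ratio_fn(N) - qchisq(1 - a, df = 1)
  lo <- uniroot(f, c(n, N_hat))$root
  hi <- uniroot(f, c(N_hat, M))$root
  c(lower = lo, upper = hi)
}
```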