2.1. Model and Data
Suppose there are $N$ individuals in a population. Each individual is characterized by the number of captures, denoted by $Y$, a nonnegative integer-valued variable. A naive approach to modeling $Y$ is to assume a Poisson distribution, $Y \sim \mathrm{Poisson}(\lambda)$, where $\lambda$ represents the rate or expected number of captures. The Poisson model inherently assumes that all individuals in the population are homogeneous, meaning they share the same rate parameter $\lambda$. In addition, as noted in the introduction, a key limitation of the Poisson model is its inability to handle overdispersed data, where the variance exceeds the mean.
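The mean–variance diagnosis is easy to illustrate numerically. The following minimal R sketch (purely simulated, hypothetical data) contrasts homogeneous Poisson counts with gamma-mixed counts, whose sample variance far exceeds the sample mean:

```r
# Under a Poisson model the mean and variance coincide, so a sample variance
# far above the sample mean signals overdispersion. Illustrative values only.
set.seed(1)
y_pois <- rpois(1000, lambda = 2)                                     # homogeneous Poisson counts
y_mix  <- rpois(1000, lambda = rgamma(1000, shape = 0.5, scale = 4))  # gamma-mixed rates
c(mean(y_pois), var(y_pois))   # roughly equal
c(mean(y_mix),  var(y_mix))    # variance well above the mean
```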
To address the heterogeneity and overdispersion, one can assume that individuals have varying rates. In practice, a common approach is to model the distribution of these rates using a gamma distribution as a prior. Specifically, $\lambda \sim \mathrm{Gamma}(r, \theta)$, where $r > 0$ is the shape parameter and $\theta > 0$ is the scale parameter. With this prior, the marginal probability mass function of $Y$ can be derived in a closed form:
$$\Pr(Y = y) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{y} (1 - p)^{r}, \quad y = 0, 1, 2, \ldots,$$
where $p = \theta/(1 + \theta)$. This corresponds to the Poisson–gamma or negative binomial distribution found in probability textbooks. This distribution models the number of successes before the $r$th failure occurs in a sequence of independent Bernoulli trials, with $p$ representing the probability of success in each trial.
As highlighted in ref. [9], a common reparameterization of the negative binomial distribution is often used to interpret the counting processes in ecological and biodiversity studies. This reparameterization expresses the distribution in terms of its mean $\mu$ and a dispersion or aggregation parameter $k$, which controls the variation in counts. By setting $k = r$ and $\mu = r\theta$, so that $p = \mu/(\mu + k)$, the probability mass function can be reformulated as:
$$\Pr(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left(\frac{\mu}{\mu + k}\right)^{y} \left(\frac{k}{\mu + k}\right)^{k}, \quad y = 0, 1, 2, \ldots$$
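For computation, this $(\mu, k)$ parameterization coincides with the one implemented by R's built-in dnbinom() through its mu argument. A quick numerical check (illustrative values only):

```r
# dnbinom(y, size = k, mu = mu) evaluates
# Gamma(y + k) / (Gamma(k) * y!) * (mu/(mu + k))^y * (k/(mu + k))^k.
y <- 0:5; mu <- 2; k <- 0.8
manual <- gamma(y + k) / (gamma(k) * factorial(y)) *
  (mu / (mu + k))^y * (k / (mu + k))^k
all.equal(manual, dnbinom(y, size = k, mu = mu))  # TRUE
```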
When individual covariates, denoted by $\mathbf{x}$, are available, it becomes necessary to account for the heterogeneity induced by these covariates. To do so, a parametric model is used, specifically $\mu(\mathbf{x}) = \exp(\boldsymbol{\beta}^{\top}\mathbf{x})$, which relates the mean parameter to the individual covariates. Thus, given $\mathbf{x}$, the conditional probability mass function of $Y$ is expressed by:
$$f(y \mid \mathbf{x}; \boldsymbol{\beta}, k) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!} \left\{\frac{\mu(\mathbf{x})}{\mu(\mathbf{x}) + k}\right\}^{y} \left\{\frac{k}{\mu(\mathbf{x}) + k}\right\}^{k}, \quad (1)$$
where $\boldsymbol{\beta}$ represents the unknown regression coefficients and $k$ represents the dispersion parameter. This formulation is referred to as the negative binomial regression model. As a special case, Equation (1) reduces to the negative binomial distribution without covariates when all coefficients, except the intercept, are zero. The negative binomial regression model also includes the geometric regression model when $k = 1$ and reduces to the Poisson regression model as $k \to \infty$.
Given $\mathbf{x}$, the conditional expectation of $Y$ is equal to $\mu(\mathbf{x})$, while the conditional variance is expressed as $\mu(\mathbf{x}) + \mu^{2}(\mathbf{x})/k$, indicating a quadratic relationship. The parameter $k$ controls the degree of overdispersion: as $k$ decreases, the variance increases, leading to greater overdispersion. Overdispersion is commonly observed in capture–recapture studies, where the variance significantly exceeds the mean. Consequently, the negative binomial regression model is often more appropriate than the Poisson model for capture–recapture data under severe overdispersion.
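A quick simulation (hypothetical values of $\mu$ and $k$) makes the quadratic relationship and the role of $k$ concrete:

```r
# Empirical check of Var(Y | x) = mu + mu^2 / k: smaller k, more overdispersion.
mu <- 3
for (k in c(0.25, 1, 10)) {
  y <- rnbinom(1e5, size = k, mu = mu)
  cat(sprintf("k = %5.2f  sample variance = %7.2f  mu + mu^2/k = %7.2f\n",
              k, var(y), mu + mu^2 / k))
}
```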
Because the event $\{Y = 0\}$ is unobservable in capture–recapture studies, the zero-truncated version of model (1) is considered:
$$f_{+}(y \mid \mathbf{x}; \boldsymbol{\beta}, k) = \frac{f(y \mid \mathbf{x}; \boldsymbol{\beta}, k)}{1 - \phi(\mathbf{x}; \boldsymbol{\beta}, k)}, \quad y = 1, 2, \ldots, \quad (2)$$
where $\phi(\mathbf{x}; \boldsymbol{\beta}, k) = f(0 \mid \mathbf{x}; \boldsymbol{\beta}, k) = \{k/(\mu(\mathbf{x}) + k)\}^{k}$ represents the conditional probability that an individual with covariate $\mathbf{x}$ is not captured at all.
Consider a study that captured $n$ distinct individuals, with $\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}$ and $y_{1}, \ldots, y_{n}$ denoting their individual covariates and capture frequencies, respectively. Under model (2), ref. [8] proposed a maximum conditional likelihood estimator $(\hat{\boldsymbol{\beta}}_{c}, \hat{k}_{c})$ by maximizing:
$$\ell_{c}(\boldsymbol{\beta}, k) = \sum_{i=1}^{n} \log\left\{ \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)}{1 - \phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k)} \right\}.$$
According to the principle of inverse probability weighting, the Horvitz–Thompson type estimator of $N$ is defined as $\hat{N}_{\mathrm{HT}} = \sum_{i=1}^{n} \{1 - \phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}_{c}, \hat{k}_{c})\}^{-1}$. However, $\hat{N}_{\mathrm{HT}}$ might be inflated due to small detection probabilities.
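As an illustration of how the conditional likelihood and the Horvitz–Thompson type estimator can be computed, the following R sketch fits the zero-truncated model by direct optimization; the data x, y, the helper name ztnb_fit(), and the scalar-covariate log-linear mean are all hypothetical assumptions:

```r
# Conditional log-likelihood of the zero-truncated negative binomial model,
# maximized with optim(); x, y are hypothetical observed data with y >= 1.
ztnb_fit <- function(x, y) {
  nll <- function(par) {
    mu  <- exp(par[1] + par[2] * x)   # log-linear mean mu(x)
    k   <- exp(par[3])                # dispersion k > 0 via log scale
    phi <- (k / (mu + k))^k           # phi(x) = P(Y = 0 | x)
    -sum(dnbinom(y, size = k, mu = mu, log = TRUE) - log(1 - phi))
  }
  optim(c(0, 0, 0), nll, method = "BFGS")
}
# Horvitz-Thompson type estimate from the fitted parameters:
# fit     <- ztnb_fit(x, y)
# mu_hat  <- exp(fit$par[1] + fit$par[2] * x)
# k_hat   <- exp(fit$par[3])
# phi_hat <- (k_hat / (mu_hat + k_hat))^k_hat
# N_HT    <- sum(1 / (1 - phi_hat))   # sum of inverse capture probabilities
```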
2.2. Semiparametric Empirical Likelihood
The semiparametric empirical likelihood, originally developed in ref. [16], is an appealing technique for implementing the full likelihood method when capture–recapture data contain individual covariates. Taking the negative binomial regression model as an example, we provide a brief introduction to this technique.
Considering that $n$ distinct individuals out of a total of $N$ individuals were captured, $n$ follows a binomial distribution, and the corresponding probability is as follows:
$$\binom{N}{n} \alpha^{N - n} (1 - \alpha)^{n},$$
where $\alpha = \Pr(Y = 0) = E\{\phi(\mathbf{x}; \boldsymbol{\beta}, k)\}$ represents the probability that a generic individual was not captured at all. For the given $n$ individuals, the conditional probability of their covariates and capture counts is as follows:
$$\prod_{i=1}^{n} \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i})}{1 - \alpha},$$
where $F$ denotes the marginal distribution of the covariates.
Multiplying these two expressions yields the full likelihood function:
$$\binom{N}{n} \alpha^{N - n} (1 - \alpha)^{n} \times \prod_{i=1}^{n} \frac{f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i})}{1 - \alpha} = \binom{N}{n} \alpha^{N - n} \prod_{i=1}^{n} \left\{ f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x}_{i}) \right\}. \quad (3)$$
In Equation (3), the marginal probability $\mathrm{d}F(\mathbf{x}_{i})$ is unknown and shall be addressed by the empirical likelihood method; see refs. [21,22] for more details. Technically, we assume that $p_{i} = \mathrm{d}F(\mathbf{x}_{i})$ for $i = 1, \ldots, n$, where $(p_{1}, \ldots, p_{n})$ is subject to the constraints $p_{i} \geq 0$ and $\sum_{i=1}^{n} p_{i} = 1$. With this substitution, we call the full likelihood the semiparametric empirical likelihood and refer to its logarithm as the empirical log-likelihood function, namely:
$$\ell(N, \boldsymbol{\beta}, k, \{p_{i}\}) = \log\binom{N}{n} + (N - n)\log\alpha + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\}.$$
By the definition of $\alpha$ and the iterated expectation theorem, it follows that $\alpha = E\{\phi(\mathbf{x}; \boldsymbol{\beta}, k)\} = \int \phi(\mathbf{x}; \boldsymbol{\beta}, k)\, \mathrm{d}F(\mathbf{x})$, or equivalently:
$$\sum_{i=1}^{n} p_{i} \left\{ \phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha \right\} = 0.$$
With the constraints on the $p_{i}$'s, the profile empirical log-likelihood function can be derived by applying the Lagrange multiplier method to Equation (3):
$$\ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha) = \log\binom{N}{n} + (N - n)\log\alpha - \sum_{i=1}^{n} \log\left[ 1 + \lambda\{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha\} \right] + \sum_{i=1}^{n} \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k),$$
where $\lambda$ is the Lagrange multiplier, satisfying:
$$\sum_{i=1}^{n} \frac{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha}{1 + \lambda\{\phi(\mathbf{x}_{i}; \boldsymbol{\beta}, k) - \alpha\}} = 0.$$
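For fixed $(\boldsymbol{\beta}, k, \alpha)$, the left-hand side of the multiplier equation is strictly decreasing in $\lambda$, so its root is unique. A minimal R sketch (assuming the fitted values phi and alpha are available, with both positive and negative deviations $\phi_{i} - \alpha$) locates it with uniroot():

```r
# Solve sum_i (phi_i - alpha) / (1 + lam * (phi_i - alpha)) = 0 for lam.
# The search interval keeps every denominator, and hence every p_i,
# strictly positive.
solve_lambda <- function(phi, alpha, eps = 1e-8) {
  d  <- phi - alpha                      # assumed to contain both signs
  g  <- function(lam) sum(d / (1 + lam * d))
  lo <- (-1 + eps) / max(d)              # requires lam * d_i > -1 for all i
  hi <- (-1 + eps) / min(d)
  uniroot(g, c(lo, hi))$root
}
# The probabilities then follow as p_i = 1 / (n * (1 + lambda * (phi_i - alpha))).
```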
Notice that there are only a finite number of unknown parameters in the profile empirical log-likelihood function. By maximizing this function, we obtain the maximum empirical likelihood estimator, expressed as $(\hat{N}_{\mathrm{el}}, \hat{\boldsymbol{\beta}}_{\mathrm{el}}, \hat{k}_{\mathrm{el}}, \hat{\alpha}_{\mathrm{el}}) = \arg\max_{N, \boldsymbol{\beta}, k, \alpha} \ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha)$.
2.3. Penalized Empirical Likelihood Inference
When the number of captures exhibits severe overdispersion, both the estimators $\hat{N}_{\mathrm{HT}}$ and $\hat{N}_{\mathrm{el}}$, proposed in Section 2.1 and Section 2.2, respectively, may exhibit spuriously large values, potentially leading to misleading conclusions. This issue has been addressed in ref. [13] (p. 84) for the Horvitz–Thompson type estimator. Our simulation studies further confirm that the empirical likelihood estimators may also suffer from the boundary problem. This issue may arise due to the limited information available about the population size, causing the profile empirical log-likelihood to fail to distinguish between different large values of $N$.
To mitigate this problem, we incorporate prior information on the population size to reduce the probability of spuriously large estimates. We achieve this by augmenting the likelihood functions with an appropriate penalty term. Correspondingly, the penalized empirical log-likelihood function and its profile version are defined as:
$$\ell_{P}(N, \boldsymbol{\beta}, k, \{p_{i}\}) = \ell(N, \boldsymbol{\beta}, k, \{p_{i}\}) + f_{p}(N), \qquad \ell^{\ast}_{P}(N, \boldsymbol{\beta}, k, \alpha) = \ell^{\ast}(N, \boldsymbol{\beta}, k, \alpha) + f_{p}(N), \quad (4)$$
where the penalty term $f_{p}(N)$ takes the form
$$f_{p}(N) = -\frac{C}{2}\,(N - N_{L})^{2}\, I(N > N_{L}),$$
where $N_{L}$ is a lower bound of $N$, $C$ is a tuning parameter, and $I(\cdot)$ is the indicator function. For specific values of $N_{L}$ and $C$, a maximum penalized empirical likelihood estimator is proposed, namely,
$$(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}}) = \arg\max_{N, \boldsymbol{\beta}, k, \alpha} \ell^{\ast}_{P}(N, \boldsymbol{\beta}, k, \alpha).$$
From a Bayesian perspective, adding the penalty term $f_{p}(N)$ to the log-likelihood is equivalent to imposing a prior on $N$ that is a mixture of a half-normal distribution, proportional to $\exp\{-C(N - N_{L})^{2}/2\}$, for $N > N_{L}$ and a uniform distribution for $N \leq N_{L}$. In other words, this penalty has no effect on the likelihood when $N \leq N_{L}$ and gradually decreases the likelihood when $N > N_{L}$. The larger the population size, the more pronounced the decrease. Consequently, the penalized method encourages large values of $\hat{N}_{\mathrm{pel}}$ to shrink towards $N_{L}$. In practice, we recommend using the Chao estimator as the lower bound $N_{L}$; see ref. [23] for details about this estimator. Alternatively, the generalized Chao estimator proposed in ref. [24] can also be considered.
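The following R sketch (assumed names; the penalty form is the quadratic one displayed above) computes Chao's lower bound from the capture frequencies and evaluates the penalty term:

```r
# Chao's lower bound: n + f1^2 / (2 * f2), with f1 and f2 the numbers of
# individuals captured exactly once and exactly twice (assumes f2 > 0).
chao_lower_bound <- function(y) {
  f1 <- sum(y == 1); f2 <- sum(y == 2)
  length(y) + f1^2 / (2 * f2)
}
# Assumed penalty form: flat below N_L, half-normal-type decay beyond it.
penalty <- function(N, N_L, C) {
  -(C / 2) * (N - N_L)^2 * (N > N_L)   # zero for N <= N_L
}
```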
To derive the large-sample properties of the estimator $\hat{N}_{\mathrm{pel}}$, we introduce some notation evaluated at the true value of the parameter vector $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, k)^{\top}$, namely $\boldsymbol{\theta}_{0}$; the true values of $N$ and $\alpha$ are denoted by $N_{0}$ and $\alpha_{0}$. In terms of these quantities, two matrices, $\mathbf{V} = (\sigma_{ij})$ and $\mathbf{W}$, are defined; they are the counterparts, under the negative binomial regression model, of the corresponding matrices in the asymptotic analysis of ref. [16], and their entries are derived in the proof of Theorem 1 below.
The following theorem presents the large-sample properties of the maximum penalized empirical likelihood estimator, together with the limiting distribution of the penalized empirical likelihood ratio statistic of $N$.
Theorem 1. Suppose that the regularity conditions of ref. [16] hold under the negative binomial regression model, that $0 < \alpha_{0} < 1$, and that the tuning parameter $C$ is suitably small. If $\mathbf{W}$ is nonsingular, then as $N \to \infty$:
- (a)
the maximum penalized empirical likelihood estimator $(\hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}})$ is consistent;
- (b)
$\sqrt{N}\,(\hat{N}_{\mathrm{pel}}/N - 1)$ converges in distribution to a zero-mean normal distribution, where the asymptotic variance is determined by the matrices $\mathbf{V}$ and $\mathbf{W}$;
- (c)
the penalized empirical likelihood ratio statistic $R_{p}(N_{0}) = 2\{\ell^{\ast}_{P}(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}}) - \max_{\boldsymbol{\beta}, k, \alpha} \ell^{\ast}_{P}(N_{0}, \boldsymbol{\beta}, k, \alpha)\}$ converges in distribution to $\chi^{2}_{1}$, where $\chi^{2}_{1}$ denotes the standard chi-square distribution with one degree of freedom.
Proof. As the proposed semiparametric empirical likelihood approach can be seen as an extension of the EL method of ref. [16] to the negative binomial regression model, the proof of Theorem 1 is very similar to those of Theorem 1 and Corollary 1 in ref. [16]. Here, we only highlight the differences and the formulae of $\mathbf{V}$ and $\mathbf{W}$ in our framework.
We first argue that $\alpha_{0}$ is the limit of $\hat{\alpha}$, where $\hat{\alpha}$ is the solution to the score equations of the profile empirical log-likelihood with respect to $(\alpha, \lambda)$. For this purpose, we define the function:
$$H(\alpha, \lambda) = (N - n)\log\alpha - \sum_{i=1}^{n} \log\left[ 1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\} \right].$$
It can be seen that $\hat{\alpha}$ maximizes $H(\alpha, \hat{\lambda})$, where $\hat{\lambda}$ is the solution to $\partial H/\partial \lambda = 0$. The first partial derivatives of $H$ with respect to $\alpha$ and $\lambda$ are:
$$\frac{\partial H}{\partial \alpha} = \frac{N - n}{\alpha} + \lambda \sum_{i=1}^{n} \frac{1}{1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\}}, \qquad \frac{\partial H}{\partial \lambda} = -\sum_{i=1}^{n} \frac{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha}{1 + \lambda\{\phi(\mathbf{x}_{i}; \hat{\boldsymbol{\beta}}, \hat{k}) - \alpha\}}.$$
Setting the above equations to zero gives $\hat{\lambda} = -(N - n)/(n\hat{\alpha})$.
Since $\alpha_{0}$ is the probability of never being captured, $n$ follows a binomial distribution $\mathrm{Bi}(N, 1 - \alpha_{0})$. When $(\hat{\boldsymbol{\beta}}, \hat{k})$ is consistent, it follows from the strong law of large numbers that, as $N \to \infty$:
$$\hat{\alpha} \stackrel{p}{\longrightarrow} \alpha_{0},$$
where $\stackrel{p}{\longrightarrow}$ denotes convergence in probability.
Below, we derive the formulae of $\mathbf{V}$ and $\mathbf{W}$. According to Lemma 2 in the Supplementary Material of ref. [16], deriving the formula of $\mathbf{V}$ is equivalent to calculating the first two partial derivatives of the profile empirical log-likelihood with respect to the parameters at their true values. It follows from the law of large numbers and the central limit theorem that these derivatives converge to the corresponding entries of $\mathbf{V}$: the first convergence uses result (a) of Lemma A1 and Equation (A1) of Lemma A2 in Appendix A, and the second uses result (b) of Lemma A1 and Equation (A2) of Lemma A2. Similarly, the convergence of the remaining derivatives can be verified. With the arguments of Lemma A3 in Appendix A, the leading term of the second-order derivative matrix follows, where the $\sigma_{ij}$'s are the same as those defined above.
Next, we derive the formula of $\mathbf{W}$. Define the random vector $\mathbf{Z}_{i}$, whose components are the score contributions of the $i$th observation. It can be verified that these components have mean zero; here, we only verify this for two representative components. In fact, it follows from Lemma A3 that their expectations vanish, where the second-to-last equality uses Equation (A3) of Lemma A4 in Appendix A. The covariance matrix of $\mathbf{Z}_{i}$ likewise follows from Lemma A4.
By the central limit theorem, as $N \to \infty$, the normalized sum of the $\mathbf{Z}_{i}$'s converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $\boldsymbol{\Sigma}$. Since the covariance matrix $\boldsymbol{\Sigma}$ has the same form as its counterpart in Lemma 3 of the Supplementary Material of ref. [16], so does the matrix $\mathbf{W}$. The rest of the proof is similar and is omitted. This completes the proof of Theorem 1. □
When $C = 0$, there is no penalty term, and the likelihood functions with and without penalty coincide. This implies that the asymptotic results in Theorem 1 also hold for the empirical likelihood estimators without penalty. Utilizing result (c), a penalized empirical likelihood ratio interval estimator can be constructed, namely:
$$\mathcal{I}_{\mathrm{pel}} = \left\{ N : R_{p}(N) \leq \chi^{2}_{1, 1-a} \right\},$$
where $\chi^{2}_{1, 1-a}$ stands for the $(1 - a)$th quantile of $\chi^{2}_{1}$. Correspondingly, the empirical likelihood ratio interval estimator derived without penalty is as follows:
$$\mathcal{I}_{\mathrm{el}} = \left\{ N : R(N) \leq \chi^{2}_{1, 1-a} \right\},$$
where $R(N)$ is defined analogously from the unpenalized profile empirical log-likelihood. Despite both interval estimators asymptotically yielding the correct coverage probability $1 - a$, our simulation studies indicate that $\mathcal{I}_{\mathrm{pel}}$ generally outperforms $\mathcal{I}_{\mathrm{el}}$ in terms of interval width.
Remark 1. One might question whether overdispersion exists or, equivalently, whether the zero-truncated Poisson regression model adequately fits the data. Various methods have been proposed to address this question; see, for instance, refs. [8,11,25,26].

2.4. Numerical Implementation
In this section, we aim to develop an EM algorithm to facilitate the proposed estimation method described in Section 2.3. For clarity of presentation, we begin by considering the special case in which $N$ is fixed. Our primary objective is to maximize the profile penalized empirical log-likelihood function for a given $N$, as specified in Equation (4). In other words, we shall design an EM algorithm to calculate the maximum penalized empirical likelihood estimator of $(\boldsymbol{\beta}, k, \alpha)$ and the $p_{i}$'s when $N$ is fixed.
In this case, the observed data can be represented as $\{(\mathbf{x}_{i}, y_{i}) : i = 1, \ldots, n\}$, where each $y_{i}$ is positive. Additionally, the complete data include the counts $y_{n+1}, \ldots, y_{N}$ of the $N - n$ uncaptured individuals, all of which are zero. For these individuals not captured, their covariate information is missing and represented as $\mathbf{x}_{n+1}, \ldots, \mathbf{x}_{N}$. According to the principle of empirical likelihood, the potential values of the missing $\mathbf{x}_{i}$'s are drawn from $\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\}$, where the associated probabilities are $p_{1}, \ldots, p_{n}$.
The observed and missing data constitute the complete data, whose likelihood is as follows:
$$L_{c} = \binom{N}{n} \prod_{i=1}^{n} \left\{ p_{i}\, f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} \times \prod_{i=n+1}^{N} \prod_{j=1}^{n} \left\{ p_{j}\, \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\}^{z_{ij}},$$
where $z_{ij}$ indicates that the $i$th (uncaptured) individual takes the covariate value $\mathbf{x}_{j}$. Correspondingly, the log-likelihood of the complete data becomes:
$$\ell_{c} = \log\binom{N}{n} + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} + \sum_{i=n+1}^{N} \sum_{j=1}^{n} z_{ij} \left\{ \log p_{j} + \log \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\}.$$
The core of the EM algorithm is its iterative process, which consists of an expectation step (E-step) followed by a maximization step (M-step) in each iteration. Before these two steps, we use $(\boldsymbol{\beta}^{(t)}, k^{(t)}, \{p_{i}^{(t)}\})$ to denote the current values of the parameters. In the E-step, we need to compute the expectation of $\ell_{c}$ conditional on the observed data and the current parameter values. For this purpose, we calculate the conditional expectation of the indicator $z_{ij}$, which is equal to:
$$w_{j} = E\left( z_{ij} \mid \text{observed data} \right) = \frac{p_{j}^{(t)}\, \phi(\mathbf{x}_{j}; \boldsymbol{\beta}^{(t)}, k^{(t)})}{\sum_{l=1}^{n} p_{l}^{(t)}\, \phi(\mathbf{x}_{l}; \boldsymbol{\beta}^{(t)}, k^{(t)})},$$
where $\phi(\mathbf{x}_{j}; \boldsymbol{\beta}^{(t)}, k^{(t)})$ denotes the current value of $\phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k)$. Correspondingly, the conditional expectation of the log-likelihood $\ell_{c}$ is equal to:
$$Q = \log\binom{N}{n} + \sum_{i=1}^{n} \left\{ \log p_{i} + \log f(y_{i} \mid \mathbf{x}_{i}; \boldsymbol{\beta}, k) \right\} + (N - n) \sum_{j=1}^{n} w_{j} \left\{ \log p_{j} + \log \phi(\mathbf{x}_{j}; \boldsymbol{\beta}, k) \right\},$$
where $w_{j}$ represents the weight for $\mathbf{x}_{j}$.
The M-step consists of maximizing the function $Q$. The separation of the parameters $(\boldsymbol{\beta}, k)$ and the $p_{i}$'s makes the maximization procedure much more elegant; it can be implemented using the following steps, and a minimal R sketch of one complete iteration is given after the list.
- Step 1.
Update $(\boldsymbol{\beta}^{(t)}, k^{(t)})$ to $(\boldsymbol{\beta}^{(t+1)}, k^{(t+1)})$ by maximizing the part of $Q$ that involves $(\boldsymbol{\beta}, k)$. Given that this part can be interpreted as a weighted log-likelihood function, maximizing it is analogous to fitting a negative binomial regression model to the observed counts $(y_{1}, \ldots, y_{n})$ and the $n$-dimensional zero vector with covariates $(\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}, \mathbf{x}_{1}, \ldots, \mathbf{x}_{n})$ and weights $(1, \ldots, 1, (N - n)w_{1}, \ldots, (N - n)w_{n})$. This step can be readily implemented through the glm.nb() function from the MASS package in R.
- Step 2.
Update the $p_{i}$ values by maximizing $Q$ under the positive and sum-to-one constraints. This step yields a closed form, namely, $p_{i}^{(t+1)} = \{1 + (N - n)w_{i}\}/N$ for $i = 1, \ldots, n$.
- Step 3.
Update $\alpha$ by calculating $\alpha^{(t+1)} = \sum_{i=1}^{n} p_{i}^{(t+1)}\, \phi(\mathbf{x}_{i}; \boldsymbol{\beta}^{(t+1)}, k^{(t+1)})$.
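Under the stated updates, one complete EM iteration for fixed $N$ can be sketched in R as follows; the data x, y, the crude initialization, and the helper name em_step() are hypothetical, and the fractional prior weights passed to glm.nb() are intentional:

```r
library(MASS)  # for glm.nb()

# One EM iteration for fixed N. Inputs: observed covariate x and positive
# counts y, population size N, current dispersion k, probabilities p, and the
# previous fit (NULL on the first call).
em_step <- function(x, y, N, k, p, fit = NULL) {
  n  <- length(y)
  mu <- if (is.null(fit)) rep(mean(y), n) else
    predict(fit, newdata = data.frame(x = x), type = "response")
  phi <- (k / (mu + k))^k                 # P(Y = 0 | x_i) at current values
  w   <- p * phi / sum(p * phi)           # E-step weights w_j
  # Step 1: weighted NB regression on the observed counts plus n zeros
  dat <- data.frame(x = c(x, x), y = c(y, rep(0, n)),
                    wt = c(rep(1, n), (N - n) * w))
  fit <- glm.nb(y ~ x, data = dat, weights = wt)
  # Steps 2 and 3: closed-form updates of the p_i's and alpha
  p_new  <- (1 + (N - n) * w) / N
  mu_new <- predict(fit, newdata = data.frame(x = x), type = "response")
  k_new  <- fit$theta
  alpha  <- sum(p_new * (k_new / (mu_new + k_new))^k_new)
  list(fit = fit, k = k_new, p = p_new, alpha = alpha)
}
```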
The E- and M-steps are repeated until the sequence of parameter estimates or penalized log-likelihood values converges. The EM algorithm outlined above exhibits a desirable property under very general circumstances: the penalized empirical likelihood does not decrease across successive iterations. Given that the penalized empirical log-likelihood is bounded above by zero, the convergence of the sequence of likelihood values to a local maximum is always guaranteed.
To compute the maximum penalized empirical likelihood estimator $(\hat{N}_{\mathrm{pel}}, \hat{\boldsymbol{\beta}}_{\mathrm{pel}}, \hat{k}_{\mathrm{pel}}, \hat{\alpha}_{\mathrm{pel}})$, the aforementioned EM algorithm remains applicable after some modifications. In this scenario, the current parameter vector is denoted by $(N^{(t)}, \boldsymbol{\beta}^{(t)}, k^{(t)}, \{p_{i}^{(t)}\})$, and the weights $w_{j}$ in the E-step are evaluated at these current values. In addition, the M-step incorporates a maximization step for the population size parameter.
- Step 4.
Calculate the updated value $N^{(t+1)}$ by maximizing the partial log-likelihood function relevant to $N$, expressed as $\log\binom{N}{n} + (N - n)\log\alpha^{(t+1)} + f_{p}(N)$. This optimization can be efficiently performed using the optimize() function available in the R software (version 4.3.1, https://www.r-project.org/).
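A minimal R sketch of Step 4 (assuming the objective displayed above and the penalty() helper from the earlier sketch) uses lchoose() for numerical stability:

```r
# Step 4: update N by maximizing the N-dependent part of the penalized
# log-likelihood; lchoose() evaluates log choose(N, n) for non-integer N.
update_N <- function(n, alpha, N_L, C, M = 100 * n) {
  obj <- function(N) lchoose(N, n) + (N - n) * log(alpha) + penalty(N, N_L, C)
  optimize(obj, interval = c(n, M), maximum = TRUE)$maximum
}
```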
The penalized empirical likelihood ratio confidence interval for $N$ is computed by identifying the two zeros of the modified penalized likelihood ratio function:
$$G(N) = R_{p}(N) - \chi^{2}_{1, 1-a},$$
where the search for these zeros is conducted within the intervals $[n, \hat{N}_{\mathrm{pel}}]$ and $[\hat{N}_{\mathrm{pel}}, M]$, and $M$ is a sufficiently large user-specified value ensuring that $G(M) > 0$. This can be implemented via the uniroot() function available in the R software. In summary, the pseudocodes outlined in Algorithms A1–A3 (Appendix B) offer the procedures for calculating both the maximum penalized empirical likelihood estimator and the corresponding penalized empirical likelihood ratio confidence interval for $N$.
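To make the root-finding step concrete, the following R sketch (assuming a user-supplied function ratio_fn() that evaluates the penalized empirical likelihood ratio statistic $R_{p}(N)$, and the point estimate N_hat) returns the two interval endpoints:

```r
# Endpoints of the penalized EL ratio interval: the zeros of
# R_p(N) - qchisq(1 - a, 1) on either side of the point estimate N_hat.
pel_ci <- function(ratio_fn, N_hat, n, M, a = 0.05) {
  f  <- function(N) ratio_fn(N) - qchisq(1 - a, df = 1)
  lo <- uniroot(f, c(n, N_hat))$root
  hi <- uniroot(f, c(N_hat, M))$root
  c(lower = lo, upper = hi)
}
```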