1. Introduction
Overlapping coefficients (OVL) measure how similar two populations are. An OVL is a function that assigns a real number between 0 and 1 to a pair of distributions: a value of zero indicates that the distributions are completely different, and a value of one indicates that they are identical. There are many overlapping coefficients in the literature, including measures of overlap that determine the percentage of area that the two distributions have in common [1]. Gini and Livada [2] first introduced the idea of overlapping in 1943. Matusita's coefficient [3] was introduced to measure the distance between two probability density functions, and it has applications in several practical areas, including reliability analysis and clinical research [4,5]. Matusita developed a discrete version known as the Freeman–Tukey (FT) measure, which is related to the Hellinger distance [6,7] and the delta method [8]. The Chi-Squared measure [9] and the Hellinger measure [10] play key roles in information theory, statistics, learning, signal processing, and other theoretical and applied branches of mathematics [11,12]. Morisita's coefficient [13] was proposed as an index of similarity between two communities. Weitzman's coefficient [14], primarily used to compare income distributions, is defined as the area of the region shared by the two probability density curves, that is, the area under the smaller of the two densities. Kullback and Leibler [15] introduced the Kullback–Leibler measure, which quantifies the information gain between two distributions and has been widely used in the data-mining literature. Jeffreys [16] introduced and studied a divergence measure called the Jeffreys distance, which can be regarded as a symmetrization of the Kullback–Leibler measure. For a comprehensive review of various divergence measures, see [17,18,19].
OVL coefficients are used in various fields, such as ecological processes [20], statistical ecology [21], clinical trials [5], data fusion [22], information processing [23], applied statistics [24], and economics [25], among others.
Inference for OVL measures has been investigated by several researchers under normal, Weibull, and exponential distributions. In 2005, al-Saidy et al. [26] studied inference for three OVL coefficients for two Weibull distributions with the same shape parameter and different scale parameters.
Al-Saleh and Samawi [27] used the bootstrap and a Taylor series approximation to investigate interval estimation of three OVL coefficients for two exponential distributions with different means. Samawi and Al-Saleh [28] studied three OVL coefficients for two exponential distributions and estimated them using ranked set sampling. Hamza et al. [29] proposed a new OVL coefficient based on the Kullback–Leibler measure for two exponential distributions. Sibil et al. [30] investigated both interval estimation and hypothesis testing for OVL coefficients of one- and two-parameter exponential distributions using the concept of a generalized pivotal quantity.
Pianka’s overlap coefficient [31] is used to assess the similarity of resource use by two species; it serves both as a summary measure and as a basis for inference, typically about competition for resources.
Pianka’s overlap is also used in studies of mechanisms that favour morphological co-occurrence. Vieira and Port [32] evaluated Pianka’s overlap between two species based on three main niche dimensions: habitat, food, and time. Jacqueline et al. [33] calculated the dietary overlap between foxes and dingoes using Pianka’s index. Sa Oliveira et al. [34] investigated diet and niche breadth in fish communities, using the Levins index and Pianka’s measure.
In this paper, we consider Pianka’s OVL coefficient ($\rho$) between two exponential distributions. We derive both the limiting and the exact distributions of the maximum likelihood estimator (MLE) of $\rho$. We study the MLE and Bayesian estimators and compare their efficiency. In addition, we consider interval estimation of $\rho$ using the asymptotic technique and the transformation technique, and we compare the effectiveness of the two techniques.
2. General Setting and Definition of the Pianka Overlap Measure
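Let $f_1(x)$ and $f_2(x)$ be two continuous probability density functions. Pianka’s overlap measure is defined as follows [31]:
$$\rho=\frac{\int f_1(x)\,f_2(x)\,dx}{\sqrt{\int f_1^2(x)\,dx\;\int f_2^2(x)\,dx}}. \tag{1}$$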
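To make definition (1) concrete, here is a minimal Python sketch (not the authors' code) that evaluates Pianka's overlap for two arbitrary densities using composite Simpson quadrature. The helper names `simpson`, `pianka_overlap`, and `exp_pdf` are hypothetical. The sketch assumes a finite interval $[a,b]$ that carries essentially all of the mass of both densities.

```python
import math
from typing import Callable

Density = Callable[[float], float]


def simpson(g: Callable[[float], float], a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule for the integral of g over [a, b] with n subintervals."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pianka_overlap(f1: Density, f2: Density, a: float, b: float, n: int = 20_000) -> float:
    """Pianka's overlap, Equation (1), approximated by quadrature on [a, b]."""
    cross = simpson(lambda x: f1(x) * f2(x), a, b, n)
    sq1 = simpson(lambda x: f1(x) ** 2, a, b, n)
    sq2 = simpson(lambda x: f2(x) ** 2, a, b, n)
    return cross / math.sqrt(sq1 * sq2)


def exp_pdf(theta: float) -> Density:
    """Exponential density with mean theta: f(x) = exp(-x/theta)/theta for x >= 0."""
    return lambda x: math.exp(-x / theta) / theta


value = pianka_overlap(exp_pdf(1.0), exp_pdf(2.0), 0.0, 100.0)
```

For exponential densities with means 1 and 2, `value` should agree closely with the closed form $2\sqrt{\theta_1\theta_2}/(\theta_1+\theta_2)=2\sqrt{2}/3\approx 0.9428$.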
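If a random variable $X$ follows the exponential distribution with mean $\theta$, then the cdf and pdf of $X$ are given by
$$F(x)=1-e^{-x/\theta}\quad\text{and}\quad f(x)=\frac{1}{\theta}\,e^{-x/\theta},\qquad x>0,\ \theta>0,$$
and $X$ is denoted by $X\sim\mathrm{Exp}(\theta)$.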
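Now, let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be two independent random samples taken from $\mathrm{Exp}(\theta_1)$ and $\mathrm{Exp}(\theta_2)$, respectively. Then, Pianka’s overlap coefficient $\rho$ between the two exponential distributions, as defined in Equation (1), is given by
$$\rho=\frac{2\sqrt{\theta_1\theta_2}}{\theta_1+\theta_2}. \tag{2}$$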
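The closed form for the exponential case follows from two elementary integrals. With $f_i(x)=\theta_i^{-1}e^{-x/\theta_i}$, $x>0$, $i=1,2$, the worked steps are:

$$\int_0^\infty f_1(x)f_2(x)\,dx=\frac{1}{\theta_1\theta_2}\int_0^\infty e^{-x\left(\frac{1}{\theta_1}+\frac{1}{\theta_2}\right)}dx=\frac{1}{\theta_1\theta_2}\cdot\frac{\theta_1\theta_2}{\theta_1+\theta_2}=\frac{1}{\theta_1+\theta_2},$$
$$\int_0^\infty f_i^2(x)\,dx=\frac{1}{\theta_i^2}\int_0^\infty e^{-2x/\theta_i}\,dx=\frac{1}{2\theta_i},\qquad i=1,2,$$
$$\rho=\frac{1/(\theta_1+\theta_2)}{\sqrt{\dfrac{1}{2\theta_1}\cdot\dfrac{1}{2\theta_2}}}=\frac{2\sqrt{\theta_1\theta_2}}{\theta_1+\theta_2}.$$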
Let $k=\theta_1/\theta_2$. Then, Pianka’s OVL coefficient in (2) can be written as a function of $k$, as follows:
$$\rho(k)=\frac{2\sqrt{k}}{1+k},\qquad k>0. \tag{3}$$
Several properties of $\rho(k)$ are provided in the following lemma.
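- 1.
$0<\rho(k)\le 1$ for all $k>0$;
- 2.
$\rho(k)=1$ iff $k=1$, i.e., $\theta_1=\theta_2$;
- 3.
$\rho(k)=\rho(1/k)$, since $\dfrac{2\sqrt{1/k}}{1+1/k}=\dfrac{2\sqrt{k}}{k+1}$;
- 4.
$\lim_{k\to 0^+}\rho(k)=\lim_{k\to\infty}\rho(k)=0$;
- 5.
$\rho(k)$ is monotonically increasing for $0<k<1$ and decreasing for $k>1$, with a maximum of 1 at $k=1$.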
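Proof. The results follow directly from Pianka’s overlap formula in (3). For example, $\rho(k)\le 1$ is equivalent to $(1-\sqrt{k})^2\ge 0$, and
$$\frac{d\rho}{dk}=\frac{1-k}{\sqrt{k}\,(1+k)^2},$$
which is positive for $0<k<1$ and negative for $k>1$. □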
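The lemma can also be spot-checked numerically. This Python sketch uses the hypothetical function names `pianka_exp` and `pianka_exp_derivative`. It codes $\rho(k)=2\sqrt{k}/(1+k)$ and its derivative, then asserts properties 1–5 on a logarithmic grid of $k$ values. Checks of this kind illustrate the lemma; they do not prove it.

```python
import math


def pianka_exp(k: float) -> float:
    """Pianka's overlap between Exp(theta1) and Exp(theta2) as a function of k = theta1/theta2."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 2.0 * math.sqrt(k) / (1.0 + k)


def pianka_exp_derivative(k: float) -> float:
    """Derivative d rho / dk = (1 - k) / (sqrt(k) * (1 + k)**2)."""
    return (1.0 - k) / (math.sqrt(k) * (1.0 + k) ** 2)


# Log-spaced grid of k values from 1e-3 to 1e3 (includes k = 1 exactly).
grid = [10.0 ** (e / 10.0) for e in range(-30, 31)]

assert all(0.0 < pianka_exp(k) <= 1.0 for k in grid)                          # property 1
assert pianka_exp(1.0) == 1.0                                                   # property 2
assert all(math.isclose(pianka_exp(k), pianka_exp(1.0 / k)) for k in grid)    # property 3
assert pianka_exp(1e-12) < 1e-5 and pianka_exp(1e12) < 1e-5                     # property 4
assert all((pianka_exp_derivative(k) > 0) == (k < 1.0) for k in grid if k != 1.0)  # property 5
```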
Figure 1 shows Pianka’s overlap coefficient between two exponential distributions as a function of $k$.
In the following section, we find the maximum likelihood estimator of Pianka’s overlap coefficient $\rho$, namely, $\hat\rho$, along with its exact distribution. In addition, we investigate the limiting distribution of $\hat\rho$.
3. Maximum Likelihood Estimator of $\rho$
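It is known that the MLEs of $\theta_1$ and $\theta_2$ based on two samples taken from $\mathrm{Exp}(\theta_1)$ and $\mathrm{Exp}(\theta_2)$ are $\hat\theta_1=\bar X=\frac{1}{n}\sum_{i=1}^n X_i$ and $\hat\theta_2=\bar Y=\frac{1}{m}\sum_{j=1}^m Y_j$, respectively. From the basic properties of the exponential distribution, we have $\bar X\sim G(n,\theta_1/n)$, with $E(\bar X)=\theta_1$ and $\mathrm{Var}(\bar X)=\theta_1^2/n$, and $\bar Y\sim G(m,\theta_2/m)$, with $E(\bar Y)=\theta_2$ and $\mathrm{Var}(\bar Y)=\theta_2^2/m$, where $G(s,\beta)$ stands for the gamma distribution with shape parameter $s$ and scale parameter $\beta$. It follows that $(\bar X,\bar Y)$ is a complete minimal sufficient statistic for $(\theta_1,\theta_2)$. Thus, from the invariance property of the MLE, the MLE of $\rho$ is
$$\hat\rho=\frac{2\sqrt{\bar X\,\bar Y}}{\bar X+\bar Y}. \tag{4}$$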
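The following Python sketch shows the plug-in MLE. It computes $\hat\theta_1=\bar x$ and $\hat\theta_2=\bar y$ and returns $2\sqrt{\bar x\bar y}/(\bar x+\bar y)$. The function name `pianka_mle`, the seed, and the values $\theta_1=1$, $\theta_2=4$ in the example are illustrative choices, not values from the paper. Python's `random.expovariate` takes the rate $1/\theta$, not the mean $\theta$.

```python
import math
import random


def pianka_mle(x, y):
    """MLE of Pianka's overlap for two exponential samples (invariance of the MLE)."""
    t1 = sum(x) / len(x)   # theta1_hat = sample mean of x
    t2 = sum(y) / len(y)   # theta2_hat = sample mean of y
    return 2.0 * math.sqrt(t1 * t2) / (t1 + t2)


# Hypothetical example: theta1 = 1, theta2 = 4, so k = 0.25 and rho = 0.8.
rng = random.Random(2024)
x = [rng.expovariate(1 / 1.0) for _ in range(5000)]   # expovariate expects the rate 1/theta
y = [rng.expovariate(1 / 4.0) for _ in range(5000)]
rho_hat = pianka_mle(x, y)
```

With 5000 observations per sample, `rho_hat` should be close to the true value 0.8.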
3.1. Limiting Distribution of $\hat\rho$
The following theorem shows that the limiting distribution of the MLE of Pianka’s overlap coefficient for two exponential distributions with different scale parameters is normal. Here, $N(\mu,\sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$.
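Theorem 1. Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be two independent random samples from $\mathrm{Exp}(\theta_1)$ and $\mathrm{Exp}(\theta_2)$, respectively, with $\theta_1\neq\theta_2$. Then, as $n,m\to\infty$, the asymptotic distribution of $\hat\rho$ is
$$\hat\rho\ \dot\sim\ N\!\left(\rho,\ \sigma_\rho^2\right),\qquad \sigma_\rho^2=\frac{k(1-k)^2}{(1+k)^4}\left(\frac{1}{n}+\frac{1}{m}\right),\qquad k=\frac{\theta_1}{\theta_2}.$$
Proof. By the asymptotic normality of the MLE, we have
$$\sqrt{n}\,(\hat\theta_1-\theta_1)\xrightarrow{d}N\!\left(0,\,I^{-1}(\theta_1)\right)\quad\text{and}\quad\sqrt{m}\,(\hat\theta_2-\theta_2)\xrightarrow{d}N\!\left(0,\,I^{-1}(\theta_2)\right).$$
That is,
$$\hat\theta_1\ \dot\sim\ N\!\left(\theta_1,\frac{\theta_1^2}{n}\right)\quad\text{and}\quad\hat\theta_2\ \dot\sim\ N\!\left(\theta_2,\frac{\theta_2^2}{m}\right),\ \text{independently},$$
where $I(\theta)=1/\theta^2$ is the Fisher information of a single observation from $\mathrm{Exp}(\theta)$.
We want to find the asymptotic distribution of $\hat\rho$ as $n,m\to\infty$. Using the facts that $\hat\theta_1\xrightarrow{p}\theta_1$ and $\hat\theta_2\xrightarrow{p}\theta_2$, together with the continuous mapping theorem, we obtain $\hat\rho\xrightarrow{p}\rho$.
Next, we derive the asymptotic distribution of $\hat\rho$. Because $\hat\rho=g(\hat\theta_1,\hat\theta_2)$ with $g(\theta_1,\theta_2)=2\sqrt{\theta_1\theta_2}/(\theta_1+\theta_2)$, using an alternative form of the multivariate δ-method [35] we obtain
$$E(\hat\rho)\approx\rho$$
and
$$\mathrm{Var}(\hat\rho)\approx\left(\frac{\partial g}{\partial\theta_1}\right)^2\frac{\theta_1^2}{n}+\left(\frac{\partial g}{\partial\theta_2}\right)^2\frac{\theta_2^2}{m}=\frac{\theta_1\theta_2(\theta_1-\theta_2)^2}{(\theta_1+\theta_2)^4}\left(\frac{1}{n}+\frac{1}{m}\right)=\sigma_\rho^2,$$
since $\dfrac{\partial g}{\partial\theta_1}=\sqrt{\dfrac{\theta_2}{\theta_1}}\,\dfrac{\theta_2-\theta_1}{(\theta_1+\theta_2)^2}$ and $\dfrac{\partial g}{\partial\theta_2}=\sqrt{\dfrac{\theta_1}{\theta_2}}\,\dfrac{\theta_1-\theta_2}{(\theta_1+\theta_2)^2}$. Dividing the numerator and denominator by $\theta_2^4$ gives the expression in terms of $k$. Note that both partial derivatives vanish when $k=1$, so that $\sigma_\rho^2=0$. Therefore, the asymptotic distribution of $\hat\rho$ is $N(\rho,\sigma_\rho^2)$.
□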
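Theorem 1 can be checked by simulation. The following Python sketch compares the delta-method variance $k(1-k)^2(1/n+1/m)/(1+k)^4$ with a Monte Carlo variance of $\hat\rho$. Instead of simulating individual observations, it draws $\bar X$ and $\bar Y$ directly from their gamma distributions $G(n,\theta_1/n)$ and $G(m,\theta_2/m)$ (shape, scale), as in Section 3. The function names, the seed, and the setting $\theta_1=1$, $\theta_2=4$, $n=m=200$ are hypothetical.

```python
import math
import random


def delta_var(k: float, n: int, m: int) -> float:
    """Delta-method variance of rho_hat: k (1 - k)^2 / (1 + k)^4 * (1/n + 1/m)."""
    return k * (1.0 - k) ** 2 / (1.0 + k) ** 4 * (1.0 / n + 1.0 / m)


def mc_var(theta1: float, theta2: float, n: int, m: int, reps: int = 4000, seed: int = 1) -> float:
    """Monte Carlo variance of rho_hat using Xbar ~ G(n, theta1/n) and Ybar ~ G(m, theta2/m)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xbar = rng.gammavariate(n, theta1 / n)   # shape n, scale theta1/n
        ybar = rng.gammavariate(m, theta2 / m)   # shape m, scale theta2/m
        values.append(2.0 * math.sqrt(xbar * ybar) / (xbar + ybar))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)


# Hypothetical setting: k = 0.25, n = m = 200; the ratio should be close to 1.
ratio = mc_var(1.0, 4.0, 200, 200) / delta_var(0.25, 200, 200)
```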
3.2. The Exact Distribution of $\hat\rho$
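To ease the derivation of the distribution of $\hat\rho$, we can rewrite Equation (4) as follows:
$$\hat\rho=\frac{2\sqrt{H}}{1+H}, \tag{5}$$
where $H=\bar X/\bar Y$, which is also the MLE $\hat k$ of $k$. Now, we apply the following steps.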
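- Step 1.
Find the pdf of $H$ by considering the following transformations.
Let $H=\bar X/\bar Y$ and $W=\bar Y$; then, $\bar X=HW$ and $\bar Y=W$. The absolute value of the Jacobian of this transformation is $|J|=w$.
Thus, the joint pdf of $H$ and $W$ is
$$f_{H,W}(h,w)=\frac{(n/\theta_1)^n\,(m/\theta_2)^m}{\Gamma(n)\,\Gamma(m)}\,h^{n-1}\,w^{n+m-1}\exp\!\left\{-w\left(\frac{nh}{\theta_1}+\frac{m}{\theta_2}\right)\right\},\qquad h,w>0.$$
By integrating $w$ out, the pdf of $H$ is
$$f_H(h)=\frac{\Gamma(n+m)}{\Gamma(n)\,\Gamma(m)}\,\frac{(n/\theta_1)^n\,(m/\theta_2)^m\,h^{n-1}}{\left(nh/\theta_1+m/\theta_2\right)^{n+m}},\qquad h>0.$$
Consequently, the pdf of $H$ is
$$f_H(h)=\frac{c^n\,h^{n-1}}{B(n,m)\,(1+ch)^{n+m}},\qquad h>0,$$
where $c=\dfrac{n\theta_2}{m\theta_1}=\dfrac{n}{mk}$ and $B(\cdot,\cdot)$ is the beta function.
- Step 2.
Solve for $H$ in terms of $\hat\rho$.
From Equation (5) and the transformation $\hat\rho=r$, we have
$$r(1+H)=2\sqrt{H}.$$
Now, let $u=\sqrt{H}$, which allows this equation to be rewritten as the quadratic equation
$$r u^2-2u+r=0. \tag{6}$$
The two solutions of Equation (6) are
$$u_1=\frac{1-\sqrt{1-r^2}}{r}$$
and
$$u_2=\frac{1+\sqrt{1-r^2}}{r},$$
so that $H=h_1(r)=u_1^2$ or $H=h_2(r)=u_2^2=1/h_1(r)$, with $h_1(r)<1<h_2(r)$ for $0<r<1$.
- Step 3.
The pdf of $\hat\rho$ is
$$f_{\hat\rho}(r)=f_H\big(h_1(r)\big)\left|\frac{dh_1}{dr}\right|+f_H\big(h_2(r)\big)\left|\frac{dh_2}{dr}\right|=\frac{2r}{\sqrt{1-r^2}}\left[\frac{f_H\big(h_1(r)\big)}{\big(1+\sqrt{1-r^2}\big)^2}+\frac{f_H\big(h_2(r)\big)}{\big(1-\sqrt{1-r^2}\big)^2}\right],\qquad 0<r<1,$$
where $h_1(r)=\dfrac{1-\sqrt{1-r^2}}{1+\sqrt{1-r^2}}$ and $h_2(r)=1/h_1(r)$.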
Figure 2 shows plots of the density of $\hat\rho$ for different values of $n$, $m$, and $k$. As the figure shows, the pdf of $\hat\rho$ can be bell-shaped, bimodal, or J-shaped.
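The three steps above can be sketched in Python. The sketch assumes $k=\theta_1/\theta_2$ and $c=n/(mk)$. To avoid cancellation near $r=0$, it writes $h_1=r^2/(1+\sqrt{1-r^2})^2$, which equals $(1-\sqrt{1-r^2})/(1+\sqrt{1-r^2})$. The function names are hypothetical. The helper `total_mass` checks that the density integrates to one. It substitutes $r=\sin t$ so that the factor $dr=\cos t\,dt$ cancels the integrable $1/\sqrt{1-r^2}$ singularity at $r=1$.

```python
import math


def pdf_H(h: float, n: int, m: int, k: float) -> float:
    """Step 1: pdf of H = Xbar/Ybar, c^n h^(n-1) / (B(n, m) (1 + c h)^(n+m)) with c = n/(m k)."""
    c = n / (m * k)
    log_beta = math.lgamma(n) + math.lgamma(m) - math.lgamma(n + m)
    log_pdf = n * math.log(c) + (n - 1) * math.log(h) - log_beta - (n + m) * math.log1p(c * h)
    return math.exp(log_pdf)


def pdf_rho_hat(r: float, n: int, m: int, k: float) -> float:
    """Steps 2-3: exact pdf of rho_hat at 0 < r < 1, summing over both roots h1 < 1 < h2."""
    s = math.sqrt(1.0 - r * r)
    h1 = r * r / (1.0 + s) ** 2                  # u1**2 = (1 - s)/(1 + s), written stably
    h2 = 1.0 / h1                                # u2**2, since u1 * u2 = 1
    jac1 = 2.0 * r / (s * (1.0 + s) ** 2)        # |dh1/dr|
    jac2 = 2.0 * (1.0 + s) ** 2 / (s * r ** 3)   # |dh2/dr| = 2r / (s (1 - s)^2)
    return pdf_H(h1, n, m, k) * jac1 + pdf_H(h2, n, m, k) * jac2


def total_mass(n: int, m: int, k: float, steps: int = 20_000) -> float:
    """Integrate pdf_rho_hat over (0, 1) with r = sin(t); dr = cos(t) dt cancels the 1/s factor."""
    width = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width  # midpoint rule never evaluates t = 0 or t = pi/2
        total += pdf_rho_hat(math.sin(t), n, m, k) * math.cos(t)
    return total * width
```

A quick sanity check: for $n=m=1$ and $k=1$, $H$ has pdf $1/(1+h)^2$, and the density of $\hat\rho$ reduces to $r/\sqrt{1-r^2}$.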
4. Interval Estimation of $\rho$
In this section, we construct interval estimates of Pianka’s overlap coefficient $\rho$ using both the asymptotic and the transformation techniques. In Section 6, we perform a Monte Carlo analysis to compare the effectiveness of these two approaches.
4.1. Asymptotic Technique
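A large-sample confidence interval for $\rho$ can be easily calculated. From Theorem 1 and the continuous mapping theorem ($\hat k=\bar X/\bar Y$ is consistent for $k$, so Slutsky's theorem allows $k$ to be replaced by $\hat k$ in the asymptotic variance), we have
$$\frac{\hat\rho-\rho}{\hat\sigma_\rho}\ \dot\sim\ N(0,1),\qquad \hat\sigma_\rho^2=\frac{\hat k(1-\hat k)^2}{(1+\hat k)^4}\left(\frac{1}{n}+\frac{1}{m}\right).$$
Hence, a $100(1-\alpha)\%$ large-sample confidence interval for $\rho$ is
$$\hat\rho\pm z_{1-\alpha/2}\,\hat\sigma_\rho,$$
where $z_\gamma$ is the $\gamma$th percentile of the standard normal distribution.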
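The following Python sketch implements the large-sample interval. The function name `asymptotic_ci` is hypothetical, and $z_{1-\alpha/2}$ is taken from the standard library's `statistics.NormalDist`. As written, the interval is not truncated, so its upper limit can exceed 1 for small samples. Truncation is not discussed in the text.

```python
import math
from statistics import NormalDist


def asymptotic_ci(x, y, alpha=0.05):
    """Large-sample 100(1 - alpha)% interval rho_hat -/+ z_{1-alpha/2} * sigma_hat."""
    n, m = len(x), len(y)
    k_hat = (sum(x) / n) / (sum(y) / m)                # MLE of k = theta1/theta2
    rho_hat = 2.0 * math.sqrt(k_hat) / (1.0 + k_hat)   # MLE of rho
    var_hat = k_hat * (1.0 - k_hat) ** 2 / (1.0 + k_hat) ** 4 * (1.0 / n + 1.0 / m)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # z_{1 - alpha/2}
    half_width = z * math.sqrt(var_hat)
    return rho_hat - half_width, rho_hat + half_width
```

For example, with the tiny samples `[1.0, 2.0, 3.0]` and `[2.0, 4.0, 6.0]` ($\hat k=0.5$), the interval is centred at $\hat\rho\approx 0.9428$ and its upper limit exceeds 1.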
4.2. Transformation Technique
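From the definition in Equation (3), $\rho=2\sqrt{k}/(1+k)$, and the MLE of $k$ is $\hat k=\bar X/\bar Y$. From Section 3 and the relationship between the gamma and chi-square distributions, it is easy to conclude that $2n\bar X/\theta_1\sim\chi^2_{2n}$ and $2m\bar Y/\theta_2\sim\chi^2_{2m}$. Thus,
$$\frac{\hat k}{k}=\frac{\left(2n\bar X/\theta_1\right)/(2n)}{\left(2m\bar Y/\theta_2\right)/(2m)}$$
has an $F$-distribution with $(2n,2m)$ degrees of freedom.
Let $L$ and $U$ be the lower and upper confidence limits for $k$, respectively. From the concept of the confidence interval, we have
$$P\!\left(F_{\alpha/2}(2n,2m)\le\frac{\hat k}{k}\le F_{1-\alpha/2}(2n,2m)\right)=P(L\le k\le U)=1-\alpha. \tag{7}$$
By solving (7) for $k$, we obtain the values of $L$ and $U$ as
$$L=\frac{\hat k}{F_{1-\alpha/2}(2n,2m)}$$
and
$$U=\frac{\hat k}{F_{\alpha/2}(2n,2m)}.$$
However, the overlap coefficient $\rho$ is not a monotone function of $k$: it increases on $(0,1]$ and decreases on $[1,\infty)$. Therefore, using the transformation technique, we can obtain a $100(1-\alpha)\%$ confidence interval for $\rho$, as follows:
$$\mathrm{CI}_\rho=\begin{cases}\big[\rho(L),\ \rho(U)\big], & U\le 1,\\[2pt] \big[\rho(U),\ \rho(L)\big], & L\ge 1,\\[2pt] \big[\min\{\rho(L),\rho(U)\},\ 1\big], & L<1<U,\end{cases}$$
where $F_\gamma(2n,2m)$ is the $\gamma$th percentile of the $F$-distribution with $(2n,2m)$ degrees of freedom.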
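The following sketch of the transformation interval uses only the Python standard library. Because both degrees of freedom are even, the $F(2n,2m)$ cdf can be computed exactly from the identity $P(F\le x)=P\big(\mathrm{Binomial}(n+m-1,p)\ge n\big)$ with $p=nx/(nx+m)$, and percentiles are then found by bisection. All function names are hypothetical. The binomial coefficients overflow a float once $n+m$ reaches roughly 1000. For larger samples, a library routine such as SciPy's `scipy.stats.f.ppf(q, dfn, dfd)` would be the natural substitute.

```python
import math


def f_cdf(x: float, n: int, m: int) -> float:
    """CDF of F(2n, 2m) at x via P(F <= x) = P(Binomial(n+m-1, p) >= n), p = n x/(n x + m)."""
    if x <= 0.0:
        return 0.0
    p = n * x / (n * x + m)
    trials = n + m - 1
    return sum(math.comb(trials, j) * p ** j * (1.0 - p) ** (trials - j)
               for j in range(n, trials + 1))


def f_ppf(q: float, n: int, m: int, tol: float = 1e-12) -> float:
    """q-th quantile of F(2n, 2m), 0 < q < 1, found by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n, m) < q:  # grow the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n, m) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rho(k: float) -> float:
    """Pianka's overlap for two exponentials as a function of k = theta1/theta2."""
    return 2.0 * math.sqrt(k) / (1.0 + k)


def transformation_ci(x, y, alpha=0.05):
    """100(1 - alpha)% interval for rho mapped from the exact F-based interval [L, U] for k."""
    n, m = len(x), len(y)
    k_hat = (sum(x) / n) / (sum(y) / m)
    L = k_hat / f_ppf(1.0 - alpha / 2.0, n, m)
    U = k_hat / f_ppf(alpha / 2.0, n, m)
    if U <= 1.0:                      # rho is increasing on (0, 1]
        return rho(L), rho(U)
    if L >= 1.0:                      # rho is decreasing on [1, inf)
        return rho(U), rho(L)
    return min(rho(L), rho(U)), 1.0   # [L, U] contains k = 1, where rho reaches 1
```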
5. Bayes Estimator of $\rho$
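Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be two independent random samples taken from $\mathrm{Exp}(\theta_1)$ and $\mathrm{Exp}(\theta_2)$, respectively. Let $\theta_1\sim IG(a_1,b_1)$ and $\theta_2\sim IG(a_2,b_2)$ be independent, where $IG(a,b)$ is the inverse gamma distribution with density $\pi(\theta)=\frac{b^a}{\Gamma(a)}\,\theta^{-a-1}e^{-b/\theta}$, $\theta>0$.
Using the facts that $\bar X\sim G(n,\theta_1/n)$ and $\bar Y\sim G(m,\theta_2/m)$, the posterior distribution of $(\theta_1,\theta_2)$ given $(\bar x,\bar y)$ is
$$\pi(\theta_1,\theta_2\mid\bar x,\bar y)\propto\theta_1^{-n}e^{-n\bar x/\theta_1}\,\theta_2^{-m}e^{-m\bar y/\theta_2}\,\pi_1(\theta_1)\,\pi_2(\theta_2),$$
where $\pi_1(\theta_1)$ and $\pi_2(\theta_2)$ are the prior distributions of $\theta_1$ and $\theta_2$, respectively.
Then,
$$\theta_1\mid\bar x\sim IG(A_1,B_1)\quad\text{and}\quad\theta_2\mid\bar y\sim IG(A_2,B_2),\ \text{independently},$$
where $A_1=a_1+n$, $B_1=b_1+n\bar x$, $A_2=a_2+m$, and $B_2=b_2+m\bar y$.
The Bayes estimator $\tilde\rho$ (the posterior mean of $\rho$) is
$$\tilde\rho=E(\rho\mid\bar x,\bar y)=\int_0^\infty\!\!\int_0^\infty\frac{2\sqrt{\theta_1\theta_2}}{\theta_1+\theta_2}\,\pi(\theta_1\mid\bar x)\,\pi(\theta_2\mid\bar y)\,d\theta_1\,d\theta_2.$$
This estimate does not have a simple closed form, so we compute it numerically. For the asymptotic distribution of the Bayes estimator $\tilde\rho$, the Bernstein–von Mises theorem [36] implies that the Bayes estimator and the maximum likelihood estimator are asymptotically equivalent for large sample sizes.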
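The paper does not specify how the numerical step is carried out. One straightforward option is Monte Carlo, sketched here in Python. It assumes the $IG(a,b)$ parameterization with density proportional to $\theta^{-a-1}e^{-b/\theta}$, so the posteriors are $IG(a_1+n,\,b_1+\sum x_i)$ and $IG(a_2+m,\,b_2+\sum y_j)$, and it samples them as reciprocals of gamma variates. The function name, the hyperparameter arguments, and the number of draws are hypothetical.

```python
import math
import random


def bayes_pianka(x, y, a1, b1, a2, b2, draws=100_000, seed=0):
    """Monte Carlo posterior mean of rho under independent IG(a1, b1) and IG(a2, b2) priors."""
    rng = random.Random(seed)
    A1, B1 = a1 + len(x), b1 + sum(x)   # theta1 | data ~ IG(A1, B1)
    A2, B2 = a2 + len(y), b2 + sum(y)   # theta2 | data ~ IG(A2, B2)
    total = 0.0
    for _ in range(draws):
        # If G ~ Gamma(shape A, scale 1/B), then 1/G ~ IG(A, B).
        theta1 = 1.0 / rng.gammavariate(A1, 1.0 / B1)
        theta2 = 1.0 / rng.gammavariate(A2, 1.0 / B2)
        total += 2.0 * math.sqrt(theta1 * theta2) / (theta1 + theta2)
    return total / draws
```

The Monte Carlo error shrinks like $1/\sqrt{\text{draws}}$. Alternatively, the double integral can be reduced to a one-dimensional one, because $\rho$ depends on $(\theta_1,\theta_2)$ only through $k$.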
In the next section, we present a simulation study comparing the two approaches to interval estimation of Pianka’s overlap coefficient described in Section 4. We also investigate the performance of the maximum likelihood estimator ($\hat\rho$) and the Bayes estimator ($\tilde\rho$) of Pianka’s overlap coefficient described in Section 3 and Section 5.
6. Simulation Study
To compare the two approaches to interval estimation of Pianka’s overlap coefficient, we consider two criteria:
- 1.
An interval estimation procedure has a “valid confidence level” when, in repeated sampling, the actual coverage of the true but unknown parameter is close to the nominal confidence level;
- 2.
A valid interval estimation procedure is “valid length-efficient” if the expected length of its intervals is short.
To compare the point estimators, we use the bias, mean square error (MSE), and efficiency of each estimator. To apply these criteria, we conducted a simulation study as follows:
- 1.
A random sample of size $n$ is generated from $\mathrm{Exp}(\theta_1)$. This random sample is used to calculate $\hat\theta_1=\bar x$.
- 2.
A random sample of size $m$ is generated from $\mathrm{Exp}(\theta_2)$. This random sample is used to calculate $\hat\theta_2=\bar y$.
- 3.
The lower limit $L$, upper limit $U$, and width $W=U-L$ of each interval are calculated with a nominal confidence level of $100(1-\alpha)\%$.
- 4.
The MLE ($\hat\rho$) and Bayes ($\tilde\rho$) estimators are calculated.
- 5.
Steps 1–4 above are repeated 10,000 times.
- 6.
The average of the lower limits (AL), median of the lower limits (ML), average of the upper limits (AU), median of the upper limits (MU), average width (AW), and median width (MW) are calculated for each interval.
- 7.
The percentage of the 10,000 intervals generated in Step 3 that satisfy $L\le\rho\le U$ is called the “coverage probability”.
- 8.
Histogram plots for $\hat\rho$ and $\tilde\rho$ are generated.
- 9.
The bias and MSE are calculated for $\hat\rho$ and $\tilde\rho$, and then the efficiency is calculated.
- 10.
Steps 1–9 above are repeated for each combination of the sample sizes $n$ and $m$ and for each value of $k$ considered.
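The paper's simulations were run in Mathematica. The following Python sketch is a scaled-down illustration of Steps 1–7 and 9, not a reproduction of the authors' code. It covers only the MLE and the asymptotic interval and leaves out the Bayes estimator and the transformation interval to stay short. The function name `simulate`, the seed, and the dictionary keys are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def simulate(theta1, theta2, n, m, reps=10_000, alpha=0.05, seed=123):
    """Monte Carlo study of the MLE and the asymptotic interval (a subset of Steps 1-9)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    k = theta1 / theta2
    rho = 2.0 * math.sqrt(k) / (1.0 + k)
    covered, widths, errors = 0, [], []
    for _ in range(reps):                                            # Step 5
        x = [rng.expovariate(1.0 / theta1) for _ in range(n)]        # Step 1
        y = [rng.expovariate(1.0 / theta2) for _ in range(m)]        # Step 2
        k_hat = (sum(x) / n) / (sum(y) / m)
        rho_hat = 2.0 * math.sqrt(k_hat) / (1.0 + k_hat)             # Step 4 (MLE only)
        se = math.sqrt(k_hat * (1.0 - k_hat) ** 2 / (1.0 + k_hat) ** 4 * (1.0 / n + 1.0 / m))
        lower, upper = rho_hat - z * se, rho_hat + z * se            # Step 3 (asymptotic)
        covered += lower <= rho <= upper                             # Step 7
        widths.append(upper - lower)
        errors.append(rho_hat - rho)
    return {
        "CP": covered / reps,                                        # coverage probability
        "AW": sum(widths) / reps,                                    # Step 6
        "MW": median(widths),
        "bias": sum(errors) / reps,                                  # Step 9
        "MSE": sum(e * e for e in errors) / reps,
    }
```

For example, `simulate(1.0, 2.0, 50, 50)` runs 10,000 replications at $k=0.5$. The remaining estimators can be added inside the same loop.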
Mathematica was used to simulate each of the interval estimation and point estimation methods for Pianka’s overlap measure $\rho$.
Table 1, Table 2, and Table 3 show the simulated interval estimators obtained with the asymptotic and transformation techniques from exponential random samples, with a nominal confidence level of $100(1-\alpha)\%$. These results show that the average width (AW) is almost the same as the median width (MW) and that the transformation method consistently yields narrower confidence intervals. Moreover, the transformation method appears to be effective in terms of coverage probability, except for values of $k$ around one combined with very small sample sizes.
As the sample size increases, the coverage probabilities of both techniques approach the nominal value. The asymptotic technique achieves good coverage, which improves as $k$ approaches one; however, when and for small sample sizes the transformation technique performs exceptionally well.
Figure 3,
Figure 4 and
Figure 5 plot the MLE and Bayes estimators of
for
and
.
Table 4,
Table 5 and
Table 6 present the results of the simulation study carried out to compare the MLE and Bayes estimators for the Pianka’s overlap coefficient. Based on these results, which only consider the values of
the absolute values of the bias are in all cases less than
and decrease as the sample size increases. It appears that the MLE estimator works well, and the Bayes estimator seems to work quite well at
However, for
the calculations are provided in terms of
for the Pianka’s overlap measure. For sample sizes larger than 30, the bias and MSE are quite close to zero.
The estimates of the bias are plotted in
Figure 6 for the MLE and Bayes estimators. From these results, it can be seen that the bias decreases significantly as the sample size increases.
Figure 6a shows that the MLE generally underestimates the true Pianka’s overlap; however, for very small values of $k$ and small sample sizes, it overestimates the true overlap. Furthermore, the bias of the MLE increases as $k$ increases.
The estimates of MSE are plotted in
Figure 7 for the MLE and Bayes estimators. From these results, it can be seen that the MSE decreases significantly as the sample size increases.
Figure 7a shows that, for small values of $k$ and small sample sizes, the MSE of the MLE increases markedly. For the Bayes estimator, Figure 6b and Figure 7b show that both the bias and the MSE decrease as $k$ increases.
7. Conclusions
We have estimated Pianka’s overlap coefficient for two exponential populations with different scale parameters using the MLE and Bayes estimators, and we compared these estimators in terms of bias and MSE in a simulation study. In addition, we constructed confidence intervals for Pianka’s overlap measure using the asymptotic and transformation techniques, and we compared them using the “valid confidence level” and “valid length-efficiency” criteria.
Through a Monte Carlo analysis, we investigated the accuracy of these estimators and intervals for Pianka’s overlap coefficient. In conclusion, no single approach appears to be ideal in all settings. A transformation procedure is recommended when and the sample size is small. The asymptotic approach can be used if computers are available. For larger sample sizes and , the transformation approach is recommended.