1. Introduction
Information theory provides a principled framework for quantifying the uncertainty and mutual information of random variables, in which entropy plays a central role. Shannon [1] established the concept of statistical entropy, namely Shannon's entropy, which measures the average amount of missing information in a random source, as a fundamental idea in information theory. The relationship between Shannon's entropy and the information available in classical and quantum systems is well established. In addition, Shannon's entropy admits statistical mechanics interpretations: it is typically used in statistical mechanics to represent the entropy of classical systems with configurations drawn from canonical ensembles. Moreover, Shannon's entropy is useful in many disciplines, such as computer science, molecular biology, hydrology, and meteorology, for solving scientific problems involving uncertainty in random quantities. For example, molecular biologists use Shannon's entropy to study trends in gene sequences. Recently, Saraiva [2] illustrated Shannon's entropy in biological diversity and student migration studies and provided an intuitive introduction to the concept. For more details, one may refer to the excellent monograph by Cover [3] on the theory and implications of entropy with applications in various disciplines. Shannon's entropy is one of the most widely used entropy measures in statistics and information theory.
Suppose a random variable X has probability density function (pdf) f(x); then, the Shannon's entropy of the random variable X is given by

H(X) = −∫ f(x) ln f(x) dx,    (1)

where the integral is taken over the support of f(x).
Please note that the entropy in Equation (1) is a continuous entropy, and the measurement is relative to the coordinate system ([1], Section 20). In other words, the continuous entropy is not invariant under a change of variable in general. However, the entropy under a change of variable can be expressed as the original entropy less the expected logarithm of the Jacobian of the transformation. Recently, researchers have studied parametric statistical inference for measuring the entropy under different lifetime models based on complete or censored data. For example, entropy for several shifted exponential populations was studied by Kayal and Kumar [4]. Cho et al. [5] studied entropy estimation for the Rayleigh distribution under doubly generalized Type-II hybrid censored data, Du et al. [6] developed statistical inference for the information entropy of the log-logistic distribution based on progressively Type-I interval-censored data, and Liu and Gui [7] investigated entropy estimation for the Lomax distribution based on generalized progressively hybrid censoring. Yu et al. [8] developed statistical inference on Shannon's entropy for the inverse Weibull distribution based on progressively first-failure censored data.
The Maxwell–Boltzmann distribution, popularly known as the Maxwell distribution (MWD), was initially developed by James Clerk Maxwell and Ludwig Boltzmann in the late 1800s as a distribution of velocities in a gas at a specific temperature. In this paper, we consider the MWD due to its simplicity and nice physical interpretation. For further information about the reliability characteristics of the MWD, see Bekker and Roux [9]. The MWD is frequently employed in physics and chemistry for several reasons, including the fact that it can be used to describe several important gas properties, such as pressure and diffusion. The MWD has also become a well-known lifetime model in recent years, and many researchers have studied this distribution in depth for modeling lifetime data. As Bekker and Roux [9] pointed out, the MWD is useful in life-testing and reliability studies because of its desirable properties, especially for situations where the assumption of a constant failure rate is not realistic. Recently, in a reliability report on gallium nitride (GaN) power devices, Pozo et al. [10] found that the Maxwell–Boltzmann distribution fits the hot carrier distribution in the high-energy regime tails well.
The classical estimation of the model parameters of the MWD has been studied by [11,12] for complete and censored samples, respectively. Following this, Krishna and Malik [13] studied the MWD under progressive censoring, Krishna et al. [14] discussed the MWD under randomly censored data, Tomer and Panwar [15] developed an estimation procedure for the MWD based on Type-I progressive hybrid censored data, Panwar and Tomer [16] discussed robust Bayesian analysis for the MWD, and Kumari et al. [17] discussed classical and Bayesian estimation for the MWD based on adaptive progressive Type-II censored data.
The pdf and cumulative distribution function (cdf), respectively, for a random variable X that follows the Maxwell distribution (MWD) with parameter α > 0 (denoted as X ~ MWD(α)) are given by

f(x; α) = (4/√π) α^(−3/2) x² exp(−x²/α),  x > 0,    (2)

and

F(x; α) = P(3/2, x²/α),  x > 0,

where P(s, z) = (1/Γ(s)) ∫₀^z t^(s−1) e^(−t) dt is the incomplete gamma ratio. The MWD has an increasing failure rate function [13]. The MWD is a special case of many generalized distributions, such as the generalized gamma distribution [18], the generalized Rayleigh distribution ([19], p. 452), and the generalized Weibull distribution [20].
Assume that the lifetimes of the items of interest follow the MWD with pdf in Equation (2); then, from Equation (1), the Shannon's entropy is given by

H = γ − 1/2 + (1/2) ln(π α),

where γ ≈ 0.5772 is the Euler–Mascheroni constant.
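Since all computations in this paper are carried out in R (see Section 4), a quick numerical check of the closed form above can be written in a few lines. The following sketch assumes the MWD parameterization in Equation (2) and compares the closed-form entropy with a direct numerical evaluation of Equation (1); the function names are illustrative.

```r
# Shannon's entropy of the MWD: closed form vs. numerical integration of Equation (1).
# Assumed parameterization: f(x; alpha) = (4 / sqrt(pi)) * alpha^(-3/2) * x^2 * exp(-x^2 / alpha).
dmaxwell <- function(x, alpha) (4 / sqrt(pi)) * alpha^(-3 / 2) * x^2 * exp(-x^2 / alpha)

entropy_closed <- function(alpha) -digamma(1) - 0.5 + 0.5 * log(pi * alpha)  # -digamma(1) = Euler-Mascheroni constant

entropy_numeric <- function(alpha) {
  integrand <- function(x) {
    fx <- dmaxwell(x, alpha)
    ifelse(fx > 0, -fx * log(fx), 0)          # -f(x) ln f(x), guarding log(0) at x = 0
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

alpha <- 1
c(closed_form = entropy_closed(alpha), numerical = entropy_numeric(alpha))  # both approx. 0.6496
```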
Censoring often occurs in reliability engineering or life-testing procedures when the actual lifetimes of the items of interest are not observed; for instance, when subjects or experimental units are removed from the experiment intentionally or accidentally. Pre-planned censoring saves time and cost in experiments, and it has been applied in various fields, including but not limited to engineering, survival analysis, clinical trials, and medical research. Several censoring schemes, such as the Type-I and Type-II censoring schemes (also known as time and item censoring schemes, respectively), are often utilized in life-testing experiments. When products or items have a long lifespan, a life-testing experiment may require a long time to complete, even with Type-I or Type-II censoring. For these situations, Balasooriya [21] established the first-failure censoring plan, which provides a time- and cost-saving strategy for life-testing experiments. In this censoring scheme, an experimenter examines n × k units by grouping them into n groups, each having k items, and then tests all the groups jointly until the first failure is observed in each group. Although the first-failure censoring plan saves time and cost, it does not allow for the periodic removal of surviving units or subjects during the life-testing experiment. Therefore, the progressive censoring scheme proposed by Cohen [22], which allows for removing units throughout the life-testing experiment, can be considered. For this reason, first-failure and progressive censoring were combined to create a more flexible life-testing strategy known as the progressive first-failure censoring (PFFC) scheme [23]. Owing to its compatibility with other censoring plans, the PFFC scheme has gained much attention in the literature. For instance, the Lindley distribution based on PFFC data was studied by Dube et al. [24], the exponentiated exponential distribution based on PFFC data was studied by Mohammed et al. [25], and the estimation of stress-strength reliability for the generalized inverted exponential distribution based on PFFC data was studied by Krishna et al. [26]. Kayal et al. [27] studied the PFFC scheme and developed statistical inference for the Chen distribution. Subsequently, Saini et al. [28] studied the estimation of stress-strength reliability for the generalized Maxwell distribution based on PFFC data, and Kumar et al. [29] discussed reliability estimation for the inverse Pareto distribution based on PFFC data.
The PFFC scheme can be described as follows. Suppose there are n independent groups, each with k items, placed on a life test, and we have a prespecified number of observed failures m and a prefixed progressive censoring plan R = (R₁, R₂, …, R_m), where R_i (i = 1, 2, …, m) is the prefixed number of groups without item failure to be removed at the i-th failure, such that n = m + R₁ + R₂ + ⋯ + R_m. In other words, the test is terminated when the m-th failure is observed. When the first failure occurs at time X₁:m:n:k, R₁ groups without item failure and the group in which the first failure is observed are removed from the experiment. When the second failure occurs at time X₂:m:n:k, R₂ groups without item failure and the group containing the second failure are removed from the experiment, and so on. Finally, when the m-th failure occurs at time X_m:m:n:k, the remaining R_m groups without item failure and the group containing the m-th failure are removed from the experiment. Consequently, the observed failure times, X₁:m:n:k < X₂:m:n:k < ⋯ < X_m:m:n:k, are called progressively first-failure censored order statistics with the progressive censoring plan R = (R₁, R₂, …, R_m).
Figure 1 represents the schematic diagram of the PFFC scheme. Note that the PFFC scheme includes the following special cases: (i) it reduces to the complete sample case when k = 1 and R = (0, 0, …, 0); (ii) it reduces to a conventional Type-II censoring plan if k = 1, R₁ = R₂ = ⋯ = R_{m−1} = 0, and R_m = n − m; (iii) it becomes a progressive Type-II censoring plan when k = 1; and (iv) it reduces to a first-failure censoring plan when m = n and R = (0, 0, …, 0). Moreover, a practical setting for the PFFC scheme is a particular setup of a parallel-series system in which n homogeneous subsystems, such as batches of electric bulbs, are connected in parallel, and each batch has k bulbs or electronic components connected in series. For the testing procedure, or for collecting lifetimes, the same mechanism presented in
Figure 1 can be used.
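To make the scheme concrete, the following R sketch generates a PFFC sample using the fact that, for group size k, a PFFC sample is distributed as a progressive Type-II censored sample from the distribution with cdf 1 − (1 − F(x))^k (cf. [23]); the uniform progressive order statistics are obtained by a standard transformation of independent uniform variates. The MWD parameterization of Equation (2) is assumed, and the function names and the chosen scheme are illustrative.

```r
# Generate a PFFC sample of size m from the MWD under scheme R with group size k.
# A PFFC sample is distributed as a progressive Type-II censored sample from the
# cdf F*(x) = 1 - (1 - F(x))^k, so we first generate uniform progressive order
# statistics and then invert F*. Assumed MWD parameterization as in Equation (2).
qmaxwell <- function(p, alpha) sqrt(alpha * qgamma(p, shape = 3 / 2))  # MWD quantile function

rpffc_maxwell <- function(alpha, n, m, k, R) {
  stopifnot(length(R) == m, n == m + sum(R))
  w <- runif(m)
  v <- sapply(1:m, function(i) w[i]^(1 / (i + sum(R[(m - i + 1):m]))))
  u <- 1 - cumprod(rev(v))                   # uniform progressive Type-II order statistics
  qmaxwell(1 - (1 - u)^(1 / k), alpha)       # invert F*(x) = 1 - (1 - F(x))^k
}

set.seed(123)
R <- c(rep(0, 9), 10)                        # remove the remaining 10 groups at the last failure
x <- rpffc_maxwell(alpha = 1, n = 20, m = 10, k = 2, R = R)
x                                            # m = 10 increasing PFFC failure times
```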
Let X₁:m:n:k < X₂:m:n:k < ⋯ < X_m:m:n:k be a PFFC sample drawn from a continuous population with cdf F(x) and pdf f(x). For notational convenience, we suppress the subscripts in the observed data and denote the observed data as x = (x₁, x₂, …, x_m). The likelihood function can be expressed as [23]

L = A k^m ∏_{i=1}^{m} f(x_i) [1 − F(x_i)]^{k(R_i + 1) − 1},    (6)

where A = n (n − R₁ − 1) (n − R₁ − R₂ − 2) ⋯ (n − R₁ − R₂ − ⋯ − R_{m−1} − m + 1).
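A minimal R sketch of how the log-likelihood in Equation (6) can be evaluated and maximized numerically is given below (the formal maximum likelihood theory is developed in Section 2). It assumes the MWD parameterization of Equation (2); the constant A k^m is omitted because it does not involve α, and the function names are illustrative.

```r
# Log-likelihood of alpha under PFFC data (Equation (6), constants omitted) and its MLE.
dmaxwell <- function(x, alpha) (4 / sqrt(pi)) * alpha^(-3 / 2) * x^2 * exp(-x^2 / alpha)
logS_maxwell <- function(x, alpha)                    # log of the survival function 1 - F(x)
  pgamma(x^2 / alpha, shape = 3 / 2, lower.tail = FALSE, log.p = TRUE)

loglik_pffc <- function(alpha, x, R, k)
  sum(log(dmaxwell(x, alpha))) + sum((k * (R + 1) - 1) * logS_maxwell(x, alpha))

mle_pffc <- function(x, R, k, interval = c(1e-6, 100)) {
  alpha_hat <- optimize(loglik_pffc, interval, x = x, R = R, k = k, maximum = TRUE)$maximum
  H_hat <- -digamma(1) - 0.5 + 0.5 * log(pi * alpha_hat)   # MLE of entropy by invariance
  c(alpha = alpha_hat, entropy = H_hat)
}

# Example with the simulated PFFC data from the previous sketch:
# mle_pffc(x, R = R, k = 2)
```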
To draw any inference from the available data, we consider a statistical probability model involving one or more parameters, and we first need to estimate the model parameter(s). The two popular estimation approaches in the literature are the classical (frequentist) and the Bayesian approaches. Bayesian estimation can be used when prior information is available. In this study, we consider both estimation approaches and compare them. The main objective of this study is to estimate the associated parameter and Shannon's entropy of the MWD based on progressively first-failure censored data. Through this study, we aim to provide practical value in applied statistics, especially in the areas of reliability and lifetime data analysis.
The rest of this paper is organized as follows.
Section 2 develops the frequentist estimation techniques, including the maximum likelihood estimation and asymptotic and bootstrap confidence intervals.
Section 3 is devoted to Bayesian estimation techniques using the Tierney-Kadane (T-K) approximation and Markov Chain Monte Carlo (MCMC) methods. In
Section 4, a Monte Carlo simulation study is used to evaluate the performance of the estimation procedures developed in this manuscript. A numerical example is provided in
Section 5 to illustrate the methodologies developed in this manuscript. Finally, in
Section 6, some concluding remarks are presented.
3. Bayesian Estimation Approach
In this section, we derive the Bayes estimators under the linear exponential (LINEX) loss function and construct the highest posterior density (HPD) credible intervals for the parameter α and Shannon's entropy H. For details of Bayesian statistical inference and data analysis methods, one may refer to the books by Box and Tiao [40] and Gelman et al. [41]. The Bayesian approach to reliability analysis allows previous knowledge of lifetime parameters, technical knowledge of failure mechanisms, and experimental data to be incorporated into the inferential procedure. As pointed out by Tian et al. [42], employing the Bayesian approach in reliability analysis has the advantages of making statistical inferences using information from prior experience with the failure mechanism or physics-of-failure and of avoiding inferences based on possibly inaccurate large-sample theory in the frequentist approach. For more details about Bayesian inference methods and specifying prior distributions in reliability applications, one may refer to Tian et al. [42]. As a result, Bayesian techniques are frequently applied to small samples, which is highly advantageous in the case of costly life-testing experiments. For Bayesian estimation, the inverted gamma distribution is commonly used as the natural conjugate prior density for the parameter α of the MWD (see, for example, Bekker and Roux [9] and Chaudhary et al. [43]). Following Bekker and Roux [9] and Chaudhary et al. [43], we take the prior distribution of the unknown parameter α to be the inverted gamma distribution with pdf

π(α) = (b^a / Γ(a)) α^(−(a+1)) exp(−b/α),  α > 0, a > 0, b > 0,    (16)
where a and b are the hyper-parameters. Thus, by incorporating the prior information in Equation (16) into the likelihood function in Equation (6) with the MWD pdf and cdf, the posterior distribution of α can be expressed as

π(α | x) = π(α) L(α | x) / K,    (17)

where L(α | x) denotes the likelihood in Equation (6) evaluated under the MWD(α) model and K is the normalizing constant given by

K = ∫₀^∞ π(α) L(α | x) dα.
Here, we consider the LINEX loss function proposed by [44], which is one of the most commonly used asymmetric loss functions. The LINEX loss is defined as

L(Δ) = exp(cΔ) − cΔ − 1,  Δ = α̂ − α,

where c is the loss function's scaling parameter and α̂ is an estimate of α. The LINEX loss function places greater weight on overestimation or underestimation, depending on whether the value of c is positive or negative, and for small values of |c| it is virtually identical to the squared error loss function. With c > 0, this loss function is appropriate when overestimation is more costly than underestimation. Thus, under the LINEX loss function, the Bayes estimator of any function of the parameter α, say g(α), is given by

ĝ_BL = −(1/c) ln E[exp(−c g(α)) | x] = −(1/c) ln [ ∫₀^∞ exp(−c g(α)) π(α) L(α | x) dα / ∫₀^∞ π(α) L(α | x) dα ].    (19)
From Equation (19), the Bayes estimators involve a ratio of two integrals that has no closed-form solution. To evaluate the ratio of the two integrals in Equation (19), we suggest using two approximation techniques: the T-K approximation and the MCMC method. The T-K approximation is one of the oldest deterministic approximation techniques, whereas MCMC is one of the most popular sampling-based techniques built on drawing from the posterior distribution. MCMC can be expensive to compute, especially for large sample sizes, and many MCMC algorithms require a rough estimate of key posterior quantities, such as the posterior variance, to be tuned well. In contrast to MCMC methods, the approximation error of the T-K method cannot be reduced by running the algorithm longer; however, deterministic approximations are typically very fast to compute and sufficiently reliable in many applied contexts. These considerations motivate us to develop both techniques in this study. The details of the two approximation techniques are presented in the following subsections.
3.1. Tierney-Kadane (T-K) Approximation Technique
According to the T-K approximation technique proposed by Tierney and Kadane [45], the approximation of the posterior mean of a function of the parameter, say g(α), is given by

E[g(α) | x] ≈ √(|Σ*| / |Σ|) exp{ m [ δ*(α̂_{δ*}) − δ(α̂_δ) ] },

where

δ(α) = (1/m) [ l(α) + ln π(α) ],  δ*(α) = δ(α) + (1/m) ln g(α),

l(α) is the log-likelihood function, and |Σ| and |Σ*| are the determinants of the inverses of the negative Hessians of δ(α) and δ*(α) at α̂_δ and α̂_{δ*}, respectively. Here, α̂_δ and α̂_{δ*} maximize δ(α) and δ*(α), respectively. For the MWD based on PFFC data, δ(α) is obtained by substituting the log-likelihood from Equation (6) and the logarithm of the prior in Equation (16). The value of α̂_δ is then determined by solving the non-linear equation ∂δ(α)/∂α = 0, and |Σ| is obtained from |Σ| = [ −∂²δ(α)/∂α² ]^(−1) evaluated at α = α̂_δ. To calculate the Bayes estimator of α under the LINEX loss function, we take g(α) = exp(−cα); consequently, the function δ*(α) becomes δ*(α) = δ(α) − cα/m, and α̂_{δ*} is computed as the solution of the non-linear equation ∂δ*(α)/∂α = 0, with |Σ*| obtained from the corresponding second derivative at α̂_{δ*}. Thus, the approximate Bayes estimator of α under the LINEX loss function is given by

α̂_TK = −(1/c) ln [ √(|Σ*| / |Σ|) exp{ m [ δ*(α̂_{δ*}) − δ(α̂_δ) ] } ].

Similarly, the Bayes estimator of Shannon's entropy H under the LINEX loss function is obtained by taking g(α) = exp(−c H(α)) in the above expressions, which gives

Ĥ_TK = −(1/c) ln [ √(|Σ*_H| / |Σ|) exp{ m [ δ*_H(α̂_{δ*_H}) − δ(α̂_δ) ] } ],

where δ*_H(α) = δ(α) − c H(α)/m and |Σ*_H| is the corresponding inverse negative Hessian evaluated at its maximizer α̂_{δ*_H}.
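A compact R sketch of the T-K approximation described above follows, for the LINEX Bayes estimates of α and H. The maximizations and second derivatives are obtained numerically with optim(); the MWD parameterization of Equation (2) and the inverted gamma prior of Equation (16) are assumed, and all function names and hyper-parameter values are illustrative.

```r
# Tierney-Kadane approximation of the LINEX Bayes estimates of alpha and H for PFFC data.
dmaxwell <- function(x, alpha) (4 / sqrt(pi)) * alpha^(-3 / 2) * x^2 * exp(-x^2 / alpha)
logS_maxwell <- function(x, alpha)
  pgamma(x^2 / alpha, shape = 3 / 2, lower.tail = FALSE, log.p = TRUE)
entropy_mwd <- function(alpha) -digamma(1) - 0.5 + 0.5 * log(pi * alpha)

log_post <- function(alpha, x, R, k, a, b)            # log posterior kernel: l(alpha) + log prior
  sum(log(dmaxwell(x, alpha))) + sum((k * (R + 1) - 1) * logS_maxwell(x, alpha)) -
    (a + 1) * log(alpha) - b / alpha

tk_linex <- function(x, R, k, a, b, c, g) {
  lp   <- function(alpha) log_post(alpha, x, R, k, a, b)
  lp_g <- function(alpha) lp(alpha) - c * g(alpha)     # kernel of exp(-c g(alpha)) * posterior
  fit_one <- function(fun) {                           # maximize fun; return max and -1/fun''
    fit <- optim(1, function(al) -fun(al), method = "L-BFGS-B",
                 lower = 1e-6, hessian = TRUE)
    list(max = -fit$value, sigma2 = as.numeric(1 / fit$hessian))
  }
  f0 <- fit_one(lp); f1 <- fit_one(lp_g)
  post_mean <- sqrt(f1$sigma2 / f0$sigma2) * exp(f1$max - f0$max)  # T-K estimate of E[exp(-c g)|x]
  -log(post_mean) / c                                              # LINEX back-transformation
}

# Bayes estimates under LINEX with c = 0.5 (x, R, k from the earlier sketches; a, b illustrative):
# tk_linex(x, R, k = 2, a = 3, b = 2, c = 0.5, g = function(al) al)               # for alpha
# tk_linex(x, R, k = 2, a = 3, b = 2, c = 0.5, g = function(al) entropy_mwd(al))  # for H
```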
3.2. Markov Chain Monte Carlo (MCMC) Techniques
In this subsection, we use the MCMC techniques to obtain the Bayes estimates of the parameter
and Shannon’s entropy
under the LINEX loss function. The Metropolis-Hastings (M-H) algorithm was initially established by Metropolis et al. [
46] and subsequently extended by Hastings [
47] and popularized as one of the most commonly used MCMC techniques. The candidate points are created from a normal distribution to a sample from the posterior distribution of
using the observed data
in (
17). The following steps are used to obtain MCMC sequences:
- Step C1.
For the parameter α, set an initial guess value α^(0).
- Step C2.
From the proposal density q(α | α^(t−1)) = N(α^(t−1), σ_p²), generate a candidate point α*.
- Step C3.
Generate u from the uniform distribution on (0, 1).
- Step C4.
Compute ρ = min{ 1, π(α* | x) / π(α^(t−1) | x) }.
- Step C5.
If u ≤ ρ, set α^(t) = α* (i.e., accept the candidate point with probability ρ); otherwise, set α^(t) = α^(t−1).
- Step C6.
Compute Shannon's entropy H^(t) by evaluating Equation (1) for the MWD at α = α^(t).
- Step C7.
Repeat Steps C2–C6 M times to obtain the sequence of the parameter as α^(1), α^(2), …, α^(M) and of Shannon's entropy H as H^(1), H^(2), …, H^(M).
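The steps above can be sketched in R as follows, assuming the MWD parameterization of Equation (2) and the inverted gamma prior of Equation (16); the proposal standard deviation, initial value, and function names are illustrative choices.

```r
# Metropolis-Hastings sampler (Steps C1-C7) for the posterior of alpha given PFFC data.
dmaxwell <- function(x, alpha) (4 / sqrt(pi)) * alpha^(-3 / 2) * x^2 * exp(-x^2 / alpha)
logS_maxwell <- function(x, alpha)
  pgamma(x^2 / alpha, shape = 3 / 2, lower.tail = FALSE, log.p = TRUE)
entropy_mwd <- function(alpha) -digamma(1) - 0.5 + 0.5 * log(pi * alpha)

log_post <- function(alpha, x, R, k, a, b) {
  if (alpha <= 0) return(-Inf)                        # the prior restricts alpha to (0, Inf)
  sum(log(dmaxwell(x, alpha))) + sum((k * (R + 1) - 1) * logS_maxwell(x, alpha)) -
    (a + 1) * log(alpha) - b / alpha
}

mh_sampler <- function(x, R, k, a, b, M = 10000, alpha0 = 1, prop_sd = 0.2) {
  alpha_chain <- H_chain <- numeric(M)
  alpha_cur <- alpha0                                 # Step C1: initial value
  lp_cur <- log_post(alpha_cur, x, R, k, a, b)
  for (t in 1:M) {
    alpha_prop <- rnorm(1, alpha_cur, prop_sd)        # Step C2: normal proposal
    lp_prop <- log_post(alpha_prop, x, R, k, a, b)
    if (log(runif(1)) < lp_prop - lp_cur) {           # Steps C3-C5: accept/reject
      alpha_cur <- alpha_prop; lp_cur <- lp_prop
    }
    alpha_chain[t] <- alpha_cur
    H_chain[t] <- entropy_mwd(alpha_cur)              # Step C6: entropy at the current alpha
  }
  list(alpha = alpha_chain, H = H_chain)              # Step C7: the MCMC sequences
}

# chains <- mh_sampler(x, R, k = 2, a = 3, b = 2)
```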
To acquire an (approximately) independent sample from the stationary distribution of the Markov chain, we consider a burn-in period of size M₀ by discarding the first M₀ values in the MCMC sequences. Thus, the Bayes estimators of α and H under the LINEX loss function, respectively, are given by

α̂_MH = −(1/c) ln [ (1/(M − M₀)) Σ_{t=M₀+1}^{M} exp(−c α^(t)) ]

and

Ĥ_MH = −(1/c) ln [ (1/(M − M₀)) Σ_{t=M₀+1}^{M} exp(−c H^(t)) ].
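From the MCMC output, these LINEX Bayes estimates can be computed directly; a small sketch follows, with an illustrative burn-in size M0 and the chains object produced by the sampler sketched above.

```r
# LINEX Bayes estimates of alpha and H from the post-burn-in MCMC draws.
linex_estimate <- function(draws, c) -log(mean(exp(-c * draws))) / c

bayes_linex_mcmc <- function(chains, c = 0.5, M0 = 2000) {   # M0 = illustrative burn-in size
  keep <- seq_along(chains$alpha) > M0                       # discard the burn-in period
  c(alpha = linex_estimate(chains$alpha[keep], c),
    entropy = linex_estimate(chains$H[keep], c))
}

# bayes_linex_mcmc(chains, c = 0.5)
```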
Based on the MCMC samples, we can also obtain the HPD credible intervals for the parameter α and Shannon's entropy H. Suppose α_(1) < α_(2) < ⋯ < α_(M−M₀) and H_(1) < H_(2) < ⋯ < H_(M−M₀) denote the ordered values of α^(t) and H^(t), respectively, after the burn-in period, where t = M₀ + 1, …, M. Then, following Chen and Shao [48], the 100(1 − η)% HPD credible interval for α can be obtained as (α_(j), α_(j + [(1 − η)(M − M₀)])), where j is chosen such that

α_(j + [(1 − η)(M − M₀)]) − α_(j) = min_{1 ≤ i ≤ (M − M₀) − [(1 − η)(M − M₀)]} ( α_(i + [(1 − η)(M − M₀)]) − α_(i) ),

with [⋅] denoting the integer part. Similarly, the 100(1 − η)% HPD credible interval for H is given by (H_(j), H_(j + [(1 − η)(M − M₀)])), where j is chosen such that

H_(j + [(1 − η)(M − M₀)]) − H_(j) = min_{1 ≤ i ≤ (M − M₀) − [(1 − η)(M − M₀)]} ( H_(i + [(1 − η)(M − M₀)]) − H_(i) ).
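The HPD intervals described above can be computed from the ordered post-burn-in draws by scanning all candidate intervals of the required coverage and keeping the shortest one; a brief sketch follows (the function name is illustrative).

```r
# 100(1 - eta)% HPD credible interval from MCMC draws, following the Chen-Shao approach:
# among all intervals spanning floor((1 - eta) * N) ordered draws, take the shortest.
hpd_interval <- function(draws, eta = 0.05) {
  sorted <- sort(draws)
  N <- length(sorted)
  L <- floor((1 - eta) * N)                          # number of draws spanned by each candidate
  widths <- sorted[(L + 1):N] - sorted[1:(N - L)]
  j <- which.min(widths)                             # index minimizing the interval length
  c(lower = sorted[j], upper = sorted[j + L])
}

# 95% HPD intervals for alpha and Shannon's entropy (post-burn-in draws):
# hpd_interval(chains$alpha[-(1:2000)]); hpd_interval(chains$H[-(1:2000)])
```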
4. Monte Carlo Simulation Study
In this section, a Monte Carlo simulation study is conducted to evaluate the performance of the proposed estimation procedures. The frequentist and Bayesian point estimation procedures for the parameter and Shannon’s entropy are compared by means of the average estimates (AE) and mean squared errors (MSE). For interval estimation procedures, we compare the asymptotic confidence intervals (Asym), the percentile bootstrap confidence intervals (boot-p), the bootstrap-t confidence intervals (boot-t), and the HPD credible intervals in terms of their simulated average lengths (AL) and the simulated coverage probabilities (CP). For the bootstrap confidence intervals, the intervals are obtained based on bootstrap samples.
In the simulation study, we consider that the PFFC samples are generated from the MWD with parameter and (the corresponding entropy are and , respectively) with various combinations of a number of groups n, effective sample size m, group size k, and censoring scheme . We consider group sizes and 5, the number of groups and 50, and effective sample sizes and . Three different censoring schemes for each combination of n and m are considered:
- (I)
: groups are removed from the experiment at the first failure only;
- (II)
: groups are removed at failure;
- (III)
: first-failure censored sample.
The censoring schemes ([CS]) used in the Monte Carlo simulation study are summarized in
Table 1. Note that simplified notation is used to denote the censoring schemes; for example, a scheme written as q*r stands for the value q repeated r times.
For the Bayesian estimation approach, the Bayes estimates of Shannon’s entropy are computed with informative inverted gamma prior under the LINEX loss function. The hyper-parameters
are selected for Bayesian computations of the parameter
and Shannon’s entropy in such a manner that the prior mean is precisely identical to the true values of the parameter, i.e.,
. Specifically, we consider
and
for
and
, respectively. When computing Bayes estimators under the LINEX loss function, we consider the loss function parameter
and 0.5. We use
M = 10,000 with a burn-in period
for the M-H algorithm. The simulation results are based on 1000 repetitions in this study. All the computations are performed using the statistical software R (
https://www.r-project.org/) [
49].
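For readers who wish to reproduce the flavor of the study, the following self-contained R sketch runs a small Monte Carlo experiment for one PFFC setting and reports the average estimate and MSE of the MLE of Shannon's entropy. The true parameter value, sample sizes, and scheme below are illustrative choices (not the settings of Tables 2–5), and the MWD parameterization of Equation (2) is assumed.

```r
# Monte Carlo sketch: AE and MSE of the MLE of Shannon's entropy under one PFFC setting.
qmaxwell <- function(p, alpha) sqrt(alpha * qgamma(p, shape = 3 / 2))
dmaxwell <- function(x, alpha) (4 / sqrt(pi)) * alpha^(-3 / 2) * x^2 * exp(-x^2 / alpha)
logS_maxwell <- function(x, alpha)
  pgamma(x^2 / alpha, shape = 3 / 2, lower.tail = FALSE, log.p = TRUE)
entropy_mwd <- function(alpha) -digamma(1) - 0.5 + 0.5 * log(pi * alpha)

rpffc <- function(alpha, n, m, k, R) {               # PFFC sample generator (see Section 1)
  v <- sapply(1:m, function(i) runif(1)^(1 / (i + sum(R[(m - i + 1):m]))))
  u <- 1 - cumprod(rev(v))
  qmaxwell(1 - (1 - u)^(1 / k), alpha)
}

mle_entropy <- function(x, R, k) {                   # MLE of alpha, then of H by invariance
  ll <- function(alpha) sum(log(dmaxwell(x, alpha))) +
    sum((k * (R + 1) - 1) * logS_maxwell(x, alpha))
  entropy_mwd(optimize(ll, c(1e-6, 100), maximum = TRUE)$maximum)
}

set.seed(2024)
alpha <- 1; n <- 30; m <- 20; k <- 2
R <- c(n - m, rep(0, m - 1))                         # scheme (I): all removals at the first failure
est <- replicate(1000, mle_entropy(rpffc(alpha, n, m, k, R), R, k))
c(true = entropy_mwd(alpha), AE = mean(est), MSE = mean((est - entropy_mwd(alpha))^2))
```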
The simulated results for point estimation are presented in Table 2 and Table 3, and the simulation results for interval estimation are presented in Table 4 and Table 5. For point estimation, from Table 2 and Table 3, we observe that the MLEs and Bayes estimates of Shannon's entropy perform well with small MSEs. The simulated MSEs decrease as n or m increases. The Bayes estimates perform better than the MLEs in terms of MSE when the prior information matches the true value of the parameter. Between the two approaches for obtaining the Bayes estimates, the estimates obtained by the M-H algorithm outperform those obtained by the T-K approximation in terms of MSE.
For interval estimation, from Table 4 and Table 5, it can be seen that the simulated average lengths of the 95% asymptotic, percentile bootstrap, and bootstrap-t confidence intervals and the Bayesian HPD credible intervals decrease as the number of failures (m) increases. All the interval estimation procedures provide reasonable simulated coverage probabilities (CP) that are close to the nominal level of 95%. The HPD credible intervals have smaller simulated average lengths than the frequentist confidence intervals.
5. Practical Data Analysis
To demonstrate the effectiveness of the MWD in modeling lifetime data and illustrate the methodologies developed in the paper, a practical data analysis of a real data set is conducted. We consider the tensile strength (in GPa) of 100 carbon fibers. This data set was originally reported by Nichols and Padgett [
50] and further studied by Mohammed et al. [
25] and Xie and Gui [
51]. The data set is presented in
Table 6.
First, we use the scaled total time on test (TTT) transform to understand the behavior of the failure rate function of the data set. The scaled TTT transform is given by

φ(r/n) = [ Σ_{i=1}^{r} x_(i) + (n − r) x_(r) ] / Σ_{i=1}^{n} x_(i),  r = 1, 2, …, n,

where x_(i) represents the i-th order statistic of the sample. If the plot of φ(r/n) against r/n is convex (concave), the failure rate function has a decreasing (increasing) shape. For more details about the scaled TTT transform, see, for example, Mudholkar et al. [52]. The scaled TTT plot of the data set in Table 6 is displayed in Figure 2.
Figure 2 shows that the considered data set follows an increasing failure rate function. This empirical behavior of the failure rate function indicates that the MWD model can be considered a suitable model for this data set.
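The scaled TTT transform and plot can be produced with a few lines of R; in the sketch below, x would hold the 100 tensile strength observations in Table 6.

```r
# Scaled TTT transform: phi(r/n) = (sum_{i<=r} x_(i) + (n - r) * x_(r)) / sum_i x_(i).
scaled_ttt <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  (cumsum(xs) + (n - seq_len(n)) * xs) / sum(xs)
}

plot_ttt <- function(x) {
  n <- length(x)
  plot(seq_len(n) / n, scaled_ttt(x), type = "l",
       xlab = "r/n", ylab = "Scaled TTT", main = "Scaled TTT plot")
  abline(0, 1, lty = 2)   # a concave curve above the diagonal indicates an increasing failure rate
}

# plot_ttt(x)              # x: the tensile strength data in Table 6
```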
Furthermore, we check whether the MWD fits the data set in Table 6 well using two goodness-of-fit tests. We consider the Kolmogorov–Smirnov (KS) and Anderson–Darling (AD) test statistics and obtain the corresponding p-values. The KS and AD statistics with their corresponding p-values (in parentheses) are 0.0884 (0.4145) and 0.7977 (0.4824), respectively. According to these p-values, the MWD fits the data set in Table 6 quite well. In addition to the goodness-of-fit tests, we also assess the fit of the MWD graphically using the empirical and fitted cdf plot and the probability-probability (P-P) plot in Figure 3. These plots are standard tools for assessing the goodness-of-fit of a statistical model to observed data. A good fit is indicated when the points in the P-P plot lie close to a straight line (usually the 45-degree line), suggesting that the empirical and theoretical cumulative probabilities agree closely. From Figure 3, one can observe that the observed data points follow the theoretical distribution closely, i.e., the MWD fits the data set in Table 6 reasonably well.
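These checks can be reproduced along the following lines; ks.test() is base R, while the AD test assumes the goftest package is available. The complete-sample MLE expression and the function names below follow the parameterization assumed in Equation (2) and are illustrative.

```r
# KS and AD goodness-of-fit checks and a P-P plot for the fitted MWD.
pmaxwell <- function(q, alpha) pgamma(q^2 / alpha, shape = 3 / 2)   # MWD cdf (Equation (2))

# Complete-sample MLE of alpha under the assumed parameterization:
# alpha_hat <- 2 * sum(x^2) / (3 * length(x))

# Kolmogorov-Smirnov test against the fitted MWD:
# ks.test(x, pmaxwell, alpha = alpha_hat)

# Anderson-Darling test (requires the 'goftest' package; p-values ignore parameter estimation):
# goftest::ad.test(x, null = pmaxwell, alpha = alpha_hat)

# P-P plot: theoretical vs. empirical cumulative probabilities
# plot(pmaxwell(sort(x), alpha_hat), ppoints(length(x)),
#      xlab = "Theoretical probability", ylab = "Empirical probability")
# abline(0, 1, lty = 2)
```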
To illustrate the methodologies developed in this paper, we generate first-failure censored data based on the data set in Table 6 by grouping the 100 carbon fibers into n groups with k individuals within each group. The grouped data and the corresponding first-failure censored samples are reported in Table 7, where the item marked with "+" within each group indicates the first failure. Then, we obtain six different PFFC samples using different censoring schemes for two values of the effective sample size m (including m = 20) based on the first-failure censored data in Table 7. The censoring schemes and the corresponding PFFC samples are presented in Table 8. To avoid ambiguity with the censoring schemes in Table 1, we name these censoring schemes [CS1]–[CS6].
Based on each PFFC sample in
Table 8, we compute the MLEs and Bayes estimates of the parameter
and Shannon’s entropy
. As we do not have prior information on the parameter
, we use a non-informative prior for obtaining the Bayes estimates. The Bayes estimates are computed using the T-K approximation and MCMC methods under the LINEX loss function at two values of loss parameter
and 0.5. We construct 95% asymptotic, percentile bootstrap, bootstrap-
t confidence intervals and the Bayesian HPD credible intervals for the parameter
and Shannon’s entropy
. The point and interval estimation results are presented in
Table 9 and
Table 10, respectively.
For the Bayesian estimation procedures, we check the convergence of the MCMC sequences of the parameter α generated from the posterior distribution by the M-H algorithm using graphical diagnostic tools, such as the trace plot, boxplot, and histogram with a Gaussian density overlay, as shown in Figure 4. The trace plot shows a random scatter around the mean (indicated by a thick red line) and good mixing of the parameter chain. The posterior distribution is almost symmetric, as seen from the boxplots and histograms of the generated samples, implying that the posterior mean can be used as a reasonable Bayes estimate of the parameter α. For illustration, we also present the simulated posterior predictive densities in Figure 5.