1. Introduction
Process capability indicators are statistical measures of the inherent variability of a process and of its ability to meet specifications. They are used to evaluate the quality and performance of parts and processes. Some common process capability indicators are: (i) $C_p$ and $C_{pk}$: they show how capable a process is of meeting its specification limits and are used with continuous data. $C_p$ measures the potential capability of a process, while $C_{pk}$ measures the actual capability during production; both are calculated as ratios of the specification width to the process width. (ii) Sigma: a capability estimate typically used with attribute data (i.e., with defect rates); it reflects the non-conformance rate of a process by expressing this performance as a single number. (iii) $C_{pm}$: a capability estimate that takes into account the deviation of the process mean from the target value; it is used when the process output has a nominal or optimal value that differs from the midpoint of the specification limits. Process capability indicators can be applied in many fields and industries. For example, in agriculture, capability assessment is a method for assessing land suitability for existing and potential agricultural and non-agricultural uses; it identifies possible physical, chemical and degradation constraints to land use on particular soils and landscapes. In the medical sciences, capability assessment can help measure the performance, precision and trueness of a process within the specifications and incorporate the loss function into capability analysis. Examples of applications in the health field include the assessment of quality control processes at a clinical laboratory chain, the evaluation of well-being and social care interventions using the capability approach, and the measurement of process capability for electronic industries using a new index based on a symmetric loss function. In engineering, process capability indicators are appropriate and practical tools for ensuring that manufactured products conform to specifications; moreover, they provide product design information that can reduce costs due to product failure. Therefore, quality control engineers use various statistical process methods to measure the capability of a manufacturing process and to determine the required specifications. In the literature, there are many quantities that provide numerical measures of the ability of a process; see Montgomery [1] and Kane [2]. These include the process capability index/indicator (PCI), which gives a numerical indication of whether or not the production process is capable of producing products within specification limits. The specification is determined by the lower specification limit $L$, the upper specification limit $U$, and the target value $T$. Many articles introduce new indicators or examine the properties of existing indicators under various assumptions. For example, Kotz and Johnson [3] presented an extensive bibliography on process capability indices. Kaminsky et al. [4] provided an overview of the use of conforming-product process capability indicators. Schneider et al. [5] discussed the uses of PCIs in the supplier certification process. The most widely used indices, in particular for normally distributed characteristics, are $C_p$, $C_{pk}$, $C_{pm}$ and $C_{pmk}$. For more precise details and clarifications about these indices, we refer to Juran [6], Chan et al. [7], Kotz and Lovelace [8], Pearn et al. [9] and Gunter [10]. On the other hand, several PCIs are valid for both non-normal and normal output process characteristics; see Pearn and Chen [11,12], Clements [13] and Polansky [14].
In recent years, Maiti et al. [15] introduced a new generic index $C_{py}$ that is related, directly or indirectly, to most of the PCIs described in the past. Moreover, it covers normal as well as non-normal and continuous as well as discrete quality characteristics, and it is defined as follows:

$$C_{py}=\frac{F(U)-F(L)}{F(\mathrm{UDL})-F(\mathrm{LDL})},\qquad(1)$$

where $F$ is the distribution function of the quality characteristic $X$, $L$ and $U$ are the lower and upper limits of the specification, respectively, and $\mathrm{LDL}$ and $\mathrm{UDL}$ are the desirable lower and upper limits, respectively. It can also be formulated in terms of tail probabilities as follows. Writing

$$p=F(U)-F(L),\qquad p_{0}=F(\mathrm{UDL})-F(\mathrm{LDL}),\qquad(2)$$

we obtain

$$C_{py}=\frac{p}{p_{0}},\qquad(3)$$

where $p$ is the yield of the process and $p_{0}$ is the desired yield. For a normal process, $p_{0}=0.9973$. From (3) we observe that, if $p=p_{0}$, then $C_{py}=1$; if $p>p_{0}$ (for a normal process, $p_{0}=0.9973$), then $C_{py}>1$; if $p<p_{0}$, then $C_{py}<1$; and the value of $C_{py}$ approaches its maximum $1/p_{0}$ as $p\to 1$.
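To make the yield-ratio form (3) concrete, the short Python sketch below evaluates $C_{py}=p/p_{0}$ for a hypothetical normally distributed characteristic; the mean, standard deviation, and specification limits are illustrative assumptions, not values from this paper.

```python
from scipy.stats import norm

def c_py(cdf, L, U, p0):
    """Generic yield-ratio index: C_py = p / p0, with p = F(U) - F(L)."""
    p = cdf(U) - cdf(L)
    return p / p0

# Hypothetical normal process: mean 10, standard deviation 1,
# specification limits L = 6 and U = 14 (i.e., mu +/- 4*sigma),
# desirable yield p0 = 0.9973 (the normal mu +/- 3*sigma yield).
F = norm(loc=10, scale=1).cdf
print(c_py(F, L=6, U=14, p0=0.9973))  # about 1.0026, since p > p0
```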
From the perspective of reliability studies, survival analysis, and life-testing experiments, it is usually not possible to observe the lifetimes of all products under test because of time constraints or other limitations, such as money, material resources, or mechanical and experimental difficulties in data collection. For this reason, reducing the total test time and its high cost is vital. In this type of experiment, units may fail or may be removed before failure and reserved for future experiments; thus, censored sampling arises in practice. Nowadays, several censoring methodologies have been implemented in lifetime experiments. In practice, there are usually two quantities involved: the test time and the number of failed units. In Type-I censoring, the experiment is stopped at a predetermined time, so the number of observed failures is random. In Type-II censoring, on the other hand, the experiment is stopped after a predetermined number of failures, so the test duration is random. In both schemes, units cannot be withdrawn from the experiment before the terminal time or the prescribed number of failures is reached. The methodology considered here, progressive Type-II censoring (PT2C), is more flexible than Type-II censoring in that it allows units to be withdrawn from the test at the successive observed failure times. This approach has been explored in various studies, such as Balakrishnan and Sandhu [16], Balakrishnan et al. [17], Fernández et al. [18], Aslam et al. [19], Panahi [20], Wang et al. [21], Wang et al. [22], Luo et al. [23], Saberzadeh et al. [24], and Zhuang et al. [25].
Let $X_{1:m:n}<X_{2:m:n}<\cdots<X_{m:m:n}$ be a PT2C sample with censoring scheme $R=(R_{1},R_{2},\ldots,R_{m})$, as described by Balakrishnan and Sandhu [16]; see Figure 1. The joint probability density function associated with PT2C under the scheme $R$ can be formulated as

$$f(x_{1},x_{2},\ldots,x_{m})=C\prod_{i=1}^{m}f(x_{i})\left[1-F(x_{i})\right]^{R_{i}},\qquad(4)$$

where $C=n(n-R_{1}-1)(n-R_{1}-R_{2}-2)\cdots\left(n-\sum_{i=1}^{m-1}(R_{i}+1)\right)$.
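For readers who wish to experiment with scheme (4), the following Python sketch implements the uniform-based PT2C generation algorithm of Balakrishnan and Sandhu [16]; the Pareto quantile function and the parameter values shown are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def pt2c_uniform(n, R, rng):
    """Progressively Type-II censored Uniform(0,1) sample under scheme R,
    following Balakrishnan and Sandhu (1995); here n = m + sum(R)."""
    m = len(R)
    assert n == m + sum(R), "scheme must satisfy n = m + sum(R)"
    W = rng.uniform(size=m)
    # V_i = W_i^(1 / (i + R_m + R_{m-1} + ... + R_{m-i+1})), i = 1, ..., m
    V = np.array([W[i - 1] ** (1.0 / (i + sum(R[m - i:])))
                  for i in range(1, m + 1)])
    # U_i = 1 - V_m * V_{m-1} * ... * V_{m-i+1}: censored uniform order stats
    return np.array([1.0 - np.prod(V[m - i:]) for i in range(1, m + 1)])

# Illustrative use with an assumed classical Pareto quantile function
# x = beta * (1 - u)^(-1/alpha); alpha, beta and R are made-up values.
rng = np.random.default_rng(2023)
alpha, beta, R = 2.0, 1.0, [2, 0, 1, 0, 2]   # m = 5 failures out of n = 10
u = pt2c_uniform(10, R, rng)
x = beta * (1.0 - u) ** (-1.0 / alpha)
print(np.round(x, 3))                        # increasing failure times
```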
Several authors have discussed statistical inference on PCIs based on censored samples from different lifetime distributions; see, for example, Wu and Chiu [26], Hong et al. [27], Lee et al. [28], Saha et al. [29], Ahmadi and Doostparast [30,31] and EL-Sagheer et al. [32].
In this paper, our main objective is to estimate the index $C_{py}$ using four different approaches, namely maximum likelihood, percentile bootstrap, bootstrap-t, and Bayes estimation, under PT2C samples from a Pareto distribution. Parameter estimation techniques use the collected data together with a model of the system to produce estimates of the model parameters, and a variety of such techniques exist. The most common are maximum likelihood estimation (MLE) and Bayesian estimation. MLE estimates the parameters of a statistical model from the observations in a sample; it is based on the likelihood of the observations given the model parameters. Bayesian estimation is similar to MLE but incorporates prior beliefs about the model parameters, using Bayes' theorem to compute the posterior distribution of the parameters given the observations. Other estimation techniques include the least squares method, typically used for linear models, and the expectation-maximization algorithm, used when data are incomplete. The approximate confidence interval (ACI) of $C_{py}$ is constructed based on the asymptotic normality of the MLE. In the Bayesian framework, Markov chain Monte Carlo (MCMC) techniques are applied because of the complexity of the system of equations to be solved.
The rest of the paper is organized as follows. In Section 2, we develop the index $C_{py}$ for the Pareto model. Section 3 deals with the MLE of the index $C_{py}$ and the corresponding asymptotic confidence interval. Bootstrap methods are investigated in Section 4. Bayes estimators using the importance sampling technique under gamma priors with the squared error loss function (SELF) are discussed in Section 5. In Section 6, a simulation study is conducted to compare the performance of the proposed techniques. Two real data sets are analysed in Section 7. Finally, a brief conclusion is given in Section 8.
3. The Maximum Likelihood Estimation (MLE) Approach
The MLE is a widely used statistical technique for obtaining the unknown parameters of a given statistical model from sample data. It is based on the principle that the parameter values that maximize the probability of obtaining the observed data are the most likely values of the unknown parameters. MLE is well established across many scientific fields because it estimates the parameters of a population from a sample in a conceptually simple way, and it has the advantage of being easy to implement computationally. It is commonly used in applications such as regression analysis, hypothesis testing, and model fitting. In MLE, the likelihood function, i.e., the probability of the observed data given the parameters, is set up and then maximized with respect to the model parameters. Let $x_{1:m:n}<x_{2:m:n}<\cdots<x_{m:m:n}$ be a PT2C sample from $\mathrm{PD}(\alpha,\beta)$, with PDF and CDF as given in (5) and (6), respectively. Then, according to (4), the likelihood function (LF) under PT2C is given as

$$L(\alpha,\beta)=C\prod_{i=1}^{m}f(x_{i};\alpha,\beta)\left[1-F(x_{i};\alpha,\beta)\right]^{R_{i}},$$

where $C$ is defined in (4). The corresponding log-LF $\ell=\ln L(\alpha,\beta)$ is maximized by solving the likelihood equations

$$\frac{\partial\ell}{\partial\alpha}=0\qquad\text{and}\qquad\frac{\partial\ell}{\partial\beta}=0.$$
The Newton–Raphson (NR) technique is applied to solve this system numerically. Once the MLEs of $\alpha$ and $\beta$, denoted by $\hat{\alpha}$ and $\hat{\beta}$, are derived, the MLE of the index $C_{py}$ can be obtained from the invariance property as

$$\hat{C}_{py}=C_{py}(\hat{\alpha},\hat{\beta}).$$
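Since the explicit score equations depend on the exact forms (5) and (6), the sketch below simply maximizes the PT2C log-likelihood numerically, assuming the classical Pareto parameterization $f(x)=\alpha\beta^{\alpha}x^{-(\alpha+1)}$, $x\geq\beta$; a general-purpose optimizer stands in for the NR iterations, and the data and scheme are made up.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x, R):
    """Negative PT2C log-likelihood (up to the constant log C), assuming the
    classical Pareto: f(x) = a * b**a * x**(-(a+1)), S(x) = (b/x)**a, x >= b."""
    a, b = params
    x = np.asarray(x, dtype=float)
    if a <= 0 or b <= 0 or b > x.min():
        return np.inf                        # outside the parameter space
    log_f = np.log(a) + a * np.log(b) - (a + 1.0) * np.log(x)
    log_S = a * (np.log(b) - np.log(x))      # log survival at each failure
    return -(log_f.sum() + (np.asarray(R) * log_S).sum())

# Made-up PT2C failure times and censoring scheme, for illustration only
x = [1.05, 1.21, 1.47, 1.80, 2.64]
R = [2, 0, 1, 0, 2]
fit = minimize(neg_log_lik, x0=[1.0, 0.9], args=(x, R), method="Nelder-Mead")
a_hat, b_hat = fit.x
print(a_hat, b_hat)  # the C_py MLE then follows by invariance: C_py(a_hat, b_hat)
```

Note that in this assumed parameterization the likelihood pushes $\hat{\beta}$ toward the smallest observation; the sketch only illustrates the numerical maximization step, not the paper's exact equations.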
The Fisher information matrix (FIM) is needed to construct the ACIs for $\alpha$ and $\beta$; see Cohen [37]. The observed FIM is obtained from the second derivatives of the log-LF as

$$I(\alpha,\beta)=-\begin{pmatrix}\dfrac{\partial^{2}\ell}{\partial\alpha^{2}} & \dfrac{\partial^{2}\ell}{\partial\alpha\,\partial\beta}\\[2mm]\dfrac{\partial^{2}\ell}{\partial\beta\,\partial\alpha} & \dfrac{\partial^{2}\ell}{\partial\beta^{2}}\end{pmatrix},$$

and the inverse of $I(\hat{\alpha},\hat{\beta})$ gives the asymptotic variance–covariance matrix of the MLEs $(\hat{\alpha},\hat{\beta})$.
To calculate the ACI for $C_{py}$, the delta method described in Greene [38] is utilized. Assume $G=\left(\dfrac{\partial C_{py}}{\partial\alpha},\dfrac{\partial C_{py}}{\partial\beta}\right)^{T}$. Then, the approximate variance of $\hat{C}_{py}$ can be written as

$$\widehat{\mathrm{Var}}(\hat{C}_{py})=\hat{G}^{T}\,I^{-1}(\hat{\alpha},\hat{\beta})\,\hat{G},$$

where $\hat{G}^{T}$ is the transpose of $\hat{G}$, evaluated at the MLEs. Thus, the $100(1-\gamma)\%$ ACI for $C_{py}$ can be derived as

$$\hat{C}_{py}\pm z_{\gamma/2}\sqrt{\widehat{\mathrm{Var}}(\hat{C}_{py})},$$

where $z_{\gamma/2}$ is the upper $(\gamma/2)$ quantile of the standard normal distribution.
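The delta-method interval can be assembled numerically from any log-likelihood and any index map. The helper below is a minimal sketch that uses central finite differences for the observed FIM and the gradient $G$, assuming an interior maximum; the function names and step size are our own conventions, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def delta_aci(theta_hat, loglik, g, gamma=0.05, h=1e-5):
    """100(1-gamma)% ACI for g(theta) via the delta method:
    Var(g) ~ G^T I^{-1} G, with I the observed Fisher information."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = len(theta_hat)
    E = np.eye(k) * h
    # Observed information: negative Hessian of loglik by central differences
    I = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            I[i, j] = -(loglik(theta_hat + E[i] + E[j])
                        - loglik(theta_hat + E[i] - E[j])
                        - loglik(theta_hat - E[i] + E[j])
                        + loglik(theta_hat - E[i] - E[j])) / (4 * h * h)
    # Gradient of the index map g at the MLE
    G = np.array([(g(theta_hat + E[i]) - g(theta_hat - E[i])) / (2 * h)
                  for i in range(k)])
    se = np.sqrt(G @ np.linalg.inv(I) @ G)
    z = norm.ppf(1 - gamma / 2)
    return g(theta_hat) - z * se, g(theta_hat) + z * se
```

Here `loglik` would be the PT2C log-LF of this section and `g` the map $(\alpha,\beta)\mapsto C_{py}(\alpha,\beta)$ developed in Section 2.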
5. Bayesian Estimation
Bayesian estimation is a powerful tool for inferring unknown parameters from measured data. It relies on Bayes' theorem, a result from probability theory that provides a means of updating the probability of a hypothesis as more evidence is collected. This approach has advantages over traditional maximum likelihood estimation because it can incorporate prior knowledge into the estimation process and can quantify the uncertainty associated with each parameter. Owing to these advantages, Bayesian estimation has become increasingly popular in a wide range of applications, including signal processing, machine learning, and artificial intelligence: for example, it can be used to determine the parameters of a linear system given a set of observations, to determine the most likely hypothesis given a set of data, or to predict the outcome of an event given a set of conditions. The family of gamma distributions is known to be flexible enough to cover a large variety of the experimenter's prior beliefs; see Kundu and Howlader [42]. Let the parameters $\alpha$ and $\beta$ be stochastically independently distributed (SID), each with a conjugate gamma prior. Suppose that $\alpha\sim\mathrm{Gamma}(a_{1},b_{1})$ and $\beta\sim\mathrm{Gamma}(a_{2},b_{2})$, and that these two priors are independent. Consequently, the joint prior density of $\alpha$ and $\beta$ can be written as

$$\pi(\alpha,\beta)\propto\alpha^{a_{1}-1}\beta^{a_{2}-1}e^{-(b_{1}\alpha+b_{2}\beta)},\qquad\alpha>0,\ \beta>0,$$

where $a_{1},b_{1},a_{2},b_{2}>0$ are the hyper-parameters. Subsequently, via Bayes' theorem, the joint posterior density function of $\alpha$ and $\beta$ for the given data $\underline{x}$ is

$$\pi^{*}(\alpha,\beta\mid\underline{x})=\frac{\pi(\alpha,\beta)\,L(\alpha,\beta)}{\int_{0}^{\infty}\int_{0}^{\infty}\pi(\alpha,\beta)\,L(\alpha,\beta)\,d\alpha\,d\beta},$$

which can be factorized as $\pi^{*}(\alpha,\beta\mid\underline{x})\propto g_{1}(\alpha\mid\beta,\underline{x})\,g_{2}(\beta\mid\underline{x})$, where $g_{1}(\alpha\mid\beta,\underline{x})$ is a gamma density and $g_{2}(\beta\mid\underline{x})$ does not reduce to a standard distribution. The SELF can be written as

$$L_{S}(\theta,\hat{\theta})=(\hat{\theta}-\theta)^{2},$$

where $\hat{\theta}$ is the estimate of $\theta$. Thus, the Bayesian estimator under SELF is the posterior mean, $\hat{\theta}_{BS}=E(\theta\mid\underline{x})$. Then, the Bayes estimate of $C_{py}$ under SELF is computed by

$$\hat{C}_{py}^{BS}=E(C_{py}\mid\underline{x})=\int_{0}^{\infty}\int_{0}^{\infty}C_{py}(\alpha,\beta)\,\pi^{*}(\alpha,\beta\mid\underline{x})\,d\alpha\,d\beta.$$
Obviously, the estimators of $\alpha$ and $\beta$ under SELF can be obtained by the importance sampling method.
Importance Sampling Procedure (ISP)
The ISP is one of the well-known MCMC techniques and is considered an effective approach to obtaining Bayes estimates of $C_{py}$; moreover, the associated highest posterior density (HPD) intervals can be constructed through this method under PT2C data. MCMC is an important technique in the field of statistical computing. It is a powerful tool that can be used to sample from a given probability distribution, allowing the estimation of a variety of parameters. Its foundation is the Markov chain, a collection of random variables that satisfies the Markov property, which states that the future state of a system depends only on its present state. MCMC algorithms use the Markov chain to sample from a target probability distribution and estimate various quantities; in particular, they are used to compute expectations, such as the mean and variance, of a given model. The MCMC algorithm works by constructing a Markov chain whose state space equals the support of the target probability distribution: starting from an initial state, the algorithm iteratively generates a sequence of states that are accepted or rejected based on the value of the target distribution at each state. As mentioned previously, $g_{1}(\alpha\mid\beta,\underline{x})$ is the PDF of a gamma distribution and, therefore, samples of $\alpha$ can be easily generated using any gamma generating routine. The density function $g_{2}(\beta\mid\underline{x})$ cannot be reduced analytically to a well-known distribution and thus cannot be sampled directly by standard methods. Through the ISP, we draw samples from $g_{1}$ and $g_{2}$ and obtain the Bayes estimates of $\alpha$ and $\beta$ and the corresponding estimate of the index $C_{py}$. The ISP approach is described as follows, with a runnable sketch given after the steps.
- (1)
Begin with an initial guess value $\beta^{(0)}$.
- (2)
Put $j=1$.
- (3)
Generate $\beta^{(j)}$ from $g_{2}(\beta\mid\underline{x})$ utilizing the methodology reported by Metropolis et al. [43] with the normal proposal distribution $N(\beta^{(j-1)},\sigma_{\beta}^{2})$.
- (4)
Generate $\alpha^{(j)}$ from the gamma density $g_{1}(\alpha\mid\beta^{(j)},\underline{x})$.
- (5)
Put $j=j+1$.
- (6)
Repeat Steps 2–5 $N$ times to simulate the sequence of samples $(\alpha^{(j)},\beta^{(j)})$, $j=1,\ldots,N$.
- (7)
The Bayesian estimate of $\theta$ (with $\theta=\alpha$ or $\beta$) can be calculated by

$$\hat{\theta}_{BS}=\frac{1}{N-M}\sum_{j=M+1}^{N}\theta^{(j)},$$

where $M$ is the burn-in period of the MCMC.
- (8)
Using the sequence in Steps 6 and 7, we can obtain the sequence $C_{py}^{(j)}=C_{py}(\alpha^{(j)},\beta^{(j)})$, $j=M+1,\ldots,N$, and then

$$\hat{C}_{py}^{BS}=\frac{1}{N-M}\sum_{j=M+1}^{N}C_{py}^{(j)}.$$

To compute the CRIs of $C_{py}$, sort $C_{py}^{(M+1)},\ldots,C_{py}^{(N)}$ in ascending order as $C_{py}^{[1]}<C_{py}^{[2]}<\cdots<C_{py}^{[N-M]}$. Then, the $100(1-\gamma)\%$ symmetric CRI of $C_{py}$ can be obtained as

$$\left(C_{py}^{[(N-M)\gamma/2]},\;C_{py}^{[(N-M)(1-\gamma/2)]}\right).$$
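A minimal sketch of Steps (1)–(8) is given below. Because the exact forms of $g_{1}$ and $g_{2}$ depend on (5) and (6), they are passed in as user-supplied callables; the proposal standard deviation, seed, and function names are our own assumptions.

```python
import numpy as np

def isp_sampler(log_g2, sample_g1, beta0, n_iter, burn, prop_sd, c_py, rng=None):
    """Skeleton of Steps (1)-(8): Metropolis moves for beta against the target
    log_g2, exact gamma draws for alpha via sample_g1, then posterior means
    under SELF and a 95% symmetric CRI for the index chain c_py(alpha, beta)."""
    rng = rng or np.random.default_rng(2023)
    beta, chain = beta0, []
    for _ in range(n_iter):
        cand = rng.normal(beta, prop_sd)            # Step 3: normal proposal
        if cand > 0 and np.log(rng.uniform()) < log_g2(cand) - log_g2(beta):
            beta = cand                             # accept the move
        alpha = sample_g1(beta, rng)                # Step 4: gamma conditional
        chain.append((alpha, beta))
    chain = np.asarray(chain)[burn:]                # Step 7: discard burn-in
    alpha_hat, beta_hat = chain.mean(axis=0)        # posterior means (SELF)
    cpy = np.sort([c_py(a, b) for a, b in chain])   # Step 8: index chain
    cri = (np.quantile(cpy, 0.025), np.quantile(cpy, 0.975))
    return alpha_hat, beta_hat, cpy.mean(), cri
```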
6. Simulation Study and Discussion
In this section, extensive simulation experiments are carried out to evaluate and compare the performance of the four estimation techniques (ML, boot-p, boot-t, MCMC) for the index $C_{py}$ under the PD via Monte Carlo simulation. Point estimation is evaluated by the mean squared error (MSE), while interval estimation is evaluated by the average length (AL) and the coverage probability (CP), calculated as the number of CIs that cover the true value divided by 10,000. For the simulation study, we considered different settings of the parameter values $(\alpha,\beta)$ together with the specification limits $(L,U)$; the corresponding true values (TV) of $C_{py}$ were then evaluated.
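For clarity, the three performance measures just described can be computed from the replicated estimates as in the following sketch (the array names are illustrative):

```python
import numpy as np

def performance(est, lower, upper, true_value):
    """MSE of point estimates plus average length and coverage of intervals;
    each array holds one value per Monte Carlo replication."""
    est, lower, upper = map(np.asarray, (est, lower, upper))
    mse = np.mean((est - true_value) ** 2)               # point accuracy
    al = np.mean(upper - lower)                          # average length
    cp = np.mean((lower <= true_value) & (true_value <= upper))  # coverage
    return mse, al, cp
```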
For the Bayesian computation, the Bayes estimates and the HPD credible intervals (CRIs) are computed based on 12,000 MCMC samples, discarding the first $M$ values as "burn-in", with the hyper-parameters $(a_{1},b_{1})$ and $(a_{2},b_{2})$, respectively. For comparison purposes, we consider several combinations of the sample size $n$ and the effective number of observed failures $m$. For all combinations of sample sizes, three different censoring schemes (CS) are examined. For brevity, we abbreviate the censoring schemes; for example, $(1,1,1,1,0,0,0)$ is represented as $(1^{4},0^{3})$.
To this end, we considered the various PT2C schemes listed in Table 1. The simulation results are reported in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11, from which we note the following:
The first scheme is the best strategy in terms of having smaller MSEs and ALs for fixed sample sizes and numbers of observed failures.
As the true value of $C_{py}$ increases, the ALs decrease.
As expected, the Bayes estimates of $C_{py}$ have the smallest MSEs and the shortest ALs; therefore, the Bayes estimates outperform the MLEs and the bootstrap techniques.
In terms of MSEs and ALs, the bootstrap techniques outperform the ML approach; additionally, boot-t outperforms boot-p.
The estimates from the ML, bootstrap, and Bayesian approaches are all very close, and the CP values for the ACIs are close to the nominal level (around 0.95). The highest CPs are found for the Bayesian CRIs.
In general, if prior knowledge about the problem under study is available, the Bayesian strategy used in conjunction with the importance sampling procedure is the optimal estimation approach.
Finally, we may conclude that the proposed inference methods produce reliable outcomes.
7. Data Analysis
In this section, to illustrate the inferential procedures discussed in the previous sections, two sets of real data are analysed. The first set represents the initial failure times (in months) of 20 electric vans used for internal transportation and delivery at a large manufacturing facility; details are presented in Zimmer et al. [44], and the data were recently used by Saha et al. [29]. The second data set represents the intervals between failures (in hours) of the air conditioning systems of a fleet of 13 Boeing 720 jet airplanes, taken from Proschan [45]. The two sets of data are as follows.
Data Set I: 0.9, 1.5, 2.3, 3.2, 3.9, 5.0, 6.2, 7.5, 8.3, 10.4, 11.1, 12.6, 15.0, 16.3, 19.3, 22.6, 24.8, 31.1, 38.1, 53.0.
Data Set II: 1, 4, 11, 16, 18, 18, 18, 24, 31, 39, 46, 51, 54, 63, 68, 77, 80, 82, 97, 106, 111, 141, 142, 163, 191, 206, 216.
To assess the quality of fit, we first determined whether the analysed data sets genuinely originate from the PD or not. The Kolmogorov–Smirnov (K–S) statistic and the associated p-values are the foundation of this method. The K–S statistic measures the distance between the empirical distribution functions of two samples, or between an empirical distribution function and the CDF of a reference distribution; accordingly, it is used here solely to evaluate the quality of fit and not as a model selection criterion. It is 0.067503 for the first data set with a p-value of 0.9999, and 0.11513 for the second data set with a p-value of 0.8666. Since the p-values are so high, we cannot rule out the possibility that the data came from a Pareto model; consequently, this probability model fits the real data sets well.
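A check of this kind can be reproduced with scipy, as sketched below for data set I; note that scipy's built-in Pareto parameterization need not coincide with the PD forms (5) and (6) used in this paper, and that fitting and testing on the same data makes the p-value approximate.

```python
import numpy as np
from scipy import stats

data1 = np.array([0.9, 1.5, 2.3, 3.2, 3.9, 5.0, 6.2, 7.5, 8.3, 10.4, 11.1,
                  12.6, 15.0, 16.3, 19.3, 22.6, 24.8, 31.1, 38.1, 53.0])

# Fit scipy's Pareto (shape b, support x >= scale when loc = 0), then run
# the one-sample K-S test against the fitted CDF.
b, loc, scale = stats.pareto.fit(data1, floc=0)
result = stats.kstest(data1, "pareto", args=(b, loc, scale))
print(result.statistic, result.pvalue)
```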
Figure 3 and Figure 4 display the empirical, Q-Q, and P-P plots, which demonstrate how well the PD fits the data. Here, we have fixed the hypothetical lower limit $L$ and the hypothetical upper limit $U$ for data set I and for data set II, with a given desirable yield $p_{0}$.
Based on data set I, we can generate a PT2C sample of size $m=9$ from the complete sample of size $n=20$ with a given censoring scheme $R$, using the algorithm described in Balakrishnan and Sandhu [16]. The PT2C sample generated from data set I is as follows:

0.9, 1.5, 3.2, 3.9, 5.0, 6.2, 22.6, 24.8, 31.1.
Similarly, based on data set II, we can generate a PT2C sample of size $m=15$ from the complete sample of size $n=27$ with a given censoring scheme $R$. The PT2C sample is:

1, 4, 11, 16, 18, 18, 18, 31, 39, 51, 54, 68, 82, 141, 216.
For the data sets considered above, based on PT2C, we computed the point estimates of the index $C_{py}$ using the ML, boot-p, boot-t and Bayes methods; the results are reported in Table 12. Further, we determined the $95\%$ ACIs based on the ML and bootstrap methods, as well as the $95\%$ HPD credible intervals using MCMC samples; the results are listed in Table 13. In the Bayesian framework, we assume non-informative priors for $\alpha$ and $\beta$, that is, $a_{1}=b_{1}=0$ and $a_{2}=b_{2}=0$. In addition, 12,000 MCMC samples were generated, and the first 2000 samples were discarded as "burn-in". Figure 5 and Figure 6 display the trace plots of $C_{py}$ computed by the MCMC approach for data sets I and II.
8. Conclusions
In this manuscript, we have considered four different estimation techniques, namely maximum likelihood, percentile bootstrap, bootstrap-t, and Bayes estimation, to estimate the index $C_{py}$, and we have illustrated the proposed methods using two practical examples. The MLEs are derived using the NR iterative numerical technique, and the asymptotic confidence intervals are constructed from the observed and expected Fisher information matrices. In order to address the problem of small sample sizes, two bootstrap confidence intervals were constructed. Bayesian estimation under the squared error loss function is also considered, and the estimates are derived through an importance sampling procedure using the Metropolis–Hastings algorithm; moreover, the corresponding highest posterior density credible intervals are constructed. Since it is not possible to compare these methods theoretically, we performed a large-scale simulation study to compare them for different sample sizes, different censoring schemes (1, 2, 3) and different values of $C_{py}$. In the simulation section, mean squared errors are used to evaluate point estimation performance, while average lengths and coverage probabilities are taken into account for interval estimation. The MSE results are reported in Table 2, Table 3, Table 4, Table 5 and Table 6, while the AL and CP results for the estimates are presented in Table 7, Table 8, Table 9, Table 10 and Table 11. Finally, we believe that the contents of this manuscript may be useful to researchers and practitioners in the various fields of industry where lifetime distributions are widely used.
In future research, we will discuss the assessment of the lifetime performance index, with numerical analysis, based on adaptive progressive Type-II censoring.