1. Introduction
Assessing how well data fit an assumed distribution is an important step that should precede any data analysis, because many data analysis methods assume a specific distribution for the data. A goodness-of-fit (GoF) test statistic (TstS) for a statistical model measures how closely the model agrees with a given set of observations. Measures quantifying GoF typically summarize the discrepancies between the observed values and the values expected under the model being considered. Usually, histograms or Q-Q plots are employed for the assessment of data distribution. Additionally, a GoF TstS uses a distance between the empirical distribution function and the theoretical cumulative distribution function (cdf) to evaluate the data distribution. In this situation, we reject the hypothesized null distribution if the distance is sufficiently large.
In life-testing or reliability studies, the failure times of some test units may not be observed. Furthermore, there are situations in which the removal of units prior to failure is pre-planned in order to reduce the cost or time associated with testing. Among the censoring methods, progressive Type II censoring schemes (PrCs) have become quite popular in life-testing or reliability studies. The PrCs arises in life-testing or reliability studies as follows. Suppose that $n$ units are placed on test. After the 1st failure is observed, $R_1$ surviving test units are randomly removed from the test. After the 2nd failure is observed, $R_2$ surviving test units are randomly removed from the test. Finally, after the $m$th failure is observed, all the remaining test units ($R_m = n - m - R_1 - \cdots - R_{m-1}$) are removed from the test. In PrCs, we suppose that the integer $m$ and the scheme $(R_1, \ldots, R_m)$ are pre-assigned. Moreover, the ordered observed failure times ($X_{1:m:n} \le \cdots \le X_{m:m:n}$) are referred to as PrCsD (Ref. [1]). Recently, several studies on PrCs have been carried out (Refs. [2, 3, 4, 5, 6, 7]). Ref. [2] discussed the estimation of reliability in a multi-component stress–strength model for a general class of inverted exponentiated distributions under PrCs. Ref. [3] discussed classical and Bayesian estimation of the inverse Weibull distribution using PrCs. Ref. [4] discussed an application to transformer insulation using the Weibull extended distribution based on PrCs. Ref. [5] discussed inference on maintenance service policy under step-stress partially accelerated life tests using PrCs. Ref. [6] discussed monitoring the Weibull shape parameter (ShPm) under PrCs in the presence of independent competing risks. Ref. [7] discussed the analysis of the gamma distribution under PrCs.
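The censoring mechanism described above can be sketched in code. The following Python sketch is illustrative only and is not taken from the paper, whose computations were carried out in R. The function name `progressive_type2_sample`, the `draw` helper, and the example scheme are assumptions introduced here. The sketch simulates the experiment directly: it generates all lifetimes, records the smallest surviving lifetime at each stage, and then withdraws the prescribed number of survivors at random.

```python
import random


def progressive_type2_sample(n, scheme, draw, rng):
    """Simulate one progressively Type II censored sample.

    n      : number of units placed on test
    scheme : removals after each observed failure; len(scheme) is the
             number of observed failures m, and m + sum(scheme) must equal n
    draw   : hypothetical helper, maps an rng to one random lifetime
    rng    : random.Random instance
    """
    m = len(scheme)
    if m + sum(scheme) != n:
        raise ValueError("scheme must satisfy m + sum(scheme) == n")
    # Lifetimes of all units, sorted; this list holds the units still on test.
    alive = sorted(draw(rng) for _ in range(n))
    observed = []
    for removals in scheme:
        # The next observed failure is the smallest surviving lifetime.
        observed.append(alive.pop(0))
        # Withdraw the prescribed number of survivors at random.
        for _ in range(removals):
            alive.pop(rng.randrange(len(alive)))
    return observed


rng = random.Random(1)
scheme = [2, 0, 0, 4]  # assumed example: n = 10 units, m = 4 failures
sample = progressive_type2_sample(
    n=10, scheme=scheme, draw=lambda g: g.expovariate(1.0), rng=rng
)
print(sample)
```

Because the surviving lifetimes are kept sorted, the returned failure times are already in increasing order, which matches the ordered form in which PrCsD are usually reported.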
The GoF TstSs for completely observed data can no longer be used for PrCsD. For this reason, GoF tests based on PrCsD have received attention from several authors (Refs. [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]). Ref. [8] proposed a GoF test for the exponential distribution (ExpD) based on spacings for PrCsD. Ref. [9] proposed approximate GoF tests for location-scale distributions (LoScD) based on PrCsD using the empirical distribution function. Ref. [10] proposed GoF tests for LoScD based on spacings for PrCsD. Recently, Ref. [11] (Lee and Lee, 2019) proposed GoF tests and a plot method for LoScD based on PrCsD using the generalized Lorenz curve (LrCv). Ref. [12] proposed a GoF test for the inverse Rayleigh distribution based on PrCsD using entropy. Ref. [13] proposed a GoF test based on the Gini index of spacings for PrCsD. Ref. [14] proposed a GoF test for ExpD based on general PrCsD using spacings. Ref. [15] proposed a GoF test for the inverse Weibull distribution based on PrCsD using order statistics (OrSt). Ref. [16] proposed a GoF test for distribution based on type I left censored data. Ref. [17] proposed a GoF test for the Rayleigh distribution based on censored data.
While numerous GoF TstSs under PrCsD have been proposed in the literature for various distributions, to the best of our knowledge, these tests do not encompass both TstSs and graphical methods. This motivates us to propose new GoF TstSs and a new graphical method for the GoF test of LoScD based on PrCsD. The rest of this paper is organized as follows. The LrCv is introduced in Section 2. In Section 3, we propose GoF TstSs and a new plot method for the GoF test that use the LrCv. In Section 4, the power of the proposed TstSs is estimated through Monte Carlo (MC) simulations and compared with that of the TstSs proposed by Ref. [10]. In Section 5, we analyze two real data sets. Finally, Section 6 presents the conclusion.
2. Lorenz Curve
The Lorenz curve (LrCv) provides a means of comparing income inequality, for example between two countries. Using the LrCv, Ref. [18] gave conditions under which such an LrCv inequality comparison has normative significance. For an increasing and strictly concave utility function, Ref. [18] indicates that one prefers the distribution with the dominating LrCv, provided that the LrCvs do not cross. Ref. [19] presented an alternative definition of the LrCv in terms of the inverse cdf, which applies to continuous as well as discrete variables. Let $F$ denote the cdf of the income distribution, where income is assumed to be non-negative. Furthermore, for a given percentile $p$, let $F^{-1}(p)$ denote the inverse cdf. We suppose throughout that $F$ is a continuous cdf with finite support. Then, the LrCv corresponding to the distribution with cdf $F$ is defined as
$$L(p) = \frac{1}{\mu} \int_0^p F^{-1}(t)\,dt, \quad 0 \le p \le 1,$$
where $\mu$ is the mean of the distribution. Assume that $Y_1, \ldots, Y_n$ are positive random variables (RanV) with OrSt $Y_{1:n} \le \cdots \le Y_{n:n}$. Then, the sample LrCv (Ref. [20]) is defined by
$$L_n\left(\frac{i}{n}\right) = \frac{\sum_{j=1}^{i} Y_{j:n}}{\sum_{j=1}^{n} Y_{j:n}}, \quad i = 1, \ldots, n.$$
Because the LrCv compares the degree of wealth concentration between two distributions, we use it to develop a goodness-of-fit test statistic.
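As a small illustration of the sample LrCv, the following Python sketch computes its points, assuming the usual definition in which the curve at $i/n$ is the share of the total held by the $i$ smallest observations. The function name `sample_lorenz_curve` and the toy income values are assumptions made for this example and do not come from the paper.

```python
def sample_lorenz_curve(values):
    """Return the points (i/n, L_n(i/n)) for i = 0, ..., n.

    L_n(i/n) is the sum of the i smallest observations divided by the
    sum of all observations; the point (0, 0) is included for plotting.
    """
    y = sorted(values)
    if not y or y[0] < 0:
        raise ValueError("need a non-empty sample of non-negative values")
    total = sum(y)
    if total == 0:
        raise ValueError("the sample total must be positive")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(y, start=1):
        running += v  # cumulative sum of the i smallest observations
        points.append((i / len(y), running / total))
    return points


incomes = [1, 2, 3, 4, 10]  # toy data, not from the paper
curve = sample_lorenz_curve(incomes)
for p, share in curve:
    print(f"p = {p:.1f}, L_n(p) = {share:.2f}")
```

For a perfectly equal sample the points lie on the diagonal; the further the curve sags below the diagonal, the more concentrated the total is in the largest observations.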
3. Test Statistics
Let
be the PrCsD with PrCs
from the LoScD. Moreover, let the PrCsD have an LoScD with a probability density function (pdf)
where
is the known function (Ref. [
8]). Furthermore, we assume that location and scale parameters,
and
, respectively, of
are unknown and
is the standard form of the
. Then, we want to test whether the PrCsD comes from an LoScD with Equation (
1), and test the null hypothesis (
)
where
denotes the distribution function (Ref. [
8]).
First, we introduce the GoF TstSs based on the distance between OrSt (Ref. [10]). Let
. Then,
denotes the deviation between the
jth OrSt (
) and its expected value (
) based on the PrCsD. Here,
Then, TstSs based on the deviation between OrSts are obtained as
Here, the above TstSs are related to the modified Kolmogorov–Smirnov TstS.
Now, we propose TstSs based on the sample LrCv. Not all LoScDs have non-negative support, whereas the sample LrCv assumes that $Y$ is a non-negative income. To resolve this, the value of the 1st ordered PrCsD is subtracted from all values of the ordered PrCsD, and the resulting values are then summed. Furthermore, a sample LrCv cannot show the shape of the distribution. To resolve this, the result is added from
. Then, the modified sample LrCv is derived as
We used the percentile points (%pts) of the Gumbel distribution (GumDist); the log-gamma distribution (LGamDist) with ShPm 3, 6, 9 and ∞; the normal distribution (NormDist); and the t distribution (tDist) with 4, 5, 6 and 7 degrees of freedom (DoF) (Figure 1 and Figure 2). As shown in Figure 1 and Figure 2, the modified sample LrCv has a different shape for each LoScD. Here, the modified sample LrCv using the percentile points of LoScD is obtained as
Then, the ratio modified sample LrCv using Equations (4) and (5) is obtained as
Here, the
has the following result.
Lemma 1. and are a location-scale (LoSc) invariant statistic.
Proof of Lemma 1. Let Y be a RanV with a location parameter (LoPm) and scale parameter (ScPm) . Let , then . The distribution of X does not depend on LoPm and ScPm .
of
is
of
is
Let have a LoPm and ScPm . If , then . The distribution of does not depend on LoPm and ScPm .
of
is
of
is
□
Theorem 1. is a LoSc invariant statistic.
Proof of Theorem 1. Theorem 1 is straightforward according to Lemma 1. □
If the data come from an LoScD, we expect all the
values to be 1. By applying these properties of
to Ref. [
10]’s TstSs (Equation (
3)), we propose the following TstSs.
If the data come from an LoScD, we expect
,
,
,
,
and
TstSs to be 0. Consequently, large values of
,
,
,
,
and
TstSs lead to the rejection of
(Equation (
2)). Therefore, we reject
if
,
,
,
,
and
TstSs exceed the corresponding null critical values (CrVal). Since
,
,
,
,
and
TstSs have distributions that are difficult to derive, their CrVal are not available explicitly, and the %pts need to be determined through MC simulations.
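The MC determination of a CrVal can be sketched as follows. This Python sketch is a minimal illustration under stated assumptions and is not the paper's procedure: `standin_statistic` is a hypothetical LoSc invariant statistic (the absolute deviation of the sample kurtosis from 3) used in place of the proposed TstSs, and complete samples are drawn instead of PrCsD to keep the example short. The function names, sample size, and number of replications are likewise assumptions.

```python
import random
import statistics


def standin_statistic(x):
    """Hypothetical LoSc invariant statistic: |sample kurtosis - 3|.

    This is a stand-in for illustration, not one of the proposed TstSs.
    """
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return abs(m4 / m2 ** 2 - 3.0)


def mc_critical_value(stat, draw_null, n, alpha, reps, rng):
    """Estimate the upper-alpha critical value of stat under the null."""
    sims = sorted(
        stat([draw_null(rng) for _ in range(n)]) for _ in range(reps)
    )
    # Empirical (1 - alpha) quantile of the simulated null statistics.
    index = min(reps - 1, int((1.0 - alpha) * reps))
    return sims[index]


rng = random.Random(2024)
crit = mc_critical_value(
    standin_statistic, lambda g: g.gauss(0.0, 1.0),
    n=20, alpha=0.05, reps=2000, rng=rng,
)
print("estimated 5% critical value:", crit)

# Invariance check: a location shift and rescaling leave the statistic
# unchanged, so simulating from the standard null distribution suffices.
x = [rng.gauss(0.0, 1.0) for _ in range(20)]
shifted = [5.0 + 3.0 * v for v in x]
print(standin_statistic(x), standin_statistic(shifted))
```

The invariance check at the end shows why it is enough to simulate from the standard form of the null distribution when the statistic does not depend on location and scale.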
Furthermore, using
, we propose a new plot method for the GoF test. If the data come from an LoScD, we expect all the
values to be 1. Therefore, using this property of
, we would like to propose a new plot method as follows.
If the data come from an LoScD, the is 1 and converges with . Therefore, we are going to test if the data follow the LoScD by using the degree of how much the is apart from the .
To check the shape of
of various LoScDs, we generate %pts of NormDist;
tDist with 4, 5, 6 and 7 DoF; GumDist; and LGamDist with parameter 3, 6, 9 and
∞. Furthermore, we draw the
. The results of
for various LoScDs appear in
Figure 3. From
Figure 3,
converges with the
at NormDist and GumDist. In
tDist and LGamDist, however,
deviates from the
x-axis.
4. Simulation Result
In this Section, we assess the power of the proposed TstSs by comparing their simulated power values with those of Ref. [10]’s TstSs. First, we generated 10,000 PrCsD samples under various PrCs (different choices of sample size and PrCs). Here, the PrCs are those used by Ref. [21].
The proposed TstSs are LoSc invariant, so their null distributions do not depend on the specific values of the location and scale parameters. Consequently, the standard distribution is used as the null distribution, and the power of the test does not depend on the parameter values of the null distribution. The alternative distribution, on the other hand, incorporates an ShPm with diverse values to represent a range of distribution shapes.
We consider the NormDist and GumDist as the null distributions (NuDist). For testing the NormDist, the alternative distributions are tDist with 4, 5, 6 and 7 DoF. For testing the GumDist, the alternative distributions are LGamDist with ShPm 3, 6, 9 and ∞. All numerical computations were carried out in R 4.3.2 (Supplementary Materials) using two packages, ‘goftest’ and ‘VGAM’.
When the data are simulated from the alternative distribution, the rejection probability is the power of the TstS, and a power closer to 1 indicates a more effective test. The estimated powers are presented in Table 1 and Table 2. The power of the proposed TstSs increased as the PrCsD size increased.
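The power estimate described above is the proportion of simulated samples from the alternative distribution for which the TstS exceeds the null CrVal. The following Python sketch illustrates this under stated assumptions: it uses a hypothetical stand-in statistic (the absolute deviation of the sample kurtosis from 3) instead of the proposed TstSs, complete samples instead of PrCsD, and assumed values for the sample size and number of replications. The helper names `draw_t` and `rejection_rate` are introduced here for illustration.

```python
import math
import random
import statistics


def standin_statistic(x):
    """Hypothetical LoSc invariant statistic (not one of the proposed TstSs)."""
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return abs(m4 / m2 ** 2 - 3.0)


def draw_t(rng, dof):
    """Draw from a t distribution as Z / sqrt(chi-square / dof)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square with dof DoF
    return z / math.sqrt(chi2 / dof)


def rejection_rate(stat, draw, n, crit, reps, rng):
    """Proportion of simulated samples whose statistic exceeds crit."""
    hits = sum(
        stat([draw(rng) for _ in range(n)]) > crit for _ in range(reps)
    )
    return hits / reps


rng = random.Random(7)
n, reps, alpha = 30, 2000, 0.05

# Null critical value from the standard normal distribution.
null_stats = sorted(
    standin_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(reps)
)
crit = null_stats[min(reps - 1, int((1.0 - alpha) * reps))]

# Rejection rate under the null (empirical size) and under t with 4 DoF.
size = rejection_rate(standin_statistic, lambda g: g.gauss(0.0, 1.0),
                      n, crit, reps, rng)
power = rejection_rate(standin_statistic, lambda g: draw_t(g, 4),
                       n, crit, reps, rng)
print("empirical size:", size, "estimated power:", power)
```

Estimating the rejection rate under the null as well provides a quick check that the simulated CrVal holds the nominal level before the power values are compared.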
Table 1 presents the estimated power of the TstSs when the null hypothesis stipulates NormDist and the alternative hypothesis corresponds to tDist with 4, 5, 6 and 7 DoF.
Table 1 shows that the
TstS possessed better power than Ref. [
10]’s TstSs in a number of PrCs (indicated in bold). The
TstS was found to be better than Ref. [
10]’s TstSs in all PrCs. When the data were generated from
tDist with 4 DoF, the proposed TstSs gained better power. The
TstS was always more powerful than the other proposed TstSs. Furthermore,
,
,
,
,
and
TstSs were compared with
,
,
,
,
and
TstSs, respectively. As a result,
,
,
and
TstSs were found to be better than
,
,
and
TstSs in 64, 80, 68 and 68 out of 108 PrCs, respectively.
Table 2 presents the estimated power of the TstSs when the null hypothesis stipulates GumDist and the alternative hypothesis corresponds to LGamDist with ShPm 3, 6, 9 and ∞.
Table 2 shows that the
TstS possessed better power than Ref. [
10]’s TstSs in a number of PrCs (indicated in bold).
TstS was found to be better than Ref. [
10]’s TstSs in 64 out of 108 PrCs. When the data were generated from LGamDist with ShPm
∞, the proposed TstSs gained better power. The
TstS was almost always more powerful than the other proposed TstSs. Moreover,
,
,
,
,
and
TstSs were compared with
,
,
,
,
and
TstSs, respectively. As a result,
,
,
,
and
TstSs were found to be better than
,
,
,
and
TstSs in 99, 93, 93, 75 and 68 out of 108 PrCs, respectively.
Therefore, in these simulations the TstSs using the LrCv were more powerful than the TstSs using the OrSt in the majority of the PrCs considered.
5. Real Data Analysis
In this Section, we present two examples of real data analysis using Ref. [10]’s TstSs and the proposed TstSs for illustrative purposes.
Example 1 (Breaking strength data)
The Example 1 data were previously studied by Refs. [11, 22, 23]. In Example 1, Refs. [11, 22, 23] generated a PrCsD of size
from
. The PrCsD are given in
Table 3.
The values of the TstSs and the corresponding p-values are presented in Table 4. Table 4 shows that all p-values are greater than the significance level 0.05. Therefore, the null hypothesis of the NormDist is not rejected for the data. This result is in agreement with the findings of Refs. [10, 11, 22, 23].
We can confirm this with
. The
of Example 1 is presented in
Figure 4.
Figure 4 shows that the
of Example 1 is close to
. Thus,
suggests that Example 1 is consistent with NormDist.
Example 2 (log-transformed insulating fluid test data)
The Example 2 data were previously studied by Refs. [11, 23]. In Example 2, Refs. [11, 23] generated a PrCsD of size
from
. The PrCsD are given in
Table 5.
The values of the TstSs and the corresponding p-values are presented in Table 6. Table 6 shows that all p-values are greater than the significance level 0.05. Therefore, the null hypothesis of the GumDist is not rejected for the data. This result is in agreement with the findings of Refs. [10, 11, 23].
We can also confirm this with
. The
of Example 2 is presented in
Figure 4.
Figure 4 shows that the
of Example 2 is close to
. Thus,
suggests that Example 2 is consistent with GumDist.
6. Conclusions
Assessing how well data fit an assumed distribution is important and should precede any data analysis. Usually, a histogram or Q-Q plot is used to assess the data distribution, together with a GoF TstS. In life-testing or reliability studies, the failure times of some test units may not be observed, and the GoF TstSs for completely observed data can no longer be used for PrCsD. In this paper, we propose GoF TstSs and a new plot method for the GoF test of LoScD based on PrCsD.
The proposed TstSs are LoSc invariant, so their null distributions do not depend on the specific values of the location and scale parameters. Consequently, the standard distribution is used as the null distribution, and the power of the test does not depend on the parameter values of the null distribution. The power of the proposed TstSs is estimated through MC simulations and compared with that of the TstSs using the OrSts. As the parent distributions, we consider NormDist and GumDist. For testing the NormDist and GumDist, the alternative distributions are tDist with 4, 5, 6 and 7 DoF and LGamDist with ShPm 3, 6, 9 and ∞, respectively.
For testing the NormDist, the
TstS possessed better power than Ref. [
10]’s TstSs in a number of PrCs.
TstS was found to be better than Ref. [
10]’s TstSs in all PrCs. For testing the GumDist, the
TstS possessed better power than Ref. [
10]’s TstSs in a number of PrCs.
TstS was found to be better than Ref. [10]’s TstSs in 64 out of 108 PrCs. Therefore, in these simulations the TstSs using the LrCv were more powerful than the TstSs using the OrSts in the majority of the PrCs considered. Moreover, the proposed method not only provides test statistics but also incorporates a graphical representation, allowing for the visual interpretation of results.
Although we have supposed that the LoScDs are GumDist and NormDist, any other LoScD can also be considered.