Article

Tolerance Interval for the Mixture Normal Distribution Based on Generalized Extreme Value Theory

1 School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471023, China
2 School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2024, 12(7), 1114; https://doi.org/10.3390/math12071114
Submission received: 23 February 2024 / Revised: 1 April 2024 / Accepted: 3 April 2024 / Published: 8 April 2024
(This article belongs to the Section Probability and Statistics)

Abstract
For a common type of mixture distribution, namely the mixture normal distribution, existing methods for constructing its tolerance interval are unsatisfactory for cases of small sample size and large content. In this study, we propose a method to construct a tolerance interval for the mixture normal distribution based on the generalized extreme value theory. The proposed method is implemented on simulated as well as real-life datasets and its performance is compared with the existing methods.

1. Introduction

The tolerance interval (TI) serves as an important statistical interval within the domain of statistics. Distinct from the confidence interval (CI), which furnishes insights into an unknown parameter pertaining to a population, the TI extends its utility by providing information concerning the entire population. For instance, consider a scenario wherein measurements of length have been acquired from a random sample of 1000 pencils sourced from a production process associated with a specific pencil brand. A CI computed from such data provides a range within which one can claim, with a specified degree of confidence (e.g., 95%), the potential presence of the parameter value, denoted as θ , characterizing the distribution of pencil lengths. In contrast, a TI derived from the same dataset establishes boundaries within which one can claim, with a specified degree of confidence (e.g., 95%), the inclusion of the (measured) lengths of at least a specified proportion (e.g., 0.99) of the distribution characterizing the lengths of the pencils.
Within the existing body of literature, two prominent categories of TIs have been scrutinized, namely “β-content TIs” and “β-expectation TIs”. In the context of a β-expectation TI, its average content precisely corresponds to β, representing what is colloquially termed a 100β% prediction interval. This interval is designed to encompass a forthcoming randomly selected observation from a distribution with a predetermined level of confidence. On the other hand, a β-content TI, also identified as a (β, 1 − α) two-sided TI, is constructed to include at least a proportion β of the distribution with a confidence level of 100(1 − α)% [1,2]. The present investigation concentrates on the β-content TI, specifically denoted as the (β, 1 − α) TI.
Two distinct categories of (β, 1 − α) two-sided TIs exist, namely, control-the-center TIs and control-both-tails TIs, the latter also recognized as equal-tailed TIs. Despite both falling under the umbrella of β-content TIs, they exhibit disparate definitions. The former is formulated to encompass at least a proportion β of the entire population with a confidence level of 1 − α. In contrast, the latter is designed to encompass at least a proportion β of the central region of the population with the same confidence level. TIs stand as potent tools widely employed across diverse domains such as manufacturing and industrial statistics, engineering, environmental science, hydrology, medicine, meteorology, economics, and beyond (refer to, for instance, [3,4,5,6,7]). For more recent advancements and in-depth theoretical treatments of TIs, interested readers are directed to the comprehensive works of [1,2].
In the above applications, it is usually assumed that the data come from a single distribution. In practice, however, finite mixture distributions may fit real data better than single distributions, especially when the data are believed to belong to two or more distinct, but unobserved, categories. Because of this flexibility, finite mixture distributions have received increasing attention over the years, and finite mixture normal distributions are among the most commonly used mixture distributions.
Mixture normal distributions have been successfully used to address diverse data problems, such as research on the evaluation of nuclear power plant safety [8], assessment of the debrisoquin and dextromethorphan phenotyping tests [9], examination of the ventricular size in schizophrenia [10], prediction of protein local conformations [11], fitting the asset returns data in portfolio optimization problems [12], modeling rainfall drop size distribution in southern England [13], and analysis of wireless channels [14].
In the above problems, it is important to construct TIs, which can be used to determine whether a change has occurred in the process. However, few methods are available for constructing TIs for mixture normal distributions because of their high computational cost. The distribution-free (DF) method provides an easy-to-compute way of constructing TIs without assuming the form of the underlying distribution, but DF TIs can be rather conservative and are unsatisfactory for cases of small sample size and large content [2,15].
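To see why the DF method struggles in this regime, note that for a continuous CDF F the coverage F(X₍ₙ₎) of the sample maximum follows a Beta(n, 1) distribution, so the attainable confidence of using X₍ₙ₎ as an upper tolerance limit with content β is 1 − βⁿ. The following sketch (our own illustration, not code from any of the cited works) computes this attainable confidence:

```python
# A sketch (illustration only) of the attainable confidence when the
# sample maximum X_(n) is used as a distribution-free upper tolerance
# limit with a given content.
from scipy.stats import beta

def df_upper_limit_confidence(n: int, content: float) -> float:
    """P(F(X_(n)) >= content). Since F(X_(n)) ~ Beta(n, 1) for a
    continuous F, this equals 1 - content**n."""
    return beta.sf(content, n, 1)

for n in (20, 50, 100, 200, 300):
    print(n, round(df_upper_limit_confidence(n, 0.99), 3))
```

For β = 0.99 this gives 0.182, 0.395, 0.634, 0.866, and 0.951 at n = 20, 50, 100, 200, 300, so the nominal confidence 0.95 is only reached once n is near 300; this is exactly the small-sample, large-content regime the GEVT method targets.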
In addition to the DF method, ref. [16] constructed TIs for the mixture normal distribution based on fiducial generalized pivotal quantities (FGPQ); ref. [17] proposed two TIs for the mixture normal distribution based on the expectation-maximization (EM) algorithm combined with the bootstrap method and on the asymptotic normality of sample quantiles (ANSQ), where the bootstrap method is also used in [8]. Although the methods proposed in [16,17] outperform the DF method, there may still be a gap between the coverage probabilities (CPs) of these TIs and the nominal levels when the sample size is not large enough and the content, β, is large. However, the content of a TI is usually required to be close to one in applications demanding high precision, such as the evaluation of nuclear power plant safety. Moreover, considering the difficulty and cost of sample collection, the sample size is likely to be less than 300 in many practical applications.
Aiming at the problem of small sample size and large content, this study proposes a method for constructing TIs for the mixture normal distribution based on generalized extreme value theory (GEVT) [18,19]. We compare the GEVT method with the DF, bootstrap, ANSQ, and FGPQ methods in a simulation study. The simulation results show that the GEVT method works better than the other methods for cases of small sample size and large content. Besides large content (β = 0.99), β = 0.90 and β = 0.95 are also common in practice; further investigation of a more comprehensive method is a topic of ongoing research.
The outline of the paper is as follows. Section 2 provides a brief review of the basics of TIs and the asymptotic behavior of the extreme value. We discuss the asymptotic distributions of the extreme values for the mixture normal distribution, and provide the GEVT method for constructing TIs in the following section. Simulations are conducted to compare the performance of these methods in Section 4 and a real data example is used for illustration in Section 5. Section 6 concludes the paper.

2. Preliminaries

This section briefly reviews the basics of TIs and the GEVT.

2.1. Basics of TIs

This subsection is excerpted from the first section of Chapter 1 of [1]. Suppose X = (X₁, …, Xₙ) is a random sample from a continuous distribution F_θ(x), where F_θ(x) is the cumulative distribution function (CDF) with an unknown parameter θ. The content and the confidence level are the two essential nominal parameters of a TI, which will be denoted by β and 1 − α, respectively (0 < β, α < 1).
A (β, 1 − α) control-the-center TI [L(X), U(X)] for F_θ(x) can be defined as
P_X{ P( L(X) ≤ X ≤ U(X) | X ) ≥ β } ≥ 1 − α,  (1)
or equivalently,
P_X{ F_θ(U(X)) − F_θ(L(X)) ≥ β } ≥ 1 − α.
And a (β, 1 − α) control-both-tails (equal-tailed) TI can be defined as
P_X{ F_θ(U(X)) ≥ (1 + β)/2 and F_θ(L(X)) ≤ (1 − β)/2 } ≥ 1 − α.
Substituting L(X) in (1) with −∞ (or 0), or U(X) with +∞, will yield a (β, 1 − α) upper or lower tolerance limit for F_θ(x), respectively.
In fact, for one-sided TIs, a (β, 1 − α) lower tolerance limit is a 1 − α lower confidence limit for the 1 − β quantile q₍₁₋β₎, and a (β, 1 − α) upper tolerance limit is a 1 − α upper confidence limit for the β quantile q_β. Thus, the computation of one-sided tolerance limits reduces to the computation of confidence limits for certain quantiles.

2.2. The Asymptotic Distribution of the Extreme Value

Suppose X₁, X₂, …, Xₙ are independent random variables with a common cumulative distribution function (CDF) F(x). Denote by X₍₁₎ and X₍ₙ₎ the sample minimum and maximum, respectively, i.e., X₍₁₎ = min(X₁, X₂, …, Xₙ) and X₍ₙ₎ = max(X₁, X₂, …, Xₙ). The asymptotic behavior of X₍ₙ₎ has been thoroughly and rigorously discussed in [18,19]; some of the main findings are reviewed in this section.
Unlike the central limit problem, the normal distribution is no longer suitable as the limiting distribution (LD) of X ( n ) due to the inherent skewness of extreme values. In fact, for an arbitrary F ( x ) , if X ( n ) possesses an LD, then the LD must be one of the following three types:
(Fréchet)  G₁(x; a) = exp(−x^(−a)), for x > 0, a > 0;
(Weibull)  G₂(x; a) = exp[−(−x)^a], for x ≤ 0, a > 0;
(Gumbel)  G₃(x) = exp(−e^(−x)), for −∞ < x < ∞.
More formally, if there exist constants aₙ and bₙ > 0 such that
lim_{n→∞} P(X₍ₙ₎ < aₙ + bₙ x) = G_i(x), for all x ∈ C(G_i), i = 1, 2, 3,
where C(G_i) denotes the set of continuity points of G_i, then we say that F is in the domain of maximal attraction of the LD G_i, denoted as F ∈ D(G_i).
Similarly, if the LD of X₍₁₎ exists, then it must be one of the following three types:
H₁(x; a) = 1 − exp(−(−x)^(−a)), for x < 0, a > 0;
H₂(x; a) = 1 − exp(−x^a), for x ≥ 0, a > 0;
H₃(x) = 1 − exp(−e^x), for −∞ < x < ∞,
where H_i(x) = 1 − G_i(−x), i = 1, 2, 3.

3. Main Results

In this section, the detailed methodology for constructing tolerance intervals for the mixture normal distribution is provided.
We first discuss the asymptotic behavior of X₍₁₎ and X₍ₙ₎ for the mixture normal distribution. For a finite k-component mixture normal distribution, the CDF F_θ has the form
F_θ(x) = Σ_{j=1}^{k} p_j F_j(x), −∞ < x < +∞,  (8)
where F_j denotes the CDF of a normal distribution with mean μ_j and standard deviation σ_j, the mixing proportions p_j lie in the standard simplex {p ∈ R^k : Σ_{j=1}^{k} p_j = 1, p_j > 0, j = 1, 2, …, k}, and θ = (μ₁, μ₂, …, μ_k, σ₁, σ₂, …, σ_k, p₁, p₂, …, p_k). In this paper, we assume that the value of k is known. Various procedures for dealing with cases of unknown k can be found in [20].
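As a concrete illustration, the CDF, density, and a sampler for a two-component case can be written directly from the mixture form above; the parameter values below are hypothetical choices of ours, used only for illustration:

```python
# Hypothetical two-component mixture normal (illustrative parameters).
import numpy as np
from scipy.stats import norm

p     = np.array([1/3, 2/3])   # mixing proportions p_j
mu    = np.array([0.0, 1.0])   # component means mu_j
sigma = np.array([1.0, 1.0])   # component standard deviations sigma_j

def mix_cdf(x):
    """F(x) = sum_j p_j * Phi((x - mu_j) / sigma_j)."""
    x = np.asarray(x, dtype=float)
    return np.sum(p * norm.cdf((x[..., None] - mu) / sigma), axis=-1)

def mix_pdf(x):
    """f(x) = sum_j (p_j / sigma_j) * phi((x - mu_j) / sigma_j)."""
    x = np.asarray(x, dtype=float)
    return np.sum(p / sigma * norm.pdf((x[..., None] - mu) / sigma), axis=-1)

def mix_rvs(n, rng):
    """Sample by drawing a component label, then a normal variate."""
    comp = rng.choice(len(p), size=n, p=p)
    return rng.normal(mu[comp], sigma[comp])
```

The two-stage sampler mirrors the probabilistic definition of the mixture: a latent label with probabilities p_j, then a draw from the selected normal component.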
For the normal distribution, ref. [21] provides a detailed derivation of the LD for the extrema. Inspired by this, the asymptotic behavior of the extrema for the mixture normal distribution is deduced as follows.
Theorem 1. 
For the mixture normal distribution (8), we have
F_θ ∈ D(G₃) and F_θ ∈ D(H₃).
The proof of Theorem 1 is given in Appendix A.
Let X = ( X 1 , X 2 , , X n ) be a random sample from the mixture normal distribution (8). Based on Theorem 1, we have
(X₍ₙ₎ − aₙ)/bₙ →_D G₃,  as n → ∞,  (9)
(cₙ − X₍₁₎)/dₙ →_D G₃,  as n → ∞,  (10)
where G₃ ~ Gumbel(0, 1), and the standardizing constants are aₙ = q(1 − 1/n), bₙ = 1/[n f(aₙ)], cₙ = q(1/n), and dₙ = 1/[n f(cₙ)], with q(·) the quantile function and f(·) the probability density function of F (see [21]). According to page 34 of [22], for each c > 0, further extreme value theory results are as follows:
(q₍₁₋c/n₎ − aₙ)/bₙ → −ln c,  (11)
(q₍c/n₎ − cₙ)/dₙ → ln c,  as n → ∞.  (12)
Combining Equations (9)–(12) with Slutsky’s theorem, we have
(X₍ₙ₎ − q₍₁₋c/n₎)/bₙ →_D G₃ + ln c,  (13)
and
(q₍c/n₎ − X₍₁₎)/dₙ →_D G₃ + ln c,  for each c > 0, as n → ∞.  (14)
Based on (13) and (14), the (1 − c/n, 1 − α) one-sided upper and lower tolerance limits can be constructed as
U₁ = X₍ₙ₎ − bₙ ln c − bₙ G₃⁻¹(α),  (15)
and
L₁ = X₍₁₎ + dₙ ln c + dₙ G₃⁻¹(α),  (16)
respectively, where G₃⁻¹(α) is the α quantile of Gumbel(0, 1).
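The mixture quantile function q(·) has no closed form, so the standardizing constants, and hence U₁ and L₁ above, must be computed numerically. A minimal sketch, assuming the mixture parameters are known (in practice they would be replaced by EM estimates; all function names are ours):

```python
# Sketch of the GEVT one-sided tolerance limits for a hypothetical
# two-component mixture normal; parameters are illustrative only.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

p, mu, sd = np.array([1/3, 2/3]), np.array([0.0, 1.0]), np.array([1.0, 1.0])

def mix_cdf(x):
    return float(np.sum(p * norm.cdf((x - mu) / sd)))

def mix_pdf(x):
    return float(np.sum(p / sd * norm.pdf((x - mu) / sd)))

def mix_quantile(u):
    # q(u): invert the mixture CDF numerically on a wide bracket
    return brentq(lambda t: mix_cdf(t) - u, -50.0, 50.0)

def gumbel_quantile(alpha):
    # G3^{-1}(alpha) = -log(-log(alpha)) for Gumbel(0, 1)
    return -np.log(-np.log(alpha))

def gevt_one_sided_limits(sample, c=1.0, alpha=0.05):
    """(1 - c/n, 1 - alpha) lower and upper tolerance limits."""
    n = len(sample)
    a_n = mix_quantile(1 - 1/n); b_n = 1 / (n * mix_pdf(a_n))
    c_n = mix_quantile(1/n);     d_n = 1 / (n * mix_pdf(c_n))
    g = gumbel_quantile(alpha)
    U1 = np.max(sample) - b_n*np.log(c) - b_n*g
    L1 = np.min(sample) + d_n*np.log(c) + d_n*g
    return L1, U1
```

Since G₃⁻¹(0.05) ≈ −1.097 is negative, the upper limit pushes X₍ₙ₎ outward by roughly bₙ and the lower limit pushes X₍₁₎ outward by roughly dₙ, as the theory requires.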
Following the results for the (1 − c/n, 1 − α) one-sided tolerance limits, the (1 − c/n, 1 − α) two-sided equal-tailed TI [L₂, U₂] is
[X₍₁₎ + dₙ ln(c/2) + dₙ G₃⁻¹(α/2), X₍ₙ₎ − bₙ ln(c/2) − bₙ G₃⁻¹(α/2)].  (17)
The performance of the (1 − c/n, 1 − α) two-sided equal-tailed TI in (17) can be unsatisfactory, because the mixture normal distribution is often asymmetric, so an equal-tailed interval for it tends to be somewhat conservative. Inspired by [17], we adjust the TI in (17) as follows.
Keep the lower limit of (17) and define β_U* = F_θ(L₂) + 1 − c/n; then the adjusted upper limit is U₂* = X₍ₙ₎ − bₙ ln[n(1 − β_U*)] − bₙ G₃⁻¹(α/2). Therefore, the (1 − c/n, 1 − α) two-sided equal-tailed TI after adjusting the upper limit is
[X₍₁₎ + dₙ ln(c/2) + dₙ G₃⁻¹(α/2), X₍ₙ₎ − bₙ ln[n(1 − β_U*)] − bₙ G₃⁻¹(α/2)].  (18)
Similarly, keep the upper limit of (17) unchanged and define β_L* = F_θ(U₂) − (1 − c/n); then the new lower limit is L₂* = X₍₁₎ + dₙ ln(n β_L*) + dₙ G₃⁻¹(α/2). Therefore, the (1 − c/n, 1 − α) two-sided equal-tailed TI after adjusting the lower limit is
[X₍₁₎ + dₙ ln(n β_L*) + dₙ G₃⁻¹(α/2), X₍ₙ₎ − bₙ ln(c/2) − bₙ G₃⁻¹(α/2)].  (19)
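The upper-limit adjustment described above can be sketched as follows, assuming an estimate F̂ of the mixture CDF is available (e.g., from EM) and that the standardizing constants bₙ, dₙ have already been computed; the function and argument names are ours:

```python
# Sketch of the equal-tailed TI with the upper limit adjusted; F_hat is
# any (estimated) CDF passed in as a callable, b_n and d_n are supplied.
import numpy as np

def gumbel_quantile(alpha):
    return -np.log(-np.log(alpha))   # G3^{-1}(alpha) for Gumbel(0, 1)

def gevt_adjusted_upper_ti(sample, b_n, d_n, F_hat, c=1.0, alpha=0.05):
    """Equal-tailed (1 - c/n, 1 - alpha) TI with adjusted upper limit:
    keep the lower limit L2, set beta_U* = F_hat(L2) + 1 - c/n, then
    recompute the upper limit with the new content target beta_U*."""
    n = len(sample)
    g = gumbel_quantile(alpha / 2)
    L2 = np.min(sample) + d_n*np.log(c/2) + d_n*g
    beta_U_star = F_hat(L2) + 1 - c/n
    U2_star = np.max(sample) - b_n*np.log(n*(1 - beta_U_star)) - b_n*g
    return L2, U2_star
```

The adjustment trades the unused probability mass below L₂ (which is typically far in the lighter tail) for a tighter, less conservative upper limit.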
Remark 1. 
Note that the GEVT method works for any distribution that satisfies Theorem 1. Although the method is asymptotic in nature, it can also be considered nonparametric within the wide class of CDFs satisfying Theorem 1, which includes, for example, the normal and lognormal distributions.

4. Simulation Study

To compare the performance of the GEVT method with the DF method [2], the bootstrap method [8], the ANSQ method [17], and the FGPQ method [16] in constructing TIs for the mixture normal distribution, a simulation study is conducted. In some practical applications, it is necessary to compute a TI with content close to one (β = 0.99), and, due to cost constraints, the sample size in real applications is likely to be less than 300 (n ≤ 300). Thus, in this simulation study, the content β and the significance level α are set to 0.99 and 0.05 (confidence level 1 − α = 0.95), respectively, and the sample size n is set to 20, 50, 100, 200, and 300.
In addition to comparing the CPs of the TIs, we also employ the following measures, defined in [17], as criteria to compare the precision of the TIs:
δ_U = |U₁ − q_β|,
δ_L = |L₁ − q₍₁₋β₎|,
and
δ_TI = |U₂ − q₍β_U₎| + |L₂ − q₍β_L₎|,
where β_U = (1 + β)/2 and β_L = (1 − β)/2. Obviously, the smaller the values of δ_U, δ_L, and δ_TI, the more precise the limits (intervals). Table 1, Table 2 and Table 3 present the simulation results based on 5000 runs. It should be noted that the EM algorithm may sometimes fail to converge when the sample size is small or the components overlap substantially; we use the notation “−” in the tables to denote such cases.
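The CP entries in the tables can be reproduced in spirit with a small Monte Carlo loop. The sketch below is our own illustration (with far fewer replications than the 5000 used in the study) estimating the CP of the GEVT one-sided upper limit for one parameter setting:

```python
# Monte Carlo sketch of the CP of the GEVT upper limit for the setting
# p = (1/3, 2/3), mu = (0, 1), sigma = (1, 1); illustration only.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

p  = np.array([1/3, 2/3])
mu = np.array([0.0, 1.0])
sd = np.array([1.0, 1.0])

def mix_cdf(x):
    return float(np.sum(p * norm.cdf((x - mu) / sd)))

def mix_pdf(x):
    return float(np.sum(p / sd * norm.pdf((x - mu) / sd)))

def mix_quantile(u):
    return brentq(lambda t: mix_cdf(t) - u, -50.0, 50.0)

def coverage(n=100, c=1.0, alpha=0.05, reps=400, seed=1):
    """Fraction of replications in which the GEVT upper limit U1
    covers the true 1 - c/n quantile."""
    rng = np.random.default_rng(seed)
    a_n = mix_quantile(1 - 1/n)
    b_n = 1 / (n * mix_pdf(a_n))
    q_target = mix_quantile(1 - c/n)        # quantile to be covered
    g = -np.log(-np.log(alpha))             # G3^{-1}(alpha)
    hits = 0
    for _ in range(reps):
        comp = rng.choice(2, size=n, p=p)
        x = rng.normal(mu[comp], sd[comp])
        hits += (x.max() - b_n*np.log(c) - b_n*g) >= q_target
    return hits / reps
```

For this setting (n = 100 and c = 1, so content 1 − c/n = 0.99), the estimate should land near the nominal 0.95; Table 1 reports 0.943 for the corresponding 5000-run experiment.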
Table 1 and Table 2 present the results for the (0.99, 0.95) one-sided upper and lower tolerance limits, respectively. As can be seen, the CPs corresponding to the GEVT method are closer to the nominal level 0.95 than those of the other methods, even for small sample sizes. For example, the CPs of the (0.99, 0.95) one-sided upper tolerance limits constructed by the DF, bootstrap, ANSQ, FGPQ, and GEVT methods for the case of n = 50, p = (1/2, 1/2), μ = (0, 5), and σ = (1, 1.5) are 0.395, 0.810, 0.971, 0.971, and 0.945, respectively. For the same case, the CPs of the (0.99, 0.95) one-sided lower tolerance limits constructed by these five methods are 0.395, 0.818, 0.970, 0.982, and 0.942, respectively. In addition, among the methods whose CPs reach the nominal level, the GEVT tolerance limits have the highest precision: the values of δ corresponding to the above two groups of CPs are δ_U = 0.626, 0.901, 1.306, 1.979, 1.135 and δ_L = 0.419, 0.930, 1.402, 1.558, 0.756, respectively. Note that, when the sample size is n = 300, the CPs of both the GEVT and DF methods are close to the nominal level; however, the values of δ for the DF method are larger than those for the GEVT method, which means the GEVT tolerance limits are more precise.
Table 3 presents the results for the (0.99, 0.95) equal-tailed two-sided TIs, where GEVT(L) (GEVT(U)) denotes the results based on the GEVT method after adjusting the lower (upper) limit. As shown in the table, the adjusted GEVT methods have a clear advantage in constructing TIs for the mixture normal distribution. Moreover, comparing the two adjusted versions of the GEVT method, there is little difference between adjusting the upper limit and adjusting the lower limit. The adjusted GEVT method is superior to the other methods in terms of both the CPs of the intervals and the corresponding values of δ_TI. More importantly, the CPs of the adjusted GEVT TIs are quite close to the nominal level.
To evaluate the proposed method under component overlap, we consider an extreme case as [17] does, i.e., we mis-specify the single standard normal distribution N ( 0 , 1 ) as the mixture normal distribution. The simulation results in Table 4 show that the modified GEVT method is still superior to the DF method in the extreme cases, which indicates a certain robustness of the GEVT method.

5. Example

The dataset utilized in this study is derived from the work of [8], encompassing information on the peak cladding temperature (PCT) observed in nuclear power plants. Within nuclear reactors, the reactor core is constructed from metal tubes known as fuel rods, in which nuclear fuel pellets are stacked and sealed. The outer protective layer separating the nuclear fuel from the coolant is called cladding. Stringent adherence to regulatory standards is imperative in the design and construction of nuclear reactors, and to mitigate cladding embrittlement, the PCT must be kept within a specified and accurate range. TIs with content 0.99 serve as effective and commonly used tools to achieve such goals. The PCT dataset originates from AREVA, Inc., a company specializing in supplying fuel and providing engineering services for nuclear power plants. The dataset, comprising 208 observations on PCT, is produced by computer-code simulations of postulated loss-of-coolant accidents (LOCAs) in nuclear power plants [8]. The research in [8] demonstrates that a mixture of two normal distributions adeptly captures the bimodal nature of the PCT data. The maximum likelihood estimates (MLEs) of the parameters of this two-component normal mixture are p̂ = (0.3, 0.7), μ̂ = (1100, 1650), and σ̂ = (130, 155), where p̂ is the MLE of the vector of component proportions, μ̂ is the MLE of the vector of means, and σ̂ is the MLE of the vector of standard deviations. The outcomes based on the PCT data are presented in Table 5, Table 6 and Table 7. From the results in these tables, it can be seen that, for practical applications with a sample size of 300 or less and a high demand for accuracy (such as β = 0.99), the GEVT method performs best, guaranteeing the precision of the TIs while keeping their CPs close to the nominal level.
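A two-component fit of this kind can be reproduced on simulated data with a plain EM iteration. The sketch below is our own minimal implementation (not the authors' code), run on a synthetic stand-in generated from the fitted parameters reported in [8]:

```python
# Minimal EM for a two-component 1-D normal mixture (our own sketch),
# run on simulated data drawn from the fitted PCT parameters of [8].
import numpy as np

def em_normal_mixture_2(x, iters=200):
    m = np.percentile(x, [25, 75]).astype(float)   # crude mean init
    s = np.array([x.std(), x.std()])               # common-scale init
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point
        dens = w / (s * np.sqrt(2*np.pi)) * np.exp(-0.5*((x[:, None] - m)/s)**2)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        w = nk / len(x)
        m = (r * x[:, None]).sum(axis=0) / nk
        s = np.sqrt((r * (x[:, None] - m)**2).sum(axis=0) / nk)
    return w, m, s

# Synthetic stand-in for the PCT data (the real dataset has 208 points)
rng = np.random.default_rng(8)
comp = rng.choice(2, size=4000, p=[0.3, 0.7])
x = rng.normal(np.array([1100.0, 1650.0])[comp], np.array([130.0, 155.0])[comp])
w, m, s = em_normal_mixture_2(x)
```

On 4000 well-separated simulated points the estimates recover (0.3, 0.7), (1100, 1650), and (130, 155) closely; with only 208 real observations the estimates are naturally noisier.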

6. Final Considerations

In some practical applications, tolerance limits with content close to one (such as 0.99) are required, and, due to cost constraints, the sample size is likely to be less than 300. To the best of our knowledge, the available methods for constructing tolerance limits for mixture distributions are the DF, bootstrap, ANSQ, and FGPQ methods; however, these methods do not work well for cases of small sample size and large content. In this study, we constructed tolerance limits for the mixture normal distribution based on the GEVT, which provides a more general treatment of TIs with large content. Extensive numerical studies show that the GEVT tolerance limits outperform the other tolerance limits in terms of both CPs and precision when the sample size is not large and the content is large. In addition, the robustness of the GEVT tolerance limits is verified by deliberately mis-specifying a single normal distribution as a mixture normal distribution. This work only highlighted the use of the GEVT for TIs of distributions that satisfy Theorem 1; further research is needed to construct TIs for other types of distributions in cases of small sample size and large content.

Author Contributions

Conceptualization, J.J.; methodology, J.J.; software, R.G.; validation, J.J.; writing—original draft preparation, J.J.; writing—review and editing, J.J.; supervision, R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available in ref. [8].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TI  Tolerance interval
DF  Distribution-free
ANSQ  Asymptotic normality of sample quantiles
GEVT  Generalized extreme value theory
CP  Coverage probability
EM  Expectation-maximization
FGPQ  Fiducial generalized pivotal quantity
CI  Confidence interval
CDF  Cumulative distribution function
LD  Limiting distribution
PCT  Peak cladding temperature

Appendix A. Proof of Theorem 1

Proof of Theorem 1 
For the mixture normal distribution F,
F(x) = Σ_{j=1}^{k} p_j F_j(x),
according to Theorem 3 of [23], to prove F ∈ D(G₃), we need to show that there exists a sequence of strictly monotone continuous transformations {g_n(x)}_{n=1}^{∞} such that
F^n(g_n(x)) → G₃(x).
Since [21] proved that the LD of the maximum of the normal distribution is the Gumbel distribution, we have
F_j^n(g_n(x)) → G₃(x), j = 1, 2, …, k.
The rest of the proof then follows from Theorem 1 of [24]. F ∈ D(H₃) can, of course, be proved in a similar way. □

References

  1. Krishnamoorthy, K.; Mathew, T. Statistical Tolerance Regions: Theory, Applications, and Computation; John Wiley & Sons: New York, NY, USA, 2009; Volume 744. [Google Scholar]
  2. Meeker, W.Q.; Hahn, G.J.; Escobar, L.A. Statistical Intervals: A Guide for Practitioners and Researchers; John Wiley & Sons: New York, NY, USA, 2017; Volume 541. [Google Scholar]
  3. Hauck, W.W.; Shaikh, R. Modified two-sided normal TIs for batch acceptance of dose uniformity. Pharm. Stat. 2004, 3, 89–97. [Google Scholar] [CrossRef]
  4. Ryan, T.P. Modern Engineering Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 2007. [Google Scholar]
  5. Tsong, Y.; Shen, M.; Shah, V.P. Three-stage sequential statistical dissolution testing rules. J. Biopharm. Stat. 2004, 14, 757–779. [Google Scholar] [CrossRef] [PubMed]
  6. Aryal, S.; Bhaumik, D.K.; Mathew, T.; Gibbons, R.D. Approximate tolerance limits and prediction limits for the gamma distribution. J. Appl. Stat. Sci. 2008, 16, 253–261. [Google Scholar]
  7. Chen, P.; Ye, Z.S. Approximate statistical limits for a gamma distribution. J. Qual. Technol. 2017, 49, 64–77. [Google Scholar] [CrossRef]
  8. Zimmer, Z.; Park, D.; Mathew, T. Tolerance limits under normal mixtures: Application to the evaluation of nuclear power plant safety and to the assessment of circular error probable. Comput. Stat. Data Anal. 2016, 103, 304–315. [Google Scholar] [CrossRef]
  9. Henthorn, T.K.; Benitez, J.; Avram, M.J.; Martinez, C.; Llerena, A.; Cobaleda, J.; Krejcie, T.C.; Gibbons, R.D. Assessment of the debrisoquin and dextromethorphan phenotyping tests by gaussian mixture distributions analysis. Clin. Pharmacol. Ther. 1989, 45, 328–333. [Google Scholar] [CrossRef] [PubMed]
  10. Daniel, D.G.; Goldberg, T.E.; Gibbons, R.D.; Weinberger, D.R. Lack of a bimodal distribution of ventricular size in schizophrenia: A gaussian mixture analysis of 1056 cases and controls. Biol. Psychiatry 1991, 30, 887–903. [Google Scholar] [CrossRef] [PubMed]
  11. Tendulkar, A.V.; Ogunnaike, B.; Wangikar, P.P. Protein local conformations arise from a mixture of gaussian distributions. J. Biosci. 2007, 32, 899–908. [Google Scholar] [CrossRef]
  12. Ian, B.; David, S.; Luis, S. Portfolio optimization when asset returns have the Gaussian mixture distribution. Eur. J. Oper. Res. 2008, 185, 1434–1461. [Google Scholar]
  13. Ekerete, K.M.E.; Hunt, F.H.; Jeffery, J.L.; Otung, I.E. Modeling rainfall drop size distribution in southern England using a gaussian Mixture Model. Radio Sci. 2015, 50, 876–885. [Google Scholar] [CrossRef]
  14. Selim, B.; Alhussein, O.; Muhaidat, S.; Karagiannidis, G.K.; Liang, J. Modeling and analysis of wireless channels via the mixture of gaussian distribution. IEEE Trans. Veh. Technol. 2016, 65, 8309–8321. [Google Scholar] [CrossRef]
  15. Wilks, S.S. Determination of sample sizes for setting tolerance limits. Ann. Math. Stat. 1941, 12, 91–96. [Google Scholar] [CrossRef]
  16. Tsai, S.F. Approximate two-sided tolerance intervals for normal mixture distributions. Aust. N. Z. J. Stat. 2020, 62, 367–382. [Google Scholar] [CrossRef]
  17. Chen, C.; Wang, H. Tolerance interval for the mixture normal distribution. J. Qual. Technol. 2020, 52, 145–154. [Google Scholar] [CrossRef]
  18. Gnedenko, B. Sur la distribution limite du terme maximum d’une série aléatoire. Ann. Math. 1943, 44, 423–453. [Google Scholar] [CrossRef]
  19. De Haan, L. On Regular Variation and Its Application to the Weak Convergence of Sample Extremes; Mathematical Centre Tracts 32; Mathematisch Centrum: Amsterdam, The Netherlands, 1970. [Google Scholar]
  20. McLachlan, G.J.; Peel, D. Finite Mixture Models; John Wiley & Sons, Inc.: New York, NY, USA, 2000. [Google Scholar]
  21. David, H.A.; Nagaraja, H.N. Order Statistics, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2003. [Google Scholar]
  22. Boos, D.D. Using extreme value theory to estimate large percentiles. Technometrics 1984, 26, 33–39. [Google Scholar] [CrossRef]
  23. Pantcheva, E. Limit theorems for extreme order statistics under nonlinear normalization. In Stability Problems for Stochastic Models; Kalashnikov, V.V., Zolotarev, V.M., Eds.; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1985; Volume 1155. [Google Scholar]
  24. AL-Hussaini, E.K.; El-Adll, M.E. Asymptotic distribution of normalized maximum under finite mixture models. Stat. Probab. Lett. 2004, 70, 109–117. [Google Scholar] [CrossRef]
Table 1. CPs of the (0.99, 0.95) one-sided upper tolerance limits. The values in parentheses denote the corresponding values of δ_U.
c | n | DF | Bootstrap | ANSQ | FGPQ | GEVT
p = (1/3, 2/3), μ = (0, 1), σ = (1, 1)
0.2 | 20 | 0.182 (0.620) | − | 0.973 (0.957) | 0.998 (7.632) | 0.965 (0.915)
0.5 | 50 | 0.395 (0.394) | − | 0.975 (0.829) | 0.997 (4.176) | 0.944 (0.709)
1 | 100 | 0.634 (0.365) | 0.788 (0.399) | 0.978 (0.645) | 0.993 (2.138) | 0.943 (0.632)
2 | 200 | 0.866 (0.473) | 0.897 (0.455) | 0.970 (0.592) | 0.987 (0.631) | 0.945 (0.546)
3 | 300 | 0.951 (0.583) | 0.906 (0.304) | 0.965 (0.501) | 0.985 (0.503) | 0.950 (0.472)
p = (1/2, 1/2), μ = (0, 5), σ = (1, 1.5)
0.2 | 20 | 0.182 (1.007) | 0.678 (0.791) | 0.963 (1.539) | 0.995 (7.195) | 0.959 (1.496)
0.5 | 50 | 0.395 (0.626) | 0.810 (0.901) | 0.971 (1.306) | 0.971 (1.979) | 0.945 (1.135)
1 | 100 | 0.634 (0.577) | 0.882 (0.698) | 0.985 (1.309) | 0.964 (1.014) | 0.941 (0.996)
2 | 200 | 0.866 (0.741) | 0.906 (0.579) | 0.977 (0.927) | 0.974 (0.903) | 0.944 (0.862)
3 | 300 | 0.951 (0.909) | 0.917 (0.532) | 0.974 (0.825) | 0.986 (1.033) | 0.951 (0.683)
p = (1/4, 1/2, 1/4), μ = (0, 1, 2), σ = (1, 1, 1)
0.2 | 20 | 0.182 (0.687) | − | 0.968 (1.046) | 0.999 (21.349) | 0.962 (1.031)
0.5 | 50 | 0.395 (0.434) | − | 0.973 (0.917) | 0.999 (16.635) | 0.943 (0.785)
1 | 100 | 0.634 (0.403) | 0.732 (0.649) | 0.982 (0.910) | 0.998 (12.823) | 0.939 (0.692)
2 | 200 | 0.866 (0.513) | 0.805 (0.401) | 0.976 (0.645) | 0.996 (5.329) | 0.944 (0.599)
3 | 300 | 0.951 (0.634) | 0.877 (0.395) | 0.971 (0.611) | 0.992 (1.699) | 0.951 (0.479)
p = (1/3, 1/3, 1/3), μ = (0, 3, 7), σ = (1, 1.5, 1)
0.2 | 20 | 0.182 (0.740) | − | 0.941 (1.056) | 0.999 (26.957) | 0.959 (1.117)
0.5 | 50 | 0.395 (0.449) | 0.823 (0.712) | 0.965 (0.925) | 0.999 (13.916) | 0.939 (0.816)
1 | 100 | 0.634 (0.409) | 0.894 (0.593) | 0.975 (0.926) | 0.992 (7.928) | 0.941 (0.703)
2 | 200 | 0.866 (0.523) | 0.911 (0.451) | 0.967 (0.731) | 0.990 (1.983) | 0.943 (0.610)
3 | 300 | 0.951 (0.637) | 0.933 (0.590) | 0.975 (0.579) | 0.999 (1.638) | 0.950 (0.532)
Table 2. CPs of the (0.99, 0.95) one-sided lower tolerance limits. The values in parentheses denote the corresponding values of δ_L.
c | n | DF | Bootstrap | ANSQ | FGPQ | GEVT
p = (1/3, 2/3), μ = (0, 1), σ = (1, 1)
0.2 | 20 | 0.182 (0.654) | − | 0.973 (1.011) | 0.999 (7.682) | 0.966 (0.971)
0.5 | 50 | 0.395 (0.416) | − | 0.975 (0.876) | 0.998 (4.532) | 0.945 (0.749)
1 | 100 | 0.634 (0.388) | 0.781 (0.410) | 0.978 (0.682) | 0.992 (2.083) | 0.942 (0.615)
2 | 200 | 0.866 (0.499) | 0.893 (0.442) | 0.970 (0.605) | 0.983 (0.627) | 0.944 (0.555)
3 | 300 | 0.951 (0.614) | 0.910 (0.322) | 0.967 (0.571) | 0.979 (0.591) | 0.950 (0.498)
p = (1/2, 1/2), μ = (0, 5), σ = (1, 1.5)
0.2 | 20 | 0.182 (0.672) | 0.694 (0.807) | 0.963 (1.526) | 0.996 (7.241) | 0.959 (0.935)
0.5 | 50 | 0.395 (0.419) | 0.818 (0.930) | 0.970 (1.402) | 0.982 (1.558) | 0.942 (0.756)
1 | 100 | 0.634 (0.386) | 0.897 (0.711) | 0.976 (1.278) | 0.962 (0.727) | 0.941 (0.601)
2 | 200 | 0.866 (0.494) | 0.902 (0.574) | 0.967 (0.876) | 0.967 (0.688) | 0.944 (0.553)
3 | 300 | 0.951 (0.606) | 0.913 (0.505) | 0.963 (0.793) | 0.983 (1.109) | 0.950 (0.475)
p = (1/4, 1/2, 1/4), μ = (0, 1, 2), σ = (1, 1, 1)
0.2 | 20 | 0.182 (0.687) | − | 0.969 (1.047) | 0.999 (22.367) | 0.963 (1.031)
0.5 | 50 | 0.395 (0.434) | − | 0.973 (0.910) | 0.999 (17.992) | 0.943 (0.783)
1 | 100 | 0.634 (0.402) | 0.695 (0.672) | 0.975 (0.706) | 0.998 (14.999) | 0.941 (0.692)
2 | 200 | 0.866 (0.516) | 0.807 (0.421) | 0.968 (0.610) | 0.996 (5.819) | 0.943 (0.606)
3 | 300 | 0.951 (0.632) | 0.882 (0.407) | 0.966 (0.620) | 0.996 (1.598) | 0.950 (0.510)
p = (1/3, 1/3, 1/3), μ = (0, 3, 7), σ = (1, 1.5, 1)
0.2 | 20 | 0.182 (0.734) | − | 0.944 (1.153) | 0.999 (29.986) | 0.961 (1.051)
0.5 | 50 | 0.395 (0.447) | 0.818 (0.694) | 0.966 (0.923) | 0.998 (13.925) | 0.941 (0.812)
1 | 100 | 0.634 (0.410) | 0.901 (0.615) | 0.974 (0.933) | 0.999 (7.295) | 0.941 (0.703)
2 | 200 | 0.866 (0.519) | 0.917 (0.476) | 0.969 (0.677) | 0.994 (2.008) | 0.944 (0.636)
3 | 300 | 0.951 (0.636) | 0.929 (0.575) | 0.977 (0.589) | 0.999 (1.528) | 0.950 (0.575)
Table 3. CPs of the (0.99, 0.95) two-sided TIs. The values in parentheses denote the corresponding values of δ_TI.
c | n | DF | Bootstrap | ANSQ | FGPQ | GEVT (L) | GEVT (U)
p = (1/3, 2/3), μ = (0, 1), σ = (1, 1)
0.2 | 20 | 0.017 (1.660) | − | 0.999 (3.030) | 0.999 (42.137) | 0.980 (1.972) | 0.979 (1.989)
0.5 | 50 | 0.089 (1.020) | − | 0.999 (2.180) | 0.997 (29.585) | 0.960 (1.485) | 0.960 (1.503)
1 | 100 | 0.264 (0.743) | 0.850 (1.012) | 0.999 (1.892) | 0.995 (19.828) | 0.954 (1.305) | 0.953 (1.295)
2 | 200 | 0.595 (0.699) | 0.924 (0.813) | 0.996 (1.442) | 0.985 (4.409) | 0.955 (1.176) | 0.955 (1.202)
3 | 300 | 0.802 (0.792) | 0.935 (0.712) | 0.994 (1.253) | 0.991 (1.179) | 0.955 (1.132) | 0.954 (1.138)
p = (1/2, 1/2), μ = (0, 5), σ = (1, 1.5)
0.2 | 20 | 0.017 (2.174) | 0.732 (2.817) | 0.998 (3.791) | 0.996 (34.512) | 0.974 (2.800) | 0.974 (2.686)
0.5 | 50 | 0.089 (1.311) | 0.897 (1.965) | 0.998 (2.756) | 0.970 (6.901) | 0.957 (2.012) | 0.956 (1.895)
1 | 100 | 0.264 (0.948) | 0.968 (1.656) | 0.998 (2.397) | 0.964 (2.615) | 0.951 (1.707) | 0.952 (1.616)
2 | 200 | 0.595 (0.886) | 0.976 (1.210) | 0.998 (1.929) | 0.998 (8.607) | 0.954 (1.544) | 0.953 (1.458)
3 | 300 | 0.802 (1.001) | 0.989 (1.174) | 0.995 (1.579) | 0.993 (2.923) | 0.954 (1.477) | 0.953 (1.415)
p = (1/4, 1/2, 1/4), μ = (0, 1, 2), σ = (1, 1, 1)
0.2 | 20 | 0.017 (1.788) | − | 0.999 (3.197) | 0.999 (107.125) | 0.979 (2.189) | 0.979 (2.172)
0.5 | 50 | 0.089 (1.091) | − | 0.999 (2.309) | 0.999 (89.076) | 0.960 (1.618) | 0.959 (1.611)
1 | 100 | 0.264 (0.789) | 0.723 (1.182) | 0.999 (2.006) | 0.999 (75.350) | 0.953 (1.400) | 0.952 (1.388)
2 | 200 | 0.595 (0.736) | 0.831 (0.979) | 0.997 (1.540) | 0.992 (26.108) | 0.954 (1.247) | 0.954 (1.261)
3 | 300 | 0.802 (0.834) | 0.906 (0.885) | 0.993 (1.301) | 0.992 (6.120) | 0.952 (1.197) | 0.953 (1.192)
p = (1/3, 1/3, 1/3), μ = (0, 3, 7), σ = (1, 1.5, 1)
0.2 | 20 | 0.017 (1.902) | − | 0.992 (3.145) | 0.999 (132.970) | 0.968 (2.520) | 0.968 (2.534)
0.5 | 50 | 0.089 (1.123) | 0.802 (2.935) | 0.994 (2.309) | 0.999 (74.821) | 0.955 (1.694) | 0.954 (1.714)
1 | 100 | 0.264 (0.799) | 0.968 (1.526) | 0.995 (2.014) | 0.997 (41.117) | 0.950 (1.410) | 0.949 (1.417)
2 | 200 | 0.595 (0.742) | 0.988 (1.197) | 0.997 (1.549) | 0.997 (9.577) | 0.953 (1.267) | 0.952 (1.263)
3 | 300 | 0.802 (0.835) | 0.992 (1.019) | 0.997 (1.300) | 0.999 (4.906) | 0.955 (1.196) | 0.955 (1.197)
Table 4. CPs of the (0.99, 0.95) two-sided TIs when the sample is drawn from a single standard normal distribution N(0, 1). The values in parentheses denote the corresponding values of δ_TI.

| c | n | DF | Bootstrap | ANSQ | FGPQ | Modified GEVT |
|---|---|---|---|---|---|---|
| k = 2 | | | | | | |
| 0.2 | 20 | 0.017 (1.540) | — | 0.999 (2.858) | 0.999 (40.083) | 0.976 (1.817) |
| 0.5 | 50 | 0.089 (0.950) | 0.616 (0.732) | 0.999 (2.052) | 0.997 (27.461) | 0.961 (1.387) |
| 1 | 100 | 0.264 (0.695) | 0.944 (1.030) | 0.999 (1.777) | 0.996 (18.751) | 0.955 (1.209) |
| 2 | 200 | 0.595 (0.657) | 0.976 (0.903) | 0.999 (1.362) | 0.989 (4.396) | 0.956 (1.108) |
| 3 | 300 | 0.803 (0.747) | 0.990 (0.781) | 0.999 (1.073) | 0.986 (1.056) | 0.960 (1.069) |
| k = 3 | | | | | | |
| 0.2 | 20 | 0.017 (1.540) | — | 0.999 (2.859) | 0.999 (96.013) | 0.977 (1.811) |
| 0.5 | 50 | 0.089 (0.951) | — | 0.999 (2.051) | 0.999 (80.929) | 0.961 (1.385) |
| 1 | 100 | 0.264 (0.695) | 0.640 (0.653) | 0.999 (1.777) | 0.999 (63.747) | 0.955 (1.208) |
| 2 | 200 | 0.595 (0.658) | 0.998 (1.185) | 0.999 (1.361) | 0.997 (22.098) | 0.956 (1.113) |
| 3 | 300 | 0.803 (0.747) | 0.982 (0.800) | 0.999 (1.072) | 0.995 (5.349) | 0.960 (1.074) |
| k = 4 | | | | | | |
| 0.2 | 20 | 0.017 (1.539) | — | 0.999 (2.860) | 0.999 (149.003) | 0.980 (1.800) |
| 0.5 | 50 | 0.089 (0.951) | — | 0.999 (2.051) | 0.999 (118.327) | 0.962 (1.390) |
| 1 | 100 | 0.264 (0.695) | 0.623 (0.637) | 0.999 (1.776) | 0.999 (93.538) | 0.955 (1.199) |
| 2 | 200 | 0.595 (0.659) | 0.805 (0.590) | 0.999 (1.362) | 0.999 (43.221) | 0.954 (1.112) |
| 3 | 300 | 0.802 (0.746) | 0.975 (0.876) | 0.999 (1.074) | 0.997 (14.977) | 0.957 (1.080) |
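The single-normal benchmark reported in Table 4 can be reproduced in miniature. The sketch below is an illustration, not the authors' implementation: it estimates the coverage probability (CP) of the classical two-sided (0.99, 0.95) normal TI built from Howe's k-factor, using the Wilson-Hilferty approximation to the chi-square quantile so that only the Python standard library is needed.

```python
import random
from statistics import NormalDist, mean, stdev

def chi2_ppf(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(q)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def howe_k(n, p=0.99, gamma=0.95):
    """Howe's approximate k-factor for a two-sided (p, gamma) normal TI."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return z * ((n - 1) * (1 + 1 / n) / chi2_ppf(1 - gamma, n - 1)) ** 0.5

def coverage_probability(n, p=0.99, gamma=0.95, reps=2000, seed=1):
    """Monte Carlo estimate of the CP of xbar +/- k*s for N(0, 1) samples."""
    rng = random.Random(seed)
    nd = NormalDist()
    k, hits = howe_k(n, p, gamma), 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m, s = mean(x), stdev(x)
        # Does the interval contain at least proportion p of the true N(0, 1)?
        content = nd.cdf(m + k * s) - nd.cdf(m - k * s)
        hits += content >= p
    return hits / reps
```

For n = 50 the estimated CP lands close to the nominal confidence level 0.95, which is the yardstick against which the CP columns in the tables above are read.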
Table 5. (0.99, 0.95) one-sided upper tolerance limits based on the PCT data.

| DF | Bootstrap | ANSQ | FGPQ | GEVT |
|---|---|---|---|---|
| 1859.800 | 1801.378 | 1834.512 | 2108.899 | 1869.974 |
Table 6. (0.99, 0.95) one-sided lower tolerance limits based on the PCT data.

| DF | Bootstrap | ANSQ | FGPQ | GEVT |
|---|---|---|---|---|
| 934.300 | 875.842 | 920.093 | 876.949 | 918.817 |
Table 7. (0.99, 0.95) TIs based on the PCT data.

| DF | Bootstrap | ANSQ | FGPQ | GEVT (L) | GEVT (U) |
|---|---|---|---|---|---|
| [934.300, 1859.800] | [852.550, 1823.984] | [885.853, 1892.706] | [779.851, 2146.396] | [883.925, 1844.420] | [826.734, 1800.621] |
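Whichever method produced the limits, the proportion of a fitted mixture normal distribution captured by a reported interval is simply a difference of mixture CDF values. A minimal check, with illustrative two-component parameters (not the fitted PCT model):

```python
from statistics import NormalDist

def mixture_cdf(x, weights, means, sds):
    """CDF of a k-component normal mixture: weighted sum of component CDFs."""
    return sum(w * NormalDist(m, s).cdf(x)
               for w, m, s in zip(weights, means, sds))

def interval_content(lo, hi, weights, means, sds):
    """Proportion of the mixture distribution falling in [lo, hi]."""
    return (mixture_cdf(hi, weights, means, sds)
            - mixture_cdf(lo, weights, means, sds))

# Illustrative two-component mixture (hypothetical parameters)
w, mu, sd = (0.5, 0.5), (0.0, 5.0), (1.0, 1.5)
content = interval_content(-2.6, 8.9, w, mu, sd)  # ≈ 0.995
```

A candidate TI can thus be validated directly: if `content` falls below the nominal proportion 0.99, the interval is too short for the fitted mixture.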
Jiao, J.; Guan, R. Tolerance Interval for the Mixture Normal Distribution Based on Generalized Extreme Value Theory. Mathematics 2024, 12, 1114. https://doi.org/10.3390/math12071114
