Confidence Interval, Prediction Interval and Tolerance Interval for the Skew Normal Distribution: A Pivotal Approach

Qi, Xinlei; Li, Huihui; Tian, Weizhong; Yang, Yaoting

doi:10.3390/sym14050855

¹ The School of Cyberspace Security, Xi’an University of Posts and Telecommunications, Xi’an 710121, China

² The Special Education School of Jinzhong City, Jinzhong 030600, China

³ College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China

⁴ Department of Applied Mathematics, Xi’an University of Technology, Xi’an 710048, China

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(5), 855; https://doi.org/10.3390/sym14050855

Submission received: 22 March 2022 / Revised: 14 April 2022 / Accepted: 14 April 2022 / Published: 21 April 2022

(This article belongs to the Special Issue Skewed (Asymmetrical) Probability Distributions and Applications across Disciplines II)


Abstract

The skew normal distribution, introduced by Azzalini (1985), is an asymmetric extension of the normal distribution that allows for skewness. In this paper, we use the pivotal quantity approach to construct a confidence interval for the mean, a prediction interval for the mean of a future sample, and a tolerance interval for a quantile. The fiducial distribution is also studied. The performance of all the proposed intervals is investigated through Monte Carlo simulation, and the convergence of the intervals as the sample size grows is illustrated with figures. Finally, a real data set is used to illustrate the proposed intervals.

Keywords:

fiducial distribution; confidence interval; prediction interval; tolerance interval; skew normal distribution

1. Introduction

Statistical inference is commonly divided into three types: Bayesian inference, frequentist inference and fiducial inference. Fiducial inference was originally proposed by Fisher [1] to overcome a deficiency of the Bayesian framework when little or no prior information about the parameter is available. From Fisher’s point of view, fiducial inference simply changes the logical status of the parameter. A fiducial confidence interval is a confidence interval based on fiducial theory, which also treats the unknown population parameter as a random variable. Based on the fiducial inference principle, Li et al. [2] illustrated the usefulness of the fiducial method; Wang et al. [3] gave a method for constructing prediction intervals for the normal, exponential, and gamma distributions; Veronese and Melilli [4] developed a simple and direct method to define the fiducial distribution for real exponential families; Krishnamoorthy and Wang [5] obtained fiducial confidence limits and prediction limits for the gamma distribution; Hoang-Nguyen-Thuy and Krishnamoorthy [6] gave a fiducial estimation method for the location-scale family of distributions and analyzed several distributions as examples.

Confidence intervals describe the uncertainty in parameter estimates that arises from sampling error. There are many methods for constructing statistical intervals such as confidence, prediction and tolerance intervals. The pivotal quantity approach is commonly used because it allows exact tests or exact confidence intervals for a parameter to be derived, and many scholars have used it to obtain confidence intervals. For example, Chen [7] adjusted the pivotal quantity used for constructing confidence intervals to improve their performance. Seo et al. [8] provided exact confidence intervals for the unknown parameters and exact prediction intervals for future upper record values in the two-parameter Rayleigh distribution by means of pivotal quantities. Johnson [9] noted that a prediction interval covers a future observation from a random process in repeated sampling, and is typically constructed by identifying a pivotal quantity that is also an ancillary statistic.

In many real-world problems, the data are not symmetric and the assumption of normality is violated. The class of skew normal distributions is an extension of the normal distribution that allows for skewness; see Azzalini [10]. Since then, skew normal distributions have been studied in a number of important areas; see Azzalini [11] for details. Several authors have studied confidence intervals for parameters of the skew normal distribution: Mameli et al. [12] analyzed an approximate large-sample confidence interval for the skewness parameter using Fisher’s transformation; Wang et al. [13] gave three confidence intervals for the location parameter in the skew normal family with known coefficient of variation and skewness; Wang et al. [14] studied the confidence interval for the skewness parameter of the skew normal distribution.

To our knowledge, the confidence interval for the mean, the prediction interval for a future sample mean, the tolerance interval for a quantile, and the fiducial distribution for the skew normal distribution have seldom been studied. In this paper, we use the pivotal approach to construct these intervals for the skew normal distribution. All experiments are implemented in R. The rest of the paper is organized as follows. Some basic properties and pivotal quantities of the skew normal distribution are introduced in Section 2. The confidence interval for the mean of the skew normal distribution is constructed by the pivotal quantity method, and a simulation experiment is carried out, in Section 3. The prediction interval for the future sample mean and the one-sided tolerance limits are studied in Section 4 and Section 5. The fiducial distribution for the probabilities of the skew normal distribution is discussed in Section 6. All proposed intervals are illustrated using a real data set in Section 7. Some conclusions are given in Section 8.

2. Point Estimates and Pivotal Quantities

According to Azzalini [15], the probability density function (pdf) of the skew normal distribution is

$$f(x \mid μ, σ, λ) = \frac{2}{σ}\, ϕ\left(\frac{x - μ}{σ}\right) Φ\left(λ \frac{x - μ}{σ}\right), \tag{1}$$

where $μ$ is the location parameter, $σ$ is the scale parameter, $λ$ is the skewness parameter, and $ϕ$ and $Φ$ are the probability density function and cumulative distribution function of the standard normal distribution, respectively. We denote this distribution by $SN(μ, σ^{2}, λ)$. The effect of $λ$ on the skew normal distribution is shown graphically in Figure 1.
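As an illustration of Equation (1), the density can be evaluated in a few lines of Python using only the standard library. This is a minimal sketch rather than the authors' R code; the helper names `sn_pdf` and `simpson` are hypothetical and introduced only for this example.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf is ϕ, cdf is Φ


def sn_pdf(x, mu=0.0, sigma=1.0, lam=0.0):
    """Density of SN(mu, sigma^2, lam) following Equation (1)."""
    z = (x - mu) / sigma
    return 2.0 / sigma * _STD.pdf(z) * _STD.cdf(lam * z)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Numerical check that the density integrates to about 1 for lam = 2.
area = simpson(lambda x: sn_pdf(x, mu=0.0, sigma=1.0, lam=2.0), -10.0, 10.0)
```

Setting `lam=0.0` recovers the ordinary normal density, which is a quick way to check an implementation.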

The expectation, variance and skewness of $SN(μ, σ^{2}, λ)$ are

$$E(X) = μ + \sqrt{2/π}\, σ δ, \tag{2}$$

$$\begin{aligned} \mathrm{Var}(X) &= \left(1 - \frac{2}{π} δ^{2}\right) σ^{2}, \\ \mathrm{Skewness}(X) &= \frac{4 - π}{2} \frac{\left(\sqrt{2/π}\, δ\right)^{3}}{\left(1 - \frac{2}{π} δ^{2}\right)^{3/2}}, \end{aligned}$$

where $δ = λ / \sqrt{1 + λ^{2}}$.
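The three moment expressions can be checked numerically. The sketch below evaluates them directly from the formulas for the expectation, variance and skewness; the function name `sn_moments` is an assumption made for this example.

```python
import math


def sn_moments(mu, sigma, lam):
    """Return (mean, variance, skewness) of SN(mu, sigma^2, lam)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    m = math.sqrt(2.0 / math.pi) * delta      # shorthand for sqrt(2/pi) * delta
    mean = mu + sigma * m                     # Equation (2)
    var = (1.0 - m * m) * sigma ** 2          # 1 - (2/pi) * delta^2 equals 1 - m^2
    skew = (4.0 - math.pi) / 2.0 * m ** 3 / (1.0 - m * m) ** 1.5
    return mean, var, skew
```

For example, `sn_moments(0.0, 1.0, 0.0)` returns the mean, variance and skewness of the standard normal, since $δ = 0$ when $λ = 0$. The skewness depends on $λ$ only, which is what makes a moment-based estimate of $λ$ possible.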

According to the properties of the skew normal distribution, the skewness of $X$ depends only on $λ$ and not on $μ$ or $σ^{2}$, so $λ$ can be estimated from the sample skewness. Let $X_{1}, X_{2}, \dots, X_{n}$ be a sample from $SN(μ, σ^{2}, λ)$ and let $γ$ denote the sample skewness. Solving the skewness expression above for $λ$ gives

$$\hat{λ} = \frac{\left(\frac{2 γ}{4 - π}\right)^{1/3}}{\sqrt{\frac{2}{π} \left(1 + \left(1 - \frac{π}{2}\right) \left(\frac{2 γ}{4 - π}\right)^{2/3}\right)}} .$$
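The moment inversion can be sketched in Python by solving the skewness expression for $δ$ and then converting $δ$ to $λ$ through $λ = δ / \sqrt{1 - δ^{2}}$. The function names are hypothetical, and the sample skewness below uses the simple divide-by-$n$ moments, which is an assumption because the text does not say which variant is used. The inversion only exists when the implied $|δ|$ is below 1 (sample skewness below roughly 0.995 in absolute value), so the sketch raises an error otherwise.

```python
import math


def sample_skewness(xs):
    """Sample skewness m3 / m2^(3/2) using divide-by-n central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def lambda_from_skewness(gamma):
    """Invert the skewness formula: gamma -> delta -> lambda."""
    # c = (2*gamma / (4 - pi))^(1/3), with the sign of gamma kept explicitly
    # because a negative float raised to 1/3 is not a real number in Python.
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    delta = math.sqrt(math.pi / 2.0) * c / math.sqrt(1.0 + c * c)
    if abs(delta) >= 1.0:
        raise ValueError("sample skewness is outside the range attainable by SN")
    return delta / math.sqrt(1.0 - delta * delta)
```

A round trip (compute the population skewness for a chosen $λ$, then invert it) is a convenient check that the two formulas agree.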

Meanwhile, let $\hat{μ}$ and $\hat{σ}$ be the maximum likelihood estimators (MLEs) of $μ$ and $σ$ based on the sample, which are obtained by solving

$$\begin{aligned} \sum_{i = 1}^{n} \frac{X_{i} - \hat{μ}}{\hat{σ}} - λ \sum_{i = 1}^{n} \frac{ϕ\left(λ \frac{X_{i} - \hat{μ}}{\hat{σ}}\right)}{Φ\left(λ \frac{X_{i} - \hat{μ}}{\hat{σ}}\right)} &= 0, \\ - n + \frac{\sum_{i = 1}^{n} (X_{i} - \hat{μ})^{2}}{\hat{σ}^{2}} - λ \sum_{i = 1}^{n} \frac{ϕ\left(λ \frac{X_{i} - \hat{μ}}{\hat{σ}}\right) \frac{X_{i} - \hat{μ}}{\hat{σ}}}{Φ\left(λ \frac{X_{i} - \hat{μ}}{\hat{σ}}\right)} &= 0 . \end{aligned}$$

More details on the MLEs of $μ$, $σ$ and $λ$ can be found in Figueiredo and Gomes [16].
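To make the two likelihood equations concrete, the sketch below evaluates their left-hand sides for a candidate pair $(\hat{μ}, \hat{σ})$ and a fixed $λ$. It does not solve them; a root finder or the R sn package mentioned in Section 7 would be needed for that. The function name is hypothetical, and the ratio $ϕ/Φ$ is computed naively, so it may lose accuracy for very negative arguments.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf is ϕ, cdf is Φ


def sn_score_equations(xs, mu_hat, sigma_hat, lam):
    """Left-hand sides of the two likelihood equations for a fixed lam."""
    z = [(x - mu_hat) / sigma_hat for x in xs]
    ratio = [_STD.pdf(lam * zi) / _STD.cdf(lam * zi) for zi in z]
    eq_mu = sum(z) - lam * sum(ratio)
    eq_sigma = (-len(xs) + sum(zi * zi for zi in z)
                - lam * sum(r * zi for r, zi in zip(ratio, z)))
    return eq_mu, eq_sigma


# With lam = 0 the equations reduce to the normal case, whose solution is the
# sample mean and the divide-by-n standard deviation.
data = [1.0, 2.0, 4.0, 7.0]
mean = sum(data) / len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
residuals = sn_score_equations(data, mean, sd, lam=0.0)
```

Both residuals are numerically zero in this special case, which confirms that the equations contain the normal MLE as the $λ = 0$ limit.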

From Equation (1), we know that $Z_{i} = (X_{i} - μ)/σ \sim SN(0, 1, λ)$, and

$$\frac{X_{i} - \hat{μ}}{\hat{σ}} = \frac{Z_{i} - (\hat{μ} - μ)/σ}{\hat{σ}/σ} = \frac{Z_{i} - \hat{μ}^{*}}{\hat{σ}^{*}} .$$

According to the arguments of Lawless [17] and Krishnamoorthy et al. [18], we have

$$\frac{\hat{μ} - μ}{σ} \sim \hat{μ}^{*} \quad \text{and} \quad \frac{\hat{σ}}{σ} \sim \hat{σ}^{*}, \tag{3}$$

where the notation $\sim$ means "distributed as", and $\hat{μ}^{*}$ and $\hat{σ}^{*}$ are the corresponding estimators computed from a sample from $SN(0, 1, λ)$, that is, from $SN(μ, σ^{2}, λ)$ with $μ = 0$ and $σ = 1$.
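The pivotal property in Equation (3) rests on the estimators being location-scale equivariant. The sketch below checks this numerically. Two assumptions are made loudly here: moment-based estimators (`moment_fit`) stand in for the MLEs, because solving the likelihood equations needs an optimizer, and samples are drawn with the well-known stochastic representation $Z = δ|U_{0}| + \sqrt{1 - δ^{2}}\, U_{1}$ with independent standard normal $U_{0}, U_{1}$, which does not appear in the paper. All function names are hypothetical.

```python
import math
import random


def sn_sample(n, lam, rng):
    """Draw n values from SN(0, 1, lam) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def moment_fit(xs):
    """Moment-based (mu_hat, sigma_hat); a stand-in for the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    gamma = m3 / m2 ** 1.5
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    m = c / math.sqrt(1.0 + c * c)            # estimate of sqrt(2/pi) * delta
    sigma_hat = math.sqrt(m2 / (1.0 - m * m))  # from Var(X) = (1 - m^2) * sigma^2
    return mean - sigma_hat * m, sigma_hat     # from E(X) = mu + sigma * m


rng = random.Random(1)
z = sn_sample(30, 1.0, rng)                 # standardized sample
mu, sigma = 5.0, 2.0
x = [mu + sigma * zi for zi in z]           # same sample on the original scale
mu_star, sigma_star = moment_fit(z)
mu_hat, sigma_hat = moment_fit(x)
```

Here `(mu_hat - mu) / sigma` equals `mu_star` and `sigma_hat / sigma` equals `sigma_star` up to rounding, so the distributions of these ratios do not depend on $μ$ or $σ$.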

3. Confidence Interval of the Mean by Pivotal Quantity

In this section, we study confidence intervals for the mean of $SN(μ, σ^{2}, λ)$ through the pivotal quantity approach. From Equations (2) and (3), we have

$$\frac{μ + σ δ \sqrt{2/π} - \hat{μ}}{\hat{σ}} = \frac{μ - \hat{μ}}{\hat{σ}} + δ \sqrt{\frac{2}{π}} \frac{σ}{\hat{σ}} = \frac{δ \sqrt{2/π} - \hat{μ}^{*}}{\hat{σ}^{*}} .$$

Let $u_{n} = \frac{\hat{δ} \sqrt{2/π} - \hat{μ}^{*}}{\hat{σ}^{*}}$, and let $u_{n; α}$ denote the $100α$th percentile of $u_{n}$. Then the $100(1 - α)\%$ confidence interval for the mean is

$$(\hat{μ} + u_{n; α/2}\, \hat{σ},\ \hat{μ} + u_{n; 1 - α/2}\, \hat{σ}) .$$
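The percentiles of $u_{n}$ have no closed form and are obtained by simulation. The following Python sketch shows one way such a Monte Carlo computation could look. It is not the authors' R code: moment-based estimators (`moment_fit`) replace the MLEs, $λ$ is treated as given (in practice $\hat{λ}$ would be plugged in), and all helper names are hypothetical, so the numbers it produces are not expected to match Table A1.

```python
import math
import random


def sn_sample(n, lam, rng):
    """Draw n values from SN(0, 1, lam) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def moment_fit(xs):
    """Moment-based (mu_hat, sigma_hat); a stand-in for the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    gamma = m3 / m2 ** 1.5
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    m = c / math.sqrt(1.0 + c * c)
    sigma_hat = math.sqrt(m2 / (1.0 - m * m))
    return mean - sigma_hat * m, sigma_hat


def percentile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def u_percentiles(n, lam, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimates of u_{n; alpha/2} and u_{n; 1 - alpha/2}."""
    rng = random.Random(seed)
    target = math.sqrt(2.0 / math.pi) * lam / math.sqrt(1.0 + lam * lam)
    us = []
    for _ in range(reps):
        mu_star, sigma_star = moment_fit(sn_sample(n, lam, rng))
        us.append((target - mu_star) / sigma_star)
    us.sort()
    return percentile(us, alpha / 2.0), percentile(us, 1.0 - alpha / 2.0)


def mean_ci(xs, lam, alpha=0.05, reps=2000, seed=0):
    """Pivotal confidence interval for E(X) from observed data xs."""
    mu_hat, sigma_hat = moment_fit(xs)
    lo, hi = u_percentiles(len(xs), lam, alpha, reps, seed)
    return mu_hat + lo * sigma_hat, mu_hat + hi * sigma_hat
```

A call such as `mean_ci(data, lam=1.0)` returns the two endpoints. Fixing `seed` makes the simulated percentiles reproducible, and increasing `reps` reduces Monte Carlo noise at the cost of run time.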

Without loss of generality, we choose $μ = 0$ and $σ = 1$, and the values of $λ$ are chosen as −2, 1 and 0.05. The percentiles for calculating $100(1 - α)\%$ confidence intervals for different sample sizes, obtained by Monte Carlo experiments, are given in Table A1. Figure 2 plots the 5%, 95%, 2.5%, 97.5%, 1% and 99% percentiles used to compute the confidence intervals for the mean, based on the results in Table A1.

It is observed from Figure 2 that despite the different confidence levels, the percentages gradually decrease and tend to zero as the sample size increases.

From Table A1, we can see that the distance between the upper and lower percentiles of $u_{n}$ decreases as the sample size increases. To evaluate the performance of the proposed confidence interval constructed by the pivotal quantity approach, we carried out simulation studies with the same values of $λ$ as in Table A1. The coverage probabilities (CP), average lengths (AL), and associated standard deviations (SD) were calculated in R. For each setting, we ran the R code with $N = 10{,}000$ runs to compute confidence intervals. The percentage of these 10,000 confidence intervals that include the true mean is an estimate of the CP. The AL and SD are estimated similarly. The results for sample sizes $n$ ranging from 5 to 100 and different values of $α$ are displayed in Table A2 in the Appendix. The values of AL in Table A2 are displayed in Figure 3.

From Table A2, we can see that all the CPs are close to the corresponding nominal confidence levels. As the sample size increases, both the average length and the standard deviation of the interval decrease. Figure 3 illustrates this conclusion visually and shows the convergence in $n$ for the different values of $λ$.

4. Prediction Intervals for the Mean of a Future Sample

A prediction interval is a statistical interval that contains a future random variable with a specified probability; it estimates the range of future observations from past or present samples. Hahn [19] and Kaminsky [20] discussed prediction intervals for the normal distribution and the exponential distribution, respectively. In the following, we aim to find a prediction interval for the mean of a future sample of size $m$ from $SN(μ, σ^{2}, λ)$.

Let $\bar{Y}$ denote the mean of a future sample of size $m$ from $SN(μ, σ^{2}, λ)$. To find a prediction interval for $\bar{Y}$, we define the quantity

$$w_{n} = \frac{\bar{Y} - \hat{μ}}{\hat{σ}} \sim \frac{\bar{Y}^{*} - \hat{μ}^{*}}{\hat{σ}^{*}},$$

where $\bar{Y}^{*}$ is the mean of a sample of size $m$ from $SN(0, 1, \hat{λ})$. Therefore, the $100(1 - α)\%$ prediction interval for a future sample mean $\bar{Y}$ is

$$(\hat{μ} + P_{α/2}\, \hat{σ},\ \hat{μ} + P_{1 - α/2}\, \hat{σ}),$$

where $P_{α}$ is the $100α$th percentile of $w_{n}$. In practice, future data are often difficult to collect because of various factors. We therefore only consider the case where $m$ does not exceed $n$; the percentiles for $95\%$ prediction intervals obtained by Monte Carlo simulation are given in Table A3.
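The percentiles $P_{α}$ of $w_{n}$ can be simulated in the same way as those of $u_{n}$, with one extra step: each replication also draws an independent future sample of size $m$. The Python sketch below illustrates this under the same assumptions as before (moment-based estimators in place of the MLEs, hypothetical helper names, $λ$ treated as given), so its output is not expected to reproduce Table A3.

```python
import math
import random


def sn_sample(n, lam, rng):
    """Draw n values from SN(0, 1, lam) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def moment_fit(xs):
    """Moment-based (mu_hat, sigma_hat); a stand-in for the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    gamma = m3 / m2 ** 1.5
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    m = c / math.sqrt(1.0 + c * c)
    sigma_hat = math.sqrt(m2 / (1.0 - m * m))
    return mean - sigma_hat * m, sigma_hat


def percentile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def w_percentiles(n, m, lam, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimates of P_{alpha/2} and P_{1 - alpha/2} for w_n."""
    rng = random.Random(seed)
    ws = []
    for _ in range(reps):
        mu_star, sigma_star = moment_fit(sn_sample(n, lam, rng))  # current sample
        y_bar = sum(sn_sample(m, lam, rng)) / m                   # future sample mean
        ws.append((y_bar - mu_star) / sigma_star)
    ws.sort()
    return percentile(ws, alpha / 2.0), percentile(ws, 1.0 - alpha / 2.0)


def prediction_interval(mu_hat, sigma_hat, n, m, lam, alpha=0.05, reps=2000, seed=0):
    """Prediction interval for the mean of a future sample of size m."""
    lo, hi = w_percentiles(n, m, lam, alpha, reps, seed)
    return mu_hat + lo * sigma_hat, mu_hat + hi * sigma_hat
```

Because a larger future sample has a less variable mean, the simulated interval should narrow as `m` grows, in line with the behaviour reported for Table A3.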

The values of width in Table A3 are plotted in Figure 4.

As can be seen from Table A3, the length of the prediction interval decreases as $m$ and $n$ increase. Figure 4 illustrates this conclusion visually. For example, when the current sample size is 20 and the future sample size is 5, the $95\%$ prediction interval for the mean of the future sample is $(\hat{μ} - 0.2918 \hat{σ}, \hat{μ} + 1.5078 \hat{σ})$, where $\hat{μ}$ and $\hat{σ}$ are estimated from the current sample of size 20.

5. One-Sided Tolerance Interval Limits

In many practical applications, such as medicine, environmental science and engineering, one wishes to find an interval, based on a sample, that captures at least a proportion $p$ of the sampled population with confidence $ν$. Such a statistical interval is called a tolerance interval, more precisely a $p$ content-$ν$ coverage tolerance interval, or $(p, ν)$ tolerance interval for short. Proschan [21] studied tolerance intervals for the normal distribution. Krishnamoorthy et al. [18] discussed prediction and tolerance intervals for the two-parameter Rayleigh distribution. Hoang-Nguyen-Thuy et al. [22] gave a method for computing tolerance intervals for the location-scale family of distributions. Since many real data sets show some skewness relative to the normal distribution, it is worthwhile to study tolerance intervals for the skew normal distribution.

Let $0.5 \leq p \leq 1$ and let $Q_{p}(μ, σ, λ)$ be the $100p$th percentile of the distribution $SN(μ, σ^{2}, λ)$. Then

$$\frac{Q_{p}(μ, σ, λ) - \hat{μ}}{\hat{σ}} = \frac{(Q_{p}(μ, σ, λ) - μ)/σ - (\hat{μ} - μ)/σ}{\hat{σ}/σ} \sim \frac{Q_{p}(0, 1, \hat{λ}) - \hat{μ}^{*}}{\hat{σ}^{*}} = q_{p},$$

where $q_{p}$ can be used to set a confidence bound on $Q_{p}(μ, σ, λ)$. If $q_{p, ν}$ is the $100ν$th percentile of $q_{p}$, the lower and upper one-sided tolerance limits are

$$(\hat{μ} + q_{1 - p, 1 - ν}\, \hat{σ},\ \hat{μ} + q_{p, ν}\, \hat{σ}) .$$
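A tolerance factor $q_{p, ν}$ combines two numerical steps: finding the quantile $Q_{p}(0, 1, λ)$ of the standard skew normal, and simulating the distribution of $q_{p}$. The Python sketch below is one possible arrangement, not the authors' procedure. The quantile is found by bisection on a Simpson-rule cdf, moment-based estimators replace the MLEs, and all helper names are hypothetical, so the results are not expected to match Table A4.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def sn_cdf(x, lam, lower=-10.0, steps=1000):
    """CDF of SN(0, 1, lam) by composite Simpson integration of the density."""
    if x <= lower:
        return 0.0
    pdf = lambda z: 2.0 * _STD.pdf(z) * _STD.cdf(lam * z)
    h = (x - lower) / steps
    total = pdf(lower) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(lower + i * h)
    return total * h / 3.0


def sn_quantile(p, lam, iters=50):
    """Q_p(0, 1, lam) by bisection on sn_cdf over [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sn_sample(n, lam, rng):
    """Draw n values from SN(0, 1, lam) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def moment_fit(xs):
    """Moment-based (mu_hat, sigma_hat); a stand-in for the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    gamma = m3 / m2 ** 1.5
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    m = c / math.sqrt(1.0 + c * c)
    sigma_hat = math.sqrt(m2 / (1.0 - m * m))
    return mean - sigma_hat * m, sigma_hat


def tolerance_factor(n, p, nu, lam, reps=1000, seed=0):
    """Monte Carlo estimate of q_{p, nu}: the 100*nu-th percentile of q_p."""
    rng = random.Random(seed)
    q_std = sn_quantile(p, lam)
    vals = []
    for _ in range(reps):
        mu_star, sigma_star = moment_fit(sn_sample(n, lam, rng))
        vals.append((q_std - mu_star) / sigma_star)
    vals.sort()
    pos = nu * (reps - 1)
    k = int(math.floor(pos))
    nxt = min(k + 1, reps - 1)
    return vals[k] + (pos - k) * (vals[nxt] - vals[k])
```

The upper limit is then `mu_hat + tolerance_factor(n, p, nu, lam) * sigma_hat` and the lower limit uses `tolerance_factor(n, 1 - p, 1 - nu, lam)`. Bisection was chosen here for simplicity; any root finder applied to the cdf would serve equally well.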

In the following, we calculate $q_{p, ν}$ and $q_{1 - p, 1 - ν}$ for $ν = 0.95$ with different sample sizes $n$ and values of $p$. The simulations are based on $SN(0, 1, 1)$, and the results are shown in Table A4. The values of $q_{p, ν}$ and $q_{1 - p, 1 - ν}$ are plotted in Figure 5.

From Table A4 and Figure 5, we can see that, as the sample size increases, the lower one-sided tolerance limit increases, the upper one-sided tolerance limit decreases, and the distance between them shrinks. In other words, the larger the sample size, the narrower and more precise the intervals.

Table A5 reports the CP and AL of the tolerance intervals for several sample sizes. The simulation is repeated 10,000 times to obtain the SD of the interval lengths. The values of AL in Table A5 are displayed in Figure 6.

From Table A5, we can see that the coverage of the tolerance intervals reaches $95\%$, and that the AL and SD decrease as the sample size increases.

6. Fiducial Distribution of Skew Normal Distribution

In practical applications, it is often necessary to know the probability that an observation exceeds a certain critical value. For example, when analyzing survival data, we need the probability that a patient’s survival time after illness is greater than a certain value $t$, that is, $P(X > t)$. Similarly, in mechanical manufacturing, it is necessary to know the probability that manufactured parts fall within the tolerance range. Such probabilities can be studied using fiducial inference. Krishnamoorthy [23] has shown that the fiducial method is a useful tool, with good frequentist properties, for many complex problems. The application of the fiducial method to specific distributions has also been studied by many scholars. O’Reilly [24] studied the fiducial distribution for the exponential distribution, and Hoang-Nguyen-Thuy and Krishnamoorthy [6] obtained the fiducial distribution for the location-scale family of distributions. In this section, we study the fiducial distribution for the probabilities of $SN(μ, σ^{2}, λ)$.

Given $X \sim SN(μ, σ^{2}, λ)$ and $t \in ℝ$, let $P_{t} = P(X \leq t \mid μ, σ, λ) = F_{X}(t \mid μ, σ, λ)$ be the cumulative distribution function (cdf) of $SN(μ, σ^{2}, λ)$ evaluated at $t$. Consider testing

$$H_{0} : P_{t} = P_{0} \quad \text{vs.} \quad H_{a} : P_{t} > P_{0},$$

where $P_{0}$ is a specified value in $(0, 1)$. These hypotheses are equivalent to

$$H_{0} : P\left(\frac{X - μ}{σ} \leq \frac{t - μ}{σ}\right) = F_{z}\left(\frac{t - μ}{σ} \mid 0, 1, λ\right) = P_{0} \Rightarrow t = μ + σ F^{-1}(P_{0} \mid 0, 1, λ),$$

and

$$H_{a} : t > μ + σ F^{-1}(P_{0} \mid 0, 1, λ) .$$

For a given level $α$ and observed value $(\hat{μ}_{0}, \hat{σ}_{0})$ of $(\hat{μ}, \hat{σ})$, $H_{0}$ is rejected if

$$P\left(\frac{F^{-1}(P_{0} \mid 0, 1, λ) - \hat{μ}^{*}}{\hat{σ}^{*}} < \frac{t - \hat{μ}_{0}}{\hat{σ}_{0}}\right) < α \Leftrightarrow P\left(P_{0} < F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{\hat{σ}_{0}} (t - \hat{μ}_{0})\right)\right) < α,$$

where $F^{*}$ is the cdf of $SN(0, 1, λ)$.

Therefore, the fiducial distribution of $F_{X}(t \mid μ, σ, λ)$ is given by

$$Q_{P_{t}} = F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{\hat{σ}_{0}} (t - \hat{μ}_{0})\right),$$

and the $100(1 - α)\%$ fiducial confidence interval for $P_{t}$ is formed by the lower and upper $100α/2$ percentiles of $Q_{P_{t}}$.

Furthermore, let $P_{LU} = P(L \leq X \leq U) = P(X \leq U) - P(X \leq L)$ and replace the parameters with their fiducial quantities. We obtain a fiducial quantity for $P_{LU}$ as

$$Q_{P_{LU}} = Q_{P_{U}} - Q_{P_{L}} = F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{\hat{σ}_{0}} (U - \hat{μ}_{0})\right) - F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{\hat{σ}_{0}} (L - \hat{μ}_{0})\right) . \tag{4}$$

Therefore, the $100(1 - α)\%$ confidence interval for $P_{LU}$ is $(Q_{P_{LU}; α/2}, Q_{P_{LU}; 1 - α/2})$, where $Q_{P_{LU}; α}$ is the $100α$th percentile of $Q_{P_{LU}}$.
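Equation (4) suggests a direct simulation: draw many pairs $(\hat{μ}^{*}, \hat{σ}^{*})$ from standardized samples, evaluate $Q_{P_{LU}}$ for each, and take percentiles. The Python sketch below follows that outline under stated assumptions: moment-based estimators replace the MLEs, the difference of the two $F^{*}$ values is computed as a single Simpson integral of the density, and every helper name is hypothetical. It is an illustration of the idea, not a reproduction of the authors' computations.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def sn_prob_between(a, b, lam, steps=200):
    """P(a <= Z <= b) for Z ~ SN(0, 1, lam) by Simpson integration (a <= b)."""
    pdf = lambda z: 2.0 * _STD.pdf(z) * _STD.cdf(lam * z)
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    return total * h / 3.0


def sn_sample(n, lam, rng):
    """Draw n values from SN(0, 1, lam) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(1.0 - delta * delta)
    return [delta * abs(rng.gauss(0.0, 1.0)) + s * rng.gauss(0.0, 1.0)
            for _ in range(n)]


def moment_fit(xs):
    """Moment-based (mu_hat, sigma_hat); a stand-in for the MLE."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    gamma = m3 / m2 ** 1.5
    c = math.copysign((2.0 * abs(gamma) / (4.0 - math.pi)) ** (1.0 / 3.0), gamma)
    m = c / math.sqrt(1.0 + c * c)
    sigma_hat = math.sqrt(m2 / (1.0 - m * m))
    return mean - sigma_hat * m, sigma_hat


def percentile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def fiducial_interval(mu0, sigma0, n, lam, L, U, alpha=0.05, reps=500, seed=0):
    """Percentile interval for Q_{P_LU} in Equation (4), given observed (mu0, sigma0)."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        mu_star, sigma_star = moment_fit(sn_sample(n, lam, rng))
        a = mu_star + sigma_star / sigma0 * (L - mu0)   # argument of F* at L
        b = mu_star + sigma_star / sigma0 * (U - mu0)   # argument of F* at U
        qs.append(sn_prob_between(a, b, lam))
    qs.sort()
    return percentile(qs, alpha / 2.0), percentile(qs, 1.0 - alpha / 2.0)
```

Integrating the density once between the two arguments avoids evaluating $F^{*}$ twice per replication, which keeps the pure-Python simulation reasonably fast.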

To verify the validity of the constructed fiducial confidence interval, the following simulation is performed. Let $α_{l}$ and $α_{r}$ denote the left- and right-tail error probabilities, such that $α_{l} + α_{r} = α$. We choose $α_{l} = α_{r} = α/2$ in this study. Table 1 shows the CP and AL of the fiducial confidence intervals obtained by Monte Carlo simulation. The SD of the interval length was obtained by repeating the simulation 10,000 times. Without loss of generality, the simulations are based on $SN(0, 1, 1)$.

As can be seen from the simulation results in Table 1, the coverage rate increases, and the AL and SD decrease, as the sample size increases.

7. Application

Corn seed quality is an important determinant of corn yield, and seeds are easily damaged mechanically during threshing. Mancera-Rico et al. [25] conducted an experiment to measure the mechanical damage suffered by maize seeds, in which seeds with different moisture levels and endosperm types were compressed until rupture occurred. We choose one of the variables, strain, as described by Mancera-Rico et al. [25]. The data set, presented in Table 2, contains 90 observations of strain (mm) measured on maize seeds containing flour endosperm and $8\%$ water.

Using the sn package in R, the maximum likelihood estimates of the parameters of the skew normal distribution are $\hat{μ} = 0.1814$, $\hat{σ} = 0.1020$, and $\hat{λ} = 1.1901$. Based on the equations in Section 2, the corresponding estimates are $\hat{μ} = 0.1531$, $\hat{σ} = 0.1229$, and $\hat{λ} = 2.5025$. We use these data to illustrate the proposed methods.

Furthermore, the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) goodness-of-fit statistics, together with their p-values (pval), are reported in Table 3. The K-S statistic based on the MLEs ($\hat{μ} = 0.1814$, $\hat{σ} = 0.1020$, and $\hat{λ} = 1.1901$) is 0.0531, with a p-value of 0.9613. The K-S statistic based on our method ($\hat{μ} = 0.1531$, $\hat{σ} = 0.1229$, and $\hat{λ} = 2.5025$) is 0.0424, with a p-value of 0.9969. Therefore, the skew normal distribution fits the data set reasonably well. The fitted probability density curves from the two methods are displayed in Figure 7.

Next, we use the strain data in Table 2 to construct the proposed statistical intervals. The $95\%$ confidence interval for the mean strain (MCI) of corn, the $95\%$ prediction interval for the mean strain (MPI) in a future sample of size $m = 20$, and the one-sided tolerance limits (TL) with $p = 0.975$, $ν = 0.95$ are given in Table 4.

From Table 4, the $95\%$ confidence interval for the mean strain of corn is (0.2276, 0.2621). In other words, we are 95% confident that the average strain of corn seed after extrusion lies between 0.2276 mm and 0.2621 mm. The $95\%$ prediction interval for the mean of a future sample of size 20 is (0.2320, 0.2665); that is, with $95\%$ confidence the average strain of 20 future corn seeds will lie between 0.2320 mm and 0.2665 mm. In addition, the upper and lower one-sided tolerance limits for the strain of the corn seed are 0.4989 mm and 0.0688 mm, respectively. This means that, with $95\%$ confidence, at least $97.5\%$ of corn seeds have a strain of at most 0.4989 mm.

Finally, we study the probability that the strain of a corn seed falls in a given range, for example $L = 0.1$, $U = 0.45$. Based on Equation (4), we have

$$Q_{P_{LU}} = F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{0.1229} (0.45 - 0.1531)\right) - F^{*}\left(\hat{μ}^{*} + \frac{\hat{σ}^{*}}{0.1229} (0.1 - 0.1531)\right),$$

where $\hat{μ}^{*}$ and $\hat{σ}^{*}$ are the estimates based on samples from $SN(0, 1, \hat{λ})$. Following the procedure described in Section 6, the lower and upper 2.5% percentiles of $Q_{P_{LU}}$ are calculated as 0.5833 and 0.7263. Thus, the 95% confidence interval for $P(0.1 \leq X \leq 0.45)$ is (0.5833, 0.7263), which means that, with 95% confidence, between 58.33% and 72.63% of corn seeds have a strain between 0.1 mm and 0.45 mm.

8. Concluding Remarks

In this paper, we propose the confidence interval for the mean, the prediction interval for the future sample mean, and the one-sided tolerance limits for the skew normal distribution based on the pivotal quantity approach. We show that the skewness parameter $λ$ can be estimated without depending on $μ$ and $σ$, and obtain a method for estimating $λ$ that is simpler than the traditional MLE method. Monte Carlo simulation experiments are carried out for all the obtained intervals. The simulation experiments show that the CPs of the confidence intervals reach the corresponding confidence levels. Moreover, the average lengths and standard deviations of the intervals decrease as the sample size increases, and the lengths of the prediction intervals decrease as $m$ and $n$ increase. In addition, we study the fiducial distribution for the skew normal distribution, and the pivotal approach provides a useful way to study the mean of one sample. Finally, we apply the proposed methods to real data, which shows that they can provide effective and useful information. They can be used as an extension of traditional methods to better solve specific problems in practice.

The proposed estimation method has some limitations, especially as $λ \to 0$, which are also discussed by Azzalini [11]. In the future, estimation methods for the case where $λ$ is close to 0 can be studied. Meanwhile, we will continue to study these intervals with the fiducial approach and compare the skew normal distribution with other skewed distributions, such as the lognormal, skew-t, and skew-Cauchy distributions. Furthermore, the three intervals will be constructed with the pivotal quantity approach for the skew slash distribution, proposed by Tian et al. [26], to enrich the research on asymmetric data.

Author Contributions

X.Q. and W.T.: Conceptualization, Methodology, Validation, Investigation, Resources, Supervision, Project Administration, Visualization, Writing review and editing; X.Q., H.L. and Y.Y.: Software, Formal analysis, Data curation, Writing—original draft preparation, Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets are provided in the paper.

Acknowledgments

The authors would like to thank the Editor and two anonymous referees for their careful reading of this article and for their constructive suggestions, which considerably improved this article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Percentiles for computing 95%, 97.5% and 99% confidence intervals for the mean.

$λ$	n	5%	95%	2.5%	97.5%	1%	99%
−2	5	−0.2777	1.3049	−0.5259	1.6273	−0.7992	2.0208
	6	−0.1666	1.2251	−0.4185	1.4106	−0.6797	1.6698
	7	−0.1112	1.1800	−0.2915	1.3090	−0.4953	1.5542
	9	−0.0402	1.0680	−0.2290	1.1891	−0.3581	1.3687
	10	−0.0283	1.0308	−0.1638	1.1597	−0.3359	1.3243
	12	−0.0042	1.0027	−0.1067	1.0969	−0.2926	1.2365
	15	0.0471	0.9596	−0.0848	1.0461	−0.1862	1.1543
	20	0.0817	0.9283	−0.0183	0.9984	−0.1426	1.0830
	50	0.1788	0.8385	0.0835	0.8913	−0.0003	0.9053
	100	0.2692	0.8006	0.1717	0.8280	0.0830	0.8649
1	5	−0.2847	1.4305	−0.5385	1.5272	−1.0293	1.9166
	6	−0.1651	1.2116	−0.3537	1.4457	−0.6680	1.7314
	7	−0.1050	1.1774	−0.2779	1.3189	−0.5299	1.5005
	9	−0.0512	1.0820	−0.2341	1.1842	−0.3831	1.3218
	10	−0.0523	1.0577	−0.1895	1.1975	−0.3626	1.2757
	12	−0.0027	1.0100	−0.1295	1.0892	−0.2780	1.2587
	15	0.0298	0.9377	−0.1026	1.0603	−0.2309	1.1159
	20	0.0471	0.9030	−0.0556	0.9701	−0.2023	1.0582
	50	0.1046	0.7837	0.0257	0.8274	−0.0590	0.8659
	100	0.1055	0.6996	0.0303	0.7351	−0.0200	0.7846
0.05	5	−0.2777	1.3786	−0.5216	1.5586	−0.8433	1.8587
	6	−0.1799	1.1889	−0.3461	1.3726	−0.6821	1.5816
	7	−0.1107	1.1825	−0.3248	1.2880	−0.5636	1.5984
	9	−0.0600	1.0671	−0.2021	1.2250	−0.3798	1.3478
	10	−0.0461	1.0461	−0.1732	1.1317	−0.3801	1.2893
	12	−0.0011	0.9813	−0.1251	1.1021	−0.3206	1.1831
	15	0.0290	0.9477	−0.0847	1.0221	−0.1811	1.1478
	20	0.0303	0.8857	−0.0266	0.9575	−0.1903	1.0357
	50	0.0800	0.7543	0.0233	0.8203	−0.0663	0.8512
	100	0.0996	0.6764	0.0352	0.7210	−0.0262	0.7573

Table A2. CP and AL(SD) of 95%, 97.5% and 99% confidence intervals for mean.

$λ$	n	95% (CP)	AL (SD)	97.5% (CP)	AL (SD)	99% (CP)	AL (SD)
2	5	0.9463	1.7926 (0.7109)	0.9723	2.2516 (0.8903)	0.9851	3.2492 (1.3394)
	6	0.9567	1.6827 (0.5977)	0.9698	1.7316 (0.6173)	0.9859	2.5215 (0.9335)
	10	0.9566	1.1255 (0.3108)	0.9764	1.3098 (0.3771)	0.9954	1.7042 (0.4815)
	20	0.9539	0.8409 (0.1828)	0.9866	1.0676 (0.2354)	0.9912	1.0727 (0.2448)
	50	0.9647	0.6023 (0.1072)	0.9809	0.6876 (0.1182)	0.9927	0.8605 (0.1497)
	100	0.9579	0.5048 (0.0704)	0.9764	0.5870 (0.0828)	0.9908	0.6809 (0.0915)
1	5	0.9592	2.0615 (0.7752)	0.9727	2.5247 (1.0354)	0.9966	3.7686 (1.5299)
	6	0.9599	1.8600 (0.6249)	0.9753	2.2248 (0.7519)	0.9955	2.8911 (0.9883)
	10	0.9506	1.3376 (0.3507)	0.9824	1.5818 (0.4343)	0.9967	2.2160 (0.6313)
	20	0.9688	0.9907 (0.1969)	0.9781	1.1634 (0.2364)	0.9969	1.4784 (0.2986)
	50	0.9556	0.7147 (0.1096)	0.9748	0.7930 (0.1201)	0.9920	1.0239 (0.1566)
	100	0.9610	0.6266 (0.0839)	0.9780	0.6732 (0.0832)	0.9940	0.8248 (0.1056)
0.05	5	0.9542	2.6378 (1.0244)	0.9807	3.5027 (1.3501)	0.9919	4.6244 (1.7167)
	6	0.9695	2.2843 (0.7763)	0.9818	2.4948 (0.8453)	0.9921	3.8161 (1.3642)
	10	0.9636	1.6428 (0.4366)	0.9791	1.9899 (0.5264)	0.9934	2.4142 (0.6480)
	20	0.9589	1.2151 (0.2390)	0.9885	1.3537 (0.2675)	0.9958	1.6762 (0.3292)
	50	0.9527	0.8795 (0.1307)	0.9748	0.9390 (0.1374)	0.9910	1.2342 (0.1876)
	100	0.9680	0.7027 (0.0820)	0.9750	0.7964 (0.0943)	0.9950	0.9969 (0.1186)

Table A3. Lower and upper percentiles for computing 95% PI for the mean of the future sample.

$n = 20$				$n = 30$				$n = 50$
m	L	U	width	m	L	U	width	m	L	U	width
1	−1.1182	2.4937	3.6119	1	−0.9588	2.2933	3.2521	1	−0.9789	2.2230	3.2019
2	−0.5269	1.9618	2.4887	3	−0.3875	1.5591	1.9466	2	−0.5727	1.7031	2.2758
3	−0.4154	1.6516	2.0670	5	−0.1842	1.5152	1.6994	4	−0.2217	1.4219	1.6436
4	−0.3923	1.5325	1.9248	7	−0.1468	1.3345	1.4813	6	−0.0946	1.3541	1.4487
5	−0.2918	1.5078	1.7996	9	−0.0957	1.2806	1.3763	10	−0.0282	1.1481	1.1763
6	−0.2443	1.4077	1.6520	11	−0.0863	1.1941	1.2804	14	0.0093	1.1166	1.1073
7	−0.1653	1.3786	1.5439	13	−0.0271	1.1504	1.1775	20	0.0318	1.0695	1.0377
8	−0.1708	1.3004	1.4712	15	−0.0005	1.1578	1.1573	25	0.0738	1.0057	0.9319
9	−0.0659	1.2857	1.3516	17	0.0220	1.1515	1.1295	30	0.1286	1.0086	0.8800
10	−0.0717	1.2663	1.3380	20	0.0816	1.1025	1.0209	50	0.1320	0.9827	0.8507

Table A4. $q_{p, ν}$ and $q_{1 - p, 1 - ν}$ for computing one-sided tolerance limits with $ν = 0.95$.

	$q_{p, ν}$					$q_{1 - p, 1 - ν}$
n/p	0.70	0.90	0.95	0.975	0.99	0.70	0.90	0.95	0.975	0.99
5	−0.8757	−2.1135	2.9571	−4.1657	−5.5048	2.5910	4.2375	5.2984	6.2965	7.7219
6	−0.7468	−1.7410	−2.3376	−3.4057	−4.7283	2.2527	3.6149	4.6216	5.5571	6.5805
7	−0.4968	−1.5049	−2.1431	−3.2633	−3.9556	1.9949	3.3971	3.9997	5.1313	6.0606
8	−0.4793	−1.3428	−1.8418	−2.8103	−3.3364	1.8741	3.1053	3.6972	4.8000	5.3883
9	−0.3614	−1.2859	−1.7308	−2.6282	−3.3015	1.8550	2.9528	3.4950	4.6561	5.1384
10	−0.4069	−1.1523	−1.5725	−2.5213	−2.9053	1.8369	2.7935	3.4770	4.4894	4.8346
15	−0.2218	−0.9547	−1.4110	−2.1069	−2.6046	1.5904	2.5173	2.9389	3.8450	4.4227
20	−0.1727	−0.8567	−1.2687	−1.9871	−2.5127	1.4790	2.3626	2.7473	3.5745	4.0068
30	−0.1052	−0.7806	−1.1493	−1.7982	−2.2157	1.3622	2.1937	2.5913	3.3006	3.7451
50	−0.0481	−0.6849	−1.0436	−1.7050	−2.0744	1.2597	2.0053	2.3996	3.0694	3.4490
75	−0.0342	−0.6817	−1.0312	−1.6251	−1.9794	1.2089	1.9347	2.3021	2.9522	3.3199
100	−0.0162	−0.6757	−0.9957	−1.6056	−1.9747	1.1721	1.8877	2.2488	2.8879	3.2148
200	−0.0016	−0.6560	−0.9786	−1.5643	−1.9111	1.1045	1.7966	2.1520	2.7470	3.0836

Table A5. CP and AL(SD) of tolerance confidence intervals.

	$p = 0.95$		$p = 0.975$		$p = 0.99$
n	CP	AL (SD)	CP	AL (SD)	CP	AL (SD)
5	0.9580	7.7085 (2.8979)	0.9622	9.8706 (3.7216)	0.9632	12.2723 (4.6524)
6	0.9592	6.6142 (2.2054)	0.9610	8.5191 (2.8111)	0.9646	10.7338 (3.6711)
7	0.9555	5.8302 (1.8119)	0.9604	8.0063 (2.4752)	0.9626	9.5267 (2.9277)
10	0.9568	4.8681 (1.1886)	0.9620	6.8025 (1.7521)	0.9601	7.4704 (1.8516)
20	0.9555	3.9673 (0.6805)	0.9592	5.4877 (0.9281)	0.9593	6.3928 (1.0884)
50	0.9536	3.4139 (0.3558)	0.9543	4.7411 (0.5072)	0.9586	5.4889 (0.5626)
200	0.9527	3.1305 (0.1613)	0.9540	4.3032 (0.2228)	0.9560	4.9850 (0.2567)


Figure 1. PDF graph for different values of λ.

Figure 2. Percentiles for computing 95%, 97.5% and 99% confidence intervals for the mean.

Figure 3. Percentiles for computing 95%, 97.5% and 99% confidence intervals for the mean.

Figure 4. The values of width for computing 95% PI for the mean of the future sample.

Figure 5. $q_{p, ν}$ and $q_{1 - p, 1 - ν}$ for computing one-sided tolerance limits with $ν = 0.95$.

Figure 6. Percentiles for computing 95%, 97.5% and 99% confidence intervals for the mean.

Figure 7. Histogram and PDF fit.

Table 1. CP and AL (SD) of 95% fiducial confidence intervals for $P(L < X < U)$.

		$n = 10$		$n = 20$		$n = 30$		$n = 50$
L	U	CP	AL (SD)	CP	AL (SD)	CP	AL (SD)	CP	AL (SD)
−0.2	0.2	0.9412	0.1940 (0.0088)	0.9530	0.1499 (0.0050)	0.9559	0.1292 (0.0045)	0.9564	0.1060 (0.0028)
−0.5	0.5	0.9477	0.4542 (0.0152)	0.9558	0.3450 (0.0105)	0.9514	0.2896 (0.0076)	0.9556	0.2396 (0.0075)
−1	1	0.9434	0.6148 (0.0136)	0.9504	0.4893 (0.0124)	0.9528	0.4153 (0.0111)	0.9567	0.3519 (0.0103)
−2	2	0.9471	0.4530 (0.0172)	0.9485	0.3409 (0.0133)	0.9507	0.2684 (0.0118)	0.9537	0.1657 (0.0089)
−0.2	0.4	0.9425	0.2877 (0.0117)	0.9530	0.2329 (0.0089)	0.9549	0.1996 (0.0059)	0.9567	0.1657 (0.0050)
−0.3	0.5	0.9471	0.4098 (0.0139)	0.9506	0.2935 (0.0098)	0.9454	0.2578 (0.0070)	0.9658	0.2238 (0.0060)
−1.5	0.8	0.9477	0.5718 (0.0166)	0.9521	0.4158 (0.0140)	0.9542	0.3593 (0.0103)	0.9585	0.2842 (0.0086)
−2.3	1	0.9422	0.4613 (0.0138)	0.9446	0.3344 (0.0102)	0.9506	0.2616 (0.0079)	0.9545	0.2159 (0.0069)
0.1	0.3	0.9427	0.1151 (0.0042)	0.9500	0.0824 (0.0031)	0.9556	0.0656 (0.0024)	0.9587	0.0604 (0.0017)
0.5	1.5	0.9446	0.3757 (0.0189)	0.9545	0.2969 (0.0109)	0.9528	0.2502 (0.0093)	0.9546	0.2253 (0.0073)
0.6	3	0.9483	0.4716 (0.0165)	0.9545	0.3622 (0.0144)	0.9559	0.3096 (0.0125)	0.9577	0.2752 (0.0095)
−0.4	−0.1	0.9515	0.1332 (0.0056)	0.9547	0.1016 (0.0031)	0.9522	0.0808 (0.0025)	0.9557	0.0637 (0.0021)
−2	−1	0.9482	0.2810 (0.0047)	0.9503	0.2345 (0.0052)	0.9549	0.2241 (0.0045)	0.9555	0.2031 (0.0036)
−2.5	−0.3	0.9460	0.5189 (0.0153)	0.9496	0.4297 (0.0133)	0.9569	0.3678 (0.0117)	0.9596	0.3521 (0.0103)

Table 2. Strain of maize seeds.

0.293	0.274	0.280	0.262	0.270	0.284	0.177	0.463	0.257	0.212	0.192	0.336	0.220
0.371	0.208	0.287	0.343	0.261	0.246	0.276	0.262	0.269	0.226	0.331	0.267	0.231
0.329	0.246	0.465	0.168	0.164	0.170	0.193	0.270	0.242	0.369	0.242	0.206	0.227
0.226	0.307	0.325	0.166	0.118	0.145	0.225	0.210	0.130	0.103	0.232	0.257	0.099
0.249	0.116	0.183	0.355	0.147	0.128	0.193	0.237	0.128	0.186	0.448	0.160	0.282
0.197	0.400	0.213	0.196	0.272	0.386	0.213	0.165	0.215	0.192	0.147	0.126	0.186
0.322	0.201	0.415	0.223	0.287	0.331	0.234	0.130	0.318	0.322	0.185	0.357
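As a quick check on the data in Table 2, the following Python sketch computes the sample size, mean, standard deviation and a moment-based skewness coefficient. The variable names and the choice of the estimator g1 = m3 / m2^(3/2) are illustrative assumptions and are not taken from the original analysis.

```python
from statistics import mean, stdev

# Strain of maize seeds, transcribed row by row from Table 2.
strain = [
    0.293, 0.274, 0.280, 0.262, 0.270, 0.284, 0.177, 0.463, 0.257, 0.212, 0.192, 0.336, 0.220,
    0.371, 0.208, 0.287, 0.343, 0.261, 0.246, 0.276, 0.262, 0.269, 0.226, 0.331, 0.267, 0.231,
    0.329, 0.246, 0.465, 0.168, 0.164, 0.170, 0.193, 0.270, 0.242, 0.369, 0.242, 0.206, 0.227,
    0.226, 0.307, 0.325, 0.166, 0.118, 0.145, 0.225, 0.210, 0.130, 0.103, 0.232, 0.257, 0.099,
    0.249, 0.116, 0.183, 0.355, 0.147, 0.128, 0.193, 0.237, 0.128, 0.186, 0.448, 0.160, 0.282,
    0.197, 0.400, 0.213, 0.196, 0.272, 0.386, 0.213, 0.165, 0.215, 0.192, 0.147, 0.126, 0.186,
    0.322, 0.201, 0.415, 0.223, 0.287, 0.331, 0.234, 0.130, 0.318, 0.322, 0.185, 0.357,
]

n = len(strain)
xbar = mean(strain)
s = stdev(strain)  # sample standard deviation (n - 1 denominator)

# Moment-based skewness: third central moment over the 3/2 power of the second.
m2 = sum((x - xbar) ** 2 for x in strain) / n
m3 = sum((x - xbar) ** 3 for x in strain) / n
g1 = m3 / m2 ** 1.5

print(f"n = {n}, mean = {xbar:.4f}, sd = {s:.4f}, skewness = {g1:.4f}")
```

A positive skewness coefficient here is consistent with fitting an asymmetric model such as the skew normal rather than a symmetric one.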

Table 3. Goodness of fit test for data sets.

	$\hat{μ}$	$\hat{σ}$	$\hat{λ}$
Our method	0.1531	0.1229	2.5025
K-S	$D = 0.0424$	$p\text{-value} = 0.9969$
A-D	$An = 0.1484$	$p\text{-value} = 0.9987$
	$\hat{μ}$	$\hat{σ}$	$\hat{λ}$
MLE	0.1814	0.1020	1.1901
K-S	$D = 0.0531$	$p\text{-value} = 0.9613$
A-D	$An = 0.3517$	$p\text{-value} = 0.8946$
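The two sets of estimates in Table 3 can be compared through the mean they imply. The sketch below assumes the estimates are reported in Azzalini's location-scale parameterization SN(μ, σ², λ), for which the mean is μ + σδ√(2/π) with δ = λ/√(1 + λ²). The function name `sn_mean` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import math


def sn_mean(mu: float, sigma: float, lam: float) -> float:
    """Mean of SN(mu, sigma^2, lam), assuming Azzalini's parameterization."""
    delta = lam / math.sqrt(1.0 + lam ** 2)
    return mu + sigma * delta * math.sqrt(2.0 / math.pi)


# (mu_hat, sigma_hat, lambda_hat) as reported in Table 3.
fits = {
    "pivotal": (0.1531, 0.1229, 2.5025),  # "Our method" row
    "mle": (0.1814, 0.1020, 1.1901),      # "MLE" row
}

implied = {name: sn_mean(*params) for name, params in fits.items()}
for name, value in implied.items():
    print(f"{name}: implied mean = {value:.4f}")
```

Under this assumption both fits imply a mean of about 0.244, even though their location, scale and shape estimates differ, and this value lies inside the interval for the mean reported in Table 4.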

Table 4. Confidence interval, prediction interval and tolerance interval of the data.

	Lower Percentiles	Upper Percentiles	Interval
MCI	0.6064	0.8867	(0.2276, 0.2621)
MPI	0.6420	0.9223	(0.2320, 0.2665)
TL	−0.6856	2.8137	(0.0688, 0.4989)
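The intervals in Table 4 appear to follow the location-scale form estimate = μ̂ + q·σ̂, where q is the tabulated percentile and μ̂, σ̂ are the "Our method" estimates from Table 3. This relationship is inferred from the reported numbers rather than quoted from the text, so the short Python sketch below should be read as a consistency check under that assumption. All variable names are illustrative.

```python
# Point estimates from the "Our method" row of Table 3.
mu_hat, sigma_hat = 0.1531, 0.1229

# (lower percentile, upper percentile) as listed in Table 4.
percentiles = {
    "MCI": (0.6064, 0.8867),
    "MPI": (0.6420, 0.9223),
    "TL": (-0.6856, 2.8137),
}


def interval(q_low: float, q_high: float) -> tuple[float, float]:
    """Map a pair of percentiles to interval limits via mu_hat + q * sigma_hat."""
    return (round(mu_hat + q_low * sigma_hat, 4),
            round(mu_hat + q_high * sigma_hat, 4))


intervals = {name: interval(*q) for name, q in percentiles.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.4f}, {high:.4f})")
```

With these inputs the computed limits agree with the "Interval" column of Table 4 to four decimal places.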

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

