Article

Triple Sampling Inference Procedures for the Mean of the Normal Distribution When the Population Coefficient of Variation Is Known

1 College of Nursing, Public Authority of Applied Education and Training, Safat 13092, Kuwait
2 Department of Natural Sciences and Mathematics, College of Engineering, International University of Science and Technology in Kuwait, Ardiya 92400, Kuwait
3 Engineering Sciences Department, Faculty of Engineering, Abdullah Gul University, 38080 Kayseri, Türkiye
4 Faculty of Management Sciences, October University for Modern Sciences and Arts, 6th October City 12566, Egypt
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(3), 672; https://doi.org/10.3390/sym15030672
Submission received: 31 January 2023 / Revised: 25 February 2023 / Accepted: 3 March 2023 / Published: 7 March 2023

Abstract

This paper discusses triple sampling inference procedures for the mean of the normal distribution, a symmetric distribution, when the coefficient of variation is known. We use Searls' estimator, rather than the classical sample mean, as the initial estimate of the unknown population mean. The normal distribution underlies a wide range of natural phenomena and has applications in many fields. First, we discuss the minimum risk point estimation problem under a squared error loss function with linear sampling cost and derive the second-order asymptotic risk and regret. Second, we construct a fixed-width confidence interval for the mean whose coverage probability is at least a predetermined nominal value and obtain its second-order asymptotic coverage probability. Both estimation problems are treated within a unified optimal framework. The theoretical results reveal that the performance of the triple sampling procedure depends on the numerical value of the coefficient of variation: the smaller the coefficient of variation, the better the performance of the procedure.

1. Introduction

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables from a symmetric distribution, the normal distribution $N(\mu, \mu^2\eta^2)$ with $\mu \neq 0$, where $\eta$ is the known coefficient of variation. In the usual case of a normal distribution $N(\mu, \sigma^2)$ where $\sigma^2$ does not depend on $\mu$, the sample mean $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$, $n \geq 1$, is known to be the uniformly minimum variance (UMV) unbiased estimator of $\mu$; in the present case of a known coefficient of variation, however, the sample mean no longer has this property. Searls [1] suggested an improved estimator for $\mu$ of the form $\hat{\mu}_n = n(n+\eta^2)^{-1}\bar{X}_n$, $n \geq 1$, and proved that the mean squared error of Searls' estimator, $MSE(\hat{\mu}_n) = (n+\eta^2)^{-1}\sigma^2$, is smaller than $MSE(\bar{X}_n)$, with the relative efficiency of $\hat{\mu}_n$ to $\bar{X}_n$ equal to $(\eta^2/n) + 1$; see Sen [2].
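The gain from Searls' shrinkage factor $n/(n+\eta^2)$ can be checked numerically. The following sketch (our own illustration, with assumed parameter values, not taken from the paper) estimates both mean squared errors by Monte Carlo and compares them with the closed forms above:

```python
import numpy as np

# Monte Carlo comparison of the sample mean and Searls' estimator
# mu_hat_n = n/(n + eta^2) * Xbar_n when eta = sigma/mu is known.
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)
mu, eta, n, reps = 2.0, 0.3, 25, 200_000
sigma = eta * mu

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
searls = n / (n + eta**2) * xbar

mse_xbar = np.mean((xbar - mu) ** 2)      # theory: sigma^2 / n
mse_searls = np.mean((searls - mu) ** 2)  # theory: sigma^2 / (n + eta^2)

print(mse_xbar, sigma**2 / n)
print(mse_searls, sigma**2 / (n + eta**2))
print(mse_xbar / mse_searls)              # relative efficiency ~ 1 + eta^2/n
```

The improvement is modest for small $\eta$, which is consistent with the relative efficiency $1 + \eta^2/n$.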
Several authors have intensively studied the point estimation of $\mu$. For example, Arnholt and Hebert [3] considered a wider class of estimators for $\mu$ when $\eta$ is known and showed that Searls' estimator still has the minimum mean squared error within this class.
Sinha [4] discussed Bayesian estimation of the mean of the normal distribution, a symmetric distribution, when the coefficient of variation is known, while Gleser and Healy [5] considered a class of Bayes estimators against inverted Gamma priors; see also Guo and Pal [6], Anis [7], Srisodaphol and Tongmol [8], and Hinkley [9]. Recently, fuzzy relational inference systems for estimation have been presented in [10,11].
The assumption of a known coefficient of variation arises in many biological, physical, and engineering applications. For example, for applications in agricultural studies, see Bhat and Rao [12]; for applications in biological and medical experiments, see Brazauskas and Ghorai [13]; see also Hald [14] and Davies and Goldsmith [15].
Despite the extensive literature on point estimation for $\mu$, few works are available on interval estimation. Niwitpong [16] proposed two confidence intervals for the normal mean based on Searls' work. Fu, Wang, and Wong [17] extended Bhat and Rao's [12] approach and proposed a modified signed log-likelihood ratio test for the normal mean.
From a theoretical point of view, standard inferential methods cannot be applied directly to the normal mean here, since the family $N(\mu, \eta^2\mu^2)$, $\mu \neq 0$, belongs to the curved exponential family with the two-dimensional minimal sufficient statistic $(\bar{X}_n, S_n^2)$, where $S_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$, $n \geq 2$; see Efron [18].

Sequential Sampling Procedures

Sequential sampling procedures were mainly developed for statistical inference during and after World War II. Stein [19,20] and Cox [21] presented the two-stage procedure for constructing a fixed-width confidence interval for the normal population mean when the population variance is finite and unknown. The literature shows that the two-stage sampling procedure attains exact consistency and asymptotic consistency but is asymptotically inefficient (it oversamples), especially when the pilot sample size is much smaller than the optimal sample size. To overcome this deficiency, Anscombe [22], Ray [23], and Chow and Robbins [24] proposed a purely sequential procedure. That procedure attains asymptotic consistency and efficiency but lacks exact consistency and is time-consuming. See Mukhopadhyay and de Silva [25].
As a compromise, Hall [26] introduced the triple sampling procedure to achieve two primary objectives: the operational savings made possible by sampling in batches, as in a two-stage procedure, and the asymptotic efficiency attained by purely sequential sampling. The procedure is based on three stages, as we describe later. It combines the efficiency of the one-by-one purely sequential procedure of Anscombe, Chow, and Robbins with the operational saving made possible by sampling in bulk using Stein's group sampling techniques. It is an excellent trade-off between a purely sequential procedure and a two-stage procedure, and it is easy to implement. The triple sampling procedure was mainly developed to construct a fixed-width confidence interval for the normal mean that satisfies a predetermined width and coverage probability when the population variance is unknown. The procedure attains all customary measures except exact consistency. These measures are described as follows:
If N is the final random sample size generated by a multistage sequential procedure and n * is the optimal sample size needed to estimate the parameter µ, then the procedure is said to be:
(i)
first-order asymptotically efficient if $\lim_{n^* \to \infty} E(N/n^*) = 1$ and
(ii)
second-order asymptotically efficient if $\lim_{n^* \to \infty} E(N - n^*) < \infty$; see Ghosh and Mukhopadhyay [27].
Moreover, if $I_N$ is the fixed-width confidence interval constructed via a multistage sampling procedure, then the procedure is called (i) consistent or exactly consistent if $P(\mu \in I_N) \geq 1 - \alpha$, while it is asymptotically (first-order) consistent if $\lim_{n^* \to \infty} P(\mu \in I_N) \geq 1 - \alpha$, where $1 - \alpha$ is the desired nominal value, in the sense of Stein [19], Mukhopadhyay [28], and Chow and Robbins [24], respectively. Moreover, if $R_N$ is the multistage sampling risk encountered in estimating the mean $\mu$ by the corresponding sample measure, and if $R_{n^*}$ is the optimal fixed-sample-size risk had $\sigma$ been known, then the procedure (i) is first-order asymptotically risk efficient if $\lim_{n^* \to \infty} R_N/R_{n^*} = 1$ and (ii) has second-order asymptotic regret if $\lim_{n^* \to \infty} (R_N - R_{n^*})$ remains bounded, in the sense of Ghosh and Mukhopadhyay [27]. For more details, see Mukhopadhyay and de Silva [25] and Ghosh, Mukhopadhyay, and Sen [29].
Mukhopadhyay [30] further developed a unified framework for the triple sampling procedure by focusing on higher-order moments of the final stopping variable $N$. Mukhopadhyay et al. [31] discussed triple sampling sequential estimation for the normal mean. Hamdy [32] extended Hall's results and proposed a triple sampling procedure to tackle the minimum risk point estimation and fixed-width confidence interval estimation problems for the normal mean. Meanwhile, Liu [33] extended Hall's results to hypothesis-testing problems for the normal mean. Yousef [34] discussed the sensitivity of normal-based triple sampling sequential point estimation to the normality assumption, considering a class of absolutely continuous distributions whose first six absolute moments are finite but unknown. He then generalized the study to find the second-order asymptotic coverage probability and the second-order characteristic operating function for the mean, and studied the ability of the constructed confidence interval to detect possible shifts of the true population mean outside the confidence boundaries; see Yousef [35]. Son et al. [36] proposed a triple sampling procedure that tackled a fixed-width confidence interval and a hypothesis test for the normal mean while controlling the Type II error probability. Yousef [37] discussed the performance of the triple sampling procedure for a broader class of underlying continuous distributions by applying a second-order Edgeworth expansion. Both Son et al. [36] and Yousef [35,38] provided second-order approximations of the characteristic operating function of the inference. Yousef [39,40] tackled estimation of the normal inverse coefficient of variation using Monte Carlo simulation. For other underlying distributions, see Yousef et al. [41,42].
For triple sampling minimum risk point estimation for a function of a normal mean under weighted power absolute error loss plus cost, see Banerjee and Mukhopadhyay [43].
Chaturvedi and Tomer [44] discussed the minimum risk and bounded risk point estimation problems for the normal mean when the coefficient of variation is known. They used two sequential procedures, the triple sampling procedure of Hall [26] and the accelerated sequential scheme of Hall [45], with the Searls [1] estimator as the estimate of the normal mean. Although we consider the same problem addressed by Chaturvedi and Tomer [44], our approach differs in several ways. First, we combine point and confidence interval estimation in a unified optimal decision framework. This technique utilizes all the available data to construct quality control charts: the point estimate determines the center line of the quality control chart (the quality mean), and the confidence interval establishes the upper and lower quality limits with a predetermined required specification $2d$, $d > 0$. Second, our theorems and proofs are provided as second-order approximations. Third and last, we provide more details regarding the asymptotic distribution characteristics of the final stopping time $N$, the estimate of the parameter $\mu$, and its higher-order moments.

2. Problem Setting

Assume a sample of size $n$, say $(x_1, x_2, \ldots, x_n)$, is available from the normal distribution with mean $\mu$ and variance $\sigma^2 = \eta^2\mu^2$, $\mu \neq 0$. We use $\hat{\mu}_n = n(n+\eta^2)^{-1}\bar{X}_n$, $n \geq 1$, as an initial estimate of the population mean $\mu$. The aim is to discuss the minimum risk point estimation problem for the normal population mean and to construct a confidence interval for the mean with a predetermined width and coverage probability. Dantzig [46] showed that no fixed sample size $n$ can solve this problem; it can only be solved sequentially. Therefore, we use the triple sampling procedure of Hall [26] to solve the problem in the presence of a known coefficient of variation.

3. Estimation of the Population Mean

3.1. Minimum Risk Point Estimation

Let $L_n(A)$ be the loss function incurred in estimating the population mean $\mu$ by Searls' [1] estimator $\hat{\mu}_n$ for $n \geq 1$. That is,
$L_n(A) = A|\hat{\mu}_n - \mu|^2 + cn$, (1)
where $c$ is the cost per unit sample, assumed known to the experimenter, $cn$ is the cost of sampling, and the constant $A (> 0)$ is described below.
The risk associated with (1) is defined by
$R_n(A) = AE|\hat{\mu}_n - \mu|^2 + cn$. (2)
However,
$E(\hat{\mu}_n - \mu)^2 = n^2(n+\eta^2)^{-2}E(\bar{X}_n - \mu)^2 + (n+\eta^2)^{-2}\mu^2\eta^4$;
substituting $\sigma = \eta\mu$ gives $E(\hat{\mu}_n - \mu)^2 = \sigma^2(n+\eta^2)^{-1}$.
Hence,
$R_n(A) = A\sigma^2(n+\eta^2)^{-1} + cn$.
By treating $n$ as a continuous variable, the minimizing value of $n$ is
$n \approx \sqrt{A/c}\,\sigma - \eta^2 = \lambda\sigma - \eta^2 = n^*$, say, (3)
where $\lambda = \sqrt{A/c}$. As $c \to 0$, $\lambda \to \infty$. Since $\sigma$ is unknown, $n^*$ is unknown. Dantzig [46], Stein [19,20], and Seelbinder [47] showed that no fixed-sample-size procedure minimizes (2) uniformly over $\sigma$. Therefore, we propose the triple sampling procedure of Hall [26] to estimate $n^*$ through estimation of $\sigma$.
The optimal risk, had σ been known, is
$R_{n^*}(A) = 2cn^* + c\eta^2$. (4)
To obtain further insight into the nature of $A$, write (3) as $A = c\sigma^{-2}(n^*+\eta^2)^2$, from which we obtain the following representation of $A$:
$A = c(n^*+\eta^2)\,I(n^*, \sigma^2)$. (5)
From (5), $A$ is partially known (knowable), since it depends on the unknown $n^*$. If we assume that $A$ is known, as is common in the sequential estimation literature (see Chaturvedi and Tomer [44], Hamdy [32], and Mukhopadhyay et al. [31]), then restrictions are imposed on the parameter space of the population mean $\mu$, since from (5), $A = c(n^*+\eta^2)^2\sigma^{-2}$. Now $c(n^*+\eta^2)$ is the cost of optimal sampling, and $I(n^*, \sigma^2) = (n^*+\eta^2)\sigma^{-2}$ is the optimal Fisher information. Hence, we can regard $A$ as the cost of optimal sampling information.
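As a quick numerical illustration of (3) and (4) (values our own, not from the paper; the function name is hypothetical), the optimal sample size and optimal risk can be computed and checked against the unminimized risk curve:

```python
import math

# Optimal fixed sample size n* = lambda*sigma - eta^2 from Equation (3),
# with lambda = sqrt(A/c), and optimal risk R_{n*} = 2*c*n* + c*eta^2
# from Equation (4). Parameter values below are illustrative assumptions.
def optimal_n_and_risk(A, c, mu, eta):
    sigma = eta * abs(mu)
    lam = math.sqrt(A / c)
    n_star = lam * sigma - eta**2
    return n_star, 2 * c * n_star + c * eta**2

A, c, mu, eta = 100.0, 0.01, 2.0, 0.3
n_star, risk = optimal_n_and_risk(A, c, mu, eta)

# Sanity check: evaluate the unminimized risk R_n = A*sigma^2/(n+eta^2) + c*n
# at n = n*; it should equal the optimal risk and sit at the minimum.
sigma = eta * mu
R = lambda n: A * sigma**2 / (n + eta**2) + c * n
print(n_star, risk, R(n_star))
```

With these values, $n^* = 100 \times 0.6 - 0.09 = 59.91$, and $R(n^*)$ matches $2cn^* + c\eta^2$.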

3.2. Fixed-Width Confidence Interval Estimation

Assume we need to establish a fixed-width confidence interval for the mean of the normal distribution with a prescribed width $2d$, $d > 0$, and coverage probability of at least $1-\alpha$, $0 < \alpha < 1$. That is, we need to solve the inequality
$P(|\hat{\mu}_n - \mu| \leq d) \geq 1 - \alpha$. (6)
Since $n(n+\eta^2)^{-1} \to 1$ as $n \to \infty$, it follows from Slutsky's theorem that, as $n \to \infty$, $\hat{\mu}_n = n(n+\eta^2)^{-1}\bar{X}_n \to N(\mu, \sigma^2/n)$ in distribution. This leads to
$P\left(-\tfrac{d\sqrt{n}}{\sigma} \leq \tfrac{\sqrt{n}}{\sigma}(\hat{\mu}_n - \mu) \leq \tfrac{d\sqrt{n}}{\sigma}\right) \geq 1-\alpha$, that is, $2\Phi\left(\tfrac{d\sqrt{n}}{\sigma}\right) - 1 \geq 2\Phi(a) - 1$,
where $\Phi(u) = \int_{-\infty}^{u}(2\pi)^{-1/2}e^{-y^2/2}\,dy$ and $a = \Phi^{-1}(1-\alpha/2)$.
It follows immediately that
$n \geq (a/d)^2\sigma^2 = n_0$, say. (7)
If $\sigma$ is known, then $n_0$ is the optimal fixed sample size required to solve (6) uniformly over $\sigma > 0$. Consequently, the desired fixed-width confidence interval for $\mu$ is $I_{n_0} = (\hat{\mu}_{n_0} - d, \hat{\mu}_{n_0} + d)$. As in the previous section, we propose the triple sampling procedure of Hall [26] to estimate $n_0$ through estimation of $\sigma^2$.
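Equation (7) is easy to evaluate directly. The following sketch (our own, with assumed values; the function name is hypothetical) computes $n_0$ for a 95% interval:

```python
from statistics import NormalDist

# n0 = (a/d)^2 * sigma^2 from Equation (7), for half-width d and
# confidence level 1 - alpha, with sigma = eta * |mu| known.
# Parameter values are illustrative assumptions.
def optimal_ci_sample_size(d, alpha, mu, eta):
    sigma = eta * abs(mu)
    a = NormalDist().inv_cdf(1 - alpha / 2)  # a = Phi^{-1}(1 - alpha/2)
    return (a / d) ** 2 * sigma**2

n0 = optimal_ci_sample_size(d=0.1, alpha=0.05, mu=2.0, eta=0.3)
print(n0)  # about 138.3, so 139 observations for a 95% interval of width 0.2
```

In practice one rounds $n_0$ up to the next integer; the sequential procedure below is needed precisely because $\sigma$ is unknown.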

4. Triple Sampling Procedure and Asymptotic Results

The following lines describe the triple sampling procedure based on (3).
Stage 1. Fix $m$, $\eta$, and the design factor $\delta$, $0 < \delta < 1$; generate a pilot sample of size $m (\geq 2)$ from the normal distribution and compute $\hat{\mu}_m = m(m+\eta^2)^{-1}\bar{X}_m$ and $S_m^2$ as initial estimates of $\mu$ and $\sigma^2$, respectively.
Stage 2. Let $S^* = [\delta(\lambda S_m - \eta^2)] + 1$, where $[x]$ denotes the largest integer less than $x$. Calculate
$N_1 = \max\{m, S^*\}$. (8)
If $m \geq S^*$, then stop sampling; the experiment terminates. Otherwise, sample $S^* - m$ extra observations and augment them with the previous observations. The resulting sample is of size $N_1$.
Stage 3. Let $T^* = [\lambda S_{N_1} - \eta^2] + 1$. Calculate
$N = \max\{N_1, T^*\}$. (9)
If $N_1 \geq T^*$, then no further observations are needed; otherwise, sample $T^* - N_1$ extra observations and augment them with the previous sample. Finally, we propose $\hat{\mu}_N = N(N+\eta^2)^{-1}\bar{X}_N$ and $\hat{\sigma} = S_N$ as the sequential point estimates of $\mu$ and $\sigma$, respectively.
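Stages 1 to 3 can be sketched in code. This is our own illustration (the authors used FORTRAN); the function name, parameter values, and seed are hypothetical, and we take $S_n = \eta\bar{X}_n$ as the estimate of $\sigma$, since $\sigma = \eta\mu$:

```python
import math
import random

# Sketch of the triple sampling rule (8)-(9). [x] + 1 with [x] the largest
# integer strictly less than x equals floor(x) + 1 for non-integer x.
def triple_sample(mu, eta, lam, delta=0.5, m=15, seed=1):
    rng = random.Random(seed)
    sigma = eta * mu
    xs = [rng.gauss(mu, sigma) for _ in range(m)]     # Stage 1: pilot sample

    # Stage 2: take a fraction delta of the estimated optimal size.
    s_m = eta * (sum(xs) / len(xs))
    n1 = max(m, math.floor(delta * (lam * s_m - eta**2)) + 1)
    xs += [rng.gauss(mu, sigma) for _ in range(n1 - len(xs))]

    # Stage 3: re-estimate sigma and sample up to the estimated optimum.
    s_n1 = eta * (sum(xs) / len(xs))
    n = max(n1, math.floor(lam * s_n1 - eta**2) + 1)
    xs += [rng.gauss(mu, sigma) for _ in range(n - len(xs))]

    mu_hat = n / (n + eta**2) * (sum(xs) / len(xs))   # Searls' estimate
    return n, mu_hat

# With lam = 100, the optimal size is n* = lam*sigma - eta^2 = 59.91.
n, mu_hat = triple_sample(mu=2.0, eta=0.3, lam=100.0)
print(n, mu_hat)
```

The final size $N$ concentrates around $n^*$ because the Stage 3 estimate of $\sigma$ is based on roughly $\delta n^*$ observations.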
To proceed, the following assumption, due to Hall [26], is needed to set up the upcoming theorems.
Assumption A.
The triple sampling procedure is carried out under a choice of $m$ such that, as $\lambda \to \infty$, $n^* = O(\lambda^r)$, $r \geq 1$, and $\limsup (m/n^*) < \delta$.

4.1. Minimum Risk Point Estimation

Theorem 1.
Under Assumption A, for the triple sampling procedure (8) and (9), as $\lambda \to \infty$:
(i)
$E(\bar{X}_{N_1}) = \mu - \sigma\eta(\delta n^*)^{-1} + o(\lambda^{-1})$
(ii)
$E(\bar{X}_{N_1}^2) = \mu^2 - \sigma^2(\delta n^*)^{-1} + o(\lambda^{-1})$
(iii)
$Var(\bar{X}_{N_1}) = \sigma^2(\delta n^*)^{-1} + o(\lambda^{-1})$
(iv)
$E(S_{N_1}) = \sigma - \sigma\eta^2(\delta n^*)^{-1} + o(\lambda^{-1})$
(v)
$E(S_{N_1}^2) = \sigma^2 - \sigma^2\eta^2(\delta n^*)^{-1} + o(\lambda^{-1})$
(vi)
$Var(S_{N_1}) = \sigma^2\eta^2(\delta n^*)^{-1} + o(\lambda^{-1})$
Proof.
To prove (i), we condition on the $\sigma$-field generated by the pilot phase observations $X_1, X_2, \ldots, X_m$ and write
$E(\bar{X}_{N_1}) = E\{N_1^{-1}E(\sum_{i=1}^{N_1}(X_i - \mu + \mu) \mid X_1, \ldots, X_m)\} = \mu + E\{N_1^{-1}\sigma E(\sum_{i=1}^{m} Z_i + \sum_{i=m+1}^{N_1} Z_i \mid Z_1, \ldots, Z_m)\}$,
where $Z_i = (X_i - \mu)/\sigma$, $i = 1, 2, \ldots, m$, are i.i.d. $N(0,1)$ random variables.
Given $Z_1, \ldots, Z_m$, the sum $\sum_{i=1}^{m} Z_i$ is non-random, and so is $N_1$. Therefore,
$E(\sum_{i=m+1}^{N_1} Z_i) = 0$
and
$E(\bar{X}_{N_1}) = \mu + \sigma E\{N_1^{-1}\sum_{i=1}^{m} Z_i\}$. (12)
We then expand $N_1^{-1}$ in a Taylor series around $\delta n^*$ as
$N_1^{-1} = (\delta n^*)^{-1} - (N_1 - \delta n^*)(\delta n^*)^{-2} + (N_1 - \delta n^*)^2(\delta n^*)^{-3} + R_{1N_1}$,
where $R_{1N_1}$ is the remainder term. Recalling the second term in (12), we obtain
$\sigma E\{N_1^{-1}\sum_{i=1}^{m} Z_i\} = -\sigma^2\lambda\delta\eta\,\dfrac{E(\sum_{i=1}^{m} Z_i)^2}{m(\delta n^*)^2} + \sigma^3\lambda^2\delta^2\eta^2\,\dfrac{E(\sum_{i=1}^{m} Z_i)^3}{m^2(\delta n^*)^3} + E(R_{1N_1}) = -\dfrac{\sigma\eta}{\delta n^*} + E(R_{1N_1})$,
since $E(\sum_{i=1}^{m} Z_i)^2 = m$ and $E(\sum_{i=1}^{m} Z_i)^3 = 0$.
For the remainder term, $E(R_{1N_1}) = -\sigma^4\lambda^3\delta^3\eta^3 m^{-3}E\{(\sum_{i=1}^{m} Z_i)^4\,\nu^{-4}\}$, where $\nu$ is a random variable lying between $N_1$ and $\delta n^*$. If $N_1 \leq \nu \leq \delta n^*$, then, since $m \leq N_1$,
$|E(R_{1N_1})| \leq \dfrac{\sigma^4\lambda^3\delta^3\eta^3}{m^5}\,E\left(\dfrac{\bar{X}_m - \mu}{\sigma/\sqrt{m}}\right)^4 = \dfrac{3\sigma^4\lambda^3\delta^3\eta^3}{m^5} = o(\lambda^{-1})$
as $m \to \infty$. Similarly, when $\delta n^* \leq \nu \leq N_1$,
$|E(R_{1N_1})| \leq \dfrac{\sigma^4\lambda^3\delta^3\eta^3}{m(\delta n^*)^4}\,E\left(\dfrac{\bar{X}_m - \mu}{\sigma/\sqrt{m}}\right)^4 = o(\lambda^{-1})$
as $m \to \infty$, where we have used Assumption A. Finally, we obtain
$E(\bar{X}_{N_1}) = \mu - \dfrac{\sigma\eta}{\delta n^*} + o(\lambda^{-1})$,
which proves (i) of Theorem 1.
Hence, (iv) of Theorem 1 is straightforward if we write $S_{N_1} = \eta\bar{X}_{N_1}$.
To prove (ii), we condition on the $\sigma$-field generated by $X_1, X_2, \ldots, X_m$ and write
$E(\bar{X}_{N_1}^2) = E\{N_1^{-2}E[(\sum_{i=1}^{N_1}(X_i - \mu + \mu))^2 \mid X_1, \ldots, X_m]\} = \mu^2 + E\{N_1^{-2}E[(\sum_{i=1}^{m}(X_i - \mu) + \sum_{i=m+1}^{N_1}(X_i - \mu))^2 \mid X_1, \ldots, X_m]\} + E\{2\mu N_1^{-1}\sum_{i=1}^{m}(X_i - \mu)\} = \mu^2 + I + II$.
Given the $\sigma$-field generated by $X_1, X_2, \ldots, X_m$, the sum $\sum_{i=1}^{m}(X_i - \mu)$ is non-random, and so is $N_1$. Hence $E(\sum_{i=m+1}^{N_1}(X_i - \mu)) = 0$, so the cross term in $I$ vanishes, while
$E\{N_1^{-2}E[(\sum_{i=m+1}^{N_1}(X_i - \mu))^2 \mid X_1, \ldots, X_m]\} = E\{N_1^{-2}\sigma^2(N_1 - m)\} \leq \sigma^2 E\{(N_1 - m)N_1^{-2}\} = o(\lambda^{-1})$,
since $m/n^* \to \delta$ as $m \to \infty$. Therefore,
$I = E\{N_1^{-2}(\sum_{i=1}^{m}(X_i - \mu))^2\} + o(\lambda^{-1})$. (13)
Next, we expand $N_1^{-2}$ in a Taylor series around $\delta n^*$ and substitute $S_m = \eta\bar{X}_m$ to obtain
$I = \dfrac{E(\sum_{i=1}^{m}(X_i - \mu))^2}{(\delta n^*)^2} - \dfrac{2\delta\lambda\eta\,E(\sum_{i=1}^{m}(X_i - \mu))^3}{m(\delta n^*)^3} + E(R_{2N_1})$.
For the normal distribution, $E(\sum_{i=1}^{m}(X_i - \mu))^3 = 0$ and $E(\sum_{i=1}^{m}(X_i - \mu))^2 = \sigma^2 m$, so $I = \sigma^2 m(\delta n^*)^{-2} + E(R_{2N_1})$. By Assumption A, $m/n^* \to \delta$, and $E(R_{2N_1}) = o(\lambda^{-1})$ by arguments similar to those used to evaluate $E(R_{1N_1})$; finally,
$I = \dfrac{\sigma^2}{\delta n^*} + o(\lambda^{-1})$.
Now, recall $II$:
$II = E\{2\mu N_1^{-1}\sum_{i=1}^{m}(X_i - \mu)\}$.
Expanding $N_1^{-1}$ in a Taylor series as before and substituting $S_m = \eta\bar{X}_m$, we obtain
$II = -\dfrac{2\mu\sigma\eta}{\delta n^*} + o(\lambda^{-1}) = -\dfrac{2\sigma^2}{\delta n^*} + o(\lambda^{-1})$.
Combining terms, we finally obtain
$E(\bar{X}_{N_1}^2) = \mu^2 - \dfrac{\sigma^2}{\delta n^*} + o(\lambda^{-1})$,
which proves (ii) of Theorem 1. Parts (iii), (v), and (vi) follow immediately; we omit further details for brevity. □
The following Theorem 2 provides a second-order approximation of the expectation of a real-valued function $g (> 0)$ of $S_{N_1}$.
Theorem 2.
Let $g (> 0)$ be a real-valued, continuously differentiable, and bounded function in a neighborhood of $\sigma$, such that $\sup_{n \geq m} g(n) = o(|g(n^*)|)$; then, as $\lambda \to \infty$,
$E\,g(S_{N_1}) = g(\sigma) - \dfrac{\sigma\eta^2}{\delta n^*}g'(\sigma) + \dfrac{\sigma^2\eta^2}{2\delta n^*}g''(\sigma) + o(\lambda^{-1})$.
Proof.
Expanding $g(S_{N_1})$ around $\sigma$ in a Taylor series and taking expectations term by term, we obtain
$E\,g(S_{N_1}) = g(\sigma) + g'(\sigma)E(S_{N_1} - \sigma) + \tfrac{1}{2}g''(\sigma)E(S_{N_1} - \sigma)^2 + o(\lambda^{-1})$;
utilizing parts (iv) and (v) of Theorem 1 and the assumption that $g$ is a bounded function, the proof is complete. □
Theorem 3.
Under Assumption A, for the triple sampling procedure (8) and (9), for all fixed $\mu$ and $\sigma^2$, with $L_n$ given by (1), we obtain, as $\lambda \to \infty$:
(i)
$E(N) = n^* - \eta^2(1+\delta)/\delta + 1/2 + o(1)$
(ii)
$E(N - n^*)^2 = n^*\eta^2/\delta + o(\lambda)$
(iii)
$E|N - n^*|^3 = o(\lambda^2)$
Proof.
Part (i): Note that $N = T^*$ except possibly on a set of measure zero. That is,
$\psi = \{S^* < m\} \cup \{\lambda S_{N_1} - \eta^2 < \delta(\lambda S_m - \eta^2) + 1\}$,
where $\int_{\psi} N^m\,dP = o(\lambda^{m-1})$. Therefore,
$N^m = ((\lambda S_{N_1} - \eta^2) + \beta_{N_1})^m + o(\lambda^{m-1})$,
where $\beta_{N_1} = 1 - \{(\lambda S_{N_1} - \eta^2) - [\lambda S_{N_1} - \eta^2]\}$. From Hall [26], as $\lambda \to \infty$, $\beta_{N_1} \to U(0,1)$ in distribution.
Setting $m = 1$, we obtain
$E(N) = E(\lambda S_{N_1} - \eta^2) + E(\beta_{N_1}) + o(1)$.
Part (iv) of Theorem 1 then justifies (i) of Theorem 3. Next,
$E(N - n^*)^2 = \lambda^2 E(S_{N_1} - \sigma)^2 + o(\lambda)$.
Substituting (vi) of Theorem 1, part (ii) is immediate. Part (iii) of Theorem 3 is straightforward if we use Theorem 2 with $g(S_{N_1}) = S_{N_1}^3$; we obtain
$E|S_{N_1} - \sigma|^3 = o(\lambda^{-1})$, so that $E|N - n^*|^3 = \lambda^3 E|S_{N_1} - \sigma|^3 = o(\lambda^2)$.
Part (i) of Theorem 3 shows that if $\eta^2 > \delta/(2(1+\delta))$, then we attain early stopping, in the sense that $E(N) < n^*$. □
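Part (i) is easy to probe by simulation. The sketch below (our own check, with assumed parameter values and seed, not the paper's Table 1) estimates $E(N)$ for a large $\lambda$ and confirms that it differs from $n^*$ only by a bounded amount:

```python
import math
import numpy as np

# Monte Carlo check of Theorem 3(i): the mean of N should stay within a
# bounded constant of n* = lambda*sigma - eta^2. S_n = eta * Xbar_n
# estimates sigma, as in the proof of Theorem 1. Values are illustrative.
rng = np.random.default_rng(7)
mu, eta, lam, delta, m = 2.0, 0.3, 1000.0, 0.5, 100
sigma = eta * mu
n_star = lam * sigma - eta**2   # 599.91

def one_run():
    x = rng.normal(mu, sigma, m)
    s_m = eta * x.mean()
    n1 = max(m, math.floor(delta * (lam * s_m - eta**2)) + 1)
    x = np.concatenate([x, rng.normal(mu, sigma, n1 - m)])
    s_n1 = eta * x.mean()
    return max(n1, math.floor(lam * s_n1 - eta**2) + 1)

mean_n = np.mean([one_run() for _ in range(4000)])
print(n_star, mean_n)  # mean_n differs from n* by well under one unit here
```

Here $\eta^2 = 0.09 < \delta/(2(1+\delta)) = 1/6$, so no early stopping is expected.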
Lemma 1 provides a second-order approximation for a real-valued, continuously differentiable function $h (> 0)$ of the final-stage stopping time $N$.
Lemma 1.
Let $h (> 0)$ be a real-valued, continuously differentiable, and bounded function around $n^*$, such that $\sup_{n > m} |h'''(n)| = o(|h(n^*)|)$; then, as $\lambda \to \infty$,
$E(h(N)) = h(n^*) + h'(n^*)\left\{\dfrac{1}{2} - \dfrac{\eta^2(1+\delta)}{\delta}\right\} + \dfrac{n^*\eta^2}{2\delta}h''(n^*) + o(\lambda^2|h'''(n^*)|)$.
Proof.
The proof follows immediately by expanding $h(N)$ around $n^*$ in a Taylor series and taking expectations term by term:
$E(h(N)) = h(n^*) + h'(n^*)E(N - n^*) + \dfrac{h''(n^*)}{2}E(N - n^*)^2 + \dfrac{1}{6}E\{h'''(\rho)(N - n^*)^3\}$,
where $\rho$ lies between $N$ and $n^*$. Utilizing parts (i), (ii), and (iii) of Theorem 3, the proof is complete. □
Theorem 4.
For the triple sampling procedure (8) and (9) with the loss function (1), the asymptotic risk and regret, as $\lambda \to \infty$, are
(i)
$R_N(A) = 2cn^* + c\eta^2 + c\eta^2/\delta + o(1)$
(ii)
$\omega(A) = c\eta^2/\delta + o(1)$
Proof.
(i) It is seen that
$R_N(A) = AE(\hat{\mu}_N - \mu)^2 + cE(N)$, with $E(\hat{\mu}_N - \mu)^2 = \sum_{n=m}^{\infty} E((\hat{\mu}_n - \mu)^2 \mid N = n)\,P(N = n)$.
Since the events $\{N = n\}$ and $\hat{\mu}_n$ are stochastically independent for all $n = m, m+1, \ldots$, then
$E(\hat{\mu}_N - \mu)^2 = \sum_{n=m}^{\infty} E(\hat{\mu}_n - \mu)^2\,P(N = n) = \sum_{n=m}^{\infty} \dfrac{\sigma^2}{n + \eta^2}\,P(N = n) = E\left(\dfrac{\sigma^2}{N + \eta^2}\right)$.
It follows that
$R_N(A) = c(n^* + \eta^2)^2\,E\left(\dfrac{1}{N + \eta^2}\right) + cE(N)$.
Using Lemma 1 and (i) of Theorem 3, we obtain
$E\left(\dfrac{1}{N + \eta^2}\right) = \dfrac{1}{n^* + \eta^2} + \left(\dfrac{\eta^2(1+\delta)}{\delta} - \dfrac{1}{2}\right)\dfrac{1}{(n^* + \eta^2)^2} + \dfrac{n^*\eta^2}{\delta(n^* + \eta^2)^3} + o(\lambda^{-2})$.
Hence,
$R_N(A) = 2cn^* + c\eta^2 + \dfrac{c\eta^2 n^*}{\delta(n^* + \eta^2)} + o(1)$.
The proof of part (i) is complete.
(ii) By definition,
$\omega(A) = R_N(A) - R_{n^*}(A) = \dfrac{c\eta^2 n^*}{\delta(n^* + \eta^2)} + o(1)$,
so $\omega(A) = \dfrac{c\eta^2}{\delta}\left(1 - \dfrac{\eta^2}{n^* + \eta^2}\right) + o(1) \to \dfrac{c\eta^2}{\delta}$ as $\lambda \to \infty$.
The proof of part (ii) is complete. □
Part (ii) shows that the regret incurred by estimating the population mean by Searls' estimator is $\dfrac{c\eta^2}{\delta}\left(1 - \dfrac{\eta^2}{n^* + \eta^2}\right)$, which tends to $c\eta^2/\delta$. Thus, the smaller the coefficient of variation, the smaller the regret.
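The monotonicity of the regret in $\eta$ is immediate from the formula; a short numerical illustration (our own values, hypothetical function name):

```python
# Second-order regret from part (ii):
# omega(A) ~ (c*eta^2/delta) * (1 - eta^2/(n_star + eta^2)).
def regret(c, eta, delta, n_star):
    return c * eta**2 / delta * (1 - eta**2 / (n_star + eta**2))

c, delta, n_star = 0.01, 0.5, 500
for eta in (0.1, 0.3, 0.5):
    print(eta, regret(c, eta, delta, n_star))  # grows with eta
```

For small $\eta$ the correction factor $1 - \eta^2/(n^* + \eta^2)$ is essentially 1, so the limiting regret $c\eta^2/\delta$ is an accurate summary.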

4.2. Triple Sampling Fixed-Width Confidence Interval

Recall that the triple sampling confidence interval is $I_N = (\hat{\mu}_N - d, \hat{\mu}_N + d)$. Then, the asymptotic coverage probability is
$P(\mu \in I_N) = \sum_{n=m}^{\infty} P(|\hat{\mu}_N - \mu| \leq d, N = n) = \sum_{n=m}^{\infty} P(|\hat{\mu}_n - \mu| \leq d \mid N = n)\,P(N = n)$.
The results of Anscombe [48] give the asymptotic distribution of $\hat{\mu}_N$ as standard normal, $\sqrt{N}(\hat{\mu}_N - \mu)/\sigma \to N(0,1)$ as $m \to \infty$, independently of the random variable $N = m, m+1, m+2, \ldots$.
Thus,
$P_\eta(\mu \in I_N) = \sum_{n=m}^{\infty} P\left(\left|\dfrac{\sqrt{n}}{\sigma}(\hat{\mu}_n - \mu)\right| \leq \dfrac{d\sqrt{n}}{\sigma}\right)P(N = n) = E\left\{2\Phi\left(\dfrac{d\sqrt{N}}{\sigma}\right) - 1\right\}$,
where $\Phi(a) = \int_{-\infty}^{a}(2\pi)^{-1/2}e^{-t^2/2}\,dt$.
By using Lemma 1 with $h(N) = 2\Phi(d\sqrt{N}/\sigma) - 1$, we obtain, as $d \to 0$,
$P_\eta(\mu \in I_N) = (1 - \alpha) - \dfrac{a\phi(a)}{4\delta n^*}\{\eta^2(a^2 + 5 + 4\delta) - 2\delta\} + o(d^2)$, (15)
where ϕ is the probability density function of N ( 0 ,   1 ) .
It is evident from (15) that the behavior of the asymptotic coverage probability depends on the value of $\eta^2$: if $\eta^2 > 2\delta/(a^2 + 5 + 4\delta)$, then the asymptotic coverage probability is always less than the desired nominal value, while if $\eta^2 < 2\delta/(a^2 + 5 + 4\delta)$, the procedure exceeds the desired nominal value. Hence the value of the coefficient of variation controls the procedure. For example, at $\delta = 0.5$ and for $1 - \alpha = 0.9$, $0.95$, and $0.99$, the respective thresholds are $\eta^2 = 0.10303$, $0.09224$, and $0.07323$. Likewise, the coverage probability calculated from (15) at $\delta = 0.5$, $n^* = 500$, and $1 - \alpha = 0.95$ for $\eta = 0.01$, $0.3$, and $0.5$ is, respectively, $0.95011$, $0.95$, and $0.9498$. This shows that knowing the coefficient of variation controls the coverage probability.
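The quoted values can be reproduced directly from (15). The following sketch (our own numerical check; the function name is hypothetical) evaluates the expansion and the threshold on $\eta^2$:

```python
import math
from statistics import NormalDist

# Second-order coverage expansion (15):
# (1-alpha) - a*phi(a)/(4*delta*n_star) * (eta^2*(a^2+5+4*delta) - 2*delta).
def coverage_approx(alpha, eta, delta, n_star):
    a = NormalDist().inv_cdf(1 - alpha / 2)
    phi_a = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)
    return (1 - alpha) - a * phi_a / (4 * delta * n_star) * (
        eta**2 * (a**2 + 5 + 4 * delta) - 2 * delta
    )

# Threshold eta^2 = 2*delta/(a^2 + 5 + 4*delta) at 1 - alpha = 0.95:
a = NormalDist().inv_cdf(0.975)
print(2 * 0.5 / (a**2 + 5 + 4 * 0.5))         # about 0.09224

for eta in (0.01, 0.3, 0.5):
    print(eta, round(coverage_approx(0.05, eta, 0.5, 500), 5))
```

With $\delta = 0.5$, $n^* = 500$, and $1 - \alpha = 0.95$, this reproduces 0.95011, 0.95, and 0.9498 for $\eta = 0.01$, 0.3, and 0.5.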
For the triple sampling coverage probability for $\mu$ using the classical sample mean as an estimator of the mean and with $\eta$ unknown, as $d \to 0$, see Hall [26] and Yousef [35,37]:
$P(\mu \in I_N) = (1 - \alpha) - \dfrac{a\phi(a)}{2\delta n^*}(a^2\delta + 5) + o(d^2)$. (16)
It is evident from (16) that the asymptotic coverage probability is always less than $1 - \alpha$ and attains the nominal value only asymptotically.

5. Monte Carlo Simulation

To visualize the asymptotic results obtained in the above theorems, we wrote FORTRAN codes and ran them using Microsoft Developer Studio with the IMSL library. We generated a pilot sample of size $m$ from the normal distribution with mean $\mu$ and variance $\sigma^2 = \eta^2\mu^2$. We took $n^* = 24, 43, 61, 76, 96, 125, 171, 246$, and $500$; see Hall [26]. These values of the optimal sample size allowed us to explore the procedure's performance as the optimal sample size increases. For brevity, we took $\mu = 2$, $\delta = 0.5$, $m = 15$, $1 - \alpha = 95\%$, and $\eta = 0.3$. The number of replications was 50,000. For more details about the simulation methodology, see Yousef [37,40].
The estimates were as follows: $\bar{N}$, the simulated estimate of $n^*$ with standard error $S(\bar{N})$; $\hat{\mu}$, the simulated estimate of the population mean with standard error $S(\hat{\mu})$; $\hat{\sigma}$, the simulated estimate of the population standard deviation with standard error $S(\hat{\sigma})$; $\hat{\omega}$, the simulated estimate of the asymptotic regret; and, finally, $1 - \hat{\alpha}$, the simulated estimate of the asymptotic coverage probability. Table 1 shows that, as the optimal sample size increased, $\bar{N}$ was always less than $n^*$ (early stopping), with decreasing standard errors; $\hat{\mu}$ approached the true value of the mean, with decreasing standard errors; $\hat{\sigma}$ approached the true value of $0.6$, with decreasing standard errors; and $\hat{\omega}$ was finite and positive (positive regret). The simulated coverage probability $1 - \hat{\alpha}$ was always less than the targeted value and attained it only asymptotically. This means the triple sampling procedure attains all the customary measures discussed above except exact consistency and provides good estimates in the presence of a known $\eta$.
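A compact re-implementation of this coverage experiment (ours, in Python rather than the authors' FORTRAN/IMSL; the stopping rule, seed, and replication count below are our assumptions) targets $n_0 = (a/d)^2\sigma^2$ with $\sigma$ estimated via $S_n = \eta\bar{X}_n$:

```python
import math
import numpy as np

# Estimate the mean stopping time and the empirical coverage of the
# interval (mu_hat_N - d, mu_hat_N + d) for one of the paper's settings.
rng = np.random.default_rng(2023)
mu, eta, delta, m, alpha = 2.0, 0.3, 0.5, 15, 0.05
sigma = eta * mu
a = 1.959964                          # Phi^{-1}(0.975)
n_star = 61                           # one of the paper's optimal sizes
d = a * sigma / math.sqrt(n_star)     # so that (a/d)^2 * sigma^2 = n_star

def replicate():
    x = rng.normal(mu, sigma, m)
    s2 = (eta * x.mean()) ** 2
    n1 = max(m, math.floor(delta * (a / d) ** 2 * s2) + 1)
    x = np.concatenate([x, rng.normal(mu, sigma, n1 - m)])
    s2 = (eta * x.mean()) ** 2
    n = max(n1, math.floor((a / d) ** 2 * s2) + 1)
    x = np.concatenate([x, rng.normal(mu, sigma, n - len(x))])
    mu_hat = n / (n + eta**2) * x.mean()
    return n, abs(mu_hat - mu) <= d

runs = [replicate() for _ in range(5000)]
n_bar = np.mean([r[0] for r in runs])
coverage = np.mean([r[1] for r in runs])
print(n_bar, coverage)  # stopping time near n* = 61, coverage near 0.95
```

A full replication of Table 1 would repeat this over all nine values of $n^*$ with 50,000 replicates each.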

6. Conclusions

We discussed triple sampling estimation for the mean of the normal distribution, a symmetric distribution, when the coefficient of variation is known, using Searls' estimator as the initial estimator of the mean. Such a procedure can be used in quality control. We studied minimum risk point estimation under a squared error loss function with linear sampling cost and found the asymptotic risk and regret. We then utilized the asymptotic results to construct a confidence interval with a prescribed width and coverage probability and identified the region where the asymptotic coverage probability is either below or above the desired nominal value. The theoretical results show that the procedure is sensitive to the value of the coefficient of variation. Finally, a series of simulations was conducted to explore the performance of the estimates as the optimal sample size increased; the results agreed with the theoretical findings.

Author Contributions

Conceptualization, A.A., A.Y. and H.H.; methodology, A.A., A.Y. and H.H.; validation, A.A., A.Y. and H.H.; formal analysis, A.A., A.Y. and H.H.; investigation, A.A., A.Y. and H.H.; resources, A.A., A.Y. and H.H.; data curation, A.A., A.Y. and H.H.; writing—original draft preparation, A.A., A.Y. and H.H.; writing—review and editing, A.A., A.Y. and H.H.; visualization, A.A., A.Y. and H.H.; supervision, A.A., A.Y. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data information is mentioned in the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Searls, D.T. The utilization of a known coefficient of variation in the estimate procedure. J. Am. Stat. Assoc. 1964, 59, 1225–1226. [Google Scholar] [CrossRef]
  2. Sen, A.R. Relative Efficiency of Estimators of the Mean of a Normal Distribution when Coefficient of Variation is Known. Biom. J. 1979, 21, 131–137. [Google Scholar] [CrossRef]
  3. Arnholt, A.T.; Hebert, J.L. Estimating the mean with known coefficient of variation. Am. Stat. 1995, 49, 367–369. [Google Scholar]
  4. Sinha, S.K. Bayesian Estimation of the Mean of a Normal Distribution when the Coefficient of Variation is Known. J. R. Stat. Soc. Ser. D 1983, 32, 339. [Google Scholar] [CrossRef]
  5. Gleser, L.J.; Healy, J.D. Estimating the mean of normal distribution with known coefficient of variation. J. Am. Stat. Assoc. 1976, 71, 977–981. [Google Scholar] [CrossRef]
  6. Guo, H.; Pal, N. On a Normal Mean with Known Coefficient of Variation. Calcutta Stat. Assoc. Bull. 2003, 54, 17–30. [Google Scholar] [CrossRef]
  7. Anis, M. Estimating the Mean of Normal Distribution with Known Coefficient of Variation. Am. J. Math. Manag. Sci. 2008, 28, 469–487. [Google Scholar] [CrossRef]
  8. Srisodaphol, W.; Tongmol, N. Improved Estimators of the Mean of a Normal Distribution with a Known Coefficient of Variation. J. Probab. Stat. 2012, 2012, 807045. [Google Scholar] [CrossRef]
  9. Hinkley, D.V. Conditional inference about a normal mean with known coefficient of variation. Biometrika 1977, 64, 105–108. [Google Scholar] [CrossRef]
  10. Tang, Y.M.; Zhang, L.; Bao, G.Q.; Ren, F.J.; Pedrycz, W. Symmetric implicational algorithm derived from intuitionistic fuzzy entropy. Iran. J. Fuzzy Syst. 2022, 19, 27–44. [Google Scholar]
  11. Tang, Y.; Pedrycz, W. Oscillation-Bound Estimation of Perturbations Under Bandler–Kohout Subproduct. IEEE Trans. Cybern. 2021, 52, 6269–6282. [Google Scholar] [CrossRef]
  12. Bhat, K.; Rao, K.A. On Tests for a Normal Mean with Known Coefficient of Variation. Int. Stat. Rev. 2007, 75, 170–182. [Google Scholar] [CrossRef]
  13. Brazauskas, V.; Ghorai, J. Estimating the common parameter of normal models with known coefficients of variation: A sensitivity study of asymptotically efficient estimators. J. Stat. Comput. Simul. 2007, 77, 663–681. [Google Scholar] [CrossRef]
  14. Hald, A. Statistical Theory with Engineering Applications; John Wiley and Sons: New York, NY, USA, 1952. [Google Scholar]
  15. Davies, O.L.; Goldsmith, P.L. Statistical Methods in Research and Production; Longman Group Ltd.: London, UK, 1976. [Google Scholar]
  16. Niwitpong, S. Confidence Intervals for the Normal Mean with Known Coefficient of Variation. Int. J. Math. Comput. Sci. 2012, 69, 677–680. [Google Scholar]
  17. Fu, Y.; Wang, H.; Wong, A. Inference for the Normal Mean with Known Coefficient of Variation. Open J. Stat. 2013, 3, 45–51. [Google Scholar] [CrossRef]
  18. Efron, B. Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency). Ann. Stat. 1975, 3, 1189–1242. [Google Scholar] [CrossRef]
  19. Stein, C. A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of the Variance. Ann. Math. Stat. 1945, 16, 243–258. [Google Scholar] [CrossRef]
  20. Stein, C. Some problems in sequential estimation (abstract). Econometrica 1949, 17, 77–78. [Google Scholar]
  21. Cox, D.R. Estimation by double sampling. Biometrika 1952, 39, 217–227. [Google Scholar] [CrossRef]
  22. Anscombe, F.J. Sequential Estimation. J. R. Stat. Soc. Ser. B 1953, 15, 1–21. [Google Scholar] [CrossRef]
  23. Ray, W.D. Sequential Confidence Intervals for the Mean of a Normal Population with Unknown Variance. J. R. Stat. Soc. Ser. B 1957, 19, 133–143. [Google Scholar] [CrossRef]
  24. Chow, Y.S.; Robbins, H. On the asymptotic theory of fixed width sequential confidence intervals for the mean. Ann. Math. Stat. 1965, 36, 457–462. [Google Scholar] [CrossRef]
  25. Mukhopadhyay, N.; de Silva, B.M. Sequential Methods and Their Applications; Chapman and Hall/CRC: London, UK, 2009. [Google Scholar]
  26. Hall, P. Asymptotic Theory of Triple Sampling for Sequential Estimation of a Mean. Ann. Stat. 1981, 9, 1229–1238. [Google Scholar] [CrossRef]
  27. Ghosh, M.; Mukhopadhyay, N. Consistency and asymptotic efficiency of two-stage and sequential procedures. Sankhya Ser. A 1981, 43, 220–227. [Google Scholar]
  28. Mukhopadhyay, N. Stein’s two-stage procedure and exact consistency. Scand. Actuar. J. 1982, 1982, 110–122. [Google Scholar] [CrossRef]
  29. Ghosh, M.; Mukhopadhyay, N.; Sen, P. Sequential Estimation; Wiley Series in Probability and Statistics: New York, NY, USA, 1997. [Google Scholar]
  30. Mukhopadhyay, N. Some properties of a three-stage procedure with applications in sequential analysis. Sankhya Ser. A 1990, 52, 218–231. [Google Scholar]
  31. Mukhopadhyay, N.; Hamdy, H.; Al-Mahmeed, M.; Costanza, M. Three-Stage point Estimation Procedures for a Normal Mean. Seq. Anal. 1987, 6, 21–36. [Google Scholar] [CrossRef]
  32. Hamdy, H.I. Remarks on the asymptotic theory of triple stage estimation of the normal mean. Scand. J. Stat. 1988, 15, 303–310. [Google Scholar]
  33. Liu, W. Fixed-width simultaneous confidence intervals for all-pairwise comparisons. Comput. Stat. Data Anal. 1995, 20, 35–44. [Google Scholar] [CrossRef]
  34. Yousef, A.; Kimber, A.; Hamdy, H. Sensitivity of normal-based triple sampling sequential point estimation to the normality assumption. J. Stat. Plan. Inference 2013, 143, 1606–1618. [Google Scholar] [CrossRef]
  35. Yousef, A. A note on a three-stage sequential confidence interval for the mean when the underlying distribution departs away from normality. Int. J. Appl. Math. Stat. 2018, 57, 57–69. [Google Scholar]
  36. Son, M.S.; Haugh, L.D.; Hamdy, H.I.; Costanza, M.C. Controlling Type II Error While Constructing Triple Sampling Fixed Precision Confidence Intervals for the Normal Mean. Ann. Inst. Stat. Math. 1997, 49, 681–692. [Google Scholar] [CrossRef]
  37. Yousef, A.S. Constructing a Three-Stage Asymptotic Coverage Probability for the Mean Using Edgeworth Second-Order Approximation. In International Conference on Mathematical Sciences and Statistics; Springer: Singapore, 2014; pp. 53–67. [Google Scholar] [CrossRef]
  38. Yousef, A.; Hamdy, H. Three-Stage Estimation of the Mean and Variance of the Normal Distribution with Application to an Inverse Coefficient of Variation with Computer Simulation. Mathematics 2019, 7, 831. [Google Scholar] [CrossRef]
  39. Yousef, A.; Hamdy, H. Three-Stage Sequential Estimation of the Inverse Coefficient of Variation of the Normal Distribution. Computation 2019, 7, 69. [Google Scholar] [CrossRef]
  40. Yousef, A. Performance of Three-Stage Sequential Estimation of the Normal Inverse Coefficient of Variation Under Type II Error Probability: A Monte Carlo Simulation Study. Front. Phys. 2020, 8, 71. [Google Scholar] [CrossRef]
  41. Yousef, A.; Amin, A.A.; Hassan, E.E.; Hamdy, H.I. Multistage Estimation of the Rayleigh Distribution Variance. Symmetry 2020, 12, 2084. [Google Scholar] [CrossRef]
  42. Yousef, A.; Hassan, E.; Amin, A.; Hamdy, H. Multistage Estimation of the Scale Parameter of Rayleigh Distribution with Simulation. Symmetry 2020, 12, 1925. [Google Scholar] [CrossRef]
  43. Banerjee, B.; Mukhopadhyay, N. Minimum risk point estimation for a function of a normal mean under weighted power absolute error loss plus cost: First-order and second-order asymptotics. Seq. Anal. 2021, 40, 336–369. [Google Scholar] [CrossRef]
  44. Chaturvedi, A.; Tomer, S.K. Three-stage and ‘accelerated’ sequential procedures for the mean of a normal population with known coefficient of variation. Statistics 2003, 37, 51–64. [Google Scholar] [CrossRef]
  45. Hall, P. Sequential Estimation Saving Sampling Operations. J. R. Stat. Soc. Ser. B 1983, 45, 219–223. [Google Scholar] [CrossRef]
  46. Dantzig, G.B. On the non-existence of tests of Student’s hypothesis having power function independent of σ. Ann. Math. Stat. 1940, 11, 186–192. [Google Scholar] [CrossRef]
  47. Seelbinder, B.M. On Stein’s Two-stage Sampling Scheme. Ann. Math. Stat. 1953, 24, 640–649. [Google Scholar] [CrossRef]
  48. Anscombe, F.J. Large Sample Theory of Sequential Estimation. Math. Proc. Camb. Phil. Soc. 1952, 48, 600–607. [Google Scholar] [CrossRef]
Table 1. The simulated estimates of the triple sampling procedure at η = 0.3.
n*     N̄        S(N̄)    μ̂       S(μ̂)    σ̂       S(σ̂)    ω̂      1 − α̂
24     19.36    0.037    1.9901   0.0006   0.5700   0.0004   4.27    0.9062
43     38.22    0.068    1.9937   0.0005   0.5660   0.0004   13.59   0.9040
61     56.55    0.080    1.9960   0.0004   0.5773   0.0003   12.43   0.9218
76     71.65    0.088    1.9971   0.0003   0.5838   0.0003   11.43   0.9279
96     91.81    0.098    1.9982   0.0003   0.5889   0.0002   9.81    0.9344
125    120.95   0.112    1.9985   0.0003   0.5923   0.0002   8.02    0.9380
171    166.99   0.130    1.9984   0.0002   0.5949   0.0002   4.60    0.9433
246    242.39   0.154    1.9992   0.0002   0.5968   0.0001   7.62    0.9445
500    496.57   0.222    1.9996   0.0001   0.5985   0.0001   11.04   0.9474
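Table 1 reports Monte Carlo estimates of the final sample size, mean, standard deviation, and coverage for the triple sampling procedure at η = 0.3. The following is a minimal sketch (not the authors' code) of how such a three-stage fixed-width interval simulation can be set up. The pilot size m = 10 and design fraction gamma = 0.5 are illustrative assumptions; μ = 2, CV = 0.3 (so σ = 0.6), and α = 0.10 are chosen to mirror the table's setting, and the classical sample mean is used in place of the Searls estimator for simplicity.

```python
import math
import random
import statistics


def triple_sampling_ci(mu, cv, d, alpha=0.10, m=10, gamma=0.5, rng=None):
    """One run of a three-stage fixed-width confidence interval for a
    normal mean. Returns (N, covered): the final sample size and whether
    the half-width-d interval around the sample mean contains mu."""
    rng = rng or random.Random()
    sigma = cv * mu  # known coefficient of variation ties sigma to mu
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)

    # Stage 1: pilot sample of size m
    xs = [rng.gauss(mu, sigma) for _ in range(m)]

    # Stage 2: sample up to a fraction gamma of the estimated optimal size
    t = max(m, math.ceil(gamma * (z * statistics.stdev(xs) / d) ** 2))
    xs += [rng.gauss(mu, sigma) for _ in range(t - len(xs))]

    # Stage 3: top up to the re-estimated optimal size
    n_final = max(t, math.ceil((z * statistics.stdev(xs) / d) ** 2))
    xs += [rng.gauss(mu, sigma) for _ in range(n_final - len(xs))]

    xbar = statistics.fmean(xs)
    return n_final, abs(xbar - mu) <= d


# Monte Carlo estimate of the mean sample size and coverage probability
rng = random.Random(1)
runs = [triple_sampling_ci(mu=2.0, cv=0.3, d=0.2, rng=rng) for _ in range(2000)]
mean_n = statistics.fmean(n for n, _ in runs)
coverage = statistics.fmean(c for _, c in runs)
print(f"mean N = {mean_n:.1f}, coverage = {coverage:.3f}")
```

With these inputs the optimal fixed-sample size is roughly (1.645 × 0.6 / 0.2)² ≈ 24, matching the first row of the table; the simulated coverage hovers near the nominal 0.90.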
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alhajraf, A.; Yousef, A.; Hamdy, H. Triple Sampling Inference Procedures for the Mean of the Normal Distribution When the Population Coefficient of Variation Is Known. Symmetry 2023, 15, 672. https://doi.org/10.3390/sym15030672


