Article

New Techniques for Estimating Finite Population Variance Using Ranks of Auxiliary Variable in Two-Stage Sampling

by Umer Daraz ¹, Mohammed Ahmed Alomair ²,*, Olayan Albalawi ³ and Abdulaziz S. Al Naim ⁴
¹ School of Mathematics and Statistics, Central South University, Changsha 410017, China
² Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia
³ Department of Statistics, Faculty of Science, University of Tabuk, Tabuk 71491, Saudi Arabia
⁴ Accounting Department, Business School, King Faisal University, Al-Ahsa 31982, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2741; https://doi.org/10.3390/math12172741
Submission received: 5 August 2024 / Revised: 27 August 2024 / Accepted: 2 September 2024 / Published: 3 September 2024
(This article belongs to the Special Issue Survey Statistics and Survey Sampling: Challenges and Opportunities)

Abstract

This article presents a new class of estimators of the finite population variance of a study variable in two-phase sampling. These estimators use information on the extreme values and the ranks of an auxiliary variable. Using a first-order approximation, we derive their properties, including biases and mean squared errors (MSEs). A comprehensive simulation study is then conducted to assess their performance and validate the theoretical findings. The results show that the proposed class of estimators achieves a higher percent relative efficiency (PRE) than existing estimators across various simulation scenarios. In the application section, we further use three data sets to compare the proposed estimators with the conventional unbiased variance estimator, ratio and regression estimators, and other existing methods.

1. Introduction

In sampling theory, it is standard practice to use auxiliary variables alongside the study variable to improve the design and the efficiency of estimators by exploiting their relationship. In many practical situations, however, information about the auxiliary variables is not available before a survey is conducted, and two-phase sampling is then preferred. Two-phase sampling, also known as double sampling, selects the sample in two distinct phases. Because it is cost-effective, it plays an important role in sample surveys and is widely used when auxiliary information is not available in advance. Two-phase sampling was first introduced by Neyman [1], and the topic received little further attention until the work of Sukhatme [2]. In recent years, two-phase sampling has attracted considerable interest because it allows variables to be screened at low cost. Studies on two-phase sampling include [3,4,5,6,7,8,9,10].
Variation is inherent in nature, and estimating the finite population variance is therefore an important problem. Das and Tripathi [11] first discussed the use of auxiliary information to estimate the population variance, and their work was extended by Isaki [12]. The accuracy of an estimator can be improved by careful use of auxiliary information. Bahl and Tuteja [13] suggested ratio- and product-type exponential estimators, and many authors have proposed estimators of the finite population variance, including [14,15,16,17,18,19,20,21,22].
Extreme values in survey data can produce abnormal observations and hence biased results. For this reason, several researchers have worked with extreme values and proposed transformations to estimate population characteristics. Through a linear transformation, Mohanty and Sahoo [23] provided two estimators that use the known minimum and maximum observations of the auxiliary variable. This idea was not pursued further until the work of Khan and Shabbir [24], who incorporated extreme values into several estimators of the finite population mean. Daraz et al. [25] improved the estimation of the finite population mean under extreme values in stratified random sampling. Through transformations of extreme values, Daraz and Khan [26] proposed several efficient classes of estimators of the population variance. Daraz et al. [27] recently studied the estimation of the finite population variance and presented several estimators with small mean squared errors. Using the extreme values of the auxiliary variable, Daraz et al. [28] suggested double exponential ratio-type estimators of the population variance. To improve accuracy through linear transformations of the extreme values and ranks of the auxiliary variable, Daraz et al. [29] obtained a class of efficient estimators that makes dual use of the auxiliary variable under simple random sampling. For more details, see [30,31,32] and the references therein.
Classical estimators tend to lose efficiency, in terms of mean squared error (MSE), when a data set contains extreme values, and there may be a temptation to exclude such observations from the sample. Retaining them when estimating population characteristics is important for addressing this difficulty properly. If the auxiliary variable and the study variable are related, the ranks of the auxiliary variable are also associated with the study variable, so these ranks can be used as additional information to increase the accuracy of an estimator. In this article, the extreme values of the auxiliary variable are kept in the data and used as supplementary information to increase the accuracy of the proposed class of estimators. Inspired by [26,27,28,29], we use transformations to construct a new class of estimators of the finite population variance in two-phase sampling, based on the known extreme values and the ranks of the auxiliary variable.
The remainder of this article is organized as follows. Section 2 presents the concepts and notation. Section 3 reviews several existing estimators. Section 4 describes the proposed class of estimators, and Section 5 gives the mathematical comparison. To evaluate the theoretical results of Section 5, Section 6 simulates six artificial populations from different probability distributions and also provides numerical examples based on real data. Finally, Section 7 discusses the results and offers suggestions for further research.

2. Concepts and Notations

Let us consider a finite population $\phi = \{\phi_1, \phi_2, \ldots, \phi_N\}$ of $N$ units. Let the values of the dependent (study) variable $Y$, the independent (auxiliary) variable $X$, and the ranks $R$ of the independent variable for the $i$th unit be denoted by $y_i$, $x_i$, and $r_i$, respectively. The population variances of these variables are defined as
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2, \quad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2, \quad \text{and} \quad S_r^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(R_i - \bar{R}\right)^2,$$
where
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \quad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \quad \text{and} \quad \bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i$$
are the corresponding population means of $Y$, $X$, and $R$. The population coefficients of variation of $(Y, X, R)$ are
$$C_y = \frac{S_y}{\bar{Y}}, \quad C_x = \frac{S_x}{\bar{X}}, \quad \text{and} \quad C_r = \frac{S_r}{\bar{R}},$$
respectively. The population correlation coefficients between $Y$ and $X$, $Y$ and $R$, and $X$ and $R$ are
$$\rho_{yx} = \frac{S_{yx}}{S_y S_x}, \quad \rho_{yr} = \frac{S_{yr}}{S_y S_r}, \quad \text{and} \quad \rho_{xr} = \frac{S_{xr}}{S_x S_r},$$
where
$$S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right), \quad S_{yr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(R_i - \bar{R}\right), \quad \text{and} \quad S_{xr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(R_i - \bar{R}\right)$$
are the population covariances.
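As a minimal sketch of these definitions, assuming the population values are held in plain Python lists, the quantities above can be computed with the $N - 1$ divisor used in the text. The helper names (`pop_mean`, `pop_cov`, `pop_variance`, `coef_variation`, `correlation`, `ranks`) are hypothetical, and the sketch assumes there are no ties among the $X_i$ (ties, if any, are broken by position).

```python
import math


def pop_mean(v):
    """Population mean, e.g. Y-bar = (1/N) * sum(Y_i)."""
    return sum(v) / len(v)


def pop_cov(u, v):
    """Population covariance with divisor N - 1, e.g. S_yx."""
    mu, mv = pop_mean(u), pop_mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def pop_variance(v):
    """Population variance with divisor N - 1, e.g. S_y^2."""
    return pop_cov(v, v)


def coef_variation(v):
    """Coefficient of variation, e.g. C_x = S_x / X-bar."""
    return math.sqrt(pop_variance(v)) / pop_mean(v)


def correlation(u, v):
    """rho_uv = S_uv / (S_u * S_v)."""
    return pop_cov(u, v) / math.sqrt(pop_variance(u) * pop_variance(v))


def ranks(v):
    """Ranks 1..N of the values in v (assumes no ties; ties broken by position)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


if __name__ == "__main__":
    # Toy values for illustration only, not data from the article.
    X = [3.2, 1.5, 2.7, 4.1, 2.0]
    Y = [2.9, 1.1, 2.5, 3.6, 1.9]
    R = ranks(X)  # [4, 1, 3, 5, 2]
    print(pop_variance(Y), coef_variation(X), correlation(Y, X), correlation(X, R))
```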
In this paper, we propose a class of estimators of the finite population variance $S_y^2$ of $Y$ that uses the auxiliary variable $X$. The two-phase sampling scheme is defined as follows:
  • In the first phase, a sample of size $n'$ ($n' < N$) is selected in order to estimate the population variance $S_x^2$.
  • In the second phase, a sample of size $n$ ($n < n'$) is selected, on which both $y$ and $x$ are observed.
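The biases and mean squared errors of the various estimators can be derived using the following relative error terms:
$$\xi_0 = \frac{s_y^2 - S_y^2}{S_y^2}, \quad \xi_1 = \frac{s_x^2 - S_x^2}{S_x^2}, \quad \xi_2 = \frac{s_x'^2 - S_x^2}{S_x^2}, \quad \xi_3 = \frac{s_r^2 - S_r^2}{S_r^2}, \quad \xi_4 = \frac{s_r'^2 - S_r^2}{S_r^2},$$
such that $E(\xi_i) = 0$ for $i = 0, 1, 2, 3, 4$. Here, $s_y^2$, $s_x^2$, and $s_r^2$ are computed from the second-phase sample, while $s_x'^2$ and $s_r'^2$ are computed from the first-phase sample. Furthermore,
$$\begin{aligned}
&E(\xi_0^2) = \eta\Delta_{400}^*, \quad E(\xi_1^2) = \eta\Delta_{040}^*, \quad E(\xi_2^2) = \eta'\Delta_{040}^*, \\
&E(\xi_3^2) = \eta\Delta_{004}^*, \quad E(\xi_4^2) = \eta'\Delta_{004}^*, \quad E(\xi_0\xi_1) = \eta\Delta_{220}^*, \\
&E(\xi_0\xi_2) = \eta'\Delta_{220}^*, \quad E(\xi_0\xi_3) = \eta\Delta_{202}^*, \quad E(\xi_0\xi_4) = \eta'\Delta_{202}^*, \\
&E(\xi_1\xi_2) = \eta'\Delta_{040}^*, \quad E(\xi_1\xi_3) = \eta\Delta_{022}^*, \quad E(\xi_1\xi_4) = \eta'\Delta_{022}^*, \\
&E(\xi_2\xi_3) = \eta'\Delta_{022}^*, \quad E(\xi_2\xi_4) = \eta'\Delta_{022}^*, \quad E(\xi_3\xi_4) = \eta'\Delta_{004}^*,
\end{aligned}$$
where
$$\Delta_{400}^* = \Delta_{400} - 1, \quad \Delta_{040}^* = \Delta_{040} - 1, \quad \Delta_{004}^* = \Delta_{004} - 1, \quad \Delta_{220}^* = \Delta_{220} - 1, \quad \Delta_{202}^* = \Delta_{202} - 1, \quad \Delta_{022}^* = \Delta_{022} - 1,$$
$$\eta = \frac{1}{n} - \frac{1}{N}, \quad \eta' = \frac{1}{n'} - \frac{1}{N}, \quad \text{and} \quad \eta'' = \frac{1}{n} - \frac{1}{n'}.$$
Also,
$$\Delta_{lqs} = \frac{\varphi_{lqs}}{\varphi_{200}^{l/2}\,\varphi_{020}^{q/2}\,\varphi_{002}^{s/2}}, \quad \text{where} \quad \varphi_{lqs} = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^l\left(X_i - \bar{X}\right)^q\left(R_i - \bar{R}\right)^s.$$
Here, $\Delta_{400} = \beta_2(y)$, $\Delta_{040} = \beta_2(x)$, and $\Delta_{004} = \beta_2(r)$ are the population coefficients of kurtosis.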
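The moment ratios $\Delta_{lqs}$ and the finite-population factors can be sketched as follows, assuming the population values of $Y$, $X$, and $R$ are available as equal-length lists. The function names are hypothetical, and `n1` stands for the first-phase sample size $n'$.

```python
def phi(y, x, r, l, q, s):
    """phi_lqs = sum (Y - Ybar)^l (X - Xbar)^q (R - Rbar)^s / (N - 1)."""
    N = len(y)
    my, mx, mr = sum(y) / N, sum(x) / N, sum(r) / N
    total = sum((a - my) ** l * (b - mx) ** q * (c - mr) ** s
                for a, b, c in zip(y, x, r))
    return total / (N - 1)


def delta(y, x, r, l, q, s):
    """Delta_lqs = phi_lqs / (phi_200^(l/2) * phi_020^(q/2) * phi_002^(s/2))."""
    denom = (phi(y, x, r, 2, 0, 0) ** (l / 2)
             * phi(y, x, r, 0, 2, 0) ** (q / 2)
             * phi(y, x, r, 0, 0, 2) ** (s / 2))
    return phi(y, x, r, l, q, s) / denom


def delta_star(y, x, r, l, q, s):
    """Starred ratio used in the expectations, e.g. Delta*_400 = Delta_400 - 1."""
    return delta(y, x, r, l, q, s) - 1


def etas(N, n1, n):
    """Return (eta, eta', eta'') for population size N, first-phase size n1 and second-phase size n."""
    eta = 1 / n - 1 / N
    eta_first = 1 / n1 - 1 / N
    eta_between = 1 / n - 1 / n1  # equals eta - eta_first
    return eta, eta_first, eta_between
```

Under the two-phase design $n < n'$, so $\eta'' = \eta - \eta'$ is always positive.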

3. Literature Review

In this section, we review existing estimators of the finite population variance against which the proposed class of estimators is compared.
The usual unbiased estimator of the population variance is $\hat S_y^2 = s_y^2$, whose variance is
$$\operatorname{Var}(\hat S_y^2) = \eta S_y^4 \Delta_{400}^*. \tag{1}$$
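Isaki [12] suggested a ratio estimator $\hat S_R^2$ of the population variance, given by
$$\hat S_R^2 = s_y^2\,\frac{s_x'^2}{s_x^2}. \tag{2}$$
The bias and MSE of $\hat S_R^2$ are
$$\operatorname{Bias}(\hat S_R^2) \approx \eta'' S_y^2\left(\Delta_{040}^* - \Delta_{220}^*\right) \tag{3}$$
and
$$\operatorname{MSE}(\hat S_R^2) \approx S_y^4\left[\eta\Delta_{400}^* + \eta''\left(\Delta_{040}^* - 2\Delta_{220}^*\right)\right]. \tag{4}$$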
Watson [33] proposed the linear regression estimator $\hat S_{lr}^2$, defined as
$$\hat S_{lr}^2 = s_y^2 + b(s_y^2, s_x^2)\left(s_x'^2 - s_x^2\right), \tag{5}$$
where $b(s_y^2, s_x^2) = \dfrac{s_y^2\,\hat\Delta_{220}^*}{s_x^2\,\hat\Delta_{040}^*}$ is the sample regression coefficient. The MSE of $\hat S_{lr}^2$ is
$$\operatorname{MSE}(\hat S_{lr}^2) \approx S_y^4\Delta_{400}^*\left(\eta - \eta''\rho_{yx}^{*2}\right), \tag{6}$$
where $\rho_{yx}^* = \dfrac{\Delta_{220}^*}{\sqrt{\Delta_{400}^*\Delta_{040}^*}}$.
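Bahl and Tuteja [13] introduced an exponential ratio-type estimator $\hat S_{bt}^2$, defined as
$$\hat S_{bt}^2 = s_y^2\exp\left(\frac{s_x'^2 - s_x^2}{s_x'^2 + s_x^2}\right). \tag{7}$$
The bias and MSE of $\hat S_{bt}^2$ are
$$\operatorname{Bias}(\hat S_{bt}^2) \approx \frac{1}{2}\eta'' S_y^2\left(\frac{3}{4}\Delta_{040}^* - \Delta_{220}^*\right) \tag{8}$$
and
$$\operatorname{MSE}(\hat S_{bt}^2) \approx S_y^4\left[\eta\Delta_{400}^* + \eta''\left(\frac{\Delta_{040}^*}{4} - \Delta_{220}^*\right)\right]. \tag{9}$$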
Upadhyaya and Singh [14] developed a ratio-type estimator $\hat S_{us}^2$ that uses the kurtosis of the auxiliary variable:
$$\hat S_{us}^2 = s_y^2\,\frac{s_x'^2 + \Delta_{040}}{s_x^2 + \Delta_{040}}. \tag{10}$$
The bias and MSE of $\hat S_{us}^2$ are
$$\operatorname{Bias}(\hat S_{us}^2) \approx \eta'' g S_y^2\left(g\Delta_{040}^* - \Delta_{220}^*\right) \tag{11}$$
and
$$\operatorname{MSE}(\hat S_{us}^2) \approx S_y^4\left[\eta\Delta_{400}^* + \eta''\left(g^2\Delta_{040}^* - 2g\Delta_{220}^*\right)\right], \tag{12}$$
where $g = \dfrac{S_x^2}{S_x^2 + \Delta_{040}}$.
Kadilar and Cingi [16] suggested the following ratio estimators:
$$\hat S_{ck1}^2 = s_y^2\,\frac{s_x'^2 + C_x}{s_x^2 + C_x}, \tag{13}$$
$$\hat S_{ck2}^2 = s_y^2\,\frac{\Delta_{040}\,s_x'^2 + C_x}{\Delta_{040}\,s_x^2 + C_x}, \tag{14}$$
and
$$\hat S_{ck3}^2 = s_y^2\,\frac{C_x\,s_x'^2 + \Delta_{040}}{C_x\,s_x^2 + \Delta_{040}}. \tag{15}$$
The bias and MSE of $\hat S_{cki}^2$ ($i = 1, 2, 3$) are
$$\operatorname{Bias}(\hat S_{cki}^2) \approx \eta'' t_i S_y^2\left(t_i\Delta_{040}^* - \Delta_{220}^*\right) \tag{16}$$
and
$$\operatorname{MSE}(\hat S_{cki}^2) \approx S_y^4\left[\eta\Delta_{400}^* + \eta''\left(t_i^2\Delta_{040}^* - 2t_i\Delta_{220}^*\right)\right], \tag{17}$$
where $t_1 = \dfrac{S_x^2}{S_x^2 + C_x}$, $t_2 = \dfrac{\Delta_{040}S_x^2}{\Delta_{040}S_x^2 + C_x}$, and $t_3 = \dfrac{C_xS_x^2}{C_xS_x^2 + \Delta_{040}}$.
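For reference, the existing estimators reviewed above can be evaluated from sample statistics as in the following sketch. It assumes that $s_y^2$ and $s_x^2$ come from the second-phase sample, $s_x'^2$ from the first-phase sample, and that $\beta_2(x) = \Delta_{040}$ and $C_x$ are known; for the Upadhyaya and Singh estimator the two-phase form with $s_x'^2$ in the numerator is used. All function and argument names are hypothetical.

```python
import math


def usual(sy2):
    """Usual unbiased estimator s_y^2."""
    return sy2


def ratio_isaki(sy2, sx2, sx2_first):
    """Isaki ratio estimator in two-phase form: s_y^2 * s'_x^2 / s_x^2."""
    return sy2 * sx2_first / sx2


def regression_watson(sy2, sx2, sx2_first, d220_star_hat, d040_star_hat):
    """Regression estimator s_y^2 + b (s'_x^2 - s_x^2), b = s_y^2 D*220 / (s_x^2 D*040)."""
    b = sy2 * d220_star_hat / (sx2 * d040_star_hat)
    return sy2 + b * (sx2_first - sx2)


def bahl_tuteja(sy2, sx2, sx2_first):
    """Exponential ratio-type estimator."""
    return sy2 * math.exp((sx2_first - sx2) / (sx2_first + sx2))


def upadhyaya_singh(sy2, sx2, sx2_first, beta2x):
    """Ratio-type estimator using the kurtosis beta_2(x) = Delta_040."""
    return sy2 * (sx2_first + beta2x) / (sx2 + beta2x)


def kadilar_cingi(sy2, sx2, sx2_first, beta2x, cx):
    """The three Kadilar-Cingi ratio estimators (ck1, ck2, ck3)."""
    ck1 = sy2 * (sx2_first + cx) / (sx2 + cx)
    ck2 = sy2 * (beta2x * sx2_first + cx) / (beta2x * sx2 + cx)
    ck3 = sy2 * (cx * sx2_first + beta2x) / (cx * sx2 + beta2x)
    return ck1, ck2, ck3
```

When $s_x'^2 = s_x^2$, every estimator above reduces to $s_y^2$, which is a convenient sanity check.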

4. Proposed Class of Estimators

This section presents an improved class of estimators inspired by prior works [26,27,28,29]. These estimators employ minimum and maximum values of auxiliary variables, along with their ranks, in two-phase sampling to estimate the variance of the finite population. The suggested estimator is defined as
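$$\hat S_T^2 = s_y^2\exp\left[\theta_1\,\frac{\gamma_1\left(s_x'^2 - s_x^2\right)}{\gamma_1\left(s_x'^2 + s_x^2\right) + 2\gamma_2}\right]\exp\left[\theta_2\,\frac{\gamma_3\left(s_r'^2 - s_r^2\right)}{\gamma_3\left(s_r'^2 + s_r^2\right) + 2\gamma_4}\right], \tag{18}$$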
where $\theta_i$ ($i = 1, 2$) are known constants taking the value 1 or 2, and $\gamma_i$ ($i = 1, 2, 3, 4$) are known parameters of the auxiliary variable. The minimum and maximum observations of the independent (auxiliary) variable are denoted by $(x_m, x_M)$, and the minimum and maximum of its ranks by $(R_m, R_M)$. The values of $\gamma_1$ and $\gamma_2$ are given in Table 1, while $\gamma_3 = 1$ and $\gamma_4 = R_M - R_m$. Different members of the proposed class are obtained from (18) by choosing these constants; some of them are listed in Table 1.
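A minimal sketch of the proposed estimator (18), assuming the second- and first-phase sample variances of $x$ and of the ranks have already been computed. Here `gamma3` defaults to 1 and `gamma4` is $R_M - R_m$, as stated above, and the $(\gamma_1, \gamma_2)$ pairs follow the subsets $\hat S_{T1}^2$ to $\hat S_{T3}^2$ listed in Table 1. The function names and the example inputs are hypothetical.

```python
import math


def proposed_t(sy2, sx2, sx2_first, sr2, sr2_first,
               gamma1, gamma2, gamma4, theta1=1, theta2=1, gamma3=1.0):
    """Proposed estimator: s_y^2 times an exponential factor in x and one in the ranks r."""
    x_part = theta1 * gamma1 * (sx2_first - sx2) / (gamma1 * (sx2_first + sx2) + 2 * gamma2)
    r_part = theta2 * gamma3 * (sr2_first - sr2) / (gamma3 * (sr2_first + sr2) + 2 * gamma4)
    return sy2 * math.exp(x_part) * math.exp(r_part)


def table1_gammas(beta2x, cx, x_min, x_max):
    """(gamma1, gamma2) for the subsets T1, T2, T3 shown in Table 1."""
    spread = x_max - x_min
    return {"T1": (beta2x, spread), "T2": (cx, spread), "T3": (spread, cx)}


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; ranks 1..N give gamma4 = N - 1.
    N = 1500
    g1, g2 = table1_gammas(beta2x=2.1, cx=0.3, x_min=1.0, x_max=9.0)["T2"]
    est = proposed_t(sy2=4.0, sx2=2.5, sx2_first=2.8, sr2=180000.0, sr2_first=187000.0,
                     gamma1=g1, gamma2=g2, gamma4=N - 1)
    print(est)
```

With $\theta_1 = 1$, $\gamma_2 = 0$, and identical first- and second-phase rank variances, the first factor reduces to the exponential ratio form of Bahl and Tuteja, which is a useful check on an implementation.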
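We now derive the properties of the proposed class of estimators. To obtain the bias and MSE of $\hat S_T^2$, we take $\theta_1 = \theta_2 = 1$ and rewrite (18) in terms of the error terms:
$$\hat S_T^2 = S_y^2(1 + \xi_0)\exp\left[\frac{b_1}{2}(\xi_2 - \xi_1)\left\{1 + \frac{b_1}{2}(\xi_1 + \xi_2)\right\}^{-1}\right]\exp\left[\frac{b_2}{2}(\xi_4 - \xi_3)\left\{1 + \frac{b_2}{2}(\xi_3 + \xi_4)\right\}^{-1}\right], \tag{19}$$
where
$$b_1 = \frac{\gamma_1 S_x^2}{\gamma_1 S_x^2 + \gamma_2} \quad \text{and} \quad b_2 = \frac{S_r^2}{S_r^2 + \gamma_4}.$$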
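The step from (18) to the error-term form above can be verified directly for the auxiliary-variable factor. With $\theta_1 = 1$, substituting $s_x^2 = S_x^2(1 + \xi_1)$ and $s_x'^2 = S_x^2(1 + \xi_2)$ and dividing the numerator and denominator by $2(\gamma_1 S_x^2 + \gamma_2)$ gives

$$\frac{\gamma_1\left(s_x'^2 - s_x^2\right)}{\gamma_1\left(s_x'^2 + s_x^2\right) + 2\gamma_2} = \frac{\gamma_1 S_x^2(\xi_2 - \xi_1)}{2\left(\gamma_1 S_x^2 + \gamma_2\right) + \gamma_1 S_x^2(\xi_1 + \xi_2)} = \frac{\frac{b_1}{2}(\xi_2 - \xi_1)}{1 + \frac{b_1}{2}(\xi_1 + \xi_2)}, \qquad b_1 = \frac{\gamma_1 S_x^2}{\gamma_1 S_x^2 + \gamma_2}.$$

The rank factor follows in the same way, with $(\gamma_3 = 1, \gamma_4, \xi_3, \xi_4, b_2)$ in place of $(\gamma_1, \gamma_2, \xi_1, \xi_2, b_1)$.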
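Expanding the exponential terms in a Taylor series and retaining terms up to the second degree in the $\xi_i$, we obtain
$$\begin{aligned}
\hat S_T^2 - S_y^2 \approx S_y^2\Big[&\xi_0 - \frac{b_1}{2}(\xi_1 - \xi_2) - \frac{b_2}{2}(\xi_3 - \xi_4) + \frac{3b_1^2}{8}\xi_1^2 - \frac{b_1^2}{8}\xi_2^2 + \frac{3b_2^2}{8}\xi_3^2 - \frac{b_2^2}{8}\xi_4^2 \\
&- \frac{b_1}{2}\xi_0\xi_1 + \frac{b_1}{2}\xi_0\xi_2 - \frac{b_2}{2}\xi_0\xi_3 + \frac{b_2}{2}\xi_0\xi_4 - \frac{b_1^2}{4}\xi_1\xi_2 - \frac{b_2^2}{4}\xi_3\xi_4 \\
&+ \frac{b_1b_2}{4}\xi_1\xi_3 - \frac{b_1b_2}{4}\xi_1\xi_4 - \frac{b_1b_2}{4}\xi_2\xi_3 + \frac{b_1b_2}{4}\xi_2\xi_4\Big].
\end{aligned} \tag{20}$$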
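A sketch of the second-order expansion for the auxiliary-variable factor shows where the coefficients of $\xi_1^2$, $\xi_2^2$, and $\xi_1\xi_2$ come from. Writing the exponent as $A = \frac{b_1}{2}(\xi_2 - \xi_1)\left\{1 + \frac{b_1}{2}(\xi_1 + \xi_2)\right\}^{-1} \approx \frac{b_1}{2}(\xi_2 - \xi_1) - \frac{b_1^2}{4}\left(\xi_2^2 - \xi_1^2\right)$ and using $e^A \approx 1 + A + A^2/2$,

$$e^A \approx 1 + \frac{b_1}{2}(\xi_2 - \xi_1) - \frac{b_1^2}{4}\left(\xi_2^2 - \xi_1^2\right) + \frac{b_1^2}{8}(\xi_2 - \xi_1)^2 = 1 + \frac{b_1}{2}(\xi_2 - \xi_1) + \frac{3b_1^2}{8}\xi_1^2 - \frac{b_1^2}{8}\xi_2^2 - \frac{b_1^2}{4}\xi_1\xi_2.$$

The rank factor has the same form with $(b_2, \xi_3, \xi_4)$ in place of $(b_1, \xi_1, \xi_2)$; multiplying both factors by $(1 + \xi_0)$ and keeping terms up to the second degree gives the full second-order expansion of $\hat S_T^2$.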
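Taking expectations in Equation (20), the bias of $\hat S_T^2$ is
$$\begin{aligned}
\operatorname{Bias}(\hat S_T^2) \approx\ &\eta S_y^2\left[\frac{3b_1^2}{8}\Delta_{040}^* + \frac{3b_2^2}{8}\Delta_{004}^* - \frac{b_1}{2}\Delta_{220}^* - \frac{b_2}{2}\Delta_{202}^* + \frac{b_1b_2}{4}\Delta_{022}^*\right] \\
&- \eta' S_y^2\left[\frac{3b_1^2}{8}\Delta_{040}^* + \frac{3b_2^2}{8}\Delta_{004}^* - \frac{b_1}{2}\Delta_{220}^* - \frac{b_2}{2}\Delta_{202}^* + \frac{b_1b_2}{4}\Delta_{022}^*\right].
\end{aligned}$$
After simplification, we obtain
$$\operatorname{Bias}(\hat S_T^2) \approx \eta'' S_y^2\left[\frac{3}{8}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^*\right) - \frac{1}{2}\left(b_1\Delta_{220}^* + b_2\Delta_{202}^*\right) + \frac{1}{4}b_1b_2\Delta_{022}^*\right], \tag{21}$$
where $\eta'' = \eta - \eta'$.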
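To obtain the first-order approximation of the MSE, we square both sides of Equation (20), retain terms up to the second degree, and take expectations:
$$\begin{aligned}
\operatorname{MSE}(\hat S_T^2) \approx\ &\eta S_y^4\left[\Delta_{400}^* + \frac{b_1^2}{4}\Delta_{040}^* + \frac{b_2^2}{4}\Delta_{004}^* - b_1\Delta_{220}^* - b_2\Delta_{202}^* + \frac{b_1b_2}{2}\Delta_{022}^*\right] \\
&- \eta' S_y^4\left[\frac{b_1^2}{4}\Delta_{040}^* + \frac{b_2^2}{4}\Delta_{004}^* - b_1\Delta_{220}^* - b_2\Delta_{202}^* + \frac{b_1b_2}{2}\Delta_{022}^*\right].
\end{aligned}$$
After simplification, we obtain
$$\operatorname{MSE}(\hat S_T^2) \approx S_y^4\left[\eta\Delta_{400}^* + \frac{\eta''}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right)\right]. \tag{22}$$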
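The first-order MSE above can be evaluated numerically once the starred moment ratios and the finite-population factors are known. The sketch below is an illustration under that assumption: it computes $b_1$ and $b_2$ from the population quantities and then evaluates the MSE expression. All names are hypothetical, and `eta2` stands for $\eta''$.

```python
def b_coefficients(Sx2, Sr2, gamma1, gamma2, gamma4, gamma3=1.0):
    """b1 = g1 Sx^2 / (g1 Sx^2 + g2); b2 = g3 Sr^2 / (g3 Sr^2 + g4), i.e. Sr^2 / (Sr^2 + g4) when g3 = 1."""
    b1 = gamma1 * Sx2 / (gamma1 * Sx2 + gamma2)
    b2 = gamma3 * Sr2 / (gamma3 * Sr2 + gamma4)
    return b1, b2


def mse_proposed(Sy2, eta, eta2, b1, b2, d400, d040, d004, d220, d202, d022):
    """First-order MSE of the proposed estimator; the d arguments are the starred Delta values."""
    bracket = (b1 ** 2 * d040 + b2 ** 2 * d004 - 4 * b1 * d220
               - 4 * b2 * d202 + 2 * b1 * b2 * d022)
    return Sy2 ** 2 * (eta * d400 + eta2 * bracket / 4)
```

As a sanity check, $b_1 = b_2 = 0$ gives the variance of $s_y^2$, and $b_1 = 1$, $b_2 = 0$ reduces the bracket to $\Delta_{040}^* - 4\Delta_{220}^*$, which reproduces the first-order MSE of the Bahl and Tuteja estimator from Section 3.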

5. Mathematical Comparison

This section compares the suggested estimator $\hat S_T^2$ with the existing estimators $\hat S_y^2$, $\hat S_R^2$, $\hat S_{lr}^2$, $\hat S_{bt}^2$, $\hat S_{us}^2$, and $\hat S_{cki}^2$.
(i) Comparison of the estimators given in Equations (1) and (22):
$$\operatorname{Var}(\hat S_y^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right] < 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* < 0. \tag{23}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* > 0. \tag{24}$$
If Condition (23) or (24) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_y^2$.
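(ii) Comparison of the estimators given in Equations (4) and (22):
$$\operatorname{MSE}(\hat S_R^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[\Delta_{040}^* - 2\Delta_{220}^* - \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right)\right] > 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$\Delta_{040}^* - 2\Delta_{220}^* - \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right) > 0. \tag{25}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$\Delta_{040}^* - 2\Delta_{220}^* - \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right) < 0. \tag{26}$$
If Condition (25) or (26) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_R^2$.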
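(iii) Comparison of the estimators given in Equations (6) and (22):
$$\operatorname{MSE}(\hat S_{lr}^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[\Delta_{400}^*\rho_{yx}^{*2} + \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right)\right] < 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$\Delta_{400}^*\rho_{yx}^{*2} + \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right) < 0. \tag{27}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$\Delta_{400}^*\rho_{yx}^{*2} + \frac{1}{4}\left(b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^*\right) > 0. \tag{28}$$
If Condition (27) or (28) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_{lr}^2$.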
(iv) Comparison of the estimators given in Equations (9) and (22):
$$\operatorname{MSE}(\hat S_{bt}^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(\Delta_{040}^* - 4\Delta_{220}^*\right)\right] < 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(\Delta_{040}^* - 4\Delta_{220}^*\right) < 0. \tag{29}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(\Delta_{040}^* - 4\Delta_{220}^*\right) > 0. \tag{30}$$
If Condition (29) or (30) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_{bt}^2$.
(v) Comparison of the estimators given in Equations (12) and (22):
$$\operatorname{MSE}(\hat S_{us}^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4g^2\Delta_{040}^* - 8g\Delta_{220}^*\right)\right] < 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4g^2\Delta_{040}^* - 8g\Delta_{220}^*\right) < 0. \tag{31}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4g^2\Delta_{040}^* - 8g\Delta_{220}^*\right) > 0. \tag{32}$$
If Condition (31) or (32) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_{us}^2$.
(vi) Comparison of the estimators given in Equations (17) and (22):
$$\operatorname{MSE}(\hat S_{cki}^2) > \operatorname{MSE}(\hat S_T^2) \quad \text{if} \quad (\eta - \eta')\left[b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4t_i^2\Delta_{040}^* - 8t_i\Delta_{220}^*\right)\right] < 0.$$
For $\eta - \eta' > 0$, that is, $\eta > \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4t_i^2\Delta_{040}^* - 8t_i\Delta_{220}^*\right) < 0. \tag{33}$$
Similarly, for $\eta - \eta' < 0$, that is, $\eta < \eta'$,
$$b_1^2\Delta_{040}^* + b_2^2\Delta_{004}^* - 4b_1\Delta_{220}^* - 4b_2\Delta_{202}^* + 2b_1b_2\Delta_{022}^* - \left(4t_i^2\Delta_{040}^* - 8t_i\Delta_{220}^*\right) > 0. \tag{34}$$
If Condition (33) or (34) holds, the suggested estimator $\hat S_T^2$ is more efficient than $\hat S_{cki}^2$.
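Each condition above is equivalent to comparing two first-order MSE expressions, so in practice one can evaluate all of them directly for a given population and design. The following is a minimal sketch, assuming the starred $\Delta$ values, $g$, $t_1, t_2, t_3$, $b_1$, and $b_2$ have already been computed; the function names and dictionary keys are hypothetical, and `eta2` stands for $\eta'' = \eta - \eta'$, which is positive whenever $n < n'$.

```python
def first_order_mses(Sy2, eta, eta2, d, g, t, b1, b2):
    """First-order MSEs of the existing and proposed estimators.

    d maps "400", "040", "004", "220", "202", "022" to starred Delta values;
    t = (t1, t2, t3) for the Kadilar-Cingi estimators.
    """
    S4 = Sy2 ** 2
    rho2 = d["220"] ** 2 / (d["400"] * d["040"])  # (rho*_yx)^2
    base = eta * d["400"]
    out = {
        "usual": S4 * base,
        "ratio": S4 * (base + eta2 * (d["040"] - 2 * d["220"])),
        "regression": S4 * d["400"] * (eta - eta2 * rho2),
        "bahl_tuteja": S4 * (base + eta2 * (d["040"] / 4 - d["220"])),
        "upadhyaya_singh": S4 * (base + eta2 * (g ** 2 * d["040"] - 2 * g * d["220"])),
    }
    for i, ti in enumerate(t, start=1):
        out[f"ck{i}"] = S4 * (base + eta2 * (ti ** 2 * d["040"] - 2 * ti * d["220"]))
    bracket = (b1 ** 2 * d["040"] + b2 ** 2 * d["004"] - 4 * b1 * d["220"]
               - 4 * b2 * d["202"] + 2 * b1 * b2 * d["022"])
    out["proposed"] = S4 * (base + eta2 * bracket / 4)
    return out


def beaten_by_proposed(mses):
    """Existing estimators whose first-order MSE exceeds that of the proposed one."""
    return sorted(k for k, v in mses.items() if k != "proposed" and v > mses["proposed"])
```

The list returned by `beaten_by_proposed` identifies, for the supplied inputs, which of the conditions above are satisfied.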

6. Numerical Comparison

In this section, we assess the performance of the proposed class of estimators relative to existing estimators in terms of percent relative efficiency (PRE), using both simulated data sets and three real data sets.

6.1. Simulation Study

To validate the theoretical results of Section 5, we follow the methodology of [27,29] and conduct a simulation study. The objective is to assess the performance of the proposed class of estimators, which uses the known minimum and maximum values of the auxiliary variable and its ranks, within the two-phase sampling framework. Six populations for the auxiliary variable $X$ are generated from the following probability distributions:
  • Population 1: $X \sim \text{Uniform}(2, 4)$.
  • Population 2: $X \sim \text{Uniform}(0, 1)$.
  • Population 3: $X \sim \text{Gamma}(3, 6)$.
  • Population 4: $X \sim \text{Gamma}(5, 10)$.
  • Population 5: $X \sim \text{Exponential}(4)$.
  • Population 6: $X \sim \text{Exponential}(6)$.
The dependent variable $Y$ is then generated as
$$Y = r_{yx} \times X + e,$$
where $r_{yx} = 0.78$ denotes the correlation coefficient between the dependent and independent variables, and $e \sim N(0, 1)$ is the error term. The value $r_{yx} = 0.78$ was chosen to represent a relatively strong and consistent linear relationship between $x$ and $y$, with relatively low noise, across the data set.
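The population-generation step can be sketched in Python as below. The article does not state how the Gamma and Exponential parameters are parameterized, so this sketch assumes the second Gamma parameter is a rate (as in R's `rgamma(n, shape, rate)` when passed positionally) and that the Exponential parameter is a rate (as in R's `rexp`). Python's `random.gammavariate(alpha, beta)` takes a scale `beta`, so the assumed rate is inverted. The function name and seed handling are hypothetical.

```python
import random


def make_population(population, N=1500, r_yx=0.78, seed=None):
    """Generate (y, x) for Populations 1-6 with Y = r_yx * X + e, e ~ N(0, 1)."""
    rng = random.Random(seed)
    draw_x = {
        1: lambda: rng.uniform(2, 4),
        2: lambda: rng.uniform(0, 1),
        3: lambda: rng.gammavariate(3, 1 / 6),   # assumed Gamma(shape=3, rate=6)
        4: lambda: rng.gammavariate(5, 1 / 10),  # assumed Gamma(shape=5, rate=10)
        5: lambda: rng.expovariate(4),           # assumed Exponential(rate=4)
        6: lambda: rng.expovariate(6),           # assumed Exponential(rate=6)
    }[population]
    x = [draw_x() for _ in range(N)]
    y = [r_yx * xi + rng.gauss(0, 1) for xi in x]
    return y, x
```

Because $\operatorname{Var}(e) = 1$, the correlation actually realized in a generated population is $r_{yx}\sqrt{\operatorname{Var}(X)}\big/\sqrt{r_{yx}^2\operatorname{Var}(X) + 1}$, which depends on $\operatorname{Var}(X)$; for Population 2 ($\operatorname{Var}(X) = 1/12$) it is roughly 0.22. It is therefore worth checking the sample correlation of each generated population.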
To calculate the percent relative efficiencies (PREs), we followed the procedure below in R (version 4.4.0).
Step 1: Generate a population of size N = 1500 from each of the probability distributions above.
Step 2: Select a first-phase sample of size $n'$ from the population of size $N$ using simple random sampling without replacement (SRSWOR).
Step 3: Select a second-phase sample of size $n$ from the first-phase sample, again using SRSWOR.
Step 4: Calculate the population total and the minimum and maximum values of the auxiliary variable from the above steps.
Step 5: For each population, generate samples of different sizes using SRSWOR.
Step 6: For each sample size, compute the PRE values of all the estimators discussed in this article.
Step 7: Repeat Steps 5 and 6 a total of 65,000 times. The results for the artificial populations are reported in Table 2, and those for the real data sets are summarized in Table 6.
Finally, the MSE and PRE of each estimator are computed over all replications as
$$\operatorname{MSE}(\hat S_i^2) = \frac{1}{65{,}000}\sum_{j=1}^{65{,}000}\left(\hat S_{i,j}^2 - S_y^2\right)^2$$
and
$$\mathrm{PRE} = \frac{\operatorname{Var}(\hat S_y^2)}{\operatorname{MSE}(\hat S_i^2)} \times 100,$$
where $\hat S_{i,j}^2$ is the value of estimator $i$ in the $j$th replication and $i$ is one of $R$, $lr$, $bt$, $us$, $ck1$, $ck2$, $ck3$, $Tk$ ($k = 1, 2, \ldots, 8$).
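The simulation loop and the MSE and PRE formulas can be sketched as follows for one population and one design $(n', n)$. This is an illustrative Python sketch, not the authors' R code. It assumes the ranks are $1, \ldots, N$ (so $R_M - R_m = N - 1$), takes $\theta_1 = \theta_2 = 1$, includes only the usual, ratio, and $\hat S_{T2}^2$ estimators for brevity, uses the simulated MSE of $s_y^2$ as the numerator of the PRE, and by default uses far fewer than 65,000 replications to keep the run time small. All names are hypothetical.

```python
import math
import random


def sample_var(v):
    """Variance with divisor (size - 1)."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def simulate_pre(y, x, n_first, n, reps=2000, seed=0):
    """Monte Carlo MSE and PRE under two-phase SRSWOR for s_y^2, the ratio estimator, and T2."""
    rng = random.Random(seed)
    N = len(y)
    # Ranks 1..N of the auxiliary variable (ties broken by position).
    order = sorted(range(N), key=lambda i: x[i])
    r = [0] * N
    for k, i in enumerate(order, start=1):
        r[i] = k
    Sy2 = sample_var(y)
    cx = math.sqrt(sample_var(x)) / (sum(x) / N)
    gamma1, gamma2 = cx, max(x) - min(x)   # subset T2 of Table 1
    gamma4 = max(r) - min(r)               # R_M - R_m
    sq_err = {"usual": 0.0, "ratio": 0.0, "T2": 0.0}
    for _ in range(reps):
        first = rng.sample(range(N), n_first)  # first phase: SRSWOR from the population
        second = rng.sample(first, n)          # second phase: SRSWOR from the first-phase sample
        sy2 = sample_var([y[i] for i in second])
        sx2 = sample_var([x[i] for i in second])
        sx2_f = sample_var([x[i] for i in first])
        sr2 = sample_var([r[i] for i in second])
        sr2_f = sample_var([r[i] for i in first])
        x_part = gamma1 * (sx2_f - sx2) / (gamma1 * (sx2_f + sx2) + 2 * gamma2)
        r_part = (sr2_f - sr2) / ((sr2_f + sr2) + 2 * gamma4)
        estimates = {
            "usual": sy2,
            "ratio": sy2 * sx2_f / sx2,
            "T2": sy2 * math.exp(x_part) * math.exp(r_part),
        }
        for name, value in estimates.items():
            sq_err[name] += (value - Sy2) ** 2
    mse = {name: total / reps for name, total in sq_err.items()}
    pre = {name: 100 * mse["usual"] / value for name, value in mse.items()}
    return mse, pre


if __name__ == "__main__":
    # Hypothetical Population-1-like data generated inline for illustration.
    gen = random.Random(1)
    x = [gen.uniform(2, 4) for _ in range(1500)]
    y = [0.78 * xi + gen.gauss(0, 1) for xi in x]
    mse, pre = simulate_pre(y, x, n_first=300, n=100, reps=1000)
    print(pre)
```

Extending the `estimates` dictionary with the remaining estimators reproduces the full comparison; the PRE of the usual estimator is 100 by construction.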

6.2. Numerical Examples

We compared the percent relative efficiencies (PREs) of the different estimators on three real data sets to assess the performance of the proposed estimators. The data sets are described below, and their summary statistics are given in Table 3, Table 4 and Table 5.
  • Data 1. This data set is taken from the Punjab Development Statistics of the Bureau of Statistics [34], page 226, and covers 33 divisions of Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics web page: https://www.pbs.gov.pk/content/microdata (accessed on 5 August 2024).
    Y: Departmental employment levels in 2012.
    X: Number of factories registered by the departments in 2012.
    R: Ranks of the number of factories registered by the departments in 2012.
  • Data 2. This data set is taken from Cochran [35], page 23, and comprises the food costs and weekly incomes of 33 families.
    Y: Food expenses related to the families’ employment.
    X: Families’ weekly income.
    R: Ranks of the families’ weekly income.
  • Data 3. This data set is also taken from the Punjab Development Statistics of the Bureau of Statistics [34], page 126, and covers 33 divisions of Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics web page: https://www.pbs.gov.pk/content/microdata (accessed on 5 August 2024).
    Y: Total enrollment of students in 2012.
    X: Number of government elementary and secondary schools in 2012.
    R: Ranks of the number of government elementary and secondary schools in 2012.
For the real data sets, the percent relative efficiencies (PREs) are calculated as
$$\mathrm{PRE} = \frac{\operatorname{Var}(\hat S_y^2)}{\operatorname{MSE}(\hat S_l^2)} \times 100,$$
where $l$ is one of $R$, $lr$, $bt$, $us$, $ck1$, $ck2$, $ck3$, $Tk$ ($k = 1, 2, \ldots, 8$).
We used a simulation study and three real data sets to evaluate the performance of the proposed class of estimators, with the PRE as the criterion for comparing estimators. The PRE values of the proposed and existing estimators obtained from the simulation study are given in Table 2, and those for the real data sets are presented in Table 6. The general findings are as follows:
  • For all simulated scenarios and real data sets, Table 2 and Table 6 show that the MSE of each proposed estimator is smaller than those of the existing estimators in the literature, confirming the higher accuracy of the proposed estimators.
  • Furthermore, all of the proposed estimators have higher PRE values than the existing estimators, as shown in Table 2 and Table 6, indicating that the proposed class of estimators is preferable to the existing estimators.

7. Conclusions

To estimate the finite population variance, this study presented a new class of efficient estimators that use the extreme values of the auxiliary variable together with its ranks. Under the conditions derived in Section 5, the proposed class of estimators is more efficient than the existing ones. To investigate these conditions, we conducted a simulation study and examined multiple empirical data sets. The results in Table 2 show that the proposed class of estimators consistently outperforms the existing ones in terms of PRE, and the results in Table 6 provide additional evidence for this conclusion, consistent with the theoretical findings of Section 5. Based on both the simulation and empirical results, we conclude that the proposed estimators $\hat S_{Ti}^2$ ($i = 1, 2, 3, \ldots, 8$) are more efficient than the other estimators considered. Among these, $\hat S_{T2}^2$ is particularly preferred because it attains the smallest MSE.
In this study, the proposed class of estimators was examined within a two-phase sampling framework. New estimators could also be developed under two-phase stratified sampling, and our findings may help identify more efficient estimators with lower MSEs; this is a promising topic for future research.

Author Contributions

Conceptualization, U.D.; software, U.D.; validation, U.D., M.A.A. and O.A.; formal analysis, U.D., M.A.A., O.A. and A.S.A.N.; investigation, U.D.; resources, U.D., M.A.A. and O.A.; data curation, U.D. and O.A.; writing—original draft, U.D.; writing—review and editing, U.D.; visualization, U.D., M.A.A. and O.A.; supervision, U.D. and M.A.A.; project administration, U.D., M.A.A., O.A. and A.S.A.N.; funding acquisition, U.D., M.A.A. and A.S.A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241734].

Data Availability Statement

The real data are secondary, and their sources are given in the data section, while the simulated data have been generated using R software (latest v. 4.4.0).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Neyman, J. Contribution to the theory of sampling human population. J. Am. Stat. Assoc. 1938, 33, 101–116.
  2. Sukhatme, B.V. Some ratio-type estimators in two-phase sampling. J. Am. Stat. Assoc. 1960, 57, 628–632.
  3. Erinola, A.Y.; Singh, R.V.K.; Audu, A.; James, T. Modified class of estimator for finite population mean under two-phase sampling using regression estimation approach. Asian. J. Prob. Stat. 2021, 4, 52–64.
  4. Jabbar, M.; Javid, Z.; Zaheer, A.; Zainab, R. Ratio type exponential estimator for the estimation of finite population variance under two-stage sampling. Res. J. Appl. Sci. Eng. Technol. 2014, 7, 4095–4099.
  5. Qureshi, M.N.; Tariq, M.U.; Hanif, M. Memory-type ratio and product estimators for population variance using exponentially weighted moving averages for time-scaled surveys. Commun. Stat. Simul. Comput. 2024, 53, 1484–1493.
  6. Sanaullah, A.; Hanif, M.; Asghar, A. Generalized exponential estimators for population variance under two-phase sampling. Int. J. Appl. Comput. Math. 2016, 2, 75–84.
  7. Singh, H.P.; Singh, S.; Kim, J.M. Efficient use of auxiliary variables in estimating finite population variance in two-phase sampling. Int. Commun. Stat. Appl. Methods 2010, 17, 165–181.
  8. Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
  9. Vishwakarma, G.K.; Zeeshan, S.M. Generalized ratio-cum-product estimator for finite population mean under two-phase sampling scheme. J. Mod. Appl. Stat. Meth. 2020, 19, 1–16.
  10. Zaman, T.; Kadilar, C. New class of exponential estimators for finite population mean in two-phase sampling. Commun. Stat. Theory Methods 2021, 50, 874–889.
  11. Das, A.K.; Tripathi, T.P. Use of auxiliary information in estimating the finite population variance. Sankhya Indian J. Stat. Ser. C 1978, 40, 39–148.
  12. Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
  13. Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
  14. Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
  15. Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
  16. Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
  17. Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
  18. Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
  19. Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
  20. Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
  21. Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6.
  22. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
  23. Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. 1995, 57, 93–102.
  24. Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
  25. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
  26. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
  27. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
  28. Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
  29. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
  30. Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
  31. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  32. Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
  33. Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
  34. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Lahore, Pakistan, 2013.
35. Cochran, W.G. Sampling Techniques; John Wiley & Sons: Hoboken, NJ, USA, 1963.
Table 1. Some classes of the proposed estimator.
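| Subsets of the proposed estimator $\hat{S}_T^2$ | $\gamma_1$ | $\gamma_2$ |
| --- | --- | --- |
| $\hat{S}_{T1}^2 = s_y^2 \exp\left[\theta_1 \dfrac{\beta_2(x)\,(s_x^{\prime 2} - s_x^2)}{\beta_2(x)\,(s_x^{\prime 2} + s_x^2) + 2(x_M - x_m)}\right] \exp(\theta_2 \delta)$ | $\beta_2(x)$ | $x_M - x_m$ |
| $\hat{S}_{T2}^2 = s_y^2 \exp\left[\theta_1 \dfrac{c_x\,(s_x^{\prime 2} - s_x^2)}{c_x\,(s_x^{\prime 2} + s_x^2) + 2(x_M - x_m)}\right] \exp(\theta_2 \delta)$ | $c_x$ | $x_M - x_m$ |
| $\hat{S}_{T3}^2 = s_y^2 \exp\left[\theta_1 \dfrac{(x_M - x_m)(s_x^{\prime 2} - s_x^2)}{(x_M - x_m)(s_x^{\prime 2} + s_x^2) + 2c_x}\right] \exp(\theta_2 \delta)$ | $x_M - x_m$ | $c_x$ |
| $\hat{S}_{T4}^2 = s_y^2 \exp\left[\theta_1 \dfrac{(x_M - x_m)(s_x^{\prime 2} - s_x^2)}{(x_M - x_m)(s_x^{\prime 2} + s_x^2) - 2c_x}\right] \exp(\theta_2 \delta)$ | $x_M - x_m$ | $c_x$ |
| $\hat{S}_{T5}^2 = s_y^2 \exp\left[\theta_1 \dfrac{(x_M - x_m)(s_x^{\prime 2} - s_x^2)}{(x_M - x_m)(s_x^{\prime 2} + s_x^2) + 2\beta_2(x)}\right] \exp(\theta_2 \delta)$ | $x_M - x_m$ | $\beta_2(x)$ |
| $\hat{S}_{T6}^2 = s_y^2 \exp\left[\theta_1 \dfrac{\beta_2(x)\,(s_x^{\prime 2} - s_x^2)}{\beta_2(x)\,(s_x^{\prime 2} + s_x^2) + 2(x_M - x_m)}\right] \exp(\theta_2 \delta)$ | $\beta_2(x)$ | $x_M - x_m$ |
| $\hat{S}_{T7}^2 = s_y^2 \exp\left[\theta_1 \dfrac{(x_M - x_m)(s_x^{\prime 2} - s_x^2)}{(x_M - x_m)(s_x^{\prime 2} + s_x^2) - 2\beta_2(x)}\right] \exp(\theta_2 \delta)$ | $x_M - x_m$ | $\beta_2(x)$ |
| $\hat{S}_{T8}^2 = s_y^2 \exp\left[\theta_1 \dfrac{c_x\,(s_x^{\prime 2} - s_x^2)}{c_x\,(s_x^{\prime 2} + s_x^2) + 2(x_M - x_m)}\right] \exp(\theta_2 \delta)$ | $c_x$ | $x_M - x_m$ |

where $\delta = \dfrac{s_r^{\prime 2} - s_r^2}{s_r^{\prime 2} + s_r^2 + 2(R_M - R_m)}$.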
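The rows of Table 1 share one pattern: the study-variable variance $s_y^2$ is multiplied by an exponential factor built from the first- and second-phase variances of $x$ with constants $\gamma_1, \gamma_2$, and by $\exp(\theta_2 \delta)$ built from the rank variances. A minimal Python sketch of that pattern follows, as a reading aid rather than the authors' code. It assumes sample variances with the $n - 1$ divisor and treats $\theta_1$, $\theta_2$ as given constants (the article's choice of their values is not reproduced here). The function names, the toy data, and the placeholder values of $\theta_1$, $\theta_2$, $c_x$, $x_M$ and $x_m$ are hypothetical. Rows whose denominator subtracts $2\gamma_2$ instead of adding it correspond to passing $-\gamma_2$.

```python
import math
from statistics import variance


def rank_delta(r_first, r_second, R_max, R_min):
    """delta = (s'_r^2 - s_r^2) / (s'_r^2 + s_r^2 + 2(R_M - R_m))."""
    s2_r1 = variance(r_first)    # first-phase sample variance of ranks, s'_r^2
    s2_r2 = variance(r_second)   # second-phase sample variance of ranks, s_r^2
    return (s2_r1 - s2_r2) / (s2_r1 + s2_r2 + 2 * (R_max - R_min))


def proposed_estimator(y_second, x_first, x_second, r_first, r_second,
                       gamma1, gamma2, theta1, theta2, R_max, R_min):
    """One member of the Table 1 family for given constants gamma1, gamma2."""
    s2_y = variance(y_second)    # s_y^2 from the second-phase sample
    s2_x1 = variance(x_first)    # s'_x^2 from the first-phase sample
    s2_x2 = variance(x_second)   # s_x^2 from the second-phase sample
    aux = gamma1 * (s2_x1 - s2_x2) / (gamma1 * (s2_x1 + s2_x2) + 2 * gamma2)
    delta = rank_delta(r_first, r_second, R_max, R_min)
    return s2_y * math.exp(theta1 * aux) * math.exp(theta2 * delta)


if __name__ == "__main__":
    # Hypothetical toy data from a population of N = 10 units.
    x1 = [12.0, 15.0, 9.0, 20.0, 11.0, 17.0]   # first-phase x values
    r1 = [5, 7, 2, 10, 4, 8]                   # their population ranks (hypothetical)
    idx = [0, 2, 3, 5]                         # second-phase subsample positions
    y2 = [30.1, 22.4, 51.0, 40.2]              # y observed on the subsample only
    x_M, x_m = 22.0, 8.0                       # hypothetical population max/min of x
    est = proposed_estimator(
        y2, x1, [x1[i] for i in idx], r1, [r1[i] for i in idx],
        gamma1=0.35, gamma2=x_M - x_m,         # e.g. gamma1 = c_x (placeholder value)
        theta1=1.0, theta2=1.0,                # placeholder constants
        R_max=10, R_min=1,
    )
    print(est)
```

When the two phases give identical variances (or when $\theta_1 = \theta_2 = 0$), both exponential factors equal 1 and the sketch returns $s_y^2$ unchanged, which is a quick sanity check on any implementation of this family.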
Table 2. Percent relative efficiency (PRE) of the estimators based on artificial populations.
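| Estimator | Uni(2, 4) | Uni(0, 1) | Gam(3, 6) | Gam(5, 10) | Exp(4) | Exp(6) |
| --- | --- | --- | --- | --- | --- | --- |
| $\hat{S}_y^2$ | 100 | 100 | 100 | 100 | 100 | 100 |
| $\hat{S}_R^2$ | 130.789 | 119.760 | 123.025 | 118.247 | 130.126 | 117.587 |
| $\hat{S}_{lr}^2$ | 140.970 | 133.638 | 145.188 | 132.315 | 132.505 | 118.765 |
| $\hat{S}_{tb}^2$ | 146.486 | 137.946 | 138.344 | 134.200 | 131.425 | 116.078 |
| $\hat{S}_{su}^2$ | 153.145 | 136.108 | 142.670 | 136.526 | 139.772 | 119.589 |
| $\hat{S}_{ck1}^2$ | 155.980 | 145.964 | 146.520 | 138.789 | 134.405 | 117.164 |
| $\hat{S}_{ck2}^2$ | 155.112 | 145.123 | 145.345 | 138.002 | 133.408 | 117.664 |
| $\hat{S}_{ck3}^2$ | 154.304 | 144.245 | 143.156 | 137.328 | 131.528 | 117.589 |
| $\hat{S}_{T1}^2$ | 214.668 | 210.356 | 180.712 | 190.139 | 177.547 | 183.329 |
| $\hat{S}_{T2}^2$ | 383.129 | 415.467 | 350.225 | 363.167 | 324.724 | 310.289 |
| $\hat{S}_{T3}^2$ | 289.189 | 286.578 | 300.667 | 289.369 | 250.949 | 244.345 |
| $\hat{S}_{T4}^2$ | 192.707 | 198.689 | 210.576 | 195.508 | 187.333 | 172.148 |
| $\hat{S}_{T5}^2$ | 172.837 | 178.790 | 190.031 | 173.625 | 160.495 | 156.279 |
| $\hat{S}_{T6}^2$ | 220.456 | 232.098 | 230.321 | 212.353 | 202.132 | 192.399 |
| $\hat{S}_{T7}^2$ | 200.065 | 222.987 | 210.401 | 202.323 | 188.369 | 188.950 |
| $\hat{S}_{T8}^2$ | 226.825 | 240.876 | 246.677 | 250.688 | 230.712 | 222.952 |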
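Table 2 fixes the row of $\hat{S}_y^2$ at 100, which is consistent with the usual definition $\mathrm{PRE}(T) = 100 \times \mathrm{MSE}(\hat{S}_y^2) / \mathrm{MSE}(T)$. A minimal Monte Carlo sketch of how such numbers can be produced under two-phase simple random sampling without replacement is given below. Everything about the setup is an assumption for illustration and does not reproduce the article's simulation design: the population model ($x \sim$ Gamma with shape 3 and scale 6, $y = 2x$ plus Gaussian noise), the sizes $N$, $n´$, $n$, the number of replications, and the choice of the ratio-type estimator $s_y^2 s_x^{\prime 2} / s_x^2$ as the comparison. The function names are hypothetical.

```python
import random
from statistics import variance


def mse(estimates, true_value):
    """Monte Carlo mean squared error of a list of estimates."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def pre(mse_baseline, mse_candidate):
    """Percent relative efficiency of a candidate relative to the baseline."""
    return 100.0 * (mse_baseline / mse_candidate)


def simulate(N=500, n_first=100, n_second=30, reps=2000, seed=1):
    rng = random.Random(seed)
    # Hypothetical population: x ~ Gamma(shape=3, scale=6), y linear in x plus noise.
    x = [rng.gammavariate(3, 6) for _ in range(N)]
    y = [2.0 * xi + rng.gauss(0, 5) for xi in x]
    S2_y = variance(y)  # population variance S_y^2 with the N - 1 divisor

    baseline, ratio = [], []
    for _ in range(reps):
        first = rng.sample(range(N), n_first)   # phase 1: SRSWOR of size n'
        second = rng.sample(first, n_second)    # phase 2: SRSWOR subsample of size n
        s2_y = variance([y[i] for i in second])
        s2_x1 = variance([x[i] for i in first])
        s2_x2 = variance([x[i] for i in second])
        baseline.append(s2_y)                   # usual unbiased estimator s_y^2
        ratio.append(s2_y * s2_x1 / s2_x2)      # ratio-type s_y^2 * s'_x^2 / s_x^2
    mse0 = mse(baseline, S2_y)
    return pre(mse0, mse0), pre(mse0, mse(ratio, S2_y))


if __name__ == "__main__":
    base_pre, ratio_pre = simulate()
    print(f"PRE(s_y^2) = {base_pre:.1f}, PRE(ratio-type) = {ratio_pre:.1f}")
```

Any estimator from Table 1 can be added to the loop in the same way, provided its constants ($\gamma_1$, $\gamma_2$, $\theta_1$, $\theta_2$, and the rank extremes) are computed from the simulated population.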
Table 3. Summary statistics for Data 1.
$N = 36$, $n' = 15$, $n = 6$, $\bar{X} = 1659.58$
$\bar{Y} = 4015.22$, $\bar{R} = 18.50$, $X_M = 2379$, $X_m = 49$
$R_M = 36$, $R_m = 1$, $S_x = 3410$, $S_y = 8512$
$S_r = 10.53$, $C_x = 2.06$, $C_y = 2.12$, $C_r = 0.57$
$\rho_{yx} = 0.65$, $\rho_{yr} = 0.36$, $\rho_{xr} = 0.75$, $\Delta_{400} = 2051$
$\Delta_{040} = 2237$, $\Delta_{004} = 3297$, $\Delta_{220} = 1542$, $\Delta_{202} = 2698$
$\Delta_{022} = 3698$, $\eta = 0.14$, $\eta = 0.04$, $\eta = 0.10$
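The rank entries in the summary tables follow directly from $N$ when the auxiliary values have no ties: the ranks are then exactly $1, \ldots, N$, so $\bar{R} = (N+1)/2$, $R_M = N$, $R_m = 1$ and $S_r^2 = N(N+1)/12$ (with the $N - 1$ divisor). The three $\eta$ entries are not defined in the table; they agree to two decimals with the factors $1/n - 1/N$, $1/n´ - 1/N$ and $1/n - 1/n´$, but that reading is an assumption. A minimal sketch (function names hypothetical):

```python
import math


def rank_summary(N):
    """Mean, SD (N - 1 divisor), CV, max and min of the ranks 1, ..., N (no ties)."""
    mean = (N + 1) / 2
    sd = math.sqrt(N * (N + 1) / 12)
    return {"R_bar": mean, "S_r": sd, "C_r": sd / mean, "R_M": N, "R_m": 1}


def sampling_fractions(N, n_first, n_second):
    """Assumed reading of the three eta entries: 1/n - 1/N, 1/n' - 1/N, 1/n - 1/n'."""
    return (1 / n_second - 1 / N,
            1 / n_first - 1 / N,
            1 / n_second - 1 / n_first)


if __name__ == "__main__":
    stats = rank_summary(36)
    print({k: round(v, 2) for k, v in stats.items()})
    print([round(e, 2) for e in sampling_fractions(36, 15, 6)])
```

For $N = 36$, $n´ = 15$ and $n = 6$ this gives $\bar{R} = 18.5$, $S_r \approx 10.54$, $C_r \approx 0.57$ and factors 0.14, 0.04 and 0.10, in close agreement with the Data 1 entries above.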
Table 4. Summary statistics for Data 2.
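$N = 33$, $n' = 15$, $n = 6$, $\bar{X} = 72.55$
$\bar{Y} = 27.49$, $\bar{R} = 17$, $X_M = 95$, $X_m = 58$
$R_M = 33$, $R_m = 1$, $S_x = 10.58$, $S_y = 10.13$
$S_r = 9.64$, $C_x = 0.14$, $C_y = 0.37$, $C_r = 0.57$
$\rho_{yx} = 0.25$, $\rho_{yr} = 0.20$, $\rho_{xr} = 0.98$, $\Delta_{400} = 5.55$
$\Delta_{040} = 2.08$, $\Delta_{004} = 1.10$, $\Delta_{220} = 2.22$, $\Delta_{202} = 0.94$
$\Delta_{022} = 1.54$, $\eta = 0.14$, $\eta = 0.04$, $\eta = 0.10$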
Table 5. Summary statistics for Data 3.
$N = 36$, $n' = 15$, $n = 6$, $\bar{X} = 1054.39$
$\bar{Y} = 14818.70$, $\bar{R} = 18.50$, $X_M = 2370$, $X_m = 39$
$R_M = 36$, $R_m = 1$, $S_x = 2214.22$, $S_y = 32601.14$
$S_r = 10.544$, $C_x = 2.10$, $C_y = 22.07$, $C_r = 0.56$
$\rho_{yx} = 0.69$, $\rho_{yr} = 0.39$, $\rho_{xr} = 0.84$, $\Delta_{400} = 236.50$
$\Delta_{040} = 209.80$, $\Delta_{004} = 329.70$, $\Delta_{220} = 397.50$, $\Delta_{202} = 365.80$
$\Delta_{022} = 469.80$, $\eta = 0.14$, $\eta = 0.04$, $\eta = 0.10$
Table 6. Percent relative efficiency using empirical data sets.
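| Estimator | Data 1 | Data 2 | Data 3 |
| --- | --- | --- | --- |
| $\hat{S}_y^2$ | 100 | 100 | 100 |
| $\hat{S}_R^2$ | 141.830 | 143.664 | 121.469 |
| $\hat{S}_{lr}^2$ | 143.225 | 104.673 | 125.532 |
| $\hat{S}_{tb}^2$ | 152.034 | 128.010 | 115.634 |
| $\hat{S}_{su}^2$ | 141.858 | 143.509 | 121.380 |
| $\hat{S}_{ck1}^2$ | 141.839 | 143.655 | 121.460 |
| $\hat{S}_{ck2}^2$ | 141.839 | 143.660 | 121.460 |
| $\hat{S}_{ck3}^2$ | 141.849 | 142.044 | 121.469 |
| $\hat{S}_{T1}^2$ | 147.360 | 154.388 | 133.017 |
| $\hat{S}_{T2}^2$ | 276.108 | 254.005 | 290.116 |
| $\hat{S}_{T3}^2$ | 235.023 | 220.660 | 212.978 |
| $\hat{S}_{T4}^2$ | 162.723 | 179.478 | 186.215 |
| $\hat{S}_{T5}^2$ | 152.411 | 160.308 | 170.165 |
| $\hat{S}_{T6}^2$ | 212.907 | 192.115 | 208.864 |
| $\hat{S}_{T7}^2$ | 188.606 | 181.529 | 198.220 |
| $\hat{S}_{T8}^2$ | 226.560 | 196.409 | 263.810 |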
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New Techniques for Estimating Finite Population Variance Using Ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741. https://doi.org/10.3390/math12172741