Article

Optimizing Variance Estimation in Stratified Random Sampling through a Log-Type Estimator for Finite Populations

by Gullinkala Ramya Venkata Triveni 1, Faizan Danish 1,* and Olayan Albalawi 2

1 Department of Mathematics, School of Advanced Sciences, VIT-AP University, Inavolu, Beside AP Secretariat, Amaravati AP-522237, India
2 Department of Statistics, Faculty of Science, University of Tabuk, Tabuk 47713, Saudi Arabia
* Author to whom correspondence should be addressed.
Symmetry 2024, 16(5), 540; https://doi.org/10.3390/sym16050540
Submission received: 16 March 2024 / Revised: 13 April 2024 / Accepted: 17 April 2024 / Published: 1 May 2024
(This article belongs to the Section Mathematics)

Abstract

In this research, a logarithmic-type estimator was formulated for estimating the finite population variance in stratified random sampling. By ensuring that the sampling process is symmetrically conducted across the population, biases can be minimized, and the sample is more likely to be representative of the population as a whole. We conducted a comprehensive numerical study and simulation study to evaluate the performance of the proposed estimator. The mean squared error values were computed for both our proposed estimator and several existing ones, including the standard unbiased variance estimator, difference-type estimator, and other considered estimators. The results of the numerical study and simulation study demonstrated that the proposed log-type estimator outperforms the other considered estimators in terms of MSE and percentage relative efficiency. Graphical representations of the results are also provided to illustrate the efficiency of the proposed estimator. Based on the findings of this study, we conclude that the proposed log-type estimator is a valuable addition to the existing literature on variance estimation in stratified random sampling. It provides a more efficient and accurate estimate of the population variance, which can be beneficial for various statistical applications.

1. Introduction

In survey sampling, it is critical to ensure accurate and exact estimations of population parameters. When creating strata in stratified sampling, symmetry can be applied to ensure that each stratum is internally homogeneous and balanced. This involves dividing the population into groups that exhibit similar characteristics, creating symmetric groupings. For example, if you’re stratifying by income levels, you might aim to create strata with similar income distributions within each group, thereby achieving symmetry.
This study explores the intricate realm of variance estimation in stratified random sampling (STRS), a technique often used to improve survey efficiency by splitting the population into distinct strata. Understanding and resolving the sources of variation within and between strata is crucial for creating accurate estimates. This work also emphasizes the importance of log-type estimators in the context of variance estimation. In the STRS paradigm, the use of log-type estimators can play a critical role in contributing to more robust and accurate variance estimations.
The role of stratified sampling in estimating population variance is discussed by [1]. Later, ref. [2] presented more precise variance estimators that leverage auxiliary information to reduce bias and improve estimation compared to existing approaches, thereby supporting the many sectors that rely on correct variance estimation. In simple random sampling (SRS), refs. [3,4] suggested variance estimators. They compared the proficiency of the estimator to the traditional ratio estimator and the estimator of ref. [2], and theoretical and numerical investigations were used to demonstrate the effectiveness of the suggested estimator. Later, ref. [5] extended their research to variance-type ratio estimators in both SRS and STRS, demonstrating the efficacy of the proposed estimator. Further, ref. [6] established a new ratio-type exponential estimator in SRS that is superior to the classic ratio and regression estimators and to the estimators of refs. [2,5]. Later, ref. [7] introduced unbiased estimators for population variance using equilibrated stratification and obtained lower variances. Further, ref. [8] suggested exponential ratio- and product-type estimators using bivariate data of auxiliary variables and illustrated the efficiency of these estimators through empirical research.
Further, ref. [9] introduced a category of exponential estimators, demonstrating their effectiveness over other methods in terms of bias and MSE using the provided dataset. Novel estimators using known population parameters to estimate variance were introduced by [10], who compared them to established estimators and showed their superiority under optimal conditions through bias and MSE analysis. An empirical study validates the proposed estimators' effectiveness. Refs. [11,12] introduced a category of estimators and proved its effectiveness over others by utilizing four datasets. Additionally, by analyzing large sample properties, they demonstrated their superior efficiency over various existing estimators by employing a numerical study. By utilizing bivariate auxiliary information to estimate population variance, ref. [13] proposed a novel generalized exponential estimator. This analysis showed its enhanced efficiency compared to existing estimators through empirical and simulated studies. For estimating population variance, ref. [14] suggested a log-type estimator. For population variance, ref. [15] introduced an innovative set of exponential ratio estimators within the context of STRS, demonstrating equal optimal efficiency with regression estimators and outperforming classical ratio estimators by using analytical and numerical results. Later, ref. [16] offered a few estimators for finite population variance, and ref. [17] proposed a new class of estimators using ranks in STRS for finite population variance, outperforming conventional estimators in efficiency in an empirical evaluation with real data. Further, ref. [18] introduced innovative variance estimators using the ln-function in STRS, outperforming conventional estimators. The separate method showcases superior efficiency, validated by MSE derivation, numerical examples, and simulations. Ref. [19] proposed a variance estimator using the L-moments approach under double stratified sampling.
Later, ref. [20] recommended generalized variance estimators using single and double auxiliary variables and proved their efficiency over others by employing empirical and simulation studies. Further, refs. [21,22,23] proposed various variance estimators. Ref. [24] proposed hybrid estimators in SRS. Theoretical comparisons and empirical evidence showcase their enhanced efficiency over other estimators. Further, ref. [25] introduced an improved variance estimator and demonstrated its efficiency relative to others by using three datasets. Ref. [26] proposed an advanced variance estimator and showed its superiority by utilizing numerical and simulation studies with real datasets. Further, ref. [27] proposed a nonparametric maximum likelihood estimator (MLE), developed using the EM algorithm and a likelihood based on order statistics, which outperforms the other considered estimators. Later, ref. [28] proposed an exponential ratio-cum-product estimator for the estimation of population variance in SRS. Empirical validation confirms the theoretical findings and assists data practitioners. Further, ref. [29] introduced finite population variance estimation in the presence of random nonresponse via SRS for applied and environmental sciences and proved its effectiveness over others. Refs. [30,31] explored innovative approaches for variance estimation in sampling methodologies, particularly focusing on L-moments and calibration techniques. Their work contributed to refining variance estimation methods, especially in the context of stratified and double stratified random sampling methods, with practical applications including analyses related to the COVID-19 pandemic. A ratio-type estimator was proposed by [32], and ref. [33] suggested an estimator in conditional and unconditional post-stratification.
Expanding on the contributions of [31], future directions may involve refining variance estimation methods through a deeper exploration of calibration approaches and the integration of L-moments in diverse sampling frameworks. Additionally, there is potential for investigating the robustness and scalability of these methods across various domains, with a focus on enhancing their applicability in real-world data analysis contexts beyond epidemiological studies. Furthermore, efforts to streamline implementation and improve computational efficiency could enhance the practical utility of these variance estimators in large-scale surveys and monitoring programs.
The existing literature lacks a comprehensive exploration of variance estimation through log-type estimators. In this study, our objective is to introduce a log-type estimator tailored for estimating population variance within the framework of stratified random sampling. We develop a logarithmic estimator for population variance, detailed in Section 4. Through a comparative analysis with established methods outlined in the current literature, and considering the conditions delineated in Section 5, we derive valuable insights. Empirical findings presented in Section 6, along with simulation investigations, corroborate the superior efficiency of our proposed estimator over alternative approaches.

2. Notations

Consider a finite population $\chi = \{\chi_1, \chi_2, \ldots, \chi_N\}$ comprising $N$ units distributed across $l$ strata. Let $z_{1hi}$ and $z_{2hi}$ denote the values of the study variable ($z_1$) and the auxiliary variable ($z_2$), respectively, for unit $i$ in stratum $h$, where the stratum sizes satisfy $\sum_{h=1}^{l} N_h = N$. A sample of $n_h$ units is drawn from the $N_h$ units in each stratum, such that $\sum_{h=1}^{l} n_h = n$.
Let $s_{z_1 h}^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (z_{1hi} - \bar{z}_{1h})^2$ and $s_{z_2 h}^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (z_{2hi} - \bar{z}_{2h})^2$ denote the sample variances corresponding to the population variances $S_{z_1 h}^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (z_{1hi} - \bar{Z}_{1h})^2$ and $S_{z_2 h}^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (z_{2hi} - \bar{Z}_{2h})^2$. Here, $\bar{z}_{1h}$ and $\bar{z}_{2h}$ are the sample means corresponding to the population means $\bar{Z}_{1h}$ and $\bar{Z}_{2h}$.
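We define the following error terms to obtain the bias and MSE expressions of the variance estimators:
$$\zeta_{0h} = \frac{s_{z_1 h}^2 - S_{z_1 h}^2}{S_{z_1 h}^2} \quad \text{and} \quad \zeta_{1h} = \frac{s_{z_2 h}^2 - S_{z_2 h}^2}{S_{z_2 h}^2},$$
such that
$$E(\zeta_{0h}) = E(\zeta_{1h}) = 0,$$
$$E(\zeta_{0h}^2) = \gamma_h \psi^*_{40h} = \vartheta_{0h} \ (\text{say}),$$
$$E(\zeta_{1h}^2) = \gamma_h \psi^*_{04h} = \vartheta_{1h} \ (\text{say}),$$
$$E(\zeta_{0h} \zeta_{1h}) = \gamma_h \psi^*_{22h} = \vartheta_{01h} \ (\text{say}),$$
where
$$\psi^*_{40h} = (\psi_{40h} - 1), \quad \psi^*_{04h} = (\psi_{04h} - 1), \quad \psi^*_{22h} = (\psi_{22h} - 1),$$
$$\psi_{abh} = \frac{\xi_{abh}}{\xi_{20h}^{a/2} \, \xi_{02h}^{b/2}}, \quad \xi_{abh} = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (z_{1hi} - \bar{Z}_{1h})^a (z_{2hi} - \bar{Z}_{2h})^b, \quad \gamma_h = \frac{1}{n_h} - \frac{1}{N_h}.$$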
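The notation above can be made concrete with a small computation. The following is a minimal Python sketch, not the authors' code (the article reports using R), that evaluates $\xi_{abh}$, $\psi_{abh}$, $\gamma_h$ and the three $\vartheta$ terms for a single stratum. The function name `stratum_moments` and the dictionary keys are hypothetical, and the marked quantities are taken to be $\psi_{abh} - 1$ as defined in this section.

```python
def stratum_moments(z1, z2, n_h):
    """Return gamma_h, the population variances and the theta terms for one stratum.

    z1, z2 : full population values of the study and auxiliary variables
    n_h    : sample size drawn from this stratum
    """
    N_h = len(z1)
    mean1 = sum(z1) / N_h
    mean2 = sum(z2) / N_h

    def xi(a, b):
        # xi_{abh} = 1/(N_h - 1) * sum (z1 - mean1)^a (z2 - mean2)^b
        return sum((u - mean1) ** a * (v - mean2) ** b
                   for u, v in zip(z1, z2)) / (N_h - 1)

    def psi(a, b):
        # psi_{abh} = xi_{abh} / (xi_{20h}^{a/2} * xi_{02h}^{b/2})
        return xi(a, b) / (xi(2, 0) ** (a / 2) * xi(0, 2) ** (b / 2))

    gamma = 1 / n_h - 1 / N_h
    return {
        "gamma": gamma,
        "S2_z1": xi(2, 0),                      # population variance of z1
        "S2_z2": xi(0, 2),                      # population variance of z2
        "theta0": gamma * (psi(4, 0) - 1),      # E(zeta_0h^2)
        "theta1": gamma * (psi(0, 4) - 1),      # E(zeta_1h^2)
        "theta01": gamma * (psi(2, 2) - 1),     # E(zeta_0h * zeta_1h)
    }


# Toy stratum in which z2 is an exact multiple of z1 (illustrative values only).
summary = stratum_moments([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], n_h=2)
```

In this toy stratum the two variables are perfectly linearly related, so the three $\vartheta$ terms coincide.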

3. Review of the Literature

The literature contains various variance estimators for STRS, together with their variance or MSE formulae. We compare these estimators with the proposed estimator and identify the conditions under which the proposed estimator is more efficient.
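1. The unbiased variance estimator is
$$t_{1(st)} = \sum_{h=1}^{l} W_h^2 \gamma_h s_{z_1 h}^2 .$$
The variance of $t_{1(st)}$ is given by
$$\mathrm{Var}\left(t_{1(st)}\right) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \psi^*_{40h}, \tag{1}$$
where $\psi^*_{40h} = \psi_{40h} - 1$, as in Section 2.
2. The usual difference-type estimator is
$$t_2 = \sum_{h=1}^{l} W_h^2 \gamma_h \left[ s_{z_1 h}^2 + \theta_h \left( S_{z_2 h}^2 - s_{z_2 h}^2 \right) \right],$$
where $\theta_h$ is an unknown constant. Its optimum value is $\theta_h = \dfrac{S_{z_1 h}^2 \psi^*_{22h}}{S_{z_2 h}^2 \psi^*_{04h}}$.
The minimum variance of $t_2$, attained at the optimum value of $\theta_h$, is
$$\mathrm{Var}(t_2) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \psi^*_{40h} \left( 1 - r_h^2 \right), \tag{2}$$
where $r_h = \dfrac{\psi^*_{22h}}{\sqrt{\psi^*_{40h} \psi^*_{04h}}}$.
3. The unbiased estimator of the population variance provided by [3] is
$$t_3 = \sum_{h=1}^{l} W_h^2 \gamma_h \left[ s_{z_1 h}^2 - \frac{s_{z_2 h}^2}{S_{z_2 h}^2} + 1 \right],$$
and its variance is given by
$$\mathrm{MSE}(t_3) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} + \frac{\psi^*_{04h}}{S_{z_1 h}^4} - \frac{2 \psi^*_{22h}}{S_{z_1 h}^2} \right]. \tag{3}$$
4. We transformed the estimator of [10] to STRS as
$$t_4 = \sum_{h=1}^{l} W_h^2 \gamma_h s_{z_1 h}^2 \left[ \pi \frac{S_{z_2 h}^2}{s_{z_2 h}^2} + (1 - \pi) \frac{s_{z_2 h}^2}{S_{z_2 h}^2} \right],$$
where $\pi$ is a suitable constant.
The minimum MSE of $t_4$ at the optimum value of $\pi$ is
$$\mathrm{MSE}_{min}(t_4) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} - \frac{(\psi^*_{22h})^2}{\psi^*_{04h}} \right]. \tag{4}$$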
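5. Ref. [16] suggested the estimator
$$t_5 = \sum_{h=1}^{l} W_h^2 \gamma_h s_{z_1 h}^2 \, \omega_h \frac{S_{z_2 h}^2}{s_{z_2 h}^2},$$
where $\omega_h = \dfrac{1 + \psi^*_{04h} - \psi^*_{22h}}{1 + 3\psi^*_{04h} - 4\psi^*_{22h} + \psi^*_{40h}}$.
Using the optimum value of $\omega_h$, we obtain the MSE as
$$\mathrm{MSE}_{min}(t_5) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ 1 - \frac{\left( 1 + \psi^*_{04h} - \psi^*_{22h} \right)^2}{1 + \gamma_h \left( 3\psi^*_{04h} - 4\psi^*_{22h} + \psi^*_{40h} \right)} \right]. \tag{5}$$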
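6. The generalized exponential ratio-cum-product estimator suggested by [28] is
$$t_6 = \sum_{h=1}^{l} W_h^2 \gamma_h s_{z_1 h}^2 \exp\left[ \frac{\left( S_{z_2 h}^2 / s_{z_2 h}^2 \right)^{\tau} - 1}{\left( S_{z_2 h}^2 / s_{z_2 h}^2 \right)^{\tau} + 1} \right],$$
where $\tau$ is a suitable value chosen to minimize the MSE of $t_6$, giving
$$\mathrm{MSE}_{min}(t_6) = \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} - \frac{(\psi^*_{22h})^2}{\psi^*_{04h}} \right]. \tag{6}$$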
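The variance and minimum-MSE expressions for $t_{1(st)}$, $t_2$ and $t_4$ can be evaluated directly from per-stratum summaries. The sketch below is illustrative only: the stratum values are hypothetical, the keys `psi40`, `psi04` and `psi22` stand for the marked quantities $\psi_{abh} - 1$ of Section 2, and $W_h$ is assumed to be the stratum weight $N_h/N$, which the text does not define explicitly. It also makes visible that the $t_2$ and $t_4$ expressions coincide algebraically, since $\psi_{40}(1 - r^2) = \psi_{40} - \psi_{22}^2/\psi_{04}$.

```python
from math import isclose, sqrt


def _base(s):
    # Common factor W_h^4 * gamma_h^3 * S_{z1h}^4 (S2_z1 holds S_{z1h}^2).
    return s["W"] ** 4 * s["gamma"] ** 3 * s["S2_z1"] ** 2


def var_t1(strata):
    """Variance of the usual unbiased estimator t_1(st)."""
    return sum(_base(s) * s["psi40"] for s in strata)


def var_t2(strata):
    """Minimum variance of the difference-type estimator t_2."""
    total = 0.0
    for s in strata:
        r = s["psi22"] / sqrt(s["psi40"] * s["psi04"])
        total += _base(s) * s["psi40"] * (1 - r ** 2)
    return total


def mse_min_t4(strata):
    """Minimum MSE of the ratio-cum-product type estimator t_4."""
    return sum(_base(s) * (s["psi40"] - s["psi22"] ** 2 / s["psi04"])
               for s in strata)


# Hypothetical stratum summaries, for illustration only.
strata = [
    {"W": 0.6, "gamma": 0.04, "S2_z1": 25.0,
     "psi40": 2.1, "psi04": 1.8, "psi22": 1.5},
    {"W": 0.4, "gamma": 0.06, "S2_z1": 9.0,
     "psi40": 3.0, "psi04": 2.5, "psi22": 2.0},
]

same_minimum = isclose(var_t2(strata), mse_min_t4(strata), rel_tol=1e-12)
```

Because the two minima are identical, any efficiency comparison against $t_2$ under these formulae carries over unchanged to $t_4$ (and to $t_6$, whose minimum MSE has the same form).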

4. Proposed Estimator

To improve variance estimation, we propose a log-type estimator that combines difference-type and ratio-type components:
$$t_{prop} = \sum_{h=1}^{l} W_h^2 \gamma_h \left[ R_1 s_{z_1 h}^2 + R_2 \left( S_{z_2 h}^2 - s_{z_2 h}^2 \right) \right] \log\left( \frac{S_{z_2 h}^2}{s_{z_2 h}^2} + \alpha \right),$$
where $\alpha$ is a constant. Writing $s_{z_1 h}^2 = S_{z_1 h}^2 (1 + e_{0h})$ and $s_{z_2 h}^2 = S_{z_2 h}^2 (1 + e_{1h})$, where $e_{0h}$ and $e_{1h}$ are the error terms $\zeta_{0h}$ and $\zeta_{1h}$ of Section 2, and expanding the logarithm up to the second order in $e_{1h}$, we have
$$\begin{aligned}
t_{prop} &= \sum_{h=1}^{l} W_h^2 \gamma_h \left[ R_1 S_{z_1 h}^2 (1 + e_{0h}) + R_2 \left( S_{z_2 h}^2 - S_{z_2 h}^2 (1 + e_{1h}) \right) \right] \log\left( \frac{S_{z_2 h}^2}{S_{z_2 h}^2 (1 + e_{1h})} + \alpha \right) \\
&= \sum_{h=1}^{l} W_h^2 \gamma_h \left[ R_1 S_{z_1 h}^2 + R_1 S_{z_1 h}^2 e_{0h} - R_2 S_{z_2 h}^2 e_{1h} \right] \left[ \log(\alpha + 1) - \frac{1}{\alpha + 1} e_{1h} + \frac{2\alpha + 1}{2 (\alpha + 1)^2} e_{1h}^2 \right].
\end{aligned}$$
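The second-order expansion of the logarithmic factor used in this step, $\log\left(\frac{1}{1+e_{1h}} + \alpha\right) \approx \log(\alpha+1) - \frac{e_{1h}}{\alpha+1} + \frac{2\alpha+1}{2(\alpha+1)^2} e_{1h}^2$, can be checked numerically. The following Python sketch compares the exact factor with the truncated series for a small relative error; the function names and the values of `e1` and `alpha` are arbitrary choices made for illustration.

```python
import math


def log_factor_exact(e1, alpha):
    # log(S^2 / s^2 + alpha) with s^2 = S^2 * (1 + e1)
    return math.log(1.0 / (1.0 + e1) + alpha)


def log_factor_series(e1, alpha):
    # Second-order Taylor expansion around e1 = 0
    a = alpha + 1.0
    return math.log(a) - e1 / a + (2.0 * alpha + 1.0) / (2.0 * a ** 2) * e1 ** 2


# The truncation error is of third order in e1, so it shrinks quickly
# as the relative error of the sample variance decreases.
gap = abs(log_factor_exact(0.01, 1.0) - log_factor_series(0.01, 1.0))
```

The approximation is only as good as the assumption that $|e_{1h}|$ is small, which is the usual large-sample condition behind first- and second-order MSE expressions of this kind.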
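Subtracting $\sum_{h=1}^{l} W_h^2 \gamma_h S_{z_1 h}^2$ from both sides, we have
$$\begin{aligned}
t_{prop} - \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_1 h}^2 ={}& \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_1 h}^2 \Big\{ R_1 \log(\alpha + 1) - 1 + R_1 \log(\alpha + 1) \, e_{0h} - \frac{R_1}{\alpha + 1} e_{1h} \\
& \quad + \frac{(2\alpha + 1) R_1}{2 (\alpha + 1)^2} e_{1h}^2 - \frac{R_1}{\alpha + 1} e_{0h} e_{1h} \Big\} \\
& + \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_2 h}^2 \left[ \frac{R_2}{\alpha + 1} e_{1h}^2 - R_2 \log(\alpha + 1) \, e_{1h} \right].
\end{aligned}$$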
Taking expectations on both sides of this deviation expression yields the bias:
$$\mathrm{Bias}(t_{prop}) = \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_1 h}^2 \left[ R_1 \log(\alpha + 1) - 1 + \frac{(2\alpha + 1) R_1}{2 (\alpha + 1)^2} \vartheta_{1h} - \frac{R_1}{\alpha + 1} \vartheta_{01h} \right] + \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_2 h}^2 \frac{R_2}{\alpha + 1} \vartheta_{1h} .$$
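Squaring both sides of the deviation expression, retaining terms up to the second order, and taking expectations, the mean squared error (MSE) is derived as
$$\begin{aligned}
E\left( t_{prop} - \sum_{h=1}^{l} W_h^2 \gamma_h S_{z_1 h}^2 \right)^2 ={}& \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \Big\{ \left( R_1 \log(\alpha + 1) - 1 \right)^2 + \left( R_1 \log(\alpha + 1) \right)^2 \vartheta_{0h} \\
& \quad + \left[ \frac{R_1^2}{(\alpha + 1)^2} + \frac{R_1^2 \log(\alpha + 1) (2\alpha + 1)}{(\alpha + 1)^2} - \frac{2\alpha + 1}{(\alpha + 1)^2} \right] \vartheta_{1h} - \left[ \frac{4 R_1^2 \log(\alpha + 1)}{\alpha + 1} - \frac{2 R_1}{\alpha + 1} \right] \vartheta_{01h} \Big\} \\
& + \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_2 h}^4 \left( R_2 \log(\alpha + 1) \right)^2 \vartheta_{1h} \\
& + \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \left[ \frac{4 R_1 R_2}{\alpha + 1} \log(\alpha + 1) \, \vartheta_{1h} - 2 R_1 R_2 \left( \log(\alpha + 1) \right)^2 \vartheta_{01h} - \frac{2 R_2}{\alpha + 1} \vartheta_{1h} \right],
\end{aligned}$$
that is,
$$\begin{aligned}
\mathrm{MSE}(t_{prop}) ={}& \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \Big\{ 1 - \frac{2\alpha + 1}{(\alpha + 1)^2} \vartheta_{1h} + R_1 \left[ \frac{2 \vartheta_{01h}}{\alpha + 1} - 2 \log(\alpha + 1) \right] \\
& \quad + R_1^2 \left( \left( \log(\alpha + 1) \right)^2 (1 + \vartheta_{0h}) + \vartheta_{1h} \left[ \frac{1}{(\alpha + 1)^2} + \frac{(2\alpha + 1) \log(\alpha + 1)}{(\alpha + 1)^2} \right] - \vartheta_{01h} \frac{4 \log(\alpha + 1)}{\alpha + 1} \right) \Big\} \\
& + \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_2 h}^4 R_2^2 \left( \log(\alpha + 1) \right)^2 \vartheta_{1h} \\
& + \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \left[ - R_2 \frac{2}{\alpha + 1} \vartheta_{1h} + R_1 R_2 \left( \frac{4 \log(\alpha + 1)}{\alpha + 1} \vartheta_{1h} - 2 \left( \log(\alpha + 1) \right)^2 \vartheta_{01h} \right) \right].
\end{aligned} \tag{9}$$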
Differentiating Equation (9) with respect to $R_1$ and $R_2$ and equating the derivatives to zero, we obtain
R 1 = s 2 s 1
where
s 1 = 4 h = 1 l W h 4 γ h 2 S z 1 h 4 h = 1 l W h 4 γ h 2 S z 2 h 4 ϑ 1 h log α + 1 2 [ log α + 1 2 1 + ϑ 0 h + ϑ 1 h α + 1 2 + 2 α + 1 log α + 1 α + 1 2 ϑ 1 h ϑ 01 h 4 log α + 1 α + 1 ] + h = 1 l W h 4 γ h 2 S z 2 h 2 S z 2 h 2 2 ( 2 ϑ 01 h log α + 1 2 4 log α + 1 α + 1 ϑ 1 h ) 2 ϑ 1 h α + 1 4 log α + 1 α + 1 ϑ 1 h + 2 ( log α + 1 ) 2 ϑ 01 h
s 2 = 4 h = 1 l W h 4 γ h 2 S z 1 h 4 h = 1 l W h 4 γ h 2 S z 2 h 4 log α + 1 ϑ 01 h α + 1 ϑ 1 h log α + 1 2 + h = 1 l W h 4 γ h 2 S z 2 h 2 S z 2 h 2 2 2 ϑ 01 h log α + 1 2 4 log α + 1 α + 1 ϑ 1 h 2 ϑ 1 h α + 1
a n d   R 2 = ( h = 1 l W h 4 γ h 2 S z 2 h 2 S z 2 h 2 ) 2 α + 1 ϑ 1 h R 1 4 log α + 1 α + 1 ϑ 1 h 2 ϑ 01 h ( log α + 1 ) 2 2 ϑ 1 h h = 1 l W h 4 γ h 2 S z 2 h 4 ( log α + 1 ) 2
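By substituting the values of $R_1$ and $R_2$ into Equation (9), we obtain the minimum MSE as
$$\begin{aligned}
\mathrm{MSE}_{min}(t_{prop}) ={}& \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left[ 1 - \frac{2\alpha + 1}{(\alpha + 1)^2} \vartheta_{1h} \right] + R_1 \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left[ \frac{2 \vartheta_{01h}}{\alpha + 1} - 2 \log(\alpha + 1) \right] \\
& + R_2 \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \left( - \frac{2}{\alpha + 1} \vartheta_{1h} \right) \\
& + R_1^2 \left\{ \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left( \left( \log(\alpha + 1) \right)^2 (1 + \vartheta_{0h}) + \vartheta_{1h} \left[ \frac{1}{(\alpha + 1)^2} + \frac{(2\alpha + 1) \log(\alpha + 1)}{(\alpha + 1)^2} \right] - \vartheta_{01h} \frac{4 \log(\alpha + 1)}{\alpha + 1} \right) \right\} \\
& + R_2^2 \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_2 h}^4 \left( \log(\alpha + 1) \right)^2 \vartheta_{1h} \\
& + R_1 R_2 \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \left[ \frac{4 \log(\alpha + 1)}{\alpha + 1} \vartheta_{1h} - 2 \left( \log(\alpha + 1) \right)^2 \vartheta_{01h} \right],
\end{aligned}$$
which can be written compactly as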
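$$\mathrm{MSE}_{min}(t_{prop}) = const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 \tag{10}$$
where
$$const = \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left[ 1 - \frac{2\alpha + 1}{(\alpha + 1)^2} \vartheta_{1h} \right],$$
$$\eta_1 = \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left[ \frac{2 \vartheta_{01h}}{\alpha + 1} - 2 \log(\alpha + 1) \right],$$
$$\eta_2 = - \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \frac{2}{\alpha + 1} \vartheta_{1h},$$
$$\eta_3 = \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^4 \left( \left( \log(\alpha + 1) \right)^2 (1 + \vartheta_{0h}) + \vartheta_{1h} \left[ \frac{1}{(\alpha + 1)^2} + \frac{(2\alpha + 1) \log(\alpha + 1)}{(\alpha + 1)^2} \right] - \vartheta_{01h} \frac{4 \log(\alpha + 1)}{\alpha + 1} \right),$$
$$\eta_4 = \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_2 h}^4 \left( \log(\alpha + 1) \right)^2 \vartheta_{1h},$$
$$\eta_5 = \sum_{h=1}^{l} W_h^4 \gamma_h^2 S_{z_1 h}^2 S_{z_2 h}^2 \left[ \frac{4 \log(\alpha + 1)}{\alpha + 1} \vartheta_{1h} - 2 \left( \log(\alpha + 1) \right)^2 \vartheta_{01h} \right].$$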
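Since the compact form is a quadratic in $R_1$ and $R_2$, its stationary point follows from the two linear conditions $\eta_1 + 2 R_1 \eta_3 + R_2 \eta_5 = 0$ and $\eta_2 + 2 R_2 \eta_4 + R_1 \eta_5 = 0$, whose solution is $R_1 = (\eta_2 \eta_5 - 2 \eta_1 \eta_4)/(4 \eta_3 \eta_4 - \eta_5^2)$ and $R_2 = (\eta_1 \eta_5 - 2 \eta_2 \eta_3)/(4 \eta_3 \eta_4 - \eta_5^2)$. The Python sketch below solves these conditions for given $\eta$ values. It is a generic solution of the quadratic form rather than a transcription of the article's $s_1$ and $s_2$ expressions; the function names and the numerical $\eta$ values are hypothetical.

```python
def optimal_constants(eta1, eta2, eta3, eta4, eta5):
    """Solve the two stationarity conditions of the quadratic MSE for (R1, R2)."""
    det = 4.0 * eta3 * eta4 - eta5 ** 2
    if det == 0:
        raise ValueError("singular system: 4*eta3*eta4 equals eta5**2")
    r1 = (eta2 * eta5 - 2.0 * eta1 * eta4) / det
    r2 = (eta1 * eta5 - 2.0 * eta2 * eta3) / det
    return r1, r2


def mse_quadratic(const, eta1, eta2, eta3, eta4, eta5, r1, r2):
    """Evaluate const + R1*eta1 + R2*eta2 + R1^2*eta3 + R2^2*eta4 + R1*R2*eta5."""
    return (const + r1 * eta1 + r2 * eta2
            + r1 ** 2 * eta3 + r2 ** 2 * eta4 + r1 * r2 * eta5)


# Hypothetical eta values, for illustration only.
etas = (-2.0, -1.0, 1.0, 1.0, 0.5)
r1_opt, r2_opt = optimal_constants(*etas)
mse_at_opt = mse_quadratic(3.0, *etas, r1_opt, r2_opt)
```

The stationary point is a minimum when $\eta_3 > 0$ and $4 \eta_3 \eta_4 - \eta_5^2 > 0$, which is worth checking before using the resulting constants.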

5. Comparison of Efficiency

In this research, we theoretically specified numerous conditions for comparing the proposed estimators to a variety of traditional and existing estimators used in this context. This comparison analysis provides insights into why the proposed estimators outperform others, particularly with regard to MSE and percentage relative efficiency (PRE).
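From (1) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{Var}\left(t_{1(st)}\right) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \psi^*_{40h} \le 0. \tag{11}$$
From (2) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{Var}(t_2) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \psi^*_{40h} \left( 1 - r_h^2 \right) \le 0. \tag{12}$$
From (3) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{MSE}(t_3) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} + \frac{\psi^*_{04h}}{S_{z_1 h}^4} - \frac{2 \psi^*_{22h}}{S_{z_1 h}^2} \right] \le 0. \tag{13}$$
From (4) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{MSE}_{min}(t_4) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} - \frac{(\psi^*_{22h})^2}{\psi^*_{04h}} \right] \le 0. \tag{14}$$
From (5) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{MSE}_{min}(t_5) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ 1 - \frac{\left( 1 + \psi^*_{04h} - \psi^*_{22h} \right)^2}{1 + \gamma_h \left( 3\psi^*_{04h} - 4\psi^*_{22h} + \psi^*_{40h} \right)} \right] \le 0. \tag{15}$$
From (6) and (10), we obtain
$$\mathrm{MSE}_{min}(t_{prop}) - \mathrm{MSE}_{min}(t_6) \le 0,$$
$$const + R_1 \eta_1 + R_2 \eta_2 + R_1^2 \eta_3 + R_2^2 \eta_4 + R_1 R_2 \eta_5 - \sum_{h=1}^{l} W_h^4 \gamma_h^3 S_{z_1 h}^4 \left[ \psi^*_{40h} - \frac{(\psi^*_{22h})^2}{\psi^*_{04h}} \right] \le 0. \tag{16}$$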

5.1. Quantitative Assessment

Population-I: We used the data from [5]. The data are about the information on apple production amounts (considered as the primary variable of interest) and the count of apple trees (regarded as an auxiliary variable) originating from the dataset encompassing 854 villages across Turkey in the year 1999, sourced from the Institute of Statistics, Republic of Turkey. Initially, the data were stratified based on the distinct regions within Turkey. Symmetry can also be applied in determining the allocation of sample units to each stratum. Symmetric allocation ensures that each stratum receives a fair representation in the sample relative to its size and variability. This can involve proportional allocation based on the size of each stratum or optimal allocation methods that consider both stratum size and variability.
Following this stratification, a random sampling approach was employed to select villages from each region using Neyman allocation to determine sample sizes per stratum (region). Specifically, a predetermined sample size of n = 140 was utilized. Subsequently, after analyzing the outcomes of the sample sizes for individual regions, a decision was made to merge the two regions. Consequently, the data were organized into six strata, designated as follows: (1) Marmara, (2) Aegean, (3) Mediterranean, (4) Central Anatolia, (5) Black Sea, and (6) East and Southeast Anatolia.
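Neyman allocation, as used here to fix the stratum sample sizes, assigns $n_h$ in proportion to $N_h S_h$, that is, $n_h = n \, N_h S_h / \sum_{h} N_h S_h$. The following Python sketch shows the computation under that standard definition. The stratum sizes and standard deviations are hypothetical placeholders, not the values of the apple dataset, and the function name `neyman_allocation` is chosen for illustration.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Return (unrounded) Neyman sample sizes n_h = n * N_h*S_h / sum(N_h*S_h)."""
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical strata: sizes N_h and standard deviations S_h of the study variable.
sizes = [100, 200, 300]
sds = [6.0, 3.0, 1.0]
allocation = neyman_allocation(sizes, sds, n=140)
```

The returned values are generally fractional; in practice they are rounded to integers, and strata that would receive very few units are sometimes merged, as was done for two of the regions here.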
The theoretical conditions outlined in Equations (11)–(16) are not only theoretically sound but were also validated numerically. Employing the data statistics provided in Table 1, we calculated the MSE values for the estimators, as detailed in Table 2. The results reveal that the proposed estimator exhibits a lower MSE value, coupled with a significantly higher PRE value. This indicates that, among the estimators considered in this study, the proposed estimator boasts the highest PRE, underscoring its superior performance.

5.2. Simulation Analysis

A simulation study was performed in R to compare the performance of the proposed and considered estimators using two populations.
(a)
Population-II: We subdivided a population of size N = 1500 into four subpopulations (strata) of varying sizes. We conducted 10,000 iterations to obtain stable results. The models are as follows:
$$z_{21} = \mathrm{rnorm}(N_1, 8, 3), \quad z_{22} = \mathrm{rnorm}(N_2, 5, 2),$$
$$z_{23} = \mathrm{rnorm}(N_3, 4, 1), \quad \text{and} \quad z_{24} = \mathrm{rnorm}(N_4, 3, 1);$$
$$z_{11} = 2 + 2.5 z_{21} + e_1, \quad z_{12} = 3 - 5 z_{22} + e_2,$$
$$z_{13} = 2 - 7.2 z_{23} + e_3 \quad \text{and} \quad z_{14} = 1 + 5.5 z_{24} + e_4;$$
(b)
Population-III: We divided a population of 2000 units into four strata and applied optimum allocation to obtain the stratum sample sizes. We conducted 20,000 iterations to obtain the MSE values of the estimators. The models considered in this population are as follows:
$$z_{21} = \mathrm{rnorm}(N_1, 8, 4), \quad z_{22} = \mathrm{rnorm}(N_2, 6, 3),$$
$$z_{23} = \mathrm{rnorm}(N_3, 5, 1), \quad \text{and} \quad z_{24} = \mathrm{rnorm}(N_4, 9, 7);$$
$$z_{11} = 2 + 3 z_{21} + e_1, \quad z_{12} = 3\;5\;6\;2 + e_2,$$
$$z_{13} = 2 - 8 z_{23} + e_3 \quad \text{and} \quad z_{14} = 1 + 6 z_{24} + e_4;$$
$$\text{Error terms } e_i = \mathrm{rnorm}(N_i, 0, 1); \quad i = 1, 2, 3, 4.$$
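The data-generating models for Population-II can be sketched in a few lines. The article's simulation was run in R, where `rnorm(n, mean, sd)` draws normal variates; the Python sketch below uses `random.gauss(mu, sigma)` as the counterpart. The stratum sizes in `SIZES` are hypothetical (the text only states that they sum to N = 1500), the slopes of the second and third strata are taken as negative, which is an assumption wherever the printed sign is unclear, and the function name `make_population` is illustrative.

```python
import random


def make_population(sizes, aux_params, line_coefs, seed=1):
    """Generate one stratified population as a list of (z1, z2) pairs of lists.

    sizes      : stratum sizes N_h
    aux_params : (mean, sd) of the auxiliary variable z2 in each stratum
    line_coefs : (intercept, slope) linking z1 to z2 in each stratum
    """
    rng = random.Random(seed)
    strata = []
    for N_h, (mu, sd), (a, b) in zip(sizes, aux_params, line_coefs):
        z2 = [rng.gauss(mu, sd) for _ in range(N_h)]
        # Study variable: linear in z2 plus a standard normal error term.
        z1 = [a + b * x + rng.gauss(0.0, 1.0) for x in z2]
        strata.append((z1, z2))
    return strata


SIZES = (500, 400, 350, 250)                 # hypothetical split of N = 1500
AUX_PARAMS = ((8, 3), (5, 2), (4, 1), (3, 1))
LINE_COEFS = ((2, 2.5), (3, -5), (2, -7.2), (1, 5.5))

population = make_population(SIZES, AUX_PARAMS, LINE_COEFS)
```

A full replication study would repeatedly draw stratified samples from such a population, evaluate each estimator on every sample, and average the squared errors over the iterations.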
The PRE of the estimators was determined using
$$\mathrm{PRE}(t_r, t_1) = \frac{\mathrm{MSE}(t_1)}{\mathrm{MSE}(t_r)} \times 100,$$
where r = 1, 2, 3, 4, 5, 6, prop.
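As a small illustration of the PRE formula, the following Python sketch computes the PRE of each estimator relative to $t_1$ from a table of MSE values. The MSE numbers and the function name `percent_relative_efficiency` are hypothetical and are not the values reported in the article's tables.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE(t_r, t_1) = MSE(t_1) / MSE(t_r) * 100."""
    return mse_reference / mse_estimator * 100.0


# Hypothetical MSE values keyed by estimator label, for illustration only.
mse_values = {"t1": 4.0, "t2": 2.5, "tprop": 2.0}
pre_values = {
    name: percent_relative_efficiency(mse_values["t1"], mse)
    for name, mse in mse_values.items()
}
```

By construction the reference estimator has a PRE of 100, and any estimator with a smaller MSE than $t_1$ has a PRE above 100.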

5.3. Discussion of Results

This article introduced a logarithmic-type estimator specifically developed for estimating the finite population variance of a study variable. This estimator leverages the information from an auxiliary variable to enhance the precision of the variance estimation. This study emphasizes the importance of variance estimation to enhance the reliability of survey outcomes. We suggested an estimator and derived its bias and MSE equations, and we also considered the existing variance estimators from the literature and derived their MSE equations. When the considered estimator’s efficiency was compared with the proposed estimator’s efficiency, we obtained the theoretical conditions from (11) to (16).
Using the real dataset of Population-I, we assessed the performance of all the estimators under consideration, including the proposed estimator, through their MSE and PRE values. From Table 2, we can observe that the MSE of the proposed estimator is lower than that of the other estimators. Its PRE is also the highest, which indicates the usefulness of the proposed estimator.
Moreover, the simulation studies support the effectiveness of the proposed estimator. Table 3 reveals that the proposed estimator demonstrates superior efficiency in comparison to the existing methods. Here, we considered two populations, generated the data from normal distributions, and performed a simulation. In Populations II and III, we performed 10,000 and 20,000 replications, respectively. After the replications, we obtained the data statistics as averages over the replications. Then, we computed the MSE and PRE values for all the considered and suggested estimators.
In the graphical representation plotted in Figure 1, the red, blue and green colors indicate the PRE values of Population-I, Population-II, and Population-III respectively. We considered estimator 7 as the proposed estimator in this study. We can observe that the proposed estimator’s PRE values are high compared to the others in all three populations. Our suggested log-type estimator’s superior performance highlights its potential as a valuable tool in variance estimation, providing a more efficient and reliable alternative to the existing approaches.
These results contribute to the ongoing discourse on refining statistical methodologies for survey research, providing a robust alternative for enhancing the precision of survey outcomes. As we navigate the implications of our findings, the proposed estimator stands as a promising avenue for further exploration and potential adoption in diverse sampling contexts, signaling a positive step forward in the evolution of variance estimation techniques.

6. Conclusions

In this study, we formulated a logarithmic-type estimator for finite population variance estimation and we conducted an in-depth analysis of its effectiveness using a real-world dataset. Our investigation delved into a meticulous comparison of our proposed estimator against the established methods, aiming to evaluate its performance comprehensively. The computation of MSE values served as a pivotal metric in assessing the efficiency of the proposed estimator and several existing ones.
The comparison set included well-known estimators such as the standard unbiased variance, difference type, and those proposed by [3,10,16,28]. In both numerical and simulation studies, we examined our proposed estimator’s performance across three distinct populations to understand its characteristics. From Table 2 and Table 3, it is clear that the proposed estimator performed better. From the graphical representation (Figure 1), we can also conclude that the proposed estimator achieved the greatest efficiency in comparison to the other considered estimators.
Upon a thorough examination and interpretation of the results, our findings unequivocally indicate the superior performance of the proposed logarithmic-type estimator. The proposed estimator consistently received favorable assessment metrics in both MSE and PRE values, suggesting its heightened accuracy and efficiency compared to the considered alternatives.

Author Contributions

Conceptualization, G.R.V.T. and F.D.; methodology, F.D., G.R.V.T. and O.A.; software, G.R.V.T.; validation, G.R.V.T. and F.D.; formal analysis, G.R.V.T.; investigation, F.D.; resources, F.D., G.R.V.T. and O.A.; data curation, G.R.V.T.; writing—original draft preparation, G.R.V.T. and F.D.; writing—review and editing, F.D. and O.A.; visualization, G.R.V.T. and F.D.; supervision, F.D.; project administration, F.D. and O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We highly appreciate the efforts of the reviewers and the assigned editor along with Assistant Editor, for making improvements to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wakimoto, K. Stratified random sampling (1) estimation of the population variance. Ann. Inst. Stat. Math. 1971, 23, 233–252. [Google Scholar] [CrossRef]
  2. Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123. [Google Scholar] [CrossRef]
  3. Prasad, B.; Singh, H.P. Unbiased estimators of finite population variance using auxiliary information in sample surveys. Commun. Stat. Theory Methods 1992, 21, 1367–1376. [Google Scholar] [CrossRef]
  4. Kadilar, C.; Cingi, H. Improvement in variance estimation using auxiliary information. Hacet. J. Math. Stat. 2006, 35, 111–115. [Google Scholar]
  5. Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059. [Google Scholar] [CrossRef]
  6. Shabbir, J.; Gupta, S.A.T. On improvement in variance estimation using auxiliary information. Commun. Stat. -Theory Methods 2007, 36, 2177–2185. [Google Scholar] [CrossRef]
  7. Espejo, M.R.; Singh, H.P.; Pineda, M.D.; Nadarajah, S. Optimal estimation of population variance using equilibrated stratified sampling from infinite populations. J. Korean Stat. Soc. 2008, 37, 375–383. [Google Scholar] [CrossRef]
  8. Singh, R.; Chauhan, P.; Sawan, N.; Smarandache, F. Improvement in estimating the population mean using exponential estimator in simple random sampling. Int. J. Stat. Econ. 2009, 3, 13–18. [Google Scholar]
  9. Koyuncu, N. Improved Estimators of Finite Population Variance in Stratified Random Sampling. World Appl. Sci. J. 2013, 23, 130–137. [Google Scholar]
  10. Yadav, S.K.; Kadilar, C. A class of ratio-cum-dual to ratio estimator of population variance. J. Reliab. Stat. Stud. 2013, 6, 29–34. [Google Scholar]
  11. Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226. [Google Scholar] [CrossRef]
  12. Yadav, S.K.; Mishra, S.S.; Kumar, S.; Kadilar, C. A new improved class of estimators for the population variance. J. Stat. Appl. Probab. 2016, 5, 385–392. [Google Scholar] [CrossRef]
  13. Sanaullah, A.; Asghar, A.; Hanif, M. General class of exponential estimator for estimating finite population variance. J. Reliab. Stat. Stud. 2017, 10, 1–16. [Google Scholar]
  14. Bhushan, S.; Kumari, C. A new log type estimator for estimating the population variance. Int. J. Comp. App. Math. 2018, 13, 43–54. [Google Scholar]
  15. Etebong, P.C. Improved family of ratio estimators of finite population variance in stratified random sampling. Biostat. Biom. Open Access J. 2018, 5, 55659. [Google Scholar] [CrossRef]
  16. Muili, J.O.; Singh, R.V.K.; Audu, A. Study of Efficiency of Some Finite Population Variance Estimators in Stratified Random Sampling. Cont. J. Appl. Sci. 2018, 13, 1–17. [Google Scholar]
  17. Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 172–181. [Google Scholar] [CrossRef]
  18. Cekim, H.O.; Kadilar, C. In-type estimators for the population variance in stratified random sampling. Commun. Stat.-Simul. Comput. 2020, 49, 1665–1677. [Google Scholar] [CrossRef]
  19. Shahzad, U.; Ahmad, I.; Almanjahie, I.M.; Al-Noor, N.H. L-Moments Based Calibrated Variance Estimators Using Double Stratified Sampling. Comput. Mater. Contin. 2021, 68, 3412–3430.
  20. Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
  21. Ahmad, S.; Hussain, S.; Shabbir, J.; Zahid, E.; Aamir, M.; Onyango, R. Improved estimation of finite population variance using dual supplementary information under stratified random sampling. Math. Probl. Eng. 2022, 2022, 3813952.
  22. Aloraini, B.; Khalil, S.; Qureshi, M.N.; Gupta, S. Estimation of Population Variance for a Sensitive Variable in Stratified Sampling Using Randomized Response Technique: Accepted: June 2022. REVSTAT-Stat. J. 2022. Available online: https://revstat.ine.pt/index.php/REVSTAT/article/view/508 (accessed on 2 March 2024).
  23. Niaz, I.; Sanaullah, A.; Saleem, I.; Shabbir, J. An improved efficient class of estimators for the population variance. Concurr. Comput. Pract. Exp. 2022, 34, e6620.
  24. Sanaullah, A.; Niaz, I.; Shabbir, J.; Ehsan, I. A class of hybrid type estimators for variance of a finite population in simple random sampling. Commun. Stat. Simul. Comput. 2022, 51, 5609–5619.
  25. Ahmad, S.; Adichwal, N.K.; Aamir, M.; Shabbir, J.; Alsadat, N.; Elgarhy, M.; Ahmad, H. An enhanced estimator of finite population variance using two auxiliary variables under simple random sampling. Sci. Rep. 2023, 13, 21444.
  26. Ahmad, S.; Al Mutairi, A.; Nassr, S.G.; Alsuhabi, H.; Kamal, M.; Rehman, M.U. A new approach for estimating variance of a population employing information obtained from a stratified random sampling. Heliyon 2023, 9, e21477.
  27. Frey, J.; Zhang, Y. Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling. J. Korean Stat. Soc. 2023, 52, 901–920.
  28. Jan, R.; Jan, T.R.; Danish, F. Generalised Exponential Ratio-Cum-Product Estimator for Estimating Population Variance in Simple Random Sampling. Reliab. Theory Appl. 2023, 18, 625–631.
  29. Javed, S.; Masood, S.; Shokri, A. Generalized Class of Finite Population Variance in the Presence of Random Nonresponse Using Simulation Approach. Complexity 2023, 2023, 6643435.
  30. Shahzad, U.; Ahmad, I.; Almanjahie, I.M.; Al-Noor, N.H.; Hanif, M. A novel family of variance estimators based on L-moments and calibration approach under stratified random sampling. Commun. Stat. Simul. Comput. 2023, 52, 3782–3795.
  31. Shahzad, U.; Ahmad, I.; Mufrah Almanjahie, I.; Hanif, M.; Al-Noor, N.H. L-Moments and calibration-based variance estimators under double stratified random sampling scheme: Application of Covid-19 pandemic. Sci. Iran. 2023, 30, 814–821.
  32. Triveni, G.R.V.; Danish, F. Heuristical Approach for Optimizing Population Mean Using Ratio Estimator in Stratified Random Sampling. J. Reliab. Stat. Stud. 2023, 16, 137–152.
  33. Triveni, G.R.V.; Danish, F. Leveraging Auxiliary Variables: Advancing Mean Estimation Through Conditional and Unconditional Post-Stratification. Reliab. Theory Appl. 2023, 18, 57–68.
Figure 1. Graphical representation of PRE values of the three populations.
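Table 1. Data statistics.

| Population value | Stratum quantity | h = 1 | h = 2 | h = 3 | h = 4 | h = 5 | h = 6 |
|---|---|---|---|---|---|---|---|
| $N = 854$ | $N_h$ | 106 | 106 | 94 | 171 | 204 | 173 |
| $n = 140$ | $n_h$ | 9 | 17 | 38 | 67 | 7 | 2 |
| $\bar{Z}_1 = 29.30$ | $\bar{Z}_{1h}$ | 15.37 | 22.13 | 93.84 | 55.88 | 9.67 | 4.04 |
| $\bar{Z}_2 = 376.00$ | $\bar{Z}_{2h}$ | 243.76 | 274.22 | 724.10 | 743.65 | 264.42 | 98.44 |
| $\beta_{2 z_1} = 195.84$ | $\beta_{2 z_1 h}$ | 80.13 | 97.68 | 24.14 | 101.97 | 53.38 | 27.96 |
| $\beta_{2 z_2} = 312.07$ | $\beta_{2 z_2 h}$ | 25.71 | 34.57 | 26.14 | 97.60 | 27.47 | 28.10 |
| $S_{z_1} = 171.06$ | $S_{z_1 h}$ | 64.25 | 115.52 | 299.07 | 286.43 | 23.90 | 9.46 |
| $S_{z_2} = 1447.94$ | $S_{z_2 h}$ | 491.89 | 574.61 | 1607.57 | 2856.03 | 454.03 | 187.94 |
| | $\theta_h$ | 33.35 | 7.42 | 0.89 | 9.53 | 21.09 | 23.08 |
| | $W_h$ | 0.12 | 0.12 | 0.11 | 0.20 | 0.24 | 0.20 |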
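As a quick consistency check on Table 1, the stratum-level rows can be recombined into the population-level values in the first column. The sketch below assumes the usual stratified-sampling definition of the stratum weight, $W_h = N_h / N$, and that the overall means are the weight-averaged stratum means. The variable names are illustrative and do not come from the article.

```python
# Stratum-level values transcribed from Table 1 (six strata).
N_h = [106, 106, 94, 171, 204, 173]   # stratum sizes
n_h = [9, 17, 38, 67, 7, 2]           # stratum sample sizes
z1_bar_h = [15.37, 22.13, 93.84, 55.88, 9.67, 4.04]
z2_bar_h = [243.76, 274.22, 724.10, 743.65, 264.42, 98.44]

N = sum(N_h)  # population size
n = sum(n_h)  # total sample size

# Stratum weights, assuming the usual definition W_h = N_h / N.
W_h = [size / N for size in N_h]


def weighted_mean(weights, stratum_means):
    """Overall mean as the weight-averaged stratum means."""
    return sum(w * m for w, m in zip(weights, stratum_means))


z1_bar = weighted_mean(W_h, z1_bar_h)
z2_bar = weighted_mean(W_h, z2_bar_h)

print(N, n)
print([round(w, 2) for w in W_h])
print(round(z1_bar, 2), round(z2_bar, 2))
```

Under these assumptions the recomputed totals, weights, and overall means agree with the tabulated values to two decimal places.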
Table 2. MSE and PRE values of the considered and proposed estimators for Population-I.

| Estimator | Theoretical Conditions | MSE | PRE |
|---|---|---|---|
| $\mathrm{Var}(t_{1st})$ | $MSE_{min}(t_{prop}) - \mathrm{Var}(t_{1st}) < 0$ | 1940.6 | 100 |
| $\mathrm{Var}(t_2)$ | $MSE_{min}(t_{prop}) - \mathrm{Var}(t_2) < 0$ | 1940.6 | 100 |
| $MSE(t_3)$ | $MSE_{min}(t_{prop}) - MSE(t_3) < 0$ | 1940.8 | 99.99 |
| $MSE_{min}(t_4)$ | $MSE_{min}(t_{prop}) - MSE(t_4) < 0$ | 1939.8 | 100.04 |
| $MSE_{min}(t_5)$ | $MSE_{min}(t_{prop}) - MSE_{min}(t_5) < 0$ | 299.2 | 648.60 |
| $MSE_{min}(t_6)$ | $MSE_{min}(t_{prop}) - MSE_{min}(t_6) < 0$ | 1939.8 | 100.04 |
| $MSE_{min}(t_{prop})$ | $MSE_{min}(t_{prop}) - MSE_{min}(t_i) < 0,\ i = 1, \ldots, 6$ | 184.8 | 1050.11 |
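The PRE column of Table 2 can be recomputed from the MSE column. The sketch below assumes that PRE is measured against the first estimator, $\mathrm{Var}(t_{1st})$, whose tabulated PRE is 100, so that PRE $= 100 \times \mathrm{Var}(t_{1st}) / MSE(t)$. The dictionary keys and the `pre` helper are illustrative names, not notation from the article.

```python
# MSE values transcribed from Table 2 (Population-I).
mse = {
    "t1_st": 1940.6,   # baseline: its tabulated PRE is 100
    "t2": 1940.6,
    "t3": 1940.8,
    "t4": 1939.8,
    "t5": 299.2,
    "t6": 1939.8,
    "t_prop": 184.8,
}


def pre(baseline_mse, estimator_mse):
    """Percentage relative efficiency against the baseline (assumed definition)."""
    return 100.0 * baseline_mse / estimator_mse


# PRE of every estimator relative to the baseline, rounded as in the table.
pre_values = {
    name: round(pre(mse["t1_st"], value), 2) for name, value in mse.items()
}

for name, value in pre_values.items():
    print(f"{name:>7}: {value:.2f}")
```

A PRE above 100 means the estimator has a smaller MSE than the baseline. With the rounded MSE values above, the proposed estimator comes out roughly ten times as efficient as the baseline on Population-I.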
Table 3. Comparisons between the proposed estimator and other considered estimators through simulation.

| Estimator | MSE (Population-II) | PRE (Population-II) | MSE (Population-III) | PRE (Population-III) |
|---|---|---|---|---|
| $\mathrm{Var}(t_{1st})$ | $2.11432 \times 10^{-6}$ | 100.00 | 0.07 | 100.00 |
| $\mathrm{Var}(t_2)$ | $7.43164 \times 10^{-6}$ | 28.45 | 0.24 | 27.88 |
| $MSE(t_3)$ | $2.04132 \times 10^{-6}$ | 103.58 | 0.06 | 103.21 |
| $MSE_{min}(t_4)$ | $2.15414 \times 10^{-6}$ | 98.15 | 0.11 | 58.62 |
| $MSE_{min}(t_5)$ | $1.28051 \times 10^{-6}$ | 165.12 | 0.05 | 125.22 |
| $MSE_{min}(t_6)$ | $2.15414 \times 10^{-6}$ | 98.15 | 0.11 | 58.62 |
| $MSE_{min}(t_{prop})$ | $9.04361 \times 10^{-7}$ | 233.79 | 0.02 | 369.10 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

