Advancing Survey Sampling Efficiency under Stratified Random Sampling and Post-Stratification: Leveraging Symmetry for Enhanced Estimation Accuracy in the Prediction of Exam Scores

Triveni, Gullinkala Ramya Venkata; Danish, Faizan; Albalawi, Olayan

doi:10.3390/sym16050604


by Gullinkala Ramya Venkata Triveni ¹, Faizan Danish ¹,* and Olayan Albalawi ²

¹ Department of Mathematics, School of Advanced Sciences, VIT-AP University, Inavolu, Beside AP Secretariat, Amaravati 522237, Andhra Pradesh, India

² Department of Statistics, Faculty of Science, University of Tabuk, Tabuk 47713, Saudi Arabia

* Author to whom correspondence should be addressed.

Symmetry 2024, 16(5), 604; https://doi.org/10.3390/sym16050604

Submission received: 27 March 2024 / Revised: 4 May 2024 / Accepted: 6 May 2024 / Published: 13 May 2024

(This article belongs to the Section Mathematics)


Abstract:

This pioneering investigation introduces two innovative estimators crafted to evaluate the finite population distribution function of a study variable, employing auxiliary variables within the framework of stratified random sampling and post-stratification while emphasizing symmetry in the sampling process. The derivation of mathematical expressions for bias and the mean square error up to the first degree of approximation fortifies the credibility of the proposed estimators. Drawing from three distinct datasets, including real-world data capturing student behaviors and exam performances from 500 students, this research highlights the superior efficiency of the proposed estimators compared to existing methods across both sampling schemes. Employing the proposed estimator, we effectively forecast students’ exam scores based on their study hours, backed by empirical evidence showcasing its precision in terms of mean square error and percentage relative efficiency. This study not only introduces inventive solutions to enduring challenges in survey sampling but also provides practical insights into enhancing predictive accuracy in educational assessments.

Keywords:

cumulative distribution function; stratified sampling; post-stratification; percentage relative efficiency; symmetry

MSC:

62D05; 94A20

1. Introduction

In the pursuit of estimating population parameters, auxiliary variables are crucial tools. These variables, which are different but intricately linked to the variable of interest, offer a dependable method for improving the consistency and validity of statistical estimations. Based on survey sampling theory, the importance of auxiliary variables becomes clear, especially in guaranteeing symmetry in the sampling process. Given the impracticality or difficulty of collecting comprehensive data from the entire population in survey research, researchers use sampling—an intentional selection of a smaller subset—to collect data, aiming for symmetry in representation. The goal is to extrapolate findings from this sample to the entire population, using auxiliary variables that promote symmetry in the estimate. Using auxiliary variables in the sample and estimating process appears to be a powerful method for boosting estimation accuracy while maintaining symmetry in the representation of population features. Previous research has clarified a variety of population parameters, including mean, median, total, distribution function, etc., each of which requires supportive variable data in addition to the variable of interest, adding to the symmetrical representation of population traits. In stratified sampling, a distribution function estimator is useful for estimating the cumulative distribution function (CDF) within each stratum. These estimates are then combined to generate a comprehensive estimate for the entire population, ensuring symmetry throughout the estimating process.

Over the years, numerous scholars have examined different facets of estimators in stratified random sampling (St RS) and refined the associated methodology. The author of [1] addressed the credibility of the approximate formula for computing variance, while [2] discussed estimation methods and the dual property of ratio estimation. Subsequently, ref. [3] focused on techniques in post-stratification, and ref. [4] demonstrated methods to improve ratio and regression estimators. Ref. [5] explored the characteristics of estimators for finite population distribution functions, ref. [6] proposed a Bayesian model-based theory for post-stratification, and [7] presented calibration estimators using auxiliary data. Refs. [8,9] contributed estimators for distribution functions in post-stratification.

Later, ref. [10] extended estimators with readily accessible supporting variables. An efficient ratio estimator for stratified sampling was introduced by [11]. Further, ref. [12] presented a family of estimators for the population mean, which was validated empirically. Later, ref. [13] proposed superior exponential ratio estimators. Further, ref. [14] derived a diligent ratio and product estimator, which outperformed others. Thereafter, ref. [15] devised exponential ratio estimators based on supporting variables. Additionally, ref. [16] suggested reliable ratio and difference estimators for population distributions. Researchers have made significant advancements in sampling estimation. Later, ref. [17] improved exponential estimators for post-stratification. Ref. [18] introduced superior estimators for SRS and stratified sampling with two supporting variables. Following that, ref. [19] enhanced the difference cum exponential estimator. Thereafter, ref. [20] proposed efficient estimators. Ref. [21] suggested a ratio estimator for post-stratification. Subsequently, ref. [22] improved the generalized population mean estimator, while [23] developed estimators for the finite population mean in SRS. Later, ref. [24] presented a two-parameter ratio product ratio estimator. Afterward, ref. [25] suggested an innovative family of exponential estimators using supporting attributes and actual data sets.

In recent times, several authors have focused on distribution function estimators using supporting variables. Ref. [26] proposed finite population distribution function estimators that outperformed others in simple random sampling (SRS) and stratified sampling. Following that, ref. [27] introduced imputation methods for calculating the population mean in two-occasion successive sampling. Ref. [28] recommended exponential-type estimators for the finite population mean and demonstrated their superiority with four data sets. Thereafter, ref. [29] devised estimators for the population mean, including efficient combined and separate estimators in stratified sampling. Additionally, ref. [30] proposed a ratio estimator and demonstrated its effectiveness via empirical and simulation studies. Afterward, ref. [31] developed an estimator of the population distribution function and demonstrated its efficiency in a simulation study. Further, ref. [32] discussed the efficiency of the ratio estimator in stratified sampling and supported it with empirical studies. Later, ref. [33] suggested robust-type estimators for population variance that outperformed existing methods in SRS and St RS. A hybrid estimator for the population mean was proposed by [34], showing superior efficiency in empirical and simulated experiments. Later, ref. [35] proposed a log-type estimator in stratified ranked set sampling. A new approach to mean estimators in ranked set sampling was introduced by [36].

The literature on the estimation of CDFs is sparse. In response, this article proposes two classes of estimators that use auxiliary variable information to estimate the CDF of a study variable. The remainder of the paper is organized as follows. Section 2 introduces the notation and key concepts for stratified random sampling and post-stratification. Section 3 reviews existing estimators under both sampling schemes, and Section 4 proposes new estimators for each scheme. Section 5 provides the theoretical framework for both sampling techniques, while Section 6 details the implementation and outcomes of the empirical investigations conducted for each method. Section 7 brings together the findings from both empirical studies and analyzes the proposed estimators and their implications. Finally, Section 8 summarizes the study's key findings and their significance for future research and practical applications.

2. Background and Notations

2.1. Notations in Stratified Random Sampling

To evaluate the finite population distribution function, consider a finite population $Ω = \{1, 2, 3, \dots, N\}$ of $N$ distinct units that is divided into $k$ homogeneous strata, where $N_{h}$ is the size of the $h^{\text{th}}$ stratum such that $\sum_{h=1}^{k} N_{h} = N$. A sample of size $n_{h}$ (with $\sum_{h=1}^{k} n_{h} = n$) is drawn from the $h^{\text{th}}$ stratum by SRS without replacement.

Let $F_{st}(y) = F(y) = \sum_{h=1}^{k} W_{h} F_{h}(y)$ and $F_{st}(x) = F(x) = \sum_{h=1}^{k} W_{h} F_{h}(x)$ be the population distribution functions of the variables $Y$ (study variable) and $X$ (auxiliary variable) under St RS, respectively. Let $\hat{F}_{st}(y) = \hat{F}(y) = \sum_{h=1}^{k} W_{h} \hat{F}_{h}(y)$ and $\hat{F}_{st}(x) = \hat{F}(x) = \sum_{h=1}^{k} W_{h} \hat{F}_{h}(x)$ be the sample distribution functions of the variables $Y$ and $X$, respectively, where $h = 1, 2, 3, \dots, k$, $i = 1, 2, 3, \dots, N_{h}$, and $n$ is the sample size.

$W_{h} = \frac{N_{h}}{N}$ denotes the stratum weight of the $h^{\text{th}}$ stratum.
$F_{h}(y) = \sum_{i=1}^{N_{h}} ∆(Y_{ih} \leq y) / N_{h}$ and $\hat{F}_{h}(y) = \sum_{i=1}^{n_{h}} ∆(Y_{ih} \leq y) / n_{h}$ represent the population and sample distribution functions of $Y$ for the $h^{\text{th}}$ stratum, and $∆(Y_{ih} \leq y)$ is the indicator variable of $Y$.
$F_{h}(x) = \sum_{i=1}^{N_{h}} ∆(X_{ih} \leq x) / N_{h}$ and $\hat{F}_{h}(x) = \sum_{i=1}^{n_{h}} ∆(X_{ih} \leq x) / n_{h}$ represent the population and sample distribution functions of $X$ for the $h^{\text{th}}$ stratum, and $∆(X_{ih} \leq x)$ is the indicator variable of $X$.
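As an illustration of these definitions, the following minimal Python sketch computes the stratified sample distribution function $\hat{F}_{st}(y) = \sum_{h} W_{h} \hat{F}_{h}(y)$. The function names and the toy samples are hypothetical and are not taken from the paper's data sets.

```python
from bisect import bisect_right


def ecdf(values, t):
    """Sample distribution function: proportion of values <= t."""
    ordered = sorted(values)
    return bisect_right(ordered, t) / len(ordered)


def stratified_cdf(samples, stratum_sizes, t):
    """F_st_hat(t) = sum_h W_h * F_h_hat(t), with W_h = N_h / N."""
    N = sum(stratum_sizes)
    return sum((N_h / N) * ecdf(sample, t)
               for sample, N_h in zip(samples, stratum_sizes))


# Hypothetical data: two strata of sizes N_1 = 40 and N_2 = 60.
samples = [[52, 61, 70, 48], [75, 82, 66, 90, 71]]
stratum_sizes = [40, 60]
estimate = stratified_cdf(samples, stratum_sizes, 70)
```

With these toy values, all four units of the first sample and one of the five units of the second sample lie at or below 70, so the estimate is $0.4 \times 1.0 + 0.6 \times 0.2 = 0.52$.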

Here, we define the error terms used to obtain the bias and MSE of the estimators. Let

$$e_{y} = \frac{\hat{F}_{st}(y) - F(y)}{F(y)}, \quad \hat{F}_{st}(y) = F(y) + e_{y} F(y) = F(y)(1 + e_{y})$$

and

$$e_{x} = \frac{\hat{F}_{st}(x) - F(x)}{F(x)}, \quad \hat{F}_{st}(x) = F(x) + e_{x} F(x) = F(x)(1 + e_{x}).$$

Then

$$E(e_{y}) = E(e_{x}) = 0,$$

$$E(e_{y}^{2}) = \sum_{h=1}^{k} w_{h}^{2} λ_{h} C_{yh}^{2} = V_{0st} \ (\text{say}),$$

$$E(e_{x}^{2}) = \sum_{h=1}^{k} w_{h}^{2} λ_{h} C_{xh}^{2} = V_{1st} \ (\text{say}),$$

$$E(e_{x} e_{y}) = \sum_{h=1}^{k} w_{h}^{2} λ_{h} R_{xyh} C_{xh} C_{yh} = V_{01st} \ (\text{say}),$$

where

$$C_{yh} = \frac{S_{yh}}{F(y)}, \quad C_{xh} = \frac{S_{xh}}{F(x)}, \quad λ_{h} = \frac{1}{n_{h}} - \frac{1}{N_{h}} \quad \text{and} \quad R_{xyh} = \frac{S_{xyh}}{S_{yh} S_{xh}},$$

$$S_{yh}^{2} = \sum_{i=1}^{N_{h}} (∆(Y_{ih} \leq y) - F(y))^{2} / (N - 1),$$

$$S_{xh}^{2} = \sum_{i=1}^{N_{h}} (∆(X_{ih} \leq x) - F(x))^{2} / (N - 1),$$

$$S_{xyh} = \sum_{i=1}^{N_{h}} (∆(Y_{ih} \leq y)) (∆(X_{ih} \leq x)) / (N - 1).$$
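The second-order moments $V_{0st}$, $V_{1st}$ and $V_{01st}$ are simple weighted sums over strata. The sketch below evaluates them from per-stratum inputs; the function name and all numerical inputs are hypothetical, chosen only to show the arithmetic.

```python
def relative_moments(w, n_h, N_h, C_y, C_x, R_xy):
    """Return (V_0st, V_1st, V_01st) from per-stratum quantities.

    w    : stratum weights w_h
    n_h  : stratum sample sizes
    N_h  : stratum population sizes
    C_y, C_x : per-stratum coefficients C_yh and C_xh
    R_xy : per-stratum correlations R_xyh
    """
    V0 = V1 = V01 = 0.0
    for w_h, n, N, cy, cx, r in zip(w, n_h, N_h, C_y, C_x, R_xy):
        lam = 1.0 / n - 1.0 / N          # lambda_h = 1/n_h - 1/N_h
        V0 += w_h ** 2 * lam * cy ** 2
        V1 += w_h ** 2 * lam * cx ** 2
        V01 += w_h ** 2 * lam * r * cx * cy
    return V0, V1, V01


# Hypothetical inputs for two strata.
V0, V1, V01 = relative_moments(
    w=[0.4, 0.6], n_h=[10, 20], N_h=[40, 60],
    C_y=[1.0, 0.5], C_x=[0.8, 0.5], R_xy=[0.5, 0.6],
)
```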

2.2. Notations in Post-Stratification

Post-stratification in survey sampling addresses missing crucial attributes by dividing the population into subgroups based on known auxiliary variables. Survey weights are adjusted to account for variations in the distribution of these variables, mitigating biases from nonresponse and small sample sizes. By using CDF estimators, researchers achieve more precise distribution estimates within specific subgroups, enhancing the understanding of the study variable’s characteristics.

In post-stratification, the traditional unbiased estimator of the population distribution function is

$$\hat{F}_{ps}(y) = \sum_{h=1}^{K} W_{h} \hat{F}_{h}(y),$$

where $\hat{F}_{ps}(y)$ is the post-stratified empirical distribution function at $y$, $K$ is the number of post-strata, and $\hat{F}_{h}(y)$ is the distribution function of $y$ for the $h^{\text{th}}$ stratum.

The variance of $\hat{F}_{ps}(y)$ is

$$Var(\hat{F}_{ps}(y)) = \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} S_{yh}^{2} - \frac{1}{n^{2}} \sum_{h=1}^{K} (1 - W_{h}) S_{yh}^{2}.$$

Consider the following error terms to obtain the bias and MSE of the proposed estimator:

$$e_{0} = \frac{\sum_{h=1}^{K} W_{h} F_{h}(y) e_{0h}}{F(y)}, \quad e_{1} = \frac{\sum_{h=1}^{K} W_{h} F_{h}(x) e_{1h}}{F(x)} \quad \text{and} \quad e_{2} = \frac{\sum_{h=1}^{K} W_{h} F_{h}(\bar{x}) e_{2h}}{F(\bar{x})},$$

where

$$e_{0h} = \frac{\hat{F}_{h}(y) - F_{h}(y)}{F_{h}(y)}, \quad e_{1h} = \frac{\hat{F}_{h}(x) - F_{h}(x)}{F_{h}(x)} \quad \text{and} \quad e_{2h} = \frac{\hat{F}_{h}(\bar{x}) - F_{h}(\bar{x})}{F_{h}(\bar{x})},$$

$$E(e_{0h}) = E(e_{1h}) = E(e_{2h}) = 0,$$

$$E(e_{0h}^{2}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] C_{yh}^{2}, \quad E(e_{1h}^{2}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] C_{xh}^{2}, \quad E(e_{2h}^{2}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] C_{\bar{x}h}^{2},$$

$$E(e_{0h} e_{1h}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] R_{xyh} C_{xh} C_{yh}, \quad E(e_{0h} e_{2h}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] R_{\bar{x}yh} C_{\bar{x}h} C_{yh},$$

$$E(e_{1h} e_{2h}) = \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] R_{x\bar{x}h} C_{xh} C_{\bar{x}h}.$$

Now, we find the expected values of the error terms:

$$E(e_{0}) = E\left(\frac{\sum_{h=1}^{K} W_{h} F_{h}(y) e_{0h}}{F(y)}\right) = \frac{1}{F(y)} \left(\sum_{h=1}^{K} W_{h} F_{h}(y) E(e_{0h})\right) = 0.$$

Similarly, $E(e_{0}) = E(e_{1}) = E(e_{2}) = 0$. For the second moments,

$$\begin{aligned} E(e_{0}^{2}) &= E\left(\frac{\sum_{h=1}^{K} W_{h} F_{h}(y) e_{0h}}{F(y)}\right)^{2} = \frac{1}{F^{2}(y)} \sum_{h=1}^{K} W_{h}^{2} F_{h}^{2}(y) E(e_{0h}^{2}) \\ &= \frac{1}{F^{2}(y)} \sum_{h=1}^{K} W_{h}^{2} F_{h}^{2}(y) \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] C_{yh}^{2} \\ &= \frac{1}{F^{2}(y)} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} S_{yh}^{2} = V_{0ps} \ (\text{say}). \end{aligned}$$

Similarly,

$$E(e_{1}^{2}) = \frac{1}{F^{2}(x)} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} S_{xh}^{2} = V_{1ps} \ (\text{say}),$$

$$E(e_{2}^{2}) = \frac{1}{\bar{X}^{2}} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} S_{\bar{x}h}^{2} = V_{2ps} \ (\text{say}),$$

and

$$E(e_{0} e_{1}) = \frac{1}{F(x) F(y)} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} R_{xyh} C_{yh} C_{xh} = V_{01ps} \ (\text{say}),$$

$$E(e_{0} e_{2}) = \frac{1}{F(y) \bar{X}} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} R_{y\bar{x}h} C_{yh} C_{\bar{x}h} = V_{02ps} \ (\text{say}),$$

$$E(e_{1} e_{2}) = \frac{1}{F(x) \bar{X}} \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} R_{x\bar{x}h} C_{\bar{x}h} C_{xh} = V_{12ps} \ (\text{say}).$$
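The simplification that leads to $V_{0ps}$ can be checked numerically. The sketch below evaluates the stratum-level sum and the closed form side by side. It assumes, for this step only, that $C_{yh}^{2} = S_{yh}^{2} / F_{h}^{2}(y)$ and that $W_{h} = N_{h}/N$; the function names and the inputs are hypothetical.

```python
def v0_ps_from_strata(W, N_h, n, F_h, S2_h):
    """sum_h W_h^2 F_h^2 [1/(n W_h) - 1/N_h] C_yh^2 / F^2,
    with C_yh^2 = S_yh^2 / F_h^2 (assumption made for this check)."""
    F = sum(w * f for w, f in zip(W, F_h))
    total = 0.0
    for w, size, f, s2 in zip(W, N_h, F_h, S2_h):
        c2 = s2 / f ** 2
        total += w ** 2 * f ** 2 * (1.0 / (n * w) - 1.0 / size) * c2
    return total / F ** 2


def v0_ps_closed_form(W, N, n, F_h, S2_h):
    """(1/F^2) [1/n - 1/N] sum_h W_h S_yh^2."""
    F = sum(w * f for w, f in zip(W, F_h))
    return (1.0 / n - 1.0 / N) * sum(w * s2 for w, s2 in zip(W, S2_h)) / F ** 2


# Hypothetical post-strata: N_h = (40, 60), so W_h = (0.4, 0.6), and n = 20.
W, N_h, n = [0.4, 0.6], [40, 60], 20
F_h, S2_h = [0.5, 0.7], [0.25, 0.21]
long_form = v0_ps_from_strata(W, N_h, n, F_h, S2_h)
short_form = v0_ps_closed_form(W, sum(N_h), n, F_h, S2_h)
```

Both expressions agree because $W_{h}^{2} \left[\frac{1}{n W_{h}} - \frac{1}{N_{h}}\right] = W_{h} \left[\frac{1}{n} - \frac{1}{N}\right]$ when $N_{h} = N W_{h}$.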

3. Literature Review

3.1. Pre-Existing Estimators under Stratified Random Sampling

Several authors have suggested estimators for the finite population mean in stratified sampling. We adapt them as cumulative distribution function estimators under stratified sampling, using information on the supporting variable to estimate the population cumulative distribution function of the study variable. To the first degree of approximation, we obtain the bias and MSE expressions for the following existing estimators.

1. The usual unbiased estimator of $F(y)$ is given as

$$\hat{F}_{SRS_{st}}(y) = \frac{1}{n} \sum_{i=1}^{n} ∆(Y_{i} \leq y).$$

The MSE of $\hat{F}_{SRS_{st}}(y)$ is

$$MSE(\hat{F}_{SRS_{st}}(y)) = F^{2}(y) \sum_{h=1}^{k} w_{h}^{2} λ_{h} C_{yh}^{2} = F^{2}(y) V_{0st}. \tag{1}$$

2. The classical ratio estimator of $F(y)$, according to [1], is given by

$$\hat{F}_{Re}(y) = \hat{F}_{st}(y) \left[\frac{F(x)}{\hat{F}_{st}(x)}\right],$$

and its bias and MSE are

$$Bias(\hat{F}_{Re}(y)) = F(y)(V_{1st} - V_{01st}),$$

$$MSE(\hat{F}_{Re}(y)) = F^{2}(y)(V_{0st} + V_{1st} - 2 V_{01st}). \tag{2}$$

3. The product estimator of $F(y)$ was proposed by [2]:

$$\hat{F}_{Pe}(y) = \hat{F}_{st}(y) \left[\frac{\hat{F}_{st}(x)}{F(x)}\right].$$

To the first order of approximation, its bias and MSE are

$$Bias(\hat{F}_{Pe}(y)) = F(y) V_{01st},$$

$$MSE(\hat{F}_{Pe}(y)) = F^{2}(y)(V_{0st} + V_{1st} + 2 V_{01st}). \tag{3}$$

4. A difference-type estimator was proposed by [4]:

$$\hat{F}_{De}(y) = m_{1} \hat{F}_{st}(y) + m_{2} [F(x) - \hat{F}_{st}(x)],$$

where $m_{1}$ and $m_{2}$ are unknown constants. Its bias and MSE are

$$Bias(\hat{F}_{De}(y)) = F(y)(m_{1} - 1),$$

$$MSE(\hat{F}_{De}(y)) = F^{2}(y) - 2 m_{1} F^{2}(y) + m_{1}^{2} F^{2}(y) + m_{1}^{2} F^{2}(y) V_{0st} - 2 m_{1} m_{2} F(x) F(y) V_{01st} + m_{2}^{2} F^{2}(x) V_{1st}.$$

Setting the partial derivatives of $MSE(\hat{F}_{De}(y))$ with respect to $m_{1}$ and $m_{2}$ to zero gives $m_{2} = m_{1} F(y) V_{01st} / (F(x) V_{1st})$ and hence the optimum values

$$m_{1} = \frac{V_{1st}}{V_{1st} V_{0st} - (V_{01st})^{2} + V_{1st}}$$

and

$$m_{2} = \frac{F(y) V_{01st}}{F(x) (V_{1st} V_{0st} - (V_{01st})^{2} + V_{1st})}.$$

Substituting these values, the minimum MSE can be written as

$$MSE_{min}(\hat{F}_{De}(y)) = \frac{F^{2}(y) (V_{0st} V_{1st} - (V_{01st})^{2})}{V_{1st} V_{0st} - (V_{01st})^{2} + V_{1st}}. \tag{4}$$

5. A generalized ratio-type exponential estimator was adopted by [13]:

$$\hat{F}_{REe}(y) = \hat{F}_{st}(y) \exp\left(\frac{a_{st} (F(x) - \hat{F}_{st}(x))}{a_{st} (F(x) + \hat{F}_{st}(x)) + 2 b_{st}}\right).$$

Here, $a_{st}$ and $b_{st}$ are fixed values, and the bias and MSE are

$$Bias(\hat{F}_{REe}(y)) = F(y) \left(\frac{3}{8} V_{1st} - \frac{1}{2} V_{01st}\right),$$

$$MSE(\hat{F}_{REe}(y)) = F^{2}(y) \left(V_{0st} + \frac{1}{4} V_{1st} - V_{01st}\right). \tag{5}$$

6. Ref. [34] proposed a general class of estimators:

$$\hat{F}_{t_{k}}(y) = [t_{1} \hat{F}_{st}(y) + t_{2} (F(x) - \hat{F}_{st}(x))] \left[\frac{a_{st} F(x) + b_{st}}{c_{st} \hat{F}_{st}(x) + d_{st}}\right]^{α} \left[\exp\left(\frac{F(x) - \hat{F}_{st}(x)}{F(x) + \hat{F}_{st}(x)}\right)\right]^{β}.$$

Here, $t_{1}, t_{2}, α, β$ are suitable fixed values, and $a_{st}, b_{st}, c_{st}, d_{st}$ are either functions of the known parameters of $x$ or fixed values. The bias is

$$\begin{aligned} Bias(\hat{F}_{t_{k}}(y)) = {} & F(y) \Big[(t_{1} φ_{st} - 1) \\ & + \Big\{φ_{st} \Big(\Big(\frac{β}{2} + α η_{st}\Big) t_{2} r + \Big(\frac{β}{2} + α η_{st}\Big) \frac{t_{1}}{2} + \Big(\frac{β}{2} + α η_{st}\Big)^{2} \frac{t_{1}}{2}\Big) V_{1st} - \Big(\frac{β}{2} + α η_{st}\Big) t_{1} V_{01st}\Big\}\Big], \end{aligned}$$

where

$$φ_{st} = \left[\frac{a_{st} F(x) + b_{st}}{c_{st} F(x) + d_{st}}\right]^{α}, \quad η_{st} = \frac{c_{st} F(x)}{c_{st} F(x) + d_{st}} \quad \text{and} \quad r = \frac{F(x)}{F(y)}.$$

The MSE is

$$\begin{aligned} MSE(\hat{F}_{t_{k}}(y)) = {} & F^{2}(y) \Big[(t_{1} φ_{st} - 1)^{2} \\ & + φ_{st}^{2} \Big\{t_{1}^{2} V_{0st} - \Big(t_{2} r + t_{1} \Big(\frac{β}{2} + α η_{st}\Big)\Big)^{2} V_{1st} - 2 \Big(t_{1} t_{2} r + t_{1}^{2} \Big(\frac{β}{2} + α η_{st}\Big)\Big) V_{01st}\Big\} \\ & + 2 φ_{st} (t_{1} φ_{st} - 1) \Big\{\Big(\Big(\frac{β}{2} + α η_{st}\Big) t_{2} r + \Big(\frac{β}{2} + α η_{st}\Big) \frac{t_{1}}{2} + \Big(\frac{β}{2} + α η_{st}\Big)^{2} \frac{t_{1}}{2}\Big) V_{1st} \\ & - \Big(\frac{β}{2} + α η_{st}\Big) t_{1} V_{01st}\Big\}\Big]. \end{aligned}$$

We can rewrite the above equation as

$$MSE(\hat{F}_{t_{k}}(y)) = F^{2}(y) [1 - γ_{1} t_{1} + γ_{2} t_{1}^{2} - 2 γ_{3} t_{2} + γ_{4} t_{2}^{2} + 2 γ_{5} t_{1} t_{2}], \tag{6}$$

such that

$$\begin{aligned} γ_{1} &= φ_{st} \Big[2 + (V_{1st} - 2 V_{01st}) \Big(\frac{β}{2} + α η_{st}\Big) + \Big(\frac{β}{2} + α η_{st}\Big)^{2} V_{1st}\Big], \\ γ_{2} &= φ_{st} \Big[1 + V_{0st} + (V_{1st} - 4 V_{01st}) \Big(\frac{β}{2} + α η_{st}\Big) + 2 \Big(\frac{β}{2} + α η_{st}\Big)^{2} V_{1st}\Big], \\ γ_{3} &= r φ_{st} V_{1st} \Big(\frac{β}{2} + α η_{st}\Big), \\ γ_{4} &= φ_{st}^{2} r^{2} V_{1st}, \\ γ_{5} &= r φ_{st}^{2} V_{1st} \Big[2 \Big(\frac{β}{2} + α η_{st}\Big) V_{1st} - V_{01st}\Big]. \end{aligned}$$

We determine the optimal values by setting the derivatives of the MSE with respect to $t_{1}$ and $t_{2}$ to zero:

$$t_{1} = \frac{γ_{1} γ_{4} - 2 γ_{3} γ_{5}}{2 γ_{2} γ_{4} - 2 γ_{5}^{2}} \quad \text{and} \quad t_{2} = \frac{2 γ_{2} γ_{3} - γ_{1} γ_{5}}{2 γ_{2} γ_{4} - 2 γ_{5}^{2}}.$$
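To make the first-order MSE expressions (1), (2), (3) and (5) concrete, the sketch below evaluates them for given values of $F(y)$, $V_{0st}$, $V_{1st}$ and $V_{01st}$ and converts them to percentage relative efficiencies. The PRE is taken here as $100 \times MSE(\text{usual}) / MSE(\text{estimator})$, which is the common definition and is an assumption of this sketch; the function names and inputs are hypothetical.

```python
def first_order_mses(F_y, V0, V1, V01):
    """First-order MSEs of four estimators of F(y) under stratified sampling."""
    return {
        "usual": F_y ** 2 * V0,                                  # Equation (1)
        "ratio": F_y ** 2 * (V0 + V1 - 2 * V01),                 # Equation (2)
        "product": F_y ** 2 * (V0 + V1 + 2 * V01),               # Equation (3)
        "ratio_exponential": F_y ** 2 * (V0 + V1 / 4 - V01),     # Equation (5)
    }


def percentage_relative_efficiency(mses, baseline="usual"):
    """PRE of each estimator relative to the baseline (assumed definition)."""
    return {name: 100 * mses[baseline] / value for name, value in mses.items()}


# Hypothetical inputs.
mses = first_order_mses(F_y=0.5, V0=0.015, V1=0.01068, V01=0.0066)
pre = percentage_relative_efficiency(mses)
```

With a positive $V_{01st}$, as in this toy example, the ratio and ratio-exponential estimators have a PRE above 100, while the product estimator falls below 100.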

3.2. Existing Estimators in Post-Stratification

The stratified estimators above are converted into post-stratified estimators as follows:

1. The usual unbiased estimator of $F(y)$ is given as

$$\hat{F}_{(ps)}(y) = \frac{1}{n} \sum_{i=1}^{n} ∆(Y_{i} \leq y).$$

The variance of $\hat{F}_{(ps)}(y)$ is

$$Var(\hat{F}_{(ps)}(y)) = \left[\frac{1}{n} - \frac{1}{N}\right] \sum_{h=1}^{K} W_{h} S_{yh}^{2} - \frac{1}{n^{2}} \sum_{h=1}^{K} (1 - W_{h}) S_{yh}^{2} = V_{0ps} F^{2}(y) - \frac{1}{n^{2}} \sum_{h=1}^{K} (1 - W_{h}) S_{yh}^{2}. \tag{7}$$

2. The classical ratio estimator of $F(y)$, according to [1], is given by

$$\hat{F}_{Re(ps)}(y) = \hat{F}_{ps}(y) \left[\frac{F(x)}{\hat{F}_{ps}(x)}\right],$$

and its bias and MSE are

$$Bias(\hat{F}_{Re(ps)}(y)) = F(y)(V_{1ps} - V_{01ps}),$$

$$MSE(\hat{F}_{Re(ps)}(y)) = F^{2}(y)(V_{0ps} + V_{1ps} - 2 V_{01ps}). \tag{8}$$

3. The product estimator of $F(y)$ was proposed by [2]:

$$\hat{F}_{Pe(ps)}(y) = \hat{F}_{ps}(y) \left[\frac{\hat{F}_{ps}(x)}{F(x)}\right].$$

To the first order of approximation, its bias and MSE are

$$Bias(\hat{F}_{Pe(ps)}(y)) = F(y) V_{01ps},$$

$$MSE(\hat{F}_{Pe(ps)}(y)) = F^{2}(y)(V_{0ps} + V_{1ps} + 2 V_{01ps}). \tag{9}$$

4. A difference-type estimator was proposed by [4]:

$$\hat{F}_{De(ps)}(y) = m_{1} \hat{F}_{ps}(y) + m_{2} [F(x) - \hat{F}_{ps}(x)],$$

where $m_{1}$ and $m_{2}$ are unknown constants. Its bias and MSE are

$$Bias(\hat{F}_{De(ps)}(y)) = F(y)(m_{1} - 1),$$

$$MSE(\hat{F}_{De(ps)}(y)) = F^{2}(y) - 2 m_{1} F^{2}(y) + m_{1}^{2} F^{2}(y) + m_{1}^{2} F^{2}(y) V_{0ps} - 2 m_{1} m_{2} F(x) F(y) V_{01ps} + m_{2}^{2} F^{2}(x) V_{1ps}.$$

Setting the partial derivatives of $MSE(\hat{F}_{De(ps)}(y))$ with respect to $m_{1}$ and $m_{2}$ to zero gives $m_{2} = m_{1} F(y) V_{01ps} / (F(x) V_{1ps})$ and hence the optimum values

$$m_{1} = \frac{V_{1ps}}{V_{0ps} V_{1ps} + V_{1ps} - V_{01ps}^{2}}$$

and

$$m_{2} = \frac{F(y) V_{01ps}}{F(x) (V_{0ps} V_{1ps} + V_{1ps} - V_{01ps}^{2})}.$$

Substituting these values, the minimum MSE can be written as

$$MSE_{min}(\hat{F}_{De(ps)}(y)) = \frac{F^{2}(y) (V_{0ps} V_{1ps} - V_{01ps}^{2})}{V_{0ps} V_{1ps} + V_{1ps} - V_{01ps}^{2}}. \tag{10}$$

5. A generalized ratio-type exponential estimator was adopted by [13]:

$$\hat{F}_{REe(ps)}(y) = \hat{F}_{ps}(y) \exp\left(\frac{a_{ps} (F(x) - \hat{F}_{ps}(x))}{a_{ps} (F(x) + \hat{F}_{ps}(x)) + 2 b_{ps}}\right).$$

Here, $a_{ps}$ and $b_{ps}$ are fixed values, and the bias and MSE are

$$Bias(\hat{F}_{REe(ps)}(y)) = F(y) \left(\frac{3}{8} V_{1ps} - \frac{1}{2} V_{01ps}\right),$$

$$MSE(\hat{F}_{REe(ps)}(y)) = F^{2}(y) \left(V_{0ps} + \frac{1}{4} V_{1ps} - V_{01ps}\right). \tag{11}$$

6. Ref. [34] proposed a general class of estimators given by

$$\hat{F}_{t_{k}(ps)}(y) = [t_{1ps} \hat{F}_{ps}(y) + t_{2ps} (F(x) - \hat{F}_{ps}(x))] \left[\frac{a_{ps} F(x) + b_{ps}}{c_{ps} \hat{F}_{ps}(x) + d_{ps}}\right]^{α} \left[\exp\left(\frac{F(x) - \hat{F}_{ps}(x)}{F(x) + \hat{F}_{ps}(x)}\right)\right]^{β}.$$

Here, $t_{1ps}, t_{2ps}, α, β$ are suitable fixed values, and $a_{ps}, b_{ps}, c_{ps}, d_{ps}$ are either fixed values or functions of the known parameters of $x$. The bias is

$$\begin{aligned} Bias(\hat{F}_{t_{k}(ps)}(y)) = {} & F(y) \Big[(t_{1ps} φ_{ps} - 1) \\ & + \Big\{φ_{ps} \Big(\Big(\frac{β}{2} + α η_{ps}\Big) t_{2ps} r + \Big(\frac{β}{2} + α η_{ps}\Big) \frac{t_{1ps}}{2} + \Big(\frac{β}{2} + α η_{ps}\Big)^{2} \frac{t_{1ps}}{2}\Big) V_{1ps} - \Big(\frac{β}{2} + α η_{ps}\Big) t_{1ps} V_{01ps}\Big\}\Big], \end{aligned}$$

where

$$φ_{ps} = \left[\frac{a_{ps} F(x) + b_{ps}}{c_{ps} F(x) + d_{ps}}\right]^{α}, \quad η_{ps} = \frac{c_{ps} F(x)}{c_{ps} F(x) + d_{ps}} \quad \text{and} \quad r = \frac{F(x)}{F(y)}.$$

The MSE is

$$\begin{aligned} MSE(\hat{F}_{t_{k}(ps)}(y)) = {} & F^{2}(y) \Big[(t_{1ps} φ_{ps} - 1)^{2} \\ & + φ_{ps}^{2} \Big\{t_{1ps}^{2} V_{0ps} - \Big(t_{2ps} r + t_{1ps} \Big(\frac{β}{2} + α η_{ps}\Big)\Big)^{2} V_{1ps} - 2 \Big(t_{1ps} t_{2ps} r + t_{1ps}^{2} \Big(\frac{β}{2} + α η_{ps}\Big)\Big) V_{01ps}\Big\} \\ & + 2 φ_{ps} (t_{1ps} φ_{ps} - 1) \Big\{\Big(\Big(\frac{β}{2} + α η_{ps}\Big) t_{2ps} r + \Big(\frac{β}{2} + α η_{ps}\Big) \frac{t_{1ps}}{2} + \Big(\frac{β}{2} + α η_{ps}\Big)^{2} \frac{t_{1ps}}{2}\Big) V_{1ps} \\ & - \Big(\frac{β}{2} + α η_{ps}\Big) t_{1ps} V_{01ps}\Big\}\Big]. \end{aligned}$$

We can rewrite the above equation as

$$MSE(\hat{F}_{t_{k}(ps)}(y)) = F^{2}(y) [1 - γ_{1ps} t_{1ps} + γ_{2ps} t_{1ps}^{2} - 2 γ_{3ps} t_{2ps} + γ_{4ps} t_{2ps}^{2} + 2 γ_{5ps} t_{1ps} t_{2ps}], \tag{12}$$

such that

$$\begin{aligned} γ_{1ps} &= φ_{ps} \Big[2 + (V_{1ps} - 2 V_{01ps}) \Big(\frac{β}{2} + α η_{ps}\Big) + \Big(\frac{β}{2} + α η_{ps}\Big)^{2} V_{1ps}\Big], \\ γ_{2ps} &= φ_{ps} \Big[1 + V_{0ps} + (V_{1ps} - 4 V_{01ps}) \Big(\frac{β}{2} + α η_{ps}\Big) + 2 \Big(\frac{β}{2} + α η_{ps}\Big)^{2} V_{1ps}\Big], \\ γ_{3ps} &= r φ_{ps} V_{1ps} \Big(\frac{β}{2} + α η_{ps}\Big), \\ γ_{4ps} &= φ_{ps}^{2} r^{2} V_{1ps}, \\ γ_{5ps} &= r φ_{ps}^{2} V_{1ps} \Big[2 \Big(\frac{β}{2} + α η_{ps}\Big) V_{1ps} - V_{01ps}\Big]. \end{aligned}$$

We obtain the optimal values of $t_{1ps}$ and $t_{2ps}$ by differentiating $MSE(\hat{F}_{t_{k}(ps)}(y))$ with respect to $t_{1ps}$ and $t_{2ps}$ and equating the derivatives to zero:

$$t_{1ps} = \frac{γ_{1ps} γ_{4ps} - 2 γ_{3ps} γ_{5ps}}{2 γ_{2ps} γ_{4ps} - 2 γ_{5ps}^{2}} \quad \text{and} \quad t_{2ps} = \frac{2 γ_{2ps} γ_{3ps} - γ_{1ps} γ_{5ps}}{2 γ_{2ps} γ_{4ps} - 2 γ_{5ps}^{2}}.$$
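The optimal constants follow from the two first-order conditions of the quadratic form in (12). The sketch below evaluates the closed-form solution and checks that both partial derivatives vanish there; the function names and the $γ$ values are hypothetical and chosen only so that the quadratic is convex.

```python
def optimal_t(g1, g2, g3, g4, g5):
    """Closed-form minimizers (t1, t2) of the bracket in Equation (12)."""
    den = 2 * g2 * g4 - 2 * g5 ** 2
    t1 = (g1 * g4 - 2 * g3 * g5) / den
    t2 = (2 * g2 * g3 - g1 * g5) / den
    return t1, t2


def mse_bracket(t1, t2, g1, g2, g3, g4, g5):
    """1 - g1 t1 + g2 t1^2 - 2 g3 t2 + g4 t2^2 + 2 g5 t1 t2."""
    return (1 - g1 * t1 + g2 * t1 ** 2 - 2 * g3 * t2
            + g4 * t2 ** 2 + 2 * g5 * t1 * t2)


def gradient(t1, t2, g1, g2, g3, g4, g5):
    """Partial derivatives of the bracket with respect to t1 and t2."""
    d_t1 = -g1 + 2 * g2 * t1 + 2 * g5 * t2
    d_t2 = -2 * g3 + 2 * g4 * t2 + 2 * g5 * t1
    return d_t1, d_t2


# Hypothetical gamma values with g2 > 0 and g2 * g4 > g5^2 (convex case).
gammas = (2.0, 1.05, 0.01, 0.02, 0.005)
t1_opt, t2_opt = optimal_t(*gammas)
grad_at_opt = gradient(t1_opt, t2_opt, *gammas)
value_at_opt = mse_bracket(t1_opt, t2_opt, *gammas)
```

The same check applies to the stratified version (6), since it has the same algebraic form.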

4. Proposed Estimators

4.1. Proposed Estimator in Stratified Random Sampling

Inspired by [34], we propose an estimator that combines difference, ratio, product, and exponential types to estimate the population cumulative distribution function of the study variable:

$$\hat{F}_{stp}(y) = [n_{1} \hat{F}_{st}(y) + n_{2} (F(x) - \hat{F}_{st}(x))] \left[\frac{a_{st} F(x) + b_{st}}{c_{st} \hat{F}_{st}(x) + d_{st}}\right]^{α} \left[\frac{a_{st} F(x) + b_{st}}{c_{st} \hat{F}_{st}(x) + d_{st}}\right]^{-γ} \left[\exp\left(\frac{F(x) - \hat{F}_{st}(x)}{F(x) + \hat{F}_{st}(x)}\right)\right]^{β}, \tag{13}$$

where $n_{1}, n_{2}, α, β, γ$ are suitable constants and $a_{st}, b_{st}, c_{st}, d_{st}$ denote constants or functions of known parameters of the auxiliary variable $x$.
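A minimal sketch of how the estimator in (13) could be evaluated is given below. It reads the exponential factor as $[\exp(\cdot)]^{β}$, and the function and argument names are hypothetical. The two calls at the end use illustrative inputs and recover the classical ratio estimator and a ratio-type exponential estimator as special cases.

```python
import math


def proposed_stratified(Fy_hat, Fx_hat, Fx, n1, n2, a, b, c, d,
                        alpha, beta, gamma):
    """Evaluate the proposed stratified estimator of F(y), Equation (13)."""
    difference_part = n1 * Fy_hat + n2 * (Fx - Fx_hat)
    ratio = (a * Fx + b) / (c * Fx_hat + d)
    exponential = math.exp(beta * (Fx - Fx_hat) / (Fx + Fx_hat))
    return difference_part * ratio ** alpha * ratio ** (-gamma) * exponential


# Hypothetical inputs: F_hat_st(y) = 0.5, F_hat_st(x) = 0.4, F(x) = 0.5.
# n1 = 1, n2 = 0, a = c = 1, b = d = 0, alpha = 1, beta = gamma = 0
# gives the classical ratio estimator F_hat(y) * F(x) / F_hat(x).
ratio_case = proposed_stratified(0.5, 0.4, 0.5, 1, 0, 1, 0, 1, 0,
                                 alpha=1, beta=0, gamma=0)

# alpha = gamma = 0 and beta = 1 gives a ratio-type exponential estimator.
exp_case = proposed_stratified(0.5, 0.4, 0.5, 1, 0, 1, 0, 1, 0,
                               alpha=0, beta=1, gamma=0)
```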

We have considered six estimators from the literature. Substituting suitable values of $n_{1}, n_{2}, a_{st}, b_{st}, c_{st}, d_{st}, α, γ,$ and $β$ in the proposed estimator (13) yields the above-mentioned estimators, which are presented in Table 1 as follows:

Bias and MSE of Proposed Estimator $\hat{F}_{stp}(y)$

Expressing Equation (13) in terms of the $e_{i}$'s $(i = x, y)$, we have

$$\hat{F}_{stp}(y) = [n_{1} F(y) + n_{1} F(y) e_{y} - n_{2} F(x) e_{x}] [φ_{st} (1 + η_{st} e_{x})^{-α}] [φ_{st1} (1 + η_{st} e_{x})^{γ}] \left[1 - \frac{β}{2} e_{x} + \frac{(2 + β) β}{8} e_{x}^{2}\right],$$

where

$$φ_{st} = \left(\frac{a_{st} F(x) + b_{st}}{c_{st} F(x) + d_{st}}\right)^{α}, \quad φ_{st1} = \left(\frac{a_{st} F(x) + b_{st}}{c_{st} F(x) + d_{st}}\right)^{-γ}, \quad η_{st} = \frac{c_{st} F(x)}{c_{st} F(x) + d_{st}} \quad \text{and} \quad r = \frac{F(x)}{F(y)}.$$

$$\begin{aligned} \hat{F}_{stp}(y) - F(y) = {} & φ_{st} φ_{st1} F(y) [n_{1} + n_{1} e_{y} - 1] \\ & + φ_{st} φ_{st1} e_{x} \Big\{n_{1} η_{st} \Big(η_{st} \frac{α (α + 1)}{2} - α + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big) - r n_{2}\Big\} \\ & + φ_{st} φ_{st1} e_{x}^{2} \Big\{n_{1} η_{st} \Big(- α^{2} γ - \frac{α γ (γ + 1)}{2} η_{st} + \frac{α (α + 1)}{2} \frac{γ (γ + 1)}{2} η_{st}^{3} + \frac{α γ (γ + 1)}{2} η_{st}^{2} + \frac{α β}{2} \\ & \qquad - \frac{β}{2} \frac{α (α + 1)}{2} η_{st} - \frac{β γ}{2} - \frac{β}{2} \frac{γ (γ + 1)}{2} η_{st}\Big) \\ & \qquad + n_{2} η_{st} \Big(r α - \frac{γ (γ + 1)}{2} η_{st} - r γ - r \frac{γ (γ + 1)}{2} η_{st} - r \frac{β}{2 γ η_{st}}\Big)\Big\} \\ & + φ_{st} φ_{st1} e_{x} e_{y} \Big\{n_{1} η_{st} \Big(- α + \frac{α (α + 1)}{2} η_{st} + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big\} \end{aligned} \tag{14}$$
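The second-order expansion of the exponential factor used in this derivation, $1 - \frac{β}{2} e_{x} + \frac{(2 + β) β}{8} e_{x}^{2}$, can be checked numerically. Writing $\hat{F}_{st}(x) = F(x)(1 + e_{x})$ gives $\frac{F(x) - \hat{F}_{st}(x)}{F(x) + \hat{F}_{st}(x)} = \frac{-e_{x}}{2 + e_{x}}$, so the exact factor is $\exp(-β e_{x} / (2 + e_{x}))$. The sketch below compares the two; the function names and the chosen values of $β$ and $e_{x}$ are hypothetical.

```python
import math


def exp_factor(e_x, beta):
    """Exact factor [exp((F - F_hat)/(F + F_hat))]^beta with F_hat = F(1 + e_x)."""
    return math.exp(beta * (-e_x / (2.0 + e_x)))


def second_order(e_x, beta):
    """Second-order approximation 1 - (beta/2) e_x + ((2 + beta) beta / 8) e_x^2."""
    return 1.0 - beta / 2.0 * e_x + (2.0 + beta) * beta / 8.0 * e_x ** 2


beta = 1.5
err_small = abs(exp_factor(0.01, beta) - second_order(0.01, beta))
err_large = abs(exp_factor(0.02, beta) - second_order(0.02, beta))
```

Doubling $e_{x}$ multiplies the error by roughly eight, which is what a remainder of order $e_{x}^{3}$ predicts.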

Taking expectations on both sides of (14), we obtain the bias of the proposed estimator:

$$\begin{aligned} Bias(\hat{F}_{stp}(y)) = {} & φ_{st} φ_{st1} F(y) [n_{1} - 1] \\ & + φ_{st} φ_{st1} V_{1st} \Big\{n_{1} η_{st} \Big(- α^{2} γ - \frac{α γ (γ + 1)}{2} η_{st} + \frac{α (α + 1)}{2} \frac{γ (γ + 1)}{2} η_{st}^{3} + \frac{α γ (γ + 1)}{2} η_{st}^{2} + \frac{α β}{2} \\ & \qquad - \frac{β}{2} \frac{α (α + 1)}{2} η_{st} - \frac{β γ}{2} - \frac{β}{2} \frac{γ (γ + 1)}{2} η_{st}\Big) \\ & \qquad + n_{2} η_{st} \Big(r α - \frac{γ (γ + 1)}{2} η_{st} - r γ - r \frac{γ (γ + 1)}{2} η_{st} - r \frac{β}{2 γ η_{st}}\Big)\Big\} \\ & + φ_{st} φ_{st1} V_{01st} \Big\{n_{1} η_{st} \Big(- α + \frac{α (α + 1)}{2} η_{st} + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big\}. \end{aligned}$$

Squaring both sides of (14) and neglecting higher powers of the $e_{i}$'s, we obtain

$$\begin{aligned} [\hat{F}_{stp}(y) - F(y)]^{2} = {} & φ_{st}^{2} φ_{st1}^{2} F^{2}(y) [(n_{1} - 1)^{2} + n_{1}^{2} e_{y}^{2}] \\ & + φ_{st}^{2} φ_{st1}^{2} e_{x}^{2} \Big\{n_{1} η_{st} \Big(η_{st} \frac{α (α + 1)}{2} - α + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big) - r n_{2}\Big\}^{2} \\ & + φ_{st}^{2} φ_{st1}^{2} (e_{x} e_{y})^{2} \Big\{n_{1} η_{st} \Big(- α + \frac{α (α + 1)}{2} η_{st} + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big\}^{2}. \end{aligned} \tag{15}$$

The MSE is obtained by taking expectations on both sides of (15):

$$\begin{aligned} MSE(\hat{F}_{stp}(y)) = {} & φ_{st}^{2} φ_{st1}^{2} F^{2}(y) \Big\{1 + n_{1}^{2} \Big(1 + V_{0st} + η_{st}^{2} V_{1st} \Big(\Big(η_{st} \frac{α (α + 1)}{2} - α\Big) + \Big(γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big)^{2} \\ & + 2 V_{01st} \Big(\Big(η_{st} \frac{α (α + 1)}{2} - α\Big) + \Big(γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big)^{2}\Big) - 2 n_{1} + n_{2}^{2} r^{2} \\ & - 2 r n_{1} n_{2} η_{st} \Big(η_{st} \frac{α (α + 1)}{2} - α + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big\}, \end{aligned}$$

which can be written compactly as

$$MSE(\hat{F}_{stp}(y)) = φ_{st}^{2} φ_{st1}^{2} F^{2}(y) [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}], \tag{16}$$

where

$$\begin{aligned} l_{1} = {} & 1 + V_{0st} + η_{st}^{2} V_{1st} \Big(\Big(η_{st} \frac{α (α + 1)}{2} - α\Big) + \Big(γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big)^{2} \\ & + 2 V_{01st} \Big(\Big(η_{st} \frac{α (α + 1)}{2} - α\Big) + \Big(γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big)\Big)^{2}, \end{aligned}$$

$$l_{2} = r η_{st} \Big(η_{st} \frac{α (α + 1)}{2} - α + γ + \frac{γ (γ + 1)}{2} η_{st} - \frac{β}{2 η_{st}}\Big).$$

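We obtain the optimum values by differentiating (16) separately with respect to $n_{1}$ and $n_{2}$ and equating the derivatives to zero. The condition for $n_{2}$ gives $n_{2} = n_{1} l_{2} / r^{2}$, and substituting this into the condition for $n_{1}$ yields

$$n_{1} = \frac{r^{2}}{r^{2} l_{1} - l_{2}^{2}} \quad \text{and} \quad n_{2} = \frac{l_{2}}{r^{2} l_{1} - l_{2}^{2}}.$$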
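The first-order conditions of (16) can be verified numerically. The sketch below solves them directly, $n_{2} r^{2} = n_{1} l_{2}$ and $l_{1} n_{1} - n_{2} l_{2} = 1$, and checks that the bracket in (16) then reduces to $1 - n_{1}$. The function names and the values of $l_{1}$, $l_{2}$ and $r$ are hypothetical.

```python
def optimal_n(l1, l2, r):
    """Solve d/dn1 = 0 and d/dn2 = 0 for the bracket in Equation (16)."""
    n1 = r ** 2 / (r ** 2 * l1 - l2 ** 2)
    n2 = n1 * l2 / r ** 2      # from the n2 condition: n2 r^2 = n1 l2
    return n1, n2


def bracket_16(n1, n2, l1, l2, r):
    """1 + l1 n1^2 - 2 n1 + n2^2 r^2 - 2 n1 n2 l2."""
    return 1 + l1 * n1 ** 2 - 2 * n1 + n2 ** 2 * r ** 2 - 2 * n1 * n2 * l2


def gradient_16(n1, n2, l1, l2, r):
    """Partial derivatives of the bracket with respect to n1 and n2."""
    return (2 * l1 * n1 - 2 - 2 * n2 * l2,
            2 * n2 * r ** 2 - 2 * n1 * l2)


# Hypothetical values with r^2 * l1 > l2^2, so that the quadratic is convex.
l1, l2, r = 1.02, 0.05, 0.9
n1_opt, n2_opt = optimal_n(l1, l2, r)
grad_n = gradient_16(n1_opt, n2_opt, l1, l2, r)
min_bracket = bracket_16(n1_opt, n2_opt, l1, l2, r)
```

At the optimum the minimum MSE is therefore $φ_{st}^{2} φ_{st1}^{2} F^{2}(y) (1 - n_{1})$, evaluated at the optimal $n_{1}$.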

4.2. Proposed Estimator in Post-Stratification

We propose a post-stratified estimator that combines different types of estimators. The proposed post-stratified estimator is

{\hat{F}}_{p s p} (y) = {\hat{F}}_{p s} (y) + [t_{3} (F (x) - {\hat{F}}_{p s} (x)) + t_{4} χ (\bar{X} - {\bar{x}}_{p s})] e x p {[\frac{F (x) - {\hat{F}}_{p s} (x)}{F (x) + {\hat{F}}_{p s} (x)}]}^{ψ}

(17)

Here, ${\hat{F}}_{p s p} (y)$ denotes the proposed post-stratified estimator of the distribution function of $y$, and $t_{3}, t_{4}, χ$ and $ψ$ are suitable constants.

Expressing Equation (17) in terms of the $e_{i}$'s, we have

{\hat{F}}_{p s p} (y) = F (y) (1 + e_{0}) + [t_{3} (F (x) - F (x) (1 + e_{1})) + t_{4} χ (\bar{X} - \bar{X} (1 + e_{2}))] \exp {[\frac{F (x) - F (x) (1 + e_{1})}{F (x) + F (x) (1 + e_{1})}]}^{ψ}

{\hat{F}}_{p s p} (y) = F (y) + F (y) e_{0} - t_{3} F (x) e_{1} - t_{4} χ \bar{X} e_{2} + t_{3} F (x) \frac{ψ}{2} e_{1}^{2} + \bar{X} t_{4} χ ψ \frac{e_{1} e_{2}}{2}

where

F (y) = \sum_{h = 1}^{K} W_{h} F_{h} (y)

,

F (x) = \sum_{h = 1}^{K} W_{h} F_{h} (x), {\hat{F}}_{p s} (y) = \sum_{h = 1}^{K} W_{h} {\hat{F}}_{h} (y)

,

{\hat{F}}_{p s} (x) = \sum_{h = 1}^{K} W_{h} {\hat{F}}_{h} (x),

{\hat{F}}_{h} (y) = F_{h} (y) (1 + e_{0 h})

and

{\hat{F}}_{h} (x) = F_{h} (x) (1 + e_{1 h})

.
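As a small worked illustration of the weighted sums defined above, the sketch below evaluates $F (y) = \sum_{h} W_{h} F_{h} (y)$ and $F (x) = \sum_{h} W_{h} F_{h} (x)$ from the $W_{h}$, $F_{h} (y)$ and $F_{h} (x)$ rows reported for Data Set-III in Table 7. The helper name `weighted_cdf` is an assumption introduced here for illustration, not part of the paper.

```python
# Stratum weights and stratum-level distribution function values (Table 7).
W = [0.1241, 0.1241, 0.1101, 0.2002, 0.2389, 0.2026]
F_h_y = [0.5872, 0.5189, 0.3298, 0.3684, 0.4657, 0.7052]
F_h_x = [0.5472, 0.566, 0.3404, 0.3801, 0.4657, 0.7225]


def weighted_cdf(weights, stratum_values):
    """Return sum_h W_h * F_h, the stratum-weighted distribution function."""
    if len(weights) != len(stratum_values):
        raise ValueError("weights and stratum values must have equal length")
    return sum(w * f for w, f in zip(weights, stratum_values))


F_y = weighted_cdf(W, F_h_y)
F_x = weighted_cdf(W, F_h_x)

print(f"sum of weights = {sum(W):.4f}")
print(f"F(y) = {F_y:.4f}, F(x) = {F_x:.4f}")
```

The same function applies to the sample quantities ${\hat{F}}_{h} (y)$ and ${\hat{F}}_{h} (x)$ to give ${\hat{F}}_{p s} (y)$ and ${\hat{F}}_{p s} (x)$.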

Bias and MSE of Proposed Estimator ${\hat{F}}_{p s p} (y)$

{\hat{F}}_{p s p} (y) - F (y) = F (y) e_{0} - t_{3} F (x) e_{1} - t_{4} χ \bar{X} e_{2} + t_{3} F (x) \frac{ψ}{2} e_{1}^{2} + \bar{X} t_{4} χ ψ \frac{e_{1} e_{2}}{2}

(18)

The bias of the proposed post-stratified estimator is therefore obtained by taking expectations of both sides of (18):

B i a s ({\hat{F}}_{p s p} (y)) = t_{3} F (x) \frac{ψ}{2} V_{1 p s} + \bar{X} t_{4} χ ψ \frac{V_{12 p s}}{2}

Squaring both sides of (18) and neglecting higher powers of the $e_{i}$'s, we obtain

\begin{matrix} {[{\hat{F}}_{p s p} (y) - F (y)]}^{2} \\ = F^{2} (y) e_{0}^{2} + F^{2} (x) t_{3}^{2} e_{1}^{2} + {\bar{X}}^{2} t_{4}^{2} χ^{2} e_{2}^{2} - 2 F (x) F (y) t_{3} e_{0} e_{1} \\ + 2 F (x) t_{3} t_{4} \bar{X} χ e_{1} e_{2} - 2 F (y) t_{4} \bar{X} χ e_{0} e_{2} \end{matrix}

(19)

Taking expectations of both sides of (19), the MSE of the proposed post-stratified estimator is

M S E ({\hat{F}}_{p s p} (y)) = F^{2} (y) V_{0 p s} + F^{2} (x) t_{3}^{2} V_{1 p s} + {\bar{X}}^{2} t_{4}^{2} χ^{2} V_{2 p s} - 2 F (x) F (y) t_{3} V_{01 p s} + 2 F (x) t_{3} t_{4} \bar{X} χ V_{12 p s} - 2 F (y) t_{4} \bar{X} χ V_{02 p s}

(20)

The optimum values of $t_{3}$ and $t_{4}$ are obtained by differentiating Equation (20) with respect to each of them separately and setting the derivatives equal to zero. This gives

t_{3} = \frac{F (y) [V_{01 p s} V_{2 p s} - V_{12 p s} V_{02 p s}]}{F (x) [V_{1 p s} V_{2 p s} - V_{12 p s}^{2}]}

and

t_{4} = \frac{F (y) [V_{1 p s} V_{02 p s} - V_{01 p s} V_{12 p s}]}{\bar{X} χ [V_{1 p s} V_{2 p s} - V_{12 p s}^{2}]}

After substituting $t_{3}$ and $t_{4}$ in (20), we have

\begin{matrix} M S E ({\hat{F}}_{p s p} (y)) \\ = F^{2} (y) \{V_{0 p s} + [\frac{V_{1 p s} w_{1}^{2} + V_{2 p s} w_{2}^{2} - 2 V_{01 p s} w_{1} w_{3} + 2 V_{12 p s} w_{1} w_{2} - 2 V_{02 p s} w_{2} w_{3}}{w_{3}^{2}}]\} \\ = F^{2} (y) (V_{0 p s} + R) (say) \end{matrix}

(21)

where

R = [\frac{V_{1 p s} w_{1}^{2} + V_{2 p s} w_{2}^{2} - 2 V_{01 p s} w_{1} w_{3} + 2 V_{12 p s} w_{1} w_{2} - 2 V_{02 p s} w_{2} w_{3}}{w_{3}^{2}}]

w_{1} = V_{01 p s} V_{2 p s} - V_{12 p s} V_{02 p s},

w_{2} = V_{1 p s} V_{02 p s} - V_{01 p s} V_{12 p s},

w_{3} = V_{1 p s} V_{2 p s} - V_{12 p s}^{2} .
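The substitution leading to (21) can be verified numerically. The following sketch is an illustration under stated assumptions: the variance and covariance terms, $F (y)$, $F (x)$, $\bar{X}$ and $χ$ are hypothetical numbers chosen only to exercise the formulas, and the function names are not from the paper. It computes $t_{3}$ and $t_{4}$ from the expressions above, evaluates Equation (20) at those values, and compares the result with $F^{2} (y) (V_{0 p s} + R)$.

```python
def w_terms(V0, V1, V2, V01, V02, V12):
    """Return w1, w2, w3 as defined after Equation (21)."""
    w1 = V01 * V2 - V12 * V02
    w2 = V1 * V02 - V01 * V12
    w3 = V1 * V2 - V12**2
    return w1, w2, w3


def mse_eq20(t3, t4, Fy, Fx, Xbar, chi, V0, V1, V2, V01, V02, V12):
    """First-order MSE of the post-stratified estimator, Equation (20)."""
    a = t3 * Fx          # coefficient multiplying e1
    b = t4 * Xbar * chi  # coefficient multiplying e2
    return (Fy**2 * V0 + a**2 * V1 + b**2 * V2
            - 2 * Fy * a * V01 + 2 * a * b * V12 - 2 * Fy * b * V02)


# Hypothetical inputs (not taken from the paper's data sets).
V = dict(V0=0.02, V1=0.03, V2=0.05, V01=0.015, V02=-0.01, V12=-0.012)
Fy, Fx, Xbar, chi = 0.5, 0.51, 100.0, 1.0

w1, w2, w3 = w_terms(**V)
t3 = Fy * w1 / (Fx * w3)
t4 = Fy * w2 / (Xbar * chi * w3)

R = (V["V1"] * w1**2 + V["V2"] * w2**2 - 2 * V["V01"] * w1 * w3
     + 2 * V["V12"] * w1 * w2 - 2 * V["V02"] * w2 * w3) / w3**2

mse_opt = mse_eq20(t3, t4, Fy, Fx, Xbar, chi, **V)
mse_21 = Fy**2 * (V["V0"] + R)

print(f"t3 = {t3:.6f}, t4 = {t4:.8f}, R = {R:.6f}")
print(f"MSE from (20) = {mse_opt:.8f}, MSE from (21) = {mse_21:.8f}")
```

For these inputs $R$ is negative, so the optimised MSE falls below $F^{2} (y) V_{0 p s}$; algebraically $R$ simplifies to $-(w_{1} V_{01 p s} + w_{2} V_{02 p s}) / w_{3}$.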

5. Theoretical Framework

5.1. Efficiency Comparison of Existing Estimators and Proposed Estimator under St RS

By comparing Equation (16) with Equations (1)–(6), we obtain the following conditions under which the proposed estimator is more efficient than the existing estimators.

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{S {R S}_{s t}} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < V_{0 s t}

(22)

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{R e} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < (V_{0 s t} + V_{1 s t} - 2 V_{01 s t})

(23)

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{P e} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < (V_{0 s t} + V_{1 s t} + 2 V_{01 s t})

(24)

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{D e} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < \frac{V_{0 s t} V_{1 s t} - V_{01 s t}^{2}}{V_{1 s t} V_{0 s t} - V_{01 s t}^{2} + V_{0 s t}}

(25)

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{R E e} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < (V_{0 s t} + \frac{1}{4} V_{1 s t} - V_{01 s t})

(26)

\mathrm{MSE} ({\hat{F}}_{s t p} (y)) < \mathrm{MSE} ({\hat{F}}_{t_{k}} (y)) \quad \text{if} \quad φ_{s t}^{2} φ_{s t 1}^{2} [1 + l_{1} n_{1}^{2} - 2 n_{1} + n_{2}^{2} r^{2} - 2 n_{1} n_{2} l_{2}] < [1 - γ_{1} t_{1} + γ_{2} t_{1}^{2} - 2 γ_{3} t_{2} + γ_{4} t_{2}^{2} + 2 γ_{5} t_{1} t_{2}]

(27)

5.2. Theoretical Conditions under Post-Stratification

The following conditions are derived when comparing the MSE of the proposed estimator with the MSEs of other considered existing estimators.

From Equations (21) and (7), we have

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{Var} ({\hat{F}}_{(p s)} (y)) < 0 \quad \text{if} \quad \frac{F^{2} (y) (V_{0 p s} + R)}{V_{0 p s} F^{2} (y) - \frac{1}{n^{2}} \sum_{h = 1}^{K} (1 - W_{h}) S_{y h}^{2}} < 1

(28)

From Equations (21) and (8),

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{MSE} ({\hat{F}}_{R e (p s)} (y)) < 0 \quad \text{if} \quad \frac{V_{0 p s} + R}{\sum_{h = 1}^{K} W_{h}^{2} λ_{h} (C_{y h}^{2} + C_{x h}^{2} - 2 R_{x y h} C_{x h} C_{y h})} < 1

(29)

From Equations (21) and (9),

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{MSE} ({\hat{F}}_{P e (p s)} (y)) < 0 \quad \text{if} \quad \frac{V_{0 p s} + R}{\sum_{h = 1}^{K} W_{h}^{2} λ_{h} (C_{y h}^{2} + C_{x h}^{2} + 2 R_{x y h} C_{x h} C_{y h})} < 1

(30)

From Equations (21) and (10),

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{MSE}_{min} ({\hat{F}}_{D e (p s)} (y)) < 0 \quad \text{if} \quad \frac{V_{0 p s} + R}{\frac{V_{0 p s} V_{1 p s} - V_{01 p s}^{2}}{V_{0 p s} V_{1 p s} + V_{0 p s} - V_{01 p s}^{2}}} < 1

(31)

From Equations (21) and (11),

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{MSE} ({\hat{F}}_{R E e (p s)} (y)) < 0 \quad \text{if} \quad \frac{V_{0 p s} + R}{V_{0 p s} + \frac{1}{4} V_{1 p s} - V_{01 p s}} < 1

(32)

From Equations (21) and (12),

\mathrm{MSE} ({\hat{F}}_{p s p} (y)) - \mathrm{MSE} ({\hat{F}}_{t_{k} (p s)} (y)) < 0 \quad \text{if} \quad \frac{V_{0 p s} + R}{[1 - γ_{1 p s} t_{1 p s} + γ_{2 p s} t_{1 p s}^{2} - 2 γ_{3 p s} t_{2 p s} + γ_{4 p s} t_{2 p s}^{2} + 2 γ_{5 p s} t_{1 p s} t_{2 p s}]} < 1

(33)

6. Empirical Studies

6.1. Empirical Study in Stratified Random Sampling

Data Set-I: We used the data from [35] to evaluate the relative efficiency of the suggested estimator. The data consist of six strata. A sample of 180 observations is taken from a total of 923 observations. Table 2 presents the summary statistics of the data.

The functions of auxiliary variables are

\sum_{h = 1}^{k} W_{h} S_{h x} = 0.486847, \sum_{h = 1}^{k} W_{h} C_{h x} = 6.2793, \sum_{h = 1}^{k} W_{h} R_{h x y} = 0.852239 .

By using the above functions of known auxiliary variables, we have calculated the MSE and percentage relative efficiency (PRE) of the estimators in both Table 3 and Table 4.
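The PRE values can be reproduced from the reported MSEs. The sketch below assumes the usual definition, PRE = 100 × MSE(baseline) / MSE(estimator), with the usual stratified estimator ${\hat{F}}_{S {R S}_{s t}} (y)$ as the baseline; this definition is an assumption, although it agrees with the figures in Table 3 up to rounding. The MSE values are copied from Table 3, and the dictionary keys are shorthand labels introduced here.

```python
def pre(mse_baseline, mse_estimator):
    """Percentage relative efficiency with respect to the baseline estimator."""
    if mse_estimator <= 0:
        raise ValueError("MSE must be positive")
    return 100.0 * mse_baseline / mse_estimator


# MSE values as reported (rounded) in Table 3 for Data Set-I.
table3_mse = {
    "SRS_st": 0.0488,
    "Re": 0.0928,
    "Pe": 0.0972,
    "De": 0.0414,
    "REe": 0.0593,
    "t_k": 0.0223,
    "stp": 0.0056,
}

baseline = table3_mse["SRS_st"]
pre_values = {name: pre(baseline, mse) for name, mse in table3_mse.items()}
best = max(pre_values, key=pre_values.get)

for name, value in pre_values.items():
    print(f"{name:7s} MSE = {table3_mse[name]:.4f}  PRE = {value:8.2f}")
print("most efficient:", best)
```

Because the inputs are the rounded MSEs from the table, the computed PRE values can differ from the published ones in the second decimal place.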

Using the data in Table 2, we calculated the MSE and PRE values of the pre-existing estimators and of our proposed stratified estimator. They are shown in Table 3 for $α = 1$, $β = 1$, $γ = 0$, $a_{s t} = c_{s t} = 1$ and $b_{s t} = d_{s t} = 0.4868$.

Additionally, substituting different values for the constants in our suggested estimator yields different members of the class (ratio-type, product-type, and so on). The MSE and PRE values of some members of the proposed class of estimators are presented in Table 4.

Data Set-II

In this numerical investigation, we utilized the data [37] detailing student behaviors and exam performances. The dataset encompasses information for 500 students, ensuring a diverse range of study patterns and their exam performances. Here, we maintained symmetry in our sampling process by dividing the data into six strata, as presented in Table 5. We selected a sample of 120 by using the Neyman allocation method. We focused on exam scores as the study variable, reflecting the student’s score in an exam, while study hours served as an auxiliary variable, indicating the number of hours a student dedicated to exam preparation. We aim to predict students’ exam scores based on their study hours, thereby emphasizing the symmetry of representation across different strata in our analysis.
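For readers unfamiliar with Neyman allocation, the following sketch applies the standard rule $n_{h} = n N_{h} S_{h} / \sum_{h} N_{h} S_{h}$ to the $N_{h}$ and $S_{y h}$ columns of Table 5. It is an illustration only: the text does not state which stratum standard deviations or which rounding rule were used, so the unrounded output is not expected to reproduce the $n_{h}$ column of Table 5. The function name `neyman_allocation` is hypothetical.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Unrounded Neyman allocation: n_h proportional to N_h * S_h."""
    if len(stratum_sizes) != len(stratum_sds):
        raise ValueError("stratum sizes and standard deviations must align")
    products = [N * S for N, S in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    if total <= 0:
        raise ValueError("sum of N_h * S_h must be positive")
    return [n * p / total for p in products]


# N_h and S_yh as reported in Table 5 (Data Set-II); n = 120 as in the text.
N_h = [91, 82, 89, 89, 72, 77]
S_yh = [0.139, 0.132, 0.145, 0.139, 0.097, 0.132]

allocation = neyman_allocation(120, N_h, S_yh)

for h, n_h in enumerate(allocation, start=1):
    print(f"stratum {h}: n_h = {n_h:.2f}")
print(f"total = {sum(allocation):.2f}")
```

In practice the fractional $n_{h}$ are rounded to integers subject to $\sum_{h} n_{h} = n$ and $n_{h} \le N_{h}$.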

The functions of auxiliary variables are

\sum_{h = 1}^{k} W_{h}^{2} S_{h x} = 0.023, \sum_{h = 1}^{k} W_{h}^{2} C_{h x} = 1.126 .

Table 6 shows that the proposed estimator has the lowest MSE (0.000057) and the highest PRE (194.876102) among all the estimators considered, which indicates that it is more precise than the existing estimators for this data set.

The PRE and MSE values for Data Set-I and Data Set-II, taken from Table 3 and Table 6, are plotted as line graphs in Figure 1 and Figure 2, respectively.

6.2. Empirical Validation under Post-Stratification

Data Set-III: Source [38].

In this data set, $Y$ represents the apple production amount in 1999 and $X$ represents the number of apple trees in 1999. The data statistics are available in Table 7. We consider $Y$ as the study variable and $X$ as the auxiliary variable.

Utilizing the statistical data provided in Table 7, we computed the MSE and PRE values, which are summarized in Table 8. This table allows us to assess the effectiveness of the proposed estimators compared to others.

The PRE and MSE values for Data Set-II and Data Set-III, taken from Table 8, are plotted as line graphs in Figure 3 and Figure 4, respectively.

7. Results and Discussion

In this study, we have proposed two novel estimators to estimate the CDF of a study variable by employing the auxiliary variables’ information under stratified random sampling and in post-stratification.

The first estimator, presented in Equation (13), is proposed under stratified random sampling (St RS) and combines several types of estimators. By choosing suitable constants for $n_{1}, n_{2}, α, β, γ$, and suitable constants or functions of the auxiliary variable $x$ for $a_{s t}, b_{s t}, c_{s t}, d_{s t}$, we obtain efficient results for the proposed estimator. Because the proposed estimator defines a class of estimators, several existing estimators are special cases of it; they arise from particular choices of these constants and are listed in Table 1. The conditions under which the proposed estimator is more efficient are given in Equations (22)–(27). Two data sets were used to assess its efficiency. Data Set-I is taken from [35], and the numerical study is presented in Section 6. Table 3 shows that the proposed estimator ${\hat{F}}_{s t p} (y)$ outperforms the other estimators in terms of MSE and PRE. Data Set-II is extracted from the website https://doi.org/10.34740/KAGGLE/DSV/7623777 (accessed on 29 February 2024). Table 5 presents all the values needed for the calculation of the MSEs. In Table 6, ${\hat{F}}_{s t p} (y)$ stands out among the estimators with the lowest MSE, 0.000057, and the highest PRE, 194.876102, underscoring its superior predictive accuracy and demonstrating greater symmetry than the usual unbiased estimators [1,2,4,13,34].

The second estimator proposed in this study is under post-stratification, with constants $t_{3}, t_{4}, χ$ and $ψ$. We derived its bias and MSE up to the first degree of approximation; the theoretical conditions are given in Section 5, Equations (28)–(33). To assess the efficiency of the proposed estimator in post-stratification, we utilized two data sets, taking the information on $Y$, $X$, and $\bar{X}$ from Data Set-II and Data Set-III. Table 8 shows that the MSE of the proposed estimator is low and its relative efficiency is high compared with the considered estimators, which agrees with the figures. The comparative analysis in Table 8 also reveals distinct performance characteristics. The estimator ${\hat{F}}_{p s p} (y)$ attains the lowest MSE and the highest PRE in both data sets. Conversely, ${\hat{F}}_{P e (p s)} (y)$ performs poorly, with markedly higher MSE values and lower PRE. The estimators ${\hat{F}}_{R e (p s)} (y)$, ${\hat{F}}_{D e (p s)} (y)$, and ${\hat{F}}_{R E e (p s)} (y)$ show moderate performance, with lower MSE and higher PRE than ${\hat{F}}_{P e (p s)} (y)$ but without surpassing ${\hat{F}}_{p s p} (y)$; in Data Set-II, ${\hat{F}}_{D e (p s)} (y)$ matches the proposed estimator to the three decimal places reported (MSE 0.016, PRE 231.250).

In Figure 1 and Figure 3, the plotted line rises toward the proposed estimator, which attains the highest PRE. In contrast, Figure 2 shows the MSE values declining toward the estimator of [34] and our proposed estimator, labeled 6 and 7, respectively. Figure 4 shows nearly identical MSE values for the second and fourth estimators, which perform well but do not surpass the proposed estimator. In Figure 3, the proposed estimator outperforms its counterparts in both Data Set-II and Data Set-III, closely followed by [4], whereas ref. [2] performs worst in both data sets. Figure 4 shows the same pattern: the proposed method has the lowest MSE, closely followed by the estimator of [4], while ref. [2] and the classical estimator record notably higher MSE values in both data sets. Hence, Figure 3 and Figure 4 support the superiority of the proposed estimator over its counterparts, a conclusion reinforced by Figure 1 and Figure 2.

Table 3 lists the MSE and PRE of the existing estimators together with the proposed estimator, listed as serial No. 7. The proposed estimator has the lowest MSE and the highest PRE, followed by [34]. Table 4 corroborates this finding. Table 6 reports the results for Data Set-II, where the proposed method again performs best, followed by the estimator of [13]. Whereas the ranking of the existing estimators changes from one data set to another, the proposed estimators perform best in all of them, which points to their robustness and reliability.

8. Conclusions

This study introduces two estimators of the finite population distribution function under stratified random sampling and post-stratification, ensuring symmetry in the sampling process. Using three data sets, including real-world data on student behavior and exam results, the study illustrates the efficiency of these estimators in comparison with conventional approaches under both sampling schemes. In the empirical validation, the proposed estimators consistently outperform their counterparts in terms of both mean square error and percentage relative efficiency, demonstrating their ability to perform in practical settings. Furthermore, the study provides insights into the predictive accuracy of educational assessments, as demonstrated by the prediction of students’ exam scores from study hours using the proposed estimator. The study thus not only introduces new approaches to long-standing survey sampling challenges but also reveals avenues for improving prediction accuracy in educational assessments. The agreement between the theoretical derivations and the empirical validations emphasizes the resilience and versatility of the proposed estimators, ensuring symmetry in their potential use across a diverse range of sampling settings. Figure 1, Figure 2, Figure 3 and Figure 4 demonstrate the higher efficiency of the estimators. Additionally, extensions such as non-response analysis and calibration approaches are suggested to improve the resilience of the estimators across different data sets and settings. In conclusion, the findings represent an advance in survey sampling, with implications for future research. Research into potential extensions and modifications of the estimators continues, with the aim of improving their effectiveness and usability in practical contexts.

Author Contributions

Methodology, G.R.V.T. and F.D.; Software, G.R.V.T. and F.D.; Formal analysis, O.A.; Data curation, F.D.; Writing—original draft, G.R.V.T.; Writing—review & editing, F.D. and O.A.; Visualization, F.D.; Supervision, F.D.; Funding acquisition, O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No primary data has been used in the article.

Acknowledgments

We highly appreciate the efforts of the reviewers and editors towards the improvement of the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cochran, W.G. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 1940, 30, 262–275.
2. Murthy, M.N. Product method of estimation. Sankhya. Indian J. Stat. Ser. A 1964, 26, 69–74.
3. Holt, D.; Smith, T.F. Post-stratification. J. R. Stat. Soc. Ser. A (Gen.) 1979, 142, 33–46.
4. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
5. Chambers, R.L.; Dorfman, A.H.; Hall, P. Properties of estimators of the finite population distribution function. Biometrika 1992, 79, 577–582.
6. Little, R.J. Post-stratification: A modeler’s perspective. J. Am. Stat. Assoc. 1993, 88, 1001–1012.
7. Rao, J.N.K. Estimating totals and distribution functions using auxiliary information at the estimation stage. J. Off. Stat. 1994, 10, 153–165.
8. Silva, P.L.D.; Skinner, C.J. Estimating distribution functions with auxiliary information using poststratification. J. Off. Stat. 1995, 11, 277–294.
9. Wang, S.; Dorfman, A.H. A new estimator for the finite population distribution function. Biometrika 1996, 83, 639–652.
10. Abu-Dayyeh, W.A.; Ahmed, M.S.; Ahmed, R.A.; Muttlak, H.A. Some estimators of a finite population mean using auxiliary information. Appl. Math. Comput. 2003, 139, 287–298.
11. Kadilar, C.; Cingi, H. A new ratio estimator in stratified random sampling. Commun. Stat.-Theory Methods 2005, 34, 597–602.
12. Koyuncu, N.; Kadilar, C. Family of estimators of population mean using two auxiliary variables in stratified random sampling. Commun. Stat.-Theory Methods 2009, 38, 2398–2417.
13. Singh, R.; Chauhan, P.; Sawan, N.; Smarandache, F. Improvement in estimating the population mean using an exponential estimator in simple random sampling. Int. J. Stat. Econ. 2009, 3, 13–18.
14. Singh, H.P.; Solanki, R.S. Efficient ratio and product estimators in stratified random sampling. Commun. Stat.-Theory Methods 2013, 42, 1008–1023.
15. Grover, L.K.; Kaur, P. A generalized class of ratio type exponential estimators of population mean under linear transformation of auxiliary variable. Commun. Stat.-Simul. Comput. 2014, 43, 1552–1574.
16. Munoz, J.F.; Arcos, A.; Alvarez, E.; Rueda, M. New ratio and difference estimators of the finite population distribution function. Math. Comput. Simul. 2014, 102, 51–61.
17. Tailor, R.; Tailor, R.; Chouhan, S. Improved ratio-and product-type exponential estimators for population mean in case of post-stratification. Commun. Stat.-Theory Methods 2017, 46, 10387–10393.
18. Muneer, S.; Shabbir, J.; Khalil, A. Estimation of finite population mean in simple random sampling and stratified random sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2017, 46, 2181–2192.
19. Shabbir, J.; Gupta, S. Estimation of finite population mean in simple and stratified random sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2017, 46, 10135–10148.
20. Singh, G.N.; Khalid, M.; Sharma, A.K. Some efficient classes of estimators of population mean in two-phase successive sampling under random nonresponse. Commun. Stat.-Theory Methods 2017, 46, 12194–12209.
21. Vishwakarma, G.K. An alternative to ratio estimator in post-stratification. Commun. Stat.-Theory Methods 2018, 47, 1989–2000.
22. Gupta, R.K.; Yadav, S.K. Improved estimation of population mean using information on size of the sample. Am. J. Math. Stat. 2018, 8, 27–35.
23. Haq, A.; Shabbir, J. An improved class of estimators of finite population mean in simple random sampling using an auxiliary attribute. J. Stat. Theory Pract. 2018, 12, 282–289.
24. Pal, S.K.; Singh, P. Estimation of finite population mean using auxiliary information in presence of non-response. Commun. Stat.-Simul. Comput. 2018, 47, 143–165.
25. Zaman, T.; Kadilar, C. Novel family of exponential estimators using information of auxiliary attribute. J. Stat. Manag. Syst. 2019, 22, 1499–1509.
26. Hussain, S.; Ahmad, S.; Saleem, M.; Akhtar, S. Finite population distribution function estimation with dual use of auxiliary information under simple and stratified random sampling. PLoS ONE 2020, 15, e0239098.
27. Singh, G.N.; Khalid, M. Some imputation methods to compensate with non-response for estimation of population mean in two-occasion successive sampling. Commun. Stat.-Theory Methods 2020, 49, 3329–3351.
28. Ahmad, S.; Arslan, M.; Khan, A.; Shabbir, J. A generalized exponential-type estimator for population mean using auxiliary attributes. PLoS ONE 2021, 16, e0246947.
29. Bhushan, S.; Kumar, A.; Singh, S. Some efficient classes of estimators under stratified sampling. Commun. Stat.-Theory Methods 2021, 52, 1767–1796.
30. Mradula; Yadav, S.K.; Varshney, R.; Dube, M. Efficient estimation of population mean under stratified random sampling with a linear cost function. Commun. Stat.-Simul. Comput. 2021, 50, 4364–4387.
31. Ahmad, S.; Hussain, S.; Zahid, E.; Iftikhar, A.; Hussain, S.; Shabbir, J.; Aamir, M. A Simulation Study: Population Distribution Function Estimation Using Dual Auxiliary Information under Stratified Sampling Scheme. Math. Probl. Eng. 2022, 2022, 3263022.
32. Triveni, G.R.V.; Danish, F. Heuristical Approach for Optimizing Population Mean Using Ratio Estimator in Stratified Random Sampling. J. Reliab. Stat. Stud. 2023, 16, 137–152.
33. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat.-Theory Methods 2023, 52, 2610–2624.
34. Tiwari, K.K.; Bhougal, S.; Kumar, S. A general class of estimators in stratified random sampling. Commun. Stat.-Simul. Comput. 2023, 52, 442–452.
35. Triveni, G.R.V.; Danish, F. Exploring the dependability of Combined Ratio Estimators in Stratified Ranked Set Sampling: Insights from COVID-19 data. Alex. Eng. J. 2024, 92, 267–272.
36. Kocyigit, E.G.; Kadilar, C. Information theory approach to ranked set sampling and new sub-ratio estimators. Commun. Stat.-Theory Methods 2024, 53, 1331–1353.
37. MrSimple07. Student Exam Performance Prediction [Data Set]. Kaggle. 2024. Available online: https://www.kaggle.com/datasets/mrsimple07/student-exam-performance-prediction (accessed on 29 February 2024).
38. Kadilar, C.; Cingi, H. Ratio estimators in stratified random sampling. Biom. J. 2003, 45, 218–225.

Figure 1. PRE values of the estimators for Data Sets I and II.

Figure 2. MSE values of the estimators for Data Sets I and II.

Figure 3. PRE values of the estimators for Data Sets II and III.

Figure 4. MSE values of the estimators for Data Sets II and III.

Table 1. Several recognized estimators from the proposed class.

S. No	$n_{1}$	$n_{2}$	$a_{s t}$	$b_{s t}$	$c_{s t}$	$d_{s t}$	$α$	$γ$	$β$	Resulting Estimator
1.	1	0	0	0	0	0	0	0	0	${\hat{F}}_{S {R S}_{s t}} (y)$
2.	1	0	1	0	1	0	1	0	0	${\hat{F}}_{R e} (y)$
3.	1	0	1	0	1	0	-1	0	0	${\hat{F}}_{P e} (y)$
4.	1	0	1	0	1	0	0	1	0	${\hat{F}}_{P e} (y)$
5.	$n_{1}$	$n_{2}$	-	-	-	-	0	0	0	${\hat{F}}_{D e} (y)$
6.	1	0	-	-	-	-	0	0	1	${\hat{F}}_{R E e} (y)$
7.	$n_{1}$	$n_{2}$	$a_{s t}$	$b_{s t}$	$c_{s t}$	$d_{s t}$	$α$	$0$	$β$	${\hat{F}}_{t_{k}} (y)$

Table 2. Summary statistics for the Data Set-I.

h	$N_{h}$	$n_{h}$	$W_{h}$	$λ_{h}$	$F_{h} (y)$	$F_{h} (x)$	$C_{y h}$	$C_{x h}$	$R_{x y h}$	$S_{y x h}$	$W_{h} F_{h} (y)$	$W_{h} F_{h} (x)$
1	127	31	0.1375	0.0244	0.3543	0.3779	0.197955	0.21084	4.0597	0.0103	0.0487	0.0520
2	117	21	0.1267	0.039	0.4188	0.4872	0.575675	0.669312	10.8358	0.0356	0.0531	0.0617
3	103	29	0.1115	0.0248	0.4272	0.466	0.275157	0.300381	5.7725	0.0143	0.0476	0.0520
4	170	38	0.1841	0.0204	0.5765	0.6118	0.19553	0.20751	1.8412	0.0220	0.1061	0.1126
5	205	22	0.2221	0.0406	0.6146	0.6537	0.663306	0.705382	4.8587	0.0963	0.1365	0.1452
6	201	39	0.2177	0.0207	0.5025	0.3532	0.154403	0.108518	1.4113	0.0119	0.1094	0.0769

Table 3. A comparison of the MSEs and PREs of considered pre-existing estimators and our proposed estimator.

S. No	Estimator	MSE	PRE
1.	${\hat{F}}_{S {R S}_{s t}} (y)$	0.0488	100
2.	${\hat{F}}_{R e} (y)$	0.0928	52.58
3.	${\hat{F}}_{P e} (y)$	0.0972	50.21
4.	${\hat{F}}_{D e} (y)$	0.0414	117.87
5.	${\hat{F}}_{R E e} (y)$	0.0593	82.30
6.	${\hat{F}}_{t_{k}} (y)$	0.0223	218.83
7.	${\hat{F}}_{s t p} (y)$	0.0056	871.42

Table 4. MSEs of our proposed estimator.

$a_{s t}$	$b_{s t}$	$c_{s t}$	$d_{s t}$	$α$	$γ$	$β$	Estimator	MSE	PRE
0.8522	0.4868	0.8522	0.4868	0	−1	0	${\hat{F}}_{s t p 1} (y)$	0.0092	530.43
6.2793	1	6.2793	1	−1	0	0	${\hat{F}}_{s t p 2} (y)$	0.0085	574.12
6.2793	1	6.2793	1	0	−1	0	${\hat{F}}_{s t p 3} (y)$	0.0085	574.12
1	0.4868	1	0.4868	1	0	1	${\hat{F}}_{s t p 4} (y)$	0.0055	887.27
0.8522	0.4868	0.8522	0.4868	1	0	1	${\hat{F}}_{s t p 5} (y)$	0.0053	920.75
1	6.2793	1	6.2793	−1	−1	−1	${\hat{F}}_{s t p 6} (y)$	0.0052	938.46
0.8522	0.4868	0.8522	0.4868	0	1	1	${\hat{F}}_{s t p 7} (y)$	0.0052	938.46
1	6.2793	1	6.2793	0	0	−1	${\hat{F}}_{s t p 8} (y)$	0.0052	938.46
0.4868	0.8522	0.4868	0.8522	−1	1	−1	${\hat{F}}_{s t p 9} (y)$	0.0052	938.46
1	0.4868	1	0.4868	0	−1	0	${\hat{F}}_{s t p 10} (y)$	0.0044	1109.09
0.8522	1	0.8522	1	1	−1	0	${\hat{F}}_{s t p 11} (y)$	0.0042	1161.90
0.4868	1	0.4868	1	1	−2	0	${\hat{F}}_{s t p 12} (y)$	0.0039	1251.28
0.4868	6.2793	0.4868	6.2793	1	0	1	${\hat{F}}_{s t p 13} (y)$	0.0010	4800.00
1	0.8522	1	0.8522	1	0	1	${\hat{F}}_{s t p 14} (y)$	0.00088	5545.45
0.4868	6.2793	0.4868	6.2793	0	−1	1	${\hat{F}}_{s t p 15} (y)$	0.00086	5674.42

Table 5. Data statistics for Data Set-II.

h	$N_{h}$	$n_{h}$	$W_{h}$	$λ_{h}$	$F_{h} (y)$	$F_{h} (x)$	$S_{y h}$	$S_{x h}$	$S_{y x h}$	F(y)	F(x)
1	91	23	0.182	0.032	0.110	0.143	0.139	0.157	0.0103	0.0487	0.0520
2	82	19	0.164	0.040	0.110	0.146	0.132	0.151	0.0356	0.0531	0.0617
3	89	22	0.178	0.034	0.124	0.101	0.145	0.132	0.0143	0.0476	0.0520
4	89	21	0.178	0.036	0.112	0.112	0.139	0.139	0.0220	0.1061	0.1126
5	72	18	0.144	0.042	0.083	0.139	0.097	0.139	0.0963	0.1365	0.1452
6	77	18	0.154	0.043	0.117	0.117	0.132	0.132	0.0119	0.1094	0.0769

Table 6. MSE and PRE values of the estimators.

S. No	Estimator	MSE	PRE
1.	${\hat{F}}_{S {R S}_{s t}} (y)$	0.000110	100.000000
2.	${\hat{F}}_{R e} (y)$	0.000073	151.118987
3.	${\hat{F}}_{P e} (y)$	0.000341	32.373953
4.	${\hat{F}}_{D e} (y)$	0.000249	44.335351
5.	${\hat{F}}_{R E e} (y)$	0.000068	163.389178
6.	${\hat{F}}_{t_{k}} (y)$	0.000085	129.674930
7.	${\hat{F}}_{s t p} (y)$	0.000057	194.876102

Table 7. Data statistics for Data Set-III.

h	1	2	3	4	5	6
$N_{h}$	106	106	94	171	204	173
$n_{h}$	9	17	38	67	7	2
$W_{h}$	0.1241	0.1241	0.1101	0.2002	0.2389	0.2026
$λ_{h}$	0.1017	0.0494	0.0157	0.0091	0.138	0.4942
$F_{h} (y)$	0.5872	0.5189	0.3298	0.3684	0.4657	0.7052
$F_{h} (x)$	0.5472	0.566	0.3404	0.3801	0.4657	0.7225
${\bar{X}}_{h}$	24376	27422	72410	74365	26442	9844
$S_{y h}$	0.495	0.502	0.4727	0.4838	0.5	0.4573
$S_{x h}$	0.5001	0.4979	0.4764	0.4868	0.4965	0.4490
$S_{\bar{x} h}$	49189	5746	160757	285603	45403	18794
$R_{x y h}$	0.7722	0.8330	0.7854	0.7755	0.6750	0.7319
$R_{y \bar{x} h}$	−0.4470	−0.4370	−0.2957	−0.1848	−0.3929	−0.5598
$R_{x \bar{x} h}$	−0.4523	−0.4816	−0.3087	−0.1936	−0.4129	−0.6102

Table 8. MSE and PRE values of the estimators.

S. No	Estimator	MSE (Data Set-II)	PRE (Data Set-II)	MSE (Data Set-III)	PRE (Data Set-III)
1.	${\hat{F}}_{(p s)} (y)$	0.037	100.000	0.759	100.000
2.	${\hat{F}}_{R e (p s)} (y)$	0.018	205.556	0.383	198.018
3.	${\hat{F}}_{P e (p s)} (y)$	0.126	29.365	2.615	29.013
4.	${\hat{F}}_{D e (p s)} (y)$	0.016	231.250	0.341	222.647
5.	${\hat{F}}_{R E e (p s)} (y)$	0.019	194.737	0.386	196.642
6.	${\hat{F}}_{t_{k} (p s)} (y)$	0.018	205.556	0.370	204.822
7.	${\hat{F}}_{p s p} (y)$	0.016	231.250	0.331	229.412

In both Data Sets II and III, we can observe the efficiency of the proposed post-stratified estimator compared to other considered estimators in terms of MSE and PRE.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

