1. Introduction
The goal of survey sampling is to collect precise information on the characteristics of a population so as to maximize the efficiency of the estimators under investigation while reducing cost, time, and human effort. Many populations contain a few extreme values, and estimates of unknown population characteristics that do not take these values into account can be quite sensitive to them; in some cases, the results are inflated or underestimated. In particular, the efficiency of classical estimators, measured by the mean square error (MSE), tends to decrease when extreme values are present in the dataset. There may be a temptation to remove such observations from the sample. A better way to confront this challenge is to incorporate the information they carry into the estimation of the population characteristics. By applying a linear transformation based on the known minimum and maximum values of the auxiliary variable, ref. [1] provided two estimators. Such designs were not studied further until the work of [2], who applied the idea of using extreme values to various estimators of the finite population mean. Ref. [3] improved the estimation of the finite population mean in the presence of extreme values using a stratified random sampling strategy. For more details, see [4,5,6,7] and references therein.
Variability is difficult to control in applications, and the estimation of the finite population variance is therefore a significant concern. Researchers encounter this issue in biological and agricultural studies, where it makes the desired outcomes appear unpredictable. Careful use of auxiliary information can improve an estimator’s accuracy. A number of researchers have proposed different kinds of estimators of the finite population variance, including [8,9,10,11,12,13,14,15,16,17,18,19,20,21].
In this article, the extreme values of the auxiliary variable are retained in the data and used as auxiliary information. Motivated by [10], we propose an improved class of estimators of the finite population variance that uses the known extreme values of the auxiliary variable under a simple random sampling scheme.
The remainder of this article is organized as follows. Appendix A introduces the concepts and notation. Some existing estimators are described in Section 2. In Section 3, we provide an in-depth discussion of our proposed class of estimators. Section 4 provides the mathematical comparison. Section 5 presents a simulation study based on six artificial populations, generated from different probability distributions, to verify the theoretical results of Section 4, together with some numerical examples that illustrate these results. Finally, conclusions and ideas for further research are discussed in Section 6.
2. Existing Estimators
In this section, we review the existing estimators of the finite population variance that are later compared with our proposed class of estimators.
The usual estimator of the population variance is given by:
For the population variance, [12] proposed a ratio estimator, which is given by:
The bias and MSE of this estimator are expressed as follows:
and
The linear regression estimator proposed by [22] is defined as:
where
is the sample regression coefficient.
The MSE of this estimator is expressed as follows:
where
.
For the population variance under simple random sampling, [9] proposed an exponential ratio-type estimator, which is defined as:
The bias and MSE of this estimator are expressed as follows:
and
Using the kurtosis of an auxiliary variable in simple random sampling, ref. [18] suggested a ratio-type estimator, which is defined as:
The bias and MSE of this estimator are expressed as follows:
and
where
.
According to [13], some ratio estimators are defined as:
and
The bias and MSE of these estimators are expressed as follows:
and
where
.
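As a rough illustration of the estimators reviewed in this section, the sketch below computes the usual estimator, the ratio estimator of [12], the exponential ratio-type estimator of [9], and the kurtosis-based ratio-type estimator of [18] from a single sample. The expressions are the forms in which these estimators are commonly cited, which is an assumption about the exact expressions intended here; the function names and the toy data are hypothetical.

```python
import math


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def usual_estimator(y):
    """Usual estimator: the sample variance of the study variable."""
    return sample_variance(y)


def ratio_estimator(y, x, pop_var_x):
    """Ratio estimator, assumed form: s_y^2 * S_x^2 / s_x^2."""
    return sample_variance(y) * pop_var_x / sample_variance(x)


def exp_ratio_estimator(y, x, pop_var_x):
    """Exponential ratio-type estimator, assumed form:
    s_y^2 * exp((S_x^2 - s_x^2) / (S_x^2 + s_x^2))."""
    sx2 = sample_variance(x)
    return sample_variance(y) * math.exp((pop_var_x - sx2) / (pop_var_x + sx2))


def kurtosis_ratio_estimator(y, x, pop_var_x, pop_kurt_x):
    """Kurtosis-based ratio-type estimator, assumed form:
    s_y^2 * (S_x^2 + beta2(x)) / (s_x^2 + beta2(x))."""
    sx2 = sample_variance(x)
    return sample_variance(y) * (pop_var_x + pop_kurt_x) / (sx2 + pop_kurt_x)


if __name__ == "__main__":
    # Hypothetical sample and hypothetical known population quantities of X.
    x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_sample = [2.0, 4.0, 6.0, 8.0, 10.0]
    known_var_x, known_kurt_x = 3.0, 1.8
    print("usual      :", usual_estimator(y_sample))
    print("ratio      :", ratio_estimator(y_sample, x_sample, known_var_x))
    print("exp. ratio :", exp_ratio_estimator(y_sample, x_sample, known_var_x))
    print("kurtosis   :", kurtosis_ratio_estimator(y_sample, x_sample, known_var_x, known_kurt_x))
```

All three auxiliary-information estimators rescale the sample variance of Y by a factor that moves toward 1 as the sample variance of X approaches its known population value.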
3. Proposed Estimator
In this section, motivated by [10], we introduce an improved class of estimators that uses the known minimum and maximum values of the auxiliary variable to estimate the finite population variance. The proposed class of estimators is defined as:
where
represent known constant values, whereas the auxiliary variables’ parameters are
and
. We derive the various members of the suggested class of estimators from (18), which are listed in Table 1, where
Now, we rewrite (18) in terms of errors to obtain the bias and the MSE of the suggested estimator, that is:
where
and
.
Applying a first-order Taylor series approximation, we obtain:
Using (20), the bias of the suggested estimator is given by:
Squaring both sides of (20) and taking expectations, we obtain the first-order approximate MSE, which is given by the following equation:
The bias and MSE of each member of the class can be rewritten by substituting the corresponding known constant values into (21) and (22); after simplification, we obtain:
and
4. Mathematical Comparison
In this section, we compare the proposed class of estimators with the existing estimators reviewed in Section 2.
Condition (i): By (1) and (24):
Condition (ii): By (4) and (24):
Condition (iii): By (6) and (24):
Condition (iv): By (9) and (24):
Condition (v): By (12) and (24):
Condition (vi): By (17) and (24):
5. Numerical Comparison
In this section, to examine the performance of the proposed class of estimators, we compare the MSEs of different estimators using a simulation study and three real datasets.
5.1. Simulation Study
To validate the theoretical findings presented in Section 4, we follow the approach of [10] to carry out a simulation study. The auxiliary variable X is generated artificially for six distinct populations using the following probability distributions:
Population 1: ;
Population 2: ;
Population 3: ;
Population 4: ;
Population 5: ;
Population 6:
Subsequently, the variable of interest Y is calculated as:
where
represents the correlation coefficient between the study and auxiliary variables, and
denotes the error term.
Using R (version 4.4.0), we followed the steps below to compute the mean squared errors (MSEs) of the suggested estimators:
Step 1: Generate a population of size 1500 from the chosen probability distribution.
Step 2: From the population in Step 1, compute the population total as well as the minimum and maximum values of the auxiliary variable.
Step 3: Draw samples of different sizes from each population using simple random sampling without replacement (SRSWOR).
Step 4: For each sample size, compute the values of all the estimators covered in this article.
Step 5: Repeat Steps 3 and 4 60,000 times. The findings for the artificial populations are presented in Table 2, while Table 3 summarizes the results for the real datasets.
Step 6: Use the following formula to obtain the MSE of each estimator across all replications:
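Steps 1 to 6 can be sketched as a small Monte Carlo routine in which the empirical MSE of an estimator is the average, over replications, of its squared deviation from the true population variance of Y. The sketch below is written in Python with only the standard library rather than in R, and it makes several assumptions that are not taken from the text: a gamma distribution for X, the model Y = ρX + e with standard normal errors, ρ = 0.8, a sample size of 50, and 2000 replications instead of 60,000 to keep the run short. Only the usual estimator and the ratio estimator in its commonly cited form are included, and all function names are hypothetical.

```python
import random


def variance(values):
    """Variance with divisor len(values) - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def make_population(size=1500, rho=0.8, seed=1):
    """Step 1: generate X, then Y = rho * X + e (assumed model)."""
    rng = random.Random(seed)
    x = [rng.gammavariate(2.0, 5.0) for _ in range(size)]  # assumed distribution
    y = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]       # assumed N(0, 1) errors
    return x, y


def simulate_mse(x, y, n=50, replications=2000, seed=2):
    """Steps 2 to 6: empirical MSE of each estimator under SRSWOR."""
    rng = random.Random(seed)
    pop_var_x = variance(x)  # known auxiliary information
    pop_var_y = variance(y)  # target parameter
    # Step 2: extremes of X, available to estimators that use them
    # (not needed by the two estimators in this sketch).
    x_min, x_max = min(x), max(x)
    squared_error = {"usual": 0.0, "ratio": 0.0}
    for _ in range(replications):
        # Step 3: simple random sample without replacement.
        idx = rng.sample(range(len(x)), n)
        sy2 = variance([y[i] for i in idx])
        sx2 = variance([x[i] for i in idx])
        # Step 4: estimator values for this sample.
        estimates = {"usual": sy2, "ratio": sy2 * pop_var_x / sx2}
        for name, value in estimates.items():
            squared_error[name] += (value - pop_var_y) ** 2
    # Step 6: average squared error over all replications.
    return {name: total / replications for name, total in squared_error.items()}


if __name__ == "__main__":
    x_pop, y_pop = make_population()
    for name, mse in simulate_mse(x_pop, y_pop).items():
        print(f"MSE({name}) = {mse:.4f}")
```

Fixing the seeds makes a run reproducible; further estimators can be added to the `estimates` dictionary without changing the rest of the loop.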
5.2. Numerical Examples
To check the effectiveness of the proposed estimator, we use three real datasets to compare the mean squared errors (MSEs) of various estimators. The descriptions and summary statistics of the datasets are provided below:
Y: The total enrollment of students in 2012 and
X: Government elementary and secondary schools in 2012. The following are the summary statistics:
Y: Departmental employment levels in 2012 and
X: Number of factories the departments registered in 2012. The following are the summary statistics:
Data 3. (Source: [23], p. 24)
Y: Food expenses related to the family’s employment and
X: Families’ weekly income. The following are the summary statistics:
To evaluate the performance of the proposed class of estimators, we employed simulation studies and three real datasets. The MSE criterion is applied to compare the various estimators. Table 2 presents the MSE values of the proposed and existing estimators derived from the simulation study, whereas Table 3 presents the results for the real datasets. Furthermore, various diagrams are used to present the MSEs of the proposed and existing estimators for both the simulation studies and the real datasets. The following are some general findings:
For all simulation scenarios and real datasets, Table 2 and Table 3 show that the MSE values of each proposed estimator are lower than those of the existing estimators described in the literature. This confirms the superior performance of the proposed estimators.
The MSEs of all proposed estimators are lower than those of the existing estimators for both the simulation studies and the real datasets, as illustrated by the downward direction of the lines in Figure 1, Figure 2 and Figure 3. Consequently, there exists an inverse relationship between the value of
for both the proposed and existing estimators.
6. Conclusions
In this article, we introduced a class of efficient estimators of the finite population variance that use the known minimum and maximum values of the auxiliary variable. To compare the proposed estimators with existing ones, we presented theoretical conditions in Section 4 under which the proposed estimators are more efficient. To validate these conditions, we conducted a simulation study and analyzed several empirical datasets. The results in Table 2 indicate that the proposed estimators consistently outperform the existing estimators in terms of MSE. This observation is further supported by the empirical results in Table 3, which agree with the theoretical findings in Section 4. Based on both the simulation and empirical results, we conclude that the proposed estimators
exhibit greater efficiency compared to the other considered estimators. Among these proposed estimators,
is particularly preferred because it has the lowest MSE.
The properties of the proposed class of estimators were analyzed only under a simple random sampling scheme. New estimators could also be developed for two-phase sampling, and our results may help in finding more efficient estimators that attain the least MSE. Further research on this topic can help improve the estimators’ efficiency.
Author Contributions
Conceptualization, U.D. and J.W.; Methodology, U.D.; Software, U.D.; Validation, J.W.; Formal analysis, J.W. and O.A.; Investigation, J.W.; Resources, J.W. and O.A.; Data curation, U.D. and J.W.; Writing—original draft, U.D.; Writing—review & editing, U.D.; Visualization, U.D. and O.A.; Supervision, J.W.; Project administration, U.D. and O.A.; Funding acquisition, O.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Social Science Foundation of China under grant number 20BTJ044.
Data Availability Statement
The real data are secondary, and their sources are given in Section 5.2; the simulated data were generated using R software.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Consider a finite population of $N$ units, where the values of the auxiliary variable X and the study variable Y for the i-th unit are denoted by $X_i$ and $Y_i$, respectively. Let the population means of the auxiliary and study variables be $\bar{X}$ and $\bar{Y}$, and let the corresponding population variances be $S_x^2$ and $S_y^2$, respectively. The population correlation coefficient between Y and X is calculated as $\rho_{yx} = S_{yx}/(S_y S_x)$, where $S_{yx}$ is the population covariance.
We select a random sample of n units from the population by simple random sampling without replacement in order to estimate the unknown population variance $S_y^2$. Let the sample means of the auxiliary and study variables be $\bar{x}$ and $\bar{y}$, and let the corresponding sample variances be $s_x^2$ and $s_y^2$, respectively. Additionally, the coefficients of variation of the auxiliary and study variables are $C_x = S_x/\bar{X}$ and $C_y = S_y/\bar{Y}$, respectively.
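The population quantities described above can be computed directly once the full population is available. The following minimal sketch uses the divisor N − 1 for the population variances and covariance, which is an assumption about the convention intended here; the function name, the dictionary keys, and the toy data are hypothetical.

```python
import math


def population_summary(x, y):
    """Descriptive quantities of a finite population of paired (x, y) values."""
    size = len(x)
    mean_x = sum(x) / size
    mean_y = sum(y) / size
    # Variances and covariance with divisor N - 1 (assumed convention).
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (size - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (size - 1)
    cov_yx = sum((yi - mean_y) * (xi - mean_x) for xi, yi in zip(x, y)) / (size - 1)
    return {
        "N": size,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "cv_x": math.sqrt(var_x) / mean_x,          # coefficient of variation of X
        "cv_y": math.sqrt(var_y) / mean_y,          # coefficient of variation of Y
        "rho_yx": cov_yx / math.sqrt(var_x * var_y),  # correlation between Y and X
        "x_min": min(x),                            # known extremes of X
        "x_max": max(x),
    }


if __name__ == "__main__":
    # Hypothetical toy population.
    x_pop = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pop = [2.0, 4.0, 6.0, 8.0, 10.0]
    for key, value in population_summary(x_pop, y_pop).items():
        print(key, value)
```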
To derive the biases and mean square errors for various estimators, we define the following terms:
such that
for
.
where
,
,
,
.
Also:
where
where
and
are the population coefficients of kurtosis.
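For orientation, the first-order error terms that are conventionally used for variance estimators under simple random sampling without replacement can be written as below. This notation is an assumption drawn from common practice in this literature rather than a restatement of the definitions above, so the symbols $e_0$, $e_1$, $\theta$, $\lambda_{rs}$, and $\mu_{rs}$ should be read as illustrative. Here $S_y^2$, $S_x^2$ are the population variances, $s_y^2$, $s_x^2$ the sample variances, and $n$, $N$ the sample and population sizes.

```latex
\begin{align*}
e_0 &= \frac{s_y^2 - S_y^2}{S_y^2}, \qquad
e_1 = \frac{s_x^2 - S_x^2}{S_x^2}, \qquad
E(e_0) = E(e_1) = 0, \\
E(e_0^2) &= \theta\,\lambda_{40}^{*}, \qquad
E(e_1^2) = \theta\,\lambda_{04}^{*}, \qquad
E(e_0 e_1) = \theta\,\lambda_{22}^{*}, \\
\lambda_{rs}^{*} &= \lambda_{rs} - 1, \qquad
\theta = \frac{1}{n} - \frac{1}{N}, \\
\lambda_{rs} &= \frac{\mu_{rs}}{\mu_{20}^{r/2}\,\mu_{02}^{s/2}}, \qquad
\mu_{rs} = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^{r}\,(X_i - \bar{X})^{s}.
\end{align*}
```

Under this convention, $\lambda_{40} = \beta_2(y)$ and $\lambda_{04} = \beta_2(x)$ are the coefficients of kurtosis. To first order, the usual estimator $s_y^2$ then has variance $\theta S_y^4 \lambda_{40}^{*}$, and the ratio estimator $s_y^2 S_x^2 / s_x^2$ has $\mathrm{MSE} \approx S_y^4\,E(e_0 - e_1)^2 = \theta S_y^4(\lambda_{40}^{*} + \lambda_{04}^{*} - 2\lambda_{22}^{*})$.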
References
1. Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. 1995, 57, 93–102.
2. Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
3. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
4. Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
5. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
6. Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
7. Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
8. Alomair, M.A.; Gardazi, S.A.H.S. Hybrid class of robust type estimators for variance estimation using mean and variance of auxiliary variable. Heliyon 2024, 10, E31039.
9. Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
10. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
11. Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
12. Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
13. Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
14. Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
15. Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 171–181.
16. Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
17. Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
18. Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
19. Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
20. Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
21. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
22. Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
23. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).