Article

An Efficient Ratio-Cum-Exponential Estimator for Estimating the Population Distribution Function in the Existence of Non-Response Using an SRS Design

by Ayesha Khalid 1, Aamir Sanaullah 1,*, Mohammed M. A. Almazah 2,3 and Fuad S. Al-Duais 4,5
1 Department of Statistics, COMSATS University Islamabad, Lahore Campus, Islamabad 45550, Pakistan
2 Department of Mathematics, College of Science and Arts (Muhyil), King Khalid University, Muhyil 61421, Saudi Arabia
3 Department of Mathematics and Computer, College of Sciences, Ibb University, Ibb 70270, Yemen
4 Mathematics Department, College of Humanities and Science, Prince Sattam Bin Abdulaziz University, Al Aflaj 16278, Saudi Arabia
5 Administrative Department, Administrative Science College, Thamar University, Thamar 87246, Yemen
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(6), 1312; https://doi.org/10.3390/math11061312
Submission received: 22 January 2023 / Revised: 19 February 2023 / Accepted: 3 March 2023 / Published: 8 March 2023
(This article belongs to the Special Issue Survey Statistics and Survey Sampling: Challenges and Opportunities)

Abstract

To gain insight into various phenomena of interest, cumulative distribution functions (CDFs) can be used to analyze survey data. The purpose of this study was to present an efficient ratio-cum-exponential estimator for estimating a population CDF using auxiliary information under two scenarios of non-response. Expressions for the bias and mean squared error (MSE) were derived up to the first order of approximation. The proposed estimator was compared, theoretically and empirically, with the modified estimators. Under the specific conditions derived, the proposed estimator was found to be better than the modified estimators based on the percent-relative efficiency (PRE) and MSE criteria.

1. Introduction

It is a well-accepted fact in survey sampling that, under certain conditions, auxiliary information can provide precise estimates of population parameters such as the mean, median, standard deviation, totals, quantiles, and the cumulative distribution function (CDF). If a strong linear correlation is observed between the study variable Y and the auxiliary variable X, researchers often use a traditional estimator, such as the ratio, product, or regression estimator, to estimate a population mean. The literature includes a significant amount of work on estimating different parameters of a population; for example, see [1,2,3,4,5,6,7]. These studies propose improved ratio-, product-, and regression-type estimators for estimating the mean and variance of a population using auxiliary variables.
Non-response is an issue that cannot be avoided in complex sample surveys, and it can be found in surveys that involve human responses. Language problems, inaccurate return addresses, a lack of information, and the sensitivity of the survey question(s), among many other reasons, can play a part in causing this issue. For example, an individual may be reluctant to provide salary information. Non-response in sample surveys is more prevalent and pervasive in postal surveys than in special canvasser surveys. Therefore, the term non-response refers to the inability to measure part of the units in a sample survey. Non-response can compromise estimator accuracy and increase its bias.
To cope with the non-response problem, several measures have been proposed by various researchers, such as the weighting adjustment approach suggested by Oh and Scheuren [8]; the imputation techniques provided by Kalton [9] and Kalton and Maligalig [10]; and the approach of sub-sampling non-respondents recommended by Hansen and Hurwitz [11]. Various researchers have attempted to reduce the bias and to improve the effectiveness of the estimators of a population mean in the existence of non-response. Some significant references on estimating the population mean utilizing auxiliary variables in the existence of non-response include [12,13,14,15,16,17].
Although there is extensive literature on estimating different parameters, the auxiliary information-based estimation of a population CDF has received less emphasis. It is becoming increasingly significant in survey sampling, where statisticians are frequently interested in the proportion of a variable's domain under examination. For example, policymakers may want to know whether the percentage of educated women is higher than or equal to 50% in Pakistan, or the proportion of individuals having a weekly income of 100 USD or more in a developing country. Similarly, a psychiatrist may be interested in knowing what proportion of children spend one or more hours on their phones. Several studies have revealed that spending more than an hour a day on phones or smart devices has a significant relation with psychological problems among children, such as anxiety, loneliness, and depression.
Hence, it has become necessary to estimate the finite population CDF. Accordingly, Singh et al. [18], Muñoz et al. [19], Yaqub and Shabbir [20,21], Hussain et al. [22], and Hussain et al. [23] have worked on estimating the population CDF using auxiliary information.

2. Sampling Design and Notations

2.1. Notations for the CDF under SRS

Consider a finite population $U = \{U_1, U_2, U_3, \ldots, U_N\}$ of $N$ distinct units, and let $(y_i, x_i)$ be the values of the research variable $Y$ and the auxiliary variable $X$, respectively, on the $i$th unit $U_i$. For every index $t_y$ and $t_x$, where $(t_y, t_x) \in U$, the population CDFs of $Y$ and $X$ are defined, respectively, by
$$F_Y(t_y) = \frac{1}{N}\sum_{i=1}^{N} I(Y_i \le t_y), \qquad F_X(t_x) = \frac{1}{N}\sum_{i=1}^{N} I(X_i \le t_x),$$
where $I(\cdot)$ is an indicator variable. Each CDF is an average of Bernoulli-distributed variables, such that
$$I(Y_i \le t_y) = \begin{cases} 1 & \text{for } Y_i \le t_y, \\ 0 & \text{for } Y_i > t_y. \end{cases}$$
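The definition above can be illustrated with a minimal sketch that computes a population CDF as the mean of indicator values. The function name `population_cdf` and the five-unit population are hypothetical and serve only as an illustration.

```python
def population_cdf(values, t):
    """Share of population units whose value is <= t, i.e. the mean of I(value <= t)."""
    indicators = [1 if v <= t else 0 for v in values]
    return sum(indicators) / len(indicators)


# Hypothetical population of N = 5 units (illustrative values only).
y = [3.0, 7.0, 1.0, 9.0, 5.0]  # study variable Y
x = [2.0, 6.0, 2.0, 8.0, 4.0]  # auxiliary variable X

F_Y = population_cdf(y, 5.0)  # threshold t_y = 5
F_X = population_cdf(x, 4.0)  # threshold t_x = 4
print(F_Y, F_X)
```

Because each indicator is a 0/1 variable, the same helper works for any threshold, and the result always lies between 0 and 1.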
Theorem 1.
In SRS with a sample of size $n$, the sample CDF $\hat{F}_y(t_y) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le t_y)$ is based on the count $\sum_{i=1}^{n} I(y_i \le t_y)$, which is a hypergeometrically distributed variable. Its expectation $E(\cdot)$, variance $V(\cdot)$, and covariance with $\hat{F}_x(t_x)$ are, respectively,
$$E\big(\hat{F}_y(t_y)\big) = F_Y(t_y), \quad V\big(\hat{F}_y(t_y)\big) = \frac{N-n}{n(N-1)}\big[F_Y(t_y)\big(1 - F_Y(t_y)\big)\big], \quad \operatorname{Cov}\big(\hat{F}_y(t_y), \hat{F}_x(t_x)\big) = \frac{N-n}{Nn}\cdot\frac{N_{11}N_{22} - N_{12}N_{21}}{N^2},$$
where we have the following:
$N_{11}$ = the number of units in the population with $Y_i \le t_y$ and $X_i \le t_x$;
$N_{12}$ = the number of units in the population with $Y_i \le t_y$ and $X_i > t_x$;
$N_{21}$ = the number of units in the population with $Y_i > t_y$ and $X_i \le t_x$; and
$N_{22}$ = the number of units in the population with $Y_i > t_y$ and $X_i > t_x$.
Theorem 1 can be proved easily along the lines of García et al. [24].
Lemma 1.
For a large sample size $n$, the variance of $\hat{F}_y(t_y)$ is defined as
$$V\big(\hat{F}_y(t_y)\big) = \frac{N-n}{Nn}\big[F_Y(t_y)\big(1 - F_Y(t_y)\big)\big].$$
Let us consider that $S_{F_Y(t_y)}^2 = F_Y(t_y)\big(1 - F_Y(t_y)\big)$ and $S_{F_X(t_x)}^2 = F_X(t_x)\big(1 - F_X(t_x)\big)$ are the population variances of $I(Y \le t_y)$ and $I(X \le t_x)$, respectively.
Let $S_{(F_Y(t_y), F_X(t_x))} = \operatorname{Cov}\big(F_Y(t_y), F_X(t_x)\big)$ be the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$; then we have the following:
$C_{F_Y(t_y)} = S_{F_Y(t_y)} / F_Y(t_y)$ is the population coefficient of variation of $I(Y \le t_y)$, and
$C_{F_Y(t_y)}^2 = \big(1 - F_Y(t_y)\big) / F_Y(t_y)$;
$C_{F_X(t_x)} = S_{F_X(t_x)} / F_X(t_x)$ is the population coefficient of variation of $I(X \le t_x)$, and
$C_{F_X(t_x)}^2 = \big(1 - F_X(t_x)\big) / F_X(t_x)$;
$\rho_{(F_Y(t_y), F_X(t_x))} = S_{(F_Y(t_y), F_X(t_x))} / \big(S_{F_Y(t_y)}\, S_{F_X(t_x)}\big)$ is the phi-coefficient of correlation between $I(Y \le t_y)$ and $I(X \le t_x)$.
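As a rough illustration of this notation, the sketch below tabulates the four cell counts and derives the indicator covariance, the coefficients of variation, and the phi coefficient from them. The helper name `two_by_two_summary` and the six-unit population are assumptions made for illustration, and the code presumes that both CDF values lie strictly between 0 and 1.

```python
import math


def two_by_two_summary(y, x, t_y, t_x):
    """Cell counts N11..N22 and the indicator-based summary measures."""
    N = len(y)
    iy = [v <= t_y for v in y]  # I(Y_i <= t_y)
    ix = [v <= t_x for v in x]  # I(X_i <= t_x)
    n11 = sum(a and b for a, b in zip(iy, ix))
    n12 = sum(a and not b for a, b in zip(iy, ix))
    n21 = sum((not a) and b for a, b in zip(iy, ix))
    n22 = sum((not a) and (not b) for a, b in zip(iy, ix))
    F_y = sum(iy) / N
    F_x = sum(ix) / N
    s_y = math.sqrt(F_y * (1 - F_y))  # population SD of I(Y <= t_y)
    s_x = math.sqrt(F_x * (1 - F_x))  # population SD of I(X <= t_x)
    cov = (n11 * n22 - n12 * n21) / N**2
    return {
        "counts": (n11, n12, n21, n22),
        "cov": cov,
        "C_y": s_y / F_y,
        "C_x": s_x / F_x,
        "phi": cov / (s_y * s_x),
    }


# Hypothetical population of N = 6 units.
y = [3, 7, 1, 9, 5, 8]
x = [2, 6, 2, 8, 4, 3]
summary = two_by_two_summary(y, x, t_y=5, t_x=4)
print(summary)
```

The count-based covariance agrees with the usual definition, the mean of the product of the two indicators minus the product of their means, which is one way to sanity-check the cell counts.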

2.2. Notation for the CDF with Non-Response under an SRS Design

Consider the case where a finite population of $N$ units is divided into two groups: a respondents' group of $N_1$ units and a non-respondents' group of $N_2$ units, where $N = N_1 + N_2$. Suppose that a sample of size $n$ is drawn from the target population using SRSWOR, and that only $n_1$ out of the $n$ units respond, while $n_2 = n - n_1$ units do not. A sub-sample, also referred to as the second-phase sample, of $q = n_2 / k$ units, where $k > 1$, is then taken from the $n_2$ non-respondents for interviewing. This way of obtaining responses from non-respondents is also referred to as the canvasser method. Hence, the total number of responses collected from the $n$ units is $(n_1 + q)$, and only $(n_2 - q)$ units, which are not selected in the second-phase sample, remain as non-respondents. Following Hansen and Hurwitz [11], a population CDF in the existence of non-response can be defined as follows:
$$F_Y^{*}(t_y) = W_1 F_{Y(1)}(t_y) + W_2 F_{Y(2)}(t_y).$$
Similarly, let
$$F_X^{*}(t_x) = W_1 F_{X(1)}(t_x) + W_2 F_{X(2)}(t_x),$$
where $W_j = N_j / N$ and $j = 1, 2$. In addition, we have the following:
$F_{Y(1)}(t_y) = \sum_{i=1}^{N_1} I(Y_i \le t_y) / N_1$ is the population CDF of $I(Y \le t_y)$ for the response group;
$F_{Y(2)}(t_y) = \sum_{i=1}^{N_2} I(Y_i \le t_y) / N_2$ is the population CDF of $I(Y \le t_y)$ for the non-response group;
$F_{X(1)}(t_x) = \sum_{i=1}^{N_1} I(X_i \le t_x) / N_1$ is the population CDF of $I(X \le t_x)$ for the response group;
$F_{X(2)}(t_x) = \sum_{i=1}^{N_2} I(X_i \le t_x) / N_2$ is the population CDF of $I(X \le t_x)$ for the non-response group.
Yaqub and Shabbir [20] briefly studied the unbiased estimator of the population CDF of the research variable when there was non-response in the sample.
Let the sample CDFs $\hat{F}_y^{*}(t_y)$ and $\hat{F}_x^{*}(t_x)$ be the unbiased estimators of the population CDFs $F_Y^{*}(t_y)$ and $F_X^{*}(t_x)$, based on $n$ units in the existence of non-response.
By using the Hansen and Hurwitz [11] approach, $\hat{F}_y^{*}(t_y)$ is defined as
$$\hat{F}_y^{*}(t_y) = \omega_1 \hat{F}_{y(1)}(t_y) + \omega_2 \hat{F}_{y(2q)}(t_y),$$
where $\omega_1 = n_1 / n$ and $\omega_2 = n_2 / n$. In addition, we have the following:
$\hat{F}_{y(1)}(t_y) = \sum_{i=1}^{n_1} I(Y_i \le t_y) / n_1$ denotes the sample CDF based on the $n_1$ responding units out of $n$ units;
$\hat{F}_{y(2q)}(t_y) = \sum_{i=1}^{q} I(Y_i \le t_y) / q$ denotes the sample CDF based on the $q$ responding units out of the $n_2$ non-response units.
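The Hansen and Hurwitz weighting can be sketched in a few lines. This is an illustrative sketch only: it assumes the weights are $n_1/n$ and $n_2/n$ as in the standard sub-sampling approach, and the function name `hansen_hurwitz_cdf` and the data are hypothetical.

```python
def hansen_hurwitz_cdf(respondents, followup, n2, t):
    """Weighted sample CDF at t: (n1/n) * F1_hat + (n2/n) * F2q_hat.

    respondents: values observed from the n1 first-phase respondents
    followup:    values observed from the q sub-sampled non-respondents
    n2:          number of first-phase non-respondents (n = n1 + n2)
    """
    n1 = len(respondents)
    n = n1 + n2
    F1_hat = sum(v <= t for v in respondents) / n1
    F2q_hat = sum(v <= t for v in followup) / len(followup)
    return (n1 / n) * F1_hat + (n2 / n) * F2q_hat


# Hypothetical sample: n = 10, n1 = 6 respondents, n2 = 4 non-respondents,
# k = 2, so q = n2 / k = 2 non-respondents are followed up.
respondents = [2, 4, 6, 8, 10, 12]
followup = [3, 5]
estimate = hansen_hurwitz_cdf(respondents, followup, n2=4, t=6)
print(estimate)
```

The sub-sample CDF is weighted by the full non-respondent share $n_2/n$ rather than by $q/n$, which is what lets the $q$ followed-up units stand in for all $n_2$ non-respondents.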
Theorem 2.
The mean and variance of $\hat{F}_y^{*}(t_y)$ are defined as follows:
  • $\hat{F}_y^{*}(t_y)$ is an unbiased estimator of $F_Y(t_y)$, i.e., $E\big(\hat{F}_y^{*}(t_y)\big) = F_Y(t_y)$;
  • The variance of $\hat{F}_y^{*}(t_y)$ is defined by
    $$\operatorname{Var}\big(\hat{F}_y^{*}(t_y)\big) = \delta_1 F_{Y(1)}(t_y)\big(1 - F_{Y(1)}(t_y)\big) + \delta_2 F_{Y(2)}(t_y)\big(1 - F_{Y(2)}(t_y)\big),$$
where $\delta_1 = (N - n)/(Nn)$ and $\delta_2 = W_2 (k - 1)/n$. Theorem 2 can be proved along the lines of [20].
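To make the two variance components concrete, the following sketch evaluates the expression in Theorem 2. It assumes the standard Hansen and Hurwitz constants $\delta_1 = (N-n)/(Nn)$ and $\delta_2 = W_2(k-1)/n$; the function name and the numerical inputs are hypothetical.

```python
def hh_cdf_variance(N, n, N2, k, F1, F2):
    """delta1 * F1 * (1 - F1) + delta2 * F2 * (1 - F2) for the weighted sample CDF."""
    delta1 = (N - n) / (N * n)   # first-phase sampling component
    W2 = N2 / N                  # population share of non-respondents
    delta2 = W2 * (k - 1) / n    # extra component due to sub-sampling
    return delta1 * F1 * (1 - F1) + delta2 * F2 * (1 - F2)


# Hypothetical design: N = 1000, n = 100, N2 = 250 non-respondents, k = 2.
variance = hh_cdf_variance(N=1000, n=100, N2=250, k=2, F1=0.5, F2=0.4)
print(variance)
```

As $k$ approaches 1, so that every non-respondent is followed up, the second component vanishes and only the first-phase term remains.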
Similarly, for the supplemental variable $X$, the estimator $\hat{F}_x^{*}(t_x)$ is defined as
$$\hat{F}_x^{*}(t_x) = \omega_1 \hat{F}_{x(1)}(t_x) + \omega_2 \hat{F}_{x(2q)}(t_x).$$
In addition, we have the following:
$\hat{F}_{x(1)}(t_x) = \sum_{i=1}^{n_1} I(X_i \le t_x) / n_1$ denotes the sample CDF based on the $n_1$ responding units out of $n$ units;
$\hat{F}_{x(2q)}(t_x) = \sum_{i=1}^{q} I(X_i \le t_x) / q$ denotes the sample CDF based on the $q$ responding units out of the $n_2$ non-response units.
Lemma 2.
On the lines of Theorem 2, the mean and variance of $\hat{F}_x^{*}(t_x)$ are defined as follows:
  • $\hat{F}_x^{*}(t_x)$ is an unbiased estimator of $F_X(t_x)$, i.e., $E\big(\hat{F}_x^{*}(t_x)\big) = F_X(t_x)$;
  • The variance of $\hat{F}_x^{*}(t_x)$ is defined by
    $$\operatorname{Var}\big(\hat{F}_x^{*}(t_x)\big) = \delta_1 F_{X(1)}(t_x)\big(1 - F_{X(1)}(t_x)\big) + \delta_2 F_{X(2)}(t_x)\big(1 - F_{X(2)}(t_x)\big);$$
  • The covariance between $\hat{F}_y^{*}(t_y)$ and $\hat{F}_x^{*}(t_x)$ is given by
    $$\operatorname{Cov}\big(\hat{F}_y^{*}(t_y), \hat{F}_x^{*}(t_x)\big) = \delta_1 \frac{N_{11}N_{22} - N_{12}N_{21}}{N^2} + \delta_2 \frac{N_{11(2)}N_{22(2)} - N_{12(2)}N_{21(2)}}{N_{(2)}^2}.$$
    (For the proof see [20]).
In addition, let us define the following:
$S_{F_{Y(1)}(t_y)}^2 = F_{Y(1)}(t_y)\big(1 - F_{Y(1)}(t_y)\big)$ is the population variance of $I(Y \le t_y)$ for the response group;
$S_{F_{Y(2)}(t_y)}^2 = F_{Y(2)}(t_y)\big(1 - F_{Y(2)}(t_y)\big)$ is the population variance of $I(Y \le t_y)$ for the non-response group;
$S_{F_{X(1)}(t_x)}^2 = F_{X(1)}(t_x)\big(1 - F_{X(1)}(t_x)\big)$ is the population variance of $I(X \le t_x)$ for the response group;
$S_{F_{X(2)}(t_x)}^2 = F_{X(2)}(t_x)\big(1 - F_{X(2)}(t_x)\big)$ is the population variance of $I(X \le t_x)$ for the non-response group;
$C_{F_{Y(1)}(t_y)} = S_{F_{Y(1)}(t_y)} / F_{Y(1)}(t_y)$ is the population coefficient of variation of $I(Y \le t_y)$ for the response group;
$C_{F_{Y(2)}(t_y)} = S_{F_{Y(2)}(t_y)} / F_{Y(2)}(t_y)$ is the population coefficient of variation of $I(Y \le t_y)$ for the non-response group;
$C_{F_{X(1)}(t_x)} = S_{F_{X(1)}(t_x)} / F_{X(1)}(t_x)$ is the population coefficient of variation of $I(X \le t_x)$ for the response group;
$C_{F_{X(2)}(t_x)} = S_{F_{X(2)}(t_x)} / F_{X(2)}(t_x)$ is the population coefficient of variation of $I(X \le t_x)$ for the non-response group;
$S_{F_{Y(1)}(t_y) F_{X(1)}(t_x)} = \operatorname{Cov}\big(F_{Y(1)}(t_y), F_{X(1)}(t_x)\big)$ is the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$ for the response group;
$S_{F_{Y(2)}(t_y) F_{X(2)}(t_x)} = \operatorname{Cov}\big(F_{Y(2)}(t_y), F_{X(2)}(t_x)\big)$ is the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$ for the non-response group.
The following relative error terms are taken into account to determine the biases and MSEs of the existing and proposed estimators:
$$e_0^{*} = \frac{\hat{F}_y^{*}(t_y) - F_Y(t_y)}{F_Y(t_y)}, \quad e_1^{*} = \frac{\hat{F}_x^{*}(t_x) - F_X(t_x)}{F_X(t_x)}, \quad \text{and} \quad e_2 = \frac{\hat{F}_x(t_x) - F_X(t_x)}{F_X(t_x)},$$
such that $E(e_i^{*}) = 0 = E(e_2)$ for $i = 0, 1$, where $E(\cdot)$ is mathematical expectation. Utilizing approximation up to the first order we have the following:
$$E(e_0^{*2}) \cong \frac{1}{F_Y^2(t_y)}\Big[\delta_1 F_{Y(1)}(t_y)\big(1 - F_{Y(1)}(t_y)\big) + \delta_2 F_{Y(2)}(t_y)\big(1 - F_{Y(2)}(t_y)\big)\Big] = \frac{1}{F_Y^2(t_y)}\Big[\delta_1 S_{F_{Y(1)}(t_y)}^2 + \delta_2 S_{F_{Y(2)}(t_y)}^2\Big] = V_{200}^{*};$$
$$E(e_1^{*2}) \cong \frac{1}{F_X^2(t_x)}\Big[\delta_1 F_{X(1)}(t_x)\big(1 - F_{X(1)}(t_x)\big) + \delta_2 F_{X(2)}(t_x)\big(1 - F_{X(2)}(t_x)\big)\Big] = \frac{1}{F_X^2(t_x)}\Big[\delta_1 S_{F_{X(1)}(t_x)}^2 + \delta_2 S_{F_{X(2)}(t_x)}^2\Big] = V_{020}^{*};$$
$$E(e_0^{*} e_1^{*}) \cong \frac{1}{F_Y(t_y) F_X(t_x)}\left[\delta_1 \frac{N_{11}N_{22} - N_{12}N_{21}}{N^2} + \delta_2 \frac{N_{11(2)}N_{22(2)} - N_{12(2)}N_{21(2)}}{N_{(2)}^2}\right] = \frac{1}{F_Y(t_y) F_X(t_x)}\Big[\delta_1 S_{F_{Y(1)}(t_y) F_{X(1)}(t_x)} + \delta_2 S_{F_{Y(2)}(t_y) F_{X(2)}(t_x)}\Big] = V_{110}^{*};$$
$$E(e_2^{2}) \cong \frac{1}{F_X^2(t_x)}\Big[\delta_1 F_X(t_x)\big(1 - F_X(t_x)\big)\Big] = V_{002}; \quad \text{and} \quad E(e_0^{*} e_2) \cong \frac{1}{F_Y(t_y) F_X(t_x)}\Big[\delta_1 S_{F_{Y(1)}(t_y) F_{X(1)}(t_x)}\Big] = V_{101}^{*}.$$
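The five moment terms can be collected in one helper, which is convenient for evaluating the bias and MSE expressions that follow. This is a sketch under the reading of the expressions given here; the function name `v_terms`, its argument names, and the numerical inputs are all hypothetical.

```python
def v_terms(F_Y, F_X, delta1, delta2, S2_y1, S2_y2, S2_x1, S2_x2, S_yx1, S_yx2):
    """First-order moments of the relative errors e0*, e1*, and e2."""
    return {
        "V200": (delta1 * S2_y1 + delta2 * S2_y2) / F_Y**2,        # E(e0*^2)
        "V020": (delta1 * S2_x1 + delta2 * S2_x2) / F_X**2,        # E(e1*^2)
        "V110": (delta1 * S_yx1 + delta2 * S_yx2) / (F_Y * F_X),   # E(e0* e1*)
        "V002": delta1 * F_X * (1 - F_X) / F_X**2,                 # E(e2^2)
        "V101": delta1 * S_yx1 / (F_Y * F_X),                      # E(e0* e2)
    }


# Hypothetical population quantities (illustrative values only).
V = v_terms(
    F_Y=0.5, F_X=0.5, delta1=0.009, delta2=0.0025,
    S2_y1=0.25, S2_y2=0.24, S2_x1=0.25, S2_x2=0.21,
    S_yx1=0.15, S_yx2=0.10,
)
print(V)
```

The terms carrying an asterisk include the $\delta_2$ contribution from sub-sampling, whereas $V_{002}$ depends only on the first-phase sample because the auxiliary variable is fully observed in that case.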
There are two scenarios under consideration in the existence of non-response:
  • Scenario I refers to non-response on both the study and auxiliary variables, whereas
  • Scenario II solely refers to non-response on the study variable.

3. Some Modified Estimators for the CDF under Non-Response

3.1. Modified Estimators in Scenario I

In this section, some existing estimators for population mean estimation are modified for the estimation of a population CDF using SRS under Scenario I, i.e., non-response is present in both the study and the auxiliary variables. Furthermore, the biases and MSEs of the modified estimators are derived to the first order of approximation.
1. The Cochran [25] ratio estimator is modified for $F_Y^{*}(t_y)$ and given by
$$\hat{F}_R^{*}(t_y) = \frac{\hat{F}_y^{*}(t_y)}{\hat{F}_x^{*}(t_x)} F_X(t_x). \tag{4}$$
  To the first order of approximation, the bias and MSE of Equation (4) are
$$\operatorname{Bias}\big(\hat{F}_R^{*}(t_y)\big) \cong F_Y(t_y)\big[V_{020}^{*} - V_{110}^{*}\big] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_R^{*}(t_y)\big) \cong F_Y^2(t_y)\big[V_{200}^{*} + V_{020}^{*} - 2V_{110}^{*}\big]. \tag{5}$$
2. Singh et al. [26] proposed an exponential estimator under non-response on both the study and auxiliary variables along the lines of Bahl and Tuteja [27]. The modified form of [26] for estimating the CDF is given by
$$\hat{F}_{S1}^{*}(t_y) = \hat{F}_y^{*}(t_y) \exp\left(\frac{F_X(t_x) - \hat{F}_x^{*}(t_x)}{F_X(t_x) + \hat{F}_x^{*}(t_x)}\right). \tag{6}$$
  The bias and MSE of Equation (6) to the first order of approximation are given as
$$\operatorname{Bias}\big(\hat{F}_{S1}^{*}(t_y)\big) \cong F_Y(t_y)\left[\frac{3}{8}V_{020}^{*} - \frac{1}{2}V_{110}^{*}\right] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_{S1}^{*}(t_y)\big) \cong \frac{F_Y^2(t_y)}{4}\big[4V_{200}^{*} + V_{020}^{*} - 4V_{110}^{*}\big]. \tag{7}$$
3. The modified regression estimator for $F_Y^{*}(t_y)$ is provided as
$$\hat{F}_{Reg}^{*}(t_y) = \hat{F}_y^{*}(t_y) + B^{*}\big[F_X(t_x) - \hat{F}_x^{*}(t_x)\big], \tag{8}$$
where $B^{*}$ is the regression coefficient. Moreover, Equation (8) is an unbiased estimator of $F_Y(t_y)$.
  In addition, at the optimum value $B_{(opt)}^{*} = S_{F_Y(t_y) F_X(t_x)}^{*} / S_{F_X(t_x)}^{*2}$, the minimum variance of $\hat{F}_{Reg}^{*}(t_y)$ is given as
$$\operatorname{MSE}_{min}\big(\hat{F}_{Reg}^{*}(t_y)\big) = F_Y^2(t_y)\, V_{200}^{*}\left[1 - \frac{V_{110}^{*2}}{V_{200}^{*} V_{020}^{*}}\right]. \tag{9}$$
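For a quick numerical comparison of the three Scenario I estimators, the first-order MSE expressions can be coded directly. The sketch below uses the standard first-order result for the exponential estimator, in which the covariance term enters with a negative sign; the function names and input values are hypothetical.

```python
def mse_ratio(F_Y, V200, V020, V110):
    """First-order MSE of the ratio-type CDF estimator."""
    return F_Y**2 * (V200 + V020 - 2 * V110)


def mse_exponential(F_Y, V200, V020, V110):
    """First-order MSE of the exponential-type CDF estimator."""
    return F_Y**2 / 4 * (4 * V200 + V020 - 4 * V110)


def mse_regression_min(F_Y, V200, V020, V110):
    """Minimum first-order MSE of the regression-type CDF estimator."""
    return F_Y**2 * V200 * (1 - V110**2 / (V200 * V020))


# Hypothetical moment terms (illustrative values only).
F_Y, V200, V020, V110 = 0.5, 0.0114, 0.0111, 0.0064
m_ratio = mse_ratio(F_Y, V200, V020, V110)
m_exp = mse_exponential(F_Y, V200, V020, V110)
m_reg = mse_regression_min(F_Y, V200, V020, V110)
print(m_ratio, m_exp, m_reg)
```

With these particular inputs the regression-type estimator has the smallest MSE of the three; the ordering of the ratio and exponential estimators depends on the size of $V_{110}^{*}$ relative to $V_{020}^{*}$.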

3.2. Modified Estimators in Scenario II

In this section, some of the existing estimators used to estimate the mean of a population, are modified for the estimation of the population CDF under Scenario II, i.e., non-response is present only on the study variable. Furthermore, the biases and MSEs of these modified estimators are obtained to their first order approximation. Let F ^ ( . ) ( t y ) be the estimator of the population CDF under Scenario II.
1. The Cochran [25] ratio estimator is modified for $F_Y^{*}(t_y)$ under Scenario II and given as
$$\hat{F}_R(t_y) = \frac{\hat{F}_y^{*}(t_y)}{\hat{F}_x(t_x)} F_X(t_x). \tag{10}$$
  Up to the first order of approximation, the bias and MSE of Equation (10) are
$$\operatorname{Bias}\big(\hat{F}_R(t_y)\big) \cong F_Y(t_y)\big[V_{002} - V_{101}^{*}\big] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_R(t_y)\big) \cong F_Y^2(t_y)\big[V_{200}^{*} + V_{002} - 2V_{101}^{*}\big]. \tag{11}$$
2. The exponential estimator of Singh et al. [26] is modified for estimating $F_Y^{*}(t_y)$ and is provided as
$$\hat{F}_{S2}(t_y) = \hat{F}_y^{*}(t_y) \exp\left(\frac{F_X(t_x) - \hat{F}_x(t_x)}{F_X(t_x) + \hat{F}_x(t_x)}\right). \tag{12}$$
  The bias and MSE of Equation (12) up to the first order of approximation are given as
$$\operatorname{Bias}\big(\hat{F}_{S2}(t_y)\big) \cong F_Y(t_y)\left[\frac{3}{8}V_{002} - \frac{1}{2}V_{101}^{*}\right] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_{S2}(t_y)\big) \cong \frac{F_Y^2(t_y)}{4}\big[4V_{200}^{*} + V_{002} - 4V_{101}^{*}\big]. \tag{13}$$
3. The modified regression estimator for $F_Y^{*}(t_y)$ in Scenario II is provided as
$$\hat{F}_{Reg}(t_y) = \hat{F}_y^{*}(t_y) + B\big[F_X(t_x) - \hat{F}_x(t_x)\big], \tag{14}$$
where $B$ is the regression coefficient. Moreover, Equation (14) is an unbiased estimator of $F_Y(t_y)$. In addition, at the optimum value $B_{(opt)} = S_{F_Y(t_y) F_X(t_x)}^{*} / S_{F_X(t_x)}^{2}$, the minimum variance of $\hat{F}_{Reg}(t_y)$ is given as
$$\operatorname{Var}_{min}\big(\hat{F}_{Reg}(t_y)\big) = F_Y^2(t_y)\, V_{200}^{*}\left[1 - \frac{V_{101}^{*2}}{V_{200}^{*} V_{002}}\right], \tag{15}$$
or
$$\operatorname{Var}_{min}\big(\hat{F}_{Reg}(t_y)\big) = F_Y^2(t_y)\Big[\delta_1 C_{F_{Y(1)}(t_y)}^2\big(1 - \rho_{F_y(t_y) F_x(t_x)}^2\big) + \delta_2 C_{F_{Y(2)}(t_y)}^2\Big].$$

4. Proposed Estimators

4.1. The Proposed Estimator in Scenario I

Following [17], an estimator for estimating a population CDF of the study variable under Scenario I is defined as
$$\hat{F}_{prop1}^{*}(t_y) = \hat{F}_y^{*}(t_y)\left(\frac{\hat{F}_x^{*}(t_x)}{F_X(t_x)}\right)^{\alpha} \exp\left(\frac{F_X(t_x) - \hat{F}_x^{*}(t_x)}{F_X(t_x) + \hat{F}_x^{*}(t_x)}\right), \tag{16}$$
where $\alpha$ $(-\infty < \alpha < +\infty)$ is unknown and needs to be estimated such that the MSE is minimum.
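The structure of the proposed estimator, a power of the ratio of the estimated to the known auxiliary CDF multiplied by an exponential correction, can be sketched as follows. The function name `proposed_cdf_estimator` and the input values are hypothetical, and the sketch assumes the auxiliary CDF estimate is strictly positive.

```python
import math


def proposed_cdf_estimator(F_hat_y, F_hat_x, F_X, alpha):
    """F_hat_y * (F_hat_x / F_X)**alpha * exp((F_X - F_hat_x) / (F_X + F_hat_x))."""
    ratio_part = (F_hat_x / F_X) ** alpha
    exp_part = math.exp((F_X - F_hat_x) / (F_X + F_hat_x))
    return F_hat_y * ratio_part * exp_part


# Hypothetical inputs: estimated CDFs of Y and X, known population CDF of X.
estimate = proposed_cdf_estimator(F_hat_y=0.42, F_hat_x=0.55, F_X=0.50, alpha=-0.08)
print(estimate)
```

Two special cases help to check an implementation: with $\alpha = 0$ the expression reduces to the exponential-type estimator, and when the auxiliary estimate equals the known $F_X(t_x)$ both adjustment factors equal one, so the estimator returns the unadjusted estimate.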
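Theorem 3.
The bias and MSE of (16) are given, respectively, as follows:
$$\operatorname{Bias}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y(t_y)\left[\left(\frac{\alpha^2}{2} - \alpha + \frac{3}{8}\right)V_{020}^{*} + \left(\alpha - \frac{1}{2}\right)V_{110}^{*}\right] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} + \left(\alpha^2 - \alpha + \frac{1}{4}\right)V_{020}^{*} + (2\alpha - 1)V_{110}^{*}\right]. \tag{17}$$
Proof.
Equation (16) is expressed in error terms as
$$\hat{F}_{prop1}^{*}(t_y) = F_Y(t_y)(1 + e_0^{*})(1 + e_1^{*})^{\alpha} \exp\left(\frac{-e_1^{*}}{2 + e_1^{*}}\right). \tag{18}$$
Expanding Equation (18) up to the first order of approximation yields
$$\hat{F}_{prop1}^{*}(t_y) = F_Y(t_y)(1 + e_0^{*})\left[1 + \alpha e_1^{*} + \frac{\alpha(\alpha - 1)}{2} e_1^{*2}\right]\left[1 - \frac{1}{2} e_1^{*} + \frac{3}{8} e_1^{*2}\right].$$
Keeping the terms up to the second power and expanding the above equation, we get the following:
$$\hat{F}_{prop1}^{*}(t_y) = F_Y(t_y)\left[1 - \frac{e_1^{*}}{2} + \frac{3 e_1^{*2}}{8} + \alpha e_1^{*} - \frac{\alpha}{2} e_1^{*2} + \frac{\alpha^2}{2} e_1^{*2} - \frac{\alpha}{2} e_1^{*2} + e_0^{*} - \frac{e_0^{*} e_1^{*}}{2} + \alpha e_0^{*} e_1^{*}\right]$$
and
$$\hat{F}_{prop1}^{*}(t_y) - F_Y(t_y) = F_Y(t_y)\left[e_0^{*} + e_1^{*}\left(\alpha - \frac{1}{2}\right) + \frac{3 e_1^{*2}}{8} - \alpha e_1^{*2} + \frac{\alpha^2}{2} e_1^{*2} - \frac{e_0^{*} e_1^{*}}{2} + \alpha e_0^{*} e_1^{*}\right]. \tag{20}$$
Taking the expectation on both sides of Equation (20), we obtain the bias of (16):
$$\operatorname{Bias}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y(t_y)\left[\left(\frac{\alpha^2}{2} - \alpha + \frac{3}{8}\right)V_{020}^{*} + \left(\alpha - \frac{1}{2}\right)V_{110}^{*}\right].$$
Squaring (20) and retaining the terms up to the second order, we obtain
$$\big(\hat{F}_{prop1}^{*}(t_y) - F_Y(t_y)\big)^2 = F_Y^2(t_y)\left[e_0^{*2} + \alpha^2 e_1^{*2} + \frac{1}{4} e_1^{*2} + 2\alpha e_0^{*} e_1^{*} - e_0^{*} e_1^{*} - \alpha e_1^{*2}\right].$$
Taking the expectation, the MSE of (16) is obtained as
$$\operatorname{MSE}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} + \left(\alpha^2 - \alpha + \frac{1}{4}\right)V_{020}^{*} + (2\alpha - 1)V_{110}^{*}\right].$$
Hence the theorem is proved.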
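The expansion used in the proof can be checked numerically: for small relative errors, the exact relative error of the estimator and its second-order approximation should differ only by third-order terms. The sketch below is such a check; the function names and the chosen values of the error terms and of $\alpha$ are hypothetical.

```python
import math


def exact_relative_error(e0, e1, alpha):
    """(F_hat_prop / F_Y) - 1 written in terms of the relative errors e0 and e1."""
    return (1 + e0) * (1 + e1) ** alpha * math.exp(-e1 / (2 + e1)) - 1


def second_order_relative_error(e0, e1, alpha):
    """Expansion of the same quantity, keeping terms up to the second power."""
    return (
        e0
        + (alpha - 0.5) * e1
        + (alpha**2 / 2 - alpha + 3 / 8) * e1**2
        + (alpha - 0.5) * e0 * e1
    )


e0, e1, alpha = 0.01, 0.02, 2.0
exact = exact_relative_error(e0, e1, alpha)
approx = second_order_relative_error(e0, e1, alpha)
gap = abs(exact - approx)
print(exact, approx, gap)
```

The gap is of the order of the cube of the error terms, so shrinking the error terms reduces it much faster than it reduces the errors themselves.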
Theorem 4.
The minimum MSE of $\hat{F}_{prop1}^{*}(t_y)$ is given as follows:
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} - \frac{(V_{110}^{*})^2}{V_{020}^{*}}\right].$$
Proof.
Differentiating Equation (17) with respect to $\alpha$, equating it to zero, and simplifying it to obtain the optimal value of $\alpha$ for minimal MSE, we get
$$\alpha_{(opt)} = \frac{V_{020}^{*} - 2V_{110}^{*}}{2V_{020}^{*}}.$$
Substituting $\alpha_{(opt)}$ into (17), we obtain the minimal MSE of $\hat{F}_{prop1}^{*}(t_y)$, such that
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} - \frac{(V_{110}^{*})^2}{V_{020}^{*}}\right].$$
Hence the theorem is proved.
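The optimization in Theorem 4 can be verified numerically by evaluating the first-order MSE at the optimal exponent and at nearby values. This is a minimal sketch; the function names and the moment terms are hypothetical illustrative values.

```python
def mse_proposed(F_Y, V200, V020, V110, alpha):
    """First-order MSE of the proposed Scenario I estimator for a given alpha."""
    return F_Y**2 * (V200 + (alpha**2 - alpha + 0.25) * V020 + (2 * alpha - 1) * V110)


def alpha_optimal(V020, V110):
    """Value of alpha that minimizes the first-order MSE."""
    return (V020 - 2 * V110) / (2 * V020)


def mse_proposed_min(F_Y, V200, V020, V110):
    """Closed-form minimum of the first-order MSE."""
    return F_Y**2 * (V200 - V110**2 / V020)


# Hypothetical moment terms (illustrative values only).
F_Y, V200, V020, V110 = 0.5, 0.0114, 0.0111, 0.0064
a_opt = alpha_optimal(V020, V110)
at_optimum = mse_proposed(F_Y, V200, V020, V110, a_opt)
closed_form = mse_proposed_min(F_Y, V200, V020, V110)
print(a_opt, at_optimum, closed_form)
```

Since the MSE is a quadratic in $\alpha$ with positive leading coefficient $F_Y^2(t_y) V_{020}^{*}$, any departure from the optimal value increases it.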

4.2. Proposed Estimator in Scenario II

Motivated by [28], to estimate the population CDF in the presence of non-response under Scenario II, an estimator is proposed as
$$\hat{F}_{prop2}(t_y) = \hat{F}_y^{*}(t_y)\left(\frac{\hat{F}_x(t_x)}{F_X(t_x)}\right)^{\alpha_1} \exp\left(\frac{F_X(t_x) - \hat{F}_x(t_x)}{F_X(t_x) + \hat{F}_x(t_x)}\right), \tag{24}$$
where $\alpha_1$ $(-\infty < \alpha_1 < +\infty)$ is unknown and needs to be estimated such that the MSE is minimum.
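Theorem 5.
The bias and MSE of $\hat{F}_{prop2}(t_y)$ are given, respectively, by the following:
$$\operatorname{Bias}\big(\hat{F}_{prop2}(t_y)\big) = F_Y(t_y)\left[\left(\frac{\alpha_1^2}{2} - \alpha_1 + \frac{3}{8}\right)V_{002} + \left(\alpha_1 - \frac{1}{2}\right)V_{101}^{*}\right] \quad \text{and} \quad \operatorname{MSE}\big(\hat{F}_{prop2}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} + \left(\alpha_1^2 - \alpha_1 + \frac{1}{4}\right)V_{002} + (2\alpha_1 - 1)V_{101}^{*}\right]. \tag{25}$$
Proof.
In error terms, Equation (24) can be expressed as
$$\hat{F}_{prop2}(t_y) = F_Y(t_y)(1 + e_0^{*})(1 + e_2)^{\alpha_1} \exp\left(\frac{-e_2}{2 + e_2}\right). \tag{26}$$
Expanding Equation (26) to the first order of approximation yields
$$\hat{F}_{prop2}(t_y) = F_Y(t_y)(1 + e_0^{*})\left[1 + \alpha_1 e_2 + \frac{\alpha_1(\alpha_1 - 1)}{2} e_2^{2}\right]\left[1 - \frac{1}{2} e_2 + \frac{3}{8} e_2^{2}\right].$$
Keeping the terms up to the second power and expanding the above equation, we get
$$\hat{F}_{prop2}(t_y) = F_Y(t_y)\left[1 - \frac{1}{2} e_2 + \frac{3}{8} e_2^{2} + \alpha_1 e_2 - \frac{\alpha_1}{2} e_2^{2} + \frac{\alpha_1^2}{2} e_2^{2} - \frac{\alpha_1}{2} e_2^{2} + e_0^{*} - \frac{e_0^{*} e_2}{2} + \alpha_1 e_0^{*} e_2\right]$$
and
$$\hat{F}_{prop2}(t_y) - F_Y(t_y) = F_Y(t_y)\left[e_0^{*} + \left(\alpha_1 - \frac{1}{2}\right) e_2 + \left(\frac{\alpha_1^2}{2} - \alpha_1 + \frac{3}{8}\right) e_2^{2} + \left(\alpha_1 - \frac{1}{2}\right) e_0^{*} e_2\right]. \tag{28}$$
Taking the expectation on both sides of Equation (28), we obtain the bias of (24) as
$$\operatorname{Bias}\big(\hat{F}_{prop2}(t_y)\big) = F_Y(t_y)\left[\left(\frac{\alpha_1^2}{2} - \alpha_1 + \frac{3}{8}\right)V_{002} + \left(\alpha_1 - \frac{1}{2}\right)V_{101}^{*}\right].$$
Squaring (28), retaining the terms up to the second order, and applying the expectation, we obtain
$$E\big(\hat{F}_{prop2}(t_y) - F_Y(t_y)\big)^2 = F_Y^2(t_y)\, E\left[e_0^{*2} + \alpha_1^2 e_2^{2} + \frac{1}{4} e_2^{2} + 2\alpha_1 e_0^{*} e_2 - e_0^{*} e_2 - \alpha_1 e_2^{2}\right].$$
The MSE of $\hat{F}_{prop2}(t_y)$ is determined as
$$\operatorname{MSE}\big(\hat{F}_{prop2}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} + \left(\alpha_1^2 - \alpha_1 + \frac{1}{4}\right)V_{002} + (2\alpha_1 - 1)V_{101}^{*}\right].$$
Hence the theorem is proved.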
Theorem 6.
The minimum MSE of $\hat{F}_{prop2}(t_y)$ is given as
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} - \frac{(V_{101}^{*})^2}{V_{002}}\right].$$
Proof.
Differentiating Equation (25) with respect to $\alpha_1$, equating it to zero, and simplifying it to obtain the optimal value of $\alpha_1$ for the minimum MSE, we get
$$\alpha_{1(opt)} = \frac{V_{002} - 2V_{101}^{*}}{2V_{002}}.$$
Substituting $\alpha_{1(opt)}$ into (25), we obtain the minimum MSE of $\hat{F}_{prop2}(t_y)$ as
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big) = F_Y^2(t_y)\left[V_{200}^{*} - \frac{(V_{101}^{*})^2}{V_{002}}\right].$$
Hence the theorem is proved.
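The differentiation and substitution steps in this proof can be written out as a short worked derivation. This is a sketch based on the first-order MSE of Theorem 5, in which the covariance term is $V_{101}^{*}$, and it uses the identities $\alpha_1^2 - \alpha_1 + \tfrac{1}{4} = (\alpha_1 - \tfrac{1}{2})^2$ and $2\alpha_1 - 1 = 2(\alpha_1 - \tfrac{1}{2})$.

```latex
\begin{align*}
\frac{\partial}{\partial \alpha_1}\operatorname{MSE}\big(\hat{F}_{prop2}(t_y)\big)
  &= F_Y^2(t_y)\big[(2\alpha_1 - 1)V_{002} + 2V_{101}^{*}\big] = 0
  \;\Longrightarrow\;
  \alpha_{1(opt)} = \frac{V_{002} - 2V_{101}^{*}}{2V_{002}}, \\
\alpha_{1(opt)} - \tfrac{1}{2} &= -\frac{V_{101}^{*}}{V_{002}}, \\
\operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big)
  &= F_Y^2(t_y)\left[V_{200}^{*} + \left(\frac{V_{101}^{*}}{V_{002}}\right)^{2} V_{002}
     - \frac{2\,(V_{101}^{*})^{2}}{V_{002}}\right]
   = F_Y^2(t_y)\left[V_{200}^{*} - \frac{(V_{101}^{*})^{2}}{V_{002}}\right].
\end{align*}
```

The second derivative with respect to $\alpha_1$ equals $2F_Y^2(t_y)V_{002}$, which is positive, so the stationary point is indeed a minimum.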

5. Efficiency Comparisons

The MSEs of the modified estimators and the proposed estimators $\hat{F}_{prop1}^{*}(t_y)$ and $\hat{F}_{prop2}(t_y)$ are compared in this section.

5.1. Efficiency Comparisons under Scenario I

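The proposed estimator under Scenario I is more efficient if we have the following:
(a) From (5) and (22),
$\operatorname{MSE}\big(\hat{F}_R^{*}(t_y)\big) > \operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big)$ if $\operatorname{MSE}\big(\hat{F}_R^{*}(t_y)\big) - \operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big) > 0$, or $\dfrac{V_{110}^{*}}{V_{020}^{*}} < 1$;
(b) From (7) and (22),
$\operatorname{MSE}\big(\hat{F}_{S1}^{*}(t_y)\big) > \operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big)$ if $\operatorname{MSE}\big(\hat{F}_{S1}^{*}(t_y)\big) - \operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big) > 0$, or $\dfrac{(V_{110}^{*} + V_{020}^{*})^2}{2V_{020}^{*} V_{110}^{*}} > 1$;
(c) From (9) and (22), it can be shown that, iff $\alpha = \dfrac{V_{020}^{*} - 2V_{110}^{*}}{2V_{020}^{*}}$,
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop1}^{*}(t_y)\big) = \operatorname{MSE}_{min}\big(\hat{F}_{Reg}^{*}(t_y)\big).$$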

5.2. Efficiency Comparisons under Scenario II

The proposed estimator under Scenario II is more efficient compared to the existing modified estimators if we have the following:
(a) From (11) and (30),
$\operatorname{MSE}\big(\hat{F}_R(t_y)\big) > \operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big)$ if $\operatorname{MSE}\big(\hat{F}_R(t_y)\big) - \operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big) > 0$, or $\dfrac{V_{101}^{*}}{V_{002}} < 1$;
(b) From (13) and (30),
$\operatorname{MSE}\big(\hat{F}_{S2}(t_y)\big) > \operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big)$ if $\operatorname{MSE}\big(\hat{F}_{S2}(t_y)\big) - \operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big) > 0$, or $\dfrac{(V_{101}^{*} + V_{002})^2}{2V_{002} V_{101}^{*}} > 1$;
(c) From (15) and (30), it can be shown that, iff $\alpha_1 = \dfrac{V_{002} - 2V_{101}^{*}}{2V_{002}}$,
$$\operatorname{MSE}_{min}\big(\hat{F}_{prop2}(t_y)\big) = \operatorname{MSE}_{min}\big(\hat{F}_{Reg}(t_y)\big).$$
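These comparisons can be explored numerically. The sketch below evaluates the Scenario II first-order MSEs for hypothetical moment terms and checks that each gap to the proposed estimator's minimum MSE can be written as a squared quantity divided by $V_{002}$. This identity follows from the first-order expressions as read here, with the covariance term of the exponential estimator taken as negative; it is an illustration rather than a restatement of the conditions listed above.

```python
def scenario2_mses(F_Y, V200, V002, V101):
    """First-order MSEs of the ratio, exponential, and proposed (minimum) estimators."""
    ratio = F_Y**2 * (V200 + V002 - 2 * V101)
    expo = F_Y**2 * (V200 + V002 / 4 - V101)
    prop_min = F_Y**2 * (V200 - V101**2 / V002)
    return ratio, expo, prop_min


# Hypothetical moment terms (illustrative values only).
F_Y, V200, V002, V101 = 0.5, 0.0114, 0.009, 0.0054
ratio, expo, prop_min = scenario2_mses(F_Y, V200, V002, V101)

# Each gap to the proposed estimator is a squared term over V002.
gap_ratio = F_Y**2 * (V002 - V101) ** 2 / V002
gap_expo = F_Y**2 * (V002 - 2 * V101) ** 2 / (4 * V002)
print(ratio - prop_min, gap_ratio)
print(expo - prop_min, gap_expo)
```

Because both gaps are squares divided by a positive quantity, they are non-negative for any admissible inputs under this reading of the formulas.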

6. Numerical Study

An empirical evaluation is presented to evaluate the performance of the proposed estimators and some of the existing estimators, by using three different populations. The summary statistics for these populations are shown in Table 1, Table 2 and Table 3 respectively:
MSEs of the modified estimators and the proposed estimators are given in Table 4 and Table 5, with respect to Scenario I and Scenario II, respectively.
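The proposed estimators and the modified estimators of a population CDF were compared with the unadjusted estimator $\hat{F}_y(t_y)$ under both scenarios in terms of their percent-relative efficiencies (PREs), computed relative to the variance of $\hat{F}_y(t_y)$ so that a larger PRE indicates a more efficient estimator:
$$\operatorname{PRE}(\cdot) = \frac{\operatorname{MSE}\big(\hat{F}_y(t_y)\big)}{\operatorname{MSE}(\cdot)} \times 100\%.$$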
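A PRE computation can be sketched as below. The sketch assumes that the baseline variance goes in the numerator, so that values above 100 indicate an improvement over the unadjusted estimator, which matches the "larger PRE is better" criterion used in the conclusions; the function name and the numbers are hypothetical.

```python
def percent_relative_efficiency(baseline_mse, estimator_mse):
    """PRE in percent: baseline MSE divided by the estimator's MSE, times 100."""
    return 100.0 * baseline_mse / estimator_mse


# Hypothetical MSE values (illustrative only).
baseline = 0.00285
candidates = {"ratio": 0.002425, "exponential": 0.00194375, "proposed": 0.0019274775}
pre_table = {name: percent_relative_efficiency(baseline, mse) for name, mse in candidates.items()}
for name, pre in pre_table.items():
    print(f"{name}: {pre:.2f}")
```

Under this convention the baseline estimator itself always has a PRE of 100, which gives a simple reference point when reading a PRE table.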
PREs of the proposed estimators, and the modified estimators, are given in Table 6 and Table 7, with respect to Scenario I and Scenario II.
From Table 6 and Table 7 we have the following:
It was observed that the PREs corresponding to the estimators $\hat{F}_R^{*}(t_y)$, $\hat{F}_{S1}^{*}(t_y)$, $\hat{F}_R(t_y)$, and $\hat{F}_{S2}(t_y)$ declined.
The PREs corresponding to the proposed estimators, F ^ p r o p 1 * ( t y ) and F ^ p r o p 2 ( t y ) , and the modified regression estimators, F ^ R e g * ( t y ) and F ^ R e g ( t y ) , showed that the proposed estimators were efficient estimators under both scenarios of non-response.

7. Conclusions

This study proposed an improved class of estimators for the estimation of a finite population CDF under two different scenarios of non-response using SRS. From theoretical and empirical comparisons, the proposed estimators were found to perform better, based on the larger PRE and smaller MSE criteria. Therefore, our study suggests the use of the proposed estimators for estimating the CDF in the presence of non-response. Limitations of this study are provided in Appendix A.

Author Contributions

Conceptualization, A.K. and A.S.; methodology, A.K. and A.S.; software, A.K.; formal analysis, A.K.; investigation, A.K. and A.S., F.S.A.-D. and M.M.A.A.; data curation, A.K.; writing—original draft preparation, A.K.; writing—review and editing, A.K. and A.S.; supervision, A.S.; funding acquisition, F.S.A.-D. and M.M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data utilized to support the numerical conclusions of this study are available within the article. The data can also be retrieved by searching the provided data sources.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Program, under the grant number (RGP.2/23/44). This study was supported via funding from Prince Sattam bin Abdulaziz University, under project number (PSAU/2023/R/1444).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CDF: cumulative distribution function;
SRS: simple random sampling;
SRSWOR: simple random sampling without replacement;
MSE: mean square error;
PRE: percent-relative efficiency.

Appendix A. Limitations

The proposed class of estimators was designed to perform well under certain conditions. When these conditions are not met, the proposed estimators may not outperform the alternatives. For example, we have the following:
(1) The performance of the estimator relies on the relationship between the study variable and the auxiliary variable. If this relationship is weak, the proposed estimator does not perform well.
(2) The proposed estimator assumes that the underlying distribution of the study variable, as well as that of the auxiliary variable, has a certain shape, such as exponential or normal. In situations where there are large gaps in the data, the distribution is highly skewed, or the risk behavior of the distribution is non-additive, the proposed estimators may not perform well.
(3) The choice of an auxiliary variable can have a significant impact on the performance of the estimator. The auxiliary variable should be strongly related to the variable of interest and should be easy to measure for all units in the sample.
(4) The proposed estimator is expected to perform well when the sample size is large enough to provide sufficient precision in the estimates.

References

  1. Shabbir, J.; Gupta, S. Estimation of finite population mean in simple and stratified random sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2017, 46, 10135–10148. [Google Scholar] [CrossRef]
  2. Haq, A.; Khan, M.; Hussain, Z. A new estimator of finite population mean based on the dual use of the auxiliary information. Commun. Stat.-Theory Methods 2017, 46, 4425–4436. [Google Scholar] [CrossRef]
  3. Kaur, H.; Brar, S.S.; Sharma, M. Efficient ratio type estimators of population variance through linear transformation in simple and stratified random sampling. Int. J. Stat. Reliab. Eng. 2018, 4, 144–153. [Google Scholar]
  4. Ahmad, S.; Shabbir, J. Use of extreme values to estimate finite population mean under pps sampling scheme. J. Reliab. Stat. Stud. 2018, 43, 99–112. [Google Scholar]
  5. Ahmad, S.; Hussain, S.; Yasmeen, U.; Aamir, M.; Shabbir, J.; El-Morshedy, M.; Al-Bossly, A.; Ahmad, Z. A simulation study: Using dual ancillary variable to estimate population mean under stratified random sampling. PLoS ONE 2022, 17, e0275875. [Google Scholar] [CrossRef] [PubMed]
  6. Niaz, I.; Sanaullah, A.; Saleem, I.; Shabbir, J. An improved efficient class of estimators for the population variance. Concurr. Comput. Pract. Exp. 2022, 34, e6620. [Google Scholar] [CrossRef]
  7. Shabbir, J.; Onyango, R. Use of an efficient unbiased estimator for finite population mean. PLoS ONE 2022, 17, e0270277. [Google Scholar] [CrossRef]
  8. Oh, H.; Scheuren, F. Weighting adjustments for unit non-response. Incomplete Data Sample Surv. 1983, 2, 143–184. [Google Scholar]
  9. Kalton, G. Handling wave nonresponse in panel surveys. J. Off. Stat. 1986, 2, 303–314. [Google Scholar]
  10. Kalton, G.; Maligalig, D.S. A comparison of methods of weighting adjustment for nonresponse. In Proceedings of the 1991 Annual Research Conference, Arlington, TX, USA, 17–20 March 1991; US Bureau of the Census: Washington, DC, USA, 1991; Volume 409428. [Google Scholar]
  11. Hansen, M.H.; Hurwitz, W.N. The problem of non-response in sample surveys. J. Am. Stat. Assoc. 1946, 41, 517–529. [Google Scholar] [CrossRef]
  12. Sanaullah, A.; Noor-ul Amin, M.; Hanif, M. Generalized exponential-type ratio-cum-ratio and product-cum-product estimators for population mean an the presence of non-response under stratified two-phase random sampling. Pak. J. Stat. 2015, 31, 71–94. [Google Scholar]
  13. Ahmed, S.; Shabbir, J.; Gupta, S. Use of scrambled response model in estimating the finite population mean in presence of non response when coefficient of variation is known. Commun. Stat.-Theory Methods 2017, 46, 8435–8449.
  14. Saleem, I.; Sanaullah, A.; Hanif, M. A generalized class of estimators for estimating population mean in the presence of non-response. J. Stat. Theory Appl. 2018, 17, 616–626.
  15. Makhdum, M.; Sanaullah, A.; Hanif, M. A modified regression-cum-ratio estimator of population mean of a sensitive variable in the presence of non-response in simple random sampling. J. Stat. Manag. Syst. 2020, 23, 495–510.
  16. Bhushan, S.; Pandey, A.P. An efficient estimation procedure for the population mean under non-response. Statistica 2020, 79, 363–378.
  17. Ünal, C.; Kadilar, C. Exponential type estimator for the population mean in the presence of non-response. J. Stat. Manag. Syst. 2020, 23, 603–615.
  18. Singh, H.P.; Singh, S.; Kozak, M. A family of estimators of finite-population distribution function using auxiliary information. Acta Appl. Math. 2008, 104, 115–130.
  19. Muñoz, J.; Arcos, A.; Álvarez, E.; Rueda, M. New ratio and difference estimators of the finite population distribution function. Math. Comput. Simul. 2014, 102, 51–61.
  20. Yaqub, M.; Shabbir, J. Estimation of population distribution function in the presence of non-response. Hacet. J. Math. Stat. 2018, 47, 471–511.
  21. Yaqub, M.; Shabbir, J. Estimation of population distribution function involving measurement error in the presence of non response. Commun. Stat.-Theory Methods 2020, 49, 2540–2559.
  22. Hussain, S.; Ahmad, S.; Saleem, M.; Akhtar, S. Finite population distribution function estimation with dual use of auxiliary information under simple and stratified random sampling. PLoS ONE 2020, 15, e0239098.
  23. Hussain, S.; Akhtar, S.; El-Morshedy, M. Modified estimators of finite population distribution function based on dual use of auxiliary information under stratified random sampling. Sci. Prog. 2022, 105, 00368504221128486.
  24. García, M.R.; Cebrián, A.A.; Rodríguez, E. Quantile interval estimation in finite population using a multivariate ratio estimator. Metrika 1998, 47, 203–213.
  25. Cochran, W. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 1940, 30, 262–275.
  26. Singh, R.; Kumar, M.; Chaudhary, M.K.; Smarandache, F. Estimation of Mean in Presence of Non Response Using Exponential Estimator; Infinite Study: Dubai, United Arab Emirates, 2009.
  27. Bahl, S.; Tuteja, R.K. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
  28. Singh, G.; Usman, M. Ratio-to-product exponential-type estimators under non-response. Jordan J. Math. Stat. (JJMS) 2019, 12, 593–616.
  29. Singh, S. Advanced Sampling Theory with Applications: How Michael ‘Selected’ Amy, Volume 1; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003.
  30. Gujarati, D.N. Basic Econometrics; Tata McGraw-Hill Education: New York, NY, USA, 2009.
Table 1. Summary statistics for Population I.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $N$ | 69 | $S_2$ | 0.50361 |
| $n$ | 20 | $S_3$ | 7058.99 |
| $\delta_1$ | 0.03550 | $S_4$ | 20.0602 |
| $\bar{X}$ | 4954.44 | $R_{12}$ | 0.94202 |
| $\bar{R}$ | 35.0000 | $R_{13}$ | 0.57090 |
| $F_Y(t_y)$ | 0.50725 | $R_{23}$ | 0.57274 |
| $F_X(t_x)$ | 0.50725 | $R_{14}$ | 0.85584 |
| $S_1$ | 0.50361 | $R_{24}$ | −0.86603 |

Non-response

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $N_{(2)}$ | 17 | $S_{4(2)}$ | 5.04975 |
| $W_2$ | 0.24638 | $R_{12(2)}$ | 0.88273 |
| $\delta_2$ | 0.01231 | $R_{13(2)}$ | 0.62943 |
| $S_{1(2)}$ | 0.51449 | $R_{23(2)}$ | 0.57830 |
| $S_{3(2)}$ | 5920.86 | $R_{24(2)}$ | −0.85391 |

Population I (source: [29])
Table 2. Summary statistics for Population II.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $N$ | 50 | $S_2$ | 0.5051 |
| $n$ | 15 | $S_3$ | 21.805 |
| $\lambda$ | 0.04667 | $S_4$ | 15756 |
| $\bar{R}$ | 25.5000 | $R_{13}$ | 0.2518 |
| $F_Y(t_y)$ | 0.5000 | $R_{23}$ | −0.7952 |
| $F_X(t_x)$ | 0.5000 | $R_{14}$ | 0.2134 |
| $S_1$ | 0.5051 | $R_{24}$ | −0.8663 |

Population II (source: [30])
Table 3. Summary statistics for Population III.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $N$ | 50 | $S_2$ | 0.50508 |
| $n$ | 15 | $S_3$ | 21.3175 |
| $\delta_1$ | 0.04667 | $S_4$ | 14.5756 |
| $\bar{X}$ | 78.2900 | $R_{12}$ | 0.12000 |
| $\bar{R}$ | 25.5000 | $R_{13}$ | 0.22925 |
| $F_Y(t_y)$ | 0.50000 | $R_{23}$ | 0.78936 |
| $F_X(t_x)$ | 0.50000 | $R_{14}$ | 0.18435 |
| $S_1$ | 0.50508 | $R_{24}$ | 0.86630 |

Non-response

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $N_2$ | 12 | $S_{4(2)}$ | 3.60555 |
| $w_2$ | 0.24000 | $R_{12(2)}$ | 0.16903 |
| $\delta_2$ | 0.01600 | $R_{13(2)}$ | 0.25695 |
| $S_{1(2)}$ | 0.51493 | $R_{23(2)}$ | 0.81369 |
| $S_{2(2)}$ | 0.52223 | $R_{14(2)}$ | 0.22034 |
| $S_{3(2)}$ | 18.2593 | $R_{24(2)}$ | 0.86905 |

Population III (source: [30])
Table 4. MSEs under Scenario I.

| Estimator | Data 1 | Data 2 | Data 3 |
|---|---|---|---|
| $\mathrm{MSE}(\hat{F}_{y}^{*}(t_y))$ | 0.02942125 | 0.03513561 | 0.04084997 |
| $\mathrm{MSE}(\hat{F}_{R}^{*}(t_y))$ | 0.10046990 | 0.11875580 | 0.13704170 |
| $\mathrm{MSE}(\hat{F}_{S1}^{*}(t_y))$ | 0.05427591 | 0.06176173 | 0.05427591 |
| $\mathrm{MSE}(\hat{F}_{Reg}^{*}(t_y))$ | 0.01471663 | 0.01847541 | 0.02223419 |
| $\mathrm{MSE}(\hat{F}_{prop1}^{*}(t_y))$ | 0.01469690 | 0.01840932 | 0.02210629 |
Table 5. MSEs under Scenario II.

| Estimator | Data 1 | Data 2 | Data 3 |
|---|---|---|---|
| $\mathrm{MSE}(\hat{F}_{y}(t_y))$ | 0.02942125 | 0.03513561 | 0.04084997 |
| $\mathrm{MSE}(\hat{F}_{R}(t_y))$ | 0.08789827 | 0.09361262 | 0.09932698 |
| $\mathrm{MSE}(\hat{F}_{S2}(t_y))$ | 0.05273304 | 0.05844739 | 0.06416175 |
| $\mathrm{MSE}(\hat{F}_{Reg}(t_y))$ | 0.01667221 | 0.02238657 | 0.02810093 |
| $\mathrm{MSE}(\hat{F}_{prop2}(t_y))$ | 0.01667221 | 0.02238657 | 0.02810093 |
Table 6. PREs of estimators under Scenario I.

| Estimator | Data 1 | Data 2 | Data 3 |
|---|---|---|---|
| $\mathrm{PRE}(\hat{F}_{y}^{*}(t_y))$ | 100 | 100 | 100 |
| $\mathrm{PRE}(\hat{F}_{R}^{*}(t_y))$ | 29.28 | 29.59 | 29.81 |
| $\mathrm{PRE}(\hat{F}_{S1}^{*}(t_y))$ | 54.21 | 56.89 | 75.26 |
| $\mathrm{PRE}(\hat{F}_{Reg}^{*}(t_y))$ | 199.99 | 190.18 | 184.73 |
| $\mathrm{PRE}(\hat{F}_{prop1}^{*}(t_y))$ | 200.19 | 190.86 | 184.79 |
Table 7. PREs of the estimators under Scenario II.

| Estimator | Data 1 | Data 2 | Data 3 |
|---|---|---|---|
| $\mathrm{PRE}(\hat{F}_{y}(t_y))$ | 100 | 100 | 100 |
| $\mathrm{PRE}(\hat{F}_{R}(t_y))$ | 33.47 | 37.53 | 41.13 |
| $\mathrm{PRE}(\hat{F}_{S2}(t_y))$ | 55.79 | 60.11 | 63.67 |
| $\mathrm{PRE}(\hat{F}_{Reg}(t_y))$ | 176.47 | 156.95 | 145.37 |
| $\mathrm{PRE}(\hat{F}_{prop2}(t_y))$ | 176.47 | 156.95 | 145.37 |
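
The PRE values in Table 7 appear consistent with the usual definition of percent relative efficiency, namely 100 times the MSE of the baseline estimator $\hat{F}_{y}(t_y)$ divided by the MSE of the estimator under comparison. That definition is an assumption here, since it is not stated in this part of the article. The following minimal Python sketch recomputes the Scenario II PREs from the MSEs in Table 5 under that assumption; the dictionary keys and the helper name `pre` are illustrative labels, not the authors' notation.

```python
# MSEs transcribed from Table 5 (Scenario II); columns are Data 1, Data 2, Data 3.
# The keys are shorthand labels chosen for this sketch.
mse = {
    "y":     [0.02942125, 0.03513561, 0.04084997],  # baseline estimator
    "R":     [0.08789827, 0.09361262, 0.09932698],
    "S2":    [0.05273304, 0.05844739, 0.06416175],
    "Reg":   [0.01667221, 0.02238657, 0.02810093],
    "prop2": [0.01667221, 0.02238657, 0.02810093],
}


def pre(mse_baseline, mse_estimator):
    """Percent relative efficiency relative to the baseline (assumed definition)."""
    return 100.0 * mse_baseline / mse_estimator


# One PRE row per estimator, rounded to two decimals as in the published table.
pre_table = {
    name: [round(pre(base, value), 2) for base, value in zip(mse["y"], values)]
    for name, values in mse.items()
}

for name, row in pre_table.items():
    print(name, row)
```

A PRE above 100 means the estimator has a smaller MSE than the baseline, so the regression and proposed estimators improve on it in all three data sets, while the ratio-type rows fall below 100.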
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

