2.1. Notations for the CDF under SRS
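Consider a finite population $\Omega = \{1, 2, \ldots, N\}$ of N distinct units, and let $Y_i$ and $X_i$ be the values of the research variable Y and the auxiliary variable X, respectively, on the $i$th unit. For every index $t_y$ and $t_x$, where $(t_y, t_x) \in \mathbb{R}^2$, the population CDFs of Y and X are defined, respectively, by
$$F(t_y) = \frac{1}{N} \sum_{i \in \Omega} I(Y_i \le t_y), \qquad F(t_x) = \frac{1}{N} \sum_{i \in \Omega} I(X_i \le t_x),$$
where I(.) is an indicator variable. Each CDF is an average of Bernoulli distributed variables, such that
$$I(Y_i \le t_y) = \begin{cases} 1, & Y_i \le t_y, \\ 0, & \text{otherwise}, \end{cases} \qquad I(X_i \le t_x) = \begin{cases} 1, & X_i \le t_x, \\ 0, & \text{otherwise}. \end{cases}$$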
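Theorem 1. In SRS, with $\hat{F}(t_y)$ denoting the sample CDF based on a sample $s$ of size ℓ, $\ell \hat{F}(t_y) = \sum_{i \in s} I(Y_i \le t_y)$ is a hyper-geometrically distributed variable, with mean and variance for Y given, respectively, by
$$E\big(\hat{F}(t_y)\big) = F(t_y), \qquad \operatorname{Var}\big(\hat{F}(t_y)\big) = \frac{N - \ell}{\ell (N - 1)} F(t_y) \big(1 - F(t_y)\big).$$
In addition, we have the following:
$N_{11}$ is the number of units in the population with $X \le t_x$ and $Y \le t_y$;
$N_{12}$ is the number of units in the population with $X \le t_x$ and $Y > t_y$;
$N_{21}$ is the number of units in the population with $X > t_x$ and $Y \le t_y$; and
$N_{22}$ is the number of units in the population with $X > t_x$ and $Y > t_y$.
Theorem 1 can be proved easily along the lines of García et al. [24].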
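As a small numerical check of Theorem 1, the sketch below enumerates every SRSWOR sample from a hypothetical six-unit population and compares the exact mean and variance of the sample CDF with the closed-form hypergeometric expressions. The values in `Y`, the threshold `t_y`, and the sample size `ell` are illustrative assumptions, not data from the paper.

```python
from fractions import Fraction
from itertools import combinations


def cdf(values, t):
    """CDF at t: the mean of the indicator I(v <= t) over the given values."""
    return Fraction(sum(1 for v in values if v <= t), len(values))


# Hypothetical toy population (assumed for illustration only).
Y = [2, 5, 7, 8, 11, 13]
t_y = 7
N, ell = len(Y), 3

F = cdf(Y, t_y)  # population CDF F(t_y)

# Under SRSWOR every subset of size ell is equally likely,
# so exact moments follow from plain averages over all subsets.
estimates = [cdf(sample, t_y) for sample in combinations(Y, ell)]
mean = sum(estimates) / len(estimates)
var = sum((e - mean) ** 2 for e in estimates) / len(estimates)

# Closed-form variance of a hypergeometric proportion.
var_theory = Fraction(N - ell, ell * (N - 1)) * F * (1 - F)

print(mean, var, var_theory)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerance; for realistic population sizes, full enumeration is infeasible and a simulation would be needed instead.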
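Lemma 1. For a large sample size, $\operatorname{Var}\big(\hat{F}(t_y)\big)$ is defined as $\big(\frac{1}{\ell} - \frac{1}{N}\big) S^2_{F(t_y)}$. Let us consider that $S^2_{F(t_y)}$ and $S^2_{F(t_x)}$ are the population variances of $I(Y \le t_y)$ and $I(X \le t_x)$, respectively.
Let $S_{F(t_y) F(t_x)}$ be the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$, then we have the following:
$C_{F(t_y)}$ is the population coefficient of variation of $I(Y \le t_y)$, and $C_{F(t_y)} = S_{F(t_y)} / F(t_y)$;
$C_{F(t_x)}$ is the population coefficient of variation of $I(X \le t_x)$, and $C_{F(t_x)} = S_{F(t_x)} / F(t_x)$;
$\rho = S_{F(t_y) F(t_x)} / \big(S_{F(t_y)} S_{F(t_x)}\big)$ is the phi-coefficient of correlation between $I(Y \le t_y)$ and $I(X \le t_x)$.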
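The quantities in Lemma 1 can be computed directly from the two indicator variables. The following is a minimal sketch on a hypothetical paired population, assuming variances and covariances use the divisor N - 1 and that the four cell counts are cross-classified as X at or below its threshold versus Y at or below its threshold. It also checks that the phi-coefficient obtained from the 2x2 counts matches the correlation of the indicators.

```python
from math import sqrt


def indicator(values, t):
    """Return the 0/1 indicator I(v <= t) for each value."""
    return [1 if v <= t else 0 for v in values]


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Population covariance with divisor N - 1 (an assumption of this sketch)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Hypothetical paired population (assumed for illustration only).
Y = [2, 5, 7, 8, 11, 13]
X = [1, 9, 3, 4, 10, 12]
t_y, t_x = 7, 4

Iy, Ix = indicator(Y, t_y), indicator(X, t_x)
F_y, F_x = mean(Iy), mean(Ix)

S2_y, S2_x, S_yx = cov(Iy, Iy), cov(Ix, Ix), cov(Iy, Ix)
C_y, C_x = sqrt(S2_y) / F_y, sqrt(S2_x) / F_x   # coefficients of variation
rho = S_yx / sqrt(S2_y * S2_x)                  # correlation of the indicators

# 2x2 cell counts: first index refers to X, second to Y (1 = at or below threshold).
N11 = sum(1 for a, b in zip(Ix, Iy) if a == 1 and b == 1)
N12 = sum(1 for a, b in zip(Ix, Iy) if a == 1 and b == 0)
N21 = sum(1 for a, b in zip(Ix, Iy) if a == 0 and b == 1)
N22 = sum(1 for a, b in zip(Ix, Iy) if a == 0 and b == 0)

# Phi-coefficient from the cell counts.
phi = (N11 * N22 - N12 * N21) / sqrt(
    (N11 + N12) * (N21 + N22) * (N11 + N21) * (N12 + N22)
)

print(C_y, C_x, rho, phi)
```

The agreement between `rho` and `phi` reflects the fact that the phi-coefficient is the Pearson correlation of two binary variables.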
2.2. Notation for the CDF with Non-Response under an SRS Design
Consider the case where a finite population of N units is divided into two groups: a respondents' group of $N_1$ units and a non-respondents' group of $N_2$ units, where $N_1 + N_2 = N$. A sample of size ℓ is drawn from the target population using SRSWOR, and it is further assumed that only $\ell_1$ out of ℓ units respond, while $\ell_2 = \ell - \ell_1$ units do not. Now, a sub-sample, also referred to as the 2nd phase sample, of $q = \ell_2 / k$ units, where $k > 1$, is taken from the group of $\ell_2$ non-respondents for interviewing. This way of dealing with non-respondents to obtain responses from them is also referred to as the canvasser method. Hence, the total number of responses is $\ell_1 + q$, collected from ℓ units, and only $\ell_2 - q$ units are left as non-respondents who are not selected in the 2nd phase sample. Following Hansen and Hurwitz [11], the population CDF in the presence of non-response can be defined as follows:
$$F(t_y) = W_1 F_{(1)}(t_y) + W_2 F_{(2)}(t_y).$$
Similarly, let
$$F(t_x) = W_1 F_{(1)}(t_x) + W_2 F_{(2)}(t_x),$$
where $W_1 = N_1 / N$ and $W_2 = N_2 / N$. In addition, we have the following:
$F_{(1)}(t_y)$ is the population CDF of Y for the response group;
$F_{(2)}(t_y)$ is the population CDF of Y for the non-response group;
$F_{(1)}(t_x)$ is the population CDF of X for the response group;
$F_{(2)}(t_x)$ is the population CDF of X for the non-response group.
Yaqub and Shabbir [20] briefly studied the unbiased estimator of the population CDF of the research variable when there was non-response in the sample.
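Let the sample CDF $\hat{F}^{*}(t_y)$ be an unbiased estimator of the population CDF $F(t_y)$, based on ℓ units in the presence of non-response.
By using the Hansen and Hurwitz [11] approach, $\hat{F}^{*}(t_y)$ is defined as
$$\hat{F}^{*}(t_y) = w_1 \hat{F}_{(1)}(t_y) + w_2 \hat{F}_{(2q)}(t_y),$$
where $w_1 = \ell_1 / \ell$ and $w_2 = \ell_2 / \ell$. In addition, we have the following:
$\hat{F}_{(1)}(t_y)$ denotes the sample CDF based on $\ell_1$ responding units out of ℓ units;
$\hat{F}_{(2q)}(t_y)$ denotes the sample CDF based on q responding units out of $\ell_2$ non-response units.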
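The Hansen and Hurwitz construction above can be sketched in a few lines: the sample CDF of the first-phase respondents and the sample CDF of the followed-up sub-sample are combined with weights proportional to the numbers of respondents and non-respondents in the first-phase sample. The function names and the toy data are hypothetical, chosen only for illustration.

```python
def sample_cdf(values, t):
    """Sample CDF at t: the proportion of values at or below t."""
    return sum(1 for v in values if v <= t) / len(values)


def hansen_hurwitz_cdf(resp_values, subsample_values, n_nonresp, t):
    """Weighted CDF estimator under non-response.

    resp_values      : values from the first-phase respondents
    subsample_values : values from the followed-up non-respondents
    n_nonresp        : number of first-phase non-respondents
    """
    n_resp = len(resp_values)
    ell = n_resp + n_nonresp
    w1, w2 = n_resp / ell, n_nonresp / ell
    # Guard against empty groups so the function also covers full response.
    f1 = sample_cdf(resp_values, t) if resp_values else 0.0
    f2q = sample_cdf(subsample_values, t) if subsample_values else 0.0
    return w1 * f1 + w2 * f2q


# Hypothetical first-phase sample of 10 units: 6 respond, 4 do not,
# and with k = 2 a sub-sample of q = 4 / 2 = 2 non-respondents is followed up.
y_resp = [2, 5, 7, 8, 11, 13]
y_sub = [4, 6]
estimate = hansen_hurwitz_cdf(y_resp, y_sub, n_nonresp=4, t=7)
print(estimate)
```

Note that the second weight uses all non-respondents in the first-phase sample, not only the q units followed up; this is what lets the sub-sample stand in for the whole non-response group.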
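Theorem 2. The mean and variance of $\hat{F}^{*}(t_y)$ are given as follows:
$$E\big(\hat{F}^{*}(t_y)\big) = F(t_y), \qquad \operatorname{Var}\big(\hat{F}^{*}(t_y)\big) = \lambda S^2_{F(t_y)} + \theta S^2_{F(t_y)(2)},$$
where $\lambda = \frac{1}{\ell} - \frac{1}{N}$ and $\theta = \frac{W_2 (k - 1)}{\ell}$. Here $S^2_{F(t_y)}$ and $S^2_{F(t_y)(2)}$ are the population variances of $I(Y \le t_y)$ in the whole population and in the non-response group, respectively. Theorem 2 can be proved along the lines of [20].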
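To make the two variance components concrete, the sketch below evaluates the standard Hansen and Hurwitz variance expression for a hypothetical population split into response and non-response groups. It assumes the usual forms of the two multipliers (1/ℓ - 1/N for the sampling component, and the non-response share times (k - 1)/ℓ for the sub-sampling component) and variances with divisor (group size - 1); the data and function names are illustrative only.

```python
def indicator_variance(values, t):
    """Variance of I(v <= t) with divisor (size - 1)."""
    ind = [1 if v <= t else 0 for v in values]
    m = sum(ind) / len(ind)
    return sum((i - m) ** 2 for i in ind) / (len(ind) - 1)


def hh_variance(y_resp_group, y_nonresp_group, ell, k, t):
    """Variance of the non-response-adjusted CDF estimator (assumed standard form)."""
    n1, n2 = len(y_resp_group), len(y_nonresp_group)
    n_pop = n1 + n2
    lam = 1 / ell - 1 / n_pop               # sampling component multiplier
    theta = (n2 / n_pop) * (k - 1) / ell    # sub-sampling component multiplier
    s2_all = indicator_variance(y_resp_group + y_nonresp_group, t)
    s2_nonresp = indicator_variance(y_nonresp_group, t)
    return lam * s2_all + theta * s2_nonresp


# Hypothetical population of N = 10 units (assumed for illustration only).
y_group1 = [2, 5, 7, 8, 11, 13]   # response group
y_group2 = [4, 6, 9, 12]          # non-response group

v_subsampled = hh_variance(y_group1, y_group2, ell=5, k=2, t=7)
v_full_followup = hh_variance(y_group1, y_group2, ell=5, k=1, t=7)
print(v_subsampled, v_full_followup)
```

Setting k = 1 (every non-respondent is followed up) removes the second component, so the variance falls back to the ordinary SRSWOR value; larger k trades follow-up cost for extra variance.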
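Similarly, for the auxiliary variable X, the estimator $\hat{F}^{*}(t_x)$ is defined as
$$\hat{F}^{*}(t_x) = w_1 \hat{F}_{(1)}(t_x) + w_2 \hat{F}_{(2q)}(t_x).$$
In addition, we have the following:
$\hat{F}_{(1)}(t_x)$ denotes the sample CDF based on $\ell_1$ responding units out of ℓ units;
$\hat{F}_{(2q)}(t_x)$ denotes the sample CDF based on q responding units out of $\ell_2$ non-response units.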
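Lemma 2. Along the lines of Theorem 2, the mean and variance of $\hat{F}^{*}(t_x)$ are given as follows:
$$E\big(\hat{F}^{*}(t_x)\big) = F(t_x), \qquad \operatorname{Var}\big(\hat{F}^{*}(t_x)\big) = \lambda S^2_{F(t_x)} + \theta S^2_{F(t_x)(2)}.$$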
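In addition, let us define the following:
$S^2_{F(t_y)(1)}$ is the population variance of $I(Y \le t_y)$ for the response group;
$S^2_{F(t_y)(2)}$ is the population variance of $I(Y \le t_y)$ for the non-response group;
$S^2_{F(t_x)(1)}$ is the population variance of $I(X \le t_x)$ for the response group;
$S^2_{F(t_x)(2)}$ is the population variance of $I(X \le t_x)$ for the non-response group;
$C_{F(t_y)(1)}$ is the population coefficient of variation of $I(Y \le t_y)$ for the response group;
$C_{F(t_x)(1)}$ is the population coefficient of variation of $I(X \le t_x)$ for the response group;
$C_{F(t_y)(2)}$ is the population coefficient of variation of $I(Y \le t_y)$ for the non-response group;
$C_{F(t_x)(2)}$ is the population coefficient of variation of $I(X \le t_x)$ for the non-response group;
$S_{F(t_y) F(t_x)(1)}$ is the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$ for the response group;
$S_{F(t_y) F(t_x)(2)}$ is the population covariance between $I(Y \le t_y)$ and $I(X \le t_x)$ for the non-response group.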
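The following relative error terms are taken into account to determine the biases and MSEs of the existing and proposed estimators:
$$\xi_0^{*} = \frac{\hat{F}^{*}(t_y) - F(t_y)}{F(t_y)}, \qquad \xi_1^{*} = \frac{\hat{F}^{*}(t_x) - F(t_x)}{F(t_x)}, \qquad \xi_1 = \frac{\hat{F}(t_x) - F(t_x)}{F(t_x)},$$
such that $E(\xi_i^{*}) = 0$ for $i = 0, 1$, and $E(\xi_1) = 0$, where $E(\cdot)$ is mathematical expectation. Utilizing approximation up to the first order we have the following:
$$E\big(\xi_0^{*2}\big) = \frac{\lambda S^2_{F(t_y)} + \theta S^2_{F(t_y)(2)}}{F^2(t_y)}, \qquad E\big(\xi_1^{*2}\big) = \frac{\lambda S^2_{F(t_x)} + \theta S^2_{F(t_x)(2)}}{F^2(t_x)}, \qquad E\big(\xi_1^{2}\big) = \frac{\lambda S^2_{F(t_x)}}{F^2(t_x)},$$
$$E\big(\xi_0^{*} \xi_1^{*}\big) = \frac{\lambda S_{F(t_y) F(t_x)} + \theta S_{F(t_y) F(t_x)(2)}}{F(t_y) F(t_x)}, \qquad E\big(\xi_0^{*} \xi_1\big) = \frac{\lambda S_{F(t_y) F(t_x)}}{F(t_y) F(t_x)}.$$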
Two scenarios are considered in the presence of non-response:
Scenario I refers to non-response on both the study and auxiliary variables, whereas
Scenario II refers to non-response on the study variable only.
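The practical difference between the two scenarios is which estimator of the auxiliary CDF is available. The sketch below, on a single hypothetical sample, computes relative errors for both cases: under Scenario I both variables use the sub-sample-weighted estimator, while under Scenario II the auxiliary variable is observed on all sampled units and uses the ordinary sample CDF. The population CDF values `F_y` and `F_x`, the data, and the function names are assumptions made for illustration.

```python
def sample_cdf(values, t):
    """Sample CDF at t: the proportion of values at or below t."""
    return sum(1 for v in values if v <= t) / len(values)


def weighted_cdf(resp, sub, n_nonresp, t):
    """Respondent CDF and followed-up sub-sample CDF, weighted by group shares."""
    ell = len(resp) + n_nonresp
    return (len(resp) / ell) * sample_cdf(resp, t) + (n_nonresp / ell) * sample_cdf(sub, t)


# Assumed population CDF values at the chosen thresholds.
F_y, F_x = 0.5, 0.4
t_y, t_x = 7, 4

# Hypothetical sample of 10 units: 6 respondents, 4 non-respondents, 2 followed up.
y_resp, y_sub = [2, 5, 7, 8, 11, 13], [4, 6]
x_resp = [1, 9, 3, 4, 10, 12]
x_nonresp_all = [2, 3, 6, 8]      # X on all 4 non-respondents (available in Scenario II)
x_sub = x_nonresp_all[:2]         # X on the 2 followed-up units (Scenario I)

# Study variable: non-response occurs in both scenarios.
xi0_star = (weighted_cdf(y_resp, y_sub, 4, t_y) - F_y) / F_y

# Scenario I: auxiliary variable also subject to non-response.
xi1_star = (weighted_cdf(x_resp, x_sub, 4, t_x) - F_x) / F_x

# Scenario II: auxiliary variable observed on every sampled unit.
xi1 = (sample_cdf(x_resp + x_nonresp_all, t_x) - F_x) / F_x

print(xi0_star, xi1_star, xi1)
```

These are errors for one realized sample; the expectations used in the bias and MSE derivations are averages of such quantities over repeated sampling.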