1. Introduction
In many real-world investigations, particularly in ecology and environmental research, the main study variable (denoted Y) can be difficult to observe directly because measurement is costly, labor-intensive, intrusive, or potentially harmful to the subjects under study. Ranking the sampled units, by contrast, is often simple and inexpensive. Consider Calliphoridae flies as an example. These flies rapidly detect and colonize a food source, such as a dead body, soon after death. In post-mortem investigations, forensic entomologists often rely on the larvae of these flies to estimate the post-mortem interval. The larvae cease feeding once they reach their maximum size, and their anterior intestine remains empty during further development, so forensic entomologists can determine the post-mortem interval by observing the volume of the intestinal contents. However, using radiographic techniques to assess changes in the maggots’ intestinal contents presents challenges (Sharma et al. [1]). On the other hand, since the larvae appear to grow continuously in length, measuring and ranking their length is relatively straightforward. Another example is a health study that aims to estimate the average cholesterol level of a population. Rather than performing intrusive blood tests on every participant in the sample, the subjects can be ranked visually by weight, and blood samples can be collected from only a small number of individuals.
McIntyre [2] first proposed ranked set sampling (RSS). The RSS procedure can be described as follows: a sample is selected from a population of size N using simple random sampling (SRS). The units in this sample are ranked according to subjective criteria. Only the smallest unit is measured, while the rest are disregarded. Similarly, a second sample is selected, and only the second-smallest unit is measured, while the rest are disregarded. The process of selecting a new sample and measuring the next-smallest unit is repeated until the desired sample size is achieved.
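The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not McIntyre's original formulation: the names `ranked_set_sample`, `set_size`, `cycles`, and `rank_key` are hypothetical, and `rank_key` stands in for the subjective ranking criterion.

```python
import random

def ranked_set_sample(population, set_size, cycles, rank_key, rng):
    """Return the units selected for actual measurement under RSS.

    rank_key is an assumed stand-in for the cheap, subjective ranking.
    """
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            # Draw a fresh simple random sample (without replacement).
            candidates = rng.sample(population, set_size)
            ranked = sorted(candidates, key=rank_key)
            # Measure only the (i + 1)-th smallest unit; discard the rest.
            measured.append(ranked[i])
    return measured

rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(1000)]
sample = ranked_set_sample(population, set_size=3, cycles=4,
                           rank_key=lambda u: u, rng=rng)
print(len(sample))  # set_size * cycles measured units
```

Only `set_size * cycles` units are measured, although `set_size ** 2 * cycles` units are drawn and ranked, which is where the cost saving comes from.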
Since its inception, ranked set sampling (RSS) has attracted significant attention from researchers and remains an active area of study. Although it originated in horticulture with McIntyre’s foundational work in 1952, RSS is now also used in commercial settings. For further details on RSS, interested readers can consult Chen et al. [3], Hassan et al. [4], Bouza [5], Nagy et al. [6], and Benchiha et al. [7]. Shahzad et al. [8] successively used the ranked and true observations of auxiliary variables for mean estimation in median ranked set sampling (MRSS). The three-fold use of auxiliary variables was suggested by Shahzad et al. [9] for mean estimation in MRSS. Bushan et al. [10] defined difference-type estimators in RSS. Muttlak [11] introduced MRSS as a variation of RSS for estimating population means and demonstrated that MRSS yields more precise estimates than RSS. In MRSS, rather than measuring the kth smallest observation in the kth sample, the median of each sample within a cycle is measured. Essentially, MRSS is an adapted form of RSS designed to improve the accuracy of estimation.
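The difference from RSS can be made concrete with a short sketch of one MRSS cycle. The odd case follows the description above. For an even set size, the code assumes the usual convention of taking the lower middle rank in the first half of the sets and the upper middle rank in the second half. The function and parameter names are hypothetical.

```python
import random

def median_ranked_set_sample(population, set_size, rank_key, rng):
    """One MRSS cycle: return the units selected for measurement."""
    m = set_size
    measured = []
    for i in range(m):
        ranked = sorted(rng.sample(population, m), key=rank_key)
        if m % 2 == 1:
            pos = (m + 1) // 2      # odd m: the median rank
        elif i < m // 2:
            pos = m // 2            # even m (assumed rule): first half of the sets
        else:
            pos = m // 2 + 1        # even m (assumed rule): second half of the sets
        measured.append(ranked[pos - 1])  # ranks are 1-based
    return measured

rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(1000)]
odd_sample = median_ranked_set_sample(population, 5, lambda u: u, rng)
even_sample = median_ranked_set_sample(population, 4, lambda u: u, rng)
print(len(odd_sample), len(even_sample))
```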
The classic ratio estimator is widely used in sampling theory to estimate population means (Oral and Oral [12]). Building on this estimator, Al-Omari [13] introduced ratio-type estimators under the MRSS scheme. Later, Koyuncu [14] extended the concepts introduced by Al-Omari [13] and developed different types of estimators. However, all these efforts focused on ratio and difference-type mean estimation within the MRSS framework, and the resulting estimators rely on traditional descriptive statistics. No studies have been conducted on calibrated mean estimation under MRSS using robust covariance matrices, such as the minimum covariance determinant (MCD) matrix. Thus, this study represents an initial step toward developing robust calibrated mean estimators within the MRSS framework.
The structure of this paper is as follows: Section 2 introduces the calibration technique and presents the modified estimators for stratified MRSS. In Section 3 and Section 4, a new set of estimators is introduced under single and double MRSS schemes. In Section 5, a comprehensive simulation study compares the suggested estimators with alternative methods. Section 6 offers concluding remarks that summarize the findings of this paper.
2. Generalized Class of Calibrated MCD-Based Estimators
MCD estimation was defined by Rousseeuw [15]. To estimate multivariate location and dispersion with a high breakdown point, it is necessary to assess the determinant of the variance–covariance matrix. When this matrix is positive semi-definite with positive eigenvalues, its determinant is the product of these eigenvalues. Hence, a small determinant value indicates the presence of linear patterns in the data. The MCD method considers all subsets of a fixed size from a dataset and calculates the determinant of the variance–covariance matrix for each subset. The MCD estimators are the usual mean vector and the corresponding variance–covariance matrix of the subset with the lowest determinant. These estimators are discussed in the study by Muthukrishnan and Mahesh [16]. Note that estimators related to central tendency and dispersion can also be improved in the MCD framework by using auxiliary information.
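The subset search described above can be illustrated with an exhaustive sketch for bivariate (X, Y) data. This is a toy version under stated assumptions: the subset size `h`, the function names, and the data are hypothetical, and an exhaustive search is feasible only for very small samples, so practical software relies on approximate algorithms instead.

```python
from itertools import combinations

def mean_and_cov(points):
    """Sample mean vector and 2x2 covariance entries of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def brute_force_mcd(points, h):
    """Return (determinant, mean, covariance) of the size-h subset
    whose covariance determinant is smallest."""
    best = None
    for subset in combinations(points, h):
        mean, (sxx, sxy, syy) = mean_and_cov(subset)
        det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
        if best is None or det < best[0]:
            best = (det, mean, (sxx, sxy, syy))
    return best

# Six roughly collinear points plus one gross outlier (illustrative data).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1),
        (5.0, 9.8), (6.0, 12.1), (50.0, -40.0)]
det, mcd_mean, mcd_cov = brute_force_mcd(data, h=5)
print(mcd_mean)
```

Because any subset containing the outlier has a much larger determinant, the selected subset consists of clean points only, and the resulting mean vector is not pulled toward (50, -40).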
Incorporating auxiliary information can greatly improve mean estimators. In various real-life scenarios, a linear correlation can be observed between a study variable Y and an auxiliary variable X (see Shahzad et al. [17] and Abbasi et al. [18]). As an example, consider the association between depression and suicide, where individuals with severe depression are more likely to die by suicide than those without depression (Johnson et al. [19]). Similarly, there is an established direct and positive correlation between body mass index (BMI) and total cholesterol levels (Schroder et al. [20]). These scenarios demonstrate how auxiliary variables can provide valuable information and contribute to more accurate mean estimation.
Zaman and Bulut [21] introduced the concept of MCD-based mean estimation using auxiliary information. Shahzad et al. [22] extended their work to handle missing observations. Zaman and Bulut [23] also introduced MCD-based variance estimators. To learn more about MCD-based mean and variance estimation within simple and stratified sampling designs, readers can refer to Zaman and Bulut [24,25]. However, to the best of our knowledge, no attention has been paid to MCD-based calibrated mean estimators in a stratified MRSS design.
Calibration estimation is a methodology that refines initial weights by minimizing a designated distance measure while incorporating auxiliary data. In the corresponding literature, scholars have investigated calibration weighting in stratified sampling to improve the precision of population parameter estimates. The generation of new calibration weights relies on two key components: (1) a distance function and (2) constraints. Since the study variable and the auxiliary variables exhibit a strong correlation, weights that are effective for the auxiliary variable are expected to be effective for the study variable as well. Building on the pioneering work of Deville and Särndal [26], many researchers have explored calibration estimation using different calibration constraints in survey sampling (see [14,27,28,29,30]). Drawing inspiration from these studies, we propose mean estimators using MRSS under the MCD framework.
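The interplay of the two components can be seen in the simplest generic case: a chi-square distance with a single linear constraint, for which the Lagrangian has a closed-form solution. The sketch below is this textbook case only, not the estimator proposed in this paper. The names `d` (initial weights), `q` (tuning constants), `x` (auxiliary values), and `total_x` (known auxiliary total) are assumptions for illustration.

```python
def calibrate_weights(d, x, total_x, q=None):
    """Minimize sum((w_i - d_i)**2 / (d_i * q_i)) subject to
    sum(w_i * x_i) == total_x, using the closed-form solution."""
    if q is None:
        q = [1.0] * len(d)
    # Lagrange multiplier obtained from the calibration constraint.
    shortfall = total_x - sum(di * xi for di, xi in zip(d, x))
    denom = sum(di * qi * xi ** 2 for di, qi, xi in zip(d, q, x))
    lam = shortfall / denom
    # Stationarity of the Lagrangian gives w_i = d_i * (1 + lam * q_i * x_i).
    return [di * (1.0 + lam * qi * xi) for di, qi, xi in zip(d, q, x)]

d = [0.25, 0.25, 0.25, 0.25]   # initial weights (illustrative)
x = [10.0, 12.0, 9.0, 11.0]    # auxiliary variable
y = [20.0, 25.0, 18.0, 23.0]   # study variable
w = calibrate_weights(d, x, total_x=11.0)
estimate = sum(wi * yi for wi, yi in zip(w, y))
print(sum(wi * xi for wi, xi in zip(w, x)))  # matches total_x up to rounding
```

The calibrated weights reproduce the known auxiliary quantity exactly and are then applied to the study variable, which is the rationale stated in the paragraph above.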
In a stratified MRSS design, a random sample is drawn without replacement from the population of each stratum. Within each stratum, the sampled pairs consist of the order statistics of X and the correspondingly ordered values of Y, where the ranking is perfect for X and imperfect for Y. The units measured using MRSS are denoted separately for odd and even sample sizes.
For an odd sample size, the observed units in each stratum are defined first, together with the overall averages in each stratum in Equation (1) and the sample averages in each stratum in Equation (2). For an even sample size, the observed units are defined analogously, with the overall averages in each stratum in Equation (3) and the sample averages in each stratum in Equation (4). Each stratum also has an associated stratum weight.
Now, we provide a generalized class of calibration estimators under an MRSS design, which are expressed as
and are subject to the following constraints:
where the weight appearing in the estimator is the calibrated weight, and j indexes the odd and even sample-size cases of MRSS. Defining the Lagrange function with its two multipliers yields
in which the chi-square distance function is used, with a scaling factor that is 1 or the reciprocal of any known characteristic of the auxiliary information. Differentiating the Lagrange function with respect to the calibrated weight and setting the result to zero, we obtain
By inserting (9) into (6) and (7), we obtain
By substituting the resulting multipliers into (9), we obtain
Inserting these calibrated weights into the estimator yields the calibrated mean estimator of the study variable
This estimator can be rewritten as
where
Note that in this generalized class, the generic term can be any known characteristic of the auxiliary variable. By replacing it with known parameters, for instance, the arithmetic mean and the coefficient of variation, we can obtain the estimators reported in [17,29,30]. The cited authors defined these estimators under simple and MRSS designs. We adapt their work to the MCD framework, as shown in Table 1. Further, many other estimators can be developed by replacing it with other known population characteristics.
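The Lagrangian steps in this section can be sketched numerically for a stratified design. The following is a minimal illustration under stated assumptions, not the exact estimator of this paper. It assumes two constraints: the calibrated weights sum to the total of the stratum weights, and they reproduce the weighted population mean of the auxiliary variable. The names `W`, `q`, `xbar`, `ybar`, and `Xbar` are hypothetical. In the MCD setting, `xbar` and `ybar` would presumably be MCD-based location estimates rather than ordinary sample means.

```python
def calibrated_stratified_mean(W, q, xbar, ybar, Xbar):
    """Chi-square calibration of stratum weights under two assumed constraints:
    sum(omega) == sum(W) and sum(omega * xbar) == sum(W * Xbar)."""
    A = sum(w * qh for w, qh in zip(W, q))
    B = sum(w * qh * xb for w, qh, xb in zip(W, q, xbar))
    C = sum(w * qh * xb ** 2 for w, qh, xb in zip(W, q, xbar))
    D = sum(w * (Xb - xb) for w, Xb, xb in zip(W, Xbar, xbar))
    det = A * C - B ** 2  # nonzero when the stratum means in xbar differ
    # Lagrange multipliers solving the two constraint equations.
    lam1 = -D * B / det
    lam2 = D * A / det
    # Stationarity: omega_h = W_h + W_h * q_h * (lam1 + lam2 * xbar_h).
    omega = [w + w * qh * (lam1 + lam2 * xb) for w, qh, xb in zip(W, q, xbar)]
    estimate = sum(o * yb for o, yb in zip(omega, ybar))
    return omega, estimate

W = [0.3, 0.5, 0.2]          # stratum weights (illustrative)
q = [1.0, 1.0, 1.0]          # tuning constants
xbar = [10.2, 15.1, 20.3]    # stratum sample means of X
ybar = [30.5, 44.8, 61.0]    # stratum sample means of Y
Xbar = [10.0, 15.0, 20.0]    # known stratum population means of X
omega, estimate = calibrated_stratified_mean(W, q, xbar, ybar, Xbar)
print(omega, estimate)
```

Choosing a different `q`, for example the reciprocal of a known auxiliary characteristic, changes the calibrated weights while both constraints continue to hold.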