1. Introduction
Within statistical inference, many approaches can be used to reach conclusions, depending on the circumstances and goals of the investigation. A common strategy is to embed the inference procedure in a probabilistic framework, using models designed for enumerative and analytical inference. This strategy is a fundamental component of the statistician's toolkit, supporting the development of trustworthy inferences from observed data. It is crucial to understand, however, that enumerative inference requires a different probability structure than analytical inference; this discrepancy results from the different conditions and assumptions inherent in the two approaches and highlights the need for flexibility in statistical modeling.
A turning point in statistical inquiry was reached with the extraordinary upsurge in sampling-related research that followed World War II. Few statistical problems attracted as much attention and scientific investigation at the time as sampling theory. The main driver of this growth was the wide range of real-world applications of sampling theory, which transformed data-gathering practice, as evidenced by the many distinguished statisticians who devoted substantial research effort to the analysis of sample surveys. In practical applications, such surveys are vital instruments for gathering information from segments of the general public. Survey settings are also noteworthy for their capacity to exploit supplementary data: by redistributing some of the survey resources, additional data can be accessed from a wide range of sources, such as census records, previous surveys, or pilot studies. These flexible data streams can take many forms and offer valuable information about one or more variables of interest.
Specific questions are frequently asked in surveys, and answers are gathered from a carefully chosen sample of the population being studied. One innovative survey approach is the randomized response (RR) technique, first developed by [1]. Using this method, a simple random sample, usually consisting of n individuals, is selected with replacement from the population. The main goal is to estimate the proportion of the population that possesses a sensitive characteristic, denoted by "G". Every member of the sample receives an identical randomization device designed to produce results according to a predetermined probability distribution: the statement "I possess the trait G" is presented with a predefined probability "P", and the statement "I do not possess the trait G" with the complementary probability. The respondent then answers "Yes" or "No", depending on whether the statement produced by the randomization device matches their true status. Interested readers may consult the seminal articles by [2,3,4,5,6,7,8,9], among others, for a more in-depth examination of the nuances and complexity of these methodologies. These scholarly works provide deeper insights into statistical inference, sampling strategies, and the changing landscape of data-gathering methodologies, making them valuable references for researchers and practitioners alike.
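The yes/no mechanics of the randomization device can be made concrete with a short simulation. The sketch below is illustrative only: it uses the standard Warner form of the device and its unbiased estimator pi_hat = (lambda_hat − (1 − P)) / (2P − 1), where lambda_hat is the observed proportion of "Yes" answers; all numeric values are made up for the example.

```python
import random

def warner_response(has_trait: bool, p: float) -> bool:
    """Simulate one respondent using a Warner-type randomization device.

    With probability p the device selects the statement "I possess trait G";
    otherwise it selects "I do not possess trait G".  The respondent answers
    "Yes" exactly when the selected statement matches their true status.
    """
    asks_direct = random.random() < p
    return has_trait if asks_direct else not has_trait

def estimate_pi(answers, p):
    """Warner's unbiased estimator: (lambda_hat - (1 - p)) / (2p - 1)."""
    lam = sum(answers) / len(answers)
    return (lam - (1.0 - p)) / (2.0 * p - 1.0)

random.seed(42)
true_pi, p, n = 0.30, 0.7, 100_000
sample = [warner_response(random.random() < true_pi, p) for _ in range(n)]
print(round(estimate_pi(sample, p), 2))  # close to 0.30
```

Note that p must differ from 0.5, otherwise the denominator 2p − 1 vanishes and the sensitive proportion is not identifiable.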
2. Some Related Models
Ref. [10] used a randomization device carrying three types of cards bearing the statements: (i) "I belong to sensitive group A1", (ii) "I belong to group A2", and (iii) blank cards, with corresponding probabilities Q1, Q2, and Q3, respectively, such that Q1 + Q2 + Q3 = 1. If a blank card is drawn, the respondent reports "no"; the rest of the procedure remains as usual. The probability of a "Yes" answer is given by
θ = Q1 π1 + Q2 π2,
where π1 and π2 are the true proportions of the rare sensitive attribute A1 and the rare unrelated attribute A2 in the population, respectively. From the above equation, with π2 assumed known, the estimator of π1 is
π̂1 = (θ̂ − Q2 π2) / Q1,
where θ̂ is the observed proportion of "yes" answers in the sample. The variance of the estimator π̂1 is given as
V(π̂1) = θ(1 − θ) / (n Q1²).
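The three-card device and its moment estimator can be checked by simulation. The sketch below assumes the forms θ = Q1·π1 + Q2·π2 and π̂1 = (θ̂ − Q2·π2)/Q1 with π2 known, consistent with the device just described; the parameter values are arbitrary illustrations.

```python
import random

def three_card_response(has_a1: bool, has_a2: bool, q1: float, q2: float) -> bool:
    """One draw from the three-card device.

    With probability q1 the card asks about sensitive group A1, with
    probability q2 it asks about the unrelated group A2, and with the
    remaining probability q3 = 1 - q1 - q2 a blank card forces a "No".
    """
    u = random.random()
    if u < q1:
        return has_a1
    if u < q1 + q2:
        return has_a2
    return False  # blank card -> report "no"

def estimate_pi1(answers, q1, q2, pi2):
    """Moment estimator pi1_hat = (theta_hat - q2*pi2) / q1, with pi2 known."""
    theta_hat = sum(answers) / len(answers)
    return (theta_hat - q2 * pi2) / q1

random.seed(1)
pi1, pi2, q1, q2, n = 0.10, 0.40, 0.6, 0.3, 200_000
ans = [three_card_response(random.random() < pi1, random.random() < pi2, q1, q2)
       for _ in range(n)]
print(round(estimate_pi1(ans, q1, q2, pi2), 2))  # close to 0.10
```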
An approach to estimating the mean number of persons possessing a rare sensitive trait using stratified sampling was proposed in Ref. [9], a Bayesian approach in Ref. [11], and further developments appear in Refs. [12,13,14,15,16,17,18,19,20,21]. In light of the aforementioned research, this paper extends the unrelated randomized response model of [10] to a three-stage procedure for unrelated randomized responses in stratified and stratified random double sampling, using a Poisson distribution to estimate the rare sensitive attribute when the parameter of the rare unrelated question is known. Empirical investigations show that the recommended model performs better. Analyses of the data are provided, along with appropriate suggestions.
3. Problem Formulation
Over time, significant advancements in electrical system size and efficiency have been observed, largely as a result of the use of optimization techniques. By continuously adjusting parameters in real time, these approaches are crucial for improving the performance of both linear and nonlinear systems. Whether the targets are maxima or minima, optimization strategies are flexible and able to find optimal values by fine-tuning system parameters. It is important to recognize, though, that it would be premature to declare any specific electrical system design the best option, given how quickly technology is developing and reshaping what is possible. A problem of optimization arises in the context of Tarray and Singh's two-stage stratified random sampling model with fuzzy costs. In 2015, fuzzy nonlinear programming was used to solve this problem, whose central question was the model's optimal allocation. The method of Lagrange multipliers, a powerful optimization tool, was employed to solve the problem. Furthermore, at a predefined alpha threshold, the fuzzy numbers representing the ideal allocation were converted into discrete integers using the alpha-cut approach. For practical applications, working with integer sample sizes is essential; hence, finding an integer-based solution is quite important. Using LINGO software 21.0, the researchers were able to arrive at this solution without having to round off the continuous values. By framing the issue as a fuzzy integer nonlinear programming problem, this method produced a more accurate and useful solution for electrical system design. This methodology allows for the reliable collection of data on sensitive attributes within the population while also protecting the privacy of the respondents.
The challenge of optimal allocation in stratified random sampling with fuzzy costs is successfully addressed through the innovative use of fuzzy nonlinear programming and sophisticated optimization techniques, offering important insights for the design and enhancement of electrical systems.
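The alpha-cut step mentioned above, which converts a fuzzy number into a crisp interval at a chosen membership level, can be sketched as follows. The closed-form cuts for triangular and trapezoidal fuzzy numbers are standard; the cost figure in the example is made up, not a value from the paper.

```python
def tfn_alpha_cut(a, b, c, alpha):
    """Alpha-cut of a triangular fuzzy number (a, b, c).

    Returns the interval [a + alpha*(b - a), c - alpha*(c - b)]: the set of
    values whose membership grade is at least alpha.
    """
    return (a + alpha * (b - a), c - alpha * (c - b))

def trfn_alpha_cut(a, b, c, d, alpha):
    """Alpha-cut of a trapezoidal fuzzy number (a, b, c, d)."""
    return (a + alpha * (b - a), d - alpha * (d - c))

# Example: a fuzzy per-unit cost "about 4" modeled as the TFN (3, 4, 6).
lo, hi = tfn_alpha_cut(3, 4, 6, 0.5)
print(lo, hi)  # 3.5 5.0
```

At alpha = 1 the triangular cut collapses to the single modal value b, which is why a moderate level such as 0.5 is typically used to retain a working interval.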
Initially, Ω is a finite population of size N, which is composed of L strata of size Nh (h = 1, 2, …, L). A sample of size nh is drawn by simple random sampling with replacement (SRSWR) from the hth stratum. It is assumed that the stratum size Nh is known. The nh respondents from the hth stratum are provided with the following three-stage randomization device:
The first-stage randomization device R1h consists of two statements:
Statements | Selection probability
Statement 1: Are you a member of the rare sensitive group A1? |
Statement 2: Go to randomization device R2h |
The second-stage randomization device R2h consists of two statements:
Statements | Selection probability
Statement 1: Are you a member of the rare sensitive group A1? | Ph
Statement 2: Go to randomization device R3h | (1 − Ph)
The randomization device R3h utilizes three statements. Let Xh and Yh denote the number of cards that the respondent drew from the first and second decks, respectively, to obtain the cards that represent their personal status. The response of the ith respondent in the hth stratum might be stated as
with
and
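The cascade through the devices can be sketched in code. The first-stage selection probabilities and the full design of R3h are not reproduced above, so the sketch below is hypothetical: it assumes a first-stage probability t_h of receiving the sensitive statement and, purely as a placeholder, treats R3h as an unrelated-question stage. These are illustrative assumptions, not the paper's exact device.

```python
import random

def three_stage_response(has_a1: bool, has_a2: bool, t_h: float, p_h: float) -> bool:
    """Hypothetical sketch of the cascade R1h -> R2h -> R3h.

    t_h : assumed probability that R1h presents the sensitive statement.
    p_h : probability that R2h presents the sensitive statement, as in the
          second-stage table above.
    The third stage is simplified here to an unrelated-question stage.
    """
    if random.random() < t_h:   # stage 1: sensitive statement drawn
        return has_a1
    if random.random() < p_h:   # stage 2: sensitive statement drawn
        return has_a1
    return has_a2               # stage 3: placeholder unrelated question

random.seed(0)
pi1, pi2, t_h, p_h, n = 0.10, 0.30, 0.5, 0.6, 200_000
ans = [three_stage_response(random.random() < pi1, random.random() < pi2, t_h, p_h)
       for _ in range(n)]
# Under these assumptions P(Yes) = t*pi1 + (1-t)*(p*pi1 + (1-p)*pi2) = 0.14.
print(round(sum(ans) / n, 2))
```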
Using (2),
with a sampling variance
or
where
with variance
where is the available fixed budget for the survey and is the overhead expense. The nonlinear programming problem (NLPP) with fixed costs is then formulated, with the corresponding constraints imposed, respectively.
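The fixed-cost allocation step can be illustrated with the textbook Lagrange-multiplier solution for minimizing a variance of the form Σ A_h/n_h subject to a budget constraint Σ c_h·n_h = C. The symbols A_h, c_h, and C below are generic stand-ins, not the paper's exact notation, and the numbers are hypothetical.

```python
from math import sqrt

def optimal_allocation(a, c, budget):
    """Lagrange allocation for: min sum(a_h/n_h) s.t. sum(c_h*n_h) = budget.

    Stationarity, -a_h/n_h**2 + lam*c_h = 0, gives
    n_h = budget * sqrt(a_h/c_h) / sum_k sqrt(a_k*c_k).
    """
    denom = sum(sqrt(ah * ch) for ah, ch in zip(a, c))
    return [budget * sqrt(ah / ch) / denom for ah, ch in zip(a, c)]

# Illustrative two-stratum example (all numbers hypothetical).
a = [40.0, 10.0]   # per-stratum variance contributions A_h
c = [4.0, 9.0]     # per-unit measurement costs c_h
n = optimal_allocation(a, c, budget=1000.0)
print([round(x, 1) for x in n])
```

The solution spends the whole budget and allocates each stratum in proportion to sqrt(A_h/c_h), so high-variance, cheap-to-sample strata receive larger samples.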
7. Numerical Illustration
Using a population size of 1000 and total survey budgets of 3500, 4000, and 4800 units for TFNs, and 3500, 4000, 4400, and 4600 units for TrFNs, the required FNLPP is calculated based on the values provided in Table 1 and Table 2 as follows. The required optimal allocations for the problem may be obtained by entering the values from Table 1 and Table 2 in (9) at α = 0.5. In a similar way, the problem of the optimal allocation is determined by inserting the values from Table 1, Table 2, and Table 3 at α = 0.50.
Using the above minimization problem, we obtain the optimal solution as , , and the minimized optimal value is 0.000777032.
Since n1 and n2 are required to be integers, we branch problem R1 into two sub-problems, R2 and R3, by introducing the constraints n2 ≤ 177 and n2 ≥ 178, respectively; branching on n2 alone suffices because the value n1 = 300 is already an integer. The optimal solution is then determined as n1 = 300 and n2 = 177, minimizing V ( ) = 0.000777032. Notably, these optimal integer values are the same as those obtained by rounding ni to the nearest integer. Assuming V ( ) = Z, the various nodes of the NLPP for case I are illustrated in Figure 1.
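For a problem this small, the integer optimum reached by the branch-and-bound search can be verified by direct enumeration of the feasible integer grid. The sketch below does exactly that for a toy two-stratum problem of the same shape, minimize a1/n1 + a2/n2 subject to c1·n1 + c2·n2 ≤ C; all numbers are hypothetical, not the paper's data.

```python
def best_integer_allocation(a, c, budget, n_min=2):
    """Exhaustively find integer (n1, n2) minimizing V = a1/n1 + a2/n2
    subject to c1*n1 + c2*n2 <= budget and n_i >= n_min.

    For each candidate n1, the best n2 spends all remaining budget,
    since V is strictly decreasing in n2.
    """
    a1, a2 = a
    c1, c2 = c
    best = None
    for n1 in range(n_min, int(budget // c1) + 1):
        rest = budget - c1 * n1
        n2 = int(rest // c2)          # largest affordable n2 for this n1
        if n2 < n_min:
            continue
        v = a1 / n1 + a2 / n2
        if best is None or v < best[0]:
            best = (v, n1, n2)
    return best

v, n1, n2 = best_integer_allocation(a=(40.0, 10.0), c=(4.0, 9.0), budget=1000.0)
print(n1, n2, round(v, 6))
```

Branch and bound reaches the same point without full enumeration by solving the continuous relaxation and splitting on a fractional variable (e.g. n2 ≤ k versus n2 ≥ k + 1), as done for problems R2 and R3 above.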
Using the above minimization problem, we obtain the optimal solution as , , with the corresponding minimized value.
Since n1 and n2 are required to be integers, problem R1 is further branched into sub-problems R2, R3, R4, and R5 with the additional constraints n1 ≤ 237, n1 ≥ 238, n2 ≤ 176, and n2 ≥ 177, respectively. Problem R4 is resolved optimally, as its solutions for n1 and n2 are integers. Problem R2 is further divided into sub-problems R6 and R7 with the constraints n2 ≤ 176 and n2 ≥ 177, respectively. R6 is resolved optimally, while R7 has no feasible solution. The optimal solution is n1 = 240 and n2 = 176, minimizing V ( ) = 0.000796088. Assuming V ( ) = Z, the various nodes of the NLPP for case II are shown in Figure 2.
Using the above minimization problem, we obtain the optimal solution as , , and the minimized optimal value is 0.00085209. Since n1 and n2 are required to be integers, problem R1 is further branched into sub-problems R2, R3, R4, and R5 with the additional constraints n1 ≤ 177 and n1 ≥ 178, respectively. Problem R4 is resolved optimally with integer solutions for n1 and n2. Problem R2 is further divided into sub-problems R6 and R7 with the constraints n2 ≤ 170 and n2 ≥ 171, respectively. R6 is resolved optimally, while R7 has no feasible solution. The optimal solution is n1 = 180 and n2 = 170, minimizing V ( ) = 0.0008521187. Assuming V ( ) = Z, the various nodes of the NLPP for case III are shown in Figure 3.
8. Discussion
Using a randomized response technique, we have developed a novel fuzzy-logic approach to decision-making that can be implemented in three phases and is specifically designed to minimize variance while taking costs into account in stratified random sampling. Although our model is useful in practice and provides a thorough framework for enhancing data-collection techniques, it is important to recognize its limitations and consider future research directions. Our method assumes fixed costs for each stratum; adaptive models warrant further study, because these costs can change with logistical issues, inflation, and resource availability. Moreover, although our model performs allocation using the alpha-cut method, this representation might not fully capture the uncertainty in real-world data; machine learning or alternative fuzzy-logic approaches could increase the flexibility and robustness of the model. Additionally, the main focus of our research has been the theoretical and numerical verification of the proposed model. Empirical validation via real-world case studies is needed to evaluate the practical applicability and efficacy of our technique in various circumstances; field research and partnerships with industry may yield valuable insights and aid in the model's empirical, evidence-based refinement. We demonstrated the practical value of the approach by applying it to three different cases. In each case, its effectiveness was validated by a notable decrease in variance maintained at a reasonable cost. However, the results also pointed out several areas needing improvement, including more accurate cost-estimation methods and further optimization of the allocation process within the model.