Article

Accounting for Measurement Error and Untruthfulness in Binary RRT Models

1 Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
2 Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA
3 Department of Mathematics and Statistics, UNC Greensboro, Greensboro, NC 27412, USA
4 Department of Statistics, Lahore College for Women University, Lahore 44444, Pakistan
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(6), 875; https://doi.org/10.3390/math12060875
Submission received: 16 November 2023 / Revised: 31 January 2024 / Accepted: 14 March 2024 / Published: 16 March 2024

Abstract: This study examines the effect of measurement error on binary Randomized Response Technique (RRT) models. We discuss a method for estimating and accounting for measurement error and untruthfulness in two basic models and one comprehensive model. Both theoretical and empirical results show that not accounting for measurement error leads to inaccurate estimates. We introduce estimators that account for the effect of measurement error. Furthermore, we introduce a new measure of model privacy using an odds ratio statistic, which offers better interpretability than traditional methods.
MSC:
62D05

1. Introduction

Examining controversial attitudes and behaviors via survey often necessitates a careful approach. Rather than affirmatively responding to questions about sensitive or self-incriminating behavior, survey respondents may instead withhold information or provide inaccurate answers. To mitigate the bias that these actions introduce into survey data, Ref. [1] developed the Randomized Response Technique (RRT) to restore the assurance of confidentiality. Through the deliberate introduction of randomness into survey procedures, randomized response models safeguard respondent privacy and yield more accurate data.
The first binary RRT model was proposed by [1]. The Warner model prompts participants to answer sensitive questions in direct or indirect form. This method utilizes binary sensitive variables that preserve respondent privacy. Since then, many other models have been proposed, including the model by [2]. The Greenberg model asks respondents to answer either a direct sensitive question or an unrelated question. In 2023, Ref. [3] proposed a mixture model that combines the Warner and Greenberg models, treating the two models as special cases of the larger model. That work also investigated the impact of untruthfulness on binary RRT models, but none of these models account for the effect of measurement error. In this paper, we aim to investigate the impact of both measurement error and untruthfulness on binary RRT models.
The effect of measurement error has been studied in quantitative RRT models by many authors, including [4,5]. Ref. [6] explored the effect of measurement error on the binary model of [1]. This model and the mixture model proposed by [3] are discussed in detail in Section 2. In Section 3, we apply the treatment of measurement error developed in [6] to the Lovig mixture model. Our aim is to study the impact of measurement error on the Lovig mixture model in comparison to the models of [1,2].
Certain models may be better suited to withstand the effects of measurement error. In Section 4, we test our mixture model that accounts for measurement error by introducing measurement error in a numerical study. This study computes the error that measurement error introduces into the model, MSE[m̂], at several levels for different choices of p and q. This design compares the mixture model to the Warner and Greenberg models, since these two are considered special cases of the Lovig mixture model (q = 1 − p and q = 0, respectively). This investigation relies on our estimation of the measurement error m, which uses a secondary question described in Section 3.3.2.
In the discussion of privacy in Section 5, we propose a new measure of the privacy offered by binary RRT models. This method uses binary logistic regression to compute the odds ratio (OR) between true participant responses and recorded model responses as a measure of predictability. A higher OR value indicates higher individual predictability of the presence of the sensitive trait as the recorded response changes from “No” to “Yes”, and thereby lower privacy. This privacy measure is estimated in the presence of measurement error and participant untruthfulness.

2. Previous Models

Several RRT models form the building blocks for the proposed model in Section 3. The Warner model was the first RRT model introduced in [1]. In this model, participants are presented with either the sensitive question or an indirect version of the sensitive question using a randomization device with known probabilities. The key aspect of RRT models is that the interviewer remains unaware of whether the respondent is answering the direct or the indirect question, ensuring confidentiality. Similarly, the Greenberg model involves respondents responding to either a sensitive question or an entirely unrelated question.
Prior to [7], all binary RRT models assumed that respondents provide truthful responses. In this work, the authors accounted for a lack of trust and demonstrated that not accounting for untruthfulness leads to poor estimates in the Greenberg model. Later, Ref. [3] demonstrated the negative impact that untruthfulness has on the efficiency of the Greenberg, Warner, and Lovig mixture binary models. The method of accounting for untruthfulness proposed by [7] added a node that switches the respondent’s answer with probability 1 − A, but only when the respondent is in the sensitive group (with probability π_x). This design ensures that the respondent’s mistrust of the model affects only questions that the respondent finds sensitive. This conceptual framework of accounting for lack of trust was utilized to develop a more efficient and private model.
The Warner model has been shown to offer the highest privacy protection to respondents, and the best efficiency is found in the Greenberg model [3]. The mixture model proposed in [3] that combines these two models asks participants an indirect sensitive question, a direct sensitive question, or an unrelated question, each with known likelihoods. This model is discussed and depicted in Section 2. The approach in [3] offered an opportunity to compare privacy and efficiency between the Warner model, the Greenberg model, and a mixture of the two accounting for lack of trust. To fairly compare the three models, this work uses a modified version of the unified measure M proposed by [8]. This novel model that accounts for untruthfulness may be improved to also account for measurement error.
Accounting for the effect of measurement error in the model of [1] was first proposed in [6]. This model represents the first work exploring measurement error in binary RRT models. This work developed a unique method of defining measurement error in the context of binary RRT models and examined the estimation of the prevalence of sensitive characteristics under measurement error in some cases. The authors found that, in most cases, the measurement error introduced a non-negligible bias into the efficiency offered by the Warner model. This work includes estimators for measurement error and the sensitive trait accounting for measurement error. The approach in this paper examines the same effect of measurement error in the more comprehensive mixture model by [3].

Mixture Model Accounting for Lack of Trust by [3]

The mixture model of [3] offers a trichotomy that randomly chooses from the direct question, the indirect question, or the unrelated question. This model is best shown by the flow diagram given in Figure 1 below:
This work found that respondent lack of trust decreases model efficiency and increases privacy when untruthfulness remains unaccounted for. Note that the Warner and Greenberg models may be considered as special cases of the Lovig mixture model as pictured above. Comparing the three models, the Greenberg model (q = 0) outperforms the others in terms of efficiency, the Warner model (q = 1 − p) excels in privacy protection, and the Lovig mixture model (0 < q < 1 − p) emerges as the best model in terms of the unified measure M.
The Lovig mixture model outperforms both the Greenberg and the Warner models. It exhibits a lower MSE than the Warner model, better privacy protection than the Greenberg model, and significantly better unified measure M than both of the basic models in most recommended cases. This model was groundbreaking in investigating and accounting for the effect of untruthfulness in binary RRT models. The proposed model in this work is an extension of the Lovig model in Figure 1 to account for measurement error.

3. Proposed Model

We hypothesize that measurement error imposes a significant bias on model efficiency. This section proposes a model that accounts for this effect by estimating the measurement error rate using a secondary question. We acknowledge that this will not account for every type of measurement error. Appropriate methodological guidelines should still be followed to reduce the risk of measurement error confounding study results; these guidelines concern, for example, survey modes, response formats, respondent training, and response biases.
This paper proposes a model that introduces the effect of measurement error as in [6] onto the binary RRT mixture model that accounts for lack of trust proposed by [3]. Following the work of [7] and utilizing the model proposed by [3], a respondent in the sensitive group (with probability π_x) will switch his/her answer due to mistrust with probability 1 − A. So far, this design constitutes the Lovig mixture model shown in Figure 1. To account for measurement error, the model is designed such that each recorded response is switched with probability m. Nodes that switch the recorded response with rate m are added at each terminal node because we assume that the measurement error has an equal chance of occurring at each branch.
After demonstrating that poor estimates occur when measurement error is not accounted for in Section 3.2, we outline a method for estimating the prevalence of measurement error m and trust rates A in Section 3.3 using secondary questions. To begin this discussion, our proposed mixture model is best shown by Figure 2 below:
For this model, we use the following notation:
  • n = size of random sample with replacement;
  • p = probability that the respondent is in the direct question group;
  • q = probability that the respondent is in the indirect question group;
  • 1 − p − q = probability that the respondent is in the unrelated question group;
  • π_x = proportion of the sensitive trait;
  • π_y = proportion of the unrelated trait;
  • A = proportion of people who trust the underlying RRT model;
  • m = probability that the participant’s recorded response is switched due to measurement error;
  • P_y = probability of the respondent entering a “Yes” response.
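The response mechanism described above can be sketched as a short simulation. This is an illustrative sketch, not the authors' released code; the function name and parameter values are our own.

```python
import random

def recorded_response(pi_x, pi_y, p, q, A, m, rng):
    """Simulate one recorded response ("Yes" = 1, "No" = 0) from the
    proposed mixture model: route the respondent to the direct question
    (prob. p), the indirect question (prob. q), or the unrelated question
    (prob. 1 - p - q); let a sensitive respondent switch the answer with
    probability 1 - A due to mistrust; finally flip the recorded answer
    with probability m (measurement error)."""
    sensitive = rng.random() < pi_x
    u = rng.random()
    if u < p:                                  # direct sensitive question
        answer = sensitive
        if sensitive and rng.random() > A:     # mistrust: switch the answer
            answer = not answer
    elif u < p + q:                            # indirect sensitive question
        answer = not sensitive
        if sensitive and rng.random() > A:
            answer = not answer
    else:                                      # unrelated question
        answer = rng.random() < pi_y
    if rng.random() < m:                       # measurement error flips the record
        answer = not answer
    return int(answer)
```

Averaging many simulated responses reproduces the closed-form probability of a “Yes” response derived in Section 3.2.1.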

3.1. Estimating Trust Parameter A with a Greenberg Model

Before estimating the proportion of the sensitive trait, we must estimate the trust parameter A. This study uses an initial question to estimate A using a Greenberg model similar to the approach in [3]. Since the Greenberg model is the most efficient model and privacy is not prioritized in this initial question, the Greenberg model is the ideal choice to estimate untruthfulness. For this question, let there be the following:
  • p g = proportion of the direct question used in a Greenberg model to estimate truthfulness;
  • P y g = probability of a “Yes” response in this Greenberg model;
  • π y g = proportion of people who would answer “Yes” to the unrelated question in the Greenberg model.
Following the approach from [3], we use the question, “Do you trust the model?”, for Question 1 with probability p g and an unrelated question with probability 1 p g . This leads to the equations:
$$P_{yg} = p_g A + (1-p_g)\pi_{yg}, \qquad \hat{A} = \frac{\hat{P}_{yg} - (1-p_g)\pi_{yg}}{p_g}$$
where P y g ^ is the proportion of “Yes” responses in the sample of Greenberg responses. Then,
$$E[\hat{P}_{yg}] = P_{yg}; \quad \mathrm{Var}[\hat{P}_{yg}] = \frac{P_{yg}(1-P_{yg})}{n}; \quad E[\hat{A}] = A; \quad \mathrm{Var}[\hat{A}] = \frac{P_{yg}(1-P_{yg})}{n\,p_g^2}. \tag{1}$$
Since E [ A ^ ] = A , this question design provides an unbiased estimator for respondent trust prevalence A.
Now, we discuss the efficiency of the proposed model using the estimator A ^ . In Section 3.2, we compute the efficiency of the proposed model that does not account for measurement error to test the behavior of the bias. After demonstrating its impact, this initial question that estimates untruthfulness is used as Question 1 in the proposed model in Section 3.3.
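The preliminary trust question can be sketched in Python as follows. This is a minimal illustration of Equation (1); the function names and parameter values are our own assumptions, not the authors' code.

```python
import random

def simulate_trust_question(A, p_g, pi_yg, n, rng):
    """Generate n binary responses to the preliminary Greenberg question:
    with probability p_g the respondent answers "Do you trust the model?"
    truthfully (P(Yes) = A); otherwise an unrelated question (P(Yes) = pi_yg)."""
    out = []
    for _ in range(n):
        if rng.random() < p_g:            # direct: "Do you trust the model?"
            out.append(int(rng.random() < A))
        else:                             # unrelated question
            out.append(int(rng.random() < pi_yg))
    return out

def estimate_trust(responses, p_g, pi_yg):
    """Unbiased estimator A_hat = (P_yg_hat - (1 - p_g) * pi_yg) / p_g."""
    p_yg_hat = sum(responses) / len(responses)
    return (p_yg_hat - (1 - p_g) * pi_yg) / p_g
```

For a large sample, the estimate recovers the true trust rate A to within sampling error, consistent with the unbiasedness shown above.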

3.2. Proposed Model: Not Accounting for Measurement Error

3.2.1. Efficiency

In this section, we build an estimator π̂_xu for the sensitive trait that does not account for measurement error, using the probability of a “Yes” response. The probability of a “Yes” response is given by
$$\begin{aligned} P_y &= q\big[(1-\pi_x)(1-m) + \pi_x\big((1-A)(1-m) + Am\big)\big] + p\big[\pi_x\big(A(1-m) + (1-A)m\big) + (1-\pi_x)m\big] \\ &\quad + (1-p-q)\big[\pi_y(1-m) + (1-\pi_y)m\big] \\ &= \big[A\pi_x(p-q) + \pi_y(1-p-q)\big](1-2m) + m(1-2q) + q. \end{aligned} \tag{2}$$
If the researcher ignores measurement error and erroneously assumes m = 0 , our model becomes naive to measurement error. Let P y u be the probability of a “Yes” response in this naive case. Then, we may find P y u using (2) as follows:
$$P_{yu} = A\pi_x(p-q) + \pi_y(1-p-q) + q.$$
From this, we have the estimator for the sensitive trait under the naive case:
$$\hat{\pi}_{xu} = \frac{\hat{P}_y - q - (1-p-q)\pi_y}{\hat{A}(p-q)}; \quad p \neq q$$
where P y ^ is the proportion of “Yes” responses in the sample assuming random sampling with replacement is given by
$$E[\hat{P}_y] = P_y; \quad \mathrm{Var}[\hat{P}_y] = \frac{P_y(1-P_y)}{n}. \tag{3}$$
Since π x u ^ is a function of random variables P y ^ and A ^ , we use a first-order Taylor’s expansion for π x u ^ given by
$$f(x,y) \approx f(x_0,y_0) + (x-x_0)f_x(x_0,y_0) + (y-y_0)f_y(x_0,y_0)$$
where x = P y ^ , x 0 = P y , y = A ^ , and y 0 = A . Then, we have the expansion:
$$\hat{\pi}_{xu} \approx \frac{P_y - q - (1-p-q)\pi_y}{A(p-q)} + \frac{\hat{P}_y - P_y}{A(p-q)} - (\hat{A}-A)\,\frac{P_y - q - (1-p-q)\pi_y}{A^2(p-q)}; \quad p \neq q. \tag{4}$$
From (4), we can easily verify
$$\begin{aligned} E[\hat{\pi}_{xu}] &\approx \frac{P_y - q - (1-p-q)\pi_y}{A(p-q)} = \frac{A\pi_x(p-q)(1-2m) - 2m\pi_y(1-p-q) + m(1-2q)}{A(p-q)}; \quad p \neq q \\ \mathrm{Var}[\hat{\pi}_{xu}] &\approx \frac{\mathrm{Var}[\hat{P}_y]}{A^2(p-q)^2} + \mathrm{Var}[\hat{A}]\left(\frac{P_y - q - (1-p-q)\pi_y}{A^2(p-q)}\right)^2; \quad p \neq q \end{aligned} \tag{5}$$
where Var [ A ^ ] is given by (1) and Var [ P y ^ ] is given by (3). From (5), we may observe the bias that is introduced when measurement error is not accounted for. Bias for this naive approach is given by:
$$\begin{aligned} \mathrm{Bias}[\hat{\pi}_{xu}] &= E[\hat{\pi}_{xu}] - \pi_x = \frac{m\big(1 - 2A\pi_x(p-q) - 2\pi_y(1-p-q) - 2q\big)}{A(p-q)}; \quad p \neq q \\ \mathrm{MSE}[\hat{\pi}_{xu}] &= \mathrm{Var}[\hat{\pi}_{xu}] + \mathrm{Bias}[\hat{\pi}_{xu}]^2. \end{aligned} \tag{6}$$
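The naive estimator and its closed-form bias can be checked numerically. The sketch below encodes Equations (2) and (6) directly; the parameter values are illustrative assumptions.

```python
def p_yes(pi_x, pi_y, p, q, A, m):
    """Probability of a recorded "Yes" under the mixture model, Equation (2)."""
    return ((A * pi_x * (p - q) + pi_y * (1 - p - q)) * (1 - 2 * m)
            + m * (1 - 2 * q) + q)

def naive_pi_x(p_y, pi_y, p, q, A):
    """Estimator of pi_x that wrongly assumes m = 0 (requires p != q)."""
    return (p_y - q - (1 - p - q) * pi_y) / (A * (p - q))

def naive_bias(pi_x, pi_y, p, q, A, m):
    """Closed-form bias of the naive estimator, Equation (6)."""
    return (m * (1 - 2 * A * pi_x * (p - q) - 2 * pi_y * (1 - p - q) - 2 * q)
            / (A * (p - q)))
```

Plugging the exact P_y into the naive estimator reproduces the bias formula term by term, and the bias is positive for typical parameter choices, matching the behavior reported in Table 1.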

3.2.2. Simulation Results

To simulate randomized response trials, we used the NumPy module in Python to randomly generate survey results based on specified parameters for π ^ x , π ^ y , p, q, A, and m. Estimators were then calculated using these generated data. We tabulated the data across trials via the Pandas software library in Python and created visualizations with the Matplotlib plotting library.
The simulations in Table 1 demonstrate the effect of the bias from measurement error using the Lovig model in Section 3.2.1. We used MSEs to compare different models; the MSE has been used for this purpose in all major RRT papers, including the seminal papers [1,2]. Observing MSE[π̂_xu], which encompasses the variance and bias of π̂_xu as in (6), the error rates in this estimator of π_x increase as significant levels of A and m are introduced. The positive bias causing poor estimates is especially prevalent when measurement error is introduced. Low levels (m = 0.01) do not cause noticeable changes, but a 10% rate of measurement error causes approximately a 5% error rate in the sensitive trait. A similar positive bias may be observed when untruthfulness is introduced, but it is not as severe.

3.3. Proposed Model: Accounting for Measurement Error

Following [3], we first estimate A, then the probability of measurement error (m), and finally, the sensitive trait π x .

3.3.1. Approach

Let Question 1 in this approach be the question, “Do you trust the model?”, using the Greenberg model outlined in Section 3.1. This provides an estimator for A. This is followed by Questions 2 and 3 to estimate measurement error m in the model and the prevalence of the sensitive trait π x , respectively, using the proposed mixture model.

3.3.2. Estimating Measurement Error Using a Secondary Question

Section 3.2 demonstrates that not accounting for measurement error leads to inaccurate estimates. Following [6], we estimate the parameter for measurement error m ^ using our model with an additional modified question that ensures a known sensitivity probability.
For example, this test question could be, “Are you a robot?” Such a question ensures a known sensitive probability of π x = 0 for human respondents. Note that truthfulness A does not have to be accounted for in this question since sensitivity is always zero. Researchers must design a question for this estimator m ^ that satisfies these two requirements. For this secondary design, we may derive the probability of a “Yes” response in this rigged question P y 0 using (2), where π x = 0 .
Question 2: (with secondary question) “Are you a robot?”
$$P_{y0} = m\big(1 - 2q - 2\pi_y(1-p-q)\big) + q + (1-p-q)\pi_y. \tag{7}$$
Using (7), we can set
$$\begin{aligned} \hat{m} &= \frac{\hat{P}_{y0} - q - \pi_y(1-p-q)}{1 - 2q - 2\pi_y(1-p-q)}; \quad p+q \neq 1 \\ E[\hat{P}_{y0}] &= P_{y0}; \quad \mathrm{Var}[\hat{P}_{y0}] = \frac{P_{y0}(1-P_{y0})}{n} \\ E[\hat{m}] &= \frac{E[\hat{P}_{y0}] - q - \pi_y(1-p-q)}{1 - 2q - 2\pi_y(1-p-q)} = m; \quad p+q \neq 1. \end{aligned} \tag{8}$$
Since E [ m ^ ] = m , (8) provides an unbiased estimator of m. We may then find the MSE given by
$$\mathrm{Var}[\hat{m}] = \frac{\mathrm{Var}[\hat{P}_{y0}]}{\big(1 - 2q - 2\pi_y(1-p-q)\big)^2} = \frac{P_{y0}(1-P_{y0})}{n\big(1 - 2q - 2\pi_y(1-p-q)\big)^2} = \mathrm{MSE}[\hat{m}]; \quad p+q \neq 1. \tag{9}$$
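The estimator for m and its MSE from Equations (8) and (9) translate directly into code. This sketch uses illustrative parameter values of our own choosing.

```python
def estimate_m(p_y0_hat, pi_y, p, q):
    """Unbiased estimator of m from Equation (8); requires p + q != 1."""
    denom = 1 - 2 * q - 2 * pi_y * (1 - p - q)
    return (p_y0_hat - q - pi_y * (1 - p - q)) / denom

def mse_m_hat(p_y0, pi_y, p, q, n):
    """Var[m_hat] = MSE[m_hat] from Equation (9)."""
    denom = 1 - 2 * q - 2 * pi_y * (1 - p - q)
    return p_y0 * (1 - p_y0) / (n * denom ** 2)
```

Feeding the exact P_y0 from Equation (7) into `estimate_m` recovers the true m, confirming that the estimator inverts (7) correctly.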

3.3.3. Estimating Proportion of Sensitive Trait π x

Now that we have estimates for m and A, Question 3 uses the proposed mixture model from Figure 2 to ask a direct sensitive, indirect sensitive, or unrelated question. We use the full probability of a “Yes” response P y to this question to derive the estimator π x ^ .
Question 3: (with mixture model) “Do you have the sensitive trait?”
$$\begin{aligned} P_y &= \big[A\pi_x(p-q) + \pi_y(1-p-q)\big](1-2m) + m(1-2q) + q \\ \hat{\pi}_x &= \frac{\hat{P}_y - \hat{m}(1-2q) - q - \pi_y(1-p-q)(1-2\hat{m})}{\hat{A}(1-2\hat{m})(p-q)}; \quad p \neq q,\ \hat{m} \neq \tfrac{1}{2} \end{aligned} \tag{10}$$
We use the following Taylor’s approximation to make use of the estimator π x ^ :
$$f(x,y,z) \approx f(x_0,y_0,z_0) + (x-x_0)f_x(x_0,y_0,z_0) + (y-y_0)f_y(x_0,y_0,z_0) + (z-z_0)f_z(x_0,y_0,z_0)$$
where x = P y ^ , y = m ^ , z = A ^ , x 0 = P y , y 0 = m , and z 0 = A . Then, π x ^ is approximated by:
$$\begin{aligned} \hat{\pi}_x &\approx \frac{P_y - m(1-2q) - q - \pi_y(1-p-q)(1-2m)}{A(1-2m)(p-q)} + \frac{\hat{P}_y - P_y}{A(1-2m)(p-q)} + (\hat{m}-m)\,\frac{2P_y - 1}{A(p-q)(1-2m)^2} \\ &\quad - (\hat{A}-A)\,\frac{P_y - m(1-2q) - q - \pi_y(1-p-q)(1-2m)}{A^2(1-2m)(p-q)}; \quad p \neq q,\ \hat{m} \neq \tfrac{1}{2}. \end{aligned} \tag{11}$$
From (11), we may note that
$$\begin{aligned} E[\hat{\pi}_x] &\approx \pi_x \\ \mathrm{Var}[\hat{\pi}_x] &\approx \frac{\mathrm{Var}[\hat{P}_y]}{\big(A(1-2m)(p-q)\big)^2} + \mathrm{Var}[\hat{m}]\left(\frac{2P_y - 1}{A(p-q)(1-2m)^2}\right)^2 \\ &\quad + \mathrm{Var}[\hat{A}]\left(\frac{P_y - m(1-2q) - q - \pi_y(1-p-q)(1-2m)}{A^2(1-2m)(p-q)}\right)^2; \quad p \neq q,\ \hat{m} \neq \tfrac{1}{2} \end{aligned} \tag{12}$$
where Var [ P y ^ ] is given by (3), Var [ m ^ ] is given by (9), and Var [ A ^ ] is given by (1).
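The corrected estimator of Equation (10) can be sanity-checked by inverting the exact "Yes" probability of Equation (2): with the true P_y, m, and A supplied, it must return π_x exactly. The function name and parameter values below are our own.

```python
def corrected_pi_x(p_y_hat, m_hat, a_hat, pi_y, p, q):
    """Measurement-error-corrected estimator of pi_x, Equation (10);
    requires p != q and m_hat != 1/2."""
    num = (p_y_hat - m_hat * (1 - 2 * q) - q
           - pi_y * (1 - p - q) * (1 - 2 * m_hat))
    return num / (a_hat * (1 - 2 * m_hat) * (p - q))
```

In practice, P̂_y, m̂, and Â come from Questions 3, 2, and 1, respectively, so the estimate carries the sampling variability summarized in Equation (12).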

3.3.4. Simulation Results

We observed that the estimates for π x were more accurate when we accounted for measurement error. The estimates in Table 2 contrast the poor results shown in Table 1, where measurement error was ignored. It is now clear that neglecting to account for measurement error in binary RRT models results in inaccurate estimators. Table 2 simulates the estimators discussed.
The simulation in Table 2 offers several insights. Our estimators m ^ , A ^ , π ^ x , and MSE [ π ^ x ] are statistically close to their theoretical values, indicating that these are good estimators. Most significantly, the π ^ x column confirms that our estimator of π x using the proposed model is accurate as untruthfulness and measurement error are introduced. The Greenberg model offers the greatest efficiency indicated by the lowest MSE [ π ^ x ] rates, and the Warner model offers the worst efficiency. The Lovig mixture model is between these values for each level of m , A . We will note in Section 5 that, when privacy protection is also factored in, the mixture model will offer the best performance.

4. Comparison of Measurement Error between Models

Observing the effect of measurement error across models will aid in the researcher’s choice of the model design. Since it is always desired to minimize the impact of measurement error in the chosen model, we discuss the MSE of m for different choices of p and q. To do this, we compare the effect of measurement error between the Lovig, Warner, and Greenberg models.
Each model is differentially affected by measurement error since it is estimated using a secondary question outlined in Section 3.3.2. Recall that the Warner and Greenberg models are special cases of the Lovig mixture model, where q = 1 − p and q = 0, respectively. The estimator for m in (8) is unbiased for all three models, so we compare the measurement error between the models using the respective MSEs.

Numerical Comparison

The simulation results so far have fixed the level of p and parameterized q, but the choice of p has an effect on the model’s performance. The numerical discussion provided in Figure 3 provides four levels of p to observe the behavior of MSE [ m ^ ] .
Asymptotic behavior is expected around q = 0.5 when p < 0.5, since the denominators of these estimators contain the factor 1 − 2q − 2π_y(1 − p − q), which approaches zero there. This indicates that p and q should never be chosen close to 0.5, to avoid poor estimators.
When p > 0.5, the Warner model introduces the highest error due to measurement error, while the Greenberg model introduces the least for all considered values of m. Since this error function is monotonically increasing, the Lovig model offers moderate error, as the bold values in Table 3 demonstrate. When p < 0.5, the parameter q must be carefully chosen for the Lovig model to avoid large values of MSE[m̂]. In this case, the comparative ranking of error due to measurement error between the models is similar to when p > 0.5. This is seen in Table 3 with the MSE[m̂] values provided below. This table also shows that the empirical estimate of MSE[m̂] is a good estimator.
Both Figure 3 and Table 3 show that the choice of p and q has a more significant impact than the level of m on the amount of error introduced into the model from MSE [ m ^ ] . This fact underscores the importance of a proper choice of model parameters since certain models are better suited for accounting for measurement error. To provide a complete recommendation to a researcher designing an RRT study, a discussion on privacy and a unified measure is required. The choice of p , q involves considering trade-offs between efficiency, privacy protection, and measurement error. This is provided in Section 6.
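The cross-model comparison can be illustrated by evaluating the theoretical MSE[m̂] of Equation (9) at the three special cases of the design. The parameter values here (p = 0.7, m = 0.05, π_y = 0.6, n = 1000) are illustrative assumptions, not values taken from Table 3.

```python
def mse_m_hat(m, pi_y, p, q, n):
    """Theoretical MSE[m_hat] from Equation (9) for a design (p, q)."""
    denom = 1 - 2 * q - 2 * pi_y * (1 - p - q)
    p_y0 = m * denom + q + pi_y * (1 - p - q)   # Equation (7) with pi_x = 0
    return p_y0 * (1 - p_y0) / (n * denom ** 2)

# Illustrative comparison of the three designs at p = 0.7
m, pi_y, n = 0.05, 0.6, 1000
greenberg = mse_m_hat(m, pi_y, 0.7, 0.0, n)    # Greenberg: q = 0
lovig = mse_m_hat(m, pi_y, 0.7, 0.15, n)       # Lovig: 0 < q < 1 - p
warner = mse_m_hat(m, pi_y, 0.7, 0.3, n)       # Warner: q = 1 - p
```

For these values the ordering Greenberg < Lovig < Warner emerges, consistent with the ranking described above for p > 0.5.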

5. Privacy of Mixture Model

We provide a brief overview of how privacy is measured in previous binary RRT models and then propose our new privacy measure.

5.1. Previous Work

Model efficiency is not the only performance basis by which researchers should design their studies; respondent privacy is just as important. Without response privacy, respondents may refuse to respond or provide an untruthful response.

5.1.1. Privacy Measure

Ref. [9] provided a measure of privacy loss as described below. Let:
$$\delta = \max\{\eta_1, \eta_2\}$$
where
$$\begin{aligned} \eta_1 &= P(S \mid Y) = \text{probability of being in the sensitive group given that the response is “Yes”}; \\ \eta_2 &= P(S \mid N) = \text{probability of being in the sensitive group given that the response is “No”}. \end{aligned}$$
Privacy protection PP, introduced by [10], is defined as
$$PP = \frac{1-\delta}{1-\pi_x}. \tag{13}$$

5.1.2. Unified Measure M

Ref. [3] proposed a unified measure of privacy and efficiency using the following metric:
$$\mathcal{M} = \frac{PP^{\,a}}{\mathrm{MSE}^{\,b}}, \tag{14}$$
where a and b are weights based on the importance the researcher places on privacy and efficiency, respectively. Ref. [3] assumed a = b = 1 , arguing equal importance to efficiency and privacy. We follow the same approach here.

5.1.3. Privacy of Proposed Model

The following estimators for privacy loss η 1 , η 2 are derived for the proposed mixture model accounting for measurement error from Section 3.
$$\begin{aligned} \eta_1 &= P(S \mid Y) = \frac{P(Y \cap S)}{P(Y)} = \frac{\pi_x\big[m + Ap(1-2m) + (1-2m)(1-p)\pi_y + (1-2m)(1-A-\pi_y)q\big]}{P_y} \\ \eta_2 &= P(S \mid N) = \frac{P(N \cap S)}{P(N)} = \frac{\pi_x\big[1 - q - \pi_y(1-p-q) - A(p-q) - m\big(1 - 2A(p-q) - 2q + 2\pi_y(p+q-1)\big)\big]}{1 - P_y} \end{aligned}$$
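These privacy-loss expressions, together with Equation (13), can be evaluated directly. The sketch below is our own encoding of the formulas with illustrative parameter values; a useful internal check is that η₁P(Y) + η₂P(N) = π_x by the law of total probability.

```python
def privacy_measures(pi_x, pi_y, p, q, A, m):
    """Privacy loss (eta1, eta2) for the proposed model and the resulting
    privacy protection PP = (1 - max(eta1, eta2)) / (1 - pi_x)."""
    p_y = ((A * pi_x * (p - q) + pi_y * (1 - p - q)) * (1 - 2 * m)
           + m * (1 - 2 * q) + q)                       # Equation (2)
    eta1 = pi_x * (m + A * p * (1 - 2 * m) + (1 - 2 * m) * (1 - p) * pi_y
                   + (1 - 2 * m) * (1 - A - pi_y) * q) / p_y
    eta2 = pi_x * (1 - q - pi_y * (1 - p - q) - A * (p - q)
                   - m * (1 - 2 * A * (p - q) - 2 * q
                          + 2 * pi_y * (p + q - 1))) / (1 - p_y)
    pp = (1 - max(eta1, eta2)) / (1 - pi_x)             # Equation (13)
    return eta1, eta2, pp
```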

5.1.4. Unified Measure M of Proposed Model

Table 4 reports a simulation study comparing the three models by efficiency, privacy protection, and the unified measure of [3]. Privacy is calculated using the traditional method as defined in (13). The unified measure M is defined in (14). Several key values are noted in bold.
This is the classical view of model privacy, but now, we pursue a new method of discussing privacy offered by binary RRT models. As expected, the P P and M match well with their empirical estimates even in the presence of measurement error. We observed the same conclusions as in [3] for efficiency, privacy protection, and M in the presence of m: the Greenberg model is best in terms of efficiency; the Warner model is best in terms of privacy; the Lovig model is best in terms of M .

5.2. Proposed Measure of Privacy for Binary RRT Models

5.2.1. Description

The traditional method of privacy introduced by [9] can be difficult to interpret. We propose using the odds ratio as a measure of the predictability of participants’ true responses from recorded model responses. The odds ratio (OR) is a statistical measure commonly used in binary logistic regression to quantify the association between two binary variables.
The odds ratio quantifies the change in the odds of the sensitive trait as the reported response changes from a 0 (“No”) to a 1 (“Yes”). Values greater than 1 indicate higher odds of the true response being “Yes”, suggesting better predictability. The odds ratio serves as a valuable metric to assess the predictability of the reported response data from the RRT model.
In our study, we hypothesize that a higher odds ratio corresponds to a lower level of privacy protection offered by the RRT model. Conversely, a lower odds ratio suggests a lower level of predictability and efficiency, but provides greater respondent privacy. We will investigate how different factors, such as the introduction of measurement error and trust in the chosen RRT model, affect the odds ratio.

5.2.2. Privacy Measure Simulation Results

In Table 5, we simulate the odds ratio in all three models accounting for measurement error.
Table 5 provides several insights into the effect of privacy offered by the three models accounting for measurement error. We observed the same conclusions as in [3] for privacy with this new approach: the Warner model offers the best privacy; the Greenberg model offers the least privacy; the Lovig model offers moderate to high privacy.
The odds ratio offers greater interpretability for a choice of p, q compared to PP from (13). For example, consider a mixture model with p = 0.7, q = 0.15, A = 0.95, and m = 0.05. The estimated odds ratio for this model would be interpreted as follows: the odds of the true answer being “Yes” are 11.63 times greater when the reported answer is “Yes” than when the reported answer is “No”. This contrasts with the traditional definition of privacy protection PP, which is a conditional probability that may only be interpreted using relative levels between models. While the traditional definition of privacy protection is useful for defining the unified measure M, the logistic regression coefficients hold interpretation value. This new definition of privacy requires no adjustments to the unified measure’s definition.
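As a concrete illustration of this measure, the OR for paired true and recorded binary responses can be computed from the 2×2 table of counts; with a single binary predictor, this cross-product ratio coincides with the exponentiated slope of the binary logistic regression described above. The function and data below are our own illustrative sketch, not the authors' code.

```python
def odds_ratio(true_resp, recorded_resp):
    """Sample odds ratio between true and recorded binary responses,
    computed as the cross-product ratio (a*d)/(b*c) of the 2x2 table.
    This equals exp(beta1) from a logistic regression of the true
    response on the recorded response."""
    a = sum(1 for t, r in zip(true_resp, recorded_resp) if t == 1 and r == 1)
    b = sum(1 for t, r in zip(true_resp, recorded_resp) if t == 1 and r == 0)
    c = sum(1 for t, r in zip(true_resp, recorded_resp) if t == 0 and r == 1)
    d = sum(1 for t, r in zip(true_resp, recorded_resp) if t == 0 and r == 0)
    return (a * d) / (b * c)
```

A higher value indicates that the recorded response is more predictive of the true response, i.e., lower privacy, matching the interpretation used in Table 5.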

6. Discussion

The researcher’s choice of the ideal model must take into account measurement error and the unified measure. It has been shown that the effect of measurement error on both model efficiency and privacy is significant. To account for this, the unified measure M accounts for measurement error since it uses the updated estimators in this paper. Since MSE [ m ^ ] is in the equation for MSE [ π x ^ ] , M discounts the performance for choices of p , q where MSE [ m ^ ] increases. Therefore, we recommend utilizing M for designing a binary RRT study accounting for measurement error.
After fixing the sample size and preliminary question parameters, for all reasonable possibilities of untruthfulness, measurement error, and the sensitive trait, the Lovig model for measurement error (with parameters p, q sufficiently far from 0.5) optimizes the unified measure M. This is preserved when the ratio between privacy protection and efficiency is parameterized using a, b. There are two regions of p, q that locally optimize M: p ∈ (0.7, 0.8) with q ∈ (0.05, 0.15), as demonstrated in Table 4, and p ∈ (0.1, 0.3) with q ∈ (0.7, 0.8). The former was chosen in the tables because its unified measure is always greater than the latter. Unified measure values for these cases become closer when a > b (indicating prioritization of privacy protection above efficiency) or under higher rates of untruthfulness and measurement error. All options reduce the overall M values. Researchers should, therefore, choose their parameters in this region so that M is maximized regardless of untruthfulness, measurement error, or sensitive trait levels.
The Lovig mixture model for measurement error outperforms both the Greenberg and the Warner models in terms of the MSE and unified measure in most cases in the presence of untruthfulness and measurement error. The researcher must choose model parameters that optimize the unified measure while ensuring a high rate of participant cooperation. To this point, [3] noted that the choice from three questions in the mixture model helps improve the respondent cooperation as compared to when they have a choice of two questions, as in the Warner and Greenberg models.

7. Concluding Remarks

The proposed mixture model is the recommended RRT model for collecting sensitive data with the best performance and flexibility for privacy and efficiency compared to existing models. This model accounts for measurement error and untruthful responses using secondary questions to estimate their prevalence. The choice of model parameters significantly impacts both privacy and efficiency. A new logistic regression method is proposed to compare the models comprehensively concerning privacy. The adaptability for conditions of untruthfulness and measurement error allows researchers to choose the most suitable model for their specific needs.
While this paper has outlined a promising design for addressing measurement error within binary RRT models and introduced an innovative privacy measure, several considerations warrant acknowledgment. The real-world implementation of the outlined methods might encounter challenges, particularly with the proposed three-question design. The complexity of this design necessitates a more extensive explanation to participants. This may lead to increased abandonment when combined with sensitive questions, impacting data collection and accuracy. Future research and practical application should address these concerns to ensure the effectiveness and feasibility of these proposed procedures in diverse settings.
This study leverages the comprehensive mixture model of [3] to account for the effects of measurement error. The impact of measurement error on efficiency and privacy had not previously been discussed in the published literature on binary RRT models. This development should improve the efficacy of future studies that make use of binary RRT models.

Author Contributions

Conceptualization, B.M., V.P., S.G. and S.K.; Methodology, B.M., V.P., S.G. and S.K.; Software, B.M. and V.P.; Formal analysis, B.M., V.P. and S.G.; Resources, S.G.; Writing—original draft, B.M.; Writing—review & editing, B.M., V.P., S.G. and S.K.; Visualization, B.M. and V.P.; Supervision, S.G. and S.K.; Project administration, S.G.; Funding acquisition, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Science Foundation under grant No. DMS-2244160.

Data Availability Statement

The numerical data analyzed in this study are generated from the software available at https://github.com/BaileyMeche/MeasurementError_RRT. No external datasets were used.

Acknowledgments

The authors would like to express their deep appreciation to the reviewers for their careful reading of the initial submission and helpful comments, which helped to improve the presentation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69. [Google Scholar] [CrossRef]
  2. Greenberg, B.G.; Abul-Ela, A.L.A.; Simmons, W.R.; Horvitz, D.G. The unrelated question randomized response model: Theoretical framework. J. Am. Stat. Assoc. 1969, 64, 520–539. [Google Scholar] [CrossRef]
  3. Lovig, M.; Khalil, S.; Rahman, S.; Sapra, P.; Gupta, S. A mixture binary RRT model with a unified measure of privacy and efficiency. Commun. Stat.-Simul. Comput. 2023, 52, 2727–2737. [Google Scholar] [CrossRef]
  4. Kumar, S.; Kour, S.P. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J. Stat. Comput. Simul. 2022, 92, 3583–3604. [Google Scholar] [CrossRef]
  5. Tiwari, K.K.; Bhougal, S.; Kumar, S.; Rather, K.U.I. Using Randomized Response to Estimate the Population Mean of a Sensitive Variable under the Influence of Measurement Error. J. Stat. Theory Pract. 2022, 16, 28. [Google Scholar] [CrossRef]
  6. McCance, W.; Gupta, S.; Khalil, S.; Shou, W. Binary Randomized Response Technique (RRT) Models Under Measurement Error. Commun. Stat. Simul. Comput. 2024, 1–8. [Google Scholar] [CrossRef]
  7. Young, A.; Gupta, S.; Parks, R. A binary unrelated-question RRT model accounting for untruthful responding. Involve J. Math. 2019, 12, 1163–1173. [Google Scholar] [CrossRef]
  8. Gupta, S.; Mehta, S.; Shabbir, J.; Khalil, S. A unified measure of respondent privacy and model efficiency in quantitative RRT models. J. Stat. Theory Pract. 2018, 12, 506–511. [Google Scholar] [CrossRef]
  9. Lanke, J. On the degree of protection in randomized interviews. In International Statistical Review/Revue Internationale de Statistique; International Statistical Institute: The Hague, The Netherlands, 1976; Volume 44, No. 2; pp. 197–203. [Google Scholar]
  10. Fligner, M.A.; Policello, G.E.; Singh, J. A comparison of two randomized response survey methods with consideration for the level of respondent protection. Commun. Stat. Theory Methods 1977, 6, 1511–1524. [Google Scholar] [CrossRef]
Figure 1. Mixture binary RRT model accounting for untruthfulness by [3].
Figure 2. Accounting for measurement error in mixture model by [3].
Figure 3. Impact of varying q on MSE [ m ^ ] with n = 500 and π y = 1 12 . The values of q = 0 , 1 p represent the Greenberg and Warner models, respectively. The values of q ( 0 , 1 p ) represent the Lovig model. Top left: p = 0.7 , top right: p = 0.6 , bottom left: p = 0.4 , bottom right: p = 0.3 .
Table 1. Estimates π ^ x u and MSE [ π ^ x u ] ^ when measurement error is not accounted for. Results aggregated over 10 , 000 trials with n = 500 , p = 0.7 , q = 0.15 , π x = 0.4 , and π y = 1 12 .
| p | q | m | A | Â | π_x | π̂_xu | MSE[π̂_xu] | Est. MSE[π̂_xu] |
|---|---|---|---|---|---|---|---|---|
| 0.7 | 0.15 | 0.00 | 1.00 | 1.0001 | 0.4 | 0.3998 | 0.0017 | 0.0016 |
| 0.7 | 0.15 | 0.00 | 0.95 | 0.9504 | 0.4 | 0.3991 | 0.0019 | 0.0017 |
| 0.7 | 0.15 | 0.00 | 0.90 | 0.8999 | 0.4 | 0.4002 | 0.0021 | 0.0019 |
| 0.7 | 0.15 | 0.01 | 1.00 | 1.0001 | 0.4 | 0.4041 | 0.0017 | 0.0016 |
| 0.7 | 0.15 | 0.01 | 0.95 | 0.9503 | 0.4 | 0.4048 | 0.0019 | 0.0017 |
| 0.7 | 0.15 | 0.01 | 0.90 | 0.8999 | 0.4 | 0.4061 | 0.0021 | 0.0019 |
| 0.7 | 0.15 | 0.05 | 1.00 | 1.0005 | 0.4 | 0.4208 | 0.0021 | 0.0020 |
| 0.7 | 0.15 | 0.05 | 0.95 | 0.9502 | 0.4 | 0.4240 | 0.0025 | 0.0023 |
| 0.7 | 0.15 | 0.05 | 0.90 | 0.8996 | 0.4 | 0.4284 | 0.0029 | 0.0027 |
| 0.7 | 0.15 | 0.10 | 1.00 | 0.9999 | 0.4 | 0.4432 | 0.0035 | 0.0035 |
| 0.7 | 0.15 | 0.10 | 0.95 | 0.9502 | 0.4 | 0.4489 | 0.0043 | 0.0042 |
| 0.7 | 0.15 | 0.10 | 0.90 | 0.9002 | 0.4 | 0.4563 | 0.0052 | 0.0051 |
Table 2. Estimates π ^ x and MSE [ π ^ x ] ^ when we account for measurement error. Results aggregated over 10 , 000 trials, with n = 500 , p = 0.7 , π x = 0.4 , and π y = 1 12 .
| q | m | m̂ | A | Â | π_x | π̂_x | MSE[π̂_x] | Est. MSE[π̂_x] |
|---|---|---|---|---|---|---|---|---|
| **Greenberg model** | | | | | | | | |
| 0.00 | 0.01 | 0.0099 | 1.00 | 0.9997 | 0.4 | 0.4001 | 0.0010 | 0.0009 |
| 0.00 | 0.01 | 0.0101 | 0.95 | 0.9503 | 0.4 | 0.3997 | 0.0011 | 0.0010 |
| 0.00 | 0.01 | 0.0101 | 0.90 | 0.9002 | 0.4 | 0.3996 | 0.0012 | 0.0011 |
| 0.00 | 0.05 | 0.0499 | 1.00 | 1.0002 | 0.4 | 0.3997 | 0.0012 | 0.0011 |
| 0.00 | 0.05 | 0.0501 | 0.95 | 0.9497 | 0.4 | 0.4000 | 0.0014 | 0.0012 |
| 0.00 | 0.05 | 0.0502 | 0.90 | 0.9001 | 0.4 | 0.4005 | 0.0015 | 0.0013 |
| 0.00 | 0.10 | 0.1003 | 1.00 | 0.9995 | 0.4 | 0.3994 | 0.0016 | 0.0014 |
| 0.00 | 0.10 | 0.0999 | 0.95 | 0.9499 | 0.4 | 0.4001 | 0.0017 | 0.0016 |
| 0.00 | 0.10 | 0.0999 | 0.90 | 0.9002 | 0.4 | 0.3992 | 0.0019 | 0.0017 |
| **Lovig model** | | | | | | | | |
| 0.15 | 0.01 | 0.0100 | 1.00 | 1.0000 | 0.4 | 0.3996 | 0.0018 | 0.0016 |
| 0.15 | 0.01 | 0.0103 | 0.95 | 0.9500 | 0.4 | 0.3995 | 0.0019 | 0.0018 |
| 0.15 | 0.01 | 0.0102 | 0.90 | 0.8995 | 0.4 | 0.3993 | 0.0021 | 0.0020 |
| 0.15 | 0.05 | 0.0501 | 1.00 | 0.9993 | 0.4 | 0.4003 | 0.0021 | 0.0019 |
| 0.15 | 0.05 | 0.0499 | 0.95 | 0.9501 | 0.4 | 0.3995 | 0.0023 | 0.0021 |
| 0.15 | 0.05 | 0.0502 | 0.90 | 0.9002 | 0.4 | 0.3996 | 0.0025 | 0.0023 |
| 0.15 | 0.10 | 0.0996 | 1.00 | 0.9998 | 0.4 | 0.4014 | 0.0026 | 0.0025 |
| 0.15 | 0.10 | 0.1001 | 0.95 | 0.9499 | 0.4 | 0.3997 | 0.0029 | 0.0028 |
| 0.15 | 0.10 | 0.0996 | 0.90 | 0.8999 | 0.4 | 0.4004 | 0.0032 | 0.0030 |
| **Warner model** | | | | | | | | |
| 0.3 | 0.01 | 0.0104 | 1.00 | 1.0000 | 0.4 | 0.3993 | 0.0034 | 0.0032 |
| 0.3 | 0.01 | 0.0107 | 0.95 | 0.9500 | 0.4 | 0.4012 | 0.0037 | 0.0036 |
| 0.3 | 0.01 | 0.0099 | 0.90 | 0.9002 | 0.4 | 0.3987 | 0.0042 | 0.0040 |
| 0.3 | 0.05 | 0.0520 | 1.00 | 1.0003 | 0.4 | 0.4001 | 0.0040 | 0.0038 |
| 0.3 | 0.05 | 0.0510 | 0.95 | 0.9498 | 0.4 | 0.4006 | 0.0044 | 0.0042 |
| 0.3 | 0.05 | 0.0507 | 0.90 | 0.8998 | 0.4 | 0.4006 | 0.0049 | 0.0046 |
| 0.3 | 0.10 | 0.0995 | 1.00 | 1.0003 | 0.4 | 0.3998 | 0.0050 | 0.0048 |
| 0.3 | 0.10 | 0.1001 | 0.95 | 0.9503 | 0.4 | 0.3996 | 0.0055 | 0.0054 |
| 0.3 | 0.10 | 0.1000 | 0.90 | 0.8996 | 0.4 | 0.3992 | 0.0062 | 0.0060 |
Table 3. Theoretical and estimated values of MSE of measurement error estimators in Section 3.3.2. Estimates aggregated over 10 , 000 trials, with n = 500 , π x = 0.4 , p = 0.7 , and π y = 1 12 .
| p | q | m | m̂ | MSE[m̂] | Est. MSE[m̂] |
|---|---|---|---|---|---|
| **Greenberg model** | | | | | |
| 0.7 | 0.00 | 0.01 | 0.0099 | 0.0001 | 0.0001 |
| 0.7 | 0.00 | 0.05 | 0.0499 | 0.0001 | 0.0001 |
| 0.7 | 0.00 | 0.10 | 0.1003 | 0.0002 | 0.0002 |
| **Lovig model** | | | | | |
| 0.7 | 0.15 | 0.01 | 0.0100 | 0.0006 | 0.0006 |
| 0.7 | 0.15 | 0.05 | 0.0501 | 0.0007 | 0.0007 |
| 0.7 | 0.15 | 0.10 | 0.0996 | 0.0008 | 0.0008 |
| **Warner model** | | | | | |
| 0.7 | 0.3 | 0.01 | 0.0104 | 0.0026 | 0.0027 |
| 0.7 | 0.3 | 0.05 | 0.0520 | 0.0027 | 0.0028 |
| 0.7 | 0.3 | 0.10 | 0.0995 | 0.0028 | 0.0027 |
Table 4. A comparison of theoretical P P and M between the three binary RRT models. Calculated with n = 500 , p = 0.7 , π x = 0.4 , and π y = 1 12 .
| q | π_x | A | m | MSE[π̂_x] | Est. MSE[π̂_x] | PP | Est. PP | M | Est. M |
|---|---|---|---|---|---|---|---|---|---|
| **Greenberg model** | | | | | | | | | |
| 0.00 | 0.4 | 1.00 | 0.01 | 0.0010 | 0.0009 | 0.1117 | 0.1119 | 107.6471 | 123.2866 |
| 0.00 | 0.4 | 0.95 | 0.01 | 0.0011 | 0.0010 | 0.1169 | 0.1168 | 103.0169 | 120.1149 |
| 0.00 | 0.4 | 0.90 | 0.01 | 0.0012 | 0.0011 | 0.1226 | 0.1227 | 98.5821 | 113.0005 |
| 0.00 | 0.4 | 1.00 | 0.05 | 0.0012 | 0.0011 | 0.2234 | 0.2234 | 180.9270 | 199.7414 |
| 0.00 | 0.4 | 0.95 | 0.05 | 0.0014 | 0.0012 | 0.2324 | 0.2320 | 171.7727 | 194.7648 |
| 0.00 | 0.4 | 0.90 | 0.05 | 0.0015 | 0.0013 | 0.2422 | 0.2436 | 162.8959 | 189.5389 |
| 0.00 | 0.4 | 1.00 | 0.10 | 0.0016 | 0.0014 | 0.3488 | 0.3494 | 222.2470 | 245.2008 |
| 0.00 | 0.4 | 0.95 | 0.10 | 0.0017 | 0.0016 | 0.3606 | 0.3604 | 209.1361 | 224.7503 |
| 0.00 | 0.4 | 0.90 | 0.10 | 0.0019 | 0.0017 | 0.3731 | 0.3729 | 196.3539 | 220.4197 |
| **Lovig model** | | | | | | | | | |
| 0.15 | 0.4 | 1.00 | 0.01 | 0.0018 | 0.0016 | 0.4398 | 0.4395 | 249.8652 | 271.3308 |
| 0.15 | 0.4 | 0.95 | 0.01 | 0.0019 | 0.0018 | 0.4525 | 0.4516 | 233.1124 | 253.8097 |
| 0.15 | 0.4 | 0.90 | 0.01 | 0.0021 | 0.0020 | 0.4659 | 0.4645 | 216.8621 | 235.4783 |
| 0.15 | 0.4 | 1.00 | 0.05 | 0.0021 | 0.0019 | 0.4978 | 0.4975 | 239.3552 | 257.3392 |
| 0.15 | 0.4 | 0.95 | 0.05 | 0.0023 | 0.0021 | 0.5106 | 0.5111 | 222.4784 | 242.4145 |
| 0.15 | 0.4 | 0.90 | 0.05 | 0.0025 | 0.0023 | 0.5241 | 0.5225 | 206.1068 | 223.8847 |
| 0.15 | 0.4 | 1.00 | 0.10 | 0.0026 | 0.0025 | 0.5665 | 0.5670 | 216.0764 | 228.5641 |
| 0.15 | 0.4 | 0.95 | 0.10 | 0.0029 | 0.0028 | 0.5791 | 0.5792 | 199.9674 | 208.7430 |
| 0.15 | 0.4 | 0.90 | 0.10 | 0.0032 | 0.0030 | 0.5922 | 0.5923 | 184.3541 | 195.9438 |
| **Warner model** | | | | | | | | | |
| 0.3 | 0.4 | 1.00 | 0.01 | 0.0034 | 0.0032 | 0.6597 | 0.6596 | 195.7034 | 204.8066 |
| 0.3 | 0.4 | 0.95 | 0.01 | 0.0037 | 0.0036 | 0.6711 | 0.6727 | 179.7029 | 186.7708 |
| 0.3 | 0.4 | 0.90 | 0.01 | 0.0042 | 0.0040 | 0.6830 | 0.6817 | 164.3027 | 171.1752 |
| 0.3 | 0.4 | 1.00 | 0.05 | 0.0040 | 0.0038 | 0.6897 | 0.6898 | 173.4890 | 179.6031 |
| 0.3 | 0.4 | 0.95 | 0.05 | 0.0044 | 0.0042 | 0.7005 | 0.7016 | 159.0592 | 167.1141 |
| 0.3 | 0.4 | 0.90 | 0.05 | 0.0049 | 0.0046 | 0.7117 | 0.7139 | 145.1756 | 153.5604 |
| 0.3 | 0.4 | 1.00 | 0.10 | 0.0050 | 0.0048 | 0.7265 | 0.7270 | 145.2988 | 150.4105 |
| 0.3 | 0.4 | 0.95 | 0.10 | 0.0055 | 0.0054 | 0.7366 | 0.7351 | 132.9618 | 135.4669 |
| 0.3 | 0.4 | 0.90 | 0.10 | 0.0062 | 0.0060 | 0.7469 | 0.7456 | 121.0998 | 125.1110 |
Table 5. Estimates averaged over 10,000 simulations for the Lovig mixture model for measurement error in Section 3 with n = 500 , π x = 0.4 , π y = 1 12 , and p = 0.7 .
| q | p + q | A | m | OR |
|---|---|---|---|---|
| **Greenberg model** | | | | |
| 0 | 0.7 | 1.00 | 0.00 | 124.93 |
| 0 | 0.7 | 1.00 | 0.05 | 124.62 |
| 0 | 0.7 | 1.00 | 0.10 | 124.62 |
| 0 | 0.7 | 0.95 | 0.00 | 105.53 |
| 0 | 0.7 | 0.95 | 0.05 | 105.74 |
| 0 | 0.7 | 0.95 | 0.10 | 105.74 |
| 0 | 0.7 | 0.90 | 0.00 | 89.93 |
| 0 | 0.7 | 0.90 | 0.05 | 90.05 |
| 0 | 0.7 | 0.90 | 0.10 | 90.05 |
| **Lovig model** | | | | |
| 0.15 | 0.85 | 1.00 | 0.00 | 13.26 |
| 0.15 | 0.85 | 1.00 | 0.05 | 13.27 |
| 0.15 | 0.85 | 1.00 | 0.10 | 13.27 |
| 0.15 | 0.85 | 0.95 | 0.00 | 11.61 |
| 0.15 | 0.85 | 0.95 | 0.05 | 11.63 |
| 0.15 | 0.85 | 0.95 | 0.10 | 11.63 |
| 0.15 | 0.85 | 0.90 | 0.00 | 10.24 |
| 0.15 | 0.85 | 0.90 | 0.05 | 10.25 |
| 0.15 | 0.85 | 0.90 | 0.10 | 10.25 |
| **Warner model** | | | | |
| 0.3 | 1.00 | 1.00 | 0.00 | 5.60 |
| 0.3 | 1.00 | 1.00 | 0.05 | 5.60 |
| 0.3 | 1.00 | 1.00 | 0.10 | 5.60 |
| 0.3 | 1.00 | 0.95 | 0.00 | 5.09 |
| 0.3 | 1.00 | 0.95 | 0.05 | 5.11 |
| 0.3 | 1.00 | 0.95 | 0.10 | 5.11 |
| 0.3 | 1.00 | 0.90 | 0.00 | 4.65 |
| 0.3 | 1.00 | 0.90 | 0.05 | 4.66 |
| 0.3 | 1.00 | 0.90 | 0.10 | 4.66 |
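The odds-ratio privacy measure reported in Table 5 can be illustrated with a minimal sketch. For a single binary predictor, the odds ratio from a fitted logistic regression of the reported response on the true status equals the sample cross-product ratio of the corresponding 2×2 table, so no regression library is needed. The response mechanism below is an assumed reconstruction of the mixture design (with probability p the sensitive question, with probability q its negation, with probability 1 − p − q an unrelated question), with truthful respondents and no measurement error; the resulting values depend on these assumptions and are not intended to reproduce Table 5 exactly.

```python
# Hedged sketch: odds ratio between true status X and reported answer Y
# under an assumed mixture response mechanism. A larger OR means the
# answer reveals more about the true status, i.e., less privacy.
import random

random.seed(7)

PI_X, PI_Y = 0.4, 1 / 12

def respond(x: bool, p: float, q: float) -> bool:
    """Reported answer under the assumed mixture mechanism."""
    u = random.random()
    if u < p:
        return x                    # sensitive question
    if u < p + q:
        return not x                # Warner branch: negation
    return random.random() < PI_Y   # Greenberg branch: unrelated question

def odds_ratio(p: float, q: float, n: int = 200_000) -> float:
    """Sample odds ratio; equals the logistic-regression OR for one binary predictor."""
    counts = [[0, 0], [0, 0]]       # counts[x][y]
    for _ in range(n):
        x = random.random() < PI_X
        y = respond(x, p, q)
        counts[x][y] += 1
    return (counts[1][1] * counts[0][0]) / (counts[1][0] * counts[0][1])

# Greenberg (q = 0) is the least private design here and Warner
# (q = 1 - p) the most private, matching the ordering in Table 5.
for q in (0.0, 0.15, 0.30):
    print(f"q = {q:.2f}: OR = {odds_ratio(0.7, q):.2f}")
```

An OR of 1 would mean the reported answer carries no information about the true status (perfect privacy), which makes this measure directly interpretable, unlike distance-based privacy measures.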

Share and Cite

MDPI and ACS Style

Meche, B.; Poruri, V.; Gupta, S.; Khalil, S. Accounting for Measurement Error and Untruthfulness in Binary RRT Models. Mathematics 2024, 12, 875. https://doi.org/10.3390/math12060875

