Article

Generalized Linear Models with Covariate Measurement Error and Zero-Inflated Surrogates

1 Division of Public Health Sciences, Fred Hutchinson Cancer Center, P.O. Box 19024, Seattle, WA 98109-1024, USA
2 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, P.O. Box 19024, Seattle, WA 98109-1024, USA
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(2), 309; https://doi.org/10.3390/math12020309
Submission received: 30 November 2023 / Revised: 5 January 2024 / Accepted: 15 January 2024 / Published: 17 January 2024
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)

Abstract

Epidemiological studies often encounter a challenge due to exposure measurement error when estimating an exposure–disease association. A surrogate variable may be available for the true unobserved exposure variable. However, zero-inflated data are encountered frequently in the surrogate variables. For example, many nutrient or physical activity measures may have a zero value (or a low detectable value) among a group of individuals. In this paper, we investigate regression analysis when the observed surrogates may have zero values among some individuals of the whole study cohort. A naive regression calibration without taking into account a probability mass of the surrogate variable at 0 (or a low detectable value) will be biased. We developed a regression calibration estimator which typically can have smaller biases than the naive regression calibration estimator. We propose an expected estimating equation estimator which is consistent under the zero-inflated surrogate regression model. Extensive simulations show that the proposed estimator performs well in terms of bias correction. These methods are applied to a physical activity intervention study.
MSC:
62E20; 62F10; 62J12

1. Introduction

In biomedical research, regression analysis is an important tool for understanding associations between disease outcomes and risk factors. In practice, however, a risk factor may not be measured precisely. This problem is often called covariate measurement error [1,2,3]. Consider an example in which a biomarker is a risk factor for a disease outcome. The biomarker may have seasonal, daily, or even hourly variation, and a single measurement is prone to measurement error from instrumentation or human error. Hence, an average of an infinite number of biomarker measurements during a specified period of time is a more meaningful covariate than the average of a few observed measurements. However, it is not feasible to make such measurements, and thus studies often rely on single measures at a specific time point with associated measurement error.
Physical activity and nutrient intake are important risk factors for disease incidence and mortality. However, physical activity and nutrient intake data may be measured with error since they are generally self-reported. This issue is important since measurement error in diet or physical activity may attenuate the regression coefficients of exposures by approximately 20% to 50% [4,5,6]. That is, an odds ratio of 1.5 for diet or physical activity may be reduced to the range of 1.22 to 1.38 due to measurement error in these measures. In addition, an important challenge in this research is that some physical activity or dietary data may have a zero value, such as 0 metabolic equivalent (MET) hours per week from moderate or vigorous physical activity or 0 alcohol intake. One MET is defined as the amount of oxygen consumed while at rest per kilogram of body weight [7]. A 3 MET activity expends three times the energy used by the body at rest. Hence, if a person does a 3 MET activity for 4 h in a week, he or she has done 12 MET hours of physical activity in that week. A naive method that does not take measurement error into account may lead to biased effect estimation in regression analysis, and the bias is attenuation in most (but not all) cases [8]. A standard bias correction for measurement error that does not account for the subset of individuals with a zero exposure value may also yield biased effect estimates.
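As a quick check of the arithmetic in this paragraph, the sketch below shrinks a log odds ratio by 20% and by 50% and converts back to the odds ratio scale, and it reproduces the MET-hours example. The function name `attenuated_odds_ratio` is illustrative only and does not come from the paper.

```python
import math


def attenuated_odds_ratio(true_or: float, attenuation: float) -> float:
    """Odds ratio obtained when the log odds ratio is shrunk by `attenuation`.

    `attenuation` is a fraction, e.g. 0.2 for a 20% attenuation of the
    regression coefficient (the log odds ratio).
    """
    return math.exp((1.0 - attenuation) * math.log(true_or))


# A true odds ratio of 1.5 under 50% and 20% attenuation of the coefficient.
low = attenuated_odds_ratio(1.5, 0.50)
high = attenuated_odds_ratio(1.5, 0.20)
print(f"attenuated odds ratio range: {low:.2f} to {high:.2f}")

# MET-hours: intensity (in METs) multiplied by duration (in hours).
met_hours = 3 * 4
print(f"3 MET activity for 4 h = {met_hours} MET-hours")
```

The attenuation is applied on the log scale because the regression coefficient in a logistic model is the log odds ratio.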
One motivating example of our methodology research is covariate measurement error associated with the measurement of physical activity in the APPEAL study (A Program Promoting Exercise and Active Lifestyles; APPEAL: Clinicaltrials.gov NCT00668161) [9]. APPEAL was a year-long randomized controlled trial of moderate-to-vigorous intensity exercise vs. control (no exercise) among 202 healthy, sedentary adults recruited between 2001 and 2004, primarily through physician practices, and randomized to an exercise program (n = 100) or a control group (n = 102). The trial was designed to test the effects of exercise on biomarkers of colon cancer and other physiologic and psychosocial outcomes. Numerous case-control and cohort studies have found an inverse association between physical activity and risk of colon cancer [10]. Physical activities are commonly quantified by determining the energy expenditure in kilocalories or by using the MET of the activity. A question of interest is whether physical activity, measured in MET-hours/week, is associated with C-reactive protein (CRP), a biomarker of inflammation; elevated levels of CRP are associated with the risk of developing colon cancer. The true average of MET-hours/week is an unobserved variable that is the average of an infinite number of MET-hours/week scores. In practice, it is not possible to obtain this measure and, thus, the true average of MET-hours/week scores cannot be observed.
In the motivating example given above, two methodological challenges are involved. The first challenge is regression analysis with covariate measurement error, which arises because physical activity (MET-hours/week) is measured with error. The observed error-prone variable is typically called a surrogate variable for the true but unobserved exposure. The second challenge is the zero-inflated surrogate model, because some individuals may have zero MET-hours/week. In some similar research examples, the zero-inflated surrogate issue is also called truncation of the observed surrogates. In our problem, the second challenge (zero-inflated surrogate modeling) is added to the first challenge (covariate measurement error). Methods for covariate measurement error have been well developed. For example, regression calibration (RC) replaces an error-prone covariate by its conditional expectation given the observed covariates [11]. In linear regression, the RC estimator is a consistent estimator for regression coefficients (Buonaccorsi, 2010, Chapter 5) [12]. However, for logistic and Cox regression, it is known that the RC estimator is not consistent (Carroll et al., 2006, Chapter 4) [2]. There is further research on refinement of RC for logistic and Cox regression [13,14]. Another general approximation approach for covariate measurement error is the simulation extrapolation (SIMEX) approach [15,16]. An advantage of SIMEX is that it is easy to implement. There are also methods to address the situation when the surrogate variables may be truncated (which is in general the same as zero-inflated surrogate modeling). Tooze et al. investigated a likelihood approach for repeated measures data with clumping at zero [17]. When the observed exposure variables are truncated by a lower limit, the combined effect of measurement error and truncation on the estimated disease–exposure association may not always be attenuation [18].
As discussed above, there is relatively limited research that addresses the issue of measurement error when some individuals may have a zero value (or lower limit) in the observed surrogates. The main objective of the paper is to develop and apply methods to adjust for measurement error in generalized linear models when the observed surrogates may be truncated at a low value (such as 0) among some individuals. The paper is organized as follows: In Section 2, we describe the statistical models for the problem of interest, and discuss the bias issue when we apply a naive RC estimator without taking into account the zero-inflated surrogates. In Section 3, we study a regression calibration estimator for this problem. In Section 4, we propose a maximum likelihood estimator via expected estimating equations for this problem. In Section 5, the results from simulation studies are presented. In Section 6, we apply the methods to the APPEAL study data. We discuss the advantages and limitations of the proposed EEE estimator in Section 7. Concluding remarks are given in Section 8.

2. Statistical Models and Naive RC Estimator

We assume that the total sample size of the study cohort is $n$. The regression model of interest is the generalized linear model. Let $Y_i$ be the response variable, $X_i$ be the unobserved true covariate (dietary intake or physical activity) that cannot be precisely measured, and $Z_i$ be the vector of covariates that is available for all individuals $i$, $i = 1, \ldots, n$. For simplicity of presentation, the true unobserved exposure $X$ is assumed to be a scalar throughout this paper. The main interest is to estimate the vector of regression coefficients $\beta \equiv (\beta_0, \beta_1, \beta_2^{T})^{T}$ in the following regression model:
$$E(Y_i \mid X_i, Z_i) = g(\beta_0 + \beta_1 X_i + \beta_2^{T} Z_i), \tag{1}$$
where $g(\cdot)$ is a specified function. Model (1) contains many important regression models. For example, $g(u) = u$ in linear regression, while $g(u) = (1 + e^{-u})^{-1}$ in logistic regression. The goal of the research is to develop valid estimation methods for the regression coefficients $\beta$. For the true unobserved covariate $X_i$, we assume that there are $k_i$ non-negative surrogate variables $W_{ij}$, $j = 1, \ldots, k_i$, such that $W_{ij} = \max(c, W^{*}_{ij})$, where $c$ is a detection limit and $W^{*}_{ij} = X_i + U_{ij}$, in which $U_{ij}$ is an independent measurement error with $E(U_{ij}) = 0$. Let $\eta_{ij}$ be the indicator function for a positive $W_{ij}$ value, that is, $\eta_{ij} = I[W_{ij} > c]$. In a covariate measurement error problem when the surrogates are not truncated, replicates $W^{*}_{ij}$, $j = 1, \ldots, k_i$, are used to estimate the measurement error variance, where $k_i$ is the number of replicates. We use notation $\tilde{W}_i$ for $(W_{i1}, \ldots, W_{ik_i})$, $\tilde{W}^{*}_i$ for $(W^{*}_{i1}, \ldots, W^{*}_{ik_i})$, and $\tilde{\eta}_i$ for $(\eta_{i1}, \ldots, \eta_{ik_i})$.
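The surrogate model in this paragraph can be sketched as a small data generator. The sketch assumes normally distributed exposure and measurement error (the model itself only requires a mean-zero error), and the function name `simulate_surrogates` and its arguments are hypothetical. It returns, for each individual, the latent exposure, the untruncated replicates, the observed replicates truncated at the detection limit, and the indicators.

```python
import random


def simulate_surrogates(n, k, mu_x, sigma_x, sigma_u, c=0.0, seed=0):
    """Simulate (x, untruncated replicates, observed replicates, indicators).

    Assumes X ~ N(mu_x, sigma_x^2) and U ~ N(0, sigma_u^2); both normality
    assumptions are illustrative choices, not requirements of the model.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        # Untruncated surrogates: true exposure plus independent error.
        w_star = [x + rng.gauss(0.0, sigma_u) for _ in range(k)]
        # Observed surrogates: values below the limit c are recorded as c.
        w = [max(c, ws) for ws in w_star]
        # Indicator of an observed value above the detection limit.
        eta = [1 if ws > c else 0 for ws in w_star]
        data.append((x, w_star, w, eta))
    return data


data = simulate_surrogates(n=20000, k=2, mu_x=1.5, sigma_x=1.0,
                           sigma_u=0.707, c=0.0, seed=1)
n_obs = sum(len(eta) for _, _, _, eta in data)
frac_nonzero = sum(sum(eta) for _, _, _, eta in data) / n_obs
print(f"fraction of surrogates above the limit: {frac_nonzero:.3f}")
```

The parameter values in the usage lines are those reported for the first simulation setting; any other values can be substituted.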
To understand the RC estimator, we consider a special linear regression case in which $Y_i = \beta_0 + \beta_1 X_i + e_i$, where $e_i$ is a mean-zero random residual term. Assume the untruncated surrogates satisfy $W^{*}_{ij} = X_i + U_{ij}$, $j = 1, \ldots, k$; then it is easily seen that $E(Y_i \mid \tilde{W}^{*}_i) = \beta_0 + \beta_1 E(X_i \mid \tilde{W}^{*}_i)$. From this argument, under the special linear regression case above, replacing an unobserved true $X_i$ with $E(X_i \mid \tilde{W}^{*}_i)$ will lead to a consistent estimator. This method is often called the RC estimator [2]. In this case, $E(X_i \mid \tilde{W}^{*}_i)$ is the calibration function. We may also use $E(X_i \mid \bar{W}^{*}_i)$, where $\bar{W}^{*}_i = \sum_{j=1}^{k} W^{*}_{ij}/k$, as the calibration function to replace the unobserved $X_i$. If replicates $W^{*}_{ij}$, $j = 1, \ldots, k_i$, are from a normal distribution, then $E(X_i \mid \bar{W}^{*}_i) = E(X_i \mid \tilde{W}^{*}_i)$ [14]. Let $\mu_x$ and $\sigma_x$ denote the mean and standard deviation of any random variable $X$, respectively. Under a bivariate normal assumption, the conditional expectation of the unobserved exposure given the surrogates is
$$E(X_i \mid \bar{W}^{*}_i) = \mu_x + \sigma_x^2 (\sigma_x^2 + \sigma_u^2/k)^{-1} (\bar{W}^{*}_i - \mu_x).$$
Therefore, if $E(Y_i \mid \bar{W}^{*}_i) = \beta_0^{*} + \beta_1^{*} \bar{W}^{*}_i$, then $\beta_1^{*} = \{\sigma_x^2 (\sigma_x^2 + \sigma_u^2/k)^{-1}\} \beta_1$. From this calculation, a naive estimator using $\bar{W}^{*}_i$ as a replacement for $X_i$ will have an attenuation effect. When $Z$ is in the model, a standard RC estimator replaces $X_i$ with $E(X_i \mid \bar{W}^{*}_i, Z_i)$. This can be done under a multivariate normal assumption with a conditional mean formula similar to the formula given above. However, a more practical approach is a semiparametric RC approach that assumes a working regression model $E(W^{*}_{ij} \mid W^{*}_{ij'}, Z_i) = \alpha_0 + \alpha_1 W^{*}_{ij'} + \alpha_2^{T} Z_i$, where $j \neq j' = 1, \ldots, k$, and $(\alpha_0, \alpha_1, \alpha_2)$ is the vector of regression coefficients. This semiparametric RC estimator does not require multivariate normality of the observed surrogates and covariates [19,20].
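The attenuation factor $\sigma_x^2/(\sigma_x^2 + \sigma_u^2/k)$ can be checked with a small simulation when the surrogates are not truncated. The sketch below is illustrative only: it assumes normal exposure and errors, no covariate $Z$, and it plugs the true $\mu_x$ and attenuation factor into the calibration function, whereas in practice these would be estimated from the replicates. The helper names `ols_slope` and `attenuation_demo` are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def attenuation_demo(n=50000, k=2, mu_x=1.5, sigma_x=1.0, sigma_u=0.707,
                     beta0=0.5, beta1=1.0, seed=0):
    """Return (attenuation factor, naive slope, RC slope) without truncation."""
    rng = random.Random(seed)
    lam = sigma_x ** 2 / (sigma_x ** 2 + sigma_u ** 2 / k)
    w_bar, x_hat, y = [], [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        wb = sum(x + rng.gauss(0.0, sigma_u) for _ in range(k)) / k
        w_bar.append(wb)
        # Calibration function E(X | mean of replicates) under normality,
        # using the true parameters (an assumption made for illustration).
        x_hat.append(mu_x + lam * (wb - mu_x))
        y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))
    return lam, ols_slope(w_bar, y), ols_slope(x_hat, y)


lam, naive_slope, rc_slope = attenuation_demo()
print(f"attenuation factor: {lam:.3f}")
print(f"naive slope: {naive_slope:.3f}, RC slope: {rc_slope:.3f}")
```

With these settings the attenuation factor is about 0.8, so the naive slope should sit near $0.8\,\beta_1$ while the calibrated slope should sit near $\beta_1$.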
However, in our problem, the observed $W_{ij}$ is different from the untruncated $W^{*}_{ij}$ if $W^{*}_{ij} < c$. Using the $W_{ij}$ data will likely overestimate $\mu_x$ but underestimate $\sigma_x$ and $\sigma_u$, since $W_{ij} = c$ if $W^{*}_{ij} < c$. For linear regression with truncated surrogates, standard RC may be biased because $E(X_i \mid \bar{W}_i)$ will be different from $E(X_i \mid \bar{W}^{*}_i)$. One naive approach is to use the observed $W_{ij}$ as $W^{*}_{ij}$, without taking into consideration the truncated surrogates, to calculate the RC estimator. We call this estimator a naive RC (NRC) estimator. As discussed above, the NRC estimator is biased even when the main regression model is linear. The asymptotic variance of the NRC estimator can be obtained by a sandwich variance estimator, where the vector of the estimating equations is obtained by stacking the estimating equations for $\beta$ and the nuisance parameters involved in the calculation of the calibration function $E(X_i \mid \tilde{W}^{*}_i, Z_i)$ (noting that the NRC estimator assumes $\tilde{W}_i$ is the same as $\tilde{W}^{*}_i$). However, if there are many covariates in the modeling of the calibration function, then it will be computationally easier to use bootstrap variance estimation to obtain the standard errors.
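The direction of these distortions can be illustrated numerically. The sketch below computes simple method-of-moments estimates of $\mu_x$, $\sigma_x^2$, and $\sigma_u^2$ from two replicates, first from the untruncated replicates and then from the replicates truncated at 0. The estimator `moment_estimates` is an illustrative choice and is not necessarily the one used by the authors; normal distributions and the parameter values are also assumptions.

```python
import random


def moment_estimates(pairs):
    """Method-of-moments estimates (mu_x, sigma_x^2, sigma_u^2) from two replicates.

    Uses E(W1 - W2)^2 = 2 * sigma_u^2 and Var(mean) = sigma_x^2 + sigma_u^2 / 2,
    which hold for untruncated replicates.
    """
    n = len(pairs)
    means = [(a + b) / 2.0 for a, b in pairs]
    mu = sum(means) / n
    var_mean = sum((m - mu) ** 2 for m in means) / (n - 1)
    sigma_u2 = sum((a - b) ** 2 for a, b in pairs) / (2.0 * n)
    sigma_x2 = var_mean - sigma_u2 / 2.0
    return mu, sigma_x2, sigma_u2


rng = random.Random(2024)
latent, observed = [], []
for _ in range(50000):
    x = rng.gauss(1.5, 1.0)
    a = x + rng.gauss(0.0, 0.707)
    b = x + rng.gauss(0.0, 0.707)
    latent.append((a, b))                          # untruncated replicates
    observed.append((max(0.0, a), max(0.0, b)))    # truncated at c = 0

mu_star, sx2_star, su2_star = moment_estimates(latent)
mu_obs, sx2_obs, su2_obs = moment_estimates(observed)
print(f"untruncated: mu={mu_star:.3f} sx2={sx2_star:.3f} su2={su2_star:.3f}")
print(f"truncated:   mu={mu_obs:.3f} sx2={sx2_obs:.3f} su2={su2_obs:.3f}")
```

Treating the truncated replicates as if they were untruncated is exactly what the naive RC estimator does when it computes its calibration function.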

3. Regression Calibration for Zero-Inflated Surrogates

The NRC estimator described in the previous section does not take into account zero values due to truncation. Now, we consider calibration based on truncated surrogates due to zero values. To understand the method, we first consider a linear regression model $Y_i = \beta_0 + \beta_1 X_i + \beta_2^{T} Z_i + e_i$, where $e_i$ has mean 0 and is independent of $X_i$ and $Z_i$. Then, $E(Y_i \mid \tilde{W}_i, Z_i) = \beta_0 + \beta_1 E(X_i \mid \tilde{W}_i, Z_i) + \beta_2^{T} Z_i$. That is, replacing $X_i$ with $E(X_i \mid \tilde{W}_i, Z_i)$ in the regression analysis may be a valid approach. Let $\hat{X}_i \equiv E(X_i \mid \tilde{W}_i, Z_i)$. The estimating equation for the RC estimator can be expressed as
$$\sum_{i=1}^{n} (1, \hat{X}_i, Z_i^{T})^{T} \{Y_i - (\beta_0 + \beta_1 \hat{X}_i + \beta_2^{T} Z_i)\} = 0. \tag{2}$$
Hence, when Y i given ( X i , Z i ) is linear, we have the following result:
Proposition 1.
Assume the surrogate variables $W_{ij}$, $j = 1, \ldots, k_i$, may be truncated by a lower limit, and the truncation indicator $\tilde{\eta}_i$ is independent of $Y_i$ given $(X_i, Z_i)$. If $Y_i = \beta_0 + \beta_1 X_i + \beta_2^{T} Z_i + e_i$, where $e_i$ has mean 0 and is independent of $X_i$ and $Z_i$, then the RC estimator solving (2) is a consistent estimator of $\beta$.
The proof of Proposition 1 is given in Appendix A. We note that because of the surrogate assumption, the measurement errors $U_{ij}$ and $e_i$ are independent, which is needed to ensure that estimating Equation (2) is unbiased. Hence, for linear regression with zero-inflated surrogates, the RC estimator is consistent. However, when the mean function of $Y_i$ given $(X_i, Z_i)$ is not linear, the RC estimator may be biased since the expectation of the estimating score will no longer be zero. For logistic regression, $\mathrm{pr}(Y_i = 1 \mid X_i, Z_i) = H(\beta_0 + \beta_1 X_i + \beta_2^{T} Z_i)$, where $H(u) = \{1 + \exp(-u)\}^{-1}$ is the logistic function. Although the RC estimator is not consistent in this case, it can be considered an improvement over the NRC estimator described in the last section. The calibration function can be calculated based on the likelihood function. We use notation $L(X)$ to denote a likelihood function for any random variable $X$, and $L(Y \mid X)$ to denote a conditional likelihood function of $Y$ given $X$, for any two random variables $X$ and $Y$. Generally, the conditional calibration function can be calculated by the following:
$$E\{X_i \mid \tilde{W}_i, Z_i\} = \frac{\int_x x \prod_j \{L(W^{*}_{ij} \mid X_i = x, Z_i)\}^{\eta_{ij}} \{L(W_{ij} = c \mid X_i = x, Z_i)\}^{1 - \eta_{ij}} L(Z_i, X_i = x)\, dx}{\int_x \prod_j \{L(W^{*}_{ij} \mid X_i = x, Z_i)\}^{\eta_{ij}} \{L(W_{ij} = c \mid X_i = x, Z_i)\}^{1 - \eta_{ij}} L(Z_i, X_i = x)\, dx}. \tag{3}$$
In (3), when $\eta_{ij} = 1$ the observed $W_{ij}$ equals the untruncated $W^{*}_{ij}$, and we note that $L(W_{ij} = c \mid X_i = x, Z_i) = L(U_{ij} \le c - x)$. From the argument given above, the RC estimator can be obtained by replacing an unobserved $X_i$ by $E\{X_i \mid \tilde{W}_i, Z_i\}$ based on (3). The asymptotic variance of the RC estimator can be obtained by a stacked sandwich estimator that is similar to the one for the NRC estimator described in the last section, or by bootstrap variance estimation.
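To make the calibration integral concrete, the sketch below evaluates it on a grid for the simplest case: no covariates $Z$, a normal exposure, and normal measurement error with known parameters. All of these are assumptions made for illustration, and `calibration_mean` is a hypothetical helper rather than the authors' implementation. A replicate above the limit contributes the error density, and a replicate at the limit contributes the probability that the error is at most $c - x$.

```python
from statistics import NormalDist


def calibration_mean(w, mu_x, sigma_x, sigma_u, c=0.0,
                     grid_points=4001, half_width=8.0):
    """E(X | observed replicates w) by a Riemann sum over a grid of x values."""
    x_dist = NormalDist(mu_x, sigma_x)
    u_dist = NormalDist(0.0, sigma_u)
    lo = mu_x - half_width * sigma_x
    step = 2.0 * half_width * sigma_x / (grid_points - 1)
    num = 0.0
    den = 0.0
    for g in range(grid_points):
        x = lo + g * step
        weight = x_dist.pdf(x)
        for w_j in w:
            if w_j > c:
                # Untruncated replicate: density of the error at w_j - x.
                weight *= u_dist.pdf(w_j - x)
            else:
                # Truncated replicate: probability that x + U <= c.
                weight *= u_dist.cdf(c - x)
        num += x * weight
        den += weight
    # The grid step cancels in the ratio of the two sums.
    return num / den


mu_x, sigma_x, sigma_u = 1.5, 1.0, 0.707
lam = sigma_x ** 2 / (sigma_x ** 2 + sigma_u ** 2 / 2)

# No truncation: the integral should match the closed-form normal result.
no_trunc = calibration_mean([2.5, 1.5], mu_x, sigma_x, sigma_u)
closed_form = mu_x + lam * (2.0 - mu_x)

# Both replicates at the limit: compare with the naive value that treats 0 as exact.
both_trunc = calibration_mean([0.0, 0.0], mu_x, sigma_x, sigma_u)
naive_value = mu_x + lam * (0.0 - mu_x)
print(no_trunc, closed_form, both_trunc, naive_value)
```

When both replicates sit at the limit, the calibrated value falls below the naive value, because the untruncated replicates are known only to be at or below 0 rather than exactly 0.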

4. Expected Estimating Equation Estimator

We now develop another approach to this problem via maximum likelihood (ML) estimation. We first take a different viewpoint linking ML estimation and the conditional expectation of the full data estimating equation, namely, the estimating equation when there is no measurement error. The full data likelihood, $L(Y_i \mid X_i, Z_i)$, is the likelihood function of $Y_i$ given $(X_i, Z_i)$. The full data estimating equation for $\beta$ can be expressed as $\sum_{i=1}^{n} \phi(Y_i, X_i, Z_i, \beta) = 0$, in which $\phi(Y_i, X_i, Z_i, \beta)$ is the derivative of $\log\{L(Y_i \mid X_i, Z_i)\}$ with respect to $\beta$. Because the true $X_i$ is not observed, the full data estimating equation cannot be directly applied to the data. With the observed data, the estimating score will be from the likelihood of $Y_i$ given $Z_i$ and $\tilde{W}_i$, denoted by $L(Y_i \mid \tilde{W}_i, Z_i)$. If the distribution of $(\tilde{W}_i, X_i, Z_i)$ does not involve $\beta$, then
$$\frac{\partial}{\partial \beta} \log L(Y_i \mid \tilde{W}_i, Z_i) = \frac{(\partial / \partial \beta) \int_x L(Y_i \mid X_i = x, Z_i)\, L(\tilde{W}_i \mid X_i = x, Z_i)\, L(X_i = x, Z_i)\, dx}{L(Y_i, \tilde{W}_i, Z_i)} = E\left\{ \frac{\partial}{\partial \beta} \log L(Y_i \mid X_i, Z_i) \,\Big|\, Y_i, \tilde{W}_i, Z_i \right\}.$$
From the equations given above, the likelihood-based score of the observed data can be obtained by the conditional expectation of the likelihood-based score of the full data given the observed data. That is, the estimating score for an individual can be expressed as $E\{\phi(Y_i, X_i, Z_i, \beta) \mid Y_i, \tilde{W}_i, Z_i\}$, which is the observed data estimating score. The ML estimator can be obtained from the idea of expected estimating equations [21]. Therefore, the ML estimator can be obtained by solving
$$\sum_{i=1}^{n} E\{\phi(Y_i, X_i, Z_i, \beta) \mid Y_i, \tilde{W}_i, Z_i\} = 0. \tag{4}$$
In general, $\phi(Y_i, X_i, Z_i, \beta)$ does not need to be the full data likelihood-based estimating score. It can be any estimating equation that satisfies $E\{\phi(Y_i, X_i, Z_i, \beta)\} = 0$. For example, it can be a weighted estimating equation of the ML estimator. The estimator solving (4) is the expected estimating equation (EEE) estimator for $\beta$. Let Equation (4) be denoted by $S(\beta, X, Z) = 0$. Let the EEE estimator be denoted by $\hat{\beta}_{eee}$. The asymptotic distribution of $\hat{\beta}_{eee}$ can be presented as the following result:
Proposition 2.
Assume $Y_i$ given $(X_i, Z_i)$ follows (1), the surrogate variables $W_{ij}$, $j = 1, \ldots, k_i$, may be truncated by a lower limit, and the truncation indicator $\tilde{\eta}_i$ is conditionally independent of $Y_i$ given $(X_i, Z_i)$. Assume $\phi(Y_i, X_i, Z_i, \beta)$ is any estimating equation that satisfies $E\{\phi(Y_i, X_i, Z_i, \beta)\} = 0$. The EEE estimator solving (4) is consistent for $\beta$. Furthermore, $n^{1/2}(\hat{\beta}_{eee} - \beta)$ is asymptotically normal with mean 0 and asymptotic variance given in Appendix A.
The proof of Proposition 2 is given in Appendix A. The EEE in (4) can be calculated by the following:
$$E\{\phi(Y_i, X_i, Z_i, \beta) \mid Y_i, \tilde{W}_i, Z_i\} = \frac{\int_x \phi(Y_i, x, Z_i, \beta)\, L(Y_i \mid X_i = x, Z_i) \left\{\prod_{j=1}^{k_i} L(W_{ij} \mid X_i = x, Z_i)\right\} L(Z_i, X_i = x)\, dx}{\int_x L(Y_i \mid X_i = x, Z_i) \left\{\prod_{j=1}^{k_i} L(W_{ij} \mid X_i = x, Z_i)\right\} L(Z_i, X_i = x)\, dx},$$
where $L(W_{ij} \mid X_i = x, Z_i) = \{L(W^{*}_{ij} \mid X_i = x, Z_i)\}^{\eta_{ij}} \{L(W_{ij} = c \mid X_i = x, Z_i)\}^{1 - \eta_{ij}}$, in which $L(W^{*}_{ij} \mid X_i = x, Z_i)$ is the likelihood of the untruncated surrogate evaluated at the observed value. The asymptotic variance of the EEE estimator solving (4) for $\beta$ can be obtained by a sandwich variance estimator. The vector of the estimating equations is obtained by stacking two sets of estimating equations. The first set is the estimating equations for $\beta$ and the second set is for the nuisance parameters involved in the conditional distribution of $Y_i$ given $(Z_i, \tilde{W}_i)$. Alternatively, bootstrap variance estimation can be used to obtain the standard errors of the EEE estimator.
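The conditional expectation that defines the EEE score can be sketched numerically for logistic regression without covariates $Z$. The sketch below is illustrative only: it assumes a normal exposure and normal measurement error with known nuisance parameters, uses the logistic score $(1, x)^{T}\{y - H(\beta_0 + \beta_1 x)\}$ as $\phi$, and the helper names `eee_score` and `mean_eee_score` are hypothetical. It does not solve the estimating equation; it only checks that the averaged score is close to zero at the data-generating $\beta$ and not at a wrong value.

```python
import math
import random
from statistics import NormalDist


def logistic(u):
    """Logistic function H(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def eee_score(y, w, beta, mu_x, sigma_x, sigma_u, c=0.0,
              grid_points=121, half_width=6.0):
    """E{(1, x)(y - H(b0 + b1 x)) | y, w} by a Riemann sum over x."""
    x_dist = NormalDist(mu_x, sigma_x)
    u_dist = NormalDist(0.0, sigma_u)
    lo = mu_x - half_width * sigma_x
    step = 2.0 * half_width * sigma_x / (grid_points - 1)
    num0 = num1 = den = 0.0
    for g in range(grid_points):
        x = lo + g * step
        p = logistic(beta[0] + beta[1] * x)
        # Likelihood of y given x, times the exposure density.
        weight = (p if y == 1 else 1.0 - p) * x_dist.pdf(x)
        for w_j in w:
            # Density for untruncated replicates, point mass for truncated ones.
            weight *= u_dist.pdf(w_j - x) if w_j > c else u_dist.cdf(c - x)
        resid = y - p
        num0 += resid * weight
        num1 += x * resid * weight
        den += weight
    return num0 / den, num1 / den


def mean_eee_score(beta, n=2000, seed=1, mu_x=1.5, sigma_x=1.0,
                   sigma_u=0.707, true_beta=(0.0, math.log(2.0))):
    """Average EEE score at `beta` over data simulated under `true_beta`."""
    rng = random.Random(seed)
    s0 = s1 = 0.0
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        y = 1 if rng.random() < logistic(true_beta[0] + true_beta[1] * x) else 0
        w = [max(0.0, x + rng.gauss(0.0, sigma_u)) for _ in range(2)]
        a, b = eee_score(y, w, beta, mu_x, sigma_x, sigma_u)
        s0 += a
        s1 += b
    return s0 / n, s1 / n


score_at_truth = mean_eee_score((0.0, math.log(2.0)))
score_at_null = mean_eee_score((0.0, 0.0))
print("mean score at the true beta:", score_at_truth)
print("mean score at beta = (0, 0):", score_at_null)
```

An actual EEE estimator would pass a function like `mean_eee_score` to a root-finding routine and would estimate the nuisance parameters from the replicates rather than fixing them at their true values.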

5. Simulation Study

We conducted a simulation study to compare the finite sample performance of the NRC, RC, and EEE estimators with that of the naive estimator that used $\bar{W}_i$ for $X_i$. In Table 1, we illustrate the situation when the regression model is linear and the observed surrogates may have a zero value among some individuals. That is, the observed surrogates were truncated at $c = 0$ in the simulations. In this table, each individual’s true covariate is $X_i$. We first generated $X_i$, $i = 1, \ldots, n$, from a normal distribution with $\mu_x = 1.5$ and $\sigma_x = 1$, where the sample size was $n = 500$ and $n = 1000$, respectively. We generated two replicates $W_{i1}$ and $W_{i2}$ for the unobserved $X_i$, with $\sigma_u = 0.707$. The percent of non-zero $W_{ij}$ was $\bar{\eta} = 89\%$; that is, 11% of the $W_{ij}$ were truncated at 0. We also considered the situations when $\sigma_u = 1$, $1.5$, and $\sqrt{3}$, respectively, in which the percentages of non-zero surrogates were $\bar{\eta} = 86\%$, 80%, and 77%, respectively. The outcomes were generated based on linear regression with coefficients $\beta_0 = 0.5$ and $\beta_1 = 1$, and the residuals were from a standard normal distribution. In Table 1, Table 2, Table 3 and Table 4, “bias” was obtained from the average of the biases of the regression coefficient estimates over the 500 simulation replicates, “SD” was the sample standard deviation of the estimates, and “ASE” was the average of the estimated standard errors of the estimates. The 95% confidence interval coverage probabilities (CP) were also obtained. The standard errors of the estimates were obtained from sandwich variance estimation. From the results of Table 1, the NRC estimator was not much better than the naive estimator. The limited improvement of the NRC over the naive estimator was due to the truncated $W$ values. The RC and EEE estimators were consistent with limited biases under this setting, and hence, they were better than the naive and NRC estimators. Under this setting, the RC and EEE estimators were very comparable.
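The reported percentages of non-zero surrogates can be checked against the simulation design: with normal exposure and error, an untruncated replicate is $N(\mu_x, \sigma_x^2 + \sigma_u^2)$, so the probability of exceeding $c = 0$ is $\Phi\{\mu_x / (\sigma_x^2 + \sigma_u^2)^{1/2}\}$. The sketch below evaluates this; the function name `nonzero_fraction` is hypothetical. Under this formula, the stated 77% corresponds to $\sigma_u = \sqrt{3} \approx 1.73$, which is the value used for the last setting here.

```python
import math
from statistics import NormalDist


def nonzero_fraction(mu_x, sigma_x, sigma_u, c=0.0):
    """P(X + U > c) for independent normal X and U."""
    total_sd = math.sqrt(sigma_x ** 2 + sigma_u ** 2)
    return 1.0 - NormalDist(mu_x, total_sd).cdf(c)


sigma_u_values = [0.707, 1.0, 1.5, math.sqrt(3.0)]
fractions = [nonzero_fraction(1.5, 1.0, su) for su in sigma_u_values]
for su, frac in zip(sigma_u_values, fractions):
    print(f"sigma_u = {su:.3f}: {100 * frac:.1f}% non-zero")
```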
We considered non-normal $X$ in Table 2 to investigate whether the estimators were sensitive to the normality assumption in the calculation. We also examined the sensitivity of the estimators to misspecification of the measurement error distribution. In the upper portion of Table 2, the unobserved $X$ was generated from a mixture of two normal distributions, one with mean 2.5 and variance 1 and the other with mean 1 and variance 0.25, with mixture proportions (1/3, 2/3). The results in the upper portion of the table were similar to those of Table 1, except that there were small biases from the RC and EEE estimators. We found that the RC and EEE estimators showed small biases when the unobserved exposure had a skewed distribution, but the bias was not too large in general. Nevertheless, the RC and EEE estimators were still better than the NRC and naive estimators in this situation. In the lower portion of Table 2, we considered the situation when $X$ was normal but the measurement error was from a location/scale-transformed chi-squared distribution and a mixture of two normal distributions, respectively. The specification of the mixture of two normal distributions was the same as that given above. The location/scale-transformed chi-squared distribution had mean 0 and variance $\sigma_u^2$. From the sensitivity analysis, the RC and EEE estimators were not sensitive to a mild violation due to a mixture of normal distributions, since the biases were small. However, the estimators may be sensitive to violation of the normality assumption when the true distribution is very skewed, as for chi-squared distributions: the biases were moderate, rather than small, when the errors were from chi-squared distributions.
In Table 3, the data were generated similarly to those in Table 1, but the main model was logistic regression such that $\mathrm{pr}(Y_i = 1 \mid X_i) = H(\beta_0 + \beta_1 X_i)$, where the regression coefficients were $\beta = (0, \ln(2))$ and $\beta = (0, \ln(3))$, respectively. The findings were similar to those from Table 1 for the situation when $\beta = (0, \ln(2))$. The biases of the RC and EEE estimators were very small. Although RC is not consistent, it may have limited biases if the relative risk parameter is small to moderate, such as $\beta_1 = \ln(1.5)$ or $\beta_1 = \ln(2)$ when the exposure’s standard deviation is about 1. However, when $\beta_1 = \ln(3)$, the biases of the RC estimator were larger than those of the EEE estimator. The reason is that the RC estimator’s bias increases when the relative risk parameter is large. The findings are typically similar to those for measurement error in longitudinal data and survival analysis with covariate measurement error [20,21].
In Table 4, we investigated the situation when both $X$ and $Z$ were included in a linear regression model. We first generated $X_i$, $i = 1, \ldots, n$, and two replicates $W_{i1}$ and $W_{i2}$ in the same way as those in Table 1. Covariates $Z_i$, $i = 1, \ldots, n$, were generated via $Z_i = \rho X_i / \sigma_x + \sqrt{1 - \rho^2}\, V_i / \sigma_z$, where $V_i$ were from $N(0, \sigma_z^2)$ and independent of $X_i$, $\sigma_z^2 = 1$, and $\rho = 0.2$. The outcomes were generated via $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$, where $\beta_0 = 0.5$, $\beta_1 = 1$, and $\beta_2 = 1$. The residuals $e_i$, $i = 1, \ldots, n$, were generated from a standard normal distribution and were independent of $X_i$ and $Z_i$. The findings were mostly similar to those from Table 1. That is, the naive and NRC estimators had large biases, while the RC and EEE estimators were consistent with limited biases.
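As a short worked check, reading the covariate construction as $Z_i = \rho X_i/\sigma_x + \sqrt{1-\rho^2}\, V_i/\sigma_z$ with $V_i$ independent of $X_i$, the covariate $Z_i$ has unit variance and correlation $\rho$ with $X_i$:

```latex
\begin{align*}
\operatorname{Var}(Z_i)
  &= \rho^2 \frac{\operatorname{Var}(X_i)}{\sigma_x^2}
   + (1 - \rho^2) \frac{\operatorname{Var}(V_i)}{\sigma_z^2}
   = \rho^2 + (1 - \rho^2) = 1, \\
\operatorname{Cov}(X_i, Z_i)
  &= \frac{\rho}{\sigma_x} \operatorname{Var}(X_i) = \rho\, \sigma_x, \\
\operatorname{Corr}(X_i, Z_i)
  &= \frac{\rho\, \sigma_x}{\sigma_x \cdot 1} = \rho = 0.2 .
\end{align*}
```

So the Table 4 setting corresponds to a weakly correlated pair of covariates, with $Z_i$ standardized to unit variance.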

6. Analysis of APPEAL Data

The design of the APPEAL study was briefly reviewed in the Introduction. In this section, we are interested in investigating the association between physical activity measured via MET hours per week and CRP. The outcome variable of interest is the CRP value at baseline. In the APPEAL study, MET hours per week and other data including biomarkers were collected at both baseline and 12 months (end of study). In the control group who did not receive the exercise intervention, physical activity levels did not change significantly between baseline and 12 months. Hence, it seems reasonable to assume that the two MET-hours/week scores at baseline and 12 months in the control group (n = 102) can be treated as replicates. The MET-hours/week data for the exercise intervention group at 12 months were not included in the analysis as the MET-hours/week value changed significantly for study participants randomized to the exercise intervention between baseline and 12 months. As such, these values cannot be treated as replicates. The MET-hours/week scores at baseline and 12 months are surrogate variables (replicates, control arm only) for an unobserved true MET-hours/week score of an individual (unobserved underlying average of a period of time). The true unobserved average MET-hours/week variable is a variable to measure the actual physical activity which cannot be observed. In addition to MET-hours/week, age at baseline was another covariate in the regression analysis.
We first investigated the association between MET-hours/week and CRP at baseline. A scatterplot and a fitted kernel smoother of MET-hours/week and CRP at baseline are shown in the upper portion of Figure 1. The lower portion of Figure 1 shows the scatterplot and a fitted kernel smoother of log(MET+1) and log(CRP) at baseline. We excluded 26 individuals with missing data or outliers (defined as values larger than median + 3× interquartile range) for CRP. Hence, a total of 176 individuals are included in the data analysis. The percentage of non-zero log(MET+1) is 67% at baseline and 68% at 12 months. In our regression analysis, we used the log-transformed data since the transformed data were less skewed.
In this section, we applied our methods to the regression of CRP on physical activity (MET-hours/week) and age. The data application here is primarily for the purpose of demonstrating our new methods. The regression coefficients were estimated based on the naive, NRC, RC, and EEE estimators. The results are given in Table 5. All four estimators showed that MET was negatively associated with the inflammatory marker CRP, but the association was not statistically significant.
From the naive estimator, when the log(MET+1) score increased by 1 h/week, the CRP, on average, decreased by about 0.07 mg/L. From the NRC, RC, and EEE estimates, when the log(MET+1) score increased by 1 h/week, the CRP, on average, decreased by about 0.1 mg/L. The standard errors from the NRC, RC, and EEE estimates were larger than those from the naive estimates. This is a general phenomenon of the bias–efficiency trade-off that has been reported in the measurement error literature, and it is consistent with the findings from our simulations. Furthermore, all four estimates demonstrated a significant effect of age on CRP. On average, an increase of 10 years in age was associated with an increase of approximately 0.15 mg/L in log(CRP).

7. Discussion

In this paper, we propose an EEE estimator for generalized linear models with covariate measurement error when the surrogate variables may have zero values among a subset of individuals. Our work is also applicable to other settings in which an exposure may be truncated. Our numerical studies show that RC is better than the naive estimator and the NRC estimator in general, but it may be biased in some situations. Overall, the EEE estimator has smaller biases. There is a trade-off between bias and efficiency, and the EEE estimator has a larger standard error as a result. One limitation of the proposed EEE estimator is that it may be biased if the likelihood function of the exposure variable is misspecified. Our simulation results demonstrate that the biases are moderate if the exposure distribution is not too skewed. Future research is needed to develop a non-parametric approach that does not require specifying the exposure variable distribution [22].
In addition to physical activity or dietary data, biomarker measurements are important for the early detection and monitoring of disease progression. The methods developed in this paper can be applied to biomarker data. When a biomarker is truncated due to a detection limit, decisions are required concerning how to handle values at or below the threshold in order to avoid biasing the parameter estimates. In addition, biomarkers are often measured with error for many reasons, such as imperfect laboratory conditions, analytic variability of the assay, or temporal variability within individuals. The statistical modeling of zero-inflated surrogates in this paper can be applied to the situation when biomarker data are truncated due to a detection limit. Further research is needed for the case when longitudinal biomarker, physical activity, or dietary data are available over time [23,24,25].

8. Conclusions

We have developed an EEE approach for regression analysis with covariate measurement error when the surrogates may be truncated. One limitation of our proposed EEE estimator is that it is not consistent if the covariate distribution or the measurement error distribution is misspecified. In our simulations, the covariates and measurement errors are from normal distributions. Our simulation results demonstrate that if the misspecification is not too extreme, then the bias is typically small. Hence, if the covariates are skewed, then an appropriate (such as a logarithmic) transformation of the data may reduce the skewness of the data. Then the proposed EEE estimator may work well with likely minimal biases.

Author Contributions

Conceptualization, C.-Y.W. and A.M.; investigation, C.-Y.W. and J.d.D.T.; methodology, C.-Y.W. and J.d.D.T.; writing—original draft, C.-Y.W.; writing—review and editing, C.-Y.W., J.d.D.T., C.D. and A.M. All authors read and agreed to the published version of the manuscript.

Funding

This research was partially supported by US National Institutes of Health grants CA235122 (Wang), HL130483 (Wang), CA77572 (McTiernan), CA239168 (Wang, Tapsoba, Duggan and McTiernan), a Breast Cancer Research Foundation award BCRF-23-107 (Wang, Tapsoba, Duggan and McTiernan), and a travel award from the Mathematics Research Promotion Center of the National Science Council of Taiwan (Wang).

Data Availability Statement

The data that support the findings of this study are not available for public access at this moment, but can be requested from the APPEAL study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Propositions 1 and 2

Proof of Proposition 1.
Based on a standard surrogate assumption, the measurement errors $U_{ij}$ and $e_i$ are independent. Also, the truncation indicator $\tilde{\eta}_i$ is independent of $e_i$. Hence, $E(e_i \mid \tilde{W}_i, Z_i) = 0$. The unbiasedness of the estimating Equation (2) of the RC estimator can be obtained by calculating the expectation of the estimating score for individual $i$,
$$E\left[(1, \hat{X}_i, Z_i)^{\top}\{Y_i - (\beta_0 + \beta_1 \hat{X}_i + \beta_2 Z_i)\}\right] = E\left[(1, \hat{X}_i, Z_i)^{\top} E\{Y_i - (\beta_0 + \beta_1 \hat{X}_i + \beta_2 Z_i) \mid \tilde{W}_i, Z_i\}\right] = 0.$$
Hence, for linear regression with zero-inflated surrogates, the RC estimator is consistent. □
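The inner conditional expectation in the proof of Proposition 1 can be spelled out in one more step. The sketch below assumes that $e_i$ is the error term of the linear outcome model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$ and that the RC covariate is $\hat{X}_i = E(X_i \mid \tilde{W}_i, Z_i)$, in line with the definition of the RC estimator in the table footnotes; both readings are assumptions made for this illustration.

$$
\begin{aligned}
E(Y_i \mid \tilde{W}_i, Z_i)
&= \beta_0 + \beta_1 E(X_i \mid \tilde{W}_i, Z_i) + \beta_2 Z_i + E(e_i \mid \tilde{W}_i, Z_i) \\
&= \beta_0 + \beta_1 \hat{X}_i + \beta_2 Z_i ,
\end{aligned}
$$

so that $E\{Y_i - (\beta_0 + \beta_1 \hat{X}_i + \beta_2 Z_i) \mid \tilde{W}_i, Z_i\} = 0$. Because $(1, \hat{X}_i, Z_i)$ is a function of $(\tilde{W}_i, Z_i)$ only, the outer expectation is zero as well.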
Proof of Proposition 2.
We note that $\phi(Y_i, X_i, Z_i, \beta)$ is an estimating score that satisfies $E\{\phi(Y_i, X_i, Z_i, \beta)\} = 0$. By iterated expectations,
$$E\left[E\{\phi(Y_i, X_i, Z_i, \beta) \mid Y_i, \tilde{W}_i, Z_i\}\right] = E\{\phi(Y_i, X_i, Z_i, \beta)\} = 0.$$
Hence, estimating Equation (4) for the EEE estimator is unbiased. We now develop the asymptotic distribution of the EEE estimator. Let the estimating score of the EEE estimator for the $i$th participant, $E\{\phi(Y_i, X_i, Z_i, \beta) \mid Y_i, \tilde{W}_i, Z_i\}$, be denoted by $\psi(Y_i, \tilde{W}_i, Z_i, \beta)$. Let $G(\beta) = E\{\partial \psi(Y, \tilde{W}, Z, \beta) / \partial \beta^{\top}\}$. By a Taylor expansion of the estimating equation at the true $\beta$, and under some regularity conditions, it can be shown that
$$n^{1/2}(\hat{\beta}_{eee} - \beta) = -G^{-1}(\beta)\, n^{-1/2} \sum_{i=1}^{n} \psi(Y_i, \tilde{W}_i, Z_i, \beta) + o_p(1).$$
Hence, $n^{1/2}(\hat{\beta}_{eee} - \beta)$ is asymptotically normal with mean 0 and variance
$$\{G(\beta)\}^{-1}\, n^{-1} \left[\sum_{i=1}^{n} \psi(Y_i, \tilde{W}_i, Z_i, \beta)\, \{\psi(Y_i, \tilde{W}_i, Z_i, \beta)\}^{\top}\right] \{G^{-1}(\beta)\}^{\top}.$$
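The sandwich form of the variance can be illustrated numerically. The sketch below is not the EEE computation itself: evaluating the EEE score requires the conditional expectation under the nuisance model, which is not reproduced here. Instead, as a stand-in, it uses the ordinary least-squares score psi_i = (1, x_i)^T (y_i - b0 - b1 x_i) for a simple linear model, for which the derivative matrix is G = -E{(1, x)(1, x)^T}. The function name `ols_sandwich` and the simulated data are assumptions made for illustration.

```python
import random


def ols_sandwich(x, y):
    """OLS fit of y on (1, x) with a sandwich variance G^{-1} B G^{-T} / n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (v - my) for a, v in zip(x, y)) / sxx
    b0 = my - b1 * mx

    # G_n: average derivative of the score with respect to (b0, b1).
    g = [[-1.0, -mx], [-mx, -sum(a * a for a in x) / n]]

    # B_n: average outer product of the per-observation scores.
    b = [[0.0, 0.0], [0.0, 0.0]]
    for a, v in zip(x, y):
        r = v - b0 - b1 * a
        psi = (r, a * r)
        for j in range(2):
            for k in range(2):
                b[j][k] += psi[j] * psi[k] / n

    # Closed-form inverse of the 2 x 2 matrix G_n.
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]

    # V = G^{-1} B (G^{-1})^T, divided by n to give Var(beta_hat).
    gb = [[sum(g_inv[j][m] * b[m][k] for m in range(2)) for k in range(2)]
          for j in range(2)]
    var = [[sum(gb[j][m] * g_inv[k][m] for m in range(2)) / n
            for k in range(2)] for j in range(2)]
    return (b0, b1), var


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    ys = [0.5 + a + rng.gauss(0.0, 1.0) for a in xs]
    (b0, b1), var = ols_sandwich(xs, ys)
    print("estimates:", b0, b1)
    print("sandwich SEs:", var[0][0] ** 0.5, var[1][1] ** 0.5)
```

The same three ingredients (per-observation scores, their average derivative, and their average outer product) would be needed for any other estimating score; only the definition of psi changes.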

References

  1. Fuller, W.A. Measurement Error Models; John Wiley & Sons: New York, NY, USA, 1987.
  2. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed.; Chapman and Hall: London, UK, 2006.
  3. Yi, G.Y. Statistical Analysis with Measurement Error or Misclassification: Strategy, Methods and Application; Springer: New York, NY, USA, 2017.
  4. Freedman, L.S.; Carroll, R.J.; Wax, Y. Estimating the relationship between dietary intake obtained from a food frequency questionnaire and true average intake. Am. J. Epidemiol. 1991, 134, 310–320.
  5. Kipnis, V.; Subar, A.F.; Midthune, D.; Freedman, L.S.; Ballard-Barbash, R.; Troiano, R.; Bingham, S.; Schoeller, D.A.; Schatzkin, A.; Carroll, R.J. The structure of dietary measurement error: Results of the OPEN biomarker study. Am. J. Epidemiol. 2003, 158, 14–21.
  6. Kipnis, V.; Midthune, D.; Buckman, D.W.; Dodd, K.W.; Guenther, P.M.; Krebs-Smith, S.M.; Subar, A.F.; Tooze, J.A.; Carroll, R.J.; Freedman, L.S. Modeling data with excess zeros and measurement error: Application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics 2009, 65, 1003–1010.
  7. Jette, M.; Sidney, K.; Blumchen, G. Metabolic equivalents (METS) in exercise testing, exercise prescription, and evaluation of functional capacity. Clin. Cardiol. 1990, 13, 555–565.
  8. Carroll, R.J.; Galindo, C.D. Measurement Error, Biases, and the Validation of Complex Models for Blood Lead Levels in Children. Environ. Health Perspect. 1998, 106, 1535–1539.
  9. McTiernan, A.; Yasui, Y.; Sorensen, B.; Irwin, M.L.; Morgan, A.; Rudolph, R.E.; Surawicz, C.; Lampe, J.W.; Ayub, K.; Potter, J.D.; et al. Effect of a 12-month exercise intervention on patterns of cellular proliferation in colonic crypts: A randomized controlled trial. Cancer Epidemiol. Biomarkers Prev. 2006, 15, 1588–1597.
  10. Slattery, M.L.; Potter, J.; Caan, B.; Edwards, S.; Coates, A.; Ma, K.N.; Berry, T.D. Energy balance and colon cancer--beyond physical activity. Cancer Res. 1997, 57, 75–80.
  11. Prentice, R.L. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 1982, 69, 331–342.
  12. Buonaccorsi, J. Measurement Error: Models, Methods, and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2010.
  13. Tsiatis, A.A.; DeGruttola, V.; Wulfsohn, M.S. Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 count in patients with AIDS. J. Am. Stat. Assoc. 1995, 90, 27–37.
  14. Wang, C.Y.; Wang, N.; Wang, S. Regression analysis when covariates are regression parameters of a random effect model for observed longitudinal measurements. Biometrics 2000, 56, 487–495.
  15. Cook, J.; Stefanski, L.A. A simulation extrapolation method for parametric measurement error models. J. Am. Stat. Assoc. 1994, 89, 1314–1328.
  16. Stefanski, L.A.; Cook, J.R. Simulation-Extrapolation: The Measurement Error Jackknife. J. Am. Stat. Assoc. 1995, 90, 1247–1256.
  17. Tooze, J.A.; Grunwald, G.K.; Jones, R.H. Analysis of repeated measures data with clumping at zero. Stat. Methods Med. Res. 2002, 11, 341–355.
  18. Richardson, D.B.; Ciampi, A. Effects of exposure measurement error when an exposure variable is constrained by a lower limit. Am. J. Epidemiol. 2003, 157, 355–363.
  19. Wang, C.Y.; Cullings, H.; Song, X.; Kopecky, K.J. Joint nonparametric correction estimation for excess relative risk regression in survival analysis. J. R. Stat. Soc. Ser. B 2017, 79, 1583–1599.
  20. Wang, C.Y.; Song, X. Semiparametric regression calibration for general hazard models in survival analysis with covariate measurement error; surprising performance under linear hazard. Biometrics 2021, 77, 561–572.
  21. Wang, C.Y.; Huang, Y.; Chao, E.C.; Jeffcoat, M.K. Expected estimating equations for missing data, measurement error, and misclassification, with application to longitudinal nonignorably missing data. Biometrics 2008, 64, 85–95.
  22. Huang, Y.H.; Hwang, W.H.; Chen, F.Y. Differential measurement errors in zero-truncated regression models for count data. Biometrics 2011, 67, 1471–1480.
  23. Tsiatis, A.A.; Davidian, M. A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika 2001, 88, 447–458.
  24. Tsiatis, A.A.; Davidian, M. Joint modeling of longitudinal and time-to-event data: An overview. Stat. Sin. 2004, 14, 809–834.
  25. Tooze, J.A.; Kipnis, V.; Buckman, D.W.; Carroll, R.J.; Freedman, L.S.; Guenther, P.M.; Krebs-Smith, S.M.; Subar, A.F.; Dodd, K.W. A mixed-effects model approach for estimating the distribution of usual intake of nutrients: The NCI method. Stat. Med. 2010, 29, 2857–2868.
Figure 1. Upper: CRP versus MET; Lower: log(CRP) versus log(MET+1). The lines were obtained from fitting lowess smoothers.
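Table 1. Simulation study for linear regression with truncated surrogates.

| Parameter | Statistic | Naive (n = 500) | NRC (n = 500) | RC (n = 500) | EEE (n = 500) | Naive (n = 1000) | NRC (n = 1000) | RC (n = 1000) | EEE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 0.707$, $\bar{\eta} = 89\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.134 | −0.230 | −0.002 | 0.003 | 0.133 | −0.228 | −0.003 | 0.002 |
| | SD | 0.093 | 0.117 | 0.103 | 0.103 | 0.064 | 0.080 | 0.072 | 0.071 |
| | ASE | 0.093 | 0.117 | 0.106 | 0.106 | 0.066 | 0.083 | 0.075 | 0.074 |
| | CP | 0.684 | 0.486 | 0.972 | 0.962 | 0.460 | 0.180 | 0.954 | 0.966 |
| $\beta_1 = 1$ | Bias | −0.126 | 0.107 | 0.004 | 0.000 | −0.127 | 0.103 | 0.001 | −0.002 |
| | SD | 0.050 | 0.068 | 0.060 | 0.060 | 0.035 | 0.047 | 0.043 | 0.042 |
| | ASE | 0.049 | 0.068 | 0.061 | 0.061 | 0.035 | 0.048 | 0.043 | 0.043 |
| | CP | 0.270 | 0.658 | 0.958 | 0.954 | 0.056 | 0.446 | 0.956 | 0.960 |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1$, $\bar{\eta} = 86\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.301 | −0.349 | −0.007 | −0.006 | 0.299 | −0.343 | −0.005 | −0.004 |
| | SD | 0.096 | 0.161 | 0.133 | 0.132 | 0.067 | 0.109 | 0.091 | 0.091 |
| | ASE | 0.095 | 0.162 | 0.136 | 0.136 | 0.068 | 0.113 | 0.095 | 0.095 |
| | CP | 0.122 | 0.404 | 0.960 | 0.952 | 0.002 | 0.106 | 0.966 | 0.962 |
| $\beta_1 = 1$ | Bias | −0.252 | 0.154 | 0.006 | 0.006 | −0.252 | 0.147 | 0.003 | 0.002 |
| | SD | 0.050 | 0.096 | 0.080 | 0.079 | 0.035 | 0.066 | 0.056 | 0.056 |
| | ASE | 0.049 | 0.096 | 0.082 | 0.082 | 0.035 | 0.067 | 0.057 | 0.057 |
| | CP | 0.002 | 0.674 | 0.952 | 0.958 | 0.000 | 0.424 | 0.948 | 0.958 |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1.5$, $\bar{\eta} = 80\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.556 | −0.652 | −0.035 | 0.033 | 0.558 | −0.616 | −0.018 | −0.019 |
| | SD | 0.101 | 0.341 | 0.244 | 0.241 | 0.070 | 0.217 | 0.156 | 0.157 |
| | ASE | 0.098 | 0.325 | 0.230 | 0.229 | 0.069 | 0.220 | 0.157 | 0.158 |
| | CP | 0.000 | 0.462 | 0.962 | 0.942 | 0.000 | 0.104 | 0.960 | 0.960 |
| $\beta_1 = 1$ | Bias | −0.445 | 0.263 | 0.023 | 0.022 | −0.447 | 0.241 | 0.011 | 0.012 |
| | SD | 0.048 | 0.197 | 0.152 | 0.150 | 0.033 | 0.126 | 0.097 | 0.099 |
| | ASE | 0.047 | 0.188 | 0.144 | 0.144 | 0.033 | 0.128 | 0.099 | 0.099 |
| | CP | 0.000 | 0.846 | 0.960 | 0.942 | 0.000 | 0.558 | 0.952 | 0.954 |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 3$, $\bar{\eta} = 77\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.655 | −0.839 | −0.057 | −0.051 | 0.657 | −0.769 | −0.024 | −0.025 |
| | SD | 0.101 | 0.609 | 0.323 | 0.307 | 0.070 | 0.302 | 0.197 | 0.198 |
| | ASE | 0.098 | 0.466 | 0.300 | 0.296 | 0.069 | 0.302 | 0.198 | 0.229 |
| | CP | 0.000 | 0.634 | 0.956 | 0.922 | 0.000 | 0.150 | 0.956 | 0.950 |
| $\beta_1 = 1$ | Bias | −0.519 | 0.327 | 0.038 | 0.034 | −0.522 | 0.287 | 0.015 | 0.015 |
| | SD | 0.046 | 0.286 | 0.204 | 0.195 | 0.033 | 0.170 | 0.126 | 0.127 |
| | ASE | 0.045 | 0.263 | 0.191 | 0.189 | 0.032 | 0.170 | 0.126 | 0.148 |
| | CP | 0.000 | 0.972 | 0.956 | 0.918 | 0.000 | 0.716 | 0.948 | 0.930 |

NOTE: Naive is an estimator that uses the average of two replicates as the covariate, NRC is the naive RC estimator described in Section 2, RC is the RC estimator that uses $E(X \mid \tilde{W})$ as the covariate, and EEE is the expected estimating equation estimator described in Section 4.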
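The attenuation of the Naive estimator in the first setting of Table 1 can be explored with a short simulation. This is a minimal sketch under stated assumptions rather than the authors' simulation code: it assumes that each of two replicates is recorded as max(X + U, 0), that the regression error has standard deviation 1, and that one large sample is used in place of repeated samples of size 500 or 1000. Under these assumptions P(X + U > 0) is about 0.89, which agrees with the nonzero percentage listed for that setting. The function name `naive_bias` is illustrative.

```python
import random


def naive_bias(n=100_000, mu_x=1.5, sigma_x=1.0, sigma_u=0.707,
               beta0=0.5, beta1=1.0, sigma_e=1.0, seed=0):
    """Bias of OLS when the mean of two zero-truncated replicates replaces X."""
    rng = random.Random(seed)
    w_bar, y = [], []
    nonzero = 0
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        # Assumed truncation mechanism: surrogate recorded as 0 when X + U <= 0.
        reps = [max(x + rng.gauss(0.0, sigma_u), 0.0) for _ in range(2)]
        nonzero += sum(r > 0 for r in reps)
        w_bar.append(sum(reps) / 2)
        y.append(beta0 + beta1 * x + rng.gauss(0.0, sigma_e))

    # Ordinary least squares of y on the averaged surrogate.
    mw, my = sum(w_bar) / n, sum(y) / n
    sxy = sum((w - mw) * (v - my) for w, v in zip(w_bar, y))
    sxx = sum((w - mw) ** 2 for w in w_bar)
    slope = sxy / sxx
    intercept = my - slope * mw
    return intercept - beta0, slope - beta1, nonzero / (2 * n)


if __name__ == "__main__":
    bias0, bias1, frac = naive_bias()
    print("intercept bias:", bias0)
    print("slope bias:    ", bias1)
    print("nonzero share: ", frac)
```

Changing `sigma_u` to the other values in Table 1 gives a rough sense of how the attenuation grows with the measurement error, again only under the assumed truncation mechanism.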
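Table 2. Simulation study for linear regression with truncated surrogates; misspecified distribution for covariate X or measurement error.

| Parameter | Statistic | Naive (n = 500) | NRC (n = 500) | RC (n = 500) | EEE (n = 500) | Naive (n = 1000) | NRC (n = 1000) | RC (n = 1000) | EEE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| X is from a mixture of two normal distributions and the error is normal | | | | | | | | | |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 0.707$, $\bar{\eta} = 91\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.209 | −0.096 | 0.041 | 0.036 | 0.204 | −0.101 | 0.037 | 0.032 |
| | SD | 0.081 | 0.099 | 0.097 | 0.097 | 0.061 | 0.074 | 0.073 | 0.073 |
| | ASE | 0.084 | 0.105 | 0.103 | 0.103 | 0.060 | 0.074 | 0.072 | 0.073 |
| | CP | 0.300 | 0.878 | 0.940 | 0.946 | 0.074 | 0.720 | 0.900 | 0.916 |
| $\beta_1 = 1$ | Bias | −0.160 | 0.038 | −0.020 | −0.018 | −0.158 | 0.041 | −0.018 | −0.016 |
| | SD | 0.045 | 0.058 | 0.057 | 0.057 | 0.033 | 0.043 | 0.042 | 0.042 |
| | ASE | 0.046 | 0.061 | 0.059 | 0.060 | 0.032 | 0.043 | 0.042 | 0.042 |
| | CP | 0.060 | 0.920 | 0.946 | 0.950 | 0.002 | 0.848 | 0.928 | 0.928 |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1$, $\bar{\eta} = 86\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.341 | −0.199 | 0.051 | 0.036 | 0.336 | −0.204 | 0.050 | 0.034 |
| | SD | 0.084 | 0.132 | 0.123 | 0.125 | 0.063 | 0.098 | 0.090 | 0.091 |
| | ASE | 0.086 | 0.139 | 0.130 | 0.131 | 0.061 | 0.098 | 0.091 | 0.092 |
| | CP | 0.024 | 0.734 | 0.928 | 0.946 | 0.000 | 0.460 | 0.902 | 0.920 |
| $\beta_1 = 1$ | Bias | −0.268 | 0.074 | −0.024 | −0.017 | −0.265 | 0.076 | −0.024 | −0.017 |
| | SD | 0.045 | 0.078 | 0.075 | 0.076 | 0.033 | 0.058 | 0.054 | 0.055 |
| | ASE | 0.046 | 0.082 | 0.078 | 0.079 | 0.033 | 0.058 | 0.055 | 0.055 |
| | CP | 0.000 | 0.892 | 0.938 | 0.950 | 0.000 | 0.744 | 0.916 | 0.932 |
| X is normal and the error is from a modified chi-square distribution | | | | | | | | | |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1$, $\bar{\eta} = 87\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.384 | −0.278 | 0.082 | 0.088 | 0.385 | −0.275 | 0.085 | 0.091 |
| | SD | 0.095 | 0.169 | 0.134 | 0.134 | 0.067 | 0.118 | 0.094 | 0.094 |
| | ASE | 0.093 | 0.163 | 0.129 | 0.129 | 0.066 | 0.115 | 0.091 | 0.091 |
| | CP | 0.012 | 0.614 | 0.870 | 0.850 | 0.000 | 0.322 | 0.816 | 0.792 |
| $\beta_1 = 1$ | Bias | −0.295 | 0.125 | −0.038 | −0.040 | −0.293 | 0.125 | −0.038 | −0.040 |
| | SD | 0.052 | 0.101 | 0.081 | 0.081 | 0.036 | 0.070 | 0.056 | 0.056 |
| | ASE | 0.050 | 0.097 | 0.078 | 0.078 | 0.036 | 0.069 | 0.055 | 0.055 |
| | CP | 0.000 | 0.764 | 0.898 | 0.890 | 0.000 | 0.594 | 0.880 | 0.882 |
| X is normal and the error is from a mixture of two normal distributions | | | | | | | | | |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1$, $\bar{\eta} = 84\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.376 | −0.431 | 0.024 | −0.024 | 0.380 | −0.418 | −0.018 | −0.018 |
| | SD | 0.096 | 0.196 | 0.162 | 0.162 | 0.069 | 0.136 | 0.107 | 0.107 |
| | ASE | 0.096 | 0.198 | 0.160 | 0.161 | 0.068 | 0.139 | 0.112 | 0.112 |
| | CP | 0.030 | 0.402 | 0.954 | 0.958 | 0.000 | 0.114 | 0.954 | 0.958 |
| $\beta_1 = 1$ | Bias | −0.311 | 0.183 | 0.013 | 0.013 | −0.314 | 0.175 | 0.009 | 0.009 |
| | SD | 0.048 | 0.116 | 0.098 | 0.098 | 0.033 | 0.080 | 0.066 | 0.066 |
| | ASE | 0.049 | 0.118 | 0.098 | 0.099 | 0.035 | 0.082 | 0.068 | 0.068 |
| | CP | 0.000 | 0.724 | 0.950 | 0.950 | 0.000 | 0.430 | 0.954 | 0.956 |

NOTE: See the footnote of Table 1 for notation.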
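The four summary rows in the simulation tables can be computed from replicate-level output along the following lines. This is a sketch under assumptions: it takes Bias as the mean estimate minus the true value, SD as the empirical standard deviation of the estimates, ASE as the average of the estimated standard errors, and CP as the coverage proportion of nominal 95% Wald intervals. These readings follow common practice and are not spelled out in this section; the function name `summarize` is illustrative.

```python
import statistics


def summarize(estimates, std_errors, truth, z_crit=1.96):
    """Summarize simulation replicates as Bias, SD, ASE, and CP."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)
    ase = statistics.fmean(std_errors)
    # A replicate "covers" when the truth lies inside estimate +/- z_crit * SE.
    covered = sum(abs(est - truth) <= z_crit * se
                  for est, se in zip(estimates, std_errors))
    return {"Bias": bias, "SD": sd, "ASE": ase, "CP": covered / len(estimates)}


if __name__ == "__main__":
    # Four made-up replicates, purely to show the call.
    print(summarize([0.9, 1.1, 1.0, 1.3], [0.1, 0.1, 0.1, 0.1], truth=1.0))
```

Comparing SD with ASE indicates whether the estimated standard errors track the actual sampling variability, which is the comparison the tables invite.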
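Table 3. Simulation study for logistic regression with truncated surrogates.

| Parameter | Statistic | Naive (n = 500) | NRC (n = 500) | RC (n = 500) | EEE (n = 500) | Naive (n = 1000) | NRC (n = 1000) | RC (n = 1000) | EEE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 0.707$, $\bar{\eta} = 89\%$ | | | | | | | | | |
| $\beta_0 = 0$ | Bias | 0.065 | −0.190 | −0.010 | −0.010 | 0.063 | −0.190 | −0.012 | −0.012 |
| | SD | 0.191 | 0.234 | 0.203 | 0.208 | 0.136 | 0.169 | 0.147 | 0.150 |
| | ASE | 0.181 | 0.224 | 0.193 | 0.199 | 0.128 | 0.158 | 0.136 | 0.140 |
| | CP | 0.922 | 0.836 | 0.938 | 0.944 | 0.892 | 0.766 | 0.936 | 0.942 |
| $\beta_1 = \ln(2)$ | Bias | −0.080 | 0.083 | −0.008 | 0.007 | −0.079 | 0.083 | −0.006 | 0.008 |
| | SD | 0.122 | 0.154 | 0.133 | 0.142 | 0.085 | 0.109 | 0.094 | 0.100 |
| | ASE | 0.115 | 0.147 | 0.126 | 0.134 | 0.082 | 0.104 | 0.089 | 0.095 |
| | CP | 0.868 | 0.914 | 0.928 | 0.930 | 0.788 | 0.874 | 0.936 | 0.944 |
| $\beta_0 = 0$ | Bias | 0.069 | −0.340 | −0.014 | −0.013 | 0.065 | −0.341 | −0.018 | −0.016 |
| | SD | 0.207 | 0.266 | 0.219 | 0.232 | 0.148 | 0.189 | 0.159 | 0.169 |
| | ASE | 0.197 | 0.254 | 0.210 | 0.223 | 0.139 | 0.179 | 0.148 | 0.156 |
| | CP | 0.930 | 0.706 | 0.950 | 0.948 | 0.900 | 0.518 | 0.928 | 0.928 |
| $\beta_1 = \ln(3)$ | Bias | −0.116 | 0.146 | −0.035 | 0.015 | −0.114 | 0.145 | −0.034 | 0.014 |
| | SD | 0.159 | 0.205 | 0.165 | 0.190 | 0.111 | 0.141 | 0.115 | 0.132 |
| | ASE | 0.149 | 0.191 | 0.155 | 0.178 | 0.106 | 0.135 | 0.109 | 0.125 |
| | CP | 0.848 | 0.884 | 0.920 | 0.940 | 0.766 | 0.836 | 0.920 | 0.942 |
| $\mu_x = 1.5$, $\sigma_x^2 = 1$, $\sigma_u^2 = 1$, $\bar{\eta} = 86\%$ | | | | | | | | | |
| $\beta_0 = 0$ | Bias | 0.175 | −0.276 | −0.014 | −0.015 | 0.171 | −0.277 | −0.017 | −0.016 |
| | SD | 0.186 | 0.277 | 0.222 | 0.230 | 0.135 | 0.203 | 0.166 | 0.172 |
| | ASE | 0.177 | 0.267 | 0.214 | 0.223 | 0.125 | 0.188 | 0.150 | 0.156 |
| | CP | 0.824 | 0.800 | 0.938 | 0.948 | 0.700 | 0.672 | 0.934 | 0.940 |
| $\beta_1 = \ln(2)$ | Bias | −0.173 | 0.108 | −0.014 | 0.011 | −0.171 | 0.109 | −0.012 | 0.012 |
| | SD | 0.113 | 0.178 | 0.146 | 0.162 | 0.081 | 0.128 | 0.106 | 0.117 |
| | ASE | 0.108 | 0.171 | 0.140 | 0.155 | 0.076 | 0.121 | 0.098 | 0.109 |
| | CP | 0.610 | 0.914 | 0.948 | 0.946 | 0.404 | 0.856 | 0.926 | 0.940 |
| $\beta_0 = 0$ | Bias | 0.232 | −0.487 | −0.028 | −0.023 | 0.225 | −0.487 | −0.031 | −0.023 |
| | SD | 0.204 | 0.333 | 0.249 | 0.269 | 0.146 | 0.236 | 0.183 | 0.199 |
| | ASE | 0.193 | 0.314 | 0.238 | 0.259 | 0.136 | 0.221 | 0.167 | 0.181 |
| | CP | 0.754 | 0.642 | 0.946 | 0.952 | 0.626 | 0.398 | 0.924 | 0.922 |
| $\beta_1 = \ln(3)$ | Bias | −0.273 | 0.175 | −0.056 | 0.023 | −0.270 | 0.174 | −0.055 | 0.021 |
| | SD | 0.148 | 0.240 | 0.183 | 0.227 | 0.104 | 0.166 | 0.129 | 0.162 |
| | ASE | 0.138 | 0.222 | 0.171 | 0.213 | 0.098 | 0.156 | 0.120 | 0.148 |
| | CP | 0.488 | 0.892 | 0.900 | 0.946 | 0.230 | 0.824 | 0.902 | 0.940 |

NOTE: See the footnote of Table 1 for notation.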
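Table 4. Simulation study for linear regression model with truncated surrogates; covariates are X and Z.

| Parameter | Statistic | Naive (n = 500) | RC (n = 500) | CRC (n = 500) | EEE (n = 500) | Naive (n = 1000) | RC (n = 1000) | CRC (n = 1000) | EEE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 0.707$, $\bar{\eta} = 89\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.137 | −0.225 | −0.006 | −0.001 | 0.134 | −0.224 | −0.001 | 0.005 |
| | SD | 0.095 | 0.122 | 0.109 | 0.110 | 0.065 | 0.082 | 0.074 | 0.073 |
| | ASE | 0.093 | 0.117 | 0.106 | 0.106 | 0.066 | 0.083 | 0.075 | 0.074 |
| | CP | 0.694 | 0.504 | 0.938 | 0.930 | 0.454 | 0.226 | 0.946 | 0.944 |
| $\beta_1 = 1$ | Bias | −0.137 | 0.094 | 0.004 | 0.001 | −0.136 | 0.093 | 0.001 | −0.003 |
| | SD | 0.051 | 0.071 | 0.071 | 0.065 | 0.033 | 0.048 | 0.044 | 0.043 |
| | ASE | 0.050 | 0.069 | 0.064 | 0.063 | 0.036 | 0.049 | 0.044 | 0.044 |
| | CP | 0.204 | 0.742 | 0.940 | 0.938 | 0.020 | 0.538 | 0.954 | 0.956 |
| $\beta_2 = 1$ | Bias | 0.042 | 0.042 | −0.004 | −0.004 | 0.049 | 0.049 | 0.002 | 0.003 |
| | SD | 0.052 | 0.052 | 0.053 | 0.053 | 0.036 | 0.036 | 0.038 | 0.037 |
| | ASE | 0.050 | 0.050 | 0.050 | 0.050 | 0.035 | 0.035 | 0.036 | 0.036 |
| | CP | 0.852 | 0.852 | 0.938 | 0.938 | 0.704 | 0.704 | 0.942 | 0.942 |
| $\mu_x = 1.5$, $\sigma_x = 1$, $\sigma_u = 1$, $\bar{\eta} = 86\%$ | | | | | | | | | |
| $\beta_0 = 0.5$ | Bias | 0.300 | −0.347 | −0.016 | −0.016 | 0.298 | −0.338 | −0.005 | −0.004 |
| | SD | 0.098 | 0.170 | 0.142 | 0.143 | 0.067 | 0.114 | 0.095 | 0.094 |
| | ASE | 0.095 | 0.162 | 0.136 | 0.136 | 0.067 | 0.113 | 0.095 | 0.094 |
| | CP | 0.110 | 0.406 | 0.944 | 0.944 | 0.006 | 0.132 | 0.956 | 0.954 |
| $\beta_1 = 1$ | Bias | −0.264 | 0.138 | 0.011 | 0.011 | −0.264 | 0.132 | 0.004 | 0.002 |
| | SD | 0.051 | 0.099 | 0.087 | 0.087 | 0.033 | 0.068 | 0.060 | 0.059 |
| | ASE | 0.049 | 0.096 | 0.083 | 0.083 | 0.035 | 0.068 | 0.058 | 0.058 |
| | CP | 0.000 | 0.732 | 0.944 | 0.948 | 0.000 | 0.518 | 0.958 | 0.958 |
| $\beta_2 = 1$ | Bias | 0.070 | 0.070 | −0.005 | −0.006 | 0.076 | 0.076 | 0.002 | 0.002 |
| | SD | 0.054 | 0.054 | 0.059 | 0.059 | 0.038 | 0.038 | 0.042 | 0.042 |
| | ASE | 0.052 | 0.052 | 0.053 | 0.054 | 0.037 | 0.037 | 0.038 | 0.038 |
| | CP | 0.736 | 0.736 | 0.934 | 0.938 | 0.464 | 0.464 | 0.922 | 0.920 |

NOTE: Naive is an estimator that uses the average of two replicates as the covariate, RC is the usual RC estimator that uses $E(X \mid \tilde{W}, Z)$ as the covariate, CRC is a conditional RC estimator that uses $E(X \mid \tilde{W}, Z, \eta)$ as the covariate, and EEE is the expected estimating equation estimator.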
Table 5. Analysis results of data from the APPEAL study.

| Term | Statistic | Naive | NRC | RC | EEE |
|---|---|---|---|---|---|
| Intercept $\beta_0$ | Estimate | 0.259 | 0.345 | 0.299 | 0.282 |
| | SE | 0.360 | 0.377 | 0.367 | 0.364 |
| log(MET+1) $\beta_1$ | Estimate | −0.067 | −0.136 | −0.107 | −0.098 |
| | SE | 0.045 | 0.098 | 0.071 | 0.062 |
| Age $\beta_2$ | Estimate | 0.015 | 0.015 | 0.014 | 0.015 |
| | SE | 0.006 | 0.006 | 0.007 | 0.007 |
| Nuisance parameters | | | | | |
| $\mu_x$ | Estimate | | 1.258 | 0.925 | 0.927 |
| | SE | | 0.100 | 0.160 | 0.161 |
| $\sigma_x^2$ | Estimate | | 0.447 | 0.976 | 0.987 |
| | SE | | 0.145 | 0.337 | 0.330 |
| $\sigma_u^2$ | Estimate | | 0.910 | 1.674 | 1.671 |
| | SE | | 0.130 | 0.293 | 0.292 |

NOTE: See the footnote of Table 1 for notation. The percentages of non-zero log(MET+1) were 66.7% and 67.8% at baseline and 12 months among the participants in the control group, respectively. The total sample size in the analysis was 176.
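The estimates and standard errors in Table 5 can be turned into Wald statistics and approximate intervals with simple arithmetic. The sketch below does this for the log(MET+1) coefficient. The normal-approximation 95% interval is an assumption made here for illustration and is not a result reported in the table; the function name `wald_summary` is illustrative.

```python
# (estimate, SE) pairs for the log(MET+1) coefficient, copied from Table 5.
BETA1 = {
    "Naive": (-0.067, 0.045),
    "NRC": (-0.136, 0.098),
    "RC": (-0.107, 0.071),
    "EEE": (-0.098, 0.062),
}


def wald_summary(estimate, se, z_crit=1.96):
    """Return the Wald z statistic and a normal-approximation interval."""
    z = estimate / se
    return z, (estimate - z_crit * se, estimate + z_crit * se)


if __name__ == "__main__":
    for name, (est, se) in BETA1.items():
        z, (lower, upper) = wald_summary(est, se)
        print(f"{name:5s} z = {z:6.2f}  95% CI = ({lower:.3f}, {upper:.3f})")
```

With 176 participants the normal approximation is only rough, so these intervals are best read as a quick orientation rather than as formal inference.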
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, C.-Y.; Tapsoba, J.d.D.; Duggan, C.; McTiernan, A. Generalized Linear Models with Covariate Measurement Error and Zero-Inflated Surrogates. Mathematics 2024, 12, 309. https://doi.org/10.3390/math12020309

Chicago/Turabian Style

Wang, Ching-Yun, Jean de Dieu Tapsoba, Catherine Duggan, and Anne McTiernan. 2024. "Generalized Linear Models with Covariate Measurement Error and Zero-Inflated Surrogates" Mathematics 12, no. 2: 309. https://doi.org/10.3390/math12020309
