A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples
Abstract
1. Introduction
2. Methods
2.1. Models for Individual-Level Data without Measurement or Processing Error
2.2. Models for Pooled Data without Measurement or Processing Error
2.3. Models for Pooled Data with Measurement and/or Processing Error
2.4. Design Considerations and Bias Adjustment
3. Example
3.1. Collaborative Perinatal Project Data
3.2. Results
Model | e | | | | | | AIC |
---|---|---|---|---|---|---|---|
ME and PE b | 0.031 (0.026) | -- | -- | -- | -- | -- | -- |
ME only | 0.032 (0.025) | 0.102 | 0.001 c | -- | 0.311 (0.25) [−0.17, 0.80] | 0.310 (0.25) [−0.17, 0.79] | 420.64 |
PE only | 0.031 (0.026) | 0.079 | -- | 0.078 | 0.388 (0.32) [−0.25, 1.02] | 0.383 (0.32) [−0.25, 1.01] | 412.82 |
Neither ME nor PE | 0.032 (0.025) | 0.103 | -- | -- | 0.309 (0.25) [−0.17, 0.79] | 0.308 (0.24) [−0.17, 0.79] | 418.46 |
Logistic regression d | -- | -- | -- | -- | 0.270 (0.24) [−0.20, 0.74] | -- | -- |
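As a rough consistency check, the ratio that the appendix code computes as `bet1discrim = bet1prm/sigsqx` can be approximately reproduced from the rounded entries of the table above. This is only a sketch. It assumes the second and third columns hold the estimates that the code calls `bet1prm` and `sigsqx`, and that the sixth column holds the resulting ratio. That column mapping is inferred from the appendix code rather than stated in the table itself.

```latex
\begin{align*}
\text{ME only:}           \quad & 0.032 / 0.102 \approx 0.314 \quad (\text{reported } 0.311) \\
\text{PE only:}           \quad & 0.031 / 0.079 \approx 0.392 \quad (\text{reported } 0.388) \\
\text{Neither ME nor PE:} \quad & 0.032 / 0.103 \approx 0.311 \quad (\text{reported } 0.309)
\end{align*}
```

The small gaps are within what rounding of the two inputs to three decimal places would produce.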
4. Simulations
4.1. Results of Simulations with Neither ME nor PE
N | | MSE | | | Logistic Regression c |
---|---|---|---|---|---|
2000 | 0.035 (0.013) | 0.080 | 0.439 (0.166) [94.6%] | 0.438 (0.166) [94.6%] | 0.441 (0.168) [94.8%] |
200 | 0.035 (0.042) | 0.080 | 0.447 (0.545) [95.8%] | 0.438 (0.534) [95.8%] | 0.474 (0.586) [95.1%] |
4.2. Results of Simulations with ME and/or PE
N | d | | | | | Logistic Regression e |
---|---|---|---|---|---|---|
2000 | 0.035 (0.017) | 0.079 | 0.081 | 0.082 | 0.474 \|0.438\| (0.28) [95.4%] | 0.254 \|0.254\| (0.13) [66.7%] |
1000 | 0.035 (0.024) | 0.077 | 0.082 | 0.081 | 0.463 \|0.417\| (0.37) [96.2%] | 0.252 \|0.251\| (0.18) [79.4%] |
500 | 0.035 (0.034) | 0.077 | 0.081 | 0.080 | 0.448 \|0.402\| (0.49) [97.2%] | 0.259 \|0.254\| (0.26) [88.9%] |
N | | | | | | Logistic regression d |
---|---|---|---|---|---|---|
2000 | 0.035 (0.016) | 0.079 | 0.080 | 0.448 (0.22) [0.21] {95.4%} | 0.438 (0.21) [0.21] {95.6%} | 0.291 (0.13) [0.13] {79.9%} |
1000 | 0.035 (0.021) | 0.079 | 0.080 | 0.474 (0.32) [0.31] {96.5%} | 0.450 (0.30) [0.30] {96.4%} | 0.298 (0.19) [0.19] {89.0%} |
500 | 0.036 (0.031) | 0.076 | 0.083 | 0.522 c (0.51) [0.50] {97.5%} | 0.454 c (0.42) [0.43] {96.8%} | 0.307 (0.28) [0.27] {92.0%} |
N | | | | | | Logistic Regression |
---|---|---|---|---|---|---|
2000 | 0.035 (0.014) | 0.080 | 0.080 | 0.444 (0.18) [0.18] {95.6%} | 0.442 (0.18) [0.18] {95.5%} | 0.356 (0.15) [0.15] {91.3%} |
1000 | 0.035 (0.020) | 0.079 | 0.078 | 0.441 (0.26) [0.26] {95.0%} | 0.438 (0.26) [0.26] {95.0%} | 0.356 (0.21) [0.21] {92.4%} |
500 | 0.035 (0.029) | 0.079 | 0.078 | 0.447 (0.37) [0.37] {96.0%} | 0.440 c (0.37) [0.36] {96.0%} | 0.361 (0.30) [0.30] {94.3%} |
5. Discussion
6. Conclusions
Acknowledgements
Author Contributions
Conflicts of Interest
Appendix: SAS/IML Code Used to Fit Model (7) to Example Data
    proc iml worksize = 70 symsize = 250;

    use fordiscrim;
    read all var{ki} into kj;
    read all var{SAsum} into ystar;
    read all var{smokesum} into smokestar;
    read all var{racesum} into racestar;
    read all var{mcp1_sum} into xstrtilde;
    close fordiscrim;

    npools = 415; ** 251 pools of size 2, and 164 individual samples **;

    ** Specifying likelihood for FULL ML method **;
    START LIKELIC(parms) global (npools,kj,pi,ystar,smokestar,racestar,xstrtilde);
      *** Parameters in model (7) to be estimated ***;
      bet0prm = parms[1];
      bet1prm = parms[2];
      gamm1prm = parms[3];
      gamm2prm = parms[4];
      sigsqx = parms[5];
      sigsqp = parms[6];
      sigsqm = parms[7];
      pi = 2 * arsin(1);

      *** NOTE: LOWER BOUND CONSTRAINT ON VARIANCE COMPONENTS FOR STABILITY ***;
      sigsqx = max(sigsqx,.001);
      sigsqp = max(sigsqp,.001);
      sigsqm = max(sigsqm,.001);

      * contributions to likelihood, one per pool;
      func_lkC = j(npools,1,.);
      do u = 1 to npools;
        ystr = ystar[u,1];
        smkstr = smokestar[u,1];
        racestr = racestar[u,1];
        ki = kj[u,1];
        xstrt = xstrtilde[u,1];
        * indicator that the sample is a true pool (size > 1);
        kigt1 = 0;
        if ki > 1 then kigt1 = 1;
        * mean and variance of the measured pooled sum given outcome and covariates;
        muxtstrgyc = ki#bet0prm + bet1prm#ystr + gamm1prm#smkstr + gamm2prm#racestr;
        sigsqxtstrgyc = ki#sigsqx + sigsqp#kigt1 + sigsqm;
        * normal density evaluated at the measured pooled sum;
        func_lkC[u,1] = (1/sqrt(2#pi#max(sigsqxtstrgyc,1E-4)))#
                        exp(-(xstrt-muxtstrgyc)##2/(2#max(sigsqxtstrgyc,1E-4)));
        ** Next 2 lines to prevent instability during iterations **;
        if func_lkC[u,1] < 1E-100 then func_lkC[u,1] = 1E-100;
        if func_lkC[u,1] > 1E20 then func_lkC[u,1] = 1E20;
        func_lkchk = func_lkC[u,1];
        * print func_lkchk;
      end;
      m2loglikC = -2 # sum(log(func_lkC));
      return (m2loglikC);
    FINISH LIKELIC;

    **********************************************************************
    The following calls the minimization function, computes the Hessian, etc.
    **********************************************************************;
    START COMPC;
      ** Maximum likelihood method **;
      * create vector of initial parameter estimates for function;
      parms = .2||.2||.2||.2||.5||.5||.5;
      * options vector for minimization function;
      option = {0 3};
      ** matrix of lower (row 1) and upper (row 2) bound constraints on parameters **;
      con = {. . . . .001 .001 .001,
             . . . .  .    .    .};
      * call function minimizer in IML;
      call nlpqn(rc,xres,"likelic",parms,option,con);
      * create vector of mles computed using function minimizer;
      Parms = xres`;
      print parms;
      * compute numerical value of Hessian (and covariance matrix) using mles calculated above;
      * call function to approximate 2nd derivatives for Hessian;
      call NLPFDD(crit,grad,hess,"likelic",parms);
      * objective is -2 log-likelihood, so the covariance matrix is 2 times the inverse Hessian;
      cov_mat = 2 * inv(hess);
      se_vec1 = sqrt(vecdiag(cov_mat));
      print se_vec1;
      print cov_mat;
      print rc;
      bet0prm = parms[1];
      bet1prm = parms[2];
      gamm1prm = parms[3];
      gamm2prm = parms[4];
      sigsqx = parms[5];
      sigsqp = parms[6];
      sigsqm = parms[7];
      sebet1prm = sqrt(cov_mat[2,2]);
      segamm1prm = sqrt(cov_mat[3,3]);
      bet1discrim = bet1prm/sigsqx;
      print bet1discrim;
    FINISH COMPC;

    run compc;
    QUIT;
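For readers who prefer notation to code, the objective that the `LIKELIC` module above minimizes can be written out as follows. This is a transcription of the code, not a restatement of model (7) as typeset in the paper, and the symbols are assumptions chosen to mirror the variable names: $k_j$ for `kj`, $y_j^*$, $s_j^*$, $r_j^*$ for the pooled sums `ystar`, `smokestar`, `racestar`, $\tilde{x}_j^*$ for `xstrtilde`, and $\beta_0, \beta_1, \gamma_1, \gamma_2, \sigma_x^2, \sigma_p^2, \sigma_m^2$ for `bet0prm`, `bet1prm`, `gamm1prm`, `gamm2prm`, `sigsqx`, `sigsqp`, `sigsqm`. The numerical safeguards in the code (floors on the variances and clamps on the density) are omitted.

```latex
\begin{align*}
\tilde{X}_j^* \mid y_j^*, s_j^*, r_j^* &\sim N(\mu_j, v_j), \qquad j = 1, \dots, 415, \\
\mu_j &= k_j \beta_0 + \beta_1 y_j^* + \gamma_1 s_j^* + \gamma_2 r_j^*, \\
v_j &= k_j \sigma_x^2 + \sigma_p^2 \, I(k_j > 1) + \sigma_m^2, \\
-2 \log L &= \sum_{j=1}^{415} \left[ \log(2 \pi v_j) + \frac{(\tilde{x}_j^* - \mu_j)^2}{v_j} \right], \\
\texttt{bet1discrim} &= \hat{\beta}_1 / \hat{\sigma}_x^2 .
\end{align*}
```

Because the minimized function is $-2 \log L$ rather than $-\log L$, its Hessian is twice the observed information, which is why the code forms the covariance matrix as `2 * inv(hess)`.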
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Lyles, R.H.; Van Domelen, D.; Mitchell, E.M.; Schisterman, E.F. A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples. Int. J. Environ. Res. Public Health 2015, 12, 14723-14740. https://doi.org/10.3390/ijerph121114723