Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations
Abstract
1. Introduction
2. Model Description
2.1. Two-Part Latent Variable Model
2.2. Bayesian Feature Selection
3. Bayesian Inference
3.1. Prior Specification and MCMC Sampling
3.2. MCMC Sampling
- Draw from ;
- Draw from ;
- Draw from ;
- Draw from ;
- Draw from .
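The scheme above cycles through draws from a sequence of full conditional distributions. The conditionals themselves are not reproduced in this listing, so the following is only a toy sketch of the same cyclic pattern. It assumes a standard bivariate normal target with correlation `rho` as a stand-in for the model, and the function names `gibbs_bivariate_normal` and `sample_correlation` are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=1000, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.

    Assumption: this target is an illustrative stand-in, not the two-part
    latent variable model. Each sweep draws every block from its full
    conditional given the current values of the other blocks.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0                       # arbitrary starting values
    cond_sd = math.sqrt(1.0 - rho ** 2)   # sd of x | y and of y | x
    draws = []
    for it in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)   # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)   # draw y from p(y | x)
        if it >= burn_in:                 # discard burn-in sweeps
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the retained (x, y) draws."""
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.6, n_iter=20000, burn_in=2000, seed=1)
    print(len(draws), round(sample_correlation(draws), 2))
```

In a model with more blocks, the loop body would simply hold one draw per block, each conditioned on the most recent values of all the others.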
4. Simulation Study
5. Chinese Household Financial Survey Data
6. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
TPM | Two-part model |
TPLVM | Two-part latent variable model |
SS | Spike and slab bimodal prior |
BaLsso | Bayesian lasso |
MCMC | Markov chain Monte Carlo |
CHFS | Chinese Household Financial Survey |
Appendix A
- 1. Full conditional of
- 2. Full conditional of
- 3. Full conditional of
- 4. Full conditional of
- 5. Full conditional of
| | SS | | | BaLsso | | |
---|---|---|---|---|---|---|
PAR | BIAS | RMS | SD | BIAS | RMS | SD |
| | −0.015 | 0.097 | 0.129 | 0.028 | 0.150 | 0.134 |
| | −0.056 | 0.143 | 0.142 | −0.152 | 0.217 | 0.136 |
| | −0.001 | 0.021 | 0.061 | −0.019 | 0.042 | 0.079 |
| | −0.144 | 0.216 | 0.145 | −0.122 | 0.251 | 0.148 |
| | 0.005 | 0.030 | 0.064 | −0.008 | 0.040 | 0.078 |
| | −0.091 | 0.147 | 0.137 | −0.045 | 0.135 | 0.137 |
| | 0.017 | 0.028 | 0.075 | 0.026 | 0.055 | 0.096 |
| | −0.187 | 0.237 | 0.184 | −0.126 | 0.209 | 0.184 |
| | 0.010 | 0.079 | 0.084 | 0.008 | 0.063 | 0.085 |
| | −0.035 | 0.079 | 0.077 | −0.011 | 0.065 | 0.074 |
| | 0.005 | 0.032 | 0.051 | −0.018 | 0.031 | 0.054 |
| | −0.007 | 0.061 | 0.070 | −0.021 | 0.085 | 0.069 |
| | −0.007 | 0.029 | 0.049 | −0.003 | 0.031 | 0.053 |
| | −0.070 | 0.093 | 0.077 | −0.018 | 0.082 | 0.075 |
| | −0.040 | 0.086 | 0.089 | −0.020 | 0.069 | 0.088 |
| | −0.011 | 0.033 | 0.062 | 0.014 | 0.036 | 0.069 |
| | 0.085 | 0.129 | 0.117 | 0.038 | 0.082 | 0.111 |
| | 0.042 | 0.078 | 0.073 | 0.058 | 0.098 | 0.071 |
| | 0.030 | 0.072 | 0.071 | 0.034 | 0.063 | 0.072 |
| | 0.058 | 0.079 | 0.072 | 0.052 | 0.090 | 0.073 |
| | 0.031 | 0.060 | 0.072 | 0.037 | 0.064 | 0.073 |
| | 0.014 | 0.041 | 0.074 | 0.018 | 0.058 | 0.076 |
Total | - | 1.870 | 1.975 | - | 2.016 | 2.035 |
| | SS | | | BaLsso | | |
---|---|---|---|---|---|---|
PAR | BIAS | RMS | SD | BIAS | RMS | SD |
| | 0.052 | 0.096 | 0.087 | 0.009 | 0.092 | 0.087 |
| | 0.005 | 0.069 | 0.089 | 0.055 | 0.117 | 0.090 |
| | 0.003 | 0.048 | 0.058 | 0.032 | 0.052 | 0.060 |
| | 0.007 | 0.086 | 0.093 | −0.045 | 0.076 | 0.091 |
| | 0.004 | 0.015 | 0.049 | −0.020 | 0.043 | 0.060 |
| | 0.010 | 0.071 | 0.086 | 0.013 | 0.074 | 0.085 |
| | −0.003 | 0.029 | 0.059 | 0.032 | 0.064 | 0.077 |
| | 0.002 | 0.102 | 0.120 | −0.042 | 0.108 | 0.114 |
| | 0.017 | 0.042 | 0.053 | 0.030 | 0.056 | 0.054 |
| | −0.023 | 0.038 | 0.046 | −0.016 | 0.039 | 0.047 |
| | −0.007 | 0.019 | 0.033 | −0.005 | 0.018 | 0.037 |
| | −0.028 | 0.060 | 0.042 | −0.014 | 0.026 | 0.043 |
| | −0.007 | 0.023 | 0.033 | 0.000 | 0.018 | 0.036 |
| | −0.005 | 0.035 | 0.046 | 0.003 | 0.043 | 0.047 |
| | −0.031 | 0.058 | 0.053 | −0.039 | 0.063 | 0.054 |
| | −0.001 | 0.031 | 0.045 | −0.025 | 0.081 | 0.053 |
| | 0.018 | 0.049 | 0.068 | 0.041 | 0.053 | 0.071 |
| | 0.021 | 0.041 | 0.045 | 0.033 | 0.038 | 0.045 |
| | 0.016 | 0.049 | 0.045 | 0.028 | 0.038 | 0.045 |
| | 0.032 | 0.049 | 0.045 | 0.054 | 0.057 | 0.045 |
| | 0.043 | 0.059 | 0.046 | 0.043 | 0.054 | 0.046 |
| | 0.016 | 0.043 | 0.049 | 0.005 | 0.037 | 0.048 |
Total | - | 1.112 | 1.290 | - | 1.247 | 1.335 |
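
The BIAS, RMS, and SD columns in the tables above summarize estimates over repeated simulated data sets. Their definitions are not given in this listing, so the sketch below rests on assumed conventions: BIAS is the mean estimate minus the true value, RMS is the root mean square error around the true value, and SD is the average posterior standard deviation across replications. The function name `summarize_replications` and the example numbers are hypothetical. The Total row appears to be a column sum, since the SS RMS entries of the first table add up to 1.870.

```python
import math


def summarize_replications(estimates, posterior_sds, true_value):
    """Summarize one parameter over simulation replications.

    Assumed definitions (not confirmed by the source):
      BIAS = mean(estimates) - true_value
      RMS  = sqrt(mean((estimate - true_value)^2))
      SD   = mean of the per-replication posterior standard deviations
    """
    n = len(estimates)
    if n == 0 or n != len(posterior_sds):
        raise ValueError("need equally many estimates and posterior SDs")
    bias = sum(estimates) / n - true_value
    rms = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    sd = sum(posterior_sds) / n
    return {"BIAS": bias, "RMS": rms, "SD": sd}


if __name__ == "__main__":
    # Made-up values for illustration only, not taken from the study.
    summary = summarize_replications(
        estimates=[0.9, 1.1, 1.0, 1.2],
        posterior_sds=[0.1, 0.2, 0.1, 0.2],
        true_value=1.0,
    )
    print({name: round(value, 3) for name, value in summary.items()})
```

Under these definitions, an RMS close to SD suggests that the posterior spread roughly matches the actual sampling error of the estimate.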
| | SS | | | BaLsso | | |
---|---|---|---|---|---|---|
PAR | | | | | | |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 98 | 96 | 85 | 88 | 86 | 76 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 96 | 95 | 86 | 93 | 93 | 85 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 96 | 94 | 93 | 97 | 92 | 87 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 99 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 99 | 95 | 100 | 98 | 93 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 100 | 97 | 98 | 100 | 91 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 98 | 97 | 97 | 96 | 96 |
Variable | Description | Mean | Max | Min | SD |
---|---|---|---|---|---|
Gender () | =1, male; =0, otherwise | 0.756 | 1 | 0 | 0.430 |
Age () | | 51.81 | 91 | 19 | 14.931 |
Marital status () | =1, married; =0, otherwise | 0.863 | 1 | 0 | 0.344 |
Health condition () | =1, good; =0, otherwise | 0.833 | 1 | 0 | 0.373 |
Educational experience | =1, high school or above; =0, otherwise | 0.352 | 1 | 0 | 0.478 |
Employment () | =1, yes; =0, otherwise | 0.092 | 1 | 0 | 0.290 |
No. of adults () | | 3.002 | 3 | 0 | 1.301 |
Annual Income (CNY) * | 0 |
| | SS | | BaLsso | | | SS | | BaLsso | |
---|---|---|---|---|---|---|---|---|---|
Par | Est. | SD | Est. | SD | Par | Est. | SD | Est. | SD |
| | −0.835 | 0.078 | −0.838 | 0.080 | | 9.782 | 0.152 | 9.670 | 0.125 |
| | 0.050 | 0.063 | 0.076 | 0.070 | | −0.137 | 0.103 | −0.107 | 0.088 |
| | −0.750 | 0.099 | −0.757 | 0.102 | | −0.147 | 0.141 | −0.015 | 0.081 |
| | 0.107 | 0.085 | 0.147 | 0.088 | | −0.022 | 0.065 | −0.006 | 0.075 |
| | 0.428 | 0.062 | 0.072 | 0.070 | | −0.019 | 0.060 | −0.029 | 0.069 |
| | 0.577 | 0.070 | 0.082 | 0.081 | | 0.259 | 0.123 | 0.322 | 0.107 |
| | 0.004 | 0.040 | 0.005 | 0.052 | | 0.035 | 0.058 | 0.053 | 0.067 |
| | 0.118 | 0.079 | 0.130 | 0.079 | | 0.043 | 0.072 | 0.281 | 0.113 |
| | 0.747 | 0.073 | 0.092 | 0.077 | | 0.384 | 0.132 | 0.188 | 0.118 |
| | −0.059 | 0.112 | −0.039 | 0.092 | | 1.205 | 0.106 | 1.910 | 0.104 |
| | 0.312 | 0.150 | 0.300 | 0.152 | | | | | |
| | −0.791 | 0.062 | −0.714 | 0.057 | | | | | |
| | −0.865 | 0.067 | −0.625 | 0.068 | | | | | |
| | Part One | | Part Two | |
---|---|---|---|---|
VAR | SS | BaLsso | SS | BaLsso |
Gender | 0 | 0 | 1 | 1 |
Age | 1 | 1 | 1 | 0 |
Marital status | 1 | 1 | 0 | 0 |
Health condition | 1 | 0 | 0 | 0 |
Education | 1 | 0 | 1 | 1 |
Employment | 0 | 0 | 0 | 0 |
No. of adults | 1 | 1 | 0 | 1 |
Income | 1 | 0 | 1 | 1 |
Family culture | 0 | 0 | 1 | 1 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Q.; Zhang, Y.; Xia, Y. Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations. Mathematics 2024, 12, 783. https://doi.org/10.3390/math12050783