Learning Data Heterogeneity with Dirichlet Diffusion Trees
Abstract
1. Introduction
2. Methods: Bayesian Dirichlet Diffusion Trees for Data Heterogeneity
2.1. An Overview of Dirichlet Diffusion Trees
2.2. A Bayesian Latent Tree Model for Characterizing Data Heterogeneity
2.3. Posterior Sampling Using Markov Chain Monte Carlo
Algorithm 1: The MCMC sampler for the latent regression tree model
3. Simulation Study
3.1. Simulation Setting
3.2. Simulation Results
4. Real Data Application
5. Conclusions
6. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Parameter | 2.5% quantile | 50% quantile | 97.5% quantile |
|---|---|---|---|
|  | 0.6386 | 0.8041 | 1.2013 |
|  | −2.2270 | −1.6980 | −1.4199 |
|  | 0.2236 | 0.7631 | 0.9492 |
|  | 0.6462 | 0.9963 | 0.9998 |

| Parameter | 2.5% quantile | 50% quantile | 97.5% quantile |
|---|---|---|---|
|  | 0.3222 | 0.5282 | 0.8427 |
|  | 0.2143 | 0.4721 | 0.9579 |
|  | 0.4658 | 0.8439 | 0.9790 |
|  | 0.5138 | 0.9019 | 0.9927 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Huo, S.; Zhu, H. Learning Data Heterogeneity with Dirichlet Diffusion Trees. Mathematics 2025, 13, 2568. https://doi.org/10.3390/math13162568