Advanced Approach for Distributions Parameters Learning in Bayesian Networks with Gaussian Mixture Models and Discriminative Models
Abstract
1. Introduction
2. Problem Statement
3. Gaussian Mixture Model in Parameter Learning
3.1. Gaussian Mixture Model
3.2. Gaussian Mixture Regression
3.3. Algorithm for Parameter Learning and Inference with GMM
Algorithm 1 Structure and Parameters learning for continuous nodes in BNs with GMM
Algorithm 2 Inference for continuous nodes in BNs with GMM
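The algorithm listings above are reproduced here by title only. As a rough illustration of the underlying idea, a joint GMM can be fitted over a continuous node and its continuous parents, and inference can then use Gaussian mixture regression (Section 3.2) to estimate the node's value from observed parent values. The sketch below is a minimal Python illustration of that scheme, not the paper's implementation; the function and variable names are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def fit_node_gmm(data, n_components=3, random_state=0):
    """Fit a joint GMM over the columns [parent_1, ..., parent_m, node]."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=random_state)
    return gmm.fit(data)


def gmr_expectation(gmm, parent_values):
    """Gaussian mixture regression: estimate E[node | parents] for one observation.

    Assumes the node is the last column of the data the GMM was fitted on.
    """
    x = np.asarray(parent_values, dtype=float)
    d = gmm.means_.shape[1]
    p, c = slice(0, d - 1), d - 1          # parent block and child (node) index
    weights, cond_means = [], []
    for k in range(gmm.n_components):
        mu, cov = gmm.means_[k], gmm.covariances_[k]
        S_pp, S_cp = cov[p, p], cov[c, p]
        # conditional mean of the node given the parents for component k
        m_k = mu[c] + S_cp @ np.linalg.solve(S_pp, x - mu[p])
        # responsibility of component k given the observed parent values
        w_k = gmm.weights_[k] * multivariate_normal.pdf(x, mean=mu[p], cov=S_pp)
        cond_means.append(m_k)
        weights.append(w_k)
    weights = np.asarray(weights) / np.sum(weights)
    return float(np.dot(weights, cond_means))
```

In this form, the single-Gaussian parameterisation is simply the special case `n_components=1`.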
4. Classification Models in Parameters Learning
4.1. Logistic Regression
4.2. Linear Discriminant Analysis (LDA)
Algorithm 3 Parameter learning for discrete nodes in BNs with a classifier
Algorithm 4 Inference for discrete nodes in BNs with a classifier
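Algorithms 3 and 4 are likewise given by title only. The idea of Section 4 is to replace the conditional probability table of a discrete node with a discriminative model trained on the values of its parents, so that continuous parents can be handled directly. A minimal scikit-learn sketch follows; logistic regression stands in for any of the classifiers of Section 4, and all names are illustrative rather than the paper's API.

```python
from sklearn.linear_model import LogisticRegression


def fit_discrete_node_classifier(parent_data, node_values):
    """Learn P(node | parents) as a classifier instead of a conditional table."""
    clf = LogisticRegression(max_iter=1000)
    return clf.fit(parent_data, node_values)


def infer_discrete_node(clf, parent_values):
    """Return the conditional distribution over node values and its mode."""
    proba = clf.predict_proba([parent_values])[0]
    mode = clf.classes_[proba.argmax()]
    return dict(zip(clf.classes_, proba)), mode
```

Any classifier exposing `predict_proba` (LDA, kNN, decision tree, random forest) can be substituted without changing the inference step.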
4.3. K-Nearest Neighbors (kNN)
4.4. Decision Tree
4.5. Random Forest
5. Experiments
5.1. Synthetic Data
5.1.1. Parameters Learning on Different Kinds of GMM
- For each combination of parameters, a data sample was generated from the mixture;
- The network parameters were trained on the sample;
- Then a sample was generated from the Bayesian network, and the mixture parameters were trained on it;
- The mixtures were compared, and the divergence between them was calculated.
- For each combination of parameters, a sample was generated from the mixture;
- The sample was divided into training and test sets in a ratio of 90% to 10%;
- The network parameters were trained on the training set in two ways: based on a single Gaussian distribution and based on a GMM;
- For each observation in the test set, the value in each node was removed sequentially;
- The deleted value in a node was restored as the average of values sampled for that node;
- The root mean square error (RMSE) of restoration was calculated for the two approaches, the single Gaussian distribution and the GMM (a sketch of this restoration step is given after the list).
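A minimal sketch of the restoration step used in this experiment is given below. The network's sampling routine is abstracted behind a hypothetical `sample_node(node, evidence)` callable, since this outline does not fix an interface; the loop and names are assumptions.

```python
import numpy as np


def restore_rmse(test_data, node, sample_node, n_samples=100):
    """Hide the value of `node` in each test row, restore it as the mean of
    values sampled from the network given the remaining evidence, and return
    the RMSE of the restoration (`test_data` is a pandas DataFrame)."""
    squared_errors = []
    for _, row in test_data.iterrows():
        true_value = row[node]
        evidence = row.drop(labels=[node]).to_dict()          # hide the node
        draws = [sample_node(node, evidence) for _ in range(n_samples)]
        restored = float(np.mean(draws))                      # continuous node: mean
        squared_errors.append((restored - true_value) ** 2)
    return float(np.sqrt(np.mean(squared_errors)))
```

Running this once for the single-Gaussian network and once for the GMM network gives the two RMSE values being compared.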
5.1.2. Parameters Learning with Classifiers from a GMM Perspective
- For each combination of parameters, a data sample was generated from the mixture;
- OLR was calculated on the true parameters of the mixtures;
- The sample was divided into training and test sets in a ratio of 90% to 10%;
- The parameters of two simple networks, each consisting of a single edge between the discrete and the continuous node oriented one way or the other, were trained on the training set by the classical method, without the use of mixtures;
- For each observation in the test set, the value in each node was removed sequentially;
- The deleted value in a continuous node was restored as the average of sampled values, and the deleted value in a discrete node as the mode;
- RMSE and restoration accuracy were calculated for the two approaches: an edge from the discrete node to the continuous node with a conditional Gaussian model, and the inverted direction with one of the classifiers listed above (a sketch of the discrete restoration step is given after the list).
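For the discrete node, the same loop applies with the mode in place of the mean; a sketch under the same assumptions (hypothetical `sample_node` callable, illustrative names):

```python
from collections import Counter


def restore_accuracy(test_data, node, sample_node, n_samples=100):
    """Hide the value of the discrete `node` in each test row, restore it as
    the mode of sampled values, and return the restoration accuracy."""
    hits = 0
    for _, row in test_data.iterrows():
        true_value = row[node]
        evidence = row.drop(labels=[node]).to_dict()
        draws = [sample_node(node, evidence) for _ in range(n_samples)]
        restored = Counter(draws).most_common(1)[0][0]        # discrete node: mode
        hits += int(restored == true_value)
    return hits / len(test_data)
```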
5.2. Real Data
5.2.1. Using GMM for Parameters Learning on Real Data
- Structural learning;
- Then, we take a test sample, sequentially delete the values in the continuous nodes, and restore them by sampling from the network in two ways: based on a single Gaussian distribution and based on mixtures;
- Calculate the quality of restoration as RMSE (a sketch of how the number of GMM components per node can be chosen is given after the list).
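One practical detail when moving from a single Gaussian to a GMM on real data is choosing the number of mixture components per node. The sketch below uses BIC-based selection, a common default assumed here rather than taken from this outline; when one component wins, the model reduces to the ordinary Gaussian parameterisation.

```python
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(data, max_components=5, random_state=0):
    """Fit GMMs with 1..max_components components and keep the one with the
    lowest BIC; `data` holds the node and its continuous parents column-wise."""
    best_model, best_bic = None, float("inf")
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=random_state).fit(data)
        bic = gmm.bic(data)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model
```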
5.2.2. Using Classifiers on Real Data
- Structural learning, with and without the prohibition of continuous parents for discrete nodes, for four different score functions (LL, BIC, AIC and MI);
- Parameter learning; where the prohibition is lifted, parameters are learned both on fully discretised data (continuous variables replaced by five discretisation bins) and with each of the available classifiers;
- For each discrete node and each sample in the dataset, we remove the value in that node and recover it by selecting the most frequent value from the conditional distribution, estimated and described either with a table or with a classifier;
- Calculation of the quality of recovery as accuracy (a sketch of the table-based recovery baseline is given after the list).
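The table-based baseline mentioned in the last two bullets can be sketched as a conditional frequency table over discretised parents, with recovery by the most frequent value; the classifier-based variant simply replaces the table lookup with the classifier's prediction. The function names and the fallback for unseen parent configurations are assumptions; `train` and `test` are pandas DataFrames.

```python
def build_cpt(train, node, parents):
    """Most frequent value of `node` for every configuration of `parents`
    in the (discretised) training DataFrame."""
    return train.groupby(parents)[node].agg(lambda s: s.mode().iloc[0])


def recover_with_cpt(cpt, test, node, parents, fallback):
    """Recover the hidden value of `node` in each test row as the most
    frequent value for its parent configuration; return the accuracy."""
    hits = 0
    for _, row in test.iterrows():
        key = tuple(row[p] for p in parents)
        if len(parents) == 1:
            key = key[0]
        predicted = cpt.get(key, fallback)     # unseen configuration: fallback
        hits += int(predicted == row[node])
    return hits / len(test)
```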
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Node-wise results for the first real dataset, grouped by the score function used for structure learning (LL, BIC, AIC, MI) and by parameterisation (GMM, single Gaussian, discretised):

| Node | LL: GMM | LL: Gauss | LL: Discrete | BIC: GMM | BIC: Gauss | BIC: Discrete |
|---|---|---|---|---|---|---|
| Gross | 438.01 | 492.7 | 1137.1 | 409.1 | 441.7 | 879.2 |
| Netpay | 85.1 | 93.4 | 278 | 88.1 | 90.4 | 172.6 |
| Porosity | 7.1 | 7.1 | 13.9 | 5.8 | 5.9 | 8.85 |
| Permeability | 990.5 | 1103.2 | 2356.7 | 1058 | 1117.2 | 2450.2 |
| Depth | 1058.8 | 1063.1 | 1191.1 | 990.7 | 993.1 | 1110.3 |

| Node | AIC: GMM | AIC: Gauss | AIC: Discrete | MI: GMM | MI: Gauss | MI: Discrete |
|---|---|---|---|---|---|---|
| Gross | 436.5 | 492.3 | 853.4 | 409.1 | 441.5 | 1137.03 |
| Netpay | 92.1 | 91.9 | 172.4 | 87.4 | 90.4 | 277.9 |
| Porosity | 5.8 | 5.9 | 9.08 | 5.9 | 5.8 | 13.9 |
| Permeability | 989.4 | 1103.7 | 1786.3 | 1038.7 | 1114.5 | 2356.7 |
| Depth | 1035.4 | 1034.5 | 1082.9 | 1035.1 | 1033.7 | 1187.5 |
The same comparison for the second real dataset:

| Node | LL: GMM | LL: Gauss | LL: Discrete | BIC: GMM | BIC: Gauss | BIC: Discrete |
|---|---|---|---|---|---|---|
| Mean_tr | 6194 | 6722 | 24,502 | 6219.5 | 6787.1 | 23,871.1 |
| Median_tr | 5555.8 | 6148.4 | 23,821.7 | 5559.5 | 6182.9 | 24,876.1 |
| Tr_per_month | 22.6 | 22.7 | 116.8 | 21.2 | 22.3 | 114.3 |

| Node | AIC: GMM | AIC: Gauss | AIC: Discrete | MI: GMM | MI: Gauss | MI: Discrete |
|---|---|---|---|---|---|---|
| Mean_tr | 6225.1 | 6793.1 | 24,501.2 | 6213.5 | 6710.6 | 24,510.3 |
| Median_tr | 5569 | 6178.8 | 24,123.5 | 5594 | 6143.2 | 23,728.6 |
| Tr_per_month | 22.7 | 22.6 | 114.8 | 22.3 | 22.4 | 117.1 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).